Subject: kern/9206: msdosfs behaving badly
To: None <gnats-bugs@gnats.netbsd.org>
From: Paulo Alexandre Pinto Pires <pappires@ppires.org>
List: netbsd-bugs
Date: 01/16/2000 09:32:18
>Number: 9206
>Category: kern
>Synopsis: writing to msdosfs partitions fails and/or destroys data
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jan 16 09:30:00 2000
>Last-Modified:
>Originator: Paulo Alexandre Pinto Pires
>Organization:
COPPE/UFRJ
>Release: 19991116
>Environment:
System: NetBSD mateus.ppires.org 1.4O NetBSD 1.4O (MATEUS-19991117) #0: Wed Nov 17 01:50:25 BRST 1999 pappires@mateus.ppires.org:/usr/src/sys/arch/i386/compile/MATEUS-19991117 i386
MS-DOS FAT32 filesystem in a Quantum FIREBALL ST3.2S Ultra SCSI HD,
Adaptec 2940UW.
>Description:
When writing to an msdos filesystem, the system sometimes fails to
perform the requested operation, or it writes wrong data to files,
directories or even to the file allocation tables. After even
fairly brief use, one can end up with destroyed file contents
or (as happened to me) with a completely corrupted file system.
In some of the stranger and more serious failures, data written to a
file deep in the directory tree was appended to MSDOS.SYS instead. The
file system was surely not already corrupted before NetBSD ran,
because this happened just after the MS-DOS partition had been
formatted.
The problem has shown up for me on different computers, with different
hard disks and interface types (both SCSI and ATA/IDE), and it seems
to have been around since at least 1.4K. It may be present in earlier
stages of -current, but it does not show up in the 1.4 release.
At first, it looked like some kind of timing problem, or an effect
of running under high load, so I wrote a program to copy an entire
directory tree, taking care to call open(2) with O_SYNC|O_DSYNC
and to call sync(2) after each write(2) or mkdir(2). I also made the
program retry a failed operation up to five times, printing a
warning before each retry. Many operations failed once or twice
but succeeded on a retry. I then had to abort the program
because a very crowded sub-directory got created as a regular file,
and consequently every operation on it or below it failed.
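The retry logic described above can be sketched roughly as follows.
This is a minimal illustration, not the original program, which is
not included in this report. The names write_all, write_file_once,
write_file_retry and MAX_RETRIES are hypothetical, only O_SYNC is
used (O_DSYNC could be OR-ed in where the system defines it), and
the directory-tree walk and mkdir(2) handling are left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_RETRIES 5   /* assumption: one initial try plus up to five retries */

/* Write the whole buffer, coping with short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One attempt: synchronous open, write, close, then sync(2). */
static int write_file_once(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_all(fd, buf, len);
    int saved = errno;
    if (close(fd) < 0 && rc == 0) {
        rc = -1;
        saved = errno;
    }
    sync();
    errno = saved;      /* keep the error of the step that failed */
    return rc;
}

/* Returns the number of retries that were needed, or -1 if every
 * attempt failed. A warning goes to stderr before each retry. */
int write_file_retry(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    for (int retries = 0; ; retries++) {
        if (write_file_once(path, buf, len) == 0)
            return retries;
        if (retries == MAX_RETRIES)
            return -1;
        fprintf(stderr, "warning: %s: %s; retry %d of %d\n",
                path, strerror(errno), retries + 1, MAX_RETRIES);
    }
}
```

A non-zero return value from write_file_retry corresponds to the
"failed once or twice, then succeeded" behaviour reported above.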
>How-To-Repeat:
Try to copy a big directory tree (or extract a large and complex
archive) into the msdos volume. The system will either report that
some files or directories could not be created, or the files will
be written with wrong contents or in the wrong place.
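Silent corruption of the kind described above can be detected by
comparing each copied file with its source. The following is a
minimal sketch of such a check for a single pair of files. The
function name compare_files is hypothetical, and walking the two
directory trees is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files byte by byte.
 * Returns 0 if they are identical, 1 if they differ, and -1 if
 * either file cannot be opened or read. */
int compare_files(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = 0;

    if (fa == NULL || fb == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = getc(fa);
            cb = getc(fb);
            if (ca != cb) {         /* also catches differing lengths */
                result = 1;
                break;
            }
        } while (ca != EOF);
        if (ferror(fa) || ferror(fb))
            result = -1;
    }
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

A result of 1 for a file that was copied without any reported error
would indicate that the data was written wrong.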
>Fix:
The only workaround I could think of is booting from a 1.4
installation floppy or CD. 1.4 does not seem to be affected by any
of these symptoms.
>Audit-Trail:
>Unformatted: