Subject: filesystem bug?
To: None <tech-kern@netbsd.org, tech-userlevel@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-userlevel
Date: 12/01/2004 17:38:19

There's a machine at $DAYJOB that's running NetBSD 2.0_RC4 (the latest I could
find when I went to install it).  It's got a filesystem that seems to
have got roached (possibly because of softdeps - it was mounted with
softdeps when the problem started).

Thing is, despite the filesystem being corrupted, fsck is happy with it
and the kernel mostly is too.

The filesystem is mounted on /dumps.

The first odd thing is that a directory is missing its . and .. entries:

# ls -ld /dumps/slot19
drwxr-xr-x  2 backup        512 Dec  1 16:50 /dumps/slot19
# ls -la /dumps/slot19
total 0
# 

This was noticed because rmdir says ENOTEMPTY:

# rmdir /dumps/slot19
rmdir: /dumps/slot19: Directory not empty
# 

even though ls shows nothing.

The really interesting thing is the contents of the directory:

# cat /dumps/slot19 | hexdump -C
00000000  38 d3 42 00 40 d3 42 00  48 d3 42 00 50 d3 42 00  |8.B.@.B.H.B.P.B.|
00000010  58 d3 42 00 60 d3 42 00  68 d3 42 00 70 d3 42 00  |X.B.`.B.h.B.p.B.|
00000020  78 d3 42 00 80 d3 42 00  88 d3 42 00 90 d3 42 00  |x.B...B...B...B.|
00000030  98 d3 42 00 a0 d3 42 00  a8 d3 42 00 b0 d3 42 00  |..B...B...B...B.|
00000040  b8 d3 42 00 c0 d3 42 00  c8 d3 42 00 d0 d3 42 00  |..B...B...B...B.|
00000050  d8 d3 42 00 e0 d3 42 00  e8 d3 42 00 f0 d3 42 00  |..B...B...B...B.|
00000060  f8 d3 42 00 00 d4 42 00  08 d4 42 00 10 d4 42 00  |..B...B...B...B.|
00000070  18 d4 42 00 20 d4 42 00  28 d4 42 00 30 d4 42 00  |..B. .B.(.B.0.B.|
00000080  38 d4 42 00 40 d4 42 00  48 d4 42 00 50 d4 42 00  |8.B.@.B.H.B.P.B.|
00000090  58 d4 42 00 70 d7 42 00  78 d7 42 00 80 d7 42 00  |X.B.p.B.x.B...B.|
000000a0  88 d7 42 00 00 00 00 00  00 00 00 00 00 00 00 00  |..B.............|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
# 

I have no idea how it got that way, and it looks almost like the
beginning of an indirect block instead of a directory's contents - but
shouldn't fsck have fixed it?  This is after running fsck - it found
four orphaned files, which I told it to toast; I then ran fsck again,
to be safe, and it found nothing wrong at all.  (And yes, I used -f.)
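
To make the "indirect block" impression concrete, here is a minimal
sketch that reads the first two rows of the slot19 dump as little-endian
32-bit words.  That word size and byte order are assumptions (they match
what 32-bit block pointers would look like on i386), and the helper name
as_le_u32 is just made up for illustration.

```python
import struct

# First 32 bytes of the /dumps/slot19 dump above, copied by hand.
raw = bytes.fromhex(
    "38d34200 40d34200 48d34200 50d34200"
    "58d34200 60d34200 68d34200 70d34200"
)


def as_le_u32(data):
    """Interpret data as consecutive little-endian 32-bit unsigned words."""
    count = len(data) // 4
    return list(struct.unpack("<%dI" % count, data[:count * 4]))


words = as_le_u32(raw)
# Difference between each word and the one before it.
strides = [b - a for a, b in zip(words, words[1:])]

for w in words:
    print(hex(w))
print("strides:", strides)
```

Read this way, the words climb in steps of 8, which is at least
consistent with a run of consecutive block addresses on a filesystem
with 8 fragments per block.  That reading is a guess, not something
confirmed here.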

For the moment I'm leaving these curious objects intact in case someone
can think of something worth doing with them, but they will probably
have to be toasted within a day or so - this is a "we want it in
production yesterday" machine.

There are actually four directories in this anomalous state.  They all
act the same to ls (nothing shown, even with -a), but their contents
are different:

# cat /dumps/slot20 | hexdump -C
00000000  00 00 00 00 60 f7 ea c1  c0 d2 42 00 00 00 00 00  |....`.....B.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
# cat /dumps/slot23 | hexdump -C
00000000  00 80 ee cd fc 90 f3 c1  2e 00 00 00 02 00 00 00  |................|
00000010  3c 01 04 02 2e 2e 00 00  05 00 00 00 10 00 08 04  |<...............|
00000020  69 6e 66 6f 00 f0 31 cc  00 d6 99 00 10 00 04 06  |info..1.........|
00000030  73 6c 6f 74 30 31 00 cc  00 28 5f 00 10 00 04 06  |slot01...(_.....|
00000040  73 6c 6f 74 30 32 00 cc  00 bc 59 01 10 00 04 06  |slot02....Y.....|
00000050  73 6c 6f 74 30 33 00 cc  00 f8 82 01 10 00 04 06  |slot03..........|
00000060  73 6c 6f 74 30 34 00 cc  00 20 e2 01 10 00 04 06  |slot04... ......|
00000070  73 6c 6f 74 30 35 00 cc  00 a4 55 00 10 00 04 06  |slot05....U.....|
00000080  73 6c 6f 74 30 36 00 cc  00 76 80 00 10 00 04 06  |slot06...v......|
00000090  73 6c 6f 74 30 37 00 cc  00 ec 00 01 10 00 04 06  |slot07..........|
000000a0  73 6c 6f 74 30 38 00 cc  00 66 5a 00 10 00 04 06  |slot08...fZ.....|
000000b0  73 6c 6f 74 30 39 00 cc  00 86 71 01 10 00 04 06  |slot09....q.....|
000000c0  73 6c 6f 74 31 30 00 cc  00 ca ad 01 10 00 04 06  |slot10..........|
000000d0  73 6c 6f 74 31 31 00 cc  00 66 25 01 10 00 04 06  |slot11...f%.....|
000000e0  73 6c 6f 74 31 32 00 cc  00 d6 64 01 10 00 04 06  |slot12....d.....|
000000f0  73 6c 6f 74 31 33 00 cc  00 3a 22 01 10 00 04 06  |slot13...:".....|
00000100  73 6c 6f 74 31 34 00 cc  00 1c a8 00 10 00 04 06  |slot14..........|
00000110  73 6c 6f 74 31 35 00 cc  00 e6 8a 01 10 00 04 06  |slot15..........|
00000120  73 6c 6f 74 31 36 00 cc  00 d0 ee 01 10 00 04 06  |slot16..........|
00000130  73 6c 6f 74 31 37 00 cc  00 ce 1c 02 10 00 04 06  |slot17..........|
00000140  73 6c 6f 74 31 38 00 cc  00 42 00 02 10 00 04 06  |slot18...B......|
00000150  73 6c 6f 74 31 39 00 cc  00 d8 01 02 30 00 04 06  |slot19......0...|
00000160  73 6c 6f 74 32 30 00 cc  00 46 0e 00 10 00 04 06  |slot20...F......|
00000170  73 6c 6f 74 32 31 00 cc  00 f2 41 01 10 00 04 06  |slot21....A.....|
00000180  73 6c 6f 74 32 32 00 cc  00 4a 13 02 60 00 04 06  |slot22...J..`...|
00000190  73 6c 6f 74 32 33 00 cc  00 4a 7d 00 10 00 04 06  |slot23...J}.....|
000001a0  73 6c 6f 74 32 34 00 cc  00 16 fd 01 10 00 04 06  |slot24..........|
000001b0  73 6c 6f 74 32 35 00 cc  00 16 67 00 10 00 04 06  |slot25....g.....|
000001c0  73 6c 6f 74 32 36 00 cc  00 90 8b 00 10 00 04 06  |slot26..........|
000001d0  73 6c 6f 74 32 37 00 cc  00 96 97 01 10 00 04 06  |slot27..........|
000001e0  73 6c 6f 74 32 38 00 cc  00 bc 24 02 18 00 04 06  |slot28....$.....|
000001f0  73 6c 6f 74 32 39 00 cc  00 00 00 00 00 00 00 00  |slot29..........|
00000200
# cat /dumps/slot29 | hexdump -C
00000000  00 a0 ee cd fc 90 f3 c1  bf 5d b3 98 af 23 bf 4e  |.........]...#.N|
00000010  bf 98 df 49 fe 89 d6 c5  fc 71 f2 2f fb da 62 fe  |...I.....q./..b.|
00000020  7a f2 0f 26 2c e6 cf 93  ff 3f cb 17 f3 4f 90 9f  |z..&,....?...O..|
00000030  55 be 98 ff 0a f9 82 fb  99 3f 2a 55 14 32 2d 6a  |U........?*U.2-j|
00000040  5e fc 6f c9 12 49 9f 16  74 5c 9c a4 d1 13 c5 c7  |^.o..I..t\......|
00000050  4b 1a 7d d0 d2 a5 92 46  8f b3 6c 99 a4 d1 af 24  |K.}....F..l....$|
00000060  24 48 1a 3d 4a 62 a2 a4  d1 97 24 25 49 fa b4 a0  |$H.=Jb....$%I...|
00000070  97 2f 97 74 ac 58 31 39  59 d2 73 82 4e 49 91 34  |./.t.X19Y.s.NI.4|
00000080  6a 7d 6a aa a4 51 df d3  d2 24 8d 9a 9e 9e 2e e9  |j}j..Q...$......|
00000090  d3 82 ce c8 90 34 6a 77  66 a6 a4 51 af b3 b2 24  |.....4jwf..Q...$|
000000a0  8d 1a 9d 9d 2d 69 d4 65  b5 5a d2 a8 c5 39 39 92  |....-i.e.Z...99.|
000000b0  46 fd cd cd 95 34 6a 6e  5e 9e a4 51 67 f3 f3 25  |F....4jn^..Qg..%|
000000c0  8d da 5a 50 20 69 d4 53  8d 46 d2 a8 a1 5a ad a4  |..ZP i.S.F...Z..|
000000d0  51 37 0b 0b 25 8d 5a 59  54 24 69 d4 c7 e2 62 49  |Q7..%.ZYT$i...bI|
000000e0  a3 26 96 94 48 1a 75 50  a7 93 34 6a 5f 69 a9 a4  |.&..H.uP..4j_i..|
000000f0  51 ef ca ca 24 8d 1a 57  5e 2e 69 d4 35 bd 5e d2  |Q...$..W^.i.5.^.|
00000100  a8 65 06 83 a4 51 bf 2a  2a 24 8d 9a 55 59 29 69  |.e...Q.**$..UY)i|
00000110  d4 a9 aa 2a 49 a3 36 55  57 4b 1a f5 a8 a6 46 d2  |...*I.6UWK....F.|
00000120  a8 41 b5 b5 92 46 dd a9  ab 93 34 6a 4d 7d bd a4  |.A...F....4jM}..|
00000130  51 5f 1a 1a 24 8d 9a d2  d8 28 69 d4 91 a6 26 49  |Q_..$....(i...&I|
00000140  a3 76 34 37 4b 1a f5 a2  a5 45 d2 a8 11 ad ad 92  |.v47K....E......|
00000150  46 5d 30 1a 25 8d 5a d0  d6 26 69 e0 7f 7b bb a4  |F]0.%.Z..&i..{..|
00000160  81 f9 1d 1d 92 06 ce 77  76 4a 1a d8 de d5 25 69  |.......wvJ....%i|
00000170  e0 79 77 b7 a4 81 e1 3d  3d 92 06 6e f7 f6 4a 1a  |.yw....==..n..J.|
00000180  58 6d 32 49 1a f8 dc d7  27 69 60 72 7f bf a4 81  |Xm2I....'i`r....|
00000190  c3 66 b3 a4 81 bd 16 8b  a4 81 b7 03 03 92 06 c6  |.f..............|
000001a0  0e 0e 4a 1a b8 6a b5 4a  1a 58 6a b3 49 1a f8 39  |..J..j.J.Xj.I..9|
000001b0  34 24 69 60 e6 f0 b0 a4  81 93 76 bb a4 81 8d 0e  |4$i`......v.....|
000001c0  87 a4 81 87 23 23 92 06  06 3a 9d 92 06 ee b9 5c  |....##...:.....\|
000001d0  92 06 d6 b9 dd 92 06 be  8d 8e 4a 1a 98 36 36 26  |..........J..66&|
000001e0  69 e0 98 c7 23 69 60 97  d7 2b 69 e0 95 cf 27 69  |i...#i`..+i...'i|
000001f0  60 94 df 2f 69 e0 52 20  20 69 60 d1 c4 84 a4 81  |`../i.R  i`.....|
00000200
# 
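
The slot23 dump, unlike the others, mostly looks like ordinary
directory entries.  Here is a minimal sketch that decodes three of them,
assuming the usual FFS on-disk entry layout (32-bit inode number, 16-bit
record length, 8-bit type, 8-bit name length, then the name) in
little-endian order.  The layout is an assumption on my part, and
parse_dirents is a hypothetical helper, not anything from fsck.

```python
import struct

# Bytes 0x18..0x47 of the /dumps/slot23 dump above, copied by hand.
raw = bytes.fromhex(
    "05000000 10000804 696e666f 00f031cc"
    "00d69900 10000406 736c6f74 303100cc"
    "00285f00 10000406 736c6f74 303200cc"
)


def parse_dirents(data):
    """Walk data as a chain of (inode, reclen, type, name) entries."""
    entries = []
    off = 0
    while off + 8 <= len(data):
        ino, reclen, dtype, namlen = struct.unpack_from("<IHBB", data, off)
        # Stop on a record length that cannot be valid for this buffer.
        if reclen < 8 or off + reclen > len(data):
            break
        name = data[off + 8:off + 8 + namlen].decode("ascii", "replace")
        entries.append((ino, reclen, dtype, name))
        off += reclen
    return entries


entries = parse_dirents(raw)
for ino, reclen, dtype, name in entries:
    print(ino, reclen, dtype, name)
```

Under that assumed layout the three records chain cleanly, each 16
bytes long, with type 8 (regular file) for "info" and type 4
(directory) for "slot01" and "slot02".  Only the first 8 bytes of the
block, where the header of the "." entry would normally sit, fail to fit
the pattern.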

This is i386.  uname shows 2.0_RC4, saying the kernel is

autobuild@tgm.netbsd.org:/autobuild/netbsd-2-0/i386/OBJ/autobuild/netbsd-2-0/src/sys/arch/i386/compile/GENERIC

with a timestamp of "Sun Oct 17 19:11:36 UTC 2004".

It seems to me that at the very least there's a bug in fsck; it should
have griped about missing . and/or .. in those directories.

The filesystem is unfortunately quite large (>1TB), so it is not
practical to give anyone a dd image of it.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B