Subject: bin/24453: dump(8) hangs
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <apb@cequrux.com>
List: netbsd-bugs
Date: 02/17/2004 12:18:02
>Number: 24453
>Category: bin
>Synopsis: dump(8) hangs
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Feb 17 10:20:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Alan Barrett
>Release: NetBSD 1.6ZK
>Organization:
Not much
>Environment:
Architecture: i386
Machine: i386
NetBSD-1.6ZK, kernel built from sources checked out from CVS on 2004-02-15,
userland built from sources about a week older than that.
>Description:
dump(8) often just hangs. I have marked this PR critical/high
because it is preventing backups from being made.
>How-To-Repeat:
The problem is intermittent; small file systems seem to dump fine most
of the time, while large file systems seem to exhibit the problem much
more often.
Here's an example of a failure.
/dev/raid0i is an FFSv1+softdep file system on a RAID1 disk. I have not
run "fsck_ffs -c 4" on this file system. Here is some of the output from
dumpfs:
file system: /dev/rraid0i
endian little-endian
magic 11954 (UFS1) time Tue Feb 17 10:40:42 2004
cylgrp dynamic inodes 4.4BSD sblock FFSv1 fslevel 3
nbfree 98352 ndir 119718 nifree 281416 nffree 112751
ncg 51 ncyl 16384 size 4194304 blocks 4128607
bsize 16384 shift 14 mask 0xffffc000
fsize 2048 shift 11 mask 0xfffff800
frag 8 shift 3 fsbtodb 2
cpg 324 bpg 10368 fpg 82944 ipg 20352
minfree 5% optim space maxcontig 4 maxbpg 4096
rotdelay 0ms headswitch 0us trackseek 0us rps 60
ntrak 8 nsect 128 npsect 128 spc 1024
symlinklen 60 trackskew 0 interleave 1 contigsumsize 4
maxfilesize 0x000400400402ffff
nindir 4096 inopb 128 nspf 4
avgfilesize 16384 avgfpdir 64
sblkno 8 cblkno 16 iblkno 24 dblkno 1296
sbsize 2048 cgsize 16384 offset 32 mask 0xfffffff8
csaddr 1296 cssize 2048 shift 10 mask 0xfffffc00
cgrotor 0 fmod 0 ronly 0 clean 0x02
flags soft-updates
fsmnt /r0i
When I run "dump 0ushf 1048576 0 - /dev/rraid0i | dd bs=1m progress=1
of=/dev/null", I get the following output within the first 10 minutes
(interspersed with the dots from "dd progress=1", not shown).
DUMP: Found /dev/rraid0i on /r0i in /etc/fstab
DUMP: Date of this level 0 dump: Tue Feb 17 10:07:30 2004
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/rraid0i (/r0i) to standard output
DUMP: Label: none
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 6566086 tape blocks.
DUMP: Volume 1 started at: Tue Feb 17 10:07:41 2004
DUMP: dumping (Pass III) [directories]
DUMP: 4.73% done, finished in 1:40
DUMP: dumping (Pass IV) [regular files]
DUMP: 18.61% done, finished in 0:43
About two minutes after the message about "18.61% done", the dots from
"dd progress=1" stop being printed. Ten minutes later, "dd" still
hasn't printed another dot. At this point, pressing control-T causes dd
and dump to report the following:
0+145365 records in
0+145365 records out
1488537600 bytes transferred in 1413.092 secs (1053390 bytes/sec)
22.14% done at 1036 KB/s, finished in 1:22
Also around this time, "pstree -ap | grep dump" shows the following:
| | `-dump,9699) 0ushf 1048576 0 - /dev/rraid0i
| | `-dump,8577) 0ushf 1048576 0 - /dev/rraid0i
| | |-dump,21473) 0ushf 1048576 0 - /dev/rraid0i
| | |-dump,4474) 0ushf 1048576 0 - /dev/rraid0i
| | `-dump,19390) 0ushf 1048576 0 - /dev/rraid0i
and "ps -axlww | grep dump | grep -v grep" shows the following:
0 4474 8577 27 18 0 672 4 pause SW+ pd 0:45.36 dump 0ushf 1048576 0 - /dev/rraid0i
0 8577 9699 0 2 0 728 4 netio SW+ pd 0:15.62 dump 0ushf 1048576 0 - /dev/rraid0i
0 9699 14660 26 10 0 672 4 wait SW+ pd 0:03.09 dump 0ushf 1048576 0 - /dev/rraid0i
0 19390 8577 30 18 0 672 4 pause SW+ pd 0:44.49 dump 0ushf 1048576 0 - /dev/rraid0i
0 21473 8577 23 18 0 672 4 pause SW+ pd 0:45.03 dump 0ushf 1048576 0 - /dev/rraid0i
I tried to use "ktruss -d -i -p 9699" to see what was happening, but
ktruss printed nothing for more than an hour. Eventually, I pressed
control-T again, and got the following output from ktruss:
SIGINFO caught handler=0x804d974 mask=0x0 code=0x0
8577 dump gettimeofday(0xbfbfee18, 0) = 0
8577 dump emul(netbsd)
8577 dump write(0x2, 0xbfbfee80, 0x2a) = 42
"22.14% done at 229 KB/s, finished in 6:11\n"
8577 dump compat_16___sigreturn14(0xbfbfef2c) JUSTRETURN
along with the following from dd and dump:
0+145365 records in
0+145365 records out
1488537600 bytes transferred in 6348.986 secs (234452 bytes/sec)
22.14% done at 229 KB/s, finished in 6:11
>Fix:
Not known.
>Release-Note:
>Audit-Trail:
>Unformatted: