Subject: kern/6916: narcoleptic dump(8)
To: None <gnats-bugs@gnats.netbsd.org>
From: None <windsor@warthog.com>
List: netbsd-bugs
Date: 01/30/1999 22:12:18
>Number: 6916
>Category: kern
>Synopsis: My dump/restore went to sleep!
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 30 20:20:01 1999
>Last-Modified:
>Originator: Rob Windsor
>Organization:
NosePickers Anonymous
>Release: 1.3.3
>Environment:
System: NetBSD evolution 1.3.3 NetBSD 1.3.3 (EVOLUTION) #5: Sat Jan 23 01:00:29 CST 1999 windsor@evolution:/usr/src/sys/arch/sparc/compile/EVOLUTION sparc
>Description:
I was running a "dump | restore" from cron (one disk to another)
and dump went to la-la land: every dump process, and restore, is
asleep and the pipeline is wedged. The crontab has the following lines
(just so that you understand what is happening):
mount /dev/sd1d /mirror/work
(cd /mirror/work ; dump 0f - /usr | restore -rf - )
Top shows:
  PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
  758 windsor   28    0   972K  120K run     7:25  0.93%  0.93% top
  513 root       2    0   788K   68K sleep   1:36  0.05%  0.05% sshd1
  855 root       2    0  4196K   48K sleep   2:39  0.00%  0.00% restore
  159 root      18    0    20K    8K sleep   0:56  0.00%  0.00% update
  857 root      18   -5   600K   32K sleep   0:29  0.00%  0.00% dump
  859 root      18   -5   600K   32K sleep   0:28  0.00%  0.00% dump
  858 root      18   -5   600K   32K sleep   0:28  0.00%  0.00% dump
  205 root       2    0   368K  108K sleep   0:20  0.00%  0.00% sshd1
  487 root       2    0   788K   84K sleep   0:15  0.00%  0.00% sshd1
  856 root       2   -5   656K   88K sleep   0:12  0.00%  0.00% dump
   97 root      10    0    36K    8K sleep   0:11  0.00%  0.00% ipmon
  125 root      10    0    64M    0K sleep   0:11  0.00%  0.00% mount_mfs
  755 windsor    2    0   108K   20K sleep   0:10  0.00%  0.00% tail
  462 root       2    0   788K   68K sleep   0:07  0.00%  0.00% <sshd1>
  854 root      10   -5   600K   76K sleep   0:03  0.00%  0.00% <dump>
(after it wedged, I tried to renice the dump processes to wake
them up)
ps alx shows:
: evolution; ps alx | egrep 'dump|restore|PPID'
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0   854   853  93  10 -5   600   76 wait   IW<  ??    0:03.55 dump 0f - /us
    0   855   853 170   2  0  4196   48 netio  I    ??    2:39.19 (restore)
    0   856   854  15   2 -5   656   88 netio  I<   ??    0:12.62 dump 0f - /us
    0   857   856  25  18 -5   600   32 pause  I<   ??    0:29.05 dump 0f - /us
    0   858   856  28  18 -5   600   32 pause  I<   ??    0:28.65 dump 0f - /us
    0   859   856  25  18 -5   600   32 pause  I<   ??    0:28.42 dump 0f - /us
  101  1248   489   8  30  0    84   84 -      R+   p1    0:00.08 egrep dump|re
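
For anyone triaging this, the ps listing above may be easier to read as a
process tree annotated with each process's wait channel. The following is
only a sketch: the function names (parse_ps, format_tree) are made up for
illustration, the egrep line is left out, and PID 853 does not appear in
the listing (it is presumably the shell that cron started, but that is a
guess).

```python
# Sketch: rebuild the process tree from the `ps alx` listing in this report
# and show the wait channel (WCHAN) each process is sleeping on.
PS_OUTPUT = """\
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 854 853 93 10 -5 600 76 wait IW< ?? 0:03.55 dump 0f - /us
0 855 853 170 2 0 4196 48 netio I ?? 2:39.19 (restore)
0 856 854 15 2 -5 656 88 netio I< ?? 0:12.62 dump 0f - /us
0 857 856 25 18 -5 600 32 pause I< ?? 0:29.05 dump 0f - /us
0 858 856 28 18 -5 600 32 pause I< ?? 0:28.65 dump 0f - /us
0 859 856 25 18 -5 600 32 pause I< ?? 0:28.42 dump 0f - /us
"""


def parse_ps(text):
    """Return {pid: row_dict} keyed by the column names in the header line."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    procs = {}
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = line.split(None, len(header) - 1)
        row = dict(zip(header, fields))
        procs[int(row["PID"])] = row
    return procs


def format_tree(procs, root_ppid):
    """Return indented 'PID COMMAND [WCHAN]' lines for descendants of root_ppid."""
    kids = {}
    for pid, row in procs.items():
        kids.setdefault(int(row["PPID"]), []).append(pid)

    out = []

    def walk(pid, depth):
        row = procs[pid]
        out.append("%s%d %s [%s]" % ("  " * depth, pid, row["COMMAND"], row["WCHAN"]))
        for child in sorted(kids.get(pid, [])):
            walk(child, depth + 1)

    for pid in sorted(kids.get(root_ppid, [])):
        walk(pid, 0)
    return out


if __name__ == "__main__":
    print("\n".join(format_tree(parse_ps(PS_OUTPUT), 853)))
```

Read this way, 854 is in "wait" on its child 856, 856 and restore are both
in "netio", and the three children of 856 are all in "pause".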
>How-To-Repeat:
Hmm. Do a "dump | restore" from one disk to another with enough
stuff going on that one of the dump processes gets paged out? I'm not
sure how this happened, so I'm not sure how to repeat it.
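
Since the hang is not reliably reproducible, one way to chase it might be
to run the pipeline over and over under a time limit and stop at the first
run that stalls. The sketch below is an assumption-laden helper, not
anything from dump(8) itself: run_pipeline and its parameters are
hypothetical names, and a fixed wall-clock limit is only a crude stand-in
for real stall detection.

```python
# Sketch: run `producer | consumer` and report whether it finished in time.
import subprocess
import time


def run_pipeline(producer, consumer, timeout, cwd=None):
    """Return True if both commands exit within `timeout` seconds, else False."""
    src = subprocess.Popen(producer, stdout=subprocess.PIPE, cwd=cwd)
    dst = subprocess.Popen(consumer, stdin=src.stdout, cwd=cwd)
    src.stdout.close()  # let the producer get SIGPIPE if the consumer exits
    deadline = time.monotonic() + timeout
    try:
        for proc in (dst, src):
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        return True
    except subprocess.TimeoutExpired:
        # For a real investigation you would probably leave the processes
        # wedged and inspect them with ps; they are killed here only so the
        # sketch cleans up after itself. Note that this kills only the two
        # direct children, not anything they have forked.
        for proc in (src, dst):
            proc.kill()
            proc.wait()
        return False
```

A hypothetical use would be to call
run_pipeline(["dump", "0f", "-", "/usr"], ["restore", "-rf", "-"],
timeout=..., cwd="/mirror/work") in a loop while the machine is under
memory pressure, picking the timeout from how long a good run takes.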
>Fix:
nfc
>Audit-Trail:
>Unformatted: