Port-macppc archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: TSTILE lockups on disk dump/backups



Following up….

I am trying to fix a problem with passwordless ssh between my 8.2 systems and some older Mac systems. In doing testing, I started a dump script, and then killed the dumps. I ended up with a tstile process - the gzip. I don’t know if the dumps wer stuck, but they died as desired, and the gzip appears to be stuck.vmstat -s is below, with ps -axl.

-dgl-

> On Dec 21, 2024, at 9:21 AM, Martin Husemann <martin%duskware.de@localhost> wrote:
> 
> On Sat, Dec 21, 2024 at 02:20:23AM -0600, Don Lee wrote:
>> Any thought about how I should deal with this? My kernel debugging
>> skills are weak. Is there anything I can do to produce useful debug
>> information next time this happens?
> 
> A tstile just means the process/kernel thread is blocked for a longer
> time waiting for "something". In this context that would typically be
> either a disk buffer ("buf") read from disk (the source you are backing
> up) or a network buffer ("mbuf") handed to the network device output
> queue.
> 
> This would point to a bug in the disk driver or the network driver.
> 
> The other option would be your system runing out of kernel address space,
> so it can not allocate either type of buffer. This would show up in 
> the output of vmstat -s (or you are running against limits for either
> type of buffer, but I would expect that to fail differently).
> 
> Can you start new processes in the state when it hangs? Like the vmstat -s 
> mentioned above, or "crash" to look at kernel state? I am not sure "crash"
> would work on macppc and 9.x good enough.
> 
> Martin

+mercy$ ps -axl
 UID   PID  PPID  CPU PRI NI   VSZ   RSS WCHAN   STAT TTY       TIME COMMAND
   0     0     0    0 125  0     0 67804 uvm     DKl  ?     96:15.96 [system]
   0     1     0    0  85  0  6644   284 wait    Is   ?      0:04.78 init 
   0    38     1    0  85  0  6532  1808 nanoslp I    ?      0:00.97 /usr/libexec/getty std.9600 ttyE1 
   0   172     1    0  85  0 11928  2424 kqueue  Ss   ?      3:46.75 /usr/sbin/syslogd 
   0   202     1    0  85  0 31920 15264 kqueue  Is   ?      3:10.08 /usr/pkg/sbin/named 
   0   237     1  142  85  0  6624  1568 select  Is   ?      0:00.00 /usr/sbin/lpd -s 
   0   373     1    0  85  0 11832 10872 pause   Ss   ?      1:53.24 /usr/sbin/ntpd -p /var/run/ntpd.pid 
   0   378     1 1826  85  0 18528  3048 select  Is   ?      2:31.59 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
   0   462     1    0  85  0  6532   732 nanoslp I    ?      0:01.18 /usr/libexec/getty std.9600 ttyE2 
   0   560     1    0  85  0 28428 10672 kqueue  Is   ?      0:00.95 /usr/pkg/sbin/lwresd 
   0   575     1    0  85  0  6528  1656 kqueue  Is   ?      0:00.19 /usr/sbin/inetd -l 
   0   606     1    0  85  0 17136  2740 kqueue  Ss   ?      0:22.73 /usr/libexec/postfix/master -w 
  12   635   606    0  85  0 16744  4912 kqueue  I    ?      0:02.88 qmgr -l -t unix -u 
   0   648     1    0  85  0  7276  1752 nanoslp Ss   ?      0:04.53 /usr/sbin/cron 
   0   786     1  228  85  0  6092    40 ttyraw  I    ?      0:00.00 /usr/libexec/getty std.38400 ttyZ0 
   0   805     1    0  85  0  6532  1808 nanoslp I    ?      0:00.67 /usr/libexec/getty std.9600 ttyE3 
 999   981 24032    0  85  0 98912 11400 semwait S    ?      0:00.60 /usr/pkg/sbin/httpd -k start 
1043  1754 15863    0  85  0 19780  5272 select  S    ?      0:03.16 sshd: screview 
   0  4206 27475  679  85  0  6128  1548 nanoslp I    ?      0:00.00 sleep 150 
  12  4681   606    0  85  0 17008  4916 kqueue  I    ?      0:00.03 proxymap -t unix -u 
 999  5475 24032    0  85  0 99136 11808 semwait S    ?      0:00.57 /usr/pkg/sbin/httpd -k start 
1001  6977 21060    0  85  0 19872  5284 select  S    ?      0:00.48 sshd: donlee@pts/2 (sshd)
 999  7535 24032    0  85  0 98752 10756 kqueue  S    ?      0:00.08 /usr/pkg/sbin/httpd -k start 
 999  8921 24032    0  85  0 99104 11420 semwait S    ?      0:00.54 /usr/pkg/sbin/httpd -k start 
1001  8972 27821    0  85  0 19776  5284 select  I    ?      0:00.50 sshd: donlee@pts/0 (sshd)
  12 11260   606    0  85  0 16984  4848 kqueue  S    ?      0:00.03 pickup -l -t unix -u 
 999 14429 24032  967  85  0 36192  4280 socket  I    ?      0:00.07 /usr/pkg/sbin/httpd -k start 
   0 15863   378   22  85  0 19212  7340 select  Is   ?      0:00.25 sshd: screview [priv] 
   0 15888   378  960  85  0 19232  7424 select  Is   ?      0:00.36 sshd: donlee [priv] 
  12 16463   606    0  85  0 17816  5412 kqueue  I    ?      0:00.03 smtpd -n smtp -t inet -u -o stress= -s 2 
1001 16576 18110    0  85  0 19872  5280 select  I    ?      0:00.33 sshd: donlee@pts/3 (sshd)
   0 18110   378 4580  84  0 19328  7440 select  Is   ?      0:00.37 sshd: donlee [priv] 
  12 18451   606    0  85  0 16976  4832 kqueue  S    ?      0:00.03 anvil -l -t unix -u 
 999 18588 24032    0  85  0 98608 10588 semwait S    ?      0:00.08 /usr/pkg/sbin/httpd -k start 
   0 21060   378 4284  84  0 19328  7440 select  Is   ?      0:00.38 sshd: donlee [priv] 
1001 22743 15888    0  85  0 19776  5280 select  I    ?      0:00.79 sshd: donlee@pts/1 (sshd)
 999 23402 24032    0  85  0 98752 10820 semwait S    ?      0:00.27 /usr/pkg/sbin/httpd -k start 
   0 24032     1    0  85  0 97904 22004 select  Ss   ?      0:03.20 /usr/pkg/sbin/httpd -k start 
   0 27475     1  679  85  0  6788  1840 pause   I    ?      0:01.78 /bin/ksh /usr/local/apache/admin/watch_ap 
   0 27821   378  263  85  0 19232  7428 select  Is   ?      0:00.37 sshd: donlee [priv] 
 999 29660 24032    0  85  0 98752 10716 semwait S    ?      0:00.31 /usr/pkg/sbin/httpd -k start 
1001  5918  8972    0  85  0  7092  1924 ttyraw  Is+  pts/0  0:00.03 -ksh 
   0    80     1    0  76 19 40640 30044 pause   IN+  pts/1  0:00.44 dump -t -0 -u -f - -L usr0-241229 -a /usr 
   0  3132 10445    0  85  0  5768  1280 pipe_rd I+   pts/1  0:00.01 tee -a /var/log/dump.log 
1001  5899 22743    0  85  0  7128  1936 pause   Is   pts/1  0:00.02 -ksh 
   0  7387 10445    0  85  0  6780  1376 pause   I+   pts/1  0:00.01 /bin/ksh ./dodumps 
   0 10445 16554    0  85  0  6772  1836 pause   I+   pts/1  0:00.01 /bin/ksh ./dodumps 
   0 16554  5899    0  85  0  7048  1928 pause   I    pts/1  0:00.33 ksh 
   0 21288  7387    0 117  0  7144  2108 tstile  D+   pts/1  0:10.81 gzip -c 
   0 29716 10445    0  85  0  8508  2788 pipe_rd I+   pts/1  0:00.01 mail -s mercy weekly remote backup for 241229 root 
1001 21766  6977    0  85  0  7092  1924 pause   Is   pts/2  0:00.03 -ksh 
   0 26577 29993    0  43  0  6896  1680 -       O+   pts/2  0:00.01 ps -axl 
   0 29993 21766    0  85  0  7048  1924 pause   S    pts/2  0:00.20 ksh 
   0  8984 12914    0  85  0  7048  1924 ttyraw  I+   pts/3  0:00.19 ksh 
1001 12914 16576  113  85  0  7092  1924 pause   Is   pts/3  0:00.02 -ksh 
   0   694     1    0  85  0  6284  1800 ttyraw  Is+  ttyE0  0:00.01 /usr/libexec/getty std.9600 ttyE0 
+mercy$ ^C -axl
+mercy$ vmstat -s
     4096 bytes per page
        1 page color
   250844 pages managed
     2222 pages free
   140041 pages active
    69363 pages inactive
        0 pages paging
     2730 pages wired
        0 zero pages
        1 reserve pagedaemon pages
        5 reserve kernel pages
     4915 boot kernel pages
    35578 kernel pool pages
    23598 anonymous pages
   164759 cached file pages
    23777 cached executable pages
      256 minimum free pages
      341 target free pages
    83614 maximum wired pages
        1 swap devices
  1048697 swap pages
        0 swap pages in use
        0 swap allocations
225500305 total faults taken
228149211 traps
156287160 device interrupts
 50546946 CPU context switches
 63050311 software interrupts
        0 system calls
        0 pagein requests
        0 pageout requests
        0 pages swapped in
        0 pages swapped out
   392258 forks total
     9727 forks blocked parent
     9727 forks shared address space with parent
        0 pagealloc zero wanted and avail
 32805864 pagealloc zero wanted and not avail
        0 aborts of idle page zeroing
 80964508 pagealloc desired color avail
        0 pagealloc desired color not avail
 80964508 pagealloc local cpu avail
        0 pagealloc local cpu not avail
        0 faults with no memory
        0 faults with no anons
        0 faults had to wait on pages
        0 faults found released page
    15090 faults relock (15090 ok)
 52280209 anon page faults
        0 anon retry faults
 29181270 amap copy faults
 29921894 neighbour anon page faults
421666821 neighbour object page faults
137180816 locked pager get faults
    14978 unlocked pager get faults
 30782693 anon faults
 21465518 anon copy on write faults
118058653 object faults
 19122163 promote copy faults
 32802454 promote zero fill faults
     5143 times daemon wokeup
     5143 revolutions of the clock hand
  5724141 pages freed by daemon
  6745369 pages scanned by daemon
        0 anonymous pages scanned by daemon
  5724141 object pages scanned by daemon
    33115 pages reactivated
      233 pages found busy by daemon
        0 total pending pageouts
  6899800 pages deactivated
314091251 total name lookups
287646312 good hits
 20255856 negative hits
     4097 bad hits
        0 false hits
  5508051 miss
   676935 too long
  3818493 pass2 hits
  3889436 2passes
          cache hits (91% pos + 6% neg) system 1% per-process
          deletions 0%, falsehits 0%, toolong 0%

+mercy$ 



Home | Main Index | Thread Index | Old Index