Current-Users archive
Re: Machine livelock with latest (4.99.48) kernel on sparc64 -- mem leak?
Rafal Boni wrote:
I just rebooted my trusty Netra T1 with a shiny new 4.99.48 kernel and
thought I'd kick off a userland build. Things seemed to go swimmingly
for a few minutes, then the machine ground to an unusable state --
userland seems to be mostly non-responsive, though the machine is
pingable, answers a ^T at a tty (well, it did for a while after the
apparent lockup; it seems to be wedged harder now), and the disk sounds
like progress is still being made on the build.
But I can't get any echo from a tty anymore, and god forbid I should
want to log in ;)
Anyone seeing anything similar? Should I go back to the last-known-good
kernel for a while? ;)
Machine is a Netra T1 200 -- UltraSPARC-IIe @ 500 MHz with 512MB RAM.
So I thought I'd give it one more try, and I saw the same thing happen
this time with a kernel build (I thought I'd see if maybe there was
something else in the latest CVS that would help).
The machine locked up ~ 18:01; it's now 2+ hours later and the disk is
still chugging along. Here's the last thing 'top' on the console said
before the hang:
load averages: 4.95, 4.71, 3.82    up 0 days, 13:48    18:01:34
29 processes: 1 runnable, 27 sleeping, 1 on processor
CPU states: 0.0% user, 0.0% nice, 8.1% system, 3.4% interrupt, 88.5% idle
Memory: 184K Act, 336K Inact, 6096K Wired, 128K Exec, 328K File, 304K Free
Swap: 2050M Total, 36M Used, 2014M Free
Unless top's reporting is just way off (it didn't seem to be at the
start), there's a sucking memory leak somewhere -- the categories above
add up to only about 7 MB on a 512 MB machine, so where'd the other
500 MB of memory go?
DDB's ps/l (as well as its backtrace) also shows an interesting fact --
the active LWP has been the system idle loop every time I've ended up
in DDB due to this hang.
vmstat seems to confirm this is due to some memory-related condition;
the stats below are sampled every 2 seconds (I'm impatient ;)). This
has happened every time I've started a more significant build on this
box running 4.99.48 -- be it just a kernel build or an attempt to
build the whole system.
Here's idle vmstat shortly after the machine booted:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr m0 c0 in sy cs us sy id
0 0 0 17904 450064 595 0 0 0 0 0 0 0 152 503 123 1 5 94
0 0 0 17912 450056 5 0 0 0 0 0 0 0 112 17 47 0 0 100
0 0 0 17912 450056 4 0 0 0 0 0 0 0 105 11 35 0 0 100
Build kicked off:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr m0 c0 in sy cs us sy id
0 0 0 24680 439448 2423 0 0 0 0 0 0 0 153 918 222 2 13 85
2 0 0 24320 430968 10055 0 0 0 0 0 0 0 134 6054 389 42 51 6
0 0 0 30928 417976 6342 0 0 0 0 0 0 0 183 3184 404 35 34 31
2 0 0 31560 410648 7866 0 0 0 0 0 0 0 170 5527 420 39 46 15
1 0 0 32240 403224 7973 0 0 0 0 0 0 0 175 5742 408 40 49 10
1 0 0 32520 396464 7351 0 0 0 0 0 0 0 186 5428 430 38 45 17
2 0 0 32496 389896 8045 0 0 0 0 0 0 0 173 5937 399 41 46 13
0 0 0 32704 384400 5585 0 0 0 0 0 0 0 217 4269 510 35 37 28
Now the system starts to get less and less usable (memory dropping, CPU
mostly idle, disk making lots of noise; where did all my processes go?):
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr m0 c0 in sy cs us sy id
4 0 0 36960 20360 8355 0 0 0 0 0 0 0 169 5331 375 51 48 0
4 0 0 38008 12432 8077 0 0 0 0 0 0 0 176 5338 379 52 47 1
2 0 0 38224 5632 7746 0 0 0 0 0 0 0 177 5092 378 50 45 5
2 0 0 24152 1200 8160 1 0 36 127 128 0 0 188 5353 404 48 48 4
2 0 0 22912 680 7761 0 0 152 495 495 0 0 200 5074 383 53 45 2
3 0 0 22416 1056 7672 33 0 265 337 369 0 0 222 5027 405 60 37 3
3 0 0 21944 792 7643 61 0 234 421 482 0 0 222 5013 407 50 44 6
1 0 0 21360 1240 8619 86 23 293 471 958 0 0 279 5476 508 50 44 6
0 0 0 21720 1048 3845 215 151 231 374 1913 0 0 506 2352 827 22 23 55
0 0 0 19928 992 2626 240 202 253 336 1758 0 0 630 1367 1000 10 19 71
0 0 0 20384 360 964 174 167 225 290 1535 0 0 598 267 871 9 12 80
0 0 0 21832 360 747 172 166 231 309 1562 0 0 620 255 908 4 8 88
0 0 0 20696 792 1154 130 181 186 335 1061 0 0 684 453 1000 2 12 86
0 0 0 21240 432 839 122 161 123 268 976 0 0 721 285 1061 2 11 88
0 0 0 21000 544 659 170 151 86 258 1135 0 0 654 142 980 1 8 91
0 0 0 21024 424 733 199 182 175 282 1496 0 0 673 188 1029 2 7 91
0 0 0 20968 480 588 175 133 103 253 1290 0 0 651 101 975 0 4 95
0 0 0 21040 416 657 162 146 100 277 1517 0 0 702 132 1026 1 7 92
0 0 0 20864 376 643 50 141 83 257 786 0 0 664 101 995 0 4 96
0 0 0 20256 272 867 201 187 140 340 1628 0 0 821 174 1231 1 9 90
0 0 0 20832 248 523 283 203 158 254 2141 0 0 675 90 1048 1 4 95
0 0 0 21000 368 637 101 151 100 282 1275 0 0 737 96 1088 1 7 92
0 0 0 20720 416 695 200 149 104 288 1586 0 0 723 139 1116 1 7 92
0 0 0 20704 600 726 124 149 90 318 912 0 0 724 135 1075 1 5 93
0 0 0 20232 608 812 186 157 151 318 844 0 0 755 211 1157 3 7 90
0 0 0 20632 336 684 128 118 81 269 656 0 0 709 128 1070 0 6 93
0 0 0 20040 416 733 222 158 150 303 1115 0 0 677 154 1057 3 7 91
0 0 0 20320 360 604 242 183 120 295 1172 0 0 763 73 1136 2 7 91
Here's the final fun spike of frenetic VM activity (note the sr column
-- the page-scan rate -- hitting the tens of thousands) before I
decided to kill the system due to lack of response:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr m0 c0 in sy cs us sy id
0 0 0 18800 264 6396 2067 2173 701 3327 18417 0 0 8465 85 13536 0 9 91
0 0 0 18928 272 4386 1494 1490 431 2268 11808 0 0 5835 49 9153 0 9 91
0 0 0 18768 352 3475 1081 1195 319 1789 8276 0 0 4664 50 7081 0 8 92
0 0 0 18600 312 5580 1710 2177 658 2846 22349 0 0 7802 94 12409 0 8 92
0 0 0 18720 240 4348 1410 1692 484 2249 14998 0 0 5995 95 9471 0 8 92
0 0 0 18440 392 5522 1706 2164 660 2812 19205 0 0 7753 86 11928 0 10 90
0 0 0 18608 320 6945 2180 2706 818 3560 25290 0 0 9940 103 15098 0 9 91
0 0 0 18608 344 6709 1999 2567 838 3462 23361 0 0 9417 107 14569 0 9 91
0 0 0 18648 288 3505 1202 1400 386 1781 12668 0 0 4868 42 7655 0 9 91
The free list never topped 350K after the system cratered.
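In case it helps anyone watch this happen, below is a rough little
poller that reads the UVM counters straight from the kernel every
couple of seconds, independent of vmstat/top. Caveat: the VM_UVMEXP2
mib and struct uvmexp_sysctl (with its pagesize/free/active/inactive/
wired fields) are the interface I remember vmstat(1) using; I haven't
re-checked them against the 4.99.48 headers, so treat this as a
sketch, not something verified:

#include <sys/param.h>
#include <sys/sysctl.h>
#include <uvm/uvm_extern.h>

#include <err.h>
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int mib[2] = { CTL_VM, VM_UVMEXP2 };
	struct uvmexp_sysctl u;
	size_t len;

	for (;;) {
		len = sizeof(u);
		/* vm.uvmexp2: snapshot of the kernel's UVM counters */
		if (sysctl(mib, 2, &u, &len, NULL, 0) == -1)
			err(1, "sysctl vm.uvmexp2");

		/* the counters are page counts; scale by page size for KB */
		printf("free %7" PRId64 "K  act %7" PRId64 "K  "
		    "inact %7" PRId64 "K  wired %7" PRId64 "K\n",
		    u.free * u.pagesize / 1024,
		    u.active * u.pagesize / 1024,
		    u.inactive * u.pagesize / 1024,
		    u.wired * u.pagesize / 1024);
		sleep(2);
	}
}

Running that alongside a build should make it easier to see whether
the free list drains steadily or falls off a cliff, and whether the
missing memory turns up in any of the categories the kernel exports.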