Subject: Re: Bad response...
To: Simon Burge <simonb@wasabisystems.com>
From: Noriyuki Soda <soda@sra.co.jp>
List: current-users
Date: 08/31/2004 23:17:03
>>>>> On Tue, 31 Aug 2004 23:00:26 +1000,
Simon Burge <simonb@wasabisystems.com> said:
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr wd2 wd3 in sy cs us sy id
> delay from last update is 2.028 seconds
> 10 1 0 1468860 680 851 17694 0 300 6 24659 52 54 755 2448 1260 36 63 0
> 3 1 0 1468908 880 256 16210 1 264 6 17513 39 40 434 1136 406 18 82 0
> 4 1 0 1469180 1360 418 5982 0 194 8 8315 71 74 423 1615 635 34 65 1
> 2 1 0 1470084 400 900 205 0 0 0 1643 103 100 479 2798 1436 81 9 11
> 1 1 0 1469276 1168 1087 24 0 0 0 2014 114 112 493 2776 1544 91 8 1
> 3 1 0 1469576 976 799 19265 0 37 1 20960 98 97 460 2313 1233 75 24 1
> 1 1 0 1469612 1292 367 19966 0 212 121 20751 57 55 376 1416 518 39 61 0
> 3 2 0 1470112 1344 324 9121 22 346 121 14998 59 67 494 1180 600 25 74 2
> 2 1 0 1470072 1356 891 0 55 9 3 1434 92 139 497 2589 1447 81 15 4
> 3 1 0 1471064 1460 775 20 10 275 1 1521 123 121 493 2143 1255 69 28 3
> 2 1 0 1472968 1112 590 186 28 616 40 1080 86 78 590 1840 901 37 63 0
> 1 3 1 1473472 792 231 191 27 222 56 469 63 76 387 491 433 14 80 6
> 4 2 0 1472980 1000 224 460 28 233 55 748 35 47 353 229 223 7 90 3
> 4 2 0 1473000 804 173 202 15 218 65 485 42 44 379 682 333 20 79 1
> 5 2 0 1472940 936 316 206 49 230 55 491 76 58 422 768 500 17 81 2
> 5 1 0 1472936 800 177 190 16 236 50 476 47 30 366 933 350 24 76 0
> 7 1 0 1472928 1060 259 284 3 219 72 596 50 38 341 1011 343 21 78 1
> 4 2 0 1473960 296 298 199 25 208 70 477 61 64 386 586 472 18 74 8
> delay from last update is 1.623 seconds
> 9 1 0 1472968 1164 265 345 11 385 167 913 36 36 539 554 321 9 88 2
> 5 1 0 1472968 932 297 412 24 330 231 981 47 48 521 618 468 15 80 5
> 0 3 0 1473828 320 436 175 27 205 75 463 90 87 421 484 538 22 71 8
> 0 2 0 1473324 1500 206 156 73 178 91 425 58 61 384 496 498 0 60 40
At least two problems can be seen here.
1. Unnecessary page-in/page-out.
Because page-outs usually happen only to anon pages, it seems that anon
pages are needlessly being considered inactive and getting paged out.
It would be better to look at the "anonymous pages", "cached file pages"
and "cached executable pages" lines in "vmstat -s" to see how many pages
are used for each of those 3 types. (Perhaps "top" should show the size
of anon pages, too.)
If the above guess is right, increasing vm.anonmax (and decreasing
vm.file{max,min}) may help.
If the unnecessary page-ins are happening on exec pages, increasing
vm.execmax may help, too.
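
The three counters mentioned in point 1 could be pulled out of saved
"vmstat -s" output with something like the sketch below. The label
strings are the ones quoted above. The "<count> <label>" line layout
and the numbers in SAMPLE are assumptions made up for illustration, not
output from a real machine.

```python
import re

# Labels as quoted in the message; the line layout is assumed.
LABELS = ("anonymous pages", "cached file pages", "cached executable pages")

# Made-up sample, NOT real measurements.
SAMPLE = """\
     4096 bytes per page
      600 anonymous pages
      300 cached file pages
      100 cached executable pages
"""

def page_type_counts(vmstat_s_text):
    """Return {label: page count} for the three page types found."""
    counts = {}
    for line in vmstat_s_text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if m and m.group(2) in LABELS:
            counts[m.group(2)] = int(m.group(1))
    return counts

def page_type_shares(counts):
    """Return each type's share (percent) of the three types combined."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}

if __name__ == "__main__":
    for label, pct in page_type_shares(page_type_counts(SAMPLE)).items():
        print(f"{label}: {pct:.1f}%")
```

The shares are relative to the sum of the three types only, not to
total memory, so they are only a rough guide when comparing against the
vm.{anon,exec,file}{min,max} percentages.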
2. Excessive CPU load in kernel mode caused by the page scanner.
Heavy file I/O has often caused this since UBC... ;-/
Solaris 8 introduced a new cache strategy to avoid this.
See page 17 of the following URL:
http://nordu.dkuug.dk/NordU2000/papers/papers/th16-solaris8.pdf
(Solaris 8 did not exist yet when vm.{anon,exec,file}{min,max} were
designed. Solaris used priority paging at that time, but priority
paging has been abandoned since Solaris 8.)
I guess Thor may dislike this strategy, because there are some
situations where one wants to give file cache pages higher priority
over anon/exec pages (e.g. on ftp.netbsd.org).
But I think the Solaris 8 strategy is better for most cases
(otherwise Solaris wouldn't have chosen that strategy...).
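
To put point 2 next to the quoted numbers, the "sr" (pages scanned)
column can be paired with the CPU "sy" column of each vmstat row. This
is only a sketch: it uses four rows copied from the quote above and
assumes the 19-column layout of the quoted header, where "sy" appears
twice (system calls under faults, system time under cpu).

```python
HEADER = "r b w avm fre flt re pi po fr sr wd2 wd3 in sy cs us sy id".split()
SR = HEADER.index("sr")      # pages scanned by the page scanner
CPU_SY = len(HEADER) - 2     # second "sy": percent CPU in kernel mode

# Rows copied from the quoted vmstat output, including the "> " prefix.
QUOTED = """\
> delay from last update is 2.028 seconds
> 10 1 0 1468860 680 851 17694 0 300 6 24659 52 54 755 2448 1260 36 63 0
> 3 1 0 1468908 880 256 16210 1 264 6 17513 39 40 434 1136 406 18 82 0
> 2 1 0 1470084 400 900 205 0 0 0 1643 103 100 479 2798 1436 81 9 11
> 1 1 0 1469276 1168 1087 24 0 0 0 2014 114 112 493 2776 1544 91 8 1
"""

def scan_vs_system(text):
    """Return a list of (sr, cpu_sy) pairs, skipping non-data lines."""
    pairs = []
    for line in text.splitlines():
        fields = line.lstrip("> ").split()
        if len(fields) != len(HEADER):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        pairs.append((int(fields[SR]), int(fields[CPU_SY])))
    return pairs

if __name__ == "__main__":
    for sr, sy in scan_vs_system(QUOTED):
        print(f"sr={sr:6d}  sy={sy:3d}%")
```

In these four rows the two samples with sr above 17000 show 63% and 82%
system time, while the two with sr around 2000 show 9% and 8%.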
Anyway, I think the current defaults of vm.file{min,max} are too high.
Changing vm.file{min,max} from {10,50} to {5,10} (or even {2,4})
would be better for most users, IMHO.
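
For anyone who wants to try the suggested values, a small helper like
the following could generate the commands. It is a sketch only: the
function name is hypothetical, the range check is a sanity check of the
sketch rather than the kernel's own validation rule, and the min-then-max
order is just a choice made here.

```python
def file_cache_sysctl_commands(filemin, filemax):
    """Build "sysctl -w" commands for vm.filemin and vm.filemax.

    Both values are percentages. The check below is only this sketch's
    sanity check, not a statement of what the kernel accepts.
    """
    if not (0 <= filemin <= filemax <= 100):
        raise ValueError("expected 0 <= filemin <= filemax <= 100")
    return [
        f"sysctl -w vm.filemin={filemin}",
        f"sysctl -w vm.filemax={filemax}",
    ]

if __name__ == "__main__":
    # {5,10} is the first alternative suggested in the message.
    for cmd in file_cache_sysctl_commands(5, 10):
        print(cmd)
```

The commands are only printed, not executed, so they can be reviewed
and then run as root by hand.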
--
soda