tech-userlevel archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Odd pigz behavior on build cluster - libpthread bug?
On Sat, Jun 14, 2014 at 09:20:27AM -0400, Thor Lancelot Simon wrote:
>
> Now we seem to be stuck like this on the master:
>
> 26269 builds 0 0 25M 1372K CPU/18 130:47 99.02% 99.02% nbpigz
> 21425 builds 0 0 25M 1344K CPU/13 130:31 99.02% 99.02% nbpigz
>
> This looks very odd to me. Why has pigz been running so long, and even
> if it's doing something, why's it on so few CPUs -- shouldn't each process
> have a thread per CPU? It doesn't seem to, the load average is only 4.
OK, here's the deal: it's doing something, but, to me it also looks like
it may be exhibiting pathological libpthread behavior of some kind.
26269 1 nbpigz EMUL "netbsd"
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL ___lwp_park50(0,0xffffffff,0x7f7ff7e12ac0,0)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL ___lwp_park50(0,0xffffffff,0x7f7ff7e12ac0,0)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL ___lwp_park50(0,0xffffffff,0x7f7ff7e12ac0,0)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL ___lwp_park50(0,0xffffffff,0x7f7ff7e12ac0,0)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL ___lwp_park50(0,0xffffffff,0x7f7ff7e12ac0,0)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL ___lwp_park50(0,0xffffffff,0x7f7ff7e12ac0,0)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
[repeat 685,651 times]
26269 2 nbpigz CALL _lwp_unpark(1,0x7f7ff7e12ac0)
26269 2 nbpigz RET _lwp_unpark 0
26269 2 nbpigz CALL ___lwp_park50(0,0,0x7f7ff7b030f8,0x7f7ff7b030f8)
26269 1 nbpigz RET ___lwp_park50 -1 errno 3 No such process
26269 1 nbpigz CALL __sigprocmask14(3,0x7f7fffffc120,0)
26269 1 nbpigz RET __sigprocmask14 0
26269 1 nbpigz CALL getcontext(0x7f7ff6400a88)
26269 1 nbpigz RET getcontext 0
26269 1 nbpigz CALL _lwp_create(0x7f7ff6400a88,0x40,0x7f7ff6400180)
26269 1 nbpigz RET _lwp_create 0
26269 1 nbpigz CALL ___lwp_park50(0,0,0x7f7ff6800030,0x7f7ff6800030)
26269 1 nbpigz RET ___lwp_park50 -1 errno 37 Operation already in
progress
26269 1 nbpigz CALL ___lwp_park50(0,0,0x7f7ff6800030,0x7f7ff6800030)
26269 3 nbpigz CALL _lwp_ctl(1,0x7f7ff6400208)
26269 3 nbpigz RET _lwp_ctl 0
26269 3 nbpigz CALL mmap(0,0x1000,3,0x1002,0xffffffff,0,0)
[mmap more stuff]
26269 3 nbpigz CALL _lwp_unpark(2,0x7f7ff7b030f8)
26269 3 nbpigz RET _lwp_unpark 0
26269 2 nbpigz CSW stop kernel
26269 2 nbpigz CSW resume kernel
26269 2 nbpigz RET ___lwp_park50 0
26269 2 nbpigz CALL write(1,0x7f7ff7747000,0x1127)
26269 3 nbpigz CALL ___lwp_park50(0,0,0x7f7ff7b03098,0x7f7ff7b03098)
26269 2 nbpigz GIO fd 1 wrote 4088 bytes
[after the write, there is a short burst of repeated calls to lwp_self
and sigprocmask, then the cycle seems to resume]
WTF? Ideas most welcome.
Thor
Home |
Main Index |
Thread Index |
Old Index