> On Sep 15, 2021, at 5:53 AM, Dima Veselov <kab00m%lich.phys.spbu.ru@localhost> wrote:
>
>  BLOCKED   CAUGHT  IGNORED
>        0     44ab 98489000
> Do I understand correctly that there is no blocked signals for this process?

Yes, there are no blocked signals...
It catches: HUP INT ILL ABRT FPE SEGV TERM
It ignores: PIPE URG CHLD IO WINCH INFO PWR

So perhaps the handler does something and does not exit?

>
> I have compared fstat output of a working process and of hanged one.
> The only difference is one LDAP connection which was changed (possibly
> reconnected), this can be related or may be not.
>
> What kind of another clue the output can tell?
>
> radiusd radiusd 25118 wd /export 3924481 drwxr-xr-x 512 r
> radiusd radiusd 25118 0 / 106352 crw-rw-rw- null rw
> radiusd radiusd 25118 1 /var 1695507 -rw-r----- 401724344 w
> radiusd radiusd 25118 2 /var 1695507 -rw-r----- 401724344 w
> radiusd radiusd 25118 3* kqueue pending 0
> radiusd radiusd 25118 4* crypto 0xffffe715032aa4d0
> radiusd radiusd 25118 5 / 106013 -rw-r--r-- 92 r
> radiusd radiusd 25118 6 /var 1695507 -rw-r----- 401724344 w
> radiusd radiusd 25118 7 /var 1695517 -rw-r----- 10828283 rw
> radiusd radiusd 25118 8 / 106013 -rw-r--r-- 92 r
> radiusd radiusd 25118 9* internet stream tcp central.:postgresql <-> almaz.:59270
> radiusd radiusd 25118 10* internet stream tcp steel.:ldap <-> almaz.:59264
> radiusd radiusd 25118 11* internet stream tcp steel.:ldap <-> almaz.:59263
> radiusd radiusd 25118 12* internet stream tcp central.:postgresql <-> almaz.:63159
> radiusd radiusd 25118 13* internet stream tcp central.:postgresql <-> almaz.:59262
> radiusd radiusd 25118 14* internet stream tcp central.:postgresql <-> almaz.:59234
> radiusd radiusd 25118 19* pipe 0xffffe71416359510 <- 0xffffe714e88e1530 rn
> radiusd radiusd 25118 20* pipe 0xffffe714e88e1530 -> 0xffffe71416359510 wn
> radiusd radiusd 25118 21* internet dgram udp *:radius
> radiusd radiusd 25118 22* internet dgram udp *:radius-acct
> radiusd radiusd 25118 23* internet6 dgram udp *:radius
> radiusd radiusd 25118 24* internet6 dgram udp *:radius-acct
> radiusd radiusd 25118 25* internet dgram udp localhost:18120
> radiusd radiusd 25118 26* internet dgram udp localhost:18121
> radiusd radiusd 25118 27* internet dgram udp *:51200
> radiusd radiusd 25118 28* internet6 dgram udp *:65238
> radiusd radiusd 25118 31* internet stream tcp steel.:ldap <-> almaz.:60645
>
> FD #3 "kqueue pending" was there when it was working as well.
>
> One more thing I do not understand - radiusd had never been caught hanging
> if run in foreground. Is this kind of a clue?

Yes, the question is what happened to fd#3 (presumably the kqueue).
If you can get into the debugger (gdb <radiusd> <pid>), can you look at the
kqueue call and see what fd is passed to it?

christos
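
For anyone who wants to check the signal lists against the hex masks quoted above, here is a minimal sketch of the decoding. It assumes the usual convention that bit N-1 of the mask stands for signal number N, and it hard-codes the NetBSD signal numbering (1 = HUP through 32 = PWR) as I understand it from <sys/signal.h>. The names NETBSD_SIGNALS and decode_sigmask are made up for this illustration and are not part of any tool.

```python
# Signal names for numbers 1..32, in order. This is the NetBSD numbering
# (an assumption taken from <sys/signal.h>); other systems number them differently.
NETBSD_SIGNALS = [
    "HUP", "INT", "QUIT", "ILL", "TRAP", "ABRT", "EMT", "FPE",
    "KILL", "BUS", "SEGV", "SYS", "PIPE", "ALRM", "TERM", "URG",
    "STOP", "TSTP", "CONT", "CHLD", "TTIN", "TTOU", "IO", "XCPU",
    "XFSZ", "VTALRM", "PROF", "WINCH", "INFO", "USR1", "USR2", "PWR",
]


def decode_sigmask(mask_hex):
    """Return the names of the signals whose bits are set in a hex mask string."""
    mask = int(mask_hex, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [name for bit, name in enumerate(NETBSD_SIGNALS) if mask & (1 << bit)]


# The three masks from the quoted output.
for label, value in (("blocked", "0"), ("caught", "44ab"), ("ignored", "98489000")):
    print(f"{label:8s}", " ".join(decode_sigmask(value)) or "(none)")
```

Run on the three values from the quoted output, this should reproduce the lists given in the reply: nothing blocked, seven signals caught, seven ignored.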