NetBSD-Users archive
Re: FreeRADIUS instability
On Tue, Sep 14, 2021 at 06:08:31PM -0000, Christos Zoulas wrote:
I do not know whether this is NetBSD-related, but I have suffered from
FreeRADIUS instability on NetBSD for a long time and do not know how to
debug it.
Symptoms are: the RADIUS server randomly (once a day or once a week) stops
answering, and this is not connected to the actual load. While in that state
it can be killed only with -9; other signals do nothing, and the rc.d restart
script just hangs.
I have compiled a debug version of it and attached gdb:
(gdb) bt
#0 0x000077280da42b8a in _sys___kevent50 () from /usr/lib/libc.so.12
#1 0x000077280e807879 in __kevent50 () from /usr/lib/libpthread.so.1
#2 0x00007728106270e1 in fr_event_loop (el=0x7728105bcb20)
    at src/lib/event.c:625
#3 0x00000000004364dd in radius_event_process () at src/main/process.c:6056
#4 0x00000000004466c3 in main (argc=<optimized out>, argv=<optimized out>)
    at src/main/radiusd.c:641
gdb always shows it stuck in the kevent call. radiusd was started with -txx,
meaning no threads were used.
src/lib/event.c:625 says:
rcode = kevent(el->kq, NULL, 0, el->events, FR_EV_MAX_FDS, ts_wake);
It seems the kevent call is misused somehow, so that the syscall never
returns, or the syscall is blocked. What can I debug further?
Well, it seems that the signals are blocked and this does not have to
do with kevent (probably FreeRADIUS does it explicitly). You can use
ps -p $pid-of-freeradius -o sigmask,sigcatch,sigignore
to see what signals are handled.
BLOCKED   CAUGHT    IGNORED
      0     44ab   98489000
Do I understand correctly that there are no blocked signals for this process?
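For reading the masks above, here is a minimal sketch of a decoder. It assumes that ps prints each mask as a hexadecimal bit set in which bit n-1 stands for signal n, and that the signal numbers are the ones from NetBSD's <sys/signal.h> (hard-coded here so the result does not depend on the host). The names NETBSD_SIGNALS and decode_sigmask are made up for this illustration and are not part of any tool.

```python
# Signal numbers as listed in NetBSD's <sys/signal.h> (assumption: the
# process being inspected runs on NetBSD; other systems number differently).
NETBSD_SIGNALS = {
    1: "HUP", 2: "INT", 3: "QUIT", 4: "ILL", 5: "TRAP", 6: "ABRT",
    7: "EMT", 8: "FPE", 9: "KILL", 10: "BUS", 11: "SEGV", 12: "SYS",
    13: "PIPE", 14: "ALRM", 15: "TERM", 16: "URG", 17: "STOP", 18: "TSTP",
    19: "CONT", 20: "CHLD", 21: "TTIN", 22: "TTOU", 23: "IO", 24: "XCPU",
    25: "XFSZ", 26: "VTALRM", 27: "PROF", 28: "WINCH", 29: "INFO",
    30: "USR1", 31: "USR2", 32: "PWR",
}


def decode_sigmask(mask_hex):
    """Return the names of the signals set in a hex mask string.

    Assumes bit (n - 1) of the mask corresponds to signal number n.
    """
    mask = int(mask_hex, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            signo = bit + 1
            # Fall back to a numeric label for signals not in the table.
            names.append(NETBSD_SIGNALS.get(signo, f"SIG{signo}"))
        bit += 1
    return names


# Values copied from the ps output in this message.
for label, value in (("BLOCKED", "0"),
                     ("CAUGHT", "44ab"),
                     ("IGNORED", "98489000")):
    print(label, decode_sigmask(value))
```

Under those assumptions the BLOCKED value 0 decodes to an empty list, i.e. nothing is blocked, while HUP and TERM appear in the CAUGHT set.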
Now, why kevent is stuck is a different story. You can use
fstat -p $pid-of-freeradius to see what files it has open; perhaps this
will provide a clue.
I have compared the fstat output of a working process with that of a hung one.
The only difference is one LDAP connection, which has changed (possibly
reconnected); this may or may not be related.
What other clues can the output give?
radiusd  radiusd  25118   wd  /export  3924481  drwxr-xr-x        512  r
radiusd  radiusd  25118    0  /         106352  crw-rw-rw-       null  rw
radiusd  radiusd  25118    1  /var     1695507  -rw-r-----  401724344  w
radiusd  radiusd  25118    2  /var     1695507  -rw-r-----  401724344  w
radiusd  radiusd  25118    3* kqueue pending 0
radiusd  radiusd  25118    4* crypto 0xffffe715032aa4d0
radiusd  radiusd  25118    5  /         106013  -rw-r--r--         92  r
radiusd  radiusd  25118    6  /var     1695507  -rw-r-----  401724344  w
radiusd  radiusd  25118    7  /var     1695517  -rw-r-----   10828283  rw
radiusd  radiusd  25118    8  /         106013  -rw-r--r--         92  r
radiusd  radiusd  25118    9* internet stream tcp central.:postgresql <-> almaz.:59270
radiusd  radiusd  25118   10* internet stream tcp steel.:ldap <-> almaz.:59264
radiusd  radiusd  25118   11* internet stream tcp steel.:ldap <-> almaz.:59263
radiusd  radiusd  25118   12* internet stream tcp central.:postgresql <-> almaz.:63159
radiusd  radiusd  25118   13* internet stream tcp central.:postgresql <-> almaz.:59262
radiusd  radiusd  25118   14* internet stream tcp central.:postgresql <-> almaz.:59234
radiusd  radiusd  25118   19* pipe 0xffffe71416359510 <- 0xffffe714e88e1530 rn
radiusd  radiusd  25118   20* pipe 0xffffe714e88e1530 -> 0xffffe71416359510 wn
radiusd  radiusd  25118   21* internet dgram udp *:radius
radiusd  radiusd  25118   22* internet dgram udp *:radius-acct
radiusd  radiusd  25118   23* internet6 dgram udp *:radius
radiusd  radiusd  25118   24* internet6 dgram udp *:radius-acct
radiusd  radiusd  25118   25* internet dgram udp localhost:18120
radiusd  radiusd  25118   26* internet dgram udp localhost:18121
radiusd  radiusd  25118   27* internet dgram udp *:51200
radiusd  radiusd  25118   28* internet6 dgram udp *:65238
radiusd  radiusd  25118   31* internet stream tcp steel.:ldap <-> almaz.:60645
FD #3 "kqueue pending" was there when it was working as well.
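Comparing two fstat snapshots by eye is error-prone, so here is a rough sketch of doing it per file descriptor. It assumes the whitespace-separated layout shown above (user, command, PID, FD, then the description). The function names parse_fstat, fd_sort_key and diff_fstat are invented for this example, and the "working" sample row with port 60001 is made up purely for illustration; only the "hung" rows are copied from the listing above.

```python
def parse_fstat(text):
    """Map each FD column value to the remainder of its fstat row."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its PID column is not numeric).
        if len(fields) < 5 or not fields[2].isdigit():
            continue
        table[fields[3]] = " ".join(fields[4:])
    return table


def fd_sort_key(fd):
    """Sort named entries such as 'wd' first, then numeric FDs in order."""
    digits = fd.rstrip("*")
    if digits.isdigit():
        return (1, int(digits), fd)
    return (0, 0, fd)


def diff_fstat(before, after):
    """Return (fd, old_description, new_description) for every FD that differs."""
    old, new = parse_fstat(before), parse_fstat(after)
    changes = []
    for fd in sorted(set(old) | set(new), key=fd_sort_key):
        if old.get(fd) != new.get(fd):
            changes.append((fd, old.get(fd), new.get(fd)))
    return changes


# Hypothetical snapshot of a working process (port 60001 is invented).
WORKING = """\
radiusd radiusd 25118 11* internet stream tcp steel.:ldap <-> almaz.:59263
radiusd radiusd 25118 31* internet stream tcp steel.:ldap <-> almaz.:60001
"""

# Rows copied from the hung process listed in this message.
HUNG = """\
radiusd radiusd 25118 11* internet stream tcp steel.:ldap <-> almaz.:59263
radiusd radiusd 25118 31* internet stream tcp steel.:ldap <-> almaz.:60645
"""

for fd, old, new in diff_fstat(WORKING, HUNG):
    print(f"fd {fd}: {old} -> {new}")
```

A descriptor that exists in only one snapshot is reported with None on the other side, which makes a reconnected socket easy to tell apart from one that merely changed its local port.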
One more thing I do not understand: radiusd has never been caught hanging
when run in the foreground. Is this some kind of clue?
--
Sincerely yours,
Dima Veselov
Physics R&D Establishment of Saint-Petersburg University