NetBSD-Bugs archive
Re: kern/56979: fork(2) fails to be signal safe
The following reply was made to PR lib/56979; it has been noted by GNATS.
From: Tom Lane <tgl%sss.pgh.pa.us@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/56979: fork(2) fails to be signal safe
Date: Sat, 15 Oct 2022 21:17:37 -0400
Taylor R Campbell <riastradh%NetBSD.org@localhost> writes:
>> From: Tom Lane <tgl%sss.pgh.pa.us@localhost>
>> Didn't take long to find out that there's still a problem. With
>> this patch, it gets past the fork() all right, but there's still
>> a risk of the child process getting stuck on the RTLD lock later:
> Do I understand correctly that this means you're trying to call dlopen
> from a signal handler?
Well, it *was* a signal handler, but once it issues fork() I wouldn't
personally regard it as a signal handler anymore. The child process
is certainly never going to return control to the interrupted code.
The parent process (the Postgres "postmaster") runs with signals blocked
everywhere except this one select() call in its wait loop, so it's safer
than it sounds. The postmaster has been coded like that since the
nineties, and AFAIR this is the first bit of trouble we've had with it.
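As a rough illustration of the "signals blocked everywhere except one wait call" arrangement described above, here is a minimal sketch in C. It is not the postmaster's code: the names note_signal, block_and_install and wait_unblocked are hypothetical, and it uses pselect(2) to unblock signals only for the duration of the wait, which is one way (an assumption here) to get that effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

/* Set by the handler; hypothetical name, for illustration only. */
static volatile sig_atomic_t got_signal = 0;

static void
note_signal(int signo)
{
	(void)signo;
	got_signal = 1;
}

/* Install the handler for signo, then block signo for normal execution. */
int
block_and_install(int signo)
{
	struct sigaction sa;
	sigset_t block;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) == -1)
		return -1;

	sigemptyset(&block);
	sigaddset(&block, signo);
	return sigprocmask(SIG_BLOCK, &block, NULL);
}

/*
 * The single point where signals may be delivered: pselect() swaps in an
 * empty signal mask while it sleeps and restores the old mask on return.
 * Returns 1 if the handler ran (and clears the flag), 0 if not, -1 on error.
 */
int
wait_unblocked(void)
{
	sigset_t none;
	int seen;

	sigemptyset(&none);
	if (pselect(0, NULL, NULL, NULL, NULL, &none) == -1 && errno != EINTR)
		return -1;

	seen = got_signal;
	got_signal = 0;
	return seen;
}
```

With this shape, a signal raised while the process is busy stays pending and its handler runs only once wait_unblocked() is entered, so the handler can never interrupt arbitrary code.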
> I didn't follow exactly what you're doing, but I suspect it would be
> much more reliable to have the signal handler set a flag or write a
> flag to a pipe and cause select(2) to fail with EINTR and process the
> flag
People have been talking about changing the postmaster to not do anything
interesting in signal handlers since the nineties, too, and it's not
gotten done yet. The effort-to-reward ratio is just not very good.
It might happen sometime, but I'm not holding my breath.
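For reference, the "write a flag to a pipe" approach suggested in the quoted text is commonly known as the self-pipe trick. The following is a minimal sketch in C under stated assumptions, not anything taken from Postgres or NetBSD: the names wake_pipe, on_signal, self_pipe_init and self_pipe_wait are hypothetical. The handler only calls write(2), which is async-signal-safe, and the main loop learns about the signal through select(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };	/* [0] read end, [1] write end */

/* Handler does nothing but write one byte naming the signal. */
static void
on_signal(int signo)
{
	int saved_errno = errno;
	unsigned char b = (unsigned char)signo;

	if (write(wake_pipe[1], &b, 1) == -1) {
		/* Pipe full: a wakeup is already queued, so dropping is fine. */
	}
	errno = saved_errno;
}

static int
set_nonblock_cloexec(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1)
		return -1;
	return fcntl(fd, F_SETFD, FD_CLOEXEC);
}

/* Create the pipe and install the handler for signo. */
int
self_pipe_init(int signo)
{
	struct sigaction sa;

	if (pipe(wake_pipe) == -1)
		return -1;
	if (set_nonblock_cloexec(wake_pipe[0]) == -1 ||
	    set_nonblock_cloexec(wake_pipe[1]) == -1)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}

/* Sleep in select() until a signal byte arrives; return the signal number. */
int
self_pipe_wait(void)
{
	fd_set rfds;
	unsigned char b;
	ssize_t n;

	for (;;) {
		FD_ZERO(&rfds);
		FD_SET(wake_pipe[0], &rfds);
		if (select(wake_pipe[0] + 1, &rfds, NULL, NULL, NULL) == -1) {
			if (errno == EINTR)
				continue;	/* handler ran; retry and read the byte */
			return -1;
		}
		n = read(wake_pipe[0], &b, 1);
		if (n == 1)
			return b;
		if (n == -1 && errno != EAGAIN && errno != EINTR)
			return -1;
	}
}
```

In this arrangement any fork() would be issued by the code that calls self_pipe_wait(), in ordinary (non-handler) context, rather than from inside the handler.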
>> (BTW, is the RTLD lock business new in v10? I'm surprised that
>> we've not heard field reports of Postgres getting stuck at startup
>> on NetBSD.)
> Yes.
OK, thanks for confirming that. What we've done about this for the
moment is to force linking with -Wl,-z,now on NetBSD, which fixes
this particular problem --- at least, we've not seen it since then
on two different NetBSD test machines that previously did exhibit
the failure intermittently --- and it seems like generally a good
idea anyway.
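To make the workaround concrete: -z now asks the linker to mark the output for immediate binding, so the runtime linker resolves function symbols at load time instead of lazily on first call. Presumably that is why a forked child no longer needs to enter the runtime linker the first time it calls a function. The sketch below is only an illustration of that scenario; the file name bindnow_demo.c, the function name child_first_call_status, and the exact cc invocation in the comment are assumptions.

```c
/*
 * Assumed build line, showing where the flag goes:
 *     cc -Wl,-z,now -o bindnow_demo bindnow_demo.c
 */
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, and have the child make this program's first call to getppid().
 * With lazy binding that first call goes through the runtime linker in
 * the child; with -z now it was already resolved before fork().
 * Returns the child's exit status, or -1 on error.
 */
int
child_first_call_status(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(getppid() > 0 ? 0 : 1);

	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```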
regards, tom lane