Subject: Re: kern/24829
To: Chuck Silvers <chuq@chuq.com>
From: Christos Zoulas <christos@zoulas.com>
List: netbsd-bugs
Date: 11/09/2005 12:28:19
On Nov 9, 8:55am, chuq@chuq.com (Chuck Silvers) wrote:
-- Subject: Re: kern/24829
| On Tue, Nov 08, 2005 at 10:00:47AM +0100, Jarle Greipsland wrote:
| > Jarle Greipsland <jarle@uninett.no> writes:
| > > It is still there. On a 3.0_BETA kernel from around October
| > > 20th, I still got the same panic. It is not the same system
| > > for which the original problem report was filed, but the current
| > > system is also a quad-cpu i386-family system. Console log below.
| > > Please let me know if there is any other information you want me
| > > to try and gather.
| > Some more data. I briefly looked in the apache web server logs,
| > and noticed some pthread-related messages. I don't know whether
| > they are related to the panic or not. Log messages below.
| >
| > -jarle
| >
| > [Tue Nov 08 08:27:39 2005] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
| > [Tue Nov 08 08:27:42 2005] [notice] Digest: generating secret for digest authentication ...
| > [Tue Nov 08 08:27:42 2005] [notice] Digest: done
| > [Tue Nov 08 08:27:42 2005] [notice] Apache/2.0.55 (Unix) mod_ssl/2.0.55 OpenSSL/0.9.7d DAV/2 configured -- resuming normal operations
| > [Tue Nov 08 08:27:47 2005] [notice] child pid 19659 exit signal Segmentation fault (11)
| > assertion "unreachable" failed: file "/usr/src/lib/libpthread/pthread.c", line 622, function "pthread_exit"
| > [Tue Nov 08 08:27:59 2005] [notice] child pid 5309 exit signal Abort trap (6)
|
| these "unreachable" assertions should be fixed with my recent changes to
| libpthread. those have been applied to -current and the 3.x branch so far,
| are you running with the latest libpthread?
|
|
| > assertion "target->pt_state != PT_STATE_RUNNING || target->pt_blockgen != target->pt_unblockgen" failed: file "/usr/src/lib/libpthread/pthread_sig.c", line 812, function "pthread__kill"
|
| this is because the libpthread code does not yet support running with
| PTHREAD_CONCURRENCY > 1, as evidenced by the comment right before the
| assertion:
|
| /*
| * Ensure the victim is not running.
| * In a MP world, it could be on another processor somewhere.
| *
| * XXX As long as this is uniprocessor, encountering a running
| * target process is a bug.
| */
| pthread__assert(target->pt_state != PT_STATE_RUNNING ||
| target->pt_blockgen != target->pt_unblockgen);
|
|
| I was hoping that at least the kernel would survive this configuration,
| but apparently not. I'll see if I can figure out how to avoid the crash,
| but if we can't fix it very quickly then we should consider disabling
| PTHREAD_CONCURRENCY > 1 for the 3.0 release.
I've done a lot of work on this, and I have this particular test case
almost working. But there are a large number of places in the kernel marked
"XXX multiprocessor LWPs? Implement me!" or equivalent, so there is
definitely no hope of getting PTHREAD_CONCURRENCY working properly for 3.0.
We should document this clearly. I really want to get PTHREAD_CONCURRENCY
working for 4.0, but it is not an easy goal.
christos