Subject: kern/26803: sigexit() has no barrier for other LWPs
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <thorpej@shagadelic.org>
List: netbsd-bugs
Date: 08/29/2004 15:16:52
>Number: 26803
>Category: kern
>Synopsis: sigexit() has no barrier for other LWPs
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Aug 29 22:16:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Jason R Thorpe
>Release: NetBSD 2.0G
>Organization:
-- Jason R. Thorpe <thorpej@shagadelic.org>
>Environment:
System: NetBSD yeah-baby.shagadelic.org 2.0G NetBSD 2.0G (YEAH-BABY-XP) #26: Thu Jul 15 08:26:49 PDT 2004 thorpej@yeah-baby.shagadelic.org:/u1/netbsd/src/sys/arch/i386/compile/YEAH-BABY-XP i386
Architecture: i386
Machine: i386
>Description:
sigexit() has a flaw for multi-threaded programs: while it
sets a userret hook to suspend other LWPs, it doesn't wait
for them to actually suspend.
This means that other LWPs in the process that are
sleeping in the kernel may wake up and modify the process's
address space while the core dump is taking place.
Another issue (which even has an XXX in the code) is that
other LWPs that might be running in userspace on other
processors don't get jolted into the kernel to suspend
themselves; there is simply no code to do this.
I believe the lack of a barrier is related to the
corrupted core files being dumped by a multi-threaded
application I am working with that performs a lot of
mmap() / write() calls (and thus modifies the process's
VM map and sleeps a lot while doing so).
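To illustrate the kind of barrier that is missing, here is a minimal
user-space sketch using POSIX threads. It is only an analogy, not
kernel code and not a proposed patch: the names (struct stop_barrier,
worker, stop_world_demo, shared_state) are all hypothetical, the
worker threads stand in for the other LWPs, the flag check at the top
of each loop iteration stands in for the userret hook, and the atomic
counter stands in for the address space. The point is the wait loop in
the requester: it does not start the simulated dump until every worker
has reported that it is parked.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for per-process suspend state. */
struct stop_barrier {
    pthread_mutex_t lock;
    pthread_cond_t  parked_cv;   /* signalled each time a worker parks */
    pthread_cond_t  release_cv;  /* broadcast when workers may exit */
    bool            stop_requested;
    bool            released;
    int             nparked;
};

static struct stop_barrier sb = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
    PTHREAD_COND_INITIALIZER, false, false, 0
};

/* Stand-in for the address space the other threads keep modifying. */
static atomic_long shared_state;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Analogue of the userret hook: check for a stop request. */
        pthread_mutex_lock(&sb.lock);
        if (sb.stop_requested) {
            sb.nparked++;
            pthread_cond_signal(&sb.parked_cv);  /* acknowledge the stop */
            while (!sb.released)
                pthread_cond_wait(&sb.release_cv, &sb.lock);
            pthread_mutex_unlock(&sb.lock);
            return NULL;
        }
        pthread_mutex_unlock(&sb.lock);
        atomic_fetch_add(&shared_state, 1);      /* "modify the map" */
    }
}

/*
 * Start nworkers threads, request a stop, and wait until all of them
 * have parked before taking the simulated dump. Returns the number of
 * parked workers if the shared state stayed unchanged during the dump,
 * or -1 if it changed.
 */
int stop_world_demo(int nworkers)
{
    pthread_t tid[MAX_WORKERS];
    int i, parked;
    long before, after;

    assert(nworkers >= 1 && nworkers <= MAX_WORKERS);
    sb.stop_requested = false;
    sb.released = false;
    sb.nparked = 0;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0) {
            nworkers = i;        /* continue with the threads we got */
            break;
        }
    }

    pthread_mutex_lock(&sb.lock);
    sb.stop_requested = true;
    /* The barrier: do not proceed until every worker has parked. */
    while (sb.nparked < nworkers)
        pthread_cond_wait(&sb.parked_cv, &sb.lock);
    parked = sb.nparked;
    pthread_mutex_unlock(&sb.lock);

    /* Simulated core dump: the state must not change underneath us. */
    before = atomic_load(&shared_state);
    for (i = 0; i < 1000; i++)
        sched_yield();
    after = atomic_load(&shared_state);

    pthread_mutex_lock(&sb.lock);
    sb.released = true;
    pthread_cond_broadcast(&sb.release_cv);
    pthread_mutex_unlock(&sb.lock);

    for (i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);

    return (before == after) ? parked : -1;
}
```

Calling stop_world_demo(4) should return 4. Removing the wait loop on
parked_cv reproduces the shape of the problem described above: the
dump can start while workers are still running. Note that this sketch
does not cover the second issue, since nothing here forces a thread
that never reaches the flag check to stop.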
>How-To-Repeat:
I will work on a simple test case to show the problematic
behavior.
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: