Subject: Re: Problems trying to debug pkgsrc/mail/milter-greylist
To: Chris Ross <cross+netbsd@distal.com>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: port-sparc64
Date: 11/21/2007 14:13:33
On Wed, Nov 21, 2007 at 10:17:38AM -0500, Chris Ross wrote:
>
> [ Martin Husemann suggested I contact you after I sent this message
> to the port-sparc64 list. Let me know anything you can help with.
> I'm just trying to even figure out how to see this work at all. -]
>
> Hi there. I have a sparc64 running 4.0_RC3, and I built
> pkgsrc/mail/milter-greylist. I notice that it sometimes just dies. I
> upgraded the pkgsrc tree to the current release (4.0, vs the 3.0
> that's in pkgsrc currently), but it seems to fail in about the same way.
>
> I was seeing multiple problems. Right now, however, I'm wondering
> if something's wrong somewhere related to threading. All of the
> errors seem to have a backtrace that ends:
>
> #7 0x00000000405137cc in pthread_join () from /usr/lib/libpthread.so.0
> #8 0x0000000040ba7fc0 in _lwp_makecontext () from /usr/lib/libc.so.12
> #9 0x0000000040ba7fc0 in _lwp_makecontext () from /usr/lib/libc.so.12
> Previous frame identical to this frame (corrupt stack?)
> (gdb)
I think that backtrace is actually OK. I think it's indicative of a
problem in pthread_join().

What are the other parts of the trace showing?
> The exact cause for the crash varies, but. I'm not an expert on
> using gdb to debug threaded programs by any means, but was wondering:
>
> 1) The resolver in NetBSD 4 is BIND 9, so definitely thread-safe,
> right?
Yes. However...
> 2) Are gdb or libpthreads on sparc64 known to have any problems?
s/ on sparc64//
Yes.
When we imported the most-recent gdb, the threading support never got
added. So gdb in NetBSD 4.0 (and -current, actually) doesn't cope with
threaded programs. Which is really lame.

Are you running on an SMP system w/ an SMP kernel? libpthread in 4.0 also
has issues with concurrency.
We actually have a branch, wrstuden-fixsa, which is dedicated to fixing
the libpthread and Scheduler Activations issues in 4.0. I think it's
caught up with NetBSD-4.0_RC3. Feel free to try it. There also have been
fixes to gdb on it, and the new one (for i386 at least) actually shows
threads. I'm not sure if that's been pulled over to sparc64 or not.
> 3) Anyone have any good pointers to "how to debug a threaded
> program with gdb" ?
Even if gdb were working well, it helps to know that there are two main
classes of threading issues. One is locking and the other is not
locking. :-)
Locking issues usually lead to either live-lock or deadlock. At the app
level, it usually ends up as deadlock. That's where thread 1 locks A and
tries to lock B, and thread 2 locks B and tries to lock A. Each waits for
the other, which is waiting for it, so we wait forever.
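
To make the A/B case concrete, here is a minimal sketch in C. It is not
taken from milter-greylist or libpthread; every name in it (lock_a, lock_b,
lock_pair, run_ordered_locking, ...) is made up for illustration. The two
workers ask for the locks in opposite orders, which is exactly the deadlock
pattern described above, but lock_pair() always acquires the mutex with the
lower address first, so neither thread can end up holding one lock while
waiting for the other.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define ITERATIONS 100000

/* Hypothetical locks "A" and "B" from the description above. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;       /* only touched while both locks are held */

/* Acquire two distinct mutexes in one fixed global order (lower address
 * first), regardless of the order the caller names them in. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        pthread_mutex_t *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(x);
    pthread_mutex_lock(y);
}

static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}

/* "Thread 1": asks for A, then B. */
static void *worker_ab(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&lock_a, &lock_b);
        shared_total++;
        unlock_pair(&lock_a, &lock_b);
    }
    return NULL;
}

/* "Thread 2": asks for B, then A. With plain pthread_mutex_lock() calls in
 * this order it could deadlock against worker_ab. */
static void *worker_ba(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&lock_b, &lock_a);
        shared_total++;
        unlock_pair(&lock_b, &lock_a);
    }
    return NULL;
}

/* Runs both workers to completion; returns the final total, or -1 if a
 * thread could not be created. */
long run_ordered_locking(void)
{
    pthread_t t1, t2;

    shared_total = 0;
    if (pthread_create(&t1, NULL, worker_ab, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker_ba, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_total;
}
```

Calling run_ordered_locking() from main() and linking with -lpthread should
finish and return 2 * ITERATIONS. If you replace lock_pair() with two plain
pthread_mutex_lock() calls in the order each worker names them, the program
can hang in just the way described.
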
Live-locking is the same thing but with spinlocks. An application should
never actually see that, since the only spinlocks are in libpthread
itself. I.e., a spinlock issue is really a libpthread issue.
Not-locking issues are data corruption, i.e. something stomped on the
data we were working on.
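
As a rough illustration of the not-locking case, here is another small C
sketch. Again, nothing in it comes from milter-greylist; counter,
bump_counter and run_counter are hypothetical names. Several threads
increment one shared counter. With use_lock set, every increment is
protected by a mutex and the total is exact. With use_lock clear, the
increments race with each other (formally undefined behaviour in C), and
updates may be lost, which is the "something stomped on our data" symptom.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* shared data the threads work on */

/* Each thread bumps the shared counter INCREMENTS times. The argument
 * points to an int flag saying whether to take the mutex. */
static void *bump_counter(void *arg)
{
    int use_lock = *(int *)arg;

    for (int i = 0; i < INCREMENTS; i++) {
        if (use_lock)
            pthread_mutex_lock(&counter_lock);
        counter++;              /* read-modify-write: not atomic by itself */
        if (use_lock)
            pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS workers and returns the final counter value, or -1 if
 * not every thread could be created. */
long run_counter(int use_lock)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, bump_counter, &use_lock) != 0)
            break;
        started++;
    }
    /* Join everything we started; use_lock stays valid until then. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == NUM_THREADS ? counter : -1;
}
```

run_counter(1) should always return NUM_THREADS * INCREMENTS.
run_counter(0) may come back short, and by a different amount on each run,
which is part of what makes this class of bug hard to track down.
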
If all the crashes are in pthread_join(), then it's probably some form of
libpthread problem.
> milter-greylist doesn't have many threads running ever. It just
> spawns off new threads for synchronization (which I'm not using), and
> dumping of data. One for reading the config file too, I think, but I
> suspect that's not happening repeatedly.
>
> Anyway. Thanks...
Please let me know more about the backtraces. I actually fixed a number of
concurrency issues in libpthread-SA recently.
Take care,
Bill