Current-Users archive
Re: 5.99.42/i386 crash (backtrace + core available)
On Sat, Jan 08, 2011 at 11:16:19PM +0000, David Holland wrote:
> On Tue, Dec 28, 2010 at 05:37:43PM +0100, Dennis den Brok wrote:
> > rw_abort()
> > rw_vector_enter(df829668, ...)
>                   ^^^^^^^^
> > genfs_lock()
> > layer_bypass()
> > VOP_LOCK(e15c2170,2)
>            ^^^^^^^^
> > vclean()
> > getcleanvnode()
> > getnewvnode()
> > ffs_vget()
> > ufs_lookup()
> > VOP_LOOKUP(df8295c8,...)
>              ^^^^^^^^
>
> Unfortunately most of the things visible in the stack trace are vnode
> op argument structures and not pointers to anything interesting.
> However, since rw_vector_enter is passed &vp->v_lock, I think we can
> tentatively conclude that it's trying to lock the same vnode that was
> passed to VOP_LOOKUP, and it's failing because that's quite properly
> already locked.
>
> It looks like what happened is that ffs went to get a fresh vnode and
> got a not-recently-used nullfs vnode. However, the nullfs vnode turned
> out to be the nullfs vnode sitting on top of the ffs vnode it was
> already working with. Since these share locks now, the vnode was
> locked even though not recently used (and on the list to be cleaned
> and all that), and in fact it turned out to be the same ffs vnode this
> process was already working on, so trying to lock it for cleaning blew
> up.
>
> So this seems like fallout from Juergen's layer locking cleanup from a
> few months ago. Not sure what the proper solution is, though.
While the analysis looks OK, I don't think the layer locking cleanup is the
reason. Before the cleanup, locks were shared, so getcleanvnode() would have
used the same lock without going through layer_bypass().
Dennis, to be sure, you could build a kernel and userland from somewhere in
September 2010; this is after all my changes to vnode locking.
--
Juergen Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)