tech-net archive
panic: "(ln->la_flags & LLE_VALID) != 0" failed
I recently upgraded to netbsd-9, and I've been seeing this panic every
couple days, sometimes more than once a day:
panic: kernel diagnostic assertion "(ln->la_flags & LLE_VALID) != 0" failed: file "/home/riastradh/netbsd/9/src/sys/netinet6/nd6.c", line 2412
This is at:
https://nxr.netbsd.org/xref/src/sys/netinet6/nd6.c#2426
(The line number is slightly different in HEAD, but I think the logic
is essentially the same.)
I suspect what happened is:
1. Thread 0 issued nd6_lookup which:
   (a) acquired IF_AFDATA_RLOCK(ifp),
   (b) looked up lle and acquired LLE_WLOCK(lle), and then
   (c) released IF_AFDATA_RLOCK(ifp); meanwhile,

2. Thread 1 did something which called lltable_unlink_entry without
   holding LLE_WLOCK, perhaps llentries_unlink either via
   lltable_purge_entries or via lltable_prefix_free ->
   htable_prefix_free.  lltable_unlink_entry -> htable_unlink_entry
   clears LLE_VALID.

3. Thread 0 chokes on the cleared LLE_VALID.
Since thread 0 no longer holds IF_AFDATA_*LOCK, thread 1 can take it
and proceed, and since thread 1 _doesn't need_ LLE_*LOCK, the fact
that thread 0 is holding it doesn't prevent thread 1 from unlinking
lle.
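The suspected interleaving can be sketched as a small single-threaded model in userland C. This is only an illustration of the reasoning above, not kernel code: the model_* names, the flag values, and the owner-based lock are all hypothetical, the table lock is modelled as a plain exclusive lock rather than a reader/writer lock, and where the real patched code would block on LLE_WLOCK the model simply reports that the unlink could not proceed.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; not the kernel's definitions. */
#define MODEL_LLE_VALID  0x1u
#define MODEL_LLE_LINKED 0x2u

enum { NOBODY = 0, THREAD0 = 1, THREAD1 = 2 };

/* A lock modelled as "who owns it"; NOBODY means free. */
struct model_lock { int owner; };

struct model_table {
	struct model_lock afdata;   /* stands in for IF_AFDATA_*LOCK */
	struct model_lock lle_lock; /* stands in for LLE_*LOCK */
	unsigned la_flags;
};

static bool
try_acquire(struct model_lock *l, int who)
{
	if (l->owner != NOBODY)
		return false;	/* real code would block here */
	l->owner = who;
	return true;
}

static void
release(struct model_lock *l, int who)
{
	assert(l->owner == who);
	l->owner = NOBODY;
}

/* Thread 0, steps (a)-(c): returns holding only the entry lock. */
static void
model_lookup(struct model_table *t)
{
	bool ok;

	ok = try_acquire(&t->afdata, THREAD0);
	assert(ok);
	ok = try_acquire(&t->lle_lock, THREAD0);
	assert(ok);
	release(&t->afdata, THREAD0);
}

/*
 * Thread 1: unlink under the table lock.  With take_lle_lock false
 * this mirrors the suspected unpatched path; with true it mirrors
 * the candidate fix.  Returns true if the entry was unlinked.
 */
static bool
model_unlink(struct model_table *t, bool take_lle_lock)
{
	bool done = false;

	if (!try_acquire(&t->afdata, THREAD1))
		return false;
	if (!take_lle_lock) {
		t->la_flags &= ~(MODEL_LLE_VALID | MODEL_LLE_LINKED);
		done = true;
	} else if (try_acquire(&t->lle_lock, THREAD1)) {
		t->la_flags &= ~(MODEL_LLE_VALID | MODEL_LLE_LINKED);
		release(&t->lle_lock, THREAD1);
		done = true;
	}
	release(&t->afdata, THREAD1);
	return done;
}

/* Runs steps 1-3; returns whether thread 0 still sees the valid flag. */
static bool
model_race(bool take_lle_lock)
{
	struct model_table t = {
		.la_flags = MODEL_LLE_VALID | MODEL_LLE_LINKED,
	};
	bool valid;

	model_lookup(&t);			/* step 1 */
	(void)model_unlink(&t, take_lle_lock);	/* step 2 */
	valid = (t.la_flags & MODEL_LLE_VALID) != 0; /* step 3 */
	release(&t.lle_lock, THREAD0);
	return valid;
}
```

Calling model_race(false) returns false, which corresponds to the failed assertion; model_race(true) returns true, because thread 1 cannot touch the entry while thread 0 holds its lock.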
I haven't proven that lltable_purge_entries or lltable_prefix_free
happened at the time of the panic -- perhaps they are a red herring.
Anecdotally the system seems to start dropping packets for a few
seconds before it panics. I'm not the only one who has seen this
symptom. Has anyone dug into this?
The attached patch changes llentries_unlink to acquire LLE_WLOCK
before calling lltable_unlink_entry, and changes lltable_unlink_entry
to assert that the LLE_WLOCK is held before modifying the lle in case
there are other code paths I haven't found that need LLE_WLOCK but
lack it. Haven't tested it yet.
(Unclear whether *_link_entry needs the same treatment -- the two
callers, in_lltable_create and in6_lltable_create, both acquire
LLE_WLOCK immediately after lltable_link_entry, but I think they
could acquire it immediately before instead.)
Does this sound plausible?
From d5190af30272ef99c07d2e239b4a8def01507055 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Sun, 19 Apr 2020 00:13:02 +0000
Subject: [PATCH] Ensure we hold LLE_WLOCK around unlinking the table entry.
Candidate fix for
panic: kernel diagnostic assertion "(ln->la_flags & LLE_VALID) != 0" failed: file "/home/riastradh/netbsd/9/src/sys/netinet6/nd6.c", line 2412.
---
sys/net/if_llatbl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/sys/net/if_llatbl.c b/sys/net/if_llatbl.c
index 143f8241e74b..44346c4df562 100644
--- a/sys/net/if_llatbl.c
+++ b/sys/net/if_llatbl.c
@@ -231,6 +231,8 @@ static void
 htable_unlink_entry(struct llentry *lle)
 {
 
+	LLE_WLOCK_ASSERT(lle);
+
 	if ((lle->la_flags & LLE_LINKED) != 0) {
 		IF_AFDATA_WLOCK_ASSERT(lle->lle_tbl->llt_ifp);
 		LIST_REMOVE(lle, lle_next);
@@ -303,8 +305,11 @@ llentries_unlink(struct lltable *llt, struct llentries *head)
 {
 	struct llentry *lle, *next;
 
-	LIST_FOREACH_SAFE(lle, head, lle_chain, next)
+	LIST_FOREACH_SAFE(lle, head, lle_chain, next) {
+		LLE_WLOCK(lle);
 		llt->llt_unlink_entry(lle);
+		LLE_WUNLOCK(lle);
+	}
 }
 
 /*