Subject: Re: Racoon (or UVM?) problem with -current
To: Urban Boquist <urban@boquist.net>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 02/11/2002 14:54:40
On Mon, Jan 14, 2002 at 09:39:52PM +0100, Urban Boquist wrote:
> >>>>> Paul Dokas writes:
>
> Paul> I had a rather strange panic on Friday night. My work x86
> Paul> -current (128.101.AAA.BBB) machine paniced while I was
> Paul> attempting to create an IPSec session between it and my X86
> Paul> -current home machine (66.41.CCC.DDD).
>
> This panic look vaguely similar to those in PR kern/13813 (by Luke
> Mewburn). I personally haven't been able to use ipsec at all for about
> half a year. At the end of the PR you'll see the exact cvs commit that
> started the breakage for me.
I think that I might have found the problem here (and possibly a sol'n).
In /sys/netinet/in_pcb.c, line 438, there is the following bit of code:
------------------------------SNIP-SNIP------------------------------
void
in_pcbdisconnect(v)
void *v;
{
struct inpcb *inp = v;
inp->inp_faddr = zeroin_addr;
inp->inp_fport = 0;
in_pcbstate(inp, INP_BOUND);
if (inp->inp_socket->so_state & SS_NOFDREF)
in_pcbdetach(inp);
#ifdef IPSEC
ipsec_pcbdisconn(inp->inp_sp);
#endif
}
------------------------------SNIP-SNIP------------------------------
From my reading of in_pcbdetach() and ipsec_pcbdisconn(), it seems that
they both end up calling key_freesp() on what I think are the same structures.
I'm pretty sure that they're the same because of how the cache is filled.
Here's the ipsec_fillpcbcache() from /sys/netinet6/ipsec.c:
------------------------------SNIP-SNIP------------------------------
static int
ipsec_fillpcbcache(pcbsp, m, sp, dir)
struct inpcbpolicy *pcbsp;
struct mbuf *m;
struct secpolicy *sp;
int dir;
{
switch (dir) {
case IPSEC_DIR_INBOUND:
case IPSEC_DIR_OUTBOUND:
break;
default:
return EINVAL;
}
#ifdef DIAGNOSTIC
if (dir >= sizeof(pcbsp->cache)/sizeof(pcbsp->cache[0]))
panic("dir too big in ipsec_checkpcbcache");
#endif
if (pcbsp->cache[dir])
key_freesp(pcbsp->cache[dir]);
pcbsp->cache[dir] = NULL;
if (ipsec_setspidx(m, &pcbsp->cacheidx[dir], 1) != 0) {
return EINVAL;
}
pcbsp->cache[dir] = sp;
if (pcbsp->cache[dir]) {
pcbsp->cache[dir]->refcnt++;
KEYDEBUG(KEYDEBUG_IPSEC_STAMP,
printf("DP ipsec_fillpcbcache cause refcnt++:%d SP:%p\n",
pcbsp->cache[dir]->refcnt, pcbsp->cache[dir]));
}
pcbsp->cachegen[dir] = sp_cachegen;
return 0;
}
------------------------------SNIP-SNIP------------------------------
Notice that when a the cache line is filled, it's with an assignment:
pcbsp->cache[dir] = sp;
This is what's then eventually passed to key_freesp(), twice. Once via
in_pcbdetach() and then once via ipsec_pcbdisconn().
Also, all of the stack traces that I've seen seem to revolve around key_freesp()
and/or they seem to indicate that the memory has already been freed.
I haven't tested a fix yet, but I suspect that it might be as simple as not
calling key_freesp() from ipsec_fillpcbcache(), ipsec_invalpcbcache() or any
of the other places that it's called from. See the patches mentioned in
kern/13813.
Could someone more familiar with this section of code please take a look and
see if I'm anywhere near right on this?
Paul
--
Paul Dokas dokas@cs.umn.edu
======================================================================
Don Juan Matus: "an enigma wrapped in mystery wrapped in a tortilla."