Subject: Re: Racoon (or UVM?) problem with -current
To: None <itojun@iijlab.net>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 02/19/2002 15:49:03
On Thu, Feb 14, 2002 at 07:38:43AM +0900, itojun@iijlab.net wrote:
> >> On Tue, Feb 12, 2002 at 10:20:08AM -0800, Bill Studenmund wrote:
> >> >
> >> > Another solution is, do the freers check to see if the value is NULL
> >> > before freeing? If so, have the pointer set to NULL after it is freed.
> >> > Then it gets freed only once.
> >>
> >> Looking over the code, I kinda doubt that key_freesp() is being called
> >> twice on the same entry outside of the cache. I'm pretty sure that it's
> >> the cache that's doing it. Also, the pointers to the entry appear to be
> >> copied all over, not just in the cache code. I don't know that it would
> >> be possible to set all of the pointer to NULL, let alone know where they
> >> all are.
> >
> >Then we might need to start doing refcounting.
>
> we do reference-count struct inpcbpolicy. also, I couldn't find the
> problem described in earlier email (pcbsp->cache[x] freed twice).
> key_freesp() is called in ipsec_invalpcbcache(), however, after calling
> key_freesp() pcbsp->cache[x] is set to NULL.
Yes, you're right. From a much more indepth examination of the code, I
don't think that the problem is that the same SP is being freed twice.
The ref. counting appears sound.
However, there *is* a bug in there somewhere, I just intentionally caused my
NetBSD-current machine to panic with this bug. Here's what I've found:
Backtrace:
key_delsp(c070f000,c6e3ec48,c0657000,c03b2100,c06af80c at key_delsp+0x63
key_freesp(c070f000,c6e3ec48,c6e3ed50,c01ef95c) at key_freesp+0x57
ipsec_invalpcbcache(c06af800,1,5857,87640101,c0657300) at ipsec_invalpcbcache+0x44
gcc2_compiled.(c0657000,c006af800,1,1,c0657300) at gcc2_compiled.+0x46
ipsec4_getpolicybysock(c0657000,1,c06ae294,c6e3ed0c,c0657300) at ipsec4_getpolicybysock+0x7e
ipsec4_in_reject_so(c0657000,c06ae294,c0657300,c0657300) at ipsec4_in_reject_so+0x35
rip_input() at rip_input+0x196
icmp_input() at icmp_input+0x449
ip_input() at ip_input+0x668
ipintr() at ipintr+0x6b
Xsoftnet() at Xsoftnet+0x2c
Here's the assembly for key_freesp() as created by 'cc -S ...'
I've marked the place that it's dying with a '<=-'
===========================================================
.section .rodata
.align 32
.LC27:
.ascii "key_delsp: NULL pointer is passed.\12\0"
.LC28:
.ascii "DP delsp calls free SA:%p\12\0"
.text
.align 4
.type key_delsp,@function
key_delsp:
pushl %ebp
movl %esp,%ebp
subl $28,%esp
pushl %edi
pushl %esi
pushl %ebx
movl 8(%ebp),%edi
testl %edi,%edi
jne .L368
addl $-12,%esp
pushl $.LC27
call panic
.align 4
.L368:
movl $0,280(%edi)
cmpl $0,8(%edi)
jg .L367
movl cpl,%eax
movl %eax,-4(%ebp)
orl imask+28,%eax
movl %eax,cpl
#APP
#NO_APP
movl (%edi),%edx
testl %edx,%edx
jne .L390
cmpl $0,4(%edi)
je .L371
jmp .L376
.align 4
.L390:
movl 4(%edi),%eax
movl %eax,4(%edx)
.L376:
movl 4(%edi),%edx
movl (%edi),%eax
movl %eax,(%edx) <=-
.L371:
movl 288(%edi),%esi
movl -4(%ebp),%eax
movl %eax,-8(%ebp)
notl -8(%ebp)
testl %esi,%esi
je .L379
.align 4
.L380:
movl 272(%esi),%edx
testl %edx,%edx
je .L381
movl key_debug_level,%eax
andl $65,%eax
cmpl $65,%eax
jne .L383
addl $-8,%esp
pushl %edx
pushl $.LC28
call printf
addl $16,%esp
.align 4
.L383:
addl $-12,%esp
pushl 272(%esi)
call key_freesav
movl $0,272(%esi)
addl $16,%esp
.L381:
movl (%esi),%ebx
addl $-8,%esp
pushl $95
pushl %esi
call free
movl %ebx,%esi
addl $16,%esp
testl %esi,%esi
jne .L380
.L379:
addl $-12,%esp
pushl %edi
call keydb_delsecpolicy
addl $16,%esp
#APP
#NO_APP
movl -4(%ebp),%eax
movl %eax,cpl
movl ipending,%eax
testl %eax,-8(%ebp)
je .L367
call Xspllower
.L367:
leal -40(%ebp),%esp
popl %ebx
popl %esi
popl %edi
leave
ret
.Lfe10:
.size key_delsp,.Lfe10-key_delsp
.section .rodata
.align 32
.LC29:
.ascii "key_getsp: NULL pointer is passed.\12\0"
===========================================================
Here's the registers after the panic:
ds 0x10
es 0x10
fs 0x10
gs 0x10
edi 0xc070f000 end+0x352ba0
esi 0x01
ebp 0xc6e3eb10 end+0x6a826b0
ebx 0xc070f000 end+0x352ba0
edx 0x01
ecx 0
eax 0xc0676800 end+0x2ba3a0
eip 0xc0225637 key_delsp+0x63
cs 0x8
eflags 0x10286
esp 0xc6e3eae8 end+0x6a82688
ss 0x10
===========================================================
And the contents of memory pointed to by %edi and %eax
0xc070f000: c0676800 1 0 c05ffc00 c0fd600 0
0xc070f018: 0 0 0 0 0 0
0xc0676800: 0xc0773c00 1 1 c05ff880 c05fd600 0
What's interesting is that %edi and %eax appear to point to
secpolicy structs, in which the first 2 long words are le_next
and le_prev pointers from LIST_ENTRY(secpolicy). And, the place
that it's dying appears to be in the middle of the LIST_REMOVE(sp, chain),
found in key_freesp().
So, why is it dying? Well, it looks like *le_prev is 0x01, which is
obviously not a valid pointer (le_prev is a pointer to a pointer).
Perhaps there's a missing LIST_INIT() somewhere? Or worse yet, there's
a buffer overflow somewhere that is overwriting one of the secpolicy structs.
Personally, I'm a little suspicious of a buffer overflow, le_prev is the
*2nd* entry in the structure and the first is not a problem.
One other thing that might be happening here is that perhaps there are
some missing calls to splsoftnet()/splx(), like, for instance, the
call to LIST_INSERT_TAIL() at line 1621 of netkey/key.c, or the call to
LIST_INSERT_HEAD() at line 1998 of netkey/key.c, neither of which are
protected by splsoftnet()/splx().
In fact, there appear to be several more LIST_() operations that are
not splsoftnet()/splx() protected throughout netkey/key.c. Is it not necessary?
I'm just not experienced enough with this code to know, but I would imagine
that it would be a bad thing to be half way through a LIST_FOREACH() that is
not protected by splsoftnet()/splx() and then have LIST_REMOVE() called from
elsewhere.
Paul
--
Paul Dokas dokas@cs.umn.edu
======================================================================
Don Juan Matus: "an enigma wrapped in mystery wrapped in a tortilla."