NetBSD-Bugs archive
kern/52515: percpu'ed data (e.g. psref_cpu) can cause panic when percpu_cpu_enlarge() run
>Number: 52515
>Category: kern
>Synopsis: percpu'ed data (e.g. psref_cpu) can cause panic when percpu_cpu_enlarge() run
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Aug 31 10:50:00 +0000 2017
>Originator: Kengo NAKAHARA
>Release: -current and -8(potentially)
>Organization:
Internet Initiative Japan
>Environment:
>Description:
I ran a heavy-load IPsec test on a -current kernel (later than August 22).
The test is as follows.
+ setup
- setup gif(4) interface between DUT1 and DUT2
- protect the gif(4) with IPsec transport mode
+ network load
- send packets (about 10000 pps) over the gif/IPsec
+ ioctl load
- repeat "ifconfig gifX tunnel src dst" and "ifconfig gifX deletetunnel"
- add many security associations (SAs) and security policies (SPs)
I hit a panic with the following backtrace.
====================
uvm_fault(0xffffffff81554c80, 0xffff800011fb1000, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 0x2 rip 0xffffffff809c1681 cs 0x8 rflags 0x10246 cr2 0xffff800011fb1410 ilevel 0x4 rsp 0xffffe401109ddbb0
curlwp 0xffffe4027d918180 pid 4440.1 lowest kstack 0xffffe401109da2c0
kernel: page fault trap, code=0
Stopped in pid 4440.1 (ifconfig) at netbsd:psref_release+0x8a: movq %rdx,0(%rcx)
db{4}> bt
psref_release() at netbsd:psref_release+0x8a
doifioctl() at netbsd:doifioctl+0x815
soo_ioctl() at netbsd:soo_ioctl+0x2b5
sys_ioctl() at netbsd:sys_ioctl+0x101
syscall() at netbsd:syscall+0x1ed
--- syscall (number 54) ---
7806c3f175aa:
====================
According to my analysis, the cause is as follows.
[1] psref_acquire() is called (by doifioctl() in this case):
[1-1] get "pcpu" pointer by percpu_getref().
[1-2] insert an element into the "pcpu->pcpu_head" list.
# that is, the inserted element has a pointer to "pcpu->pcpu_head"
# as "le_prev".
[2] percpu_cpu_enlarge() can be called:
# percpu_cpu_enlarge() is called when the percpu data region runs out
# of space.
# Since IPsec was MP-ified, each SA and SP has a localcount(9), that is,
# localcount_init() => percpu_alloc() is called for each SA and SP.
# That can easily exhaust the percpu data region...
[2-1] percpu_cpu_enlarge() allocates a new, larger memory region.
[2-2] copies the old percpu data to the new memory region.
[2-3] frees the old memory region.
[3] psref_release() is called (by doifioctl() in this case):
[3-1] try to remove the element added by psref_acquire() in [1-2].
[3-2] dereference the "le_prev" pointer of the element.
[3-3] the "le_prev" pointer still points into the *old* percpu data region,
which was already freed in [2-3] by percpu_cpu_enlarge()!
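The sequence [1] to [3] can be reproduced in userland with a small sketch. Everything here is a hypothetical stand-in, not the kernel code: "struct ref" and "struct fake_pcpu" only imitate a list entry and the per-CPU list head, and region_enlarge() only imitates steps [2-1] and [2-2]. The check is done before the old region is freed, so the sketch itself never touches freed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct ref {
	struct ref  *le_next;
	struct ref **le_prev;	/* address of the pointer that points at us */
};

struct fake_pcpu {
	struct ref *pcpu_head;	/* list head stored inside the per-CPU region */
};

/* Head insertion in the style of LIST_INSERT_HEAD from <sys/queue.h>. */
static void
ref_insert_head(struct fake_pcpu *pcpu, struct ref *r)
{
	r->le_next = pcpu->pcpu_head;
	if (r->le_next != NULL)
		r->le_next->le_prev = &r->le_next;
	pcpu->pcpu_head = r;
	r->le_prev = &pcpu->pcpu_head;	/* raw pointer INTO the region */
}

/* Imitates [2-1] and [2-2]: allocate a larger region, copy old contents. */
static void *
region_enlarge(const void *old, size_t oldsize, size_t newsize)
{
	void *newregion = calloc(1, newsize);

	assert(newregion != NULL && newsize >= oldsize);
	memcpy(newregion, old, oldsize);
	return newregion;
}

/* Return 1 if address p lies inside [base, base + size). */
static int
addr_in_region(uintptr_t p, uintptr_t base, size_t size)
{
	return p >= base && p - base < size;
}

/*
 * Run the scenario. Returns 1 if, after the enlarge, the list head was
 * copied to the new region but r.le_prev still points into the old one.
 */
static int
stale_back_pointer_demo(void)
{
	size_t oldsize = sizeof(struct fake_pcpu);
	size_t newsize = 4 * oldsize;
	struct fake_pcpu *oldp, *newp;
	struct ref r;
	int stale, head_copied;

	oldp = calloc(1, oldsize);
	assert(oldp != NULL);

	ref_insert_head(oldp, &r);				/* step [1] */
	newp = region_enlarge(oldp, oldsize, newsize);		/* [2-1], [2-2] */

	stale = addr_in_region((uintptr_t)r.le_prev, (uintptr_t)oldp, oldsize) &&
	    !addr_in_region((uintptr_t)r.le_prev, (uintptr_t)newp, newsize);
	head_copied = (newp->pcpu_head == &r);

	/*
	 * Step [2-3]. After this free, a LIST_REMOVE-style store
	 * "*r.le_prev = r.le_next" (step [3]) would write to freed memory.
	 */
	free(oldp);
	free(newp);
	return stale && head_copied;
}
```

Calling stale_back_pointer_demo() from a test program should report the stale pointer. The faulting "movq %rdx,0(%rcx)" in the trace looks consistent with such a store through "le_prev", though that reading is an assumption.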
This problem is not limited to psref: it affects any component that uses
percpu for a struct that can be pointed to by non-percpu data,
e.g. ipforward_rt_percpu in ip_input.c.
# Actually, I hit another panic caused by ipforward_rt_percpu while debugging.
>How-To-Repeat:
Run a heavy-load IPsec test like the one in the description above.
>Fix:
I am investigating in detail and checking every component that calls
percpu_alloc().
I think it may be fixed by changing the back pointer to percpu data from a
raw pointer to a percpu_t pointer. That means checking all of the percpu
structs and modifying the problematic ones...
Does anyone have a better idea?
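To illustrate the idea of replacing a raw pointer with a handle, here is a minimal userland sketch. It is not a patch and not the percpu(9) API: "struct region", "handle_t", handle_resolve() and region_grow() are hypothetical names, and an offset stands in for the percpu_t pointer. The point is only that a handle resolved against the current base on every access stays valid when the region is reallocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-CPU region that may move when it is enlarged. */
struct region {
	unsigned char *base;
	size_t size;
};

/* Hypothetical handle: an offset into the region, not an address. */
typedef size_t handle_t;

/* Resolve a handle against the region's CURRENT base address. */
static void *
handle_resolve(const struct region *rg, handle_t h)
{
	assert(h < rg->size);
	return rg->base + h;
}

/* Allocate a larger region, copy the old data, free the old region. */
static void
region_grow(struct region *rg, size_t newsize)
{
	unsigned char *newbase = calloc(1, newsize);

	assert(newbase != NULL && newsize >= rg->size);
	memcpy(newbase, rg->base, rg->size);
	free(rg->base);
	rg->base = newbase;
	rg->size = newsize;
}

/*
 * Store a value through the handle, grow the region, then update the
 * value through the same handle. No raw pointer is kept across the grow.
 */
static int
handle_survives_grow(void)
{
	struct region rg;
	handle_t h = 8;		/* offset chosen so an int is aligned */
	int value;

	rg.base = calloc(1, 16);
	assert(rg.base != NULL);
	rg.size = 16;

	*(int *)handle_resolve(&rg, h) = 41;
	region_grow(&rg, 64);			/* region may move here */
	*(int *)handle_resolve(&rg, h) += 1;	/* re-resolved, still valid */

	value = *(int *)handle_resolve(&rg, h);
	free(rg.base);
	return value;
}
```

In the kernel, the resolve step would presumably correspond to a percpu_getref()/percpu_putref() pair around each access. How that interacts with the list linkage that psref keeps is exactly the part that still needs checking.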