NetBSD-Bugs archive
kern/46896: iSCSI initiator ccb_pool gets corrupted
>Number: 46896
>Category: kern
>Synopsis: iSCSI initiator ccb_pool gets corrupted
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Sep 03 20:40:00 +0000 2012
>Originator: Michael L. Hitch
>Release: NetBSD 6.0_RC1 as of 19-Aug-2012
>Organization:
Montana State University
>Environment:
System: NetBSD net5.msu.montana.edu 6.0_RC1 NetBSD 6.0_RC1 (XEN3_DOM0) #43: Sun Sep 2 20:19:33 MDT 2012
mhitch%net8.msu.montana.edu@localhost:/home/mhitch/NetBSD-6/OBJ/amd64/home/mhitch/NetBSD-6/src/sys/arch/amd64/compile/XEN3_DOM0
amd64
Architecture: x86_64
Machine: amd64
>Description:
After updating to 6.0_RC1, I started a XEN DOMU kernel using an iSCSI
disk. I'm fairly certain that I had been able to run this for some time
previously (netbsd-6 tree as of 24-May). Shortly after starting the
DOMU kernel, the iSCSI initiator started reporting no ccbs:
Aug 30 00:20:11 net5 /netbsd: S2C1: No CCB in run_xfer
Aug 30 00:20:11 net5 /netbsd: sd1(iscsi0:0:0:0): adapter resource shortage
Aug 30 00:20:12 net5 /netbsd: S2C1: No CCB in run_xfer
Aug 30 00:20:12 net5 /netbsd: sd1(iscsi0:0:0:0): adapter resource shortage
I'm running a 6.0_RC1 XEN3_DOM0 kernel (with the iscsi initiator added
to the kernel config, since xen kernels won't load modules), and an i386
XEN3 DOMU running cacti (lots and lots of disk updates).
After writing a quick kernel groveler to extract information from the
various iSCSI initiator tables, I found that indeed, the ccb_pool
head for the session showed it was empty. Dumping out the contents of
all the ccbs seemed to indicate they were all free, just no longer on
the free list.
Session 0xffffa00002945000: id=2
ccb_pool 0x0000000000000000:0xffffa0000294c588 ccb_throttled 0x0000000000000000
ccb[ 0] 0xffffa00002945208 next 0xffffa0000294d3f8 status 0 disp 0 ITT 80000200
...
ccb[55] 0xffffa0000294c378 next 0xffffa0000294c168 status 0 disp 0 ITT 49000237
ccb[56] 0xffffa0000294c588 next 0x0000000000000000 status 0 disp 0 ITT 89000238
ccb[57] 0xffffa0000294c798 next 0xffffa0000294c588 status 0 disp 0 ITT 87000239
I was not able to see anything obvious in the changes to the
sys/dev/iscsi source that might have caused this. I then added the
ccbs_waiting queue header to the dump, and noted that when this
condition occurs, the tail entry of that header points to the
ccb_pool - certainly not correct.
This leads me to suspect that something goes wrong when ccbs are
removed from ccbs_waiting and added to the free pool. From looking at
the code, it looks to me like a ccb on the ccbs_waiting queue is passed
to wake_ccb(), which removes it from the ccbs_waiting queue. However,
there appears to be nothing that prevents another caller from picking
up the same ccb on the ccbs_waiting queue and calling wake_ccb() on it
as well. The first caller wins, removing the ccb from ccbs_waiting and
adding it to ccb_pool. The second caller then tries to remove the same
ccb from ccbs_waiting and add it to ccb_pool, with nasty results. I'm
now working on seeing if this is indeed the case (adding some debug
code to check and print information if it detects this occurring).
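
The suspected sequence can be modelled outside the kernel. The following
is a minimal sketch, not the driver code: it hand-rolls a tail queue
with the same link layout as the classic BSD TAILQ macros (a first
pointer plus a pointer to the last "next" slot) and uses a single
stand-in ccb. The names struct queue, queue_remove() and
simulate_double_wake() are made up for illustration. Under those
assumptions, a second removal from ccbs_waiting after the ccb has
already moved to ccb_pool leaves the pool head NULL and the waiting
queue's tail pointing into the pool head, which is the pattern seen in
the dump above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ccb: only the queue links are modelled. */
struct ccb {
	struct ccb *next;	/* next element, NULL at the tail */
	struct ccb **prevp;	/* address of the pointer that refers to us */
};

/* Tail queue head laid out like TAILQ_HEAD: first pointer and tail slot. */
struct queue {
	struct ccb *first;
	struct ccb **lastp;	/* &first when empty, else &tail->next */
};

struct outcome {
	int pool_head_empty;		/* pool.first is NULL after the race */
	int waiting_tail_in_pool;	/* waiting.lastp points at pool.first */
};

static void
queue_init(struct queue *q)
{
	q->first = NULL;
	q->lastp = &q->first;
}

static void
queue_insert_tail(struct queue *q, struct ccb *c)
{
	c->next = NULL;
	c->prevp = q->lastp;
	*q->lastp = c;
	q->lastp = &c->next;
}

/* Trusts c's links, exactly like a TAILQ-style remove: no membership check. */
static void
queue_remove(struct queue *q, struct ccb *c)
{
	if (c->next != NULL)
		c->next->prevp = c->prevp;
	else
		q->lastp = c->prevp;
	*c->prevp = c->next;
}

static struct outcome
simulate_double_wake(void)
{
	struct queue pool, waiting;
	struct ccb c;
	struct outcome o;

	queue_init(&pool);
	queue_init(&waiting);
	queue_insert_tail(&waiting, &c);

	/* First caller: waiting -> pool, as intended. */
	queue_remove(&waiting, &c);
	queue_insert_tail(&pool, &c);
	assert(pool.first == &c);

	/* Second caller removes the same ccb from waiting again, but its
	 * links now describe its position in pool. */
	queue_remove(&waiting, &c);

	o.pool_head_empty = (pool.first == NULL);
	o.waiting_tail_in_pool = (waiting.lastp == &pool.first);
	return o;
}
```

In this model both fields of the returned struct outcome come back set.
The sketch stops before the second caller's insert into the pool; what
happens after that depends on how many ccbs are queued at the time.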
>How-To-Repeat:
I suspect this problem is relatively rare, and needs a setup similar to
the one described above to generate enough random activity with the
iSCSI target to reproduce it.
>Fix:
If the problem is that a ccb on the ccbs_waiting queue gets processed
more than once, try to prevent that from happening, or at least prevent
it from clobbering the ccb_pool and ccbs_waiting queues.
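
One way the "at least prevent it from clobbering" option might look is
sketched below. This is an illustration under assumptions, not a patch:
struct ccb is pared down to its queue entry, and the on_waiting flag and
wake_ccb_guarded() are hypothetical names that do not exist in the
driver. The idea is that a ccb remembers whether it is still on the
waiting queue, so a second wake of the same ccb becomes a no-op instead
of a second TAILQ_REMOVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, pared-down ccb; the real structure has many more fields. */
struct ccb {
	TAILQ_ENTRY(ccb) chain;
	bool on_waiting;	/* assumed flag, not present in the driver */
};
TAILQ_HEAD(ccb_list, ccb);

/*
 * Move c from waiting to pool once. Returns true if this call did the
 * move, false if the ccb had already been handled. In a kernel the
 * test-and-clear would have to run under whatever lock or raised
 * interrupt level protects both queues; that part is not shown.
 */
static bool
wake_ccb_guarded(struct ccb_list *waiting, struct ccb_list *pool,
    struct ccb *c)
{
	if (!c->on_waiting)
		return false;
	c->on_waiting = false;
	TAILQ_REMOVE(waiting, c, chain);
	TAILQ_INSERT_TAIL(pool, c, chain);
	return true;
}

static size_t
list_length(struct ccb_list *l)
{
	struct ccb *c;
	size_t n = 0;

	TAILQ_FOREACH(c, l, chain)
		n++;
	return n;
}

/* Wake the same ccb twice and report how many entries the pool holds. */
static size_t
demo_guarded_double_wake(void)
{
	struct ccb_list pool, waiting;
	struct ccb c;

	TAILQ_INIT(&pool);
	TAILQ_INIT(&waiting);
	c.on_waiting = true;
	TAILQ_INSERT_TAIL(&waiting, &c, chain);

	assert(wake_ccb_guarded(&waiting, &pool, &c));	/* first caller moves it */
	assert(!wake_ccb_guarded(&waiting, &pool, &c));	/* second caller is refused */
	assert(TAILQ_EMPTY(&waiting));
	return list_length(&pool);
}
```

A guard like this only hides the double wake; the debug output mentioned
in the description would still be needed to find out which two paths
reach the same ccb.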