tech-kern archive
How to keep the kernel from crashing on a cd9660 error?
Hi,
I could use some advice about getnewvnode(9) and how to revoke
the creation of the vnode.
While testing my next change proposal for stability with
indigestible ISO 9660 files, I experienced kernel crashes which
look like memory corruption.
To prove that my changes are not to blame, I installed a small
error generator in the current cd9660_vfsops.c, at the place
where my new code will return EOPNOTSUPP because of an indigestible
file.
It triggers the same crash as the real error return in my
changed code. So the problem already exists in cd9660.
I could possibly craft an ISO image which would trigger an error
condition that is already handled in cd9660_vget_internal(),
very near to the spot where my test causes havoc.
So this could be a DoS attack path.
---------------------------------------------------------------
What happens in cd9660_vget_internal() is roughly this:
- Input is the inode number.
- A shortcut is tried for a cached vnode. No problem if it triggers.
- getnewvnode() obtains a new vnode.
  (It is needed at the latest when the directory record of the
  desired inode number is read. So this creation cannot be delayed
  until after the error situation which is triggered by that record.)
- pool_get() obtains an iso_node for the new vnode.
- Evidently a check for a race condition is made. (No problem.)
- Several operations are done which have the potential to cause
  an error. In that case, most of them do
      vput(vp);
      if (bp != 0)
              brelse(bp, 0);
      return (E...);
I wanted to do the same. But that seems to be a bad idea.
My mock-up in the current cd9660_vfsops.c returns an error on every
third VOP_LOOKUP(9) or VOP_VGET(9) call.
The kernel survives the first such error and crashes on the
second one.
---------------------------------------------------------------
--- cd9660_vfsops.c.patch_006 2014-06-01 13:16:27.000000000 +0000
+++ cd9660_vfsops.c 2014-06-03 15:47:32.000000000 +0000
@@ -858,6 +858,19 @@ cd9660_vget_internal(struct mount *mp, i
                 break;
         }
+/* <<< Error mock-up */
+{ static uint64_t error_cycler = 0;
+        error_cycler++;
+        if ((error_cycler % 3) == 0) {
+                printf("cd9660_vfsops.c: Deliberate error EOPNOTSUPP\n");
+                vput(vp);
+                if (bp != 0)
+                        brelse(bp, 0);
+                return (EOPNOTSUPP);
+        }
+}
+
+
         if (bp != 0)
                 brelse(bp, 0);
---------------------------------------------------------------
(I am aware there is a resource leak about iso_node.)
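The control flow of the mock-up, including the admitted iso_node leak, can be modelled in userspace. The following is only a sketch: fake_vnode, fake_getnewvnode(), fake_vput(), fake_vget_internal() and fake_release() are hypothetical stand-ins invented for illustration, not kernel APIs, and malloc() takes the place of pool_get(). It shows which calls fail and what stays allocated afterwards. It does not reproduce the crash.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode; not the kernel structure. */
struct fake_vnode {
    void *v_data;               /* models the attached iso_node */
};

static int vnodes_live;         /* fake vnodes currently allocated */
static int inodes_live;         /* fake iso_nodes currently allocated */

/* Stand-in for getnewvnode(9). */
static struct fake_vnode *fake_getnewvnode(void)
{
    struct fake_vnode *vp = calloc(1, sizeof(*vp));
    if (vp != NULL)
        vnodes_live++;
    return vp;
}

/* Stand-in for vput(9): drops the vnode, knows nothing about v_data. */
static void fake_vput(struct fake_vnode *vp)
{
    free(vp);
    vnodes_live--;
}

/* Models the shape of cd9660_vget_internal() with the error mock-up. */
static int fake_vget_internal(struct fake_vnode **vpp)
{
    static uint64_t error_cycler = 0;
    struct fake_vnode *vp = fake_getnewvnode();

    if (vp == NULL)
        return ENOMEM;
    vp->v_data = malloc(16);    /* stand-in for pool_get() of an iso_node */
    if (vp->v_data == NULL) {
        fake_vput(vp);
        return ENOMEM;
    }
    inodes_live++;

    error_cycler++;
    if ((error_cycler % 3) == 0) {
        /* Same cleanup as the mock-up: v_data is deliberately not
         * released here, which models the admitted leak. */
        fake_vput(vp);
        return EOPNOTSUPP;
    }
    *vpp = vp;
    return 0;
}

/* Releases a vnode from a successful call, including its node. */
static void fake_release(struct fake_vnode *vp)
{
    free(vp->v_data);
    inodes_live--;
    fake_vput(vp);
}
```

Calling fake_vget_internal() six times and releasing every successful result fails on calls 3 and 6, leaves no fake vnode allocated, and leaves two fake iso_nodes behind.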
With this kernel booted, I run
netbsd# mount_cd9660 '/dev/wd1f' '/mnt/iso'
netbsd# ls -l /mnt/iso
ls: my: Operation not supported
total 8
dr-x------ 1 thomas wheel 2048 May 3 14:58 dev
dr-x------ 1 thomas wheel 2048 Jan 19 14:41 reg
-r-------- 1 thomas dbus 6 May 6 15:34 small_file
netbsd# ls -l /mnt/iso
This yields a crash and a reboot.
netbsd# crash
crash> dmesg
...
cd9660_vfsops.c: Deliberate error EOPNOTSUPP
panic: kernel diagnostic assertion "(*vpp)->v_size != VSIZENOTSET &&
(*vpp)->v_writesize != VSIZENOTSET" failed: file
"/usr/src/sys/kern/vnode_if.c", line 124
cpu0: Begin traceback...
vpanic(c2f5d840,c26a8800,2,0,daabdf68,c093e072,c2fb4d40,ffffff9c,bb90a3f4,0)
at netbsd:vpanic+0x120
cpu0: End traceback...
I only see the message of the first error. The second one did
not come through. But I am quite sure a second one happened.
At least I had to "ls -l" my bad file twice, before I began
to worsen the situation by adding diagnostic code.
---------------------------------------------------------------
What makes me think of memory corruption:
- Varying final messages in the "dmesg" output of crash(8), when I
  tried to hunt down the problem in my changed code.
- Implausible code paths. E.g. the above KASSERT in
  /usr/src/sys/kern/vnode_if.c
  is supposed to take effect only with error == 0, but it triggers
  only when cd9660 is supposed to have returned error != 0.
- Symptoms getting worse if I insert printf() to trace the
  upward propagation of the error return value.
  It then crashes already on the first error and with more
  dramatic messages in crash's dmesg:
uvm_fault(0xc2a2a920, 0, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 2 eip c02579d1 cs 8 eflags 10282 cr2 1c ilevel 0 esp
c0943d3d
curlwp 0xc2fb5d40 pid 815 lid 1 lowest kstack 0xda96b2c0
panic: trap
cpu0: Begin traceback...
uvm_fault(0xc2a2a920, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c029e324 cs 8 eflags 10246 cr2 6 ilevel 0 esp 0
curlwp 0xc2fb5d40 pid 815 lid 1 lowest kstack 0xda96b2c0
Skipping crash dump on recursive panic
panic: trap
Faulted in mid-traceback; aborting...
---------------------------------------------------------------
My question is: How should I repair this function so that it
can revoke the creation of the vnode in case of errors which
indicate that the vnode will be unusable or worse?
(The actual test object is a data file with two sections.
The first is not aligned to block size. So VOP_BMAP(9) cannot
neatly map file blocks to partition blocks.
Debian 6 GNU/Linux tolerates such a file but shows wrong
content, partly from a different data file.)
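The mapping problem with such a file can be illustrated with a little arithmetic. The sketch below assumes that "not aligned" means the byte length of the first section is not a multiple of the block size, so that a file block straddles the section boundary or starts in the middle of a device block. struct section, map_block() and all numbers are hypothetical. This is not how VOP_BMAP(9) is implemented in cd9660.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one file section (extent). */
struct section {
    uint64_t start_blk;         /* first device block of the section */
    uint64_t length;            /* section length in bytes */
};

/*
 * Tries to map logical file block lbn to a single device block.
 * Returns false if the block does not lie within one section on a
 * block boundary, or if it is beyond the end of the file.
 */
static bool map_block(const struct section *s, int n, uint32_t bsize,
                      uint64_t lbn, uint64_t *dev_blk)
{
    uint64_t begin = lbn * bsize;   /* file offset of the block */
    uint64_t sec_off = 0;           /* file offset where s[i] starts */

    for (int i = 0; i < n; i++) {
        uint64_t sec_end = sec_off + s[i].length;

        if (begin < sec_end) {
            /* The section must start on a file block boundary. */
            if (sec_off % bsize != 0)
                return false;
            /* Only the last section may end inside the block. */
            if (begin + bsize > sec_end && i + 1 < n)
                return false;
            *dev_blk = s[i].start_blk + (begin - sec_off) / bsize;
            return true;
        }
        sec_off = sec_end;
    }
    return false;
}
```

With a block size of 2048 and a first section of 3000 bytes, file block 0 maps cleanly, file block 1 straddles both sections, and every block of the second section starts 952 bytes into a device block. If the first section were 4096 bytes long, all blocks would map.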
Have a nice day :)
Thomas