Current-Users archive
Re: Kernel faults, wapbl updates
Christos Zoulas wrote:
> Just back all of yesterday's commits out. Just building with -j 8
> and LOCKDEBUG spins out. It trashes the filesystem and then it gets
> another error about not fixing an inode while replaying the log on
> reboot. I.e. the new kernel not only holds a spinlock and crashes,
> but also does not replay the log properly on boot.
Log replay was not touched. However, the code did not properly record
all deallocations, even for some finished and committed transactions,
and that caused the replay problems. This should now all be fixed,
along with the mutex issue.

After updating to the newest kernel (with vfs_wapbl.c 1.84), it is
necessary to run fsck to bring the filesystem back to a fully healthy
state. After fsck, there shouldn't be any further problems related to
the current change.

Sorry about that, and thanks for your patience.

Jaromir
2016-10-02 16:46 GMT+02:00 Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>:
> There was a use-after-free bug which caused the fault on DEBUG
> kernels; it is now fixed in revision 1.82 of kern/vfs_wapbl.c.
>
> Thank you.
>
> Jaromir
>
> 2016-10-02 1:26 GMT+02:00 bch <brad.harder%gmail.com@localhost>:
>> On 10/1/16, Jaromir Dolecek <jaromir.dolecek%gmail.com@localhost> wrote:
>>> If you can get just a short traceback (which particular wapbl
>>> function(s), for example), it would help to figure out the possible
>>> problem.
>>
>> Here's a gdb backtrace from a core dump:
>>
>>
>> #0 0xffffffff80119a85 in cpu_reboot (howto=howto@entry=260,
>> bootstr=bootstr@entry=0x0) at
>> /usr/src/sys/arch/amd64/amd64/machdep.c:676
>> syncdone = false
>> s = <optimized out>
>> #1 0xffffffff8086e3dc in vpanic (fmt=fmt@entry=0xffffffff80ed1503
>> "trap", ap=ap@entry=0xfffffe804105cb28) at
>> /usr/src/sys/kern/subr_prf.c:342
>> ci = <optimized out>
>> oci = <optimized out>
>> bootopt = 260
>> scratchstr = "trap", '\000' <repeats 379 times>
>> #2 0xffffffff8086e490 in panic (fmt=fmt@entry=0xffffffff80ed1503
>> "trap") at /usr/src/sys/kern/subr_prf.c:258
>> ap = <error reading variable ap (Attempt to dereference a
>> generic pointer.)>
>> #3 0xffffffff8011b706 in trap (frame=0xfffffe804105cc60) at
>> /usr/src/sys/arch/amd64/amd64/trap.c:298
>> p = <optimized out>
>> pcb = <optimized out>
>> vframe = <optimized out>
>> ksi = {ksi_flags = 1, ksi_list = {tqe_next = 0x0, tqe_prev =
>> 0x0}, ksi_info = {_signo = 11, _code = 2, _errno = 0, _pad = 0,
>> _reason = {_rt = {_pid = 0, _uid = 0, _value = {sival_int = 6,
>> sival_ptr = 0x6}}, _child = {_pid = 0, _uid = 0,
>> _status = 6, _utime = 0, _stime = 0}, _fault = {_addr = 0x0, _trap =
>> 6, _trap2 = 0, _trap3 = 0}, _poll = {_band = 0, _fd = 6}}},
>> ksi_lid = 0}
>> onfault = <optimized out>
>> type = 6
>> error = <optimized out>
>> cr2 = <optimized out>
>> pfail = <optimized out>
>> #4 0xffffffff8010115e in alltraps ()
>> No symbol table info available.
>> #5 0xffffffff808cad70 in wapbl_write_revocations
>> (offp=0xfffffe804105cdc8, wl=0xfffffe811ce15688) at
>> /usr/src/sys/kern/vfs_wapbl.c:2343
>> wc = 0xfffffe811c747908
>> blocklen = <optimized out>
>> off = 6082048
>> wd = 0xfffffe81deaddead
>> error = <optimized out>
>> #6 wapbl_flush (wl=0xfffffe811ce15688, waitfor=waitfor@entry=0) at
>> /usr/src/sys/kern/vfs_wapbl.c:1618
>> bp = <optimized out>
>> we = <optimized out>
>> off = 6081536
>> head = <optimized out>
>> tail = <optimized out>
>> delta = 0
>> flushsize = 6996480
>> reserved = <optimized out>
>> error = <optimized out>
>> __func__ = "wapbl_flush"
>> #7 0xffffffff807a45c1 in ffs_sync (mp=0xfffffe811c95b008, waitfor=3,
>> cred=0xfffffe811e145f00) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1975
>> vp = 0x0
>> ump = 0xfffffe8108092b08
>> fs = 0xfffffe811beb5008
>> marker = 0xfffffe810a8b7930
>> error = <optimized out>
>> allerror = 0
>> is_suspending = <optimized out>
>> ctx = {waitfor = 3, is_suspending = false}
>> __func__ = "ffs_sync"
>> #8 0xffffffff808baaa1 in VFS_SYNC (mp=0xfffffe811c95b008,
>> a=<optimized out>, b=<optimized out>) at
>> /usr/src/sys/kern/vfs_subr.c:1358
>> error = <optimized out>
>> #9 0xffffffff808bad20 in sched_sync (arg=<optimized out>) at
>> /usr/src/sys/kern/vfs_subr.c:785
>> slp = <optimized out>
>> vp = <optimized out>
>> mp = 0xfffffe811c95b008
>> nmp = 0xfffffe811c95b008
>> starttime = 1475352687
>> synced = true
>> #10 0xffffffff801008d7 in lwp_trampoline ()
>>
>>
>>
>>> The changes to vfs_wapbl.c were fairly minor so far. I would
>>> understand new panics, but it would be strange if they caused faults.
>>>
>>> Maybe you can try downgrading ufs/ffs/ffs_alloc.c to before rev.
>>> 1.152. It's possible there is some interaction with wapbl which might
>>> cause trouble there.
>>>
>>> Keep me on CC please, I'm working currently on WAPBL and planning some
>>> further changes, so I'll fix any regressions asap.
>>>
>>> Jaromir
>>>
>>> 2016-10-01 22:50 GMT+02:00 bch <brad.harder%gmail.com@localhost>:
>>>> On Oct 1, 2016 1:44 PM, "bch" <brad.harder%gmail.com@localhost> wrote:
>>>>>
>>>>> This appears to be trashing files, too, based on what I see trying
>>>>> to CVS update
>>>>
>>>> Incl. author of potential troublesome commit.
>>>>
>>>>> On Oct 1, 2016 1:32 PM, "bch" <brad.harder%gmail.com@localhost> wrote:
>>>>>>
>>>>>>
>>>>>> My system is unstable w latest src. Appears to fault in wapbl
>>>>>> functions. Sadly, this appears to correspond w updates in network
>>>>>> interfaces, so my .38 backup kernel won't cooperate with my .39
>>>>>> userland to bring up the network and update src and poll the
>>>>>> machine for more info and send that out.
>>>
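
For readers unfamiliar with the bug class named in the thread: in frame #5 of the backtrace, gdb shows wd = 0xfffffe81deaddead, a value that looks more like a debug poison or magic pattern than a live pointer, which would be consistent with the use-after-free that was later fixed. The sketch below is a generic illustration of that class of bug and the usual way to avoid it when tearing down a linked list. It is not the code from vfs_wapbl.c; the struct dealloc_entry type and both functions are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record standing in for a pending deallocation entry. */
struct dealloc_entry {
	struct dealloc_entry *next;
	long blkno;
};

/*
 * Push a new entry at the head of the list.
 * Returns the new head, or the old head if allocation fails.
 */
static struct dealloc_entry *
entry_push(struct dealloc_entry *head, long blkno)
{
	struct dealloc_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return head;
	e->blkno = blkno;
	e->next = head;
	return e;
}

/*
 * Free every entry and return how many were released.
 *
 * The buggy form of this loop would be
 *	for (e = *headp; e != NULL; e = e->next) free(e);
 * which reads e->next after e has been freed. On a debug build that
 * poisons freed memory, the loop then follows a garbage pointer.
 */
static size_t
entry_free_all(struct dealloc_entry **headp)
{
	struct dealloc_entry *e, *next;
	size_t n = 0;

	assert(headp != NULL);
	for (e = *headp; e != NULL; e = next) {
		next = e->next;	/* save the link before freeing */
		free(e);
		n++;
	}
	*headp = NULL;		/* leave no dangling head pointer */
	return n;
}
```

A caller would build a list with entry_push() and release it with entry_free_all(&head). Saving the successor before the free is the same idea that the _SAFE traversal macros in the BSD queue(3) family provide.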