tech-kern archive


Re: [10.0_STABLE] Hard lock



Rin Okuyama wrote:
> Hi,
> 
> On 2024/10/27 23:11, BERTRAND Joël wrote:
>>     I haven't seen other locks with these three patches. Can you apply
>> these patches to -10 branch ?
> 
> For three commits in your previous message:
> 
> On 2024/10/19 3:08, BERTRAND Joël wrote:
>> (1/3)
> https://mail-index.netbsd.org/source-changes/2023/12/28/msg149090.html
>> Use correct status value SCSI_BUSY (0x08) instead of XS_BUSY (7)...
>>
>> (2/3)
> https://mail-index.netbsd.org/source-changes/2024/08/24/msg153012.html
>> Avoid race in timeout handling.
>> Don't try to wake up CCB without connection (which led to a NULL pointer
>> deref).
>>
>> (3/3)
> https://mail-index.netbsd.org/source-changes/2024/10/15/msg153958.html
> 
> Only (3/3) (cprng_fast vs. softint) has already been pulled up.
> 
> Michael, can I request pullups for the iSCSI ones before the 10.1 release?

	Thanks,

	But bad news: although the kernel without these patches crashed quickly
(maximum uptime of 2 or 3 days), I saw a new hard lock last night (CET).

	The last messages in dmesg are:

[ 270865,049408] uid 1011, pid 8627, command bacula-dir, on /opt/bacula: file system full
[ 271163,079723] uid 1011, pid 8627, command bacula-dir, on /opt/bacula: file system full
[ 271163,079723] uid 1011, pid 8627, command bacula-dir, on /opt/bacula: file system full
[ 271163,079723] uid 1011, pid 8627, command bacula-dir, on /opt/bacula: file system full
[ 277674,239666] S3C1: freeing UNUSED pdu

	but I'm not sure the last line is related to the crash. /var/log/messages
is corrupted:

Oct 29 05:35:59 legendre rpc.lockd: no matching entry for pythagore
Oct 29 05:37:04 legendre syslogd[2749]: last message repeated 9 times
Oct 29 05:37:04 legendre rpc.lockd: duplicate lock from pythagore.23361
Oct 29 05:37:04 legendre rpc.lockd: no matching entry for pythagore
Oct 29 05:37:04 legendre rpc.lockd: duplicate lock from pythagore.23361
Oct 29 05:37:04 legendre rpc.lockd: no matching entry for pythagore
Oct 29 05:37:04 legendre rpc.lockd: duplicate lock from pythagore.23361
Oct 29 05:37:04 legendre rpc.lockd: no matching entry for pythagore
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Oct
29 09:03:56 legendre syslogd[2365]: restart
Oct 29 09:03:56 legendre /netbsd: [ 277674.2396657] S3C1: freeing UNUSED pdu
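
	For anyone who wants to check a log file for the same kind of damage,
here is a minimal sketch in Python that reports runs of NUL bytes like the
one shown above. The function names and the 16-byte threshold are
hypothetical choices for illustration, not part of any NetBSD tool.

```python
import re


def find_nul_runs(data: bytes, min_len: int = 16):
    """Return (offset, length) for each run of at least min_len NUL bytes."""
    # min_len is an arbitrary threshold (assumption) to skip isolated NULs.
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def scan_file(path: str, min_len: int = 16):
    """Read a file in binary mode and return its NUL runs."""
    with open(path, "rb") as f:
        return find_nul_runs(f.read(), min_len)
```

	Calling scan_file("/var/log/messages") returns a list of (offset,
length) pairs; an empty list means no run of that size was found. It only
locates the damaged region and says nothing about its cause.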

	No crash dump, and the serial line is unresponsive. All file systems
(ffsv2/ea) are corrupted even though the log option is set. I have to
reboot in single-user mode and run fsck -fpP, even though the journal is
replayed before fsck (fsck takes one hour...). fsck fixes a lot of file
system errors. If I run fsck again, it no longer finds any errors.

	I have tested the hardware (in particular, I ran intensive memory
tests). I'm pretty sure there is no hardware failure.

	Best regards,

	JB



