OK, so I booted both kernels and guess what: no change in behaviour.

Then, with netbsd.gdb booted, I interrupted the boot at two different points in time: before the line

[ 63.9893876] admtemp0: workqueue busy: updates stopped

and after it, because there is a long pause between the detection of the USB hub and the admtemp0 message (it might well be the indicated 60 sec).

Backtrace from the time before the admtemp0 timeout:

[ 1.1200107] uhub0 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db{0}> bt
intr_list_handler(101dbc880, 7, e00479b0, 0, 1044060, 2) at netbsd:intr_list_handler+0x10
sparc_interrupt(101dbcc40, 1, e0047b90, 0, 1044020, e0048000) at netbsd:sparc_interrupt+0x294
sparc_interrupt(1c70240, 70000000001, 2014000, 101db6ca0, 1, 1c72000) at netbsd:sparc_interrupt+0x294
frag6_slowtimo(1c95730, 101db6ca0, 1c70000, 1cc5000, 1ca3650, 101db6ca4) at netbsd:frag6_slowtimo+0x24
pfslowtimo(0, 101d88041, 0, 1cba478, 1c3c938, 18192b8) at netbsd:pfslowtimo+0x40
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at netbsd:callout_softclock+0xc8
softint_dispatch(2000, 1, 0, 101db6ca0, 1779500c0, 177950360) at netbsd:softint_dispatch+0x80
softint_fastintr(101db6ca0, 1, e0047cf0, 0, 1044020, e0048000) at netbsd:softint_fastintr+0x80
sparc_interrupt(f0056c1c, 1140d0, 1173b8, 0, fff57b48, 1) at netbsd:sparc_interrupt+0x294

Backtrace from the time after the admtemp0 timeout:

[ 1.1200103] uhub0 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[ 63.5593913] admtemp0: workqueue busy: updates stopped
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db{0}> bt
intr_list_handler(101dbc880, 1, e0047b90, 101db6ca0, 1044060, e0048000) at netbsd:intr_list_handler+0x10
sparc_interrupt(101d88040, 101d88041, 101db6ca0, 1cc5000, 1ca3400, 1cc7400) at netbsd:sparc_interrupt+0x294
callout_schedule_locked(101d88040, 101d88040, 1a80, 1cba478, 1cba478, 101d88040) at netbsd:callout_schedule_locked+0x94
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at netbsd:callout_softclock+0x274
softint_dispatch(2000, 1, 0, 101db6ca0, 1779500c0, 177950360) at netbsd:softint_dispatch+0x80
softint_fastintr(101db6ca0, 1, e0047cf0, 0, 1044020, e0048000) at netbsd:softint_fastintr+0x80
sparc_interrupt(f0056c1c, 1140d0, 1173b8, 0, fff57b48, 1) at netbsd:sparc_interrupt+0x294
db{0}>

Of course, the actual function that shows up in the bt at the time of the interrupt (BREAK) seems arbitrary, as I got different results in different runs. I am not sure how to read the backtrace: is the topmost line the last return address on the stack, so that you would go from top to bottom to learn the calling sequence, or is it vice versa? And does the interrupt from the BREAK show up in the backtrace?
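
On the reading order: DDB normally prints the innermost (most recently entered) frame first, so the calling sequence would run from the bottom line up to the top. Take that as the usual convention rather than something confirmed in this thread. Under that assumption, a minimal Python sketch can list which functions appear in both backtraces. The helper names parse_frames and common_functions are made up for illustration, and the two samples are excerpts of the traces above with the sparc_interrupt frames left out.

```python
import re

# One DDB frame looks like:
#   callout_softclock(1cba480, ...) at netbsd:callout_softclock+0xc8
# The backreference \1 requires the same symbol before "(" and after "netbsd:",
# which skips lines such as "Stopped in pid 0.5 (system) at netbsd:...".
FRAME_RE = re.compile(r"^(\w+)\(.*\)\s+at\s+netbsd:\1\+0x([0-9a-f]+)", re.MULTILINE)

BEFORE = """\
intr_list_handler(101dbc880, 7, e00479b0, 0, 1044060, 2) at netbsd:intr_list_handler+0x10
frag6_slowtimo(1c95730, 101db6ca0, 1c70000, 1cc5000, 1ca3650, 101db6ca4) at netbsd:frag6_slowtimo+0x24
pfslowtimo(0, 101d88041, 0, 1cba478, 1c3c938, 18192b8) at netbsd:pfslowtimo+0x40
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at netbsd:callout_softclock+0xc8
softint_dispatch(2000, 1, 0, 101db6ca0, 1779500c0, 177950360) at netbsd:softint_dispatch+0x80
"""

AFTER = """\
intr_list_handler(101dbc880, 1, e0047b90, 101db6ca0, 1044060, e0048000) at netbsd:intr_list_handler+0x10
callout_schedule_locked(101d88040, 101d88040, 1a80, 1cba478, 1cba478, 101d88040) at netbsd:callout_schedule_locked+0x94
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at netbsd:callout_softclock+0x274
softint_dispatch(2000, 1, 0, 101db6ca0, 1779500c0, 177950360) at netbsd:softint_dispatch+0x80
"""


def parse_frames(trace):
    """Return (function, offset) pairs in the order DDB printed them."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(trace)]


def common_functions(first, second):
    """Functions present in both traces, in the order they occur in `first`."""
    second_names = {name for name, _ in parse_frames(second)}
    shared = []
    for name, _ in parse_frames(first):
        if name in second_names and name not in shared:
            shared.append(name)
    return shared


if __name__ == "__main__":
    for name in common_functions(BEFORE, AFTER):
        print(name)
```

Running the same comparison over several BREAK attempts would show which frames are stable and which depend on where the interrupt happened to land.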

However, something striking is that callout_softclock is always involved...

On 17.06.21 at 13:20, Julian Coleman wrote:
Hi,

hmm, strange. Nothing attached to the Firewire ports. The code looks as if watchdog_clock never gets reset? Is there a way to disable the FW ports via a boot.conf or something?

Unfortunately, we need to remove it from the kernel.

I downloaded the source for the 9.2 kernel and there seems to be a mismatch between the version of firewire.c on the website you mentioned and what I have found in the kernel tree. The version in 9.2 seems to be 1.48 and the version on the website is 1.51. Actually, line 1323 in version 1.48 is in a different function... Which kernel is generated by the sources from the website?

nxr.netbsd.org has the current sources. The 9.x kernels are on a different branch, which is why you see the different versions. There is also the CVS history for the file at:

  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/ieee1394/firewire.c

Looking at that code though, it hasn't changed for a long time. So, I was wondering if some other change is causing that problem. It's tempting to try to increase the multiplier from 15 to something larger to see if we just need to wait longer [1], or just to try a kernel without FW to really check that it is the problem.

I've built a kernel from GENERIC without FW:

  http://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/netbsd

and if you are able to test boot that, then it would be useful [2]. There is also the version with full debugging symbols [3]:

  http://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/netbsd.gdb

and the kernel configuration that I used:

  http://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/GENERIC-NOFW

Regards,

Julian

[1] Instead of waiting, we might just be able to check if we are running with interrupts after start, like the check here:
    https://nxr.netbsd.org/xref/src/sys/arch/sparc/dev/ts102.c#1044
[2] My test netbsd-9 has a few local changes in some drivers, but nothing that should affect this.
[3] The .gdb file is useful because we can match a backtrace to a source line.
For example:

firewire_watchdog(101d8a040, 101d8a041, 0, 1cba038, 1cba038, 101d8a040) at netbsd:firewire_watchdog+0x48

:; gdb netbsd.gdb
(gdb) list *(firewire_watchdog+0x48)
...
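
The same lookup can be prepared for every frame of a captured backtrace. The following is a minimal Python sketch, not part of the original exchange: it only builds the `list *(function+0xoffset)` command strings to paste at the (gdb) prompt and does not run gdb itself. The helper name gdb_list_commands is hypothetical, and the sample frames are copied from the second backtrace earlier in this thread.

```python
import re

# One DDB frame looks like:
#   callout_softclock(1cba480, ...) at netbsd:callout_softclock+0x274
# The backreference \1 requires the same symbol before "(" and after "netbsd:".
FRAME_RE = re.compile(r"^(\w+)\(.*\)\s+at\s+netbsd:\1\+0x([0-9a-f]+)", re.MULTILINE)

TRACE = """\
callout_schedule_locked(101d88040, 101d88040, 1a80, 1cba478, 1cba478, 101d88040) at netbsd:callout_schedule_locked+0x94
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at netbsd:callout_softclock+0x274
"""


def gdb_list_commands(trace):
    """Build one gdb 'list' command per frame, in the order DDB printed them."""
    return [
        f"list *({m.group(1)}+0x{m.group(2)})"
        for m in FRAME_RE.finditer(trace)
    ]


if __name__ == "__main__":
    # Paste the printed lines at the (gdb) prompt after 'gdb netbsd.gdb'.
    for command in gdb_list_commands(TRACE):
        print(command)
```

The offsets are passed through unchanged as hexadecimal text, so the generated commands have the same form as the firewire_watchdog example.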