Port-macppc archive
Re: lockups on 6.0.2 - progress?
Unfortunately, my experiment was not successful. When I ran my
test case using the rtk card/driver for networking, it crashed after a
few hours with the same symptoms. This suggests that it is NOT
the gem driver (or at least not *just* the gem driver).

I am encouraged that I can make it fail, but I am at
a loss as to where to go from here.

I don't have the expertise to debug the kernel, but I can run test
cases and report back. I can also provide accounts on my test machine,
which is on the net (charm.icompute.com).

I've been trying to think of a way to run the test case without *any*
network, but I don't know how useful that would be. I'll run a test tonight
with the wget script running on the same machine and using localhost.
If it crashes the same way, that would tend to rule out the ethernet
drivers.
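
For anyone who wants to try reproducing this, here is a minimal sketch of
the kind of load loop described, written in Python rather than as the
actual wget shell script. The function names (fetch, run_load) and the
URLs mentioned below are assumptions for illustration only, not the real
script or the real paths on the test machine.

```python
import urllib.request


def fetch(url, timeout=30):
    """Download one URL and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def run_load(urls, iterations=None, fetch_fn=fetch):
    """Fetch every URL in turn, back to back, with no delay.

    iterations=None loops until interrupted; a number stops after that
    many passes over the URL list. Returns (succeeded, failed) counts.
    """
    succeeded = 0
    failed = 0
    passes = 0
    while iterations is None or passes < iterations:
        for url in urls:
            try:
                fetch_fn(url)
                succeeded += 1
            except OSError:
                # URLError, HTTPError and socket timeouts all derive
                # from OSError; count the failure and keep going.
                failed += 1
        passes += 1
    return succeeded, failed
```

Calling run_load with two URLs, one for a small index file and one for a
file of about 1 Meg (for example the hypothetical http://localhost/index.html
and http://localhost/1meg.bin), and leaving iterations unset approximates
the two-wget loop pointed at localhost.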
-dgl-
>Hi,
>
>I reported problems with gem(4) on macppc as a bug (kern/46083).
>As the system board is now broken, I can no longer test myself
>(or confirm that it's something related to the driver or an
>already broken board; at least it was running with NetBSD 5.x
>and Linux without problems while on -6 I had an unstable gem(4)).
>
>Maybe both problems are related, even though I didn't see much
>output. Running makemandb over nfs was enough to break gem(4)
>connection. If you think it may be related, it might help
>to combine both bug-reports in a single PR (or if you haven't
>added any PR so far, add your experiences to the kern/46083).
>
>--
>Regards
>Matthias Kretschmer
>
>
>On Fri, May 31, 2013 at 09:03:42PM -0500, Donald Lee wrote:
>> I have been chasing lockups of NetBSD 6.0.1, and recently tried 6.0.2, and
>> have found that it locks up, too. My problem is that this is intermittent,
>> so the first task is to find a failing test case.
>>
>> I have a second machine set up that has hung up 3 times, twice with 6.0.2
>> and once with 6.0.1. The interesting difference is this in the log:
>>
>> May 29 13:00:00 charm syslogd[151]: restart
>> May 29 21:52:13 charm /netbsd: arp info overwritten for 71.39.101.62 by 20:76:00:10:7f:14
>> May 30 14:44:08 charm /netbsd: gem0: receive error: RX overflow sc->rxptr 75, complete 82
>> May 30 14:44:12 charm /netbsd: gem0: rx_watchdog: not in overflow state: 0x810400
>> May 30 14:44:12 charm /netbsd: gem0: rx_watchdog: wr pointer != saved
>> May 30 14:44:12 charm /netbsd: gem0: rx_watchdog: rd pointer != saved
>> May 30 14:44:12 charm /netbsd: gem0: resetting anyway
>> May 30 15:01:45 charm /netbsd: gem0: receive error: RX overflow sc->rxptr 20, complete 30
>> May 30 15:01:49 charm /netbsd: gem0: rx_watchdog: not in overflow state: 0x810400
>> May 30 15:01:49 charm /netbsd: gem0: rx_watchdog: wr pointer != saved
>> May 30 15:01:49 charm /netbsd: gem0: rx_watchdog: rd pointer != saved
>> May 30 15:01:49 charm /netbsd: gem0: resetting anyway
>> May 30 18:15:30 charm /netbsd: gem0: receive error: RX overflow sc->rxptr 58, complete 70
>> May 30 18:15:34 charm /netbsd: gem0: rx_watchdog: not in overflow state: 0x810400
>> May 30 18:15:34 charm /netbsd: gem0: rx_watchdog: wr pointer != saved
>> May 30 18:15:34 charm /netbsd: gem0: rx_watchdog: rd pointer != saved
>> May 30 18:15:34 charm /netbsd: gem0: resetting anyway
>> May 31 20:51:35 charm syslogd[151]: restart
>>
>>
>> I take this as a clue, and I am going to put in a PCI ethernet card (SMC)
>> and see if that behaves differently.
>>
>> Note that this message, the "watchdog" thing with the reset, is new in 6.0.2,
>> so I'm guessing that someone changed the gem driver - just a guess....
>>
>> I'll report back.
>>
>> It takes a day or two or three for the failure to occur. I originally
>> thought it was a failure that happened under heavy disk load, but it
>> turns out that at least with the last couple of failures, it happens
>> on an almost idle machine. The only "load" I have on it is a script that
>> does two wget's in a loop. One wget is of a small index file, and the other
>> is of a 1 Meg file. It does the wgets as fast as it can. It seems to cause the
>> problem in a couple of days.
>>
>> I have now swapped in the SMC ethernet card. Let's see if it still fails.
>> If not, then I have a workaround, and we have a possible driver bug to
>> fix.
>>
>> -dgl-