Port-sparc64 archive


Ultra5 hard lockups with 4.0_STABLE and hme status messages



I'm using an Ultra5-333 as a file server running RAIDframe RAID-R over
a Sun SixPack enclosure.

  NetBSD 4.0_STABLE (DEEPDISH) #4: Fri Feb 15 23:10:52 CST 2008 sparc64

It had been up for months under light load and had been completely idle
for a couple of days when it became completely unresponsive.

There was no console attached so I don't know what messages it may have
produced--just that when I hooked a terminal to it, it didn't respond--
not even to a BREAK.  It was wedged so hard I had to power-cycle it (with
the mains switch on the back of the PS) to get it going again.

I've experienced hangs like this occasionally in the past, but they
occurred under fairly heavy load, and the one or two times I had a
console attached the last message was "Watchdog Reset".  I don't recall
whether it actually dropped to OpenBoot or whether the console was usable.

I don't know if it's related, but once it was back up, I noticed a
series of messages like the following:

  hme0: status=8<CCNTEXP>
  hme0: status=40<CVCNTEXP>
  hme0: status=10009<GOTFRAME,CCNTEXP,RXTOHOST>
  hme0: status=108<CCNTEXP,SENTFRAME>

They are not always the same: mostly "8" followed by "40", and less
frequently the "108".  They seem to come in bursts, but I haven't
established any pattern for when they occur.  Most have been seen when
the machine is idle.  When there's no NFS traffic, the rest is mostly NTP.
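
One way to look for a pattern is to tally the messages by status value.
The following is only a rough sketch: the function name tally_status and
the regular expression are made up for illustration, and the sample
lines are just the four messages quoted above, not a real log.

```python
import re
from collections import Counter

# Matches kernel messages of the form "hme0: status=8<CCNTEXP>".
# The pattern is an assumption based on the four lines quoted above.
STATUS_RE = re.compile(r"(hme\d+): status=([0-9a-f]+)<([^>]*)>")


def tally_status(lines):
    """Count hme status messages per (device, hex status) pair."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative input only: the messages from this post.
    sample = [
        "hme0: status=8<CCNTEXP>",
        "hme0: status=40<CVCNTEXP>",
        "hme0: status=10009<GOTFRAME,CCNTEXP,RXTOHOST>",
        "hme0: status=108<CCNTEXP,SENTFRAME>",
    ]
    for (dev, status), n in tally_status(sample).most_common():
        print("%s status=%s seen %d time(s)" % (dev, status, n))
```

Feeding it the saved console or syslog output instead of the sample list
would show which status values dominate each burst.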

Looking at "hmereg.h", the descriptions of these status bits sound
rather benign, unless I'm just naive.
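
For reference, the hex values in those messages can be split back into
flag names.  This is a minimal sketch, not the driver's code: the bit
table below is inferred only from the four messages quoted above (the
authoritative definitions are in "hmereg.h" and cover more bits), and
decode_status is a name invented for this example.

```python
# Bit values inferred from the quoted console messages only;
# "hmereg.h" is the authoritative source and defines further bits.
HME_STATUS_BITS = {
    0x00001: "GOTFRAME",
    0x00008: "CCNTEXP",
    0x00040: "CVCNTEXP",
    0x00100: "SENTFRAME",
    0x10000: "RXTOHOST",
}


def decode_status(status):
    """Render a status word the way the quoted messages display it."""
    names = []
    known = 0
    # Walk the bits from lowest to highest, as the messages list them.
    for bit, name in sorted(HME_STATUS_BITS.items()):
        if status & bit:
            names.append(name)
            known |= bit
    # Report any bits this partial table does not cover.
    leftover = status & ~known
    if leftover:
        names.append("UNKNOWN(%x)" % leftover)
    return "status=%x<%s>" % (status, ",".join(names))


if __name__ == "__main__":
    for value in (0x8, 0x40, 0x10009, 0x108):
        print("hme0: " + decode_status(value))
```

Read this way, every message above includes either CCNTEXP or CVCNTEXP;
the other flags only record that a frame was sent or received.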

My network is extremely plain.  No VLANs or anything.  Just a couple of
LinkSys 100Mb desktop switches tying this machine to two other
SPARCstations (running NetBSD/sparc), two PowerMacs (Panther and Tiger)
and a wireless bridge to a NetBSD/i386 box and random test machines in
another room.

I'll leave a serial console attached to watch it for a while.

Do the hme status messages indicate a connection with the hard hang,
or possibly impending hardware failure?

Thanks.

--
John D. Baker, KN5UKS                    NetBSD     Darwin/MacOS X
jdbaker(at)mylinuxisp(dot)com                 OpenBSD            FreeBSD
BSD -- It just sits there and _works_!
GPG fingerprint:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

