Subject: "fatal machine check or error (unknown type)" from power supply issues?
To: NetBSD/Alpha Discussion List <port-alpha@NetBSD.org>
From: Greg A. Woods <woods@planix.ca>
List: port-alpha
Date: 12/01/2006 10:54:41
The following has now happened a couple of times on my customer's ES40.
The second crash coincided exactly with the removal of power from one of
its three (full N+1) power supplies, and there is now a good deal of
certainty that the last machine check panic also coincided with power
problems (they've been rebuilding their datacentre UPS and moving
machines back and forth between power sources):
fatal machine check or error (unknown type):
mces = 0x0
vector = 0x680
param = 0xfffffc0000006148
pc = 0xfffffc00003bde04
ra = 0xfffffc00003d3e14
code = 0x100000206
curproc = 0xfffffc00f5000008
pid = 29554, comm = pop3d
panic: machine check
Stopped in pid 29554 (pop3d) at cpu_Debugger+0x4: ret zero,(ra)
db{0}>
As you can see, though, the mces value is zero, which leaves the code
nothing to decode when it tries to determine the cause of the interrupt.
Is it possible there's some other value, besides what alpha_pal_rdmces()
returns, which should also be examined on these newer machines?
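To illustrate why a zero mces value gives the handler nothing to work
with, here is a minimal user-space sketch. It assumes the MCES status
bits are laid out as I recall them from NetBSD's alpha_cpu.h (MIP, SCE,
PCE in the low three bits); the helper name decode_mces() is
hypothetical and this is not the kernel's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed MCES status bits (believed to match NetBSD's alpha_cpu.h). */
#define MCES_MIP 0x01	/* machine check in progress */
#define MCES_SCE 0x02	/* system correctable error */
#define MCES_PCE 0x04	/* processor correctable error */

/*
 * Hypothetical helper: write a comma-separated list of the status bits
 * set in mces into buf, and return how many of them were set.
 */
static int
decode_mces(uint64_t mces, char *buf, size_t len)
{
	static const struct {
		uint64_t bit;
		const char *name;
	} bits[] = {
		{ MCES_MIP, "MIP" },
		{ MCES_SCE, "SCE" },
		{ MCES_PCE, "PCE" },
	};
	size_t used = 0;
	int nset = 0;

	if (len > 0)
		buf[0] = '\0';
	for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if ((mces & bits[i].bit) == 0)
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
		    nset ? "," : "", bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room: stop rather than overrun */
		used += (size_t)n;
		nset++;
	}
	return nset;
}
```

With the mces = 0x0 from the panic above, decode_mces() returns 0 and
leaves the buffer empty, so none of the three status bits can say
whether this was a machine check, a system correctable error, or a
processor correctable error.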
-- 
Greg A. Woods
Planix, Inc.
<woods@planix.com> +1 416 489-5852 x122 http://www.planix.com/