Subject: port-i386/32894: protection fault trap in tmx86_get_longrun_mode
To: None <port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: None <sysadmin@terc.edu>
List: netbsd-bugs
Date: 02/21/2006 19:30:01
>Number:         32894
>Category:       port-i386
>Synopsis:       protection fault trap in tmx86_get_longrun_mode
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Feb 21 19:30:00 +0000 2006
>Originator:     Robby Griffin
>Release:        NetBSD 3.0
>Organization:
TERC
>Environment:
NetBSD  3.0 NetBSD 3.0 (TERC_RLX) #4: Tue Feb 21 12:17:21 EST 2006  root@foiegras:/usr/src/sys/arch/i386/compile/TERC_RLX i386

>Description:
Booting a minimally edited NetBSD 3.0 GENERIC kernel on an RLX ServerBlade 1000t results in a protection fault during CPU initialization:

cpu0 at mainbus0: (uniprocessor)
cpu0: Transmeta Crusoe (586-class), 1000.14 MHz, id 0x543
cpu0: Processor revision 1.5.0.2
cpu0: Code Morphing Software Rev: 4.3.6-9-571
cpu0: 20030113 18:40 official release 4.3.6#1
kernel: protection fault trap, code=0
Stopped in pid 0.1 (swapper) at netbsd:tmx86_get_longrun_mode+0xe:      rdmsr
db> bt
tmx86_get_longrun_mode(0,c090c8c7,78,80860007,33303032) at netbsd:tmx86_get_longrun_mode+0xe
transmeta_cpu_info(c07cd6a0,543,c0842a80,24,0) at netbsd:transmeta_cpu_info+0x91
identifycpu(c07cd6a0,c07c0f60,1,c07b2a40,c090cdcc) at netbsd:identifycpu+0x68d
cpu_attach(c1fc9f80,c1fc9f40,c090ce50,c07b6684,29) at netbsd:cpu_attach+0x100
config_attach_loc(c1fc9f80,c07b6684,0,c090ce50,c045d334) at netbsd:config_attach_loc+0x284
mainbus_attach(0,c1fc9f80,0,c07c4b80,0) at netbsd:mainbus_attach+0x63
config_attach_loc(0,c07b5788,0,0,0) at netbsd:config_attach_loc+0x284
config_rootfound(c071d650,0,c090cf68,c042ae01,c072f3d1) at netbsd:config_rootfound+0x2c
cpu_configure(0,c083ca80,c090cfa0,c03700e8,0) at netbsd:cpu_configure+0x24
configure(0,0,0,0,0) at netbsd:configure+0x4a
main(0,0,0,0,0) at netbsd:main+0x2d4

If I disable the call to tmx86_get_longrun_mode during CPU initialization, the machine boots, but running sysctl -a still crashes it in the same way:

# sysctl -a
sysctl: warning: /var/run/dev.db: No such file or directory
kernel: protection fault trap, code=0
Stopped in pid 52.1 (sysctl) at netbsd:tmx86_get_longrun_mode+0xe:      rdmsr
db> t
tmx86_get_longrun_mode(0,1,0,0,1000272) at netbsd:tmx86_get_longrun_mode+0xe
sysctl_machdep_tm_longrun(cc37defc,0,bfbfe058,cc37def0,0) at netbsd:sysctl_machdep_tm_longrun+0x57
...

Here's dmesg from a successful boot with the call to tmx86_get_longrun_mode commented out:

NetBSD 3.0 (TERC_RLX) #4: Tue Feb 21 12:17:21 EST 2006
        root@foiegras:/usr/src/sys/arch/i386/compile/TERC_RLX
total memory = 1143 MB
avail memory = 1109 MB
BIOS32 rev. 0 found at 0xfd7b0
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Transmeta Crusoe (586-class), 1000.15 MHz, id 0x543
cpu0: Processor revision 1.5.0.2
cpu0: Code Morphing Software Rev: 4.3.6-9-571
cpu0: 20030113 18:40 official release 4.3.6#1
cpu0: features 84893f<FPU,VME,DE,PSE,TSC,MSR,CX8,SEP>
cpu0: features 84893f<CMOV,PN,MMX>
cpu0: "Transmeta(tm) Crusoe(tm) Processor TM5800"
cpu0: serial number 0000-0543-0000-342E-00AC-8C24
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Transmeta LongRun Northbridge (rev. 0x03)
Transmeta SDRAM Controller (RAM memory) at pci0 dev 0 function 1 not configured
Transmeta BIOS Scratchpad (RAM memory) at pci0 dev 0 function 2 not configured
pcib0 at pci0 dev 7 function 0
pcib0: Acer Labs M1543 PCI-ISA Bridge (rev. 0x00)
fxp0 at pci0 dev 9 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at irq 11
fxp0: May need receiver lock-up workaround
fxp0: Ethernet address 00:42:52:01:1b:a4
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1 at pci0 dev 10 function 0: i82559 Ethernet, rev 8
fxp1: interrupting at irq 10
fxp1: May need receiver lock-up workaround
fxp1: Ethernet address 00:42:52:01:1b:a5
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp2 at pci0 dev 11 function 0: i82559 Ethernet, rev 8
fxp2: interrupting at irq 7
fxp2: May need receiver lock-up workaround
fxp2: Ethernet address 00:42:52:01:1b:a6
inphy2 at fxp2 phy 1: i82555 10/100 media interface, rev. 4
inphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
aceride0 at pci0 dev 15 function 0
aceride0: Acer Labs M5229 UDMA IDE Controller (rev. 0xc3)
aceride0: bus-master DMA support present
aceride0: primary channel wired to compatibility mode
aceride0: primary channel interrupting at irq 14
atabus0 at aceride0 channel 0
aceride0: secondary channel wired to compatibility mode
aceride0: secondary channel interrupting at irq 15
atabus1 at aceride0 channel 1
Acer Labs M7101 Power Management Controller (miscellaneous bridge) at pci0 dev 17 function 0 not configured
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
isapnp0: no ISA Plug 'n Play devices found
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <FUJITSU MHT2060AS>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 57231 MB, 116280 cyl, 16 head, 63 sec, 512 bytes/sect x 117210240 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
wd1 at atabus1 drive 0: <FUJITSU MHT2060AS>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 57231 MB, 116280 cyl, 16 head, 63 sec, 512 bytes/sect x 117210240 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(aceride0:1:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA)
boot device: fxp2
root on fxp2

>How-To-Repeat:
Obtain an RLX ServerBlade 1000t. Edit the GENERIC or INSTALL kernel config to set CONS_OVERRIDE and force the console onto com0 (a workaround for the oddities of BIOS console redirection), then try to boot.

>Fix:
I'm not sure why reading an MSR causes a protection fault here. A horrible workaround is to forcibly disable LongRun support; if that is absolutely necessary, it would probably be better done through the kernel config:

--- usr/src/sys/arch/i386/i386/identcpu.c.orig	2005-07-18 16:48:58.000000000 -0400
+++ usr/src/sys/arch/i386/i386/identcpu.c	2006-02-21 14:17:26.000000000 -0500
@@ -1079,7 +1079,10 @@
 		info.text[64] = 0;
 		printf("%s: %s\n", ci->ci_dev->dv_xname, info.text);
 	}
-
+#if 0
+	/* XXX TERC - disabling this to prevent protection fault
+	 * XXX TERC - during boot of RLX ServerBlade 1000t (TM5800)
+	 */
 	if (nreg >= 0x80860007) {
 		crusoe_longrun = tmx86_get_longrun_mode();
 		tmx86_get_longrun_status(&crusoe_frequency,
@@ -1089,6 +1092,7 @@
 		    crusoe_longrun, crusoe_frequency, crusoe_voltage,
 		    crusoe_percentage);
 	}
+#endif
 }

 void
@@ -1097,8 +1101,13 @@
 	u_int nreg = 0, dummy;

 	CPUID(0x80860000, nreg, dummy, dummy, dummy);
+# if 0
+	/* XXX TERC - disabling this to prevent protection fault in
+	 * XXX TERC - sysctl on RLX ServerBlade 1000t (TM5800)
+	 */
 	if (nreg >= 0x80860007)
 		tmx86_has_longrun = 1;
+# endif
 }

 static const char n_support[] __attribute__((__unused__)) =
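
A narrower guard than the #if 0 blocks above might be to check the Transmeta feature flags before touching the LongRun MSRs. The sketch below is untested on this hardware and rests on an assumption this report does not confirm: that bit 1 of EDX from CPUID leaf 0x80860001 advertises LongRun (the bit Linux names X86_FEATURE_LONGRUN). The function name tmx86_longrun_usable and the macro names are hypothetical. Only the decision logic is shown, as plain C that compiles in userland; in the kernel the two inputs would come from the existing CPUID() macro in identcpu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Leaf the existing identcpu.c code already compares nreg against. */
#define TMX86_CPUID_LONGRUN_LEAF	0x80860007u

/*
 * ASSUMPTION: LongRun capability flag, EDX bit 1 of CPUID leaf
 * 0x80860001.  Not verified against the TM5800 in this report.
 */
#define TMX86_FEATURE_LONGRUN		0x00000002u

/*
 * Hypothetical helper: decide whether the LongRun MSRs may be read.
 *
 * max_leaf     - EAX returned by CPUID leaf 0x80860000 ("nreg" in the
 *                existing code)
 * features_edx - EDX returned by CPUID leaf 0x80860001
 *
 * Returns true only when the LongRun leaf exists AND the feature flag
 * is set, instead of relying on the leaf count alone.
 */
bool
tmx86_longrun_usable(uint32_t max_leaf, uint32_t features_edx)
{

	if (max_leaf < TMX86_CPUID_LONGRUN_LEAF)
		return false;
	return (features_edx & TMX86_FEATURE_LONGRUN) != 0;
}
```

If the assumption holds, both `nreg >= 0x80860007` tests in the patch above could call this helper instead of being compiled out. Whether the CPU in the 1000t actually clears that flag is unknown; it would need to be checked on the affected machine.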