At Fri, 10 Jul 2009 11:55:22 -0600, Sverre Froyen <sverre%viewmark.com@localhost> wrote:
Subject: Re: reboot fails to reboot -- was Re: NFS
>
> You're right. Looks like "reboot -n" invokes that flag.

Ah yes, indeed -- I almost forgot about that, I was only thinking about
kernel stuff anyway... :-)

> In my case, that would have been preferable to jumping in a car and driving
> for 30 mins. Fsck was skipped anyway because of the "log" option -- perhaps
> unsafely, considering the state the file system was in.

I really hate PCs, especially PC "servers".

I want a modern system with real, and simple, lights-out remote
management, just like the old AlphaServers and their RMC, or the ILOM or
ALOM that Sun servers have. And something without a bass-ackwards
compatible BIOS, and of course all that means it will have real serial
console support in the firmware too.

I guess I should just shut up and shell out the bucks for a new(er) Sun
server, but sadly I think I'd still be stuck with their x86 or maybe AMD
platforms if I wanted to run NetBSD.

Of course I'm assuming you at least have a serial console connected for
remote management and that you didn't just drive to the machine to
"press any key". Proper firmware would not directly avoid that problem,
but it would at least make it easier to use a serial console properly.

> > A safer mode could probably be done fairly easily too by setting some
> > kind of watchdog timer before the unmounting of filesystems and other
> > sundry cleanup, and then forcing the system to reboot if the timer
> > expires.
>
> Agreed. This would all have to go into the kernel, correct?

Yes, indeed, it would have to be a kernel feature.

I haven't looked closely at the kernel shutdown sequences since the
1.6.x days but I think it should be easy enough to establish a timeout
around the file

> > (FYI, I have some changes, against netbsd-4, which much more reliably
> > reboot i386 machines using much more standard methods of rebooting too.)
>
> Could they be added to 5 and current?

I would imagine....

They're at the bottom of this diff (note some are still #if-0'ed out
because I haven't had time to figure out how to do them properly in the
NetBSD context):

Index: sys/arch/i386/i386/machdep.c
===================================================================
RCS file: /cvs/master/m-NetBSD/main/src/sys/arch/i386/i386/machdep.c,v
retrieving revision 1.586.2.5
diff -u -r1.586.2.5 machdep.c
--- sys/arch/i386/i386/machdep.c	28 Aug 2007 11:46:26 -0000	1.586.2.5
+++ sys/arch/i386/i386/machdep.c	3 Jul 2009 20:10:12 -0000
@@ -278,6 +278,14 @@
 phys_ram_seg_t mem_clusters[VM_PHYSSEG_MAX];
 int	mem_cluster_cnt;
 
+#ifdef VGA_APERTURE
+# ifdef INSECURE
+int allow_vga_aperture = 1;
+# else
+int allow_vga_aperture = 0;
+# endif
+#endif
+
 int	cpu_dump(void);
 int	cpu_dumpsize(void);
 u_long	cpu_dump_mempagecnt(void);
@@ -416,6 +424,12 @@
 	char pbuf[9];
 
 	/*
+	 * For console drivers that require uvm and pmap to be initialized,
+	 * we'll give them one more chance here...
+	 */
+	consinit();
+
+	/*
 	 * Initialize error message buffer (et end of core).
 	 */
 	if (msgbuf_p_cnt == 0)
@@ -635,6 +649,13 @@
 		       NULL, 0, NULL, 0,
 		       CTL_MACHDEP, CTL_EOL);
 
+#ifdef VGA_APERTURE
+	sysctl_createv(clog, 0, NULL, NULL,
+		       CTLFLAG_PERMANENT|CTLFLAG_READWRITE, /* XXX was using CTLFLAG_READONLY1 */
+		       CTLTYPE_INT, "allow_vga_aperture", NULL,
+		       NULL, 0, &allow_vga_aperture, 0,
+		       CTL_MACHDEP, CPU_ALLOW_VGA_APERTURE, CTL_EOL);
+#endif
 	sysctl_createv(clog, 0, NULL, NULL,
 		       CTLFLAG_PERMANENT,
 		       CTLTYPE_STRUCT, "console_device", NULL,
@@ -870,6 +891,7 @@
 void
 cpu_reboot(int howto, char *bootstr)
 {
+	int keyval = 0;
 
 	if (cold) {
 		howto |= RB_HALT;
@@ -877,6 +899,10 @@
 	}
 
 	boothowto = howto;
+	/*
+	 * XXX this bit, except for the "cold" check above, should be MI --
+	 * i.e. back in kern/kern_xxx.c:sys_reboot()
+	 */
 	if ((howto & RB_NOSYNC) == 0 && waittime < 0) {
 		waittime = 0;
 		vfs_shutdown();
@@ -888,7 +914,7 @@
 		resettodr();
 	}
 
-	/* Disable interrupts. */
+	/* block interrupts. */
 	splhigh();
 
 	/* Do a dump if requested. */
@@ -921,17 +947,16 @@
 		apm_set_powstate(NULL, APM_DEV_ALLDEVS, APM_SYS_OFF);
 		printf("WARNING: APM powerdown failed!\n");
 		/*
-		 * RB_POWERDOWN implies RB_HALT... fall into it...
+		 * RB_POWERDOWN includes RB_HALT... fall into it...
 		 */
 #endif
 	}
 
 	if (howto & RB_HALT) {
-		printf("\n");
-		printf("The operating system has halted.\n");
-		printf("Please press any key to reboot.\n\n");
+		printf("\nThe operating system has halted.\n"
+		       "Please press any key to reboot.\n\n");
 
-#ifdef BEEP_ONHALT
+#ifdef BEEP_ONHALT /* XXX could be: defined(BEEP_ONHALT_COUNT) && (BEEP_ONHALT_COUNT > 0) */
 		{
 			int c;
 			for (c = BEEP_ONHALT_COUNT; c > 0; c--) {
@@ -944,21 +969,61 @@
 		}
 #endif
 
-		cnpollc(1);	/* for proper keyboard command handling */
-		if (cngetc() == 0) {
-			/* no console attached, so just hlt */
-			for(;;) {
-				__asm volatile("hlt");
+		cnpollc(1);	/* for proper keyboard command handling without
+				 * interrupts */
+		/*
+		 * ACK!!!  The line discipline does _NOT_ get used from within
+		 * the kernel for console I/O (though it probably should be).
+		 *
+		 * If any output above went out too fast for the device
+		 * connected to a serial console then we'll read a <CTRL-S>
+		 * here, and/or perhaps a <CTRL-Q>, and we'll just have to
+		 * ignore them.
+		 */
+#define ASCII_XON	0x11
+#define ASCII_XOFF	0x13
+		do {
+			if ((keyval = cngetc()) == 0) {
+				/*
+				 * no console attached, or perhaps a BREAK
+				 * condition caused a read error, so just
+				 * invoke the HLT instruction and wait for the
+				 * operator to push the reset (or power)
+				 * button.
+				 */
+				printf("\nCannot read from the console, calling the HLT instruction.\n\n");
+				printf("RESET or power cycle the system to reboot.\n\n");
+
+				goto cpu_halt;
 			}
-		}
+#ifdef DEBUG
+			else if (keyval == ASCII_XOFF || keyval == ASCII_XON) {
+				printf("(ignoring flow control char (0x%x)\n", keyval);
+				/* XXX even this could trigger another XOFF, sigh... */
+			}
+#endif
+		} while (keyval == ASCII_XOFF || keyval == ASCII_XON);
+		cnpollc(0);
+#ifdef DEBUG /* XXX if booted with '-v' perhaps? */
+		if (keyval)
+			printf("(read key value 0x%x)\n\n", keyval);
+#endif
 	}
 
 	printf("rebooting...\n");
 	if (cpureset_delay > 0)
 		delay(cpureset_delay * 1000);
 	cpu_reset();
-	for(;;) ;
+	printf("cpu_reset() returned, waiting for hardware to reset or be reset...\n\n");
+
+ cpu_halt:
+	/*
+	 * XXX to halt or not to halt -- is one halt good enough?
+	 */
+	for (;;) {
+		__asm volatile("hlt");
+	}
 	/*NOTREACHED*/
 }
 
@@ -1435,7 +1500,7 @@
 
 	/* XXX XXX XXX */
 	if (mem_cluster_cnt >= VM_PHYSSEG_MAX)
-		panic("init386: too many memory segments "
+		panic("init386: add_mem_cluster(): too many memory segments "
 		    "(increase VM_PHYSSEG_MAX)");
 
 	seg_start = round_page(seg_start);
@@ -1577,16 +1642,10 @@
 	}
 
 #ifdef DEBUG_MEMLOAD
-	printf("mem_cluster_count: %d\n", mem_cluster_cnt);
+	printf("initial mem_cluster_cnt: %d\n", mem_cluster_cnt); /* XXX won't this always be zero here? */
 #endif
 
 	/*
-	 * Call pmap initialization to make new kernel address space.
-	 * We must do this before loading pages into the VM system.
-	 */
-	pmap_bootstrap((vaddr_t)atdevbase + IOM_SIZE);
-
-	/*
 	 * Check to see if we have a memory map from the BIOS (passed
 	 * to us by the boot program.
 	 */
@@ -1712,6 +1771,17 @@
 		avail_end = IOM_END + trunc_page(KBTOB(biosextmem));
 	}
 
+
+#ifdef DEBUG_MEMLOAD
+	printf("final mem_cluster_cnt: %d\n", mem_cluster_cnt);
+#endif
+
+	/*
+	 * Call pmap initialization to make new kernel address space.
+	 * We must do this before loading pages into the VM system.
+	 */
+	pmap_bootstrap((vaddr_t)atdevbase + IOM_SIZE);
+
 	/*
 	 * If we have 16M of RAM or less, just put it all on
 	 * the default free list. Otherwise, put the first
@@ -2191,10 +2261,48 @@
 #include <dev/ic/mc146818reg.h>	/* for NVRAM POST */
 #include <i386/isa/nvram.h>		/* for NVRAM POST */
 
+/* XXX some copied from sys/arch/x86/pci/pci_machdep.c, should be in <x86/include/pci_machdep.h> */
+/* XXX they're also used in sys/arch/i386/stand/lib/test/pci_user.c */
+#define PCI_MODE1_ENABLE	0x80000000UL
+#define PCI_MODE1_ADDRESS_REG	0x0cf8
+
+#define PCI_MODE1_RESET_CTL_REG	0x0cf9
+#define PCI_MODE1_DATA_REG	0x0cfc
+
+/*
+ * From Radisys 82600 High Integration Dual PCI System Controller Data Book:
+ *
+ * Bits 1 and 2 in this register are used by the 82600 to generate a hard reset
+ * or a soft reset. During a hard reset, the 82600 asserts CPURST# and LPRST#
+ * and resets its own core logic. BPRST# is also asserted if the 82600 is
+ * configured as the BPCI Central Resource. During a soft reset, the 82600
+ * only asserts INIT#.
+ *
+ * Bit	Description
+ *
+ * 7:3	Reserved.
+ *
+ * 2	Reset CPU (RCPU) - R/W. A transition of this bit from a 0 to a 1
+ *	initiates a reset. The type of reset is determined by bit 1. This bit
+ *	cannot be read as a 1.
+ *
+ * 1	System Reset (SRST) - R/W. This bit is used to select the type of
+ *	reset generated when bit 2 in this register transitions to a 1. A
+ *	value of 1 selects a hard reset and 0 selects a soft reset
+ *
+ * 0	Reserved.
+ */
+#define PCI_RESET_RCPU		(1 << 2)
+#define PCI_RESET_SRST		(1 << 1)
+
+#define PCAT_SYS_CTL_A		0x92	/* AT System Control Port A */
+#define PCAT_SYS_CTL_A_FRST	0x01	/* Fast Reset, aka Fast Init */
+
 void
 cpu_reset()
 {
 	struct region_descriptor region;
+	u_int8_t reg8;
 
 	disable_intr();
 
@@ -2220,40 +2328,104 @@
 	 * See AMD Geode SC1100 Processor Data Book, Revision 2.0,
 	 * sections 6.3.1, 6.3.2, and 6.4.1.
 	 */
-	if (cpu_info_primary.ci_signature == 0x540) {
-		outl(0xcf8, 0x80009044ul);
-		outl(0xcfc, 0xf);
+	/* XXX OpenBSD puts this in sys/arch/i386/pci/geodesc.c:sc1100_sysreset() */
+	if (cpu_info_primary.ci_signature == 0x540) { /* CPU_GEODE1100??? */
+#ifdef DEBUG
+		printf("cpu_reset(): trying AMD Geode SC1100 PCI-bus system reset...\n");
+#endif
+		outl(PCI_MODE1_ADDRESS_REG, PCI_MODE1_ENABLE | 0x9044);
+		outl(PCI_MODE1_DATA_REG, 0xf);
+	}
+
+	/* XXX OpenBSD has ACPI reset stuff in sys/dev/acpi/acpi.c:acpi_reset() */
+
+	/*
+	 * Try the PCI system & cpu reset _first_ (from FreeBSD and GNU/Linux)
+	 *
+	 * Write 0x6 to PCI Reset Control Register (0xcf9) to reset the CPU,
+	 * the PCI controller itself, and to trigger a system-wide reset.
+	 *
+	 * This is the best method for all recent and modern systems that
+	 * include any form of PCI controller.
+	 */
+#ifdef DEBUG
+	printf("cpu_reset(): trying generic PCI-bus system & CPU reset...\n");
+#endif
+	reg8 = inb(PCI_MODE1_RESET_CTL_REG);
+	reg8 |= PCI_RESET_RCPU | PCI_RESET_SRST;
+	outb(PCI_MODE1_RESET_CTL_REG, reg8);
+	delay(500000);		/* wait 0.5 sec to see if that did it */
+
+	/*
+	 * Try reset by setting the 0x01 bit of the System Control Port A
+	 * (0x92) (also from FreeBSD)
+	 *
+	 * This is the second-best way to reset any i386 system, and should
+	 * work on everything back to the PC/AT.
+	 *
+	 * Note this doesn't work (or didn't) in VMware.
+	 */
+	reg8 = inb(PCAT_SYS_CTL_A);
+	/* Check the the hardware actually has the port in question */
+	if (reg8 != 0xff) {
+#ifdef DEBUG
+		printf("cpu_reset(): trying PC/AT System Control Port A reset...\n");
+#endif
+		/* FAST_INIT must be zero before a one can be written */
+		if ((reg8 & PCAT_SYS_CTL_A_FRST) != 0)
+			outb(PCAT_SYS_CTL_A, reg8 & ~PCAT_SYS_CTL_A_FRST);
+		outb(PCAT_SYS_CTL_A, reg8 | PCAT_SYS_CTL_A_FRST);
+		delay(500000);	/* wait 0.5 sec to see if that did it */
 	}
 
 	/*
-	 * The keyboard controller has 4 random output pins, one of which is
-	 * connected to the RESET pin on the CPU in many PCs. We tell the
-	 * keyboard controller to pulse this line a couple of times.
+	 * The keyboard controller has 4 output pins, one of which is connected
+	 * to the CPU RESET line in many PCs. We tell the keyboard controller
+	 * to pulse this line a couple of times.
+	 *
+	 * XXX FreeBSD only does it once, then waits for 0.5 sec
 	 */
+#ifdef DEBUG
+	printf("cpu_reset(): trying keyboard controller reset...\n");
+#endif
 	outb(IO_KBD + KBCMDP, KBC_PULSE0);
 	delay(100000);
 	outb(IO_KBD + KBCMDP, KBC_PULSE0);
-	delay(100000);
+	delay(500000);
 
 	/*
 	 * Try to cause a triple fault and watchdog reset by making the IDT
 	 * invalid and causing a fault.
 	 */
+#ifdef DEBUG
+	printf("cpu_reset(): trying IDT invalid with divide-by-zero fault reset...\n");
+#endif
 	memset((caddr_t)idt, 0, NIDT * sizeof(idt[0]));
 	setregion(&region, idt, NIDT * sizeof(idt[0]) - 1);
 	lidt(&region);
 	__asm volatile("divl %0,%1" : : "q" (0), "a" (0));
 
-#if 0
+#if 0 /* XXX FreeBSD and OpenBSD actually do resort to this as a last go */
+	/* XXX unfortunately PTD is no longer defined in NetBSD's sys/arch/i386/include/pmap.h.... */
 	/*
 	 * Try to cause a triple fault and watchdog reset by unmapping the
 	 * entire address space and doing a TLB flush.
 	 */
+#ifdef DEBUG
+	printf("cpu_reset(): trying TLB flush watchdog reset...\n");
+#endif
 	memset((caddr_t)PTD, 0, PAGE_SIZE);
 	tlbflush();
 #endif
 
-	for (;;);
+#if 0
+	/*
+	 * XXX we could also try the BIOS cold boot, but do we have to be in
+	 * real mode first?....
+	 */
+	__asm("movw $0x0000, $0x472;
+	       ljmp $0xffff, $0x0000");
+#endif
 }
 
 void

-- 
Greg A. Woods

+1 416 218-0098  VE3TCP  RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>  Secrets of the Weird <woods%weird.com@localhost>