NetBSD-Bugs archive
Re: kern/38856: Raidframe reconstruction locks my machine up
The following reply was made to PR kern/38856; it has been noted by GNATS.
From: Matthias Scheler <tron%zhadum.org.uk@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/38856: Raidframe reconstruction locks my machine up
Date: Thu, 5 Jun 2008 22:37:20 +0100
On 4 Jun 2008, at 14:45, martin%duskware.de@localhost wrote:
> Soon (sometimes immediately) after I start reconstruction, the machine
> locks up completely, I can't even break into ddb on the console. This
> happens within less than 10 minutes reliably on the affected raid set.
> The other raid (consisting of slightly slower disks, wd0 and wd1, see
> dmesg below) can be reconstructed. I saw the lockup there too once, but
> can't reproduce this.
I can reproduce this with NetBSD 3.x and 4.0 on this machine:
NetBSD 4.0 (BEAVER) #0: Sun Dec 16 14:36:33 CET 2007
tron%beaver.core.de@localhost:/usr/src/sys/arch/i386/compile/BEAVER
total memory = 1023 MB
avail memory = 1000 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
BIOS32 rev. 0 found at 0xf0e90
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Pentium 4 (686-class), 2018.08 MHz, id 0xf24
cpu0: features 3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 3febfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 3febfbff<FXSR,SSE,SSE2,SS,HTT,TM>
cpu0: "Intel(R) Pentium(R) 4 CPU 2.00GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: enabling thermal monitor 1 ... enabled.
cpu0: calibrating local timer
cpu0: apic clock running at 100 MHz
cpu0: 16 page colors
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
acpi0 at mainbus0: Advanced Configuration and Power Interface
acpi0: using Intel ACPI CA subsystem version 20060217
acpi0: X/RSDT: OemId <ASUS ,P4B266 ,42302e31>, AslId <MSFT,31313031>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 24-bit timer
mpacpi: could not get bus number, assuming bus 0
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
acpibut0 at acpi0 (PNP0C0C): ACPI Power Button
PNP0C01 [System Board] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0A03 [PCI/PCI-X Host Bridge] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
attimer0 at acpi0 (PNP0100): AT Timer
attimer0: io 0x40-0x43 irq 0
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
pcppi0 at acpi0 (PNP0800)
pcppi0: io 0x61
sysbeep0 at pcppi0
npx0 at acpi0 (PNP0C04)
npx0: io 0xf0-0xff irq 13
npx0: reported by CPUID; using exception 16
fdc0 at acpi0 (PNP0700)
fdc0: io 0x3f2-0x3f5,0x3f7 irq 6 drq 2
lpt0 at acpi0 (PNP0401)
lpt0: io 0x378-0x37f,0x778-0x77b irq 7 drq 3
com0 at acpi0 (PNP0501-1)
com0: io 0x3f8-0x3ff irq 4
com0: ns16550a, working fifo
com1 at acpi0 (PNP0501-2)
com1: io 0x2f8-0x2ff irq 3
com1: ns16550a, working fifo
pckbc0 at acpi0 (PNP0303): kbd port
pckbc0: io 0x60,0x64 irq 1
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
pcppi0: attached to attimer0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82845 Host (rev. 0x04)
agp0 at pchb0: aperture at 0xfe000000, size 0x800000
ppb0 at pci0 dev 1 function 0: Intel 82845 AGP (rev. 0x04)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ppb1 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x05)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
pdcide0 at pci2 dev 9 function 0
pdcide0: Promise Ultra100TX2/ATA Bus Master IDE Accelerator (rev. 0x01)
pdcide0: bus-master DMA support present
pdcide0: primary channel configured to native-PCI mode
pdcide0: using ioapic0 pin 21 (irq 12) for native-PCI interrupt
atabus0 at pdcide0 channel 0
pdcide0: secondary channel configured to native-PCI mode
atabus1 at pdcide0 channel 1
ex0 at pci2 dev 11 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
ex0: interrupting at ioapic0 pin 23 (irq 5)
ex0: MAC address 00:04:76:1a:33:b8
bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga0 at pci2 dev 12 function 0: S3 Trio32/64 (rev. 0x54)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801BA LPC Interface Bridge (rev. 0x05)
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801BA IDE Controller (ICH2) (rev. 0x05)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus2 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus3 at piixide0 channel 1
uhci0 at pci0 dev 31 function 2: Intel 82801BA USB Controller (rev. 0x05)
uhci0: interrupting at ioapic0 pin 19 (irq 11)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 31 function 4: Intel 82801BA USB Controller (rev. 0x05)
uhci1: interrupting at ioapic0 pin 23 (irq 5)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
isa0 at pcib0
ioapic0: enabling
timecounter: Timecounter "TSC" frequency 2018112920 Hz quality 800
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <WDC WD2500SB-01RFA0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pdcide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
wd1 at atabus1 drive 0: <SAMSUNG SP1203N>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 111 GB, 232632 cyl, 16 head, 63 sec, 512 bytes/sect x 234493056 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(pdcide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
wd2 at atabus2 drive 0: <WDC WD2500SB-01RFA0>
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd2: 32-bit data port
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd2(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
raid0: RAID Level 1
raid0: Components: component0[**FAILED**] /dev/wd2a
raid0: Total Sectors: 490234624 (239372 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
The symptoms are exactly the same:
The machine locks up hard while rebuilding the RAID. No panic, no chance
to get into the kernel debugger. The problem is very easily reproducible:
the machine locks up in about 75% of the attempts. It occasionally manages
to complete the parity rewrite but then freezes within the next few days.
The machine is currently running with a broken RAID 1 and has been up for
82 days without any problems.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/