Subject: kern/36896: LBA28 - LBA48 problem with Western Digital AAJS model hard drives
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <stuartb@cat.co.za>
List: netbsd-bugs
Date: 09/04/2007 13:20:00
>Number: 36896
>Category: kern
>Synopsis: LBA28 - LBA48 problem with Western Digital AAJS model hard drives
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Sep 04 13:20:00 +0000 2007
>Originator: Stuart Brooks
>Release: 3.1_STABLE
>Organization:
>Environment:
NetBSD test_77 3.1_STABLE NetBSD 3.1_STABLE (V5_GENERIC) #2: Mon Sep 3 12:15:38 SAST 2007 root@test_77:/usr/src/sys/arch/i386/compile/V5_GENERIC i386
>Description:
A block write that crosses the LBA28 boundary (sector 2^28 = 268435456) on large (200GB+) Western Digital hard drives with an AAJS model number corrupts the boot sectors of the drive. This has been seen on one WD2500AAJS and several WD5000AAJS drives.
A sample of the error logs follows:
Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1g: error writing fsbn 216369024 of 216369024-216369151 (wd1 bn 268435391; cn 266304 tn 15 sn 14), retrying
Aug 31 11:59:50 30_DEMO_697 /netbsd: wd1: (id not found)
Aug 31 11:59:51 30_DEMO_697 /netbsd: wd1: soft error (corrected)
At this point, a portion of the disk starting at sector 0 has been overwritten with part of the data block that was being written across the LBA28 boundary.
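The numbers in the log above can be checked by hand: the failing transfer starts at wd1 bn 268435391 and covers 128 sectors (fsbn 216369024-216369151), so it begins below sector 2^28 and ends above it. The following is only a sketch of that arithmetic; the helper names are hypothetical and are not taken from wd.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of sectors addressable with a 28-bit LBA: 2^28 = 268435456. */
#define LBA28_SECTORS (UINT64_C(1) << 28)

/*
 * Hypothetical helper: true if the transfer [start, start + count)
 * begins below the 28-bit limit and ends above it.
 */
static bool
straddles_lba28(uint64_t start, uint64_t count)
{
	return start < LBA28_SECTORS && start + count > LBA28_SECTORS;
}

/*
 * Hypothetical helper: how many sectors of the transfer still fit
 * below the 28-bit limit (0 if the transfer starts at or past it).
 */
static uint64_t
sectors_below_lba28(uint64_t start, uint64_t count)
{
	uint64_t room;

	if (start >= LBA28_SECTORS)
		return 0;
	room = LBA28_SECTORS - start;
	return count < room ? count : room;
}
```

For the logged transfer, straddles_lba28(268435391, 128) is true and sectors_below_lba28(268435391, 128) is 65, so the remaining 63 sectors of that write lie past the 28-bit limit.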
>How-To-Repeat:
To corrupt the drive:
dd if=/dev/zero of=/dev/rwd1d seek=134200 bs=1000k count=1000
To get a read failure:
dd if=/dev/rwd1d of=/dev/null skip=134200 bs=1000k count=1000
I have only seen this on the AAJS models.
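As a rough check that the dd commands above really cross the LBA28 boundary, the offsets can be converted to sector numbers. This sketch assumes 512-byte sectors and that dd's k suffix means 1024 bytes (so bs=1000k is 1024000 bytes, or 2000 sectors); the helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_BYTES   UINT64_C(512)            /* assumed sector size */
#define DD_BLOCK_BYTES (UINT64_C(1000) * 1024)  /* bs=1000k, with k = 1024 */
#define LBA28_SECTORS  (UINT64_C(1) << 28)      /* 2^28 = 268435456 */

/* Hypothetical helper: first disk sector covered by dd block `block`. */
static uint64_t
dd_block_first_sector(uint64_t block)
{
	return block * DD_BLOCK_BYTES / SECTOR_BYTES;
}

/* Hypothetical helper: index of the dd block that contains `sector`. */
static uint64_t
dd_block_containing_sector(uint64_t sector)
{
	return sector * SECTOR_BYTES / DD_BLOCK_BYTES;
}
```

With seek=134200 (or skip=134200) the transfer starts at sector 268400000, which is 35456 sectors short of 2^28. The block containing sector 2^28 is dd block 134217, so the 18th of the 1000 blocks (134217 - 134200 = 17, counting from zero) spans sectors 268434000 to 268435999 and straddles the boundary.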
>Fix:
I added the following entry to the wd_quirks array in src/sys/dev/ata/wd.c, which appeared to fix the problem:
> { "WDC WD[1-9][0-9][0-9][0-9]AAJS*",
> WD_QUIRK_FORCE_LBA48 },
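To see which model strings the pattern in this entry would select, it can be tried in userland. The sketch below uses fnmatch(3) as a stand-in; this is an assumption, since the kernel's own pattern matcher may differ in details, and the model strings mentioned in the note after the code are illustrative rather than taken from real drive output.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Pattern copied from the proposed wd_quirks entry. */
static const char aajs_pattern[] = "WDC WD[1-9][0-9][0-9][0-9]AAJS*";

/*
 * Hypothetical helper: true if `model` matches the quirk pattern
 * under fnmatch(3) rules (userland stand-in for the kernel matcher).
 */
static bool
model_matches_aajs(const char *model)
{
	return fnmatch(aajs_pattern, model, 0) == 0;
}
```

Under these rules the pattern accepts any "WDC WD" model with a four-digit number followed by AAJS, for example "WDC WD2500AAJS-00VTA0" or "WDC WD5000AAJS". It rejects a three-digit model such as "WDC WD800AAJS" and other suffixes such as "WDC WD2500AAKS-00VTA0".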