Subject: Re: Dealing with bad blocks on fixed disks
To: Darren Reed <darrenr@reed.wattle.id.au>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 04/13/2003 16:05:47
On Sun, Apr 13, 2003 at 10:49:01AM +1000, Darren Reed wrote:
>
> Having been bitten again(!) by a disk with errors, it occurred to me that
> NetBSD's handling of these errors was sub-optimal - the kernel would keep
> trying block X and attempts to access it would always result in an error
> that was considered "unrecoverable".
>
> I discussed this briefly with Jason who suggested a linked list might be
> suitable for the job, because, (a) only disks with errors would be impacted
> and (b) if the list is non-NULL, the disk is already "in bad shape". The
> logic seems fine to me :-) A few hours later and I've come up with some
> patches, below. Of the things I'm not sure about, malloc/free and cleaning
> up the SLIST created are top of the list.
>
> To round out the functionality, I added DIOCBSLIST (to get a list of the
> bad sectors) and DIOCBSFLUSH (flush the list of bad sectors). I'm not
> 100% sure if I've done DIOCBSLIST "correctly" but given the lack of sysctl
> at that level, I didn't see any other way to go about it. Amongst other
> things, the impact of changes to the bad sector structure on any
> application that uses DIOCBSLIST is of concern and whether or not that
> can be designed around. At first glance, it would appear that atactl(8)
> would be the appropriate place to start with for a user interface to these
> ioctls to the disks ?
Probably dkctl(8)
> Index: sys/disk.h
> ===================================================================
> RCS file: /cvsroot/src/sys/sys/disk.h,v
> retrieving revision 1.20
> diff -c -r1.20 disk.h
> *** sys/disk.h 2002/11/01 11:32:02 1.20
> --- sys/disk.h 2003/04/13 00:29:04
> ***************
> *** 182,187 ****
> --- 182,197 ----
> */
> TAILQ_HEAD(disklist_head, disk); /* the disklist is a TAILQ */
>
> + /*
> + * Bad sector lists per fixed disk
> + */
> + struct disk_badsectors {
> + SLIST_ENTRY(disk_badsectors) dbs_next;
> + u_int64_t dbs_min; /* min. sector number */
> + u_int64_t dbs_max; /* max. sector number */
> + struct timeval dbs_failedat; /* first failure at */
> + };
> +
> #ifdef _KERNEL
> extern int disk_count; /* number of disks in global disklist */
>
> Index: sys/dkio.h
> ===================================================================
> RCS file: /cvsroot/src/sys/sys/dkio.h,v
> retrieving revision 1.6
> diff -c -r1.6 dkio.h
> *** sys/dkio.h 2002/01/09 04:12:13 1.6
> --- sys/dkio.h 2003/04/13 00:29:04
> ***************
> *** 88,91 ****
> --- 88,95 ----
> /* sync disk cache */
> #define DIOCCACHESYNC _IOW('d', 118, int) /* sync cache (force?) */
>
> + /* bad sector list */
> + #define DIOCBSLIST _IOWR('d', 119, caddr_t) /* get list */
> + #define DIOCBSFLUSH _IO('d', 120) /* flush list */
> +
> #endif /* _SYS_DKIO_H_ */
> Index: dev/ata/wd.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/ata/wd.c,v
> retrieving revision 1.240
> diff -c -r1.240 wd.c
> *** dev/ata/wd.c 2003/04/03 22:18:23 1.240
> --- dev/ata/wd.c 2003/04/13 00:29:08
> ***************
> *** 288,293 ****
> --- 288,294 ----
> #else
> bufq_alloc(&wd->sc_q, BUFQ_DISKSORT|BUFQ_SORT_RAWBLOCK);
> #endif
> + SLIST_INIT(&wd->sc_bslist);
>
> wd->atabus = adev->adev_bustype;
> wd->openings = adev->adev_openings;
> ***************
> *** 423,428 ****
> --- 424,436 ----
> struct buf *bp;
> int s, bmaj, cmaj, i, mn;
>
> + /* Clean out the bad sector list */
> + while (!SLIST_EMPTY(&sc->sc_bslist)) {
> + void *head = SLIST_FIRST(&sc->sc_bslist);
> + SLIST_REMOVE_HEAD(&sc->sc_bslist, dbs_next);
> + free(head, M_TEMP);
> + }
> +
> /* locate the major number */
> bmaj = bdevsw_lookup_major(&wd_bdevsw);
> cmaj = cdevsw_lookup_major(&wd_cdevsw);
> ***************
> *** 525,530 ****
> --- 533,555 ----
>
> bp->b_rawblkno = blkno;
>
> + /*
> + * If the transfer about to be attempted contains only a block that
> + * is known to be bad then return an error for the transfer without
> + * even attempting to start a transfer up under the premis that we
> + * will just end up doing more retries for a transfer that will end
> + * up failing again.
> + * XXX:SMP - mutex required to protect with DIOCBSFLUSH
> + */
This should probably be done only for read. For a bad block, IDE disks usually
return an error on read, and remap the bad sector on write.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 24 ans d'experience feront toujours la difference
--