Subject: kern/13275: IDE retries should be disableable
To: None <gnats-bugs@gnats.netbsd.org>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 06/21/2001 10:36:25
>Number: 13275
>Category: kern
>Synopsis: IDE retries should be disableable
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Thu Jun 21 07:40:02 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: John Hawkinson
>Release: -current of 21 May 2001
>Organization:
MIT
>Environment:
System: NetBSD zorkmid.mit.edu 1.5U NetBSD 1.5U (ZORKMID-$Revision: 1.10 $) #103: Mon May 21 21:10:47 EDT 2001 jhawk@zorkmid.mit.edu:/usr/local/netbsd-current/src/sys/arch/i386/compile/ZORKMID i386
>Description:
On a machine with a number of known uncorrectable bad blocks on
an IDE disk, it is rather irritating to have the kernel retry to read them
5 times, when each retry involves a lengthy delay, and you know that it is
never going to succeed.
>How-To-Repeat:
Inspecting sys/dev/ata/wd.c, it appears to be hardcoded to retry
five times:
106 #define WDIORETRIES 5 /* number of retries before giving up */
...
592 retry: /* Just reset and retry. Can we do more ? */
593 wdc_reset_channel(wd->drvp);
594 diskerr(bp, "wd", errbuf, LOG_PRINTF,
595 wd->sc_wdc_bio.blkdone, wd->sc_dk.dk_label);
596 if (wd->retries++ < WDIORETRIES) {
597 printf(", retrying\n");
598 callout_reset(&wd->sc_restart_ch, RECOVERYTIME,
599 wdrestart, wd);
600 return;
601 }
602 printf("\n");
It would be desirable to change this behavior, though I'm not certain how.
At the very least, it should be easily patchable, so perhaps someone could
just patch "wdioretries" to 0 in ddb.
It would also be reasonable to expose this to the user. Perhaps by a sysctl,
or some other mechanism?
Is there prior art from other OSs [or even Linux] in this area?
>Fix:
As a workaround, one can patch the code. Unfortunately, this is a bit
more of a pain than expected, because the compiler optimizes the test on line
596 (above) to a jnle, so turning off retries completely requires more
opcode synthesis than I'm generally willing to do on the fly. But to drop
it from 5 to 2, just patch as follows:
0xc031f0e2 <wddone+226>: pushl %esi
0xc031f0e3 <wddone+227>: call 0xc01c2d64 <diskerr>
0xc031f0e8 <wddone+232>: addl $0x1c,%esp
0xc031f0eb <wddone+235>: movl 0x40c(%ebx),%eax
0xc031f0f1 <wddone+241>: incl 0x40c(%ebx)
0xc031f0f7 <wddone+247>: cmpl $0x4,%eax
0xc031f0fa <wddone+250>: jg 0xc031f130 <wddone+304>
0xc031f0fc <wddone+252>: pushl $0xc0472702
0xc031f101 <wddone+257>: call 0xc01c5d50 <printf>
wddone+247 (wddone+0xf7) is the test in question.
0xc031f0f7 <wddone+247>: 0x7f04f883
patching it to 0x7f00f883 makes it a "cmpl $0, %eax", which
results in two reads, which is better than 6, but not quite
what I was hoping for.
(Yeah, I guess I could just replace the next opcode with an unconditional
jump...)
>Release-Note:
>Audit-Trail:
>Unformatted: