NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/52614: qemu virtual CD-ROM report read errors since recent wdc changes
The following reply was made to PR kern/52614; it has been noted by GNATS.
From: Andreas Gustafsson <gson%gson.org@localhost>
To: jdolecek%NetBSD.org@localhost
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/52614: qemu virtual CD-ROM report read errors since recent wdc changes
Date: Tue, 21 Aug 2018 11:29:54 +0300
Jaromir,
This bug is still causing large numbers of random installation
failures on the testbed. For example, the i386 install has failed
more than 200 times this year.
I added a bunch of debug printfs to the kernel to try to figure out
what's happening. Here's a summary of what I have found so far.
If there are other tests I can run to help debug this, please let me
know.
During a typical failed sysinst run, the "if (avail == 0)" branch
in ata_get_xfer_ext() was entered more than 9000 times, always with
flags == 0. From reading the code, this condition results in
ata_get_xfer_ext() returning NULL.
53 of the NULL-returning ata_get_xfer_ext() calls were from
wdc_atapi_scsipi_request(), causing sc_xfer->error to be
set to XS_RESOURCE_SHORTAGE.
One read from cd0a failed, with bp->b_error == 16 (EBUSY).
I'm not sure how to interpret these results - are these frequent NULL
returns from ata_get_xfer_ext() themselves the problem, or are they
expected and the bug is the scsipi code not recovering from them?
Here are the kernel log messages from the last few seconds leading up
to the cd0a read error, with the debug printfs in place:
[ 4066.9440596] ata_get_xfer_ext avail 0, flags 00000000
[ 4066.9440596] ata_get_xfer_ext() returned NULL
[ 4066.9440596] cd0(piixide0:0:1): adapter resource shortage
[ 4067.1440971] ata_get_xfer_ext avail 0, flags 00000000
[ 4068.9066925] ata_get_xfer_ext avail 0, flags 00000000
[ 4068.9066925] ata_get_xfer_ext() returned NULL
[ 4068.9066925] cd0(piixide0:0:1): adapter resource shortage
[ 4070.8755630] ata_get_xfer_ext avail 0, flags 00000000
[ 4070.8755630] ata_get_xfer_ext() returned NULL
[ 4070.8755630] cd0(piixide0:0:1): adapter resource shortage
[ 4071.0664209] ata_get_xfer_ext avail 0, flags 00000000
[ 4071.0664209] ata_get_xfer_ext avail 0, flags 00000000
[ 4071.0664209] ata_get_xfer_ext avail 0, flags 00000000
[ 4071.0664209] ata_get_xfer_ext avail 0, flags 00000000
[ 4072.8642938] ata_get_xfer_ext avail 0, flags 00000000
[ 4072.8642938] ata_get_xfer_ext() returned NULL
[ 4072.8642938] cd0(piixide0:0:1): adapter resource shortage
[ 4072.8642938] cddone error=16
[ 4072.8642938] cd0a: error (errno=16) reading fsbn 3565164 of 3565164-3565179 (cd0 bn 3565164; cn 35651 tn 0 sn 64)
This was produced with the following patches applied. The first
patch is to stop sysinst from intercepting the console output,
which will otherwise cause parts of the kernel messages to be lost.
Index: src/usr.sbin/sysinst/run.c
===================================================================
RCS file: /cvsroot/src/usr.sbin/sysinst/run.c,v
retrieving revision 1.5
diff -u -r1.5 run.c
--- src/usr.sbin/sysinst/run.c 30 Dec 2014 10:10:22 -0000 1.5
+++ src/usr.sbin/sysinst/run.c 15 Aug 2018 12:36:42 -0000
@@ -387,7 +387,9 @@
char *cp, *ncp;
struct termios rtt, tt;
struct timeval tmo;
+#if 0
static int do_tioccons = 2;
+#endif
(void)tcgetattr(STDIN_FILENO, &tt);
if (openpty(&master, &slave, NULL, &tt, win) == -1) {
@@ -401,6 +403,7 @@
ttysig_ignore = 1;
ioctl(master, TIOCPKT, &ttysig_ignore);
+#if 0
/* Try to get console output into our pipe */
if (do_tioccons) {
if (ioctl(slave, TIOCCONS, &do_tioccons) == 0
@@ -415,6 +418,7 @@
do_tioccons = 1;
}
}
+#endif
if (logfp)
fflush(logfp);
Index: src/sys/dev/ata/ata_subr.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ata/ata_subr.c,v
retrieving revision 1.4
diff -u -r1.4 ata_subr.c
--- src/sys/dev/ata/ata_subr.c 20 Oct 2017 07:06:07 -0000 1.4
+++ src/sys/dev/ata/ata_subr.c 15 Aug 2018 15:31:48 -0000
@@ -288,6 +288,7 @@
retry:
avail = ffs32(chq->queue_xfers_avail & mask);
if (avail == 0) {
+ printf("ata_get_xfer_ext avail 0, flags %08x\n", flags);
/*
* Catch code which tries to get another recovery xfer while
* already holding one (wrong recursion).
@@ -299,6 +300,7 @@
if (flags & C_WAIT) {
chq->queue_flags |= QF_NEED_XFER;
error = cv_wait_sig(&chq->queue_busy, &chp->ch_lock);
+ printf("ata_get_xfer_ext cv_wait_sig error=%d\n", error);
if (error == 0)
goto retry;
}
Index: src/sys/dev/scsipi/atapi_wdc.c
===================================================================
RCS file: /cvsroot/src/sys/dev/scsipi/atapi_wdc.c,v
retrieving revision 1.129
diff -u -r1.129 atapi_wdc.c
--- src/sys/dev/scsipi/atapi_wdc.c 17 Oct 2017 18:52:51 -0000 1.129
+++ src/sys/dev/scsipi/atapi_wdc.c 15 Aug 2018 15:26:24 -0000
@@ -389,6 +389,7 @@
xfer = ata_get_xfer_ext(atac->atac_channels[channel], false, 0);
if (xfer == NULL) {
+ printf("ata_get_xfer_ext() returned NULL\n");
sc_xfer->error = XS_RESOURCE_SHORTAGE;
scsipi_done(sc_xfer);
return;
Index: src/sys/dev/scsipi/cd.c
===================================================================
RCS file: /cvsroot/src/sys/dev/scsipi/cd.c,v
retrieving revision 1.341
diff -u -r1.341 cd.c
--- src/sys/dev/scsipi/cd.c 17 Jun 2017 22:35:50 -0000 1.341
+++ src/sys/dev/scsipi/cd.c 14 Aug 2018 17:47:47 -0000
@@ -591,6 +591,9 @@
if (obp->b_error)
obp->b_resid = obp->b_bcount;
+ if (obp->b_error)
+ printf("cd bounce error=%d\n", obp->b_error);
+
free(bounce, M_DEVBUF);
biodone(obp);
}
@@ -743,6 +746,7 @@
return;
bad:
+ printf("cdstrategy bad error=%d\n", error);
bp->b_error = error;
bp->b_resid = bp->b_bcount;
biodone(bp);
@@ -915,6 +919,8 @@
struct buf *bp = xs->bp;
if (bp) {
+ if (error)
+ printf("cddone error=%d\n", error);
bp->b_error = error;
bp->b_resid = xs->resid;
if (error) {
Index: src/sys/kern/subr_disk.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_disk.c,v
retrieving revision 1.122
diff -u -r1.122 subr_disk.c
--- src/sys/kern/subr_disk.c 7 Mar 2018 21:13:24 -0000 1.122
+++ src/sys/kern/subr_disk.c 14 Aug 2018 10:00:34 -0000
@@ -139,7 +139,7 @@
pr = addlog;
} else
pr = printf;
- (*pr)("%s%d%c: %s %sing fsbn ", dname, unit, partname, what,
+ (*pr)("%s%d%c: %s (errno=%d) %sing fsbn ", dname, unit, partname, what, bp->b_error,
bp->b_flags & B_READ ? "read" : "writ");
sn = bp->b_blkno;
if (bp->b_bcount <= DEV_BSIZE)
--
Andreas Gustafsson, gson%gson.org@localhost
Home |
Main Index |
Thread Index |
Old Index