Subject: Re: kern/36097: http fetch stall in networking code
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-bugs
Date: 03/30/2007 16:50:05
The following reply was made to PR kern/36097; it has been noted by GNATS.
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/36097: http fetch stall in networking code
Date: Fri, 30 Mar 2007 10:47:04 -0600
"Liam J. Foy" writes:
> The following reply was made to PR kern/36097; it has been noted by GNATS.
>
> From: "Liam J. Foy" <liamfoy@sepulcrum.org>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org, root@garbled.net
> Subject: Re: kern/36097: http fetch stall in networking code
> Date: Fri, 30 Mar 2007 17:36:05 +0100
>
> On 30 Mar 2007, at 16:55, Tim Rightnour wrote:
>
> > The following reply was made to PR kern/36097; it has been noted by
> > GNATS.
> >
> > From: Tim Rightnour <root@garbled.net>
> > To: gnats-bugs@NetBSD.org
> > Cc: netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
> > kern-bug-people@netbsd.org
> > Subject: Re: kern/36097: http fetch stall in networking code
> > Date: Fri, 30 Mar 2007 08:50:23 -0700 (MST)
> >
> > On 30-Mar-2007 YAMAMOTO Takashi wrote:
> >> i guess it's failing to transmit any packets with sack, or
> >> something like
> >> that.
> >> are you using any hw offloading?
> >
> > I've tested this on 4.0/i386 with a vr0 (no hardware offload that
> > I know of),
> > and 4.0/prep with an fxp0 with cpusaver turned on. Both acted
> > identically.
> >
> > At the time this was reported, a number of other people also
> > verified the same
> > behavior on their 4.0 machines.
> >
>
> You say this is fixed in current - we just need to find out who may have
> fixed this on purpose or by accident. Anyone know :-)?
I'm not convinced this was fixed in current... On the 4.99.16 box I
was testing on, it worked sometimes, but not all the time... If you
look at Tim's trace, you'll see:
11:05:12.765774 IP muumi.lnet.lut.fi.www > polaris.64773: . 87870:89330(1460) ack 178 win 1728
11:05:12.765780 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 32120
11:05:12.788019 IP muumi.lnet.lut.fi.www > polaris.64773: . 90790:92250(1460) ack 178 win 1728
11:05:12.788029 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 33580 <nop,nop,sack sack 1 {90790:92250} >
What I'd like to know is where the 89330:90790 packet has gone. In
all traces where this transfer failed it was due to a "missing
packet". In many cases, it was *exactly* this packet that went
missing. We correctly ack everything up to 89330 and SACK the
90790:92250 segment that arrived after the gap, but I have no idea why
the remote end doesn't retransmit the missing 89330:90790 packet like
it's supposed to (or why it would go missing again if it was
retransmitted)...
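For anyone who wants to check their own traces for the same kind of hole,
here is a rough sketch in Python. It assumes the tcpdump text format shown
in the trace above; the name find_holes and the regular expressions are
made up for illustration and are not part of tcpdump or any existing tool.
It only compares the cumulative ack with the first SACK block on each
packet sent by the receiver.

```python
import re

# Illustrative patterns for the tcpdump text format quoted in this message.
ACK_RE = re.compile(r"\back (\d+)")          # cumulative ack number
SACK_RE = re.compile(r"\{(\d+):(\d+)\}")     # first SACK block {start:end}


def find_holes(lines, receiver):
    """Return (start, end) sequence ranges the receiver reports as missing.

    A hole is the span between the cumulative ack and the start of the
    first SACK block on a packet sent by `receiver` (hypothetical helper).
    """
    holes = []
    for line in lines:
        # Only look at packets that originate from the receiving host.
        if f"IP {receiver}." not in line:
            continue
        ack = ACK_RE.search(line)
        sack = SACK_RE.search(line)
        if ack is None or sack is None:
            continue
        cum_ack = int(ack.group(1))
        sack_start = int(sack.group(1))
        if sack_start > cum_ack:
            holes.append((cum_ack, sack_start))
    return holes


# The four lines from Tim's trace, copied from this message.
TRACE = [
    "11:05:12.765774 IP muumi.lnet.lut.fi.www > polaris.64773: . 87870:89330(1460) ack 178 win 1728",
    "11:05:12.765780 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 32120",
    "11:05:12.788019 IP muumi.lnet.lut.fi.www > polaris.64773: . 90790:92250(1460) ack 178 win 1728",
    "11:05:12.788029 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 33580 <nop,nop,sack sack 1 {90790:92250} >",
]

if __name__ == "__main__":
    for start, end in find_holes(TRACE, "polaris"):
        print(f"missing {start}:{end} ({end - start} bytes)")
```

Run against the four lines above, this reports the single 1460-byte gap
89330:90790, which is the segment in question.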
Later...
Greg Oster