NetBSD-Users archive


Occasional weird TCP behavior



We're seeing weird behavior when initiating SSH connections between
our customer shell hosts (all NetBSD/amd64 10.0). Although we see this
only on the SSH port, I don't think it's a problem with the SSH client
or server.

Sometimes connections return "Connection refused" at first, but succeed
the next time:

  shellhost2: telnet shellhost1 ssh
  Trying 10.0.0.1...
  telnet: Unable to connect to remote host: Connection refused

A second later:

  shellhost2: telnet shellhost1 ssh
  Trying 10.0.0.1...
  Connected to shellhost1.panix.com.
  Escape character is '^]'.
  SSH-2.0-OpenSSH_9.6 NetBSD_Secure_Shell-20240701-hpn13v14-lpk

Running ktrace, the client shows ECONNREFUSED on the connect, and the
server shows no activity.
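
For anyone trying to reproduce this, a loop along the following lines
could measure how often the refusal occurs. It is only a sketch: the
function name probe and its parameters are made up for illustration, and
the host and port are whatever you pass in.

```python
import errno
import socket
import time


def probe(host, port, attempts=100, delay=0.1, timeout=5.0):
    """Attempt `attempts` TCP connects and tally the outcomes.

    Returns a dict such as {"connected": 97, "ECONNREFUSED": 3}.
    """
    tally = {}
    for _ in range(attempts):
        try:
            # Open and immediately close the connection; only the
            # result of the handshake matters here.
            with socket.create_connection((host, port), timeout=timeout):
                outcome = "connected"
        except ConnectionRefusedError:
            outcome = "ECONNREFUSED"
        except OSError as exc:
            # Any other failure (timeout, unreachable, DNS) is tallied
            # under its errno name, or the exception type if no errno.
            outcome = errno.errorcode.get(exc.errno, type(exc).__name__)
        tally[outcome] = tally.get(outcome, 0) + 1
        time.sleep(delay)
    return tally
```

Calling probe("10.0.0.1", 22) from shellhost2 would exercise the same
path as the telnet test. Passing an address literal rather than a name
avoids create_connection falling back to another address family and
hiding a refusal.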

Looking at the packets on a failed connection shows this:

  client      server
  ------      --------
  SYN
              SYN,ACK
  RST

This suggests to me that either the server kernel sent something the
client kernel didn't like, or the client kernel is broken. I compared
the SYN,ACK packets from failed and successful connections and saw no
meaningful difference between them.
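
One field worth comparing between the failed and successful SYN,ACKs is
the acknowledgment number. Under RFC 9293, a client in SYN-SENT resets
any segment whose ACK is <= ISS or > SND.NXT, which right after a bare
SYN means anything other than ISS+1. A minimal check follows; the
function name is hypothetical, and the sample values are the sequence
numbers from the successful TCP_DEBUG trace later in this message.

```python
def synack_ack_acceptable(iss, seg_ack):
    """Return True if a SYN,ACK's ack number is acceptable in SYN-SENT.

    iss is the initial sequence number of the client's SYN; seg_ack is
    the acknowledgment number carried by the server's SYN,ACK. Sequence
    numbers are 32 bits wide, so the comparison wraps modulo 2**32.
    """
    return seg_ack == (iss + 1) % 2**32


# Values from the successful trace: SYN at 9895d57d, SYN,ACK acks 9895d57e.
print(synack_ack_acceptable(0x9895D57D, 0x9895D57E))
```

This rules out only one possible reason for the RST; a SYN,ACK with a
correct acknowledgment number can still be reset for other reasons.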

I have not been able to reproduce this on any other hosts. The hosts
in question take customer SSH sessions among other things, so there is
little firewalling.

Testing suggests these are not relevant:

  * IPv4 vs. IPv6
  * xennet vs. vioif (the hosts are Xen guests on a Linux host, but the
    problem exists even when not using Xen drivers)
  * ARP state (same behavior even when using statically configured ARP)

The problem does not appear to happen when the two guests are running
on the same Xen host, so it may still be Xen-related.

--------------------
Running "netstat -s" immediately before and after a failed connection
shows this:

  tcp:
    +1 connection requests
    +1 connections closed (including +0 drops)
    +1 embryonic connections dropped
    +1 dropped due to no socket
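
The before/after comparison above can be automated. The sketch below
assumes that "netstat -s" prints protocol headers such as "tcp:"
followed by indented lines that each contain one or more integer
counters; parse_counters and diff_counters are illustrative names, not
part of any NetBSD tool.

```python
import re

NUMBER = re.compile(r"\b\d+\b")


def parse_counters(text):
    """Map (section, line template) -> list of integers on that line."""
    counters = {}
    section = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # A header looks like "tcp:" and does not start with a digit.
        if stripped.endswith(":") and not stripped[0].isdigit():
            section = stripped[:-1]
            continue
        numbers = [int(n) for n in NUMBER.findall(stripped)]
        if numbers:
            # Replace each counter with "#" so the same line can be
            # matched across two snapshots.
            counters[(section, NUMBER.sub("#", stripped))] = numbers
    return counters


def diff_counters(before, after):
    """Return (section, line) pairs for counters that changed."""
    old = parse_counters(before)
    changed = []
    for key, new_numbers in parse_counters(after).items():
        old_numbers = old.get(key, [0] * len(new_numbers))
        deltas = [n - o for n, o in zip(new_numbers, old_numbers)]
        if any(deltas):
            section, template = key
            remaining = iter(deltas)
            # Fill each "#" with its signed delta, e.g. "+1" or "+0".
            line = re.sub("#", lambda m: f"{next(remaining):+d}", template)
            changed.append((section, line))
    return changed
```

Feeding it the output captured before and after a failed connect gives
a list of changed lines in the same "+1 ..." form as shown above.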

--------------------
I tried using the TCP_DEBUG kernel option to debug the connection, but
it doesn't seem useful to me:

Successful connection:

  852 SYN_SENT:output [9895d57d..9895d569)@0(win=8000)<SYN> -> SYN_SENT
  852 CLOSED:user CONNECT -> SYN_SENT
  852 SYN_SENT:input [eefe7dac..eefeb9ac)@9895d57e(win=8000)<SYN,ACK> -> ESTABLISHED
  852 ESTABLISHED:output [9895d57e..9895d56a)@eefe7dad(win=1065)<ACK> -> ESTABLISHED
  854 ESTABLISHED:input [eefe7dad..eefef0ad)@9895d57e(win=1065)<ACK,PUSH> -> ESTABLISHED
  [etc.]

Failed connection:

  810 SYN_SENT:output [26d7ae89..26d7ae75)@0(win=8000)<SYN> -> SYN_SENT
  810 CLOSED:user CONNECT -> SYN_SENT

Thanks.

--
- Brian


