Re: IPsec debugging

To: Frank Wille <frank%phoenix.owl.de@localhost>
Subject: Re: IPsec debugging
From: Greg Troxel <gdt%ir.bbn.com@localhost>
Date: Mon, 29 Feb 2016 20:09:08 -0500

Frank Wille <frank%phoenix.owl.de@localhost> writes:

>>   run "route -n monitor" and save it to a file.   Look for failed route
>>   lookups that seem relevant.
>
> This produces an empty file. There is never anything in it.

That means that there are probably no redirect routes added due to
unreachables.  That's not surprising, but I always like to check for
trouble that is not expected.
>
>>   run "setkey -x" and save that.  This will probably just confirm that
>>   racoon is doing what it said.
>
> There is not much in it either. But I have seen some more output in previous
> test runs. You find it attached.

That's probably because the problem is that the phase1 is getting torn
down and phase2 is never really getting set up.  phase1 is the SA
between IKE implmeentations, and doesn't show up in the kernel, and
phase2 are the IPsec SAs.

> To be honest, I don't know if I need to run any setkey commands at all, when
> doing a completely certificate-based connection set up. In racoon.conf I
> have currently a "test.sh" script for phase1-up and -down, which does
> nothing besides echoing $LOCAL_ADDR, $REMOTE_ADDR, etc., although I never
> see that anywhere.

Generally I have had to install policies (setkey -P) to require IPsec,
in order to cause acquire messages to go to racoon.

>>   run 'tcpdump -s1500 -wFILE', and then go back and look  at the 5
>>   seconds very carefully.
>
> I attached the ASCII version of it, replacing all remote gateway addresses
> with 1.2.3.4. Will send you the pcap by private email.

Thanks for the pcap; it seems the same.   The real point is that the
remote side started sending unreachables, and...

> I also entered the Lancom console and logged the VPN status with "trace +
> vpn-status". It might provide some useful information (see attached). It
> shows that phase-1 looks good. The Lancom sends one keep-alive and receives
> an ACK. Then it receives a keep-alive and sends an ACK.
>
> But 10 seconds later, at 12:32:22, it prints "time out" and
> "IFC-R-Connection-timeout-dynamic", without any other reason...
>
> This seems to be within phase-2. From now on "phase 2/others I inf[E]" is no
> longer answered by the Lancom, although it happened twice before.

It does, but the remote side is apparently tearing down the phase1 SA,
which is why the IKE<>IKE messages on port 4500 started getting unreachables.

So we really need to get the lancom side to explain *why* it is deciding
there is a timeout and what it thinks should have happened.

>>   run 'setkey -D' and 'setkey -D -P' after negotiation and before the
>>   DPD failure.  Check that the SAs match.
>
> I never saw anything else than
>   No SAD entries.
>   No SPD entries.
> at any point of time.

So I really wonder how the traffic will get forced into the tunnel once
set up.  Perhaps the default is "use" these days.  I have always set up
SPD entries.

>> The big question in my mind is whether the DPD is wrong
>
> Probably not. I verified that both sides have DPD with 20 seconds enabled. I
> haven't set "natt_keepalive" yet, as I hoped the default value will do...

It is pretty clear that the NAT state is not timing out.  What I don't
understand is why there is no response to the keepalive. I would hook up
a hub to near the lancom and run tcpdump on that to see if the
keepalives are arriving and if there are responses.  And read the
protocol specs to see how this is supposed to work.

>> This shows that the ISAKMP SA was created, but no phase 2 SA.
>
> This is also my impression at the moment. Something goes wrong in phase 2?
>
> Hmm, but why is the phase1-up script not called?

This feels like a weak clue that phase1 is not really finished.

> Not really. How can I check that, besides seeing UDP packages exchanged on
> port 4500 from both sides?

You run tcpdump on both sides and see if the traffic is consistent.

> Thanks for looking into the problem! It would really be nice to get IPSec
> with NetBSD going, as my company intends to use NetBSD (rdesktop via VPN)
> on all their notebooks!

Wow!

You should also make sure clocks are synced before doing these tests.
They seem pretty close but not quite.

I think the biggest clue comes from interleaving the packets and messages:

  Feb 29 12:31:52 powerbook racoon: INFO: ISAKMP-SA established 192.168.1.5[4500]-1.2.3.4[4500] spi:4f5e1f08e12bd21c:2e8dc875b4e07c26 

  [VPN-Status] 2016/02/29 12:31:52,431
  IKE info: NOTIFY received of type INITIAL_CONTACT for peer VPNCLIENT15EF90
  12:31:52.467385 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others I inf[E]

  [VPN-Status] 2016/02/29 12:31:52,441
  IKE info: ISAKMP_NOTIFY_DPD_R_U_THERE sent for Phase-1 SA to peer VPNCLIENT15EF90, sequence nr 0x56e60865
  12:31:52.539912 IP 1.2.3.4.ipsec-nat-t > powerbook-wlan.owl.de.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others R inf[E]

  [VPN-Status] 2016/02/29 12:31:52,530
  IKE info: NOTIFY received of type ISAKMP_NOTIFY_DPD_R_U_THERE_ACK for peer VPNCLIENT15EF90 Seq-Nr 0x56e60865, expected 0x56e60865
  12:31:52.570866 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others I inf[E]

  [VPN-Status] 2016/02/29 12:32:12,430
  IKE info: NOTIFY received of type ISAKMP_NOTIFY_DPD_R_U_THERE for peer VPNCLIENT15EF90 Seq-Nr 0xea7, expected 0xea7
  12:32:12.486257 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others I inf[E]

  [VPN-Status] 2016/02/29 12:32:12,433
  IKE info: ISAKMP_NOTIFY_DPD_R_U_THERE_ACK sent for Phase-1 SA to peer VPNCLIENT15EF90, sequence nr 0xea7
  12:32:12.533086 IP 1.2.3.4.ipsec-nat-t > powerbook-wlan.owl.de.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others R inf[E]

  [these seem to result in no log entries]
  12:32:27.793879 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: isakmp-nat-keep-alive

  [VPN-Status] 2016/02/29 12:32:22,284
  VPN: connection for VPNCLIENT15EF90 (91.56.236.148) timed out: no response

  Feb 29 12:32:57 powerbook racoon: [1.2.3.4] INFO: DPD: remote (ISAKMP-SA spi=4f5e1f08e12bd21c:2e8dc875b4e07c26) seems to be dead. 
  Feb 29 12:32:57 powerbook racoon: INFO: purging ISAKMP-SA spi=4f5e1f08e12bd21c:2e8dc875b4e07c26. 
  Feb 29 12:32:57 powerbook racoon: INFO: purged ISAKMP-SA spi=4f5e1f08e12bd21c:2e8dc875b4e07c26. 
  Feb 29 12:32:57 powerbook racoon: INFO: ISAKMP-SA deleted 192.168.1.5[4500]-1.2.3.4[4500] spi:4f5e1f08e12bd21c:2e8dc875b4e07c26 
  Feb 29 12:32:57 powerbook racoon: INFO: KA remove: 192.168.1.5[4500]->1.2.3.4[4500] 

Note that the ISAKMP_NOTIFY_DPD_R_U_THERE was sent and rceeived.  Then
it was received and send, and the timeout is 10s following the received.
I don't follow this; it seems like there was no non-response to the DPD
messages at the server, and I wonder if it was failure to negotiate a
good enough phase 2 that led to the dropping and it's not about the DPD
at all.  But you will need to get more information out of the server,
which I realize may not be doable, or from the client.

Another idea is that the client isn't trying to negotiate a phase 2.  So
I wonder if the client thinks the phase1 is not complete.  But the
message says it is.

So the two things to run down are:

  why isn't the client trying to negotiate phase2

  why is the server declaring a timeout

Attachment: signature.asc
Description: PGP signature

Follow-Ups:
- Re: IPsec debugging
  - From: Frank Wille
- Re: IPsec debugging
  - From: Frank Wille

References:
- IPsec debugging
  - From: Frank Wille
- Re: IPsec debugging
  - From: Greg Troxel
- Re: IPsec debugging
  - From: Frank Wille

Prev by Date: Re: IPsec debugging
Next by Date: Re: IPsec debugging
Previous by Thread: Re: IPsec debugging
Next by Thread: Re: IPsec debugging
Indexes:

Home | Main Index | Thread Index | Old Index