Frank Wille <frank%phoenix.owl.de@localhost> writes: >> run "route -n monitor" and save it to a file. Look for failed route >> lookups that seem relevant. > > This produces an empty file. There is never anything in it. That means that there are probably no redirect routes added due to unreachables. That's not surprising, but I always like to check for trouble that is not expected. > >> run "setkey -x" and save that. This will probably just confirm that >> racoon is doing what it said. > > There is not much in it either. But I have seen some more output in previous > test runs. You find it attached. That's probably because the problem is that the phase1 is getting torn down and phase2 is never really getting set up. phase1 is the SA between IKE implmeentations, and doesn't show up in the kernel, and phase2 are the IPsec SAs. > To be honest, I don't know if I need to run any setkey commands at all, when > doing a completely certificate-based connection set up. In racoon.conf I > have currently a "test.sh" script for phase1-up and -down, which does > nothing besides echoing $LOCAL_ADDR, $REMOTE_ADDR, etc., although I never > see that anywhere. Generally I have had to install policies (setkey -P) to require IPsec, in order to cause acquire messages to go to racoon. >> run 'tcpdump -s1500 -wFILE', and then go back and look at the 5 >> seconds very carefully. > > I attached the ASCII version of it, replacing all remote gateway addresses > with 1.2.3.4. Will send you the pcap by private email. Thanks for the pcap; it seems the same. The real point is that the remote side started sending unreachables, and... > I also entered the Lancom console and logged the VPN status with "trace + > vpn-status". It might provide some useful information (see attached). It > shows that phase-1 looks good. The Lancom sends one keep-alive and receives > an ACK. Then it receives a keep-alive and sends an ACK. > > But 10 seconds later, at 12:32:22, it prints "time out" and > "IFC-R-Connection-timeout-dynamic", without any other reason... > > This seems to be within phase-2. From now on "phase 2/others I inf[E]" is no > longer answered by the Lancom, although it happened twice before. It does, but the remote side is apparently tearing down the phase1 SA, which is why the IKE<>IKE messages on port 4500 started getting unreachables. So we really need to get the lancom side to explain *why* it is deciding there is a timeout and what it thinks should have happened. >> run 'setkey -D' and 'setkey -D -P' after negotiation and before the >> DPD failure. Check that the SAs match. > > I never saw anything else than > No SAD entries. > No SPD entries. > at any point of time. So I really wonder how the traffic will get forced into the tunnel once set up. Perhaps the default is "use" these days. I have always set up SPD entries. >> The big question in my mind is whether the DPD is wrong > > Probably not. I verified that both sides have DPD with 20 seconds enabled. I > haven't set "natt_keepalive" yet, as I hoped the default value will do... It is pretty clear that the NAT state is not timing out. What I don't understand is why there is no response to the keepalive. I would hook up a hub to near the lancom and run tcpdump on that to see if the keepalives are arriving and if there are responses. And read the protocol specs to see how this is supposed to work. >> This shows that the ISAKMP SA was created, but no phase 2 SA. > > This is also my impression at the moment. Something goes wrong in phase 2? > > Hmm, but why is the phase1-up script not called? This feels like a weak clue that phase1 is not really finished. > Not really. How can I check that, besides seeing UDP packages exchanged on > port 4500 from both sides? You run tcpdump on both sides and see if the traffic is consistent. > Thanks for looking into the problem! It would really be nice to get IPSec > with NetBSD going, as my company intends to use NetBSD (rdesktop via VPN) > on all their notebooks! Wow! You should also make sure clocks are synced before doing these tests. They seem pretty close but not quite. I think the biggest clue comes from interleaving the packets and messages: Feb 29 12:31:52 powerbook racoon: INFO: ISAKMP-SA established 192.168.1.5[4500]-1.2.3.4[4500] spi:4f5e1f08e12bd21c:2e8dc875b4e07c26 [VPN-Status] 2016/02/29 12:31:52,431 IKE info: NOTIFY received of type INITIAL_CONTACT for peer VPNCLIENT15EF90 12:31:52.467385 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others I inf[E] [VPN-Status] 2016/02/29 12:31:52,441 IKE info: ISAKMP_NOTIFY_DPD_R_U_THERE sent for Phase-1 SA to peer VPNCLIENT15EF90, sequence nr 0x56e60865 12:31:52.539912 IP 1.2.3.4.ipsec-nat-t > powerbook-wlan.owl.de.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others R inf[E] [VPN-Status] 2016/02/29 12:31:52,530 IKE info: NOTIFY received of type ISAKMP_NOTIFY_DPD_R_U_THERE_ACK for peer VPNCLIENT15EF90 Seq-Nr 0x56e60865, expected 0x56e60865 12:31:52.570866 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others I inf[E] [VPN-Status] 2016/02/29 12:32:12,430 IKE info: NOTIFY received of type ISAKMP_NOTIFY_DPD_R_U_THERE for peer VPNCLIENT15EF90 Seq-Nr 0xea7, expected 0xea7 12:32:12.486257 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others I inf[E] [VPN-Status] 2016/02/29 12:32:12,433 IKE info: ISAKMP_NOTIFY_DPD_R_U_THERE_ACK sent for Phase-1 SA to peer VPNCLIENT15EF90, sequence nr 0xea7 12:32:12.533086 IP 1.2.3.4.ipsec-nat-t > powerbook-wlan.owl.de.ipsec-nat-t: NONESP-encap: isakmp: phase 2/others R inf[E] [these seem to result in no log entries] 12:32:27.793879 IP powerbook-wlan.owl.de.ipsec-nat-t > 1.2.3.4.ipsec-nat-t: isakmp-nat-keep-alive [VPN-Status] 2016/02/29 12:32:22,284 VPN: connection for VPNCLIENT15EF90 (91.56.236.148) timed out: no response Feb 29 12:32:57 powerbook racoon: [1.2.3.4] INFO: DPD: remote (ISAKMP-SA spi=4f5e1f08e12bd21c:2e8dc875b4e07c26) seems to be dead. Feb 29 12:32:57 powerbook racoon: INFO: purging ISAKMP-SA spi=4f5e1f08e12bd21c:2e8dc875b4e07c26. Feb 29 12:32:57 powerbook racoon: INFO: purged ISAKMP-SA spi=4f5e1f08e12bd21c:2e8dc875b4e07c26. Feb 29 12:32:57 powerbook racoon: INFO: ISAKMP-SA deleted 192.168.1.5[4500]-1.2.3.4[4500] spi:4f5e1f08e12bd21c:2e8dc875b4e07c26 Feb 29 12:32:57 powerbook racoon: INFO: KA remove: 192.168.1.5[4500]->1.2.3.4[4500] Note that the ISAKMP_NOTIFY_DPD_R_U_THERE was sent and rceeived. Then it was received and send, and the timeout is 10s following the received. I don't follow this; it seems like there was no non-response to the DPD messages at the server, and I wonder if it was failure to negotiate a good enough phase 2 that led to the dropping and it's not about the DPD at all. But you will need to get more information out of the server, which I realize may not be doable, or from the client. Another idea is that the client isn't trying to negotiate a phase 2. So I wonder if the client thinks the phase1 is not complete. But the message says it is. So the two things to run down are: why isn't the client trying to negotiate phase2 why is the server declaring a timeout
Attachment:
signature.asc
Description: PGP signature