Ryota,

I discovered that this crash on NetBSD-current kernels is triggered by a rarely encountered IPv6 routing configuration.
Here are the results of decoding the backtrace when it crashes (if_mcast_op is there):
address                  function       file:line
?() at ffffffff80205845  breakpoint     ??:?
?() at ffffffff804ee4e8  vpanic         src/sys/kern/subr_prf.c:343
?() at ffffffff805ec155  kern_assert    ??:?
?() at ffffffff8057ef93  if_mcast_op    src/sys/net/if.c:3595 (discriminator 1)
?() at ffffffff802ce97c  in6_addmulti   src/sys/netinet6/mld6.c:747 (discriminator 3)
?() at ffffffff802d333f  nd6_rtrequest  src/sys/netinet6/nd6.c:1585
?() at ffffffff805af696  rtrequest1     src/sys/net/route.c:1292
?() at ffffffff805b2aac  route_output   src/sys/net/rtsock.c:759
?() at ffffffff805b0c8a  route_send     src/sys/net/rtsock.c:473
?() at ffffffff80520989  sosend         src/sys/kern/uipc_socket.c:1075
?() at ffffffff80505e28  soo_write      src/sys/kern/sys_socket.c:122
?() at ffffffff804fa433  dofilewrite    src/sys/kern/sys_generic.c:350
?() at ffffffff804fa539  sys_write      src/sys/kern/sys_generic.c:320
?() at ffffffff8020f2bc  sy_call        src/sys/sys/syscallvar.h:66

After further testing I discovered the crash does not occur unless IPv6 routing to the VPN client is configured a certain way.
After looking at the decoded backtrace, which involves routing and IPv6, I learned that the crash is triggered by the -proxy modifier to the route command I was using in the ipv6-up script. Keep in mind that I am using a NetBSD 7 userland and the NetBSD 7 version of the route command. I do not know whether current's version of the route command also accepts the -proxy modifier.
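For anyone who wants to repeat the decoding step, the addr2line approach Ryota suggested can be scripted roughly as follows. This is only a sketch under assumptions: the function names extract_addrs and decode_bt are made up for illustration, the input is assumed to be saved ddb "bt" output containing 16-digit hex addresses, and the kernel image passed in must have been built with debug symbols.

```sh
#!/bin/sh
# Sketch only: extract_addrs and decode_bt are hypothetical helper names.

# Pull the 16-digit hex addresses out of ddb "bt" output such as
# "?() at ffffffff80205845". Splitting on whitespace first means it
# also works if several frames ended up on one line.
extract_addrs() {
    tr -s ' \t' '\n\n' | grep -E '^[0-9a-f]{16}$'
}

# Resolve every address read from stdin against the kernel image in $1.
# addr2line -f prints the function name and then file:line on separate
# lines; they are joined so each frame is one output line.
decode_bt() {
    kernel=$1
    extract_addrs | while read -r addr; do
        printf '%s ' "$addr"
        addr2line -f -e "$kernel" "$addr" | tr '\n' ' '
        printf '\n'
    done
}
```

Usage would look something like "decode_bt /path/to/netbsd.gdb < bt.txt", where both paths are placeholders for the debug kernel image and the saved bt output.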
More details:

A while ago I discovered that IPv6 connectivity to the VPN client requires a route to the peer to be added in pppd's ipv6-up script, which is called when ppp0 comes up after phase1 and phase2 are established, provided the +ipv6 option is set in /etc/ppp/options. So I included this line in my ipv6-up script (in ipv6-up, $4 is the local IPv6 address on the ppp link, $5 is the remote IPv6 address on the ppp link, and $1 is the ppp interface name):
/sbin/route add -inet6 $5%$1 $4%$1 -interface -proxy

This provided connectivity between the peers on the ppp link. I added the -proxy modifier hoping to make the VPN client appear to be on the link-local ethernet network (just as pppd's proxyarp option does in IPv4, proxy ndp can theoretically do this in IPv6). The -proxy modifier to the route command did not provide proxy ndp for IPv6 on NetBSD, and neither did the ndp proxy command, but it did not cause a system crash on NetBSD 7 or 8 kernels. On NetBSD current kernels, however, this -proxy modifier is what triggers the crash. I did find a solution for proxy ndp on NetBSD 7, but it required a patch to the NetBSD 7 kernel and use of the -proxy modifier in the route command.
When I do this instead in ipv6-up I do not see a crash:

/sbin/route add -inet6 $5%$1 $4%$1 -interface

Without the -proxy modifier to the route command, there is no crash and IPv4 connectivity for the VPN client works fine using the proxyarp option in pppd. For IPv6, I only have connectivity on the link-local ppp link, as expected when only using link-local addresses without proxy ndp.
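To make the argument order explicit, here is a minimal sketch of how the non-crashing route command is put together from the ipv6-up arguments. The helper name peer_route_args is hypothetical and only builds the argument string; it does not run route itself.

```sh
#!/bin/sh
# Sketch only: peer_route_args is a hypothetical helper, not part of pppd.
# It prints the arguments for the route command that does not crash
# (no -proxy modifier), given the interface name and the two addresses.
#   $1 = ppp interface name        (ipv6-up's $1)
#   $2 = local IPv6 address        (ipv6-up's $4)
#   $3 = remote IPv6 address       (ipv6-up's $5)
peer_route_args() {
    ifname=$1
    local_addr=$2
    remote_addr=$3
    # Destination is the remote address and the gateway is the local
    # address, both scoped to the ppp interface with %ifname.
    printf 'add -inet6 %s%%%s %s%%%s -interface\n' \
        "$remote_addr" "$ifname" "$local_addr" "$ifname"
}

# In an actual ipv6-up script this would be used roughly as:
#   /sbin/route $(peer_route_args "$1" "$4" "$5")
```

With example values ppp0, fe80::1 (local) and fe80::2 (remote), the helper prints "add -inet6 fe80::2%ppp0 fe80::1%ppp0 -interface". Those addresses are illustrative only.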
According to route's man page, the -proxy modifier sets the RTF_ANNOUNCE flag. As far as I can tell from the web interface for route's man page, -proxy is still valid for NetBSD 8.0, although it may no longer be available in NetBSD current, in which case this crash would never be seen on ordinary systems using current's route command. But if you use NetBSD 7's route command with the -proxy modifier on a current kernel, you will see this crash.
Chuck

On 05/29/2018 09:26 PM, Ryota Ozaki wrote:
> On Wed, May 30, 2018 at 7:02 AM Chuck Zmudzinski <frchuckz%gmail.com@localhost> wrote:
>> Ryota,
>>
>> Here is what I am getting with the crash. I do not know how to decode it.
>
> Please do addr2line -f -e <kernel_binary> <address> for each address. Or objdump -d <kernel_binary> and search functions containing each address from the output by hand.
>
> Or, if you can, build a kernel with 'makeoptions DEBUG="-g"' and use it, then you can get a backtrace with symbols on a panic.
>
> Thanks,
>   ozaki-r
>
>> I type bt and just get a bunch of hex numbers that I do not know how to interpret. I try sync and get a message: dumping to dev 142,1 (offset=6291455, size=0): not possible. After reboot, there is no core dump in /var/crash. Maybe it is somewhere else. I checked that I do have a dump device configured and I think I am still using the default values for savecore. What else can I try to decode this? I tried using a separate larger partition for /var/crash but that didn't make any difference.
>>
>> Chuck
>>
>> Here is the output from bt and sync from the db prompt:
>>
>> db{1}> bt
>> ?() at ffffffff80205845
>> ?() at ffffffff804ee4e8
>> ?() at ffffffff805ec155
>> ?() at ffffffff8057ef93
>> ?() at ffffffff802ce97c
>> ?() at ffffffff802d333f
>> ?() at ffffffff805af696
>> ?() at ffffffff805b2aac
>> ?() at ffffffff805b0c8a
>> ?() at ffffffff80520989
>> ?() at ffffffff80505e28
>> ?() at ffffffff804fa433
>> ?() at ffffffff804fa539
>> ?() at ffffffff8020f2bc
>> db{1}> sync
>> [ 1634.8391410] dumping to dev 142,1 (offset=6291455, size=0): not possible
>> [ 1634.8391410] rebooting...
>>
>> On 05/29/2018 04:42 AM, Ryota Ozaki wrote:
>>> On Fri, May 25, 2018 at 5:20 AM Maxime Villard <max%m00nbsd.net@localhost> wrote:
>>>> On 24/05/2018 at 21:13, Chuck Zmudzinski wrote:
>>>>> Well, the crash is repeatable on the one week old daily snapshot current kernel. Again, here is the current kernel I am using:
>>>>>
>>>>> NetBSD 8.99.17 (XEN3_DOMU) #0: Wed May 16 21:54:38 UTC 2018 mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/xen/compile/XEN3_DOMU
>>>>>
>>>>> What is happening is ... crazy.
>>>>>
>>>>> With the current kernel, when the remote client connects, we get caught in an endless loop of creating ipsec security associations. The log shows phase1 is created, then the phase2 associations, then we respond to negotiate a new phase1 and two new phase 2's, and I think this loop just continued until we ran out of memory. The Windows client actually thought we were connected and showed it was connected in the network control panel, but the racoon log never reported that a ppp interface was up. When you look at the attached snippets from the logs, I bet you will agree that many ppp interfaces and ipsec SAs were created and when we finally ran out of memory to create another one, we crashed. I say this because the trace indicated the crash occurred at this branch. [1].
>>>>>
>>>>> From the console at the start of the crash report, I got this:
>>>>>
>>>>> [ 334.5292103] panic: kernel diagnostic assertion "IFNET_LOCKED(ifp)" failed: file "/usr/src/sys/net/if.c", line 3595
>>>>>
>>>>> I don't understand line 3595 because if.c only has 661 lines, unless there was a mistake in how I copied it from the log.
>>>>
>>>> You're looking at the wrong revision of if.c, yours seems to be [1].
>>>>
>>>> The main issue here is that we reach this place with ifp unlocked. It's probably not related to the system running out of memory. That several entries get created in a loop appears to be a separate problem.
>>>>
>>>> I know that several changes were made in netbsd-current for MPification. It may be that you exercise a particular condition that breaks an assumption somewhere. Ryota, Kengo, could you have a look?
>>>
>>> I'm sorry, I've only looked at the mail now.
>>>
>>> Chuck, could you decode the backtrace of the panic? In this case the path to the assertion (probably in if_mcast_op) is important.
>>>
>>> Thanks,
>>>   ozaki-r
>>>
>>>> Thanks,
>>>> Maxime
>>>>
>>>> [1] https://nxr.netbsd.org/xref/src/sys/net/if.c?r=1.423#3595