Current-Users archive
Re: OpenVPN causes fresh -current to crash
On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo
<tih%hamartun.priv.no@localhost> wrote:
> Ryota Ozaki <ozaki-r%NetBSD.org@localhost> writes:
>
>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>
> I'll give it a go tonight, and report back.
Thanks.
>
> Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
> consequences for NFS? There's a problem that's been around for at least
> a couple of months, but that I only discovered the other day -- I was
> running with kernels from late October then, and the problem I observed
> is still there after upgrading.
I'm not sure. I don't know much about NFS: how it works, or how it
interacts with the network stack.
>
> Reading NFS file systems is no problem, which is why I didn't notice it
> before, but writing hangs. Here's an example: I started compiling a C
> source file directly to an executable on an NFS mounted file system
> (server and client both amd64 running fresh -current). The compile pass
> is fine, but when the ld end of the pipeline wants to write the
> executable, it hangs. So I try to do a 'df' in another terminal, and it
> hangs. Finally, I simply attempt to make 'ls -l [target executable]'
> show me if it's written anything yet, and that hangs, too: after an
> attempt to write has hung the communication up, reads no longer work,
> either:
>
> UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY     TIME    COMMAND
>   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
> 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
> 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
>
> Once I have something with "tstile" in the "WCHAN" column, I know that
> I can't just reboot the machine: it's going to take a hard reset.
Can you get into DDB? If so, you can find out where the processes hang:
db> ps # shows the LWP addresses of ld and ls
db> bt/a <LWP address> # shows the stack trace of that LWP
I guess ps will also show some other LWPs stuck on tstile, for example
softnet/N. Stack traces of those LWPs would explain how the hang
happens, or at least give hints for further investigation.
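Before dropping into DDB, it may help to list which userland processes are blocked on which wait channel, using a listing like the ps output quoted above. The following is only a minimal sketch in Python; `find_blocked` and `SAMPLE` are hypothetical names used for illustration, and it assumes whitespace-separated columns with a header line containing PID, WCHAN and COMMAND, and no empty fields.

```python
def find_blocked(ps_output, wchan="tstile"):
    """Return (pid, command) for each process whose WCHAN column equals wchan."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    wchan_col = header.index("WCHAN")
    cmd_col = header.index("COMMAND")
    blocked = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces,
        # so limit the number of splits to keep it in one piece.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # short or malformed line; skip it
        if fields[wchan_col] == wchan:
            blocked.append((int(fields[pid_col]), fields[cmd_col]))
    return blocked


# Sample input copied from the listing quoted earlier in this message.
SAMPLE = """\
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 22179 22678 0 124 0 33344 5136 netio D+ pts/17 0:00.01 ld [...]
501 21370 21006 516 85 0 8952 1144 nfsrcv I+ pts/18 0:00.00 df
501 21710 1 0 127 0 8964 1116 tstile D pts/20- 0:00.00 /bin/ls [...]
"""

print(find_blocked(SAMPLE))
```

This only narrows down which PIDs to look for; the kernel-side LWPs such as softnet/N still have to be found with ps inside DDB.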
>
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again. On
> the server, the output file got created, but is zero bytes. The error
> logged on the client when it gets stuck is this console output:
>
> nfs send error 64 for barsoom:/usr/local
>
> ...and then the normal "nfs server not responding" messages in syslog
> after that, of course.
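For reference, the number in "nfs send error 64" looks like an errno value, and in the BSD numbering 64 is EHOSTDOWN. The sketch below decodes it from a small hard-coded table, assuming the logged number really is a NetBSD errno; `BSD_ERRNO` and `decode` are hypothetical names, and the table is deliberately limited to two entries.

```python
# Small subset of the BSD errno numbering (as in NetBSD's <sys/errno.h>).
# Hard-coded on purpose: Python's own errno module reflects the host OS,
# and other systems (Linux, for example) number these errors differently.
BSD_ERRNO = {
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}


def decode(code):
    """Map a BSD errno number to (name, message); unknown codes get a placeholder."""
    return BSD_ERRNO.get(code, ("UNKNOWN", "not in this table"))


name, text = decode(64)
print(f"nfs send error 64 -> {name} ({text})")
```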
I tried an NFS client running -current against an NFS server running
netbsd-7, but writing didn't hang (I compiled a C program and ran
cp -r /etc/ /mnt/nfs).
The hang may depend on the NIC. Which NIC do you use?
Also, could you let me know the NFS options of the client and the server?
ozaki-r