Subject: Re: PR 36963
To: David Holland <dholland+netbsd@eecs.harvard.edu>
From: Jan Danielsson <jan.m.danielsson@gmail.com>
List: tech-kern
Date: 09/19/2007 03:55:22
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

David Holland wrote:
> On Wed, Sep 19, 2007 at 02:39:02AM +0200, Jan Danielsson wrote:
>  > http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=36963
>  > [...]
> 
> Are you running a multiprocessor kernel?

   No; just a single processor. Unfortunately.

> (And if you haven't yet, try turning on DIAGNOSTIC and LOCKDEBUG.)

   I haven't done that; I'll be sure to do that on the next rebuild.

>  >    I would just like to check if anyone has any interest in this
>  > problem. 
> 
> Definitely - the more information you can collect, the better.

   Good; I just wanted to know that I'm not wasting my time posting all
this information to the PR.

> One useful thing to find out would be whether, after things stop
> working, the process credentials stored in the kernel are still
> correct or if they've been garbaged.

   As far as I can tell, they aren't garbled. I'm pretty sure I've done
this: I open a terminal window, su to my pkgsrc user, and run "ls -l",
which fails. Then I run the "jan" login/logout procedure, return to the
(still open) pkgsrc terminal window, and run "ls -l" without any
problems. I'm not 100% sure that I have done this, although I'm pretty
sure. I believe that would be unlikely to work if the credentials had
been garbled. (I can't test it right now, because I just logged in/out
with my normal user, so the permission problem is not "active".) I'll be
sure to try when it appears again.

> When things break, does running
> "id" print right or wrong information?

   "id" returns the correct information even when the problem is
"active". As does "user info".

> Also, does ls -l (or
> stat(2)/fstat(2) if necessary(*)) return the right owner/group and
> permissions for the affected files?

   What the fork? This is a behavior I hadn't noticed before (though I'm
very certain it has always been there -- I just haven't seen it until now):

nl102-238-202# su - pkgsrc
$ ls -l
ls: .: Permission denied
$ ls -l update_wip
-rwx------  1 pkgsrc  users  158 Jun 18 05:15 update_wip
$ ls -l
ls: .: Permission denied
$ ls -l upda*
ls: upda*: No such file or directory
$ ls -l *
ls: *: No such file or directory
$ ls -l .
ls: .: Permission denied
$ ls -l ..
ls: ..: Permission denied
$ ls -l /
ls: /: Permission denied
$ pwd
/home/pkgsrc
<Here I logged in, and then logged out, my "jan" user>
$ ls -l
total 152
-rwx------     1 pkgsrc  users    219 Sep  8 20:51 checkout_pkgsrc
-rwx------     1 pkgsrc  users    282 Jun 18 04:31 checkout_wip
-rwx------     1 pkgsrc  users    173 Sep  8 20:51 update_pkgsrc
-rwx------     1 pkgsrc  users    158 Jun 18 05:15 update_wip
drwxr-xr-x  1797 pkgsrc  users  68608 Aug 11 16:41 wip
$ ls -l /
total 18411
drwxr-xr-x   2 root  wheel      512 Sep  5 14:42 altroot
drwxr-xr-x   2 root  wheel     1024 Sep  5 16:57 bin
[---]

   It's actually only the first few commands which are interesting (the
rest is old news -- I just wanted to show my cool problem off some
more). Notice that "ls" works when I specify an exact file name(!). I'll
start working on my speech for when I accept the yearly "wtf?!"-prize. :(
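
   One possible reading of the transcript above (an assumption, not
something confirmed in the PR): "ls -l update_wip" only has to look up
one name, which goes through lstat(), while a bare "ls -l" has to open
and read the directory. The shell has to read the directory too in order
to expand "upda*", and it passes the pattern through literally when it
cannot. A minimal sketch that separates the two paths follows; the
function names are hypothetical and are not from the program in the PR.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: look up a single name without reading the
 * directory. Returns 0 on success, otherwise the errno from lstat(). */
int probe_lookup(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper: enumerate a directory. Returns the number of
 * entries read (>= 0), or -errno if opendir() itself fails. A readdir()
 * error in mid-stream simply ends the count early in this sketch. */
int probe_enumerate(const char *dir)
{
    DIR *d;
    int n = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -errno;
    while (readdir(d) != NULL)
        n++;
    closedir(d);
    return n;
}

/* Print both results side by side for one directory and one name. */
void probe_report(const char *dir, const char *name)
{
    int e = probe_lookup(name);
    int n = probe_enumerate(dir);

    printf("lstat(\"%s\"): %s\n", name, e ? strerror(e) : "ok");
    if (n < 0)
        printf("opendir(\"%s\"): %s\n", dir, strerror(-n));
    else
        printf("opendir(\"%s\"): ok, %d entries\n", dir, n);
}
```

   Calling probe_report(".", "update_wip") from a small main() while the
problem is "active" should show whether the lookup succeeds while the
enumeration fails with EACCES, which is what the "ls" output suggests.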

   Also, this is what happens when I run "cvs update -dP" in pkgsrc. It
appears to actually update the repository (although I don't think it
manages to create new directories), but it ends with:

[---]
cvs update: cannot open directory archivers/arc/files for empty check:
Permission denied
cvs update: cannot open directory archivers/arc for empty check:
Permission denied
cvs update: cannot open directory archivers/afio/patches for empty
check: Permission denied
cvs update: cannot open directory archivers/afio for empty check:
Permission denied
cvs update: cannot open directory archivers/advancecomp/patches for
empty check: Permission denied
cvs update: cannot open directory archivers/advancecomp for empty check:
Permission denied
cvs update: cannot open directory archivers/9e/patches for empty check:
Permission denied
cvs update: cannot open directory archivers/9e for empty check:
Permission denied
cvs update: cannot open directory archivers for empty check: Permission
denied


> (*) E.g., if once it breaks you can't ls -l, or even call stat()
> successfully, you can write a program that opens the file (or
> directory) before things break, waits until they do, and then uses
> fstat() on that file handle. This should get past any broken
> permissions checks. It might also conceivably prevent the problem from
> affecting the file in question...

   I'll do that. But I'm pretty sure that it's only file/directory
_enumerations_ which fail. Hmm... I'll try to write a program which
calls opendir(), lists a few entries with readdir() (or whatever it's
called), and then pauses before continuing. Then I'll instruct it to
continue once the bug reappears.
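
   A rough sketch of such a program, combining the fstat() suggestion
with the paused directory listing. It assumes dirfd() is available, and
the function names (list_some, stat_open_dir, run_probe) are made up for
illustration; this is not the program from the PR.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: print up to max entries from an already open
 * directory stream and return how many were read. */
int list_some(DIR *d, int max)
{
    struct dirent *e;
    int n = 0;

    assert(d != NULL);
    while (n < max && (e = readdir(d)) != NULL) {
        printf("  %s\n", e->d_name);
        n++;
    }
    return n;
}

/* Hypothetical helper: fstat() the directory through the descriptor
 * behind the stream, so no new path lookup is needed. */
int stat_open_dir(DIR *d, struct stat *st)
{
    assert(d != NULL && st != NULL);
    return fstat(dirfd(d), st);
}

/* Open the directory while things still work, list a few entries, wait
 * for Enter, then fstat() it and read the remaining entries.
 * Returns 0 on success, -1 if the initial opendir() fails. */
int run_probe(const char *path, int first_batch)
{
    struct stat st;
    DIR *d;
    int c;

    d = opendir(path);
    if (d == NULL) {
        perror("opendir");
        return -1;
    }
    printf("before pause:\n");
    list_some(d, first_batch);
    printf("press Enter once the problem is active...\n");
    fflush(stdout);
    while ((c = getchar()) != '\n' && c != EOF)
        continue;

    if (stat_open_dir(d, &st) == -1)
        perror("fstat");
    else
        printf("uid=%ld gid=%ld mode=%lo\n", (long)st.st_uid,
            (long)st.st_gid, (unsigned long)(st.st_mode & 07777));
    printf("after pause:\n");
    list_some(d, 1000000);	/* arbitrary large cap: read the rest */
    closedir(d);
    return 0;
}
```

   One caveat: readdir() usually fetches a whole block of entries at
once, so in a small directory the names printed after the pause may come
from a buffer filled before it. A large directory such as "wip" is more
likely to force a fresh read from the kernel after the pause.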

   Did you see the program I posted in the PR? It shows that opendir(".")
fails when the "bug" is "active"; but I can "cat" files I know to exist
without problems. I think that's a good place to start looking. Though I
may be staring myself blind on the opendir() issue. :/


- --
Kind regards,
Jan Danielsson

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (NetBSD)

iD8DBQFG8IGKuPlHKFfKXTYRCkoZAJ4rGFG9RJDFJNWGMmcc6Kg1Uw72NACfcmNr
2uiH/wQ9dru4AOq1thFZeBY=
=bq1C
-----END PGP SIGNATURE-----