NetBSD-Bugs archive


bin/59058: env(1) exit status can be incorrect



>Number:         59058
>Category:       bin
>Synopsis:       env(1) exit status can be incorrect
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 09 04:10:00 +0000 2025
>Originator:     Robert Elz
>Release:        NetBSD 10.99.12
>Organization:
>Environment:
System: NetBSD jacaranda.noi.kre.to 10.99.12 NetBSD 10.99.12 (JACARANDA:1.1-20250119) #172: Sun Jan 19 08:59:18 +07 2025 kre%jacaranda.noi.kre.to@localhost:/usr/obj/testing/kernels/amd64/JACARANDA amd64
Architecture: x86_64
Machine: amd64
>Description:

	The man page for env(1) says:

		EXIT STATUS
		     env exits with one of the following values:
	[...]
		     126     utility was found, but could not be invoked.
		     127     utility could not be found.

	and yet:

		Script started on Sun Feb  9 09:12:49 2025
		$ mkdir env-test
		$ cd env-test
		$ ln -s foo foo
		$ env $(pwd)/foo/bar
		env: /tmp/env-test/foo/bar: Too many levels of symbolic links
		$ echo $?
		126
		$ exit

		Script done on Sun Feb  9 09:13:37 2025

	Here the utility clearly could not be found (it does not exist),
	and yet the exit code is 126 ("utility was found, but...") rather
	than the 127 it should be.
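
	For reference, the errno behind that diagnostic is ELOOP, not ENOENT.
	A minimal sketch that recreates the setup above and reports what
	execv() returns; the helper name and the scratch directory template
	are made up for illustration and are not taken from env's source:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: recreate "ln -s foo foo" in a scratch directory,
 * attempt to exec foo/bar, and return the errno that execv() reports.
 * Returns -1 if the scratch setup itself fails.
 */
int
exec_errno_for_symlink_loop(void)
{
	char dir[] = "/tmp/env-test.XXXXXX";
	char linkpath[sizeof(dir) + 4];		/* dir + "/foo" */
	char target[sizeof(dir) + 8];		/* dir + "/foo/bar" */
	char *argv[] = { "bar", NULL };
	int err;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(linkpath, sizeof(linkpath), "%s/foo", dir);
	snprintf(target, sizeof(target), "%s/foo/bar", dir);
	if (symlink("foo", linkpath) == -1) {
		rmdir(dir);
		return -1;
	}

	/* execv() only returns on failure, which is the expected case here. */
	execv(target, argv);
	err = errno;

	unlink(linkpath);
	rmdir(dir);
	return err;
}
```

	Any check of the form "errno == ENOENT" therefore misses this case,
	even though nothing named foo/bar exists.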


	Another less serious issue (I suppose this should really be a separate
	PR as a doc bug, but as we are already here) the same section includes:

		     1-125   utility was invoked, but failed [...]

	and yet again (continuing to use the same environment as above):

		Script started on Sun Feb  9 09:22:31 2025
		$ cd env-test
		$ env env $(pwd)/foo/bar
		env: /tmp/env-test/foo/bar: Too many levels of symbolic links
		$ echo $?
		126
		$ exit

		Script done on Sun Feb  9 09:23:08 2025

	Here the first "env" command returns exit status 126, which should
	indicate

		     126     utility was found, but could not be invoked.

	Yet here the utility is "env" which clearly can be found, and can be
	invoked, and in fact, was invoked.

	That 126 exit status (and the message sent to stderr) is actually the
	exit status (and error message) from the env command invoked by the
	env command whose status is being examined.   That is, one of the exit
	codes described by:

		     1-125   utility was invoked, but failed in some way;...

	which is what happened here, the 2nd env was invoked, and failed (it
	is the exact same invocation as the primary subject of this PR, which
	was designed to fail) yet the exit code was not in the range 1-125 as
	promised by the man page.

	The problem here is obviously that the man page is promising something
	which it is unable to deliver: a non-zero exit status from the utility
	can be any value.   If we're still using one of the old wait(2)
	interfaces to collect that status, it can be anything from 1..255; if
	we're using waitid(2) or wait6(2) then it can be any (non-zero, as the
	zero case is covered in a different line item, not included in this PR)
	32 bit value.
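
	A small sketch of the wait(2) side of that, assuming nothing beyond
	POSIX fork/waitpid; the helper name is hypothetical, and the child
	merely stands in for "utility":

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code` and return
 * the exit status the parent sees through waitpid(2) and WEXITSTATUS().
 * Returns -1 if fork/waitpid fails or the child did not exit normally.
 */
int
status_seen_by_parent(int code)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(code);		/* child: stands in for "utility" */
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);	/* only the low 8 bits survive */
}
```

	With this interface a child exiting 126, 127 or 255 is reported
	as exactly that, so nothing confines the utility to 1-125, and a
	value such as 257 comes back as 1.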


	And while we're here, more curiosity/weirdness than bugs of any kind,
	a couple of other exit codes listed in the EXIT STATUS section are:

		     1       An invalid command line option was passed to env.
		     125     utility was specified together with the -0 option.

	First, why devote a whole exit code (125) to something which doesn't
	need to be an error at all?   The -0 option is meaningless when a
	utility is specified; its sole purpose is to alter the delimiter
	between successive entries when env is used with no utility, and
	instead prints the contents of the environment.

	It is entirely normal for commands to have options that only apply in
	specific cases, eg: neither grep nor ls complains if I do:

		grep -i 1234 file
	or
		ls -c file

	despite the fact that it is meaningless to request case-independent
	matching of digits, and ls's -c option only does anything when (at
	least) one of -l or -t is also given.   There's no need to make -0
	an error when a utility is given, simply ignore it.   (-0 is not a
	standard option, so we can do what we like with that one.)

	On the other hand using '1' as the exit code for invalid options is
	an exceedingly poor choice (unfortunately, it might be mandated by
	POSIX, I'll check later).

	If not mandated, it would be better to make that one exit(125)
	(regardless of what is, or isn't, done with the -0 case) to make it
	less likely to conflict with the utility exiting with status 1 (which
	is a very common exit code - when I tested the grep above, just to be
	sure, it exited with status 1, "1234" did not exist in the file I used).

>How-To-Repeat:

	RTFM, and then as above (or many other similar ways).

>Fix:

	First, I am going to assign this PR to myself, and fix what needs to
	be fixed.  The PR is just for tracking the fix, and pullups, ...

	Of course, none of these issues are serious enough to warrant any
	pullups, so I won't be requesting any of those, so instead let's say
	this PR is in case anyone else wants to make any comments about the
	issues.   If you do, be quick, fixing this is not going to take very
	long!

	For the first (primary) issue, the problem is that env simply checks
	for ENOENT from execvp() and does exit(127) if that is the error
	returned, and exit(126) in all other cases.   That's really the wrong
	way, much better would be to do exit(126) if the error is ENOEXEC and
	127 in all the other cases (there are lots of error codes that
	indicate a path not found, not just ENOENT) - but that's not actually
	good enough either, if the utility to be invoked were

		#! /no/such/file

		[...]

	then we can find the utility with no issues, it cannot be invoked
	however ("/no/such/file" doesn't exist, and yes, that's an assumption
	I am making here, but it is correct in my environment) so env should
	exit(126) - yet the errno value from execvp() in this case will be one
	of the ones which indicates a file could not be found (that file being
	"/no/such/file").   Detecting the difference requires more work than
	just looking at the value of errno.

	That is, it does, unless the kernel were changed to map all errors
	detected when attempting to locate the #! interpreter into ENOEXEC,
	which it could do easily enough (and has been suggested as a possibility
	in discussions about similar issues related to shell diagnostics) - but
	that potential change is beyond the scope of this PR.
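
	The two errno-only policies discussed above can be sketched as
	follows.  The function names are invented for illustration, and
	this is not a copy of env's source, just the decision it is
	described as making and the alternative:

```c
#include <errno.h>

/* Policy described above as current: only ENOENT means "not found". */
int
exit_status_current(int exec_errno)
{
	return exec_errno == ENOENT ? 127 : 126;
}

/* Alternative: only ENOEXEC means "found, but could not be invoked". */
int
exit_status_enoexec_only(int exec_errno)
{
	return exec_errno == ENOEXEC ? 126 : 127;
}
```

	For ELOOP the first gives 126 and the second 127.  For a script
	whose #! interpreter is missing, both give 127 although 126 is
	wanted, which is why errno alone is not enough.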


	For the second (doc) issue, what I think should happen, is for the
	EXIT STATUS section to say something like:

		EXIT STATUS

			If a utility is given, and is successfully invoked,
			then the exit status is from that utility, see its
			documentation for the possible values and details.

			If no utility is given, or one is named, but cannot
			be invoked, then env will exit with status:

	followed by the list of exit codes, similar to what is there now,
	but omitting all mention of exit status values from the utility.

	I will also note that whenever env itself exits with a non-zero exit
	status, it always also writes a diagnostic indicating why it failed
	to standard error, which, when necessary, can help determine the
	exit status source -- of course, a perverse utility could be just:

		#include <stdio.h>
		#include <stdlib.h>

		int
		main(void)
		{
			/* Imitate env's own diagnostic and exit status. */
			fprintf(stderr, "env: unknown option -Q\n");
			exit(125);
		}

	so there never really is any way to be certain (without ktrace anyway).


	For the third (non-bug) issues, if POSIX allows it (sometimes the
	standard lists specific code values, sometimes just 0 and not 0),
	I will probably change any exit(1) in env into exit(125); and also
	simply ignore a "-0" option when a utility is named, rather than
	making that be an automatic error.


