Re: sh(1) read: add LINE_MAX safeguard and "-n" option

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
From: tlaronde%kergis.com@localhost
Date: Wed, 25 Sep 2024 15:12:07 +0200

On Tue, Sep 24, 2024 at 02:27:48PM +0200, tlaronde%kergis.com@localhost wrote:
> On Tue, Sep 24, 2024 at 06:54:35PM +0700, Robert Elz wrote:
> >     Date:        Tue, 24 Sep 2024 12:56:49 +0200
> >     From:        <tlaronde%kergis.com@localhost>
> >     Message-ID:  <ZvKa8e8a7FHIFLz6%kergis.com@localhost>
> > 
> >   | The present patch does two things:
> >   |
> >   | 1) Set, by default, the maximum of bytes read, in every case, as being
> >   | LINE_MAX (the maximum number of bytes in a line in a text file);
> > 
> > I am not really in favour of that part, while allowed by the standard,
> > imposing unnecessary limits, just because they are permitted, is not
> > really ideal.   Apart from that, the "line" read by read (without -r)
> > can actually be several (or many) text file lines, if each is ended by
> > a \ (line continuation).

FWIW, you will find below a proposed solution:

In a nutshell, there are 2 counters: one for each "line" (in the
stream read, the sequence of bytes terminated by the end byte) and a
second counter counting all the bytes read, except the continuation
line sequence (when not in raw mode) that is considered to be ignored
(it's the same data, whether it is one long line or split with
continuation lines).

If '-n' does not specify it, the maximum number of bytes (except
continuation line sequence) is "unlimited" (macro READ_UNLIMITED
defined to be ULONG_MAX).

If the delimiter is not specified, it is the default '\n', to the
input stream is supposed to be a POSIX text file. The maximum length
of a "line" is LINE_MAX. When called like:

read myvar <myfile

a not text file will not be read more than LINE_MAX, hence the "naive"
way to read to obtain the first "line" to looke for a shebang works as
expected.

One can perfectly have more than this, with continuation lines: the
counter is only for each line read; it is reset to zero with switching
to the "next" continuating line.

If '-d' is used, even with '\n', linemax is reset to READ_UNLIMITED,
allowing to bypass the limit for a "text" file not POSIX compliant.

I have dropped trying to count (for '-n') only bytes effectively put
in the "record" for 2 reasons:

1) The handling of IFS, even more so if IFS is a name of one of the
variables to be assigned, is not easy to get right (we end processing
before perhaps removing characters written);

2) When the user defines the maximum to read, he can hardly know what
processing will be done with the data i.e. the number of bytes
effectively written in the variables. So this number is effectively
the number of bytes to read, minus the continuation line sequences
that could not be here and should not count.

It seems to me that this addresses the usage, adds a reasonable
safeguard, for a very small price in code.

Best,

T. Laronde

diff --git a/bin/sh/miscbltin.c b/bin/sh/miscbltin.c
index c4f963d0d86a..b0cba4fab15e 100644
--- a/bin/sh/miscbltin.c
+++ b/bin/sh/miscbltin.c
@@ -54,6 +54,7 @@ __RCSID("$NetBSD: miscbltin.c,v 1.54 2023/10/05 20:33:31 kre Exp $");
 #include <stdlib.h>
 #include <ctype.h>
 #include <errno.h>
+#include <limits.h>		/* LINE_MAX, if defined */
 
 #include "shell.h"
 #include "options.h"
@@ -67,6 +68,9 @@ __RCSID("$NetBSD: miscbltin.c,v 1.54 2023/10/05 20:33:31 kre Exp $");
 
 #undef rflag
 
+#ifndef LINE_MAX
+# define LINE_MAX 2048		/* peak value of _POSIX2_LINE_MAX */
+#endif
 
 
 /*
@@ -75,6 +79,14 @@ __RCSID("$NetBSD: miscbltin.c,v 1.54 2023/10/05 20:33:31 kre Exp $");
  *
  * This uses unbuffered input, which may be avoidable in some cases.
  *
+ * For safety and efficiency, when called with the default end "line"
+ * delimiter ('\n'), the maximum length of a line is set to LINE_MAX.
+ * When specifying explicitely something (including '\n'), user
+ * must know what he's doing since this safeguard doesn't exist
+ * (the limit is ULONG_MAX considered unlimited here).
+ * When specifying the maximum number to read, this is the number of
+ * bytes except the escaping sequence '\\' + end.
+ *
  * Note that if IFS=' :' then read x y should work so that:
  * 'a b'	x='a', y='b'
  * ' a b '	x='a', y='b'
@@ -86,6 +98,7 @@ __RCSID("$NetBSD: miscbltin.c,v 1.54 2023/10/05 20:33:31 kre Exp $");
  * ':b c:'	x='',  y='b c:'
  */
 
+#define READ_UNLIMITED ULONG_MAX	/* our 'unlimited' length */
 int
 readcmd(int argc, char **argv)
 {
@@ -101,17 +114,28 @@ readcmd(int argc, char **argv)
 	int i;
 	int is_ifs;
 	int saveall = 0;
+	unsigned long nread_inline;	/* #bytes read in "line" */
+	unsigned long linemax;	/* nread_inline <= linemax */
+	unsigned long nread;	/* #bytes read, not counting escaped end */
+	unsigned long readmax;	/* nread <= readmax */
 	ptrdiff_t wordlen = 0;
 	char *newifs = NULL;
 	struct stackmark mk;
 
-	end = '\n';				/* record delimiter */
+	end = '\n';				/* line delimiter */
 	rflag = 0;
 	prompt = NULL;
-	while ((i = nextopt("d:p:r")) != '\0') {
+	linemax = LINE_MAX;			/* default */
+	readmax = READ_UNLIMITED;
+
+	while ((i = nextopt("d:n:p:r")) != '\0') {
 		switch (i) {
 		case 'd':
 			end = *optionarg;	/* even if '\0' */
+			linemax = READ_UNLIMITED;
+			break;
+		case 'n':
+			readmax = strtoul(optionarg, (char **)NULL, 10);
 			break;
 		case 'p':
 			prompt = optionarg;
@@ -122,9 +146,12 @@ readcmd(int argc, char **argv)
 		}
 	}
 
+	if (end == '\\')
+		rflag = 1;	/* there is no escape with escape...  */
+
 	if (*(ap = argptr) == NULL)
 		error("variable name required\n"
-			"Usage: read [-r] [-p prompt] var...");
+			"Usage: read [-r] [-d delim] [-n count] [-p prompt] var...");
 
 	if (prompt && isatty(0)) {
 		out2str(prompt);
@@ -138,19 +165,30 @@ readcmd(int argc, char **argv)
 	status = 0;
 	startword = 2;
 	STARTSTACKSTR(p);
+	nread = nread_inline = 0;
 	for (;;) {
+		if (nread >= readmax || nread_inline >= linemax)
+			break;
 		if (read(0, &c, 1) != 1) {
 			status = 1;
 			break;
 		}
-		if (c == '\\' && c != end && !rflag) {
+		if (c == '\\' && !rflag) {
 			if (read(0, &c, 1) != 1) {
 				status = 1;
 				break;
 			}
-			if (c != '\n')	/* \ \n is always just removed */
+			if (c != end) {	/* not continuation line */
+				nread_inline += 2;
+				nread += 2;
 				goto wdch;
-			continue;
+			} else {
+				nread_inline = 0;	/* new line */
+				continue;
+			}
+		} else {
+			++nread_inline;
+			++nread;
 		}
 		if (c == end)
 			break;


-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

References:
- sh(1) read: add LINE_MAX safeguard and "-n" option
  - From: tlaronde
- Re: sh(1) read: add LINE_MAX safeguard and "-n" option
  - From: Robert Elz
- Re: sh(1) read: add LINE_MAX safeguard and "-n" option
  - From: tlaronde

Prev by Date: Re: Fwd: Re: interactive shell detection in shrc
Next by Date: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
Previous by Thread: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
Next by Thread: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
Indexes:

Home | Main Index | Thread Index | Old Index