tech-userlevel archive

Re: sh(1) read: add LINE_MAX safeguard and "-n" option



    Date:        Tue, 24 Sep 2024 17:30:42 +0200
    From:        tlaronde%kergis.com@localhost
    Message-ID:  <ZvLbItYTlanIgVgV%kergis.com@localhost>

  | Furthermore the continuation test on:
  |
  | 		if (c != '\n')	/* \ \n is always just removed */
  | 			goto wdch;
  |
  | seems wrong. Shouldn't it be?:
  |
  | 	 if (c != end)
  | 		goto wdch;

Actually no, what is there now is what is intended.

The idea is that the input might need to be divided into many lines
to meet the requirement that it be a text file, which means a max
line length (as you're aware), and that max length is from the first
char in the line to the next \n char (read's delimiter char has
nothing to do with that use of \n).  To allow that, while not restricting
the length of a record, the sequence \ \n is allowed to indicate
continuation lines, regardless of what the delimiter is, and is simply
removed from the input stream (just as in cpp and sh - and more).

Other than that usage, a \ also escapes the following char, so that
char is not anything special (not a field (word) separator, not the
delimiter, and of course, as \\, not the escape char either).

If the delimiter is \n (the default, or -d $'\n') then the end of line
continuation removal causes it to vanish before the code checks whether
the delimiter has appeared.  If the delimiter is something else, we don't
want it to vanish; there is no point in that.  Say we use "-d :", why
would we then ever write \: in the input if that pair of chars is simply
deleted?  Makes no sense.  What we would want is for the escaped : there
to be a regular char, not deleted, and not the delimiter either.

So the test above is checking for when we have a \ before some
character other than \n, in which case the goto adds the following
character to the current word (which makes it just an ordinary
char, not special in any way, with the preceding \ removed).   But
if it is \n after the \ we don't do that, and just continue (next
line not shown above), which goes back to read more input, simply
discarding the \ \n sequence.  That is what we want to happen whether
\n is the delimiter or not.
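
The handling described above can be sketched roughly as follows.  This
is only an illustration, not the sh code: scan_record() and its
parameters are made-up names, it works on a buffer rather than reading
a character at a time, and it does no field splitting at all.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the sh implementation): copy one record
 * from in[0..inlen) into out, stopping at the first unescaped delim.
 * Returns the number of bytes stored; out is always NUL terminated.
 */
size_t
scan_record(const char *in, size_t inlen, char delim,
    char *out, size_t outsz)
{
	size_t i, n = 0;
	int escaped = 0;

	assert(out != NULL && outsz > 0);

	for (i = 0; i < inlen; i++) {
		char c = in[i];

		if (escaped) {
			escaped = 0;
			if (c == '\n')	/* \ \n is always just removed */
				continue;
			/* any other escaped char is now an ordinary char */
		} else if (c == '\\') {
			escaped = 1;	/* the \ itself is dropped */
			continue;
		} else if (c == delim) {
			break;		/* unescaped delimiter ends the record */
		}

		if (n + 1 < outsz)	/* silently truncate if out is full */
			out[n++] = c;
	}
	out[n] = '\0';
	return n;
}
```

With ':' as the delimiter, the input a\:b:c gives the record "a:b" (the
escaped : is kept as a regular char), and a \ \n pair in the middle of
a record vanishes whether the delimiter is \n or something else.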

This is specifically allowed by POSIX in the spec of the read command,
though you have to read the almost indecipherable sentence about a
million times, and already know what it is trying to say, to
understand it (and even then I think what it is saying has an error,
but it is so hard to decipher I'm not sure).

Apart from that:

I think I have -n implemented as intended (by me anyway) now.   But
now I need to also update the manual ... I started trying to fit it
into the text in the form in which the description of the read builtin
currently exists, but that got ridiculously messy, so I am going to
discard the whole current description and do it again in the more
conventional form, with the options listed as a list, rather than just
worked into the description in narrative form.   That's going to take
another day or so.

I have also added -z (currently, for not very important backward compat
with the current impl) to issue an error if a \0 is encountered in the
input (other than as the record delimiter).  Inverting the
sense of that option probably makes more sense (-z to allow \0
chars, and error without that option).   Either way this is very
very simple and cheap to implement, as the code has to check for
the \0 chars anyway.   (The error would cause the read to terminate
with exit status 2, as does any other error).
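
To show how little is involved, here is a rough sketch of such a check.
It is not the actual code: check_nul() and the nul_policy names are
invented for the example, escapes are ignored, and which sense the
option ends up having is exactly the open question.

```c
#include <stddef.h>

/* Hypothetical names, chosen only for this illustration. */
enum nul_policy { NUL_ALLOW, NUL_REJECT };

/*
 * Scan one record in in[0..inlen).  Returns 0 if it is acceptable,
 * or 2 (the error exit status) if a \0 appears inside the record
 * while the policy is NUL_REJECT.
 */
int
check_nul(const char *in, size_t inlen, char delim,
    enum nul_policy policy)
{
	for (size_t i = 0; i < inlen; i++) {
		if (in[i] == delim)
			break;	/* end of record; \0 as delimiter is fine */
		if (in[i] == '\0' && policy == NUL_REJECT)
			return 2;
	}
	return 0;
}
```

Whichever way the option is defined, only the policy value passed in
changes; the scan over the input is the same.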

Or that option could just go away again.    Opinions please? (everyone)

kre

