tech-userlevel archive


Re: sh(1) read: add LINE_MAX safeguard and "-n" option



On Thu, Sep 26, 2024 at 12:02:08AM +0700, Robert Elz wrote:
>     Date:        Wed, 25 Sep 2024 21:01:12 +0700
>     From:        Robert Elz <kre%munnari.OZ.AU@localhost>
>     Message-ID:  <24247.1727272872%jacaranda.noi.kre.to@localhost>
> 
> This isn't avoiding adding a -n that works, but a possible simpler change
> that might help, is much easier to install properly, should break nothing,
> and I think will probably avoid the problem that is being observed.
> 
> That is, currently the read builtin simply ignores \0 bytes in the input,
> except if \0 is the delimiter character.   We could change that, and make
> \0 an error - POSIX specifies that the input shall not contain \0 chars
> unless -d has been used to make \0 the delimiter character.
> 
> If that change got made, I suspect that the read would terminate quite
> quickly on non-text files, and text files will end at the first \n.
> 
> Would that be a useful first step at least?
> 
> If I do that, I'd probably add an option to retain the current behaviour,
> just in case there's something (inappropriately) relying upon it.

AFAIK, this won't solve the problem at hand. Numerous files consisting
only of bytes in the range 0x20--0x7E are produced by stripping extra
white space and putting everything on one "line", either to speed up
parsing or to make them harder for a human being to read. Others are
generated by software and are not intended for humans: the software
simply emits instructions, one after the other, without inserting
newlines.

So '\0' is a separate question. There is no harm in solving it, to be
consistent with what NetBSD wants to be consistent with, but it will
not help with typical HTML or JavaScript files, which may well contain
no newline at all while every byte is in the 0x20--0x7E range.
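
To make the point concrete, here is a minimal sketch in POSIX sh. The
helper name scan_file is hypothetical and is not part of any proposed
patch. It reports how many NUL bytes and how many newlines a file
contains, so one can see that a minified file may have neither:

```shell
# scan_file FILE  (hypothetical helper, for illustration only)
# Prints the number of NUL bytes, newlines, and total bytes in FILE.
scan_file() {
    total=$(wc -c < "$1")                      # all bytes
    nonnul=$(tr -d '\000' < "$1" | wc -c)      # bytes left once NULs are dropped
    newlines=$(tr -cd '\n' < "$1" | wc -c)     # keep only newlines, then count
    # $(( )) also strips the leading blanks some wc implementations emit
    echo "nul=$((total - nonnul)) newlines=$((newlines)) bytes=$((total))"
}
```

A file reporting nul=0 and newlines=0 would pass a "reject \0" check,
and a plain read would still have to consume it entirely as a single
"line".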

Even without changing, for now, the way read reads, the easy solution
is to add the '-n' option, without dealing with terminal settings.
Then a decision has to be made whether or not to count the escaped
newline (line continuation) sequence. (I'm for not counting it, the
rationale being that a continuation line is simply a formatting of
data entry, whether to conform to LINE_MAX or to some other line
length, and does not count as data.)
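
As an illustration of that rationale, a small sketch in POSIX sh. The
function name data_len is hypothetical, and this does not emulate the
proposed '-n' option; it only shows the counting. It relies on the
existing behaviour of read without -r, which treats backslash-newline
as a continuation and drops it from the data:

```shell
# data_len  (hypothetical helper, for illustration only)
# Reads one logical line from stdin and prints the length of its data.
data_len() {
    line=
    # Without -r, backslash-newline joins lines and is not stored in $line.
    # The second test accepts a final line lacking a trailing newline.
    IFS= read line || [ -n "$line" ] || return 1
    echo "${#line}"
}
```

For example, the input produced by printf 'abc\\\ndef\n' is 9 bytes
long on the wire, but data_len counts only the 6 characters of
"abcdef"; the backslash, the newline after it, and the terminating
newline are not data.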

But this is just one opinion, and it is up to you.
-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

