tech-userlevel archive


Re: sh(1) read: add LINE_MAX safeguard and "-n" option



On Tue, Sep 24, 2024 at 09:09:29AM -0400, Greg Troxel wrote:
> Robert Elz <kre%munnari.OZ.AU@localhost> writes:
> 
> >     Date:        Tue, 24 Sep 2024 12:56:49 +0200
> >     From:        <tlaronde%kergis.com@localhost>
> >     Message-ID:  <ZvKa8e8a7FHIFLz6%kergis.com@localhost>
> >
> >   | The present patch does two things:
> >   |
> >   | 1) Set, by default, the maximum of bytes read, in every case, as being
> >   | LINE_MAX (the maximum number of bytes in a line in a text file);
> >
> > I am not really in favour of that part, while allowed by the standard,
> > imposing unnecessary limits, just because they are permitted, is not
> > really ideal.   Apart from that, the "line" read by read (without -r)
> > can actually be several (or many) text file lines, if each is ended by
> > a \ (line continuation).
> 
> Sure, but the problem is that if you have a file which is e.g. one line
> (single \n at end) that is 10 MB, read from it is unreasonable, and it's
> difficult to deal with this in portable code.
> 
> If there were a limit which was well under 1 MB, but well over anything
> reasonably in a bona fide text file, it would finesse the issue.
> 
> Perhaps 32 * LINE_MAX.

POSIX Issue 8 has added "-d delim", which sets the delimiter of a
"line", and this makes things more complex, since a continuation is
then the escaping of the delimiter.

My solution was too simple.

We have to distinguish between the maximal length of a "line"
(linemax) and the maximum number of bytes to read (the "-n" option):
recordmax.

If the delimiter is the newline, the maximal length of each "line" is
that of a text line, that is LINE_MAX; if the delimiter is something
else, the maximum is ULONG_MAX.
If this amount is reached without reaching the delimiter (escaped or
not), the reading stops. When moving to the next line (after a
continuation line), the counter is reset to zero, allowing another
"line" to be absorbed.
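A minimal sketch of the per-"line" limit, in C, for non-raw mode only. It reads from a NUL-terminated in-memory string instead of a file descriptor so it can be tried without a shell; the names read_lines_limited and enum read_status, and the choice to count the backslash and the delimiter toward the line length, are assumptions for illustration, not the NetBSD sh code. A backslash followed by the delimiter is a continuation: both bytes are dropped and the counter restarts at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Why one read stopped (names are illustrative, not from sh). */
enum read_status { READ_DELIM, READ_EOF, READ_LINEMAX };

/*
 * Hypothetical helper, non-raw mode only: copy one record from the
 * NUL-terminated string in to out (room for strlen(in) + 1 bytes).
 * "\<delim>" is a continuation: both bytes are dropped and the
 * per-line counter restarts at zero. "\c" keeps c literally.
 * If linemax bytes are consumed on one "line" without meeting the
 * delimiter, reading stops. *consumed gets the input bytes used.
 */
static enum read_status
read_lines_limited(const char *in, char delim, size_t linemax,
    char *out, size_t *consumed)
{
	size_t len = strlen(in), i = 0, o = 0, linecount = 0;
	enum read_status st = READ_EOF;

	assert(linemax > 0);
	while (i < len) {
		if (linecount >= linemax) {
			st = READ_LINEMAX;	/* line too long: stop */
			break;
		}
		char c = in[i++];
		linecount++;
		if (c == delim) {
			st = READ_DELIM;	/* unescaped delimiter ends the record */
			break;
		}
		if (c == '\\' && i < len) {
			char next = in[i++];
			if (next == delim) {
				linecount = 0;	/* continuation: a new "line" starts */
				continue;
			}
			linecount++;
			c = next;	/* escaped byte is kept literally */
		}
		out[o++] = c;
	}
	out[o] = '\0';
	*consumed = i;
	return st;
}
```

With linemax set to 3, "ab\\\ncd\n" still yields "abcd", because the counter restarts after the escaped newline, whereas "abcdef\n" stops after "abc".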

What is set by "-n" is the maximum count of bytes composing the record
(recordmax), which may be a concatenation of "lines". The discarded
bytes are not counted (the backslash and the delimiter of a
continuation are not part of the data: the "escaped line" is only
presentation, and is discarded), and an escaped sequence counts as
only 1 if it is interpreted (not raw), since the sequence is replaced
by the character.

If the maximum is not set it defaults to ULONG_MAX.
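Putting the defaults together, a small sketch; struct read_limits and read_default_limits are hypothetical names. The fallback value 2048 is the POSIX minimum for LINE_MAX (_POSIX2_LINE_MAX), used only if <limits.h> does not define LINE_MAX in the current compilation mode.

```c
#include <limits.h>

/* Fallback: 2048 is the POSIX minimum (_POSIX2_LINE_MAX). */
#ifndef LINE_MAX
#define LINE_MAX 2048
#endif

/* Hypothetical container for the two limits. */
struct read_limits {
	unsigned long linemax;		/* per-"line" byte limit */
	unsigned long recordmax;	/* data byte limit set by -n */
};

/*
 * Defaults: linemax is LINE_MAX when the delimiter is newline and
 * ULONG_MAX otherwise; recordmax is the -n value if given, else
 * ULONG_MAX.
 */
static struct read_limits
read_default_limits(char delim, int have_n, unsigned long n)
{
	struct read_limits l;

	l.linemax = (delim == '\n') ? LINE_MAX : ULONG_MAX;
	l.recordmax = have_n ? n : ULONG_MAX;
	return l;
}
```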

Slightly more complex than what I made, but still reasonably simple.
-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

