tech-userlevel archive
Re: sh(1) read: add LINE_MAX safeguard and "-n" option
On Tue, Sep 24, 2024 at 09:09:29AM -0400, Greg Troxel wrote:
> Robert Elz <kre%munnari.OZ.AU@localhost> writes:
>
> > Date: Tue, 24 Sep 2024 12:56:49 +0200
> > From: <tlaronde%kergis.com@localhost>
> > Message-ID: <ZvKa8e8a7FHIFLz6%kergis.com@localhost>
> >
> > | The present patch does two things:
> > |
> > | 1) Set, by default, the maximum of bytes read, in every case, as being
> > | LINE_MAX (the maximum number of bytes in a line in a text file);
> >
> > I am not really in favour of that part, while allowed by the standard,
> > imposing unnecessary limits, just because they are permitted, is not
> > really ideal. Apart from that, the "line" read by read (without -r)
> > can actually be several (or many) text file lines, if each is ended by
> > a \ (line continuation).
>
> Sure, but the problem is that if you have a file which is e.g one line
> (single \n at end) that is 10 MB, read from it is unreasonable, and it's
> difficult to deal with this in portable code.
>
> If there were a limit which was well under 1 MB, but well over anything
> reasonably in a bona fide text file, it would finesse the issue.
>
> Perhaps 32 * LINE_MAX.
POSIX Issue 8 has added "-d delim", which sets the delimiter that ends
a "line". This makes things more complex, since a continuation is then
the escaping of the delimiter.
My solution was too simple.
We have to distinguish between the maximal length of a "line"
(linemax) and the maximum number of bytes to read (recordmax, set by
the "-n" option).
If the delimiter is the newline, each "line" is a text line, so its
maximal length is LINE_MAX; if the delimiter is something else, the
maximum is ULONG_MAX.
If this amount is reached without reaching the delimiter (escaped or
not), the reading stops. When moving on to the next line (after a
continuation line), the counter is reset to zero, allowing another
"line" to be absorbed.
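As an illustration only (this is not the patch), the per-"line" limit
could be sketched as follows in Python. The names line_limit and
read_record, the returned stop reasons, and the LINE_MAX value of 2048
(the POSIX minimum) are assumptions made for the sketch; sys.maxsize
stands in for ULONG_MAX. Only the escaped delimiter is handled here,
and only data bytes are counted against linemax, which is one possible
reading of the rule.

```python
import sys

LINE_MAX = 2048  # assumed value; POSIX only guarantees at least 2048


def line_limit(delim: bytes) -> int:
    """Per-"line" limit: LINE_MAX for newline, else effectively unbounded."""
    return LINE_MAX if delim == b"\n" else sys.maxsize  # stand-in for ULONG_MAX


def read_record(data: bytes, delim: bytes = b"\n", linemax=None):
    """Return (record, reason); reason is "delim", "linemax" or "eof"."""
    if linemax is None:
        linemax = line_limit(delim)
    out = bytearray()
    count = 0  # data bytes seen in the current "line"
    i = 0
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\" and data[i + 1:i + 2] == delim:
            # Continuation: drop backslash and delimiter, start a new "line".
            i += 2
            count = 0
            continue
        if c == delim:
            return bytes(out), "delim"
        if count >= linemax:
            # Limit reached without seeing the delimiter: stop reading.
            return bytes(out), "linemax"
        out += c
        count += 1
        i += 1
    return bytes(out), "eof"
```

With linemax set to 3, the input "ab\<newline>cd<newline>" is read in
full as "abcd" because the counter restarts after the continuation,
whereas "abcdef<newline>" stops after "abc".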
What "-n" sets is the maximum count of bytes composing the record
(recordmax), which may be a concatenation of "lines". Discarded bytes
are not counted: the backslash and delimiter of an "escaped line" are
presentation, not data. An escape sequence that is interpreted (not
raw) counts as only 1, since it is replaced by the character it
stands for.
If the maximum is not set, it defaults to ULONG_MAX.
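A rough sketch of that counting rule, again in Python and purely
illustrative: read_bounded and its parameters are hypothetical names,
and sys.maxsize stands in for ULONG_MAX. It assumes that in raw mode
the backslash is ordinary data, so each byte counts and a delimiter
always ends the record.

```python
import sys


def read_bounded(data: bytes, recordmax: int = sys.maxsize,
                 delim: bytes = b"\n", raw: bool = False) -> bytes:
    """Collect at most recordmax data bytes, stopping at an unescaped delim."""
    out = bytearray()  # len(out) is the count checked against recordmax
    i = 0
    while i < len(data) and len(out) < recordmax:
        c = data[i:i + 1]
        if c == b"\\" and not raw and i + 1 < len(data):
            nxt = data[i + 1:i + 2]
            i += 2
            if nxt == delim:
                # Escaped delimiter: both bytes are discarded, nothing counted.
                continue
            # Interpreted escape: two input bytes, counted as one data byte.
            out += nxt
            continue
        if c == delim:
            break
        # Plain byte (including a backslash in raw mode or at end of input).
        out += c
        i += 1
    return bytes(out)
```

For example, with recordmax set to 3 the input "ab\<newline>cd<newline>"
yields "abc": the discarded backslash and newline do not use up any of
the three bytes.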
Slightly more complex than what I made, but still reasonably simple.
--
Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C