Re: sh(1) read: add LINE_MAX safeguard and "-n" option

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
From: tlaronde%kergis.com@localhost
Date: Tue, 24 Sep 2024 14:27:48 +0200

On Tue, Sep 24, 2024 at 06:54:35PM +0700, Robert Elz wrote:
>     Date:        Tue, 24 Sep 2024 12:56:49 +0200
>     From:        <tlaronde%kergis.com@localhost>
>     Message-ID:  <ZvKa8e8a7FHIFLz6%kergis.com@localhost>
> 
>   | The present patch does two things:
>   |
>   | 1) Set, by default, the maximum of bytes read, in every case, as being
>   | LINE_MAX (the maximum number of bytes in a line in a text file);
> 
> I am not really in favour of that part, while allowed by the standard,
> imposing unnecessary limits, just because they are permitted, is not
> really ideal.   Apart from that, the "line" read by read (without -r)
> can actually be several (or many) text file lines, if each is ended by
> a \ (line continuation).

Good point.

This can be solved by resetting nread to 0 whenever an escaped newline
(a line continuation) is read. In that case the (nread == linemax)
condition has to be removed and the limit handled in the corresponding block.
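
To make the idea concrete, here is a minimal sketch, not the actual
patch. It assumes a hypothetical in-memory byte source (struct src,
src_getc) in place of the shell's input, and a hypothetical
read_logical_line() helper; only nread and linemax are names from the
discussion. The limit is tested before fetching a byte, so nothing is
consumed and then discarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory byte source standing in for the shell's input. */
struct src {
    const char *data;
    size_t len;
    size_t pos;                 /* index of the next unread byte */
};

/* Return the next byte, or -1 at end of input. */
int src_getc(struct src *s)
{
    return s->pos < s->len ? (unsigned char)s->data[s->pos++] : -1;
}

/*
 * Read one logical line into out (NUL-terminated), joining physical
 * lines that end in backslash-newline.  nread counts the bytes consumed
 * on the current physical line only and is reset to 0 after each
 * escaped newline.  Reading stops at an unescaped newline, at end of
 * input, when out is full, or when nread reaches linemax.
 * Returns the number of bytes stored in out.
 */
size_t read_logical_line(struct src *s, char *out, size_t outsz,
                         size_t linemax)
{
    size_t nread = 0, n = 0;
    int c;

    assert(s != NULL && out != NULL && outsz > 0);
    while (nread < linemax && n + 1 < outsz) {
        c = src_getc(s);
        if (c == -1)
            break;
        nread++;
        if (c == '\n')
            break;
        if (c == '\\' && nread < linemax) {
            int next = src_getc(s);
            if (next == '\n') {
                nread = 0;      /* escaped newline: a new physical line */
                continue;
            }
            if (next != -1) {
                nread++;
                c = next;       /* backslash dropped, next byte kept */
            }
        }
        /* A backslash that is the last byte allowed, or the last byte
           of input, is kept literally (a choice made for this sketch). */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

With the input "ab\<newline>cd<newline>rest" and linemax set to 4, the
first call stores "abcd" even though six bytes precede the final
newline, because the counter restarts after the continuation.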

> 
>   | 2) Implement the '-n' option, which allows explicitly setting the
>   | maximum number of bytes to read, and thus also allows deliberately
>   | bypassing the LINE_MAX value.
> 
> Martin suggested that as well.  Your implementation isn't correct
> as it is (if the limit is reached, the next character will be discarded,
> that's not allowed ... also easy to fix) but before doing anything I
> want to check what other shells which implement the option actually
> count (particularly  wrt \ sequences, but also the word splitting).
> There is no point being needlessly different if that is possible to
> avoid.

I had a quick look at the bash(1) man page, which has two distinct
options: -n (maximum number read) and -N (read exactly this number).

I have not looked at the '-N' case in detail (it seems to me overly
complex to get right, whatever a user might want regarding bytes
read vs "chars" actually ending up in the variables).

For '-n', if I understand correctly, the count is the number of bytes
read, regardless of "char"s and, in that sense, of escape sequences.

For me, the least surprising behaviour is to treat reaching the size
limit as if the next char were an EOF, except when a newline is escaped.

This is why I use "bytes" for the count, to distinguish it from a
"char", which may be an interpretation of a sequence of bytes.
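
As an illustration of the difference, here is a small sketch that
assumes the input is valid UTF-8 (an assumption made for this example
only; sh itself makes no such promise). Both helpers, utf8_chars() and
utf8_cut_is_clean(), are hypothetical names. They show that a byte
limit and a character count diverge as soon as a multibyte character
appears, and that a byte limit can fall in the middle of a character.

```c
#include <stddef.h>

/*
 * Count the UTF-8 characters that start within the first nbytes of s,
 * by counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes valid UTF-8.
 */
size_t utf8_chars(const char *s, size_t nbytes)
{
    size_t i, chars = 0;

    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/*
 * Return 1 if stopping after cut bytes (cut <= len) does not split a
 * character, that is, if the byte at the cut is not a continuation byte.
 */
int utf8_cut_is_clean(const char *s, size_t len, size_t cut)
{
    if (cut >= len)
        return 1;
    return ((unsigned char)s[cut] & 0xC0) != 0x80;
}
```

For "h\xc3\xa9llo" (six bytes, five characters), a limit of two bytes
stops between the two bytes of the accented letter.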

-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
