Re: sh(1) read: add LINE_MAX safeguard and "-n" option

To: tlaronde%kergis.com@localhost
Subject: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Tue, 24 Sep 2024 23:13:05 +0700

    Date:        Tue, 24 Sep 2024 14:27:48 +0200
    From:        tlaronde%kergis.com@localhost
    Message-ID:  <ZvKwRPOHOo5eZcsp%kergis.com@localhost>

  | This can be solved by resetting nread to 0 when an actual end-of-line
  | is reached and escaped.

I think it better simply not to have a limit in the normal case; it
serves no purpose, except for this one rather exotic use of the
read builtin (which really is meant for reading text files; the
"variation" was to allow it to read \0-delimited "records", as output
by find -print0 and similar).  It cannot really read binary blobs,
no matter what is done, as sh variables can never contain \0
characters.  That doesn't matter for the present purpose, but nor
does much else here.

  | I took a quick look at the bash(1) man page, which has two different
  | options: -n (max number read) and -N (read exactly this number).
  |
  | I have not looked at the '-N' case in detail (it seems to me overly
  | complex to get right, for whatever a user might want regarding bytes
  | read vs "chars" actually ending up in the variables).
  |
  | For '-n', if I understand correctly, this is the number of bytes read,
  | without consideration of "char"s and, in this sense, escaping sequences.

The bash manual says "characters" in both cases, but I'm not sure that
it really means that, and for us the difference is certainly moot, as
sh really wants 1 byte == 1 character, almost always (it can process
UTF-8 and similar encodings because it mostly doesn't need to interpret
the strings as characters, just as byte strings).

  | This is why I use "bytes" for the count, to distinguish it from a
  | "char", which may be an interpretation of a sequence of bytes.

Yes, that part isn't the issue.  The issue is that if "read" reads N
bytes (characters) [0..N-1] (and, after processing, assigns them to
variables), then a following read must start at the very next byte [N].
read isn't allowed simply to discard anything not explicitly specified:
it can remove \ chars if -r isn't given, and it always removes the
delimiter char, if found, but it cannot read 128 bytes and then process
just 100 of them, as there's no way to put back the other 28
(particularly when reading from a pipe).  That's why it reads 1 byte at
a time, and never reads the next byte unless it is needed.

The other versions (ignoring zsh, where -n means something totally
unrelated) all put the terminal into raw mode (or the equivalent) when
-n is specified, so that the read can stop as soon as n characters have
been read; otherwise the terminal driver won't return anything until the
user enters a \n.  The 1-byte-at-a-time read scheme does avoid reading
more than N of the bytes entered, leaving the rest for later, but if one
does "read -n 1 var" and the read doesn't return after 1 byte is typed
(which it does in the other shells), people will be unhappy.

I am looking at how to make something reasonable work.  It won't happen
within a day or two, however.

kre
