Re: sh(1) read: add LINE_MAX safeguard and "-n" option

To: tlaronde%kergis.com@localhost
Subject: Re: sh(1) read: add LINE_MAX safeguard and "-n" option
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Fri, 27 Sep 2024 22:47:39 +0700

    Date:        Fri, 27 Sep 2024 15:04:18 +0200
    From:        tlaronde%kergis.com@localhost
    Message-ID:  <ZvatUpCvWtk7LrDS%kergis.com@localhost>

  | If I understand correctly your view, the explanation could be
  | something around this (I mean for the idea; for the way it is
  | expressed in some kind of english...):

This is what I came up with (no -N option has been implemented, I don't
see the point at the minute - that can be revisited later if someone
can demonstrate a meaningful use for it).

In the description of the -z option, either just the brackets, or the
brackets and all contained text, will end up being deleted, depending
upon which way the option ends up working.

I did add the -b option (turns out to be easy, and actually helpful to
avoid the tty needing to be put into raw mode, losing erase/kill
processing in most cases).

I also added the PS2 output (required by POSIX) when obtaining a
continuation line from stdin as a terminal, which we never bothered
with before.

Comments appreciated (other than about it being just ascii, with no
extra formatting visible - the actual man page doesn't have that
limitation).   I am not particularly happy with the wording for -n.

The final paragraph is (just slightly modified) about all that remains
of the existing sh(1) man page description of read.

kre

     read [-brz] [-d delim] [-n max] [-p prompt] variable [...]

            The read command reads a record from its standard input (by
            default one line), splits that record as if by field splitting,
            and assigns the results to the named variable arguments, as
            detailed below.

            The options are as follows:

                  -b         Do buffered reads, rather than reading one byte
                             at a time.  Use of this option might result in
                             reading more bytes from standard input than the
                             read utility actually processes, causing some
                             data from standard input to be unavailable to any
                             subsequent utility that expects to obtain them.

                  -d delim   End the read when the first byte of delim is
                             obtained from standard input.  Specifying "" as
                             delim causes the nul character (`\0') to be the
                             end delimiter.  The default is <newline> (`\n').

                  -n max     read will read no more than max bytes from stan-
                             dard input.  The default is unlimited.  If the
                             end delim has not been encountered within max
                             bytes, read will act as if one immediately fol-
                             lowed the max'th byte, without attempting to
                             obtain it.  However, even if the -r option is not
                             given and the final byte actually read were the
                             escape character (not itself escaped), no more
                             bytes will be read, and that escape character
                             would simply be removed as described below.

                  -p prompt  If the standard input is a terminal, then prompt
                             is written to standard error before the read
                             commences.  If more lines of data are required in
                             that case, the normal PS2 prompt is written as
                             each subsequent line is to be obtained.

                  -r         Reduced processing of the input.  No escape
                             characters are recognised, and line continuation
                             is not performed.  See below.

                  -z         If a nul character (`\0') is found in the input,
                             other than when acting as the delimiter, an error
                             is [normally] generated.  [This option disables
                             that error, the nul is simply ignored.]

            If the read is from a terminal device, and the -p option was
            given, prompt is printed on standard error.  Then a record, termi-
            nated by the first character of delim if the -d option was given,
            or a <newline> (`\n') character otherwise, but no longer than max
            bytes if the -n option was given, is read from the standard input.
            If the -b option is not given, no data from standard input beyond
            the end delimiter, or the max bytes that may be read, are
            obtained.

            If the -r option was not given, and the two character sequence `\'
            `\n' is encountered, those two characters are simply deleted, and
            provided that max bytes have not yet been obtained, and the end
            delimiter has yet to be encountered, more input is obtained, with
            the first character of the following line placed in the input
            where the deleted `\' had been.  This allows logical lines longer
            than the maximum line length permitted for text files to be pro-
            cessed.  The two removed characters are still counted for the pur-
            poses of the max input limit.
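
To illustrate the continuation rule above, here is a small sketch that uses
only the long-standing POSIX behaviour of read (none of -b, -n or -z).  The
variable names joined and raw are placeholders picked for the example, and
the effect of the removed characters on the max limit is not shown.

```shell
# Without -r: the backslash-newline pair is deleted and the next line
# is appended, so both physical lines form one record.
read joined <<'EOF'
first \
second
EOF
printf '%s\n' "$joined"    # prints: first second

# With -r: no continuation; the record ends at the first newline and
# the backslash is kept as ordinary data.
read -r raw <<'EOF'
first \
second
EOF
printf '%s\n' "$raw"       # prints: first \
```

The here-document delimiter is quoted ('EOF') so that the shell passes the
backslash through to read untouched.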

            If the -r flag was not given, the <backslash> character (`\') is
            then treated as an escape character; the character following it
            is always treated as a normal, insignificant, data character, and
            is never treated as the end delimiter nor as an IFS character for
            field splitting.

            After field splitting has completed, but before data has been
            assigned to any variables, all escape characters are removed.
            Note that the two character sequence `\' `\' can be used to enter
            the escape character as data: the first acts as the escape
            character, the second becomes just a normal data character.
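
The escape handling described above can be seen with plain POSIX read; the
following is only a sketch, with the variable names (a, b, c, d, e) chosen
for the example and the default IFS assumed.

```shell
# Without -r: the escaped space is ordinary data, so it does not split
# fields, and the backslash is removed before assignment.
read a b <<'EOF'
one\ two three
EOF
printf '[%s] [%s]\n' "$a" "$b"    # prints: [one two] [three]

# With -r: the backslash is plain data and the space splits as usual.
read -r c d <<'EOF'
one\ two three
EOF
printf '[%s] [%s]\n' "$c" "$d"    # prints: [one\] [two three]

# Two backslashes: the first escapes the second, leaving one as data.
read e <<'EOF'
a\\b
EOF
printf '[%s]\n' "$e"              # prints: [a\b]
```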

            The ending delimiter, if encountered, and not escaped, is deleted
            from the record, which is then split as described in the field
            splitting section of the Word Expansions section above.  The
            pieces are assigned to the variables in order.  If there are more
            pieces than variables, the remaining pieces (along with the char-
            acters in IFS that separated them) are all assigned to the last
            variable.  If there are more variables than pieces, the remaining
            variables are assigned the null string.  The read built-in utility
            will indicate success unless EOF, or a read error, is encountered
            on input, or there is a usage error (unknown option, etc) in which
            case failure is returned.
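
As a rough illustration of the assignment rules and exit status in that last
paragraph, again using only POSIX read with the default IFS (the variable
names are made up for the example):

```shell
# More pieces than variables: the last variable gets the remainder,
# including the IFS characters that separated the remaining pieces.
read -r a b rest <<'EOF'
one two three  four
EOF
printf '[%s] [%s] [%s]\n' "$a" "$b" "$rest"   # prints: [one] [two] [three  four]

# More variables than pieces: the extras are assigned the null string.
read -r x y z <<'EOF'
only
EOF
printf '[%s] [%s] [%s]\n' "$x" "$y" "$z"      # prints: [only] [] []

# EOF on input: read indicates failure.
if read -r line </dev/null; then
    eof=no
else
    eof=yes
fi
printf 'EOF seen: %s\n' "$eof"                # prints: EOF seen: yes
```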
