tech-userlevel archive


Re: sh(1) read: add LINE_MAX safeguard and "-n" option



    Date:        Fri, 27 Sep 2024 12:25:49 +0200
    From:        tlaronde%kergis.com@localhost
    Message-ID:  <ZvaILZGn-fMR0kOI%kergis.com@localhost>


  | I have an algebraic mind: I always think of rule. A line, sometime
  | ago, was considered a sequence of bytes ending by the first appearance
  | of '\n'. If a "line" is defined more generally as a sequence of bytes
  | ending by the first appearance of whatever byte delimiter,

But it isn't - what a line is is defined, and it isn't that.
The delimiter is just what terminates the read, just as the
byte count given to -n does.  That might take a fractional line
or many lines to achieve (given various combinations of -d and -n).

  | But could you state it clearly (not \`a la POSIX :-^)
  | in the man page?

That would be my hope.   But writing English was never one of my
better achievements, as some of these e-mails should reveal.

  | Other corner case: when specifying a limit (-n) that is "end reading at the
  | first appearance of either eof, not escaped delimiter or that amount
  | of bytes read", what do you do when the last byte read (reaching the
  | count) is '\\'?

Stop anyway.   In general, every time it can occur, a stray ending \ just
generates unspecified behaviour.   I'd expect that using -n would normally
mean -r as well, so the whole question is irrelevant, but for now, all
that happens is that \ is read (no more, that would go beyond the limit)
and, having nothing to escape, is removed along with all the other \ chars
that don't have any useful purpose (when -r is not given).

  | Or do you allow the stray backslash in the last
  | variable, convert it to the sequence "\\", or remove it?

For now at least, the last (the first two would be essentially
the same thing, as if that final \ was actually followed by another
and the -n limit were one byte bigger).   I think the only other
reasonable approach to take would be to make it be an error, but
I don't think that's warranted here.

There will be, after all, no way to ever know it happened (in the
script): without -r, \ chars (except the escaped one, \\) are all
removed anyway, as is IFS whitespace, etc - there's no immediate way
to detect how much of each of those actually happened (with or
without -n).
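
A minimal sketch of that point, assuming only a POSIX sh with the
standard read (no -n or -d involved); the function names demo_plain and
demo_raw are made up for the illustration.  The same input line (two
spaces, a, backslash, b, two spaces, newline) is read with and without -r:

```shell
# Hypothetical helpers: feed one fixed input line to read.
# In the printf format string, \\ produces a single backslash.
demo_plain() {
    # Without -r: the backslash is consumed as an escape and removed.
    printf '  a\\b  \n' | { read x; printf '%s\n' "$x"; }
}
demo_raw() {
    # With -r: the backslash is kept as an ordinary character.
    printf '  a\\b  \n' | { read -r x; printf '%s\n' "$x"; }
}
demo_plain   # prints: ab
demo_raw     # prints: a\b
```

In both cases the leading and trailing IFS whitespace is gone, and in
the first case so is the backslash; the variable alone does not say how
much of either was removed.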

[On -z]
  | IMHO, the reverse:

That's my general preference as well, but it is a change to current
behaviour, so I will wait upon others' opinions before making that
happen (it is, after all, one minor "!" operator addition, so not
exactly something that is going to take hours of work).

  | Would it make sense to add a '-Z' option that translates a nul byte
  | into the sequence '\000' with the specification that such a sequence
  | is a constant one and is never interpreted, except by printf?

No, I don't think so.   I doubt there's any immediate need for that,
and even in printf, what happens when that appears is unspecified.
For use with %b, which would be where it ought to be used, if anywhere
(not in the format string, which would mean allowing that to come from
arbitrary external input, which is almost never a good idea, though not
quite as bad in printf(1) as in printf(3)), it would need to be \0000
anyway, to meet the ancient stupid System III definition of how to write
an octal constant for its echo program.
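
To illustrate the two octal notations, here is a small sketch assuming a
POSIX-conforming printf(1).  It uses the letter A (octal 101) rather than
a nul byte, since a shell variable cannot hold a nul; the variable names
are made up for the example:

```shell
# %b operand: an octal escape is \0 followed by up to three octal digits.
from_operand=$(printf '%b' '\0101')
# Format string: an octal escape is \ followed by up to three octal digits.
from_format=$(printf '\101')
printf '%s %s\n' "$from_operand" "$from_format"   # prints: A A
```

By the same rule a nul byte in a %b operand is written \0000, while in
the format string it would be \000.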

kre


