tech-userlevel archive


Re: sh(1) read: add LINE_MAX safeguard and "-n" option



On Wed, Sep 25, 2024 at 12:18:28AM +0700, Robert Elz wrote:
> 
> It isn't possible.   Actually using \ as the delimiter (without
> -r anyway) makes little sense at all, but that doesn't mean it
> needs to be prohibited.

Then this is another thing that has to be corrected in POSIX
Issue 8:

---8<---
If the -r option is not specified, <backslash> shall act as an escape
character. An unescaped <backslash> shall preserve the literal value
of a following <backslash> and shall prevent a following byte (if any)
from being used to split fields, with the exception of either
<newline> or the logical line delimiter specified with the -d delim
option (if it is used and delim is not <newline>); it is unspecified
which. If this excepted character follows the <backslash>, the read
--->8---

And this escape business simply cannot be parsed when the delimiter
is itself a backslash.

I suggest that our code set this explicitly, for readability:

if (end == '\\')
	rflag = 1;	/* no escaping if escape */

This will help a casual reader and seems, IMHO, easier to grasp
when reading than (c == '\\' && c != end), which does indeed exclude
the end == '\\' case. Too smart, at least for me ;-)
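
To make the comparison concrete, here is a minimal sketch of the two
forms side by side. It is not the actual sh(1) code: the function
names and the bool rflag parameter are hypothetical, chosen only for
illustration. Under these assumptions the two predicates give the
same answer for every byte, delimiter and raw-mode setting, so the
choice between them is purely one of readability.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Implicit form: the combined test quoted above.  When the delimiter
 * is a backslash, c == '\\' and c != end cannot both hold, so a
 * backslash is never treated as an escape.
 */
static bool
is_escape_implicit(int c, int end, bool rflag)
{
	return !rflag && c == '\\' && c != end;
}

/*
 * Explicit form: force raw mode when the delimiter is a backslash,
 * then test only for the backslash itself.
 */
static bool
is_escape_explicit(int c, int end, bool rflag)
{
	if (end == '\\')
		rflag = true;	/* a backslash delimiter disables escaping */
	return !rflag && c == '\\';
}

/* Exhaustively check that both forms agree for every byte pair. */
static void
check_forms_agree(void)
{
	for (int c = 0; c < 256; c++)
		for (int end = 0; end < 256; end++)
			for (int r = 0; r < 2; r++)
				assert(is_escape_implicit(c, end, r != 0) ==
				    is_escape_explicit(c, end, r != 0));
}
```

Calling check_forms_agree() from a test program walks all 256 x 256
byte pairs in both modes and aborts if the two forms ever differ.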

What I once more can't parse in the POSIX specification is whether
it shall be interpreted as "outside raw mode, the sequence backslash
plus delimiter is a continuation line", or as "outside raw mode, any
escaped delimiter is a continuation line, as is the escaped newline".

For me, the "either newline or other" has to be interpreted as an
exclusive or, but am I right? But in that case it covers the whole
range "newline and not newline", so why not simply state that an
escaped line delimiter, when not in raw mode, is a continuation line
(having stated once and for all that a backslash as delimiter implies
raw mode)?

And they should start by stating that the input is a sequence of
lines, each line being a sequence of bytes ending at the first
occurrence of a delimiter byte, which is the newline by default but
can be set to any byte with the -d option.

Then, that a record can span multiple lines if there are continuation
lines, that is, when not in raw mode, lines whose end delimiter is
escaped.

And that read reads one record, discarding the escaped delimiters of
continuation lines and replacing escape sequences (when not in raw
mode), and then splits the record according to the following rules.
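
As an illustration of the model described above, here is a minimal
sketch of the record-assembly step only. It follows the reading
proposed in this message (an escaped delimiter, whatever the delimiter
is, marks a continuation line, and a backslash delimiter implies raw
mode); it is neither the POSIX wording nor the sh(1) implementation.
The function read_record and all of its parameters are hypothetical.
It works on an in-memory buffer rather than a file descriptor, and it
does not record which bytes were escaped, which real field splitting
would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: assemble one record from in[0..inlen) into
 * out[0..outcap).  Returns the number of input bytes consumed and
 * stores the record length in *outlen.  Bytes that do not fit in
 * out are silently dropped in this sketch.
 */
static size_t
read_record(const char *in, size_t inlen, char end, bool rflag,
    char *out, size_t outcap, size_t *outlen)
{
	size_t i = 0, n = 0;

	assert(in != NULL && out != NULL && outlen != NULL);

	if (end == '\\')
		rflag = true;	/* a backslash delimiter implies raw mode */

	while (i < inlen) {
		char c = in[i++];

		if (!rflag && c == '\\') {
			if (i == inlen)
				break;		/* trailing backslash: dropped here */
			c = in[i++];
			if (c == end)
				continue;	/* escaped delimiter: continuation */
			/* any other escaped byte is kept literally */
		} else if (c == end) {
			break;			/* unescaped delimiter ends the record */
		}
		if (n < outcap)
			out[n++] = c;
	}
	*outlen = n;
	return i;
}
```

With a newline delimiter and raw mode off, the six input bytes
a, backslash, newline, b, newline, c give the two-byte record "ab"
and consume five bytes; with raw mode on, the same input yields the
record a, backslash and stops at the first newline.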

-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

