NetBSD-Bugs archive
Re: bin/59029: cut(1) -n argument doesn't work (presently unsupported, though documented)
The following reply was made to PR bin/59029; it has been noted by GNATS.
From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: gutteridge%netbsd.org@localhost
Subject: Re: bin/59029: cut(1) -n argument doesn't work (presently unsupported, though documented)
Date: Thu, 13 Feb 2025 11:46:16 +0700
Date: Thu, 13 Feb 2025 02:55:01 +0000 (UTC)
From: "David H. Gutteridge via gnats" <gnats-admin%NetBSD.org@localhost>
Message-ID: <20250213025501.B38C81A923C%mollari.NetBSD.org@localhost>
| It would be good to find an illustration of
| where the two approaches give varied output.)
My guess (without testing it) is that the difference would show up in a
file which, up to some point, has been all single-byte (eg: ASCII) chars,
and which then has, at offset (say) 100
100  A    B    XX   YYY  ZZ   C    D    E    F    G    H
where the repeated letters mean one character that has a multi-byte
encoding (one letter per byte), not two X chars, and the spaces are just
padding for this e-mail.
In that scheme, using -b the bytes would count
100  A    B    XX   YYY  ZZ   C    D    E    F    G    H
     ^    ^    ^    ^    ^    ^    ^    ^    ^    ^    ^
     100  101  102  104  107  109  110  111  112  113  114
(with the missing byte numbers being the additional bytes
needed to encode the multi-byte characters, which don't easily
fit in this display unless I add more lines).
But using -c the counts would be
100  A    B    XX   YYY  ZZ   C    D    E    F    G    H
     ^    ^    ^    ^    ^    ^    ^    ^    ^    ^    ^
     100  101  102  103  104  105  106  107  108  109  110
Specifying 109 as a position in a -b list means cut at the 'C', whereas
specifying it in a -c list means cut at the 'G'. In this case -n
is irrelevant, as no multi-byte character would be broken, but it is
clear that using the code for -c to implement the user's -b is simply
wrong, regardless of whether -n is given or not.
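
For anyone who wants to check the 'C' versus 'G' claim without building a
test file by hand, here is a minimal sketch in Python. It is only an
illustration, not cut(1) itself. It assumes UTF-8, and the sample line is
made up: 99 bytes of ASCII padding so that 'A' lands at position 100, with
é and ü standing in for the two-byte X and Z, and € for the three-byte Y.
The helper names byte_at and char_at are hypothetical.

```python
# Hypothetical sample line, assuming UTF-8: 99 ASCII padding bytes, then
# A B X Y Z C D E F G H, where X and Z take 2 bytes each and Y takes 3.
line = "." * 99 + "AB" + "é" + "€" + "ü" + "CDEFGH"
data = line.encode("utf-8")


def byte_at(data, pos):
    """Return the single byte at 1-based position pos (how -b counts)."""
    return data[pos - 1:pos]


def char_at(text, pos):
    """Return the character at 1-based position pos (how -c counts)."""
    return text[pos - 1]


print(byte_at(data, 109))  # byte position 109 is the 'C'
print(char_at(line, 109))  # character position 109 is the 'G'
```

The same number picks out different characters as soon as any multi-byte
character precedes it on the line, which is the case the PR needs a test
for.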
I'd assume the "special logic" you noted in the FreeBSD code is to handle
the case where a -b list includes 105 - that is, a byte offset right in the
middle of the Y character. In that case, without -n, the cut would
just happen there, right in the middle of the Y, but with -n the cut needs
to either be before Y or after it, that is, at offset 104 or 107 (which of
the two is selected is probably entirely up to the coder).
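
To make the 105 case concrete, here is a rough sketch in Python of finding
the character that a byte offset falls inside. This is not FreeBSD's code
and not a proposal for ours: it assumes UTF-8 only (a real cut would have
to go through the locale's multi-byte functions), the sample line is made
up, and char_span is a hypothetical helper name.

```python
# Hypothetical sample line, assuming UTF-8: 99 ASCII padding bytes, then
# A B X Y Z C D E F G H, where X and Z take 2 bytes each and Y takes 3.
data = ("." * 99 + "AB" + "é" + "€" + "ü" + "CDEFGH").encode("utf-8")


def char_span(data, pos):
    """Return the 1-based (first, last) byte positions of the UTF-8
    character that contains the byte at 1-based position pos."""
    i = pos - 1
    # Step back over continuation bytes (bit pattern 10xxxxxx) to reach
    # the first byte of the character.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    # Step forward over the continuation bytes that follow it.
    j = i + 1
    while j < len(data) and (data[j] & 0xC0) == 0x80:
        j += 1
    return i + 1, j


first, last = char_span(data, 105)
print(first, last)      # Y occupies bytes 104 through 106
print(first, last + 1)  # the two candidate cut points: 104 or 107
```

With -n, a list entry of 105 would then be moved to one end of that span
or the other, rather than splitting the character.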
Neither the standard -b nor -c algorithm would get that right. If you're
looking for an implementation to import to improve ours, pick FreeBSD's.
kre