tech-userlevel archive
Re: regex, signed chars and 0x80 to 0xFF
Problem found: the build uses NetBSD's own flex, which doesn't understand
this idiom, while flex-2.6.4 (pkgsrc) is able to interpret such a regex.
Nonetheless, does anyone have information about the handling of negative
chars in a regex? There are discussions, here and there, about
problems caused by a character such as '-1' being mistaken for EOF, so it
seems that a regex should always use unsigned char. But I guess a majority
of the code uses char, and only the ASCII range (whether these values
represent ASCII chars or not), i.e. the positive values, is safe.
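
To make the '-1' versus EOF confusion concrete, here is a minimal C sketch.
The function names are hypothetical, chosen only for illustration, and this
is not a description of what flex or any regex library does internally. It
assumes the usual two's-complement conversion of 0xFF to signed char (the
conversion is implementation-defined) and relies only on EOF being a
negative int, as the C standard requires.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: widen a byte the way a signed char would be
 * widened, and report whether the result is indistinguishable from EOF. */
int collides_with_eof_signed(unsigned char byte)
{
    /* Implementation-defined for values above 127; on two's-complement
     * machines 0xFF becomes -1. */
    signed char sc = (signed char)byte;
    int widened = sc;               /* sign extension happens here */
    return widened == EOF;
}

/* Same byte widened from unsigned char: always in 0..UCHAR_MAX, so it can
 * never equal EOF, which is negative. */
int collides_with_eof_unsigned(unsigned char byte)
{
    int widened = byte;
    return widened == EOF;
}

/* Hypothetical helper: the <ctype.h> functions expect a value
 * representable as unsigned char (or EOF), so convert before calling. */
int byte_is_alpha(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```

Calling collides_with_eof_signed(0xFF) from a small main() reports a
collision wherever EOF is -1, while the unsigned variant never does; this
is the reason the usual advice is to convert to unsigned char before
widening to int.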
On Sat, Dec 21, 2024 at 06:05:30PM +0100, tlaronde%kergis.com@localhost wrote:
> When trying to compile emulators/wine (pkgsrc), flex chokes with
>
> negative range in character class
>
> because (BTW, the line reported is incorrect; it reports the line _after_ the
> error) the regexes are like this:
>
> NCNameStartChar ([A-Za-z_]|[\xc0-\xd6\xd8-\xf6\xf8-\xff])
>
> Is the problem with signed chars? (But in this case, the ranges are
> valid, since they are also increasing in two's-complement coding.)
>
> Or is the problem that the \x## notation is not accepted by the
> regex library, which in fact translates \xd8-\xf6 into 8-6? (In this case,
> discarding \x as unknown, OK; but why discard the 'd' or 'f' that follows...)
>
> How are negative characters handled in a range when char is signed?
>
> Side note: is there a way to obtain from flex a better description of
> the problem?
>
> Thanks in advance for any tip or reference,
> --
> Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
> http://www.kergis.com/
> http://kertex.kergis.com/
> Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
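
On the quoted remark that the ranges stay increasing under two's-complement
coding, the following C sketch checks the endpoints both ways. The helper
names are hypothetical, and the signed conversion is implementation-defined
(two's complement is assumed); it only illustrates the arithmetic and says
nothing about how any particular flex version parses the class.

```c
#include <stdio.h>

/* Hypothetical helper: is lo-hi an ascending range when both endpoints
 * are compared as signed char? */
int range_ok_signed(unsigned char lo, unsigned char hi)
{
    return (signed char)lo <= (signed char)hi;
}

/* Same check with the endpoints compared as unsigned char. */
int range_ok_unsigned(unsigned char lo, unsigned char hi)
{
    return lo <= hi;
}
```

With these, \xc0-\xd6, \xd8-\xf6 and \xf8-\xff are ascending under both
comparisons (for example 0xc0 is -64 and 0xd6 is -42 as signed char). A
range that crosses 0x7F, such as z-\xc0, is ascending only as unsigned
char: as signed char it runs from 122 down to -64.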
--
Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C