tech-userlevel archive


Re: regex, signed chars and 0x80 to 0xFF



On 21.12.2024 at 20:21, tlaronde%kergis.com@localhost wrote:
> Problem found: NetBSD's flex, which doesn't understand the idiom, is the
> one being used, while flex-2.6.4 (from pkgsrc) is able to interpret such
> a regex.
>
> Nonetheless, does anyone have information about the handling of negative
> chars in a regex? (There are discussions, here and there, about
> problems caused by a character such as '-1' being mistaken for EOF, so it
> seems that the regex should always use unsigned char. But I guess a
> majority of the code uses char, and only the ASCII range, i.e. positive
> values, is safe, whether these values represent ASCII chars or not.)

The flex message "negative range in character class" means that in a
character range [F-L], F is greater than L. It's not about negative
character numbers but about a backwards range.

I tried to reproduce the problem by running external/bsd/flex/bin/lex
from the netbsd-8 branch, compiled on NetBSD 10.99.x, on
https://raw.githubusercontent.com/wine-mirror/wine/wine-5.0.5/dlls/msxml3/xslpattern.l,
but it processed the file without any error.

If you have the NetBSD 8 source at hand, you could add a printf
statement right above the "negative range" message in
external/bsd/flex/dist/src/parse.y:
> if ($2 > $4) fprintf(stderr, "from %d to %d\n", $2, $4);

Then rebuild flex and run it on the file. I'm curious whether you can
reproduce the message and, if so, which character numbers form the
backwards range.

Roland


