tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: regcomp() signedness issues



thoughts? (i'm probably just addressing christos@ since i think he's Mr
Regex :-) )

On Tue, Dec 10, 2024 at 2:06 PM enh <enh%google.com@localhost> wrote:

> a trivial fuzzer someone once wrote blew up on this input to regcomp()
> [passed directly to regcomp() after adding a trailing '\0']:
>
> xxd
> ~/Downloads/clusterfuzz-testcase-minimized-regexec_fuzzer-5459313584832512
> 00000000: 6a3a 5b5d 6a3a 5b5d 6a3a 5bd9 6a3a 5b5d  j:[]j:[]j:[.j:[]
>
> here:
>
> ==2830==ERROR: AddressSanitizer: SEGV on unknown address 0x50f020000093
> (pc 0x7354670e97dd bp 0x0000ffffffd9 sp 0x7fff0d906410 T0)
> ==2830==The signal is caused by a WRITE memory access.
>     #0 0x7354670e97dd in CHadd
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:1769:30
>     #1 0x7354670e84be in p_b_term
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:1233:4
>     #2 0x7354670e84be in p_bracket
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:1128:3
>     #3 0x7354670e6492 in p_ere_exp
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:521:3
>     #4 0x7354670e7c8b in p_re
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:851:19
>     #5 0x7354670e5aec in regcomp_internal
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:379:3
>     #6 0x7354670e5aec in regcomp
> bionic/libc/upstream-netbsd/lib/libc/regex/regcomp.c:432:10
>
> looking at the netbsd regex source, it seems like all accesses to `bmp`
> _do_ all have appropriate `< NC` range checks, but because wint_t is
> signed, the checks are wrong for negative values.
>
> i think you want something like this patch:
>
> diff --git a/lib/libc/regex/regcomp.c b/lib/libc/regex/regcomp.c
> index 47602b77f621..2312dbaa947c 100644
> --- a/lib/libc/regex/regcomp.c
> +++ b/lib/libc/regex/regcomp.c
> @@ -1764,8 +1764,7 @@ CHadd(struct parse *p, cset *cs, wint_t ch)
>         _DIAGASSERT(p != NULL);
>         _DIAGASSERT(cs != NULL);
>
> -       assert(ch >= 0);
> -       if (ch < NC)
> +       if ((unsigned)ch < NC)
>                 cs->bmp[(unsigned)ch >> 3] |= 1 << (ch & 7);
>         else {
>                 newwides = reallocarray(cs->wides, cs->nwides + 1,
> @@ -1778,9 +1777,9 @@ CHadd(struct parse *p, cset *cs, wint_t ch)
>                 cs->wides[cs->nwides++] = ch;
>         }
>         if (cs->icase) {
> -               if ((nch = towlower(ch)) < NC)
> +               if ((unsigned)(nch = towlower(ch)) < NC)
>                         cs->bmp[(unsigned)nch >> 3] |= 1 << (nch & 7);
> -               if ((nch = towupper(ch)) < NC)
> +               if ((unsigned)(nch = towupper(ch)) < NC)
>                         cs->bmp[(unsigned)nch >> 3] |= 1 << (nch & 7);
>         }
>  }
> diff --git a/lib/libc/regex/regex2.h b/lib/libc/regex/regex2.h
> index fbfff0daf0f8..ee37044defc9 100644
> --- a/lib/libc/regex/regex2.h
> +++ b/lib/libc/regex/regex2.h
> @@ -135,8 +135,7 @@ CHIN1(cset *cs, wint_t ch)
>  {
>         unsigned int i;
>
> -       assert(ch >= 0);
> -       if (ch < NC)
> +       if ((unsigned)ch < NC)
>                 return (((cs->bmp[(unsigned)ch >> 3] & (1 << (ch & 7))) !=
> 0) ^
>                     cs->invert);
>         for (i = 0; i < cs->nwides; i++) {
> @@ -160,8 +159,7 @@ static __inline int
>  CHIN(cset *cs, wint_t ch)
>  {
>
> -       assert(ch >= 0);
> -       if (ch < NC)
> +       if ((unsigned)ch < NC)
>                 return (((cs->bmp[(unsigned)ch >> 3] & (1 << (ch & 7))) !=
> 0) ^
>                     cs->invert);
>         else if (cs->icase)
>
> you can also see i've also removed the assert()s since i don't think
> anyone's actually building this code with them enabled, and the false sense
> of security from reading that assert is quite likely what caused that this
> bug to be introduced in the first place...
>


Home | Main Index | Thread Index | Old Index