Mirror of https://github.com/xroche/httrack.git, synced 2026-06-29 21:45:24 +03:00
* filters: decode escaped chars correctly inside *[...] classes
The escape branch in strjoker probed joker[i+2] instead of the current
char, so a backslash escape only worked as the first class member:
'*[\[\]]' (documented as "the [ or ] character") matched only ']', and
'*[a,\[]' dropped the 'a'. The loop also treated any ']' as the class
terminator, so an escaped ']' could never be a member.
Decode the escape first in the loop body: a backslash makes the next char
a literal member (only that char; the old code also added the backslash
itself), and an escaped ']' is consumed before the terminator check. As a
result '*[\[\]]' now matches both brackets, and escape decoding precedes
the range and size checks, so '\-', '\,' and '\<' become literal members.
The self-test previously pinned the buggy output as its expected result;
it now asserts the documented behavior and fails against the old matcher.
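The decoding order described above can be sketched as a standalone C loop. This is a minimal illustration, not httrack's strjoker. The helper name class_has and its signature are hypothetical. It only answers whether a single char is a member of a class body (the text after "*["), and it does not model size constraints such as "<10". The range arm also carries the guard against a dangling "x-" described in the over-read fix.

```c
#include <stddef.h>

/* Hypothetical helper, not httrack's strjoker: scan a class body such as
 * "\[\]]" (the text after "*[") and report whether c is a member.
 * Members are separated by ',', "x-y" is a range, and a backslash makes
 * the next char a literal member. Size constraints are not modeled. */
static int class_has(const char *cls, char c) {
  size_t i = 0;
  while (cls[i] != '\0') {
    if (cls[i] == '\\' && cls[i + 1] != '\0') {
      /* Escape first: only the escaped char is a member, and an escaped
       * ']' is consumed here, before the terminator check below. */
      if (cls[i + 1] == c)
        return 1;
      i += 2;
    } else if (cls[i] == ']') {
      break; /* an unescaped ']' ends the class */
    } else if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
      /* Range "x-y", guarded so a dangling "x-" never steps past the NUL. */
      if (c >= cls[i] && c <= cls[i + 2])
        return 1;
      i += 3;
    } else {
      /* Plain literal member (also reached by a truncated "x-"). */
      if (cls[i] == c)
        return 1;
      i += 1;
    }
    if (cls[i] == ',')
      i++; /* skip the member separator */
  }
  return 0;
}
```

With this ordering, class_has("\\[\\]]", '[') and class_has("\\[\\]]", ']') both return 1, and class_has("a,\\[]", 'a') still finds the 'a'. In this sketch a dangling "a-" is read as the two literal members 'a' and '-'.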
Closes #148
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
* filters: fix a 1-byte over-read on a truncated range *[a-
The *[...] class parser's range arm did i += 3 unconditionally, so a
pattern ending in a dangling '-' (e.g. *[a-) read one byte past the NUL:
joker[i+2] is the NUL, i jumps to len+1, and the separator skip and loop
guard then read joker[len+1]. Guard the range arm on joker[i+2] != '\0'
so that a truncated range falls through to the literal-member path
instead of overshooting.
The filter self-test now copies the pattern and string into exact-size
heap buffers so that a sanitizer traps such over-reads. Previously the
pattern came straight from argv, which has no redzone, so the over-read
stayed invisible. A *[a- test case now exercises it.
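The exact-size copy idea from the self-test change can be sketched as follows, assuming a build with AddressSanitizer (e.g. -fsanitize=address in GCC or Clang). The names dup_exact, run_case and matcher_fn are hypothetical and are not httrack's test code. Under ASan, the byte just past a malloc(n) block is poisoned. A matcher that reads one byte beyond the NUL of such a copy is therefore reported, whereas the same read on an argv string lands silently in the next argument or environment string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical matcher signature; stands in for the filter routine under test. */
typedef int (*matcher_fn)(const char *pattern, const char *string);

/* Copy s into a heap buffer of exactly strlen(s) + 1 bytes, with no slack,
 * so the byte after the terminating NUL is outside the allocation. */
static char *dup_exact(const char *s) {
  size_t n = strlen(s) + 1; /* payload plus NUL */
  char *p = malloc(n);
  if (p != NULL)
    memcpy(p, s, n);
  return p;
}

/* Run one self-test case on exact-size copies of both inputs.
 * Returns the matcher's result, or -1 if an allocation failed. */
static int run_case(matcher_fn match, const char *pattern, const char *string) {
  char *pat = dup_exact(pattern);
  char *str = dup_exact(string);
  int result = -1;
  if (pat != NULL && str != NULL)
    result = match(pat, str);
  free(pat); /* free(NULL) is a no-op */
  free(str);
  return result;
}
```

Passing strcmp as the matcher only smoke-tests the harness. In a real self-test the matcher would be the filter routine, and an over-read such as pat[strlen(pat) + 1] would surface as a heap-buffer-overflow report.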
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
---------
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>