Compare commits

...

9 Commits

Author SHA1 Message Date
Xavier Roche
71398d510e Add a .clang-format and a changed-lines CI format check
The engine predates clang-format (it was shaped by an old Visual Studio
formatter) and does not round-trip through it: a whole-tree reformat is ~25k
lines of churn, so we never do one. Instead we format only the lines a change
touches, via git-clang-format, and CI enforces that on the diff only.

.clang-format is reverse-engineered from src/*.c (2-space, no tabs, 80 cols,
char *x pointers, attached braces, un-indented case labels, space after C-style
casts). That is mostly LLVM defaults; the deliberate deviations are
SpaceAfterCStyleCast (the dominant "(int) x" form) and SortIncludes: false
(C include order can be significant, so never reorder).

The CI "format" job pins clang-format-19 from apt.llvm.org's noble channel
(ubuntu-24.04's native is 18) to match local dev, and fails only if a PR's
changed C lines are not clang-format-clean. Existing untouched code is left
alone.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 12:26:49 +02:00
Xavier Roche
75fc040f06 Merge pull request #331 from xroche/cleanup/htsbuff-builder
Add htsbuff: a bounded string builder over a fixed buffer
2026-06-14 10:40:23 +02:00
Xavier Roche
c4ef18f5a5 Add htsbuff: a bounded string builder over a fixed buffer
Many pointer-destination buff() sites are cursors walking a buffer of known
capacity, with a manual "p += strlen(p)" after each write (the url_savename
renderer does this ~40 times). That hand-rolled pointer math is where several
of the off-by-one hazards live.

htsbuff captures the pattern: a non-owning builder (buf/cap/len) built from an
in-scope array (htsbuff_array, capacity via sizeof) or a pointer of known size
(htsbuff_ptr). htsbuff_cat/catn/cpy bound every write against the real capacity
and abort on overflow, same contract as the *_safe_ helpers, so the pointer
math goes away.

Extend the -#8 self-test and tests/01_engine-strsafe.test with builder
correctness (append, truncating append, reset, length) and an overflow-abort
case. No call sites are converted yet; that follows per file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 10:38:22 +02:00
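The builder pattern described in c4ef18f5a5 can be sketched in a few lines. This is a minimal illustration, not the htsbuff code itself: the sketch_* names are hypothetical, and unlike htsbuff_cat (which aborts on overflow) the append below returns -1 so the failure path can be observed without killing the process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for htsbuff: a non-owning view over a fixed buffer. */
typedef struct {
  char *buf;  /* backing storage, kept NUL-terminated */
  size_t cap; /* total capacity, including the NUL */
  size_t len; /* current length, excluding the NUL */
} sketch_buff;

sketch_buff sketch_init(char *buf, size_t cap) {
  sketch_buff b;
  assert(cap != 0); /* there must be room for at least the NUL */
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  buf[0] = '\0';
  return b;
}

/* Append s. Returns 0 on success, or -1 (buffer untouched) if s does not fit.
   Assumption for the sketch only: the real htsbuff_cat aborts instead. */
int sketch_cat(sketch_buff *b, const char *s) {
  const size_t add = strlen(s);
  if (add >= b->cap - b->len)
    return -1;
  memcpy(b->buf + b->len, s, add);
  b->len += add;
  b->buf[b->len] = '\0';
  return 0;
}

/* Builds "host/path" with no manual "p += strlen(p)" between the writes. */
int sketch_render(char *dst, size_t cap, const char *host, const char *path) {
  sketch_buff b = sketch_init(dst, cap);
  if (sketch_cat(&b, host) != 0 || sketch_cat(&b, "/") != 0 ||
      sketch_cat(&b, path) != 0)
    return -1;
  return 0;
}
```

Calling sketch_render(dst, sizeof(dst), "ab", "cd") on a 6-byte array fills it exactly; on a 5-byte array the last append is refused and the buffer keeps the valid prefix "ab/".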
Xavier Roche
d76dad47f7 Merge pull request #330 from xroche/cleanup/htssafe-pointer-diagnostics
Flag unchecked pointer-destination uses of the buff() string macros
2026-06-14 08:49:26 +02:00
Xavier Roche
9c6ff54040 Bound catch_url() header buffer to its 32Kb contract
First consumer of the new buff() pointer-destination diagnostic. catch_url()
appended response headers into the caller's 'data' buffer with strcatbuff on
a char* destination, which is unchecked: a long header stream could overrun
the 32Kb buffer.

Make the capacity contract explicit (CATCH_URL_DATA_SIZE in htscatchurl.h,
used by the caller too) and append with strlcatbuff, which enforces the bound
and aborts rather than overflowing. htscatchurl.c now compiles warning-free
under the diagnostic.

The remaining raw sprintf/sscanf into the same buffer are separate items for
a later pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 08:46:03 +02:00
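As a rough sketch of what an explicit-capacity append buys over strcat on a bare char*: the helper below is hypothetical (it is not strlcatbuff, whose body is not shown in this diff) and it reports overflow with -1 where the real macro aborts. The capacity constant is shrunk so the overflow case is easy to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity; the commit's CATCH_URL_DATA_SIZE is 32768. */
#define SKETCH_DATA_SIZE 32

/* Append src to the NUL-terminated string in dst, whose total capacity is cap.
   Returns 0 on success, -1 (dst untouched) if the result would not fit. */
int sketch_lcat(char *dst, const char *src, size_t cap) {
  const char *nul = memchr(dst, '\0', cap);
  size_t used, add;
  if (nul == NULL)
    return -1; /* dst is not terminated within its capacity */
  used = (size_t) (nul - dst);
  add = strlen(src);
  if (add >= cap - used)
    return -1;
  memcpy(dst + used, src, add + 1); /* copies the NUL too */
  return 0;
}

/* Mirrors the loop body in catch_url(): the header line, then CRLF. */
int sketch_append_header(char *data, const char *line) {
  if (sketch_lcat(data, line, SKETCH_DATA_SIZE) != 0)
    return -1;
  return sketch_lcat(data, "\r\n", SKETCH_DATA_SIZE);
}
```

The point of the shared constant is that the caller's array size and the callee's bound cannot drift apart, which is what the commit does with CATCH_URL_DATA_SIZE.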
Xavier Roche
4a057514b9 Warn on unchecked pointer-destination uses of the buff() macros
strcpybuff/strcatbuff/strncatbuff only bounds-check when the destination
is a sized char[] array. For a bare char* the capacity is unknown, so the
macro silently falls back to plain strcpy/strcat/strncat while still
looking like a checked call.

On GCC/Clang, route the pointer case through __builtin_choose_expr() to a
stub carrying the 'warning' function attribute, so a compile-time warning
fires only at pointer-destination sites and points at the explicit-size
replacement (strlcpybuff/strlcatbuff). Array sites keep using the bounded
_safe_ helpers and stay quiet. The change is diagnostic only: no runtime
or ABI change, and other compilers keep the previous behavior.

Add a runtime self-test for the bounded ops behind a new -#8 debug mode,
plus tests/01_engine-strsafe.test covering both correct copies and the
abort-on-overflow guarantee.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 08:40:10 +02:00
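The dispatch in 4a057514b9 can be illustrated with a small GCC/Clang-only sketch. Assumptions: all sketch_* names are hypothetical; the array test uses __builtin_types_compatible_p (the same builtin htsbuff_must_be_array_ relies on) because the definition of HTS_IS_CHAR_BUFFER is not part of this diff; and the 'warning' attribute is left off the pointer stub so the example compiles quietly. Each helper returns a tag so the chosen path is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1 when A is a true array, 0 when it is a pointer (GCC/Clang builtins). */
#define SKETCH_IS_ARRAY(A) \
  (!__builtin_types_compatible_p(__typeof__(A), __typeof__(&(A)[0])))

enum { SKETCH_BOUNDED = 1, SKETCH_UNCHECKED = 2 };

/* Array path: the capacity is known, so the copy can be checked. */
int sketch_copy_bounded(char *dst, size_t cap, const char *src) {
  assert(strlen(src) < cap);
  memcpy(dst, src, strlen(src) + 1);
  return SKETCH_BOUNDED;
}

/* Pointer path: capacity unknown, plain strcpy. In the real header this stub
   carries __attribute__((warning(...))) so each call site is reported. */
int sketch_copy_ptr(char *dst, const char *src) {
  strcpy(dst, src);
  return SKETCH_UNCHECKED;
}

/* The branch is selected at compile time; only the chosen call is emitted. */
#define sketch_copy(A, B)                                             \
  __builtin_choose_expr(SKETCH_IS_ARRAY(A),                           \
                        sketch_copy_bounded((A), sizeof(A), (B)),     \
                        sketch_copy_ptr((A), (B)))
```

Because __builtin_choose_expr discards the unselected branch entirely, an attribute on sketch_copy_ptr would fire only where the destination really is a pointer, which is the property the commit message relies on.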
Xavier Roche
055e17b057 Merge pull request #328 from xroche/cli/header-ua-length-152
Raise the user-agent and custom-header length limits
2026-06-14 01:43:31 +02:00
Xavier Roche
d7bb97d697 Merge pull request #329 from xroche/parser/lock-background-image-237
Lock CSS background-image url() rewriting in the parser test
2026-06-14 01:37:51 +02:00
Xavier Roche
ca810ef7e3 Lock CSS background-image url() rewriting in the parser test
background-image is already captured and rewritten through the style/CSS
url() path, in both a <style> block and an inline style attribute,
with the URL unquoted, double-quoted or single-quoted. Extend the offline
parser test to cover all of these so the behavior stays locked.

closes #237
2026-06-14 01:07:42 +02:00
10 changed files with 418 additions and 10 deletions

27
.clang-format Normal file

@@ -0,0 +1,27 @@
# clang-format 19 config for the HTTrack C engine.
#
# IMPORTANT: this is applied to TOUCHED LINES ONLY (via git-clang-format / the
# CI format check). The engine was originally formatted by GNU indent / by hand
# and does NOT round-trip through clang-format, so a whole-tree reformat is
# intentionally never done. Format the lines you change; leave the rest.
#
# Reverse-engineered from src/*.c: 2-space indent, no tabs, 80 columns, pointers
# bound to the name (char *x), attached braces, un-indented case labels, and a
# space after C-style casts ((int) x). Most of that is LLVM's defaults; the
# lines below are the deliberate deviations.
BasedOnStyle: LLVM
# Engine specifics / deviations from LLVM:
SpaceAfterCStyleCast: true # "(int) x", overwhelmingly dominant (542 vs 7)
SortIncludes: false # C include order can be significant; never reorder
IncludeBlocks: Preserve # do not merge/reflow include groups
# Stated explicitly for robustness against base-style drift (these match LLVM):
IndentWidth: 2
UseTab: Never
ColumnLimit: 80
PointerAlignment: Right
IndentCaseLabels: false
SpaceBeforeParens: ControlStatements
AllowShortIfStatementsOnASingleLine: Never
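For orientation, here is a short C fragment laid out by hand to follow the conventions this config names. It is an illustration only: the function is hypothetical and the layout has not been run through clang-format-19, so treat it as a sketch of the intended style rather than verified formatter output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, written only to show the layout conventions. */
const char *sketch_size_class(double raw_size, const char *fallback) {
  /* SpaceAfterCStyleCast: a space follows the C-style cast. */
  const int size = (int) raw_size;
  /* PointerAlignment: Right binds the '*' to the name. */
  const char *label = fallback;

  assert(fallback != NULL);

  /* AllowShortIfStatementsOnASingleLine: Never keeps the body below. */
  if (size < 0)
    return "negative";

  /* IndentCaseLabels: false keeps 'case' at the 'switch' column. */
  switch (size) {
  case 0:
    label = "empty";
    break;
  case 1:
    label = "single";
    break;
  }
  return label;
}
```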


@@ -85,3 +85,61 @@ jobs:
- name: shfmt
run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh
# Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
# (it was shaped by an old Visual Studio formatter) and does not round-trip,
# so we never reformat the whole tree -- only the lines a PR touches.
format:
name: format (clang-format-19, changed lines)
if: github.event_name == 'pull_request'
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install clang-format 19 (pinned, from apt.llvm.org)
run: |
set -euo pipefail
# ubuntu-24.04's native clang-format is 18; pin 19 to match local dev.
wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
| sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
echo "deb http://apt.llvm.org/noble/ llvm-toolchain-noble-19 main" \
| sudo tee /etc/apt/sources.list.d/llvm-19.list >/dev/null
sudo apt-get update
sudo apt-get install -y --no-install-recommends clang-format-19
# git-clang-format driver, pinned to an immutable release tag (not a
# moving branch) since we curl and then execute it.
sudo curl -fsSL -o /usr/local/bin/git-clang-format \
https://raw.githubusercontent.com/llvm/llvm-project/llvmorg-19.1.7/clang/tools/clang-format/git-clang-format
sudo chmod 0755 /usr/local/bin/git-clang-format
clang-format-19 --version
- name: Check formatting of changed lines
run: |
set -euo pipefail
git fetch --no-tags origin \
"+refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}"
base="origin/${{ github.base_ref }}"
set +e
diff="$(git clang-format --binary clang-format-19 --style=file \
--diff --extensions c,h "$base")"
rc=$?
set -e
# Classify by output first: a non-empty diff means "not clean",
# regardless of the driver's exit convention (the release-tag driver
# exits 0 and signals via stdout; some packaged drivers exit 1 on a
# diff). A nonzero exit with clean output is a real checker error.
case "$diff" in
"" | "no modified files to format" | *"did not modify any files"*)
if [ "$rc" -ne 0 ]; then
echo "::error::git clang-format failed (exit $rc): checker error."
exit 1
fi
echo "Formatting OK: changed C lines are clang-format-clean." ;;
*)
echo "$diff"
echo "::error::Changed C lines are not clang-format-clean."
echo "Fix locally with: git clang-format --binary clang-format-19 $base"
exit 1 ;;
esac


@@ -201,8 +201,8 @@ HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) {
while(strnotempty(line)) {
socinput(soc, line, 1000);
treathead(NULL, NULL, NULL, &blkretour, line); // process
strcatbuff(data, line);
strcatbuff(data, "\r\n");
strlcatbuff(data, line, CATCH_URL_DATA_SIZE);
strlcatbuff(data, "\r\n", CATCH_URL_DATA_SIZE);
}
// final CR/LF of the header is unnecessary: already added via the empty line just above
//strcatbuff(data,"\r\n");


@@ -40,6 +40,9 @@ Please visit our Website: http://www.httrack.com
/* Library internal definitions */
#ifdef HTS_INTERNAL_BYTECODE
// Capacity contract for the catch_url() 'data' buffer (32 Kb).
#define CATCH_URL_DATA_SIZE 32768
// Functions
void socinput(T_SOC soc, char *s, int max);


@@ -140,6 +140,93 @@ static void basic_selftests(void) {
md5selftest();
}
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
Returns 0 if every bounded operation behaved correctly, 1 otherwise.
The abort-on-overflow guarantee is checked separately by the -#8 "overflow"
sub-mode (it aborts the process by design). */
static int string_safety_selftests(void) {
char buf[8];
/* strcpybuff into a sized array: exact copy */
strcpybuff(buf, "abc");
if (strcmp(buf, "abc") != 0)
return 1;
/* strcatbuff append within capacity */
strcatbuff(buf, "de");
if (strcmp(buf, "abcde") != 0)
return 1;
/* strncatbuff appends at most N source chars */
strcpybuff(buf, "ab");
strncatbuff(buf, "cdef", 2);
if (strcmp(buf, "abcd") != 0)
return 1;
/* strlcpybuff: explicit-capacity copy into a pointer destination, the form
the migration moves toward */
{
char storage[8];
char *const p = storage;
strlcpybuff(p, "hello", sizeof(storage));
if (strcmp(p, "hello") != 0)
return 1;
}
/* strcpybuff into a pointer destination: routes through the unchecked
strcpybuff_ptr_ fallback (the path the pointer-destination warning flags). The
warning is intentional here; we only verify the fallback still copies
correctly. */
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wattribute-warning"
#endif
{
char storage[8];
char *const p = storage;
strcpybuff(p, "ptr");
if (strcmp(p, "ptr") != 0)
return 1;
}
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
/* htsbuff: bounded builder over a fixed array (append, truncating append,
reset, and length tracking) */
{
char dst[8];
htsbuff b = htsbuff_array(dst);
htsbuff_cat(&b, "ab");
htsbuff_cat(&b, "cd");
if (strcmp(htsbuff_str(&b), "abcd") != 0 || b.len != 4)
return 1;
htsbuff_catn(&b, "efghij", 2); /* append at most 2 */
if (strcmp(htsbuff_str(&b), "abcdef") != 0)
return 1;
htsbuff_cpy(&b, "xyz"); /* reset */
if (strcmp(htsbuff_str(&b), "xyz") != 0 || b.len != 3)
return 1;
}
/* boundary: filling to exactly cap-1 must succeed (one more aborts, which the
-#8 overflow-buff mode checks) */
{
char d2[4];
htsbuff c = htsbuff_array(d2);
htsbuff_cat(&c, "abc");
if (strcmp(htsbuff_str(&c), "abc") != 0 || c.len != 3)
return 1;
}
return 0;
}
static int hts_main_internal(int argc, char **argv, httrackp * opt);
// Main: reads the parameters and calls the robot
@@ -2437,6 +2524,35 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
htsmain_free();
return 0;
break;
case '8': /* string-safety selftest: httrack -#8 [overflow <bigstr>] */
if (na + 1 < argc
&& strncmp(argv[na + 1], "overflow", 8) == 0) {
/* Deliberately exceed a sized buffer: the bounded op must
abort. The source comes from argv so its length is opaque
to the compiler (no static -Wstringop-overflow, genuine
runtime check). "overflow-buff" exercises htsbuff. */
char small[4];
const char *const src =
(na + 2 < argc) ? argv[na + 2] : "overflowing";
if (strcmp(argv[na + 1], "overflow-buff") == 0) {
htsbuff b = htsbuff_array(small);
htsbuff_cat(&b, src);
} else {
strcpybuff(small, src);
}
printf("strsafe: NOT aborted\n"); /* must be unreachable */
htsmain_free();
return 1;
} else {
const int err = string_safety_selftests();
printf("strsafe: %s\n", err ? "FAIL" : "OK");
htsmain_free();
return err;
}
break;
case '7': // hashtable selftest: httrack -#7 nb_entries
basic_selftests();
if (++na < argc) {


@@ -409,7 +409,7 @@ void help_catchurl(const char *dest_path) {
if (soc != INVALID_SOCKET) {
char BIGSTK url[HTS_URLMAXSIZE * 2];
char method[32];
char BIGSTK data[32768];
char BIGSTK data[CATCH_URL_DATA_SIZE];
url[0] = method[0] = data[0] = '\0';
//


@@ -123,41 +123,111 @@ static HTS_UNUSED void htssafe_compile_time_check_(void) {
(void) check_pointer;
}
/*
* Pointer-destination diagnostics for the buff() macros (GCC/Clang, C only).
*
* strcpybuff()/strcatbuff()/strncatbuff() bounds-check only when the
* destination is a sized char[] array (HTS_IS_CHAR_BUFFER). For a bare char*
* the capacity is unknown, so the macro silently falls back to plain
* strcpy()/strcat()/strncat() while still looking like a checked call.
*
* These stubs route that pointer case through __builtin_choose_expr() so the
* 'warning' attribute fires only at pointer-destination sites; array sites use
* the bounded *_safe_ helpers and stay quiet. The warning names the
* explicit-size replacement (strlcpybuff/strlcatbuff). Diagnostic only: no
* runtime or ABI change, built only on GCC/Clang in C mode. Other compilers
* (MSVC, ...) keep the previous behavior via the #else branches.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#if defined(__has_attribute)
#if __has_attribute(warning)
#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline, warning(msg)))
#endif
#endif
#ifndef HTS_BUFF_PTR_ATTR
/* 'warning' attribute unavailable: keep noinline so the migration can still
grep for these symbols, but no compile-time diagnostic is emitted. */
#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline))
#endif
HTS_BUFF_PTR_ATTR("strcpybuff() destination is a pointer (capacity unknown): "
"NOT bounds-checked; use strlcpybuff(dst, src, size)")
static char *strcpybuff_ptr_(char *dest, const char *src) {
return strcpy(dest, src);
}
HTS_BUFF_PTR_ATTR("strcatbuff() destination is a pointer (capacity unknown): "
"NOT bounds-checked; use strlcatbuff(dst, src, size)")
static char *strcatbuff_ptr_(char *dest, const char *src) {
return strcat(dest, src);
}
HTS_BUFF_PTR_ATTR("strncatbuff() destination is a pointer (capacity unknown): "
"NOT bounds-checked; use strlcatbuff(dst, src, size)")
static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
return strncat(dest, src, n);
}
#endif
/**
* Append at most N characters from "B" to "A".
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
* is assumed to be the capacity of this array.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#define strncatbuff(A, B, N) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
strncatbuff_ptr_((A), (B), (N)) )
#else
#define strncatbuff(A, B, N) \
( HTS_IS_NOT_CHAR_BUFFER(A) \
? strncat(A, B, N) \
: strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
#endif
/**
* Append characters of "B" to "A".
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
* is assumed to be the capacity of this array.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#define strcatbuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
strcatbuff_ptr_((A), (B)) )
#else
#define strcatbuff(A, B) \
( HTS_IS_NOT_CHAR_BUFFER(A) \
? strcat(A, B) \
: strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
#endif
/**
* Copy characters from "B" to "A".
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
* is assumed to be the capacity of this array.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#define strcpybuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
strcpy_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__), \
strcpybuff_ptr_((A), (B)) )
#else
#define strcpybuff(A, B) \
( HTS_IS_NOT_CHAR_BUFFER(A) \
? strcpy(A, B) \
: strcpy_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) )
#endif
/**
* Append characters of "B" to "A", "A" having a maximum capacity of "S".
@@ -217,6 +287,81 @@ static HTS_INLINE HTS_UNUSED char* strcpy_safe_(char *const dest, const size_t s
return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1, exp, file, line);
}
/**
* htsbuff: a non-owning bounded string builder over a fixed buffer.
*
* Companion to the strcpybuff()/strcatbuff() macros for the common case of a
* cursor walking a buffer of known capacity (building a name into a fixed
* array, assembling a status line, etc.). It tracks the write position, bounds
* every write against the real capacity, and aborts on overflow (same contract
* as the *_safe_ helpers), so the error-prone manual "p += strlen(p)" dance
* goes away.
*
* Build one from an in-scope array with htsbuff_array() (capacity via sizeof,
* so pass an array, not a pointer), or from a pointer of known capacity with
* htsbuff_ptr(). The buffer is kept NUL-terminated; htsbuff_str() returns it.
*/
typedef struct {
char *buf; /* backing buffer (kept NUL-terminated) */
size_t cap; /* total capacity of buf, including the NUL */
size_t len; /* current length, excluding the NUL */
} htsbuff;
static HTS_INLINE HTS_UNUSED htsbuff htsbuff_ptr_(char *buf, size_t cap) {
htsbuff b;
b.buf = buf;
b.cap = cap;
b.len = 0;
assertf(cap != 0);
buf[0] = '\0';
return b;
}
/**
* Builder over the in-scope array ARR (capacity = sizeof(ARR)).
* On GCC/Clang this rejects a non-array (e.g. a char* pointer), whose sizeof
* would be the pointer size and silently wrong; use htsbuff_ptr() for pointers.
* On other compilers there is no such guard, so pass only true arrays there.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
/* 0 for an array, a -1 array-size compile error for a pointer. */
#define htsbuff_must_be_array_(A) \
(sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), typeof(&(A)[0]))]) - 1)
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
#else
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR))
#endif
/** Builder over pointer P of known capacity N (N includes the NUL). */
#define htsbuff_ptr(P, N) htsbuff_ptr_((P), (N))
/** Append at most n characters of s (stopping at its NUL). Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s, size_t n) {
const size_t add = strnlen(s, n);
/* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The
maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so
'add < cap - len' cannot wrap the way 'len + add < cap' could. */
assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
memcpy(b->buf + b->len, s, add);
b->len += add;
b->buf[b->len] = '\0';
}
/** Append s. Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_cat(htsbuff *b, const char *s) {
htsbuff_catn(b, s, (size_t) -1);
}
/** Reset content to s. Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_cpy(htsbuff *b, const char *s) {
b->len = 0;
htsbuff_catn(b, s, (size_t) -1);
}
/** Current NUL-terminated content. */
static HTS_INLINE HTS_UNUSED const char *htsbuff_str(const htsbuff *b) {
return b->buf;
}
#define malloct(A) malloc(A)
#define calloct(A,B) calloc((A), (B))
#define freet(A) do { if ((A) != NULL) { free(A); (A) = NULL; } } while(0)
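The comment in htsbuff_catn about keeping 'add' alone on one side of the comparison can be checked with plain arithmetic. A minimal sketch, with hypothetical function names, contrasting the two forms of the capacity test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive form: 'len + add' is unsigned arithmetic and wraps for a huge 'add',
   so an oversized append can look as if it fits. */
int sketch_fits_naive(size_t len, size_t add, size_t cap) {
  return len + add < cap;
}

/* Form used by htsbuff_catn: with the invariant len < cap, 'cap - len' is at
   least 1 and cannot underflow, and 'add' is never combined with anything. */
int sketch_fits_safe(size_t len, size_t add, size_t cap) {
  assert(len < cap);
  return add < cap - len;
}
```

With len = 3, cap = 8 and add = SIZE_MAX, the naive sum wraps around to 2 and the check passes, while the subtraction form compares SIZE_MAX against 5 and rejects it. The strict '<' also reserves the byte for the terminating NUL: add = 5 is refused, add = 4 is accepted.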


@@ -99,17 +99,25 @@ grep -Eq 'srcset="j\.gif 2x"' "$saved" ||
! grep -Eq 'srcset="[^"]*file://' "$saved" ||
! echo "FAIL: a file:// URL survived inside a rewritten srcset attribute" || exit 1
# xlink:href (#298) and inline background-image (#237): detected and rewritten
# to local; no-detect attributes (title, alt, ...) left untouched. Asserted by
# rewrite (deterministic), not download. data-* (#201/#203) is omitted: its
# detection is currently nondeterministic and can't be locked yet.
# xlink:href (#298) and CSS background-image (#237): detected and rewritten to
# local. background-image is covered in both a <style> block and an
# inline style attribute, with the URL unquoted, double-quoted and single-quoted
# (the quote style is preserved on rewrite). No-detect attributes (title, alt,
# ...) are left untouched. Asserted by rewrite (deterministic), not download.
# data-* (#201/#203) is omitted: its detection is currently nondeterministic and
# can't be locked yet.
site2="$tmp/attrs"
mkdir -p "$site2"
for f in xl ibg tt; do gif "$site2/$f.gif"; done
for f in xl ibg ibgs cex cexd cexs tt; do gif "$site2/$f.gif"; done
cat >"$site2/index.html" <<EOF
<html><body>
<html><head><style>
.a { background-image: url(file://$site2/cex.gif); }
.b { background-image: url("file://$site2/cexd.gif"); }
.c { background-image: url('file://$site2/cexs.gif'); }
</style></head><body>
<a xlink:href="file://$site2/xl.gif">xlink:href (#298)</a>
<div style="background-image:url(file://$site2/ibg.gif)"></div>
<div style="background-image:url('file://$site2/ibgs.gif')"></div>
<span title="file://$site2/tt.gif">excluded attribute</span>
</body></html>
EOF
@@ -121,8 +129,24 @@ test -n "$saved2" || ! echo "FAIL: saved attrs page not found" || exit 1
# detected attributes: the absolute URL is rewritten to a local link
grep -Eq 'xlink:href="xl\.gif"' "$saved2" ||
! echo "FAIL #298: xlink:href not detected/rewritten" || exit 1
# #237 <style> block, each quoting form, quote style preserved
grep -Eq 'url\(cex\.gif\)' "$saved2" ||
! echo "FAIL #237: unquoted background-image in <style> not rewritten" || exit 1
grep -Eq 'url\("cexd\.gif"\)' "$saved2" ||
! echo "FAIL #237: double-quoted background-image in <style> not rewritten" || exit 1
grep -Eq "url\('cexs\.gif'\)" "$saved2" ||
! echo "FAIL #237: single-quoted background-image in <style> not rewritten" || exit 1
# #237 inline style attribute, unquoted and single-quoted url()
grep -Eq 'style="background-image:url\(ibg\.gif\)"' "$saved2" ||
! echo "FAIL #237: inline background-image url() not detected/rewritten" || exit 1
! echo "FAIL #237: inline unquoted background-image not rewritten" || exit 1
grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
! echo "FAIL #237: inline single-quoted background-image not rewritten" || exit 1
# no file:// URL survived inside any rewritten background-image
! grep -Eq 'background-image:[^;"]*file://' "$saved2" ||
! echo "FAIL #237: a file:// URL survived inside a rewritten background-image" || exit 1
# excluded attribute: title is on the no-detect list, so its value is left as-is
grep -q 'title="file://' "$saved2" ||

34
tests/01_engine-strsafe.test Executable file

@@ -0,0 +1,34 @@
#!/bin/bash
#
# htssafe.h bounded string operations (driven by 'httrack -#8').
# Success path: every bounded op (strcpybuff/strcatbuff/strncatbuff/strlcpybuff)
# must behave correctly. Like the other -# debug modes, a trailing token is
# required (a bare '-#8' falls through to the usage screen).
out=$(httrack -#8 run)
test $? -eq 0 || exit 1
test "$out" == "strsafe: OK" || exit 1
# Overflow path: an over-capacity write into a sized buffer must be caught by
# the bounded macro and abort the process, not be silently truncated/completed.
# Assert the htssafe abort signature specifically, so the test cannot pass for
# an unrelated reason (e.g. the -#8 mode being gone and falling through to the
# usage screen, which also exits non-zero).
err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1)
case "$err" in
*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;;
*"overflow while copying"*) ;;
*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;;
esac
# Same guarantee for the htsbuff builder. The source is exactly the buffer
# capacity (4 characters into a 4-byte buffer, leaving no room for the NUL),
# so this also pins the boundary: a '<=' off-by-one in the capacity check would
# let it through (and print "NOT aborted"). Match the specific htsbuff abort
# message, not just any assert.
err=$(httrack -#8 overflow-buff "abcd" 2>&1)
case "$err" in
*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;;
*"htsbuff append overflow"*) ;;
*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;;
esac
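The same abort-on-overflow guarantee can also be checked from C on a POSIX system by running the write in a child process. This sketch swaps in a different technique from the shell test above: it inspects the child's termination signal instead of matching the abort message. Everything named sketch_* is hypothetical, and it assumes the overflow handler ends in abort(); how assertf terminates the process is not shown in this diff.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bounded copy: aborts when src (plus its NUL) does not fit. */
void sketch_copy_or_abort(char *dst, size_t cap, const char *src) {
  const size_t n = strlen(src);
  if (n >= cap)
    abort();
  memcpy(dst, src, n + 1);
}

/* Runs the copy into a 4-byte buffer in a child process.
   Returns 1 if the child died from SIGABRT, 0 if it exited normally,
   and -1 if fork() or waitpid() failed. */
int sketch_copy_aborts(const char *src) {
  int status = 0;
  const pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    char small[4];
    sketch_copy_or_abort(small, sizeof(small), src);
    _exit(0); /* reached only when the copy fit */
  }
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
    return 1;
  return 0;
}
```

As in the shell test, "abcd" is the interesting input: four characters into a four-byte buffer must abort, while "abc" must pass.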


@@ -20,6 +20,7 @@ TESTS = \
01_engine-mime.test \
01_engine-parse.test \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \
10_crawl-simple.test \
11_crawl-cookies.test \