Files
httrack/tests/01_engine-cache.test
Xavier Roche 83ff148efd Add an in-process cache create/read/update self-test
Wire a new `httrack -#A <dir>` debug option that exercises the ZIP cache
end to end through the public API (cache_init / cache_add / cache_readex),
in a dedicated source file (htscache_selftest.c).

It stores, then reads back asserting every header field and the body
round-trip exactly:
- hand-crafted edge cases: a normal HTML page, an empty redirect with a
  near-limit location, a non-HTML body kept in cache via all-in-cache, and
  a binary body with embedded NUL and high bytes (compared with memcmp);
- a few thousand small entries, to stress the index/lookup at scale;
- a few large compressible and incompressible bodies, to exercise zlib
  deflate/inflate and large-buffer handling.

It then updates one entry and confirms the new value is read back. The
driver returns the number of mismatches so failures are observable. The
whole cache weighs ~1-2 MB and the run takes a fraction of a second.

The location case is sized to the cache's real per-header-line round-trip
limit: cached headers are parsed through a HTS_URLMAXSIZE-sized line
buffer, so a value longer than that is truncated on read regardless of
the larger r.location buffer; 1000 bytes stays safely under it.

A dedicated test (tests/01_engine-cache.test) drives the option, asserts
the success line, that a ZIP cache was written, and that its footprint
stays under a sane ceiling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 17:47:04 +02:00

47 lines
1.7 KiB
Bash
Executable File

#!/bin/bash
#
# Cache create/read/update logic (driven by 'httrack -#A <dir>').
#
# The in-process self-test stores several hand-crafted edge entries (normal
# HTML, an empty redirect with a near-limit location, a non-HTML body kept via
# all-in-cache, a binary body with embedded NUL/high bytes), a few thousand
# small entries (index/lookup scale), and a few large compressible and
# incompressible bodies (zlib deflate/inflate). It reads everything back
# asserting every header field and the body round-trip byte for byte, then
# updates one entry and confirms the new value is read back. It exits non-zero
# on the first mismatch.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Like the other -# debug modes, a trailing token (the working directory) is
# required; a bare '-#A' falls through to the usage screen.
out=$(httrack -#A "$dir")
# Match the exact success line, so the test cannot pass for an unrelated reason
# (e.g. the -#A mode being gone and falling through to the usage screen, which
# also exits non-zero but never prints this).
test "$out" = "cache-selftest: OK" || {
echo "expected 'cache-selftest: OK', got: $out" >&2
exit 1
}
# The self-test must have actually produced a ZIP cache on disk.
test -e "$dir/hts-cache/new.zip" || {
echo "no ZIP cache was written by the self-test" >&2
exit 1
}
# Sanity-check the cache footprint: the few-thousand-entry pass is expected to
# weigh ~1-2 MB. Fail if it balloons well past that (e.g. a per-entry overhead
# regression or runaway growth), so the cache size stays bounded.
ceiling=$((4 * 1024 * 1024))
bytes=$(du -sb "$dir/hts-cache" | cut -f1)
test "$bytes" -le "$ceiling" || {
echo "cache footprint $bytes bytes exceeds ${ceiling} ceiling" >&2
exit 1
}