mirror of
https://github.com/xroche/httrack.git
synced 2026-06-28 04:57:49 +03:00
Wire a new `httrack -#A <dir>` debug option that exercises the ZIP cache end to end through the public API (cache_init / cache_add / cache_readex), in a dedicated source file (htscache_selftest.c). It stores entries, then reads them back, asserting that every header field and the body round-trip exactly:

- hand-crafted edge cases: a normal HTML page, an empty redirect with a near-limit location, a non-HTML body kept in cache via all-in-cache, and a binary body with embedded NUL and high bytes (compared with memcmp);
- a few thousand small entries, to stress the index/lookup at scale;
- a few large compressible and incompressible bodies, to exercise zlib deflate/inflate and large-buffer handling.

It then updates one entry and confirms the new value is read back. The driver returns the number of mismatches so failures are observable. The whole cache weighs ~1-2 MB and the run takes a fraction of a second.

The location case is sized to the cache's real per-header-line round-trip limit: cached headers are parsed through a HTS_URLMAXSIZE-sized line buffer, so a value longer than that is truncated on read regardless of the larger r.location buffer; 1000 bytes stays safely under it.

A dedicated test (tests/01_engine-cache.test) drives the option, asserts the success line, checks that a ZIP cache was written, and checks that its footprint stays under a sane ceiling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
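tests/01_engine-cache.test runs under `set -eu`, so a non-zero exit from `httrack -#A` aborts the script at the `out=$(...)` assignment, before the `expected ... got` diagnostic can run. The test still fails, just without the captured output. A minimal sketch, assuming one wanted to keep that output in the failure message, checks the exit status explicitly. `fake_selftest` and `run_selftest` are hypothetical names; the stub stands in for `httrack -#A` so the snippet runs without httrack, and its failure text is invented.

```shell
#!/bin/bash
set -eu

# Hypothetical stand-in for 'httrack -#A <dir>' so this runs without httrack:
# prints the real success line, or, when SELFTEST_FAIL is set, an invented
# failure message and a non-zero status.
fake_selftest() {
	if [ -n "${SELFTEST_FAIL:-}" ]; then
		echo "stub: simulated cache mismatch"
		return 1
	fi
	echo "cache-selftest: OK"
}

# Run the self-test against directory "$1", keeping its output and status.
# Using the assignment as an 'if' condition exempts it from 'set -e', so a
# failure is reported together with the captured output instead of aborting.
run_selftest() {
	local out
	if ! out=$(fake_selftest "$1"); then
		echo "self-test exited non-zero, output: $out" >&2
		return 1
	fi
	# Same exact-match check as the test script.
	if [ "$out" != "cache-selftest: OK" ]; then
		echo "expected 'cache-selftest: OK', got: $out" >&2
		return 1
	fi
	echo "PASS"
}
```

In a real test, `fake_selftest "$1"` would be replaced by `httrack -#A "$1"`. Note that `local out` is declared separately: `local out=$(...)` would return the status of `local`, hiding the failure.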
47 lines
1.7 KiB
Bash
Executable File
#!/bin/bash
#

# Cache create/read/update logic (driven by 'httrack -#A <dir>').
#
# The in-process self-test stores several hand-crafted edge entries (normal
# HTML, an empty redirect with a near-limit location, a non-HTML body kept via
# all-in-cache, a binary body with embedded NUL/high bytes), a few thousand
# small entries (index/lookup scale), and a few large compressible and
# incompressible bodies (zlib deflate/inflate). It reads everything back,
# asserting that every header field and the body round-trip byte for byte,
# then updates one entry and confirms the new value is read back. It exits
# non-zero on the first mismatch.

set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Like the other -# debug modes, a trailing token (the working directory) is
# required; a bare '-#A' falls through to the usage screen.
out=$(httrack -#A "$dir")

# Match the exact success line, so the test cannot pass for an unrelated reason
# (e.g. the -#A mode being gone and falling through to the usage screen, which
# also exits non-zero but never prints this).
test "$out" = "cache-selftest: OK" || {
	echo "expected 'cache-selftest: OK', got: $out" >&2
	exit 1
}

# The self-test must have actually produced a ZIP cache on disk.
test -e "$dir/hts-cache/new.zip" || {
	echo "no ZIP cache was written by the self-test" >&2
	exit 1
}

# Sanity-check the cache footprint: the whole cache, including the
# few-thousand-entry pass, is expected to weigh ~1-2 MB. Fail if it balloons
# well past that (e.g. a per-entry overhead regression or runaway growth), so
# the cache size stays bounded.
ceiling=$((4 * 1024 * 1024))
bytes=$(du -sb "$dir/hts-cache" | cut -f1)
test "$bytes" -le "$ceiling" || {
	echo "cache footprint $bytes bytes exceeds the ${ceiling}-byte ceiling" >&2
	exit 1
}
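The footprint check relies on `du -b`, a GNU coreutils option (apparent size in bytes); the BSD and macOS `du` do not accept `-b`, so the check is tied to GNU userland. A minimal portable sketch, assuming only POSIX `find`, `cat` and `wc` are available, sums the bytes of the regular files under the cache directory instead. `tree_bytes` and `check_footprint` are hypothetical helpers. Unlike `du -sb`, this does not count the directories' own sizes, so it reads slightly lower; against a 4 MiB ceiling that difference should not matter.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: total size in bytes of the regular files under "$1".
# Relies only on POSIX find/cat/wc, not on GNU 'du -b'. It reads every file,
# which is fine for a cache of a few megabytes.
tree_bytes() {
	# With no files, find never runs cat, and wc -c prints 0.
	# tr strips the padding some wc implementations add.
	find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Same 4 MiB bound as the test.
ceiling=$((4 * 1024 * 1024))

# Fail (status 1) if the tree under "$1" is larger than the ceiling.
check_footprint() {
	local bytes
	bytes=$(tree_bytes "$1")
	if [ "$bytes" -gt "$ceiling" ]; then
		echo "cache footprint $bytes bytes exceeds the ${ceiling}-byte ceiling" >&2
		return 1
	fi
}
```

Used in place of the `du` line, this would read `check_footprint "$dir/hts-cache" || exit 1`.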
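The `test -e` check in the script only proves that `new.zip` exists; an empty or truncated file would also pass. A stricter sketch, assuming the archive holds at least one entry (so that, per PKWARE's ZIP APPNOTE, it begins with a local file header whose signature bytes are `50 4b 03 04`, i.e. `PK\003\004`), compares the first four bytes. `looks_like_zip` is a hypothetical helper, and this only sniffs the format; it does not validate the archive's contents.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: succeed if "$1" starts with the ZIP local file header
# signature 50 4b 03 04 ("PK\003\004"). This sniffs the format only; it does
# not validate the archive's contents.
looks_like_zip() {
	local magic
	# Dump the first 4 bytes as hex, then strip od's spacing and newline.
	# A missing or empty file yields an empty string and therefore fails.
	magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
	[ "$magic" = "504b0304" ]
}
```

In the test it could follow the existence check, e.g. `looks_like_zip "$dir/hts-cache/new.zip" || { echo "new.zip is not a ZIP archive" >&2; exit 1; }`.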