1 commit

Author SHA1 Message Date
Xavier Roche
09e3b80520 tests: lock "no error pages" (-o0) write-suppression (#17)
#17 (WinHTTrack 3.47-19, 2013) reported 404 error pages and 0-byte files
kept and unpurged with "no error pages" set. It does not reproduce on
current master/Linux: -o0 keeps 4xx/5xx bodies off disk and out of the
purge list, a genuine 0-byte 200 is correctly saved, and purge removes
stale files on update. The report's .html names were the extension-mangle
bug (Defect A, fixed in #408 — the reporter switched to HTTP/1.0 because
binaries were renamed .html); the settings-revert-on-update path is fixed
by the hts_tristate option work (4549ec3, #413).

Add an /errpage/ route group to local-server.py and 23_local-errpage.test
locking -o0 suppression with an -o1 control. Negative-control verified:
neutering the errpage gate (htsparse.c:3902) makes the test fail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-25 17:54:47 +02:00
3 changed files with 43 additions and 1 deletion
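
The behaviour this commit locks can be summarised as a small decision rule. The sketch below is only a model of what the commit message describes (error bodies suppressed under -o0, a 0-byte 200 kept); it is not httrack's C implementation, and `should_write_body`, `expected_files` and `FIXTURE` are hypothetical names introduced for illustration.

```python
# Model of the "-o0 / -o1" rule described in the commit message.
# Assumption: "error" means any status >= 400 (the 4xx/5xx classes).


def should_write_body(status: int, error_pages: bool) -> bool:
    """Return True if the response body should end up on disk."""
    is_error = status >= 400
    if is_error and not error_pages:
        return False  # -o0: 4xx/5xx bodies stay off disk
    # Successful responses are written regardless of size (a 0-byte 200
    # is a valid file); error bodies are written only with -o1.
    return True


# The three links served under /errpage/ by the test fixture: (path, status).
FIXTURE = [("good.html", 200), ("empty.html", 200), ("missing.html", 404)]


def expected_files(error_pages: bool) -> list[str]:
    """List the fixture pages expected in the mirror for a given setting."""
    return [path for path, status in FIXTURE if should_write_body(status, error_pages)]


print(expected_files(error_pages=False))  # -o0
print(expected_files(error_pages=True))   # -o1
```

Body size is deliberately not an input: under this model an empty 200 is never confused with an error page, which is the distinction the new test checks with `--found 'errpage/empty.html'`.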

@@ -0,0 +1,19 @@
+#!/bin/bash
+# Issue #17: with "no error pages" (-o0), 4xx/5xx bodies must not be written;
+# a genuine 0-byte 200 stays. Default (-o1) writes the error page. (#17's purge
+# half also does not reproduce; the purge path is not exercised here.)
+
+set -e
+: "${top_srcdir:=..}"
+
+# -o0: 404 suppressed, good page and the legit 0-byte 200 kept.
+bash "$top_srcdir/tests/local-crawl.sh" --errors 1 \
+    --found 'errpage/good.html' \
+    --found 'errpage/empty.html' \
+    --not-found 'errpage/missing.html' \
+    httrack 'BASEURL/errpage/index.html' '-o0'
+
+# Control -o1 (default): the 404 error page is written.
+bash "$top_srcdir/tests/local-crawl.sh" --errors 1 \
+    --found 'errpage/missing.html' \
+    httrack 'BASEURL/errpage/index.html' '-o1'

@@ -62,6 +62,7 @@ TESTS = \
 	19_local-connect-fallback.test \
 	20_local-resume-loop.test \
 	21_local-intl-update.test \
-	22_local-broken-size.test
+	22_local-broken-size.test \
+	23_local-errpage.test

 CLEANFILES = check-network_sh.cache

@@ -225,6 +225,24 @@ class Handler(SimpleHTTPRequestHandler):
         self.send_header("Content-Length", "0")
         self.end_headers()

+    # error pages / 0-byte files (#17): -o0 ("no error pages") must keep 4xx/5xx
+    # bodies off disk; a genuine 0-byte 200 is a valid file and stays.
+    def route_errpage_index(self):
+        self.send_html(
+            '\t<a href="good.html">good</a>\n'
+            '\t<a href="missing.html">missing</a>\n'
+            '\t<a href="empty.html">empty</a>\n'
+        )
+
+    def route_errpage_good(self):
+        self.send_raw(b"<html><body>good page</body></html>\n", "text/html")
+
+    def route_errpage_missing(self):
+        self.send_html("\t404 error body", status=404, extra_status="Not Found")
+
+    def route_errpage_empty(self):
+        self.send_raw(b"", "text/html")
+
     # broken Content-Length (#32/#41): declared size != bytes sent. httrack
     # warns "bogus state (broken size)" and skips the cache unless -%B.
     def route_size_index(self):
@@ -265,6 +283,10 @@ class Handler(SimpleHTTPRequestHandler):
         "/resume/blob.txt": route_resume,
         "/size/index.html": route_size_index,
         "/size/oversize.bin": route_size_oversize,
+        "/errpage/index.html": route_errpage_index,
+        "/errpage/good.html": route_errpage_good,
+        "/errpage/missing.html": route_errpage_missing,
+        "/errpage/empty.html": route_errpage_empty,
     }

     # --- dispatch ----------------------------------------------------------
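
For experimenting with the /errpage/ routes outside the test suite, a self-contained stand-in may be useful. The sketch below does not reuse local-server.py's `send_html` / `send_raw` helpers, whose definitions are not part of this diff; it rebuilds the four responses on top of the standard `http.server` module. `ERRPAGE_ROUTES`, `errpage_response`, `ErrpageHandler` and `serve` are hypothetical names, and the HTML wrapper around the links is an assumption.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# path -> (status, body); mirrors the four /errpage/ routes added in the diff.
ERRPAGE_ROUTES = {
    "/errpage/index.html": (
        200,
        b"<html><body>\n"
        b'\t<a href="good.html">good</a>\n'
        b'\t<a href="missing.html">missing</a>\n'
        b'\t<a href="empty.html">empty</a>\n'
        b"</body></html>\n",
    ),
    "/errpage/good.html": (200, b"<html><body>good page</body></html>\n"),
    # A 404 that still carries a body: what -o0 must keep off disk.
    "/errpage/missing.html": (404, b"<html><body>\t404 error body</body></html>\n"),
    # A genuine 0-byte 200: a valid file that must be kept.
    "/errpage/empty.html": (200, b""),
}


def errpage_response(path: str) -> tuple[int, bytes]:
    """Look up the canned (status, body) pair; unknown paths get a plain 404."""
    return ERRPAGE_ROUTES.get(path, (404, b"not routed\n"))


class ErrpageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = errpage_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 0) -> None:
    """Serve the routes on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ErrpageHandler)
    print("serving on port", server.server_address[1])
    server.serve_forever()
```

Calling `serve()` and pointing httrack at `http://127.0.0.1:<port>/errpage/index.html` with `-o0` or `-o1` should let the two cases of the test be tried by hand.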