Compare commits

...

5 Commits

Author SHA1 Message Date
Xavier Roche
92ad109c30 Pin the naming contract: knobs and fixtures for content-independent naming
-#test=savename gains body= (leading body bytes via a temp url_sav
file) and cached= (a real one-entry cache, reopened read-only, whose
stored body is PNG magic); new rows and 01_zlib-savename-cached.test
pin that naming never depends on content or on the previously recorded
save name, only on headers. e2e fixtures (wrongtype.jpg served as
image/png, a gzip variant, a 16 KiB body, content that changes between
crawls) pin the wire-wins outcome across fresh and update passes. Any
future content-based tie-break must flip these rows explicitly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-07-04 08:58:07 +02:00
Xavier Roche
56b809c82d Make the wire-type-vs-extension naming decision an explicit verdict
Behavior-preserving refactor of wire_patches_ext: the decision becomes
a three-way wire_ext_verdict (ext kept / wire wins / contested), with
the contested case, a specific declared type disagreeing with a
specific URL extension, named explicitly instead of falling through.
Today a contested verdict trusts the wire, unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-07-04 08:58:07 +02:00
Xavier Roche
d7c4eab1f5 Delayed type check finalizes fast transfers under their .delayed placeholder name (#479)
* Fix delayed-slot races: lock guards and finalize order

A slot whose savename is still the .delayed placeholder could be
finalized and recorded under that transient name when the transfer
completed before hts_wait_delayed resolved the type (fast servers):
back_clean ignored the lock hts_wait_delayed holds, the direct-to-disk
shortcut bound the output file to the placeholder name, and the
type-known finalize ran before url_sav was patched, caching the entry
under the wrong name (the 'bogus state (incomplete type)' warnings).
Skip locked slots in back_clean and in the disk shortcut, and patch
url_sav before finalizing.

The delayed-naming body sniff (next commit) waits for body bytes and
would hit these races deterministically; 15_local-types and
18_local-update cover the flow end to end.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

* Exercise the degenerate delayed-type paths; audit .delayed leftovers everywhere

New delayed/ fixtures (302 without Location, self-loop, a chain one hop
past the redirect budget, a proper redirect, a typeless unknown-ext
file) with 33_local-delayed.test, run twice via --rerun. local-crawl.sh
now fails ANY local test whose finished mirror contains a *.delayed
temporary, guarding the whole suite against the #107 class.

Probing the degenerate paths settled a review question: afs->save can
still be a fresh placeholder when url_sav is patched (redirects that
never resolve), but nothing name-keyed is stored there -- non-OK
entries are cached header-only with an empty X-Save, the link is
dropped, no file is left. The test pins exactly that.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

* Empty responses finalized delayed slots under the .delayed name

Found while probing for a harness trigger of the finalize race: a
delayed-type URL answered 200 with Content-Length: 0 takes the
empty-response branch, which reaches READY inside the header round and
calls back_finalize directly, while url_sav still holds the .delayed
placeholder. Deterministic on any such URL: the 'bogus state
(incomplete type)' warning and an entry recorded under the placeholder
name. Empty responses are common in the wild (beacons, placeholder
files), so this is the reproducible face of #5.

Defer the finalize when the slot is locked, matching the other guards:
the waiter finalizes right after patching url_sav. The new empty.php
fixture makes 33_local-delayed fail on master and pass here; the
timing race itself remains untriggerable from the harness (headers and
body advance in separate back_wait rounds; probe notes in the PR).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

---------

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 08:37:14 +02:00
Xavier Roche
2eac19655b Widen the savename self-test and cover Content-Disposition end to end (#477)
* tests: widen the savename self-test and cover Content-Disposition end to end

-#test=savename grows key=value knobs (cdispo=, statuscode=, status=, adr=,
strip=, urlhack= and its negatives, n83=, type=) plus repeatable
prior=adr|fil|sav rows that register an already-crawled link, so the .test
can pin the still-downloading mime path, redirect delayed naming, dedup and
collision suffixing, 8-3 modes, --strip-query dedup and hostile fils
(traversal, control chars, oversized names) - the regression net for the
upcoming resolve_extension work.

A new cdispo/ endpoint in local-server.py and 32_local-cdispo.test give the
Content-Disposition branches their first end-to-end coverage, including a
traversal filename reduced to its last component.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

* review follow-ups: isolate the wire cdispo strip, add over-match negatives

The review found the e2e traversal row masked by two independent layers
(the wire parser and url_savename both strip path components), so a new
-#test=header self-test pins treathead's Content-Disposition parse alone.
Three negative rows keep dedup honest: a kept query key that differs, a
distinct URL under urlhack, and a same-basename-different-directory prior
must all produce a fresh name, not a false match. route_cdispo now reuses
send_raw via an extra_headers argument.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

---------

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 19:42:32 +02:00
Xavier Roche
83c231d50e Fold the type-probe's triplicated extension decision into one function (#476)
* htsname: extract resolve_extension() from the type-probe block

The cache, backing-headers and live-probe paths each carried the same
cdispo / wire_patches_ext / give_mimext decision, per the block's own
FIXME ("factorize this unholy mess"); fold the three copies into one
static resolve_extension(). Drop the dead DEFAULT_BIN_EXT arms (the
define has been commented out in htsconfig.h for years) and the ishtest
variable only they read.

-#test=savename gains an optional Content-Disposition argument, and the
engine test now pins the filename-replacement path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

* tests: pin cdispo reserved-char sanitization in the savename table

Review follow-up: adds a row where the expected name differs from the
Content-Disposition input, exercising the hostile-cdispo cleaning path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>

---------

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 08:38:30 +02:00
14 changed files with 620 additions and 122 deletions

View File

@@ -2237,12 +2237,13 @@ int host_wait(httrackp * opt, lien_back * back) {
static int slot_can_be_cleaned(const lien_back * back) {
return (back->status == STATUS_READY) // ready
/* Check autoclean */
&& (!back->testmode) // not test mode
&& (strnotempty(back->url_sav)) // filename exists
&& (HTTP_IS_OK(back->r.statuscode)) // HTTP "OK"
&& (back->r.size >= 0) // size>=0
;
/* Check autoclean */
&& (!back->locked) // not held by hts_wait_delayed (name pending)
&& (!back->testmode) // not test mode
&& (strnotempty(back->url_sav)) // filename exists
&& (HTTP_IS_OK(back->r.statuscode)) // HTTP "OK"
&& (back->r.size >= 0) // size>=0
;
}
static int slot_can_be_finalized(httrackp * opt, const lien_back * back) {
@@ -2891,10 +2892,10 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
// range size hack old location
#if HTS_DIRECTDISK
// Court-circuit:
// Peut-on stocker le fichier directement sur disque?
// Ahh que ca serait vachement mieux et que ahh que la mémoire vous dit merci!
if (back[i].status) {
// Shortcut: store the file directly on disk when possible,
// sparing memory
if (back[i].status &&
!back[i].locked) { // name still pending when locked
if (back[i].r.is_write == 0) { // mode mémoire
if (back[i].r.adr == NULL) { // rien n'a été écrit
if (!back[i].testmode) { // pas mode test
@@ -3960,8 +3961,12 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
&& (back[i].r.adr = (char *) malloct(2))) {
back[i].r.adr[0] = 0;
}
hts_log_print(opt, LOG_TRACE, "finalizing empty");
back_finalize(opt, cache, sback, i);
/* locked = name pending; the waiter finalizes after
patching url_sav (else: cached as .delayed, #5) */
if (!back[i].locked) {
hts_log_print(opt, LOG_TRACE, "finalizing empty");
back_finalize(opt, cache, sback, i);
}
} else if (!back[i].r.is_chunk) { // pas de chunk
//if (back[i].r.http11!=2) { // pas de chunk
back[i].is_chunk = 0;

View File

@@ -138,37 +138,66 @@ static void cleanEndingSpaceOrDot(char *s) {
}
}
/* Should the wire Content-Type override the URL's own extension when naming the
saved file? True when the type is patchable (may_unknown2) and either the URL
extension implies no specific type or the server declared a disagreeing one.
A URL extension mapping to a specific non-HTML type is kept only when the
server declared NO type (the HTS_UNKNOWN_MIME sentinel; the #267 mangle
guard): a typeless .png stays .png, but a .pdf explicitly served as text/html
is named .html. The sentinel rides the cache, so updates stay consistent. */
/* Wire Content-Type vs URL extension: a patchable wire type wins over an
unspecific ext, the HTS_UNKNOWN_MIME sentinel keeps a specific non-HTML ext
(#267 guard), a declared disagreement is CONTESTED. Sentinel and verdict
ride the cache, so updates stay consistent. */
typedef enum wire_verdict {
WIRE_KEEPS_EXT,
WIRE_WINS,
WIRE_CONTESTED
} wire_verdict;
static wire_verdict wire_ext_verdict(httrackp *opt, const char *wiremime,
const char *file, char *urlmime,
size_t urlmime_size) {
if (may_unknown2(opt, wiremime, file))
return WIRE_KEEPS_EXT; /* type kept verbatim (keep-list / bogus-multiple) */
urlmime[0] = '\0';
/* type implied by the URL extension, only when confidently known (flag 0) */
if (!get_httptype_sized(opt, urlmime, urlmime_size, file, 0))
return WIRE_WINS; /* URL ext implies no known type */
if (strfield2(wiremime, urlmime))
return WIRE_KEEPS_EXT; /* agreement (no .htm->.html churn) */
if (!is_hypertext_mime(opt, urlmime, file) &&
strfield2(wiremime, HTS_UNKNOWN_MIME))
return WIRE_KEEPS_EXT; /* no declared type */
return WIRE_CONTESTED;
}
static int wire_patches_ext(httrackp *opt, const char *wiremime,
const char *file) {
char urlmime[256];
if (may_unknown2(opt, wiremime, file))
return 0; /* type kept verbatim (keep-list / bogus-multiple) */
urlmime[0] = '\0';
/* type implied by the URL extension, only when confidently known (flag 0) */
if (!get_httptype_sized(opt, urlmime, sizeof(urlmime), file, 0))
return 1; /* URL ext implies no known type: trust the wire type */
if (strfield2(wiremime, urlmime))
return 0; /* wire agrees with the ext: keep it (no .htm->.html churn) */
/* wire disagrees with a specific non-HTML URL ext. Keep the ext only when
the server declared no type (the sentinel); an explicitly declared type,
even text/html, is trusted, so a binary-looking URL that really serves
HTML (login/error interstitial, soft-404) is named .html. */
if (!is_hypertext_mime(opt, urlmime, file) &&
strfield2(wiremime, HTS_UNKNOWN_MIME))
switch (wire_ext_verdict(opt, wiremime, file, urlmime, sizeof(urlmime))) {
case WIRE_KEEPS_EXT:
return 0;
case WIRE_WINS:
return 1;
case WIRE_CONTESTED:
break; /* no content evidence is consulted today: trust the wire */
}
return 1;
}
// forme le nom du fichier à sauver (save) à partir de fil et adr
// système intelligent, qui renomme en cas de besoin (exemple: deux INDEX.HTML et index.html)
/* Wire-metadata name change: a Content-Disposition filename wins (returns 2),
else the declared type's ext when wire_patches_ext() allows (returns 1),
else 0. ext receives the new extension or replacement filename. */
static int resolve_extension(httrackp *opt, const char *cdispo,
const char *contenttype, const char *fil,
char *ext, size_t ext_size) {
if (strnotempty(cdispo)) {
strlcpybuff(ext, cdispo, ext_size);
return 2;
}
if (wire_patches_ext(opt, contenttype, fil) &&
give_mimext(ext, ext_size, contenttype))
return 1;
return 0;
}
// Build the local save name (save) from adr/fil; renames on collision
// (e.g. INDEX.HTML vs index.html).
int url_savename(lien_adrfilsave *const afs,
lien_adrfil *const former,
const char *referer_adr, const char *referer_fil,
@@ -405,45 +434,23 @@ int url_savename(lien_adrfilsave *const afs,
// si option check_type activée
if (is_html < 0 && opt->check_type && !ext_chg) {
int ishtest = 0;
if (protocol != PROTOCOL_FILE
&& protocol != PROTOCOL_FTP
) {
// tester type avec requète HEAD si on ne connait pas le type du fichier
if (!((opt->check_type == 1) && (fil[strlen(fil) - 1] == '/'))) // slash doit être html?
if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD ||
(ishtest = ishtml(opt, fil)) <
0) { // unsure whether it's html or a file
ishtml(opt, fil) < 0) { // unsure whether it's html or a file
// lire dans le cache
htsblk r = cache_read_including_broken(opt, cache, adr, fil); // test uniquement
if (r.statuscode != -1) { // pas d'erreur de lecture cache
char s[32];
s[0] = '\0';
if (r.statuscode != -1) { // cache entry read OK
hts_log_print(opt, LOG_DEBUG, "Testing link type (from cache) %s%s",
adr_complete, fil_complete);
if (!HTTP_IS_REDIRECT(r.statuscode)) {
if (strnotempty(r.cdispo)) { /* filename given */
ext_chg = 2; /* change filename */
strcpybuff(ext, r.cdispo);
} else if (wire_patches_ext(opt, r.contenttype, fil)) {
if (give_mimext(s, sizeof(s),
r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
}
ext_chg = resolve_extension(opt, r.cdispo, r.contenttype, fil,
ext, sizeof(ext));
}
#ifdef DEFAULT_BIN_EXT
// no extension and potentially bogus
else if (ishtest == -2) {
ext_chg = 1;
strcpybuff(ext, DEFAULT_BIN_EXT + 1);
}
#endif
//
} else if (opt->savename_delayed != HTS_SAVENAME_DELAYED_HARD &&
is_userknowntype(opt, fil)) { /* PATCH BY BRIAN SCHRÖDER.
Lookup mimetype not only by extension,
@@ -467,22 +474,11 @@ int url_savename(lien_adrfilsave *const afs,
// fail later
else if (opt->savename_delayed != HTS_SAVENAME_DELAYED_NONE &&
!opt->state.stop) {
// Check if the file is ready in backing. We basically take the same logic as later.
// FIXME: we should cleanup and factorize this unholy mess
// Check if the file is ready in backing.
if (headers != NULL && headers->status >= 0 && !is_redirect) {
if (strnotempty(headers->r.cdispo)) { /* filename given */
ext_chg = 2; /* change filename */
strcpybuff(ext, headers->r.cdispo);
} else if (wire_patches_ext(opt, headers->r.contenttype,
headers->url_fil)) {
char s[16];
if (give_mimext(
s, sizeof(s),
headers->r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
}
ext_chg = resolve_extension(opt, headers->r.cdispo,
headers->r.contenttype,
headers->url_fil, ext, sizeof(ext));
}
else if (mime_type != NULL) {
ext[0] = '\0';
@@ -500,13 +496,6 @@ int url_savename(lien_adrfilsave *const afs,
if (!may_unknown2(opt, mime_type, fil)) {
ext_chg = 1;
}
#ifdef DEFAULT_BIN_EXT
// no extension and potentially bogus
else if (ishtml(opt, fil) == -2) {
ext_chg = 1;
strcpybuff(ext, DEFAULT_BIN_EXT + 1);
}
#endif
} else {
ext_chg = 0;
}
@@ -696,30 +685,10 @@ int url_savename(lien_adrfilsave *const afs,
// libérer emplacement backing
}
{ // pas d'erreur, changer type?
char s[16];
s[0] = '\0';
if (strnotempty(back[b].r.cdispo)) { /* filename given */
ext_chg = 2; /* change filename */
strcpybuff(ext, back[b].r.cdispo);
} else if (wire_patches_ext(opt, back[b].r.contenttype,
back[b].url_fil)) {
if (give_mimext(
s, sizeof(s),
back[b].r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
}
#ifdef DEFAULT_BIN_EXT
// no extension and potentially bogus
else if (ishtest == -2) {
ext_chg = 1;
strcpybuff(ext, DEFAULT_BIN_EXT + 1);
}
#endif
}
// no error: change the type?
ext_chg = resolve_extension(
opt, back[b].r.cdispo, back[b].r.contenttype,
back[b].url_fil, ext, sizeof(ext));
}
// FIN Si non déplacé, forcer type?

View File

@@ -4845,6 +4845,9 @@ int hts_wait_delayed(htsmoduleStruct * str, lien_adrfilsave *afs,
/* Still have a back reference */
if (b >= 0) {
/* Patch destination filename for direct-to-disk mode, BEFORE any
finalize: it records and caches the entry under url_sav */
strcpybuff(back[b].url_sav, afs->save);
/* Finalize now as we have the type */
if (back[b].status == STATUS_READY) {
if (!back[b].finalized) {
@@ -4852,8 +4855,6 @@ int hts_wait_delayed(htsmoduleStruct * str, lien_adrfilsave *afs,
back_finalize(opt, cache, sback, b);
}
}
/* Patch destination filename for direct-to-disk mode */
strcpybuff(back[b].url_sav, afs->save);
}
} // b >= 0

View File

@@ -1093,35 +1093,202 @@ static int st_resolve(httrackp *opt, int argc, char **argv) {
return 0;
}
/* Extra args are key=value: adr= cdispo= statuscode= status= strip= urlhack=
no-www= no-slash= no-query= n83= type=, plus repeatable prior=adr|fil|sav
registering an already-crawled link (dedup/collision paths). */
/* Parse raw response-header lines and print the naming-relevant fields. */
static int st_header(httrackp *opt, int argc, char **argv) {
htsblk r;
int i;
(void) opt;
if (argc < 1) {
fprintf(stderr, "header: needs at least one raw header line\n");
return 1;
}
memset(&r, 0, sizeof(r));
for (i = 0; i < argc; i++) {
char BIGSTK line[HTS_URLMAXSIZE * 2];
strcpybuff(line, argv[i]);
treathead(NULL, "www.example.com", "/", &r, line);
}
printf("contenttype=%s cdispo=%s\n", r.contenttype, r.cdispo);
return 0;
}
/* Decode a body argument ("hex:FFD8.." or literal text) into buf. */
static size_t st_decode_body(const char *arg, char *buf, size_t size) {
size_t n = 0;
if (strncmp(arg, "hex:", 4) == 0) {
const char *s = arg + 4;
for (; s[0] != '\0' && s[1] != '\0' && n + 1 < size; s += 2) {
unsigned int byte;
if (sscanf(s, "%2x", &byte) != 1)
break;
buf[n++] = (char) byte;
}
} else {
n = strlen(arg);
if (n >= size)
n = size - 1;
memcpy(buf, arg, n);
}
buf[n] = '\0';
return n;
}
static int st_savename(httrackp *opt, int argc, char **argv) {
lien_adrfilsave afs;
cache_back cache;
struct_back *sback;
hash_struct hash;
lien_back headers;
const char *adr = "www.example.com";
const char *cdispo = NULL;
const char *body = NULL;
const char *cached = NULL;
const char *bodyfile = "st-savename-body.tmp";
int statuscode = HTTP_OK, status = 0;
int i;
if (argc < 2) {
fprintf(stderr, "savename: needs a fil and a content-type\n");
return 1;
}
/* knobs first: hash_init and the prior links depend on them */
for (i = 2; i < argc; i++) {
const char *const a = argv[i];
if (strncmp(a, "adr=", 4) == 0)
adr = a + 4;
else if (strncmp(a, "cdispo=", 7) == 0)
cdispo = a + 7;
else if (strncmp(a, "statuscode=", 11) == 0)
statuscode = atoi(a + 11);
else if (strncmp(a, "status=", 7) == 0)
status = atoi(a + 7);
else if (strncmp(a, "strip=", 6) == 0)
StringCopy(opt->strip_query, a + 6);
else if (strncmp(a, "urlhack=", 8) == 0)
opt->urlhack = atoi(a + 8) ? HTS_TRUE : HTS_FALSE;
else if (strncmp(a, "no-www=", 7) == 0)
opt->no_www_dedup = atoi(a + 7) ? HTS_TRUE : HTS_FALSE;
else if (strncmp(a, "no-slash=", 9) == 0)
opt->no_slash_dedup = atoi(a + 9) ? HTS_TRUE : HTS_FALSE;
else if (strncmp(a, "no-query=", 9) == 0)
opt->no_query_dedup = atoi(a + 9) ? HTS_TRUE : HTS_FALSE;
else if (strncmp(a, "n83=", 4) == 0)
opt->savename_83 = atoi(a + 4);
else if (strncmp(a, "type=", 5) == 0)
opt->savename_type = atoi(a + 5);
else if (strncmp(a, "body=", 5) == 0)
body = a + 5;
else if (strncmp(a, "cached=", 7) == 0)
cached = a + 7;
else if (strncmp(a, "prior=", 6) != 0) {
fprintf(stderr, "savename: unknown arg '%s'\n", a);
return 1;
}
}
memset(&afs, 0, sizeof(afs));
strcpybuff(afs.af.adr, "www.example.com");
strcpybuff(afs.af.adr, adr);
strcpybuff(afs.af.fil, argv[0]);
memset(&cache, 0, sizeof(cache));
cache.hashtable = (void *) coucal_new(0);
if (cached != NULL) { /* cached=<content-type>|<save name> */
char *dup = strdupt(cached);
char *const sep = strchr(dup, '|');
char locbuf[64] = "";
htsblk cr;
if (sep == NULL) {
fprintf(stderr, "savename: cached needs ctype|save\n");
return 1;
}
*sep = '\0';
/* one-entry cache in cwd, reopened read-only; body is PNG magic on
purpose: naming must not depend on stored content */
StringCopy(opt->path_log, "");
cache.type = 1;
cache.log = cache.errlog = stderr;
cache.hashtable = coucal_new(0);
cache_init(&cache, opt);
hts_init_htsblk(&cr);
cr.statuscode = HTTP_OK;
strcpybuff(cr.msg, "OK");
strcpybuff(cr.contenttype, dup);
cr.location = locbuf;
cr.adr = strdupt("\x89PNG\r\n\x1a\n");
cr.size = 8;
cache_add(opt, &cache, &cr, adr, argv[0], sep + 1, 1, NULL);
freet(cr.adr);
if (cache.zipOutput != NULL) {
zipClose(cache.zipOutput, NULL);
cache.zipOutput = NULL;
}
memset(&cache, 0, sizeof(cache));
cache.type = 1;
cache.log = cache.errlog = stderr;
cache.hashtable = coucal_new(0);
cache.ro = 1;
cache_init(&cache, opt);
freet(dup);
} else {
cache.hashtable = (void *) coucal_new(0);
}
sback = back_new(opt, opt->maxsoc * 32 + 1024);
/* same wiring as hts_mirror (htscore.c) */
hash_init(opt, &hash, opt->urlhack);
hash.liens = (const lien_url *const *const *) &opt->liens;
opt->hash = &hash;
hts_record_init(opt);
for (i = 2; i < argc; i++) {
if (strncmp(argv[i], "prior=", 6) == 0) {
char *dup = strdupt(argv[i] + 6);
char *const p1 = strchr(dup, '|');
char *const p2 = p1 != NULL ? strchr(p1 + 1, '|') : NULL;
if (p2 == NULL) {
fprintf(stderr, "savename: prior needs adr|fil|sav\n");
return 1;
}
*p1 = *p2 = '\0';
if (!hts_record_link(opt, dup, p1 + 1, p2 + 1, "", "", NULL))
return 1;
freet(dup);
}
}
memset(&headers, 0, sizeof(headers));
headers.status = 0;
headers.r.statuscode = HTTP_OK;
headers.status = status;
headers.r.statuscode = statuscode;
strcpybuff(headers.r.contenttype, argv[1]);
if (cdispo != NULL)
strcpybuff(headers.r.cdispo, cdispo);
strcpybuff(headers.url_fil, argv[0]);
if (body != NULL) { /* leading body bytes, exposed via url_sav */
char BIGSTK data[1024];
const size_t n = st_decode_body(body, data, sizeof(data));
FILE *const fp = fopen(bodyfile, "wb");
if (fp == NULL || fwrite(data, 1, n, fp) != n) {
fprintf(stderr, "savename: can not write %s\n", bodyfile);
return 1;
}
fclose(fp);
strcpybuff(headers.url_sav, bodyfile);
}
url_savename(&afs, NULL, NULL, NULL, opt, sback, &cache, &hash, 0, 0,
&headers);
if (body != NULL)
(void) UNLINK(bodyfile);
printf("savename: %s\n", afs.save);
return 0;
}
@@ -1924,8 +2091,10 @@ static const struct selftest_entry {
st_relative},
{"resolve", "<link> <adr> <fil>", "resolve a link against an origin",
st_resolve},
{"savename", "<fil> <content-type>", "local save-name for a URL",
st_savename},
{"header", "<raw-header-line> ...", "response header-line parsing",
st_header},
{"savename", "<fil> <content-type> [key=value ...]",
"local save-name for a URL", st_savename},
{"cache", "<dir>", "cache read/write round-trip self-test", st_cache},
{"cache-golden", "<dir> [regen]", "frozen cache-format read self-test",
st_cache_golden},

View File

@@ -0,0 +1,29 @@
#!/bin/bash
#
set -euo pipefail
# Response header-line parsing (treathead via -#test=header <raw-line> ...).
# Isolates the wire layer from url_savename, which strips traversal on its own.
hdr() {
local want="$1"
shift
out="$(httrack -O /dev/null -#test=header "$@" | grep '^contenttype=')"
test "$out" == "$want" || {
echo "FAIL: $* -> '$out' (want '$want')"
exit 1
}
}
hdr 'contenttype=application/pdf cdispo=' 'Content-Type: application/pdf'
# filename= is honored quoted or bare.
hdr 'contenttype= cdispo=report.pdf' \
'Content-Disposition: attachment; filename="report.pdf"'
hdr 'contenttype= cdispo=report.pdf' \
'Content-Disposition: attachment; filename=report.pdf'
# Path components in the filename are dropped on the wire (RFC 2616).
hdr 'contenttype= cdispo=evil.pdf' \
'Content-Disposition: attachment; filename="../../evil.pdf"'

View File

@@ -3,13 +3,38 @@
set -euo pipefail
# Local save-name extension resolution (url_savename via -#test=savename <fil> <content-type>).
# Asserts on the basename of "savename: <path>".
# Local save-name resolution (url_savename via -#test=savename <fil> <content-type> [key=value ...]).
# name() asserts on the basename, full() on the whole path; prior= registers an
# already-crawled link whose sav is rooted under the -O path (/dev/null here).
# resolve httrack before cd: make check puts a RELATIVE ../src on PATH
httrack_bin=$(cd "$(dirname "$(command -v httrack)")" && pwd)/httrack
# scratch dir: body= and cached= write temp files (st-savename-body.tmp, hts-cache/)
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
run() {
"$httrack_bin" -O /dev/null -#test=savename "$@" | sed -n 's/^savename: //p'
}
name() {
out="$(httrack -O /dev/null -#test=savename "$1" "$2" | sed -n 's/^savename: //p')"
test "${out##*/}" == "$3" || {
echo "FAIL: '$1' '$2' -> '$out' (want '$3')"
local fil="$1" ctype="$2" want="$3"
shift 3
out="$(run "$fil" "$ctype" "$@")"
test "${out##*/}" == "$want" || {
echo "FAIL: '$fil' '$ctype' $* -> '$out' (want '$want')"
exit 1
}
}
full() {
local fil="$1" ctype="$2" want="$3"
shift 3
out="$(run "$fil" "$ctype" "$@")"
test "$out" == "$want" || {
echo "FAIL: '$fil' '$ctype' $* -> '$out' (want '$want')"
exit 1
}
}
@@ -39,3 +64,96 @@ name '/types/data.json' 'application/json' 'data.json'
# Agreeing type must not rewrite the extension's casing (no strip-and-reappend).
name '/x.JPG' 'image/jpeg' 'x.JPG'
# A Content-Disposition filename replaces the URL name outright.
name '/x.php' 'application/pdf' 'report.pdf' cdispo=report.pdf
name '/download' 'text/html' 'setup.exe' cdispo=setup.exe
# Reserved characters in a hostile Content-Disposition name are sanitized.
name '/x.php' 'application/pdf' 'set_up.exe' 'cdispo=set:up.exe'
# The md5-of-query suffix lands inside a Content-Disposition name too.
name '/x.php?id=1' 'application/pdf' 'report681a.pdf' cdispo=report.pdf
# Still-downloading path (status=-1): mime drives the ext, cdispo is ignored
# there (the deliberately unfolded 4th resolve_extension variant).
name '/x.pdf' 'text/html' 'x.html' status=-1
name '/x.html' 'text/html' 'x.html' status=-1
name '/x.php' 'application/pdf' 'x.pdf' status=-1 cdispo=report.pdf
# Contested type (wire disagrees with a specific ext): the wire is trusted and
# body bytes are not consulted; pinned so a content-based tie-break shows up
# as an explicit flip of these rows.
name '/photo.jpg' 'image/png' 'photo.png' body=hex:FFD8FFE000104A46
name '/photo.jpg' 'image/png' 'photo.png' body=hex:89504E470D0A1A0A
name '/photo.jpg' 'image/png' 'photo.png'
name '/doc.pdf' 'text/html' 'doc.html' body=hex:255044462D312E34
name '/doc.pdf' 'text/html' 'doc.html' 'body=<html><body>soft 404</body></html>'
name '/style.css' 'image/png' 'style.png' 'body=body { }'
# A redirect answer resolves nothing: delayed placeholder name.
name '/x.php' 'text/html' 'x.0.delayed' statuscode=301
# Root and query-only URLs get index + the md5-of-query suffix.
name '/' 'text/html' 'index.html'
name '/?a=1' 'text/html' 'index3872.html'
# Same URL crawled before: reuse its sav verbatim (case preserved).
full '/X.PHP' 'text/html' 'www.example.com/CASE.HTML' \
'prior=www.example.com|/X.PHP|www.example.com/CASE.HTML'
# Another URL owns the name: collision suffix -2, then -3, case-insensitively.
name '/x.php' 'text/html' 'x-2.html' \
'prior=www.example.com|/other.html|/dev/null/www.example.com/x.html'
name '/x.php' 'text/html' 'x-3.html' \
'prior=www.example.com|/o1.html|/dev/null/www.example.com/x.html' \
'prior=www.example.com|/o2.html|/dev/null/www.example.com/x-2.html'
name '/INDEX.HTML' 'text/html' 'INDEX-2.HTML' \
'prior=www.example.com|/index.html|/dev/null/www.example.com/index.html'
# Same basename in another directory is NOT a collision.
name '/x.php' 'text/html' 'x.html' \
'prior=www.example.com|/sub/x.html|/dev/null/www.example.com/sub/x.html'
# 8-3 modes: DOS truncates every component to 8+3, ISO9660 level 2 to 31.
full '/directory-long/verylongfilename.html' 'text/html' \
'/dev/null/EXAMPLE/DIRECTOR/VERYLONG.HTM' n83=1
full '/directory-long/verylongfilename.html' 'text/html' \
'/dev/null/EXAMPLE_C/DIRECTORY_LONG/VERYLONGFILENAME.HTM' n83=2
name '/verylongfilename.php' 'text/html' 'VERYLO-2.HTM' n83=1 \
'prior=www.example.com|/other.html|/dev/null/EXAMPLE/VERYLONG.HTM'
# urlhack dedup (#271): // collapse and www-strip map to the prior link's sav;
# the per-feature negatives opt out and take a fresh name.
full '/a//b.php' 'text/html' '/dev/null/www.example.com/a/PRIOR.html' \
'prior=www.example.com|/a/b.php|/dev/null/www.example.com/a/PRIOR.html'
full '/a//b.php' 'text/html' '/dev/null/www.example.com/a/b.html' no-slash=1 \
'prior=www.example.com|/a/b.php|/dev/null/www.example.com/a/PRIOR.html'
full '/w.php' 'text/html' '/dev/null/www.example.com/W-PRIOR.html' adr=example.com \
'prior=www.example.com|/w.php|/dev/null/www.example.com/W-PRIOR.html'
full '/w.php' 'text/html' '/dev/null/example.com/w.html' adr=example.com no-www=1 \
'prior=www.example.com|/w.php|/dev/null/www.example.com/W-PRIOR.html'
# Distinct URLs must stay distinct under urlhack (no over-normalization).
full '/a//b.php' 'text/html' '/dev/null/www.example.com/a/b.html' \
'prior=www.example.com|/a/c.php|/dev/null/www.example.com/a/C-PRIOR.html'
# --strip-query (#112): stripped key dedups onto the prior sav; without the
# option the same URLs stay distinct.
full '/page.php?id=3&sid=42' 'text/html' '/dev/null/www.example.com/PAGE-PRIOR.html' \
strip=sid 'prior=www.example.com|/page.php?id=3|/dev/null/www.example.com/PAGE-PRIOR.html'
full '/page.php?id=3&sid=42' 'text/html' '/dev/null/www.example.com/page475b.html' \
'prior=www.example.com|/page.php?id=3|/dev/null/www.example.com/PAGE-PRIOR.html'
# A kept key that differs must still block the dedup (no over-stripping).
full '/page.php?id=3&sid=42' 'text/html' '/dev/null/www.example.com/page475b.html' \
strip=sid 'prior=www.example.com|/page.php?id=4|/dev/null/www.example.com/PAGE-PRIOR.html'
# Hostile fils stay rooted under the mirror: ../ (raw or %2e-encoded) drops out,
# control characters become spaces, oversized names cap at 210 chars (the cap
# can chop the extension off entirely).
full '/../../etc/passwd' 'text/html' '/dev/null/www.example.com///etc/passwd.html'
full '/%2e%2e/%2e%2e/etc/passwd' 'text/html' '/dev/null/www.example.com///etc/passwd.html'
full '/x.php' 'application/pdf' '/dev/null/www.example.com///evil.exe' 'cdispo=../../evil.exe'
name $'/evil\rname\t.php' 'text/html' 'evil name .html'
name "/$(printf 'a%.0s' {1..300}).php" 'text/html' "$(printf 'a%.0s' {1..210})"

View File

@@ -0,0 +1,33 @@
#!/bin/bash
#
set -euo pipefail
# Update-run naming from a real cache entry (-#test=savename cached=<ctype>|<save>).
# Named 01_zlib-*: the cache writer needs zlib, which the MSan job can't run.
# resolve httrack before cd: make check puts a RELATIVE ../src on PATH
httrack_bin=$(cd "$(dirname "$(command -v httrack)")" && pwd)/httrack
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
name() {
local fil="$1" ctype="$2" want="$3"
shift 3
out="$("$httrack_bin" -O /dev/null -#test=savename "$fil" "$ctype" "$@" | sed -n 's/^savename: //p')"
test "${out##*/}" == "$want" || {
echo "FAIL: '$fil' '$ctype' $* -> '$out' (want '$want')"
exit 1
}
}
# Names are re-derived from the stored headers on every run: neither the
# recorded save name nor the cached body bytes change the verdict (pinned).
name '/photo.jpg' 'image/png' 'photo.png' 'cached=image/png|www.example.com/photo.jpg'
name '/photo.jpg' 'image/png' 'photo.png' 'cached=image/png|www.example.com/photo.png'
name '/photo.jpg' 'image/jpeg' 'photo.jpg' 'cached=image/jpeg|www.example.com/photo.png'
name '/style.css' 'image/png' 'style.png' 'cached=image/png|www.example.com/style.css'
# agreement keeps the URL ext verbatim (.jpeg), never canonicalized to .jpg
name '/photo.jpeg' 'image/jpeg' 'photo.jpeg' 'cached=image/jpeg|www.example.com/photo.jpeg'

View File

@@ -15,6 +15,10 @@ bash "$top_srcdir/tests/local-crawl.sh" --errors 0 \
--found 'types/photo.png' \
--found 'types/doc.pdf' \
--found 'types/lie.html' --not-found 'types/lie.png' \
--found 'types/wrongtype.png' --not-found 'types/wrongtype.jpg' \
--found 'types/bigtype.png' --not-found 'types/bigtype.jpg' \
--found 'types/packed.png' --not-found 'types/packed.jpg' \
--found 'types/mutant.png' --not-found 'types/mutant.jpg' \
--found 'types/report.html' --not-found 'types/report.pdf' \
--found 'types/page.htm' --not-found 'types/page.html' \
--found 'types/script.js' \

View File

@@ -12,4 +12,7 @@ bash "$top_srcdir/tests/local-crawl.sh" --errors 0 --rerun \
--found 'types/report.html' --not-found 'types/report.pdf' \
--found 'types/notype.png' --not-found 'types/notype.html' \
--found 'types/lie.html' \
--found 'types/wrongtype.png' --not-found 'types/wrongtype.jpg' \
--found 'types/packed.png' --not-found 'types/packed.jpg' \
--found 'types/mutant.png' --not-found 'types/mutant.jpg' \
httrack 'BASEURL/types/index.html'

View File

@@ -0,0 +1,17 @@
#!/bin/bash
#
# Content-Disposition names the saved file: the attachment filename replaces
# the URL-derived name, and a traversal filename is reduced to its last
# component, inside the mirror.
set -euo pipefail
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 \
--found 'cdispo/report.pdf' \
--file-matches 'cdispo/report.pdf' '%PDF' \
--not-found 'cdispo/fetch.pdf' \
--found 'cdispo/evil.pdf' \
--not-found 'evil.pdf' \
httrack 'BASEURL/cdispo/index.html'

View File

@@ -0,0 +1,20 @@
#!/bin/bash
#
# Degenerate delayed-type paths (#5/#107 family): redirects that never resolve
# a name must drop cleanly -- no .delayed leftovers (audited by local-crawl.sh),
# no "bogus state" cache warnings, resolvable links still land correctly.
set -euo pipefail
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --rerun --errors 0 \
--found 'delayed/real.pdf' \
--file-matches 'delayed/real.pdf' '%PDF' \
--found 'delayed/notype.bin.html' \
--found 'delayed/empty.html' \
--not-found 'delayed/noloc.html' \
--not-found 'delayed/selfloop.html' \
--not-found 'delayed/chain9.pdf' \
--log-not-found 'bogus state' \
httrack 'BASEURL/delayed/index.html'

View File

@@ -38,6 +38,7 @@ TESTS = \
01_engine-ftp-line.test \
01_engine-ftp-userpass.test \
01_engine-hashtable.test \
01_engine-header.test \
01_engine-idna.test \
01_engine-escape-room.test \
01_engine-inplace-escape.test \
@@ -63,6 +64,7 @@ TESTS = \
01_zlib-cache.test \
01_zlib-cache-golden.test \
01_zlib-cache-writefail.test \
01_zlib-savename-cached.test \
02_manpage-regen.test \
02_update-cache.test \
10_crawl-simple.test \
@@ -91,6 +93,8 @@ TESTS = \
28_local-pause.test \
29_local-redirect-fragment.test \
30_local-fragment-link.test \
31_local-javaclass.test
31_local-javaclass.test \
32_local-cdispo.test \
33_local-delayed.test
CLEANFILES = check-network_sh.cache

View File

@@ -246,6 +246,14 @@ done
test -n "$hostroot" || die "could not find host root under $out"
debug "host root: $hostroot"
# A completed crawl must leave no .delayed temporaries (issue #107)
info "checking for leftover .delayed files"
leftovers=$(find "$out" -name '*.delayed' 2>/dev/null | head -5)
if test -z "$leftovers"; then result "OK"; else
result "leftover: $leftovers"
exit 1
fi
# --- audit -------------------------------------------------------------------
i=0
while test "$i" -lt "${#audit[@]}"; do

View File

@@ -14,6 +14,7 @@ stdlib only (http.server + ssl) -- no new build or runtime dependency.
"""
import argparse
import gzip
import os
import time
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
@@ -134,12 +135,14 @@ class Handler(SimpleHTTPRequestHandler):
# --- type/extension matrix (issue #267 family) -------------------------
def send_raw(self, body, content_type):
def send_raw(self, body, content_type, extra_headers=()):
"""Send a raw body with an explicit Content-Type, or none at all when
content_type is None (to observe httrack's typeless-file naming)."""
self.send_response(200)
if content_type is not None:
self.send_header("Content-Type", content_type)
for name, value in extra_headers:
self.send_header(name, value)
self.send_header("Content-Length", str(len(body)))
self.end_headers()
if self.command != "HEAD":
@@ -148,6 +151,8 @@ class Handler(SimpleHTTPRequestHandler):
# Fake-binary blobs for the image/pdf/typeless cases.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 64
FAKE_PDF = b"%PDF-1.4\n" + b"\x00" * 64
FAKE_JPEG = b"\xff\xd8\xff\xe0" + b"\x00" * 64
BIG_JPEG = b"\xff\xd8\xff\xe0" + bytes(range(256)) * 64 # > sniff window
# path -> (body, content_type); None sends no header, "" sends an empty
# Content-Type value (no usable type, must be treated like None).
@@ -159,6 +164,8 @@ class Handler(SimpleHTTPRequestHandler):
"/types/notype.pdf": (FAKE_PDF, None),
"/types/emptyct.png": (FAKE_PNG, ""),
"/types/lie.png": (FAKE_PNG, "text/html"),
"/types/wrongtype.jpg": (FAKE_JPEG, "image/png"),
"/types/bigtype.jpg": (BIG_JPEG, "image/png"),
"/types/report.pdf": (b"<html><body>real page</body></html>", "text/html"),
"/types/page.htm": (b"<html><body>htm page</body></html>", "text/html"),
"/types/script.js": (b"var x = 1;\n", "application/javascript"),
@@ -176,6 +183,10 @@ class Handler(SimpleHTTPRequestHandler):
'\t<a href="notype.pdf">notypepdf</a>\n'
'\t<img src="emptyct.png" />\n'
'\t<img src="lie.png" />\n'
'\t<img src="wrongtype.jpg" />\n'
'\t<img src="bigtype.jpg" />\n'
'\t<img src="mutant.jpg" />\n'
'\t<img src="packed.jpg" />\n'
'\t<a href="report.pdf">report</a>\n'
'\t<a href="page.htm">htm</a>\n'
'\t<script src="script.js"></script>\n'
@@ -190,6 +201,25 @@ class Handler(SimpleHTTPRequestHandler):
body, ctype = self.TYPE_MATRIX[path]
self.send_raw(body, ctype)
# content changes between crawls: run 1 sniffs JPEG, the update pass must
# keep the run-1 name (recorded verdict) even though the body is now PNG
MUTANT_SEEN = set()
def route_types_mutant(self):
path = urlsplit(self.path).path
body = self.FAKE_PNG if path in self.MUTANT_SEEN else self.FAKE_JPEG
if self.command != "HEAD":
self.MUTANT_SEEN.add(path)
self.send_raw(body, "image/png")
# gzip on the wire: the sniff must see the decoded body, not the stream
def route_types_packed(self):
self.send_raw(
gzip.compress(self.FAKE_JPEG),
"image/png",
extra_headers=[("Content-Encoding", "gzip")],
)
# --- MIME-type exclusion abort (issue #58) -----------------------------
# A -mime:application/pdf filter must abort the transfer once the header
# arrives, not download the whole body and discard it.
@@ -354,6 +384,27 @@ class Handler(SimpleHTTPRequestHandler):
if self.command != "HEAD":
self.wfile.write(body)
# Content-Disposition naming: the attachment filename replaces the
# URL-derived name; path components in it are stripped (RFC 2616).
CDISPO_NAMES = {
"/cdispo/fetch.php": "report.pdf",
"/cdispo/evil.php": "../../evil.pdf",
}
def route_cdispo_index(self):
self.send_html(
'\t<a href="fetch.php">report</a>\n' '\t<a href="evil.php">evil</a>\n'
)
def route_cdispo(self):
filename = self.CDISPO_NAMES[urlsplit(self.path).path]
cdispo = 'attachment; filename="%s"' % filename
self.send_raw(
self.FAKE_PDF,
"application/pdf",
extra_headers=[("Content-Disposition", cdispo)],
)
# 302 whose Location carries a #fragment (#204): the fragment is a UA anchor
# that must be dropped before the target is fetched. A leaked '#' reaches the
# strict-server guard below and 400s.
@@ -369,6 +420,50 @@ class Handler(SimpleHTTPRequestHandler):
def route_redir_target(self):
self.send_raw(b"<html><body>redirect target</body></html>\n", "text/html")
# --- delayed-type degenerate paths (issues #5/#107) --------------------
def route_delayed_index(self):
self.send_html(
'\t<a href="noloc.php">noloc</a>\n'
'\t<a href="selfloop.php">selfloop</a>\n'
'\t<a href="chain1.php">chain</a>\n'
'\t<a href="redir.php">redir</a>\n'
'\t<a href="notype.bin">notype</a>\n'
'\t<a href="empty.php">empty</a>\n'
)
def send_redirect(self, location):
self.send_response(302, "Found")
if location is not None:
self.send_header("Location", location)
self.send_header("Content-Length", "0")
self.end_headers()
def route_delayed_noloc(self):
self.send_redirect(None) # 302 without Location: name never resolves
def route_delayed_selfloop(self):
self.send_redirect("selfloop.php")
def route_delayed_chain(self):
# chain1..chain9: one more hop than the type-check redirect budget
n = int(urlsplit(self.path).path.rsplit("chain", 1)[1].split(".")[0])
if n < 9:
self.send_redirect("chain%d.php" % (n + 1))
else:
self.send_raw(self.FAKE_PDF, "application/pdf")
def route_delayed_redir(self):
self.send_redirect("real.pdf")
def route_delayed_realpdf(self):
self.send_raw(self.FAKE_PDF, "application/pdf")
def route_delayed_notype(self):
self.send_raw(self.FAKE_PDF, None)
def route_delayed_empty(self):
self.send_raw(b"", "text/html") # 200 + Content-Length: 0
ROUTES = {
"/cookies/entrance.php": route_entrance,
"/cookies/second.php": route_second,
@@ -384,6 +479,10 @@ class Handler(SimpleHTTPRequestHandler):
"/types/notype.pdf": route_types,
"/types/emptyct.png": route_types,
"/types/lie.png": route_types,
"/types/wrongtype.jpg": route_types,
"/types/bigtype.jpg": route_types,
"/types/mutant.jpg": route_types_mutant,
"/types/packed.jpg": route_types_packed,
"/types/report.pdf": route_types,
"/types/page.htm": route_types,
"/types/script.js": route_types,
@@ -406,6 +505,25 @@ class Handler(SimpleHTTPRequestHandler):
"/mimex/index.html": route_mimex_index,
"/mimex/blob.pdf": route_mimex_blob,
"/mimex/real.html": route_mimex_real,
"/cdispo/index.html": route_cdispo_index,
"/cdispo/fetch.php": route_cdispo,
"/cdispo/evil.php": route_cdispo,
"/delayed/index.html": route_delayed_index,
"/delayed/noloc.php": route_delayed_noloc,
"/delayed/selfloop.php": route_delayed_selfloop,
"/delayed/redir.php": route_delayed_redir,
"/delayed/real.pdf": route_delayed_realpdf,
"/delayed/notype.bin": route_delayed_notype,
"/delayed/empty.php": route_delayed_empty,
"/delayed/chain1.php": route_delayed_chain,
"/delayed/chain2.php": route_delayed_chain,
"/delayed/chain3.php": route_delayed_chain,
"/delayed/chain4.php": route_delayed_chain,
"/delayed/chain5.php": route_delayed_chain,
"/delayed/chain6.php": route_delayed_chain,
"/delayed/chain7.php": route_delayed_chain,
"/delayed/chain8.php": route_delayed_chain,
"/delayed/chain9.php": route_delayed_chain,
"/redir/index.html": route_redir_index,
"/redir/go.php": route_redir_go,
"/redir/target.html": route_redir_target,