Compare commits

5 Commits

Author SHA1 Message Date
Xavier Roche
addbd3136b Use an unknown/unknown sentinel for an absent Content-Type (#412)
#409 distinguished "the server declared text/html" from "no Content-Type,
defaulted to text/html" with a new htsblk.contenttype_given flag, so a
binary-looking URL that really serves HTML is saved .html while a typeless
response keeps its URL extension. That worked on a fresh crawl but had two
costs: the flag was never persisted, so on --update the cache read it as
unset and the names reverted (report.html became report.pdf again, and the
two passes disagreed); and it was an installed-struct ABI break (soname 4,
libhttrack4).

Replace the flag with a sentinel: when no Content-Type is received, store
"unknown/unknown" as the type instead of text/html. The sentinel is treated
as html for every type test (added to is_html_mime_type), so parsing,
storage and filtering of a typeless response are unchanged; only the naming
code (wire_patches_ext) reads it as "no declared type" and keeps the URL
extension. Because the type string rides the cache, an update reads the same
sentinel and names consistently -- the revert is fixed at the source.
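
As a rough illustration of the two readings of the sentinel described above, here is a minimal sketch. The `sketch_*` names are hypothetical and not part of httrack. The exact case-insensitive comparison is an assumption standing in for the project's `strfield2`, and the real type tests cover more than is shown here.

```c
#include <ctype.h>

#define SKETCH_UNKNOWN_MIME "unknown/unknown" /* sentinel value from the commit */

/* Hypothetical helper: case-insensitive string equality (assumed to
   approximate what strfield2 does in the real code). */
static int sketch_ci_equal(const char *a, const char *b) {
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
      return 0;
    a++;
    b++;
  }
  return *a == '\0' && *b == '\0';
}

/* Type-test view: the sentinel counts as html, so parsing, storage and
   filtering of a typeless response behave as before. */
static int sketch_is_html(const char *mime) {
  return sketch_ci_equal(mime, "text/html") ||
         sketch_ci_equal(mime, "application/xhtml+xml") ||
         sketch_ci_equal(mime, SKETCH_UNKNOWN_MIME);
}

/* Naming view: only the sentinel means "the server declared no type". */
static int sketch_no_declared_type(const char *mime) {
  return sketch_ci_equal(mime, SKETCH_UNKNOWN_MIME);
}
```

The point of the split is that a declared text/html and the sentinel both pass the html test, but only the sentinel tells the naming code to keep the URL extension.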

The sentinel never reaches a consumer as a real type: a single helper,
hts_effective_mime(), maps it back to text/html wherever a stored type is
derived (give_mimext) or emitted/persisted -- the httrack stdout serve, the
ProxyTrack live serve, and the ProxyTrack .arc export (both the replayed
response header and the index record). The .arc export was caught by an
adversarial spill audit; without the map a typeless page archived via
proxytrack would carry Content-Type: unknown/unknown.
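
A minimal sketch of the map-back step, assuming a function form rather than the macro used in the diff. The names `sketch_effective_mime` and `sketch_format_content_type` are hypothetical, and the comparison helper is an assumed stand-in for `strfield2`.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: case-insensitive string equality. */
static int sketch_ci_equal(const char *a, const char *b) {
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
      return 0;
    a++;
    b++;
  }
  return *a == '\0' && *b == '\0';
}

/* Map the sentinel back to text/html; any other stored type passes through
   unchanged. */
static const char *sketch_effective_mime(const char *stored) {
  return sketch_ci_equal(stored, "unknown/unknown") ? "text/html" : stored;
}

/* Example consumer: format an emitted header so the sentinel never leaks
   to a client or an archive record. Returns snprintf's result. */
static int sketch_format_content_type(char *out, size_t outsize,
                                      const char *stored) {
  return snprintf(out, outsize, "Content-Type: %s\r\n",
                  sketch_effective_mime(stored));
}
```

Routing every emit or persist site through one helper is what keeps a missed site, like the .arc export mentioned above, a one-line fix.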

Since the sentinel makes contenttype_given unnecessary, #409's ABI break is
undone: the field is removed, soname returns to 3, and the Debian package
reverts libhttrack4 -> libhttrack3. soname 4 was never released (Debian NEW
carries libhttrack3), so this re-aligns master with the archive rather than
flip-flopping anything downstream.

Tests: 18_local-update re-mirrors and asserts the names survive the update
pass; 15_local-types gains a notype.html negative control; 17_local-empty-ct
stays green. Full make check: 27 pass, 0 fail.

One accepted behavior change: a mime filter matching exactly text/html no
longer matches a typeless response (its type is the sentinel, html-ish but
not literally text/html); the response is still parsed and crawled as html.

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 10:44:12 +02:00
Xavier Roche
a64c4cd160 Don't read an uninitialized buffer on an empty Content-Type (#411)
treathead() parses the Content-Type value with sscanf("%s") into a local
`tempo` buffer, then calls strlen(tempo) and stores the result. A response
whose Content-Type header has an empty or whitespace-only value yields no
token: sscanf leaves `tempo` uninitialized, so strlen reads uninitialized
stack and can over-read past the buffer. A hostile server triggers this with
a bare `Content-Type:` line.

Guard on sscanf's return: adopt the value, and mark the type as server-given,
only when a token was actually read. An empty value now falls back to the
default type with contenttype_given left false, i.e. it is treated like a
missing header and the URL extension is kept -- which is also the correct
naming behavior.
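
The guard can be sketched in isolation as follows. This is not the treathead() code: `sketch_parse_content_type`, its buffer sizes and its return convention are hypothetical, and the bounded `%255s` width is an addition of this sketch. It also assumes the charset suffix has already been split off the value.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_DEFAULT_TYPE "text/html" /* fallback at the time of this commit */

/* Parse the value part of a Content-Type header into out.
   Returns 1 when a token was actually read (type is server-given),
   0 when the value was empty or whitespace-only (treated as missing). */
static int sketch_parse_content_type(const char *value, char *out,
                                     size_t outsize) {
  char token[256];

  /* sscanf returns EOF, not 1, when no token can be read, and in that case
     token is never written. Checking the return value avoids the
     uninitialized read. */
  if (sscanf(value, "%255s", token) != 1) {
    snprintf(out, outsize, "%s", SKETCH_DEFAULT_TYPE);
    return 0;
  }
  if (strlen(token) < outsize)
    snprintf(out, outsize, "%s", token);
  else /* overlong type: same placeholder as the original code */
    snprintf(out, outsize, "%s", "application/octet-stream-unknown");
  return 1;
}
```

With a 64-byte output buffer, an input of "image/png" is adopted and reported as server-given, while "" or a run of spaces and tabs falls back to the default and reports 0.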

Found while reviewing #409, which added contenttype_given right beside this
parse; the bug itself predates it. tests/17_local-empty-ct.test exercises the
empty-Content-Type path, and the ASan/UBSan CI job is what catches the
uninitialized read.

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 10:20:08 +02:00
Xavier Roche
1611dbcabf Trust a declared Content-Type over a binary URL extension (#409)
PR #408 stopped a bogus or missing html-ish wire type from clobbering a URL
extension that maps to a specific non-HTML type (the #267 mangle). But it
treated an explicitly declared text/html the same as a missing type, so a
binary-looking URL that legitimately serves HTML, such as a login or error
interstitial or a soft-404 at a .pdf or .jpg link, was saved under the binary
extension with HTML inside and would not render locally.

The response body is the only true discriminator, but under the default delayed
type check the save name is committed from the headers while the body is still
downloading, so it cannot be sniffed at naming time. Instead, keep the URL
extension only when the server sent no Content-Type at all (a missing header is
defaulted to text/html upstream and must not be trusted); an explicitly declared
type, even text/html, now wins. This trades the rare case of a real binary
explicitly mislabeled text/html (now named .html) for the common interstitial
and soft-404 case.
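
The resulting policy can be sketched as a small decision function. This is a simplification, not the actual wire_patches_ext: the name `sketch_wire_patches_ext` and its parameters are hypothetical, a plain comparison against text/html stands in for the project's hypertext test, and the "is the type patchable" precondition is left out.

```c
#include <string.h>

/* Returns 1 when the wire Content-Type should rename the saved file,
   0 when the URL's own extension is kept.
   url_mime:           type implied by the URL extension, "" if unknown
   wire_mime:          type after header parsing (a missing header has
                       already been defaulted to text/html at this point)
   content_type_given: non-zero if a Content-Type header was received */
static int sketch_wire_patches_ext(const char *url_mime, const char *wire_mime,
                                   int content_type_given) {
  if (url_mime[0] == '\0')
    return 1; /* extension implies no known type: trust the wire type */
  if (strcmp(url_mime, wire_mime) == 0)
    return 0; /* wire agrees with the extension: keep it */
  if (strcmp(url_mime, "text/html") != 0 && !content_type_given)
    return 0; /* specific non-HTML extension, no declared type: keep it */
  return 1;   /* an explicitly declared type wins, even text/html */
}
```

Under this sketch a report.pdf URL declared as text/html is renamed to .html, while a notype.png URL with no header keeps its .png name.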

Whether a Content-Type header was actually received cannot be recovered after
parsing, since treatfirstline defaults a missing header to text/html, so it is
recorded as a new hts_boolean contenttype_given on htsblk. That grows the
installed struct, an incompatible ABI change: soname bumped 3 -> 4, and the
Debian runtime package renamed libhttrack3 -> libhttrack4 to match.
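
To illustrate why growing an installed struct is an ABI break, here is a minimal sketch with two toy layouts. They are hypothetical and far smaller than the real htsblk. Using `int` for the new field is an assumption, since the underlying type of hts_boolean is not shown here.

```c
#include <stddef.h>

/* Toy "before" layout, as a caller built against soname 3 would see it. */
struct sketch_blk_v3 {
  int statuscode;
  char contenttype[64];
};

/* Toy "after" layout with the extra flag appended. */
struct sketch_blk_v4 {
  int statuscode;
  char contenttype[64];
  int contenttype_given; /* assumed int; stands in for hts_boolean */
};

/* Returns 1 when the two layouts differ in size. A caller compiled against
   the old header allocates the old size, so the new library would read or
   write past the end of the caller's object. */
static int sketch_layout_changed(void) {
  return sizeof(struct sketch_blk_v4) != sizeof(struct sketch_blk_v3);
}
```

Because existing binaries cannot be told about the new size, the library has to advertise the incompatibility through a new soname, and the Debian runtime package name follows it.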

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 20:18:16 +02:00
Xavier Roche
099501ee50 Make lintian actually gate the Debian package build (#410)
The deb CI job and mkdeb.sh ran lintian via debuild with
--fail-on=error,warning and were believed to gate on it. They did not:
debuild only reports lintian's findings and does not propagate its exit status,
so a package that lintian flags with errors or warnings still built green.
This was demonstrated by a SONAME bump landing without the matching
libhttrackN package rename: lintian emitted shared-library-is-multi-arch-foreign
and package-name-doesnt-match-sonames, yet the job passed.

Disable debuild's lintian run and run lintian ourselves on the produced
.changes, under set -e, so any error or warning fails the build. Two CI-only
adjustments keep a clean package green: --profile debian, because the Ubuntu
runners' vendor data would otherwise reject the Debian "unstable" distribution,
and --suppress-tags newer-standards-version, which only reflects the runner's
lintian being older than the buildds'. The long-standing script-not-executable
hint on the sample search.sh gets an override.

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 20:13:12 +02:00
Xavier Roche
1b9eefa3b4 Merge pull request #408 from xroche/fix/267-delayed-ext-mangle
Stop mangling saved-file extensions under the default delayed type check
2026-06-20 18:28:27 +02:00
15 changed files with 171 additions and 53 deletions

View File

@@ -227,7 +227,8 @@ jobs:
# Validate the Debian packaging via the same script maintainers release with.
# One amd64/gcc run is enough: packaging (control/rules/manifest/lintian/quilt
# source build) is arch- and compiler-independent, and the build matrix above
# already covers compile portability. lintian runs with --fail-on=error.
# already covers compile portability. mkdeb.sh runs lintian as an explicit gate
# (debuild does not propagate lintian's exit) with --fail-on=error,warning.
deb:
name: deb package (lintian)
runs-on: ubuntu-24.04

View File

@@ -4,3 +4,6 @@
# so the path lives in the display pointer, not the override -- match with '*'.
httrack-doc: extra-license-file *
httrack-doc: package-contains-documentation-outside-usr-share-doc *
# search.sh is a sample CGI shipped alongside the HTML manual, not meant to be
# run from the package tree; it stays non-executable by design.
httrack-doc: script-not-executable *

View File

@@ -2579,7 +2579,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
(r.size >= 0) ? r.size : (-r.size));
if (r.contenttype >= 0) {
fprintf(stdout, "Content-Type: %s\r\n",
r.contenttype);
hts_effective_mime(r.contenttype));
}
if (r.cdispo[0]) {
fprintf(stdout, "Content-Disposition: %s\r\n",

View File

@@ -1423,7 +1423,7 @@ void treatfirstline(htsblk * retour, const char *rcvd) {
else
infostatuscode(retour->msg, retour->statuscode);
// default MIME type
strcpybuff(retour->contenttype, HTS_HYPERTEXT_DEFAULT_MIME);
strcpybuff(retour->contenttype, HTS_UNKNOWN_MIME);
} else { // no status code!
retour->statuscode = STATUSCODE_INVALID;
strcpybuff(retour->msg, "Unknown response structure");
@@ -1438,7 +1438,7 @@ void treatfirstline(htsblk * retour, const char *rcvd) {
retour->statuscode = HTTP_OK;
retour->keep_alive = 0;
strcpybuff(retour->msg, "Unknown, assuming junky server");
strcpybuff(retour->contenttype, HTS_HYPERTEXT_DEFAULT_MIME);
strcpybuff(retour->contenttype, HTS_UNKNOWN_MIME);
} else if (strnotempty(a)) {
retour->statuscode = STATUSCODE_INVALID;
strcpybuff(retour->msg, "Unknown (not HTTP/xx) response structure");
@@ -1447,7 +1447,7 @@ void treatfirstline(htsblk * retour, const char *rcvd) {
retour->statuscode = HTTP_OK;
retour->keep_alive = 0;
strcpybuff(retour->msg, "Unknown, assuming junky server");
strcpybuff(retour->contenttype, HTS_HYPERTEXT_DEFAULT_MIME);
strcpybuff(retour->contenttype, HTS_UNKNOWN_MIME);
}
}
} else { // empty!
@@ -1458,7 +1458,7 @@ void treatfirstline(htsblk * retour, const char *rcvd) {
/* This is dirty .. */
retour->statuscode = HTTP_OK;
strcpybuff(retour->msg, "Unknown, assuming junky server");
strcpybuff(retour->contenttype, HTS_HYPERTEXT_DEFAULT_MIME);
strcpybuff(retour->contenttype, HTS_UNKNOWN_MIME);
}
}
@@ -1589,11 +1589,15 @@ void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * ret
}
}
}
sscanf(rcvd + p, "%s", tempo);
if (strlen(tempo) < sizeof(retour->contenttype) - 2) // not too long!!
strcpybuff(retour->contenttype, tempo);
else
strcpybuff(retour->contenttype, "application/octet-stream-unknown"); // error
// An empty/whitespace Content-Type value yields no token: keep the
// sentinel default rather than reading an uninitialized tempo.
if (sscanf(rcvd + p, "%s", tempo) == 1) {
if (strlen(tempo) < sizeof(retour->contenttype) - 2) // not too long!!
strcpybuff(retour->contenttype, tempo);
else
strcpybuff(retour->contenttype,
"application/octet-stream-unknown"); // error
}
}
} else if ((p = strfield(rcvd, "Content-Range:")) != 0) {
// Content-Range: bytes 0-70870/70871
@@ -4310,6 +4314,7 @@ int give_mimext(char *s, size_t ssize, const char *st) {
int ok = 0;
int j = 0;
st = hts_effective_mime(st); /* no declared type: derive an html ext */
s[0] = '\0';
while((!ok) && (strnotempty(hts_mime[j][1]))) {
if (strfield2(hts_mime[j][0], st)) {

View File

@@ -481,10 +481,22 @@ HTS_STATIC int strcmpnocase(const char *a, const char *b) {
// is this MIME an hypertext MIME (text/html), html/js-style or other script/text type?
#define HTS_HYPERTEXT_DEFAULT_MIME "text/html"
/* Sentinel stored when the server declared no Content-Type. It is html-ish
for every type test (so a typeless response still parses/stores as today),
but the naming code (wire_patches_ext) treats it as "no declared type" and
keeps the URL extension. It rides the cache, so updates name consistently. */
#define HTS_UNKNOWN_MIME "unknown/unknown"
/* Map the no-declared-type sentinel back to a real type for any header or
record we EMIT or PERSIST, so "unknown/unknown" never reaches a consumer
(a served Content-Type, a ProxyTrack .arc record, ...). */
#define hts_effective_mime(m) \
(strfield2((m), HTS_UNKNOWN_MIME) ? HTS_HYPERTEXT_DEFAULT_MIME : (m))
#define is_html_mime_type(a) \
( (strfield2((a),"text/html")!=0)\
|| (strfield2((a),"application/xhtml+xml")!=0) \
#define is_html_mime_type(a) \
((strfield2((a), "text/html") != 0) || \
(strfield2((a), "application/xhtml+xml") != 0) || \
(strfield2((a), HTS_UNKNOWN_MIME) != \
0) /* no declared type: treat as html */ \
)
#define is_hypertext_mime__(a) \
( \

View File

@@ -139,10 +139,12 @@ static void cleanEndingSpaceOrDot(char *s) {
}
/* Should the wire Content-Type override the URL's own extension when naming the
saved file? True only when the type is patchable (may_unknown2) and doing so
would not clobber a URL extension that already maps to a specific, non-HTML
type. This is the #267 mangle guard: a .png served as text/html (or with no
type) stays named .png. */
saved file? True when the type is patchable (may_unknown2) and either the URL
extension implies no specific type or the server declared a disagreeing one.
A URL extension mapping to a specific non-HTML type is kept only when the
server declared NO type (the HTS_UNKNOWN_MIME sentinel; the #267 mangle
guard): a typeless .png stays .png, but a .pdf explicitly served as text/html
is named .html. The sentinel rides the cache, so updates stay consistent. */
static int wire_patches_ext(httrackp *opt, const char *wiremime,
const char *file) {
char urlmime[256];
@@ -155,9 +157,12 @@ static int wire_patches_ext(httrackp *opt, const char *wiremime,
return 1; /* URL ext implies no known type: trust the wire type */
if (strfield2(wiremime, urlmime))
return 0; /* wire agrees with the ext: keep it (no .htm->.html churn) */
/* wire disagrees: keep a specific non-HTML ext against an html/empty claim */
/* wire disagrees with a specific non-HTML URL ext. Keep the ext only when
the server declared no type (the sentinel); an explicitly declared type,
even text/html, is trusted, so a binary-looking URL that really serves
HTML (login/error interstitial, soft-404) is named .html. */
if (!is_hypertext_mime(opt, urlmime, file) &&
(is_html_mime_type(wiremime) || !strnotempty(wiremime)))
strfield2(wiremime, HTS_UNKNOWN_MIME))
return 0;
return 1;
}
@@ -669,7 +674,8 @@ int url_savename(lien_adrfilsave *const afs,
if (!has_been_moved) {
if (back[b].r.statuscode != -10) { // erreur
if (strnotempty(back[b].r.contenttype) == 0)
strcpybuff(back[b].r.contenttype, "text/html"); // error message in html
strcpybuff(back[b].r.contenttype,
HTS_UNKNOWN_MIME); // no declared type
// In the end, return an error so that nothing else in the code is touched
// free the backing slot
}

View File

@@ -1176,11 +1176,15 @@ static void proxytrack_process_HTTP(PT_Indexes indexes, T_SOC soc_c) {
if (element != NULL) {
msgCode = element->statuscode;
StringRoom(headers, 8192);
sprintf(StringBuffRW(headers), "HTTP/1.1 %d %s\r\n"
sprintf(StringBuffRW(headers),
"HTTP/1.1 %d %s\r\n"
#ifndef NO_WEBDAV
"%s"
#endif
"Content-Type: %s%s%s%s\r\n" "%s%s%s" "%s%s%s" "%s%s%s",
"Content-Type: %s%s%s%s\r\n"
"%s%s%s"
"%s%s%s"
"%s%s%s",
/* */
msgCode, element->msg,
#ifndef NO_WEBDAV
@@ -1188,16 +1192,18 @@ static void proxytrack_process_HTTP(PT_Indexes indexes, T_SOC soc_c) {
StringBuff(davHeaders),
#endif
/* Content-type: foo; [ charset=bar ] */
element->contenttype,
hts_effective_mime(element->contenttype),
((element->charset[0]) ? "; charset=\"" : ""),
element->charset, ((element->charset[0]) ? "\"" : ""),
/* location */
((element->location != NULL
&& element->location[0]) ? "Location: " : ""),
((element->location != NULL
&& element->location[0]) ? element->location : ""),
((element->location != NULL
&& element->location[0]) ? "\r\n" : ""),
((element->location != NULL && element->location[0])
? "Location: "
: ""),
((element->location != NULL && element->location[0])
? element->location
: ""),
((element->location != NULL && element->location[0]) ? "\r\n"
: ""),
/* last-modified */
((element->lastmodified[0]) ? "Last-Modified: " : ""),
((element->lastmodified[0]) ? element->lastmodified : ""),
@@ -1205,8 +1211,7 @@ static void proxytrack_process_HTTP(PT_Indexes indexes, T_SOC soc_c) {
/* etag */
((element->etag[0]) ? "ETag: " : ""),
((element->etag[0]) ? element->etag : ""),
((element->etag[0]) ? "\r\n" : "")
);
((element->etag[0]) ? "\r\n" : ""));
StringLength(headers) = (int) strlen(StringBuff(headers));
} else {
/* No query string, no ending / : check the <url>/ page */

View File

@@ -52,6 +52,7 @@ Please visit our Website: http://www.httrack.com
#include "htscore.h"
#include "htsback.h"
#include "htslib.h" /* hts_effective_mime */
#include "store.h"
#include "proxystrings.h"
@@ -2289,10 +2290,17 @@ static int PT_SaveCache__Arc_Fun(void *arg, const char *url, PT_Element element)
int size_headers;
sprintf(st->headers,
"HTTP/1.0 %d %s" "\r\n" "X-Server: ProxyTrack " PROXYTRACK_VERSION
"\r\n" "Content-type: %s%s%s%s" "\r\n" "Last-modified: %s" "\r\n"
"Content-length: %d" "\r\n", element->statuscode, element->msg,
/**/ element->contenttype,
"HTTP/1.0 %d %s"
"\r\n"
"X-Server: ProxyTrack " PROXYTRACK_VERSION "\r\n"
"Content-type: %s%s%s%s"
"\r\n"
"Last-modified: %s"
"\r\n"
"Content-length: %d"
"\r\n",
element->statuscode, element->msg,
/**/ hts_effective_mime(element->contenttype),
(element->charset[0] ? "; charset=\"" : ""),
(element->charset[0] ? element->charset : ""),
(element->charset[0] ? "\"" : ""), /**/ element->lastmodified,
@@ -2328,10 +2336,10 @@ static int PT_SaveCache__Arc_Fun(void *arg, const char *url, PT_Element element)
/* args */
(link_has_authority(url) ? "" : "http://"), url, "0.0.0.0",
tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday, tm->tm_hour,
tm->tm_min, tm->tm_sec, element->contenttype, element->statuscode,
st->md5, (element->location ? element->location : "-"),
(long int) ftell(fp), st->filename,
(long int) (size_headers + element->size));
tm->tm_min, tm->tm_sec, hts_effective_mime(element->contenttype),
element->statuscode, st->md5,
(element->location ? element->location : "-"), (long int) ftell(fp),
st->filename, (long int) (size_headers + element->size));
/* network_doc */
if (fwrite(st->headers, 1, size_headers, fp) != size_headers
|| (element->size > 0

View File

@@ -1,17 +1,22 @@
#!/bin/bash
#
# Content-Type vs URL-extension naming (issue #267 family). Under the default
# delayed type check (-%N2), a bogus/missing html-ish wire type must not clobber
# a URL extension that maps to a specific non-HTML type. The .html "mangle" names
# are asserted absent so a regression that re-introduces it fails here.
# Content-Type vs URL-extension naming (issue #267 family) under the default
# delayed type check (-%N2). Policy: a MISSING Content-Type must not clobber a
# URL extension that maps to a specific non-HTML type (.png/.pdf stay as-is);
# an explicitly DECLARED type is trusted, so a binary-looking URL that really
# serves HTML (text/html on .pdf/.jpg) is named .html. The "wrong" names are
# asserted absent so a regression in either direction fails here.
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 \
--found 'types/notype.png' --not-found 'types/notype.html' \
--found 'types/lie.png' --not-found 'types/lie.html' \
--found 'types/page.htm' --not-found 'types/page.html' \
--found 'types/notype.pdf' --not-found 'types/notype.html' \
--found 'types/photo.png' \
--found 'types/doc.pdf' \
--found 'types/lie.html' --not-found 'types/lie.png' \
--found 'types/report.html' --not-found 'types/report.pdf' \
--found 'types/page.htm' --not-found 'types/page.html' \
--found 'types/script.js' \
--found 'types/style.css' \
--found 'types/data.json' \

View File

@@ -0,0 +1,12 @@
#!/bin/bash
#
# An empty "Content-Type:" header value must be treated as "no usable type"
# (keep the URL extension), not parsed from an uninitialized buffer. The crawl
# also runs under ASan/UBSan in CI, which catches the uninitialized read this
# guards against.
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 \
--found 'types/emptyct.png' --not-found 'types/emptyct.html' \
httrack 'BASEURL/types/index.html'

View File

@@ -0,0 +1,15 @@
#!/bin/bash
#
# A second (update) pass must keep the names the first crawl chose. The stored
# Content-Type rides the cache, so the update reads back the same value -- the
# unknown/unknown sentinel for a typeless response, the declared type otherwise
# -- and names consistently: a declared-text/html .pdf stays .html and a
# typeless .png stays .png across the update rather than reverting.
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 --rerun \
--found 'types/report.html' --not-found 'types/report.pdf' \
--found 'types/notype.png' --not-found 'types/notype.html' \
--found 'types/lie.html' \
httrack 'BASEURL/types/index.html'

View File

@@ -53,6 +53,8 @@ TESTS = \
13_local-cookies.test \
14_local-https.test \
15_local-types.test \
16_local-assume.test
16_local-assume.test \
17_local-empty-ct.test \
18_local-update.test
CLEANFILES = check-network_sh.cache

View File

@@ -26,6 +26,7 @@ key="${testdir}/server.key"
tls=
verbose=
rerun=
tmpdir=
serverpid=
crawlpid=
@@ -89,6 +90,7 @@ nargs=$#
while test "$pos" -lt "$nargs"; do
case "${args[$pos]}" in
--debug) verbose=1 ;;
--rerun) rerun=1 ;; # run httrack a second time (update pass) before auditing
--no-purge)
nopurge=1
audit+=("--no-purge")
@@ -180,6 +182,22 @@ test "$crawlres" -eq 0 || ! result "httrack exited $crawlres" || {
result "OK"
grep -iE "^[0-9:]*[[:space:]]Error:" "${out}/hts-log.txt" >&2
# --- optional second pass: re-mirror into the same dir (cache/update path) ----
if test -n "$rerun"; then
info "re-running httrack (update pass)"
httrack -O "$out" --user-agent="httrack $ver local ($(uname -omrs))" \
"${moreargs[@]}" "${hts[@]}" >"${log}.2" 2>&1 &
crawlpid=$!
wait "$crawlpid"
crawlres=$?
crawlpid=
test "$crawlres" -eq 0 || ! result "update pass exited $crawlres" || {
cat "${log}.2" >&2
exit 1
}
result "OK (update)"
fi
# --- discover the single host root (127.0.0.1_<port> or 127.0.0.1) -----------
hostroot=
for cand in "${out}/127.0.0.1_${port}" "${out}/127.0.0.1"; do

View File

@@ -131,15 +131,21 @@ class Handler(SimpleHTTPRequestHandler):
if self.command != "HEAD":
self.wfile.write(body)
# A fake-binary PNG-ish blob for the image/typeless cases.
# Fake-binary blobs for the image/pdf/typeless cases.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 64
FAKE_PDF = b"%PDF-1.4\n" + b"\x00" * 64
# path -> (body, content_type); content_type None means no header at all.
# path -> (body, content_type); None sends no header, "" sends an empty
# Content-Type value (no usable type, must be treated like None).
TYPE_MATRIX = {
"/types/control.php": (b"<html><body>control</body></html>", "text/html"),
"/types/photo.png": (FAKE_PNG, "image/png"),
"/types/doc.pdf": (FAKE_PDF, "application/pdf"),
"/types/notype.png": (FAKE_PNG, None),
"/types/notype.pdf": (FAKE_PDF, None),
"/types/emptyct.png": (FAKE_PNG, ""),
"/types/lie.png": (FAKE_PNG, "text/html"),
"/types/report.pdf": (b"<html><body>real page</body></html>", "text/html"),
"/types/page.htm": (b"<html><body>htm page</body></html>", "text/html"),
"/types/script.js": (b"var x = 1;\n", "application/javascript"),
"/types/style.css": (b"body { color: red; }\n", "text/css"),
@@ -151,8 +157,12 @@ class Handler(SimpleHTTPRequestHandler):
body = (
'\t<a href="control.php">control</a>\n'
'\t<img src="photo.png" />\n'
'\t<a href="doc.pdf">doc</a>\n'
'\t<img src="notype.png" />\n'
'\t<a href="notype.pdf">notypepdf</a>\n'
'\t<img src="emptyct.png" />\n'
'\t<img src="lie.png" />\n'
'\t<a href="report.pdf">report</a>\n'
'\t<a href="page.htm">htm</a>\n'
'\t<script src="script.js"></script>\n'
'\t<link rel="stylesheet" href="style.css" />\n'
@@ -174,8 +184,12 @@ class Handler(SimpleHTTPRequestHandler):
"/types/index.html": route_types_index,
"/types/control.php": route_types,
"/types/photo.png": route_types,
"/types/doc.pdf": route_types,
"/types/notype.png": route_types,
"/types/notype.pdf": route_types,
"/types/emptyct.png": route_types,
"/types/lie.png": route_types,
"/types/report.pdf": route_types,
"/types/page.htm": route_types,
"/types/script.js": route_types,
"/types/style.css": route_types,

View File

@@ -206,9 +206,10 @@ main() {
cp -a "$export_dir/debian" "httrack-$ver/debian"
)
# Build (debuild also runs lintian and signs). --fail-on aborts on a lintian
# error or warning, so neither a release nor CI produces an unclean package.
local -a debuild_opts=(--lintian-opts -I -i "--fail-on=error,warning")
# Build and sign. debuild runs lintian too but does NOT propagate its exit
# status, so a broken package would pass unnoticed; disable it here and run
# lintian ourselves below as the real gate.
local -a debuild_opts=(--no-lintian)
local -a build_opts=()
[[ $source_only -eq 1 ]] && build_opts+=(-S)
if [[ $unsigned -eq 1 ]]; then
@@ -219,7 +220,8 @@ main() {
info "building packages with debuild"
(
cd "$scratch/httrack-$ver"
debuild "${build_opts[@]}" "${debuild_opts[@]}"
# debuild options (--no-lintian) must precede the dpkg-buildpackage ones
debuild "${debuild_opts[@]}" "${build_opts[@]}"
)
# Collect every file the .changes references (orig, dsc, debs, ddebs, buildinfo).
@@ -229,6 +231,16 @@ main() {
changes=("$scratch"/*.changes)
shopt -u nullglob
[[ ${#changes[@]} -ge 1 ]] || die "debuild produced no .changes file"
# The real lintian gate (debuild only reports, it does not fail on tags).
# --profile debian: CI runners are Ubuntu, whose vendor data would wrongly
# reject the Debian "unstable" distribution. newer-standards-version only
# means the local lintian is older than the buildds', not a package
# defect, so suppress it. set -e turns any error/warning tag into a failure.
info "running lintian gate (--fail-on=error,warning)"
lintian --profile debian -I -i --fail-on=error,warning \
--suppress-tags newer-standards-version "${changes[@]}"
dcmd cp -- "${changes[@]}" "$outdir/"
# Clean-room build gate: rebuild the source package in a minimal chroot that