Compare commits

18 Commits

Author SHA1 Message Date
Xavier Roche
9c8d3a41eb tests: tighten the type-matrix guards
Add two assertions surfaced by review of the override path: control.php
must not survive its rename to control.html (a dual-write regression
would leave both), and gen.php?id=5 (a query/extension-less URL served
image/png) must keep its .png and not be mangled to .html. Both exercise
the "override still fires" direction that the suppression cases don't.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 18:25:45 +02:00
Xavier Roche
ae77cd9d6d Honor --assume under the default delayed type check (-%N2)
Under HARD savename-delayed (the default), url_savename() forced
is_html=-1 before consulting the user's --assume rules, so a type the
user pinned was lost to the delayed name and never applied (#56). Skip
the forced delay when is_userknowntype() matches: ishtml() already
consults the user type, so the immediate naming path applies it. Files
with no --assume rule are unaffected -- is_userknowntype() is false and
the delay still fires.

tests/16_local-assume.test crawls a .png served as image/png but assumed
text/html and checks it is saved .html; it fails without this change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 18:12:01 +02:00
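
The decision this commit describes can be modelled in a few lines of Python. This is a simplified sketch, not HTTrack's code: `DELAYED_HARD`, `url_extension` and `naming_mode` are hypothetical names, and the `--assume` rules are represented as a plain dict from extension to MIME type.

```python
# Simplified model of the "skip the forced delay" rule; not HTTrack's code.
DELAYED_HARD = 2  # hypothetical stand-in for HTS_SAVENAME_DELAYED_HARD


def url_extension(path):
    """Return the lowercased extension of the last path segment, or ''."""
    name = path.rsplit("/", 1)[-1].split("?", 1)[0]
    return name.rsplit(".", 1)[1].lower() if "." in name else ""


def naming_mode(path, savename_delayed, assume):
    """Return 'delayed' when naming waits for the response headers, or
    'immediate' when the file can be named at once.

    `assume` maps an extension to the MIME type the user pinned for it,
    e.g. {"png": "text/html"} for `--assume png=text/html`.
    """
    user_known = url_extension(path) in assume  # models is_userknowntype()
    if savename_delayed == DELAYED_HARD and not user_known:
        return "delayed"
    return "immediate"


print(naming_mode("/types/photo.png", DELAYED_HARD, {"png": "text/html"}))
print(naming_mode("/types/photo.png", DELAYED_HARD, {}))
```

With a matching rule the URL takes the immediate path, where the pinned type is consulted; without one the delay still applies, as the commit message states.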
Xavier Roche
51b8dcd81c Keep a known URL extension against a bogus html/empty Content-Type
Under the default delayed type check (-%N2), url_savename() rewrote a
saved file's extension from the wire Content-Type, gated only by
!may_unknown2(). text/html is not in the keep-list, so a response
labeled text/html -- or a typeless one, which is coerced to text/html --
clobbered the URL's own extension: a PNG served as text/html or with no
Content-Type was saved as .html, and .htm was normalized to .html (#29).
The bytes stayed intact; only the name was silently wrong.

wire_patches_ext() now lets the wire type override the extension only
when the type is patchable and doing so would not clobber a URL
extension that already maps to a specific, non-HTML type. A generator or
extension-less URL still becomes .html; a .png stays .png.

tests/15_local-types.test locks this with a deterministic offline crawl
of a content-type/extension matrix (tests/local-server.py); it fails on
the unfixed engine. Addresses the #267 mangle family (incl. #29).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 18:07:08 +02:00
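
As a rough illustration of the rule described in this commit, the sketch below restates the decision in Python. It is a model under stated assumptions: `EXT_TO_MIME` is a tiny hypothetical table standing in for HTTrack's MIME table, `keep_verbatim` stands in for the `may_unknown2()` check, and `wire_overrides_ext` is an illustrative name rather than the C function itself.

```python
# Hypothetical, deliberately small extension table for the sketch.
EXT_TO_MIME = {
    "png": "image/png",
    "htm": "text/html",
    "html": "text/html",
    "js": "application/javascript",
    "css": "text/css",
}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def wire_overrides_ext(wire_mime, url_ext, keep_verbatim=False):
    """Return True when the wire Content-Type may rename the saved file."""
    wire_mime = (wire_mime or "").lower()
    if keep_verbatim:
        return False  # type is on the keep-list: never patch the name
    url_mime = EXT_TO_MIME.get(url_ext.lower())
    if url_mime is None:
        return True  # the URL extension implies no known type: trust the wire
    if wire_mime == url_mime:
        return False  # wire agrees with the extension: nothing to change
    if url_mime not in HTML_TYPES and (wire_mime in HTML_TYPES or not wire_mime):
        return False  # keep a specific non-HTML extension against html/empty
    return True


for wire, ext in [("text/html", "png"), ("", "png"), ("text/html", "htm"),
                  ("text/html", "php"), ("image/png", "php")]:
    print(repr(wire), ext, wire_overrides_ext(wire, ext))
```

The loop covers the cases named in the message: a `.png` labeled `text/html` or sent without a type keeps its extension, `.htm` is not churned to `.html`, and a generator URL such as `.php` still follows the wire type.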
Xavier Roche
bcce664143 Merge pull request #364 from xroche/feature/local-test-server
tests: offline local test server prototype (cookies + HTTPS)
2026-06-20 16:41:26 +02:00
Xavier Roche
7a24add87c tests: add offline local test server prototype (cookies + HTTPS)
Replace the network dependency for crawl tests with a self-contained Python
stdlib server (http.server + ssl) that httrack crawls over loopback. The server
binds an ephemeral port and prints it on stdout; local-crawl.sh discovers the
port, substitutes the BASEURL token into the httrack arguments, runs the crawl,
and audits the mirror under the discovered host-root directory.

This prototype migrates two cases off ut.httrack.com:

- 13_local-cookies.test drives the cookie chain (entrance/second/third)
  reimplemented as Python handlers from the old ut/cookies/*.php fixtures. A
  missing or wrong cookie answers 500, so a clean 3-files/0-errors run proves
  the cookie jar is replayed across links.
- 14_local-https.test crawls over HTTPS using a shipped long-dated self-signed
  cert. httrack does not verify certs, so the cert is accepted as-is and the
  real TLS path runs offline.

The group skips (exit 77) when python3 is missing, mirroring check-network.sh.
Fixtures and the cert are listed explicitly in EXTRA_DIST (automake does not
expand globs); make distcheck passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 16:35:13 +02:00
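
The ephemeral-port handshake described above can be sketched with the standard library alone. This is a minimal illustration, not the shipped tests/local-server.py: the `Hello` handler, `start_server` and `parse_port_line` are hypothetical names, and only the `PORT <n>` line format is taken from the commit.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Hello(BaseHTTPRequestHandler):
    """Trivial handler so the sketch serves something over loopback."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep stdout/stderr quiet; only the PORT line matters here


def start_server():
    """Bind port 0 so the kernel picks a free port, then announce it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Hello)
    port = server.server_address[1]
    # Flush so a launcher polling the output sees the line immediately.
    print(f"PORT {port}", flush=True)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, port


def parse_port_line(line):
    """Launcher side: return the port from a 'PORT <n>' line, else None."""
    word, _, value = line.strip().partition(" ")
    return int(value) if word == "PORT" and value.isdigit() else None
```

Binding port 0 avoids collisions between parallel test runs; the cost is that the port ends up in the mirror directory name, which is why the launcher has to discover the host root after the crawl.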
Xavier Roche
2308e7bafd Merge pull request #407 from xroche/fix/mkdeb-orig-artifact-rev2
mkdeb: cut a Debian revision >= 2 without bypassing the tool
2026-06-20 15:46:57 +02:00
Xavier Roche
ef5691fc47 mkdeb: reuse a frozen orig tarball for a Debian revision >= 2
mkdeb.sh regenerated the upstream orig from a fresh `git archive HEAD | make
dist` on every run. That is right for a -1 release, but a Debian revision >= 2
reuses the orig frozen in the archive at -1: the .dsc pins it by checksum, and
a regenerated orig (different mtimes, and content drift whenever the release
tooling shipped in EXTRA_DIST changes) gets rejected by dak. The -2 upload had
to bypass mkdeb.sh and stitch the package by hand.

Derive the upstream version and Debian revision from debian/changelog and let
the revision pick the orig: revision 1 builds a fresh tarball as before;
revision >= 2 reuses the one passed with --orig FILE, untouched. The --orig
requirement is enforced only for a signed (upload-bound) build: an unsigned
build is a throwaway (CI, local lintian) that can never reach the archive, so
it still regenerates the orig as before rather than demanding a frozen one.

Two guards close the gap the old code left implicit: the regenerate path
asserts the built tarball matches the changelog version (catching a
configure.ac/changelog skew), and the overlay step confirms the orig unpacks
to httrack-<ver>/ before dropping debian/ on top.

Validated end to end by reusing the official 3.49.8 orig to build 3.49.8-2:
the resulting .dsc pins the frozen orig's checksum byte for byte.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 15:44:12 +02:00
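
The revision-driven choice of orig tarball can be summarised as follows. This is a sketch of one reading of the commit message, written in Python rather than the shell of mkdeb.sh; `split_debian_version` and `choose_orig` are hypothetical names, and the behaviour for an unsigned build that does pass `--orig` is an assumption.

```python
import re


def split_debian_version(version):
    """Split '3.49.8-2' into ('3.49.8', 2); an epoch prefix is dropped."""
    version = version.split(":", 1)[-1]
    upstream, sep, revision = version.rpartition("-")
    match = re.match(r"\d+", revision)
    if not sep or match is None:
        raise ValueError(f"no Debian revision in {version!r}")
    return upstream, int(match.group())


def choose_orig(version, signed, orig_file=None):
    """Return ('reuse', path) or ('regenerate', expected_name)."""
    upstream, revision = split_debian_version(version)
    if revision >= 2 and orig_file is not None:
        return ("reuse", orig_file)  # frozen orig, used untouched
    if revision >= 2 and signed:
        # An upload-bound build must not invent a new orig.
        raise ValueError("revision >= 2 signed build requires --orig FILE")
    # Revision 1, or an unsigned throwaway build: build a fresh tarball.
    return ("regenerate", f"httrack_{upstream}.orig.tar.gz")


print(choose_orig("3.49.8-1", signed=True))
print(choose_orig("3.49.8-2", signed=False))
```

Keeping the signed and unsigned cases apart is what lets CI and local lintian runs keep working without access to the archived tarball.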
Xavier Roche
0a6eb73903 mkdeb: emit the orig website artifact on a Debian revision >= 2
The release-artifacts step signs and checksums httrack_<ver>.orig.tar.gz in
$outdir, but $outdir is populated by `dcmd cp` from the .changes, which lists
only the files in the upload. dpkg-genchanges omits the orig from a revision
>= 2 .changes (it is already in the archive), so the orig never reached
$outdir and `gpg --detach-sign` failed with "No such file or directory",
aborting a -2 (or later) release after the source package was already built.

Copy the orig from the build tree into $outdir before signing so the website
artifacts are produced regardless of the Debian revision. The upload is
unaffected: dput uploads the .changes-referenced files, not the extra orig.

CI didn't catch this because the deb job builds unsigned and the artifact
block is gated on a signing key.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 15:12:03 +02:00
Xavier Roche
fdb243e5a2 Merge pull request #406 from xroche/debian/libhttrack3-rename
debian: rename libhttrack2 to libhttrack3 to follow the SONAME
2026-06-20 15:04:12 +02:00
Xavier Roche
f8546e146d debian: drop the dead libhttrack-swf1.files and fix the overrides comment
Two packaging nits surfaced while reviewing the libhttrack3 rename, both
debian/-only:

- debian/libhttrack-swf1.files listed libhtsswf.so.1* but there is no
  libhttrack-swf1 package in debian/control and the swf module is no longer
  built (lib_LTLIBRARIES is just libhttrack/libhtsjava). dh_movefiles only
  consults built packages, so the list was dead. Remove it.

- libhttrack3.lintian-overrides claimed the ABI is tracked via "a strict
  =version dependency", but dh_makeshlibs --version-info emits the
  conservative (>= upstream-version) form, which is the correct choice for a
  soname-versioned library; a = ${binary:Version} shlibs dependency draws
  lintian's distant-prerequisite-in-shlibs. Correct the comment to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 14:59:00 +02:00
Xavier Roche
b7f602f2eb debian: rename libhttrack2 to libhttrack3 to follow the SONAME
The 3.49.8 ABI bump moved the soname to libhttrack.so.3, but the packaging
still globbed .so.2 in debian/libhttrack2.files, so the runtime libraries
matched nothing there and fell through into the catch-all httrack package;
libhttrack2 shipped no library (lintian package-name-doesnt-match-sonames).

Rename the binary package to libhttrack3, take over the misplaced libraries
from httrack and the old libhttrack2 via Breaks/Replaces, and switch the
.files globs to a .so.3* wildcard so a future soname bump no longer silently
misplaces the libraries. Ships as 3.49.8-2; new binary name goes through NEW.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 14:46:14 +02:00
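
To see why the old file list matched nothing after the SONAME moved, the patterns can be checked with Python's `fnmatch`. This is only an approximation of how the packaging tools expand the globs, and the built file names and multiarch triplet below are hypothetical examples.

```python
from fnmatch import fnmatch

# Patterns from the old and new .files lists (library entries only).
OLD_PATTERNS = [
    "usr/lib/*/libhttrack.so.2.0.49",
    "usr/lib/*/libhttrack.so.2",
    "usr/lib/*/libhtsjava.so.2.0.49",
    "usr/lib/*/libhtsjava.so.2",
]
NEW_PATTERNS = [
    "usr/lib/*/libhttrack.so.3*",
    "usr/lib/*/libhtsjava.so.3*",
]


def claimed(path, patterns):
    """True when at least one pattern matches the installed path."""
    return any(fnmatch(path, pattern) for pattern in patterns)


# Hypothetical build output after the ABI bump.
built = [
    "usr/lib/x86_64-linux-gnu/libhttrack.so.3",
    "usr/lib/x86_64-linux-gnu/libhttrack.so.3.0.0",
    "usr/lib/x86_64-linux-gnu/libhtsjava.so.3",
]
for path in built:
    print(path, claimed(path, OLD_PATTERNS), claimed(path, NEW_PATTERNS))
```

None of the built paths match the old list, which is how the libraries ended up in the catch-all package; the wildcard list picks up every file under the `.so.3` prefix.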
Xavier Roche
550100b56a Merge pull request #405 from xroche/feature/mkdeb-sbuild
mkdeb: optional --sbuild clean-room build gate
2026-06-20 14:43:43 +02:00
Xavier Roche
33ddb27243 mk-sbuild-chroot: suggest a concrete usermod for the subuid range
Compute a start past every range already in /etc/subuid+subgid and print the
canonical sudo usermod --add-subuids/--add-subgids command, instead of a raw
file append the user has to adjust by hand.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 14:13:06 +02:00
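
Computing a start past every existing range comes down to parsing the `name:start:count` entries and taking the highest end. The sketch below shows the idea in Python; the real script is shell, and `parse_ranges`, `next_free_start`, `suggest_usermod`, the 100000 floor and the 65536 range size are assumptions made for illustration.

```python
def parse_ranges(text):
    """Parse /etc/subuid or /etc/subgid content into (name, start, count)."""
    ranges = []
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            ranges.append((parts[0], int(parts[1]), int(parts[2])))
    return ranges


def next_free_start(subuid_text, subgid_text, floor=100000):
    """Return the first ID past every range in either file."""
    end = floor
    for _, start, count in parse_ranges(subuid_text) + parse_ranges(subgid_text):
        end = max(end, start + count)
    return end


def suggest_usermod(user, start, count=65536):
    """Format the usermod command for a FIRST-LAST range of `count` IDs."""
    last = start + count - 1
    return (f"sudo usermod --add-subuids {start}-{last} "
            f"--add-subgids {start}-{last} {user}")


subuid = "alice:100000:65536\n"
subgid = "alice:100000:65536\nbob:165536:65536\n"
print(suggest_usermod("carol", next_free_start(subuid, subgid)))
```

Scanning both files and using one shared start keeps the UID and GID ranges aligned, so the printed command can be pasted without adjustment.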
Xavier Roche
4606dfbf66 mk-sbuild-chroot: require a subuid/subgid range up front
The unshare backend maps a whole UID range, not just the caller's, because the
base install creates system users. Without an /etc/subuid+subgid entry the
install crashes (dpkg SIGSEGV) instead of failing cleanly. Check for the range
before bootstrapping and point at the one-line fix; skip the check for root,
which uses mode=root.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 14:07:10 +02:00
Xavier Roche
a6f1b9a3dd mk-sbuild-chroot: only treat an active $chroot_mode line as configured
The idempotency guard matched chroot_mode.*unshare anywhere in ~/.sbuildrc,
including a commented-out line, so --write-sbuildrc would silently skip the
append and leave the unshare backend unconfigured. Anchor the match to an
active assignment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 14:02:42 +02:00
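
The difference between the loose and the anchored match is easy to demonstrate. The patterns below are illustrative Python regular expressions, not the ones in mk-sbuild-chroot.sh, and the sample `~/.sbuildrc` contents are made up.

```python
import re

# Loose: matches anywhere on a line, including inside a comment.
LOOSE = re.compile(r"chroot_mode.*unshare")
# Anchored: only an assignment at the start of a line (optional indent).
ACTIVE = re.compile(r"^[ \t]*\$chroot_mode[ \t]*=.*unshare", re.MULTILINE)

commented = "# $chroot_mode = 'unshare';\n"
configured = "$build_arch_all = 1;\n$chroot_mode = 'unshare';\n"

for name, text in [("commented", commented), ("configured", configured)]:
    print(name, bool(LOOSE.search(text)), bool(ACTIVE.search(text)))
```

On the commented-out sample the loose pattern reports the backend as configured while the anchored one does not, which is the false positive the commit removes.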
Xavier Roche
fb35d6a0f1 tools: add mk-sbuild-chroot.sh to set up the --sbuild gate
The --sbuild gate needs an sbuild chroot, which was only documented as loose
commands. This adds a companion script that bootstraps one with the rootless
unshare backend (mmdebstrap into ~/.cache/sbuild/<dist>-<arch>.tar.zst, where
sbuild finds it by name), idempotent unless --force, optionally writing the
unshare mode into ~/.sbuildrc. mkdeb.sh's --sbuild help now points at it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 13:43:34 +02:00
Xavier Roche
8a270fec03 mkdeb: add an optional --sbuild clean-room build gate
With source-only uploads the archive's buildds are the first place the package
is built in a clean environment, so an undeclared Build-Depends or any FTBFS
only shows up after the upload. --sbuild rebuilds the freshly produced .dsc in a
minimal chroot holding only the declared Build-Depends, reproducing the buildd
environment; a failure aborts the release before the upload. It runs after the
source package is built and before the upstream-tarball release artifacts are
signed. Logs and the clean-built debs land in <outdir>/sbuild.

The distribution comes from the changelog (UNRELEASED falls back to unstable),
and the flag fails fast if sbuild isn't installed. Off by default; needs an
sbuild chroot for the target suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 13:37:20 +02:00
Xavier Roche
0cbd5279f2 Merge pull request #404 from xroche/release/3.49.8
Curate the 3.49-8 release notes
2026-06-20 13:06:13 +02:00
23 changed files with 937 additions and 47 deletions

5
.flake8 Normal file

@@ -0,0 +1,5 @@
[flake8]
# Match black's formatting so the two tools don't fight.
max-line-length = 88
# E203/W503 conflict with black's slice and line-break style.
extend-ignore = E203, W503

13
debian/changelog vendored

@@ -1,3 +1,16 @@
httrack (3.49.8-2) unstable; urgency=medium
* Rename libhttrack2 to libhttrack3 to follow the SONAME, which the 3.49.8
ABI bump moved to libhttrack.so.3 (package-name-doesnt-match-sonames). In
3.49.8-1 the libhttrack2.files glob still matched .so.2, so the runtime
libraries fell through into the httrack package and libhttrack2 shipped no
library. The new .files uses a .so.3* wildcard so a future SONAME bump no
longer silently misplaces the libraries. New binary package, via NEW.
* Drop the stale debian/libhttrack-swf1.files: the swf module is no longer
built and no libhttrack-swf1 package exists.
-- Xavier Roche <xavier@debian.org> Sat, 20 Jun 2026 14:42:13 +0200
httrack (3.49.8-1) unstable; urgency=medium
* New upstream release: HTTPS-proxy CONNECT tunnelling and wider srcset

6
debian/control vendored

@@ -58,13 +58,13 @@ Description: webhttrack common files
This package is the common files of webhttrack, website copier and
mirroring utility
Package: libhttrack2
Package: libhttrack3
Architecture: any
Multi-Arch: same
Section: libs
Replaces: libhttrack1
Conflicts: libhttrack1
Depends: ${misc:Depends}, ${shlibs:Depends}
Replaces: libhttrack2, httrack (<< 3.49.8-2~)
Breaks: libhttrack2, httrack (<< 3.49.8-2~)
Description: Httrack website copier library
This package is the library part of httrack, website copier and mirroring
utility

debian/libhttrack-swf1.files

@@ -1,2 +0,0 @@
usr/lib/*/libhtsswf.so.1.0.0
usr/lib/*/libhtsswf.so.1

debian/libhttrack2.files

@@ -1,5 +0,0 @@
usr/lib/*/libhttrack.so.2.0.49
usr/lib/*/libhttrack.so.2
usr/lib/*/libhtsjava.so.2.0.49
usr/lib/*/libhtsjava.so.2
usr/share/httrack/templates

View File

@@ -1,3 +0,0 @@
# The shared libraries ship without a versioned symbols control file (ABI is
# tracked via the SONAME and a strict =version dependency, see debian/rules).
libhttrack2: no-symbols-control-file usr/lib/*

3
debian/libhttrack3.files vendored Normal file

@@ -0,0 +1,3 @@
usr/lib/*/libhttrack.so.3*
usr/lib/*/libhtsjava.so.3*
usr/share/httrack/templates

3
debian/libhttrack3.lintian-overrides vendored Normal file

@@ -0,0 +1,3 @@
# The shared libraries ship without a versioned symbols control file (ABI is
# tracked via the SONAME plus a >= upstream-version dependency, see debian/rules).
libhttrack3: no-symbols-control-file usr/lib/*

2
debian/rules vendored

@@ -135,7 +135,7 @@ binary-arch: build install
dh_makeshlibs -a -X/usr/lib/$(DEB_HOST_MULTIARCH)/httrack/libtest --version-info
dh_installdeb -a
# we depend on the current version (ABI may change)
dh_shlibdeps -a -ldebian/libhttrack2/usr/lib/$(DEB_HOST_MULTIARCH)
dh_shlibdeps -a -ldebian/libhttrack3/usr/lib/$(DEB_HOST_MULTIARCH)
dh_gencontrol -a
dh_md5sums -a
dh_builddeb -a

View File

@@ -138,6 +138,30 @@ static void cleanEndingSpaceOrDot(char *s) {
}
}
/* Should the wire Content-Type override the URL's own extension when naming the
saved file? True only when the type is patchable (may_unknown2) and doing so
would not clobber a URL extension that already maps to a specific, non-HTML
type. This is the #267 mangle guard: a .png served as text/html (or with no
type) stays named .png. */
static int wire_patches_ext(httrackp *opt, const char *wiremime,
const char *file) {
char urlmime[256];
if (may_unknown2(opt, wiremime, file))
return 0; /* type kept verbatim (keep-list / bogus-multiple) */
urlmime[0] = '\0';
/* type implied by the URL extension, only when confidently known (flag 0) */
if (!get_httptype_sized(opt, urlmime, sizeof(urlmime), file, 0))
return 1; /* URL ext implies no known type: trust the wire type */
if (strfield2(wiremime, urlmime))
return 0; /* wire agrees with the ext: keep it (no .htm->.html churn) */
/* wire disagrees: keep a specific non-HTML ext against an html/empty claim */
if (!is_hypertext_mime(opt, urlmime, file) &&
(is_html_mime_type(wiremime) || !strnotempty(wiremime)))
return 0;
return 1;
}
// build the name of the file to save from fil and adr
// smart scheme that renames when needed (example: both INDEX.HTML and index.html)
int url_savename(lien_adrfilsave *const afs,
@@ -325,7 +349,10 @@ int url_savename(lien_adrfilsave *const afs,
}
/* replace shtml to html.. */
if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD)
/* HARD delays every type, except one the user pinned with --assume: honor it
immediately (ishtml() consults the user type), no delayed name (#56) */
if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD &&
!is_userknowntype(opt, fil))
is_html = -1; /* ALWAYS delay type */
else
is_html = ishtml(opt, fil);
@@ -380,7 +407,7 @@ int url_savename(lien_adrfilsave *const afs,
if (strnotempty(r.cdispo)) { /* filename given */
ext_chg = 2; /* change filename */
strcpybuff(ext, r.cdispo);
} else if (!may_unknown2(opt, r.contenttype, fil)) { // can we patch a priori?
} else if (wire_patches_ext(opt, r.contenttype, fil)) {
if (give_mimext(s, sizeof(s),
r.contenttype)) { // recognized extension
ext_chg = 1;
@@ -425,7 +452,8 @@ int url_savename(lien_adrfilsave *const afs,
if (strnotempty(headers->r.cdispo)) { /* filename given */
ext_chg = 2; /* change filename */
strcpybuff(ext, headers->r.cdispo);
} else if (!may_unknown2(opt, headers->r.contenttype, headers->url_fil)) { // can we patch a priori? (not forbidden, or no type)
} else if (wire_patches_ext(opt, headers->r.contenttype,
headers->url_fil)) {
char s[16];
if (give_mimext(
s, sizeof(s),
@@ -653,7 +681,8 @@ int url_savename(lien_adrfilsave *const afs,
if (strnotempty(back[b].r.cdispo)) { /* filename given */
ext_chg = 2; /* change filename */
strcpybuff(ext, back[b].r.cdispo);
} else if (!may_unknown2(opt, back[b].r.contenttype, back[b].url_fil)) { // can we patch a priori? (not forbidden, or no type)
} else if (wire_patches_ext(opt, back[b].r.contenttype,
back[b].url_fil)) {
if (give_mimext(
s, sizeof(s),
back[b].r.contenttype)) { // recognized extension

15
tests/13_local-cookies.test Executable file

@@ -0,0 +1,15 @@
#!/bin/bash
#
# Cookie chain against the local test server (replaces the old online
# ut/cookies/*.php fixtures). entrance.php sets cat/cake; second.php checks
# them and sets badger; third.php checks all three. A missing or wrong cookie
# returns 500, which would surface as an httrack error and a missing file, so a
# clean 3-files/0-errors run proves the cookie jar is replayed across links.
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 --files 3 \
--found 'cookies/entrance.html' \
--found 'cookies/second.html' \
--found 'cookies/third.html' \
httrack 'BASEURL/cookies/entrance.php'
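
The cookie values in this chain contain spaces, commas and quotes, so the server url-encodes them when setting the cookie and decodes them when reading it back. The following sketch isolates that round trip with `urllib.parse`; `set_cookie_header` and `parse_cookie_header` are hypothetical helper names modelled on the server's handlers.

```python
from urllib.parse import quote, unquote


def set_cookie_header(name, value, path="/cookies/"):
    """Build a Set-Cookie value with the cookie value url-encoded."""
    return f"{name}={quote(value)}; Path={path}"


def parse_cookie_header(raw):
    """Parse a Cookie request header into {name: decoded value}."""
    jar = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if "=" in pair:
            name, value = pair.split("=", 1)
            jar[name.strip()] = unquote(value.strip())
    return jar


header = set_cookie_header("badger", "mushroom, with 'ants'")
print(header)  # badger=mushroom%2C%20with%20%27ants%27; Path=/cookies/

# A client replaying the stored token verbatim yields the original value.
replayed = "cat=dog; " + header.split(";", 1)[0]
print(parse_cookie_header(replayed))
```

Encoding keeps each value a single token without `;` or `,`, so a crawler only has to store and replay it byte for byte for the server-side comparison to succeed.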

18
tests/14_local-https.test Executable file

@@ -0,0 +1,18 @@
#!/bin/bash
#
# HTTPS crawl against the local test server, using the shipped self-signed
# cert. httrack does not verify certs (htslib.c: SSL_CTX_new with no
# SSL_CTX_set_verify), so the self-signed cert is accepted as-is and this
# exercises the real TLS path offline. basic.html links to link.html with four
# distinct query strings, each saved under a hashed name -> 5 files.
: "${top_srcdir:=..}"
if test "$HTTPS_SUPPORT" == "no"; then
echo "no https support compiled, skipping"
exit 77
fi
bash "$top_srcdir/tests/local-crawl.sh" --tls --errors 0 --files 5 \
--found 'simple/basic.html' \
httrack 'BASEURL/simple/basic.html'

20
tests/15_local-types.test Normal file

@@ -0,0 +1,20 @@
#!/bin/bash
#
# Content-Type vs URL-extension naming (issue #267 family). Under the default
# delayed type check (-%N2), a bogus/missing html-ish wire type must not clobber
# a URL extension that maps to a specific non-HTML type. The .html "mangle" names
# are asserted absent so a regression that re-introduces it fails here.
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 \
--found 'types/notype.png' --not-found 'types/notype.html' \
--found 'types/lie.png' --not-found 'types/lie.html' \
--found 'types/page.htm' --not-found 'types/page.html' \
--found 'types/photo.png' \
--found 'types/script.js' \
--found 'types/style.css' \
--found 'types/data.json' \
--found 'types/control.html' --not-found 'types/control.php' \
--found 'types/gend61c.png' --not-found 'types/gend61c.html' \
httrack 'BASEURL/types/index.html'

tests/16_local-assume.test

@@ -0,0 +1,11 @@
#!/bin/bash
#
# --assume under the default delayed type check (-%N2), issue #56. A user type
# pinned with --assume must be honored immediately, not lost to the delayed
# name: photo.png served as image/png but assumed text/html is saved as .html.
: "${top_srcdir:=..}"
bash "$top_srcdir/tests/local-crawl.sh" --errors 0 \
--found 'types/photo.html' --not-found 'types/photo.png' \
httrack 'BASEURL/types/photo.png' --assume png=text/html

View File

@@ -3,6 +3,8 @@
# silently drop it from the dist tarball and break "make distcheck".
EXTRA_DIST = $(TESTS) crawl-test.sh run-all-tests.sh check-network.sh \
proxy-https-server.py \
local-crawl.sh local-server.py server.crt server.key \
server-root/simple/basic.html server-root/simple/link.html \
fixtures/cache-golden/hts-cache/new.zip
TESTS_ENVIRONMENT =
@@ -47,6 +49,10 @@ TESTS = \
11_crawl-longurl.test \
11_crawl-parsing.test \
12_crawl_https.test \
13_crawl_proxy_https.test
13_crawl_proxy_https.test \
13_local-cookies.test \
14_local-https.test \
15_local-types.test \
16_local-assume.test
CLEANFILES = check-network_sh.cache

235
tests/local-crawl.sh Executable file

@@ -0,0 +1,235 @@
#!/bin/bash
#
# Launcher for httrack crawl tests against the local Python test server.
#
# Starts tests/local-server.py on an ephemeral port, discovers the port from
# the server's stdout, then runs httrack against http(s)://127.0.0.1:$PORT and
# audits the mirror. The server is always killed and the tmpdir removed on exit.
#
# The token BASEURL in any httrack argument is replaced with the discovered
# http(s)://127.0.0.1:$PORT base. --found/--directory paths are relative to the
# discovered host root (127.0.0.1_<port>/), since the random port leaks into
# the mirror directory name.
#
# Usage:
# bash local-crawl.sh [--tls] [--root DIR] \
# --errors N --files N --found PATH ... --directory PATH ... \
# httrack BASEURL/some/path [httrack-args...]
set -u
testdir=$(cd "$(dirname "$0")" && pwd)
server="${testdir}/local-server.py"
root="${LOCAL_SERVER_ROOT:-${testdir}/server-root}"
cert="${testdir}/server.crt"
key="${testdir}/server.key"
tls=
verbose=
tmpdir=
serverpid=
crawlpid=
function warning {
echo "** $*" >&2
return 0
}
function die {
warning "$*"
exit 1
}
function debug {
test -n "$verbose" && echo "$*" >&2
return 0
}
function info { printf "[%s] ..\t" "$*" >&2; }
function result { echo "$*" >&2; }
function cleanup {
if test -n "$crawlpid"; then
kill -9 "$crawlpid" 2>/dev/null
crawlpid=
fi
if test -n "$serverpid"; then
kill "$serverpid" 2>/dev/null
# Reap it so the port is released before we rm the tmpdir/log.
wait "$serverpid" 2>/dev/null
serverpid=
fi
if test -n "$tmpdir" && test -d "$tmpdir"; then
test -n "$nopurge" || rm -rf "$tmpdir"
fi
}
function assert_equals {
info "$1"
if test ! "$2" == "$3"; then
result "expected '$2', got '$3'"
exit 1
fi
result "OK ($2)"
}
nopurge=
trap cleanup EXIT HUP INT QUIT PIPE TERM
# python3 is required; mirror check-network.sh's skip-with-77 convention.
command -v python3 >/dev/null || ! echo "python3 not found; skipping local crawl tests" || exit 77
tmptopdir=${TMPDIR:-/tmp}
test -d "$tmptopdir" || mkdir -p "$tmptopdir" || die "no temporary directory; set TMPDIR"
tmpdir=$(mktemp -d "${tmptopdir}/httrack_local.XXXXXX") || die "could not create tmpdir"
# --- parse leading control flags --------------------------------------------
declare -a audit=()
scheme=http
pos=0
args=("$@")
nargs=$#
while test "$pos" -lt "$nargs"; do
case "${args[$pos]}" in
--debug) verbose=1 ;;
--no-purge)
nopurge=1
audit+=("--no-purge")
;;
--tls)
tls=1
scheme=https
;;
--root)
pos=$((pos + 1))
root="${args[$pos]}"
;;
--errors | --files)
audit+=("${args[$pos]}" "${args[$((pos + 1))]}")
pos=$((pos + 1))
;;
--found | --not-found | --directory)
audit+=("${args[$pos]}" "${args[$((pos + 1))]}")
pos=$((pos + 1))
;;
httrack)
pos=$((pos + 1))
break
;;
*) die "unrecognized option ${args[$pos]}" ;;
esac
pos=$((pos + 1))
done
# --- start the server --------------------------------------------------------
test -r "$server" || die "cannot read $server"
serverlog="${tmpdir}/server.log"
serverargs=(--root "$root")
if test -n "$tls"; then
serverargs+=(--tls --cert "$cert" --key "$key")
fi
debug "starting python3 $server ${serverargs[*]}"
python3 "$server" "${serverargs[@]}" >"$serverlog" 2>&1 &
serverpid=$!
# Wait for the "PORT <n>" line (server prints it once bound).
port=
for _ in $(seq 1 50); do
if test -s "$serverlog"; then
line=$(head -n1 "$serverlog")
if test "${line%% *}" == "PORT"; then
port="${line#PORT }"
break
fi
fi
kill -0 "$serverpid" 2>/dev/null || die "server exited early: $(cat "$serverlog")"
sleep 0.1
done
test -n "$port" || die "could not discover server port: $(cat "$serverlog")"
debug "server listening on ${scheme}://127.0.0.1:${port}"
baseurl="${scheme}://127.0.0.1:${port}"
# --- substitute BASEURL in the remaining (httrack) args ----------------------
declare -a hts=()
while test "$pos" -lt "$nargs"; do
hts+=("${args[$pos]//BASEURL/$baseurl}")
pos=$((pos + 1))
done
# --- run httrack -------------------------------------------------------------
which httrack >/dev/null || die "could not find httrack"
ver=$(httrack -O /dev/null --version | sed -e 's/HTTrack version //')
test -n "$ver" || die "could not run httrack"
out="${tmpdir}/crawl"
mkdir "$out" || die "could not create $out"
# Localhost is fast; disable the rate/bandwidth safety limits but keep a
# max-time backstop so a hang cannot wedge the suite.
declare -a moreargs=(--quiet --max-time=120 --timeout=30 --disable-security-limits --robots=0)
log="${tmpdir}/log"
info "running httrack ${hts[*]}"
httrack -O "$out" --user-agent="httrack $ver local ($(uname -omrs))" "${moreargs[@]}" "${hts[@]}" >"$log" 2>&1 &
crawlpid=$!
wait "$crawlpid"
crawlres=$?
crawlpid=
# httrack exits 0 even on hard connect/DNS errors, so this is a backstop only;
# the real guard is the audit below (--errors 0 plus the host-root existence check).
test "$crawlres" -eq 0 || ! result "httrack exited $crawlres" || {
cat "$log" >&2
exit 1
}
result "OK"
grep -iE "^[0-9:]*[[:space:]]Error:" "${out}/hts-log.txt" >&2
# --- discover the single host root (127.0.0.1_<port> or 127.0.0.1) -----------
hostroot=
for cand in "${out}/127.0.0.1_${port}" "${out}/127.0.0.1"; do
if test -d "$cand"; then
hostroot="$cand"
break
fi
done
test -n "$hostroot" || die "could not find host root under $out"
debug "host root: $hostroot"
# --- audit -------------------------------------------------------------------
i=0
while test "$i" -lt "${#audit[@]}"; do
case "${audit[$i]}" in
--errors)
i=$((i + 1))
assert_equals "checking errors" "${audit[$i]}" \
"$(grep -iEc "^[0-9:]*[[:space:]]Error:" "${out}/hts-log.txt")"
;;
--files)
i=$((i + 1))
nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${out}/hts-log.txt" |
sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g')
assert_equals "checking files" "${audit[$i]}" "$nFiles"
;;
--found)
i=$((i + 1))
info "checking for ${audit[$i]}"
if test -f "${hostroot}/${audit[$i]}"; then result "OK"; else
result "not found"
exit 1
fi
;;
--not-found)
i=$((i + 1))
info "checking absence of ${audit[$i]}"
if test ! -f "${hostroot}/${audit[$i]}"; then result "OK"; else
result "present"
exit 1
fi
;;
--directory)
i=$((i + 1))
info "checking for dir ${audit[$i]}"
if test -d "${hostroot}/${audit[$i]}"; then result "OK"; else
result "not found"
exit 1
fi
;;
esac
i=$((i + 1))
done
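
The `--errors` and `--files` audits above both read hts-log.txt: one counts lines that look like timestamped errors, the other extracts the number before "files written" on the completion line. A Python rendering of the two extractions might look like this; the function names are hypothetical, and the patterns are translations of the grep/sed expressions, not part of the launcher.

```python
import re

# Equivalent of: grep -iEc "^[0-9:]*[[:space:]]Error:"
ERROR_LINE = re.compile(r"^[0-9:]*[ \t]Error:", re.IGNORECASE | re.MULTILINE)
# Equivalent of the grep + sed pair that pulls out "<n> files written".
FILES_LINE = re.compile(
    r"^HTTrack Website Copier/[^ ]* mirror complete in "
    r".*[ \t]([^ ]*)[ \t]files written",
    re.MULTILINE,
)


def count_errors(log_text):
    """Number of log lines of the form '<time> Error: ...'."""
    return len(ERROR_LINE.findall(log_text))


def files_written(log_text):
    """The token before 'files written' on the completion line, or None."""
    match = FILES_LINE.search(log_text)
    return match.group(1) if match else None
```

Returning the count as a string mirrors the shell script, where `assert_equals` compares text rather than integers.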

240
tests/local-server.py Executable file

@@ -0,0 +1,240 @@
#!/usr/bin/env python3
"""Self-contained local web server for httrack's crawl tests.
Serves static fixtures from a docroot plus a handful of dynamic endpoints
(cookies, ...) so httrack can be exercised over loopback, deterministically and
offline, instead of crawling the live ut.httrack.com.
Binds to an ephemeral port (port 0) and prints the chosen port to stdout as
"PORT <n>\n" so a launcher can discover it. Pass --tls to wrap the socket with
the shipped self-signed test cert; httrack does not verify certs, so no CA
trust plumbing is needed.
stdlib only (http.server + ssl) -- no new build or runtime dependency.
"""
import argparse
import os
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import quote, unquote, urlsplit
# Cookie chain replicated from the old ut/cookies/*.php fixtures.
COOKIE_PATH = "/cookies/"
COOKIES = {
"cat": "dog",
"cake": "is a lie!",
"badger": "mushroom, with 'ants'",
}
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
\t"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
\t<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
\t<title>Sample test</title>
</head>
<body>
{body}
</body>
</html>
"""
class Handler(SimpleHTTPRequestHandler):
# Quieter logging; the launcher captures httrack's own log anyway.
def log_message(self, fmt, *args):
if os.environ.get("LOCAL_SERVER_VERBOSE"):
super().log_message(fmt, *args)
# --- helpers -----------------------------------------------------------
def request_cookies(self):
"""Parse the Cookie header into {name: decoded-value}.
Mirrors PHP's $_COOKIE: values are url-decoded, matching the encoding
applied when the cookie was set (see set_cookie)."""
jar = {}
raw = self.headers.get("Cookie", "")
for pair in raw.split(";"):
pair = pair.strip()
if "=" in pair:
name, value = pair.split("=", 1)
jar[name.strip()] = unquote(value.strip())
return jar
def set_cookie(self, name, value):
"""Queue a Set-Cookie header, url-encoding the value like PHP's
setcookie() so spaces/quotes/commas stay a single token that httrack
can store and replay verbatim."""
self._set_cookies.append(f"{name}={quote(value)}; Path={COOKIE_PATH}")
def send_html(self, body, status=200, extra_status=None):
encoded = PAGE.format(body=body).encode("utf-8")
self.send_response(status, extra_status)
self.send_header("Content-Type", "text/html; charset=utf-8")
self.send_header("Content-Length", str(len(encoded)))
for cookie in self._set_cookies:
self.send_header("Set-Cookie", cookie)
self.end_headers()
if self.command != "HEAD":
self.wfile.write(encoded)
def fail_cookie(self, what):
# The old PHPs answered 500 with the reason in the status line.
self.send_html("", status=500, extra_status=f"The {what} is missing or invalid")
# --- dynamic routes ----------------------------------------------------
def route_entrance(self):
self.set_cookie("cat", COOKIES["cat"])
self.set_cookie("cake", COOKIES["cake"])
self.send_html('\tThis is a <a href="second.php">link</a>')
def route_second(self):
jar = self.request_cookies()
if jar.get("cat") != COOKIES["cat"]:
return self.fail_cookie("cat")
if jar.get("cake") != COOKIES["cake"]:
return self.fail_cookie("cake")
self.set_cookie("badger", COOKIES["badger"])
self.send_html('\tThis is a <a href="third.php">link</a>')
def route_third(self):
jar = self.request_cookies()
if jar.get("cat") != COOKIES["cat"]:
return self.fail_cookie("cat")
if jar.get("cake") != COOKIES["cake"]:
return self.fail_cookie("cake")
if jar.get("badger") != COOKIES["badger"]:
return self.fail_cookie("badger")
self.send_html("\tThis is a test.")
def route_robots(self):
body = b"User-agent: *\nDisallow:\n"
self.send_response(200)
self.send_header("Content-Type", "text/plain")
self.send_header("Content-Length", str(len(body)))
self.end_headers()
if self.command != "HEAD":
self.wfile.write(body)
# --- type/extension matrix (issue #267 family) -------------------------
def send_raw(self, body, content_type):
"""Send a raw body with an explicit Content-Type, or none at all when
content_type is None (to observe httrack's typeless-file naming)."""
self.send_response(200)
if content_type is not None:
self.send_header("Content-Type", content_type)
self.send_header("Content-Length", str(len(body)))
self.end_headers()
if self.command != "HEAD":
self.wfile.write(body)
# A fake-binary PNG-ish blob for the image/typeless cases.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 64
# path -> (body, content_type); content_type None means no header at all.
TYPE_MATRIX = {
"/types/control.php": (b"<html><body>control</body></html>", "text/html"),
"/types/photo.png": (FAKE_PNG, "image/png"),
"/types/notype.png": (FAKE_PNG, None),
"/types/lie.png": (FAKE_PNG, "text/html"),
"/types/page.htm": (b"<html><body>htm page</body></html>", "text/html"),
"/types/script.js": (b"var x = 1;\n", "application/javascript"),
"/types/style.css": (b"body { color: red; }\n", "text/css"),
"/types/data.json": (b'{"k": "v"}\n', "application/json"),
"/types/gen.php": (FAKE_PNG, "image/png"),
}
def route_types_index(self):
body = (
'\t<a href="control.php">control</a>\n'
'\t<img src="photo.png" />\n'
'\t<img src="notype.png" />\n'
'\t<img src="lie.png" />\n'
'\t<a href="page.htm">htm</a>\n'
'\t<script src="script.js"></script>\n'
'\t<link rel="stylesheet" href="style.css" />\n'
'\t<a href="data.json">json</a>\n'
'\t<img src="gen.php?id=5" />\n'
)
self.send_html(body)
def route_types(self):
path = urlsplit(self.path).path
body, ctype = self.TYPE_MATRIX[path]
self.send_raw(body, ctype)
ROUTES = {
"/cookies/entrance.php": route_entrance,
"/cookies/second.php": route_second,
"/cookies/third.php": route_third,
"/robots.txt": route_robots,
"/types/index.html": route_types_index,
"/types/control.php": route_types,
"/types/photo.png": route_types,
"/types/notype.png": route_types,
"/types/lie.png": route_types,
"/types/page.htm": route_types,
"/types/script.js": route_types,
"/types/style.css": route_types,
"/types/data.json": route_types,
"/types/gen.php": route_types,
}
# --- dispatch ----------------------------------------------------------
def dispatch(self):
self._set_cookies = []
path = urlsplit(self.path).path
handler = self.ROUTES.get(path)
if handler is not None:
handler(self)
return True
return False
def do_GET(self):
if not self.dispatch():
super().do_GET()
def do_HEAD(self):
if not self.dispatch():
super().do_HEAD()
def main():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--root", required=True, help="docroot for static files")
parser.add_argument("--bind", default="127.0.0.1", help="bind address")
parser.add_argument("--tls", action="store_true", help="serve HTTPS")
parser.add_argument("--cert", help="TLS certificate (PEM)")
parser.add_argument("--key", help="TLS private key (PEM)")
args = parser.parse_args()
root = os.path.abspath(args.root)
def factory(*a, **kw):
return Handler(*a, directory=root, **kw)
httpd = ThreadingHTTPServer((args.bind, 0), factory)
if args.tls:
import ssl
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile=args.cert, keyfile=args.key)
httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
port = httpd.socket.getsockname()[1]
# The launcher reads this line to discover the ephemeral port.
print(f"PORT {port}", flush=True)
try:
httpd.serve_forever()
except KeyboardInterrupt:
pass
if __name__ == "__main__":
main()
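
The `PORT <n>` line that `main()` prints is meant to be read by a launcher, which is not shown here. A minimal sketch of that consumer side follows, assuming a hypothetical helper named `parse_port_line`; the real launcher may parse the line differently.

```python
def parse_port_line(line):
    """Return the port from a 'PORT <n>' line, or None if it does not match.

    Hypothetical helper for illustration only; not part of the test server.
    """
    parts = line.split()
    if len(parts) != 2 or parts[0] != "PORT":
        return None
    digits = parts[1]
    # Accept plain ASCII digits only, so int() cannot raise.
    if not (digits.isascii() and digits.isdigit()):
        return None
    port = int(digits)
    # Port 0 means "pick one for me"; a bound socket never reports it.
    return port if 0 < port < 65536 else None


if __name__ == "__main__":
    print(parse_port_line("PORT 8443\n"))
```

A launcher would typically read the server's stdout line by line and stop at the first line for which this returns a value.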


@@ -0,0 +1,18 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Sample test</title>
</head>
<body>
This is a <a href="link.html?v=1">link</a>
This is a <a href='link.html?v=2'>link</a>
This is a <a href="./link.html?v=3">link</a>
This is a <a href=link.html?v=4>link</a>
</body>


@@ -0,0 +1,3 @@
This is a link.
Go back to <a href="basic.html">home</a>.

21
tests/server.crt Normal file

@@ -0,0 +1,21 @@
-----BEGIN CERTIFICATE-----
MIIDbzCCAlegAwIBAgIUdWkDDomnY3WW95UqJ+UOASuR/i0wDQYJKoZIhvcNAQEL
BQAwODESMBAGA1UEAwwJMTI3LjAuMC4xMSIwIAYDVQQKDBlIVFRyYWNrIGxvY2Fs
IHRlc3Qgc2VydmVyMCAXDTI2MDYxNTE0NDQxMFoYDzIwNTYwNjA3MTQ0NDEwWjA4
MRIwEAYDVQQDDAkxMjcuMC4wLjExIjAgBgNVBAoMGUhUVHJhY2sgbG9jYWwgdGVz
dCBzZXJ2ZXIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDx78mogNhT
noWwRa51NeGtapQ1PfTYLlIMUzuloFXOsR1/ozRkFucqHNftF22wf0gg4VQJSBSf
3rwj79vsnt3nyaD03bTAafpHXkd+IJxQowiG8TfOJF0R/Qg9g7DCE66R9agQpMJC
SGxIin9p/4ld4Hn6869d4hNq4fHxNf/qkj2cnf8DYxrldz2FGsi6yMed4tzz2Am4
ZbPgwep+fy843ZdYrVIms9vJluNa9E+6Vpw9FwdjzQ/IBBMLvGaC2pDkc95YelaE
nQrAlTO/0l5vjc8XuTQFlo3DbUg+WEld/pxvCqsd/q1mqjL0WbxtXl2zCwGzAoJx
rjVEPfA8QSbtAgMBAAGjbzBtMB0GA1UdDgQWBBTHE0KKW8REV4HxajzVsIBxz3iL
9zAfBgNVHSMEGDAWgBTHE0KKW8REV4HxajzVsIBxz3iL9zAPBgNVHRMBAf8EBTAD
AQH/MBoGA1UdEQQTMBGHBH8AAAGCCWxvY2FsaG9zdDANBgkqhkiG9w0BAQsFAAOC
AQEAYlTEftrwGJBXuPmtxhmtw2HO/VTC4TGnq67hH5H+ptwgZJuuxCQ5KW6flTyp
FTyMhha33WD4EBL3wqqJsWr9Y4BXqi4G0lRqXBcC1oIUa2VYIDMER7kaY1qTSqE8
ARpwdB2BhvngAzDLc+4Jt4jQMRGr8fHAwxpDBoIZ1knbyzYNP73Bajse6/8YtxUu
nB2BsldjZnLvyHvRxUpWp92OyQih4jYSrlN6olDFlKDg7++kMhkHtJQW9a1t54VN
0ZXrB1ZRuHUUvGBq26x71riTWor7HNOSQaGeCMQjZNQkh5tfshNygUGSZVXTEwhG
xSrOL7NqBt2+EkVwf7LjGzjmBw==
-----END CERTIFICATE-----

28
tests/server.key Normal file

@@ -0,0 +1,28 @@
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDx78mogNhTnoWw
Ra51NeGtapQ1PfTYLlIMUzuloFXOsR1/ozRkFucqHNftF22wf0gg4VQJSBSf3rwj
79vsnt3nyaD03bTAafpHXkd+IJxQowiG8TfOJF0R/Qg9g7DCE66R9agQpMJCSGxI
in9p/4ld4Hn6869d4hNq4fHxNf/qkj2cnf8DYxrldz2FGsi6yMed4tzz2Am4ZbPg
wep+fy843ZdYrVIms9vJluNa9E+6Vpw9FwdjzQ/IBBMLvGaC2pDkc95YelaEnQrA
lTO/0l5vjc8XuTQFlo3DbUg+WEld/pxvCqsd/q1mqjL0WbxtXl2zCwGzAoJxrjVE
PfA8QSbtAgMBAAECggEACgNK4klq1T3IpKdNoBY5yoE7CbUQZBNkBpSPRxHgBezj
SVFfgrZGnOySrIJSt4JHtuynG2Hl+0ku74HRep/ck+eOsh5W3mZvGvMLnGxhwR3u
Or99osTIgU0VQTkpC0SLQ16FCnih0uJycNIikdLR7uuya1tt1OyIBzK7XlNGIywT
p85zJc7/6TfTC9eM7lqh7JGR7KplBxSvgZL1pUr7y4rNpKms6uzOvPND79CcKnbU
BBA9Tu4qdOkoOljsZKkvh3pihxyG9X6d8QTZ/uX3pkvliwSFBc+Sz9EootA3/4r5
gVWpQ2t/AY7fY4hqzLIX/HivVaPj3cWk1G+SHm0XNQKBgQD5I9rijqFvV/p6FmUl
FbnjJFFHHgZLivlGxAC5vOyJNQQaqdeDzg7yMotNmQTggVGjT6sjdosQb3n+ctuk
EhQnZSU5VkNKv1+PTR35WrRkaECCaqz3Pv79pV9GVcX3it7UuYjNiOeSPqINWe+X
49JwnJFz+qQ1BchAwOis4zkENwKBgQD4mShDaYLOO97VpgZj4cGxHHWyEK9CRQvp
I7HxRmfaWS3JHwb88lOmALEU6pAj5cYJPAznv8BnUWcVHalZbkQ1JWYtUJRqj6OI
Ym7rw/nm4Ay5ijbdEism173dSk3IjOe+PdAlxzsOuVzYdBTqElmeQWtBzhY9aHvX
r+A02C2j+wKBgHHDo6Gsi57yR5gUPd9vSlCkNtEIrss0DJv5yHMIB+KnaNZcE+NF
5qFF30Jxyz5RDtxJ9tXcvaeln8lG3XDQKI/MqfDCqTuqo5ImHrfMaW8oA70JxS2p
gHqGVzkg1aMxsIrmpcdk6olnPExocvWivGdbtzeEjhMALu8Sp6y6nUCFAoGBAK5h
KLgYw/OMVaQCIMthaa+l6f0s7PMMYe1453H6VBD6qz4/8HPwO7LfG1gzrUYxADgs
ElVh0UHn/On383nS+i9Ze5Hfyyvwc+LQQURKJPrJQMPJavCptPE7NmiKnYNHK6vr
yh0l4oxShAklbCJBGvICq4zuVfVfXDeQnDIVTfaPAoGBAMCrZqYdOUhUu+aUqxZq
qO/TTQxrxftU63jGUg+o042TdgI4KWLn07wvHJ8/E2OqF35eXenvcuKbNLI1l72J
4cp+3cUv8iAXThTRYEztr5CS/wta4o4CNN8zfjn5dV9AI4Hmt4V7EaGWpBcViGbj
n0Mhag+dO8DHuenqi1yfMrAt
-----END PRIVATE KEY-----

152
tools/mk-sbuild-chroot.sh Executable file

@@ -0,0 +1,152 @@
#!/usr/bin/env bash
#
# Bootstrap an sbuild chroot for the clean-room build gate (mkdeb.sh --sbuild).
#
# Uses the rootless unshare backend: no root, no schroot daemon. It builds a
# minimal buildd chroot tarball into ~/.cache/sbuild/<dist>-<arch>.tar.zst, where
# sbuild --dist=<dist> finds it automatically in unshare mode.
#
# Usage:
# tools/mk-sbuild-chroot.sh [options]
#
# Options:
# -d, --dist DIST suite to bootstrap (default: unstable)
# -a, --arch ARCH architecture (default: dpkg --print-architecture)
# -m, --mirror URL apt mirror (default: http://deb.debian.org/debian)
# --components LIST comma-separated components (default: main)
# -f, --force rebuild even if the tarball already exists
# --write-sbuildrc add "$chroot_mode = 'unshare';" to ~/.sbuildrc if absent
# -h, --help show this help
#
# One-time setup; refresh later with sbuild-update or by rerunning with --force.
# Requires mmdebstrap and the uidmap tools (newuidmap) for the unshare backend.
set -euo pipefail
readonly PROGNAME=${0##*/}
die() {
printf '%s: error: %s\n' "$PROGNAME" "$*" >&2
exit 1
}
info() {
printf '==> %s\n' "$*" >&2
}
usage() {
sed -n '2,/^set -euo/{/^set -euo/!p}' "$0" | sed 's/^# \{0,1\}//'
}
need() {
local tool
for tool in "$@"; do
command -v "$tool" >/dev/null 2>&1 || die "required tool not found: $tool"
done
}
main() {
local dist=unstable
local arch=""
local mirror=http://deb.debian.org/debian
local components=main
local force=0
local write_sbuildrc=0
while [[ $# -gt 0 ]]; do
case $1 in
-d | --dist)
[[ $# -ge 2 ]] || die "missing argument for $1"
dist=$2
shift 2
;;
-a | --arch)
[[ $# -ge 2 ]] || die "missing argument for $1"
arch=$2
shift 2
;;
-m | --mirror)
[[ $# -ge 2 ]] || die "missing argument for $1"
mirror=$2
shift 2
;;
--components)
[[ $# -ge 2 ]] || die "missing argument for $1"
components=$2
shift 2
;;
-f | --force)
force=1
shift
;;
--write-sbuildrc)
write_sbuildrc=1
shift
;;
-h | --help)
usage
exit 0
;;
*)
die "unknown option: $1 (try --help)"
;;
esac
done
need mmdebstrap dpkg
# Unshare needs the setuid uid/gid mappers; mmdebstrap fails cryptically without them.
command -v newuidmap >/dev/null 2>&1 ||
die "newuidmap not found; install the uidmap package for the unshare backend"
# Unshare maps a whole UID range, not just the caller's: the base install
# creates system users, and without an /etc/subuid+subgid range the install
# crashes (dpkg SIGSEGV) instead of erroring cleanly. Root uses mode=root and
# needs no range.
if [[ $(id -u) -ne 0 ]]; then
local me
me=$(id -un)
if ! grep -qs "^$me:" /etc/subuid || ! grep -qs "^$me:" /etc/subgid; then
# Suggest a range starting past every allocation in either file.
local start
start=$(awk -F: '{e = $2 + $3; if (e > m) m = e} END {print (m ? m : 100000)}' \
/etc/subuid /etc/subgid 2>/dev/null)
die "no /etc/subuid+subgid range for $me; the unshare backend needs one:
sudo usermod --add-subuids $start-$((start + 65535)) --add-subgids $start-$((start + 65535)) $me"
fi
fi
: "${arch:=$(dpkg --print-architecture)}"
local cache=$HOME/.cache/sbuild
local tarball=$cache/${dist}-${arch}.tar.zst
if [[ -e $tarball && $force -eq 0 ]]; then
info "chroot already exists: $tarball (use --force to rebuild)"
else
info "bootstrapping $dist/$arch chroot into $tarball"
mkdir -p "$cache"
mmdebstrap --variant=buildd --arch="$arch" --components="$components" \
"$dist" "$tarball" "$mirror"
info "chroot ready: $tarball"
fi
local rc=$HOME/.sbuildrc
local mode_line="\$chroot_mode = 'unshare';"
# shellcheck disable=SC2016 # $chroot_mode is literal regex text, not a shell var.
if grep -qsE '^[[:space:]]*\$chroot_mode[[:space:]]*=.*unshare' "$rc"; then
: # already configured (active, non-commented line)
elif [[ $write_sbuildrc -eq 1 ]]; then
info "enabling the unshare backend in $rc"
printf '%s\n' "$mode_line" >>"$rc"
else
cat >&2 <<EOF
==> To use this chroot without passing --chroot-mode each time, add to $rc:
$mode_line
(or rerun with --write-sbuildrc). Then verify with:
sbuild --dist=$dist path/to/package.dsc
and build the release gate with:
tools/mkdeb.sh --source-only --sbuild
EOF
fi
}
main "$@"
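
The awk one-liner in the script suggests a subordinate-ID range that starts past every allocation in `/etc/subuid` and `/etc/subgid`. The same computation can be sketched in Python, assuming a hypothetical function name `suggest_subid_start` and the usual `name:start:count` line format. Unlike awk, which treats non-numeric fields as 0, this sketch skips malformed lines.

```python
def suggest_subid_start(lines, default=100000):
    """Return the first ID past every 'name:start:count' allocation.

    Hypothetical helper mirroring the script's awk expression; falls back to
    `default` when no allocation is found, as the script does.
    """
    highest_end = 0
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) < 3:
            continue  # not a name:start:count entry
        try:
            end = int(fields[1]) + int(fields[2])
        except ValueError:
            continue  # malformed numbers: skipped here (awk would count them as 0)
        highest_end = max(highest_end, end)
    return highest_end if highest_end else default


if __name__ == "__main__":
    entries = ["alice:100000:65536", "bob:165536:65536"]
    start = suggest_subid_start(entries)
    # The script proposes a 65536-wide range beginning at `start`.
    print(f"{start}-{start + 65535}")
```

Feeding it the lines of both files together matches the script's behaviour of scanning `/etc/subuid` and `/etc/subgid` in one pass.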


@@ -20,11 +20,27 @@
# Options:
# -k, --key KEYID GPG key for signing (default: $DEBSIGN_KEYID)
# -o, --outdir DIR output directory (default: <repo>/dist)
# --orig FILE reuse this upstream orig tarball instead of
# regenerating it (required for a Debian revision
# >= 2, whose orig is frozen in the archive)
# -s, --source-only build only the source package
# -u, --unsigned do not sign anything (implies no release sigs)
# --no-release-artifacts skip the orig tarball .asc/.md5/.sha1
# --sbuild additionally build the .dsc in a clean sbuild
# chroot as a from-scratch verification gate
# -h, --help show this help
#
# --sbuild reproduces the buildd environment: it builds the source package in a
# minimal chroot holding only the declared Build-Depends, so an FTBFS or a
# missing dependency fails here instead of on the archive's buildds (which, with
# a source-only upload, are otherwise the first clean build). It needs an sbuild
# chroot for the changelog's distribution; create one once with the companion
# tools/mk-sbuild-chroot.sh (rootless unshare backend).
#
# The Debian revision in debian/changelog decides the orig: revision 1 builds a
# fresh upstream tarball; revision >= 2 must reuse the orig frozen at revision 1
# (the .dsc references it by checksum), so pass it with --orig.
#
# SOURCE_DATE_EPOCH is honored for reproducible output.
set -euo pipefail
@@ -57,9 +73,11 @@ need() {
main() {
local key=${DEBSIGN_KEYID:-}
local outdir=""
local orig_in=""
local source_only=0
local unsigned=0
local release_artifacts=1
local sbuild=0
while [[ $# -gt 0 ]]; do
case $1 in
@@ -73,6 +91,11 @@ main() {
outdir=$2
shift 2
;;
--orig)
[[ $# -ge 2 ]] || die "missing argument for $1"
orig_in=$2
shift 2
;;
-s | --source-only)
source_only=1
shift
@@ -85,6 +108,10 @@ main() {
release_artifacts=0
shift
;;
--sbuild)
sbuild=1
shift
;;
-h | --help)
usage
exit 0
@@ -95,7 +122,8 @@ main() {
esac
done
need git autoreconf debuild dcmd
need git autoreconf debuild dcmd dpkg-parsechangelog
[[ $sbuild -eq 1 ]] && need sbuild
if [[ $unsigned -eq 0 ]]; then
need gpg
[[ -n $key ]] || die "no signing key (pass --key or set DEBSIGN_KEYID, or use --unsigned)"
@@ -107,6 +135,11 @@ main() {
mkdir -p "$outdir"
outdir=$(cd "$outdir" && pwd)
if [[ -n $orig_in ]]; then
[[ -r $orig_in ]] || die "--orig file not readable: $orig_in"
orig_in=$(cd "$(dirname "$orig_in")" && pwd)/$(basename "$orig_in")
fi
scratch=$(mktemp -d "${TMPDIR:-/tmp}/httrack-mkdeb.XXXXXX")
trap 'rm -rf -- "$scratch"' EXIT
@@ -118,39 +151,58 @@ main() {
git -C "$repo/src/coucal" archive --format=tar --prefix=src/coucal/ HEAD |
tar -x -C "$export_dir"
# Refresh build system and man page, then build the tarball. We build here
# only because regen-man needs the compiled binaries; the test suite is not
# run in this pass. debuild (below) runs the full suite once, with the online
# tests enabled, so a check here would just be a slower, offline-only repeat.
info "regenerating build system and man page"
(
cd "$export_dir"
autoreconf -fi
./configure --quiet
make -s -j"$(nproc)"
make -s -C man regen-man
# Build the tarball from a clean tree so no object files leak into it.
make -s clean
make -s dist
)
# Upstream version and Debian revision drive the orig: revision 1 builds a
# fresh tarball, revision >= 2 reuses the one frozen at -1 (the .dsc pins it
# by checksum, so a regenerated orig with new mtimes would be rejected).
local fullver ver rev
fullver=$(cd "$export_dir" && dpkg-parsechangelog -S Version)
ver=${fullver%-*}
rev=${fullver##*-}
local orig=httrack_${ver}.orig.tar.gz
info "version $ver (Debian revision $rev)"
local tarball ver
local -a tarballs
shopt -s nullglob
tarballs=("$export_dir"/httrack-*.tar.gz)
shopt -u nullglob
[[ ${#tarballs[@]} -ge 1 ]] || die "make dist produced no tarball"
tarball=${tarballs[0]##*/}
ver=${tarball#httrack-}
ver=${ver%.tar.gz}
info "version $ver"
# A signed build is upload-bound, so a revision >= 2 must reuse the frozen
# orig (--orig); an unsigned build is a throwaway (CI, local) and may
# regenerate it, since it can never reach the archive.
if [[ -z $orig_in && $rev != 1 && $unsigned -eq 0 ]]; then
die "Debian revision $rev needs --orig FILE (the orig is frozen from revision 1)"
fi
if [[ -n $orig_in ]]; then
info "reusing upstream tarball $orig_in"
cp -- "$orig_in" "$scratch/$orig"
else
# Refresh build system and man page, then build the tarball. We build
# here only because regen-man needs the compiled binaries; the test
# suite is not run in this pass. debuild (below) runs the full suite
# once, online tests enabled, so a check here would just repeat it.
info "regenerating build system and man page"
(
cd "$export_dir"
autoreconf -fi
./configure --quiet
make -s -j"$(nproc)"
make -s -C man regen-man
# Build the tarball from a clean tree so no object files leak in.
make -s clean
make -s dist
)
local -a tarballs
shopt -s nullglob
tarballs=("$export_dir"/httrack-*.tar.gz)
shopt -u nullglob
[[ ${#tarballs[@]} -ge 1 ]] || die "make dist produced no tarball"
local tarball=${tarballs[0]##*/}
[[ $tarball == "httrack-$ver.tar.gz" ]] ||
die "changelog version $ver disagrees with built tarball $tarball (configure.ac mismatch?)"
cp -- "$export_dir/$tarball" "$scratch/$orig"
fi
# 3.0 (quilt): orig tarball is upstream-only; debian/ is overlaid on top.
local orig=httrack_${ver}.orig.tar.gz
cp -- "$export_dir/$tarball" "$scratch/$orig"
(
cd "$scratch"
tar -xf "$orig"
[[ -d httrack-$ver ]] || die "orig tarball does not unpack to httrack-$ver/"
cp -a "$export_dir/debian" "httrack-$ver/debian"
)
@@ -179,9 +231,37 @@ main() {
[[ ${#changes[@]} -ge 1 ]] || die "debuild produced no .changes file"
dcmd cp -- "${changes[@]}" "$outdir/"
# Clean-room build gate: rebuild the source package in a minimal chroot that
# holds only the declared Build-Depends, the same way the buildds will. An
# undeclared dependency or any FTBFS aborts the release here instead of
# surfacing after a source-only upload. Logs and clean-built debs land in
# $outdir/sbuild for inspection.
if [[ $sbuild -eq 1 ]]; then
local -a dscs
shopt -s nullglob
dscs=("$scratch"/*.dsc)
shopt -u nullglob
[[ ${#dscs[@]} -ge 1 ]] || die "no .dsc to sbuild"
local dist
dist=$(cd "$scratch/httrack-$ver" && dpkg-parsechangelog -S Distribution)
[[ $dist == UNRELEASED ]] && dist=unstable
info "clean-room build with sbuild (dist $dist)"
local sbdir=$outdir/sbuild
rm -rf -- "$sbdir"
mkdir -p "$sbdir"
(cd "$sbdir" && sbuild --dist="$dist" -- "${dscs[0]}")
info "sbuild clean-room build passed; logs in $sbdir"
fi
# Release artifacts for the upstream tarball (detached sig + checksums).
# A Debian revision >= 2 .changes omits the orig (it is already in the
# archive), so dcmd above won't have copied it; place it from the build tree
# so the website artifacts are produced regardless of the revision.
if [[ $release_artifacts -eq 1 && $unsigned -eq 0 ]]; then
info "signing upstream tarball"
cp -- "$scratch/$orig" "$outdir/$orig"
(
cd "$outdir"
gpg --armor --detach-sign --yes -u "$key" -- "$orig"