Compare commits

..

2 Commits

Author SHA1 Message Date
Xavier Roche
98bacadda7 debian: lead webhttrack browser dep with chromium, not firefox-esr
webhttrack's "firefox-esr | chromium | www-browser" made the httrack
source inherit firefox-esr's autoremoval clock: britney keys off the
first alternative of a disjunction, so every firefox-esr RC bug
(currently #1127569) dragged httrack toward removal even though
webhttrack stays installable via the other alternatives.

Lead with chromium instead. lintian requires a real package as the
first alternative (virtual-package-depends-without-real-package-depends),
so the virtual www-browser cannot go first; chromium is real, keeps the
dep lintian-clean, and makes britney track chromium rather than the
RC-bug-prone firefox-esr.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-27 20:41:08 +02:00
Xavier Roche
669947cd23 Split -%u URL Hacks into independent www/slash/query toggles (#271) (#435)
-%u (--urlhack) bundled three dedup normalizations under one switch:
www.host == host, redundant // collapse, and query-argument reordering.
A mirror that needed one but not another (e.g. keep www. distinct) had to
turn the whole umbrella off. Add three opt-out sub-options, defaulting to
the umbrella so existing -%u/-%u0 behavior is unchanged:

  --keep-www-prefix      keep www.foo.com distinct from foo.com   (-%j)
  --keep-double-slashes  keep redundant // in the path            (-%o)
  --keep-query-order     keep query-argument order significant    (-%y)

The split is resolved once in hash_init() into norm_host/norm_slash/
norm_query and threaded through the dedup hash (htshash.c), the savename
lookup key (htsname.c) and the redirect-loop diagnostic (htsparse.c) so
all three stay consistent. fil_normalized() gains an internal
fil_normalized_ex(do_slash, do_query) core; the public
fil_normalized()/fil_normalized_filtered() keep their signatures.

Normalization (slash/query) now follows urlhack and its sub-flags
uniformly, while --strip-query stays orthogonal. So with urlhack off,
strip-query strips keys without sorting the remainder; the url_savename
urlhack-off branch is moved to the same do_slash=0/do_query=0 normalizer
the hash uses, so a URL is always looked up under the key it was stored
with (a self-lookup mismatch this otherwise introduced).

http/https are always merged in the dedup key (the scheme is stripped
regardless of -%u), so that part of the request needs no toggle.

The opt-outs are spelled positively (--keep-*) because httrack's generic
--no<opt> prefix only appends the disabling "0" for parametered options,
not "single" booleans, so --nowww-dedup would silently no-op.

opt grows three hts_boolean fields appended at the struct tail (offsets
stable, no soname bump, matching the strip_query addition in #112).

Tested by a -#test=urlhack engine self-test (hash_url_equals over each
flag combination) plus a -%u0 + --strip-query crawl case exercising the
urlhack-off savename branch.

Closes #271

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 20:26:28 +02:00

2
debian/control vendored
View File

@@ -30,7 +30,7 @@ Description: Copy websites to your computer (Offline browser)
Package: webhttrack
Architecture: any
Multi-Arch: foreign
Depends: ${misc:Depends}, ${shlibs:Depends}, webhttrack-common, sensible-utils, firefox-esr | chromium | www-browser
Depends: ${misc:Depends}, ${shlibs:Depends}, webhttrack-common, sensible-utils, chromium | firefox-esr | www-browser
Replaces: webhttrack-common (<< 3.43.9-2)
Breaks: webhttrack-common (<< 3.43.9-2)
Suggests: httrack, httrack-doc