httrack/debian/control
Xavier Roche addbd3136b Use an unknown/unknown sentinel for an absent Content-Type (#412)
#409 distinguished "the server declared text/html" from "no Content-Type,
defaulted to text/html" with a new htsblk.contenttype_given flag, so a
binary-looking URL that really serves HTML is saved .html while a typeless
response keeps its URL extension. That worked on a fresh crawl but had two
costs: the flag was never persisted, so on --update the cache read it as
unset and the names reverted (report.html became report.pdf again, and the
two passes disagreed); and it was an installed-struct ABI break (soname 4,
libhttrack4).

Replace the flag with a sentinel: when no Content-Type is received, store
"unknown/unknown" as the type instead of text/html. The sentinel is treated
as html for every type test (added to is_html_mime_type), so parsing,
storage and filtering of a typeless response are unchanged; only the naming
code (wire_patches_ext) reads it as "no declared type" and keeps the URL
extension. Because the type string rides the cache, an update reads the same
sentinel and names consistently -- the revert is fixed at the source.
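
A minimal C sketch of the three pieces this paragraph describes:
1. Storing the sentinel when no Content-Type arrives.
2. An html predicate that accepts the sentinel.
3. A naming rule that reads the sentinel as "no declared type".

All names here (SKETCH_MIME_SENTINEL, sketch_stored_type, sketch_is_html_mime_type, sketch_pick_extension) are hypothetical stand-ins. They are not httrack's actual is_html_mime_type or wire_patches_ext. The sketch also assumes types are already lowercased and have their parameters stripped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro; the commit specifies only the string value. */
#define SKETCH_MIME_SENTINEL "unknown/unknown"

/* Type to store for a response: the received Content-Type, or the sentinel
   when the server sent none (modelled here as NULL). */
static const char *sketch_stored_type(const char *received) {
  return received != NULL ? received : SKETCH_MIME_SENTINEL;
}

/* Html predicate used by parsing, storage and filtering. It accepts the
   sentinel, so a typeless response is still handled as html. Assumes the
   type is lowercased with parameters such as "; charset" removed. */
static int sketch_is_html_mime_type(const char *mime) {
  assert(mime != NULL);
  return strcmp(mime, "text/html") == 0 ||
         strcmp(mime, SKETCH_MIME_SENTINEL) == 0;
}

/* Naming decision, loosely modelled on what the commit says the naming code
   does: a declared text/html forces an "html" extension, while the sentinel
   means "no declared type", so the URL's own extension is kept. Other types
   simply keep the URL extension in this simplified sketch. */
static const char *sketch_pick_extension(const char *stored_mime,
                                         const char *url_ext) {
  assert(stored_mime != NULL && url_ext != NULL);
  if (strcmp(stored_mime, SKETCH_MIME_SENTINEL) == 0)
    return url_ext; /* typeless: trust the URL, e.g. report.pdf */
  if (strcmp(stored_mime, "text/html") == 0)
    return "html";  /* declared html: save as report.html */
  return url_ext;
}
```

The sentinel is just the stored type string. A later --update pass that reads the type back from the cache therefore makes the same sketch_pick_extension decision as the first pass, which is the consistency the flag-based approach lost.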

The sentinel never reaches a consumer as a real type: a single helper,
hts_effective_mime(), maps it back to text/html wherever a stored type is
derived (give_mimext) or emitted/persisted -- the httrack stdout serve, the
ProxyTrack live serve, and the ProxyTrack .arc export (both the replayed
response header and the index record). The .arc export was caught by an
adversarial spill audit; without the map a typeless page archived via
proxytrack would carry Content-Type: unknown/unknown.
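
The single-choke-point idea can be sketched as follows. sketch_effective_mime is a hypothetical stand-in for hts_effective_mime(), whose real signature is not given here. sketch_format_content_type is an invented emit site standing in for the stdout serve, the ProxyTrack live serve, or the .arc header and index record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MIME_SENTINEL "unknown/unknown"

/* Stand-in for hts_effective_mime(): the sentinel is internal only, so any
   stored type about to be derived, emitted or persisted goes through here
   and the sentinel comes back out as text/html. */
static const char *sketch_effective_mime(const char *stored) {
  assert(stored != NULL);
  return strcmp(stored, SKETCH_MIME_SENTINEL) == 0 ? "text/html" : stored;
}

/* Example emit site, e.g. a served or archived response header. Returns
   what snprintf returns: the length the full line needs, excluding NUL. */
static int sketch_format_content_type(char *buf, size_t size,
                                      const char *stored) {
  return snprintf(buf, size, "Content-Type: %s\r\n",
                  sketch_effective_mime(stored));
}
```

Funnelling every emit and persist site through one function is what keeps the sentinel internal. Any output path that skips the call will leak the sentinel, and that is the kind of spill the audit found in the .arc export.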

Since the sentinel makes contenttype_given unnecessary, #409's ABI break is
undone: the field is removed, soname returns to 3, and the Debian package
reverts libhttrack4 -> libhttrack3. soname 4 was never released (Debian NEW
carries libhttrack3), so this re-aligns master with the archive rather than
flip-flopping anything downstream.

Tests: 18_local-update re-mirrors and asserts the names survive the update
pass; 15_local-types gains a notype.html negative control; 17_local-empty-ct
stays green. Full make check: 27 pass, 0 fail.

One accepted behavior change: a mime filter matching exactly text/html no
longer matches a typeless response (its type is the sentinel, html-ish but
not literally text/html); the response is still parsed and crawled as html.
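
The trade-off comes down to two different tests on the same stored string. In this hypothetical sketch, an exact-type filter rule is modelled as a plain string comparison. That is an assumption, not httrack's filter engine. The html predicate is the same simplified one that accepts the sentinel.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MIME_SENTINEL "unknown/unknown"

/* Html predicate as the commit describes it: the sentinel counts as html. */
static int sketch_is_html_mime_type(const char *mime) {
  assert(mime != NULL);
  return strcmp(mime, "text/html") == 0 ||
         strcmp(mime, SKETCH_MIME_SENTINEL) == 0;
}

/* Hypothetical exact-type filter rule: matches only when the stored type is
   literally the pattern. httrack's real filter syntax is not modelled. */
static int sketch_filter_matches_exact(const char *pattern,
                                       const char *stored) {
  assert(pattern != NULL && stored != NULL);
  return strcmp(pattern, stored) == 0;
}
```

A rule written against literally text/html therefore misses a typeless response. Anything keyed on the html predicate, such as parsing and crawling, still treats that response as html.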

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 10:44:12 +02:00

Source: httrack
Section: web
Priority: optional
Maintainer: Xavier Roche <roche@httrack.com>
Standards-Version: 4.7.0
Build-Depends: debhelper-compat (= 13), autoconf, autoconf-archive, automake, libtool, zlib1g-dev, libssl-dev
Rules-Requires-Root: no
Homepage: http://www.httrack.com
Vcs-Git: https://github.com/xroche/httrack.git
Vcs-Browser: https://github.com/xroche/httrack

Package: httrack
Architecture: any
Multi-Arch: foreign
Depends: ${misc:Depends}, ${shlibs:Depends}
Suggests: webhttrack, httrack-doc
Description: Copy websites to your computer (Offline browser)
 HTTrack is an offline browser utility, allowing you to download a World
 Wide website from the Internet to a local directory, building recursively
 all directories, getting html, images, and other files from the server to
 your computer.
 .
 HTTrack arranges the original site's relative link-structure. Simply
 open a page of the "mirrored" website in your browser, and you can
 browse the site from link to link, as if you were viewing it online.
 HTTrack can also update an existing mirrored site, and resume
 interrupted downloads. HTTrack is fully configurable, and has an
 integrated help system.

Package: webhttrack
Architecture: any
Multi-Arch: foreign
Depends: ${misc:Depends}, ${shlibs:Depends}, webhttrack-common, sensible-utils, firefox-esr | chromium | www-browser
Replaces: webhttrack-common (<< 3.43.9-2)
Breaks: webhttrack-common (<< 3.43.9-2)
Suggests: httrack, httrack-doc
Enhances: httrack
Description: Copy websites to your computer, httrack with a Web interface
 WebHTTrack is an offline browser utility, allowing you to download a World
 Wide website from the Internet to a local directory, building recursively
 all directories, getting html, images, and other files from the server to
 your computer, using a step-by-step web interface.
 .
 WebHTTrack arranges the original site's relative link-structure. Simply
 open a page of the "mirrored" website in your browser, and you can
 browse the site from link to link, as if you were viewing it online.
 HTTrack can also update an existing mirrored site, and resume
 interrupted downloads. WebHTTrack is fully configurable, and has an
 integrated help system.
 .
 Snapshots: http://www.httrack.com/page/21/

Package: webhttrack-common
Architecture: all
Multi-Arch: foreign
Depends: ${misc:Depends}
Description: webhttrack common files
 This package is the common files of webhttrack, website copier and
 mirroring utility

Package: libhttrack3
Architecture: any
Multi-Arch: same
Section: libs
Depends: ${misc:Depends}, ${shlibs:Depends}
Replaces: libhttrack2, httrack (<< 3.49.8-2~)
Breaks: libhttrack2, httrack (<< 3.49.8-2~)
Description: Httrack website copier library
 This package is the library part of httrack, website copier and mirroring
 utility

Package: libhttrack-dev
Architecture: any
Multi-Arch: same
Section: libdevel
Depends: ${misc:Depends}, ${shlibs:Depends}, zlib1g-dev
Description: Httrack website copier includes and development files
 This package adds supplemental files for using the httrack website copier
 library

Package: httrack-doc
Architecture: all
Multi-Arch: foreign
Section: doc
Depends: ${misc:Depends}
Description: Httrack website copier additional documentation
 This package adds supplemental documentation for httrack and webhttrack
 as a browsable html documentation

Package: proxytrack
Architecture: any
Multi-Arch: foreign
Depends: ${misc:Depends}, ${shlibs:Depends}
Suggests: squid, httrack
Description: Build HTTP Caches using archived websites copied by HTTrack
 ProxyTrack is a simple proxy server aimed to deliver content archived by
 HTTrack sessions. It can aggregate multiple download caches, for direct
 use (through any browser) or as an upstream cache slave server.
 This proxy can handle HTTP/1.1 proxy connections, and is able to reply to
 ICPv2 requests for an efficient integration within other cache servers,
 such as Squid. It can also handle transparent HTTP requests to allow
 cached live connections inside an offline network.