Compare commits

27 Commits

Author SHA1 Message Date
Xavier Roche
17fc54869d Allocate exactly one extra byte for cache-buffer NUL terminators
These fread buffers were over-allocated as size+4, a superstitious margin
that never bought anything: every site writes a single trailing NUL at
[size], so size+1 is exactly right. Trim them all to size+1.

The proxytrack disk-fallback read in PT_ReadCache__New_u never wrote that
NUL at all, unlike its sibling read paths in the same file; add the missing
r->adr[r->size] = '\0' so the spare byte is actually used and the buffer is
a valid C string.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-15 09:30:34 +02:00
Xavier Roche
d2e43549d8 Merge pull request #358 from xroche/ci/asan-poison-fill
ci: poison the ASan allocator to surface missing-NUL bugs
2026-06-15 09:19:04 +02:00
Xavier Roche
a9b16d96ea ci: poison the ASan allocator to surface missing-NUL bugs
Fill malloc'd and freed memory with 0xCA in the sanitize job so a buffer
fread into without NUL termination, then used as a C string, runs off into
the redzone instead of stopping at an accidental zero byte. ASan caps its
malloc fill at the first 4096 bytes by default, which lets large cache
buffers escape; max_malloc_fill_size lifts the cap. No rebuild, no source
change -- purely the test environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-15 09:16:48 +02:00
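Why the poison fill in a9b16d96ea matters can be shown with a small simulation. This is only a sketch: simulate_read() is a hypothetical helper, a local array stands in for a heap allocation, memset() stands in for the allocator's fill byte, and memcpy() stands in for fread(). It does not call ASan.

```c
#include <string.h>

/* Returns 1 if a NUL byte exists anywhere inside the simulated allocation,
   i.e. if a string scan would stop before leaving the buffer.
   fill: byte the "allocator" leaves in fresh memory (0x00 or 0xCA here).
   write_terminator: whether the reader stores the trailing NUL itself. */
static int simulate_read(unsigned char fill, int write_terminator) {
  unsigned char buf[16];
  const char payload[4] = { 'd', 'a', 't', 'a' }; /* 4 bytes, no NUL */

  memset(buf, fill, sizeof buf);        /* stand-in for the allocator fill */
  memcpy(buf, payload, sizeof payload); /* stand-in for fread() */
  if (write_terminator)
    buf[sizeof payload] = '\0';         /* the fix: terminate at [size] */

  return memchr(buf, 0, sizeof buf) != NULL;
}
```

With a zero fill, the missing terminator is masked by a lucky zero byte; with the 0xCA fill, no NUL exists inside the buffer unless the reader writes one, so a real strlen() would run off the end where ASan can report it.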
Xavier Roche
4ed828ff78 Merge pull request #357 from xroche/audit/fread-nul-termination
Fix more un-NUL-terminated fread buffers used as C strings
2026-06-15 09:07:37 +02:00
Xavier Roche
82ace34c4d Add a cache disk-fallback self-test for the NUL-termination invariant
The disk-fallback read (cache_readex with X-In-Cache: 0, body on disk) had no
runtime coverage: the crawl tests never re-read such a body into memory, which
is why the missing terminator there went unnoticed until the audit. Extend the
-#A cache self-test:

- check_entry now asserts every read-back body is NUL-terminated at [size],
  covering the in-zip read paths.
- A new pass stores a non-hypertext record (X-In-Cache: 0), creates the body at
  the exact fconv()-resolved path the reader uses, reads it back through the
  disk-fallback branch, and asserts it round-trips and is terminated.

Verified by reverting the fix: with the terminator removed the new pass fails
("body not NUL-terminated"); with it in place the pass is clean. Runs under the
ASan/UBSan CI job, so it now guards the disk-fallback path that had none.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-15 09:02:37 +02:00
Xavier Roche
3970eb3706 Fix more un-NUL-terminated fread buffers used as C strings
Follow-up audit after the cache strstr() overflow in #356: same pattern of
reading a file or record into a malloc'd buffer and then treating it as a C
string without a terminator.

- cache_readex disk-fallback paths (htscache.c, "previous_save"/"return_save")
  read a record body into malloc(size+4) but, unlike their zip and .dat
  siblings, never set the trailing NUL. The body is later strlen'd
  (htscache.c:923, htscore.c:1046), so an un-terminated one over-reads.
  Terminate it like the siblings do, but only for r.size >= 0: these two paths
  guard the read with `r.size > 0 &&`, so a crafted cache with a negative
  X-Size would otherwise fall through and write *(r.adr + r.size) before the
  start of the allocation (heap underflow). The sibling paths read
  unconditionally and fail the read for a negative size, so they never hit it.
- cache_readdata (HTS_FAST_CACHE) reads the record into malloc(len+4) whose
  comment already reserves the "Plus byte 0" but never set it. Set it (the
  enclosing `len > 0` keeps the write in bounds).
- index_finish (htsindex.c) ran strchr() over a malloc(size+4) buffer read raw
  from the temp index file; a final line without a newline would over-read.
  NUL-terminate before scanning.

All four are exercised under the ASan/UBSan CI job. proxytrack's store.c has the
same structural pattern but never strlen()s the body (it is served as binary),
so it is left as is.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-15 07:23:19 +02:00
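The negative-size guard described in 3970eb3706 can be sketched as follows. The record type and load_body() are hypothetical names invented for this illustration, and the body is copied from memory rather than read from disk; this is not the htscache.c code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for illustration; not HTTrack's real structure. */
typedef struct {
  long size; /* body size as declared by an untrusted cache header */
  char *adr; /* body buffer, NUL-terminated at adr[size] on success */
} record;

/* Copy a body of r->size bytes from src (src_len bytes available).
   Returns 0 on success, -1 on a bad size or allocation failure. */
static int load_body(record *r, const char *src, size_t src_len) {
  r->adr = NULL;
  if (r->size < 0 || (size_t) r->size > src_len)
    return -1; /* reject bad sizes before any adr[size] indexing */

  r->adr = malloc((size_t) r->size + 1); /* one spare byte for the NUL */
  if (r->adr == NULL)
    return -1;

  if (r->size > 0)
    memcpy(r->adr, src, (size_t) r->size);
  r->adr[r->size] = '\0'; /* in bounds: size >= 0 and size + 1 bytes exist */
  return 0;
}
```

Checking the sign before the terminator write is what keeps a crafted negative size from indexing in front of the allocation.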
Xavier Roche
d3c41b31e8 Merge pull request #356 from xroche/ci/hardening-sanitize-nossl-distcheck
ci: ASan/UBSan, no-openssl, and distcheck jobs (plus the bugs they found)
2026-06-15 06:57:02 +02:00
Xavier Roche
f8367eeac7 Fix heap-buffer-overflow reading the update cache
httpmirror() read hts-cache/new.lst into a malloc(sz) buffer and then ran
strstr() over it to decide which old files to purge. fread() does not
NUL-terminate, so strstr() scanned past the end of the allocation; with the
wrong heap layout it ran into the redzone. ASan caught it as a
heap-buffer-overflow on the cache-read (update) crawl. Whether it tripped
depended on the byte just past the buffer, which is why it surfaced only
intermittently on cold CI runners and never reproduced locally.

Allocate sz + 1 and NUL-terminate after the read, matching the existing
filelist_buff pattern in the same file. Both strstr() calls in the block are
covered.

Found by the new ASan/UBSan CI job.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-15 06:51:17 +02:00
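The pattern behind the f8367eeac7 fix (allocate sz + 1, then terminate after the read) might look like the sketch below. read_terminated() and list_mentions() are hypothetical helpers written for this note, not functions from httpmirror().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read up to sz bytes from fp into a malloc'd buffer that is always a
   valid C string. Returns NULL on allocation failure; caller frees. */
static char *read_terminated(FILE *fp, size_t sz) {
  if (sz == SIZE_MAX)
    return NULL;              /* sz + 1 would wrap to 0 */
  char *buf = malloc(sz + 1); /* exactly one spare byte */
  if (buf == NULL)
    return NULL;
  size_t got = fread(buf, 1, sz, fp);
  buf[got] = '\0';            /* fread() never writes a terminator */
  return buf;
}

/* Safe only because the list is NUL-terminated: strstr() stops there. */
static int list_mentions(const char *list, const char *name) {
  return strstr(list, name) != NULL;
}
```

Terminating at the number of bytes actually read, rather than at sz, also keeps a short read from leaving uninitialized bytes inside the string.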
Xavier Roche
9279a4b349 ci: add ASan/UBSan, no-openssl, and distcheck jobs
sanitize: build and run the suite under AddressSanitizer +
UndefinedBehaviorSanitizer, driving the parsers that handle untrusted crawled
input. This surfaced the use-after-free, the numeric-entity overflow, and the
coucal misaligned-load finding fixed in this branch; leak detection is off so
the job reports memory-safety errors rather than exit-time leaks.

no-ssl: build and test with --disable-https (and no libssl installed) so the
#if HTS_USEOPENSSL branches, never compiled by the libssl-equipped matrix, do
not rot.

distcheck: roll the release tarball and build/test it out-of-tree, guarding
against a source missing from *_SOURCES or EXTRA_DIST.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 23:37:59 +02:00
Xavier Roche
b52e8c4c0f Drop EXTRA_DIST wildcards so the dist tarball builds
automake does not expand wildcards in EXTRA_DIST, so "coucal/*" and the
"*.dsp/*.dsw/*.vcproj" globs were left as literal targets that broke
"make dist" (and distcheck) out-of-tree with "No rule to make target
'coucal/*'". List the files explicitly; coucal's .c/.h ship via *_SOURCES
already, so only its aux files (LICENSE, Makefile, README.md, sample.c,
tests.c) plus the Windows project files needed listing. Regenerated
src/Makefile.in.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 23:37:28 +02:00
Xavier Roche
665f51d1a0 Bump coucal: fix misaligned 32-bit loads in MurmurHash3
Picks up the coucal fix that reads each hash block with memcpy instead of
dereferencing an unaligned uint32_t*, clearing a UBSan alignment finding that
fired on nearly every hashtable insert during a crawl.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 23:37:27 +02:00
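The memcpy-based load mentioned in 665f51d1a0 is a common idiom; a minimal sketch follows. load_u32() is a hypothetical name for illustration and is not taken from coucal's source.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from a possibly unaligned address.
   Dereferencing a misaligned uint32_t* is undefined behavior; copying
   the bytes with memcpy() is well defined, and compilers typically
   lower it to a single load where the target allows. */
static uint32_t load_u32(const void *p) {
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}
```

The result is in host byte order, the same as the pointer dereference it replaces, so hash values should not change.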
Xavier Roche
e4e5d4699a Fix signed overflow when decoding large numeric HTML entities
A numeric entity such as &#9999999999; was accumulated digit by digit into an
int with no bound, overflowing once past INT_MAX (undefined behavior). Guard
before each multiply: a value beyond the Unicode maximum (0x10FFFF) is invalid
anyway, so stop and keep the entity literal instead of overflowing. The input
comes straight from crawled pages.

Found by the new ASan/UBSan CI job.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 23:37:27 +02:00
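The "guard before each multiply" idea in e4e5d4699a can be sketched like this. parse_numeric_entity() is a hypothetical function that takes only the decimal digits between "&#" and ";"; the real decoder's interface is not shown in this log.

```c
/* Parse the decimal digits of a numeric entity.
   Returns the code point, or -1 if the input is empty, contains a
   non-digit, or would exceed the Unicode maximum (0x10FFFF). */
static long parse_numeric_entity(const char *digits) {
  const long max_cp = 0x10FFFF;
  long value = 0;

  if (*digits == '\0')
    return -1;
  for (; *digits != '\0'; digits++) {
    if (*digits < '0' || *digits > '9')
      return -1;
    int d = *digits - '0';
    /* Check before multiplying: value * 10 + d <= max_cp holds exactly
       when value <= (max_cp - d) / 10, so the sum below cannot overflow. */
    if (value > (max_cp - d) / 10)
      return -1;
    value = value * 10 + d;
  }
  return value;
}
```

A caller would keep the entity as literal text when -1 comes back, which matches the behavior the commit describes.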
Xavier Roche
a50691c0f8 Fix use-after-free in the HTML post-process path
The post-process step captured a pointer into output_buffer's own storage,
reset the array size to zero, then re-appended that pointer. The append's
realloc (TypedArrayEnsureRoom reallocs unconditionally) could move the block,
leaving the copy reading freed memory. The default callback returns "modified"
without touching the data, so this hit on every crawl; ASan flagged the
use-after-free. glibc usually returns the same pointer on a same-size realloc,
which is why a plain build never crashed.

Only copy when the callback handed back a different buffer. When it edited
output_buffer in place, just adopt the new length.

Found by the new ASan/UBSan CI job.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 23:37:27 +02:00
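The shape of the a50691c0f8 fix (copy only when the callback returned a different buffer) could be sketched as below. growbuf and growbuf_adopt() are hypothetical stand-ins for output_buffer and its TypedArray helpers, and the sketch assumes the callback returns either the buffer's own base pointer or an unrelated, non-NULL buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for illustration only. */
typedef struct {
  char *data;
  size_t len;
  size_t cap;
} growbuf;

/* Adopt the result handed back by a post-process callback.
   Returns 0 on success, -1 on error. */
static int growbuf_adopt(growbuf *b, const char *result, size_t result_len) {
  if (result == b->data) {
    /* Edited in place: never realloc and re-copy from our own storage,
       since the realloc could move the block and free the source. */
    if (result_len > b->cap)
      return -1;
    b->len = result_len;
    return 0;
  }
  /* Different buffer: growing ours cannot invalidate the source. */
  if (result_len > b->cap) {
    char *p = realloc(b->data, result_len);
    if (p == NULL)
      return -1;
    b->data = p;
    b->cap = result_len;
  }
  memcpy(b->data, result, result_len);
  b->len = result_len;
  return 0;
}
```

A result pointing into the middle of b->data is outside this sketch's assumptions and would need memmove() plus the same no-realloc rule.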
Xavier Roche
5f96e86818 Merge pull request #355 from xroche/ci/bump-checkout-v5
ci: bump actions/checkout to v6
2026-06-14 23:15:01 +02:00
Xavier Roche
6002bc20ca ci: bump actions/checkout from v4 to v6
Keeps the checkout action on a supported major; v4 runs on the
end-of-life Node 20 runtime, v6 moves to Node 24.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 23:13:06 +02:00
Xavier Roche
bdbc741597 Merge pull request #354 from xroche/ci/mkdeb-single-test
mkdeb: drop the redundant pre-build test pass
2026-06-14 22:22:49 +02:00
Xavier Roche
d0a1b957cd ci: let the deb job run debuild's test pass
The deb job set DEB_BUILD_OPTIONS=nocheck to skip a redundant second test run.
With mkdeb.sh no longer running its own pre-build check, debuild's is the only
test pass, so nocheck would suppress it entirely and CI would never exercise the
packaged build's tests. Drop nocheck; keep noautodbgsym and parallel.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 22:15:51 +02:00
Xavier Roche
6c329744e7 mkdeb: drop the redundant pre-build test pass
mkdeb.sh built and tested the sources twice: once in its own export-tree
pre-build (make check, offline), then again under debuild, whose dh_auto_test
runs the suite with the online tests enabled (debian/rules configures with
--enable-online-unit-tests=auto). The first run was a slower, offline-only
subset of the second.

Drop mkdeb's own make check. The export-tree build stays, since regen-man needs
the compiled binaries, but the suite now runs once, under debuild, as the
superset. This is the same redundancy that #352 removed for CI via
DEB_BUILD_OPTIONS=nocheck; fixing it in mkdeb.sh removes it for release builds
as well, rather than per environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 22:13:23 +02:00
Xavier Roche
1375ef97d7 Merge pull request #353 from xroche/ci/macos-i386
ci: add macOS and 32-bit (i386) build jobs
2026-06-14 22:09:11 +02:00
Xavier Roche
13207a92fc Make the cache and manpage tests POSIX-shell portable
Running the suite on macOS surfaced two GNU/Linux assumptions. The test
harness there resolves $(BASH) to /bin/sh (POSIX mode), and macOS ships
BSD userland, so:

- 01_engine-cache used "du -sb"; the -b (apparent bytes) flag is GNU-only
  and BSD/macOS du rejects it, leaving an empty size and an "integer
  expression expected" error. Switch to portable "du -sk" (1024-byte
  units); block-allocated size is an upper bound, fine for a ceiling.

- 02_manpage-regen used diff with process substitution, which a POSIX
  /bin/sh does not parse. Stage the stripped inputs in temp files instead.

Both now pass under dash as well as bash, on Linux and macOS.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 22:05:31 +02:00
Xavier Roche
d3eecbf211 Gate the GNU-ld libc-force flag behind a linker check
-Wl,--push-state,--no-as-needed,-lc,--pop-state forces libc back into
DT_NEEDED for libraries that reach it only through libhttrack: the
libhtsjava JNI wrapper and the libtest callback examples. The flag is
GNU-ld-specific; Apple's ld rejects it ("ld: unknown options:
--push-state --no-as-needed --pop-state"), breaking the macOS build, and
doesn't need it (every dylib links libSystem anyway).

Probe it once with AX_CHECK_LINK_FLAG and emit it via LIBC_FORCE_LINK
only where the linker accepts it. On GNU/Linux the flag is still applied
and libc.so.6 stays in DT_NEEDED, so behavior is unchanged there.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 21:57:56 +02:00
Xavier Roche
7ec77156d0 ci: add macOS and 32-bit (i386) build jobs
Two cheap portability targets that need no VM or second CI provider:

- macOS (Darwin/clang) on a native macos-14 runner. The tree has no
  __APPLE__ branches, so Darwin runs the generic-Unix path against a
  second libc and kernel. brew's openssl@3 is keg-only, so configure is
  pointed at it via CPPFLAGS/LDFLAGS.

- 32-bit i386 via multilib on the existing x86-64 runner. Exercises the
  32-bit size_t/pointer ABI, where size and bounds math can truncate or
  wrap in ways 64-bit never shows. --build (not --host) keeps configure
  out of cross mode so the i386 binary still runs the test suite.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 21:38:49 +02:00
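The kind of size math the i386 job in 7ec77156d0 is meant to stress can be illustrated with an overflow-checked addition. add_size32() is a hypothetical helper; it uses uint32_t to model a 32-bit size_t so the behavior is the same on any host, and it is not code from the tree.

```c
#include <stdint.h>

/* Add two 32-bit sizes without wrapping.
   Returns 0 and stores the sum in *sum, or -1 if the sum would not fit. */
static int add_size32(uint32_t a, uint32_t b, uint32_t *sum) {
  if (a > UINT32_MAX - b)
    return -1; /* a + b would wrap around to a small value */
  *sum = a + b;
  return 0;
}
```

On a 64-bit build the same sum often fits in size_t and the wrap never shows, which is why a 32-bit test run can expose bounds bugs that the main matrix misses.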
Xavier Roche
3cd8197cc7 Merge pull request #352 from xroche/ci/deb-faster
ci: speed up the deb package job
2026-06-14 21:33:53 +02:00
Xavier Roche
37f50bb925 ci: speed up the deb package job
The deb job spent ~3m19s in the build step, half of it on work CI does not
need. The package build (via mkdeb.sh) ran the full test suite a second time
with online/network unit tests enabled (~54s), and compressed the large LTO
-dbgsym packages that CI throws away (~48s).

Set DEB_BUILD_OPTIONS=nocheck,noautodbgsym,parallel=N on the CI step only.
nocheck skips debuild's make check, which is redundant here: the build matrix
already runs the suite on every config and mkdeb.sh's own pre-build runs the
offline tests. noautodbgsym drops the -dbgsym packages. parallel uses every
runner core. mkdeb.sh is unchanged, so release builds still build with LTO,
full tests, and debug symbols; only the CI environment differs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 21:31:03 +02:00
Xavier Roche
d8d1eafcd1 Merge pull request #351 from xroche/feature/appstream-metainfo
Ship AppStream MetaInfo for WebHTTrack
2026-06-14 21:30:28 +02:00
Xavier Roche
80d0e90819 Merge pull request #350 from xroche/fix/webhttrack-browser-deps
debian: refresh stale webhttrack browser dependency
2026-06-14 21:26:31 +02:00
Xavier Roche
8dde8dc03c Ship AppStream MetaInfo for WebHTTrack
The Debian AppStream generator flagged both webhttrack desktop entries as
no-metainfo: with no MetaInfo file, the catalog entry is synthesized from
the .desktop file and the package description, which is deprecated and risks
the app being dropped from the metadata catalog.

Add com.httrack.WebHTTrack.metainfo.xml (installed to share/metainfo) for the
main app, launching WebHTTrack.desktop. Mark the secondary "Browse Mirrored
Websites" launcher with X-AppStream-Ignore=true so it doesn't produce a
duplicate, metadata-less catalog entry.

Validated with appstreamcli validate and desktop-file-validate.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 21:24:45 +02:00
29 changed files with 603 additions and 77 deletions

View File

@@ -31,7 +31,7 @@ jobs:
env:
CC: ${{ matrix.cc }}
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
submodules: recursive
@@ -61,6 +61,169 @@ jobs:
if: failure()
run: cat tests/test-suite.log 2>/dev/null || true
# Portability: build and test on macOS (Darwin/clang) on a native runner --
# no VM. The tree has no __APPLE__ branches, so Darwin exercises the
# generic-Unix path on a second libc and kernel. brew's openssl@3 is keg-only,
# so point configure at it; everything else is in the SDK or default paths.
macos:
name: build (macOS arm64, clang)
runs-on: macos-14
steps:
- uses: actions/checkout@v6
with:
submodules: recursive
- name: Install build dependencies
run: |
set -euo pipefail
brew install autoconf automake libtool autoconf-archive
- name: Configure
run: |
set -euo pipefail
ssl="$(brew --prefix openssl@3)"
autoreconf -fi
./configure CPPFLAGS="-I${ssl}/include" LDFLAGS="-L${ssl}/lib"
- name: Build
run: make -j"$(sysctl -n hw.ncpu)"
- name: Test
run: make check
- name: Print the test log on failure
if: failure()
run: cat tests/test-suite.log 2>/dev/null || true
# Portability/hardening: 32-bit (i386) build on the x86-64 runner via multilib
# -- no extra hardware. Exercises the 32-bit size_t/pointer ABI, where size
# and bounds math can truncate or wrap in ways 64-bit never reveals (the axis
# the overflow-safe bounds work targets). --build (not --host) keeps configure
# out of cross mode, so the i386 binary still runs the test suite here.
linux-i386:
name: build (linux i386, gcc -m32)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v6
with:
submodules: recursive
- name: Install build dependencies (multilib + 32-bit libs)
run: |
set -euo pipefail
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential gcc-multilib autoconf automake libtool \
autoconf-archive zlib1g-dev:i386 libssl-dev:i386
- name: Configure
run: |
set -euo pipefail
autoreconf -fi
./configure --build=i686-pc-linux-gnu CC="gcc -m32"
- name: Build
run: make -j"$(nproc)"
- name: Test
run: make check
- name: Print the test log on failure
if: failure()
run: cat tests/test-suite.log 2>/dev/null || true
# Memory safety: build and run the suite under AddressSanitizer +
# UndefinedBehaviorSanitizer. The offline engine self-tests drive the parsers
# that chew on untrusted crawled input (charset, mime, HTML, entities, IDNA,
# filters, cache) straight through the sanitizers, so a buffer overrun,
# use-after-free, or signed overflow there fails the build instead of slipping
# past a plain -O2 build. gcc's runtimes; one job is enough (the bug class is
# arch-independent and the matrix already covers compile portability).
sanitize:
name: sanitize (ASan+UBSan, gcc)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v6
with:
submodules: recursive
- name: Install build dependencies
run: |
set -euo pipefail
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential autoconf automake libtool autoconf-archive \
zlib1g-dev libssl-dev
- name: Configure (sanitized)
run: |
set -euo pipefail
autoreconf -fi
./configure CC=gcc \
CFLAGS="-fsanitize=address,undefined -fno-sanitize-recover=all -g -O1 -fno-omit-frame-pointer" \
LDFLAGS="-fsanitize=address,undefined"
- name: Build
run: make -j"$(nproc)"
- name: Test (sanitized)
# Leaks at exit are out of scope (the CLI frees little on the way out);
# we want memory-safety errors, so turn leak detection off and make every
# other finding abort the run.
#
# Poison fresh allocations with 0xCA and freed blocks with 0xCB (decimal
# 202/203) so memory never reads back as accidental zeros: a missing-NUL
# fread buffer then runs strlen off into the redzone instead of stopping
# at a lucky zero. Distinct bytes tell the two apart in a dump (0xCA =
# uninitialized, 0xCB = use-after-free). ASan caps its malloc fill at 4096
# bytes by default, so max_malloc_fill_size lifts it to cover large cache
# buffers; free_fill flags use-after-free reads.
env:
ASAN_OPTIONS: detect_leaks=0:abort_on_error=1:halt_on_error=1:strict_string_checks=1:malloc_fill_byte=202:max_malloc_fill_size=2147483647:free_fill_byte=203:max_free_fill_size=2147483647
UBSAN_OPTIONS: print_stacktrace=1:halt_on_error=1
run: make check
- name: Print the test log on failure
if: failure()
run: cat tests/test-suite.log 2>/dev/null || true
# Optional-dependency build: compile and test with HTTPS/OpenSSL disabled --
# the configuration users on minimal systems build, and one libssl is not even
# installed here so configure cannot silently re-enable it. The matrix above
# always has libssl, so the #if HTS_USEOPENSSL branches would otherwise never
# be compiled and could rot unnoticed.
no-ssl:
name: build (no openssl, --disable-https)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v6
with:
submodules: recursive
- name: Install build dependencies (no libssl)
run: |
set -euo pipefail
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential autoconf automake libtool autoconf-archive zlib1g-dev
- name: Configure (https disabled)
run: |
set -euo pipefail
autoreconf -fi
./configure --disable-https
- name: Build
run: make -j"$(nproc)"
- name: Test
run: make check
- name: Print the test log on failure
if: failure()
run: cat tests/test-suite.log 2>/dev/null || true
# Validate the Debian packaging via the same script maintainers release with.
# One amd64/gcc run is enough: packaging (control/rules/manifest/lintian/quilt
# source build) is arch- and compiler-independent, and the build matrix above
@@ -69,7 +232,7 @@ jobs:
name: deb package (lintian)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
submodules: recursive
@@ -84,8 +247,44 @@ jobs:
# --unsigned: CI has no GPG key (also skips the release sig/checksums).
# debuild builds every package, then lintian gates on errors.
#
# DEB_BUILD_OPTIONS trims work CI does not need (release builds via
# mkdeb.sh are untouched): noautodbgsym drops the -dbgsym packages whose
# LTO payloads are slow to compress and that CI never ships; parallel uses
# every core. We let debuild run its test pass -- the only one now that
# mkdeb no longer runs its own -- so CI exercises the packaged tests.
- name: Build Debian packages
run: bash tools/mkdeb.sh --unsigned --no-release-artifacts
run: |
export DEB_BUILD_OPTIONS="noautodbgsym parallel=$(nproc)"
bash tools/mkdeb.sh --unsigned --no-release-artifacts
# Release-tarball integrity: `make distcheck` rolls the dist tarball, then
# configures, builds and tests it out-of-tree from a read-only source tree and
# checks nothing is left behind. Catches a file referenced in *_SOURCES or
# EXTRA_DIST but missing from the tarball -- the same "ships broken to users"
# class as a stale committed Makefile.in.
distcheck:
name: distcheck (release tarball)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v6
with:
submodules: recursive
- name: Install build dependencies
run: |
set -euo pipefail
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential autoconf automake libtool autoconf-archive \
zlib1g-dev libssl-dev
- name: distcheck
run: |
set -euo pipefail
autoreconf -fi
./configure
make -j"$(nproc)" distcheck
dco:
name: DCO sign-off
@@ -93,7 +292,7 @@ jobs:
if: github.event_name == 'pull_request'
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
fetch-depth: 0
@@ -122,7 +321,7 @@ jobs:
name: lint (shellcheck, shfmt)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: Install linters
env:
@@ -151,7 +350,7 @@ jobs:
if: github.event_name == 'pull_request'
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
fetch-depth: 0

View File

@@ -257,6 +257,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@

48
configure vendored
View File

@@ -695,6 +695,7 @@ HAVE_VISIBILITY
CFLAG_VISIBILITY
LDFLAGS_PIE
CFLAGS_PIE
LIBC_FORCE_LINK
DEFAULT_LDFLAGS
DEFAULT_CFLAGS
VERSION_INFO
@@ -15871,6 +15872,53 @@ esac
fi
# Force libc back into DT_NEEDED for libraries that reach it only through
# libhttrack (libhtsjava, the libtest callbacks), but only with a GNU-style
# linker; Apple ld rejects these flags and links libSystem unconditionally.
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the linker accepts -Wl,--push-state,--no-as-needed,-lc,--pop-state" >&5
printf %s "checking whether the linker accepts -Wl,--push-state,--no-as-needed,-lc,--pop-state... " >&6; }
if test ${ax_cv_check_ldflags___Wl___push_state___no_as_needed__lc___pop_state+y}
then :
printf %s "(cached) " >&6
else case e in #(
e)
ax_check_save_flags=$LDFLAGS
LDFLAGS="$LDFLAGS -Wl,--push-state,--no-as-needed,-lc,--pop-state"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
int
main (void)
{
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"
then :
ax_cv_check_ldflags___Wl___push_state___no_as_needed__lc___pop_state=yes
else case e in #(
e) ax_cv_check_ldflags___Wl___push_state___no_as_needed__lc___pop_state=no ;;
esac
fi
rm -f core conftest.err conftest.$ac_objext conftest.beam \
conftest$ac_exeext conftest.$ac_ext
LDFLAGS=$ax_check_save_flags ;;
esac
fi
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_check_ldflags___Wl___push_state___no_as_needed__lc___pop_state" >&5
printf "%s\n" "$ax_cv_check_ldflags___Wl___push_state___no_as_needed__lc___pop_state" >&6; }
if test "x$ax_cv_check_ldflags___Wl___push_state___no_as_needed__lc___pop_state" = xyes
then :
LIBC_FORCE_LINK="-Wl,--push-state,--no-as-needed,-lc,--pop-state"
else case e in #(
e) : ;;
esac
fi
### PIE
CFLAGS_PIE=""
LDFLAGS_PIE=""

View File

@@ -91,6 +91,13 @@ AX_CHECK_LINK_FLAG([-Wl,--no-undefined], [DEFAULT_LDFLAGS="$DEFAULT_LDFLAGS -Wl,
AX_CHECK_LINK_FLAG([-Wl,-z,relro,-z,now], [DEFAULT_LDFLAGS="$DEFAULT_LDFLAGS -Wl,-z,relro,-z,now"])
AX_CHECK_LINK_FLAG([-Wl,-z,noexecstack], [DEFAULT_LDFLAGS="$DEFAULT_LDFLAGS -Wl,-z,noexecstack"])
# Force libc back into DT_NEEDED for libraries that reach it only through
# libhttrack (libhtsjava, the libtest callbacks), but only with a GNU-style
# linker; Apple ld rejects these flags and links libSystem unconditionally.
AX_CHECK_LINK_FLAG([-Wl,--push-state,--no-as-needed,-lc,--pop-state],
[LIBC_FORCE_LINK="-Wl,--push-state,--no-as-needed,-lc,--pop-state"])
AC_SUBST([LIBC_FORCE_LINK])
### PIE
CFLAGS_PIE=""
LDFLAGS_PIE=""

View File

@@ -4,3 +4,4 @@ usr/share/man/man1/webhttrack.1
usr/share/man/man1/htsserver.1
usr/share/applications/WebHTTrack-Websites.desktop
usr/share/applications/WebHTTrack.desktop
usr/share/metainfo/com.httrack.WebHTTrack.metainfo.xml

View File

@@ -12,6 +12,7 @@ WebIcon16x16dir = $(datadir)/icons/hicolor/16x16/apps
WebIcon32x32dir = $(datadir)/icons/hicolor/32x32/apps
WebIcon48x48dir = $(datadir)/icons/hicolor/48x48/apps
VFolderEntrydir = $(prefix)/share/applications
MetaInfodir = $(datadir)/metainfo
# Wildcards are globbed against $(srcdir): a bare "*.html" is resolved against
# the build dir and stays unexpanded (breaking "make") in an out-of-tree build.
@@ -33,11 +34,12 @@ WebIcon16x16_DATA = $(srcdir)/server/div/16x16/*.png
WebIcon32x32_DATA = $(srcdir)/server/div/32x32/*.png
WebIcon48x48_DATA = $(srcdir)/server/div/48x48/*.png
VFolderEntry_DATA = $(srcdir)/server/div/*.desktop
MetaInfo_DATA = $(srcdir)/server/div/*.metainfo.xml
EXTRA_DIST = $(HelpHtml_DATA) $(HelpHtmlimg_DATA) $(HelpHtmlimages_DATA) \
$(HelpHtmldiv_DATA) $(WebHtml_DATA) $(WebHtmlimages_DATA) \
$(WebPixmap_DATA) $(WebIcon16x16_DATA) $(WebIcon32x32_DATA) $(WebIcon48x48_DATA) \
$(VFolderEntry_DATA) \
$(VFolderEntry_DATA) $(MetaInfo_DATA) \
httrack.css
install-data-hook:

View File

@@ -152,14 +152,15 @@ am__uninstall_files_from_dir = { \
am__installdirs = "$(DESTDIR)$(HelpHtmldir)" \
"$(DESTDIR)$(HelpHtmlTxtdir)" "$(DESTDIR)$(HelpHtmldivdir)" \
"$(DESTDIR)$(HelpHtmlimagesdir)" "$(DESTDIR)$(HelpHtmlimgdir)" \
"$(DESTDIR)$(HelpHtmlrootdir)" "$(DESTDIR)$(VFolderEntrydir)" \
"$(DESTDIR)$(WebHtmldir)" "$(DESTDIR)$(WebHtmlimagesdir)" \
"$(DESTDIR)$(WebIcon16x16dir)" "$(DESTDIR)$(WebIcon32x32dir)" \
"$(DESTDIR)$(WebIcon48x48dir)" "$(DESTDIR)$(WebPixmapdir)"
"$(DESTDIR)$(HelpHtmlrootdir)" "$(DESTDIR)$(MetaInfodir)" \
"$(DESTDIR)$(VFolderEntrydir)" "$(DESTDIR)$(WebHtmldir)" \
"$(DESTDIR)$(WebHtmlimagesdir)" "$(DESTDIR)$(WebIcon16x16dir)" \
"$(DESTDIR)$(WebIcon32x32dir)" "$(DESTDIR)$(WebIcon48x48dir)" \
"$(DESTDIR)$(WebPixmapdir)"
DATA = $(HelpHtml_DATA) $(HelpHtmlTxt_DATA) $(HelpHtmldiv_DATA) \
$(HelpHtmlimages_DATA) $(HelpHtmlimg_DATA) \
$(HelpHtmlroot_DATA) $(VFolderEntry_DATA) $(WebHtml_DATA) \
$(WebHtmlimages_DATA) $(WebIcon16x16_DATA) \
$(HelpHtmlroot_DATA) $(MetaInfo_DATA) $(VFolderEntry_DATA) \
$(WebHtml_DATA) $(WebHtmlimages_DATA) $(WebIcon16x16_DATA) \
$(WebIcon32x32_DATA) $(WebIcon48x48_DATA) $(WebPixmap_DATA)
am__tagged_files = $(HEADERS) $(SOURCES) $(TAGS_FILES) $(LISP)
am__DIST_COMMON = $(srcdir)/Makefile.in
@@ -212,6 +213,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
@@ -320,6 +322,7 @@ WebIcon16x16dir = $(datadir)/icons/hicolor/16x16/apps
WebIcon32x32dir = $(datadir)/icons/hicolor/32x32/apps
WebIcon48x48dir = $(datadir)/icons/hicolor/48x48/apps
VFolderEntrydir = $(prefix)/share/applications
MetaInfodir = $(datadir)/metainfo
# Wildcards are globbed against $(srcdir): a bare "*.html" is resolved against
# the build dir and stays unexpanded (breaking "make") in an out-of-tree build.
@@ -341,10 +344,11 @@ WebIcon16x16_DATA = $(srcdir)/server/div/16x16/*.png
WebIcon32x32_DATA = $(srcdir)/server/div/32x32/*.png
WebIcon48x48_DATA = $(srcdir)/server/div/48x48/*.png
VFolderEntry_DATA = $(srcdir)/server/div/*.desktop
MetaInfo_DATA = $(srcdir)/server/div/*.metainfo.xml
EXTRA_DIST = $(HelpHtml_DATA) $(HelpHtmlimg_DATA) $(HelpHtmlimages_DATA) \
$(HelpHtmldiv_DATA) $(WebHtml_DATA) $(WebHtmlimages_DATA) \
$(WebPixmap_DATA) $(WebIcon16x16_DATA) $(WebIcon32x32_DATA) $(WebIcon48x48_DATA) \
$(VFolderEntry_DATA) \
$(VFolderEntry_DATA) $(MetaInfo_DATA) \
httrack.css
all: all-am
@@ -511,6 +515,27 @@ uninstall-HelpHtmlrootDATA:
@list='$(HelpHtmlroot_DATA)'; test -n "$(HelpHtmlrootdir)" || list=; \
files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \
dir='$(DESTDIR)$(HelpHtmlrootdir)'; $(am__uninstall_files_from_dir)
install-MetaInfoDATA: $(MetaInfo_DATA)
@$(NORMAL_INSTALL)
@list='$(MetaInfo_DATA)'; test -n "$(MetaInfodir)" || list=; \
if test -n "$$list"; then \
echo " $(MKDIR_P) '$(DESTDIR)$(MetaInfodir)'"; \
$(MKDIR_P) "$(DESTDIR)$(MetaInfodir)" || exit 1; \
fi; \
for p in $$list; do \
if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
echo "$$d$$p"; \
done | $(am__base_list) | \
while read files; do \
echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(MetaInfodir)'"; \
$(INSTALL_DATA) $$files "$(DESTDIR)$(MetaInfodir)" || exit $$?; \
done
uninstall-MetaInfoDATA:
@$(NORMAL_UNINSTALL)
@list='$(MetaInfo_DATA)'; test -n "$(MetaInfodir)" || list=; \
files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \
dir='$(DESTDIR)$(MetaInfodir)'; $(am__uninstall_files_from_dir)
install-VFolderEntryDATA: $(VFolderEntry_DATA)
@$(NORMAL_INSTALL)
@list='$(VFolderEntry_DATA)'; test -n "$(VFolderEntrydir)" || list=; \
@@ -701,7 +726,7 @@ check-am: all-am
check: check-am
all-am: Makefile $(DATA)
installdirs:
for dir in "$(DESTDIR)$(HelpHtmldir)" "$(DESTDIR)$(HelpHtmlTxtdir)" "$(DESTDIR)$(HelpHtmldivdir)" "$(DESTDIR)$(HelpHtmlimagesdir)" "$(DESTDIR)$(HelpHtmlimgdir)" "$(DESTDIR)$(HelpHtmlrootdir)" "$(DESTDIR)$(VFolderEntrydir)" "$(DESTDIR)$(WebHtmldir)" "$(DESTDIR)$(WebHtmlimagesdir)" "$(DESTDIR)$(WebIcon16x16dir)" "$(DESTDIR)$(WebIcon32x32dir)" "$(DESTDIR)$(WebIcon48x48dir)" "$(DESTDIR)$(WebPixmapdir)"; do \
for dir in "$(DESTDIR)$(HelpHtmldir)" "$(DESTDIR)$(HelpHtmlTxtdir)" "$(DESTDIR)$(HelpHtmldivdir)" "$(DESTDIR)$(HelpHtmlimagesdir)" "$(DESTDIR)$(HelpHtmlimgdir)" "$(DESTDIR)$(HelpHtmlrootdir)" "$(DESTDIR)$(MetaInfodir)" "$(DESTDIR)$(VFolderEntrydir)" "$(DESTDIR)$(WebHtmldir)" "$(DESTDIR)$(WebHtmlimagesdir)" "$(DESTDIR)$(WebIcon16x16dir)" "$(DESTDIR)$(WebIcon32x32dir)" "$(DESTDIR)$(WebIcon48x48dir)" "$(DESTDIR)$(WebPixmapdir)"; do \
test -z "$$dir" || $(MKDIR_P) "$$dir"; \
done
install: install-am
@@ -757,10 +782,10 @@ info-am:
install-data-am: install-HelpHtmlDATA install-HelpHtmlTxtDATA \
install-HelpHtmldivDATA install-HelpHtmlimagesDATA \
install-HelpHtmlimgDATA install-HelpHtmlrootDATA \
install-VFolderEntryDATA install-WebHtmlDATA \
install-WebHtmlimagesDATA install-WebIcon16x16DATA \
install-WebIcon32x32DATA install-WebIcon48x48DATA \
install-WebPixmapDATA
install-MetaInfoDATA install-VFolderEntryDATA \
install-WebHtmlDATA install-WebHtmlimagesDATA \
install-WebIcon16x16DATA install-WebIcon32x32DATA \
install-WebIcon48x48DATA install-WebPixmapDATA
@$(NORMAL_INSTALL)
$(MAKE) $(AM_MAKEFLAGS) install-data-hook
install-dvi: install-dvi-am
@@ -808,10 +833,10 @@ ps-am:
uninstall-am: uninstall-HelpHtmlDATA uninstall-HelpHtmlTxtDATA \
uninstall-HelpHtmldivDATA uninstall-HelpHtmlimagesDATA \
uninstall-HelpHtmlimgDATA uninstall-HelpHtmlrootDATA \
uninstall-VFolderEntryDATA uninstall-WebHtmlDATA \
uninstall-WebHtmlimagesDATA uninstall-WebIcon16x16DATA \
uninstall-WebIcon32x32DATA uninstall-WebIcon48x48DATA \
uninstall-WebPixmapDATA
uninstall-MetaInfoDATA uninstall-VFolderEntryDATA \
uninstall-WebHtmlDATA uninstall-WebHtmlimagesDATA \
uninstall-WebIcon16x16DATA uninstall-WebIcon32x32DATA \
uninstall-WebIcon48x48DATA uninstall-WebPixmapDATA
.MAKE: install-am install-data-am install-strip
@@ -821,20 +846,21 @@ uninstall-am: uninstall-HelpHtmlDATA uninstall-HelpHtmlTxtDATA \
install install-HelpHtmlDATA install-HelpHtmlTxtDATA \
install-HelpHtmldivDATA install-HelpHtmlimagesDATA \
install-HelpHtmlimgDATA install-HelpHtmlrootDATA \
install-VFolderEntryDATA install-WebHtmlDATA \
install-WebHtmlimagesDATA install-WebIcon16x16DATA \
install-WebIcon32x32DATA install-WebIcon48x48DATA \
install-WebPixmapDATA install-am install-data install-data-am \
install-data-hook install-dvi install-dvi-am install-exec \
install-exec-am install-html install-html-am install-info \
install-info-am install-man install-pdf install-pdf-am \
install-ps install-ps-am install-strip installcheck \
installcheck-am installdirs maintainer-clean \
maintainer-clean-generic mostlyclean mostlyclean-generic \
mostlyclean-libtool pdf pdf-am ps ps-am tags-am uninstall \
uninstall-HelpHtmlDATA uninstall-HelpHtmlTxtDATA \
uninstall-HelpHtmldivDATA uninstall-HelpHtmlimagesDATA \
uninstall-HelpHtmlimgDATA uninstall-HelpHtmlrootDATA \
install-MetaInfoDATA install-VFolderEntryDATA \
install-WebHtmlDATA install-WebHtmlimagesDATA \
install-WebIcon16x16DATA install-WebIcon32x32DATA \
install-WebIcon48x48DATA install-WebPixmapDATA install-am \
install-data install-data-am install-data-hook install-dvi \
install-dvi-am install-exec install-exec-am install-html \
install-html-am install-info install-info-am install-man \
install-pdf install-pdf-am install-ps install-ps-am \
install-strip installcheck installcheck-am installdirs \
maintainer-clean maintainer-clean-generic mostlyclean \
mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \
tags-am uninstall uninstall-HelpHtmlDATA \
uninstall-HelpHtmlTxtDATA uninstall-HelpHtmldivDATA \
uninstall-HelpHtmlimagesDATA uninstall-HelpHtmlimgDATA \
uninstall-HelpHtmlrootDATA uninstall-MetaInfoDATA \
uninstall-VFolderEntryDATA uninstall-WebHtmlDATA \
uninstall-WebHtmlimagesDATA uninstall-WebIcon16x16DATA \
uninstall-WebIcon32x32DATA uninstall-WebIcon48x48DATA \


@@ -8,3 +8,6 @@ Comment=Browse Websites Mirrored by WebHTTrack
Keywords=browse mirrored;
Exec=webhttrack browse
Icon=httrack
# Helper launcher for WebHTTrack's browse mode, not a standalone app: keep it
# out of software-center catalogs so it doesn't duplicate the main entry.
X-AppStream-Ignore=true


@@ -0,0 +1,55 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright 2026 Xavier Roche <roche@httrack.com> -->
<component type="desktop-application">
<id>com.httrack.WebHTTrack</id>
<metadata_license>FSFAP</metadata_license>
<project_license>GPL-3.0-or-later</project_license>
<name>WebHTTrack Website Copier</name>
<summary>Copy websites to your computer for offline browsing</summary>
<description>
<p>
WebHTTrack is the web interface to HTTrack, an offline browser utility.
It downloads a website from the Internet to a local directory, fetching
the HTML, images, and other files and rebuilding the site's link
structure so you can browse it offline.
</p>
<p>
A step-by-step web interface guides you through choosing the addresses
to mirror and the options to apply. Mirrors can be updated in place and
interrupted downloads resumed.
</p>
<p>Typical uses include:</p>
<ul>
<li>Keeping an offline copy of a website for reading without a connection</li>
<li>Archiving or preserving sites and capturing them for later reference</li>
<li>Updating an existing local mirror without downloading it again</li>
</ul>
</description>
<launchable type="desktop-id">WebHTTrack.desktop</launchable>
<icon type="stock">httrack</icon>
<categories>
<category>Network</category>
</categories>
<keywords>
<keyword>offline browser</keyword>
<keyword>website copier</keyword>
<keyword>mirror</keyword>
<keyword>crawl</keyword>
<keyword>archiving</keyword>
</keywords>
<url type="homepage">https://www.httrack.com/</url>
<url type="bugtracker">https://github.com/xroche/httrack/issues</url>
<developer id="com.httrack">
<name>Xavier Roche</name>
</developer>
<screenshots>
<screenshot type="default">
<caption>Choosing the addresses and options for a new mirror</caption>
<image>https://www.httrack.com/html/images/screenshot_01b.jpg</image>
</screenshot>
</screenshots>
<content_rating type="oars-1.1"/>
<releases>
<release version="3.49.8" date="2026-06-07"/>
</releases>
</component>


@@ -202,6 +202,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@


@@ -20,11 +20,12 @@ AM_CPPFLAGS += -I$(top_srcdir)/src
# The callback examples reference libc only through libhttrack, so the direct
# libc edge gets dropped from DT_NEEDED (library-not-linked-against-libc).
# Force libc to be recorded as a dependency.
# Force libc back; configure gates the flag since only a GNU-style linker
# accepts it (LIBC_FORCE_LINK is empty on e.g. macOS).
AM_LDFLAGS = \
@DEFAULT_LDFLAGS@ \
-L../src \
-Wl,--push-state,--no-as-needed,-lc,--pop-state
@LIBC_FORCE_LINK@
# Examples
libbaselinks_la_SOURCES = callbacks-example-baselinks.c


@@ -344,6 +344,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
@@ -453,11 +454,12 @@ AM_CPPFLAGS = @DEFAULT_CFLAGS@ @THREADS_CFLAGS@ @V6_FLAG@ @LFS_FLAG@ \
# The callback examples reference libc only through libhttrack, so the direct
# libc edge gets dropped from DT_NEEDED (library-not-linked-against-libc).
# Force libc to be recorded as a dependency.
# Force libc back; configure gates the flag since only a GNU-style linker
# accepts it (LIBC_FORCE_LINK is empty on e.g. macOS).
AM_LDFLAGS = \
@DEFAULT_LDFLAGS@ \
-L../src \
-Wl,--push-state,--no-as-needed,-lc,--pop-state
@LIBC_FORCE_LINK@
# Examples


@@ -173,6 +173,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@


@@ -203,6 +203,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@


@@ -86,8 +86,9 @@ libhtsjava_la_SOURCES = htsjava.c htsjava.h
libhtsjava_la_LIBADD = $(THREADS_LIBS) $(DL_LIBS) libhttrack.la
# This thin JNI wrapper reaches libc only through libhttrack, so the direct
# libc edge is dropped from DT_NEEDED (library-not-linked-against-libc). Force
# libc to be recorded as a dependency.
libhtsjava_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO) -Wl,--push-state,--no-as-needed,-lc,--pop-state
# libc back as a dependency; configure gates the flag since only a GNU-style
# linker accepts it (LIBC_FORCE_LINK is empty on e.g. macOS).
libhtsjava_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO) $(LIBC_FORCE_LINK)
EXTRA_DIST = httrack.h webhttrack \
coucal/murmurhash3.h.diff \
@@ -113,5 +114,12 @@ EXTRA_DIST = httrack.h webhttrack \
proxy/proxytrack.h \
proxy/store.h \
proxy/proxytrack.vcproj \
coucal/* \
*.dsw *.dsp *.vcproj
coucal/LICENSE \
coucal/Makefile \
coucal/README.md \
coucal/sample.c \
coucal/tests.c \
htsjava.vcproj \
httrack.dsp httrack.dsw httrack.vcproj \
libhttrack.dsp libhttrack.dsw libhttrack.vcproj \
webhttrack.dsp webhttrack.dsw webhttrack.vcproj


@@ -361,6 +361,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
@@ -537,8 +538,9 @@ libhtsjava_la_SOURCES = htsjava.c htsjava.h
libhtsjava_la_LIBADD = $(THREADS_LIBS) $(DL_LIBS) libhttrack.la
# This thin JNI wrapper reaches libc only through libhttrack, so the direct
# libc edge is dropped from DT_NEEDED (library-not-linked-against-libc). Force
# libc to be recorded as a dependency.
libhtsjava_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO) -Wl,--push-state,--no-as-needed,-lc,--pop-state
# libc back as a dependency; configure gates the flag since only a GNU-style
# linker accepts it (LIBC_FORCE_LINK is empty on e.g. macOS).
libhtsjava_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO) $(LIBC_FORCE_LINK)
EXTRA_DIST = httrack.h webhttrack \
coucal/murmurhash3.h.diff \
coucal/murmurhash3.h.orig \
@@ -563,8 +565,15 @@ EXTRA_DIST = httrack.h webhttrack \
proxy/proxytrack.h \
proxy/store.h \
proxy/proxytrack.vcproj \
coucal/* \
*.dsw *.dsp *.vcproj
coucal/LICENSE \
coucal/Makefile \
coucal/README.md \
coucal/sample.c \
coucal/tests.c \
htsjava.vcproj \
httrack.dsp httrack.dsw httrack.vcproj \
libhttrack.dsp libhttrack.dsw libhttrack.vcproj \
webhttrack.dsp webhttrack.dsw webhttrack.vcproj
all: all-am


@@ -939,7 +939,7 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
FILE *const fp = FOPEN(fconv(catbuff, sizeof(catbuff), previous_save), "rb");
if (fp != NULL) {
r.adr = (char *) malloct((int) r.size + 4);
r.adr = (char *) malloct((int) r.size + 1);
if (r.adr != NULL) {
if (r.size > 0
&& fread(r.adr, 1, (int) r.size, fp) != r.size) {
@@ -948,7 +948,8 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
r.statuscode = STATUSCODE_INVALID;
sprintf(r.msg, "Read error in cache disk data: %s",
strerror(last_errno));
}
} else if (r.size >= 0)
*(r.adr + r.size) = '\0';
} else {
r.statuscode = STATUSCODE_INVALID;
strcpybuff(r.msg,
@@ -965,7 +966,7 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
// Data in cache.
else {
// read the file (in one go)
r.adr = (char *) malloct((int) r.size + 4);
r.adr = (char *) malloct((int) r.size + 1);
if (r.adr != NULL) {
if (unzReadCurrentFile((unzFile) cache->zipInput, r.adr, (int) r.size) != r.size) { // error
freet(r.adr);
@@ -1245,13 +1246,14 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
FILE *fp = FOPEN(fconv(catbuff, sizeof(catbuff), return_save), "rb");
if (fp != NULL) {
r.adr = (char *) malloct((size_t) r.size + 4);
r.adr = (char *) malloct((size_t) r.size + 1);
if (r.adr != NULL) {
if (r.size > 0
&& fread(r.adr, 1, (size_t) r.size, fp) != r.size) {
r.statuscode = STATUSCODE_INVALID;
strcpybuff(r.msg, "Read error in cache disk data");
}
} else if (r.size >= 0)
*(r.adr + r.size) = '\0';
} else {
r.statuscode = STATUSCODE_INVALID;
strcpybuff(r.msg,
@@ -1266,7 +1268,7 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
}
} else {
// read the file (in one go)
r.adr = (char *) malloct((size_t) r.size + 4);
r.adr = (char *) malloct((size_t) r.size + 1);
if (r.adr != NULL) {
if (fread(r.adr, 1, (size_t) r.size, cache->olddat) != r.size) { // error
freet(r.adr);
@@ -1369,10 +1371,11 @@ int cache_readdata(cache_back * cache, const char *str1, const char *str2,
cache_rint(cache->olddat, &len);
if (len > 0) {
char *mem_buff = (char *) malloct(len + 4); /* Plus byte 0 */
char *mem_buff = (char *) malloct(len + 1); /* trailing \0 */
if (mem_buff) {
if (fread(mem_buff, 1, len, cache->olddat) == len) { // read everything (including statuscode etc.)
mem_buff[len] = '\0';
*inbuff = mem_buff;
*inlen = len;
return 1;
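
The pattern these hunks converge on (allocate `size + 1` bytes, read `size` bytes, write one NUL at `[size]`) can be sketched in isolation. This is only an illustration under stated assumptions: it uses plain `malloc`/`fread` instead of HTTrack's `malloct`/`FOPEN` wrappers, and `read_all_terminated` is a hypothetical helper name that does not exist in the codebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of HTTrack): read exactly `size` bytes from
   `fp` into a fresh buffer of size + 1 bytes and NUL-terminate it at [size].
   Returns NULL on allocation failure or short read; the caller frees. */
static char *read_all_terminated(FILE *fp, size_t size) {
  char *buf;

  assert(fp != NULL);
  if (size == SIZE_MAX)           /* size + 1 would wrap around to 0 */
    return NULL;
  buf = malloc(size + 1);         /* exactly one spare byte for the NUL */
  if (buf == NULL)
    return NULL;
  if (size > 0 && fread(buf, 1, size, fp) != size) {
    free(buf);                    /* short read: do not return a partial body */
    return NULL;
  }
  buf[size] = '\0';               /* buffer is now a valid C string */
  return buf;
}
```

With the terminator in place, a later `strlen()` or `strstr()` on the buffer stops at `[size]` rather than depending on whatever bytes follow the allocation.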


@@ -182,6 +182,16 @@ static int check_entry(httrackp *opt, cache_back *cache, const char *adr,
fail++;
}
/* The loaded body must be NUL-terminated at [size]: cache_readex's strlen()
consumers (htscore.c:1046, htscache.c) rely on it, and a missing
terminator is a heap over-read. The buffer is malloc(size + slack), so
reading [size] is in bounds. */
if (r.adr != NULL && r.adr[r.size] != '\0') {
fprintf(stderr, "cache-selftest: %s%s: body not NUL-terminated at [size]\n",
adr, fil);
fail++;
}
#undef CHECK_STR
if (r.adr != NULL) {
@@ -208,6 +218,107 @@ static void gen_body(char *buf, size_t len, int kind) {
}
}
/* Exercise the disk-fallback read path: a record stored with X-In-Cache: 0
keeps its body on disk (not in the ZIP), and cache_readex must load it from
there. The one-shot crawl tests never re-read such a body into memory, so
this path otherwise has no runtime coverage. We store the header with
all_in_cache=0 and a non-hypertext content-type (-> X-In-Cache: 0), create
the body at the exact fconv()-resolved path the reader uses, then read it
back and assert it round-trips and is NUL-terminated. */
static int disk_fallback_selftest(httrackp *opt) {
int fail = 0;
cache_back cache;
htsblk r;
char catbuff[HTS_URLMAXSIZE * 2];
char *path;
char *locbuf;
FILE *fp;
const char *const adr = "example.com";
const char *const fil = "/blob.bin";
char save[HTS_URLMAXSIZE * 2];
/* no embedded NUL: were the read to leave this un-terminated, a later
strlen() would run off the end (the bug this guards) */
static const char body[] = "BINARY-on-disk-body-0123456789-no-trailing-nul";
const size_t body_len = sizeof(body) - 1;
/* X-Save must start with path_html_utf8 so the reader resolves it verbatim
(otherwise it re-roots it as a pre-3.40 relative path); then the body we
create at fconv(save) is exactly where cache_readex looks for it. */
fconcat(save, sizeof(save), StringBuff(opt->path_html_utf8),
"example.com/blob.bin");
/* write only the header (X-In-Cache: 0); the body stays on disk */
selftest_open_for_write(&cache, opt);
{
htsblk w;
char locw[4];
char *bodycopy = malloct(body_len);
hts_init_htsblk(&w);
w.statuscode = 200;
w.size = (LLint) body_len;
strcpybuff(w.msg, "OK");
strcpybuff(w.contenttype, "application/octet-stream");
locw[0] = '\0';
w.location = locw;
w.is_write = 0;
memcpy(bodycopy, body, body_len);
w.adr = bodycopy;
cache_add(opt, &cache, &w, adr, fil, save, 0 /* all_in_cache */, NULL);
freet(bodycopy);
}
selftest_close(&cache);
/* create the on-disk body where the reader will look for it */
path = fconv(catbuff, sizeof(catbuff), save);
(void) structcheck(path);
fp = FOPEN(path, "wb");
if (fp == NULL) {
fprintf(stderr, "cache-selftest: disk-fallback: cannot create '%s'\n",
path);
return 1;
}
if (fwrite(body, 1, body_len, fp) != body_len) {
fprintf(stderr, "cache-selftest: disk-fallback: short write to '%s'\n",
path);
fail++;
}
fclose(fp);
/* read it back: takes the X-In-Cache: 0 disk-fallback branch */
selftest_open_for_read(&cache, opt);
locbuf = malloct(HTS_URLMAXSIZE * 2);
locbuf[0] = '\0';
r = cache_readex(opt, &cache, adr, fil, "", locbuf, NULL, 1);
if (r.statuscode != 200) {
fprintf(stderr,
"cache-selftest: disk-fallback: statuscode %d, expected 200"
" (path not taken or read failed)\n",
r.statuscode);
fail++;
}
if (r.size != (LLint) body_len) {
fprintf(stderr,
"cache-selftest: disk-fallback: size " LLintP ", expected %d\n",
(LLint) r.size, (int) body_len);
fail++;
} else if (r.adr == NULL || memcmp(r.adr, body, body_len) != 0) {
fprintf(stderr, "cache-selftest: disk-fallback: body mismatch\n");
fail++;
}
/* the loaded body must be NUL-terminated at [size] */
if (r.adr != NULL && r.adr[r.size] != '\0') {
fprintf(stderr, "cache-selftest: disk-fallback: body not NUL-terminated\n");
fail++;
}
if (r.adr != NULL) {
freet(r.adr);
}
freet(locbuf);
selftest_close(&cache);
return fail;
}
int cache_selftests(httrackp *opt, const char *dir) {
int failures = 0;
cache_back cache;
@@ -257,6 +368,10 @@ int cache_selftests(httrackp *opt, const char *dir) {
strcatbuff(base, "/");
}
StringCopy(opt->path_log, base);
/* the disk-fallback pass resolves on-disk body paths through fconv(), which
is rooted at path_html; keep it inside the test directory too */
StringCopy(opt->path_html, base);
StringCopy(opt->path_html_utf8, base);
}
opt->cache = 1;
@@ -366,6 +481,9 @@ int cache_selftests(httrackp *opt, const char *dir) {
"", body_updated, strlen(body_updated));
selftest_close(&cache);
/* pass 5: the disk-fallback read path (X-In-Cache: 0, body on disk) */
failures += disk_fallback_selftest(opt);
for (i = 0; i < large_count; i++) {
freet(large_body[i]);
}


@@ -2193,16 +2193,19 @@ int httpmirror(char *url1, httrackp * opt) {
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), StringBuff(opt->path_log),
"hts-cache/new.lst"), "rb");
if (new_lst != NULL && sz != (size_t) -1) {
char *adr = (char *) malloct(sz);
/* +1 for the NUL below: new.lst is read raw, and the strstr()
that follows needs a terminated C string. */
char *adr = (char *) malloct(sz + 1);
if (adr) {
if (fread(adr, 1, sz, new_lst) == sz) {
adr[sz] = '\0';
char line[1100];
int purge = 0;
while(!feof(old_lst)) {
linput(old_lst, line, 1000);
if (!strstr(adr, line)) { // fichier non trouvé dans le nouveau?
if (!strstr(adr, line)) { // not found in the new list?
char BIGSTK file[HTS_URLMAXSIZE * 2];
strcpybuff(file, StringBuff(opt->path_html));


@@ -145,8 +145,13 @@ int hts_unescapeEntitiesWithCharset(const char *src, char *dest, const size_t ma
if (!hex) {
if (src[i] >= '0' && src[i] <= '9') {
const int h = src[i] - '0';
uc *= 10;
uc += h;
/* Guard before multiplying: a codepoint past the Unicode max
(0x10FFFF) is invalid anyway, so stop rather than overflow uc. */
if (uc > (0x10FFFF - h) / 10) {
ampStart = (size_t) -1;
} else {
uc = uc * 10 + h;
}
} else {
/* abandon */
ampStart = (size_t) -1;
@@ -156,8 +161,11 @@ int hts_unescapeEntitiesWithCharset(const char *src, char *dest, const size_t ma
else {
const int h = get_hex_value(src[i]);
if (h != -1) {
uc *= 16;
uc += h;
if (uc > (0x10FFFF - h) / 16) {
ampStart = (size_t) -1;
} else {
uc = uc * 16 + h;
}
} else {
/* abandon */
ampStart = (size_t) -1;


@@ -334,7 +334,7 @@ void index_finish(const char *indexpath, int mode) {
if (fp_tmpproject) {
tab = (char **) malloct(sizeof(char *) * (hts_primindex_size + 2));
if (tab) {
blk = malloct(size + 4);
blk = malloct(size + 1);
if (blk) {
fseek(fp_tmpproject, 0, SEEK_SET);
if ((INTsys) fread(blk, 1, size, fp_tmpproject) == size) {
@@ -343,6 +343,7 @@ void index_finish(const char *indexpath, int mode) {
int i;
FILE *fp;
blk[size] = '\0';
while((b = strchr(a, '\n')) && (index < hts_primindex_size)) {
tab[index++] = a;
*b = '\0';
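
The `strchr()` loop above only stops safely if `blk` is NUL-terminated, which is what the added `blk[size] = '\0'` guarantees. A minimal standalone version of the in-place line split is sketched here; `split_lines` is a hypothetical helper, not the actual body of `index_finish`, and it assumes the caller has already terminated the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the NUL-terminated buffer `blk` in place on
   '\n', storing the start of each newline-terminated line in `tab` (at most
   `max` entries). Returns the number of lines stored. */
static size_t split_lines(char *blk, char **tab, size_t max) {
  size_t n = 0;
  char *a = blk;
  char *b;

  assert(blk != NULL && tab != NULL);
  /* strchr() scans until it finds '\n' or the terminating NUL; without that
     terminator it would read past the end of the allocation. */
  while (n < max && (b = strchr(a, '\n')) != NULL) {
    tab[n++] = a;
    *b = '\0';        /* cut the line in place */
    a = b + 1;        /* continue after the newline */
  }
  return n;
}
```

As in the loop it mirrors, a final fragment without a trailing newline is not recorded as a line.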


@@ -3416,8 +3416,17 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (RUN_CALLBACK4(opt, postprocess, &cAddr, &cSize, urladr(), urlfil()) == 1) {
hts_log_print(opt, LOG_DEBUG,
"engine: postprocess-html: callback modified data, applying %d bytes", cSize);
TypedArraySize(output_buffer) = 0;
TypedArrayAppend(output_buffer, cAddr, cSize);
/* The callback either edits output_buffer in place (cAddr
unchanged) or hands back its own buffer (cAddr changed). Only
the latter needs a copy: re-appending output_buffer onto itself
would read freed memory, as the append's realloc can relocate
the block out from under cAddr. */
if (cAddr != TypedArrayElts(output_buffer)) {
TypedArraySize(output_buffer) = 0;
TypedArrayAppend(output_buffer, cAddr, cSize);
} else {
TypedArraySize(output_buffer) = (size_t) cSize;
}
}
}
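
The hazard described in the comment can be shown with a small growable buffer. This is a sketch under assumptions: `buf_t`, `buf_append`, and `buf_apply` are hypothetical stand-ins for the `TypedArray` macros, whose real implementation is not shown in this diff.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer standing in for output_buffer. */
typedef struct {
  char *elts;
  size_t size;
  size_t capa;
} buf_t;

/* Append `len` bytes from `src`. `src` must not point into b->elts: realloc()
   may move the block, which would leave `src` dangling. */
static int buf_append(buf_t *b, const char *src, size_t len) {
  if (b->size + len > b->capa) {
    size_t capa = (b->size + len) * 2;
    char *p = realloc(b->elts, capa);
    if (p == NULL)
      return 0;
    b->elts = p;
    b->capa = capa;
  }
  memcpy(b->elts + b->size, src, len);
  b->size += len;
  return 1;
}

/* Apply a callback result (addr, len): either the callback edited the buffer
   in place (addr == b->elts) or it returned a buffer of its own. */
static int buf_apply(buf_t *b, const char *addr, size_t len) {
  if (addr == b->elts) {
    assert(len <= b->capa);
    b->size = len;          /* in place: only the length changes, no copy */
    return 1;
  }
  b->size = 0;              /* foreign buffer: replace the contents */
  return buf_append(b, addr, len);
}
```

Like the patched code, the sketch only compares against the start of the buffer; a callback returning a pointer into the middle of the buffer would still alias it and is assumed not to happen.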


@@ -1162,7 +1162,7 @@ static PT_Element PT_ReadCache__New_u(PT_Index index_, const char *url,
FILE *fp = fopen(file_convert(catbuff, sizeof(catbuff), previous_save), "rb");
if (fp != NULL) {
r->adr = (char *) malloc(r->size + 4);
r->adr = (char *) malloc(r->size + 1);
if (r->adr != NULL) {
if (r->size > 0
&& fread(r->adr, 1, r->size, fp) != r->size) {
@@ -1172,6 +1172,7 @@ static PT_Element PT_ReadCache__New_u(PT_Index index_, const char *url,
sprintf(r->msg, "Read error in cache disk data: %s",
strerror(last_errno));
}
r->adr[r->size] = '\0';
} else {
r->statuscode = STATUSCODE_INVALID;
strcpy(r->msg,


@@ -202,6 +202,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@


@@ -1,5 +1,8 @@
#!/bin/bash
#
# Keep this POSIX-portable: the harness runs it via $(BASH), which is a plain
# POSIX /bin/sh on some platforms (e.g. macOS), so avoid bashisms and GNU-only
# tool flags despite the #!/bin/bash above.
# Cache create/read/update logic (driven by 'httrack -#A <dir>').
#
@@ -38,9 +41,12 @@ test -e "$dir/hts-cache/new.zip" || {
# Sanity-check the cache footprint: the few-thousand-entry pass is expected to
# weigh ~1-2 MB. Fail if it balloons well past that (e.g. a per-entry overhead
# regression or runaway growth), so the cache size stays bounded.
ceiling=$((4 * 1024 * 1024))
bytes=$(du -sb "$dir/hts-cache" | cut -f1)
test "$bytes" -le "$ceiling" || {
echo "cache footprint $bytes bytes exceeds ${ceiling} ceiling" >&2
# du -sk (1024-byte units) is portable; GNU's -b (apparent bytes) is rejected
# by BSD/macOS du. Block-allocated size is an upper bound on apparent size,
# which is all a ceiling check needs.
ceiling=$((4 * 1024)) # KiB
kbytes=$(du -sk "$dir/hts-cache" | cut -f1)
test "$kbytes" -le "$ceiling" || {
echo "cache footprint ${kbytes} KiB exceeds ${ceiling} KiB ceiling" >&2
exit 1
}


@@ -3,6 +3,10 @@
# The committed man/httrack.1 must match what man/makeman.sh produces from the
# current "httrack --help" output. This catches a --help change that was not
# followed by "make -C man regen-man".
#
# Keep this POSIX-portable: the harness runs it via $(BASH), which is a plain
# POSIX /bin/sh on some platforms (e.g. macOS), so avoid bashisms (such as
# process substitution) despite the #!/bin/bash above.
: "${top_srcdir:=..}"
@@ -20,7 +24,9 @@ command -v httrack >/dev/null 2>&1 || {
}
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
committed_clean=$(mktemp) || exit 1
generated_clean=$(mktemp) || exit 1
trap 'rm -f "$tmp" "$committed_clean" "$generated_clean"' EXIT
README="$top_srcdir/README" bash "$gen" httrack >"$tmp" 2>/dev/null || {
echo "makeman.sh failed" >&2
@@ -28,12 +34,15 @@ README="$top_srcdir/README" bash "$gen" httrack >"$tmp" 2>/dev/null || {
}
# Ignore the two intentionally date-dependent lines (page date, copyright year).
# Temp files, not process substitution, so this works under a POSIX /bin/sh.
strip_volatile() { grep -vE '^\.TH httrack |^Copyright \(C\) 1998-'; }
strip_volatile <"$committed" >"$committed_clean"
strip_volatile <"$tmp" >"$generated_clean"
if diff <(strip_volatile <"$committed") <(strip_volatile <"$tmp") >/dev/null; then
if diff "$committed_clean" "$generated_clean" >/dev/null; then
exit 0
fi
echo "man/httrack.1 is out of date. Regenerate with: make -C man regen-man" >&2
diff <(strip_volatile <"$committed") <(strip_volatile <"$tmp") | head -40 >&2
diff "$committed_clean" "$generated_clean" | head -40 >&2
exit 1


@@ -380,6 +380,7 @@ LD = @LD@
LDFLAGS = @LDFLAGS@
LDFLAGS_PIE = @LDFLAGS_PIE@
LFS_FLAG = @LFS_FLAG@
LIBC_FORCE_LINK = @LIBC_FORCE_LINK@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@


@@ -118,7 +118,10 @@ main() {
git -C "$repo/src/coucal" archive --format=tar --prefix=src/coucal/ HEAD |
tar -x -C "$export_dir"
# Refresh build system and man page, then build and validate the tarball.
# Refresh build system and man page, then build the tarball. We build here
# only because regen-man needs the compiled binaries; the test suite is not
# run in this pass. debuild (below) runs the full suite once, with the online
# tests enabled, so a check here would just be a slower, offline-only repeat.
info "regenerating build system and man page"
(
cd "$export_dir"
@@ -126,8 +129,6 @@ main() {
./configure --quiet
make -s -j"$(nproc)"
make -s -C man regen-man
info "running test suite"
make -s check
# Build the tarball from a clean tree so no object files leak into it.
make -s clean
make -s dist