Compare commits

12 Commits

Author SHA1 Message Date
Xavier Roche
130c2fff54 Bound htsback backing-info and fast-cache copies
Continue the htssafe.h pointer-destination migration in htsback.c.

back_infostr() wrote into a bare char* through the unchecked strcatbuff()
path. Thread the destination capacity through and use strlcatbuff(), and
fix a latent bug while here: the size/totalsize trailer was sprintf'd
straight into the destination, wiping the URL the function had just
assembled, instead of being built in the scratch buffer and appended.
The fixed-size sprintf() calls become snprintf().
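
The clobber described above can be reproduced in isolation. The following is a minimal sketch, not HTTrack's code: `append_bounded`, `format_entry_buggy` and `format_entry_fixed` are hypothetical names, and the append helper truncates silently with standard `snprintf()` instead of using the htssafe.h macros (which, per this message, abort on overflow).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded append: dst must already be NUL-terminated within
   size bytes. Truncates silently when src does not fit. */
static void append_bounded(char *dst, size_t size, const char *src) {
  const size_t used = strlen(dst);
  if (used < size)
    snprintf(dst + used, size - used, "%s", src);
}

/* Buggy shape: the trailer is formatted straight into dst, which restarts
   the string at offset 0 and wipes the URL assembled just before. */
static void format_entry_buggy(char *dst, size_t size, const char *url,
                               long long got, long long total) {
  if (size == 0)
    return;
  dst[0] = '\0';
  append_bounded(dst, size, url);
  append_bounded(dst, size, " ");
  snprintf(dst, size, "%lld %lld", got, total); /* overwrites the URL */
}

/* Fixed shape: format the trailer in a scratch buffer, then append it. */
static void format_entry_fixed(char *dst, size_t size, const char *url,
                               long long got, long long total) {
  char scratch[64];
  if (size == 0)
    return;
  dst[0] = '\0';
  append_bounded(dst, size, url);
  append_bounded(dst, size, " ");
  snprintf(scratch, sizeof(scratch), "%lld %lld", got, total);
  append_bounded(dst, size, scratch);
}
```

With the buggy variant the caller gets back only the two numbers; with the fixed variant the URL survives and the numbers follow it.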

Enlarge back_info()'s status buffer to HTS_URLMAXSIZE*4+1024 so it can
hold both url_adr and url_fil (each HTS_URLMAXSIZE*2) plus framing. The
old HTS_URLMAXSIZE*2+1024 buffer was too small for two full-length URL
fields, so the now-bounded appends would abort on a long URL.

In back_add()'s fast-header cache path, copy the cached location into its
backing array (location_buffer) rather than through the r.location alias,
so the bounded macro sees the real capacity.
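
Why the alias hides the capacity can be shown with a small sketch. It assumes the bounded macro derives the destination size from `sizeof` on its argument, as this message implies; `COPY_INTO_ARRAY` and `struct slot` are hypothetical stand-ins, not the htssafe.h macro or the real `lien_back` layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capacity-deducing copy: sizeof(dst) is the real capacity
   only when dst is an array, not a pointer. */
#define COPY_INTO_ARRAY(dst, src) snprintf((dst), sizeof(dst), "%s", (src))

struct slot {
  char *location;           /* alias, points at location_buffer */
  char location_buffer[32]; /* backing storage */
};

/* What sizeof reports through the pointer alias: the pointer's own size. */
static size_t capacity_seen_through_alias(const struct slot *s) {
  return sizeof(s->location);
}

/* What sizeof reports through the array: the real capacity. */
static size_t capacity_seen_through_array(const struct slot *s) {
  return sizeof(s->location_buffer);
}

/* Write through the array so the macro sees all 32 bytes. */
static void slot_set_location(struct slot *s, const char *value) {
  s->location = s->location_buffer;
  COPY_INTO_ARRAY(s->location_buffer, value);
}
```

Passing `s->location` to such a macro would bound the copy by the pointer size (or leave it unchecked), which is why the write goes through the array.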

Add a back_infostr()/back_info() self-test under -#7: it formats 2000
in-memory slots across every status-code arm with exact-match assertions
(no sockets needed), plus a near-maximal URL driven through back_info()
to guard the buffer sizing. It fails on the clobber bug and on an
undersized status buffer.

htsback.c is now free of pointer-destination buff() warnings.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 20:21:09 +02:00
Xavier Roche
2f85ca7e6d Merge pull request #345 from xroche/build/noexec-out-of-tree
build: make out-of-tree builds and "make check" work from a read-only/noexec tree
2026-06-14 18:27:23 +02:00
Xavier Roche
28c22bd64d build: make out-of-tree builds and "make check" work from a read-only/noexec tree
Out-of-tree builds were broken in two ways, and "make check" could not run
when the source tree sits on a noexec filesystem. Fix both so a plain
`mkdir build && cd build && bash <srcdir>/configure && make && make check`
works without copying the tree.

libtest: -I../src is relative to the build dir, so out-of-tree it pointed at
build/src (generated files only) and missed the source header
httrack-library.h. Use -I$(top_srcdir)/src.

Wildcard DATA lists (libtest, lang, html, m4) used bare globs like "*.html".
Make expands a wildcard prerequisite against the build dir, so out-of-tree the
glob matches nothing and stays literal ("No rule to make target '*.html'").
Glob against $(srcdir) instead. Explicit filenames (e.g. ../history.txt) are
left as-is; they resolve through VPATH.

make check: automake's driver execve()s each tests/*.test, which fails with
"Permission denied" when the source tree is on a noexec mount. Run them through
bash via TEST_LOG_COMPILER = $(BASH) (detected by configure); this also drops
any reliance on the scripts' executable bit and works on a normal tree too.

Verified end to end from a noexec source tree: out-of-tree make builds the full
tree including libtest, and "make check" runs (14 pass, online crawl tests skip
offline, 0 fail).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 18:18:55 +02:00
Xavier Roche
4b7be69526 Merge pull request #344 from xroche/cleanup/copyright-spdx
Normalize copyright years and add SPDX identifiers
2026-06-14 18:18:31 +02:00
Xavier Roche
995cc6c86e Normalize copyright years and add SPDX identifiers
Collapse the stale "1998-2017" copyright ranges (and a handful of other
ranges) in HTTrack-authored source to a single earliest year taken from
git history: 1998 for files tracing back to the 2012 release-history
import, and the real first-commit year for later additions (2013 for
htsencoding, 2014 for htsarrays/htssafe/htsconcat, 2026 for the cache
self-test). Each header also gains an SPDX-License-Identifier:
GPL-3.0-or-later line.

The runtime "about" banners (httrack, proxytrack) and the man pages keep
a range, but now end in the current year computed at build time: via
__DATE__ for the C banners and makeman.sh for the generated httrack.1,
so they no longer freeze at a stale year.
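
One way to derive the end year from `__DATE__` is sketched below. It relies only on the standard "Mmm dd yyyy" layout of that macro; the helper names `year_from_date_macro` and `format_copyright` are hypothetical, and the actual banner code in the commit may be written differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract the year from a "Mmm dd yyyy" string such as __DATE__.
   Returns 0 when the string does not have the expected 11-char length. */
static int year_from_date_macro(const char *date) {
  if (date == NULL || strlen(date) != 11)
    return 0;
  return atoi(date + 7); /* the year occupies the last four characters */
}

/* Build a "Copyright (C) first-last" range ending at the build year. */
static void format_copyright(char *dst, size_t size, int first_year,
                             const char *date) {
  snprintf(dst, size, "Copyright (C) %d-%d", first_year,
           year_from_date_macro(date));
}
```

A banner would call `format_copyright(buf, sizeof(buf), 1998, __DATE__)`, so the range tracks the year in which the binary was compiled.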

Third-party notices (Even Rouault, Mathias Svensson, Info-ZIP, Eric
Young) and the BSD-licensed coucal submodule are left untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 18:15:40 +02:00
Xavier Roche
e1fdfcec5d Merge pull request #343 from xroche/tests/cache-selftest
Add an in-process cache create/read/update self-test
2026-06-14 17:49:08 +02:00
Xavier Roche
83ff148efd Add an in-process cache create/read/update self-test
Wire a new `httrack -#A <dir>` debug option that exercises the ZIP cache
end to end through the public API (cache_init / cache_add / cache_readex),
in a dedicated source file (htscache_selftest.c).

It stores, then reads back asserting every header field and the body
round-trip exactly:
- hand-crafted edge cases: a normal HTML page, an empty redirect with a
  near-limit location, a non-HTML body kept in cache via all-in-cache, and
  a binary body with embedded NUL and high bytes (compared with memcmp);
- a few thousand small entries, to stress the index/lookup at scale;
- a few large compressible and incompressible bodies, to exercise zlib
  deflate/inflate and large-buffer handling.
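
The binary case is compared with memcmp because a string comparison stops at the first NUL. A minimal sketch of such a check follows; `body_equal` is a hypothetical helper, not a function from htscache_selftest.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when both bodies have the same length and identical bytes,
   including any embedded NUL or high bytes. */
static int body_equal(const void *a, size_t a_len,
                      const void *b, size_t b_len) {
  return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

For the bodies `{ 'A', 0x00, 0xFF, 'B' }` and `{ 'A', 0x00, 0xFE, 'B' }`, `strcmp()` reports equality because it only sees "A", while `body_equal()` catches the differing third byte.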

It then updates one entry and confirms the new value is read back. The
driver returns the number of mismatches so failures are observable. The
whole cache weighs ~1-2 MB and the run takes a fraction of a second.

The location case is sized to the cache's real per-header-line round-trip
limit: cached headers are parsed through a HTS_URLMAXSIZE-sized line
buffer, so a value longer than that is truncated on read regardless of
the larger r.location buffer; 1000 bytes stays safely under it.

A dedicated test (tests/01_engine-cache.test) drives the option, asserts
the success line, that a ZIP cache was written, and that its footprint
stays under a sane ceiling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 17:47:04 +02:00
Xavier Roche
50bb02e729 Merge pull request #342 from xroche/cleanup/cache-rstr-s1
Bound the legacy .dat cache readers (cache_rstr / cache_brstr)
2026-06-14 16:44:27 +02:00
Xavier Roche
b80ee793ac Bound the legacy .dat cache readers (cache_rstr / cache_brstr)
cache_rstr() read an attacker-controlled length (clamped only to 32768) from a
CACHE-1.x .dat and fread() it straight into fixed htsblk fields (r.msg[80],
r.contenttype[64], ...) with no destination bound -- a heap/stack overflow from
a crafted/old cache (the audit's S1). cache_brstr() (the in-memory variant) had
the same shape and, worse, no length cap at all.

Thread a destination size into both:
- cache_rstr stores at most s_size-1 bytes and fseek()s past the remainder so
  the next field stays aligned (the field may be longer than the destination in
  a tampered cache).
- cache_brstr caps the length and bounds the copy.
Update every caller (htscache.c and htscoremain.c) to pass sizeof(field) /
HTS_URLMAXSIZE*2. cache_rstr_addr already malloc()s to the read size, so it is
left as is. Remove the dead cache_quickbrstr (no callers).
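
The store-then-skip behaviour of the bounded file reader can be sketched as follows. This is an illustration under assumptions: `read_field_bounded` is a hypothetical name, the field length is passed in rather than parsed from the stream, and errors are returned instead of asserted as the real cache_rstr() does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a field of field_len bytes from fp into dst (capacity dst_size).
   Stores at most dst_size-1 bytes, NUL-terminates, and seeks past any
   remainder so the next field starts at the right offset.
   Returns 0 on success, -1 on I/O error or when dst_size is 0. */
static int read_field_bounded(FILE *fp, size_t field_len, char *dst,
                              size_t dst_size) {
  size_t store;
  if (dst_size == 0)
    return -1;
  store = field_len < dst_size ? field_len : dst_size - 1;
  if (fread(dst, 1, store, fp) != store) {
    dst[0] = '\0';
    return -1;
  }
  dst[store] = '\0';
  if (field_len > store
      && fseek(fp, (long) (field_len - store), SEEK_CUR) != 0)
    return -1;
  return 0;
}
```

Reading an 8-byte field into a 4-byte destination keeps the first 3 bytes and leaves the stream positioned at the following field, which is the alignment property the commit relies on.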

A dedicated cache self-test (create/read/update) follows separately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 16:41:17 +02:00
Xavier Roche
d12456c1e8 Merge pull request #341 from xroche/test/cache-update
Add an offline update/cache regression test
2026-06-14 16:31:42 +02:00
Xavier Roche
a52a2b146c Add an offline update/cache regression test
Every crawl test runs httrack exactly once (crawl-test.sh), so the cache read /
update path (cache_readex) -- recently touched by the buffer-bounding work -- had
zero regression coverage: the cache was written but never read back.

Add tests/02_update-cache.test, a self-contained file:// two-pass test (no
network, always runs): mirror a local site, re-mirror it unchanged (the cache-
read pass must complete with no errors -- guards a crash/abort in cache_readex),
then change a source file and re-mirror (the update must pick up the new content
-- guards the update decision that reads the cached metadata).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 16:29:45 +02:00
Xavier Roche
226a38d3d0 Merge pull request #340 from xroche/cleanup/htscache-bounds
Bound htscache.c cache-field and save-name copies
2026-06-14 15:58:04 +02:00
93 changed files with 1015 additions and 178 deletions

View File

@@ -50,6 +50,9 @@ LT_INIT
AC_PROG_LN_S
LT_INIT
# bash, used to run the test scripts (see tests/Makefile.am TEST_LOG_COMPILER)
AC_PATH_PROGS([BASH], [bash], [/bin/bash])
# Export LD_LIBRARY_PATH name or equivalent.
AC_SUBST(SHLIBPATH_VAR,$shlibpath_var)

View File

@@ -13,22 +13,26 @@ WebIcon32x32dir = $(datadir)/icons/hicolor/32x32/apps
WebIcon48x48dir = $(datadir)/icons/hicolor/48x48/apps
VFolderEntrydir = $(prefix)/share/applications
# Wildcards are globbed against $(srcdir): a bare "*.html" is resolved against
# the build dir and stays unexpanded (breaking "make") in an out-of-tree build.
# Explicit filenames (e.g. ../history.txt, div/search.sh) resolve via VPATH and
# need no prefix.
HelpHtmlroot_DATA = ../httrack-doc.html ../history.txt
HelpHtml_DATA = *.html
HelpHtml_DATA = $(srcdir)/*.html
HelpHtmldiv_DATA = div/search.sh
HelpHtmlimg_DATA = img/*
HelpHtmlimages_DATA = images/*
HelpHtmlimg_DATA = $(srcdir)/img/*
HelpHtmlimages_DATA = $(srcdir)/images/*
HelpHtmlTxt_DATA = ../greetings.txt ../history.txt ../license.txt
WebHtml_DATA = server/*.html server/*.js server/*.css
WebHtmlimages_DATA = server/images/*
WebHtml_DATA = $(srcdir)/server/*.html $(srcdir)/server/*.js $(srcdir)/server/*.css
WebHtmlimages_DATA = $(srcdir)/server/images/*
# note: converted & normalized by
# ico2xpm favicon.ico -o httrack.xpm
# mogrify -format xpm -map /usr/share/doc/menu/examples/cmap.xpm httrack.xpm
WebPixmap_DATA = server/div/*.xpm
WebIcon16x16_DATA = server/div/16x16/*.png
WebIcon32x32_DATA = server/div/32x32/*.png
WebIcon48x48_DATA = server/div/48x48/*.png
VFolderEntry_DATA = server/div/*.desktop
WebPixmap_DATA = $(srcdir)/server/div/*.xpm
WebIcon16x16_DATA = $(srcdir)/server/div/16x16/*.png
WebIcon32x32_DATA = $(srcdir)/server/div/32x32/*.png
WebIcon48x48_DATA = $(srcdir)/server/div/48x48/*.png
VFolderEntry_DATA = $(srcdir)/server/div/*.desktop
EXTRA_DIST = $(HelpHtml_DATA) $(HelpHtmlimg_DATA) $(HelpHtmlimages_DATA) \
$(HelpHtmldiv_DATA) $(WebHtml_DATA) $(WebHtmlimages_DATA) \

View File

@@ -1,6 +1,8 @@
langdir = $(datadir)/httrack/lang
lang_DATA = *.txt
# Glob against $(srcdir): a bare "*.txt" is resolved against the build dir and
# stays unexpanded (breaking "make") in an out-of-tree build.
lang_DATA = $(srcdir)/*.txt
langrootdir = $(datadir)/httrack
langroot_DATA = ../lang.def ../lang.indexes

View File

@@ -1,6 +1,8 @@
exemplesdir = $(datadir)/httrack/libtest
exemples_DATA = *.c *.h *.txt
# Glob against $(srcdir), not the build dir: a bare "*.c" is resolved relative to
# the build dir and stays unexpanded (breaking "make") in an out-of-tree build.
exemples_DATA = $(srcdir)/*.c $(srcdir)/*.h $(srcdir)/*.txt
EXTRA_DIST = $(exemples_DATA) libtest.mak libtest.vcproj
AM_CPPFLAGS = \
@@ -12,7 +14,9 @@ AM_CPPFLAGS = \
-DSYSCONFDIR=\""$(sysconfdir)"\" \
-DDATADIR=\""$(datadir)"\" \
-DLIBDIR=\""$(libdir)"\"
AM_CPPFLAGS += -I../src
# Use $(top_srcdir)/src, not ../src: the latter is relative to the build dir and
# misses the source headers (e.g. httrack-library.h) in an out-of-tree build.
AM_CPPFLAGS += -I$(top_srcdir)/src
# The callback examples reference libc only through libhttrack, so the direct
# libc edge gets dropped from DT_NEEDED (library-not-linked-against-libc).

View File

@@ -1 +1,3 @@
EXTRA_DIST = *.m4
# Glob against $(srcdir) so "make dist" works out-of-tree (a bare "*.m4" is
# resolved against the build dir, where there are no sources).
EXTRA_DIST = $(srcdir)/*.m4

View File

@@ -1,6 +1,7 @@
.\" Process this file with
.\" groff -man -Tascii htsserver.1
.\"
.\" SPDX-License-Identifier: GPL-3.0-or-later
.TH htsserver 1 "Mar 2003" "httrack website copier"
.SH NAME
htsserver \- offline browser server : copy websites to a local directory
@@ -35,7 +36,7 @@ Please reports bugs to
.B <bugs@httrack.com>.
Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of (web)httrack you are using. Do not forget to detail options used, OS version, and any other information you deem necessary.
.SH COPYRIGHT
Copyright (C) 1998-2013 Xavier Roche and other contributors
Copyright (C) 1998-2026 Xavier Roche and other contributors
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -2,6 +2,7 @@
.\" groff -man -Tascii httrack.1
.\"
.\" This file is generated by man/makeman.sh; do not edit by hand.
.\" SPDX-License-Identifier: GPL-3.0-or-later
.TH httrack 1 "13 June 2026" "httrack website copier"
.SH NAME
httrack \- offline browser : copy websites to a local directory

View File

@@ -110,6 +110,7 @@ cat <<'EOF'
.\" groff -man -Tascii httrack.1
.\"
.\" This file is generated by man/makeman.sh; do not edit by hand.
.\" SPDX-License-Identifier: GPL-3.0-or-later
EOF
printf '.TH httrack 1 "%s" "httrack website copier"\n' "$date_str"
cat <<'EOF'

View File

@@ -1,6 +1,7 @@
.\" Process this file with
.\" groff -man -Tascii proxytrack.1
.\"
.\" SPDX-License-Identifier: GPL-3.0-or-later
.TH proxytrack 1 "Mar 2003" "httrack website copier"
.SH NAME
proxytrack \- proxy to serve content archived by httrack website copier
@@ -25,7 +26,7 @@ Please reports bugs to
.B <bugs@httrack.com>.
Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of (web)httrack you are using. Do not forget to detail options used, OS version, and any other information you deem necessary.
.SH COPYRIGHT
Copyright (C) 1998-2013 Xavier Roche and other contributors
Copyright (C) 1998-2026 Xavier Roche and other contributors
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,6 +1,7 @@
.\" Process this file with
.\" groff -man -Tascii webhttrack.1
.\"
.\" SPDX-License-Identifier: GPL-3.0-or-later
.TH webhttrack 1 "Mar 2003" "httrack website copier"
.SH NAME
webhttrack \- offline browser : copy websites to a local directory
@@ -36,7 +37,7 @@ Please reports bugs to
.B <bugs@httrack.com>.
Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of (web)httrack you are using. Do not forget to detail options used, OS version, and any other information you deem necessary.
.SH COPYRIGHT
Copyright (C) 1998-2013 Xavier Roche and other contributors
Copyright (C) 1998-2026 Xavier Roche and other contributors
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -56,6 +56,7 @@ whttrackrundir = $(bindir)
whttrackrun_SCRIPTS = webhttrack
libhttrack_la_SOURCES = htscore.c htsparse.c htsback.c htscache.c \
htscache_selftest.c \
htscatchurl.c htsfilters.c htsftp.c htshash.c coucal/coucal.c \
htshelp.c htslib.c htscoremain.c \
htsname.c htsrobots.c htstools.c htswizard.c \
@@ -65,7 +66,7 @@ libhttrack_la_SOURCES = htscore.c htsparse.c htsback.c htscache.c \
md5.c \
minizip/ioapi.c minizip/mztools.c minizip/unzip.c minizip/zip.c \
hts-indextmpl.h htsalias.h htsback.h htsbase.h htssafe.h \
htsbasenet.h htsbauth.h htscache.h htscatchurl.h \
htsbasenet.h htsbauth.h htscache.h htscache_selftest.h htscatchurl.h \
htsconfig.h htscore.h htsparse.h htscoremain.h htsdefines.h \
htsfilters.h htsftp.h htsglobal.h htshash.h coucal/coucal.h \
htshelp.h htsindex.h htslib.h htsmd5.h \

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 2014 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -1516,9 +1518,11 @@ int back_add(struct_back * sback, httrackp * opt, cache_back * cache, const char
if (sscanf(text, "%d", &code) == 1) { // got code
back[p].r.statuscode = code;
back[p].status = STATUS_READY; // terminé
back[p].status = STATUS_READY; // done
if (lf != NULL && *lf != '\0') { // got location ?
strcpybuff(back[p].r.location, lf + 1);
// r.location aliases location_buffer (set above); write the array
// so the bounded macro picks up its capacity.
strcpybuff(back[p].location_buffer, lf + 1);
}
return 0;
}
@@ -3996,26 +4000,30 @@ LLint back_transferred(LLint nb, struct_back * sback) {
return nb;
}
// infos backing
// j: 1 afficher sockets 2 afficher autres 3 tout afficher
// backing info
// j: 1=show sockets 2=show others 3=show all
void back_info(struct_back * sback, int i, int j, FILE * fp) {
lien_back *const back = sback->lnk;
const int back_max = sback->count;
assertf(i >= 0 && i < back_max);
if (back[i].status >= 0) {
char BIGSTK s[HTS_URLMAXSIZE * 2 + 1024];
// Holds the status tag plus the full URL: url_adr and url_fil are each
// HTS_URLMAXSIZE*2, so reserve room for both (*4) plus framing/trailer.
// Undersizing would make back_infostr's bounded appends abort on a long
// URL.
char BIGSTK s[HTS_URLMAXSIZE * 4 + 1024];
s[0] = '\0';
back_infostr(sback, i, j, s);
back_infostr(sback, i, j, s, sizeof(s));
strcatbuff(s, LF);
fprintf(fp, "%s", s);
}
}
// infos backing
// j: 1 afficher sockets 2 afficher autres 3 tout afficher
void back_infostr(struct_back * sback, int i, int j, char *s) {
// backing info
// j: 1=show sockets 2=show others 3=show all
void back_infostr(struct_back *sback, int i, int j, char *s, size_t size) {
lien_back *const back = sback->lnk;
const int back_max = sback->count;
@@ -4025,16 +4033,16 @@ void back_infostr(struct_back * sback, int i, int j, char *s) {
if (j & 1) {
if (back[i].status == STATUS_CONNECTING) {
strcatbuff(s, "CONNECT ");
strlcatbuff(s, "CONNECT ", size);
} else if (back[i].status == STATUS_WAIT_HEADERS) {
strcatbuff(s, "INFOS ");
strlcatbuff(s, "INFOS ", size);
aff = 1;
} else if (back[i].status == STATUS_CHUNK_WAIT
|| back[i].status == STATUS_CHUNK_CR) {
strcatbuff(s, "INFOSC"); // infos chunk
strlcatbuff(s, "INFOSC", size); // chunk info
aff = 1;
} else if (back[i].status > 0) {
strcatbuff(s, "RECEIVE ");
strlcatbuff(s, "RECEIVE ", size);
aff = 1;
}
}
@@ -4042,44 +4050,44 @@ void back_infostr(struct_back * sback, int i, int j, char *s) {
if (back[i].status == STATUS_READY) {
switch (back[i].r.statuscode) {
case 200:
strcatbuff(s, "READY ");
strlcatbuff(s, "READY ", size);
aff = 1;
break;
case -1:
strcatbuff(s, "ERROR ");
strlcatbuff(s, "ERROR ", size);
aff = 1;
break;
case -2:
strcatbuff(s, "TIMEOUT ");
strlcatbuff(s, "TIMEOUT ", size);
aff = 1;
break;
case -3:
strcatbuff(s, "TOOSLOW ");
strlcatbuff(s, "TOOSLOW ", size);
aff = 1;
break;
case 400:
strcatbuff(s, "BADREQUEST ");
strlcatbuff(s, "BADREQUEST ", size);
aff = 1;
break;
case 401:
case 403:
strcatbuff(s, "FORBIDDEN ");
strlcatbuff(s, "FORBIDDEN ", size);
aff = 1;
break;
case 404:
strcatbuff(s, "NOT FOUND ");
strlcatbuff(s, "NOT FOUND ", size);
aff = 1;
break;
case 500:
strcatbuff(s, "SERVERROR ");
strlcatbuff(s, "SERVERROR ", size);
aff = 1;
break;
default:
{
char s2[256];
sprintf(s2, "ERROR(%d)", back[i].r.statuscode);
strcatbuff(s, s2);
snprintf(s2, sizeof(s2), "ERROR(%d)", back[i].r.statuscode);
strlcatbuff(s, s2, size);
}
aff = 1;
}
@@ -4090,16 +4098,18 @@ void back_infostr(struct_back * sback, int i, int j, char *s) {
{
char BIGSTK s2[HTS_URLMAXSIZE * 2 + 1024];
sprintf(s2, "\"%s", back[i].url_adr);
strcatbuff(s, s2);
snprintf(s2, sizeof(s2), "\"%s", back[i].url_adr);
strlcatbuff(s, s2, size);
if (back[i].url_fil[0] != '/')
strcatbuff(s, "/");
sprintf(s2, "%s\" ", back[i].url_fil);
strcatbuff(s, s2);
sprintf(s, LLintP " " LLintP " ", (LLint) back[i].r.size,
(LLint) back[i].r.totalsize);
strcatbuff(s, s2);
strlcatbuff(s, "/", size);
snprintf(s2, sizeof(s2), "%s\" ", back[i].url_fil);
strlcatbuff(s, s2, size);
// size/totalsize trailer: build in s2, then append (the old code wrote
// straight into s here, clobbering the URL it had just assembled).
snprintf(s2, sizeof(s2), LLintP " " LLintP " ", (LLint) back[i].r.size,
(LLint) back[i].r.totalsize);
strlcatbuff(s, s2, size);
}
}
}

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -127,7 +129,7 @@ int back_trylive(httrackp * opt, cache_back * cache, struct_back * sback,
int back_finalize(httrackp * opt, cache_back * cache, struct_back * sback,
const int p);
void back_info(struct_back * sback, int i, int j, FILE * fp);
void back_infostr(struct_back * sback, int i, int j, char *s);
void back_infostr(struct_back *sback, int i, int j, char *s, size_t size);
LLint back_transferred(LLint add, struct_back * sback);
// hostback

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,6 +1,8 @@
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -1105,20 +1107,24 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
//
cache_rint(cache->olddat, &r.statuscode);
cache_rLLint(cache->olddat, &r.size);
cache_rstr(cache->olddat, r.msg);
cache_rstr(cache->olddat, r.contenttype);
cache_rstr(cache->olddat, r.msg, sizeof(r.msg));
cache_rstr(cache->olddat, r.contenttype, sizeof(r.contenttype));
if (cache->version >= 3)
cache_rstr(cache->olddat, r.charset);
cache_rstr(cache->olddat, r.lastmodified);
cache_rstr(cache->olddat, r.etag);
cache_rstr(cache->olddat, r.location);
cache_rstr(cache->olddat, r.charset, sizeof(r.charset));
cache_rstr(cache->olddat, r.lastmodified, sizeof(r.lastmodified));
cache_rstr(cache->olddat, r.etag, sizeof(r.etag));
// r.location points into a HTS_URLMAXSIZE*2 buffer
cache_rstr(cache->olddat, r.location, HTS_URLMAXSIZE * 2);
if (cache->version >= 2)
cache_rstr(cache->olddat, r.cdispo);
cache_rstr(cache->olddat, r.cdispo, sizeof(r.cdispo));
if (cache->version >= 4) {
cache_rstr(cache->olddat, previous_save); // adr
cache_rstr(cache->olddat, previous_save); // fil
cache_rstr(cache->olddat, previous_save,
sizeof(previous_save)); // adr
cache_rstr(cache->olddat, previous_save,
sizeof(previous_save)); // fil
previous_save[0] = '\0';
cache_rstr(cache->olddat, previous_save); // save
cache_rstr(cache->olddat, previous_save,
sizeof(previous_save)); // save
if (return_save != NULL) {
strlcpybuff(return_save, previous_save, HTS_URLMAXSIZE * 2);
}
@@ -1127,8 +1133,8 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
r.headers = cache_rstr_addr(cache->olddat);
}
//
cache_rstr(cache->olddat, check);
if (strcmp(check, "HTS") == 0) { /* intégrité OK */
cache_rstr(cache->olddat, check, sizeof(check));
if (strcmp(check, "HTS") == 0) { /* integrity OK */
ok = 1;
}
cache_rLLint(cache->olddat, &size_read); /* read size to be sure of the declared size (rewrite) */
@@ -1767,12 +1773,12 @@ void cache_init(cache_back * cache, httrackp * opt) {
char firstline[256];
char *a = cache->use;
a += cache_brstr(a, firstline);
if (strncmp(firstline, "CACHE-", 6) == 0) { // Nouvelle version du cache
if (strncmp(firstline, "CACHE-1.", 8) == 0) { // Version 1.1x
a += cache_brstr(a, firstline, sizeof(firstline));
if (strncmp(firstline, "CACHE-", 6) == 0) { // new cache format
if (strncmp(firstline, "CACHE-1.", 8) == 0) { // version 1.1x
cache->version = (int) (firstline[8] - '0'); // cache 1.x
if (cache->version <= 5) {
a += cache_brstr(a, firstline);
a += cache_brstr(a, firstline, sizeof(firstline));
strcpybuff(cache->lastmodified, firstline);
} else {
hts_log_print(opt, LOG_ERROR,
@@ -1783,7 +1789,7 @@ void cache_init(cache_back * cache, httrackp * opt) {
freet(cache->use);
cache->use = NULL;
}
} else { // not supported
} else { // not supported
hts_log_print(opt, LOG_ERROR,
"Cache: %s not supported, ignoring current cache",
firstline);
@@ -1793,7 +1799,7 @@ void cache_init(cache_back * cache, httrackp * opt) {
cache->use = NULL;
}
/* */
} else { // old cache version
} else { // old cache version
/* */
hts_log_print(opt, LOG_WARNING,
"Cache: importing old cache format");
@@ -2118,7 +2124,7 @@ int cache_wstr(FILE * fp, const char *s) {
return -1;
return 0;
}
void cache_rstr(FILE * fp, char *s) {
void cache_rstr(FILE *fp, char *s, size_t s_size) {
INTsys i;
char buff[256 + 4];
@@ -2127,13 +2133,26 @@ void cache_rstr(FILE * fp, char *s) {
if (i < 0 || i > 32768) /* error, something nasty happened */
i = 0;
if (i > 0) {
if ((int) fread(s, 1, i, fp) != i) {
/* Store at most s_size-1 bytes into s, but consume all i bytes from the
stream so the next field stays aligned (the field may be longer than the
destination in a tampered/old cache). */
const size_t want = (size_t) i;
const size_t store = want < s_size ? want : s_size - 1;
if (fread(s, 1, store, fp) != store) {
int fread_cache_failed = 0;
assertf(fread_cache_failed);
}
if (want > store && fseek(fp, (long) (want - store), SEEK_CUR) != 0) {
int fseek_cache_failed = 0;
assertf(fseek_cache_failed);
}
s[store] = '\0';
} else {
s[0] = '\0';
}
*(s + i) = '\0';
}
char *cache_rstr_addr(FILE * fp) {
INTsys i;
@@ -2157,7 +2176,7 @@ char *cache_rstr_addr(FILE * fp) {
}
return addr;
}
int cache_brstr(char *adr, char *s) {
int cache_brstr(char *adr, char *s, size_t s_size) {
int i;
int off;
char buff[256 + 4];
@@ -2165,23 +2184,17 @@ int cache_brstr(char *adr, char *s) {
off = binput(adr, buff, 256);
adr += off;
sscanf(buff, "%d", &i);
if (i > 0)
strncpy(s, adr, i);
*(s + i) = '\0';
off += i;
return off;
}
int cache_quickbrstr(char *adr, char *s) {
int i;
int off;
char buff[256 + 4];
if (i < 0 || i > 32768) /* guard a corrupt length */
i = 0;
if (i > 0) {
/* copy at most s_size-1 bytes; advance past the full field regardless */
const size_t store = (size_t) i < s_size ? (size_t) i : s_size - 1;
off = binput(adr, buff, 256);
adr += off;
sscanf(buff, "%d", &i);
if (i > 0)
strncpy(s, adr, i);
*(s + i) = '\0';
strncpy(s, adr, store);
s[store] = '\0';
} else {
s[0] = '\0';
}
off += i;
return off;
}
@@ -2189,7 +2202,7 @@ int cache_quickbrstr(char *adr, char *s) {
/* same, but as an int */
int cache_brint(char *adr, int *i) {
char s[256];
int r = cache_brstr(adr, s);
int r = cache_brstr(adr, s, sizeof(s));
if (r != -1)
sscanf(s, "%d", i);
@@ -2198,7 +2211,7 @@ int cache_brint(char *adr, int *i) {
void cache_rint(FILE * fp, int *i) {
char s[256];
cache_rstr(fp, s);
cache_rstr(fp, s, sizeof(s));
sscanf(s, "%d", i);
}
int cache_wint(FILE * fp, int i) {
@@ -2210,7 +2223,7 @@ int cache_wint(FILE * fp, int i) {
void cache_rLLint(FILE * fp, LLint * i) {
char s[256];
cache_rstr(fp, s);
cache_rstr(fp, s, sizeof(s));
sscanf(s, LLintP, i);
}
int cache_wLLint(FILE * fp, LLint i) {

View File

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -80,10 +82,9 @@ int cache_writedata(FILE * cache_ndx, FILE * cache_dat, const char *str1,
int cache_readdata(cache_back * cache, const char *str1, const char *str2,
char **inbuff, int *len);
void cache_rstr(FILE * fp, char *s);
void cache_rstr(FILE *fp, char *s, size_t s_size);
char *cache_rstr_addr(FILE * fp);
int cache_brstr(char *adr, char *s);
int cache_quickbrstr(char *adr, char *s);
int cache_brstr(char *adr, char *s, size_t s_size);
int cache_brint(char *adr, int *i);
void cache_rint(FILE * fp, int *i);
void cache_rLLint(FILE * fp, LLint * i);

376
src/htscache_selftest.c Normal file

@@ -0,0 +1,376 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 2026 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Important notes:
- We hereby ask people using this source NOT to use it in purpose of grabbing
emails addresses, or collecting any other private information on persons.
This would disgrace our work, and spoil the many hours we spent on it.
Please visit our Website: http://www.httrack.com
*/
/* ------------------------------------------------------------ */
/* File: htscache_selftest.c subroutines: */
/* in-process self-test for the (ZIP) cache subsystem */
/* Author: Xavier Roche */
/* ------------------------------------------------------------ */
/* Drives the public cache API (cache_init / cache_add / cache_readex)
through a create -> read -> update cycle on a real on-disk ZIP cache,
asserting every header field and the (binary-safe) body round-trips.
Besides a few hand-crafted edge cases it stores a few thousand entries
(index/lookup scale) and a handful of large compressible/incompressible
bodies (zlib deflate/inflate). Reached via `httrack -#A <dir>`. */
#define HTS_INTERNAL_BYTECODE
#include "htscache_selftest.h"
#include "htscache.h"
#include "htscore.h"
#include "htslib.h"
#include "htszlib.h"
#include <stdio.h>
#include <string.h>
#define SELFTEST_VOLUME 3000 /* number of small entries in the scale pass */
/* Open a cache session. A write session (ro=0) rotates new.zip -> old.zip and
opens a fresh new.zip; a read session (ro=1) opens new.zip in place. */
static void selftest_open(cache_back *cache, httrackp *opt, int ro) {
memset(cache, 0, sizeof(*cache));
cache->type = 1;
cache->log = stderr;
cache->errlog = stderr;
cache->hashtable = coucal_new(0);
cache->ro = ro;
cache_init(cache, opt);
}
static void selftest_open_for_write(cache_back *cache, httrackp *opt) {
selftest_open(cache, opt, 0);
}
static void selftest_open_for_read(cache_back *cache, httrackp *opt) {
selftest_open(cache, opt, 1);
}
static void selftest_close(cache_back *cache) {
if (cache->dat != NULL) {
fclose(cache->dat);
cache->dat = NULL;
}
if (cache->ndx != NULL) {
fclose(cache->ndx);
cache->ndx = NULL;
}
if (cache->zipOutput != NULL) {
zipClose(cache->zipOutput,
"Created by HTTrack Website Copier (cache self-test)");
cache->zipOutput = NULL;
}
if (cache->zipInput != NULL) {
unzClose(cache->zipInput);
cache->zipInput = NULL;
}
/* hashtable is intentionally not coucal_delete()d: it would dump a stats
summary to stderr on every call, and this is a one-shot CLI subcommand
that exits right after (same choice as the other -# cache subcommands) */
}
/* Store one entry. The body is copied into a private buffer (any size), so
callers may pass const data and cache_add never sees a cast-away qualifier;
it consumes everything synchronously, so the copy is freed on return. */
static void store_entry(httrackp *opt, cache_back *cache, const char *adr,
const char *fil, const char *save, int statuscode,
const char *msg, const char *contenttype,
const char *charset, const char *lastmodified,
const char *etag, const char *location,
const char *body, size_t body_len) {
htsblk r;
char locbuf[HTS_URLMAXSIZE * 2];
char *bodycopy = NULL;
hts_init_htsblk(&r);
r.statuscode = statuscode;
r.size = (LLint) body_len;
strcpybuff(r.msg, msg);
strcpybuff(r.contenttype, contenttype);
strcpybuff(r.charset, charset);
strcpybuff(r.lastmodified, lastmodified);
strcpybuff(r.etag, etag);
strcpybuff(locbuf, location);
r.location = locbuf;
r.is_write = 0;
/* an empty body must be a NULL pointer: cache_add rejects a non-NULL
pointer with size 0 */
if (body_len != 0) {
bodycopy = malloct(body_len);
memcpy(bodycopy, body, body_len);
r.adr = bodycopy;
}
/* all_in_cache=1: keep the body in the ZIP whatever the content-type,
so the read path never depends on a file on disk */
cache_add(opt, cache, &r, adr, fil, save, 1, NULL);
if (bodycopy != NULL) {
freet(bodycopy);
}
}
/* Read one entry back and check every field. Returns the number of
mismatches (0 == success). */
static int check_entry(httrackp *opt, cache_back *cache, const char *adr,
const char *fil, int statuscode, const char *msg,
const char *contenttype, const char *charset,
const char *lastmodified, const char *etag,
const char *location, const char *body,
size_t body_len) {
int fail = 0;
char *locbuf = malloct(HTS_URLMAXSIZE * 2);
htsblk r;
locbuf[0] = '\0';
/* readonly=1: pure read, no rename/disk-write decision logic */
r = cache_readex(opt, cache, adr, fil, "", locbuf, NULL, 1);
#define CHECK_STR(field, want) \
do { \
if (strcmp((field), (want)) != 0) { \
fprintf(stderr, \
"cache-selftest: %s%s: " #field " is '%s', expected '%s'\n", \
adr, fil, (field), (want)); \
fail++; \
} \
} while (0)
if (r.statuscode != statuscode) {
fprintf(stderr, "cache-selftest: %s%s: statuscode is %d, expected %d\n",
adr, fil, r.statuscode, statuscode);
fail++;
}
CHECK_STR(r.msg, msg);
CHECK_STR(r.contenttype, contenttype);
CHECK_STR(r.charset, charset);
CHECK_STR(r.lastmodified, lastmodified);
CHECK_STR(r.etag, etag);
CHECK_STR(locbuf, location);
if (r.size != (LLint) body_len) {
fprintf(stderr, "cache-selftest: %s%s: size is " LLintP ", expected %d\n",
adr, fil, (LLint) r.size, (int) body_len);
fail++;
} else if (body_len != 0 &&
(r.adr == NULL || memcmp(r.adr, body, body_len) != 0)) {
fprintf(stderr, "cache-selftest: %s%s: body mismatch\n", adr, fil);
fail++;
}
#undef CHECK_STR
if (r.adr != NULL) {
freet(r.adr);
}
freet(locbuf);
return fail;
}
/* Fill a body of the requested size. kind 0 is highly compressible (a short
repeating pattern), kind 1 is incompressible (a deterministic PRNG), kind 2
alternates the two -- together they exercise both deflate outcomes. */
static void gen_body(char *buf, size_t len, int kind) {
unsigned int seed = 0x9e3779b1u ^ (unsigned int) len;
size_t j;
for (j = 0; j < len; j++) {
if (kind == 0 || (kind == 2 && (j & 1) == 0)) {
buf[j] = (char) ('A' + (j % 26));
} else {
seed = seed * 1103515245u + 12345u;
buf[j] = (char) (seed >> 16);
}
}
}
int cache_selftests(httrackp *opt, const char *dir) {
int failures = 0;
cache_back cache;
int i;
/* near-limit field values. The etag stresses htsblk.etag[256]; the location
stresses a long redirect URL. Each cached header line is read back through
a HTS_URLMAXSIZE-sized parse buffer ("<field>: <value>\r\n"), so the
round-trippable value is shorter than HTS_URLMAXSIZE: 1000 stays safely
under that real limit. */
static char etag_long[251];
static char location_long[1001];
/* a body with embedded NUL and high bytes, to prove binary safety */
static const char binary_body[] = {
'P', 'N', 'G', '\0', '\r', '\n', (char) 0xFF, (char) 0x80,
'\0', '\0', 'e', 'n', 'd', (char) 0xCA, (char) 0xFE, '\n'};
/* large bodies for the compression pass; kept alive across the write and
read passes so the read can compare against them */
static const size_t large_size[] = {200000, 200000, 50000};
const int large_count = (int) (sizeof(large_size) / sizeof(large_size[0]));
char *large_body[3];
/* edge-case bodies, named so store and read assert the exact same bytes */
const char *const body_index = "<html><body>hello</body></html>";
const char *const body_api = "{\"k\":\"v\"}";
const char *const body_updated = "<html><body>UPDATED CONTENT</body></html>";
const char *const body_404 = "<html><body>404 Not Found</body></html>";
memset(etag_long, 'E', sizeof(etag_long) - 1);
etag_long[sizeof(etag_long) - 1] = '\0';
memset(location_long, 'L', sizeof(location_long) - 1);
location_long[sizeof(location_long) - 1] = '\0';
for (i = 0; i < large_count; i++) {
large_body[i] = malloct(large_size[i]);
gen_body(large_body[i], large_size[i], i);
}
/* set up an isolated cache directory */
{
char base[HTS_URLMAXSIZE];
strcpybuff(base, dir);
if (base[0] != '\0' && base[strlen(base) - 1] != '/') {
strcatbuff(base, "/");
}
StringCopy(opt->path_log, base);
}
opt->cache = 1;
/* pass 1: create everything in a single write session */
selftest_open_for_write(&cache, opt);
/* edge cases: normal HTML page */
store_entry(opt, &cache, "example.com", "/", "example.com/index.html", 200,
"OK", "text/html", "utf-8", "Mon, 01 Jan 2024 00:00:00 GMT",
"etag-normal", "", body_index, strlen(body_index));
/* redirect: empty body, empty optional fields, near-limit location */
store_entry(opt, &cache, "example.com", "/moved", "example.com/moved.html",
301, "Moved Permanently", "text/html", "", "", "", location_long,
NULL, 0);
/* non-HTML content-type kept in cache via all_in_cache, near-limit etag */
store_entry(opt, &cache, "example.com", "/api", "example.com/api.json", 200,
"OK", "application/json", "utf-8",
"Tue, 02 Jan 2024 12:00:00 GMT", etag_long, "", body_api,
strlen(body_api));
/* binary body */
store_entry(opt, &cache, "example.com", "/logo", "example.com/logo.png", 200,
"OK", "image/png", "", "", "etag-bin", "", binary_body,
sizeof(binary_body));
/* error status with a body and a location (non-2xx codes are cached too) */
store_entry(opt, &cache, "example.com", "/gone", "example.com/gone.html", 404,
"Not Found", "text/html", "utf-8", "", "etag-404",
"https://example.com/where-it-went", body_404, strlen(body_404));
/* scale: a few thousand small entries */
for (i = 0; i < SELFTEST_VOLUME; i++) {
char fil[64], save[128], body[64];
sprintf(fil, "/v/%05d", i);
sprintf(save, "example.com/v/%05d.html", i);
sprintf(body, "<html>volume entry %d</html>", i);
store_entry(opt, &cache, "example.com", fil, save, 200, "OK", "text/html",
"utf-8", "", "", "", body, strlen(body));
}
/* compression: a few large bodies */
for (i = 0; i < large_count; i++) {
char fil[64], save[128];
sprintf(fil, "/big/%d.bin", i);
sprintf(save, "example.com/big/%d.bin", i);
store_entry(opt, &cache, "example.com", fil, save, 200, "OK",
"application/octet-stream", "", "", "", "", large_body[i],
large_size[i]);
}
selftest_close(&cache);
/* pass 2: read back and verify everything round-tripped */
selftest_open_for_read(&cache, opt);
failures += check_entry(opt, &cache, "example.com", "/", 200, "OK",
"text/html", "utf-8", "Mon, 01 Jan 2024 00:00:00 GMT",
"etag-normal", "", body_index, strlen(body_index));
failures += check_entry(opt, &cache, "example.com", "/moved", 301,
"Moved Permanently", "text/html", "", "", "",
location_long, NULL, 0);
failures +=
check_entry(opt, &cache, "example.com", "/api", 200, "OK",
"application/json", "utf-8", "Tue, 02 Jan 2024 12:00:00 GMT",
etag_long, "", body_api, strlen(body_api));
failures +=
check_entry(opt, &cache, "example.com", "/logo", 200, "OK", "image/png",
"", "", "etag-bin", "", binary_body, sizeof(binary_body));
failures += check_entry(opt, &cache, "example.com", "/gone", 404, "Not Found",
"text/html", "utf-8", "", "etag-404",
"https://example.com/where-it-went", body_404,
strlen(body_404));
for (i = 0; i < SELFTEST_VOLUME; i++) {
char fil[64], body[64];
sprintf(fil, "/v/%05d", i);
sprintf(body, "<html>volume entry %d</html>", i);
failures +=
check_entry(opt, &cache, "example.com", fil, 200, "OK", "text/html",
"utf-8", "", "", "", body, strlen(body));
}
for (i = 0; i < large_count; i++) {
char fil[64];
sprintf(fil, "/big/%d.bin", i);
failures += check_entry(opt, &cache, "example.com", fil, 200, "OK",
"application/octet-stream", "", "", "", "",
large_body[i], large_size[i]);
}
selftest_close(&cache);
/* pass 3: update one edge entry with new body and headers */
selftest_open_for_write(&cache, opt);
store_entry(opt, &cache, "example.com", "/", "example.com/index.html", 200,
"OK", "text/html", "iso-8859-1", "Wed, 03 Jan 2024 09:30:00 GMT",
"etag-updated", "", body_updated, strlen(body_updated));
selftest_close(&cache);
/* pass 4: re-read and confirm the updated value, not the old one */
selftest_open_for_read(&cache, opt);
failures +=
check_entry(opt, &cache, "example.com", "/", 200, "OK", "text/html",
"iso-8859-1", "Wed, 03 Jan 2024 09:30:00 GMT", "etag-updated",
"", body_updated, strlen(body_updated));
selftest_close(&cache);
for (i = 0; i < large_count; i++) {
freet(large_body[i]);
}
return failures;
}
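The comment in `cache_selftests()` about near-limit field values is a small piece of arithmetic: a header line of the form `<field>: <value>\r\n` has to fit in the parse buffer, so the usable value length is the buffer size minus the framing. The sketch below works that out under stated assumptions: `PARSE_BUF_SIZE` is a placeholder of 1024 for `HTS_URLMAXSIZE` (its real value is not shown in this diff), `max_header_value_len` is a hypothetical helper, and the field name `Location` and the extra byte for the terminating NUL are guesses about how the real parser frames the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed parse-buffer size; a placeholder for HTS_URLMAXSIZE, whose real
   value is defined elsewhere in the tree and is not shown in this diff. */
#define PARSE_BUF_SIZE 1024

/* Longest value that still fits "<field>: <value>\r\n" plus the terminating
   NUL in a buffer of buf_size bytes. Returns 0 if the framing alone does
   not fit. */
static size_t max_header_value_len(const char *field, size_t buf_size) {
  const size_t framing = strlen(field) /* field name */
                         + 2           /* ": " */
                         + 2           /* "\r\n" */
                         + 1;          /* NUL */
  return buf_size > framing ? buf_size - framing : 0;
}
```

Under those assumptions a `Location` line leaves room for 1011 value bytes, which is consistent with the test choosing a 1000-character location rather than one of full buffer length.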

51
src/htscache_selftest.h Normal file

@@ -0,0 +1,51 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 2026 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Important notes:
- We hereby ask people using this source NOT to use it in purpose of grabbing
emails addresses, or collecting any other private information on persons.
This would disgrace our work, and spoil the many hours we spent on it.
Please visit our Website: http://www.httrack.com
*/
/* ------------------------------------------------------------ */
/* File: htscache_selftest.h */
/* Author: Xavier Roche */
/* ------------------------------------------------------------ */
#ifndef HTSCACHE_SELFTEST_DEFH
#define HTSCACHE_SELFTEST_DEFH
#ifdef HTS_INTERNAL_BYTECODE
#ifndef HTS_DEF_FWSTRUCT_httrackp
#define HTS_DEF_FWSTRUCT_httrackp
typedef struct httrackp httrackp;
#endif
/* Run the cache create/read/update self-test against a working directory.
Returns the number of failed checks (0 == success). */
int cache_selftests(httrackp *opt, const char *dir);
#endif
#endif

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2013 Xavier Roche and other contributors
Copyright (C) 2014 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -3993,5 +3995,5 @@ void voidf(void) {
(void) a;
}
// HTTrack Website Copier Copyright (C) 1998-2017 Xavier Roche and other contributors
// HTTrack Website Copier Copyright (C) 1998 Xavier Roche and other contributors
//

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -46,6 +48,7 @@ Please visit our Website: http://www.httrack.com
#include "htszlib.h"
#include "htscharset.h"
#include "htsencoding.h"
#include "htscache_selftest.h"
#include "htsmd5.h"
#include <ctype.h>
@@ -152,6 +155,89 @@ static void basic_selftests(void) {
assertf(strcmp(cookie_get(cbuf, "a\t\tc", 1), "") == 0); // empty field
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 9), "") == 0); // beyond last
}
// back_infostr() status-line formatting (no sockets: pure formatting over
// in-memory slots). Stresses a few thousand entries across every status-code
// arm. Regression for a clobber bug where the size/totalsize trailer was
// written straight into the destination, wiping the URL it had just built.
{
static const struct {
int code;
const char *tag;
} cases[] = {
{200, "READY "}, {-1, "ERROR "}, {-2, "TIMEOUT "},
{-3, "TOOSLOW "}, {400, "BADREQUEST "}, {403, "FORBIDDEN "},
{404, "NOT FOUND "}, {500, "SERVERROR "}, {999, "ERROR(999)"},
};
const int ncases = (int) (sizeof(cases) / sizeof(cases[0]));
const int n = 2000;
lien_back *slots = calloct(n, sizeof(lien_back));
char line[HTS_URLMAXSIZE * 4 + 1024];
char expect[HTS_URLMAXSIZE * 4 + 1024];
struct_back sb;
int idx;
sb.lnk = slots;
sb.count = n;
sb.ready = NULL;
sb.ready_size_bytes = 0;
for (idx = 0; idx < n; idx++) {
lien_back *const slot = &slots[idx];
slot->r.location = slot->location_buffer;
slot->status = STATUS_READY;
slot->r.statuscode = cases[idx % ncases].code;
slot->r.size = idx;
slot->r.totalsize = idx + 1;
snprintf(slot->url_adr, sizeof(slot->url_adr), "http://h%d.example", idx);
snprintf(slot->url_fil, sizeof(slot->url_fil), "/p/%d.html", idx);
}
for (idx = 0; idx < n; idx++) {
line[0] = '\0';
back_infostr(&sb, idx, 3, line, sizeof(line));
// Exact match (not substring): pins tag/URL/trailer order and rejects a
// partial clobber, duplication, or truncation that a presence check would
// let through. The expected format is stated here independently.
snprintf(expect, sizeof(expect),
"%s\"http://h%d.example/p/%d.html\" " LLintP " " LLintP " ",
cases[idx % ncases].tag, idx, idx, (LLint) idx,
(LLint) (idx + 1));
assertf(strcmp(line, expect) == 0);
}
// Near-maximal URL, driven through back_info() (which owns the status
// buffer internally and prints to a FILE*). url_adr + url_fil together
// overrun the old HTS_URLMAXSIZE*2+1024 buffer, so the bounded appends
// would abort unless that buffer is sized to hold both fields. Regression
// for that sizing -- exercising back_infostr() directly would miss it,
// since the caller's buffer is what matters.
{
lien_back *const slot = &slots[0];
const size_t adrlen = sizeof(slot->url_adr) - 8;
const size_t fillen = sizeof(slot->url_fil) - 8;
FILE *const fp = tmpfile();
size_t got;
assertf(fp != NULL);
slot->status = STATUS_READY;
slot->r.statuscode = 200;
slot->r.size = 1;
slot->r.totalsize = 2;
memset(slot->url_adr, 'a', adrlen);
slot->url_adr[adrlen] = '\0';
slot->url_fil[0] = '/';
memset(slot->url_fil + 1, 'b', fillen - 1);
slot->url_fil[fillen] = '\0';
back_info(&sb, 0, 3, fp);
rewind(fp);
got = fread(line, 1, sizeof(line) - 1, fp);
line[got] = '\0';
fclose(fp);
snprintf(expect, sizeof(expect),
"READY \"%s%s\" " LLintP " " LLintP " " LF, slot->url_adr,
slot->url_fil, (LLint) 1, (LLint) 2);
assertf(strcmp(line, expect) == 0);
}
freet(slots);
}
}
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
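The clobber bug that the `back_infostr()` self-test guards against is easy to reproduce in isolation. The sketch below contrasts the two shapes: formatting the size/totalsize trailer straight into the destination (which overwrites the URL already assembled there) versus formatting it into a scratch buffer and appending it. All names here (`append_bounded`, `format_clobber`, `format_append`) are hypothetical, and the line format is simplified. Unlike the htssafe.h macros, which the commit message describes as aborting on overflow, this helper just returns 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded append: add src to dst only if the result fits in
   dst_size bytes. Returns 1 on success, 0 if it would not fit (dst is left
   unchanged). */
static int append_bounded(char *dst, size_t dst_size, const char *src) {
  const size_t used = strlen(dst);
  const size_t need = strlen(src);
  if (used + need + 1 > dst_size)
    return 0;
  memcpy(dst + used, src, need + 1);
  return 1;
}

/* Buggy shape: the trailer is formatted at the start of dst, so the URL that
   was just appended is overwritten. */
static void format_clobber(char *dst, size_t dst_size, const char *url,
                           long size, long total) {
  dst[0] = '\0';
  (void) append_bounded(dst, dst_size, url);
  snprintf(dst, dst_size, " %ld %ld ", size, total);
}

/* Fixed shape: build the trailer in a scratch buffer, then append it. */
static int format_append(char *dst, size_t dst_size, const char *url,
                         long size, long total) {
  char scratch[64];
  dst[0] = '\0';
  if (!append_bounded(dst, dst_size, url))
    return 0;
  snprintf(scratch, sizeof(scratch), " %ld %ld ", size, total);
  return append_bounded(dst, dst_size, scratch);
}
```

An exact-match comparison, as the self-test uses, catches the buggy shape immediately: the clobbered line contains only the trailer, which a check for "the trailer is present" would still accept.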
@@ -2113,6 +2199,19 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
case '#':{ // undocumented
com++;
switch (*com) {
case 'A': // cache self-test: httrack -#A <dir>
if (na + 1 < argc) {
const int err = cache_selftests(opt, argv[na + 1]);
printf("cache-selftest: %s\n", err ? "FAIL" : "OK");
htsmain_free();
return err;
} else {
fprintf(stderr, "Option #A requires a directory argument\n");
htsmain_free();
return 1;
}
break;
case 'C': // list cache files : httrack -#C '*spid*.gif' will attempt to find the matching file
{
int hasFilter = 0;
@@ -2155,8 +2254,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char firstline[256];
char *a = cacheNdx;
a += cache_brstr(a, firstline);
a += cache_brstr(a, firstline);
a += cache_brstr(a, firstline, sizeof(firstline));
a += cache_brstr(a, firstline, sizeof(firstline));
while(a != NULL) {
a = strchr(a + 1, '\n'); /* start of line */
if (a) {

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 2013 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 2013 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -202,7 +204,7 @@ Please visit our Website: http://www.httrack.com
/* Max command-line size (>=HTS_URLMAXSIZE*2) */
#define HTS_CDLMAXSIZE 1024
/* Copyright (C) 1998-2017 Xavier Roche and other contributors */
/* Copyright (C) 1998 Xavier Roche and other contributors */
#define HTTRACK_AFF_AUTHORS "[XR&CO'2014]"
#define HTS_DEFAULT_FOOTER "<!-- Mirrored from %s%s by HTTrack Website Copier/" HTTRACK_AFF_VERSION " " HTTRACK_AFF_AUTHORS ", %s -->"
#define HTTRACK_WEB "http://www.httrack.com"

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -182,7 +184,8 @@ void help_wizard(httrackp * opt) {
printf("\n");
printf("Welcome to HTTrack Website Copier (Offline Browser) " HTTRACK_VERSION
"%s\n", hts_get_version_info(opt));
printf("Copyright (C) 1998-2017 Xavier Roche and other contributors\n");
printf("Copyright (C) 1998-%s Xavier Roche and other contributors\n",
&__DATE__[7]);
#ifdef _WIN32
printf("Note: You are running the commandline version,\n");
printf("run 'WinHTTrack.exe' to get the GUI version.\n");
@@ -795,7 +798,10 @@ void help(const char *app, int more) {
snprintf(info, sizeof(info), "HTTrack version " HTTRACK_VERSION "%s",
hts_is_available());
infomsg(info);
infomsg("Copyright (C) 1998-2017 Xavier Roche and other contributors");
snprintf(info, sizeof(info),
"Copyright (C) 1998-%s Xavier Roche and other contributors",
&__DATE__[7]);
infomsg(info);
#ifdef HTS_PLATFORM_NAME
infomsg("[compiled: " HTS_PLATFORM_NAME "]");
#endif
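The `&__DATE__[7]` expression in the two hunks above relies on the fixed layout of the standard `__DATE__` macro, which expands to an 11-character string of the form `"Mmm dd yyyy"`, so offset 7 is where the four-digit year begins. A minimal sketch of the same idea follows; `build_year` and `format_copyright` are hypothetical helper names, not functions from the HTTrack tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* __DATE__ expands to "Mmm dd yyyy" (11 characters), so the year starts at
   offset 7 and runs to the end of the string. */
static const char *build_year(void) {
  return &__DATE__[7];
}

/* Format the banner with the compile-time year; returns what snprintf()
   returns (the length the full string needs, excluding the NUL). */
static int format_copyright(char *dst, size_t dst_size) {
  return snprintf(dst, dst_size,
                  "Copyright (C) 1998-%s Xavier Roche and other contributors",
                  build_year());
}
```

The year shown is the year the binary was compiled, not the year of the last source change. Build setups that pin the compile date for reproducibility will therefore pin this string as well.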

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 2014 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -91,7 +93,8 @@ int main(int argc, char *argv[]) {
/* Args */
printf("ProxyTrack %s, build proxies upon HTTrack Website Copier Archives\n",
PROXYTRACK_VERSION);
printf("Copyright (C) 1998-2017 Xavier Roche and other contributors\n");
printf("Copyright (C) 1998-%s Xavier Roche and other contributors\n",
&__DATE__[7]);
printf("\n");
printf("This program is free software: you can redistribute it and/or modify\n");
printf("it under the terms of the GNU General Public License as published by\n");

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

@@ -1,7 +1,9 @@
/* ------------------------------------------------------------ */
/*
HTTrack Website Copier, Offline Browser for Windows and Unix
Copyright (C) 1998-2017 Xavier Roche and other contributors
Copyright (C) 1998 Xavier Roche and other contributors
SPDX-License-Identifier: GPL-3.0-or-later
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

tests/01_engine-cache.test (new executable file, 46 lines)

@@ -0,0 +1,46 @@
#!/bin/bash
#
# Cache create/read/update logic (driven by 'httrack -#A <dir>').
#
# The in-process self-test stores several hand-crafted edge entries (normal
# HTML, an empty redirect with a near-limit location, a non-HTML body kept via
# all-in-cache, a binary body with embedded NUL/high bytes), a few thousand
# small entries (index/lookup scale), and a few large compressible and
# incompressible bodies (zlib deflate/inflate). It reads everything back
# asserting every header field and the body round-trip byte for byte, then
# updates one entry and confirms the new value is read back. It exits non-zero
# on the first mismatch.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Like the other -# debug modes, a trailing token (the working directory) is
# required; a bare '-#A' falls through to the usage screen.
out=$(httrack -#A "$dir")
# Match the exact success line, so the test cannot pass for an unrelated reason
# (e.g. the -#A mode being gone and falling through to the usage screen, which
# also exits non-zero but never prints this).
test "$out" = "cache-selftest: OK" || {
echo "expected 'cache-selftest: OK', got: $out" >&2
exit 1
}
# The self-test must have actually produced a ZIP cache on disk.
test -e "$dir/hts-cache/new.zip" || {
echo "no ZIP cache was written by the self-test" >&2
exit 1
}
# Sanity-check the cache footprint: the few-thousand-entry pass is expected to
# weigh ~1-2 MB. Fail if it balloons well past that (e.g. a per-entry overhead
# regression or runaway growth), so the cache size stays bounded.
ceiling=$((4 * 1024 * 1024))
bytes=$(du -sb "$dir/hts-cache" | cut -f1)
test "$bytes" -le "$ceiling" || {
echo "cache footprint $bytes bytes exceeds ${ceiling} ceiling" >&2
exit 1
}
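
The footprint check at the end of this script can be tried in isolation. The sketch below is not part of the commit: `within_ceiling` is a hypothetical helper name, the 1 KiB file stands in for a real cache, and `du -sb` is assumed to be GNU du (the `-b` option reports apparent size in bytes and is not available in every du implementation).

```shell
#!/bin/bash
set -eu

# Hypothetical helper (not in the commit): succeed when the directory's
# apparent size in bytes is at or below the given ceiling.
within_ceiling() {
  local dir=$1 ceiling=$2 bytes
  bytes=$(du -sb "$dir" | cut -f1) # GNU du: -b = apparent size in bytes
  test "$bytes" -le "$ceiling"
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Stand-in for a cache directory: a single 1 KiB file.
head -c 1024 /dev/zero >"$demo/small.bin"

if within_ceiling "$demo" $((4 * 1024 * 1024)); then
  verdict="within ceiling"
else
  verdict="over ceiling"
fi
echo "$verdict"
```

The directory entry itself contributes a filesystem-dependent amount to the total, which is one reason a generous ceiling such as the 4 MiB used above is safer than an exact size comparison.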

tests/02_update-cache.test (new executable file, 62 lines)

@@ -0,0 +1,62 @@
#!/bin/bash
#
# Update path: re-mirroring a site reads the cache (cache_readex) to decide what
# is up to date -- a path the one-shot crawl tests never exercise. Offline
# (file://), so it always runs.
#
# 1. mirror, then re-mirror unchanged -> the cache-read pass must complete clean
# (guards against a crash/abort/error in cache_readex).
# 2. change a source file, re-mirror -> the update must pick up the new content
# (guards the update decision that reads the cached metadata).
set -eu
site=$(mktemp -d)
out=$(mktemp -d)
trap 'rm -rf "$site" "$out"' EXIT
cat >"$site/index.html" <<EOF
<a href="a.html">a</a> <a href="sub/b.html">b</a>
EOF
echo 'OLDCONTENT' >"$site/a.html"
mkdir -p "$site/sub"
echo '<p>bbb</p>' >"$site/sub/b.html"
url="file://$site/index.html"
# count Error: lines in the log (grep -c exits 1 on zero matches: guard it)
errors() { grep -ciE '^[0-9:]*[[:space:]]Error:' "$out/hts-log.txt" || true; }
# 1. fresh mirror writes the cache
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test -e "$out/hts-cache/new.zip" || {
echo "no cache was written" >&2
exit 1
}
# 2. re-mirror unchanged: the update reads the cache and must complete cleanly
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test "$(errors)" = 0 || {
echo "update (unchanged) reported errors" >&2
exit 1
}
for suffix in a.html sub/b.html; do
find "$out" -path "*/$suffix" | grep -q . || {
echo "missing $suffix after update" >&2
exit 1
}
done
# 3. change a source file: the update must pick up the new content
sleep 1
echo 'NEWCONTENT' >"$site/a.html"
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test "$(errors)" = 0 || {
echo "update (changed) reported errors" >&2
exit 1
}
grep -q NEWCONTENT "$(find "$out" -path '*/a.html')" || {
echo "update did not pick up the changed source" >&2
exit 1
}
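
The `|| true` guard in `errors()` matters because the script runs under `set -e`: `grep -c` still prints `0` when nothing matches, but it exits with status 1, which would otherwise abort the script. The sketch below shows the behaviour on a throwaway log. It is an illustration only: `count_errors` is a hypothetical variant that takes the log path as an argument, and the sample log lines are invented to fit the regular expression, not copied from a real `hts-log.txt`.

```shell
#!/bin/bash
set -eu

log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Invented sample lines; only their shape (timestamp, whitespace, level) matters.
printf '%s\n' '12:00:01 Info: engine started' '12:00:02 Info: done' >"$log"

# Same pattern as errors() in the test, with the log path as a parameter.
# Without "|| true", the zero-match case would terminate the script.
count_errors() { grep -ciE '^[0-9:]*[[:space:]]Error:' "$1" || true; }

clean=$(count_errors "$log")

printf '%s\n' '12:00:03 Error: sample failure' >>"$log"
dirty=$(count_errors "$log")

echo "clean=$clean dirty=$dirty" # prints: clean=0 dirty=1
```

Comparing the captured count against `0`, as the test does, keeps the pass condition explicit instead of relying on grep's exit status.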

@@ -9,8 +9,14 @@ TESTS_ENVIRONMENT += HTTPS_SUPPORT=$(HTTPS_SUPPORT)
TESTS_ENVIRONMENT += top_srcdir=$(top_srcdir)
TEST_EXTENSIONS = .test
# Run each .test through bash instead of execve()ing it. This lets "make check"
# work when the source tree sits on a noexec filesystem (the driver would
# otherwise fail with "Permission denied"), and removes any reliance on the
# scripts' executable bit. The scripts are #!/bin/bash and use bash features.
TEST_LOG_COMPILER = $(BASH)
TESTS = \
00_runnable.test \
01_engine-cache.test \
01_engine-charset.test \
01_engine-cmdline.test \
01_engine-entities.test \
@@ -22,6 +28,7 @@ TESTS = \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \
02_update-cache.test \
10_crawl-simple.test \
11_crawl-cookies.test \
11_crawl-idna.test \