Compare commits

...

12 Commits

Author SHA1 Message Date
Xavier Roche
90847cf083 ci: single-source the llvm tag so key and URL can't drift
The git-clang-format cache key and fetch URL each hardcoded llvmorg-19.1.7.
A future one-sided version bump would leave the cache serving the old driver
under a stale key. Pull the tag into an LLVM_TAG job env, mirroring how the
lint job already single-sources SHFMT_VERSION.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 21:44:18 +02:00
Xavier Roche
1611add5a9 ci: log cache hit vs cold fetch for both cached tools
Print an explicit HIT/MISS line in each install step so the job log shows
whether the binary came from the cache or was downloaded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 21:39:38 +02:00
Xavier Roche
92f4ea044b ci: cache the shfmt binary too
shfmt has the same shape as the git-clang-format driver: a pinned, immutable
release binary curled from github.com on every lint run. Cache it keyed on the
pinned version (and arch) so it is fetched at most once, and retry the cold
fetch through transient errors -- so the lint job stops depending on a
github.com download succeeding on every PR run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 21:38:51 +02:00
Xavier Roche
4344801983 ci: cache the git-clang-format driver instead of refetching it every run
The format job downloaded git-clang-format from raw.githubusercontent.com on
every run. The URL is pinned to an immutable tag (llvmorg-19.1.7), so the file
never changes, yet the repeated fetch was the one thing in the job that could
hit raw.githubusercontent.com's per-IP rate limit -- and on shared runner
egress IPs it did, failing the job with curl 429 (apt.llvm.org was fine; only
the step name suggested otherwise).

Cache the driver keyed on the tag so it is fetched at most once, and retry the
cold fetch through transient 429s with curl --retry --retry-all-errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 21:34:47 +02:00
Xavier Roche
25b9a53c89 Merge pull request #372 from xroche/cleanup/htsparse-bounds
Bound htsparse.c pointer-destination buffer writes
2026-06-16 21:32:24 +02:00
Xavier Roche
5a716a0e30 Bound htsparse.c pointer-destination buffer writes (batch 15)
The makeindex_firstlink_, base, codebase and loc_ aliases in the HTML
parser are bare char* views onto HTS_URLMAXSIZE*2 caller arrays, so
strcpybuff degraded to a raw strcpy (htssafe.h pointer-dest branch).
Bound all five with strlcpybuff(..., HTS_URLMAXSIZE*2), the documented
capacity of every target (makeindex_firstlink/base/codebase/loc in
htscore.c, r->location aliasing loc).

Behavior-preserving: each source (tempo, lien, back[].r.location) is
itself an HTS_URLMAXSIZE*2 buffer, so its NUL-terminated contents are
<= cap-1 and copy identically; no truncation is reachable. htsparse.c
now has zero pointer-destination warnings; htsserver.c (5) is the last
file before the stub can be flipped to an error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 21:20:01 +02:00
Xavier Roche
4bc6855213 Merge pull request #371 from xroche/cleanup/htsalias-bounds
Bound htsalias.c config-file alias buffer writes (batch 14)
2026-06-16 20:45:31 +02:00
Xavier Roche
fe8bd59d19 Bound htsalias.c pointer-destination buffer writes (batch 14)
htsalias.c keeps its own copy of htscoremain.c's cmdl_ins macro (config-file
alias expansion in optinclude_file). The copy still wrote alias-expanded tokens
into the argv block with an unbounded strcpybuff on a bare char*. Thread the
block capacity (x_argvblk_size) through optinclude_file and bound the insert
with strlcpybuff + cmdl_room, the same guard batch 13 applied to the original:
cmdl_room yields 0 instead of size_t-wrapping when the offset outruns the block,
so an alias/doit.log expansion bomb aborts cleanly rather than overflowing.

Adds 01_engine-rcfile.test, which had no coverage before: it drops a .httrackrc
with a long user-agent alias in the working directory, runs httrack with no -O
(the only way the rc files load), and checks the alias-expanded -F <value> token
reaches hts-cache/doit.log intact. user-agent expands to two tokens, exercising
both cmdl_ins insertions; a truncating bound is caught (verified by injecting
one).

htsalias.c pointer-destination warnings 2->0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 20:41:08 +02:00
Xavier Roche
83d813eb7f Merge pull request #370 from xroche/cleanup/htscoremain-bounds
Bound htscoremain.c pointer-destination buffer writes (batch 13)
2026-06-16 19:37:06 +02:00
Xavier Roche
31eead95df Bound htscoremain.c pointer-destination buffer writes (batch 13)
Continues the htssafe.h pointer-destination migration in the CLI parser
(hts_main_internal). All sites write into a bare char*.

* The cmdl_add()/cmdl_ins() macros build argv entries into the x_argvblk block
  (malloc'd as the command-line size + 32768). Thread the block's total size
  (recorded in a new x_argvblk_size) and bound the copy with strlcpybuff. The
  remaining room is computed by a cmdl_room() helper that yields 0 once the block
  is exhausted (alias expansion or doit.log insertion can outrun the 32768 slack)
  so the copy aborts cleanly instead of the size_t subtraction wrapping to a huge
  unbounded value.
* The in-place argv rewrites each write no more than the slot already holds, so
  they are bounded by strlen(dest)+1 (provably sufficient): the "(none)" ->
  "\"\"" replacement, the two quote-strip copies (tempo is argv[na] minus its
  surrounding quotes), and the "--catchurl" -> "-#P" rewrite. The "--clean"/
  "--tide" empty rewrite becomes a direct argv[i][1]='\0'.
* Guard the quote-strip's tempo[strlen(tempo)-1] read: a lone '"' argument left
  tempo empty and read tempo[-1] (out of bounds). It now takes the existing
  missing-quote error path.
* The URL accumulator append uses strlcatbuff against the tracked url_sz.

These are macros/locals inside hts_main_internal, so not -#7 unit-testable;
cmdl_add runs on every invocation (covered by the whole suite). New
01_engine-cmdline.test cases exercise the quote-strip rewrite as the sole URL (a
quoted URL is mirrored; dangling- and lone-quote arguments are refused cleanly,
never a crash).

htscoremain.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 19:29:30 +02:00
Xavier Roche
1f29ed41db Bound htscoremain.c pointer-destination buffer writes (batch 13)
Continues the htssafe.h pointer-destination migration in the CLI parser
(hts_main_internal). All sites write into a bare char*.

* The cmdl_add()/cmdl_ins() macros build argv entries into the x_argvblk block
  (malloc'd as the command-line size + 32768). Thread the block's total size and
  bound the copy with strlcpybuff(argv[i], token, bufsize - ptr); record the size
  in a new x_argvblk_size alongside x_argvblk.
* The in-place argv rewrites each write no more than the slot already holds, so
  they are bounded by strlen(dest)+1 (provably sufficient): the "(none)" ->
  "\"\"" replacement, the two quote-strip copies (tempo is argv[na] minus its
  surrounding quotes), and the "--catchurl" -> "-#P" rewrite. The "--clean"/
  "--tide" empty rewrite becomes a direct argv[i][1]='\0'.
* The URL accumulator append uses strlcatbuff against the tracked url_sz.

These are macros/locals inside hts_main_internal, so they are not -#7
unit-testable; cmdl_add runs on every invocation (covered by the whole suite),
and a new 01_engine-cmdline.test case exercises the quote-strip rewrite (a quoted
URL is mirrored; a dangling quote is refused cleanly, never a crash).

htscoremain.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 18:57:19 +02:00
Xavier Roche
9db360e5fd Merge pull request #369 from xroche/cleanup/htstools-bounds
Bound htstools.c pointer-destination buffer writes (batch 12)
2026-06-16 18:25:07 +02:00
9 changed files with 246 additions and 71 deletions

View File

@@ -320,20 +320,37 @@ jobs:
lint:
name: lint (shellcheck, shfmt)
runs-on: ubuntu-24.04
env:
SHFMT_VERSION: v3.8.0
steps:
- uses: actions/checkout@v6
# shfmt is a pinned release binary, so it never changes: cache it keyed on
# the version. Same rationale as the git-clang-format driver below -- avoid
# re-downloading an unchanging file from github.com on every run.
- name: Cache shfmt binary
uses: actions/cache@v4
with:
path: ~/.cache/shfmt/shfmt
key: shfmt-${{ env.SHFMT_VERSION }}-${{ runner.arch }}
- name: Install linters
env:
SHFMT_VERSION: v3.8.0
run: |
set -euo pipefail
sudo apt-get update
sudo apt-get install -y --no-install-recommends shellcheck
# shfmt is not packaged in apt; fetch a pinned release binary.
curl -fsSL -o /tmp/shfmt \
"https://github.com/mvdan/sh/releases/download/${SHFMT_VERSION}/shfmt_${SHFMT_VERSION}_linux_$(dpkg --print-architecture)"
sudo install -m 0755 /tmp/shfmt /usr/local/bin/shfmt
# shfmt is not packaged in apt; fetch a pinned release binary (cold
# cache only), retrying through transient errors.
shfmt="$HOME/.cache/shfmt/shfmt"
if [ ! -s "$shfmt" ]; then
echo "shfmt cache MISS: fetching ${SHFMT_VERSION} from github.com"
mkdir -p "$(dirname "$shfmt")"
curl --retry 5 --retry-all-errors -fsSL -o "$shfmt" \
"https://github.com/mvdan/sh/releases/download/${SHFMT_VERSION}/shfmt_${SHFMT_VERSION}_linux_$(dpkg --print-architecture)"
else
echo "shfmt cache HIT: using cached ${SHFMT_VERSION}"
fi
sudo install -m 0755 "$shfmt" /usr/local/bin/shfmt
# Lint the scripts we maintain; the legacy scripts are a separate cleanup.
- name: shellcheck
@@ -349,11 +366,24 @@ jobs:
name: format (clang-format-19, changed lines)
if: github.event_name == 'pull_request'
runs-on: ubuntu-24.04
env:
# Single-source the tag so the cache key and the fetch URL can never drift.
LLVM_TAG: llvmorg-19.1.7
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
# The git-clang-format driver is pinned to an immutable release tag, so the
# fetched file never changes: cache it keyed on the tag. raw.githubusercontent.com
# 429-rate-limits the shared runner egress IPs, and re-downloading an unchanging
# file every run was the only thing that could (and did) hit that limit.
- name: Cache git-clang-format driver
uses: actions/cache@v4
with:
path: ~/.cache/git-clang-format/git-clang-format
key: git-clang-format-${{ env.LLVM_TAG }}
- name: Install clang-format 19 (pinned, from apt.llvm.org)
run: |
set -euo pipefail
@@ -364,11 +394,17 @@ jobs:
| sudo tee /etc/apt/sources.list.d/llvm-19.list >/dev/null
sudo apt-get update
sudo apt-get install -y --no-install-recommends clang-format-19
# git-clang-format driver, pinned to an immutable release tag (not a
# moving branch) since we curl and then execute it.
sudo curl -fsSL -o /usr/local/bin/git-clang-format \
https://raw.githubusercontent.com/llvm/llvm-project/llvmorg-19.1.7/clang/tools/clang-format/git-clang-format
sudo chmod 0755 /usr/local/bin/git-clang-format
# Cold cache only: fetch the driver, retrying through transient 429s.
driver="$HOME/.cache/git-clang-format/git-clang-format"
if [ ! -s "$driver" ]; then
echo "git-clang-format cache MISS: fetching ${LLVM_TAG} from raw.githubusercontent.com"
mkdir -p "$(dirname "$driver")"
curl --retry 5 --retry-all-errors -fsSL -o "$driver" \
"https://raw.githubusercontent.com/llvm/llvm-project/${LLVM_TAG}/clang/tools/clang-format/git-clang-format"
else
echo "git-clang-format cache HIT: using cached ${LLVM_TAG}"
fi
sudo install -m 0755 "$driver" /usr/local/bin/git-clang-format
clang-format-19 --version
- name: Check formatting of changed lines

View File

@@ -41,19 +41,24 @@ Please visit our Website: http://www.httrack.com
#define _NOT_NULL(a) ( (a!=NULL) ? (a) : "" )
// COPY OF cmdl_ins in htsmain.c
// Insert a command in the argc/argv
#define cmdl_ins(token,argc,argv,buff,ptr) \
{ \
int i; \
for(i=argc;i>0;i--)\
argv[i]=argv[i-1];\
} \
argv[0]=(buff+ptr); \
strcpybuff(argv[0],token); \
ptr += (int) (strlen(argv[0])+1); \
// COPY OF cmdl_ins in htscoremain.c
/* Bytes left in x_argvblk from offset ptr. The offset can in principle outrun
the block (alias/doit.log expansion), so the copy aborts cleanly instead of
the subtraction wrapping to a huge unbounded size. */
#define cmdl_room(bufsize, ptr) \
((ptr) < (size_t) (bufsize) ? (size_t) (bufsize) - (ptr) : 0)
// Insert a command in the argc/argv (buff has total capacity bufsize)
#define cmdl_ins(token, argc, argv, buff, bufsize, ptr) \
{ \
int i; \
for (i = argc; i > 0; i--) \
argv[i] = argv[i - 1]; \
} \
argv[0] = (buff + ptr); \
strlcpybuff(argv[0], token, cmdl_room(bufsize, ptr)); \
ptr += (int) (strlen(argv[0]) + 1); \
argc++
// END OF COPY OF cmdl_ins in htsmain.c
// END OF COPY OF cmdl_ins in htscoremain.c
/*
Aliases for command-line and config file definitions
@@ -468,7 +473,7 @@ const char *optalias_help(const char *token) {
*/
/* Note: NOT utf-8 */
int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
int *x_ptr) {
size_t x_argvblk_size, int *x_ptr) {
FILE *fp;
fp = fopen(name, "rb");
@@ -542,14 +547,15 @@ int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
/* temporary argc: Number of parameters after minus insert_after_argc */
insert_after_argc = (*argc) - insert_after;
cmdl_ins((tmp_argv[2]), insert_after_argc, (argv + insert_after),
x_argvblk, (*x_ptr));
x_argvblk, x_argvblk_size, (*x_ptr));
*argc = insert_after_argc + insert_after;
insert_after++;
/* Second one */
if (return_argc > 1) {
insert_after_argc = (*argc) - insert_after;
cmdl_ins((tmp_argv[3]), insert_after_argc,
(argv + insert_after), x_argvblk, (*x_ptr));
(argv + insert_after), x_argvblk, x_argvblk_size,
(*x_ptr));
*argc = insert_after_argc + insert_after;
insert_after++;
}

View File

@@ -45,7 +45,7 @@ int optalias_find(const char *token);
const char *optalias_help(const char *token);
int optreal_find(const char *token);
int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
int *x_ptr);
size_t x_argvblk_size, int *x_ptr);
const char *optreal_value(int p);
const char *optalias_value(int p);
const char *opttype_value(int p);

View File

@@ -69,23 +69,29 @@ Please visit our Website: http://www.httrack.com
/* Resolver */
extern int IPV6_resolver;
// Add a command in the argc/argv
#define cmdl_add(token,argc,argv,buff,ptr) \
argv[argc]=(buff+ptr); \
strcpybuff(argv[argc],token); \
ptr += (int) (strlen(argv[argc])+2); \
/* Remaining room in the argv block; 0 once it is exhausted (alias expansion or
doit.log insertion can outrun the +32768 slack), so the copy aborts cleanly
instead of the subtraction wrapping to a huge unbounded size. */
#define cmdl_room(bufsize, ptr) \
((ptr) < (size_t) (bufsize) ? (size_t) (bufsize) - (ptr) : 0)
// Add a command in the argc/argv (buff has total capacity bufsize)
#define cmdl_add(token, argc, argv, buff, bufsize, ptr) \
argv[argc] = (buff + ptr); \
strlcpybuff(argv[argc], token, cmdl_room(bufsize, ptr)); \
ptr += (int) (strlen(argv[argc]) + 2); \
argc++
// Insert a command in the argc/argv
#define cmdl_ins(token,argc,argv,buff,ptr) \
{ \
int i; \
for(i=argc;i>0;i--)\
argv[i]=argv[i-1];\
} \
argv[0]=(buff+ptr); \
strcpybuff(argv[0],token); \
ptr += (int) (strlen(argv[0])+2); \
// Insert a command in the argc/argv (buff has total capacity bufsize)
#define cmdl_ins(token, argc, argv, buff, bufsize, ptr) \
{ \
int i; \
for (i = argc; i > 0; i--) \
argv[i] = argv[i - 1]; \
} \
argv[0] = (buff + ptr); \
strlcpybuff(argv[0], token, cmdl_room(bufsize, ptr)); \
ptr += (int) (strlen(argv[0]) + 2); \
argc++
#define htsmain_free() do { \
@@ -592,6 +598,7 @@ HTSEXT_API int hts_main2(int argc, char **argv, httrackp * opt) {
static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char **x_argv = NULL; // Patch pour argv et argc: en cas de récupération de ligne de commande
char *x_argvblk = NULL; // (reprise ou update)
size_t x_argvblk_size = 0; // total capacity of x_argvblk
int x_ptr = 0; // offset
//
@@ -669,7 +676,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
*a = ' ';
/* equivalent to "empty parameter" */
if ((strcmp(argv[na], HTS_NOPARAM) == 0) || (strcmp(argv[na], HTS_NOPARAM2) == 0)) // (none)
strcpybuff(argv[na], "\"\"");
/* replacing "(none)"/"\"(none)\"" with "\"\"" always fits in place */
strlcpybuff(argv[na], "\"\"", strlen(argv[na]) + 1);
if (strncmp(argv[na], "-&", 2) == 0)
argv[na][1] = '%';
}
@@ -691,6 +699,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
htsmain_free();
return -1;
}
x_argvblk_size = (size_t) (current_size + 32768);
x_argvblk[0] = '\0';
x_ptr = 0;
@@ -712,7 +721,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//
argv_url = 0; /* pour comptage */
//
cmdl_add(argv[0], x_argc, x_argv, x_argvblk, x_ptr);
cmdl_add(argv[0], x_argc, x_argv, x_argvblk, x_argvblk_size, x_ptr);
na = 1; /* commencer après nom_prg */
while(na < argc) {
int result = 1;
@@ -733,9 +742,10 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
}
/* Copier */
cmdl_add(tmp_argv[0], x_argc, x_argv, x_argvblk, x_ptr);
cmdl_add(tmp_argv[0], x_argc, x_argv, x_argvblk, x_argvblk_size, x_ptr);
if (tmp_argc > 1) {
cmdl_add(tmp_argv[1], x_argc, x_argv, x_argvblk, x_ptr);
cmdl_add(tmp_argv[1], x_argc, x_argv, x_argvblk, x_argvblk_size,
x_ptr);
}
/* Compter URLs et détecter -i,-q.. */
@@ -807,7 +817,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char BIGSTK tempo[HTS_CDLMAXSIZE];
strcpybuff(tempo, argv[na] + 1);
if (tempo[strlen(tempo) - 1] != '"') {
if (tempo[0] == '\0' || tempo[strlen(tempo) - 1] != '"') {
char BIGSTK s[HTS_CDLMAXSIZE];
sprintf(s, "Missing quote in %s", argv[na]);
@@ -816,7 +826,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
}
tempo[strlen(tempo) - 1] = '\0';
strcpybuff(argv[na], tempo);
/* tempo is argv[na] minus its surrounding quotes, so it fits in place
*/
strlcpybuff(argv[na], tempo, strlen(argv[na]) + 1);
}
if (cmdl_opt(argv[na])) { // option
@@ -917,18 +929,19 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
StringBuff(opt->path_log),
"hts-cache/doit.log"))) || (argv_url > 0)) {
if (!optinclude_file
(fconcat
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
StringBuff(opt->path_log), HTS_HTTRACKRC),
&argc, argv, x_argvblk, &x_ptr))
if (!optinclude_file(HTS_HTTRACKRC, &argc, argv, x_argvblk, &x_ptr)) {
if (!optinclude_file
(fconcat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
hts_gethome(), "/" HTS_HTTRACKRC),
&argc, argv, x_argvblk, &x_ptr)) {
if (!optinclude_file(
fconcat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
StringBuff(opt->path_log), HTS_HTTRACKRC),
&argc, argv, x_argvblk, x_argvblk_size, &x_ptr))
if (!optinclude_file(HTS_HTTRACKRC, &argc, argv, x_argvblk,
x_argvblk_size, &x_ptr)) {
if (!optinclude_file(
fconcat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
hts_gethome(), "/" HTS_HTTRACKRC),
&argc, argv, x_argvblk, x_argvblk_size, &x_ptr)) {
#ifdef HTS_HTTRACKCNF
optinclude_file(HTS_HTTRACKCNF, &argc, argv, x_argvblk, &x_ptr);
optinclude_file(HTS_HTTRACKCNF, &argc, argv, x_argvblk,
x_argvblk_size, &x_ptr);
#endif
}
}
@@ -981,7 +994,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (strnotempty(lastp)) {
insert_after_argc = argc - insert_after;
cmdl_ins(lastp, insert_after_argc, (argv + insert_after), x_argvblk,
x_ptr);
x_argvblk_size, x_ptr);
argc = insert_after_argc + insert_after;
insert_after++;
}
@@ -1101,7 +1114,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (argv[i][0] == '-') {
if (argv[i][1] == '-') { // --xxx
if ((strfield2(argv[i] + 2, "clean")) || (strfield2(argv[i] + 2, "tide"))) { // nettoyer
strcpybuff(argv[i] + 1, "");
argv[i][1] = '\0';
if (fexist
(fconcat
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), StringBuff(opt->path_log), "hts-log.txt")))
@@ -1210,7 +1223,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//
} else if (strfield2(argv[i] + 2, "catchurl")) { // capture d'URL via proxy temporaire!
argv_url = 1; // forcer a passer les parametres
strcpybuff(argv[i] + 1, "#P");
/* argv[i] is "--catchurl"; "#P" fits after its first char */
strlcpybuff(argv[i] + 1, "#P", strlen(argv[i] + 1) + 1);
//
} else if (strfield2(argv[i] + 2, "updatehttrack")) {
#ifdef _WIN32
@@ -1538,7 +1552,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char BIGSTK tempo[HTS_CDLMAXSIZE + 256];
strcpybuff(tempo, argv[na] + 1);
if (tempo[strlen(tempo) - 1] != '"') {
if (tempo[0] == '\0' || tempo[strlen(tempo) - 1] != '"') {
char s[HTS_CDLMAXSIZE + 256];
sprintf(s, "Missing quote in %s", argv[na]);
@@ -1547,7 +1561,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
}
tempo[strlen(tempo) - 1] = '\0';
strcpybuff(argv[na], tempo);
/* tempo is argv[na] minus its surrounding quotes, so it fits in place
*/
strlcpybuff(argv[na], tempo, strlen(argv[na]) + 1);
}
if (cmdl_opt(argv[na])) { // option
@@ -3206,7 +3222,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (urlSize < HTS_URLMAXSIZE) {
ensureUrlCapacity(url, url_sz, capa);
if (strnotempty(url))
strcatbuff(url, " "); // espace de séparation
strlcatbuff(url, " ", url_sz); // separator space
append_escape_spc_url(unescape_http_unharm(catbuff, sizeof(catbuff), argv[na], 1), url, url_sz);
}
} // if argv=- etc.

View File

@@ -617,13 +617,15 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
"index.html")) == 0) {
detect_title = 1; // ok détecté pour cette page!
makeindex_links++; // un de plus
strcpybuff(makeindex_firstlink, tempo);
strlcpybuff(makeindex_firstlink, tempo,
HTS_URLMAXSIZE * 2);
//
/* Hack */
if (opt->mimehtml) {
strcpybuff(makeindex_firstlink,
"cid:primary/primary");
strlcpybuff(makeindex_firstlink,
"cid:primary/primary",
HTS_URLMAXSIZE * 2);
}
if ((b == a) || (a == NULL) || (b == NULL)) { // pas de titre
@@ -2319,12 +2321,12 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
switch (p_type) {
case 2:{
//if (*lien!='/') strcatbuff(base,"/");
strcpybuff(base, lien);
strlcpybuff(base, lien, HTS_URLMAXSIZE * 2);
}
break; // base
case -2:{
//if (*lien!='/') strcatbuff(codebase,"/");
strcpybuff(codebase, lien);
strlcpybuff(codebase, lien, HTS_URLMAXSIZE * 2);
}
break; // base
}
@@ -4397,7 +4399,7 @@ int hts_mirror_wait_for_next_file(htsmoduleStruct * str,
memcpy(r, &(back[b].r), sizeof(htsblk));
r->location = stre->loc_; // ne PAS copier location!! adresse, pas de buffer
if (back[b].r.location)
strcpybuff(r->location, back[b].r.location);
strlcpybuff(r->location, back[b].r.location, HTS_URLMAXSIZE * 2);
back[b].r.adr = NULL; // ne pas faire de desalloc ensuite
// libérer emplacement backing

View File

@@ -30,6 +30,17 @@ run() {
RC=$?
}
# crawl using exactly the given args as the only URL(s), no implicit primary URL;
# leaves the exit status in RC
run_only() {
local out="$1"
shift
rm -rf "$out"
mkdir -p "$out"
httrack -O "$out" --quiet -n "$@" >"$out/.log" 2>&1
RC=$?
}
# assert the value was accepted: clean exit and the fixture was mirrored
accepted() {
{ test "$RC" -eq 0 && test -n "$(find "$1" -type f -path '*/index.html' -print -quit)"; } ||
@@ -68,4 +79,15 @@ refused "#152: over-cap -F not refused cleanly"
run "$tmp/ov-l" --user-agent "$over"
refused "#152: over-cap --user-agent not refused cleanly"
# Quote handling on the sole URL (run_only, so the quoted arg is the only URL and
# can't be masked by an implicit one). A fully "-quoted URL has its surrounding
# quotes stripped in place and is mirrored; a dangling opening quote, and a lone
# quote (empty after the opening "), are refused cleanly and never crash.
run_only "$tmp/q-ok" "\"file://$tmp/index.html\""
accepted "$tmp/q-ok" "quoted URL not stripped/mirrored"
run_only "$tmp/q-bad" '"foo'
refused "dangling-quote argument not refused cleanly"
run_only "$tmp/q-lone" '"'
refused "lone-quote argument not refused cleanly"
exit 0

91
tests/01_engine-rcfile.test Executable file
View File

@@ -0,0 +1,91 @@
#!/bin/bash
#
# Config-file alias loading (no network). A .httrackrc in the working directory
# is read by optinclude_file(), whose cmdl_ins macro inserts each alias-expanded
# token into the x_argvblk block. That macro used to copy with an unbounded
# strcpy on a bare char*; it is now bounded (strlcpybuff + cmdl_room over the
# block capacity). Two properties are checked:
# 1. The bound does not truncate: a long user-agent alias reaches doit.log
# intact. user-agent expands to two tokens (-F <value>), so it exercises
# both cmdl_ins insertions.
# 2. The bound holds under exhaustion: a pathological .httrackrc whose alias
# expansions overflow the block aborts cleanly through the htssafe bounds
# check (a message naming htsalias.c) instead of overrunning the heap. The
# unbounded version segfaulted here.
# set -e with the intentional-nonzero httrack runs guarded explicitly (the
# crawls below are expected to fail/abort and their status is inspected by hand).
set -euo pipefail
# Resolve httrack to an absolute path before we cd: PATH may hold a build-relative
# entry that would not resolve from the temp directory.
bin=$(command -v httrack) || {
echo "FAIL: httrack not found on PATH"
exit 1
}
case "$bin" in
/*) ;;
*) bin="$(cd "$(dirname "$bin")" && pwd)/$(basename "$bin")" ;;
esac
tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_rcfile.XXXXXX") || exit 1
trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM
# --- 1. alias token survives the bound intact -------------------------------
d1="$tmp/intact"
mkdir -p "$d1"
echo '<html><body>hello</body></html>' >"$d1/index.html"
# optinclude_file() lowercases each config line, so the marker is lowercase to
# survive the comparison verbatim.
marker='zzz_rcfile_marker_0123456789_abcdefghijklmnopqrstuvwxyz_intact'
printf 'user-agent=%s\n' "$marker" >"$d1/.httrackrc"
# Run with no -O so the working-directory .httrackrc is loaded (an -O path makes
# the engine skip the rc files). Output lands in the temp dir. Guard the run so a
# nonzero exit is captured for the assertion instead of tripping set -e.
rc=0
(cd "$d1" && "$bin" "file://$d1/index.html" --quiet -n >.log 2>&1) || rc=$?
test "$rc" -eq 0 || {
echo "FAIL: rc-file crawl exited $rc"
exit 1
}
test -f "$d1/hts-cache/doit.log" || {
echo "FAIL: doit.log not written (rc file not processed)"
exit 1
}
# A truncated copy would cut the token; require the full -F value.
grep -q -- "-F $marker" "$d1/hts-cache/doit.log" || {
echo "FAIL: user-agent alias missing or truncated in doit.log"
head -1 "$d1/hts-cache/doit.log"
exit 1
}
# --- 2. block exhaustion aborts through the bound, not the heap -------------
d2="$tmp/exhaust"
mkdir -p "$d2"
echo '<html><body>hi</body></html>' >"$d2/index.html"
# Each line inserts ~two tokens of ~200 bytes; 400 lines overflow the block's
# fixed slack (current_size + 32768) many times over, deterministically.
val=$(printf 'a%.0s' $(seq 1 200))
for _ in $(seq 1 400); do
printf 'user-agent=%s\n' "$val"
done >"$d2/.httrackrc"
# The process aborts (httrack turns the fatal signal into exit 134 either way),
# so the exit code does not distinguish the bounded abort from a heap overflow;
# the stderr diagnostic does. The htssafe bounds check names the offending file.
# Expected to fail, so the nonzero exit is swallowed; only the log is inspected.
(cd "$d2" && "$bin" "file://$d2/index.html" --quiet -n >.log 2>&1) || true
grep -Eq "overflow while copying.*htsalias\.c" "$d2/.log" || {
echo "FAIL: exhausted rc file did not abort through the htsalias.c bound"
echo "(an unbounded copy would overrun the heap here)"
tail -3 "$d2/.log"
exit 1
}
exit 0

View File

@@ -25,6 +25,7 @@ TESTS = \
01_engine-idna.test \
01_engine-mime.test \
01_engine-parse.test \
01_engine-rcfile.test \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \

View File

@@ -499,6 +499,7 @@ TESTS = \
01_engine-idna.test \
01_engine-mime.test \
01_engine-parse.test \
01_engine-rcfile.test \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \