Compare commits

7 Commits

Author SHA1 Message Date
Xavier Roche
fe8bd59d19 Bound htsalias.c pointer-destination buffer writes (batch 14)
htsalias.c keeps its own copy of htscoremain.c's cmdl_ins macro (config-file
alias expansion in optinclude_file). The copy still wrote alias-expanded tokens
into the argv block with an unbounded strcpybuff on a bare char*. Thread the
block capacity (x_argvblk_size) through optinclude_file and bound the insert
with strlcpybuff + cmdl_room, the same guard batch 13 applied to the original:
cmdl_room yields 0 instead of size_t-wrapping when the offset outruns the block,
so an alias/doit.log expansion bomb aborts cleanly rather than overflowing.
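
The guard described above can be sketched as plain functions. This is a minimal illustration, not the project's code: `block_room`, `bounded_copy` and `arg_insert_front` are hypothetical names, and `bounded_copy` returns an error where the message describes the real strlcpybuff path as aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes left in a block of `cap` bytes from offset `off`.
   Yields 0 once the offset reaches or passes the end, instead of letting
   the unsigned subtraction wrap to a huge value. */
static size_t block_room(size_t cap, size_t off) {
  return off < cap ? cap - off : 0;
}

/* Hypothetical stand-in for a bounded copy: copies src (including its NUL)
   only if it fits in `room` bytes. Returns 0 on success, -1 otherwise. */
static int bounded_copy(char *dst, const char *src, size_t room) {
  const size_t need = strlen(src) + 1;

  assert(dst != NULL && src != NULL);
  if (need > room)
    return -1;
  memcpy(dst, src, need);
  return 0;
}

/* Insert `token` in front of slots[0..*count), storing its bytes at
   blk + *off. `slots` must have room for *count + 1 pointers. */
static int arg_insert_front(const char *token, int *count, char **slots,
                            char *blk, size_t cap, size_t *off) {
  const size_t room = block_room(cap, *off);
  int i;

  if (room == 0) /* offset already outran the block: refuse before blk + *off */
    return -1;
  if (bounded_copy(blk + *off, token, room) != 0)
    return -1;
  for (i = *count; i > 0; i--)
    slots[i] = slots[i - 1];
  slots[0] = blk + *off;
  *off += strlen(token) + 1;
  (*count)++;
  return 0;
}
```

With a 16-byte block, inserting "-F" and then "agent" succeeds and leaves the offset at 9; a further 10-character token needs 11 bytes, finds only 7, and is refused without touching the block.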

Adds 01_engine-rcfile.test, which had no coverage before: it drops a .httrackrc
with a long user-agent alias in the working directory, runs httrack with no -O
(the only way the rc files load), and checks the alias-expanded -F <value> token
reaches hts-cache/doit.log intact. user-agent expands to two tokens, exercising
both cmdl_ins insertions; a truncating bound is caught (verified by injecting
one).

htsalias.c pointer-destination warnings 2->0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 20:41:08 +02:00
Xavier Roche
83d813eb7f Merge pull request #370 from xroche/cleanup/htscoremain-bounds
Bound htscoremain.c pointer-destination buffer writes (batch 13)
2026-06-16 19:37:06 +02:00
Xavier Roche
31eead95df Bound htscoremain.c pointer-destination buffer writes (batch 13)
Continues the htssafe.h pointer-destination migration in the CLI parser
(hts_main_internal). All sites write into a bare char*.

* The cmdl_add()/cmdl_ins() macros build argv entries into the x_argvblk block
  (malloc'd as the command-line size + 32768). Thread the block's total size
  (recorded in a new x_argvblk_size) and bound the copy with strlcpybuff. The
  remaining room is computed by a cmdl_room() helper that yields 0 once the block
  is exhausted (alias expansion or doit.log insertion can outrun the 32768 slack)
  so the copy aborts cleanly instead of the size_t subtraction wrapping to a huge
  unbounded value.
* The in-place argv rewrites each write no more than the slot already holds, so
  they are bounded by strlen(dest)+1 (provably sufficient): the "(none)" ->
  "\"\"" replacement, the two quote-strip copies (tempo is argv[na] minus its
  surrounding quotes), and the "--catchurl" -> "-#P" rewrite. The "--clean"/
  "--tide" empty rewrite becomes a direct argv[i][1]='\0'.
* Guard the quote-strip's tempo[strlen(tempo)-1] read: a lone '"' argument left
  tempo empty and read tempo[-1] (out of bounds). It now takes the existing
  missing-quote error path.
* The URL accumulator append uses strlcatbuff against the tracked url_sz.

These are macros/locals inside hts_main_internal, so they are not -#7
unit-testable; cmdl_add runs on every invocation (covered by the whole suite). New
01_engine-cmdline.test cases exercise the quote-strip rewrite as the sole URL (a
quoted URL is mirrored; dangling- and lone-quote arguments are refused cleanly,
never a crash).

htscoremain.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 19:29:30 +02:00
Xavier Roche
1f29ed41db Bound htscoremain.c pointer-destination buffer writes (batch 13)
Continues the htssafe.h pointer-destination migration in the CLI parser
(hts_main_internal). All sites write into a bare char*.

* The cmdl_add()/cmdl_ins() macros build argv entries into the x_argvblk block
  (malloc'd as the command-line size + 32768). Thread the block's total size and
  bound the copy with strlcpybuff(argv[i], token, bufsize - ptr); record the size
  in a new x_argvblk_size alongside x_argvblk.
* The in-place argv rewrites each write no more than the slot already holds, so
  they are bounded by strlen(dest)+1 (provably sufficient): the "(none)" ->
  "\"\"" replacement, the two quote-strip copies (tempo is argv[na] minus its
  surrounding quotes), and the "--catchurl" -> "-#P" rewrite. The "--clean"/
  "--tide" empty rewrite becomes a direct argv[i][1]='\0'.
* The URL accumulator append uses strlcatbuff against the tracked url_sz.
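
The "bounded by strlen(dest)+1" idea for in-place rewrites can be illustrated as follows. This is a sketch under the assumption that the slot is a NUL-terminated string owned by the caller; `replace_in_slot` is a hypothetical name, and it returns an error where the real bounded copy would report the overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite an existing NUL-terminated slot with `repl`, using the bytes the
   slot already holds (strlen(slot) + 1) as the only capacity that is known.
   `repl` must not overlap `slot`. Returns 0 on success, -1 if it would not fit. */
static int replace_in_slot(char *slot, const char *repl) {
  const size_t cap = strlen(slot) + 1;
  const size_t need = strlen(repl) + 1;

  assert(slot != NULL && repl != NULL);
  if (need > cap)
    return -1;
  memcpy(slot, repl, need);
  return 0;
}
```

Replacing "(none)" with two quote characters, or the tail of "--catchurl" with "#P", shrinks the string and therefore always fits; a longer replacement is refused.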

These are macros/locals inside hts_main_internal, so they are not -#7
unit-testable; cmdl_add runs on every invocation (covered by the whole suite),
and a new 01_engine-cmdline.test case exercises the quote-strip rewrite (a quoted
URL is mirrored; a dangling quote is refused cleanly, never a crash).

htscoremain.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 18:57:19 +02:00
Xavier Roche
9db360e5fd Merge pull request #369 from xroche/cleanup/htstools-bounds
Bound htstools.c pointer-destination buffer writes (batch 12)
2026-06-16 18:25:07 +02:00
Xavier Roche
88bfcff10c Bound htstools.c pointer-destination buffer writes (batch 12)
Continues the htssafe.h pointer-destination migration: the strcpybuff/strcatbuff
macros silently fall back to a raw strcpy/strcat when the destination is a bare
char* rather than a sized array.

All four functions are internal (hidden, not HTSEXT_API), so they take explicit
destination sizes:
* lienrelatif() builds a relative link into a char* caller buffer; threads a
  size_t and bounds the "../"/path appends with strlcatbuff (the local _curr
  copy uses sizeof(_curr)).
* long_to_83() / longfile_to_83() build an 8-3 / ISO9660 name into a caller
  buffer; thread a size_t and use strl(n)catbuff.
* ident_url_relatif()'s in-place IDNA host rewrite bounds the copy by the
  remaining capacity of adrfil->adr (a pointer into that array).
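
For the last item, the remaining capacity behind a pointer into a fixed array is the array size minus the pointer's offset. A minimal sketch, assuming a hypothetical `struct adr_holder` with a 32-byte array in place of the real structure:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADR_SIZE 32 /* hypothetical size, for illustration only */

struct adr_holder {
  char adr[ADR_SIZE];
};

/* Bytes available from `p` to the end of h->adr; `p` must point into h->adr. */
static size_t remaining_in_adr(const struct adr_holder *h, const char *p) {
  assert(p >= h->adr && p <= h->adr + sizeof(h->adr));
  return sizeof(h->adr) - (size_t) (p - h->adr);
}

/* Rewrite the text starting at `p` with `repl`, bounded by what is left of
   the array. Returns 0 on success, -1 if `repl` does not fit. */
static int rewrite_from(struct adr_holder *h, char *p, const char *repl) {
  const size_t room = remaining_in_adr(h, p);
  const size_t need = strlen(repl) + 1;

  if (need > room)
    return -1;
  memcpy(p, repl, need);
  return 0;
}
```

With "http://host" stored in the array and `p` pointing at the host, 25 bytes remain, so a short replacement host fits and a 25-character one (26 bytes with its terminator) is refused.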

Callers in htscore.c, htswizard.c, htsparse.c and htsname.c pass sizeof(dest)
(all the destinations are HTS_URLMAXSIZE*2 arrays).
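
Threading an explicit destination size through a helper and passing sizeof(dest) at the call site looks roughly like this. The sketch only builds "../" prefixes followed by a name; it is not lienrelatif()'s algorithm, and `bounded_cat` and `build_relative` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append `src` to the NUL-terminated string in dst[0..dstsize).
   Returns 0 on success, -1 if the result would not fit. */
static int bounded_cat(char *dst, size_t dstsize, const char *src) {
  const size_t used = strlen(dst);
  const size_t need = strlen(src) + 1;

  if (used >= dstsize || need > dstsize - used)
    return -1;
  memcpy(dst + used, src, need);
  return 0;
}

/* Build `up_levels` "../" prefixes followed by `name` into s[0..ssize). */
static int build_relative(char *s, size_t ssize, int up_levels,
                          const char *name) {
  int i;

  assert(s != NULL && name != NULL);
  if (ssize == 0)
    return -1;
  s[0] = '\0';
  for (i = 0; i < up_levels; i++) {
    if (bounded_cat(s, ssize, "../") != 0)
      return -1;
  }
  return bounded_cat(s, ssize, name);
}
```

A caller holding `char s[16]` would pass `sizeof(s)`; one level up with "a.html" yields "../a.html", and three levels into an 8-byte buffer fail instead of writing past the end.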

Add -#7 basic_selftests for longfile_to_83 (8-3 and ISO9660), long_to_83
(per-segment path conversion) and lienrelatif (same-dir basename, parent "../").
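
As a rough picture of what the 8-3 self-test checks, here is a deliberately simplified conversion: uppercase, name clamped to 8 characters, extension clamped to 3, written into a sized destination. It omits the sanitization and ISO9660 rules of the real longfile_to_83(); `to_83_simple` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified 8-3 conversion of a single file name into dst[0..dstsize).
   The last dot separates name and extension. Returns 0 on success,
   -1 if the converted name would not fit in dst. */
static int to_83_simple(char *dst, size_t dstsize, const char *src) {
  const char *dot = strrchr(src, '.');
  size_t name_len = dot != NULL ? (size_t) (dot - src) : strlen(src);
  size_t ext_len = dot != NULL ? strlen(dot + 1) : 0;
  size_t i, o = 0;

  assert(dst != NULL && src != NULL);
  if (name_len > 8)
    name_len = 8;
  if (ext_len > 3)
    ext_len = 3;
  /* name + optional ".ext" + terminating NUL */
  if (name_len + (ext_len > 0 ? 1 + ext_len : 0) + 1 > dstsize)
    return -1;
  for (i = 0; i < name_len; i++)
    dst[o++] = (char) toupper((unsigned char) src[i]);
  if (ext_len > 0) {
    dst[o++] = '.';
    for (i = 0; i < ext_len; i++)
      dst[o++] = (char) toupper((unsigned char) dot[1 + i]);
  }
  dst[o] = '\0';
  return 0;
}
```

Under these simplified rules "longfilename.html" becomes "LONGFILE.HTM", matching the mode 1 expectation in the self-test, and a 4-byte destination is rejected rather than overrun.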

htstools.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 18:01:47 +02:00
Xavier Roche
1df45fc231 Merge pull request #368 from xroche/cleanup/htsname-bounds
Bound htsname.c pointer-destination buffer writes (batch 11)
2026-06-16 17:25:12 +02:00
13 changed files with 275 additions and 88 deletions

View File

@@ -41,19 +41,24 @@ Please visit our Website: http://www.httrack.com
#define _NOT_NULL(a) ( (a!=NULL) ? (a) : "" )
-// COPY OF cmdl_ins in htsmain.c
-// Insert a command in the argc/argv
-#define cmdl_ins(token,argc,argv,buff,ptr) \
-{ \
-int i; \
-for(i=argc;i>0;i--)\
-argv[i]=argv[i-1];\
-} \
-argv[0]=(buff+ptr); \
-strcpybuff(argv[0],token); \
-ptr += (int) (strlen(argv[0])+1); \
+// COPY OF cmdl_ins in htscoremain.c
+/* Bytes left in x_argvblk from offset ptr. The offset can in principle outrun
+the block (alias/doit.log expansion), so the copy aborts cleanly instead of
+the subtraction wrapping to a huge unbounded size. */
+#define cmdl_room(bufsize, ptr) \
+((ptr) < (size_t) (bufsize) ? (size_t) (bufsize) - (ptr) : 0)
+// Insert a command in the argc/argv (buff has total capacity bufsize)
+#define cmdl_ins(token, argc, argv, buff, bufsize, ptr) \
+{ \
+int i; \
+for (i = argc; i > 0; i--) \
+argv[i] = argv[i - 1]; \
+} \
+argv[0] = (buff + ptr); \
+strlcpybuff(argv[0], token, cmdl_room(bufsize, ptr)); \
+ptr += (int) (strlen(argv[0]) + 1); \
argc++
-// END OF COPY OF cmdl_ins in htsmain.c
+// END OF COPY OF cmdl_ins in htscoremain.c
/*
Aliases for command-line and config file definitions
@@ -468,7 +473,7 @@ const char *optalias_help(const char *token) {
*/
/* Note: NOT utf-8 */
int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
-int *x_ptr) {
+size_t x_argvblk_size, int *x_ptr) {
FILE *fp;
fp = fopen(name, "rb");
@@ -542,14 +547,15 @@ int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
/* temporary argc: Number of parameters after minus insert_after_argc */
insert_after_argc = (*argc) - insert_after;
cmdl_ins((tmp_argv[2]), insert_after_argc, (argv + insert_after),
-x_argvblk, (*x_ptr));
+x_argvblk, x_argvblk_size, (*x_ptr));
*argc = insert_after_argc + insert_after;
insert_after++;
/* Second one */
if (return_argc > 1) {
insert_after_argc = (*argc) - insert_after;
cmdl_ins((tmp_argv[3]), insert_after_argc,
-(argv + insert_after), x_argvblk, (*x_ptr));
+(argv + insert_after), x_argvblk, x_argvblk_size,
+(*x_ptr));
*argc = insert_after_argc + insert_after;
insert_after++;
}

View File

@@ -45,7 +45,7 @@ int optalias_find(const char *token);
const char *optalias_help(const char *token);
int optreal_find(const char *token);
int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
-int *x_ptr);
+size_t x_argvblk_size, int *x_ptr);
const char *optreal_value(int p);
const char *optalias_value(int p);
const char *opttype_value(int p);

View File

@@ -3862,7 +3862,8 @@ int htsAddLink(htsmoduleStruct * str, char *link) {
opt->savename_83 = b;
if (r != -1 && !forbidden_url) {
if (savename()) {
-if (lienrelatif(tempo, afs.save, savename()) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), afs.save, savename()) ==
+0) {
hts_log_print(opt, LOG_DEBUG,
"(module): relative link at %s build with %s and %s: %s",
afs.af.adr, afs.save, savename(), tempo);

View File

@@ -69,23 +69,29 @@ Please visit our Website: http://www.httrack.com
/* Resolver */
extern int IPV6_resolver;
-// Add a command in the argc/argv
-#define cmdl_add(token,argc,argv,buff,ptr) \
-argv[argc]=(buff+ptr); \
-strcpybuff(argv[argc],token); \
-ptr += (int) (strlen(argv[argc])+2); \
+/* Remaining room in the argv block; 0 once it is exhausted (alias expansion or
+doit.log insertion can outrun the +32768 slack), so the copy aborts cleanly
+instead of the subtraction wrapping to a huge unbounded size. */
+#define cmdl_room(bufsize, ptr) \
+((ptr) < (size_t) (bufsize) ? (size_t) (bufsize) - (ptr) : 0)
+// Add a command in the argc/argv (buff has total capacity bufsize)
+#define cmdl_add(token, argc, argv, buff, bufsize, ptr) \
+argv[argc] = (buff + ptr); \
+strlcpybuff(argv[argc], token, cmdl_room(bufsize, ptr)); \
+ptr += (int) (strlen(argv[argc]) + 2); \
argc++
-// Insert a command in the argc/argv
-#define cmdl_ins(token,argc,argv,buff,ptr) \
-{ \
-int i; \
-for(i=argc;i>0;i--)\
-argv[i]=argv[i-1];\
-} \
-argv[0]=(buff+ptr); \
-strcpybuff(argv[0],token); \
-ptr += (int) (strlen(argv[0])+2); \
+// Insert a command in the argc/argv (buff has total capacity bufsize)
+#define cmdl_ins(token, argc, argv, buff, bufsize, ptr) \
+{ \
+int i; \
+for (i = argc; i > 0; i--) \
+argv[i] = argv[i - 1]; \
+} \
+argv[0] = (buff + ptr); \
+strlcpybuff(argv[0], token, cmdl_room(bufsize, ptr)); \
+ptr += (int) (strlen(argv[0]) + 2); \
argc++
#define htsmain_free() do { \
@@ -431,6 +437,50 @@ static void basic_selftests(void) {
assertf(strcmp(b + len - 4, ".htm") == 0);
}
}
+// longfile_to_83(): single-name 8-3 (mode 1) / ISO9660 (mode 2) conversion;
+// uppercases, clamps the name (8 / 31) and the extension (3). It rewrites
+// 'save' in place, so pass a mutable array.
+{
+char n83[256];
+{
+char save[] = "longfilename.html";
+longfile_to_83(1, n83, sizeof(n83), save); // 8-3: name->8, ext->3
+assertf(strcmp(n83, "LONGFILE.HTM") == 0);
+}
+{
+char save[] = "longfilename.html";
+longfile_to_83(2, n83, sizeof(n83), save); // ISO9660: name->31, ext->3
+assertf(strcmp(n83, "LONGFILENAME.HTM") == 0);
+}
+{ // sanitization: leading '.'->'_', interior dots
+char save[] = ".a b.c.d e"; // collapse to '_', spaces/specials -> '_'
+// (only the last dot stays as the separator)
+longfile_to_83(1, n83, sizeof(n83), save);
+assertf(strcmp(n83, "_A_B_C.D_E") == 0);
+}
+}
+// long_to_83(): per-segment 8-3 conversion of a whole path.
+{
+char n83[HTS_URLMAXSIZE * 2];
+char save[] = "dir/longfilename.html";
+long_to_83(1, n83, sizeof(n83), save);
+assertf(strcmp(n83, "DIR/LONGFILE.HTM") == 0);
+}
+// lienrelatif(): relative path from the directory of curr_fil to link.
+{
+char s[HTS_URLMAXSIZE * 2];
+// same directory -> just the basename
+assertf(lienrelatif(s, sizeof(s), "dir/page.html", "dir/index.html") == 0);
+assertf(strcmp(s, "page.html") == 0);
+// link one level up -> a "../" prefix
+assertf(lienrelatif(s, sizeof(s), "a.html", "dir/index.html") == 0);
+assertf(strcmp(s, "../a.html") == 0);
+}
}
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
@@ -548,6 +598,7 @@ HTSEXT_API int hts_main2(int argc, char **argv, httrackp * opt) {
static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char **x_argv = NULL; // Patch pour argv et argc: en cas de récupération de ligne de commande
char *x_argvblk = NULL; // (reprise ou update)
+size_t x_argvblk_size = 0; // total capacity of x_argvblk
int x_ptr = 0; // offset
//
@@ -625,7 +676,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
*a = ' ';
/* equivalent to "empty parameter" */
if ((strcmp(argv[na], HTS_NOPARAM) == 0) || (strcmp(argv[na], HTS_NOPARAM2) == 0)) // (none)
-strcpybuff(argv[na], "\"\"");
+/* replacing "(none)"/"\"(none)\"" with "\"\"" always fits in place */
+strlcpybuff(argv[na], "\"\"", strlen(argv[na]) + 1);
if (strncmp(argv[na], "-&", 2) == 0)
argv[na][1] = '%';
}
@@ -647,6 +699,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
htsmain_free();
return -1;
}
+x_argvblk_size = (size_t) (current_size + 32768);
x_argvblk[0] = '\0';
x_ptr = 0;
@@ -668,7 +721,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//
argv_url = 0; /* pour comptage */
//
-cmdl_add(argv[0], x_argc, x_argv, x_argvblk, x_ptr);
+cmdl_add(argv[0], x_argc, x_argv, x_argvblk, x_argvblk_size, x_ptr);
na = 1; /* commencer après nom_prg */
while(na < argc) {
int result = 1;
@@ -689,9 +742,10 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
}
/* Copier */
-cmdl_add(tmp_argv[0], x_argc, x_argv, x_argvblk, x_ptr);
+cmdl_add(tmp_argv[0], x_argc, x_argv, x_argvblk, x_argvblk_size, x_ptr);
if (tmp_argc > 1) {
-cmdl_add(tmp_argv[1], x_argc, x_argv, x_argvblk, x_ptr);
+cmdl_add(tmp_argv[1], x_argc, x_argv, x_argvblk, x_argvblk_size,
+x_ptr);
}
/* Compter URLs et détecter -i,-q.. */
@@ -763,7 +817,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char BIGSTK tempo[HTS_CDLMAXSIZE];
strcpybuff(tempo, argv[na] + 1);
-if (tempo[strlen(tempo) - 1] != '"') {
+if (tempo[0] == '\0' || tempo[strlen(tempo) - 1] != '"') {
char BIGSTK s[HTS_CDLMAXSIZE];
sprintf(s, "Missing quote in %s", argv[na]);
@@ -772,7 +826,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
}
tempo[strlen(tempo) - 1] = '\0';
-strcpybuff(argv[na], tempo);
+/* tempo is argv[na] minus its surrounding quotes, so it fits in place
+*/
+strlcpybuff(argv[na], tempo, strlen(argv[na]) + 1);
}
if (cmdl_opt(argv[na])) { // option
@@ -873,18 +929,19 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
StringBuff(opt->path_log),
"hts-cache/doit.log"))) || (argv_url > 0)) {
-if (!optinclude_file
-(fconcat
-(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
-StringBuff(opt->path_log), HTS_HTTRACKRC),
-&argc, argv, x_argvblk, &x_ptr))
-if (!optinclude_file(HTS_HTTRACKRC, &argc, argv, x_argvblk, &x_ptr)) {
-if (!optinclude_file
-(fconcat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
-hts_gethome(), "/" HTS_HTTRACKRC),
-&argc, argv, x_argvblk, &x_ptr)) {
+if (!optinclude_file(
+fconcat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
+StringBuff(opt->path_log), HTS_HTTRACKRC),
+&argc, argv, x_argvblk, x_argvblk_size, &x_ptr))
+if (!optinclude_file(HTS_HTTRACKRC, &argc, argv, x_argvblk,
+x_argvblk_size, &x_ptr)) {
+if (!optinclude_file(
+fconcat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
+hts_gethome(), "/" HTS_HTTRACKRC),
+&argc, argv, x_argvblk, x_argvblk_size, &x_ptr)) {
#ifdef HTS_HTTRACKCNF
-optinclude_file(HTS_HTTRACKCNF, &argc, argv, x_argvblk, &x_ptr);
+optinclude_file(HTS_HTTRACKCNF, &argc, argv, x_argvblk,
+x_argvblk_size, &x_ptr);
#endif
}
}
@@ -937,7 +994,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (strnotempty(lastp)) {
insert_after_argc = argc - insert_after;
cmdl_ins(lastp, insert_after_argc, (argv + insert_after), x_argvblk,
-x_ptr);
+x_argvblk_size, x_ptr);
argc = insert_after_argc + insert_after;
insert_after++;
}
@@ -1057,7 +1114,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (argv[i][0] == '-') {
if (argv[i][1] == '-') { // --xxx
if ((strfield2(argv[i] + 2, "clean")) || (strfield2(argv[i] + 2, "tide"))) { // nettoyer
-strcpybuff(argv[i] + 1, "");
+argv[i][1] = '\0';
if (fexist
(fconcat
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), StringBuff(opt->path_log), "hts-log.txt")))
@@ -1166,7 +1223,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//
} else if (strfield2(argv[i] + 2, "catchurl")) { // capture d'URL via proxy temporaire!
argv_url = 1; // forcer a passer les parametres
-strcpybuff(argv[i] + 1, "#P");
+/* argv[i] is "--catchurl"; "#P" fits after its first char */
+strlcpybuff(argv[i] + 1, "#P", strlen(argv[i] + 1) + 1);
//
} else if (strfield2(argv[i] + 2, "updatehttrack")) {
#ifdef _WIN32
@@ -1494,7 +1552,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char BIGSTK tempo[HTS_CDLMAXSIZE + 256];
strcpybuff(tempo, argv[na] + 1);
-if (tempo[strlen(tempo) - 1] != '"') {
+if (tempo[0] == '\0' || tempo[strlen(tempo) - 1] != '"') {
char s[HTS_CDLMAXSIZE + 256];
sprintf(s, "Missing quote in %s", argv[na]);
@@ -1503,7 +1561,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
}
tempo[strlen(tempo) - 1] = '\0';
-strcpybuff(argv[na], tempo);
+/* tempo is argv[na] minus its surrounding quotes, so it fits in place
+*/
+strlcpybuff(argv[na], tempo, strlen(argv[na]) + 1);
}
if (cmdl_opt(argv[na])) { // option
@@ -3162,7 +3222,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (urlSize < HTS_URLMAXSIZE) {
ensureUrlCapacity(url, url_sz, capa);
if (strnotempty(url))
-strcatbuff(url, " "); // espace de séparation
+strlcatbuff(url, " ", url_sz); // separator space
append_escape_spc_url(unescape_http_unharm(catbuff, sizeof(catbuff), argv[na], 1), url, url_sz);
}
} // if argv=- etc.

View File

@@ -925,7 +925,7 @@ int url_savename(lien_adrfilsave *const afs,
pth[0] = n83[0] = '\0';
strncatbuff(pth, fil, (int) (nom_pos - fil) - 1);
-long_to_83(opt->savename_83, n83, pth);
+long_to_83(opt->savename_83, n83, sizeof(n83), pth);
htsbuff_cat(&sb, n83);
}
}
@@ -1307,7 +1307,7 @@ int url_savename(lien_adrfilsave *const afs,
if (opt->savename_83) {
char BIGSTK n83[HTS_URLMAXSIZE * 2];
-long_to_83(opt->savename_83, n83, afs->save);
+long_to_83(opt->savename_83, n83, sizeof(n83), afs->save);
strcpybuff(afs->save, n83);
}
// enforce stricter ISO9660 compliance (bug reported by Steffo Carlsson)

View File

@@ -610,11 +610,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
b = strchr(a, '<'); // prochain tag
}
}
-if (lienrelatif
-(tempo, heap(ptr)->sav,
-concat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
-StringBuff(opt->path_html_utf8),
-"index.html")) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), heap(ptr)->sav,
+concat(OPT_GET_BUFF(opt),
+OPT_GET_BUFF_SIZE(opt),
+StringBuff(opt->path_html_utf8),
+"index.html")) == 0) {
detect_title = 1; // ok détecté pour cette page!
makeindex_links++; // un de plus
strcpybuff(makeindex_firstlink, tempo);
@@ -2720,7 +2720,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
strcpybuff(save, StringBuff(opt->path_html_utf8));
strcatbuff(save, cat_name);
-if (lienrelatif(tempo, save, relativesavename()) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), save,
+relativesavename()) == 0) {
/* Never escape high-chars (we don't know the encoding!!) */
inplace_escape_uri_utf(tempo, sizeof(tempo)); // escape with %xx
//if (!no_esc_utf)
@@ -2950,7 +2951,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
tempo[0] = '\0';
// calculer le lien relatif
-if (lienrelatif(tempo, afs.save, relativesavename()) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), afs.save,
+relativesavename()) == 0) {
if (!in_media) { // In media (such as real audio): don't patch
/* Never escape high-chars (we don't know the encoding!!) */
inplace_escape_uri_utf(tempo, sizeof(tempo));

View File

@@ -274,7 +274,9 @@ int ident_url_relatif(const char *lien, const char *origin_adr,
char *const idna = hts_convertStringUTF8ToIDNA(a, strlen(a));
if (idna != NULL) {
if (strlen(idna) < HTS_URLMAXSIZE) {
-strcpybuff(a, idna);
+/* a points within adrfil->adr; bound by the remaining capacity */
+strlcpybuff(a, idna,
+sizeof(adrfil->adr) - (size_t) (a - adrfil->adr));
}
free(idna);
}
@@ -286,7 +288,7 @@ int ident_url_relatif(const char *lien, const char *origin_adr,
// créer dans s, à partir du chemin courant curr_fil, le lien vers link (absolu)
// un ident_url_relatif a déja été fait avant, pour que link ne soit pas un chemin relatif
-int lienrelatif(char *s, const char *link, const char *curr_fil) {
+int lienrelatif(char *s, size_t ssize, const char *link, const char *curr_fil) {
char BIGSTK _curr[HTS_URLMAXSIZE * 2];
char BIGSTK newcurr_fil[HTS_URLMAXSIZE * 2], newlink[HTS_URLMAXSIZE * 2];
char *curr;
@@ -314,9 +316,9 @@ int lienrelatif(char *s, const char *link, const char *curr_fil) {
}
}
-// recopier uniquement le chemin courant
+// copy only the current path
curr = _curr;
-strcpybuff(curr, curr_fil);
+strlcpybuff(curr, curr_fil, sizeof(_curr));
if ((a = strchr(curr, '?')) == NULL) // couper au ? (params)
a = curr + strlen(curr) - 1; // pas de params: aller à la fin
while((*a != '/') && (a > curr))
@@ -359,14 +361,14 @@ int lienrelatif(char *s, const char *link, const char *curr_fil) {
a++;
while(*a)
if (*(a++) == '/')
-strcatbuff(s, "../");
+strlcatbuff(s, "../", ssize);
//if (strlen(s)==0) strcatbuff(s,"/");
if (slash)
-strcatbuff(s, "/"); // garder absolu!!
+strlcatbuff(s, "/", ssize); // keep it absolute!
-// on est dans le répertoire de départ, copier
-strcatbuff(s, link + ((*link == '/') ? 1 : 0));
+// we are in the starting directory, copy
+strlcatbuff(s, link + ((*link == '/') ? 1 : 0), ssize);
/* Security check */
if (strlen(s) >= HTS_URLMAXSIZE)
@@ -410,7 +412,7 @@ int link_has_authorization(const char *lien) {
}
// conversion chemin de fichier/dossier vers 8-3 ou ISO9660
-void long_to_83(int mode, char *n83, char *save) {
+void long_to_83(int mode, char *n83, size_t n83size, char *save) {
n83[0] = '\0';
while(*save) {
@@ -425,19 +427,19 @@ void long_to_83(int mode, char *n83, char *save) {
}
fnl[j] = '\0';
// conversion
-longfile_to_83(mode, fn83, fnl);
-strcatbuff(n83, fn83);
+longfile_to_83(mode, fn83, sizeof(fn83), fnl);
+strlcatbuff(n83, fn83, n83size);
save += i;
if (*save == '/') {
-strcatbuff(n83, "/");
+strlcatbuff(n83, "/", n83size);
save++;
}
}
}
// conversion nom de fichier/dossier isolé vers 8-3 ou ISO9660
-void longfile_to_83(int mode, char *n83, char *save) {
+void longfile_to_83(int mode, char *n83, size_t n83size, char *save) {
int j = 0, max = 0;
int i = 0;
char nom[256];
@@ -526,10 +528,10 @@ void longfile_to_83(int mode, char *n83, char *save) {
}
// corriger vers 8-3
n83[0] = '\0';
-strncatbuff(n83, nom, max);
+strlncatbuff(n83, nom, n83size, max);
if (strnotempty(ext)) {
-strcatbuff(n83, ".");
-strncatbuff(n83, ext, 3);
+strlcatbuff(n83, ".", n83size);
+strlncatbuff(n83, ext, n83size, 3);
}
}

View File

@@ -61,11 +61,11 @@ typedef struct lien_adrfilsave lien_adrfilsave;
int ident_url_relatif(const char *lien, const char *origin_adr,
const char *origin_fil,
lien_adrfil* const adrfil);
-int lienrelatif(char *s, const char *link, const char *curr);
+int lienrelatif(char *s, size_t ssize, const char *link, const char *curr);
int link_has_authority(const char *lien);
int link_has_authorization(const char *lien);
-void long_to_83(int mode, char *n83, char *save);
-void longfile_to_83(int mode, char *n83, char *save);
+void long_to_83(int mode, char *n83, size_t n83size, char *save);
+void longfile_to_83(int mode, char *n83, size_t n83size, char *save);
HTS_INLINE int __rech_tageq(const char *adr, const char *s);
HTS_INLINE int __rech_tageqbegdigits(const char *adr, const char *s);
HTS_INLINE int rech_tageq_all(const char *adr, const char *s);

View File

@@ -223,8 +223,9 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
// note (up/down): on calcule à partir du lien primaire, ET du lien précédent.
// ex: si on descend 2 fois on peut remonter 1 fois
-if (lienrelatif(tempo, fil, heap(heap(ptr)->premier)->fil) == 0) {
-if (lienrelatif(tempo2, fil, heap(ptr)->fil) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), fil,
+heap(heap(ptr)->premier)->fil) == 0) {
+if (lienrelatif(tempo2, sizeof(tempo2), fil, heap(ptr)->fil) == 0) {
hts_log_print(opt, LOG_DEBUG,
"build relative links to test: %s %s (with %s and %s)",
tempo, tempo2, heap(heap(ptr)->premier)->fil,
@@ -326,8 +327,9 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
char BIGSTK tempo[HTS_URLMAXSIZE * 2];
char BIGSTK tempo2[HTS_URLMAXSIZE * 2];
-if (lienrelatif(tempo, fil, heap(heap(ptr)->premier)->fil) == 0) {
-if (lienrelatif(tempo2, fil, heap(ptr)->fil) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), fil,
+heap(heap(ptr)->premier)->fil) == 0) {
+if (lienrelatif(tempo2, sizeof(tempo2), fil, heap(ptr)->fil) == 0) {
} else {
hts_log_print(opt, LOG_ERROR,
"Error building relative link %s and %s", fil,
@@ -336,7 +338,6 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
} else {
hts_log_print(opt, LOG_ERROR, "Error building relative link %s and %s",
fil, heap(heap(ptr)->premier)->fil);
}
} // fin tester interdiction de monter

View File

@@ -30,6 +30,17 @@ run() {
RC=$?
}
+# crawl using exactly the given args as the only URL(s), no implicit primary URL;
+# leaves the exit status in RC
+run_only() {
+local out="$1"
+shift
+rm -rf "$out"
+mkdir -p "$out"
+httrack -O "$out" --quiet -n "$@" >"$out/.log" 2>&1
+RC=$?
+}
# assert the value was accepted: clean exit and the fixture was mirrored
accepted() {
{ test "$RC" -eq 0 && test -n "$(find "$1" -type f -path '*/index.html' -print -quit)"; } ||
@@ -68,4 +79,15 @@ refused "#152: over-cap -F not refused cleanly"
run "$tmp/ov-l" --user-agent "$over"
refused "#152: over-cap --user-agent not refused cleanly"
+# Quote handling on the sole URL (run_only, so the quoted arg is the only URL and
+# can't be masked by an implicit one). A fully "-quoted URL has its surrounding
+# quotes stripped in place and is mirrored; a dangling opening quote, and a lone
+# quote (empty after the opening "), are refused cleanly and never crash.
+run_only "$tmp/q-ok" "\"file://$tmp/index.html\""
+accepted "$tmp/q-ok" "quoted URL not stripped/mirrored"
+run_only "$tmp/q-bad" '"foo'
+refused "dangling-quote argument not refused cleanly"
+run_only "$tmp/q-lone" '"'
+refused "lone-quote argument not refused cleanly"
exit 0

tests/01_engine-rcfile.test (new executable file, 91 lines)

@@ -0,0 +1,91 @@
#!/bin/bash
#
# Config-file alias loading (no network). A .httrackrc in the working directory
# is read by optinclude_file(), whose cmdl_ins macro inserts each alias-expanded
# token into the x_argvblk block. That macro used to copy with an unbounded
# strcpy on a bare char*; it is now bounded (strlcpybuff + cmdl_room over the
# block capacity). Two properties are checked:
# 1. The bound does not truncate: a long user-agent alias reaches doit.log
# intact. user-agent expands to two tokens (-F <value>), so it exercises
# both cmdl_ins insertions.
# 2. The bound holds under exhaustion: a pathological .httrackrc whose alias
# expansions overflow the block aborts cleanly through the htssafe bounds
# check (a message naming htsalias.c) instead of overrunning the heap. The
# unbounded version segfaulted here.
# set -e with the intentional-nonzero httrack runs guarded explicitly (the
# crawls below are expected to fail/abort and their status is inspected by hand).
set -euo pipefail
# Resolve httrack to an absolute path before we cd: PATH may hold a build-relative
# entry that would not resolve from the temp directory.
bin=$(command -v httrack) || {
echo "FAIL: httrack not found on PATH"
exit 1
}
case "$bin" in
/*) ;;
*) bin="$(cd "$(dirname "$bin")" && pwd)/$(basename "$bin")" ;;
esac
tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_rcfile.XXXXXX") || exit 1
trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM
# --- 1. alias token survives the bound intact -------------------------------
d1="$tmp/intact"
mkdir -p "$d1"
echo '<html><body>hello</body></html>' >"$d1/index.html"
# optinclude_file() lowercases each config line, so the marker is lowercase to
# survive the comparison verbatim.
marker='zzz_rcfile_marker_0123456789_abcdefghijklmnopqrstuvwxyz_intact'
printf 'user-agent=%s\n' "$marker" >"$d1/.httrackrc"
# Run with no -O so the working-directory .httrackrc is loaded (an -O path makes
# the engine skip the rc files). Output lands in the temp dir. Guard the run so a
# nonzero exit is captured for the assertion instead of tripping set -e.
rc=0
(cd "$d1" && "$bin" "file://$d1/index.html" --quiet -n >.log 2>&1) || rc=$?
test "$rc" -eq 0 || {
echo "FAIL: rc-file crawl exited $rc"
exit 1
}
test -f "$d1/hts-cache/doit.log" || {
echo "FAIL: doit.log not written (rc file not processed)"
exit 1
}
# A truncated copy would cut the token; require the full -F value.
grep -q -- "-F $marker" "$d1/hts-cache/doit.log" || {
echo "FAIL: user-agent alias missing or truncated in doit.log"
head -1 "$d1/hts-cache/doit.log"
exit 1
}
# --- 2. block exhaustion aborts through the bound, not the heap -------------
d2="$tmp/exhaust"
mkdir -p "$d2"
echo '<html><body>hi</body></html>' >"$d2/index.html"
# Each line inserts ~two tokens of ~200 bytes; 400 lines overflow the block's
# fixed slack (current_size + 32768) many times over, deterministically.
val=$(printf 'a%.0s' $(seq 1 200))
for _ in $(seq 1 400); do
printf 'user-agent=%s\n' "$val"
done >"$d2/.httrackrc"
# The process aborts (httrack turns the fatal signal into exit 134 either way),
# so the exit code does not distinguish the bounded abort from a heap overflow;
# the stderr diagnostic does. The htssafe bounds check names the offending file.
# Expected to fail, so the nonzero exit is swallowed; only the log is inspected.
(cd "$d2" && "$bin" "file://$d2/index.html" --quiet -n >.log 2>&1) || true
grep -Eq "overflow while copying.*htsalias\.c" "$d2/.log" || {
echo "FAIL: exhausted rc file did not abort through the htsalias.c bound"
echo "(an unbounded copy would overrun the heap here)"
tail -3 "$d2/.log"
exit 1
}
exit 0

View File

@@ -25,6 +25,7 @@ TESTS = \
01_engine-idna.test \
01_engine-mime.test \
01_engine-parse.test \
+01_engine-rcfile.test \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \

View File

@@ -499,6 +499,7 @@ TESTS = \
01_engine-idna.test \
01_engine-mime.test \
01_engine-parse.test \
+01_engine-rcfile.test \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \