Compare commits

15 Commits

Author SHA1 Message Date
Xavier Roche
3b2d7afdaa Merge pull request #393 from xroche/fix/empty-footer-doitlog-106
Keep empty quoted args when reloading doit.log (#106)
2026-06-19 08:13:19 +02:00
Xavier Roche
6ee539619e htscoremain: keep empty quoted args when reloading doit.log (#106)
An empty footer (-%F "") is written to hts-cache/doit.log correctly as the
two-character token "", and next_token() unquotes it back to an empty string.
But the doit.log reload loop only re-inserted a token when strnotempty(lastp),
which dropped the empty one. With its argument gone, -%F absorbed the following
token (or had none), so a no-url --continue/--update reprise misparsed and
failed.

Track whether the token started with a quote (before next_token() strips it in
place) and keep it even when empty, so "" survives the round-trip. Whitespace
gaps still produce no token, so spacing behavior is unchanged.

01_engine-doitlog.test gains a scenario that mirrors with -%F "" -r2, then on
the no-url reprise checks the regenerated doit.log still round-trips the empty
token -- probing the reader's rebuilt argv, not just that the reprise didn't
crash. The trailing -r2 makes a dropped-token bug visible (it shifts into -%F's
slot and panics) rather than a harmless run off the end of argv. Reverting only
the guard makes the scenario fail (reprise exits 255).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-19 08:09:57 +02:00
Xavier Roche
fb098b27b4 Merge pull request #392 from xroche/fix/cookie-rfc6265-151
Drop $Version/$Path from the request Cookie header (#151)
2026-06-18 22:42:47 +02:00
Xavier Roche
5f6a3fb917 htslib: drop $Version/$Path from request Cookie header (#151)
The request "Cookie:" header was built in the obsolete RFC 2965 style,
emitting "$Version=1" before the first cookie and a "$Path=..." attribute
after every value:

  Cookie: $Version=1; name=value; $Path=/; has_js=1; $Path=/

Servers expecting RFC 6265 treat $Version and $Path as stray cookies and
reject or misread the request. Emit bare name=value pairs joined by "; ":

  Cookie: name=value; has_js=1

The cookie loop is factored out of http_sendhead into append_cookie_header
(same logic, same buffer), with a thin http_cookie_header_selftest wrapper
so the exact code path can be unit-tested. A new hidden "-#Q" subcommand
builds the header for two same-domain cookies plus one on a different
domain (which must be filtered out) and checks the output is the clean
RFC 6265 form with no $Version/$Path and no cross-domain leak; driven by
tests/01_engine-cookies.test.
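
For illustration, the target header shape can be produced by a few lines of C. This is a sketch under stated assumptions: `struct kv` and `build_cookie_header` are hypothetical names for this note, and domain/path matching (which append_cookie_header performs through cookie_find) is left out entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cookie record for this sketch (not HTTrack's t_cookie). */
struct kv {
  const char *name;
  const char *value;
};

/* Write "Cookie: n1=v1; n2=v2\r\n" into dst, RFC 6265 style: bare pairs,
   no $Version or $Path. Returns the number of pairs written, or -1 if dst
   is too small. Leaves dst empty when n == 0. */
static int build_cookie_header(const struct kv *c, size_t n, char *dst,
                               size_t size) {
  size_t pos = 0;
  size_t i;

  if (size == 0)
    return -1;
  dst[0] = '\0';
  for (i = 0; i < n; i++) {
    int w = snprintf(dst + pos, size - pos, "%s%s=%s",
                     i == 0 ? "Cookie: " : "; ", c[i].name, c[i].value);
    if (w < 0 || (size_t) w >= size - pos)
      return -1; /* truncated */
    pos += (size_t) w;
  }
  if (n > 0) {
    int w = snprintf(dst + pos, size - pos, "\r\n");
    if (w < 0 || (size_t) w >= size - pos)
      return -1;
  }
  return (int) n;
}
```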

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 22:12:28 +02:00
Xavier Roche
f9e676dbe3 Merge pull request #391 from xroche/feature/api-enum-callsites-savename83
htsopt: name the savename_83 enum and finish the call-site constant adoption
2026-06-18 21:43:34 +02:00
Xavier Roche
1b440c44b5 htsopt: name savename_83 enum and adopt enum constants at call sites
Type opt->savename_83 as a new hts_savename_83 enum (LONG/DOS/ISO9660 =
0/1/2) and replace the remaining magic-number literals for the already-
typed verbosedisplay and savename_delayed fields with their named enum
constants across the engine.

Behavior-preserving: every constant equals the literal it replaces, and a
C enum is int-sized, so struct layout is unchanged (sizeof(httrackp) and
offsetof(savename_83) are identical to origin/master, no soname bump). The
-L option block is deliberately reflowed to clang-format style, which is
what made the savename_83 retype tractable. Bitmask fields (travel/seeker/
getmode/parsejava/hostcontrol) intentionally stay int with named bit enums,
per the existing flags-as-enum split.
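
The layout claim can be checked in isolation with sizeof/offsetof. The sketch below uses hypothetical stand-in structs (`opt_before`, `opt_after`) and a `demo_` enum rather than the real httrackp; it holds where the compiler gives enums the size of int, which is the default for GCC and Clang on mainstream targets (an option such as -fshort-enums would change that).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the new enum; values mirror LONG/DOS/ISO9660 = 0/1/2. */
typedef enum demo_savename_83 {
  DEMO_SAVENAME_83_LONG = 0,
  DEMO_SAVENAME_83_DOS = 1,
  DEMO_SAVENAME_83_ISO9660 = 2
} demo_savename_83;

/* Hypothetical miniature option structs: same fields, one retyped. */
struct opt_before {
  int a;
  int savename_83;
  int b;
};
struct opt_after {
  int a;
  demo_savename_83 savename_83;
  int b;
};

/* Returns 1 when retyping the field changed neither size nor offsets. */
static int layout_unchanged(void) {
  return sizeof(struct opt_before) == sizeof(struct opt_after) &&
         offsetof(struct opt_before, savename_83) ==
             offsetof(struct opt_after, savename_83) &&
         offsetof(struct opt_before, b) == offsetof(struct opt_after, b);
}
```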

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 21:03:33 +02:00
Xavier Roche
ac6dd1a570 Merge pull request #390 from xroche/fix/copy-htsopt-unsigned-enum-guards
copy_htsopt silently drops boolean option fields
2026-06-18 20:46:00 +02:00
Xavier Roche
4549ec3695 htsopt: fix copy_htsopt dropping unsigned-enum fields
copy_htsopt() copies each field only when it is not the "-1 means unset"
sentinel, written as `if (from->X > -1)`. The boolean/enum option
migrations turned nearlink, errpage and parseall into hts_boolean, which
GCC backs with unsigned int. `unsigned > -1` is always false, so those
three fields silently stopped being copied.

Cast to int at the guard to restore the signed sentinel test. Add a
hidden `httrack -#9` self-test that drives copy_htsopt over distinct
boolean values plus an int positive control (tests/01_engine-copyopt.test);
it fails on the unfixed guard.
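
The failure mode is easy to reproduce outside HTTrack. In this sketch `flag_t` is a plain unsigned int standing in for the unsigned-backed enum, so the behavior does not depend on how a given compiler represents enums; the function names are hypothetical. GCC's -Wsign-compare reports the broken comparison.

```c
#include <assert.h>

/* Stand-in for an unsigned-backed option field (assumption for this sketch). */
typedef unsigned int flag_t;

/* Broken guard: -1 is converted to UINT_MAX before comparing, and no
   unsigned value is greater than UINT_MAX, so this always returns 0. */
static int is_set_broken(flag_t v) { return v > -1; }

/* Fixed guard: compare as int so -1 stays the "unset" sentinel. Converting
   an out-of-range unsigned value to int is implementation-defined; GCC and
   Clang wrap it, so (flag_t) -1 reads back as -1. */
static int is_set_fixed(flag_t v) { return (int) v > -1; }
```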

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 20:25:42 +02:00
Xavier Roche
ac56c31b24 Merge pull request #389 from xroche/fix/travel-test-all-enum
htsopt: fold HTS_TRAVEL_TEST_ALL into the hts_travel_scope enum
2026-06-18 18:40:33 +02:00
Xavier Roche
ee6beeeb7d htsopt: fold HTS_TRAVEL_TEST_ALL into the hts_travel_scope enum
The -t "test all" flag was a stray #define sitting next to the scope
enum; make it an enum constant so the named travel values live in one
place. The mask (HTS_TRAVEL_SCOPE_MASK) stays a #define: it selects the
scope out of opt->travel, it is not a member of the value set.

Name and value (1 << 8) are unchanged, so every use site compiles
identically and opt->travel stays plain int. No ABI change.
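
A sketch of the resulting split between value set and mask follows. Only the (1 << 8) value comes from the commit; the `DEMO_` names, the two scope values and the 0xff mask width are assumptions made for illustration.

```c
#include <assert.h>

/* Named travel values live in one enum, including the flag bit. */
typedef enum demo_travel {
  DEMO_TRAVEL_SAME_ADDRESS = 0, /* hypothetical scope value */
  DEMO_TRAVEL_SAME_DOMAIN = 1,  /* hypothetical scope value */
  DEMO_TRAVEL_TEST_ALL = 1 << 8 /* flag bit, value as stated in the commit */
} demo_travel;

/* The mask is not a member of the value set, so it stays a macro.
   0xff is an assumed width for this sketch. */
#define DEMO_TRAVEL_SCOPE_MASK 0xff

/* The field itself stays plain int, as in the commit. */
static int travel_scope(int travel) { return travel & DEMO_TRAVEL_SCOPE_MASK; }

static int travel_test_all(int travel) {
  return (travel & DEMO_TRAVEL_TEST_ALL) != 0;
}
```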

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 18:29:23 +02:00
Xavier Roche
6788bda380 Merge pull request #388 from xroche/feature/api-enum-fields-2
htsopt: type debug, savename_delayed and verbosedisplay as named enums
2026-06-18 18:25:44 +02:00
Xavier Roche
7ead8d595e htsopt: type three more option fields as named enums
debug becomes hts_log_type (it already stored LOG_* values; the int
declaration was a latent type hole), savename_delayed becomes a new
hts_savename_delayed { NONE, SOFT, HARD }, and verbosedisplay becomes a
new hts_verbosedisplay { NONE, SIMPLE, FULL }. hostcontrol stays int but
its bits are now named by a new hts_hostcontrol flags enum, matching the
existing getmode/seeker/travel/htsparsejava_flags pattern.

A C enum is int-sized, so struct layout, field offsets and
sizeof(httrackp) are unchanged: no ABI break, no soname bump. The three
sscanf("%d", ...) sites that fill these fields now write through an int*
(size-identical) to keep the format type exact.
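
For comparison, the same "%d must write into a real int" constraint can also be met without a pointer cast by scanning into a temporary. This is an alternative technique sketched for illustration, not what the commit does; `demo_verbose` and `parse_verbose` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

typedef enum demo_verbose {
  DEMO_VERBOSE_NONE,
  DEMO_VERBOSE_SIMPLE,
  DEMO_VERBOSE_FULL
} demo_verbose;

/* Parse a decimal level from `arg` into *out. Returns 1 on success and
   leaves *out untouched (returning 0) when `arg` holds no number. The value
   is not range-checked, matching the permissive style of the option parser. */
static int parse_verbose(const char *arg, demo_verbose *out) {
  int v;

  if (sscanf(arg, "%d", &v) != 1)
    return 0;
  *out = (demo_verbose) v;
  return 1;
}
```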

These enums are unsigned-backed (all enumerators non-negative), so the
non-negative debug comparisons (debug < level, debug > LOG_INFO, etc.)
now compile to unsigned jumps. debug is never negative, never sscanf'd
and never tested against a negative bound, so the result is unchanged;
disassembly is otherwise byte-identical bar instruction scheduling.

savename_83 is left as int on purpose: its sscanf sits in the -L parser
block whose old indentation does not round-trip through clang-format.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 18:11:19 +02:00
Xavier Roche
93f502990c Merge pull request #387 from xroche/feature/api-bool-returns
Return hts_boolean from the yes/no library functions
2026-06-18 17:38:48 +02:00
Xavier Roche
0f4b2596b2 htslib: return hts_boolean from the yes/no library functions
The exported API had many functions returning int where the int is really a
yes/no answer. Type the 14 genuinely-boolean ones as hts_boolean
(catch_url, dir_exists, is_dyntype, may_unknown, hts_findnext,
hts_findisdir/isfile/issystem, hts_has_stopped, hts_addurl, hts_resetaddurl,
hts_log, get_httptype_sized, guess_httptype_sized) and the three boolean int
parameters likewise (get_httptype_sized's flag, unescape_http_unharm's no_high,
hts_request_stop's force).

hts_boolean moves from htsopt.h to htsglobal.h so the library header, which only
forward-declares httrackp and does not include htsopt.h, can see the type.

The audit deliberately left alone the functions whose name suggests a boolean
but whose value is not 0/1: hts_is_testing returns 0..5, hts_is_exiting and
is_knowntype/is_userknowntype are tri-state, structcheck and the *_utf8 wrappers
are POSIX 0/-1, hts_findgetsize is a size, hts_main is an exit code, and
copy_htsopt returns 0 for success (a bool would read backwards). hts_setpause
and hts_is_parsing keep int params because they gate on '>= 0', not 0/1.

Not an ABI break: int -> int-sized enum is the same calling convention for both
return values (eax) and parameters, and enum<->int is implicit for callers, so
already-compiled consumers keep working. Verified by comparing per-object
disassembly against master: 39 of 45 objects byte-identical, htslib differs only
in __LINE__ immediates, and the five caller/definer objects differ only in
register allocation and return-block merging (no control-flow or value change).
make check passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 09:19:36 +02:00
Xavier Roche
4a676bb5e1 Merge pull request #386 from xroche/feature/api-boolean-enum
Type the boolean option fields as a named enum
2026-06-18 09:04:14 +02:00
18 changed files with 371 additions and 168 deletions

View File

@@ -3838,7 +3838,7 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
/* funny log for commandline users */ /* funny log for commandline users */
//if (!opt->quiet) { //if (!opt->quiet) {
// petite animation // petite animation
if (opt->verbosedisplay == 1) { if (opt->verbosedisplay == HTS_VERBOSE_SIMPLE) {
if (back[i].status == STATUS_READY) { if (back[i].status == STATUS_READY) {
if (back[i].r.statuscode == HTTP_OK) if (back[i].r.statuscode == HTTP_OK)
printf("* %s%s (" LLintP " bytes) - OK" VT_CLREOL "\r", printf("* %s%s (" LLintP " bytes) - OK" VT_CLREOL "\r",

View File

@@ -135,7 +135,8 @@ HTSEXT_API T_SOC catch_url_init(int *port, /* 128 bytes */ char *adr) {
// returns 0 if error // returns 0 if error
// url: buffer where URL must be stored - or ip:port in case of failure // url: buffer where URL must be stored - or ip:port in case of failure
// data: 32Kb // data: 32Kb
HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) { HTSEXT_API hts_boolean catch_url(T_SOC soc, char *url, char *method,
char *data) {
int retour = 0; int retour = 0;
// connexion (accept) // connexion (accept)

View File

@@ -2585,7 +2585,7 @@ static int mkdir_compat(const char *pathname) {
/* path must end with "/" or with the finename (/tmp/bar/ or /tmp/bar/foo.zip) */ /* path must end with "/" or with the finename (/tmp/bar/ or /tmp/bar/foo.zip) */
/* Note: preserve errno */ /* Note: preserve errno */
HTSEXT_API int dir_exists(const char *path) { HTSEXT_API hts_boolean dir_exists(const char *path) {
const int err = errno; const int err = errno;
STRUCT_STAT st; STRUCT_STAT st;
char BIGSTK file[HTS_URLMAXSIZE * 2]; char BIGSTK file[HTS_URLMAXSIZE * 2];
@@ -3342,7 +3342,8 @@ int back_fill(struct_back * sback, httrackp * opt, cache_back * cache,
int ptr, int numero_passe) { int ptr, int numero_passe) {
int n = back_pluggable_sockets(sback, opt); int n = back_pluggable_sockets(sback, opt);
if (opt->savename_delayed == 2 && !opt->delayed_cached) /* cancel (always delayed) */ if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD &&
!opt->delayed_cached) /* cancel (always delayed) */
return 0; return 0;
if (n > 0) { if (n > 0) {
int p; int p;
@@ -3646,7 +3647,7 @@ HTSEXT_API int hts_setpause(httrackp * opt, int p) {
} }
// ask for termination // ask for termination
HTSEXT_API int hts_request_stop(httrackp * opt, int force) { HTSEXT_API int hts_request_stop(httrackp *opt, hts_boolean force) {
if (opt != NULL) { if (opt != NULL) {
hts_log_print(opt, LOG_ERROR, "Exit requested by shell or user"); hts_log_print(opt, LOG_ERROR, "Exit requested by shell or user");
hts_mutexlock(&opt->state.lock); hts_mutexlock(&opt->state.lock);
@@ -3656,7 +3657,7 @@ HTSEXT_API int hts_request_stop(httrackp * opt, int force) {
return 0; return 0;
} }
HTSEXT_API int hts_has_stopped(httrackp * opt) { HTSEXT_API hts_boolean hts_has_stopped(httrackp *opt) {
int ended; int ended;
hts_mutexlock(&opt->state.lock); hts_mutexlock(&opt->state.lock);
ended = opt->state.is_ended; ended = opt->state.is_ended;
@@ -3678,12 +3679,12 @@ HTSEXT_API int hts_has_stopped(httrackp * opt) {
//} //}
// ajout d'URL // ajout d'URL
// -1 : erreur // -1 : erreur
HTSEXT_API int hts_addurl(httrackp * opt, char **url) { HTSEXT_API hts_boolean hts_addurl(httrackp *opt, char **url) {
if (url) if (url)
opt->state._hts_addurl = url; opt->state._hts_addurl = url;
return (opt->state._hts_addurl != NULL); return (opt->state._hts_addurl != NULL);
} }
HTSEXT_API int hts_resetaddurl(httrackp * opt) { HTSEXT_API hts_boolean hts_resetaddurl(httrackp *opt) {
opt->state._hts_addurl = NULL; opt->state._hts_addurl = NULL;
return (opt->state._hts_addurl != NULL); return (opt->state._hts_addurl != NULL);
} }
@@ -3702,7 +3703,9 @@ HTSEXT_API int copy_htsopt(const httrackp * from, httrackp * to) {
 if (from->maxsoc > 0)
 to->maxsoc = from->maxsoc;
-if (from->nearlink > -1)
+/* hts_boolean/enum fields are unsigned (GCC), so a bare `> -1` unset-guard
+is always false; cast to int to keep the -1 "unset" sentinel test. */
+if ((int) from->nearlink > -1)
 to->nearlink = from->nearlink;
 if (from->timeout > -1)
@@ -3729,10 +3732,10 @@ HTSEXT_API int copy_htsopt(const httrackp * from, httrackp * to) {
if (from->hostcontrol > -1) if (from->hostcontrol > -1)
to->hostcontrol = from->hostcontrol; to->hostcontrol = from->hostcontrol;
if (from->errpage > -1) if ((int) from->errpage > -1)
to->errpage = from->errpage; to->errpage = from->errpage;
if (from->parseall > -1) if ((int) from->parseall > -1)
to->parseall = from->parseall; to->parseall = from->parseall;
// test all: bit 8 de travel // test all: bit 8 de travel
@@ -3844,7 +3847,7 @@ int htsAddLink(htsmoduleStruct * str, char *link) {
a = opt->savename_type; a = opt->savename_type;
b = opt->savename_83; b = opt->savename_83;
opt->savename_type = 0; opt->savename_type = 0;
opt->savename_83 = 0; opt->savename_83 = HTS_SAVENAME_83_LONG;
// note: adr,fil peuvent être patchés // note: adr,fil peuvent être patchés
r = r =
url_savename(&afs, NULL, NULL, NULL, opt, sback, cache, hashptr, ptr, numero_passe, url_savename(&afs, NULL, NULL, NULL, opt, sback, cache, hashptr, ptr, numero_passe,

View File

@@ -612,12 +612,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
/* Terminal is a tty, may ask questions and display funny information */ /* Terminal is a tty, may ask questions and display funny information */
if (isatty(1)) { if (isatty(1)) {
opt->quiet = 0; opt->quiet = 0;
opt->verbosedisplay = 1; opt->verbosedisplay = HTS_VERBOSE_SIMPLE;
} }
/* Not a tty, no stdin input or funny output! */ /* Not a tty, no stdin input or funny output! */
else { else {
opt->quiet = 1; opt->quiet = 1;
opt->verbosedisplay = 0; opt->verbosedisplay = HTS_VERBOSE_NONE;
} }
#endif #endif
@@ -953,9 +953,11 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
p = buff; p = buff;
do { do {
int insert_after_argc; int insert_after_argc;
int quoted; /* "" unquotes to empty but is still a real token (#106) */
// read next // read next
lastp = p; lastp = p;
quoted = (p != NULL && *p == '"');
if (p) { if (p) {
p = next_token(p, 1); p = next_token(p, 1);
if (p) { if (p) {
@@ -966,7 +968,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
/* Insert parameters BUT so that they can be in the same order */ /* Insert parameters BUT so that they can be in the same order */
if (lastp) { if (lastp) {
if (strnotempty(lastp)) { if (strnotempty(lastp) || quoted) {
insert_after_argc = argc - insert_after; insert_after_argc = argc - insert_after;
cmdl_ins(lastp, insert_after_argc, (argv + insert_after), x_argvblk, cmdl_ins(lastp, insert_after_argc, (argv + insert_after), x_argvblk,
x_argvblk_size, x_ptr); x_argvblk_size, x_ptr);
@@ -1815,24 +1817,22 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
 com++;
 }
 break;
-case 'L':
-{
-sscanf(com + 1, "%d", &opt->savename_83);
-switch (opt->savename_83) {
-case 0: // 8-3 (ISO9660 L1)
-opt->savename_83 = 1;
-break;
-case 1:
-opt->savename_83 = 0;
-break;
-default: // 2 == ISO9660 (ISO9660 L2)
-opt->savename_83 = 2;
-break;
-}
-while(isdigit((unsigned char) *(com + 1)))
-com++;
-}
-break;
+case 'L': {
+sscanf(com + 1, "%d", (int *) &opt->savename_83);
+switch (opt->savename_83) {
+case 0: // 8-3 (ISO9660 L1)
+opt->savename_83 = HTS_SAVENAME_83_DOS;
+break;
+case 1:
+opt->savename_83 = HTS_SAVENAME_83_LONG;
+break;
+default: // 2 == ISO9660 (ISO9660 L2)
+opt->savename_83 = HTS_SAVENAME_83_ISO9660;
+break;
+}
+while (isdigit((unsigned char) *(com + 1)))
+com++;
+} break;
 case 's':
 if (isdigit((unsigned char) *(com + 1))) {
 sscanf(com + 1, "%d", (int *) &opt->robots);
@@ -1989,9 +1989,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
} }
break; // url hack break; // url hack
case 'v': case 'v':
opt->verbosedisplay = 2; opt->verbosedisplay = HTS_VERBOSE_FULL;
if (isdigit((unsigned char) *(com + 1))) { if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->verbosedisplay); sscanf(com + 1, "%d", (int *) &opt->verbosedisplay);
while(isdigit((unsigned char) *(com + 1))) while(isdigit((unsigned char) *(com + 1)))
com++; com++;
} }
@@ -2004,9 +2004,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
} }
break; break;
case 'N': case 'N':
opt->savename_delayed = 2; opt->savename_delayed = HTS_SAVENAME_DELAYED_HARD;
if (isdigit((unsigned char) *(com + 1))) { if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->savename_delayed); sscanf(com + 1, "%d", (int *) &opt->savename_delayed);
while(isdigit((unsigned char) *(com + 1))) while(isdigit((unsigned char) *(com + 1)))
com++; com++;
} }
@@ -3096,6 +3096,78 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
htsmain_free(); htsmain_free();
return 0; return 0;
break; break;
case '9': { // copy_htsopt selftest: httrack -#9
httrackp *from = hts_create_opt();
httrackp *to = hts_create_opt();
int err = 0;
/* from-values differ from both the to-values and the
hts_create_opt() defaults (nearlink FALSE, errpage/parseall
TRUE), so a copy that no-ops or just resets to defaults is
caught too, not only the unsigned-guard bug. */
from->retry = 7; /* int field: positive control */
to->retry = 0;
from->nearlink = HTS_TRUE;
to->nearlink = HTS_FALSE;
from->errpage = HTS_FALSE;
to->errpage = HTS_TRUE;
from->parseall = HTS_FALSE;
to->parseall = HTS_TRUE;
copy_htsopt(from, to);
if (to->retry != 7)
err = 1;
if (to->nearlink != HTS_TRUE)
err = 1;
if (to->errpage != HTS_FALSE)
err = 1;
if (to->parseall != HTS_FALSE)
err = 1;
hts_free_opt(from);
hts_free_opt(to);
printf("copy-htsopt: %s\n", err ? "FAIL" : "OK");
htsmain_free();
return err;
} break;
case 'Q': { // cookie request-header selftest: httrack -#Q
static t_cookie cookie;
char hdr[1024];
/* RFC 6265: bare name=value pairs, no $Version/$Path (#151). */
const char *expected = "Cookie: name=value; has_js=1" H_CRLF;
int err = 0;
const char *dom = "www.example.com";
int added;
cookie.max_len = (int) sizeof(cookie.data);
cookie.data[0] = '\0';
added = cookie_add(&cookie, "name", "value", dom, "/");
added |= cookie_add(&cookie, "has_js", "1", dom, "/");
/* different domain: must be filtered out */
added |= cookie_add(&cookie, "junk", "x", "other.org", "/");
if (added) {
printf("cookie-header: FAIL (cookie_add setup)\n");
htsmain_free();
return 1;
}
http_cookie_header_selftest(&cookie, dom, "/", hdr,
sizeof(hdr));
if (strcmp(hdr, expected) != 0)
err = 1;
if (strstr(hdr, "$Version") != NULL ||
strstr(hdr, "$Path") != NULL)
err = 1;
if (strstr(hdr, "junk") != NULL) // wrong-domain cookie leaked
err = 1;
printf("cookie-header: %s\n", err ? "FAIL" : "OK");
if (err)
printf(" got: %s\n", hdr);
htsmain_free();
return err;
} break;
case '!': case '!':
HTS_PANIC_PRINTF HTS_PANIC_PRINTF
("Option #! is disabled for security reasons"); ("Option #! is disabled for security reasons");

View File

@@ -242,6 +242,14 @@ Please visit our Website: http://www.httrack.com
#define HTS_NOPARAM "(none)" #define HTS_NOPARAM "(none)"
#define HTS_NOPARAM2 "\"(none)\"" #define HTS_NOPARAM2 "\"(none)\""
/* Boolean flag for option fields and API yes/no returns. An enum (not C bool)
so it stays int-sized: option fields keep the httrackp layout/ABI, and a
return type stays compatible with the int it replaces. */
#ifndef HTS_DEF_DEFSTRUCT_hts_boolean
#define HTS_DEF_DEFSTRUCT_hts_boolean
typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
#endif
/* Larger/smaller of two values. Macros: arguments are evaluated twice. */ /* Larger/smaller of two values. Macros: arguments are evaluated twice. */
#define maximum(A,B) ( (A) > (B) ? (A) : (B) ) #define maximum(A,B) ( (A) > (B) ? (A) : (B) )

View File

@@ -874,6 +874,50 @@ static void print_buffer(buff_struct*const str, const char *format, ...) {
assertf(str->pos < str->capacity); assertf(str->pos < str->capacity);
} }
/* Append the request "Cookie:" header line for every stored cookie matching
domain/path. RFC 6265 form: bare "name=value" pairs joined by "; ", no
$Version/$Path attributes (those are RFC 2965 syntax that modern servers
reject, issue #151). Returns the number of cookies emitted. */
static int append_cookie_header(buff_struct *bstr, t_cookie *cookie,
const char *domain, const char *path) {
char buffer[8192];
char *b;
int cook = 0;
int max_cookies = 8;
if (cookie == NULL)
return 0;
b = cookie->data;
do {
b = cookie_find(b, "", domain, path); // next matching cookie
if (b != NULL) {
max_cookies--;
if (!cook) {
print_buffer(bstr, "Cookie: ");
cook = 1;
} else
print_buffer(bstr, "; ");
print_buffer(bstr, "%s", cookie_get(buffer, b, 5));
print_buffer(bstr, "=%s", cookie_get(buffer, b, 6));
b = cookie_nextfield(b);
}
} while (b != NULL && max_cookies > 0);
if (cook)
print_buffer(bstr, H_CRLF);
return cook;
}
/* Self-test entry for append_cookie_header(): build the request Cookie line
into dst (always NUL-terminated). Returns the number of cookies emitted. */
int http_cookie_header_selftest(t_cookie *cookie, const char *domain,
const char *path, char *dst, size_t dst_size) {
buff_struct bstr = {dst, dst_size, 0};
assertf(dst != NULL && dst_size > 0);
dst[0] = '\0';
return append_cookie_header(&bstr, cookie, domain, path);
}
// envoi d'une requète // envoi d'une requète
int http_sendhead(httrackp * opt, t_cookie * cookie, int mode, int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
const char *xsend, const char *adr, const char *fil, const char *xsend, const char *adr, const char *fil,
@@ -1048,34 +1092,9 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
 search_tag + strlen(POSTTOK) + 1))));
 }
 }
-// gestion cookies?
+// send stored cookies matching this host/path
 if (cookie) {
-char buffer[8192];
-char *b = cookie->data;
-int cook = 0;
-int max_cookies = 8;
-do {
-b = cookie_find(b, "", jump_identification_const(adr), fil); // prochain cookie satisfaisant aux conditions
-if (b != NULL) {
-max_cookies--;
-if (!cook) {
-print_buffer(&bstr, "Cookie: $Version=1; ");
-cook = 1;
-} else
-print_buffer(&bstr, "; ");
-print_buffer(&bstr, "%s", cookie_get(buffer, b, 5));
-print_buffer(&bstr, "=%s", cookie_get(buffer, b, 6));
-print_buffer(&bstr, "; $Path=%s", cookie_get(buffer, b, 2));
-b = cookie_nextfield(b);
-}
-} while(b != NULL && max_cookies > 0);
-if (cook) { // on a envoyé un (ou plusieurs) cookie?
-print_buffer(&bstr, H_CRLF);
-#if DEBUG_COOK
-printf("Header:\n%s\n", bstr.buffer);
-#endif
-}
+append_cookie_header(&bstr, cookie, jump_identification_const(adr), fil);
 }
 // gérer le keep-alive (garder socket)
 if (retour->req.http11 && !retour->req.nokeepalive) {
@@ -3646,8 +3665,9 @@ HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const cha
// DOES NOT DECODE %25 (part of CHAR_DELIM) // DOES NOT DECODE %25 (part of CHAR_DELIM)
// no_high & 1: decode high chars // no_high & 1: decode high chars
// no_high & 2: decode space // no_high & 2: decode space
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size, HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s, const int no_high) { const char *s,
const hts_boolean no_high) {
size_t i, j; size_t i, j;
RUNTIME_TIME_CHECK_SIZE(size); RUNTIME_TIME_CHECK_SIZE(size);
@@ -3931,8 +3951,8 @@ void hts_replace(char *s, char from, char to) {
// guess a local file's mime type (e.g. fil="toto.gif" -> s="image/gif") // guess a local file's mime type (e.g. fil="toto.gif" -> s="image/gif")
// returns 1 if a type was written to s, 0 otherwise // returns 1 if a type was written to s, 0 otherwise
int guess_httptype_sized(httrackp *opt, char *s, size_t ssize, hts_boolean guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) { const char *fil) {
return get_httptype_sized(opt, s, ssize, fil, 1); return get_httptype_sized(opt, s, ssize, fil, 1);
} }
@@ -3945,8 +3965,8 @@ void guess_httptype(httrackp * opt, char *s, const char *fil) {
// write the mime type for fil into s (capacity ssize) // write the mime type for fil into s (capacity ssize)
// flag: 1 to always return a type (the "application/..." / octet-stream // flag: 1 to always return a type (the "application/..." / octet-stream
// fallback) returns 1 if a type was written to s, 0 otherwise // fallback) returns 1 if a type was written to s, 0 otherwise
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize, HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag) { const char *fil, hts_boolean flag) {
// userdef overrides get_httptype (a rule with an empty value, e.g. "--assume // userdef overrides get_httptype (a rule with an empty value, e.g. "--assume
// cgi=", matches but writes nothing: report it as "no type" like the old // cgi=", matches but writes nothing: report it as "no type" like the old
// code, whose callers tested strnotempty(s)) // code, whose callers tested strnotempty(s))
@@ -4196,7 +4216,7 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil) {
// page dynamique? // page dynamique?
// is_dyntype(get_ext("foo.asp")) // is_dyntype(get_ext("foo.asp"))
HTSEXT_API int is_dyntype(const char *fil) { HTSEXT_API hts_boolean is_dyntype(const char *fil) {
int j = 0; int j = 0;
if (!fil) if (!fil)
@@ -4214,7 +4234,7 @@ HTSEXT_API int is_dyntype(const char *fil) {
// types critiques qui ne doivent pas être changés car renvoyés par des serveurs qui ne // types critiques qui ne doivent pas être changés car renvoyés par des serveurs qui ne
// connaissent pas le type // connaissent pas le type
int may_unknown(httrackp * opt, const char *st) { hts_boolean may_unknown(httrackp *opt, const char *st) {
int j = 0; int j = 0;
// types média // types média
@@ -5236,7 +5256,8 @@ HTSEXT_API int hts_uninit_module(void) {
} }
// legacy. do not use // legacy. do not use
HTSEXT_API int hts_log(httrackp * opt, const char *prefix, const char *msg) { HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
const char *msg) {
if (opt->log != NULL) { if (opt->log != NULL) {
fspc(opt, opt->log, prefix); fspc(opt, opt->log, prefix);
fprintf(opt->log, "%s" LF, msg); fprintf(opt->log, "%s" LF, msg);
@@ -5466,9 +5487,10 @@ HTSEXT_API httrackp *hts_create_opt(void) {
"Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"); "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)");
StringCopy(opt->referer, ""); StringCopy(opt->referer, "");
StringCopy(opt->from, ""); StringCopy(opt->from, "");
opt->savename_83 = 0; // noms longs par défaut opt->savename_83 = HTS_SAVENAME_83_LONG; // long names by default
opt->savename_type = 0; // avec structure originale opt->savename_type = 0; // avec structure originale
opt->savename_delayed = 2; // hard delayed type (default) opt->savename_delayed =
HTS_SAVENAME_DELAYED_HARD; // always delay the type check (default)
opt->delayed_cached = HTS_TRUE; opt->delayed_cached = HTS_TRUE;
opt->mimehtml = HTS_FALSE; opt->mimehtml = HTS_FALSE;
opt->parsejava = HTSPARSE_DEFAULT; // parser classes opt->parsejava = HTSPARSE_DEFAULT; // parser classes
@@ -5493,7 +5515,7 @@ HTSEXT_API httrackp *hts_create_opt(void) {
opt->parseall = HTS_TRUE; opt->parseall = HTS_TRUE;
opt->parsedebug = HTS_FALSE; opt->parsedebug = HTS_FALSE;
opt->norecatch = HTS_FALSE; opt->norecatch = HTS_FALSE;
opt->verbosedisplay = 0; // pas d'animation texte opt->verbosedisplay = HTS_VERBOSE_NONE; // no text animation
opt->sizehack = HTS_FALSE; opt->sizehack = HTS_FALSE;
opt->urlhack = HTS_TRUE; opt->urlhack = HTS_TRUE;
StringCopy(opt->footer, HTS_DEFAULT_FOOTER); StringCopy(opt->footer, HTS_DEFAULT_FOOTER);

View File

@@ -182,6 +182,11 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode, const char *xsend
const char *adr, const char *fil, const char *adr, const char *fil,
const char *referer_adr, const char *referer_fil, const char *referer_adr, const char *referer_fil,
htsblk * retour); htsblk * retour);
/* Build the request "Cookie:" header line for stored cookies matching
domain/path into dst (NUL-terminated). Exposed for the -#Q self-test;
wraps the same logic http_sendhead() uses. Returns cookies emitted. */
int http_cookie_header_selftest(t_cookie *cookie, const char *domain,
const char *path, char *dst, size_t dst_size);
//int newhttp(char* iadr,char* err=NULL); //int newhttp(char* iadr,char* err=NULL);
T_SOC newhttp(httrackp * opt, const char *iadr, htsblk * retour, int port, T_SOC newhttp(httrackp * opt, const char *iadr, htsblk * retour, int port,

View File

@@ -184,10 +184,11 @@ int url_savename(lien_adrfilsave *const afs,
/* 8-3 ? */ /* 8-3 ? */
switch (opt->savename_83) { switch (opt->savename_83) {
case 1: // 8-3 case HTS_SAVENAME_83_DOS: // 8-3
max_char = 8; max_char = 8;
break; break;
case 2: // Level 2 File names may be up to 31 characters. case HTS_SAVENAME_83_ISO9660: // Level 2 File names may be up to 31
// characters.
max_char = 31; max_char = 31;
break; break;
default: default:
@@ -324,7 +325,7 @@ int url_savename(lien_adrfilsave *const afs,
} }
/* replace shtml to html.. */ /* replace shtml to html.. */
if (opt->savename_delayed == 2) if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD)
is_html = -1; /* ALWAYS delay type */ is_html = -1; /* ALWAYS delay type */
else else
is_html = ishtml(opt, fil); is_html = ishtml(opt, fil);
@@ -363,7 +364,9 @@ int url_savename(lien_adrfilsave *const afs,
 ) {
 // tester type avec requète HEAD si on ne connait pas le type du fichier
 if (!((opt->check_type == 1) && (fil[strlen(fil) - 1] == '/'))) // slash doit être html?
-if (opt->savename_delayed == 2 || (ishtest = ishtml(opt, fil)) < 0) { // on ne sait pas si c'est un html ou un fichier..
+if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD ||
+(ishtest = ishtml(opt, fil)) <
+0) { // unsure whether it's html or a file
 // lire dans le cache
 htsblk r = cache_read_including_broken(opt, cache, adr, fil); // test uniquement
@@ -393,11 +396,12 @@ int url_savename(lien_adrfilsave *const afs,
 }
 #endif
 //
-} else if (opt->savename_delayed != 2 && is_userknowntype(opt, fil)) { /* PATCH BY BRIAN SCHRÖDER.
-Lookup mimetype not only by extension,
-but also by filename */
-/* Note: "foo.cgi => text/html" means that foo.cgi shall have the text/html MIME file type,
-that is, ".html" */
+} else if (opt->savename_delayed != HTS_SAVENAME_DELAYED_HARD &&
+is_userknowntype(opt, fil)) { /* PATCH BY BRIAN SCHRÖDER.
+Lookup mimetype not only by extension,
+but also by filename */
+/* Note: "foo.cgi => text/html" means that foo.cgi shall have the
+text/html MIME file type, that is, ".html" */
 char BIGSTK mime[1024];
 mime[0] = ext[0] = '\0';
@@ -408,9 +412,13 @@ int url_savename(lien_adrfilsave *const afs,
} }
} }
} }
// note: if savename_delayed is enabled, the naming will be temporary (and slightly invalid!) // note: if savename_delayed is enabled, the naming will be temporary
// note: if we are about to stop (opt->state.stop), back_add() will fail later // (and slightly invalid!)
else if (opt->savename_delayed != 0 && !opt->state.stop) { //
// note: if we are about to stop (opt->state.stop), back_add() will
// fail later
else if (opt->savename_delayed != HTS_SAVENAME_DELAYED_NONE &&
!opt->state.stop) {
// Check if the file is ready in backing. We basically take the same logic as later. // Check if the file is ready in backing. We basically take the same logic as later.
// FIXME: we should cleanup and factorize this unholy mess // FIXME: we should cleanup and factorize this unholy mess
if (headers != NULL && headers->status >= 0 && !is_redirect) { if (headers != NULL && headers->status >= 0 && !is_redirect) {
@@ -698,7 +706,7 @@ int url_savename(lien_adrfilsave *const afs,
} }
// restore // restore
opt->state._hts_in_html_parsing = hihp; opt->state._hts_in_html_parsing = hihp;
} // cached? } // cached?
} }
} }
} }
@@ -1190,7 +1198,8 @@ int url_savename(lien_adrfilsave *const afs,
// Not used anymore unless non-delayed types. // Not used anymore unless non-delayed types.
// likewise, when the extension is missing, force one.. // likewise, when the extension is missing, force one..
// this avoids the incompatible /chez/toto and /chez/toto/index.html // this avoids the incompatible /chez/toto and /chez/toto/index.html
if (opt->savename_type != -1 && opt->savename_delayed != 2) { if (opt->savename_type != -1 &&
opt->savename_delayed != HTS_SAVENAME_DELAYED_HARD) {
char *a = afs->save + strlen(afs->save) - 1; char *a = afs->save + strlen(afs->save) - 1;
while((a > afs->save) && (*a != '.') && (*a != '/')) while((a > afs->save) && (*a != '.') && (*a != '/'))
@@ -1236,31 +1245,21 @@ int url_savename(lien_adrfilsave *const afs,
size_t i; size_t i;
for(i = 0 ; afs->save[i] != '\0' ; i++) { for(i = 0 ; afs->save[i] != '\0' ; i++) {
unsigned char c = (unsigned char) afs->save[i]; unsigned char c = (unsigned char) afs->save[i];
if (c < 32 // control if (c < 32 // control
|| c == 127 // unwise || c == 127 // unwise
|| c == '~' // unix unwise || c == '~' // unix unwise
|| c == '\\' // windows separator || c == '\\' // windows separator
|| c == ':' // windows forbidden || c == ':' // windows forbidden
|| c == '*' // windows forbidden || c == '*' // windows forbidden
|| c == '?' // windows forbidden || c == '?' // windows forbidden
|| c == '\"' // windows forbidden || c == '\"' // windows forbidden
|| c == '<' // windows forbidden || c == '<' // windows forbidden
|| c == '>' // windows forbidden || c == '>' // windows forbidden
|| c == '|' // windows forbidden || c == '|' // windows forbidden
//|| c == '@' // ? //|| c == '@' // ?
|| || (opt->savename_83 == HTS_SAVENAME_83_ISO9660 // CDROM
( && (c == '-' || c == '=' || c == '+'))) {
opt->savename_83 == 2 // CDROM afs->save[i] = '_';
&&
(
c == '-'
|| c == '='
|| c == '+'
)
)
)
{
afs->save[i] = '_';
} }
} }
} }
@@ -1521,7 +1520,8 @@ int url_savename(lien_adrfilsave *const afs,
char *a = afs->save + strlen(afs->save) - 1; char *a = afs->save + strlen(afs->save) - 1;
char *b; char *b;
int n = 2; int n = 2;
char collisionSeparator = ((opt->savename_83 != 2) ? '-' : '_'); char collisionSeparator =
((opt->savename_83 != HTS_SAVENAME_83_ISO9660) ? '-' : '_');
tempo[0] = '\0'; tempo[0] = '\0';
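The character filter in the hunk above can be sketched as a standalone function. This is a minimal illustration only: `demo_savename_83` and `demo_sanitize_name` are hypothetical names that mirror the enum values and the forbidden-character list shown in the diff, and are not part of HTTrack's API.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hts_savename_83 enum shown in the diff. */
typedef enum demo_savename_83 {
  DEMO_SAVENAME_83_LONG = 0,   /* long file names */
  DEMO_SAVENAME_83_DOS = 1,    /* DOS 8.3 names */
  DEMO_SAVENAME_83_ISO9660 = 2 /* ISO9660 level 2 names */
} demo_savename_83;

/* Replace characters that are unsafe in a local file name with '_', in place.
   In ISO9660 mode, '-', '=' and '+' are replaced as well.
   Returns the number of characters replaced. */
size_t demo_sanitize_name(char *name, demo_savename_83 mode) {
  size_t replaced = 0;
  size_t i;

  for (i = 0; name[i] != '\0'; i++) {
    const unsigned char c = (unsigned char) name[i];
    /* control characters, DEL, '~', and characters Windows forbids */
    const int forbidden = c < 32 || c == 127 || c == '~' || c == '\\' ||
                          c == ':' || c == '*' || c == '?' || c == '"' ||
                          c == '<' || c == '>' || c == '|';
    /* extra restrictions that only apply to the CD-ROM layout */
    const int cdrom = mode == DEMO_SAVENAME_83_ISO9660 &&
                      (c == '-' || c == '=' || c == '+');

    if (forbidden || cdrom) {
      name[i] = '_';
      replaced++;
    }
  }
  return replaced;
}
```

Calling `demo_sanitize_name()` on a buffer holding `a:b-c?.html` leaves the `-` alone in long-name mode and replaces it in ISO9660 mode, which is also why the collision separator in the last hunk switches from `-` to `_` for that layout.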


@@ -342,24 +342,44 @@ typedef enum hts_seeker {
HTS_SEEKER_UP = 1 << 1 /**< may ascend to parent directories */ HTS_SEEKER_UP = 1 << 1 /**< may ascend to parent directories */
} hts_seeker; } hts_seeker;
/* Link-following scope, stored in the low byte of opt->travel. */ /* opt->travel: link-following scope in the low byte, flags OR'd in above it. */
typedef enum hts_travel_scope { typedef enum hts_travel_scope {
HTS_TRAVEL_SAME_ADDRESS = 0, /**< stay on the same address (host) */ HTS_TRAVEL_SAME_ADDRESS = 0, /**< stay on the same address (host) */
HTS_TRAVEL_SAME_DOMAIN = 1, /**< stay on the same principal domain */ HTS_TRAVEL_SAME_DOMAIN = 1, /**< stay on the same principal domain */
HTS_TRAVEL_SAME_TLD = 2, /**< stay on the same TLD (e.g. .com) */ HTS_TRAVEL_SAME_TLD = 2, /**< stay on the same TLD (e.g. .com) */
HTS_TRAVEL_EVERYWHERE = 7 /**< follow links anywhere on the web */ HTS_TRAVEL_EVERYWHERE = 7, /**< follow links anywhere on the web */
HTS_TRAVEL_TEST_ALL = 1 << 8 /**< also test forbidden URLs (-t) */
} hts_travel_scope; } hts_travel_scope;
/* Flags OR'd into opt->travel above the scope value. */ /* Mask selecting the scope value out of opt->travel. */
#define HTS_TRAVEL_SCOPE_MASK 0xff /**< mask selecting the scope value */ #define HTS_TRAVEL_SCOPE_MASK 0xff
#define HTS_TRAVEL_TEST_ALL (1 << 8) /**< also test forbidden URLs (-t) */
/* Boolean option flag. An enum (not C bool) so the option fields stay int-sized /* Text progress display detail (opt->verbosedisplay). */
and the httrackp layout/ABI is unchanged. */ typedef enum hts_verbosedisplay {
#ifndef HTS_DEF_DEFSTRUCT_hts_boolean HTS_VERBOSE_NONE = 0, /**< no animated progress display (default) */
#define HTS_DEF_DEFSTRUCT_hts_boolean HTS_VERBOSE_SIMPLE = 1, /**< minimal single-line progress */
typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean; HTS_VERBOSE_FULL = 2 /**< full animated progress */
#endif } hts_verbosedisplay;
/* Delayed file-type resolution policy (opt->savename_delayed). */
typedef enum hts_savename_delayed {
HTS_SAVENAME_DELAYED_NONE = 0, /**< resolve the type immediately */
HTS_SAVENAME_DELAYED_SOFT = 1, /**< delay the type check when unknown */
HTS_SAVENAME_DELAYED_HARD = 2 /**< always delay the type check (default) */
} hts_savename_delayed;
/* Saved-name length layout (opt->savename_83). */
typedef enum hts_savename_83 {
HTS_SAVENAME_83_LONG = 0, /**< long file names (default) */
HTS_SAVENAME_83_DOS = 1, /**< DOS 8.3 names (ISO9660 level 1) */
HTS_SAVENAME_83_ISO9660 = 2 /**< ISO9660 level 2 names (up to 31 chars) */
} hts_savename_83;
/* Host-banning triggers (opt->hostcontrol bitmask). */
typedef enum hts_hostcontrol {
HTS_HOSTCONTROL_BAN_TIMEOUT = 1 << 0, /**< ban a timing-out host */
HTS_HOSTCONTROL_BAN_SLOW = 1 << 1 /**< ban a too-slow host */
} hts_hostcontrol;
#ifndef HTS_DEF_FWSTRUCT_lien_buffers #ifndef HTS_DEF_FWSTRUCT_lien_buffers
#define HTS_DEF_FWSTRUCT_lien_buffers #define HTS_DEF_FWSTRUCT_lien_buffers
@@ -393,7 +413,7 @@ struct httrackp {
hts_urlmode hts_urlmode
urlmode; /**< saved-link rewriting style (relative, absolute, etc.) */ urlmode; /**< saved-link rewriting style (relative, absolute, etc.) */
hts_boolean no_type_change; // do not change file type according to MIME hts_boolean no_type_change; // do not change file type according to MIME
int debug; /**< debug logging level */ hts_log_type debug; /**< debug logging level */
int getmode; /**< what to fetch (HTML, images, ...) bitmask */ int getmode; /**< what to fetch (HTML, images, ...) bitmask */
FILE *log; /**< informational log stream; NULL mutes it */ FILE *log; /**< informational log stream; NULL mutes it */
FILE *errlog; /**< error log stream; NULL mutes it */ FILE *errlog; /**< error log stream; NULL mutes it */
@@ -417,11 +437,12 @@ struct httrackp {
// int aff_progress; // progress bar // int aff_progress; // progress bar
hts_boolean shell; /**< driven by a shell over stdin/stdout pipes */ hts_boolean shell; /**< driven by a shell over stdin/stdout pipes */
t_proxy proxy; /**< proxy configuration */ t_proxy proxy; /**< proxy configuration */
int savename_83; /**< force 8.3 (DOS) file names */ hts_savename_83
savename_83; /**< saved-name length layout (long/DOS/ISO9660) */
int savename_type; /**< saved-name layout (original tree, flat, ...) */ int savename_type; /**< saved-name layout (original tree, flat, ...) */
String String
savename_userdef; /**< user-defined name template (e.g. %h%p/%n%q.%t) */ savename_userdef; /**< user-defined name template (e.g. %h%p/%n%q.%t) */
int savename_delayed; // delayed type check hts_savename_delayed savename_delayed; /**< delayed type-check policy */
hts_boolean hts_boolean
delayed_cached; // delayed type check can be cached to speedup updates delayed_cached; // delayed type check can be cached to speedup updates
hts_boolean mimehtml; /**< produce a single MIME/MHTML archive */ hts_boolean mimehtml; /**< produce a single MIME/MHTML archive */
@@ -437,7 +458,7 @@ struct httrackp {
hts_boolean makestat; /**< maintain a transfer-statistics log */ hts_boolean makestat; /**< maintain a transfer-statistics log */
hts_boolean maketrack; /**< maintain an operations-statistics log */ hts_boolean maketrack; /**< maintain an operations-statistics log */
int parsejava; /**< Java/JS parsing mode; see htsparsejava_flags */ int parsejava; /**< Java/JS parsing mode; see htsparsejava_flags */
int hostcontrol; /**< drop hosts that are too slow, etc. */ int hostcontrol; /**< ban slow/timing-out hosts; see hts_hostcontrol bits */
hts_boolean errpage; /**< generate an error page on 404 and similar */ hts_boolean errpage; /**< generate an error page on 404 and similar */
hts_boolean hts_boolean
check_type; /**< probe unknown-type links (cgi/asp/dir) and follow moves check_type; /**< probe unknown-type links (cgi/asp/dir) and follow moves
@@ -462,7 +483,7 @@ struct httrackp {
parseall; /**< parse aggressively, including unknown tags with links */ parseall; /**< parse aggressively, including unknown tags with links */
hts_boolean parsedebug; /**< parser debug mode */ hts_boolean parsedebug; /**< parser debug mode */
hts_boolean norecatch; /**< do not re-fetch files the user deleted locally */ hts_boolean norecatch; /**< do not re-fetch files the user deleted locally */
int verbosedisplay; /**< animated text progress display */ hts_verbosedisplay verbosedisplay; /**< animated text progress display */
String footer; /**< footer/info line injected into pages */ String footer; /**< footer/info line injected into pages */
int maxcache; /**< in-memory cache backing limit (bytes) */ int maxcache; /**< in-memory cache backing limit (bytes) */
// int maxcache_anticipate; // maximum links to anticipate (upper bound) // int maxcache_anticipate; // maximum links to anticipate (upper bound)
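The `opt->travel` layout described in the enum comments above (scope in the low byte, flags OR'd in above it) can be illustrated with a small sketch. The `DEMO_` names and the two helper functions are hypothetical, the values are copied from the diff, and the field is assumed to be a plain `int` here.

```c
/* Hypothetical mirror of the hts_travel_scope values from the diff. */
typedef enum demo_travel_scope {
  DEMO_TRAVEL_SAME_ADDRESS = 0, /* stay on the same address (host) */
  DEMO_TRAVEL_SAME_DOMAIN = 1,  /* stay on the same principal domain */
  DEMO_TRAVEL_SAME_TLD = 2,     /* stay on the same TLD */
  DEMO_TRAVEL_EVERYWHERE = 7,   /* follow links anywhere */
  DEMO_TRAVEL_TEST_ALL = 1 << 8 /* flag: also test forbidden URLs */
} demo_travel_scope;

/* Mask selecting the scope value (low byte) out of the combined field. */
#define DEMO_TRAVEL_SCOPE_MASK 0xff

/* Extract the scope, ignoring any flag bits stored above the low byte. */
int demo_travel_get_scope(int travel) {
  return travel & DEMO_TRAVEL_SCOPE_MASK;
}

/* Return 1 if the "test all" flag is set, 0 otherwise. */
int demo_travel_tests_all(int travel) {
  return (travel & DEMO_TRAVEL_TEST_ALL) != 0;
}
```

With `travel = DEMO_TRAVEL_SAME_DOMAIN | DEMO_TRAVEL_TEST_ALL`, the scope helper still yields `DEMO_TRAVEL_SAME_DOMAIN`; comparing the raw field to a scope constant without masking would not match once a flag is set.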


@@ -3722,7 +3722,8 @@ int hts_mirror_check_moved(htsmoduleStruct * str,
//case -1: can_retry=1; break; //case -1: can_retry=1; break;
case STATUSCODE_TIMEOUT: case STATUSCODE_TIMEOUT:
if (opt->hostcontrol) { // timeout and retries exhausted if (opt->hostcontrol) { // timeout and retries exhausted
if ((opt->hostcontrol & 1) && (heap(ptr)->retry <= 0)) { if ((opt->hostcontrol & HTS_HOSTCONTROL_BAN_TIMEOUT) &&
(heap(ptr)->retry <= 0)) {
hts_log_print(opt, LOG_DEBUG, "Link banned: %s%s", urladr(), urlfil()); hts_log_print(opt, LOG_DEBUG, "Link banned: %s%s", urladr(), urlfil());
host_ban(opt, ptr, sback, jump_identification_const(urladr())); host_ban(opt, ptr, sback, jump_identification_const(urladr()));
hts_log_print(opt, LOG_DEBUG, hts_log_print(opt, LOG_DEBUG,
@@ -3735,7 +3736,7 @@ int hts_mirror_check_moved(htsmoduleStruct * str,
break; break;
case STATUSCODE_SLOW: case STATUSCODE_SLOW:
if ((opt->hostcontrol) && (heap(ptr)->retry <= 0)) { // too slow if ((opt->hostcontrol) && (heap(ptr)->retry <= 0)) { // too slow
if (opt->hostcontrol & 2) { if (opt->hostcontrol & HTS_HOSTCONTROL_BAN_SLOW) {
hts_log_print(opt, LOG_DEBUG, "Link banned: %s%s", urladr(), urlfil()); hts_log_print(opt, LOG_DEBUG, "Link banned: %s%s", urladr(), urlfil());
host_ban(opt, ptr, sback, jump_identification_const(urladr())); host_ban(opt, ptr, sback, jump_identification_const(urladr()));
hts_log_print(opt, LOG_DEBUG, hts_log_print(opt, LOG_DEBUG,
@@ -4261,10 +4262,10 @@ int hts_mirror_wait_for_next_file(htsmoduleStruct * str,
char com[256]; char com[256];
linput(stdin, com, 200); linput(stdin, com, 200);
if (opt->verbosedisplay == 2) if (opt->verbosedisplay == HTS_VERBOSE_FULL)
opt->verbosedisplay = 1; opt->verbosedisplay = HTS_VERBOSE_SIMPLE;
else else
opt->verbosedisplay = 2; opt->verbosedisplay = HTS_VERBOSE_FULL;
/* Info for wrappers */ /* Info for wrappers */
hts_log_print(opt, LOG_INFO, "engine: change-options"); hts_log_print(opt, LOG_INFO, "engine: change-options");
RUN_CALLBACK0(opt, chopt); RUN_CALLBACK0(opt, chopt);
@@ -4374,7 +4375,7 @@ int hts_mirror_wait_for_next_file(htsmoduleStruct * str,
printf("%c\x0d", ("/-\\|")[roll]); printf("%c\x0d", ("/-\\|")[roll]);
fflush(stdout); fflush(stdout);
} }
} else if (opt->verbosedisplay == 1) { } else if (opt->verbosedisplay == HTS_VERBOSE_SIMPLE) {
if (b >= 0) { if (b >= 0) {
if (back[b].r.statuscode == HTTP_OK) if (back[b].r.statuscode == HTTP_OK)
printf("%d/%d: %s%s (" LLintP " bytes) - OK\33[K\r", ptr, opt->lien_tot, printf("%d/%d: %s%s (" LLintP " bytes) - OK\33[K\r", ptr, opt->lien_tot,
@@ -4465,8 +4466,8 @@ int hts_wait_delayed(htsmoduleStruct * str, lien_adrfilsave *afs,
char in_error_msg[32]; char in_error_msg[32];
// resolve unresolved type // resolve unresolved type
if (opt->savename_delayed != 0 && *forbidden_url == 0 && IS_DELAYED_EXT(afs->save) if (opt->savename_delayed != HTS_SAVENAME_DELAYED_NONE &&
&& !opt->state.stop) { *forbidden_url == 0 && IS_DELAYED_EXT(afs->save) && !opt->state.stop) {
int loops; int loops;
int continue_loop; int continue_loop;
@@ -4850,7 +4851,7 @@ int hts_wait_delayed(htsmoduleStruct * str, lien_adrfilsave *afs,
} }
} }
} // delayed type check ? } // delayed type check ?
ENGINE_SAVE_CONTEXT_BASE(); ENGINE_SAVE_CONTEXT_BASE();
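The two host-banning hunks above test individual bits of `opt->hostcontrol` once retries are exhausted. A reduced sketch of that decision follows; `demo_failure` and `demo_should_ban` are hypothetical, and only the bit values are taken from the diff.

```c
/* Hypothetical mirror of the hts_hostcontrol bits from the diff. */
typedef enum demo_hostcontrol {
  DEMO_HOSTCONTROL_BAN_TIMEOUT = 1 << 0, /* ban a timing-out host */
  DEMO_HOSTCONTROL_BAN_SLOW = 1 << 1     /* ban a too-slow host */
} demo_hostcontrol;

/* Hypothetical failure kinds standing in for the engine's status codes. */
typedef enum demo_failure {
  DEMO_FAILURE_TIMEOUT,
  DEMO_FAILURE_SLOW
} demo_failure;

/* Return 1 if the host should be banned: retries must be exhausted and the
   bit matching the failure kind must be set in the hostcontrol bitmask. */
int demo_should_ban(int hostcontrol, demo_failure failure, int retries_left) {
  if (retries_left > 0) {
    return 0; /* still retrying, never ban yet */
  }
  switch (failure) {
  case DEMO_FAILURE_TIMEOUT:
    return (hostcontrol & DEMO_HOSTCONTROL_BAN_TIMEOUT) != 0;
  case DEMO_FAILURE_SLOW:
    return (hostcontrol & DEMO_HOSTCONTROL_BAN_SLOW) != 0;
  }
  return 0;
}
```

Naming the bits makes it visible that a mask of 1 bans only on timeouts, 2 only on slowness, and 3 on both, which the bare `& 1` and `& 2` tests left implicit.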


@@ -1213,7 +1213,7 @@ HTSEXT_API find_handle hts_findfirst(char *path) {
return NULL; return NULL;
} }
HTSEXT_API int hts_findnext(find_handle find) { HTSEXT_API hts_boolean hts_findnext(find_handle find) {
if (find) { if (find) {
#ifdef _WIN32 #ifdef _WIN32
if ((FindNextFileA(find->handle, &find->hdata))) if ((FindNextFileA(find->handle, &find->hdata)))
@@ -1273,7 +1273,7 @@ HTSEXT_API int hts_findgetsize(find_handle find) {
return -1; return -1;
} }
HTSEXT_API int hts_findisdir(find_handle find) { HTSEXT_API hts_boolean hts_findisdir(find_handle find) {
if (find) { if (find) {
if (!hts_findissystem(find)) { if (!hts_findissystem(find)) {
#ifdef _WIN32 #ifdef _WIN32
@@ -1287,7 +1287,7 @@ HTSEXT_API int hts_findisdir(find_handle find) {
} }
return 0; return 0;
} }
HTSEXT_API int hts_findisfile(find_handle find) { HTSEXT_API hts_boolean hts_findisfile(find_handle find) {
if (find) { if (find) {
if (!hts_findissystem(find)) { if (!hts_findissystem(find)) {
#ifdef _WIN32 #ifdef _WIN32
@@ -1301,7 +1301,7 @@ HTSEXT_API int hts_findisfile(find_handle find) {
} }
return 0; return 0;
} }
HTSEXT_API int hts_findissystem(find_handle find) { HTSEXT_API hts_boolean hts_findissystem(find_handle find) {
if (find) { if (find) {
#ifdef _WIN32 #ifdef _WIN32
if (find->hdata. if (find->hdata.


@@ -108,15 +108,15 @@ HTSEXT_API int hts_buildtopindex(httrackp * opt, const char *path,
// Portable directory find functions // Portable directory find functions
// Directory find functions // Directory find functions
HTSEXT_API find_handle hts_findfirst(char *path); HTSEXT_API find_handle hts_findfirst(char *path);
HTSEXT_API int hts_findnext(find_handle find); HTSEXT_API hts_boolean hts_findnext(find_handle find);
HTSEXT_API int hts_findclose(find_handle find); HTSEXT_API int hts_findclose(find_handle find);
// //
HTSEXT_API char *hts_findgetname(find_handle find); HTSEXT_API char *hts_findgetname(find_handle find);
HTSEXT_API int hts_findgetsize(find_handle find); HTSEXT_API int hts_findgetsize(find_handle find);
HTSEXT_API int hts_findisdir(find_handle find); HTSEXT_API hts_boolean hts_findisdir(find_handle find);
HTSEXT_API int hts_findisfile(find_handle find); HTSEXT_API hts_boolean hts_findisfile(find_handle find);
HTSEXT_API int hts_findissystem(find_handle find); HTSEXT_API hts_boolean hts_findissystem(find_handle find);
#endif #endif


@@ -206,7 +206,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
/* Logging */ /* Logging */
/** Legacy: write prefix then msg to opt->log. Returns 0 if written, 1 if /** Legacy: write prefix then msg to opt->log. Returns 0 if written, 1 if
opt->log is NULL. Prefer hts_log_print(). */ opt->log is NULL. Prefer hts_log_print(). */
HTSEXT_API int hts_log(httrackp * opt, const char *prefix, const char *msg); HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
const char *msg);
/** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO). /** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO).
Forwards to the registered log callback, and when the level is <= opt->debug Forwards to the registered log callback, and when the level is <= opt->debug
@@ -313,7 +314,8 @@ HTSEXT_API T_SOC catch_url_init(int *port, char *adr);
"ip:port". The buffers are caller-allocated and not bounds-checked: @p data "ip:port". The buffers are caller-allocated and not bounds-checked: @p data
must be CATCH_URL_DATA_SIZE bytes, and @p url / @p method must fit the must be CATCH_URL_DATA_SIZE bytes, and @p url / @p method must fit the
captured request line. */ captured request line. */
HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data); HTSEXT_API hts_boolean catch_url(T_SOC soc, char *url, char *method,
char *data);
/* State */ /* State */
/** Whether the engine is parsing HTML. Returns 0 if not, otherwise the percent /** Whether the engine is parsing HTML. Returns 0 if not, otherwise the percent
@@ -334,10 +336,10 @@ HTSEXT_API int hts_is_exiting(httrackp * opt);
caller-owned, NULL-terminated array of strings; the engine stores the caller-owned, NULL-terminated array of strings; the engine stores the
pointer without copying, so the array and its strings must stay valid until pointer without copying, so the array and its strings must stay valid until
the engine consumes them. @return nonzero if a list is now set. */ the engine consumes them. @return nonzero if a list is now set. */
HTSEXT_API int hts_addurl(httrackp * opt, char **url); HTSEXT_API hts_boolean hts_addurl(httrackp *opt, char **url);
/** Clear any pending add-URL list set by hts_addurl(). Always returns 0. */ /** Clear any pending add-URL list set by hts_addurl(). Always returns 0. */
HTSEXT_API int hts_resetaddurl(httrackp * opt); HTSEXT_API hts_boolean hts_resetaddurl(httrackp *opt);
/** Apply the runtime-tunable options from @p from onto @p to, to adjust a live /** Apply the runtime-tunable options from @p from onto @p to, to adjust a live
mirror. Only fields set to a non-sentinel value are copied; the rest of @p mirror. Only fields set to a non-sentinel value are copied; the rest of @p
@@ -356,7 +358,7 @@ HTSEXT_API int hts_setpause(httrackp * opt, int);
lock, so it is safe to call from another thread). @p force is currently lock, so it is safe to call from another thread). @p force is currently
ignored. ignored.
@return 0; no-op if @p opt is NULL. */ @return 0; no-op if @p opt is NULL. */
HTSEXT_API int hts_request_stop(httrackp * opt, int force); HTSEXT_API int hts_request_stop(httrackp *opt, hts_boolean force);
/** Queue a single in-progress file, by URL, to be cancelled by the engine. /** Queue a single in-progress file, by URL, to be cancelled by the engine.
@p url is copied internally. Takes the state lock, so it is thread-safe. @p url is copied internally. Takes the state lock, so it is thread-safe.
@@ -373,7 +375,7 @@ HTSEXT_API void hts_cancel_parsing(httrackp * opt);
/** Nonzero once the mirror has fully ended. Read under the engine state lock, /** Nonzero once the mirror has fully ended. Read under the engine state lock,
so safe to poll from another thread. Wait for this before hts_free_opt(). */ so safe to poll from another thread. Wait for this before hts_free_opt(). */
HTSEXT_API int hts_has_stopped(httrackp * opt); HTSEXT_API hts_boolean hts_has_stopped(httrackp *opt);
/* Tools */ /* Tools */
/** Ensure the directory chain leading to @p path exists, creating missing /** Ensure the directory chain leading to @p path exists, creating missing
@@ -390,7 +392,7 @@ HTSEXT_API int structcheck_utf8(const char *path);
/** Whether the directory containing @p path exists. The basename is stripped /** Whether the directory containing @p path exists. The basename is stripped
first, so passing a file path tests its parent directory. @return 1 if it is first, so passing a file path tests its parent directory. @return 1 if it is
a directory, 0 otherwise. */ a directory, 0 otherwise. */
HTSEXT_API int dir_exists(const char *path); HTSEXT_API hts_boolean dir_exists(const char *path);
/** Write the HTTP reason phrase for @p statuscode into @p msg, a caller buffer /** Write the HTTP reason phrase for @p statuscode into @p msg, a caller buffer
of at least 64 bytes. For an unknown code a non-empty @p msg is kept, of at least 64 bytes. For an unknown code a non-empty @p msg is kept,
@@ -573,14 +575,15 @@ HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const cha
must-avoid escapes are kept encoded, and %25 is never decoded). @p no_high & must-avoid escapes are kept encoded, and %25 is never decoded). @p no_high &
1 also decodes high (>= 128) bytes; @p no_high & 2 also decodes an escaped 1 also decodes high (>= 128) bytes; @p no_high & 2 also decodes an escaped
space. Returns @p catbuff. */ space. Returns @p catbuff. */
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size, const char *s, const int no_high); HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s, const hts_boolean no_high);
/** Determine the MIME type of local file name @p fil into @p s (capacity /** Determine the MIME type of local file name @p fil into @p s (capacity
@p ssize): user --assume rules, then ".html", then the built-in extension @p ssize): user --assume rules, then ".html", then the built-in extension
table. @p flag != 0 forces a fallback type. @return 1 if a type was written, table. @p flag != 0 forces a fallback type. @return 1 if a type was written,
0 otherwise. */ 0 otherwise. */
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize, HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag); const char *fil, hts_boolean flag);
/** @deprecated Use get_httptype_sized(). Assumes @p s has at least /** @deprecated Use get_httptype_sized(). Assumes @p s has at least
HTS_MIMETYPE_SIZE capacity. */ HTS_MIMETYPE_SIZE capacity. */
@@ -600,7 +603,7 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil);
/** 1 if @p fil, an extension such as "asp" or "php" (not a full filename), is a /** 1 if @p fil, an extension such as "asp" or "php" (not a full filename), is a
known dynamic-page type, else 0. */ known dynamic-page type, else 0. */
HTSEXT_API int is_dyntype(const char *fil); HTSEXT_API hts_boolean is_dyntype(const char *fil);
/** Extract the extension of @p fil (text after the last '.', stopping at '?') /** Extract the extension of @p fil (text after the last '.', stopping at '?')
into caller scratch @p catbuff (capacity @p size) and return it. Returns "" into caller scratch @p catbuff (capacity @p size) and return it. Returns ""
@@ -610,12 +613,12 @@ HTSEXT_API const char *get_ext(char *catbuff, size_t size, const char *fil);
/** 1 if MIME type @p st must not be reclassified or renamed (hypertext types /** 1 if MIME type @p st must not be reclassified or renamed (hypertext types
and a built-in keep-list of commonly mislabeled types), else 0. */ and a built-in keep-list of commonly mislabeled types), else 0. */
HTSEXT_API int may_unknown(httrackp * opt, const char *st); HTSEXT_API hts_boolean may_unknown(httrackp *opt, const char *st);
/** Guess the MIME type of local file @p fil into @p s (capacity @p ssize), /** Guess the MIME type of local file @p fil into @p s (capacity @p ssize),
always producing a type. @return 1 if a type was written. */ always producing a type. @return 1 if a type was written. */
HTSEXT_API int guess_httptype_sized(httrackp *opt, char *s, size_t ssize, HTSEXT_API hts_boolean guess_httptype_sized(httrackp *opt, char *s,
const char *fil); size_t ssize, const char *fil);
/** @deprecated Use guess_httptype_sized(). Assumes @p s has at least /** @deprecated Use guess_httptype_sized(). Assumes @p s has at least
HTS_MIMETYPE_SIZE capacity. */ HTS_MIMETYPE_SIZE capacity. */
@@ -677,7 +680,7 @@ HTSEXT_API find_handle hts_findfirst(char *path);
/** Advance to the next directory entry. Returns 1 if an entry is available, 0 /** Advance to the next directory entry. Returns 1 if an entry is available, 0
at end of directory. */ at end of directory. */
HTSEXT_API int hts_findnext(find_handle find); HTSEXT_API hts_boolean hts_findnext(find_handle find);
/** Close the iteration and free @p find. Always returns 0; NULL is accepted. */ /** Close the iteration and free @p find. Always returns 0; NULL is accepted. */
HTSEXT_API int hts_findclose(find_handle find); HTSEXT_API int hts_findclose(find_handle find);
@@ -692,16 +695,16 @@ HTSEXT_API int hts_findgetsize(find_handle find);
/** 1 if the current entry is a directory, else 0 (a system/special entry, see /** 1 if the current entry is a directory, else 0 (a system/special entry, see
hts_findissystem(), reports 0). */ hts_findissystem(), reports 0). */
HTSEXT_API int hts_findisdir(find_handle find); HTSEXT_API hts_boolean hts_findisdir(find_handle find);
/** 1 if the current entry is a regular file, else 0 (a system/special entry, /** 1 if the current entry is a regular file, else 0 (a system/special entry,
see hts_findissystem(), reports 0). */ see hts_findissystem(), reports 0). */
HTSEXT_API int hts_findisfile(find_handle find); HTSEXT_API hts_boolean hts_findisfile(find_handle find);
/** 1 if the current entry is a special/system entry to skip: "." or "..", on /** 1 if the current entry is a special/system entry to skip: "." or "..", on
POSIX also device/fifo/socket nodes, on Windows also system, hidden or POSIX also device/fifo/socket nodes, on Windows also system, hidden or
temporary entries. Else 0. */ temporary entries. Else 0. */
HTSEXT_API int hts_findissystem(find_handle find); HTSEXT_API hts_boolean hts_findissystem(find_handle find);
/* UTF-8 aware FILE API */ /* UTF-8 aware FILE API */
/* On non-Windows these macros resolve directly to the POSIX calls. On Windows /* On non-Windows these macros resolve directly to the POSIX calls. On Windows


@@ -288,7 +288,7 @@ static void __cdecl htsshow_uninit(t_hts_callbackarg * carg) {
} }
static int __cdecl htsshow_start(t_hts_callbackarg * carg, httrackp * opt) { static int __cdecl htsshow_start(t_hts_callbackarg * carg, httrackp * opt) {
use_show = 0; use_show = 0;
if (opt->verbosedisplay == 2) { if (opt->verbosedisplay == HTS_VERBOSE_FULL) {
use_show = 1; use_show = 1;
vt_clear(); vt_clear();
} }
@@ -852,7 +852,7 @@ static void sig_doback(int blind) { // mettre en backing
if (global_opt != NULL) { if (global_opt != NULL) {
// suppress logging and asking lousy questions // suppress logging and asking lousy questions
global_opt->quiet = 1; global_opt->quiet = 1;
global_opt->verbosedisplay = 0; global_opt->verbosedisplay = HTS_VERBOSE_NONE;
} }
if (!blind) if (!blind)

tests/01_engine-cookies.test Executable file

@@ -0,0 +1,15 @@
#!/bin/bash
#
# Issue #151 guard: the request Cookie header must be bare RFC 6265 name=value
# pairs, no $Version/$Path attributes. Driven by the 'httrack -#Q' selftest.
set -eu
# A trailing token is required; a bare '-#Q' falls through to the usage screen.
out=$(httrack -#Q run)
# Exact-match the success line so a fall-through to usage can't pass the test.
test "$out" = "cookie-header: OK" || {
echo "expected 'cookie-header: OK', got: $out" >&2
exit 1
}
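For context on what this test guards, the header shape it expects (bare RFC 6265 `name=value` pairs joined by `; `, with no `$Version` or `$Path`) can be sketched as below. `demo_cookie` and `demo_build_cookie_header` are hypothetical and do not reflect how htslib actually builds the header.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cookie record: just a name and a value. */
typedef struct demo_cookie {
  const char *name;
  const char *value;
} demo_cookie;

/* Write "Cookie: n1=v1; n2=v2" into out (capacity size).
   No $Version prefix and no $Path attribute is emitted.
   Returns 0 on success, -1 if the buffer is too small. */
int demo_build_cookie_header(char *out, size_t size,
                             const demo_cookie *cookies, size_t count) {
  size_t used = 0;
  size_t i;

  if (size == 0) {
    return -1;
  }
  out[0] = '\0';
  for (i = 0; i < count; i++) {
    /* first pair gets the header name, later pairs the "; " separator */
    const int n = snprintf(out + used, size - used, "%s%s=%s",
                           i == 0 ? "Cookie: " : "; ",
                           cookies[i].name, cookies[i].value);
    if (n < 0 || (size_t) n >= size - used) {
      return -1; /* encoding error or truncated output */
    }
    used += (size_t) n;
  }
  return 0;
}
```

For the two cookies `name=value` and `has_js=1`, this yields `Cookie: name=value; has_js=1`, with nothing a server could misread as an extra cookie.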

tests/01_engine-copyopt.test Executable file

@@ -0,0 +1,17 @@
#!/bin/bash
#
# Regression guard for the unsigned-enum sentinel trap: copy_htsopt's
# `if (from->X > -1)` guard is always false for unsigned hts_boolean fields, so
# they silently stop being copied. Driven by the in-process 'httrack -#9' test.
# Keep POSIX-portable (harness runs it via $(BASH), a plain /bin/sh on macOS).
set -eu
# A trailing token is required; a bare '-#9' falls through to the usage screen.
out=$(httrack -#9 run)
# Exact-match the success line so a fall-through to usage can't pass the test.
test "$out" = "copy-htsopt: OK" || {
echo "expected 'copy-htsopt: OK', got: $out" >&2
exit 1
}
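The sentinel trap named in this test's header comment can be reproduced in isolation. The sketch below assumes a compiler that gives an enum without negative enumerators an unsigned underlying type, modelled here with a plain `unsigned int`; the `demo_` functions are hypothetical and are not `copy_htsopt` itself.

```c
/* Guard as written for a signed field: -1 means "not set". */
int demo_guard_signed(int value) {
  return value > -1;
}

/* The same guard applied to an unsigned field. The usual arithmetic
   conversions turn -1 into the largest unsigned value, so the comparison
   is false for every input. The cast spells out that implicit conversion. */
int demo_guard_unsigned(unsigned int value) {
  const int sentinel = -1;
  return value > (unsigned int) sentinel;
}

/* One possible workaround (an assumption, not necessarily the project's
   fix): convert back to int before comparing. Converting an out-of-range
   unsigned value to int is implementation-defined before C23, but yields -1
   for the stored sentinel on two's-complement compilers. */
int demo_guard_fixed(unsigned int value) {
  return (int) value > -1;
}
```

`demo_guard_unsigned()` returns 0 even for 0 and 1, which is the "silently stop being copied" behaviour the test is written to catch.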


@@ -89,4 +89,37 @@ grep -q NEWCONTENT "$(find "$out" -path '*/a.html' -print -quit)" || {
exit 1 exit 1
} }
# --- 3. an empty quoted arg survives the doit.log round-trip (#106) ----------
# -%F "" (empty footer) records an empty "" token in doit.log; -r2 follows it so
# a "drop the empty token" bug shifts -r2 into -%F's slot (the reprise then sees
# -%F -r2 and panics "%F needs to be followed by ..."), making the bug visible
# rather than a harmless run off the end of argv.
out2="$tmp/out2"
rc=0
"$bin" "$url" -O "$out2" --quiet -n -%v0 -%F "" -r2 >/dev/null 2>&1 || rc=$?
test "$rc" -eq 0 || {
echo "FAIL: initial mirror with empty footer exited $rc"
exit 1
}
# precondition: the writer put the empty token on disk for the reader to reload.
grep -q ' -%F "" -r2' "$out2/hts-cache/doit.log" || {
echo "FAIL: empty footer not recorded as -%F \"\" -r2 in doit.log"
grep -- '-%F' "$out2/hts-cache/doit.log" || true
exit 1
}
# no-url reprise: the reader rebuilds argv from doit.log and rewrites doit.log
# from it. The empty token surviving in the regenerated file proves the reader
# kept it (a drop/swallow would panic above or rewrite -%F without the "").
rc=0
"$bin" -O "$out2" --quiet >/dev/null 2>&1 || rc=$?
test "$rc" -eq 0 || {
echo "FAIL: empty-footer reprise exited $rc (empty token dropped from doit.log?)"
exit 1
}
grep -q ' -%F "" -r2' "$out2/hts-cache/doit.log" || {
echo "FAIL: empty footer did not survive the doit.log reload round-trip"
grep -- '-%F' "$out2/hts-cache/doit.log" || true
exit 1
}
exit 0 exit 0
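The behaviour scenario 3 checks (an empty quoted token must survive the reload, while whitespace gaps produce no token) can be sketched with a small in-place splitter. `demo_split_args` is a hypothetical helper written for illustration; it is not HTTrack's `next_token()` or its doit.log reload loop, and it handles only spaces and double quotes.

```c
#include <stddef.h>

/* Split line in place into argv-style tokens separated by spaces.
   A token that starts with '"' runs to the closing quote and is kept even
   when it is empty, so "" survives as an empty string.
   Returns the number of tokens stored in argv (at most max). */
size_t demo_split_args(char *line, char **argv, size_t max) {
  size_t argc = 0;
  char *p = line;

  while (*p != '\0' && argc < max) {
    char *start;
    char end;

    while (*p == ' ') {
      p++; /* whitespace gaps produce no token */
    }
    if (*p == '\0') {
      break;
    }
    /* remember whether the token was quoted before stripping the quote */
    end = (*p == '"') ? '"' : ' ';
    if (end == '"') {
      p++;
    }
    start = p;
    while (*p != '\0' && *p != end) {
      p++;
    }
    if (*p != '\0') {
      *p++ = '\0'; /* terminate the token and step past the delimiter */
    }
    argv[argc++] = start; /* stored unconditionally: may be "" if quoted */
  }
  return argc;
}
```

Splitting `-%F "" -r2` this way yields three tokens, with the empty string in the middle, so `-r2` stays out of `-%F`'s argument slot.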


@@ -24,6 +24,8 @@ TESTS = \
01_engine-cache-golden.test \ 01_engine-cache-golden.test \
01_engine-charset.test \ 01_engine-charset.test \
01_engine-cmdline.test \ 01_engine-cmdline.test \
01_engine-cookies.test \
01_engine-copyopt.test \
01_engine-doitlog.test \ 01_engine-doitlog.test \
01_engine-entities.test \ 01_engine-entities.test \
01_engine-filter.test \ 01_engine-filter.test \