Compare commits

...

8 Commits

Author SHA1 Message Date
Xavier Roche
ee6beeeb7d htsopt: fold HTS_TRAVEL_TEST_ALL into the hts_travel_scope enum
The -t "test all" flag was a stray #define sitting next to the scope
enum; make it an enum constant so the named travel values live in one
place. The mask (HTS_TRAVEL_SCOPE_MASK) stays a #define: it selects the
scope out of opt->travel, it is not a member of the value set.

Name and value (1 << 8) are unchanged, so every use site compiles
identically and opt->travel stays plain int. No ABI change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 18:29:23 +02:00
Xavier Roche
6788bda380 Merge pull request #388 from xroche/feature/api-enum-fields-2
htsopt: type debug, savename_delayed and verbosedisplay as named enums
2026-06-18 18:25:44 +02:00
Xavier Roche
7ead8d595e htsopt: type three more option fields as named enums
debug becomes hts_log_type (it already stored LOG_* values; the int
declaration was a latent type hole), savename_delayed becomes a new
hts_savename_delayed { NONE, SOFT, HARD }, and verbosedisplay becomes a
new hts_verbosedisplay { NONE, SIMPLE, FULL }. hostcontrol stays int but
its bits are now named by a new hts_hostcontrol flags enum, matching the
existing getmode/seeker/travel/htsparsejava_flags pattern.

A C enum is int-sized, so struct layout, field offsets and
sizeof(httrackp) are unchanged: no ABI break, no soname bump. The three
sscanf("%d", ...) sites that fill these fields now write through an int*
(size-identical) to keep the format type exact.

These enums are unsigned-backed (all enumerators non-negative), so the
non-negative debug comparisons (debug < level, debug > LOG_INFO, etc.)
now compile to unsigned jumps. debug is never negative, never sscanf'd
and never tested against a negative bound, so the result is unchanged;
disassembly is otherwise byte-identical bar instruction scheduling.

savename_83 is left as int on purpose: its sscanf sits in the -L parser
block whose old indentation does not round-trip through clang-format.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 18:11:19 +02:00
Xavier Roche
93f502990c Merge pull request #387 from xroche/feature/api-bool-returns
Return hts_boolean from the yes/no library functions
2026-06-18 17:38:48 +02:00
Xavier Roche
0f4b2596b2 htslib: return hts_boolean from the yes/no library functions
The exported API had many functions returning int where the int is really a
yes/no answer. Type the 14 genuinely-boolean ones as hts_boolean
(catch_url, dir_exists, is_dyntype, may_unknown, hts_findnext,
hts_findisdir/isfile/issystem, hts_has_stopped, hts_addurl, hts_resetaddurl,
hts_log, get_httptype_sized, guess_httptype_sized) and the three boolean int
parameters likewise (get_httptype_sized's flag, unescape_http_unharm's no_high,
hts_request_stop's force).

hts_boolean moves from htsopt.h to htsglobal.h so the library header, which only
forward-declares httrackp and does not include htsopt.h, can see the type.

The audit deliberately left alone the functions whose name suggests a boolean
but whose value is not 0/1: hts_is_testing returns 0..5, hts_is_exiting and
is_knowntype/is_userknowntype are tri-state, structcheck and the *_utf8 wrappers
are POSIX 0/-1, hts_findgetsize is a size, hts_main is an exit code, and
copy_htsopt returns 0 for success (a bool would read backwards). hts_setpause
and hts_is_parsing keep int params because they gate on '>= 0', not 0/1.

Not an ABI break: int -> int-sized enum is the same calling convention for both
return values (eax) and parameters, and enum<->int is implicit for callers, so
already-compiled consumers keep working. Verified by comparing per-object
disassembly against master: 39 of 45 objects byte-identical, htslib differs only
in __LINE__ immediates, and the five caller/definer objects differ only in
register allocation and return-block merging (no control-flow or value change).
make check passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 09:19:36 +02:00
Xavier Roche
4a676bb5e1 Merge pull request #386 from xroche/feature/api-boolean-enum
Type the boolean option fields as a named enum
2026-06-18 09:04:14 +02:00
Xavier Roche
36b4e834b8 htsopt: type the boolean option fields as a named enum
The httrackp option fields that are pure on/off toggles were declared as bare
int. Introduce a two-value enum, hts_boolean { HTS_FALSE, HTS_TRUE }, and use it
as the type of the 38 boolean fields so each one documents its nature at the
declaration. The hts_create_opt() defaults block now reads HTS_TRUE/HTS_FALSE.

An enum is used rather than C bool on purpose: a C enum is int-sized and
represented like int, so the struct layout, every field offset and
sizeof(httrackp) are unchanged (verified: 141648 bytes before and after). The
size_httrackp guard value still holds and there is no soname bump. A bool field
would be one byte and would repack the whole struct.

Scope is httrackp only; fields that look boolean but are not were left as int
(savename_delayed is tri-state, hostcontrol is a bitmask), as was is_update in
the separate lien_back struct. The four CLI sites that sscanf("%d") into a
boolean field now cast to int* to keep the read well-defined.

Value-preserving: built against origin/master and compared per-object
disassembly. 40 of 45 objects are byte-identical; the five that differ
(htscore/htslib/htsname/htsparse/htswizard) differ only in instruction selection
from the int->enum field types, with every hts_create_opt default confirmed
unchanged. make check passes. Runtime assignments and tests on these fields are
left as plain 0/1, which compile identically.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 07:34:36 +02:00
Xavier Roche
bbb423f025 Merge pull request #385 from xroche/feature/api-enum-types
Give the option fields named enum types and flag macros
2026-06-18 07:06:59 +02:00
10 changed files with 176 additions and 135 deletions

View File

@@ -135,7 +135,8 @@ HTSEXT_API T_SOC catch_url_init(int *port, /* 128 bytes */ char *adr) {
// returns 0 if error
// url: buffer where URL must be stored - or ip:port in case of failure
// data: 32Kb
HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) {
HTSEXT_API hts_boolean catch_url(T_SOC soc, char *url, char *method,
char *data) {
int retour = 0;
// connexion (accept)

View File

@@ -2585,7 +2585,7 @@ static int mkdir_compat(const char *pathname) {
/* path must end with "/" or with the finename (/tmp/bar/ or /tmp/bar/foo.zip) */
/* Note: preserve errno */
HTSEXT_API int dir_exists(const char *path) {
HTSEXT_API hts_boolean dir_exists(const char *path) {
const int err = errno;
STRUCT_STAT st;
char BIGSTK file[HTS_URLMAXSIZE * 2];
@@ -3646,7 +3646,7 @@ HTSEXT_API int hts_setpause(httrackp * opt, int p) {
}
// ask for termination
HTSEXT_API int hts_request_stop(httrackp * opt, int force) {
HTSEXT_API int hts_request_stop(httrackp *opt, hts_boolean force) {
if (opt != NULL) {
hts_log_print(opt, LOG_ERROR, "Exit requested by shell or user");
hts_mutexlock(&opt->state.lock);
@@ -3656,7 +3656,7 @@ HTSEXT_API int hts_request_stop(httrackp * opt, int force) {
return 0;
}
HTSEXT_API int hts_has_stopped(httrackp * opt) {
HTSEXT_API hts_boolean hts_has_stopped(httrackp *opt) {
int ended;
hts_mutexlock(&opt->state.lock);
ended = opt->state.is_ended;
@@ -3678,12 +3678,12 @@ HTSEXT_API int hts_has_stopped(httrackp * opt) {
//}
// ajout d'URL
// -1 : erreur
HTSEXT_API int hts_addurl(httrackp * opt, char **url) {
HTSEXT_API hts_boolean hts_addurl(httrackp *opt, char **url) {
if (url)
opt->state._hts_addurl = url;
return (opt->state._hts_addurl != NULL);
}
HTSEXT_API int hts_resetaddurl(httrackp * opt) {
HTSEXT_API hts_boolean hts_resetaddurl(httrackp *opt) {
opt->state._hts_addurl = NULL;
return (opt->state._hts_addurl != NULL);
}

View File

@@ -1783,7 +1783,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
break;
//
case 'b':
sscanf(com + 1, "%d", &opt->accept_cookie);
sscanf(com + 1, "%d", (int *) &opt->accept_cookie);
while(isdigit((unsigned char) *(com + 1)))
com++;
break;
@@ -1845,12 +1845,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
#endif
break;
case 'o':
sscanf(com + 1, "%d", &opt->errpage);
sscanf(com + 1, "%d", (int *) &opt->errpage);
while(isdigit((unsigned char) *(com + 1)))
com++;
break;
case 'u':
sscanf(com + 1, "%d", &opt->check_type);
sscanf(com + 1, "%d", (int *) &opt->check_type);
while(isdigit((unsigned char) *(com + 1)))
com++;
break;
@@ -1917,7 +1917,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
case 'I':
opt->kindex = 1;
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->kindex);
sscanf(com + 1, "%d", (int *) &opt->kindex);
while(isdigit((unsigned char) *(com + 1)))
com++;
}
@@ -1991,7 +1991,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
case 'v':
opt->verbosedisplay = 2;
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->verbosedisplay);
sscanf(com + 1, "%d", (int *) &opt->verbosedisplay);
while(isdigit((unsigned char) *(com + 1)))
com++;
}
@@ -2006,7 +2006,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
case 'N':
opt->savename_delayed = 2;
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->savename_delayed);
sscanf(com + 1, "%d", (int *) &opt->savename_delayed);
while(isdigit((unsigned char) *(com + 1)))
com++;
}

View File

@@ -242,6 +242,14 @@ Please visit our Website: http://www.httrack.com
#define HTS_NOPARAM "(none)"
#define HTS_NOPARAM2 "\"(none)\""
/* Boolean flag for option fields and API yes/no returns. An enum (not C bool)
so it stays int-sized: option fields keep the httrackp layout/ABI, and a
return type stays compatible with the int it replaces. */
#ifndef HTS_DEF_DEFSTRUCT_hts_boolean
#define HTS_DEF_DEFSTRUCT_hts_boolean
typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
#endif
/* Larger/smaller of two values. Macros: arguments are evaluated twice. */
#define maximum(A,B) ( (A) > (B) ? (A) : (B) )

View File

@@ -3646,8 +3646,9 @@ HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const cha
// DOES NOT DECODE %25 (part of CHAR_DELIM)
// no_high & 1: decode high chars
// no_high & 2: decode space
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s, const int no_high) {
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s,
const hts_boolean no_high) {
size_t i, j;
RUNTIME_TIME_CHECK_SIZE(size);
@@ -3931,8 +3932,8 @@ void hts_replace(char *s, char from, char to) {
// guess a local file's mime type (e.g. fil="toto.gif" -> s="image/gif")
// returns 1 if a type was written to s, 0 otherwise
int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) {
hts_boolean guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) {
return get_httptype_sized(opt, s, ssize, fil, 1);
}
@@ -3945,8 +3946,8 @@ void guess_httptype(httrackp * opt, char *s, const char *fil) {
// write the mime type for fil into s (capacity ssize)
// flag: 1 to always return a type (the "application/..." / octet-stream
// fallback) returns 1 if a type was written to s, 0 otherwise
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag) {
HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, hts_boolean flag) {
// userdef overrides get_httptype (a rule with an empty value, e.g. "--assume
// cgi=", matches but writes nothing: report it as "no type" like the old
// code, whose callers tested strnotempty(s))
@@ -4196,7 +4197,7 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil) {
// page dynamique?
// is_dyntype(get_ext("foo.asp"))
HTSEXT_API int is_dyntype(const char *fil) {
HTSEXT_API hts_boolean is_dyntype(const char *fil) {
int j = 0;
if (!fil)
@@ -4214,7 +4215,7 @@ HTSEXT_API int is_dyntype(const char *fil) {
// types critiques qui ne doivent pas être changés car renvoyés par des serveurs qui ne
// connaissent pas le type
int may_unknown(httrackp * opt, const char *st) {
hts_boolean may_unknown(httrackp *opt, const char *st) {
int j = 0;
// types média
@@ -5236,7 +5237,8 @@ HTSEXT_API int hts_uninit_module(void) {
}
// legacy. do not use
HTSEXT_API int hts_log(httrackp * opt, const char *prefix, const char *msg) {
HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
const char *msg) {
if (opt->log != NULL) {
fspc(opt, opt->log, prefix);
fprintf(opt->log, "%s" LF, msg);
@@ -5435,14 +5437,14 @@ HTSEXT_API httrackp *hts_create_opt(void) {
/* default settings */
opt->wizard = HTS_WIZARD_AUTO; // wizard automatique
opt->quiet = 0; // questions
opt->quiet = HTS_FALSE;
//
opt->travel = HTS_TRAVEL_SAME_ADDRESS; // même adresse
opt->depth = 9999; // mirror total par défaut
opt->extdepth = 0; // mais pas à l'extérieur
opt->seeker = HTS_SEEKER_DOWN; // down
opt->urlmode = HTS_URLMODE_RELATIVE; // relatif par défaut
opt->no_type_change = 0; // change file types
opt->no_type_change = HTS_FALSE;
opt->debug = LOG_NOTICE; // small log
opt->getmode = HTS_GETMODE_HTML | HTS_GETMODE_NONHTML;
opt->maxsite = -1; // taille max site (aucune)
@@ -5450,18 +5452,18 @@ HTSEXT_API httrackp *hts_create_opt(void) {
opt->maxfile_html = -1; // idem pour html
opt->maxsoc = 4; // nbre socket max
opt->fragment = -1; // pas de fragmentation
opt->nearlink = 0; // ne pas prendre les liens non-html "adjacents"
opt->makeindex = 1; // faire un index
opt->kindex = 0; // index 'keyword'
opt->delete_old = 1; // effacer anciens fichiers
opt->background_on_suspend = 1; // Background the process if Control Z calls signal suspend.
opt->makestat = 0; // pas de fichier de stats
opt->maketrack = 0; // ni de tracking
opt->nearlink = HTS_FALSE;
opt->makeindex = HTS_TRUE;
opt->kindex = HTS_FALSE;
opt->delete_old = HTS_TRUE;
opt->background_on_suspend = HTS_TRUE;
opt->makestat = HTS_FALSE;
opt->maketrack = HTS_FALSE;
opt->timeout = 120; // timeout par défaut (2 minutes)
opt->cache = HTS_CACHE_PRIORITY; // cache prioritaire
opt->shell = 0; // pas de shell par defaut
opt->shell = HTS_FALSE;
opt->proxy.active = 0; // pas de proxy
opt->user_agent_send = 1; // envoyer un user-agent
opt->user_agent_send = HTS_TRUE;
StringCopy(opt->user_agent,
"Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)");
StringCopy(opt->referer, "");
@@ -5469,34 +5471,36 @@ HTSEXT_API httrackp *hts_create_opt(void) {
opt->savename_83 = 0; // noms longs par défaut
opt->savename_type = 0; // avec structure originale
opt->savename_delayed = 2; // hard delayed type (default)
opt->delayed_cached = 1; // cached delayed type (default)
opt->mimehtml = 0; // pas MIME-html
opt->delayed_cached = HTS_TRUE;
opt->mimehtml = HTS_FALSE;
opt->parsejava = HTSPARSE_DEFAULT; // parser classes
opt->hostcontrol = 0; // PAS de control host pour timeout et traffic jammer
opt->retry = 2; // 2 retry par défaut
opt->errpage = 1; // copier ou générer une page d'erreur en cas d'erreur (404 etc.)
opt->check_type = 1; // vérifier type si inconnu (cgi,asp..) SAUF / considéré comme html
opt->all_in_cache = 0; // ne pas tout stocker en cache
opt->errpage = HTS_TRUE;
// d'erreur (404 etc.)
opt->check_type = HTS_TRUE;
// considéré comme html
opt->all_in_cache = HTS_FALSE;
opt->robots = HTS_ROBOTS_ALWAYS; // traiter les robots.txt
opt->external = 0; // liens externes normaux
opt->passprivacy = 0; // mots de passe dans les fichiers
opt->includequery = 1; // include query-string par défaut
opt->mirror_first_page = 0; // pas mode mirror links
opt->accept_cookie = 1; // gérer les cookies
opt->external = HTS_FALSE;
opt->passprivacy = HTS_FALSE;
opt->includequery = HTS_TRUE;
opt->mirror_first_page = HTS_FALSE;
opt->accept_cookie = HTS_TRUE;
opt->cookie = NULL;
opt->http10 = 0; // laisser http/1.1
opt->nokeepalive = 0; // pas keep-alive
opt->nocompression = 0; // pas de compression
opt->tolerant = 0; // ne pas accepter content-length incorrect
opt->parseall = 1; // tout parser (tags inconnus, par exemple)
opt->parsedebug = 0; // pas de mode débuggage
opt->norecatch = 0; // ne pas reprendre les fichiers effacés par l'utilisateur
opt->http10 = HTS_FALSE;
opt->nokeepalive = HTS_FALSE;
opt->nocompression = HTS_FALSE;
opt->tolerant = HTS_FALSE;
opt->parseall = HTS_TRUE;
opt->parsedebug = HTS_FALSE;
opt->norecatch = HTS_FALSE;
opt->verbosedisplay = 0; // pas d'animation texte
opt->sizehack = 0; // size hack
opt->urlhack = 1; // url hack (normalizer)
opt->sizehack = HTS_FALSE;
opt->urlhack = HTS_TRUE;
StringCopy(opt->footer, HTS_DEFAULT_FOOTER);
opt->ftp_proxy = 1; // proxy http pour ftp
opt->convert_utf8 = 1; // convert html to UTF-8
opt->ftp_proxy = HTS_TRUE;
opt->convert_utf8 = HTS_TRUE;
StringCopy(opt->filelist, "");
StringCopy(opt->lang_iso, "en, *");
StringCopy(opt->accept,
@@ -5507,9 +5511,9 @@ HTSEXT_API httrackp *hts_create_opt(void) {
//
opt->log = stdout;
opt->errlog = stderr;
opt->flush = 1; // flush sur les fichiers log
//opt->aff_progress=0;
opt->keyboard = 0;
opt->flush = HTS_TRUE;
// opt->aff_progress=0;
opt->keyboard = HTS_FALSE;
//
StringCopy(opt->path_html, "");
StringCopy(opt->path_html_utf8, "");
@@ -5526,10 +5530,10 @@ HTSEXT_API httrackp *hts_create_opt(void) {
opt->waittime = -1; // wait until.. hh*3600+mm*60+ss
//
opt->exec = "";
opt->is_update = 0; // not an update (yet)
opt->dir_topindex = 0; // do not built top index (yet)
opt->is_update = HTS_FALSE;
opt->dir_topindex = HTS_FALSE;
//
opt->bypass_limits = 0; // enforce limits by default
opt->bypass_limits = HTS_FALSE;
opt->state.stop = 0; // stopper
opt->state.exit_xh = 0; // abort
//

View File

@@ -342,17 +342,37 @@ typedef enum hts_seeker {
HTS_SEEKER_UP = 1 << 1 /**< may ascend to parent directories */
} hts_seeker;
/* Link-following scope, stored in the low byte of opt->travel. */
/* opt->travel: link-following scope in the low byte, flags OR'd in above it. */
typedef enum hts_travel_scope {
HTS_TRAVEL_SAME_ADDRESS = 0, /**< stay on the same address (host) */
HTS_TRAVEL_SAME_DOMAIN = 1, /**< stay on the same principal domain */
HTS_TRAVEL_SAME_TLD = 2, /**< stay on the same TLD (e.g. .com) */
HTS_TRAVEL_EVERYWHERE = 7 /**< follow links anywhere on the web */
HTS_TRAVEL_EVERYWHERE = 7, /**< follow links anywhere on the web */
HTS_TRAVEL_TEST_ALL = 1 << 8 /**< also test forbidden URLs (-t) */
} hts_travel_scope;
/* Flags OR'd into opt->travel above the scope value. */
#define HTS_TRAVEL_SCOPE_MASK 0xff /**< mask selecting the scope value */
#define HTS_TRAVEL_TEST_ALL (1 << 8) /**< also test forbidden URLs (-t) */
/* Mask selecting the scope value out of opt->travel. */
#define HTS_TRAVEL_SCOPE_MASK 0xff
/* Text progress display detail (opt->verbosedisplay). */
typedef enum hts_verbosedisplay {
HTS_VERBOSE_NONE = 0, /**< no animated progress display (default) */
HTS_VERBOSE_SIMPLE = 1, /**< minimal single-line progress */
HTS_VERBOSE_FULL = 2 /**< full animated progress */
} hts_verbosedisplay;
/* Delayed file-type resolution policy (opt->savename_delayed). */
typedef enum hts_savename_delayed {
HTS_SAVENAME_DELAYED_NONE = 0, /**< resolve the type immediately */
HTS_SAVENAME_DELAYED_SOFT = 1, /**< delay the type check when unknown */
HTS_SAVENAME_DELAYED_HARD = 2 /**< always delay the type check (default) */
} hts_savename_delayed;
/* Host-banning triggers (opt->hostcontrol bitmask). */
typedef enum hts_hostcontrol {
HTS_HOSTCONTROL_BAN_TIMEOUT = 1 << 0, /**< ban a timing-out host */
HTS_HOSTCONTROL_BAN_SLOW = 1 << 1 /**< ban a too-slow host */
} hts_hostcontrol;
#ifndef HTS_DEF_FWSTRUCT_lien_buffers
#define HTS_DEF_FWSTRUCT_lien_buffers
@@ -378,15 +398,15 @@ struct httrackp {
size_t size_httrackp; /**< size of this structure (version/ABI guard) */
/* */
hts_wizard wizard; /**< interactive wizard level (none/ask/auto) */
int flush; /**< fflush() log files after each write */
hts_boolean flush; /**< fflush() log files after each write */
int travel; /**< link-following scope (same domain, etc.) */
int seeker; /**< allowed direction: go up and/or down the tree */
int depth; /**< maximum recursion depth (-rN) */
int extdepth; /**< maximum recursion depth outside the start domain */
hts_urlmode
urlmode; /**< saved-link rewriting style (relative, absolute, etc.) */
int no_type_change; // do not change file type according to MIME
int debug; /**< debug logging level */
hts_boolean no_type_change; // do not change file type according to MIME
hts_log_type debug; /**< debug logging level */
int getmode; /**< what to fetch (HTML, images, ...) bitmask */
FILE *log; /**< informational log stream; NULL mutes it */
FILE *errlog; /**< error log stream; NULL mutes it */
@@ -395,10 +415,11 @@ struct httrackp {
LLint maxfile_html; /**< max bytes per HTML file */
int maxsoc; /**< max simultaneous sockets (-cN) */
LLint fragment; /**< split site after this many bytes */
int nearlink; /**< also fetch images/data adjacent to a page but off-site */
int makeindex; /**< build a top-level index.html */
int kindex; /**< build a keyword index */
int delete_old; /**< delete locally obsolete files after update */
hts_boolean
nearlink; /**< also fetch images/data adjacent to a page but off-site */
hts_boolean makeindex; /**< build a top-level index.html */
hts_boolean kindex; /**< build a keyword index */
hts_boolean delete_old; /**< delete locally obsolete files after update */
int timeout; /**< connection timeout in seconds */
int rateout; /**< minimum transfer rate (bytes/s) before abort */
int maxtime; /**< max total mirror duration in seconds */
@@ -407,16 +428,17 @@ struct httrackp {
int waittime; /**< scheduled start time (wall-clock seconds) */
hts_cachemode cache; /**< cache generation mode */
// int aff_progress; // progress bar
int shell; /**< driven by a shell over stdin/stdout pipes */
hts_boolean shell; /**< driven by a shell over stdin/stdout pipes */
t_proxy proxy; /**< proxy configuration */
int savename_83; /**< force 8.3 (DOS) file names */
int savename_type; /**< saved-name layout (original tree, flat, ...) */
String
savename_userdef; /**< user-defined name template (e.g. %h%p/%n%q.%t) */
int savename_delayed; // delayed type check
int delayed_cached; // delayed type check can be cached to speedup updates
int mimehtml; /**< produce a single MIME/MHTML archive */
int user_agent_send; /**< send a User-Agent header */
hts_savename_delayed savename_delayed; /**< delayed type-check policy */
hts_boolean
delayed_cached; // delayed type check can be cached to speedup updates
hts_boolean mimehtml; /**< produce a single MIME/MHTML archive */
hts_boolean user_agent_send; /**< send a User-Agent header */
String user_agent; /**< User-Agent value (e.g. httrack/1.0) */
String referer; /**< Referer value to send */
String from; /**< From value to send */
@@ -425,37 +447,39 @@ struct httrackp {
String path_html_utf8; /**< output directory for the mirror, UTF-8 form */
String path_bin; /**< directory for HTML templates */
int retry; /**< extra retries on a failed transfer */
int makestat; /**< maintain a transfer-statistics log */
int maketrack; /**< maintain an operations-statistics log */
hts_boolean makestat; /**< maintain a transfer-statistics log */
hts_boolean maketrack; /**< maintain an operations-statistics log */
int parsejava; /**< Java/JS parsing mode; see htsparsejava_flags */
int hostcontrol; /**< drop hosts that are too slow, etc. */
int errpage; /**< generate an error page on 404 and similar */
int check_type; /**< probe unknown-type links (cgi/asp/dir) and follow moves
*/
int all_in_cache; /**< keep all retrieved data in the cache */
int hostcontrol; /**< ban slow/timing-out hosts; see hts_hostcontrol bits */
hts_boolean errpage; /**< generate an error page on 404 and similar */
hts_boolean
check_type; /**< probe unknown-type links (cgi/asp/dir) and follow moves
*/
hts_boolean all_in_cache; /**< keep all retrieved data in the cache */
hts_robots robots; /**< robots.txt handling level */
int external; /**< render external links as error pages */
int passprivacy; /**< strip passwords from external links */
int includequery; /**< include the query string in saved names */
int mirror_first_page; /**< only mirror the links of the first page */
hts_boolean external; /**< render external links as error pages */
hts_boolean passprivacy; /**< strip passwords from external links */
hts_boolean includequery; /**< include the query string in saved names */
hts_boolean mirror_first_page; /**< only mirror the links of the first page */
String sys_com; /**< system command to run */
int sys_com_exec; /**< actually execute sys_com */
int accept_cookie; /**< accept and send cookies */
hts_boolean sys_com_exec; /**< actually execute sys_com */
hts_boolean accept_cookie; /**< accept and send cookies */
t_cookie *cookie; /**< cookie store */
int http10; /**< force HTTP/1.0 */
int nokeepalive; /**< disable keep-alive */
int nocompression; /**< disable content compression */
int sizehack; /**< treat same-size response as "updated" */
int urlhack; // force "url normalization" to avoid loops
int tolerant; /**< accept an incorrect Content-Length */
int parseall; /**< parse aggressively, including unknown tags with links */
int parsedebug; /**< parser debug mode */
int norecatch; /**< do not re-fetch files the user deleted locally */
int verbosedisplay; /**< animated text progress display */
hts_boolean http10; /**< force HTTP/1.0 */
hts_boolean nokeepalive; /**< disable keep-alive */
hts_boolean nocompression; /**< disable content compression */
hts_boolean sizehack; /**< treat same-size response as "updated" */
hts_boolean urlhack; // force "url normalization" to avoid loops
hts_boolean tolerant; /**< accept an incorrect Content-Length */
hts_boolean
parseall; /**< parse aggressively, including unknown tags with links */
hts_boolean parsedebug; /**< parser debug mode */
hts_boolean norecatch; /**< do not re-fetch files the user deleted locally */
hts_verbosedisplay verbosedisplay; /**< animated text progress display */
String footer; /**< footer/info line injected into pages */
int maxcache; /**< in-memory cache backing limit (bytes) */
// int maxcache_anticipate; // maximum links to anticipate (upper bound)
int ftp_proxy; /**< use the HTTP proxy for FTP too */
hts_boolean ftp_proxy; /**< use the HTTP proxy for FTP too */
String filelist; /**< file listing URLs to include */
String urllist; /**< file listing filters to include */
htsfilters filters; /**< filter pointers (+/-pattern rules) */
@@ -469,20 +493,20 @@ struct httrackp {
String headers; // Additional headers
String mimedefs; // ext1=mimetype1\next2=mimetype2..
String mod_blacklist; /**< blacklisted modules */
int convert_utf8; // filenames UTF-8 conversion (3.46)
hts_boolean convert_utf8; // filenames UTF-8 conversion (3.46)
//
int maxlink; /**< max number of links */
int maxfilter; /**< max number of filters */
//
const char *exec; /**< path of the running executable */
//
int quiet; /**< suppress non-wizard questions */
int keyboard; /**< poll stdin for keyboard input */
int bypass_limits; // bypass built-in limits
int background_on_suspend; // background process on suspend signal
hts_boolean quiet; /**< suppress non-wizard questions */
hts_boolean keyboard; /**< poll stdin for keyboard input */
hts_boolean bypass_limits; // bypass built-in limits
hts_boolean background_on_suspend; // background process on suspend signal
//
int is_update; /**< this run is an update (show "File updated...") */
int dir_topindex; /**< rebuild the top index afterwards */
hts_boolean is_update; /**< this run is an update (show "File updated...") */
hts_boolean dir_topindex; /**< rebuild the top index afterwards */
//
// callbacks
t_hts_htmlcheck_callbacks

View File

@@ -3722,7 +3722,8 @@ int hts_mirror_check_moved(htsmoduleStruct * str,
//case -1: can_retry=1; break;
case STATUSCODE_TIMEOUT:
if (opt->hostcontrol) { // timeout et retry épuisés
if ((opt->hostcontrol & 1) && (heap(ptr)->retry <= 0)) {
if ((opt->hostcontrol & HTS_HOSTCONTROL_BAN_TIMEOUT) &&
(heap(ptr)->retry <= 0)) {
hts_log_print(opt, LOG_DEBUG, "Link banned: %s%s", urladr(), urlfil());
host_ban(opt, ptr, sback, jump_identification_const(urladr()));
hts_log_print(opt, LOG_DEBUG,
@@ -3735,7 +3736,7 @@ int hts_mirror_check_moved(htsmoduleStruct * str,
break;
case STATUSCODE_SLOW:
if ((opt->hostcontrol) && (heap(ptr)->retry <= 0)) { // too slow
if (opt->hostcontrol & 2) {
if (opt->hostcontrol & HTS_HOSTCONTROL_BAN_SLOW) {
hts_log_print(opt, LOG_DEBUG, "Link banned: %s%s", urladr(), urlfil());
host_ban(opt, ptr, sback, jump_identification_const(urladr()));
hts_log_print(opt, LOG_DEBUG,

View File

@@ -1213,7 +1213,7 @@ HTSEXT_API find_handle hts_findfirst(char *path) {
return NULL;
}
HTSEXT_API int hts_findnext(find_handle find) {
HTSEXT_API hts_boolean hts_findnext(find_handle find) {
if (find) {
#ifdef _WIN32
if ((FindNextFileA(find->handle, &find->hdata)))
@@ -1273,7 +1273,7 @@ HTSEXT_API int hts_findgetsize(find_handle find) {
return -1;
}
HTSEXT_API int hts_findisdir(find_handle find) {
HTSEXT_API hts_boolean hts_findisdir(find_handle find) {
if (find) {
if (!hts_findissystem(find)) {
#ifdef _WIN32
@@ -1287,7 +1287,7 @@ HTSEXT_API int hts_findisdir(find_handle find) {
}
return 0;
}
HTSEXT_API int hts_findisfile(find_handle find) {
HTSEXT_API hts_boolean hts_findisfile(find_handle find) {
if (find) {
if (!hts_findissystem(find)) {
#ifdef _WIN32
@@ -1301,7 +1301,7 @@ HTSEXT_API int hts_findisfile(find_handle find) {
}
return 0;
}
HTSEXT_API int hts_findissystem(find_handle find) {
HTSEXT_API hts_boolean hts_findissystem(find_handle find) {
if (find) {
#ifdef _WIN32
if (find->hdata.

View File

@@ -108,15 +108,15 @@ HTSEXT_API int hts_buildtopindex(httrackp * opt, const char *path,
// Portable directory find functions
// Directory find functions
HTSEXT_API find_handle hts_findfirst(char *path);
HTSEXT_API int hts_findnext(find_handle find);
HTSEXT_API hts_boolean hts_findnext(find_handle find);
HTSEXT_API int hts_findclose(find_handle find);
//
HTSEXT_API char *hts_findgetname(find_handle find);
HTSEXT_API int hts_findgetsize(find_handle find);
HTSEXT_API int hts_findisdir(find_handle find);
HTSEXT_API int hts_findisfile(find_handle find);
HTSEXT_API int hts_findissystem(find_handle find);
HTSEXT_API hts_boolean hts_findisdir(find_handle find);
HTSEXT_API hts_boolean hts_findisfile(find_handle find);
HTSEXT_API hts_boolean hts_findissystem(find_handle find);
#endif

View File

@@ -206,7 +206,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
/* Logging */
/** Legacy: write prefix then msg to opt->log. Returns 0 if written, 1 if
opt->log is NULL. Prefer hts_log_print(). */
HTSEXT_API int hts_log(httrackp * opt, const char *prefix, const char *msg);
HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
const char *msg);
/** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO).
Forwards to the registered log callback, and when the level is <= opt->debug
@@ -313,7 +314,8 @@ HTSEXT_API T_SOC catch_url_init(int *port, char *adr);
"ip:port". The buffers are caller-allocated and not bounds-checked: @p data
must be CATCH_URL_DATA_SIZE bytes, and @p url / @p method must fit the
captured request line. */
HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data);
HTSEXT_API hts_boolean catch_url(T_SOC soc, char *url, char *method,
char *data);
/* State */
/** Whether the engine is parsing HTML. Returns 0 if not, otherwise the percent
@@ -334,10 +336,10 @@ HTSEXT_API int hts_is_exiting(httrackp * opt);
caller-owned, NULL-terminated array of strings; the engine stores the
pointer without copying, so the array and its strings must stay valid until
the engine consumes them. @return nonzero if a list is now set. */
HTSEXT_API int hts_addurl(httrackp * opt, char **url);
HTSEXT_API hts_boolean hts_addurl(httrackp *opt, char **url);
/** Clear any pending add-URL list set by hts_addurl(). Always returns 0. */
HTSEXT_API int hts_resetaddurl(httrackp * opt);
HTSEXT_API hts_boolean hts_resetaddurl(httrackp *opt);
/** Apply the runtime-tunable options from @p from onto @p to, to adjust a live
mirror. Only fields set to a non-sentinel value are copied; the rest of @p
@@ -356,7 +358,7 @@ HTSEXT_API int hts_setpause(httrackp * opt, int);
lock, so it is safe to call from another thread). @p force is currently
ignored.
@return 0; no-op if @p opt is NULL. */
HTSEXT_API int hts_request_stop(httrackp * opt, int force);
HTSEXT_API int hts_request_stop(httrackp *opt, hts_boolean force);
/** Queue a single in-progress file, by URL, to be cancelled by the engine.
@p url is copied internally. Takes the state lock, so it is thread-safe.
@@ -373,7 +375,7 @@ HTSEXT_API void hts_cancel_parsing(httrackp * opt);
/** Nonzero once the mirror has fully ended. Read under the engine state lock,
so safe to poll from another thread. Wait for this before hts_free_opt(). */
HTSEXT_API int hts_has_stopped(httrackp * opt);
HTSEXT_API hts_boolean hts_has_stopped(httrackp *opt);
/* Tools */
/** Ensure the directory chain leading to @p path exists, creating missing
@@ -390,7 +392,7 @@ HTSEXT_API int structcheck_utf8(const char *path);
/** Whether the directory containing @p path exists. The basename is stripped
first, so passing a file path tests its parent directory. @return 1 if it is
a directory, 0 otherwise. */
HTSEXT_API int dir_exists(const char *path);
HTSEXT_API hts_boolean dir_exists(const char *path);
/** Write the HTTP reason phrase for @p statuscode into @p msg, a caller buffer
of at least 64 bytes. For an unknown code a non-empty @p msg is kept,
@@ -573,14 +575,15 @@ HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const cha
must-avoid escapes are kept encoded, and %25 is never decoded). @p no_high &
1 also decodes high (>= 128) bytes; @p no_high & 2 also decodes an escaped
space. Returns @p catbuff. */
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size, const char *s, const int no_high);
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s, const hts_boolean no_high);
/** Determine the MIME type of local file name @p fil into @p s (capacity
@p ssize): user --assume rules, then ".html", then the built-in extension
table. @p flag != 0 forces a fallback type. @return 1 if a type was written,
0 otherwise. */
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag);
HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, hts_boolean flag);
/** @deprecated Use get_httptype_sized(). Assumes @p s has at least
HTS_MIMETYPE_SIZE capacity. */
@@ -600,7 +603,7 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil);
/** 1 if @p fil, an extension such as "asp" or "php" (not a full filename), is a
known dynamic-page type, else 0. */
HTSEXT_API int is_dyntype(const char *fil);
HTSEXT_API hts_boolean is_dyntype(const char *fil);
/** Extract the extension of @p fil (text after the last '.', stopping at '?')
into caller scratch @p catbuff (capacity @p size) and return it. Returns ""
@@ -610,12 +613,12 @@ HTSEXT_API const char *get_ext(char *catbuff, size_t size, const char *fil);
/** 1 if MIME type @p st must not be reclassified or renamed (hypertext types
and a built-in keep-list of commonly mislabeled types), else 0. */
HTSEXT_API int may_unknown(httrackp * opt, const char *st);
HTSEXT_API hts_boolean may_unknown(httrackp *opt, const char *st);
/** Guess the MIME type of local file @p fil into @p s (capacity @p ssize),
always producing a type. @return 1 if a type was written. */
HTSEXT_API int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil);
HTSEXT_API hts_boolean guess_httptype_sized(httrackp *opt, char *s,
size_t ssize, const char *fil);
/** @deprecated Use guess_httptype_sized(). Assumes @p s has at least
HTS_MIMETYPE_SIZE capacity. */
@@ -677,7 +680,7 @@ HTSEXT_API find_handle hts_findfirst(char *path);
/** Advance to the next directory entry. Returns 1 if an entry is available, 0
at end of directory. */
HTSEXT_API int hts_findnext(find_handle find);
HTSEXT_API hts_boolean hts_findnext(find_handle find);
/** Close the iteration and free @p find. Always returns 0; NULL is accepted. */
HTSEXT_API int hts_findclose(find_handle find);
@@ -692,16 +695,16 @@ HTSEXT_API int hts_findgetsize(find_handle find);
/** 1 if the current entry is a directory, else 0 (a system/special entry, see
hts_findissystem(), reports 0). */
HTSEXT_API int hts_findisdir(find_handle find);
HTSEXT_API hts_boolean hts_findisdir(find_handle find);
/** 1 if the current entry is a regular file, else 0 (a system/special entry,
see hts_findissystem(), reports 0). */
HTSEXT_API int hts_findisfile(find_handle find);
HTSEXT_API hts_boolean hts_findisfile(find_handle find);
/** 1 if the current entry is a special/system entry to skip: "." or "..", on
POSIX also device/fifo/socket nodes, on Windows also system, hidden or
temporary entries. Else 0. */
HTSEXT_API int hts_findissystem(find_handle find);
HTSEXT_API hts_boolean hts_findissystem(find_handle find);
/* UTF-8 aware FILE API */
/* On non-Windows these macros resolve directly to the POSIX calls. On Windows