Compare commits

6 Commits

Author SHA1 Message Date
Xavier Roche
0f4b2596b2 htslib: return hts_boolean from the yes/no library functions
The exported API had many functions returning int where the int is really a
yes/no answer. Type the 14 genuinely-boolean ones as hts_boolean
(catch_url, dir_exists, is_dyntype, may_unknown, hts_findnext,
hts_findisdir/isfile/issystem, hts_has_stopped, hts_addurl, hts_resetaddurl,
hts_log, get_httptype_sized, guess_httptype_sized) and the three boolean int
parameters likewise (get_httptype_sized's flag, unescape_http_unharm's no_high,
hts_request_stop's force).

hts_boolean moves from htsopt.h to htsglobal.h so the library header, which only
forward-declares httrackp and does not include htsopt.h, can see the type.

The audit deliberately left alone the functions whose name suggests a boolean
but whose value is not 0/1: hts_is_testing returns 0..5, hts_is_exiting and
is_knowntype/is_userknowntype are tri-state, structcheck and the *_utf8 wrappers
are POSIX 0/-1, hts_findgetsize is a size, hts_main is an exit code, and
copy_htsopt returns 0 for success (a bool would read backwards). hts_setpause
and hts_is_parsing keep int params because they gate on '>= 0', not 0/1.

Not an ABI break: int -> int-sized enum is the same calling convention for both
return values (eax) and parameters, and enum<->int is implicit for callers, so
already-compiled consumers keep working. Verified by comparing per-object
disassembly against master: 39 of 45 objects byte-identical, htslib differs only
in __LINE__ immediates, and the five caller/definer objects differ only in
register allocation and return-block merging (no control-flow or value change).
make check passes.
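
The source-level half of this claim can be sketched as follows. has_extension and count_with_extension are hypothetical stand-ins, not HTTrack functions; only the hts_boolean typedef is taken from the diff. The sketch shows an existing-style caller that stores the result in an int and still compiles unchanged; it does not demonstrate the binary-level check, which the message covers by disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as the one added to htsglobal.h in this series. */
typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;

/* Hypothetical yes/no function, typed the way the commit types the
   library's boolean returns. */
static hts_boolean has_extension(const char *name) {
  const char *p;

  if (name == NULL)
    return HTS_FALSE;
  for (p = name; *p != '\0'; p++) {
    if (*p == '.')
      return HTS_TRUE;
  }
  return HTS_FALSE;
}

/* Existing-style caller written against the old 'int' prototype: it
   keeps the result in an int and tests it as a truth value. */
static int count_with_extension(const char *const *names, size_t n) {
  size_t i;
  int count = 0;

  assert(names != NULL);
  for (i = 0; i < n; i++) {
    const int found = has_extension(names[i]); /* implicit enum -> int */
    if (found)
      count++;
  }
  return count;
}
```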

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 09:19:36 +02:00
Xavier Roche
4a676bb5e1 Merge pull request #386 from xroche/feature/api-boolean-enum
Type the boolean option fields as a named enum
2026-06-18 09:04:14 +02:00
Xavier Roche
36b4e834b8 htsopt: type the boolean option fields as a named enum
The httrackp option fields that are pure on/off toggles were declared as bare
int. Introduce a two-value enum, hts_boolean { HTS_FALSE, HTS_TRUE }, and use it
as the type of the 38 boolean fields so each one documents its nature at the
declaration. The hts_create_opt() defaults block now reads HTS_TRUE/HTS_FALSE.

An enum is used rather than C bool on purpose: a C enum is int-sized and
represented like int, so the struct layout, every field offset and
sizeof(httrackp) are unchanged (verified: 141648 bytes before and after). The
size_httrackp guard value still holds and there is no soname bump. A bool field
would be one byte and would repack the whole struct.
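
A minimal sketch of the layout argument, using hypothetical three-field structs rather than the real httrackp. It assumes a compiler where the enum is int-sized (the GCC and Clang default without -fshort-enums) and where bool is one byte; the C standard leaves both implementation-defined, so the result is a property of the platform, not of the language.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;

/* Hypothetical excerpts: two adjacent toggles followed by an int. */
struct opts_int {
  int flush;
  int shell;
  int depth;
};
struct opts_enum {
  hts_boolean flush;
  hts_boolean shell;
  int depth;
};
struct opts_bool {
  bool flush;
  bool shell;
  int depth;
};

/* 1 if the enum-typed struct has the same size and field offsets as
   the int-typed one, 0 otherwise. */
static int enum_layout_matches_int(void) {
  return sizeof(struct opts_enum) == sizeof(struct opts_int) &&
         offsetof(struct opts_enum, shell) ==
             offsetof(struct opts_int, shell) &&
         offsetof(struct opts_enum, depth) ==
             offsetof(struct opts_int, depth);
}

/* Same comparison for the bool-typed struct. */
static int bool_layout_matches_int(void) {
  return sizeof(struct opts_bool) == sizeof(struct opts_int) &&
         offsetof(struct opts_bool, shell) ==
             offsetof(struct opts_int, shell) &&
         offsetof(struct opts_bool, depth) ==
             offsetof(struct opts_int, depth);
}
```

Under those assumptions the first function returns 1 and the second returns 0, because the two bool fields pack into the first word and shift every later offset.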

Scope is httrackp only; fields that look boolean but are not were left as int
(savename_delayed is tri-state, hostcontrol is a bitmask), as was is_update in
the separate lien_back struct. The four CLI sites that sscanf("%d") into a
boolean field now cast to int* to keep the read well-defined.
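
For comparison, an alternative to the cast is to read into a plain int and then assign. This is a sketch only, not what the commit does: parse_boolean_option is a hypothetical helper, and unlike the cast it normalises any non-zero number to HTS_TRUE, which may or may not match what the CLI expects for a given option.

```c
#include <assert.h>
#include <stdio.h>

typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;

/* Hypothetical helper: parse a decimal number from 'text' into an int
   temporary, then store HTS_FALSE or HTS_TRUE in '*field'.
   Returns 1 if a number was parsed, 0 otherwise; on failure '*field'
   is left untouched. */
static int parse_boolean_option(const char *text, hts_boolean *field) {
  int value = 0;

  assert(text != NULL && field != NULL);
  if (sscanf(text, "%d", &value) != 1)
    return 0;
  *field = (value != 0) ? HTS_TRUE : HTS_FALSE;
  return 1;
}
```

The temporary keeps the "%d" conversion writing to a genuine int object, so no pointer cast is needed at the call site.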

Value-preserving: built against origin/master and compared per-object
disassembly. 40 of 45 objects are byte-identical; the five that differ
(htscore/htslib/htsname/htsparse/htswizard) differ only in instruction selection
from the int->enum field types, with every hts_create_opt default confirmed
unchanged. make check passes. Runtime assignments and tests on these fields are
left as plain 0/1, which compile identically.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-18 07:34:36 +02:00
Xavier Roche
bbb423f025 Merge pull request #385 from xroche/feature/api-enum-types
Give the option fields named enum types and flag macros
2026-06-18 07:06:59 +02:00
Xavier Roche
eed46e0b09 htsopt: give the option fields named enum types and flag macros
The per-mirror option fields in the installed htsopt.h carried bare ints whose
values were scattered magic numbers, decoded only by reading the parser. Type
the four single-valued fields as enums (urlmode -> hts_urlmode, cache ->
hts_cachemode, wizard -> hts_wizard, robots -> hts_robots) and name the bitmask
bits as enums too (hts_getmode, hts_seeker, hts_travel_scope, plus
HTS_TRAVEL_SCOPE_MASK / HTS_TRAVEL_TEST_ALL), following the existing
htsparsejava_flags pattern where the flag bits are an enum but the field stays
int. Replace the magic numbers at every use site with the named values.
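
The "enum for the bits, int for the field" pattern can be sketched with the travel values from this series. The enum and macro values are copied from the diff; the three helper functions are hypothetical and exist only for illustration. The sketch uses `|` where the parser uses `+`; the two are equivalent here because the scope value and the test-all flag occupy disjoint bits.

```c
#include <assert.h>

/* Values as introduced in htsopt.h by this series. */
typedef enum hts_travel_scope {
  HTS_TRAVEL_SAME_ADDRESS = 0,
  HTS_TRAVEL_SAME_DOMAIN = 1,
  HTS_TRAVEL_SAME_TLD = 2,
  HTS_TRAVEL_EVERYWHERE = 7
} hts_travel_scope;

#define HTS_TRAVEL_SCOPE_MASK 0xff
#define HTS_TRAVEL_TEST_ALL (1 << 8)

/* Hypothetical helper: replace the scope in the low byte while keeping
   the test-all flag, as the -a/-d/-l/-e options do. */
static int travel_set_scope(int travel, hts_travel_scope scope) {
  assert(((int) scope & ~HTS_TRAVEL_SCOPE_MASK) == 0);
  return (int) scope | (travel & HTS_TRAVEL_TEST_ALL);
}

/* Hypothetical helper: extract the scope from the int field. */
static hts_travel_scope travel_get_scope(int travel) {
  return (hts_travel_scope) (travel & HTS_TRAVEL_SCOPE_MASK);
}

/* Hypothetical helper: 1 if the test-all flag is set, 0 otherwise. */
static int travel_tests_all(int travel) {
  return (travel & HTS_TRAVEL_TEST_ALL) != 0;
}
```

Keeping the field as int lets it hold a scope value and a flag at the same time, which a variable of the enum type would not document correctly.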

This is not an ABI break: a C enum is int-sized and represented identically, so
the struct layout, field offsets and sizeof(httrackp) are unchanged and the
size_httrackp guard value still holds. No soname bump.

The substitution is value-preserving and was verified by comparing per-object
disassembly between this branch and origin/master: 98 of 103 objects are
byte-identical, the htscore/htscoremain/htsparse objects have identical opcode
sequences (the only deltas are __LINE__ immediates moved by clang-format
wrapping long lines), and htslib/htswizard differ only in instruction selection
from the int->enum field types, with every hts_create_opt default confirmed
unchanged. make check passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-17 23:59:38 +02:00
Xavier Roche
fa57f0148f Merge pull request #384 from xroche/cleanup/dead-decls
Drop dead and duplicate function declarations
2026-06-17 22:15:13 +02:00
13 changed files with 395 additions and 291 deletions

@@ -2779,7 +2779,7 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
if (strcmp(back[i].url_fil, "/robots.txt")) {
if (back[i].r.statuscode == HTTP_OK) { // 'OK'
if (!is_hypertext_mime(opt, back[i].r.contenttype, back[i].url_fil)) { // not HTML
if (opt->getmode & 2) { // non-HTML files may be written
if (opt->getmode & HTS_GETMODE_NONHTML) {
int fcheck = 0;
int last_errno = 0;
@@ -2852,7 +2852,7 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
}
}
}
} else { // cancel everything!
} else { // cancel everything!
hts_log_print(opt, LOG_DEBUG,
"File cancelled (non HTML): %s%s",
back[i].url_adr, back[i].url_fil);
@@ -3661,7 +3661,7 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
#endif
if (sz >= 0) {
if (!is_hypertext_mime(opt, back[i].r.contenttype, back[i].url_sav)) { // not HTML
if (opt->getmode & 2) { // non-HTML files may be written **otherwise, well, it will be caught further on, so never mind what comes out**
if (opt->getmode & HTS_GETMODE_NONHTML) {
filenote(&opt->state.strc, back[i].url_sav, NULL); // record the file as known
file_notify(opt, back[i].url_adr, back[i].url_fil,
back[i].url_sav, 0, 1,

@@ -370,7 +370,7 @@ int cache_selftests(httrackp *opt, const char *dir) {
StringCopy(opt->path_html, base);
StringCopy(opt->path_html_utf8, base);
}
opt->cache = 1;
opt->cache = HTS_CACHE_PRIORITY;
/* pass 1: create everything in a single write session */
selftest_open_for_write(&cache, opt);
@@ -547,7 +547,7 @@ static void golden_setup(httrackp *opt, const char *dir) {
StringCopy(opt->path_log, base);
StringCopy(opt->path_html, base);
StringCopy(opt->path_html_utf8, base);
opt->cache = 1;
opt->cache = HTS_CACHE_PRIORITY;
}
int cache_golden_selftest(httrackp *opt, const char *dir, int regen) {

@@ -135,7 +135,8 @@ HTSEXT_API T_SOC catch_url_init(int *port, /* 128 bytes */ char *adr) {
// returns 0 if error
// url: buffer where URL must be stored - or ip:port in case of failure
// data: 32Kb
HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) {
HTSEXT_API hts_boolean catch_url(T_SOC soc, char *url, char *method,
char *data) {
int retour = 0;
// connection (accept)

@@ -1835,9 +1835,10 @@ int httpmirror(char *url1, httrackp * opt) {
a++; // skip space(s)
if (strnotempty(a)) {
#ifdef IGNORE_RESTRICTIVE_ROBOTS
if (strcmp(a, "/") != 0 || opt->robots >= 3)
if (strcmp(a, "/") != 0 ||
opt->robots >= HTS_ROBOTS_ALWAYS_STRICT)
#endif
{ /* ignoring disallow: / */
{ /* ignoring disallow: / */
if ((strlen(buff) + strlen(a) + 8) < sizeof(buff)) {
strcatbuff(buff, a);
strcatbuff(buff, "\n");
@@ -1932,10 +1933,10 @@ int httpmirror(char *url1, httrackp * opt) {
"Warning: store %s without scan: %s", r.contenttype,
savename());
} else {
if ((opt->getmode & 2) != 0) { // ok, allowed
if ((opt->getmode & HTS_GETMODE_NONHTML) != 0) {
hts_log_print(opt, LOG_DEBUG, "Store %s: %s", r.contenttype,
savename());
} else { // link not allowed! (e.g. cgi-bin as html)
} else { // link not allowed! (e.g. cgi-bin as html)
hts_log_print(opt, LOG_DEBUG,
"non-html file ignored after upload at %s : %s",
urladr(), urlfil());
@@ -2052,7 +2053,7 @@ int httpmirror(char *url1, httrackp * opt) {
ptr++;
// should the next link(s) be skipped? (image files to fetch after the HTML ones)
if (opt->getmode & 4) { // save non-HTML files afterwards
if (opt->getmode & HTS_GETMODE_HTML_FIRST) {
// skip files depending on the pass
if (!numero_passe) {
while((ptr < opt->lien_tot) ? (heap(ptr)->pass2) : 0)
@@ -2584,7 +2585,7 @@ static int mkdir_compat(const char *pathname) {
/* path must end with "/" or with the filename (/tmp/bar/ or /tmp/bar/foo.zip) */
/* Note: preserve errno */
HTSEXT_API int dir_exists(const char *path) {
HTSEXT_API hts_boolean dir_exists(const char *path) {
const int err = errno;
STRUCT_STAT st;
char BIGSTK file[HTS_URLMAXSIZE * 2];
@@ -3645,7 +3646,7 @@ HTSEXT_API int hts_setpause(httrackp * opt, int p) {
}
// ask for termination
HTSEXT_API int hts_request_stop(httrackp * opt, int force) {
HTSEXT_API int hts_request_stop(httrackp *opt, hts_boolean force) {
if (opt != NULL) {
hts_log_print(opt, LOG_ERROR, "Exit requested by shell or user");
hts_mutexlock(&opt->state.lock);
@@ -3655,7 +3656,7 @@ HTSEXT_API int hts_request_stop(httrackp * opt, int force) {
return 0;
}
HTSEXT_API int hts_has_stopped(httrackp * opt) {
HTSEXT_API hts_boolean hts_has_stopped(httrackp *opt) {
int ended;
hts_mutexlock(&opt->state.lock);
ended = opt->state.is_ended;
@@ -3677,12 +3678,12 @@ HTSEXT_API int hts_has_stopped(httrackp * opt) {
//}
// add a URL
// -1: error
HTSEXT_API int hts_addurl(httrackp * opt, char **url) {
HTSEXT_API hts_boolean hts_addurl(httrackp *opt, char **url) {
if (url)
opt->state._hts_addurl = url;
return (opt->state._hts_addurl != NULL);
}
HTSEXT_API int hts_resetaddurl(httrackp * opt) {
HTSEXT_API hts_boolean hts_resetaddurl(httrackp *opt) {
opt->state._hts_addurl = NULL;
return (opt->state._hts_addurl != NULL);
}
@@ -3736,10 +3737,10 @@ HTSEXT_API int copy_htsopt(const httrackp * from, httrackp * to) {
// test all: bit 8 of travel
if (from->travel > -1) {
if (from->travel & 256)
to->travel |= 256;
if (from->travel & HTS_TRAVEL_TEST_ALL)
to->travel |= HTS_TRAVEL_TEST_ALL;
else
to->travel &= 255;
to->travel &= HTS_TRAVEL_SCOPE_MASK;
}
return 0;

@@ -1431,7 +1431,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
StringBuff(opt->path_log), "hts-in_progress.lock"))) { // lock file?
//char s[32];
opt->cache = 1; // cache takes priority
opt->cache = HTS_CACHE_PRIORITY; // cache takes priority
if (opt->quiet == 0) {
if ((fexist
(fconcat
@@ -1465,7 +1465,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
(fconcat
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), StringBuff(opt->path_html), "index.html"))) {
//char s[32];
opt->cache = 2; // cache is used after a validity check
opt->cache = HTS_CACHE_TEST_UPDATE;
if (opt->quiet == 0) {
if ((fexist
(fconcat
@@ -1558,25 +1558,25 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return 0; // normally already done
//
case 'g': // fetch one (or several) isolated files
opt->wizard = 2; // the wizard can no longer be done without..
opt->wizard = HTS_WIZARD_AUTO;
//opt->wizard=0; // no wizard
opt->cache = 0; // and no cache
opt->cache = HTS_CACHE_NONE; // and no cache
opt->makeindex = 0; // and no index
httrack_logmode = 1; // errors on screen
opt->savename_type = 1003; // put in the current directory
opt->depth = 0; // do not explore the page
opt->accept_cookie = 0; // no cookies
opt->robots = 0; // no robots
opt->robots = HTS_ROBOTS_NEVER; // no robots
break;
case 'w':
opt->wizard = 2; // 'soft' wizard (asks no questions)
opt->travel = 0;
opt->seeker = 1;
opt->wizard = HTS_WIZARD_AUTO;
opt->travel = HTS_TRAVEL_SAME_ADDRESS;
opt->seeker = HTS_SEEKER_DOWN;
break;
case 'W':
opt->wizard = 1; // Wizard-Help (asks questions)
opt->travel = 0;
opt->seeker = 1;
opt->wizard = HTS_WIZARD_ASK; // Wizard-Help (asks questions)
opt->travel = HTS_TRAVEL_SAME_ADDRESS;
opt->seeker = HTS_SEEKER_DOWN;
break;
case 'r': // no longer the brute-force recursive get, but a wizard too!
if (isdigit((unsigned char) *(com + 1))) {
@@ -1598,19 +1598,23 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
// note: the opt->depth tests are there to avoid
// accidentally mirroring the whole web (:-O) ;-)
case 'a': /*if (opt->depth==9999) opt->depth=3; */
opt->travel = 0 + (opt->travel & 256);
opt->travel =
HTS_TRAVEL_SAME_ADDRESS + (opt->travel & HTS_TRAVEL_TEST_ALL);
break;
case 'd': /*if (opt->depth==9999) opt->depth=3; */
opt->travel = 1 + (opt->travel & 256);
opt->travel =
HTS_TRAVEL_SAME_DOMAIN + (opt->travel & HTS_TRAVEL_TEST_ALL);
break;
case 'l': /*if (opt->depth==9999) opt->depth=3; */
opt->travel = 2 + (opt->travel & 256);
opt->travel =
HTS_TRAVEL_SAME_TLD + (opt->travel & HTS_TRAVEL_TEST_ALL);
break;
case 'e': /*if (opt->depth==9999) opt->depth=3; */
opt->travel = 7 + (opt->travel & 256);
opt->travel =
HTS_TRAVEL_EVERYWHERE + (opt->travel & HTS_TRAVEL_TEST_ALL);
break;
case 't':
opt->travel |= 256;
opt->travel |= HTS_TRAVEL_TEST_ALL;
break;
case 'n':
opt->nearlink = 1;
@@ -1620,16 +1624,16 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
break;
//
case 'U':
opt->seeker = 2;
opt->seeker = HTS_SEEKER_UP;
break;
case 'D':
opt->seeker = 1;
opt->seeker = HTS_SEEKER_DOWN;
break;
case 'S':
opt->seeker = 0;
break;
case 'B':
opt->seeker = 3;
opt->seeker = HTS_SEEKER_DOWN | HTS_SEEKER_UP;
break;
//
case 'Y':
@@ -1659,12 +1663,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//case 'A': opt->urlmode=1; break;
//case 'R': opt->urlmode=2; break;
case 'K':
opt->urlmode = 0;
opt->urlmode = HTS_URLMODE_ABSOLUTE;
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->urlmode);
if (opt->urlmode == 0) { // in fact K0 ==> K2
sscanf(com + 1, "%d", (int *) &opt->urlmode);
if (opt->urlmode == HTS_URLMODE_ABSOLUTE) { // in fact K0 ==> K2
// and K ==> K0
opt->urlmode = 2;
opt->urlmode = HTS_URLMODE_RELATIVE;
}
while(isdigit((unsigned char) *(com + 1)))
com++;
@@ -1779,7 +1783,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
break;
//
case 'b':
sscanf(com + 1, "%d", &opt->accept_cookie);
sscanf(com + 1, "%d", (int *) &opt->accept_cookie);
while(isdigit((unsigned char) *(com + 1)))
com++;
break;
@@ -1831,33 +1835,33 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
break;
case 's':
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->robots);
sscanf(com + 1, "%d", (int *) &opt->robots);
while(isdigit((unsigned char) *(com + 1)))
com++;
} else
opt->robots = 1;
opt->robots = HTS_ROBOTS_SOMETIMES;
#if DEBUG_ROBOTS
printf("robots.txt mode set to %d\n", opt->robots);
#endif
break;
case 'o':
sscanf(com + 1, "%d", &opt->errpage);
sscanf(com + 1, "%d", (int *) &opt->errpage);
while(isdigit((unsigned char) *(com + 1)))
com++;
break;
case 'u':
sscanf(com + 1, "%d", &opt->check_type);
sscanf(com + 1, "%d", (int *) &opt->check_type);
while(isdigit((unsigned char) *(com + 1)))
com++;
break;
//
case 'C':
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->cache);
sscanf(com + 1, "%d", (int *) &opt->cache);
while(isdigit((unsigned char) *(com + 1)))
com++;
} else
opt->cache = 1;
opt->cache = HTS_CACHE_PRIORITY;
break;
case 'k':
opt->all_in_cache = 1;
@@ -1913,7 +1917,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
case 'I':
opt->kindex = 1;
if (isdigit((unsigned char) *(com + 1))) {
sscanf(com + 1, "%d", &opt->kindex);
sscanf(com + 1, "%d", (int *) &opt->kindex);
while(isdigit((unsigned char) *(com + 1)))
com++;
}
@@ -2045,7 +2049,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
// preserve: no footer, original links
case 'p':
StringClear(opt->footer);
opt->urlmode = 4;
opt->urlmode = HTS_URLMODE_KEEP_ORIGINAL;
break;
case 'L': // URL list
if ((na + 1 >= argc) || (argv[na + 1][0] == '-')) {
@@ -3610,12 +3614,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
printf("Mirror launched on %s by HTTrack Website Copier/"
HTTRACK_VERSION "%s " HTTRACK_AFF_AUTHORS "" LF, t,
hts_get_version_info(opt));
if (opt->wizard == 0) {
if (opt->wizard == HTS_WIZARD_NONE) {
printf
("mirroring %s with %d levels, %d sockets,t=%d,s=%d,logm=%d,lnk=%d,mdg=%d\n",
url, opt->depth, opt->maxsoc, opt->travel, opt->seeker,
httrack_logmode, opt->urlmode, opt->getmode);
} else { // the magic wizard
} else { // the magic wizard
printf("mirroring %s with the wizard help..\n", url);
}
}

@@ -242,6 +242,14 @@ Please visit our Website: http://www.httrack.com
#define HTS_NOPARAM "(none)"
#define HTS_NOPARAM2 "\"(none)\""
/* Boolean flag for option fields and API yes/no returns. An enum (not C bool)
so it stays int-sized: option fields keep the httrackp layout/ABI, and a
return type stays compatible with the int it replaces. */
#ifndef HTS_DEF_DEFSTRUCT_hts_boolean
#define HTS_DEF_DEFSTRUCT_hts_boolean
typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
#endif
/* Larger/smaller of two values. Macros: arguments are evaluated twice. */
#define maximum(A,B) ( (A) > (B) ? (A) : (B) )

@@ -3646,8 +3646,9 @@ HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const cha
// DOES NOT DECODE %25 (part of CHAR_DELIM)
// no_high & 1: decode high chars
// no_high & 2: decode space
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s, const int no_high) {
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s,
const hts_boolean no_high) {
size_t i, j;
RUNTIME_TIME_CHECK_SIZE(size);
@@ -3931,8 +3932,8 @@ void hts_replace(char *s, char from, char to) {
// guess a local file's mime type (e.g. fil="toto.gif" -> s="image/gif")
// returns 1 if a type was written to s, 0 otherwise
int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) {
hts_boolean guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) {
return get_httptype_sized(opt, s, ssize, fil, 1);
}
@@ -3945,8 +3946,8 @@ void guess_httptype(httrackp * opt, char *s, const char *fil) {
// write the mime type for fil into s (capacity ssize)
// flag: 1 to always return a type (the "application/..." / octet-stream
// fallback) returns 1 if a type was written to s, 0 otherwise
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag) {
HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, hts_boolean flag) {
// userdef overrides get_httptype (a rule with an empty value, e.g. "--assume
// cgi=", matches but writes nothing: report it as "no type" like the old
// code, whose callers tested strnotempty(s))
@@ -4196,7 +4197,7 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil) {
// dynamic page?
// is_dyntype(get_ext("foo.asp"))
HTSEXT_API int is_dyntype(const char *fil) {
HTSEXT_API hts_boolean is_dyntype(const char *fil) {
int j = 0;
if (!fil)
@@ -4214,7 +4215,7 @@ HTSEXT_API int is_dyntype(const char *fil) {
// critical types that must not be changed, because they are returned by servers that
// do not know the type
int may_unknown(httrackp * opt, const char *st) {
hts_boolean may_unknown(httrackp *opt, const char *st) {
int j = 0;
// media types
@@ -5236,7 +5237,8 @@ HTSEXT_API int hts_uninit_module(void) {
}
// legacy. do not use
HTSEXT_API int hts_log(httrackp * opt, const char *prefix, const char *msg) {
HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
const char *msg) {
if (opt->log != NULL) {
fspc(opt, opt->log, prefix);
fprintf(opt->log, "%s" LF, msg);
@@ -5434,34 +5436,34 @@ HTSEXT_API httrackp *hts_create_opt(void) {
/* default settings */
opt->wizard = 2; // automatic wizard
opt->quiet = 0; // questions
//
opt->travel = 0; // same address
opt->wizard = HTS_WIZARD_AUTO; // automatic wizard
opt->quiet = HTS_FALSE;
//
opt->travel = HTS_TRAVEL_SAME_ADDRESS; // same address
opt->depth = 9999; // full mirror by default
opt->extdepth = 0; // but not outside
opt->seeker = 1; // down
opt->urlmode = 2; // relative by default
opt->no_type_change = 0; // change file types
opt->seeker = HTS_SEEKER_DOWN; // down
opt->urlmode = HTS_URLMODE_RELATIVE; // relative by default
opt->no_type_change = HTS_FALSE;
opt->debug = LOG_NOTICE; // small log
opt->getmode = 3; // linear scan
opt->getmode = HTS_GETMODE_HTML | HTS_GETMODE_NONHTML;
opt->maxsite = -1; // max site size (none)
opt->maxfile_nonhtml = -1; // max non-HTML file size
opt->maxfile_html = -1; // same for HTML
opt->maxsoc = 4; // max number of sockets
opt->fragment = -1; // no fragmentation
opt->nearlink = 0; // do not fetch "adjacent" non-HTML links
opt->makeindex = 1; // build an index
opt->kindex = 0; // 'keyword' index
opt->delete_old = 1; // delete old files
opt->background_on_suspend = 1; // Background the process if Control Z calls signal suspend.
opt->makestat = 0; // no stats file
opt->maketrack = 0; // and no tracking
opt->nearlink = HTS_FALSE;
opt->makeindex = HTS_TRUE;
opt->kindex = HTS_FALSE;
opt->delete_old = HTS_TRUE;
opt->background_on_suspend = HTS_TRUE;
opt->makestat = HTS_FALSE;
opt->maketrack = HTS_FALSE;
opt->timeout = 120; // default timeout (2 minutes)
opt->cache = 1; // cache takes priority
opt->shell = 0; // no shell by default
opt->cache = HTS_CACHE_PRIORITY; // cache takes priority
opt->shell = HTS_FALSE;
opt->proxy.active = 0; // no proxy
opt->user_agent_send = 1; // send a user-agent
opt->user_agent_send = HTS_TRUE;
StringCopy(opt->user_agent,
"Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)");
StringCopy(opt->referer, "");
@@ -5469,34 +5471,36 @@ HTSEXT_API httrackp *hts_create_opt(void) {
opt->savename_83 = 0; // long names by default
opt->savename_type = 0; // with the original structure
opt->savename_delayed = 2; // hard delayed type (default)
opt->delayed_cached = 1; // cached delayed type (default)
opt->mimehtml = 0; // no MIME-html
opt->delayed_cached = HTS_TRUE;
opt->mimehtml = HTS_FALSE;
opt->parsejava = HTSPARSE_DEFAULT; // parse classes
opt->hostcontrol = 0; // NO host control for timeout and traffic jammer
opt->retry = 2; // 2 retries by default
opt->errpage = 1; // copy or generate an error page on error (404 etc.)
opt->check_type = 1; // check the type if unknown (cgi,asp..) EXCEPT / which is treated as html
opt->all_in_cache = 0; // do not store everything in the cache
opt->robots = 2; // process robots.txt
opt->external = 0; // normal external links
opt->passprivacy = 0; // passwords in files
opt->includequery = 1; // include the query string by default
opt->mirror_first_page = 0; // not in mirror-links mode
opt->accept_cookie = 1; // handle cookies
opt->errpage = HTS_TRUE;
// on error (404 etc.)
opt->check_type = HTS_TRUE;
// treated as html
opt->all_in_cache = HTS_FALSE;
opt->robots = HTS_ROBOTS_ALWAYS; // process robots.txt
opt->external = HTS_FALSE;
opt->passprivacy = HTS_FALSE;
opt->includequery = HTS_TRUE;
opt->mirror_first_page = HTS_FALSE;
opt->accept_cookie = HTS_TRUE;
opt->cookie = NULL;
opt->http10 = 0; // keep http/1.1
opt->nokeepalive = 0; // no keep-alive
opt->nocompression = 0; // no compression
opt->tolerant = 0; // do not accept an incorrect content-length
opt->parseall = 1; // parse everything (unknown tags, for example)
opt->parsedebug = 0; // no debugging mode
opt->norecatch = 0; // do not re-fetch files deleted by the user
opt->http10 = HTS_FALSE;
opt->nokeepalive = HTS_FALSE;
opt->nocompression = HTS_FALSE;
opt->tolerant = HTS_FALSE;
opt->parseall = HTS_TRUE;
opt->parsedebug = HTS_FALSE;
opt->norecatch = HTS_FALSE;
opt->verbosedisplay = 0; // no text animation
opt->sizehack = 0; // size hack
opt->urlhack = 1; // url hack (normalizer)
opt->sizehack = HTS_FALSE;
opt->urlhack = HTS_TRUE;
StringCopy(opt->footer, HTS_DEFAULT_FOOTER);
opt->ftp_proxy = 1; // http proxy for ftp
opt->convert_utf8 = 1; // convert html to UTF-8
opt->ftp_proxy = HTS_TRUE;
opt->convert_utf8 = HTS_TRUE;
StringCopy(opt->filelist, "");
StringCopy(opt->lang_iso, "en, *");
StringCopy(opt->accept,
@@ -5507,9 +5511,9 @@ HTSEXT_API httrackp *hts_create_opt(void) {
//
opt->log = stdout;
opt->errlog = stderr;
opt->flush = 1; // flush on log files
//opt->aff_progress=0;
opt->keyboard = 0;
opt->flush = HTS_TRUE;
// opt->aff_progress=0;
opt->keyboard = HTS_FALSE;
//
StringCopy(opt->path_html, "");
StringCopy(opt->path_html_utf8, "");
@@ -5526,10 +5530,10 @@ HTSEXT_API httrackp *hts_create_opt(void) {
opt->waittime = -1; // wait until.. hh*3600+mm*60+ss
//
opt->exec = "";
opt->is_update = 0; // not an update (yet)
opt->dir_topindex = 0; // do not build top index (yet)
opt->is_update = HTS_FALSE;
opt->dir_topindex = HTS_FALSE;
//
opt->bypass_limits = 0; // enforce limits by default
opt->bypass_limits = HTS_FALSE;
opt->state.stop = 0; // stop
opt->state.exit_xh = 0; // abort
//

@@ -285,6 +285,75 @@ typedef enum htsparsejava_flags {
HTSPARSE_NO_AGGRESSIVE = 8 // don't aggressively parse .js or .java
} htsparsejava_flags;
/* Link-rewriting style for saved pages (opt->urlmode). */
#ifndef HTS_DEF_DEFSTRUCT_hts_urlmode
#define HTS_DEF_DEFSTRUCT_hts_urlmode
typedef enum hts_urlmode {
HTS_URLMODE_ABSOLUTE = 0, /**< absolute URL (http://host/path) everywhere */
HTS_URLMODE_ABSOLUTE_FILE = 1, /**< legacy file: form, unused */
HTS_URLMODE_RELATIVE = 2, /**< relative link (default) */
HTS_URLMODE_ABSOLUTE_URI = 3, /**< absolute URI from root (/path) */
HTS_URLMODE_KEEP_ORIGINAL = 4, /**< keep the original link, do not rewrite */
HTS_URLMODE_TRANSPARENT_PROXY = 5 /**< transparent-proxy URL */
} hts_urlmode;
#endif
/* Cache policy for updates and retries (opt->cache). */
#ifndef HTS_DEF_DEFSTRUCT_hts_cachemode
#define HTS_DEF_DEFSTRUCT_hts_cachemode
typedef enum hts_cachemode {
HTS_CACHE_NONE = 0, /**< no cache */
HTS_CACHE_PRIORITY = 1, /**< cache takes priority over the network */
HTS_CACHE_TEST_UPDATE = 2 /**< check for update before reuse (default) */
} hts_cachemode;
#endif
/* Interactive wizard level (opt->wizard). */
#ifndef HTS_DEF_DEFSTRUCT_hts_wizard
#define HTS_DEF_DEFSTRUCT_hts_wizard
typedef enum hts_wizard {
HTS_WIZARD_NONE = 0, /**< no wizard */
HTS_WIZARD_ASK = 1, /**< wizard asks questions */
HTS_WIZARD_AUTO = 2 /**< wizard runs without asking */
} hts_wizard;
#endif
/* robots.txt / meta-robots obedience level (opt->robots). */
#ifndef HTS_DEF_DEFSTRUCT_hts_robots
#define HTS_DEF_DEFSTRUCT_hts_robots
typedef enum hts_robots {
HTS_ROBOTS_NEVER = 0, /**< ignore robots rules */
HTS_ROBOTS_SOMETIMES = 1, /**< partial obedience (default) */
HTS_ROBOTS_ALWAYS = 2, /**< obey robots rules */
HTS_ROBOTS_ALWAYS_STRICT = 3 /**< obey even strict rules */
} hts_robots;
#endif
/* What to fetch (opt->getmode bitmask). */
typedef enum hts_getmode {
HTS_GETMODE_HTML = 1 << 0, /**< save HTML files */
HTS_GETMODE_NONHTML = 1 << 1, /**< save non-HTML files */
HTS_GETMODE_HTML_FIRST = 1 << 2 /**< fetch HTML first, then the other files */
} hts_getmode;
/* Allowed directions in the directory tree (opt->seeker bitmask). */
typedef enum hts_seeker {
HTS_SEEKER_DOWN = 1 << 0, /**< may descend into subdirectories */
HTS_SEEKER_UP = 1 << 1 /**< may ascend to parent directories */
} hts_seeker;
/* Link-following scope, stored in the low byte of opt->travel. */
typedef enum hts_travel_scope {
HTS_TRAVEL_SAME_ADDRESS = 0, /**< stay on the same address (host) */
HTS_TRAVEL_SAME_DOMAIN = 1, /**< stay on the same principal domain */
HTS_TRAVEL_SAME_TLD = 2, /**< stay on the same TLD (e.g. .com) */
HTS_TRAVEL_EVERYWHERE = 7 /**< follow links anywhere on the web */
} hts_travel_scope;
/* Flags OR'd into opt->travel above the scope value. */
#define HTS_TRAVEL_SCOPE_MASK 0xff /**< mask selecting the scope value */
#define HTS_TRAVEL_TEST_ALL (1 << 8) /**< also test forbidden URLs (-t) */
#ifndef HTS_DEF_FWSTRUCT_lien_buffers
#define HTS_DEF_FWSTRUCT_lien_buffers
typedef struct lien_buffers lien_buffers;
@@ -308,14 +377,15 @@ typedef struct httrackp httrackp;
struct httrackp {
size_t size_httrackp; /**< size of this structure (version/ABI guard) */
/* */
int wizard; /**< interactive wizard level (none/full/light) */
int flush; /**< fflush() log files after each write */
hts_wizard wizard; /**< interactive wizard level (none/ask/auto) */
hts_boolean flush; /**< fflush() log files after each write */
int travel; /**< link-following scope (same domain, etc.) */
int seeker; /**< allowed direction: go up and/or down the tree */
int depth; /**< maximum recursion depth (-rN) */
int extdepth; /**< maximum recursion depth outside the start domain */
int urlmode; /**< saved-link rewriting style (relative, absolute, etc.) */
int no_type_change; // do not change file type according to MIME
hts_urlmode
urlmode; /**< saved-link rewriting style (relative, absolute, etc.) */
hts_boolean no_type_change; // do not change file type according to MIME
int debug; /**< debug logging level */
int getmode; /**< what to fetch (HTML, images, ...) bitmask */
FILE *log; /**< informational log stream; NULL mutes it */
@@ -325,28 +395,30 @@ struct httrackp {
LLint maxfile_html; /**< max bytes per HTML file */
int maxsoc; /**< max simultaneous sockets (-cN) */
LLint fragment; /**< split site after this many bytes */
int nearlink; /**< also fetch images/data adjacent to a page but off-site */
int makeindex; /**< build a top-level index.html */
int kindex; /**< build a keyword index */
int delete_old; /**< delete locally obsolete files after update */
hts_boolean
nearlink; /**< also fetch images/data adjacent to a page but off-site */
hts_boolean makeindex; /**< build a top-level index.html */
hts_boolean kindex; /**< build a keyword index */
hts_boolean delete_old; /**< delete locally obsolete files after update */
int timeout; /**< connection timeout in seconds */
int rateout; /**< minimum transfer rate (bytes/s) before abort */
int maxtime; /**< max total mirror duration in seconds */
int maxrate; /**< max transfer rate cap (bytes/s) */
float maxconn; /**< max connections per second */
int waittime; /**< scheduled start time (wall-clock seconds) */
int cache; /**< cache generation mode */
hts_cachemode cache; /**< cache generation mode */
// int aff_progress; // progress bar
int shell; /**< driven by a shell over stdin/stdout pipes */
hts_boolean shell; /**< driven by a shell over stdin/stdout pipes */
t_proxy proxy; /**< proxy configuration */
int savename_83; /**< force 8.3 (DOS) file names */
int savename_type; /**< saved-name layout (original tree, flat, ...) */
String
savename_userdef; /**< user-defined name template (e.g. %h%p/%n%q.%t) */
int savename_delayed; // delayed type check
int delayed_cached; // delayed type check can be cached to speedup updates
int mimehtml; /**< produce a single MIME/MHTML archive */
int user_agent_send; /**< send a User-Agent header */
hts_boolean
delayed_cached; // delayed type check can be cached to speedup updates
hts_boolean mimehtml; /**< produce a single MIME/MHTML archive */
hts_boolean user_agent_send; /**< send a User-Agent header */
String user_agent; /**< User-Agent value (e.g. httrack/1.0) */
String referer; /**< Referer value to send */
String from; /**< From value to send */
@@ -355,37 +427,39 @@ struct httrackp {
String path_html_utf8; /**< output directory for the mirror, UTF-8 form */
String path_bin; /**< directory for HTML templates */
int retry; /**< extra retries on a failed transfer */
int makestat; /**< maintain a transfer-statistics log */
int maketrack; /**< maintain an operations-statistics log */
hts_boolean makestat; /**< maintain a transfer-statistics log */
hts_boolean maketrack; /**< maintain an operations-statistics log */
int parsejava; /**< Java/JS parsing mode; see htsparsejava_flags */
int hostcontrol; /**< drop hosts that are too slow, etc. */
int errpage; /**< generate an error page on 404 and similar */
int check_type; /**< probe unknown-type links (cgi/asp/dir) and follow moves
*/
int all_in_cache; /**< keep all retrieved data in the cache */
int robots; /**< robots.txt handling level */
int external; /**< render external links as error pages */
int passprivacy; /**< strip passwords from external links */
int includequery; /**< include the query string in saved names */
int mirror_first_page; /**< only mirror the links of the first page */
hts_boolean errpage; /**< generate an error page on 404 and similar */
hts_boolean
check_type; /**< probe unknown-type links (cgi/asp/dir) and follow moves
*/
hts_boolean all_in_cache; /**< keep all retrieved data in the cache */
hts_robots robots; /**< robots.txt handling level */
hts_boolean external; /**< render external links as error pages */
hts_boolean passprivacy; /**< strip passwords from external links */
hts_boolean includequery; /**< include the query string in saved names */
hts_boolean mirror_first_page; /**< only mirror the links of the first page */
String sys_com; /**< system command to run */
int sys_com_exec; /**< actually execute sys_com */
int accept_cookie; /**< accept and send cookies */
hts_boolean sys_com_exec; /**< actually execute sys_com */
hts_boolean accept_cookie; /**< accept and send cookies */
t_cookie *cookie; /**< cookie store */
int http10; /**< force HTTP/1.0 */
int nokeepalive; /**< disable keep-alive */
int nocompression; /**< disable content compression */
int sizehack; /**< treat same-size response as "updated" */
int urlhack; // force "url normalization" to avoid loops
int tolerant; /**< accept an incorrect Content-Length */
int parseall; /**< parse aggressively, including unknown tags with links */
int parsedebug; /**< parser debug mode */
int norecatch; /**< do not re-fetch files the user deleted locally */
hts_boolean http10; /**< force HTTP/1.0 */
hts_boolean nokeepalive; /**< disable keep-alive */
hts_boolean nocompression; /**< disable content compression */
hts_boolean sizehack; /**< treat same-size response as "updated" */
hts_boolean urlhack; // force "url normalization" to avoid loops
hts_boolean tolerant; /**< accept an incorrect Content-Length */
hts_boolean
parseall; /**< parse aggressively, including unknown tags with links */
hts_boolean parsedebug; /**< parser debug mode */
hts_boolean norecatch; /**< do not re-fetch files the user deleted locally */
int verbosedisplay; /**< animated text progress display */
String footer; /**< footer/info line injected into pages */
int maxcache; /**< in-memory cache backing limit (bytes) */
// int maxcache_anticipate; // maximum links to anticipate (upper bound)
int ftp_proxy; /**< use the HTTP proxy for FTP too */
hts_boolean ftp_proxy; /**< use the HTTP proxy for FTP too */
String filelist; /**< file listing URLs to include */
String urllist; /**< file listing filters to include */
htsfilters filters; /**< filter pointers (+/-pattern rules) */
@@ -399,20 +473,20 @@ struct httrackp {
String headers; // Additional headers
String mimedefs; // ext1=mimetype1\next2=mimetype2..
String mod_blacklist; /**< blacklisted modules */
int convert_utf8; // filenames UTF-8 conversion (3.46)
hts_boolean convert_utf8; // filenames UTF-8 conversion (3.46)
//
int maxlink; /**< max number of links */
int maxfilter; /**< max number of filters */
//
const char *exec; /**< path of the running executable */
//
int quiet; /**< suppress non-wizard questions */
int keyboard; /**< poll stdin for keyboard input */
int bypass_limits; // bypass built-in limits
int background_on_suspend; // background process on suspend signal
hts_boolean quiet; /**< suppress non-wizard questions */
hts_boolean keyboard; /**< poll stdin for keyboard input */
hts_boolean bypass_limits; // bypass built-in limits
hts_boolean background_on_suspend; // background process on suspend signal
//
int is_update; /**< this run is an update (show "File updated...") */
int dir_topindex; /**< rebuild the top index afterwards */
hts_boolean is_update; /**< this run is an update (show "File updated...") */
hts_boolean dir_topindex; /**< rebuild the top index afterwards */
//
// callbacks
t_hts_htmlcheck_callbacks


@@ -349,7 +349,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
#endif
// Now, parsing
if ((opt->getmode & 1) && (ptr > 0)) { // récupérer les html sur disque
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
// créer le fichier html local
HT_ADD_FOP; // écrire peu à peu le fichier
}
@@ -553,7 +553,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (opt->depth == heap(ptr)->depth) { // on note toujours les premiers liens
if (!in_media) {
if (opt->makeindex && (ptr > 0)) {
if (opt->getmode & 1) { // autorisation d'écrire
if (opt->getmode & HTS_GETMODE_HTML) {
p = strfield(html, "title");
if (p) {
if (*(html - 1) == '/')
@@ -704,7 +704,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
}
if (opt->getmode & 1) { // sauver html
if (opt->getmode & HTS_GETMODE_HTML) { // sauver html
p = 0;
switch (emited_footer) {
case 0:
@@ -740,7 +740,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (strchr(r->adr, '\r'))
eol = "\r\n";
if (StringNotEmpty(opt->footer) || opt->urlmode != 4) { /* != preserve */
if (StringNotEmpty(opt->footer) ||
opt->urlmode != HTS_URLMODE_KEEP_ORIGINAL) {
if (StringNotEmpty(opt->footer)) {
char BIGSTK tempo[1024 + HTS_URLMAXSIZE * 2];
char gmttime[256];
@@ -1746,7 +1747,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
// écrire codebase avant, flusher avant code
if ((p_type == -1) || (p_type == -2)) {
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
HT_add_adr; // refresh
}
lastsaved = html; // dernier écrit+1
@@ -1837,7 +1838,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
// ne pas flusher après code si on doit écrire le codebase avant!
if ((p_type != -1) && (p_type != 2) && (p_type != -2)) {
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
HT_add_adr; // refresh
}
lastsaved = html; // dernier écrit+1
@@ -1914,7 +1915,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (*html != '#') { // Not empty+unique #
if (eadr - html == 1) { // 1=link empty with delim (end_adr-start_adr)
if (quote) {
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
HT_ADD("#"); // We add this for a <href="">
}
}
@@ -2569,7 +2570,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if ((p_type == 2) || (p_type == -2)) { // base href ou codebase, pas un lien
hts_log_print(opt, LOG_DEBUG, "Code/Codebase: %s%s",
afs.af.adr, afs.af.fil);
} else if ((opt->getmode & 4) == 0) {
} else if ((opt->getmode & HTS_GETMODE_HTML_FIRST) ==
0) {
hts_log_print(opt, LOG_DEBUG, "Record: %s%s -> %s",
afs.af.adr, afs.af.fil, afs.save);
} else {
@@ -2592,8 +2594,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
lastsaved = eadr - 1 + 1; // sauter "
}
/* */
else if (opt->urlmode == 0) { // URL absolue dans tous les cas
if ((opt->getmode & 1) && (ptr > 0)) { // ecrire les html
else if (opt->urlmode == HTS_URLMODE_ABSOLUTE) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
if (!link_has_authority(afs.af.adr)) {
HT_ADD("http://");
} else {
@@ -2620,12 +2622,14 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
lastsaved = eadr - 1; // dernier écrit+1 (enfin euh apres on fait un ++ alors hein)
/* */
} else if (opt->urlmode == 4) { // ne rien faire!
} else if (opt->urlmode == HTS_URLMODE_KEEP_ORIGINAL) {
/* */
/* leave the link 'as is' */
/* Sinon, dépend de interne/externe */
} else if (forbidden_url == 1) { // le lien ne sera pas chargé, référence externe!
if ((opt->getmode & 1) && (ptr > 0)) {
} else if (forbidden_url ==
1) { // le lien ne sera pas chargé, référence
// externe!
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
if (p_type != -1) { // pas que le nom de fichier (pas classe java)
if (!opt->external) {
if (!link_has_authority(afs.af.adr)) {
@@ -2674,7 +2678,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
'/') ? 1 : (ishtml(opt, afs.af.fil)))) {
case 1:
case -2: // html ou répertoire
if (opt->getmode & 1) { // sauver html
if (opt->getmode & HTS_GETMODE_HTML) {
patch_it = 1; // redirect
add_url = 1; // avec link?
cat_name = "external.html";
@@ -2847,7 +2851,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
// écrire codebase="chemin"
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) &&
(ptr > 0)) {
char BIGSTK tempo4[HTS_URLMAXSIZE * 2];
tempo4[0] = '\0';
@@ -2875,9 +2880,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
lastsaved = eadr - 1;
}
/*
else if (opt->urlmode==1) { // ABSOLU, c'est le cas le moins courant
else if (opt->urlmode==1) { // ABSOLU, c'est le cas le
moins courant
// NE FONCTIONNE PAS!! (et est inutile)
if ((opt->getmode & 1) && (ptr>0)) { // ecrire les html
if ((opt->getmode & 1) && (ptr>0)) { // ecrire les
html
// écrire le lien modifié, absolu
HT_ADD("file:");
if (*save=='/')
@@ -2885,7 +2892,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
else
HT_ADD(save)
}
lastsaved=eadr-1; // dernier écrit+1 (enfin euh apres on fait un ++ alors hein)
lastsaved=eadr-1; // dernier écrit+1 (enfin euh apres
on fait un ++ alors hein)
}
*/
else if (opt->mimehtml) {
@@ -2895,18 +2903,18 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
make_content_id(afs.af.adr, afs.af.fil, cid, sizeof(cid));
HT_ADD_HTMLESCAPED(cid);
lastsaved = eadr - 1; // dernier écrit+1 (enfin euh apres on fait un ++ alors hein)
} else if (opt->urlmode == 3) { // URI absolue /
if ((opt->getmode & 1) && (ptr > 0)) { // ecrire les html
} else if (opt->urlmode == HTS_URLMODE_ABSOLUTE_URI) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
HT_ADD_HTMLESCAPED(afs.af.fil);
}
lastsaved = eadr - 1; // dernier écrit+1 (enfin euh apres on fait un ++ alors hein)
} else if (opt->urlmode == 5) { // transparent proxy URL
} else if (opt->urlmode == HTS_URLMODE_TRANSPARENT_PROXY) {
char BIGSTK tempo[HTS_URLMAXSIZE * 2];
const char *uri;
int i;
char *pos;
if ((opt->getmode & 1) && (ptr > 0)) { // ecrire les html
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
if (!link_has_authority(afs.af.adr)) {
HT_ADD("http://");
} else {
@@ -2947,7 +2955,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
HT_ADD_HTMLESCAPED(tempo);
}
lastsaved = eadr - 1; // dernier écrit+1 (enfin euh apres on fait un ++ alors hein)
} else if (opt->urlmode == 2) { // RELATIF
} else if (opt->urlmode == HTS_URLMODE_RELATIVE) {
char BIGSTK tempo[HTS_URLMAXSIZE * 2];
tempo[0] = '\0';
@@ -3009,7 +3017,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
// écrire codebase="chemin"
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) &&
(ptr > 0)) {
char BIGSTK tempo4[HTS_URLMAXSIZE * 2];
tempo4[0] = '\0';
@@ -3027,7 +3036,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
//lastsaved=adr; // dernier écrit+1
}
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
// convert to local codepage - NOT, already converted into %NN, and passed to the remote server so we do not have anything to do
//if (str->page_charset_ != NULL && *str->page_charset_ != '\0') {
// char *const local_save = hts_convertStringFromUTF8(tempo, strlen(tempo), str->page_charset_);
@@ -3061,7 +3070,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
"Error building relative link %s and %s",
afs.save, relativesavename());
}
} // sinon le lien sera écrit normalement
} // sinon le lien sera écrit normalement
#if 0
if (fexist(save)) { // le fichier existe..
@@ -3089,7 +3098,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
opt->maxlink);
hts_log_print(opt, LOG_INFO,
"To avoid that: use #L option for more links (example: -#L1000000)");
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
if (fp) {
fclose(fp);
fp = NULL;
@@ -3101,9 +3110,9 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
int pass_fix, dejafait = 0;
// Calculer la priorité de ce lien
if ((opt->getmode & 4) == 0) { // traiter html après
if ((opt->getmode & HTS_GETMODE_HTML_FIRST) == 0) {
pass_fix = 0;
} else { // vérifier que ce n'est pas un !html
} else { // vérifier que ce n'est pas un !html
if (!ishtml(opt, afs.af.fil))
pass_fix = 1; // priorité inférieure (traiter après)
else
@@ -3167,7 +3176,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (checkrobots(_ROBOTS, afs.af.adr, "") == -1) { // robots.txt ?
// enregistrer robots.txt (MACRO)
if (!hts_record_link(opt, afs.af.adr, "/robots.txt", "", "", "", NULL)) {
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) &&
(ptr > 0)) {
if (fp) {
fclose(fp);
fp = NULL;
@@ -3206,7 +3216,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
// enregistrer
if (!hts_record_link(opt, afs.af.adr, afs.af.fil, afs.save,
former.adr, former.fil, codebase)) {
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) &&
(ptr > 0)) {
if (fp) {
fclose(fp);
fp = NULL;
@@ -3351,7 +3362,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
// ----------
// écrire peu à peu
if ((opt->getmode & 1) && (ptr > 0))
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0))
HT_add_adr;
lastsaved = html; // dernier écrit+1
// ----------
@@ -3411,7 +3422,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
opt->state._hts_in_html_parsing = 0; // flag
opt->state._hts_cancel = 0; // pas de cancel
if ((opt->getmode & 1) && (ptr > 0)) {
if ((opt->getmode & HTS_GETMODE_HTML) && (ptr > 0)) {
{
char *cAddr = TypedArrayElts(output_buffer);
int cSize = (int) TypedArraySize(output_buffer);
@@ -3443,7 +3454,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
//
} // if !error
if (opt->getmode & 1) {
if (opt->getmode & HTS_GETMODE_HTML) {
if (fp) {
fclose(fp);
fp = NULL;
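
The hunks above replace the bare `getmode` literals 1, 2 and 4 with named flags. A minimal sketch of that pattern follows, assuming the flag values equal the literals they replace (the real definitions live in a header not shown in this diff). The helpers `writes_html` and `handles_nonhtml` are hypothetical names used only for illustration; they are not part of HTTrack.

```c
#include <assert.h>

/* Assumed values: inferred from the literals replaced in the diff
   (getmode & 1, getmode & 2, getmode & 4). Not copied from the headers. */
enum {
  HTS_GETMODE_HTML = 1,      /* save HTML pages to disk */
  HTS_GETMODE_NONHTML = 2,   /* also process non-HTML files */
  HTS_GETMODE_HTML_FIRST = 4 /* process HTML links before the others */
};

/* Hypothetical helper mirroring the recurring test
   (opt->getmode & HTS_GETMODE_HTML) && (ptr > 0). */
static int writes_html(int getmode, int ptr) {
  return (getmode & HTS_GETMODE_HTML) != 0 && ptr > 0;
}

/* Hypothetical helper mirroring (opt->getmode & HTS_GETMODE_NONHTML) != 0. */
static int handles_nonhtml(int getmode) {
  return (getmode & HTS_GETMODE_NONHTML) != 0;
}

/* Self-check: under the assumed values the named form and the old literal
   form select exactly the same bits. */
static void getmode_selfcheck(void) {
  int mode;
  for (mode = 0; mode < 8; mode++) {
    assert(((mode & HTS_GETMODE_HTML) != 0) == ((mode & 1) != 0));
    assert(((mode & HTS_GETMODE_NONHTML) != 0) == ((mode & 2) != 0));
    assert(((mode & HTS_GETMODE_HTML_FIRST) != 0) == ((mode & 4) != 0));
  }
}
```

Because the flags are independent bits, a mode such as `HTS_GETMODE_HTML | HTS_GETMODE_NONHTML` passes both helpers, which matches how the parser tests each bit separately.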


@@ -1213,7 +1213,7 @@ HTSEXT_API find_handle hts_findfirst(char *path) {
return NULL;
}
HTSEXT_API int hts_findnext(find_handle find) {
HTSEXT_API hts_boolean hts_findnext(find_handle find) {
if (find) {
#ifdef _WIN32
if ((FindNextFileA(find->handle, &find->hdata)))
@@ -1273,7 +1273,7 @@ HTSEXT_API int hts_findgetsize(find_handle find) {
return -1;
}
HTSEXT_API int hts_findisdir(find_handle find) {
HTSEXT_API hts_boolean hts_findisdir(find_handle find) {
if (find) {
if (!hts_findissystem(find)) {
#ifdef _WIN32
@@ -1287,7 +1287,7 @@ HTSEXT_API int hts_findisdir(find_handle find) {
}
return 0;
}
HTSEXT_API int hts_findisfile(find_handle find) {
HTSEXT_API hts_boolean hts_findisfile(find_handle find) {
if (find) {
if (!hts_findissystem(find)) {
#ifdef _WIN32
@@ -1301,7 +1301,7 @@ HTSEXT_API int hts_findisfile(find_handle find) {
}
return 0;
}
HTSEXT_API int hts_findissystem(find_handle find) {
HTSEXT_API hts_boolean hts_findissystem(find_handle find) {
if (find) {
#ifdef _WIN32
if (find->hdata.


@@ -108,15 +108,15 @@ HTSEXT_API int hts_buildtopindex(httrackp * opt, const char *path,
// Portable directory find functions
// Directory find functions
HTSEXT_API find_handle hts_findfirst(char *path);
HTSEXT_API int hts_findnext(find_handle find);
HTSEXT_API hts_boolean hts_findnext(find_handle find);
HTSEXT_API int hts_findclose(find_handle find);
//
HTSEXT_API char *hts_findgetname(find_handle find);
HTSEXT_API int hts_findgetsize(find_handle find);
HTSEXT_API int hts_findisdir(find_handle find);
HTSEXT_API int hts_findisfile(find_handle find);
HTSEXT_API int hts_findissystem(find_handle find);
HTSEXT_API hts_boolean hts_findisdir(find_handle find);
HTSEXT_API hts_boolean hts_findisfile(find_handle find);
HTSEXT_API hts_boolean hts_findissystem(find_handle find);
#endif


@@ -178,7 +178,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
// -------------------- PHASE 1 --------------------
/* Doit-on traiter les non html? */
if ((opt->getmode & 2) == 0) { // non on ne doit pas
if ((opt->getmode & HTS_GETMODE_NONHTML) == 0) { // non on ne doit pas
if (!ishtml(opt, fil)) { // non il ne faut pas
//adr[0]='\0'; // ne pas traiter ce lien, pas traiter
forbidden_url = 1; // interdire récupération du lien
@@ -266,11 +266,11 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
test2 =
(strchr(tempo2 + ((*tempo2 == '/') ? 1 : 0), '/') != NULL);
if ((test1) && (test2)) { // on ne peut que descendre
if ((opt->seeker & 1) == 0) { // interdiction de descendre
if ((opt->seeker & HTS_SEEKER_DOWN) == 0) {
forbidden_url = 1;
hts_log_print(opt, LOG_DEBUG, "lower link canceled: %s%s", adr,
fil);
} else { // autorisé à priori - NEW
} else { // autorisé à priori - NEW
if (!heap(ptr)->link_import) { // ne résulte pas d'un 'moved'
forbidden_url = 0;
hts_log_print(opt, LOG_DEBUG, "lower link authorized: %s%s",
@@ -278,7 +278,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
}
}
} else if ((test1) || (test2)) { // on peut descendre pour accéder au lien
if ((opt->seeker & 1) != 0) { // on peut descendre - NEW
if ((opt->seeker & HTS_SEEKER_DOWN) != 0) {
if (!heap(ptr)->link_import) { // ne résulte pas d'un 'moved'
forbidden_url = 0;
hts_log_print(opt, LOG_DEBUG, "lower link authorized: %s%s",
@@ -290,11 +290,11 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
// up
if ((!strncmp(tempo, "../", 3)) && (!strncmp(tempo2, "../", 3))) { // impossible sans monter
if ((opt->seeker & 2) == 0) { // interdiction de monter
if ((opt->seeker & HTS_SEEKER_UP) == 0) {
forbidden_url = 1;
hts_log_print(opt, LOG_DEBUG, "upper link canceled: %s%s", adr,
fil);
} else { // autorisé à monter - NEW
} else { // autorisé à monter - NEW
if (!heap(ptr)->link_import) { // ne résulte pas d'un 'moved'
forbidden_url = 0;
hts_log_print(opt, LOG_DEBUG, "upper link authorized: %s%s",
@@ -302,13 +302,13 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
}
}
} else if ((!strncmp(tempo, "../", 3)) || (!strncmp(tempo2, "../", 3))) { // Possible en montant
if ((opt->seeker & 2) != 0) { // autorisé à monter - NEW
if ((opt->seeker & HTS_SEEKER_UP) != 0) {
if (!heap(ptr)->link_import) { // ne résulte pas d'un 'moved'
forbidden_url = 0;
hts_log_print(opt, LOG_DEBUG, "upper link authorized: %s%s",
adr, fil);
}
} // sinon autorisé en descente
} // sinon autorisé en descente
}
} else {
@@ -345,83 +345,81 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
//if (!opt->wizard) { // mode non wizard
// doit-on traiter ce lien?.. vérifier droits de sortie
switch ((opt->travel & 255)) {
case 0:
switch ((opt->travel & HTS_TRAVEL_SCOPE_MASK)) {
case HTS_TRAVEL_SAME_ADDRESS:
if (!opt->wizard) // mode non wizard
forbidden_url = 1;
break; // interdiction de sortir au-delà de l'adresse
case 1:{ // sortie sur le même dom.xxx
size_t i = strlen(adr) - 1;
size_t j = strlen(urladr()) - 1;
case HTS_TRAVEL_SAME_DOMAIN: {
size_t i = strlen(adr) - 1;
size_t j = strlen(urladr()) - 1;
if ((i > 0) && (j > 0)) {
while((i > 0) && (adr[i] != '.'))
i--;
while((j > 0) && (urladr()[j] != '.'))
j--;
if ((i > 0) && (j > 0)) {
i--;
j--;
while((i > 0) && (adr[i] != '.'))
i--;
while((j > 0) && (urladr()[j] != '.'))
j--;
}
}
if ((i > 0) && (j > 0)) {
if (!strfield2(adr + i, urladr() + j)) { // !=
if (!opt->wizard) { // mode non wizard
//printf("refused: %s\n",adr);
forbidden_url = 1; // pas même domaine
hts_log_print(opt, LOG_DEBUG,
"foreign domain link canceled: %s%s", adr, fil);
}
} else {
if (opt->wizard) { // mode wizard
forbidden_url = 0; // même domaine
hts_log_print(opt, LOG_DEBUG, "same domain link authorized: %s%s",
adr, fil);
}
}
} else
forbidden_url = 1;
}
break;
case 2:{ // sortie sur le même .xxx
size_t i = strlen(adr) - 1;
size_t j = strlen(urladr()) - 1;
while((i > 0) && (adr[i] != '.'))
if ((i > 0) && (j > 0)) {
while ((i > 0) && (adr[i] != '.'))
i--;
while((j > 0) && (urladr()[j] != '.'))
while ((j > 0) && (urladr()[j] != '.'))
j--;
if ((i > 0) && (j > 0)) {
if (!strfield2(adr + i, urladr() + j)) { // !-
if (!opt->wizard) { // mode non wizard
//printf("refused: %s\n",adr);
forbidden_url = 1; // pas même .xx
hts_log_print(opt, LOG_DEBUG,
"foreign location link canceled: %s%s", adr, fil);
}
} else {
if (opt->wizard) { // mode wizard
forbidden_url = 0; // même domaine
hts_log_print(opt, LOG_DEBUG,
"same location link authorized: %s%s", adr, fil);
}
}
} else
forbidden_url = 1;
i--;
j--;
while ((i > 0) && (adr[i] != '.'))
i--;
while ((j > 0) && (urladr()[j] != '.'))
j--;
}
}
break;
case 7: // everywhere!!
if ((i > 0) && (j > 0)) {
if (!strfield2(adr + i, urladr() + j)) { // !=
if (!opt->wizard) { // mode non wizard
// printf("refused: %s\n",adr);
forbidden_url = 1; // pas même domaine
hts_log_print(opt, LOG_DEBUG, "foreign domain link canceled: %s%s",
adr, fil);
}
} else {
if (opt->wizard) { // mode wizard
forbidden_url = 0; // même domaine
hts_log_print(opt, LOG_DEBUG, "same domain link authorized: %s%s",
adr, fil);
}
}
} else
forbidden_url = 1;
} break;
case HTS_TRAVEL_SAME_TLD: {
size_t i = strlen(adr) - 1;
size_t j = strlen(urladr()) - 1;
while ((i > 0) && (adr[i] != '.'))
i--;
while ((j > 0) && (urladr()[j] != '.'))
j--;
if ((i > 0) && (j > 0)) {
if (!strfield2(adr + i, urladr() + j)) { // !-
if (!opt->wizard) { // mode non wizard
// printf("refused: %s\n",adr);
forbidden_url = 1; // pas même .xx
hts_log_print(opt, LOG_DEBUG,
"foreign location link canceled: %s%s", adr, fil);
}
} else {
if (opt->wizard) { // mode wizard
forbidden_url = 0; // même domaine
hts_log_print(opt, LOG_DEBUG, "same location link authorized: %s%s",
adr, fil);
}
}
} else
forbidden_url = 1;
} break;
case HTS_TRAVEL_EVERYWHERE:
if (opt->wizard) { // mode wizard
forbidden_url = 0;
break;
}
} // switch
} // switch
// ANCIENNE POS -- récupérer les liens à côtés d'un lien (nearlink)
@@ -583,7 +581,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
// on doit poser la question.. peut on la poser?
// (oui je sais quel preuve de délicatesse, merci merci)
if ((question) && (ptr > 0) && (!force_mirror)) {
if (opt->wizard == 2) { // éliminer tous les liens non répertoriés comme autorisés (ou inconnus)
if (opt->wizard == HTS_WIZARD_AUTO) {
question = 0;
forbidden_url = 1;
hts_log_print(opt, LOG_DEBUG,
@@ -600,8 +598,8 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
printf("robots.txt forbidden: %s%s\n", adr, fil);
#endif
// question résolue, par les filtres, et mode robot non strict
if ((!question) && (filters_answer) && (opt->robots == 1)
&& (forbidden_url != 1)) {
if ((!question) && (filters_answer) &&
(opt->robots == HTS_ROBOTS_SOMETIMES) && (forbidden_url != 1)) {
r = 0; // annuler interdiction des robots
if (!forbidden_url) {
hts_log_print(opt, LOG_DEBUG,
@@ -685,7 +683,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
io_flush;
} else { // lien primaire: autoriser répertoire entier
if (!force_mirror) {
if ((opt->seeker & 1) == 0) { // interdiction de descendre
if ((opt->seeker & HTS_SEEKER_DOWN) == 0) {
n = 7;
} else {
n = 5; // autoriser miroir répertoires descendants (lien primaire)
@@ -712,7 +710,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
switch (n) {
case -1: // sauter tout le reste
forbidden_url = 1;
opt->wizard = 2; // sauter tout le reste
opt->wizard = HTS_WIZARD_AUTO; // sauter tout le reste
break;
case 0: // forbid the same link: adr/fil
forbidden_url = 1;
@@ -796,7 +794,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
break;
case 5: // allow the whole directory and its children
if ((opt->seeker & 2) == 0) { // not allowed to go up
if ((opt->seeker & HTS_SEEKER_UP) == 0) { // not allowed to go up
size_t i = strlen(fil) - 1;
while((fil[i] != '/') && (i > 0))
@@ -872,7 +870,7 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
// lien non autorisé, peut-on juste le tester?
if (just_test_it) {
if (forbidden_url == 1) {
if (opt->travel & 256) { // tester tout de même
if (opt->travel & HTS_TRAVEL_TEST_ALL) { // tester tout de même
if (strfield(adr, "ftp://") == 0) { // PAS ftp!
forbidden_url = 1; // oui oui toujours interdit (note: sert à rien car ==1 mais c pour comprendre)
*just_test_it = 1; // mais on teste
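
In the hunks above, `opt->travel` carries two things in one integer: a scope value in the low byte (previously `travel & 255`) and a separate "test all links" bit (previously `travel & 256`). The sketch below shows how the two parts can be decoded, assuming the named constants equal the literals they replace in this diff. `travel_scope` and `travel_tests_all` are hypothetical helpers, not HTTrack functions.

```c
/* Assumed values: inferred from the replaced literals in the diff
   (case 0 / 1 / 2 / 7, mask 255, flag 256). Not copied from the headers. */
enum {
  HTS_TRAVEL_SAME_ADDRESS = 0,
  HTS_TRAVEL_SAME_DOMAIN = 1,
  HTS_TRAVEL_SAME_TLD = 2,
  HTS_TRAVEL_EVERYWHERE = 7,
  HTS_TRAVEL_SCOPE_MASK = 255,
  HTS_TRAVEL_TEST_ALL = 256
};

/* Hypothetical helper: extract the scope, as the switch statement does. */
static int travel_scope(int travel) {
  return travel & HTS_TRAVEL_SCOPE_MASK;
}

/* Hypothetical helper: report whether forbidden links are still tested. */
static int travel_tests_all(int travel) {
  return (travel & HTS_TRAVEL_TEST_ALL) != 0;
}
```

Since the flag sits above the mask, setting `HTS_TRAVEL_TEST_ALL` does not change which `case` the scope switch selects.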


@@ -206,7 +206,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
/* Logging */
/** Legacy: write prefix then msg to opt->log. Returns 0 if written, 1 if
opt->log is NULL. Prefer hts_log_print(). */
HTSEXT_API int hts_log(httrackp * opt, const char *prefix, const char *msg);
HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
const char *msg);
/** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO).
Forwards to the registered log callback, and when the level is <= opt->debug
@@ -313,7 +314,8 @@ HTSEXT_API T_SOC catch_url_init(int *port, char *adr);
"ip:port". The buffers are caller-allocated and not bounds-checked: @p data
must be CATCH_URL_DATA_SIZE bytes, and @p url / @p method must fit the
captured request line. */
HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data);
HTSEXT_API hts_boolean catch_url(T_SOC soc, char *url, char *method,
char *data);
/* State */
/** Whether the engine is parsing HTML. Returns 0 if not, otherwise the percent
@@ -334,10 +336,10 @@ HTSEXT_API int hts_is_exiting(httrackp * opt);
caller-owned, NULL-terminated array of strings; the engine stores the
pointer without copying, so the array and its strings must stay valid until
the engine consumes them. @return nonzero if a list is now set. */
HTSEXT_API int hts_addurl(httrackp * opt, char **url);
HTSEXT_API hts_boolean hts_addurl(httrackp *opt, char **url);
/** Clear any pending add-URL list set by hts_addurl(). Always returns 0. */
HTSEXT_API int hts_resetaddurl(httrackp * opt);
HTSEXT_API hts_boolean hts_resetaddurl(httrackp *opt);
/** Apply the runtime-tunable options from @p from onto @p to, to adjust a live
mirror. Only fields set to a non-sentinel value are copied; the rest of @p
@@ -356,7 +358,7 @@ HTSEXT_API int hts_setpause(httrackp * opt, int);
lock, so it is safe to call from another thread). @p force is currently
ignored.
@return 0; no-op if @p opt is NULL. */
HTSEXT_API int hts_request_stop(httrackp * opt, int force);
HTSEXT_API int hts_request_stop(httrackp *opt, hts_boolean force);
/** Queue a single in-progress file, by URL, to be cancelled by the engine.
@p url is copied internally. Takes the state lock, so it is thread-safe.
@@ -373,7 +375,7 @@ HTSEXT_API void hts_cancel_parsing(httrackp * opt);
/** Nonzero once the mirror has fully ended. Read under the engine state lock,
so safe to poll from another thread. Wait for this before hts_free_opt(). */
HTSEXT_API int hts_has_stopped(httrackp * opt);
HTSEXT_API hts_boolean hts_has_stopped(httrackp *opt);
/* Tools */
/** Ensure the directory chain leading to @p path exists, creating missing
@@ -390,7 +392,7 @@ HTSEXT_API int structcheck_utf8(const char *path);
/** Whether the directory containing @p path exists. The basename is stripped
first, so passing a file path tests its parent directory. @return 1 if it is
a directory, 0 otherwise. */
HTSEXT_API int dir_exists(const char *path);
HTSEXT_API hts_boolean dir_exists(const char *path);
/** Write the HTTP reason phrase for @p statuscode into @p msg, a caller buffer
of at least 64 bytes. For an unknown code a non-empty @p msg is kept,
@@ -573,14 +575,15 @@ HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const cha
must-avoid escapes are kept encoded, and %25 is never decoded). @p no_high &
1 also decodes high (>= 128) bytes; @p no_high & 2 also decodes an escaped
space. Returns @p catbuff. */
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size, const char *s, const int no_high);
HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size,
const char *s, const hts_boolean no_high);
/** Determine the MIME type of local file name @p fil into @p s (capacity
@p ssize): user --assume rules, then ".html", then the built-in extension
table. @p flag != 0 forces a fallback type. @return 1 if a type was written,
0 otherwise. */
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag);
HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, hts_boolean flag);
/** @deprecated Use get_httptype_sized(). Assumes @p s has at least
HTS_MIMETYPE_SIZE capacity. */
@@ -600,7 +603,7 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil);
/** 1 if @p fil, an extension such as "asp" or "php" (not a full filename), is a
known dynamic-page type, else 0. */
HTSEXT_API int is_dyntype(const char *fil);
HTSEXT_API hts_boolean is_dyntype(const char *fil);
/** Extract the extension of @p fil (text after the last '.', stopping at '?')
into caller scratch @p catbuff (capacity @p size) and return it. Returns ""
@@ -610,12 +613,12 @@ HTSEXT_API const char *get_ext(char *catbuff, size_t size, const char *fil);
/** 1 if MIME type @p st must not be reclassified or renamed (hypertext types
and a built-in keep-list of commonly mislabeled types), else 0. */
HTSEXT_API int may_unknown(httrackp * opt, const char *st);
HTSEXT_API hts_boolean may_unknown(httrackp *opt, const char *st);
/** Guess the MIME type of local file @p fil into @p s (capacity @p ssize),
always producing a type. @return 1 if a type was written. */
HTSEXT_API int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil);
HTSEXT_API hts_boolean guess_httptype_sized(httrackp *opt, char *s,
size_t ssize, const char *fil);
/** @deprecated Use guess_httptype_sized(). Assumes @p s has at least
HTS_MIMETYPE_SIZE capacity. */
@@ -677,7 +680,7 @@ HTSEXT_API find_handle hts_findfirst(char *path);
/** Advance to the next directory entry. Returns 1 if an entry is available, 0
at end of directory. */
HTSEXT_API int hts_findnext(find_handle find);
HTSEXT_API hts_boolean hts_findnext(find_handle find);
/** Close the iteration and free @p find. Always returns 0; NULL is accepted. */
HTSEXT_API int hts_findclose(find_handle find);
@@ -692,16 +695,16 @@ HTSEXT_API int hts_findgetsize(find_handle find);
/** 1 if the current entry is a directory, else 0 (a system/special entry, see
hts_findissystem(), reports 0). */
HTSEXT_API int hts_findisdir(find_handle find);
HTSEXT_API hts_boolean hts_findisdir(find_handle find);
/** 1 if the current entry is a regular file, else 0 (a system/special entry,
see hts_findissystem(), reports 0). */
HTSEXT_API int hts_findisfile(find_handle find);
HTSEXT_API hts_boolean hts_findisfile(find_handle find);
/** 1 if the current entry is a special/system entry to skip: "." or "..", on
POSIX also device/fifo/socket nodes, on Windows also system, hidden or
temporary entries. Else 0. */
HTSEXT_API int hts_findissystem(find_handle find);
HTSEXT_API hts_boolean hts_findissystem(find_handle find);
/* UTF-8 aware FILE API */
/* On non-Windows these macros resolve directly to the POSIX calls. On Windows