Compare commits

11 Commits

Author SHA1 Message Date
Xavier Roche
31eead95df Bound htscoremain.c pointer-destination buffer writes (batch 13)
Continues the htssafe.h pointer-destination migration in the CLI parser
(hts_main_internal). All sites write into a bare char*.

* The cmdl_add()/cmdl_ins() macros build argv entries into the x_argvblk block
  (malloc'd as the command-line size + 32768). Thread the block's total size
  (recorded in a new x_argvblk_size) and bound the copy with strlcpybuff. The
  remaining room is computed by a cmdl_room() helper that yields 0 once the block
  is exhausted (alias expansion or doit.log insertion can outrun the 32768 slack)
  so the copy aborts cleanly instead of the size_t subtraction wrapping to a huge
  unbounded value.
* The in-place argv rewrites each write no more than the slot already holds, so
  they are bounded by strlen(dest)+1 (provably sufficient): the "(none)" ->
  "\"\"" replacement, the two quote-strip copies (tempo is argv[na] minus its
  surrounding quotes), and the "--catchurl" -> "-#P" rewrite. The "--clean"/
  "--tide" empty rewrite becomes a direct argv[i][1]='\0'.
* Guard the quote-strip's tempo[strlen(tempo)-1] read: a lone '"' argument left
  tempo empty and read tempo[-1] (out of bounds). It now takes the existing
  missing-quote error path.
* The URL accumulator append uses strlcatbuff against the tracked url_sz.
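
The saturating "remaining room" idea from the first bullet can be sketched in isolation. This is a minimal illustration, not the HTTrack code: demo_room() and demo_block_add() are hypothetical names, the sketch returns NULL where the commit says the real copy aborts, and it advances the offset by exactly the stored length rather than by the macro's strlen + 2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes still free in a block of `bufsize` bytes once `used` are taken.
   Saturates at 0 instead of letting the unsigned subtraction wrap. */
size_t demo_room(size_t bufsize, size_t used) {
  return used < bufsize ? bufsize - used : 0;
}

/* Store `token` as a NUL-terminated entry at offset *used of `block`.
   Returns a pointer to the stored entry, or NULL when it does not fit
   (hypothetical policy: refuse instead of aborting). */
char *demo_block_add(char *block, size_t bufsize, size_t *used,
                     const char *token) {
  size_t need;
  char *slot;

  assert(block != NULL && used != NULL && token != NULL);
  need = strlen(token) + 1; /* payload plus terminating NUL */
  if (need > demo_room(bufsize, *used))
    return NULL; /* exhausted block: nothing is written */
  slot = block + *used;
  memcpy(slot, token, need);
  *used += need;
  return slot;
}
```

Without the saturation, an offset that has already passed the end of the block would turn `bufsize - used` into a huge size_t, and a bounded copy given that value would effectively be unbounded.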

These are macros/locals inside hts_main_internal, so they are not -#7 unit-testable;
cmdl_add runs on every invocation (covered by the whole suite). New
01_engine-cmdline.test cases exercise the quote-strip rewrite as the sole URL (a
quoted URL is mirrored; dangling- and lone-quote arguments are refused cleanly,
never a crash).
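
The lone-quote case described above can be reproduced with a small stand-alone helper. This is a sketch under assumptions: demo_strip_quotes() is a hypothetical function, not part of hts_main_internal, and it reports the error with a return code instead of taking HTTrack's "Missing quote" path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strip one pair of surrounding double quotes from `arg`, in place.
   Returns 0 on success (or when `arg` is not quoted at all) and -1 when
   the closing quote is missing, including the lone '"' argument. */
int demo_strip_quotes(char *arg) {
  size_t len;

  assert(arg != NULL);
  if (arg[0] != '"')
    return 0; /* not quoted: nothing to do */
  len = strlen(arg + 1); /* length of what follows the opening quote */
  /* Check for emptiness BEFORE looking at the last character: with
     len == 0 an unguarded read of the "last" byte would index before the
     start of the text. arg[len] is the last byte of the argument. */
  if (len == 0 || arg[len] != '"')
    return -1;
  memmove(arg, arg + 1, len - 1); /* shift the payload over the quote */
  arg[len - 1] = '\0';            /* cut where the closing quote was */
  return 0;
}
```

The result is always shorter than the input, which is why the rewrite can safely reuse the original argv slot.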

htscoremain.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 19:29:30 +02:00
Xavier Roche
1f29ed41db Bound htscoremain.c pointer-destination buffer writes (batch 13)
Continues the htssafe.h pointer-destination migration in the CLI parser
(hts_main_internal). All sites write into a bare char*.

* The cmdl_add()/cmdl_ins() macros build argv entries into the x_argvblk block
  (malloc'd as the command-line size + 32768). Thread the block's total size and
  bound the copy with strlcpybuff(argv[i], token, bufsize - ptr); record the size
  in a new x_argvblk_size alongside x_argvblk.
* The in-place argv rewrites each write no more than the slot already holds, so
  they are bounded by strlen(dest)+1 (provably sufficient): the "(none)" ->
  "\"\"" replacement, the two quote-strip copies (tempo is argv[na] minus its
  surrounding quotes), and the "--catchurl" -> "-#P" rewrite. The "--clean"/
  "--tide" empty rewrite becomes a direct argv[i][1]='\0'.
* The URL accumulator append uses strlcatbuff against the tracked url_sz.
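
The "bounded by strlen(dest)+1" argument for the in-place rewrites can be illustrated as follows. demo_rewrite_in_place() is a hypothetical helper written for this note; it returns -1 where the real bounded copy would abort.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite the string held in `slot` with `repl`, using only the bytes
   the slot already occupies (its current length plus the NUL).
   Returns 0 on success, -1 if `repl` is longer than the current content. */
int demo_rewrite_in_place(char *slot, const char *repl) {
  size_t cap, need;

  assert(slot != NULL && repl != NULL);
  cap = strlen(slot) + 1;  /* bytes known to belong to the slot */
  need = strlen(repl) + 1; /* bytes the replacement requires */
  if (need > cap)
    return -1;
  memmove(slot, repl, need); /* memmove: tolerate overlapping arguments */
  return 0;
}
```

The bound is conservative: the slot may be larger than its current string, but its current length is the only capacity that can be proven from a bare char*.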

These are macros/locals inside hts_main_internal, so they are not -#7
unit-testable; cmdl_add runs on every invocation (covered by the whole suite),
and a new 01_engine-cmdline.test case exercises the quote-strip rewrite (a quoted
URL is mirrored; a dangling quote is refused cleanly, never a crash).

htscoremain.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 18:57:19 +02:00
Xavier Roche
9db360e5fd Merge pull request #369 from xroche/cleanup/htstools-bounds
Bound htstools.c pointer-destination buffer writes (batch 12)
2026-06-16 18:25:07 +02:00
Xavier Roche
88bfcff10c Bound htstools.c pointer-destination buffer writes (batch 12)
Continues the htssafe.h pointer-destination migration: the strcpybuff/strcatbuff
macros silently fall back to a raw strcpy/strcat when the destination is a bare
char* rather than a sized array.

All four functions are internal (hidden, not HTSEXT_API), so they take explicit
destination sizes:
* lienrelatif() builds a relative link into a char* caller buffer; threads a
  size_t and bounds the "../"/path appends with strlcatbuff (the local _curr
  copy uses sizeof(_curr)).
* long_to_83() / longfile_to_83() build an 8-3 / ISO9660 name into a caller
  buffer; thread a size_t and use strl(n)catbuff.
* ident_url_relatif()'s in-place IDNA host rewrite bounds the copy by the
  remaining capacity of adrfil->adr (a pointer into that array).

Callers in htscore.c, htswizard.c, htsparse.c and htsname.c pass sizeof(dest)
(all the destinations are HTS_URLMAXSIZE*2 arrays).
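
Threading a destination size through a function that appends "../" segments might look like the sketch below. demo_cat() and demo_relative() are hypothetical and much simpler than lienrelatif(): the caller supplies the number of parent levels directly, and overflow is reported with -1 rather than an abort.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append `src` to the NUL-terminated string in `dst`, whose total
   capacity is `size` bytes. Returns 0 on success, -1 if it does not fit
   (dst is left unchanged in that case). */
int demo_cat(char *dst, size_t size, const char *src) {
  size_t dlen, slen;

  assert(dst != NULL && src != NULL && size > 0);
  dlen = strlen(dst);
  slen = strlen(src);
  if (dlen >= size || slen >= size - dlen)
    return -1; /* needs dlen + slen + 1 <= size */
  memcpy(dst + dlen, src, slen + 1);
  return 0;
}

/* Build `levels` times "../" followed by `name` into dst. */
int demo_relative(char *dst, size_t size, int levels, const char *name) {
  int i;

  assert(dst != NULL && name != NULL && size > 0);
  dst[0] = '\0';
  for (i = 0; i < levels; i++) {
    if (demo_cat(dst, size, "../") != 0)
      return -1;
  }
  return demo_cat(dst, size, name);
}
```

A caller with an array destination passes sizeof(dest), which is the pattern the commit describes for its call sites.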

Add -#7 basic_selftests for longfile_to_83 (8-3 and ISO9660), long_to_83
(per-segment path conversion) and lienrelatif (same-dir basename, parent "../").
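
For readers unfamiliar with 8-3 names, here is a deliberately simplified sketch of the clamping such a conversion performs. demo_to_83() is hypothetical: it only uppercases, splits at the last dot, and clamps the name to 8 and the extension to 3 characters. It does none of the character sanitization or ISO9660 handling of longfile_to_83().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Write an upper-cased 8-3 form of `name` into dst (capacity `size`).
   Returns 0 on success, -1 if the result would not fit. */
int demo_to_83(char *dst, size_t size, const char *name) {
  const char *dot;
  size_t nlen, elen, i, out = 0;

  assert(dst != NULL && name != NULL && size > 0);
  dot = strrchr(name, '.'); /* only the last dot separates the extension */
  nlen = dot != NULL ? (size_t) (dot - name) : strlen(name);
  elen = dot != NULL ? strlen(dot + 1) : 0;
  if (nlen > 8)
    nlen = 8;
  if (elen > 3)
    elen = 3;
  /* name, optional ".ext", and the terminating NUL must all fit */
  if (nlen + (elen > 0 ? elen + 1 : 0) + 1 > size)
    return -1;
  for (i = 0; i < nlen; i++)
    dst[out++] = (char) toupper((unsigned char) name[i]);
  if (elen > 0) {
    dst[out++] = '.';
    for (i = 0; i < elen; i++)
      dst[out++] = (char) toupper((unsigned char) dot[1 + i]);
  }
  dst[out] = '\0';
  return 0;
}
```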

htstools.c pointer-destination warnings: 10 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 18:01:47 +02:00
Xavier Roche
1df45fc231 Merge pull request #368 from xroche/cleanup/htsname-bounds
Bound htsname.c pointer-destination buffer writes (batch 11)
2026-06-16 17:25:12 +02:00
Xavier Roche
3a0f5779dd Bound htsname.c pointer-destination buffer writes (batch 11)
Continues the htssafe.h pointer-destination migration: the strcpybuff/strcatbuff
macros silently fall back to a raw strcpy/strcat when the destination is a bare
char* rather than a sized array.

In htsname.c:
* standard_name() builds the md5-based name into a caller buffer it received as
  char* (size lost), via a chain of strncatbuff/strcatbuff. It is internal
  (hidden, not HTSEXT_API), so it now takes an explicit destination size and
  builds through an htsbuff bounded builder; its one caller (the
  ADD_STANDARD_NAME macro) passes sizeof(buff).
* url_savename()'s delayed-extension append into lastDot (a pointer into the
  afs->save[HTS_URLMAXSIZE*2] array) is bounded with strlcatbuff against the
  remaining capacity.
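
A bounded builder of the kind mentioned above can be sketched as follows. The demo_sbuf type and its functions are hypothetical and are not the htsbuff API; the "ab12" fragment in the usage note merely stands in for an md5 prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A builder over a caller-supplied buffer: it remembers the capacity and
   the current length, so every append is checked against the real size. */
typedef struct {
  char *buf;
  size_t cap;   /* total capacity, including the NUL */
  size_t len;   /* current string length */
  int overflow; /* set once an append did not fit */
} demo_sbuf;

demo_sbuf demo_sbuf_init(char *buf, size_t cap) {
  demo_sbuf b;

  assert(buf != NULL && cap > 0);
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  b.overflow = 0;
  buf[0] = '\0';
  return b;
}

/* Append at most n bytes of src. Returns 0 on success, -1 on overflow
   (the buffer content is left as it was). */
int demo_sbuf_ncat(demo_sbuf *b, const char *src, size_t n) {
  size_t slen = 0;

  assert(b != NULL && src != NULL);
  while (slen < n && src[slen] != '\0')
    slen++;
  if (b->overflow || slen >= b->cap - b->len) {
    b->overflow = 1;
    return -1;
  }
  memcpy(b->buf + b->len, src, slen);
  b->len += slen;
  b->buf[b->len] = '\0';
  return 0;
}

/* Append all of src. */
int demo_sbuf_cat(demo_sbuf *b, const char *src) {
  return demo_sbuf_ncat(b, src, (size_t) -1);
}
```

Building "index" + "ab12" + ".html" is then three appends on one builder, and the caller can check the overflow flag once at the end instead of after every step.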

Add a -#7 basic_selftests case for standard_name covering the no-query (no md5),
query (4-char md5) and short-name (clamped extension) paths.

htsname.c pointer-destination warnings: 12 -> 0.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 17:23:22 +02:00
Xavier Roche
46fd973e0b Merge pull request #366 from xroche/docs/agents-md
Add AGENTS.md operational checklist for AI-assisted contributions
2026-06-16 16:59:33 +02:00
Xavier Roche
ddc39b7dc0 Merge pull request #367 from xroche/cleanup/htslib-mime-bounds
Fix get_httptype contenttype overflow; bound the mime/normalize APIs
2026-06-16 16:59:11 +02:00
Xavier Roche
085937b305 Fix get_httptype contenttype overflow; bound the mime/normalize APIs
get_httptype() took the caller buffer as a bare char* and raw-strcpy'd the MIME
string into it, so crawling a URL ending in .docx/.pptx/.xlsx (whose table MIME
types reach 73 chars) overflowed the 64-byte htsblk.contenttype that the htsback
and htslib callers pass, corrupting the adjacent struct fields. Remotely
triggerable.

* Widen htsblk contenttype/charset/contentencoding to HTS_MIMETYPE_SIZE (128, a
  new named constant sized to hold the longest registered MIME type). This changes the
  installed htsblk layout, so bump the library soname (VERSION_INFO 2:49:0 ->
  3:0:0).
* Add bounded get_httptype_sized(), guess_httptype_sized() and
  adr_normalized_sized() that take the destination size and use
  strlcpybuff/snprintf. The old get_httptype(), guess_httptype() and
  adr_normalized() stay as wrappers, now marked HTS_DEPRECATED (portable:
  GCC/Clang attribute, MSVC __declspec, nothing elsewhere). Internal callers
  pass the real buffer size; the deprecated wrappers bound to the implicit
  contract their old callers relied on (HTS_MIMETYPE_SIZE for the mime buffer,
  HTS_URLMAXSIZE*2 for the URL buffer) rather than staying unbounded, so they
  abort on overflow instead of silently corrupting memory.
* get_httptype_sized(), guess_httptype_sized() and give_mimext() now report
  whether a type/extension was written; callers check the result and bail
  rather than use a possibly-empty buffer (e.g. the is_hypertext_mime helpers).
  A user "--assume cgi=" rule (empty value) matches but writes nothing, so
  get_httptype_sized() returns the buffer's emptiness, matching the old callers'
  strnotempty(s) test rather than reporting a bogus recognized type.
* -#7 basic_selftests: a .pptx MIME (73 chars) is stored whole into a real
  htsblk.contenttype (a [64] field makes the bounded copy abort); give_mimext
  and get_httptype_sized return values; the octet-stream fallback; the empty
  --assume rule; plus fil_normalized "//"-in-query preservation and cut_path
  trailing-slash / single-char branches.
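
The shape of the change (a sized function that reports whether it wrote anything, plus a deprecated wrapper bound to the old implicit contract) can be sketched like this. Everything prefixed demo_ or DEMO_ is hypothetical, the table holds three entries only, matching is case-sensitive, and a value that does not fit yields 0 here, whereas the commit says the real bounded copy aborts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__)
#define DEMO_DEPRECATED __attribute__((deprecated))
#else
#define DEMO_DEPRECATED
#endif

#define DEMO_MIMETYPE_SIZE 128

static const char *const demo_table[][2] = {
  {"image/gif", "gif"},
  {"text/html", "html"},
  {"application/vnd.openxmlformats-officedocument."
   "presentationml.presentation", "pptx"},
};

/* Write the MIME type for `fil` into dst (capacity `size`).
   Returns 1 if a type was written, 0 otherwise; dst is empty on 0. */
int demo_mime_sized(char *dst, size_t size, const char *fil) {
  const char *ext;
  size_t i;

  assert(dst != NULL && fil != NULL && size > 0);
  dst[0] = '\0';
  ext = strrchr(fil, '.');
  if (ext == NULL)
    return 0; /* no extension: nothing to look up */
  ext++;
  for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
    if (strcmp(demo_table[i][1], ext) == 0) {
      if (strlen(demo_table[i][0]) >= size)
        return 0; /* would not fit: refuse instead of overflowing */
      strcpy(dst, demo_table[i][0]);
      return 1;
    }
  }
  return 0;
}

/* Old-style entry point kept as a thin wrapper: it has no size
   parameter, so it assumes the documented buffer contract. */
DEMO_DEPRECATED void demo_mime_legacy(char *dst, const char *fil) {
  (void) demo_mime_sized(dst, DEMO_MIMETYPE_SIZE, fil);
}
```

With a 64-byte destination the 73-character .pptx type is refused, while a DEMO_MIMETYPE_SIZE destination receives it whole. Callers test the return value instead of inspecting the buffer.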

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 11:10:49 +02:00
Xavier Roche
36a9f5a827 Merge pull request #365 from xroche/cleanup/htslib-bounds
Bound htslib.c pointer-destination buffer writes (batch 9)
2026-06-16 03:54:38 +02:00
Xavier Roche
20880c1a4d Bound htslib.c pointer-destination buffer writes (batch 9)
Continues the htssafe.h pointer-destination migration (X1), where the
strcpybuff/strcatbuff macros silently fall back to a raw strcpy/strcat
when the destination is a bare char* rather than a sized array.

In htslib.c:
* fil_normalized() rebuilds the sorted query through an htsbuff bounded
  builder over the malloc'd copyBuff, then copies it back with strlcpybuff
  (capacity is the known qLen + 1).
* treathead() bounds the Location: copy with strlcpybuff against the
  location_buffer[HTS_URLMAXSIZE*2] contract.
* give_mimext(), convtolower() and cut_path() are internal (hidden, not
  HTSEXT_API), so they take an explicit destination size and the callers
  pass it: give_mimext in htsname.c/htscoremain.c/htslib.c, convtolower in
  htshash.c. cut_path has no callers.
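
The query rebuild in the first bullet relies on the sorted query having exactly the same length as the original, so it can be written back in place. A simplified sketch of that idea follows; demo_sort_query() is hypothetical, sorts whole "key=value" arguments with strcmp, and does not perform any of fil_normalized()'s other normalization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int demo_cmp(const void *a, const void *b) {
  return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Sort the '&'-separated query arguments of `url` in place.
   Returns 0 on success (or when there is no query), -1 on allocation
   failure. */
int demo_sort_query(char *url) {
  char *query, *copy;
  char **args;
  size_t qlen, nargs = 1, i, k = 0, out = 0;

  assert(url != NULL);
  query = strchr(url, '?');
  if (query == NULL || query[1] == '\0')
    return 0;
  query++; /* first byte after '?' */
  qlen = strlen(query);
  for (i = 0; i < qlen; i++)
    if (query[i] == '&')
      nargs++;
  copy = malloc(qlen + 1);
  args = malloc(nargs * sizeof(*args));
  if (copy == NULL || args == NULL) {
    free(copy);
    free(args);
    return -1;
  }
  memcpy(copy, query, qlen + 1);
  /* split the copy into NUL-terminated arguments */
  args[k++] = copy;
  for (i = 0; i < qlen; i++) {
    if (copy[i] == '&') {
      copy[i] = '\0';
      args[k++] = copy + i + 1;
    }
  }
  qsort(args, nargs, sizeof(*args), demo_cmp);
  /* rebuild over the original: same arguments, same separator count */
  for (i = 0; i < nargs; i++) {
    size_t n = strlen(args[i]);
    if (i > 0)
      query[out++] = '&';
    memcpy(query + out, args[i], n);
    out += n;
  }
  assert(out == qlen); /* the length invariant that bounds the write */
  query[out] = '\0';
  free(args);
  free(copy);
  return 0;
}
```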

Add strlncatbuff(dst, src, size, n) to htssafe.h: a bounded n-limited
append with explicit capacity, the missing parallel to strlcatbuff.

Cover fil_normalized query-sort, give_mimext, convtolower and cut_path with
the -#7 basic_selftests.

get_httptype() and adr_normalized() are left for a follow-up: both are
exported (HTSEXT_API), and get_httptype() exposes a real latent overflow
(a .docx/.pptx/.xlsx URL writes a 65-73 char mime type into the callers'
64-byte contenttype buffers) whose fix is a public-ABI decision.

htslib.c pointer-destination warnings: 14 -> 4.

Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-16 03:48:52 +02:00
20 changed files with 494 additions and 180 deletions

4
configure vendored
View File

@@ -3685,7 +3685,9 @@ fi
VERSION_INFO="2:49:0"
# 3:0:0: htsblk layout changed (contenttype/charset/contentencoding widened to
# 128), an incompatible ABI break, so bump current and reset revision/age.
VERSION_INFO="3:0:0"
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to enable maintainer-specific portions of Makefiles" >&5
printf %s "checking whether to enable maintainer-specific portions of Makefiles... " >&6; }

View File

@@ -29,7 +29,9 @@ AC_CONFIG_SRCDIR(src/httrack.c)
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_HEADERS(config.h)
AM_INIT_AUTOMAKE([subdir-objects])
VERSION_INFO="2:49:0"
# 3:0:0: htsblk layout changed (contenttype/charset/contentencoding widened to
# 128), an incompatible ABI break, so bump current and reset revision/age.
VERSION_INFO="3:0:0"
AM_MAINTAINER_MODE
AC_USE_SYSTEM_EXTENSIONS

View File

@@ -3584,8 +3584,9 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
back[i].r.is_file = 1;
back[i].r.totalsize = back[i].r.size =
fsize_utf8(back[i].url_sav);
get_httptype(opt, back[i].r.contenttype,
back[i].url_sav, 1);
get_httptype_sized(opt, back[i].r.contenttype,
sizeof(back[i].r.contenttype),
back[i].url_sav, 1);
hts_log_print(opt, LOG_DEBUG,
"Not-modified status without cache guessed: %s%s",
back[i].url_adr, back[i].url_fil);

View File

@@ -1734,7 +1734,7 @@ int httpmirror(char *url1, httrackp * opt) {
{
char buff[256];
guess_httptype(opt, buff, urlfil());
guess_httptype_sized(opt, buff, sizeof(buff), urlfil());
if (strcmp(buff, "image/gif") == 0)
create_gif_warning = 1;
}
@@ -3150,7 +3150,7 @@ static void postprocess_file(httrackp * opt, const char *save, const char *adr,
/* CID */
make_content_id(adr, fil, cid, sizeof(cid));
guess_httptype(opt, mimebuff, save);
guess_httptype_sized(opt, mimebuff, sizeof(mimebuff), save);
fprintf(opt->state.mimefp, "--%s\r\n",
StringBuff(opt->state.mimemid));
/*if (first)
@@ -3862,7 +3862,8 @@ int htsAddLink(htsmoduleStruct * str, char *link) {
opt->savename_83 = b;
if (r != -1 && !forbidden_url) {
if (savename()) {
if (lienrelatif(tempo, afs.save, savename()) == 0) {
if (lienrelatif(tempo, sizeof(tempo), afs.save, savename()) ==
0) {
hts_log_print(opt, LOG_DEBUG,
"(module): relative link at %s build with %s and %s: %s",
afs.af.adr, afs.save, savename(), tempo);

View File

@@ -69,23 +69,29 @@ Please visit our Website: http://www.httrack.com
/* Resolver */
extern int IPV6_resolver;
// Add a command in the argc/argv
#define cmdl_add(token,argc,argv,buff,ptr) \
argv[argc]=(buff+ptr); \
strcpybuff(argv[argc],token); \
ptr += (int) (strlen(argv[argc])+2); \
/* Remaining room in the argv block; 0 once it is exhausted (alias expansion or
doit.log insertion can outrun the +32768 slack), so the copy aborts cleanly
instead of the subtraction wrapping to a huge unbounded size. */
#define cmdl_room(bufsize, ptr) \
((ptr) < (size_t) (bufsize) ? (size_t) (bufsize) - (ptr) : 0)
// Add a command in the argc/argv (buff has total capacity bufsize)
#define cmdl_add(token, argc, argv, buff, bufsize, ptr) \
argv[argc] = (buff + ptr); \
strlcpybuff(argv[argc], token, cmdl_room(bufsize, ptr)); \
ptr += (int) (strlen(argv[argc]) + 2); \
argc++
// Insert a command in the argc/argv
#define cmdl_ins(token,argc,argv,buff,ptr) \
{ \
int i; \
for(i=argc;i>0;i--)\
argv[i]=argv[i-1];\
} \
argv[0]=(buff+ptr); \
strcpybuff(argv[0],token); \
ptr += (int) (strlen(argv[0])+2); \
// Insert a command in the argc/argv (buff has total capacity bufsize)
#define cmdl_ins(token, argc, argv, buff, bufsize, ptr) \
{ \
int i; \
for (i = argc; i > 0; i--) \
argv[i] = argv[i - 1]; \
} \
argv[0] = (buff + ptr); \
strlcpybuff(argv[0], token, cmdl_room(bufsize, ptr)); \
ptr += (int) (strlen(argv[0]) + 2); \
argc++
#define htsmain_free() do { \
@@ -285,6 +291,196 @@ static void basic_selftests(void) {
assertf(end == NULL && strcmp(tok, "a\\") == 0);
}
}
// fil_normalized(): canonicalizes a URL path. Query arguments are sorted
// alphabetically (by the text after each '?'/'&') and the query is rebuilt
// through a bounded builder; outside the query, "//" collapses to "/".
// Regression for that builder.
{
char norm[256];
assertf(strcmp(fil_normalized("/p?b=2&a=1&c=3", norm), "/p?a=1&b=2&c=3") ==
0);
assertf(strcmp(fil_normalized("/a//b", norm), "/a/b") == 0);
// "//" is collapsed only before the query; inside the query it is kept
assertf(strcmp(fil_normalized("/a//b?x=c//d", norm), "/a/b?x=c//d") == 0);
}
// give_mimext(): mime type -> file extension, bounded into the caller buffer.
// Returns 1 when an extension was written, 0 otherwise.
{
char ext[16];
assertf(give_mimext(ext, sizeof(ext), "image/gif") == 1);
assertf(strcmp(ext, "gif") == 0);
assertf(give_mimext(ext, sizeof(ext), "text/html") == 1);
assertf(strcmp(ext, "html") == 0);
assertf(give_mimext(ext, sizeof(ext), "no/such-mime-type") == 0);
assertf(ext[0] == '\0');
}
// convtolower(): lower-cases into the caller buffer (bounded by its size).
{
char low[64];
assertf(strcmp(convtolower(low, sizeof(low), "ABC/Def.HTML"),
"abc/def.html") == 0);
}
// cut_path(): splits a path into directory (with trailing '/') and basename,
// each bounded by its buffer size.
{
char path[256];
char pname[256];
{
char full[] = "/dir/sub/file.html";
cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(strcmp(path, "/dir/sub/") == 0);
assertf(strcmp(pname, "file.html") == 0);
}
{ // a trailing slash is trimmed before the split
char full[] = "/dir/sub/";
cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(strcmp(path, "/dir/") == 0);
assertf(strcmp(pname, "sub") == 0);
}
{ // a path of length <= 1 yields empty results
char full[] = "/";
cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(path[0] == '\0' && pname[0] == '\0');
}
}
// get_httptype_sized(): a long MIME type (Office OOXML reaches 73 chars) is
// written whole into a contenttype-sized buffer; returns 1 on a match, 0 when
// flag==0 and nothing matched. Regression for the old contenttype[64]
// overflow.
{
httrackp *opt = hts_create_opt();
htsblk r; // write into the real struct field, not a stand-in
assertf(opt != NULL);
// a long MIME (Office OOXML reaches 73 chars) must fit htsblk.contenttype
// whole: a [64] field would make this bounded copy abort.
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"deck.pptx", 0) == 1);
assertf(strcmp(r.contenttype,
"application/vnd.openxmlformats-officedocument."
"presentationml.presentation") == 0);
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"x.gif", 0) == 1);
assertf(strcmp(r.contenttype, "image/gif") == 0);
// no extension and flag==0: nothing written, returns 0
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"noextfile", 0) == 0);
assertf(r.contenttype[0] == '\0');
// no extension and flag==1: octet-stream fallback, returns 1
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"noextfile", 1) == 1);
assertf(strcmp(r.contenttype, "application/octet-stream") == 0);
// a user --assume rule with an empty value matches but writes nothing:
// get_userhttptype returns 1 with the buffer empty, so get_httptype_sized
// must still report 0 (callers test the return like the old
// strnotempty(s)).
StringCopy(opt->mimedefs, "\ncgi=\n");
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"/x.cgi", 0) == 0);
assertf(r.contenttype[0] == '\0');
StringCopy(opt->mimedefs, "\ncgi=text/html\n");
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"/x.cgi", 0) == 1);
assertf(strcmp(r.contenttype, "text/html") == 0);
hts_free_opt(opt);
}
// adr_normalized_sized(): bounded host normalization (passthrough when
// already normal).
{
char n[HTS_URLMAXSIZE];
assertf(strcmp(adr_normalized_sized("example.com", n, sizeof(n)),
"example.com") == 0);
}
// standard_name(): builds "<name><md5?>.<ext>" into a bounded buffer. The md5
// is appended (4 chars) only when the URL has a query string (see url_md5),
// so test both; pin the structure (name + ext, lengths), not the md5 chars.
{
char b[HTS_URLMAXSIZE * 2];
const char *nom = "index.html"; // name part
const char *dot = nom + 5; // points at ".html"
size_t len;
// no query -> no md5: "index" + ".html"
standard_name(b, sizeof(b), dot, nom, "http://example.com/index.html", 0);
assertf(strcmp(b, "index.html") == 0);
// query -> 4 md5 chars between name and ext: "index" + md5(4) + ".html"
standard_name(b, sizeof(b), dot, nom, "http://example.com/index.html?v=1",
0);
len = strlen(b);
assertf(len == 5 + 4 + 5);
assertf(strncmp(b, "index", 5) == 0);
assertf(strcmp(b + len - 5, ".html") == 0);
// short names: name kept (<=8), the extension is clamped to 3 -> ".htm"
standard_name(b, sizeof(b), dot, nom, "http://example.com/index.html?v=1",
1);
len = strlen(b);
assertf(len == 5 + 4 + 4);
assertf(strcmp(b + len - 4, ".htm") == 0);
// short names with a >8-char name: the name is clamped to 8 ("indexpag")
{
const char *lnom = "indexpage.html";
const char *ldot = lnom + 9; // points at ".html"
standard_name(b, sizeof(b), ldot, lnom,
"http://example.com/indexpage.html?v=1", 1);
len = strlen(b);
assertf(len == 8 + 4 + 4);
assertf(strncmp(b, "indexpag", 8) == 0);
assertf(strcmp(b + len - 4, ".htm") == 0);
}
}
// longfile_to_83(): single-name 8-3 (mode 1) / ISO9660 (mode 2) conversion;
// uppercases, clamps the name (8 / 31) and the extension (3). It rewrites
// 'save' in place, so pass a mutable array.
{
char n83[256];
{
char save[] = "longfilename.html";
longfile_to_83(1, n83, sizeof(n83), save); // 8-3: name->8, ext->3
assertf(strcmp(n83, "LONGFILE.HTM") == 0);
}
{
char save[] = "longfilename.html";
longfile_to_83(2, n83, sizeof(n83), save); // ISO9660: name->31, ext->3
assertf(strcmp(n83, "LONGFILENAME.HTM") == 0);
}
{ // sanitization: a leading '.' and interior dots collapse to '_', and
// spaces/specials become '_' (only the last dot stays as the separator)
char save[] = ".a b.c.d e";
longfile_to_83(1, n83, sizeof(n83), save);
assertf(strcmp(n83, "_A_B_C.D_E") == 0);
}
}
// long_to_83(): per-segment 8-3 conversion of a whole path.
{
char n83[HTS_URLMAXSIZE * 2];
char save[] = "dir/longfilename.html";
long_to_83(1, n83, sizeof(n83), save);
assertf(strcmp(n83, "DIR/LONGFILE.HTM") == 0);
}
// lienrelatif(): relative path from the directory of curr_fil to link.
{
char s[HTS_URLMAXSIZE * 2];
// same directory -> just the basename
assertf(lienrelatif(s, sizeof(s), "dir/page.html", "dir/index.html") == 0);
assertf(strcmp(s, "page.html") == 0);
// link one level up -> a "../" prefix
assertf(lienrelatif(s, sizeof(s), "a.html", "dir/index.html") == 0);
assertf(strcmp(s, "../a.html") == 0);
}
}
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
@@ -402,6 +598,7 @@ HTSEXT_API int hts_main2(int argc, char **argv, httrackp * opt) {
static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char **x_argv = NULL; // Patch pour argv et argc: en cas de récupération de ligne de commande
char *x_argvblk = NULL; // (reprise ou update)
size_t x_argvblk_size = 0; // total capacity of x_argvblk
int x_ptr = 0; // offset
//
@@ -479,7 +676,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
*a = ' ';
/* equivalent to "empty parameter" */
if ((strcmp(argv[na], HTS_NOPARAM) == 0) || (strcmp(argv[na], HTS_NOPARAM2) == 0)) // (none)
strcpybuff(argv[na], "\"\"");
/* replacing "(none)"/"\"(none)\"" with "\"\"" always fits in place */
strlcpybuff(argv[na], "\"\"", strlen(argv[na]) + 1);
if (strncmp(argv[na], "-&", 2) == 0)
argv[na][1] = '%';
}
@@ -501,6 +699,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
htsmain_free();
return -1;
}
x_argvblk_size = (size_t) (current_size + 32768);
x_argvblk[0] = '\0';
x_ptr = 0;
@@ -522,7 +721,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//
argv_url = 0; /* pour comptage */
//
cmdl_add(argv[0], x_argc, x_argv, x_argvblk, x_ptr);
cmdl_add(argv[0], x_argc, x_argv, x_argvblk, x_argvblk_size, x_ptr);
na = 1; /* commencer après nom_prg */
while(na < argc) {
int result = 1;
@@ -543,9 +742,10 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
}
/* Copier */
cmdl_add(tmp_argv[0], x_argc, x_argv, x_argvblk, x_ptr);
cmdl_add(tmp_argv[0], x_argc, x_argv, x_argvblk, x_argvblk_size, x_ptr);
if (tmp_argc > 1) {
cmdl_add(tmp_argv[1], x_argc, x_argv, x_argvblk, x_ptr);
cmdl_add(tmp_argv[1], x_argc, x_argv, x_argvblk, x_argvblk_size,
x_ptr);
}
/* Compter URLs et détecter -i,-q.. */
@@ -617,7 +817,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char BIGSTK tempo[HTS_CDLMAXSIZE];
strcpybuff(tempo, argv[na] + 1);
if (tempo[strlen(tempo) - 1] != '"') {
if (tempo[0] == '\0' || tempo[strlen(tempo) - 1] != '"') {
char BIGSTK s[HTS_CDLMAXSIZE];
sprintf(s, "Missing quote in %s", argv[na]);
@@ -626,7 +826,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
}
tempo[strlen(tempo) - 1] = '\0';
strcpybuff(argv[na], tempo);
/* tempo is argv[na] minus its surrounding quotes, so it fits in place
*/
strlcpybuff(argv[na], tempo, strlen(argv[na]) + 1);
}
if (cmdl_opt(argv[na])) { // option
@@ -791,7 +993,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (strnotempty(lastp)) {
insert_after_argc = argc - insert_after;
cmdl_ins(lastp, insert_after_argc, (argv + insert_after), x_argvblk,
x_ptr);
x_argvblk_size, x_ptr);
argc = insert_after_argc + insert_after;
insert_after++;
}
@@ -911,7 +1113,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (argv[i][0] == '-') {
if (argv[i][1] == '-') { // --xxx
if ((strfield2(argv[i] + 2, "clean")) || (strfield2(argv[i] + 2, "tide"))) { // nettoyer
strcpybuff(argv[i] + 1, "");
argv[i][1] = '\0';
if (fexist
(fconcat
(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), StringBuff(opt->path_log), "hts-log.txt")))
@@ -1020,7 +1222,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
//
} else if (strfield2(argv[i] + 2, "catchurl")) { // capture d'URL via proxy temporaire!
argv_url = 1; // forcer a passer les parametres
strcpybuff(argv[i] + 1, "#P");
/* argv[i] is "--catchurl"; "#P" fits after its first char */
strlcpybuff(argv[i] + 1, "#P", strlen(argv[i] + 1) + 1);
//
} else if (strfield2(argv[i] + 2, "updatehttrack")) {
#ifdef _WIN32
@@ -1348,7 +1551,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
char BIGSTK tempo[HTS_CDLMAXSIZE + 256];
strcpybuff(tempo, argv[na] + 1);
if (tempo[strlen(tempo) - 1] != '"') {
if (tempo[0] == '\0' || tempo[strlen(tempo) - 1] != '"') {
char s[HTS_CDLMAXSIZE + 256];
sprintf(s, "Missing quote in %s", argv[na]);
@@ -1357,7 +1560,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
}
tempo[strlen(tempo) - 1] = '\0';
strcpybuff(argv[na], tempo);
/* tempo is argv[na] minus its surrounding quotes, so it fits in place
*/
strlcpybuff(argv[na], tempo, strlen(argv[na]) + 1);
}
if (cmdl_opt(argv[na])) { // option
@@ -2598,15 +2803,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
// initialiser mimedefs
//get_userhttptype(opt,1,opt->mimedefs,NULL);
// check
mime[0] = '\0';
get_httptype(opt, mime, argv[na + 1], 0);
if (mime[0] != '\0') {
if (get_httptype_sized(opt, mime, sizeof(mime), argv[na + 1],
0)) {
char ext[256];
printf("%s is '%s'\n", argv[na + 1], mime);
ext[0] = '\0';
give_mimext(ext, mime);
if (ext[0]) {
if (give_mimext(ext, sizeof(ext), mime)) {
printf("and its local type is '.%s'\n", ext);
}
} else {
@@ -3019,7 +3221,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
if (urlSize < HTS_URLMAXSIZE) {
ensureUrlCapacity(url, url_sz, capa);
if (strnotempty(url))
strcatbuff(url, " "); // espace de séparation
strlcatbuff(url, " ", url_sz); // separator space
append_escape_spc_url(unescape_http_unharm(catbuff, sizeof(catbuff), argv[na], 1), url, url_sz);
}
} // if argv=- etc.

View File

@@ -197,10 +197,13 @@ Please visit our Website: http://www.httrack.com
#endif
/* Taille max d'une URL */
/* Max URL length */
#define HTS_URLMAXSIZE 1024
/* Taille max ligne de commande (>=HTS_URLMAXSIZE*2) */
/* Max command-line length (>=HTS_URLMAXSIZE*2) */
#define HTS_CDLMAXSIZE 1024
/* MIME-type buffer contract (htsblk.contenttype/charset/contentencoding); holds
the longest registered MIME type, the Office OOXML ones reaching 73 chars */
#define HTS_MIMETYPE_SIZE 128
/* Copyright (C) 1998 Xavier Roche and other contributors */
#define HTTRACK_AFF_AUTHORS "[XR&CO'2014]"
@@ -250,6 +253,22 @@ Please visit our Website: http://www.httrack.com
#endif
#endif
/**
* Mark a function deprecated, with a message pointing at the replacement.
* Placed before the declaration so both the GCC/Clang attribute and the MSVC
* __declspec sit in a position both accept. Degrades to nothing elsewhere.
*/
#if defined(__GNUC__) && \
(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define HTS_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(__GNUC__)
#define HTS_DEPRECATED(msg) __attribute__((deprecated))
#elif defined(_MSC_VER) && (_MSC_VER >= 1400)
#define HTS_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#define HTS_DEPRECATED(msg)
#endif
#ifndef HTS_LONGLONG
#ifdef HTS_NO_64_BIT
#define HTS_LONGLONG 0

View File

@@ -76,7 +76,7 @@ static coucal_key key_duphandler(void *arg, coucal_key_const name) {
/* Key sav hashes are using case-insensitive version */
static coucal_hashkeys key_sav_hashes(void *arg, coucal_key_const key) {
hash_struct *const hash = (hash_struct*) arg;
convtolower(hash->catbuff, (const char*) key);
convtolower(hash->catbuff, sizeof(hash->catbuff), (const char *) key);
return coucal_hash_string(hash->catbuff);
}

View File

@@ -472,9 +472,8 @@ static int tris(httrackp * opt, char *buffer) {
{
char type[256];
type[0] = '\0';
get_httptype(opt, type, buffer, 0);
if (strnotempty(type)) // type reconnu!
if (get_httptype_sized(opt, type, sizeof(type), buffer,
0)) // recognized type
return 1;
// ajout RX 05/2001
else if (is_dyntype(get_ext(catbuff, sizeof(catbuff), buffer))) // asp,cgi...

View File

@@ -754,7 +754,8 @@ T_SOC http_xfopen(httrackp * opt, int mode, int treat, int waitconnect,
if (soc != INVALID_SOCKET) {
retour->statuscode = HTTP_OK; // OK
strcpybuff(retour->msg, "OK");
guess_httptype(opt, retour->contenttype, fil);
guess_httptype_sized(opt, retour->contenttype,
sizeof(retour->contenttype), fil);
} else if (strnotempty(retour->msg) == 0)
strcpybuff(retour->msg, "Unable to open local file");
return soc; // renvoyer
@@ -1530,8 +1531,9 @@ void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * ret
if (retour->location) {
while(is_realspace(*(rcvd + p)))
p++; // sauter espaces
if ((int) strlen(rcvd + p) < HTS_URLMAXSIZE) // pas trop long?
strcpybuff(retour->location, rcvd + p);
if ((int) strlen(rcvd + p) < HTS_URLMAXSIZE) // not too long?
/* location aliases location_buffer[HTS_URLMAXSIZE * 2] */
strlcpybuff(retour->location, rcvd + p, HTS_URLMAXSIZE * 2);
else // erreur.. ignorer
retour->location[0] = '\0';
}
@@ -3444,16 +3446,17 @@ HTSEXT_API char *fil_normalized(const char *source, char *dest) {
/* Replace query by sorted query */
copyBuff = malloct(qLen + 1);
assertf(copyBuff != NULL);
copyBuff[0] = '\0';
for(i = 0; i < ampargs; i++) {
if (i == 0)
strcatbuff(copyBuff, "?");
else
strcatbuff(copyBuff, "&");
strcatbuff(copyBuff, amps[i] + 1);
{
htsbuff cb = htsbuff_ptr(copyBuff, qLen + 1);
for (i = 0; i < ampargs; i++) {
htsbuff_cat(&cb, i == 0 ? "?" : "&");
htsbuff_cat(&cb, amps[i] + 1);
}
assertf(cb.len == qLen);
}
assertf(strlen(copyBuff) == qLen);
strcpybuff(query, copyBuff);
/* query points into dest where the original qLen-byte query was */
strlcpybuff(query, copyBuff, qLen + 1);
/* Cleanup */
freet(amps);
@@ -3464,12 +3467,19 @@ HTSEXT_API char *fil_normalized(const char *source, char *dest) {
}
#define endwith(a) ( (len >= (sizeof(a)-1)) ? ( strncmp(dest, a+len-(sizeof(a)-1), sizeof(a)-1) == 0 ) : 0 );
HTSEXT_API char *adr_normalized(const char *source, char *dest) {
HTSEXT_API char *adr_normalized_sized(const char *source, char *dest,
size_t destsize) {
/* not yet too aggressive (no com<->net<->org checkings) */
strcpybuff(dest, jump_normalized_const(source));
strlcpybuff(dest, jump_normalized_const(source), destsize);
return dest;
}
// deprecated variant; kept for ABI compatibility. Bounds to the implicit
// contract the old callers relied on (an HTS_URLMAXSIZE*2 URL buffer).
HTSEXT_API char *adr_normalized(const char *source, char *dest) {
return adr_normalized_sized(source, dest, HTS_URLMAXSIZE * 2);
}
#undef endwith
// find port (:80) or NULL if not found
@@ -3894,9 +3904,9 @@ HTSEXT_API size_t escape_for_html_print_full(const char *const s, char *const de
#undef ADD_CHAR
// conversion minuscules, avec buffer
char *convtolower(char *catbuff, const char *a) {
strcpybuff(catbuff, a);
// lower-case conversion into caller buffer (capacity catbuffsize)
char *convtolower(char *catbuff, size_t catbuffsize, const char *a) {
strlcpybuff(catbuff, a, catbuffsize);
hts_lowcase(catbuff); // lower case
return catbuff;
}
@@ -3919,22 +3929,34 @@ void hts_replace(char *s, char from, char to) {
}
}
// deviner type d'un fichier local..
// ex: fil="toto.gif" -> s="image/gif"
void guess_httptype(httrackp * opt, char *s, const char *fil) {
get_httptype(opt, s, fil, 1);
// guess a local file's mime type (e.g. fil="toto.gif" -> s="image/gif")
// returns 1 if a type was written to s, 0 otherwise
int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) {
return get_httptype_sized(opt, s, ssize, fil, 1);
}
// idem
// flag: 1 si toujours renvoyer un type
HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil, int flag) {
// userdef overrides get_httptype
// deprecated variant; kept for ABI compatibility. Bounds to the implicit
// contract the old callers relied on (a contenttype-sized buffer).
void guess_httptype(httrackp * opt, char *s, const char *fil) {
(void) get_httptype_sized(opt, s, HTS_MIMETYPE_SIZE, fil, 1);
}
// write the mime type for fil into s (capacity ssize)
// flag: 1 to always return a type (the "application/..." / octet-stream
// fallback); returns 1 if a type was written to s, 0 otherwise
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag) {
// userdef overrides get_httptype (a rule with an empty value, e.g. "--assume
// cgi=", matches but writes nothing: report it as "no type" like the old
// code, whose callers tested strnotempty(s))
if (get_userhttptype(opt, s, fil)) {
return;
return s[0] != '\0';
}
// regular tests
if (ishtml(opt, fil) == 1) {
strcpybuff(s, "text/html");
strlcpybuff(s, "text/html", ssize);
return 1;
} else {
/* Check html -> text/html */
const char *a = fil + strlen(fil) - 1;
@@ -3947,21 +3969,33 @@ HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil, int flag)
a++;
while(strnotempty(hts_mime[j][1])) {
if (strfield2(hts_mime[j][1], a)) {
if (hts_mime[j][0][0] != '*') { // Une correspondance existe
strcpybuff(s, hts_mime[j][0]);
return;
if (hts_mime[j][0][0] != '*') { // a match exists
strlcpybuff(s, hts_mime[j][0], ssize);
return 1;
}
}
j++;
}
if (flag)
sprintf(s, "application/%s", a);
if (flag) {
snprintf(s, ssize, "application/%s", a);
return 1;
}
} else {
if (flag)
strcpybuff(s, "application/octet-stream");
if (flag) {
strlcpybuff(s, "application/octet-stream", ssize);
return 1;
}
}
}
return 0;
}
// deprecated variant; kept for ABI compatibility. Bounds to the implicit
// contract the old callers relied on (a contenttype-sized buffer).
HTSEXT_API void get_httptype(httrackp *opt, char *s, const char *fil,
int flag) {
(void) get_httptype_sized(opt, s, HTS_MIMETYPE_SIZE, fil, flag);
}
// get type of fil (php)
@@ -4071,17 +4105,17 @@ int get_userhttptype(httrackp * opt, char *s, const char *fil) {
return 0;
}
// renvoyer extesion d'un type mime..
// ex: "image/gif" -> gif
void give_mimext(char *s, const char *st) {
// give the file extension for a mime type (e.g. "image/gif" -> "gif")
// returns 1 if an extension was found (and written to s), 0 otherwise
int give_mimext(char *s, size_t ssize, const char *st) {
int ok = 0;
int j = 0;
s[0] = '\0';
while((!ok) && (strnotempty(hts_mime[j][1]))) {
if (strfield2(hts_mime[j][0], st)) {
if (hts_mime[j][1][0] != '*') { // Une correspondance existe
strcpybuff(s, hts_mime[j][1]);
if (hts_mime[j][1][0] != '*') { // a match exists
strlcpybuff(s, hts_mime[j][1], ssize);
ok = 1;
}
}
@@ -4102,12 +4136,13 @@ void give_mimext(char *s, const char *st) {
if (a) {
if ((int) strlen(a) >= 1) {
if ((int) strlen(a) <= 4) {
strcpybuff(s, a);
strlcpybuff(s, a, ssize);
ok = 1;
}
}
}
}
return ok;
}
// extension connue?..
@@ -4205,9 +4240,8 @@ int may_bogus_multiple(httrackp * opt, const char *mime, const char *filename) {
if (strfield2(hts_mime_bogus_multiple[j], mime)) { /* found mime type in suspicious list */
char ext[64];
ext[0] = '\0';
give_mimext(ext, mime);
if (ext[0] != 0) { /* we have an extension for that */
if (give_mimext(ext, sizeof(ext),
mime)) { /* we have an extension for that */
const size_t ext_size = strlen(ext);
const char *file = strrchr(filename, '/'); /* fetch terminal filename */
@@ -4930,7 +4964,8 @@ void hts_freeall(void) {
// cut path and project name
// patch also initial path
void cut_path(char *fullpath, char *path, char *pname) {
void cut_path(char *fullpath, char *path, size_t path_size, char *pname,
size_t pname_size) {
path[0] = pname[0] = '\0';
if (strnotempty(fullpath)) {
if ((fullpath[strlen(fullpath) - 1] == '/')
@@ -4946,8 +4981,8 @@ void cut_path(char *fullpath, char *path, char *pname) {
a--;
if (*a == '/')
a++;
strcpybuff(pname, a);
strncatbuff(path, fullpath, (int) (a - fullpath));
strlcpybuff(pname, a, pname_size);
strlncatbuff(path, fullpath, path_size, (size_t) (a - fullpath));
}
}
}
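
Note on the pattern used in this file: each unbounded function gains a `_sized` variant that takes the destination capacity and reports whether anything was written, and the old name stays as a thin wrapper that passes the capacity its callers were implicitly assumed to provide. A minimal sketch of that shape follows. Every `demo_`/`DEMO_` name is hypothetical, the value 64 is an assumption (the real HTS_MIMETYPE_SIZE is not shown in this diff), and the sketch truncates silently through snprintf, which may differ from what the htssafe.h macros do on overflow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for HTS_MIMETYPE_SIZE (real value not shown here). */
#define DEMO_MIMETYPE_SIZE 64

/* Sized variant: write a mime type for extension `ext` into s (capacity
   ssize, including the terminating NUL).
   Returns 1 if a type was written to s, 0 otherwise. */
int demo_type_sized(char *s, size_t ssize, const char *ext, int always) {
  assert(s != NULL && ext != NULL);
  if (ssize == 0)
    return 0;
  s[0] = '\0';
  if (strcmp(ext, "gif") == 0) {
    snprintf(s, ssize, "%s", "image/gif"); /* bounded copy */
    return 1;
  }
  if (always) {
    snprintf(s, ssize, "application/%s", ext); /* bounded fallback */
    return 1;
  }
  return 0; /* nothing written: s stays empty */
}

/* Legacy variant kept for old callers: same behaviour, but bounded by the
   capacity those callers were implicitly expected to provide. */
void demo_type(char *s, const char *ext, int always) {
  (void) demo_type_sized(s, DEMO_MIMETYPE_SIZE, ext, always);
}
```

Callers that previously tested the output string for emptiness can test the return value instead, as the is_hypertext_mime() and url_savename() call sites in this diff do.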


@@ -252,7 +252,7 @@ int ishtml_ext(const char *a);
int ishttperror(int err);
int get_userhttptype(httrackp * opt, char *s, const char *fil);
void give_mimext(char *s, const char *st);
int give_mimext(char *s, size_t ssize, const char *st);
int may_bogus_multiple(httrackp * opt, const char *mime, const char *filename);
int may_unknown2(httrackp * opt, const char *mime, const char *filename);
@@ -264,7 +264,7 @@ void code64(unsigned char *a, int size_a, unsigned char *b, int crlf);
#define copychar(catbuff,a) concat(catbuff,(a),NULL)
char *convtolower(char *catbuff, const char *a);
char *convtolower(char *catbuff, size_t catbuffsize, const char *a);
void hts_lowcase(char *s);
void hts_replace(char *s, char from, char to);
int multipleStringMatch(const char *s, const char *match);
@@ -276,7 +276,8 @@ void fprintfio(FILE * fp, const char *buff, const char *prefix);
int sig_ignore_flag(int setflag); // flag ignore
#endif
void cut_path(char *fullpath, char *path, char *pname);
void cut_path(char *fullpath, char *path, size_t path_size, char *pname,
size_t pname_size);
int fexist(const char *s);
int fexist_utf8(const char *s);
@@ -499,7 +500,8 @@ HTS_STATIC int is_hypertext_mime(httrackp * opt, const char *mime,
char guessed[256];
guessed[0] = '\0';
guess_httptype(opt, guessed, file);
if (!guess_httptype_sized(opt, guessed, sizeof(guessed), file))
return 0;
return is_hypertext_mime__(guessed);
}
return 0;
@@ -514,7 +516,8 @@ HTS_STATIC int may_be_hypertext_mime(httrackp * opt, const char *mime,
char guessed[256];
guessed[0] = '\0';
guess_httptype(opt, guessed, file);
if (!guess_httptype_sized(opt, guessed, sizeof(guessed), file))
return 0;
return may_be_hypertext_mime__(guessed);
}
return 0;
@@ -529,7 +532,8 @@ HTS_STATIC int compare_mime(httrackp * opt, const char *mime, const char *file,
char guessed[256];
guessed[0] = '\0';
guess_httptype(opt, guessed, file);
if (!guess_httptype_sized(opt, guessed, sizeof(guessed), file))
return 0;
return strfield2(guessed, reference);
}
return 0;


@@ -51,12 +51,13 @@ Please visit our Website: http://www.httrack.com
url_savename_addstr(afs->save, buff);\
}
#define ADD_STANDARD_NAME(shortname) \
{ /* ajout nom */\
char BIGSTK buff[HTS_URLMAXSIZE*2];\
standard_name(buff,dot_pos,nom_pos,fil_complete,(shortname));\
url_savename_addstr(afs->save, buff);\
}
#define ADD_STANDARD_NAME(shortname) \
{ /* add name */ \
char BIGSTK buff[HTS_URLMAXSIZE * 2]; \
standard_name(buff, sizeof(buff), dot_pos, nom_pos, fil_complete, \
(shortname)); \
url_savename_addstr(afs->save, buff); \
}
/* Avoid stupid DOS system folders/file such as 'nul' */
/* Based on linux/fs/umsdos/mangle.c */
@@ -200,7 +201,7 @@ int url_savename(lien_adrfilsave *const afs,
// foo.com/bar//foobar -> foo.com/bar/foobar
if (opt->urlhack) {
// copy of adr (without protocol), used for lookups (see urlhack)
normadr = adr_normalized(adr, normadr_);
normadr = adr_normalized_sized(adr, normadr_, sizeof(normadr_));
normfil = fil_normalized(fil_complete, normfil_);
} else {
if (link_has_authority(adr_complete)) { // https or other protocols : in "http/" subfolder
@@ -344,8 +345,7 @@ int url_savename(lien_adrfilsave *const afs,
mime[0] = ext[0] = '\0';
get_userhttptype(opt, mime, fil);
if (strnotempty(mime)) {
give_mimext(ext, mime);
if (strnotempty(ext)) {
if (give_mimext(ext, sizeof(ext), mime)) {
ext_chg = 1;
}
}
@@ -378,8 +378,8 @@ int url_savename(lien_adrfilsave *const afs,
ext_chg = 2; /* change filename */
strcpybuff(ext, r.cdispo);
} else if (!may_unknown2(opt, r.contenttype, fil)) { // on peut patcher à priori?
give_mimext(s, r.contenttype); // obtenir extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
if (give_mimext(s, sizeof(s),
r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
@@ -403,8 +403,7 @@ int url_savename(lien_adrfilsave *const afs,
mime[0] = ext[0] = '\0';
get_userhttptype(opt, mime, fil);
if (strnotempty(mime)) {
give_mimext(ext, mime);
if (strnotempty(ext)) {
if (give_mimext(ext, sizeof(ext), mime)) {
ext_chg = 1;
}
}
@@ -420,9 +419,9 @@ int url_savename(lien_adrfilsave *const afs,
strcpybuff(ext, headers->r.cdispo);
} else if (!may_unknown2(opt, headers->r.contenttype, headers->url_fil)) { // on peut patcher à priori? (pas interdit ou pas de type)
char s[16];
s[0] = '\0';
give_mimext(s, headers->r.contenttype); // obtenir extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
if (give_mimext(
s, sizeof(s),
headers->r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
@@ -431,13 +430,14 @@ int url_savename(lien_adrfilsave *const afs,
else if (mime_type != NULL) {
ext[0] = '\0';
if (*mime_type) {
give_mimext(ext, mime_type);
give_mimext(ext, sizeof(ext), mime_type);
}
if (strnotempty(ext)) {
char mime_from_file[128];
mime_from_file[0] = 0;
get_httptype(opt, mime_from_file, fil, 1);
get_httptype_sized(opt, mime_from_file, sizeof(mime_from_file),
fil, 1);
if (!strnotempty(mime_from_file) || strcasecmp(mime_type, mime_from_file) != 0) { /* different mime for this type */
/* type change not forbidden (or no extension at all) */
if (!may_unknown2(opt, mime_type, fil)) {
@@ -646,8 +646,9 @@ int url_savename(lien_adrfilsave *const afs,
ext_chg = 2; /* change filename */
strcpybuff(ext, back[b].r.cdispo);
} else if (!may_unknown2(opt, back[b].r.contenttype, back[b].url_fil)) { // on peut patcher à priori? (pas interdit ou pas de type)
give_mimext(s, back[b].r.contenttype); // obtenir extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
if (give_mimext(
s, sizeof(s),
back[b].r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
@@ -924,7 +925,7 @@ int url_savename(lien_adrfilsave *const afs,
pth[0] = n83[0] = '\0';
strncatbuff(pth, fil, (int) (nom_pos - fil) - 1);
long_to_83(opt->savename_83, n83, pth);
long_to_83(opt->savename_83, n83, sizeof(n83), pth);
htsbuff_cat(&sb, n83);
}
}
@@ -1306,7 +1307,7 @@ int url_savename(lien_adrfilsave *const afs,
if (opt->savename_83) {
char BIGSTK n83[HTS_URLMAXSIZE * 2];
long_to_83(opt->savename_83, n83, afs->save);
long_to_83(opt->savename_83, n83, sizeof(n83), afs->save);
strcpybuff(afs->save, n83);
}
// enforce stricter ISO9660 compliance (bug reported by Steffo Carlsson)
@@ -1377,7 +1378,9 @@ int url_savename(lien_adrfilsave *const afs,
if (lastDot == NULL) {
strcatbuff(afs->save, "." DELAYED_EXT);
} else if (!IS_DELAYED_EXT(afs->save)) {
strcatbuff(lastDot, "." DELAYED_EXT);
/* lastDot points within afs->save; bound by the remaining capacity */
strlcatbuff(lastDot, "." DELAYED_EXT,
sizeof(afs->save) - (size_t) (lastDot - afs->save));
}
}
// enforce 260-character path limit before inserting destination path
@@ -1582,41 +1585,41 @@ int url_savename(lien_adrfilsave *const afs,
return 0;
}
/* nom avec md5 urilisé partout */
void standard_name(char *b, const char *dot_pos, const char *nom_pos, const char *fil,
int short_ver) {
/* md5-based name used everywhere; builds into b (capacity bsize) */
void standard_name(char *b, size_t bsize, const char *dot_pos,
const char *nom_pos, const char *fil, int short_ver) {
char md5[32 + 2];
htsbuff bb = htsbuff_ptr(b, bsize);
b[0] = '\0';
/* Nom */
/* Name */
if (dot_pos) {
if (!short_ver) // Noms longs
strncatbuff(b, nom_pos, (dot_pos - nom_pos));
if (!short_ver) // long names
htsbuff_catn(&bb, nom_pos, (size_t) (dot_pos - nom_pos));
else
strncatbuff(b, nom_pos, min(dot_pos - nom_pos, 8));
htsbuff_catn(&bb, nom_pos, (size_t) min(dot_pos - nom_pos, 8));
} else {
if (!short_ver) // Noms longs
strcatbuff(b, nom_pos);
if (!short_ver) // long names
htsbuff_cat(&bb, nom_pos);
else
strncatbuff(b, nom_pos, 8);
htsbuff_catn(&bb, nom_pos, 8);
}
/* MD5 - 16 bits */
strncatbuff(b, url_md5(md5, fil), 4);
htsbuff_catn(&bb, url_md5(md5, fil), 4);
/* Ext */
if (dot_pos) {
strcatbuff(b, ".");
if (!short_ver) // Noms longs
strcatbuff(b, dot_pos + 1);
htsbuff_catc(&bb, '.');
if (!short_ver) // long names
htsbuff_cat(&bb, dot_pos + 1);
else
strncatbuff(b, dot_pos + 1, 3);
htsbuff_catn(&bb, dot_pos + 1, 3);
}
// Allow extensionless
#ifdef DO_NOT_ALLOW_EXTENSIONLESS
else {
if (!short_ver) // Noms longs
strcatbuff(b, DEFAULT_EXT);
if (!short_ver) // long names
htsbuff_cat(&bb, DEFAULT_EXT);
else
strcatbuff(b, DEFAULT_EXT_SHORT);
htsbuff_cat(&bb, DEFAULT_EXT_SHORT);
}
#endif
}
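
Note on the standard_name() rewrite above: the strcatbuff/strncatbuff calls on a bare pointer are replaced by appends through a small builder that carries the destination capacity. The sketch below shows one way such a builder can work. It is not the real htsbuff implementation (which this diff does not show); the `demo_buff` type and its functions are hypothetical, and this version truncates when the buffer is full, which is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded string builder (not the real htsbuff). */
typedef struct {
  char *buf;  /* destination, kept NUL-terminated at all times */
  size_t cap; /* total capacity, including the terminating NUL */
  size_t len; /* current string length */
} demo_buff;

/* Wrap a caller-owned buffer and start with an empty string. */
demo_buff demo_buff_ptr(char *buf, size_t cap) {
  demo_buff b;
  assert(buf != NULL && cap > 0);
  buf[0] = '\0';
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  return b;
}

/* Append at most n characters of src; returns the count actually appended. */
size_t demo_buff_catn(demo_buff *b, const char *src, size_t n) {
  const size_t room = b->cap - b->len - 1; /* len <= cap - 1, no wrap */
  size_t i = 0;
  while (i < n && i < room && src[i] != '\0') {
    b->buf[b->len + i] = src[i];
    i++;
  }
  b->len += i;
  b->buf[b->len] = '\0';
  return i;
}

/* Append all of src (as far as it fits). */
size_t demo_buff_cat(demo_buff *b, const char *src) {
  return demo_buff_catn(b, src, (size_t) -1);
}
```

Because the builder tracks its own length, each append is bounded without rescanning the destination string or recomputing the remaining room at every call site.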


@@ -96,8 +96,8 @@ int url_savename(lien_adrfilsave *const afs,
httrackp * opt, struct_back * sback, cache_back * cache,
hash_struct * hash, int ptr, int numero_passe,
const lien_back * headers);
void standard_name(char *b, const char *dot_pos, const char *nom_pos,
const char *fil_complete,
void standard_name(char *b, size_t bsize, const char *dot_pos,
const char *nom_pos, const char *fil_complete,
int short_ver);
void url_savename_addstr(char *d, const char *s);
char *url_md5(char *digest_buffer, const char *fil_complete);


@@ -499,9 +499,9 @@ struct htsblk {
FILE *out; // écriture directe sur disque (si is_write=1)
LLint size; // taille fichier
char msg[80]; // message éventuel si échec ("\0"=non précisé)
char contenttype[64]; // content-type ("text/html" par exemple)
char charset[64]; // charset ("iso-8859-1" par exemple)
char contentencoding[64]; // content-encoding ("gzip" par exemple)
char contenttype[HTS_MIMETYPE_SIZE]; // content-type (e.g. "text/html")
char charset[HTS_MIMETYPE_SIZE]; // charset (e.g. "iso-8859-1")
char contentencoding[HTS_MIMETYPE_SIZE]; // content-encoding (e.g. "gzip")
char *location; // on copie dedans éventuellement la véritable 'location'
LLint totalsize; // taille totale à télécharger (-1=inconnue)
short int is_file; // ce n'est pas une socket mais un descripteur de fichier si 1


@@ -610,11 +610,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
b = strchr(a, '<'); // prochain tag
}
}
if (lienrelatif
(tempo, heap(ptr)->sav,
concat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
StringBuff(opt->path_html_utf8),
"index.html")) == 0) {
if (lienrelatif(tempo, sizeof(tempo), heap(ptr)->sav,
concat(OPT_GET_BUFF(opt),
OPT_GET_BUFF_SIZE(opt),
StringBuff(opt->path_html_utf8),
"index.html")) == 0) {
detect_title = 1; // ok détecté pour cette page!
makeindex_links++; // un de plus
strcpybuff(makeindex_firstlink, tempo);
@@ -1649,8 +1649,9 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
// Prendre si extension reconnue
if (!url_ok) {
get_httptype(opt, type, tempo, 0);
if (strnotempty(type)) // type reconnu!
if (get_httptype_sized(opt, type,
sizeof(type), tempo,
0)) // recognized type
url_ok = 1;
else if (is_dyntype(get_ext(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), tempo))) // reconnu php,cgi,asp..
url_ok = 1;
@@ -2719,7 +2720,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
strcpybuff(save, StringBuff(opt->path_html_utf8));
strcatbuff(save, cat_name);
if (lienrelatif(tempo, save, relativesavename()) == 0) {
if (lienrelatif(tempo, sizeof(tempo), save,
relativesavename()) == 0) {
/* Never escape high-chars (we don't know the encoding!!) */
inplace_escape_uri_utf(tempo, sizeof(tempo)); // escape with %xx
//if (!no_esc_utf)
@@ -2949,7 +2951,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
tempo[0] = '\0';
// calculer le lien relatif
if (lienrelatif(tempo, afs.save, relativesavename()) == 0) {
if (lienrelatif(tempo, sizeof(tempo), afs.save,
relativesavename()) == 0) {
if (!in_media) { // In media (such as real audio): don't patch
/* Never escape high-chars (we don't know the encoding!!) */
inplace_escape_uri_utf(tempo, sizeof(tempo));
@@ -3507,9 +3510,9 @@ int hts_mirror_check_moved(htsmoduleStruct * str,
char BIGSTK pn_adr[HTS_URLMAXSIZE * 2], pn_fil[HTS_URLMAXSIZE * 2];
n_adr[0] = n_fil[0] = '\0';
(void) adr_normalized(moved->adr, n_adr);
(void) adr_normalized_sized(moved->adr, n_adr, sizeof(n_adr));
(void) fil_normalized(moved->fil, n_fil);
(void) adr_normalized(urladr(), pn_adr);
(void) adr_normalized_sized(urladr(), pn_adr, sizeof(pn_adr));
(void) fil_normalized(urlfil(), pn_fil);
if (strcasecmp(n_adr, pn_adr) == 0
&& strcasecmp(n_fil, pn_fil) == 0) {


@@ -237,6 +237,15 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__)
/**
* Append at most "N" characters of "B" to "A", "A" having a maximum capacity
* of "S".
*/
#define strlncatbuff(A, B, S, N) \
strncat_safe_(A, S, B, HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
N, "overflow while appending '" #B "' to '" #A "'", __FILE__, \
__LINE__)
/**
* Copy characters of "B" to "A", "A" having a maximum capacity of "S".
*/
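
Note on strlncatbuff(A, B, S, N): the documented contract is "append at most N characters of B to A, A having a maximum capacity of S". The function below is a sketch of that contract only. `demo_strlncat` is a hypothetical name, and it reports failure with a return code and leaves the destination untouched, whereas the real macro delegates to strncat_safe_(), whose overflow handling is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append at most n characters of src to dest, where dest has a total
   capacity of destsize bytes (including the terminating NUL).
   Returns 0 on success, -1 if the result would not fit (dest unchanged). */
int demo_strlncat(char *dest, size_t destsize, const char *src, size_t n) {
  size_t dlen = 0, slen = 0;
  assert(dest != NULL && src != NULL);
  /* length of dest, never reading past its capacity */
  while (dlen < destsize && dest[dlen] != '\0')
    dlen++;
  if (dlen == destsize)
    return -1; /* dest is not NUL-terminated within its capacity */
  /* number of characters to append: min(n, strlen(src)) */
  while (slen < n && src[slen] != '\0')
    slen++;
  /* need room for slen characters plus the NUL */
  if (slen >= destsize - dlen)
    return -1;
  memcpy(dest + dlen, src, slen);
  dest[dlen + slen] = '\0';
  return 0;
}
```

The two limits are independent: N caps how much of the source is taken, S caps how much the destination can hold.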

View File

@@ -274,7 +274,9 @@ int ident_url_relatif(const char *lien, const char *origin_adr,
char *const idna = hts_convertStringUTF8ToIDNA(a, strlen(a));
if (idna != NULL) {
if (strlen(idna) < HTS_URLMAXSIZE) {
strcpybuff(a, idna);
/* a points within adrfil->adr; bound by the remaining capacity */
strlcpybuff(a, idna,
sizeof(adrfil->adr) - (size_t) (a - adrfil->adr));
}
free(idna);
}
@@ -286,7 +288,7 @@ int ident_url_relatif(const char *lien, const char *origin_adr,
// créer dans s, à partir du chemin courant curr_fil, le lien vers link (absolu)
// un ident_url_relatif a déja été fait avant, pour que link ne soit pas un chemin relatif
int lienrelatif(char *s, const char *link, const char *curr_fil) {
int lienrelatif(char *s, size_t ssize, const char *link, const char *curr_fil) {
char BIGSTK _curr[HTS_URLMAXSIZE * 2];
char BIGSTK newcurr_fil[HTS_URLMAXSIZE * 2], newlink[HTS_URLMAXSIZE * 2];
char *curr;
@@ -314,9 +316,9 @@ int lienrelatif(char *s, const char *link, const char *curr_fil) {
}
}
// recopier uniquement le chemin courant
// copy only the current path
curr = _curr;
strcpybuff(curr, curr_fil);
strlcpybuff(curr, curr_fil, sizeof(_curr));
if ((a = strchr(curr, '?')) == NULL) // couper au ? (params)
a = curr + strlen(curr) - 1; // pas de params: aller à la fin
while((*a != '/') && (a > curr))
@@ -359,14 +361,14 @@ int lienrelatif(char *s, const char *link, const char *curr_fil) {
a++;
while(*a)
if (*(a++) == '/')
strcatbuff(s, "../");
strlcatbuff(s, "../", ssize);
//if (strlen(s)==0) strcatbuff(s,"/");
if (slash)
strcatbuff(s, "/"); // garder absolu!!
strlcatbuff(s, "/", ssize); // keep it absolute!
// on est dans le répertoire de départ, copier
strcatbuff(s, link + ((*link == '/') ? 1 : 0));
// we are in the starting directory, copy
strlcatbuff(s, link + ((*link == '/') ? 1 : 0), ssize);
/* Security check */
if (strlen(s) >= HTS_URLMAXSIZE)
@@ -410,7 +412,7 @@ int link_has_authorization(const char *lien) {
}
// conversion chemin de fichier/dossier vers 8-3 ou ISO9660
void long_to_83(int mode, char *n83, char *save) {
void long_to_83(int mode, char *n83, size_t n83size, char *save) {
n83[0] = '\0';
while(*save) {
@@ -425,19 +427,19 @@ void long_to_83(int mode, char *n83, char *save) {
}
fnl[j] = '\0';
// conversion
longfile_to_83(mode, fn83, fnl);
strcatbuff(n83, fn83);
longfile_to_83(mode, fn83, sizeof(fn83), fnl);
strlcatbuff(n83, fn83, n83size);
save += i;
if (*save == '/') {
strcatbuff(n83, "/");
strlcatbuff(n83, "/", n83size);
save++;
}
}
}
// conversion nom de fichier/dossier isolé vers 8-3 ou ISO9660
void longfile_to_83(int mode, char *n83, char *save) {
void longfile_to_83(int mode, char *n83, size_t n83size, char *save) {
int j = 0, max = 0;
int i = 0;
char nom[256];
@@ -526,10 +528,10 @@ void longfile_to_83(int mode, char *n83, char *save) {
}
// corriger vers 8-3
n83[0] = '\0';
strncatbuff(n83, nom, max);
strlncatbuff(n83, nom, n83size, max);
if (strnotempty(ext)) {
strcatbuff(n83, ".");
strncatbuff(n83, ext, 3);
strlcatbuff(n83, ".", n83size);
strlncatbuff(n83, ext, n83size, 3);
}
}
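
Note on the interior-pointer writes in this diff (lastDot within afs->save, and a within adrfil->adr): when the destination pointer sits inside a larger array, the usable capacity is the array size minus the pointer's offset from the start. A minimal sketch of that computation follows; the `demo_` helpers are hypothetical and are not part of HTTrack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes available from p to the end of the array starting at base. */
size_t demo_room_from(const char *base, size_t basesize, const char *p) {
  assert(p >= base && p <= base + basesize);
  return basesize - (size_t) (p - base);
}

/* Overwrite the string at p (a position inside base) with src.
   Returns 0 on success, -1 if src plus its NUL does not fit (no write). */
int demo_replace_tail(char *base, size_t basesize, char *p, const char *src) {
  const size_t room = demo_room_from(base, basesize, p);
  const size_t n = strlen(src);
  if (n >= room)
    return -1;
  memcpy(p, src, n + 1); /* copies the terminating NUL too */
  return 0;
}
```

Passing the full array size instead of the remaining room would let a write that starts mid-buffer run past the end by the pointer's offset.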


@@ -61,11 +61,11 @@ typedef struct lien_adrfilsave lien_adrfilsave;
int ident_url_relatif(const char *lien, const char *origin_adr,
const char *origin_fil,
lien_adrfil* const adrfil);
int lienrelatif(char *s, const char *link, const char *curr);
int lienrelatif(char *s, size_t ssize, const char *link, const char *curr);
int link_has_authority(const char *lien);
int link_has_authorization(const char *lien);
void long_to_83(int mode, char *n83, char *save);
void longfile_to_83(int mode, char *n83, char *save);
void long_to_83(int mode, char *n83, size_t n83size, char *save);
void longfile_to_83(int mode, char *n83, size_t n83size, char *save);
HTS_INLINE int __rech_tageq(const char *adr, const char *s);
HTS_INLINE int __rech_tageqbegdigits(const char *adr, const char *s);
HTS_INLINE int rech_tageq_all(const char *adr, const char *s);


@@ -223,8 +223,9 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
// note (up/down): on calcule à partir du lien primaire, ET du lien précédent.
// ex: si on descend 2 fois on peut remonter 1 fois
if (lienrelatif(tempo, fil, heap(heap(ptr)->premier)->fil) == 0) {
if (lienrelatif(tempo2, fil, heap(ptr)->fil) == 0) {
if (lienrelatif(tempo, sizeof(tempo), fil,
heap(heap(ptr)->premier)->fil) == 0) {
if (lienrelatif(tempo2, sizeof(tempo2), fil, heap(ptr)->fil) == 0) {
hts_log_print(opt, LOG_DEBUG,
"build relative links to test: %s %s (with %s and %s)",
tempo, tempo2, heap(heap(ptr)->premier)->fil,
@@ -326,8 +327,9 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
char BIGSTK tempo[HTS_URLMAXSIZE * 2];
char BIGSTK tempo2[HTS_URLMAXSIZE * 2];
if (lienrelatif(tempo, fil, heap(heap(ptr)->premier)->fil) == 0) {
if (lienrelatif(tempo2, fil, heap(ptr)->fil) == 0) {
if (lienrelatif(tempo, sizeof(tempo), fil,
heap(heap(ptr)->premier)->fil) == 0) {
if (lienrelatif(tempo2, sizeof(tempo2), fil, heap(ptr)->fil) == 0) {
} else {
hts_log_print(opt, LOG_ERROR,
"Error building relative link %s and %s", fil,
@@ -336,7 +338,6 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
} else {
hts_log_print(opt, LOG_ERROR, "Error building relative link %s and %s",
fil, heap(heap(ptr)->premier)->fil);
}
} // fin tester interdiction de monter


@@ -207,6 +207,9 @@ HTSEXT_API const char *jump_normalized_const(const char *);
HTSEXT_API char *jump_toport(char *);
HTSEXT_API const char *jump_toport_const(const char *);
HTSEXT_API char *fil_normalized(const char *source, char *dest);
HTSEXT_API char *adr_normalized_sized(const char *source, char *dest,
size_t destsize);
HTS_DEPRECATED("use adr_normalized_sized(source, dest, destsize)")
HTSEXT_API char *adr_normalized(const char *source, char *dest);
HTSEXT_API const char *hts_rootdir(char *file);
@@ -244,6 +247,9 @@ HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size, co
HTSEXT_API char *antislash_unescaped(char *catbuff, const char *s);
HTSEXT_API void escape_remove_control(char *s);
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag);
HTS_DEPRECATED("use get_httptype_sized(opt, s, ssize, fil, flag)")
HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil,
int flag);
HTSEXT_API int is_knowntype(httrackp * opt, const char *fil);
@@ -251,6 +257,9 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil);
HTSEXT_API int is_dyntype(const char *fil);
HTSEXT_API const char *get_ext(char *catbuff, size_t size, const char *fil);
HTSEXT_API int may_unknown(httrackp * opt, const char *st);
HTSEXT_API int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil);
HTS_DEPRECATED("use guess_httptype_sized(opt, s, ssize, fil)")
HTSEXT_API void guess_httptype(httrackp * opt, char *s, const char *fil);
/* Ugly string tools */


@@ -30,6 +30,17 @@ run() {
RC=$?
}
# crawl using exactly the given args as the only URL(s), no implicit primary URL;
# leaves the exit status in RC
run_only() {
local out="$1"
shift
rm -rf "$out"
mkdir -p "$out"
httrack -O "$out" --quiet -n "$@" >"$out/.log" 2>&1
RC=$?
}
# assert the value was accepted: clean exit and the fixture was mirrored
accepted() {
{ test "$RC" -eq 0 && test -n "$(find "$1" -type f -path '*/index.html' -print -quit)"; } ||
@@ -68,4 +79,15 @@ refused "#152: over-cap -F not refused cleanly"
run "$tmp/ov-l" --user-agent "$over"
refused "#152: over-cap --user-agent not refused cleanly"
# Quote handling on the sole URL (run_only, so the quoted arg is the only URL and
# can't be masked by an implicit one). A fully "-quoted URL has its surrounding
# quotes stripped in place and is mirrored; a dangling opening quote, and a lone
# quote (empty after the opening "), are refused cleanly and never crash.
run_only "$tmp/q-ok" "\"file://$tmp/index.html\""
accepted "$tmp/q-ok" "quoted URL not stripped/mirrored"
run_only "$tmp/q-bad" '"foo'
refused "dangling-quote argument not refused cleanly"
run_only "$tmp/q-lone" '"'
refused "lone-quote argument not refused cleanly"
exit 0