Mirror of https://github.com/xroche/httrack.git (synced 2026-06-16 23:33:18 +03:00)

Compare commits: 8 commits, cleanup/ht ... cleanup/ht
| Author | SHA1 | Date | |
|---|---|---|---|
| | 88bfcff10c | | |
| | 1df45fc231 | | |
| | 3a0f5779dd | | |
| | 46fd973e0b | | |
| | ddc39b7dc0 | | |
| | 085937b305 | | |
| | 594820d3eb | | |
| | 36a9f5a827 | | |
67
AGENTS.md
Normal file
@@ -0,0 +1,67 @@
# AGENTS.md — working in the HTTrack tree

Policy and PR etiquette live in [CONTRIBUTING.md](CONTRIBUTING.md). This file is
the operational checklist: toolchain, invariants, and how to ship a change.

## Build & test
- Fresh clone first: `git submodule update --init src/coucal`
- `bash configure && make && make check`

## Hard invariants
- **Toolchain edit** (`configure.ac`, any `Makefile.am`, `m4/`) → run
`autoreconf -fi` and commit the regenerated tracked files. The repo ships the
generated `configure`/`Makefile.in` so users build without autotools; CI does
**not** catch staleness.
- **Format only changed lines** with `git clang-format` (clang-format 19). Never
reformat untouched code: the engine was formatted by an old tool and won't
round-trip.
- **Byte-safe edits.** Files with raw high bytes are ISO-8859-1 (French
comments). Edit them byte-wise (`perl -0pi`, `sed`), not through a tool that
re-encodes to UTF-8 and corrupts them.

## Security (HTTrack parses hostile input off the network)
- Bounds-check every copy. Overflow-safe form: put the untrusted value alone,
`untrusted < limit - controlled` — never `controlled + untrusted < limit`,
which can wrap and pass.
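
A minimal C sketch of the overflow-safe comparison described in the bullet above. `append_checked` and `fits_unsafe` are hypothetical names made up for this illustration and are not HTTrack functions; the sketch assumes the controlled value (`used`) never exceeds the limit (`cap`), so `cap - used` cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of HTTrack): append `len` untrusted bytes
 * from `src` to a buffer of `cap` bytes that already holds `used` bytes.
 * Returns 1 on success, 0 if the data does not fit. */
static int append_checked(char *buf, size_t cap, size_t used,
                          const char *src, size_t len) {
  assert(buf != NULL && src != NULL);
  if (used > cap) /* the controlled value must itself be sane */
    return 0;
  /* Untrusted value alone on one side: accept only if len < cap - used,
   * which also leaves one byte for the terminating NUL. */
  if (len >= cap - used)
    return 0;
  memcpy(buf + used, src, len);
  buf[used + len] = '\0';
  return 1;
}

/* The unsafe form, shown for contrast only: used + len can wrap around
 * and compare as "small", so a huge len passes the check. */
static int fits_unsafe(size_t cap, size_t used, size_t len) {
  return used + len < cap;
}
```

With an 8-byte buffer holding 2 bytes, a claimed length of `(size_t)-1` is rejected by `append_checked` but accepted by `fits_unsafe`, because the addition wraps.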

## Code & prose
- Be terse. Comment the why, in English; translate French comments you touch.
- Strip AI tells from prose (em-dash overuse, rule-of-three, filler, vague
attributions). Ref: Wikipedia "Signs of AI writing". Claude Code: `/humanizer`.
- Behavior change → add a test. Fast path: a hidden `httrack -#N` debug
subcommand (`htscoremain.c`) driven by a `tests/NN_*.test`, over a slow crawl.

## Review your change adversarially (strongly suggested)
Before pushing, and when reviewing others, don't skim for bugs:
- **One invariant at a time.** Name a property the diff must preserve (bounds
hold, cache/wire format unchanged, no use-after-free, ABI stable), then
construct inputs that would break it. "General correctness" is not a charter.
- **Audit tests against the spec, not the code.** For each new test ask: "what
buggy path would still pass this?" If you can build one, the test is
confirmation-biased: assertions copied from observed output lock bugs in.
- **Risk areas need runtime probes.** Touching hostile-input parsing, struct
layout/ABI, cache/wire format, or a security path? A static or unit check
isn't enough; exercise the wrong behavior at runtime. Claude Code:
`/review-recipe`.

## Commits
- **Sign-off is mandatory.** Every commit carries a `Signed-off-by` trailer:
`git commit -s` (DCO, CI-enforced — unsigned commits are rejected).
- **Co-Authored-By is mandatory for AI-assisted commits.** Carry a
`Co-Authored-By:` trailer naming the assistant. Attribute there, never in a
PR-body footer.
- PRs land as a merge commit; every commit on the branch goes onto master, so
keep each commit message clean and meaningful.

## PR descriptions
- Plain concise prose; lead with what changed and why. No What/Why/How template.
- Title names the problem, not the implementation.
- Don't restate the diff — give what it can't show: motivation, context,
tradeoffs, risk.
- Length tracks the change: a typo is one sentence; a security fix earns a writeup.
- Verify claims against the code before you write them; flag drift, don't repeat it.
- Don't hard-wrap (GitHub reflows). No "Generated with Claude" footer. Run the
prose through `/humanizer`.

## Toolchain
C · clang-format-19 · autoreconf · shfmt + shellcheck (shell) · black + flake8 (Python)
@@ -1,12 +1,15 @@
# Contributing to HTTrack

HTTrack is small and old. Keep changes easy to review and safe to merge.
HTTrack is small and old. Keep changes easy to review and safe to merge. Working
with an AI assistant? The operational checklist is [AGENTS.md](AGENTS.md).

## Pull requests

- One change per PR. Small diffs merge fast.
- PRs are squash-merged: the title and description become the commit message, so
explain *why*.
- PRs land as a merge commit, so the branch's commits go onto master as-is: keep
each commit message clean and explain *why*.
- Be terse in the PR title and description: name the problem, not the fix, don't
restate the diff, and calibrate length to the change.
- Add or update tests for engine changes (`tests/`), and keep CI green.

## Style
@@ -30,6 +33,9 @@ Welcome, and nothing to disclose. Two rules:
- **Own every line** as if you wrote it. Can't explain it in review? Not ready.
- **Don't push your work onto reviewers.** A raw generated patch a maintainer has
to vet from scratch will be closed.
- **Attribution is mandatory.** AI-assisted commits must carry a
`Co-Authored-By:` trailer naming the assistant, not a footer in the PR
description.

The sign-off covers AI-assisted code too.

4
configure
vendored
@@ -3685,7 +3685,9 @@ fi



VERSION_INFO="2:49:0"
# 3:0:0: htsblk layout changed (contenttype/charset/contentencoding widened to
# 128), an incompatible ABI break, so bump current and reset revision/age.
VERSION_INFO="3:0:0"

{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to enable maintainer-specific portions of Makefiles" >&5
printf %s "checking whether to enable maintainer-specific portions of Makefiles... " >&6; }
@@ -29,7 +29,9 @@ AC_CONFIG_SRCDIR(src/httrack.c)
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_HEADERS(config.h)
AM_INIT_AUTOMAKE([subdir-objects])
VERSION_INFO="2:49:0"
# 3:0:0: htsblk layout changed (contenttype/charset/contentencoding widened to
# 128), an incompatible ABI break, so bump current and reset revision/age.
VERSION_INFO="3:0:0"
AM_MAINTAINER_MODE
AC_USE_SYSTEM_EXTENSIONS

@@ -3584,8 +3584,9 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
back[i].r.is_file = 1;
back[i].r.totalsize = back[i].r.size =
fsize_utf8(back[i].url_sav);
get_httptype(opt, back[i].r.contenttype,
back[i].url_sav, 1);
get_httptype_sized(opt, back[i].r.contenttype,
sizeof(back[i].r.contenttype),
back[i].url_sav, 1);
hts_log_print(opt, LOG_DEBUG,
"Not-modified status without cache guessed: %s%s",
back[i].url_adr, back[i].url_fil);
@@ -1734,7 +1734,7 @@ int httpmirror(char *url1, httrackp * opt) {
{
char buff[256];

guess_httptype(opt, buff, urlfil());
guess_httptype_sized(opt, buff, sizeof(buff), urlfil());
if (strcmp(buff, "image/gif") == 0)
create_gif_warning = 1;
}
@@ -3150,7 +3150,7 @@ static void postprocess_file(httrackp * opt, const char *save, const char *adr,
/* CID */
make_content_id(adr, fil, cid, sizeof(cid));

guess_httptype(opt, mimebuff, save);
guess_httptype_sized(opt, mimebuff, sizeof(mimebuff), save);
fprintf(opt->state.mimefp, "--%s\r\n",
StringBuff(opt->state.mimemid));
/*if (first)
@@ -3862,7 +3862,8 @@ int htsAddLink(htsmoduleStruct * str, char *link) {
opt->savename_83 = b;
if (r != -1 && !forbidden_url) {
if (savename()) {
if (lienrelatif(tempo, afs.save, savename()) == 0) {
if (lienrelatif(tempo, sizeof(tempo), afs.save, savename()) ==
0) {
hts_log_print(opt, LOG_DEBUG,
"(module): relative link at %s build with %s and %s: %s",
afs.af.adr, afs.save, savename(), tempo);
@@ -295,16 +295,19 @@ static void basic_selftests(void) {
assertf(strcmp(fil_normalized("/p?b=2&a=1&c=3", norm), "/p?a=1&b=2&c=3") ==
0);
assertf(strcmp(fil_normalized("/a//b", norm), "/a/b") == 0);
// "//" is collapsed only before the query; inside the query it is kept
assertf(strcmp(fil_normalized("/a//b?x=c//d", norm), "/a/b?x=c//d") == 0);
}
// give_mimext(): mime type -> file extension, bounded into the caller buffer.
// Returns 1 when an extension was written, 0 otherwise.
{
char ext[16];

give_mimext(ext, sizeof(ext), "image/gif");
assertf(give_mimext(ext, sizeof(ext), "image/gif") == 1);
assertf(strcmp(ext, "gif") == 0);
give_mimext(ext, sizeof(ext), "text/html");
assertf(give_mimext(ext, sizeof(ext), "text/html") == 1);
assertf(strcmp(ext, "html") == 0);
give_mimext(ext, sizeof(ext), "no/such-mime-type");
assertf(give_mimext(ext, sizeof(ext), "no/such-mime-type") == 0);
assertf(ext[0] == '\0');
}
// convtolower(): lower-cases into the caller buffer (bounded by its size).
@@ -317,13 +320,160 @@ static void basic_selftests(void) {
// cut_path(): splits a path into directory (with trailing '/') and basename,
// each bounded by its buffer size.
{
char full[] = "/dir/sub/file.html";
char path[256];
char pname[256];

cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(strcmp(path, "/dir/sub/") == 0);
assertf(strcmp(pname, "file.html") == 0);
{
char full[] = "/dir/sub/file.html";

cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(strcmp(path, "/dir/sub/") == 0);
assertf(strcmp(pname, "file.html") == 0);
}
{ // a trailing slash is trimmed before the split
char full[] = "/dir/sub/";

cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(strcmp(path, "/dir/") == 0);
assertf(strcmp(pname, "sub") == 0);
}
{ // a path of length <= 1 yields empty results
char full[] = "/";

cut_path(full, path, sizeof(path), pname, sizeof(pname));
assertf(path[0] == '\0' && pname[0] == '\0');
}
}
// get_httptype_sized(): a long MIME type (Office OOXML reaches 73 chars) is
// written whole into a contenttype-sized buffer; returns 1 on a match, 0 when
// flag==0 and nothing matched. Regression for the old contenttype[64]
// overflow.
{
httrackp *opt = hts_create_opt();
htsblk r; // write into the real struct field, not a stand-in

assertf(opt != NULL);
// a long MIME (Office OOXML reaches 73 chars) must fit htsblk.contenttype
// whole: a [64] field would make this bounded copy abort.
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"deck.pptx", 0) == 1);
assertf(strcmp(r.contenttype,
"application/vnd.openxmlformats-officedocument."
"presentationml.presentation") == 0);
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"x.gif", 0) == 1);
assertf(strcmp(r.contenttype, "image/gif") == 0);
// no extension and flag==0: nothing written, returns 0
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"noextfile", 0) == 0);
assertf(r.contenttype[0] == '\0');
// no extension and flag==1: octet-stream fallback, returns 1
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"noextfile", 1) == 1);
assertf(strcmp(r.contenttype, "application/octet-stream") == 0);
// a user --assume rule with an empty value matches but writes nothing:
// get_userhttptype returns 1 with the buffer empty, so get_httptype_sized
// must still report 0 (callers test the return like the old
// strnotempty(s)).
StringCopy(opt->mimedefs, "\ncgi=\n");
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"/x.cgi", 0) == 0);
assertf(r.contenttype[0] == '\0');
StringCopy(opt->mimedefs, "\ncgi=text/html\n");
assertf(get_httptype_sized(opt, r.contenttype, sizeof(r.contenttype),
"/x.cgi", 0) == 1);
assertf(strcmp(r.contenttype, "text/html") == 0);
hts_free_opt(opt);
}
// adr_normalized_sized(): bounded host normalization (passthrough when
// already normal).
{
char n[HTS_URLMAXSIZE];

assertf(strcmp(adr_normalized_sized("example.com", n, sizeof(n)),
"example.com") == 0);
}
// standard_name(): builds "<name><md5?>.<ext>" into a bounded buffer. The md5
// is appended (4 chars) only when the URL has a query string (see url_md5),
// so test both; pin the structure (name + ext, lengths), not the md5 chars.
{
char b[HTS_URLMAXSIZE * 2];
const char *nom = "index.html"; // name part
const char *dot = nom + 5; // points at ".html"
size_t len;

// no query -> no md5: "index" + ".html"
standard_name(b, sizeof(b), dot, nom, "http://example.com/index.html", 0);
assertf(strcmp(b, "index.html") == 0);
// query -> 4 md5 chars between name and ext: "index" + md5(4) + ".html"
standard_name(b, sizeof(b), dot, nom, "http://example.com/index.html?v=1",
0);
len = strlen(b);
assertf(len == 5 + 4 + 5);
assertf(strncmp(b, "index", 5) == 0);
assertf(strcmp(b + len - 5, ".html") == 0);
// short names: name kept (<=8), the extension is clamped to 3 -> ".htm"
standard_name(b, sizeof(b), dot, nom, "http://example.com/index.html?v=1",
1);
len = strlen(b);
assertf(len == 5 + 4 + 4);
assertf(strcmp(b + len - 4, ".htm") == 0);
// short names with a >8-char name: the name is clamped to 8 ("indexpag")
{
const char *lnom = "indexpage.html";
const char *ldot = lnom + 9; // points at ".html"

standard_name(b, sizeof(b), ldot, lnom,
"http://example.com/indexpage.html?v=1", 1);
len = strlen(b);
assertf(len == 8 + 4 + 4);
assertf(strncmp(b, "indexpag", 8) == 0);
assertf(strcmp(b + len - 4, ".htm") == 0);
}
}
// longfile_to_83(): single-name 8-3 (mode 1) / ISO9660 (mode 2) conversion;
// uppercases, clamps the name (8 / 31) and the extension (3). It rewrites
// 'save' in place, so pass a mutable array.
{
char n83[256];

{
char save[] = "longfilename.html";

longfile_to_83(1, n83, sizeof(n83), save); // 8-3: name->8, ext->3
assertf(strcmp(n83, "LONGFILE.HTM") == 0);
}
{
char save[] = "longfilename.html";

longfile_to_83(2, n83, sizeof(n83), save); // ISO9660: name->31, ext->3
assertf(strcmp(n83, "LONGFILENAME.HTM") == 0);
}
{ // sanitization: leading '.'->'_', interior dots
char save[] = ".a b.c.d e"; // collapse to '_', spaces/specials -> '_'
// (only the last dot stays as the separator)
longfile_to_83(1, n83, sizeof(n83), save);
assertf(strcmp(n83, "_A_B_C.D_E") == 0);
}
}
// long_to_83(): per-segment 8-3 conversion of a whole path.
{
char n83[HTS_URLMAXSIZE * 2];
char save[] = "dir/longfilename.html";

long_to_83(1, n83, sizeof(n83), save);
assertf(strcmp(n83, "DIR/LONGFILE.HTM") == 0);
}
// lienrelatif(): relative path from the directory of curr_fil to link.
{
char s[HTS_URLMAXSIZE * 2];

// same directory -> just the basename
assertf(lienrelatif(s, sizeof(s), "dir/page.html", "dir/index.html") == 0);
assertf(strcmp(s, "page.html") == 0);
// link one level up -> a "../" prefix
assertf(lienrelatif(s, sizeof(s), "a.html", "dir/index.html") == 0);
assertf(strcmp(s, "../a.html") == 0);
}
}

@@ -2638,15 +2788,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
// initialiser mimedefs
//get_userhttptype(opt,1,opt->mimedefs,NULL);
// check
mime[0] = '\0';
get_httptype(opt, mime, argv[na + 1], 0);
if (mime[0] != '\0') {
if (get_httptype_sized(opt, mime, sizeof(mime), argv[na + 1],
0)) {
char ext[256];

printf("%s is '%s'\n", argv[na + 1], mime);
ext[0] = '\0';
give_mimext(ext, sizeof(ext), mime);
if (ext[0]) {
if (give_mimext(ext, sizeof(ext), mime)) {
printf("and its local type is '.%s'\n", ext);
}
} else {
@@ -197,10 +197,13 @@ Please visit our Website: http://www.httrack.com

#endif

/* Taille max d'une URL */
/* Max URL length */
#define HTS_URLMAXSIZE 1024
/* Taille max ligne de commande (>=HTS_URLMAXSIZE*2) */
/* Max command-line length (>=HTS_URLMAXSIZE*2) */
#define HTS_CDLMAXSIZE 1024
/* MIME-type buffer contract (htsblk.contenttype/charset/contentencoding); holds
the longest registered MIME type, the Office OOXML ones reaching 73 chars */
#define HTS_MIMETYPE_SIZE 128

/* Copyright (C) 1998 Xavier Roche and other contributors */
#define HTTRACK_AFF_AUTHORS "[XR&CO'2014]"
@@ -250,6 +253,22 @@ Please visit our Website: http://www.httrack.com
#endif
#endif

/**
* Mark a function deprecated, with a message pointing at the replacement.
* Placed before the declaration so both the GCC/Clang attribute and the MSVC
* __declspec sit in a position both accept. Degrades to nothing elsewhere.
*/
#if defined(__GNUC__) && \
(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define HTS_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(__GNUC__)
#define HTS_DEPRECATED(msg) __attribute__((deprecated))
#elif defined(_MSC_VER) && (_MSC_VER >= 1400)
#define HTS_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#define HTS_DEPRECATED(msg)
#endif

#ifndef HTS_LONGLONG
#ifdef HTS_NO_64_BIT
#define HTS_LONGLONG 0
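
The `HTS_DEPRECATED` macro in the hunk above is written to sit before a declaration. A minimal sketch of how a legacy entry point might be tagged while forwarding to a sized variant follows. `demo_lower`, `demo_lower_sized` and `DEMO_NAME_SIZE` are hypothetical names invented for this illustration, not HTTrack symbols, and the diff shown here does not show where HTTrack itself applies the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HTS_DEPRECATED, copied from the hunk above. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define HTS_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(__GNUC__)
#define HTS_DEPRECATED(msg) __attribute__((deprecated))
#elif defined(_MSC_VER) && (_MSC_VER >= 1400)
#define HTS_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#define HTS_DEPRECATED(msg)
#endif

/* Hypothetical implicit capacity that old callers are assumed to provide. */
#define DEMO_NAME_SIZE 16

/* Hypothetical sized variant: lower-case src into dest (capacity destsize).
 * Returns 1 if the whole string fit, 0 otherwise (dest is then empty). */
int demo_lower_sized(char *dest, size_t destsize, const char *src);

/* Legacy entry point: still exported, but callers get a compile-time hint. */
HTS_DEPRECATED("use demo_lower_sized()")
int demo_lower(char *dest, const char *src);

int demo_lower_sized(char *dest, size_t destsize, const char *src) {
  size_t i, len;

  assert(dest != NULL && destsize > 0 && src != NULL);
  len = strlen(src);
  if (len >= destsize) { /* no room for the string plus its NUL */
    dest[0] = '\0';
    return 0;
  }
  for (i = 0; i < len; i++) {
    const char c = src[i];

    dest[i] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
  }
  dest[len] = '\0';
  return 1;
}

/* The wrapper bounds to the capacity the old contract implied. */
int demo_lower(char *dest, const char *src) {
  return demo_lower_sized(dest, DEMO_NAME_SIZE, src);
}
```

Defining the deprecated wrapper does not itself warn; only call sites that still use `demo_lower` do, which is what nudges callers toward the sized variant.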
@@ -472,9 +472,8 @@ static int tris(httrackp * opt, char *buffer) {
{
char type[256];

type[0] = '\0';
get_httptype(opt, type, buffer, 0);
if (strnotempty(type)) // type reconnu!
if (get_httptype_sized(opt, type, sizeof(type), buffer,
0)) // recognized type
return 1;
// ajout RX 05/2001
else if (is_dyntype(get_ext(catbuff, sizeof(catbuff), buffer))) // asp,cgi...
84
src/htslib.c
@@ -754,7 +754,8 @@ T_SOC http_xfopen(httrackp * opt, int mode, int treat, int waitconnect,
if (soc != INVALID_SOCKET) {
retour->statuscode = HTTP_OK; // OK
strcpybuff(retour->msg, "OK");
guess_httptype(opt, retour->contenttype, fil);
guess_httptype_sized(opt, retour->contenttype,
sizeof(retour->contenttype), fil);
} else if (strnotempty(retour->msg) == 0)
strcpybuff(retour->msg, "Unable to open local file");
return soc; // renvoyer
@@ -3466,12 +3467,19 @@ HTSEXT_API char *fil_normalized(const char *source, char *dest) {
}

#define endwith(a) ( (len >= (sizeof(a)-1)) ? ( strncmp(dest, a+len-(sizeof(a)-1), sizeof(a)-1) == 0 ) : 0 );
HTSEXT_API char *adr_normalized(const char *source, char *dest) {
HTSEXT_API char *adr_normalized_sized(const char *source, char *dest,
size_t destsize) {
/* not yet too aggressive (no com<->net<->org checkings) */
strcpybuff(dest, jump_normalized_const(source));
strlcpybuff(dest, jump_normalized_const(source), destsize);
return dest;
}

// deprecated variant; kept for ABI compatibility. Bounds to the implicit
// contract the old callers relied on (an HTS_URLMAXSIZE*2 URL buffer).
HTSEXT_API char *adr_normalized(const char *source, char *dest) {
return adr_normalized_sized(source, dest, HTS_URLMAXSIZE * 2);
}

#undef endwith

// find port (:80) or NULL if not found
@@ -3921,22 +3929,34 @@ void hts_replace(char *s, char from, char to) {
}
}

// deviner type d'un fichier local..
// ex: fil="toto.gif" -> s="image/gif"
void guess_httptype(httrackp * opt, char *s, const char *fil) {
get_httptype(opt, s, fil, 1);
// guess a local file's mime type (e.g. fil="toto.gif" -> s="image/gif")
// returns 1 if a type was written to s, 0 otherwise
int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil) {
return get_httptype_sized(opt, s, ssize, fil, 1);
}

// idem
// flag: 1 si toujours renvoyer un type
HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil, int flag) {
// userdef overrides get_httptype
// deprecated variant; kept for ABI compatibility. Bounds to the implicit
// contract the old callers relied on (a contenttype-sized buffer).
void guess_httptype(httrackp * opt, char *s, const char *fil) {
(void) get_httptype_sized(opt, s, HTS_MIMETYPE_SIZE, fil, 1);
}

// write the mime type for fil into s (capacity ssize)
// flag: 1 to always return a type (the "application/..." / octet-stream
// fallback) returns 1 if a type was written to s, 0 otherwise
HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
const char *fil, int flag) {
// userdef overrides get_httptype (a rule with an empty value, e.g. "--assume
// cgi=", matches but writes nothing: report it as "no type" like the old
// code, whose callers tested strnotempty(s))
if (get_userhttptype(opt, s, fil)) {
return;
return s[0] != '\0';
}
// regular tests
if (ishtml(opt, fil) == 1) {
strcpybuff(s, "text/html");
strlcpybuff(s, "text/html", ssize);
return 1;
} else {
/* Check html -> text/html */
const char *a = fil + strlen(fil) - 1;
@@ -3949,21 +3969,33 @@ HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil, int flag)
a++;
while(strnotempty(hts_mime[j][1])) {
if (strfield2(hts_mime[j][1], a)) {
if (hts_mime[j][0][0] != '*') { // Une correspondance existe
strcpybuff(s, hts_mime[j][0]);
return;
if (hts_mime[j][0][0] != '*') { // a match exists
strlcpybuff(s, hts_mime[j][0], ssize);
return 1;
}
}
j++;
}

if (flag)
sprintf(s, "application/%s", a);
if (flag) {
snprintf(s, ssize, "application/%s", a);
return 1;
}
} else {
if (flag)
strcpybuff(s, "application/octet-stream");
if (flag) {
strlcpybuff(s, "application/octet-stream", ssize);
return 1;
}
}
}
return 0;
}

// deprecated variant; kept for ABI compatibility. Bounds to the implicit
// contract the old callers relied on (a contenttype-sized buffer).
HTSEXT_API void get_httptype(httrackp *opt, char *s, const char *fil,
int flag) {
(void) get_httptype_sized(opt, s, HTS_MIMETYPE_SIZE, fil, flag);
}

// get type of fil (php)
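
The return contract of `get_httptype_sized()` in the hunks above (1 when a type was written, 0 otherwise, including a matching rule whose value is empty) can be illustrated in isolation. This is a sketch under stated assumptions: `demo_mime` and `demo_type_sized` are hypothetical stand-ins, not HTTrack's `hts_mime` table or its lookup logic, and where the selftest comments say HTTrack's bounded copy aborts on overflow, this sketch reports 0 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of (mime type, extension) pairs. The entry with an
 * empty type mimics a user rule such as "cgi=" that matches but carries
 * no value. */
static const char *const demo_mime[][2] = {
    {"image/gif", "gif"},
    {"text/html", "html"},
    {"", "cgi"},
    {NULL, NULL}};

/* Write the type for extension `ext` into s (capacity ssize).
 * flag: 1 to fall back to a generic type when nothing matches.
 * Returns 1 if a non-empty type was written to s, 0 otherwise. */
static int demo_type_sized(char *s, size_t ssize, const char *ext, int flag) {
  size_t i;

  assert(s != NULL && ssize > 0 && ext != NULL);
  s[0] = '\0';
  for (i = 0; demo_mime[i][1] != NULL; i++) {
    if (strcmp(demo_mime[i][1], ext) == 0) {
      const char *const type = demo_mime[i][0];

      if (strlen(type) >= ssize) /* would not fit: report "no type" */
        return 0;
      strcpy(s, type);
      return s[0] != '\0'; /* a match with an empty value counts as none */
    }
  }
  if (flag) {
    static const char fallback[] = "application/octet-stream";

    if (sizeof(fallback) > ssize)
      return 0;
    memcpy(s, fallback, sizeof(fallback));
    return 1;
  }
  return 0;
}
```

Callers can then branch on the return value alone, as the updated call sites in this diff do, instead of pre-clearing the buffer and testing whether it is still empty.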
@@ -4073,9 +4105,9 @@ int get_userhttptype(httrackp * opt, char *s, const char *fil) {
return 0;
}

// renvoyer extesion d'un type mime..
// ex: "image/gif" -> gif
void give_mimext(char *s, size_t ssize, const char *st) {
// give the file extension for a mime type (e.g. "image/gif" -> "gif")
// returns 1 if an extension was found (and written to s), 0 otherwise
int give_mimext(char *s, size_t ssize, const char *st) {
int ok = 0;
int j = 0;

@@ -4110,6 +4142,7 @@ void give_mimext(char *s, size_t ssize, const char *st) {
}
}
}
return ok;
}

// extension connue?..
@@ -4207,9 +4240,8 @@ int may_bogus_multiple(httrackp * opt, const char *mime, const char *filename) {
if (strfield2(hts_mime_bogus_multiple[j], mime)) { /* found mime type in suspicious list */
char ext[64];

ext[0] = '\0';
give_mimext(ext, sizeof(ext), mime);
if (ext[0] != 0) { /* we have an extension for that */
if (give_mimext(ext, sizeof(ext),
mime)) { /* we have an extension for that */
const size_t ext_size = strlen(ext);
const char *file = strrchr(filename, '/'); /* fetch terminal filename */

11
src/htslib.h
@@ -252,7 +252,7 @@ int ishtml_ext(const char *a);
int ishttperror(int err);

int get_userhttptype(httrackp * opt, char *s, const char *fil);
void give_mimext(char *s, size_t ssize, const char *st);
int give_mimext(char *s, size_t ssize, const char *st);

int may_bogus_multiple(httrackp * opt, const char *mime, const char *filename);
int may_unknown2(httrackp * opt, const char *mime, const char *filename);
@@ -500,7 +500,8 @@ HTS_STATIC int is_hypertext_mime(httrackp * opt, const char *mime,
char guessed[256];

guessed[0] = '\0';
guess_httptype(opt, guessed, file);
if (!guess_httptype_sized(opt, guessed, sizeof(guessed), file))
return 0;
return is_hypertext_mime__(guessed);
}
return 0;
@@ -515,7 +516,8 @@ HTS_STATIC int may_be_hypertext_mime(httrackp * opt, const char *mime,
char guessed[256];

guessed[0] = '\0';
guess_httptype(opt, guessed, file);
if (!guess_httptype_sized(opt, guessed, sizeof(guessed), file))
return 0;
return may_be_hypertext_mime__(guessed);
}
return 0;
@@ -530,7 +532,8 @@ HTS_STATIC int compare_mime(httrackp * opt, const char *mime, const char *file,
char guessed[256];

guessed[0] = '\0';
guess_httptype(opt, guessed, file);
if (!guess_httptype_sized(opt, guessed, sizeof(guessed), file))
return 0;
return strfield2(guessed, reference);
}
return 0;
@@ -51,12 +51,13 @@ Please visit our Website: http://www.httrack.com
url_savename_addstr(afs->save, buff);\
}

#define ADD_STANDARD_NAME(shortname) \
{ /* ajout nom */\
char BIGSTK buff[HTS_URLMAXSIZE*2];\
standard_name(buff,dot_pos,nom_pos,fil_complete,(shortname));\
url_savename_addstr(afs->save, buff);\
}
#define ADD_STANDARD_NAME(shortname) \
{ /* add name */ \
char BIGSTK buff[HTS_URLMAXSIZE * 2]; \
standard_name(buff, sizeof(buff), dot_pos, nom_pos, fil_complete, \
(shortname)); \
url_savename_addstr(afs->save, buff); \
}

/* Avoid stupid DOS system folders/file such as 'nul' */
/* Based on linux/fs/umsdos/mangle.c */
@@ -200,7 +201,7 @@ int url_savename(lien_adrfilsave *const afs,
// foo.com/bar//foobar -> foo.com/bar/foobar
if (opt->urlhack) {
// copy of adr (without protocol), used for lookups (see urlhack)
normadr = adr_normalized(adr, normadr_);
normadr = adr_normalized_sized(adr, normadr_, sizeof(normadr_));
normfil = fil_normalized(fil_complete, normfil_);
} else {
if (link_has_authority(adr_complete)) { // https or other protocols : in "http/" subfolder
@@ -344,8 +345,7 @@ int url_savename(lien_adrfilsave *const afs,
mime[0] = ext[0] = '\0';
get_userhttptype(opt, mime, fil);
if (strnotempty(mime)) {
give_mimext(ext, sizeof(ext), mime);
if (strnotempty(ext)) {
if (give_mimext(ext, sizeof(ext), mime)) {
ext_chg = 1;
}
}
@@ -378,8 +378,8 @@ int url_savename(lien_adrfilsave *const afs,
ext_chg = 2; /* change filename */
strcpybuff(ext, r.cdispo);
} else if (!may_unknown2(opt, r.contenttype, fil)) { // on peut patcher à priori?
give_mimext(s, sizeof(s), r.contenttype); // get extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
if (give_mimext(s, sizeof(s),
r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
@@ -403,8 +403,7 @@ int url_savename(lien_adrfilsave *const afs,
mime[0] = ext[0] = '\0';
get_userhttptype(opt, mime, fil);
if (strnotempty(mime)) {
give_mimext(ext, sizeof(ext), mime);
if (strnotempty(ext)) {
if (give_mimext(ext, sizeof(ext), mime)) {
ext_chg = 1;
}
}
@@ -420,10 +419,9 @@ int url_savename(lien_adrfilsave *const afs,
strcpybuff(ext, headers->r.cdispo);
} else if (!may_unknown2(opt, headers->r.contenttype, headers->url_fil)) { // on peut patcher à priori? (pas interdit ou pas de type)
char s[16];
s[0] = '\0';
give_mimext(s, sizeof(s),
headers->r.contenttype); // get extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
if (give_mimext(
s, sizeof(s),
headers->r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
@@ -438,7 +436,8 @@ int url_savename(lien_adrfilsave *const afs,
char mime_from_file[128];

mime_from_file[0] = 0;
get_httptype(opt, mime_from_file, fil, 1);
get_httptype_sized(opt, mime_from_file, sizeof(mime_from_file),
fil, 1);
if (!strnotempty(mime_from_file) || strcasecmp(mime_type, mime_from_file) != 0) { /* different mime for this type */
/* type change not forbidden (or no extension at all) */
if (!may_unknown2(opt, mime_type, fil)) {
@@ -647,9 +646,9 @@ int url_savename(lien_adrfilsave *const afs,
ext_chg = 2; /* change filename */
strcpybuff(ext, back[b].r.cdispo);
} else if (!may_unknown2(opt, back[b].r.contenttype, back[b].url_fil)) { // on peut patcher à priori? (pas interdit ou pas de type)
give_mimext(s, sizeof(s),
back[b].r.contenttype); // get extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
if (give_mimext(
s, sizeof(s),
back[b].r.contenttype)) { // recognized extension
ext_chg = 1;
strcpybuff(ext, s);
}
@@ -926,7 +925,7 @@ int url_savename(lien_adrfilsave *const afs,

pth[0] = n83[0] = '\0';
strncatbuff(pth, fil, (int) (nom_pos - fil) - 1);
long_to_83(opt->savename_83, n83, pth);
long_to_83(opt->savename_83, n83, sizeof(n83), pth);
htsbuff_cat(&sb, n83);
}
}
@@ -1308,7 +1307,7 @@ int url_savename(lien_adrfilsave *const afs,
if (opt->savename_83) {
char BIGSTK n83[HTS_URLMAXSIZE * 2];

long_to_83(opt->savename_83, n83, afs->save);
long_to_83(opt->savename_83, n83, sizeof(n83), afs->save);
strcpybuff(afs->save, n83);
}
// enforce stricter ISO9660 compliance (bug reported by Steffo Carlsson)
@@ -1379,7 +1378,9 @@ int url_savename(lien_adrfilsave *const afs,
   if (lastDot == NULL) {
     strcatbuff(afs->save, "." DELAYED_EXT);
   } else if (!IS_DELAYED_EXT(afs->save)) {
-    strcatbuff(lastDot, "." DELAYED_EXT);
+    /* lastDot points within afs->save; bound by the remaining capacity */
+    strlcatbuff(lastDot, "." DELAYED_EXT,
+                sizeof(afs->save) - (size_t) (lastDot - afs->save));
   }
 }
 // enforce 260-character path limit before inserting destination path
@@ -1584,41 +1585,41 @@ int url_savename(lien_adrfilsave *const afs,
   return 0;
 }

-/* nom avec md5 urilisé partout */
-void standard_name(char *b, const char *dot_pos, const char *nom_pos, const char *fil,
-                   int short_ver) {
+/* md5-based name used everywhere; builds into b (capacity bsize) */
+void standard_name(char *b, size_t bsize, const char *dot_pos,
+                   const char *nom_pos, const char *fil, int short_ver) {
   char md5[32 + 2];
+  htsbuff bb = htsbuff_ptr(b, bsize);

-  b[0] = '\0';
-  /* Nom */
+  /* Name */
   if (dot_pos) {
-    if (!short_ver) // Noms longs
-      strncatbuff(b, nom_pos, (dot_pos - nom_pos));
+    if (!short_ver) // long names
+      htsbuff_catn(&bb, nom_pos, (size_t) (dot_pos - nom_pos));
     else
-      strncatbuff(b, nom_pos, min(dot_pos - nom_pos, 8));
+      htsbuff_catn(&bb, nom_pos, (size_t) min(dot_pos - nom_pos, 8));
   } else {
-    if (!short_ver) // Noms longs
-      strcatbuff(b, nom_pos);
+    if (!short_ver) // long names
+      htsbuff_cat(&bb, nom_pos);
     else
-      strncatbuff(b, nom_pos, 8);
+      htsbuff_catn(&bb, nom_pos, 8);
   }
   /* MD5 - 16 bits */
-  strncatbuff(b, url_md5(md5, fil), 4);
+  htsbuff_catn(&bb, url_md5(md5, fil), 4);
   /* Ext */
   if (dot_pos) {
-    strcatbuff(b, ".");
-    if (!short_ver) // Noms longs
-      strcatbuff(b, dot_pos + 1);
+    htsbuff_catc(&bb, '.');
+    if (!short_ver) // long names
+      htsbuff_cat(&bb, dot_pos + 1);
     else
-      strncatbuff(b, dot_pos + 1, 3);
+      htsbuff_catn(&bb, dot_pos + 1, 3);
   }
   // Allow extensionless
 #ifdef DO_NOT_ALLOW_EXTENSIONLESS
   else {
-    if (!short_ver) // Noms longs
-      strcatbuff(b, DEFAULT_EXT);
+    if (!short_ver) // long names
+      htsbuff_cat(&bb, DEFAULT_EXT);
     else
-      strcatbuff(b, DEFAULT_EXT_SHORT);
+      htsbuff_cat(&bb, DEFAULT_EXT_SHORT);
   }
 #endif
 }

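The `lastDot` hunk above appends through a pointer that sits inside `afs->save`, so the bound passed to the sized call is the capacity left from that pointer, not the size of the whole array. A minimal sketch of that arithmetic follows. It assumes a hypothetical `bounded_append` helper standing in for HTTrack's `strlcatbuff`, whose definition is not part of this diff, and `append_at` is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not HTTrack's strlcatbuff): append src to the
   NUL-terminated string at dst without writing past dst[cap - 1].
   Returns 1 if src fit entirely, 0 if it was truncated. */
static int bounded_append(char *dst, size_t cap, const char *src) {
  assert(dst != NULL && src != NULL && cap > 0);
  const char *end = memchr(dst, '\0', cap);
  if (end == NULL)
    return 0; /* dst is not terminated within cap: leave it alone */
  size_t used = (size_t) (end - dst);
  size_t room = cap - used - 1; /* bytes left before the terminator */
  size_t len = strlen(src);
  size_t n = len < room ? len : room;
  memcpy(dst + used, src, n);
  dst[used + n] = '\0';
  return n == len;
}

/* Append suffix to the string starting at p, where p points inside buf.
   The usable capacity from p is bufsize minus the offset of p. */
static int append_at(char *buf, size_t bufsize, char *p, const char *suffix) {
  assert(p >= buf && p < buf + bufsize);
  return bounded_append(p, bufsize - (size_t) (p - buf), suffix);
}
```

Calling `append_at(buf, sizeof(buf), strrchr(buf, '.'), ".delayed")` on a 16-byte buffer holding `index.html` truncates instead of overflowing. Passing `sizeof(buf)` unchanged as the bound would overstate the room by the offset of the dot.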
@@ -96,8 +96,8 @@ int url_savename(lien_adrfilsave *const afs,
                  httrackp * opt, struct_back * sback, cache_back * cache,
                  hash_struct * hash, int ptr, int numero_passe,
                  const lien_back * headers);
-void standard_name(char *b, const char *dot_pos, const char *nom_pos,
-                   const char *fil_complete,
+void standard_name(char *b, size_t bsize, const char *dot_pos,
+                   const char *nom_pos, const char *fil_complete,
                    int short_ver);
 void url_savename_addstr(char *d, const char *s);
 char *url_md5(char *digest_buffer, const char *fil_complete);

@@ -499,9 +499,9 @@ struct htsblk {
 FILE *out; // écriture directe sur disque (si is_write=1)
 LLint size; // taille fichier
 char msg[80]; // message éventuel si échec ("\0"=non précisé)
-char contenttype[64]; // content-type ("text/html" par exemple)
-char charset[64]; // charset ("iso-8859-1" par exemple)
-char contentencoding[64]; // content-encoding ("gzip" par exemple)
+char contenttype[HTS_MIMETYPE_SIZE]; // content-type (e.g. "text/html")
+char charset[HTS_MIMETYPE_SIZE]; // charset (e.g. "iso-8859-1")
+char contentencoding[HTS_MIMETYPE_SIZE]; // content-encoding (e.g. "gzip")
 char *location; // on copie dedans éventuellement la véritable 'location'
 LLint totalsize; // taille totale à télécharger (-1=inconnue)
 short int is_file; // ce n'est pas une socket mais un descripteur de fichier si 1

@@ -610,11 +610,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
       b = strchr(a, '<'); // prochain tag
     }
   }
-  if (lienrelatif
-      (tempo, heap(ptr)->sav,
-       concat(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt),
-              StringBuff(opt->path_html_utf8),
-              "index.html")) == 0) {
+  if (lienrelatif(tempo, sizeof(tempo), heap(ptr)->sav,
+                  concat(OPT_GET_BUFF(opt),
+                         OPT_GET_BUFF_SIZE(opt),
+                         StringBuff(opt->path_html_utf8),
+                         "index.html")) == 0) {
     detect_title = 1; // ok détecté pour cette page!
     makeindex_links++; // un de plus
     strcpybuff(makeindex_firstlink, tempo);
@@ -1649,8 +1649,9 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
 }
 // Prendre si extension reconnue
 if (!url_ok) {
-  get_httptype(opt, type, tempo, 0);
-  if (strnotempty(type)) // type reconnu!
+  if (get_httptype_sized(opt, type,
+                         sizeof(type), tempo,
+                         0)) // recognized type
     url_ok = 1;
   else if (is_dyntype(get_ext(OPT_GET_BUFF(opt), OPT_GET_BUFF_SIZE(opt), tempo))) // reconnu php,cgi,asp..
     url_ok = 1;
@@ -2719,7 +2720,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {

 strcpybuff(save, StringBuff(opt->path_html_utf8));
 strcatbuff(save, cat_name);
-if (lienrelatif(tempo, save, relativesavename()) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), save,
+                relativesavename()) == 0) {
   /* Never escape high-chars (we don't know the encoding!!) */
   inplace_escape_uri_utf(tempo, sizeof(tempo)); // escape with %xx
   //if (!no_esc_utf)
@@ -2949,7 +2951,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
 tempo[0] = '\0';
 // calculer le lien relatif

-if (lienrelatif(tempo, afs.save, relativesavename()) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), afs.save,
+                relativesavename()) == 0) {
   if (!in_media) { // In media (such as real audio): don't patch
     /* Never escape high-chars (we don't know the encoding!!) */
     inplace_escape_uri_utf(tempo, sizeof(tempo));
@@ -3507,9 +3510,9 @@ int hts_mirror_check_moved(htsmoduleStruct * str,
 char BIGSTK pn_adr[HTS_URLMAXSIZE * 2], pn_fil[HTS_URLMAXSIZE * 2];

 n_adr[0] = n_fil[0] = '\0';
-(void) adr_normalized(moved->adr, n_adr);
+(void) adr_normalized_sized(moved->adr, n_adr, sizeof(n_adr));
 (void) fil_normalized(moved->fil, n_fil);
-(void) adr_normalized(urladr(), pn_adr);
+(void) adr_normalized_sized(urladr(), pn_adr, sizeof(pn_adr));
 (void) fil_normalized(urlfil(), pn_fil);
 if (strcasecmp(n_adr, pn_adr) == 0
     && strcasecmp(n_fil, pn_fil) == 0) {

@@ -274,7 +274,9 @@ int ident_url_relatif(const char *lien, const char *origin_adr,
 char *const idna = hts_convertStringUTF8ToIDNA(a, strlen(a));
 if (idna != NULL) {
   if (strlen(idna) < HTS_URLMAXSIZE) {
-    strcpybuff(a, idna);
+    /* a points within adrfil->adr; bound by the remaining capacity */
+    strlcpybuff(a, idna,
+                sizeof(adrfil->adr) - (size_t) (a - adrfil->adr));
   }
   free(idna);
 }
@@ -286,7 +288,7 @@ int ident_url_relatif(const char *lien, const char *origin_adr,

 // créer dans s, à partir du chemin courant curr_fil, le lien vers link (absolu)
 // un ident_url_relatif a déja été fait avant, pour que link ne soit pas un chemin relatif
-int lienrelatif(char *s, const char *link, const char *curr_fil) {
+int lienrelatif(char *s, size_t ssize, const char *link, const char *curr_fil) {
   char BIGSTK _curr[HTS_URLMAXSIZE * 2];
   char BIGSTK newcurr_fil[HTS_URLMAXSIZE * 2], newlink[HTS_URLMAXSIZE * 2];
   char *curr;
@@ -314,9 +316,9 @@ int lienrelatif(char *s, const char *link, const char *curr_fil) {
   }
 }

-// recopier uniquement le chemin courant
+// copy only the current path
 curr = _curr;
-strcpybuff(curr, curr_fil);
+strlcpybuff(curr, curr_fil, sizeof(_curr));
 if ((a = strchr(curr, '?')) == NULL) // couper au ? (params)
   a = curr + strlen(curr) - 1; // pas de params: aller à la fin
 while((*a != '/') && (a > curr))
@@ -359,14 +361,14 @@ int lienrelatif(char *s, const char *link, const char *curr_fil) {
   a++;
 while(*a)
   if (*(a++) == '/')
-    strcatbuff(s, "../");
+    strlcatbuff(s, "../", ssize);
 //if (strlen(s)==0) strcatbuff(s,"/");

 if (slash)
-  strcatbuff(s, "/"); // garder absolu!!
+  strlcatbuff(s, "/", ssize); // keep it absolute!

-// on est dans le répertoire de départ, copier
-strcatbuff(s, link + ((*link == '/') ? 1 : 0));
+// we are in the starting directory, copy
+strlcatbuff(s, link + ((*link == '/') ? 1 : 0), ssize);

 /* Security check */
 if (strlen(s) >= HTS_URLMAXSIZE)
@@ -410,7 +412,7 @@ int link_has_authorization(const char *lien) {
 }

 // conversion chemin de fichier/dossier vers 8-3 ou ISO9660
-void long_to_83(int mode, char *n83, char *save) {
+void long_to_83(int mode, char *n83, size_t n83size, char *save) {
   n83[0] = '\0';

   while(*save) {
@@ -425,19 +427,19 @@ void long_to_83(int mode, char *n83, char *save) {
     }
     fnl[j] = '\0';
     // conversion
-    longfile_to_83(mode, fn83, fnl);
-    strcatbuff(n83, fn83);
+    longfile_to_83(mode, fn83, sizeof(fn83), fnl);
+    strlcatbuff(n83, fn83, n83size);

     save += i;
     if (*save == '/') {
-      strcatbuff(n83, "/");
+      strlcatbuff(n83, "/", n83size);
       save++;
     }
   }
 }

 // conversion nom de fichier/dossier isolé vers 8-3 ou ISO9660
-void longfile_to_83(int mode, char *n83, char *save) {
+void longfile_to_83(int mode, char *n83, size_t n83size, char *save) {
   int j = 0, max = 0;
   int i = 0;
   char nom[256];
@@ -526,10 +528,10 @@ void longfile_to_83(int mode, char *n83, char *save) {
   }
   // corriger vers 8-3
   n83[0] = '\0';
-  strncatbuff(n83, nom, max);
+  strlncatbuff(n83, nom, n83size, max);
   if (strnotempty(ext)) {
-    strcatbuff(n83, ".");
-    strncatbuff(n83, ext, 3);
+    strlcatbuff(n83, ".", n83size);
+    strlncatbuff(n83, ext, n83size, 3);
   }
 }


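The tail of `longfile_to_83` now bounds each append twice: by a character count (`max` for the base name, 3 for the extension) and by the destination capacity `n83size`. The sketch below illustrates that double bound. It assumes a hypothetical `append_n` helper in place of `strlncatbuff` (whose definition is not shown in this diff), a hypothetical `make_83` wrapper, and a fixed 8-character base name. The real function computes `max` itself and does more than this fragment covers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not HTTrack's strlncatbuff): append at most n bytes
   of src to the NUL-terminated string in dst, whose capacity is cap. */
static void append_n(char *dst, size_t cap, const char *src, size_t n) {
  assert(dst != NULL && src != NULL && cap > 0);
  const char *end = memchr(dst, '\0', cap);
  if (end == NULL)
    return; /* dst is not terminated within cap: leave it alone */
  size_t used = (size_t) (end - dst);
  size_t room = cap - used - 1; /* keep one byte for the terminator */
  size_t len = strlen(src);
  if (len > n)
    len = n; /* first bound: character count */
  if (len > room)
    len = room; /* second bound: destination capacity */
  memcpy(dst + used, src, len);
  dst[used + len] = '\0';
}

/* Illustrative 8.3 assembly: up to 8 name bytes, then "." and up to 3
   extension bytes, all limited by outsize. */
static void make_83(char *out, size_t outsize, const char *name,
                    const char *ext) {
  assert(out != NULL && outsize > 0);
  out[0] = '\0';
  append_n(out, outsize, name, 8);
  if (ext != NULL && ext[0] != '\0') {
    append_n(out, outsize, ".", 1);
    append_n(out, outsize, ext, 3);
  }
}
```

With a 16-byte output, `make_83(out, sizeof(out), "verylongfilename", "html")` yields `verylong.htm`. With a 6-byte output the capacity bound wins and only `veryl` is stored.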
@@ -61,11 +61,11 @@ typedef struct lien_adrfilsave lien_adrfilsave;
 int ident_url_relatif(const char *lien, const char *origin_adr,
                       const char *origin_fil,
                       lien_adrfil* const adrfil);
-int lienrelatif(char *s, const char *link, const char *curr);
+int lienrelatif(char *s, size_t ssize, const char *link, const char *curr);
 int link_has_authority(const char *lien);
 int link_has_authorization(const char *lien);
-void long_to_83(int mode, char *n83, char *save);
-void longfile_to_83(int mode, char *n83, char *save);
+void long_to_83(int mode, char *n83, size_t n83size, char *save);
+void longfile_to_83(int mode, char *n83, size_t n83size, char *save);
 HTS_INLINE int __rech_tageq(const char *adr, const char *s);
 HTS_INLINE int __rech_tageqbegdigits(const char *adr, const char *s);
 HTS_INLINE int rech_tageq_all(const char *adr, const char *s);

@@ -223,8 +223,9 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
 // note (up/down): on calcule à partir du lien primaire, ET du lien précédent.
 // ex: si on descend 2 fois on peut remonter 1 fois

-if (lienrelatif(tempo, fil, heap(heap(ptr)->premier)->fil) == 0) {
-  if (lienrelatif(tempo2, fil, heap(ptr)->fil) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), fil,
+                heap(heap(ptr)->premier)->fil) == 0) {
+  if (lienrelatif(tempo2, sizeof(tempo2), fil, heap(ptr)->fil) == 0) {
     hts_log_print(opt, LOG_DEBUG,
                   "build relative links to test: %s %s (with %s and %s)",
                   tempo, tempo2, heap(heap(ptr)->premier)->fil,
@@ -326,8 +327,9 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
 char BIGSTK tempo[HTS_URLMAXSIZE * 2];
 char BIGSTK tempo2[HTS_URLMAXSIZE * 2];

-if (lienrelatif(tempo, fil, heap(heap(ptr)->premier)->fil) == 0) {
-  if (lienrelatif(tempo2, fil, heap(ptr)->fil) == 0) {
+if (lienrelatif(tempo, sizeof(tempo), fil,
+                heap(heap(ptr)->premier)->fil) == 0) {
+  if (lienrelatif(tempo2, sizeof(tempo2), fil, heap(ptr)->fil) == 0) {
   } else {
     hts_log_print(opt, LOG_ERROR,
                   "Error building relative link %s and %s", fil,
@@ -336,7 +338,6 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
     } else {
       hts_log_print(opt, LOG_ERROR, "Error building relative link %s and %s",
                     fil, heap(heap(ptr)->premier)->fil);
-
     }
   } // fin tester interdiction de monter


@@ -207,6 +207,9 @@ HTSEXT_API const char *jump_normalized_const(const char *);
 HTSEXT_API char *jump_toport(char *);
 HTSEXT_API const char *jump_toport_const(const char *);
 HTSEXT_API char *fil_normalized(const char *source, char *dest);
+HTSEXT_API char *adr_normalized_sized(const char *source, char *dest,
+                                      size_t destsize);
+HTS_DEPRECATED("use adr_normalized_sized(source, dest, destsize)")
 HTSEXT_API char *adr_normalized(const char *source, char *dest);
 HTSEXT_API const char *hts_rootdir(char *file);

@@ -244,6 +247,9 @@ HTSEXT_API char *unescape_http_unharm(char *const catbuff, const size_t size, co
 HTSEXT_API char *antislash_unescaped(char *catbuff, const char *s);

 HTSEXT_API void escape_remove_control(char *s);
+HTSEXT_API int get_httptype_sized(httrackp *opt, char *s, size_t ssize,
+                                  const char *fil, int flag);
+HTS_DEPRECATED("use get_httptype_sized(opt, s, ssize, fil, flag)")
 HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil,
                              int flag);
 HTSEXT_API int is_knowntype(httrackp * opt, const char *fil);
@@ -251,6 +257,9 @@ HTSEXT_API int is_userknowntype(httrackp * opt, const char *fil);
 HTSEXT_API int is_dyntype(const char *fil);
 HTSEXT_API const char *get_ext(char *catbuff, size_t size, const char *fil);
 HTSEXT_API int may_unknown(httrackp * opt, const char *st);
+HTSEXT_API int guess_httptype_sized(httrackp *opt, char *s, size_t ssize,
+                                    const char *fil);
+HTS_DEPRECATED("use guess_httptype_sized(opt, s, ssize, fil)")
 HTSEXT_API void guess_httptype(httrackp * opt, char *s, const char *fil);

 /* Ugly string tools */

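The `_sized` declarations above take the destination capacity and return `int`, and the call sites in this diff test that return value directly instead of calling the old `void` function and then checking `strnotempty()`. The sketch below shows that calling convention only. The table, the names `demo_types` and `demo_mime_sized`, and the choice to report "not recognized" when the type would not fit are all assumptions for illustration, not HTTrack's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension table, for illustration only. */
static const struct {
  const char *ext;
  const char *mime;
} demo_types[] = {
  {"html", "text/html"},
  {"png", "image/png"},
  {"css", "text/css"},
};

/* Hypothetical sized lookup: writes the MIME type for fil into s (capacity
   ssize) and returns 1 if one was recognized and fit, 0 otherwise.
   On failure s is left as the empty string. */
static int demo_mime_sized(char *s, size_t ssize, const char *fil) {
  assert(s != NULL && ssize > 0 && fil != NULL);
  s[0] = '\0';
  const char *dot = strrchr(fil, '.');
  if (dot == NULL)
    return 0;
  for (size_t i = 0; i < sizeof(demo_types) / sizeof(demo_types[0]); i++) {
    if (strcmp(dot + 1, demo_types[i].ext) == 0) {
      size_t len = strlen(demo_types[i].mime);
      if (len >= ssize)
        return 0; /* would not fit: report failure rather than truncate */
      memcpy(s, demo_types[i].mime, len + 1);
      return 1;
    }
  }
  return 0;
}
```

A caller then writes `if (demo_mime_sized(type, sizeof(type), path)) { ... }`, which keeps the capacity next to the buffer it describes and removes the separate emptiness check.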