mirror of
https://github.com/xroche/httrack.git
synced 2026-06-14 22:33:54 +03:00
Compare commits: docs/gover ... tests/cach (31 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 83ff148efd | | |
| | 50bb02e729 | | |
| | b80ee793ac | | |
| | d12456c1e8 | | |
| | a52a2b146c | | |
| | 226a38d3d0 | | |
| | 1e463f65a5 | | |
| | 09ed9968cd | | |
| | ad6915e3cc | | |
| | 4a5580dec0 | | |
| | f1d35e7691 | | |
| | 6d7db83726 | | |
| | 335c2c4b2a | | |
| | edd52bf3be | | |
| | 9eb2a344a9 | | |
| | 348a7d8cb2 | | |
| | 5f81741ac5 | | |
| | 0cf14c4e88 | | |
| | 29a07ff487 | | |
| | f987083f14 | | |
| | eb565f0bd8 | | |
| | 71398d510e | | |
| | 75fc040f06 | | |
| | c4ef18f5a5 | | |
| | d76dad47f7 | | |
| | 9c6ff54040 | | |
| | 4a057514b9 | | |
| | 055e17b057 | | |
| | d7bb97d697 | | |
| | d741188980 | | |
| | ca810ef7e3 | | |
27
.clang-format
Normal file
@@ -0,0 +1,27 @@
+# clang-format 19 config for the HTTrack C engine.
+#
+# IMPORTANT: this is applied to TOUCHED LINES ONLY (via git-clang-format / the
+# CI format check). The engine was originally formatted by GNU indent / by hand
+# and does NOT round-trip through clang-format, so a whole-tree reformat is
+# intentionally never done. Format the lines you change; leave the rest.
+#
+# Reverse-engineered from src/*.c: 2-space indent, no tabs, 80 columns, pointers
+# bound to the name (char *x), attached braces, un-indented case labels, and a
+# space after C-style casts ((int) x). Most of that is LLVM's defaults; the
+# lines below are the deliberate deviations.
+
+BasedOnStyle: LLVM
+
+# Engine specifics / deviations from LLVM:
+SpaceAfterCStyleCast: true # "(int) x", overwhelmingly dominant (542 vs 7)
+SortIncludes: false # C include order can be significant; never reorder
+IncludeBlocks: Preserve # do not merge/reflow include groups
+
+# Stated explicitly for robustness against base-style drift (these match LLVM):
+IndentWidth: 2
+UseTab: Never
+ColumnLimit: 80
+PointerAlignment: Right
+IndentCaseLabels: false
+SpaceBeforeParens: ControlStatements
+AllowShortIfStatementsOnASingleLine: Never
35
.githooks/README.md
Normal file
@@ -0,0 +1,35 @@
+# Git hooks
+
+Versioned hooks for this repo. Enable them once per clone:
+
+```sh
+git config core.hooksPath .githooks
+```
+
+## pre-commit: auto-format changed C lines
+
+Runs `git-clang-format` (clang-format 19, using the repo `.clang-format`) on the
+**staged lines only** and re-stages the result, so every commit is
+clang-format-clean and the CI `format` check passes. It never reformats the
+whole tree, only the lines you changed.
+
+- Disable for a single commit: `HTTRACK_NO_AUTOFORMAT=1 git commit ...`
+- If clang-format 19 isn't installed, the hook skips silently (CI still
+  enforces). Install it with your distro's `clang-format-19`, or from
+  apt.llvm.org.
+- If a file has *both* staged and unstaged changes, the hook does not
+  auto-mutate it (that would commit the unstaged part); it instead reports
+  whether its staged lines need formatting and asks you to stage/stash the rest.
+
+### noexec working trees
+
+Git executes the hook directly, so if your working tree is on a `noexec` mount
+git cannot run `.githooks/pre-commit`. Point `core.hooksPath` at a copy on an
+exec filesystem instead:
+
+```sh
+mkdir -p ~/.httrack-hooks && cp .githooks/pre-commit ~/.httrack-hooks/
+chmod +x ~/.httrack-hooks/pre-commit
+git config core.hooksPath ~/.httrack-hooks
+```
+</content>
71
.githooks/pre-commit
Executable file
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+#
+# Auto-format the staged C lines with clang-format (touched lines only), then
+# re-stage them, so commits stay clang-format-clean and CI's format check passes.
+#
+# Enable once per clone: git config core.hooksPath .githooks
+# Skip for one commit: HTTRACK_NO_AUTOFORMAT=1 git commit ...
+#
+# Matches the CI gate (.clang-format, clang-format 19). It only ever touches the
+# lines a commit changes; it never reformats the whole tree.
+
+set -euo pipefail
+
+[ "${HTTRACK_NO_AUTOFORMAT:-}" = "1" ] && exit 0
+
+# Staged C/H files (added/copied/modified/renamed).
+mapfile -t files < <(git diff --cached --name-only --diff-filter=ACMR -- '*.c' '*.h')
+[ "${#files[@]}" -eq 0 ] && exit 0
+
+# Locate clang-format 19 and the git driver; if absent, skip (CI is the backstop).
+cf=""
+for c in clang-format-19 clang-format; do
+    if command -v "$c" >/dev/null 2>&1; then
+        case "$("$c" --version)" in *"version 19."*)
+            cf="$c"
+            break
+            ;;
+        esac
+    fi
+done
+gcf=""
+for g in git-clang-format-19 git-clang-format; do
+    command -v "$g" >/dev/null 2>&1 && {
+        gcf="$g"
+        break
+    }
+done
+if [ -z "$cf" ] || [ -z "$gcf" ]; then
+    echo "pre-commit: clang-format 19 not found; skipping auto-format (CI still checks)." >&2
+    exit 0
+fi
+
+# Files that are staged AND also have unstaged changes: re-staging them would
+# pull in the unstaged work, so don't auto-mutate. Check instead and let the
+# author resolve it.
+partial=()
+for f in "${files[@]}"; do
+    if ! git diff --quiet -- "$f"; then partial+=("$f"); fi
+done
+
+if [ "${#partial[@]}" -ne 0 ]; then
+    d="$("$gcf" --binary "$cf" --style=file --staged --diff --extensions c,h || true)"
+    case "$d" in
+    "" | "no modified files to format" | *"did not modify any files"*)
+        exit 0
+        ;; # staged lines already clean
+    *)
+        echo "pre-commit: these files have both staged and unstaged changes, so" >&2
+        echo "auto-format was skipped to avoid committing unstaged work:" >&2
+        printf ' %s\n' "${partial[@]}" >&2
+        echo "Their staged lines need formatting. Stage the rest (or stash it)," >&2
+        echo "or run: $gcf --binary $cf --staged" >&2
+        exit 1
+        ;;
+    esac
+fi
+
+# Clean-staged files: format the staged lines in the working tree, then re-stage.
+"$gcf" --binary "$cf" --style=file --staged --extensions c,h >/dev/null || true
+git add -- "${files[@]}"
+exit 0
62
.github/workflows/ci.yml
vendored
@@ -112,7 +112,65 @@ jobs:
 
       # Lint the scripts we maintain; the legacy scripts are a separate cleanup.
       - name: shellcheck
-        run: shellcheck man/makeman.sh tools/mkdeb.sh tests/*.test tests/check-network.sh
+        run: shellcheck man/makeman.sh tools/mkdeb.sh .githooks/pre-commit tests/*.test tests/check-network.sh
 
       - name: shfmt
-        run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh
+        run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh .githooks/pre-commit
+
+  # Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
+  # (it was shaped by an old Visual Studio formatter) and does not round-trip,
+  # so we never reformat the whole tree -- only the lines a PR touches.
+  format:
+    name: format (clang-format-19, changed lines)
+    if: github.event_name == 'pull_request'
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Install clang-format 19 (pinned, from apt.llvm.org)
+        run: |
+          set -euo pipefail
+          # ubuntu-24.04's native clang-format is 18; pin 19 to match local dev.
+          wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
+            | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
+          echo "deb http://apt.llvm.org/noble/ llvm-toolchain-noble-19 main" \
+            | sudo tee /etc/apt/sources.list.d/llvm-19.list >/dev/null
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends clang-format-19
+          # git-clang-format driver, pinned to an immutable release tag (not a
+          # moving branch) since we curl and then execute it.
+          sudo curl -fsSL -o /usr/local/bin/git-clang-format \
+            https://raw.githubusercontent.com/llvm/llvm-project/llvmorg-19.1.7/clang/tools/clang-format/git-clang-format
+          sudo chmod 0755 /usr/local/bin/git-clang-format
+          clang-format-19 --version
+
+      - name: Check formatting of changed lines
+        run: |
+          set -euo pipefail
+          git fetch --no-tags origin \
+            "+refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}"
+          base="origin/${{ github.base_ref }}"
+          set +e
+          diff="$(git clang-format --binary clang-format-19 --style=file \
+            --diff --extensions c,h "$base")"
+          rc=$?
+          set -e
+          # Classify by output first: a non-empty diff means "not clean",
+          # regardless of the driver's exit convention (the release-tag driver
+          # exits 0 and signals via stdout; some packaged drivers exit 1 on a
+          # diff). A nonzero exit with clean output is a real checker error.
+          case "$diff" in
+            "" | "no modified files to format" | *"did not modify any files"*)
+              if [ "$rc" -ne 0 ]; then
+                echo "::error::git clang-format failed (exit $rc): checker error."
+                exit 1
+              fi
+              echo "Formatting OK: changed C lines are clang-format-clean." ;;
+            *)
+              echo "$diff"
+              echo "::error::Changed C lines are not clang-format-clean."
+              echo "Fix locally with: git clang-format --binary clang-format-19 $base"
+              exit 1 ;;
+          esac
@@ -56,6 +56,7 @@ whttrackrundir = $(bindir)
 whttrackrun_SCRIPTS = webhttrack
 
 libhttrack_la_SOURCES = htscore.c htsparse.c htsback.c htscache.c \
+	htscache_selftest.c \
 	htscatchurl.c htsfilters.c htsftp.c htshash.c coucal/coucal.c \
 	htshelp.c htslib.c htscoremain.c \
 	htsname.c htsrobots.c htstools.c htswizard.c \
@@ -65,7 +66,7 @@ libhttrack_la_SOURCES = htscore.c htsparse.c htsback.c htscache.c \
 	md5.c \
 	minizip/ioapi.c minizip/mztools.c minizip/unzip.c minizip/zip.c \
 	hts-indextmpl.h htsalias.h htsback.h htsbase.h htssafe.h \
-	htsbasenet.h htsbauth.h htscache.h htscatchurl.h \
+	htsbasenet.h htsbauth.h htscache.h htscache_selftest.h htscatchurl.h \
 	htsconfig.h htscore.h htsparse.h htscoremain.h htsdefines.h \
 	htsfilters.h htsftp.h htsglobal.h htshash.h coucal/coucal.h \
 	htshelp.h htsindex.h htslib.h htsmd5.h \
@@ -266,13 +266,18 @@ const char *hts_optalias[][4] = {
   return value: number of arguments treated (0 if error)
 */
 int optalias_check(int argc, const char *const *argv, int n_arg,
-                   int *return_argc, char **return_argv, char *return_error) {
+                   int *return_argc, char **return_argv,
+                   size_t return_argv_size, char *return_error,
+                   size_t return_error_size) {
   return_error[0] = '\0';
   *return_argc = 1;
   if (argv[n_arg][0] == '-')
     if (argv[n_arg][1] == '-') {
-      char command[1000];
-      char param[1000];
+      /* sized to HTS_CDLMAXSIZE: a long-form option value (--user-agent,
+         --headers, ...) is copied into param, and the value is bounded by the
+         general per-argument check in htscoremain.c (HTS_CDLMAXSIZE) */
+      char command[HTS_CDLMAXSIZE];
+      char param[HTS_CDLMAXSIZE];
       char addcommand[256];
 
       /* */
@@ -320,9 +325,10 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
         /* Copy parameters? */
         if (need_param == 2) {
           if ((n_arg + 1 >= argc) || (argv[n_arg + 1][0] == '-')) { /* no supplemental parameter */
-            sprintf(return_error,
-                    "Syntax error:\n\tOption %s needs to be followed by a parameter: %s <param>\n\t%s\n",
-                    command, command, _NOT_NULL(optalias_help(command)));
+            snprintf(return_error, return_error_size,
+                     "Syntax error:\n\tOption %s needs to be followed by a "
+                     "parameter: %s <param>\n\t%s\n",
+                     command, command, _NOT_NULL(optalias_help(command)));
             return 0;
           }
           strcpybuff(param, argv[n_arg + 1]);
@@ -335,35 +341,36 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
 
         /* Must be alone (-P /tmp) */
         if (strcmp(hts_optalias[pos][2], "param1") == 0) {
-          strcpybuff(return_argv[0], command);
-          strcpybuff(return_argv[1], param);
+          strlcpybuff(return_argv[0], command, return_argv_size);
+          strlcpybuff(return_argv[1], param, return_argv_size);
           *return_argc = 2; /* 2 parameters returned */
         }
         /* Alone with parameter (+*.gif) */
         else if (strcmp(hts_optalias[pos][2], "param0") == 0) {
           /* Command */
-          strcpybuff(return_argv[0], command);
-          strcatbuff(return_argv[0], param);
+          strlcpybuff(return_argv[0], command, return_argv_size);
+          strlcatbuff(return_argv[0], param, return_argv_size);
         }
         /* Together (-c8) */
         else {
           /* Command */
-          strcpybuff(return_argv[0], command);
+          strlcpybuff(return_argv[0], command, return_argv_size);
           /* Parameters accepted */
           if (strncmp(hts_optalias[pos][2], "param", 5) == 0) {
             /* --cache=off or --index=on */
             if (strcmp(param, "off") == 0)
-              strcatbuff(return_argv[0], "0");
+              strlcatbuff(return_argv[0], "0", return_argv_size);
             else if (strcmp(param, "on") == 0) {
               // on is the default
               // strcatbuff(return_argv[0],"1");
             } else
-              strcatbuff(return_argv[0], param);
+              strlcatbuff(return_argv[0], param, return_argv_size);
           }
           *return_argc = 1; /* 1 parameter returned */
         }
       } else {
-        sprintf(return_error, "Unknown option: %s\n", command);
+        snprintf(return_error, return_error_size, "Unknown option: %s\n",
+                 command);
         return 0;
       }
       return need_param;
@@ -377,15 +384,16 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
         if ((strcmp(hts_optalias[pos][2], "param1") == 0)
             || (strcmp(hts_optalias[pos][2], "param0") == 0)) {
           if ((n_arg + 1 >= argc) || (argv[n_arg + 1][0] == '-')) { /* no supplemental parameter */
-            sprintf(return_error,
-                    "Syntax error:\n\tOption %s needs to be followed by a parameter: %s <param>\n\t%s\n",
-                    argv[n_arg], argv[n_arg],
-                    _NOT_NULL(optalias_help(argv[n_arg])));
+            snprintf(return_error, return_error_size,
+                     "Syntax error:\n\tOption %s needs to be followed by a "
+                     "parameter: %s <param>\n\t%s\n",
+                     argv[n_arg], argv[n_arg],
+                     _NOT_NULL(optalias_help(argv[n_arg])));
             return 0;
           }
           /* Copy parameters */
-          strcpybuff(return_argv[0], argv[n_arg]);
-          strcpybuff(return_argv[1], argv[n_arg + 1]);
+          strlcpybuff(return_argv[0], argv[n_arg], return_argv_size);
+          strlcpybuff(return_argv[1], argv[n_arg + 1], return_argv_size);
           /* And return */
           *return_argc = 2; /* 2 parameters returned */
           return 2; /* 2 parameters used */
@@ -394,7 +402,7 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
     }
 
   /* Copy and return other unknown option */
-  strcpybuff(return_argv[0], argv[n_arg]);
+  strlcpybuff(return_argv[0], argv[n_arg], return_argv_size);
   return 1;
 }
 
@@ -521,9 +529,10 @@ int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
         strcatbuff(_tmp_argv[0], a);
         strcpybuff(_tmp_argv[1], b);
 
-        result =
-          optalias_check(2, (const char *const *) tmp_argv, 0, &return_argc,
-                         (tmp_argv + 2), return_error);
+        result = optalias_check(2, (const char *const *) tmp_argv, 0,
+                                &return_argc, (tmp_argv + 2),
+                                sizeof(_tmp_argv[0]), return_error,
+                                sizeof(return_error));
         if (!result) {
           printf("%s\n", return_error);
         } else {
@@ -38,7 +38,9 @@ Please visit our Website: http://www.httrack.com
 #ifdef HTS_INTERNAL_BYTECODE
 extern const char *hts_optalias[][4];
 int optalias_check(int argc, const char *const *argv, int n_arg,
-                   int *return_argc, char **return_argv, char *return_error);
+                   int *return_argc, char **return_argv,
+                   size_t return_argv_size, char *return_error,
+                   size_t return_error_size);
 int optalias_find(const char *token);
 const char *optalias_help(const char *token);
 int optreal_find(const char *token);
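The `optalias_check` hunks above replace `sprintf` with `snprintf` and thread an explicit capacity through to every write. A minimal sketch of that pattern, assuming nothing about HTTrack beyond what the diff shows: `report_unknown_option` is a hypothetical helper, not a function in the tree, and the truncation check on the return value is an illustration rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of HTTrack): format an "unknown option"
   message into a caller-supplied buffer of known capacity.
   Returns 1 if the whole message fit, 0 if it was truncated. */
static int report_unknown_option(char *err, size_t err_size,
                                 const char *option) {
  int n;

  assert(err != NULL && err_size > 0);
  /* snprintf writes at most err_size bytes, including the terminating NUL,
     and returns the length the complete message would have had. */
  n = snprintf(err, err_size, "Unknown option: %s\n", option);
  return n >= 0 && (size_t) n < err_size;
}
```

A caller would pass `sizeof(buffer)` for an array, as the `optinclude_file` hunk does with `sizeof(return_error)`, and an explicitly tracked size when only a pointer is available.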
@@ -102,7 +102,8 @@ int cookie_add(t_cookie * cookie, const char *cook_name, const char *cook_value,
   strcatbuff(cook, "\n");
   if (!((strlen(cookie->data) + strlen(cook)) < cookie->max_len))
     return -1; // impossible d'ajouter
-  cookie_insert(insert, cook);
+  cookie_insert(insert, cookie->max_len - (size_t) (insert - cookie->data),
+                cook);
 #if DEBUG_COOK
   printf("add_new cookie: name=\"%s\" value=\"%s\" domain=\"%s\" path=\"%s\"\n",
          cook_name, cook_value, domain, path);
@@ -118,7 +119,7 @@ int cookie_del(t_cookie * cookie, const char *cook_name, const char *domain, con
   b = cookie_find(cookie->data, cook_name, domain, path);
   if (b) {
     a = cookie_nextfield(b);
-    cookie_delete(b, a - b);
+    cookie_delete(b, cookie->max_len - (size_t) (b - cookie->data), a - b);
 #if DEBUG_COOK
     printf("deleted old cookie: %s %s %s\n", cook_name, domain, path);
 #endif
@@ -336,41 +337,44 @@ int cookie_save(t_cookie * cookie, const char *name) {
   return -1;
 }
 
-// insertion chaine ins avant s
-void cookie_insert(char *s, const char *ins) {
+// Insert string ins before s. s_size is the capacity of the buffer at s.
+void cookie_insert(char *s, size_t s_size, const char *ins) {
   char *buff;
 
-  if (strnotempty(s) == 0) { // rien à faire, juste concat
-    strcatbuff(s, ins);
+  if (strnotempty(s) == 0) { // nothing there yet: just concatenate
+    strlcatbuff(s, ins, s_size);
   } else {
     buff = (char *) malloct(strlen(s) + 1);
     if (buff) {
-      strcpybuff(buff, s); // copie temporaire
-      strcpybuff(s, ins); // insérer
-      strcatbuff(s, buff); // copier
+      strlcpybuff(buff, s, strlen(s) + 1); // temporary copy of s
+      strlcpybuff(s, ins, s_size); // write ins
+      strlcatbuff(s, buff, s_size); // then the saved content
       freet(buff);
     }
   }
 }
 
-// destruction chaine dans s position pos
-void cookie_delete(char *s, size_t pos) {
+// Delete the substring of s at position pos. s_size is the capacity at s.
+void cookie_delete(char *s, size_t s_size, size_t pos) {
   char *buff;
 
-  if (strnotempty(s + pos) == 0) { // rien à faire, effacer
+  if (strnotempty(s + pos) == 0) { // nothing after pos: truncate
     s[0] = '\0';
   } else {
     buff = (char *) malloct(strlen(s + pos) + 1);
     if (buff) {
-      strcpybuff(buff, s + pos); // copie temporaire
-      strcpybuff(s, buff); // copier
+      strlcpybuff(buff, s + pos, strlen(s + pos) + 1); // temporary copy
+      strlcpybuff(s, buff, s_size); // overwrite from start
       freet(buff);
     }
   }
 }
 
-// renvoie champ param de la chaine cookie_base
-// ex: cookie_get("ceci est<tab>un<tab>exemple",1) renvoi "un"
+// Return field <param> (0-based, tab-separated) of the cookie line cookie_base,
+// into buffer. ex: cookie_get("ceci est<tab>un<tab>exemple", 1) returns "un".
+// buffer must hold at least COOKIE_FIELD_BUFFER_SIZE bytes (all callers use
+// char[8192]).
+#define COOKIE_FIELD_BUFFER_SIZE 8192
 const char *cookie_get(char *buffer, const char *cookie_base, int param) {
   const char *limit;
 
@@ -394,11 +398,11 @@ const char *cookie_get(char *buffer, const char *cookie_base, int param) {
   if (cookie_base) {
     if (cookie_base < limit) {
       const char *a = cookie_base;
+      htsbuff b = htsbuff_ptr(buffer, COOKIE_FIELD_BUFFER_SIZE);
 
       while((*a) && (*a != '\t') && (*a != '\n'))
         a++;
-      buffer[0] = '\0';
-      strncatbuff(buffer, cookie_base, (int) (a - cookie_base));
+      htsbuff_catn(&b, cookie_base, (size_t) (a - cookie_base));
       return buffer;
     } else
       return "";
@@ -458,11 +462,13 @@ char *bauth_check(t_cookie * cookie, const char *adr, const char *fil) {
   return NULL;
 }
 
+/* Build the auth prefix (host + path, query stripped) into prefix.
+   Callers pass a buffer of HTS_URLMAXSIZE * 2 bytes. */
 char *bauth_prefix(char *prefix, const char *adr, const char *fil) {
   char *a;
 
-  strcpybuff(prefix, jump_identification_const(adr));
-  strcatbuff(prefix, fil);
+  strlcpybuff(prefix, jump_identification_const(adr), HTS_URLMAXSIZE * 2);
+  strlcatbuff(prefix, fil, HTS_URLMAXSIZE * 2);
   a = strchr(prefix, '?');
   if (a)
     *a = '\0';
@@ -67,8 +67,8 @@ int cookie_add(t_cookie * cookie, const char *cook_name, const char *cook_valu
 int cookie_del(t_cookie * cookie, const char *cook_name, const char *domain, const char *path);
 int cookie_load(t_cookie * cookie, const char *path, const char *name);
 int cookie_save(t_cookie * cookie, const char *name);
-void cookie_insert(char *s, const char *ins);
-void cookie_delete(char *s, size_t pos);
+void cookie_insert(char *s, size_t s_size, const char *ins);
+void cookie_delete(char *s, size_t s_size, size_t pos);
 const char *cookie_get(char *buffer, const char *cookie_base, int param);
 char *cookie_find(char *s, const char *cook_name, const char *domain, const char *path);
 char *cookie_nextfield(char *a);
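The cookie hunks pass the remaining capacity at the insertion point (`cookie->max_len - (insert - cookie->data)`) into `cookie_insert`. The sketch below shows the same idea in isolation. `insert_bounded` is a hypothetical stand-in, not the patched function: it uses `memmove` instead of a temporary heap copy, and it rejects an insertion that does not fit rather than relying on whatever `strlcpybuff`/`strlcatbuff` do on overflow, which this diff does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a capacity-aware insert: put `ins` in front of
   the NUL-terminated string at `s`, never touching more than s_size bytes.
   Returns 1 on success, 0 if the result would not fit (s is left as-is). */
static int insert_bounded(char *s, size_t s_size, const char *ins) {
  const size_t old_len = strlen(s);
  const size_t ins_len = strlen(ins);

  assert(s_size > 0 && old_len < s_size);
  if (ins_len >= s_size - old_len)
    return 0; /* no room for ins + old text + NUL */
  memmove(s + ins_len, s, old_len + 1); /* shift old text and its NUL */
  memcpy(s, ins, ins_len);              /* write ins into the gap */
  return 1;
}
```

When inserting in the middle of a larger buffer, the capacity argument is the space left from the insertion point to the end of the buffer, which is what the subtraction in the `cookie_add` hunk computes.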
140
src/htscache.c
@@ -196,12 +196,13 @@ struct cache_back_zip_entry {
   int compressionMethod;
 };
 
-#define ZIP_READFIELD_STRING(line, value, refline, refvalue) do { \
-  if (line[0] != '\0' && strfield2(line, refline)) { \
-    strcpybuff(refvalue, value); \
-    line[0] = '\0'; \
-  } \
-} while(0)
+#define ZIP_READFIELD_STRING(line, value, refline, refvalue, refvalue_size) \
+  do { \
+    if (line[0] != '\0' && strfield2(line, refline)) { \
+      strlcpybuff(refvalue, value, refvalue_size); \
+      line[0] = '\0'; \
+    } \
+  } while (0)
 #define ZIP_READFIELD_INT(line, value, refline, refvalue) do { \
   if (line[0] != '\0' && strfield2(line, refline)) { \
     int intval = 0; \
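The reworked `ZIP_READFIELD_STRING` takes the destination capacity as an argument because `sizeof` only works for array members; for a pointer member such as `r.location` the caller has to state the size of the buffer behind it. A small sketch of that distinction follows. Every name in it (`copy_bounded`, `READFIELD_STRING`, `struct demo_entry`, `demo_read`) is hypothetical, and `strcmp` stands in for HTTrack's `strfield2`, whose exact matching rules are not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy: always NUL-terminates, truncates if needed. */
static void copy_bounded(char *dst, size_t dst_size, const char *src) {
  size_t n = strlen(src);

  assert(dst_size > 0);
  if (n >= dst_size)
    n = dst_size - 1;
  memcpy(dst, src, n);
  dst[n] = '\0';
}

/* Same shape as the patched macro: the caller names the capacity. */
#define READFIELD_STRING(line, value, refline, refvalue, refvalue_size) \
  do {                                                                   \
    if ((line)[0] != '\0' && strcmp((line), (refline)) == 0) {           \
      copy_bounded((refvalue), (refvalue_size), (value));                \
      (line)[0] = '\0'; /* mark the header line as consumed */           \
    }                                                                    \
  } while (0)

struct demo_entry {
  char etag[8];   /* array member: sizeof gives the capacity */
  char *location; /* pointer member: sizeof would give the pointer size */
};

/* Hypothetical driver: stores `value` into the field named by `line`. */
static void demo_read(struct demo_entry *e, size_t location_size, char *line,
                      const char *value) {
  READFIELD_STRING(line, value, "Etag", e->etag, sizeof(e->etag));
  READFIELD_STRING(line, value, "Location", e->location, location_size);
}
```

Clearing `line[0]` after a match is what makes a chain of macro invocations behave like an if/else ladder: once one field has claimed the line, the remaining invocations skip it.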
@@ -643,7 +644,7 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
   } else {
     r.location = location_default;
   }
-  strcpybuff(r.location, "");
+  r.location[0] = '\0';
   strcpybuff(buff, adr);
   strcatbuff(buff, fil);
   hash_pos_return = coucal_read(cache->hashtable, buff, &hash_pos);
@@ -706,17 +707,25 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
             value++;
             ZIP_READFIELD_INT(line, value, "X-In-Cache", dataincache);
             ZIP_READFIELD_INT(line, value, "X-Statuscode", r.statuscode);
-            ZIP_READFIELD_STRING(line, value, "X-StatusMessage", r.msg); // msg
+            ZIP_READFIELD_STRING(line, value, "X-StatusMessage", r.msg,
+                                 sizeof(r.msg));
             ZIP_READFIELD_LLINT(line, value, "X-Size", r.size); // size
-            ZIP_READFIELD_STRING(line, value, "Content-Type", r.contenttype); // contenttype
-            ZIP_READFIELD_STRING(line, value, "X-Charset", r.charset); // contenttype
-            ZIP_READFIELD_STRING(line, value, "Last-Modified", r.lastmodified); // last-modified
-            ZIP_READFIELD_STRING(line, value, "Etag", r.etag); // Etag
-            ZIP_READFIELD_STRING(line, value, "Location", r.location); // 'location' pour moved
-            ZIP_READFIELD_STRING(line, value, "Content-Disposition", r.cdispo); // Content-disposition
+            ZIP_READFIELD_STRING(line, value, "Content-Type", r.contenttype,
+                                 sizeof(r.contenttype));
+            ZIP_READFIELD_STRING(line, value, "X-Charset", r.charset,
+                                 sizeof(r.charset));
+            ZIP_READFIELD_STRING(line, value, "Last-Modified", r.lastmodified,
+                                 sizeof(r.lastmodified));
+            ZIP_READFIELD_STRING(line, value, "Etag", r.etag, sizeof(r.etag));
+            // r.location is a char* pointing into a HTS_URLMAXSIZE*2 buffer
+            ZIP_READFIELD_STRING(line, value, "Location", r.location,
+                                 HTS_URLMAXSIZE * 2);
+            ZIP_READFIELD_STRING(line, value, "Content-Disposition", r.cdispo,
+                                 sizeof(r.cdispo));
             //ZIP_READFIELD_STRING(line, value, "X-Addr", ..); // Original address
             //ZIP_READFIELD_STRING(line, value, "X-Fil", ..); // Original URI filename
-            ZIP_READFIELD_STRING(line, value, "X-Save", previous_save_); // Original save filename
+            ZIP_READFIELD_STRING(line, value, "X-Save", previous_save_,
+                                 sizeof(previous_save_));
           }
         } while(offset < readSizeHeader && !lineEof);
         //totalHeader = offset;
@@ -733,7 +742,7 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
           }
         }
         if (return_save != NULL) {
-          strcpybuff(return_save, previous_save);
+          strlcpybuff(return_save, previous_save, HTS_URLMAXSIZE * 2);
         }
 
         /* Complete fields */
@@ -1025,7 +1034,7 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
   } else {
     r.location = location_default;
   }
-  strcpybuff(r.location, "");
+  r.location[0] = '\0';
 #if HTS_FAST_CACHE
   strcpybuff(buff, adr);
   strcatbuff(buff, fil);
@@ -1096,30 +1105,34 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
           //
           cache_rint(cache->olddat, &r.statuscode);
           cache_rLLint(cache->olddat, &r.size);
-          cache_rstr(cache->olddat, r.msg);
-          cache_rstr(cache->olddat, r.contenttype);
+          cache_rstr(cache->olddat, r.msg, sizeof(r.msg));
+          cache_rstr(cache->olddat, r.contenttype, sizeof(r.contenttype));
           if (cache->version >= 3)
-            cache_rstr(cache->olddat, r.charset);
-          cache_rstr(cache->olddat, r.lastmodified);
-          cache_rstr(cache->olddat, r.etag);
-          cache_rstr(cache->olddat, r.location);
+            cache_rstr(cache->olddat, r.charset, sizeof(r.charset));
+          cache_rstr(cache->olddat, r.lastmodified, sizeof(r.lastmodified));
+          cache_rstr(cache->olddat, r.etag, sizeof(r.etag));
+          // r.location points into a HTS_URLMAXSIZE*2 buffer
+          cache_rstr(cache->olddat, r.location, HTS_URLMAXSIZE * 2);
           if (cache->version >= 2)
-            cache_rstr(cache->olddat, r.cdispo);
+            cache_rstr(cache->olddat, r.cdispo, sizeof(r.cdispo));
           if (cache->version >= 4) {
-            cache_rstr(cache->olddat, previous_save); // adr
-            cache_rstr(cache->olddat, previous_save); // fil
+            cache_rstr(cache->olddat, previous_save,
+                       sizeof(previous_save)); // adr
+            cache_rstr(cache->olddat, previous_save,
+                       sizeof(previous_save)); // fil
             previous_save[0] = '\0';
-            cache_rstr(cache->olddat, previous_save); // save
+            cache_rstr(cache->olddat, previous_save,
+                       sizeof(previous_save)); // save
             if (return_save != NULL) {
-              strcpybuff(return_save, previous_save);
+              strlcpybuff(return_save, previous_save, HTS_URLMAXSIZE * 2);
             }
           }
           if (cache->version >= 5) {
             r.headers = cache_rstr_addr(cache->olddat);
           }
           //
-          cache_rstr(cache->olddat, check);
-          if (strcmp(check, "HTS") == 0) { /* intégrité OK */
+          cache_rstr(cache->olddat, check, sizeof(check));
+          if (strcmp(check, "HTS") == 0) { /* integrity OK */
             ok = 1;
           }
           cache_rLLint(cache->olddat, &size_read); /* lire size pour être sûr de la taille déclarée (réécrire) */
@@ -1758,12 +1771,12 @@ void cache_init(cache_back * cache, httrackp * opt) {
         char firstline[256];
         char *a = cache->use;
 
-        a += cache_brstr(a, firstline);
-        if (strncmp(firstline, "CACHE-", 6) == 0) { // Nouvelle version du cache
-          if (strncmp(firstline, "CACHE-1.", 8) == 0) { // Version 1.1x
+        a += cache_brstr(a, firstline, sizeof(firstline));
+        if (strncmp(firstline, "CACHE-", 6) == 0) { // new cache format
+          if (strncmp(firstline, "CACHE-1.", 8) == 0) { // version 1.1x
             cache->version = (int) (firstline[8] - '0'); // cache 1.x
             if (cache->version <= 5) {
-              a += cache_brstr(a, firstline);
+              a += cache_brstr(a, firstline, sizeof(firstline));
               strcpybuff(cache->lastmodified, firstline);
             } else {
               hts_log_print(opt, LOG_ERROR,
@@ -1774,7 +1787,7 @@ void cache_init(cache_back * cache, httrackp * opt) {
               freet(cache->use);
               cache->use = NULL;
             }
-          } else { // non supporté
+          } else { // non supporté
             hts_log_print(opt, LOG_ERROR,
                           "Cache: %s not supported, ignoring current cache",
                           firstline);
@@ -1784,7 +1797,7 @@ void cache_init(cache_back * cache, httrackp * opt) {
             cache->use = NULL;
           }
           /* */
-        } else { // Vieille version du cache
+        } else { // Vieille version du cache
           /* */
           hts_log_print(opt, LOG_WARNING,
                         "Cache: importing old cache format");
@@ -2088,7 +2101,7 @@ char *readfile_or(const char *fil, const char *defaultdata) {
     char *adr = malloct(strlen(defaultdata) + 1);
 
     if (adr) {
-      strcpybuff(adr, defaultdata);
+      strlcpybuff(adr, defaultdata, strlen(defaultdata) + 1);
       return adr;
     }
   }
@@ -2109,7 +2122,7 @@ int cache_wstr(FILE * fp, const char *s) {
     return -1;
   return 0;
 }
-void cache_rstr(FILE * fp, char *s) {
+void cache_rstr(FILE *fp, char *s, size_t s_size) {
   INTsys i;
   char buff[256 + 4];
 
@@ -2118,13 +2131,26 @@ void cache_rstr(FILE * fp, char *s) {
   if (i < 0 || i > 32768) /* error, something nasty happened */
     i = 0;
   if (i > 0) {
-    if ((int) fread(s, 1, i, fp) != i) {
+    /* Store at most s_size-1 bytes into s, but consume all i bytes from the
+       stream so the next field stays aligned (the field may be longer than the
+       destination in a tampered/old cache). */
+    const size_t want = (size_t) i;
+    const size_t store = want < s_size ? want : s_size - 1;
+
+    if (fread(s, 1, store, fp) != store) {
       int fread_cache_failed = 0;
 
       assertf(fread_cache_failed);
     }
+    if (want > store && fseek(fp, (long) (want - store), SEEK_CUR) != 0) {
+      int fseek_cache_failed = 0;
+
+      assertf(fseek_cache_failed);
+    }
+    s[store] = '\0';
+  } else {
+    s[0] = '\0';
   }
-  *(s + i) = '\0';
 }
 char *cache_rstr_addr(FILE * fp) {
   INTsys i;
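The comment added to `cache_rstr` describes a store-then-skip read: keep at most `s_size - 1` bytes, but consume the whole declared field so the next read starts at the right offset. The sketch below reproduces that behaviour on a made-up on-disk layout, a decimal length line followed by the payload bytes; the layout, the name `read_field`, and the choice to return an error code instead of asserting are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reader for a "<decimal length>\n<bytes>" field.
   Stores at most s_size-1 bytes in s, always NUL-terminates, and skips any
   excess so the stream is positioned at the start of the next field.
   Returns 0 on success, -1 on a malformed header, short read or seek error. */
static int read_field(FILE *fp, char *s, size_t s_size) {
  long len = 0;
  size_t want, store;

  assert(fp != NULL && s != NULL && s_size > 0);
  s[0] = '\0';
  if (fscanf(fp, "%ld", &len) != 1 || fgetc(fp) != '\n')
    return -1;
  if (len < 0 || len > 32768) /* same sanity bound as in the diff */
    return -1;
  want = (size_t) len;
  store = want < s_size ? want : s_size - 1;
  if (fread(s, 1, store, fp) != store) {
    s[0] = '\0'; /* do not leave a partial, unterminated string behind */
    return -1;
  }
  s[store] = '\0';
  /* Skip the part of the field that did not fit in s. */
  if (want > store && fseek(fp, (long) (want - store), SEEK_CUR) != 0)
    return -1;
  return 0;
}
```

Without the final skip, a field longer than the destination would leave its tail in the stream, and the next call would parse payload bytes as a length header.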
@@ -2148,7 +2174,7 @@ char *cache_rstr_addr(FILE * fp) {
   }
   return addr;
 }
-int cache_brstr(char *adr, char *s) {
+int cache_brstr(char *adr, char *s, size_t s_size) {
   int i;
   int off;
   char buff[256 + 4];
@@ -2156,23 +2182,17 @@ int cache_brstr(char *adr, char *s) {
   off = binput(adr, buff, 256);
   adr += off;
   sscanf(buff, "%d", &i);
-  if (i > 0)
-    strncpy(s, adr, i);
-  *(s + i) = '\0';
-  off += i;
-  return off;
-}
-int cache_quickbrstr(char *adr, char *s) {
-  int i;
-  int off;
-  char buff[256 + 4];
+  if (i < 0 || i > 32768) /* guard a corrupt length */
+    i = 0;
+  if (i > 0) {
+    /* copy at most s_size-1 bytes; advance past the full field regardless */
+    const size_t store = (size_t) i < s_size ? (size_t) i : s_size - 1;
 
-  off = binput(adr, buff, 256);
-  adr += off;
-  sscanf(buff, "%d", &i);
-  if (i > 0)
-    strncpy(s, adr, i);
-  *(s + i) = '\0';
+    strncpy(s, adr, store);
+    s[store] = '\0';
+  } else {
+    s[0] = '\0';
+  }
   off += i;
   return off;
 }
@@ -2180,7 +2200,7 @@ int cache_quickbrstr(char *adr, char *s) {
|
||||
/* same, but for an int */
|
||||
int cache_brint(char *adr, int *i) {
|
||||
char s[256];
|
||||
int r = cache_brstr(adr, s);
|
||||
int r = cache_brstr(adr, s, sizeof(s));
|
||||
|
||||
if (r != -1)
|
||||
sscanf(s, "%d", i);
|
||||
@@ -2189,7 +2209,7 @@ int cache_brint(char *adr, int *i) {
|
||||
void cache_rint(FILE * fp, int *i) {
|
||||
char s[256];
|
||||
|
||||
cache_rstr(fp, s);
|
||||
cache_rstr(fp, s, sizeof(s));
|
||||
sscanf(s, "%d", i);
|
||||
}
|
||||
int cache_wint(FILE * fp, int i) {
|
||||
@@ -2201,7 +2221,7 @@ int cache_wint(FILE * fp, int i) {
|
||||
void cache_rLLint(FILE * fp, LLint * i) {
|
||||
char s[256];
|
||||
|
||||
cache_rstr(fp, s);
|
||||
cache_rstr(fp, s, sizeof(s));
|
||||
sscanf(s, LLintP, i);
|
||||
}
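Review note on the cache_rstr hunk above: the new code stores at most s_size-1 bytes and then seeks past the remainder of the field, so the next field stays aligned. The sketch below isolates that idea under stated assumptions. `read_field_bounded` and `demo_read_field` are hypothetical names, not HTTrack functions, and the sketch returns an error code where the engine uses assertf. It also rejects a zero-sized destination explicitly, because `s_size - 1` would wrap around for `s_size == 0`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not HTTrack's cache_rstr): read a field of `len` bytes
   from fp into dst. Stores at most dst_size-1 bytes, NUL-terminates, and
   skips whatever did not fit so the stream stays aligned on the next field.
   Returns 0 on success, -1 on I/O error or unusable destination. */
static int read_field_bounded(FILE *fp, size_t len, char *dst,
                              size_t dst_size) {
  size_t store;

  if (fp == NULL || dst == NULL || dst_size == 0)
    return -1; /* dst_size - 1 would wrap around for a zero-sized buffer */
  store = len < dst_size ? len : dst_size - 1;
  if (store > 0 && fread(dst, 1, store, fp) != store) {
    dst[0] = '\0';
    return -1;
  }
  dst[store] = '\0';
  /* consume the tail of an oversized field without storing it */
  if (len > store && fseek(fp, (long) (len - store), SEEK_CUR) != 0)
    return -1;
  return 0;
}

/* Two back-to-back fields, "abcdef" then "XYZ", read through a 4-byte
   destination: the first is truncated, the second is still aligned. */
static void demo_read_field(void) {
  char small[4];
  FILE *fp = tmpfile();

  assert(fp != NULL);
  fputs("abcdefXYZ", fp);
  rewind(fp);
  assert(read_field_bounded(fp, 6, small, sizeof(small)) == 0);
  assert(strcmp(small, "abc") == 0);
  assert(read_field_bounded(fp, 3, small, sizeof(small)) == 0);
  assert(strcmp(small, "XYZ") == 0);
  fclose(fp);
}
```

Returning -1 rather than aborting is a choice made for the sketch only; it keeps the example testable in isolation.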
|
||||
int cache_wLLint(FILE * fp, LLint i) {
|
||||
|
||||
@@ -80,10 +80,9 @@ int cache_writedata(FILE * cache_ndx, FILE * cache_dat, const char *str1,
|
||||
int cache_readdata(cache_back * cache, const char *str1, const char *str2,
|
||||
char **inbuff, int *len);
|
||||
|
||||
void cache_rstr(FILE * fp, char *s);
|
||||
void cache_rstr(FILE *fp, char *s, size_t s_size);
|
||||
char *cache_rstr_addr(FILE * fp);
|
||||
int cache_brstr(char *adr, char *s);
|
||||
int cache_quickbrstr(char *adr, char *s);
|
||||
int cache_brstr(char *adr, char *s, size_t s_size);
|
||||
int cache_brint(char *adr, int *i);
|
||||
void cache_rint(FILE * fp, int *i);
|
||||
void cache_rLLint(FILE * fp, LLint * i);
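Review note on the new cache_brstr(adr, s, s_size) prototype: as far as these hunks show, the length is parsed with an unchecked sscanf into an uninitialized int before the range guard runs. A minimal sketch of a stricter in-memory reader follows. It assumes a record layout of a decimal length, a newline, then the payload, and a NUL-terminated source buffer. `read_record_bounded` and `demo_read_record` are hypothetical names, and the 32768 cap simply mirrors the guard in the diff.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory reader (not HTTrack's cache_brstr). Assumed layout:
   "<decimal length>\n<payload>", with src NUL-terminated. Copies at most
   dst_size-1 payload bytes into dst and returns the number of source bytes
   consumed (header plus full payload), or -1 if the record is malformed. */
static int read_record_bounded(const char *src, char *dst, size_t dst_size) {
  char *end;
  long len;
  size_t store;

  if (src == NULL || dst == NULL || dst_size == 0)
    return -1;
  dst[0] = '\0';
  len = strtol(src, &end, 10);
  if (end == src || *end != '\n' || len < 0 || len > 32768)
    return -1; /* no digits, missing separator, or implausible length */
  end++;       /* skip the newline separator */
  if (strlen(end) < (size_t) len)
    return -1; /* payload shorter than announced */
  store = (size_t) len < dst_size ? (size_t) len : dst_size - 1;
  memcpy(dst, end, store);
  dst[store] = '\0';
  return (int) ((end - src) + len); /* advance past the whole field */
}

static void demo_read_record(void) {
  const char *buf = "6\nabcdef3\nXYZ";
  char s[4];
  int off = read_record_bounded(buf, s, sizeof(s));

  assert(off == 8 && strcmp(s, "abc") == 0); /* truncated to 3 bytes */
  off = read_record_bounded(buf + off, s, sizeof(s));
  assert(off == 5 && strcmp(s, "XYZ") == 0); /* next record still aligned */
}
```

The returned offset always covers the full field, so a caller can keep doing `a += off` as the -#C listing code does, even when the stored copy was truncated.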
|
||||
|
||||
374
src/htscache_selftest.c
Normal file
@@ -0,0 +1,374 @@
|
||||
/* ------------------------------------------------------------ */
|
||||
/*
|
||||
HTTrack Website Copier, Offline Browser for Windows and Unix
|
||||
Copyright (C) 1998-2017 Xavier Roche and other contributors
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 3 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
Important notes:
|
||||
|
||||
- We hereby ask people using this source NOT to use it in purpose of grabbing
|
||||
emails addresses, or collecting any other private information on persons.
|
||||
This would disgrace our work, and spoil the many hours we spent on it.
|
||||
|
||||
Please visit our Website: http://www.httrack.com
|
||||
*/
|
||||
|
||||
/* ------------------------------------------------------------ */
|
||||
/* File: htscache_selftest.c subroutines: */
|
||||
/* in-process self-test for the (ZIP) cache subsystem */
|
||||
/* Author: Xavier Roche */
|
||||
/* ------------------------------------------------------------ */
|
||||
|
||||
/* Drives the public cache API (cache_init / cache_add / cache_readex)
|
||||
through a create -> read -> update cycle on a real on-disk ZIP cache,
|
||||
asserting every header field and the (binary-safe) body round-trips.
|
||||
Besides a few hand-crafted edge cases it stores a few thousand entries
|
||||
(index/lookup scale) and a handful of large compressible/incompressible
|
||||
bodies (zlib deflate/inflate). Reached via `httrack -#A <dir>`. */
|
||||
|
||||
#define HTS_INTERNAL_BYTECODE
|
||||
|
||||
#include "htscache_selftest.h"
|
||||
|
||||
#include "htscache.h"
|
||||
#include "htscore.h"
|
||||
#include "htslib.h"
|
||||
#include "htszlib.h"
|
||||
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
#define SELFTEST_VOLUME 3000 /* number of small entries in the scale pass */
|
||||
|
||||
/* Open a cache session. A write session (ro=0) rotates new.zip -> old.zip and
|
||||
opens a fresh new.zip; a read session (ro=1) opens new.zip in place. */
|
||||
static void selftest_open(cache_back *cache, httrackp *opt, int ro) {
|
||||
memset(cache, 0, sizeof(*cache));
|
||||
cache->type = 1;
|
||||
cache->log = stderr;
|
||||
cache->errlog = stderr;
|
||||
cache->hashtable = coucal_new(0);
|
||||
cache->ro = ro;
|
||||
cache_init(cache, opt);
|
||||
}
|
||||
|
||||
static void selftest_open_for_write(cache_back *cache, httrackp *opt) {
|
||||
selftest_open(cache, opt, 0);
|
||||
}
|
||||
|
||||
static void selftest_open_for_read(cache_back *cache, httrackp *opt) {
|
||||
selftest_open(cache, opt, 1);
|
||||
}
|
||||
|
||||
static void selftest_close(cache_back *cache) {
|
||||
if (cache->dat != NULL) {
|
||||
fclose(cache->dat);
|
||||
cache->dat = NULL;
|
||||
}
|
||||
if (cache->ndx != NULL) {
|
||||
fclose(cache->ndx);
|
||||
cache->ndx = NULL;
|
||||
}
|
||||
if (cache->zipOutput != NULL) {
|
||||
zipClose(cache->zipOutput,
|
||||
"Created by HTTrack Website Copier (cache self-test)");
|
||||
cache->zipOutput = NULL;
|
||||
}
|
||||
if (cache->zipInput != NULL) {
|
||||
unzClose(cache->zipInput);
|
||||
cache->zipInput = NULL;
|
||||
}
|
||||
/* hashtable is intentionally not coucal_delete()d: it would dump a stats
|
||||
summary to stderr on every call, and this is a one-shot CLI subcommand
|
||||
that exits right after (same choice as the other -# cache subcommands) */
|
||||
}
|
||||
|
||||
/* Store one entry. The body is copied into a private buffer (any size), so
|
||||
callers may pass const data and cache_add never sees a cast-away qualifier;
|
||||
it consumes everything synchronously, so the copy is freed on return. */
|
||||
static void store_entry(httrackp *opt, cache_back *cache, const char *adr,
|
||||
const char *fil, const char *save, int statuscode,
|
||||
const char *msg, const char *contenttype,
|
||||
const char *charset, const char *lastmodified,
|
||||
const char *etag, const char *location,
|
||||
const char *body, size_t body_len) {
|
||||
htsblk r;
|
||||
char locbuf[HTS_URLMAXSIZE * 2];
|
||||
char *bodycopy = NULL;
|
||||
|
||||
hts_init_htsblk(&r);
|
||||
r.statuscode = statuscode;
|
||||
r.size = (LLint) body_len;
|
||||
strcpybuff(r.msg, msg);
|
||||
strcpybuff(r.contenttype, contenttype);
|
||||
strcpybuff(r.charset, charset);
|
||||
strcpybuff(r.lastmodified, lastmodified);
|
||||
strcpybuff(r.etag, etag);
|
||||
strcpybuff(locbuf, location);
|
||||
r.location = locbuf;
|
||||
r.is_write = 0;
|
||||
/* an empty body must be a NULL pointer: cache_add rejects a non-NULL
|
||||
pointer with size 0 */
|
||||
if (body_len != 0) {
|
||||
bodycopy = malloct(body_len);
|
||||
memcpy(bodycopy, body, body_len);
|
||||
r.adr = bodycopy;
|
||||
}
|
||||
/* all_in_cache=1: keep the body in the ZIP whatever the content-type,
|
||||
so the read path never depends on a file on disk */
|
||||
cache_add(opt, cache, &r, adr, fil, save, 1, NULL);
|
||||
if (bodycopy != NULL) {
|
||||
freet(bodycopy);
|
||||
}
|
||||
}
|
||||
|
||||
/* Read one entry back and check every field. Returns the number of
|
||||
mismatches (0 == success). */
|
||||
static int check_entry(httrackp *opt, cache_back *cache, const char *adr,
|
||||
const char *fil, int statuscode, const char *msg,
|
||||
const char *contenttype, const char *charset,
|
||||
const char *lastmodified, const char *etag,
|
||||
const char *location, const char *body,
|
||||
size_t body_len) {
|
||||
int fail = 0;
|
||||
char *locbuf = malloct(HTS_URLMAXSIZE * 2);
|
||||
htsblk r;
|
||||
|
||||
locbuf[0] = '\0';
|
||||
/* readonly=1: pure read, no rename/disk-write decision logic */
|
||||
r = cache_readex(opt, cache, adr, fil, "", locbuf, NULL, 1);
|
||||
|
||||
#define CHECK_STR(field, want) \
|
||||
do { \
|
||||
if (strcmp((field), (want)) != 0) { \
|
||||
fprintf(stderr, \
|
||||
"cache-selftest: %s%s: " #field " is '%s', expected '%s'\n", \
|
||||
adr, fil, (field), (want)); \
|
||||
fail++; \
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
if (r.statuscode != statuscode) {
|
||||
fprintf(stderr, "cache-selftest: %s%s: statuscode is %d, expected %d\n",
|
||||
adr, fil, r.statuscode, statuscode);
|
||||
fail++;
|
||||
}
|
||||
CHECK_STR(r.msg, msg);
|
||||
CHECK_STR(r.contenttype, contenttype);
|
||||
CHECK_STR(r.charset, charset);
|
||||
CHECK_STR(r.lastmodified, lastmodified);
|
||||
CHECK_STR(r.etag, etag);
|
||||
CHECK_STR(locbuf, location);
|
||||
|
||||
if (r.size != (LLint) body_len) {
|
||||
fprintf(stderr, "cache-selftest: %s%s: size is " LLintP ", expected %d\n",
|
||||
adr, fil, (LLint) r.size, (int) body_len);
|
||||
fail++;
|
||||
} else if (body_len != 0 &&
|
||||
(r.adr == NULL || memcmp(r.adr, body, body_len) != 0)) {
|
||||
fprintf(stderr, "cache-selftest: %s%s: body mismatch\n", adr, fil);
|
||||
fail++;
|
||||
}
|
||||
|
||||
#undef CHECK_STR
|
||||
|
||||
if (r.adr != NULL) {
|
||||
freet(r.adr);
|
||||
}
|
||||
freet(locbuf);
|
||||
return fail;
|
||||
}
|
||||
|
||||
/* Fill a body of the requested size. kind 0 is highly compressible (a short
|
||||
repeating pattern), kind 1 is incompressible (a deterministic PRNG), kind 2
|
||||
alternates the two -- together they exercise both deflate outcomes. */
|
||||
static void gen_body(char *buf, size_t len, int kind) {
|
||||
unsigned int seed = 0x9e3779b1u ^ (unsigned int) len;
|
||||
size_t j;
|
||||
|
||||
for (j = 0; j < len; j++) {
|
||||
if (kind == 0 || (kind == 2 && (j & 1) == 0)) {
|
||||
buf[j] = (char) ('A' + (j % 26));
|
||||
} else {
|
||||
seed = seed * 1103515245u + 12345u;
|
||||
buf[j] = (char) (seed >> 16);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int cache_selftests(httrackp *opt, const char *dir) {
|
||||
int failures = 0;
|
||||
cache_back cache;
|
||||
int i;
|
||||
|
||||
/* near-limit field values. The etag stresses htsblk.etag[256]; the location
|
||||
stresses a long redirect URL. Each cached header line is read back through
|
||||
a HTS_URLMAXSIZE-sized parse buffer ("<field>: <value>\r\n"), so the
|
||||
round-trippable value is shorter than HTS_URLMAXSIZE: 1000 stays safely
|
||||
under that real limit. */
|
||||
static char etag_long[251];
|
||||
static char location_long[1001];
|
||||
|
||||
/* a body with embedded NUL and high bytes, to prove binary safety */
|
||||
static const char binary_body[] = {
|
||||
'P', 'N', 'G', '\0', '\r', '\n', (char) 0xFF, (char) 0x80,
|
||||
'\0', '\0', 'e', 'n', 'd', (char) 0xCA, (char) 0xFE, '\n'};
|
||||
|
||||
/* large bodies for the compression pass; kept alive across the write and
|
||||
read passes so the read can compare against them */
|
||||
static const size_t large_size[] = {200000, 200000, 50000};
|
||||
const int large_count = (int) (sizeof(large_size) / sizeof(large_size[0]));
|
||||
char *large_body[3];
|
||||
|
||||
/* edge-case bodies, named so store and read assert the exact same bytes */
|
||||
const char *const body_index = "<html><body>hello</body></html>";
|
||||
const char *const body_api = "{\"k\":\"v\"}";
|
||||
const char *const body_updated = "<html><body>UPDATED CONTENT</body></html>";
|
||||
const char *const body_404 = "<html><body>404 Not Found</body></html>";
|
||||
|
||||
memset(etag_long, 'E', sizeof(etag_long) - 1);
|
||||
etag_long[sizeof(etag_long) - 1] = '\0';
|
||||
memset(location_long, 'L', sizeof(location_long) - 1);
|
||||
location_long[sizeof(location_long) - 1] = '\0';
|
||||
|
||||
for (i = 0; i < large_count; i++) {
|
||||
large_body[i] = malloct(large_size[i]);
|
||||
gen_body(large_body[i], large_size[i], i);
|
||||
}
|
||||
|
||||
/* set up an isolated cache directory */
|
||||
{
|
||||
char base[HTS_URLMAXSIZE];
|
||||
|
||||
strcpybuff(base, dir);
|
||||
if (base[0] != '\0' && base[strlen(base) - 1] != '/') {
|
||||
strcatbuff(base, "/");
|
||||
}
|
||||
StringCopy(opt->path_log, base);
|
||||
}
|
||||
opt->cache = 1;
|
||||
|
||||
/* pass 1: create everything in a single write session */
|
||||
selftest_open_for_write(&cache, opt);
|
||||
|
||||
/* edge cases: normal HTML page */
|
||||
store_entry(opt, &cache, "example.com", "/", "example.com/index.html", 200,
|
||||
"OK", "text/html", "utf-8", "Mon, 01 Jan 2024 00:00:00 GMT",
|
||||
"etag-normal", "", body_index, strlen(body_index));
|
||||
/* redirect: empty body, empty optional fields, near-limit location */
|
||||
store_entry(opt, &cache, "example.com", "/moved", "example.com/moved.html",
|
||||
301, "Moved Permanently", "text/html", "", "", "", location_long,
|
||||
NULL, 0);
|
||||
/* non-HTML content-type kept in cache via all_in_cache, near-limit etag */
|
||||
store_entry(opt, &cache, "example.com", "/api", "example.com/api.json", 200,
|
||||
"OK", "application/json", "utf-8",
|
||||
"Tue, 02 Jan 2024 12:00:00 GMT", etag_long, "", body_api,
|
||||
strlen(body_api));
|
||||
/* binary body */
|
||||
store_entry(opt, &cache, "example.com", "/logo", "example.com/logo.png", 200,
|
||||
"OK", "image/png", "", "", "etag-bin", "", binary_body,
|
||||
sizeof(binary_body));
|
||||
/* error status with a body and a location (non-2xx codes are cached too) */
|
||||
store_entry(opt, &cache, "example.com", "/gone", "example.com/gone.html", 404,
|
||||
"Not Found", "text/html", "utf-8", "", "etag-404",
|
||||
"https://example.com/where-it-went", body_404, strlen(body_404));
|
||||
|
||||
/* scale: a few thousand small entries */
|
||||
for (i = 0; i < SELFTEST_VOLUME; i++) {
|
||||
char fil[64], save[128], body[64];
|
||||
|
||||
sprintf(fil, "/v/%05d", i);
|
||||
sprintf(save, "example.com/v/%05d.html", i);
|
||||
sprintf(body, "<html>volume entry %d</html>", i);
|
||||
store_entry(opt, &cache, "example.com", fil, save, 200, "OK", "text/html",
|
||||
"utf-8", "", "", "", body, strlen(body));
|
||||
}
|
||||
|
||||
/* compression: a few large bodies */
|
||||
for (i = 0; i < large_count; i++) {
|
||||
char fil[64], save[128];
|
||||
|
||||
sprintf(fil, "/big/%d.bin", i);
|
||||
sprintf(save, "example.com/big/%d.bin", i);
|
||||
store_entry(opt, &cache, "example.com", fil, save, 200, "OK",
|
||||
"application/octet-stream", "", "", "", "", large_body[i],
|
||||
large_size[i]);
|
||||
}
|
||||
|
||||
selftest_close(&cache);
|
||||
|
||||
/* pass 2: read back and verify everything round-tripped */
|
||||
selftest_open_for_read(&cache, opt);
|
||||
|
||||
failures += check_entry(opt, &cache, "example.com", "/", 200, "OK",
|
||||
"text/html", "utf-8", "Mon, 01 Jan 2024 00:00:00 GMT",
|
||||
"etag-normal", "", body_index, strlen(body_index));
|
||||
failures += check_entry(opt, &cache, "example.com", "/moved", 301,
|
||||
"Moved Permanently", "text/html", "", "", "",
|
||||
location_long, NULL, 0);
|
||||
failures +=
|
||||
check_entry(opt, &cache, "example.com", "/api", 200, "OK",
|
||||
"application/json", "utf-8", "Tue, 02 Jan 2024 12:00:00 GMT",
|
||||
etag_long, "", body_api, strlen(body_api));
|
||||
failures +=
|
||||
check_entry(opt, &cache, "example.com", "/logo", 200, "OK", "image/png",
|
||||
"", "", "etag-bin", "", binary_body, sizeof(binary_body));
|
||||
failures += check_entry(opt, &cache, "example.com", "/gone", 404, "Not Found",
|
||||
"text/html", "utf-8", "", "etag-404",
|
||||
"https://example.com/where-it-went", body_404,
|
||||
strlen(body_404));
|
||||
|
||||
for (i = 0; i < SELFTEST_VOLUME; i++) {
|
||||
char fil[64], body[64];
|
||||
|
||||
sprintf(fil, "/v/%05d", i);
|
||||
sprintf(body, "<html>volume entry %d</html>", i);
|
||||
failures +=
|
||||
check_entry(opt, &cache, "example.com", fil, 200, "OK", "text/html",
|
||||
"utf-8", "", "", "", body, strlen(body));
|
||||
}
|
||||
|
||||
for (i = 0; i < large_count; i++) {
|
||||
char fil[64];
|
||||
|
||||
sprintf(fil, "/big/%d.bin", i);
|
||||
failures += check_entry(opt, &cache, "example.com", fil, 200, "OK",
|
||||
"application/octet-stream", "", "", "", "",
|
||||
large_body[i], large_size[i]);
|
||||
}
|
||||
|
||||
selftest_close(&cache);
|
||||
|
||||
/* pass 3: update one edge entry with new body and headers */
|
||||
selftest_open_for_write(&cache, opt);
|
||||
store_entry(opt, &cache, "example.com", "/", "example.com/index.html", 200,
|
||||
"OK", "text/html", "iso-8859-1", "Wed, 03 Jan 2024 09:30:00 GMT",
|
||||
"etag-updated", "", body_updated, strlen(body_updated));
|
||||
selftest_close(&cache);
|
||||
|
||||
/* pass 4: re-read and confirm the updated value, not the old one */
|
||||
selftest_open_for_read(&cache, opt);
|
||||
failures +=
|
||||
check_entry(opt, &cache, "example.com", "/", 200, "OK", "text/html",
|
||||
"iso-8859-1", "Wed, 03 Jan 2024 09:30:00 GMT", "etag-updated",
|
||||
"", body_updated, strlen(body_updated));
|
||||
selftest_close(&cache);
|
||||
|
||||
for (i = 0; i < large_count; i++) {
|
||||
freet(large_body[i]);
|
||||
}
|
||||
|
||||
return failures;
|
||||
}
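Review note on the binary-safety check in check_entry: bodies are compared by length and memcmp rather than as C strings, which matters for the PNG-like body with embedded NUL bytes. The sketch below shows why, under the assumption that an empty body may legitimately be a NULL pointer, as the comment in store_entry states. `bodies_equal` and the two demo arrays are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length-aware, binary-safe body comparison.
   Returns 1 if both bodies hold the same bytes, 0 otherwise. */
static int bodies_equal(const void *a, size_t a_len, const void *b,
                        size_t b_len) {
  if (a_len != b_len)
    return 0;
  if (a_len == 0)
    return 1; /* two empty bodies match, even when a pointer is NULL */
  if (a == NULL || b == NULL)
    return 0;
  return memcmp(a, b, a_len) == 0;
}

/* Two bodies that differ only after an embedded NUL byte. */
static const char demo_stored[] = {'P', 'N', 'G', '\0', (char) 0xFF, 'x'};
static const char demo_read[] = {'P', 'N', 'G', '\0', (char) 0xFF, 'y'};

static void demo_bodies(void) {
  /* strcmp stops at the first NUL and wrongly reports a match */
  assert(strcmp(demo_stored, demo_read) == 0);
  /* the length-aware comparison sees the differing last byte */
  assert(!bodies_equal(demo_stored, sizeof(demo_stored), demo_read,
                       sizeof(demo_read)));
}
```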
|
||||
49
src/htscache_selftest.h
Normal file
@@ -0,0 +1,49 @@
|
||||
/* ------------------------------------------------------------ */
|
||||
/*
|
||||
HTTrack Website Copier, Offline Browser for Windows and Unix
|
||||
Copyright (C) 1998-2017 Xavier Roche and other contributors
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 3 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
Important notes:
|
||||
|
||||
- We hereby ask people using this source NOT to use it in purpose of grabbing
|
||||
emails addresses, or collecting any other private information on persons.
|
||||
This would disgrace our work, and spoil the many hours we spent on it.
|
||||
|
||||
Please visit our Website: http://www.httrack.com
|
||||
*/
|
||||
|
||||
/* ------------------------------------------------------------ */
|
||||
/* File: htscache_selftest.h */
|
||||
/* Author: Xavier Roche */
|
||||
/* ------------------------------------------------------------ */
|
||||
|
||||
#ifndef HTSCACHE_SELFTEST_DEFH
|
||||
#define HTSCACHE_SELFTEST_DEFH
|
||||
|
||||
#ifdef HTS_INTERNAL_BYTECODE
|
||||
|
||||
#ifndef HTS_DEF_FWSTRUCT_httrackp
|
||||
#define HTS_DEF_FWSTRUCT_httrackp
|
||||
typedef struct httrackp httrackp;
|
||||
#endif
|
||||
|
||||
/* Run the cache create/read/update self-test against a working directory.
|
||||
Returns the number of failed checks (0 == success). */
|
||||
int cache_selftests(httrackp *opt, const char *dir);
|
||||
|
||||
#endif
|
||||
|
||||
#endif
|
||||
@@ -201,8 +201,8 @@ HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) {
|
||||
while(strnotempty(line)) {
|
||||
socinput(soc, line, 1000);
|
||||
treathead(NULL, NULL, NULL, &blkretour, line); // process the header line
|
||||
strcatbuff(data, line);
|
||||
strcatbuff(data, "\r\n");
|
||||
strlcatbuff(data, line, CATCH_URL_DATA_SIZE);
|
||||
strlcatbuff(data, "\r\n", CATCH_URL_DATA_SIZE);
|
||||
}
|
||||
// final header CR/LF is unnecessary: the empty line just above already added it
|
||||
//strcatbuff(data,"\r\n");
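Review note on the catch_url hunk above: the loop appends every received header line to `data`, so the total is bounded only by what the peer sends, and the append needs the destination capacity (CATCH_URL_DATA_SIZE) to be passed explicitly. The sketch below illustrates a capacity-checked append. `cat_bounded` and `demo_cat` are hypothetical stand-ins, not htssafe.h's strlcatbuff, and reporting an error instead of aborting is an assumption made to keep the example testable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity-checked append (not htssafe.h's strlcatbuff).
   Appends src to the NUL-terminated string in dst, whose total capacity is
   cap bytes. Returns 0 on success, or -1 (dst left unchanged) if the result
   plus its terminator would not fit. */
static int cat_bounded(char *dst, size_t cap, const char *src) {
  const char *nul;
  size_t used, add;

  if (dst == NULL || src == NULL || cap == 0)
    return -1;
  nul = memchr(dst, '\0', cap);
  if (nul == NULL)
    return -1; /* dst is not terminated within its capacity */
  used = (size_t) (nul - dst);
  add = strlen(src);
  if (add >= cap - used)
    return -1;                      /* no room for src plus the terminator */
  memcpy(dst + used, src, add + 1); /* copies the terminator too */
  return 0;
}

static void demo_cat(void) {
  char data[8] = "";

  assert(cat_bounded(data, sizeof(data), "Host") == 0);
  assert(cat_bounded(data, sizeof(data), "\r\n") == 0);
  assert(strcmp(data, "Host\r\n") == 0);
  /* 6 bytes used: two more plus the terminator would need 9 */
  assert(cat_bounded(data, sizeof(data), "xx") == -1);
  assert(strcmp(data, "Host\r\n") == 0); /* unchanged after the refusal */
}
```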
|
||||
|
||||
@@ -40,6 +40,9 @@ Please visit our Website: http://www.httrack.com
|
||||
/* Library internal definitions */
|
||||
#ifdef HTS_INTERNAL_BYTECODE
|
||||
|
||||
// Capacity contract for the catch_url() 'data' buffer (32 Kb).
|
||||
#define CATCH_URL_DATA_SIZE 32768
|
||||
|
||||
// Functions
|
||||
void socinput(T_SOC soc, char *s, int max);
|
||||
|
||||
|
||||
@@ -40,11 +40,13 @@ Please visit our Website: http://www.httrack.com
|
||||
#include "htscore.h"
|
||||
#include "htsdefines.h"
|
||||
#include "htsalias.h"
|
||||
#include "htsbauth.h"
|
||||
#include "htswrap.h"
|
||||
#include "htsmodules.h"
|
||||
#include "htszlib.h"
|
||||
#include "htscharset.h"
|
||||
#include "htsencoding.h"
|
||||
#include "htscache_selftest.h"
|
||||
#include "htsmd5.h"
|
||||
|
||||
#include <ctype.h>
|
||||
@@ -138,6 +140,110 @@ static void basic_selftests(void) {
|
||||
fil_normalized(source, buffer);
|
||||
// MD5 selftests
|
||||
md5selftest();
|
||||
// cookie_get field extraction (tab-separated, 0-based)
|
||||
{
|
||||
char cbuf[8192];
|
||||
|
||||
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 0), "a") == 0);
|
||||
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 1), "b") == 0);
|
||||
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 2), "c") == 0);
|
||||
// multi-char fields catch length/boundary bugs that 1-char fields hide
|
||||
assertf(strcmp(cookie_get(cbuf, "host\tx\t/path/to", 0), "host") == 0);
|
||||
assertf(strcmp(cookie_get(cbuf, "host\tx\t/path/to", 2), "/path/to") == 0);
|
||||
assertf(strcmp(cookie_get(cbuf, "a\t\tc", 1), "") == 0); // empty field
|
||||
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 9), "") == 0); // beyond last
|
||||
}
|
||||
}
|
||||
|
||||
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
|
||||
Returns 0 if every bounded operation behaved correctly, 1 otherwise.
|
||||
The abort-on-overflow guarantee is checked separately by the -#8 "overflow"
|
||||
sub-mode (it aborts the process by design). */
|
||||
static int string_safety_selftests(void) {
|
||||
char buf[8];
|
||||
|
||||
/* strcpybuff into a sized array: exact copy */
|
||||
strcpybuff(buf, "abc");
|
||||
if (strcmp(buf, "abc") != 0)
|
||||
return 1;
|
||||
|
||||
/* strcatbuff append within capacity */
|
||||
strcatbuff(buf, "de");
|
||||
if (strcmp(buf, "abcde") != 0)
|
||||
return 1;
|
||||
|
||||
/* strncatbuff appends at most N source chars */
|
||||
strcpybuff(buf, "ab");
|
||||
strncatbuff(buf, "cdef", 2);
|
||||
if (strcmp(buf, "abcd") != 0)
|
||||
return 1;
|
||||
|
||||
/* strlcpybuff: explicit-capacity copy into a pointer destination, the form
|
||||
the migration moves toward */
|
||||
{
|
||||
char storage[8];
|
||||
char *const p = storage;
|
||||
|
||||
strlcpybuff(p, "hello", sizeof(storage));
|
||||
if (strcmp(p, "hello") != 0)
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* strcpybuff into a pointer destination: routes through the unchecked
|
||||
strcpybuff_ptr_ fallback (the path the -#8 warning flags). The warning is
|
||||
intentional here; we only verify the fallback still copies correctly. */
|
||||
#if defined(__GNUC__)
|
||||
#pragma GCC diagnostic push
|
||||
#pragma GCC diagnostic ignored "-Wattribute-warning"
|
||||
#endif
|
||||
{
|
||||
char storage[8];
|
||||
char *const p = storage;
|
||||
|
||||
strcpybuff(p, "ptr");
|
||||
if (strcmp(p, "ptr") != 0)
|
||||
return 1;
|
||||
}
|
||||
#if defined(__GNUC__)
|
||||
#pragma GCC diagnostic pop
|
||||
#endif
|
||||
|
||||
/* htsbuff: bounded builder over a fixed array (append, truncating append,
|
||||
reset, and length tracking) */
|
||||
{
|
||||
char dst[8];
|
||||
htsbuff b = htsbuff_array(dst);
|
||||
|
||||
htsbuff_cat(&b, "ab");
|
||||
htsbuff_cat(&b, "cd");
|
||||
if (strcmp(htsbuff_str(&b), "abcd") != 0 || b.len != 4)
|
||||
return 1;
|
||||
|
||||
htsbuff_catn(&b, "efghij", 2); /* append at most 2 */
|
||||
if (strcmp(htsbuff_str(&b), "abcdef") != 0)
|
||||
return 1;
|
||||
|
||||
htsbuff_cpy(&b, "xyz"); /* reset */
|
||||
if (strcmp(htsbuff_str(&b), "xyz") != 0 || b.len != 3)
|
||||
return 1;
|
||||
|
||||
htsbuff_catc(&b, '!'); /* single character */
|
||||
if (strcmp(htsbuff_str(&b), "xyz!") != 0 || b.len != 4)
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* boundary: filling to exactly cap-1 must succeed (one more aborts, which the
|
||||
-#8 overflow-buff mode checks) */
|
||||
{
|
||||
char d2[4];
|
||||
htsbuff c = htsbuff_array(d2);
|
||||
|
||||
htsbuff_cat(&c, "abc");
|
||||
if (strcmp(htsbuff_str(&c), "abc") != 0 || c.len != 3)
|
||||
return 1;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int hts_main_internal(int argc, char **argv, httrackp * opt);
|
||||
@@ -294,10 +400,10 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
|
||||
/* Check that argv[] is not empty */
|
||||
if (strnotempty(argv[na])) {
|
||||
|
||||
/* Check command (alias) */
|
||||
result =
|
||||
optalias_check(argc, (const char *const *) argv, na, &tmp_argc,
|
||||
(char **) tmp_argv, tmp_error);
|
||||
/* Resolve an option alias, if any */
|
||||
result = optalias_check(argc, (const char *const *) argv, na, &tmp_argc,
|
||||
(char **) tmp_argv, sizeof(_tmp_argv[0]),
|
||||
tmp_error, sizeof(tmp_error));
|
||||
if (!result) {
|
||||
HTS_PANIC_PRINTF(tmp_error);
|
||||
htsmain_free();
|
||||
@@ -1787,10 +1893,6 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
|
||||
HTS_PANIC_PRINTF("Empty string given");
|
||||
htsmain_free();
|
||||
return -1;
|
||||
} else if (strlen(argv[na]) >= 256) {
|
||||
HTS_PANIC_PRINTF("Header line string too long");
|
||||
htsmain_free();
|
||||
return -1;
|
||||
}
|
||||
StringCat(opt->headers, argv[na]);
|
||||
StringCat(opt->headers, "\r\n"); /* separator */
|
||||
@@ -2012,6 +2114,19 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
|
||||
case '#':{ // undocumented
|
||||
com++;
|
||||
switch (*com) {
|
||||
case 'A': // cache self-test: httrack -#A <dir>
|
||||
if (na + 1 < argc) {
|
||||
const int err = cache_selftests(opt, argv[na + 1]);
|
||||
|
||||
printf("cache-selftest: %s\n", err ? "FAIL" : "OK");
|
||||
htsmain_free();
|
||||
return err;
|
||||
} else {
|
||||
fprintf(stderr, "Option #A requires a directory argument\n");
|
||||
htsmain_free();
|
||||
return 1;
|
||||
}
|
||||
break;
|
||||
case 'C': // list cache files : httrack -#C '*spid*.gif' will attempt to find the matching file
|
||||
{
|
||||
int hasFilter = 0;
|
||||
@@ -2054,8 +2169,8 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
|
||||
char firstline[256];
|
||||
char *a = cacheNdx;
|
||||
|
||||
a += cache_brstr(a, firstline);
|
||||
a += cache_brstr(a, firstline);
|
||||
a += cache_brstr(a, firstline, sizeof(firstline));
|
||||
a += cache_brstr(a, firstline, sizeof(firstline));
|
||||
while(a != NULL) {
|
||||
a = strchr(a + 1, '\n'); /* start of line */
|
||||
if (a) {
|
||||
@@ -2441,6 +2556,35 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
|
||||
htsmain_free();
|
||||
return 0;
|
||||
break;
|
||||
case '8': /* string-safety selftest: httrack -#8 [overflow <bigstr>] */
|
||||
if (na + 1 < argc
|
||||
&& strncmp(argv[na + 1], "overflow", 8) == 0) {
|
||||
/* Deliberately exceed a sized buffer: the bounded op must
|
||||
abort. The source comes from argv so its length is opaque
|
||||
to the compiler (no static -Wstringop-overflow, genuine
|
||||
runtime check). "overflow-buff" exercises htsbuff. */
|
||||
char small[4];
|
||||
const char *const src =
|
||||
(na + 2 < argc) ? argv[na + 2] : "overflowing";
|
||||
|
||||
if (strcmp(argv[na + 1], "overflow-buff") == 0) {
|
||||
htsbuff b = htsbuff_array(small);
|
||||
|
||||
htsbuff_cat(&b, src);
|
||||
} else {
|
||||
strcpybuff(small, src);
|
||||
}
|
||||
printf("strsafe: NOT aborted\n"); /* must be unreachable */
|
||||
htsmain_free();
|
||||
return 1;
|
||||
} else {
|
||||
const int err = string_safety_selftests();
|
||||
|
||||
printf("strsafe: %s\n", err ? "FAIL" : "OK");
|
||||
htsmain_free();
|
||||
return err;
|
||||
}
|
||||
break;
|
||||
case '7': // hashtable selftest: httrack -#7 nb_entries
|
||||
basic_selftests();
|
||||
if (++na < argc) {
|
||||
@@ -2691,11 +2835,6 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
|
||||
return -1;
|
||||
} else {
|
||||
na++;
|
||||
if (strlen(argv[na]) >= 126) {
|
||||
HTS_PANIC_PRINTF("User-agent length too long");
|
||||
htsmain_free();
|
||||
return -1;
|
||||
}
|
||||
StringCopy(opt->user_agent, argv[na]);
|
||||
if (StringNotEmpty(opt->user_agent))
|
||||
opt->user_agent_send = 1;
|
||||
|
||||
@@ -409,7 +409,7 @@ void help_catchurl(const char *dest_path) {
|
||||
if (soc != INVALID_SOCKET) {
|
||||
char BIGSTK url[HTS_URLMAXSIZE * 2];
|
||||
char method[32];
|
||||
char BIGSTK data[32768];
|
||||
char BIGSTK data[CATCH_URL_DATA_SIZE];
|
||||
|
||||
url[0] = method[0] = data[0] = '\0';
|
||||
//
|
||||
|
||||
149
src/htslib.c
@@ -878,7 +878,7 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
|
||||
const char *xsend, const char *adr, const char *fil,
|
||||
const char *referer_adr, const char *referer_fil,
|
||||
htsblk * retour) {
|
||||
char BIGSTK buffer_head_request[8192];
|
||||
char BIGSTK buffer_head_request[16384];
|
||||
buff_struct bstr = { buffer_head_request, sizeof(buffer_head_request), 0 };
|
||||
|
||||
//int use_11=0; // HTTP 1.1 in use
|
||||
@@ -1660,138 +1660,107 @@ void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * ret
|
||||
}
|
||||
}
|
||||
|
||||
// turns the status code into a message string
|
||||
HTSEXT_API void infostatuscode(char *msg, int statuscode) {
|
||||
// HTTP status code -> reason phrase (per RFC), or NULL if unknown.
|
||||
HTSEXT_API const char *infostatuscode_const(int statuscode) {
|
||||
// O(1) dispatch (the compiler builds a jump table); the phrases are static.
|
||||
switch (statuscode) {
|
||||
// HTTP errors, per the RFC
|
||||
case 100:
|
||||
strcpybuff(msg, "Continue");
|
||||
break;
|
||||
return "Continue";
|
||||
case 101:
|
||||
strcpybuff(msg, "Switching Protocols");
|
||||
break;
|
||||
return "Switching Protocols";
|
||||
case 200:
|
||||
strcpybuff(msg, "OK");
|
||||
break;
|
||||
return "OK";
|
||||
case 201:
|
||||
strcpybuff(msg, "Created");
|
||||
break;
|
||||
return "Created";
|
||||
case 202:
|
||||
strcpybuff(msg, "Accepted");
|
||||
break;
|
||||
return "Accepted";
|
||||
case 203:
|
||||
strcpybuff(msg, "Non-Authoritative Information");
|
||||
break;
|
||||
return "Non-Authoritative Information";
|
||||
case 204:
|
||||
strcpybuff(msg, "No Content");
|
||||
break;
|
||||
return "No Content";
|
||||
case 205:
|
||||
strcpybuff(msg, "Reset Content");
|
||||
break;
|
||||
return "Reset Content";
|
||||
case 206:
|
||||
strcpybuff(msg, "Partial Content");
|
||||
break;
|
||||
return "Partial Content";
|
||||
case 300:
|
||||
strcpybuff(msg, "Multiple Choices");
|
||||
break;
|
||||
return "Multiple Choices";
|
||||
case 301:
|
||||
strcpybuff(msg, "Moved Permanently");
|
||||
break;
|
||||
return "Moved Permanently";
|
||||
case 302:
|
||||
strcpybuff(msg, "Moved Temporarily");
|
||||
break;
|
||||
return "Moved Temporarily";
|
||||
case 303:
|
||||
strcpybuff(msg, "See Other");
|
||||
break;
|
||||
return "See Other";
|
||||
case 304:
|
||||
strcpybuff(msg, "Not Modified");
|
||||
break;
|
||||
return "Not Modified";
|
||||
case 305:
|
||||
strcpybuff(msg, "Use Proxy");
|
||||
break;
|
||||
return "Use Proxy";
|
||||
case 306:
|
||||
strcpybuff(msg, "Undefined 306 error");
|
||||
break;
|
||||
return "Undefined 306 error";
|
||||
case 307:
|
||||
strcpybuff(msg, "Temporary Redirect");
|
||||
break;
|
||||
return "Temporary Redirect";
|
||||
case 400:
|
||||
strcpybuff(msg, "Bad Request");
|
||||
break;
|
||||
return "Bad Request";
|
||||
case 401:
|
||||
strcpybuff(msg, "Unauthorized");
|
||||
break;
|
||||
return "Unauthorized";
|
||||
case 402:
|
||||
strcpybuff(msg, "Payment Required");
|
||||
break;
|
||||
return "Payment Required";
|
||||
case 403:
|
||||
strcpybuff(msg, "Forbidden");
|
||||
break;
|
||||
return "Forbidden";
|
||||
case 404:
|
||||
strcpybuff(msg, "Not Found");
|
||||
break;
|
||||
return "Not Found";
|
||||
case 405:
|
||||
strcpybuff(msg, "Method Not Allowed");
|
||||
break;
|
||||
return "Method Not Allowed";
|
||||
case 406:
|
||||
strcpybuff(msg, "Not Acceptable");
|
||||
break;
|
||||
return "Not Acceptable";
|
||||
case 407:
|
||||
strcpybuff(msg, "Proxy Authentication Required");
|
||||
break;
|
||||
return "Proxy Authentication Required";
|
||||
case 408:
|
||||
strcpybuff(msg, "Request Time-out");
|
||||
break;
|
||||
return "Request Time-out";
|
||||
case 409:
|
||||
strcpybuff(msg, "Conflict");
|
||||
break;
|
||||
return "Conflict";
|
||||
case 410:
|
||||
strcpybuff(msg, "Gone");
|
||||
break;
|
||||
return "Gone";
|
||||
case 411:
|
||||
strcpybuff(msg, "Length Required");
|
||||
break;
|
||||
return "Length Required";
|
||||
case 412:
|
||||
strcpybuff(msg, "Precondition Failed");
|
||||
break;
|
||||
return "Precondition Failed";
|
||||
case 413:
|
||||
strcpybuff(msg, "Request Entity Too Large");
|
||||
break;
|
||||
return "Request Entity Too Large";
|
||||
case 414:
|
||||
strcpybuff(msg, "Request-URI Too Large");
|
||||
break;
|
||||
return "Request-URI Too Large";
|
||||
case 415:
|
||||
strcpybuff(msg, "Unsupported Media Type");
|
||||
break;
|
||||
return "Unsupported Media Type";
|
||||
case 416:
|
||||
strcpybuff(msg, "Requested Range Not Satisfiable");
|
||||
break;
|
||||
return "Requested Range Not Satisfiable";
|
||||
case 417:
|
||||
strcpybuff(msg, "Expectation Failed");
|
||||
break;
|
||||
return "Expectation Failed";
|
||||
case 500:
|
||||
strcpybuff(msg, "Internal Server Error");
|
||||
break;
|
||||
return "Internal Server Error";
|
||||
case 501:
|
||||
strcpybuff(msg, "Not Implemented");
|
||||
break;
|
||||
return "Not Implemented";
|
||||
case 502:
|
||||
strcpybuff(msg, "Bad Gateway");
|
||||
break;
|
||||
return "Bad Gateway";
|
||||
case 503:
|
||||
strcpybuff(msg, "Service Unavailable");
|
||||
break;
|
||||
return "Service Unavailable";
|
||||
case 504:
|
||||
strcpybuff(msg, "Gateway Time-out");
|
||||
break;
|
||||
return "Gateway Time-out";
|
||||
case 505:
|
||||
strcpybuff(msg, "HTTP Version Not Supported");
|
||||
break;
|
||||
//
|
||||
return "HTTP Version Not Supported";
|
||||
default:
|
||||
if (strnotempty(msg) == 0)
|
||||
strcpybuff(msg, "Unknown error");
|
||||
break;
|
||||
return NULL;
|
||||
}
|
||||
}
|
||||
|
||||
// Write the status code's reason phrase into msg. For an unknown code, keep any
|
||||
// caller-provided message, otherwise fall back to a default. Callers provide a
|
||||
// buffer of at least 64 bytes (the longest reason phrase is 31).
|
||||
HTSEXT_API void infostatuscode(char *msg, int statuscode) {
|
||||
const char *const text = infostatuscode_const(statuscode);
|
||||
|
||||
if (text != NULL) {
|
||||
strlcpybuff(msg, text, 64);
|
||||
} else if (strnotempty(msg) == 0) {
|
||||
strlcpybuff(msg, "Unknown error", 64);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
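Review note: the hunk above splits the reason-phrase lookup into a const-returning function and a thin wrapper that copies into the caller's buffer. A minimal sketch of that shape follows, assuming hypothetical names (`reason_phrase_const`, `reason_phrase_into`) and only a handful of codes; it is not the engine's actual table and uses `snprintf` in place of `strlcpybuff`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for infostatuscode_const(): returns a static reason
   phrase, or NULL when the code is not in the (abbreviated) table. */
static const char *reason_phrase_const(int statuscode) {
  switch (statuscode) {
  case 200:
    return "OK";
  case 301:
    return "Moved Permanently";
  case 404:
    return "Not Found";
  case 500:
    return "Internal Server Error";
  default:
    return NULL;
  }
}

/* Hypothetical stand-in for infostatuscode(): writes the phrase into msg,
   which has room for cap bytes. For an unknown code, a non-empty
   caller-provided message is kept; an empty one gets a default. */
static void reason_phrase_into(char *msg, size_t cap, int statuscode) {
  const char *const text = reason_phrase_const(statuscode);

  assert(cap != 0);
  if (text != NULL) {
    snprintf(msg, cap, "%s", text); /* bounded copy, always NUL-terminated */
  } else if (msg[0] == '\0') {
    snprintf(msg, cap, "%s", "Unknown error");
  }
}
```

Callers that only need to read the phrase can use the const variant and skip the buffer entirely; the wrapper remains for callers that already own a message buffer.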
186
src/htsname.c
@@ -767,7 +767,7 @@ int url_savename(lien_adrfilsave *const afs,
|
||||
// optionally add the site name first
|
||||
if (opt->savename_type == -1) { // use savename_userdef! (%h%p/%n%q.%t)
|
||||
const char *a = StringBuff(opt->savename_userdef);
|
||||
char *b = afs->save;
|
||||
htsbuff sb = htsbuff_array(afs->save);
|
||||
|
||||
/*char *nom_pos=NULL,*dot_pos=NULL; // Position nom et point */
|
||||
char tok;
|
||||
@@ -787,17 +787,16 @@ int url_savename(lien_adrfilsave *const afs,
|
||||
}
|
||||
*/
|
||||
|
||||
// Construire nom
|
||||
while((*a) && (((int) (b - afs->save)) < HTS_URLMAXSIZE)) { // parser, et pas trop long..
|
||||
// build the name
|
||||
while ((*a) && (sb.len < HTS_URLMAXSIZE)) { // parse, but not too long
|
||||
if (*a == '%') {
|
||||
int short_ver = 0;
|
||||
|
||||
a++;
|
||||
if (*a == 's') {
|
||||
if (*a == 's') { // '%s...' selects the short (8.3) form
|
||||
short_ver = 1;
|
||||
a++;
|
||||
}
|
||||
*b = '\0';
|
||||
switch (tok = *a++) {
|
||||
case '[': // %[param:prefix_if_not_empty:suffix_if_not_empty:empty_replacement:notfound_replacement]
|
||||
if (strchr(a, ']')) {
|
||||
@@ -834,8 +833,7 @@ int url_savename(lien_adrfilsave *const afs,
|
||||
}
|
||||
if (cp) {
|
||||
c = cp + strlen(name[0]); /* jumps "param=" */
|
||||
strcpybuff(b, name[1]); /* prefix */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, name[1]); /* prefix */
|
||||
if (*c != '\0' && *c != '&') {
|
||||
char *d = name[0];
|
||||
|
||||
@@ -846,110 +844,90 @@ int url_savename(lien_adrfilsave *const afs,
|
||||
*d = '\0';
|
||||
d = unescape_http(catbuff, sizeof(catbuff), name[0]);
|
||||
if (d && *d) {
|
||||
strcpybuff(b, d); /* value */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, d); /* value */
|
||||
} else {
|
||||
strcpybuff(b, name[3]); /* empty replacement if any */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, name[3]); /* empty replacement if any */
|
||||
}
|
||||
} else {
|
||||
strcpybuff(b, name[3]); /* empty replacement if any */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, name[3]); /* empty replacement if any */
|
||||
}
|
||||
strcpybuff(b, name[2]); /* suffix */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, name[2]); /* suffix */
|
||||
} else {
|
||||
strcpybuff(b, name[4]); /* not found replacement if any */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, name[4]); /* not found replacement if any */
|
||||
}
|
||||
} else {
|
||||
strcpybuff(b, name[4]); /* not found replacement if any */
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, name[4]); /* not found replacement if any */
|
||||
}
|
||||
}
|
||||
break;
|
||||
case '%':
|
||||
*b++ = '%';
|
||||
htsbuff_catc(&sb, '%');
|
||||
break;
|
||||
case 'n': // nom sans ext
|
||||
*b = '\0';
|
||||
case 'n': // name without extension
|
||||
if (dot_pos) {
|
||||
if (!short_ver) // Noms longs
|
||||
strncatbuff(b, nom_pos, (int) (dot_pos - nom_pos));
|
||||
if (!short_ver)
|
||||
htsbuff_catn(&sb, nom_pos, (int) (dot_pos - nom_pos));
|
||||
else
|
||||
strncatbuff(b, nom_pos, min((int) (dot_pos - nom_pos), 8));
|
||||
htsbuff_catn(&sb, nom_pos, min((int) (dot_pos - nom_pos), 8));
|
||||
} else {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, nom_pos);
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, nom_pos);
|
||||
else
|
||||
strncatbuff(b, nom_pos, 8);
|
||||
htsbuff_catn(&sb, nom_pos, 8);
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
break;
|
||||
case 'N': // nom avec ext
|
||||
// RECOPIE NOM + EXT
|
||||
*b = '\0';
|
||||
case 'N': // name with extension
|
||||
if (dot_pos) {
|
||||
if (!short_ver) // Noms longs
|
||||
strncatbuff(b, nom_pos, (int) (dot_pos - nom_pos));
|
||||
if (!short_ver)
|
||||
htsbuff_catn(&sb, nom_pos, (int) (dot_pos - nom_pos));
|
||||
else
|
||||
strncatbuff(b, nom_pos, min((int) (dot_pos - nom_pos), 8));
|
||||
htsbuff_catn(&sb, nom_pos, min((int) (dot_pos - nom_pos), 8));
|
||||
} else {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, nom_pos);
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, nom_pos);
|
||||
else
|
||||
strncatbuff(b, nom_pos, 8);
|
||||
htsbuff_catn(&sb, nom_pos, 8);
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
*b = '.';
|
||||
++b;
|
||||
// RECOPIE NOM + EXT
|
||||
*b = '\0';
|
||||
htsbuff_catc(&sb, '.');
|
||||
if (dot_pos) {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, dot_pos + 1);
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, dot_pos + 1);
|
||||
else
|
||||
strncatbuff(b, dot_pos + 1, 3);
|
||||
htsbuff_catn(&sb, dot_pos + 1, 3);
|
||||
} else {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, DEFAULT_EXT + 1); // pas de..
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, DEFAULT_EXT + 1); // skip the leading dot
|
||||
else
|
||||
strcpybuff(b, DEFAULT_EXT_SHORT + 1); // pas de..
|
||||
htsbuff_cat(&sb, DEFAULT_EXT_SHORT + 1); // skip the leading dot
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
//
|
||||
break;
|
||||
case 't': // ext
|
||||
*b = '\0';
|
||||
case 't': // extension
|
||||
if (dot_pos) {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, dot_pos + 1);
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, dot_pos + 1);
|
||||
else
|
||||
strncatbuff(b, dot_pos + 1, 3);
|
||||
htsbuff_catn(&sb, dot_pos + 1, 3);
|
||||
} else {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, DEFAULT_EXT + 1); // pas de..
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, DEFAULT_EXT + 1); // skip the leading dot
|
||||
else
|
||||
strcpybuff(b, DEFAULT_EXT_SHORT + 1); // pas de..
|
||||
htsbuff_cat(&sb, DEFAULT_EXT_SHORT + 1); // skip the leading dot
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
break;
|
||||
case 'p': // path sans dernier /
|
||||
*b = '\0';
|
||||
if (nom_pos != fil + 1) { // pas: /index.html (chemin nul)
|
||||
if (!short_ver) { // Noms longs
|
||||
strncatbuff(b, fil, (int) (nom_pos - fil) - 1);
|
||||
case 'p': // path without trailing /
|
||||
if (nom_pos !=
|
||||
fil + 1) { // skip when the path is empty (e.g. /index.html)
|
||||
if (!short_ver) {
|
||||
htsbuff_catn(&sb, fil, (int) (nom_pos - fil) - 1);
|
||||
} else {
|
||||
char BIGSTK pth[HTS_URLMAXSIZE * 2], n83[HTS_URLMAXSIZE * 2];
|
||||
|
||||
pth[0] = n83[0] = '\0';
|
||||
//
|
||||
strncatbuff(pth, fil, (int) (nom_pos - fil) - 1);
|
||||
long_to_83(opt->savename_83, n83, pth);
|
||||
strcpybuff(b, n83);
|
||||
htsbuff_cat(&sb, n83);
|
||||
}
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
break;
|
||||
case 'h': // host (IDNA decoded if suitable)
|
||||
// IDNA / RFC 3492 (Punycode) handling for HTTP(s)
|
||||
@@ -957,62 +935,50 @@ int url_savename(lien_adrfilsave *const afs,
|
||||
DECLARE_ADR(final_adr);
|
||||
|
||||
/* Copy address */
|
||||
*b = '\0';
|
||||
if (!short_ver)
|
||||
strcpybuff(b, final_adr);
|
||||
htsbuff_cat(&sb, final_adr);
|
||||
else
|
||||
strcpybuff(b, final_adr);
|
||||
htsbuff_cat(&sb, final_adr);
|
||||
|
||||
/* release */
|
||||
RELEASE_ADR();
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
break;
|
||||
case 'H': // host, raw (old mode)
|
||||
*b = '\0';
|
||||
case 'H': // host, raw (old mode)
|
||||
if (protocol == PROTOCOL_FILE) {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, "localhost");
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, "localhost");
|
||||
else
|
||||
strcpybuff(b, "local");
|
||||
htsbuff_cat(&sb, "local");
|
||||
} else {
|
||||
if (!short_ver) // Noms longs
|
||||
strcpybuff(b, print_adr);
|
||||
if (!short_ver)
|
||||
htsbuff_cat(&sb, print_adr);
|
||||
else
|
||||
strncatbuff(b, print_adr, 8);
|
||||
htsbuff_catn(&sb, print_adr, 8);
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
break;
|
||||
case 'M': /* host/address?query MD5 (128-bits) */
|
||||
*b = '\0';
|
||||
{
|
||||
char digest[32 + 2];
|
||||
char BIGSTK buff[HTS_URLMAXSIZE * 2];
|
||||
case 'M': /* host/address?query MD5 (128-bits) */
|
||||
{
|
||||
char digest[32 + 2];
|
||||
char BIGSTK buff[HTS_URLMAXSIZE * 2];
|
||||
|
||||
digest[0] = buff[0] = '\0';
|
||||
strcpybuff(buff, adr);
|
||||
strcatbuff(buff, fil_complete);
|
||||
domd5mem(buff, strlen(buff), digest, 1);
|
||||
strcpybuff(b, digest);
|
||||
}
|
||||
b += strlen(b); // pointer à la fin
|
||||
break;
|
||||
digest[0] = buff[0] = '\0';
|
||||
strcpybuff(buff, adr);
|
||||
strcatbuff(buff, fil_complete);
|
||||
domd5mem(buff, strlen(buff), digest, 1);
|
||||
htsbuff_cat(&sb, digest);
|
||||
} break;
|
||||
case 'Q':
|
||||
case 'q': /* query MD5 (128-bits/16-bits)
|
||||
GENERATED ONLY IF query string exists! */
|
||||
{
|
||||
char md5[32 + 2];
|
||||
case 'q': /* query MD5 (128-bits/16-bits)
|
||||
GENERATED ONLY IF query string exists! */
|
||||
{
|
||||
char md5[32 + 2];
|
||||
|
||||
*b = '\0';
|
||||
strncatbuff(b, url_md5(md5, fil_complete), (tok == 'Q') ? 32 : 4);
|
||||
b += strlen(b); // pointer à la fin
|
||||
}
|
||||
break;
|
||||
htsbuff_catn(&sb, url_md5(md5, fil_complete), (tok == 'Q') ? 32 : 4);
|
||||
} break;
|
||||
case 'r':
|
||||
case 'R': // protocol
|
||||
*b = '\0';
|
||||
strcatbuff(b, protocol_str[protocol]);
|
||||
b += strlen(b); // pointer à la fin
|
||||
htsbuff_cat(&sb, protocol_str[protocol]);
|
||||
break;
|
||||
|
||||
/* Patch by Juan Fco Rodriguez to get the full query string */
|
||||
@@ -1021,19 +987,17 @@ int url_savename(lien_adrfilsave *const afs,
|
||||
char *d = strchr(fil_complete, '?');
|
||||
|
||||
if (d != NULL) {
|
||||
strcatbuff(b, d);
|
||||
b += strlen(b);
|
||||
htsbuff_cat(&sb, d);
|
||||
}
|
||||
}
|
||||
break;
|
||||
|
||||
}
|
||||
} else
|
||||
*b++ = *a++;
|
||||
htsbuff_catc(&sb, *a++);
|
||||
}
|
||||
*b++ = '\0';
|
||||
//
|
||||
// Types prédéfinis
|
||||
// predefined types
|
||||
//
|
||||
|
||||
}
|
||||
|
||||
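Review note: the `url_savename()` hunks above replace the manual cursor (`strcpybuff(b, ...)` followed by `b += strlen(b)`) with appends that track length and capacity. The sketch below illustrates the same idea on a much smaller template language. The helper names (`append_n`, `expand_template`) are hypothetical, only `%n`, `%t` and `%%` are handled, and overflow is reported with a return code here, whereas the htsbuff helpers in the diff abort.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded appender: copies at most n bytes of s (stopping at its
   NUL) to out[*len..] and keeps out NUL-terminated. Returns 0 on success, or
   -1 if the result would not fit in cap bytes. Requires *len < cap. */
static int append_n(char *out, size_t cap, size_t *len, const char *s,
                    size_t n) {
  size_t add = 0;

  while (add < n && s[add] != '\0')
    add++;
  if (add >= cap - *len) /* no room for the data plus the NUL */
    return -1;
  memcpy(out + *len, s, add);
  *len += add;
  out[*len] = '\0';
  return 0;
}

/* Expands %n (name), %t (extension) and %% in tpl. Unknown tokens are dropped
   and a lone trailing '%' is copied literally (choices made for this sketch).
   Returns 0 on success, -1 on overflow. */
static int expand_template(char *out, size_t cap, const char *tpl,
                           const char *name, const char *ext) {
  size_t len = 0;

  assert(cap != 0);
  out[0] = '\0';
  while (*tpl != '\0') {
    int rc = 0;

    if (*tpl == '%' && tpl[1] != '\0') {
      const char tok = tpl[1];

      tpl += 2;
      switch (tok) {
      case 'n':
        rc = append_n(out, cap, &len, name, (size_t) -1);
        break;
      case 't':
        rc = append_n(out, cap, &len, ext, (size_t) -1);
        break;
      case '%':
        rc = append_n(out, cap, &len, "%", 1);
        break;
      default:
        break;
      }
    } else {
      rc = append_n(out, cap, &len, tpl, 1);
      tpl++;
    }
    if (rc != 0)
      return -1;
  }
  return 0;
}
```

Because every write goes through one appender, there is a single place where the bound is checked, which is the property the migration to `htsbuff_cat()`/`htsbuff_catn()` appears to be after.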
152
src/htssafe.h
@@ -123,41 +123,111 @@ static HTS_UNUSED void htssafe_compile_time_check_(void) {
|
||||
(void) check_pointer;
|
||||
}
|
||||
|
||||
/*
|
||||
* Pointer-destination diagnostics for the buff() macros (GCC/Clang, C only).
|
||||
*
|
||||
* strcpybuff()/strcatbuff()/strncatbuff() bounds-check only when the
|
||||
* destination is a sized char[] array (HTS_IS_CHAR_BUFFER). For a bare char*
|
||||
* the capacity is unknown, so the macro silently falls back to plain
|
||||
* strcpy()/strcat()/strncat() while still looking like a checked call.
|
||||
*
|
||||
* These stubs route that pointer case through __builtin_choose_expr() so the
|
||||
* 'warning' attribute fires only at pointer-destination sites; array sites use
|
||||
* the bounded *_safe_ helpers and stay quiet. The warning names the
|
||||
* explicit-size replacement (strlcpybuff/strlcatbuff). Diagnostic only: no
|
||||
* runtime or ABI change, built only on GCC/Clang in C mode. Other compilers
|
||||
* (MSVC, ...) keep the previous behavior via the #else branches.
|
||||
*/
|
||||
#if (defined(__GNUC__) && !defined(__cplusplus))
|
||||
#if defined(__has_attribute)
|
||||
#if __has_attribute(warning)
|
||||
#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline, warning(msg)))
|
||||
#endif
|
||||
#endif
|
||||
#ifndef HTS_BUFF_PTR_ATTR
|
||||
/* 'warning' attribute unavailable: keep noinline so the migration can still
|
||||
grep for these symbols, but no compile-time diagnostic is emitted. */
|
||||
#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline))
|
||||
#endif
|
||||
|
||||
HTS_BUFF_PTR_ATTR("strcpybuff() destination is a pointer (capacity unknown): "
|
||||
"NOT bounds-checked; use strlcpybuff(dst, src, size)")
|
||||
static char *strcpybuff_ptr_(char *dest, const char *src) {
|
||||
return strcpy(dest, src);
|
||||
}
|
||||
|
||||
HTS_BUFF_PTR_ATTR("strcatbuff() destination is a pointer (capacity unknown): "
|
||||
"NOT bounds-checked; use strlcatbuff(dst, src, size)")
|
||||
static char *strcatbuff_ptr_(char *dest, const char *src) {
|
||||
return strcat(dest, src);
|
||||
}
|
||||
|
||||
HTS_BUFF_PTR_ATTR("strncatbuff() destination is a pointer (capacity unknown): "
|
||||
"NOT bounds-checked; use strlcatbuff(dst, src, size)")
|
||||
static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
|
||||
return strncat(dest, src, n);
|
||||
}
|
||||
#endif
|
||||
|
||||
/**
|
||||
* Append at most N characters from "B" to "A".
|
||||
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
|
||||
* is assumed to be the capacity of this array.
|
||||
*/
|
||||
#if (defined(__GNUC__) && !defined(__cplusplus))
|
||||
#define strncatbuff(A, B, N) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
|
||||
strncat_safe_(A, sizeof(A), B, \
|
||||
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
|
||||
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
|
||||
strncatbuff_ptr_((A), (B), (N)) )
|
||||
#else
|
||||
#define strncatbuff(A, B, N) \
|
||||
( HTS_IS_NOT_CHAR_BUFFER(A) \
|
||||
? strncat(A, B, N) \
|
||||
: strncat_safe_(A, sizeof(A), B, \
|
||||
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
|
||||
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
|
||||
#endif
|
||||
|
||||
/**
|
||||
* Append characters of "B" to "A".
|
||||
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
|
||||
* is assumed to be the capacity of this array.
|
||||
*/
|
||||
#if (defined(__GNUC__) && !defined(__cplusplus))
|
||||
#define strcatbuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
|
||||
strncat_safe_(A, sizeof(A), B, \
|
||||
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
|
||||
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
|
||||
strcatbuff_ptr_((A), (B)) )
|
||||
#else
|
||||
#define strcatbuff(A, B) \
|
||||
( HTS_IS_NOT_CHAR_BUFFER(A) \
|
||||
? strcat(A, B) \
|
||||
: strncat_safe_(A, sizeof(A), B, \
|
||||
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
|
||||
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
|
||||
#endif
|
||||
|
||||
/**
|
||||
* Copy characters from "B" to "A".
|
||||
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
|
||||
* is assumed to be the capacity of this array.
|
||||
*/
|
||||
#if (defined(__GNUC__) && !defined(__cplusplus))
|
||||
#define strcpybuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
|
||||
strcpy_safe_(A, sizeof(A), B, \
|
||||
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
|
||||
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__), \
|
||||
strcpybuff_ptr_((A), (B)) )
|
||||
#else
|
||||
#define strcpybuff(A, B) \
|
||||
( HTS_IS_NOT_CHAR_BUFFER(A) \
|
||||
? strcpy(A, B) \
|
||||
: strcpy_safe_(A, sizeof(A), B, \
|
||||
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
|
||||
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) )
|
||||
#endif
|
||||
|
||||
/**
|
||||
* Append characters of "B" to "A", "A" having a maximum capacity of "S".
|
||||
@@ -217,6 +287,88 @@ static HTS_INLINE HTS_UNUSED char* strcpy_safe_(char *const dest, const size_t s
  return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1, exp, file, line);
}

/**
 * htsbuff: a non-owning bounded string builder over a fixed buffer.
 *
 * Companion to the strcpybuff()/strcatbuff() macros for the common case of a
 * cursor walking a buffer of known capacity (building a name into a fixed
 * array, assembling a status line, etc.). It tracks the write position, bounds
 * every write against the real capacity, and aborts on overflow (same contract
 * as the *_safe_ helpers), so the error-prone manual "p += strlen(p)" dance
 * goes away.
 *
 * Build one from an in-scope array with htsbuff_array() (capacity via sizeof,
 * so pass an array, not a pointer), or from a pointer of known capacity with
 * htsbuff_ptr(). The buffer is kept NUL-terminated; htsbuff_str() returns it.
 */
typedef struct {
  char *buf;  /* backing buffer (kept NUL-terminated) */
  size_t cap; /* total capacity of buf, including the NUL */
  size_t len; /* current length, excluding the NUL */
} htsbuff;

static HTS_INLINE HTS_UNUSED htsbuff htsbuff_ptr_(char *buf, size_t cap) {
  htsbuff b;
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  assertf(cap != 0);
  buf[0] = '\0';
  return b;
}

/**
 * Builder over the in-scope array ARR (capacity = sizeof(ARR)).
 * On GCC/Clang this rejects a non-array (e.g. a char* pointer), whose sizeof
 * would be the pointer size and silently wrong; use htsbuff_ptr() for pointers.
 * On other compilers there is no such guard, so pass only true arrays there.
 */
#if (defined(__GNUC__) && !defined(__cplusplus))
/* 0 for an array, a -1 array-size compile error for a pointer. */
#define htsbuff_must_be_array_(A) \
  (sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), typeof(&(A)[0]))]) - 1)
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
#else
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR))
#endif
/** Builder over pointer P of known capacity N (N includes the NUL). */
#define htsbuff_ptr(P, N) htsbuff_ptr_((P), (N))

/** Append at most n characters of s (stopping at its NUL). Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s, size_t n) {
  const size_t add = strnlen(s, n);
  /* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The
     maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so
     'add < cap - len' cannot wrap the way 'len + add < cap' could. */
  assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
  memcpy(b->buf + b->len, s, add);
  b->len += add;
  b->buf[b->len] = '\0';
}

/** Append s. Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_cat(htsbuff *b, const char *s) {
  htsbuff_catn(b, s, (size_t) -1);
}

/** Append a single character (including '\0' as data). Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_catc(htsbuff *b, char c) {
  assertf__(1 < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
  b->buf[b->len++] = c;
  b->buf[b->len] = '\0';
}

/** Reset content to s. Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_cpy(htsbuff *b, const char *s) {
  b->len = 0;
  htsbuff_catn(b, s, (size_t) -1);
}

/** Current NUL-terminated content. */
static HTS_INLINE HTS_UNUSED const char *htsbuff_str(const htsbuff *b) {
  return b->buf;
}

#define malloct(A) malloc(A)
#define calloct(A,B) calloc((A), (B))
#define freet(A) do { if ((A) != NULL) { free(A); (A) = NULL; } } while(0)
|
||||
|
||||
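Review note: the comment in `htsbuff_catn()` argues that `add < cap - len` is safer than `len + add < cap`. The sketch below isolates the two comparisons so the difference can be checked directly. The function names `fits_naive` and `fits_safe` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive form: len + add is computed in size_t and wraps around modulo
   SIZE_MAX + 1 when add is very large, so the check can wrongly pass. */
static int fits_naive(size_t len, size_t add, size_t cap) {
  return len + add < cap;
}

/* Form used in the diff: with the invariant len < cap, cap - len is at least
   1 and cannot underflow, and add is compared on its own. */
static int fits_safe(size_t len, size_t add, size_t cap) {
  assert(len < cap);
  return add < cap - len;
}
```

For ordinary sizes both forms agree, for example `len = 10`, `add = 5`, `cap = 64`. They differ only when `add` is close to `SIZE_MAX`: the sum wraps to a small value, the naive check accepts it, and the subtraction-based check rejects it.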
159
src/htswizard.c
@@ -43,17 +43,23 @@ Please visit our Website: http://www.httrack.com
|
||||
/* END specific definitions */
|
||||
|
||||
// free up filters[0] so that an element can be inserted at filters[0]
|
||||
#define HT_INSERT_FILTERS0 do {\
|
||||
int i;\
|
||||
if (*opt->filters.filptr > 0) {\
|
||||
for(i = (*opt->filters.filptr)-1 ; i>=0 ; i--) {\
|
||||
strcpybuff((*opt->filters.filters)[i+1],(*opt->filters.filters)[i]);\
|
||||
}\
|
||||
}\
|
||||
(*opt->filters.filters)[0][0]='\0';\
|
||||
(*opt->filters.filptr)++;\
|
||||
assertf((*opt->filters.filptr) < opt->maxfilter); \
|
||||
} while(0)
|
||||
/* Per-slot capacity of the filters array, matching the slot stride allocated by
|
||||
filters_init() in htscore.c (HTS_URLMAXSIZE * 2). */
|
||||
#define HTS_FILTER_SLOT_SIZE (HTS_URLMAXSIZE * 2)
|
||||
|
||||
#define HT_INSERT_FILTERS0 \
|
||||
do { \
|
||||
int i; \
|
||||
if (*opt->filters.filptr > 0) { \
|
||||
for (i = (*opt->filters.filptr) - 1; i >= 0; i--) { \
|
||||
strlcpybuff((*opt->filters.filters)[i + 1], \
|
||||
(*opt->filters.filters)[i], HTS_FILTER_SLOT_SIZE); \
|
||||
} \
|
||||
} \
|
||||
(*opt->filters.filters)[0][0] = '\0'; \
|
||||
(*opt->filters.filptr)++; \
|
||||
assertf((*opt->filters.filptr) < opt->maxfilter); \
|
||||
} while (0)
|
||||
|
||||
typedef struct htspair_t {
|
||||
const char *tag;
|
||||
@@ -707,17 +713,21 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
|
||||
forbidden_url = 1;
|
||||
opt->wizard = 2; // skip everything else
|
||||
break;
|
||||
case 0: // interdire les mêmes liens: adr/fil
|
||||
case 0: // forbid the same link: adr/fil
|
||||
forbidden_url = 1;
|
||||
HT_INSERT_FILTERS0; // insérer en 0
|
||||
strcpybuff(_FILTERS[0], "-");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
if (*fil != '/')
|
||||
strcatbuff(_FILTERS[0], "/");
|
||||
strcatbuff(_FILTERS[0], fil);
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
{
|
||||
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
|
||||
htsbuff_cpy(&f, "-");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
if (*fil != '/')
|
||||
htsbuff_cat(&f, "/");
|
||||
htsbuff_cat(&f, fil);
|
||||
}
|
||||
break;
|
||||
|
||||
case 1: // éliminer répertoire entier et sous rép: adr/path/ *
|
||||
case 1: // forbid the whole directory and subdirs: adr/path/*
|
||||
forbidden_url = 1;
|
||||
{
|
||||
size_t i = strlen(fil) - 1;
|
||||
@@ -725,27 +735,34 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
|
||||
while((fil[i] != '/') && (i > 0))
|
||||
i--;
|
||||
if (fil[i] == '/') {
|
||||
HT_INSERT_FILTERS0; // insérer en 0
|
||||
strcpybuff(_FILTERS[0], "-");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
htsbuff f;
|
||||
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
htsbuff_cpy(&f, "-");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
if (*fil != '/')
|
||||
strcatbuff(_FILTERS[0], "/");
|
||||
strncatbuff(_FILTERS[0], fil, i);
|
||||
if (_FILTERS[0][strlen(_FILTERS[0]) - 1] != '/')
|
||||
strcatbuff(_FILTERS[0], "/");
|
||||
strcatbuff(_FILTERS[0], "*");
|
||||
htsbuff_cat(&f, "/");
|
||||
htsbuff_catn(&f, fil, i);
|
||||
if (f.len > 0 && f.buf[f.len - 1] != '/')
|
||||
htsbuff_cat(&f, "/");
|
||||
htsbuff_cat(&f, "*");
|
||||
}
|
||||
}
|
||||
|
||||
// ** ...
|
||||
break;
|
||||
|
||||
case 2: // adresse adr*
|
||||
case 2: // the whole address: adr*
|
||||
forbidden_url = 1;
|
||||
HT_INSERT_FILTERS0; // insérer en 0
|
||||
strcpybuff(_FILTERS[0], "-");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
strcatbuff(_FILTERS[0], "*");
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
{
|
||||
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
|
||||
htsbuff_cpy(&f, "-");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
htsbuff_cat(&f, "*");
|
||||
}
|
||||
break;
|
||||
|
||||
case 3: // ** TODO
|
||||
@@ -777,54 +794,70 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
|
||||
|
||||
break;
|
||||
|
||||
case 5: // autoriser répertoire entier et fils
|
||||
if ((opt->seeker & 2) == 0) { // interdiction de monter
|
||||
case 5: // allow the whole directory and its children
|
||||
if ((opt->seeker & 2) == 0) { // not allowed to go up
|
||||
size_t i = strlen(fil) - 1;
|
||||
|
||||
while((fil[i] != '/') && (i > 0))
|
||||
i--;
|
||||
if (fil[i] == '/') {
|
||||
HT_INSERT_FILTERS0; // insérer en 0
|
||||
strcpybuff(_FILTERS[0], "+");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
if (*fil != '/')
|
||||
strcatbuff(_FILTERS[0], "/");
|
||||
strncatbuff(_FILTERS[0], fil, i + 1);
|
||||
strcatbuff(_FILTERS[0], "*");
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
{
|
||||
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
|
||||
htsbuff_cpy(&f, "+");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
if (*fil != '/')
|
||||
htsbuff_cat(&f, "/");
|
||||
htsbuff_catn(&f, fil, i + 1);
|
||||
htsbuff_cat(&f, "*");
|
||||
}
|
||||
}
|
||||
} else { // then allow the domain
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
{
|
||||
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
|
||||
htsbuff_cpy(&f, "+");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
htsbuff_cat(&f, "*");
|
||||
}
|
||||
} else { // autoriser domaine alors!!
|
||||
HT_INSERT_FILTERS0; // insérer en 0 strcpybuff(filters[filptr],"+");
|
||||
strcpybuff(_FILTERS[0], "+");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
strcatbuff(_FILTERS[0], "*");
|
||||
}
|
||||
break;
|
||||
|
||||
case 6: // same domain
|
||||
HT_INSERT_FILTERS0; // insérer en 0 strcpybuff(filters[filptr],"+");
|
||||
strcpybuff(_FILTERS[0], "+");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
strcatbuff(_FILTERS[0], "*");
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
{
|
||||
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
|
||||
htsbuff_cpy(&f, "+");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
htsbuff_cat(&f, "*");
|
||||
}
|
||||
break;
|
||||
//
|
||||
case 7: // autoriser ce répertoire
|
||||
{
|
||||
size_t i = strlen(fil) - 1;
|
||||
case 7: // allow this directory
|
||||
{
|
||||
size_t i = strlen(fil) - 1;
|
||||
|
||||
while((fil[i] != '/') && (i > 0))
|
||||
i--;
|
||||
if (fil[i] == '/') {
|
||||
HT_INSERT_FILTERS0; // insérer en 0
|
||||
strcpybuff(_FILTERS[0], "+");
|
||||
strcatbuff(_FILTERS[0], jump_identification_const(adr));
|
||||
while ((fil[i] != '/') && (i > 0))
|
||||
i--;
|
||||
if (fil[i] == '/') {
|
||||
HT_INSERT_FILTERS0; // insert at slot 0
|
||||
{
|
||||
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
|
||||
|
||||
htsbuff_cpy(&f, "+");
|
||||
htsbuff_cat(&f, jump_identification_const(adr));
|
||||
if (*fil != '/')
|
||||
strcatbuff(_FILTERS[0], "/");
|
||||
strncatbuff(_FILTERS[0], fil, i + 1);
|
||||
strcatbuff(_FILTERS[0], "*[file]");
|
||||
htsbuff_cat(&f, "/");
|
||||
htsbuff_catn(&f, fil, i + 1);
|
||||
htsbuff_cat(&f, "*[file]");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
break;
|
||||
break;
|
||||
|
||||
case 50: // do nothing
|
||||
break;
|
||||
|
||||
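Review note: several wizard cases above build a filter of the form sign, host, directory part of the path, wildcard. The sketch below reproduces the shape of case 1 (`-adr/path/*`) as a standalone function. The name `build_dir_filter` is hypothetical, `snprintf` is used instead of the htsbuff helpers, and overflow yields a return code rather than an abort. An explicit guard for an empty `fil` is added here as an assumption, because `strlen(fil) - 1` would wrap in that case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "-<host>/<dir>/*" from a host and a file path.
   Returns 0 on success, -1 if fil contains no '/' or the result does not fit
   in cap bytes. */
static int build_dir_filter(char *out, size_t cap, const char *host,
                            const char *fil) {
  size_t i = strlen(fil);
  int n, n2;

  assert(cap != 0);
  if (i == 0)
    return -1; /* guard added in this sketch */
  i--;
  while (fil[i] != '/' && i > 0) /* walk back to the last '/' */
    i--;
  if (fil[i] != '/')
    return -1;
  /* "%.*s" copies only the first i bytes of fil, i.e. the directory part
     without its trailing '/'. */
  n = snprintf(out, cap, "-%s%s%.*s", host, (*fil != '/') ? "/" : "", (int) i,
               fil);
  if (n < 0 || (size_t) n >= cap)
    return -1;
  /* n >= 1 because "-" is always written; add the separator if missing. */
  n2 = snprintf(out + n, cap - (size_t) n, "%s*",
                (out[n - 1] != '/') ? "/" : "");
  if (n2 < 0 || (size_t) n2 >= cap - (size_t) n)
    return -1;
  return 0;
}
```

With host `example.com`, the path `/dir/page.html` gives `-example.com/dir/*` and the path `/index.html` gives `-example.com/*`.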
@@ -193,6 +193,7 @@ HTSEXT_API int structcheck(const char *path);
HTSEXT_API int structcheck_utf8(const char *path);
HTSEXT_API int dir_exists(const char *path);
HTSEXT_API void infostatuscode(char *msg, int statuscode);
HTSEXT_API const char *infostatuscode_const(int statuscode);
HTSEXT_API TStamp mtime_local(void);
HTSEXT_API void qsec2str(char *st, TStamp t);
HTSEXT_API char *int2char(strc_int2bytes2 * strc, int n);

46
tests/01_engine-cache.test
Executable file
@@ -0,0 +1,46 @@
#!/bin/bash
#

# Cache create/read/update logic (driven by 'httrack -#A <dir>').
#
# The in-process self-test stores several hand-crafted edge entries (normal
# HTML, an empty redirect with a near-limit location, a non-HTML body kept via
# all-in-cache, a binary body with embedded NUL/high bytes), a few thousand
# small entries (index/lookup scale), and a few large compressible and
# incompressible bodies (zlib deflate/inflate). It reads everything back
# asserting every header field and the body round-trip byte for byte, then
# updates one entry and confirms the new value is read back. It exits non-zero
# on the first mismatch.

set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Like the other -# debug modes, a trailing token (the working directory) is
# required; a bare '-#A' falls through to the usage screen.
out=$(httrack -#A "$dir")

# Match the exact success line, so the test cannot pass for an unrelated reason
# (e.g. the -#A mode being gone and falling through to the usage screen, which
# also exits non-zero but never prints this).
test "$out" = "cache-selftest: OK" || {
    echo "expected 'cache-selftest: OK', got: $out" >&2
    exit 1
}

# The self-test must have actually produced a ZIP cache on disk.
test -e "$dir/hts-cache/new.zip" || {
    echo "no ZIP cache was written by the self-test" >&2
    exit 1
}

# Sanity-check the cache footprint: the few-thousand-entry pass is expected to
# weigh ~1-2 MB. Fail if it balloons well past that (e.g. a per-entry overhead
# regression or runaway growth), so the cache size stays bounded.
ceiling=$((4 * 1024 * 1024))
bytes=$(du -sb "$dir/hts-cache" | cut -f1)
test "$bytes" -le "$ceiling" || {
    echo "cache footprint $bytes bytes exceeds ${ceiling} ceiling" >&2
    exit 1
}
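
The footprint check above relies on `du -sb`, which is a GNU coreutils spelling; BSD and macOS `du` do not accept `-b`. If the test ever has to run there, one rough alternative is to sum the file contents directly. The sketch below is an assumption and not part of the commit: `dir_bytes` is a hypothetical helper, the demo file names are invented, and it counts only regular-file bytes, so it will typically report a little less than `du -sb` (which also accounts for the directories themselves).

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): total size in bytes of all regular
# files under "$1", using only find, cat, wc and tr.
dir_bytes() {
    # wc -c counts the concatenated bytes; tr strips the padding that some
    # wc implementations put around the number.
    find "$1" -type f -exec cat {} + | wc -c | tr -d '[:space:]'
}

# Usage mirroring the ceiling check in the test, on a throwaway directory.
demo=$(mktemp -d)
printf 'abc' >"$demo/a.bin"
printf '12345' >"$demo/b.bin"
ceiling=$((4 * 1024 * 1024))
bytes=$(dir_bytes "$demo")
if test "$bytes" -le "$ceiling"; then
    echo "footprint ok: $bytes bytes"
fi
rm -rf "$demo"
```

Because the helper ignores directory overhead, the 4 MB ceiling would be marginally more lenient than with `du -sb`; for a regression guard of this kind that difference is unlikely to matter.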
71
tests/01_engine-cmdline.test
Executable file
@@ -0,0 +1,71 @@
#!/bin/bash
#

# Offline command-line option tests (no network). The -F user-agent and -%X
# raw-header values used to be rejected past 126 / 256 bytes (#152); they are
# now bounded only by the general per-argument cap (HTS_CDLMAXSIZE). A value up
# to that cap is accepted on both the short (-F, -%X) and long (--user-agent,
# --headers) forms, and an over-cap value is refused cleanly rather than
# overrunning a fixed scratch buffer.

set -u

tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_cmdline.XXXXXX") || exit 1
trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM

echo '<html><body>hello</body></html>' >"$tmp/index.html"

# a string of N repeated 'A' characters
nchars() {
    printf 'A%.0s' $(seq 1 "$1")
}

# crawl the local fixture with the given extra args; leaves the exit status in RC
run() {
    local out="$1"
    shift
    rm -rf "$out"
    mkdir -p "$out"
    httrack "file://$tmp/index.html" -O "$out" --quiet -n "$@" >"$out/.log" 2>&1
    RC=$?
}

# assert the value was accepted: clean exit and the fixture was mirrored
accepted() {
    { test "$RC" -eq 0 && test -n "$(find "$1" -type f -path '*/index.html' -print -quit)"; } ||
        ! echo "FAIL: $2 (exit $RC)" || exit 1
}

# assert the value was refused cleanly: a normal error exit, never a crash
# (a SIGABRT from an overflowed scratch buffer would surface as exit 134)
refused() {
    { test "$RC" -ne 0 && test "$RC" -ne 134; } ||
        ! echo "FAIL: $1 (exit $RC)" || exit 1
}

# a value past the old 126/256 caps but within the cap is accepted, on both the
# short and long form of each option
long=$(nchars 900)
run "$tmp/ua-s" -F "$long"
accepted "$tmp/ua-s" "#152: long -F user-agent rejected or crashed"
run "$tmp/ua-l" --user-agent "$long"
accepted "$tmp/ua-l" "#152: long --user-agent rejected or crashed"
run "$tmp/hd-s" "-%X" "X-A: $long"
accepted "$tmp/hd-s" "#152: long -%X header rejected or crashed"
run "$tmp/hd-l" --headers "X-B: $long"
accepted "$tmp/hd-l" "#152: long --headers rejected or crashed"

# a value just under the cap (>1000) must not overflow the long-form alias
# scratch buffer (the param[] copy in optalias_check)
run "$tmp/ua-n" --user-agent "$(nchars 1010)"
accepted "$tmp/ua-n" "#152: near-cap --user-agent overflowed the param[] buffer"

# a value over the cap is refused cleanly (graceful error, not a SIGABRT), on
# both forms
over=$(nchars 1100)
run "$tmp/ov-s" -F "$over"
refused "#152: over-cap -F not refused cleanly"
run "$tmp/ov-l" --user-agent "$over"
refused "#152: over-cap --user-agent not refused cleanly"

exit 0
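
The `accepted` and `refused` helpers lean on two shell details: the `cond || ! echo msg || exit 1` chain, and the convention that a child killed by a signal is reported as 128 plus the signal number (SIGABRT is 6 on Linux, hence 134). A minimal sketch of both follows. `check` is a hypothetical name, the subshell is added only so that `exit 1` does not end the demo, and the signal number is a platform assumption.

```shell
#!/bin/bash
# Hypothetical wrapper illustrating the `cond || ! echo msg || exit 1` chain.
# Running it in a subshell keeps `exit 1` from terminating the caller.
check() {
    ( "$@" || ! echo "FAIL: $*" || exit 1 )
}

# Condition holds: the chain short-circuits and prints nothing.
rc=0
check true || rc=$?
echo "passing check -> $rc"

# Condition fails: echo succeeds, `!` inverts that to failure, so exit 1 runs.
rc=0
check false || rc=$?
echo "failing check -> $rc"

# A child that dies from SIGABRT is reported as 128 + the signal number.
rc=0
sh -c 'kill -s ABRT $$' 2>/dev/null || rc=$?
echo "SIGABRT child -> $rc"
```

The negated `echo` is what lets the message and the exit share one line: `echo` virtually always succeeds, so `! echo` always fails and the chain falls through to `exit 1`.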
@@ -99,17 +99,25 @@ grep -Eq 'srcset="j\.gif 2x"' "$saved" ||
 ! grep -Eq 'srcset="[^"]*file://' "$saved" ||
     ! echo "FAIL: a file:// URL survived inside a rewritten srcset attribute" || exit 1
 
-# xlink:href (#298) and inline background-image (#237): detected and rewritten
-# to local; no-detect attributes (title, alt, ...) left untouched. Asserted by
-# rewrite (deterministic), not download. data-* (#201/#203) is omitted: its
-# detection is currently nondeterministic and can't be locked yet.
+# xlink:href (#298) and CSS background-image (#237): detected and rewritten to
+# local. background-image is covered in both an external <style> block and an
+# inline style attribute, with the URL unquoted, double-quoted and single-quoted
+# (the quote style is preserved on rewrite). No-detect attributes (title, alt,
+# ...) are left untouched. Asserted by rewrite (deterministic), not download.
+# data-* (#201/#203) is omitted: its detection is currently nondeterministic and
+# can't be locked yet.
 site2="$tmp/attrs"
 mkdir -p "$site2"
-for f in xl ibg tt; do gif "$site2/$f.gif"; done
+for f in xl ibg ibgs cex cexd cexs tt; do gif "$site2/$f.gif"; done
 cat >"$site2/index.html" <<EOF
-<html><body>
+<html><head><style>
+.a { background-image: url(file://$site2/cex.gif); }
+.b { background-image: url("file://$site2/cexd.gif"); }
+.c { background-image: url('file://$site2/cexs.gif'); }
+</style></head><body>
 <a xlink:href="file://$site2/xl.gif">xlink:href (#298)</a>
 <div style="background-image:url(file://$site2/ibg.gif)"></div>
+<div style="background-image:url('file://$site2/ibgs.gif')"></div>
 <span title="file://$site2/tt.gif">excluded attribute</span>
 </body></html>
 EOF
@@ -121,8 +129,24 @@ test -n "$saved2" || ! echo "FAIL: saved attrs page not found" || exit 1
 # detected attributes: the absolute URL is rewritten to a local link
 grep -Eq 'xlink:href="xl\.gif"' "$saved2" ||
     ! echo "FAIL #298: xlink:href not detected/rewritten" || exit 1
+
+# #237 external <style> block, each quoting form, quote style preserved
+grep -Eq 'url\(cex\.gif\)' "$saved2" ||
+    ! echo "FAIL #237: unquoted background-image in <style> not rewritten" || exit 1
+grep -Eq 'url\("cexd\.gif"\)' "$saved2" ||
+    ! echo "FAIL #237: double-quoted background-image in <style> not rewritten" || exit 1
+grep -Eq "url\('cexs\.gif'\)" "$saved2" ||
+    ! echo "FAIL #237: single-quoted background-image in <style> not rewritten" || exit 1
+
+# #237 inline style attribute, unquoted and single-quoted url()
 grep -Eq 'style="background-image:url\(ibg\.gif\)"' "$saved2" ||
-    ! echo "FAIL #237: inline background-image url() not detected/rewritten" || exit 1
+    ! echo "FAIL #237: inline unquoted background-image not rewritten" || exit 1
+grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
+    ! echo "FAIL #237: inline single-quoted background-image not rewritten" || exit 1
+
+# no file:// URL survived inside any rewritten background-image
+! grep -Eq 'background-image:[^;"]*file://' "$saved2" ||
+    ! echo "FAIL #237: a file:// URL survived inside a rewritten background-image" || exit 1
 
 # excluded attribute: title is on the no-detect list, so its value is left as-is
 grep -q 'title="file://' "$saved2" ||
34
tests/01_engine-strsafe.test
Executable file
@@ -0,0 +1,34 @@
#!/bin/bash
#

# htssafe.h bounded string operations (driven by 'httrack -#8').

# Success path: every bounded op (strcpybuff/strcatbuff/strncatbuff/strlcpybuff)
# must behave correctly. Like the other -# debug modes, a trailing token is
# required (a bare '-#8' falls through to the usage screen).
out=$(httrack -#8 run)
test $? -eq 0 || exit 1
test "$out" == "strsafe: OK" || exit 1

# Overflow path: an over-capacity write into a sized buffer must be caught by
# the bounded macro and abort the process, not be silently truncated/completed.
# Assert the htssafe abort signature specifically, so the test cannot pass for
# an unrelated reason (e.g. the -#8 mode being gone and falling through to the
# usage screen, which also exits non-zero).
err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1)
case "$err" in
*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;;
*"overflow while copying"*) ;;
*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;;
esac

# Same guarantee for the htsbuff builder. The source is exactly the buffer
# capacity (4 bytes into a 4-byte buffer), so this also pins the boundary: a
# '<=' off-by-one in the capacity check would let it through (and print "NOT
# aborted"). Match the specific htsbuff abort message, not just any assert.
err=$(httrack -#8 overflow-buff "abcd" 2>&1)
case "$err" in
*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;;
*"htsbuff append overflow"*) ;;
*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;;
esac
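
Both overflow checks classify the captured output with a `case` whose first arm is the failure sentinel, so a run that prints "NOT aborted" can never be mistaken for a pass, and anything else (a usage screen, for example) falls into the catch-all. A standalone sketch of that three-way classification is below. `classify` is a hypothetical helper, and the sample strings are made up for the demonstration rather than taken from real httrack output.

```shell
#!/bin/bash
# Hypothetical helper: classify captured output the way the test's case
# statements do. $1 is the captured text, $2 the expected abort signature.
classify() {
    case "$1" in
    *"strsafe: NOT aborted"*) echo "not-caught" ;; # sentinel: the write went through
    *"$2"*) echo "caught" ;;                       # expected abort message present
    *) echo "unrelated" ;;                         # e.g. a usage screen
    esac
}

classify "strsafe: NOT aborted" "overflow while copying"
classify "sample: overflow while copying (made-up text)" "overflow while copying"
classify "usage: (made-up text)" "overflow while copying"
```

Arm order matters here: `case` takes the first matching pattern, so listing the sentinel first means it wins even if the output happened to contain both strings.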
62
tests/02_update-cache.test
Executable file
@@ -0,0 +1,62 @@
#!/bin/bash
#

# Update path: re-mirroring a site reads the cache (cache_readex) to decide what
# is up to date -- a path the one-shot crawl tests never exercise. Offline
# (file://), so it always runs.
#
# 1. mirror, then re-mirror unchanged -> the cache-read pass must complete clean
# (guards against a crash/abort/error in cache_readex).
# 2. change a source file, re-mirror -> the update must pick up the new content
# (guards the update decision that reads the cached metadata).

set -eu

site=$(mktemp -d)
out=$(mktemp -d)
trap 'rm -rf "$site" "$out"' EXIT

cat >"$site/index.html" <<EOF
<a href="a.html">a</a> <a href="sub/b.html">b</a>
EOF
echo 'OLDCONTENT' >"$site/a.html"
mkdir -p "$site/sub"
echo '<p>bbb</p>' >"$site/sub/b.html"

url="file://$site/index.html"

# count Error: lines in the log (grep -c exits 1 on zero matches: guard it)
errors() { grep -ciE '^[0-9:]*[[:space:]]Error:' "$out/hts-log.txt" || true; }

# 1. fresh mirror writes the cache
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test -e "$out/hts-cache/new.zip" || {
    echo "no cache was written" >&2
    exit 1
}

# 2. re-mirror unchanged: the update reads the cache and must complete cleanly
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test "$(errors)" = 0 || {
    echo "update (unchanged) reported errors" >&2
    exit 1
}
for suffix in a.html sub/b.html; do
    find "$out" -path "*/$suffix" | grep -q . || {
        echo "missing $suffix after update" >&2
        exit 1
    }
done

# 3. change a source file: the update must pick up the new content
sleep 1
echo 'NEWCONTENT' >"$site/a.html"
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test "$(errors)" = 0 || {
    echo "update (changed) reported errors" >&2
    exit 1
}
grep -q NEWCONTENT "$(find "$out" -path '*/a.html')" || {
    echo "update did not pick up the changed source" >&2
    exit 1
}
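
`errors` guards `grep -c` with `|| true` because `grep` exits 1 when nothing matches, which under `set -e` would abort the script even though the count `0` is still printed. A small sketch of that behaviour against a throwaway log follows. `count_errors` is a hypothetical name, and the log lines are invented samples rather than real `hts-log.txt` content.

```shell
#!/bin/bash
set -eu

# Hypothetical stand-in for errors(): count "Error:" lines in the file "$1".
# Without `|| true`, a zero-match grep (exit status 1) would trip `set -e`.
count_errors() {
    grep -ciE '^[0-9:]*[[:space:]]Error:' "$1" || true
}

log=$(mktemp)
trap 'rm -f "$log"' EXIT

# A log with no error lines: grep prints 0 and the guard absorbs its status.
printf '12:00:01\tInfo: sample line\n' >"$log"
echo "clean log: $(count_errors "$log")"

# Append one error line: the count becomes 1.
printf '12:00:02\tError: sample failure\n' >>"$log"
echo "after one error: $(count_errors "$log")"
```

The guard discards the exit status but not the output, so the caller can still compare the printed count against `0`, as the test does with `test "$(errors)" = 0`.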
@@ -9,6 +9,27 @@ TESTS_ENVIRONMENT += HTTPS_SUPPORT=$(HTTPS_SUPPORT)
 TESTS_ENVIRONMENT += top_srcdir=$(top_srcdir)
 
 TEST_EXTENSIONS = .test
-TESTS = 00_runnable.test 01_engine-charset.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-parse.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
+TESTS = \
+	00_runnable.test \
+	01_engine-cache.test \
+	01_engine-charset.test \
+	01_engine-cmdline.test \
+	01_engine-entities.test \
+	01_engine-filter.test \
+	01_engine-hashtable.test \
+	01_engine-idna.test \
+	01_engine-mime.test \
+	01_engine-parse.test \
+	01_engine-simplify.test \
+	01_engine-strsafe.test \
+	02_manpage-regen.test \
+	02_update-cache.test \
+	10_crawl-simple.test \
+	11_crawl-cookies.test \
+	11_crawl-idna.test \
+	11_crawl-international.test \
+	11_crawl-longurl.test \
+	11_crawl-parsing.test \
+	12_crawl_https.test
 
 CLEANFILES = check-network_sh.cache
@@ -472,7 +472,7 @@ TESTS_ENVIRONMENT = PATH=$(top_builddir)/src$(PATH_SEPARATOR)$$PATH \
 	ONLINE_UNIT_TESTS=$(ONLINE_UNIT_TESTS) \
 	HTTPS_SUPPORT=$(HTTPS_SUPPORT) top_srcdir=$(top_srcdir)
 TEST_EXTENSIONS = .test
-TESTS = 00_runnable.test 01_engine-charset.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-parse.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
+TESTS = 00_runnable.test 01_engine-charset.test 01_engine-cmdline.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-parse.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
 CLEANFILES = check-network_sh.cache
 all: all-am
 