Compare commits

22 Commits

Author SHA1 Message Date
Xavier Roche
05306ee4fd Curate the 3.49-8 release notes
Round out the 3.49-8 entry in history.txt and the debian changelog with the
user-facing work landed since 3.49-7: the HTTPS-proxy CONNECT tunnel, wider
srcset parsing, the crawler and parser fixes (CSS @import, xmlns, relative
paths, RFC 6265 cookies, doit.log reload), the parser and engine buffer-copy
security hardening, and brief summary lines for the API, build, CI and test
work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 13:02:51 +02:00
Xavier Roche
1d0fc0a566 Merge pull request #403 from xroche/chore/clang-format-separate-defs
Separate definition blocks in the public headers
2026-06-20 12:56:23 +02:00
Xavier Roche
a4452592b4 Separate definition blocks and canonicalize the public headers
Set SeparateDefinitionBlocks: Always in .clang-format so clang-format keeps
a blank line between adjacent definitions, then reformat the installed
(DevIncludes) headers in full. Several of them packed struct/typedef/macro
definitions with no separation and carried non-canonical spacing (char*,
__attribute__ ((x)), padded inner parens), which made them hard to read;
this brings them to the repo's clang-format-19 canonical form and inserts
the separating blank lines.

Headers only, no semantic change: out-of-tree build is clean and make check
passes (21 pass, 7 network skip, 0 fail). htsconfig.h is UTF-8 and its
French comments survive byte-for-byte (clang-format only reflowed them to 80
columns). The new option also governs future touched-line formatting of the
engine sources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 12:52:19 +02:00
Xavier Roche
62c2364b59 Merge pull request #402 from xroche/chore/lint-all-shell-scripts
Lint every shell script with shfmt and shellcheck
2026-06-20 12:42:19 +02:00
Xavier Roche
fe7041ddbf Address review: keep empty-PATH parity, fold the CI script list
Review of the array refactor flagged one behaviour divergence: splitting
PATH with `IFS=: read -ra` keeps empty fields (from doubled or leading
colons) as "" elements, where the old `echo $PATH | tr : ' '` word-split
dropped them, so the search loop would probe /htsserver. Skip the empty
fields to restore exact parity.

Also reflow the CI SHELL_SCRIPTS list as a folded block scalar, one
entry per line and sorted, so it reads cleanly; the folded value is the
same space-separated string.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 12:39:31 +02:00
Xavier Roche
f5543df1af ci: lint every shell script with shellcheck and shfmt
The lint job only covered a handful of scripts; bootstrap, build.sh, the
generators, webhttrack, the CGI search helper and the crawl/run-all test
harnesses went unchecked, and shfmt ran on three files. Now both linters
run over the whole tracked shell tree, listed once in a job-level env var
so the two steps stay in sync.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 11:37:09 +02:00
Xavier Roche
fee30aa95d Make every shell script shellcheck-clean
Fix the shellcheck findings the shfmt pass left behind, all proven
behaviour-preserving:

- Quote single-value expansions, drop the redundant ${} in arithmetic,
  add read -r, and use printf '%s' instead of variables in format
  strings, across the generators, crawl-test.sh, run-all-tests.sh and
  search.sh.
- crawl-test.sh / webhttrack: turn the deliberately word-split search
  lists into bash arrays (space-safe, no scattered disables) and replace
  the numeric trap signal lists with names, dropping the un-trappable
  KILL/STOP that bash silently ignored anyway.
- search.sh: drop the bogus \" escapes that made grep search for a
  literal-quoted pattern.

The generators are exercised by hand and ship their committed output
(htscodepages.h, htsentities.h); a differential run on synthetic input
confirms byte-identical output before and after. crawl-test.sh and
webhttrack were run end to end against a local server / a faked install,
the latter also proving the array search now survives spaces in paths.
SC2153/SC2120 false positives carry a scoped disable with a reason.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 11:35:55 +02:00
Xavier Roche
f9f4700ee1 Reformat every shell script with shfmt -i 4
Mechanical pass: run shfmt -i 4 over the whole tracked shell tree (the
test harness .test files, the regen generators, webhttrack, the CGI
search helper, and the build/dist scripts) so they share one style.
shfmt also normalised backticks to $(...) and $[..] to $((..)).

No behaviour change: arithmetic is preserved exactly, non-ASCII bytes
are untouched, and the full make check suite still passes. The tab
indented .test files become 4-space indented, hence the wide diff.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 11:24:01 +02:00
Xavier Roche
f030fa21e3 Merge pull request #401 from xroche/fix/relative-path-dotdot-137-162
Test the relative-link engine; collapse ../ in file:// URLs
2026-06-20 11:15:53 +02:00
Xavier Roche
bdd1c1bc2c Test the relative-link engine; collapse ../ in file:// URLs
The ../-handling tickets #137 (embedded ../ in a URL) and #162 (cross-host
"too many ../") do not reproduce on master or the released 3.49.x: the engine
has resolved embedded, cross-host, out-of-scope and above-root ../ correctly
since the 2012 import, and the released binary behaves identically. #137's
actual breakage was a JS-generated iframe URL (httrack can't rewrite
dynamically-built links); #162 is a long-gone Windows path quirk.

The area was nearly untested, though, despite feeding both link rewriting and
crawl-scope decisions: two trivial lienrelatif asserts, none for
ident_url_relatif. Add a wide regression net via two hidden debug probes
(-#l lienrelatif, -#i ident_url_relatif, mirroring -#1 fil_simplifie) driving
tens of cases in tests/01_engine-relative.test (embedded/cross-host/sibling/
ancestor/above-root ../, query stripping, scheme handling), plus the missing
fil_simplifie edge cases (absolute paths, root clamp, query freeze) in
01_engine-simplify.test. Expected values are computed by hand, not echoed.

While covering it, fixed one real gap: the file:// branch of
ident_url_absolute skipped the fil_simplifie its http sibling runs, so file://
URLs kept their ../ in adrfil->fil while the save path was already collapsed
(htsname.c:1343). Collapsing it matches the other schemes, contains traversal
at the file:// root, and dedups a/../b against b.
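
The collapse can be sketched as follows. This is a hypothetical illustration, not httrack's fil_simplifie: the name `collapse_dotdot` and its exact rules are assumptions chosen to mirror the cases listed above (absolute path, in-place rewrite, `..` clamped at the root, everything from `?` onward left untouched).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, NOT httrack's fil_simplifie(): collapse "." and ".."
   segments of an absolute path in place. A ".." at the root is dropped
   (root clamp) and the query string, if any, is copied verbatim. */
static void collapse_dotdot(char *path) {
  const char *in = path; /* read cursor, always on the '/' opening a segment */
  char *out = path;      /* write cursor, never ahead of the read cursor */

  assert(path != NULL && path[0] == '/');
  while (*in == '/') {
    const char *seg = in + 1;
    size_t n = 0;
    int is_dot, is_dotdot;

    /* Measure the segment; it ends at '/', '?' or the terminating NUL. */
    while (seg[n] != '\0' && seg[n] != '/' && seg[n] != '?')
      n++;
    is_dot = n == 1 && seg[0] == '.';
    is_dotdot = n == 2 && seg[0] == '.' && seg[1] == '.';

    if (is_dotdot) {
      /* Drop the previously written segment; at the root there is none. */
      while (out > path && *--out != '/') {
      }
    }
    if (is_dot || is_dotdot) {
      /* A trailing "." or ".." still names a directory: keep one '/'. */
      if (seg[n] != '/')
        *out++ = '/';
    } else {
      memmove(out, in, n + 1); /* copy "/segment" */
      out += n + 1;
    }
    in = seg + n;
  }
  /* Copy the query string (or just the NUL) unchanged. */
  memmove(out, in, strlen(in) + 1);
}
```

Under these assumptions `a/../b` and `b` resolve to the same path, and a leading `/../` cannot climb above the root.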

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 11:14:28 +02:00
Xavier Roche
56665a268f Merge pull request #400 from xroche/fix/css-url-paren-163
Encode parens in rewritten CSS url() so the value isn't truncated (#163)
2026-06-20 10:02:32 +02:00
Xavier Roche
2e948b9acd htsparse: percent-encode parens in rewritten CSS url() (#163)
A source url(...) whose target encodes '(' ')' as %28/%29 was rewritten
with literal parens, because they are RFC2396 "mark" characters that the
URI escaper (escape_uri_utf, mode 30) leaves alone. In an unquoted CSS
url(...) the literal ')' closes the token early, so the browser mis-parses
the value and drops the background image.

Re-escape '(' and ')' back to %28/%29 when emitting the link, gated on the
url() context (ending_p == ')'). The UA decodes them to the saved-on-disk
name, so the reference still resolves. Quoted url("...") and ordinary HTML
attributes keep their parens, matching prior behavior.
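
A minimal sketch of the re-escaping step, assuming a hypothetical helper named `escape_css_url_parens` (httrack's real code path goes through its own escaper and is not reproduced here). The caller is assumed to invoke it only in the unquoted url() context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not httrack's code: copy src into dst, turning '('
   and ')' into %28 and %29. Returns 0 on success, -1 if dst is too small. */
static int escape_css_url_parens(char *dst, size_t dst_size, const char *src) {
  size_t j = 0;

  assert(dst != NULL && src != NULL);
  if (dst_size == 0)
    return -1;
  for (size_t i = 0; src[i] != '\0'; i++) {
    const char c = src[i];
    const char *rep = c == '(' ? "%28" : c == ')' ? "%29" : NULL;
    const size_t need = rep != NULL ? 3 : 1;

    if (j + need >= dst_size) /* always keep one byte for the NUL */
      return -1;
    if (rep != NULL)
      memcpy(dst + j, rep, 3);
    else
      dst[j] = c;
    j += need;
  }
  dst[j] = '\0';
  return 0;
}
```

With this sketch a target such as `img/a%20(1).png` is emitted as `img/a%20%281%29.png`, so an unquoted url() token is no longer closed by the first `)`.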

Test in 01_engine-parse.test crawls a CSS fixture whose url() references a
%20%28...%29 name and asserts the rewrite keeps the parens encoded;
negative control confirmed (literal-paren output fails it).

Closes #163

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-20 10:01:17 +02:00
Xavier Roche
cae11499f1 Merge pull request #399 from xroche/fix/js-string-falsepos-218
htsparse: don't treat XHR.open's method argument as a URL (#218)
2026-06-19 20:36:26 +02:00
Xavier Roche
02c7f4ebf6 htsparse: don't treat XHR.open's method argument as a URL (#218)
The JavaScript URL detector matched `.open(` for window.open("url",...)
and captured the first argument as a link. XMLHttpRequest.open(method,
url) puts the HTTP method first, so `xhr.open("GET", "ajax_info.txt")`
turned "GET" into a bogus link, rewritten to "GET.html" on a live server.

Reject a first argument that is exactly an HTTP method, mirroring the
existing ensure_not_mime guard. window.open(url) is unaffected; the real
XHR url (the second argument) is still picked up by the dirty parser.
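
The guard might look roughly like this. The function name, the method list and the case-sensitive comparison are all assumptions made for illustration; the commit only states that a first argument that is exactly an HTTP method is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guard, not httrack's code: return 1 when the len-byte token
   is exactly one of the listed HTTP method names, 0 otherwise. */
static int looks_like_http_method(const char *tok, size_t len) {
  /* Assumed list; the real guard may accept a different set. */
  static const char *const methods[] = {"GET",    "POST",    "HEAD", "PUT",
                                        "DELETE", "OPTIONS", "PATCH"};

  assert(tok != NULL);
  for (size_t i = 0; i < sizeof(methods) / sizeof(methods[0]); i++) {
    if (strlen(methods[i]) == len && memcmp(tok, methods[i], len) == 0)
      return 1;
  }
  return 0;
}
```

Because the match is exact, a genuine link that merely starts with a method name (for example `GET.html`) is not rejected.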

Closes #218

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-19 20:27:04 +02:00
Xavier Roche
9070b44a70 Merge pull request #398 from xroche/fix/html-underflow-396
htsparse: fix buffer underflow reading *(html-1) at offset 0 (#396)
2026-06-19 19:55:40 +02:00
Xavier Roche
799c045061 htsparse: don't read *(html-1) before the parse buffer (#396)
The link detector's word-boundary guards dereference *(html-1) to check
the byte preceding a matched token. When the token sits at the very start
of the parse buffer (html == r->adr), that reads one byte before the
allocation: a heap-buffer-overflow under ASan, silent on a normal build.
A stylesheet beginning with a url() token is enough to hit it.

Route the three reachable guards (url(), location=, the makeindex /title
check) through html_prevc(), which returns a space sentinel at the buffer
start. Space is the right value for these tests: a token at offset 0 is at
a word boundary, so it stays a valid match. The other *(html-1) sites only
run after html has advanced past an opening tag or quote.

Covers it with an offset-0 url() fixture in 01_engine-parse.test; without
the fix it aborts at htsparse.c:1386 under the CI sanitizer job.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-19 19:44:25 +02:00
Xavier Roche
fb1ee3bf2e Merge pull request #397 from xroche/fix/css-import-94
CSS @import: capture URLs that carry a media/supports/layer condition (#94)
2026-06-19 19:30:21 +02:00
Xavier Roche
6a08ca7d39 htsparse: bound the URL-end scan against a missing closing delimiter
Reviewing the @import change, ASan flagged a pre-existing heap overflow:
when a quoted/parenthesized link token has no closing delimiter before the
buffer ends (truncated input such as `@import "x`, `@import "`, `url("x`),
the scan stops at the terminating NUL, then `c += ndelim` steps past it and
`while (*c == ' ')` / the terminator test read out of bounds. Such input
aborts under ASan on master.

Skip the URL-end scan and capture when no closing delimiter was found
(`*c == '\0'` right after the scan); c never advances past the NUL.
Well-formed tokens are unaffected.
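
A sketch of the bounded scan, assuming a hypothetical `find_closing_delim` helper and a single-character delimiter (the real detector tracks `ndelim` and more state than shown here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner, not httrack's code: p points just past the opening
   delimiter. Returns a pointer to the closing delimiter, or NULL when the
   input ends first, so the caller never steps past the terminating NUL. */
static const char *find_closing_delim(const char *p, char delim) {
  assert(p != NULL && delim != '\0');
  while (*p != '\0' && *p != delim)
    p++;
  return *p == delim ? p : NULL;
}
```

A caller that skips the capture on NULL leaves truncated input such as `@import "x` unlinked, while well-formed tokens still yield the position of their closing quote or paren.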

01_engine-parse.test gains a truncated-@import fixture (the valid sibling
import is still captured, the unterminated one is not) that trips the
overflow under the CI ASan job, plus a check that an @import's trailing
media/supports/layer condition survives the rewrite verbatim.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-19 19:25:39 +02:00
Xavier Roche
a8b491e509 htsparse: capture conditional CSS @import URLs (#94)
A bare-string @import carrying a media/supports/layer condition, e.g.
`@import "theme.css" screen;`, was dropped. The detector required the closing
quote to be immediately followed by the statement terminator, so the trailing
condition aborted the capture. The `url(...)` form already worked because it
terminates at the paren.

Two coupled defects in the inscript/CSS detector:
- accept a whitespace-separated trailing condition after a quoted @import URL;
- bound the captured URL at its last content char (b) instead of recomputing
  from the terminator. The old `c -= (ndelim + 1)` mishandled spaces skipped
  before the terminator, leaving the closing quote inside the range so the
  bogus-link guard aborted. That also silently broke `foo="url" ;` (a space
  before the semicolon) for every quoted detection, not only @import.

01_engine-parse.test gains a CSS @import section that crawls a .css directly;
the conditioned cases are negative controls that fail without the fix.

Closes #94

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-19 18:46:31 +02:00
Xavier Roche
a8e4bb3b81 Merge pull request #395 from xroche/fix/xmlns-false-links-191
Don't crawl xmlns namespace declarations
2026-06-19 18:28:23 +02:00
Xavier Roche
0145ec37a3 htsparse: don't crawl xmlns namespace declarations (#191)
The "dirty parsing" heuristic accepts any tag attribute whose value looks
like a URL unless the attribute is on the no-detect list. xmlns and
xmlns:prefix declarations carry namespace URIs (xmlns:og="http://ogp.me/ns#",
etc.) that are not resources, so httrack queued and fetched them, stalling
the crawl on unrelated spec URLs. Reject xmlns/xmlns:prefix where the
no-detect list is already consulted.
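
The attribute test could be as small as the sketch below. The name `is_xmlns_attr` and the case-insensitive comparison are assumptions; the commit only says that xmlns and xmlns:prefix are rejected where the no-detect list is consulted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check, not httrack's code: return 1 for "xmlns" and for any
   "xmlns:prefix" attribute name, 0 for everything else. */
static int is_xmlns_attr(const char *name, size_t len) {
  static const char prefix[] = "xmlns";
  const size_t plen = sizeof(prefix) - 1;

  assert(name != NULL);
  if (len < plen)
    return 0;
  for (size_t i = 0; i < plen; i++) {
    if (tolower((unsigned char) name[i]) != prefix[i])
      return 0;
  }
  /* Either the bare name, or the name followed by a namespace prefix. */
  return len == plen || name[plen] == ':';
}
```

An attribute such as `xmlnsfoo` is deliberately not matched, since only the bare name and the colon-prefixed form declare a namespace.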

01_engine-parse.test grows a fixture with each form (default and prefixed) as
the last attribute of its element, since the heuristic only inspects an
attribute whose value is immediately followed by '>'; the targets are local
file:// gifs so a regression actually downloads them (verified: reverting the
guard fetches all three).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-19 18:24:55 +02:00
Xavier Roche
a80fab38ba Merge pull request #394 from xroche/fix/proxy-https-connect-85
Tunnel https through the proxy via CONNECT (#85)
2026-06-19 18:03:31 +02:00
43 changed files with 1700 additions and 1146 deletions

@@ -16,6 +16,7 @@ BasedOnStyle: LLVM
  SpaceAfterCStyleCast: true # "(int) x", overwhelmingly dominant (542 vs 7)
  SortIncludes: false # C include order can be significant; never reorder
  IncludeBlocks: Preserve # do not merge/reflow include groups
+ SeparateDefinitionBlocks: Always # blank line between definitions (readability)
  # Stated explicitly for robustness against base-style drift (these match LLVM):
  IndentWidth: 2

@@ -320,6 +320,21 @@ jobs:
  lint:
  name: lint (shellcheck, shfmt)
  runs-on: ubuntu-24.04
+ # Every tracked shell script; the globs expand at run time. Kept here so the
+ # shellcheck and shfmt steps below cannot drift apart.
+ env:
+ SHELL_SCRIPTS: >-
+ .githooks/pre-commit
+ bootstrap
+ build.sh
+ html/div/search.sh
+ man/makeman.sh
+ src/htsbasiccharsets.sh
+ src/htsentities.sh
+ src/webhttrack
+ tests/*.sh
+ tests/*.test
+ tools/mkdeb.sh
  steps:
  - uses: actions/checkout@v6
@@ -332,12 +347,11 @@ jobs:
  sudo apt-get install -y --no-install-recommends shellcheck shfmt
  shfmt --version
- # Lint the scripts we maintain; the legacy scripts are a separate cleanup.
  - name: shellcheck
- run: shellcheck man/makeman.sh tools/mkdeb.sh .githooks/pre-commit tests/*.test tests/check-network.sh
+ run: shellcheck $SHELL_SCRIPTS
  - name: shfmt
- run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh .githooks/pre-commit
+ run: shfmt -d -i 4 $SHELL_SCRIPTS
  # Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
  # (it was shaped by an old Visual Studio formatter) and does not round-trip,

debian/changelog

@@ -1,6 +1,9 @@
  httrack (3.49.8-1) unstable; urgency=medium
- * New upstream release.
+ * New upstream release: HTTPS-proxy CONNECT tunnelling and wider srcset
+ parsing, a batch of crawler and parser fixes (CSS @import, xmlns
+ namespaces, relative paths, RFC 6265 cookies), and security hardening of
+ the parser and of buffer copies throughout the engine.
  * Drop the OpenSSL linking exception from the license: OpenSSL 3.0+ is
  Apache-2.0 and GPL-compatible, so it is no longer needed. httrack is now
  plain GPL-3.0-or-later. Updated debian/copyright accordingly.
@@ -14,7 +17,7 @@ httrack (3.49.8-1) unstable; urgency=medium
  the QA debcheck page. Depend on firefox-esr | chromium | www-browser
  instead.
- -- Xavier Roche <xavier@debian.org> Sun, 07 Jun 2026 14:29:24 +0200
+ -- Xavier Roche <xavier@debian.org> Sat, 20 Jun 2026 13:02:08 +0200
  httrack (3.49.7-2) unstable; urgency=medium

@@ -5,12 +5,31 @@ HTTrack Website Copier release history:
  This file lists all changes and fixes that have been made for HTTrack
  3.49-8
+ + New: tunnel HTTPS downloads through the configured HTTP proxy via CONNECT (#85)
+ + New: parse every candidate URL in <img> and <source> srcset lists (#326)
  + Changed: dropped the obsolete OpenSSL linking exception (OpenSSL 3.0+ is Apache-2.0 and GPL-compatible); httrack is now plain GPLv3-or-later
- + Fixed: link libhtsjava and the libtest examples directly against libc
+ + Fixed: several out-of-bounds reads in the HTML/CSS parser on hostile input (#94, #396)
+ + Fixed: stored XSS via an unescaped URL in the generated page footer (#165)
+ + Fixed: hardened buffer copies throughout the engine against overflow
+ + Fixed: capture conditional CSS @import URLs (#94)
+ + Fixed: don't crawl xmlns namespace declarations as links (#191)
+ + Fixed: don't mistake the method argument of XMLHttpRequest.open for a URL (#218)
+ + Fixed: percent-encode parentheses when rewriting CSS url() targets (#163)
+ + Fixed: collapse ../ in file:// URLs and widen relative-link handling (#137, #162)
+ + Fixed: drop the obsolete $Version/$Path attributes from the request Cookie header, per RFC 6265 (#151)
+ + Fixed: keep empty quoted arguments when reloading doit.log for --update/--continue (#106)
+ + Fixed: raise the User-Agent and custom-header length limits (#152)
+ + Fixed: abort on a long log path (lock-file buffer too small) (#183)
+ + Fixed: race in lazy mutex initialization (#297)
+ + Fixed: sub-second mtime precision when comparing local files on POSIX (#383)
+ + Fixed: modernize OpenSSL TLS initialization for the 3.x to 4.x transition (#308)
  + Fixed: in-place changes made by the postprocess callback were not applied (Roman Sęk)
  + Fixed: "preffered" typo in the help text and man page (yosinn1-blip)
  + Fixed: corrections and updates of the Russian translation (German Aizek)
  + Fixed: corrections and updates of the Danish translation (scootergrisen)
+ + Fixed: link libhtsjava and the libtest examples directly against libc
+ + New: documented the public library API headers and typed the option fields as named enums
+ + Fixed: numerous build, packaging, CI and test-coverage improvements (out-of-tree builds, sanitizer/distcheck CI, shell and Python linting, AppStream metainfo)
  3.49-7
  + Fixed: keep generated config.h architecture-independent (Debian #1133728)

@@ -1,4 +1,3 @@
  #!/bin/sh
  # Simple indexing test using HTTrack
@@ -18,22 +17,22 @@ if ! test -f "index.txt"; then
  fi
  # Convert crlf to lf
- if test "`head index.txt -n 1 | tr '\r' '#' | grep -c '#'`" = "1"; then
+ if test "$(head index.txt -n 1 | tr '\r' '#' | grep -c '#')" = "1"; then
  echo "Converting index to Unix LF style (not CR/LF) .."
  mv -f index.txt index.txt.old
- cat index.txt.old|tr -d '\r' > index.txt
+ tr -d '\r' <index.txt.old >index.txt
  fi
  keyword=-
  while test -n "$keyword"; do
  printf "Enter a keyword: "
- read keyword
+ read -r keyword
  if test -n "$keyword"; then
- FOUNDK="`grep -niE \"^$keyword\" index.txt`"
+ FOUNDK="$(grep -niE "^$keyword" index.txt)"
  if test -n "$FOUNDK"; then
- if ! test `echo "$FOUNDK"|wc -l` = "1"; then
+ if ! test "$(echo "$FOUNDK" | wc -l)" = "1"; then
  # Multiple matches
  printf "Found multiple keywords: "
  echo "$FOUNDK" | cut -f2 -d':' | tr '\n' ' '
@@ -41,12 +40,12 @@ while test -n "$keyword"; do
  echo "Use keyword$ to find only one"
  else
  # One match
- N=`echo "$FOUNDK"|cut -f1 -d':'`
- PM=`tail +$N index.txt|grep -nE "\("|head -n 1`
+ N=$(echo "$FOUNDK" | cut -f1 -d':')
+ PM=$(tail "+$N" index.txt | grep -nE "\(" | head -n 1)
  if ! echo "$PM" | grep "ignored" >/dev/null; then
- M=`echo $PM|cut -f1 -d':'`
+ M=$(echo "$PM" | cut -f1 -d':')
  echo "Found in:"
- cat index.txt | tail "+$N" | head -n "$M" | grep -E "[0-9]* " | cut -f2 -d' '
+ tail "+$N" index.txt | head -n "$M" | grep -E "[0-9]* " | cut -f2 -d' '
  else
  echo "keyword ignored (too many hits)"
  fi
@@ -57,4 +56,3 @@ while test -n "$keyword"; do
  fi
  done

@@ -48,9 +48,8 @@ Please visit our Website: http://www.httrack.com
  /* Abort (with the failed byte count) when a growth allocation fails. The
  array macros never return an out-of-memory error; they assert and abort. */
  static void hts_record_assert_memory_failed(const size_t size) {
- fprintf(stderr, "memory allocation failed (%lu bytes)", \
- (long int) size); \
- assertf(! "memory allocation failed"); \
+ fprintf(stderr, "memory allocation failed (%lu bytes)", (long int) size);
+ assertf(!"memory allocation failed");
  }
  /** Dynamic array of T elements. **/
@@ -109,20 +108,22 @@ static void hts_record_assert_memory_failed(const size_t size) {
  * After a call to this macro, TypedArrayRoom(A) is guaranteed to be at
  * least equal to 'ROOM'.
  **/
- #define TypedArrayEnsureRoom(A, ROOM) do { \
+ #define TypedArrayEnsureRoom(A, ROOM) \
+ do { \
  const size_t room_ = (ROOM); \
  while (TypedArrayRoom(A) < room_) { \
  TypedArrayCapa(A) = TypedArrayCapa(A) < 16 ? 16 : TypedArrayCapa(A) * 2; \
  } \
- TypedArrayPtr(A) = realloc(TypedArrayPtr(A), \
- TypedArrayCapa(A)*TypedArrayWidth(A)); \
+ TypedArrayPtr(A) = \
+ realloc(TypedArrayPtr(A), TypedArrayCapa(A) * TypedArrayWidth(A)); \
  if (TypedArrayPtr(A) == NULL) { \
  hts_record_assert_memory_failed(TypedArrayCapa(A) * TypedArrayWidth(A)); \
  } \
  } while (0)
  /** Add an element. Macro, first element evaluated multiple times. **/
- #define TypedArrayAdd(A, E) do { \
+ #define TypedArrayAdd(A, E) \
+ do { \
  TypedArrayEnsureRoom(A, 1); \
  assertf(TypedArraySize(A) < TypedArrayCapa(A)); \
  TypedArrayTail(A) = (E); \
@@ -133,7 +134,8 @@ static void hts_record_assert_memory_failed(const size_t size) {
  * Add 'COUNT' elements from 'PTR'.
  * Macro, first element evaluated multiple times.
  **/
- #define TypedArrayAppend(A, PTR, COUNT) do { \
+ #define TypedArrayAppend(A, PTR, COUNT) \
+ do { \
  const size_t count_ = (COUNT); \
  /* This 1-case is to benefit from type safety. */ \
  if (count_ == 1) { \
@@ -148,7 +150,8 @@ static void hts_record_assert_memory_failed(const size_t size) {
  } while (0)
  /** Clear an array, freeing memory and clearing size and capacity. **/
- #define TypedArrayFree(A) do { \
+ #define TypedArrayFree(A) \
+ do { \
  if (TypedArrayPtr(A) != NULL) { \
  TypedArrayCapa(A) = TypedArraySize(A) = 0; \
  free(TypedArrayPtr(A)); \

@@ -49,9 +49,10 @@ Please visit our Website: http://www.httrack.com
  #define WIN32_LEAN_AND_MEAN
  // KB955045 (http://support.microsoft.com/kb/955045)
  // To execute an application using this function on earlier versions of Windows
- // (Windows 2000, Windows NT, and Windows Me/98/95), then it is mandatary to #include Ws2tcpip.h
- // and also Wspiapi.h. When the Wspiapi.h header file is included, the 'getaddrinfo' function is
- // #defined to the 'WspiapiGetAddrInfo' inline function in Wspiapi.h.
+ // (Windows 2000, Windows NT, and Windows Me/98/95), then it is mandatary to
+ // #include Ws2tcpip.h and also Wspiapi.h. When the Wspiapi.h header file is
+ // included, the 'getaddrinfo' function is #defined to the 'WspiapiGetAddrInfo'
+ // inline function in Wspiapi.h.
  #include <ws2tcpip.h>
  #include <Wspiapi.h>
  // #include <winsock2.h>

@@ -13,14 +13,14 @@ rm -f CP932.TXT CP936.TXT CP949.TXT CP950.TXT
  fi
  # Produce code
- printf "/** GENERATED FILE ($0), DO NOT EDIT **/\n\n"
+ printf '/** GENERATED FILE (%s), DO NOT EDIT **/\n\n' "$0"
  for i in *.TXT; do
  echo "processing $i" >&2
- grep -vE "^(#|$)" $i | grep -E "^0x" | sed -e 's/[[:space:]]/ /g' | cut -f1,2 -d' ' | \
+ grep -vE "^(#|$)" "$i" | grep -E "^0x" | sed -e 's/[[:space:]]/ /g' | cut -f1,2 -d' ' |
  (
  unset arr
- while read LINE ; do
- from=$[$(echo $LINE | cut -f1 -d' ')]
+ while read -r LINE; do
+ from=$(($(echo "$LINE" | cut -f1 -d' ')))
  if ! test -n "$from"; then
  echo "error with $i" >&2
  exit 1
@@ -28,22 +28,23 @@ for i in *.TXT ; do
  echo "out-of-range ($LINE) with $i" >&2
  exit 1
  fi
- to=$(echo $LINE | cut -f2 -d' ')
- arr[$from]=$to
+ to=$(echo "$LINE" | cut -f2 -d' ')
+ arr[from]=$to
  done
- name=$(echo $i | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
- printf "/* Table for $i */\nstatic const hts_UCS4 table_${name}[256] = {\n "
- i=0
- while test "$i" -lt 256; do
- if test "$i" -gt 0; then
+ # shellcheck disable=SC2018,SC2019 # charset filenames are ASCII; keep C-locale A-Z/a-z
+ name=$(echo "$i" | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+ printf '/* Table for %s */\nstatic const hts_UCS4 table_%s[256] = {\n ' "$i" "$name"
+ idx=0
+ while test "$idx" -lt 256; do
+ if test "$idx" -gt 0; then
  printf ", "
- if test $[${i}%8] -eq 0; then
+ if test $((idx % 8)) -eq 0; then
  printf "\n "
  fi
  fi
- value=${arr[$i]:-0}
- printf "0x%04x" $value
- i=$[${i}+1]
+ value=${arr[$idx]:-0}
+ printf "0x%04x" "$value"
+ idx=$((idx + 1))
  done
  printf " };\n\n"
  )
@@ -53,7 +54,8 @@ done
  # Indexes
  printf "static const struct {\n const char *name;\n const hts_UCS4 *table;\n} table_mappings[] = {\n"
  for i in *.TXT; do
- name=$(echo $i | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
- printf " { \"$(echo $name | tr -d '_')\", table_${name} },\n"
+ # shellcheck disable=SC2018,SC2019 # charset filenames are ASCII; keep C-locale A-Z/a-z
+ name=$(echo "$i" | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+ printf ' { "%s", table_%s },\n' "$(echo "$name" | tr -d '_')" "$name"
  done
  printf " { NULL, NULL }\n};\n"

@@ -71,7 +71,8 @@ struct t_cookie {
  int cookie_add(t_cookie *cookie, const char *cook_name, const char *cook_value,
  const char *domain, const char *path);
- int cookie_del(t_cookie * cookie, const char *cook_name, const char *domain, const char *path);
+ int cookie_del(t_cookie *cookie, const char *cook_name, const char *domain,
+ const char *path);
  int cookie_load(t_cookie *cookie, const char *path, const char *name);
@@ -83,7 +84,8 @@ void cookie_delete(char *s, size_t s_size, size_t pos);
  const char *cookie_get(char *buffer, const char *cookie_base, int param);
- char *cookie_find(char *s, const char *cook_name, const char *domain, const char *path);
+ char *cookie_find(char *s, const char *cook_name, const char *domain,
+ const char *path);
  char *cookie_nextfield(char *a);
@@ -92,7 +94,8 @@ char *cookie_nextfield(char *a);
  /** Register credentials (auth = base-64 user:pass) for the prefix derived from
  adr (host) and fil (path). No-op returning 0 if cookie is NULL, allocation
  fails, or a matching prefix is already stored; returns 1 on insertion. */
- int bauth_add(t_cookie * cookie, const char *adr, const char *fil, const char *auth);
+ int bauth_add(t_cookie *cookie, const char *adr, const char *fil,
+ const char *auth);
  /** Return the stored base-64 credentials whose prefix matches adr+fil, or NULL
  if none (or cookie is NULL). Returned pointer aliases the jar's bauth_chain;

@@ -87,7 +87,8 @@ Please visit our Website: http://www.httrack.com
  // fast cache (build hash table)
  #define HTS_FAST_CACHE 1
- // le > peut être considéré comme un tag de fermeture de commentaire (<!-- > est valide)
+ // le > peut être considéré comme un tag de fermeture de commentaire (<!-- > est
+ // valide)
  #define GT_ENDS_COMMENT 1
  // always adds a '/' at the end if a '~' is encountered (/~smith -> /~smith/)
@@ -97,7 +98,8 @@ Please visit our Website: http://www.httrack.com
  #define HTS_STRIP_DOUBLE_SLASH 0
  // case-sensitive pour les dossiers et fichiers (0/1)
- // [normalement 1, mais pose des problèmes (url malformée par exemple) et n'est pas très utile..
+ // [normalement 1, mais pose des problèmes (url malformée par exemple) et n'est
+ // pas très utile..
  // ..et pas bcp respecté]
  // REMOVED
  // #define HTS_CASSE 0


@@ -2787,6 +2787,47 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return 0; return 0;
} }
break; break;
case 'l': /* lienrelatif: relative link from curr_fil to link */
if (na + 2 >= argc) {
HTS_PANIC_PRINTF(
"Option #l needs a link and a current-file path");
printf(
"Example: '-#l' 'host/dir/img.gif' 'host/dir/p.html'\n");
htsmain_free();
return -1;
} else {
char s[HTS_URLMAXSIZE * 2];
if (lienrelatif(s, sizeof(s), argv[na + 1], argv[na + 2]) ==
0)
printf("relative=%s\n", s);
else
printf("relative=<ERROR>\n");
htsmain_free();
return 0;
}
break;
case 'i': /* ident_url_relatif: resolve a link -> adr/fil */
if (na + 3 >= argc) {
HTS_PANIC_PRINTF(
"Option #i needs a link, an origin address and file");
printf("Example: '-#i' '../img.gif' 'www.foo.com' "
"'/d/p.html'\n");
htsmain_free();
return -1;
} else {
lien_adrfil af;
const int r = ident_url_relatif(argv[na + 1], argv[na + 2],
argv[na + 3], &af);
if (r == 0)
printf("adr=%s fil=%s\n", af.adr, af.fil);
else
printf("error=%d\n", r);
htsmain_free();
return 0;
}
break;
case '2': // mimedefs case '2': // mimedefs
if (na + 1 >= argc) { if (na + 1 >= argc) {
HTS_PANIC_PRINTF("Option #2 needs to be followed by an URL"); HTS_PANIC_PRINTF("Option #2 needs to be followed by an URL");


@@ -109,8 +109,8 @@ typedef int (*t_hts_htmlcheck_chopt) (t_hts_callbackarg * carg, httrackp * opt);
/* Rewrite hook over an in-memory page: the html and len arguments point at the /* Rewrite hook over an in-memory page: the html and len arguments point at the
buffer and its length (the callback may reallocate and resize it), buffer and its length (the callback may reallocate and resize it),
url_adresse and url_fichier name it. */ url_adresse and url_fichier name it. */
typedef int (*t_hts_htmlcheck_process) (t_hts_callbackarg * carg, typedef int (*t_hts_htmlcheck_process)(t_hts_callbackarg *carg, httrackp *opt,
httrackp * opt, char **html, int *len, char **html, int *len,
const char *url_adresse, const char *url_adresse,
const char *url_fichier); const char *url_fichier);
@@ -147,9 +147,8 @@ typedef const char *(*t_hts_htmlcheck_query3) (t_hts_callbackarg * carg,
queue size and running totals, stat_time the elapsed time. */ queue size and running totals, stat_time the elapsed time. */
typedef int (*t_hts_htmlcheck_loop)(t_hts_callbackarg *carg, httrackp *opt, typedef int (*t_hts_htmlcheck_loop)(t_hts_callbackarg *carg, httrackp *opt,
lien_back *back, int back_max, lien_back *back, int back_max,
int back_index, int lien_tot, int back_index, int lien_tot, int lien_ntot,
int lien_ntot, int stat_time, int stat_time, hts_stat_struct *stats);
hts_stat_struct * stats);
/* Veto a link (adr host, fil path) after its transfer; status is the result. /* Veto a link (adr host, fil path) after its transfer; status is the result.
Return 0 to drop the link. */ Return 0 to drop the link. */
@@ -168,8 +167,8 @@ typedef void (*t_hts_htmlcheck_pause) (t_hts_callbackarg * carg, httrackp * opt,
const char *lockfile); const char *lockfile);
/* Fired after a file is written to disk; 'file' is the local path. */ /* Fired after a file is written to disk; 'file' is the local path. */
typedef void (*t_hts_htmlcheck_filesave) (t_hts_callbackarg * carg, typedef void (*t_hts_htmlcheck_filesave)(t_hts_callbackarg *carg, httrackp *opt,
httrackp * opt, const char *file); const char *file);
/* Richer file-saved notification: source host/filename, local path, and flags /* Richer file-saved notification: source host/filename, local path, and flags
telling whether the file is new, modified, or left unchanged. */ telling whether the file is new, modified, or left unchanged. */
@@ -189,13 +188,12 @@ typedef int (*t_hts_htmlcheck_linkdetected2) (t_hts_callbackarg * carg,
const char *tag_start); const char *tag_start);
/* Fired on each transfer-status change of slot 'back'. */ /* Fired on each transfer-status change of slot 'back'. */
typedef int (*t_hts_htmlcheck_xfrstatus) (t_hts_callbackarg * carg, typedef int (*t_hts_htmlcheck_xfrstatus)(t_hts_callbackarg *carg, httrackp *opt,
httrackp * opt, lien_back * back); lien_back *back);
/* Choose the local save path for a URL; write it into 'save'. adr/fil name the /* Choose the local save path for a URL; write it into 'save'. adr/fil name the
target, referer_adr/referer_fil the page that linked it. */ target, referer_adr/referer_fil the page that linked it. */
typedef int (*t_hts_htmlcheck_savename) (t_hts_callbackarg * carg, typedef int (*t_hts_htmlcheck_savename)(t_hts_callbackarg *carg, httrackp *opt,
httrackp * opt,
const char *adr_complete, const char *adr_complete,
const char *fil_complete, const char *fil_complete,
const char *referer_adr, const char *referer_adr,
@@ -206,9 +204,9 @@ typedef t_hts_htmlcheck_savename t_hts_htmlcheck_extsavename;
/* Inspect or edit the outgoing request headers in 'buff' before they are sent. /* Inspect or edit the outgoing request headers in 'buff' before they are sent.
*/ */
typedef int (*t_hts_htmlcheck_sendhead) (t_hts_callbackarg * carg, typedef int (*t_hts_htmlcheck_sendhead)(t_hts_callbackarg *carg, httrackp *opt,
httrackp * opt, char *buff, char *buff, const char *adr,
const char *adr, const char *fil, const char *fil,
const char *referer_adr, const char *referer_adr,
const char *referer_fil, const char *referer_fil,
htsblk *outgoing); htsblk *outgoing);


@@ -33,14 +33,14 @@ EOF
else else
GET "${url}" GET "${url}"
fi fi
) \ ) |
| grep -E '^<!ENTITY [a-zA-Z0-9_]' \ grep -E '^<!ENTITY [a-zA-Z0-9_]' |
| sed \ sed \
-e 's/<!ENTITY //' -e "s/[[:space:]][[:space:]]*/ /g" \ -e 's/<!ENTITY //' -e "s/[[:space:]][[:space:]]*/ /g" \
-e 's/-->$//' \ -e 's/-->$//' \
-e 's/\([^ ]*\) CDATA "&#\([^\"]*\);" -- \(.*\)/\1 \2 \3/'\ -e 's/\([^ ]*\) CDATA "&#\([^\"]*\);" -- \(.*\)/\1 \2 \3/' |
| ( \ (
read A read -r A
while test -n "$A"; do while test -n "$A"; do
ent="${A%% *}" ent="${A%% *}"
code=$(echo "$A" | cut -f2 -d' ') code=$(echo "$A" | cut -f2 -d' ')
@@ -49,11 +49,11 @@ EOF
i=0 i=0
a=1664525 a=1664525
c=1013904223 c=1013904223
m="$[1 << 32]" m="$((1 << 32))"
while test "$i" -lt ${#ent}; do while test "$i" -lt ${#ent}; do
d="$(echo -n "${ent:${i}:1}" | hexdump -v -e '/1 "%d"')" d="$(echo -n "${ent:${i}:1}" | hexdump -v -e '/1 "%d"')"
hash="$[((${hash}*${a})%(${m})+${d}+${c})%(${m})]" hash="$((((hash * a) % (m) + d + c) % (m)))"
i=$[${i}+1] i=$((i + 1))
done done
echo -e " /* $A */" echo -e " /* $A */"
echo -e " case ${hash}u:" echo -e " case ${hash}u:"
@@ -63,7 +63,7 @@ EOF
echo -e " break;" echo -e " break;"
# next # next
read A read -r A
done done
) )
cat <<EOF cat <<EOF
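
Reviewer note on the hunks above: the `$[...]` to `$((...))` rewrite leaves the arithmetic unchanged, a linear-congruential step per character of the entity name. The following C sketch restates that loop for cross-checking the generated `case` labels. The function name `entity_hash` and the `seed` parameter are assumptions for illustration; the script's initial `hash` value is not visible in these hunks, so it is passed in rather than guessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants copied from the script's loop (a, c, m = 1 << 32). */
#define LCG_A UINT64_C(1664525)
#define LCG_C UINT64_C(1013904223)
#define LCG_M (UINT64_C(1) << 32)

/* Hypothetical helper: hash an entity name the way the shell loop does.
   'seed' stands in for the script's initial hash value, which is an
   assumption here because the diff does not show it. */
static uint32_t entity_hash(const char *name, uint32_t seed) {
  uint64_t hash = seed;
  size_t i;

  assert(name != NULL);
  for (i = 0; name[i] != '\0'; i++) {
    /* d is the byte value of the current character */
    const uint64_t d = (unsigned char) name[i];

    /* same expression as: (((hash * a) % (m) + d + c) % (m)) */
    hash = ((hash * LCG_A) % LCG_M + d + LCG_C) % LCG_M;
  }
  return (uint32_t) hash;
}
```

Calling `entity_hash("amp", 0)` would give the value to compare against the corresponding `case ...u:` label, provided the script really starts from zero.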


@@ -226,9 +226,14 @@ Please visit our Website: http://www.httrack.com
/* Copyright (C) 1998 Xavier Roche and other contributors */ /* Copyright (C) 1998 Xavier Roche and other contributors */
#define HTTRACK_AFF_AUTHORS "[XR&CO'2014]" #define HTTRACK_AFF_AUTHORS "[XR&CO'2014]"
#define HTS_DEFAULT_FOOTER "<!-- Mirrored from %s%s by HTTrack Website Copier/" HTTRACK_AFF_VERSION " " HTTRACK_AFF_AUTHORS ", %s -->" #define HTS_DEFAULT_FOOTER \
"<!-- Mirrored from %s%s by HTTrack Website Copier/" HTTRACK_AFF_VERSION \
" " HTTRACK_AFF_AUTHORS ", %s -->"
#define HTTRACK_WEB "http://www.httrack.com" #define HTTRACK_WEB "http://www.httrack.com"
#define HTS_UPDATE_WEBSITE "http://www.httrack.com/update.php3?Product=HTTrack&Version=" HTTRACK_VERSIONID "&VersionStr=" HTTRACK_VERSION "&Platform=%d&Language=%s" #define HTS_UPDATE_WEBSITE \
"http://www.httrack.com/" \
"update.php3?Product=HTTrack&Version=" HTTRACK_VERSIONID \
"&VersionStr=" HTTRACK_VERSION "&Platform=%d&Language=%s"
#define H_CRLF "\x0d\x0a" #define H_CRLF "\x0d\x0a"
#define CRLF "\x0d\x0a" #define CRLF "\x0d\x0a"
@@ -247,6 +252,7 @@ Please visit our Website: http://www.httrack.com
return type stays compatible with the int it replaces. */ return type stays compatible with the int it replaces. */
#ifndef HTS_DEF_DEFSTRUCT_hts_boolean #ifndef HTS_DEF_DEFSTRUCT_hts_boolean
#define HTS_DEF_DEFSTRUCT_hts_boolean #define HTS_DEF_DEFSTRUCT_hts_boolean
typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean; typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
#endif #endif
@@ -278,8 +284,8 @@ typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
#endif #endif
#else #else
/* See <http://gcc.gnu.org/wiki/Visibility> */ /* See <http://gcc.gnu.org/wiki/Visibility> */
#if ( ( defined(__GNUC__) && ( __GNUC__ >= 4 ) ) \ #if ((defined(__GNUC__) && (__GNUC__ >= 4)) || \
|| ( defined(HAVE_VISIBILITY) && HAVE_VISIBILITY ) ) (defined(HAVE_VISIBILITY) && HAVE_VISIBILITY))
#define HTSEXT_API __attribute__((visibility("default"))) #define HTSEXT_API __attribute__((visibility("default")))
#else #else
@@ -335,8 +341,8 @@ typedef __int64 LLint;
typedef __int64 TStamp; typedef __int64 TStamp;
#define LLintP "%I64d" #define LLintP "%I64d"
#elif (defined(_LP64) || defined(__x86_64__) \ #elif (defined(_LP64) || defined(__x86_64__) || defined(__powerpc64__) || \
|| defined(__powerpc64__) || defined(__64BIT__)) defined(__64BIT__))
typedef long int LLint; typedef long int LLint;
@@ -405,7 +411,8 @@ typedef int T_SOC;
#if HTS_ACCESS #if HTS_ACCESS
#define HTS_ACCESS_FILE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) #define HTS_ACCESS_FILE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
#define HTS_ACCESS_FOLDER (S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH) #define HTS_ACCESS_FOLDER \
(S_IRUSR | S_IWUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)
#else #else
#define HTS_ACCESS_FILE (S_IRUSR | S_IWUSR) #define HTS_ACCESS_FILE (S_IRUSR | S_IWUSR)
@@ -427,7 +434,11 @@ typedef int T_SOC;
#endif #endif
/* fflush sur stdout */ /* fflush sur stdout */
#define io_flush { fflush(stdout); fflush(stdin); } #define io_flush \
{ \
fflush(stdout); \
fflush(stdin); \
}
/* HTSLib */ /* HTSLib */
@@ -524,7 +535,13 @@ static const t_htsboundary htsboundary = 0xDEADBEEF;
#if _HTS_WIDE #if _HTS_WIDE
extern FILE *DEBUG_fp; extern FILE *DEBUG_fp;
#define DEBUG_W(A) { if (DEBUG_fp==NULL) DEBUG_fp=fopen("bug.out","wb"); fprintf(DEBUG_fp,":>"A); fflush(DEBUG_fp); } #define DEBUG_W(A) \
{ \
if (DEBUG_fp == NULL) \
DEBUG_fp = fopen("bug.out", "wb"); \
fprintf(DEBUG_fp, ":>" A); \
fflush(DEBUG_fp); \
}
#undef _ #undef _
#define _ , #define _ ,
#endif #endif


@@ -2605,6 +2605,8 @@ int ident_url_absolute(const char *url, lien_adrfil *adrfil) {
for(i = 0; adrfil->fil[i] != '\0'; i++) for(i = 0; adrfil->fil[i] != '\0'; i++)
if (adrfil->fil[i] == '\\') if (adrfil->fil[i] == '\\')
adrfil->fil[i] = '/'; adrfil->fil[i] = '/';
// collapse ../ like the http branch above (path-traversal safety)
fil_simplifie(adrfil->fil);
} }
// no hostname // no hostname


@@ -92,8 +92,8 @@ struct htsmoduleStruct {
/* Callbacks */ /* Callbacks */
t_htsAddLink addLink; /* call this function when links are t_htsAddLink addLink; /* call this function when links are
being detected. it if not your responsability to decide being detected. it if not your responsability to
if the engine will keep them, or not. */ decide if the engine will keep them, or not. */
/* Optional */ /* Optional */
char *localLink; /* if non null, the engine will write there the local char *localLink; /* if non null, the engine will write there the local
@@ -117,7 +117,6 @@ struct htsmoduleStruct {
int *ptr_; int *ptr_;
const char *page_charset_; const char *page_charset_;
/* Internal use - please don't touch */ /* Internal use - please don't touch */
}; };
#ifdef __cplusplus #ifdef __cplusplus


@@ -112,8 +112,8 @@ struct SOCaddr {
/** Pointer to the port field (network byte order) for the active family. /** Pointer to the port field (network byte order) for the active family.
Asserts on NULL or an unset/unknown family. */ Asserts on NULL or an unset/unknown family. */
static HTS_INLINE HTS_UNUSED in_port_t* SOCaddr_sinport_(SOCaddr *const addr, static HTS_INLINE HTS_UNUSED in_port_t *
const char *file, const int line) { SOCaddr_sinport_(SOCaddr *const addr, const char *file, const int line) {
assertf_(addr != NULL, file, line); assertf_(addr != NULL, file, line);
switch (addr->m_addr.sa.sa_family) { switch (addr->m_addr.sa.sa_family) {
case AF_INET: case AF_INET:
@@ -134,7 +134,8 @@ static HTS_INLINE HTS_UNUSED in_port_t* SOCaddr_sinport_(SOCaddr *const addr,
/** Length of the active sockaddr (sockaddr_in or sockaddr_in6), or 0 if the /** Length of the active sockaddr (sockaddr_in or sockaddr_in6), or 0 if the
family is unset/unknown. The 0 case doubles as the "not valid" test. */ family is unset/unknown. The 0 case doubles as the "not valid" test. */
static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_size_(const SOCaddr *const addr, static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_size_(const SOCaddr *const addr,
const char *file, const int line) { const char *file,
const int line) {
assertf_(addr != NULL, file, line); assertf_(addr != NULL, file, line);
switch (addr->m_addr.sa.sa_family) { switch (addr->m_addr.sa.sa_family) {
case AF_INET: case AF_INET:
@@ -152,8 +153,8 @@ static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_size_(const SOCaddr*const addr,
} }
/** Reset to the unset state (family AF_UNSPEC), making the address invalid. */ /** Reset to the unset state (family AF_UNSPEC), making the address invalid. */
static HTS_INLINE HTS_UNUSED void SOCaddr_clear_(SOCaddr*const addr, static HTS_INLINE HTS_UNUSED void
const char *file, const int line) { SOCaddr_clear_(SOCaddr *const addr, const char *file, const int line) {
assertf_(addr != NULL, file, line); assertf_(addr != NULL, file, line);
addr->m_addr.sa.sa_family = AF_UNSPEC; addr->m_addr.sa.sa_family = AF_UNSPEC;
} }
@@ -191,14 +192,16 @@ static HTS_INLINE HTS_UNUSED void SOCaddr_clear_(SOCaddr*const addr,
/** Set the port (host-order argument, stored network-order) on the active /** Set the port (host-order argument, stored network-order) on the active
* family. */ * family. */
#define SOCaddr_initport(server, port) do { \ #define SOCaddr_initport(server, port) \
do { \
SOCaddr_sinport(server) = htons((in_port_t) (port)); \ SOCaddr_sinport(server) = htons((in_port_t) (port)); \
} while (0) } while (0)
/** Initialize as an all-zero IPv4 wildcard (INADDR_ANY) address; returns its /** Initialize as an all-zero IPv4 wildcard (INADDR_ANY) address; returns its
sockaddr length. */ sockaddr length. */
static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr *const addr, static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr *const addr,
const char *file, const int line) { const char *file,
const int line) {
assertf_(addr != NULL, file, line); assertf_(addr != NULL, file, line);
memset(&addr->m_addr.in, 0, sizeof(addr->m_addr.in)); memset(&addr->m_addr.in, 0, sizeof(addr->m_addr.in));
addr->m_addr.in.sin_family = AF_INET; addr->m_addr.in.sin_family = AF_INET;
@@ -206,7 +209,8 @@ static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr*const addr,
} }
/** Initialize server as an IPv4 wildcard (INADDR_ANY) address. */ /** Initialize server as an IPv4 wildcard (INADDR_ANY) address. */
#define SOCaddr_initany(server) do { \ #define SOCaddr_initany(server) \
do { \
SOCaddr_initany_(&(server), __FILE__, __LINE__); \ SOCaddr_initany_(&(server), __FILE__, __LINE__); \
} while (0) } while (0)
@@ -215,8 +219,10 @@ static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr*const addr,
with port zeroed. Any other size leaves an AF_INET shell. Returns the with port zeroed. Any other size leaves an AF_INET shell. Returns the
resulting sockaddr length. */ resulting sockaddr length. */
static HTS_UNUSED socklen_t SOCaddr_copyaddr_(SOCaddr *const server, static HTS_UNUSED socklen_t SOCaddr_copyaddr_(SOCaddr *const server,
const void *data, const size_t data_size, const void *data,
const char *file, const int line) { const size_t data_size,
const char *file,
const int line) {
assertf_(server != NULL, file, line); assertf_(server != NULL, file, line);
assertf_(data != NULL, file, line); assertf_(data != NULL, file, line);
@@ -248,32 +254,35 @@ static HTS_UNUSED socklen_t SOCaddr_copyaddr_(SOCaddr*const server,
/** Copy hpaddr (length hpsize) into server, writing the result length into the /** Copy hpaddr (length hpsize) into server, writing the result length into the
lvalue server_len (int). See SOCaddr_copyaddr_ for accepted forms. */ lvalue server_len (int). See SOCaddr_copyaddr_ for accepted forms. */
#define SOCaddr_copyaddr(server, server_len, hpaddr, hpsize) do { \ #define SOCaddr_copyaddr(server, server_len, hpaddr, hpsize) \
server_len = (int) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__, __LINE__); \ do { \
server_len = (int) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__, \
__LINE__); \
} while (0) } while (0)
/** Like SOCaddr_copyaddr but discards the result length. */ /** Like SOCaddr_copyaddr but discards the result length. */
#define SOCaddr_copyaddr2(server, hpaddr, hpsize) do { \ #define SOCaddr_copyaddr2(server, hpaddr, hpsize) \
do { \
(void) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__, __LINE__); \ (void) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__, __LINE__); \
} while (0) } while (0)
/** Copy one SOCaddr (src) into another (dest), preserving family and port. */ /** Copy one SOCaddr (src) into another (dest), preserving family and port. */
#define SOCaddr_copy_SOCaddr(dest, src) do { \ #define SOCaddr_copy_SOCaddr(dest, src) \
SOCaddr_copyaddr_(&(dest), &(src).m_addr.sa, SOCaddr_size(src), __FILE__, __LINE__); \ do { \
SOCaddr_copyaddr_(&(dest), &(src).m_addr.sa, SOCaddr_size(src), __FILE__, \
__LINE__); \
} while (0) } while (0)
/** Write the numeric (dotted/colon) host of ss into namebuf (capacity /** Write the numeric (dotted/colon) host of ss into namebuf (capacity
namebuflen), scope id stripped. On failure namebuf becomes "". */ namebuflen), scope id stripped. On failure namebuf becomes "". */
static HTS_UNUSED void SOCaddr_inetntoa_(char *namebuf, size_t namebuflen, static HTS_UNUSED void SOCaddr_inetntoa_(char *namebuf, size_t namebuflen,
SOCaddr *const ss, SOCaddr *const ss, const char *file,
const char *file, const int line) { const int line) {
assertf_(namebuf != NULL, file, line); assertf_(namebuf != NULL, file, line);
assertf_(ss != NULL, file, line); assertf_(ss != NULL, file, line);
if (getnameinfo(&ss->m_addr.sa, sizeof(ss->m_addr), if (getnameinfo(&ss->m_addr.sa, sizeof(ss->m_addr), namebuf, namebuflen, NULL,
namebuf, namebuflen, 0, NI_NUMERICHOST) == 0) {
NULL, 0,
NI_NUMERICHOST) == 0) {
/* remove scope id(s) */ /* remove scope id(s) */
char *const pos = strchr(namebuf, '%'); char *const pos = strchr(namebuf, '%');
if (pos != NULL) { if (pos != NULL) {
@@ -289,7 +298,8 @@ static HTS_UNUSED void SOCaddr_inetntoa_(char *namebuf, size_t namebuflen,
SOCaddr_inetntoa_(namebuf, namebuflen, &(ss), __FILE__, __LINE__) SOCaddr_inetntoa_(namebuf, namebuflen, &(ss), __FILE__, __LINE__)
/** Single-char family tag: '1' for IPv4, '2' otherwise (used in the cache). */ /** Single-char family tag: '1' for IPv4, '2' otherwise (used in the cache). */
#define SOCaddr_getproto(ss) ( SOCaddr_size(ss) == sizeof(struct sockaddr_in) ? '1' : '2') #define SOCaddr_getproto(ss) \
(SOCaddr_size(ss) == sizeof(struct sockaddr_in) ? '1' : '2')
/** Length type for socket APIs (getsockname, accept, ...). */ /** Length type for socket APIs (getsockname, accept, ...). */
typedef socklen_t SOClen; typedef socklen_t SOClen;
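
Reviewer note on the header above: the doc comments describe a family-tagged address whose size doubles as the validity test (0 when the family is unset). A minimal sketch of that pattern, assuming a POSIX socket API, is shown below. The `demo_addr` type and the `demo_addr_*` names are hypothetical stand-ins, not the header's own definitions, and the file/line assertion plumbing is replaced by plain `assert`.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the header's address wrapper. */
typedef struct demo_addr {
  union {
    struct sockaddr sa;
    struct sockaddr_in in;
    struct sockaddr_in6 in6;
  } u;
} demo_addr;

/* Reset to the unset state: family AF_UNSPEC means "not valid". */
static void demo_addr_clear(demo_addr *addr) {
  assert(addr != NULL);
  addr->u.sa.sa_family = AF_UNSPEC;
}

/* Length of the active sockaddr, or 0 if the family is unset/unknown. */
static socklen_t demo_addr_size(const demo_addr *addr) {
  assert(addr != NULL);
  switch (addr->u.sa.sa_family) {
  case AF_INET:
    return (socklen_t) sizeof(addr->u.in);
  case AF_INET6:
    return (socklen_t) sizeof(addr->u.in6);
  default:
    return 0;
  }
}

/* All-zero IPv4 wildcard address; returns its sockaddr length. */
static socklen_t demo_addr_initany(demo_addr *addr) {
  assert(addr != NULL);
  memset(&addr->u.in, 0, sizeof(addr->u.in));
  addr->u.in.sin_family = AF_INET;
  return demo_addr_size(addr);
}

/* Single-char family tag, mirroring the '1' / '2' convention. */
#define demo_addr_proto(addr) \
  (demo_addr_size(&(addr)) == sizeof(struct sockaddr_in) ? '1' : '2')
```

Because the size function returns 0 for an unset family, callers can test validity with `demo_addr_size(&a) != 0` without a separate flag.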


@@ -72,6 +72,7 @@ typedef struct String String;
#endif #endif
#ifndef HTS_DEF_STRUCT_String #ifndef HTS_DEF_STRUCT_String
#define HTS_DEF_STRUCT_String #define HTS_DEF_STRUCT_String
struct String { struct String {
char *buffer_; char *buffer_;
size_t length_; size_t length_;
@@ -179,6 +180,7 @@ typedef struct lien_url lien_url;
#ifndef HTS_DEF_DEFSTRUCT_hts_log_type #ifndef HTS_DEF_DEFSTRUCT_hts_log_type
#define HTS_DEF_DEFSTRUCT_hts_log_type #define HTS_DEF_DEFSTRUCT_hts_log_type
typedef enum hts_log_type { typedef enum hts_log_type {
LOG_PANIC, LOG_PANIC,
LOG_ERROR, LOG_ERROR,
@@ -288,6 +290,7 @@ typedef enum htsparsejava_flags {
/* Link-rewriting style for saved pages (opt->urlmode). */ /* Link-rewriting style for saved pages (opt->urlmode). */
#ifndef HTS_DEF_DEFSTRUCT_hts_urlmode #ifndef HTS_DEF_DEFSTRUCT_hts_urlmode
#define HTS_DEF_DEFSTRUCT_hts_urlmode #define HTS_DEF_DEFSTRUCT_hts_urlmode
typedef enum hts_urlmode { typedef enum hts_urlmode {
HTS_URLMODE_ABSOLUTE = 0, /**< absolute URL (http://host/path) everywhere */ HTS_URLMODE_ABSOLUTE = 0, /**< absolute URL (http://host/path) everywhere */
HTS_URLMODE_ABSOLUTE_FILE = 1, /**< legacy file: form, unused */ HTS_URLMODE_ABSOLUTE_FILE = 1, /**< legacy file: form, unused */
@@ -301,6 +304,7 @@ typedef enum hts_urlmode {
/* Cache policy for updates and retries (opt->cache). */ /* Cache policy for updates and retries (opt->cache). */
#ifndef HTS_DEF_DEFSTRUCT_hts_cachemode #ifndef HTS_DEF_DEFSTRUCT_hts_cachemode
#define HTS_DEF_DEFSTRUCT_hts_cachemode #define HTS_DEF_DEFSTRUCT_hts_cachemode
typedef enum hts_cachemode { typedef enum hts_cachemode {
HTS_CACHE_NONE = 0, /**< no cache */ HTS_CACHE_NONE = 0, /**< no cache */
HTS_CACHE_PRIORITY = 1, /**< cache takes priority over the network */ HTS_CACHE_PRIORITY = 1, /**< cache takes priority over the network */
@@ -311,6 +315,7 @@ typedef enum hts_cachemode {
/* Interactive wizard level (opt->wizard). */ /* Interactive wizard level (opt->wizard). */
#ifndef HTS_DEF_DEFSTRUCT_hts_wizard #ifndef HTS_DEF_DEFSTRUCT_hts_wizard
#define HTS_DEF_DEFSTRUCT_hts_wizard #define HTS_DEF_DEFSTRUCT_hts_wizard
typedef enum hts_wizard { typedef enum hts_wizard {
HTS_WIZARD_NONE = 0, /**< no wizard */ HTS_WIZARD_NONE = 0, /**< no wizard */
HTS_WIZARD_ASK = 1, /**< wizard asks questions */ HTS_WIZARD_ASK = 1, /**< wizard asks questions */
@@ -321,6 +326,7 @@ typedef enum hts_wizard {
/* robots.txt / meta-robots obedience level (opt->robots). */ /* robots.txt / meta-robots obedience level (opt->robots). */
#ifndef HTS_DEF_DEFSTRUCT_hts_robots #ifndef HTS_DEF_DEFSTRUCT_hts_robots
#define HTS_DEF_DEFSTRUCT_hts_robots #define HTS_DEF_DEFSTRUCT_hts_robots
typedef enum hts_robots { typedef enum hts_robots {
HTS_ROBOTS_NEVER = 0, /**< ignore robots rules */ HTS_ROBOTS_NEVER = 0, /**< ignore robots rules */
HTS_ROBOTS_SOMETIMES = 1, /**< partial obedience (default) */ HTS_ROBOTS_SOMETIMES = 1, /**< partial obedience (default) */


@@ -296,6 +296,48 @@ static const char *html_inline_safe(const char *src, char *dst, size_t size) {
return dst; return dst;
} }
/* Byte before html, or a space sentinel at the buffer start where html[-1]
would underflow; space reads as the word boundary the guards want there. */
static HTS_INLINE char html_prevc(const char *html, const char *start) {
return html > start ? html[-1] : ' ';
}
/* True if [s, s+len) is exactly an HTTP method token (XHR.open's first
argument is a method, not a URL: #218). Case-insensitive. */
static int is_http_method(const char *s, size_t len) {
static const char *const methods[] = {"GET", "POST", "PUT",
"DELETE", "HEAD", "OPTIONS",
"PATCH", "TRACE", NULL};
int i;
for (i = 0; methods[i] != NULL; i++) {
if (strlen(methods[i]) == len && strfield(s, methods[i]) == (int) len)
return 1;
}
return 0;
}
/* Percent-encode '(' and ')' in a link emitted into an unquoted url(...) (CSS
or JS): a literal ')' closes the token early and the UA mis-parses the value
(#163). The UA decodes %28/%29 back to the saved-on-disk name. */
static void escape_url_parens(char *const s, const size_t size) {
char BIGSTK buff[HTS_URLMAXSIZE * 2];
size_t i, j;
for (i = 0, j = 0; s[i] != '\0' && j + 3 < size && j + 3 < sizeof(buff);
i++) {
if (s[i] == '(' || s[i] == ')') {
buff[j++] = '%';
buff[j++] = '2';
buff[j++] = s[i] == '(' ? '8' : '9';
} else {
buff[j++] = s[i];
}
}
buff[j] = '\0';
strlcpybuff(s, buff, size);
}
/* Main parser */ /* Main parser */
int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) { int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
char catbuff[CATBUFF_SIZE]; char catbuff[CATBUFF_SIZE];
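
Reviewer note on the helpers added in this hunk: the sketch below restates the two guards in standalone form so they can be exercised outside the parser. It is an illustration under assumptions, not the engine's code: `prev_char` and `looks_like_http_method` are hypothetical names, and the case-insensitive comparison uses `toupper` from the C library in place of the engine's `strfield`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Byte before p, or a space sentinel when p is at the buffer start, so that
   callers never read start[-1]. */
static char prev_char(const char *p, const char *start) {
  assert(p != NULL && start != NULL);
  return p > start ? p[-1] : ' ';
}

/* True if [s, s+len) equals 'word', ignoring ASCII case. */
static int span_equals_nocase(const char *s, size_t len, const char *word) {
  size_t i;

  if (strlen(word) != len)
    return 0;
  for (i = 0; i < len; i++) {
    if (toupper((unsigned char) s[i]) != toupper((unsigned char) word[i]))
      return 0;
  }
  return 1;
}

/* True if [s, s+len) is exactly one of the listed HTTP method tokens. */
static int looks_like_http_method(const char *s, size_t len) {
  static const char *const methods[] = {"GET",    "POST", "PUT",
                                        "DELETE", "HEAD", "OPTIONS",
                                        "PATCH",  "TRACE", NULL};
  int i;

  assert(s != NULL);
  for (i = 0; methods[i] != NULL; i++) {
    if (span_equals_nocase(s, len, methods[i]))
      return 1;
  }
  return 0;
}
```

The exact-length check matters: a first argument such as `getdata.php` starts with a method name but is still a URL, so only a whole-token match is rejected.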
@@ -556,7 +598,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (opt->getmode & HTS_GETMODE_HTML) { if (opt->getmode & HTS_GETMODE_HTML) {
p = strfield(html, "title"); p = strfield(html, "title");
if (p) { if (p) {
if (*(html - 1) == '/') if (html_prevc(html, r->adr) == '/')
p = 0; // /title p = 0; // /title
} else { } else {
if (strfield(html, "/html")) if (strfield(html, "/html"))
@@ -1341,6 +1383,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
int can_avoid_quotes = 0; int can_avoid_quotes = 0;
char quotes_replacement = '\0'; char quotes_replacement = '\0';
int ensure_not_mime = 0; int ensure_not_mime = 0;
// .open(method,url): reject an HTTP-method first arg (#218)
int ensure_not_method = 0;
// @import: the quoted token is the URL; a trailing
// media/supports/layer condition is not part of it
int is_import = 0;
if (inscript_tag) if (inscript_tag)
expected_end = ";\"\'"; // voir a href="javascript:doc.location='foo'" expected_end = ";\"\'"; // voir a href="javascript:doc.location='foo'"
@@ -1357,9 +1404,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (!nc) if (!nc)
nc = strfield(html, ":location"); // javascript:location="doc" nc = strfield(html, ":location"); // javascript:location="doc"
if (!nc) { // location="doc" if (!nc) { // location="doc"
if ((nc = strfield(html, "location")) if ((nc = strfield(html, "location")) &&
&& !isspace(*(html - 1)) !isspace(html_prevc(html, r->adr)))
)
nc = 0; nc = 0;
} }
if (!nc) if (!nc)
@@ -1369,6 +1415,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
expected = '('; // parenthèse expected = '('; // parenthèse
expected_end = "),"; // fin: virgule ou parenthèse expected_end = "),"; // fin: virgule ou parenthèse
ensure_not_mime = 1; //* ensure the url is not a mime type */ ensure_not_mime = 1; //* ensure the url is not a mime type */
ensure_not_method = 1; // xhr.open: don't grab method
} }
if (!nc) if (!nc)
if ((nc = strfield(html, ".replace"))) { // window.replace("url") if ((nc = strfield(html, ".replace"))) { // window.replace("url")
@@ -1380,7 +1427,9 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
expected = '('; // parenthèse expected = '('; // parenthèse
expected_end = ")"; // fin: parenthèse expected_end = ")"; // fin: parenthèse
} }
if (!nc && (nc = strfield(html, "url")) && (!isalnum(*(html - 1))) && *(html - 1) != '_') { // url(url) if (!nc && (nc = strfield(html, "url")) &&
(!isalnum(html_prevc(html, r->adr))) &&
html_prevc(html, r->adr) != '_') { // url(url)
expected = '('; // parenthèse expected = '('; // parenthèse
expected_end = ")"; // fin: parenthèse expected_end = ")"; // fin: parenthèse
can_avoid_quotes = 1; can_avoid_quotes = 1;
@@ -1390,6 +1439,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if ((nc = strfield(html, "import"))) { // import "url" if ((nc = strfield(html, "import"))) { // import "url"
if (is_space(*(html + nc))) { if (is_space(*(html + nc))) {
expected = 0; // no char expected expected = 0; // no char expected
is_import = 1;
} else } else
nc = 0; nc = 0;
} }
@@ -1407,6 +1457,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if ((*a == 34) || (*a == '\'') || (can_avoid_quotes)) { if ((*a == 34) || (*a == '\'') || (can_avoid_quotes)) {
const char *b, *c; const char *b, *c;
int ndelim = 1; int ndelim = 1;
int valid_url = 0;
if ((*a == 34) || (*a == '\'')) if ((*a == 34) || (*a == '\''))
a++; a++;
@@ -1421,12 +1472,20 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
b++; b++;
} }
c = b--; c = b--;
// no closing delimiter here (truncated input):
// Don't scan past the buffer NUL or capture it.
if (*c != '\0') {
c += ndelim; c += ndelim;
while (*c == ' ') while (*c == ' ')
c++; c++;
if ((strchr(expected_end, *c)) || (*c == '\n') valid_url =
|| (*c == '\r')) { (strchr(expected_end, *c)) || (*c == '\n') ||
c -= (ndelim + 1); (*c == '\r') ||
(is_import && *(b + 1 + ndelim) == ' ');
}
if (valid_url) {
// URL end = last char (b), not the delimiter
c = b;
if ((int) (c - a + 1)) { if ((int) (c - a + 1)) {
if (ensure_not_mime) { if (ensure_not_mime) {
int i = 0; int i = 0;
@@ -1442,6 +1501,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
i++; i++;
} }
} }
// XHR.open's "GET" etc. is a method, not a URL
if (a != NULL && ensure_not_method &&
is_http_method(a, (size_t) (c - a + 1))) {
a = NULL;
}
// Check for bogus links (Vasiliy) // Check for bogus links (Vasiliy)
if (a != NULL) { if (a != NULL) {
const size_t size = c - a + 1; const size_t size = c - a + 1;
@@ -1485,7 +1549,6 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
} }
} }
} }
} }
} }
} }
@@ -1692,6 +1755,24 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
hts_nodetect[i - hts_nodetect[i -
1]); 1]);
} }
// xmlns / xmlns:prefix declare
// XML namespaces, not resources
// (#191)
else {
const int xl = strfield(
intag_startattr, "xmlns");
const char xc =
intag_startattr[xl];
if (xl &&
(xc == ':' || xc == '=' ||
is_space(xc))) {
url_ok = 0;
hts_log_print(
opt, LOG_DEBUG,
"dirty parsing: xmlns "
"namespace avoided");
}
}
} }
} }
@@ -2967,6 +3048,10 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
/* Never escape high-chars (we don't know the encoding!!) */ /* Never escape high-chars (we don't know the encoding!!) */
inplace_escape_uri_utf(tempo, sizeof(tempo)); inplace_escape_uri_utf(tempo, sizeof(tempo));
// unquoted url() (CSS/JS): keep parens escaped
if (ending_p == ')')
escape_url_parens(tempo, sizeof(tempo));
//if (!no_esc_utf) //if (!no_esc_utf)
// escape_uri(tempo); // escape with %xx // escape_uri(tempo); // escape with %xx
//else { //else {
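
Reviewer note on the `escape_url_parens` call above: the effect on a saved link can be checked with a standalone sketch such as the one below. It follows the same loop bounds as the helper in this diff but is only an approximation: `escape_parens` and `DEMO_URL_MAX` are hypothetical names, the scratch buffer size is an assumption, and `memcpy` stands in for the engine's `strlcpybuff`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_URL_MAX 2048 /* hypothetical scratch size, not the engine's */

/* Percent-encode '(' and ')' in place; 'size' is the capacity of s.
   Output is truncated (never overflowed) when the escaped form is too long. */
static void escape_parens(char *s, size_t size) {
  char buff[DEMO_URL_MAX];
  size_t i, j;

  assert(s != NULL);
  if (size == 0)
    return; /* no room even for the terminating NUL */

  /* j + 3 < limit keeps room for a full "%28" plus the final NUL */
  for (i = 0, j = 0; s[i] != '\0' && j + 3 < size && j + 3 < sizeof(buff);
       i++) {
    if (s[i] == '(' || s[i] == ')') {
      buff[j++] = '%';
      buff[j++] = '2';
      buff[j++] = s[i] == '(' ? '8' : '9';
    } else {
      buff[j++] = s[i];
    }
  }
  buff[j] = '\0';
  memcpy(s, buff, j + 1); /* j + 1 <= size by the loop bound above */
}
```

For example, a buffer holding `a(b).png` becomes `a%28b%29.png`, which an unquoted `url(...)` token can carry without the `)` closing it early.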


@@ -58,7 +58,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
#endif #endif
#endif #endif
#define HTSSAFE_ABORT_FUNCTION(A,B,C) do { \ #define HTSSAFE_ABORT_FUNCTION(A, B, C) \
do { \
htsErrorCallback callback = hts_get_error_callback(); \ htsErrorCallback callback = hts_get_error_callback(); \
if (callback != NULL) { \ if (callback != NULL) { \
callback(A, B, C); \ callback(A, B, C); \
@@ -75,7 +76,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
/** /**
* Fatal assertion check. * Fatal assertion check.
*/ */
#define assertf__(exp, sexp, file, line) (void) ( (exp) || (abortf_(sexp, file, line), 0) ) #define assertf__(exp, sexp, file, line) \
(void) ((exp) || (abortf_(sexp, file, line), 0))
/** /**
* Fatal assertion check. * Fatal assertion check.
@@ -106,7 +108,8 @@ static HTS_UNUSED void abortf_(const char *exp, const char *file, int line) {
#if (defined(__GNUC__) && !defined(__cplusplus)) #if (defined(__GNUC__) && !defined(__cplusplus))
/* Note: char[] and const char[] are compatible */ /* Note: char[] and const char[] are compatible */
#define HTS_IS_CHAR_BUFFER(VAR) ( __builtin_types_compatible_p ( typeof (VAR), char[] ) ) #define HTS_IS_CHAR_BUFFER(VAR) \
(__builtin_types_compatible_p(typeof(VAR), char[]))
#else #else
/* Note: a bit lame as char[8] won't be seen. */ /* Note: a bit lame as char[8] won't be seen. */
#define HTS_IS_CHAR_BUFFER(VAR) (sizeof(VAR) != sizeof(char *)) #define HTS_IS_CHAR_BUFFER(VAR) (sizeof(VAR) != sizeof(char *))
@@ -201,10 +204,13 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
*/ */
#if (defined(__GNUC__) && !defined(__cplusplus)) #if (defined(__GNUC__) && !defined(__cplusplus))
#define strncatbuff(A, B, N) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \ #define strncatbuff(A, B, N) \
__builtin_choose_expr( \
HTS_IS_CHAR_BUFFER(A), \
strncat_safe_(A, sizeof(A), B, \ strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \ HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \ "overflow while appending '" #B "' to '" #A "'", __FILE__, \
__LINE__), \
strncatbuff_ptr_((A), (B), (N))) strncatbuff_ptr_((A), (B), (N)))
#else #else
#define strncatbuff(A, B, N) \ #define strncatbuff(A, B, N) \
@@ -212,7 +218,8 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
? strncat(A, B, N) \ ? strncat(A, B, N) \
: strncat_safe_(A, sizeof(A), B, \ : strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \ HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) ) "overflow while appending '" #B "' to '" #A "'", \
__FILE__, __LINE__))
#endif #endif
/** /**
@@ -222,18 +229,24 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
*/ */
#if (defined(__GNUC__) && !defined(__cplusplus)) #if (defined(__GNUC__) && !defined(__cplusplus))
#define strcatbuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \ #define strcatbuff(A, B) \
__builtin_choose_expr( \
HTS_IS_CHAR_BUFFER(A), \
strncat_safe_(A, sizeof(A), B, \ strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \ HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \ (size_t) -1, \
"overflow while appending '" #B "' to '" #A "'", __FILE__, \
__LINE__), \
strcatbuff_ptr_((A), (B))) strcatbuff_ptr_((A), (B)))
#else #else
#define strcatbuff(A, B) \ #define strcatbuff(A, B) \
(HTS_IS_NOT_CHAR_BUFFER(A) \ (HTS_IS_NOT_CHAR_BUFFER(A) \
? strcat(A, B) \ ? strcat(A, B) \
: strncat_safe_(A, sizeof(A), B, \ : strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \ HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) ) (size_t) -1, \
"overflow while appending '" #B "' to '" #A "'", \
__FILE__, __LINE__))
#endif #endif
/** /**
@@ -243,10 +256,13 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
*/ */
#if (defined(__GNUC__) && !defined(__cplusplus)) #if (defined(__GNUC__) && !defined(__cplusplus))
#define strcpybuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \ #define strcpybuff(A, B) \
__builtin_choose_expr( \
HTS_IS_CHAR_BUFFER(A), \
strcpy_safe_(A, sizeof(A), B, \ strcpy_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \ HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__), \ "overflow while copying '" #B "' to '" #A "'", __FILE__, \
__LINE__), \
strcpybuff_ptr_((A), (B))) strcpybuff_ptr_((A), (B)))
#else #else
#define strcpybuff(A, B) \ #define strcpybuff(A, B) \
@@ -254,7 +270,8 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
? strcpy(A, B) \ ? strcpy(A, B) \
: strcpy_safe_(A, sizeof(A), B, \ : strcpy_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \ HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) ) "overflow while copying '" #B "' to '" #A "'", __FILE__, \
__LINE__))
#endif #endif
/* /*
@@ -269,9 +286,9 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
* Append characters of "B" to "A", "A" having a maximum capacity of "S". * Append characters of "B" to "A", "A" having a maximum capacity of "S".
*/ */
#define strlcatbuff(A, B, S) \ #define strlcatbuff(A, B, S) \
strncat_safe_(A, S, B, \ strncat_safe_(A, S, B, HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \ (size_t) -1, "overflow while appending '" #B "' to '" #A "'", \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) __FILE__, __LINE__)
/** /**
* Append at most "N" characters of "B" to "A", "A" having a maximum capacity * Append at most "N" characters of "B" to "A", "A" having a maximum capacity
@@ -286,16 +303,17 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
* Copy characters of "B" to "A", "A" having a maximum capacity of "S". * Copy characters of "B" to "A", "A" having a maximum capacity of "S".
*/ */
#define strlcpybuff(A, B, S) \ #define strlcpybuff(A, B, S) \
strcpy_safe_(A, S, B, \ strcpy_safe_(A, S, B, HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \ "overflow while copying '" #B "' to '" #A "'", __FILE__, \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) __LINE__)
/** strnlen replacement (autotools). **/ /** strnlen replacement (autotools). **/
#if (!defined(_WIN32) && !defined(HAVE_STRNLEN)) #if (!defined(_WIN32) && !defined(HAVE_STRNLEN))
static HTS_UNUSED size_t strnlen(const char *s, size_t maxlen) { static HTS_UNUSED size_t strnlen(const char *s, size_t maxlen) {
size_t i; size_t i;
for(i = 0 ; i < maxlen && s[i] != '\0' ; i++) ; for (i = 0; i < maxlen && s[i] != '\0'; i++)
;
return i; return i;
} }
#endif #endif
@@ -304,12 +322,13 @@ static HTS_UNUSED size_t strnlen(const char *s, size_t maxlen) {
Aborts if source is NULL or has no NUL within that capacity. The sentinel Aborts if source is NULL or has no NUL within that capacity. The sentinel
sizeof_source == (size_t)-1 means "capacity unknown", and falls back to the sizeof_source == (size_t)-1 means "capacity unknown", and falls back to the
unbounded strlen (used when the source is a pointer rather than an array). */ unbounded strlen (used when the source is a pointer rather than an array). */
static HTS_INLINE HTS_UNUSED size_t strlen_safe_(const char *source, const size_t sizeof_source, static HTS_INLINE HTS_UNUSED size_t strlen_safe_(const char *source,
const size_t sizeof_source,
const char *file, int line) { const char *file, int line) {
size_t size; size_t size;
assertf_(source != NULL, file, line); assertf_(source != NULL, file, line);
size = sizeof_source != (size_t) -1 size = sizeof_source != (size_t) -1 ? strnlen(source, sizeof_source)
? strnlen(source, sizeof_source) : strlen(source); : strlen(source);
assertf_(size < sizeof_source, file, line); assertf_(size < sizeof_source, file, line);
return size; return size;
} }
@@ -319,10 +338,10 @@ static HTS_INLINE HTS_UNUSED size_t strlen_safe_(const char *source, const size_
source's capacity or (size_t)-1 if unknown. Aborts if the result (existing source's capacity or (size_t)-1 if unknown. Aborts if the result (existing
dest length + appended bytes + NUL) would not fit sizeof_dest: this NEVER dest length + appended bytes + NUL) would not fit sizeof_dest: this NEVER
truncates. Always NUL-terminates on success. */ truncates. Always NUL-terminates on success. */
static HTS_INLINE HTS_UNUSED char* strncat_safe_(char *const dest, const size_t sizeof_dest, static HTS_INLINE HTS_UNUSED char *
strncat_safe_(char *const dest, const size_t sizeof_dest,
const char *const source, const size_t sizeof_source, const char *const source, const size_t sizeof_source,
const size_t n, const size_t n, const char *exp, const char *file, int line) {
const char *exp, const char *file, int line) {
const size_t source_len = strlen_safe_(source, sizeof_source, file, line); const size_t source_len = strlen_safe_(source, sizeof_source, file, line);
const size_t dest_len = strlen_safe_(dest, sizeof_dest, file, line); const size_t dest_len = strlen_safe_(dest, sizeof_dest, file, line);
/* note: "size_t is an unsigned integral type" ((size_t) -1 is positive) */ /* note: "size_t is an unsigned integral type" ((size_t) -1 is positive) */
@@ -337,12 +356,14 @@ static HTS_INLINE HTS_UNUSED char* strncat_safe_(char *const dest, const size_t
/* Core bounded copy: empties dest then appends all of source via /* Core bounded copy: empties dest then appends all of source via
strncat_safe_. sizeof_dest is dest's total capacity (NUL included). Aborts strncat_safe_. sizeof_dest is dest's total capacity (NUL included). Aborts
(no truncation) if source plus its NUL would not fit. */ (no truncation) if source plus its NUL would not fit. */
static HTS_INLINE HTS_UNUSED char* strcpy_safe_(char *const dest, const size_t sizeof_dest, static HTS_INLINE HTS_UNUSED char *
strcpy_safe_(char *const dest, const size_t sizeof_dest,
const char *const source, const size_t sizeof_source, const char *const source, const size_t sizeof_source,
const char *exp, const char *file, int line) { const char *exp, const char *file, int line) {
assertf_(sizeof_dest != 0, file, line); assertf_(sizeof_dest != 0, file, line);
dest[0] = '\0'; dest[0] = '\0';
return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1, exp, file, line); return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1,
exp, file, line);
} }
/** /**
@@ -385,22 +406,28 @@ static HTS_INLINE HTS_UNUSED htsbuff htsbuff_ptr_(char *buf, size_t cap) {
/* 0 for an array, a -1 array-size compile error for a pointer. */ /* 0 for an array, a -1 array-size compile error for a pointer. */
#define htsbuff_must_be_array_(A) \ #define htsbuff_must_be_array_(A) \
(sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), typeof(&(A)[0]))]) - 1) (sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), \
typeof(&(A)[0]))]) - \
1)
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR)) #define htsbuff_array(ARR) \
htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
#else #else
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR)) #define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR))
#endif #endif
/** Builder over pointer P of known capacity N (N includes the NUL). */ /** Builder over pointer P of known capacity N (N includes the NUL). */
#define htsbuff_ptr(P, N) htsbuff_ptr_((P), (N)) #define htsbuff_ptr(P, N) htsbuff_ptr_((P), (N))
/** Append at most n characters of s (stopping at its NUL). Aborts on overflow. */ /** Append at most n characters of s (stopping at its NUL). Aborts on overflow.
static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s, size_t n) { */
static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s,
size_t n) {
const size_t add = strnlen(s, n); const size_t add = strnlen(s, n);
/* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The /* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The
maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so
'add < cap - len' cannot wrap the way 'len + add < cap' could. */ 'add < cap - len' cannot wrap the way 'len + add < cap' could. */
assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__); assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__,
__LINE__);
memcpy(b->buf + b->len, s, add); memcpy(b->buf + b->len, s, add);
b->len += add; b->len += add;
b->buf[b->len] = '\0'; b->buf[b->len] = '\0';
@@ -437,7 +464,13 @@ static HTS_INLINE HTS_UNUSED const char *htsbuff_str(const htsbuff *b) {
#define calloct(A, B) calloc((A), (B)) #define calloct(A, B) calloc((A), (B))
#define freet(A) do { if ((A) != NULL) { free(A); (A) = NULL; } } while(0) #define freet(A) \
do { \
if ((A) != NULL) { \
free(A); \
(A) = NULL; \
} \
} while (0)
#define strdupt(A) strdup(A) #define strdupt(A) strdup(A)

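The comment in `htsbuff_catn` above explains why the bound is written as `add < cap - len` rather than `len + add < cap`: with the invariant `len < cap`, the subtraction cannot underflow, while the addition could wrap for a huge `add`. The sketch below illustrates that comparison on a tiny builder. It is not the library's `htsbuff`: the `sketch_buff` type and the `sketch_buff_*` functions are hypothetical, and resetting the buffer to the empty string on init is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bounded string builder. */
typedef struct {
  char *buf;
  size_t cap; /* total capacity, NUL included */
  size_t len; /* current length; invariant: len < cap */
} sketch_buff;

/* Start an empty builder over buf (assumption: content is reset to ""). */
static sketch_buff sketch_buff_init(char *buf, size_t cap) {
  sketch_buff b;

  assert(buf != NULL && cap != 0);
  buf[0] = '\0';
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  return b;
}

/* Non-zero if 'add' more bytes plus the NUL fit. Because len < cap,
   'cap - len' is at least 1 and never underflows, so even
   add == (size_t) -1 is rejected instead of wrapping around. */
static int sketch_buff_fits(const sketch_buff *b, size_t add) {
  return add < b->cap - b->len;
}

/* Append all of s; aborts (never truncates) if it does not fit. */
static void sketch_buff_cat(sketch_buff *b, const char *s) {
  const size_t add = strlen(s);

  assert(sketch_buff_fits(b, add));
  memcpy(b->buf + b->len, s, add);
  b->len += add;
  b->buf[b->len] = '\0';
}
```

The design point is the same one the header comment makes: a full buffer still has `cap - len == 1`, so an empty append succeeds and any non-empty one fails, with no arithmetic that can overflow.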
@@ -60,6 +60,7 @@ typedef struct String String;
#endif #endif
#ifndef HTS_DEF_STRUCT_String #ifndef HTS_DEF_STRUCT_String
#define HTS_DEF_STRUCT_String #define HTS_DEF_STRUCT_String
/** /**
* Growable owned string. * Growable owned string.
* *
@@ -131,14 +132,16 @@ struct String {
/** Drop the last byte and re-terminate. Undefined if the String is empty /** Drop the last byte and re-terminate. Undefined if the String is empty
(no length check; would underflow). **/ (no length check; would underflow). **/
#define StringPopRight(BLK) do { \ #define StringPopRight(BLK) \
do { \
StringBuffRW(BLK)[--StringLength(BLK)] = '\0'; \ StringBuffRW(BLK)[--StringLength(BLK)] = '\0'; \
} while (0) } while (0)
/** Grow so capacity_ >= CAPACITY (total bytes, including the NUL). May realloc /** Grow so capacity_ >= CAPACITY (total bytes, including the NUL). May realloc
(invalidating prior buffer pointers); aborts via STRING_ASSERT on OOM. (invalidating prior buffer pointers); aborts via STRING_ASSERT on OOM.
Never shrinks. **/ Never shrinks. **/
#define StringRoomTotal(BLK, CAPACITY) do { \ #define StringRoomTotal(BLK, CAPACITY) \
do { \
const size_t capacity_ = (size_t) (CAPACITY); \ const size_t capacity_ = (size_t) (CAPACITY); \
while ((BLK).capacity_ < capacity_) { \ while ((BLK).capacity_ < capacity_) { \
if ((BLK).capacity_ < 16) { \ if ((BLK).capacity_ < 16) { \
@@ -153,11 +156,13 @@ struct String {
/** Reserve room for SIZE more bytes beyond the current length (plus the NUL). /** Reserve room for SIZE more bytes beyond the current length (plus the NUL).
May realloc, invalidating prior buffer pointers. **/ May realloc, invalidating prior buffer pointers. **/
#define StringRoom(BLK, SIZE) StringRoomTotal(BLK, StringLength(BLK) + (SIZE) + 1) #define StringRoom(BLK, SIZE) \
StringRoomTotal(BLK, StringLength(BLK) + (SIZE) + 1)
/** Reserve room for SIZE more bytes and return the (post-realloc) RW buffer, /** Reserve room for SIZE more bytes and return the (post-realloc) RW buffer,
for appending in place. Does not update length_; the caller must. **/ for appending in place. Does not update length_; the caller must. **/
#define StringBuffN(BLK, SIZE) StringBuffN_(&(BLK), SIZE) #define StringBuffN(BLK, SIZE) StringBuffN_(&(BLK), SIZE)
HTS_STATIC char *StringBuffN_(String *blk, int size) { HTS_STATIC char *StringBuffN_(String *blk, int size) {
StringRoom(*blk, size); StringRoom(*blk, size);
return StringBuffRW(*blk); return StringBuffRW(*blk);
@@ -166,7 +171,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
/** Zero the fields (NULL buffer, no allocation). Use on an uninitialized /** Zero the fields (NULL buffer, no allocation). Use on an uninitialized
String only; does NOT free an existing buffer (use StringFree to reset String only; does NOT free an existing buffer (use StringFree to reset
an owned one), so calling it on a live String leaks. **/ an owned one), so calling it on a live String leaks. **/
#define StringInit(BLK) do { \ #define StringInit(BLK) \
do { \
(BLK).buffer_ = NULL; \ (BLK).buffer_ = NULL; \
(BLK).capacity_ = 0; \ (BLK).capacity_ = 0; \
(BLK).length_ = 0; \ (BLK).length_ = 0; \
@@ -174,7 +180,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
/** Truncate to length 0, keeping the allocation. Forces a non-NULL buffer /** Truncate to length 0, keeping the allocation. Forces a non-NULL buffer
(allocates if empty) and writes the leading NUL, so StringBuff is "". **/ (allocates if empty) and writes the leading NUL, so StringBuff is "". **/
#define StringClear(BLK) do { \ #define StringClear(BLK) \
do { \
(BLK).length_ = 0; \ (BLK).length_ = 0; \
StringRoom(BLK, 0); \ StringRoom(BLK, 0); \
(BLK).buffer_[0] = '\0'; \ (BLK).buffer_[0] = '\0'; \
@@ -182,7 +189,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
/** Set length_ to SIZE, or to strlen(buffer_) if SIZE is negative. Caller /** Set length_ to SIZE, or to strlen(buffer_) if SIZE is negative. Caller
asserts SIZE fits the existing content; does not (re)allocate. **/ asserts SIZE fits the existing content; does not (re)allocate. **/
#define StringSetLength(BLK, SIZE) do { \ #define StringSetLength(BLK, SIZE) \
do { \
if (SIZE >= 0) { \ if (SIZE >= 0) { \
(BLK).length_ = SIZE; \ (BLK).length_ = SIZE; \
} else { \ } else { \
@@ -192,7 +200,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
/** Release the owned buffer and reset to the empty state (NULL buffer). /** Release the owned buffer and reset to the empty state (NULL buffer).
Idempotent; safe on an already-empty String. **/ Idempotent; safe on an already-empty String. **/
#define StringFree(BLK) do { \ #define StringFree(BLK) \
do { \
if ((BLK).buffer_ != NULL) { \ if ((BLK).buffer_ != NULL) { \
STRING_FREE((BLK).buffer_); \ STRING_FREE((BLK).buffer_); \
(BLK).buffer_ = NULL; \ (BLK).buffer_ = NULL; \
@@ -207,7 +216,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
freed or used by the caller afterwards. length_/capacity_ are set to freed or used by the caller afterwards. length_/capacity_ are set to
strlen(STR) (capacity_ here excludes the NUL, so the next append reallocs). strlen(STR) (capacity_ here excludes the NUL, so the next append reallocs).
**/ **/
#define StringSetBuffer(BLK, STR) do { \ #define StringSetBuffer(BLK, STR) \
do { \
size_t len__ = strlen(STR); \ size_t len__ = strlen(STR); \
StringFree(BLK); \ StringFree(BLK); \
(BLK).buffer_ = (STR); \ (BLK).buffer_ = (STR); \
@@ -218,7 +228,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
/** Append SIZE raw bytes from STR (NULs allowed as data). Grows as needed and /** Append SIZE raw bytes from STR (NULs allowed as data). Grows as needed and
re-terminates with a NUL after the appended bytes. STR must not alias re-terminates with a NUL after the appended bytes. STR must not alias
BLK's buffer (a realloc would invalidate it). **/ BLK's buffer (a realloc would invalidate it). **/
#define StringMemcat(BLK, STR, SIZE) do { \ #define StringMemcat(BLK, STR, SIZE) \
do { \
const char *str_mc_ = (STR); \ const char *str_mc_ = (STR); \
const size_t size_mc_ = (size_t) (SIZE); \ const size_t size_mc_ = (size_t) (SIZE); \
StringRoom(BLK, size_mc_); \ StringRoom(BLK, size_mc_); \
@@ -231,13 +242,15 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
/** Replace content with SIZE raw bytes from STR (NULs allowed as data). /** Replace content with SIZE raw bytes from STR (NULs allowed as data).
Same non-aliasing requirement as StringMemcat. **/ Same non-aliasing requirement as StringMemcat. **/
#define StringMemcpy(BLK, STR, SIZE) do { \ #define StringMemcpy(BLK, STR, SIZE) \
do { \
(BLK).length_ = 0; \ (BLK).length_ = 0; \
StringMemcat(BLK, STR, SIZE); \ StringMemcat(BLK, STR, SIZE); \
} while (0) } while (0)
/** Append one byte and re-terminate. Grows as needed. **/ /** Append one byte and re-terminate. Grows as needed. **/
#define StringAddchar(BLK, c) do { \ #define StringAddchar(BLK, c) \
do { \
String *const s__ = &(BLK); \ String *const s__ = &(BLK); \
char c__ = (c); \ char c__ = (c); \
StringRoom(*s__, 1); \ StringRoom(*s__, 1); \
@@ -281,7 +294,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
/** Append the C string STR (up to its NUL). No-op if STR is NULL. STR must not /** Append the C string STR (up to its NUL). No-op if STR is NULL. STR must not
alias BLK's buffer. **/ alias BLK's buffer. **/
#define StringCat(BLK, STR) do { \ #define StringCat(BLK, STR) \
do { \
const char *const str__ = (STR); \ const char *const str__ = (STR); \
if (str__ != NULL) { \ if (str__ != NULL) { \
const size_t size__ = strlen(str__); \ const size_t size__ = strlen(str__); \
@@ -291,7 +305,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
/** Append at most SIZE leading bytes of the C string STR. No-op if STR is /** Append at most SIZE leading bytes of the C string STR. No-op if STR is
NULL. STR must not alias BLK's buffer. **/ NULL. STR must not alias BLK's buffer. **/
#define StringCatN(BLK, STR, SIZE) do { \ #define StringCatN(BLK, STR, SIZE) \
do { \
const char *str__ = (STR); \ const char *str__ = (STR); \
if (str__ != NULL) { \ if (str__ != NULL) { \
size_t size__ = strlen(str__); \ size_t size__ = strlen(str__); \
@@ -304,7 +319,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
/** Replace content with at most SIZE leading bytes of the C string STR. /** Replace content with at most SIZE leading bytes of the C string STR.
If STR is NULL, clears to "". STR must not alias BLK's buffer. **/ If STR is NULL, clears to "". STR must not alias BLK's buffer. **/
#define StringCopyN(BLK, STR, SIZE) do { \ #define StringCopyN(BLK, STR, SIZE) \
do { \
const char *str__ = (STR); \ const char *str__ = (STR); \
const size_t usize__ = (SIZE); \ const size_t usize__ = (SIZE); \
(BLK).length_ = 0; \ (BLK).length_ = 0; \
@@ -326,7 +342,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
/** Replace content with a copy of the C string STR. If STR is NULL, clears to /** Replace content with a copy of the C string STR. If STR is NULL, clears to
"". STR must not alias BLK's buffer (use StringCopyOverlapped if it might). "". STR must not alias BLK's buffer (use StringCopyOverlapped if it might).
**/ **/
#define StringCopy(BLK, STR) do { \ #define StringCopy(BLK, STR) \
do { \
const char *str__ = (STR); \ const char *str__ = (STR); \
if (str__ != NULL) { \ if (str__ != NULL) { \
size_t size__ = strlen(str__); \ size_t size__ = strlen(str__); \
@@ -338,7 +355,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
/** Like StringCopy but safe when STR aliases BLK's own buffer: copies via a /** Like StringCopy but safe when STR aliases BLK's own buffer: copies via a
temporary, so a self-copy or overlap is well-defined. **/ temporary, so a self-copy or overlap is well-defined. **/
#define StringCopyOverlapped(BLK, STR) do { \ #define StringCopyOverlapped(BLK, STR) \
do { \
String s__ = STRING_EMPTY; \ String s__ = STRING_EMPTY; \
StringCopy(s__, STR); \ StringCopy(s__, STR); \
StringCopyS(BLK, s__); \ StringCopyS(BLK, s__); \

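The `String` macros above share one contract: reserve room for the new bytes plus the NUL, copy, update the length, then re-terminate. The growth policy inside `StringRoomTotal` is cut off by the hunk boundary (only the `capacity_ < 16` test is visible), so the sketch below assumes a 16-byte floor followed by doubling. All `sketch_*` names are hypothetical and this is an illustration of the contract, not the header's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable owned string, loosely modeled on the documented
   contract (buffer, length, capacity including the NUL). */
typedef struct {
  char *buffer;
  size_t length;
  size_t capacity;
} sketch_string;

/* Grow so that capacity >= wanted (total bytes, NUL included). Never
   shrinks. Assumed policy: 16-byte floor, then doubling. Aborts on OOM.
   Overflow of the doubling is not handled in this sketch. */
static void sketch_room_total(sketch_string *s, size_t wanted) {
  assert(s != NULL);
  if (s->capacity < wanted) {
    size_t cap = s->capacity < 16 ? 16 : s->capacity;
    char *p;

    while (cap < wanted)
      cap *= 2;
    p = realloc(s->buffer, cap); /* may move: old pointers become invalid */
    if (p == NULL)
      abort();
    s->buffer = p;
    s->capacity = cap;
  }
}

/* Append size raw bytes and re-terminate. data must not alias s->buffer,
   since the realloc above could invalidate it. */
static void sketch_memcat(sketch_string *s, const char *data, size_t size) {
  sketch_room_total(s, s->length + size + 1);
  memcpy(s->buffer + s->length, data, size);
  s->length += size;
  s->buffer[s->length] = '\0';
}

/* Release the buffer and return to the empty state; safe to call twice. */
static void sketch_free(sketch_string *s) {
  free(s->buffer);
  s->buffer = NULL;
  s->length = 0;
  s->capacity = 0;
}
```

The non-aliasing notes in the header comments follow directly from the `realloc` call: any pointer into the old buffer, including the source of an append, may dangle once the string grows.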
@@ -73,6 +73,7 @@ typedef struct strc_int2bytes2 strc_int2bytes2;
#endif #endif
#ifndef HTS_DEF_DEFSTRUCT_hts_log_type #ifndef HTS_DEF_DEFSTRUCT_hts_log_type
#define HTS_DEF_DEFSTRUCT_hts_log_type #define HTS_DEF_DEFSTRUCT_hts_log_type
/** Log severity levels, most to least severe. A message is emitted only if its /** Log severity levels, most to least severe. A message is emitted only if its
level is <= opt->debug. LOG_ERRNO is a flag OR'd into the level to append level is <= opt->debug. LOG_ERRNO is a flag OR'd into the level to append
": <strerror(errno)>" to the message. */ ": <strerror(errno)>" to the message. */
@@ -111,8 +112,10 @@ requires: htsdefines.h */
* CALLBACKARG_USERDEF(). Allocates a t_hts_callbackarg with hts_malloc (not * CALLBACKARG_USERDEF(). Allocates a t_hts_callbackarg with hts_malloc (not
* checked for OOM); it is freed by hts_free_opt(). * checked for OOM); it is freed by hts_free_opt().
*/ */
#define CHAIN_FUNCTION(OPT, MEMBER, FUNCTION, ARGUMENT) do { \ #define CHAIN_FUNCTION(OPT, MEMBER, FUNCTION, ARGUMENT) \
t_hts_callbackarg *carg = (t_hts_callbackarg*) hts_malloc(sizeof(t_hts_callbackarg)); \ do { \
t_hts_callbackarg *carg = \
(t_hts_callbackarg *) hts_malloc(sizeof(t_hts_callbackarg)); \
carg->userdef = (ARGUMENT); \ carg->userdef = (ARGUMENT); \
carg->prev.fun = (void *) (OPT)->callbacks_fun->MEMBER.fun; \ carg->prev.fun = (void *) (OPT)->callbacks_fun->MEMBER.fun; \
carg->prev.carg = (OPT)->callbacks_fun->MEMBER.carg; \ carg->prev.carg = (OPT)->callbacks_fun->MEMBER.carg; \
@@ -120,8 +123,10 @@ requires: htsdefines.h */
(OPT)->callbacks_fun->MEMBER.carg = carg; \ (OPT)->callbacks_fun->MEMBER.carg = carg; \
} while (0) } while (0)
/* The following helpers are useful only if you know that an existing callback migh be existing before before the call to CHAIN_FUNCTION() /* The following helpers are useful only if you know that an existing callback
If your functions were added just after hts_create_opt(), no need to make the previous function check */ migh be existing before before the call to CHAIN_FUNCTION() If your functions
were added just after hts_create_opt(), no need to make the previous function
check */
/** Inside a chained callback, return the ARGUMENT pointer originally passed to /** Inside a chained callback, return the ARGUMENT pointer originally passed to
CHAIN_FUNCTION(), or NULL when CARG is NULL. */ CHAIN_FUNCTION(), or NULL when CARG is NULL. */
@@ -129,11 +134,13 @@ If your functions were added just after hts_create_opt(), no need to make the pr
/** Return the callback of type NAME that this one chained over, cast to its /** Return the callback of type NAME that this one chained over, cast to its
function-pointer type, or NULL. Call it to forward to the prior handler. */ function-pointer type, or NULL. Call it to forward to the prior handler. */
#define CALLBACKARG_PREV_FUN(CARG, NAME) ( (t_hts_htmlcheck_ ##NAME) ( ( (CARG) != NULL ) ? (CARG)->prev.fun : NULL ) ) #define CALLBACKARG_PREV_FUN(CARG, NAME) \
((t_hts_htmlcheck_##NAME)(((CARG) != NULL) ? (CARG)->prev.fun : NULL))
/** Return the carg of the callback this one chained over (pass it when /** Return the carg of the callback this one chained over (pass it when
forwarding to the CALLBACKARG_PREV_FUN result), or NULL. */ forwarding to the CALLBACKARG_PREV_FUN result), or NULL. */
#define CALLBACKARG_PREV_CARG(CARG) ( ( (CARG) != NULL ) ? (CARG)->prev.carg : NULL ) #define CALLBACKARG_PREV_CARG(CARG) \
(((CARG) != NULL) ? (CARG)->prev.carg : NULL)
/* Functions */ /* Functions */
@@ -212,8 +219,8 @@ HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
/** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO). /** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO).
Forwards to the registered log callback, and when the level is <= opt->debug Forwards to the registered log callback, and when the level is <= opt->debug
also to opt->log. @p format must be non-NULL. */ also to opt->log. @p format must be non-NULL. */
HTSEXT_API void hts_log_print(httrackp * opt, int type, const char *format, HTSEXT_API void hts_log_print(httrackp *opt, int type, const char *format, ...)
...) HTS_PRINTF_FUN(3, 4); HTS_PRINTF_FUN(3, 4);
/** va_list form of hts_log_print(). @p opt may be NULL (only the callback /** va_list form of hts_log_print(). @p opt may be NULL (only the callback
runs). Preserves errno. @p format must be non-NULL. */ runs). Preserves errno. @p format must be non-NULL. */
@@ -255,7 +262,8 @@ HTSEXT_API int htswrap_add(httrackp * opt, const char *name, void *fct);
or 0 if none or unknown. */ or 0 if none or unknown. */
HTSEXT_API uintptr_t htswrap_read(httrackp *opt, const char *name); HTSEXT_API uintptr_t htswrap_read(httrackp *opt, const char *name);
/* Internal library allocators, if a different libc is being used by the client */ /* Internal library allocators, if a different libc is being used by the client
*/
/** strdup() through the library allocator. Returns a heap copy freed with /** strdup() through the library allocator. Returns a heap copy freed with
hts_free(), or NULL on failure. */ hts_free(), or NULL on failure. */
HTSEXT_API char *hts_strdup(const char *string); HTSEXT_API char *hts_strdup(const char *string);
@@ -490,40 +498,50 @@ HTSEXT_API void unescape_amp(char *s);
/** Percent-escape only spaces (' ' becomes "%20"); copy everything else /** Percent-escape only spaces (' ' becomes "%20"); copy everything else
* verbatim. */ * verbatim. */
HTSEXT_API size_t escape_spc_url(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t escape_spc_url(const char *const src, char *const dest,
const size_t size);
/** Aggressively percent-escape @p src for use as a single URL path segment /** Aggressively percent-escape @p src for use as a single URL path segment
(reserved, delimiter, unwise, special, avoid and mark characters). */ (reserved, delimiter, unwise, special, avoid and mark characters). */
HTSEXT_API size_t escape_in_url(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t escape_in_url(const char *const src, char *const dest,
const size_t size);
/** Percent-escape @p src as a URI, escaping only what is necessary and keeping /** Percent-escape @p src as a URI, escaping only what is necessary and keeping
'/' and other reserved characters. */ '/' and other reserved characters. */
HTSEXT_API size_t escape_uri(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t escape_uri(const char *const src, char *const dest,
const size_t size);
/** Like escape_uri() for a UTF-8 URI: also escapes reserved characters other /** Like escape_uri() for a UTF-8 URI: also escapes reserved characters other
than '/'. */ than '/'. */
HTSEXT_API size_t escape_uri_utf(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t escape_uri_utf(const char *const src, char *const dest,
const size_t size);
/** Minimal "make safe" escape: percent-escapes only '"', ' ' and control /** Minimal "make safe" escape: percent-escapes only '"', ' ' and control
characters, leaving an already-formed URL otherwise intact. */ characters, leaving an already-formed URL otherwise intact. */
HTSEXT_API size_t escape_check_url(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t escape_check_url(const char *const src, char *const dest,
const size_t size);
/** Append-variant of escape_spc_url(): escapes @p src after the existing /** Append-variant of escape_spc_url(): escapes @p src after the existing
NUL-terminated content of @p dest. Returns the bytes appended (excluding the NUL-terminated content of @p dest. Returns the bytes appended (excluding the
NUL). */ NUL). */
HTSEXT_API size_t append_escape_spc_url(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t append_escape_spc_url(const char *const src, char *const dest,
const size_t size);
/** Append-variant of escape_in_url(). See append_escape_spc_url(). */ /** Append-variant of escape_in_url(). See append_escape_spc_url(). */
HTSEXT_API size_t append_escape_in_url(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t append_escape_in_url(const char *const src, char *const dest,
const size_t size);
/** Append-variant of escape_uri(). See append_escape_spc_url(). */ /** Append-variant of escape_uri(). See append_escape_spc_url(). */
HTSEXT_API size_t append_escape_uri(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t append_escape_uri(const char *const src, char *const dest,
const size_t size);
/** Append-variant of escape_uri_utf(). See append_escape_spc_url(). */ /** Append-variant of escape_uri_utf(). See append_escape_spc_url(). */
HTSEXT_API size_t append_escape_uri_utf(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t append_escape_uri_utf(const char *const src, char *const dest,
const size_t size);
/** Append-variant of escape_check_url(). See append_escape_spc_url(). */ /** Append-variant of escape_check_url(). See append_escape_spc_url(). */
HTSEXT_API size_t append_escape_check_url(const char *const src, char *const dest, const size_t size); HTSEXT_API size_t append_escape_check_url(const char *const src,
char *const dest, const size_t size);
/** In-place variant of escape_spc_url(): escapes the NUL-terminated string in /** In-place variant of escape_spc_url(): escapes the NUL-terminated string in
@p dest back into @p dest. */ @p dest back into @p dest. */
@@ -543,32 +561,39 @@ HTSEXT_API size_t inplace_escape_check_url(char *const dest, const size_t size);
/** Same escaping as escape_check_url() but returns @p dest instead of the byte /** Same escaping as escape_check_url() but returns @p dest instead of the byte
count. */ count. */
HTSEXT_API char *escape_check_url_addr(const char *const src, char *const dest, const size_t size); HTSEXT_API char *escape_check_url_addr(const char *const src, char *const dest,
const size_t size);
/** Build a MIME/MHTML content-id token in @p dest from @p adr and @p fil: /** Build a MIME/MHTML content-id token in @p dest from @p adr and @p fil:
escape_in_url() both, then replace every '%' with 'X' so the result is one escape_in_url() both, then replace every '%' with 'X' so the result is one
opaque token. */ opaque token. */
HTSEXT_API size_t make_content_id(const char *const adr, const char *const fil, char *const dest, const size_t size); HTSEXT_API size_t make_content_id(const char *const adr, const char *const fil,
char *const dest, const size_t size);
/** Low-level percent-escaper backing the escape_* family. @p mode selects the /** Low-level percent-escaper backing the escape_* family. @p mode selects the
character class to escape: 0 check_url, 1 in_url, 2 spc_url, 3 uri, character class to escape: 0 check_url, 1 in_url, 2 spc_url, 3 uri,
30 uri_utf. @p max_size is the dest capacity including the NUL. */ 30 uri_utf. @p max_size is the dest capacity including the NUL. */
HTSEXT_API size_t x_escape_http(const char *const s, char *const dest, const size_t max_size, const int mode); HTSEXT_API size_t x_escape_http(const char *const s, char *const dest,
const size_t max_size, const int mode);
/** Strip all control characters (byte value < 32) from @p s in place. */ /** Strip all control characters (byte value < 32) from @p s in place. */
HTSEXT_API void escape_remove_control(char *const s); HTSEXT_API void escape_remove_control(char *const s);
/** HTML-escape for text output: rewrite '&' to "&amp;" and pass every other /** HTML-escape for text output: rewrite '&' to "&amp;" and pass every other
byte through unchanged. */ byte through unchanged. */
HTSEXT_API size_t escape_for_html_print(const char *const s, char *const dest, const size_t size); HTSEXT_API size_t escape_for_html_print(const char *const s, char *const dest,
const size_t size);
/** Like escape_for_html_print() but also convert every high byte (>= 128) to a /** Like escape_for_html_print() but also convert every high byte (>= 128) to a
numeric entity "&#xNN;". */ numeric entity "&#xNN;". */
HTSEXT_API size_t escape_for_html_print_full(const char *const s, char *const dest, const size_t size); HTSEXT_API size_t escape_for_html_print_full(const char *const s,
char *const dest,
const size_t size);
/** Percent-decode @p s into @p catbuff (capacity @p size) and return @p /** Percent-decode @p s into @p catbuff (capacity @p size) and return @p
catbuff. Decodes every "%xx" hex escape. */ catbuff. Decodes every "%xx" hex escape. */
HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const char *const s); HTSEXT_API char *unescape_http(char *const catbuff, const size_t size,
const char *const s);
/** Percent-decode @p s into @p catbuff, but only the escapes that are safe to /** Percent-decode @p s into @p catbuff, but only the escapes that are safe to
decode while keeping a valid URI (reserved, delimiter, unwise, control and decode while keeping a valid URI (reserved, delimiter, unwise, control and
@@ -589,8 +614,7 @@ HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
HTS_MIMETYPE_SIZE capacity. */ HTS_MIMETYPE_SIZE capacity. */
HTS_DEPRECATED("use get_httptype_sized(opt, s, ssize, fil, flag)") HTS_DEPRECATED("use get_httptype_sized(opt, s, ssize, fil, flag)")
HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil, HTSEXT_API void get_httptype(httrackp *opt, char *s, const char *fil, int flag);
int flag);
/** Classify @p fil by its extension: 0 unknown, 1 known non-HTML, 2 known HTML. /** Classify @p fil by its extension: 0 unknown, 1 known non-HTML, 2 known HTML.
Consults the built-in table then user --assume rules. 0 for a NULL @p fil. Consults the built-in table then user --assume rules. 0 for a NULL @p fil.
@@ -633,11 +657,13 @@ HTSEXT_API void guess_httptype(httrackp * opt, char *s, const char *fil);
time), not a pointer. */ time), not a pointer. */
/** Concatenate @p a and @p b into @p catbuff (NULL or empty operands are /** Concatenate @p a and @p b into @p catbuff (NULL or empty operands are
* skipped). */ * skipped). */
HTSEXT_API char *concat(char *catbuff, size_t size, const char *a, const char *b); HTSEXT_API char *concat(char *catbuff, size_t size, const char *a,
const char *b);
/** Like concat(a, b) but convert '/' to the platform path separator (Windows). /** Like concat(a, b) but convert '/' to the platform path separator (Windows).
*/ */
HTSEXT_API char *fconcat(char *catbuff, size_t size, const char *a, const char *b); HTSEXT_API char *fconcat(char *catbuff, size_t size, const char *a,
const char *b);
/** Copy @p a into @p catbuff, converting '/' to the platform path separator /** Copy @p a into @p catbuff, converting '/' to the platform path separator
(Windows). */ (Windows). */
@@ -756,7 +782,8 @@ typedef struct utimbuf STRUCT_UTIMBUF;
/** Macro aimed to break at build-time if a size is not a sizeof() strictly /** Macro aimed to break at build-time if a size is not a sizeof() strictly
* greater than sizeof(char*). **/ * greater than sizeof(char*). **/
#undef COMPILE_TIME_CHECK_SIZE #undef COMPILE_TIME_CHECK_SIZE
#define COMPILE_TIME_CHECK_SIZE(A) (void) ((void (*)(char[A - sizeof(char*) - 1])) NULL) #define COMPILE_TIME_CHECK_SIZE(A) \
(void) ((void (*)(char[A - sizeof(char *) - 1])) NULL)
/** Macro aimed to break at compile-time if a size is not a sizeof() strictly /** Macro aimed to break at compile-time if a size is not a sizeof() strictly
* greater than sizeof(char*). **/ * greater than sizeof(char*). **/


@@ -4,28 +4,33 @@
# Initializes the htsserver GUI frontend and launch the default browser # Initializes the htsserver GUI frontend and launch the default browser
BROWSEREXE= BROWSEREXE=
SRCHBROWSEREXE="x-www-browser www-browser iceape mozilla firefox-developer-edition firefox icecat iceweasel abrowser firebird galeon konqueror midori opera google-chrome chrome chromium chromium-browser netscape firefox-developer-edition" SRCHBROWSEREXE=(x-www-browser www-browser iceape mozilla firefox-developer-edition firefox icecat iceweasel abrowser firebird galeon konqueror midori opera google-chrome chrome chromium chromium-browser netscape firefox-developer-edition)
# shellcheck disable=SC2153 # BROWSER is the standard freedesktop env var, not a typo
if test -n "${BROWSER}"; then if test -n "${BROWSER}"; then
# sensible-browser will f up if BROWSER is not set # sensible-browser will f up if BROWSER is not set
SRCHBROWSEREXE="xdg-open sensible-browser ${SRCHBROWSEREXE}" SRCHBROWSEREXE=(xdg-open sensible-browser "${SRCHBROWSEREXE[@]}")
fi fi
# Patch for Darwin/Mac by Ross Williams # Patch for Darwin/Mac by Ross Williams
if test "`uname -s`" == "Darwin"; then if test "$(uname -s)" == "Darwin"; then
# Darwin/Mac OS X uses a system 'open' command to find # Darwin/Mac OS X uses a system 'open' command to find
# the default browser. The -W flag causes it to wait for # the default browser. The -W flag causes it to wait for
# the browser to exit # the browser to exit
BROWSEREXE="/usr/bin/open -W" BROWSEREXE="/usr/bin/open -W"
fi fi
BINWD=`dirname "$0"` BINWD=$(dirname "$0")
SRCHPATH="$BINWD /usr/local/bin /usr/share/bin /usr/bin /usr/lib/httrack /usr/local/lib/httrack /usr/local/share/httrack /opt/local/bin /sw/bin ${HOME}/usr/bin ${HOME}/bin" SRCHPATH=("$BINWD" /usr/local/bin /usr/share/bin /usr/bin /usr/lib/httrack /usr/local/lib/httrack /usr/local/share/httrack /opt/local/bin /sw/bin "${HOME}/usr/bin" "${HOME}/bin")
SRCHPATH="$SRCHPATH "`echo $PATH | tr ":" " "` IFS=':' read -ra pathdirs <<<"$PATH"
SRCHDISTPATH="$BINWD/../share $BINWD/.. /usr/share /usr/local /usr /local /usr/local/share ${HOME}/usr ${HOME}/usr/share /opt/local/share /sw ${HOME}/usr/local ${HOME}/usr/share" for d in "${pathdirs[@]}"; do
# drop empty PATH fields, matching the old echo|tr word-split
test -n "$d" && SRCHPATH+=("$d")
done
SRCHDISTPATH=("$BINWD/../share" "$BINWD/.." /usr/share /usr/local /usr /local /usr/local/share "${HOME}/usr" "${HOME}/usr/share" /opt/local/share /sw "${HOME}/usr/local" "${HOME}/usr/share")
### ###
# And now some famous cuisine # And now some famous cuisine
function log { function log {
echo "$0($$): $@" >&2 echo "$0($$): $*" >&2
return 0 return 0
} }
@@ -42,35 +47,35 @@ log "Browser (or helper) exited"
# First ensure that we can launch the server # First ensure that we can launch the server
BINPATH= BINPATH=
for i in ${SRCHPATH}; do for i in "${SRCHPATH[@]}"; do
! test -n "${BINPATH}" && test -x ${i}/htsserver && BINPATH=${i} ! test -n "${BINPATH}" && test -x "${i}/htsserver" && BINPATH="${i}"
done done
for i in ${SRCHDISTPATH}; do for i in "${SRCHDISTPATH[@]}"; do
! test -n "${DISTPATH}" && test -f "${i}/httrack/lang.def" && DISTPATH="${i}/httrack" ! test -n "${DISTPATH}" && test -f "${i}/httrack/lang.def" && DISTPATH="${i}/httrack"
done done
test -n "${BINPATH}" || ! log "Could not find htsserver" || exit 1 test -n "${BINPATH}" || ! log "Could not find htsserver" || exit 1
test -n "${DISTPATH}" || ! log "Could not find httrack directory" || exit 1 test -n "${DISTPATH}" || ! log "Could not find httrack directory" || exit 1
test -f ${DISTPATH}/lang.def || ! log "Could not find ${DISTPATH}/lang.def" || exit 1 test -f "${DISTPATH}/lang.def" || ! log "Could not find ${DISTPATH}/lang.def" || exit 1
test -f ${DISTPATH}/lang.indexes || ! log "Could not find ${DISTPATH}/lang.indexes" || exit 1 test -f "${DISTPATH}/lang.indexes" || ! log "Could not find ${DISTPATH}/lang.indexes" || exit 1
test -d ${DISTPATH}/lang || ! log "Could not find ${DISTPATH}/lang" || exit 1 test -d "${DISTPATH}/lang" || ! log "Could not find ${DISTPATH}/lang" || exit 1
test -d ${DISTPATH}/html || ! log "Could not find ${DISTPATH}/html" || exit 1 test -d "${DISTPATH}/html" || ! log "Could not find ${DISTPATH}/html" || exit 1
# Locale # Locale
HTSLANG="${LC_MESSAGES}" HTSLANG="${LC_MESSAGES}"
! test -n "${HTSLANG}" && HTSLANG="${LC_ALL}" ! test -n "${HTSLANG}" && HTSLANG="${LC_ALL}"
! test -n "${HTSLANG}" && HTSLANG="${LANG}" ! test -n "${HTSLANG}" && HTSLANG="${LANG}"
HTSLANG="`echo $LANG | cut -f1 -d'.' | cut -f1 -d'_'`" HTSLANG="$(echo "$LANG" | cut -f1 -d'.' | cut -f1 -d'_')"
LANGN=`grep -E "^${HTSLANG}:" ${DISTPATH}/lang.indexes | cut -f2 -d':'` LANGN=$(grep -E "^${HTSLANG}:" "${DISTPATH}/lang.indexes" | cut -f2 -d':')
! test -n "${LANGN}" && LANGN=1 ! test -n "${LANGN}" && LANGN=1
# Find the browser # Find the browser
# note: not all systems have sensible-browser or www-browser alternative # note: not all systems have sensible-browser or www-browser alternative
# therefore, we have to find a bit more if sensible-browser could not be found # therefore, we have to find a bit more if sensible-browser could not be found
for i in ${SRCHBROWSEREXE}; do for i in "${SRCHBROWSEREXE[@]}"; do
for j in ${SRCHPATH}; do for j in "${SRCHPATH[@]}"; do
if test -x ${j}/${i}; then if test -x "${j}/${i}"; then
BROWSEREXE=${j}/${i} BROWSEREXE="${j}/${i}"
fi fi
test -n "$BROWSEREXE" && break test -n "$BROWSEREXE" && break
done done
@@ -81,7 +86,7 @@ test -n "$BROWSEREXE" || ! log "Could not find any suitable browser" || exit 1
# "browse" command # "browse" command
if test "$1" = "browse"; then if test "$1" = "browse"; then
if test -f "${HOME}/.httrack.ini"; then if test -f "${HOME}/.httrack.ini"; then
INDEXF=`cat ${HOME}/.httrack.ini | tr '\r' '\n' | grep -E "^path=" | cut -f2- -d'='` INDEXF=$(tr '\r' '\n' <"${HOME}/.httrack.ini" | grep -E "^path=" | cut -f2- -d'=')
if test -n "${INDEXF}" -a -d "${INDEXF}" -a -f "${INDEXF}/index.html"; then if test -n "${INDEXF}" -a -d "${INDEXF}" -a -f "${INDEXF}/index.html"; then
INDEXF="${INDEXF}/index.html" INDEXF="${INDEXF}/index.html"
else else
@@ -96,39 +101,43 @@ exit $?
fi fi
# Create a temporary filename # Create a temporary filename
TMPSRVFILE="$(mktemp ${TMPDIR:-/tmp}/.webhttrack.XXXXXXXX)" || ! log "Could not create the temporary file ${TMPSRVFILE}" || exit 1 TMPSRVFILE="$(mktemp "${TMPDIR:-/tmp}/.webhttrack.XXXXXXXX")" || ! log "Could not create the temporary file ${TMPSRVFILE}" || exit 1
# Launch htsserver binary and setup the server # Launch htsserver binary and setup the server
(${BINPATH}/htsserver "${DISTPATH}/" --ppid "$$" path "${HOME}/websites" lang "${LANGN}" $@; echo SRVURL=error) > ${TMPSRVFILE}& (
"${BINPATH}/htsserver" "${DISTPATH}/" --ppid "$$" path "${HOME}/websites" lang "${LANGN}" "$@"
echo SRVURL=error
) >"${TMPSRVFILE}" &
# Find the generated SRVURL # Find the generated SRVURL
SRVURL= SRVURL=
MAXCOUNT=60 MAXCOUNT=60
while ! test -n "$SRVURL"; do while ! test -n "$SRVURL"; do
MAXCOUNT=$[$MAXCOUNT - 1] MAXCOUNT=$((MAXCOUNT - 1))
test $MAXCOUNT -gt 0 || exit 1 test $MAXCOUNT -gt 0 || exit 1
test $MAXCOUNT -lt 50 && echo "waiting for server to reply.." test $MAXCOUNT -lt 50 && echo "waiting for server to reply.."
SRVURL=`grep -E URL= ${TMPSRVFILE} | cut -f2- -d=` SRVURL=$(grep -E URL= "${TMPSRVFILE}" | cut -f2- -d=)
test ! "$SRVURL" = "error" || ! log "Could not spawn htsserver" || exit 1 test ! "$SRVURL" = "error" || ! log "Could not spawn htsserver" || exit 1
test -n "$SRVURL" || sleep 1 test -n "$SRVURL" || sleep 1
done done
# Cleanup function # Cleanup function
# shellcheck disable=SC2120 # $1 is an optional "signal caught" marker; bare calls are intentional
function cleanup { function cleanup {
test -n "$1" && log "Nasty signal caught, cleaning up.." test -n "$1" && log "Nasty signal caught, cleaning up.."
# Do not kill if browser exited (chrome bug issue) ; server will die itself # Do not kill if browser exited (chrome bug issue) ; server will die itself
test -n "$1" && test -f ${TMPSRVFILE} && SRVPID=`grep -E PID= ${TMPSRVFILE} | cut -f2- -d=` test -n "$1" && test -f "${TMPSRVFILE}" && SRVPID=$(grep -E PID= "${TMPSRVFILE}" | cut -f2- -d=)
test -n "${SRVPID}" && kill -9 ${SRVPID} test -n "${SRVPID}" && kill -9 "${SRVPID}"
test -f ${TMPSRVFILE} && rm ${TMPSRVFILE} test -f "${TMPSRVFILE}" && rm "${TMPSRVFILE}"
test -n "$1" && log "..Done" test -n "$1" && log "..Done"
return 0 return 0
} }
# Cleanup in case of emergency # Cleanup in case of emergency
trap "cleanup now; exit" 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25 trap "cleanup now; exit" HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
# Got SRVURL, launch browser # Got SRVURL, launch browser
launch_browser "${BROWSEREXE}" "${SRVURL}" launch_browser "${BROWSEREXE}" "${SRVURL}"
# That's all, folks! # That's all, folks!
trap "" 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25 trap "" HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
cleanup cleanup
exit 0 exit 0
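
The PATH handling in this hunk (an `IFS=':' read -ra` split followed by a loop that drops empty fields) can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper name `split_path` and a hypothetical result array `SPLIT_DIRS`; neither is part of the script above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of webhttrack): split a colon-separated list
# into the global array SPLIT_DIRS, skipping empty fields so that a "::" in
# the input does not turn into an empty directory entry.
split_path() {
  local raw=$1 d
  local -a fields
  SPLIT_DIRS=()
  # read -a splits on ':' and keeps embedded spaces inside each field
  IFS=':' read -ra fields <<<"$raw"
  for d in "${fields[@]}"; do
    if [ -n "$d" ]; then
      SPLIT_DIRS+=("$d")
    fi
  done
}

# Sample input with an empty field and a directory containing a space
split_path "/usr/bin::/opt/my tools/bin:"
printf '%s\n' "${SPLIT_DIRS[@]}"
```

Unlike the old `echo $PATH | tr ":" " "` word-split, an array keeps a directory name containing a space as a single entry.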


@@ -154,4 +154,173 @@ grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
grep -q 'title="file://' "$saved2" || grep -q 'title="file://' "$saved2" ||
! echo "FAIL: a no-detect attribute (title) was wrongly rewritten" || exit 1 ! echo "FAIL: a no-detect attribute (title) was wrongly rewritten" || exit 1
# xmlns / xmlns:prefix decls must not be crawled (#191). Local file:// targets so a
# regression downloads them; each is the LAST attr (heuristic only scans a value before '>').
site3="$tmp/xmlns"
mkdir -p "$site3"
for f in ns og rdfs real; do gif "$site3/$f.gif"; done
cat >"$site3/index.html" <<EOF
<html xmlns="file://$site3/ns.gif"><body>
<svg xmlns:og="file://$site3/og.gif"></svg>
<div class="c" xmlns:rdfs="file://$site3/rdfs.gif"></div>
<a href="file://$site3/real.gif">real link</a>
</body></html>
EOF
out3="$tmp/xmlns-out"
crawl "$site3/index.html" "$out3"
# the real link is still captured
found "real.gif" "$out3"
# namespace-declaration targets must not be fetched (default + prefixed forms)
notfound "ns.gif" "$out3"
notfound "og.gif" "$out3"
notfound "rdfs.gif" "$out3"
# CSS @import (#94): every form's target is captured, crawling the .css directly.
# The "cond"/"sup"/"spc" cases carry a trailing media/supports/layer condition (or
# a space before ';'); they are the negative controls: without the parser fix the
# URL is dropped, so a regression fails these found() checks.
site4="$tmp/cssimport"
mkdir -p "$site4"
for f in nq dqu squ dqs sqs med cond sup lay spc; do printf 'body{}\n' >"$site4/$f.css"; done
cat >"$site4/main.css" <<'EOF'
@import url(nq.css);
@import url("dqu.css");
@import url('squ.css');
@import "dqs.css";
@import 'sqs.css';
@import url(med.css) screen and (min-width: 400px);
@import "cond.css" screen;
@import "sup.css" supports(display: flex);
@import url(lay.css) layer(base);
@import "spc.css" ;
EOF
out4="$tmp/cssimport-out"
crawl "$site4/main.css" "$out4"
for f in nq dqu squ dqs sqs med cond sup lay spc; do found "$f.css" "$out4"; done
# Over-capture guard: the trailing condition is not part of the URL, so it must
# survive the rewrite verbatim. A regression that grabs it would mangle these.
m4=$(find "$out4" -type f -path '*/file/*' -name main.css -print -quit)
test -n "$m4" || ! echo "FAIL: saved main.css not found" || exit 1
for cond in '@import "cond.css" screen;' 'supports(display: flex)' 'layer(base)'; do
grep -Fq "$cond" "$m4" ||
! echo "FAIL #94: '$cond' altered on rewrite (condition captured as URL?)" || exit 1
done
# Malformed input: an unterminated @import quote (truncated CSS) must not crash or
# capture a bogus link; a valid sibling import is still captured. Guards a heap
# overflow on the URL-end scan that aborts under ASan (CI sanitizer job).
site5="$tmp/cssimport-trunc"
mkdir -p "$site5"
printf 'body{}\n' >"$site5/good.css"
printf '@import "good.css";\n@import "trunc' >"$site5/main.css"
out5="$tmp/cssimport-trunc-out"
crawl "$site5/main.css" "$out5"
found "good.css" "$out5"
notfound "trunc" "$out5"
# Offset-0 underflow (#396): a token at the buffer start makes the detector's
# word-boundary guard read *(html-1) one byte early (aborts under ASan). The
# url() target is still captured; here it just must not underflow.
site6="$tmp/parse-off0"
mkdir -p "$site6"
printf 'body{}\n' >"$site6/off0.css"
printf 'url(off0.css)\n' >"$site6/main.css"
out6="$tmp/parse-off0-out"
crawl "$site6/main.css" "$out6"
found "off0.css" "$out6"
# XMLHttpRequest.open(method, url) (#218): the first argument is an HTTP method,
# not a URL. Without the fix "GET" is captured as a link and fetched (the offline
# fixture saves a bare file named GET; a live server mangles it to GET.html).
# window.open(url) detection must be unaffected.
site7="$tmp/xhropen"
mkdir -p "$site7"
gif "$site7/winopen.gif"
cat >"$site7/index.html" <<EOF
<html><body><script>
var x = new XMLHttpRequest();
x.open("GET", "ajax_info.txt");
var y = new XMLHttpRequest();
y.open("Post", "submit.cgi");
window.open("file://$site7/winopen.gif");
</script></body></html>
EOF
out7="$tmp/xhropen-out"
crawl "$site7/index.html" "$out7"
# negative control: without the fix a file named exactly GET is downloaded
notfound "GET" "$out7"
# methods are matched case-insensitively (XHR spec normalizes them): a mixed-case
# method is rejected too, so a file named Post must not appear either
notfound "Post" "$out7"
# regression guard: window.open(url) is still detected, so its absolute URL is
# rewritten to a local link. The rewrite only happens if the parser saw it, so
# these two assertions fail if .open detection broke (not a trivial --near save).
saved7=$(savedhtml "$out7")
test -n "$saved7" || ! echo "FAIL: saved xhr page not found" || exit 1
grep -Fq 'window.open("winopen.gif")' "$saved7" ||
! echo "FAIL #218: window.open(url) no longer detected/rewritten" || exit 1
! grep -Fq 'window.open("file://' "$saved7" ||
! echo "FAIL #218: window.open URL left absolute (not rewritten)" || exit 1
# Parens in an unquoted url(...) (#163): the source %28/%29 decode to literal
# '(' ')' in the saved name, but a literal ')' in the rewritten url() closes the
# token early, so they must stay encoded. Negative control: without the fix the
# %281%29 greps fail (parens are RFC2396 "mark" chars the escaper leaves alone).
site8="$tmp/cssparens"
mkdir -p "$site8"
for f in 'img (1).gif' 'a(b)c(1).gif' 'q (4).gif'; do gif "$site8/$f"; done
cat >"$site8/style.css" <<'EOF'
.a { background: url(img%20%281%29.gif); }
.b { background: url(a%28b%29c%281%29.gif); }
.c { background: url("q%20%284%29.gif"); }
EOF
out8="$tmp/cssparens-out"
crawl "$site8/style.css" "$out8"
found "img (1).gif" "$out8"
found "a(b)c(1).gif" "$out8"
found "q (4).gif" "$out8"
css8=$(find "$out8" -type f -path '*/file/*' -name style.css -print -quit)
test -n "$css8" || ! echo "FAIL: saved style.css not found" || exit 1
grep -Fq 'url(img%20%281%29.gif)' "$css8" ||
! echo "FAIL #163: parens in unquoted url() not percent-encoded on rewrite" || exit 1
grep -Fq 'url(a%28b%29c%281%29.gif)' "$css8" ||
! echo "FAIL #163: not every paren in a url() was percent-encoded" || exit 1
grep -Fq 'url("q%20%284%29.gif")' "$css8" ||
! echo "FAIL #163: quoted url() altered or parens left literal on rewrite" || exit 1
# The url() detector is not CSS-specific: <script> and inline style= get the
# same encoding, but ordinary href/src (ending_p is the quote, not ')') keep
# literal parens -- the attribute checks guard the gate against over-firing.
site9="$tmp/urlparens"
mkdir -p "$site9"
for f in 'js (1).gif' 'inl (2).gif' 'asrc (3).gif' 'ahref (4).gif'; do gif "$site9/$f"; done
cat >"$site9/index.html" <<EOF
<html><body>
<script>var bg = "url(js%20%281%29.gif)";</script>
<div style="background-image:url(inl%20%282%29.gif)"></div>
<img src="asrc%20%283%29.gif">
<a href="ahref%20%284%29.gif">link</a>
</body></html>
EOF
out9="$tmp/urlparens-out"
crawl "$site9/index.html" "$out9"
saved9=$(savedhtml "$out9")
test -n "$saved9" || ! echo "FAIL: saved urlparens page not found" || exit 1
# rewrite-only: the JS-string asset is not queued for download
grep -Fq 'url(js%20%281%29.gif)' "$saved9" ||
! echo "FAIL #163: parens in <script> url() not percent-encoded" || exit 1
found "inl (2).gif" "$out9"
grep -Fq 'url(inl%20%282%29.gif)' "$saved9" ||
! echo "FAIL #163: parens in inline style url() not percent-encoded" || exit 1
found "asrc (3).gif" "$out9"
found "ahref (4).gif" "$out9"
grep -Fq 'src="asrc%20(3).gif"' "$saved9" ||
! echo "FAIL #163: parens in a plain src attribute were wrongly encoded" || exit 1
grep -Fq 'href="ahref%20(4).gif"' "$saved9" ||
! echo "FAIL #163: parens in a plain href attribute were wrongly encoded" || exit 1
! grep -Eq '(src|href)="[^"]*%28' "$saved9" ||
! echo "FAIL #163: gate over-fired onto a non-url() attribute link" || exit 1
exit 0 exit 0
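
The assertions in this test rely on the chain `check || ! echo "FAIL..." || exit 1`. A minimal sketch of how that chain evaluates, assuming a hypothetical wrapper named `check_value` that runs the chain in a subshell so the `exit 1` can be observed as a return status:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, for illustration only. The chain reads:
#   1. if the test succeeds, the whole || list succeeds and nothing else runs;
#   2. otherwise echo prints the message, and '!' inverts its success to failure;
#   3. that failure lets the final 'exit 1' run.
check_value() {
  (
    test "$1" = "expected" ||
      ! echo "FAIL: got '$1', wanted 'expected'" >&2 || exit 1
  )
}

check_value "expected" && echo "first check passed"
check_value "other" || echo "second check failed as intended"
```

The `!` matters: a plain `echo ... || exit 1` would never reach `exit 1`, because `echo` normally succeeds.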

68
tests/01_engine-relative.test Executable file

@@ -0,0 +1,68 @@
#!/bin/bash
#
# lienrelatif (build relative path) + ident_url_relatif (resolve a link, collapse
# ./ and ../). Regression net for #137/#162; expected values hand-computed.
set -euo pipefail
# relative path from <curr>'s directory to <link>
rel() {
local got
got=$(httrack -O /dev/null -#l "$1" "$2")
test "$got" == "relative=$3" ||
{
echo "FAIL rel($1, $2): got '$got' want 'relative=$3'"
exit 1
}
}
# resolve <link> against origin <adr>/<fil> -> adr=.. fil=..
ident() {
local got
got=$(httrack -O /dev/null -#i "$1" "$2" "$3")
test "$got" == "$4" ||
{
echo "FAIL ident($1, $2, $3): got '$got' want '$4'"
exit 1
}
}
### lienrelatif
rel 'dir/page.html' 'dir/index.html' 'page.html'
rel 'dir/page.html' 'dir/page.html' 'page.html' # self-link
rel 'a.html' 'dir/index.html' '../a.html'
rel 'x.html' 'a/b/c/index.html' '../../../x.html'
rel 'h/a/x.jpg' 'h/a/sub/page.html' '../x.jpg'
rel 'a/b/c/x.html' 'index.html' 'a/b/c/x.html'
rel 'h/sub/x.jpg' 'h/page.html' 'sub/x.jpg'
rel 'h/dir2/x.jpg' 'h/dir1/page.html' '../dir2/x.jpg' # sibling dir
rel 'h/bc/x.jpg' 'h/b/page.html' '../bc/x.jpg' # b/bc prefix trap
rel 'h/b/x.jpg' 'h/bc/page.html' '../b/x.jpg'
rel 'h2/img/x.jpg' 'h1/p/page.html' '../../h2/img/x.jpg' # cross-host
rel 'img.cdn/photo.jpg' 'www.site/articles/2020/post.html' '../../../img.cdn/photo.jpg'
rel 'h/a/' 'h/a/sub/page.html' '../' # link is ancestor dir
rel 'x.html' 'page.html' 'x.html'
rel 'dir/page.html?x=1' 'dir/index.html?y=2' 'page.html' # ? stripped
### ident_url_relatif
ident 'img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/dir/img.gif'
ident 'sub/img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/dir/sub/img.gif'
ident '/img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/img.gif'
# embedded ../ collapses (#137)
ident '../img.gif' 'www.foo.com' '/dir/sub/page.html' 'adr=www.foo.com fil=/dir/img.gif'
ident 'sub/../logo.png' 'www.foo.com' '/articles/2020/post.html' 'adr=www.foo.com fil=/articles/2020/logo.png'
ident '../../pix/sub/../logo.png' 'www.foo.com' '/articles/2020/post.html' 'adr=www.foo.com fil=/pix/logo.png'
ident '../../../../x.gif' 'www.foo.com' '/a/b/page.html' 'adr=www.foo.com fil=/x.gif' # above-root clamp
ident '?page=2' 'www.foo.com' '/dir/index.html?old=1' 'adr=www.foo.com fil=/dir/index.html?page=2'
ident 'http://other.com/a/b/../c/index.html' 'www.foo.com' '/p.html' 'adr=other.com fil=/a/c/index.html'
# file:// collapses ../ like the other schemes; traversal contained, // authority kept
ident 'file:///var/data/pix/sub/../logo.png' 'www.foo.com' '/p.html' 'adr=file:// fil=/var/data/pix/logo.png'
ident 'file:///a/b/c/../../d/e.gif' 'www.foo.com' '/p.html' 'adr=file:// fil=/a/d/e.gif'
ident 'file:///a/../../b' 'www.foo.com' '/p.html' 'adr=file:// fil=/b'
ident 'file://srv/share/../x' 'www.foo.com' '/p.html' 'adr=file:// fil=//srv/x'
ident 'mailto:foo@bar.com' 'www.foo.com' '/p.html' 'error=-1' # unsupported scheme
ident 'javascript:void(0)' 'www.foo.com' '/p.html' 'error=-1'
echo "OK"


@@ -26,3 +26,17 @@ simp './a/../../b' 'b'
# empty segments ('//') are not dot-segments and are preserved, per RFC 3986 # empty segments ('//') are not dot-segments and are preserved, per RFC 3986
simp 'a//b' 'a//b' simp 'a//b' 'a//b'
simp 'a//b/../c' 'a//c'
# absolute paths keep the leading '/'; above-root '..' is clamped to it
simp '/a/../b' '/b'
simp '/a/../../b' '/b'
simp '/../x' '/x'
# collapses to nothing -> './' (relative) or '/' (absolute)
simp '..' './'
simp 'a/..' './'
simp '/' '/'
simp 'a/b/..' 'a/' # trailing bare '..'
simp 'a/../b?x=../y' 'b?x=../y' # '?' freezes simplification
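
The expectations listed in this hunk can be reproduced with a small stack-based pass over the path segments. This is an illustrative sketch only: `simplify_path` is a hypothetical bash function written to agree with the cases shown here, and it is not the engine's C routine, which may differ on inputs these tests do not cover.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of dot-segment removal, consistent with the cases above.
simplify_path() {
  local in=$1 path query lead="" rest seg
  local -a stack=()
  path=${in%%\?*}     # everything before the first '?'
  query=${in#"$path"} # the '?...' tail is copied through untouched
  if [[ $path == /* ]]; then
    lead=/
    path=${path#/}
  fi
  rest=$path
  while :; do
    seg=${rest%%/*} # next segment (may be empty, e.g. inside '//')
    case $seg in
      .) ;; # current directory: drop
      ..)   # parent: pop one segment, clamp at the root
        if ((${#stack[@]} > 0)); then
          unset "stack[$((${#stack[@]} - 1))]"
        fi
        ;;
      *) stack+=("$seg") ;; # ordinary or empty segment: keep
    esac
    [[ $rest == */* ]] || break
    rest=${rest#*/}
  done
  # a trailing '.' or '..' names a directory, so keep a trailing '/'
  if [[ $seg == . || $seg == .. ]]; then
    stack+=("")
  fi
  local IFS=/
  local out="${stack[*]}" # join segments with '/'
  if [[ -z $lead && -z $out ]]; then
    out=./
  fi
  printf '%s%s%s\n' "$lead" "$out" "$query"
}

simplify_path 'a//b/../c'
simplify_path '/a/../../b'
simplify_path 'a/../b?x=../y'
```

Empty segments are pushed like any other segment, which is why `a//b` survives unchanged, and the query string is split off first, which is why `../` after a `?` is left alone.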


@@ -21,9 +21,15 @@ test "$out" == "strsafe: OK" || exit 1
# the bounded macro aborts (non-zero exit), so don't let set -e trip on it # the bounded macro aborts (non-zero exit), so don't let set -e trip on it
err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1) || true err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1) || true
case "$err" in case "$err" in
*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;; *"strsafe: NOT aborted"*)
echo "over-capacity write was NOT caught" >&2
exit 1
;;
*"overflow while copying"*) ;; *"overflow while copying"*) ;;
*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;; *)
echo "expected htssafe overflow abort, got: $err" >&2
exit 1
;;
esac esac
# Same guarantee for the htsbuff builder. The source is exactly the buffer # Same guarantee for the htsbuff builder. The source is exactly the buffer
@@ -32,7 +38,13 @@ esac
# aborted"). Match the specific htsbuff abort message, not just any assert. # aborted"). Match the specific htsbuff abort message, not just any assert.
err=$(httrack -#8 overflow-buff "abcd" 2>&1) || true err=$(httrack -#8 overflow-buff "abcd" 2>&1) || true
case "$err" in case "$err" in
*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;; *"strsafe: NOT aborted"*)
echo "htsbuff over-capacity write was NOT caught" >&2
exit 1
;;
*"htsbuff append overflow"*) ;; *"htsbuff append overflow"*) ;;
*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;; *)
echo "expected htsbuff overflow abort, got: $err" >&2
exit 1
;;
esac esac
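
The pattern used here (capture the output of a command that is expected to abort, then classify it with `case`) can be sketched without httrack. In this sketch `failing_tool` is a hypothetical stand-in for the aborting command, and the exit status 134 is only an assumed example value.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for a command that aborts with a diagnostic on stderr
failing_tool() {
  echo "overflow while copying" >&2
  return 134
}

# '|| true' keeps 'set -e' from ending the script on the expected failure;
# '2>&1' folds stderr into the captured text so the message can be matched
err=$(failing_tool 2>&1) || true

case "$err" in
  *"NOT aborted"*) verdict="missed" ;;
  *"overflow while copying"*) verdict="caught" ;;
  *) verdict="unexpected" ;;
esac
echo "$verdict"
```

Matching on the specific message, rather than on the non-zero exit status alone, distinguishes the intended abort from an unrelated crash.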


@@ -35,6 +35,7 @@ TESTS = \
01_engine-mime.test \ 01_engine-mime.test \
01_engine-parse.test \ 01_engine-parse.test \
01_engine-rcfile.test \ 01_engine-rcfile.test \
01_engine-relative.test \
01_engine-simplify.test \ 01_engine-simplify.test \
01_engine-strsafe.test \ 01_engine-strsafe.test \
02_manpage-regen.test \ 02_manpage-regen.test \


@@ -18,7 +18,7 @@ function debug {
} }
function info { function info {
printf "[$*] ..\t" >&2 printf '[%s] ..\t' "$*" >&2
} }
function result { function result {
@@ -66,31 +66,30 @@ function start-crawl {
--debug) --debug)
verbose=1 verbose=1
;; ;;
--no-purge|--summary|--print-files) --no-purge | --summary | --print-files) ;;
;;
--errors | --files | --found | --not-found | --directory) --errors | --files | --found | --not-found | --directory)
pos=$[${pos}+1] pos=$((pos + 1))
test "$#" -ge "$pos" || warning "missing argument" || return 1 test "$#" -ge "$pos" || warning "missing argument" || return 1
;; ;;
httrack) httrack)
pos=$[${pos}+1] pos=$((pos + 1))
break; break
;; ;;
*) *)
warning "unrecognized option ${!pos}" warning "unrecognized option ${!pos}"
return 1 return 1
;; ;;
esac esac
pos=$[${pos}+1] pos=$((pos + 1))
done done
debug "remaining args: ${@:${pos}}" debug "remaining args: ${*:pos}"
# ut/ won't exceed 2 minutes # ut/ won't exceed 2 minutes
moreargs="--quiet --max-time=120 --timeout=30 --connection-per-second=5" moreargs=(--quiet --max-time=120 --timeout=30 --connection-per-second=5)
# proxy environment ? # proxy environment ?
if test -n "$http_proxy"; then if test -n "${http_proxy:-}"; then
moreargs="$moreargs --proxy $http_proxy" moreargs+=(--proxy "$http_proxy")
fi fi
test -n "$tmpdir" || ! warning "no tmpdir" || return 1 test -n "$tmpdir" || ! warning "no tmpdir" || return 1
@@ -104,9 +103,9 @@ function start-crawl {
# start crawl # start crawl
log="${tmp}/log" log="${tmp}/log"
debug starting httrack -O "${tmp}" ${moreargs} ${@:${pos}} debug starting httrack -O "${tmp}" "${moreargs[@]}" "${@:pos}"
info "running httrack ${@:${pos}}" info "running httrack ${*:pos}"
httrack -O "${tmp}" --user-agent="httrack $ver ut ($(uname -omrs))" ${moreargs} ${@:${pos}} >"${log}" 2>&1 & httrack -O "${tmp}" --user-agent="httrack $ver ut ($(uname -omrs))" "${moreargs[@]}" "${@:pos}" >"${log}" 2>&1 &
crawlpid="$!" crawlpid="$!"
debug "started cralwer on pid $crawlpid" debug "started cralwer on pid $crawlpid"
wait "$crawlpid" wait "$crawlpid"
@@ -164,12 +163,12 @@ function start-crawl {
;; ;;
--files) --files)
shift shift
nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt" \ nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt" |
| sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g') sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g')
assert_equals "checking files" "$1" "$nFiles" assert_equals "checking files" "$1" "$nFiles"
;; ;;
httrack) httrack)
break; break
;; ;;
esac esac
shift shift
@@ -195,7 +194,7 @@ tmpdir=
crawlpid= crawlpid=
nopurge= nopurge=
verbose= verbose=
trap "cleanup" 0 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25 trap cleanup EXIT HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
# working directory # working directory
tmpdir="${tmptopdir}/httrack_ut.$$" tmpdir="${tmptopdir}/httrack_ut.$$"
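
The switch from a space-joined `moreargs` string to an array, together with the `${http_proxy:-}` default, can be shown on its own. A minimal sketch, assuming a hypothetical function `build_args`, a hypothetical array `CRAWL_ARGS`, and a made-up proxy address:

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: collect the extra crawl options in an array so each
# option stays one word when expanded as "${CRAWL_ARGS[@]}".
build_args() {
  CRAWL_ARGS=(--quiet --max-time=120)
  # ':-' substitutes an empty string when http_proxy is unset, so the test
  # does not trip 'set -u'
  if [ -n "${http_proxy:-}" ]; then
    CRAWL_ARGS+=(--proxy "$http_proxy")
  fi
}

http_proxy="http://proxy.example:3128" # made-up address for the demo
build_args
printf '%s\n' "${CRAWL_ARGS[@]}"
```

With the array form, a proxy value containing spaces or glob characters is passed through as a single argument instead of being re-split by the shell.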


@@ -3,11 +3,11 @@
error=0 error=0
for i in *.test; do for i in *.test; do
if bash $i ; then if bash "$i"; then
echo "$i: passed" >&2 echo "$i: passed" >&2
else else
echo "$i: ERROR" >&2 echo "$i: ERROR" >&2
error=$[${error}+1] error=$((error + 1))
fi fi
done done
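
The failure counter above moves from the obsolete `$[...]` form to `$((...))`. A minimal sketch of the same loop shape, assuming a hypothetical function `run_checks` that takes command names instead of `*.test` files:

```bash
#!/usr/bin/env bash
# Hypothetical driver: run each named command, report it on stderr, and print
# the number of failures on stdout.
run_checks() {
  local failed=0 c
  for c in "$@"; do
    if "$c"; then
      echo "$c: passed" >&2
    else
      echo "$c: ERROR" >&2
      # inside $(( )) variable names need no '$' prefix
      failed=$((failed + 1))
    fi
  done
  echo "$failed"
}

# 'true' and 'false' stand in for passing and failing test scripts
run_checks true false true false
```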