Compare commits

70 Commits

Author SHA1 Message Date
Xavier Roche
a52a2b146c Add an offline update/cache regression test
Every crawl test runs httrack exactly once (crawl-test.sh), so the cache read /
update path (cache_readex) -- recently touched by the buffer-bounding work -- had
zero regression coverage: the cache was written but never read back.

Add tests/02_update-cache.test, a self-contained file:// two-pass test (no
network, always runs): mirror a local site, re-mirror it unchanged (the cache-
read pass must complete with no errors -- guards a crash/abort in cache_readex),
then change a source file and re-mirror (the update must pick up the new content
-- guards the update decision that reads the cached metadata).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 16:29:45 +02:00
Xavier Roche
226a38d3d0 Merge pull request #340 from xroche/cleanup/htscache-bounds
Bound htscache.c cache-field and save-name copies
2026-06-14 15:58:04 +02:00
Xavier Roche
1e463f65a5 Bound htscache.c cache-field and save-name copies
ZIP_READFIELD_STRING (the cached ZIP-header field reader) copied
attacker-influenced cache-file values into fixed htsblk fields with an unchecked
strcpybuff -- benign for the char[] fields, but r.location is a char* (degrades
to raw strcpy). Thread the destination size into the macro: sizeof(field) for
the array fields, HTS_URLMAXSIZE*2 for r.location (it points into a buffer of
that size, in both the caller-supplied and the location_default case).

Also bound cache_readex's return_save copy (its one non-NULL caller passes a
HTS_URLMAXSIZE*2 buffer), the exact-sized malloc copy in cache_rstr's default
path (strlen(defaultdata)+1), and replace the two strcpybuff(r.location, "")
clears with a direct r.location[0] = '\0'.

htscache.c pointer-destination warnings 6 -> 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 15:43:04 +02:00
Xavier Roche
09ed9968cd Merge pull request #339 from xroche/cleanup/htsbauth-bounds
Bound htsbauth cookie/auth buffer writes
2026-06-14 15:32:37 +02:00
Xavier Roche
ad6915e3cc Bound htsbauth cookie/auth buffer writes
cookie_get(), bauth_prefix(), cookie_insert() and cookie_delete() all wrote into
caller-provided char* buffers via unchecked strcpybuff/strcatbuff/strncatbuff
(the pointer-destination case). Bound them:

- cookie_get: write the extracted field with htsbuff over the buffer's 8192-byte
  contract (all callers use char[8192]).
- bauth_prefix: copy host+path with strlcpybuff/strlcatbuff bounded to the
  caller's HTS_URLMAXSIZE*2 buffer.
- cookie_insert/cookie_delete: thread the destination capacity (the cookie
  store's max_len minus the cursor offset) and use strlcpybuff/strlcatbuff;
  update cookie_add/cookie_del to pass it.

Add cookie_get field-extraction asserts to basic_selftests (run via -#7) rather
than a new -# digit. Translated the touched French comments.

htsbauth.c pointer-destination warnings 9 -> 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 15:29:33 +02:00
Xavier Roche
4a5580dec0 Merge pull request #338 from xroche/cleanup/htswizard-bounds
Build wizard auto-filter rules with htsbuff (bounded)
2026-06-14 14:37:56 +02:00
Xavier Roche
f1d35e7691 Build wizard auto-filter rules with htsbuff (bounded)
hts_acceptlink_()'s auto-generated allow/deny rules built _FILTERS[0] -- a
filter slot of HTS_URLMAXSIZE*2 bytes -- via unchecked strcpybuff/strcatbuff/
strncatbuff on the char* slot, and HT_INSERT_FILTERS0 shifted slots with an
unchecked strcpybuff. Convert each rule builder to an htsbuff over the slot
(new local HTS_FILTER_SLOT_SIZE, matching the stride allocated by
filters_init()), and bound the slot-shift copy with strlcpybuff.

Behavior preserved: old vs new produce byte-identical mirrors across four crawl
configurations on a local multi-directory site (the auto-rules fire for primary
links on normal crawls). Touched French comments translated.

htswizard.c pointer-destination warnings 30 -> 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 14:36:21 +02:00
Xavier Roche
6d7db83726 Merge pull request #336 from xroche/cleanup/htsalias-bounds
Bound optalias_check's output buffers (fix S1 overflow)
2026-06-14 13:50:38 +02:00
Xavier Roche
335c2c4b2a Merge pull request #337 from xroche/docs/governance
Add contributor governance: CONTRIBUTING, COC, SECURITY, DCO
2026-06-14 13:47:44 +02:00
Xavier Roche
62be177e35 Add obfuscated personal email as alternate security contact
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 13:47:15 +02:00
Xavier Roche
edd52bf3be Bound optalias_check's output buffers (thread their sizes)
optalias_check() wrote into caller-provided char* buffers with unchecked ops:
the param0 case did strcpybuff/strcatbuff of command+param into return_argv[0],
which can exceed the buffer, and the syntax-error paths sprintf()'d an option
name into return_error -- which is only 256 bytes in the config-file caller, so
a long option overflows it. Both are the overflow the audit flagged.

Thread return_argv_size and return_error_size through the (internal,
non-exported) signature; copy with strlcpybuff/strlcatbuff and format with
snprintf, so an over-long value aborts/truncates instead of overrunning. Update
both callers to pass their real sizes.

Leaves the shared cmdl_ins macro (the cmdl_* family wants its block size
threaded too -- a separate cleanup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 13:47:12 +02:00
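As a rough illustration of the "thread the size, then format with snprintf" pattern described above, here is a minimal sketch. `demo_format_error` and its message text are hypothetical, not HTTrack's actual signature or wording; the point is only that the callee receives the caller's real capacity and can report truncation instead of overrunning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the real optalias_check): formats an error into
   a caller-provided buffer whose capacity is passed explicitly.
   Returns 1 if the whole message fit, 0 if it was truncated. */
int demo_format_error(char *err, size_t err_size, const char *option) {
  /* snprintf never writes more than err_size bytes, terminator included. */
  int n = snprintf(err, err_size, "option %s needs a parameter", option);
  return n >= 0 && (size_t) n < err_size;
}
```

With a 256-byte buffer and an over-long option name, the call returns 0 and leaves a truncated, still NUL-terminated message in place of a stack overwrite.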
Xavier Roche
452a9f6c67 Add contributor governance: CONTRIBUTING, COC, SECURITY, DCO
httrack had no community-health files. Add a short CONTRIBUTING (PR/style
basics, security-sensitivity, an outcome-only AI-assistance policy), the
Contributor Covenant 2.1 as CODE_OF_CONDUCT, and a SECURITY policy with a
verified-reproduction bar for AI-assisted reports.

Require a Signed-off-by (DCO) on every commit and enforce it in CI via a new
pull_request-only job.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
2026-06-14 13:41:19 +02:00
Xavier Roche
9eb2a344a9 Merge pull request #335 from xroche/cleanup/infostatuscode-const
Return HTTP status reason phrases via a const-returning switch
2026-06-14 13:18:16 +02:00
Xavier Roche
348a7d8cb2 Return HTTP status reason phrases via a const-returning switch
infostatuscode() was a ~60-case switch, each arm strcpybuff()-ing a literal into
the caller's char* msg: 42 unchecked pointer-destination copies of static data.
Keep the same O(1) switch dispatch but have it return the phrase instead of
copying -- new public infostatuscode_const(int) -> const char* (or NULL) -- and
do the copy in a thin wrapper.

infostatuscode() preserves exact behavior: a known code overwrites msg; an
unknown code keeps any caller-provided message, else writes "Unknown error".
The single remaining copy uses strlcpybuff with the documented 64-byte minimum
(longest phrase is 31; all callers pass >= 80).

Drops 42 pointer-destination warnings (htslib.c 56 -> 14; tree 179 -> 137).
No dispatch regression: it stays a switch (jump table), no allocation, no
per-call scan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 13:14:23 +02:00
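The split between a const-returning switch and a thin copying wrapper might look roughly like the sketch below. The `demo_` names and the four-entry table are assumptions for illustration; only the behavior (known code overwrites, unknown code keeps a non-empty message, otherwise "Unknown error") follows the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for infostatuscode_const(): returns static data,
   or NULL for a code it does not know. No copy, no allocation. */
const char *demo_status_reason(int code) {
  switch (code) {
  case 200: return "OK";
  case 301: return "Moved Permanently";
  case 404: return "Not Found";
  case 500: return "Internal Server Error";
  default: return NULL;
  }
}

/* Hypothetical wrapper: the single remaining copy, bounded by cap (cap > 0).
   A known code overwrites msg; an unknown code keeps a non-empty msg. */
void demo_status_message(int code, char *msg, size_t cap) {
  const char *reason = demo_status_reason(code);
  if (reason == NULL) {
    if (msg[0] != '\0')
      return; /* keep the caller-provided message */
    reason = "Unknown error";
  }
  snprintf(msg, cap, "%s", reason);
}
```

Callers that only need to read the phrase can use the const-returning function directly and skip the copy altogether.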
Xavier Roche
5f81741ac5 Merge pull request #332 from xroche/cleanup/url_savename-htsbuff
Convert the url_savename template renderer to htsbuff
2026-06-14 13:01:32 +02:00
Xavier Roche
0cf14c4e88 Convert the url_savename template renderer to htsbuff
The savename_type == -1 userdef renderer walked afs->save with a raw char*
cursor, doing "b += strlen(b)" after each write, and strcpybuff(b, ...) on that
char* was unchecked (the pointer-destination case). That manual pointer math is
where the function's off-by-one / strlen-based hazards lived.

Convert the cursor to an htsbuff over afs->save (capacity sizeof = the full
HTS_URLMAXSIZE*2 buffer): every append is now bounds-checked and the pointer
math is gone. The loop's truncation guard becomes "sb.len < HTS_URLMAXSIZE",
preserving the existing cap-at-1024 behavior; the 2x buffer means a write only
aborts where it would previously have overrun. Add htsbuff_catc for the
single-character appends ('%', '.', literal copy).

Removes 35 pointer-destination warnings (htsname.c 51 -> 9; the renderer is now
warning-free). Behavior verified identical: the pre-change and new binaries
produce byte-identical output across 14 -N templates (%n %N %t %p %h %H %M %q %r
%% %[param], the short %s variants, and literals) crawling a local site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 12:59:29 +02:00
Xavier Roche
29a07ff487 Merge pull request #334 from xroche/cleanup/git-format-hook
Add an opt-in pre-commit hook that auto-formats changed C lines
2026-06-14 12:58:42 +02:00
Xavier Roche
f987083f14 Add an opt-in pre-commit hook that auto-formats changed C lines
Enable with: git config core.hooksPath .githooks

The hook runs git-clang-format (clang-format 19, repo .clang-format) on the
staged C lines only and re-stages the result, so commits stay
clang-format-clean and the CI format check passes without a round-trip. It never
reformats the whole tree, only the lines a commit changes.

Safe by construction: if clang-format 19 is absent it skips (CI still enforces);
and if a file has both staged and unstaged changes it does not auto-mutate
(which would commit the unstaged part), it reports and asks the author to
stage/stash. HTTRACK_NO_AUTOFORMAT=1 skips it for one commit. README covers the
noexec-working-tree case (point core.hooksPath at an exec-fs copy).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 12:55:17 +02:00
Xavier Roche
eb565f0bd8 Merge pull request #333 from xroche/cleanup/clang-format-setup
Add a .clang-format and a changed-lines CI format check
2026-06-14 12:38:20 +02:00
Xavier Roche
71398d510e Add a .clang-format and a changed-lines CI format check
The engine predates clang-format (it was shaped by an old Visual Studio
formatter) and does not round-trip through it: a whole-tree reformat is ~25k
lines of churn, so we never do one. Instead we format only the lines a change
touches, via git-clang-format, and enforce that in CI diff-scoped.

.clang-format is reverse-engineered from src/*.c (2-space, no tabs, 80 cols,
char *x pointers, attached braces, un-indented case labels, space after C-style
casts). That is mostly LLVM defaults; the deliberate deviations are
SpaceAfterCStyleCast (the dominant "(int) x" form) and SortIncludes: false
(C include order can be significant, so never reorder).

The CI "format" job pins clang-format-19 from apt.llvm.org's noble channel
(ubuntu-24.04's native is 18) to match local dev, and fails only if a PR's
changed C lines are not clang-format-clean. Existing untouched code is left
alone.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 12:26:49 +02:00
Xavier Roche
75fc040f06 Merge pull request #331 from xroche/cleanup/htsbuff-builder
Add htsbuff: a bounded string builder over a fixed buffer
2026-06-14 10:40:23 +02:00
Xavier Roche
c4ef18f5a5 Add htsbuff: a bounded string builder over a fixed buffer
Many pointer-destination buff() sites are cursors walking a buffer of known
capacity, with a manual "p += strlen(p)" after each write (the url_savename
renderer does this ~40 times). That hand-rolled pointer math is where several
of the off-by-one hazards live.

htsbuff captures the pattern: a non-owning builder (buf/cap/len) built from an
in-scope array (htsbuff_array, capacity via sizeof) or a pointer of known size
(htsbuff_ptr). htsbuff_cat/catn/cpy bound every write against the real capacity
and abort on overflow, same contract as the *_safe_ helpers, so the pointer
math goes away.

Extend the -#8 self-test and tests/01_engine-strsafe.test with builder
correctness (append, truncating append, reset, length) and an overflow-abort
case. No call sites are converted yet; that follows per file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 10:38:22 +02:00
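A minimal sketch of what such a non-owning builder could look like follows. The `demo_buff` type and function names are hypothetical and the real htsbuff API may differ; the sketch only shows the buf/cap/len idea and the abort-on-overflow contract the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical builder: does not own buf, only tracks capacity and length.
   Invariant: len < cap and buf[len] == '\0'. */
typedef struct {
  char *buf;
  size_t cap;
  size_t len;
} demo_buff;

demo_buff demo_buff_init(char *buf, size_t cap) {
  demo_buff b = {buf, cap, 0};
  if (cap == 0)
    abort(); /* no room even for the terminator */
  buf[0] = '\0';
  return b;
}

/* Append a string; abort rather than overflow. */
void demo_buff_cat(demo_buff *b, const char *s) {
  size_t n = strlen(s);
  if (n >= b->cap - b->len)
    abort();
  memcpy(b->buf + b->len, s, n + 1); /* copies the terminator too */
  b->len += n;
}

/* Append one character; abort rather than overflow. */
void demo_buff_catc(demo_buff *b, char c) {
  if (b->cap - b->len < 2)
    abort();
  b->buf[b->len++] = c;
  b->buf[b->len] = '\0';
}
```

Because the length is carried in the struct, the caller never recomputes it with `p += strlen(p)`, which is the pointer math the commit is trying to retire.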
Xavier Roche
d76dad47f7 Merge pull request #330 from xroche/cleanup/htssafe-pointer-diagnostics
Flag unchecked pointer-destination uses of the buff() string macros
2026-06-14 08:49:26 +02:00
Xavier Roche
9c6ff54040 Bound catch_url() header buffer to its 32Kb contract
First consumer of the new buff() pointer-destination diagnostic. catch_url()
appended response headers into the caller's 'data' buffer with strcatbuff on
a char* destination, which is unchecked: a long header stream could overrun
the 32Kb buffer.

Make the capacity contract explicit (CATCH_URL_DATA_SIZE in htscatchurl.h,
used by the caller too) and append with strlcatbuff, which enforces the bound
and aborts rather than overflowing. htscatchurl.c now compiles warning-free
under the diagnostic.

The remaining raw sprintf/sscanf into the same buffer are separate items for
a later pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 08:46:03 +02:00
Xavier Roche
4a057514b9 Warn on unchecked pointer-destination uses of the buff() macros
strcpybuff/strcatbuff/strncatbuff only bounds-check when the destination
is a sized char[] array. For a bare char* the capacity is unknown, so the
macro silently falls back to plain strcpy/strcat/strncat while still
looking like a checked call.

On GCC/Clang, route the pointer case through __builtin_choose_expr() to a
stub carrying the 'warning' function attribute, so a compile-time warning
fires only at pointer-destination sites and points at the explicit-size
replacement (strlcpybuff/strlcatbuff). Array sites keep using the bounded
_safe_ helpers and stay quiet. The change is diagnostic only: no runtime
or ABI change, and other compilers keep the previous behavior.

Add a runtime self-test for the bounded ops behind a new -#8 debug mode,
plus tests/01_engine-strsafe.test covering both correct copies and the
abort-on-overflow guarantee.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 08:40:10 +02:00
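The array-versus-pointer dispatch can be sketched as below for GCC and recent Clang. All `demo_`/`DEMO_` names are hypothetical and this is not HTTrack's actual macro; it assumes the `__typeof__`, `__builtin_types_compatible_p` and `__builtin_choose_expr` extensions and the `warning` function attribute are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* 1 when x is a true array, 0 when it is a pointer (GCC/Clang extension):
   an array type is never compatible with the pointer to its first element. */
#define DEMO_IS_ARRAY(x) \
  (!__builtin_types_compatible_p(__typeof__(x), __typeof__(&(x)[0])))

/* Bounded copy for sized destinations: aborts instead of overflowing. */
char *demo_copy_bounded(char *dst, size_t cap, const char *src) {
  size_t len = strlen(src);
  if (len >= cap)
    abort();
  memcpy(dst, src, len + 1);
  return dst;
}

/* Unchecked fallback; any call that survives to code generation is flagged
   at compile time by the 'warning' attribute. */
__attribute__((warning("unchecked copy into a pointer; pass its size")))
char *demo_copy_unchecked(char *dst, const char *src);

char *demo_copy_unchecked(char *dst, const char *src) {
  return strcpy(dst, src);
}

/* Only the selected branch is compiled, so array sites stay quiet. */
#define demo_strcpybuff(dst, src)                            \
  __builtin_choose_expr(DEMO_IS_ARRAY(dst),                  \
                        demo_copy_bounded((dst), sizeof(dst), (src)), \
                        demo_copy_unchecked((dst), (src)))
```

Calling `demo_strcpybuff` on a `char[16]` compiles silently and is bounds-checked; calling it on a `char *` still compiles, but emits the warning at that call site.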
Xavier Roche
055e17b057 Merge pull request #328 from xroche/cli/header-ua-length-152
Raise the user-agent and custom-header length limits
2026-06-14 01:43:31 +02:00
Xavier Roche
d7bb97d697 Merge pull request #329 from xroche/parser/lock-background-image-237
Lock CSS background-image url() rewriting in the parser test
2026-06-14 01:37:51 +02:00
Xavier Roche
d741188980 Raise the user-agent and custom-header length limits
The -F user-agent value was rejected past 126 bytes and the -%X header line
past 256. Both are stored in dynamically grown String buffers, so the caps were
arbitrary. Drop them; every argument is still bounded by the general
per-argument check in htscoremain.c (HTS_CDLMAXSIZE), which lifts the usable
limit to just under 1 KB.

optalias_check copied a long-form option value (--user-agent, --headers, ...)
into a fixed 1000-byte scratch buffer, smaller than that general cap, so a value
of 1000..1023 bytes aborted the process through the guarded-copy overflow check.
Size command and param to HTS_CDLMAXSIZE so the long form matches the cap; an
over-cap value is now refused with the normal "argument too long" message
instead of crashing. Grow the request-head buffer to 16384 for the larger
aggregate header set.

closes #152
2026-06-14 01:32:07 +02:00
Xavier Roche
ca810ef7e3 Lock CSS background-image url() rewriting in the parser test
background-image is already captured and rewritten through the style/CSS
url() path, in both an external <style> block and an inline style attribute,
with the URL unquoted, double-quoted or single-quoted. Extend the offline
parser test to cover all of these so the behavior stays locked.

closes #237
2026-06-14 01:07:42 +02:00
Xavier Roche
1bf90ce294 Merge pull request #326 from xroche/parser/srcset-candidates
Capture every srcset candidate URL on <img> and <source>
2026-06-14 00:42:48 +02:00
Xavier Roche
583817dcd4 Capture every srcset candidate URL on <img> and <source>
A srcset value is a comma-separated list of "URL descriptor" entries
(480w, 2x). HTTrack only had "data-srcset" in the link-detection table and
left the plain "srcset" attribute untouched, so responsive images were never
mirrored. The parser now captures and rewrites each candidate URL in turn,
preserving the descriptors and the commas between entries verbatim, and bounds
every new buffer scan against the page end.

Candidate splitting follows the WHATWG srcset algorithm: the URL is a run of
non-whitespace characters, so a comma inside a URL (a data: URI, a CDN
transform path like w_300,c_fill) stays part of the URL and is not mis-split;
only a trailing comma or a comma after the descriptor separates candidates.

Adds tests/01_engine-parse.test, an offline file:// parser test that asserts
each candidate is queued and rewritten (including the comma-in-URL cases), and
also locks the existing xlink:href (#298) and inline background-image (#237)
handling.

closes #235
closes #236
2026-06-14 00:37:20 +02:00
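The candidate-splitting rule described above can be sketched as follows. This is a simplified illustration with hypothetical `demo_` names, not the parser's actual code, and it skips the parenthesis handling that the full WHATWG descriptor tokenizer performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: a URL slice inside the original srcset value. */
typedef struct {
  const char *url;
  size_t len;
} demo_candidate;

static int demo_is_space(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Splits a srcset value into at most max URL slices; returns the count. */
size_t demo_srcset_split(const char *s, demo_candidate *out, size_t max) {
  size_t n = 0;
  while (*s != '\0') {
    /* Skip whitespace and commas between candidates. */
    while (demo_is_space(*s) || *s == ',')
      s++;
    if (*s == '\0')
      break;
    /* The URL is a run of non-whitespace, so inner commas stay in it. */
    const char *start = s;
    while (*s != '\0' && !demo_is_space(*s))
      s++;
    const char *end = s;
    if (end[-1] == ',') {
      /* Trailing commas end the candidate and are not part of the URL. */
      while (end > start && end[-1] == ',')
        end--;
    } else {
      /* Otherwise skip the descriptor up to the separating comma. */
      while (*s != '\0' && *s != ',')
        s++;
    }
    if (n < max) {
      out[n].url = start;
      out[n].len = (size_t) (end - start);
      n++;
    }
  }
  return n;
}
```

For the value `a.png 480w, b,c.png 2x, d.png` this yields three slices, with `b,c.png` kept whole because its comma sits inside the non-whitespace run.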
Xavier Roche
5351e96d71 Merge pull request #325 from xroche/docs/rfc2606-example-domains
docs: use www.example.com in examples; add html manual regen target
2026-06-13 10:41:24 +02:00
Xavier Roche
9d39a57576 build: add regen target for html/httrack.man.html
The rendered HTML manual had no regeneration path. Add regen-man-html,
which runs groff's html device over httrack.1, alongside the existing
regen-man target.
2026-06-13 10:38:31 +02:00
Xavier Roche
e3d4ec01f7 docs: use www.example.com in examples instead of www.someweb.com
someweb.com is a real registrable domain; example.com is reserved for
documentation (RFC 2606). Replace it across the HTML guides, the CLI
--help text (htshelp.c) and code comments, then regenerate man/httrack.1
and the rendered html/httrack.man.html. Other placeholder domains are
left alone: they appear inside filter/wildcard examples where the host
interacts with the pattern.
2026-06-13 10:38:31 +02:00
Xavier Roche
a0bf50f6b1 Merge pull request #324 from xroche/test/filter-escape-characterize
test: characterize wildcard class escape behavior
2026-06-13 10:17:24 +02:00
Xavier Roche
794404bba2 test: characterize wildcard class escape behavior
Add -#0 self-test cases for backslash escapes inside a '*[...]' class.
They pin two quirks of the current decoder: '\X' matches both X and the
backslash itself, and a literal ']' cannot be a class member because the
parser stops at the first ']' (escaped or not). The latter is why the
filter guide's '*[\[\]]' = "the [ or ] character" claim is wrong (#148):
it parses as the class {[,\} plus a trailing literal ']'. These tests
lock the behavior down so a later matcher fix is a deliberate change.

refs #148
2026-06-13 10:15:45 +02:00
Xavier Roche
82d08aaeaf Merge pull request #323 from xroche/fix/doc-lang-nits
docs: fix help-guide placeholders, README clone flag, Ukrainian charset
2026-06-13 10:12:09 +02:00
Xavier Roche
459f06e758 docs: fix help-guide placeholders, README clone flag, Ukrainian charset
Escape the literal <URLs>, <FILTERs>, <param>, <filter>, <file> and
related placeholders in fcguide.html so they render instead of being
swallowed as unknown HTML tags; several were also missing their closing
'>'. Use --recurse-submodules in the README clone command. Relabel
lang/Ukrainian.txt as windows-1251, which is what its bytes actually
are (ISO-8859-5 decodes them to garbage).

closes #132, closes #103, closes #167
2026-06-13 10:05:40 +02:00
Xavier Roche
89b25e418b Merge pull request #322 from xroche/test/expand-engine-coverage
test: expand offline engine self-test coverage
2026-06-13 09:58:03 +02:00
Xavier Roche
43f72afbad test: expand offline engine self-test coverage
Add filter (-#0) and MIME (-#2) tests, and broaden the charset, entity,
IDNA, and path-simplify cases that previously had one or two assertions
each.

Cover the punycode, charset, and entity parsers (areas with a CVE
history) with malformed-input probes that check the hardened build exits
cleanly rather than overflowing. The IDNA and path-simplify edge cases
are pinned to RFC 3492 and RFC 3986 semantics.

The &nbsp; entity case documents the known U+00A0 -> space behavior in
htsencoding.c instead of asserting the spec byte, so a future fix is not
blocked by a stale test.
2026-06-13 09:55:19 +02:00
Xavier Roche
017c634c53 Merge pull request #321 from xroche/fix/mutex-init-race-297
Fix race in lazy mutex initialization
2026-06-13 09:18:39 +02:00
Xavier Roche
f2b36c4b29 Merge pull request #320 from xroche/fix/lockpath-overflow-183
Fix abort on long log path (lock-file buffer too small)
2026-06-13 09:18:10 +02:00
Xavier Roche
19947efd74 Merge pull request #319 from xroche/fix/footer-xss-165
Fix XSS via unescaped URL in the page footer comment
2026-06-13 09:18:02 +02:00
Xavier Roche
de26ad881a fix: synchronize lazy mutex initialization (closes #297)
Two threads locking the same mutex for the first time could both run the
unsynchronized lazy init, corrupting the underlying pthread mutex and aborting
or deadlocking. Build the object and publish it with a single atomic
compare-and-swap; threads that lose the race free the object they built. This
needs no statically-initializable guard, so it stays valid on Windows 2000.
2026-06-13 09:15:31 +02:00
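The build-then-publish idea can be sketched with C11 atomics as below. This is an assumption-laden illustration: the actual fix presumably uses platform primitives (it has to work on Windows 2000), and `demo_mutex_impl` is a hypothetical placeholder for the real OS mutex object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the underlying OS mutex object. */
typedef struct {
  int ready;
} demo_mutex_impl;

/* Lazily initialized handle: impl is NULL until first use. */
typedef struct {
  _Atomic(demo_mutex_impl *) impl;
} demo_mutex;

/* Returns the one shared implementation object, creating it on first use. */
demo_mutex_impl *demo_mutex_get(demo_mutex *m) {
  demo_mutex_impl *cur = atomic_load(&m->impl);
  if (cur != NULL)
    return cur;
  /* Build a private object first; nobody else can see it yet. */
  demo_mutex_impl *fresh = malloc(sizeof(*fresh));
  if (fresh == NULL)
    abort();
  fresh->ready = 1;
  /* Publish with a single compare-and-swap from NULL. */
  demo_mutex_impl *expected = NULL;
  if (atomic_compare_exchange_strong(&m->impl, &expected, fresh))
    return fresh;
  /* Lost the race: discard ours, use the winner's (stored in expected). */
  free(fresh);
  return expected;
}
```

Every thread ends up with the same pointer regardless of which one wins, and no separate guard object has to exist before the first call.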
Xavier Roche
106d34d82c fix: size the lock-file path buffer to the concat buffer (closes #183)
A long log path made the lock-file path overflow the fixed 256-byte n_lock
buffer, tripping the guarded copy and aborting with signal 6. Size n_lock to
the concat-buffer capacity so it holds any path fconcat can produce.

(cherry picked from commit 15144ffd24667712cca2ac0fee96bd355239eff6)
2026-06-12 23:24:20 +02:00
Xavier Roche
61e0b3250b fix: escape angle brackets in the page footer URL (closes #165)
The default footer embeds the page URL inside an HTML comment. A URL
containing "-->" closed the comment and let an attacker inject script into
the mirrored page. Percent-encode < and > before the URL reaches the footer.

(cherry picked from commit 606883229244dc233d16915678e63cfa62000ff0)
2026-06-12 23:24:20 +02:00
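A bounded percent-encoding pass for the two angle brackets might look like this sketch. `demo_escape_angle` is a hypothetical name and signature, not the function the fix actually touches; it only shows why a `-->` in the URL can no longer close the surrounding HTML comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, replacing '<' with "%3C" and
   '>' with "%3E". Returns 0 on success, -1 if dst (cap bytes) is too small. */
int demo_escape_angle(const char *src, char *dst, size_t cap) {
  size_t o = 0;
  if (cap == 0)
    return -1;
  for (; *src != '\0'; src++) {
    const char *rep = NULL;
    size_t need;
    if (*src == '<')
      rep = "%3C";
    else if (*src == '>')
      rep = "%3E";
    need = rep != NULL ? 3 : 1;
    if (o + need >= cap)
      return -1; /* keep one byte for the terminator */
    if (rep != NULL)
      memcpy(dst + o, rep, 3);
    else
      dst[o] = *src;
    o += need;
  }
  dst[o] = '\0';
  return 0;
}
```

After this pass, a URL such as `a-->b` is embedded as `a--%3Eb`, which contains no comment terminator.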
Xavier Roche
827c227b94 history: document the Russian and Danish translation updates
2026-06-12 22:42:38 +02:00
Xavier Roche
17678fcee3 Merge pull request #117 from scootergrisen/master
Updated Danish translation, folded into Dansk.txt (the file the UI loads),
with stale/corrupted English keys restored and CRLF line endings kept
2026-06-12 22:42:05 +02:00
Xavier Roche
9ee8cbc58d Merge pull request #210 from GermanAizek/master
Updated Russian translation
2026-06-12 22:31:00 +02:00
Xavier Roche
418255c038 history: document the postprocess and help-text fixes
2026-06-12 22:14:44 +02:00
Xavier Roche
aa285715b3 Merge pull request #135 from RomanSek/plugin-postprocess-fix
Fix for handling changes introduced in postprocess
2026-06-12 22:13:02 +02:00
Xavier Roche
547c77062e Merge pull request #305 from yosinn1-blip/codex/typo-253-preferred-language-help-text
docs: fix preferred spelling in help text
2026-06-12 22:12:55 +02:00
Xavier Roche
58bdfde2a9 debian: document the lintian cleanup in changelog and history
2026-06-12 22:00:57 +02:00
Xavier Roche
3e30f4e572 Merge pull request #318 from xroche/fix/lintian-cleanup
debian: clean up lintian tags
2026-06-12 21:50:54 +02:00
Xavier Roche
46b7b8ed3f debian: override source-is-missing for upstream HTML docs
The bundled html/ and templates/ pages are the genuine upstream
documentation from the httrack.com website. lintian's long-line
heuristic flags them as missing source; they are the actual source.
2026-06-12 21:44:44 +02:00
Xavier Roche
2f40122bec debian: fix assorted lintian tags
- webhttrack: depend firmly on sensible-utils (it calls sensible-browser),
  drop the missing-depends-on-sensible-utils override.
- copyright: point to /usr/share/common-licenses/GPL-3, not the GPL symlink.
- watch: use https and version=4.
- control: add Rules-Requires-Root: no and Vcs-Browser.
- strip trailing whitespace in control, rules and changelog.
2026-06-12 21:27:11 +02:00
Xavier Roche
26b62369c5 build: link libhtsjava and libtest examples against libc
libhtsjava and the libtest callback examples reach libc only through
libhttrack, so the linker drops the direct libc edge from DT_NEEDED.
lintian flags this as library-not-linked-against-libc. Force libc to be
recorded as a dependency and drop the now-redundant override.
2026-06-12 21:23:29 +02:00
Xavier Roche
b21f85c53f Merge pull request #317 from xroche/fix/cookie-cmp-loop
Fix never-matching wildcard cookie domain comparison
2026-06-09 20:12:01 +02:00
Xavier Roche
0a20aa8522 Fix never-matching wildcard cookie domain comparison
cookie_cmp_wildcard_domain used an unsigned loop counter, so i >= 0 was always
true (infinite loop and out-of-bounds reads) and an empty domain underflowed
l - 1. Use a signed counter. Found and fixed by greenrd in #172. closes #171
2026-06-09 20:09:23 +02:00
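The unsigned-counter hazard is easy to reproduce in any reverse loop. The sketch below uses a hypothetical `demo_domain_ends_with`, not the real `cookie_cmp_wildcard_domain`, to show the signed form that terminates correctly, including for empty strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when domain ends with suffix, comparing
   from the last byte backwards. */
int demo_domain_ends_with(const char *domain, const char *suffix) {
  int dl = (int) strlen(domain);
  int sl = (int) strlen(suffix);
  int i;
  if (sl > dl)
    return 0;
  /* i must be signed: with size_t, "i >= 0" is always true, and both
     "sl - 1" on an empty suffix and "i--" at zero wrap to a huge value. */
  for (i = sl - 1; i >= 0; i--) {
    if (domain[dl - sl + i] != suffix[i])
      return 0;
  }
  return 1;
}
```

An empty suffix makes the loop body run zero times, and the function returns 1 without reading out of bounds.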
Xavier Roche
abd19b8cfa Merge pull request #316 from xroche/chore/changelog-news-symlink
build: symlink ChangeLog and NEWS to history.txt
2026-06-08 20:40:51 +02:00
Xavier Roche
4797749d4d build: symlink ChangeLog and NEWS to history.txt
They were empty automake stubs (GNU strictness requires the files to exist).
Pointing them at history.txt satisfies automake, drops the confusing empty
files, and ships a real changelog in the dist tarball without duplicating
content in git.
2026-06-08 20:40:27 +02:00
Xavier Roche
566b9d5008 Merge pull request #315 from xroche/docs/readme-badges
docs: add CI and license badges to README.md
2026-06-08 20:22:21 +02:00
Xavier Roche
8b6bc1d0ed docs: add CI and license badges to README.md
2026-06-08 20:21:52 +02:00
Xavier Roche
e4fc8ca26f Merge pull request #314 from xroche/ci/github-actions
ci: add GitHub Actions build/test matrix and shell lint
2026-06-08 20:19:11 +02:00
Xavier Roche
52692668cd ci: add GitHub Actions build/test matrix and shell lint
Build and test (autoreconf, configure, make, make check) on x86-64 and arm64
with gcc and clang. A lint job runs shellcheck and shfmt -i 4 on the maintained
scripts.
2026-06-08 20:16:38 +02:00
Xavier Roche
a2b3dc93a3 Merge pull request #313 from xroche/feat/license-gpl3-simplify
Drop the OpenSSL linking exception, simplify to GPL-3.0
2026-06-07 14:38:17 +02:00
yosinn1-blip
47e59b670b docs: fix preferred spelling in help text
Signed-off-by: Yoshiki <yosinn1@gmail.com>
2026-05-24 06:03:47 +09:00
GermanAizek
e003396432 Corrected spelling, text made more understandable of the Russian translations (Andrei Iliev)
2021-02-10 19:45:31 +03:00
Roman Sęk of Clearcode
5c1ba37adb Fix for handling changes introduced in postprocess
2017-05-04 15:22:56 +02:00
scootergrisen
2f1bde915a Updated danish translation.
Please fix the filenames (Danish.txt/Dansk.txt) which ever way you want to use.
2016-11-28 00:53:00 +01:00
62 changed files with 2113 additions and 654 deletions

27
.clang-format Normal file

@@ -0,0 +1,27 @@
# clang-format 19 config for the HTTrack C engine.
#
# IMPORTANT: this is applied to TOUCHED LINES ONLY (via git-clang-format / the
# CI format check). The engine was originally formatted by GNU indent / by hand
# and does NOT round-trip through clang-format, so a whole-tree reformat is
# intentionally never done. Format the lines you change; leave the rest.
#
# Reverse-engineered from src/*.c: 2-space indent, no tabs, 80 columns, pointers
# bound to the name (char *x), attached braces, un-indented case labels, and a
# space after C-style casts ((int) x). Most of that is LLVM's defaults; the
# lines below are the deliberate deviations.
BasedOnStyle: LLVM
# Engine specifics / deviations from LLVM:
SpaceAfterCStyleCast: true # "(int) x", overwhelmingly dominant (542 vs 7)
SortIncludes: false # C include order can be significant; never reorder
IncludeBlocks: Preserve # do not merge/reflow include groups
# Stated explicitly for robustness against base-style drift (these match LLVM):
IndentWidth: 2
UseTab: Never
ColumnLimit: 80
PointerAlignment: Right
IndentCaseLabels: false
SpaceBeforeParens: ControlStatements
AllowShortIfStatementsOnASingleLine: Never

35
.githooks/README.md Normal file

@@ -0,0 +1,35 @@
# Git hooks
Versioned hooks for this repo. Enable them once per clone:
```sh
git config core.hooksPath .githooks
```
## pre-commit: auto-format changed C lines
Runs `git-clang-format` (clang-format 19, using the repo `.clang-format`) on the
**staged lines only** and re-stages the result, so every commit is
clang-format-clean and the CI `format` check passes. It never reformats the
whole tree, only the lines you changed.
- Disable for a single commit: `HTTRACK_NO_AUTOFORMAT=1 git commit ...`
- If clang-format 19 isn't installed, the hook skips silently (CI still
enforces). Install it with your distro's `clang-format-19`, or from
apt.llvm.org.
- If a file has *both* staged and unstaged changes, the hook does not
auto-mutate it (that would commit the unstaged part); it instead reports
whether its staged lines need formatting and asks you to stage/stash the rest.
### noexec working trees
Git executes the hook directly, so if your working tree is on a `noexec` mount
git cannot run `.githooks/pre-commit`. Point `core.hooksPath` at a copy on an
exec filesystem instead:
```sh
mkdir -p ~/.httrack-hooks && cp .githooks/pre-commit ~/.httrack-hooks/
chmod +x ~/.httrack-hooks/pre-commit
git config core.hooksPath ~/.httrack-hooks
```

71
.githooks/pre-commit Executable file

@@ -0,0 +1,71 @@
#!/usr/bin/env bash
#
# Auto-format the staged C lines with clang-format (touched lines only), then
# re-stage them, so commits stay clang-format-clean and CI's format check passes.
#
# Enable once per clone: git config core.hooksPath .githooks
# Skip for one commit: HTTRACK_NO_AUTOFORMAT=1 git commit ...
#
# Matches the CI gate (.clang-format, clang-format 19). It only ever touches the
# lines a commit changes; it never reformats the whole tree.
set -euo pipefail
[ "${HTTRACK_NO_AUTOFORMAT:-}" = "1" ] && exit 0
# Staged C/H files (added/copied/modified/renamed).
mapfile -t files < <(git diff --cached --name-only --diff-filter=ACMR -- '*.c' '*.h')
[ "${#files[@]}" -eq 0 ] && exit 0
# Locate clang-format 19 and the git driver; if absent, skip (CI is the backstop).
cf=""
for c in clang-format-19 clang-format; do
    if command -v "$c" >/dev/null 2>&1; then
        case "$("$c" --version)" in *"version 19."*)
            cf="$c"
            break
            ;;
        esac
    fi
done
gcf=""
for g in git-clang-format-19 git-clang-format; do
    command -v "$g" >/dev/null 2>&1 && {
        gcf="$g"
        break
    }
done
if [ -z "$cf" ] || [ -z "$gcf" ]; then
    echo "pre-commit: clang-format 19 not found; skipping auto-format (CI still checks)." >&2
    exit 0
fi
# Files that are staged AND also have unstaged changes: re-staging them would
# pull in the unstaged work, so don't auto-mutate. Check instead and let the
# author resolve it.
partial=()
for f in "${files[@]}"; do
    if ! git diff --quiet -- "$f"; then partial+=("$f"); fi
done
if [ "${#partial[@]}" -ne 0 ]; then
    d="$("$gcf" --binary "$cf" --style=file --staged --diff --extensions c,h || true)"
    case "$d" in
    "" | "no modified files to format" | *"did not modify any files"*)
        exit 0
        ;; # staged lines already clean
    *)
        echo "pre-commit: these files have both staged and unstaged changes, so" >&2
        echo "auto-format was skipped to avoid committing unstaged work:" >&2
        printf ' %s\n' "${partial[@]}" >&2
        echo "Their staged lines need formatting. Stage the rest (or stash it)," >&2
        echo "or run: $gcf --binary $cf --staged" >&2
        exit 1
        ;;
    esac
fi
# Clean-staged files: format the staged lines in the working tree, then re-stage.
"$gcf" --binary "$cf" --style=file --staged --extensions c,h >/dev/null || true
git add -- "${files[@]}"
exit 0

176
.github/workflows/ci.yml vendored Normal file

@@ -0,0 +1,176 @@
# Build and test on x86-64 and arm64, and lint the shell scripts.
name: CI
on:
  push:
    branches: [master]
  pull_request:
  workflow_dispatch:
# Least privilege: the workflow only needs to read the repo.
permissions:
  contents: read
# Cancel superseded runs on the same branch or PR.
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
jobs:
  build:
    name: build (${{ matrix.arch }}, ${{ matrix.cc }})
    runs-on: ${{ matrix.runner }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - { arch: x86-64, runner: ubuntu-24.04, cc: gcc }
          - { arch: x86-64, runner: ubuntu-24.04, cc: clang }
          - { arch: arm64, runner: ubuntu-24.04-arm, cc: gcc }
          - { arch: arm64, runner: ubuntu-24.04-arm, cc: clang }
    env:
      CC: ${{ matrix.cc }}
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Install build dependencies
        run: |
          set -euo pipefail
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends \
            build-essential clang autoconf automake libtool autoconf-archive \
            zlib1g-dev libssl-dev
      - name: Configure
        run: |
          set -euo pipefail
          # autoreconf installs the automake test-driver (not committed) and
          # validates configure.ac, so "make check" works on a fresh checkout.
          autoreconf -fi
          ./configure
      - name: Build
        run: make -j"$(nproc)"
      - name: Test
        run: make check
      - name: Print the test log on failure
        if: failure()
        run: cat tests/test-suite.log 2>/dev/null || true
  dco:
    name: DCO sign-off
    # Only checkable on a PR, where we have the base..head commit range.
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Every commit must be signed off
        env:
          BASE: ${{ github.event.pull_request.base.sha }}
          HEAD: ${{ github.event.pull_request.head.sha }}
        run: |
          set -euo pipefail
          fail=0
          # --no-merges: merge commits are GitHub-generated and carry no sign-off.
          for sha in $(git rev-list --no-merges "$BASE..$HEAD"); do
            if [ -z "$(git log -1 --format='%(trailers:key=Signed-off-by)' "$sha")" ]; then
              echo "Missing Signed-off-by: $(git log -1 --format='%h %s' "$sha")"
              fail=1
            fi
          done
          if [ "$fail" -ne 0 ]; then
echo
echo "Sign commits with 'git commit -s'; fix a branch with 'git rebase --signoff $BASE'."
echo "See CONTRIBUTING.md (Developer Certificate of Origin)."
exit 1
fi
lint:
name: lint (shellcheck, shfmt)
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- name: Install linters
env:
SHFMT_VERSION: v3.8.0
run: |
set -euo pipefail
sudo apt-get update
sudo apt-get install -y --no-install-recommends shellcheck
# shfmt is not packaged in apt; fetch a pinned release binary.
curl -fsSL -o /tmp/shfmt \
"https://github.com/mvdan/sh/releases/download/${SHFMT_VERSION}/shfmt_${SHFMT_VERSION}_linux_$(dpkg --print-architecture)"
sudo install -m 0755 /tmp/shfmt /usr/local/bin/shfmt
# Lint the scripts we maintain; the legacy scripts are a separate cleanup.
- name: shellcheck
run: shellcheck man/makeman.sh tools/mkdeb.sh .githooks/pre-commit tests/*.test tests/check-network.sh
- name: shfmt
run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh .githooks/pre-commit
# Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
# (it was shaped by an old Visual Studio formatter) and does not round-trip,
# so we never reformat the whole tree -- only the lines a PR touches.
format:
name: format (clang-format-19, changed lines)
if: github.event_name == 'pull_request'
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install clang-format 19 (pinned, from apt.llvm.org)
run: |
set -euo pipefail
# ubuntu-24.04's native clang-format is 18; pin 19 to match local dev.
wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
| sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
echo "deb http://apt.llvm.org/noble/ llvm-toolchain-noble-19 main" \
| sudo tee /etc/apt/sources.list.d/llvm-19.list >/dev/null
sudo apt-get update
sudo apt-get install -y --no-install-recommends clang-format-19
# git-clang-format driver, pinned to an immutable release tag (not a
# moving branch) since we curl and then execute it.
sudo curl -fsSL -o /usr/local/bin/git-clang-format \
https://raw.githubusercontent.com/llvm/llvm-project/llvmorg-19.1.7/clang/tools/clang-format/git-clang-format
sudo chmod 0755 /usr/local/bin/git-clang-format
clang-format-19 --version
- name: Check formatting of changed lines
run: |
set -euo pipefail
git fetch --no-tags origin \
"+refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}"
base="origin/${{ github.base_ref }}"
set +e
diff="$(git clang-format --binary clang-format-19 --style=file \
--diff --extensions c,h "$base")"
rc=$?
set -e
# Classify by output first: a non-empty diff means "not clean",
# regardless of the driver's exit convention (the release-tag driver
# exits 0 and signals via stdout; some packaged drivers exit 1 on a
# diff). A nonzero exit with clean output is a real checker error.
case "$diff" in
"" | "no modified files to format" | *"did not modify any files"*)
if [ "$rc" -ne 0 ]; then
echo "::error::git clang-format failed (exit $rc): checker error."
exit 1
fi
echo "Formatting OK: changed C lines are clang-format-clean." ;;
*)
echo "$diff"
echo "::error::Changed C lines are not clang-format-clean."
echo "Fix locally with: git clang-format --binary clang-format-19 $base"
exit 1 ;;
esac
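
The final step classifies the driver's result by its output first and its exit status second. A small sketch of that decision as a standalone function, assuming a hypothetical name (`classify_format_result`) and plain string results in place of the workflow's `::error::` annotations:

```sh
# Hypothetical helper mirroring the CI step: $1 is the driver's stdout,
# $2 its exit status. Prints "clean", "dirty", or "error".
classify_format_result() {
    case "$1" in
    "" | "no modified files to format" | *"did not modify any files"*)
        if [ "$2" -ne 0 ]; then
            echo "error" # clean output but nonzero exit: checker failure
        else
            echo "clean"
        fi
        ;;
    *)
        echo "dirty" # any other output is treated as a formatting diff
        ;;
    esac
}
```

Under this ordering a non-empty diff is reported as "dirty" whether the driver exits 0 or 1, which is the point of the comment in the workflow.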

83
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,83 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at <roche@httrack.com>. All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of actions.
**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

39
CONTRIBUTING.md Normal file

@@ -0,0 +1,39 @@
# Contributing to HTTrack
HTTrack is small and old. Keep changes easy to review and safe to merge.
## Pull requests
- One change per PR. Small diffs merge fast.
- PRs are squash-merged: the title and description become the commit message, so
explain *why*.
- Add or update tests for engine changes (`tests/`), and keep CI green.
## Style
- C, matching nearby code. **Format only the lines you change** (`git
clang-format` against the repo `.clang-format`). Never reformat untouched code.
- Comment the *why*, in English.
- HTTrack parses hostile input off the network. Check bounds, avoid unchecked
copies, and never let an attacker-controlled length drive arithmetic unchecked.
## Sign your work
Every commit needs a `Signed-off-by` line, the
[DCO](https://developercertificate.org/): `git commit -s`. CI rejects unsigned
commits; fix a branch with `git rebase --signoff master`.
## AI assistants
Welcome, and nothing to disclose. Two rules:
- **Own every line** as if you wrote it. Can't explain it in review? Not ready.
- **Don't push your work onto reviewers.** A raw generated patch a maintainer has
to vet from scratch will be closed.
The sign-off covers AI-assisted code too.
## Bugs
Open an issue with the version, OS, command used, and expected vs actual result.
For security issues see [SECURITY.md](SECURITY.md), not a public issue.
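
The sign-off rule above can be checked locally before pushing. A rough sketch, assuming a hypothetical helper (`has_signoff`) that greps a commit message for a `Signed-off-by` line; this is looser than the CI job, which asks git for the parsed trailer:

```sh
# Hypothetical helper: succeed if the commit message in $1 contains a
# line of the form "Signed-off-by: Name <email>". Unlike git's trailer
# parsing, this does not require the line to sit in the final paragraph.
has_signoff() {
    printf '%s\n' "$1" | grep -Eq '^Signed-off-by: .+ <.+>$'
}

# Illustrative message; the name and address are placeholders.
msg='Bound cache-field copies

Signed-off-by: Jane Doe <jane@example.com>'
has_signoff "$msg" && echo "signed off"
```

In a real checkout the message for a commit could be fed in with `git log -1 --format=%B <sha>`.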

View File

1
ChangeLog Symbolic link

@@ -0,0 +1 @@
history.txt

0
NEWS

1
NEWS Symbolic link

@@ -0,0 +1 @@
history.txt

View File

@@ -1,5 +1,8 @@
# HTTrack Website Copier - Development Repository
[![CI](https://github.com/xroche/httrack/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/xroche/httrack/actions/workflows/ci.yml)
[![License](https://img.shields.io/github/license/xroche/httrack)](COPYING)
## About
_Copy websites to your computer (Offline browser)_
@@ -20,7 +23,7 @@ http://www.httrack.com/
## Compile trunk release
```sh
-git clone https://github.com/xroche/httrack.git --recurse
+git clone https://github.com/xroche/httrack.git --recurse-submodules
cd httrack
./configure --prefix=$HOME/usr && make -j8 && make install
```

23
SECURITY.md Normal file

@@ -0,0 +1,23 @@
# Security Policy
## Reporting
Report privately, not in a public issue or PR: use GitHub
[private advisories](https://github.com/xroche/httrack/security/advisories/new)
or email <roche@httrack.com> (alternate: `xroche at gmail dot com`).
Include the HTTrack version and platform, a concrete reproduction (command line,
a sample page or server response, or a small proof of concept), and what an
attacker gains. We'll acknowledge it and keep you posted. Please allow time for a
release before disclosing publicly.
## Supported versions
Fixes land on `master` and ship in the next release; older releases aren't
maintained. Confirm against current `master` when you can.
## AI-assisted findings
Scanners and LLMs are fine, but only send reports you have verified yourself. A
confirmed, reproducible issue is worth our time; a plausible one that doesn't
reproduce is not, and will be closed. If a report is AI-assisted, say so.

7
debian/changelog vendored

@@ -4,6 +4,10 @@ httrack (3.49.8-1) unstable; urgency=medium
* Drop the OpenSSL linking exception from the license: OpenSSL 3.0+ is
Apache-2.0 and GPL-compatible, so it is no longer needed. httrack is now
plain GPL-3.0-or-later. Updated debian/copyright accordingly.
* Fix a batch of lintian tags: depend on sensible-utils, point to
common-licenses/GPL-3, use a secure version=4 watch file, add
Rules-Requires-Root and Vcs-Browser, and override the false-positive
source-is-missing on the bundled HTML documentation.
-- Xavier Roche <xavier@debian.org> Sun, 07 Jun 2026 14:29:24 +0200
@@ -934,7 +938,7 @@ httrack (3.39.6-1) unstable; urgency=low
httrack (3.39.5-1) unstable; urgency=low
* Updated to 3.39.5 (3.40-alpha-5)
* Updated to 3.39.5 (3.40-alpha-5)
-- Xavier Roche <xavier@debian.org> Fri, 29 Jul 2005 20:57:44 +0200
@@ -1616,4 +1620,3 @@ httrack (3.22-1) unstable; urgency=low
* Initial Release.
-- Xavier Roche <xavier@debian.org> Fri, 27 Sep 2002 16:42:25 +0200

6
debian/control vendored

@@ -4,8 +4,10 @@ Priority: optional
Maintainer: Xavier Roche <roche@httrack.com>
Standards-Version: 4.7.0
Build-Depends: debhelper-compat (= 13), autoconf, autoconf-archive, automake, libtool, zlib1g-dev, libssl-dev
Rules-Requires-Root: no
Homepage: http://www.httrack.com
Vcs-Git: https://github.com/xroche/httrack.git
Vcs-Browser: https://github.com/xroche/httrack
Package: httrack
Architecture: any
@@ -23,12 +25,12 @@ Description: Copy websites to your computer (Offline browser)
browse the site from link to link, as if you were viewing it online.
HTTrack can also update an existing mirrored site, and resume
interrupted downloads. HTTrack is fully configurable, and has an
integrated help system.
integrated help system.
Package: webhttrack
Architecture: any
Multi-Arch: foreign
-Depends: ${misc:Depends}, ${shlibs:Depends}, webhttrack-common, iceape-browser | iceweasel | icecat | mozilla | firefox | mozilla-firefox | www-browser | sensible-utils
+Depends: ${misc:Depends}, ${shlibs:Depends}, webhttrack-common, sensible-utils, iceape-browser | iceweasel | icecat | mozilla | firefox | mozilla-firefox | www-browser
Replaces: webhttrack-common (<< 3.43.9-2)
Breaks: webhttrack-common (<< 3.43.9-2)
Suggests: httrack, httrack-doc

2
debian/copyright vendored

@@ -13,7 +13,7 @@ the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
On Debian systems, the complete text of the GNU General Public
-License can be found in /usr/share/common-licenses/GPL file.
+License version 3 can be found in /usr/share/common-licenses/GPL-3 file.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of

View File

@@ -1,5 +1,4 @@
libhttrack-dev: breakout-link *
libhttrack-dev: hardening-no-fortify-functions usr/lib/x86_64-linux-gnu/httrack/libtest/*
libhttrack-dev: library-not-linked-against-libc usr/lib/*/httrack/libtest/*
libhttrack-dev: package-contains-documentation-outside-usr-share-doc usr/share/httrack/libtest/readme.txt
libhttrack-dev: package-name-defined-in-config-h usr/include/httrack/config.h

2
debian/rules vendored

@@ -44,7 +44,7 @@ build-indep:
build-arch: build-stamp
build-stamp: configure-stamp
build-stamp: configure-stamp
dh_testdir
dh_auto_build
dh_auto_test

View File

@@ -1,2 +1,8 @@
httrack source: changelog-should-mention-nmu
httrack source: source-nmu-has-incorrect-version-number
# The bundled HTML pages are the genuine upstream documentation taken from
# the httrack.com website. lintian's long-line heuristic mistakes them for
# minified or generated content, but they are the actual source.
httrack source: source-is-missing [html/*]
httrack source: source-is-missing [templates/*]

9
debian/watch vendored

@@ -1,7 +1,6 @@
-# format version number, currently 3; this line is compulsory!
-version=3
+# format version number; this line is compulsory!
+version=4
# main httrack.com download page ; fetch the mirror version number
-http://www.httrack.com/page/2/en/index.html\
-.*/httrack-([\d\.]+).tar.gz
+https://www.httrack.com/page/2/en/index.html \
+.*/httrack-([\d\.]+)\.tar\.gz

View File

@@ -1 +0,0 @@
webhttrack: missing-depends-on-sensible-utils sensible-browser usr/bin/webhttrack

View File

@@ -5,8 +5,12 @@ HTTrack Website Copier release history:
This file lists all changes and fixes that have been made for HTTrack
3.49-8
+ Changed: dropped the obsolete OpenSSL linking exception (OpenSSL 3.0+ is
Apache-2.0 and GPL-compatible); httrack is now plain GPLv3-or-later
+ Changed: dropped the obsolete OpenSSL linking exception (OpenSSL 3.0+ is Apache-2.0 and GPL-compatible); httrack is now plain GPLv3-or-later
+ Fixed: link libhtsjava and the libtest examples directly against libc
+ Fixed: in-place changes made by the postprocess callback were not applied (Roman Sęk)
+ Fixed: "preffered" typo in the help text and man page (yosinn1-blip)
+ Fixed: corrections and updates of the Russian translation (German Aizek)
+ Fixed: corrections and updates of the Danish translation (scootergrisen)
3.49-7
+ Fixed: keep generated config.h architecture-independent (Debian #1133728)

View File

@@ -118,11 +118,11 @@ The command-line version
<br>
<br>
<li>Add the URLs, separated by a blank space</li>
<br><small><tt>httrack www.someweb.com/foo/</tt></small>
<br><small><tt>httrack www.example.com/foo/</tt></small>
<br>
<br>
<li>If you need, add some options (see the <a href="options.html">option list</a>)</li>
<br><small><tt>httrack www.someweb.com/foo/ -O "/webs" -N4 -P proxy.myhost.com:3128</tt></small>
<br><small><tt>httrack www.example.com/foo/ -O "/webs" -N4 -P proxy.myhost.com:3128</tt></small>
<br>
<br>
<li>Launch the command line, and wait until the mirror is finishing</li>

View File

@@ -303,43 +303,43 @@ Okay, let me explain how to precisely control the capture process.<br>
Let's take an example:<br>
<br>
Imagine you want to capture the following site:<br>
<tt>www.someweb.com/gallery/flowers/</tt><br>
<tt>www.example.com/gallery/flowers/</tt><br>
<br>
HTTrack, by default, will capture all links encountered in <tt>www.someweb.com/gallery/flowers/</tt> or in lower directories, like
<tt>www.someweb.com/gallery/flowers/roses/</tt>.<br>
HTTrack, by default, will capture all links encountered in <tt>www.example.com/gallery/flowers/</tt> or in lower directories, like
<tt>www.example.com/gallery/flowers/roses/</tt>.<br>
It will not follow links to other websites, because this behaviour might cause to capture the Web entirely!<br>
It will not follow links located in higher directories, too (for example, <tt>www.someweb.com/gallery/flowers/</tt> itself) because this
It will not follow links located in higher directories, too (for example, <tt>www.example.com/gallery/flowers/</tt> itself) because this
might cause to capture too much data.<br>
<br>
This is the <b><u>default behaviour</b></u> of HTTrack, BUT, of course, if you want, you can tell HTTrack to capture other directorie(s), website(s)!..
<br>
In our example, we might want also to capture all links in <tt>www.someweb.com/gallery/trees/</tt>, and in <tt>www.someweb.com/photos/</tt><br>
In our example, we might want also to capture all links in <tt>www.example.com/gallery/trees/</tt>, and in <tt>www.example.com/photos/</tt><br>
<br>
This can easily done by using filters: go to the Option panel, select the 'Scan rules' tab, and enter this line:
(you can leave a blank space between each rules, instead of entering a carriage return)<br>
<tt>+www.someweb.com/gallery/trees/*<br>
+www.someweb.com/photos/*</tt><br>
<tt>+www.example.com/gallery/trees/*<br>
+www.example.com/photos/*</tt><br>
<br>
This means "accept all links begining with <tt>www.someweb.com/gallery/trees/</tt> and <tt>www.someweb.com/photos/</tt>"
This means "accept all links begining with <tt>www.example.com/gallery/trees/</tt> and <tt>www.example.com/photos/</tt>"
- the <tt>+</tt> means "accept" and the final <tt>*</tt> means "any character will match after the previous ones".
Remember the <tt>*.doc</tt> or <tt>*.zip</tt> encountered when you want to select all files from a certain type on your computer:
it is almost the same here, except the begining "+"<br>
<br>
Now, we might want to exclude all links in <tt>www.someweb.com/gallery/trees/hugetrees/</tt>, because with the previous filter,
Now, we might want to exclude all links in <tt>www.example.com/gallery/trees/hugetrees/</tt>, because with the previous filter,
we accepted too many files. Here again, you can add a filter rule to refuse these links. Modify the previous filters to:<br>
<tt>+www.someweb.com/gallery/trees/*<br>
+www.someweb.com/photos/*<br>
-www.someweb.com/gallery/trees/hugetrees/*</tt><br>
<tt>+www.example.com/gallery/trees/*<br>
+www.example.com/photos/*<br>
-www.example.com/gallery/trees/hugetrees/*</tt><br>
<br>
You have noticed the <tt>-</tt> in the begining of the third rule: this means "refuse links matching the rule"
; and the rule is "any files begining with <tt>www.someweb.com/gallery/trees/hugetrees/</tt><br>
; and the rule is "any files begining with <tt>www.example.com/gallery/trees/hugetrees/</tt><br>
Voila! With these three rules, you have precisely defined what you wanted to capture.<br>
<br>
A more complex example?<br>
<br>
Imagine that you want to accept all jpg files (files with .jpg type) that have "blue" in the name and located in www.someweb.com<br>
<tt>+www.someweb.com/*blue*.jpg</tt><br>
Imagine that you want to accept all jpg files (files with .jpg type) that have "blue" in the name and located in www.example.com<br>
<tt>+www.example.com/*blue*.jpg</tt><br>
<br>
More detailed information can be found <a href="filters.html">here</a>!<br>
<br>
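
The scan rules in this passage can be approximated with shell glob matching. The sketch below is an illustration only, not HTTrack's implementation: `filter_decision` is a hypothetical helper, and it assumes, as the FAQ's filter-priority answer states, that a rule declared later overrides an earlier one.

```sh
# Hypothetical helper, not HTTrack code: $1 is a URL without its scheme,
# the remaining arguments are scan rules in declaration order.
# Prints "accept", "refuse", or "default" (no rule matched).
filter_decision() {
    url=$1
    shift
    decision="default"
    for rule in "$@"; do
        pattern=${rule#[+-]} # drop the leading + or -
        # $pattern is left unquoted on purpose so that * acts as a wildcard.
        case "$url" in
        $pattern)
            case "$rule" in
            +*) decision="accept" ;;
            -*) decision="refuse" ;;
            esac
            ;;
        esac
    done
    echo "$decision"
}

filter_decision "www.example.com/gallery/trees/hugetrees/a.jpg" \
    "+www.example.com/gallery/trees/*" \
    "+www.example.com/photos/*" \
    "-www.example.com/gallery/trees/hugetrees/*"
```

With the three rules from the example, the call prints `refuse`: the first rule accepts the URL, and the later `-` rule then overrides it.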
@@ -440,7 +440,7 @@ This will cause a performance loss, but will increase the compatibility with som
<a NAME="QT1">Q: <strong>Only the first page is caught. What's wrong?</a></strong></br>
A: <em>First, check the <tt>hts-log.txt</tt> file (and/or <tt>hts-err.txt</tt> error log file) - this can give you precious information.<br>
The problem can be a website that redirects you to another site (for example, <tt>www.someweb.com</tt> to <tt>public.someweb.com</tt>) :
The problem can be a website that redirects you to another site (for example, <tt>www.example.com</tt> to <tt>public.example.com</tt>) :
in this case, use filters to accept this site<br>
This can be, also, a problem in the HTTrack options (link depth too low, for example)</em>
@@ -485,10 +485,10 @@ You may also want to capture files that are forbidden by default by the <a href=
In these cases, HTTrack does not capture these links automatically, you have to tell it to do so.
<br><br>
<ul><li>Either use the <a href="filters.html">filters</a>.<br>
Example: You are downloading <tt>http://www.someweb.com/foo/</tt> and can not get .jpg images located
in <tt>http://www.someweb.com/bar/</tt> (for example, http://www.someweb.com/bar/blue.jpg)<br>
Then, add the filter rule <tt>+www.someweb.com/bar/*.jpg</tt> to accept all .jpg files from this location<br>
You can, also, accept all files from the /bar folder with <tt>+www.someweb.com/bar/*</tt>, or only html files with <tt>+www.someweb.com/bar/*.html</tt> and so on..<br><br>
Example: You are downloading <tt>http://www.example.com/foo/</tt> and can not get .jpg images located
in <tt>http://www.example.com/bar/</tt> (for example, http://www.example.com/bar/blue.jpg)<br>
Then, add the filter rule <tt>+www.example.com/bar/*.jpg</tt> to accept all .jpg files from this location<br>
You can, also, accept all files from the /bar folder with <tt>+www.example.com/bar/*</tt>, or only html files with <tt>+www.example.com/bar/*.html</tt> and so on..<br><br>
</li><li>
If the problems are related to robots.txt rules, that do not let you access some folders (check in the logs if you are not sure),
you may want to disable the default robots.txt rules in the options. (but only disable this option with great care,
@@ -509,8 +509,8 @@ and rescan the website as described before. HTTrack will be obliged to recatch t
<a NAME="Q1bb">Q: <strong>FTP links are not caught! What's happening?</strong><br>
A: <em>FTP files might be seen as external links, especially if they are located in outside domain. You have either to accept all external links (See the links options, -n option) or
only specific files (see <a href="filters.html">filters</a> section). <br>
Example: You are downloading <tt>http://www.someweb.com/foo/</tt> and can not get ftp://ftp.someweb.com files<br>
Then, add the filter rule <tt>+ftp.someweb.com/*</tt> to accept all files from this (ftp) location<br>
Example: You are downloading <tt>http://www.example.com/foo/</tt> and can not get ftp://ftp.example.com files<br>
Then, add the filter rule <tt>+ftp.example.com/*</tt> to accept all files from this (ftp) location<br>
</em>
<br>
@@ -551,10 +551,10 @@ Note: In some rare cases, duplicate data files can be found when the website red
<a NAME="Q1b2">Q: <strong>I'm downloading too many files! What can I do?</strong><br>
A: <em>This is often the case when you use too large a filter, for example <tt>+*.html</tt>, which asks the
engine to catch all .html pages (even ones on other sites!). In this case, try to use more specific filters, like <tt>+www.someweb.com/specificfolder/*.html</tt><br>
If you still have too many files, use filters to avoid somes files. For example, if you have too many files from www.someweb.com/big/,
use <tt>-www.someweb.com/big/*</tt> to avoid all files from this folder. Remember that the default behaviour of the engine, when
mirroring http://www.someweb.com/big/index.html, is to catch everything in http://www.someweb.com/big/. Filters are your friends,
engine to catch all .html pages (even ones on other sites!). In this case, try to use more specific filters, like <tt>+www.example.com/specificfolder/*.html</tt><br>
If you still have too many files, use filters to avoid somes files. For example, if you have too many files from www.example.com/big/,
use <tt>-www.example.com/big/*</tt> to avoid all files from this folder. Remember that the default behaviour of the engine, when
mirroring http://www.example.com/big/index.html, is to catch everything in http://www.example.com/big/. Filters are your friends,
use them!
</em>
<br>
@@ -562,7 +562,7 @@ use them!
<a NAME="Q1b22">Q: <strong>The engine turns crazy, getting thousands of files! What's going on?</strong><br>
A: <em>This can happen if a loop occurs in some bogus website. For example, a page that refers to itself, with a timestamp
in the query string (e.g. <tt>http://www.someweb.com/foo.asp?ts=2000/10/10,09:45:17:147</tt>).
in the query string (e.g. <tt>http://www.example.com/foo.asp?ts=2000/10/10,09:45:17:147</tt>).
These are really annoying, as it is VERY difficult to detect the loop (the timestamp might be a page number).
To limit the problem: set a recurse level (for example to 6), or avoid the bogus pages (use the filters)
</em>
@@ -571,7 +571,7 @@ To limit the problem: set a recurse level (for example to 6), or avoid the bogus
<a NAME="Q1b3">Q: <strong>File are sometimes renamed (the type is changed)! Why?</strong><br>
A: <em>By default, HTTrack tries to know the type of remote files. This is useful when links like
<tt>http://www.someweb.com/foo.cgi?id=1</tt> can be either HTML pages, images or anything else.
<tt>http://www.example.com/foo.cgi?id=1</tt> can be either HTML pages, images or anything else.
Locally, foo.cgi will not be recognized as an html page, or as an image, by your browser. HTTrack has to rename the file
as foo.html or foo.gif so that it can be viewed.<br>
</em>
@@ -730,8 +730,8 @@ but this is a smart bug..
the domain, too. How to retrieve them?</strong><br>
A: <em>If you just want to retrieve files that can be reached through links, just activate
the 'get file near links' option. But if you want to retrieve html pages too, you can both
use wildcards or explicit addresses ; e.g. add <tt>www.someweb.com/*</tt> to accept all
files and pages from www.someweb.com.<br>
use wildcards or explicit addresses ; e.g. add <tt>www.example.com/*</tt> to accept all
files and pages from www.example.com.<br>
<br>
</em></a><a NAME="Q6">Q: <strong>I have forgotten some URLs of files during a long
mirror.. Should I redo all?</strong><br>
@@ -744,7 +744,7 @@ A: <em>You can use different methods. You can use the 'get files near a link' op
files are in a foreign domain. You can use, too, a filter adress: adding <tt>+*.zip</tt>
in the URL list (or in the filter list) will accept all ZIP files, even if these files are
outside the address. <br>
Example : <tt>httrack www.someweb.com/someaddress.html +*.zip</tt> will allow
Example : <tt>httrack www.example.com/someaddress.html +*.zip</tt> will allow
you to retrieve all zip files that are linked on the site.</em><br>
<br>
</a><a NAME="Q8">Q: <strong>There are ZIP files in a page, but I don't want to transfer
@@ -771,7 +771,7 @@ them on filters!</strong><br>
A: <em>By default, HTTrack retrieves all types of files on authorized links. To avoid
that, define filters like </a><a NAME="Q7"><tt>-* +&lt;website&gt;/*.html
+&lt;website&gt;/*.htm +&lt;website&gt;/ +*.&lt;type wanted&gt;</tt></a><a NAME="Q10"><br>
Example: <tt>httrack www.someweb.com/index.html -* +www.someweb.com/*.htm* +www.someweb.com/*.gif +www.someweb.com/*.jpg</tt><br>
Example: <tt>httrack www.example.com/index.html -* +www.example.com/*.htm* +www.example.com/*.gif +www.example.com/*.jpg</tt><br>
<br>
</em><a NAME="Q10">Q: <strong>When I use filters, I get too many files!</strong><br>
A: <em>You might use too large a filter, for example <tt>*.html</tt> will get ALL html
@@ -779,13 +779,13 @@ files identified. If you want to get all files on an address, use <tt>www.&lt;ad
If you want to get ONLY files defined by your filters, use something like <tt>-* +www.foo.com/*</tt>, because
<tt>+www.foo.com/*</tt> will only accept selected links without forbidding other ones!<br>
There are lots of possibilities using filters.<br>
Example:<tt>httrack www.someweb.com +*.someweb.com/*.htm*</tt><br>
Example:<tt>httrack www.example.com +*.example.com/*.htm*</tt><br>
<br>
</em></a><a NAME="Q11">Q: <strong>When I use filters, I can't access another domain, but I
have filtered it!</strong><br>
A: <em>You may have done a mistake declaring filters, for example <tt>+www.someweb.com/*
-*someweb* </tt></em>will not work, because -*someweb* has an upper priority (because it has
been declared after +www.someweb.com)<br>
A: <em>You may have done a mistake declaring filters, for example <tt>+www.example.com/*
-*example* </tt></em>will not work, because -*example* has an upper priority (because it has
been declared after +www.example.com)<br>
<br>
</a><a NAME="Q12">Q: <strong>Must I add a&nbsp; '+' or '-' in the filter list when I want
to use filters?</strong><br>
@@ -800,7 +800,7 @@ filter list) and accept only html files and the file(s) you want to retrieve (BU
forget to add <tt>+&lt;website&gt;*.html</tt> in the filter list, or pages will not be
scanned! Add the name of files you want with a <tt>*/</tt> before ; i.e. if you want to
retrieve file.zip, add <tt>*/file.zip</tt>)<br>
Example:<tt>httrack www.someweb.com +www.someweb.com/*.htm* +thefileiwant.zip</tt><br>
Example:<tt>httrack www.example.com +www.example.com/*.htm* +thefileiwant.zip</tt><br>
<br>
</em>
@@ -828,7 +828,7 @@ A: <em>Yes. See the URL capture abilities (--catchurl for command-line release,
A: <em>Yes. See the shell system command option (-V option for command-line release)</em>
<br><br><a NAME="QM6">Q: <strong>Can I use username/password authentication on a site?</strong></a><br>
A: <em>Yes. Use user:password@your_url (example: <tt>http://foo:bar@www.someweb.com/private/mybox.html</tt>)</em>
A: <em>Yes. Use user:password@your_url (example: <tt>http://foo:bar@www.example.com/private/mybox.html</tt>)</em>
<br><br><a NAME="QM7">Q: <strong>Can I use username/password authentication for a proxy?</strong></a><br>
A: <em>Yes. Use user:password@your_proxy_name as your proxy name (example: <tt>smith:foo@proxy.mycorp.com</tt>)</em>

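Review note on the Q11 answer in this file: the rule it states is that a filter declared later overrides an earlier one, which is why `+www.example.com/* -*example*` rejects the site. A minimal Python sketch of that "last matching rule wins" behaviour follows. It is an illustration only: `is_accepted` is a hypothetical helper, and `fnmatchcase` merely approximates HTTrack's wildcard syntax (it does not implement forms such as `*[...]`).

```python
from fnmatch import fnmatchcase


def is_accepted(url, filters, default=False):
    """Return the verdict of the LAST filter that matches url.

    Hypothetical helper: each filter is '+pattern' or '-pattern'.
    fnmatchcase is only an approximation of HTTrack's matcher.
    """
    verdict = default
    for rule in filters:
        sign, pattern = rule[:1], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError("filter must start with '+' or '-': %r" % rule)
        if fnmatchcase(url, pattern):
            # A later match overwrites any earlier verdict.
            verdict = (sign == "+")
    return verdict


url = "www.example.com/index.html"
# Exclusion declared last: it wins, so the URL is refused.
print(is_accepted(url, ["+www.example.com/*", "-*example*"]))
# Inclusion declared last: it wins, so the URL is accepted.
print(is_accepted(url, ["-*example*", "+www.example.com/*"]))
```

Swapping the order of the two filters is enough to flip the result, which matches the explanation given in the answer.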

@@ -181,17 +181,17 @@ used for some time.
<p align=justify> The rest of this manual is dedicated to detailing what
you find in the help message and providing examples - lots and lots of
examples... Here is what you get (page by page - use <enter> to move to
examples... Here is what you get (page by page - use &lt;enter&gt; to move to
the next page in the real program) if you type 'httrack --help':
<pre>
>httrack --help
HTTrack version 3.03BETAo4 (compiled Jul 1 2001)
usage: ./httrack <URLs [-option] [+<FILTERs>] [-<FILTERs>]
usage: ./httrack &lt;URLs&gt; [-option] [+&lt;FILTERs&gt;] [-&lt;FILTERs&gt;]
with options listed below: (* is the default value)
General options:
O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path &lt;param&gt;)
%O top path if no path defined (-O path_mirror[,path_cache_and_logfiles])
Action options:
@@ -202,7 +202,7 @@ Action options:
Y mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
Proxy options:
P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy &lt;param&gt;)
%f *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
Limits options:
@@ -227,7 +227,7 @@ Links options:
%P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
n get non-html files 'near' an html file (ex: an image located outside) (--near)
t test all URLs (even forbidden ones) (--test)
%L <file add all URL located in this text file (one URL per line) (--list <param>)
%L &lt;file&gt; add all URL located in this text file (one URL per line) (--list &lt;param&gt;)
Build options:
NN structure type (0 *original structure, 1+: see below) (--structure[=N])
@@ -248,12 +248,12 @@ Spider options:
%h force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
%B tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume <param>)
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume &lt;param&gt;)
Browser ID:
F user-agent field (-F "user-agent name") (--user-agent <param>)
%F footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer <param>)
%l preffered language (-%l "fr, en, jp, *" (--language <param>)
F user-agent field (-F "user-agent name") (--user-agent &lt;param&gt;)
%F footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer &lt;param&gt;)
%l preferred language (-%l "fr, en, jp, *" (--language &lt;param&gt;)
Log, index, cache
C create/use a cache for updates and retries (C0 no cache,C1 cache is prioritary,* C2 test update before) (--cache[=N])
@@ -303,8 +303,8 @@ Guru options: (do NOT use)
#! Execute a shell command (-#! "echo hello")
Command-line specific options:
V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
%U run the engine with another id when called as root (-%U smith) (--user <param>)
V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd &lt;param&gt;)
%U run the engine with another id when called as root (-%U smith) (--user &lt;param&gt;)
Details: Option N
N0 Site-structure (default)
@@ -332,7 +332,7 @@ Details: User-defined option N
%N Name of file, including file type (ex: image.gif)
%t File type (ex: gif)
%p Path [without ending /] (ex: /someimages)
%h Host name (ex: www.someweb.com) (--http-10)
%h Host name (ex: www.example.com) (--http-10)
%M URL MD5 (128 bits, 32 ascii bytes)
%Q query string MD5 (128 bits, 32 ascii bytes)
%q small query string MD5 (16 bits, 4 ascii bytes) (--include-query-string)
@@ -340,14 +340,14 @@ Details: User-defined option N
%[param] param variable in query string
Shortcuts:
--mirror <URLs *make a mirror of site(s) (default)
--get <URLs get the files indicated, do not seek other URLs (-qg)
--list <text file add all URL located in this text file (-%L)
--mirrorlinks <URLs mirror all links in 1st level pages (-Y)
--testlinks <URLs test links in pages (-r1p0C0I0t)
--spider <URLs spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite <URLs identical to --spider
--skeleton <URLs make a mirror, but gets only html files (-p1)
--mirror &lt;URLs&gt; *make a mirror of site(s) (default)
--get &lt;URLs&gt; get the files indicated, do not seek other URLs (-qg)
--list &lt;text file&gt; add all URL located in this text file (-%L)
--mirrorlinks &lt;URLs&gt; mirror all links in 1st level pages (-Y)
--testlinks &lt;URLs&gt; test links in pages (-r1p0C0I0t)
--spider &lt;URLs&gt; spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite &lt;URLs&gt; identical to --spider
--skeleton &lt;URLs&gt; make a mirror, but gets only html files (-p1)
--update update a mirror, without confirmation (-iC2)
--continue continue a mirror, without confirmation (-iC1)
@@ -356,17 +356,17 @@ Shortcuts:
--http10 force http/1.0 requests (-%h)
example: httrack www.someweb.com/bob/
means: mirror site www.someweb.com/bob/ and only this site
example: httrack www.example.com/bob/
means: mirror site www.example.com/bob/ and only this site
example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
example: httrack www.example.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
means: mirror the two sites together (with shared links) and accept any .jpg files on .com sites
example: httrack www.someweb.com/bob/bobby.html +* -r6
example: httrack www.example.com/bob/bobby.html +* -r6
means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web
example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
runs the spider on www.someweb.com/bob/bobby.html using a proxy
example: httrack www.example.com/bob/bobby.html --spider -P proxy.myhost.com:8080
runs the spider on www.example.com/bob/bobby.html using a proxy
example: httrack --update
updates a mirror in the current folder
@@ -387,13 +387,13 @@ with examples... I will be here a while...
<hr>
<h2> Syntax </h2>
<pre><b><i>httrack <URLs> [-option] [+<FILTERs>] [-<FILTERs>] </i></b></pre>
<pre><b><i>httrack &lt;URLs&gt; [-option] [+&lt;FILTERs&gt;] [-&lt;FILTERs&gt;] </i></b></pre>
<p align=justify> The syntax of httrack is quite simple. You specify
the URLs you wish to start the process from (<URLS>), any options you
the URLs you wish to start the process from (&lt;URLS&gt;), any options you
might want to add ([-option], any filters specifying places you should
([+<FILTERs>]) and should not ([-<FILTERs>]) go, and end the command
line by pressing <enter>. Httrack then goes off and does your bidding.
([+&lt;FILTERs&gt;]) and should not ([-&lt;FILTERs&gt;]) go, and end the command
line by pressing &lt;enter&gt;. Httrack then goes off and does your bidding.
For example:
<pre><b><i>
@@ -425,7 +425,7 @@ site. Specifically, the defauls are:
pN priority mode: (* p3) *3 save all files
D *can only go down into subdirs
a *stay on the same address
--mirror <URLs> *make a mirror of site(s) (default)
--mirror &lt;URLs&gt; *make a mirror of site(s) (default)
</pre>
<p align=justify> Here's what all of that means:
@@ -542,7 +542,7 @@ subdirectories of the starting directory to be investigated.
search started are to be collected. Other sites they point to are not
to be imaged.
<pre><b><i> --mirror <URLs> *make a mirror of site(s) (default) </i></b></pre>
<pre><b><i> --mirror &lt;URLs&gt; *make a mirror of site(s) (default) </i></b></pre>
<p align=justify> This indicates that the program should try to make a
copy of the site as well as it can.
@@ -921,7 +921,7 @@ Links options:
%P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use)
n get non-html files 'near' an html file (ex: an image located outside)
t test all URLs (even forbidden ones)
%L <file> add all URL located in this text file (one URL per line)
%L &lt;file&gt; add all URL located in this text file (one URL per line)
</i></b></pre>
<p align=justify> The links options allow you to control what links are
@@ -1183,7 +1183,7 @@ Spider options:
%h force HTTP/1.0 requests (reduce update features, only for old servers or proxies)
%B tolerant requests (accept bogus responses on some servers, but not standard!)
%s update hacks: various hacks to limit re-transfers when updating
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume <param>)
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume &lt;param&gt;)
</i></b></pre>
<p align=justify> By default, cookies are universally accepted and
@@ -1387,7 +1387,7 @@ web servers leave footprints in the browser.
Browser ID:
F user-agent field (-F "user-agent name")
%F footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]"
%l preffered language (-%l "fr, en, jp, *" (--language <param>)
%l preferred language (-%l "fr, en, jp, *" (--language &lt;param&gt;)
</i></b></pre>
<p align=justify> The user-agent field is used by browsers to determine
@@ -1799,7 +1799,7 @@ based authentication)
<pre><b><i>
Command-line specific options:
V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd &lt;param&gt;)
</i></b></pre>
<p align=justify> This option is very nice for a wide array of actions
@@ -1811,7 +1811,7 @@ httrack http://www.shoesizes.com/bob/ -O /tmp/shoesizes -V "/bin/echo \$0"
</i></b></pre>
<pre>
%U run the engine with another id when called as root (-%U smith) (--user <param>)
%U run the engine with another id when called as root (-%U smith) (--user &lt;param&gt;)
</pre>
<p align=justify> Change the UID of the owner when running as r00t
@@ -1856,14 +1856,14 @@ of other options that are commonly used.
<pre><b><i>
Shortcuts:
--mirror <URLs> *make a mirror of site(s) (default)
--get <URLs> get the files indicated, do not seek other URLs (-qg)
--list <text file> add all URL located in this text file (-%L)
--mirrorlinks <URLs> mirror all links in 1st level pages (-Y)
--testlinks <URLs> test links in pages (-r1p0C0I0t)
--spider <URLs> spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite <URLs> identical to --spider
--skeleton <URLs> make a mirror, but gets only html files (-p1)
--mirror &lt;URLs&gt; *make a mirror of site(s) (default)
--get &lt;URLs&gt; get the files indicated, do not seek other URLs (-qg)
--list &lt;text file&gt; add all URL located in this text file (-%L)
--mirrorlinks &lt;URLs&gt; mirror all links in 1st level pages (-Y)
--testlinks &lt;URLs&gt; test links in pages (-r1p0C0I0t)
--spider &lt;URLs&gt; spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite &lt;URLs&gt; identical to --spider
--skeleton &lt;URLs&gt; make a mirror, but gets only html files (-p1)
--update update a mirror, without confirmation (-iC2)
--continue continue a mirror, without confirmation (-iC1)
--catchurl create a temporary proxy to capture an URL or a form post URL
@@ -2019,15 +2019,15 @@ are in reverse priority order. Here's an example:
<td>no characters must be present after</a></td>
</tr>
<tr>
<td> <b> <filter>*[&lt NN]</b></td>
<td> <b> &lt;filter&gt;*[&lt NN]</b></td>
<td> size less than NN Kbytes</td>
</tr>
<tr>
<td> <b> <filter>*[&gt PP]</b></td>
<td> <b> &lt;filter&gt;*[&gt PP]</b></td>
<td> size more than PP Kbytes</td>
</tr>
<tr>
<td> <b> <filter>*[&lt NN &gt PP]</b></td>
<td> <b> &lt;filter&gt;*[&lt NN &gt PP]</b></td>
<td> size less than NN Kbytes and more than PP Kbytes</td>
</tr>
</table>
@@ -2054,8 +2054,8 @@ generated automatically using the interface)
<td>This will accept all zip files in .com addresses</td>
</tr>
<tr>
<td><b>-*someweb*/*.tar*</b></td>
<td>This will refuse all tar (or tar.gz etc.) files in hosts containing someweb</td>
<td><b>-*example*/*.tar*</b></td>
<td>This will refuse all tar (or tar.gz etc.) files in hosts containing example</td>
</tr>
<tr>
<td><b>+*/*somepage*</b></td>

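Review note: most hunks in this file replace raw `<` and `>` in the help text with `&lt;` and `&gt;`, presumably because a browser would otherwise treat `<URLs>` as an unknown tag and hide it. The sketch below shows the same transformation with Python's standard `html.escape`. It is not the tool the commit used; the `usage` string is copied from the hunk above.

```python
import html

# Usage line as it should read in the rendered page.
usage = "usage: ./httrack <URLs> [-option] [+<FILTERs>] [-<FILTERs>]"

# quote=False leaves quotation marks alone; only &, < and > are escaped.
escaped = html.escape(usage, quote=False)
print(escaped)

# Round trip: unescaping gives back the original text.
assert html.unescape(escaped) == usage
```

Note that `html.escape` also rewrites a bare `&` as `&amp;`, so a line such as `Errors & Warnings` would change as well if it were run through the same call.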

@@ -109,8 +109,8 @@ See also: The <a href="faq.html#VF1">FAQ</a><br>
<i>You have to know that once you have defined
starts links, the default mode is to mirror these links - i.e. if one of your start page is
www.someweb.com/test/index.html, all links starting with www.someweb.com/test/ will be
accepted. But links directly in www.someweb.com/.. will not be accepted, however, because
www.example.com/test/index.html, all links starting with www.example.com/test/ will be
accepted. But links directly in www.example.com/.. will not be accepted, however, because
they are in a higher strcuture. This prevent HTTrack from mirroring the whole site. (All
files in structure levels equal or lower than the primary links will be retrieved.)<br>
</i>
@@ -278,8 +278,8 @@ See also: The <a href="faq.html#VF1">FAQ</a><br>
<td>This will refuse/accept all zip files in .com addresses</td>
</tr>
<tr>
<td nowrap><tt>*someweb*/*.tar*</tt></td>
<td>This will refuse/accept all tar (or tar.gz etc.) files in hosts containing someweb</td>
<td nowrap><tt>*example*/*.tar*</tt></td>
<td>This will refuse/accept all tar (or tar.gz etc.) files in hosts containing example</td>
</tr>
<tr>
<td nowrap><tt>*/*somepage*</tt></td>
@@ -289,13 +289,13 @@ See also: The <a href="faq.html#VF1">FAQ</a><br>
<td nowrap><tt>*.html</tt></td>
<td>This will refuse/accept all html files. <br>
Warning! With this filter you will accept ALL html files, even those in other addresses.
(causing a global (!) web mirror..) Use www.someweb.com/*.html to accept all html files from
(causing a global (!) web mirror..) Use www.example.com/*.html to accept all html files from
a web.</td>
</tr>
<tr>
<td nowrap><tt>*.html*[]</tt></td>
<td>Identical to <tt>*.html</tt>, but the link must not have any supplemental characters
at the end (links with parameters, like <tt>www.someweb.com/index.html?page=10</tt>, will be
at the end (links with parameters, like <tt>www.example.com/index.html?page=10</tt>, will be
refused)</td>
</tr>
</table>


@@ -123,12 +123,12 @@ mirrored site, and resume interrupted downloads.</p>
<p style="margin-left:11%; margin-top: 1em"><b>httrack
www.someweb.com/bob/</b></p>
www.example.com/bob/</b></p>
<p style="margin-left:22%;">mirror site
www.someweb.com/bob/ and only this site</p>
www.example.com/bob/ and only this site</p>
<p style="margin-left:11%;"><b>httrack www.someweb.com/bob/
<p style="margin-left:11%;"><b>httrack www.example.com/bob/
www.anothertest.com/mike/ +*.com/*.jpg <br>
-mime:application/*</b></p>
@@ -137,18 +137,18 @@ www.anothertest.com/mike/ +*.com/*.jpg <br>
sites</p>
<p style="margin-left:11%;"><b>httrack
www.someweb.com/bob/bobby.html +* -r6</b></p>
www.example.com/bob/bobby.html +* -r6</b></p>
<p style="margin-left:22%;">means get all files starting
from bobby.html, with 6 link-depth, and possibility of going
everywhere on the web</p>
<p style="margin-left:11%;"><b>httrack
www.someweb.com/bob/bobby.html --spider -P <br>
www.example.com/bob/bobby.html --spider -P <br>
proxy.myhost.com:8080</b></p>
<p style="margin-left:22%;">runs the spider on
www.someweb.com/bob/bobby.html using a proxy</p>
www.example.com/bob/bobby.html using a proxy</p>
<p style="margin-left:11%;"><b>httrack --update</b></p>
@@ -958,7 +958,7 @@ host %s [file %s [at %s]]]&quot; (--footer
<td width="78%">
<p>preffered language (-%l &quot;fr, en, jp, *&quot;
<p>preferred language (-%l &quot;fr, en, jp, *&quot;
(--language &lt;param&gt;)</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
@@ -1877,7 +1877,7 @@ User-defined option N</b> <br>
%N Name of file, including file type (ex: image.gif) <br>
%t File type (ex: gif) <br>
%p Path [without ending /] (ex: /someimages) <br>
%h Host name (ex: www.someweb.com) <br>
%h Host name (ex: www.example.com) <br>
%M URL MD5 (128 bits, 32 ascii bytes) <br>
%Q query string MD5 (128 bits, 32 ascii bytes) <br>
%k full query string <br>


@@ -131,16 +131,16 @@ This is the default primary scanning option, the engine does not go out of domai
d stay on the same principal domain
This option lets the engine go on all sites that exist on the same principal domain.
Example: a link located at www.someweb.com that goes to members.someweb.com will be followed.
Example: a link located at www.example.com that goes to members.example.com will be followed.
l stay on the same location (.com, etc.)
This option lets the engine go on all sites that exist on the same location.
Example: a link located at www.someweb.com that goes to www.anyotherweb.com will be followed.
Example: a link located at www.example.com that goes to www.anyotherweb.com will be followed.
Warning: this is a potentially dangerous option, limit the recurse depth with r option.
e go everywhere on the web
This option lets the engine go on any sites.
Example: a link located at www.someweb.com that goes to www.anyotherweb.org will be followed.
Example: a link located at www.example.com that goes to www.anyotherweb.org will be followed.
Warning: this is a potentially dangerous option, limit the recurse depth with r option.
n get non-html files 'near' an html file (ex: an image located outside)


@@ -117,7 +117,7 @@ h4 { margin: 0; font-weight: bold; font-size: 1.18em; }
<li>HTML Footer</li>
<br><small>Enter here the optionnal text that will be included as a comment in each HTML file to make archiving easier
<br>The string entered is generally an HTML comment (<tt>&lt;!-- HTML comment --&gt;</tt>) with optionnal %s, which will be transformed into a specific string information:
<br>%s #1 : host name (for example, www.someweb.com)
<br>%s #1 : host name (for example, www.example.com)
<br>%s #2 : file name (for example, /index.html)
<br>%s #3 : date of the mirror
<br><b>Example</b>: <tt>&lt;!-- Page mirrored from %s, file %s. Archive date: %s --&gt;</tt>

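Review note: the footer help above says the successive `%s` markers stand for the host name, the file name and the mirror date, in that order. A rough Python sketch of that positional substitution follows. `render_footer` is a hypothetical helper written for this note, not HTTrack's code, and the date format is an assumption.

```python
def render_footer(template, host, path, date):
    """Fill the 1st, 2nd and 3rd '%s' with host, path and date.

    Hypothetical illustration; any extra '%s' is replaced by ''.
    """
    values = iter((host, path, date))
    parts = template.split("%s")
    out = [parts[0]]
    for part in parts[1:]:
        out.append(next(values, ""))  # next positional value, if any
        out.append(part)
    return "".join(out)


template = "<!-- Page mirrored from %s, file %s. Archive date: %s -->"
# The date string below is an assumed format, chosen for the example.
print(render_footer(template, "www.example.com", "/index.html", "2026-06-14"))
```

Splitting on `%s` rather than calling `str.replace` three times avoids re-scanning a substituted value that happens to contain `%s` itself.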

@@ -21,21 +21,21 @@ Luk
Cancel changes
Annullér ændringer
Click to confirm
Klik OK for at godkende
Klik for at bekræfte
Click to get help!
Klik for at få hjælp!
Click to return to previous screen
Klik for at se den forrige skærm
Klik for at gå til den forrige skærm
Click to go to next screen
Klik for at se den næste skærm
Klik for at gå til den næste skærm
Hide password
Skjul adgangskode
Save project
Gem projekt
Close current project?
Vil du lukke det aktuelle projekt ?
Vil du lukke det aktuelle projekt?
Delete this project?
Slette dette projekt ?
Slette dette projekt?
Delete empty project %s?
Vil du slette det tomme projekt med navnet: %s?
Action not yet implemented
@@ -69,7 +69,7 @@ Udeluk link(s)
Include link(s)
Medtag link(s)
Tip: To have ALL GIF files included, use something like +www.someweb.com/*.gif. \n(+*.gif / -*.gif will include/exclude ALL GIFs from ALL sites)
Tip: For at medtage ALLE GIF-filer, så prøv at bruge: +www.eksempel.dk/*.gif. \n(+*.gif / -*.gif inkluderer/ekskluderer ALLE GIF-filer fra alle websteder)
Tip: for at medtage ALLE GIF-filer, så prøv at bruge: +www.eksempel.dk/*.gif. \n(+*.gif / -*.gif inkluderer/ekskluderer ALLE GIF-filer fra ALLE steder)
Save prefs
Gem foretrukne indstillinger
Matching links will be excluded:
@@ -97,7 +97,7 @@ www.eksempel.dk\r\nFinder links der matcher hele understrengen 'www.eksempel.dk'
someweb\r\nWill find any links with matching sub-string such as www.someweb.com/.., www.test.abc/fromsomeweb/index.html, www.test.abc/test/someweb.html etc.
eksempel\r\nFinder ethvert link med matchende understreng, såsom www.eksempel.dk/.., www.test.abc/franogetweb/index.html, www.test.abc/test/eksempel.html osv.
www.test.com/test/someweb.html\r\nWill only find the 'www.test.com/test/someweb.html' file. Note that you have to type the complete path (URL + site path)
www.test.dk/test/eksempel.html\r\nFinder kun 'www.test.dk/test/eksempel.html' file. Bemærk at du skal skrive den fulde sti [URL + webstedsti]
www.test.dk/test/eksempel.html\r\nFinder kun 'www.test.dk/test/eksempel.html' file. Bemærk at du skal skrive den fulde sti [URL + stedsti]
All links will match
Alle links vil matche
Add exclusion filter
@@ -109,13 +109,13 @@ Eksisterende filtre
Cancel changes
Annullér ændringer
Save current preferences as default values
Gem nuværende indstillinger som standardindstillinger
Gem aktuelle præferencer som standardværdier
Click to confirm
Klik for at bekræfte
No log files in %s!
Der findes ingen logfil i %s!
No 'index.html' file in %s!
Der er ingen 'index.html'-fil i %s!
Der er ikke nogen 'index.html'-fil i %s!
Click to quit WinHTTrack Website Copier
Klik for at afslutte WinHTTrack Website Copier
View log files
@@ -123,11 +123,11 @@ Vis logfiler
Browse HTML start page
Se HTML-startside
End of mirror
Kopieringen af websted er afsluttet
Slut på spejlkopiering
View log files
Vis logfiler
Browse Mirrored Website
Gennemse kopi-websted
Gennemse spejlkopieret websted
New project...
Nyt projekt...
View error and warning reports
@@ -179,57 +179,59 @@ Indl
Parsing HTML file (testing links)..
Overfører HTML-fil (tester links)...
Pause - Toggle [Mirror]/[Pause download] to resume operation
Pause - Vælg fra menuen [Kopiér]/[Pause download] for at genoptage overførslen
Pause - Vælg [Spejlkopiér]/[Sæt download på pause] for at genoptage overførslen
Finishing pending transfers - Select [Cancel] to stop now!
Afslutter igangværende overførsler - Vælg Annullér for at afslutte nu!
Afslutter igangværende overførsler - Vælg [Annullér] for at afslutte nu!
scanning
skanner
Waiting for scheduled time..
Venter på planlagt tidspunkt...
Transferring data..
Overfører data...
Connecting to provider
Opretter forbindelse til udbyder
[%d seconds] to go before start of operation
[%d sekunder] inden denne handling starter
Site mirroring in progress [%s, %s bytes]
Websted kopieres nu [%s, %s byte]
Igangværende spejlkopiering af sted [%s, %s byte]
Site mirroring finished!
Kopieringen af websted er afsluttet!
Spejlkopieringen af sted er afsluttet!
A problem occurred during the mirroring operation\n
Der opstod et problem under kopieringen af websted\n
Der opstod et problem under spejlkopieringen\n
\nDuring:\n
\nSamtidigt:\n
\nSee the log file if necessary.\n\nClick FINISH to quit WinHTTrack Website Copier.\n\nThanks for using WinHTTrack!
Se eventuelt logfilen.\n\nKlik AFSLUT for at lukke WinHTTrack Website Copier.\n\nTak for at du brugte WinHTTrack!
Se eventuelt logfilen.\n\nKlik på UDFØR for at afslutte WinHTTrack Website Copier.\n\nTak for at du brugte WinHTTrack!
Mirroring operation complete.\nClick Exit to quit WinHTTrack.\nSee log file(s) if necessary to ensure that everything is OK.\n\nThanks for using WinHTTrack!
Kopiering af websted fuldført.\nKlik OK for at afslutte WinHTTrack.\nSe logfil(erne) for at kontrollere at alt forløb OK.\n\nTak for at du brugte WinHTTrack!\r\n
Spejlkopieringen fuldført.\nKlik på Afslut for at afslutte WinHTTrack.\nSe logfil(erne) for at sikre at alt forløb OK.\n\nTak for at du brugte WinHTTrack!\r\n
* * MIRROR ABORTED! * *\r\nThe current temporary cache is required for any update operation and only contains data downloaded during the present aborted session.\r\nThe former cache might contain more complete information; if you do not want to lose that information, you have to restore it and delete the current cache.\r\n[Note: This can easily be done here by erasing the hts-cache/new.* files]\r\n\r\nDo you think the former cache might contain more complete information, and do you want to restore it?
* * KOPIERINGEN ER AFBRUDT! * *\r\nDen nuværende cache er påkrævet for alle opdaterings operationer og indeholder kun data der er downloadet med den aktuelle afbrudte session.\r\nDen tidligere cache kan indeholde mere fyldestgørende information; hvis du ønsker at bevare den information, skal du gendanne den og slette den aktuelle cache.\r\n[Note: Dette kan nemt gøres ved at slette 'hts-cache/new.* files]\r\n\r\nTror du den tidligere cache-fil muligvis indeholder mere fyldestgørende information, og vil du gendanne denne?
* * SPEJLKOPIERING AFBRUDT! * *\r\nDen aktuelle cache er påkrævet for alle opdaterings operationer og indeholder kun data der er downloadet med den aktuelle afbrudte session.\r\nDen tidligere cache kan indeholde mere fyldestgørende information; hvis du ønsker at bevare den information, skal du gendanne den og slette den aktuelle cache.\r\n[Bemærk: dette kan nemt gøres ved at slette 'hts-cache/new.* files]\r\n\r\nTror du den tidligere cache-fil muligvis indeholder mere fyldestgørende information, og vil du gendanne denne?
* * MIRROR ERROR! * *\r\nHTTrack has detected that the current mirror is empty. If it was an update, the previous mirror has been restored.\r\nReason: the first page(s) either could not be found, or a connection problem occurred.\r\n=> Ensure that the website still exists, and/or check your proxy settings! <=
* * KOPIERINGS FEJL! * *\r\nWinHTTrack har opdaget at den igangværende kopiering er tom. Hvis du var i gang med at opdatere en kopi, vil det tidligere indhold blive gendannet.\r\nMulig årsag: Den første side kunne enten ikke findes eller der opstod et problem med forbindelsen.\r\n=> Kontroller at webstedet findes og/eller kontroller Proxy-indstillingerne! <=
* * SPEJLKOPIERINGS FEJL! * *\r\nWinHTTrack har opdaget at den igangværende spejlkopiering er tom. Hvis du var i gang med at opdatere, vil den tidligere spejlkopiering blive gendannet.\r\nMulig årsag: den første side kunne enten ikke findes eller der opstod et problem med forbindelsen.\r\n=> Kontroller at webstedet findes og/eller kontroller proxy-indstillingerne! <=
\n\nTip: Click [View log file] to see warning or error messages
\n\nTip: Klik [Vis logfiler] for at se advarsels- og fejlmeddelelser
\n\nTip: klik [Vis logfil] for at se advarsels- og fejlmeddelelser
Error deleting a hts-cache/new.* file, please do it manually
Der opstod en fejl i forbindelse med sletningen af hts-cache/new.*filen. Slet venligst filen manuelt.
Do you really want to quit WinHTTrack Website Copier?
Vil du afslutte WinHTTrack Website Copier?
Er du sikker på, at du vil afslutte WinHTTrack Website Copier?
- Mirroring Mode -\n\nEnter address(es) in URL box
- Kopiering af websted -\n\nIndtast webadresse(r) i URL-feltet
- Spejlkopieringstilstand -\n\nIndtast adresse(r) i URL-feltet
- Interactive Wizard Mode (questions) -\n\nEnter address(es) in URL box
- Interaktiv guide-tilstand (spørgsmål) -\n\nIndtast webadresse(r) i URL-feltet
- Interaktiv guide-tilstand (spørgsmål) -\n\nIndtast adresse(r) i URL-feltet
- File Download Mode -\n\nEnter file address(es) in URL box
- Fil-download-tilstand-\n\nIndtast webadresse(r) i URL-feltet
- Fil-download-tilstand-\n\nIndtast adresse(r) i URL-feltet
- Link Testing Mode -\n\nEnter Web address(es) with links to test in URL box
- Links test tilstand-\n\nIndtast webadresse(r) i URL-feltet
- Links test tilstand-\n\nIndtast webadresse(r) med links til test i URL-feltet
- Update Mode -\n\nVerify address(es) in URL box, check parameters if necessary then click on 'NEXT' button
- Opdateringstilstand -\n\nBekræft webadresse(r) i URL-feltet. Kontroller eventuelt dine indstillinger og klik derefter på 'Næste'.
- Opdateringstilstand -\n\nBekræft adresse(r) i URL-feltet. Tjek eventuelt dine indstillinger og klik derefter på 'Næste'.
- Resume Mode (Interrupted Operation) -\n\nVerify address(es) in URL box, check parameters if necessary then click on 'NEXT' button
- Genoptag kopiering (hvis overførslen blev afbrudt) -\n\nBekræft webadresse(r) i URL-feltet. Kontroller eventuelt dine indstillinger og klik derefter på 'Næste'.
- Genoptag kopiering (hvis overførslen blev afbrudt) -\n\nBekræft adresse(r) i URL-feltet. Tjek eventuelt dine indstillinger og klik derefter på 'Næste'.
Log files Path
Stinavn for logfil
Path
Sti
- Links List Mode -\n\nUse URL box to enter address(es) of page(s) containing links to mirror
- Links liste -\n\nBrug URL-feltet til at angive adresse(r) på sider der indeholder links der skal kopieres.
- Links liste -\n\nBrug URL-feltet til at angive adresse(r) på sider der indeholder links som skal spejlkopieres.
New project / Import?
Nyt projekt / Importér?
Choose criterion
@@ -237,7 +239,7 @@ V
Maximum link scanning depth
Maksimal skanningsdybde for links
Enter address(es) here
Indtast webadresse(r) her
Indtast adresse(r) her
Define additional filtering rules
Tilføj yderligere filtreringsregler
Proxy Name (if needed)
@@ -261,31 +263,31 @@ Afslut WinHTTrack Website Copier
About WinHTTrack
Om WinHTTrack
Save current preferences as default values
Gem de nuværende indstillinger som standardindstillinger
Gem de aktuelle præferencer som standardværdier
Click to continue
Klik for at fortsætte
Click to define options
Klik for at definere indstillinger
Klik for at definere valgmuligheder
Click to add a URL
Klik for at tilføje URL
Klik for at tilføje en URL
Load URL(s) from text file
Hent URL(er) fra tekstfil
Indlæs URL(er) fra tekstfil
WinHTTrack preferences (*.opt)|*.opt||
WinHTTrack indstillinger (*.opt)|*.opt||
WinHTTrack-præferencer (*.opt)|*.opt||
Address List text file (*.txt)|*.txt||
Adresseliste-tekstfil (*.txt)|*.txt||
File not found!
Filen blev ikke fundet!
Do you really want to change the project name/path?
Er du sikker på at ændre i projekt/sti-navnet ?
Er du sikker på, at ændre i projekt/sti-navnet?
Load user-default options?
Indlæs brugerdefinerede standardindstillinger?
Indlæs brugerdefinerede valgmuligheder?
Save user-default options?
Gem brugerdefinerede standardindstillinger?
Gem brugerdefinerede valgmuligheder?
Reset all default options?
Nulstil alle standardindstillinger?
Nulstil alle valgmuligheder?
Welcome to WinHTTrack!
Velkommen til WinHTTrack Website Copier!
Velkommen til WinHTTrack!
Action:
Handling:
Max Depth
@@ -293,7 +295,7 @@ Maksimal dybde:
Maximum external depth:
Maksimal ekstern dybde:
Filters (refuse/accept links) :
Filtrerings-regel (udeluk/medtag links) :
Filtrerings-regel (udeluk/medtag links):
Paths
Sti
Save prefs
@@ -301,23 +303,23 @@ Gem indstillinger
Define..
Angiv...
Set options..
Angiv indstillinger...
Angiv valgmuligheder...
Preferences and mirror options:
Indstillinger og muligheder:
Præferencer og spejlkopiering-valgmuligheder:
Project name
Projektnavn
Add a URL...
Tilføj URL...
Web Addresses: (URL)
Webadresse: (URL)
Webadresser: (URL)
Stop WinHTTrack?
Stop WinHTTrack?
No log files in %s!
Der er ingen logfiler i %s!
Der er ikke nogen logfiler i %s!
Pause Download?
Pause kopieringen?
Sæt download på pause?
Stop the mirroring operation
Stop kopiering af websted?
Stop spejlkopieringen?
Minimize to System Tray
Minimér til proceslinjen
Click to skip a link or stop parsing
@@ -345,7 +347,7 @@ Informationer
Files written:
Filer skrevet:
Files updated:
Opdaterede filer:
Filer opdateret:
Errors:
Fejl:
In progress:
@@ -357,9 +359,9 @@ Test alle links p
Try to ferret out all links
Prøv at udvide alle links
Download HTML files first (faster)
Hent HTML-filer først (hurtigere)
Download HTML-filer først (hurtigere)
Choose local site structure
Vælg lokal websted-struktur
Vælg lokal sted-struktur
Set user-defined structure on disk
Sæt brugerdefinerede indstillinger for den lokale struktur
Use a cache for updates and retries
@@ -367,9 +369,11 @@ Brug cache til opdateringer og opdateringsfors
Do not update zero size or user-erased files
Opdater ikke filer med nul-værdi eller filer som brugeren har slettet
Create a Start Page
Opret startside
Opret en startside
Create a word database of all html pages
Opret ord-database fra alle html-sider
Opret en ord-database af alle html-sider
Build a complete RFC822 mail (MHT/EML) archive of the mirror
Byg et komplet RFC822 mail (MHT/EML)-arkiv af spejlkopieringen
Create error logging and report files
Lav fejllog og rapport-filer
Generate DOS 8-3 filenames ONLY
@@ -385,7 +389,7 @@ V
Select global parsing direction
Vælg overordnet overførselsretning
Setup URL rewriting rules for internal links (downloaded ones) and external links (not downloaded ones)
Opret URL-genskrivningsregel for interne links (downloadede links), og eksterne links (ikke downloadede)
Opt URL-genskrivningsregel for interne links (downloadede links), og eksterne links (ikke downloadede)
Max simultaneous connections
Maks.antal samtidige forbindelser
File timeout
@@ -403,11 +407,11 @@ Maksimal st
Maximum size for any single non-HTML file
Maksimal størrelse for ikke-HTML-filer
Maximum amount of bytes to retrieve from the Web
Maksimal antal byte der må hentes på Web
Maksimal antal byte der modtages fra webbet
Make a pause after downloading this amount of bytes
Hold pause efter download af denne mængde byte
Maximum duration time for the mirroring operation
Maksimal varighed for kopieringen af websted
Maksimal varighed for spejlkopieringen
Maximum transfer rate
Maksimal overførselshastighed
Maximum connections/seconds (avoid server overload)
@@ -418,34 +422,40 @@ Browser identity
Browser-identitet
Comment to be placed in each HTML file
Kommentarer der indsættes i alle HTML-filer
Languages accepted by the browser
Sprog som accepteres af browseren
Additional HTTP headers to be sent in each requests
Yderligere HTTP-headere som skal sendes i hver forespørgsel
HTTP referer to be sent for initial URLs
HTTP reference som skal sendes for indledende URL'er
Back to starting page
Tilbage til startsiden
Save current preferences as default values
Gem nuværende indstillinger som standardindstillinger
Gem aktuelle præferencer som standardværdier
Click to continue
Klik for at fortsætte
Click to cancel changes
Klik for at annullere ændringerne
Follow local robots rules on sites
Følg lokale robot-regler på websteder
Følg lokale robot-regler på steder
Links to non-localised external pages will produce error pages
Links til ikke-fundne eksterne sider, vil medføre fejlside(r)
Do not erase obsolete files after update
Slet ikke overflødige filer efter opdatering
Slet ikke forældede filer efter opdatering
Accept cookies?
Acceptér cookies?
Check document type when unknown?
Kontroller dokumenttypen hvis ukendt?
Tjek dokumenttypen hvis ukendt?
Parse java applets to retrieve included files that must be downloaded?
Overfør Java-applets sammen med inkluderede filer der skal downloades?
Store all files in cache instead of HTML only
Gem alle filer i cache fremfor kun HTML ?
Opbevar alle filer i cache fremfor kun HTML?
Log file type (if generated)
Log filtype (hvis genereret)
Maximum mirroring depth from root address
Maksimal kopieringsdybde fra rod-adressen
Maksimal spejlkopieringsdybde fra rod-adressen
Maximum mirroring depth for external/forbidden addresses (0, that is, none, is the default)
Maksimal kopieringsdybde for eksterne/forbudte adresser(0, altså ingen, er standard)
Maksimal spejlkopieringsdybde for eksterne/forbudte adresser(0, altså ingen, er standard)
Create a debugging file
Opret en fejlfindings-fil
Use non-standard requests to get round some server bugs
@@ -465,7 +475,7 @@ Hent ikke-HTML-filer relateret til et link, eksempelvis .ZIP -filer eller billed
Test all links (even forbidden ones)
Test alle links (også forbudte links)
Try to catch all URLs (even in unknown tags/code)
Forsøg at fange alle URL'er (også selvom html-tags eller kode er ukendt)
Forsøg at fange alle URL'er (også i ukendte opmærkninger/kode)
Get HTML files first!
Hent HTML-filer først!
Structure type (how links are saved)
@@ -473,11 +483,13 @@ Angiv struktur (hvordan links skal gemmes)
Use a cache for updates
Brug cache for opdateringer
Do not re-download locally erased files
Hent ikke filer der er slettet lokalt
Download ikke filer igen der er slettet lokalt
Make an index
Opret et indeks
Make a word database
Opret en ord-database
Build a mail archive
Byg et mail-arkiv
Log files
Logfiler
DOS names (8+3)
@@ -493,7 +505,7 @@ S
Global travel mode
Global søgemetode
These options should be modified only exceptionally
Disse indstillinger bør kun ændres undtagelsesvist!
Disse valgmuligheder bør kun ændres undtagelsesvist
Activate Debugging Mode (winhttrack.log)
Aktivér fejlfindingstilstand (winhttrack.log)
Rewrite links: internal / external
@@ -506,6 +518,12 @@ Identity
Identitet
HTML footer
HTML-sidefod
Languages
Languages
Additional HTTP Headers
Yderligere HTTP Headere
Default referer URL
Standard reference URL
N# connections
Antal forbindelser
Abandon host if error
@@ -533,7 +551,7 @@ Maksimal st
Max size of any non-HTML file
Maksimal størrelse for ikke-HTML-filer
Max site size
Maksimal størrelse af websted
Maksimal størrelse af sted
Max time
Maksimal tid
Save prefs
@@ -549,11 +567,11 @@ Slet ikke gamle filer
Accept cookies
Acceptér cookies
Check document type
Kontroller dokumenttypen
Tjek dokumenttypen
Parse java files
Overfør Java-filer
Store ALL files in cache
Gem alle filer i cache
Opbevar alle filer i cache
Tolerant requests (for servers)
Acceptér forespørgsler (for servere)
Update hack (limit re-transfers)
@@ -595,21 +613,21 @@ Proxy
MIME Types
MIME-typer
Do you really want to quit WinHTTrack Website Copier?
Vil du afslutte WinHTTrack Website Copier?
Er du sikker på, at du vil afslutte WinHTTrack Website Copier?
Do not connect to a provider (already connected)
Opret ikke forbindelse til en udbyder (er allerede forbundet)
Do not use remote access connection
Brug ikke en fjernadgangsforbindelse
Schedule the mirroring operation
Planlæg kopieringen
Planlæg spejlkopieringen
Quit WinHTTrack Website Copier
Afslut WinHTTrack Website Copier
Back to starting page
Tilbage til startsiden
Click to start!
Klik for at starte
Klik for at starte!
No saved password for this connection!
Der er ikke gemt en adgangskode for denne forbindelse
Der er ikke gemt en adgangskode for denne forbindelse!
Can not get remote connection settings
Kan ikke hente fjernforbindelsesindstillinger
Select a connection provider
@@ -617,13 +635,13 @@ V
Start
Start
Please adjust connection parameters if necessary,\nthen press FINISH to launch the mirroring operation.
Justér venligst forbindelsesparameterne hvis det er nødvendigt.\nKlik på Udfør for at starte kopieringen.
Justér venligst forbindelsesparameterne hvis det er nødvendigt.\nKlik på UDFØR for at starte spejlkopieringen.
Save settings only, do not launch download now.
Gem indstillingerne, men start ikke download endnu.
On hold
På hold
Transfer scheduled for: (hh/mm/ss)
Overførslen planlagt til: (tt/mm/ss)
Overførsel planlagt til: (tt/mm/ss)
Start
Start
Connect to provider (RAS)
@@ -657,9 +675,9 @@ Ignorer dom
Catch this page only
Gem kun denne side
Mirror site
Kopiér websted
Spejlkopiér sted
Mirror domain
Kopiér domæne
Spejlkopiér domæne
Ignore all
Ignorer alt
Wizard query
@@ -669,7 +687,7 @@ Nej
File
Fil
Options
Indstillinger
Valgmuligheder
Log
Log
Window
@@ -681,7 +699,7 @@ Pause overf
Exit
Afslut
Modify options
Rediger indstillinger
Rediger valgmuligheder
View log
Vis log
View error log
@@ -703,9 +721,9 @@ S&plit
File
Filer
Preferences
Indstillinger
Præferencer
Mirror
Kopiér websted
Spejlkopiér
Log
Log
Window
@@ -715,15 +733,15 @@ Hj
Exit
Afslut
Load default options
Indlæs standardindstillinger
Indlæs standard-valgmuligheder
Save default options
Gem standardindstillinger
Gem standard-valgmuligheder
Reset to default options
Nulstil standardindstillinger
Nulstil standard-valgmuligheder
Load options...
Indlæs indstillinger...
Indlæs valgmuligheder...
Save options as...
Gem indstillinger som...
Gem valgmuligheder som...
Language preference...
Foretrukne sprog...
Contents...
@@ -741,13 +759,13 @@ Gem &som...
&Delete...
&Slet...
&Browse sites...
&Gennemse websteder...
&Gennemse steder...
User-defined structure
Brugerdefineret struktur
%n\tName of file without file type (ex: image)\r\n%N\tName of file including file type (ex: image.gif)\r\n%t\tFile type only (ex: gif)\r\n%p\tPath [without ending /] (ex: /someimages)\r\n%h\tHost name (ex: www.someweb.com)\r\n%M\tMD5 URL (128 bits, 32 ascii bytes)\r\n%Q\tMD5 query string (128 bits, 32 ascii bytes)\r\n%q\tMD5 small query string (16 bits, 4 ascii bytes)\r\n\r\n%s?\tShort name (ex: %sN)
%n\tFilnavn uden type(eks: image)\r\n%N\tHele filnavnet inklusive filtype (eks: billede.gif)\r\n%t\tKun filtype (eks: gif)\r\n%p\tSti [uden endelsen /] (eks: /noglebilleder)\r\n%h\tVærts navn (eks: www.eksempel.dk)\r\n%M\tMD5 URL (128 bit, 32 ascii byte)\r\n%Q\tMD5 forespørgsel streng (128 bit, 32 ascii byte)\r\n%q\tMD5 kort forespørgselsstreng (16 bit, 4 ascii byte)\r\n\r\n%s?\tKort navn (eks: %sN)
Example:\t%h%p/%n%q.%t\n->\t\tc:\\mirror\\www.someweb.com\\someimages\\image.gif
Eksempel:\t%h%p/%n%q.%t\n->\t\tc:\\mirror\\www.eksempel.dk\\noglebilleder\\billede.gif
Eksempel:\t%h%p/%n%q.%t\n->\t\tc:\\spejlkopiering\\www.eksempel.dk\\noglebilleder\\billede.gif
Proxy settings
Proxy-indstillinger
Proxy address:
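
The user-defined structure tokens in this hunk (%n, %N, %t, %p, %h) can be illustrated with a small sketch. This is not HTTrack's implementation: `expand_structure` is a hypothetical helper, it covers only the five non-MD5 tokens, and it ignores the %M/%Q/%q hashes and the %s? short-name modifier.

```python
import posixpath
import re
from urllib.parse import urlsplit


def expand_structure(template, url):
    """Expand a subset of the user-defined structure tokens for one URL.

    Hypothetical helper for illustration only; MD5 tokens are not handled.
    """
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    stem, ext = posixpath.splitext(filename)
    values = {
        "%N": filename,               # name of file including file type
        "%n": stem,                   # name of file without file type
        "%t": ext.lstrip("."),        # file type only
        "%p": directory.rstrip("/"),  # path without ending /
        "%h": parts.hostname or "",   # host name
    }
    # Single pass so that substituted values are never re-expanded.
    return re.sub(r"%[Nntph]", lambda m: values[m.group(0)], template)


print(expand_structure("%h%p/%n.%t",
                       "http://www.someweb.com/someimages/image.gif"))
```

With the sample URL from the strings above, the template `%h%p/%n.%t` yields the host, the directory, and the file name joined into one relative path.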
@@ -777,7 +795,7 @@ V
Click here to select path
Klik her for at vælge en stil
Select or create a new category name, to sort your mirrors in categories
Vælg eller opret et nyt kategorinavn, for at sortere dine kopierede websteder i kategorier
Vælg eller opret et nyt kategorinavn, for at sortere dine spejlkopieringer i kategorier
HTTrack Project Wizard...
HTTrack-projektguide...
New project name:
@@ -813,9 +831,9 @@ Fang URL...
Enter URL address(es) here
Indtast URL-adresse(r) her
Enter site login
Indtast websted-brugernavn
Indtast sted-brugernavn
Enter site password
Indtast websted-adgangskode
Indtast sted-adgangskode
Use this capture tool for links that can only be accessed through forms or javascript code
Brug dette værktøj til at 'fange' links der kun kan opnås adgang til via formularer eller JavaScript-kode
Choose language according to preference
@@ -823,7 +841,7 @@ V
Catch URL!
'Fang' URL!
Please set temporary browser proxy settings to the following values (Copy/Paste Proxy Address and Port).\nThen click on the Form SUBMIT button in your browser page, or click on the specific link you want to capture.
Sæt venligst browserens proxy indstillinger til følgende værdier:(Kopiér/Indsæt proxy-adresse og port).\nKlik på Form SUBMIT knappen på din browser-side, eller klik på specifikke link du ønsker at hente.\r\n\r\n
Sæt venligst browserens proxy indstillinger til følgende værdier:(Kopiér/Indsæt proxy-adresse og port).\nKlik på formularens SUBMIT-knap på din browser-side, eller klik på det specifikke link du ønsker at hente.\r\n\r\n
This will send the desired link from your browser to WinHTTrack.
Dette vil sende det ønskede link fra din browser til WinHTTrack.
ABORT
@@ -843,11 +861,11 @@ Tr
Please drag folders only
Træk kun mapper
Select user-defined structure?
Vælg brugerdefineret struktur ?
Vælg brugerdefineret struktur?
Please ensure that the user-defined-string is correct,\notherwise filenames will be bogus!
Vær sikker på at den brugerdefinerede streng er korrekt\nI modsat fald vil filnavnene være ugyldige!
Vær sikker på, at den brugerdefinerede streng er korrekt\nI modsat fald vil filnavnene være ugyldige!
Do you really want to use a user-defined structure?
Er du sikker på at ville bruge en brugerdefineret struktur ?
Er du sikker på, at ville bruge en brugerdefineret struktur?
Too manu URLs, cannot handle so many links!!
For mange URL' er, WinHTTrack kan ikke håndtere så mange links!!!
Not enough memory, fatal internal error..
@@ -857,7 +875,7 @@ Ukendt handling!
Add this URL?\r\n
Tilføj denne URL?\r\n
Warning: main process is still not responding, cannot add URL(s)..
Advarsel: Processen svarer stadigvæk ikke ,URL'en kan ikke tilføjes...
Advarsel: hovedprocessen svarer stadigvæk ikke, URL'en kan ikke tilføjes...
Type/MIME associations
Type/MIME-tilknytning
File types:
@@ -879,19 +897,19 @@ Frys vindue
More information:
Mere information
Welcome to WinHTTrack Website Copier!\n\nPlease click on the NEXT button to\n\n- start a new project\n- or resume a partial download
Velkommen til WinHTTrack Website Copier!\n\nKlik på Næste for at for at\n\n-starte et nyt projekt\n-eller genoptage et delvist download.
Velkommen til WinHTTrack Website Copier!\n\nKlik på Næste for at for at\n\n- starte et nyt projekt\n- eller genoptage et delvist download.
File names with extension:\nFile names containing:\nThis file name:\nFolder names containing:\nThis folder name:\nLinks on this domain:\nLinks on domains containing:\nLinks from this host:\nLinks containing:\nThis link:\nALL LINKS
Filnavne med 'efternavn':\nFilnavne der indeholder:\nDette filnavn:\nMappenavne der indeholder:\nDette mappenavn:\nLinks på dette domæne:\nLinks på dette domæne der indeholder:\nLinks fra denne vært:\nLinks der indeholder:\nDette Link:\nAlle Links*/
Show all\nHide debug\nHide infos\nHide debug and infos
Vis alle\nSkjul fejlfinding\nSkjul information\nSkjul fejlfinding og information
Site-structure (default)\nHtml in web/, images/other files in web/images/\nHtml in web/html, images/other in web/images\nHtml in web/, images/other in web/\nHtml in web/, images/other in web/xxx, where xxx is the file extension\nHtml in web/html, images/other in web/xxx\nSite-structure, without www.domain.xxx/\nHtml in site_name/, images/other files in site_name/images/\nHtml in site_name/html, images/other in site_name/images\nHtml in site_name/, images/other in site_name/\nHtml in site_name/, images/other in site_name/xxx\nHtml in site_name/html, images/other in site_name/xxx\nAll files in web/, with random names (gadget !)\nAll files in site_name/, with random names (gadget !)\nUser-defined structure..
Websted-struktur (standard)\nHtml i web/, images/other-filer i web/images/\nHtml i web/html, images/other i web/images\nHtml i web/, images/other i web/\nHtml i web/, images/other i web/xxx, hvor xxx er filendelsen\nHtml i web/html, images/other i web/xxx\nWebsted-struktur, uden www.domæne.xxx/\nHtml i webstednavn/, images/other-filer i webstednavn/images/\nHtml i webstednavn/html, images/other i webstednavn/images\nHtml i webstednavn/, images/other i webstednavn/\nHtml i webstednavn/, images/other i webstednavn/xxx\nHtml i webstednavn/html, images/other i webstednavn/xxx\nAlle filer in web/, med tilfældige navne (gadget !)\nAlle filer i webstednavn/, med tilfældige navne (gadget !)\nBrugerdefineret struktur...
Sted-struktur (standard)\nHtml i web/, images/other-filer i web/images/\nHtml i web/html, images/other i web/images\nHtml i web/, images/other i web/\nHtml i web/, images/other i web/xxx, hvor xxx er filendelsen\nHtml i web/html, images/other i web/xxx\nWebsted-struktur, uden www.domæne.xxx/\nHtml i webstednavn/, images/other-filer i webstednavn/images/\nHtml i webstednavn/html, images/other i webstednavn/images\nHtml i webstednavn/, images/other i webstednavn/\nHtml i webstednavn/, images/other i webstednavn/xxx\nHtml i webstednavn/html, images/other i webstednavn/xxx\nAlle filer in web/, med tilfældige navne (gadget !)\nAlle filer i webstednavn/, med tilfældige navne (gadget !)\nBrugerdefineret struktur...
Just scan\nStore html files\nStore non html files\nStore all files (default)\nStore html files first
ust skan\nGem html-filer\nGem ikke-html-filer\nGem alle filer (standard)\nGem html-filer først
ust skan\nOpbevar html-filer\nGem ikke-html-filer\nGem alle filer (standard)\nGem html-filer først
Stay in the same directory\nCan go down (default)\nCan go up\nCan both go up & down
Bliv i det samme bibliotek\nKan gå ned (standard]\nKan gå op\nKan gå både op og ned
Stay on the same address (default)\nStay on the same domain\nStay on the same top level domain\nGo everywhere on the web
Bliv på den samme adresse[standard]\nBliv på samme domæne\nBliv på samme top level domæne\n Gå overalt på internettet.
Bliv på den samme adresse (standard)\nBliv på det samme domæne\nBliv på det samme top-level-domæne\nGå overalt på webbet.
Never\nIf unknown (except /)\nIf unknown
Aldrig\nUkendt (undtaget /]\nhvis ukendt
no robots.txt rules\nrobots.txt except wizard\nfollow robots.txt rules
@@ -899,7 +917,7 @@ Ingen robots.txt-regler\nrobots.txt med undtagelse af guiden\nf
normal\nextended\ndebug
Normal\nUdvidet\nFejlfinding
Download web site(s)\nDownload web site(s) + questions\nGet individual files\nDownload all sites in pages (multiple mirror)\nTest links in pages (bookmark test)\n* Continue interrupted download\n* Update existing download
Download websted(er)\nDownload websted(er) + spørgsmål\nHent enkelte filer\nDownload alle websteder på sider (flere kopieret websteder)\nTest links på siderne (bogmærke test)\n* Fortsæt afbrudt projekt\n* Opdater tidligere projekt
Download websted(er)\nDownload websted(er) + spørgsmål\nHent enkelte filer\nDownload alle steder på sider (flere spejlkopiering)\nTest links på siderne (bogmærke test)\n* Fortsæt afbrudt projekt\n* Opdater tidligere projekt
Relative URI / Absolute URL (default)\nAbsolute URL / Absolute URL\nAbsolute URI / Absolute URL\nOriginal URL / Original URL
Relativ URL / absolut URL (standard)\nAbsolut URL / absolut URL\nAbsolut URL / absolut URL\nOriginal URL / original URL
Open Source offline browser
@@ -927,4 +945,34 @@ Du kan nu lukke vinduet
Server terminated
Server lukket
A fatal error has occurred during this mirror
Det opstod en fatal fejl under kopieringen
Det opstod en fatal fejl under denne spejlkopiering
View Documentation
Vis dokumentation
Go To HTTrack Website
Gå til HTTrack website
Go To HTTrack Forum
Gå til HTTrack forum
View License
Vis licens
Beware: you local browser might be unable to browse files with embedded filenames
OBS: din lokale browser er måske ikke i stand til at browse filer med indlejrede filnavne
Recreated HTTrack internal cached resources
Genskabte HTTrack internt mellemlagret ressourcer
Could not create internal cached resources
Kunne ikke oprette internt mellemlagret ressourcer
Could not get the system external storage directory
Kunne ikke hente systemets eksterne lagringsmappe
Could not write to:
Kunne ikke skrive til:
Read-only media (SDCARD)
Skrivebeskyttet medie (SDCARD)
No storage media (SDCARD)
Intet lagringsmedie (SDCARD)
HTTrack may not be able to download websites until this problem is fixed
HTTrack er måske ikke i stand til at downloade websteder før dette problem er rettet
HTTrack: mirror '%s' stopped!
HTTrack: spejlkopiering '%s' stoppet!
Click on this notification to restart the interrupted mirror
Klik på denne notifikation for at genstarte den afbrudte spejlkopiering
HTTrack: could not save profile for '%s'!
HTTrack: kunne ikke gemme profil for '%s'!


@@ -23,7 +23,7 @@ Cancel changes
Click to confirm
Подтвердить
Click to get help!
Получить справку
Справка
Click to return to previous screen
Вернуться назад
Click to go to next screen
@@ -39,9 +39,9 @@ Delete this project?
Delete empty project %s?
Удалить пустой проект %s?
Action not yet implemented
Пока не реализовано
Действие не реализовано
Error deleting this project
Ошибка удаления проекта
Ошибка удаления этого проекта
Select a rule for the filter
Выбрать тип фильтра
Enter keywords for the filter
@@ -51,11 +51,11 @@ Cancel
Add this rule
Добавить это условие
Please enter one or several keyword(s) for the rule
Введите значения условий фильтра
Введите одно или несколько значений условий фильтра
Add Scan Rule
Добавить фильтр
Criterion
Выбрать тип:
Выбрать критерии:
String
Ввести значение:
Add
@@ -63,19 +63,19 @@ Add
Scan Rules
Фильтры
Use wildcards to exclude or include URLs or links.\nYou can put several scan strings on the same line.\nUse spaces as separators.\n\nExample: +*.zip -www.*.com -www.*.edu/cgi-bin/*.cgi
Используя маски вы можете исключить/включить сразу несколько адресов\nКак разделитель фильтров используйте запятые или пробелы.\nПример: +*.zip -www.*.com,-www.*.edu/cgi-bin/*.cgi
Используя маски вы можете исключить/включить сразу несколько адресов или ссылок.\nКак разделитель фильтров используйте запятые или пробелы.\nПример: +*.zip -www.*.com,-www.*.edu/cgi-bin/*.cgi
Exclude links
Исключить...
Исключить
Include link(s)
Включить...
Включить
Tip: To have ALL GIF files included, use something like +www.someweb.com/*.gif. \n(+*.gif / -*.gif will include/exclude ALL GIFs from ALL sites)
Совет: Если вы хотите скачать все gif-файлы, используйте, например, такой фильтр +www.someweb.com/*.gif. \n(+*.gif / -*.gif разрешает/запрещает для скачивания ВСЕ gif-файлы на ВСЕХ сайтах)
Совет: Как пример если вы хотите скачать все включенные gif-файлы, используйте такой фильтр +www.someweb.com/*.gif. \n(+*.gif / -*.gif разрешает/запрещает для скачивания ВСЕ gif-файлы на ВСЕХ сайтах)
Save prefs
Сохранить настройки
Matching links will be excluded:
Линки, удовлетворяющие этому условию будут исключены:
Ссылки подходящие под это условие будут исключены:
Matching links will be included:
Линки, удовлетворяющие этому условию будут включены:
Ссылки подходящие под это условие будут включены:
Example:
Пример:
gif\r\nWill match all GIF files
@@ -83,9 +83,9 @@ gif\r\n
blue\r\nWill find all files with a matching 'blue' sub-string such as 'bluesky-small.jpeg'
blue\r\nОтловит все файлы, содержащие в имени подстроку 'blue', например 'bluesky-small.jpeg'
bigfile.mov\r\nWill match the file 'bigfile.mov', but not 'bigfile2.mov'
bigfile.mov\r\nОтловит файл 'bigfile.mov', но, в тоже время, пропустит файл 'bigfile2.mov'
bigfile.mov\r\nОтловит файл 'bigfile.mov', но, в то же время, пропустит файл 'bigfile2.mov'
cgi\r\nWill find links with folder name matching sub-string 'cgi' such as /cgi-bin/somecgi.cgi
cgi\r\nОтловит адреса, содержащие каталоги с подстрокой 'cgi', такие как /cgi-bin/somecgi.cgi
cgi\r\nОтловит адреса, содержащие каталоги с подстрокой 'cgi', такие, как /cgi-bin/somecgi.cgi
cgi-bin\r\nWill find links with folder name matching whole 'cgi-bin' string (but not cgi-bin-2, for example)
cgi-bin\r\nОтловит адреса, содержащие каталог 'cgi-bin' (но не cgi-bin-2, например)
someweb.com\r\nWill find links with matching sub-string such as www.someweb.com, private.someweb.com etc.
@@ -109,7 +109,7 @@ Existing filters
Cancel changes
Отменить изменения
Save current preferences as default values
Сохранить текущие настройки как значения по умолчанию
Сохранить текущие изменения как по умолчанию
Click to confirm
Подтвердить
No log files in %s!
@@ -117,7 +117,7 @@ No log files in %s!
No 'index.html' file in %s!
Отсутствует файл index.html в %s!
Click to quit WinHTTrack Website Copier
Выйти из программы WinHTTrack Website Copier
Выйти из программы
View log files
Просмотр лог файлов
Browse HTML start page
@@ -181,7 +181,7 @@ Parsing HTML file (testing links)..
Pause - Toggle [Mirror]/[Pause download] to resume operation
Остановлено (для продолжения выберите [Зеркало]/[Приостановить закачку])
Finishing pending transfers - Select [Cancel] to stop now!
Завершаются отложенные закачки - чтобы прервать, нажмите Cancel!
Завершаются отложенные закачки чтобы прервать, нажмите Cancel!
scanning
сканируем
Waiting for scheduled time..
@@ -205,11 +205,11 @@ Mirroring operation complete.\nClick Exit to quit WinHTTrack.\nSee log file(s) i
* * MIRROR ABORTED! * *\r\nThe current temporary cache is required for any update operation and only contains data downloaded during the present aborted session.\r\nThe former cache might contain more complete information; if you do not want to lose that information, you have to restore it and delete the current cache.\r\n[Note: This can easily be done here by erasing the hts-cache/new.* files]\r\n\r\nDo you think the former cache might contain more complete information, and do you want to restore it?
* * ЗАКАЧКА ПРЕРВАНА! * *\r\nВременный кэш, созданный во время текущей сессий, содержит данные, загруженные только во время данной сессии и потребуется только в случае возобновления закачки.\r\nОднако, предыдущий кэш может содержать более полную информацию. Если вы не хотите потерять эти данные, вам нужно удалить текущий кэш и возобновить предыдущий.\r\n(Это можно легко сделать прямо здесь, удалив файлы hts-cache/new.]\r\n\r\nСчитаете ли вы, что предыдущий кэш может содержать более полную информацию, и хотите ли вы восстановить его?
* * MIRROR ERROR! * *\r\nHTTrack has detected that the current mirror is empty. If it was an update, the previous mirror has been restored.\r\nReason: the first page(s) either could not be found, or a connection problem occurred.\r\n=> Ensure that the website still exists, and/or check your proxy settings! <=
* * ОШИБКА! * *\r\nТекущее зеркало - пусто. Если это было обновление, предыдущая версия зеркала восстановлена.\r\nПричина: первая страница(ы) или не найдена, или были проблемы с соединением.\r\n=> Убедитесь, что вебсайт все еще существует, и/или проверьте установки прокси-сервера! <=
* * ОШИБКА! * *\r\nТекущее зеркало пусто. Если это было обновление, предыдущая версия зеркала восстановлена.\r\nПричина: первая страница(ы) или не найдена, или были проблемы с соединением.\r\n=> Убедитесь, что вебсайт все еще существует, и/или проверьте установки прокси-сервера! <=
\n\nTip: Click [View log file] to see warning or error messages
\nПодсказка: Для просмотра сообщений об ошибках и предупреждений нажмите [Просмотр лог файла]
Error deleting a hts-cache/new.* file, please do it manually
Ошибка удаления файла hts-cache/new.* , пожалуйста, удалите его вручную.\r\n
Ошибка удаления файла hts-cache/new.*\r\nПожалуйста, удалите файл вручную.\r\n
Do you really want to quit WinHTTrack Website Copier?
Вы действительно хотите выйти из WinHTTrack?
- Mirroring Mode -\n\nEnter address(es) in URL box
@@ -319,7 +319,7 @@ Pause Download?
Stop the mirroring operation
Прервать закачку
Minimize to System Tray
Спрятать в системный трэй
Спрятать в системный трей
Click to skip a link or stop parsing
Пропустить линк или прервать анализ файла
Click to skip a link
@@ -327,7 +327,7 @@ Click to skip a link
Bytes saved
Сохранено байт:
Links scanned
Просканировано линков:
Просканировано ссылок:
Time:
Время:
Connections:
@@ -363,7 +363,7 @@ Choose local site structure
Set user-defined structure on disk
Установить заданную локальную структуру сайта
Use a cache for updates and retries
Использовать кэш для обновления и докачки
Использовать кэш для обновления и повторов скачивания
Do not update zero size or user-erased files
Не качать файлы, которые были однажды скачаны, даже если они нулевой длины или удалены
Create a Start Page
@@ -407,7 +407,7 @@ Maximum amount of bytes to retrieve from the Web
Make a pause after downloading this amount of bytes
После загрузки указанного числа байтов, сделать паузу
Maximum duration time for the mirroring operation
Макс. продолжительность зеркализации
Макс. продолжительность процесса создания зеркал
Maximum transfer rate
Макс. скорость закачки
Maximum connections/seconds (avoid server overload)
@@ -445,7 +445,7 @@ Log file type (if generated)
Maximum mirroring depth from root address
Макс. глубина создания зеркала от начального адреса
Maximum mirroring depth for external/forbidden addresses (0, that is, none, is the default)
Максимальная глубина закачки для внешних/запрещенных адресов (0, т.е., нет ограничений, это значение поумолчанию)
Максимальная глубина закачки для внешних/запрещенных адресов (0, т.е., нет ограничений, это значение по умолчанию)
Create a debugging file
Создать файл с отладочной информацией
Use non-standard requests to get round some server bugs
@@ -453,19 +453,19 @@ Use non-standard requests to get round some server bugs
Use old HTTP/1.0 requests (limits engine power!)
Использовать старый протокол HTTP/1.0 (ограничит возможности программы!)
Attempt to limit retransfers through several tricks (file size test..)
Попытка ограничить перекачку испольуя некоторые приемы (тест на размер файла..)
Попытка ограничить перекачку используя некоторые приемы (тест на размер файла..)
Attempt to limit the number of links by skipping similar URLs (www.foo.com==foo.com, http=https ..)
Ограничить число линков, удаляя аналогичные линки (www.foo.com==foo.com, http=https ..)
Write external links without login/password
Сохранять внешние линки без логина/пароля
Write internal links without query string
Сохранять внутренние линки усеченно (до занака ?)
Сохранять внутренние линки усеченно (до знака ?)
Get non-HTML files related to a link, eg external .ZIP or pictures
Качать не-html файлы вблизи ссылки (напр.: внешние .ZIP или граф. файлы)
Test all links (even forbidden ones)
Проверять все линки (даже запрещенные к закачке)
Try to catch all URLs (even in unknown tags/code)
Стараться определять все URL'ы (даже в неопознанных тэгах/скриптах)
Стараться определять все URL'ы (даже в неопознанных тегах/скриптах)
Get HTML files first!
Получить вначале HTML файлы!
Structure type (how links are saved)
@@ -599,7 +599,7 @@ Do you really want to quit WinHTTrack Website Copier?
Do not connect to a provider (already connected)
Не соединяться с провайдером (соединение уже установлено)
Do not use remote access connection
Не испоьзовать удаленной соединения
Не использовать удаленной соединения
Schedule the mirroring operation
Закачка по расписанию
Quit WinHTTrack Website Copier
@@ -633,9 +633,9 @@ Connect to this provider
Disconnect when finished
Отсоединиться при завершении
Disconnect modem on completion
Отсоеденить при завершении
Отсоединить при завершении
\r\n(Please notify us of any bug or problem)\r\n\r\nDevelopment:\r\nInterface (Windows): Xavier Roche\r\nSpider: Xavier Roche\r\nJavaParserClasses: Yann Philippot\r\n\r\n(C)1998-2003 Xavier Roche and other contributors\r\nMANY THANKS for translation tips to:\r\nRobert Lagadec (rlagadec@yahoo.fr)
\r\n(Сообщите нам пожалуйста о замеченных проблемах и ошибках)\r\n\r\nРазработка:\r\nИнтерфейс (Windows): Xavier Roche\r\nКачалка (spider): Xavier Roche\r\nПарсер ява-классов: Yann Philippot\r\n\r\n(C)1998-2003 Xavier Roche and other contributors\r\nMANY THANKS for Russian translations to:\r\nAndrei Iliev (andreiiliev@mail.ru)
\r\n(Сообщите нам, пожалуйста, о замеченных проблемах и ошибках)\r\n\r\nРазработка:\r\nИнтерфейс (Windows): Xavier Roche\r\nКачалка (spider): Xavier Roche\r\nПарсер ява-классов: Yann Philippot\r\n\r\n(C)1998-2003 Xavier Roche and other contributors\r\nMANY THANKS for Russian translations to:\r\nAndrei Iliev (andreiiliev@mail.ru)
About WinHTTrack Website Copier
О программе WinHTTrack Website Copier
Please visit our Web page
@@ -657,9 +657,9 @@ Ignore domain
Catch this page only
Скачать только эту страничку
Mirror site
Зеркализовать сайт
Сделать зеркало сайту
Mirror domain
Зеркализовать домен
Сделать зеркало домену
Ignore all
Игнорировать все
Wizard query
@@ -693,9 +693,9 @@ Hide
About WinHTTrack Website Copier
О программе...
Check program updates...
Проверить наличие обновленний программы...
Проверить наличие обновлений программы...
&Toolbar
Панель инструпентов
Панель инструментов
&Status Bar
Панель состояния
S&plit


@@ -7,7 +7,7 @@ uk
LANGUAGE_AUTHOR
Andrij Shevchuk (http://programy.com.ua, http://vic-info.com.ua) \r\n
LANGUAGE_CHARSET
ISO-8859-5
windows-1251
LANGUAGE_WINDOWSID
Ukrainian
OK


@@ -14,9 +14,13 @@ AM_CPPFLAGS = \
-DLIBDIR=\""$(libdir)"\"
AM_CPPFLAGS += -I../src
# The callback examples reference libc only through libhttrack, so the direct
# libc edge gets dropped from DT_NEEDED (library-not-linked-against-libc).
# Force libc to be recorded as a dependency.
AM_LDFLAGS = \
@DEFAULT_LDFLAGS@ \
-L../src
-L../src \
-Wl,--push-state,--no-as-needed,-lc,--pop-state
# Examples
libbaselinks_la_SOURCES = callbacks-example-baselinks.c


@@ -13,3 +13,9 @@ regen-man: makeman.sh $(top_builddir)/src/httrack$(EXEEXT)
README='$(top_srcdir)/README' $(SHELL) $(srcdir)/makeman.sh \
'$(top_builddir)/src/httrack$(EXEEXT)' > $(srcdir)/httrack.1
.PHONY: regen-man
# Render html/httrack.man.html from httrack.1. Needs the groff html device
# (Debian: full "groff" package, not "groff-base"). Run by hand: make -C man regen-man-html
regen-man-html: httrack.1
groff -t -man -Thtml $(srcdir)/httrack.1 > $(top_srcdir)/html/httrack.man.html
.PHONY: regen-man-html


@@ -551,6 +551,12 @@ regen-man: makeman.sh $(top_builddir)/src/httrack$(EXEEXT)
'$(top_builddir)/src/httrack$(EXEEXT)' > $(srcdir)/httrack.1
.PHONY: regen-man
# Render html/httrack.man.html from httrack.1. Needs the groff html device
# (Debian: full "groff" package, not "groff-base"). Run by hand: make -C man regen-man-html
regen-man-html: httrack.1
groff -t -man -Thtml $(srcdir)/httrack.1 > $(top_srcdir)/html/httrack.man.html
.PHONY: regen-man-html
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:


@@ -2,7 +2,7 @@
.\" groff -man -Tascii httrack.1
.\"
.\" This file is generated by man/makeman.sh; do not edit by hand.
.TH httrack 1 "07 June 2026" "httrack website copier"
.TH httrack 1 "13 June 2026" "httrack website copier"
.SH NAME
httrack \- offline browser : copy websites to a local directory
.SH SYNOPSIS
@@ -98,15 +98,15 @@ httrack \- offline browser : copy websites to a local directory
allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.
.SH EXAMPLES
.TP
.B httrack www.someweb.com/bob/
mirror site www.someweb.com/bob/ and only this site
.B httrack www.example.com/bob/
mirror site www.example.com/bob/ and only this site
.TP
.B httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg \-mime:application/*
.B httrack www.example.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg \-mime:application/*
mirror the two sites together (with shared links) and accept any .jpg files on .com sites
.TP
.B httrack www.someweb.com/bob/bobby.html +* \-r6
.B httrack www.example.com/bob/bobby.html +* \-r6
.TP
.B httrack www.someweb.com/bob/bobby.html \-\-spider \-P proxy.myhost.com:8080
.B httrack www.example.com/bob/bobby.html \-\-spider \-P proxy.myhost.com:8080
.TP
.B httrack \-\-update
.TP
@@ -244,7 +244,7 @@ from email address sent in HTTP headers (\-\-from <param>)
.IP \-%F
footer string in Html code (\-%F "Mirrored [from host %s [file %s [at %s]]]" (\-\-footer <param>)
.IP \-%l
preffered language (\-%l "fr, en, jp, *" (\-\-language <param>)
preferred language (\-%l "fr, en, jp, *" (\-\-language <param>)
.IP \-%a
accepted formats (\-%a "text/html,image/png;q=0.9,*/*;q=0.1" (\-\-accept <param>)
.IP \-%X
@@ -411,7 +411,7 @@ File type (ex: gif)
.IP \-%p
Path [without ending /] (ex: /someimages)
.IP \-%h
Host name (ex: www.someweb.com)
Host name (ex: www.example.com)
.IP \-%M
URL MD5 (128 bits, 32 ascii bytes)
.IP \-%Q


@@ -83,7 +83,10 @@ libhttrack_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO)
libhtsjava_la_SOURCES = htsjava.c htsjava.h
libhtsjava_la_LIBADD = $(THREADS_LIBS) $(DL_LIBS) libhttrack.la
libhtsjava_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO)
# This thin JNI wrapper reaches libc only through libhttrack, so the direct
# libc edge is dropped from DT_NEEDED (library-not-linked-against-libc). Force
# libc to be recorded as a dependency.
libhtsjava_la_LDFLAGS = $(AM_LDFLAGS) -version-info $(VERSION_INFO) -Wl,--push-state,--no-as-needed,-lc,--pop-state
EXTRA_DIST = httrack.h webhttrack \
coucal/murmurhash3.h.diff \


@@ -266,13 +266,18 @@ const char *hts_optalias[][4] = {
return value: number of arguments treated (0 if error)
*/
int optalias_check(int argc, const char *const *argv, int n_arg,
int *return_argc, char **return_argv, char *return_error) {
int *return_argc, char **return_argv,
size_t return_argv_size, char *return_error,
size_t return_error_size) {
return_error[0] = '\0';
*return_argc = 1;
if (argv[n_arg][0] == '-')
if (argv[n_arg][1] == '-') {
char command[1000];
char param[1000];
/* sized to HTS_CDLMAXSIZE: a long-form option value (--user-agent,
--headers, ...) is copied into param, and the value is bounded by the
general per-argument check in htscoremain.c (HTS_CDLMAXSIZE) */
char command[HTS_CDLMAXSIZE];
char param[HTS_CDLMAXSIZE];
char addcommand[256];
/* */
@@ -320,9 +325,10 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
/* Copy parameters? */
if (need_param == 2) {
if ((n_arg + 1 >= argc) || (argv[n_arg + 1][0] == '-')) { /* no supplemental parameter */
sprintf(return_error,
"Syntax error:\n\tOption %s needs to be followed by a parameter: %s <param>\n\t%s\n",
command, command, _NOT_NULL(optalias_help(command)));
snprintf(return_error, return_error_size,
"Syntax error:\n\tOption %s needs to be followed by a "
"parameter: %s <param>\n\t%s\n",
command, command, _NOT_NULL(optalias_help(command)));
return 0;
}
strcpybuff(param, argv[n_arg + 1]);
@@ -335,35 +341,36 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
/* Must be alone (-P /tmp) */
if (strcmp(hts_optalias[pos][2], "param1") == 0) {
strcpybuff(return_argv[0], command);
strcpybuff(return_argv[1], param);
strlcpybuff(return_argv[0], command, return_argv_size);
strlcpybuff(return_argv[1], param, return_argv_size);
*return_argc = 2; /* 2 parameters returned */
}
/* Alone with parameter (+*.gif) */
else if (strcmp(hts_optalias[pos][2], "param0") == 0) {
/* Command */
strcpybuff(return_argv[0], command);
strcatbuff(return_argv[0], param);
strlcpybuff(return_argv[0], command, return_argv_size);
strlcatbuff(return_argv[0], param, return_argv_size);
}
/* Together (-c8) */
else {
/* Command */
strcpybuff(return_argv[0], command);
strlcpybuff(return_argv[0], command, return_argv_size);
/* Parameters accepted */
if (strncmp(hts_optalias[pos][2], "param", 5) == 0) {
/* --cache=off or --index=on */
if (strcmp(param, "off") == 0)
strcatbuff(return_argv[0], "0");
strlcatbuff(return_argv[0], "0", return_argv_size);
else if (strcmp(param, "on") == 0) {
// on is the default
// strcatbuff(return_argv[0],"1");
} else
strcatbuff(return_argv[0], param);
strlcatbuff(return_argv[0], param, return_argv_size);
}
*return_argc = 1; /* 1 parameter returned */
}
} else {
sprintf(return_error, "Unknown option: %s\n", command);
snprintf(return_error, return_error_size, "Unknown option: %s\n",
command);
return 0;
}
return need_param;
@@ -377,15 +384,16 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
if ((strcmp(hts_optalias[pos][2], "param1") == 0)
|| (strcmp(hts_optalias[pos][2], "param0") == 0)) {
if ((n_arg + 1 >= argc) || (argv[n_arg + 1][0] == '-')) { /* no supplemental parameter */
sprintf(return_error,
"Syntax error:\n\tOption %s needs to be followed by a parameter: %s <param>\n\t%s\n",
argv[n_arg], argv[n_arg],
_NOT_NULL(optalias_help(argv[n_arg])));
snprintf(return_error, return_error_size,
"Syntax error:\n\tOption %s needs to be followed by a "
"parameter: %s <param>\n\t%s\n",
argv[n_arg], argv[n_arg],
_NOT_NULL(optalias_help(argv[n_arg])));
return 0;
}
/* Copy parameters */
strcpybuff(return_argv[0], argv[n_arg]);
strcpybuff(return_argv[1], argv[n_arg + 1]);
strlcpybuff(return_argv[0], argv[n_arg], return_argv_size);
strlcpybuff(return_argv[1], argv[n_arg + 1], return_argv_size);
/* And return */
*return_argc = 2; /* 2 parameters returned */
return 2; /* 2 parameters used */
@@ -394,7 +402,7 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
}
/* Copy and return other unknown option */
strcpybuff(return_argv[0], argv[n_arg]);
strlcpybuff(return_argv[0], argv[n_arg], return_argv_size);
return 1;
}
@@ -521,9 +529,10 @@ int optinclude_file(const char *name, int *argc, char **argv, char *x_argvblk,
strcatbuff(_tmp_argv[0], a);
strcpybuff(_tmp_argv[1], b);
result =
optalias_check(2, (const char *const *) tmp_argv, 0, &return_argc,
(tmp_argv + 2), return_error);
result = optalias_check(2, (const char *const *) tmp_argv, 0,
&return_argc, (tmp_argv + 2),
sizeof(_tmp_argv[0]), return_error,
sizeof(return_error));
if (!result) {
printf("%s\n", return_error);
} else {
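
The sprintf to snprintf changes in optalias_check above pass the destination capacity along with the buffer. A minimal sketch of that pattern follows; format_unknown_option is a hypothetical helper written for illustration, not a function from the HTTrack sources, and the truncation check on the return value is an addition the diff itself does not show.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of HTTrack): format an "unknown option"
   message into a caller-sized buffer. Returns 1 if the whole message fit,
   0 if it was truncated or snprintf reported an error. */
static int format_unknown_option(char *err, size_t err_size,
                                 const char *option) {
  /* snprintf writes at most err_size bytes, terminator included, and returns
     the length the full message would have had. */
  const int n = snprintf(err, err_size, "Unknown option: %s\n", option);

  return n >= 0 && (size_t) n < err_size;
}
```

A caller would pass sizeof(buffer) for an array destination, as the updated call sites do with sizeof(return_error); the return value lets it notice a message that did not fit.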


@@ -38,7 +38,9 @@ Please visit our Website: http://www.httrack.com
#ifdef HTS_INTERNAL_BYTECODE
extern const char *hts_optalias[][4];
int optalias_check(int argc, const char *const *argv, int n_arg,
int *return_argc, char **return_argv, char *return_error);
int *return_argc, char **return_argv,
size_t return_argv_size, char *return_error,
size_t return_error_size);
int optalias_find(const char *token);
const char *optalias_help(const char *token);
int optreal_find(const char *token);


@@ -102,7 +102,8 @@ int cookie_add(t_cookie * cookie, const char *cook_name, const char *cook_value,
strcatbuff(cook, "\n");
if (!((strlen(cookie->data) + strlen(cook)) < cookie->max_len))
return -1; // cannot add
cookie_insert(insert, cook);
cookie_insert(insert, cookie->max_len - (size_t) (insert - cookie->data),
cook);
#if DEBUG_COOK
printf("add_new cookie: name=\"%s\" value=\"%s\" domain=\"%s\" path=\"%s\"\n",
cook_name, cook_value, domain, path);
@@ -118,7 +119,7 @@ int cookie_del(t_cookie * cookie, const char *cook_name, const char *domain, con
b = cookie_find(cookie->data, cook_name, domain, path);
if (b) {
a = cookie_nextfield(b);
cookie_delete(b, a - b);
cookie_delete(b, cookie->max_len - (size_t) (b - cookie->data), a - b);
#if DEBUG_COOK
printf("deleted old cookie: %s %s %s\n", cook_name, domain, path);
#endif
@@ -133,8 +134,8 @@ static int cookie_cmp_wildcard_domain(const char *chk_dom, const char *domain) {
const size_t n = strlen(chk_dom);
const size_t m = strlen(domain);
const size_t l = n < m ? n : m;
size_t i;
for (i = l - 1; i >= 0; i--) {
int i;
for (i = (int) l - 1; i >= 0; i--) {
if (chk_dom[n - i - 1] != domain[m - i - 1]) {
return 1;
}
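
The index change in this hunk addresses an unsigned countdown: with a size_t index, the condition i >= 0 is always true, so the loop cannot end on its own. The diff fixes it by switching to int. As a sketch of an alternative that keeps size_t, the loop can test before decrementing. common_suffix_equal is a hypothetical helper for illustration only, and unlike cookie_cmp_wildcard_domain it returns 1 on a match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of HTTrack): return 1 if the last
   min(strlen(a), strlen(b)) characters of a and b are identical,
   0 otherwise. */
static int common_suffix_equal(const char *a, const char *b) {
  const size_t n = strlen(a);
  const size_t m = strlen(b);
  const size_t l = n < m ? n : m;
  size_t i;

  /* "i-- > 0" compares first and decrements afterwards, so the body sees
     l-1 down to 0 and the loop stops without relying on a negative value. */
  for (i = l; i-- > 0;) {
    if (a[n - i - 1] != b[m - i - 1])
      return 0;
  }
  return 1;
}
```

The int cast used in the diff is simpler to read; the unsigned form avoids narrowing a size_t when string lengths are not known to be small.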
@@ -336,41 +337,44 @@ int cookie_save(t_cookie * cookie, const char *name) {
return -1;
}
// insertion chaine ins avant s
void cookie_insert(char *s, const char *ins) {
// Insert string ins before s. s_size is the capacity of the buffer at s.
void cookie_insert(char *s, size_t s_size, const char *ins) {
char *buff;
if (strnotempty(s) == 0) { // rien à faire, juste concat
strcatbuff(s, ins);
if (strnotempty(s) == 0) { // nothing there yet: just concatenate
strlcatbuff(s, ins, s_size);
} else {
buff = (char *) malloct(strlen(s) + 1);
if (buff) {
strcpybuff(buff, s); // copie temporaire
strcpybuff(s, ins); // insérer
strcatbuff(s, buff); // copier
strlcpybuff(buff, s, strlen(s) + 1); // temporary copy of s
strlcpybuff(s, ins, s_size); // write ins
strlcatbuff(s, buff, s_size); // then the saved content
freet(buff);
}
}
}
// destruction chaine dans s position pos
void cookie_delete(char *s, size_t pos) {
// Delete the substring of s at position pos. s_size is the capacity at s.
void cookie_delete(char *s, size_t s_size, size_t pos) {
char *buff;
if (strnotempty(s + pos) == 0) { // rien à faire, effacer
if (strnotempty(s + pos) == 0) { // nothing after pos: truncate
s[0] = '\0';
} else {
buff = (char *) malloct(strlen(s + pos) + 1);
if (buff) {
strcpybuff(buff, s + pos); // copie temporaire
strcpybuff(s, buff); // copier
strlcpybuff(buff, s + pos, strlen(s + pos) + 1); // temporary copy
strlcpybuff(s, buff, s_size); // overwrite from start
freet(buff);
}
}
}
// renvoie champ param de la chaine cookie_base
// ex: cookie_get("ceci est<tab>un<tab>exemple",1) renvoi "un"
// Return field <param> (0-based, tab-separated) of the cookie line cookie_base,
// into buffer. ex: cookie_get("ceci est<tab>un<tab>exemple", 1) returns "un".
// buffer must hold at least COOKIE_FIELD_BUFFER_SIZE bytes (all callers use
// char[8192]).
#define COOKIE_FIELD_BUFFER_SIZE 8192
const char *cookie_get(char *buffer, const char *cookie_base, int param) {
const char *limit;
@@ -394,11 +398,11 @@ const char *cookie_get(char *buffer, const char *cookie_base, int param) {
if (cookie_base) {
if (cookie_base < limit) {
const char *a = cookie_base;
htsbuff b = htsbuff_ptr(buffer, COOKIE_FIELD_BUFFER_SIZE);
while((*a) && (*a != '\t') && (*a != '\n'))
a++;
buffer[0] = '\0';
strncatbuff(buffer, cookie_base, (int) (a - cookie_base));
htsbuff_catn(&b, cookie_base, (size_t) (a - cookie_base));
return buffer;
} else
return "";
@@ -458,11 +462,13 @@ char *bauth_check(t_cookie * cookie, const char *adr, const char *fil) {
return NULL;
}
/* Build the auth prefix (host + path, query stripped) into prefix.
Callers pass a buffer of HTS_URLMAXSIZE * 2 bytes. */
char *bauth_prefix(char *prefix, const char *adr, const char *fil) {
char *a;
strcpybuff(prefix, jump_identification_const(adr));
strcatbuff(prefix, fil);
strlcpybuff(prefix, jump_identification_const(adr), HTS_URLMAXSIZE * 2);
strlcatbuff(prefix, fil, HTS_URLMAXSIZE * 2);
a = strchr(prefix, '?');
if (a)
*a = '\0';


@@ -67,8 +67,8 @@ int cookie_add(t_cookie * cookie, const char *cook_name, const char *cook_valu
int cookie_del(t_cookie * cookie, const char *cook_name, const char *domain, const char *path);
int cookie_load(t_cookie * cookie, const char *path, const char *name);
int cookie_save(t_cookie * cookie, const char *name);
void cookie_insert(char *s, const char *ins);
void cookie_delete(char *s, size_t pos);
void cookie_insert(char *s, size_t s_size, const char *ins);
void cookie_delete(char *s, size_t s_size, size_t pos);
const char *cookie_get(char *buffer, const char *cookie_base, int param);
char *cookie_find(char *s, const char *cook_name, const char *domain, const char *path);
char *cookie_nextfield(char *a);


@@ -196,12 +196,13 @@ struct cache_back_zip_entry {
int compressionMethod;
};
#define ZIP_READFIELD_STRING(line, value, refline, refvalue) do { \
if (line[0] != '\0' && strfield2(line, refline)) { \
strcpybuff(refvalue, value); \
line[0] = '\0'; \
} \
} while(0)
#define ZIP_READFIELD_STRING(line, value, refline, refvalue, refvalue_size) \
do { \
if (line[0] != '\0' && strfield2(line, refline)) { \
strlcpybuff(refvalue, value, refvalue_size); \
line[0] = '\0'; \
} \
} while (0)
#define ZIP_READFIELD_INT(line, value, refline, refvalue) do { \
if (line[0] != '\0' && strfield2(line, refline)) { \
int intval = 0; \
@@ -643,7 +644,7 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
} else {
r.location = location_default;
}
strcpybuff(r.location, "");
r.location[0] = '\0';
strcpybuff(buff, adr);
strcatbuff(buff, fil);
hash_pos_return = coucal_read(cache->hashtable, buff, &hash_pos);
@@ -706,17 +707,25 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
value++;
ZIP_READFIELD_INT(line, value, "X-In-Cache", dataincache);
ZIP_READFIELD_INT(line, value, "X-Statuscode", r.statuscode);
ZIP_READFIELD_STRING(line, value, "X-StatusMessage", r.msg); // msg
ZIP_READFIELD_STRING(line, value, "X-StatusMessage", r.msg,
sizeof(r.msg));
ZIP_READFIELD_LLINT(line, value, "X-Size", r.size); // size
ZIP_READFIELD_STRING(line, value, "Content-Type", r.contenttype); // contenttype
ZIP_READFIELD_STRING(line, value, "X-Charset", r.charset); // contenttype
ZIP_READFIELD_STRING(line, value, "Last-Modified", r.lastmodified); // last-modified
ZIP_READFIELD_STRING(line, value, "Etag", r.etag); // Etag
ZIP_READFIELD_STRING(line, value, "Location", r.location); // 'location' pour moved
ZIP_READFIELD_STRING(line, value, "Content-Disposition", r.cdispo); // Content-disposition
ZIP_READFIELD_STRING(line, value, "Content-Type", r.contenttype,
sizeof(r.contenttype));
ZIP_READFIELD_STRING(line, value, "X-Charset", r.charset,
sizeof(r.charset));
ZIP_READFIELD_STRING(line, value, "Last-Modified", r.lastmodified,
sizeof(r.lastmodified));
ZIP_READFIELD_STRING(line, value, "Etag", r.etag, sizeof(r.etag));
// r.location is a char* pointing into a HTS_URLMAXSIZE*2 buffer
ZIP_READFIELD_STRING(line, value, "Location", r.location,
HTS_URLMAXSIZE * 2);
ZIP_READFIELD_STRING(line, value, "Content-Disposition", r.cdispo,
sizeof(r.cdispo));
//ZIP_READFIELD_STRING(line, value, "X-Addr", ..); // Original address
//ZIP_READFIELD_STRING(line, value, "X-Fil", ..); // Original URI filename
ZIP_READFIELD_STRING(line, value, "X-Save", previous_save_); // Original save filename
ZIP_READFIELD_STRING(line, value, "X-Save", previous_save_,
sizeof(previous_save_));
}
} while(offset < readSizeHeader && !lineEof);
//totalHeader = offset;
@@ -733,7 +742,7 @@ static htsblk cache_readex_new(httrackp * opt, cache_back * cache,
}
}
if (return_save != NULL) {
strcpybuff(return_save, previous_save);
strlcpybuff(return_save, previous_save, HTS_URLMAXSIZE * 2);
}
/* Complete fields */
@@ -1025,7 +1034,7 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
} else {
r.location = location_default;
}
strcpybuff(r.location, "");
r.location[0] = '\0';
#if HTS_FAST_CACHE
strcpybuff(buff, adr);
strcatbuff(buff, fil);
@@ -1111,7 +1120,7 @@ static htsblk cache_readex_old(httrackp * opt, cache_back * cache,
previous_save[0] = '\0';
cache_rstr(cache->olddat, previous_save); // save
if (return_save != NULL) {
strcpybuff(return_save, previous_save);
strlcpybuff(return_save, previous_save, HTS_URLMAXSIZE * 2);
}
}
if (cache->version >= 5) {
@@ -2088,7 +2097,7 @@ char *readfile_or(const char *fil, const char *defaultdata) {
char *adr = malloct(strlen(defaultdata) + 1);
if (adr) {
strcpybuff(adr, defaultdata);
strlcpybuff(adr, defaultdata, strlen(defaultdata) + 1);
return adr;
}
}
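
The reason ZIP_READFIELD_STRING takes an explicit size can be sketched as follows: sizeof yields the capacity for an array field, but only the pointer size for a char* field such as r.location, so the capacity has to be supplied by the caller. Everything below (struct response, LOCATION_CAPACITY, bounded_copy) is hypothetical and written for illustration. The sketch truncates on overflow, which is an assumption of this example; the project's own strlcpybuff may behave differently.

```c
#include <stddef.h>
#include <string.h>

#define LOCATION_CAPACITY 32 /* hypothetical; stands in for HTS_URLMAXSIZE * 2 */

struct response {
  char msg[16];   /* array field: sizeof(r.msg) is its capacity */
  char *location; /* pointer field: sizeof(r.location) is a pointer size */
};

/* Truncating copy into a buffer of dst_size bytes.
   Returns 1 if src fit entirely, 0 if it was cut short. */
static int bounded_copy(char *dst, size_t dst_size, const char *src) {
  const size_t len = strlen(src);
  size_t n;

  if (dst_size == 0)
    return 0;
  n = len < dst_size ? len : dst_size - 1;
  memcpy(dst, src, n);
  dst[n] = '\0';
  return len < dst_size;
}
```

With these definitions, an array destination is called as bounded_copy(r.msg, sizeof(r.msg), value), while the pointer destination needs the named constant: bounded_copy(r.location, LOCATION_CAPACITY, value).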


@@ -201,8 +201,8 @@ HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) {
while(strnotempty(line)) {
socinput(soc, line, 1000);
treathead(NULL, NULL, NULL, &blkretour, line); // process
strcatbuff(data, line);
strcatbuff(data, "\r\n");
strlcatbuff(data, line, CATCH_URL_DATA_SIZE);
strlcatbuff(data, "\r\n", CATCH_URL_DATA_SIZE);
}
// final CR/LF of the header is not needed: already emitted by the empty line just above
//strcatbuff(data,"\r\n");


@@ -40,6 +40,9 @@ Please visit our Website: http://www.httrack.com
/* Library internal definitions */
#ifdef HTS_INTERNAL_BYTECODE
// Capacity contract for the catch_url() 'data' buffer (32 Kb).
#define CATCH_URL_DATA_SIZE 32768
// Functions
void socinput(T_SOC soc, char *s, int max);


@@ -40,6 +40,7 @@ Please visit our Website: http://www.httrack.com
#include "htscore.h"
#include "htsdefines.h"
#include "htsalias.h"
#include "htsbauth.h"
#include "htswrap.h"
#include "htsmodules.h"
#include "htszlib.h"
@@ -138,6 +139,110 @@ static void basic_selftests(void) {
fil_normalized(source, buffer);
// MD5 selftests
md5selftest();
// cookie_get field extraction (tab-separated, 0-based)
{
char cbuf[8192];
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 0), "a") == 0);
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 1), "b") == 0);
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 2), "c") == 0);
// multi-char fields catch length/boundary bugs that 1-char fields hide
assertf(strcmp(cookie_get(cbuf, "host\tx\t/path/to", 0), "host") == 0);
assertf(strcmp(cookie_get(cbuf, "host\tx\t/path/to", 2), "/path/to") == 0);
assertf(strcmp(cookie_get(cbuf, "a\t\tc", 1), "") == 0); // empty field
assertf(strcmp(cookie_get(cbuf, "a\tb\tc", 9), "") == 0); // beyond last
}
}
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
Returns 0 if every bounded operation behaved correctly, 1 otherwise.
The abort-on-overflow guarantee is checked separately by the -#8 "overflow"
sub-mode (it aborts the process by design). */
static int string_safety_selftests(void) {
char buf[8];
/* strcpybuff into a sized array: exact copy */
strcpybuff(buf, "abc");
if (strcmp(buf, "abc") != 0)
return 1;
/* strcatbuff append within capacity */
strcatbuff(buf, "de");
if (strcmp(buf, "abcde") != 0)
return 1;
/* strncatbuff appends at most N source chars */
strcpybuff(buf, "ab");
strncatbuff(buf, "cdef", 2);
if (strcmp(buf, "abcd") != 0)
return 1;
/* strlcpybuff: explicit-capacity copy into a pointer destination, the form
the migration moves toward */
{
char storage[8];
char *const p = storage;
strlcpybuff(p, "hello", sizeof(storage));
if (strcmp(p, "hello") != 0)
return 1;
}
/* strcpybuff into a pointer destination: routes through the unchecked
strcpybuff_ptr_ fallback (the path the -#8 warning flags). The warning is
intentional here; we only verify the fallback still copies correctly. */
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wattribute-warning"
#endif
{
char storage[8];
char *const p = storage;
strcpybuff(p, "ptr");
if (strcmp(p, "ptr") != 0)
return 1;
}
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
/* htsbuff: bounded builder over a fixed array (append, truncating append,
reset, and length tracking) */
{
char dst[8];
htsbuff b = htsbuff_array(dst);
htsbuff_cat(&b, "ab");
htsbuff_cat(&b, "cd");
if (strcmp(htsbuff_str(&b), "abcd") != 0 || b.len != 4)
return 1;
htsbuff_catn(&b, "efghij", 2); /* append at most 2 */
if (strcmp(htsbuff_str(&b), "abcdef") != 0)
return 1;
htsbuff_cpy(&b, "xyz"); /* reset */
if (strcmp(htsbuff_str(&b), "xyz") != 0 || b.len != 3)
return 1;
htsbuff_catc(&b, '!'); /* single character */
if (strcmp(htsbuff_str(&b), "xyz!") != 0 || b.len != 4)
return 1;
}
/* boundary: filling to exactly cap-1 must succeed (one more aborts, which the
-#8 overflow-buff mode checks) */
{
char d2[4];
htsbuff c = htsbuff_array(d2);
htsbuff_cat(&c, "abc");
if (strcmp(htsbuff_str(&c), "abc") != 0 || c.len != 3)
return 1;
}
return 0;
}
static int hts_main_internal(int argc, char **argv, httrackp * opt);
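
The htsbuff self-tests above exercise a bounded builder that tracks capacity and length together. A minimal sketch of that idea follows. The sbuf type and its functions are hypothetical names chosen for illustration and are not the htssafe.h implementation; in particular, this sketch reports failure by return value, whereas the self-test comments indicate that the real htsbuff aborts on overflow.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded builder; names and semantics are illustrative only. */
typedef struct {
  char *data; /* destination storage */
  size_t cap; /* total capacity in bytes, terminator included */
  size_t len; /* current string length */
} sbuf;

/* Wrap a caller-owned buffer and start with an empty string. */
static sbuf sbuf_init(char *data, size_t cap) {
  sbuf b = { data, cap, 0 };

  if (cap > 0)
    data[0] = '\0';
  return b;
}

/* Append at most n bytes of src. Returns 1 on success, 0 if the result
   would not fit, in which case the buffer is left unchanged. */
static int sbuf_catn(sbuf *b, const char *src, size_t n) {
  size_t i = 0;

  while (i < n && src[i] != '\0') /* bounded length of src */
    i++;
  if (b->cap == 0 || i > b->cap - 1 - b->len)
    return 0;
  memcpy(b->data + b->len, src, i);
  b->len += i;
  b->data[b->len] = '\0';
  return 1;
}
```

Keeping len in the struct avoids rescanning the destination on every append, and the single capacity check covers both the copy and the terminator.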
@@ -294,10 +399,10 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
/* Check that argv[] is not empty */
if (strnotempty(argv[na])) {
/* Vérifier Commande (alias) */
result =
optalias_check(argc, (const char *const *) argv, na, &tmp_argc,
(char **) tmp_argv, tmp_error);
/* Resolve an option alias, if any */
result = optalias_check(argc, (const char *const *) argv, na, &tmp_argc,
(char **) tmp_argv, sizeof(_tmp_argv[0]),
tmp_error, sizeof(tmp_error));
if (!result) {
HTS_PANIC_PRINTF(tmp_error);
htsmain_free();
@@ -1787,10 +1892,6 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
HTS_PANIC_PRINTF("Empty string given");
htsmain_free();
return -1;
} else if (strlen(argv[na]) >= 256) {
HTS_PANIC_PRINTF("Header line string too long");
htsmain_free();
return -1;
}
StringCat(opt->headers, argv[na]);
StringCat(opt->headers, "\r\n"); /* separator */
@@ -2441,6 +2542,35 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
htsmain_free();
return 0;
break;
case '8': /* string-safety selftest: httrack -#8 [overflow <bigstr>] */
if (na + 1 < argc
&& strncmp(argv[na + 1], "overflow", 8) == 0) {
/* Deliberately exceed a sized buffer: the bounded op must
abort. The source comes from argv so its length is opaque
to the compiler (no static -Wstringop-overflow, genuine
runtime check). "overflow-buff" exercises htsbuff. */
char small[4];
const char *const src =
(na + 2 < argc) ? argv[na + 2] : "overflowing";
if (strcmp(argv[na + 1], "overflow-buff") == 0) {
htsbuff b = htsbuff_array(small);
htsbuff_cat(&b, src);
} else {
strcpybuff(small, src);
}
printf("strsafe: NOT aborted\n"); /* must be unreachable */
htsmain_free();
return 1;
} else {
const int err = string_safety_selftests();
printf("strsafe: %s\n", err ? "FAIL" : "OK");
htsmain_free();
return err;
}
break;
case '7': // hashtable selftest: httrack -#7 nb_entries
basic_selftests();
if (++na < argc) {
@@ -2691,11 +2821,6 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
return -1;
} else {
na++;
if (strlen(argv[na]) >= 126) {
HTS_PANIC_PRINTF("User-agent length too long");
htsmain_free();
return -1;
}
StringCopy(opt->user_agent, argv[na]);
if (StringNotEmpty(opt->user_agent))
opt->user_agent_send = 1;
@@ -2899,7 +3024,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
}
{
char n_lock[256];
/* Sized to the concat-buffer capacity so it can always hold the lock-file
path produced by fconcat(), even with a long log path (issue #183). */
char n_lock[OPT_GET_BUFF_SIZE(opt)];
// cannot have both a display AND a log file
// that will be for version 2


@@ -409,7 +409,7 @@ void help_catchurl(const char *dest_path) {
if (soc != INVALID_SOCKET) {
char BIGSTK url[HTS_URLMAXSIZE * 2];
char method[32];
char BIGSTK data[32768];
char BIGSTK data[CATCH_URL_DATA_SIZE];
url[0] = method[0] = data[0] = '\0';
//
@@ -604,7 +604,7 @@ void help(const char *app, int more) {
infomsg(" %E from email address sent in HTTP headers");
infomsg
(" %F footer string in Html code (-%F \"Mirrored [from host %s [file %s [at %s]]]\"");
infomsg(" %l preffered language (-%l \"fr, en, jp, *\"");
infomsg(" %l preferred language (-%l \"fr, en, jp, *\"");
infomsg(" %a accepted formats (-%a \"text/html,image/png;q=0.9,*/*;q=0.1\"");
infomsg(" %X additional HTTP header line (-%X \"X-Magic: 42\"");
infomsg("");
@@ -712,7 +712,7 @@ void help(const char *app, int more) {
infomsg(" '%N' Name of file, including file type (ex: image.gif)");
infomsg(" '%t' File type (ex: gif)");
infomsg(" '%p' Path [without ending /] (ex: /someimages)");
infomsg(" '%h' Host name (ex: www.someweb.com)");
infomsg(" '%h' Host name (ex: www.example.com)");
infomsg(" '%M' URL MD5 (128 bits, 32 ascii bytes)");
infomsg(" '%Q' query string MD5 (128 bits, 32 ascii bytes)");
infomsg(" '%k' full query string");
@@ -767,21 +767,21 @@ void help(const char *app, int more) {
infomsg("Details: Option %W: External callbacks prototypes");
infomsg("see htsdefines.h");
infomsg("");
infomsg("example: httrack www.someweb.com/bob/");
infomsg("means: mirror site www.someweb.com/bob/ and only this site");
infomsg("example: httrack www.example.com/bob/");
infomsg("means: mirror site www.example.com/bob/ and only this site");
infomsg("");
infomsg
("example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*");
("example: httrack www.example.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*");
infomsg
("means: mirror the two sites together (with shared links) and accept any .jpg files on .com sites");
infomsg("");
infomsg("example: httrack www.someweb.com/bob/bobby.html +* -r6");
infomsg("example: httrack www.example.com/bob/bobby.html +* -r6");
infomsg
("means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web");
infomsg("");
infomsg
("example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080");
infomsg("runs the spider on www.someweb.com/bob/bobby.html using a proxy");
("example: httrack www.example.com/bob/bobby.html --spider -P proxy.myhost.com:8080");
infomsg("runs the spider on www.example.com/bob/bobby.html using a proxy");
infomsg("");
infomsg("example: httrack --update");
infomsg("updates a mirror in the current folder");


@@ -121,6 +121,7 @@ const char *hts_detect[] = {
"lowsrc",
"profile", // element META
"src",
"srcset", // HTML5 responsive images (<img>, <source>)
"swurl",
"url",
"usemap",
@@ -877,7 +878,7 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
const char *xsend, const char *adr, const char *fil,
const char *referer_adr, const char *referer_fil,
htsblk * retour) {
char BIGSTK buffer_head_request[8192];
char BIGSTK buffer_head_request[16384];
buff_struct bstr = { buffer_head_request, sizeof(buffer_head_request), 0 };
//int use_11=0; // HTTP 1.1 in use
@@ -895,9 +896,9 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
// undocumented feature: >post: and >postfile:
// if a >post: tag is present, perform a POST
// example: http://www.someweb.com/test.cgi?foo>post:posteddata=10&foo=5
// example: http://www.example.com/test.cgi?foo>post:posteddata=10&foo=5
// if a >postfile: tag is present, send the raw header contained in the given file
// example: http://www.someweb.com/test.cgi?foo>postfile:post0.txt
// example: http://www.example.com/test.cgi?foo>postfile:post0.txt
search_tag = strstr(fil, POSTTOK ":");
if (!search_tag) {
search_tag = strstr(fil, POSTTOK "file:");
@@ -1659,138 +1660,107 @@ void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * ret
}
}
// transforme le message statuscode en chaîne
HTSEXT_API void infostatuscode(char *msg, int statuscode) {
// HTTP status code -> reason phrase (per RFC), or NULL if unknown.
HTSEXT_API const char *infostatuscode_const(int statuscode) {
// O(1) dispatch (the compiler builds a jump table); the phrases are static.
switch (statuscode) {
// Erreurs HTTP, selon RFC
case 100:
strcpybuff(msg, "Continue");
break;
return "Continue";
case 101:
strcpybuff(msg, "Switching Protocols");
break;
return "Switching Protocols";
case 200:
strcpybuff(msg, "OK");
break;
return "OK";
case 201:
strcpybuff(msg, "Created");
break;
return "Created";
case 202:
strcpybuff(msg, "Accepted");
break;
return "Accepted";
case 203:
strcpybuff(msg, "Non-Authoritative Information");
break;
return "Non-Authoritative Information";
case 204:
strcpybuff(msg, "No Content");
break;
return "No Content";
case 205:
strcpybuff(msg, "Reset Content");
break;
return "Reset Content";
case 206:
strcpybuff(msg, "Partial Content");
break;
return "Partial Content";
case 300:
strcpybuff(msg, "Multiple Choices");
break;
return "Multiple Choices";
case 301:
strcpybuff(msg, "Moved Permanently");
break;
return "Moved Permanently";
case 302:
strcpybuff(msg, "Moved Temporarily");
break;
return "Moved Temporarily";
case 303:
strcpybuff(msg, "See Other");
break;
return "See Other";
case 304:
strcpybuff(msg, "Not Modified");
break;
return "Not Modified";
case 305:
strcpybuff(msg, "Use Proxy");
break;
return "Use Proxy";
case 306:
strcpybuff(msg, "Undefined 306 error");
break;
return "Undefined 306 error";
case 307:
strcpybuff(msg, "Temporary Redirect");
break;
return "Temporary Redirect";
case 400:
strcpybuff(msg, "Bad Request");
break;
return "Bad Request";
case 401:
strcpybuff(msg, "Unauthorized");
break;
return "Unauthorized";
case 402:
strcpybuff(msg, "Payment Required");
break;
return "Payment Required";
case 403:
strcpybuff(msg, "Forbidden");
break;
return "Forbidden";
case 404:
strcpybuff(msg, "Not Found");
break;
return "Not Found";
case 405:
strcpybuff(msg, "Method Not Allowed");
break;
return "Method Not Allowed";
case 406:
strcpybuff(msg, "Not Acceptable");
break;
return "Not Acceptable";
case 407:
strcpybuff(msg, "Proxy Authentication Required");
break;
return "Proxy Authentication Required";
case 408:
strcpybuff(msg, "Request Time-out");
break;
return "Request Time-out";
case 409:
strcpybuff(msg, "Conflict");
break;
return "Conflict";
case 410:
strcpybuff(msg, "Gone");
break;
return "Gone";
case 411:
strcpybuff(msg, "Length Required");
break;
return "Length Required";
case 412:
strcpybuff(msg, "Precondition Failed");
break;
return "Precondition Failed";
case 413:
strcpybuff(msg, "Request Entity Too Large");
break;
return "Request Entity Too Large";
case 414:
strcpybuff(msg, "Request-URI Too Large");
break;
return "Request-URI Too Large";
case 415:
strcpybuff(msg, "Unsupported Media Type");
break;
return "Unsupported Media Type";
case 416:
strcpybuff(msg, "Requested Range Not Satisfiable");
break;
return "Requested Range Not Satisfiable";
case 417:
strcpybuff(msg, "Expectation Failed");
break;
return "Expectation Failed";
case 500:
strcpybuff(msg, "Internal Server Error");
break;
return "Internal Server Error";
case 501:
strcpybuff(msg, "Not Implemented");
break;
return "Not Implemented";
case 502:
strcpybuff(msg, "Bad Gateway");
break;
return "Bad Gateway";
case 503:
strcpybuff(msg, "Service Unavailable");
break;
return "Service Unavailable";
case 504:
strcpybuff(msg, "Gateway Time-out");
break;
return "Gateway Time-out";
case 505:
strcpybuff(msg, "HTTP Version Not Supported");
break;
//
return "HTTP Version Not Supported";
default:
if (strnotempty(msg) == 0)
strcpybuff(msg, "Unknown error");
break;
return NULL;
}
}
// Write the status code's reason phrase into msg. For an unknown code, keep any
// caller-provided message, otherwise fall back to a default. Callers provide a
// buffer of at least 64 bytes (the longest reason phrase is 31).
HTSEXT_API void infostatuscode(char *msg, int statuscode) {
const char *const text = infostatuscode_const(statuscode);
if (text != NULL) {
strlcpybuff(msg, text, 64);
} else if (strnotempty(msg) == 0) {
strlcpybuff(msg, "Unknown error", 64);
}
}
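The split above (a const lookup that returns a static phrase or NULL, plus a thin wrapper that does a bounded copy) can be sketched in isolation. This is a minimal sketch, not the HTTrack code: `reason_phrase` and `reason_phrase_into` are hypothetical names, only three status codes are listed, and `snprintf` stands in for `strlcpybuff`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for infostatuscode_const(): returns a static reason
   phrase, or NULL for a code it does not know (only three codes shown). */
static const char *reason_phrase(int code) {
  switch (code) {
  case 200:
    return "OK";
  case 404:
    return "Not Found";
  case 416:
    return "Requested Range Not Satisfiable"; /* 31 characters */
  default:
    return NULL;
  }
}

/* Hypothetical stand-in for infostatuscode(): bounded copy into msg. For an
   unknown code, a non-empty caller-provided message is left untouched. */
static void reason_phrase_into(char *msg, size_t size, int code) {
  const char *const text = reason_phrase(code);
  assert(msg != NULL && size > 0);
  if (text != NULL) {
    snprintf(msg, size, "%s", text);
  } else if (msg[0] == '\0') {
    snprintf(msg, size, "%s", "Unknown error");
  }
}
```

Callers that only need to read the phrase can use the const lookup directly and skip the scratch buffer; the copying variant is kept for callers that already own a message buffer.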


@@ -767,7 +767,7 @@ int url_savename(lien_adrfilsave *const afs,
// optionally add the site name first
if (opt->savename_type == -1) { // use savename_userdef! (%h%p/%n%q.%t)
const char *a = StringBuff(opt->savename_userdef);
char *b = afs->save;
htsbuff sb = htsbuff_array(afs->save);
/*char *nom_pos=NULL,*dot_pos=NULL; // position of the name and of the dot */
char tok;
@@ -787,17 +787,16 @@ int url_savename(lien_adrfilsave *const afs,
}
*/
// Construire nom
while((*a) && (((int) (b - afs->save)) < HTS_URLMAXSIZE)) { // parser, et pas trop long..
// build the name
while ((*a) && (sb.len < HTS_URLMAXSIZE)) { // parse, but not too long
if (*a == '%') {
int short_ver = 0;
a++;
if (*a == 's') {
if (*a == 's') { // '%s...' selects the short (8.3) form
short_ver = 1;
a++;
}
*b = '\0';
switch (tok = *a++) {
case '[': // %[param:prefix_if_not_empty:suffix_if_not_empty:empty_replacement:notfound_replacement]
if (strchr(a, ']')) {
@@ -834,8 +833,7 @@ int url_savename(lien_adrfilsave *const afs,
}
if (cp) {
c = cp + strlen(name[0]); /* jumps "param=" */
strcpybuff(b, name[1]); /* prefix */
b += strlen(b);
htsbuff_cat(&sb, name[1]); /* prefix */
if (*c != '\0' && *c != '&') {
char *d = name[0];
@@ -846,110 +844,90 @@ int url_savename(lien_adrfilsave *const afs,
*d = '\0';
d = unescape_http(catbuff, sizeof(catbuff), name[0]);
if (d && *d) {
strcpybuff(b, d); /* value */
b += strlen(b);
htsbuff_cat(&sb, d); /* value */
} else {
strcpybuff(b, name[3]); /* empty replacement if any */
b += strlen(b);
htsbuff_cat(&sb, name[3]); /* empty replacement if any */
}
} else {
strcpybuff(b, name[3]); /* empty replacement if any */
b += strlen(b);
htsbuff_cat(&sb, name[3]); /* empty replacement if any */
}
strcpybuff(b, name[2]); /* suffix */
b += strlen(b);
htsbuff_cat(&sb, name[2]); /* suffix */
} else {
strcpybuff(b, name[4]); /* not found replacement if any */
b += strlen(b);
htsbuff_cat(&sb, name[4]); /* not found replacement if any */
}
} else {
strcpybuff(b, name[4]); /* not found replacement if any */
b += strlen(b);
htsbuff_cat(&sb, name[4]); /* not found replacement if any */
}
}
break;
case '%':
*b++ = '%';
htsbuff_catc(&sb, '%');
break;
case 'n': // nom sans ext
*b = '\0';
case 'n': // name without extension
if (dot_pos) {
if (!short_ver) // Noms longs
strncatbuff(b, nom_pos, (int) (dot_pos - nom_pos));
if (!short_ver)
htsbuff_catn(&sb, nom_pos, (int) (dot_pos - nom_pos));
else
strncatbuff(b, nom_pos, min((int) (dot_pos - nom_pos), 8));
htsbuff_catn(&sb, nom_pos, min((int) (dot_pos - nom_pos), 8));
} else {
if (!short_ver) // Noms longs
strcpybuff(b, nom_pos);
if (!short_ver)
htsbuff_cat(&sb, nom_pos);
else
strncatbuff(b, nom_pos, 8);
htsbuff_catn(&sb, nom_pos, 8);
}
b += strlen(b); // pointer à la fin
break;
case 'N': // nom avec ext
// RECOPIE NOM + EXT
*b = '\0';
case 'N': // name with extension
if (dot_pos) {
if (!short_ver) // Noms longs
strncatbuff(b, nom_pos, (int) (dot_pos - nom_pos));
if (!short_ver)
htsbuff_catn(&sb, nom_pos, (int) (dot_pos - nom_pos));
else
strncatbuff(b, nom_pos, min((int) (dot_pos - nom_pos), 8));
htsbuff_catn(&sb, nom_pos, min((int) (dot_pos - nom_pos), 8));
} else {
if (!short_ver) // Noms longs
strcpybuff(b, nom_pos);
if (!short_ver)
htsbuff_cat(&sb, nom_pos);
else
strncatbuff(b, nom_pos, 8);
htsbuff_catn(&sb, nom_pos, 8);
}
b += strlen(b); // pointer à la fin
*b = '.';
++b;
// RECOPIE NOM + EXT
*b = '\0';
htsbuff_catc(&sb, '.');
if (dot_pos) {
if (!short_ver) // Noms longs
strcpybuff(b, dot_pos + 1);
if (!short_ver)
htsbuff_cat(&sb, dot_pos + 1);
else
strncatbuff(b, dot_pos + 1, 3);
htsbuff_catn(&sb, dot_pos + 1, 3);
} else {
if (!short_ver) // Noms longs
strcpybuff(b, DEFAULT_EXT + 1); // pas de..
if (!short_ver)
htsbuff_cat(&sb, DEFAULT_EXT + 1); // skip the leading dot
else
strcpybuff(b, DEFAULT_EXT_SHORT + 1); // pas de..
htsbuff_cat(&sb, DEFAULT_EXT_SHORT + 1); // skip the leading dot
}
b += strlen(b); // pointer à la fin
//
break;
case 't': // ext
*b = '\0';
case 't': // extension
if (dot_pos) {
if (!short_ver) // Noms longs
strcpybuff(b, dot_pos + 1);
if (!short_ver)
htsbuff_cat(&sb, dot_pos + 1);
else
strncatbuff(b, dot_pos + 1, 3);
htsbuff_catn(&sb, dot_pos + 1, 3);
} else {
if (!short_ver) // Noms longs
strcpybuff(b, DEFAULT_EXT + 1); // pas de..
if (!short_ver)
htsbuff_cat(&sb, DEFAULT_EXT + 1); // skip the leading dot
else
strcpybuff(b, DEFAULT_EXT_SHORT + 1); // pas de..
htsbuff_cat(&sb, DEFAULT_EXT_SHORT + 1); // skip the leading dot
}
b += strlen(b); // pointer à la fin
break;
case 'p': // path sans dernier /
*b = '\0';
if (nom_pos != fil + 1) { // pas: /index.html (chemin nul)
if (!short_ver) { // Noms longs
strncatbuff(b, fil, (int) (nom_pos - fil) - 1);
case 'p': // path without trailing /
if (nom_pos !=
fil + 1) { // skip when the path is empty (e.g. /index.html)
if (!short_ver) {
htsbuff_catn(&sb, fil, (int) (nom_pos - fil) - 1);
} else {
char BIGSTK pth[HTS_URLMAXSIZE * 2], n83[HTS_URLMAXSIZE * 2];
pth[0] = n83[0] = '\0';
//
strncatbuff(pth, fil, (int) (nom_pos - fil) - 1);
long_to_83(opt->savename_83, n83, pth);
strcpybuff(b, n83);
htsbuff_cat(&sb, n83);
}
}
b += strlen(b); // pointer à la fin
break;
case 'h': // host (IDNA decoded if suitable)
// IDNA / RFC 3492 (Punycode) handling for HTTP(s)
@@ -957,62 +935,50 @@ int url_savename(lien_adrfilsave *const afs,
DECLARE_ADR(final_adr);
/* Copy address */
*b = '\0';
if (!short_ver)
strcpybuff(b, final_adr);
htsbuff_cat(&sb, final_adr);
else
strcpybuff(b, final_adr);
htsbuff_cat(&sb, final_adr);
/* release */
RELEASE_ADR();
}
b += strlen(b); // pointer à la fin
break;
case 'H': // host, raw (old mode)
*b = '\0';
case 'H': // host, raw (old mode)
if (protocol == PROTOCOL_FILE) {
if (!short_ver) // Noms longs
strcpybuff(b, "localhost");
if (!short_ver)
htsbuff_cat(&sb, "localhost");
else
strcpybuff(b, "local");
htsbuff_cat(&sb, "local");
} else {
if (!short_ver) // Noms longs
strcpybuff(b, print_adr);
if (!short_ver)
htsbuff_cat(&sb, print_adr);
else
strncatbuff(b, print_adr, 8);
htsbuff_catn(&sb, print_adr, 8);
}
b += strlen(b); // pointer à la fin
break;
case 'M': /* host/address?query MD5 (128-bits) */
*b = '\0';
{
char digest[32 + 2];
char BIGSTK buff[HTS_URLMAXSIZE * 2];
case 'M': /* host/address?query MD5 (128-bits) */
{
char digest[32 + 2];
char BIGSTK buff[HTS_URLMAXSIZE * 2];
digest[0] = buff[0] = '\0';
strcpybuff(buff, adr);
strcatbuff(buff, fil_complete);
domd5mem(buff, strlen(buff), digest, 1);
strcpybuff(b, digest);
}
b += strlen(b); // pointer à la fin
break;
digest[0] = buff[0] = '\0';
strcpybuff(buff, adr);
strcatbuff(buff, fil_complete);
domd5mem(buff, strlen(buff), digest, 1);
htsbuff_cat(&sb, digest);
} break;
case 'Q':
case 'q': /* query MD5 (128-bits/16-bits)
GENERATED ONLY IF query string exists! */
{
char md5[32 + 2];
case 'q': /* query MD5 (128-bits/16-bits)
GENERATED ONLY IF query string exists! */
{
char md5[32 + 2];
*b = '\0';
strncatbuff(b, url_md5(md5, fil_complete), (tok == 'Q') ? 32 : 4);
b += strlen(b); // pointer à la fin
}
break;
htsbuff_catn(&sb, url_md5(md5, fil_complete), (tok == 'Q') ? 32 : 4);
} break;
case 'r':
case 'R': // protocol
*b = '\0';
strcatbuff(b, protocol_str[protocol]);
b += strlen(b); // pointer à la fin
htsbuff_cat(&sb, protocol_str[protocol]);
break;
/* Patch by Juan Fco Rodriguez to get the full query string */
@@ -1021,19 +987,17 @@ int url_savename(lien_adrfilsave *const afs,
char *d = strchr(fil_complete, '?');
if (d != NULL) {
strcatbuff(b, d);
b += strlen(b);
htsbuff_cat(&sb, d);
}
}
break;
}
} else
*b++ = *a++;
htsbuff_catc(&sb, *a++);
}
*b++ = '\0';
//
// Types prédéfinis
// predefined types
//
}


@@ -274,6 +274,28 @@ Please visit our Website: http://www.httrack.com
} \
} while(0)
/* Percent-encode the angle brackets of a string so it is safe to embed inside
an HTML comment (the default footer) or any other HTML context. A URL holding
"-->" would otherwise close the footer comment and inject markup (issue #165).
Raw '<' and '>' are not valid URL characters, so encoding them is harmless. */
static const char *html_inline_safe(const char *src, char *dst, size_t size) {
size_t i, j;
for(i = 0, j = 0; src[i] != '\0' && j + 4 < size; i++) {
const char c = src[i];
if (c == '<' || c == '>') {
dst[j++] = '%';
dst[j++] = '3';
dst[j++] = (c == '<') ? 'C' : 'E';
} else {
dst[j++] = c;
}
}
dst[j] = '\0';
return dst;
}
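To make the effect of the helper above concrete, here is a standalone sketch. The body of `html_inline_safe` is reproduced from the hunk so the example compiles on its own; `can_close_comment` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same logic as html_inline_safe() in the hunk above, reproduced here so the
   example is self-contained. Output is silently truncated when dst is full. */
static const char *html_inline_safe(const char *src, char *dst, size_t size) {
  size_t i, j;
  for (i = 0, j = 0; src[i] != '\0' && j + 4 < size; i++) {
    const char c = src[i];
    if (c == '<' || c == '>') {
      dst[j++] = '%';
      dst[j++] = '3';
      dst[j++] = (c == '<') ? 'C' : 'E';
    } else {
      dst[j++] = c;
    }
  }
  dst[j] = '\0';
  return dst;
}

/* Hypothetical helper: 1 if s still contains an HTML comment terminator. */
static int can_close_comment(const char *s) {
  return strstr(s, "-->") != NULL;
}
```

The `j + 4 < size` condition reserves room for a full three-byte escape plus the terminating NUL on every iteration, which is why the loop can write an escape without a second bounds check.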
/* Main parser */
int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
char catbuff[CATBUFF_SIZE];
@@ -510,6 +532,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
int valid_p = 0; // force to take p even if == 0
int ending_p = '\0'; // ending quote?
int archivetag_p = 0; // avoid multiple-archives with commas
int srcset_p = 0; // srcset="url1 480w, url2 2x": list of URLs
int unquoted_script = 0;
INSCRIPT inscript_state_pos_prev = inscript_state_pos;
@@ -719,13 +742,16 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (StringNotEmpty(opt->footer)) {
char BIGSTK tempo[1024 + HTS_URLMAXSIZE * 2];
char gmttime[256];
char BIGSTK safe_adr[HTS_URLMAXSIZE * 3 + 4];
char BIGSTK safe_fil[HTS_URLMAXSIZE * 3 + 4];
tempo[0] = '\0';
time_gmt_rfc822(gmttime);
strcatbuff(tempo, eol);
hts_template_format_str(tempo + strlen(tempo), sizeof(tempo) - strlen(tempo),
StringBuff(opt->footer),
jump_identification_const(urladr()), urlfil(), gmttime,
html_inline_safe(jump_identification_const(urladr()), safe_adr, sizeof(safe_adr)),
html_inline_safe(urlfil(), safe_fil, sizeof(safe_fil)), gmttime,
HTTRACK_VERSIONID, /* EOF */ NULL);
strcatbuff(tempo, eol);
//fwrite(tempo,1,strlen(tempo),fp);
@@ -1025,6 +1051,12 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if (strcmp(hts_detect[i], "archive") == 0) {
archivetag_p = 1;
}
/* srcset: a comma-list of candidate URLs, each split
out and rewritten below (#235, #236) */
else if (strcmp(hts_detect[i], "srcset") == 0
|| strcmp(hts_detect[i], "data-srcset") == 0) {
srcset_p = 1;
}
}
i++;
}
@@ -1790,6 +1822,14 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
html++; // skip # for usemap etc
}
}
srcset_next:
/* srcset: skip leading whitespace/commas before each candidate;
the skipped bytes flush verbatim below */
if (srcset_p) {
while(html < r->adr + r->size
&& (is_realspace(*html) || *html == ','))
INCREMENT_CURRENT_ADR(1);
}
eadr = html;
// do not flush after code if the codebase must be written first!
@@ -1819,6 +1859,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
if ((*eadr == quote && (!quoteinscript || *(eadr - 1) == '\\')) // end quote
|| (noquote && (*eadr == '\"' || *eadr == '\'')) // end at any quote
|| (!noquote && quote == '\0' && is_realspace(*eadr)) // unquoted href
|| srcset_p // whitespace ends a srcset candidate URL
) // if no special quote is awaited, or if the quote has been reached
ok = 0;
} else if (ending_p && (*eadr == ending_p))
@@ -1847,6 +1888,16 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
break; // \" or \' stop point
case '?': /*quote_adr=adr; */
break; // note the query position
case ',':
if (srcset_p) {
/* split only on a trailing comma; one inside the URL
(data: URI, CDN path) is kept, per the WHATWG algo */
const char *const n = eadr + 1;
if (n >= r->adr + r->size || is_space(*n) || *n == ',')
ok = 0;
}
break;
}
}
//}
@@ -3225,6 +3276,28 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
}
// adr=eadr-1; // ** skip
/* srcset candidate loop: skip the descriptor and comma, then
re-enter the capture for the next URL. Backward goto, not a loop:
the per-candidate body is this whole block. */
if (srcset_p && ok == 0) {
const char *const endp = r->adr + r->size;
const char *q = html;
while(q < endp && *q != '\0' && *q != ',' && *q != quote
&& *q != '<' && *q != '>' && (unsigned char) *q >= 32)
q++; // skip the descriptor
if (q < endp && *q == ',') {
q++;
while(q < endp && (is_realspace(*q) || *q == ','))
q++; // skip whitespace and empty candidates
if (q < endp && *q != '\0' && *q != ',' && *q != quote
&& *q != '<' && *q != '>' && (unsigned char) *q >= 32) {
INCREMENT_CURRENT_ADR(q - html); // keep the automate in sync
ok = 1;
goto srcset_next;
}
}
}
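The srcset handling in these hunks is threaded through the existing parser state machine, which makes the splitting rule hard to see. The rule itself can be sketched as a small standalone tokenizer. This is a simplified illustration under stated assumptions, not the parser's code: `srcset_next` and `is_ws` are hypothetical names, the input is a NUL-terminated attribute value, and the quote, `<`, `>` and control-character stops of the real loop are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII whitespace as used between srcset tokens. */
static int is_ws(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

/* Hypothetical tokenizer: find the next candidate URL in a srcset value.
   On success returns 1, sets *url and *len, and advances *cursor past the
   candidate's descriptor. Returns 0 when no candidate is left. */
static int srcset_next(const char **cursor, const char **url, size_t *len) {
  const char *p = *cursor;
  /* skip leading whitespace and empty candidates */
  while (*p != '\0' && (is_ws(*p) || *p == ','))
    p++;
  if (*p == '\0') {
    *cursor = p;
    return 0;
  }
  *url = p;
  /* the URL ends at whitespace, or at a comma that is trailing (followed by
     end of input, whitespace or another comma); other commas are kept */
  while (*p != '\0' && !is_ws(*p)) {
    if (*p == ',' && (p[1] == '\0' || is_ws(p[1]) || p[1] == ','))
      break;
    p++;
  }
  *len = (size_t) (p - *url);
  /* skip the descriptor ("480w", "2x") up to the separating comma */
  while (*p != '\0' && *p != ',')
    p++;
  *cursor = p;
  return 1;
}
```

Keeping a comma that is followed by a non-space character inside the URL is what lets `data:` URIs and comma-bearing CDN paths survive as a single candidate.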
/* We skipped bytes and skip the " : reset state */
/*if (inscript) {
inscript_state_pos = INSCRIPT_START;
@@ -3341,12 +3414,10 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
hts_log_print(opt, LOG_DEBUG, "engine: postprocess-html: %s%s",
urladr(), urlfil());
if (RUN_CALLBACK4(opt, postprocess, &cAddr, &cSize, urladr(), urlfil()) == 1) {
if (cAddr != TypedArrayElts(output_buffer)) {
hts_log_print(opt, LOG_DEBUG,
"engine: postprocess-html: callback modified data, applying %d bytes", cSize);
TypedArraySize(output_buffer) = 0;
TypedArrayAppend(output_buffer, cAddr, cSize);
}
hts_log_print(opt, LOG_DEBUG,
"engine: postprocess-html: callback modified data, applying %d bytes", cSize);
TypedArraySize(output_buffer) = 0;
TypedArrayAppend(output_buffer, cAddr, cSize);
}
}


@@ -123,41 +123,111 @@ static HTS_UNUSED void htssafe_compile_time_check_(void) {
(void) check_pointer;
}
/*
* Pointer-destination diagnostics for the buff() macros (GCC/Clang, C only).
*
* strcpybuff()/strcatbuff()/strncatbuff() bounds-check only when the
* destination is a sized char[] array (HTS_IS_CHAR_BUFFER). For a bare char*
* the capacity is unknown, so the macro silently falls back to plain
* strcpy()/strcat()/strncat() while still looking like a checked call.
*
* These stubs route that pointer case through __builtin_choose_expr() so the
* 'warning' attribute fires only at pointer-destination sites; array sites use
* the bounded *_safe_ helpers and stay quiet. The warning names the
* explicit-size replacement (strlcpybuff/strlcatbuff). Diagnostic only: no
* runtime or ABI change, built only on GCC/Clang in C mode. Other compilers
* (MSVC, ...) keep the previous behavior via the #else branches.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#if defined(__has_attribute)
#if __has_attribute(warning)
#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline, warning(msg)))
#endif
#endif
#ifndef HTS_BUFF_PTR_ATTR
/* 'warning' attribute unavailable: keep noinline so the migration can still
grep for these symbols, but no compile-time diagnostic is emitted. */
#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline))
#endif
HTS_BUFF_PTR_ATTR("strcpybuff() destination is a pointer (capacity unknown): "
"NOT bounds-checked; use strlcpybuff(dst, src, size)")
static char *strcpybuff_ptr_(char *dest, const char *src) {
return strcpy(dest, src);
}
HTS_BUFF_PTR_ATTR("strcatbuff() destination is a pointer (capacity unknown): "
"NOT bounds-checked; use strlcatbuff(dst, src, size)")
static char *strcatbuff_ptr_(char *dest, const char *src) {
return strcat(dest, src);
}
HTS_BUFF_PTR_ATTR("strncatbuff() destination is a pointer (capacity unknown): "
"NOT bounds-checked; use strlcatbuff(dst, src, size)")
static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
return strncat(dest, src, n);
}
#endif
/**
* Append at most N characters from "B" to "A".
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
* is assumed to be the capacity of this array.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#define strncatbuff(A, B, N) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
strncatbuff_ptr_((A), (B), (N)) )
#else
#define strncatbuff(A, B, N) \
( HTS_IS_NOT_CHAR_BUFFER(A) \
? strncat(A, B, N) \
: strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
#endif
/**
* Append characters of "B" to "A".
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
* is assumed to be the capacity of this array.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#define strcatbuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
strcatbuff_ptr_((A), (B)) )
#else
#define strcatbuff(A, B) \
( HTS_IS_NOT_CHAR_BUFFER(A) \
? strcat(A, B) \
: strncat_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
#endif
/**
* Copy characters from "B" to "A".
* If "A" is a char[] variable whose size is not sizeof(char*), then the size
* is assumed to be the capacity of this array.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
#define strcpybuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
strcpy_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__), \
strcpybuff_ptr_((A), (B)) )
#else
#define strcpybuff(A, B) \
( HTS_IS_NOT_CHAR_BUFFER(A) \
? strcpy(A, B) \
: strcpy_safe_(A, sizeof(A), B, \
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
"overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) )
#endif
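The array-versus-pointer dispatch used by the macros above can be reduced to a few lines. The following is a sketch under assumptions, for GCC/Clang in C mode only: `IS_CHAR_ARRAY`, `COPYBUFF`, `copy_bounded`, `copy_unbounded` and `last_path` are hypothetical names, the detection uses `__builtin_types_compatible_p` (as `htsbuff_must_be_array_` does) rather than the header's `HTS_IS_CHAR_BUFFER`, and the `warning` attribute is left out so the example compiles without diagnostics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1 when A is a true array, 0 when it is a pointer: an array type is never
   compatible with the type of a pointer to its first element. */
#define IS_CHAR_ARRAY(A) \
  (!__builtin_types_compatible_p(__typeof__(A), __typeof__(&(A)[0])))

static int last_path = 0; /* demo only: 1 = bounded path, 2 = unbounded path */

/* Bounded copy: capacity is known, overflow is caught (assert as stand-in). */
static char *copy_bounded(char *dst, size_t cap, const char *src) {
  const size_t n = strlen(src);
  last_path = 1;
  assert(n < cap);
  memcpy(dst, src, n + 1);
  return dst;
}

/* Pointer destination: capacity unknown, falls back to plain strcpy(). */
static char *copy_unbounded(char *dst, const char *src) {
  last_path = 2;
  return strcpy(dst, src);
}

/* Compile-time selection: only the chosen branch is evaluated. */
#define COPYBUFF(A, B)                                        \
  __builtin_choose_expr(IS_CHAR_ARRAY(A),                     \
                        copy_bounded((A), sizeof(A), (B)),    \
                        copy_unbounded((A), (B)))
```

Because the choice happens at compile time, attaching a diagnostic attribute to the pointer-branch function (as the header does with `warning`) flags exactly the call sites that are not bounds-checked.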
/**
* Append characters of "B" to "A", "A" having a maximum capacity of "S".
@@ -217,6 +287,88 @@ static HTS_INLINE HTS_UNUSED char* strcpy_safe_(char *const dest, const size_t s
return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1, exp, file, line);
}
/**
* htsbuff: a non-owning bounded string builder over a fixed buffer.
*
* Companion to the strcpybuff()/strcatbuff() macros for the common case of a
* cursor walking a buffer of known capacity (building a name into a fixed
* array, assembling a status line, etc.). It tracks the write position, bounds
* every write against the real capacity, and aborts on overflow (same contract
* as the *_safe_ helpers), so the error-prone manual "p += strlen(p)" dance
* goes away.
*
* Build one from an in-scope array with htsbuff_array() (capacity via sizeof,
* so pass an array, not a pointer), or from a pointer of known capacity with
* htsbuff_ptr(). The buffer is kept NUL-terminated; htsbuff_str() returns it.
*/
typedef struct {
char *buf; /* backing buffer (kept NUL-terminated) */
size_t cap; /* total capacity of buf, including the NUL */
size_t len; /* current length, excluding the NUL */
} htsbuff;
static HTS_INLINE HTS_UNUSED htsbuff htsbuff_ptr_(char *buf, size_t cap) {
htsbuff b;
b.buf = buf;
b.cap = cap;
b.len = 0;
assertf(cap != 0);
buf[0] = '\0';
return b;
}
/**
* Builder over the in-scope array ARR (capacity = sizeof(ARR)).
* On GCC/Clang this rejects a non-array (e.g. a char* pointer), whose sizeof
* would be the pointer size and silently wrong; use htsbuff_ptr() for pointers.
* On other compilers there is no such guard, so pass only true arrays there.
*/
#if (defined(__GNUC__) && !defined(__cplusplus))
/* 0 for an array, a -1 array-size compile error for a pointer. */
#define htsbuff_must_be_array_(A) \
(sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), typeof(&(A)[0]))]) - 1)
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
#else
#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR))
#endif
/** Builder over pointer P of known capacity N (N includes the NUL). */
#define htsbuff_ptr(P, N) htsbuff_ptr_((P), (N))
/** Append at most n characters of s (stopping at its NUL). Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s, size_t n) {
const size_t add = strnlen(s, n);
/* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The
maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so
'add < cap - len' cannot wrap the way 'len + add < cap' could. */
assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
memcpy(b->buf + b->len, s, add);
b->len += add;
b->buf[b->len] = '\0';
}
/** Append s. Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_cat(htsbuff *b, const char *s) {
htsbuff_catn(b, s, (size_t) -1);
}
/** Append a single character (including '\0' as data). Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_catc(htsbuff *b, char c) {
assertf__(1 < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
b->buf[b->len++] = c;
b->buf[b->len] = '\0';
}
/** Reset content to s. Aborts on overflow. */
static HTS_INLINE HTS_UNUSED void htsbuff_cpy(htsbuff *b, const char *s) {
b->len = 0;
htsbuff_catn(b, s, (size_t) -1);
}
/** Current NUL-terminated content. */
static HTS_INLINE HTS_UNUSED const char *htsbuff_str(const htsbuff *b) {
return b->buf;
}
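The builder contract described above (track the write position, check every append against the remaining capacity, keep the buffer NUL-terminated) can be shown with a standalone sketch. Names prefixed `sbuf` are hypothetical, and plain `assert()` stands in for `assertf__()`; unlike the real helper it disappears under `NDEBUG`, so this is an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of htsbuff: a non-owning view over a fixed buffer. */
typedef struct {
  char *buf;  /* backing buffer, kept NUL-terminated */
  size_t cap; /* total capacity, including the NUL */
  size_t len; /* current length, excluding the NUL */
} sbuf;

static sbuf sbuf_init(char *buf, size_t cap) {
  sbuf b;
  assert(buf != NULL && cap != 0);
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  buf[0] = '\0';
  return b;
}

/* Append at most n characters of s, stopping at its NUL. */
static void sbuf_catn(sbuf *b, const char *s, size_t n) {
  size_t add = 0;
  while (add < n && s[add] != '\0')
    add++;
  /* len < cap always holds, so cap - len cannot underflow */
  assert(add < b->cap - b->len);
  memcpy(b->buf + b->len, s, add);
  b->len += add;
  b->buf[b->len] = '\0';
}

static void sbuf_cat(sbuf *b, const char *s) {
  sbuf_catn(b, s, (size_t) -1);
}

static void sbuf_catc(sbuf *b, char c) {
  assert(1 < b->cap - b->len);
  b->buf[b->len++] = c;
  b->buf[b->len] = '\0';
}
```

Compared with the `strcpybuff(b, x); b += strlen(b);` pattern it replaces, the cursor and the capacity travel together, so no call site has to recompute how much room is left.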
#define malloct(A) malloc(A)
#define calloct(A,B) calloc((A), (B))
#define freet(A) do { if ((A) != NULL) { free(A); (A) = NULL; } } while(0)


@@ -193,7 +193,23 @@ HTSEXT_API void hts_mutexfree(htsmutex * mutex) {
HTSEXT_API void hts_mutexlock(htsmutex * mutex) {
assertf(mutex != NULL);
if (*mutex == HTSMUTEX_INIT) { /* must be initialized */
hts_mutexinit(mutex);
/* Initialize exactly once, even when several threads race to lock the same
mutex for the first time. Build our own object, then publish it with a
single atomic compare-and-swap; the threads that lose the race free the
object they built (issue #297). No static guard is needed, which keeps
this safe on Windows 2000 (no statically-initializable lock there). */
htsmutex created = HTSMUTEX_INIT;
hts_mutexinit(&created);
#ifdef _WIN32
if (InterlockedCompareExchangePointer((PVOID volatile *) mutex, created,
HTSMUTEX_INIT) != HTSMUTEX_INIT)
#else
if (!__sync_bool_compare_and_swap(mutex, HTSMUTEX_INIT, created))
#endif
{
hts_mutexfree(&created);
}
}
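The publish-with-compare-and-swap pattern described in the comment above can be isolated as follows. This is a sketch under assumptions: `lock_obj`, `lock_obj_new`, `lock_obj_free`, `publish_once` and the `live_objects` counter are hypothetical, the counter is not thread-safe and exists only to make the loser's cleanup observable, and only the GCC/Clang `__sync_bool_compare_and_swap` branch is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the heap object behind an htsmutex. */
typedef struct {
  int dummy;
} lock_obj;

static int live_objects = 0; /* demo-only counter, not thread-safe */

static lock_obj *lock_obj_new(void) {
  lock_obj *p = malloc(sizeof(*p));
  assert(p != NULL);
  p->dummy = 0;
  live_objects++;
  return p;
}

static void lock_obj_free(lock_obj *p) {
  free(p);
  live_objects--;
}

/* Try to publish 'created' into an empty slot with one atomic CAS.
   Returns 1 if this caller won; returns 0 if the slot was already set,
   in which case 'created' is freed by the loser. */
static int publish_once(lock_obj **slot, lock_obj *created) {
  if (__sync_bool_compare_and_swap(slot, (lock_obj *) NULL, created))
    return 1;
  lock_obj_free(created);
  return 0;
}
```

Every racing thread builds its own object first, so the slot only ever changes from empty to a fully initialized object; the cost is one wasted allocation per losing thread.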
assertf(*mutex != NULL);
#ifdef _WIN32


@@ -43,17 +43,23 @@ Please visit our Website: http://www.httrack.com
/* END specific definitions */
// libérer filters[0] pour insérer un élément dans filters[0]
#define HT_INSERT_FILTERS0 do {\
int i;\
if (*opt->filters.filptr > 0) {\
for(i = (*opt->filters.filptr)-1 ; i>=0 ; i--) {\
strcpybuff((*opt->filters.filters)[i+1],(*opt->filters.filters)[i]);\
}\
}\
(*opt->filters.filters)[0][0]='\0';\
(*opt->filters.filptr)++;\
assertf((*opt->filters.filptr) < opt->maxfilter); \
} while(0)
/* Per-slot capacity of the filters array, matching the slot stride allocated by
filters_init() in htscore.c (HTS_URLMAXSIZE * 2). */
#define HTS_FILTER_SLOT_SIZE (HTS_URLMAXSIZE * 2)
#define HT_INSERT_FILTERS0 \
do { \
int i; \
if (*opt->filters.filptr > 0) { \
for (i = (*opt->filters.filptr) - 1; i >= 0; i--) { \
strlcpybuff((*opt->filters.filters)[i + 1], \
(*opt->filters.filters)[i], HTS_FILTER_SLOT_SIZE); \
} \
} \
(*opt->filters.filters)[0][0] = '\0'; \
(*opt->filters.filptr)++; \
assertf((*opt->filters.filptr) < opt->maxfilter); \
} while (0)
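What the macro above does (shift every used slot up by one with a bounded copy, then clear slot 0) can be sketched as a plain function. This is an illustration with hypothetical names (`SLOT_SIZE`, `insert_slot0`) and a small fixed slot size; `snprintf` stands in for `strlcpybuff`, and the capacity check runs before the shift rather than after the increment as in the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SLOT_SIZE 32 /* hypothetical per-slot capacity for this sketch */

/* Free slot 0 by moving slots [0, *count) to [1, *count + 1).
   Iterating from the top down keeps each source intact until it is copied. */
static void insert_slot0(char slots[][SLOT_SIZE], int *count, int max) {
  int i;
  assert(*count >= 0 && *count + 1 < max); /* same bound as the macro's check */
  for (i = *count - 1; i >= 0; i--) {
    snprintf(slots[i + 1], SLOT_SIZE, "%s", slots[i]);
  }
  slots[0][0] = '\0';
  (*count)++;
}
```

After the call, slot 0 is an empty string ready to receive the new rule, which is why the call sites in the next hunk start each filter with a copy rather than an append.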
typedef struct htspair_t {
const char *tag;
@@ -707,17 +713,21 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
forbidden_url = 1;
opt->wizard = 2; // sauter tout le reste
break;
case 0: // interdire les mêmes liens: adr/fil
case 0: // forbid the same link: adr/fil
forbidden_url = 1;
HT_INSERT_FILTERS0; // insérer en 0
strcpybuff(_FILTERS[0], "-");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
if (*fil != '/')
strcatbuff(_FILTERS[0], "/");
strcatbuff(_FILTERS[0], fil);
HT_INSERT_FILTERS0; // insert at slot 0
{
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "-");
htsbuff_cat(&f, jump_identification_const(adr));
if (*fil != '/')
htsbuff_cat(&f, "/");
htsbuff_cat(&f, fil);
}
break;
case 1: // éliminer répertoire entier et sous rép: adr/path/ *
case 1: // forbid the whole directory and subdirs: adr/path/*
forbidden_url = 1;
{
size_t i = strlen(fil) - 1;
@@ -725,27 +735,34 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
while((fil[i] != '/') && (i > 0))
i--;
if (fil[i] == '/') {
HT_INSERT_FILTERS0; // insérer en 0
strcpybuff(_FILTERS[0], "-");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
htsbuff f;
HT_INSERT_FILTERS0; // insert at slot 0
f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "-");
htsbuff_cat(&f, jump_identification_const(adr));
if (*fil != '/')
strcatbuff(_FILTERS[0], "/");
strncatbuff(_FILTERS[0], fil, i);
if (_FILTERS[0][strlen(_FILTERS[0]) - 1] != '/')
strcatbuff(_FILTERS[0], "/");
strcatbuff(_FILTERS[0], "*");
htsbuff_cat(&f, "/");
htsbuff_catn(&f, fil, i);
if (f.len > 0 && f.buf[f.len - 1] != '/')
htsbuff_cat(&f, "/");
htsbuff_cat(&f, "*");
}
}
// ** ...
break;
case 2: // adresse adr*
case 2: // the whole address: adr*
forbidden_url = 1;
HT_INSERT_FILTERS0; // insérer en 0
strcpybuff(_FILTERS[0], "-");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
strcatbuff(_FILTERS[0], "*");
HT_INSERT_FILTERS0; // insert at slot 0
{
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "-");
htsbuff_cat(&f, jump_identification_const(adr));
htsbuff_cat(&f, "*");
}
break;
case 3: // ** TODO
@@ -777,54 +794,70 @@ static int hts_acceptlink_(httrackp * opt, int ptr,
break;
case 5: // autoriser répertoire entier et fils
if ((opt->seeker & 2) == 0) { // interdiction de monter
case 5: // allow the whole directory and its children
if ((opt->seeker & 2) == 0) { // not allowed to go up
size_t i = strlen(fil) - 1;
while((fil[i] != '/') && (i > 0))
i--;
if (fil[i] == '/') {
HT_INSERT_FILTERS0; // insérer en 0
strcpybuff(_FILTERS[0], "+");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
if (*fil != '/')
strcatbuff(_FILTERS[0], "/");
strncatbuff(_FILTERS[0], fil, i + 1);
strcatbuff(_FILTERS[0], "*");
HT_INSERT_FILTERS0; // insert at slot 0
{
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "+");
htsbuff_cat(&f, jump_identification_const(adr));
if (*fil != '/')
htsbuff_cat(&f, "/");
htsbuff_catn(&f, fil, i + 1);
htsbuff_cat(&f, "*");
}
}
} else { // then allow the domain
HT_INSERT_FILTERS0; // insert at slot 0
{
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "+");
htsbuff_cat(&f, jump_identification_const(adr));
htsbuff_cat(&f, "*");
}
} else { // autoriser domaine alors!!
HT_INSERT_FILTERS0; // insérer en 0 strcpybuff(filters[filptr],"+");
strcpybuff(_FILTERS[0], "+");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
strcatbuff(_FILTERS[0], "*");
}
break;
case 6: // same domain
HT_INSERT_FILTERS0; // insérer en 0 strcpybuff(filters[filptr],"+");
strcpybuff(_FILTERS[0], "+");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
strcatbuff(_FILTERS[0], "*");
HT_INSERT_FILTERS0; // insert at slot 0
{
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "+");
htsbuff_cat(&f, jump_identification_const(adr));
htsbuff_cat(&f, "*");
}
break;
//
case 7: // autoriser ce répertoire
{
size_t i = strlen(fil) - 1;
case 7: // allow this directory
{
size_t i = strlen(fil) - 1;
while((fil[i] != '/') && (i > 0))
i--;
if (fil[i] == '/') {
HT_INSERT_FILTERS0; // insérer en 0
strcpybuff(_FILTERS[0], "+");
strcatbuff(_FILTERS[0], jump_identification_const(adr));
while ((fil[i] != '/') && (i > 0))
i--;
if (fil[i] == '/') {
HT_INSERT_FILTERS0; // insert at slot 0
{
htsbuff f = htsbuff_ptr(_FILTERS[0], HTS_FILTER_SLOT_SIZE);
htsbuff_cpy(&f, "+");
htsbuff_cat(&f, jump_identification_const(adr));
if (*fil != '/')
strcatbuff(_FILTERS[0], "/");
strncatbuff(_FILTERS[0], fil, i + 1);
strcatbuff(_FILTERS[0], "*[file]");
htsbuff_cat(&f, "/");
htsbuff_catn(&f, fil, i + 1);
htsbuff_cat(&f, "*[file]");
}
}
}
break;
break;
case 50: // do nothing
break;


@@ -193,6 +193,7 @@ HTSEXT_API int structcheck(const char *path);
HTSEXT_API int structcheck_utf8(const char *path);
HTSEXT_API int dir_exists(const char *path);
HTSEXT_API void infostatuscode(char *msg, int statuscode);
HTSEXT_API const char *infostatuscode_const(int statuscode);
HTSEXT_API TStamp mtime_local(void);
HTSEXT_API void qsec2str(char *st, TStamp t);
HTSEXT_API char *int2char(strc_int2bytes2 * strc, int n);


@@ -1,5 +1,36 @@
#!/bin/bash
#
# minimalistic charset test
test "$(httrack -O /dev/null -#3 "iso-8859-1" "café")" == "café" || exit 1
# charset -> UTF-8 conversion (hts_convertStringToUTF8).
# -#3 <charset> <string> prints the string re-decoded from <charset> as UTF-8.
conv() {
test "$(httrack -O /dev/null -#3 "$1" "$2")" == "$3" || exit 1
}
# crash probe: malformed input must exit cleanly, not abort.
runs() {
httrack -O /dev/null -#3 "$1" "$2" >/dev/null 2>&1 || exit 1
}
# the source bytes below are UTF-8 (this file is UTF-8); "café" is 0x63 61 66 C3 A9.
# already UTF-8: identity
conv 'utf-8' 'café' 'café'
# bytes reinterpreted as latin-1: each input byte becomes one codepoint
conv 'iso-8859-1' 'café' 'café'
# windows-1252 is NOT latin-1: 0x80 is the euro sign, not U+0080. This is the
# case that actually exercises the cp1252 table (the 0x80-0x9F range).
conv 'windows-1252' $'\x80' '€'
# pure ASCII is charset-invariant
conv 'us-ascii' 'hello' 'hello'
# unknown charset: ASCII passes through unchanged, but non-ASCII input cannot be
# decoded and yields empty output (an error is printed to stderr).
conv 'no-such-charset-xyz' 'abc' 'abc'
test "$(httrack -O /dev/null -#3 'no-such-charset-xyz' 'café' 2>/dev/null)" == "" || exit 1
# malformed UTF-8 (lone continuation byte, truncated lead byte) must not crash
runs 'utf-8' $'\x80'
runs 'utf-8' $'\xc3'

tests/01_engine-cmdline.test Executable file

@@ -0,0 +1,71 @@
#!/bin/bash
#
# Offline command-line option tests (no network). The -F user-agent and -%X
# raw-header values used to be rejected past 126 / 256 bytes (#152); they are
# now bounded only by the general per-argument cap (HTS_CDLMAXSIZE). A value up
# to that cap is accepted on both the short (-F, -%X) and long (--user-agent,
# --headers) forms, and an over-cap value is refused cleanly rather than
# overrunning a fixed scratch buffer.
set -u
tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_cmdline.XXXXXX") || exit 1
trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM
echo '<html><body>hello</body></html>' >"$tmp/index.html"
# a string of N repeated 'A' characters
nchars() {
printf 'A%.0s' $(seq 1 "$1")
}
# crawl the local fixture with the given extra args; leaves the exit status in RC
run() {
local out="$1"
shift
rm -rf "$out"
mkdir -p "$out"
httrack "file://$tmp/index.html" -O "$out" --quiet -n "$@" >"$out/.log" 2>&1
RC=$?
}
# assert the value was accepted: clean exit and the fixture was mirrored
accepted() {
{ test "$RC" -eq 0 && test -n "$(find "$1" -type f -path '*/index.html' -print -quit)"; } ||
! echo "FAIL: $2 (exit $RC)" || exit 1
}
# assert the value was refused cleanly: a normal error exit, never a crash
# (a SIGABRT from an overflowed scratch buffer would surface as exit 134)
refused() {
{ test "$RC" -ne 0 && test "$RC" -ne 134; } ||
! echo "FAIL: $1 (exit $RC)" || exit 1
}
# a value past the old 126/256 caps but within the cap is accepted, on both the
# short and long form of each option
long=$(nchars 900)
run "$tmp/ua-s" -F "$long"
accepted "$tmp/ua-s" "#152: long -F user-agent rejected or crashed"
run "$tmp/ua-l" --user-agent "$long"
accepted "$tmp/ua-l" "#152: long --user-agent rejected or crashed"
run "$tmp/hd-s" "-%X" "X-A: $long"
accepted "$tmp/hd-s" "#152: long -%X header rejected or crashed"
run "$tmp/hd-l" --headers "X-B: $long"
accepted "$tmp/hd-l" "#152: long --headers rejected or crashed"
# a value just under the cap (>1000) must not overflow the long-form alias
# scratch buffer (the param[] copy in optalias_check)
run "$tmp/ua-n" --user-agent "$(nchars 1010)"
accepted "$tmp/ua-n" "#152: near-cap --user-agent overflowed the param[] buffer"
# a value over the cap is refused cleanly (graceful error, not a SIGABRT), on
# both forms
over=$(nchars 1100)
run "$tmp/ov-s" -F "$over"
refused "#152: over-cap -F not refused cleanly"
run "$tmp/ov-l" --user-agent "$over"
refused "#152: over-cap --user-agent not refused cleanly"
exit 0
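
The `nchars` helper above relies on a `printf` detail that is easy to miss: the format string is reused once per argument, and `%.0s` prints zero characters of that argument. A standalone sketch of the same construction, runnable without httrack:

```shell
# Same construction as the test's nchars helper: printf reuses its format once
# per argument, and %.0s consumes the argument while printing none of it.
nchars() {
  printf 'A%.0s' $(seq 1 "$1")
}

long=$(nchars 900)
echo "${#long}"     # 900
nchars 3; echo      # AAA
```

With GNU `seq`, a count of 0 still yields a single `A`, because `seq 1 0` prints nothing and `printf` then emits its format once with no arguments. The test only uses counts of 900 and above, so this edge case does not affect it.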


@@ -1,5 +1,49 @@
#!/bin/bash
#
# basic entities handling (with boggy entities handling)
test "$(httrack -O /dev/null -#6 "&foo;&nbsp;th&eacute;&amp;caf&#xe9;&#e9;&#x3082;&#12398;&#x306e;&#x3051;&#x59eb;")" == "&foo; thé&café&#e9;もののけ姫" || exit 1
# HTML entity unescaping (hts_unescapeEntitiesWithCharset).
# -#6 <string> prints the string with entities decoded (UTF-8 output).
ent() {
test "$(httrack -O /dev/null -#6 "$1")" == "$2" || exit 1
}
# crash probe: malformed input must exit cleanly, not abort.
runs() {
httrack -O /dev/null -#6 "$1" >/dev/null 2>&1 || exit 1
}
# named entities
ent '&amp;' '&'
ent '&lt;&gt;' '<>'
ent '&eacute;' 'é'
# numeric: decimal and hex
ent '&#65;&#66;' 'AB'
ent '&#x41;' 'A'
ent '&#xe9;' 'é'
# malformed numeric reference (decimal 'e9' has no digits) is left verbatim
ent '&#e9;' '&#e9;'
# U+0000 is not emitted; the reference is left verbatim
ent '&#0;' '&#0;'
# unknown entity is left verbatim
ent '&unknownentity;' '&unknownentity;'
# no entities: pass-through
ent 'plain text' 'plain text'
# decoding is a single pass: &amp;amp; -> &amp; (not &)
ent '&amp;amp;' '&amp;'
# KNOWN BUG: &nbsp; (U+00A0) decodes to a plain space (0x20), not C2 A0. The
# engine forces 160 -> 32 in htsencoding.c (FIXME hack). Locked here; if that
# hack is ever removed, update this to expect the two bytes C2 A0.
ent '&nbsp;' ' '
# overflowing numeric reference must not crash (value far above U+10FFFF)
runs '&#9999999999;'
# original compound case. NOTE: the space after '&foo;' is the &nbsp; known bug
# above (U+00A0 -> 0x20), not a real space in the source.
ent '&foo;&nbsp;th&eacute;&amp;caf&#xe9;&#e9;&#x3082;&#12398;&#x306e;&#x3051;&#x59eb;' '&foo; thé&café&#e9;もののけ姫'
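
The single-pass rule asserted above (`&amp;amp;` becomes `&amp;`, not `&`) can be illustrated with `sed`, whose `s///g` does not rescan text it has just substituted. This is only a sketch of the property, not HTTrack's decoder: `decode_basic` is a hypothetical helper that handles three named entities and nothing else. Decoding `&amp;` last matters, otherwise `&amp;lt;` would be decoded twice.

```shell
# decode_basic is a hypothetical helper covering only &lt; &gt; &amp;.
# &amp; is handled last so that its output is never decoded a second time.
decode_basic() {
  printf '%s\n' "$1" | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g'
}

decode_basic '&amp;amp;'         # &amp;
decode_basic '&lt;&gt;'          # <>
decode_basic '&amp;lt;'          # &lt;
decode_basic '&unknownentity;'   # &unknownentity;
```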

71
tests/01_engine-filter.test Executable file

@@ -0,0 +1,71 @@
#!/bin/bash
#
# wildcard filter engine (strjoker), the core of +/- include/exclude rules.
# -#0 <filter> <string> prints "<string> does match <filter>" or "... does NOT match ...".
match() {
test "$(httrack -O /dev/null -#0 "$1" "$2")" == "$2 does match $1" || exit 1
}
nomatch() {
test "$(httrack -O /dev/null -#0 "$1" "$2")" == "$2 does NOT match $1" || exit 1
}
# bare star matches everything
match '*' 'anything/at/all'
# prefix / suffix
match 'foo*' 'foobar'
nomatch 'foo*' 'xfoobar'
match '*.gif' 'a/b/c.gif'
# extension match is case-insensitive
match '*.GIF' 'a.gif'
# character classes
match '*[A-Z].txt' 'B.txt'
nomatch '*[A-Z].txt' 'b.txt'
match '*[0-9]' '5'
nomatch '*[0-9]' 'x'
# comma-separated class: both ranges are active, the comma is not matched
# literally and a char in neither range fails
match '*[A-Z,0-9]' 'Q'
match '*[A-Z,0-9]' '3'
nomatch '*[A-Z,0-9]' 'a'
# named groups: [file] stops at '/', [path] spans it
match '*[file].html' 'foo.html'
nomatch '*[file].html' 'foo/bar.html'
match '*[path]x' 'a/b/x'
# *[] means "nothing more after the star"
nomatch '*[]' 'abc'
# multiple stars
match '*foo*bar' 'foozbar'
# '?' is the query-string marker, not a single-char wildcard
nomatch 'a?c' 'abc'
# backslash escapes a metacharacter inside a class so it is matched literally.
# Quirk: the decoder also adds the backslash itself to the set, so '\X' matches
# both X and '\'. These assertions pin that behavior.
match '*[\*]' '*'
match '*[\*]' "\\"
nomatch '*[\*]' 'a'
match '*[\\]' "\\"
nomatch '*[\\]' 'a'
match '*[\[]' '['
match '*[\[]' "\\"
nomatch '*[\[]' 'a'
# A literal ']' cannot be a class member: the class parser stops at the first
# ']', escaped or not. So '*[\[\]]' does NOT mean "the [ or ] character" as the
# filter guide claims (GitHub #148); it parses as the class {'[','\'} followed
# by a trailing literal ']'. These assertions document the current (buggy)
# behavior so any future matcher fix is a deliberate, visible change.
nomatch '*[\[\]]' '[' # not matched, despite the docs
match '*[\[\]]' ']' # only via the empty class-match + trailing ']'
match '*[\[\]]' '[]' # one of {'[','\'} then the trailing ']'
nomatch '*[\[\]]' '[]x'
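
Two of the assertions above are surprising if you expect shell-glob semantics, so a contrast may help. The sketch below uses the shell's own `case` pattern matching, which is not the strjoker engine; `glob_match` is a hypothetical helper. In a shell glob `?` is a single-character wildcard and matching is case-sensitive, which is the opposite of what the filter test pins for HTTrack filters.

```shell
# glob_match is a hypothetical helper: match $2 against shell glob pattern $1.
glob_match() {
  case $2 in
    $1) echo "match" ;;
    *)  echo "no match" ;;
  esac
}

glob_match 'a?c' 'abc'        # match     (the filter test expects NOT match)
glob_match '*.GIF' 'a.gif'    # no match  (the filter test expects match)
glob_match 'foo*' 'xfoobar'   # no match  (same as the filter test)
```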


@@ -1,10 +1,36 @@
#!/bin/bash
#
# IDNA routine
test "$(httrack -O /dev/null -#4 "www.café.com")" == "www.xn--caf-dma.com" || exit 1
test "$(httrack -O /dev/null -#4 "www.もののけ姫-the-movie.com")" == "www.xn---the-movie-g63irla2z8297c.com" || exit 1
# IDNA / punycode encode (-#4) and decode (-#5). This code has a CVE history,
# so the edge cases below cover passthrough, round-trips, and malformed input.
# reverse IDNA
test "$(httrack -O /dev/null -#5 "www.xn--caf-dma.com")" == "www.café.com" || exit 1
test "$(httrack -O /dev/null -#5 "www.xn---the-movie-g63irla2z8297c.com")" == "www.もののけ姫-the-movie.com" || exit 1
enc() { test "$(httrack -O /dev/null -#4 "$1")" == "$2" || exit 1; }
dec() { test "$(httrack -O /dev/null -#5 "$1")" == "$2" || exit 1; }
# crash probe: malformed ACE input must exit cleanly, not abort.
runs() { httrack -O /dev/null -#5 "$1" >/dev/null 2>&1 || exit 1; }
# encode
enc 'www.café.com' 'www.xn--caf-dma.com'
enc 'www.もののけ姫-the-movie.com' 'www.xn---the-movie-g63irla2z8297c.com'
enc 'münchen.de' 'xn--mnchen-3ya.de'
# decode (reverse of the above)
dec 'www.xn--caf-dma.com' 'www.café.com'
dec 'www.xn---the-movie-g63irla2z8297c.com' 'www.もののけ姫-the-movie.com'
dec 'xn--mnchen-3ya.de' 'münchen.de'
# pure-ASCII hostnames are unchanged either way
enc 'plain.example.com' 'plain.example.com'
dec 'plain.example.com' 'plain.example.com'
enc 'a.b.c.example.org' 'a.b.c.example.org'
# an all-ASCII label (even one starting with the xn-- prefix) is passed through
# by the encoder untouched, since there is nothing to encode
enc 'xn--already-encoded.com' 'xn--already-encoded.com'
# an empty punycode payload decodes back to the bare xn-- label
dec 'xn--' 'xn--'
# malformed ACE payloads (invalid base-36, garbage) must not crash
runs 'xn--!!!'
runs 'xn--already-encoded.com'

27
tests/01_engine-mime.test Executable file

@@ -0,0 +1,27 @@
#!/bin/bash
#
# MIME type guessing from extension (get_httptype / give_mimext).
# -#2 <path> prints "<path> is '<mime>'" then "and its local type is '.<ext>'".
mime() {
test "$(httrack -O /dev/null -#2 "$1" | head -1)" == "$1 is '$2'" || exit 1
}
unknown() {
test "$(httrack -O /dev/null -#2 "$1" | head -1)" == "$1 is of an unknown MIME type" || exit 1
}
mime '/a/b.html' 'text/html'
mime '/a/b.htm' 'text/html'
mime '/x.css' 'text/css'
mime '/x.js' 'application/x-javascript'
mime '/x.png' 'image/png'
mime '/x.jpg' 'image/jpeg'
mime '/x.gif' 'image/gif'
mime '/x.txt' 'text/plain'
mime '/x.xml' 'application/xml'
mime '/x.pdf' 'application/pdf'
# no extension, or one not in the table
unknown '/noext'
unknown '/x.unknownext'
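
For readers who want the expectations above in one place, here is a minimal sketch of an extension-to-MIME lookup. It is not HTTrack's table (`get_httptype` / `give_mimext` are not reproduced here): `guess_mime` is a hypothetical helper that encodes only the mappings this test asserts, and it returns a non-zero status for anything else.

```shell
# guess_mime is a hypothetical helper encoding only the mappings asserted above.
guess_mime() {
  base=${1##*/}                  # strip directories so a dot in a directory name is ignored
  case $base in
    *.*) ext=${base##*.} ;;      # text after the last dot
    *)   ext= ;;                 # no extension at all
  esac
  case $ext in
    html|htm) echo 'text/html' ;;
    css)      echo 'text/css' ;;
    js)       echo 'application/x-javascript' ;;
    png)      echo 'image/png' ;;
    jpg)      echo 'image/jpeg' ;;
    gif)      echo 'image/gif' ;;
    txt)      echo 'text/plain' ;;
    xml)      echo 'application/xml' ;;
    pdf)      echo 'application/pdf' ;;
    *)        return 1 ;;        # unknown type
  esac
}

guess_mime '/a/b.html'                 # text/html
guess_mime '/noext' || echo 'unknown'  # unknown
```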

155
tests/01_engine-parse.test Executable file

@@ -0,0 +1,155 @@
#!/bin/bash
#
# Offline HTML parser tests: each section crawls a file:// fixture (no network)
# and checks which assets the parser captured and how it rewrote the links.
set -u
tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_parse.XXXXXX") || exit 1
trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM
# a minimal valid 1x1 GIF, reused for every referenced asset
gif() {
printf 'GIF89a\1\0\1\0\200\0\0\0\0\0\377\377\377!\371\4\1\0\0\0\0,\0\0\0\0\1\0\1\0\0\2\2D\1\0;' >"$1"
}
# crawl <fixture-html> into <out> with link rewriting on, no extra fetching
crawl() {
local html="$1" out="$2"
rm -rf "$out"
mkdir -p "$out"
httrack "file://$html" -O "$out" --quiet --near -n >"$out/.log" 2>&1
}
# assert a file with the given basename was saved somewhere under <out>
found() {
test -n "$(find "$2" -type f -name "$1" -print -quit)" ||
! echo "FAIL: expected '$1' to be downloaded under $2" || exit 1
}
# assert NO file with the given basename was saved (e.g. a descriptor token must
# not be mistaken for a URL)
notfound() {
test -z "$(find "$2" -type f -name "$1" -print -quit)" ||
! echo "FAIL: '$1' should not have been downloaded under $2" || exit 1
}
# the mirrored fixture page (under "file/"), not HTTrack's own landing index
savedhtml() {
find "$1" -type f -path '*/file/*' -name index.html -print -quit
}
# srcset on <img> and <source> (#235, #236): every candidate captured and
# rewritten, descriptors preserved, following attributes left intact.
site="$tmp/srcset"
mkdir -p "$site"
for f in a b c d e f g h i j v dz; do gif "$site/$f.gif"; done
# unquoted heredoc: $site expands in the absolute-URL candidate
cat >"$site/index.html" <<EOF
<html><body>
<img src="a.gif" srcset="b.gif 480w, c.gif 800w">
<picture><source srcset="d.gif 1x, c.gif 2x"><img src="a.gif"></picture>
<img srcset="e.gif, f.gif">
<img srcset="g.gif 2x" alt="trailing attr after srcset">
<img srcset=" h.gif 2x , i.gif ">
<video><source src="v.gif"></video>
<img srcset="file://$site/j.gif 2x">
<img srcset="data:image/gif;base64,R0lGODlhAQABAAAAACw= 1x, dz.gif 2x">
<img srcset="">
<a href="a.gif">plain link still works</a>
</body></html>
EOF
out="$tmp/srcset-out"
crawl "$site/index.html" "$out"
# every candidate downloads, incl. unique tails (catches first-only parsing),
# whitespace-padded (h,i), <source src> (v), absolute (j), post-data: URI (dz)
for f in a b c d e f g h i j v dz; do found "$f.gif" "$out"; done
# the width/density descriptors are not URLs and must not be fetched
notfound "480w" "$out"
notfound "800w" "$out"
notfound "2x" "$out"
saved=$(savedhtml "$out")
test -n "$saved" || ! echo "FAIL: saved index.html not found" || exit 1
# descriptors must survive the rewrite (no "b.gif 480w" mangled into a path)
grep -Eq 'srcset="[^"]*480w[^"]*800w' "$saved" ||
! echo "FAIL: srcset width descriptors lost/reordered in rewritten HTML" || exit 1
grep -Eq 'srcset="[^"]*1x[^"]*2x' "$saved" ||
! echo "FAIL: srcset density descriptors lost/reordered in rewritten HTML" || exit 1
# the descriptor-less comma form keeps both candidates and the separator verbatim
grep -Eq 'srcset="e\.gif, f\.gif"' "$saved" ||
! echo "FAIL: comma-separated srcset without descriptors was altered" || exit 1
# an attribute following srcset in the same tag must be left intact
grep -q 'alt="trailing attr after srcset"' "$saved" ||
! echo "FAIL: srcset swallowed a following attribute" || exit 1
# a comma inside a URL (data: URI, CDN path) is part of the URL, not a split
# point (WHATWG): the data: URI stays verbatim; the next candidate (dz) downloads
grep -Fq 'data:image/gif;base64,R0lGODlhAQABAAAAACw= 1x' "$saved" ||
! echo "FAIL: a comma inside a data: URI srcset candidate was mis-split" || exit 1
# real rewrite, not passthrough: the absolute file:// candidate becomes local
# (a flat fixture can't show this; the footer comment's file:// is not in srcset)
grep -Eq 'srcset="j\.gif 2x"' "$saved" ||
! echo "FAIL: absolute file:// srcset URL was not rewritten to a local link" || exit 1
! grep -Eq 'srcset="[^"]*file://' "$saved" ||
! echo "FAIL: a file:// URL survived inside a rewritten srcset attribute" || exit 1
# xlink:href (#298) and CSS background-image (#237): detected and rewritten to
# local. background-image is covered in both an external <style> block and an
# inline style attribute, with the URL unquoted, double-quoted and single-quoted
# (the quote style is preserved on rewrite). No-detect attributes (title, alt,
# ...) are left untouched. Asserted by rewrite (deterministic), not download.
# data-* (#201/#203) is omitted: its detection is currently nondeterministic and
# can't be locked yet.
site2="$tmp/attrs"
mkdir -p "$site2"
for f in xl ibg ibgs cex cexd cexs tt; do gif "$site2/$f.gif"; done
cat >"$site2/index.html" <<EOF
<html><head><style>
.a { background-image: url(file://$site2/cex.gif); }
.b { background-image: url("file://$site2/cexd.gif"); }
.c { background-image: url('file://$site2/cexs.gif'); }
</style></head><body>
<a xlink:href="file://$site2/xl.gif">xlink:href (#298)</a>
<div style="background-image:url(file://$site2/ibg.gif)"></div>
<div style="background-image:url('file://$site2/ibgs.gif')"></div>
<span title="file://$site2/tt.gif">excluded attribute</span>
</body></html>
EOF
out2="$tmp/attrs-out"
crawl "$site2/index.html" "$out2"
saved2=$(savedhtml "$out2")
test -n "$saved2" || ! echo "FAIL: saved attrs page not found" || exit 1
# detected attributes: the absolute URL is rewritten to a local link
grep -Eq 'xlink:href="xl\.gif"' "$saved2" ||
! echo "FAIL #298: xlink:href not detected/rewritten" || exit 1
# #237 external <style> block, each quoting form, quote style preserved
grep -Eq 'url\(cex\.gif\)' "$saved2" ||
! echo "FAIL #237: unquoted background-image in <style> not rewritten" || exit 1
grep -Eq 'url\("cexd\.gif"\)' "$saved2" ||
! echo "FAIL #237: double-quoted background-image in <style> not rewritten" || exit 1
grep -Eq "url\('cexs\.gif'\)" "$saved2" ||
! echo "FAIL #237: single-quoted background-image in <style> not rewritten" || exit 1
# #237 inline style attribute, unquoted and single-quoted url()
grep -Eq 'style="background-image:url\(ibg\.gif\)"' "$saved2" ||
! echo "FAIL #237: inline unquoted background-image not rewritten" || exit 1
grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
! echo "FAIL #237: inline single-quoted background-image not rewritten" || exit 1
# no file:// URL survived inside any rewritten background-image
! grep -Eq 'background-image:[^;"]*file://' "$saved2" ||
! echo "FAIL #237: a file:// URL survived inside a rewritten background-image" || exit 1
# excluded attribute: title is on the no-detect list, so its value is left as-is
grep -q 'title="file://' "$saved2" ||
! echo "FAIL: a no-detect attribute (title) was wrongly rewritten" || exit 1
exit 0
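
The assertions in this file all use the form `cond || ! echo "FAIL: ..." || exit 1`. The negation is what makes it work: `echo` succeeds, so a plain `|| echo ... || exit 1` would stop after printing and never reach `exit 1`. A standalone sketch, where `demo` is a hypothetical wrapper that runs the idiom in a subshell so the calling shell survives:

```shell
# demo is a hypothetical wrapper: run a command inside a subshell and apply the
# tests' "cond || ! echo ... || exit 1" idiom to it.
demo() {
  (
    "$@" || ! echo "FAIL: $*" || exit 1
    echo "passed: $*"
  )
}

demo true                        # prints "passed: true"
demo false || echo "status $?"   # prints "FAIL: false", then "status 1"
```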


@@ -1,9 +1,26 @@
#!/bin/bash
#
# simplify engine
test "$(httrack -O /dev/null -#1 ./foo/bar/)" == "simplified=foo/bar/" || exit 1
test "$(httrack -O /dev/null -#1 ./foo/bar)" == "simplified=foo/bar" || exit 1
test "$(httrack -O /dev/null -#1 ./foo/./bar)" == "simplified=foo/bar" || exit 1
test "$(httrack -O /dev/null -#1 ./foo/bar/.././tmp/foobar)" == "simplified=foo/tmp/foobar" || exit 1
test "$(httrack -O /dev/null -#1 ./foo/bar/.././tmp/foobar/../foobaz)" == "simplified=foo/tmp/foobaz" || exit 1
# path simplify engine (fil_simplifie): collapses ./ and ../ segments.
simp() {
test "$(httrack -O /dev/null -#1 "$1")" == "simplified=$2" || exit 1
}
simp './foo/bar/' 'foo/bar/'
simp './foo/bar' 'foo/bar'
simp './foo/./bar' 'foo/bar'
simp './foo/bar/.././tmp/foobar' 'foo/tmp/foobar'
simp './foo/bar/.././tmp/foobar/../foobaz' 'foo/tmp/foobaz'
# single '..' collapses one segment
simp './a/../b' 'b'
simp './a/b/../../c' 'c'
# repeated './' is squeezed
simp './a/./././b' 'a/b'
# leading '..' that would go above the root is discarded, per RFC 3986 §5.2.4
simp './a/../../b' 'b'
# empty segments ('//') are not dot-segments and are preserved, per RFC 3986
simp 'a//b' 'a//b'
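
The behavior these cases pin down can be sketched in plain shell. This is an illustration under stated assumptions, not `fil_simplifie` itself: `simplify` is a hypothetical function that walks the path one segment at a time, drops `.`, pops the previous segment on `..` (discarding it when there is nothing left to pop), and keeps every other segment, including empty ones. It uses global variables because POSIX sh has no `local`.

```shell
# simplify is a hypothetical re-implementation for illustration only.
simplify() {
  rest=$1
  out=
  while :; do
    # Peel off the next segment; "last" marks the final one (no '/' after it).
    case $rest in
      */*) seg=${rest%%/*}; rest=${rest#*/}; last=0 ;;
      *)   seg=$rest; rest=; last=1 ;;
    esac
    case $seg in
      .) ;;                          # "./" contributes nothing
      ..)
        out=${out%/}                 # drop the separator after the previous segment
        case $out in
          */*) out=${out%/*}/ ;;     # pop one segment
          *)   out= ;;               # nothing left: a climb above the root is discarded
        esac
        ;;
      *)
        if [ "$last" -eq 1 ]; then out=$out$seg; else out=$out$seg/; fi
        ;;
    esac
    [ "$last" -eq 1 ] && break
  done
  printf '%s\n' "$out"
}

simplify './foo/bar/.././tmp/foobar/../foobaz'   # foo/tmp/foobaz
simplify './a/../../b'                           # b
simplify 'a//b'                                  # a//b
```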

34
tests/01_engine-strsafe.test Executable file

@@ -0,0 +1,34 @@
#!/bin/bash
#
# htssafe.h bounded string operations (driven by 'httrack -#8').
# Success path: every bounded op (strcpybuff/strcatbuff/strncatbuff/strlcpybuff)
# must behave correctly. Like the other -# debug modes, a trailing token is
# required (a bare '-#8' falls through to the usage screen).
out=$(httrack -#8 run)
test $? -eq 0 || exit 1
test "$out" == "strsafe: OK" || exit 1
# Overflow path: an over-capacity write into a sized buffer must be caught by
# the bounded macro and abort the process, not be silently truncated/completed.
# Assert the htssafe abort signature specifically, so the test cannot pass for
# an unrelated reason (e.g. the -#8 mode being gone and falling through to the
# usage screen, which also exits non-zero).
err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1)
case "$err" in
*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;;
*"overflow while copying"*) ;;
*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;;
esac
# Same guarantee for the htsbuff builder. The source is exactly the buffer
# capacity (4 bytes into a 4-byte buffer), so this also pins the boundary: a
# '<=' off-by-one in the capacity check would let it through (and print "NOT
# aborted"). Match the specific htsbuff abort message, not just any assert.
err=$(httrack -#8 overflow-buff "abcd" 2>&1)
case "$err" in
*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;;
*"htsbuff append overflow"*) ;;
*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;;
esac

62
tests/02_update-cache.test Executable file

@@ -0,0 +1,62 @@
#!/bin/bash
#
# Update path: re-mirroring a site reads the cache (cache_readex) to decide what
# is up to date -- a path the one-shot crawl tests never exercise. Offline
# (file://), so it always runs.
#
# 1. mirror the site -> the cache (hts-cache/new.zip) must be written.
# 2. re-mirror unchanged -> the cache-read pass must complete clean
# (guards against a crash/abort/error in cache_readex).
# 3. change a source file, re-mirror -> the update must pick up the new content
# (guards the update decision that reads the cached metadata).
set -eu
site=$(mktemp -d)
out=$(mktemp -d)
trap 'rm -rf "$site" "$out"' EXIT
cat >"$site/index.html" <<EOF
<a href="a.html">a</a> <a href="sub/b.html">b</a>
EOF
echo 'OLDCONTENT' >"$site/a.html"
mkdir -p "$site/sub"
echo '<p>bbb</p>' >"$site/sub/b.html"
url="file://$site/index.html"
# count Error: lines in the log (grep -c exits 1 on zero matches: guard it)
errors() { grep -ciE '^[0-9:]*[[:space:]]Error:' "$out/hts-log.txt" || true; }
# 1. fresh mirror writes the cache
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test -e "$out/hts-cache/new.zip" || {
echo "no cache was written" >&2
exit 1
}
# 2. re-mirror unchanged: the update reads the cache and must complete cleanly
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test "$(errors)" = 0 || {
echo "update (unchanged) reported errors" >&2
exit 1
}
for suffix in a.html sub/b.html; do
find "$out" -path "*/$suffix" | grep -q . || {
echo "missing $suffix after update" >&2
exit 1
}
done
# 3. change a source file: the update must pick up the new content
sleep 1
echo 'NEWCONTENT' >"$site/a.html"
httrack "$url" -O "$out" -q -%v0 -r3 >/dev/null 2>&1
test "$(errors)" = 0 || {
echo "update (changed) reported errors" >&2
exit 1
}
grep -q NEWCONTENT "$(find "$out" -path '*/a.html')" || {
echo "update did not pick up the changed source" >&2
exit 1
}
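
The `|| true` guard in `errors()` is needed because `grep -c` prints `0` and exits with status 1 when nothing matches; under `set -e`, an unguarded `n=$(grep -c ...)` would end the script at the point where the count is exactly what the test hopes for. The sketch below shows both behaviors without httrack. The log lines are made up for illustration and only assumed to resemble `hts-log.txt`; `count_unguarded` and `count_guarded` are hypothetical names.

```shell
log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Made-up sample lines; the real hts-log.txt layout is assumed, not quoted.
printf '%s\n' '12:00:00 Info: engine started' '12:00:01 Warning: slow link' >"$log"

count_unguarded() { grep -ciE '^[0-9:]*[[:space:]]Error:' "$log"; }
count_guarded()   { grep -ciE '^[0-9:]*[[:space:]]Error:' "$log" || true; }

count_unguarded || echo "grep exited with status $?"   # prints 0, then "grep exited with status 1"
n=$(count_guarded)
echo "errors=$n"                                       # errors=0

printf '%s\n' '12:00:02 Error: made-up failure' >>"$log"
echo "errors=$(count_guarded)"                         # errors=1
```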


@@ -9,6 +9,26 @@ TESTS_ENVIRONMENT += HTTPS_SUPPORT=$(HTTPS_SUPPORT)
TESTS_ENVIRONMENT += top_srcdir=$(top_srcdir)
TEST_EXTENSIONS = .test
TESTS = 00_runnable.test 01_engine-charset.test 01_engine-entities.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
TESTS = \
00_runnable.test \
01_engine-charset.test \
01_engine-cmdline.test \
01_engine-entities.test \
01_engine-filter.test \
01_engine-hashtable.test \
01_engine-idna.test \
01_engine-mime.test \
01_engine-parse.test \
01_engine-simplify.test \
01_engine-strsafe.test \
02_manpage-regen.test \
02_update-cache.test \
10_crawl-simple.test \
11_crawl-cookies.test \
11_crawl-idna.test \
11_crawl-international.test \
11_crawl-longurl.test \
11_crawl-parsing.test \
12_crawl_https.test
CLEANFILES = check-network_sh.cache


@@ -472,7 +472,7 @@ TESTS_ENVIRONMENT = PATH=$(top_builddir)/src$(PATH_SEPARATOR)$$PATH \
ONLINE_UNIT_TESTS=$(ONLINE_UNIT_TESTS) \
HTTPS_SUPPORT=$(HTTPS_SUPPORT) top_srcdir=$(top_srcdir)
TEST_EXTENSIONS = .test
TESTS = 00_runnable.test 01_engine-charset.test 01_engine-entities.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
TESTS = 00_runnable.test 01_engine-charset.test 01_engine-cmdline.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-parse.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
CLEANFILES = check-network_sh.cache
all: all-am