Return HTTP status reason phrases via a const-returning switch

infostatuscode() was a ~60-case switch, each arm strcpybuff()-ing a literal into the caller's char* msg: 42 unchecked pointer-destination copies of static data.

Keep the same O(1) switch dispatch but have it return the phrase instead of copying -- new public infostatuscode_const(int) -> const char* (or NULL) -- and do the copy in a thin wrapper.

infostatuscode() preserves exact behavior: a known code overwrites msg; an unknown code keeps any caller-provided message, else writes "Unknown error". The single remaining copy uses strlcpybuff with the documented 64-byte minimum (longest phrase is 31; all callers pass >= 80).

Drops 42 pointer-destination warnings (htslib.c 56 -> 14; tree 179 -> 137). No dispatch regression: it stays a switch (jump table), no allocation, no per-call scan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
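
The split described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the patch: `status_phrase` and `status_message` are hypothetical stand-ins for `infostatuscode_const()` and `infostatuscode()`, the table is cut down to four codes, and the explicit `msg_size` parameter is an assumption (the message above says the real wrapper relies on a documented 64-byte minimum instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated stand-in for infostatuscode_const(): returns a
   pointer to a static literal, or NULL when the code is not known. */
const char *status_phrase(int statuscode) {
  switch (statuscode) {
  case 200:
    return "OK";
  case 301:
    return "Moved Permanently";
  case 404:
    return "Not Found";
  case 500:
    return "Internal Server Error";
  default:
    return NULL;
  }
}

/* Hypothetical stand-in for the infostatuscode() wrapper. A known code
   overwrites msg; an unknown code keeps a non-empty caller message and
   otherwise writes "Unknown error". msg_size is an assumption made for this
   sketch so that the single copy can be bounded explicitly. */
void status_message(char *msg, size_t msg_size, int statuscode) {
  const char *phrase = status_phrase(statuscode);
  size_t len;

  assert(msg != NULL && msg_size > 0);
  if (phrase == NULL) {
    if (msg[0] != '\0')
      return; /* keep the caller-provided message */
    phrase = "Unknown error";
  }
  /* The only copy left: bounded, truncating, always NUL-terminated. */
  len = strlen(phrase);
  if (len >= msg_size)
    len = msg_size - 1;
  memcpy(msg, phrase, len);
  msg[len] = '\0';
}
```

In this shape, a caller that only needs to read the phrase can use the returned pointer directly and skip the copy; only callers that need a writable buffer go through the wrapper.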
Merge pull request #332 from xroche/cleanup/url_savename-htsbuff
2026-06-14 22:33:54 +03:00 · 2026-06-14 13:14:23 +02:00 · 2026-06-14 13:01:32 +02:00 · 2026-06-14 12:59:29 +02:00 · 2026-06-14 12:58:42 +02:00 · 2026-06-14 12:55:17 +02:00
33 changed files with 1146 additions and 347 deletions
--- a/.clang-format
+++ b/.clang-format
@@ -0,0 +1,27 @@
+# clang-format 19 config for the HTTrack C engine.
+#
+# IMPORTANT: this is applied to TOUCHED LINES ONLY (via git-clang-format / the
+# CI format check). The engine was originally formatted by GNU indent / by hand
+# and does NOT round-trip through clang-format, so a whole-tree reformat is
+# intentionally never done. Format the lines you change; leave the rest.
+#
+# Reverse-engineered from src/*.c: 2-space indent, no tabs, 80 columns, pointers
+# bound to the name (char *x), attached braces, un-indented case labels, and a
+# space after C-style casts ((int) x). Most of that is LLVM's defaults; the
+# lines below are the deliberate deviations.
+
+BasedOnStyle: LLVM
+
+# Engine specifics / deviations from LLVM:
+SpaceAfterCStyleCast: true   # "(int) x", overwhelmingly dominant (542 vs 7)
+SortIncludes: false          # C include order can be significant; never reorder
+IncludeBlocks: Preserve      # do not merge/reflow include groups
+
+# Stated explicitly for robustness against base-style drift (these match LLVM):
+IndentWidth: 2
+UseTab: Never
+ColumnLimit: 80
+PointerAlignment: Right
+IndentCaseLabels: false
+SpaceBeforeParens: ControlStatements
+AllowShortIfStatementsOnASingleLine: Never
--- a/.githooks/README.md
+++ b/.githooks/README.md
@@ -0,0 +1,35 @@
+# Git hooks
+
+Versioned hooks for this repo. Enable them once per clone:
+
+```sh
+git config core.hooksPath .githooks
+```
+
+## pre-commit: auto-format changed C lines
+
+Runs `git-clang-format` (clang-format 19, using the repo `.clang-format`) on the
+**staged lines only** and re-stages the result, so every commit is
+clang-format-clean and the CI `format` check passes. It never reformats the
+whole tree, only the lines you changed.
+
+- Disable for a single commit: `HTTRACK_NO_AUTOFORMAT=1 git commit ...`
+- If clang-format 19 isn't installed, the hook skips silently (CI still
+  enforces). Install it with your distro's `clang-format-19`, or from
+  apt.llvm.org.
+- If a file has *both* staged and unstaged changes, the hook does not
+  auto-mutate it (that would commit the unstaged part); it instead reports
+  whether its staged lines need formatting and asks you to stage/stash the rest.
+
+### noexec working trees
+
+Git executes the hook directly, so if your working tree is on a `noexec` mount
+git cannot run `.githooks/pre-commit`. Point `core.hooksPath` at a copy on an
+exec filesystem instead:
+
+```sh
+mkdir -p ~/.httrack-hooks && cp .githooks/pre-commit ~/.httrack-hooks/
+chmod +x ~/.httrack-hooks/pre-commit
+git config core.hooksPath ~/.httrack-hooks
+```
+</content>
--- a/.githooks/pre-commit
+++ b/.githooks/pre-commit
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+#
+# Auto-format the staged C lines with clang-format (touched lines only), then
+# re-stage them, so commits stay clang-format-clean and CI's format check passes.
+#
+# Enable once per clone:  git config core.hooksPath .githooks
+# Skip for one commit:    HTTRACK_NO_AUTOFORMAT=1 git commit ...
+#
+# Matches the CI gate (.clang-format, clang-format 19). It only ever touches the
+# lines a commit changes; it never reformats the whole tree.
+
+set -euo pipefail
+
+[ "${HTTRACK_NO_AUTOFORMAT:-}" = "1" ] && exit 0
+
+# Staged C/H files (added/copied/modified/renamed).
+mapfile -t files < <(git diff --cached --name-only --diff-filter=ACMR -- '*.c' '*.h')
+[ "${#files[@]}" -eq 0 ] && exit 0
+
+# Locate clang-format 19 and the git driver; if absent, skip (CI is the backstop).
+cf=""
+for c in clang-format-19 clang-format; do
+    if command -v "$c" >/dev/null 2>&1; then
+        case "$("$c" --version)" in *"version 19."*)
+            cf="$c"
+            break
+            ;;
+        esac
+    fi
+done
+gcf=""
+for g in git-clang-format-19 git-clang-format; do
+    command -v "$g" >/dev/null 2>&1 && {
+        gcf="$g"
+        break
+    }
+done
+if [ -z "$cf" ] || [ -z "$gcf" ]; then
+    echo "pre-commit: clang-format 19 not found; skipping auto-format (CI still checks)." >&2
+    exit 0
+fi
+
+# Files that are staged AND also have unstaged changes: re-staging them would
+# pull in the unstaged work, so don't auto-mutate. Check instead and let the
+# author resolve it.
+partial=()
+for f in "${files[@]}"; do
+    if ! git diff --quiet -- "$f"; then partial+=("$f"); fi
+done
+
+if [ "${#partial[@]}" -ne 0 ]; then
+    d="$("$gcf" --binary "$cf" --style=file --staged --diff --extensions c,h || true)"
+    case "$d" in
+    "" | "no modified files to format" | *"did not modify any files"*)
+        exit 0
+        ;; # staged lines already clean
+    *)
+        echo "pre-commit: these files have both staged and unstaged changes, so" >&2
+        echo "auto-format was skipped to avoid committing unstaged work:" >&2
+        printf '  %s\n' "${partial[@]}" >&2
+        echo "Their staged lines need formatting. Stage the rest (or stash it)," >&2
+        echo "or run: $gcf --binary $cf --staged" >&2
+        exit 1
+        ;;
+    esac
+fi
+
+# Clean-staged files: format the staged lines in the working tree, then re-stage.
+"$gcf" --binary "$cf" --style=file --staged --extensions c,h >/dev/null || true
+git add -- "${files[@]}"
+exit 0
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -81,7 +81,65 @@ jobs:

      # Lint the scripts we maintain; the legacy scripts are a separate cleanup.
      - name: shellcheck
-        run: shellcheck man/makeman.sh tools/mkdeb.sh tests/*.test tests/check-network.sh
+        run: shellcheck man/makeman.sh tools/mkdeb.sh .githooks/pre-commit tests/*.test tests/check-network.sh

      - name: shfmt
-        run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh
+        run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh .githooks/pre-commit
+
+  # Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
+  # (it was shaped by an old Visual Studio formatter) and does not round-trip,
+  # so we never reformat the whole tree -- only the lines a PR touches.
+  format:
+    name: format (clang-format-19, changed lines)
+    if: github.event_name == 'pull_request'
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Install clang-format 19 (pinned, from apt.llvm.org)
+        run: |
+          set -euo pipefail
+          # ubuntu-24.04's native clang-format is 18; pin 19 to match local dev.
+          wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
+            | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
+          echo "deb http://apt.llvm.org/noble/ llvm-toolchain-noble-19 main" \
+            | sudo tee /etc/apt/sources.list.d/llvm-19.list >/dev/null
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends clang-format-19
+          # git-clang-format driver, pinned to an immutable release tag (not a
+          # moving branch) since we curl and then execute it.
+          sudo curl -fsSL -o /usr/local/bin/git-clang-format \
+            https://raw.githubusercontent.com/llvm/llvm-project/llvmorg-19.1.7/clang/tools/clang-format/git-clang-format
+          sudo chmod 0755 /usr/local/bin/git-clang-format
+          clang-format-19 --version
+
+      - name: Check formatting of changed lines
+        run: |
+          set -euo pipefail
+          git fetch --no-tags origin \
+            "+refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}"
+          base="origin/${{ github.base_ref }}"
+          set +e
+          diff="$(git clang-format --binary clang-format-19 --style=file \
+                    --diff --extensions c,h "$base")"
+          rc=$?
+          set -e
+          # Classify by output first: a non-empty diff means "not clean",
+          # regardless of the driver's exit convention (the release-tag driver
+          # exits 0 and signals via stdout; some packaged drivers exit 1 on a
+          # diff). A nonzero exit with clean output is a real checker error.
+          case "$diff" in
+            "" | "no modified files to format" | *"did not modify any files"*)
+              if [ "$rc" -ne 0 ]; then
+                echo "::error::git clang-format failed (exit $rc): checker error."
+                exit 1
+              fi
+              echo "Formatting OK: changed C lines are clang-format-clean." ;;
+            *)
+              echo "$diff"
+              echo "::error::Changed C lines are not clang-format-clean."
+              echo "Fix locally with: git clang-format --binary clang-format-19 $base"
+              exit 1 ;;
+          esac
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ http://www.httrack.com/

 ## Compile trunk release
 ```sh
-git clone https://github.com/xroche/httrack.git --recurse
+git clone https://github.com/xroche/httrack.git --recurse-submodules
 cd httrack
 ./configure --prefix=$HOME/usr && make -j8 && make install
 ```
--- a/html/cmddoc.html
+++ b/html/cmddoc.html
@@ -118,11 +118,11 @@ The command-line version
      <br>
      <br>
    <li>Add the URLs, separated by a blank space</li>
-      <br><small><tt>httrack www.someweb.com/foo/</tt></small>
+      <br><small><tt>httrack www.example.com/foo/</tt></small>
      <br>
      <br>
    <li>If you need, add some options (see the <a href="options.html">option list</a>)</li>
-      <br><small><tt>httrack www.someweb.com/foo/ -O "/webs" -N4 -P proxy.myhost.com:3128</tt></small>
+      <br><small><tt>httrack www.example.com/foo/ -O "/webs" -N4 -P proxy.myhost.com:3128</tt></small>
      <br>
      <br>
    <li>Launch the command line, and wait until the mirror is finishing</li>
--- a/html/faq.html
+++ b/html/faq.html
@@ -303,43 +303,43 @@ Okay, let me explain how to precisely control the capture process.<br>
 Let's take an example:<br>
 <br>
 Imagine you want to capture the following site:<br>
-<tt>www.someweb.com/gallery/flowers/</tt><br>
+<tt>www.example.com/gallery/flowers/</tt><br>
 <br>
-HTTrack, by default, will capture all links encountered in <tt>www.someweb.com/gallery/flowers/</tt> or in lower directories, like
-<tt>www.someweb.com/gallery/flowers/roses/</tt>.<br>
+HTTrack, by default, will capture all links encountered in <tt>www.example.com/gallery/flowers/</tt> or in lower directories, like
+<tt>www.example.com/gallery/flowers/roses/</tt>.<br>
 It will not follow links to other websites, because this behaviour might cause to capture the Web entirely!<br>
-It will not follow links located in higher directories, too (for example, <tt>www.someweb.com/gallery/flowers/</tt> itself) because this 
+It will not follow links located in higher directories, too (for example, <tt>www.example.com/gallery/flowers/</tt> itself) because this 
 might cause to capture too much data.<br>
 <br>
 This is the <b><u>default behaviour</b></u> of HTTrack, BUT, of course, if you want, you can tell HTTrack to capture other directorie(s), website(s)!..
 <br>
-In our example, we might want also to capture all links in <tt>www.someweb.com/gallery/trees/</tt>, and in <tt>www.someweb.com/photos/</tt><br>
+In our example, we might want also to capture all links in <tt>www.example.com/gallery/trees/</tt>, and in <tt>www.example.com/photos/</tt><br>
 <br>
 This can easily done by using filters: go to the Option panel, select the 'Scan rules' tab, and enter this line:
 (you can leave a blank space between each rules, instead of entering a carriage return)<br>
-<tt>+www.someweb.com/gallery/trees/*<br>
-+www.someweb.com/photos/*</tt><br>
+<tt>+www.example.com/gallery/trees/*<br>
++www.example.com/photos/*</tt><br>
 <br>
-This means "accept all links begining with <tt>www.someweb.com/gallery/trees/</tt> and <tt>www.someweb.com/photos/</tt>" 
+This means "accept all links begining with <tt>www.example.com/gallery/trees/</tt> and <tt>www.example.com/photos/</tt>" 
 - the <tt>+</tt> means "accept" and the final <tt>*</tt> means "any character will match after the previous ones".
 Remember the <tt>*.doc</tt> or <tt>*.zip</tt> encountered when you want to select all files from a certain type on your computer: 
 it is almost the same here, except the begining "+"<br>
 <br>
-Now, we might want to exclude all links in <tt>www.someweb.com/gallery/trees/hugetrees/</tt>, because with the previous filter,
+Now, we might want to exclude all links in <tt>www.example.com/gallery/trees/hugetrees/</tt>, because with the previous filter,
 we accepted too many files. Here again, you can add a filter rule to refuse these links. Modify the previous filters to:<br>
-<tt>+www.someweb.com/gallery/trees/*<br>
-+www.someweb.com/photos/*<br>
--www.someweb.com/gallery/trees/hugetrees/*</tt><br>
+<tt>+www.example.com/gallery/trees/*<br>
++www.example.com/photos/*<br>
+-www.example.com/gallery/trees/hugetrees/*</tt><br>
 <br>
 You have noticed the <tt>-</tt> in the begining of the third rule: this means "refuse links matching the rule" 
-; and the rule is "any files begining with <tt>www.someweb.com/gallery/trees/hugetrees/</tt><br>
+; and the rule is "any files begining with <tt>www.example.com/gallery/trees/hugetrees/</tt><br>

 Voila! With these three rules, you have precisely defined what you wanted to capture.<br>
 <br>
 A more complex example?<br>
 <br>
-Imagine that you want to accept all jpg files (files with .jpg type) that have "blue" in the name and located in www.someweb.com<br>
-<tt>+www.someweb.com/*blue*.jpg</tt><br>
+Imagine that you want to accept all jpg files (files with .jpg type) that have "blue" in the name and located in www.example.com<br>
+<tt>+www.example.com/*blue*.jpg</tt><br>
 <br>
 More detailed information can be found <a href="filters.html">here</a>!<br>
 <br>
@@ -440,7 +440,7 @@ This will cause a performance loss, but will increase the compatibility with som

 <a NAME="QT1">Q: <strong>Only the first page is caught. What's wrong?</a></strong></br>
 A: <em>First, check the <tt>hts-log.txt</tt> file (and/or <tt>hts-err.txt</tt> error log file) - this can give you precious information.<br>
-The problem can be a website that redirects you to another site (for example, <tt>www.someweb.com</tt> to <tt>public.someweb.com</tt>) : 
+The problem can be a website that redirects you to another site (for example, <tt>www.example.com</tt> to <tt>public.example.com</tt>) : 
 in this case, use filters to accept this site<br>
 This can be, also, a problem in the HTTrack options (link depth too low, for example)</em>

@@ -485,10 +485,10 @@ You may also want to capture files that are forbidden by default by the <a href=
 In these cases, HTTrack does not capture these links automatically, you have to tell it to do so. 
 <br><br>
 <ul><li>Either use the <a href="filters.html">filters</a>.<br>
-Example: You are downloading <tt>http://www.someweb.com/foo/</tt> and can not get .jpg images located
-in <tt>http://www.someweb.com/bar/</tt> (for example, http://www.someweb.com/bar/blue.jpg)<br>
-Then, add the filter rule <tt>+www.someweb.com/bar/*.jpg</tt> to accept all .jpg files from this location<br>
-You can, also, accept all files from the /bar folder with <tt>+www.someweb.com/bar/*</tt>, or only html files with <tt>+www.someweb.com/bar/*.html</tt> and so on..<br><br>
+Example: You are downloading <tt>http://www.example.com/foo/</tt> and can not get .jpg images located
+in <tt>http://www.example.com/bar/</tt> (for example, http://www.example.com/bar/blue.jpg)<br>
+Then, add the filter rule <tt>+www.example.com/bar/*.jpg</tt> to accept all .jpg files from this location<br>
+You can, also, accept all files from the /bar folder with <tt>+www.example.com/bar/*</tt>, or only html files with <tt>+www.example.com/bar/*.html</tt> and so on..<br><br>
 </li><li>
 If the problems are related to robots.txt rules, that do not let you access some folders (check in the logs if you are not sure),
 you may want to disable the default robots.txt rules in the options. (but only disable this option with great care, 
@@ -509,8 +509,8 @@ and rescan the website as described before. HTTrack will be obliged to recatch t
 <a NAME="Q1bb">Q: <strong>FTP links are not caught! What's happening?</strong><br>
 A: <em>FTP files might be seen as external links, especially if they are located in outside domain. You have either to accept all external links (See the links options, -n option) or
 only specific files (see <a href="filters.html">filters</a> section). <br>
-Example: You are downloading <tt>http://www.someweb.com/foo/</tt> and can not get ftp://ftp.someweb.com files<br>
-Then, add the filter rule <tt>+ftp.someweb.com/*</tt> to accept all files from this (ftp) location<br>
+Example: You are downloading <tt>http://www.example.com/foo/</tt> and can not get ftp://ftp.example.com files<br>
+Then, add the filter rule <tt>+ftp.example.com/*</tt> to accept all files from this (ftp) location<br>
 </em>
 <br>

@@ -551,10 +551,10 @@ Note: In some rare cases, duplicate data files can be found when the website red

 <a NAME="Q1b2">Q: <strong>I'm downloading too many files! What can I do?</strong><br>
 A: <em>This is often the case when you use too large a filter, for example <tt>+*.html</tt>, which asks the
-engine to catch all .html pages (even ones on other sites!). In this case, try to use more specific filters, like <tt>+www.someweb.com/specificfolder/*.html</tt><br>
-If you still have too many files, use filters to avoid somes files. For example, if you have too many files from www.someweb.com/big/, 
-use <tt>-www.someweb.com/big/*</tt> to avoid all files from this folder. Remember that the default behaviour of the engine, when
-mirroring http://www.someweb.com/big/index.html, is to catch everything in http://www.someweb.com/big/. Filters are your friends,
+engine to catch all .html pages (even ones on other sites!). In this case, try to use more specific filters, like <tt>+www.example.com/specificfolder/*.html</tt><br>
+If you still have too many files, use filters to avoid somes files. For example, if you have too many files from www.example.com/big/, 
+use <tt>-www.example.com/big/*</tt> to avoid all files from this folder. Remember that the default behaviour of the engine, when
+mirroring http://www.example.com/big/index.html, is to catch everything in http://www.example.com/big/. Filters are your friends,
 use them!
 </em>
 <br>
@@ -562,7 +562,7 @@ use them!

 <a NAME="Q1b22">Q: <strong>The engine turns crazy, getting thousands of files! What's going on?</strong><br>
 A: <em>This can happen if a loop occurs in some bogus website. For example, a page that refers to itself, with a timestamp
-in the query string (e.g. <tt>http://www.someweb.com/foo.asp?ts=2000/10/10,09:45:17:147</tt>). 
+in the query string (e.g. <tt>http://www.example.com/foo.asp?ts=2000/10/10,09:45:17:147</tt>). 
 These are really annoying, as it is VERY difficult to detect the loop (the timestamp might be a page number).
 To limit the problem: set a recurse level (for example to 6), or avoid the bogus pages (use the filters)
 </em>
@@ -571,7 +571,7 @@ To limit the problem: set a recurse level (for example to 6), or avoid the bogus

 <a NAME="Q1b3">Q: <strong>File are sometimes renamed (the type is changed)! Why?</strong><br>
 A: <em>By default, HTTrack tries to know the type of remote files. This is useful when links like
-<tt>http://www.someweb.com/foo.cgi?id=1</tt> can be either HTML pages, images or anything else. 
+<tt>http://www.example.com/foo.cgi?id=1</tt> can be either HTML pages, images or anything else. 
 Locally, foo.cgi will not be recognized as an html page, or as an image, by your browser. HTTrack has to rename the file
 as foo.html or foo.gif so that it can be viewed.<br>
 </em>
@@ -730,8 +730,8 @@ but this is a smart bug..
 the domain, too. How to retrieve them?</strong><br>
 A: <em>If you just want to retrieve files that can be reached through links, just activate
 the 'get file near links' option. But if you want to retrieve html pages too, you can both
-use wildcards or explicit addresses ; e.g. add <tt>www.someweb.com/*</tt> to accept all
-files and pages from www.someweb.com.<br>
+use wildcards or explicit addresses ; e.g. add <tt>www.example.com/*</tt> to accept all
+files and pages from www.example.com.<br>
 <br>
 </em></a><a NAME="Q6">Q: <strong>I have forgotten some URLs of files during a long
 mirror.. Should I redo all?</strong><br>
@@ -744,7 +744,7 @@ A: <em>You can use different methods. You can use the 'get files near a link' op
 files are in a foreign domain. You can use, too, a filter adress: adding <tt>+*.zip</tt>
 in the URL list (or in the filter list) will accept all ZIP files, even if these files are
 outside the address. <br>
-Example : <tt>httrack www.someweb.com/someaddress.html +*.zip</tt> will allow
+Example : <tt>httrack www.example.com/someaddress.html +*.zip</tt> will allow
 you to retrieve all zip files that are linked on the site.</em><br>
 <br>
 </a><a NAME="Q8">Q: <strong>There are ZIP files in a page, but I don't want to transfer
@@ -771,7 +771,7 @@ them on filters!</strong><br>
 A: <em>By default, HTTrack retrieves all types of files on authorized links. To avoid
 that, define filters like </a><a NAME="Q7"><tt>-* +&lt;website&gt;/*.html
 +&lt;website&gt;/*.htm +&lt;website&gt;/ +*.&lt;type wanted&gt;</tt></a><a NAME="Q10"><br>
-Example: <tt>httrack www.someweb.com/index.html -* +www.someweb.com/*.htm* +www.someweb.com/*.gif +www.someweb.com/*.jpg</tt><br>
+Example: <tt>httrack www.example.com/index.html -* +www.example.com/*.htm* +www.example.com/*.gif +www.example.com/*.jpg</tt><br>
 <br>
 </em><a NAME="Q10">Q: <strong>When I use filters, I get too many files!</strong><br>
 A: <em>You might use too large a filter, for example <tt>*.html</tt> will get ALL html
@@ -779,13 +779,13 @@ files identified. If you want to get all files on an address, use <tt>www.&lt;ad
 If you want to get ONLY files defined by your filters, use something like <tt>-* +www.foo.com/*</tt>, because 
 <tt>+www.foo.com/*</tt> will only accept selected links without forbidding other ones!<br>
 There are lots of possibilities using filters.<br>
-Example:<tt>httrack www.someweb.com +*.someweb.com/*.htm*</tt><br>
+Example:<tt>httrack www.example.com +*.example.com/*.htm*</tt><br>
 <br>
 </em></a><a NAME="Q11">Q: <strong>When I use filters, I can't access another domain, but I
 have filtered it!</strong><br>
-A: <em>You may have done a mistake declaring filters, for example <tt>+www.someweb.com/*
--*someweb* </tt></em>will not work, because -*someweb* has an upper priority (because it has
-been declared after +www.someweb.com)<br>
+A: <em>You may have done a mistake declaring filters, for example <tt>+www.example.com/*
+-*example* </tt></em>will not work, because -*example* has an upper priority (because it has
+been declared after +www.example.com)<br>
 <br>
 </a><a NAME="Q12">Q: <strong>Must I add a&nbsp; '+' or '-' in the filter list when I want
 to use filters?</strong><br>
@@ -800,7 +800,7 @@ filter list) and accept only html files and the file(s) you want to retrieve (BU
 forget to add <tt>+&lt;website&gt;*.html</tt> in the filter list, or pages will not be
 scanned! Add the name of files you want with a <tt>*/</tt> before ; i.e. if you want to
 retrieve file.zip, add <tt>*/file.zip</tt>)<br>
-Example:<tt>httrack www.someweb.com +www.someweb.com/*.htm* +thefileiwant.zip</tt><br>
+Example:<tt>httrack www.example.com +www.example.com/*.htm* +thefileiwant.zip</tt><br>
 <br>
 </em>

@@ -828,7 +828,7 @@ A: <em>Yes. See the URL capture abilities (--catchurl for command-line release,
 A: <em>Yes. See the shell system command option (-V option for command-line release)</em>

 <br><br><a NAME="QM6">Q: <strong>Can I use username/password authentication on a site?</strong></a><br>
-A: <em>Yes. Use user:password@your_url (example: <tt>http://foo:bar@www.someweb.com/private/mybox.html</tt>)</em>
+A: <em>Yes. Use user:password@your_url (example: <tt>http://foo:bar@www.example.com/private/mybox.html</tt>)</em>

 <br><br><a NAME="QM7">Q: <strong>Can I use username/password authentication for a proxy?</strong></a><br>
 A: <em>Yes. Use user:password@your_proxy_name as your proxy name (example: <tt>smith:foo@proxy.mycorp.com</tt>)</em>
--- a/html/fcguide.html
+++ b/html/fcguide.html
@@ -181,17 +181,17 @@ used for some time.

 <p align=justify> The rest of this manual is dedicated to detailing what
 you find in the help message and providing examples - lots and lots of
-examples...  Here is what you get (page by page - use <enter> to move to
+examples...  Here is what you get (page by page - use &lt;enter&gt; to move to
 the next page in the real program) if you type 'httrack --help':

 <pre>
 >httrack --help
 HTTrack version 3.03BETAo4 (compiled Jul  1 2001)
-	usage: ./httrack <URLs [-option] [+<FILTERs>] [-<FILTERs>]
+	usage: ./httrack &lt;URLs&gt; [-option] [+&lt;FILTERs&gt;] [-&lt;FILTERs&gt;]
 	with options listed below: (* is the default value)

 General options:
-  O  path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
+  O  path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path &lt;param&gt;)
 %O  top path if no path defined (-O path_mirror[,path_cache_and_logfiles])

 Action options:
@@ -202,7 +202,7 @@ Action options:
  Y   mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)

 Proxy options:
-  P  proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
+  P  proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy &lt;param&gt;)
 %f *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])

 Limits options:
@@ -227,7 +227,7 @@ Links options:
 %P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
  n  get non-html files 'near' an html file (ex: an image located outside) (--near)
  t  test all URLs (even forbidden ones) (--test)
- %L <file add all URL located in this text file (one URL per line) (--list <param>)
+ %L &lt;file&gt; add all URL located in this text file (one URL per line) (--list &lt;param&gt;)

 Build options:
  NN structure type (0 *original structure, 1+: see below) (--structure[=N])
@@ -248,12 +248,12 @@ Spider options:
 %h  force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
 %B  tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
 %s  update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
- %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume <param>)
+ %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume &lt;param&gt;)

 Browser ID:
-  F  user-agent field (-F "user-agent name") (--user-agent <param>)
- %F  footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer <param>)
- %l  preferred language (-%l "fr, en, jp, *" (--language <param>)
+  F  user-agent field (-F "user-agent name") (--user-agent &lt;param&gt;)
+ %F  footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer &lt;param&gt;)
+ %l  preferred language (-%l "fr, en, jp, *" (--language &lt;param&gt;)

 Log, index, cache
  C  create/use a cache for updates and retries (C0 no cache,C1 cache is prioritary,* C2 test update before) (--cache[=N])
@@ -303,8 +303,8 @@ Guru options: (do NOT use)
 #!  Execute a shell command (-#! "echo hello")

 Command-line specific options:
-  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
- %U run the engine with another id when called as root (-%U smith) (--user <param>)
+  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd &lt;param&gt;)
+ %U run the engine with another id when called as root (-%U smith) (--user &lt;param&gt;)

 Details: Option N
  N0 Site-structure (default)
@@ -332,7 +332,7 @@ Details: User-defined option N
  %N Name of file, including file type (ex: image.gif)
  %t File type (ex: gif)
  %p Path [without ending /] (ex: /someimages)
-  %h Host name (ex: www.someweb.com) (--http-10)
+  %h Host name (ex: www.example.com) (--http-10)
  %M URL MD5 (128 bits, 32 ascii bytes)
  %Q query string MD5 (128 bits, 32 ascii bytes)
  %q small query string MD5 (16 bits, 4 ascii bytes) (--include-query-string)
@@ -340,14 +340,14 @@ Details: User-defined option N
  %[param] param variable in query string

 Shortcuts:
---mirror      <URLs *make a mirror of site(s) (default)
---get         <URLs  get the files indicated, do not seek other URLs (-qg)
---list   <text file  add all URL located in this text file (-%L)
---mirrorlinks <URLs  mirror all links in 1st level pages (-Y)
---testlinks   <URLs  test links in pages (-r1p0C0I0t)
---spider      <URLs  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
---testsite    <URLs  identical to --spider
---skeleton    <URLs  make a mirror, but gets only html files (-p1)
+--mirror      &lt;URLs&gt; *make a mirror of site(s) (default)
+--get         &lt;URLs&gt;  get the files indicated, do not seek other URLs (-qg)
+--list   &lt;text file&gt;  add all URL located in this text file (-%L)
+--mirrorlinks &lt;URLs&gt;  mirror all links in 1st level pages (-Y)
+--testlinks   &lt;URLs&gt;  test links in pages (-r1p0C0I0t)
+--spider      &lt;URLs&gt;  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
+--testsite    &lt;URLs&gt;  identical to --spider
+--skeleton    &lt;URLs&gt;  make a mirror, but gets only html files (-p1)
 --update              update a mirror, without confirmation (-iC2)
 --continue            continue a mirror, without confirmation (-iC1)

@@ -356,17 +356,17 @@ Shortcuts:

 --http10              force http/1.0 requests (-%h)

-example: httrack www.someweb.com/bob/
-means:   mirror site www.someweb.com/bob/ and only this site
+example: httrack www.example.com/bob/
+means:   mirror site www.example.com/bob/ and only this site

-example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
+example: httrack www.example.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
 means:   mirror the two sites together (with shared links) and accept any .jpg files on .com sites

-example: httrack www.someweb.com/bob/bobby.html +* -r6
+example: httrack www.example.com/bob/bobby.html +* -r6
 means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web

-example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
-runs the spider on www.someweb.com/bob/bobby.html using a proxy
+example: httrack www.example.com/bob/bobby.html --spider -P proxy.myhost.com:8080
+runs the spider on www.example.com/bob/bobby.html using a proxy

 example: httrack --update
 updates a mirror in the current folder
@@ -387,13 +387,13 @@ with examples... I will be here a while...
 <hr>
 <h2> Syntax </h2>

-<pre><b><i>httrack <URLs> [-option] [+<FILTERs>] [-<FILTERs>] </i></b></pre>
+<pre><b><i>httrack &lt;URLs&gt; [-option] [+&lt;FILTERs&gt;] [-&lt;FILTERs&gt;] </i></b></pre>

 <p align=justify> The syntax of httrack is quite simple.  You specify
-the URLs you wish to start the process from (<URLS>), any options you
+the URLs you wish to start the process from (&lt;URLS&gt;), any options you
 might want to add ([-option], any filters specifying places you should
-([+<FILTERs>]) and should not ([-<FILTERs>]) go, and end the command
-line by pressing <enter>.  Httrack then goes off and does your bidding.
+([+&lt;FILTERs&gt;]) and should not ([-&lt;FILTERs&gt;]) go, and end the command
+line by pressing &lt;enter&gt;.  Httrack then goes off and does your bidding.
 For example:

 <pre><b><i>
@@ -425,7 +425,7 @@ site. Specifically, the defauls are:
  pN priority mode: (* p3)  *3 save all files
  D  *can only go down into subdirs
  a  *stay on the same address
-  --mirror      <URLs> *make a mirror of site(s) (default)
+  --mirror      &lt;URLs&gt; *make a mirror of site(s) (default)
 </pre>

 <p align=justify> Here's what all of that means:
@@ -542,7 +542,7 @@ subdirectories of the starting directory to be investigated.
 search started are to be collected.  Other sites they point to are not
 to be imaged. 

-<pre><b><i>  --mirror      <URLs> *make a mirror of site(s) (default) </i></b></pre>
+<pre><b><i>  --mirror      &lt;URLs&gt; *make a mirror of site(s) (default) </i></b></pre>

 <p align=justify> This indicates that the program should try to make a
 copy of the site as well as it can. 
@@ -921,7 +921,7 @@ Links options:
 %P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use)
  n   get non-html files 'near' an html file (ex: an image located outside)
  t   test all URLs (even forbidden ones)
- %L <file> add all URL located in this text file (one URL per line)
+ %L &lt;file&gt; add all URL located in this text file (one URL per line)
 </i></b></pre>

 <p align=justify> The links options allow you to control what links are
@@ -1183,7 +1183,7 @@ Spider options:
 %h  force HTTP/1.0 requests (reduce update features, only for old servers or proxies)
 %B  tolerant requests (accept bogus responses on some servers, but not standard!)
 %s  update hacks: various hacks to limit re-transfers when updating
- %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume <param>)
+ %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume &lt;param&gt;)
 </i></b></pre>

 <p align=justify> By default, cookies are universally accepted and
@@ -1387,7 +1387,7 @@ web servers leave footprints in the browser.
 Browser ID:
  F  user-agent field (-F "user-agent name")
 %F  footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]"
- %l  preferred language (-%l "fr, en, jp, *" (--language <param>)
+ %l  preferred language (-%l "fr, en, jp, *" (--language &lt;param&gt;)
 </i></b></pre>

 <p align=justify> The user-agent field is used by browsers to determine
@@ -1799,7 +1799,7 @@ based authentication)

 <pre><b><i>
 Command-line specific options:
-  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
+  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd &lt;param&gt;)
 </i></b></pre>

 <p align=justify> This option is very nice for a wide array of actions
@@ -1811,7 +1811,7 @@ httrack http://www.shoesizes.com/bob/ -O /tmp/shoesizes -V "/bin/echo \$0"
 </i></b></pre>

 <pre>
- %U run the engine with another id when called as root (-%U smith) (--user <param>)
+ %U run the engine with another id when called as root (-%U smith) (--user &lt;param&gt;)
 </pre>

 <p align=justify> Change the UID of the owner when running as r00t
@@ -1856,14 +1856,14 @@ of other options that are commonly used.

 <pre><b><i>
 Shortcuts:
--mirror      <URLs> *make a mirror of site(s) (default)
--get         <URLs>  get the files indicated, do not seek other URLs (-qg)
--list   <text file>  add all URL located in this text file (-%L)
--mirrorlinks <URLs>  mirror all links in 1st level pages (-Y)
--testlinks   <URLs>  test links in pages (-r1p0C0I0t)
--spider      <URLs>  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite    <URLs>  identical to --spider
--skeleton    <URLs>  make a mirror, but gets only html files (-p1)
+--mirror      &lt;URLs&gt; *make a mirror of site(s) (default)
+--get         &lt;URLs&gt;  get the files indicated, do not seek other URLs (-qg)
+--list   &lt;text file&gt;  add all URL located in this text file (-%L)
+--mirrorlinks &lt;URLs&gt;  mirror all links in 1st level pages (-Y)
+--testlinks   &lt;URLs&gt;  test links in pages (-r1p0C0I0t)
+--spider      &lt;URLs&gt;  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
+--testsite    &lt;URLs&gt;  identical to --spider
+--skeleton    &lt;URLs&gt;  make a mirror, but gets only html files (-p1)
 --update              update a mirror, without confirmation (-iC2)
 --continue            continue a mirror, without confirmation (-iC1)
 --catchurl            create a temporary proxy to capture an URL or a form post URL
@@ -2019,15 +2019,15 @@ are in reverse priority order.  Here's an example:
        <td>no characters must be present after</a></td>
      </tr>
 	<tr>
-		<td> <b> <filter>*[&lt NN]</b></td>
+		<td> <b> &lt;filter&gt;*[&lt NN]</b></td>
 		<td> size less than NN Kbytes</td>
 	</tr>
 	<tr>
-		<td> <b> <filter>*[&gt PP]</b></td>
+		<td> <b> &lt;filter&gt;*[&gt PP]</b></td>
 		<td> size more than PP Kbytes</td>
 	</tr>
 	<tr>
-		<td> <b> <filter>*[&lt NN &gt PP]</b></td>
+		<td> <b> &lt;filter&gt;*[&lt NN &gt PP]</b></td>
 		<td> size less than NN Kbytes and more than PP Kbytes</td>
 	</tr>
    </table>
@@ -2054,8 +2054,8 @@ generated automatically using the interface)
        <td>This will accept all zip files in .com addresses</td>
      </tr>
      <tr>
-        <td><b>-*someweb*/*.tar*</b></td>
-        <td>This will refuse all tar (or tar.gz etc.) files in hosts containing someweb</td>
+        <td><b>-*example*/*.tar*</b></td>
+        <td>This will refuse all tar (or tar.gz etc.) files in hosts containing example</td>
      </tr>
      <tr>
        <td><b>+*/*somepage*</b></td>
--- a/html/filters.html
+++ b/html/filters.html
@@ -109,8 +109,8 @@ See also: The <a href="faq.html#VF1">FAQ</a><br>

    <i>You have to know that once you have defined
    starts links, the default mode is to mirror these links - i.e. if one of your start page is
-    www.someweb.com/test/index.html, all links starting with www.someweb.com/test/ will be
-    accepted. But links directly in www.someweb.com/.. will not be accepted, however, because
+    www.example.com/test/index.html, all links starting with www.example.com/test/ will be
+    accepted. But links directly in www.example.com/.. will not be accepted, however, because
    they are in a higher strcuture. This prevent HTTrack from mirroring the whole site. (All
    files in structure levels equal or lower than the primary links will be retrieved.)<br>
    </i>
@@ -278,8 +278,8 @@ See also: The <a href="faq.html#VF1">FAQ</a><br>
        <td>This will refuse/accept all zip files in .com addresses</td>
      </tr>
      <tr>
-        <td nowrap><tt>*someweb*/*.tar*</tt></td>
-        <td>This will refuse/accept all tar (or tar.gz etc.) files in hosts containing someweb</td>
+        <td nowrap><tt>*example*/*.tar*</tt></td>
+        <td>This will refuse/accept all tar (or tar.gz etc.) files in hosts containing example</td>
      </tr>
      <tr>
        <td nowrap><tt>*/*somepage*</tt></td>
@@ -289,13 +289,13 @@ See also: The <a href="faq.html#VF1">FAQ</a><br>
        <td nowrap><tt>*.html</tt></td>
        <td>This will refuse/accept all html files. <br>
        Warning! With this filter you will accept ALL html files, even those in other addresses.
-        (causing a global (!) web mirror..) Use www.someweb.com/*.html to accept all html files from
+        (causing a global (!) web mirror..) Use www.example.com/*.html to accept all html files from
        a web.</td>
      </tr>
      <tr>
        <td nowrap><tt>*.html*[]</tt></td>
        <td>Identical to <tt>*.html</tt>, but the link must not have any supplemental characters
-        at the end (links with parameters, like <tt>www.someweb.com/index.html?page=10</tt>, will be
+        at the end (links with parameters, like <tt>www.example.com/index.html?page=10</tt>, will be
        refused)</td>
      </tr>
    </table>
--- a/html/httrack.man.html
+++ b/html/httrack.man.html
@@ -123,12 +123,12 @@ mirrored site, and resume interrupted downloads.</p>


 <p style="margin-left:11%; margin-top: 1em"><b>httrack
-www.someweb.com/bob/</b></p>
+www.example.com/bob/</b></p>

 <p style="margin-left:22%;">mirror site
-www.someweb.com/bob/ and only this site</p>
+www.example.com/bob/ and only this site</p>

-<p style="margin-left:11%;"><b>httrack www.someweb.com/bob/
+<p style="margin-left:11%;"><b>httrack www.example.com/bob/
 www.anothertest.com/mike/ +*.com/*.jpg <br>
 -mime:application/*</b></p>

@@ -137,18 +137,18 @@ www.anothertest.com/mike/ +*.com/*.jpg <br>
 sites</p>

 <p style="margin-left:11%;"><b>httrack
-www.someweb.com/bob/bobby.html +* -r6</b></p>
+www.example.com/bob/bobby.html +* -r6</b></p>

 <p style="margin-left:22%;">means get all files starting
 from bobby.html, with 6 link-depth, and possibility of going
 everywhere on the web</p>

 <p style="margin-left:11%;"><b>httrack
-www.someweb.com/bob/bobby.html --spider -P <br>
+www.example.com/bob/bobby.html --spider -P <br>
 proxy.myhost.com:8080</b></p>

 <p style="margin-left:22%;">runs the spider on
-www.someweb.com/bob/bobby.html using a proxy</p>
+www.example.com/bob/bobby.html using a proxy</p>

 <p style="margin-left:11%;"><b>httrack --update</b></p>

@@ -1877,7 +1877,7 @@ User-defined option N</b> <br>
 %N Name of file, including file type (ex: image.gif) <br>
 %t File type (ex: gif) <br>
 %p Path [without ending /] (ex: /someimages) <br>
-%h Host name (ex: www.someweb.com) <br>
+%h Host name (ex: www.example.com) <br>
 %M URL MD5 (128 bits, 32 ascii bytes) <br>
 %Q query string MD5 (128 bits, 32 ascii bytes) <br>
 %k full query string <br>
--- a/html/options.html
+++ b/html/options.html
@@ -131,16 +131,16 @@ This is the default primary scanning option, the engine does not go out of domai

 d   stay on the same principal domain
 This option lets the engine go on all sites that exist on the same principal domain.
-Example: a link located at www.someweb.com that goes to members.someweb.com will be followed.
+Example: a link located at www.example.com that goes to members.example.com will be followed.

 l   stay on the same location (.com, etc.)
 This option lets the engine go on all sites that exist on the same location.
-Example: a link located at www.someweb.com that goes to www.anyotherweb.com will be followed.
+Example: a link located at www.example.com that goes to www.anyotherweb.com will be followed.
 Warning: this is a potentially dangerous option, limit the recurse depth with r option.

 e   go everywhere on the web
 This option lets the engine go on any sites.
-Example: a link located at www.someweb.com that goes to www.anyotherweb.org will be followed.
+Example: a link located at www.example.com that goes to www.anyotherweb.org will be followed.
 Warning: this is a potentially dangerous option, limit the recurse depth with r option.

 n   get non-html files 'near' an html file (ex: an image located outside)
--- a/html/step9_opt8.html
+++ b/html/step9_opt8.html
@@ -117,7 +117,7 @@ h4 { margin: 0;  font-weight: bold;  font-size: 1.18em; }
  <li>HTML Footer</li>
  <br><small>Enter here the optionnal text that will be included as a comment in each HTML file to make archiving easier
  <br>The string entered is generally an HTML comment (<tt>&lt;!-- HTML comment --&gt;</tt>) with optionnal %s, which will be transformed into a specific string information:
-  <br>%s #1 : host name (for example, www.someweb.com)
+  <br>%s #1 : host name (for example, www.example.com)
  <br>%s #2 : file name (for example, /index.html)
  <br>%s #3 : date of the mirror
  <br><b>Example</b>: <tt>&lt;!-- Page mirrored from %s, file %s. Archive date: %s --&gt;</tt>
--- a/lang/Ukrainian.txt
+++ b/lang/Ukrainian.txt
@@ -7,7 +7,7 @@ uk
 LANGUAGE_AUTHOR
 Andrij Shevchuk (http://programy.com.ua, http://vic-info.com.ua) \r\n
 LANGUAGE_CHARSET
-ISO-8859-5
+windows-1251
 LANGUAGE_WINDOWSID
 Ukrainian
 OK
--- a/man/Makefile.am
+++ b/man/Makefile.am
@@ -13,3 +13,9 @@ regen-man: makeman.sh $(top_builddir)/src/httrack$(EXEEXT)
 	README='$(top_srcdir)/README' $(SHELL) $(srcdir)/makeman.sh \
 		'$(top_builddir)/src/httrack$(EXEEXT)' > $(srcdir)/httrack.1
 .PHONY: regen-man
+
+# Render html/httrack.man.html from httrack.1. Needs the groff html device
+# (Debian: full "groff" package, not "groff-base"). Run by hand: make -C man regen-man-html
+regen-man-html: httrack.1
+	groff -t -man -Thtml $(srcdir)/httrack.1 > $(top_srcdir)/html/httrack.man.html
+.PHONY: regen-man-html
--- a/man/Makefile.in
+++ b/man/Makefile.in
@@ -551,6 +551,12 @@ regen-man: makeman.sh $(top_builddir)/src/httrack$(EXEEXT)
 		'$(top_builddir)/src/httrack$(EXEEXT)' > $(srcdir)/httrack.1
 .PHONY: regen-man

+# Render html/httrack.man.html from httrack.1. Needs the groff html device
+# (Debian: full "groff" package, not "groff-base"). Run by hand: make -C man regen-man-html
+regen-man-html: httrack.1
+	groff -t -man -Thtml $(srcdir)/httrack.1 > $(top_srcdir)/html/httrack.man.html
+.PHONY: regen-man-html
+
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
--- a/man/httrack.1
+++ b/man/httrack.1
@@ -2,7 +2,7 @@
 .\" groff -man -Tascii httrack.1
 .\"
 .\" This file is generated by man/makeman.sh; do not edit by hand.
-.TH httrack 1 "07 June 2026" "httrack website copier"
+.TH httrack 1 "13 June 2026" "httrack website copier"
 .SH NAME
 httrack \- offline browser : copy websites to a local directory
 .SH SYNOPSIS
@@ -98,15 +98,15 @@ httrack \- offline browser : copy websites to a local directory
 allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.
 .SH EXAMPLES
 .TP
-.B httrack www.someweb.com/bob/
-mirror site www.someweb.com/bob/ and only this site
+.B httrack www.example.com/bob/
+mirror site www.example.com/bob/ and only this site
 .TP
-.B httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg \-mime:application/*
+.B httrack www.example.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg \-mime:application/*
 mirror the two sites together (with shared links) and accept any .jpg files on .com sites
 .TP
-.B httrack www.someweb.com/bob/bobby.html +* \-r6
+.B httrack www.example.com/bob/bobby.html +* \-r6
 .TP
-.B httrack www.someweb.com/bob/bobby.html \-\-spider \-P proxy.myhost.com:8080
+.B httrack www.example.com/bob/bobby.html \-\-spider \-P proxy.myhost.com:8080
 .TP
 .B httrack \-\-update
 .TP
@@ -411,7 +411,7 @@ File type (ex: gif)
 .IP \-%p
 Path [without ending /] (ex: /someimages)
 .IP \-%h
-Host name (ex: www.someweb.com)
+Host name (ex: www.example.com)
 .IP \-%M
 URL MD5 (128 bits, 32 ascii bytes)
 .IP \-%Q
--- a/src/htsalias.c
+++ b/src/htsalias.c
@@ -271,8 +271,11 @@ int optalias_check(int argc, const char *const *argv, int n_arg,
  *return_argc = 1;
  if (argv[n_arg][0] == '-')
    if (argv[n_arg][1] == '-') {
-      char command[1000];
-      char param[1000];
+      /* sized to HTS_CDLMAXSIZE: a long-form option value (--user-agent,
+         --headers, ...) is copied into param, and the value is bounded by the
+         general per-argument check in htscoremain.c (HTS_CDLMAXSIZE) */
+      char command[HTS_CDLMAXSIZE];
+      char param[HTS_CDLMAXSIZE];
      char addcommand[256];

      /* */
--- a/src/htscatchurl.c
+++ b/src/htscatchurl.c
@@ -201,8 +201,8 @@ HTSEXT_API int catch_url(T_SOC soc, char *url, char *method, char *data) {
            while(strnotempty(line)) {
              socinput(soc, line, 1000);
              treathead(NULL, NULL, NULL, &blkretour, line);    // traiter
-              strcatbuff(data, line);
-              strcatbuff(data, "\r\n");
+              strlcatbuff(data, line, CATCH_URL_DATA_SIZE);
+              strlcatbuff(data, "\r\n", CATCH_URL_DATA_SIZE);
            }
            // CR/LF final de l'en tête inutile car déja placé via la ligne vide juste au dessus
            //strcatbuff(data,"\r\n");
--- a/src/htscatchurl.h
+++ b/src/htscatchurl.h
@@ -40,6 +40,9 @@ Please visit our Website: http://www.httrack.com
 /* Library internal definictions */
 #ifdef HTS_INTERNAL_BYTECODE

+// Capacity contract for the catch_url() 'data' buffer (32 KiB).
+#define CATCH_URL_DATA_SIZE 32768
+
 // Fonctions
 void socinput(T_SOC soc, char *s, int max);

--- a/src/htscoremain.c
+++ b/src/htscoremain.c
@@ -140,6 +140,97 @@ static void basic_selftests(void) {
  md5selftest();
 }

+/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
+   Returns 0 if every bounded operation behaved correctly, 1 otherwise.
+   The abort-on-overflow guarantee is checked separately by the -#8 "overflow"
+   sub-mode (it aborts the process by design). */
+static int string_safety_selftests(void) {
+  char buf[8];
+
+  /* strcpybuff into a sized array: exact copy */
+  strcpybuff(buf, "abc");
+  if (strcmp(buf, "abc") != 0)
+    return 1;
+
+  /* strcatbuff append within capacity */
+  strcatbuff(buf, "de");
+  if (strcmp(buf, "abcde") != 0)
+    return 1;
+
+  /* strncatbuff appends at most N source chars */
+  strcpybuff(buf, "ab");
+  strncatbuff(buf, "cdef", 2);
+  if (strcmp(buf, "abcd") != 0)
+    return 1;
+
+  /* strlcpybuff: explicit-capacity copy into a pointer destination, the form
+     the migration moves toward */
+  {
+    char storage[8];
+    char *const p = storage;
+
+    strlcpybuff(p, "hello", sizeof(storage));
+    if (strcmp(p, "hello") != 0)
+      return 1;
+  }
+
+  /* strcpybuff into a pointer destination: routes through the unchecked
+     strcpybuff_ptr_ fallback (the path the -#8 warning flags). The warning is
+     intentional here; we only verify the fallback still copies correctly. */
+#if defined(__GNUC__)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattribute-warning"
+#endif
+  {
+    char storage[8];
+    char *const p = storage;
+
+    strcpybuff(p, "ptr");
+    if (strcmp(p, "ptr") != 0)
+      return 1;
+  }
+#if defined(__GNUC__)
+#pragma GCC diagnostic pop
+#endif
+
+  /* htsbuff: bounded builder over a fixed array (append, truncating append,
+     reset, and length tracking) */
+  {
+    char dst[8];
+    htsbuff b = htsbuff_array(dst);
+
+    htsbuff_cat(&b, "ab");
+    htsbuff_cat(&b, "cd");
+    if (strcmp(htsbuff_str(&b), "abcd") != 0 || b.len != 4)
+      return 1;
+
+    htsbuff_catn(&b, "efghij", 2);      /* append at most 2 */
+    if (strcmp(htsbuff_str(&b), "abcdef") != 0)
+      return 1;
+
+    htsbuff_cpy(&b, "xyz");             /* reset */
+    if (strcmp(htsbuff_str(&b), "xyz") != 0 || b.len != 3)
+      return 1;
+
+    htsbuff_catc(&b, '!'); /* single character */
+    if (strcmp(htsbuff_str(&b), "xyz!") != 0 || b.len != 4)
+      return 1;
+  }
+
+  /* boundary: filling to exactly cap-1 must succeed (one more aborts, which the
+     -#8 overflow-buff mode checks) */
+  {
+    char d2[4];
+    htsbuff c = htsbuff_array(d2);
+
+    htsbuff_cat(&c, "abc");
+    if (strcmp(htsbuff_str(&c), "abc") != 0 || c.len != 3)
+      return 1;
+  }
+
+  return 0;
+}
+
 static int hts_main_internal(int argc, char **argv, httrackp * opt);

 // Main, récupère les paramètres et appelle le robot
@@ -1787,10 +1878,6 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                    HTS_PANIC_PRINTF("Empty string given");
                    htsmain_free();
                    return -1;
-                  } else if (strlen(argv[na]) >= 256) {
-                    HTS_PANIC_PRINTF("Header line string too long");
-                    htsmain_free();
-                    return -1;
                  }
                  StringCat(opt->headers, argv[na]);
                  StringCat(opt->headers, "\r\n");  /* separator */
@@ -2441,6 +2528,35 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                htsmain_free();
                return 0;
                break;
+              case '8':        /* string-safety selftest: httrack -#8 [overflow <bigstr>] */
+                if (na + 1 < argc
+                    && strncmp(argv[na + 1], "overflow", 8) == 0) {
+                  /* Deliberately exceed a sized buffer: the bounded op must
+                     abort. The source comes from argv so its length is opaque
+                     to the compiler (no static -Wstringop-overflow, genuine
+                     runtime check). "overflow-buff" exercises htsbuff. */
+                  char small[4];
+                  const char *const src =
+                    (na + 2 < argc) ? argv[na + 2] : "overflowing";
+
+                  if (strcmp(argv[na + 1], "overflow-buff") == 0) {
+                    htsbuff b = htsbuff_array(small);
+
+                    htsbuff_cat(&b, src);
+                  } else {
+                    strcpybuff(small, src);
+                  }
+                  printf("strsafe: NOT aborted\n");     /* must be unreachable */
+                  htsmain_free();
+                  return 1;
+                } else {
+                  const int err = string_safety_selftests();
+
+                  printf("strsafe: %s\n", err ? "FAIL" : "OK");
+                  htsmain_free();
+                  return err;
+                }
+                break;
              case '7':  // hashtable selftest: httrack -#7 nb_entries
                basic_selftests();
                if (++na < argc) {
@@ -2691,11 +2807,6 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
              return -1;
            } else {
              na++;
-              if (strlen(argv[na]) >= 126) {
-                HTS_PANIC_PRINTF("User-agent length too long");
-                htsmain_free();
-                return -1;
-              }
              StringCopy(opt->user_agent, argv[na]);
              if (StringNotEmpty(opt->user_agent))
                opt->user_agent_send = 1;
@@ -2899,7 +3010,9 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
  }

  {
-    char n_lock[256];
+    /* Sized to the concat-buffer capacity so it can always hold the lock-file
+       path produced by fconcat(), even with a long log path (issue #183). */
+    char n_lock[OPT_GET_BUFF_SIZE(opt)];

    // on peut pas avoir un affichage ET un fichier log
    // ca sera pour la version 2
--- a/src/htshelp.c
+++ b/src/htshelp.c
@@ -409,7 +409,7 @@ void help_catchurl(const char *dest_path) {
  if (soc != INVALID_SOCKET) {
    char BIGSTK url[HTS_URLMAXSIZE * 2];
    char method[32];
-    char BIGSTK data[32768];
+    char BIGSTK data[CATCH_URL_DATA_SIZE];

    url[0] = method[0] = data[0] = '\0';
    //
@@ -712,7 +712,7 @@ void help(const char *app, int more) {
  infomsg("  '%N' Name of file, including file type (ex: image.gif)");
  infomsg("  '%t' File type (ex: gif)");
  infomsg("  '%p' Path [without ending /] (ex: /someimages)");
-  infomsg("  '%h' Host name (ex: www.someweb.com)");
+  infomsg("  '%h' Host name (ex: www.example.com)");
  infomsg("  '%M' URL MD5 (128 bits, 32 ascii bytes)");
  infomsg("  '%Q' query string MD5 (128 bits, 32 ascii bytes)");
  infomsg("  '%k' full query string");
@@ -767,21 +767,21 @@ void help(const char *app, int more) {
  infomsg("Details: Option %W: External callbacks prototypes");
  infomsg("see htsdefines.h");
  infomsg("");
-  infomsg("example: httrack www.someweb.com/bob/");
-  infomsg("means:   mirror site www.someweb.com/bob/ and only this site");
+  infomsg("example: httrack www.example.com/bob/");
+  infomsg("means:   mirror site www.example.com/bob/ and only this site");
  infomsg("");
  infomsg
-    ("example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*");
+    ("example: httrack www.example.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*");
  infomsg
    ("means:   mirror the two sites together (with shared links) and accept any .jpg files on .com sites");
  infomsg("");
-  infomsg("example: httrack www.someweb.com/bob/bobby.html +* -r6");
+  infomsg("example: httrack www.example.com/bob/bobby.html +* -r6");
  infomsg
    ("means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web");
  infomsg("");
  infomsg
-    ("example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080");
-  infomsg("runs the spider on www.someweb.com/bob/bobby.html using a proxy");
+    ("example: httrack www.example.com/bob/bobby.html --spider -P proxy.myhost.com:8080");
+  infomsg("runs the spider on www.example.com/bob/bobby.html using a proxy");
  infomsg("");
  infomsg("example: httrack --update");
  infomsg("updates a mirror in the current folder");
--- a/src/htslib.c
+++ b/src/htslib.c
@@ -121,6 +121,7 @@ const char *hts_detect[] = {
  "lowsrc",
  "profile",                    // element META
  "src",
+  "srcset",                     // HTML5 responsive images (<img>, <source>)
  "swurl",
  "url",
  "usemap",
@@ -877,7 +878,7 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
                  const char *xsend, const char *adr, const char *fil,
                  const char *referer_adr, const char *referer_fil,
                  htsblk * retour) {
-  char BIGSTK buffer_head_request[8192];
+  char BIGSTK buffer_head_request[16384];
  buff_struct bstr = { buffer_head_request, sizeof(buffer_head_request), 0 };

  //int use_11=0;     // HTTP 1.1 utilisé
@@ -895,9 +896,9 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,

  // possibilité non documentée: >post: et >postfile:
  // si présence d'un tag >post: alors executer un POST
-  // exemple: http://www.someweb.com/test.cgi?foo>post:posteddata=10&foo=5
+  // exemple: http://www.example.com/test.cgi?foo>post:posteddata=10&foo=5
  // si présence d'un tag >postfile: alors envoyer en tête brut contenu dans le fichier en question
-  // exemple: http://www.someweb.com/test.cgi?foo>postfile:post0.txt
+  // exemple: http://www.example.com/test.cgi?foo>postfile:post0.txt
  search_tag = strstr(fil, POSTTOK ":");
  if (!search_tag) {
    search_tag = strstr(fil, POSTTOK "file:");
@@ -1659,138 +1660,107 @@ void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * ret
  }
 }

-// transforme le message statuscode en chaîne
-HTSEXT_API void infostatuscode(char *msg, int statuscode) {
+// HTTP status code -> reason phrase (per RFC), or NULL if unknown.
+HTSEXT_API const char *infostatuscode_const(int statuscode) {
+  // O(1) dispatch (typically compiled to a jump table); phrases are static.
  switch (statuscode) {
-    // Erreurs HTTP, selon RFC
  case 100:
-    strcpybuff(msg, "Continue");
-    break;
+    return "Continue";
  case 101:
-    strcpybuff(msg, "Switching Protocols");
-    break;
+    return "Switching Protocols";
  case 200:
-    strcpybuff(msg, "OK");
-    break;
+    return "OK";
  case 201:
-    strcpybuff(msg, "Created");
-    break;
+    return "Created";
  case 202:
-    strcpybuff(msg, "Accepted");
-    break;
+    return "Accepted";
  case 203:
-    strcpybuff(msg, "Non-Authoritative Information");
-    break;
+    return "Non-Authoritative Information";
  case 204:
-    strcpybuff(msg, "No Content");
-    break;
+    return "No Content";
  case 205:
-    strcpybuff(msg, "Reset Content");
-    break;
+    return "Reset Content";
  case 206:
-    strcpybuff(msg, "Partial Content");
-    break;
+    return "Partial Content";
  case 300:
-    strcpybuff(msg, "Multiple Choices");
-    break;
+    return "Multiple Choices";
  case 301:
-    strcpybuff(msg, "Moved Permanently");
-    break;
+    return "Moved Permanently";
  case 302:
-    strcpybuff(msg, "Moved Temporarily");
-    break;
+    return "Moved Temporarily";
  case 303:
-    strcpybuff(msg, "See Other");
-    break;
+    return "See Other";
  case 304:
-    strcpybuff(msg, "Not Modified");
-    break;
+    return "Not Modified";
  case 305:
-    strcpybuff(msg, "Use Proxy");
-    break;
+    return "Use Proxy";
  case 306:
-    strcpybuff(msg, "Undefined 306 error");
-    break;
+    return "Undefined 306 error";
  case 307:
-    strcpybuff(msg, "Temporary Redirect");
-    break;
+    return "Temporary Redirect";
  case 400:
-    strcpybuff(msg, "Bad Request");
-    break;
+    return "Bad Request";
  case 401:
-    strcpybuff(msg, "Unauthorized");
-    break;
+    return "Unauthorized";
  case 402:
-    strcpybuff(msg, "Payment Required");
-    break;
+    return "Payment Required";
  case 403:
-    strcpybuff(msg, "Forbidden");
-    break;
+    return "Forbidden";
  case 404:
-    strcpybuff(msg, "Not Found");
-    break;
+    return "Not Found";
  case 405:
-    strcpybuff(msg, "Method Not Allowed");
-    break;
+    return "Method Not Allowed";
  case 406:
-    strcpybuff(msg, "Not Acceptable");
-    break;
+    return "Not Acceptable";
  case 407:
-    strcpybuff(msg, "Proxy Authentication Required");
-    break;
+    return "Proxy Authentication Required";
  case 408:
-    strcpybuff(msg, "Request Time-out");
-    break;
+    return "Request Time-out";
  case 409:
-    strcpybuff(msg, "Conflict");
-    break;
+    return "Conflict";
  case 410:
-    strcpybuff(msg, "Gone");
-    break;
+    return "Gone";
  case 411:
-    strcpybuff(msg, "Length Required");
-    break;
+    return "Length Required";
  case 412:
-    strcpybuff(msg, "Precondition Failed");
-    break;
+    return "Precondition Failed";
  case 413:
-    strcpybuff(msg, "Request Entity Too Large");
-    break;
+    return "Request Entity Too Large";
  case 414:
-    strcpybuff(msg, "Request-URI Too Large");
-    break;
+    return "Request-URI Too Large";
  case 415:
-    strcpybuff(msg, "Unsupported Media Type");
-    break;
+    return "Unsupported Media Type";
  case 416:
-    strcpybuff(msg, "Requested Range Not Satisfiable");
-    break;
+    return "Requested Range Not Satisfiable";
  case 417:
-    strcpybuff(msg, "Expectation Failed");
-    break;
+    return "Expectation Failed";
  case 500:
-    strcpybuff(msg, "Internal Server Error");
-    break;
+    return "Internal Server Error";
  case 501:
-    strcpybuff(msg, "Not Implemented");
-    break;
+    return "Not Implemented";
  case 502:
-    strcpybuff(msg, "Bad Gateway");
-    break;
+    return "Bad Gateway";
  case 503:
-    strcpybuff(msg, "Service Unavailable");
-    break;
+    return "Service Unavailable";
  case 504:
-    strcpybuff(msg, "Gateway Time-out");
-    break;
+    return "Gateway Time-out";
  case 505:
-    strcpybuff(msg, "HTTP Version Not Supported");
-    break;
-    //
+    return "HTTP Version Not Supported";
  default:
-    if (strnotempty(msg) == 0)
-      strcpybuff(msg, "Unknown error");
-    break;
+    return NULL;
+  }
+}
+
+// Write the status code's reason phrase into msg. For an unknown code, keep any
+// caller-provided message, otherwise fall back to a default. Callers provide a
+// buffer of at least 64 bytes (the longest reason phrase is 31).
+HTSEXT_API void infostatuscode(char *msg, int statuscode) {
+  const char *const text = infostatuscode_const(statuscode);
+
+  if (text != NULL) {
+    strlcpybuff(msg, text, 64);
+  } else if (strnotempty(msg) == 0) {
+    strlcpybuff(msg, "Unknown error", 64);
  }
 }
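
Note on the htslib.c change above: a minimal standalone sketch of the lookup/wrapper split, assuming only three status codes. The names status_phrase() and write_status_phrase() are hypothetical stand-ins for infostatuscode_const() and infostatuscode(), and snprintf() stands in for the engine's strlcpybuff(), which is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for infostatuscode_const(): only three codes shown. */
static const char *status_phrase(int code) {
  switch (code) {
  case 404:
    return "Not Found";
  case 416:
    return "Requested Range Not Satisfiable";
  case 500:
    return "Internal Server Error";
  default:
    return NULL; /* unknown code: the caller decides the fallback */
  }
}

/* Hypothetical stand-in for infostatuscode(); snprintf replaces strlcpybuff. */
static void write_status_phrase(char *msg, size_t size, int code) {
  const char *const text = status_phrase(code);

  if (text != NULL) {
    snprintf(msg, size, "%s", text); /* known code: overwrite msg */
  } else if (msg[0] == '\0') {
    snprintf(msg, size, "%s", "Unknown error"); /* empty msg: default */
  } /* otherwise keep the caller-provided message untouched */
}
```

The lookup never writes to memory, so the only bounded copy left is the one in the wrapper; callers that just need to print the phrase can use the returned pointer directly.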

--- a/src/htsname.c
+++ b/src/htsname.c
@@ -767,7 +767,7 @@ int url_savename(lien_adrfilsave *const afs,
  // ajouter nom du site éventuellement en premier
  if (opt->savename_type == -1) {       // utiliser savename_userdef! (%h%p/%n%q.%t)
    const char *a = StringBuff(opt->savename_userdef);
-    char *b = afs->save;
+    htsbuff sb = htsbuff_array(afs->save);

    /*char *nom_pos=NULL,*dot_pos=NULL;  // Position nom et point */
    char tok;
@@ -787,17 +787,16 @@ int url_savename(lien_adrfilsave *const afs,
       }
     */

-    // Construire nom
-    while((*a) && (((int) (b - afs->save)) < HTS_URLMAXSIZE)) {      // parser, et pas trop long..
+    // build the name
+    while ((*a) && (sb.len < HTS_URLMAXSIZE)) { // parse, but not too long
      if (*a == '%') {
        int short_ver = 0;

        a++;
-        if (*a == 's') {
+        if (*a == 's') { // '%s...' selects the short (8.3) form
          short_ver = 1;
          a++;
        }
-        *b = '\0';
        switch (tok = *a++) {
        case '[':              // %[param:prefix_if_not_empty:suffix_if_not_empty:empty_replacement:notfound_replacement]
          if (strchr(a, ']')) {
@@ -834,8 +833,7 @@ int url_savename(lien_adrfilsave *const afs,
              }
              if (cp) {
                c = cp + strlen(name[0]);       /* jumps "param=" */
-                strcpybuff(b, name[1]); /* prefix */
-                b += strlen(b);
+                htsbuff_cat(&sb, name[1]);      /* prefix */
                if (*c != '\0' && *c != '&') {
                  char *d = name[0];

@@ -846,110 +844,90 @@ int url_savename(lien_adrfilsave *const afs,
                  *d = '\0';
                  d = unescape_http(catbuff, sizeof(catbuff), name[0]);
                  if (d && *d) {
-                    strcpybuff(b, d);   /* value */
-                    b += strlen(b);
+                    htsbuff_cat(&sb, d); /* value */
                  } else {
-                    strcpybuff(b, name[3]);     /* empty replacement if any */
-                    b += strlen(b);
+                    htsbuff_cat(&sb, name[3]); /* empty replacement if any */
                  }
                } else {
-                  strcpybuff(b, name[3]);       /* empty replacement if any */
-                  b += strlen(b);
+                  htsbuff_cat(&sb, name[3]); /* empty replacement if any */
                }
-                strcpybuff(b, name[2]); /* suffix */
-                b += strlen(b);
+                htsbuff_cat(&sb, name[2]); /* suffix */
              } else {
-                strcpybuff(b, name[4]); /* not found replacement if any */
-                b += strlen(b);
+                htsbuff_cat(&sb, name[4]); /* not found replacement if any */
              }
            } else {
-              strcpybuff(b, name[4]);   /* not found replacement if any */
-              b += strlen(b);
+              htsbuff_cat(&sb, name[4]); /* not found replacement if any */
            }
          }
          break;
        case '%':
-          *b++ = '%';
+          htsbuff_catc(&sb, '%');
          break;
-        case 'n':              // nom sans ext
-          *b = '\0';
+        case 'n': // name without extension
          if (dot_pos) {
-            if (!short_ver)     // Noms longs
-              strncatbuff(b, nom_pos, (int) (dot_pos - nom_pos));
+            if (!short_ver)
+              htsbuff_catn(&sb, nom_pos, (int) (dot_pos - nom_pos));
            else
-              strncatbuff(b, nom_pos, min((int) (dot_pos - nom_pos), 8));
+              htsbuff_catn(&sb, nom_pos, min((int) (dot_pos - nom_pos), 8));
          } else {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, nom_pos);
+            if (!short_ver)
+              htsbuff_cat(&sb, nom_pos);
            else
-              strncatbuff(b, nom_pos, 8);
+              htsbuff_catn(&sb, nom_pos, 8);
          }
-          b += strlen(b);       // pointer à la fin
          break;
-        case 'N':              // nom avec ext
-          // RECOPIE NOM + EXT
-          *b = '\0';
+        case 'N': // name with extension
          if (dot_pos) {
-            if (!short_ver)     // Noms longs
-              strncatbuff(b, nom_pos, (int) (dot_pos - nom_pos));
+            if (!short_ver)
+              htsbuff_catn(&sb, nom_pos, (int) (dot_pos - nom_pos));
            else
-              strncatbuff(b, nom_pos, min((int) (dot_pos - nom_pos), 8));
+              htsbuff_catn(&sb, nom_pos, min((int) (dot_pos - nom_pos), 8));
          } else {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, nom_pos);
+            if (!short_ver)
+              htsbuff_cat(&sb, nom_pos);
            else
-              strncatbuff(b, nom_pos, 8);
+              htsbuff_catn(&sb, nom_pos, 8);
          }
-          b += strlen(b);       // pointer à la fin
-          *b = '.';
-          ++b;
-          // RECOPIE NOM + EXT
-          *b = '\0';
+          htsbuff_catc(&sb, '.');
          if (dot_pos) {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, dot_pos + 1);
+            if (!short_ver)
+              htsbuff_cat(&sb, dot_pos + 1);
            else
-              strncatbuff(b, dot_pos + 1, 3);
+              htsbuff_catn(&sb, dot_pos + 1, 3);
          } else {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, DEFAULT_EXT + 1);   // pas de..
+            if (!short_ver)
+              htsbuff_cat(&sb, DEFAULT_EXT + 1); // skip the leading dot
            else
-              strcpybuff(b, DEFAULT_EXT_SHORT + 1);     // pas de..
+              htsbuff_cat(&sb, DEFAULT_EXT_SHORT + 1); // skip the leading dot
          }
-          b += strlen(b);       // pointer à la fin
-          //
          break;
-        case 't':              // ext
-          *b = '\0';
+        case 't': // extension
          if (dot_pos) {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, dot_pos + 1);
+            if (!short_ver)
+              htsbuff_cat(&sb, dot_pos + 1);
            else
-              strncatbuff(b, dot_pos + 1, 3);
+              htsbuff_catn(&sb, dot_pos + 1, 3);
          } else {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, DEFAULT_EXT + 1);   // pas de..
+            if (!short_ver)
+              htsbuff_cat(&sb, DEFAULT_EXT + 1); // skip the leading dot
            else
-              strcpybuff(b, DEFAULT_EXT_SHORT + 1);     // pas de..
+              htsbuff_cat(&sb, DEFAULT_EXT_SHORT + 1); // skip the leading dot
          }
-          b += strlen(b);       // pointer à la fin
          break;
-        case 'p':              // path sans dernier /
-          *b = '\0';
-          if (nom_pos != fil + 1) {     // pas: /index.html (chemin nul)
-            if (!short_ver) {   // Noms longs
-              strncatbuff(b, fil, (int) (nom_pos - fil) - 1);
+        case 'p': // path without trailing /
+          if (nom_pos !=
+              fil + 1) { // skip when the path is empty (e.g. /index.html)
+            if (!short_ver) {
+              htsbuff_catn(&sb, fil, (int) (nom_pos - fil) - 1);
            } else {
              char BIGSTK pth[HTS_URLMAXSIZE * 2], n83[HTS_URLMAXSIZE * 2];

              pth[0] = n83[0] = '\0';
-              //
              strncatbuff(pth, fil, (int) (nom_pos - fil) - 1);
              long_to_83(opt->savename_83, n83, pth);
-              strcpybuff(b, n83);
+              htsbuff_cat(&sb, n83);
            }
          }
-          b += strlen(b);       // pointer à la fin
          break;
        case 'h':              // host (IDNA decoded if suitable)
          // IDNA / RFC 3492 (Punycode) handling for HTTP(s)
@@ -957,62 +935,50 @@ int url_savename(lien_adrfilsave *const afs,
            DECLARE_ADR(final_adr);

            /* Copy address */
-            *b = '\0';
            if (!short_ver)
-              strcpybuff(b, final_adr);
+              htsbuff_cat(&sb, final_adr);
            else
-              strcpybuff(b, final_adr);
+              htsbuff_cat(&sb, final_adr);

            /* release */
            RELEASE_ADR();
          }
-          b += strlen(b);       // pointer à la fin
          break;
-        case 'H':              // host, raw (old mode)
-          *b = '\0';
+        case 'H': // host, raw (old mode)
          if (protocol == PROTOCOL_FILE) {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, "localhost");
+            if (!short_ver)
+              htsbuff_cat(&sb, "localhost");
            else
-              strcpybuff(b, "local");
+              htsbuff_cat(&sb, "local");
          } else {
-            if (!short_ver)     // Noms longs
-              strcpybuff(b, print_adr);
+            if (!short_ver)
+              htsbuff_cat(&sb, print_adr);
            else
-              strncatbuff(b, print_adr, 8);
+              htsbuff_catn(&sb, print_adr, 8);
          }
-          b += strlen(b);       // pointer à la fin
          break;
-        case 'M':              /* host/address?query MD5 (128-bits) */
-          *b = '\0';
-          {
-            char digest[32 + 2];
-            char BIGSTK buff[HTS_URLMAXSIZE * 2];
+        case 'M': /* host/address?query MD5 (128-bits) */
+        {
+          char digest[32 + 2];
+          char BIGSTK buff[HTS_URLMAXSIZE * 2];

-            digest[0] = buff[0] = '\0';
-            strcpybuff(buff, adr);
-            strcatbuff(buff, fil_complete);
-            domd5mem(buff, strlen(buff), digest, 1);
-            strcpybuff(b, digest);
-          }
-          b += strlen(b);       // pointer à la fin
-          break;
+          digest[0] = buff[0] = '\0';
+          strcpybuff(buff, adr);
+          strcatbuff(buff, fil_complete);
+          domd5mem(buff, strlen(buff), digest, 1);
+          htsbuff_cat(&sb, digest);
+        } break;
        case 'Q':
-        case 'q':              /* query MD5 (128-bits/16-bits) 
-                                   GENERATED ONLY IF query string exists! */
-          {
-            char md5[32 + 2];
+        case 'q': /* query MD5 (128-bits/16-bits)
+                      GENERATED ONLY IF query string exists! */
+        {
+          char md5[32 + 2];

-            *b = '\0';
-            strncatbuff(b, url_md5(md5, fil_complete), (tok == 'Q') ? 32 : 4);
-            b += strlen(b);     // pointer à la fin
-          }
-          break;
+          htsbuff_catn(&sb, url_md5(md5, fil_complete), (tok == 'Q') ? 32 : 4);
+        } break;
        case 'r':
        case 'R':              // protocol
-          *b = '\0';
-          strcatbuff(b, protocol_str[protocol]);
-          b += strlen(b);       // pointer à la fin
+          htsbuff_cat(&sb, protocol_str[protocol]);
          break;

          /* Patch by Juan Fco Rodriguez to get the full query string */
@@ -1021,19 +987,17 @@ int url_savename(lien_adrfilsave *const afs,
            char *d = strchr(fil_complete, '?');

            if (d != NULL) {
-              strcatbuff(b, d);
-              b += strlen(b);
+              htsbuff_cat(&sb, d);
            }
          }
          break;

        }
      } else
-        *b++ = *a++;
+        htsbuff_catc(&sb, *a++);
    }
-    *b++ = '\0';
    //
-    // Types prédéfinis
+    // predefined types
    //

  }
--- a/src/htsparse.c
+++ b/src/htsparse.c
@@ -274,6 +274,28 @@ Please visit our Website: http://www.httrack.com
  } \
 } while(0)

+/* Percent-encode the angle brackets of a string so it is safe to embed inside
+   an HTML comment (the default footer) or any other HTML context. A URL holding
+   "-->" would otherwise close the footer comment and inject markup (issue #165).
+   Raw '<' and '>' are not valid URL characters, so encoding them is harmless. */
+static const char *html_inline_safe(const char *src, char *dst, size_t size) {
+  size_t i, j;
+
+  for(i = 0, j = 0; src[i] != '\0' && j + 4 < size; i++) {
+    const char c = src[i];
+
+    if (c == '<' || c == '>') {
+      dst[j++] = '%';
+      dst[j++] = '3';
+      dst[j++] = (c == '<') ? 'C' : 'E';
+    } else {
+      dst[j++] = c;
+    }
+  }
+  dst[j] = '\0';
+  return dst;
+}
+
 /* Main parser */
 int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
  char catbuff[CATBUFF_SIZE];
@@ -510,6 +532,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
        int valid_p = 0;        // force to take p even if == 0
        int ending_p = '\0';    // ending quote?
        int archivetag_p = 0;   // avoid multiple-archives with commas
+        int srcset_p = 0;       // srcset="url1 480w, url2 2x": list of URLs
        int unquoted_script = 0;
        INSCRIPT inscript_state_pos_prev = inscript_state_pos;

@@ -719,13 +742,16 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                if (StringNotEmpty(opt->footer)) {
                  char BIGSTK tempo[1024 + HTS_URLMAXSIZE * 2];
                  char gmttime[256];
+                  char BIGSTK safe_adr[HTS_URLMAXSIZE * 3 + 4];
+                  char BIGSTK safe_fil[HTS_URLMAXSIZE * 3 + 4];

                  tempo[0] = '\0';
                  time_gmt_rfc822(gmttime);
                  strcatbuff(tempo, eol);
                  hts_template_format_str(tempo + strlen(tempo), sizeof(tempo) - strlen(tempo),
                          StringBuff(opt->footer),
-                          jump_identification_const(urladr()), urlfil(), gmttime,
+                          html_inline_safe(jump_identification_const(urladr()), safe_adr, sizeof(safe_adr)),
+                          html_inline_safe(urlfil(), safe_fil, sizeof(safe_fil)), gmttime,
                          HTTRACK_VERSIONID, /* EOF */ NULL);
                  strcatbuff(tempo, eol);
                  //fwrite(tempo,1,strlen(tempo),fp);
@@ -1025,6 +1051,12 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          if (strcmp(hts_detect[i], "archive") == 0) {
                            archivetag_p = 1;
                          }
+                          /* srcset: a comma-list of candidate URLs, each split
+                             out and rewritten below (#235, #236) */
+                          else if (strcmp(hts_detect[i], "srcset") == 0
+                                   || strcmp(hts_detect[i], "data-srcset") == 0) {
+                            srcset_p = 1;
+                          }
                        }
                        i++;
                      }
@@ -1790,6 +1822,14 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                html++;          // sauter # pour usemap etc
              }
            }
+ srcset_next:
+            /* srcset: skip leading whitespace/commas before each candidate;
+               the skipped bytes flush verbatim below */
+            if (srcset_p) {
+              while(html < r->adr + r->size
+                    && (is_realspace(*html) || *html == ','))
+                INCREMENT_CURRENT_ADR(1);
+            }
            eadr = html;

            // ne pas flusher après code si on doit écrire le codebase avant!
@@ -1819,6 +1859,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                    if ((*eadr == quote && (!quoteinscript || *(eadr - 1) == '\\'))     // end quote
                        || (noquote && (*eadr == '\"' || *eadr == '\''))        // end at any quote
                        || (!noquote && quote == '\0' && is_realspace(*eadr))   // unquoted href
+                        || srcset_p     // whitespace ends a srcset candidate URL
                      )         // si pas d'attente de quote spéciale ou si quote atteinte
                      ok = 0;
                  } else if (ending_p && (*eadr == ending_p))
@@ -1847,6 +1888,16 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                      break;    // \" ou \' point d'arrêt
                    case '?':  /*quote_adr=adr; */
                      break;    // noter position query
+                    case ',':
+                      if (srcset_p) {
+                        /* split only on a trailing comma; one inside the URL
+                           (data: URI, CDN path) is kept, per the WHATWG algo */
+                        const char *const n = eadr + 1;
+
+                        if (n >= r->adr + r->size || is_space(*n) || *n == ',')
+                          ok = 0;
+                      }
+                      break;
                    }
                  }
                  //}
@@ -3225,6 +3276,28 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
            }
            // adr=eadr-1;  // ** sauter

+            /* srcset candidate loop: skip the descriptor and comma, then
+               re-enter the capture for the next URL. Backward goto, not a loop:
+               the per-candidate body is this whole block. */
+            if (srcset_p && ok == 0) {
+              const char *const endp = r->adr + r->size;
+              const char *q = html;
+              while(q < endp && *q != '\0' && *q != ',' && *q != quote
+                    && *q != '<' && *q != '>' && (unsigned char) *q >= 32)
+                q++;            // skip the descriptor
+              if (q < endp && *q == ',') {
+                q++;
+                while(q < endp && (is_realspace(*q) || *q == ','))
+                  q++;          // skip whitespace and empty candidates
+                if (q < endp && *q != '\0' && *q != ',' && *q != quote
+                    && *q != '<' && *q != '>' && (unsigned char) *q >= 32) {
+                  INCREMENT_CURRENT_ADR(q - html);   // keep the automate in sync
+                  ok = 1;
+                  goto srcset_next;
+                }
+              }
+            }
+
            /* We skipped bytes and skip the " : reset state */
            /*if (inscript) {
               inscript_state_pos = INSCRIPT_START;
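
Note on html_inline_safe() above: a small standalone check of the encoding and of the truncation rule. The function body is copied from the hunk; the helper name encode_demo() and the sample strings are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from the patch: percent-encode '<' and '>' into dst (size bytes). */
static const char *html_inline_safe(const char *src, char *dst, size_t size) {
  size_t i, j;

  /* "j + 4 < size" keeps room for one 3-byte escape plus the final NUL */
  for (i = 0, j = 0; src[i] != '\0' && j + 4 < size; i++) {
    const char c = src[i];

    if (c == '<' || c == '>') {
      dst[j++] = '%';
      dst[j++] = '3';
      dst[j++] = (c == '<') ? 'C' : 'E';
    } else {
      dst[j++] = c;
    }
  }
  dst[j] = '\0';
  return dst;
}

/* Hypothetical helper: returns 1 if encoding src into a buffer of the given
   size (at most 64 bytes) yields the expected string. */
static int encode_demo(const char *src, size_t size, const char *expected) {
  char dst[64];

  return size <= sizeof(dst)
         && strcmp(html_inline_safe(src, dst, size), expected) == 0;
}
```

With a large enough buffer, a comment terminator such as "-->" comes out as "--%3E", so it can no longer close the footer comment. With a small buffer the output is cut before an escape would be split.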
--- a/src/htssafe.h
+++ b/src/htssafe.h
@@ -123,41 +123,111 @@ static HTS_UNUSED void htssafe_compile_time_check_(void) {
  (void) check_pointer;
 }

+/*
+ * Pointer-destination diagnostics for the buff() macros (GCC/Clang, C only).
+ *
+ * strcpybuff()/strcatbuff()/strncatbuff() bounds-check only when the
+ * destination is a sized char[] array (HTS_IS_CHAR_BUFFER). For a bare char*
+ * the capacity is unknown, so the macro silently falls back to plain
+ * strcpy()/strcat()/strncat() while still looking like a checked call.
+ *
+ * These stubs route that pointer case through __builtin_choose_expr() so the
+ * 'warning' attribute fires only at pointer-destination sites; array sites use
+ * the bounded *_safe_ helpers and stay quiet. The warning names the
+ * explicit-size replacement (strlcpybuff/strlcatbuff). Diagnostic only: no
+ * runtime or ABI change, built only on GCC/Clang in C mode. Other compilers
+ * (MSVC, ...) keep the previous behavior via the #else branches.
+ */
+#if (defined(__GNUC__) && !defined(__cplusplus))
+#if defined(__has_attribute)
+#if __has_attribute(warning)
+#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline, warning(msg)))
+#endif
+#endif
+#ifndef HTS_BUFF_PTR_ATTR
+/* 'warning' attribute unavailable: keep noinline so the migration can still
+   grep for these symbols, but no compile-time diagnostic is emitted. */
+#define HTS_BUFF_PTR_ATTR(msg) __attribute__((unused, noinline))
+#endif
+
+HTS_BUFF_PTR_ATTR("strcpybuff() destination is a pointer (capacity unknown): "
+                  "NOT bounds-checked; use strlcpybuff(dst, src, size)")
+static char *strcpybuff_ptr_(char *dest, const char *src) {
+  return strcpy(dest, src);
+}
+
+HTS_BUFF_PTR_ATTR("strcatbuff() destination is a pointer (capacity unknown): "
+                  "NOT bounds-checked; use strlcatbuff(dst, src, size)")
+static char *strcatbuff_ptr_(char *dest, const char *src) {
+  return strcat(dest, src);
+}
+
+HTS_BUFF_PTR_ATTR("strncatbuff() destination is a pointer (capacity unknown): "
+                  "NOT bounds-checked; use strlcatbuff(dst, src, size)")
+static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
+  return strncat(dest, src, n);
+}
+#endif
+
 /**
 * Append at most N characters from "B" to "A".
 * If "A" is a char[] variable whose size is not sizeof(char*), then the size 
 * is assumed to be the capacity of this array.
 */
+#if (defined(__GNUC__) && !defined(__cplusplus))
+#define strncatbuff(A, B, N) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
+  strncat_safe_(A, sizeof(A), B, \
+  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
+  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
+  strncatbuff_ptr_((A), (B), (N)) )
+#else
 #define strncatbuff(A, B, N) \
  ( HTS_IS_NOT_CHAR_BUFFER(A) \
  ? strncat(A, B, N) \
  : strncat_safe_(A, sizeof(A), B, \
  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
+#endif

 /**
 * Append characters of "B" to "A".
 * If "A" is a char[] variable whose size is not sizeof(char*), then the size 
 * is assumed to be the capacity of this array.
 */
+#if (defined(__GNUC__) && !defined(__cplusplus))
+#define strcatbuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
+  strncat_safe_(A, sizeof(A), B, \
+  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
+  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
+  strcatbuff_ptr_((A), (B)) )
+#else
 #define strcatbuff(A, B) \
  ( HTS_IS_NOT_CHAR_BUFFER(A) \
  ? strcat(A, B) \
  : strncat_safe_(A, sizeof(A), B, \
  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
+#endif

 /**
 * Copy characters from "B" to "A".
 * If "A" is a char[] variable whose size is not sizeof(char*), then the size 
 * is assumed to be the capacity of this array.
 */
+#if (defined(__GNUC__) && !defined(__cplusplus))
+#define strcpybuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
+  strcpy_safe_(A, sizeof(A), B, \
+  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
+  "overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__), \
+  strcpybuff_ptr_((A), (B)) )
+#else
 #define strcpybuff(A, B) \
  ( HTS_IS_NOT_CHAR_BUFFER(A) \
  ? strcpy(A, B) \
  : strcpy_safe_(A, sizeof(A), B, \
  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
  "overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) )
+#endif

 /**
 * Append characters of "B" to "A", "A" having a maximum capacity of "S".
@@ -217,6 +287,88 @@ static HTS_INLINE HTS_UNUSED char* strcpy_safe_(char *const dest, const size_t s
  return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1, exp, file, line);
 }

+/**
+ * htsbuff: a non-owning bounded string builder over a fixed buffer.
+ *
+ * Companion to the strcpybuff()/strcatbuff() macros for the common case of a
+ * cursor walking a buffer of known capacity (building a name into a fixed
+ * array, assembling a status line, etc.). It tracks the write position, bounds
+ * every write against the real capacity, and aborts on overflow (same contract
+ * as the *_safe_ helpers), so the error-prone manual "p += strlen(p)" dance
+ * goes away.
+ *
+ * Build one from an in-scope array with htsbuff_array() (capacity via sizeof,
+ * so pass an array, not a pointer), or from a pointer of known capacity with
+ * htsbuff_ptr(). The buffer is kept NUL-terminated; htsbuff_str() returns it.
+ */
+typedef struct {
+  char *buf;        /* backing buffer (kept NUL-terminated) */
+  size_t cap;       /* total capacity of buf, including the NUL */
+  size_t len;       /* current length, excluding the NUL */
+} htsbuff;
+
+static HTS_INLINE HTS_UNUSED htsbuff htsbuff_ptr_(char *buf, size_t cap) {
+  htsbuff b;
+  b.buf = buf;
+  b.cap = cap;
+  b.len = 0;
+  assertf(cap != 0);
+  buf[0] = '\0';
+  return b;
+}
+
+/**
+ * Builder over the in-scope array ARR (capacity = sizeof(ARR)).
+ * On GCC/Clang this rejects a non-array (e.g. a char* pointer), whose sizeof
+ * would be the pointer size and silently wrong; use htsbuff_ptr() for pointers.
+ * On other compilers there is no such guard, so pass only true arrays there.
+ */
+#if (defined(__GNUC__) && !defined(__cplusplus))
+/* 0 for an array, a -1 array-size compile error for a pointer. */
+#define htsbuff_must_be_array_(A) \
+  (sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), typeof(&(A)[0]))]) - 1)
+#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
+#else
+#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR))
+#endif
+/** Builder over pointer P of known capacity N (N includes the NUL). */
+#define htsbuff_ptr(P, N)  htsbuff_ptr_((P), (N))
+
+/** Append at most n characters of s (stopping at its NUL). Aborts on overflow. */
+static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s, size_t n) {
+  const size_t add = strnlen(s, n);
+  /* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The
+     maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so
+     'add < cap - len' cannot wrap the way 'len + add < cap' could. */
+  assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
+  memcpy(b->buf + b->len, s, add);
+  b->len += add;
+  b->buf[b->len] = '\0';
+}
+
+/** Append s. Aborts on overflow. */
+static HTS_INLINE HTS_UNUSED void htsbuff_cat(htsbuff *b, const char *s) {
+  htsbuff_catn(b, s, (size_t) -1);
+}
+
+/** Append a single character (including '\0' as data). Aborts on overflow. */
+static HTS_INLINE HTS_UNUSED void htsbuff_catc(htsbuff *b, char c) {
+  assertf__(1 < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
+  b->buf[b->len++] = c;
+  b->buf[b->len] = '\0';
+}
+
+/** Reset content to s. Aborts on overflow. */
+static HTS_INLINE HTS_UNUSED void htsbuff_cpy(htsbuff *b, const char *s) {
+  b->len = 0;
+  htsbuff_catn(b, s, (size_t) -1);
+}
+
+/** Current NUL-terminated content. */
+static HTS_INLINE HTS_UNUSED const char *htsbuff_str(const htsbuff *b) {
+  return b->buf;
+}
+
 #define malloct(A)          malloc(A)
 #define calloct(A,B)        calloc((A), (B))
 #define freet(A)            do { if ((A) != NULL) { free(A); (A) = NULL; } } while(0)
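
Note on htsbuff above: a simplified standalone sketch of the same builder idea, showing how it replaces the "b += strlen(b)" cursor pattern. The type sbuf and the functions sbuf_init(), sbuf_catn(), sbuf_catc() and short_name() are hypothetical; plain assert() stands in for the engine's assertf macros, and a hand-written loop stands in for strnlen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for htsbuff. */
typedef struct {
  char *buf;  /* backing buffer, kept NUL-terminated */
  size_t cap; /* capacity, including the NUL */
  size_t len; /* current length, excluding the NUL */
} sbuf;

static sbuf sbuf_init(char *buf, size_t cap) {
  sbuf b;

  assert(cap != 0);
  b.buf = buf;
  b.cap = cap;
  b.len = 0;
  buf[0] = '\0';
  return b;
}

/* Append at most n characters of s; abort (assert) on overflow. */
static void sbuf_catn(sbuf *b, const char *s, size_t n) {
  size_t add = 0;

  while (add < n && s[add] != '\0')
    add++;
  /* len < cap always holds, so cap - len cannot underflow */
  assert(add < b->cap - b->len);
  memcpy(b->buf + b->len, s, add);
  b->len += add;
  b->buf[b->len] = '\0';
}

/* Append one character; abort (assert) on overflow. */
static void sbuf_catc(sbuf *b, char c) {
  assert(1 < b->cap - b->len);
  b->buf[b->len++] = c;
  b->buf[b->len] = '\0';
}

/* Build "<name>.<ext>" with the name cut to 8 characters and the ext to 3. */
static const char *short_name(char *out, size_t cap, const char *name,
                              const char *ext) {
  sbuf sb = sbuf_init(out, cap);

  sbuf_catn(&sb, name, 8);
  sbuf_catc(&sb, '.');
  sbuf_catn(&sb, ext, 3);
  return sb.buf;
}
```

Each append updates the length itself, so the caller never recomputes the write position and every write is checked against the real capacity.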
--- a/src/htsthread.c
+++ b/src/htsthread.c
@@ -193,7 +193,23 @@ HTSEXT_API void hts_mutexfree(htsmutex * mutex) {
 HTSEXT_API void hts_mutexlock(htsmutex * mutex) {
  assertf(mutex != NULL);
  if (*mutex == HTSMUTEX_INIT) {        /* must be initialized */
-    hts_mutexinit(mutex);
+    /* Initialize exactly once, even when several threads race to lock the same
+       mutex for the first time. Build our own object, then publish it with a
+       single atomic compare-and-swap; the threads that lose the race free the
+       object they built (issue #297). No static guard is needed, which keeps
+       this safe on Windows 2000 (no statically-initializable lock there). */
+    htsmutex created = HTSMUTEX_INIT;
+
+    hts_mutexinit(&created);
+#ifdef _WIN32
+    if (InterlockedCompareExchangePointer((PVOID volatile *) mutex, created,
+                                          HTSMUTEX_INIT) != HTSMUTEX_INIT)
+#else
+    if (!__sync_bool_compare_and_swap(mutex, HTSMUTEX_INIT, created))
+#endif
+    {
+      hts_mutexfree(&created);
+    }
  }
  assertf(*mutex != NULL);
 #ifdef _WIN32
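
Note on the hts_mutexlock() change above: a minimal sketch of the build-then-publish pattern, assuming C11 <stdatomic.h> in place of the __sync builtin and the Win32 interlocked call used by the patch. The type lock_obj and the function get_or_publish() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object behind an htsmutex handle. */
typedef struct {
  int id;
} lock_obj;

/* Return the object published in *slot. If the slot is still empty, build a
   private object and try to publish it with one compare-and-swap; a thread
   that loses the race frees its own object and uses the winner's. */
static lock_obj *get_or_publish(_Atomic(lock_obj *) *slot, int id) {
  lock_obj *cur = atomic_load(slot);

  if (cur == NULL) {
    lock_obj *created = malloc(sizeof(*created));
    lock_obj *expected = NULL;

    if (created == NULL)
      abort();
    created->id = id;
    if (atomic_compare_exchange_strong(slot, &expected, created)) {
      cur = created; /* we won: our object is now the shared one */
    } else {
      free(created); /* we lost: discard ours */
      cur = expected; /* the failed CAS stored the winner's pointer here */
    }
  }
  return cur;
}
```

No separate guard lock is involved: the slot itself is the only shared state, and it moves from empty to initialized exactly once.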
--- a/src/httrack-library.h
+++ b/src/httrack-library.h
@@ -193,6 +193,7 @@ HTSEXT_API int structcheck(const char *path);
 HTSEXT_API int structcheck_utf8(const char *path);
 HTSEXT_API int dir_exists(const char *path);
 HTSEXT_API void infostatuscode(char *msg, int statuscode);
+HTSEXT_API const char *infostatuscode_const(int statuscode);
 HTSEXT_API TStamp mtime_local(void);
 HTSEXT_API void qsec2str(char *st, TStamp t);
 HTSEXT_API char *int2char(strc_int2bytes2 * strc, int n);
--- a/tests/01_engine-cmdline.test
+++ b/tests/01_engine-cmdline.test
@@ -0,0 +1,71 @@
+#!/bin/bash
+#
+
+# Offline command-line option tests (no network). The -F user-agent and -%X
+# raw-header values used to be rejected past 126 / 256 bytes (#152); they are
+# now bounded only by the general per-argument cap (HTS_CDLMAXSIZE). A value up
+# to that cap is accepted on both the short (-F, -%X) and long (--user-agent,
+# --headers) forms, and an over-cap value is refused cleanly rather than
+# overrunning a fixed scratch buffer.
+
+set -u
+
+tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_cmdline.XXXXXX") || exit 1
+trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM
+
+echo '<html><body>hello</body></html>' >"$tmp/index.html"
+
+# a string of N repeated 'A' characters
+nchars() {
+    printf 'A%.0s' $(seq 1 "$1")
+}
+
+# crawl the local fixture with the given extra args; leaves the exit status in RC
+run() {
+    local out="$1"
+    shift
+    rm -rf "$out"
+    mkdir -p "$out"
+    httrack "file://$tmp/index.html" -O "$out" --quiet -n "$@" >"$out/.log" 2>&1
+    RC=$?
+}
+
+# assert the value was accepted: clean exit and the fixture was mirrored
+accepted() {
+    { test "$RC" -eq 0 && test -n "$(find "$1" -type f -path '*/index.html' -print -quit)"; } ||
+        ! echo "FAIL: $2 (exit $RC)" || exit 1
+}
+
+# assert the value was refused cleanly: a normal error exit, never a crash
+# (a SIGABRT from an overflowed scratch buffer would surface as exit 134)
+refused() {
+    { test "$RC" -ne 0 && test "$RC" -ne 134; } ||
+        ! echo "FAIL: $1 (exit $RC)" || exit 1
+}
+
+# a value past the old 126/256 caps but within the general per-argument cap
+# (HTS_CDLMAXSIZE) is accepted, on both the short and long form of each option
+long=$(nchars 900)
+run "$tmp/ua-s" -F "$long"
+accepted "$tmp/ua-s" "#152: long -F user-agent rejected or crashed"
+run "$tmp/ua-l" --user-agent "$long"
+accepted "$tmp/ua-l" "#152: long --user-agent rejected or crashed"
+run "$tmp/hd-s" "-%X" "X-A: $long"
+accepted "$tmp/hd-s" "#152: long -%X header rejected or crashed"
+run "$tmp/hd-l" --headers "X-B: $long"
+accepted "$tmp/hd-l" "#152: long --headers rejected or crashed"
+
+# a value just under the cap (>1000) must not overflow the long-form alias
+# scratch buffer (the param[] copy in optalias_check)
+run "$tmp/ua-n" --user-agent "$(nchars 1010)"
+accepted "$tmp/ua-n" "#152: near-cap --user-agent overflowed the param[] buffer"
+
+# a value over the cap is refused cleanly (graceful error, not a SIGABRT), on
+# both forms
+over=$(nchars 1100)
+run "$tmp/ov-s" -F "$over"
+refused "#152: over-cap -F not refused cleanly"
+run "$tmp/ov-l" --user-agent "$over"
+refused "#152: over-cap --user-agent not refused cleanly"
+
+exit 0
--- a/tests/01_engine-filter.test
+++ b/tests/01_engine-filter.test
@@ -47,3 +47,25 @@ match '*foo*bar' 'foozbar'

 # '?' is the query-string marker, not a single-char wildcard
 nomatch 'a?c' 'abc'
+
+# backslash escapes a metacharacter inside a class so it is matched literally.
+# Quirk: the decoder also adds the backslash itself to the set, so '\X' matches
+# both X and '\'. These assertions pin that behavior.
+match '*[\*]' '*'
+match '*[\*]' "\\"
+nomatch '*[\*]' 'a'
+match '*[\\]' "\\"
+nomatch '*[\\]' 'a'
+match '*[\[]' '['
+match '*[\[]' "\\"
+nomatch '*[\[]' 'a'
+
+# A literal ']' cannot be a class member: the class parser stops at the first
+# ']', escaped or not. So '*[\[\]]' does NOT mean "the [ or ] character" as the
+# filter guide claims (GitHub #148); it parses as the class {'[','\'} followed
+# by a trailing literal ']'. These assertions document the current (buggy)
+# behavior so any future matcher fix is a deliberate, visible change.
+nomatch '*[\[\]]' '['   # not matched, despite the docs
+match '*[\[\]]' ']'     # only via the empty class-match + trailing ']'
+match '*[\[\]]' '[]'    # one of {'[','\'} then the trailing ']'
+nomatch '*[\[\]]' '[]x'
--- a/tests/01_engine-parse.test
+++ b/tests/01_engine-parse.test
@@ -0,0 +1,155 @@
+#!/bin/bash
+#
+
+# Offline HTML parser tests: each section crawls a file:// fixture (no network)
+# and checks which assets the parser captured and how it rewrote the links.
+
+set -u
+
+tmp=$(mktemp -d "${TMPDIR:-/tmp}/httrack_parse.XXXXXX") || exit 1
+trap 'rm -rf "$tmp"' EXIT HUP INT QUIT PIPE TERM
+
+# a minimal valid 1x1 GIF, reused for every referenced asset
+gif() {
+    printf 'GIF89a\1\0\1\0\200\0\0\0\0\0\377\377\377!\371\4\1\0\0\0\0,\0\0\0\0\1\0\1\0\0\2\2D\1\0;' >"$1"
+}
+
+# crawl <fixture-html> into <out> with link rewriting on, no extra fetching
+crawl() {
+    local html="$1" out="$2"
+    rm -rf "$out"
+    mkdir -p "$out"
+    httrack "file://$html" -O "$out" --quiet --near -n >"$out/.log" 2>&1
+}
+
+# assert a file with the given basename was saved somewhere under <out>
+found() {
+    test -n "$(find "$2" -type f -name "$1" -print -quit)" ||
+        ! echo "FAIL: expected '$1' to be downloaded under $2" || exit 1
+}
+
+# assert NO file with the given basename was saved (e.g. a descriptor token must
+# not be mistaken for a URL)
+notfound() {
+    test -z "$(find "$2" -type f -name "$1" -print -quit)" ||
+        ! echo "FAIL: '$1' should not have been downloaded under $2" || exit 1
+}
+
+# the mirrored fixture page (under "file/"), not HTTrack's own landing index
+savedhtml() {
+    find "$1" -type f -path '*/file/*' -name index.html -print -quit
+}
+
+# srcset on <img> and <source> (#235, #236): every candidate captured and
+# rewritten, descriptors preserved, following attributes left intact.
+site="$tmp/srcset"
+mkdir -p "$site"
+for f in a b c d e f g h i j v dz; do gif "$site/$f.gif"; done
+# unquoted heredoc: $site expands in the absolute-URL candidate
+cat >"$site/index.html" <<EOF
+<html><body>
+<img src="a.gif" srcset="b.gif 480w, c.gif 800w">
+<picture><source srcset="d.gif 1x, c.gif 2x"><img src="a.gif"></picture>
+<img srcset="e.gif, f.gif">
+<img srcset="g.gif 2x" alt="trailing attr after srcset">
+<img srcset="  h.gif   2x ,  i.gif  ">
+<video><source src="v.gif"></video>
+<img srcset="file://$site/j.gif 2x">
+<img srcset="data:image/gif;base64,R0lGODlhAQABAAAAACw= 1x, dz.gif 2x">
+<img srcset="">
+<a href="a.gif">plain link still works</a>
+</body></html>
+EOF
+out="$tmp/srcset-out"
+crawl "$site/index.html" "$out"
+
+# every candidate downloads, incl. unique tails (catches first-only parsing),
+# whitespace-padded (h,i), <source src> (v), absolute (j), post-data: URI (dz)
+for f in a b c d e f g h i j v dz; do found "$f.gif" "$out"; done
+
+# the width/density descriptors are not URLs and must not be fetched
+notfound "480w" "$out"
+notfound "800w" "$out"
+notfound "2x" "$out"
+
+saved=$(savedhtml "$out")
+test -n "$saved" || ! echo "FAIL: saved index.html not found" || exit 1
+
+# descriptors must survive the rewrite (no "b.gif 480w" mangled into a path)
+grep -Eq 'srcset="[^"]*480w[^"]*800w' "$saved" ||
+    ! echo "FAIL: srcset width descriptors lost/reordered in rewritten HTML" || exit 1
+grep -Eq 'srcset="[^"]*1x[^"]*2x' "$saved" ||
+    ! echo "FAIL: srcset density descriptors lost/reordered in rewritten HTML" || exit 1
+# the descriptor-less comma form keeps both candidates and the separator verbatim
+grep -Eq 'srcset="e\.gif, f\.gif"' "$saved" ||
+    ! echo "FAIL: comma-separated srcset without descriptors was altered" || exit 1
+# an attribute following srcset in the same tag must be left intact
+grep -q 'alt="trailing attr after srcset"' "$saved" ||
+    ! echo "FAIL: srcset swallowed a following attribute" || exit 1
+
+# a comma inside a URL (data: URI, CDN path) is part of the URL, not a split
+# point (WHATWG): the data: URI stays verbatim; the next candidate (dz) downloads
+grep -Fq 'data:image/gif;base64,R0lGODlhAQABAAAAACw= 1x' "$saved" ||
+    ! echo "FAIL: a comma inside a data: URI srcset candidate was mis-split" || exit 1
+
+# real rewrite, not passthrough: the absolute file:// candidate becomes local
+# (a flat fixture can't show this; the footer comment's file:// is not in srcset)
+grep -Eq 'srcset="j\.gif 2x"' "$saved" ||
+    ! echo "FAIL: absolute file:// srcset URL was not rewritten to a local link" || exit 1
+! grep -Eq 'srcset="[^"]*file://' "$saved" ||
+    ! echo "FAIL: a file:// URL survived inside a rewritten srcset attribute" || exit 1
+
+# xlink:href (#298) and CSS background-image (#237): detected and rewritten to
+# local. background-image is covered in both an embedded <style> block and an
+# inline style attribute, with the URL unquoted, double-quoted and single-quoted
+# (the quote style is preserved on rewrite). No-detect attributes (title, alt,
+# ...) are left untouched. Asserted by rewrite (deterministic), not download.
+# data-* (#201/#203) is omitted: its detection is currently nondeterministic and
+# can't be locked yet.
+site2="$tmp/attrs"
+mkdir -p "$site2"
+for f in xl ibg ibgs cex cexd cexs tt; do gif "$site2/$f.gif"; done
+cat >"$site2/index.html" <<EOF
+<html><head><style>
+.a { background-image: url(file://$site2/cex.gif); }
+.b { background-image: url("file://$site2/cexd.gif"); }
+.c { background-image: url('file://$site2/cexs.gif'); }
+</style></head><body>
+<a xlink:href="file://$site2/xl.gif">xlink:href (#298)</a>
+<div style="background-image:url(file://$site2/ibg.gif)"></div>
+<div style="background-image:url('file://$site2/ibgs.gif')"></div>
+<span title="file://$site2/tt.gif">excluded attribute</span>
+</body></html>
+EOF
+out2="$tmp/attrs-out"
+crawl "$site2/index.html" "$out2"
+saved2=$(savedhtml "$out2")
+test -n "$saved2" || ! echo "FAIL: saved attrs page not found" || exit 1
+
+# detected attributes: the absolute URL is rewritten to a local link
+grep -Eq 'xlink:href="xl\.gif"' "$saved2" ||
+    ! echo "FAIL #298: xlink:href not detected/rewritten" || exit 1
+
+# #237 embedded <style> block, each quoting form, quote style preserved
+grep -Eq 'url\(cex\.gif\)' "$saved2" ||
+    ! echo "FAIL #237: unquoted background-image in <style> not rewritten" || exit 1
+grep -Eq 'url\("cexd\.gif"\)' "$saved2" ||
+    ! echo "FAIL #237: double-quoted background-image in <style> not rewritten" || exit 1
+grep -Eq "url\('cexs\.gif'\)" "$saved2" ||
+    ! echo "FAIL #237: single-quoted background-image in <style> not rewritten" || exit 1
+
+# #237 inline style attribute, unquoted and single-quoted url()
+grep -Eq 'style="background-image:url\(ibg\.gif\)"' "$saved2" ||
+    ! echo "FAIL #237: inline unquoted background-image not rewritten" || exit 1
+grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
+    ! echo "FAIL #237: inline single-quoted background-image not rewritten" || exit 1
+
+# no file:// URL survived inside any rewritten background-image
+! grep -Eq 'background-image:[^;"]*file://' "$saved2" ||
+    ! echo "FAIL #237: a file:// URL survived inside a rewritten background-image" || exit 1
+
+# excluded attribute: title is on the no-detect list, so its value is left as-is
+grep -q 'title="file://' "$saved2" ||
+    ! echo "FAIL: a no-detect attribute (title) was wrongly rewritten" || exit 1
+
+exit 0
--- a/tests/01_engine-strsafe.test
+++ b/tests/01_engine-strsafe.test
@@ -0,0 +1,34 @@
+#!/bin/bash
+#
+
+# htssafe.h bounded string operations (driven by 'httrack -#8').
+
+# Success path: every bounded op (strcpybuff/strcatbuff/strncatbuff/strlcpybuff)
+# must behave correctly. Like the other -# debug modes, a trailing token is
+# required (a bare '-#8' falls through to the usage screen).
+out=$(httrack -#8 run)
+test $? -eq 0 || exit 1
+test "$out" == "strsafe: OK" || exit 1
+
+# Overflow path: an over-capacity write into a sized buffer must be caught by
+# the bounded macro and abort the process, not be silently truncated/completed.
+# Assert the htssafe abort signature specifically, so the test cannot pass for
+# an unrelated reason (e.g. the -#8 mode being gone and falling through to the
+# usage screen, which also exits non-zero).
+err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1)
+case "$err" in
+	*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;;
+	*"overflow while copying"*) ;;
+	*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;;
+esac
+
+# Same guarantee for the htsbuff builder. The source is exactly the buffer
+# capacity (4 bytes into a 4-byte buffer), so this also pins the boundary: a
+# '<=' off-by-one in the capacity check would let it through (and print "NOT
+# aborted"). Match the specific htsbuff abort message, not just any assert.
+err=$(httrack -#8 overflow-buff "abcd" 2>&1)
+case "$err" in
+	*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;;
+	*"htsbuff append overflow"*) ;;
+	*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;;
+esac
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -9,6 +9,25 @@ TESTS_ENVIRONMENT += HTTPS_SUPPORT=$(HTTPS_SUPPORT)
 TESTS_ENVIRONMENT += top_srcdir=$(top_srcdir)

 TEST_EXTENSIONS = .test
-TESTS = 00_runnable.test 01_engine-charset.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
+TESTS = \
+	00_runnable.test \
+	01_engine-charset.test \
+	01_engine-cmdline.test \
+	01_engine-entities.test \
+	01_engine-filter.test \
+	01_engine-hashtable.test \
+	01_engine-idna.test \
+	01_engine-mime.test \
+	01_engine-parse.test \
+	01_engine-simplify.test \
+	01_engine-strsafe.test \
+	02_manpage-regen.test \
+	10_crawl-simple.test \
+	11_crawl-cookies.test \
+	11_crawl-idna.test \
+	11_crawl-international.test \
+	11_crawl-longurl.test \
+	11_crawl-parsing.test \
+	12_crawl_https.test

 CLEANFILES = check-network_sh.cache
--- a/tests/Makefile.in
+++ b/tests/Makefile.in
@@ -472,7 +472,7 @@ TESTS_ENVIRONMENT = PATH=$(top_builddir)/src$(PATH_SEPARATOR)$$PATH \
 	ONLINE_UNIT_TESTS=$(ONLINE_UNIT_TESTS) \
 	HTTPS_SUPPORT=$(HTTPS_SUPPORT) top_srcdir=$(top_srcdir)
 TEST_EXTENSIONS = .test
-TESTS = 00_runnable.test 01_engine-charset.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
+TESTS = 00_runnable.test 01_engine-charset.test 01_engine-cmdline.test 01_engine-entities.test 01_engine-filter.test 01_engine-hashtable.test 01_engine-idna.test 01_engine-mime.test 01_engine-parse.test 01_engine-simplify.test 02_manpage-regen.test 10_crawl-simple.test 11_crawl-cookies.test 11_crawl-idna.test 11_crawl-international.test 11_crawl-longurl.test 11_crawl-parsing.test 12_crawl_https.test
 CLEANFILES = check-network_sh.cache
 all: all-am
Author	SHA1	Message	Date
Xavier Roche	348a7d8cb2	Return HTTP status reason phrases via a const-returning switch infostatuscode() was a ~60-case switch, each arm strcpybuff()-ing a literal into the caller's char* msg: 42 unchecked pointer-destination copies of static data. Keep the same O(1) switch dispatch but have it return the phrase instead of copying -- new public infostatuscode_const(int) -> const char* (or NULL) -- and do the copy in a thin wrapper. infostatuscode() preserves exact behavior: a known code overwrites msg; an unknown code keeps any caller-provided message, else writes "Unknown error". The single remaining copy uses strlcpybuff with the documented 64-byte minimum (longest phrase is 31; all callers pass >= 80). Drops 42 pointer-destination warnings (htslib.c 56 -> 14; tree 179 -> 137). No dispatch regression: it stays a switch (jump table), no allocation, no per-call scan. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 13:14:23 +02:00
Xavier Roche	5f81741ac5	Merge pull request #332 from xroche/cleanup/url_savename-htsbuff Convert the url_savename template renderer to htsbuff	2026-06-14 13:01:32 +02:00
Xavier Roche	0cf14c4e88	Convert the url_savename template renderer to htsbuff The savename_type == -1 userdef renderer walked afs->save with a raw char* cursor, doing "b += strlen(b)" after each write, and strcpybuff(b, ...) on that char* was unchecked (the pointer-destination case). That manual pointer math is where the function's off-by-one / strlen-based hazards lived. Convert the cursor to an htsbuff over afs->save (capacity sizeof = the full HTS_URLMAXSIZE*2 buffer): every append is now bounds-checked and the pointer math is gone. The loop's truncation guard becomes "sb.len < HTS_URLMAXSIZE", preserving the existing cap-at-1024 behavior; the 2x buffer means a write only aborts where it would previously have overrun. Add htsbuff_catc for the single-character appends ('%', '.', literal copy). Removes 35 pointer-destination warnings (htsname.c 51 -> 9; the renderer is now warning-free). Behavior verified identical: the pre-change and new binaries produce byte-identical output across 14 -N templates (%n %N %t %p %h %H %M %q %r %% %[param], the short %s variants, and literals) crawling a local site. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 12:59:29 +02:00
Xavier Roche	29a07ff487	Merge pull request #334 from xroche/cleanup/git-format-hook Add an opt-in pre-commit hook that auto-formats changed C lines	2026-06-14 12:58:42 +02:00
Xavier Roche	f987083f14	Add an opt-in pre-commit hook that auto-formats changed C lines Enable with: git config core.hooksPath .githooks The hook runs git-clang-format (clang-format 19, repo .clang-format) on the staged C lines only and re-stages the result, so commits stay clang-format-clean and the CI format check passes without a round-trip. It never reformats the whole tree, only the lines a commit changes. Safe by construction: if clang-format 19 is absent it skips (CI still enforces); and if a file has both staged and unstaged changes it does not auto-mutate (which would commit the unstaged part), it reports and asks the author to stage/stash. HTTRACK_NO_AUTOFORMAT=1 skips it for one commit. README covers the noexec-working-tree case (point core.hooksPath at an exec-fs copy). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 12:55:17 +02:00
Xavier Roche	eb565f0bd8	Merge pull request #333 from xroche/cleanup/clang-format-setup Add a .clang-format and a changed-lines CI format check	2026-06-14 12:38:20 +02:00
Xavier Roche	71398d510e	Add a .clang-format and a changed-lines CI format check The engine predates clang-format (it was shaped by an old Visual Studio formatter) and does not round-trip through it: a whole-tree reformat is ~25k lines of churn, so we never do one. Instead we format only the lines a change touches, via git-clang-format, and enforce that in CI diff-scoped. .clang-format is reverse-engineered from src/*.c (2-space, no tabs, 80 cols, char *x pointers, attached braces, un-indented case labels, space after C-style casts). That is mostly LLVM defaults; the deliberate deviations are SpaceAfterCStyleCast (the dominant "(int) x" form) and SortIncludes: false (C include order can be significant, so never reorder). The CI "format" job pins clang-format-19 from apt.llvm.org's noble channel (ubuntu-24.04's native is 18) to match local dev, and fails only if a PR's changed C lines are not clang-format-clean. Existing untouched code is left alone. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 12:26:49 +02:00
Xavier Roche	75fc040f06	Merge pull request #331 from xroche/cleanup/htsbuff-builder Add htsbuff: a bounded string builder over a fixed buffer	2026-06-14 10:40:23 +02:00
Xavier Roche	c4ef18f5a5	Add htsbuff: a bounded string builder over a fixed buffer Many pointer-destination buff() sites are cursors walking a buffer of known capacity, with a manual "p += strlen(p)" after each write (the url_savename renderer does this ~40 times). That hand-rolled pointer math is where several of the off-by-one hazards live. htsbuff captures the pattern: a non-owning builder (buf/cap/len) built from an in-scope array (htsbuff_array, capacity via sizeof) or a pointer of known size (htsbuff_ptr). htsbuff_cat/catn/cpy bound every write against the real capacity and abort on overflow, same contract as the *_safe_ helpers, so the pointer math goes away. Extend the -#8 self-test and tests/01_engine-strsafe.test with builder correctness (append, truncating append, reset, length) and an overflow-abort case. No call sites are converted yet; that follows per file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 10:38:22 +02:00
Xavier Roche	d76dad47f7	Merge pull request #330 from xroche/cleanup/htssafe-pointer-diagnostics Flag unchecked pointer-destination uses of the buff() string macros	2026-06-14 08:49:26 +02:00
Xavier Roche	9c6ff54040	Bound catch_url() header buffer to its 32Kb contract First consumer of the new buff() pointer-destination diagnostic. catch_url() appended response headers into the caller's 'data' buffer with strcatbuff on a char* destination, which is unchecked: a long header stream could overrun the 32Kb buffer. Make the capacity contract explicit (CATCH_URL_DATA_SIZE in htscatchurl.h, used by the caller too) and append with strlcatbuff, which enforces the bound and aborts rather than overflowing. htscatchurl.c now compiles warning-free under the diagnostic. The remaining raw sprintf/sscanf into the same buffer are separate items for a later pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:46:03 +02:00
Xavier Roche	4a057514b9	Warn on unchecked pointer-destination uses of the buff() macros strcpybuff/strcatbuff/strncatbuff only bounds-check when the destination is a sized char[] array. For a bare char* the capacity is unknown, so the macro silently falls back to plain strcpy/strcat/strncat while still looking like a checked call. On GCC/Clang, route the pointer case through __builtin_choose_expr() to a stub carrying the 'warning' function attribute, so a compile-time warning fires only at pointer-destination sites and points at the explicit-size replacement (strlcpybuff/strlcatbuff). Array sites keep using the bounded _safe_ helpers and stay quiet. The change is diagnostic only: no runtime or ABI change, and other compilers keep the previous behavior. Add a runtime self-test for the bounded ops behind a new -#8 debug mode, plus tests/01_engine-strsafe.test covering both correct copies and the abort-on-overflow guarantee. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:40:10 +02:00
Xavier Roche	055e17b057	Merge pull request #328 from xroche/cli/header-ua-length-152 Raise the user-agent and custom-header length limits	2026-06-14 01:43:31 +02:00
Xavier Roche	d7bb97d697	Merge pull request #329 from xroche/parser/lock-background-image-237 Lock CSS background-image url() rewriting in the parser test	2026-06-14 01:37:51 +02:00
Xavier Roche	d741188980	Raise the user-agent and custom-header length limits The -F user-agent value was rejected past 126 bytes and the -%X header line past 256. Both are stored in dynamically grown String buffers, so the caps were arbitrary. Drop them; every argument is still bounded by the general per-argument check in htscoremain.c (HTS_CDLMAXSIZE), which lifts the usable limit to just under 1 KB. optalias_check copied a long-form option value (--user-agent, --headers, ...) into a fixed 1000-byte scratch buffer, smaller than that general cap, so a value of 1000..1023 bytes aborted the process through the guarded-copy overflow check. Size command and param to HTS_CDLMAXSIZE so the long form matches the cap; an over-cap value is now refused with the normal "argument too long" message instead of crashing. Grow the request-head buffer to 16384 for the larger aggregate header set. closes #152	2026-06-14 01:32:07 +02:00
Xavier Roche	ca810ef7e3	Lock CSS background-image url() rewriting in the parser test background-image is already captured and rewritten through the style/CSS url() path, in both an external <style> block and an inline style attribute, with the URL unquoted, double-quoted or single-quoted. Extend the offline parser test to cover all of these so the behavior stays locked. closes #237	2026-06-14 01:07:42 +02:00
Xavier Roche	1bf90ce294	Merge pull request #326 from xroche/parser/srcset-candidates Capture every srcset candidate URL on <img> and <source>	2026-06-14 00:42:48 +02:00
Xavier Roche	583817dcd4	Capture every srcset candidate URL on <img> and <source> A srcset value is a comma-separated list of "URL descriptor" entries (480w, 2x). HTTrack only had "data-srcset" in the link-detection table and left the plain "srcset" attribute untouched, so responsive images were never mirrored. The parser now captures and rewrites each candidate URL in turn, preserving the descriptors and the commas between entries verbatim, and bounds every new buffer scan against the page end. Candidate splitting follows the WHATWG srcset algorithm: the URL is a run of non-whitespace characters, so a comma inside a URL (a data: URI, a CDN transform path like w_300,c_fill) stays part of the URL and is not mis-split; only a trailing comma or a comma after the descriptor separates candidates. Adds tests/01_engine-parse.test, an offline file:// parser test that asserts each candidate is queued and rewritten (including the comma-in-URL cases), and also locks the existing xlink:href (#298) and inline background-image (#237) handling. closes #235 closes #236	2026-06-14 00:37:20 +02:00
Xavier Roche	5351e96d71	Merge pull request #325 from xroche/docs/rfc2606-example-domains docs: use www.example.com in examples; add html manual regen target	2026-06-13 10:41:24 +02:00
Xavier Roche	9d39a57576	build: add regen target for html/httrack.man.html The rendered HTML manual had no regeneration path. Add regen-man-html, which runs groff's html device over httrack.1, alongside the existing regen-man target.	2026-06-13 10:38:31 +02:00
Xavier Roche	e3d4ec01f7	docs: use www.example.com in examples instead of www.someweb.com someweb.com is a real registrable domain; example.com is reserved for documentation (RFC 2606). Replace it across the HTML guides, the CLI --help text (htshelp.c) and code comments, then regenerate man/httrack.1 and the rendered html/httrack.man.html. Other placeholder domains are left alone: they appear inside filter/wildcard examples where the host interacts with the pattern.	2026-06-13 10:38:31 +02:00
Xavier Roche	a0bf50f6b1	Merge pull request #324 from xroche/test/filter-escape-characterize test: characterize wildcard class escape behavior	2026-06-13 10:17:24 +02:00
Xavier Roche	794404bba2	test: characterize wildcard class escape behavior Add -#0 self-test cases for backslash escapes inside a '[...]' class. They pin two quirks of the current decoder: '\X' matches both X and the backslash itself, and a literal ']' cannot be a class member because the parser stops at the first ']' (escaped or not). The latter is why the filter guide's '[\[\]]' = "the [ or ] character" claim is wrong (#148): it parses as the class {[,\} plus a trailing literal ']'. These tests lock the behavior down so a later matcher fix is a deliberate change. refs #148	2026-06-13 10:15:45 +02:00
Xavier Roche	82d08aaeaf	Merge pull request #323 from xroche/fix/doc-lang-nits docs: fix help-guide placeholders, README clone flag, Ukrainian charset	2026-06-13 10:12:09 +02:00
Xavier Roche	459f06e758	docs: fix help-guide placeholders, README clone flag, Ukrainian charset Escape the literal <URLs>, <FILTERs>, <param>, <filter>, <file> and related placeholders in fcguide.html so they render instead of being swallowed as unknown HTML tags; several were also missing their closing '>'. Use --recurse-submodules in the README clone command. Relabel lang/Ukrainian.txt as windows-1251, which is what its bytes actually are (ISO-8859-5 decodes them to garbage). closes #132, closes #103, closes #167	2026-06-13 10:05:40 +02:00
Xavier Roche	89b25e418b	Merge pull request #322 from xroche/test/expand-engine-coverage test: expand offline engine self-test coverage	2026-06-13 09:58:03 +02:00
Xavier Roche	017c634c53	Merge pull request #321 from xroche/fix/mutex-init-race-297 Fix race in lazy mutex initialization	2026-06-13 09:18:39 +02:00
Xavier Roche	f2b36c4b29	Merge pull request #320 from xroche/fix/lockpath-overflow-183 Fix abort on long log path (lock-file buffer too small)	2026-06-13 09:18:10 +02:00
Xavier Roche	19947efd74	Merge pull request #319 from xroche/fix/footer-xss-165 Fix XSS via unescaped URL in the page footer comment	2026-06-13 09:18:02 +02:00
Xavier Roche	de26ad881a	fix: synchronize lazy mutex initialization (closes #297) Two threads locking the same mutex for the first time could both run the unsynchronized lazy init, corrupting the underlying pthread mutex and aborting or deadlocking. Build the object and publish it with a single atomic compare-and-swap; threads that lose the race free the object they built. This needs no statically-initializable guard, so it stays valid on Windows 2000.	2026-06-13 09:15:31 +02:00
Xavier Roche	106d34d82c	fix: size the lock-file path buffer to the concat buffer (closes #183) A long log path made the lock-file path overflow the fixed 256-byte n_lock buffer, tripping the guarded copy and aborting with signal 6. Size n_lock to the concat-buffer capacity so it holds any path fconcat can produce. (cherry picked from commit 15144ffd24667712cca2ac0fee96bd355239eff6)	2026-06-12 23:24:20 +02:00
Xavier Roche	61e0b3250b	fix: escape angle brackets in the page footer URL (closes #165) The default footer embeds the page URL inside an HTML comment. A URL containing "-->" closed the comment and let an attacker inject script into the mirrored page. Percent-encode < and > before the URL reaches the footer. (cherry picked from commit 606883229244dc233d16915678e63cfa62000ff0)	2026-06-12 23:24:20 +02:00