Curate the 3.49-8 release notes

Round out the 3.49-8 entry in history.txt and the debian changelog with the user-facing work landed since 3.49-7: the HTTPS-proxy CONNECT tunnel, wider srcset parsing, the crawler and parser fixes (CSS @import, xmlns, relative paths, RFC 6265 cookies, doit.log reload), the parser and engine buffer-copy security hardening, and brief summary lines for the API, build, CI and test work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
Merge pull request #403 from xroche/chore/clang-format-separate-defs
2026-06-21 17:49:05 +03:00 · 2026-06-20 13:02:51 +02:00 · 2026-06-20 12:56:23 +02:00 · 2026-06-20 12:52:19 +02:00 · 2026-06-20 12:42:19 +02:00 · 2026-06-20 12:39:31 +02:00
43 changed files with 1700 additions and 1146 deletions
--- a/.clang-format
+++ b/.clang-format
@@ -16,6 +16,7 @@ BasedOnStyle: LLVM
 SpaceAfterCStyleCast: true   # "(int) x", overwhelmingly dominant (542 vs 7)
 SortIncludes: false          # C include order can be significant; never reorder
 IncludeBlocks: Preserve      # do not merge/reflow include groups
+SeparateDefinitionBlocks: Always  # blank line between definitions (readability)
 # Stated explicitly for robustness against base-style drift (these match LLVM):
 IndentWidth: 2
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -320,6 +320,21 @@ jobs:
  lint:
    name: lint (shellcheck, shfmt)
    runs-on: ubuntu-24.04
+    # Every tracked shell script; the globs expand at run time. Kept here so the
+    # shellcheck and shfmt steps below cannot drift apart.
+    env:
+      SHELL_SCRIPTS: >-
+        .githooks/pre-commit
+        bootstrap
+        build.sh
+        html/div/search.sh
+        man/makeman.sh
+        src/htsbasiccharsets.sh
+        src/htsentities.sh
+        src/webhttrack
+        tests/*.sh
+        tests/*.test
+        tools/mkdeb.sh
    steps:
      - uses: actions/checkout@v6
@@ -332,12 +347,11 @@ jobs:
          sudo apt-get install -y --no-install-recommends shellcheck shfmt
          shfmt --version
      # Lint the scripts we maintain; the legacy scripts are a separate cleanup.
      - name: shellcheck
-        run: shellcheck man/makeman.sh tools/mkdeb.sh .githooks/pre-commit tests/*.test tests/check-network.sh
+        run: shellcheck $SHELL_SCRIPTS
      - name: shfmt
-        run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh .githooks/pre-commit
+        run: shfmt -d -i 4 $SHELL_SCRIPTS
  # Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
  # (it was shaped by an old Visual Studio formatter) and does not round-trip,
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,6 +1,9 @@
 httrack (3.49.8-1) unstable; urgency=medium
-  * New upstream release.
+  * New upstream release: HTTPS-proxy CONNECT tunnelling and wider srcset
+    parsing, a batch of crawler and parser fixes (CSS @import, xmlns
+    namespaces, relative paths, RFC 6265 cookies), and security hardening of
+    the parser and of buffer copies throughout the engine.
  * Drop the OpenSSL linking exception from the license: OpenSSL 3.0+ is
    Apache-2.0 and GPL-compatible, so it is no longer needed. httrack is now
    plain GPL-3.0-or-later. Updated debian/copyright accordingly.
@@ -14,7 +17,7 @@ httrack (3.49.8-1) unstable; urgency=medium
    the QA debcheck page. Depend on firefox-esr | chromium | www-browser
    instead.
- -- Xavier Roche <xavier@debian.org>  Sun, 07 Jun 2026 14:29:24 +0200
+ -- Xavier Roche <xavier@debian.org>  Sat, 20 Jun 2026 13:02:08 +0200
 httrack (3.49.7-2) unstable; urgency=medium
--- a/history.txt
+++ b/history.txt
@@ -5,12 +5,31 @@ HTTrack Website Copier release history:
 This file lists all changes and fixes that have been made for HTTrack
 3.49-8
 + New: tunnel HTTPS downloads through the configured HTTP proxy via CONNECT (#85)
 + New: parse every candidate URL in <img> and <source> srcset lists (#326)
 + Changed: dropped the obsolete OpenSSL linking exception (OpenSSL 3.0+ is Apache-2.0 and GPL-compatible); httrack is now plain GPLv3-or-later
-+ Fixed: link libhtsjava and the libtest examples directly against libc
+ Fixed: several out-of-bounds reads in the HTML/CSS parser on hostile input (#94, #396)
 + Fixed: stored XSS via an unescaped URL in the generated page footer (#165)
 + Fixed: hardened buffer copies throughout the engine against overflow
 + Fixed: capture conditional CSS @import URLs (#94)
 + Fixed: don't crawl xmlns namespace declarations as links (#191)
 + Fixed: don't mistake the method argument of XMLHttpRequest.open for a URL (#218)
 + Fixed: percent-encode parentheses when rewriting CSS url() targets (#163)
 + Fixed: collapse ../ in file:// URLs and widen relative-link handling (#137, #162)
 + Fixed: drop the obsolete $Version/$Path attributes from the request Cookie header, per RFC 6265 (#151)
 + Fixed: keep empty quoted arguments when reloading doit.log for --update/--continue (#106)
 + Fixed: raise the User-Agent and custom-header length limits (#152)
 + Fixed: abort on a long log path (lock-file buffer too small) (#183)
 + Fixed: race in lazy mutex initialization (#297)
 + Fixed: sub-second mtime precision when comparing local files on POSIX (#383)
 + Fixed: modernize OpenSSL TLS initialization for the 3.x to 4.x transition (#308)
 + Fixed: in-place changes made by the postprocess callback were not applied (Roman Sęk)
 + Fixed: "preffered" typo in the help text and man page (yosinn1-blip)
 + Fixed: corrections and updates of the Russian translation (German Aizek)
 + Fixed: corrections and updates of the Danish translation (scootergrisen)
 + Fixed: link libhtsjava and the libtest examples directly against libc
 + New: documented the public library API headers and typed the option fields as named enums
 + Fixed: numerous build, packaging, CI and test-coverage improvements (out-of-tree builds, sanitizer/distcheck CI, shell and Python linting, AppStream metainfo)
 3.49-7
 + Fixed: keep generated config.h architecture-independent (Debian #1133728)
--- a/html/div/search.sh
+++ b/html/div/search.sh
@@ -1,4 +1,3 @@
 #!/bin/sh
 # Simple indexing test using HTTrack
@@ -18,22 +17,22 @@ if ! test -f "index.txt"; then
 fi
 # Convert crlf to lf
-if test "`head index.txt -n 1 | tr '\r' '#' | grep -c '#'`" = "1"; then
+if test "$(head index.txt -n 1 | tr '\r' '#' | grep -c '#')" = "1"; then
    echo "Converting index to Unix LF style (not CR/LF) .."
    mv -f index.txt index.txt.old
-	cat index.txt.old|tr -d '\r' > index.txt
+    tr -d '\r' <index.txt.old >index.txt
 fi
 keyword=-
 while test -n "$keyword"; do
    printf "Enter a keyword: "
-	read keyword
+    read -r keyword
    if test -n "$keyword"; then
-		FOUNDK="`grep -niE \"^$keyword\" index.txt`"
+        FOUNDK="$(grep -niE "^$keyword" index.txt)"
        if test -n "$FOUNDK"; then
-			if ! test `echo "$FOUNDK"|wc -l` = "1"; then
+            if ! test "$(echo "$FOUNDK" | wc -l)" = "1"; then
                # Multiple matches
                printf "Found multiple keywords: "
                echo "$FOUNDK" | cut -f2 -d':' | tr '\n' ' '
@@ -41,12 +40,12 @@ while test -n "$keyword"; do
                echo "Use keyword$ to find only one"
            else
                # One match
-				N=`echo "$FOUNDK"|cut -f1 -d':'`
+                N=$(echo "$FOUNDK" | cut -f1 -d':')
-				PM=`tail +$N index.txt|grep -nE "\("|head -n 1`
+                PM=$(tail "+$N" index.txt | grep -nE "\(" | head -n 1)
                if ! echo "$PM" | grep "ignored" >/dev/null; then
-					M=`echo $PM|cut -f1 -d':'`
+                    M=$(echo "$PM" | cut -f1 -d':')
                    echo "Found in:"
-					cat index.txt | tail "+$N" | head -n "$M" | grep -E "[0-9]* " | cut -f2 -d' '
+                    tail "+$N" index.txt | head -n "$M" | grep -E "[0-9]* " | cut -f2 -d' '
                else
                    echo "keyword ignored (too many hits)"
                fi
@@ -57,4 +56,3 @@ while test -n "$keyword"; do
    fi
 done
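Reviewer note on the `read -r` change above: a minimal sketch, not taken from the repository, of what the flag changes. The two helper functions are hypothetical. In search.sh the keyword ends up inside a `grep -E` pattern, so a typed backslash escape such as `\.` only survives when `-r` is present.

```shell
#!/bin/sh
# Hypothetical helpers, not part of search.sh: each reads one line from stdin
# into a variable and prints it back.

# With -r, as in the updated script: backslashes are kept literally.
read_keyword() {
    read -r keyword
    printf '%s\n' "$keyword"
}

# Without -r, as in the old script: a backslash escapes the next character
# and is then removed.
read_keyword_legacy() {
    # shellcheck disable=SC2162 # -r is omitted on purpose for the comparison
    read keyword
    printf '%s\n' "$keyword"
}

printf '%s\n' 'foo\.bar' | read_keyword        # prints foo\.bar
printf '%s\n' 'foo\.bar' | read_keyword_legacy # prints foo.bar
```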
--- a/src/htsarrays.h
+++ b/src/htsarrays.h
@@ -48,9 +48,8 @@ Please visit our Website: http://www.httrack.com
 /* Abort (with the failed byte count) when a growth allocation fails. The
   array macros never return an out-of-memory error; they assert and abort. */
 static void hts_record_assert_memory_failed(const size_t size) {
-  fprintf(stderr, "memory allocation failed (%lu bytes)", \
+  fprintf(stderr, "memory allocation failed (%lu bytes)", (long int) size);
-          (long int) size); \
+  assertf(!"memory allocation failed");
-  assertf(! "memory allocation failed"); \
 }
 /** Dynamic array of T elements. **/
@@ -109,20 +108,22 @@ static void hts_record_assert_memory_failed(const size_t size) {
 * After a call to this macro, TypedArrayRoom(A) is guaranteed to be at
 * least equal to 'ROOM'.
 **/
-#define TypedArrayEnsureRoom(A, ROOM) do { \
+#define TypedArrayEnsureRoom(A, ROOM)                                          \
+  do {                                                                         \
    const size_t room_ = (ROOM);                                               \
    while (TypedArrayRoom(A) < room_) {                                        \
      TypedArrayCapa(A) = TypedArrayCapa(A) < 16 ? 16 : TypedArrayCapa(A) * 2; \
    }                                                                          \
-  TypedArrayPtr(A) = realloc(TypedArrayPtr(A), \
+    TypedArrayPtr(A) =                                                         \
-                             TypedArrayCapa(A)*TypedArrayWidth(A)); \
+        realloc(TypedArrayPtr(A), TypedArrayCapa(A) * TypedArrayWidth(A));     \
    if (TypedArrayPtr(A) == NULL) {                                            \
      hts_record_assert_memory_failed(TypedArrayCapa(A) * TypedArrayWidth(A)); \
    }                                                                          \
  } while (0)
 /** Add an element. Macro, first element evaluated multiple times. **/
-#define TypedArrayAdd(A, E) do { \
+#define TypedArrayAdd(A, E)                                                    \
+  do {                                                                         \
    TypedArrayEnsureRoom(A, 1);                                                \
    assertf(TypedArraySize(A) < TypedArrayCapa(A));                            \
    TypedArrayTail(A) = (E);                                                   \
@@ -133,7 +134,8 @@ static void hts_record_assert_memory_failed(const size_t size) {
 * Add 'COUNT' elements from 'PTR'.
 * Macro, first element evaluated multiple times.
 **/
-#define TypedArrayAppend(A, PTR, COUNT) do { \
+#define TypedArrayAppend(A, PTR, COUNT)                                        \
+  do {                                                                         \
    const size_t count_ = (COUNT);                                             \
    /* This 1-case is to benefit from type safety. */                          \
    if (count_ == 1) {                                                         \
@@ -148,7 +150,8 @@ static void hts_record_assert_memory_failed(const size_t size) {
  } while (0)
 /** Clear an array, freeing memory and clearing size and capacity. **/
-#define TypedArrayFree(A) do { \
+#define TypedArrayFree(A)                                                      \
+  do {                                                                         \
    if (TypedArrayPtr(A) != NULL) {                                            \
      TypedArrayCapa(A) = TypedArraySize(A) = 0;                               \
      free(TypedArrayPtr(A));                                                  \
--- a/src/htsbasenet.h
+++ b/src/htsbasenet.h
@@ -49,9 +49,10 @@ Please visit our Website: http://www.httrack.com
 #define WIN32_LEAN_AND_MEAN
 // KB955045 (http://support.microsoft.com/kb/955045)
 // To execute an application using this function on earlier versions of Windows
-// (Windows 2000, Windows NT, and Windows Me/98/95), then it is mandatary to #include Ws2tcpip.h
+// (Windows 2000, Windows NT, and Windows Me/98/95), then it is mandatary to
-// and also Wspiapi.h. When the Wspiapi.h header file is included, the 'getaddrinfo' function is
+// #include Ws2tcpip.h and also Wspiapi.h. When the Wspiapi.h header file is
-// #defined to the 'WspiapiGetAddrInfo' inline function in Wspiapi.h. 
+// included, the 'getaddrinfo' function is #defined to the 'WspiapiGetAddrInfo'
+// inline function in Wspiapi.h.
 #include <ws2tcpip.h>
 #include <Wspiapi.h>
 // #include <winsock2.h>
--- a/src/htsbasiccharsets.sh
+++ b/src/htsbasiccharsets.sh
@@ -13,14 +13,14 @@ rm -f CP932.TXT CP936.TXT CP949.TXT CP950.TXT
 fi
 # Produce code
-printf "/** GENERATED FILE ($0), DO NOT EDIT **/\n\n"
+printf '/** GENERATED FILE (%s), DO NOT EDIT **/\n\n' "$0"
 for i in *.TXT; do
    echo "processing $i" >&2
-  grep -vE "^(#|$)" $i | grep -E "^0x" | sed -e 's/[[:space:]]/ /g' | cut -f1,2 -d' ' | \
+    grep -vE "^(#|$)" "$i" | grep -E "^0x" | sed -e 's/[[:space:]]/ /g' | cut -f1,2 -d' ' |
        (
            unset arr
-    while read LINE ; do
+            while read -r LINE; do
-      from=$[$(echo $LINE | cut -f1 -d' ')]
+                from=$(($(echo "$LINE" | cut -f1 -d' ')))
                if ! test -n "$from"; then
                    echo "error with $i" >&2
                    exit 1
@@ -28,22 +28,23 @@ for i in *.TXT ; do
                    echo "out-of-range ($LINE) with $i" >&2
                    exit 1
                fi
-      to=$(echo $LINE | cut -f2 -d' ') 
+                to=$(echo "$LINE" | cut -f2 -d' ')
-      arr[$from]=$to
+                arr[from]=$to
            done
-    name=$(echo $i | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+            # shellcheck disable=SC2018,SC2019 # charset filenames are ASCII; keep C-locale A-Z/a-z
-    printf "/* Table for $i */\nstatic const hts_UCS4 table_${name}[256] = {\n  "
+            name=$(echo "$i" | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
-    i=0
+            printf '/* Table for %s */\nstatic const hts_UCS4 table_%s[256] = {\n  ' "$i" "$name"
-    while test "$i" -lt 256; do
+            idx=0
-      if test "$i" -gt 0; then
+            while test "$idx" -lt 256; do
+                if test "$idx" -gt 0; then
                    printf ", "
-        if test $[${i}%8] -eq 0; then
+                    if test $((idx % 8)) -eq 0; then
                        printf "\n  "
                    fi
                fi
-      value=${arr[$i]:-0}
+                value=${arr[$idx]:-0}
-      printf "0x%04x" $value
+                printf "0x%04x" "$value"
-      i=$[${i}+1]
+                idx=$((idx + 1))
            done
            printf " };\n\n"
        )
@@ -53,7 +54,8 @@ done
 # Indexes
 printf "static const struct {\n  const char *name;\n  const hts_UCS4 *table;\n} table_mappings[] = {\n"
 for i in *.TXT; do
-  name=$(echo $i | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+    # shellcheck disable=SC2018,SC2019 # charset filenames are ASCII; keep C-locale A-Z/a-z
-  printf "  { \"$(echo $name | tr -d '_')\", table_${name} },\n"
+    name=$(echo "$i" | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+    printf '  { "%s", table_%s },\n' "$(echo "$name" | tr -d '_')" "$name"
 done
 printf "  { NULL, NULL }\n};\n"
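Reviewer note on the htsbasiccharsets.sh hunks above: a small sketch, under stated assumptions, of the two idioms they introduce. Both helper names and the sample file name are hypothetical. The first passes data to `printf` as an argument instead of splicing it into the format string. The second uses POSIX `$((...))` arithmetic where the old code used the bash-only `$[...]` form.

```shell
#!/bin/sh
# Hypothetical helper, not part of htsbasiccharsets.sh: emit a table header
# with the file name passed as data, never as part of the format string.
table_banner() {
    printf '/* Table for %s */\n' "$1"
}

# Hypothetical helper: print COUNT entries as 0xNNNN, eight per row, using
# POSIX $((...)) arithmetic for the row break and the counter.
emit_entries() {
    idx=0
    while test "$idx" -lt "$1"; do
        if test "$idx" -gt 0; then
            printf ", "
            if test $((idx % 8)) -eq 0; then
                printf "\n  "
            fi
        fi
        printf "0x%04x" "$idx"
        idx=$((idx + 1))
    done
    printf "\n"
}

table_banner '100%done.TXT' # the '%' in the name is printed verbatim
emit_entries 3              # prints 0x0000, 0x0001, 0x0002
```

With the old interpolated form, a `%` in the file name would have been read as a conversion specifier.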
--- a/src/htsbauth.h
+++ b/src/htsbauth.h
@@ -71,7 +71,8 @@ struct t_cookie {
 int cookie_add(t_cookie *cookie, const char *cook_name, const char *cook_value,
               const char *domain, const char *path);
-int cookie_del(t_cookie * cookie, const char *cook_name, const char *domain, const char *path);
+int cookie_del(t_cookie *cookie, const char *cook_name, const char *domain,
+               const char *path);
 int cookie_load(t_cookie *cookie, const char *path, const char *name);
@@ -83,7 +84,8 @@ void cookie_delete(char *s, size_t s_size, size_t pos);
 const char *cookie_get(char *buffer, const char *cookie_base, int param);
-char *cookie_find(char *s, const char *cook_name, const char *domain, const char *path);
+char *cookie_find(char *s, const char *cook_name, const char *domain,
+                  const char *path);
 char *cookie_nextfield(char *a);
@@ -92,7 +94,8 @@ char *cookie_nextfield(char *a);
 /** Register credentials (auth = base-64 user:pass) for the prefix derived from
    adr (host) and fil (path). No-op returning 0 if cookie is NULL, allocation
    fails, or a matching prefix is already stored; returns 1 on insertion. */
-int bauth_add(t_cookie * cookie, const char *adr, const char *fil, const char *auth);
+int bauth_add(t_cookie *cookie, const char *adr, const char *fil,
+              const char *auth);
 /** Return the stored base-64 credentials whose prefix matches adr+fil, or NULL
    if none (or cookie is NULL). Returned pointer aliases the jar's bauth_chain;
--- a/src/htsconfig.h
+++ b/src/htsconfig.h
@@ -87,7 +87,8 @@ Please visit our Website: http://www.httrack.com
 // fast cache (build hash table)
 #define HTS_FAST_CACHE 1
-// le > peut être considéré comme un tag de fermeture de commentaire (<!-- > est valide)
+// le > peut être considéré comme un tag de fermeture de commentaire (<!-- > est
+// valide)
 #define GT_ENDS_COMMENT 1
 // always adds a '/' at the end if a '~' is encountered (/~smith -> /~smith/)
@@ -97,7 +98,8 @@ Please visit our Website: http://www.httrack.com
 #define HTS_STRIP_DOUBLE_SLASH 0
 // case-sensitive pour les dossiers et fichiers (0/1)
-// [normalement 1, mais pose des problèmes (url malformée par exemple) et n'est pas très utile..
+// [normalement 1, mais pose des problèmes (url malformée par exemple) et n'est
+// pas très utile..
 // ..et pas bcp respecté]
 // REMOVED
 // #define HTS_CASSE 0
--- a/src/htscoremain.c
+++ b/src/htscoremain.c
@@ -2787,6 +2787,47 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                  return 0;
                }
                break;
+              case 'l': /* lienrelatif: relative link from curr_fil to link */
+                if (na + 2 >= argc) {
+                  HTS_PANIC_PRINTF(
+                      "Option #l needs a link and a current-file path");
+                  printf(
+                      "Example: '-#l' 'host/dir/img.gif' 'host/dir/p.html'\n");
+                  htsmain_free();
+                  return -1;
+                } else {
+                  char s[HTS_URLMAXSIZE * 2];
+                  if (lienrelatif(s, sizeof(s), argv[na + 1], argv[na + 2]) ==
+                      0)
+                    printf("relative=%s\n", s);
+                  else
+                    printf("relative=<ERROR>\n");
+                  htsmain_free();
+                  return 0;
+                }
+                break;
+              case 'i': /* ident_url_relatif: resolve a link -> adr/fil */
+                if (na + 3 >= argc) {
+                  HTS_PANIC_PRINTF(
+                      "Option #i needs a link, an origin address and file");
+                  printf("Example: '-#i' '../img.gif' 'www.foo.com' "
+                         "'/d/p.html'\n");
+                  htsmain_free();
+                  return -1;
+                } else {
+                  lien_adrfil af;
+                  const int r = ident_url_relatif(argv[na + 1], argv[na + 2],
+                                                  argv[na + 3], &af);
+                  if (r == 0)
+                    printf("adr=%s fil=%s\n", af.adr, af.fil);
+                  else
+                    printf("error=%d\n", r);
+                  htsmain_free();
+                  return 0;
+                }
+                break;
              case '2':        // mimedefs
                if (na + 1 >= argc) {
                  HTS_PANIC_PRINTF("Option #2 needs to be followed by an URL");
--- a/src/htsdefines.h
+++ b/src/htsdefines.h
@@ -109,8 +109,8 @@ typedef int (*t_hts_htmlcheck_chopt) (t_hts_callbackarg * carg, httrackp * opt);
 /* Rewrite hook over an in-memory page: the html and len arguments point at the
   buffer and its length (the callback may reallocate and resize it),
   url_adresse and url_fichier name it. */
-typedef int (*t_hts_htmlcheck_process) (t_hts_callbackarg * carg,
+typedef int (*t_hts_htmlcheck_process)(t_hts_callbackarg *carg, httrackp *opt,
-                                        httrackp * opt, char **html, int *len,
+                                       char **html, int *len,
                                       const char *url_adresse,
                                       const char *url_fichier);
@@ -147,9 +147,8 @@ typedef const char *(*t_hts_htmlcheck_query3) (t_hts_callbackarg * carg,
   queue size and running totals, stat_time the elapsed time. */
 typedef int (*t_hts_htmlcheck_loop)(t_hts_callbackarg *carg, httrackp *opt,
                                    lien_back *back, int back_max,
-                                     int back_index, int lien_tot,
+                                    int back_index, int lien_tot, int lien_ntot,
-                                     int lien_ntot, int stat_time,
+                                    int stat_time, hts_stat_struct *stats);
-                                     hts_stat_struct * stats);
 /* Veto a link (adr host, fil path) after its transfer; status is the result.
   Return 0 to drop the link. */
@@ -168,8 +167,8 @@ typedef void (*t_hts_htmlcheck_pause) (t_hts_callbackarg * carg, httrackp * opt,
                                      const char *lockfile);
 /* Fired after a file is written to disk; 'file' is the local path. */
-typedef void (*t_hts_htmlcheck_filesave) (t_hts_callbackarg * carg,
+typedef void (*t_hts_htmlcheck_filesave)(t_hts_callbackarg *carg, httrackp *opt,
-                                          httrackp * opt, const char *file);
+                                         const char *file);
 /* Richer file-saved notification: source host/filename, local path, and flags
   telling whether the file is new, modified, or left unchanged. */
@@ -189,13 +188,12 @@ typedef int (*t_hts_htmlcheck_linkdetected2) (t_hts_callbackarg * carg,
                                             const char *tag_start);
 /* Fired on each transfer-status change of slot 'back'. */
-typedef int (*t_hts_htmlcheck_xfrstatus) (t_hts_callbackarg * carg,
+typedef int (*t_hts_htmlcheck_xfrstatus)(t_hts_callbackarg *carg, httrackp *opt,
-                                          httrackp * opt, lien_back * back);
+                                         lien_back *back);
 /* Choose the local save path for a URL; write it into 'save'. adr/fil name the
   target, referer_adr/referer_fil the page that linked it. */
-typedef int (*t_hts_htmlcheck_savename) (t_hts_callbackarg * carg,
+typedef int (*t_hts_htmlcheck_savename)(t_hts_callbackarg *carg, httrackp *opt,
-                                         httrackp * opt,
                                        const char *adr_complete,
                                        const char *fil_complete,
                                        const char *referer_adr,
@@ -206,9 +204,9 @@ typedef t_hts_htmlcheck_savename t_hts_htmlcheck_extsavename;
 /* Inspect or edit the outgoing request headers in 'buff' before they are sent.
 */
-typedef int (*t_hts_htmlcheck_sendhead) (t_hts_callbackarg * carg,
+typedef int (*t_hts_htmlcheck_sendhead)(t_hts_callbackarg *carg, httrackp *opt,
-                                         httrackp * opt, char *buff,
+                                        char *buff, const char *adr,
-                                         const char *adr, const char *fil,
+                                        const char *fil,
                                        const char *referer_adr,
                                        const char *referer_fil,
                                        htsblk *outgoing);
--- a/src/htsentities.sh
+++ b/src/htsentities.sh
@@ -33,14 +33,14 @@ EOF
        else
            GET "${url}"
        fi
-    ) \
+    ) |
-        | grep -E '^<!ENTITY [a-zA-Z0-9_]' \
+        grep -E '^<!ENTITY [a-zA-Z0-9_]' |
-        | sed \
+        sed \
            -e 's/<!ENTITY //' -e "s/[[:space:]][[:space:]]*/ /g" \
            -e 's/-->$//' \
-        -e 's/\([^ ]*\) CDATA "&#\([^\"]*\);" -- \(.*\)/\1 \2 \3/'\
+            -e 's/\([^ ]*\) CDATA "&#\([^\"]*\);" -- \(.*\)/\1 \2 \3/' |
-| ( \
+        (
-        read A
+            read -r A
            while test -n "$A"; do
                ent="${A%% *}"
                code=$(echo "$A" | cut -f2 -d' ')
@@ -49,11 +49,11 @@ EOF
                i=0
                a=1664525
                c=1013904223
-            m="$[1 << 32]"
+                m="$((1 << 32))"
                while test "$i" -lt ${#ent}; do
                    d="$(echo -n "${ent:${i}:1}" | hexdump -v -e '/1 "%d"')"
-                hash="$[((${hash}*${a})%(${m})+${d}+${c})%(${m})]"
+                    hash="$((((hash * a) % (m) + d + c) % (m)))"
-                i=$[${i}+1]
+                    i=$((i + 1))
                done
                echo -e "    /* $A */"
                echo -e "  case ${hash}u:"
@@ -63,7 +63,7 @@ EOF
                echo -e "    break;"
                # next
-            read A
+                read -r A
            done
        )
    cat <<EOF
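Reviewer note on the hash loop in the htsentities.sh hunk above: a POSIX sh sketch of the same recurrence, `hash = ((hash * a) % m + d + c) % m`, with the constants shown in the diff. The function name `entity_hash` is hypothetical. The starting value of `hash` lies outside the hunk, so it is exposed here as an optional SEED argument defaulting to 0, which is an assumption. The script itself relies on the bash-only `${ent:${i}:1}` expansion and `hexdump`; this sketch substitutes POSIX parameter expansion and `printf '%d' "'c"` to obtain the character code.

```shell
#!/bin/sh
# entity_hash NAME [SEED]: fold NAME through the recurrence from the hunk.
# SEED is an assumption: the script's initial hash value is not in the diff.
entity_hash() {
    rest=$1
    hash=${2:-0}
    a=1664525
    c=1013904223
    m=$((1 << 32))
    while test -n "$rest"; do
        ch=${rest%"${rest#?}"}  # first character of the remaining name
        rest=${rest#?}          # drop that character
        d=$(printf '%d' "'$ch") # numeric value of the character
        hash=$((((hash * a) % m + d + c) % m))
    done
    printf '%s\n' "$hash"
}

entity_hash a # with seed 0: 97 + 1013904223 = 1013904320
```

The intermediate product stays below 2^63 because `hash` is always reduced modulo 2^32 before it is multiplied by `a`, so 64-bit shell arithmetic is sufficient.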
--- a/src/htsglobal.h
+++ b/src/htsglobal.h
@@ -226,9 +226,14 @@ Please visit our Website: http://www.httrack.com
 /* Copyright (C) 1998 Xavier Roche and other contributors */
 #define HTTRACK_AFF_AUTHORS "[XR&CO'2014]"
-#define HTS_DEFAULT_FOOTER "<!-- Mirrored from %s%s by HTTrack Website Copier/" HTTRACK_AFF_VERSION " " HTTRACK_AFF_AUTHORS ", %s -->"
+#define HTS_DEFAULT_FOOTER                                                     \
+  "<!-- Mirrored from %s%s by HTTrack Website Copier/" HTTRACK_AFF_VERSION     \
+  " " HTTRACK_AFF_AUTHORS ", %s -->"
 #define HTTRACK_WEB "http://www.httrack.com"
-#define HTS_UPDATE_WEBSITE "http://www.httrack.com/update.php3?Product=HTTrack&Version=" HTTRACK_VERSIONID "&VersionStr=" HTTRACK_VERSION "&Platform=%d&Language=%s"
+#define HTS_UPDATE_WEBSITE                                                     \
+  "http://www.httrack.com/"                                                    \
+  "update.php3?Product=HTTrack&Version=" HTTRACK_VERSIONID                     \
+  "&VersionStr=" HTTRACK_VERSION "&Platform=%d&Language=%s"
 #define H_CRLF "\x0d\x0a"
 #define CRLF "\x0d\x0a"
@@ -247,6 +252,7 @@ Please visit our Website: http://www.httrack.com
   return type stays compatible with the int it replaces. */
 #ifndef HTS_DEF_DEFSTRUCT_hts_boolean
 #define HTS_DEF_DEFSTRUCT_hts_boolean
 typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
 #endif
@@ -278,8 +284,8 @@ typedef enum hts_boolean { HTS_FALSE = 0, HTS_TRUE = 1 } hts_boolean;
 #endif
 #else
 /* See <http://gcc.gnu.org/wiki/Visibility> */
-#if ( ( defined(__GNUC__) && ( __GNUC__ >= 4 ) ) \
+#if ((defined(__GNUC__) && (__GNUC__ >= 4)) ||                                 \
-      || ( defined(HAVE_VISIBILITY) && HAVE_VISIBILITY ) )
+     (defined(HAVE_VISIBILITY) && HAVE_VISIBILITY))
 #define HTSEXT_API __attribute__((visibility("default")))
 #else
@@ -335,8 +341,8 @@ typedef __int64 LLint;
 typedef __int64 TStamp;
 #define LLintP "%I64d"
-#elif (defined(_LP64) || defined(__x86_64__) \
+#elif (defined(_LP64) || defined(__x86_64__) || defined(__powerpc64__) ||      \
-       || defined(__powerpc64__) || defined(__64BIT__))
+       defined(__64BIT__))
 typedef long int LLint;
@@ -405,7 +411,8 @@ typedef int T_SOC;
 #if HTS_ACCESS
 #define HTS_ACCESS_FILE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
-#define HTS_ACCESS_FOLDER (S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH)
+#define HTS_ACCESS_FOLDER                                                      \
+  (S_IRUSR | S_IWUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)
 #else
 #define HTS_ACCESS_FILE (S_IRUSR | S_IWUSR)
@@ -427,7 +434,11 @@ typedef int T_SOC;
 #endif
 /* fflush sur stdout */
-#define io_flush { fflush(stdout); fflush(stdin); }
+#define io_flush                                                               \
+  {                                                                            \
+    fflush(stdout);                                                            \
+    fflush(stdin);                                                             \
+  }
 /* HTSLib */
@@ -524,7 +535,13 @@ static const t_htsboundary htsboundary = 0xDEADBEEF;
 #if _HTS_WIDE
 extern FILE *DEBUG_fp;
-#define DEBUG_W(A)  { if (DEBUG_fp==NULL) DEBUG_fp=fopen("bug.out","wb"); fprintf(DEBUG_fp,":>"A); fflush(DEBUG_fp); }
+#define DEBUG_W(A)                                                             \
+  {                                                                            \
+    if (DEBUG_fp == NULL)                                                      \
+      DEBUG_fp = fopen("bug.out", "wb");                                       \
+    fprintf(DEBUG_fp, ":>" A);                                                 \
+    fflush(DEBUG_fp);                                                          \
+  }
 #undef _
 #define _ ,
 #endif
--- a/src/htslib.c
+++ b/src/htslib.c
@@ -2605,6 +2605,8 @@ int ident_url_absolute(const char *url, lien_adrfil *adrfil) {
    for(i = 0; adrfil->fil[i] != '\0'; i++)
      if (adrfil->fil[i] == '\\')
        adrfil->fil[i] = '/';
+    // collapse ../ like the http branch above (path-traversal safety)
+    fil_simplifie(adrfil->fil);
  }
  // no hostname
--- a/src/htsmodules.h
+++ b/src/htsmodules.h
@@ -92,8 +92,8 @@ struct htsmoduleStruct {
  /* Callbacks */
  t_htsAddLink addLink; /* call this function when links are
-                                   being detected. it if not your responsability to decide
+                           being detected. it if not your responsability to
-                                   if the engine will keep them, or not. */
+                           decide if the engine will keep them, or not. */
  /* Optional */
  char *localLink;   /* if non null, the engine will write there the local
@@ -117,7 +117,6 @@ struct htsmoduleStruct {
  int *ptr_;
  const char *page_charset_;
  /* Internal use - please don't touch */
 };
 #ifdef __cplusplus
--- a/src/htsnet.h
+++ b/src/htsnet.h
@@ -112,8 +112,8 @@ struct SOCaddr {
 /** Pointer to the port field (network byte order) for the active family.
    Asserts on NULL or an unset/unknown family. */
-static HTS_INLINE HTS_UNUSED in_port_t* SOCaddr_sinport_(SOCaddr *const addr,
+static HTS_INLINE HTS_UNUSED in_port_t *
-                                                         const char *file, const int line) {
+SOCaddr_sinport_(SOCaddr *const addr, const char *file, const int line) {
  assertf_(addr != NULL, file, line);
  switch (addr->m_addr.sa.sa_family) {
  case AF_INET:
@@ -134,7 +134,8 @@ static HTS_INLINE HTS_UNUSED in_port_t* SOCaddr_sinport_(SOCaddr *const addr,
 /** Length of the active sockaddr (sockaddr_in or sockaddr_in6), or 0 if the
    family is unset/unknown. The 0 case doubles as the "not valid" test. */
 static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_size_(const SOCaddr *const addr,
-                                                     const char *file, const int line) {
+                                                     const char *file,
+                                                     const int line) {
  assertf_(addr != NULL, file, line);
  switch (addr->m_addr.sa.sa_family) {
  case AF_INET:
@@ -152,8 +153,8 @@ static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_size_(const SOCaddr*const addr,
 }
 /** Reset to the unset state (family AF_UNSPEC), making the address invalid. */
-static HTS_INLINE HTS_UNUSED void SOCaddr_clear_(SOCaddr*const addr,
+static HTS_INLINE HTS_UNUSED void
-                                                 const char *file, const int line) {
+SOCaddr_clear_(SOCaddr *const addr, const char *file, const int line) {
  assertf_(addr != NULL, file, line);
  addr->m_addr.sa.sa_family = AF_UNSPEC;
 }
@@ -191,14 +192,16 @@ static HTS_INLINE HTS_UNUSED void SOCaddr_clear_(SOCaddr*const addr,
 /** Set the port (host-order argument, stored network-order) on the active
 * family. */
-#define SOCaddr_initport(server, port) do { \
+#define SOCaddr_initport(server, port)                                         \
  do {                                                                         \
    SOCaddr_sinport(server) = htons((in_port_t) (port));                       \
  } while (0)
 /** Initialize as an all-zero IPv4 wildcard (INADDR_ANY) address; returns its
    sockaddr length. */
 static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr *const addr,
-                                                        const char *file, const int line) {
+                                                        const char *file,
                                                        const int line) {
  assertf_(addr != NULL, file, line);
  memset(&addr->m_addr.in, 0, sizeof(addr->m_addr.in));
  addr->m_addr.in.sin_family = AF_INET;
@@ -206,7 +209,8 @@ static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr*const addr,
 }
 /** Initialize server as an IPv4 wildcard (INADDR_ANY) address. */
-#define SOCaddr_initany(server) do { \
+#define SOCaddr_initany(server)                                                \
  do {                                                                         \
    SOCaddr_initany_(&(server), __FILE__, __LINE__);                           \
  } while (0)
@@ -215,8 +219,10 @@ static HTS_INLINE HTS_UNUSED socklen_t SOCaddr_initany_(SOCaddr*const addr,
    with port zeroed. Any other size leaves an AF_INET shell. Returns the
    resulting sockaddr length. */
 static HTS_UNUSED socklen_t SOCaddr_copyaddr_(SOCaddr *const server,
-                                              const void *data, const size_t data_size,
+                                              const void *data,
-                                              const char *file, const int line) {
+                                              const size_t data_size,
                                              const char *file,
                                              const int line) {
  assertf_(server != NULL, file, line);
  assertf_(data != NULL, file, line);
@@ -248,32 +254,35 @@ static HTS_UNUSED socklen_t SOCaddr_copyaddr_(SOCaddr*const server,
 /** Copy hpaddr (length hpsize) into server, writing the result length into the
    lvalue server_len (int). See SOCaddr_copyaddr_ for accepted forms. */
-#define SOCaddr_copyaddr(server, server_len, hpaddr, hpsize) do { \
+#define SOCaddr_copyaddr(server, server_len, hpaddr, hpsize)                   \
-  server_len = (int) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__, __LINE__); \
+  do {                                                                         \
    server_len = (int) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__,  \
                                         __LINE__);                            \
  } while (0)
 /** Like SOCaddr_copyaddr but discards the result length. */
-#define SOCaddr_copyaddr2(server, hpaddr, hpsize) do { \
+#define SOCaddr_copyaddr2(server, hpaddr, hpsize)                              \
  do {                                                                         \
    (void) SOCaddr_copyaddr_(&(server), hpaddr, hpsize, __FILE__, __LINE__);   \
  } while (0)
 /** Copy one SOCaddr (src) into another (dest), preserving family and port. */
-#define SOCaddr_copy_SOCaddr(dest, src) do { \
+#define SOCaddr_copy_SOCaddr(dest, src)                                        \
-  SOCaddr_copyaddr_(&(dest), &(src).m_addr.sa, SOCaddr_size(src), __FILE__, __LINE__); \
+  do {                                                                         \
    SOCaddr_copyaddr_(&(dest), &(src).m_addr.sa, SOCaddr_size(src), __FILE__,  \
                      __LINE__);                                               \
  } while (0)
 /** Write the numeric (dotted/colon) host of ss into namebuf (capacity
    namebuflen), scope id stripped. On failure namebuf becomes "". */
 static HTS_UNUSED void SOCaddr_inetntoa_(char *namebuf, size_t namebuflen,
-                                         SOCaddr *const ss,
+                                         SOCaddr *const ss, const char *file,
-                                         const char *file, const int line) {
+                                         const int line) {
  assertf_(namebuf != NULL, file, line);
  assertf_(ss != NULL, file, line);
-  if (getnameinfo(&ss->m_addr.sa, sizeof(ss->m_addr),
+  if (getnameinfo(&ss->m_addr.sa, sizeof(ss->m_addr), namebuf, namebuflen, NULL,
-                  namebuf, namebuflen, 
+                  0, NI_NUMERICHOST) == 0) {
-                  NULL, 0, 
-                  NI_NUMERICHOST) == 0) {
    /* remove scope id(s) */
    char *const pos = strchr(namebuf, '%');
    if (pos != NULL) {
@@ -289,7 +298,8 @@ static HTS_UNUSED void SOCaddr_inetntoa_(char *namebuf, size_t namebuflen,
  SOCaddr_inetntoa_(namebuf, namebuflen, &(ss), __FILE__, __LINE__)
 /** Single-char family tag: '1' for IPv4, '2' otherwise (used in the cache). */
-#define SOCaddr_getproto(ss) ( SOCaddr_size(ss) == sizeof(struct sockaddr_in) ? '1' : '2')
+#define SOCaddr_getproto(ss)                                                   \
  (SOCaddr_size(ss) == sizeof(struct sockaddr_in) ? '1' : '2')
 /** Length type for socket APIs (getsockname, accept, ...). */
 typedef socklen_t SOClen;
--- a/src/htsopt.h
+++ b/src/htsopt.h
@@ -72,6 +72,7 @@ typedef struct String String;
 #endif
 #ifndef HTS_DEF_STRUCT_String
 #define HTS_DEF_STRUCT_String
 struct String {
  char *buffer_;
  size_t length_;
@@ -179,6 +180,7 @@ typedef struct lien_url lien_url;
 #ifndef HTS_DEF_DEFSTRUCT_hts_log_type
 #define HTS_DEF_DEFSTRUCT_hts_log_type
 typedef enum hts_log_type {
  LOG_PANIC,
  LOG_ERROR,
@@ -288,6 +290,7 @@ typedef enum htsparsejava_flags {
 /* Link-rewriting style for saved pages (opt->urlmode). */
 #ifndef HTS_DEF_DEFSTRUCT_hts_urlmode
 #define HTS_DEF_DEFSTRUCT_hts_urlmode
 typedef enum hts_urlmode {
  HTS_URLMODE_ABSOLUTE = 0, /**< absolute URL (http://host/path) everywhere */
  HTS_URLMODE_ABSOLUTE_FILE = 1, /**< legacy file: form, unused */
@@ -301,6 +304,7 @@ typedef enum hts_urlmode {
 /* Cache policy for updates and retries (opt->cache). */
 #ifndef HTS_DEF_DEFSTRUCT_hts_cachemode
 #define HTS_DEF_DEFSTRUCT_hts_cachemode
 typedef enum hts_cachemode {
  HTS_CACHE_NONE = 0,       /**< no cache */
  HTS_CACHE_PRIORITY = 1,   /**< cache takes priority over the network */
@@ -311,6 +315,7 @@ typedef enum hts_cachemode {
 /* Interactive wizard level (opt->wizard). */
 #ifndef HTS_DEF_DEFSTRUCT_hts_wizard
 #define HTS_DEF_DEFSTRUCT_hts_wizard
 typedef enum hts_wizard {
  HTS_WIZARD_NONE = 0, /**< no wizard */
  HTS_WIZARD_ASK = 1,  /**< wizard asks questions */
@@ -321,6 +326,7 @@ typedef enum hts_wizard {
 /* robots.txt / meta-robots obedience level (opt->robots). */
 #ifndef HTS_DEF_DEFSTRUCT_hts_robots
 #define HTS_DEF_DEFSTRUCT_hts_robots
 typedef enum hts_robots {
  HTS_ROBOTS_NEVER = 0,        /**< ignore robots rules */
  HTS_ROBOTS_SOMETIMES = 1,    /**< partial obedience (default) */
--- a/src/htsparse.c
+++ b/src/htsparse.c
@@ -296,6 +296,48 @@ static const char *html_inline_safe(const char *src, char *dst, size_t size) {
  return dst;
 }
 /* Byte before html, or a space sentinel at the buffer start where html[-1]
   would underflow; space reads as the word boundary the guards want there. */
 static HTS_INLINE char html_prevc(const char *html, const char *start) {
  return html > start ? html[-1] : ' ';
 }
 /* True if [s, s+len) is exactly an HTTP method token (XHR.open's first
   argument is a method, not a URL: #218). Case-insensitive. */
 static int is_http_method(const char *s, size_t len) {
  static const char *const methods[] = {"GET",    "POST",  "PUT",
                                        "DELETE", "HEAD",  "OPTIONS",
                                        "PATCH",  "TRACE", NULL};
  int i;
  for (i = 0; methods[i] != NULL; i++) {
    if (strlen(methods[i]) == len && strfield(s, methods[i]) == (int) len)
      return 1;
  }
  return 0;
 }
 /* Percent-encode '(' and ')' in a link emitted into an unquoted url(...) (CSS
   or JS): a literal ')' closes the token early and the UA mis-parses the value
   (#163). The UA decodes %28/%29 back to the saved-on-disk name. */
 static void escape_url_parens(char *const s, const size_t size) {
  char BIGSTK buff[HTS_URLMAXSIZE * 2];
  size_t i, j;
  for (i = 0, j = 0; s[i] != '\0' && j + 3 < size && j + 3 < sizeof(buff);
       i++) {
    if (s[i] == '(' || s[i] == ')') {
      buff[j++] = '%';
      buff[j++] = '2';
      buff[j++] = s[i] == '(' ? '8' : '9';
    } else {
      buff[j++] = s[i];
    }
  }
  buff[j] = '\0';
  strlcpybuff(s, buff, size);
 }
 /* Main parser */
 int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
  char catbuff[CATBUFF_SIZE];
@@ -556,7 +598,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                  if (opt->getmode & HTS_GETMODE_HTML) {
                    p = strfield(html, "title");
                    if (p) {
-                      if (*(html - 1) == '/')
+                      if (html_prevc(html, r->adr) == '/')
                        p = 0;  // /title
                    } else {
                      if (strfield(html, "/html"))
@@ -1341,6 +1383,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                    int can_avoid_quotes = 0;
                    char quotes_replacement = '\0';
                    int ensure_not_mime = 0;
                    // .open(method,url): reject an HTTP-method first arg (#218)
                    int ensure_not_method = 0;
                    // @import: the quoted token is the URL; a trailing
                    // media/supports/layer condition is not part of it
                    int is_import = 0;
                    if (inscript_tag)
                      expected_end = ";\"\'";   // voir a href="javascript:doc.location='foo'"
@@ -1357,9 +1404,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                      if (!nc)
                        nc = strfield(html, ":location");        // javascript:location="doc"
                      if (!nc) {        // location="doc"
-                        if ((nc = strfield(html, "location"))
+                        if ((nc = strfield(html, "location")) &&
-                            && !isspace(*(html - 1))
+                            !isspace(html_prevc(html, r->adr)))
-                          )
                          nc = 0;
                      }
                      if (!nc)
@@ -1369,6 +1415,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          expected = '(';       // parenthèse
                          expected_end = "),";  // fin: virgule ou parenthèse
                          ensure_not_mime = 1;  //* ensure the url is not a mime type */
                          ensure_not_method = 1; // xhr.open: don't grab method
                        }
                      if (!nc)
                        if ((nc = strfield(html, ".replace"))) { // window.replace("url")
@@ -1380,7 +1427,9 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          expected = '(';       // parenthèse
                          expected_end = ")";   // fin: parenthèse
                        }
-                      if (!nc && (nc = strfield(html, "url")) && (!isalnum(*(html - 1))) && *(html - 1) != '_') {  // url(url)
+                      if (!nc && (nc = strfield(html, "url")) &&
                          (!isalnum(html_prevc(html, r->adr))) &&
                          html_prevc(html, r->adr) != '_') { // url(url)
                        expected = '('; // parenthèse
                        expected_end = ")";     // fin: parenthèse
                        can_avoid_quotes = 1;
@@ -1390,6 +1439,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                        if ((nc = strfield(html, "import"))) {   // import "url"
                          if (is_space(*(html + nc))) {
                            expected = 0;       // no char expected
                            is_import = 1;
                          } else
                            nc = 0;
                        }
@@ -1407,6 +1457,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          if ((*a == 34) || (*a == '\'') || (can_avoid_quotes)) {
                            const char *b, *c;
                            int ndelim = 1;
                            int valid_url = 0;
                            if ((*a == 34) || (*a == '\''))
                              a++;
@@ -1421,12 +1472,20 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                b++;
                            }
                            c = b--;
                            // no closing delimiter here (truncated input):
                            // Don't scan past the buffer NUL or capture it.
                            if (*c != '\0') {
                              c += ndelim;
                              while (*c == ' ')
                                c++;
-                            if ((strchr(expected_end, *c)) || (*c == '\n')
+                              valid_url =
-                                || (*c == '\r')) {
+                                  (strchr(expected_end, *c)) || (*c == '\n') ||
-                              c -= (ndelim + 1);
+                                  (*c == '\r') ||
                                  (is_import && *(b + 1 + ndelim) == ' ');
                            }
                            if (valid_url) {
                              // URL end = last char (b), not the delimiter
                              c = b;
                              if ((int) (c - a + 1)) {
                                if (ensure_not_mime) {
                                  int i = 0;
@@ -1442,6 +1501,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                    i++;
                                  }
                                }
                                // XHR.open's "GET" etc. is a method, not a URL
                                if (a != NULL && ensure_not_method &&
                                    is_http_method(a, (size_t) (c - a + 1))) {
                                  a = NULL;
                                }
                                // Check for bogus links (Vasiliy)
                                if (a != NULL) {
                                  const size_t size = c - a + 1;
@@ -1485,7 +1549,6 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                }
                              }
                            }
                          }
                        }
                      }
@@ -1692,6 +1755,24 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                                              hts_nodetect[i -
                                                                           1]);
                                              }
                                              // xmlns / xmlns:prefix declare
                                              // XML namespaces, not resources
                                              // (#191)
                                              else {
                                                const int xl = strfield(
                                                    intag_startattr, "xmlns");
                                                const char xc =
                                                    intag_startattr[xl];
                                                if (xl &&
                                                    (xc == ':' || xc == '=' ||
                                                     is_space(xc))) {
                                                  url_ok = 0;
                                                  hts_log_print(
                                                      opt, LOG_DEBUG,
                                                      "dirty parsing: xmlns "
                                                      "namespace avoided");
                                                }
                                              }
                                            }
                                  }
@@ -2967,6 +3048,10 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          /* Never escape high-chars (we don't know the encoding!!) */
                          inplace_escape_uri_utf(tempo, sizeof(tempo));
                          // unquoted url() (CSS/JS): keep parens escaped
                          if (ending_p == ')')
                            escape_url_parens(tempo, sizeof(tempo));
                          //if (!no_esc_utf)
                          //  escape_uri(tempo);     // escape with %xx
                          //else {
--- a/src/htssafe.h
+++ b/src/htssafe.h
@@ -58,7 +58,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
 #endif
 #endif
-#define HTSSAFE_ABORT_FUNCTION(A,B,C) do { \
+#define HTSSAFE_ABORT_FUNCTION(A, B, C)                                        \
  do {                                                                         \
    htsErrorCallback callback = hts_get_error_callback();                      \
    if (callback != NULL) {                                                    \
      callback(A, B, C);                                                       \
@@ -75,7 +76,8 @@ HTSEXT_API htsErrorCallback hts_get_error_callback(void);
 /**
 * Fatal assertion check.
 */
-#define assertf__(exp, sexp, file, line) (void) ( (exp) || (abortf_(sexp, file, line), 0) )
+#define assertf__(exp, sexp, file, line)                                       \
  (void) ((exp) || (abortf_(sexp, file, line), 0))
 /**
 * Fatal assertion check.
@@ -106,7 +108,8 @@ static HTS_UNUSED void abortf_(const char *exp, const char *file, int line) {
 #if (defined(__GNUC__) && !defined(__cplusplus))
 /* Note: char[] and const char[] are compatible */
-#define HTS_IS_CHAR_BUFFER(VAR) ( __builtin_types_compatible_p ( typeof (VAR), char[] ) )
+#define HTS_IS_CHAR_BUFFER(VAR)                                                \
  (__builtin_types_compatible_p(typeof(VAR), char[]))
 #else
 /* Note: a bit lame as char[8] won't be seen. */
 #define HTS_IS_CHAR_BUFFER(VAR) (sizeof(VAR) != sizeof(char *))
@@ -201,10 +204,13 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
 */
 #if (defined(__GNUC__) && !defined(__cplusplus))
-#define strncatbuff(A, B, N) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
+#define strncatbuff(A, B, N)                                                   \
  __builtin_choose_expr(                                                       \
      HTS_IS_CHAR_BUFFER(A),                                                   \
      strncat_safe_(A, sizeof(A), B,                                           \
                    HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N,    \
-  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
+                    "overflow while appending '" #B "' to '" #A "'", __FILE__, \
                    __LINE__),                                                 \
      strncatbuff_ptr_((A), (B), (N)))
 #else
 #define strncatbuff(A, B, N)                                                   \
@@ -212,7 +218,8 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
       ? strncat(A, B, N)                                                      \
       : strncat_safe_(A, sizeof(A), B,                                        \
                       HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
-  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
+                       "overflow while appending '" #B "' to '" #A "'",        \
                       __FILE__, __LINE__))
 #endif
 /**
@@ -222,18 +229,24 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
 */
 #if (defined(__GNUC__) && !defined(__cplusplus))
-#define strcatbuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
+#define strcatbuff(A, B)                                                       \
  __builtin_choose_expr(                                                       \
      HTS_IS_CHAR_BUFFER(A),                                                   \
      strncat_safe_(A, sizeof(A), B,                                           \
-  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
+                    HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B),       \
-  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__), \
+                    (size_t) -1,                                               \
                    "overflow while appending '" #B "' to '" #A "'", __FILE__, \
                    __LINE__),                                                 \
      strcatbuff_ptr_((A), (B)))
 #else
 #define strcatbuff(A, B)                                                       \
  (HTS_IS_NOT_CHAR_BUFFER(A)                                                   \
       ? strcat(A, B)                                                          \
       : strncat_safe_(A, sizeof(A), B,                                        \
-  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
+                       HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B),    \
-  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )
+                       (size_t) -1,                                            \
                       "overflow while appending '" #B "' to '" #A "'",        \
                       __FILE__, __LINE__))
 #endif
 /**
@@ -243,10 +256,13 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
 */
 #if (defined(__GNUC__) && !defined(__cplusplus))
-#define strcpybuff(A, B) __builtin_choose_expr( HTS_IS_CHAR_BUFFER(A), \
+#define strcpybuff(A, B)                                                       \
  __builtin_choose_expr(                                                       \
      HTS_IS_CHAR_BUFFER(A),                                                   \
      strcpy_safe_(A, sizeof(A), B,                                            \
                   HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B),        \
-  "overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__), \
+                   "overflow while copying '" #B "' to '" #A "'", __FILE__,    \
                   __LINE__),                                                  \
      strcpybuff_ptr_((A), (B)))
 #else
 #define strcpybuff(A, B)                                                       \
@@ -254,7 +270,8 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
       ? strcpy(A, B)                                                          \
       : strcpy_safe_(A, sizeof(A), B,                                         \
                      HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B),     \
-  "overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__) )
+                      "overflow while copying '" #B "' to '" #A "'", __FILE__, \
                      __LINE__))
 #endif
 /*
@@ -269,9 +286,9 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
 * Append characters of "B" to "A", "A" having a maximum capacity of "S".
 */
 #define strlcatbuff(A, B, S)                                                   \
-  strncat_safe_(A, S, B, \
+  strncat_safe_(A, S, B, HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B),  \
-  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
+                (size_t) -1, "overflow while appending '" #B "' to '" #A "'",  \
-  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__)
+                __FILE__, __LINE__)
 /**
 * Append at most "N" characters of "B" to "A", "A" having a maximum capacity
@@ -286,16 +303,17 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
 * Copy characters of "B" to "A", "A" having a maximum capacity of "S".
 */
 #define strlcpybuff(A, B, S)                                                   \
-  strcpy_safe_(A, S, B, \
+  strcpy_safe_(A, S, B, HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B),   \
-  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
+               "overflow while copying '" #B "' to '" #A "'", __FILE__,        \
-  "overflow while copying '" #B "' to '"#A"'", __FILE__, __LINE__)
+               __LINE__)
 /** strnlen replacement (autotools). **/
 #if (!defined(_WIN32) && !defined(HAVE_STRNLEN))
 static HTS_UNUSED size_t strnlen(const char *s, size_t maxlen) {
  size_t i;
-  for(i = 0 ; i < maxlen && s[i] != '\0' ; i++) ;
+  for (i = 0; i < maxlen && s[i] != '\0'; i++)
    ;
  return i;
 }
 #endif
@@ -304,12 +322,13 @@ static HTS_UNUSED size_t strnlen(const char *s, size_t maxlen) {
   Aborts if source is NULL or has no NUL within that capacity. The sentinel
   sizeof_source == (size_t)-1 means "capacity unknown", and falls back to the
   unbounded strlen (used when the source is a pointer rather than an array). */
-static HTS_INLINE HTS_UNUSED size_t strlen_safe_(const char *source, const size_t sizeof_source, 
+static HTS_INLINE HTS_UNUSED size_t strlen_safe_(const char *source,
                                                 const size_t sizeof_source,
                                                 const char *file, int line) {
  size_t size;
  assertf_(source != NULL, file, line);
-  size = sizeof_source != (size_t) -1 
+  size = sizeof_source != (size_t) -1 ? strnlen(source, sizeof_source)
-    ? strnlen(source, sizeof_source) : strlen(source);
+                                      : strlen(source);
  assertf_(size < sizeof_source, file, line);
  return size;
 }
@@ -319,10 +338,10 @@ static HTS_INLINE HTS_UNUSED size_t strlen_safe_(const char *source, const size_
   source's capacity or (size_t)-1 if unknown. Aborts if the result (existing
   dest length + appended bytes + NUL) would not fit sizeof_dest: this NEVER
   truncates. Always NUL-terminates on success. */
-static HTS_INLINE HTS_UNUSED char* strncat_safe_(char *const dest, const size_t sizeof_dest,
+static HTS_INLINE HTS_UNUSED char *
 strncat_safe_(char *const dest, const size_t sizeof_dest,
              const char *const source, const size_t sizeof_source,
-                                                 const size_t n,
+              const size_t n, const char *exp, const char *file, int line) {
-                                                 const char *exp, const char *file, int line) {
  const size_t source_len = strlen_safe_(source, sizeof_source, file, line);
  const size_t dest_len = strlen_safe_(dest, sizeof_dest, file, line);
  /* note: "size_t is an unsigned integral type" ((size_t) -1 is positive) */
@@ -337,12 +356,14 @@ static HTS_INLINE HTS_UNUSED char* strncat_safe_(char *const dest, const size_t
 /* Core bounded copy: empties dest then appends all of source via
   strncat_safe_. sizeof_dest is dest's total capacity (NUL included). Aborts
   (no truncation) if source plus its NUL would not fit. */
-static HTS_INLINE HTS_UNUSED char* strcpy_safe_(char *const dest, const size_t sizeof_dest,
+static HTS_INLINE HTS_UNUSED char *
 strcpy_safe_(char *const dest, const size_t sizeof_dest,
             const char *const source, const size_t sizeof_source,
             const char *exp, const char *file, int line) {
  assertf_(sizeof_dest != 0, file, line);
  dest[0] = '\0';
-  return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1, exp, file, line);
+  return strncat_safe_(dest, sizeof_dest, source, sizeof_source, (size_t) -1,
                       exp, file, line);
 }
 /**
@@ -385,22 +406,28 @@ static HTS_INLINE HTS_UNUSED htsbuff htsbuff_ptr_(char *buf, size_t cap) {
 /* 0 for an array, a -1 array-size compile error for a pointer. */
 #define htsbuff_must_be_array_(A)                                              \
-  (sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A), typeof(&(A)[0]))]) - 1)
+  (sizeof(char[1 - 2 * !!__builtin_types_compatible_p(typeof(A),               \
                                                      typeof(&(A)[0]))]) -     \
   1)
-#define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
+#define htsbuff_array(ARR)                                                     \
  htsbuff_ptr_((ARR), sizeof(ARR) + htsbuff_must_be_array_(ARR))
 #else
 #define htsbuff_array(ARR) htsbuff_ptr_((ARR), sizeof(ARR))
 #endif
 /** Builder over pointer P of known capacity N (N includes the NUL). */
 #define htsbuff_ptr(P, N) htsbuff_ptr_((P), (N))
-/** Append at most n characters of s (stopping at its NUL). Aborts on overflow. */
+/** Append at most n characters of s (stopping at its NUL). Aborts on overflow.
-static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s, size_t n) {
+ */
 static HTS_INLINE HTS_UNUSED void htsbuff_catn(htsbuff *b, const char *s,
                                               size_t n) {
  const size_t add = strnlen(s, n);
  /* Overflow-safe: keep the (potentially huge) 'add' alone on one side. The
     maintained invariant len < cap makes 'cap - len' >= 1 (no underflow), so
     'add < cap - len' cannot wrap the way 'len + add < cap' could. */
-  assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__, __LINE__);
+  assertf__(add < b->cap - b->len, "htsbuff append overflow", __FILE__,
            __LINE__);
  memcpy(b->buf + b->len, s, add);
  b->len += add;
  b->buf[b->len] = '\0';
@@ -437,7 +464,13 @@ static HTS_INLINE HTS_UNUSED const char *htsbuff_str(const htsbuff *b) {
 #define calloct(A, B) calloc((A), (B))
-#define freet(A)            do { if ((A) != NULL) { free(A); (A) = NULL; } } while(0)
+#define freet(A)                                                               \
  do {                                                                         \
    if ((A) != NULL) {                                                         \
      free(A);                                                                 \
      (A) = NULL;                                                              \
    }                                                                          \
  } while (0)
 #define strdupt(A) strdup(A)
--- a/src/htsstrings.h
+++ b/src/htsstrings.h
@@ -60,6 +60,7 @@ typedef struct String String;
 #endif
 #ifndef HTS_DEF_STRUCT_String
 #define HTS_DEF_STRUCT_String
 /**
 * Growable owned string.
 *
@@ -131,14 +132,16 @@ struct String {
 /** Drop the last byte and re-terminate. Undefined if the String is empty
    (no length check; would underflow). **/
-#define StringPopRight(BLK) do { \
+#define StringPopRight(BLK)                                                    \
  do {                                                                         \
    StringBuffRW(BLK)[--StringLength(BLK)] = '\0';                             \
  } while (0)
 /** Grow so capacity_ >= CAPACITY (total bytes, including the NUL). May realloc
    (invalidating prior buffer pointers); aborts via STRING_ASSERT on OOM.
    Never shrinks. **/
-#define StringRoomTotal(BLK, CAPACITY) do { \
+#define StringRoomTotal(BLK, CAPACITY)                                         \
  do {                                                                         \
    const size_t capacity_ = (size_t) (CAPACITY);                              \
    while ((BLK).capacity_ < capacity_) {                                      \
      if ((BLK).capacity_ < 16) {                                              \
@@ -153,11 +156,13 @@ struct String {
 /** Reserve room for SIZE more bytes beyond the current length (plus the NUL).
    May realloc, invalidating prior buffer pointers. **/
-#define StringRoom(BLK, SIZE) StringRoomTotal(BLK, StringLength(BLK) + (SIZE) + 1)
+#define StringRoom(BLK, SIZE)                                                  \
  StringRoomTotal(BLK, StringLength(BLK) + (SIZE) + 1)
 /** Reserve room for SIZE more bytes and return the (post-realloc) RW buffer,
    for appending in place. Does not update length_; the caller must. **/
 #define StringBuffN(BLK, SIZE) StringBuffN_(&(BLK), SIZE)
 HTS_STATIC char *StringBuffN_(String *blk, int size) {
  StringRoom(*blk, size);
  return StringBuffRW(*blk);
@@ -166,7 +171,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
 /** Zero the fields (NULL buffer, no allocation). Use on an uninitialized
    String only; does NOT free an existing buffer (use StringFree to reset
    an owned one), so calling it on a live String leaks. **/
-#define StringInit(BLK) do { \
+#define StringInit(BLK)                                                        \
  do {                                                                         \
    (BLK).buffer_ = NULL;                                                      \
    (BLK).capacity_ = 0;                                                       \
    (BLK).length_ = 0;                                                         \
@@ -174,7 +180,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
 /** Truncate to length 0, keeping the allocation. Forces a non-NULL buffer
    (allocates if empty) and writes the leading NUL, so StringBuff is "". **/
-#define StringClear(BLK) do { \
+#define StringClear(BLK)                                                       \
  do {                                                                         \
    (BLK).length_ = 0;                                                         \
    StringRoom(BLK, 0);                                                        \
    (BLK).buffer_[0] = '\0';                                                   \
@@ -182,7 +189,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
 /** Set length_ to SIZE, or to strlen(buffer_) if SIZE is negative. Caller
    asserts SIZE fits the existing content; does not (re)allocate. **/
-#define StringSetLength(BLK, SIZE) do { \
+#define StringSetLength(BLK, SIZE)                                             \
  do {                                                                         \
    if (SIZE >= 0) {                                                           \
      (BLK).length_ = SIZE;                                                    \
    } else {                                                                   \
@@ -192,7 +200,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
 /** Release the owned buffer and reset to the empty state (NULL buffer).
    Idempotent; safe on an already-empty String. **/
-#define StringFree(BLK) do { \
+#define StringFree(BLK)                                                        \
  do {                                                                         \
    if ((BLK).buffer_ != NULL) {                                               \
      STRING_FREE((BLK).buffer_);                                              \
      (BLK).buffer_ = NULL;                                                    \
@@ -207,7 +216,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
    freed or used by the caller afterwards. length_/capacity_ are set to
    strlen(STR) (capacity_ here excludes the NUL, so the next append reallocs).
   **/
-#define StringSetBuffer(BLK, STR) do { \
+#define StringSetBuffer(BLK, STR)                                              \
  do {                                                                         \
    size_t len__ = strlen(STR);                                                \
    StringFree(BLK);                                                           \
    (BLK).buffer_ = (STR);                                                     \
@@ -218,7 +228,8 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
 /** Append SIZE raw bytes from STR (NULs allowed as data). Grows as needed and
    re-terminates with a NUL after the appended bytes. STR must not alias
    BLK's buffer (a realloc would invalidate it). **/
-#define StringMemcat(BLK, STR, SIZE) do { \
+#define StringMemcat(BLK, STR, SIZE)                                           \
  do {                                                                         \
    const char *str_mc_ = (STR);                                               \
    const size_t size_mc_ = (size_t) (SIZE);                                   \
    StringRoom(BLK, size_mc_);                                                 \
@@ -231,13 +242,15 @@ HTS_STATIC char *StringBuffN_(String * blk, int size) {
 /** Replace content with SIZE raw bytes from STR (NULs allowed as data).
    Same non-aliasing requirement as StringMemcat. **/
-#define StringMemcpy(BLK, STR, SIZE) do { \
+#define StringMemcpy(BLK, STR, SIZE)                                           \
  do {                                                                         \
    (BLK).length_ = 0;                                                         \
    StringMemcat(BLK, STR, SIZE);                                              \
  } while (0)
 /** Append one byte and re-terminate. Grows as needed. **/
-#define StringAddchar(BLK, c) do { \
+#define StringAddchar(BLK, c)                                                  \
  do {                                                                         \
    String *const s__ = &(BLK);                                                \
    char c__ = (c);                                                            \
    StringRoom(*s__, 1);                                                       \
@@ -281,7 +294,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
 /** Append the C string STR (up to its NUL). No-op if STR is NULL. STR must not
    alias BLK's buffer. **/
-#define StringCat(BLK, STR) do { \
+#define StringCat(BLK, STR)                                                    \
  do {                                                                         \
    const char *const str__ = (STR);                                           \
    if (str__ != NULL) {                                                       \
      const size_t size__ = strlen(str__);                                     \
@@ -291,7 +305,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
 /** Append at most SIZE leading bytes of the C string STR. No-op if STR is
    NULL. STR must not alias BLK's buffer. **/
-#define StringCatN(BLK, STR, SIZE) do { \
+#define StringCatN(BLK, STR, SIZE)                                             \
  do {                                                                         \
    const char *str__ = (STR);                                                 \
    if (str__ != NULL) {                                                       \
      size_t size__ = strlen(str__);                                           \
@@ -304,7 +319,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
 /** Replace content with at most SIZE leading bytes of the C string STR.
    If STR is NULL, clears to "". STR must not alias BLK's buffer. **/
-#define StringCopyN(BLK, STR, SIZE) do { \
+#define StringCopyN(BLK, STR, SIZE)                                            \
  do {                                                                         \
    const char *str__ = (STR);                                                 \
    const size_t usize__ = (SIZE);                                             \
    (BLK).length_ = 0;                                                         \
@@ -326,7 +342,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
 /** Replace content with a copy of the C string STR. If STR is NULL, clears to
    "". STR must not alias BLK's buffer (use StringCopyOverlapped if it might).
   **/
-#define StringCopy(BLK, STR) do { \
+#define StringCopy(BLK, STR)                                                   \
  do {                                                                         \
    const char *str__ = (STR);                                                 \
    if (str__ != NULL) {                                                       \
      size_t size__ = strlen(str__);                                           \
@@ -338,7 +355,8 @@ HTS_STATIC void StringAttach(String * blk, char **str) {
 /** Like StringCopy but safe when STR aliases BLK's own buffer: copies via a
    temporary, so a self-copy or overlap is well-defined. **/
-#define StringCopyOverlapped(BLK, STR) do { \
+#define StringCopyOverlapped(BLK, STR)                                         \
  do {                                                                         \
    String s__ = STRING_EMPTY;                                                 \
    StringCopy(s__, STR);                                                      \
    StringCopyS(BLK, s__);                                                     \
--- a/src/httrack-library.h
+++ b/src/httrack-library.h
@@ -73,6 +73,7 @@ typedef struct strc_int2bytes2 strc_int2bytes2;
 #endif
 #ifndef HTS_DEF_DEFSTRUCT_hts_log_type
 #define HTS_DEF_DEFSTRUCT_hts_log_type
 /** Log severity levels, most to least severe. A message is emitted only if its
    level is <= opt->debug. LOG_ERRNO is a flag OR'd into the level to append
    ": <strerror(errno)>" to the message. */
@@ -111,8 +112,10 @@ requires: htsdefines.h */
 * CALLBACKARG_USERDEF(). Allocates a t_hts_callbackarg with hts_malloc (not
 * checked for OOM); it is freed by hts_free_opt().
 */
-#define CHAIN_FUNCTION(OPT, MEMBER, FUNCTION, ARGUMENT) do { \
+#define CHAIN_FUNCTION(OPT, MEMBER, FUNCTION, ARGUMENT)                        \
-  t_hts_callbackarg *carg = (t_hts_callbackarg*) hts_malloc(sizeof(t_hts_callbackarg)); \
+  do {                                                                         \
    t_hts_callbackarg *carg =                                                  \
        (t_hts_callbackarg *) hts_malloc(sizeof(t_hts_callbackarg));           \
    carg->userdef = (ARGUMENT);                                                \
    carg->prev.fun = (void *) (OPT)->callbacks_fun->MEMBER.fun;                \
    carg->prev.carg = (OPT)->callbacks_fun->MEMBER.carg;                       \
@@ -120,8 +123,10 @@ requires: htsdefines.h */
    (OPT)->callbacks_fun->MEMBER.carg = carg;                                  \
  } while (0)
-/* The following helpers are useful only if you know that an existing callback migh be existing before before the call to CHAIN_FUNCTION()
+/* The following helpers are useful only if you know that an existing callback
-If your functions were added just after hts_create_opt(), no need to make the previous function check */
+might exist before the call to CHAIN_FUNCTION(). If your functions
 were added just after hts_create_opt(), no need to make the previous function
 check */
 /** Inside a chained callback, return the ARGUMENT pointer originally passed to
    CHAIN_FUNCTION(), or NULL when CARG is NULL. */
@@ -129,11 +134,13 @@ If your functions were added just after hts_create_opt(), no need to make the pr
 /** Return the callback of type NAME that this one chained over, cast to its
    function-pointer type, or NULL. Call it to forward to the prior handler. */
-#define CALLBACKARG_PREV_FUN(CARG, NAME) ( (t_hts_htmlcheck_ ##NAME) ( ( (CARG) != NULL ) ? (CARG)->prev.fun : NULL ) )
+#define CALLBACKARG_PREV_FUN(CARG, NAME)                                       \
  ((t_hts_htmlcheck_##NAME)(((CARG) != NULL) ? (CARG)->prev.fun : NULL))
 /** Return the carg of the callback this one chained over (pass it when
   forwarding to the CALLBACKARG_PREV_FUN result), or NULL. */
-#define CALLBACKARG_PREV_CARG(CARG) ( ( (CARG) != NULL ) ? (CARG)->prev.carg : NULL )
+#define CALLBACKARG_PREV_CARG(CARG)                                            \
  (((CARG) != NULL) ? (CARG)->prev.carg : NULL)
 /* Functions */
@@ -212,8 +219,8 @@ HTSEXT_API hts_boolean hts_log(httrackp *opt, const char *prefix,
 /** printf-style log at level @p type (an hts_log_type, optionally |LOG_ERRNO).
    Forwards to the registered log callback, and when the level is <= opt->debug
    also to opt->log. @p format must be non-NULL. */
-HTSEXT_API void hts_log_print(httrackp * opt, int type, const char *format,
+HTSEXT_API void hts_log_print(httrackp *opt, int type, const char *format, ...)
-                              ...) HTS_PRINTF_FUN(3, 4);
+    HTS_PRINTF_FUN(3, 4);
 /** va_list form of hts_log_print(). @p opt may be NULL (only the callback
   runs). Preserves errno. @p format must be non-NULL. */
@@ -255,7 +262,8 @@ HTSEXT_API int htswrap_add(httrackp * opt, const char *name, void *fct);
   or 0 if none or unknown. */
 HTSEXT_API uintptr_t htswrap_read(httrackp *opt, const char *name);
-/* Internal library allocators, if a different libc is being used by the client */
+/* Internal library allocators, if a different libc is being used by the client
 */
 /** strdup() through the library allocator. Returns a heap copy freed with
    hts_free(), or NULL on failure. */
 HTSEXT_API char *hts_strdup(const char *string);
@@ -490,40 +498,50 @@ HTSEXT_API void unescape_amp(char *s);
 /** Percent-escape only spaces (' ' becomes "%20"); copy everything else
 * verbatim. */
-HTSEXT_API size_t escape_spc_url(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t escape_spc_url(const char *const src, char *const dest,
                                 const size_t size);
 /** Aggressively percent-escape @p src for use as a single URL path segment
    (reserved, delimiter, unwise, special, avoid and mark characters). */
-HTSEXT_API size_t escape_in_url(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t escape_in_url(const char *const src, char *const dest,
                                const size_t size);
 /** Percent-escape @p src as a URI, escaping only what is necessary and keeping
    '/' and other reserved characters. */
-HTSEXT_API size_t escape_uri(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t escape_uri(const char *const src, char *const dest,
                             const size_t size);
 /** Like escape_uri() for a UTF-8 URI: also escapes reserved characters other
    than '/'. */
-HTSEXT_API size_t escape_uri_utf(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t escape_uri_utf(const char *const src, char *const dest,
                                 const size_t size);
 /** Minimal "make safe" escape: percent-escapes only '"', ' ' and control
    characters, leaving an already-formed URL otherwise intact. */
-HTSEXT_API size_t escape_check_url(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t escape_check_url(const char *const src, char *const dest,
                                   const size_t size);
 /** Append-variant of escape_spc_url(): escapes @p src after the existing
    NUL-terminated content of @p dest. Returns the bytes appended (excluding the
    NUL). */
-HTSEXT_API size_t append_escape_spc_url(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t append_escape_spc_url(const char *const src, char *const dest,
                                        const size_t size);
 /** Append-variant of escape_in_url(). See append_escape_spc_url(). */
-HTSEXT_API size_t append_escape_in_url(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t append_escape_in_url(const char *const src, char *const dest,
                                       const size_t size);
 /** Append-variant of escape_uri(). See append_escape_spc_url(). */
-HTSEXT_API size_t append_escape_uri(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t append_escape_uri(const char *const src, char *const dest,
                                    const size_t size);
 /** Append-variant of escape_uri_utf(). See append_escape_spc_url(). */
-HTSEXT_API size_t append_escape_uri_utf(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t append_escape_uri_utf(const char *const src, char *const dest,
                                        const size_t size);
 /** Append-variant of escape_check_url(). See append_escape_spc_url(). */
-HTSEXT_API size_t append_escape_check_url(const char *const src, char *const dest, const size_t size);
+HTSEXT_API size_t append_escape_check_url(const char *const src,
                                          char *const dest, const size_t size);
 /** In-place variant of escape_spc_url(): escapes the NUL-terminated string in
    @p dest back into @p dest. */
@@ -543,32 +561,39 @@ HTSEXT_API size_t inplace_escape_check_url(char *const dest, const size_t size);
 /** Same escaping as escape_check_url() but returns @p dest instead of the byte
    count. */
-HTSEXT_API char *escape_check_url_addr(const char *const src, char *const dest, const size_t size);
+HTSEXT_API char *escape_check_url_addr(const char *const src, char *const dest,
                                       const size_t size);
 /** Build a MIME/MHTML content-id token in @p dest from @p adr and @p fil:
    escape_in_url() both, then replace every '%' with 'X' so the result is one
    opaque token. */
-HTSEXT_API size_t make_content_id(const char *const adr, const char *const fil, char *const dest, const size_t size);
+HTSEXT_API size_t make_content_id(const char *const adr, const char *const fil,
                                  char *const dest, const size_t size);
 /** Low-level percent-escaper backing the escape_* family. @p mode selects the
    character class to escape: 0 check_url, 1 in_url, 2 spc_url, 3 uri,
    30 uri_utf. @p max_size is the dest capacity including the NUL. */
-HTSEXT_API size_t x_escape_http(const char *const s, char *const dest, const size_t max_size, const int mode);
+HTSEXT_API size_t x_escape_http(const char *const s, char *const dest,
                                const size_t max_size, const int mode);
 /** Strip all control characters (byte value < 32) from @p s in place. */
 HTSEXT_API void escape_remove_control(char *const s);
 /** HTML-escape for text output: rewrite '&' to "&amp;" and pass every other
   byte through unchanged. */
-HTSEXT_API size_t escape_for_html_print(const char *const s, char *const dest, const size_t size);
+HTSEXT_API size_t escape_for_html_print(const char *const s, char *const dest,
                                        const size_t size);
 /** Like escape_for_html_print() but also convert every high byte (>= 128) to a
    numeric entity "&#xNN;". */
-HTSEXT_API size_t escape_for_html_print_full(const char *const s, char *const dest, const size_t size);
+HTSEXT_API size_t escape_for_html_print_full(const char *const s,
                                             char *const dest,
                                             const size_t size);
 /** Percent-decode @p s into @p catbuff (capacity @p size) and return @p
   catbuff. Decodes every "%xx" hex escape. */
-HTSEXT_API char *unescape_http(char *const catbuff, const size_t size, const char *const s);
+HTSEXT_API char *unescape_http(char *const catbuff, const size_t size,
                               const char *const s);
 /** Percent-decode @p s into @p catbuff, but only the escapes that are safe to
    decode while keeping a valid URI (reserved, delimiter, unwise, control and
@@ -589,8 +614,7 @@ HTSEXT_API hts_boolean get_httptype_sized(httrackp *opt, char *s, size_t ssize,
    HTS_MIMETYPE_SIZE capacity. */
 HTS_DEPRECATED("use get_httptype_sized(opt, s, ssize, fil, flag)")
-HTSEXT_API void get_httptype(httrackp * opt, char *s, const char *fil,
+HTSEXT_API void get_httptype(httrackp *opt, char *s, const char *fil, int flag);
-                             int flag);
 /** Classify @p fil by its extension: 0 unknown, 1 known non-HTML, 2 known HTML.
    Consults the built-in table then user --assume rules. 0 for a NULL @p fil.
@@ -633,11 +657,13 @@ HTSEXT_API void guess_httptype(httrackp * opt, char *s, const char *fil);
   time), not a pointer. */
 /** Concatenate @p a and @p b into @p catbuff (NULL or empty operands are
 * skipped). */
-HTSEXT_API char *concat(char *catbuff, size_t size, const char *a, const char *b);
+HTSEXT_API char *concat(char *catbuff, size_t size, const char *a,
                        const char *b);
 /** Like concat(a, b) but convert '/' to the platform path separator (Windows).
 */
-HTSEXT_API char *fconcat(char *catbuff, size_t size, const char *a, const char *b);
+HTSEXT_API char *fconcat(char *catbuff, size_t size, const char *a,
                         const char *b);
 /** Copy @p a into @p catbuff, converting '/' to the platform path separator
    (Windows). */
@@ -756,7 +782,8 @@ typedef struct utimbuf STRUCT_UTIMBUF;
 /** Macro aimed to break at build-time if a size is not a sizeof() strictly
 *  greater than sizeof(char*). **/
 #undef COMPILE_TIME_CHECK_SIZE
-#define COMPILE_TIME_CHECK_SIZE(A) (void) ((void (*)(char[A - sizeof(char*) - 1])) NULL)
+#define COMPILE_TIME_CHECK_SIZE(A)                                             \
  (void) ((void (*)(char[A - sizeof(char *) - 1])) NULL)
 /** Macro aimed to break at compile-time if a size is not a sizeof() strictly
 *  greater than sizeof(char*). **/
--- a/src/webhttrack
+++ b/src/webhttrack
@@ -4,28 +4,33 @@
 # Initializes the htsserver GUI frontend and launch the default browser
 BROWSEREXE=
-SRCHBROWSEREXE="x-www-browser www-browser iceape mozilla firefox-developer-edition firefox icecat iceweasel abrowser firebird galeon konqueror midori opera google-chrome chrome chromium chromium-browser netscape firefox-developer-edition"
+SRCHBROWSEREXE=(x-www-browser www-browser iceape mozilla firefox-developer-edition firefox icecat iceweasel abrowser firebird galeon konqueror midori opera google-chrome chrome chromium chromium-browser netscape firefox-developer-edition)
 # shellcheck disable=SC2153 # BROWSER is the standard freedesktop env var, not a typo
 if test -n "${BROWSER}"; then
    # sensible-browser will f up if BROWSER is not set
-SRCHBROWSEREXE="xdg-open sensible-browser ${SRCHBROWSEREXE}"
+    SRCHBROWSEREXE=(xdg-open sensible-browser "${SRCHBROWSEREXE[@]}")
 fi
 # Patch for Darwin/Mac by Ross Williams
-if test "`uname -s`" == "Darwin"; then
+if test "$(uname -s)" == "Darwin"; then
    # Darwin/Mac OS X uses a system 'open' command to find
    # the default browser. The -W flag causes it to wait for
    # the browser to exit
    BROWSEREXE="/usr/bin/open -W"
 fi
-BINWD=`dirname "$0"`
+BINWD=$(dirname "$0")
-SRCHPATH="$BINWD /usr/local/bin /usr/share/bin /usr/bin /usr/lib/httrack /usr/local/lib/httrack /usr/local/share/httrack /opt/local/bin /sw/bin ${HOME}/usr/bin ${HOME}/bin"
+SRCHPATH=("$BINWD" /usr/local/bin /usr/share/bin /usr/bin /usr/lib/httrack /usr/local/lib/httrack /usr/local/share/httrack /opt/local/bin /sw/bin "${HOME}/usr/bin" "${HOME}/bin")
-SRCHPATH="$SRCHPATH "`echo $PATH | tr ":" " "`
+IFS=':' read -ra pathdirs <<<"$PATH"
-SRCHDISTPATH="$BINWD/../share $BINWD/.. /usr/share /usr/local /usr /local /usr/local/share ${HOME}/usr ${HOME}/usr/share /opt/local/share /sw ${HOME}/usr/local ${HOME}/usr/share"
+for d in "${pathdirs[@]}"; do
    # drop empty PATH fields, matching the old echo|tr word-split
    test -n "$d" && SRCHPATH+=("$d")
 done
 SRCHDISTPATH=("$BINWD/../share" "$BINWD/.." /usr/share /usr/local /usr /local /usr/local/share "${HOME}/usr" "${HOME}/usr/share" /opt/local/share /sw "${HOME}/usr/local" "${HOME}/usr/share")
 ###
 # And now some famous cuisine
 function log {
-echo "$0($$): $@" >&2
+    echo "$0($$): $*" >&2
    return 0
 }
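The PATH handling in the hunk above can be tried in isolation. This is a minimal sketch, assuming bash (arrays and here-strings are not POSIX sh). The function name `split_path` and the global array `dirs` are hypothetical names used for illustration; they do not appear in webhttrack.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of webhttrack): split a PATH-like string into
# the global array "dirs", dropping empty fields as the rewritten script does.
split_path() {
    local -a fields
    local d
    dirs=()
    # IFS is changed for this one read only; -a fills an array, -r keeps backslashes
    IFS=':' read -ra fields <<<"$1"
    for d in "${fields[@]}"; do
        # skip empty fields such as the one between "::"
        test -n "$d" && dirs+=("$d")
    done
    return 0
}

split_path "/usr/bin::/opt/my tools/bin:/bin"
printf '%s\n' "${dirs[@]}"
```

The directory containing a space stays a single array element, which the old space-joined string could not represent.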
@@ -42,35 +47,35 @@ log "Browser (or helper) exited"
 # First ensure that we can launch the server
 BINPATH=
-for i in ${SRCHPATH}; do
+for i in "${SRCHPATH[@]}"; do
-	! test -n "${BINPATH}" && test -x ${i}/htsserver && BINPATH=${i}
+    ! test -n "${BINPATH}" && test -x "${i}/htsserver" && BINPATH="${i}"
 done
-for i in ${SRCHDISTPATH}; do
+for i in "${SRCHDISTPATH[@]}"; do
    ! test -n "${DISTPATH}" && test -f "${i}/httrack/lang.def" && DISTPATH="${i}/httrack"
 done
 test -n "${BINPATH}" || ! log "Could not find htsserver" || exit 1
 test -n "${DISTPATH}" || ! log "Could not find httrack directory" || exit 1
-test -f ${DISTPATH}/lang.def || ! log "Could not find ${DISTPATH}/lang.def" || exit 1
+test -f "${DISTPATH}/lang.def" || ! log "Could not find ${DISTPATH}/lang.def" || exit 1
-test -f ${DISTPATH}/lang.indexes || ! log "Could not find ${DISTPATH}/lang.indexes" || exit 1
+test -f "${DISTPATH}/lang.indexes" || ! log "Could not find ${DISTPATH}/lang.indexes" || exit 1
-test -d ${DISTPATH}/lang || ! log "Could not find ${DISTPATH}/lang" || exit 1
+test -d "${DISTPATH}/lang" || ! log "Could not find ${DISTPATH}/lang" || exit 1
-test -d ${DISTPATH}/html || ! log "Could not find ${DISTPATH}/html" || exit 1
+test -d "${DISTPATH}/html" || ! log "Could not find ${DISTPATH}/html" || exit 1
 # Locale
 HTSLANG="${LC_MESSAGES}"
 ! test -n "${HTSLANG}" && HTSLANG="${LC_ALL}"
 ! test -n "${HTSLANG}" && HTSLANG="${LANG}"
-HTSLANG="`echo $LANG | cut -f1 -d'.' | cut -f1 -d'_'`"
+HTSLANG="$(echo "$LANG" | cut -f1 -d'.' | cut -f1 -d'_')"
-LANGN=`grep -E "^${HTSLANG}:" ${DISTPATH}/lang.indexes | cut -f2 -d':'`
+LANGN=$(grep -E "^${HTSLANG}:" "${DISTPATH}/lang.indexes" | cut -f2 -d':')
 ! test -n "${LANGN}" && LANGN=1
 # Find the browser
 # note: not all systems have sensible-browser or www-browser alternative
 # therefore, we have to find a bit more if sensible-browser could not be found
-for i in ${SRCHBROWSEREXE}; do
+for i in "${SRCHBROWSEREXE[@]}"; do
-for j in ${SRCHPATH}; do
+    for j in "${SRCHPATH[@]}"; do
-if test -x ${j}/${i}; then
+        if test -x "${j}/${i}"; then
-BROWSEREXE=${j}/${i}
+            BROWSEREXE="${j}/${i}"
        fi
        test -n "$BROWSEREXE" && break
    done
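The quoted array loops in this hunk matter mostly for directories whose names contain whitespace. The sketch below, assuming bash, shows the same search pattern on a throwaway directory. `find_exe_dir` and `htsserver-demo` are hypothetical names, and unlike the script's loop (which keeps the first hit through its `! test -n` guard) this helper simply returns at the first match.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of webhttrack): print the first directory in
# the remaining arguments that holds an executable named "$1".
find_exe_dir() {
    local name="$1" dir
    shift
    for dir in "$@"; do
        if test -x "${dir}/${name}"; then
            printf '%s\n' "${dir}"
            return 0
        fi
    done
    return 1
}

# Demo in a throwaway directory whose name contains a space.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/my tools"
printf '#!/bin/sh\nexit 0\n' >"${demo_root}/my tools/htsserver-demo"
chmod +x "${demo_root}/my tools/htsserver-demo"

search=("${demo_root}/missing" "${demo_root}/my tools")
find_exe_dir htsserver-demo "${search[@]}"
rm -rf "${demo_root}"
```

With an unquoted `for dir in ${search}` over a space-joined string, the second entry would be split into two words and the executable would not be found.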
@@ -81,7 +86,7 @@ test -n "$BROWSEREXE" || ! log "Could not find any suitable browser" || exit 1
 # "browse" command
 if test "$1" = "browse"; then
    if test -f "${HOME}/.httrack.ini"; then
-INDEXF=`cat ${HOME}/.httrack.ini | tr '\r' '\n' | grep -E "^path=" | cut -f2- -d'='`
+        INDEXF=$(tr '\r' '\n' <"${HOME}/.httrack.ini" | grep -E "^path=" | cut -f2- -d'=')
        if test -n "${INDEXF}" -a -d "${INDEXF}" -a -f "${INDEXF}/index.html"; then
            INDEXF="${INDEXF}/index.html"
        else
@@ -96,39 +101,43 @@ exit $?
 fi
 # Create a temporary filename
-TMPSRVFILE="$(mktemp ${TMPDIR:-/tmp}/.webhttrack.XXXXXXXX)" || ! log "Could not create the temporary file ${TMPSRVFILE}" || exit 1
+TMPSRVFILE="$(mktemp "${TMPDIR:-/tmp}/.webhttrack.XXXXXXXX")" || ! log "Could not create the temporary file ${TMPSRVFILE}" || exit 1
 # Launch htsserver binary and setup the server
-(${BINPATH}/htsserver "${DISTPATH}/" --ppid "$$" path "${HOME}/websites" lang "${LANGN}" $@; echo SRVURL=error) > ${TMPSRVFILE}&
+(
    "${BINPATH}/htsserver" "${DISTPATH}/" --ppid "$$" path "${HOME}/websites" lang "${LANGN}" "$@"
    echo SRVURL=error
 ) >"${TMPSRVFILE}" &
 # Find the generated SRVURL
 SRVURL=
 MAXCOUNT=60
 while ! test -n "$SRVURL"; do
-MAXCOUNT=$[$MAXCOUNT - 1]
+    MAXCOUNT=$((MAXCOUNT - 1))
    test $MAXCOUNT -gt 0 || exit 1
    test $MAXCOUNT -lt 50 && echo "waiting for server to reply.."
-SRVURL=`grep -E URL= ${TMPSRVFILE} | cut -f2- -d=`
+    SRVURL=$(grep -E URL= "${TMPSRVFILE}" | cut -f2- -d=)
    test ! "$SRVURL" = "error" || ! log "Could not spawn htsserver" || exit 1
    test -n "$SRVURL" || sleep 1
 done
 # Cleanup function
 # shellcheck disable=SC2120 # $1 is an optional "signal caught" marker; bare calls are intentional
 function cleanup {
    test -n "$1" && log "Nasty signal caught, cleaning up.."
    # Do not kill if browser exited (chrome bug issue) ; server will die itself
-test -n "$1" && test -f ${TMPSRVFILE} && SRVPID=`grep -E PID= ${TMPSRVFILE} | cut -f2- -d=`
+    test -n "$1" && test -f "${TMPSRVFILE}" && SRVPID=$(grep -E PID= "${TMPSRVFILE}" | cut -f2- -d=)
-test -n "${SRVPID}" && kill -9 ${SRVPID}
+    test -n "${SRVPID}" && kill -9 "${SRVPID}"
-test -f ${TMPSRVFILE} && rm ${TMPSRVFILE}
+    test -f "${TMPSRVFILE}" && rm "${TMPSRVFILE}"
    test -n "$1" && log "..Done"
    return 0
 }
 # Cleanup in case of emergency
-trap "cleanup now; exit" 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25
+trap "cleanup now; exit" HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
 # Got SRVURL, launch browser
 launch_browser "${BROWSEREXE}" "${SRVURL}"
 # That's all, folks!
-trap "" 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25
+trap "" HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
 cleanup
 exit 0
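One behavioral effect of the quoting changes in this file is how extra command-line arguments reach htsserver: the old `$@` was unquoted, the new `"$@"` is not. A minimal sketch of the difference, assuming bash; the three function names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments it received.
count_args() {
    echo "$#"
}

forward_unquoted() {
    # shellcheck disable=SC2068 # deliberate: shows the word-splitting problem
    count_args $@
}

forward_quoted() {
    count_args "$@"
}

forward_unquoted path "/home/me/my sites" # prints 3: the path was split in two
forward_quoted path "/home/me/my sites"   # prints 2: arguments arrive intact
```

The `log` function goes the other way and uses `"$*"`, which is appropriate there because the arguments are meant to be joined into one message string.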
--- a/tests/01_engine-parse.test
+++ b/tests/01_engine-parse.test
@@ -154,4 +154,173 @@ grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
 grep -q 'title="file://' "$saved2" ||
    ! echo "FAIL: a no-detect attribute (title) was wrongly rewritten" || exit 1
 # xmlns / xmlns:prefix decls must not be crawled (#191). Local file:// targets so a
 # regression downloads them; each is the LAST attr (heuristic only scans a value before '>').
 site3="$tmp/xmlns"
 mkdir -p "$site3"
 for f in ns og rdfs real; do gif "$site3/$f.gif"; done
 cat >"$site3/index.html" <<EOF
 <html xmlns="file://$site3/ns.gif"><body>
 <svg xmlns:og="file://$site3/og.gif"></svg>
 <div class="c" xmlns:rdfs="file://$site3/rdfs.gif"></div>
 <a href="file://$site3/real.gif">real link</a>
 </body></html>
 EOF
 out3="$tmp/xmlns-out"
 crawl "$site3/index.html" "$out3"
 # the real link is still captured
 found "real.gif" "$out3"
 # namespace-declaration targets must not be fetched (default + prefixed forms)
 notfound "ns.gif" "$out3"
 notfound "og.gif" "$out3"
 notfound "rdfs.gif" "$out3"
 # CSS @import (#94): every form's target is captured, crawling the .css directly.
 # The "cond"/"sup"/"spc" cases carry a trailing media/supports/layer condition (or
 # a space before ';'); they are the negative controls: without the parser fix the
 # URL is dropped, so a regression fails these found() checks.
 site4="$tmp/cssimport"
 mkdir -p "$site4"
 for f in nq dqu squ dqs sqs med cond sup lay spc; do printf 'body{}\n' >"$site4/$f.css"; done
 cat >"$site4/main.css" <<'EOF'
@import url(nq.css);
@import url("dqu.css");
@import url('squ.css');
@import "dqs.css";
@import 'sqs.css';
@import url(med.css) screen and (min-width: 400px);
@import "cond.css" screen;
@import "sup.css" supports(display: flex);
@import url(lay.css) layer(base);
@import "spc.css" ;
 EOF
 out4="$tmp/cssimport-out"
 crawl "$site4/main.css" "$out4"
 for f in nq dqu squ dqs sqs med cond sup lay spc; do found "$f.css" "$out4"; done
 # Over-capture guard: the trailing condition is not part of the URL, so it must
 # survive the rewrite verbatim. A regression that grabs it would mangle these.
 m4=$(find "$out4" -type f -path '*/file/*' -name main.css -print -quit)
 test -n "$m4" || ! echo "FAIL: saved main.css not found" || exit 1
 for cond in '@import "cond.css" screen;' 'supports(display: flex)' 'layer(base)'; do
    grep -Fq "$cond" "$m4" ||
        ! echo "FAIL #94: '$cond' altered on rewrite (condition captured as URL?)" || exit 1
 done
 # Malformed input: an unterminated @import quote (truncated CSS) must not crash or
 # capture a bogus link; a valid sibling import is still captured. Guards a heap
 # overflow on the URL-end scan that aborts under ASan (CI sanitizer job).
 site5="$tmp/cssimport-trunc"
 mkdir -p "$site5"
 printf 'body{}\n' >"$site5/good.css"
 printf '@import "good.css";\n@import "trunc' >"$site5/main.css"
 out5="$tmp/cssimport-trunc-out"
 crawl "$site5/main.css" "$out5"
 found "good.css" "$out5"
 notfound "trunc" "$out5"
 # Offset-0 underflow (#396): a token at the buffer start makes the detector's
 # word-boundary guard read *(html-1) one byte early (aborts under ASan). The
 # url() target is still captured; here it just must not underflow.
 site6="$tmp/parse-off0"
 mkdir -p "$site6"
 printf 'body{}\n' >"$site6/off0.css"
 printf 'url(off0.css)\n' >"$site6/main.css"
 out6="$tmp/parse-off0-out"
 crawl "$site6/main.css" "$out6"
 found "off0.css" "$out6"
 # XMLHttpRequest.open(method, url) (#218): the first argument is an HTTP method,
 # not a URL. Without the fix "GET" is captured as a link and fetched (the offline
 # fixture saves a bare file named GET; a live server mangles it to GET.html).
 # window.open(url) detection must be unaffected.
 site7="$tmp/xhropen"
 mkdir -p "$site7"
 gif "$site7/winopen.gif"
 cat >"$site7/index.html" <<EOF
 <html><body><script>
 var x = new XMLHttpRequest();
 x.open("GET", "ajax_info.txt");
 var y = new XMLHttpRequest();
 y.open("Post", "submit.cgi");
 window.open("file://$site7/winopen.gif");
 </script></body></html>
 EOF
 out7="$tmp/xhropen-out"
 crawl "$site7/index.html" "$out7"
 # negative control: without the fix a file named exactly GET is downloaded
 notfound "GET" "$out7"
 # methods are matched case-insensitively (XHR spec normalizes them): a mixed-case
 # method is rejected too, so a file named Post must not appear either
 notfound "Post" "$out7"
 # regression guard: window.open(url) is still detected, so its absolute URL is
 # rewritten to a local link. The rewrite only happens if the parser saw it, so
 # these two assertions fail if .open detection broke (not a trivial --near save).
 saved7=$(savedhtml "$out7")
 test -n "$saved7" || ! echo "FAIL: saved xhr page not found" || exit 1
 grep -Fq 'window.open("winopen.gif")' "$saved7" ||
    ! echo "FAIL #218: window.open(url) no longer detected/rewritten" || exit 1
 ! grep -Fq 'window.open("file://' "$saved7" ||
    ! echo "FAIL #218: window.open URL left absolute (not rewritten)" || exit 1
 # Parens in an unquoted url(...) (#163): the source %28/%29 decode to literal
 # '(' ')' in the saved name, but a literal ')' in the rewritten url() closes the
 # token early, so they must stay encoded. Negative control: without the fix the
 # %281%29 greps fail (parens are RFC2396 "mark" chars the escaper leaves alone).
 site8="$tmp/cssparens"
 mkdir -p "$site8"
 for f in 'img (1).gif' 'a(b)c(1).gif' 'q (4).gif'; do gif "$site8/$f"; done
 cat >"$site8/style.css" <<'EOF'
 .a { background: url(img%20%281%29.gif); }
 .b { background: url(a%28b%29c%281%29.gif); }
 .c { background: url("q%20%284%29.gif"); }
 EOF
 out8="$tmp/cssparens-out"
 crawl "$site8/style.css" "$out8"
 found "img (1).gif" "$out8"
 found "a(b)c(1).gif" "$out8"
 found "q (4).gif" "$out8"
 css8=$(find "$out8" -type f -path '*/file/*' -name style.css -print -quit)
 test -n "$css8" || ! echo "FAIL: saved style.css not found" || exit 1
 grep -Fq 'url(img%20%281%29.gif)' "$css8" ||
    ! echo "FAIL #163: parens in unquoted url() not percent-encoded on rewrite" || exit 1
 grep -Fq 'url(a%28b%29c%281%29.gif)' "$css8" ||
    ! echo "FAIL #163: not every paren in a url() was percent-encoded" || exit 1
 grep -Fq 'url("q%20%284%29.gif")' "$css8" ||
    ! echo "FAIL #163: quoted url() altered or parens left literal on rewrite" || exit 1
 # The url() detector is not CSS-specific: <script> and inline style= get the
 # same encoding, but ordinary href/src (ending_p is the quote, not ')') keep
 # literal parens -- the attribute checks guard the gate against over-firing.
 site9="$tmp/urlparens"
 mkdir -p "$site9"
 for f in 'js (1).gif' 'inl (2).gif' 'asrc (3).gif' 'ahref (4).gif'; do gif "$site9/$f"; done
 cat >"$site9/index.html" <<EOF
 <html><body>
 <script>var bg = "url(js%20%281%29.gif)";</script>
 <div style="background-image:url(inl%20%282%29.gif)"></div>
 <img src="asrc%20%283%29.gif">
 <a href="ahref%20%284%29.gif">link</a>
 </body></html>
 EOF
 out9="$tmp/urlparens-out"
 crawl "$site9/index.html" "$out9"
 saved9=$(savedhtml "$out9")
 test -n "$saved9" || ! echo "FAIL: saved urlparens page not found" || exit 1
 # rewrite-only: the JS-string asset is not queued for download
 grep -Fq 'url(js%20%281%29.gif)' "$saved9" ||
    ! echo "FAIL #163: parens in <script> url() not percent-encoded" || exit 1
 found "inl (2).gif" "$out9"
 grep -Fq 'url(inl%20%282%29.gif)' "$saved9" ||
    ! echo "FAIL #163: parens in inline style url() not percent-encoded" || exit 1
 found "asrc (3).gif" "$out9"
 found "ahref (4).gif" "$out9"
 grep -Fq 'src="asrc%20(3).gif"' "$saved9" ||
    ! echo "FAIL #163: parens in a plain src attribute were wrongly encoded" || exit 1
 grep -Fq 'href="ahref%20(4).gif"' "$saved9" ||
    ! echo "FAIL #163: parens in a plain href attribute were wrongly encoded" || exit 1
 ! grep -Eq '(src|href)="[^"]*%28' "$saved9" ||
    ! echo "FAIL #163: gate over-fired onto a non-url() attribute link" || exit 1
 exit 0
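
Reviewer sketch for the #163 checks above: the commit message says the re-escape is gated on the url() context (`ending_p == ')'`), so plain `src`/`href` values keep literal parens. The bash below only illustrates that gate. `encode_url_parens` and its second argument are hypothetical names, not HTTrack's C code, and the sketch assumes the caller already knows which character closes the link.

```shell
#!/bin/bash
# Hypothetical helper (illustration only, not HTTrack's implementation):
# re-escape literal parens only when the link is written back inside a
# token that is closed by ')', i.e. an unquoted url(...).
encode_url_parens() {
    local link=$1 closer=$2
    if [[ $closer == ")" ]]; then
        link=${link//\(/%28} # every '(' becomes %28
        link=${link//\)/%29} # every ')' becomes %29
    fi
    printf '%s\n' "$link"
}

encode_url_parens 'img%20(1).gif' ')'  # url() context: parens re-encoded
encode_url_parens 'asrc%20(3).gif' '"' # quoted attribute: left untouched
```

The gate is what the last three greps in the test guard: an encoder without it would also rewrite ordinary attribute links.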
--- a/tests/01_engine-relative.test
+++ b/tests/01_engine-relative.test
@@ -0,0 +1,68 @@
 #!/bin/bash
 #
 # lienrelatif (build relative path) + ident_url_relatif (resolve a link, collapse
 # ./ and ../). Regression net for #137/#162; expected values hand-computed.
 set -euo pipefail
 # relative path from <curr>'s directory to <link>
 rel() {
    local got
    got=$(httrack -O /dev/null -#l "$1" "$2")
    test "$got" == "relative=$3" ||
        {
            echo "FAIL rel($1, $2): got '$got' want 'relative=$3'"
            exit 1
        }
 }
 # resolve <link> against origin <adr>/<fil> -> adr=.. fil=..
 ident() {
    local got
    got=$(httrack -O /dev/null -#i "$1" "$2" "$3")
    test "$got" == "$4" ||
        {
            echo "FAIL ident($1, $2, $3): got '$got' want '$4'"
            exit 1
        }
 }
 ### lienrelatif
 rel 'dir/page.html' 'dir/index.html' 'page.html'
 rel 'dir/page.html' 'dir/page.html' 'page.html' # self-link
 rel 'a.html' 'dir/index.html' '../a.html'
 rel 'x.html' 'a/b/c/index.html' '../../../x.html'
 rel 'h/a/x.jpg' 'h/a/sub/page.html' '../x.jpg'
 rel 'a/b/c/x.html' 'index.html' 'a/b/c/x.html'
 rel 'h/sub/x.jpg' 'h/page.html' 'sub/x.jpg'
 rel 'h/dir2/x.jpg' 'h/dir1/page.html' '../dir2/x.jpg' # sibling dir
 rel 'h/bc/x.jpg' 'h/b/page.html' '../bc/x.jpg'        # b/bc prefix trap
 rel 'h/b/x.jpg' 'h/bc/page.html' '../b/x.jpg'
 rel 'h2/img/x.jpg' 'h1/p/page.html' '../../h2/img/x.jpg' # cross-host
 rel 'img.cdn/photo.jpg' 'www.site/articles/2020/post.html' '../../../img.cdn/photo.jpg'
 rel 'h/a/' 'h/a/sub/page.html' '../' # link is ancestor dir
 rel 'x.html' 'page.html' 'x.html'
 rel 'dir/page.html?x=1' 'dir/index.html?y=2' 'page.html' # ? stripped
 ### ident_url_relatif
 ident 'img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/dir/img.gif'
 ident 'sub/img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/dir/sub/img.gif'
 ident '/img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/img.gif'
 # embedded ../ collapses (#137)
 ident '../img.gif' 'www.foo.com' '/dir/sub/page.html' 'adr=www.foo.com fil=/dir/img.gif'
 ident 'sub/../logo.png' 'www.foo.com' '/articles/2020/post.html' 'adr=www.foo.com fil=/articles/2020/logo.png'
 ident '../../pix/sub/../logo.png' 'www.foo.com' '/articles/2020/post.html' 'adr=www.foo.com fil=/pix/logo.png'
 ident '../../../../x.gif' 'www.foo.com' '/a/b/page.html' 'adr=www.foo.com fil=/x.gif' # above-root clamp
 ident '?page=2' 'www.foo.com' '/dir/index.html?old=1' 'adr=www.foo.com fil=/dir/index.html?page=2'
 ident 'http://other.com/a/b/../c/index.html' 'www.foo.com' '/p.html' 'adr=other.com fil=/a/c/index.html'
 # file:// collapses ../ like the other schemes; traversal contained, // authority kept
 ident 'file:///var/data/pix/sub/../logo.png' 'www.foo.com' '/p.html' 'adr=file:// fil=/var/data/pix/logo.png'
 ident 'file:///a/b/c/../../d/e.gif' 'www.foo.com' '/p.html' 'adr=file:// fil=/a/d/e.gif'
 ident 'file:///a/../../b' 'www.foo.com' '/p.html' 'adr=file:// fil=/b'
 ident 'file://srv/share/../x' 'www.foo.com' '/p.html' 'adr=file:// fil=//srv/x'
 ident 'mailto:foo@bar.com' 'www.foo.com' '/p.html' 'error=-1' # unsupported scheme
 ident 'javascript:void(0)' 'www.foo.com' '/p.html' 'error=-1'
 echo "OK"
--- a/tests/01_engine-simplify.test
+++ b/tests/01_engine-simplify.test
@@ -26,3 +26,17 @@ simp './a/../../b' 'b'
 # empty segments ('//') are not dot-segments and are preserved, per RFC 3986
 simp 'a//b' 'a//b'
 simp 'a//b/../c' 'a//c'
 # absolute paths keep the leading '/'; above-root '..' is clamped to it
 simp '/a/../b' '/b'
 simp '/a/../../b' '/b'
 simp '/../x' '/x'
 # collapses to nothing -> './' (relative) or '/' (absolute)
 simp '..' './'
 simp 'a/..' './'
 simp '/' '/'
 simp 'a/b/..' 'a/'              # trailing bare '..'
 simp 'a/../b?x=../y' 'b?x=../y' # '?' freezes simplification
--- a/tests/01_engine-strsafe.test
+++ b/tests/01_engine-strsafe.test
@@ -21,9 +21,15 @@ test "$out" == "strsafe: OK" || exit 1
 # the bounded macro aborts (non-zero exit), so don't let set -e trip on it
 err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1) || true
 case "$err" in
-	*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;;
+*"strsafe: NOT aborted"*)
    echo "over-capacity write was NOT caught" >&2
    exit 1
    ;;
 *"overflow while copying"*) ;;
-	*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;;
+*)
    echo "expected htssafe overflow abort, got: $err" >&2
    exit 1
    ;;
 esac
 # Same guarantee for the htsbuff builder. The source is exactly the buffer
@@ -32,7 +38,13 @@ esac
 # aborted"). Match the specific htsbuff abort message, not just any assert.
 err=$(httrack -#8 overflow-buff "abcd" 2>&1) || true
 case "$err" in
-	*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;;
+*"strsafe: NOT aborted"*)
    echo "htsbuff over-capacity write was NOT caught" >&2
    exit 1
    ;;
 *"htsbuff append overflow"*) ;;
-	*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;;
+*)
    echo "expected htsbuff overflow abort, got: $err" >&2
    exit 1
    ;;
 esac
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -35,6 +35,7 @@ TESTS = \
 	01_engine-mime.test \
 	01_engine-parse.test \
 	01_engine-rcfile.test \
 	01_engine-relative.test \
 	01_engine-simplify.test \
 	01_engine-strsafe.test \
 	02_manpage-regen.test \
--- a/tests/crawl-test.sh
+++ b/tests/crawl-test.sh
@@ -18,7 +18,7 @@ function debug {
 }
 function info {
-  printf "[$*] ..\t" >&2
+    printf '[%s] ..\t' "$*" >&2
 }
 function result {
@@ -66,31 +66,30 @@ function start-crawl {
        --debug)
            verbose=1
            ;;
-    --no-purge|--summary|--print-files)
+        --no-purge | --summary | --print-files) ;;
-      ;;
        --errors | --files | --found | --not-found | --directory)
-      pos=$[${pos}+1]
+            pos=$((pos + 1))
            test "$#" -ge "$pos" || warning "missing argument" || return 1
            ;;
        httrack)
-      pos=$[${pos}+1]
+            pos=$((pos + 1))
-      break;
+            break
            ;;
        *)
            warning "unrecognized option ${!pos}"
            return 1
            ;;
        esac
-    pos=$[${pos}+1]
+        pos=$((pos + 1))
    done
-  debug "remaining args: ${@:${pos}}"
+    debug "remaining args: ${*:pos}"
    # ut/ won't exceed 2 minutes
-  moreargs="--quiet --max-time=120 --timeout=30 --connection-per-second=5"
+    moreargs=(--quiet --max-time=120 --timeout=30 --connection-per-second=5)
    # proxy environment ?
-  if test -n "$http_proxy"; then
+    if test -n "${http_proxy:-}"; then
-    moreargs="$moreargs --proxy $http_proxy"
+        moreargs+=(--proxy "$http_proxy")
    fi
    test -n "$tmpdir" || ! warning "no tmpdir" || return 1
@@ -104,9 +103,9 @@ function start-crawl {
    # start crawl
    log="${tmp}/log"
-  debug starting httrack -O "${tmp}" ${moreargs} ${@:${pos}}
+    debug starting httrack -O "${tmp}" "${moreargs[@]}" "${@:pos}"
-  info "running httrack ${@:${pos}}"
+    info "running httrack ${*:pos}"
-  httrack -O "${tmp}" --user-agent="httrack $ver ut ($(uname -omrs))" ${moreargs} ${@:${pos}} >"${log}" 2>&1 &
+    httrack -O "${tmp}" --user-agent="httrack $ver ut ($(uname -omrs))" "${moreargs[@]}" "${@:pos}" >"${log}" 2>&1 &
    crawlpid="$!"
    debug "started cralwer on pid $crawlpid"
    wait "$crawlpid"
@@ -164,12 +163,12 @@ function start-crawl {
            ;;
        --files)
            shift
-      nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt" \
+            nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt" |
-        | sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g')
+                sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g')
            assert_equals "checking files" "$1" "$nFiles"
            ;;
        httrack)
-      break;
+            break
            ;;
        esac
        shift
@@ -195,7 +194,7 @@ tmpdir=
 crawlpid=
 nopurge=
 verbose=
-trap "cleanup" 0 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25
+trap cleanup EXIT HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
 # working directory
 tmpdir="${tmptopdir}/httrack_ut.$$"
--- a/tests/run-all-tests.sh
+++ b/tests/run-all-tests.sh
@@ -3,11 +3,11 @@
 error=0
 for i in *.test; do
-	if bash $i ; then
+    if bash "$i"; then
        echo "$i: passed" >&2
    else
        echo "$i: ERROR" >&2
-		error=$[${error}+1]
+        error=$((error + 1))
    fi
 done
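
Reviewer sketch for the crawl-test.sh hunks above: the change from a word-split `moreargs` string to a bash array can be illustrated in isolation. Everything here is an assumption made for the example: `count_args` stands in for the real httrack invocation, and the `proxy` and `target` values are invented.

```shell
#!/bin/bash
set -u

# Stand-in for the real command: reports how many arguments it received.
count_args() { echo "$#"; }

proxy='http://proxy.example:3128' # hypothetical proxy URL
target='my site/index.html'       # hypothetical argument containing a space

# One array element per argument, so nothing relies on word splitting.
moreargs=(--quiet --max-time=120)
if test -n "${proxy:-}"; then
    moreargs+=(--proxy "$proxy")
fi

# Quoted expansion keeps each element, and the spaced path, intact.
count_args "${moreargs[@]}" "$target"
```

With the quoted array expansion the path containing a space arrives as a single argument, which is the property the commit message calls space-safe.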
Author	SHA1	Message	Date
Xavier Roche	05306ee4fd	Curate the 3.49-8 release notes Round out the 3.49-8 entry in history.txt and the debian changelog with the user-facing work landed since 3.49-7: the HTTPS-proxy CONNECT tunnel, wider srcset parsing, the crawler and parser fixes (CSS @import, xmlns, relative paths, RFC 6265 cookies, doit.log reload), the parser and engine buffer-copy security hardening, and brief summary lines for the API, build, CI and test work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 13:02:51 +02:00
Xavier Roche	1d0fc0a566	Merge pull request #403 from xroche/chore/clang-format-separate-defs Separate definition blocks in the public headers	2026-06-20 12:56:23 +02:00
Xavier Roche	a4452592b4	Separate definition blocks and canonicalize the public headers Set SeparateDefinitionBlocks: Always in .clang-format so clang-format keeps a blank line between adjacent definitions, then reformat the installed (DevIncludes) headers in full. Several of them packed struct/typedef/macro definitions with no separation and carried non-canonical spacing (char*, __attribute__ ((x)), padded inner parens), which made them hard to read; this brings them to the repo's clang-format-19 canonical form and inserts the separating blank lines. Headers only, no semantic change: out-of-tree build is clean and make check passes (21 pass, 7 network skip, 0 fail). htsconfig.h is UTF-8 and its French comments survive byte-for-byte (clang-format only reflowed them to 80 columns). The new option also governs future touched-line formatting of the engine sources. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 12:52:19 +02:00
Xavier Roche	62c2364b59	Merge pull request #402 from xroche/chore/lint-all-shell-scripts Lint every shell script with shfmt and shellcheck	2026-06-20 12:42:19 +02:00
Xavier Roche	fe7041ddbf	Address review: keep empty-PATH parity, fold the CI script list Review of the array refactor flagged one behaviour divergence: splitting PATH with `IFS=: read -ra` keeps empty fields (from doubled or leading colons) as "" elements, where the old `echo $PATH | tr : ' '` word-split dropped them, so the search loop would probe /htsserver. Skip the empty fields to restore exact parity. Also reflow the CI SHELL_SCRIPTS list as a folded block scalar, one entry per line and sorted, so it reads cleanly; the folded value is the same space-separated string. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 12:39:31 +02:00
Xavier Roche	f5543df1af	ci: lint every shell script with shellcheck and shfmt The lint job only covered a handful of scripts; bootstrap, build.sh, the generators, webhttrack, the CGI search helper and the crawl/run-all test harnesses went unchecked, and shfmt ran on three files. Now both linters run over the whole tracked shell tree, listed once in a job-level env var so the two steps stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:37:09 +02:00
Xavier Roche	fee30aa95d	Make every shell script shellcheck-clean Fix the shellcheck findings the shfmt pass left behind, all proven behaviour-preserving: - Quote single-value expansions, drop the redundant ${} in arithmetic, add read -r, and use printf '%s' instead of variables in format strings, across the generators, crawl-test.sh, run-all-tests.sh and search.sh. - crawl-test.sh / webhttrack: turn the deliberately word-split search lists into bash arrays (space-safe, no scattered disables) and replace the numeric trap signal lists with names, dropping the un-trappable KILL/STOP that bash silently ignored anyway. - search.sh: drop the bogus \" escapes that made grep search for a literal-quoted pattern. The generators are exercised by hand and ship their committed output (htscodepages.h, htsentities.h); a differential run on synthetic input confirms byte-identical output before and after. crawl-test.sh and webhttrack were run end to end against a local server / a faked install, the latter also proving the array search now survives spaces in paths. SC2153/SC2120 false positives carry a scoped disable with a reason. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:35:55 +02:00
Xavier Roche	f9f4700ee1	Reformat every shell script with shfmt -i 4 Mechanical pass: run shfmt -i 4 over the whole tracked shell tree (the test harness .test files, the regen generators, webhttrack, the CGI search helper, and the build/dist scripts) so they share one style. shfmt also normalised backticks to $(...) and $[..] to $((..)). No behaviour change: arithmetic is preserved exactly, non-ASCII bytes are untouched, and the full make check suite still passes. The tab indented .test files become 4-space indented, hence the wide diff. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:24:01 +02:00
Xavier Roche	f030fa21e3	Merge pull request #401 from xroche/fix/relative-path-dotdot-137-162 Test the relative-link engine; collapse ../ in file:// URLs	2026-06-20 11:15:53 +02:00
Xavier Roche	bdd1c1bc2c	Test the relative-link engine; collapse ../ in file:// URLs The ../-handling tickets #137 (embedded ../ in a URL) and #162 (cross-host "too many ../") do not reproduce on master or the released 3.49.x: the engine has resolved embedded, cross-host, out-of-scope and above-root ../ correctly since the 2012 import, and the released binary behaves identically. #137's actual breakage was a JS-generated iframe URL (httrack can't rewrite dynamically-built links); #162 is a long-gone Windows path quirk. The area was nearly untested, though, despite feeding both link rewriting and crawl-scope decisions: two trivial lienrelatif asserts, none for ident_url_relatif. Add a wide regression net via two hidden debug probes (-#l lienrelatif, -#i ident_url_relatif, mirroring -#1 fil_simplifie) driving tens of cases in tests/01_engine-relative.test (embedded/cross-host/sibling/ ancestor/above-root ../, query stripping, scheme handling), plus the missing fil_simplifie edge cases (absolute paths, root clamp, query freeze) in 01_engine-simplify.test. Expected values are computed by hand, not echoed. While covering it, fixed one real gap: the file:// branch of ident_url_absolute skipped the fil_simplifie its http sibling runs, so file:// URLs kept their ../ in adrfil->fil while the save path was already collapsed (htsname.c:1343). Collapsing it matches the other schemes, contains traversal at the file:// root, and dedups a/../b against b. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:14:28 +02:00
Xavier Roche	56665a268f	Merge pull request #400 from xroche/fix/css-url-paren-163 Encode parens in rewritten CSS url() so the value isn't truncated (#163)	2026-06-20 10:02:32 +02:00
Xavier Roche	2e948b9acd	htsparse: percent-encode parens in rewritten CSS url() (#163) A source url(...) whose target encodes '(' ')' as %28/%29 was rewritten with literal parens, because they are RFC2396 "mark" characters that the URI escaper (escape_uri_utf, mode 30) leaves alone. In an unquoted CSS url(...) the literal ')' closes the token early, so the browser mis-parses the value and drops the background image. Re-escape '(' and ')' back to %28/%29 when emitting the link, gated on the url() context (ending_p == ')'). The UA decodes them to the saved-on-disk name, so the reference still resolves. Quoted url("...") and ordinary HTML attributes keep their parens, matching prior behavior. Test in 01_engine-parse.test crawls a CSS fixture whose url() references a %20%28...%29 name and asserts the rewrite keeps the parens encoded; negative control confirmed (literal-paren output fails it). Closes #163 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 10:01:17 +02:00
Xavier Roche	cae11499f1	Merge pull request #399 from xroche/fix/js-string-falsepos-218 htsparse: don't treat XHR.open's method argument as a URL (#218)	2026-06-19 20:36:26 +02:00
Xavier Roche	02c7f4ebf6	htsparse: don't treat XHR.open's method argument as a URL (#218) The JavaScript URL detector matched `.open(` for window.open("url",...) and captured the first argument as a link. XMLHttpRequest.open(method, url) puts the HTTP method first, so `xhr.open("GET", "ajax_info.txt")` turned "GET" into a bogus link, rewritten to "GET.html" on a live server. Reject a first argument that is exactly an HTTP method, mirroring the existing ensure_not_mime guard. window.open(url) is unaffected; the real XHR url (the second argument) is still picked up by the dirty parser. Closes #218 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 20:27:04 +02:00
Xavier Roche	9070b44a70	Merge pull request #398 from xroche/fix/html-underflow-396 htsparse: fix buffer underflow reading *(html-1) at offset 0 (#396)	2026-06-19 19:55:40 +02:00
Xavier Roche	799c045061	htsparse: don't read (html-1) before the parse buffer (#396) The link detector's word-boundary guards dereference (html-1) to check the byte preceding a matched token. When the token sits at the very start of the parse buffer (html == r->adr), that reads one byte before the allocation: a heap-buffer-overflow under ASan, silent on a normal build. A stylesheet beginning with a url() token is enough to hit it. Route the three reachable guards (url(), location=, the makeindex /title check) through html_prevc(), which returns a space sentinel at the buffer start. Space is the right value for these tests: a token at offset 0 is at a word boundary, so it stays a valid match. The other *(html-1) sites only run after html has advanced past an opening tag or quote. Covers it with an offset-0 url() fixture in 01_engine-parse.test; without the fix it aborts at htsparse.c:1386 under the CI sanitizer job. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 19:44:25 +02:00
Xavier Roche	fb1ee3bf2e	Merge pull request #397 from xroche/fix/css-import-94 CSS @import: capture URLs that carry a media/supports/layer condition (#94)	2026-06-19 19:30:21 +02:00
Xavier Roche	6a08ca7d39	htsparse: bound the URL-end scan against a missing closing delimiter Reviewing the @import change, ASan flagged a pre-existing heap overflow: when a quoted/parenthesized link token has no closing delimiter before the buffer ends (truncated input such as `@import "x`, `@import "`, `url("x`), the scan stops at the terminating NUL, then `c += ndelim` steps past it and `while (c == ' ')` / the terminator test read out of bounds. Such input aborts under ASan on master. Skip the URL-end scan and capture when no closing delimiter was found (`c == '\0'` right after the scan); c never advances past the NUL. Well-formed tokens are unaffected. 01_engine-parse.test gains a truncated-@import fixture (the valid sibling import is still captured, the unterminated one is not) that trips the overflow under the CI ASan job, plus a check that an @import's trailing media/supports/layer condition survives the rewrite verbatim. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 19:25:39 +02:00
Xavier Roche	a8b491e509	htsparse: capture conditional CSS @import URLs (#94) A bare-string @import carrying a media/supports/layer condition, e.g. `@import "theme.css" screen;`, was dropped. The detector required the closing quote to be immediately followed by the statement terminator, so the trailing condition aborted the capture. The `url(...)` form already worked because it terminates at the paren. Two coupled defects in the inscript/CSS detector: - accept a whitespace-separated trailing condition after a quoted @import URL; - bound the captured URL at its last content char (b) instead of recomputing from the terminator. The old `c -= (ndelim + 1)` mishandled spaces skipped before the terminator, leaving the closing quote inside the range so the bogus-link guard aborted. That also silently broke `foo="url" ;` (a space before the semicolon) for every quoted detection, not only @import. 01_engine-parse.test gains a CSS @import section that crawls a .css directly; the conditioned cases are negative controls that fail without the fix. Closes #94 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 18:46:31 +02:00
Xavier Roche	a8e4bb3b81	Merge pull request #395 from xroche/fix/xmlns-false-links-191 Don't crawl xmlns namespace declarations	2026-06-19 18:28:23 +02:00
Xavier Roche	0145ec37a3	htsparse: don't crawl xmlns namespace declarations (#191) The "dirty parsing" heuristic accepts any tag attribute whose value looks like a URL unless the attribute is on the no-detect list. xmlns and xmlns:prefix declarations carry namespace URIs (xmlns:og="http://ogp.me/ns#", etc.) that are not resources, so httrack queued and fetched them, stalling the crawl on unrelated spec URLs. Reject xmlns/xmlns:prefix where the no-detect list is already consulted. 01_engine-parse.test grows a fixture with each form (default and prefixed) as the last attribute of its element, since the heuristic only inspects an attribute whose value is immediately followed by '>'; the targets are local file:// gifs so a regression actually downloads them (verified: reverting the guard fetches all three). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 18:24:55 +02:00
Xavier Roche	a80fab38ba	Merge pull request #394 from xroche/fix/proxy-https-connect-85 Tunnel https through the proxy via CONNECT (#85)	2026-06-19 18:03:31 +02:00