Address review: keep empty-PATH parity, fold the CI script list

Review of the array refactor flagged one behaviour divergence: splitting PATH with `IFS=: read -ra` keeps empty fields (from doubled or leading colons) as "" elements, where the old `echo $PATH | tr : ' '` word-split dropped them, so the search loop would probe /htsserver. Skip the empty fields to restore exact parity.

Also reflow the CI SHELL_SCRIPTS list as a folded block scalar, one entry per line and sorted, so it reads cleanly; the folded value is the same space-separated string.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
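
To illustrate the empty-field divergence, here is a minimal bash sketch. It is not the code from this change: the function name `split_path_skip_empty` and the sample PATH value are hypothetical, chosen only to show how `IFS=: read -ra` keeps empty fields and how skipping them matches the old word-split result.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from the patched script): split a PATH-like
# string on ':' and print one non-empty directory per line.
split_path_skip_empty() {
    local -a fields
    local dir
    # With IFS=':' a leading or doubled colon yields an empty "" element.
    IFS=: read -ra fields <<<"$1"
    for dir in "${fields[@]}"; do
        # Skip empty fields so a probe such as "$dir/htsserver" never
        # degenerates into "/htsserver".
        [ -n "$dir" ] || continue
        printf '%s\n' "$dir"
    done
}

# Sample input with a leading colon and a doubled colon.
split_path_skip_empty ":/usr/bin::/bin"
```

Without the `[ -n "$dir" ]` guard the loop would visit four elements for this input, two of them empty; with it, only `/usr/bin` and `/bin` are visited, which is what the unquoted `tr : ' '` word-split produced.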
ci: lint every shell script with shellcheck and shfmt
2026-06-20 17:18:14 +03:00 · 2026-06-20 12:39:31 +02:00 · 2026-06-20 11:37:09 +02:00 · 2026-06-20 11:35:55 +02:00 · 2026-06-20 11:24:01 +02:00 · 2026-06-20 11:15:53 +02:00
37 changed files with 1651 additions and 622 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -320,6 +320,21 @@ jobs:
  lint:
    name: lint (shellcheck, shfmt)
    runs-on: ubuntu-24.04
+    # Every tracked shell script; the globs expand at run time. Kept here so the
+    # shellcheck and shfmt steps below cannot drift apart.
+    env:
+      SHELL_SCRIPTS: >-
+        .githooks/pre-commit
+        bootstrap
+        build.sh
+        html/div/search.sh
+        man/makeman.sh
+        src/htsbasiccharsets.sh
+        src/htsentities.sh
+        src/webhttrack
+        tests/*.sh
+        tests/*.test
+        tools/mkdeb.sh
    steps:
      - uses: actions/checkout@v6

@@ -332,12 +347,11 @@ jobs:
          sudo apt-get install -y --no-install-recommends shellcheck shfmt
          shfmt --version

-      # Lint the scripts we maintain; the legacy scripts are a separate cleanup.
      - name: shellcheck
-        run: shellcheck man/makeman.sh tools/mkdeb.sh .githooks/pre-commit tests/*.test tests/check-network.sh
+        run: shellcheck $SHELL_SCRIPTS

      - name: shfmt
-        run: shfmt -d -i 4 man/makeman.sh tools/mkdeb.sh .githooks/pre-commit
+        run: shfmt -d -i 4 $SHELL_SCRIPTS

  # Check clang-format on CHANGED LINES ONLY. The engine predates clang-format
  # (it was shaped by an old Visual Studio formatter) and does not round-trip,
--- a/html/div/search.sh
+++ b/html/div/search.sh
@@ -1,8 +1,7 @@
-
 #!/bin/sh

 # Simple indexing test using HTTrack
-# A "real" script/program would use advanced search, and 
+# A "real" script/program would use advanced search, and
 # use dichotomy to find the word in the index.txt file
 # This script is really basic and NOT optimized, and
 # should not be used for professional purpose :)
@@ -11,50 +10,49 @@ TESTSITE="http://localhost/"

 # Create an index if necessary
 if ! test -f "index.txt"; then
-	echo "Building the index .."
-	rm -rf test
-	httrack --display "$TESTSITE"  -%I -O test
-	mv test/index.txt ./
+    echo "Building the index .."
+    rm -rf test
+    httrack --display "$TESTSITE" -%I -O test
+    mv test/index.txt ./
 fi

 # Convert crlf to lf
-if test "`head index.txt -n 1 | tr '\r' '#' | grep -c '#'`" = "1"; then
-	echo "Converting index to Unix LF style (not CR/LF) .."
-	mv -f index.txt index.txt.old
-	cat index.txt.old|tr -d '\r' > index.txt
+if test "$(head index.txt -n 1 | tr '\r' '#' | grep -c '#')" = "1"; then
+    echo "Converting index to Unix LF style (not CR/LF) .."
+    mv -f index.txt index.txt.old
+    tr -d '\r' <index.txt.old >index.txt
 fi

 keyword=-
 while test -n "$keyword"; do
-	printf "Enter a keyword: "
-	read keyword
+    printf "Enter a keyword: "
+    read -r keyword

-	if test -n "$keyword"; then
-		FOUNDK="`grep -niE \"^$keyword\" index.txt`"
+    if test -n "$keyword"; then
+        FOUNDK="$(grep -niE "^$keyword" index.txt)"

-		if test -n "$FOUNDK"; then	
-			if ! test `echo "$FOUNDK"|wc -l` = "1"; then
-				# Multiple matches
-				printf "Found multiple keywords: "
-				echo "$FOUNDK"|cut -f2 -d':'|tr '\n' ' '
-				echo ""
-				echo "Use keyword$ to find only one"
-			else
-				# One match
-				N=`echo "$FOUNDK"|cut -f1 -d':'`
-				PM=`tail +$N index.txt|grep -nE "\("|head -n 1`
-				if ! echo "$PM"|grep "ignored">/dev/null; then
-					M=`echo $PM|cut -f1 -d':'`
-					echo "Found in:"
-					cat index.txt | tail "+$N" | head -n "$M" | grep -E "[0-9]* " | cut -f2 -d' '
-				else
-					echo "keyword ignored (too many hits)"
-				fi
-					fi
-		else
-			echo "not found"
-		fi
+        if test -n "$FOUNDK"; then
+            if ! test "$(echo "$FOUNDK" | wc -l)" = "1"; then
+                # Multiple matches
+                printf "Found multiple keywords: "
+                echo "$FOUNDK" | cut -f2 -d':' | tr '\n' ' '
+                echo ""
+                echo "Use keyword$ to find only one"
+            else
+                # One match
+                N=$(echo "$FOUNDK" | cut -f1 -d':')
+                PM=$(tail "+$N" index.txt | grep -nE "\(" | head -n 1)
+                if ! echo "$PM" | grep "ignored" >/dev/null; then
+                    M=$(echo "$PM" | cut -f1 -d':')
+                    echo "Found in:"
+                    tail "+$N" index.txt | head -n "$M" | grep -E "[0-9]* " | cut -f2 -d' '
+                else
+                    echo "keyword ignored (too many hits)"
+                fi
+            fi
+        else
+            echo "not found"
+        fi

-	fi
+    fi
 done
-
--- a/src/htsback.c
+++ b/src/htsback.c
@@ -2532,8 +2532,26 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
 #if HTS_USEOPENSSL
          /* SSL mode */
          if (back[i].r.ssl) {
+            int tunnel_ok = 1;
+
+            // https via proxy: CONNECT-tunnel before TLS (#85)
+            if (back[i].r.req.proxy.active && back[i].r.ssl_con == NULL) {
+              const int timeout = back[i].timeout > 0 ? back[i].timeout : 30;
+
+              tunnel_ok =
+                  http_proxy_tunnel(opt, &back[i].r, back[i].url_adr, timeout);
+              if (!tunnel_ok) {
+                if (!strnotempty(back[i].r.msg))
+                  strcpybuff(back[i].r.msg, "proxy CONNECT failed");
+                deletehttp(&back[i].r);
+                back[i].r.soc = INVALID_SOCKET;
+                back[i].r.statuscode = STATUSCODE_NON_FATAL;
+                back[i].status = STATUS_READY;
+                back_set_finished(sback, i);
+              }
+            }
            // handshake not yet launched
-            if (!back[i].r.ssl_con) {
+            if (tunnel_ok && !back[i].r.ssl_con) {
              SSL_CTX_set_options(openssl_ctx, SSL_OP_ALL);
              // new session
              back[i].r.ssl_con = SSL_new(openssl_ctx);
@@ -2551,7 +2569,7 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
                back[i].r.statuscode = STATUSCODE_SSL_HANDSHAKE;
            }
            /* Error */
-            if (back[i].r.statuscode == STATUSCODE_SSL_HANDSHAKE) {
+            if (tunnel_ok && back[i].r.statuscode == STATUSCODE_SSL_HANDSHAKE) {
              strcpybuff(back[i].r.msg, "bad SSL/TLS handshake");
              deletehttp(&back[i].r);
              back[i].r.soc = INVALID_SOCKET;
@@ -3838,7 +3856,7 @@ void back_wait(struct_back * sback, httrackp * opt, cache_back * cache,
        /* funny log for commandline users */
        //if (!opt->quiet) {  
        // petite animation
-        if (opt->verbosedisplay == 1) {
+        if (opt->verbosedisplay == HTS_VERBOSE_SIMPLE) {
          if (back[i].status == STATUS_READY) {
            if (back[i].r.statuscode == HTTP_OK)
              printf("* %s%s (" LLintP " bytes) - OK" VT_CLREOL "\r",
--- a/src/htsbasiccharsets.sh
+++ b/src/htsbasiccharsets.sh
@@ -3,57 +3,59 @@

 # Change this to download files
 if false; then
-echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-*.TXT" | lftp
-echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP*.TXT" | lftp
-echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP*.TXT" | lftp
-echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP*.TXT" | lftp
-echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/CP*.TXT" | lftp
-echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8*.TXT" | lftp
-rm -f CP932.TXT CP936.TXT CP949.TXT CP950.TXT
+    echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-*.TXT" | lftp
+    echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP*.TXT" | lftp
+    echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP*.TXT" | lftp
+    echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP*.TXT" | lftp
+    echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/CP*.TXT" | lftp
+    echo "mget ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8*.TXT" | lftp
+    rm -f CP932.TXT CP936.TXT CP949.TXT CP950.TXT
 fi

 # Produce code
-printf "/** GENERATED FILE ($0), DO NOT EDIT **/\n\n"
-for i in *.TXT ; do
-  echo "processing $i" >&2
-  grep -vE "^(#|$)" $i | grep -E "^0x" | sed -e 's/[[:space:]]/ /g' | cut -f1,2 -d' ' | \
-  (
-    unset arr
-    while read LINE ; do
-      from=$[$(echo $LINE | cut -f1 -d' ')]
-      if ! test -n "$from"; then
-        echo "error with $i" >&2
-        exit 1
-      elif test $from -ge 256; then
-        echo "out-of-range ($LINE) with $i" >&2
-        exit 1
-      fi
-      to=$(echo $LINE | cut -f2 -d' ') 
-      arr[$from]=$to
-    done
-    name=$(echo $i | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
-    printf "/* Table for $i */\nstatic const hts_UCS4 table_${name}[256] = {\n  "
-    i=0
-    while test "$i" -lt 256; do
-      if test "$i" -gt 0; then
-        printf ", "
-        if test $[${i}%8] -eq 0; then
-          printf "\n  "
-        fi
-      fi
-      value=${arr[$i]:-0}
-      printf "0x%04x" $value
-      i=$[${i}+1]
-    done
-    printf " };\n\n"
-  )
-  echo "processed $i" >&2
+printf '/** GENERATED FILE (%s), DO NOT EDIT **/\n\n' "$0"
+for i in *.TXT; do
+    echo "processing $i" >&2
+    grep -vE "^(#|$)" "$i" | grep -E "^0x" | sed -e 's/[[:space:]]/ /g' | cut -f1,2 -d' ' |
+        (
+            unset arr
+            while read -r LINE; do
+                from=$(($(echo "$LINE" | cut -f1 -d' ')))
+                if ! test -n "$from"; then
+                    echo "error with $i" >&2
+                    exit 1
+                elif test $from -ge 256; then
+                    echo "out-of-range ($LINE) with $i" >&2
+                    exit 1
+                fi
+                to=$(echo "$LINE" | cut -f2 -d' ')
+                arr[from]=$to
+            done
+            # shellcheck disable=SC2018,SC2019 # charset filenames are ASCII; keep C-locale A-Z/a-z
+            name=$(echo "$i" | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+            printf '/* Table for %s */\nstatic const hts_UCS4 table_%s[256] = {\n  ' "$i" "$name"
+            idx=0
+            while test "$idx" -lt 256; do
+                if test "$idx" -gt 0; then
+                    printf ", "
+                    if test $((idx % 8)) -eq 0; then
+                        printf "\n  "
+                    fi
+                fi
+                value=${arr[$idx]:-0}
+                printf "0x%04x" "$value"
+                idx=$((idx + 1))
+            done
+            printf " };\n\n"
+        )
+    echo "processed $i" >&2
 done

 # Indexes
 printf "static const struct {\n  const char *name;\n  const hts_UCS4 *table;\n} table_mappings[] = {\n"
-for i in *.TXT ; do
-  name=$(echo $i | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
-  printf "  { \"$(echo $name | tr -d '_')\", table_${name} },\n"
+for i in *.TXT; do
+    # shellcheck disable=SC2018,SC2019 # charset filenames are ASCII; keep C-locale A-Z/a-z
+    name=$(echo "$i" | tr 'A-Z' 'a-z' | tr '-' '_' | sed -e 's/\.txt//' -e 's/8859/iso_8859/')
+    printf '  { "%s", table_%s },\n' "$(echo "$name" | tr -d '_')" "$name"
 done
 printf "  { NULL, NULL }\n};\n"
--- a/src/htscore.c
+++ b/src/htscore.c
@@ -3342,7 +3342,8 @@ int back_fill(struct_back * sback, httrackp * opt, cache_back * cache,
              int ptr, int numero_passe) {
  int n = back_pluggable_sockets(sback, opt);

-  if (opt->savename_delayed == 2 && !opt->delayed_cached)       /* cancel (always delayed) */
+  if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD &&
+      !opt->delayed_cached) /* cancel (always delayed) */
    return 0;
  if (n > 0) {
    int p;
@@ -3846,7 +3847,7 @@ int htsAddLink(htsmoduleStruct * str, char *link) {
            a = opt->savename_type;
            b = opt->savename_83;
            opt->savename_type = 0;
-            opt->savename_83 = 0;
+            opt->savename_83 = HTS_SAVENAME_83_LONG;
            // note: adr,fil peuvent être patchés
            r =
              url_savename(&afs, NULL, NULL, NULL, opt, sback, cache, hashptr, ptr, numero_passe,
--- a/src/htscoremain.c
+++ b/src/htscoremain.c
@@ -612,12 +612,12 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
  /* Terminal is a tty, may ask questions and display funny information */
  if (isatty(1)) {
    opt->quiet = 0;
-    opt->verbosedisplay = 1;
+    opt->verbosedisplay = HTS_VERBOSE_SIMPLE;
  }
  /* Not a tty, no stdin input or funny output! */
  else {
    opt->quiet = 1;
-    opt->verbosedisplay = 0;
+    opt->verbosedisplay = HTS_VERBOSE_NONE;
  }
 #endif

@@ -953,9 +953,11 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
      p = buff;
      do {
        int insert_after_argc;
+        int quoted; /* "" unquotes to empty but is still a real token (#106) */

        // read next
        lastp = p;
+        quoted = (p != NULL && *p == '"');
        if (p) {
          p = next_token(p, 1);
          if (p) {
@@ -966,7 +968,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {

        /* Insert parameters BUT so that they can be in the same order */
        if (lastp) {
-          if (strnotempty(lastp)) {
+          if (strnotempty(lastp) || quoted) {
            insert_after_argc = argc - insert_after;
            cmdl_ins(lastp, insert_after_argc, (argv + insert_after), x_argvblk,
                     x_argvblk_size, x_ptr);
@@ -1815,24 +1817,22 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                com++;
            }
            break;
-          case 'L':
-            {
-              sscanf(com + 1, "%d", &opt->savename_83);
-              switch (opt->savename_83) {
-              case 0:          // 8-3 (ISO9660 L1)
-                opt->savename_83 = 1;
-                break;
-              case 1:
-                opt->savename_83 = 0;
-                break;
-              default:         // 2 == ISO9660 (ISO9660 L2)
-                opt->savename_83 = 2;
-                break;
-              }
-              while(isdigit((unsigned char) *(com + 1)))
-                com++;
+          case 'L': {
+            sscanf(com + 1, "%d", (int *) &opt->savename_83);
+            switch (opt->savename_83) {
+            case 0: // 8-3 (ISO9660 L1)
+              opt->savename_83 = HTS_SAVENAME_83_DOS;
+              break;
+            case 1:
+              opt->savename_83 = HTS_SAVENAME_83_LONG;
+              break;
+            default: // 2 == ISO9660 (ISO9660 L2)
+              opt->savename_83 = HTS_SAVENAME_83_ISO9660;
+              break;
            }
-            break;
+            while (isdigit((unsigned char) *(com + 1)))
+              com++;
+          } break;
          case 's':
            if (isdigit((unsigned char) *(com + 1))) {
              sscanf(com + 1, "%d", (int *) &opt->robots);
@@ -1989,7 +1989,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                }
                break;          // url hack
              case 'v':
-                opt->verbosedisplay = 2;
+                opt->verbosedisplay = HTS_VERBOSE_FULL;
                if (isdigit((unsigned char) *(com + 1))) {
                  sscanf(com + 1, "%d", (int *) &opt->verbosedisplay);
                  while(isdigit((unsigned char) *(com + 1)))
@@ -2004,7 +2004,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                }
                break;
              case 'N':
-                opt->savename_delayed = 2;
+                opt->savename_delayed = HTS_SAVENAME_DELAYED_HARD;
                if (isdigit((unsigned char) *(com + 1))) {
                  sscanf(com + 1, "%d", (int *) &opt->savename_delayed);
                  while(isdigit((unsigned char) *(com + 1)))
@@ -2787,6 +2787,47 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                  return 0;
                }
                break;
+              case 'l': /* lienrelatif: relative link from curr_fil to link */
+                if (na + 2 >= argc) {
+                  HTS_PANIC_PRINTF(
+                      "Option #l needs a link and a current-file path");
+                  printf(
+                      "Example: '-#l' 'host/dir/img.gif' 'host/dir/p.html'\n");
+                  htsmain_free();
+                  return -1;
+                } else {
+                  char s[HTS_URLMAXSIZE * 2];
+
+                  if (lienrelatif(s, sizeof(s), argv[na + 1], argv[na + 2]) ==
+                      0)
+                    printf("relative=%s\n", s);
+                  else
+                    printf("relative=<ERROR>\n");
+                  htsmain_free();
+                  return 0;
+                }
+                break;
+              case 'i': /* ident_url_relatif: resolve a link -> adr/fil */
+                if (na + 3 >= argc) {
+                  HTS_PANIC_PRINTF(
+                      "Option #i needs a link, an origin address and file");
+                  printf("Example: '-#i' '../img.gif' 'www.foo.com' "
+                         "'/d/p.html'\n");
+                  htsmain_free();
+                  return -1;
+                } else {
+                  lien_adrfil af;
+                  const int r = ident_url_relatif(argv[na + 1], argv[na + 2],
+                                                  argv[na + 3], &af);
+
+                  if (r == 0)
+                    printf("adr=%s fil=%s\n", af.adr, af.fil);
+                  else
+                    printf("error=%d\n", r);
+                  htsmain_free();
+                  return 0;
+                }
+                break;
              case '2':        // mimedefs
                if (na + 1 >= argc) {
                  HTS_PANIC_PRINTF("Option #2 needs to be followed by an URL");
@@ -3131,6 +3172,43 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
                htsmain_free();
                return err;
              } break;
+              case 'Q': { // cookie request-header selftest: httrack -#Q
+                static t_cookie cookie;
+                char hdr[1024];
+                /* RFC 6265: bare name=value pairs, no $Version/$Path (#151). */
+                const char *expected = "Cookie: name=value; has_js=1" H_CRLF;
+                int err = 0;
+
+                const char *dom = "www.example.com";
+                int added;
+
+                cookie.max_len = (int) sizeof(cookie.data);
+                cookie.data[0] = '\0';
+                added = cookie_add(&cookie, "name", "value", dom, "/");
+                added |= cookie_add(&cookie, "has_js", "1", dom, "/");
+                /* different domain: must be filtered out */
+                added |= cookie_add(&cookie, "junk", "x", "other.org", "/");
+                if (added) {
+                  printf("cookie-header: FAIL (cookie_add setup)\n");
+                  htsmain_free();
+                  return 1;
+                }
+
+                http_cookie_header_selftest(&cookie, dom, "/", hdr,
+                                            sizeof(hdr));
+                if (strcmp(hdr, expected) != 0)
+                  err = 1;
+                if (strstr(hdr, "$Version") != NULL ||
+                    strstr(hdr, "$Path") != NULL)
+                  err = 1;
+                if (strstr(hdr, "junk") != NULL) // wrong-domain cookie leaked
+                  err = 1;
+                printf("cookie-header: %s\n", err ? "FAIL" : "OK");
+                if (err)
+                  printf("  got: %s\n", hdr);
+                htsmain_free();
+                return err;
+              } break;
              case '!':
                HTS_PANIC_PRINTF
                  ("Option #! is disabled for security reasons");
--- a/src/htsentities.sh
+++ b/src/htsentities.sh
@@ -33,43 +33,43 @@ EOF
        else
            GET "${url}"
        fi
-    ) \
-        | grep -E '^<!ENTITY [a-zA-Z0-9_]' \
-        | sed \
-        -e 's/<!ENTITY //' -e "s/[[:space:]][[:space:]]*/ /g" \
-        -e 's/-->$//' \
-        -e 's/\([^ ]*\) CDATA "&#\([^\"]*\);" -- \(.*\)/\1 \2 \3/'\
-| ( \
-        read A
-        while test -n "$A"; do
-            ent="${A%% *}"
-            code=$(echo "$A"|cut -f2 -d' ')
-            # compute hash
-            hash=0
-            i=0
-            a=1664525
-            c=1013904223
-            m="$[1 << 32]"
-            while test "$i" -lt ${#ent}; do
-                d="$(echo -n "${ent:${i}:1}"|hexdump -v -e '/1 "%d"')"
-                hash="$[((${hash}*${a})%(${m})+${d}+${c})%(${m})]"
-                i=$[${i}+1]
-            done
-            echo -e "    /* $A */"
-            echo -e "  case ${hash}u:"
-            echo -e "    if (len == ${#ent} /* && strncmp(ent, \"${ent}\") == 0 */) {"
-            echo -e "      return ${code};"
-            echo -e "    }"
-            echo -e "    break;"
+    ) |
+        grep -E '^<!ENTITY [a-zA-Z0-9_]' |
+        sed \
+            -e 's/<!ENTITY //' -e "s/[[:space:]][[:space:]]*/ /g" \
+            -e 's/-->$//' \
+            -e 's/\([^ ]*\) CDATA "&#\([^\"]*\);" -- \(.*\)/\1 \2 \3/' |
+        (
+            read -r A
+            while test -n "$A"; do
+                ent="${A%% *}"
+                code=$(echo "$A" | cut -f2 -d' ')
+                # compute hash
+                hash=0
+                i=0
+                a=1664525
+                c=1013904223
+                m="$((1 << 32))"
+                while test "$i" -lt ${#ent}; do
+                    d="$(echo -n "${ent:${i}:1}" | hexdump -v -e '/1 "%d"')"
+                    hash="$((((hash * a) % (m) + d + c) % (m)))"
+                    i=$((i + 1))
+                done
+                echo -e "    /* $A */"
+                echo -e "  case ${hash}u:"
+                echo -e "    if (len == ${#ent} /* && strncmp(ent, \"${ent}\") == 0 */) {"
+                echo -e "      return ${code};"
+                echo -e "    }"
+                echo -e "    break;"

-            # next
-            read A
-        done
-    )
+                # next
+                read -r A
+            done
+        )
    cat <<EOF
  }
  /* unknown */
  return -1;
 }
 EOF
-) > ${dest}
+) >${dest}
--- a/src/htslib.c
+++ b/src/htslib.c
@@ -644,6 +644,165 @@ T_SOC http_fopen(httrackp * opt, const char *adr, const char *fil, htsblk * reto
  return http_xfopen(opt, 0, 1, 1, NULL, adr, fil, retour);
 }

+// Read a CRLF line from a non-blocking socket (waits up to timeout per recv).
+// Returns the line length (0 = empty), or -1 on timeout/EOF/error.
+static int proxy_getline(T_SOC soc, char *s, int max, int timeout) {
+  int j = 0;
+
+  for (;;) {
+    unsigned char ch;
+    int n;
+
+    if (!check_readinput_t(soc, timeout))
+      return -1; // timed out waiting for data
+    n = (int) recv(soc, &ch, 1, 0);
+    if (n == 1) {
+      if (ch == 13) // CR
+        continue;
+      if (ch == 10) // LF: end of line
+        break;
+      if (j >= max - 1)
+        return -1; // line too long: bound the read against a hostile proxy
+      s[j++] = (char) ch;
+    } else if (n == 0) {
+      return -1; // connection closed
+    } else {
+#ifdef _WIN32
+      if (WSAGetLastError() == WSAEWOULDBLOCK)
+        continue;
+#else
+      if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
+        continue;
+#endif
+      return -1;
+    }
+  }
+  s[j] = '\0';
+  return j;
+}
+
+int http_proxy_tunnel(httrackp *opt, htsblk *retour, const char *adr,
+                      int timeout) {
+  const T_SOC soc = retour->soc;
+  const char *const host = jump_identification_const(adr); // host[:port]
+  const char *const portsep = jump_toport_const(adr);      // ":port" or NULL
+  char BIGSTK authority[HTS_URLMAXSIZE * 2];
+  char BIGSTK req[HTS_URLMAXSIZE * 4 + 1100];
+  char line[1024];
+  int code;
+
+  if (soc == INVALID_SOCKET)
+    return 0;
+
+  // CONNECT needs an explicit host:port; default the https port
+  authority[0] = '\0';
+  if (portsep != NULL)
+    strlcatbuff(authority, host, sizeof(authority)); // already host:port
+  else
+    snprintf(authority, sizeof(authority), "%s:%d", host, 443);
+
+  // backstop: never let a stray CR/LF in the host smuggle a second line into
+  // the CONNECT request (the host is already sanitized upstream)
+  {
+    const char *c;
+
+    for (c = authority; *c != '\0'; c++) {
+      if ((unsigned char) *c < ' ') {
+        strcpybuff(retour->msg, "proxy CONNECT: invalid host");
+        return 0;
+      }
+    }
+  }
+
+  snprintf(req, sizeof(req), "CONNECT %s HTTP/1.0" H_CRLF "Host: %s" H_CRLF,
+           authority, authority);
+
+  // creds go on the CONNECT, not the tunneled origin request
+  if (link_has_authorization(retour->req.proxy.name)) {
+    const char *a = jump_identification_const(retour->req.proxy.name);
+    const char *astart = jump_protocol_const(retour->req.proxy.name);
+    char autorisation[1100];
+    char user_pass[256];
+
+    autorisation[0] = user_pass[0] = '\0';
+    strncatbuff(user_pass, astart, (int) (a - astart) - 1);
+    strcpybuff(user_pass, unescape_http(OPT_GET_BUFF(opt),
+                                        OPT_GET_BUFF_SIZE(opt), user_pass));
+    code64((unsigned char *) user_pass, (int) strlen(user_pass),
+           (unsigned char *) autorisation, 0);
+    strlcatbuff(req, "Proxy-Authorization: Basic ", sizeof(req));
+    strlcatbuff(req, autorisation, sizeof(req));
+    strlcatbuff(req, H_CRLF, sizeof(req));
+  }
+  strlcatbuff(req, H_CRLF, sizeof(req)); // end of request headers
+
+  // raw send: ssl is set, so sendc() would route to TLS
+  {
+    const char *p = req;
+    size_t remain = strlen(req);
+    int stalls = 0;
+
+    while (remain > 0) {
+      const int n = (int) send(soc, p, (int) remain, 0);
+
+      if (n > 0) {
+        p += n;
+        remain -= (size_t) n;
+        stalls = 0;
+      } else {
+#ifdef _WIN32
+        const int wouldblock = (WSAGetLastError() == WSAEWOULDBLOCK);
+#else
+        const int wouldblock =
+            (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR);
+#endif
+        // don't spin forever on a fatal error or an unwritable socket
+        if (!wouldblock || !check_writeinput_t(soc, timeout) ||
+            ++stalls > 100) {
+          strcpybuff(retour->msg, "proxy CONNECT: write error");
+          return 0;
+        }
+      }
+    }
+  }
+
+  // proxy status line: "HTTP/1.x <code> ..."
+  if (proxy_getline(soc, line, sizeof(line), timeout) < 0) {
+    strcpybuff(retour->msg, "proxy CONNECT: no response");
+    return 0;
+  }
+  if (sscanf(line, "HTTP/%*d.%*d %d", &code) < 1)
+    code = 0;
+  if (code < 200 || code >= 300) {
+    snprintf(retour->msg, sizeof(retour->msg), "proxy CONNECT refused: %s",
+             strnotempty(line) ? line : "(no status)");
+    return 0;
+  }
+
+  // drain headers to the blank line; cap the count so a flooding proxy can't
+  // stall the crawl
+  {
+    int headers = 0;
+
+    for (;;) {
+      const int n = proxy_getline(soc, line, sizeof(line), timeout);
+
+      if (n < 0) {
+        strcpybuff(retour->msg, "proxy CONNECT: truncated response");
+        return 0;
+      }
+      if (n == 0)
+        break; // blank line: tunnel ready
+      if (++headers > 64) {
+        strcpybuff(retour->msg, "proxy CONNECT: too many response headers");
+        return 0;
+      }
+    }
+  }
+
+  return 1;
+}
+
 // ouverture d'une liaison http, envoi d'une requète
 // mode: 0 GET  1 HEAD  [2 POST]
 // treat: traiter header?
@@ -680,14 +839,14 @@ T_SOC http_xfopen(httrackp * opt, int mode, int treat, int waitconnect,

  /* connexion */
  if (retour) {
-    if ((!(retour->req.proxy.active))
-        || ((strcmp(adr, "file://") == 0)
-            || (strncmp(adr, "https://", 8) == 0)
-        )
-      ) {                       /* pas de proxy, ou non utilisable ici */
+    /* no proxy, or proxy not usable here (local file) */
+    if ((!(retour->req.proxy.active)) || (strcmp(adr, "file://") == 0)) {
      soc = newhttp(opt, adr, retour, -1, waitconnect);
    } else {
-      soc = newhttp(opt, retour->req.proxy.name, retour, retour->req.proxy.port, waitconnect);  // ouvrir sur le proxy à la place
+      // to the proxy; https tunnels to the origin via CONNECT in back_wait
+      // (#85)
+      soc = newhttp(opt, retour->req.proxy.name, retour, retour->req.proxy.port,
+                    waitconnect);
    }
  } else {
    soc = newhttp(opt, adr, NULL, -1, waitconnect);
@@ -874,6 +1033,50 @@ static void print_buffer(buff_struct*const str, const char *format, ...) {
  assertf(str->pos < str->capacity);
 }

+/* Append the request "Cookie:" header line for every stored cookie matching
+   domain/path. RFC 6265 form: bare "name=value" pairs joined by "; ", no
+   $Version/$Path attributes (those are RFC 2965 syntax that modern servers
+   reject, issue #151). Returns the number of cookies emitted. */
+static int append_cookie_header(buff_struct *bstr, t_cookie *cookie,
+                                const char *domain, const char *path) {
+  char buffer[8192];
+  char *b;
+  int cook = 0;
+  int max_cookies = 8;
+
+  if (cookie == NULL)
+    return 0;
+  b = cookie->data;
+  do {
+    b = cookie_find(b, "", domain, path); // next matching cookie
+    if (b != NULL) {
+      max_cookies--;
+      if (!cook) {
+        print_buffer(bstr, "Cookie: ");
+        cook = 1;
+      } else
+        print_buffer(bstr, "; ");
+      print_buffer(bstr, "%s", cookie_get(buffer, b, 5));
+      print_buffer(bstr, "=%s", cookie_get(buffer, b, 6));
+      b = cookie_nextfield(b);
+    }
+  } while (b != NULL && max_cookies > 0);
+  if (cook)
+    print_buffer(bstr, H_CRLF);
+  return cook;
+}
+
+/* Self-test entry for append_cookie_header(): build the request Cookie line
+   into dst (always NUL-terminated). Returns the number of cookies emitted. */
+int http_cookie_header_selftest(t_cookie *cookie, const char *domain,
+                                const char *path, char *dst, size_t dst_size) {
+  buff_struct bstr = {dst, dst_size, 0};
+
+  assertf(dst != NULL && dst_size > 0);
+  dst[0] = '\0';
+  return append_cookie_header(&bstr, cookie, domain, path);
+}
+
 // envoi d'une requète
 int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
                  const char *xsend, const char *adr, const char *fil,
@@ -999,8 +1202,8 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
    if (xsend)
      print_buffer(&bstr, "%s", xsend);  // éventuelles autres lignes

-    // tester proxy authentication
-    if (retour->req.proxy.active) {
+    // for https, auth rides the CONNECT (the tunneled GET would leak it)
+    if (retour->req.proxy.active && strncmp(adr, "https://", 8) != 0) {
      if (link_has_authorization(retour->req.proxy.name)) {     // et hop, authentification proxy!
        const char *a = jump_identification_const(retour->req.proxy.name);
        const char *astart = jump_protocol_const(retour->req.proxy.name);
@@ -1048,34 +1251,9 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode,
                         search_tag + strlen(POSTTOK) + 1))));
      }
    }
-    // gestion cookies?
+    // send stored cookies matching this host/path
    if (cookie) {
-      char buffer[8192];
-      char *b = cookie->data;
-      int cook = 0;
-      int max_cookies = 8;
-
-      do {
-        b = cookie_find(b, "", jump_identification_const(adr), fil);  // prochain cookie satisfaisant aux conditions
-        if (b != NULL) {
-          max_cookies--;
-          if (!cook) {
-            print_buffer(&bstr, "Cookie: $Version=1; ");
-            cook = 1;
-          } else
-            print_buffer(&bstr, "; ");
-          print_buffer(&bstr, "%s", cookie_get(buffer, b, 5));
-          print_buffer(&bstr, "=%s", cookie_get(buffer, b, 6));
-          print_buffer(&bstr, "; $Path=%s", cookie_get(buffer, b, 2));
-          b = cookie_nextfield(b);
-        }
-      } while(b != NULL && max_cookies > 0);
-      if (cook) {               // on a envoyé un (ou plusieurs) cookie?
-        print_buffer(&bstr, H_CRLF);
-#if DEBUG_COOK
-        printf("Header:\n%s\n", bstr.buffer);
-#endif
-      }
+      append_cookie_header(&bstr, cookie, jump_identification_const(adr), fil);
    }
    // gérer le keep-alive (garder socket)
    if (retour->req.http11 && !retour->req.nokeepalive) {
@@ -1808,6 +1986,24 @@ int check_readinput_t(T_SOC soc, int timeout) {
    return 0;
 }

+// wait until the socket is writable, up to timeout seconds
+int check_writeinput_t(T_SOC soc, int timeout) {
+  if (soc != INVALID_SOCKET) {
+    fd_set fds;
+    struct timeval tv;
+    const int isoc = (int) soc;
+
+    assertf(isoc == soc);
+    FD_ZERO(&fds);
+    FD_SET(isoc, &fds);
+    tv.tv_sec = timeout;
+    tv.tv_usec = 0;
+    // only trust fds when select() reports a ready descriptor (> 0)
+    return select(isoc + 1, NULL, &fds, NULL, &tv) > 0 && FD_ISSET(isoc, &fds);
+  } else
+    return 0;
+}
+
 // idem, sauf qu'ici on peut choisir la taille max de données à recevoir
 // SI bufl==0 alors le buffer est censé être de 8kos, et on recoit par bloc de lignes
 // en éliminant les cr (ex: header), arrêt si double-lf
@@ -2409,6 +2605,8 @@ int ident_url_absolute(const char *url, lien_adrfil *adrfil) {
    for(i = 0; adrfil->fil[i] != '\0'; i++)
      if (adrfil->fil[i] == '\\')
        adrfil->fil[i] = '/';
+    // collapse ../ like the http branch above (path-traversal safety)
+    fil_simplifie(adrfil->fil);
  }

  // no hostname
@@ -5468,9 +5666,10 @@ HTSEXT_API httrackp *hts_create_opt(void) {
             "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)");
  StringCopy(opt->referer, "");
  StringCopy(opt->from, "");
-  opt->savename_83 = 0;         // noms longs par défaut
+  opt->savename_83 = HTS_SAVENAME_83_LONG; // long names by default
  opt->savename_type = 0;       // avec structure originale
-  opt->savename_delayed = 2;    // hard delayed type (default)
+  opt->savename_delayed =
+      HTS_SAVENAME_DELAYED_HARD; // always delay the type check (default)
  opt->delayed_cached = HTS_TRUE;
  opt->mimehtml = HTS_FALSE;
  opt->parsejava = HTSPARSE_DEFAULT;    // parser classes
@@ -5495,7 +5694,7 @@ HTSEXT_API httrackp *hts_create_opt(void) {
  opt->parseall = HTS_TRUE;
  opt->parsedebug = HTS_FALSE;
  opt->norecatch = HTS_FALSE;
-  opt->verbosedisplay = 0;      // pas d'animation texte
+  opt->verbosedisplay = HTS_VERBOSE_NONE; // no text animation
  opt->sizehack = HTS_FALSE;
  opt->urlhack = HTS_TRUE;
  StringCopy(opt->footer, HTS_DEFAULT_FOOTER);
--- a/src/htslib.h
+++ b/src/htslib.h
@@ -182,6 +182,11 @@ int http_sendhead(httrackp * opt, t_cookie * cookie, int mode, const char *xsend
                  const char *adr, const char *fil,
                  const char *referer_adr, const char *referer_fil,
                  htsblk * retour);
+/* Build the request "Cookie:" header line for stored cookies matching
+   domain/path into dst (NUL-terminated). Exposed for the -#Q self-test;
+   wraps the same logic http_sendhead() uses. Returns cookies emitted. */
+int http_cookie_header_selftest(t_cookie *cookie, const char *domain,
+                                const char *path, char *dst, size_t dst_size);

 //int newhttp(char* iadr,char* err=NULL);
 T_SOC newhttp(httrackp * opt, const char *iadr, htsblk * retour, int port,
@@ -193,6 +198,17 @@ HTS_INLINE void deletesoc_r(htsblk * r);
 htsblk http_test(httrackp * opt, const char *adr, const char *fil, char *loc);
 int check_readinput(htsblk * r);
 int check_readinput_t(T_SOC soc, int timeout);
+int check_writeinput_t(T_SOC soc, int timeout);
+
+/* Open an HTTP CONNECT tunnel through the active proxy for an https request:
+   `retour->soc` must already be TCP-connected to the proxy, and `adr` is the
+   origin authority (url_adr, e.g. "https://host:port"). Sends the CONNECT
+   request (with Proxy-Authorization when the proxy carries credentials) and
+   reads the proxy's status line, so the caller's TLS handshake then runs
+   end-to-end with the origin. Blocks up to `timeout` seconds. Returns 1 on a
+   2xx tunnel, 0 on failure (retour->msg/statuscode set). */
+int http_proxy_tunnel(httrackp *opt, htsblk *retour, const char *adr,
+                      int timeout);
 void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * retour,
               char *rcvd);
 void treatfirstline(htsblk * retour, const char *rcvd);
--- a/src/htsname.c
+++ b/src/htsname.c
@@ -184,10 +184,11 @@ int url_savename(lien_adrfilsave *const afs,

  /* 8-3 ? */
  switch (opt->savename_83) {
-  case 1:                      // 8-3
+  case HTS_SAVENAME_83_DOS: // 8-3
    max_char = 8;
    break;
-  case 2:                      // Level 2 File names may be up to 31 characters.
+  case HTS_SAVENAME_83_ISO9660: // Level 2 File names may be up to 31
+                                // characters.
    max_char = 31;
    break;
  default:
@@ -324,7 +325,7 @@ int url_savename(lien_adrfilsave *const afs,
  }

  /* replace shtml to html.. */
-  if (opt->savename_delayed == 2)
+  if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD)
    is_html = -1;               /* ALWAYS delay type */
  else
    is_html = ishtml(opt, fil);
@@ -363,7 +364,9 @@ int url_savename(lien_adrfilsave *const afs,
      ) {
      // tester type avec requète HEAD si on ne connait pas le type du fichier
      if (!((opt->check_type == 1) && (fil[strlen(fil) - 1] == '/')))   // slash doit être html?
-        if (opt->savename_delayed == 2 || (ishtest = ishtml(opt, fil)) < 0) {   // on ne sait pas si c'est un html ou un fichier..
+        if (opt->savename_delayed == HTS_SAVENAME_DELAYED_HARD ||
+            (ishtest = ishtml(opt, fil)) <
+                0) { // unsure whether it's html or a file
          // lire dans le cache
          htsblk r = cache_read_including_broken(opt, cache, adr, fil); // test uniquement

@@ -393,11 +396,12 @@ int url_savename(lien_adrfilsave *const afs,
            }
 #endif
            //
-          } else if (opt->savename_delayed != 2 && is_userknowntype(opt, fil)) {        /* PATCH BY BRIAN SCHRÖDER. 
-                                                                                           Lookup mimetype not only by extension, 
-                                                                                           but also by filename */
-            /* Note: "foo.cgi => text/html" means that foo.cgi shall have the text/html MIME file type,
-               that is, ".html" */
+          } else if (opt->savename_delayed != HTS_SAVENAME_DELAYED_HARD &&
+                     is_userknowntype(opt, fil)) { /* PATCH BY BRIAN SCHRÖDER.
+                              Lookup mimetype not only by extension,
+                              but also by filename */
+            /* Note: "foo.cgi => text/html" means that foo.cgi shall have the
+               text/html MIME file type, that is, ".html" */
            char BIGSTK mime[1024];

            mime[0] = ext[0] = '\0';
@@ -408,9 +412,13 @@ int url_savename(lien_adrfilsave *const afs,
              }
            }
          }
-          // note: if savename_delayed is enabled, the naming will be temporary (and slightly invalid!)
-          // note: if we are about to stop (opt->state.stop), back_add() will fail later
-          else if (opt->savename_delayed != 0 && !opt->state.stop) {
+          // note: if savename_delayed is enabled, the naming will be temporary
+          // (and slightly invalid!)
+          //
+          // note: if we are about to stop (opt->state.stop), back_add() will
+          // fail later
+          else if (opt->savename_delayed != HTS_SAVENAME_DELAYED_NONE &&
+                   !opt->state.stop) {
            // Check if the file is ready in backing. We basically take the same logic as later.
            // FIXME: we should cleanup and factorize this unholy mess
            if (headers != NULL && headers->status >= 0 && !is_redirect) {
@@ -698,7 +706,7 @@ int url_savename(lien_adrfilsave *const afs,
            }
            // restaurer
            opt->state._hts_in_html_parsing = hihp;
-          }                     // caché?
+          } // caché?
        }
    }
  }
@@ -1190,7 +1198,8 @@ int url_savename(lien_adrfilsave *const afs,
  // Not used anymore unless non-delayed types.
  // de même en cas de manque d'extension on en place une de manière forcée..
  // cela évite les /chez/toto et les /chez/toto/index.html incompatibles
-  if (opt->savename_type != -1 && opt->savename_delayed != 2) {
+  if (opt->savename_type != -1 &&
+      opt->savename_delayed != HTS_SAVENAME_DELAYED_HARD) {
    char *a = afs->save + strlen(afs->save) - 1;

    while((a > afs->save) && (*a != '.') && (*a != '/'))
@@ -1236,31 +1245,21 @@ int url_savename(lien_adrfilsave *const afs,
    size_t i;
    for(i = 0 ; afs->save[i] != '\0' ; i++) {
      unsigned char c = (unsigned char) afs->save[i];
-      if (c < 32      // control
-        || c == 127   // unwise
-        || c == '~'   // unix unwise
-        || c == '\\'  // windows separator
-        || c == ':'   // windows forbidden
-        || c == '*'   // windows forbidden
-        || c == '?'   // windows forbidden
-        || c == '\"'  // windows forbidden
-        || c == '<'   // windows forbidden
-        || c == '>'   // windows forbidden
-        || c == '|'   // windows forbidden
-        //|| c == '@' // ?
-        ||
-          (
-            opt->savename_83 == 2 // CDROM
-            &&
-            (
-              c == '-'
-              || c == '='
-              || c == '+'
-            )
-          )
-        )
-      {
-         afs->save[i] = '_';
+      if (c < 32       // control
+          || c == 127  // unwise
+          || c == '~'  // unix unwise
+          || c == '\\' // windows separator
+          || c == ':'  // windows forbidden
+          || c == '*'  // windows forbidden
+          || c == '?'  // windows forbidden
+          || c == '\"' // windows forbidden
+          || c == '<'  // windows forbidden
+          || c == '>'  // windows forbidden
+          || c == '|'  // windows forbidden
+          //|| c == '@' // ?
+          || (opt->savename_83 == HTS_SAVENAME_83_ISO9660 // CDROM
+              && (c == '-' || c == '=' || c == '+'))) {
+        afs->save[i] = '_';
      }
    }
  }
@@ -1521,7 +1520,8 @@ int url_savename(lien_adrfilsave *const afs,
          char *a = afs->save + strlen(afs->save) - 1;
          char *b;
          int n = 2;
-          char collisionSeparator = ((opt->savename_83 != 2) ? '-' : '_');
+          char collisionSeparator =
+              ((opt->savename_83 != HTS_SAVENAME_83_ISO9660) ? '-' : '_');

          tempo[0] = '\0';

--- a/src/htsopt.h
+++ b/src/htsopt.h
@@ -368,6 +368,13 @@ typedef enum hts_savename_delayed {
  HTS_SAVENAME_DELAYED_HARD = 2  /**< always delay the type check (default) */
 } hts_savename_delayed;

+/* Saved-name length layout (opt->savename_83). */
+typedef enum hts_savename_83 {
+  HTS_SAVENAME_83_LONG = 0,   /**< long file names (default) */
+  HTS_SAVENAME_83_DOS = 1,    /**< DOS 8.3 names (ISO9660 level 1) */
+  HTS_SAVENAME_83_ISO9660 = 2 /**< ISO9660 level 2 names (up to 31 chars) */
+} hts_savename_83;
+
 /* Host-banning triggers (opt->hostcontrol bitmask). */
 typedef enum hts_hostcontrol {
  HTS_HOSTCONTROL_BAN_TIMEOUT = 1 << 0, /**< ban a timing-out host */
@@ -430,7 +437,8 @@ struct httrackp {
  // int aff_progress;     // progress bar
  hts_boolean shell; /**< driven by a shell over stdin/stdout pipes */
  t_proxy proxy;     /**< proxy configuration */
-  int savename_83;   /**< force 8.3 (DOS) file names */
+  hts_savename_83
+      savename_83;   /**< saved-name length layout (long/DOS/ISO9660) */
  int savename_type; /**< saved-name layout (original tree, flat, ...) */
  String
      savename_userdef; /**< user-defined name template (e.g. %h%p/%n%q.%t) */
--- a/src/htsparse.c
+++ b/src/htsparse.c
@@ -296,6 +296,48 @@ static const char *html_inline_safe(const char *src, char *dst, size_t size) {
  return dst;
 }

+/* Byte before html, or a space sentinel at the buffer start where html[-1]
+   would underflow; space reads as the word boundary the guards want there. */
+static HTS_INLINE char html_prevc(const char *html, const char *start) {
+  return html > start ? html[-1] : ' ';
+}
+
+/* True if [s, s+len) is exactly an HTTP method token (XHR.open's first
+   argument is a method, not a URL: #218). Case-insensitive. */
+static int is_http_method(const char *s, size_t len) {
+  static const char *const methods[] = {"GET",    "POST",  "PUT",
+                                        "DELETE", "HEAD",  "OPTIONS",
+                                        "PATCH",  "TRACE", NULL};
+  int i;
+
+  for (i = 0; methods[i] != NULL; i++) {
+    if (strlen(methods[i]) == len && strfield(s, methods[i]) == (int) len)
+      return 1;
+  }
+  return 0;
+}
+
+/* Percent-encode '(' and ')' in a link emitted into an unquoted url(...) (CSS
+   or JS): a literal ')' closes the token early and the UA mis-parses the value
+   (#163). The UA decodes %28/%29 back to the saved-on-disk name. */
+static void escape_url_parens(char *const s, const size_t size) {
+  char BIGSTK buff[HTS_URLMAXSIZE * 2];
+  size_t i, j;
+
+  for (i = 0, j = 0; s[i] != '\0' && j + 3 < size && j + 3 < sizeof(buff);
+       i++) {
+    if (s[i] == '(' || s[i] == ')') {
+      buff[j++] = '%';
+      buff[j++] = '2';
+      buff[j++] = s[i] == '(' ? '8' : '9';
+    } else {
+      buff[j++] = s[i];
+    }
+  }
+  buff[j] = '\0';
+  strlcpybuff(s, buff, size);
+}
+
 /* Main parser */
 int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
  char catbuff[CATBUFF_SIZE];
@@ -556,7 +598,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                  if (opt->getmode & HTS_GETMODE_HTML) {
                    p = strfield(html, "title");
                    if (p) {
-                      if (*(html - 1) == '/')
+                      if (html_prevc(html, r->adr) == '/')
                        p = 0;  // /title
                    } else {
                      if (strfield(html, "/html"))
@@ -1341,6 +1383,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                    int can_avoid_quotes = 0;
                    char quotes_replacement = '\0';
                    int ensure_not_mime = 0;
+                    // .open(method,url): reject an HTTP-method first arg (#218)
+                    int ensure_not_method = 0;
+                    // @import: the quoted token is the URL; a trailing
+                    // media/supports/layer condition is not part of it
+                    int is_import = 0;

                    if (inscript_tag)
                      expected_end = ";\"\'";   // voir a href="javascript:doc.location='foo'"
@@ -1357,9 +1404,8 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                      if (!nc)
                        nc = strfield(html, ":location");        // javascript:location="doc"
                      if (!nc) {        // location="doc"
-                        if ((nc = strfield(html, "location"))
-                            && !isspace(*(html - 1))
-                          )
+                        if ((nc = strfield(html, "location")) &&
+                            !isspace(html_prevc(html, r->adr)))
                          nc = 0;
                      }
                      if (!nc)
@@ -1369,6 +1415,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          expected = '(';       // parenthèse
                          expected_end = "),";  // fin: virgule ou parenthèse
                          ensure_not_mime = 1;  //* ensure the url is not a mime type */
+                          ensure_not_method = 1; // xhr.open: don't grab method
                        }
                      if (!nc)
                        if ((nc = strfield(html, ".replace"))) { // window.replace("url")
@@ -1380,7 +1427,9 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          expected = '(';       // parenthèse
                          expected_end = ")";   // fin: parenthèse
                        }
-                      if (!nc && (nc = strfield(html, "url")) && (!isalnum(*(html - 1))) && *(html - 1) != '_') {  // url(url)
+                      if (!nc && (nc = strfield(html, "url")) &&
+                          (!isalnum(html_prevc(html, r->adr))) &&
+                          html_prevc(html, r->adr) != '_') { // url(url)
                        expected = '('; // parenthèse
                        expected_end = ")";     // fin: parenthèse
                        can_avoid_quotes = 1;
@@ -1390,6 +1439,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                        if ((nc = strfield(html, "import"))) {   // import "url"
                          if (is_space(*(html + nc))) {
                            expected = 0;       // no char expected
+                            is_import = 1;
                          } else
                            nc = 0;
                        }
@@ -1407,6 +1457,7 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          if ((*a == 34) || (*a == '\'') || (can_avoid_quotes)) {
                            const char *b, *c;
                            int ndelim = 1;
+                            int valid_url = 0;

                            if ((*a == 34) || (*a == '\''))
                              a++;
@@ -1421,12 +1472,20 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                b++;
                            }
                            c = b--;
-                            c += ndelim;
-                            while(*c == ' ')
-                              c++;
-                            if ((strchr(expected_end, *c)) || (*c == '\n')
-                                || (*c == '\r')) {
-                              c -= (ndelim + 1);
+                            // no closing delimiter here (truncated input):
+                            // don't scan past the buffer NUL or capture it.
+                            if (*c != '\0') {
+                              c += ndelim;
+                              while (*c == ' ')
+                                c++;
+                              valid_url =
+                                  (strchr(expected_end, *c)) || (*c == '\n') ||
+                                  (*c == '\r') ||
+                                  (is_import && *(b + 1 + ndelim) == ' ');
+                            }
+                            if (valid_url) {
+                              // URL end = last char (b), not the delimiter
+                              c = b;
                              if ((int) (c - a + 1)) {
                                if (ensure_not_mime) {
                                  int i = 0;
@@ -1442,6 +1501,11 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                    i++;
                                  }
                                }
+                                // XHR.open's "GET" etc. is a method, not a URL
+                                if (a != NULL && ensure_not_method &&
+                                    is_http_method(a, (size_t) (c - a + 1))) {
+                                  a = NULL;
+                                }
                                // Check for bogus links (Vasiliy)
                                if (a != NULL) {
                                  const size_t size = c - a + 1;
@@ -1485,7 +1549,6 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                }
                              }
                            }
-
                          }
                        }
                      }
@@ -1692,6 +1755,24 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                                                              hts_nodetect[i -
                                                                           1]);
                                              }
+                                              // xmlns / xmlns:prefix declare
+                                              // XML namespaces, not resources
+                                              // (#191)
+                                              else {
+                                                const int xl = strfield(
+                                                    intag_startattr, "xmlns");
+                                                const char xc =
+                                                    intag_startattr[xl];
+                                                if (xl &&
+                                                    (xc == ':' || xc == '=' ||
+                                                     is_space(xc))) {
+                                                  url_ok = 0;
+                                                  hts_log_print(
+                                                      opt, LOG_DEBUG,
+                                                      "dirty parsing: xmlns "
+                                                      "namespace avoided");
+                                                }
+                                              }
                                            }
                                  }

@@ -2967,6 +3048,10 @@ int htsparse(htsmoduleStruct * str, htsmoduleStructExtended * stre) {
                          /* Never escape high-chars (we don't know the encoding!!) */
                          inplace_escape_uri_utf(tempo, sizeof(tempo));

+                          // unquoted url() (CSS/JS): keep parens escaped
+                          if (ending_p == ')')
+                            escape_url_parens(tempo, sizeof(tempo));
+
                          //if (!no_esc_utf)
                          //  escape_uri(tempo);     // escape with %xx
                          //else {
@@ -4262,10 +4347,10 @@ int hts_mirror_wait_for_next_file(htsmoduleStruct * str,
            char com[256];

            linput(stdin, com, 200);
-            if (opt->verbosedisplay == 2)
-              opt->verbosedisplay = 1;
+            if (opt->verbosedisplay == HTS_VERBOSE_FULL)
+              opt->verbosedisplay = HTS_VERBOSE_SIMPLE;
            else
-              opt->verbosedisplay = 2;
+              opt->verbosedisplay = HTS_VERBOSE_FULL;
            /* Info for wrappers */
            hts_log_print(opt, LOG_INFO, "engine: change-options");
            RUN_CALLBACK0(opt, chopt);
@@ -4375,7 +4460,7 @@ int hts_mirror_wait_for_next_file(htsmoduleStruct * str,
        printf("%c\x0d", ("/-\\|")[roll]);
        fflush(stdout);
      }
-    } else if (opt->verbosedisplay == 1) {
+    } else if (opt->verbosedisplay == HTS_VERBOSE_SIMPLE) {
      if (b >= 0) {
        if (back[b].r.statuscode == HTTP_OK)
          printf("%d/%d: %s%s (" LLintP " bytes) - OK\33[K\r", ptr, opt->lien_tot,
@@ -4466,8 +4551,8 @@ int hts_wait_delayed(htsmoduleStruct * str, lien_adrfilsave *afs,
  char in_error_msg[32];

  // resolve unresolved type
-  if (opt->savename_delayed != 0 && *forbidden_url == 0 && IS_DELAYED_EXT(afs->save)
-      && !opt->state.stop) {
+  if (opt->savename_delayed != HTS_SAVENAME_DELAYED_NONE &&
+      *forbidden_url == 0 && IS_DELAYED_EXT(afs->save) && !opt->state.stop) {
    int loops;
    int continue_loop;

@@ -4851,7 +4936,7 @@ int hts_wait_delayed(htsmoduleStruct * str, lien_adrfilsave *afs,
      }
    }

-  }                             // delayed type check ?
+  } // delayed type check ?

  ENGINE_SAVE_CONTEXT_BASE();

--- a/src/httrack.c
+++ b/src/httrack.c
@@ -288,7 +288,7 @@ static void __cdecl htsshow_uninit(t_hts_callbackarg * carg) {
 }
 static int __cdecl htsshow_start(t_hts_callbackarg * carg, httrackp * opt) {
  use_show = 0;
-  if (opt->verbosedisplay == 2) {
+  if (opt->verbosedisplay == HTS_VERBOSE_FULL) {
    use_show = 1;
    vt_clear();
  }
@@ -852,7 +852,7 @@ static void sig_doback(int blind) {     // mettre en backing
  if (global_opt != NULL) {
    // suppress logging and asking lousy questions
    global_opt->quiet = 1;
-    global_opt->verbosedisplay = 0;
+    global_opt->verbosedisplay = HTS_VERBOSE_NONE;
  }

  if (!blind)
--- a/src/webhttrack
+++ b/src/webhttrack
@@ -4,131 +4,140 @@
 # Initializes the htsserver GUI frontend and launch the default browser

 BROWSEREXE=
-SRCHBROWSEREXE="x-www-browser www-browser iceape mozilla firefox-developer-edition firefox icecat iceweasel abrowser firebird galeon konqueror midori opera google-chrome chrome chromium chromium-browser netscape firefox-developer-edition"
+SRCHBROWSEREXE=(x-www-browser www-browser iceape mozilla firefox-developer-edition firefox icecat iceweasel abrowser firebird galeon konqueror midori opera google-chrome chrome chromium chromium-browser netscape)
+# shellcheck disable=SC2153 # BROWSER is the standard freedesktop env var, not a typo
 if test -n "${BROWSER}"; then
-# sensible-browser will f up if BROWSER is not set
-SRCHBROWSEREXE="xdg-open sensible-browser ${SRCHBROWSEREXE}"
+    # sensible-browser misbehaves if BROWSER is not set
+    SRCHBROWSEREXE=(xdg-open sensible-browser "${SRCHBROWSEREXE[@]}")
 fi
 # Patch for Darwin/Mac by Ross Williams
-if test "`uname -s`" == "Darwin"; then
-# Darwin/Mac OS X uses a system 'open' command to find
-# the default browser. The -W flag causes it to wait for
-# the browser to exit
-BROWSEREXE="/usr/bin/open -W"
+if test "$(uname -s)" == "Darwin"; then
+    # Darwin/Mac OS X uses a system 'open' command to find
+    # the default browser. The -W flag causes it to wait for
+    # the browser to exit
+    BROWSEREXE="/usr/bin/open -W"
 fi
-BINWD=`dirname "$0"`
-SRCHPATH="$BINWD /usr/local/bin /usr/share/bin /usr/bin /usr/lib/httrack /usr/local/lib/httrack /usr/local/share/httrack /opt/local/bin /sw/bin ${HOME}/usr/bin ${HOME}/bin"
-SRCHPATH="$SRCHPATH "`echo $PATH | tr ":" " "`
-SRCHDISTPATH="$BINWD/../share $BINWD/.. /usr/share /usr/local /usr /local /usr/local/share ${HOME}/usr ${HOME}/usr/share /opt/local/share /sw ${HOME}/usr/local ${HOME}/usr/share"
+BINWD=$(dirname "$0")
+SRCHPATH=("$BINWD" /usr/local/bin /usr/share/bin /usr/bin /usr/lib/httrack /usr/local/lib/httrack /usr/local/share/httrack /opt/local/bin /sw/bin "${HOME}/usr/bin" "${HOME}/bin")
+IFS=':' read -ra pathdirs <<<"$PATH"
+for d in "${pathdirs[@]}"; do
+    # drop empty PATH fields, matching the old echo|tr word-split
+    test -n "$d" && SRCHPATH+=("$d")
+done
+SRCHDISTPATH=("$BINWD/../share" "$BINWD/.." /usr/share /usr/local /usr /local /usr/local/share "${HOME}/usr" "${HOME}/usr/share" /opt/local/share /sw "${HOME}/usr/local")

 ###
 # And now some famous cuisine

 function log {
-echo "$0($$): $@" >&2
-return 0
+    echo "$0($$): $*" >&2
+    return 0
 }

 function launch_browser {
-log "Launching $1"
-browser=$1
-url=$2
-log "Spawning browser.."
-${browser} "${url}"
-# note: browser can hiddenly use the -remote feature of
-# mozilla and therefore return immediately
-log "Browser (or helper) exited"
+    log "Launching $1"
+    browser=$1
+    url=$2
+    log "Spawning browser.."
+    ${browser} "${url}"
+    # note: the browser may silently use Mozilla's -remote feature
+    # and therefore return immediately
+    log "Browser (or helper) exited"
 }

 # First ensure that we can launch the server
 BINPATH=
-for i in ${SRCHPATH}; do
-	! test -n "${BINPATH}" && test -x ${i}/htsserver && BINPATH=${i}
+for i in "${SRCHPATH[@]}"; do
+    ! test -n "${BINPATH}" && test -x "${i}/htsserver" && BINPATH="${i}"
 done
-for i in ${SRCHDISTPATH}; do
-	! test -n "${DISTPATH}" && test -f "${i}/httrack/lang.def" && DISTPATH="${i}/httrack"
+for i in "${SRCHDISTPATH[@]}"; do
+    ! test -n "${DISTPATH}" && test -f "${i}/httrack/lang.def" && DISTPATH="${i}/httrack"
 done
 test -n "${BINPATH}" || ! log "Could not find htsserver" || exit 1
 test -n "${DISTPATH}" || ! log "Could not find httrack directory" || exit 1
-test -f ${DISTPATH}/lang.def || ! log "Could not find ${DISTPATH}/lang.def" || exit 1
-test -f ${DISTPATH}/lang.indexes || ! log "Could not find ${DISTPATH}/lang.indexes" || exit 1
-test -d ${DISTPATH}/lang || ! log "Could not find ${DISTPATH}/lang" || exit 1
-test -d ${DISTPATH}/html || ! log "Could not find ${DISTPATH}/html" || exit 1
+test -f "${DISTPATH}/lang.def" || ! log "Could not find ${DISTPATH}/lang.def" || exit 1
+test -f "${DISTPATH}/lang.indexes" || ! log "Could not find ${DISTPATH}/lang.indexes" || exit 1
+test -d "${DISTPATH}/lang" || ! log "Could not find ${DISTPATH}/lang" || exit 1
+test -d "${DISTPATH}/html" || ! log "Could not find ${DISTPATH}/html" || exit 1

 # Locale
 HTSLANG="${LC_MESSAGES}"
 ! test -n "${HTSLANG}" && HTSLANG="${LC_ALL}"
 ! test -n "${HTSLANG}" && HTSLANG="${LANG}"
-HTSLANG="`echo $LANG | cut -f1 -d'.' | cut -f1 -d'_'`"
-LANGN=`grep -E "^${HTSLANG}:" ${DISTPATH}/lang.indexes | cut -f2 -d':'`
+HTSLANG="$(echo "$HTSLANG" | cut -f1 -d'.' | cut -f1 -d'_')"
+LANGN=$(grep -E "^${HTSLANG}:" "${DISTPATH}/lang.indexes" | cut -f2 -d':')
 ! test -n "${LANGN}" && LANGN=1

 # Find the browser
 # note: not all systems have sensible-browser or www-browser alternative
 # thefeore, we have to find a bit more if sensible-browser could not be found

-for i in ${SRCHBROWSEREXE}; do
-for j in ${SRCHPATH}; do
-if test -x ${j}/${i}; then
-BROWSEREXE=${j}/${i}
-fi
-test -n "$BROWSEREXE" && break
-done
-test -n "$BROWSEREXE" && break
+for i in "${SRCHBROWSEREXE[@]}"; do
+    for j in "${SRCHPATH[@]}"; do
+        if test -x "${j}/${i}"; then
+            BROWSEREXE="${j}/${i}"
+        fi
+        test -n "$BROWSEREXE" && break
+    done
+    test -n "$BROWSEREXE" && break
 done
 test -n "$BROWSEREXE" || ! log "Could not find any suitable browser" || exit 1

 # "browse" command
 if test "$1" = "browse"; then
-if test -f "${HOME}/.httrack.ini"; then
-INDEXF=`cat ${HOME}/.httrack.ini | tr '\r' '\n' | grep -E "^path=" | cut -f2- -d'='`
-if test -n "${INDEXF}" -a -d "${INDEXF}" -a -f "${INDEXF}/index.html"; then
-INDEXF="${INDEXF}/index.html"
-else
-INDEXF=""
-fi
-fi
-if ! test -n "$INDEXF"; then 
-INDEXF="${HOME}/websites/index.html"
-fi
-launch_browser "${BROWSEREXE}" "file://${INDEXF}"
-exit $?
+    if test -f "${HOME}/.httrack.ini"; then
+        INDEXF=$(tr '\r' '\n' <"${HOME}/.httrack.ini" | grep -E "^path=" | cut -f2- -d'=')
+        if test -n "${INDEXF}" -a -d "${INDEXF}" -a -f "${INDEXF}/index.html"; then
+            INDEXF="${INDEXF}/index.html"
+        else
+            INDEXF=""
+        fi
+    fi
+    if ! test -n "$INDEXF"; then
+        INDEXF="${HOME}/websites/index.html"
+    fi
+    launch_browser "${BROWSEREXE}" "file://${INDEXF}"
+    exit $?
 fi

 # Create a temporary filename
-TMPSRVFILE="$(mktemp ${TMPDIR:-/tmp}/.webhttrack.XXXXXXXX)" || ! log "Could not create the temporary file ${TMPSRVFILE}" || exit 1
+TMPSRVFILE="$(mktemp "${TMPDIR:-/tmp}/.webhttrack.XXXXXXXX")" || ! log "Could not create the temporary file ${TMPSRVFILE}" || exit 1
 # Launch htsserver binary and setup the server
-(${BINPATH}/htsserver "${DISTPATH}/" --ppid "$$" path "${HOME}/websites" lang "${LANGN}" $@; echo SRVURL=error) > ${TMPSRVFILE}&
+(
+    "${BINPATH}/htsserver" "${DISTPATH}/" --ppid "$$" path "${HOME}/websites" lang "${LANGN}" "$@"
+    echo SRVURL=error
+) >"${TMPSRVFILE}" &
 # Find the generated SRVURL
 SRVURL=
 MAXCOUNT=60
 while ! test -n "$SRVURL"; do
-MAXCOUNT=$[$MAXCOUNT - 1]
-test $MAXCOUNT -gt 0 || exit 1
-test $MAXCOUNT -lt 50 && echo "waiting for server to reply.."
-SRVURL=`grep -E URL= ${TMPSRVFILE} | cut -f2- -d=`
-test ! "$SRVURL" = "error" || ! log "Could not spawn htsserver" || exit 1
-test -n "$SRVURL" || sleep 1
+    MAXCOUNT=$((MAXCOUNT - 1))
+    test $MAXCOUNT -gt 0 || exit 1
+    test $MAXCOUNT -lt 50 && echo "waiting for server to reply.."
+    SRVURL=$(grep -E URL= "${TMPSRVFILE}" | cut -f2- -d=)
+    test ! "$SRVURL" = "error" || ! log "Could not spawn htsserver" || exit 1
+    test -n "$SRVURL" || sleep 1
 done

 # Cleanup function
+# shellcheck disable=SC2120 # $1 is an optional "signal caught" marker; bare calls are intentional
 function cleanup {
-test -n "$1" && log "Nasty signal caught, cleaning up.."
-# Do not kill if browser exited (chrome bug issue) ; server will die itself
-test -n "$1" && test -f ${TMPSRVFILE} && SRVPID=`grep -E PID= ${TMPSRVFILE} | cut -f2- -d=`
-test -n "${SRVPID}" && kill -9 ${SRVPID}
-test -f ${TMPSRVFILE} && rm ${TMPSRVFILE}
-test -n "$1" && log "..Done"
-return 0
+    test -n "$1" && log "Nasty signal caught, cleaning up.."
+    # Do not kill if browser exited (chrome bug issue) ; server will die itself
+    test -n "$1" && test -f "${TMPSRVFILE}" && SRVPID=$(grep -E PID= "${TMPSRVFILE}" | cut -f2- -d=)
+    test -n "${SRVPID}" && kill -9 "${SRVPID}"
+    test -f "${TMPSRVFILE}" && rm "${TMPSRVFILE}"
+    test -n "$1" && log "..Done"
+    return 0
 }

 # Cleanup in case of emergency
-trap "cleanup now; exit" 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25
+trap "cleanup now; exit" HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ

 # Got SRVURL, launch browser
 launch_browser "${BROWSEREXE}" "${SRVURL}"

 # That's all, folks!
-trap "" 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25
+trap "" HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ
 cleanup
 exit 0
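Review note on the array-based search loops in this script: the commit message describes splitting PATH with `IFS=: read -ra` and then skipping the empty fields that doubled or leading colons produce. The following is a minimal sketch of that idea only, not the script's actual code; `split_path` and `SPLIT_RESULT` are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of webhttrack): split a colon-separated list
# into the global array SPLIT_RESULT, dropping empty fields.
split_path() {
    local raw=$1 field
    local -a fields
    SPLIT_RESULT=()
    # With a non-whitespace IFS, read keeps leading and doubled separators
    # as "" elements, e.g. ":/usr/bin::/bin" -> "" "/usr/bin" "" "/bin".
    IFS=: read -ra fields <<<"$raw"
    for field in "${fields[@]}"; do
        # Skip "" so a later "${dir}/name" probe never becomes "/name".
        test -n "$field" || continue
        SPLIT_RESULT+=("$field")
    done
}

split_path ":/usr/bin::/bin"
printf '%s\n' "${SPLIT_RESULT[@]}"
```

With the sample input above, only `/usr/bin` and `/bin` remain, which matches what the old unquoted word-splitting produced for the same string.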
--- a/tests/01_engine-charset.test
+++ b/tests/01_engine-charset.test
@@ -6,11 +6,11 @@ set -euo pipefail
 # charset -> UTF-8 conversion (hts_convertStringToUTF8).
 # -#3 <charset> <string> prints the string re-decoded from <charset> as UTF-8.
 conv() {
-	test "$(httrack -O /dev/null -#3 "$1" "$2")" == "$3" || exit 1
+    test "$(httrack -O /dev/null -#3 "$1" "$2")" == "$3" || exit 1
 }
 # crash probe: malformed input must exit cleanly, not abort.
 runs() {
-	httrack -O /dev/null -#3 "$1" "$2" >/dev/null 2>&1 || exit 1
+    httrack -O /dev/null -#3 "$1" "$2" >/dev/null 2>&1 || exit 1
 }

 # the source bytes below are UTF-8 (this file is UTF-8); "café" is 0x63 61 66 C3 A9.
--- a/tests/01_engine-cookies.test
+++ b/tests/01_engine-cookies.test
@@ -0,0 +1,15 @@
+#!/bin/bash
+#
+# Issue #151 guard: the request Cookie header must be bare RFC 6265 name=value
+# pairs, no $Version/$Path attributes. Driven by the 'httrack -#Q' selftest.
+
+set -eu
+
+# A trailing token is required; a bare '-#Q' falls through to the usage screen.
+out=$(httrack -#Q run)
+
+# Exact-match the success line so a fall-through to usage can't pass the test.
+test "$out" = "cookie-header: OK" || {
+    echo "expected 'cookie-header: OK', got: $out" >&2
+    exit 1
+}
--- a/tests/01_engine-doitlog.test
+++ b/tests/01_engine-doitlog.test
@@ -89,4 +89,37 @@ grep -q NEWCONTENT "$(find "$out" -path '*/a.html' -print -quit)" || {
    exit 1
 }

+# --- 3. an empty quoted arg survives the doit.log round-trip (#106) ----------
+# -%F "" (empty footer) records an empty "" token in doit.log; -r2 follows it so
+# a "drop the empty token" bug shifts -r2 into -%F's slot (the reprise then sees
+# -%F -r2 and panics "%F needs to be followed by ..."), making the bug visible
+# rather than a harmless run off the end of argv.
+out2="$tmp/out2"
+rc=0
+"$bin" "$url" -O "$out2" --quiet -n -%v0 -%F "" -r2 >/dev/null 2>&1 || rc=$?
+test "$rc" -eq 0 || {
+    echo "FAIL: initial mirror with empty footer exited $rc"
+    exit 1
+}
+# precondition: the writer put the empty token on disk for the reader to reload.
+grep -q ' -%F "" -r2' "$out2/hts-cache/doit.log" || {
+    echo "FAIL: empty footer not recorded as -%F \"\" -r2 in doit.log"
+    grep -- '-%F' "$out2/hts-cache/doit.log" || true
+    exit 1
+}
+# no-url reprise: the reader rebuilds argv from doit.log and rewrites doit.log
+# from it. The empty token surviving in the regenerated file proves the reader
+# kept it (a drop/swallow would panic above or rewrite -%F without the "").
+rc=0
+"$bin" -O "$out2" --quiet >/dev/null 2>&1 || rc=$?
+test "$rc" -eq 0 || {
+    echo "FAIL: empty-footer reprise exited $rc (empty token dropped from doit.log?)"
+    exit 1
+}
+grep -q ' -%F "" -r2' "$out2/hts-cache/doit.log" || {
+    echo "FAIL: empty footer did not survive the doit.log reload round-trip"
+    grep -- '-%F' "$out2/hts-cache/doit.log" || true
+    exit 1
+}
+
 exit 0
--- a/tests/01_engine-entities.test
+++ b/tests/01_engine-entities.test
@@ -6,11 +6,11 @@ set -euo pipefail
 # HTML entity unescaping (hts_unescapeEntitiesWithCharset).
 # -#6 <string> prints the string with entities decoded (UTF-8 output).
 ent() {
-	test "$(httrack -O /dev/null -#6 "$1")" == "$2" || exit 1
+    test "$(httrack -O /dev/null -#6 "$1")" == "$2" || exit 1
 }
 # crash probe: malformed input must exit cleanly, not abort.
 runs() {
-	httrack -O /dev/null -#6 "$1" >/dev/null 2>&1 || exit 1
+    httrack -O /dev/null -#6 "$1" >/dev/null 2>&1 || exit 1
 }

 # named entities
--- a/tests/01_engine-filter.test
+++ b/tests/01_engine-filter.test
@@ -7,10 +7,10 @@ set -euo pipefail
 # -#0 <filter> <string> prints "<string> does match <filter>" or "... does NOT match ...".

 match() {
-	test "$(httrack -O /dev/null -#0 "$1" "$2")" == "$2 does match $1" || exit 1
+    test "$(httrack -O /dev/null -#0 "$1" "$2")" == "$2 does match $1" || exit 1
 }
 nomatch() {
-	test "$(httrack -O /dev/null -#0 "$1" "$2")" == "$2 does NOT match $1" || exit 1
+    test "$(httrack -O /dev/null -#0 "$1" "$2")" == "$2 does NOT match $1" || exit 1
 }

 # bare star matches everything
@@ -67,7 +67,7 @@ nomatch '*[\[]' 'a'
 # filter guide claims (GitHub #148); it parses as the class {'[','\'} followed
 # by a trailing literal ']'. These assertions document the current (buggy)
 # behavior so any future matcher fix is a deliberate, visible change.
-nomatch '*[\[\]]' '['   # not matched, despite the docs
-match '*[\[\]]' ']'     # only via the empty class-match + trailing ']'
-match '*[\[\]]' '[]'    # one of {'[','\'} then the trailing ']'
+nomatch '*[\[\]]' '[' # not matched, despite the docs
+match '*[\[\]]' ']'   # only via the empty class-match + trailing ']'
+match '*[\[\]]' '[]'  # one of {'[','\'} then the trailing ']'
 nomatch '*[\[\]]' '[]x'
--- a/tests/01_engine-mime.test
+++ b/tests/01_engine-mime.test
@@ -7,10 +7,10 @@ set -euo pipefail
 # -#2 <path> prints "<path> is '<mime>'" then "and its local type is '.<ext>'".

 mime() {
-	test "$(httrack -O /dev/null -#2 "$1" | head -1)" == "$1 is '$2'" || exit 1
+    test "$(httrack -O /dev/null -#2 "$1" | head -1)" == "$1 is '$2'" || exit 1
 }
 unknown() {
-	test "$(httrack -O /dev/null -#2 "$1" | head -1)" == "$1 is of an unknown MIME type" || exit 1
+    test "$(httrack -O /dev/null -#2 "$1" | head -1)" == "$1 is of an unknown MIME type" || exit 1
 }

 mime '/a/b.html' 'text/html'
--- a/tests/01_engine-parse.test
+++ b/tests/01_engine-parse.test
@@ -154,4 +154,173 @@ grep -Eq "style=\"background-image:url\('ibgs\.gif'\)\"" "$saved2" ||
 grep -q 'title="file://' "$saved2" ||
    ! echo "FAIL: a no-detect attribute (title) was wrongly rewritten" || exit 1

+# xmlns / xmlns:prefix decls must not be crawled (#191). Local file:// targets so a
+# regression downloads them; each is the LAST attr (heuristic only scans a value before '>').
+site3="$tmp/xmlns"
+mkdir -p "$site3"
+for f in ns og rdfs real; do gif "$site3/$f.gif"; done
+cat >"$site3/index.html" <<EOF
+<html xmlns="file://$site3/ns.gif"><body>
+<svg xmlns:og="file://$site3/og.gif"></svg>
+<div class="c" xmlns:rdfs="file://$site3/rdfs.gif"></div>
+<a href="file://$site3/real.gif">real link</a>
+</body></html>
+EOF
+out3="$tmp/xmlns-out"
+crawl "$site3/index.html" "$out3"
+
+# the real link is still captured
+found "real.gif" "$out3"
+# namespace-declaration targets must not be fetched (default + prefixed forms)
+notfound "ns.gif" "$out3"
+notfound "og.gif" "$out3"
+notfound "rdfs.gif" "$out3"
+
+# CSS @import (#94): every form's target is captured, crawling the .css directly.
+# The "cond"/"sup"/"spc" cases carry a trailing media/supports/layer condition (or
+# a space before ';'); they are the negative controls: without the parser fix the
+# URL is dropped, so a regression fails these found() checks.
+site4="$tmp/cssimport"
+mkdir -p "$site4"
+for f in nq dqu squ dqs sqs med cond sup lay spc; do printf 'body{}\n' >"$site4/$f.css"; done
+cat >"$site4/main.css" <<'EOF'
+@import url(nq.css);
+@import url("dqu.css");
+@import url('squ.css');
+@import "dqs.css";
+@import 'sqs.css';
+@import url(med.css) screen and (min-width: 400px);
+@import "cond.css" screen;
+@import "sup.css" supports(display: flex);
+@import url(lay.css) layer(base);
+@import "spc.css" ;
+EOF
+out4="$tmp/cssimport-out"
+crawl "$site4/main.css" "$out4"
+for f in nq dqu squ dqs sqs med cond sup lay spc; do found "$f.css" "$out4"; done
+
+# Over-capture guard: the trailing condition is not part of the URL, so it must
+# survive the rewrite verbatim. A regression that grabs it would mangle these.
+m4=$(find "$out4" -type f -path '*/file/*' -name main.css -print -quit)
+test -n "$m4" || ! echo "FAIL: saved main.css not found" || exit 1
+for cond in '@import "cond.css" screen;' 'supports(display: flex)' 'layer(base)'; do
+    grep -Fq "$cond" "$m4" ||
+        ! echo "FAIL #94: '$cond' altered on rewrite (condition captured as URL?)" || exit 1
+done
+
+# Malformed input: an unterminated @import quote (truncated CSS) must not crash or
+# capture a bogus link; a valid sibling import is still captured. Guards a heap
+# overflow on the URL-end scan that aborts under ASan (CI sanitizer job).
+site5="$tmp/cssimport-trunc"
+mkdir -p "$site5"
+printf 'body{}\n' >"$site5/good.css"
+printf '@import "good.css";\n@import "trunc' >"$site5/main.css"
+out5="$tmp/cssimport-trunc-out"
+crawl "$site5/main.css" "$out5"
+found "good.css" "$out5"
+notfound "trunc" "$out5"
+
+# Offset-0 underflow (#396): a token at the buffer start makes the detector's
+# word-boundary guard read *(html-1) one byte early (aborts under ASan). The
+# url() target is still captured; here it just must not underflow.
+site6="$tmp/parse-off0"
+mkdir -p "$site6"
+printf 'body{}\n' >"$site6/off0.css"
+printf 'url(off0.css)\n' >"$site6/main.css"
+out6="$tmp/parse-off0-out"
+crawl "$site6/main.css" "$out6"
+found "off0.css" "$out6"
+
+# XMLHttpRequest.open(method, url) (#218): the first argument is an HTTP method,
+# not a URL. Without the fix "GET" is captured as a link and fetched (the offline
+# fixture saves a bare file named GET; a live server mangles it to GET.html).
+# window.open(url) detection must be unaffected.
+site7="$tmp/xhropen"
+mkdir -p "$site7"
+gif "$site7/winopen.gif"
+cat >"$site7/index.html" <<EOF
+<html><body><script>
+var x = new XMLHttpRequest();
+x.open("GET", "ajax_info.txt");
+var y = new XMLHttpRequest();
+y.open("Post", "submit.cgi");
+window.open("file://$site7/winopen.gif");
+</script></body></html>
+EOF
+out7="$tmp/xhropen-out"
+crawl "$site7/index.html" "$out7"
+# negative control: without the fix a file named exactly GET is downloaded
+notfound "GET" "$out7"
+# methods are matched case-insensitively (XHR spec normalizes them): a mixed-case
+# method is rejected too, so a file named Post must not appear either
+notfound "Post" "$out7"
+# regression guard: window.open(url) is still detected, so its absolute URL is
+# rewritten to a local link. The rewrite only happens if the parser saw it, so
+# these two assertions fail if .open detection broke (not a trivial --near save).
+saved7=$(savedhtml "$out7")
+test -n "$saved7" || ! echo "FAIL: saved xhr page not found" || exit 1
+grep -Fq 'window.open("winopen.gif")' "$saved7" ||
+    ! echo "FAIL #218: window.open(url) no longer detected/rewritten" || exit 1
+! grep -Fq 'window.open("file://' "$saved7" ||
+    ! echo "FAIL #218: window.open URL left absolute (not rewritten)" || exit 1
+
+# Parens in an unquoted url(...) (#163): the source %28/%29 decode to literal
+# '(' ')' in the saved name, but a literal ')' in the rewritten url() closes the
+# token early, so they must stay encoded. Negative control: without the fix the
+# %281%29 greps fail (parens are RFC2396 "mark" chars the escaper leaves alone).
+site8="$tmp/cssparens"
+mkdir -p "$site8"
+for f in 'img (1).gif' 'a(b)c(1).gif' 'q (4).gif'; do gif "$site8/$f"; done
+cat >"$site8/style.css" <<'EOF'
+.a { background: url(img%20%281%29.gif); }
+.b { background: url(a%28b%29c%281%29.gif); }
+.c { background: url("q%20%284%29.gif"); }
+EOF
+out8="$tmp/cssparens-out"
+crawl "$site8/style.css" "$out8"
+found "img (1).gif" "$out8"
+found "a(b)c(1).gif" "$out8"
+found "q (4).gif" "$out8"
+css8=$(find "$out8" -type f -path '*/file/*' -name style.css -print -quit)
+test -n "$css8" || ! echo "FAIL: saved style.css not found" || exit 1
+grep -Fq 'url(img%20%281%29.gif)' "$css8" ||
+    ! echo "FAIL #163: parens in unquoted url() not percent-encoded on rewrite" || exit 1
+grep -Fq 'url(a%28b%29c%281%29.gif)' "$css8" ||
+    ! echo "FAIL #163: not every paren in a url() was percent-encoded" || exit 1
+grep -Fq 'url("q%20%284%29.gif")' "$css8" ||
+    ! echo "FAIL #163: quoted url() altered or parens left literal on rewrite" || exit 1
+
+# The url() detector is not CSS-specific: <script> and inline style= get the
+# same encoding, but ordinary href/src (ending_p is the quote, not ')') keep
+# literal parens -- the attribute checks guard the gate against over-firing.
+site9="$tmp/urlparens"
+mkdir -p "$site9"
+for f in 'js (1).gif' 'inl (2).gif' 'asrc (3).gif' 'ahref (4).gif'; do gif "$site9/$f"; done
+cat >"$site9/index.html" <<EOF
+<html><body>
+<script>var bg = "url(js%20%281%29.gif)";</script>
+<div style="background-image:url(inl%20%282%29.gif)"></div>
+<img src="asrc%20%283%29.gif">
+<a href="ahref%20%284%29.gif">link</a>
+</body></html>
+EOF
+out9="$tmp/urlparens-out"
+crawl "$site9/index.html" "$out9"
+saved9=$(savedhtml "$out9")
+test -n "$saved9" || ! echo "FAIL: saved urlparens page not found" || exit 1
+# rewrite-only: the JS-string asset is not queued for download
+grep -Fq 'url(js%20%281%29.gif)' "$saved9" ||
+    ! echo "FAIL #163: parens in <script> url() not percent-encoded" || exit 1
+found "inl (2).gif" "$out9"
+grep -Fq 'url(inl%20%282%29.gif)' "$saved9" ||
+    ! echo "FAIL #163: parens in inline style url() not percent-encoded" || exit 1
+found "asrc (3).gif" "$out9"
+found "ahref (4).gif" "$out9"
+grep -Fq 'src="asrc%20(3).gif"' "$saved9" ||
+    ! echo "FAIL #163: parens in a plain src attribute were wrongly encoded" || exit 1
+grep -Fq 'href="ahref%20(4).gif"' "$saved9" ||
+    ! echo "FAIL #163: parens in a plain href attribute were wrongly encoded" || exit 1
+! grep -Eq '(src|href)="[^"]*%28' "$saved9" ||
+    ! echo "FAIL #163: gate over-fired onto a non-url() attribute link" || exit 1
+
 exit 0
--- a/tests/01_engine-relative.test
+++ b/tests/01_engine-relative.test
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# lienrelatif (build relative path) + ident_url_relatif (resolve a link, collapse
+# ./ and ../). Regression net for #137/#162; expected values hand-computed.
+
+set -euo pipefail
+
+# relative path from <curr>'s directory to <link>
+rel() {
+    local got
+    got=$(httrack -O /dev/null -#l "$1" "$2")
+    test "$got" == "relative=$3" ||
+        {
+            echo "FAIL rel($1, $2): got '$got' want 'relative=$3'"
+            exit 1
+        }
+}
+
+# resolve <link> against origin <adr>/<fil> -> adr=.. fil=..
+ident() {
+    local got
+    got=$(httrack -O /dev/null -#i "$1" "$2" "$3")
+    test "$got" == "$4" ||
+        {
+            echo "FAIL ident($1, $2, $3): got '$got' want '$4'"
+            exit 1
+        }
+}
+
+### lienrelatif
+
+rel 'dir/page.html' 'dir/index.html' 'page.html'
+rel 'dir/page.html' 'dir/page.html' 'page.html' # self-link
+rel 'a.html' 'dir/index.html' '../a.html'
+rel 'x.html' 'a/b/c/index.html' '../../../x.html'
+rel 'h/a/x.jpg' 'h/a/sub/page.html' '../x.jpg'
+rel 'a/b/c/x.html' 'index.html' 'a/b/c/x.html'
+rel 'h/sub/x.jpg' 'h/page.html' 'sub/x.jpg'
+rel 'h/dir2/x.jpg' 'h/dir1/page.html' '../dir2/x.jpg' # sibling dir
+rel 'h/bc/x.jpg' 'h/b/page.html' '../bc/x.jpg'        # b/bc prefix trap
+rel 'h/b/x.jpg' 'h/bc/page.html' '../b/x.jpg'
+rel 'h2/img/x.jpg' 'h1/p/page.html' '../../h2/img/x.jpg' # cross-host
+rel 'img.cdn/photo.jpg' 'www.site/articles/2020/post.html' '../../../img.cdn/photo.jpg'
+rel 'h/a/' 'h/a/sub/page.html' '../' # link is ancestor dir
+rel 'x.html' 'page.html' 'x.html'
+rel 'dir/page.html?x=1' 'dir/index.html?y=2' 'page.html' # ? stripped
+
+### ident_url_relatif
+
+ident 'img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/dir/img.gif'
+ident 'sub/img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/dir/sub/img.gif'
+ident '/img.gif' 'www.foo.com' '/dir/page.html' 'adr=www.foo.com fil=/img.gif'
+# embedded ../ collapses (#137)
+ident '../img.gif' 'www.foo.com' '/dir/sub/page.html' 'adr=www.foo.com fil=/dir/img.gif'
+ident 'sub/../logo.png' 'www.foo.com' '/articles/2020/post.html' 'adr=www.foo.com fil=/articles/2020/logo.png'
+ident '../../pix/sub/../logo.png' 'www.foo.com' '/articles/2020/post.html' 'adr=www.foo.com fil=/pix/logo.png'
+ident '../../../../x.gif' 'www.foo.com' '/a/b/page.html' 'adr=www.foo.com fil=/x.gif' # above-root clamp
+ident '?page=2' 'www.foo.com' '/dir/index.html?old=1' 'adr=www.foo.com fil=/dir/index.html?page=2'
+ident 'http://other.com/a/b/../c/index.html' 'www.foo.com' '/p.html' 'adr=other.com fil=/a/c/index.html'
+# file:// collapses ../ like the other schemes; traversal contained, // authority kept
+ident 'file:///var/data/pix/sub/../logo.png' 'www.foo.com' '/p.html' 'adr=file:// fil=/var/data/pix/logo.png'
+ident 'file:///a/b/c/../../d/e.gif' 'www.foo.com' '/p.html' 'adr=file:// fil=/a/d/e.gif'
+ident 'file:///a/../../b' 'www.foo.com' '/p.html' 'adr=file:// fil=/b'
+ident 'file://srv/share/../x' 'www.foo.com' '/p.html' 'adr=file:// fil=//srv/x'
+ident 'mailto:foo@bar.com' 'www.foo.com' '/p.html' 'error=-1' # unsupported scheme
+ident 'javascript:void(0)' 'www.foo.com' '/p.html' 'error=-1'
+
+echo "OK"
--- a/tests/01_engine-simplify.test
+++ b/tests/01_engine-simplify.test
@@ -5,7 +5,7 @@ set -euo pipefail

 # path simplify engine (fil_simplifie): collapses ./ and ../ segments.
 simp() {
-	test "$(httrack -O /dev/null -#1 "$1")" == "simplified=$2" || exit 1
+    test "$(httrack -O /dev/null -#1 "$1")" == "simplified=$2" || exit 1
 }

 simp './foo/bar/' 'foo/bar/'
@@ -26,3 +26,17 @@ simp './a/../../b' 'b'

 # empty segments ('//') are not dot-segments and are preserved, per RFC 3986
 simp 'a//b' 'a//b'
+simp 'a//b/../c' 'a//c'
+
+# absolute paths keep the leading '/'; above-root '..' is clamped to it
+simp '/a/../b' '/b'
+simp '/a/../../b' '/b'
+simp '/../x' '/x'
+
+# collapses to nothing -> './' (relative) or '/' (absolute)
+simp '..' './'
+simp 'a/..' './'
+simp '/' '/'
+
+simp 'a/b/..' 'a/'              # trailing bare '..'
+simp 'a/../b?x=../y' 'b?x=../y' # '?' freezes simplification
--- a/tests/01_engine-strsafe.test
+++ b/tests/01_engine-strsafe.test
@@ -21,9 +21,15 @@ test "$out" == "strsafe: OK" || exit 1
 # the bounded macro aborts (non-zero exit), so don't let set -e trip on it
 err=$(httrack -#8 overflow "this string is far too long for the buffer" 2>&1) || true
 case "$err" in
-	*"strsafe: NOT aborted"*) echo "over-capacity write was NOT caught" >&2; exit 1 ;;
-	*"overflow while copying"*) ;;
-	*) echo "expected htssafe overflow abort, got: $err" >&2; exit 1 ;;
+*"strsafe: NOT aborted"*)
+    echo "over-capacity write was NOT caught" >&2
+    exit 1
+    ;;
+*"overflow while copying"*) ;;
+*)
+    echo "expected htssafe overflow abort, got: $err" >&2
+    exit 1
+    ;;
 esac

 # Same guarantee for the htsbuff builder. The source is exactly the buffer
@@ -32,7 +38,13 @@ esac
 # aborted"). Match the specific htsbuff abort message, not just any assert.
 err=$(httrack -#8 overflow-buff "abcd" 2>&1) || true
 case "$err" in
-	*"strsafe: NOT aborted"*) echo "htsbuff over-capacity write was NOT caught" >&2; exit 1 ;;
-	*"htsbuff append overflow"*) ;;
-	*) echo "expected htsbuff overflow abort, got: $err" >&2; exit 1 ;;
+*"strsafe: NOT aborted"*)
+    echo "htsbuff over-capacity write was NOT caught" >&2
+    exit 1
+    ;;
+*"htsbuff append overflow"*) ;;
+*)
+    echo "expected htsbuff overflow abort, got: $err" >&2
+    exit 1
+    ;;
 esac
--- a/tests/10_crawl-simple.test
+++ b/tests/10_crawl-simple.test
@@ -3,6 +3,6 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 bash crawl-test.sh --errors 0 --files 5 httrack http://ut.httrack.com/simple/basic.html
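Review note on the `cmd || ! echo "..." || exit N` chain used by these tests: `echo` succeeds, the `!` turns that success into a failure, and so the chain falls through to `exit`. A minimal sketch of the behaviour follows; `precondition` is a hypothetical stand-in for a failing check such as check-network.sh, and reading 77 as "skipped" assumes the Automake test-harness convention.

```shell
#!/bin/bash
# Hypothetical stand-in for a precondition that fails (e.g. no network).
precondition() { return 1; }

# Run the chain inside a command substitution (a subshell) so that its
# "exit 77" does not terminate this demo script.
rc=0
msg=$(precondition || ! echo "skipping online unit tests" || exit 77) || rc=$?

echo "message: $msg"
echo "status:  $rc"
```

If `precondition` succeeded instead, the chain would stop at the first `||`, print nothing, and leave the status at 0.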
--- a/tests/11_crawl-cookies.test
+++ b/tests/11_crawl-cookies.test
@@ -3,10 +3,10 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 bash crawl-test.sh --errors 0 --files 3 \
-	--found ut.httrack.com/cookies/third.html \
-	--found ut.httrack.com/cookies/second.html \
-	--found ut.httrack.com/cookies/entrance.html \
-	httrack http://ut.httrack.com/cookies/entrance.php
+    --found ut.httrack.com/cookies/third.html \
+    --found ut.httrack.com/cookies/second.html \
+    --found ut.httrack.com/cookies/entrance.html \
+    httrack http://ut.httrack.com/cookies/entrance.php
--- a/tests/11_crawl-idna.test
+++ b/tests/11_crawl-idna.test
@@ -3,21 +3,21 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 # unicode tests
 bash crawl-test.sh \
-	--errors 1 --files 5 \
-	--found 'café.ut.httrack.com/unicode-links/café3860.html' \
-	--found 'café.ut.httrack.com/unicode-links/café30f4.html' \
-	--found 'café.ut.httrack.com/unicode-links/café5e1f.html' \
-	--found 'café.ut.httrack.com/unicode-links/café7b30.html' \
-	httrack 'http://ut.httrack.com/unicode-links/idna.html' \
-	'+*.ut.httrack.com/*' --robots=0
+    --errors 1 --files 5 \
+    --found 'café.ut.httrack.com/unicode-links/café3860.html' \
+    --found 'café.ut.httrack.com/unicode-links/café30f4.html' \
+    --found 'café.ut.httrack.com/unicode-links/café5e1f.html' \
+    --found 'café.ut.httrack.com/unicode-links/café7b30.html' \
+    httrack 'http://ut.httrack.com/unicode-links/idna.html' \
+    '+*.ut.httrack.com/*' --robots=0

 # unicode tests (bogus links)
 bash crawl-test.sh \
-	--errors 0 --files 1 \
-	--found 'ut.httrack.com/unicode-links/idna_bogus.html' \
-	httrack 'http://ut.httrack.com/unicode-links/idna_bogus.html' \
-	'-*' --robots=0
+    --errors 0 --files 1 \
+    --found 'ut.httrack.com/unicode-links/idna_bogus.html' \
+    httrack 'http://ut.httrack.com/unicode-links/idna_bogus.html' \
+    '-*' --robots=0
--- a/tests/11_crawl-international.test
+++ b/tests/11_crawl-international.test
@@ -3,67 +3,67 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 # unicode tests
 bash crawl-test.sh \
-	--errors 1 --files 10 \
-	--found ut.httrack.com/unicode-links/caf%a91bce.html \
-	--found ut.httrack.com/unicode-links/café30f4.html \
-	--found ut.httrack.com/unicode-links/café3860.html \
-	--found ut.httrack.com/unicode-links/café463e.html \
-	--found ut.httrack.com/unicode-links/café5e1f.html \
-	--found ut.httrack.com/unicode-links/café7b30.html \
-	--found ut.httrack.com/unicode-links/café8007.html \
-	--found ut.httrack.com/unicode-links/café9fa8.html \
-	--found ut.httrack.com/unicode-links/caféae52.html \
-	--found ut.httrack.com/unicode-links/caféc009.html \
-	--found ut.httrack.com/unicode-links/utf8.html \
-	httrack http://ut.httrack.com/unicode-links/utf8.html
+    --errors 1 --files 10 \
+    --found ut.httrack.com/unicode-links/caf%a91bce.html \
+    --found ut.httrack.com/unicode-links/café30f4.html \
+    --found ut.httrack.com/unicode-links/café3860.html \
+    --found ut.httrack.com/unicode-links/café463e.html \
+    --found ut.httrack.com/unicode-links/café5e1f.html \
+    --found ut.httrack.com/unicode-links/café7b30.html \
+    --found ut.httrack.com/unicode-links/café8007.html \
+    --found ut.httrack.com/unicode-links/café9fa8.html \
+    --found ut.httrack.com/unicode-links/caféae52.html \
+    --found ut.httrack.com/unicode-links/caféc009.html \
+    --found ut.httrack.com/unicode-links/utf8.html \
+    httrack http://ut.httrack.com/unicode-links/utf8.html

 bash crawl-test.sh \
-	--errors 4 --files  7 \
-	--found ut.httrack.com/unicode-links/cafÃ©3860.html \
-	--found ut.httrack.com/unicode-links/cafÃ©9fa8.html \
-	--found ut.httrack.com/unicode-links/café30f4.html \
-	--found ut.httrack.com/unicode-links/café5e1f.html \
-	--found ut.httrack.com/unicode-links/café7b30.html \
-	--found ut.httrack.com/unicode-links/café8007.html \
-	--found ut.httrack.com/unicode-links/caf%e939bd.html \
-	--found ut.httrack.com/unicode-links/caf%e9ae52.html \
-	--found ut.httrack.com/unicode-links/caféaec2.html \
-	--found ut.httrack.com/unicode-links/caféfad6.html \
-	--found ut.httrack.com/unicode-links/default.html \
-	httrack http://ut.httrack.com/unicode-links/default.html
+    --errors 4 --files 7 \
+    --found ut.httrack.com/unicode-links/cafÃ©3860.html \
+    --found ut.httrack.com/unicode-links/cafÃ©9fa8.html \
+    --found ut.httrack.com/unicode-links/café30f4.html \
+    --found ut.httrack.com/unicode-links/café5e1f.html \
+    --found ut.httrack.com/unicode-links/café7b30.html \
+    --found ut.httrack.com/unicode-links/café8007.html \
+    --found ut.httrack.com/unicode-links/caf%e939bd.html \
+    --found ut.httrack.com/unicode-links/caf%e9ae52.html \
+    --found ut.httrack.com/unicode-links/caféaec2.html \
+    --found ut.httrack.com/unicode-links/caféfad6.html \
+    --found ut.httrack.com/unicode-links/default.html \
+    httrack http://ut.httrack.com/unicode-links/default.html

 bash crawl-test.sh \
-	--errors 2 --files  9 \
-	--found ut.httrack.com/unicode-links/caf%a9ae52.html \
-	--found ut.httrack.com/unicode-links/caf%a9bf59.html \
-	--found ut.httrack.com/unicode-links/café30f4.html \
-	--found ut.httrack.com/unicode-links/café3860.html \
-	--found ut.httrack.com/unicode-links/café5e1f.html \
-	--found ut.httrack.com/unicode-links/café647f.html \
-	--found ut.httrack.com/unicode-links/café7b30.html \
-	--found ut.httrack.com/unicode-links/café8007.html \
-	--found ut.httrack.com/unicode-links/caféaec2.html \
-	--found ut.httrack.com/unicode-links/caféfad6.html \
-	--found ut.httrack.com/unicode-links/iso88591.html \
-	httrack http://ut.httrack.com/unicode-links/iso88591.html
+    --errors 2 --files 9 \
+    --found ut.httrack.com/unicode-links/caf%a9ae52.html \
+    --found ut.httrack.com/unicode-links/caf%a9bf59.html \
+    --found ut.httrack.com/unicode-links/café30f4.html \
+    --found ut.httrack.com/unicode-links/café3860.html \
+    --found ut.httrack.com/unicode-links/café5e1f.html \
+    --found ut.httrack.com/unicode-links/café647f.html \
+    --found ut.httrack.com/unicode-links/café7b30.html \
+    --found ut.httrack.com/unicode-links/café8007.html \
+    --found ut.httrack.com/unicode-links/caféaec2.html \
+    --found ut.httrack.com/unicode-links/caféfad6.html \
+    --found ut.httrack.com/unicode-links/iso88591.html \
+    httrack http://ut.httrack.com/unicode-links/iso88591.html

 bash crawl-test.sh \
-	--errors 4 --files  9 \
-	--found ut.httrack.com/unicode-links/caf%a8%a6c72a.html \
-	--found ut.httrack.com/unicode-links/caf%a9bf59.html \
-	--found ut.httrack.com/unicode-links/café8007.html \
-	--found ut.httrack.com/unicode-links/cafébf43.html \
-	--found ut.httrack.com/unicode-links/cafédcd8.html \
-	--found ut.httrack.com/unicode-links/café2461.html \
-	--found ut.httrack.com/unicode-links/caf%a8%a61bce.html \
-	--found ut.httrack.com/unicode-links/caf%a9ae52.html \
-	--found ut.httrack.com/unicode-links/café7b30.html \
-	--found ut.httrack.com/unicode-links/café30f4.html \
-	--found ut.httrack.com/unicode-links/café5e1f.html \
-	--found ut.httrack.com/unicode-links/café3860.html \
-	--found ut.httrack.com/unicode-links/gb18030.html \
-	httrack http://ut.httrack.com/unicode-links/gb18030.html
+    --errors 4 --files 9 \
+    --found ut.httrack.com/unicode-links/caf%a8%a6c72a.html \
+    --found ut.httrack.com/unicode-links/caf%a9bf59.html \
+    --found ut.httrack.com/unicode-links/café8007.html \
+    --found ut.httrack.com/unicode-links/cafébf43.html \
+    --found ut.httrack.com/unicode-links/cafédcd8.html \
+    --found ut.httrack.com/unicode-links/café2461.html \
+    --found ut.httrack.com/unicode-links/caf%a8%a61bce.html \
+    --found ut.httrack.com/unicode-links/caf%a9ae52.html \
+    --found ut.httrack.com/unicode-links/café7b30.html \
+    --found ut.httrack.com/unicode-links/café30f4.html \
+    --found ut.httrack.com/unicode-links/café5e1f.html \
+    --found ut.httrack.com/unicode-links/café3860.html \
+    --found ut.httrack.com/unicode-links/gb18030.html \
+    httrack http://ut.httrack.com/unicode-links/gb18030.html
--- a/tests/11_crawl-longurl.test
+++ b/tests/11_crawl-longurl.test
@@ -3,10 +3,10 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 # http://code.google.com/p/httrack/issues/detail?id=42&can=1
 # we expect only 2 errors because the other links are too long (to be modified if suitable)
 bash crawl-test.sh --errors 2 --files 1 \
-	--found ut.httrack.com/overflow/longquerywithaccents.html \
-	httrack http://ut.httrack.com/overflow/longquerywithaccents.php
+    --found ut.httrack.com/overflow/longquerywithaccents.html \
+    httrack http://ut.httrack.com/overflow/longquerywithaccents.php
--- a/tests/11_crawl-parsing.test
+++ b/tests/11_crawl-parsing.test
@@ -3,45 +3,45 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 # http://code.google.com/p/httrack/issues/detail?id=4&can=1
 bash crawl-test.sh --errors 0 --files 4 \
-	--found ut.httrack.com/parsing/back5e1f.gif \
-	--found ut.httrack.com/parsing/events.html \
-	--found ut.httrack.com/parsing/fade230f4.gif \
-	--found ut.httrack.com/parsing/fade3860.gif \
-	httrack http://ut.httrack.com/parsing/events.html
+    --found ut.httrack.com/parsing/back5e1f.gif \
+    --found ut.httrack.com/parsing/events.html \
+    --found ut.httrack.com/parsing/fade230f4.gif \
+    --found ut.httrack.com/parsing/fade3860.gif \
+    httrack http://ut.httrack.com/parsing/events.html

 # http://code.google.com/p/httrack/issues/detail?id=2&can=1
 bash crawl-test.sh --errors 0 --files 3 \
-	--found ut.httrack.com/parsing/background-image.css \
-	--found ut.httrack.com/parsing/background-image.html \
-	--found ut.httrack.com/parsing/fade.gif \
-	httrack http://ut.httrack.com/parsing/background-image.html
+    --found ut.httrack.com/parsing/background-image.css \
+    --found ut.httrack.com/parsing/background-image.html \
+    --found ut.httrack.com/parsing/fade.gif \
+    httrack http://ut.httrack.com/parsing/background-image.html

 # javascript parsing
 bash crawl-test.sh --errors 0 --files 3 \
-	--found ut.httrack.com/parsing/back.gif \
-	--found ut.httrack.com/parsing/fade.gif \
-	--found ut.httrack.com/parsing/javascript.html \
-	httrack http://ut.httrack.com/parsing/javascript.html
+    --found ut.httrack.com/parsing/back.gif \
+    --found ut.httrack.com/parsing/fade.gif \
+    --found ut.httrack.com/parsing/javascript.html \
+    httrack http://ut.httrack.com/parsing/javascript.html

 # handling of + before query string
 bash crawl-test.sh --errors 0 --files 6 \
-	--found ut.httrack.com/parsing/escaping.html \
-	--found "ut.httrack.com/parsing/foo bar30f4.html" \
-	--found "ut.httrack.com/parsing/foo bar5e1f.html" \
-	--found "ut.httrack.com/parsing/foo+bar+plus3860.html" \
-	--found "ut.httrack.com/parsing/foo barae52.html" \
-	--found "ut.httrack.com/parsing/foo bar7b30.html" \
-	httrack http://ut.httrack.com/parsing/escaping.html
+    --found ut.httrack.com/parsing/escaping.html \
+    --found "ut.httrack.com/parsing/foo bar30f4.html" \
+    --found "ut.httrack.com/parsing/foo bar5e1f.html" \
+    --found "ut.httrack.com/parsing/foo+bar+plus3860.html" \
+    --found "ut.httrack.com/parsing/foo barae52.html" \
+    --found "ut.httrack.com/parsing/foo bar7b30.html" \
+    httrack http://ut.httrack.com/parsing/escaping.html

 # handling of # encoded in filename
 # see http://code.google.com/p/httrack/issues/detail?id=25
 bash crawl-test.sh --errors 2 --files 4 \
-	--found "ut.httrack.com/parsing/escaping2.html" \
-	--found "ut.httrack.com/parsing/++foo++bar++plus++.html" \
-	--found "ut.httrack.com/parsing/foo#bar#.html" \
-	--found "ut.httrack.com/parsing/foo bar.html" \
-	httrack http://ut.httrack.com/parsing/escaping2.html
+    --found "ut.httrack.com/parsing/escaping2.html" \
+    --found "ut.httrack.com/parsing/++foo++bar++plus++.html" \
+    --found "ut.httrack.com/parsing/foo#bar#.html" \
+    --found "ut.httrack.com/parsing/foo bar.html" \
+    httrack http://ut.httrack.com/parsing/escaping2.html
--- a/tests/12_crawl_https.test
+++ b/tests/12_crawl_https.test
@@ -3,11 +3,11 @@

 set -euo pipefail

-bash check-network.sh ||  ! echo "skipping online unit tests" || exit 77
+bash check-network.sh || ! echo "skipping online unit tests" || exit 77

 if test "${HTTPS_SUPPORT:-}" == "no"; then
-	echo "no https support compiled, skipping"
-	exit 77
+    echo "no https support compiled, skipping"
+    exit 77
 fi

 bash crawl-test.sh --errors 0 --files 5 httrack https://ut.httrack.com/simple/basic.html
--- a/tests/13_crawl_proxy_https.test
+++ b/tests/13_crawl_proxy_https.test
@@ -0,0 +1,136 @@
+#!/bin/bash
+#
+# Issue #85: an https crawl must go through the configured proxy (CONNECT
+# tunnel), not bypass it and hit the origin directly. Fully local: a self-signed
+# TLS origin plus a logging CONNECT proxy, so no network access is needed.
+
+set -euo pipefail
+
+: "${top_srcdir:=..}"
+
+if test "${HTTPS_SUPPORT:-}" == "no"; then
+    echo "no https support compiled, skipping"
+    exit 77
+fi
+if ! command -v python3 >/dev/null 2>&1 || ! command -v openssl >/dev/null 2>&1; then
+    echo "python3/openssl missing, skipping"
+    exit 77
+fi
+
+server="$top_srcdir/tests/proxy-https-server.py"
+tmpdir=$(mktemp -d)
+pids=
+
+cleanup() {
+    for pid in $pids; do
+        kill "$pid" 2>/dev/null || true
+    done
+    rm -rf "$tmpdir"
+}
+trap cleanup EXIT
+
+# self-signed cert for the local TLS origin (httrack does not verify certs)
+openssl req -x509 -newkey rsa:2048 -keyout "$tmpdir/key.pem" \
+    -out "$tmpdir/cert.pem" -days 2 -nodes -subj "/CN=127.0.0.1" \
+    >/dev/null 2>&1
+cat "$tmpdir/key.pem" "$tmpdir/cert.pem" >"$tmpdir/both.pem"
+
+# start_server <logdir> <mode>: launches a proxy+origin pair, sets $origin_port
+# and $proxy_port from its announced ephemeral ports.
+start_server() {
+    local dir="$1" mode="$2" ports
+    mkdir -p "$dir"
+    ports="$dir/ports.txt"
+    python3 "$server" "$tmpdir/both.pem" "$dir" "$mode" \
+        >"$ports" 2>"$dir/server.err" &
+    pids="$pids $!"
+    for _ in $(seq 1 100); do
+        grep -q "^ready" "$ports" 2>/dev/null && break
+        sleep 0.1
+    done
+    grep -q "^ready" "$ports" 2>/dev/null || {
+        echo "server ($mode) did not start" >&2
+        cat "$dir/server.err" >&2
+        exit 1
+    }
+    origin_port=$(awk '/^ORIGIN/{print $2}' "$ports")
+    proxy_port=$(awk '/^PROXY/{print $2}' "$ports")
+}
+
+# Run httrack, but kill it after a deadline so a hang (e.g. a missing bound on
+# the proxy response) surfaces as the kill code $HANG_RC instead of stalling the
+# whole job. A portable stand-in for `timeout`, which macOS lacks.
+HANG_RC=137 # 128 + SIGKILL
+run_crawl() {
+    local out="$1" proxy="$2" port="$3"
+    rm -rf "$out"
+    httrack "https://127.0.0.1:${port}/" --proxy "$proxy" \
+        -O "$out" -r1 -s0 --timeout=10 >"$out.log" 2>&1 &
+    local pid=$!
+    (sleep 60 && kill -9 "$pid" 2>/dev/null) &
+    local guard=$!
+    local rc=0
+    wait "$pid" 2>/dev/null || rc=$?
+    kill "$guard" 2>/dev/null || true
+    wait "$guard" 2>/dev/null || true
+    return "$rc"
+}
+
+# --- working proxy ----------------------------------------------------------
+ok="$tmpdir/ok"
+start_server "$ok" ok
+
+# 1. page retrieved AND the proxy saw a CONNECT to the origin
+run_crawl "$ok/out" "127.0.0.1:${proxy_port}" "$origin_port"
+grep -rq "ORIGIN-PAGE-85" "$ok/out" || {
+    echo "FAIL: origin page not downloaded through proxy" >&2
+    cat "$ok/out.log" >&2
+    exit 1
+}
+grep -q "^CONNECT 127.0.0.1:${origin_port} " "$ok/proxy.log" || {
+    echo "FAIL: proxy never received a CONNECT (https bypassed the proxy)" >&2
+    cat "$ok/proxy.log" >&2
+    exit 1
+}
+echo "OK: https tunneled through proxy via CONNECT"
+
+# 2. authenticated proxy: creds ride the CONNECT, and NEVER reach the origin
+: >"$ok/proxy.log"
+: >"$ok/origin-headers.log"
+run_crawl "$ok/out2" "user:secret@127.0.0.1:${proxy_port}" "$origin_port"
+grep -rq "ORIGIN-PAGE-85" "$ok/out2" || {
+    echo "FAIL: origin page not downloaded through authenticated proxy" >&2
+    exit 1
+}
+got=$(awk '/^AUTH Basic /{print $3}' "$ok/proxy.log" | head -1)
+# base64("user:secret"); compared as a literal to stay portable (no base64 -d,
+# which differs between GNU and BSD)
+test "$got" == "dXNlcjpzZWNyZXQ=" || {
+    echo "FAIL: Proxy-Authorization not carried on CONNECT (got '$got')" >&2
+    cat "$ok/proxy.log" >&2
+    exit 1
+}
+if grep -qi "proxy-authorization" "$ok/origin-headers.log"; then
+    echo "FAIL: proxy credentials leaked to the origin through the tunnel" >&2
+    cat "$ok/origin-headers.log" >&2
+    exit 1
+fi
+echo "OK: proxy credentials carried on CONNECT, not leaked to origin"
+
+# --- hostile proxy ----------------------------------------------------------
+# A proxy that answers 200 then streams headers forever must not hang the crawl:
+# the client bounds the response. run_crawl kills a hung httrack after 60s, so a
+# missing bound surfaces as $HANG_RC here.
+flood="$tmpdir/flood"
+start_server "$flood" flood
+rc=0
+run_crawl "$flood/out" "127.0.0.1:${proxy_port}" "$origin_port" || rc=$?
+test "$rc" -ne "$HANG_RC" || {
+    echo "FAIL: crawl hung on a flooding proxy (bounded read missing)" >&2
+    exit 1
+}
+grep -rq "ORIGIN-PAGE-85" "$flood/out" 2>/dev/null && {
+    echo "FAIL: flooding proxy unexpectedly served the page" >&2
+    exit 1
+}
+echo "OK: bounded proxy response, no hang on a flooding proxy"
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -2,6 +2,7 @@
 # explicitly: automake does not expand wildcards in EXTRA_DIST, so a glob would
 # silently drop it from the dist tarball and break "make distcheck".
 EXTRA_DIST = $(TESTS) crawl-test.sh run-all-tests.sh check-network.sh \
+	proxy-https-server.py \
 	fixtures/cache-golden/hts-cache/new.zip

 TESTS_ENVIRONMENT =
@@ -24,6 +25,7 @@ TESTS = \
 	01_engine-cache-golden.test \
 	01_engine-charset.test \
 	01_engine-cmdline.test \
+	01_engine-cookies.test \
 	01_engine-copyopt.test \
 	01_engine-doitlog.test \
 	01_engine-entities.test \
@@ -33,6 +35,7 @@ TESTS = \
 	01_engine-mime.test \
 	01_engine-parse.test \
 	01_engine-rcfile.test \
+	01_engine-relative.test \
 	01_engine-simplify.test \
 	01_engine-strsafe.test \
 	02_manpage-regen.test \
@@ -43,6 +46,7 @@ TESTS = \
 	11_crawl-international.test \
 	11_crawl-longurl.test \
 	11_crawl-parsing.test \
-	12_crawl_https.test
+	12_crawl_https.test \
+	13_crawl_proxy_https.test

 CLEANFILES = check-network_sh.cache
--- a/tests/check-network.sh
+++ b/tests/check-network.sh
@@ -6,39 +6,39 @@

 # do not enable online tests (./configure --disable-online-unit-tests)
 if test "$ONLINE_UNIT_TESTS" == "no"; then
-echo "online tests are disabled" >&2
-exit 1
+    echo "online tests are disabled" >&2
+    exit 1

 # enable online tests (--enable-online-unit-tests)
 elif test "$ONLINE_UNIT_TESTS" == "yes"; then
-exit 0
+    exit 0

 # check if online tests are reachable
 else

-# test url
-url=http://ut.httrack.com/enabled
+    # test url
+    url=http://ut.httrack.com/enabled

-# cache file name
-cache=check-network_sh.cache
+    # cache file name
+    cache=check-network_sh.cache

-# cached result ?
-if test -f $cache ; then
-	if grep -q "ok" $cache ; then
-		exit 0
-	else
-		echo "online tests are disabled (cached)" >&2
-		exit 1
-	fi
+    # cached result ?
+    if test -f $cache; then
+        if grep -q "ok" $cache; then
+            exit 0
+        else
+            echo "online tests are disabled (cached)" >&2
+            exit 1
+        fi

-# fetch single file
-elif bash crawl-test.sh --errors 0 --files 1 httrack --timeout=3 --max-time=3 "$url" 2>/dev/null >/dev/null ; then
-	echo "ok" > $cache
-	exit 0
-else
-	echo "error" > $cache
-	echo "online tests are disabled (auto)" >&2
-	exit 1
-fi
+    # fetch single file
+    elif bash crawl-test.sh --errors 0 --files 1 httrack --timeout=3 --max-time=3 "$url" 2>/dev/null >/dev/null; then
+        echo "ok" >$cache
+        exit 0
+    else
+        echo "error" >$cache
+        echo "online tests are disabled (auto)" >&2
+        exit 1
+    fi

 fi
--- a/tests/crawl-test.sh
+++ b/tests/crawl-test.sh
@@ -2,185 +2,184 @@
 #

 function warning {
-  echo "** $*" >&2
-  return 0
+    echo "** $*" >&2
+    return 0
 }

 function die {
-  warning "$*"
-  exit 1
+    warning "$*"
+    exit 1
 }

 function debug {
-  if test -n "$verbose"; then
-    echo "$*" >&2
-  fi
+    if test -n "$verbose"; then
+        echo "$*" >&2
+    fi
 }

 function info {
-  printf "[$*] ..\t" >&2
+    printf '[%s] ..\t' "$*" >&2
 }

 function result {
-  echo "$*" >&2
+    echo "$*" >&2
 }

 function cleanup {
-  debug "cleaning function called"
-  if test -n "$tmpdir"; then
-    if test -d "$tmpdir"; then
-      if test -z "$nopurge"; then
-        debug "cleaning up $tmpdir"
-        rm -rf "$tmpdir"
-      fi
+    debug "cleaning function called"
+    if test -n "$tmpdir"; then
+        if test -d "$tmpdir"; then
+            if test -z "$nopurge"; then
+                debug "cleaning up $tmpdir"
+                rm -rf "$tmpdir"
+            fi
+        fi
+    fi
+    if test -n "$crawlpid"; then
+        debug "killing $crawlpid"
+        kill -9 "$crawlpid"
+        crawlpid=
     fi
-  fi
-  if test -n "$crawlpid"; then
-    debug "killing $crawlpid"
-    kill -9 "$crawlpid"
-    crawlpid=
-  fi
 }

 function usage {
-  cat << EOF
+    cat <<EOF
 usage: $0
 EOF
 }

 function assert_equals {
-  info "$1"
-  if test ! "$2" == "$3"; then
-    result "expected '$2', got '$3'"
-    exit 1
-  else
-    result "OK ($2)"
-  fi
+    info "$1"
+    if test ! "$2" == "$3"; then
+        result "expected '$2', got '$3'"
+        exit 1
+    else
+        result "OK ($2)"
+    fi
 }

 function start-crawl {
-  # parse args
-  pos=1
-  while test "$#" -ge "$pos" ; do
-    case "${!pos}" in
-    --debug)
-      verbose=1
-      ;;
-    --no-purge|--summary|--print-files)
-      ;;
-    --errors|--files|--found|--not-found|--directory)
-      pos=$[${pos}+1]
-      test "$#" -ge "$pos" || warning "missing argument" || return 1
-      ;;
-    httrack)
-      pos=$[${pos}+1]
-      break;
-      ;;
-    *)
-      warning "unrecognized option ${!pos}"
-      return 1
-      ;;
-    esac
-    pos=$[${pos}+1]
-  done
-  debug "remaining args: ${@:${pos}}"
+    # parse args
+    pos=1
+    while test "$#" -ge "$pos"; do
+        case "${!pos}" in
+        --debug)
+            verbose=1
+            ;;
+        --no-purge | --summary | --print-files) ;;
+        --errors | --files | --found | --not-found | --directory)
+            pos=$((pos + 1))
+            test "$#" -ge "$pos" || warning "missing argument" || return 1
+            ;;
+        httrack)
+            pos=$((pos + 1))
+            break
+            ;;
+        *)
+            warning "unrecognized option ${!pos}"
+            return 1
+            ;;
+        esac
+        pos=$((pos + 1))
+    done
+    debug "remaining args: ${*:pos}"

-  # ut/ won't exceed 2 minutes
-  moreargs="--quiet --max-time=120 --timeout=30 --connection-per-second=5"
+    # ut/ won't exceed 2 minutes
+    moreargs=(--quiet --max-time=120 --timeout=30 --connection-per-second=5)

-  # proxy environment ?
-  if test -n "$http_proxy"; then
-    moreargs="$moreargs --proxy $http_proxy"
-  fi
+    # proxy environment ?
+    if test -n "${http_proxy:-}"; then
+        moreargs+=(--proxy "$http_proxy")
+    fi

-  test -n "$tmpdir" || ! warning "no tmpdir" || return 1
-  tmp="${tmpdir}/crawl"
-  rm -rf "$tmp"
-  mkdir "$tmp" || ! warning "could not create $tmp" || return 1
-
-  which httrack >/dev/null || ! warning "could not find httrack" || return 1
-  ver=$(httrack -O /dev/null --version | sed -e 's/HTTrack version //')
-  test -n "$ver" || ! warning "could not run httrack" || return 1
-
-  # start crawl
-  log="${tmp}/log"
-  debug starting httrack -O "${tmp}" ${moreargs} ${@:${pos}}
-  info "running httrack ${@:${pos}}"
-  httrack -O "${tmp}" --user-agent="httrack $ver ut ($(uname -omrs))" ${moreargs} ${@:${pos}} >"${log}" 2>&1 &
-  crawlpid="$!"
-  debug "started cralwer on pid $crawlpid"
-  wait "$crawlpid"
-  result="$?"
-  crawlpid=
-  test "$result" -eq 0 || ! result "error code $result" || return 1
-  result "OK"
-  grep -iE "^[0-9\:]*[[:space:]]Error:" "${tmp}/hts-log.txt" >&2
-
-  # now audit
-  while test "$#" -gt 0; do
-    case "$1" in
-    --no-purge)
-      nopurge=1
-      ;;
-    --summary)
-      grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt"
-      ;;
-    --print-files)
-      find "${tmp}" -mindepth 1 -type f
-      ;;
-    --errors)
-      shift
-      assert_equals "checking errors" "$1" "$(grep -iEc "^[0-9\:]*[[:space:]]Error:" "${tmp}/hts-log.txt")"
-      ;;
-    --found)
-      shift
-      info "checking for $1"
-      if test -f "${tmp}/$1" ; then
-        result "OK"
-      else
-        result "not found"
-        exit 1
-      fi
-      ;;
-    --not-found)
-      shift
-      info "checking for $1"
-      if test -f "${tmp}/$1" ; then
-        result "OK"
-      else
-        result "not found"
-        exit 1
-      fi
-      ;;
-    --directory)
-      shift
-      info "checking for $1"
-      if test -d "${tmp}/$1" ; then
-        result "OK"
-      else
-        result "not found"
-        exit 1
-      fi
-      ;;
-    --files)
-      shift
-      nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt" \
-        | sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g')
-      assert_equals "checking files" "$1" "$nFiles"
-      ;;
-    httrack)
-      break;
-      ;;
-    esac
-    shift
-  done
-
-  # cleanup
-  if test -z "$nopurge"; then
+    test -n "$tmpdir" || ! warning "no tmpdir" || return 1
+    tmp="${tmpdir}/crawl"
     rm -rf "$tmp"
-  else
-    tmpdir=
-  fi
+    mkdir "$tmp" || ! warning "could not create $tmp" || return 1
+
+    which httrack >/dev/null || ! warning "could not find httrack" || return 1
+    ver=$(httrack -O /dev/null --version | sed -e 's/HTTrack version //')
+    test -n "$ver" || ! warning "could not run httrack" || return 1
+
+    # start crawl
+    log="${tmp}/log"
+    debug starting httrack -O "${tmp}" "${moreargs[@]}" "${@:pos}"
+    info "running httrack ${*:pos}"
+    httrack -O "${tmp}" --user-agent="httrack $ver ut ($(uname -omrs))" "${moreargs[@]}" "${@:pos}" >"${log}" 2>&1 &
+    crawlpid="$!"
+    debug "started crawler on pid $crawlpid"
+    wait "$crawlpid"
+    result="$?"
+    crawlpid=
+    test "$result" -eq 0 || ! result "error code $result" || return 1
+    result "OK"
+    grep -iE "^[0-9\:]*[[:space:]]Error:" "${tmp}/hts-log.txt" >&2
+
+    # now audit
+    while test "$#" -gt 0; do
+        case "$1" in
+        --no-purge)
+            nopurge=1
+            ;;
+        --summary)
+            grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt"
+            ;;
+        --print-files)
+            find "${tmp}" -mindepth 1 -type f
+            ;;
+        --errors)
+            shift
+            assert_equals "checking errors" "$1" "$(grep -iEc "^[0-9\:]*[[:space:]]Error:" "${tmp}/hts-log.txt")"
+            ;;
+        --found)
+            shift
+            info "checking for $1"
+            if test -f "${tmp}/$1"; then
+                result "OK"
+            else
+                result "not found"
+                exit 1
+            fi
+            ;;
+        --not-found)
+            shift
+            info "checking for $1"
+            if test ! -f "${tmp}/$1"; then
+                result "OK"
+            else
+                result "unexpectedly found"
+                exit 1
+            fi
+            ;;
+        --directory)
+            shift
+            info "checking for $1"
+            if test -d "${tmp}/$1"; then
+                result "OK"
+            else
+                result "not found"
+                exit 1
+            fi
+            ;;
+        --files)
+            shift
+            nFiles=$(grep -E "^HTTrack Website Copier/[^ ]* mirror complete in " "${tmp}/hts-log.txt" |
+                sed -e 's/.*[[:space:]]\([^ ]*\)[[:space:]]files written.*/\1/g')
+            assert_equals "checking files" "$1" "$nFiles"
+            ;;
+        httrack)
+            break
+            ;;
+        esac
+        shift
+    done
+
+    # cleanup
+    if test -z "$nopurge"; then
+        rm -rf "$tmp"
+    else
+        tmpdir=
+    fi
 }

 # check args
@@ -195,7 +194,7 @@ tmpdir=
 crawlpid=
 nopurge=
 verbose=
-trap "cleanup" 0 1 2 3 4 5 6 7 8 9 11 13 14 15 16 19 24 25
+trap cleanup EXIT HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV PIPE ALRM TERM STKFLT XCPU XFSZ

 # working directory
 tmpdir="${tmptopdir}/httrack_ut.$$"
--- a/tests/proxy-https-server.py
+++ b/tests/proxy-https-server.py
@@ -0,0 +1,151 @@
+#!/usr/bin/env python3
+"""Local CONNECT proxy + self-signed HTTPS origin for the issue #85 test.
+
+Starts a TLS origin server and an HTTP proxy that honours CONNECT, on ephemeral
+ports. Every request line the proxy receives (and any Proxy-Authorization) is
+appended to the proxy log; every header the origin receives over the tunnel is
+appended to the origin log. That lets the test assert both that an https crawl
+tunneled through the proxy and that proxy credentials never leaked to the origin.
+
+Proxy modes (argv[3], default "ok"):
+  ok    - honour CONNECT and tunnel to the origin
+  flood - answer 200 then stream headers forever with no blank line, to exercise
+          the client's bound on the proxy response (must not hang the crawl)
+
+Usage: proxy-https-server.py <cert.pem> <logdir> [mode]
+Prints "ORIGIN <port>", "PROXY <port>", then "ready" (one per line) on stdout.
+"""
+import http.server
+import os
+import socket
+import socketserver
+import ssl
+import sys
+import threading
+
+ORIGIN_BODY = b"<html><body>ORIGIN-PAGE-85</body></html>"
+PROXY_LOG = "proxy.log"
+ORIGIN_LOG = "origin-headers.log"
+
+
+def make_origin(logdir):
+    class Origin(http.server.BaseHTTPRequestHandler):
+        def do_GET(self):
+            with open(os.path.join(logdir, ORIGIN_LOG), "a") as handle:
+                for key in self.headers.keys():
+                    handle.write(key + "\n")
+            self.send_response(200)
+            self.send_header("Content-Type", "text/html")
+            self.send_header("Content-Length", str(len(ORIGIN_BODY)))
+            self.end_headers()
+            self.wfile.write(ORIGIN_BODY)
+
+        def log_message(self, *args):
+            pass
+
+    return Origin
+
+
+def start_origin(certfile, logdir):
+    httpd = socketserver.TCPServer(("127.0.0.1", 0), make_origin(logdir))
+    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
+    ctx.load_cert_chain(certfile)
+    httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
+    port = httpd.socket.getsockname()[1]
+    threading.Thread(target=httpd.serve_forever, daemon=True).start()
+    return port
+
+
+def pipe(src, dst):
+    try:
+        while True:
+            data = src.recv(65536)
+            if not data:
+                break
+            dst.sendall(data)
+    except OSError:
+        pass
+    finally:
+        for sock in (src, dst):
+            try:
+                sock.shutdown(socket.SHUT_RDWR)
+            except OSError:
+                pass
+
+
+def handle_client(conn, logdir, mode):
+    rfile = conn.makefile("rb")
+    request_line = rfile.readline().decode("latin-1").strip()
+    auth = None
+    while True:
+        line = rfile.readline().decode("latin-1")
+        if line in ("\r\n", "\n", ""):
+            break
+        key, _, value = line.partition(":")
+        if key.strip().lower() == "proxy-authorization":
+            auth = value.strip()
+    with open(os.path.join(logdir, PROXY_LOG), "a") as handle:
+        handle.write(request_line + "\n")
+        if auth is not None:
+            handle.write("AUTH " + auth + "\n")
+    parts = request_line.split()
+    if not (len(parts) >= 2 and parts[0] == "CONNECT"):
+        conn.sendall(b"HTTP/1.0 501 Not Implemented\r\n\r\n")
+        conn.close()
+        return
+    if mode == "flood":
+        # 200, then an endless header stream with no terminating blank line: the
+        # client must bound this and give up, not hang.
+        try:
+            conn.sendall(b"HTTP/1.0 200 Connection established\r\n")
+            while True:
+                conn.sendall(b"X-Pad: 0123456789\r\n")
+        except OSError:
+            pass
+        conn.close()
+        return
+    host, _, port = parts[1].partition(":")
+    try:
+        upstream = socket.create_connection((host, int(port or 443)))
+    except OSError:
+        conn.sendall(b"HTTP/1.0 502 Bad Gateway\r\n\r\n")
+        conn.close()
+        return
+    conn.sendall(b"HTTP/1.0 200 Connection established\r\n\r\n")
+    threading.Thread(target=pipe, args=(conn, upstream), daemon=True).start()
+    pipe(upstream, conn)
+
+
+def start_proxy(logdir, mode):
+    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+    srv.bind(("127.0.0.1", 0))
+    srv.listen(16)
+    port = srv.getsockname()[1]
+
+    def serve():
+        while True:
+            conn, _ = srv.accept()
+            threading.Thread(
+                target=handle_client, args=(conn, logdir, mode), daemon=True
+            ).start()
+
+    threading.Thread(target=serve, daemon=True).start()
+    return port
+
+
+def main():
+    certfile, logdir = sys.argv[1], sys.argv[2]
+    mode = sys.argv[3] if len(sys.argv) > 3 else "ok"
+    for name in (PROXY_LOG, ORIGIN_LOG):
+        open(os.path.join(logdir, name), "w").close()
+    origin_port = start_origin(certfile, logdir)
+    proxy_port = start_proxy(logdir, mode)
+    print("ORIGIN %d" % origin_port, flush=True)
+    print("PROXY %d" % proxy_port, flush=True)
+    print("ready", flush=True)
+    threading.Event().wait()
+
+
+if __name__ == "__main__":
+    main()
--- a/tests/run-all-tests.sh
+++ b/tests/run-all-tests.sh
@@ -2,19 +2,19 @@
 #

 error=0
-for i in *.test ; do
-	if bash $i ; then
-		echo "$i: passed" >&2
-	else
-		echo "$i: ERROR" >&2
-		error=$[${error}+1]
-	fi
+for i in *.test; do
+    if bash "$i"; then
+        echo "$i: passed" >&2
+    else
+        echo "$i: ERROR" >&2
+        error=$((error + 1))
+    fi
 done

 if test "$error" -eq 0; then
-	echo "all tests passed" >&2
+    echo "all tests passed" >&2
 else
-	echo "${error} test(s) failed" >&2
+    echo "${error} test(s) failed" >&2
 fi

 exit $error
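
The deadline guard that run_crawl uses in tests/13_crawl_proxy_https.test can be read on its own. The sketch below is a generic restatement under assumptions, not part of this change: `run_with_deadline` and its arguments are hypothetical names, and the one-second deadline is chosen only to keep the demo short. It starts the command in the background, arms a watchdog subshell that sends SIGKILL once the deadline passes, and reports the command's exit status, so a hang shows up as 137 (128 + SIGKILL).

```shell
#!/bin/bash
# Hypothetical helper illustrating the run_crawl guard; not taken from the tree.
# usage: run_with_deadline <seconds> <command> [args...]
run_with_deadline() {
    local deadline="$1"
    shift
    # Run the command in the background so the watchdog can be armed next to it
    "$@" &
    local pid=$!
    # Watchdog: kill the command if it is still alive after the deadline
    (sleep "$deadline" && kill -9 "$pid" 2>/dev/null) &
    local guard=$!
    local rc=0
    # Collect the exit status without tripping "set -e" in the caller
    wait "$pid" 2>/dev/null || rc=$?
    # The command is gone (finished or killed): disarm and reap the watchdog
    kill "$guard" 2>/dev/null || true
    wait "$guard" 2>/dev/null || true
    return "$rc"
}

# Demo: a command that outlives its deadline is reported as killed
rc=0
run_with_deadline 1 sleep 5 || rc=$?
echo "rc=$rc" # prints "rc=137"
```

Comparing the status against 137, as the flooding-proxy case does with $HANG_RC, separates "hung and was killed" from an ordinary failure, and avoids `timeout`, which the test comments note macOS lacks.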
Author	SHA1	Message	Date
Xavier Roche	fe7041ddbf	Address review: keep empty-PATH parity, fold the CI script list Review of the array refactor flagged one behaviour divergence: splitting PATH with `IFS=: read -ra` keeps empty fields (from doubled or leading colons) as "" elements, where the old `echo $PATH | tr : ' '` word-split dropped them, so the search loop would probe /htsserver. Skip the empty fields to restore exact parity. Also reflow the CI SHELL_SCRIPTS list as a folded block scalar, one entry per line and sorted, so it reads cleanly; the folded value is the same space-separated string. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 12:39:31 +02:00
Xavier Roche	f5543df1af	ci: lint every shell script with shellcheck and shfmt The lint job only covered a handful of scripts; bootstrap, build.sh, the generators, webhttrack, the CGI search helper and the crawl/run-all test harnesses went unchecked, and shfmt ran on three files. Now both linters run over the whole tracked shell tree, listed once in a job-level env var so the two steps stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:37:09 +02:00
Xavier Roche	fee30aa95d	Make every shell script shellcheck-clean Fix the shellcheck findings the shfmt pass left behind, all proven behaviour-preserving: - Quote single-value expansions, drop the redundant ${} in arithmetic, add read -r, and use printf '%s' instead of variables in format strings, across the generators, crawl-test.sh, run-all-tests.sh and search.sh. - crawl-test.sh / webhttrack: turn the deliberately word-split search lists into bash arrays (space-safe, no scattered disables) and replace the numeric trap signal lists with names, dropping the un-trappable KILL/STOP that bash silently ignored anyway. - search.sh: drop the bogus \" escapes that made grep search for a literal-quoted pattern. The generators are exercised by hand and ship their committed output (htscodepages.h, htsentities.h); a differential run on synthetic input confirms byte-identical output before and after. crawl-test.sh and webhttrack were run end to end against a local server / a faked install, the latter also proving the array search now survives spaces in paths. SC2153/SC2120 false positives carry a scoped disable with a reason. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:35:55 +02:00
Xavier Roche	f9f4700ee1	Reformat every shell script with shfmt -i 4 Mechanical pass: run shfmt -i 4 over the whole tracked shell tree (the test harness .test files, the regen generators, webhttrack, the CGI search helper, and the build/dist scripts) so they share one style. shfmt also normalised backticks to $(...) and $[..] to $((..)). No behaviour change: arithmetic is preserved exactly, non-ASCII bytes are untouched, and the full make check suite still passes. The tab indented .test files become 4-space indented, hence the wide diff. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:24:01 +02:00
Xavier Roche	f030fa21e3	Merge pull request #401 from xroche/fix/relative-path-dotdot-137-162 Test the relative-link engine; collapse ../ in file:// URLs	2026-06-20 11:15:53 +02:00
Xavier Roche	bdd1c1bc2c	Test the relative-link engine; collapse ../ in file:// URLs The ../-handling tickets #137 (embedded ../ in a URL) and #162 (cross-host "too many ../") do not reproduce on master or the released 3.49.x: the engine has resolved embedded, cross-host, out-of-scope and above-root ../ correctly since the 2012 import, and the released binary behaves identically. #137's actual breakage was a JS-generated iframe URL (httrack can't rewrite dynamically-built links); #162 is a long-gone Windows path quirk. The area was nearly untested, though, despite feeding both link rewriting and crawl-scope decisions: two trivial lienrelatif asserts, none for ident_url_relatif. Add a wide regression net via two hidden debug probes (-#l lienrelatif, -#i ident_url_relatif, mirroring -#1 fil_simplifie) driving tens of cases in tests/01_engine-relative.test (embedded/cross-host/sibling/ ancestor/above-root ../, query stripping, scheme handling), plus the missing fil_simplifie edge cases (absolute paths, root clamp, query freeze) in 01_engine-simplify.test. Expected values are computed by hand, not echoed. While covering it, fixed one real gap: the file:// branch of ident_url_absolute skipped the fil_simplifie its http sibling runs, so file:// URLs kept their ../ in adrfil->fil while the save path was already collapsed (htsname.c:1343). Collapsing it matches the other schemes, contains traversal at the file:// root, and dedups a/../b against b. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 11:14:28 +02:00
Xavier Roche	56665a268f	Merge pull request #400 from xroche/fix/css-url-paren-163 Encode parens in rewritten CSS url() so the value isn't truncated (#163)	2026-06-20 10:02:32 +02:00
Xavier Roche	2e948b9acd	htsparse: percent-encode parens in rewritten CSS url() (#163) A source url(...) whose target encodes '(' ')' as %28/%29 was rewritten with literal parens, because they are RFC2396 "mark" characters that the URI escaper (escape_uri_utf, mode 30) leaves alone. In an unquoted CSS url(...) the literal ')' closes the token early, so the browser mis-parses the value and drops the background image. Re-escape '(' and ')' back to %28/%29 when emitting the link, gated on the url() context (ending_p == ')'). The UA decodes them to the saved-on-disk name, so the reference still resolves. Quoted url("...") and ordinary HTML attributes keep their parens, matching prior behavior. Test in 01_engine-parse.test crawls a CSS fixture whose url() references a %20%28...%29 name and asserts the rewrite keeps the parens encoded; negative control confirmed (literal-paren output fails it). Closes #163 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-20 10:01:17 +02:00
Xavier Roche	cae11499f1	Merge pull request #399 from xroche/fix/js-string-falsepos-218 htsparse: don't treat XHR.open's method argument as a URL (#218)	2026-06-19 20:36:26 +02:00
Xavier Roche	02c7f4ebf6	htsparse: don't treat XHR.open's method argument as a URL (#218) The JavaScript URL detector matched `.open(` for window.open("url",...) and captured the first argument as a link. XMLHttpRequest.open(method, url) puts the HTTP method first, so `xhr.open("GET", "ajax_info.txt")` turned "GET" into a bogus link, rewritten to "GET.html" on a live server. Reject a first argument that is exactly an HTTP method, mirroring the existing ensure_not_mime guard. window.open(url) is unaffected; the real XHR url (the second argument) is still picked up by the dirty parser. Closes #218 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 20:27:04 +02:00
Xavier Roche	9070b44a70	Merge pull request #398 from xroche/fix/html-underflow-396 htsparse: fix buffer underflow reading *(html-1) at offset 0 (#396)	2026-06-19 19:55:40 +02:00
Xavier Roche	799c045061	htsparse: don't read (html-1) before the parse buffer (#396) The link detector's word-boundary guards dereference (html-1) to check the byte preceding a matched token. When the token sits at the very start of the parse buffer (html == r->adr), that reads one byte before the allocation: a heap-buffer-overflow under ASan, silent on a normal build. A stylesheet beginning with a url() token is enough to hit it. Route the three reachable guards (url(), location=, the makeindex /title check) through html_prevc(), which returns a space sentinel at the buffer start. Space is the right value for these tests: a token at offset 0 is at a word boundary, so it stays a valid match. The other *(html-1) sites only run after html has advanced past an opening tag or quote. Covers it with an offset-0 url() fixture in 01_engine-parse.test; without the fix it aborts at htsparse.c:1386 under the CI sanitizer job. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 19:44:25 +02:00
Xavier Roche	fb1ee3bf2e	Merge pull request #397 from xroche/fix/css-import-94 CSS @import: capture URLs that carry a media/supports/layer condition (#94)	2026-06-19 19:30:21 +02:00
Xavier Roche	6a08ca7d39	htsparse: bound the URL-end scan against a missing closing delimiter Reviewing the @import change, ASan flagged a pre-existing heap overflow: when a quoted/parenthesized link token has no closing delimiter before the buffer ends (truncated input such as `@import "x`, `@import "`, `url("x`), the scan stops at the terminating NUL, then `c += ndelim` steps past it and `while (c == ' ')` / the terminator test read out of bounds. Such input aborts under ASan on master. Skip the URL-end scan and capture when no closing delimiter was found (`c == '\0'` right after the scan); c never advances past the NUL. Well-formed tokens are unaffected. 01_engine-parse.test gains a truncated-@import fixture (the valid sibling import is still captured, the unterminated one is not) that trips the overflow under the CI ASan job, plus a check that an @import's trailing media/supports/layer condition survives the rewrite verbatim. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 19:25:39 +02:00
Xavier Roche	a8b491e509	htsparse: capture conditional CSS @import URLs (#94) A bare-string @import carrying a media/supports/layer condition, e.g. `@import "theme.css" screen;`, was dropped. The detector required the closing quote to be immediately followed by the statement terminator, so the trailing condition aborted the capture. The `url(...)` form already worked because it terminates at the paren. Two coupled defects in the inscript/CSS detector: - accept a whitespace-separated trailing condition after a quoted @import URL; - bound the captured URL at its last content char (b) instead of recomputing from the terminator. The old `c -= (ndelim + 1)` mishandled spaces skipped before the terminator, leaving the closing quote inside the range so the bogus-link guard aborted. That also silently broke `foo="url" ;` (a space before the semicolon) for every quoted detection, not only @import. 01_engine-parse.test gains a CSS @import section that crawls a .css directly; the conditioned cases are negative controls that fail without the fix. Closes #94 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 18:46:31 +02:00
Xavier Roche	a8e4bb3b81	Merge pull request #395 from xroche/fix/xmlns-false-links-191 Don't crawl xmlns namespace declarations	2026-06-19 18:28:23 +02:00
Xavier Roche	0145ec37a3	htsparse: don't crawl xmlns namespace declarations (#191) The "dirty parsing" heuristic accepts any tag attribute whose value looks like a URL unless the attribute is on the no-detect list. xmlns and xmlns:prefix declarations carry namespace URIs (xmlns:og="http://ogp.me/ns#", etc.) that are not resources, so httrack queued and fetched them, stalling the crawl on unrelated spec URLs. Reject xmlns/xmlns:prefix where the no-detect list is already consulted. 01_engine-parse.test grows a fixture with each form (default and prefixed) as the last attribute of its element, since the heuristic only inspects an attribute whose value is immediately followed by '>'; the targets are local file:// gifs so a regression actually downloads them (verified: reverting the guard fetches all three). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 18:24:55 +02:00
Xavier Roche	a80fab38ba	Merge pull request #394 from xroche/fix/proxy-https-connect-85 Tunnel https through the proxy via CONNECT (#85)	2026-06-19 18:03:31 +02:00
Xavier Roche	c52a524a63	htslib: bound the proxy CONNECT response; harden + cover review findings Follow-up to the CONNECT-tunnel change, from an adversarial review (the proxy response is hostile input: a malicious or MITM proxy controls every byte). - Bound the response read so a proxy cannot stall the single-threaded back_wait crawl: proxy_getline now fails on an over-long line instead of consuming it forever, the header drain is capped at 64 lines, and the send loop gives up rather than spin against a socket that reports writable but never accepts. - Size `authority` to hold any url_adr host (HTS_URLMAXSIZE*2) so an oversized hostname can't trip the abort-on-overflow buff helpers; grow `req` to match. - Reject control bytes in the CONNECT authority as a local backstop; today the CR/LF defense lives entirely upstream (escape_remove_control / header-line splitting). - Test: the origin now records the headers it receives, and the test asserts Proxy-Authorization never reaches the origin through the tunnel (the previous assertions couldn't see a leak). Added a flooding-proxy scenario that proves the crawl terminates instead of hanging on an unbounded response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 09:52:10 +02:00
Xavier Roche	1907621d37	htslib: tunnel https through the proxy via CONNECT (#85) httrack opened https connections straight to the origin even when a proxy was configured, so --proxy was silently ignored for https and the crawler used the real IP. http_xfopen bypassed the proxy for any https:// URL, because the absolute-URI proxy form it uses for http cannot carry https. Connect to the proxy instead and, once the TCP connection is up, open an HTTP CONNECT tunnel (http_proxy_tunnel) before the TLS handshake, so TLS runs end-to-end with the origin. Proxy credentials now ride the CONNECT request rather than the tunneled GET, where they would leak to the origin. The exchange is a bounded blocking read inside the back_wait connect path: no new async state, no struct/ABI change (the helpers stay visibility-hidden). Verified end-to-end by 13_crawl_proxy_https.test: it crawls a local self-signed https origin through a logging CONNECT proxy and asserts the proxy saw the CONNECT and that credentials ride it. The assertion fails on the pre-fix bypass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 08:43:56 +02:00
Xavier Roche	3b2d7afdaa	Merge pull request #393 from xroche/fix/empty-footer-doitlog-106 Keep empty quoted args when reloading doit.log (#106)	2026-06-19 08:13:19 +02:00
Xavier Roche	6ee539619e	htscoremain: keep empty quoted args when reloading doit.log (#106) An empty footer (-%F "") is written to hts-cache/doit.log correctly as the two-character token "", and next_token() unquotes it back to an empty string. But the doit.log reload loop only re-inserted a token when strnotempty(lastp), which dropped the empty one. With its argument gone, -%F absorbed the following token (or had none), so a no-url --continue/--update reprise misparsed and failed. Track whether the token started with a quote (before next_token() strips it in place) and keep it even when empty, so "" survives the round-trip. Whitespace gaps still produce no token, so spacing behavior is unchanged. 01_engine-doitlog.test gains a scenario that mirrors with -%F "" -r2, then on the no-url reprise checks the regenerated doit.log still round-trips the empty token -- probing the reader's rebuilt argv, not just that the reprise didn't crash. The trailing -r2 makes a dropped-token bug visible (it shifts into -%F's slot and panics) rather than a harmless run off the end of argv. Reverting only the guard makes the scenario fail (reprise exits 255). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-19 08:09:57 +02:00
Xavier Roche	fb098b27b4	Merge pull request #392 from xroche/fix/cookie-rfc6265-151 Drop $Version/$Path from the request Cookie header (#151)	2026-06-18 22:42:47 +02:00
Xavier Roche	5f6a3fb917	htslib: drop $Version/$Path from request Cookie header (#151) The request "Cookie:" header was built in the obsolete RFC 2965 style, emitting "$Version=1" before the first cookie and a "$Path=..." attribute after every value: Cookie: $Version=1; name=value; $Path=/; has_js=1; $Path=/ Servers expecting RFC 6265 treat $Version and $Path as stray cookies and reject or misread the request. Emit bare name=value pairs joined by "; ": Cookie: name=value; has_js=1 The cookie loop is factored out of http_sendhead into append_cookie_header (same logic, same buffer), with a thin http_cookie_header_selftest wrapper so the exact code path can be unit-tested. A new hidden "-#Q" subcommand builds the header for two same-domain cookies plus one on a different domain (which must be filtered out) and checks the output is the clean RFC 6265 form with no $Version/$Path and no cross-domain leak; driven by tests/01_engine-cookies.test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-18 22:12:28 +02:00
Xavier Roche	f9e676dbe3	Merge pull request #391 from xroche/feature/api-enum-callsites-savename83 htsopt: name the savename_83 enum and finish the call-site constant adoption	2026-06-18 21:43:34 +02:00
Xavier Roche	1b440c44b5	htsopt: name savename_83 enum and adopt enum constants at call sites Type opt->savename_83 as a new hts_savename_83 enum (LONG/DOS/ISO9660 = 0/1/2) and replace the remaining magic-number literals for the already- typed verbosedisplay and savename_delayed fields with their named enum constants across the engine. Behavior-preserving: every constant equals the literal it replaces, and a C enum is int-sized, so struct layout is unchanged (sizeof(httrackp) and offsetof(savename_83) are identical to origin/master, no soname bump). The -L option block is deliberately reflowed to clang-format style, which is what made the savename_83 retype tractable. Bitmask fields (travel/seeker/ getmode/parsejava/hostcontrol) intentionally stay int with named bit enums, per the existing flags-as-enum split. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Roche <roche@httrack.com>	2026-06-18 21:03:33 +02:00
Xavier Roche	ac6dd1a570	Merge pull request #390 from xroche/fix/copy-htsopt-unsigned-enum-guards copy_htsopt silently drops boolean option fields	2026-06-18 20:46:00 +02:00