Merge pull request #325 from xroche/docs/rfc2606-example-domains

docs: use www.example.com in examples; add html manual regen target
Merge pull request #324 from xroche/test/filter-escape-characterize
2026-06-13 22:04:07 +03:00 · 2026-06-13 10:41:24 +02:00 · 2026-06-13 10:17:24 +02:00 · 2026-06-13 10:15:45 +02:00 · 2026-06-13 10:12:09 +02:00 · 2026-06-13 10:05:40 +02:00
4 changed files with 65 additions and 43 deletions
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ http://www.httrack.com/

 ## Compile trunk release
 ```sh
-git clone https://github.com/xroche/httrack.git --recurse
+git clone https://github.com/xroche/httrack.git --recurse-submodules
 cd httrack
 ./configure --prefix=$HOME/usr && make -j8 && make install
 ```
--- a/html/fcguide.html
+++ b/html/fcguide.html
@@ -181,17 +181,17 @@ used for some time.

 <p align=justify> The rest of this manual is dedicated to detailing what
 you find in the help message and providing examples - lots and lots of
-examples...  Here is what you get (page by page - use <enter> to move to
+examples...  Here is what you get (page by page - use &lt;enter&gt; to move to
 the next page in the real program) if you type 'httrack --help':

 <pre>
 >httrack --help
 HTTrack version 3.03BETAo4 (compiled Jul  1 2001)
-	usage: ./httrack <URLs [-option] [+<FILTERs>] [-<FILTERs>]
+	usage: ./httrack &lt;URLs&gt; [-option] [+&lt;FILTERs&gt;] [-&lt;FILTERs&gt;]
 	with options listed below: (* is the default value)

 General options:
-  O  path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
+  O  path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path &lt;param&gt;)
 %O  top path if no path defined (-O path_mirror[,path_cache_and_logfiles])

 Action options:
@@ -202,7 +202,7 @@ Action options:
  Y   mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)

 Proxy options:
-  P  proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
+  P  proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy &lt;param&gt;)
 %f *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])

 Limits options:
@@ -227,7 +227,7 @@ Links options:
 %P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
  n  get non-html files 'near' an html file (ex: an image located outside) (--near)
  t  test all URLs (even forbidden ones) (--test)
- %L <file add all URL located in this text file (one URL per line) (--list <param>)
+ %L &lt;file&gt; add all URL located in this text file (one URL per line) (--list &lt;param&gt;)

 Build options:
  NN structure type (0 *original structure, 1+: see below) (--structure[=N])
@@ -248,12 +248,12 @@ Spider options:
 %h  force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
 %B  tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
 %s  update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
- %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume <param>)
+ %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume &lt;param&gt;)

 Browser ID:
-  F  user-agent field (-F "user-agent name") (--user-agent <param>)
- %F  footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer <param>)
- %l  preferred language (-%l "fr, en, jp, *" (--language <param>)
+  F  user-agent field (-F "user-agent name") (--user-agent &lt;param&gt;)
+ %F  footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer &lt;param&gt;)
+ %l  preferred language (-%l "fr, en, jp, *" (--language &lt;param&gt;)

 Log, index, cache
  C  create/use a cache for updates and retries (C0 no cache,C1 cache is prioritary,* C2 test update before) (--cache[=N])
@@ -303,8 +303,8 @@ Guru options: (do NOT use)
 #!  Execute a shell command (-#! "echo hello")

 Command-line specific options:
-  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
- %U run the engine with another id when called as root (-%U smith) (--user <param>)
+  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd &lt;param&gt;)
+ %U run the engine with another id when called as root (-%U smith) (--user &lt;param&gt;)

 Details: Option N
  N0 Site-structure (default)
@@ -340,14 +340,14 @@ Details: User-defined option N
  %[param] param variable in query string

 Shortcuts:
--mirror      <URLs *make a mirror of site(s) (default)
--get         <URLs  get the files indicated, do not seek other URLs (-qg)
--list   <text file  add all URL located in this text file (-%L)
--mirrorlinks <URLs  mirror all links in 1st level pages (-Y)
--testlinks   <URLs  test links in pages (-r1p0C0I0t)
--spider      <URLs  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite    <URLs  identical to --spider
--skeleton    <URLs  make a mirror, but gets only html files (-p1)
+--mirror      &lt;URLs&gt; *make a mirror of site(s) (default)
+--get         &lt;URLs&gt;  get the files indicated, do not seek other URLs (-qg)
+--list   &lt;text file&gt;  add all URL located in this text file (-%L)
+--mirrorlinks &lt;URLs&gt;  mirror all links in 1st level pages (-Y)
+--testlinks   &lt;URLs&gt;  test links in pages (-r1p0C0I0t)
+--spider      &lt;URLs&gt;  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
+--testsite    &lt;URLs&gt;  identical to --spider
+--skeleton    &lt;URLs&gt;  make a mirror, but gets only html files (-p1)
 --update              update a mirror, without confirmation (-iC2)
 --continue            continue a mirror, without confirmation (-iC1)

@@ -387,13 +387,13 @@ with examples... I will be here a while...
 <hr>
 <h2> Syntax </h2>

-<pre><b><i>httrack <URLs> [-option] [+<FILTERs>] [-<FILTERs>] </i></b></pre>
+<pre><b><i>httrack &lt;URLs&gt; [-option] [+&lt;FILTERs&gt;] [-&lt;FILTERs&gt;] </i></b></pre>

 <p align=justify> The syntax of httrack is quite simple.  You specify
-the URLs you wish to start the process from (<URLS>), any options you
+the URLs you wish to start the process from (&lt;URLS&gt;), any options you
 might want to add ([-option], any filters specifying places you should
-([+<FILTERs>]) and should not ([-<FILTERs>]) go, and end the command
-line by pressing <enter>.  Httrack then goes off and does your bidding.
+([+&lt;FILTERs&gt;]) and should not ([-&lt;FILTERs&gt;]) go, and end the command
+line by pressing &lt;enter&gt;.  Httrack then goes off and does your bidding.
 For example:

 <pre><b><i>
@@ -425,7 +425,7 @@ site. Specifically, the defauls are:
  pN priority mode: (* p3)  *3 save all files
  D  *can only go down into subdirs
  a  *stay on the same address
-  --mirror      <URLs> *make a mirror of site(s) (default)
+  --mirror      &lt;URLs&gt; *make a mirror of site(s) (default)
 </pre>

 <p align=justify> Here's what all of that means:
@@ -542,7 +542,7 @@ subdirectories of the starting directory to be investigated.
 search started are to be collected.  Other sites they point to are not
 to be imaged. 

-<pre><b><i>  --mirror      <URLs> *make a mirror of site(s) (default) </i></b></pre>
+<pre><b><i>  --mirror      &lt;URLs&gt; *make a mirror of site(s) (default) </i></b></pre>

 <p align=justify> This indicates that the program should try to make a
 copy of the site as well as it can. 
@@ -921,7 +921,7 @@ Links options:
 %P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use)
  n   get non-html files 'near' an html file (ex: an image located outside)
  t   test all URLs (even forbidden ones)
- %L <file> add all URL located in this text file (one URL per line)
+ %L &lt;file&gt; add all URL located in this text file (one URL per line)
 </i></b></pre>

 <p align=justify> The links options allow you to control what links are
@@ -1183,7 +1183,7 @@ Spider options:
 %h  force HTTP/1.0 requests (reduce update features, only for old servers or proxies)
 %B  tolerant requests (accept bogus responses on some servers, but not standard!)
 %s  update hacks: various hacks to limit re-transfers when updating
- %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume <param>)
+ %A  assume that a type (cgi,asp..) is always linked with a mime type (-%A php3=text/html) (--assume &lt;param&gt;)
 </i></b></pre>

 <p align=justify> By default, cookies are universally accepted and
@@ -1387,7 +1387,7 @@ web servers leave footprints in the browser.
 Browser ID:
  F  user-agent field (-F "user-agent name")
 %F  footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]"
- %l  preferred language (-%l "fr, en, jp, *" (--language <param>)
+ %l  preferred language (-%l "fr, en, jp, *" (--language &lt;param&gt;)
 </i></b></pre>

 <p align=justify> The user-agent field is used by browsers to determine
@@ -1799,7 +1799,7 @@ based authentication)

 <pre><b><i>
 Command-line specific options:
-  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
+  V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd &lt;param&gt;)
 </i></b></pre>

 <p align=justify> This option is very nice for a wide array of actions
@@ -1811,7 +1811,7 @@ httrack http://www.shoesizes.com/bob/ -O /tmp/shoesizes -V "/bin/echo \$0"
 </i></b></pre>

 <pre>
- %U run the engine with another id when called as root (-%U smith) (--user <param>)
+ %U run the engine with another id when called as root (-%U smith) (--user &lt;param&gt;)
 </pre>

 <p align=justify> Change the UID of the owner when running as r00t
@@ -1856,14 +1856,14 @@ of other options that are commonly used.

 <pre><b><i>
 Shortcuts:
--mirror      <URLs> *make a mirror of site(s) (default)
--get         <URLs>  get the files indicated, do not seek other URLs (-qg)
--list   <text file>  add all URL located in this text file (-%L)
--mirrorlinks <URLs>  mirror all links in 1st level pages (-Y)
--testlinks   <URLs>  test links in pages (-r1p0C0I0t)
--spider      <URLs>  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite    <URLs>  identical to --spider
--skeleton    <URLs>  make a mirror, but gets only html files (-p1)
+--mirror      &lt;URLs&gt; *make a mirror of site(s) (default)
+--get         &lt;URLs&gt;  get the files indicated, do not seek other URLs (-qg)
+--list   &lt;text file&gt;  add all URL located in this text file (-%L)
+--mirrorlinks &lt;URLs&gt;  mirror all links in 1st level pages (-Y)
+--testlinks   &lt;URLs&gt;  test links in pages (-r1p0C0I0t)
+--spider      &lt;URLs&gt;  spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
+--testsite    &lt;URLs&gt;  identical to --spider
+--skeleton    &lt;URLs&gt;  make a mirror, but gets only html files (-p1)
 --update              update a mirror, without confirmation (-iC2)
 --continue            continue a mirror, without confirmation (-iC1)
 --catchurl            create a temporary proxy to capture an URL or a form post URL
@@ -2019,15 +2019,15 @@ are in reverse priority order.  Here's an example:
        <td>no characters must be present after</a></td>
      </tr>
 	<tr>
-		<td> <b> <filter>*[&lt NN]</b></td>
+		<td> <b> &lt;filter&gt;*[&lt NN]</b></td>
 		<td> size less than NN Kbytes</td>
 	</tr>
 	<tr>
-		<td> <b> <filter>*[&gt PP]</b></td>
+		<td> <b> &lt;filter&gt;*[&gt PP]</b></td>
 		<td> size more than PP Kbytes</td>
 	</tr>
 	<tr>
-		<td> <b> <filter>*[&lt NN &gt PP]</b></td>
+		<td> <b> &lt;filter&gt;*[&lt NN &gt PP]</b></td>
 		<td> size less than NN Kbytes and more than PP Kbytes</td>
 	</tr>
    </table>
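
The fcguide.html hunks above all make the same kind of change: a literal placeholder such as `<URLs>` is rewritten as `&lt;URLs&gt;`. The sketch below illustrates why that matters. It uses Python's standard `html` and `html.parser` modules as a stand-in for a browser, which is an assumption made for illustration only. The helper names `TextOnly` and `visible_text` are hypothetical and are not part of HTTrack.

```python
import html
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text a reader would see; tags are dropped."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities are decoded
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


raw = "httrack <URLs> [-option]"
escaped = html.escape(raw, quote=False)

print(escaped)                # the form used after the fix
print(visible_text(raw))      # placeholder is parsed as an unknown tag
print(visible_text(escaped))  # placeholder survives as visible text
```

In this model the unescaped placeholder is consumed as a tag named `urls` and disappears from the visible text, while the escaped form decodes back to the original string.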
--- a/lang/Ukrainian.txt
+++ b/lang/Ukrainian.txt
@@ -7,7 +7,7 @@ uk
 LANGUAGE_AUTHOR
 Andrij Shevchuk (http://programy.com.ua, http://vic-info.com.ua) \r\n
 LANGUAGE_CHARSET
-ISO-8859-5
+windows-1251
 LANGUAGE_WINDOWSID
 Ukrainian
 OK
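
The relabeling above matters because the same bytes decode to different Cyrillic letters under the two charsets. Here is a minimal sketch, assuming Python's built-in `cp1251` and `iso8859_5` codecs. The sample word is an illustrative assumption and is not taken from lang/Ukrainian.txt.

```python
# Illustrative sample only; not a string from the language file.
sample = "Українська"

# Bytes as they would be stored in a windows-1251 encoded file.
stored = sample.encode("cp1251")

# Decoding with the correct label round-trips the text.
as_cp1251 = stored.decode("cp1251")

# Decoding with the old label succeeds without error but yields other letters,
# because the two charsets place the Cyrillic alphabet at different offsets.
as_iso = stored.decode("iso8859_5")

print(as_cp1251 == sample)  # correct label preserves the text
print(as_iso == sample)     # wrong label does not
```

Because the wrong decode raises no error, a mislabeled file fails silently, which is consistent with the "decodes them to garbage" description in the commit message.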
--- a/tests/01_engine-filter.test
+++ b/tests/01_engine-filter.test
@@ -47,3 +47,25 @@ match '*foo*bar' 'foozbar'

 # '?' is the query-string marker, not a single-char wildcard
 nomatch 'a?c' 'abc'
+
+# backslash escapes a metacharacter inside a class so it is matched literally.
+# Quirk: the decoder also adds the backslash itself to the set, so '\X' matches
+# both X and '\'. These assertions pin that behavior.
+match '*[\*]' '*'
+match '*[\*]' "\\"
+nomatch '*[\*]' 'a'
+match '*[\\]' "\\"
+nomatch '*[\\]' 'a'
+match '*[\[]' '['
+match '*[\[]' "\\"
+nomatch '*[\[]' 'a'
+
+# A literal ']' cannot be a class member: the class parser stops at the first
+# ']', escaped or not. So '*[\[\]]' does NOT mean "the [ or ] character" as the
+# filter guide claims (GitHub #148); it parses as the class {'[','\'} followed
+# by a trailing literal ']'. These assertions document the current (buggy)
+# behavior so any future matcher fix is a deliberate, visible change.
+nomatch '*[\[\]]' '['   # not matched, despite the docs
+match '*[\[\]]' ']'     # only via the empty class-match + trailing ']'
+match '*[\[\]]' '[]'    # one of {'[','\'} then the trailing ']'
+nomatch '*[\[\]]' '[]x'
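
The two quirks pinned by the assertions above can be reproduced with a very small model. The sketch below is not HTTrack's matcher. It is a simplified illustration that assumes `*[...]` matches zero or more characters from the class, that the class ends at the first `]`, and that every character before that `]` (backslashes included) becomes a member. Ranges, commas, and all other filter syntax are deliberately left out, and the function names are hypothetical.

```python
def parse_class(pattern, start):
    """pattern[start] is '['. Return (members, index just past the first ']')."""
    end = pattern.index("]", start + 1)      # first ']', escaped or not
    members = set(pattern[start + 1:end])    # the backslash itself ends up in the set
    return members, end + 1


def matches(pattern, text):
    """Toy matcher: literals plus '*[class]' (zero or more class characters)."""
    if not pattern:
        return not text
    if pattern.startswith("*["):
        members, rest = parse_class(pattern, 1)
        i = 0
        while True:
            # Try the remainder of the pattern after consuming i class characters.
            if matches(pattern[rest:], text[i:]):
                return True
            if i < len(text) and text[i] in members:
                i += 1
            else:
                return False
    # Plain literal character.
    return bool(text) and pattern[0] == text[0] and matches(pattern[1:], text[1:])


print(matches(r"*[\*]", "\\"))    # backslash is a class member in this model
print(matches(r"*[\[\]]", "["))   # needs the trailing literal ']'
print(matches(r"*[\[\]]", "[]"))  # class character, then the trailing ']'
```

Under these assumptions the model agrees with every `match` and `nomatch` line in the hunk, which suggests the documented behavior follows from the two parsing rules alone. A matcher that let `\]` stay inside the class would change the last group of results.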
Author	SHA1	Message	Date
Xavier Roche	5351e96d71	Merge pull request #325 from xroche/docs/rfc2606-example-domains docs: use www.example.com in examples; add html manual regen target	2026-06-13 10:41:24 +02:00
Xavier Roche	a0bf50f6b1	Merge pull request #324 from xroche/test/filter-escape-characterize test: characterize wildcard class escape behavior	2026-06-13 10:17:24 +02:00
Xavier Roche	794404bba2	test: characterize wildcard class escape behavior Add -#0 self-test cases for backslash escapes inside a '[...]' class. They pin two quirks of the current decoder: '\X' matches both X and the backslash itself, and a literal ']' cannot be a class member because the parser stops at the first ']' (escaped or not). The latter is why the filter guide's '[\[\]]' = "the [ or ] character" claim is wrong (#148): it parses as the class {[,\} plus a trailing literal ']'. These tests lock the behavior down so a later matcher fix is a deliberate change. refs #148	2026-06-13 10:15:45 +02:00
Xavier Roche	82d08aaeaf	Merge pull request #323 from xroche/fix/doc-lang-nits docs: fix help-guide placeholders, README clone flag, Ukrainian charset	2026-06-13 10:12:09 +02:00
Xavier Roche	459f06e758	docs: fix help-guide placeholders, README clone flag, Ukrainian charset Escape the literal <URLs>, <FILTERs>, <param>, <filter>, <file> and related placeholders in fcguide.html so they render instead of being swallowed as unknown HTML tags; several were also missing their closing '>'. Use --recurse-submodules in the README clone command. Relabel lang/Ukrainian.txt as windows-1251, which is what its bytes actually are (ISO-8859-5 decodes them to garbage). closes #132, closes #103, closes #167	2026-06-13 10:05:40 +02:00