mirror of
https://github.com/xroche/httrack.git
synced 2026-06-29 05:26:32 +03:00
Mirroring a site behind a login meant either re-implementing the auth flow or dropping a file literally named cookies.txt into the output or working directory, the only two places the engine looked. This adds a CLI option to point at an arbitrary Netscape/Mozilla cookies.txt, so a session exported from a browser (the "Get cookies.txt" extensions write exactly this format) is replayed on the crawl and authenticated pages come down.

The plumbing already existed: cookie_load parses the format into the shared jar and the request path sends every matching cookie. The new opt->cookies_file is loaded last, after the mirror/CWD defaults, so a user-supplied value wins on a name/domain/path conflict. The field is appended at the tail of httrackp, so the exported ABI is unchanged.

Cookies key on host[:port], so a bare-domain file matches a normal crawl of a default-port site; only an explicit-port URL needs the port in the cookie domain.

Covered by 27_local-cookies-file.test: a gated page that 500s without a cookie no page ever sets, reachable only once the file preloads it (with -o0 so the absence of a 500 error page is meaningful), plus a no-cookie control. The local-crawl harness grows a --cookie helper that writes a port-scoped jar. The copyopt self-test also gains a String round-trip so the exported copy_htsopt path for the new field is covered.

Closes #215

Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
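
For illustration only, the "loaded last wins" rule and the host[:port] keying can be sketched outside the engine. The Python below is not HTTrack code and does not mirror cookie_load. It assumes the usual seven tab-separated fields of a Netscape/Mozilla cookies.txt line (domain, subdomain flag, path, secure flag, expiry, name, value). The helper names `parse_cookies_txt` and `merge_in_load_order`, the sample hosts, and the cookie values are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key shape for this sketch: (domain, path, name).
CookieKey = Tuple[str, str, str]


def parse_cookies_txt(text: str) -> Dict[CookieKey, str]:
    """Parse Netscape-style cookies.txt text into {(domain, path, name): value}."""
    jar: Dict[CookieKey, str] = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines (including the header line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed line: ignored in this sketch
        domain, _subdomains, path, _secure, _expires, name, value = fields
        jar[(domain, path, name)] = value
    return jar


def merge_in_load_order(*sources: str) -> Dict[CookieKey, str]:
    """Load sources in order; a later source overrides an earlier one on the same key."""
    jar: Dict[CookieKey, str] = {}
    for text in sources:
        jar.update(parse_cookies_txt(text))
    return jar


# Stand-in for a default jar found in the mirror or working directory.
DEFAULT_JAR = (
    "# Netscape HTTP Cookie File\n"
    "example.com\tFALSE\t/\tFALSE\t0\tsession\tstale\n"
)

# Stand-in for the user-supplied file; the second entry is port-scoped.
USER_JAR = (
    "# Netscape HTTP Cookie File\n"
    "example.com\tFALSE\t/\tFALSE\t0\tsession\tfresh\n"
    "127.0.0.1:8080\tFALSE\t/\tFALSE\t0\tgate\topen\n"
)

# The user-supplied jar is passed last, so its "session" value replaces the default.
merged = merge_in_load_order(DEFAULT_JAR, USER_JAR)
```

In this sketch the user-supplied `session` value replaces the default because its source is merged last, and the `127.0.0.1:8080` entry shows a cookie domain that carries an explicit port.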