mirror of
https://github.com/xroche/httrack.git
synced 2026-06-20 17:18:14 +03:00
httrack opened https connections straight to the origin even when a
proxy was configured, so --proxy was silently ignored for https and the
crawler used the real IP.

http_xfopen bypassed the proxy for any https:// URL, because the
absolute-URI proxy form it uses for http cannot carry https. Connect to
the proxy instead and, once the TCP connection is up, open an HTTP
CONNECT tunnel (http_proxy_tunnel) before the TLS handshake, so TLS runs
end-to-end with the origin. Proxy credentials now ride the CONNECT
request rather than the tunneled GET, where they would leak to the
origin.

The exchange is a bounded blocking read inside the back_wait connect
path: no new async state, no struct/ABI change (the helpers stay
visibility-hidden).

Verified end-to-end by 13_crawl_proxy_https.test: it crawls a local
self-signed https origin through a logging CONNECT proxy and asserts
the proxy saw the CONNECT and that credentials ride it. The assertion
fails on the pre-fix bypass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
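
For illustration only, a minimal sketch of the CONNECT exchange described above, assuming Basic proxy authentication. It is not the code from this commit: build_connect_request, connect_reply_ok and b64_encode are hypothetical names, and the real http_proxy_tunnel helper may be structured quite differently. The sketch only shows the shape of the request sent to the proxy before the TLS handshake and how a 2xx reply could be recognised; the credentials appear in the CONNECT request and nowhere else.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: Base64-encode `len` bytes of `in` into `out`
   (NUL-terminated). Returns the encoded length, or -1 if `out` is too small. */
static int b64_encode(const unsigned char *in, size_t len,
                      char *out, size_t outsz) {
  static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t need = 4 * ((len + 2) / 3) + 1;
  size_t i, o = 0;

  if (outsz < need)
    return -1;
  for (i = 0; i < len; i += 3) {
    unsigned long v = (unsigned long) in[i] << 16;
    if (i + 1 < len)
      v |= (unsigned long) in[i + 1] << 8;
    if (i + 2 < len)
      v |= (unsigned long) in[i + 2];
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
    out[o++] = (i + 2 < len) ? tbl[v & 63] : '=';
  }
  out[o] = '\0';
  return (int) o;
}

/* Hypothetical helper: format the CONNECT request sent to the proxy.
   `userpass` is "user:password", or NULL when the proxy needs no auth.
   Returns the request length, or -1 if a buffer is too small. */
static int build_connect_request(char *buf, size_t bufsz, const char *host,
                                 int port, const char *userpass) {
  char auth[256] = "";
  int n;

  if (userpass != NULL && userpass[0] != '\0') {
    char b64[192];
    if (b64_encode((const unsigned char *) userpass, strlen(userpass),
                   b64, sizeof(b64)) < 0)
      return -1;
    /* Credentials go to the proxy only, never into the tunneled request. */
    n = snprintf(auth, sizeof(auth),
                 "Proxy-Authorization: Basic %s\r\n", b64);
    if (n < 0 || (size_t) n >= sizeof(auth))
      return -1;
  }
  n = snprintf(buf, bufsz, "CONNECT %s:%d HTTP/1.1\r\nHost: %s:%d\r\n%s\r\n",
               host, port, host, port, auth);
  if (n < 0 || (size_t) n >= bufsz)
    return -1;
  return n;
}

/* Hypothetical helper: return 1 if the proxy's status line is a 2xx reply,
   meaning the tunnel is open and the TLS handshake may start; 0 otherwise. */
static int connect_reply_ok(const char *reply) {
  int major, minor, status;

  if (sscanf(reply, "HTTP/%d.%d %d", &major, &minor, &status) != 3)
    return 0;
  return status >= 200 && status < 300;
}
```

In this sketch a caller would write the formatted request to the proxy socket, read the reply headers with a bounded blocking read, and start TLS only if connect_reply_ok returns 1. Any other status, such as 407, would abort the connection without sending anything to the origin.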