Author: Xavier Roche
SHA1: 594820d3eb
Message: Add AGENTS.md operational checklist for AI-assisted contributions
LLM-assisted PRs are arriving; give agents one compact, tool-neutral file
covering the repo's toolchain rules and invariants so contributions arrive
review-ready instead of needing the conventions reconstructed each time.

AGENTS.md is the operational checklist (build/test, autotools regen, touched-
lines-only formatting, byte-safe Latin-1 edits, overflow-safe bounds,
adversarial self-review, commit/PR discipline). CLAUDE.md imports it via
@AGENTS.md so Claude Code auto-loads the same source. CONTRIBUTING.md keeps the
policy and gains a Co-Authored-By attribution rule plus a PR-conciseness line.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
Date: 2026-06-16 04:01:29 +02:00
9 changed files with 110 additions and 91 deletions

AGENTS.md (new file, 67 lines)

@@ -0,0 +1,67 @@
# AGENTS.md — working in the HTTrack tree
Policy and PR etiquette live in [CONTRIBUTING.md](CONTRIBUTING.md). This file is
the operational checklist: toolchain, invariants, and how to ship a change.
## Build & test
- Fresh clone first: `git submodule update --init src/coucal`
- `bash configure && make && make check`
## Hard invariants
- **Toolchain edit** (`configure.ac`, any `Makefile.am`, `m4/`) → run
`autoreconf -fi` and commit the regenerated tracked files. The repo ships the
generated `configure`/`Makefile.in` so users build without autotools; CI does
**not** catch staleness.
- **Format only changed lines** with `git clang-format` (clang-format 19). Never
reformat untouched code: the engine was formatted by an old tool and won't
round-trip.
- **Byte-safe edits.** Files with raw high bytes are ISO-8859-1 (French
comments). Edit them byte-wise (`perl -0pi`, `sed`), not through a tool that
re-encodes to UTF-8 and corrupts them.
## Security (HTTrack parses hostile input off the network)
- Bounds-check every copy. Overflow-safe form: put the untrusted value alone,
`untrusted < limit - controlled` — never `controlled + untrusted < limit`,
which can wrap and pass.
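
A minimal C sketch of that bounds rule, assuming `size_t` lengths. `fits_after` and `fits_after_wrapping` are hypothetical names used only for illustration; they are not HTTrack functions. With `controlled = 10` and `limit = 16`, an untrusted length of `SIZE_MAX` makes the sum wrap to 9 and pass the unsafe check, while the subtraction form rejects it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HTTrack): does an untrusted length fit,
 * with room for the terminating NUL, after `controlled` bytes already used
 * in a buffer of `limit` bytes? */
int fits_after(size_t controlled, size_t untrusted, size_t limit) {
  /* Subtract only trusted values, and make sure that cannot underflow. */
  if (controlled >= limit)
    return 0;
  return untrusted < limit - controlled;
}

/* The form to avoid: unsigned addition wraps modulo SIZE_MAX + 1, so a huge
 * untrusted value can produce a small sum that passes the comparison. */
int fits_after_wrapping(size_t controlled, size_t untrusted, size_t limit) {
  return controlled + untrusted < limit;
}
```

The guard on `controlled >= limit` matters: without it, `limit - controlled` could itself underflow when the trusted side is misconfigured.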
## Code & prose
- Be terse. Comment the why, in English; translate French comments you touch.
- Strip AI tells from prose (em-dash overuse, rule-of-three, filler, vague
attributions). Ref: Wikipedia "Signs of AI writing". Claude Code: `/humanizer`.
- Behavior change → add a test. Fast path: a hidden `httrack -#N` debug
subcommand (`htscoremain.c`) driven by a `tests/NN_*.test`, over a slow crawl.
## Review your change adversarially (strongly suggested)
Before pushing, and when reviewing others, don't skim for bugs:
- **One invariant at a time.** Name a property the diff must preserve (bounds
hold, cache/wire format unchanged, no use-after-free, ABI stable), then
construct inputs that would break it. "General correctness" is not a charter.
- **Audit tests against the spec, not the code.** For each new test ask: "what
buggy path would still pass this?" If you can build one, the test is
confirmation-biased: assertions copied from observed output lock bugs in.
- **Risk areas need runtime probes.** Touching hostile-input parsing, struct
layout/ABI, cache/wire format, or a security path? A static or unit check
isn't enough; exercise the wrong behavior at runtime. Claude Code:
`/review-recipe`.
## Commits
- **Sign-off is mandatory.** Every commit carries a `Signed-off-by` trailer:
`git commit -s` (DCO, CI-enforced — unsigned commits are rejected).
- **Co-Authored-By is mandatory for AI-assisted commits.** Carry a
`Co-Authored-By:` trailer naming the assistant. Attribute there, never in a
PR-body footer.
- PRs land as a merge commit; every commit on the branch goes onto master, so
keep each commit message clean and meaningful.
## PR descriptions
- Plain concise prose; lead with what changed and why. No What/Why/How template.
- Title names the problem, not the implementation.
- Don't restate the diff — give what it can't show: motivation, context,
tradeoffs, risk.
- Length tracks the change: a typo is one sentence; a security fix earns a writeup.
- Verify claims against the code before you write them; flag drift, don't repeat it.
- Don't hard-wrap (GitHub reflows). No "Generated with Claude" footer. Run the
prose through `/humanizer`.
## Toolchain
C · clang-format-19 · autoreconf · shfmt + shellcheck (shell) · black + flake8 (Python)

CLAUDE.md (new file, 1 line)

@@ -0,0 +1 @@
@AGENTS.md


@@ -1,12 +1,15 @@
# Contributing to HTTrack
-HTTrack is small and old. Keep changes easy to review and safe to merge.
+HTTrack is small and old. Keep changes easy to review and safe to merge. Working
+with an AI assistant? The operational checklist is [AGENTS.md](AGENTS.md).
## Pull requests
- One change per PR. Small diffs merge fast.
-- PRs are squash-merged: the title and description become the commit message, so
-explain *why*.
+- PRs land as a merge commit, so the branch's commits go onto master as-is: keep
+each commit message clean and explain *why*.
+- Be terse in the PR title and description: name the problem, not the fix, don't
+restate the diff, and calibrate length to the change.
- Add or update tests for engine changes (`tests/`), and keep CI green.
## Style
@@ -30,6 +33,9 @@ Welcome, and nothing to disclose. Two rules:
- **Own every line** as if you wrote it. Can't explain it in review? Not ready.
- **Don't push your work onto reviewers.** A raw generated patch a maintainer has
to vet from scratch will be closed.
+- **Attribution is mandatory.** AI-assisted commits must carry a
+`Co-Authored-By:` trailer naming the assistant, not a footer in the PR
+description.
The sign-off covers AI-assisted code too.


@@ -285,46 +285,6 @@ static void basic_selftests(void) {
assertf(end == NULL && strcmp(tok, "a\\") == 0);
}
}
-// fil_normalized(): canonicalizes a URL path. Query arguments are sorted
-// alphabetically (by the text after each '?'/'&') and the query is rebuilt
-// through a bounded builder; outside the query, "//" collapses to "/".
-// Regression for that builder.
-{
-char norm[256];
-assertf(strcmp(fil_normalized("/p?b=2&a=1&c=3", norm), "/p?a=1&b=2&c=3") ==
-0);
-assertf(strcmp(fil_normalized("/a//b", norm), "/a/b") == 0);
-}
-// give_mimext(): mime type -> file extension, bounded into the caller buffer.
-{
-char ext[16];
-give_mimext(ext, sizeof(ext), "image/gif");
-assertf(strcmp(ext, "gif") == 0);
-give_mimext(ext, sizeof(ext), "text/html");
-assertf(strcmp(ext, "html") == 0);
-give_mimext(ext, sizeof(ext), "no/such-mime-type");
-assertf(ext[0] == '\0');
-}
-// convtolower(): lower-cases into the caller buffer (bounded by its size).
-{
-char low[64];
-assertf(strcmp(convtolower(low, sizeof(low), "ABC/Def.HTML"),
-"abc/def.html") == 0);
-}
-// cut_path(): splits a path into directory (with trailing '/') and basename,
-// each bounded by its buffer size.
-{
-char full[] = "/dir/sub/file.html";
-char path[256];
-char pname[256];
-cut_path(full, path, sizeof(path), pname, sizeof(pname));
-assertf(strcmp(path, "/dir/sub/") == 0);
-assertf(strcmp(pname, "file.html") == 0);
-}
}
/* Self-tests for the htssafe.h bounded string ops (driven by httrack -#8).
@@ -2645,7 +2605,7 @@ static int hts_main_internal(int argc, char **argv, httrackp * opt) {
printf("%s is '%s'\n", argv[na + 1], mime);
ext[0] = '\0';
-give_mimext(ext, sizeof(ext), mime);
+give_mimext(ext, mime);
if (ext[0]) {
printf("and its local type is '.%s'\n", ext);
}
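
The `fil_normalized()` self-test earlier in this file's diff expects `/p?b=2&a=1&c=3` to become `/p?a=1&b=2&c=3`. The sketch below illustrates only that query-sorting step in standard C. It is not HTTrack's implementation: `sort_query_args`, `cmp_query_arg`, and the `MAX_ARGS` cap are hypothetical, it uses plain `malloc`/`free`, and it ignores the `//` collapsing and any fragment handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 32 /* arbitrary cap chosen for this sketch */

/* qsort comparator over an array of C strings. */
int cmp_query_arg(const void *a, const void *b) {
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts the '&'-separated arguments that follow the
 * first '?' of url, in place. Returns 0 on success, -1 on allocation
 * failure or when there are more than MAX_ARGS arguments. */
int sort_query_args(char *url) {
  char *query = strchr(url, '?');
  char *args[MAX_ARGS];
  char *copy, *p;
  size_t qlen, count = 0, pos = 0, i;

  if (query == NULL)
    return 0; /* nothing to sort */
  query++;    /* skip the '?' */
  qlen = strlen(query);
  copy = malloc(qlen + 1);
  if (copy == NULL)
    return -1;
  memcpy(copy, query, qlen + 1);

  /* Split the private copy on '&', recording the start of each argument. */
  for (p = copy;;) {
    char *amp;
    if (count == MAX_ARGS) {
      free(copy);
      return -1;
    }
    args[count++] = p;
    amp = strchr(p, '&');
    if (amp == NULL)
      break;
    *amp = '\0';
    p = amp + 1;
  }
  qsort(args, count, sizeof(args[0]), cmp_query_arg);

  /* Rebuild over the original query: same arguments, same separators, so
   * the result occupies exactly qlen bytes again. */
  for (i = 0; i < count; i++) {
    const size_t len = strlen(args[i]);
    if (i > 0)
      query[pos++] = '&';
    memcpy(query + pos, args[i], len);
    pos += len;
  }
  assert(pos == qlen);
  query[pos] = '\0';
  free(copy);
  return 0;
}
```

Because sorting only permutes the arguments, the rebuilt query has the same length as the original, which is the invariant the length assertion in the `fil_normalized()` hunk also checks.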


@@ -76,7 +76,7 @@ static coucal_key key_duphandler(void *arg, coucal_key_const name) {
/* Key sav hashes are using case-insensitive version */
static coucal_hashkeys key_sav_hashes(void *arg, coucal_key_const key) {
hash_struct *const hash = (hash_struct*) arg;
-convtolower(hash->catbuff, sizeof(hash->catbuff), (const char *) key);
+convtolower(hash->catbuff, (const char*) key);
return coucal_hash_string(hash->catbuff);
}
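
For comparison, a bounded lower-casing copy can be sketched in standard C as follows. `lower_bounded` is a hypothetical stand-in for the `convtolower()` variant that takes a size argument. One assumption to note: this sketch truncates on overflow, whereas the `htssafe.h`-style macros in this diff carry an "overflow while appending" message and so appear to report the overflow instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not HTTrack's convtolower): copies src into dest,
 * lower-casing each byte, never writing more than dest_size bytes
 * including the terminating NUL. Truncates if src is too long. */
char *lower_bounded(char *dest, size_t dest_size, const char *src) {
  size_t i;

  if (dest_size == 0)
    return dest; /* no room even for the NUL */
  for (i = 0; i + 1 < dest_size && src[i] != '\0'; i++) {
    /* tolower() requires a value representable as unsigned char. */
    dest[i] = (char) tolower((unsigned char) src[i]);
  }
  dest[i] = '\0';
  return dest;
}
```

A call such as `lower_bounded(buf, sizeof(buf), key)` keeps the capacity next to the buffer at the call site, which is what the size-taking signature in this hunk makes explicit.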


@@ -1530,9 +1530,8 @@ void treathead(t_cookie * cookie, const char *adr, const char *fil, htsblk * ret
if (retour->location) {
while(is_realspace(*(rcvd + p)))
p++; // sauter espaces
-if ((int) strlen(rcvd + p) < HTS_URLMAXSIZE) // not too long?
-/* location aliases location_buffer[HTS_URLMAXSIZE * 2] */
-strlcpybuff(retour->location, rcvd + p, HTS_URLMAXSIZE * 2);
+if ((int) strlen(rcvd + p) < HTS_URLMAXSIZE) // pas trop long?
+strcpybuff(retour->location, rcvd + p);
else // erreur.. ignorer
retour->location[0] = '\0';
}
@@ -3445,17 +3444,16 @@ HTSEXT_API char *fil_normalized(const char *source, char *dest) {
/* Replace query by sorted query */
copyBuff = malloct(qLen + 1);
assertf(copyBuff != NULL);
-{
-htsbuff cb = htsbuff_ptr(copyBuff, qLen + 1);
-for (i = 0; i < ampargs; i++) {
-htsbuff_cat(&cb, i == 0 ? "?" : "&");
-htsbuff_cat(&cb, amps[i] + 1);
-}
-assertf(cb.len == qLen);
+copyBuff[0] = '\0';
+for(i = 0; i < ampargs; i++) {
+if (i == 0)
+strcatbuff(copyBuff, "?");
+else
+strcatbuff(copyBuff, "&");
+strcatbuff(copyBuff, amps[i] + 1);
}
-/* query points into dest where the original qLen-byte query was */
-strlcpybuff(query, copyBuff, qLen + 1);
+assertf(strlen(copyBuff) == qLen);
+strcpybuff(query, copyBuff);
/* Cleanup */
freet(amps);
@@ -3896,9 +3894,9 @@ HTSEXT_API size_t escape_for_html_print_full(const char *const s, char *const de
#undef ADD_CHAR
-// lower-case conversion into caller buffer (capacity catbuffsize)
-char *convtolower(char *catbuff, size_t catbuffsize, const char *a) {
-strlcpybuff(catbuff, a, catbuffsize);
+// conversion minuscules, avec buffer
+char *convtolower(char *catbuff, const char *a) {
+strcpybuff(catbuff, a);
hts_lowcase(catbuff); // lower case
return catbuff;
}
@@ -4075,15 +4073,15 @@ int get_userhttptype(httrackp * opt, char *s, const char *fil) {
// renvoyer extesion d'un type mime..
// ex: "image/gif" -> gif
-void give_mimext(char *s, size_t ssize, const char *st) {
+void give_mimext(char *s, const char *st) {
int ok = 0;
int j = 0;
s[0] = '\0';
while((!ok) && (strnotempty(hts_mime[j][1]))) {
if (strfield2(hts_mime[j][0], st)) {
-if (hts_mime[j][1][0] != '*') { // a match exists
-strlcpybuff(s, hts_mime[j][1], ssize);
+if (hts_mime[j][1][0] != '*') { // Une correspondance existe
+strcpybuff(s, hts_mime[j][1]);
ok = 1;
}
}
@@ -4104,7 +4102,7 @@ void give_mimext(char *s, size_t ssize, const char *st) {
if (a) {
if ((int) strlen(a) >= 1) {
if ((int) strlen(a) <= 4) {
-strlcpybuff(s, a, ssize);
+strcpybuff(s, a);
ok = 1;
}
}
@@ -4208,7 +4206,7 @@ int may_bogus_multiple(httrackp * opt, const char *mime, const char *filename) {
char ext[64];
ext[0] = '\0';
-give_mimext(ext, sizeof(ext), mime);
+give_mimext(ext, mime);
if (ext[0] != 0) { /* we have an extension for that */
const size_t ext_size = strlen(ext);
const char *file = strrchr(filename, '/'); /* fetch terminal filename */
@@ -4932,8 +4930,7 @@ void hts_freeall(void) {
// cut path and project name
// patch also initial path
-void cut_path(char *fullpath, char *path, size_t path_size, char *pname,
-size_t pname_size) {
+void cut_path(char *fullpath, char *path, char *pname) {
path[0] = pname[0] = '\0';
if (strnotempty(fullpath)) {
if ((fullpath[strlen(fullpath) - 1] == '/')
@@ -4949,8 +4946,8 @@ void cut_path(char *fullpath, char *path, size_t path_size, char *pname,
a--;
if (*a == '/')
a++;
-strlcpybuff(pname, a, pname_size);
-strlncatbuff(path, fullpath, path_size, (size_t) (a - fullpath));
+strcpybuff(pname, a);
+strncatbuff(path, fullpath, (int) (a - fullpath));
}
}
}
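
The `cut_path()` hunks above split a path into a directory part and a final component. A simplified sketch of that split with explicit capacities, in standard C, might look like this. `split_path_bounded` is a hypothetical name; unlike `cut_path()` it does not patch or special-case the input path, and it refuses oversized components instead of relying on checking macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not HTTrack's cut_path): copies everything up to and
 * including the last '/' into dir, and the rest into name. Returns 0 on
 * success, -1 if either part does not fit; on failure both outputs are
 * left as empty strings when their capacity allows it. */
int split_path_bounded(const char *full, char *dir, size_t dir_size,
                       char *name, size_t name_size) {
  const char *slash = strrchr(full, '/');
  const char *base = (slash != NULL) ? slash + 1 : full;
  const size_t dir_len = (size_t) (base - full);
  const size_t name_len = strlen(base);

  if (dir_size == 0 || name_size == 0)
    return -1;
  dir[0] = name[0] = '\0';
  /* Each length is compared on its own against its own capacity. */
  if (dir_len >= dir_size || name_len >= name_size)
    return -1;
  memcpy(dir, full, dir_len);
  dir[dir_len] = '\0';
  memcpy(name, base, name_len + 1); /* includes the NUL */
  return 0;
}
```

For the input used by the self-test in this commit, `/dir/sub/file.html`, the sketch yields `/dir/sub/` and `file.html`.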


@@ -252,7 +252,7 @@ int ishtml_ext(const char *a);
int ishttperror(int err);
int get_userhttptype(httrackp * opt, char *s, const char *fil);
-void give_mimext(char *s, size_t ssize, const char *st);
+void give_mimext(char *s, const char *st);
int may_bogus_multiple(httrackp * opt, const char *mime, const char *filename);
int may_unknown2(httrackp * opt, const char *mime, const char *filename);
@@ -264,7 +264,7 @@ void code64(unsigned char *a, int size_a, unsigned char *b, int crlf);
#define copychar(catbuff,a) concat(catbuff,(a),NULL)
-char *convtolower(char *catbuff, size_t catbuffsize, const char *a);
+char *convtolower(char *catbuff, const char *a);
void hts_lowcase(char *s);
void hts_replace(char *s, char from, char to);
int multipleStringMatch(const char *s, const char *match);
@@ -276,8 +276,7 @@ void fprintfio(FILE * fp, const char *buff, const char *prefix);
int sig_ignore_flag(int setflag); // flag ignore
#endif
-void cut_path(char *fullpath, char *path, size_t path_size, char *pname,
-size_t pname_size);
+void cut_path(char *fullpath, char *path, char *pname);
int fexist(const char *s);
int fexist_utf8(const char *s);


@@ -344,7 +344,7 @@ int url_savename(lien_adrfilsave *const afs,
mime[0] = ext[0] = '\0';
get_userhttptype(opt, mime, fil);
if (strnotempty(mime)) {
-give_mimext(ext, sizeof(ext), mime);
+give_mimext(ext, mime);
if (strnotempty(ext)) {
ext_chg = 1;
}
@@ -378,7 +378,7 @@ int url_savename(lien_adrfilsave *const afs,
ext_chg = 2; /* change filename */
strcpybuff(ext, r.cdispo);
} else if (!may_unknown2(opt, r.contenttype, fil)) { // on peut patcher à priori?
-give_mimext(s, sizeof(s), r.contenttype); // get extension
+give_mimext(s, r.contenttype); // obtenir extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
ext_chg = 1;
strcpybuff(ext, s);
@@ -403,7 +403,7 @@ int url_savename(lien_adrfilsave *const afs,
mime[0] = ext[0] = '\0';
get_userhttptype(opt, mime, fil);
if (strnotempty(mime)) {
-give_mimext(ext, sizeof(ext), mime);
+give_mimext(ext, mime);
if (strnotempty(ext)) {
ext_chg = 1;
}
@@ -421,8 +421,7 @@ int url_savename(lien_adrfilsave *const afs,
} else if (!may_unknown2(opt, headers->r.contenttype, headers->url_fil)) { // on peut patcher à priori? (pas interdit ou pas de type)
char s[16];
s[0] = '\0';
-give_mimext(s, sizeof(s),
-headers->r.contenttype); // get extension
+give_mimext(s, headers->r.contenttype); // obtenir extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
ext_chg = 1;
strcpybuff(ext, s);
@@ -432,7 +431,7 @@ int url_savename(lien_adrfilsave *const afs,
else if (mime_type != NULL) {
ext[0] = '\0';
if (*mime_type) {
-give_mimext(ext, sizeof(ext), mime_type);
+give_mimext(ext, mime_type);
}
if (strnotempty(ext)) {
char mime_from_file[128];
@@ -647,8 +646,7 @@ int url_savename(lien_adrfilsave *const afs,
ext_chg = 2; /* change filename */
strcpybuff(ext, back[b].r.cdispo);
} else if (!may_unknown2(opt, back[b].r.contenttype, back[b].url_fil)) { // on peut patcher à priori? (pas interdit ou pas de type)
-give_mimext(s, sizeof(s),
-back[b].r.contenttype); // get extension
+give_mimext(s, back[b].r.contenttype); // obtenir extension
if (strnotempty(s) > 0) { // on a reconnu l'extension
ext_chg = 1;
strcpybuff(ext, s);


@@ -237,15 +237,6 @@ static char *strncatbuff_ptr_(char *dest, const char *src, size_t n) {
HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), (size_t) -1, \
"overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__)
-/**
-* Append at most "N" characters of "B" to "A", "A" having a maximum capacity
-* of "S".
-*/
-#define strlncatbuff(A, B, S, N) \
-strncat_safe_(A, S, B, HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), \
-N, "overflow while appending '" #B "' to '" #A "'", __FILE__, \
-__LINE__)
/**
* Copy characters of "B" to "A", "A" having a maximum capacity of "S".
*/
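
The contract documented for `strlncatbuff` in the hunk above (append at most N characters of B to A, with A limited to capacity S) can be illustrated with a small standard-C function. This is a sketch under assumptions, not the `strncat_safe_` implementation: `cat_n_bounded` is a hypothetical name, and it returns an error code where the real macro passes an overflow message, file, and line to its helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends at most n bytes of src to dest, where dest
 * has a total capacity of cap bytes including the NUL. Returns 0 on
 * success, -1 if dest is unterminated or the result would not fit.
 * dest is left unchanged on failure. */
int cat_n_bounded(char *dest, size_t cap, const char *src, size_t n) {
  const char *end = memchr(dest, '\0', cap);
  size_t used, take;

  if (end == NULL)
    return -1; /* dest is not NUL-terminated within its capacity */
  used = (size_t) (end - dest);

  /* take = min(n, strlen(src)), without reading past n bytes of src. */
  for (take = 0; take < n && src[take] != '\0'; take++)
    ;

  /* used < cap here, so cap - used cannot underflow; the source-derived
   * length stands alone on the left-hand side. */
  if (take >= cap - used)
    return -1;
  memcpy(dest + used, src, take);
  dest[used + take] = '\0';
  return 0;
}
```

Called as `cat_n_bounded(path, sizeof(path), fullpath, prefix_len)`, it mirrors the shape of the `strlncatbuff(path, fullpath, path_size, ...)` call in the `cut_path()` hunk.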