[fix/utf8] Reviewed and fixed every call site where utf8proc_iterate may return an error that prevents the iteration from making forward progress. This includes a bug where invalid UTF-8 injected through a series of HTML-encoded codepoints could cause the C library to hang. Note: this does not try to repair every malformed encoding, so if the encoding is bad the output of expand_address may not be useful, but it won't hang. Fixes #448
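
The failure mode described above can be sketched as follows. This is a minimal illustration, not the code from this commit. `decode_one` is a hypothetical, simplified stand-in for utf8proc_iterate that follows the same contract (bytes consumed on success, a negative value on error), and `count_codepoints` is a hypothetical caller. Stopping at the first invalid sequence is an assumption made for the sketch; the actual call sites may handle the error differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for utf8proc_iterate: returns the number of bytes
 * consumed (> 0) or a negative value on invalid input. Simplified: it does
 * not reject overlong encodings or surrogates. */
static ptrdiff_t decode_one(const uint8_t *s, ptrdiff_t len, int32_t *cp) {
    *cp = -1;
    if (len <= 0) return -1;

    uint8_t b = s[0];
    ptrdiff_t n;
    int32_t c;
    if (b < 0x80)                { n = 1; c = b; }
    else if ((b & 0xE0) == 0xC0) { n = 2; c = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; }
    else return -1;              /* stray continuation or invalid lead byte */

    if (n > len) return -1;      /* sequence truncated by end of input */
    for (ptrdiff_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;  /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return n;
}

/* Hypothetical caller: counts the codepoints decoded before the first
 * invalid sequence. */
size_t count_codepoints(const uint8_t *s, size_t len) {
    size_t idx = 0, count = 0;
    while (idx < len) {
        int32_t cp;
        ptrdiff_t n = decode_one(s + idx, (ptrdiff_t)(len - idx), &cp);
        /* Without this check, adding a non-positive n to idx would fail to
         * advance (or move backwards), and the loop could spin forever. */
        if (n <= 0) break;
        idx += (size_t)n;
        count++;
    }
    return count;
}
```

With this guard, malformed input shortens the result instead of stalling the loop, which matches the trade-off stated in the commit message: the output may not be useful, but the call returns.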

Al
2025-07-02 00:10:49 -04:00
parent 053de1c8e4
commit 95e97c0585
6 changed files with 37 additions and 16 deletions

@@ -736,6 +736,8 @@ phrase_t trie_search_prefixes_from_index(trie_t *self, char *word, size_t len, u
return (phrase_t){phrase_start, phrase_len, value};
}
}
// Note: don't need to check the < 0 case because we're returning from this branch.
}
if (first_char) phrase_start = idx;
phrase_len = (uint32_t)(idx + match_len) - phrase_start;