11 Commits

Author SHA1 Message Date
Al
95e97c0585 [fix/utf8] reviewed and fixed all points where utf8proc_iterate is called and may return an error which can cause the iteration not to make forward progress. This includes fixing a bug where injecting invalid UTF-8 through a series of HTML-encoded codepoints can cause the C library to hang. Note: we're not fixing all the garbage encoding in the world, so if encoding is bad the output of expand_address may not be useful but it won't hang. Fixes #448 2025-07-02 00:10:49 -04:00
Al
c5e2f89ee9 [fix] declaring is_common_script function as static 2017-04-02 23:53:21 -04:00
Al
25ae5bed33 [unicode] Adding SCRIPT_INHERITED as a common script so diacritics like COMBINING CEDILLA don't break the current script and produce false word breaks 2016-01-11 16:39:21 -05:00
Al
88bd0cd158 [unicode] better segmentation on script breaks 2015-09-23 04:06:34 -04:00
Al
ee96dab93c [fix] unnecessary headers 2015-07-25 13:49:42 -04:00
Al
cc0401a8d1 [utf8] Adding a boolean struct member for string_script_t return values, set to true if the string is ASCII (no transliteration needed, should be frequent for English addresses) 2015-06-28 19:37:58 -04:00
Al
c376bcef3d [utils] get_string_script returns a struct rather than modifying a pointer for the length 2015-06-25 10:06:38 -04:00
Al
581cf406a6 [utf8] Adding length argument to string_script function 2015-06-24 13:39:09 -05:00
Al
5e71a9d805 [utf8] Adding method to get the script of a string and the length of the span (rolls Common script up with the previous script) 2015-06-24 13:29:40 -05:00
Al
f2d03a7937 [fix] renaming structure 2015-06-23 02:12:24 -05:00
Al
d5a9041cd3 [unicode] Adding generated unicode script data 2015-03-18 17:01:03 -04:00
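
The most recent commit (95e97c0585) describes a loop that can stall when the UTF-8 decoder returns an error and the caller does not advance. The sketch below illustrates that general pattern only and is not the code from the commit. It assumes a simplified, hypothetical decoder `decode_one` standing in for `utf8proc_iterate` (which likewise returns a negative value on invalid input), and a hypothetical caller `count_codepoints_lenient` that skips one byte on each error so the loop always makes forward progress. The stand-in decoder does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a decoder such as utf8proc_iterate.
 * Returns the number of bytes consumed (> 0) and stores the code point
 * in *cp, or returns -1 if the bytes at s are not a well-formed sequence.
 * Simplified: overlong forms and surrogates are not rejected. */
static ptrdiff_t decode_one(const uint8_t *s, size_t len, int32_t *cp) {
    size_t need;
    int32_t c;

    if (len == 0) return -1;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }

    if ((s[0] & 0xE0) == 0xC0)      { need = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; }
    else return -1;                 /* stray continuation or invalid lead byte */

    if (len < need) return -1;      /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return (ptrdiff_t)need;
}

/* Counts the code points that decode successfully. On a decode error the
 * index is advanced by one byte, so the loop terminates even when the
 * input is garbage. The number of skipped bytes is reported through
 * num_invalid when that pointer is non-NULL. */
static size_t count_codepoints_lenient(const uint8_t *s, size_t len,
                                       size_t *num_invalid) {
    size_t i = 0, count = 0, bad = 0;

    assert(s != NULL);
    while (i < len) {
        int32_t cp;
        ptrdiff_t n = decode_one(s + i, len - i, &cp);
        if (n <= 0) {
            bad++;
            i++;            /* forward progress on error: skip one byte */
            continue;
        }
        count++;
        i += (size_t)n;
    }
    if (num_invalid != NULL) *num_invalid = bad;
    return count;
}
```

Skipping a single byte on error is one possible recovery policy; a caller could instead stop at the first error or substitute U+FFFD. As the commit message notes, the goal is termination, not useful output for badly encoded input.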