49 Commits

Author SHA1 Message Date
Al
98c395d34c [numex] Concatenating a string of numeric expressions with no intervening tokens like Seventeen Eighty or Ten Oh Four 2016-02-10 09:21:31 -05:00
Al
7b300639f1 [fix] Trie prefix search tail comparison 2016-01-17 20:56:37 -05:00
Al
0d5cf0d6d7 [utils] char_array_cat_printf was forcing a doubling of the size of the buffer, which is bad if calling many times. Now only initiates a realloc if the char_array is almost full. Also adding cstring_array_from_strings which takes a list of char *s 2016-01-06 22:56:01 -05:00
Al
d0aaff1482 [utils] string_equals with NULL check 2015-12-01 13:12:08 -05:00
Al
40918812e2 [normalize] Adding hyphen elimination as a string option (changes tokenization) 2015-10-27 13:32:47 -04:00
Al
6428c0ae20 [utils] cstring_array_cat 2015-10-03 16:00:13 -04:00
Al
3fab0f984f [fix] fixing some compiler warnings, using type-specific abs functions for vector_math 2015-09-19 16:11:09 -04:00
Al
35b9122a1a [utils] inlining a few functions 2015-09-10 16:33:54 -07:00
Al
0ddf50cb5f [utils] add to feature array with printf syntax 2015-09-10 10:24:51 -07:00
Al
b3f89a207a [utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy 2015-09-09 18:07:31 -07:00
Al
9d2ca08fc2 [utils] Adding _copy and _new_copy methods to vectors (the former copies data to a pre-allocated vector, the latter allocates a new vector) 2015-09-06 21:01:26 -07:00
Al
a13e5117b5 [utils] string_tree_num_strings method 2015-08-10 17:46:37 -04:00
Al
064b6b5898 [utils] char_array_append_reversed for adding reversed strings without a malloc 2015-08-10 16:10:05 -04:00
Al
9b69d1f67a [fix] Removing C++ checks from all but the main API functions 2015-08-07 17:15:39 -04:00
Al
3178eda501 [utils] string_contains_hyphen method 2015-08-02 14:35:18 -06:00
Al
7aee159c0c [utils] string_tree_num_tokens 2015-07-27 12:36:34 -04:00
Al
b94526a27b [utils] Making string_trim handle all kinds of UTF-8 whitespace/separators 2015-07-27 01:55:46 -04:00
Al
93042761ac [fix] warnings in string_utils.c 2015-07-26 23:36:03 -04:00
Al
a67ec44a08 [utils] cstring_array_terminate, moving msgpack_utils to separate file 2015-07-25 18:41:02 -04:00
Al
2adaf475c2 [utils] cstring_array (contiguous) to array of malloc'd strings 2015-07-25 12:14:01 -04:00
Al
f713c53993 [utils] Adding an option to char_array_add_joined to strip separators for path manipulation 2015-07-16 03:49:00 -04:00
Al
d7f73e62f1 [utils] Adding cstring_array_clear method 2015-07-06 12:48:26 -04:00
Al
b58877ec6c [utils] string_is_lower/string_is_upper method 2015-07-01 14:49:22 -04:00
Al
a5dacf3d2b [utils] Adding method to get a particular token alternative from a string tree 2015-06-28 15:15:29 -04:00
Al
82e85732c4 [fix] Setting codepoint in utf8proc_iterate_reversed 2015-06-25 17:20:55 -04:00
Al
bcee9832b3 [utils] cstring_array_get_token=>cstring_array_get_string 2015-06-25 10:05:35 -04:00
Al
7dd772de0f [fix] implementation of cstring_array_split 2015-06-23 02:11:24 -05:00
Al
8520df96c8 [utils] utf8 comparison can handle a non-valid UTF-8 sequence e.g. for trie suffix comparison where we may be in the middle of a multi-byte character. Adding a standard utf8_common_prefix method 2015-06-12 16:11:40 -04:00
Al
3442b9ad92 [utils] require at least one non-space/non-hyphen match in utf8_common_prefix_len_ignore_separators 2015-06-12 11:19:37 -04:00
Al
ab5ea6d791 [utils] Common prefix-style return value instead of a utf8 strcmp 2015-06-11 10:59:51 -04:00
Al
aad5f3edd3 [utils] UTF-8 lowercasing and string comparison, including a version which ignores dashes/spaces 2015-06-10 18:27:14 -04:00
Al
81be8e771e [numex] regen data file. utf8_is_hyphen requires a character, all other methods use category 2015-06-08 21:32:38 -04:00
Al
06835d5c37 [utils] string_utils category functions take a category instead of a codepoint 2015-06-06 20:41:07 -04:00
Al
114b728f96 [fix] var 2015-06-04 17:18:05 -04:00
Al
528dd05983 [numex] Adding utf8_is_number_or_letter 2015-06-04 14:49:12 -04:00
Al
ca746304e3 [utils] Adding a few methods to string_utils for finding utf8proc category groups 2015-06-04 13:20:14 -04:00
Al
8ac8f83b7f [utils] changing signature of utf8proc_iterate_reversed so it takes the same arguments as utf8proc_iterate for function pointer purposes 2015-05-25 15:35:28 -04:00
Al
bfb9aa21a1 [fix] unused var 2015-05-19 18:04:06 -04:00
Al
58571f70cc [utils] adding a boolean flag on string tree iterators for single path trees 2015-05-18 13:57:11 -04:00
Al
e8fdd4564d [utils] adding string_tree for listing sets of token alternatives and string_tree_iterator to generate permutations over the strings, needed for transliteration and ambiguous address elements/place names 2015-05-16 23:16:10 -04:00
Al
e21873635c [utils] Using token offsets to calculate lengths for contiguous string arrays, inlining a few functions 2015-04-15 20:17:03 -04:00
Al
0234754c20 [fix] warnings in string_utils 2015-04-12 12:16:32 -04:00
Al
4729dfe178 [utils] string_[rl]strip => string_[rl]trim, removing warning about allocation 2015-04-06 02:19:19 -04:00
Al
198e51b8a3 [utils] more/better char_array methods 2015-04-05 22:01:46 -04:00
Al
5f3d74de18 [fix] contiguous string array 2015-04-03 11:22:50 -04:00
Al
c81aa72254 [utils] a few changes to contiguous string arrays 2015-04-01 19:02:11 -04:00
Al
1ac4438e39 [utils] More consistent naming in string_utils 2015-03-27 21:12:08 -04:00
Al
70195fffd5 [utils] new methods on string_utils for better dynamic strings which retains the benefits of sds without having to worry about the pointer changing, renaming contiguous string array methods to something more succinct 2015-03-27 20:55:36 -04:00
Al
5216aba1b6 [utils] string utils, file utils, contiguous arrays of strings used for storing tokenized strings, klib for generic hashtables and vectors, antirez's sds for certain types of string building, utf8proc for iterating over utf-8 strings and unicode normalization 2015-03-03 12:33:13 -05:00