Al
|
c78566c241
|
[utils] adding cstring_array_extend and string_tree_clear
|
2017-12-24 01:46:20 -05:00 |
|
Al
|
e4e84f0147
|
[utils] adding unicode_common_prefix/unicode_common_suffix, string_hyphen_prefix_len and string_hyphen_suffix_len to string_utils
|
2017-12-08 14:28:30 -05:00 |
|
Al
|
cfa5b1ce42
|
[similarity] adding a stopword-aware acronym alignment method for matching U.N. with United Nations, Museum of Modern Art with MoMA, as well as things like University of California - Los Angeles with UCLA. All of these should work across languages, including non-Latin character sets like Cyrillic (but not ideograms as the concept doesn't make as much sense there). Skipping tokens like "of" or "the" depends only on the stopwords dictionary being defined for a given language.
|
2017-12-04 15:21:44 -05:00 |
|
Al
|
ec4d683d1b
|
Merge branch 'master' into lieu_api
|
2017-11-29 15:49:52 -05:00 |
|
AeroXuk
|
26ac9ab5c2
|
Removing EXPORT statements from all source files and most header files, leaving only the exports for the main API in libpostal.h. Modified Makefiles so that all the test apps build without having extra functions exported from libpostal.
|
2017-11-25 04:35:28 +00:00 |
|
AeroXuk
|
f0246e7333
|
Fix bug in strndup fix for Windows. Move all includes out of headers and into code for strndup.h and move it to be the last include.
|
2017-11-23 19:11:25 +00:00 |
|
AeroXuk
|
f07ab765cb
|
Adding the export marker to all functions used in tests.
|
2017-11-20 20:58:37 +00:00 |
|
Al
|
665b780422
|
[utils] adding unicode_equals function in string_utils for testing equality of unicode char arrays
|
2017-11-11 02:45:41 -05:00 |
|
Al
|
6d430f7e9b
|
[utils] adding functions for finding the next index of a full stop/period character in a string
|
2017-10-27 04:07:28 -04:00 |
|
Al
|
245aa226e0
|
[utils] function to create an array of uint32_t codepoints from a UTF-8 string, a few bug fixes to string_utils
|
2017-10-19 04:48:50 -04:00 |
|
Al
|
09fbb02042
|
[utils] adding utf8_equal_ignore_separators to string utils
|
2017-10-14 01:36:56 -04:00 |
|
Al
|
f8a808e254
|
[utils] adding utf8_len function for strings, and utf8_is_digit
|
2017-10-12 11:16:53 -04:00 |
|
Oliver Keyes
|
35821f975e
|
Remove unused variable
What it says on the tin!
|
2017-04-18 21:25:00 -07:00 |
|
Al
|
1b2696b3b5
|
[utils] adding string_is_digit function, similar to Python's (i.e. counts if it's in the Nd unicode category)
|
2017-03-15 13:04:39 -04:00 |
|
Al
|
b88487f633
|
[utils] string_replace_char does single byte/character replacement, new string_replace to do full string replacement, again using char_array for safety, string_replace_with_array function for memory reuse
|
2017-02-17 13:58:51 -05:00 |
|
Al
|
ae35da8d17
|
[fix] uninitialized var
|
2017-02-08 01:58:53 -05:00 |
|
Al
|
ec3a563591
|
Merge branch 'master' into parser-data
|
2017-01-14 13:06:25 -05:00 |
|
Rinigus
|
67624f89d0
|
cstring_array_from_char_array: return empty initialized cstring_array from empty string
|
2017-01-14 10:43:47 +02:00 |
|
Al
|
b320aed9ac
|
[merge] merging master
|
2017-01-13 19:58:49 -05:00 |
|
Al
|
e1f258171f
|
[fix] handle cstring_array_from_char_array where char_array is NULL or 0-length
|
2017-01-13 16:52:41 -05:00 |
|
Al
|
953a26e54e
|
[utils] char_array_add_vjoined to stay consistent (add_* methods NUL terminate)
|
2017-01-09 16:10:07 -05:00 |
|
Al
|
4ad3a52fe1
|
[strings] fix lowercasing in string_utils.c
|
2017-01-01 20:08:34 -05:00 |
|
Al
|
7d6c85aeec
|
[fix] new string tree iterator, don't decrement permutations on rollovers
|
2017-01-01 13:34:08 -05:00 |
|
Al
|
1780c5e053
|
[fix] moving enum
|
2016-12-31 13:01:57 -05:00 |
|
Al
|
475aa3dbfa
|
[strings] fixing and simplifying string tree iterator. This version is inspired by Python's itertools.product (itertoolsmodule.c has so many goodies)
|
2016-12-31 03:22:27 -05:00 |
|
Al
|
58b063b632
|
[strings] making string_tree_iterator_done more meaningful (returns true if the iterator has no paths left to traverse)
|
2016-12-31 00:54:36 -05:00 |
|
Al
|
8978000320
|
[strings] adding latest utf8proc, new functions for utf8_lower (instead of case folding) and utf8_upper, and a utf8_is_whitespace that takes things like tabs into account
|
2016-12-31 00:52:12 -05:00 |
|
Al
|
0284913aa7
|
[utils] ignore initial separators when splitting on delimiter
|
2016-12-26 04:14:20 -05:00 |
|
Al
|
3ac2c93e1c
|
[utils] renaming char_array_append_vjoined to char_array_add_vjoined to follow convention that add_* calls NUL-terminate while append_* calls do not
|
2016-12-18 15:26:58 -05:00 |
|
Al
|
3939dd0ca6
|
[fix] cstring_array_split calls
|
2016-12-12 11:37:27 -05:00 |
|
Al
|
b1816e9b70
|
[utils] Adding cstring_array_split_ignore_consecutive
|
2016-12-12 11:37:27 -05:00 |
|
Al
|
b639fa5127
|
[utils] string_replace also creates a copy
|
2016-11-30 10:09:33 -08:00 |
|
Al
|
89f6611c4e
|
[strings] string_trim makes a copy rather than modifying the pointer
|
2016-11-28 15:06:07 -08:00 |
|
Al
|
92e66fd60c
|
[utils] string_next_hyphen_index
|
2016-08-16 12:49:52 -04:00 |
|
Al
|
b8d43dc601
|
[fix] cstring_array_split calls
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
b664ab1cea
|
[utils] Adding cstring_array_split_ignore_consecutive
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
98c395d34c
|
[numex] Concatenating a string of numeric expressions with no intervening tokens like Seventeen Eighty or Ten Oh Four
|
2016-02-10 09:21:31 -05:00 |
|
Al
|
7b300639f1
|
[fix] Trie prefix search tail comparison
|
2016-01-17 20:56:37 -05:00 |
|
Al
|
0d5cf0d6d7
|
[utils] char_array_cat_printf was forcing a doubling of the size of the buffer, which is bad if calling many times. Now only initiates a realloc if the char_array is almost full. Also adding cstring_array_from_strings which takes a list of char *s
|
2016-01-06 22:56:01 -05:00 |
|
Al
|
d0aaff1482
|
[utils] string_equals with NULL check
|
2015-12-01 13:12:08 -05:00 |
|
Al
|
40918812e2
|
[normalize] Adding hyphen elimination as a string option (changes tokenization)
|
2015-10-27 13:32:47 -04:00 |
|
Al
|
6428c0ae20
|
[utils] cstring_array_cat
|
2015-10-03 16:00:13 -04:00 |
|
Al
|
3fab0f984f
|
[fix] fixing some compiler warnings, using type-specific abs functions for vector_math
|
2015-09-19 16:11:09 -04:00 |
|
Al
|
35b9122a1a
|
[utils] inlining a few functions
|
2015-09-10 16:33:54 -07:00 |
|
Al
|
0ddf50cb5f
|
[utils] add to feature array with printf syntax
|
2015-09-10 10:24:51 -07:00 |
|
Al
|
b3f89a207a
|
[utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy
|
2015-09-09 18:07:31 -07:00 |
|
Al
|
9d2ca08fc2
|
[utils] Adding _copy and _new_copy methods to vectors (the former copies data to a pre-allocated vector, the latter allocates a new vector)
|
2015-09-06 21:01:26 -07:00 |
|
Al
|
a13e5117b5
|
[utils] string_tree_num_strings method
|
2015-08-10 17:46:37 -04:00 |
|
Al
|
064b6b5898
|
[utils] char_array_append_reversed for adding reversed strings without a malloc
|
2015-08-10 16:10:05 -04:00 |
|
Al
|
9b69d1f67a
|
[fix] Removing C++ checks from all but the main API functions
|
2015-08-07 17:15:39 -04:00 |
|