Al
|
448ca6a61a
|
[merge] merging commit from v1.1
|
2017-10-12 01:41:04 -04:00 |
|
Al
|
0c6af2b74c
|
[fix] normalize canonical strings (after expanding abbreviations, concatenated suffixes, etc.) with Latin-ASCII, Latin-ASCII-Simple or simple UTF-8 normalization depending on the options
|
2017-08-03 14:08:05 -06:00 |
|
Iestyn Pryce
|
ecd07b18c1
|
Fix log_* formats which expect size_t but receive uint32_t.
|
2017-05-19 22:31:56 +01:00 |
|
Iestyn Pryce
|
f34fc56fec
|
Fix log_debug formats which expect unsigned int but receive size_t
|
2017-05-14 17:48:26 +01:00 |
|
Al
|
a7e67c4967
|
[fix] adding maximum number of permutations for libpostal_expand_address to consider (n=100 for both the inner and outer loop, so max strings=10000), fixes #200
|
2017-05-13 14:11:08 -04:00 |
|
Al
|
5780a08b48
|
[fix] check that possible ordinal suffix also has non-zero digit length before normalizing
|
2017-05-12 15:48:20 -04:00 |
|
Al
|
f3adde746e
|
[numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token
|
2017-04-19 20:18:21 -04:00 |
|
Al
|
cddc368533
|
[numex] adding one form of normalization which strips ordinal suffixes so {96th, Ninety-sixth} => 96. This is an additional form of normalization, so there's still one form where the suffixes are kept. One case that's still not handled is something like "IXe Arrondissement"
|
2017-04-18 21:39:54 -04:00 |
|
Al
|
8742574257
|
[parser] storing address_parser_context on the parser struct itself so it doesn't have to be allocated every time
|
2017-04-04 20:40:55 -04:00 |
|
Al
|
6d4c7984df
|
[api] doing this now since we're bumping a major version. Using a libpostal prefix for all public header functions and definitions
|
2017-03-31 03:35:51 -04:00 |
|
Al
|
a3e51db32d
|
[api] include some of the new components in default address_components for the libpostal expansion API
|
2017-02-15 22:29:22 -05:00 |
|
Al
|
9a93e95938
|
[api] removing geodb from setup functions
|
2017-02-10 01:02:52 -05:00 |
|
Al
|
b320aed9ac
|
[merge] merging master
|
2017-01-13 19:58:49 -05:00 |
|
Al
|
a3506131fe
|
[build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime
|
2017-01-09 16:11:26 -05:00 |
|
Al
|
58b063b632
|
[strings] making string_tree_iterator_done more meaningful (returns true if the iterator has no paths left to traverse)
|
2016-12-31 00:54:36 -05:00 |
|
Al
|
091167ed3c
|
[api] remove geodb from libpostal.c
|
2016-12-29 02:35:43 -05:00 |
|
Al
|
eea11beb6a
|
[expansion] using easier-to-access data structure for address dictionaries
|
2016-11-27 00:56:48 -08:00 |
|
Al
|
2e8888e331
|
[fix] warnings/size_t in libpostal.c
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
83381e9d8a
|
[expand] Adding exception for a few types of special punctuation (ampersand, plus, pound sign) which should be left in the original string and separated by whitespace. Closes #84. Closes #85
|
2016-07-17 15:02:47 -04:00 |
|
Al
|
ce78064988
|
[fix] NULL checks
|
2016-07-15 13:23:23 -04:00 |
|
Al
|
58a5dbe7e0
|
[logging] Logging the value of LIBPOSTAL_DATA_DIR when a setup error occurs
|
2016-07-01 14:51:04 -04:00 |
|
Al
|
9819ebf949
|
[fix] always include expansions in the ambiguous expansion dictionary, no matter which component
|
2016-04-29 13:26:13 -04:00 |
|
Al
|
14e8f50cf1
|
[fix] Expansions when passing in the address_components= option. Was only limiting results at the phrase level, should work at the individual expansion level
|
2016-03-29 16:46:29 -04:00 |
|
Al
|
37c09d1ed9
|
[api] Adding function to free expansions from expand_address
|
2016-02-16 10:56:45 -05:00 |
|
Al
|
98165e89ad
|
[api] Using bools instead of bit fields in the public API
|
2016-02-15 18:33:39 -05:00 |
|
Al
|
cf2a79bef1
|
[api] Default options accessible through getters, not static structs
|
2016-02-15 17:34:00 -05:00 |
|
Al
|
84d5ba18f0
|
[api] Fixing multi-language expansions with overlapping expansions, whitespace, utf8 normalization of canonical strings
|
2016-02-08 02:50:34 -05:00 |
|
Al
|
9ac0379a65
|
[phrases] Case where trie search finds a match, makes progress beyond the next token but has to fall back. Adding trie search test case
|
2016-02-08 01:07:56 -05:00 |
|
Al
|
085bfd6ada
|
[fix] static methods for libpostal.c
|
2016-01-30 02:20:59 -05:00 |
|
Al
|
42d169feee
|
[api] Libpostal expand API will now detect language automatically using a high accuracy language classifier trained on OSM streets/addresses/toponyms. Hooray batch geocoding!
|
2016-01-27 03:23:51 -05:00 |
|
Al
|
780966a59b
|
[api] More spacing fixes and using language information in normalize string
|
2015-12-31 03:52:14 -05:00 |
|
Al
|
9335d26fbd
|
[fix] spacing
|
2015-12-31 02:26:28 -05:00 |
|
Al
|
e4dba2297d
|
[mv] Moving token type checking to header
|
2015-12-28 01:17:33 -05:00 |
|
Al
|
0fa1c2389c
|
[fix] Leak in expanding strings that have a separable prefix and suffix. Other than that, ran through 78 million expansions with no discernible memory issues
|
2015-12-26 17:19:59 -05:00 |
|
Al
|
5439f4679f
|
[fix] Special tokens like emails/urls/phone numbers bypass normalization
|
2015-12-20 03:07:36 -05:00 |
|
Al
|
cf2a0efa11
|
[fix] Prefixes and suffixes that are the same length as the original token should be handled as regular expansions
|
2015-12-19 17:29:26 -05:00 |
|
Al
|
97906c86a8
|
[fix] Strip punctuation in final output in cases where there are no expansions
|
2015-12-19 02:10:41 -05:00 |
|
Al
|
4497c4501e
|
[fix] do not add a token if prefix/suffix expansions are inseparable and canonical
|
2015-12-19 01:36:02 -05:00 |
|
Al
|
b4a8a69226
|
[expansion] Fixing extra space on prefix/suffix expansions
|
2015-12-18 20:28:59 -05:00 |
|
Al
|
b9bf5c629e
|
[fix] Moving address_parser_response_destroy into libpostal so caller can free
|
2015-12-15 00:52:24 -05:00 |
|
Al
|
406f9c533d
|
[api] Separating parser setup/teardown into two separate methods
|
2015-12-14 18:15:57 -05:00 |
|
Al
|
dc03c83bb2
|
[math] Adding an aligned memory allocator for vectors to help with vectorization/SIMD
|
2015-12-14 14:56:38 -05:00 |
|
Al
|
88836e56e1
|
[api] Adding parse_address implementation to the libpostal API. GeoDB and address parser are now required. Stripping punctuation from the normalized output
|
2015-12-12 12:47:44 -05:00 |
|
Al
|
2fcc72ae07
|
[fix] multitoken canonical strings
|
2015-12-08 15:38:04 -05:00 |
|
Al
|
d35f519629
|
[expansion] Fixing case where non-ideographic tokens like # can potentially be concatenated with surrounding tokens and should be normalized with whitespace in between
|
2015-12-07 19:18:46 -05:00 |
|
Al
|
0d8d396108
|
[expansion] Fixing cases like ML King where a global (all languages) expansion subsumes the specific language expansion (like English)
|
2015-12-07 18:09:25 -05:00 |
|
Al
|
9bab70909d
|
[numex] Always adding a version of the string without Roman numeral expansion since many times those tokens can be ambiguous
|
2015-12-07 14:29:18 -05:00 |
|
Al
|
43287db90a
|
[normalization/phrases] Fixing a bug which occurs with an already-separated elision
|
2015-12-02 16:04:39 -05:00 |
|
Al
|
1a1d74785c
|
[fix] Compiler warnings for casts/printf
|
2015-10-26 18:52:18 -04:00 |
|
Al
|
3cba2e8df3
|
[api] Using default setup methods for submodules in libpostal setup
|
2015-09-15 14:01:33 -04:00 |
|