74 Commits

Author SHA1 Message Date
Al
0540d7c7e3 [api/compat] PR #465 redefined the language classifier response struct in the API and was casting between incompatible pointer types. Using the exported struct throughout. 2025-01-30 01:45:18 -05:00
Al
26124ee72f [near_dupes] exposing name_word_hashes directly in the API 2022-03-25 14:04:26 -04:00
Luiz Otavio V. B. Oliveira
0327150d2b Exposes language classification functions 2019-06-14 14:31:12 +02:00
Al
c5bb9d8daa [normalize/api] exposing normalize_string_languages and normalized_tokens_languages to the API for pre-normalizing numeric expressions at tokenization time 2018-02-22 18:47:36 -05:00
Al
86d5eca521 [api] checking for NULL responses in the cstring_array methods before converting them to char arrays 2017-12-30 02:31:25 -05:00
Al
6dff154a99 [api] adding APIs for getting default options and using a consistent naming convention 2017-12-29 17:48:54 -05:00
Al
8495cda1eb [api] adding pairwise-dupe functions/structs to the public header 2017-12-29 13:48:54 -05:00
Al
1f1412c120 [api] adding libpostal_place_languages method to public API for classifying languages consistently from components (may need to make several calls using the same languages and don't necessarily want the language classifier to be run on house numbers when we already know the languages from e.g. the street name). This provides a simple window into the language classifier focused on the entire address/record 2017-12-29 03:32:41 -05:00
Al
f3a626463a [api] adding API functions for near dupe hashes to the public header 2017-12-24 12:43:28 -05:00
Al
8b2a4d1ecf [api] adding libpostal_expand_address_root to the public API. This will attempt to delete tokens that can be safely ignored. It's deterministic and rule-based, but is informed by libpostal's fairly comprehensive dictionaries, and should work relatively well across languages for deduping purposes. 2017-12-17 17:46:26 -05:00
Al
8968a6c966 [expand] moving expand to its own module so the internal methods can be exposed, calling from libpostal.c 2017-12-08 16:26:13 -05:00
Al
ec4d683d1b Merge branch 'master' into lieu_api 2017-11-29 15:49:52 -05:00
AeroXuk
9090811826 Modified the libpostal API to add an extra function libpostal_parser_print_features to toggle debugging info. Updated address_parser app to use the new function. 2017-11-27 19:20:37 +00:00
AeroXuk
26ac9ab5c2 Removing EXPORT statements from all source files and most header files, leaving only the exports for the main API in libpostal.h. Modified Makefiles so that all the test apps build without having extra functions exported from libpostal. 2017-11-25 04:35:28 +00:00
AeroXuk
2d3b420d35 Merging changes from AeroXuk/libpostal_windows. 2017-11-19 12:44:38 +00:00
Al
053dca82ba [expand] adding a normalization for a single non-acronym internal period where there's an expansion at the prefix/suffix (for #218 and https://github.com/openvenues/libpostal/issues/216#issuecomment-306617824). Helps in cases like "St.Michaels" or "Jln.Utara" without needing to specify concatenated prefix phrases for every possibility 2017-10-28 02:38:15 -04:00
Al
5c927e780f [expand] adding ability to expand Roman numerals with ordinal suffixes like IXe in French 2017-10-20 02:51:26 -04:00
Al
448ca6a61a [merge] merging commit from v1.1 2017-10-12 01:41:04 -04:00
Al
0c6af2b74c [fix] normalize canonical strings (after expanding abbreviations, concatenated suffixes, etc.) with Latin-ASCII, Latin-ASCII-Simple or simple UTF-8 normalization depending on the options 2017-08-03 14:08:05 -06:00
Iestyn Pryce
ecd07b18c1 Fix log_* formats which expect size_t but receive uint32_t. 2017-05-19 22:31:56 +01:00
Iestyn Pryce
f34fc56fec Fix log_debug formats which expect unsigned int but receive size_t 2017-05-14 17:48:26 +01:00
Al
a7e67c4967 [fix] adding maximum number of permutations for libpostal_expand_address to consider (n=100 for both the inner and outer loop, so max strings=10000), fixes #200 2017-05-13 14:11:08 -04:00
Al
5780a08b48 [fix] check that possible ordinal suffix also has non-zero digit length before normalizing 2017-05-12 15:48:20 -04:00
Al
f3adde746e [numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token 2017-04-19 20:18:21 -04:00
Al
cddc368533 [numex] adding one form of normalization which strips ordinal suffixes so {96th, Ninety-sixth} => 96. This is an additional form of normalization, so there's still one form where the suffixes are kept. One case that's still not handled is something like "IXe Arrondissement" 2017-04-18 21:39:54 -04:00
Al
8742574257 [parser] storing address_parser_context on the parser struct itself so it doesn't have to be allocated every time 2017-04-04 20:40:55 -04:00
Al
6d4c7984df [api] doing this now since we're bumping a major version. Using a libpostal prefix for all public header functions and definitions 2017-03-31 03:35:51 -04:00
Al
a3e51db32d [api] include some of the new components in default address_components for the libpostal expansion API 2017-02-15 22:29:22 -05:00
Al
9a93e95938 [api] removing geodb from setup functions 2017-02-10 01:02:52 -05:00
Al
b320aed9ac [merge] merging master 2017-01-13 19:58:49 -05:00
Al
a3506131fe [build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime 2017-01-09 16:11:26 -05:00
Al
58b063b632 [strings] making string_tree_iterator_done more meaningful (returns true if the iterator has no paths left to traverse) 2016-12-31 00:54:36 -05:00
Al
091167ed3c [api] remove geodb from libpostal.c 2016-12-29 02:35:43 -05:00
Al
eea11beb6a [expansion] using easier-to-access data structure for address dictionaries 2016-11-27 00:56:48 -08:00
Al
2e8888e331 [fix] warnings/size_t in libpostal.c 2016-07-21 17:04:57 -04:00
Al
83381e9d8a [expand] Adding exception for a few types of special punctuation (ampersand, plus, pound sign) which should be left in the original string and separated by whitespace. Closes #84. Closes #85 2016-07-17 15:02:47 -04:00
Al
ce78064988 [fix] NULL checks 2016-07-15 13:23:23 -04:00
Al
58a5dbe7e0 [logging] Logging the value of LIBPOSTAL_DATA_DIR when a setup error occurs 2016-07-01 14:51:04 -04:00
Al
9819ebf949 [fix] always include expansions in the ambiguous expansion dictionary, no matter which component 2016-04-29 13:26:13 -04:00
Al
14e8f50cf1 [fix] Expansions when passing in the address_components= option. Was only limiting results at the phrase level, should work at the individual expansion level 2016-03-29 16:46:29 -04:00
Al
37c09d1ed9 [api] Adding function to free expansions from expand_address 2016-02-16 10:56:45 -05:00
Al
98165e89ad [api] Using bools instead of bit fields in the public API 2016-02-15 18:33:39 -05:00
Al
cf2a79bef1 [api] Default options accessible through getters, not static structs 2016-02-15 17:34:00 -05:00
Al
84d5ba18f0 [api] Fixing multi-language expansions with overlapping expansions, whitespace, utf8 normalization of canonical strings 2016-02-08 02:50:34 -05:00
Al
9ac0379a65 [phrases] Case where trie search finds a match, makes progress beyond the next token but has to fall back. Adding trie search test case 2016-02-08 01:07:56 -05:00
Al
085bfd6ada [fix] static methods for libpostal.c 2016-01-30 02:20:59 -05:00
Al
42d169feee [api] Libpostal expand API will now detect language automatically using a high accuracy language classifier trained on OSM streets/addresses/toponyms. Hooray batch geocoding! 2016-01-27 03:23:51 -05:00
Al
780966a59b [api] More spacing fixes and using language information in normalize string 2015-12-31 03:52:14 -05:00
Al
9335d26fbd [fix] spacing 2015-12-31 02:26:28 -05:00
Al
e4dba2297d [mv] Moving token type checking to header 2015-12-28 01:17:33 -05:00