775 Commits

Author SHA1 Message Date
Al c0c21b81f2 [build] Adding generated configure script 2015-08-07 17:35:44 -04:00
Al a197d04b1a [fix] float comparison 2015-08-07 17:28:21 -04:00
Al f161f68d53 [build] Changes to Makefile.am to build on Debian/Ubuntu, fixing downloading of the data tarball for Mac and Linux 2015-08-07 17:27:34 -04:00
Al 9b69d1f67a [fix] Removing C++ checks from all but the main API functions 2015-08-07 17:15:39 -04:00
Al 359a1efb03 [fix] Adding stdint.h include to most of the header files for portability 2015-08-07 02:43:44 -04:00
Al 0738a57caa [fix] restoring ctype.h include 2015-08-07 01:52:08 -04:00
Al 06d2e916a1 [fix] includes, matters on GCC/Linux 2015-08-07 01:51:34 -04:00
Al ae9825b9f9 [build] Fixing data dir download in Automake file 2015-08-07 01:51:06 -04:00
Al d7ebcd046e [fix] includes 2015-08-07 01:00:26 -04:00
Al f246c2ee95 [api] Adding address component constants to libpostal.h, returning char ** instead of a cstring_array to simplify API/dependencies 2015-08-06 17:52:54 -04:00
Al 61d586fa1d [config] config.h=>libpostal_config.h so as not to conflict with autoconf 2015-08-06 17:50:55 -04:00
Al 2bedb695a2 [build] adding Automake file in src, including rule to download data dir tarball 2015-08-06 17:48:37 -04:00
Al 4b9f11eca5 [build] Main Automake file and modified version of Sparkey's Automake file 2015-08-06 02:14:33 -04:00
Al fe078cff66 [build] Adding Autoconf file 2015-08-06 02:13:43 -04:00
Al 1d39916aaa [fix] Fixing warnings in unicode script data 2015-08-02 21:30:54 -06:00
Al 770ce4256f [expansion] Re-generating address expansion data file 2015-08-02 21:30:19 -06:00
Al 90cde298dd [dictionaries] condensed forms of sin numero in various languages 2015-08-02 21:19:55 -06:00
Al 753c6efb1d [api] Initial libpostal API, combining string normalization, transliteration, numex and address dictionaries 2015-08-02 21:16:18 -06:00
Al b27030e39f [fix] tokenized trie search was skipping tokens in some cases 2015-08-02 14:36:21 -06:00
Al 3178eda501 [utils] string_contains_hyphen method 2015-08-02 14:35:18 -06:00
Al 46141a6c36 [normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion 2015-08-02 14:34:36 -06:00
Al f10dd49c58 [expansion] NULL_CANONICAL_INDEX constant 2015-08-01 23:59:16 -06:00
Al 6bf563ca89 [dictionaries] Italian abbreviations for strada 2015-07-28 19:15:30 -04:00
Al fe4789a665 [fix] compiler warnings 2015-07-28 19:14:00 -04:00
Al 551904d202 [normalize] cstring_array instead of string_tree for token-based normalization 2015-07-28 19:09:50 -04:00
Al 90d4da9e72 [geodb] Adding an is_canonical bit field to geodb trie values 2015-07-28 19:08:24 -04:00
Al 9bc902f575 [numex] LATIN_LANGUAGE_CODE constant for Roman numeral normalization 2015-07-28 18:12:12 -04:00
Al df1410da8c [numex] Fixing numex parsing for lone stopwords and certain prefix matches that were getting mistakenly converted e.g. settembre => 7mbre 2015-07-28 18:11:23 -04:00
Al a16f0dabcb [numex] Fixing hyphen-initial numeric phrases that end the string 2015-07-28 03:28:44 -04:00
Al 3dc6115a4e [dictionaries] Updates to English and Spanish dictionaries on looking through a data set of real test addresses 2015-07-27 16:42:09 -04:00
Al 0f5b69c06b [fix] transition to SEARCH_STATE_NO_MATCH in trie_search_tokens_from_index on a return to the start node 2015-07-27 16:35:27 -04:00
Al 243f327928 [fix] NULL check 2015-07-27 16:32:01 -04:00
Al 7aee159c0c [utils] string_tree_num_tokens 2015-07-27 12:36:34 -04:00
Al b812d90c59 [fix] specifying numex dir with cross-platform PATH_SEPARATOR 2015-07-27 12:36:06 -04:00
Al 7ff9a6054d [geodb] trim strings in geodb builder 2015-07-27 02:37:20 -04:00
Al 053b987d58 [normalize] adding an option for string trimming in normalize 2015-07-27 01:59:14 -04:00
Al b94526a27b [utils] Making string_trim handle all kinds of UTF-8 whitespace/separators 2015-07-27 01:55:46 -04:00
Al eab4c554d6 [numex] Regenerating numex data file 2015-07-27 01:53:13 -04:00
Al 0ab1434f20 [numex] Making all languages except the ideographic writing systems (CJK) whole_tokens_only for numex. Otherwise non-number prefixes may accidentally get converted into numbers. May add some more options around this in the future. 2015-07-27 01:52:44 -04:00
Al d2539f5b57 [numex] Fixing case of hyphen/space-initial phrases in numex, as well as whole token only languages with ordinals 2015-07-27 01:44:33 -04:00
Al 8ff4ace63b [phrases] Allowing trie_search to process tokenized input with or without whitespace, and to handle ideographic characters correctly 2015-07-26 23:41:57 -04:00
Al 38b10b9dd0 [fix] Clearing paths before reuse in geodb_builder 2015-07-26 23:36:34 -04:00
Al 93042761ac [fix] warnings in string_utils.c 2015-07-26 23:36:03 -04:00
Al 50ee95ff7d [geodb] Adding a msgpack'd list of ids for naked string keys in geodb builder 2015-07-25 18:42:13 -04:00
Al a67ec44a08 [utils] cstring_array_terminate, moving msgpack_utils to separate file 2015-07-25 18:41:02 -04:00
Al 42f6be7434 [fix] county road 2015-07-25 14:19:38 -04:00
Al 2ff8c0fd1e [transliteration] fixing length-based transliteration 2015-07-25 13:53:28 -04:00
Al 71ffdf9cbc [expansion] tokenized version of search_address_dictionaries 2015-07-25 13:50:53 -04:00
Al ee96dab93c [fix] unnecessary headers 2015-07-25 13:49:42 -04:00
Al e549e76806 [utils] string_tree_iterator_foreach_token 2015-07-25 13:49:02 -04:00