bbaa302e2e[fix] NUMEX_STOPWORD_RULE define
Al
2015-08-09 01:03:23 -04:00
5383640c14[fix] cast
Al
2015-08-09 01:01:11 -04:00
dd391eabe5[numex] Separating rules from keys for Linux gcc compilation
Al
2015-08-09 01:00:57 -04:00
e346b831cb[build] public-read permissions when uploading to S3
Al
2015-08-09 00:17:04 -04:00
ad584671c4[build] Not compiling with -Werror for now
Al
2015-08-09 00:02:41 -04:00
f170f70727[build] Link to math library
Al
2015-08-09 00:01:44 -04:00
423e2c86c7[build] builder programs are now in noinst_PROGRAMS, Makefile target to upload data tarball to S3 (with proper credentials)
Al
2015-08-08 23:29:34 -04:00
a5ce1f12dd[fix] stdint header in address expansion rule generation script
Al
2015-08-08 23:28:11 -04:00
ee982cd872[dictionaries] Removing dictionaries/all/personal_suffixes, can add to languages as needed
Al
2015-08-08 23:13:02 -04:00
5acf7a4f3e[phrases] resetting node position when continuation falls off the trie
Al
2015-08-08 22:17:58 -04:00
a77c8e1321[build] Adding bootstrap.sh script and removing configure from version control
Al
2015-08-08 21:22:11 -04:00
cd0f95f9e2[fix] making transliteration path relative to data dir
Al
2015-08-08 21:05:10 -04:00
2ba0e814ad[build] better autoconf checks for time and dirent headers
Al
2015-08-08 21:01:51 -04:00
d0679450e3[config] Including Autoconf config.h in internal config
Al
2015-08-08 20:50:23 -04:00
5df9e123af[numex] Fix to whole_tokens_only numeric expression parsing: numex was pushing a number onto the stack on encountering a new rule context even though the token was not completely parsed
Al
2015-08-08 20:49:54 -04:00
53f54d6454[fix] removing comment
Al
2015-08-08 20:23:49 -04:00
2106a6cfe4[build] Adding command-line test and bench programs
Al
2015-08-08 19:44:50 -04:00
5aa2e99b92[fix] data dir for tar extraction
Al
2015-08-08 19:42:37 -04:00
54aa6fe7df[build] Fixing runtime check/save of last updated file for package data tarball
Al
2015-08-08 17:15:57 -04:00
f38a53601b[rm] Better not to keep that file in the repo
Al
2015-08-08 02:41:54 -04:00
770f44198c[build] Adding default file to track last updated date
Al
2015-08-08 02:30:42 -04:00
c0c21b81f2[build] Adding generated configure script
Al
2015-08-07 17:35:40 -04:00
a197d04b1a[fix] float comparison
Al
2015-08-07 17:28:15 -04:00
f161f68d53[build] Changes to Makefile.am to build on Debian/Ubuntu, fixing downloading of the data tarball for Mac and Linux
Al
2015-08-07 17:27:34 -04:00
9b69d1f67a[fix] Removing C++ checks from all but the main API functions
Al
2015-08-07 16:30:07 -04:00
359a1efb03[fix] Adding stdint.h include to most of the header files for portability
Al
2015-08-07 02:43:27 -04:00
0738a57caa[fix] restoring ctype.h include
Al
2015-08-07 01:52:01 -04:00
06d2e916a1[fix] includes, matters on GCC/Linux
Al
2015-08-07 01:51:34 -04:00
ae9825b9f9[build] Fixing data dir download in Automake file
Al
2015-08-07 01:51:06 -04:00
d7ebcd046e[fix] includes
Al
2015-08-07 01:00:26 -04:00
f246c2ee95[api] Adding address component constants to libpostal.h, returning char ** instead of a cstring_array to simplify API/dependencies
Al
2015-08-06 17:52:54 -04:00
61d586fa1d[config] config.h=>libpostal_config.h so as not to conflict with autoconf
Al
2015-08-06 17:49:35 -04:00
2bedb695a2[build] adding Automake file in src, including rule to download data dir tarball
Al
2015-08-06 17:48:37 -04:00
4b9f11eca5[build] Main Automake file and modified version of Sparkey's Automake file
Al
2015-08-06 02:14:33 -04:00
fe078cff66[build] Adding Autoconf file
Al
2015-08-06 02:13:43 -04:00
1d39916aaa[fix] Fixing warnings in unicode script data
Al
2015-08-02 21:30:54 -06:00
770ce4256f[expansion] Re-generating address expansion data file
Al
2015-08-02 21:30:19 -06:00
90cde298dd[dictionaries] condensed forms of sin numero in various languages
Al
2015-08-02 21:19:55 -06:00
753c6efb1d[api] Initial libpostal API, combining string normalization, transliteration, numex and address dictionaries
Al
2015-08-02 14:38:10 -06:00
b27030e39f[fix] tokenized trie search was skipping tokens in some cases
Al
2015-08-02 14:36:21 -06:00
3178eda501[utils] string_contains_hyphen method
Al
2015-08-02 14:35:18 -06:00
46141a6c36[normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion
Al
2015-08-02 14:34:32 -06:00
f10dd49c58[expansion] NULL_CANONICAL_INDEX constant
Al
2015-08-01 23:59:16 -06:00
6bf563ca89[dictionaries] Italian abbreviations for strada
Al
2015-07-28 19:15:30 -04:00
fe4789a665[fix] compiler warnings
Al
2015-07-28 19:14:00 -04:00
551904d202[normalize] cstring_array instead of string_tree for token-based normalization
Al
2015-07-28 19:09:50 -04:00
90d4da9e72[geodb] Adding an is_canonical bit field to geodb trie values
Al
2015-07-28 19:08:24 -04:00
9bc902f575[numex] LATIN_LANGUAGE_CODE constant for Roman numeral normalization
Al
2015-07-28 18:12:09 -04:00
df1410da8c[numex] Fixing numex parsing for lone stopwords and certain prefix matches that were mistakenly getting converted, e.g. settembre => 7mbre
Al
2015-07-28 18:11:19 -04:00
a16f0dabcb[numex] Fixing hyphen-initial numeric phrases that end the string
Al
2015-07-28 03:28:44 -04:00
3dc6115a4e[dictionaries] Updates to English and Spanish dictionaries on looking through a data set of real test addresses
Al
2015-07-27 16:42:06 -04:00
0f5b69c06b[fix] transition to SEARCH_STATE_NO_MATCH in trie_search_tokens_from_index on a return to the start node
Al
2015-07-27 16:35:18 -04:00
243f327928[fix] NULL check
Al
2015-07-27 16:31:55 -04:00
7aee159c0c[utils] string_tree_num_tokens
Al
2015-07-27 12:36:34 -04:00
b812d90c59[fix] specifying numex dir with cross-platform PATH_SEPARATOR
Al
2015-07-27 12:36:06 -04:00
7ff9a6054d[geodb] trim strings in geodb builder
Al
2015-07-27 02:37:20 -04:00
053b987d58[normalize] adding an option for string trimming in normalize
Al
2015-07-27 01:59:14 -04:00
b94526a27b[utils] Making string_trim handle all kinds of UTF-8 whitespace/separators
Al
2015-07-27 01:55:46 -04:00
eab4c554d6[numex] Regenerating numex data file
Al
2015-07-27 01:53:13 -04:00
0ab1434f20[numex] Making all languages except the ideographic writing systems (CJK) whole_tokens_only for numex. Otherwise non-number prefixes may accidentally get converted into numbers. May add some more options around this in the future.
Al
2015-07-27 01:52:41 -04:00
d2539f5b57[numex] Fixing case of hyphen/space-initial phrases in numex, as well as whole token only languages with ordinals
Al
2015-07-27 01:44:33 -04:00
8ff4ace63b[phrases] Allowing trie_search to process tokenized input with or without whitespace, and to handle ideographic characters correctly
Al
2015-07-26 23:41:57 -04:00
38b10b9dd0[fix] Clearing paths before reuse in geodb_builder
Al
2015-07-26 23:36:34 -04:00
93042761ac[fix] warnings in string_utils.c
Al
2015-07-26 23:36:03 -04:00
50ee95ff7d[geodb] Adding a msgpack'd list of ids for naked string keys in geodb builder
Al
2015-07-25 18:42:13 -04:00
a67ec44a08[utils] cstring_array_terminate, moving msgpack_utils to separate file
Al
2015-07-25 18:39:57 -04:00
42f6be7434[fix] county road
Al
2015-07-25 14:19:38 -04:00
2ff8c0fd1e[transliteration] fixing length-based transliteration
Al
2015-07-25 13:53:28 -04:00
71ffdf9cbc[expansion] tokenized version of search_address_dictionaries
Al
2015-07-25 13:50:53 -04:00
ee96dab93c[fix] unnecessary headers
Al
2015-07-25 13:49:42 -04:00
e549e76806[utils] string_tree_iterator_foreach_token
Al
2015-07-25 13:49:02 -04:00
2adaf475c2[utils] cstring_array (contiguous) to array of malloc'd strings
Al
2015-07-25 12:13:50 -04:00
e9277d7339[utils] vector extend method
Al
2015-07-25 01:33:45 -04:00
cdb9afddd3[fix] address training data carriage returns
Al
2015-07-25 00:35:27 -04:00
9fb1eae877[expansion] Regenerating address data file
Al
2015-07-24 16:09:22 -04:00
cff72a0cb3[dictionaries] Adding a few versions of the phrase "centro commercial" in French, Spanish and Italian after a review of addresses in those languages
Al
2015-07-24 16:04:43 -04:00
351c7c8c2e[expansion] Add concatenated suffixes to the suffix keyspace of the address dictionary trie and concatenated prefixes and elisions to the prefix keyspace
Al
2015-07-24 16:02:47 -04:00
90a91cadd0[search] Modifying trie_search_prefixes to use the new key schema
Al
2015-07-24 15:59:49 -04:00
bb7688d8d1[phrases] trie_add_prefix method and a schema for prefix keys, e.g. elisions in French and Italian, separable prefixes like Hinter in German, etc.
Al
2015-07-24 15:56:04 -04:00
359cd62e20[numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made). Fixing lengths in situations where two unrelated numbers are joined by a stopword, e.g. in the phrase "one and one" the "and" acts as a delimiter, whereas in "one hundred and twenty" the stopword acts as a joiner
Al
2015-07-24 15:30:58 -04:00
12959aa483[numex] Re-generating numex data
Al
2015-07-24 15:24:03 -04:00
5239c365d0[docs] Adding some documentation for normalize.h options
Al
2015-07-24 15:23:18 -04:00
caf714f06f[fix] typo and frivolous key
Al
2015-07-24 15:22:57 -04:00
87566bb6a5[numex] Adding validation checks for numex JSON
Al
2015-07-24 15:21:52 -04:00
96538469dd[utils] Adding a cstring_array_foreach macro
Al
2015-07-23 15:57:12 -04:00
27af28eacf[expansion] Changes to address_expansion struct to allow for multiple dictionaries per record. Only adding unique canonical strings to the string array
Al
2015-07-22 20:35:29 -04:00
454be89121[expansion] generated header and data files
Al
2015-07-22 20:31:54 -04:00
b27af13f8a[expansion] Adding an array of dictionaries to each (phrase, canonical) pair
Al
2015-07-22 20:24:14 -04:00
0a9e92f11f[expansion] Adding both key (for membership tests) and language-prefixed key to address dictionary
Al
2015-07-22 17:21:09 -04:00
09004aa5f1[expansion] Constant for the "all" dictionary
Al
2015-07-22 17:17:55 -04:00
f61d993157[expansion] removing the self param from address_dictionary methods, adding search_address_dictionaries method which searches a string for phrases in a particular language
Al
2015-07-22 03:51:14 -04:00
3da4b5d8c2[numex] New numex generated data file
Al
2015-07-22 02:24:16 -04:00
ba8ff2b0c6[expansion] Language prefixed keys
Al
2015-07-22 02:16:22 -04:00
157727d249[fix] method name, strlen and fclose
Al
2015-07-22 02:15:45 -04:00
64a63fdf51[mv] Moving all repo data files to a resources dir, data is only for runtime files
Al
2015-07-21 18:10:39 -04:00
a38b924c5d[fix] add_token_alternatives
Al
2015-07-21 17:26:59 -04:00
71be52275d[tokenization] Adding a version of tokenize which keeps whitespace tokens
Al
2015-07-21 17:26:20 -04:00
5d21cb1604[expansion] Address dictionary builder
Al
2015-07-21 16:46:57 -04:00
6eccde0df8[fix] trie_set_data_at_index
Al
2015-07-21 16:46:38 -04:00
c798876b3d[expansion] Address dictionary allocation, I/O, get/set
Al
2015-07-21 16:46:15 -04:00