Al | 08f39d6b80 | [parser] Adding address_parser_rewind to make multiple passes through the file when compiling the phrase tries | 2016-07-28 17:13:58 -04:00 |
Al | 1b09b7f2e5 | [fix] Adding country_region to address_parser_train | 2016-07-28 16:18:32 -04:00 |
Al | c6af5cc071 | [parser] Adding country_region label to parser as a boundary component | 2016-07-28 15:19:48 -04:00 |
Al | 64f167f045 | [tokenization] Re-generating scanner | 2016-07-21 17:04:57 -04:00 |
Al | 81b4a4a1cb | [tokenization] Hyphens, etc. between non-ASCII digits (e.g. Unicode full-width numbers) should be single tokens | 2016-07-21 17:04:57 -04:00 |
Al | be5fd79a48 | [expansion] Prefix/suffix expansions by default can apply to ADDRESS_ANY but also inherit the types of any dictionary that lists their canonical form (so we can add suffixes without worrying about whether they're for streets or place names, etc.) | 2016-07-21 17:04:57 -04:00 |
Al | 8926293063 | [parser/cli] Using NFC normalization on the output in the parser client (closes #30). Optional command-line arg for parser output dir, useful for spot-checking different experiments | 2016-07-21 17:04:57 -04:00 |
Al | 44908ff95a | [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces | 2016-07-21 17:04:57 -04:00 |
Al | 41ae742285 | [fix] tokenized trie search when falling off the trie at the start of a valid phrase | 2016-07-21 17:04:57 -04:00 |
Al | 6e60b3bbda | [fix] semicolon in #define | 2016-07-21 17:04:57 -04:00 |
Al | b5d4dd6f37 | [tokenization] Including full-width numbers in numeric tokens | 2016-07-21 17:04:57 -04:00 |
Al | dd7ef6fabf | [dictionaries] Making new component for near/nearby prepositions | 2016-07-21 17:04:57 -04:00 |
Al | 2454b98c6d | [tokenization] Reverting commit for tokenizing initial/final apostrophes as part of words as it may be more effective to handle during post-processing | 2016-07-21 17:04:57 -04:00 |
Al | 0a8f46bdc3 | [parser] Using new geonames designations in parser features | 2016-07-21 17:04:57 -04:00 |
Al | c383f8af88 | [parser] Using NFC normalization for parser as well, @ sign not defined as separator since it may also be used in intersections | 2016-07-21 17:04:57 -04:00 |
Al | c2ee5a45b3 | [geodb] Adding separate bitset for geonames place types and using NFC normalization instead of NFD (requires retraining) | 2016-07-21 17:04:57 -04:00 |
Al | 6c39c663ff | [normalize] Adding NORMALIZE_STRING_COMPOSE for NFC unicode normalization | 2016-07-21 17:04:57 -04:00 |
Al | 757c6147cb | [tokenization] Adding ability to tokenize 's Gravenhage | 2016-07-21 17:04:57 -04:00 |
Al | 2e8888e331 | [fix] warnings/size_t in libpostal.c | 2016-07-21 17:04:57 -04:00 |
Al | e800f21f06 | [gazetteers] Adding new gazetteer types/address components | 2016-07-21 17:04:57 -04:00 |
Al | e5e0cf3b92 | [fix] loading transliteration module in address_parser_test.c as well | 2016-07-21 17:04:57 -04:00 |
Al | b8d43dc601 | [fix] cstring_array_split calls | 2016-07-21 17:04:57 -04:00 |
Al | b19cd3f60a | [fix] brace | 2016-07-21 17:04:57 -04:00 |
Al | 994b2f18e4 | [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in Vulgar fractions e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent | 2016-07-21 17:04:57 -04:00 |
Al | b664ab1cea | [utils] Adding cstring_array_split_ignore_consecutive | 2016-07-21 17:04:57 -04:00 |
Al | 8e90ee45d2 | [fix] calls and NULL checks | 2016-07-21 17:04:57 -04:00 |
Al | e3cffaf0d1 | [fix] tokenized_string_t should copy its source string | 2016-07-21 17:04:57 -04:00 |
Al | 16501aba17 | [fix] Need to load transliteration module for Latin-ASCII normalization | 2016-07-21 17:04:57 -04:00 |
Al | a9ba61585b | [fix] Adding set -e to data download script so it fails if any subcommands fail | 2016-05-04 23:08:06 -04:00 |
Al | 9819ebf949 | [fix] always include expansions in the ambiguous expansion dictionary, no matter which component | 2016-04-29 13:26:13 -04:00 |
Al | 0bc3550c11 | [expansion] Adding address_expansion_in_dictionary | 2016-04-29 13:23:48 -04:00 |
Al | 59e5fcd1b4 | [fix] LC_ALL=C in data download script | 2016-04-11 12:47:50 -04:00 |
Travis | b8d4d71522 | [auto][ci skip] Adding data files from Travis build #112 | 2016-03-30 20:04:52 +00:00 |
Al | 14e8f50cf1 | [fix] Expansions when passing in the address_components= option. Was only limiting results at the phrase level, should work at the individual expansion level | 2016-03-29 16:46:29 -04:00 |
Travis | 2795d258d1 | [auto][ci skip] Adding data files from Travis build #108 | 2016-03-29 19:11:57 +00:00 |
Al | 6dad58c696 | [fix][ci skip] last remaining instance of vignt in libpostal | 2016-03-29 12:51:19 -04:00 |
Travis | 08d873ac15 | [auto][ci skip] Adding data files from Travis build #105 | 2016-03-29 15:39:14 +00:00 |
Travis | 49adcfe9b5 | [auto][ci skip] Adding data files from Travis build #97 | 2016-03-22 14:33:13 +00:00 |
Al | 25c8ba8603 | [fix] Log more helpful error message in language_classifier if not loaded | 2016-03-21 18:18:25 -04:00 |
Al | 0356b45069 | [fix] Log errors in numex module if not loaded | 2016-03-21 18:15:53 -04:00 |
Al | 943cd4443a | [fix] Log errors if address dictionaries not loaded | 2016-03-21 18:13:14 -04:00 |
Al | 510f12ff96 | [fix] Log error in transliteration if setup hasn't been called | 2016-03-21 18:06:02 -04:00 |
Al | 1b94727871 | [fix] Check that parser is loaded in parse_address, log and return NULL instead of segfaulting | 2016-03-21 18:04:26 -04:00 |
Al | be7b696cb2 | [fix] actually that temporary array is unnecessary altogether, eliminating | 2016-03-21 17:00:11 -04:00 |
Al | e0f7638372 | [fix] Freeing up temporary char_array | 2016-03-21 16:50:48 -04:00 |
Travis | 14093a263d | [auto][ci skip] Adding data files from Travis build #92 | 2016-03-21 16:43:23 +00:00 |
Travis | 0dfd20f14d | [auto][ci skip] Adding data files from Travis build #86 | 2016-03-16 20:37:31 +00:00 |
Travis | 576e91d3fa | [auto][ci skip] Adding data files from Travis build #84 | 2016-03-16 19:08:17 +00:00 |
Travis | 2dc9643b29 | [auto][ci skip] Adding data files from Travis build #82 | 2016-03-14 16:29:21 +00:00 |
Al | 0d7f9f2032 | [data] Using UTC dates for libpostal data file tracking for #38. Also silencing curl when checking if file was updated | 2016-03-10 16:44:02 -05:00 |