| Al | df530b8f4a | [tokenization] Re-generating scanner | 2016-07-03 23:51:29 -04:00 |
| Al | 3cbb1b3976 | [tokenization] Hyphens, etc. between non-ASCII digits (e.g. Unicode full-width numbers) should be single tokens | 2016-07-03 23:51:13 -04:00 |
| Al | d79189c501 | [expansion] Prefix/suffix expansions by default can apply to ADDRESS_ANY but also inherit the types of any dictionary that lists their canonical form (so we can add suffixes without worrying about whether they're for streets or place names, etc.) | 2016-06-28 02:38:00 -04:00 |
| Al | 106dfa80c3 | [parser/cli] Using NFC normalization on the output in the parser client (closes #30). Optional command-line arg for parser output dir, useful for spot-checking different experiments | 2016-06-22 11:56:35 -04:00 |
| Al | e19bc86c5a | [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces | 2016-06-22 11:50:42 -04:00 |
| Al | 3ff2f726d0 | [fix] tokenized trie search when falling off the trie at the start of a valid phrase | 2016-06-21 15:48:47 -04:00 |
| Al | 935a31df07 | [fix] semicolon in #define | 2016-06-21 15:16:14 -04:00 |
| Al | eb1b410d63 | [tokenization] Including full-width numbers in numeric tokens | 2016-06-14 01:28:25 +02:00 |
| Al | 1e295ea8e9 | [dictionaries] Making new component for near/nearby prepositions | 2016-06-01 15:32:23 -04:00 |
| Al | 5c92185e71 | [tokenization] Reverting commit for tokenizing initial/final apostrophes as part of words as it may be more effective to handle during post-processing | 2016-05-30 12:45:58 -04:00 |
| Al | b23f07b679 | [parser] Using new geonames designations in parser features | 2016-05-29 01:40:45 -04:00 |
| Al | bbddfe25bf | [parser] Using NFC normalization for parser as well, @ sign not defined as separator since it may also be used in intersections | 2016-05-29 01:37:38 -04:00 |
| Al | 1ac077914b | [geodb] Adding separate bitset for geonames place types and using NFC normalization instead of NFD (requires retraining) | 2016-05-29 01:36:18 -04:00 |
| Al | 1d1ada1bc1 | [normalize] Adding NORMALIZE_STRING_COMPOSE for NFC unicode normalization | 2016-05-28 19:25:12 -04:00 |
| Al | 1fd57fdda3 | [tokenization] Adding ability to tokenize 's Gravenhage | 2016-05-28 19:24:19 -04:00 |
| Al | 514aaf7377 | [fix] warnings/size_t in libpostal.c | 2016-05-28 19:19:31 -04:00 |
| Al | c0e8578b9c | [gazetteers] Adding new gazetteer types/address components | 2016-05-28 19:19:18 -04:00 |
| Al | 206a471732 | [fix] loading transliteration module in address_parser_test.c as well | 2016-05-25 19:54:01 -04:00 |
| Al | f59150b047 | [fix] cstring_array_split calls | 2016-05-25 17:58:30 -04:00 |
| Al | 5065917f41 | [fix] brace | 2016-05-25 17:52:00 -04:00 |
| Al | 679d3efcdc | [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in Vulgar fractions e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent | 2016-05-25 17:50:29 -04:00 |
| Al | 370744ccfd | [utils] Adding cstring_array_split_ignore_consecutive | 2016-05-25 17:07:20 -04:00 |
| Al | 5c7d24c71b | [fix] calls and NULL checks | 2016-05-25 15:50:53 -04:00 |
| Al | 349df20720 | [fix] tokenized_string_t should copy its source string | 2016-05-25 15:48:03 -04:00 |
| Al | 00784a897d | [fix] Need to load transliteration module for Latin-ASCII normalization | 2016-05-25 15:25:34 -04:00 |
| Al | a9ba61585b | [fix] Adding set -e to data download script so it fails if any subcommands fail | 2016-05-04 23:08:06 -04:00 |
| Al | 9819ebf949 | [fix] always include expansions in the ambiguous expansion dictionary, no matter which component | 2016-04-29 13:26:13 -04:00 |
| Al | 0bc3550c11 | [expansion] Adding address_expansion_in_dictionary | 2016-04-29 13:23:48 -04:00 |
| Al | 59e5fcd1b4 | [fix] LC_ALL=C in data download script | 2016-04-11 12:47:50 -04:00 |
| Travis | b8d4d71522 | [auto][ci skip] Adding data files from Travis build #112 | 2016-03-30 20:04:52 +00:00 |
| Al | 14e8f50cf1 | [fix] Expansions when passing in the address_components= option. Was only limiting results at the phrase level, should work at the individual expansion level | 2016-03-29 16:46:29 -04:00 |
| Travis | 2795d258d1 | [auto][ci skip] Adding data files from Travis build #108 | 2016-03-29 19:11:57 +00:00 |
| Al | 6dad58c696 | [fix][ci skip] last remaining instance of vignt in libpostal | 2016-03-29 12:51:19 -04:00 |
| Travis | 08d873ac15 | [auto][ci skip] Adding data files from Travis build #105 | 2016-03-29 15:39:14 +00:00 |
| Travis | 49adcfe9b5 | [auto][ci skip] Adding data files from Travis build #97 | 2016-03-22 14:33:13 +00:00 |
| Al | 25c8ba8603 | [fix] Log more helpful error message in language_classifier if not loaded | 2016-03-21 18:18:25 -04:00 |
| Al | 0356b45069 | [fix] Log errors in numex module if not loaded | 2016-03-21 18:15:53 -04:00 |
| Al | 943cd4443a | [fix] Log errors if address dictionaries not loaded | 2016-03-21 18:13:14 -04:00 |
| Al | 510f12ff96 | [fix] Log error in transliteration if setup hasn't been called | 2016-03-21 18:06:02 -04:00 |
| Al | 1b94727871 | [fix] Check that parser is loaded in parse_address, log and return NULL instead of segfaulting | 2016-03-21 18:04:26 -04:00 |
| Al | be7b696cb2 | [fix] actually that temporary array is unnecessary altogether, eliminating | 2016-03-21 17:00:11 -04:00 |
| Al | e0f7638372 | [fix] Freeing up temporary char_array | 2016-03-21 16:50:48 -04:00 |
| Travis | 14093a263d | [auto][ci skip] Adding data files from Travis build #92 | 2016-03-21 16:43:23 +00:00 |
| Travis | 0dfd20f14d | [auto][ci skip] Adding data files from Travis build #86 | 2016-03-16 20:37:31 +00:00 |
| Travis | 576e91d3fa | [auto][ci skip] Adding data files from Travis build #84 | 2016-03-16 19:08:17 +00:00 |
| Travis | 2dc9643b29 | [auto][ci skip] Adding data files from Travis build #82 | 2016-03-14 16:29:21 +00:00 |
| Al | 0d7f9f2032 | [data] Using UTC dates for libpostal data file tracking for #38. Also silencing curl when checking if file was updated | 2016-03-10 16:44:02 -05:00 |
| Travis | c4203c6ea9 | [auto][ci skip] Adding data files from Travis build #63 | 2016-03-06 18:00:40 +00:00 |
| Travis | 73140a8239 | [auto][ci skip] Adding data files from Travis build #62 | 2016-03-06 17:51:23 +00:00 |
| Travis | d8e0945d5b | [auto][build] Adding data files from Travis build #57 | 2016-03-06 16:11:32 +00:00 |