Commit Graph

833 Commits

Author SHA1 Message Date
Al
0a8f46bdc3 [parser] Using new geonames designations in parser features 2016-07-21 17:04:57 -04:00
Al
c383f8af88 [parser] Using NFC normalization for parser as well, @ sign not defined as separator since it may also be used in intersections 2016-07-21 17:04:57 -04:00
Al
c2ee5a45b3 [geodb] Adding separate bitset for geonames place types and using NFC normalization instead of NFD (requires retraining) 2016-07-21 17:04:57 -04:00
Al
6c39c663ff [normalize] Adding NORMALIZE_STRING_COMPOSE for NFC unicode normalization 2016-07-21 17:04:57 -04:00
Al
757c6147cb [tokenization] Adding ability to tokenize 's Gravenhage 2016-07-21 17:04:57 -04:00
Al
2e8888e331 [fix] warnings/size_t in libpostal.c 2016-07-21 17:04:57 -04:00
Al
e800f21f06 [gazetteers] Adding new gazetteer types/address components 2016-07-21 17:04:57 -04:00
Al
e5e0cf3b92 [fix] loading transliteration module in address_parser_test.c as well 2016-07-21 17:04:57 -04:00
Al
b8d43dc601 [fix] cstring_array_split calls 2016-07-21 17:04:57 -04:00
Al
b19cd3f60a [fix] brace 2016-07-21 17:04:57 -04:00
Al
994b2f18e4 [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in vulgar fractions, e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent 2016-07-21 17:04:57 -04:00
Al
b664ab1cea [utils] Adding cstring_array_split_ignore_consecutive 2016-07-21 17:04:57 -04:00
Al
8e90ee45d2 [fix] calls and NULL checks 2016-07-21 17:04:57 -04:00
Al
e3cffaf0d1 [fix] tokenized_string_t should copy its source string 2016-07-21 17:04:57 -04:00
Al
16501aba17 [fix] Need to load transliteration module for Latin-ASCII normalization 2016-07-21 17:04:57 -04:00
Al Barrentine
e02c6adc85 Merge pull request #91 from uberbaud/openbsd
Add support for OpenBSD
2016-07-20 19:47:18 -04:00
Tom Davis
c0366147e8 Add support for OpenBSD 2016-07-20 18:19:31 -04:00
Tom Davis
a8bb798ce0 Call libpostal_data in source path, not build path
This fix updates Makefile to find the actual libpostal_data file when
`configure` is called from another directory, which it uses as the build
directory.
2016-07-20 17:31:52 -04:00
Travis
a0f6e100f1 [auto][ci skip] Adding data files from Travis build #133 2016-07-17 19:13:46 +00:00
Al
12d50aac12 Merge branch 'master' of https://github.com/openvenues/libpostal 2016-07-17 15:03:52 -04:00
Al
83381e9d8a [expand] Adding exception for a few types of special punctuation (ampersand, plus, pound sign) which should be left in the original string and separated by whitespace. Closes #84. Closes #85 2016-07-17 15:02:47 -04:00
Travis
2fb677ca73 [auto][ci skip] Adding data files from Travis build #132 2016-07-17 18:47:28 +00:00
David Farrell
a7a9708d2b don't error on multiple setup_parser() 2016-07-17 11:25:03 -04:00
Al
d7996ed56c [fix] setting garbage pointer to NULL on language_classifier_teardown (fixes #82) 2016-07-17 01:56:09 -04:00
Al
ce78064988 [fix] NULL checks 2016-07-15 13:23:23 -04:00
Al
2f5f226faa [fix] Add original string to normalizations if all options were set to false 2016-07-15 13:23:23 -04:00
Al
e816b4f77e [parser] Ignore language/country options explicitly in the parser. The purpose of these options is to be able to create language-specific/country-specific models at some point; they shouldn't be used in the global model 2016-07-06 14:56:46 -04:00
Al
58a5dbe7e0 [logging] Logging the value of LIBPOSTAL_DATA_DIR when a setup error occurs 2016-07-01 14:51:04 -04:00
Al
ad9dfb46bd [build] Using a process pool with 64MB chunks (similar to aws cli) for S3 downloads. Setting the max concurrent requests to 10, also the default in aws cli. 2016-07-01 14:37:13 -04:00
Al
a9ba61585b [fix] Adding set -e to data download script so it fails if any subcommands fail 2016-05-04 23:08:06 -04:00
Al
9819ebf949 [fix] always include expansions in the ambiguous expansion dictionary, no matter which component 2016-04-29 13:26:13 -04:00
Al
0bc3550c11 [expansion] Adding address_expansion_in_dictionary 2016-04-29 13:23:48 -04:00
Al
59e5fcd1b4 [fix] LC_ALL=C in data download script 2016-04-11 12:47:50 -04:00
Travis
b8d4d71522 [auto][ci skip] Adding data files from Travis build #112 2016-03-30 20:04:52 +00:00
Al
14e8f50cf1 [fix] Expansions when passing in the address_components= option. Was only limiting results at the phrase level, should work at the individual expansion level 2016-03-29 16:46:29 -04:00
Travis
2795d258d1 [auto][ci skip] Adding data files from Travis build #108 2016-03-29 19:11:57 +00:00
Al
6dad58c696 [fix][ci skip] last remaining instance of vignt in libpostal 2016-03-29 12:51:19 -04:00
Travis
08d873ac15 [auto][ci skip] Adding data files from Travis build #105 2016-03-29 15:39:14 +00:00
Travis
49adcfe9b5 [auto][ci skip] Adding data files from Travis build #97 2016-03-22 14:33:13 +00:00
Al
25c8ba8603 [fix] Log more helpful error message in language_classifier if not loaded 2016-03-21 18:18:25 -04:00
Al
0356b45069 [fix] Log errors in numex module if not loaded 2016-03-21 18:15:53 -04:00
Al
943cd4443a [fix] Log errors if address dictionaries not loaded 2016-03-21 18:13:14 -04:00
Al
510f12ff96 [fix] Log error in transliteration if setup hasn't been called 2016-03-21 18:06:02 -04:00
Al
1b94727871 [fix] Check that parser is loaded in parse_address, log and return NULL instead of segfaulting 2016-03-21 18:04:26 -04:00
Al
be7b696cb2 [fix] actually that temporary array is unnecessary altogether, eliminating 2016-03-21 17:00:11 -04:00
Al
e0f7638372 [fix] Freeing up temporary char_array 2016-03-21 16:50:48 -04:00
Travis
14093a263d [auto][ci skip] Adding data files from Travis build #92 2016-03-21 16:43:23 +00:00
Travis
0dfd20f14d [auto][ci skip] Adding data files from Travis build #86 2016-03-16 20:37:31 +00:00
Travis
576e91d3fa [auto][ci skip] Adding data files from Travis build #84 2016-03-16 19:08:17 +00:00
Travis
2dc9643b29 [auto][ci skip] Adding data files from Travis build #82 2016-03-14 16:29:21 +00:00