4678 Commits

Author | SHA1 | Message | Date
Al | a3506131fe | [build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime | 2017-01-09 16:11:26 -05:00
Al | 953a26e54e | [utils] char_array_add_vjoined to stay consistent (add_* methods NUL terminate) | 2017-01-09 16:10:07 -05:00
Al | 7a8f94330b | [parser] only adding ngrams in a hyphenated word if the subword is not rare | 2017-01-09 02:53:33 -05:00
Al | 00cf936460 | [openaddresses] adding Nordrhein-Westfalen, Germany | 2017-01-08 12:48:45 -05:00
Al | 86c7b7f3fe | [addresses] no longer normalizing slashes in boundary names for places that have multilingual names, etc. | 2017-01-08 12:41:51 -05:00
Al | a6d94f998b | [addresses] stripping parentheticals in admin boundary names as sometimes cities in e.g. Switzerland are like Oberwil (ZG) in OSM | 2017-01-08 03:43:22 -05:00
Al | e10c156176 | [dictionaries] adding BL as an abbreviation for Boulevard | 2017-01-07 20:22:03 -05:00
Al | 828b67d4f7 | [osm] adding some new training data for simple road names and their surrounding admin boundaries | 2017-01-07 15:34:43 -05:00
Al Barrentine | a2b84a0177 | [docs][ci skip] Adding parser label definitions to the README | 2017-01-07 14:17:31 -05:00
Al | 83e38d9a8c | [openaddresses] add OSM boundaries for Milwaukee county as many of the cities appear to be IDs | 2017-01-07 01:42:46 -05:00
Al | eab629802c | [openaddresses] removing pre_release_downloads as they're all in master now, adding city_replacements for all data sets where OSM boundaries are used | 2017-01-07 01:39:11 -05:00
Al | 69f1137532 | [openaddresses] adding city_replacements for Lake County, FL | 2017-01-07 00:35:12 -05:00
Al | c025b0f7d4 | [openaddresses] adding correct state for Glarus, Switzerland, ignoring city in Milwaukee if it's purely numeric | 2017-01-07 00:01:46 -05:00
Al | d51f9dbb0e | [addresses] stripping unit phrases from streets in OpenAddresses as well, return value wasn't getting used before | 2017-01-06 10:19:08 -05:00
Al | cfdef1788c | [addresses] stripping unit from street using the libpostal dictionaries in all the address data sets. Happens surprisingly often in OpenStreetMap as well as OpenAddresses | 2017-01-06 10:06:23 -05:00
Al | 3fbd4426b7 | [openaddresses] adding Swiss cantons of Grigioni/Graubünden, Glarus, Uri, and Schwyz | 2017-01-06 08:55:32 -05:00
Al | 9c14d47f24 | [openaddresses] adding Campbell and Pendleton County KY and San Benito County, CA | 2017-01-06 02:41:29 -05:00
Al Barrentine | 2b3a6f663e | Merge pull request #152 from rinigus/master_rpc_malloc: changes required for cross-compilation of ARM target | 2017-01-05 17:12:51 -05:00
Al | 321f2034d2 | [fix] unidata file | 2017-01-05 04:24:33 -05:00
Al | 7a31802a04 | [fix] also fix german-ascii transliteration on uppercase U with umlaut | 2017-01-05 04:07:29 -05:00
Al | 25723fcea2 | [transliteration] making the custom rules in transliteration less repetitious and accessible from elsewhere, removing string names for common transliterators and using constants | 2017-01-05 04:06:51 -05:00
Al | 3fcaae3dbc | [openaddresses] add Canton of Solothurn, Switzerland | 2017-01-05 02:23:20 -05:00
Al | 4182123fa6 | [openaddresses] adding Schaffhausen, also adding language=de for the last few cantons | 2017-01-05 01:40:30 -05:00
Al | 72e6bf043b | [openaddresses] add Basel-Stadt, Switzerland | 2017-01-05 01:26:20 -05:00
Al | 3d16c20d24 | [openaddresses] add Boyd County, KY | 2017-01-05 01:25:41 -05:00
Rinigus | 26aeb0ebec | drop AC_FUNC_MALLOC and _REALLOC and check for them as regular functions; add extra cflags for scanner | 2017-01-05 07:34:24 +02:00
Al | c5cca4c82f | [openaddresses] add Canton of Basel-Landschaft, Switzerland | 2017-01-04 02:34:15 -05:00
Al | 3e7042597e | [openaddresses] adding Jamaica countrywide to OpenAddresses config | 2017-01-04 02:32:41 -05:00
Al | bcd61ffbe8 | [formatting] moving postcode to the beginning of the address only in countries using the continental European conventions. Creates more ambiguity than is worthwhile in the US, etc. when, say, house_number is removed from a training example and the postcode is inserted first (could very easily be a house_number) | 2017-01-03 03:39:16 -05:00
Al | 38e147d210 | [fix] address configs for Greek/Hebrew | 2017-01-03 03:07:53 -05:00
Al | de2dffa315 | [addresses] adding Calle to purely numeric Spanish street names in OSM as well | 2017-01-02 23:41:01 -05:00
Al | ccd555d020 | [transliteration] regenerated transliteration_scripts_data.c | 2017-01-02 13:52:48 -05:00
Al | 600b40d2f6 | [transliteration] adding german-ascii transliteration to Estonian to handle umlauts (ä => ae, etc.) | 2017-01-02 13:51:56 -05:00
Al | b2b7f6f155 | [osm] add wikipedia:* to rail station exception | 2017-01-02 13:13:42 -05:00
Al | a99a1e759e | [openaddresses] adding Rio de Janeiro, Stockholm, and Liechtenstein. Adding higher CLDR country probability for smaller countries | 2017-01-02 03:29:36 -05:00
Al | 77035fbdbd | [strings] adding utf8_is_whitespace to the header so it can be referenced from multiple files | 2017-01-02 02:23:21 -05:00
Al | 400ea589ef | [normalize] add NORMALIZE_STRING_SIMPLE_LATIN_ASCII option to pynormalize | 2017-01-02 02:08:54 -05:00
Al | 182976214c | [logging] converting most of the steps in building the transliteration table to use debug logging | 2017-01-02 00:41:11 -05:00
Al | d8d3840700 | [transliteration] constant for the html-escape transliterator | 2017-01-02 00:40:12 -05:00
Al | 4ad3a52fe1 | [strings] fix lowercasing in string_utils.c | 2017-01-01 20:08:34 -05:00
Al | a78937f265 | [normalize] use the new utf8proc lowercasing (as opposed to case folding), free copies since none of the string functions operate in-place any more, add minimal HTML escaping transliterator even to ASCII text | 2017-01-01 20:06:32 -05:00
Al | 5c56a44faa | [strings] reverting to utf8proc v1.3.1, as 2.0 and above can chop off certain sequences | 2017-01-01 20:03:23 -05:00
Al | fe88630f78 | [dictionaries] regenerating address_expansion_data.c from upstream changes | 2017-01-01 14:26:54 -05:00
Al | 101bbcc02d | Merge remote-tracking branch 'origin/master' into parser-data | 2017-01-01 14:25:37 -05:00
Travis | d61e90a33d | [auto][ci skip] Adding data files from Travis build #188 | 2017-01-01 19:20:54 +00:00
Al Barrentine | 6048d6a71e | Merge pull request #149 from iestynpryce/master: Enhanced the Welsh (cy) language dictionaries. | 2017-01-01 14:11:16 -05:00
Al | 0b5cc96654 | [transliteration] add decompose option when stripping accents | 2017-01-01 13:54:20 -05:00
Al | 7d6c85aeec | [fix] new string tree iterator, don't decrement permutations on rollovers | 2017-01-01 13:34:08 -05:00
Al | 1780c5e053 | [fix] moving enum | 2016-12-31 13:01:57 -05:00
Iestyn Pryce | d8ee43156e | Enhanced the Welsh (cy) language dictionaries. | 2016-12-31 09:46:58 +00:00