
4485 Commits

Author SHA1 Message Date
Al
01ef0a42dc [places] adding state_district more often in UK for population < 10000 2017-01-12 01:34:25 -05:00
Al
aa142d0311 [openaddresses] adding Brasil census data sets for 26 states, currently just city and postcode for Distrito Federal while another source is being investigated 2017-01-12 00:44:27 -05:00
Al
1f6eac5dae [boundaries] use neighbourhood with the u 2017-01-12 00:43:13 -05:00
Al
09b3aeb7d9 [fix] component 2017-01-11 16:50:54 -05:00
Al
ed5dd28023 [addresses] adding some more synonyms to Brasilia street regex 2017-01-11 16:31:30 -05:00
Al
bec569adaa [osm] adding new validity check to venue names so if the Jaccard(name tokens, street & house number tokens) == 1 and the address does not have a known venue type e.g. a restaurant, the "venue name" is actually just the street address and can be discarded 2017-01-11 16:23:42 -05:00
Al
7f851810d2 [addresses] formatting addresses in Brasilia, so e.g. "Bloco B" is never part of the street name or building name, it's the house number. place=neighbourhood maps to nothing in Brasilia as these are basically subdivisions whose streets are identically named 2017-01-11 16:18:04 -05:00
Al
0d030a98c5 [osm] adding airport polygon index 2017-01-11 04:25:54 -05:00
Al
d528095984 [addresses] adding random unit numbers with more digits 2017-01-11 04:24:35 -05:00
Al
979fd16215 [osm] adding airports and terminals data sets with points and polygons, more file cleanup in OSM fetch script 2017-01-10 16:20:32 -05:00
Al
4bdfe5ba1d [openaddresses] add Habersham County, GA 2017-01-10 16:19:31 -05:00
Al
49fdb29e16 [openaddresses] add Swedish municipalities of Malmö, Vaxholm, Växjö, and Helsingborg 2017-01-10 12:23:06 -05:00
Al
7a8f94330b [parser] only adding ngrams in a hyphenated word if the subword is not rare 2017-01-09 02:53:33 -05:00
Al
00cf936460 [openaddresses] adding Nordrhein-Westfalen, Germany 2017-01-08 12:48:45 -05:00
Al
86c7b7f3fe [addresses] no longer normalizing slashes in boundary names for places that have multilingual names, etc. 2017-01-08 12:41:51 -05:00
Al
a6d94f998b [addresses] stripping parentheticals in admin boundary names as sometimes cities in e.g. Switzerland are like Oberwil (ZG) in OSM 2017-01-08 03:43:22 -05:00
Al
e10c156176 [dictionaries] adding BL as an abbreviation for Boulevard 2017-01-07 20:22:03 -05:00
Al
828b67d4f7 [osm] adding some new training data for simple road names and their surrounding admin boundaries 2017-01-07 15:34:43 -05:00
Al
83e38d9a8c [openaddresses] add OSM boundaries for Milwaukee county as many of the cities appear to be IDs 2017-01-07 01:42:46 -05:00
Al
eab629802c [openaddresses] removing pre_release_downloads as they're all in master now, adding city_replacements for all data sets where OSM boundaries are used 2017-01-07 01:39:11 -05:00
Al
69f1137532 [openaddresses] adding city_replacements for Lake County, FL 2017-01-07 00:35:12 -05:00
Al
c025b0f7d4 [openaddresses] adding correct state for Glarus, Switzerland, ignoring city in Milwaukee if it's purely numeric 2017-01-07 00:01:46 -05:00
Al
d51f9dbb0e [addresses] stripping unit phrases from streets in OpenAddresses as well, return value wasn't getting used before 2017-01-06 10:19:08 -05:00
Al
cfdef1788c [addresses] stripping unit from street using the libpostal dictionaries in all the address data sets. Happens surprisingly often in OpenStreetMap as well as OpenAddresses 2017-01-06 10:06:23 -05:00
Al
3fbd4426b7 [openaddresses] adding Swiss cantons of Grigioni/Graubünden, Glarus, Uri, and Schwyz 2017-01-06 08:55:32 -05:00
Al
9c14d47f24 [openaddresses] adding Campbell and Pendleton County KY and San Benito County, CA 2017-01-06 02:41:29 -05:00
Al
321f2034d2 [fix] unidata file 2017-01-05 04:24:33 -05:00
Al
7a31802a04 [fix] also fix german-ascii transliteration on uppercase U with umlaut 2017-01-05 04:07:29 -05:00
Al
25723fcea2 [transliteration] making the custom rules in transliteration less repetitious and accessible from elsewhere, removing string names for common transliterators and using constants 2017-01-05 04:06:51 -05:00
Al
3fcaae3dbc [openaddresses] add Canton of Solothurn, Switzerland 2017-01-05 02:23:20 -05:00
Al
4182123fa6 [openaddresses] adding Schaffhausen, also adding language=de for the last few cantons 2017-01-05 01:40:30 -05:00
Al
72e6bf043b [openaddresses] add Basel-Stadt, Switzerland 2017-01-05 01:26:20 -05:00
Al
3d16c20d24 [openaddresses] add Boyd County, KY 2017-01-05 01:25:41 -05:00
Al
c5cca4c82f [openaddresses] add Canton of Basel-Landschaft, Switzerland 2017-01-04 02:34:15 -05:00
Al
3e7042597e [openaddresses] adding Jamaica countrywide to OpenAddresses config 2017-01-04 02:32:41 -05:00
Al
bcd61ffbe8 [formatting] moving postcode to the beginning of the address only in countries using the continental European conventions. Creates more ambiguity than is worthwhile in the US, etc. when, say, house_number is removed from a training example and the postcode is inserted first (could very easily be a house_number) 2017-01-03 03:39:16 -05:00
Al
38e147d210 [fix] address configs for Greek/Hebrew 2017-01-03 03:07:53 -05:00
Al
de2dffa315 [addresses] adding Calle to purely numeric Spanish street names in OSM as well 2017-01-02 23:41:01 -05:00
Al
ccd555d020 [transliteration] regenerated transliteration_scripts_data.c 2017-01-02 13:52:48 -05:00
Al
600b40d2f6 [transliteration] adding german-ascii transliteration to Estonian to handle umlauts (ä => ae, etc.) 2017-01-02 13:51:56 -05:00
Al
b2b7f6f155 [osm] add wikipedia:* to rail station exception 2017-01-02 13:13:42 -05:00
Al
a99a1e759e [openaddresses] adding Rio de Janeiro, Stockholm, and Liechtenstein. Adding higher CLDR country probability for smaller countries 2017-01-02 03:29:36 -05:00
Al
77035fbdbd [strings] adding utf8_is_whitespace to the header so it can be referenced from multiple files 2017-01-02 02:23:21 -05:00
Al
400ea589ef [normalize] add NORMALIZE_STRING_SIMPLE_LATIN_ASCII option to pynormalize 2017-01-02 02:08:54 -05:00
Al
182976214c [logging] converting most of the steps in building the transliteration table to use debug logging 2017-01-02 00:41:11 -05:00
Al
d8d3840700 [transliteration] constant for the html-escape transliterator 2017-01-02 00:40:12 -05:00
Al
4ad3a52fe1 [strings] fix lowercasing in string_utils.c 2017-01-01 20:08:34 -05:00
Al
a78937f265 [normalize] use the new utf8proc lowercasing (as opposed to case folding), free copies since none of the string functions operate in-place any more, add minimal HTML escaping transliterator even to ASCII text 2017-01-01 20:06:32 -05:00
Al
5c56a44faa [strings] reverting to utf8proc v1.3.1, as 2.0 and above can chop off certain sequences 2017-01-01 20:03:23 -05:00
Al
fe88630f78 [dictionaries] regenerating address_expansion_data.c from upstream changes 2017-01-01 14:26:54 -05:00
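The venue-name validity check described in commit bec569adaa can be illustrated with a small sketch. This is not the project's code: it assumes simple lowercase whitespace tokenization, and the helper names and the KNOWN_VENUE_TYPES set are hypothetical. Note that a Jaccard similarity of exactly 1 means the two token sets are identical, so the check discards a "venue name" only when it consists of exactly the street and house number tokens.

```python
# Hypothetical sketch of the check described in bec569adaa.
# Assumptions: whitespace tokenization, lowercasing, and an illustrative
# venue-type set; the real implementation may normalize differently.

KNOWN_VENUE_TYPES = {"restaurant", "cafe", "bar"}  # hypothetical subset


def tokens(s: str) -> set:
    """Lowercase and split on whitespace (assumed tokenizer)."""
    return set(s.lower().split())


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_street_address_only(name: str, street: str, house_number: str = "",
                           venue_type=None) -> bool:
    """True if the 'venue name' is just the street address and can be discarded."""
    if venue_type in KNOWN_VENUE_TYPES:
        # A known venue type suggests the name is a real venue name; keep it.
        return False
    address_tokens = tokens(street) | tokens(house_number)
    return jaccard(tokens(name), address_tokens) == 1.0


if __name__ == "__main__":
    print(is_street_address_only("123 Main St", "Main St", "123"))
    print(is_street_address_only("123 Main St", "Main St", "123", "restaurant"))
    print(is_street_address_only("Joe's Diner", "Main St", "123"))
```

Requiring similarity of exactly 1 (rather than a looser threshold) keeps the rule conservative: a name that shares most but not all tokens with the street, such as "Main St Diner", is retained.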