Al | 88a80f4e30 | [fix] using normalized tags throughout in OSM formatted place data | 2017-01-12 02:25:17 -05:00
Al | eff46bd55a | [fix] Ukraine country code | 2017-01-12 02:12:46 -05:00
Al | 01ef0a42dc | [places] adding state_district more often in UK for population < 10000 | 2017-01-12 01:34:25 -05:00
Al | aa142d0311 | [openaddresses] adding Brasil census data sets for 26 states, currently just city and postcode for Distrito Federal while another source is being investigated | 2017-01-12 00:44:27 -05:00
Al | 1f6eac5dae | [boundaries] use neighbourhood with the u | 2017-01-12 00:43:13 -05:00
Al | 09b3aeb7d9 | [fix] component | 2017-01-11 16:50:54 -05:00
Al | ed5dd28023 | [addresses] adding some more synonyms to Brasilia street regex | 2017-01-11 16:31:30 -05:00
Al | bec569adaa | [osm] adding new validity check to venue names so if the Jaccard(name tokens, street & house number tokens) == 1 and the address does not have a known venue type e.g. a restaurant, the "venue name" is actually just the street address and can be discarded | 2017-01-11 16:23:42 -05:00
Al | 7f851810d2 | [addresses] formatting addresses in Brasilia, so e.g. "Bloco B" is never part of the street name or building name, it's the house number. place=neighbourhood maps to nothing in Brasilia as these are basically subdivisions whose streets are identically named | 2017-01-11 16:18:04 -05:00
Al | 0d030a98c5 | [osm] adding airport polygon index | 2017-01-11 04:25:54 -05:00
Al | d528095984 | [addresses] adding random unit numbers with more digits | 2017-01-11 04:24:35 -05:00
Al | 979fd16215 | [osm] adding airports and terminals data sets with points and polygons, more file cleanup in OSM fetch script | 2017-01-10 16:20:32 -05:00
Al | 4bdfe5ba1d | [openaddresses] add Habersham County, GA | 2017-01-10 16:19:31 -05:00
Al | 49fdb29e16 | [openaddresses] add Swedish municipalities of Malmö, Vaxholm, Växjö, and Helsingborg | 2017-01-10 12:23:06 -05:00
Al | 7a8f94330b | [parser] only adding ngrams in a hyphenated word if the subword is not rare | 2017-01-09 02:53:33 -05:00
Al | 00cf936460 | [openaddresses] adding Nordrhein-Westfalen, Germany | 2017-01-08 12:48:45 -05:00
Al | 86c7b7f3fe | [addresses] no longer normalizing slashes in boundary names for places that have multilingual names, etc. | 2017-01-08 12:41:51 -05:00
Al | a6d94f998b | [addresses] stripping parentheticals in admin boundary names as sometimes cities in e.g. Switzerland are like Oberwil (ZG) in OSM | 2017-01-08 03:43:22 -05:00
Al | e10c156176 | [dictionaries] adding BL as an abbreviation for Boulevard | 2017-01-07 20:22:03 -05:00
Al | 828b67d4f7 | [osm] adding some new training data for simple road names and their surrounding admin boundaries | 2017-01-07 15:34:43 -05:00
Al | 83e38d9a8c | [openaddresses] add OSM boundaries for Milwaukee county as many of the cities appear to be IDs | 2017-01-07 01:42:46 -05:00
Al | eab629802c | [openaddresses] removing pre_release_downloads as they're all in master now, adding city_replacements for all data sets where OSM boundaries are used | 2017-01-07 01:39:11 -05:00
Al | 69f1137532 | [openaddresses] adding city_replacements for Lake County, FL | 2017-01-07 00:35:12 -05:00
Al | c025b0f7d4 | [openaddresses] adding correct state for Glarus, Switzerland, ignoring city in Milwaukee if it's purely numeric | 2017-01-07 00:01:46 -05:00
Al | d51f9dbb0e | [addresses] stripping unit phrases from streets in OpenAddresses as well, return value wasn't getting used before | 2017-01-06 10:19:08 -05:00
Al | cfdef1788c | [addresses] stripping unit from street using the libpostal dictionaries in all the address data sets. Happens surprisingly often in OpenStreetMap as well as OpenAddresses | 2017-01-06 10:06:23 -05:00
Al | 3fbd4426b7 | [openaddresses] adding Swiss cantons of Grigioni/Graubünden, Glarus, Uri, and Schwyz | 2017-01-06 08:55:32 -05:00
Al | 9c14d47f24 | [openaddresses] adding Campbell and Pendleton County, KY and San Benito County, CA | 2017-01-06 02:41:29 -05:00
Al | 321f2034d2 | [fix] unidata file | 2017-01-05 04:24:33 -05:00
Al | 7a31802a04 | [fix] also fix german-ascii transliteration on uppercase U with umlaut | 2017-01-05 04:07:29 -05:00
Al | 25723fcea2 | [transliteration] making the custom rules in transliteration less repetitious and accessible from elsewhere, removing string names for common transliterators and using constants | 2017-01-05 04:06:51 -05:00
Al | 3fcaae3dbc | [openaddresses] add Canton of Solothurn, Switzerland | 2017-01-05 02:23:20 -05:00
Al | 4182123fa6 | [openaddresses] adding Schaffhausen, also adding language=de for the last few cantons | 2017-01-05 01:40:30 -05:00
Al | 72e6bf043b | [openaddresses] add Basel-Stadt, Switzerland | 2017-01-05 01:26:20 -05:00
Al | 3d16c20d24 | [openaddresses] add Boyd County, KY | 2017-01-05 01:25:41 -05:00
Al | c5cca4c82f | [openaddresses] add Canton of Basel-Landschaft, Switzerland | 2017-01-04 02:34:15 -05:00
Al | 3e7042597e | [openaddresses] adding Jamaica countrywide to OpenAddresses config | 2017-01-04 02:32:41 -05:00
Al | bcd61ffbe8 | [formatting] moving postcode to the beginning of the address only in countries using the continental European conventions. Creates more ambiguity than is worthwhile in the US, etc. when, say, house_number is removed from a training example and the postcode is inserted first (could very easily be a house_number) | 2017-01-03 03:39:16 -05:00
Al | 38e147d210 | [fix] address configs for Greek/Hebrew | 2017-01-03 03:07:53 -05:00
Al | de2dffa315 | [addresses] adding Calle to purely numeric Spanish street names in OSM as well | 2017-01-02 23:41:01 -05:00
Al | ccd555d020 | [transliteration] regenerated transliteration_scripts_data.c | 2017-01-02 13:52:48 -05:00
Al | 600b40d2f6 | [transliteration] adding german-ascii transliteration to Estonian to handle umlauts (ä => ae, etc.) | 2017-01-02 13:51:56 -05:00
Al | b2b7f6f155 | [osm] add wikipedia:* to rail station exception | 2017-01-02 13:13:42 -05:00
Al | a99a1e759e | [openaddresses] adding Rio de Janeiro, Stockholm, and Liechtenstein. Adding higher CLDR country probability for smaller countries | 2017-01-02 03:29:36 -05:00
Al | 77035fbdbd | [strings] adding utf8_is_whitespace to the header so it can be referenced from multiple files | 2017-01-02 02:23:21 -05:00
Al | 400ea589ef | [normalize] add NORMALIZE_STRING_SIMPLE_LATIN_ASCII option to pynormalize | 2017-01-02 02:08:54 -05:00
Al | 182976214c | [logging] converting most of the steps in building the transliteration table to use debug logging | 2017-01-02 00:41:11 -05:00
Al | d8d3840700 | [transliteration] constant for the html-escape transliterator | 2017-01-02 00:40:12 -05:00
Al | 4ad3a52fe1 | [strings] fix lowercasing in string_utils.c | 2017-01-01 20:08:34 -05:00
Al | a78937f265 | [normalize] use the new utf8proc lowercasing (as opposed to case folding), free copies since none of the string functions operate in-place any more, add minimal HTML escaping transliterator even to ASCII text | 2017-01-01 20:06:32 -05:00
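
Commit bec569adaa describes a validity check that discards a venue name when its tokens are identical to the street and house number tokens (Jaccard similarity of 1) and the record has no known venue type. The following is a minimal sketch of that idea, not the repository's implementation: the function names, the simple regex tokenizer, and the `KNOWN_VENUE_KEYS` tuple are all assumptions made for illustration.

```python
import re

# Hypothetical tag keys treated as "known venue type"; not taken from the repository.
KNOWN_VENUE_KEYS = ("amenity", "shop", "tourism")


def tokenize(text):
    """Lowercase the text and return its set of word tokens (illustrative tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_redundant_venue_name(name, street, house_number, tags):
    """Return True if the "venue name" is just the street address restated."""
    # A record with a known venue type (e.g. a restaurant) keeps its name.
    if any(key in tags for key in KNOWN_VENUE_KEYS):
        return False
    name_tokens = tokenize(name)
    address_tokens = tokenize(street) | tokenize(house_number)
    return jaccard(name_tokens, address_tokens) == 1.0
```

Under these assumptions, `is_redundant_venue_name("123 Main Street", "Main Street", "123", {})` flags the name as discardable, while the same name on a record tagged `{"amenity": "restaurant"}` is kept.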