Al
acd953ce51
[parser] first pass at new parser feature extraction
...
- removing geodb phrases
- use Latin-ASCII-simple transliteration (no umlauts, etc.)
- no digit normalization for admin component phrases and postcodes
- tag = START + word, special feature for first word in the sequence
- add the new admin boundary categories
- for hyphenated non-phrase words, add each sub-word
- for rare and unknown words, add ngram features of 3-6 characters with
underscores to indicate beginnings and endings (similar to language
classifier features)
- defines notion of "rare words" (known words with a frequency <= n where
n > the unknown word threshold), so known words can share
statistical strength with artificial and real unknown words
2016-12-29 02:17:35 -05:00
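
The word-level features listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: the function names, the feature-string prefixes, the default thresholds, and the exact underscore convention (an underscore on whichever side the word continues) are all assumptions.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of length min_n..max_n.

    Assumed convention: an underscore is attached on each side where the
    word continues, so "abc_" is a word beginning and "_bcd" a word ending.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(word) - n + 1):
            gram = word[i:i + n]
            if i > 0:
                gram = "_" + gram          # more characters to the left
            if i + n < len(word):
                gram = gram + "_"          # more characters to the right
            grams.append(gram)
    return grams


def word_features(word, count, unknown_threshold=1, rare_threshold=5):
    """Hypothetical feature extractor for a single token.

    count is the word's training-set frequency. Words with
    count <= unknown_threshold are treated as unknown; words with
    unknown_threshold < count <= rare_threshold are "rare": still known,
    but they also get n-gram features so they share statistical strength
    with unknown words.
    """
    features = []
    if count > unknown_threshold:
        features.append("word=" + word)
    if count <= rare_threshold:
        features.extend("ngram=" + g for g in char_ngrams(word))
    if "-" in word:
        # Hyphenated word: add each sub-word as its own feature.
        features.extend("sub_word=" + s for s in word.split("-") if s)
    return features
```

A frequent word only gets its own identity feature, a rare word gets both that and its n-grams, and an unknown word gets n-grams alone.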
Al
e62101b8bf
[parser] remove geodb from address_parser_test, sort confusion matrix
2016-12-29 02:14:40 -05:00
Al
174529e8d0
[parser] remove geodb and fix small memory leak in address_parser_train
2016-12-29 02:12:06 -05:00
Al
bde5fdfaad
[merge] merging in master
2016-12-29 02:00:31 -05:00
Al
646d96e13e
Merge remote-tracking branch 'origin/master' into parser-data
2016-12-29 01:58:38 -05:00
Al
a26a01ece3
[openaddresses] adding SEMCOG counties, MI
2016-12-28 19:37:44 -05:00
Al
22b4a215f4
[places] additional form for West Indies
2016-12-28 17:58:32 -05:00
Al
f58ebbdf7f
[fix] var name
2016-12-28 14:37:00 -05:00
Al
7ee44a584b
[fix] genitive case for Russian/Ukrainian toponyms, not locative ( #125 )
2016-12-28 14:34:28 -05:00
Al
e6e4b28e43
[addresses] making the город/г. prefix apply to the Russian language rather than the country
2016-12-28 13:26:19 -05:00
Al
f995fdf9d2
[fix] default None
2016-12-28 05:09:15 -05:00
Al
3dc6a69bf5
[openaddresses] adding locative names in OpenAddresses as well, which contains some Ukraine data sets
2016-12-28 04:59:55 -05:00
Al
91013fe296
[fix] moving checks inside the add_locatives function, fixing float cast
2016-12-28 04:59:27 -05:00
Al
6f009fb8a6
[addresses] adding pymorphy2 for converting Russian and Ukrainian place names (sticking with state and state_district for the moment) to the locative case as mentioned in #125
2016-12-28 04:48:32 -05:00
Al
e91907a21b
[boundaries] actually, the urban okrugs/districts seem to function more like neighborhoods in St Petersburg and Moscow, calling the raions city_district and the okrugs suburb
2016-12-28 01:36:11 -05:00
Travis
6c35eb9e65
[auto][ci skip] Adding data files from Travis build #186
2016-12-28 06:29:35 +00:00
Al
a86d6d5528
[merge] merging in master
2016-12-28 01:11:04 -05:00
Al Barrentine
47c3b0091b
Merge pull request #147 from Komzpa/patch-1
...
Remove place names that are not place names (RU, BE)
2016-12-28 01:08:48 -05:00
Al
e23951a90f
[dictionaries] new Ukrainian place names dictionary from http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases/UK
2016-12-28 01:08:01 -05:00
Al
0bcaf816c4
[dictionaries] new Russian place names dictionary from http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases/RU
2016-12-28 01:07:35 -05:00
Al
561d195be4
[fix] add global_overrides_last=True for federal cities in Russia
2016-12-28 00:49:13 -05:00
Al
4ce8f414ef
[boundaries] adding Moscow and St Petersburg as cities despite technically having "state" boundaries
2016-12-28 00:25:20 -05:00
Al
12c7bed275
[fix] /exceptions/overrides/
2016-12-28 00:16:22 -05:00
Al
1afe97b508
[fix] /containing/contained_by/
2016-12-28 00:04:18 -05:00
Al
66eda96b75
[boundaries] admin_level=8 is city_district in Moscow and St Petersburg
2016-12-27 23:59:14 -05:00
Al
4344c5fdf3
[formatting] adding non-zero invert probabilities to all the former Soviet states. Other template insertions can still apply afterward for #125
2016-12-27 23:25:49 -05:00
Al
25e966411d
[formatting] adding the ability to invert the address template (line by line, preserving order within each line) with certain probabilities
2016-12-27 23:25:49 -05:00
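
As a rough illustration of the inversion described in this commit, the sketch below reverses the order of a template's lines with a given probability while leaving each line's contents untouched. The function name, the template placeholder syntax, and the `rng` parameter are assumptions for the example, not the formatter's real interface.

```python
import random


def maybe_invert_template(template, invert_probability, rng=random):
    """Reverse the line order of an address template with the given probability.

    The order of components within each line is preserved; only the
    lines themselves are reversed.
    """
    if rng.random() >= invert_probability:
        return template
    lines = template.split("\n")
    return "\n".join(reversed(lines))


# Hypothetical template, most specific component first.
template = "{house_number} {road}\n{city} {postcode}\n{country}"
inverted = maybe_invert_template(template, 1.0)
```

With a probability of 1.0 the country line moves to the top and the street line to the bottom. With 0.0 the template is returned unchanged.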
Al
1c17f1f2e2
[names/ru] adding г. (город) prefix to Russian city names 50% of the time in various forms per #125
2016-12-27 23:25:41 -05:00
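
A probabilistic prefix of this kind might look like the sketch below. The specific prefix forms and the function name are assumptions made for illustration; the commit only says the prefix is added "in various forms" 50% of the time.

```python
import random

# Assumed forms of the prefix; the real configuration may differ.
CITY_PREFIXES = ("г. ", "г.", "город ")


def maybe_add_city_prefix(name, probability=0.5, rng=random):
    """Prepend a randomly chosen city prefix to name with the given probability."""
    if rng.random() >= probability:
        return name
    return rng.choice(CITY_PREFIXES) + name
```

Passing a seeded `random.Random` instance as `rng` makes the generated training data reproducible.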
Al
165056ccd8
[names] adding configurable prefix/suffix additions for boundary names
2016-12-27 20:32:23 -05:00
Travis
dc528affd5
[auto][ci skip] Adding data files from Travis build #184
2016-12-27 23:45:40 +00:00
Al Barrentine
2a42ea016b
Merge pull request #148 from Komzpa/patch-2
...
Ukrainian place names that are actually whatever
2016-12-27 18:35:48 -05:00
Darafei Praliaskouski
e514778645
Ukrainian place names that are actually whatever
2016-12-27 15:21:05 +03:00
Darafei Praliaskouski
dba8c28e6a
Remove Russian place names that are actually street names
2016-12-27 13:28:23 +03:00
Darafei Praliaskouski
38a6618e40
Remove Belarusian place names that are not place names
...
These all are parts of streets.
2016-12-27 13:26:30 +03:00
Al
80a9c1b308
[addresses] move country-specific cleanups to before reverse geocoding as those deal with the user-specified components
2016-12-27 04:19:57 -05:00
Al
d9c28ec160
[names] adding regional council and regional municipality to suffixes
2016-12-27 03:45:09 -05:00
Al
6163dbae39
[osm/places] adding option to only format place tags for city and smaller admins, using for polygons as larger polys should be included elsewhere anyway
2016-12-27 03:37:15 -05:00
Al
6eee689685
[fix] only applying separator tag to commas
2016-12-27 03:16:04 -05:00
Al
6192ac985a
[names] one more for South Africa: District Municipality
2016-12-27 03:04:31 -05:00
Al
2cdf30a79e
[names] same with Metropolitan Municipality
2016-12-27 02:48:55 -05:00
Al
2e3c1dee67
[names] add Local Municipality to English ignorable suffixes (seen in South Africa)
2016-12-27 02:45:58 -05:00
Al
76d8fc1d37
[fix] combined components
2016-12-26 21:35:27 -05:00
Al
c3bf63bc18
[fix] remove reference to ftfy in the formatter
2016-12-26 21:25:28 -05:00
Al
8abbb273b2
[osm] adding the excellent ftfy ( https://github.com/LuminosoInsight/python-ftfy ) to fix Mojibake, etc. in address components
2016-12-26 21:18:14 -05:00
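
For context on what "fixing Mojibake" means here: the commit relies on ftfy, which handles many encoding mix-ups. The standard-library sketch below is not ftfy and covers only the single most common case, UTF-8 bytes that were wrongly decoded as Latin-1. The function name is hypothetical.

```python
def fix_latin1_mojibake(text):
    """Undo one round of UTF-8-read-as-Latin-1 mojibake, if present.

    Returns the input unchanged when it cannot be round-tripped, which is
    the case for text that was already decoded correctly.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside Latin-1, or the bytes
        # are not valid UTF-8: assume the text was fine as it was.
        return text
```

For example, `fix_latin1_mojibake("MÃ¼nchen")` recovers `"München"`, while correctly decoded strings such as `"München"` or `"Київ"` pass through untouched.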
Al
7ec368542b
[formatting] giving single hyphens the separator tag
2016-12-26 21:00:25 -05:00
Al
d208397ecb
[addresses] checking if component is generated in combining fields
2016-12-26 16:58:10 -05:00
Al
654fc2c463
[fix] memory cleanup in address_parser_data_set, logging any bad input lines
2016-12-26 16:18:15 -05:00
Al
e6d7b09e08
[expansions] adding generated expansion data
2016-12-26 16:16:59 -05:00
Al
4cdd245dc2
[logging] log error in address_dictionary_get_expansions
2016-12-26 16:16:26 -05:00
Al Barrentine
46a1be3443
Merge pull request #144 from bradh/utcdateref
...
[fix] Use UTC date reference to avoid repeating S3 downloads.
2016-12-26 14:04:41 -05:00