Al | f6ca495fc8 | [polygons/zones] Adding a polygon reader for OSM zones (named residential/commercial/industrial/military areas), which are closed ways and can be used in addresses, e.g. in office parks, larger housing complexes, etc. | 2016-03-26 23:16:57 -04:00
|
Al | e7798c1aca | [categories] documentation of Nominatim special phrases scraper | 2016-03-26 23:12:17 -04:00
|
Al | 39f1d27ced | [osm] Building OSM file for deriving category queries, zone data for including the names of residential, commercial and industrial areas in the parser. Named landuse and historic features are considered valid places/venues. | 2016-03-26 23:08:24 -04:00
|
Al | 9c14d207aa | [fix] Adding islands to admin borders | 2016-03-24 02:12:07 -04:00
|
Al | 95934ab213 | [numex] Moving numex script to a different subpackage, adding function for creating ordinals | 2016-03-18 20:36:22 -04:00
|
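The numex commit above mentions a function for creating ordinals. A minimal, English-only sketch of what such a helper does is below. The name `ordinal_en` is hypothetical, and a multilingual version would need per-language suffix rules, which this sketch does not attempt.

```python
def ordinal_en(n: int) -> str:
    """Return the English ordinal form of a non-negative integer, e.g. 1 -> '1st'."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # 11, 12 and 13 (and 111, 112, 113, ...) always take "th"
    if n % 100 in (11, 12, 13):
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


if __name__ == "__main__":
    print([ordinal_en(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 22, 101, 111)])
```
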
Al | 7c48ab3034 | [categories] Scraper for Nominatim Special Phrases, translated into a number of languages | 2016-03-18 17:52:33 -04:00
|
Al | 7696179843 | [osm] Removing generic amenities like ATMs, parking, restrooms, etc. from addresses but keeping them in venues to support generic queries | 2016-03-14 01:07:03 -04:00
|
Al | 18e2c7519e | [fix] Absolute dir check in generating expansion data files | 2016-03-13 23:23:46 -04:00
|
Al | c5498c6c0c | [osm] Incorporating airports, and only including certain values for tourism= and leisure= since not all are physical place types, adding building= to addresses | 2016-03-12 15:02:31 -05:00
|
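Commit c5498c6c0c restricts `tourism=` and `leisure=` to values that describe physical places, adds airports, and adds `building=`. A rough sketch of that kind of tag whitelist follows. The value lists and the function name `is_place_like` are illustrative assumptions, not the lists the project actually used (a71fa7bd8d mentions deriving its `tourism=` list from taginfo counts).

```python
from typing import Dict

# Illustrative whitelists (assumptions, not the project's actual lists)
ALLOWED_VALUES = {
    "tourism": {"hotel", "hostel", "museum", "zoo", "theme_park", "attraction"},
    "leisure": {"park", "garden", "stadium", "sports_centre", "marina"},
    "aeroway": {"aerodrome"},  # airports
}
# Keys for which any value other than "no" (or empty) is accepted
ANY_VALUE_KEYS = {"building"}


def is_place_like(tags: Dict[str, str]) -> bool:
    """Return True if an OSM element's tags suggest a physical place or venue."""
    for key, value in tags.items():
        if key in ANY_VALUE_KEYS and value not in ("no", ""):
            return True
        if value in ALLOWED_VALUES.get(key, ()):
            return True
    return False
```

Keeping the whitelist as data rather than code makes it easy to regenerate from usage statistics.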
Al Barrentine | 942e5df1b9 | Merge pull request #40 from thatdatabaseguy/master: Including landmarks + more venues in OSM training data | 2016-03-11 16:47:11 -05:00
|
Al | 7a24ced43c | [fix] longitude validation | 2016-03-11 16:35:33 -05:00
|
Al | 99f452c7b1 | [geo] Validate lat/lon in latlon_to_decimal | 2016-03-11 16:18:31 -05:00
|
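Commits 99f452c7b1 and a2f186a0ee add lat/lon validation. Below is a minimal sketch, assuming `latlon_to_decimal` takes decimal-degree values (numbers or strings) and returns `(None, None)` on failure. The real function's signature and its handling of degree/minute/second strings are not shown in this log, so both are assumptions here.

```python
def is_valid_latitude(lat: float) -> bool:
    # Latitude must lie in [-90, 90]; NaN fails both comparisons
    return -90.0 <= lat <= 90.0


def is_valid_longitude(lon: float) -> bool:
    # Longitude must lie in [-180, 180]
    return -180.0 <= lon <= 180.0


def latlon_to_decimal(lat, lon):
    """Parse decimal-degree inputs (numbers or strings) and validate them.

    Returns a (lat, lon) tuple of floats, or (None, None) if either value
    cannot be parsed or is out of range. Degree/minute/second strings are
    not handled in this sketch.
    """
    try:
        lat_f, lon_f = float(lat), float(lon)
    except (TypeError, ValueError):
        return None, None
    if not (is_valid_latitude(lat_f) and is_valid_longitude(lon_f)):
        return None, None
    return lat_f, lon_f
```
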
Al | a2f186a0ee | [geo] Adding lat/lon validation functions for the training scripts | 2016-03-11 14:09:10 -05:00
|
Al | f7d6943994 | [fix] no comma in download_quattroshapes filenames | 2016-03-10 23:40:54 -05:00
|
Al | a71fa7bd8d | [osm] tourism= keys should only be included in some cases. Listing everything on taginfo with >= 100 uses | 2016-03-10 14:17:38 -05:00
|
Al | d43fe201ff | [osm] No longer requiring street name in OSM planet addresses. Adding leisure and tourism keys to capture things like parks, squares, etc. Adding place=locality for neighborhoods. | 2016-03-09 18:19:33 -05:00
|
Al | 1003832b9c | [fix] README should not be included in building address dictionaries | 2016-03-09 11:18:19 -05:00
|
Al | 08085ee08b | [languages][ci skip] Checking in script to extract address phrases in various languages using frequent itemsets | 2016-03-08 14:35:20 -05:00
|
Al | a483fd5d42 | [fix][ci skip] pip installing some light requirements when the dictionaries/numex files change. Only building transliteration if the data file changed (the CLDR files are not in-repo, so they will be built offline) | 2016-03-04 16:17:05 -05:00
|
Al | 52ebc9fc46 | [fix] Paths relative to the current file in address_dictionaries.py so it can be run from anywhere | 2016-02-24 13:10:44 -05:00
|
Al | 393fd7e0f3 | [build] Using env var for data dir in geodata build script | 2016-02-08 01:11:42 -05:00
|
Al | b4dcb83e10 | [fix] sets of potential languages in case a phrase matches multiple dictionaries | 2016-01-24 17:57:12 -05:00
|
Al | b713d102d1 | [languages] using whole phrase len, not first token, in disambiguation. Using single unambiguous observed default language or unambiguous observed language | 2016-01-24 17:43:14 -05:00
|
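Several commits from 2016-01-24 describe rules for choosing a language from dictionary phrase matches. These include keeping a set of candidate languages per phrase, measuring the whole phrase rather than its first token, and ignoring single characters such as the "A" in "Avenue A". They also fall back to a single default language when only ambiguous phrases are seen. The sketch below combines those rules under assumed inputs. `disambiguate_language` and its parameters are hypothetical, and this is not the project's implementation.

```python
from typing import Iterable, Optional, Set, Tuple


def disambiguate_language(
    phrase_matches: Iterable[Tuple[str, Set[str]]],
    candidate_languages: Set[str],
    default_languages: Set[str],
) -> Optional[str]:
    """Choose one language for an address, or return None if it stays ambiguous.

    phrase_matches: (matched phrase, set of languages whose dictionaries contain it)
    candidate_languages: languages spoken in the region
    default_languages: the region's default language(s)
    """
    if len(candidate_languages) == 1:
        return next(iter(candidate_languages))

    observed: Set[str] = set()   # languages seen in unambiguous phrases
    ambiguous: Set[str] = set()  # languages seen in ambiguous phrases
    for phrase, langs in phrase_matches:
        langs = langs & candidate_languages
        # Length is measured on the whole phrase; a single character is never enough
        if not langs or len(phrase) <= 1:
            continue
        if len(langs) == 1:
            observed |= langs
        else:
            ambiguous |= langs

    if len(observed) == 1:
        return next(iter(observed))
    # Only ambiguous evidence: assume the region's single default language
    if not observed and len(default_languages) == 1:
        default = next(iter(default_languages))
        if default in ambiguous:
            return default
    return None
```
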
Al | b3e730d83f | [languages] If there's a single default language, assume ambiguous abbreviations are the default | 2016-01-24 17:15:02 -05:00
|
Al | fffaeecfc6 | [languages] Only count regional defaults when returning languages | 2016-01-24 16:35:14 -05:00
|
Al | f8a0463aa0 | [languages] Language disambiguation treats the national languages as non-default | 2016-01-24 15:10:04 -05:00
|
Al | f04360732c | [languages] Single character cannot be sufficient to disambiguate with multiple languages (Avenue A for example) | 2016-01-24 03:17:21 -05:00
|
Al | 00ce71223f | [osm] Using the default probabilities for abbreviations in ways training data | 2016-01-24 00:53:41 -05:00
|
Al | bab7a0f961 | [osm] splitting streets (way names) on semicolons | 2016-01-24 00:42:25 -05:00
|
Al | 3485738c2b | [fix] regional languages in French Canada | 2016-01-24 00:20:34 -05:00
|
Al | 7646adfc0f | [osm] Adding abbreviated street names in addition to the originals | 2016-01-23 23:23:58 -05:00
|
Al | 67130383ce | [fix] converting semicolons to commas in OSM house numbers and picking one at random | 2016-01-23 23:16:19 -05:00
|
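Commit 67130383ce handles OSM house numbers that contain several values (OSM `addr:housenumber` tags sometimes look like "12;14"). One reading of that message is sketched below with a hypothetical `pick_house_number` helper: treat semicolons as commas, split, and pick one value at random. Passing a seeded `random.Random` makes the choice reproducible in tests.

```python
import random
from typing import Optional


def pick_house_number(value: str, rng: Optional[random.Random] = None) -> Optional[str]:
    """Normalize ';'-separated house numbers and choose one at random."""
    rng = rng or random.Random()
    # Treat semicolons like commas, then split into individual candidates
    parts = [part.strip() for part in value.replace(";", ",").split(",")]
    candidates = [p for p in parts if p]
    if not candidates:
        return None
    return rng.choice(candidates)
```
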
Al | 1bb797f783 | [fix] spacing in phrases | 2016-01-23 21:59:49 -05:00
|
Al | 3a8c3dfcf6 | [fix] spacing in phrases at end of string | 2016-01-23 21:51:40 -05:00
|
Al | 78450bfad9 | [fix] Spaces in abbreviation | 2016-01-23 21:36:20 -05:00
|
Al | 308ceb5a5f | [fix] convert UTF8 slices back to unicode before using with the Python trie | 2016-01-23 20:20:23 -05:00
|
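Commit 308ceb5a5f deals with offsets into UTF-8 data. Assume a tokenizer reports token positions as byte offsets into the UTF-8 encoding. Slicing the decoded string with those numbers then goes wrong as soon as a multibyte character appears. The sketch below slices the bytes first and decodes the result. It uses a hypothetical `utf8_slice` and is written with Python 3 `str`/`bytes`.

```python
def utf8_slice(text: str, byte_offset: int, byte_len: int) -> str:
    """Return the part of `text` covered by a byte range of its UTF-8 encoding.

    The range must fall on character boundaries, otherwise decode() raises
    UnicodeDecodeError.
    """
    encoded = text.encode("utf-8")
    return encoded[byte_offset:byte_offset + byte_len].decode("utf-8")


if __name__ == "__main__":
    s = "Straße 5"
    # "Straße" is 6 characters but 7 bytes in UTF-8, because "ß" takes 2 bytes
    print(utf8_slice(s, 0, 7))  # prints: Straße
    print(repr(s[0:7]))         # prints: 'Straße ' (character slicing overshoots)
```
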
Al | 5eb6bb309b | [fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string | 2016-01-23 20:09:45 -05:00
|
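Commit 5eb6bb309b only re-inserts whitespace between tokens when the original string had it. Here is a minimal sketch, assuming tokens are given as `(start, length)` character spans into the original string. `rejoin_tokens` is a hypothetical name.

```python
from typing import List, Tuple


def rejoin_tokens(original: str, spans: List[Tuple[int, int]], texts: List[str]) -> str:
    """Rebuild a string from (possibly abbreviated) tokens.

    spans: (start, length) character span of each token in `original`
    texts: the text to emit for each token, e.g. "St" in place of "Street"
    A single space is emitted between two tokens only if the original string
    had whitespace between them.
    """
    out: List[str] = []
    for i, ((start, _length), text) in enumerate(zip(spans, texts)):
        if i > 0:
            prev_start, prev_len = spans[i - 1]
            gap = original[prev_start + prev_len:start]
            if gap.isspace():  # "".isspace() is False, so adjacent tokens stay joined
                out.append(" ")
        out.append(text)
    return "".join(out)


if __name__ == "__main__":
    s = "Main Street, Apt 5"
    spans = [(0, 4), (5, 6), (11, 1), (13, 3), (17, 1)]
    print(rejoin_tokens(s, spans, ["Main", "St", ",", "Apt", "5"]))
```

Take "Main Street, Apt 5" with "Street" replaced by "St". This yields "Main St, Apt 5". No space is inserted before the comma because the original had none there.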
Al | d61207e95a | [fix] var name | 2016-01-23 18:01:02 -05:00
|
Al | e44cba1d06 | [fix] geonames db not required in OSM training data | 2016-01-23 17:59:55 -05:00
|
Al | 4f03711e60 | [osm] Adding abbreviated training examples to ways language training data | 2016-01-23 14:10:47 -05:00
|
Al | c9fb4ee69d | [osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used | 2016-01-22 17:58:24 -05:00
|
Al | ea9bb3f2d5 | [fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled | 2016-01-22 15:48:21 -05:00
|
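Commit ea9bb3f2d5 makes the abbreviation probability apply once rather than once per matching dictionary, and removes doubled abbreviations. The sketch below pools and de-duplicates the options first, then draws once. It uses a hypothetical `maybe_abbreviate`, with dictionaries modeled as plain `dict`s from phrase to abbreviations.

```python
import random
from typing import Dict, List, Optional, Sequence


def maybe_abbreviate(
    phrase: str,
    dictionaries: Sequence[Dict[str, List[str]]],
    p_abbreviate: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Abbreviate `phrase` with probability p_abbreviate, drawing once overall."""
    rng = rng or random.Random()
    # Pool options from every dictionary; the set removes doubled entries
    options = sorted({abbr for d in dictionaries for abbr in d.get(phrase, [])})
    if options and rng.random() < p_abbreviate:
        return rng.choice(options)
    return phrase
```

Suppose instead that each dictionary rolled its own die. A phrase present in k dictionaries would then be abbreviated with probability 1 - (1 - p)^k instead of p.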
Al | f9f6558e06 | [fix] simple whitespace field splits for the limited format training data (used for language classification) | 2016-01-22 04:34:42 -05:00
|
Al | cd1db7b288 | [fix] Making sure rare components are dropped first, adding state and country back in | 2016-01-22 04:17:19 -05:00
|
Al | adc3a00264 | [fix] var name | 2016-01-22 04:10:16 -05:00
|
Al | 261beffa36 | [fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities | 2016-01-22 04:00:45 -05:00
|
Al | a6cc3d0114 | [fix] Adding state to the more frequently dropped components | 2016-01-22 03:56:38 -05:00
|
Al | bca3dae004 | [fix] state full name probabilities for limited vs. full formatted OSM training sets | 2016-01-22 03:54:20 -05:00
|
Al | d1cf253092 | [osm/formatting] Higher probability of dropout for rare components like counties, etc. | 2016-01-22 03:39:35 -05:00
|
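Commit d1cf253092 raises the dropout probability for rare components such as counties. A later commit, 261beffa36, takes state and country out of the rare set so they use the standard probability. The sketch below shows per-component dropout along those lines. The component names, the probabilities, and the `apply_dropout` helper are illustrative assumptions, not the project's values.

```python
import random
from typing import Dict, FrozenSet, Optional

# Illustrative values only (assumptions, not the project's settings)
RARE_COMPONENTS = frozenset({"county", "state_district"})
RARE_DROPOUT = 0.8      # rare components are dropped most of the time
STANDARD_DROPOUT = 0.3  # everything else, including state and country


def apply_dropout(
    components: Dict[str, str],
    rng: Optional[random.Random] = None,
    keep: FrozenSet[str] = frozenset({"house_number", "road"}),
) -> Dict[str, str]:
    """Return a copy of `components` with some fields randomly removed."""
    rng = rng or random.Random()
    result = {}
    for name, value in components.items():
        if name in keep:
            # Core fields are never dropped in this sketch
            result[name] = value
            continue
        p_drop = RARE_DROPOUT if name in RARE_COMPONENTS else STANDARD_DROPOUT
        if rng.random() >= p_drop:
            result[name] = value
    return result
```
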
Al | 9dd965a6fa | [fix] removing gazetteer configuration from disambiguation module | 2016-01-22 03:18:18 -05:00
|