Al
|
20567bf9a3
|
[polygons] Adding full quattroshapes-backed reverse geocoder to add to OSM training data
|
2015-10-12 15:37:21 -05:00 |
|
Al
|
1b2642fe58
|
[polygons] Adding ability to specify include properties by filename
|
2015-10-12 15:36:24 -05:00 |
|
Al
|
151161cab3
|
[fix] Raising error in geonames output if a country cannot be localized
|
2015-10-07 03:45:56 -04:00 |
|
Al
|
1917816b80
|
[countries] Not relying on pycountry alpha 2 codes for localized country names as it doesn't contain Kosovo which was causing problems
|
2015-10-07 03:44:49 -04:00 |
|
Al
|
cfa57c96a3
|
[fix] untagged formatted addresses
|
2015-10-04 02:02:59 -04:00 |
|
Al
|
5d2a24872a
|
[osm] Adding dependencies so single street names are not valid without at least one of {house, number, suburb, city, postcode}
|
2015-10-03 15:22:26 -04:00 |
|
Al
|
77be2fe433
|
[osm] Adjusting priors for country code expansion
|
2015-10-03 15:13:16 -04:00 |
|
Al
|
0b98a26426
|
[fix] keeping name tag in address components
|
2015-10-03 15:10:14 -04:00 |
|
Al
|
0f9ad259dc
|
[osm] Doing initial formatting after replacing country/state
|
2015-10-03 14:40:38 -04:00 |
|
Al
|
71233c9c02
|
[fix] import, initialization
|
2015-10-03 14:37:08 -04:00 |
|
Al
|
85b17d9b27
|
[fix] file encoding
|
2015-10-03 14:34:29 -04:00 |
|
Al
|
1948aa87ea
|
[fix] typo
|
2015-10-03 14:33:45 -04:00 |
|
Al
|
22efce7337
|
[osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input
|
2015-10-03 14:31:51 -04:00 |
|
Al
|
8920812055
|
[expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data
|
2015-10-03 14:25:30 -04:00 |
|
Al
|
7eb18f3538
|
[languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.)
|
2015-10-03 13:20:23 -04:00 |
|
Al
|
db71b65412
|
[fix] checking validity of component combination
|
2015-10-02 20:28:45 -04:00 |
|
Al
|
a2fd6e25f8
|
[fix] import
|
2015-10-02 20:25:48 -04:00 |
|
Al
|
49abb70b59
|
[fix] dictionary
|
2015-10-02 20:24:21 -04:00 |
|
Al
|
521f33d892
|
[fix] bitset for address components, only looking at valid component keys
|
2015-10-02 20:21:59 -04:00 |
|
Al
|
528285f735
|
[fix] only OSM tagged addresses need extra logic
|
2015-10-02 20:18:30 -04:00 |
|
Al
|
83aecb9f2c
|
[osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name)
|
2015-10-02 19:54:28 -04:00 |
|
Al
|
c790a2b87f
|
[fix] spoken/official
|
2015-10-02 19:50:11 -04:00 |
|
Al
|
db3364be30
|
[geonames] Using official country languages in GeoNames
|
2015-10-01 02:21:14 -04:00 |
|
Al
|
7dfbcce9ec
|
[languages] options for get_country_languages
|
2015-09-30 04:09:07 -04:00 |
|
Al
|
86e9166ae8
|
[doc] documentation for country_names module, fixing variable name
|
2015-09-30 03:08:04 -04:00 |
|
Al
|
42e77cb570
|
[countries] Making country official names align better with OSM/Wikipedia, plugging holes
|
2015-09-30 01:03:03 -04:00 |
|
Al
|
40cf247655
|
[formatting] Constants for field names, a few options in format_address
|
2015-09-29 23:03:37 -04:00 |
|
Al
|
22e8178a97
|
[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names
|
2015-09-29 21:10:38 -04:00 |
|
Al
|
daad1a1313
|
[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)
|
2015-09-28 17:46:53 -04:00 |
|
Al
|
f29f2f091b
|
[fix] PEBCAK
|
2015-09-27 22:49:27 -04:00 |
|
Al
|
93b3110a49
|
[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting
|
2015-09-27 19:25:34 -04:00 |
|
Al
|
d3bfaf6b43
|
[osm/formatting] Fixing formatting tagged addresses with comma separated fields
|
2015-09-27 03:19:23 -04:00 |
|
Al
|
d512201e2c
|
[fix] removing space from tokens in address formatting
|
2015-09-27 02:18:34 -04:00 |
|
Al
|
5b829cd5a7
|
[fix] blank values containing punctuation in formatting
|
2015-09-26 21:49:28 -04:00 |
|
Al
|
dac0440be8
|
[fix] rsplit
|
2015-09-26 21:07:54 -04:00 |
|
Al
|
ae93552455
|
[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue
|
2015-09-26 03:56:52 -04:00 |
|
Al
|
0c792a2cc3
|
[osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge
|
2015-09-26 03:21:26 -04:00 |
|
Al
|
5417b4e602
|
[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories
|
2015-09-25 23:59:38 -04:00 |
|
Al
|
8fe791a14a
|
[fix] ensure_dir in file downloads
|
2015-09-25 17:05:22 -04:00 |
|
Al
|
646b9f7248
|
[osm/formatting] Continuing to use openvenues formatter for the India fix
|
2015-09-25 13:36:24 -04:00 |
|
Al
|
9901dd2aac
|
[fix] Switching address formatter back to OpenCageData repo
|
2015-09-24 18:42:17 -04:00 |
|
Al
|
3ce1669c30
|
[fix] import
|
2015-09-24 01:25:00 -04:00 |
|
Al
|
c85ce0b11d
|
[osm/formatting] Tagging separators as well in tagged output of the address formatter
|
2015-09-24 01:22:49 -04:00 |
|
Al
|
abfb1d4a60
|
[transliteration] Wide char support in transliteration data generator
|
2015-09-23 03:56:12 -04:00 |
|
Al
|
7e057b0fb8
|
[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)
|
2015-09-23 00:42:54 -04:00 |
|
Al
|
8562c7a5cb
|
[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.
|
2015-09-23 00:37:59 -04:00 |
|
Al
|
13bcc35523
|
[unicode] Allowing wide chars in unicode properties
|
2015-09-23 00:34:07 -04:00 |
|
Al
|
b4593b6f88
|
[unicode/tokenization] Using new character classes including wide chars in scanner
|
2015-09-23 00:33:14 -04:00 |
|
Al
|
a76831df7a
|
[unicode] Wide version of word breaks
|
2015-09-22 18:55:33 -04:00 |
|
Al
|
25917cfb17
|
[fix] scripts
|
2015-09-22 15:15:30 -04:00 |
|