Al
|
85b17d9b27
|
[fix] file encoding
|
2015-10-03 14:34:29 -04:00 |
|
Al
|
1948aa87ea
|
[fix] typo
|
2015-10-03 14:33:45 -04:00 |
|
Al
|
22efce7337
|
[osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input
|
2015-10-03 14:31:51 -04:00 |
|
Al
|
8920812055
|
[expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data
|
2015-10-03 14:25:30 -04:00 |
|
Al
|
7eb18f3538
|
[languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.)
|
2015-10-03 13:20:23 -04:00 |
|
Al
|
db71b65412
|
[fix] checking validity of component combination
|
2015-10-02 20:28:45 -04:00 |
|
Al
|
a2fd6e25f8
|
[fix] import
|
2015-10-02 20:25:48 -04:00 |
|
Al
|
49abb70b59
|
[fix] dictionary
|
2015-10-02 20:24:21 -04:00 |
|
Al
|
521f33d892
|
[fix] bitset for address components, only looking at valid component keys
|
2015-10-02 20:21:59 -04:00 |
|
Al
|
528285f735
|
[fix] only OSM tagged addresses need extra logic
|
2015-10-02 20:18:30 -04:00 |
|
Al
|
83aecb9f2c
|
[osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name)
|
2015-10-02 19:54:28 -04:00 |
|
Al
|
c790a2b87f
|
[fix] spoken/official
|
2015-10-02 19:50:11 -04:00 |
|
Al
|
db3364be30
|
[geonames] Using official country languages in GeoNames
|
2015-10-01 02:21:14 -04:00 |
|
Al
|
7dfbcce9ec
|
[languages] options for get_country_languages
|
2015-09-30 04:09:07 -04:00 |
|
Al
|
86e9166ae8
|
[doc] documentation for country_names module, fixing variable name
|
2015-09-30 03:08:04 -04:00 |
|
Al
|
42e77cb570
|
[countries] Making country official names align better with OSM/Wikipedia, plugging holes
|
2015-09-30 01:03:03 -04:00 |
|
Al
|
40cf247655
|
[formatting] Constants for field names, a few options in format_address
|
2015-09-29 23:03:37 -04:00 |
|
Al
|
22e8178a97
|
[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names
|
2015-09-29 21:10:38 -04:00 |
|
Al
|
daad1a1313
|
[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)
|
2015-09-28 17:46:53 -04:00 |
|
Al
|
f29f2f091b
|
[fix] PEBCAK
|
2015-09-27 22:49:27 -04:00 |
|
Al
|
93b3110a49
|
[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting
|
2015-09-27 19:25:34 -04:00 |
|
Al
|
d3bfaf6b43
|
[osm/formatting] Fixing formatting tagged addresses with comma separated fields
|
2015-09-27 03:19:23 -04:00 |
|
Al
|
d512201e2c
|
[fix] removing space from tokens in address formatting
|
2015-09-27 02:18:34 -04:00 |
|
Al
|
5b829cd5a7
|
[fix] blank values containing punctuation in formatting
|
2015-09-26 21:49:28 -04:00 |
|
Al
|
dac0440be8
|
[fix] rsplit
|
2015-09-26 21:07:54 -04:00 |
|
Al
|
ae93552455
|
[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue
|
2015-09-26 03:56:52 -04:00 |
|
Al
|
0c792a2cc3
|
[osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge
|
2015-09-26 03:21:26 -04:00 |
|
Al
|
5417b4e602
|
[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories
|
2015-09-25 23:59:38 -04:00 |
|
Al
|
8fe791a14a
|
[fix] ensure_dir in file downloads
|
2015-09-25 17:05:22 -04:00 |
|
Al
|
646b9f7248
|
[osm/formatting] Continuing to use openvenues formatter for the India fix
|
2015-09-25 13:36:24 -04:00 |
|
Al
|
9901dd2aac
|
[fix] Switching address formatter back to OpenCageData repo
|
2015-09-24 18:42:17 -04:00 |
|
Al
|
3ce1669c30
|
[fix] import
|
2015-09-24 01:25:00 -04:00 |
|
Al
|
c85ce0b11d
|
[osm/formatting] Tagging separators as well in tagged output of the address formatter
|
2015-09-24 01:22:49 -04:00 |
|
Al
|
abfb1d4a60
|
[transliteration] Wide char support in transliteration data generator
|
2015-09-23 03:56:12 -04:00 |
|
Al
|
7e057b0fb8
|
[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)
|
2015-09-23 00:42:54 -04:00 |
|
Al
|
8562c7a5cb
|
[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.
|
2015-09-23 00:37:59 -04:00 |
|
Al
|
13bcc35523
|
[unicode] Allowing wide chars in unicode properties
|
2015-09-23 00:34:07 -04:00 |
|
Al
|
b4593b6f88
|
[unicode/tokenization] Using new character classes including wide chars in scanner
|
2015-09-23 00:33:14 -04:00 |
|
Al
|
a76831df7a
|
[unicode] Wide version of word breaks
|
2015-09-22 18:55:33 -04:00 |
|
Al
|
25917cfb17
|
[fix] scripts
|
2015-09-22 15:15:30 -04:00 |
|
Al
|
b405a53fe1
|
[fix] chars out of range in get_string_script Python version
|
2015-09-22 08:14:27 -04:00 |
|
Al
|
ca25b48687
|
[fix] Not writing empty fields in formatted addresses
|
2015-09-22 08:13:55 -04:00 |
|
Al
|
747de1944b
|
[fix] Accounting for unknown scripts in disambiguation
|
2015-09-21 18:05:28 -04:00 |
|
Al
|
134cf616d6
|
[osm] Using street for language disambiguation in training data
|
2015-09-21 04:09:15 -04:00 |
|
Al
|
84cf21df88
|
[osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples
|
2015-09-20 20:05:46 -04:00 |
|
Al
|
6731395ca0
|
[osm] Separating tagged from untagged output
|
2015-09-19 14:11:47 -04:00 |
|
Al
|
35f1c02caf
|
[polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately
|
2015-09-10 12:44:13 -07:00 |
|
Al
|
440a8158b6
|
[polygons] Adding in country languages for regional polygons without a default language
|
2015-09-10 12:34:26 -07:00 |
|
Al
|
fca7f21b1d
|
[polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class
|
2015-09-10 11:06:18 -07:00 |
|
Al
|
b85fe50fad
|
[osm] Training data for toponyms only cares about valid languages for name field
|
2015-09-08 16:38:05 -07:00 |
|