Al
|
42e77cb570
|
[countries] Making country official names align better with OSM/Wikipedia, plugging holes
|
2015-09-30 01:03:03 -04:00 |
|
Al
|
40cf247655
|
[formatting] Constants for field names, a few options in format_address
|
2015-09-29 23:03:37 -04:00 |
|
Al
|
22e8178a97
|
[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names
|
2015-09-29 21:10:38 -04:00 |
|
Al
|
daad1a1313
|
[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)
|
2015-09-28 17:46:53 -04:00 |
|
Al
|
f29f2f091b
|
[fix] PEBCAK
|
2015-09-27 22:49:27 -04:00 |
|
Al
|
93b3110a49
|
[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting
|
2015-09-27 19:25:34 -04:00 |
|
Al
|
d3bfaf6b43
|
[osm/formatting] Fixing formatting tagged addresses with comma separated fields
|
2015-09-27 03:19:23 -04:00 |
|
Al
|
d512201e2c
|
[fix] removing space from tokens in address formatting
|
2015-09-27 02:18:34 -04:00 |
|
Al
|
5b829cd5a7
|
[fix] blank values containing punctuation in formatting
|
2015-09-26 21:49:28 -04:00 |
|
Al
|
dac0440be8
|
[fix] rsplit
|
2015-09-26 21:07:54 -04:00 |
|
Al
|
ae93552455
|
[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue
|
2015-09-26 03:56:52 -04:00 |
|
Al
|
0c792a2cc3
|
[osm/formatting] Changing the way the formatter elimiates inter-component separators, changing repo back to OpenCageData after pull request merge
|
2015-09-26 03:21:26 -04:00 |
|
Al
|
5417b4e602
|
[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories
|
2015-09-25 23:59:38 -04:00 |
|
Al
|
8fe791a14a
|
[fix] ensure_dir in file downloads
|
2015-09-25 17:05:22 -04:00 |
|
Al
|
646b9f7248
|
[osm/formatting] Continuing to use openvenues formatter for the India fix
|
2015-09-25 13:36:24 -04:00 |
|
Al
|
9901dd2aac
|
[fix] Switching address formatter back to OpenCageData repo
|
2015-09-24 18:42:17 -04:00 |
|
Al
|
3ce1669c30
|
[fix] import
|
2015-09-24 01:25:00 -04:00 |
|
Al
|
c85ce0b11d
|
[osm/formatting] Tagging separators as well in tagged output of the address formatter
|
2015-09-24 01:22:49 -04:00 |
|
Al
|
abfb1d4a60
|
[transliteration] Wide char support in transliteration data generator
|
2015-09-23 03:56:12 -04:00 |
|
Al
|
7e057b0fb8
|
[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)
|
2015-09-23 00:42:54 -04:00 |
|
Al
|
8562c7a5cb
|
[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrllic name (Latin name), the segmentation is done at the space before the paren.
|
2015-09-23 00:37:59 -04:00 |
|
Al
|
13bcc35523
|
[unicode] Allowing wide chars in unicode properties
|
2015-09-23 00:34:07 -04:00 |
|
Al
|
b4593b6f88
|
[unicode/tokenization] Using new character classes including wide chars in scanner
|
2015-09-23 00:33:14 -04:00 |
|
Al
|
a76831df7a
|
[unicode] Wide version of word breaks
|
2015-09-22 18:55:33 -04:00 |
|
Al
|
25917cfb17
|
[fix] scripts
|
2015-09-22 15:15:30 -04:00 |
|
Al
|
b405a53fe1
|
[fix] chars out of range in get_string_script Python version
|
2015-09-22 08:14:27 -04:00 |
|
Al
|
ca25b48687
|
[fix] Not writing empty fields in formatted addresses
|
2015-09-22 08:13:55 -04:00 |
|
Al
|
747de1944b
|
[fix] Accounting for unknown scripts in disambiguation
|
2015-09-21 18:05:28 -04:00 |
|
Al
|
134cf616d6
|
[osm] Using street for language disambiguation in training data
|
2015-09-21 04:09:15 -04:00 |
|
Al
|
84cf21df88
|
[osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples
|
2015-09-20 20:05:46 -04:00 |
|
Al
|
6731395ca0
|
[osm] Separating tagged from untagged output
|
2015-09-19 14:11:47 -04:00 |
|
Al
|
35f1c02caf
|
[polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately
|
2015-09-10 12:44:13 -07:00 |
|
Al
|
440a8158b6
|
[polygons] Adding in country languages for regional polygons without a default language
|
2015-09-10 12:34:26 -07:00 |
|
Al
|
fca7f21b1d
|
[polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class
|
2015-09-10 11:06:18 -07:00 |
|
Al
|
b85fe50fad
|
[osm] Training data for toponyms only cares about valid languages for name field
|
2015-09-08 16:38:05 -07:00 |
|
Al
|
e566063343
|
[osm] Doing an all-to-nodes conversion and an additional filter on the borders data set
|
2015-09-08 09:18:08 -07:00 |
|
Al
|
8525529968
|
[osm] Not requiring qualified name tags to process OSM toponyms
|
2015-09-06 21:03:01 -07:00 |
|
Al
|
df20e2cbc0
|
[osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language
|
2015-09-04 14:13:33 -04:00 |
|
Al
|
17fcfa8b59
|
[fix] adding house to ignore keys rather than aliasing it
|
2015-09-04 12:40:08 -04:00 |
|
Al
|
d64a27bc57
|
[osm] Converting relations to nodes in borders training data
|
2015-09-04 12:32:25 -04:00 |
|
Al
|
168b7f59da
|
[fix] default indices in strip_component
|
2015-09-04 12:29:47 -04:00 |
|
Al
|
64db63e3eb
|
[osm] Removing house tag
|
2015-09-04 12:23:47 -04:00 |
|
Al
|
6a20ce5e85
|
[language_id] Adding formatted addresses and toponyms to language training data
|
2015-09-04 01:46:49 -04:00 |
|
Al
|
4ebdca0ea7
|
[fix] var
|
2015-09-03 21:01:20 -04:00 |
|
Al
|
8345afbcd0
|
[fix] exclude country toponyms where the default languages is well represented
|
2015-09-03 20:56:58 -04:00 |
|
Al
|
20bb191624
|
[fix] chaining
|
2015-09-03 20:52:00 -04:00 |
|
Al
|
e7cf5000fe
|
[fix] Exclude polygons with > 1 regional language
|
2015-09-03 20:48:04 -04:00 |
|
Al
|
9a9530c1b9
|
[fix] unqualified names
|
2015-09-03 20:37:22 -04:00 |
|
Al
|
a5fdd911d8
|
[fix] only use name key for default names
|
2015-09-03 20:35:08 -04:00 |
|
Al
|
d8e1432533
|
[osm] Adding unqualified names in single-language countries
|
2015-09-03 20:31:49 -04:00 |
|