Al | 9901dd2aac | [fix] Switching address formatter back to OpenCageData repo | 2015-09-24 18:42:17 -04:00 |
Al | 3ce1669c30 | [fix] import | 2015-09-24 01:25:00 -04:00 |
Al | c85ce0b11d | [osm/formatting] Tagging separators as well in tagged output of the address formatter | 2015-09-24 01:22:49 -04:00 |
Al | abfb1d4a60 | [transliteration] Wide char support in transliteration data generator | 2015-09-23 03:56:12 -04:00 |
Al | 7e057b0fb8 | [utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration) | 2015-09-23 00:42:54 -04:00 |
Al | 8562c7a5cb | [unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren. | 2015-09-23 00:37:59 -04:00 |
Al | 13bcc35523 | [unicode] Allowing wide chars in unicode properties | 2015-09-23 00:34:07 -04:00 |
Al | b4593b6f88 | [unicode/tokenization] Using new character classes including wide chars in scanner | 2015-09-23 00:33:14 -04:00 |
Al | a76831df7a | [unicode] Wide version of word breaks | 2015-09-22 18:55:33 -04:00 |
Al | 25917cfb17 | [fix] scripts | 2015-09-22 15:15:30 -04:00 |
Al | b405a53fe1 | [fix] chars out of range in get_string_script Python version | 2015-09-22 08:14:27 -04:00 |
Al | ca25b48687 | [fix] Not writing empty fields in formatted addresses | 2015-09-22 08:13:55 -04:00 |
Al | 747de1944b | [fix] Accounting for unknown scripts in disambiguation | 2015-09-21 18:05:28 -04:00 |
Al | 134cf616d6 | [osm] Using street for language disambiguation in training data | 2015-09-21 04:09:15 -04:00 |
Al | 84cf21df88 | [osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples | 2015-09-20 20:05:46 -04:00 |
Al | 6731395ca0 | [osm] Separating tagged from untagged output | 2015-09-19 14:11:47 -04:00 |
Al | 35f1c02caf | [polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately | 2015-09-10 12:44:13 -07:00 |
Al | 440a8158b6 | [polygons] Adding in country languages for regional polygons without a default language | 2015-09-10 12:34:26 -07:00 |
Al | fca7f21b1d | [polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class | 2015-09-10 11:06:18 -07:00 |
Al | b85fe50fad | [osm] Training data for toponyms only cares about valid languages for name field | 2015-09-08 16:38:05 -07:00 |
Al | e566063343 | [osm] Doing an all-to-nodes conversion and an additional filter on the borders data set | 2015-09-08 09:18:08 -07:00 |
Al | 8525529968 | [osm] Not requiring qualified name tags to process OSM toponyms | 2015-09-06 21:03:01 -07:00 |
Al | df20e2cbc0 | [osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language | 2015-09-04 14:13:33 -04:00 |
Al | 17fcfa8b59 | [fix] adding house to ignore keys rather than aliasing it | 2015-09-04 12:40:08 -04:00 |
Al | d64a27bc57 | [osm] Converting relations to nodes in borders training data | 2015-09-04 12:32:25 -04:00 |
Al | 168b7f59da | [fix] default indices in strip_component | 2015-09-04 12:29:47 -04:00 |
Al | 64db63e3eb | [osm] Removing house tag | 2015-09-04 12:23:47 -04:00 |
Al | 6a20ce5e85 | [language_id] Adding formatted addresses and toponyms to language training data | 2015-09-04 01:46:49 -04:00 |
Al | 4ebdca0ea7 | [fix] var | 2015-09-03 21:01:20 -04:00 |
Al | 8345afbcd0 | [fix] exclude country toponyms where the default language is well represented | 2015-09-03 20:56:58 -04:00 |
Al | 20bb191624 | [fix] chaining | 2015-09-03 20:52:00 -04:00 |
Al | e7cf5000fe | [fix] Exclude polygons with > 1 regional language | 2015-09-03 20:48:04 -04:00 |
Al | 9a9530c1b9 | [fix] unqualified names | 2015-09-03 20:37:22 -04:00 |
Al | a5fdd911d8 | [fix] only use name key for default names | 2015-09-03 20:35:08 -04:00 |
Al | d8e1432533 | [osm] Adding unqualified names in single-language countries | 2015-09-03 20:31:49 -04:00 |
Al | b15d2d70aa | [fix] top language | 2015-09-03 20:09:46 -04:00 |
Al | 44bf94a158 | [osm] Better borders training data set (only need the metadata, not the polygons) | 2015-09-03 20:09:03 -04:00 |
Al | 55af9b0a0c | [fix] OSM address tagged training data formatting | 2015-09-03 18:35:19 -04:00 |
Al | c6bfc0e021 | [osm] Postponing punctuation stripping until after address template rendering | 2015-09-03 18:13:41 -04:00 |
Al | d54fb25e45 | [osm] don't bother with the R-tree check if there are no name:* tags in border data set | 2015-09-03 17:54:40 -04:00 |
Al | 33af61095b | [fix] var | 2015-09-03 17:49:52 -04:00 |
Al | 294101ad80 | [osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma) | 2015-09-03 17:46:57 -04:00 |
Al | e1e5c16637 | [osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity | 2015-09-03 16:50:30 -04:00 |
Al | 040a26a6f2 | [fix] import | 2015-09-03 13:54:23 -04:00 |
Al | 7787427c58 | [fix] typo | 2015-09-03 13:53:18 -04:00 |
Al | 23633e95dd | [osm] Only adding country default language toponyms to training data | 2015-09-03 13:44:41 -04:00 |
Al | 11c01f64d2 | [osm] OrderedDict of attrs in OSM training data | 2015-09-03 11:11:18 -04:00 |
Al | 27eb4e4aed | [osm] Adding a toponym language training set using planet-borders.osm (all admin borders) | 2015-09-03 10:19:11 -04:00 |
Al | db57855c95 | [osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added | 2015-09-03 10:06:54 -04:00 |
Al | a916668f28 | [i18n] Local file for ISO 15924 | 2015-09-01 23:58:36 -04:00 |