Al | 528285f735 | [fix] only OSM tagged addresses need extra logic | 2015-10-02 20:18:30 -04:00
Al | 83aecb9f2c | [osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name) | 2015-10-02 19:54:28 -04:00
Al | ca25b48687 | [fix] Not writing empty fields in formatted addresses | 2015-09-22 08:13:55 -04:00
Al | 134cf616d6 | [osm] Using street for language disambiguation in training data | 2015-09-21 04:09:15 -04:00
Al | 84cf21df88 | [osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples | 2015-09-20 20:05:46 -04:00
Al | 6731395ca0 | [osm] Separating tagged from untagged output | 2015-09-19 14:11:47 -04:00
Al | b85fe50fad | [osm] Training data for toponyms only cares about valid languages for name field | 2015-09-08 16:38:05 -07:00
Al | e566063343 | [osm] Doing an all-to-nodes conversion and an additional filter on the borders data set | 2015-09-08 09:18:08 -07:00
Al | 8525529968 | [osm] Not requiring qualified name tags to process OSM toponyms | 2015-09-06 21:03:01 -07:00
Al | df20e2cbc0 | [osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language | 2015-09-04 14:13:33 -04:00
Al | 17fcfa8b59 | [fix] adding house to ignore keys rather than aliasing it | 2015-09-04 12:40:08 -04:00
Al | d64a27bc57 | [osm] Converting relations to nodes in borders training data | 2015-09-04 12:32:25 -04:00
Al | 168b7f59da | [fix] default indices in strip_component | 2015-09-04 12:29:47 -04:00
Al | 64db63e3eb | [osm] Removing house tag | 2015-09-04 12:23:47 -04:00
Al | 4ebdca0ea7 | [fix] var | 2015-09-03 21:01:20 -04:00
Al | 8345afbcd0 | [fix] exclude country toponyms where the default languages is well represented | 2015-09-03 20:56:58 -04:00
Al | 20bb191624 | [fix] chaining | 2015-09-03 20:52:00 -04:00
Al | e7cf5000fe | [fix] Exclude polygons with > 1 regional language | 2015-09-03 20:48:04 -04:00
Al | 9a9530c1b9 | [fix] unqualified names | 2015-09-03 20:37:22 -04:00
Al | a5fdd911d8 | [fix] only use name key for default names | 2015-09-03 20:35:08 -04:00
Al | d8e1432533 | [osm] Adding unqualified names in single-language countries | 2015-09-03 20:31:49 -04:00
Al | b15d2d70aa | [fix] top language | 2015-09-03 20:09:46 -04:00
Al | 44bf94a158 | [osm] Better borders training data set (only need the metadata, not the polygons) | 2015-09-03 20:09:03 -04:00
Al | 55af9b0a0c | [fix] OSM address tagged training data formatting | 2015-09-03 18:35:19 -04:00
Al | c6bfc0e021 | [osm] Postponing punctuation stripping until after address template rendering | 2015-09-03 18:13:41 -04:00
Al | d54fb25e45 | [osm] don't bother with the R-tree check if there are no name:* tags in border data set | 2015-09-03 17:54:40 -04:00
Al | 33af61095b | [fix] var | 2015-09-03 17:49:52 -04:00
Al | 294101ad80 | [osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma) | 2015-09-03 17:46:57 -04:00
Al | e1e5c16637 | [osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity | 2015-09-03 16:50:30 -04:00
Al | 040a26a6f2 | [fix] import | 2015-09-03 13:54:23 -04:00
Al | 7787427c58 | [fix] typo | 2015-09-03 13:53:18 -04:00
Al | 23633e95dd | [osm] Only adding country default language toponyms to training data | 2015-09-03 13:44:41 -04:00
Al | 11c01f64d2 | [osm] OrderedDict of attrs in OSM training data | 2015-09-03 11:11:18 -04:00
Al | 27eb4e4aed | [osm] Adding a toponym language training set using planet-borders.osm (all admin borders) | 2015-09-03 10:19:11 -04:00
Al | db57855c95 | [osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added | 2015-09-03 10:06:54 -04:00
Al | a2ec8001b0 | [osm] Removing postal code keys in formatted language training data | 2015-08-24 14:08:36 -04:00
Al | e70c2453ee | [fix] import | 2015-08-22 15:04:30 -04:00
Al | 3902715258 | [osm] Some countries like Lebanon in OSM will list the same address under two languages (French/English), which creates an unreasonable task for a linear classifier, so running disambiguation in those cases | 2015-08-22 14:11:49 -04:00
Al | 4976be64e5 | [fix] var name | 2015-08-21 08:02:26 -04:00
Al | 8e56568cab | [fix] typo | 2015-08-21 08:01:49 -04:00
Al | ca6d802a43 | [languages] Moving language id methods into a separate package | 2015-08-21 08:00:56 -04:00
Al | 9d2f7e4bd1 | [fix] var name | 2015-08-18 16:20:12 -04:00
Al | 0528d1b578 | [osm] OSM untagged formatted addresses try to use language namespaced tags | 2015-08-18 16:18:27 -04:00
Al | c09cb4dd82 | [osm] OSM untagged formatted addresses now use the new language labeling scheme | 2015-08-18 15:13:10 -04:00
Al | 3daba2ddcd | [fix] removing debug print | 2015-08-18 13:22:48 -04:00
Al | ffe76f0403 | [languages/osm] Checking for existence of separable prefix/suffix in the given dictionaries | 2015-08-18 12:10:06 -04:00
Al | 0e00625dbd | [languages/osm] Adding a primitive phrase dictionary to the OSM training data construction script and a few heuristics to help disambiguate in the case of small local language groups that may not be specified with name:lang tags e.g. Occitan, Catalan, Basque, Galician, etc. Also throwing away ambiguous multilanguage names | 2015-08-18 11:12:27 -04:00
Al | 89071ea21a | [osm] Omitting country in limited address data set (often abbreviated, doesn't convey language as well) | 2015-08-15 03:25:45 -04:00
Al | c505260912 | [fix] var name | 2015-08-15 02:47:31 -04:00
Al | 548ce79b99 | [fix] street addresses by language | 2015-08-15 02:44:04 -04:00
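
Commit 83aecb9f2c describes randomly eliminating address components subject to constraints, for example that a house number cannot appear without a street name. The following is only a rough sketch of that idea, not the code from the commit: the function name `drop_components`, the `drop_prob` parameter, the field names `house_number` and `road`, and the fallback to the full address when everything is dropped are all assumptions made for illustration.

```python
import random


def drop_components(components, drop_prob=0.3, rng=None):
    """Return a copy of `components` with some fields randomly removed.

    Hypothetical helper: field names and the fallback rule are assumptions,
    not taken from the repository.
    """
    rng = rng or random

    # Keep each component independently with probability (1 - drop_prob).
    kept = {k: v for k, v in components.items() if rng.random() >= drop_prob}

    # Constraint from the commit message: a house number is not usable
    # without a street name, so drop it when the street was removed.
    if 'house_number' in kept and 'road' not in kept:
        del kept['house_number']

    # Assumed fallback: never emit an empty training example.
    if not kept:
        return dict(components)
    return kept


if __name__ == '__main__':
    address = {'house_number': '123', 'road': 'Main St', 'city': 'Brooklyn'}
    print(drop_components(address, drop_prob=0.5))
```

Applying the constraint after sampling, rather than sampling only from valid subsets, keeps the sketch short and makes it easy to add further rules of the same kind.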