Al
|
e70c2453ee
|
[fix] import
|
2015-08-22 15:04:30 -04:00 |
|
Al
|
3902715258
|
[osm] In some countries, such as Lebanon, OSM lists the same address in two languages (French/English), which is an unreasonable task for a linear classifier, so disambiguation is now run in those cases
|
2015-08-22 14:11:49 -04:00 |
|
Al
|
4976be64e5
|
[fix] var name
|
2015-08-21 08:02:26 -04:00 |
|
Al
|
8e56568cab
|
[fix] typo
|
2015-08-21 08:01:49 -04:00 |
|
Al
|
ca6d802a43
|
[languages] Moving language id methods into a separate package
|
2015-08-21 08:00:56 -04:00 |
|
Al
|
9d2f7e4bd1
|
[fix] var name
|
2015-08-18 16:20:12 -04:00 |
|
Al
|
0528d1b578
|
[osm] OSM untagged formatted addresses try to use language namespaced tags
|
2015-08-18 16:18:27 -04:00 |
|
Al
|
c09cb4dd82
|
[osm] OSM untagged formatted addresses now use the new language labeling scheme
|
2015-08-18 15:13:10 -04:00 |
|
Al
|
3daba2ddcd
|
[fix] removing debug print
|
2015-08-18 13:22:48 -04:00 |
|
Al
|
ffe76f0403
|
[languages/osm] Checking for existence of separable prefix/suffix in the given dictionaries
|
2015-08-18 12:10:06 -04:00 |
|
Al
|
0e00625dbd
|
[languages/osm] Adding a primitive phrase dictionary to the OSM training data construction script, plus a few heuristics to help disambiguate small local language groups that may not be specified with name:lang tags (e.g. Occitan, Catalan, Basque, Galician). Also throwing away ambiguous multilanguage names
|
2015-08-18 11:12:27 -04:00 |
|
Al
|
89071ea21a
|
[osm] Omitting country in limited address data set (often abbreviated, doesn't convey language as well)
|
2015-08-15 03:25:45 -04:00 |
|
Al
|
c505260912
|
[fix] var name
|
2015-08-15 02:47:31 -04:00 |
|
Al
|
548ce79b99
|
[fix] street addresses by language
|
2015-08-15 02:44:04 -04:00 |
|
Al
|
74a751ce0a
|
[osm] Adding a new OSM training data option for writing out full formatted addresses without place names
|
2015-08-15 02:39:49 -04:00 |
|
Al
|
0e92abd53e
|
[osm] Adding building tag to venues training set construction
|
2015-08-14 21:07:07 -04:00 |
|
Al
|
cad1f95bbb
|
[osm] Making minimal_only the default in formatted addresses, expanding list of acceptable combinations of address fields
|
2015-08-14 10:21:17 -04:00 |
|
Al
|
1e936ac9dc
|
[fix] road+house_number as minimal keys for formatting addresses
|
2015-08-14 04:09:51 -04:00 |
|
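The commit above names road + house_number as the minimal keys for a formatted address. The following is a minimal sketch of such a check, assuming address components arrive as a plain dict keyed by field name. `MINIMAL_KEYS`, `has_minimal_keys` and `filter_minimal` are hypothetical names, and the wider list of "acceptable combinations" from the later commit is not reproduced here because the log does not spell it out.

```python
# Hypothetical constant: the minimal keys named in the commit message.
MINIMAL_KEYS = frozenset({"road", "house_number"})


def has_minimal_keys(components, minimal_keys=MINIMAL_KEYS):
    """Return True if every minimal key is present with a non-blank value."""
    # `or ""` treats missing keys and None values the same way.
    return all(str(components.get(key) or "").strip() for key in minimal_keys)


def filter_minimal(addresses):
    """Keep only the address dicts that satisfy the minimal-key check."""
    return [address for address in addresses if has_minimal_keys(address)]
```

Usage: `filter_minimal([{"road": "Main Street", "house_number": "12"}, {"road": "Main Street"}])` keeps only the first address.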
Al
|
83bbd67c9c
|
[fix] param
|
2015-08-14 00:57:17 -04:00 |
|
Al
|
e993ddcb51
|
[fix] splitter
|
2015-08-14 00:54:06 -04:00 |
|
Al
|
dc2766ae5d
|
[fix] __init__
|
2015-08-14 00:49:06 -04:00 |
|
Al
|
62c67aa970
|
[osm] Using pipe splitter for address components
|
2015-08-14 00:45:49 -04:00 |
|
Al
|
2bd763be03
|
[osm] Prefer amenity tag, skip if the building tag is simply building=yes
|
2015-08-13 21:16:34 -04:00 |
|
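The rule in this commit (prefer the amenity tag, and ignore a building tag whose value is just `building=yes`) could look roughly like this. This is a sketch under the assumption that an element's OSM tags are available as a dict; `venue_category` is a hypothetical helper, not the script's actual function.

```python
def venue_category(tags):
    """Pick a venue label from an OSM tag dict.

    Prefers the amenity tag. Falls back to the building tag only when it
    carries a specific value, since building=yes says nothing about type.
    Returns a (key, value) tuple, or None if no usable label exists.
    """
    amenity = tags.get("amenity")
    if amenity:
        return ("amenity", amenity)
    building = tags.get("building")
    if building and building != "yes":
        return ("building", building)
    return None
```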
Al
|
c844d0484a
|
[fix] carriage returns
|
2015-08-13 21:07:12 -04:00 |
|
Al
|
ef14aa2b7e
|
[osm] Replacing escape chars at write time as there's no quoting, adding building key to venue training data
|
2015-08-13 19:30:44 -04:00 |
|
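Because the training files are written without any quoting, a stray tab or newline inside a field would corrupt the row structure, which is why the escape characters are replaced at write time. A minimal sketch of that idea follows, assuming tabs, carriage returns and newlines are simply replaced with spaces; `clean_field` and `write_tsv_row` are hypothetical names and are not the repository's `tsv_no_quote` writer itself.

```python
import io

# Hypothetical mapping: characters that would break an unquoted TSV row.
_TSV_ESCAPES = {"\t": " ", "\r": " ", "\n": " "}


def clean_field(value):
    """Replace tab/CR/LF with spaces so the field is safe without quoting."""
    text = str(value)
    for char, replacement in _TSV_ESCAPES.items():
        text = text.replace(char, replacement)
    return text


def write_tsv_row(out, fields):
    """Write one tab-separated, unquoted row, cleaning each field first."""
    out.write("\t".join(clean_field(field) for field in fields) + "\n")


buf = io.StringIO()
write_tsv_row(buf, ["fr", "Rue\tde la\nPaix"])
```

Replacing rather than backslash-escaping keeps each row readable by any plain split on `\t`, at the cost of losing the original whitespace characters.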
Al
|
46f2c68a69
|
[osm] Using tsv_no_quote writers in all OSM training data files
|
2015-08-13 18:40:41 -04:00 |
|
Al
|
cdb9afddd3
|
[fix] address training data carriage returns
|
2015-07-25 00:35:27 -04:00 |
|
Al
|
5cba747a93
|
[fix] variable name
|
2015-07-17 03:06:09 -04:00 |
|
Al
|
5e7bb54a5c
|
[polygons] only add language polygons if there's one default language
|
2015-07-17 02:19:55 -04:00 |
|
Al
|
d5ac816066
|
[fix] import
|
2015-07-16 13:33:50 -04:00 |
|
Al
|
8899be6eef
|
[osm] choosing the first default language for OSM training data, fixing way/relation offsets
|
2015-07-16 13:32:16 -04:00 |
|
Al
|
d57f9df7ed
|
[fix] regexes
|
2015-07-14 14:04:32 -04:00 |
|
Al
|
d494963dcd
|
[fix] lat/lon conversion in address formatting
|
2015-07-14 13:34:22 -04:00 |
|
Al
|
a0f2ff1e2a
|
[fix] adding encoding declaration
|
2015-07-13 21:09:18 -04:00 |
|
Al
|
d15737b319
|
[osm] Validating lat/lon in OSM training data
|
2015-07-13 21:08:08 -04:00 |
|
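Validating lat/lon here presumably means rejecting coordinates that do not parse or fall outside the valid degree ranges. A sketch under that assumption, using Python 3's `math.isfinite`; `parse_lat_lon` is a hypothetical helper and the exact checks in the original script may differ.

```python
import math


def parse_lat_lon(lat, lon):
    """Return (lat, lon) as floats if they are valid degrees, else None.

    Rejects values that do not parse as numbers, NaN/infinity, and values
    outside [-90, 90] for latitude or [-180, 180] for longitude.
    """
    try:
        lat_f, lon_f = float(lat), float(lon)
    except (TypeError, ValueError):
        return None
    if not (math.isfinite(lat_f) and math.isfinite(lon_f)):
        return None
    if not (-90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0):
        return None
    return lat_f, lon_f
```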
Al
|
0c18a57c4e
|
[fix] planet url no longer needed
|
2015-07-13 14:27:26 -04:00 |
|
Al
|
e8348dde0e
|
[osm] removing all the fetch/convert arguments from training data generator
|
2015-07-13 14:24:54 -04:00 |
|
Al
|
5e9e08f6b1
|
[fix] making fetch script executable
|
2015-07-13 14:19:24 -04:00 |
|
Al
|
465bcd46aa
|
[fix] input file in OSM training data generator
|
2015-07-13 14:18:24 -04:00 |
|
Al
|
961606ac12
|
[fix] removing intermediate file in OSM fetch
|
2015-07-13 14:17:57 -04:00 |
|
Al
|
59bf23ae67
|
[osm] Planet admin bounds filter
|
2015-07-13 04:08:55 -04:00 |
|
Al
|
ec1e820268
|
[parsing] Changing to OpenCageData repo
|
2015-07-09 13:44:14 -04:00 |
|
Al
|
cb2035867b
|
[fix] osm geodata imports
|
2015-06-15 18:36:01 -04:00 |
|
Al
|
22fa81b33f
|
[fix] __init__.py
|
2015-06-15 17:54:27 -04:00 |
|
Al
|
6c8e5b45a4
|
[fix] removing building alias (for OSM it means building category), fix to fetch script
|
2015-03-18 08:40:07 -04:00 |
|
Al
|
aeac0fe8c0
|
[geodata] Script to construct OSM training examples for building language dictionaries, disambiguating between abbreviations, classifying venues by type and formatting addresses for use in a sequence model with Lokku's address-formatting repo.
|
2015-03-17 18:11:07 -04:00 |
|
Al
|
0437271c92
|
[geodata] OSM planet fetch needs to convert ways/relations to nodes for all data sets
|
2015-03-17 16:51:17 -04:00 |
|
Al
|
621b25c964
|
[geodata] script to fetch/transform the OSM planet (needs about 100GB of free disk space) for training language models
|
2015-03-16 00:45:14 -04:00 |
|