Al
|
adc3a00264
|
[fix] var name
|
2016-01-22 04:10:16 -05:00 |
|
Al
|
261beffa36
|
[fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities
|
2016-01-22 04:00:45 -05:00 |
|
Al
|
a6cc3d0114
|
[fix] Adding state to the more frequently dropped components
|
2016-01-22 03:56:38 -05:00 |
|
Al
|
bca3dae004
|
[fix] state full name probabilities for limited vs. full formatted OSM training sets
|
2016-01-22 03:54:20 -05:00 |
|
Al
|
d1cf253092
|
[osm/formatting] Higher probability of dropout for rare components like counties, etc.
|
2016-01-22 03:39:35 -05:00 |
|
Al
|
9dd965a6fa
|
[fix] removing gazetteer configuration from disambiguation module
|
2016-01-22 03:18:18 -05:00 |
|
Al
|
b22646ee30
|
[mv] Moving gazetteers into their own module
|
2016-01-22 03:15:56 -05:00 |
|
Al
|
5a68e7aeef
|
[fix] import
|
2016-01-22 03:00:43 -05:00 |
|
Al
|
6ac72576bc
|
[osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK
|
2016-01-22 02:56:39 -05:00 |
|
Al
|
f4995d4f0f
|
[languages] Adding several different types of dictionaries for name expansion/abbreviation in OSM
|
2016-01-22 00:51:32 -05:00 |
|
Al
|
26cbb1eb8d
|
[languages] Fixing multiple expansions in the same dictionary for Python trie, adding length for prefixes/suffixes
|
2016-01-21 04:29:14 -05:00 |
|
Al
|
0269d92e3d
|
[languages] Adding canonical string and dictionary type to Python trie, modifying disambiguate_languages accordingly, and adding lists of alternate forms
|
2016-01-21 02:30:59 -05:00 |
|
Al
|
2e15db06dd
|
[text] making normalize_string directly callable from Python geodata
|
2016-01-21 02:07:46 -05:00 |
|
Al
|
71e01e6133
|
[fix] prefix/suffix phrase search in Python trie search
|
2016-01-19 03:43:54 -05:00 |
|
Al
|
39667b73a2
|
[build] std=gnu99 in geodata build
|
2016-01-19 03:23:56 -05:00 |
|
Al
|
8b94a018e6
|
[languages] encoding in language disambiguation
|
2016-01-19 03:22:03 -05:00 |
|
Al
|
3262d2ccd3
|
[fix] arg count
|
2016-01-19 03:16:14 -05:00 |
|
Al
|
fe8f3158f6
|
[fix] missing file in geodata
|
2016-01-17 22:23:44 -05:00 |
|
Al
|
5fd9dc7e2b
|
[scripts] relative dirs in setup.py for geodata
|
2016-01-17 22:22:50 -05:00 |
|
Al
|
da62ff309e
|
[transliteration] Fixing Malayalam script
|
2016-01-17 22:15:56 -05:00 |
|
Al
|
8030b235e6
|
[languages] Changing the definition in script languages so only languages that appear on street signs will be used
|
2016-01-17 22:03:41 -05:00 |
|
Al
|
3d7dd8966e
|
[languages] Using unicode script in language disambiguation in addition to dictionaries. Eliminating dependency on address_normalizer
|
2016-01-17 18:28:28 -05:00 |
|
Al
|
fa32eacdd1
|
[phrases] Adding Python phrase filter from address_normalizer until a Python wrapper around libpostal's trie_search is available
|
2016-01-17 15:45:02 -05:00 |
|
Al
|
f79a3c5bf4
|
[osm/polygons] Allowing polygons that GEOS claims are invalid in OSM polygon index (there were some glaring omissions from the index like the polygons for the UK or Berlin). For some reason .buffer(0) creates weird multipolygons that no longer contain their centroids, etc. and aren't useful in reverse geocoding
|
2016-01-17 15:43:21 -05:00 |
|
Al
|
04f251c1cc
|
[polygons] Don't call fix_polygon (force polygon validity) by default
|
2016-01-16 21:21:27 -05:00 |
|
Al
|
19a5541a85
|
[polygons/osm] append polygon nodes by vertices that connect to each other
|
2016-01-16 21:20:49 -05:00 |
|
Al
|
58e53cab1c
|
[scripts] Adding the tokenize/normalize wrappers directly into the internal geodata package so pypostal can be maintained in an independent repo
|
2016-01-12 13:29:31 -05:00 |
|
Al
|
e9e05bb929
|
[transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules
|
2015-12-23 13:07:44 -05:00 |
|
Al
|
e55ff54be1
|
[fix] Adding Korean-Latin-BGN to excluded transliterators
|
2015-12-21 16:24:50 -05:00 |
|
Al
|
682c316775
|
[transliteration] Removing Korean-Latin-BGN, not a great transliterator and AFAICT, ICU doesn't use it either
|
2015-12-21 12:45:45 -05:00 |
|
Al
|
ccf509edb1
|
[fix] update to control characters for generating the transliteration rules
|
2015-12-20 15:40:38 -05:00 |
|
Al
|
b2a944830a
|
[transliteration] Making sure the Python script to generate transliteration data works on the new CLDR format
|
2015-12-19 00:34:30 -05:00 |
|
Al
|
1d288954d7
|
[osm] Fixing an issue in the training data with house numbers in OSM (seen mostly in Uruguay) where a comma separated list of house numbers is entered.
|
2015-12-10 18:46:28 -05:00 |
|
Al
|
779298360c
|
[osm] In cases with more than one official language and where the address language can be determined, use it for looking up language-specific OSM polygons
|
2015-12-09 01:00:59 -05:00 |
|
Al
|
aeb72d7d26
|
[osm] Randomly select up to n components for state_district OSM boundaries. For all other fields select one name at random
|
2015-12-09 00:20:20 -05:00 |
|
Al
|
69a469d9d3
|
[osm] Choosing a language at random in countries with multilingual addresses for the parser training data so we get some monolingual examples
|
2015-12-08 20:38:32 -05:00 |
|
Al
|
35db855819
|
[fix] canonical index in address expansion data, should be -1 for all canonical phrases
|
2015-12-08 15:09:51 -05:00 |
|
Al
|
f8a3081d0f
|
[fix] city name in OSM formatting
|
2015-12-07 02:33:12 -05:00 |
|
Al
|
b25a738000
|
[osm] Doing more deduping in the OSM training data to avoid confusing the parser when city, state, district all have the same name
|
2015-12-06 16:14:02 -05:00 |
|
Al
|
dd8f8b4d7b
|
[fix] prefix/suffix regexes
|
2015-12-05 18:41:22 -05:00 |
|
Al
|
5fcb6d2c30
|
[fix] typo
|
2015-12-05 16:23:58 -05:00 |
|
Al
|
3a7ba0288f
|
[fix] .get
|
2015-12-05 16:13:15 -05:00 |
|
Al
|
c92a6de477
|
[fix] name
|
2015-12-05 15:49:50 -05:00 |
|
Al
|
2a4210f93f
|
[osm] Stripping standard city prefixes/suffixes e.g. Township of
|
2015-12-05 15:42:22 -05:00 |
|
Al
|
f41158b8b3
|
[osm] Avoid using the alternate name (e.g. Brooklyn instead of Kings County) when it is the same as city
|
2015-12-05 14:21:07 -05:00 |
|
Al
|
7c26317903
|
[fix] osm components
|
2015-12-03 19:30:15 -05:00 |
|
Al
|
42a8890652
|
[osm] Only removing local language city if there are prior components from OSM
|
2015-12-03 19:11:03 -05:00 |
|
Al
|
ab0a4e622d
|
[formatting] Switching back over to OpenCageData
|
2015-12-03 18:03:21 -05:00 |
|
Al
|
5af95ee613
|
[osm] Adding GeoNames abbreviated city names in a small percentage of cases to get variations like NYC, BK, SF, etc. in the training data
|
2015-12-03 18:00:05 -05:00 |
|
Al
|
218361f43f
|
[osm] Removing multilinestring boundaries from OSM polygon index (often partial boundaries e.g. France-Germany)
|
2015-12-03 00:51:09 -05:00 |
|