1125 Commits

Author SHA1 Message Date
Al
5af95ee613 [osm] Adding GeoNames abbreviated city names in a small percentage of cases to get variations like NYC, BK, SF, etc. in the training data 2015-12-03 18:00:05 -05:00
Al
25e89bcc41 [fix] tokenized trie search edge case where tail is stored on the space node 2015-12-03 12:25:21 -05:00
Al
218361f43f [osm] Removing multilinestring boundaries from OSM polygon index (often partial boundaries e.g. France-Germany) 2015-12-03 00:51:09 -05:00
Al
43287db90a [normalization/phrases] Fixing a bug which occurs with an already-separated elision 2015-12-02 16:04:39 -05:00
Al
87c04b4d37 [fix] path in setup.py 2015-12-02 14:22:11 -05:00
Al
09a3e2ab64 [fix] pip install command 2015-12-02 13:43:57 -05:00
Al
746b5d0f34 [fix] transliterate using string_equals 2015-12-02 13:09:43 -05:00
Al
d0aaff1482 [utils] string_equals with NULL check 2015-12-01 13:12:08 -05:00
Al
f322ae0a1c [build] adding shuffle.c to Makefile rule 2015-12-01 11:28:33 -05:00
Al
b94264b745 [parser] Forgot to add shuffle.h/.c 2015-12-01 11:25:28 -05:00
Al
116fe857db [parser] gshuf (Mac equivalent of shuf) is quite a bit slower than shuf, so removing it. Need to train on Linux unless a better alternative is found for shuffling large files on Mac 2015-12-01 11:24:44 -05:00
Al
8484d4fffd [fix] venue names should be removed probabilistically in the training data, giving neighborhoods a slightly better chance of being included 2015-11-30 23:28:12 -05:00
Al
6ef40c1769 [fix] dupe checking 2015-11-30 18:43:11 -05:00
Al
af170de019 [fix] Smaller probabilities on adding neighborhoods and admin polygons, eliminating duplicates on the row level 2015-11-30 18:35:31 -05:00
Al
621fd79002 [fix] var 2015-11-30 18:20:26 -05:00
Al
b430fb7657 [osm/formatting] Adding pick random name logic to neighborhoods as well, getting rid of drop probabilities as they're covered elsewhere, adding several forms of venue names to the training data 2015-11-30 18:10:18 -05:00
Al
d4b6450f19 [formatting] Not applying template replacements from address formatting by default 2015-11-30 16:11:13 -05:00
Al
839a12b212 [osm/formatting] Changing drop probabilities and doing it in random order 2015-11-30 15:27:35 -05:00
Al
5f13041140 [parsing/build] Makefile changes for address parser 2015-11-30 14:51:43 -05:00
Al
4ca911baf8 [parsing] Adding a command-line client (with history) to test address parsing 2015-11-30 14:51:01 -05:00
Al
89677d94a3 [parsing] Initial commit of the address parser, training/testing, feature function, I/O 2015-11-30 14:48:13 -05:00
Al
e62eb1e697 [math] Matrix file I/O 2015-11-30 12:53:18 -05:00
Al
5682c347ac [fix] close file handle 2015-11-30 12:51:13 -05:00
Al
9a8ba14887 [osm/formatting] Adding per-field drop probabilities to OSM training data to make some fields more likely to be dropped, although it might create more training data 2015-11-30 11:10:12 -05:00
Al
c8e4602d4c [fix] Neighborhoods reverse geocoder discriminates between OSM matched with Zetashapes and OSM matched with Quattroshapes 2015-11-30 10:59:50 -05:00
Al
feab77970b [cli] Adding antirez's linenoise for command-line interfaces 2015-11-29 11:28:31 -05:00
Al
15d9e00121 [osm/formatting] Adding in more ISO alpha-3 codes for countries in the training data 2015-11-28 14:08:07 -05:00
Al
d3040036ec [fix] moving separator definitions 2015-11-28 13:53:13 -05:00
Al
66778737ff [fix] non-local language states 2015-11-28 13:48:59 -05:00
Al
69ba631dc9 [docs] updating params in OSM training data docs 2015-11-28 01:09:14 -05:00
Al
3cd1fee89d [fix] KeyError 2015-11-27 14:40:11 -05:00
Al
a77bc03977 [fix] language 2015-11-27 14:24:32 -05:00
Al
38d4e2d67a [fix] cities 2015-11-27 14:05:53 -05:00
Al
3cf98770e3 [fix] var name 2015-11-27 13:54:38 -05:00
Al
2e0f35b13a [fix] key checks for Quattroshapes cities, removing city in non-local language case 2015-11-27 13:45:51 -05:00
Al
105ba313c5 [fix] var name 2015-11-27 12:00:11 -05:00
Al
3eea355352 [fix] argument order 2015-11-27 11:47:39 -05:00
Al
51f6a82727 [fix] import again 2015-11-27 11:38:40 -05:00
Al
644eeb74c6 [fix] import 2015-11-27 11:17:53 -05:00
Al
2830986073 [osm/formatting] Adding in cities from Quattroshapes/GeoNames in the case of non-local languages or in general with a small random probability 2015-11-27 11:09:12 -05:00
Al
b0667d0032 [fix] only care about levels in Quattroshapes index, not Zetashapes 2015-11-26 23:45:50 -05:00
Al
0eb0042826 [fix] Same in neighborhoods reverse geocoder lookups 2015-11-26 14:17:17 -05:00
Al
4170f6e9e3 [fix] same options for geohash-based index 2015-11-26 14:14:53 -05:00
Al
4cff1f8a9d [fix] Quattroshapes neighborhoods index uses geohashes for slightly better coverage 2015-11-26 12:45:54 -05:00
Al
98d8054a2b [polygons/quattroshapes] Converting Quattroshapes lookups to an R-tree index 2015-11-25 19:37:57 -05:00
Al
8a8e45f2a6 [fix] filenames 2015-11-25 18:08:04 -05:00
Al
bd88628a98 [polygons/quattroshapes] Removing local admin and neighborhoods from the Quattroshapes reverse geocoder since they're covered in neighborhoods 2015-11-25 18:06:14 -05:00
Al
40d18aa7f6 [polygons/osm] Switching back to buffer(0). Still destroys many polygons, may need to look into another solution 2015-11-25 17:10:50 -05:00
Al
a50c971732 [polygons/osm] Omitting last node in every way of a connected component since that node is equal to the start node of its neighbor 2015-11-25 17:09:19 -05:00
Al
d6d5eab989 [geonames] Adding ability to lookup GeoNames alternate names (may obtain IDs from Quattroshapes). Not great for local-language primary names (OSM remains the best) but decent for extracting foreign toponyms 2015-11-25 17:07:14 -05:00