69ba631dc9[docs] updating params in OSM training data docs
Al
2015-11-28 01:09:14 -05:00
3cd1fee89d[fix] KeyError
Al
2015-11-27 14:40:11 -05:00
a77bc03977[fix] language
Al
2015-11-27 14:24:32 -05:00
38d4e2d67a[fix] cities
Al
2015-11-27 14:05:53 -05:00
3cf98770e3[fix] var name
Al
2015-11-27 13:54:38 -05:00
2e0f35b13a[fix] key checks for Quattroshapes cities, removing city in non-local language case
Al
2015-11-27 13:45:51 -05:00
105ba313c5[fix] var name
Al
2015-11-27 12:00:11 -05:00
3eea355352[fix] argument order
Al
2015-11-27 11:47:39 -05:00
51f6a82727[fix] import again
Al
2015-11-27 11:38:40 -05:00
644eeb74c6[fix] import
Al
2015-11-27 11:17:53 -05:00
2830986073[osm/formatting] Adding in cities from Quattroshapes/GeoNames in the case of non-local languages or in general with a small random probability
Al
2015-11-27 11:09:12 -05:00
b0667d0032[fix] only care about levels in Quattroshapes index, not Zetashapes
Al
2015-11-26 23:45:50 -05:00
0eb0042826[fix] Same in neighborhoods reverse geocoder lookups
Al
2015-11-26 14:17:17 -05:00
4170f6e9e3[fix] same options for geohash-based index
Al
2015-11-26 14:14:53 -05:00
4cff1f8a9d[fix] Quattroshapes neighborhoods index uses geohashes for slightly better coverage
Al
2015-11-26 12:45:54 -05:00
98d8054a2b[polygons/quattroshapes] Converting Quattroshapes lookups to an R-tree index
Al
2015-11-25 19:37:57 -05:00
8a8e45f2a6[fix] filenames
Al
2015-11-25 18:08:04 -05:00
bd88628a98[polygons/quattroshapes] Removing local admin and neighborhoods from the Quattroshapes reverse geocoder since they're covered in neighborhoods
Al
2015-11-25 18:06:14 -05:00
40d18aa7f6[polygons/osm] Switching back to buffer(0). Still destroys many polygons, may need to look into another solution
Al
2015-11-25 17:10:50 -05:00
a50c971732[polygons/osm] Omitting last node in every way of a connected component since that node is equal to the start node of its neighbor
Al
2015-11-25 17:09:19 -05:00
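The node-dropping idea in the commit above can be sketched as follows. This is a minimal illustration, not the repository's code: `join_ways` is a hypothetical helper, and it assumes the ways are already ordered so that each one ends on the node where the next begins, with the last way ending on the first way's start node.

```python
def join_ways(ways):
    """Concatenate ordered ways into one ring of node IDs.

    Hypothetical helper: assumes each way is a list of node IDs and that
    way[i][-1] == way[i + 1][0], with the last way closing the ring.
    """
    ring = []
    for way in ways:
        # Drop the final node: it repeats as the first node of the next way.
        ring.extend(way[:-1])
    return ring


if __name__ == "__main__":
    print(join_ways([[1, 2, 3], [3, 4, 5], [5, 6, 1]]))
```

Without the `[:-1]` slice, every shared node would appear twice in the ring, producing zero-length segments in the resulting polygon.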
d6d5eab989[geonames] Adding ability to look up GeoNames alternate names (may obtain IDs from Quattroshapes). Not great for local-language primary names (OSM remains the best) but decent for extracting foreign toponyms
Al
2015-11-25 17:07:14 -05:00
3217fa39cd[fix] add country randomly in the formatted language training data in cases where country is not present
Al
2015-11-25 14:54:41 -05:00
1a6618957b[fix] Python float precision doesn't appear to be the problem
Al
2015-11-25 11:29:08 -05:00
5781813cbd[fix] For countries like Denmark, removing country with a smaller probability
Al
2015-11-25 00:39:52 -05:00
e4b8349d98[fix] sparsity of country tags should be enough for language address training data
Al
2015-11-25 00:32:01 -05:00
824c779107[fix] Cutting down on repeated training on country names
Al
2015-11-24 23:22:57 -05:00
88529d28e2[fix] country formatting in language address training data
Al
2015-11-24 23:20:31 -05:00
cd74fcda3c[fix] not requiring minimal keys in format language data
Al
2015-11-24 23:13:28 -05:00
e560e53308[fix] formatter
Al
2015-11-24 22:27:57 -05:00
8c422a6e61[osm] Adding new localized country names in language training data for formatted addresses
Al
2015-11-24 21:49:10 -05:00
e40ca0bb89[fix] Removing house numbers from formatted address language training data, using a simple whitespace splitter
Al
2015-11-24 21:15:22 -05:00
a92cbb8003[osm] Trying fixed-point precision in converting OSM coordinates to avoid issues with polygon self-intersection when the lines are very close together (e.g. parts of Berlin, UK country polygon)
Al
2015-11-24 15:13:14 -05:00
ef9c5c2ca1[fix] args
Al
2015-11-24 11:02:35 -05:00
e75c1ce860[fix] limited addresses
Al
2015-11-24 11:01:22 -05:00
94039f98ad[fix] argument validation in OSM training data script
Al
2015-11-24 10:59:16 -05:00
6d20d7348f[osm] Using OSM namespaced tags from polygons in the case of non-local languages
Al
2015-11-23 14:42:23 -05:00
e46e1a93a0[fix] ISO code and simple/international name checks should be on the polygons
Al
2015-11-23 14:30:38 -05:00
eb7488ab55[fix] Making country replacement probability independent of the probability used for local vs non-local languages
Al
2015-11-23 13:46:14 -05:00
f4f7cceba2[fix] var, non-local languages
Al
2015-11-23 12:51:26 -05:00
6aa640b5f0[fix] Moving is_in:country to lower priority
Al
2015-11-23 12:36:05 -05:00
2b1c346fde[osm] Using name:simple and int_name to capture more variations for US addresses, adding ISO codes occasionally instead of names
Al
2015-11-23 12:35:44 -05:00
f1b6620369[osm/formatting] replacing keys with the highest priority so addr:* tags take precedence over is_in:* tags
Al
2015-11-22 22:25:41 -05:00
2695b5dd26[osm] Shortening state names obtained from reverse geocoding for relevant countries
Al
2015-11-22 22:09:31 -05:00
8b035814c7[osm] Change probabilities for country names
Al
2015-11-22 18:52:17 -05:00
04183c672e[fix] non-integer admin levels
Al
2015-11-22 18:33:27 -05:00
7ee8045a0f[fix] comparison
Al
2015-11-22 18:27:05 -05:00
efa0e38e45[fix] another issue with tokenize API
Al
2015-11-22 18:08:45 -05:00
ce065bb9ec[fix] using new pypostal tokenize API
Al
2015-11-22 18:01:07 -05:00
71afcafe11[fix] key names
Al
2015-11-22 17:46:56 -05:00
f77ddc71e7[fix] reverting to old Rtree index filename
Al
2015-11-22 17:25:51 -05:00
ee482e7a07[fix] import
Al
2015-11-22 16:04:50 -05:00
ee75ffccd5[fix] import
Al
2015-11-22 15:51:13 -05:00
c6f531ca95[fix] arguments
Al
2015-11-22 15:35:25 -05:00
c851cf2547[fix] OSM R-tree
Al
2015-11-22 15:24:35 -05:00
d3703ce6b4[fix] var name
Al
2015-11-22 14:27:25 -05:00
422ea668d8[fix] import
Al
2015-11-22 14:23:09 -05:00
4f0d6fbf79[fix] default arg again
Al
2015-11-22 14:22:09 -05:00
4cc275e313[fix] doc and default arg
Al
2015-11-22 14:21:20 -05:00
c8f47b38a2[osm/formatting] Adding OSM polygon lookups and neighborhood polygon lookups to the training data in order to provide more variations for the model to work with
Al
2015-11-21 17:05:35 -05:00
9fc60600dd[fix] OSM reverse geocoder polygon ordering
Al
2015-11-20 14:49:37 -05:00
130518fe58[polygons] OSM reverse geocoder sort levels
Al
2015-11-20 13:52:30 -05:00
b948a8ebd8[osm] Adding global keys which map to OSM address components
Al
2015-11-20 12:48:54 -05:00
85667997cd[formatting] Adding city_district and state_district tags to address formatting templates where it makes sense. These will not be in all addresses, tags can be added and removed from the training data with certain probabilities
Al
2015-11-20 12:24:44 -05:00
470bd17c07[formatting] Adding configs for a few dozen countries mapping OSM admin level to an address formatter field
Al
2015-11-17 11:42:26 -05:00
946bce1cb9[osm] Adding a few more boundary types to planet admin borders
Al
2015-11-17 11:40:42 -05:00
b3ef8ded12[formatting] Adding OSM address components lookup by country
Al
2015-11-17 11:39:34 -05:00
0b74039a6a[formatting] Adding city_district as a separate format tag
Al
2015-11-17 11:38:38 -05:00
48a305c8c4[fix] Reverting last two changes, have to fix on the OSM side
Al
2015-11-01 16:23:41 -05:00
90773294b9[polygons] Only fixing polygons in cases with inner rings
Al
2015-11-01 12:36:35 -05:00
477300c061[polygons] Eliminating fix_polygon
Al
2015-11-01 03:12:58 -05:00
aba2c51e65[fix] name
Al
2015-11-01 01:35:01 -04:00
1dbfc6a87b[polygons/neighborhoods] Not counting local admin polys unless they match OSM, fix for Paris arrondissements
Al
2015-11-01 01:24:43 -04:00
4fdaef2638[fix] Don't need Quattroshapes dir for OSM Rtree
Al
2015-10-31 19:08:04 -04:00
e9a6ea1d72[fix] default index path
Al
2015-10-31 18:47:24 -04:00
4a35c50f92[fix] index paths
Al
2015-10-31 18:30:53 -04:00
c54fb412a3[fix] save_index
Al
2015-10-31 17:59:44 -04:00
0227d9335f[fix] Removing some debug code
Al
2015-10-31 14:50:32 -04:00
5882c2d64b[fix] command-line arg II
Al
2015-10-31 14:25:17 -04:00
66f8a2dc9e[fix] command-line arg
Al
2015-10-31 14:24:23 -04:00
e5d8812504[fix] argument default
Al
2015-10-31 14:23:37 -04:00
f39090869e[fix] imports
Al
2015-10-31 14:22:45 -04:00
8166cd66c8[fix] encoding yet again
Al
2015-10-31 14:19:50 -04:00
f473ff0dad[fix] encoding, different file
Al
2015-10-31 14:18:43 -04:00
a2eb40109c[fix] file encoding
Al
2015-10-31 14:17:26 -04:00
3e43ac7255[polygons/osm] Adding a unified neighborhood reverse geocoder incorporating Zetashapes, OSM and Quattroshapes. Uses the new Soft TFIDF implementation to approximately match OSM names to Quattroshapes/Zetashapes names and geohash indices for more coarse point-in-polygon tests (OSM neighborhoods are stored as points not polygons, so need to match with a geometry from the other sources)
Al
2015-10-31 14:15:39 -04:00
a38624ba59[similarity] Adding NameDeduper base class for deduping geographic names using the new Soft TFIDF similarity
Al
2015-10-30 15:56:23 -04:00
a5c1296044[similarity] Adding Jaccard similarity with word frequencies instead of simple sets, better for ideographic scripts (Han, Hangul, etc.) in the absence of word segmentation since there may be many high frequency characters
Al
2015-10-30 13:35:11 -04:00
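One common way to compute Jaccard similarity over frequencies rather than plain sets is the multiset form: sum of minimum counts over sum of maximum counts. The sketch below assumes that reading of the commit message; the function name `jaccard_counts` is illustrative and the commit's actual implementation is not shown here.

```python
from collections import Counter


def jaccard_counts(tokens1, tokens2):
    """Multiset Jaccard: sum of min counts divided by sum of max counts."""
    c1, c2 = Counter(tokens1), Counter(tokens2)
    intersection = sum((c1 & c2).values())  # element-wise minimum of counts
    union = sum((c1 | c2).values())         # element-wise maximum of counts
    return intersection / union if union else 0.0


if __name__ == "__main__":
    # Iterating a string yields characters, which is how unsegmented
    # ideographic text would be tokenized in this sketch.
    print(jaccard_counts("aab", "ab"))
```

With plain sets, `"aab"` and `"ab"` would score 1.0; counting repeats lowers the score, which matters when a few characters occur very often.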
cbeb08f1d1[python/normalize] importing options from the C module
Al
2015-10-30 12:34:07 -04:00
cccc3e9cf5[similarity] Using Soft-TFIDF for approximate name matching. Soft-TFIDF is a hybrid string distance metric which balances local token similarities (using Jaro-Winkler similarity by default) allowing for slight spelling errors with global TFIDF statistics so that very frequent words don't affect the score as much
Al
2015-10-30 02:02:16 -04:00
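A rough sketch of the Soft-TFIDF idea described in the commit above. Several things here are assumptions: the standard library has no Jaro-Winkler, so `difflib.SequenceMatcher.ratio()` stands in as the local token similarity; `idf` is a hypothetical dict of precomputed IDF weights (missing tokens default to 1.0); and the threshold `theta=0.9` is illustrative.

```python
import math
from difflib import SequenceMatcher


def tfidf_vector(tokens, idf):
    """L2-normalized log-TF * IDF weights; idf is a hypothetical token -> weight dict."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    weights = {t: math.log(c + 1.0) * idf.get(t, 1.0) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def token_sim(a, b):
    # Stand-in for Jaro-Winkler, which the commit message names as the default.
    return SequenceMatcher(None, a, b).ratio()


def soft_tfidf(tokens1, tokens2, idf, theta=0.9):
    v1 = tfidf_vector(tokens1, idf)
    v2 = tfidf_vector(tokens2, idf)
    score = 0.0
    for t1, w1 in v1.items():
        # Closest token on the other side, by local string similarity.
        best_sim, best_tok = max(
            ((token_sim(t1, t2), t2) for t2 in v2), default=(0.0, None)
        )
        if best_tok is not None and best_sim >= theta:
            score += w1 * v2[best_tok] * best_sim
    return score


if __name__ == "__main__":
    idf = {"street": 0.2, "main": 1.5}  # made-up weights for illustration
    print(soft_tfidf(["main", "street"], ["main", "stret"], idf))
```

Tokens that match only approximately still contribute, scaled by their similarity, while a low IDF keeps very common words such as "street" from dominating the score.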
e7f783477f[python/normalize] Adding remove parentheses options in Python normalize (would require compiling with the scanner to do it from C, but could switch)
Al
2015-10-30 01:27:13 -04:00
5076c0409b[similarity] Adding an in-memory IDF index for weighted similarities
Al
2015-10-29 12:53:11 -04:00
1c543a5271[osm/formatting] Adding is_in tags to the address formatter as they're common in OSM, aliasing addr:district to state_district instead of suburb
Al
2015-10-29 12:25:35 -04:00
c7df3fcb3a[osm] Adding a list of various OSM name tags obtained from Nominatim
Al
2015-10-29 11:44:51 -04:00
cee9da05d6[fix] using tokenize_raw API
Al
2015-10-28 21:37:41 -04:00
bbd10e97bd[fix] imports
Al
2015-10-28 21:32:09 -04:00
110451d6d6[polygons] Polygon area calculations
Al
2015-10-28 21:19:35 -04:00
e946e63222[polygons] Changing language polygon index to use new index_polygon method
Al
2015-10-28 21:18:27 -04:00
5fdbb7e832[polygons] Adding a geohash polygon index which selects a prefix size based on the area of the polygon's bounding box
Al
2015-10-28 21:17:33 -04:00
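A hedged sketch of picking a geohash prefix length from a polygon's bounding box. The selection rule here (longest prefix whose cell area, in square degrees, still covers the bounding-box area) is an assumption for illustration; the commit does not state its exact rule, and both function names are hypothetical. The cell dimensions follow from the geohash encoding itself: 5 bits per character, alternating longitude and latitude starting with longitude.

```python
def geohash_cell_size(precision):
    """Return (width, height) in degrees of a geohash cell at this prefix length."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when bits is odd
    lat_bits = bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


def prefix_size_for_bbox(min_lon, min_lat, max_lon, max_lat, max_precision=12):
    """Longest prefix whose cell area is at least the bounding-box area.

    Crude: compares areas in square degrees and ignores latitude distortion.
    Falls back to a prefix length of 1 for very large boxes.
    """
    bbox_area = (max_lon - min_lon) * (max_lat - min_lat)
    best = 1
    for precision in range(1, max_precision + 1):
        width, height = geohash_cell_size(precision)
        if width * height >= bbox_area:
            best = precision
        else:
            break
    return best


if __name__ == "__main__":
    print(geohash_cell_size(1))
    print(prefix_size_for_bbox(0.0, 0.0, 10.0, 5.0))
```

Large polygons get short prefixes (few, coarse cells to index) and small polygons get long ones, which keeps the number of index keys per polygon roughly bounded.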