Commit Graph

  • 69ba631dc9 [docs] updating params in OSM training data docs Al 2015-11-28 01:09:14 -05:00
  • 3cd1fee89d [fix] KeyError Al 2015-11-27 14:40:11 -05:00
  • a77bc03977 [fix] language Al 2015-11-27 14:24:32 -05:00
  • 38d4e2d67a [fix] cities Al 2015-11-27 14:05:53 -05:00
  • 3cf98770e3 [fix] var name Al 2015-11-27 13:54:38 -05:00
  • 2e0f35b13a [fix] key checks for Quattroshapes cities, removing city in non-local language case Al 2015-11-27 13:45:51 -05:00
  • 105ba313c5 [fix] var name Al 2015-11-27 12:00:11 -05:00
  • 3eea355352 [fix] argument order Al 2015-11-27 11:47:39 -05:00
  • 51f6a82727 [fix] import again Al 2015-11-27 11:38:40 -05:00
  • 644eeb74c6 [fix] import Al 2015-11-27 11:17:53 -05:00
  • 2830986073 [osm/formatting] Adding in cities from Quattroshapes/GeoNames in the case of non-local languages or in general with a small random probability Al 2015-11-27 11:09:12 -05:00
  • b0667d0032 [fix] only care about levels in Quattroshapes index, not Zetashapes Al 2015-11-26 23:45:50 -05:00
  • 0eb0042826 [fix] Same in neighborhoods reverse geocoder lookups Al 2015-11-26 14:17:17 -05:00
  • 4170f6e9e3 [fix] same options for geohash-based index Al 2015-11-26 14:14:53 -05:00
  • 4cff1f8a9d [fix] Quattroshapes neighborhoods index uses geohashes for slightly better coverage Al 2015-11-26 12:45:54 -05:00
  • 98d8054a2b [polygons/quattroshapes] Converting Quattroshapes lookups to an R-tree index Al 2015-11-25 19:37:57 -05:00
  • 8a8e45f2a6 [fix] filenames Al 2015-11-25 18:08:04 -05:00
  • bd88628a98 [polygons/quattroshapes] Removing local admin and neighborhoods from the Quattroshapes reverse geocoder since they're covered in neighborhoods Al 2015-11-25 18:06:14 -05:00
  • 40d18aa7f6 [polygons/osm] Switching back to buffer(0). Still destroys many polygons, may need to look into another solution Al 2015-11-25 17:10:50 -05:00
  • a50c971732 [polygons/osm] Omitting last node in every way of a connected component since that node is equal to the start node of its neighbor Al 2015-11-25 17:09:19 -05:00
  • d6d5eab989 [geonames] Adding ability to look up GeoNames alternate names (may obtain IDs from Quattroshapes). Not great for local-language primary names (OSM remains the best) but decent for extracting foreign toponyms Al 2015-11-25 17:07:14 -05:00
  • 3217fa39cd [fix] add country randomly in the formatted language training data in cases where country is not present Al 2015-11-25 14:54:41 -05:00
  • 1a6618957b [fix] Python float precision doesn't appear to be the problem Al 2015-11-25 11:29:08 -05:00
  • 5781813cbd [fix] For countries like Denmark, removing country with a smaller probability Al 2015-11-25 00:39:52 -05:00
  • e4b8349d98 [fix] sparsity of country tags should be enough for language address training data Al 2015-11-25 00:32:01 -05:00
  • 824c779107 [fix] Cutting down training repeatedly on country names Al 2015-11-24 23:22:57 -05:00
  • 88529d28e2 [fix] country formatting in language address training data Al 2015-11-24 23:20:31 -05:00
  • cd74fcda3c [fix] not requiring minimal keys in format language data Al 2015-11-24 23:13:28 -05:00
  • e560e53308 [fix] formatter Al 2015-11-24 22:27:57 -05:00
  • 8c422a6e61 [osm] Adding new localized country names in language training data for formatted addresses Al 2015-11-24 21:49:10 -05:00
  • e40ca0bb89 [fix] Removing house numbers from formatted address language training data, using a simple whitespace splitter Al 2015-11-24 21:15:22 -05:00
  • a92cbb8003 [osm] Trying fixed-point precision in converting OSM coordinates to avoid issues with polygon self-intersection when the lines are very close together (e.g. parts of Berlin, UK country polygon) Al 2015-11-24 15:13:14 -05:00
  • ef9c5c2ca1 [fix] args Al 2015-11-24 11:02:35 -05:00
  • e75c1ce860 [fix] limited addresses Al 2015-11-24 11:01:22 -05:00
  • 94039f98ad [fix] argument validation in OSM training data script Al 2015-11-24 10:59:16 -05:00
  • de9f3120c8 [polygons] Trying a slightly higher value for buffer() as suggested by this issue https://github.com/Toblerity/Shapely/issues/277 Al 2015-11-23 15:43:23 -05:00
  • 6d20d7348f [osm] Using OSM namespaced tags from polygons in the case of non-local languages Al 2015-11-23 14:42:23 -05:00
  • e46e1a93a0 [fix] ISO code and simple/international name checks should be on the polygons Al 2015-11-23 14:30:38 -05:00
  • eb7488ab55 [fix] Making country replacement probability independent of the probability used for local vs non-local languages Al 2015-11-23 13:46:14 -05:00
  • f4f7cceba2 [fix] var, non-local languages Al 2015-11-23 12:51:26 -05:00
  • 6aa640b5f0 [fix] Moving is_in:country to lower priority Al 2015-11-23 12:36:05 -05:00
  • 2b1c346fde [osm] Using name:simple and int_name to capture more variations for US addresses, adding ISO codes occasionally instead of names Al 2015-11-23 12:35:44 -05:00
  • f1b6620369 [osm/formatting] replacing keys with the highest priority so addr:* tags take precedence over is_in:* tags Al 2015-11-22 22:25:41 -05:00
  • 2695b5dd26 [osm] Shortening state names obtained from reverse geocoding for relevant countries Al 2015-11-22 22:09:31 -05:00
  • 8b035814c7 [osm] Change probabilities for country names Al 2015-11-22 18:52:17 -05:00
  • 04183c672e [fix] non-integer admin levels Al 2015-11-22 18:33:27 -05:00
  • 7ee8045a0f [fix] comparison Al 2015-11-22 18:27:05 -05:00
  • efa0e38e45 [fix] another issue with tokenize API Al 2015-11-22 18:08:45 -05:00
  • ce065bb9ec [fix] using new pypostal tokenize API Al 2015-11-22 18:01:07 -05:00
  • 71afcafe11 [fix] key names Al 2015-11-22 17:46:56 -05:00
  • f77ddc71e7 [fix] reverting to old Rtree index filename Al 2015-11-22 17:25:51 -05:00
  • ee482e7a07 [fix] import Al 2015-11-22 16:04:50 -05:00
  • ee75ffccd5 [fix] import Al 2015-11-22 15:51:13 -05:00
  • c6f531ca95 [fix] arguments Al 2015-11-22 15:35:25 -05:00
  • c851cf2547 [fix] OSM R-tree Al 2015-11-22 15:24:35 -05:00
  • d3703ce6b4 [fix] var name Al 2015-11-22 14:27:25 -05:00
  • 5b6fbd66e0 [fix] arg Al 2015-11-22 14:24:05 -05:00
  • 422ea668d8 [fix] import Al 2015-11-22 14:23:09 -05:00
  • 4f0d6fbf79 [fix] default arg again Al 2015-11-22 14:22:09 -05:00
  • 4cc275e313 [fix] doc and default arg Al 2015-11-22 14:21:20 -05:00
  • c8f47b38a2 [osm/formatting] Adding OSM polygon lookups and neighborhood polygon lookups to the training data in order to provide more variations for the model to work with Al 2015-11-21 17:05:35 -05:00
  • 9fc60600dd [fix] OSM reverse geocoder polygon ordering Al 2015-11-20 14:49:37 -05:00
  • 130518fe58 [polygons] OSM reverse geocoder sort levels Al 2015-11-20 13:52:30 -05:00
  • b948a8ebd8 [osm] Adding global keys which map to OSM address components Al 2015-11-20 12:48:54 -05:00
  • 85667997cd [formatting] Adding city_district and state_district tags to address formatting templates where it makes sense. These will not be in all addresses, tags can be added and removed from the training data with certain probabilities Al 2015-11-20 12:24:44 -05:00
  • 470bd17c07 [formatting] Adding configs for a few dozen countries mapping OSM admin level to an address formatter field Al 2015-11-17 11:42:26 -05:00
  • 946bce1cb9 [osm] Adding a few more boundary types to planet admin borders Al 2015-11-17 11:40:42 -05:00
  • b3ef8ded12 [formatting] Adding OSM address components lookup by country Al 2015-11-17 11:39:34 -05:00
  • 0b74039a6a [formatting] Adding city_district as a separate format tag Al 2015-11-17 11:38:38 -05:00
  • 48a305c8c4 [fix] Reverting last two changes, have to fix on the OSM side Al 2015-11-01 16:23:41 -05:00
  • 90773294b9 [polygons] Only fixing polygons in cases with inner rings Al 2015-11-01 12:36:35 -05:00
  • 477300c061 [polygons] Eliminating fix_polygon Al 2015-11-01 03:12:58 -05:00
  • aba2c51e65 [fix] name Al 2015-11-01 01:35:01 -04:00
  • 1dbfc6a87b [polygons/neighborhoods] Not counting local admin polys unless they match OSM, fix for Paris arrondissements Al 2015-11-01 01:24:43 -04:00
  • 4fdaef2638 [fix] Don't need Quattroshapes dir for OSM Rtree Al 2015-10-31 19:08:04 -04:00
  • e9a6ea1d72 [fix] default index path Al 2015-10-31 18:47:24 -04:00
  • 4a35c50f92 [fix] index paths Al 2015-10-31 18:30:53 -04:00
  • c54fb412a3 [fix] save_index Al 2015-10-31 17:59:44 -04:00
  • 0227d9335f [fix] Removing some debug code Al 2015-10-31 14:50:32 -04:00
  • 5882c2d64b [fix] command-line arg II Al 2015-10-31 14:25:17 -04:00
  • 66f8a2dc9e [fix] command-line arg Al 2015-10-31 14:24:23 -04:00
  • e5d8812504 [fix] argument default Al 2015-10-31 14:23:37 -04:00
  • f39090869e [fix] imports Al 2015-10-31 14:22:45 -04:00
  • 8166cd66c8 [fix] encoding yet again Al 2015-10-31 14:19:50 -04:00
  • f473ff0dad [fix] encoding, different file Al 2015-10-31 14:18:43 -04:00
  • a2eb40109c [fix] file encoding Al 2015-10-31 14:17:26 -04:00
  • 3e43ac7255 [polygons/osm] Adding a unified neighborhood reverse geocoder incorporating Zetashapes, OSM and Quattroshapes. Uses the new Soft TFIDF implementation to approximately match OSM names to Quattroshapes/Zetashapes names and geohash indices for more coarse point-in-polygon tests (OSM neighborhoods are stored as points not polygons, so need to match with a geometry from the other sources) Al 2015-10-31 14:15:39 -04:00
  • a38624ba59 [similarity] Adding NameDeduper base class for deduping geographic names using the new Soft TFIDF similarity Al 2015-10-30 15:56:23 -04:00
  • a5c1296044 [similarity] Adding Jaccard similarity with word frequencies instead of simple sets, better for ideographic scripts (Han, Hangul, etc.) in the absence of word segmentation since there may be many high frequency characters Al 2015-10-30 13:35:11 -04:00
  • cbeb08f1d1 [python/normalize] importing options from the C module Al 2015-10-30 12:34:07 -04:00
  • cccc3e9cf5 [similarity] Using Soft-TFIDF for approximate name matching. Soft-TFIDF is a hybrid string distance metric which balances local token similarities (using Jaro-Winkler similarity by default) allowing for slight spelling errors with global TFIDF statistics so that very frequent words don't affect the score as much Al 2015-10-30 02:02:16 -04:00
  • e7f783477f [python/normalize] Adding remove parentheses options in Python normalize (would require compiling with the scanner to do it from C, but could switch) Al 2015-10-30 01:27:13 -04:00
  • 5076c0409b [similarity] Adding an in-memory IDF index for weighted similarities Al 2015-10-29 12:53:11 -04:00
  • 1c543a5271 [osm/formatting] Adding is_in tags to the address formatter as they're common in OSM, aliasing addr:district to state_district instead of suburb Al 2015-10-29 12:25:35 -04:00
  • c7df3fcb3a [osm] Adding a list of various OSM name tags obtained from Nominatim Al 2015-10-29 11:44:51 -04:00
  • cee9da05d6 [fix] using tokenize_raw API Al 2015-10-28 21:37:41 -04:00
  • bbd10e97bd [fix] imports Al 2015-10-28 21:32:09 -04:00
  • 110451d6d6 [polygons] Polygon area calculations Al 2015-10-28 21:19:35 -04:00
  • e946e63222 [polygons] Changing language polygon index to use new index_polygon method Al 2015-10-28 21:18:27 -04:00
  • 5fdbb7e832 [polygons] Adding a geohash polygon index which selects a prefix size based on the area of the polygon's bounding box Al 2015-10-28 21:17:33 -04:00
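
Note on a5c1296044: the message describes Jaccard similarity computed over word frequencies rather than plain sets. A minimal sketch of one common way to do that (multiset Jaccard: sum of per-token minimum counts over sum of per-token maximum counts) follows. The function name `weighted_jaccard` and its signature are hypothetical and chosen for illustration; the commit's actual implementation may differ.

```python
from collections import Counter


def weighted_jaccard(tokens_a, tokens_b):
    """Multiset Jaccard: sum of min counts divided by sum of max counts.

    Hypothetical helper for illustration only, not the repository's code.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    keys = set(counts_a) | set(counts_b)
    if not keys:
        # Two empty inputs: define the similarity as 0.0 by convention.
        return 0.0
    # Counter returns 0 for missing keys, so min/max work over the union.
    intersection = sum(min(counts_a[k], counts_b[k]) for k in keys)
    union = sum(max(counts_a[k], counts_b[k]) for k in keys)
    return intersection / union
```

With unsegmented text the tokens can simply be characters, e.g. `weighted_jaccard(list(name_a), list(name_b))`. Unlike set-based Jaccard, a repeated character then contributes in proportion to how often it occurs, so two names that share the same characters in different quantities no longer score as identical.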