Commit Graph

579 Commits

Author SHA1 Message Date
Al 3d3aacae67 [addresses] Adding abbreviations as a separate module so it can be used with multiple data sets 2016-07-21 17:04:57 -04:00
Al 317d3aa9ed [addresses] PO Box phrase generator 2016-07-21 17:04:57 -04:00
Al 9c4348a990 [addresses] conjunction class for building phrases like "5th and 6th" or "Units 1 & 2" across languages using the address configs 2016-07-21 17:04:57 -04:00
Al d136fb7576 [addresses] base class for numbered components (floors, units, house numbers in some languages/countries). Can generate many variants of a number (e.g. Floor 2, 2nd Floor, Floor #2, Floor No. 2, etc.) 2016-07-21 17:04:57 -04:00
Al 14c89e6895 [addresses] utilities for sampling from an arbitrary discrete distribution, building cumulative distributions, and sampling from a Zipfian distribution which seems to be a reasonable way of generating plausible apartment/floor numbers when the height/number of units is unknown. Picking a letter uniformly at random means P('Unit A') == P('Unit Z') when 'A' should be much more likely. Sampling from a Zipfian gets the desired effect in situations where address components are numbered by "counting from 0/1/A" while still allowing for a long tail 2016-07-21 17:04:57 -04:00
Al dcabdf7c0b [addresses] address config class for general sampling of forms specified in the address configs (default/alternatives to choose a phrase, canonical/abbreviated/sample to choose an abbreviation or surface form for that phrase) 2016-07-21 17:04:57 -04:00
Al 9a0ea19d02 [polygons] Persistent polygons for neighborhoods index as well, cache size at 100k 2016-07-21 17:04:57 -04:00
Al 90142e8559 [polygons] neighborhoods repo has the correct polygons for NYC, removing the pediacities version 2016-07-21 17:04:57 -04:00
Al c570bb7aef [fix] priorities in neighborhood index 2016-07-21 17:04:57 -04:00
Al e87c216241 [fix] var name 2016-07-21 17:04:57 -04:00
Al ab1a8d4416 [fix] Fixes to Zetashapes reverse geocoder 2016-07-21 17:04:57 -04:00
Al a93f110112 [fix] moving methods 2016-07-21 17:04:57 -04:00
Al efd167323b [polygons/neighborhoods] refactoring Zetashapes download, adding in PediaCities polygons for NYC neighborhoods 2016-07-21 17:04:57 -04:00
Al 333bd7ef45 [polygons] refactoring methods for getting cached/non-cached polygons 2016-07-21 17:04:57 -04:00
Al e4ff4a28b1 [polygons] Quattroshapes neighborhoods use regular in-memory polygons 2016-07-21 17:04:57 -04:00
Al 9dd5d5c210 [dictionaries] encapsulating reading address dictionaries so it's easy to implement sampling for the address training data 2016-07-21 17:04:57 -04:00
Al 23525df39d [numex] Nicer API for ordinal suffixes 2016-07-21 17:04:57 -04:00
Al 0f0af1f295 [osm/polygons] Adding properties in building polygons 2016-07-21 17:04:57 -04:00
Al e24306701f [numex] Moving numex files to YAML as well 2016-07-21 17:04:57 -04:00
Al 76fc337d0e [osm/polygons] add building:part to building polygons 2016-07-21 17:04:57 -04:00
Al 72ee2e00ae [osm] Moving OSM boundaries to YAML files instead of JSON for consistency 2016-07-21 17:04:57 -04:00
Al 6a03b0376c [osm/polygons] Using greater simplify tolerance 2016-07-21 17:04:57 -04:00
Al ae62471d32 [fix] simplify_polygons in building geocoder, and adding caching back to OSM admin polygons as it's faster when taking into account startup time. Also adding a few properties to buildings and landuse polygons 2016-07-21 17:04:57 -04:00
Al 1f52f8ddcc [osm/polygons] Same check for closed ways as for relations in OSM polygon readers 2016-07-21 17:04:57 -04:00
Al 26ada5cdbb [osm/polygons] From benchmarking it seems to make sense to keep OSM polygons in memory after all 2016-07-21 17:04:57 -04:00
Al f76a78120d [fix] properties/polygon key split 2016-07-21 17:04:57 -04:00
Al d460e2abe9 [osm/polygons] Trying persistent polygons again on OSM/Quattroshapes to test the new settings 2016-07-21 17:04:57 -04:00
Al 171a85bdff [osm/polygons] Storing polygon JSON under a different key so it doesn't have to be read from disk after a successful cache matched point-in-polygon test just to retrieve the properties 2016-07-21 17:04:57 -04:00
Al 67a3ee8e2a [fix] var name 2016-07-21 17:04:57 -04:00
Al 58f075f2ea [fix] classmethod for loading polygons 2016-07-21 17:04:57 -04:00
Al 9755d2cee9 [osm/polygons] Keep OSM/Quattroshapes admin polygons in memory as there are fewer of them and they are large 2016-07-21 17:04:57 -04:00
Al f6b88ba456 [fix] double prep 2016-07-21 17:04:57 -04:00
Al 941ab39a6a [fix] return_all in polygon index 2016-07-21 17:04:57 -04:00
Al 7b82af5526 [osm/polygons] Keep stats on cache hits/misses for testing cache sizes 2016-07-21 17:04:57 -04:00
Al 499a20cb36 [osm/polygons] Using an LRU cache for prepped polygons in the various PolygonIndex subclasses. That way can store less simplified polygons but keep frequently accessed ones (like countries) in memory 2016-07-21 17:04:57 -04:00
Al a84047b567 [fix] name 2016-07-21 17:04:57 -04:00
Al 57c3e0ddd4 [fix] import 2016-07-21 17:04:57 -04:00
Al 0e58f24172 [fix] arg name 2016-07-21 17:04:57 -04:00
Al 2f862ca0ec [osm] Adding place=plot to subdivisions data set 2016-07-21 17:04:57 -04:00
Al 6d1352334e [fix] command for subdivision polys 2016-07-21 17:04:57 -04:00
Al fe7cc0a937 [fix] import 2016-07-21 17:04:57 -04:00
Al 142bc293bb [fix] var scope 2016-07-21 17:04:57 -04:00
Al 70effea0f7 [fix] Simplify OSM polygons but using the new threshold 2016-07-21 17:04:57 -04:00
Al 4e17ef6f91 [osm] Storing polygon properties in a LevelDB, polygons themselves stay in memory 2016-07-21 17:04:57 -04:00
Al 8db7f139ba [osm] Adding building polygon reader, including closed ways for admin polys 2016-07-21 17:04:57 -04:00
Al 12a688df36 [osm] Splitting out generic amenities like ATM, fuel, restrooms, etc. so they can be used in category queries. Adding subdivision polygons, postcode polygons, building polygons, adding a few types of place keys to venues data set 2016-07-21 17:04:57 -04:00
Al fc689222da [osm] adding civil boundaries (e.g. postal areas in Dublin), fixing output files 2016-07-21 17:04:57 -04:00
Al 492b6ee235 [categories] Using TSV files instead of YAML for category queries, easier to edit 2016-07-21 17:04:57 -04:00
Al f3a9f4a257 [fix] removing init_gazetteers, doing it at the module level 2016-07-21 17:04:57 -04:00
Al 0162194dbc [dictionaries] Adding dictionary type enums to the generator script 2016-07-21 17:04:57 -04:00
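
Note on commit 14c89e6895: the message describes building cumulative distributions, sampling from an arbitrary discrete distribution, and using a Zipfian distribution so that 'Unit A' comes out far more often than 'Unit Z'. The following is a minimal Python sketch of that idea, not the code from the commit. All function names (cumulative, sample_index, zipfian_cdf, sample_unit_letter) and the exponent s=1.0 are assumptions made for illustration.

```python
import bisect
import itertools
import random
import string


def cumulative(weights):
    """Normalize non-negative weights and return their running totals."""
    total = float(sum(weights))
    return list(itertools.accumulate(w / total for w in weights))


def sample_index(cdf, rng=random):
    """Draw an index from a discrete distribution given in cumulative form."""
    # min() guards against float rounding leaving cdf[-1] slightly below 1.0.
    return min(bisect.bisect_right(cdf, rng.random()), len(cdf) - 1)


def zipfian_cdf(n, s=1.0):
    """Cumulative form of P(rank k) proportional to 1 / k**s for k = 1..n."""
    return cumulative([1.0 / k ** s for k in range(1, n + 1)])


# Hypothetical usage: rank 1 is 'A', rank 26 is 'Z', so early letters dominate
# while late letters remain possible (the long tail the commit mentions).
LETTER_CDF = zipfian_cdf(len(string.ascii_uppercase))


def sample_unit_letter(rng=random):
    """Pick an uppercase unit letter with Zipfian rather than uniform odds."""
    return string.ascii_uppercase[sample_index(LETTER_CDF, rng)]
```

Calling sample_unit_letter(random.Random(0)) repeatedly with a shared generator yields mostly early letters. Raising s makes the distribution steeper, and s=0 reduces it to the uniform case the commit message argues against.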