Commit Graph

593 Commits

Author SHA1 Message Date
Al 028dbacc87 [dictionaries] making entrances/postcodes plural for consistency 2016-04-15 01:10:03 -04:00
Al 883ef2ec56 [dictionaries] Moving intersections to cross streets 2016-04-14 17:53:45 -04:00
Al 5850793768 [expansion] Add postcode dictionary to gazetteer types 2016-04-14 14:33:02 -04:00
Al 6babbfaf02 [addresses] generator for floor numbers as well as special aliases like basement, mezzanine, etc. using the address configs 2016-04-14 14:22:08 -04:00
Al 36b3d515ad [expansion] Modifying the Python gazetteers to use new dictionaries API 2016-04-14 14:17:09 -04:00
Al 2ff4940e36 [expansion] Adding number and intersections to dictionary types 2016-04-14 14:15:33 -04:00
Al 49b02796c0 [addresses] Adding abbreviations as a separate module so it can be used with multiple data sets 2016-04-14 03:10:01 -04:00
Al a6553b77d3 [addresses] PO Box phrase generator 2016-04-14 02:38:45 -04:00
Al d29ade7210 [addresses] conjunction class for building phrases like "5th and 6th" or "Units 1 & 2" across languages using the address configs 2016-04-14 01:21:44 -04:00
Al f0ac3522da [addresses] base class for numbered components (floors, units, house numbers in some languages/countries). Can generate many variants of a number (e.g. Floor 2, 2nd Floor, Floor #2, Floor No. 2, etc.) 2016-04-14 01:17:43 -04:00
Al fe006e0d62 [addresses] utilities for sampling from an arbitrary discrete distribution, building cumulative distributions, and sampling from a Zipfian distribution which seems to be a reasonable way of generating plausible apartment/floor numbers when the height/number of units is unknown. Picking a letter uniformly at random means P('Unit A') == P('Unit Z') when 'A' should be much more likely. Sampling from a Zipfian gets the desired effect in situations where address components are numbered by "counting from 0/1/A" while still allowing for a long tail 2016-04-14 01:13:39 -04:00
Al 58feeab714 [addresses] address config class for general sampling of forms specified in the address configs (default/alternatives to choose a phrase, canonical/abbreviated/sample to choose an abbreviation or surface form for that phrase) 2016-04-14 01:06:54 -04:00
Al 7332445525 [polygons] Persistent polygons for neighborhoods index as well, cache size at 100k 2016-04-11 01:24:45 -04:00
Al e6dcf975f6 [polygons] neighborhoods repo has the correct polygons for NYC, removing the pediacities version 2016-04-10 20:27:38 -04:00
Al f739b46d6d [fix] priorities in neighborhood index 2016-04-10 18:56:03 -04:00
Al 761413e723 [fix] var name 2016-04-10 14:02:55 -04:00
Al 83fcf39d49 [fix] Fixes to Zetashapes reverse geocoder 2016-04-10 14:01:43 -04:00
Al ef72ad592b [fix] moving methods 2016-04-09 21:35:17 -04:00
Al dee143798a [polygons/neighborhoods] refactoring Zetashapes download, adding in PediaCities polygons for NYC neighborhoods 2016-04-09 21:32:39 -04:00
Al 38b39887ec [polygons] refactoring methods for getting cached/non-cached polygons 2016-04-09 19:52:48 -04:00
Al 78924fa308 [polygons] Quattroshapes neighborhoods use regular in-memory polygons 2016-04-09 19:28:56 -04:00
Al d38de71854 [dictionaries] encapsulating reading address dictionaries so it's easy to implement sampling for the address training data 2016-04-08 18:12:30 -04:00
Al 02e82e5342 [numex] Nicer API for ordinal suffixes 2016-04-08 17:10:18 -04:00
Al 737b5d06ed [osm/polygons] Adding properties in building polygons 2016-04-08 12:33:40 -04:00
Al 3bc85db41e [numex] Moving numex files to YAML as well 2016-04-07 13:26:00 -04:00
Al 5fce5e8000 [osm/polygons] add building:part to building polygons 2016-04-07 13:15:42 -04:00
Al 778fba2451 [osm] Moving OSM boundaries to YAML files instead of JSON for consistency 2016-04-06 22:59:46 -04:00
Al f2f131661a [osm/polygons] Using greater simplify tolerance 2016-04-06 20:24:37 -04:00
Al 69ef201cf1 [fix] simplify_polygons in building geocoder, and adding caching back to OSM admin polygons as it's faster when taking into account startup time. Also adding a few properties to buildings and landuse polygons 2016-04-06 13:53:47 -04:00
Al 502c61d9db [osm/polygons] Same check for closed ways as for relations in OSM polygon readers 2016-04-06 01:35:36 -04:00
Al 984cdc0650 [osm/polygons] From benchmarking it seems to make sense to keep OSM polygons in memory after all 2016-04-05 23:25:45 -04:00
Al fbebcc11d0 [fix] properties/polygon key split 2016-04-05 22:47:48 -04:00
Al ee160c715b [osm/polygons] Trying persistent polygons again on OSM/Quattroshapes to test the new settings 2016-04-05 19:46:45 -04:00
Al b8ccb8bfa1 [osm/polygons] Storing polygon JSON under a different key so it doesn't have to be read from disk after a successful cache matched point-in-polygon test just to retrieve the properties 2016-04-05 19:45:44 -04:00
Al a8ea5f47c3 [fix] var name 2016-04-05 19:23:08 -04:00
Al 65e0067ed0 [fix] classmethod for loading polygons 2016-04-05 19:20:12 -04:00
Al a8b0114871 [osm/polygons] Keep OSM/Quattroshapes admin polygons in memory as there are fewer of them and they are large 2016-04-05 19:14:17 -04:00
Al b693fe11dd [fix] double prep 2016-04-05 18:49:52 -04:00
Al 136700fa7f [fix] return_all in polygon index 2016-04-05 18:42:20 -04:00
Al e242868fd9 [osm/polygons] Keep stats on cache hits/misses for testing cache sizes 2016-04-05 16:46:19 -04:00
Al ec29c36cbc [build] Adding lru-dict, a fast C LRU cache, to requirements.txt for geodata package 2016-04-05 14:55:35 -04:00
Al 004165d184 [osm/polygons] Using an LRU cache for prepped polygons in the various PolygonIndex subclasses. That way can store less simplified polygons but keep frequently accessed ones (like countries) in memory 2016-04-05 14:53:07 -04:00
Al 1af5b88922 [fix] name 2016-04-05 00:51:01 -04:00
Al 49498ccf81 [fix] import 2016-04-04 23:38:30 -04:00
Al 6bb6ddb06a [fix] arg name 2016-04-04 22:41:20 -04:00
Al 0107473c6d [osm] Adding place=plot to subdivisions data set 2016-04-04 22:15:07 -04:00
Al 1ded6567f0 [fix] command for subdivision polys 2016-04-04 21:55:58 -04:00
Al d570ca406b [fix] import 2016-04-04 21:54:45 -04:00
Al 4aacad3676 [fix] var scope 2016-04-04 21:43:05 -04:00
Al 1844f99baf [fix] Simplify OSM polygons but using the new threshold 2016-04-04 21:39:26 -04:00
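
Note on commit fe006e0d62: the message argues that unit letters and floor numbers should be drawn from a Zipfian distribution rather than uniformly, so that 'Unit A' comes up far more often than 'Unit Z' while a long tail remains possible. The following is a minimal sketch of that idea, not the code from the commit. The function names (zipfian_weights, cumulative_distribution, sample_index) and the exponent s=1.0 are assumptions made for illustration only.

```python
import bisect
import random
import string


def zipfian_weights(n, s=1.0):
    """Unnormalized Zipfian weights: rank k (1-based) gets weight 1 / k**s.

    The exponent s is an illustrative assumption, not a value from the commit.
    """
    return [1.0 / (k ** s) for k in range(1, n + 1)]


def cumulative_distribution(weights):
    """Normalize the weights and return their running sum (the CDF)."""
    total = float(sum(weights))
    cdf = []
    running = 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    cdf[-1] = 1.0  # guard against floating-point drift in the last bucket
    return cdf


def sample_index(cdf, rng=random):
    """Draw an index from the discrete distribution described by cdf.

    rng.random() returns a float in [0, 1); binary search finds its bucket.
    """
    return bisect.bisect_right(cdf, rng.random())


# Hypothetical usage: pick a unit letter where 'A' is the most likely choice.
letters = string.ascii_uppercase
letter_cdf = cumulative_distribution(zipfian_weights(len(letters)))
unit = 'Unit {}'.format(letters[sample_index(letter_cdf)])
```

With s=1.0 and 26 letters, 'A' gets roughly a quarter of the probability mass and 'Z' about one percent, which matches the behavior the commit message describes. A larger exponent would concentrate more mass on the first few ranks.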