Commit Graph

  • e86921c081 [neighborhoods] Moving neighborhoods index to its own package Al 2016-04-26 14:26:46 -04:00
  • 054706b916 [dictionaries] preescolar Al 2016-04-23 14:24:34 -04:00
  • a338985012 [dictionaries] parcela Al 2016-04-23 14:24:18 -04:00
  • 8a5908f2dd [fix] most frequently occurring form for Auntie Anne's Al 2016-04-23 13:35:44 -04:00
  • 9890ad3811 [addresses] config for phrases around postcodes like CP in Spanish Al 2016-04-23 12:37:04 -04:00
  • 8b0ec3e70d [addresses] PO Box config Al 2016-04-23 12:36:16 -04:00
  • 088152bdbf [fix] instance var Al 2016-04-23 12:35:09 -04:00
  • cfad1df8cb [fix] import Al 2016-04-23 12:01:17 -04:00
  • a065246e51 [fix] set Al 2016-04-22 16:49:45 -04:00
  • 3bf7edfc5a [fix] file extension part II Al 2016-04-22 16:48:26 -04:00
  • 62ba9c53ee [fix] file extension Al 2016-04-22 16:47:54 -04:00
  • 87f9ec4227 [fix] chmod +x Al 2016-04-22 16:46:23 -04:00
  • 9b54fda8b2 [chains] Adding code to generate chain_stores repo TSV files from OSM Al 2016-04-22 16:45:28 -04:00
  • 22fabccf31 [fix] double pipes Al 2016-04-22 16:25:16 -04:00
  • aba1ebb7de [dictionaries] H&R block was missing Al 2016-04-22 16:11:45 -04:00
  • 6080441ef8 [rm] Removing first attempt at chain stores in favor of new dictionary type Al 2016-04-22 13:30:42 -04:00
  • b56729c5a6 [chains] Adding chain stores derived from frequent OSM venue names at https://github.com/openvenues/chain_stores + research Al 2016-04-22 13:28:48 -04:00
  • 907c8fe96d [addresses] /po_box/po_boxes/ Al 2016-04-20 17:07:27 -04:00
  • 6ff0b25f40 [addresses] Generate house number related phrases Al 2016-04-20 17:06:30 -04:00
  • 1eeda65cfd [dictionaries] /house_number/house_numbers/ Al 2016-04-20 15:56:48 -04:00
  • dba8be445d [fix] None handling and number dictionaries Al 2016-04-20 14:58:57 -04:00
  • 901f720368 [addresses] different dictionaries for sampling cardinal/unit directions, not converting None to a string Al 2016-04-19 17:05:10 -04:00
  • d91735c3c2 [addresses] Updating English config to support new options for occasionally adding whitespace between unit numbers Al 2016-04-19 17:03:46 -04:00
  • 10320723b1 [dictionaries] Removing ambiguous abbreviations for flat Al 2016-04-19 17:01:53 -04:00
  • 38ec82a42b [addresses] Unit/apartment number generation Al 2016-04-19 17:01:24 -04:00
  • 1acf0d592b [addresses] sample positive floors Al 2016-04-19 16:59:16 -04:00
  • 868fcb752b [mv] Moving sampling to math.sampling Al 2016-04-19 11:57:42 -04:00
  • c31926f3dd [addresses] Adding more numeric/numeric_affix probabilities to English config Al 2016-04-19 11:25:12 -04:00
  • ce2b2d9559 [addresses] Conjunction can be subclassed Al 2016-04-19 11:22:13 -04:00
  • c92af0da78 [addresses] Adding ability to randomly append relative/cardinal directions Al 2016-04-19 11:21:23 -04:00
  • 450aee95c2 [addresses] Adding base class for numeric phrases (appended to a number using numeric/numeric_affix), using probability 1.0 if only one of numeric/numeric_affix/ordinal is specified Al 2016-04-19 11:07:25 -04:00
  • 1b2e92dc14 [fix] polygons Al 2016-04-19 10:15:31 -04:00
  • 9abc679f09 [fix] typo Al 2016-04-19 00:53:39 -04:00
  • ccbbf84e8d [dictionaries] Updates to Spanish dictionaries, casa can be a numbered unit type Al 2016-04-19 00:45:32 -04:00
  • b8125a232d [dictionaries] Updates to English dictionaries Al 2016-04-19 00:44:33 -04:00
  • 47ffd18c8c [polygons] Adding __iter__ and __len__ to polygon index and keeping track of the number of polygons for iteration Al 2016-04-19 00:42:50 -04:00
  • 9271fda30e [addresses] Combined unit + house number (32/4, etc.) is more common in Canada, Australia, Singapore, etc. Not as much in the US, UK Al 2016-04-18 17:05:55 -04:00
  • d88f130edf [addresses] changing plurals to use the standard probability structure Al 2016-04-18 15:12:59 -04:00
  • af3fc30632 [docs] Adding note about Rstats binding to the README Al 2016-04-16 13:56:20 -04:00
  • 7272d44575 [dictionaries] Updates to Spanish dictionaries to support the new structure, new abbreviations for Colombia, etc. Al 2016-04-15 14:21:43 -04:00
  • 2a570481ba [addresses] implementing null_probability (raw number, no phrase), ordinal genders, and direction_probability Al 2016-04-15 03:25:41 -04:00
  • 430ad2e187 [numbers] suffixed_number Al 2016-04-15 02:04:58 -04:00
  • 028dbacc87 [dictionaries] making entrances/postcodes plural for consistency Al 2016-04-15 01:10:03 -04:00
  • 883ef2ec56 [dictionaries] Moving intersections to cross streets Al 2016-04-14 17:53:27 -04:00
  • 5850793768 [expansion] Add postcode dictionary to gazetteer types Al 2016-04-14 14:33:02 -04:00
  • 6babbfaf02 [addresses] generator for floor numbers as well as special aliases like basement, mezzanine, etc. using the address configs Al 2016-04-14 14:22:08 -04:00
  • 36b3d515ad [expansion] Modifying the Python gazetteers to use new dictionaries API Al 2016-04-14 14:17:09 -04:00
  • 2ff4940e36 [expansion] Adding number and intersections to dictionary types Al 2016-04-14 14:15:33 -04:00
  • 49b02796c0 [addresses] Adding abbreviations as a separate module so it can be used with multiple data sets Al 2016-04-14 03:09:58 -04:00
  • a6553b77d3 [addresses] PO Box phrase generator Al 2016-04-14 02:38:45 -04:00
  • 9eb444b193 [addresses] PO Box fixes in the address config Al 2016-04-14 02:38:02 -04:00
  • d29ade7210 [addresses] conjunction class for building phrases like "5th and 6th" or "Units 1 & 2" across languages using the address configs Al 2016-04-14 01:21:42 -04:00
  • f0ac3522da [addresses] base class for numbered components (floors, units, house numbers in some languages/countries). Can generate many variants of a number (e.g. Floor 2, 2nd Floor, Floor #2, Floor No. 2, etc.) Al 2016-04-14 01:17:43 -04:00
  • fe006e0d62 [addresses] utilities for sampling from an arbitrary discrete distribution, building cumulative distributions, and sampling from a Zipfian distribution which seems to be a reasonable way of generating plausible apartment/floor numbers when the height/number of units is unknown. Picking a letter uniformly at random means P('Unit A') == P('Unit Z') when 'A' should be much more likely. Sampling from a Zipfian gets the desired effect in situations where address components are numbered by "counting from 0/1/A" while still allowing for a long tail Al 2016-04-14 01:13:39 -04:00
  • 58feeab714 [addresses] address config class for general sampling of forms specified in the address configs (default/alternatives to choose a phrase, canonical/abbreviated/sample to choose an abbreviation or surface form for that phrase) Al 2016-04-14 01:06:51 -04:00
  • 518140a1b5 [addresses] Adding corner_of key to the English address config Al 2016-04-14 01:04:01 -04:00
  • db9d51e655 [dictionaries] Intersections dictionary for English Al 2016-04-14 00:57:09 -04:00
  • 8fdd3e9314 [addresses] Additions to the English address config Al 2016-04-14 00:56:39 -04:00
  • e37431912d [boundaries/fix] admin_level 7 in Australia should map to city, not state_district Al 2016-04-13 18:27:29 -04:00
  • 7bb5da94bb [dictionaries] Making the word for "number" a separate dictionary as it can apply in several places Al 2016-04-13 18:27:04 -04:00
  • da561fd9e3 [addresses] Adding probabilities to the English address configs Al 2016-04-11 23:25:16 -04:00
  • 59e5fcd1b4 [fix] LC_ALL=C in data download script Al 2016-04-11 12:47:50 -04:00
  • 7332445525 [polygons] Persistent polygons for neighborhoods index as well, cache size at 100k Al 2016-04-11 01:24:45 -04:00
  • e6dcf975f6 [polygons] neighborhoods repo has the correct polygons for NYC, removing the pediacities version Al 2016-04-10 20:27:38 -04:00
  • f739b46d6d [fix] priorities in neighborhood index Al 2016-04-10 18:56:01 -04:00
  • 761413e723 [fix] var name Al 2016-04-10 14:02:51 -04:00
  • 83fcf39d49 [fix] Fixes to Zetashapes reverse geocoder Al 2016-04-10 14:01:43 -04:00
  • ef72ad592b [fix] moving methods Al 2016-04-09 21:35:17 -04:00
  • dee143798a [polygons/neighborhoods] refactoring Zetashapes download, adding in PediaCities polygons for NYC neighborhoods Al 2016-04-09 21:32:39 -04:00
  • 38b39887ec [polygons] refactoring methods for getting cached/non-cached polygons Al 2016-04-09 19:52:48 -04:00
  • 78924fa308 [polygons] Quattroshapes neighborhoods use regular in-memory polygons Al 2016-04-09 19:28:56 -04:00
  • bcf87574d4 [dictionaries] Spanish abbreviations for numero Al 2016-04-09 15:18:46 -04:00
  • 5d182e30d4 [dictionaries] adding abbreviations for Hong Kong/Kowloon/New Territories Al 2016-04-09 15:17:37 -04:00
  • 2d0a0f1c83 [dictionaries] Adding a few English abbreviations/expansions Al 2016-04-09 14:53:33 -04:00
  • 26581aeb4d [numex] string keys Al 2016-04-08 18:13:08 -04:00
  • d38de71854 [dictionaries] encapsulating reading address dictionaries so it's easy to implement sampling for the address training data Al 2016-04-08 18:12:30 -04:00
  • 02e82e5342 [numex] Nicer API for ordinal suffixes Al 2016-04-08 17:10:10 -04:00
  • 737b5d06ed [osm/polygons] Adding properties in building polygons Al 2016-04-08 12:33:40 -04:00
  • 3bc85db41e [numex] Moving numex files to YAML as well Al 2016-04-07 13:26:00 -04:00
  • 5fce5e8000 [osm/polygons] add building:part to building polygons Al 2016-04-07 13:15:42 -04:00
  • 778fba2451 [osm] Moving OSM boundaries to YAML files instead of JSON for consistency Al 2016-04-06 22:59:46 -04:00
  • f2f131661a [osm/polygons] Using greater simplify tolerance Al 2016-04-06 20:24:37 -04:00
  • 69ef201cf1 [fix] simplify_polygons in building geocoder, and adding caching back to OSM admin polygons as it's faster when taking into account startup time. Also adding a few properties to buildings and landuse polygons Al 2016-04-06 13:53:47 -04:00
  • 502c61d9db [osm/polygons] Same check for closed ways as for relations in OSM polygon readers Al 2016-04-06 01:35:36 -04:00
  • 984cdc0650 [osm/polygons] From benchmarking it seems to make sense to keep OSM polygons in memory after all Al 2016-04-05 23:25:45 -04:00
  • fbebcc11d0 [fix] properties/polygon key split Al 2016-04-05 22:47:48 -04:00
  • ee160c715b [osm/polygons] Trying persistent polygons again on OSM/Quattroshapes to test the new settings Al 2016-04-05 19:46:45 -04:00
  • b8ccb8bfa1 [osm/polygons] Storing polygon JSON under a different key so it doesn't have to be read from disk after a successful cache matched point-in-polygon test just to retrieve the properties Al 2016-04-05 19:45:44 -04:00
  • a8ea5f47c3 [fix] var name Al 2016-04-05 19:23:08 -04:00
  • 65e0067ed0 [fix] classmethod for loading polygons Al 2016-04-05 19:20:12 -04:00
  • a8b0114871 [osm/polygons] Keep OSM/Quattroshapes admin polygons in memory as there are fewer of them and they are large Al 2016-04-05 19:05:26 -04:00
  • b693fe11dd [fix] double prep Al 2016-04-05 18:49:52 -04:00
  • 136700fa7f [fix] return_all in polygon index Al 2016-04-05 18:42:20 -04:00
  • e242868fd9 [osm/polygons] Keep stats on cache hits/misses for testing cache sizes Al 2016-04-05 16:46:14 -04:00
  • ec29c36cbc [build] Adding lru-dict, a fast C LRU cache, to requirements.txt for geodata package Al 2016-04-05 14:55:35 -04:00
  • 004165d184 [osm/polygons] Using an LRU cache for prepped polygons in the various PolygonIndex subclasses. That way we can store less simplified polygons but keep frequently accessed ones (like countries) in memory Al 2016-04-05 14:53:07 -04:00
  • 01567d2672 [osm/boundaries] admin_level 10 in Spain = suburb Al 2016-04-05 01:24:26 -04:00
  • 1af5b88922 [fix] name Al 2016-04-05 00:51:01 -04:00
  • 49498ccf81 [fix] import Al 2016-04-04 23:38:30 -04:00
  • 6bb6ddb06a [fix] arg name Al 2016-04-04 22:41:20 -04:00
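Commit fe006e0d62 explains the motivation for sampling unit and floor numbers from a Zipfian distribution instead of uniformly: "Unit A" should be much more likely than "Unit Z", but a long tail is still allowed. The sketch below illustrates that idea under stated assumptions. It uses plain 1/k^s weights over the letters A to Z, builds a cumulative distribution, and draws values by inverse-CDF sampling. The function names and the exponent `s` are hypothetical and are not taken from the libpostal code.

```python
import bisect
import random
import string
from typing import List, Sequence


def cumulative(weights: Sequence[float]) -> List[float]:
    """Build a cumulative distribution from (possibly unnormalized) weights."""
    total = float(sum(weights))
    cdf: List[float] = []
    running = 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    cdf[-1] = 1.0  # guard against floating-point drift in the last bucket
    return cdf


def sample_discrete(values: Sequence, cdf: Sequence[float], rng=random):
    """Inverse-CDF sampling: find the first bucket whose cumulative value exceeds u."""
    u = rng.random()  # uniform in [0.0, 1.0)
    return values[bisect.bisect_right(cdf, u)]


def zipf_weights(n: int, s: float = 1.0) -> List[float]:
    """Hypothetical Zipfian weights: rank k gets weight 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]


def sample_unit_letter(rng=random, s: float = 1.0) -> str:
    """Draw a unit letter, strongly favoring letters near the start of the alphabet."""
    letters = string.ascii_uppercase
    cdf = cumulative(zipf_weights(len(letters), s))
    return sample_discrete(letters, cdf, rng)
```

In practice the cumulative distribution would be built once and reused rather than recomputed on every draw.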