Commit Graph

1790 Commits

Author SHA1 Message Date
Al
8aac200d74 [addresses] config for phrases around postcodes like CP in Spanish 2016-07-21 17:04:57 -04:00
Al
f070697066 [addresses] PO Box config 2016-07-21 17:04:57 -04:00
Al
5bbb60e241 [fix] instance var 2016-07-21 17:04:57 -04:00
Al
3fd73c0bc8 [fix] import 2016-07-21 17:04:57 -04:00
Al
5713a07106 [fix] set 2016-07-21 17:04:57 -04:00
Al
bf22aead7b [fix] file extension part II 2016-07-21 17:04:57 -04:00
Al
7a44ba78bb [fix] file extension 2016-07-21 17:04:57 -04:00
Al
79b5516e75 [fix] chmod +x 2016-07-21 17:04:57 -04:00
Al
fe2921a20a [chains] Adding code to generate chain_stores repo TSV files from OSM 2016-07-21 17:04:57 -04:00
Al
6c47d2bed7 [fix] double pipes 2016-07-21 17:04:57 -04:00
Al
95058aa904 [dictionaries] H&R block was missing 2016-07-21 17:04:57 -04:00
Al
6d9929b94a [rm] Removing first attempt at chain stores in favor of new dictionary type 2016-07-21 17:04:57 -04:00
Al
6ec14f11f6 [chains] Adding chain stores derived from frequent OSM venue names at https://github.com/openvenues/chain_stores + research 2016-07-21 17:04:57 -04:00
Al
a7fe6408c0 [addresses] /po_box/po_boxes/ 2016-07-21 17:04:57 -04:00
Al
1e107f09ab [addresses] Generate house number related phrases 2016-07-21 17:04:57 -04:00
Al
62748b4644 [dictionaries] /house_number/house_numbers/ 2016-07-21 17:04:57 -04:00
Al
90c88a3a24 [fix] None handling and number dictionaries 2016-07-21 17:04:57 -04:00
Al
e13c536b03 [addresses] different dictionaries for sampling cardinal/unit directions, not converting None to a string 2016-07-21 17:04:57 -04:00
Al
8688812e71 [addresses] Updating English config to support new options for occasionally adding whitespace between unit numbers 2016-07-21 17:04:57 -04:00
Al
7f3667caf8 [dictionaries] Removing ambiguous abbreviations for flat 2016-07-21 17:04:57 -04:00
Al
c47762b91c [addresses] Unit/apartment number generation 2016-07-21 17:04:57 -04:00
Al
ca68391ea6 [addresses] sample positive floors 2016-07-21 17:04:57 -04:00
Al
9f652591ad [mv] Moving sampling to math.sampling 2016-07-21 17:04:57 -04:00
Al
93df047f8c [addresses] Adding more numeric/numeric_affix probabilities to English config 2016-07-21 17:04:57 -04:00
Al
32b6217aa8 [addresses] Conjunction can be subclassed 2016-07-21 17:04:57 -04:00
Al
535453f77d [addresses] Adding ability to randomly append relative/cardinal directions 2016-07-21 17:04:57 -04:00
Al
f026e8a764 [addresses] Adding base class for numeric phrases (appended to a number using numeric/numeric_affix), using probability 1.0 if only one of numeric/numeric_affix/ordinal is specified 2016-07-21 17:04:57 -04:00
Al
efc40c5698 [fix] polygons 2016-07-21 17:04:57 -04:00
Al
c7ea5d9637 [fix] typo 2016-07-21 17:04:57 -04:00
Al
cc17d8c15d [dictionaries] Updates to Spanish dictionaries, casa can be a numbered unit type 2016-07-21 17:04:57 -04:00
Al
5dcc7130d2 [dictionaries] Updates to English dictionaries 2016-07-21 17:04:57 -04:00
Al
0a80ec7129 [polygons] Adding __iter__ and __len__ to polygon index and keeping track of the number of polygons for iteration 2016-07-21 17:04:57 -04:00
Al
9328883a61 [addresses] Combined unit + house number (32/4, etc.) is more common in Canada, Australia, Singapore, etc. Not as much in the US, UK 2016-07-21 17:04:57 -04:00
Al
848b7ac167 [addresses] changing plurals to use the standard probability structure 2016-07-21 17:04:57 -04:00
Al
d0fb0d413d [dictionaries] Updates to Spanish dictionaries to support the new structure, new abbreviations for Colombia, etc. 2016-07-21 17:04:57 -04:00
Al
f7764b70cd [addresses] implementing null_probability (raw number, no phrase), ordinal genders, and direction_probability 2016-07-21 17:04:57 -04:00
Al
22687323c2 [numbers] suffixed_number 2016-07-21 17:04:57 -04:00
Al
6d4e54cd7a [dictionaries] making entrances/postcodes plural for consistency 2016-07-21 17:04:57 -04:00
Al
410eb0006a [dictionaries] Moving intersections to cross streets 2016-07-21 17:04:57 -04:00
Al
2f9a58f37b [expansion] Add postcode dictionary to gazetteer types 2016-07-21 17:04:57 -04:00
Al
b5386eb601 [addresses] generator for floor numbers as well as special aliases like basement, mezzanine, etc. using the address configs 2016-07-21 17:04:57 -04:00
Al
e1f1e34dca [expansion] Modifying the Python gazetteers to use new dictionaries API 2016-07-21 17:04:57 -04:00
Al
80089099e9 [expansion] Adding number and intersections to dictionary types 2016-07-21 17:04:57 -04:00
Al
3d3aacae67 [addresses] Adding abbreviations as a separate module so it can be used with multiple data sets 2016-07-21 17:04:57 -04:00
Al
317d3aa9ed [addresses] PO Box phrase generator 2016-07-21 17:04:57 -04:00
Al
21a2c067f5 [addresses] PO Box fixes in the address config 2016-07-21 17:04:57 -04:00
Al
9c4348a990 [addresses] conjunction class for building phrases like "5th and 6th" or "Units 1 & 2" across languages using the address configs 2016-07-21 17:04:57 -04:00
Al
d136fb7576 [addresses] base class for numbered components (floors, units, house numbers in some languages/countries). Can generate many variants of a number (e.g. Floor 2, 2nd Floor, Floor #2, Floor No. 2, etc.) 2016-07-21 17:04:57 -04:00
Al
14c89e6895 [addresses] utilities for sampling from an arbitrary discrete distribution, building cumulative distributions, and sampling from a Zipfian distribution which seems to be a reasonable way of generating plausible apartment/floor numbers when the height/number of units is unknown. Picking a letter uniformly at random means P('Unit A') == P('Unit Z') when 'A' should be much more likely. Sampling from a Zipfian gets the desired effect in situations where address components are numbered by "counting from 0/1/A" while still allowing for a long tail 2016-07-21 17:04:57 -04:00
Al
dcabdf7c0b [addresses] address config class for general sampling of forms specified in the address configs (default/alternatives to choose a phrase, canonical/abbreviated/sample to choose an abbreviation or surface form for that phrase) 2016-07-21 17:04:57 -04:00
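
Note on 14c89e6895: the message for that commit argues that unit letters and floor numbers should be drawn from a Zipfian distribution rather than uniformly, so that 'Unit A' comes up far more often than 'Unit Z' while a long tail remains possible. The following is a minimal sketch of that idea, not the code from the commit. The function names (cdf, weighted_choice, zipfian_distribution) and the exponent s=1.0 are assumptions made for illustration only.

```python
import bisect
import random
from itertools import accumulate


def cdf(probs):
    """Build a cumulative distribution from a list of non-negative weights."""
    total = float(sum(probs))
    return [c / total for c in accumulate(probs)]


def weighted_choice(values, cumulative, rng=random):
    """Sample one value by inverting the cumulative distribution."""
    idx = bisect.bisect_right(cumulative, rng.random())
    # Guard against floating-point rounding in the last cumulative entry.
    return values[min(idx, len(values) - 1)]


def zipfian_distribution(n, s=1.0):
    """Normalized probabilities for ranks 1..n, with P(k) proportional to 1 / k**s."""
    weights = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical use: pick unit letters so that earlier letters are more likely.
letters = [chr(ord('A') + i) for i in range(26)]
letter_probs = zipfian_distribution(len(letters))
letter_cdf = cdf(letter_probs)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [weighted_choice(letters, letter_cdf, rng) for _ in range(10000)]
```

With a uniform draw every letter would have probability 1/26. Under this sketch 'A' gets roughly a quarter of the mass and each later letter gets progressively less, which matches the "counting from A" behaviour the commit message describes.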