28 Commits

Author SHA1 Message Date
Al
e5fdd915d0 [fix] check the first phrase for components and bail if it matches something other than the specified tag 2016-07-21 17:04:57 -04:00
Al
8370a41ec0 [fix] import 2016-07-21 17:04:57 -04:00
Al
651bc32650 [addresses] more thoroughly solving the addr:city='Harlem' issue 2016-07-21 17:04:57 -04:00
Al
5a31b60cbe [addresses] Adding normalized_place_name, a method for separating compound fields like addr:city='New York NY' into simply 'New York', solving the compound phrase problem. Also solves the mislabeled place name problem, causing the system to ignore the user tag and fall back on reverse geocoded components in cases e.g. where addr:city='Harlem', which is a known neighborhood but not a city when reverse geocoded. A few other refactors for expanded address components 2016-07-21 17:04:57 -04:00
Al
fa99b4ce77 [addresses] wrapping up some of the functionality from OSM formatter to be used on an arbitrary address component dictionary 2016-07-21 17:04:57 -04:00
Al
7e5ecb30cf [addresses] sample_alphabet (Zipfian) in PO box rather than a uniform choice 2016-07-21 17:04:57 -04:00
Al
3d765e9eca [addresses] Fixing direction_probability, adding ability to have phrases which only apply to numbers, adding the possibility of null phrases to non-numeric "numbers" e.g. A-Z, etc. 2016-07-21 17:04:57 -04:00
Al
04a5a9e611 [fix] Removing YAML inheritance as it doesn't merge nested dictionaries 2016-07-21 17:04:57 -04:00
Al
37747709ee [addresses] Using YAML inheritance instead of baking it into the config parser 2016-07-21 17:04:57 -04:00
Al
8aac200d74 [addresses] config for phrases around postcodes like CP in Spanish 2016-07-21 17:04:57 -04:00
Al
a7fe6408c0 [addresses] /po_box/po_boxes/ 2016-07-21 17:04:57 -04:00
Al
1e107f09ab [addresses] Generate house number related phrases 2016-07-21 17:04:57 -04:00
Al
90c88a3a24 [fix] None handling and number dictionaries 2016-07-21 17:04:57 -04:00
Al
e13c536b03 [addresses] different dictionaries for sampling cardinal/unit directions, not converting None to a string 2016-07-21 17:04:57 -04:00
Al
7f3667caf8 [dictionaries] Removing ambiguous abbreviations for flat 2016-07-21 17:04:57 -04:00
Al
c47762b91c [addresses] Unit/apartment number generation 2016-07-21 17:04:57 -04:00
Al
ca68391ea6 [addresses] sample positive floors 2016-07-21 17:04:57 -04:00
Al
9f652591ad [mv] Moving sampling to math.sampling 2016-07-21 17:04:57 -04:00
Al
32b6217aa8 [addresses] Conjunction can be subclassed 2016-07-21 17:04:57 -04:00
Al
535453f77d [addresses] Adding ability to randomly append relative/cardinal directions 2016-07-21 17:04:57 -04:00
Al
f026e8a764 [addresses] Adding base class for numeric phrases (appended to a number using numeric/numeric_affix), using probability 1.0 if only one of numeric/numeric_affix/ordinal is specified 2016-07-21 17:04:57 -04:00
Al
f7764b70cd [addresses] implementing null_probability (raw number, no phrase), ordinal genders, and direction_probability 2016-07-21 17:04:57 -04:00
Al
b5386eb601 [addresses] generator for floor numbers as well as special aliases like basement, mezzanine, etc. using the address configs 2016-07-21 17:04:57 -04:00
Al
317d3aa9ed [addresses] PO Box phrase generator 2016-07-21 17:04:57 -04:00
Al
9c4348a990 [addresses] conjunction class for building phrases like "5th and 6th" or "Units 1 & 2" across languages using the address configs 2016-07-21 17:04:57 -04:00
Al
d136fb7576 [addresses] base class for numbered components (floors, units, house numbers in some languages/countries). Can generate many variants of a number (e.g. Floor 2, 2nd Floor, Floor #2, Floor No. 2, etc.) 2016-07-21 17:04:57 -04:00
Al
14c89e6895 [addresses] utilities for sampling from an arbitrary discrete distribution, building cumulative distributions, and sampling from a Zipfian distribution which seems to be a reasonable way of generating plausible apartment/floor numbers when the height/number of units is unknown. Picking a letter uniformly at random means P('Unit A') == P('Unit Z') when 'A' should be much more likely. Sampling from a Zipfian gets the desired effect in situations where address components are numbered by "counting from 0/1/A" while still allowing for a long tail 2016-07-21 17:04:57 -04:00
Al
dcabdf7c0b [addresses] address config class for general sampling of forms specified in the address configs (default/alternatives to choose a phrase, canonical/abbreviated/sample to choose an abbreviation or surface form for that phrase) 2016-07-21 17:04:57 -04:00
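
Note on 14c89e6895: the sampling idea described in that commit message (build a cumulative distribution, then sample from a Zipfian so that 'Unit A' is much more likely than 'Unit Z') can be sketched roughly as below. This is a minimal illustration and not the code from the commit. The function names `cdf`, `zipfian_distribution`, and `weighted_choice`, the exponent `s=1.0`, and the use of binary search over the cumulative distribution are all assumptions made for the example.

```python
import bisect
import random


def zipfian_distribution(n, s=1.0):
    """Probabilities proportional to 1 / rank**s for ranks 1..n.

    Hypothetical helper; the exponent s=1.0 is an assumption.
    """
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cdf(probs):
    """Build a cumulative distribution from non-negative weights."""
    total = float(sum(probs))
    cumulative = []
    running = 0.0
    for p in probs:
        running += p / total
        cumulative.append(running)
    # Pin the last bucket to 1.0 to absorb floating point drift
    cumulative[-1] = 1.0
    return cumulative


def weighted_choice(values, cumulative, rng=random):
    """Pick a value by inverting the cumulative distribution.

    rng.random() is in [0, 1), so the index found by bisect_right
    is always a valid position in values.
    """
    index = bisect.bisect_right(cumulative, rng.random())
    return values[index]


# Example: unit letters where 'A' is far more likely than 'Z'
letters = [chr(c) for c in range(ord('A'), ord('Z') + 1)]
letter_cdf = cdf(zipfian_distribution(len(letters)))
unit = 'Unit {}'.format(weighted_choice(letters, letter_cdf))
```

With a uniform choice every letter would have probability 1/26. Under this sketch the first letter gets roughly a quarter of the mass, while later letters remain possible as a long tail.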