Commit Graph

65 Commits

Author SHA1 Message Date
Al
14bc224f25 [openaddresses] Adding OSM neighborhoods across the US wherever we have them. That index is relatively small and cheap to do lookups for every point whereas the general R-tree should be used only when necessary 2016-08-24 14:58:19 -04:00
Al
4552aa380c [openaddresses] Adding South Carolina 2016-08-24 14:47:07 -04:00
Al
f66fb4a172 [openaddresses] Adding Maryland 2016-08-24 13:54:40 -04:00
Al
f9ec02c8e0 [openaddresses] Adding Georgia. There's a lot of weirdness in there so whitelisting files. Files that weren't added were deliberate 2016-08-24 13:52:35 -04:00
Al
ad625a46a4 [openaddresses] Adding Delaware and Pennsylvania. Going with the "older states in the union will have funkier addresses" strategy. 2016-08-23 22:22:35 -04:00
Al
e746cbab75 [openaddresses] Adding New England states (postcodes beginning with 0). 2016-08-23 02:51:20 -04:00
Al
9866614f63 [openaddresses] Using new config implementation, using neighborhoods/boroughs in NYC 2016-08-23 02:14:29 -04:00
Al
ed0b49884e [openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY 2016-08-23 00:38:43 -04:00
Al
8b57a7acf2 [osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries 2016-08-22 20:55:35 -04:00
Al
b41ba7374b [intersections] intersections training data, using a Cartesian product of all names in the same language, including something like tiger:name_base 2016-08-18 01:19:14 -04:00
Al
10a41309b8 [addresses] Increasing Romaji probability to 0.4 2016-08-06 21:27:32 -04:00
Al
cdd5a96346 [addresses] metro station can also be used for plain venues without a house number so we get more in the training set 2016-08-06 20:52:29 -04:00
Al
195278cfea [osm] Reverse geocoding to metro station only for addresses in Japan 2016-08-06 19:50:18 -04:00
Al
6ce882cb55 [addresses] Metro station component dependencies (road or house_number) 2016-08-06 19:34:39 -04:00
Al
8b5d44e173 [fix] Japanese house numbers aren't without dependencies, just have different ones (road or suburb or city_district) 2016-08-06 03:38:44 -04:00
Al
2c024ce9f4 [addresses] special case for Japan, house_number does not depend on street name 2016-08-06 02:38:58 -04:00
Al
afbb79b81d [osm/parser] Making a much lower probability of generating sub-building components for named venues (usually on the ground floor, etc.) 2016-07-31 20:40:44 -04:00
Al
09b16d954f [osm] Use much lower probability of ISO country codes 2016-07-29 11:41:39 -04:00
Al
21bcbd8381 [fix] restoring CLDR probability 2016-07-28 15:21:44 -04:00
Al
bebb33fe64 [osm] Include CLDR country even if the place didn't match simplified OSM polygons 2016-07-28 14:11:31 -04:00
Al
543048bc26 [osm] use CLDR country names with random probability 2016-07-28 02:37:12 -04:00
Al
1058b17a61 [osm] Moving admin_center overrides to OSM parser config 2016-07-25 02:02:48 -04:00
Al
9681d4dc8e [merge] 2016-07-22 18:55:55 -04:00
Al
226dd55a97 [osm] Adding Romaji probability to Japanese config for block/house number phrases 2016-07-22 17:01:15 -04:00
Al
b1b797171c [osm] Combining addr:block_number and addr:housenumber in Japan (randomly adds phrases for the 番号/bango system) 2016-07-22 14:52:16 -04:00
Al
afa58e6edb [openaddresses] Removing New Zealand city as the field is not specific enough and may conflict with OSM names, needs to be reverse geocoded. Adding cldr country probabilities so we can add localized names/codes given the country 2016-07-21 17:04:57 -04:00
Al
90a2f2b2e0 [parser] road has no dependencies 2016-07-21 17:04:57 -04:00
Al
29d16c9c80 [openaddresses] Country code for Belgium, removing Flanders as it has encoding issues, removing region from New Zealand formats as it appears to be conflated with districts 2016-07-21 17:04:57 -04:00
Al
64824b90a9 [openaddresses] Only adding units for Australia, as they're known to contain both designator and number. US units seem to often have simple numbers/letters for the unit field 2016-07-21 17:04:57 -04:00
Al
55d66af422 [openaddresses] Adding abbreviated unit 2016-07-21 17:04:57 -04:00
Al
2120adefff [openaddresses] Adding unit by default (only for files that have been vetted) 2016-07-21 17:04:57 -04:00
Al
cc4b7109ab [openaddresses] OpenAddresses config specifying a few files 2016-07-21 17:04:57 -04:00
Al
cc7727b13e [intersections] Adding intersections to config 2016-07-21 17:04:57 -04:00
Al
366c4995af [parser] lower full-name probability for states 2016-07-21 17:04:57 -04:00
Al
308080f6ee [formatting] Moving language country overrides to formatter config so actual language is retained 2016-07-21 17:04:57 -04:00
Al
890268aa87 [languages] Use English formats for Romanized CJK 2016-07-21 17:04:57 -04:00
Al
a5331f7107 [osm] Venue name depends on one of {house_number, road, suburb, city_district, city, postcode} 2016-07-21 17:04:57 -04:00
Al
e4d84fac7e [parser/osm] Adding address sans name for venues probabilistically 2016-07-21 17:04:57 -04:00
Al
fc44255be7 [osm/parser] Place only probability for chain queries as well 2016-07-21 17:04:57 -04:00
Al
b61cce7983 [osm/parser] Place only probability for category queries 2016-07-21 17:04:57 -04:00
Al
e99d5aebe0 [parser/osm] Adding category plural probability, chain store sample probability and probability of dropping postcode for raw places 2016-07-21 17:04:57 -04:00
Al
08212efe44 [parser] Adding OSM-specific parser config 2016-07-21 17:04:57 -04:00
Al
6fc6f9f591 [addresses] Adding address-level component dropout to AddressComponents (returns an ordering so the client formatter can potentially emit multiple addresses with different components dropped out). Adding PO box and category probabilities to config 2016-07-21 17:04:57 -04:00
Al
f468ab84d2 [parser] Removing island exceptions from parser default config 2016-07-21 17:04:57 -04:00
Al
62b35b318f [parser] Parser default config 2016-07-21 17:04:57 -04:00
Al
0d2e8387e6 [openaddresses] Removing New Zealand city as the field is not specific enough and may conflict with OSM names, needs to be reverse geocoded. Adding cldr country probabilities so we can add localized names/codes given the country 2016-05-31 18:29:07 -04:00
Al
9fcc04e440 [parser] road has no dependencies 2016-05-31 15:52:24 -04:00
Al
bc28f69875 [openaddresses] Country code for Belgium, removing Flanders as it has encoding issues, removing region from New Zealand formats as it appears to be conflated with districts 2016-05-31 12:11:42 -04:00
Al
d98eeb08e8 [openaddresses] Only adding units for Australia, as they're known to contain both designator and number. US units seem to often have simple numbers/letters for the unit field 2016-05-31 02:20:28 -04:00
Al
0efab434f7 [openaddresses] Adding abbreviated unit 2016-05-31 02:11:52 -04:00