129 Commits

Author SHA1 Message Date
Al
da36b71829 [addresses] adding new places index in OSM and OpenAddresses training data 2016-12-05 18:36:17 -05:00
Al
628fecea59 [addresses] adding point-based city/equivalent reverse geocoding for places that don't have as many defined polygons in OSM 2016-12-05 18:30:46 -05:00
Al
ef243fbb18 [fix] var name 2016-11-25 13:41:07 -08:00
Al
cdbc102821 [boundaries] in addition to population, check if a city has an unambiguous Wikipedia 2016-11-25 13:36:49 -08:00
Al
3dc2a922fb [addresses/languages] if there's only one default language and we don't have a road name or a unicode script to disambiguate, assume the default (e.g. English in the US unless there's a Spanish/French road name). Can affect things like state abbreviations 2016-11-22 18:27:54 -05:00
Al
de9bf29af0 [addresses] allowing osm_components argument to AddressComponents.expanded 2016-11-19 01:38:02 -05:00
Al
ca89a6ca2e [fix] args 2016-11-18 18:09:48 -05:00
Al
4e30a23313 [addresses] Adding toponym abbreviation to the input admin components as well as those obtained through reverse geocoding. Also was doing two random tests before abbreviating toponyms, reducing their frequency in the training data, now correctly using a single test. 2016-11-17 19:53:09 -05:00
Al
15b66f541c [fix] refactor to use ComponentDependencies class 2016-11-15 17:07:10 -05:00
Al
653b2d09c0 [addresses] moving component dependency graphs to a new module 2016-11-14 16:45:15 -05:00
Al
495b27470e [addresses] refactoring address component dependency graphs 2016-11-12 18:09:36 -05:00
Al
e9106698d2 [fix] convert newlines 2016-10-27 12:01:48 -04:00
Al
d51a1d6196 [addresses] doing hyphenation for existing components in component expansion (i.e. OSM training data) 2016-10-21 22:02:19 -04:00
Al
00ebdfed7f [osm] adding alt_place_names to the shared formatting class AddressComponents and making them classmethods 2016-10-20 20:41:22 -04:00
Al
51afc2619b [fix] only replace whitespace between words, not for instance whitespace around an existing hyphen, and reducing to one space for spaced hyphens 2016-10-19 01:24:54 -04:00
Al
e8899eafd6 [osm] adding hyphenation/de-hyphenation to OSM admin components 2016-10-19 01:00:29 -04:00
Al
72e7d3ff5b [addresses/hyphens] adding some methods to hyphenate/dehyphenate place names at random 2016-10-18 19:10:31 -04:00
Al
6ff1024c02 [fix] null candidate languages 2016-10-07 19:49:32 -04:00
Al
4ff3f50e01 [fix] Dublin postcode formatting 2016-10-07 01:06:37 -04:00
Al
2e8b6e6a29 [fix] args 2016-10-07 01:03:22 -04:00
Al
a67efcffe4 [addresses] add new option to use city population to determine whether components should be dropped out 2016-10-05 18:16:25 -04:00
Al
182c0b3d26 [addresses] adding country-specific cleanups for Kingston (city=Kingston 12 split into city=Kingston, postcode=12) and Dublin (e.g. Dublin 3 specified various ways will be treated as a city_district, whereas Eirecodes are treated as postal codes) 2016-10-05 17:05:24 -04:00
Al
faf418decb [languages] using country_and_languages method in OSM, neighborhoods and OpenAddresses 2016-10-05 02:49:55 -04:00
Al
551cce8cb1 [fix] making a separate gazetteer for toponym abbreviations 2016-09-10 01:08:58 -04:00
Al
19a044f7f3 [fix] imports 2016-09-10 00:09:11 -04:00
Al
604e898d65 [fix] using toponym_gazetteer in OSM boundary abbreviations 2016-09-10 00:02:59 -04:00
Al
d1483ea589 [fix] postcodes 2016-09-02 02:31:48 -04:00
Al
28fbb6a3bf [fix] admin_center ids 2016-09-02 02:19:17 -04:00
Al
1287f131cb [addresses] using the new admin_center config in AddressComponents 2016-09-02 02:02:28 -04:00
Al
a58194ca2e [fix] add_admin_boundaries and adding cleaned up house number 2016-08-28 15:15:57 -04:00
Al
dd0ca5e008 [addresses] Adding admin_center properties to place components in add_admin_boundaries (only overriding for specified areas where the boundary may otherwise not have all the properties) 2016-08-25 01:20:06 -04:00
Al
8b57a7acf2 [osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries 2016-08-22 20:55:35 -04:00
Al
d281e71d2c [fix] removing metro station index as a dependency for AddressComponents 2016-08-22 15:52:27 -04:00
Al
f43abe0846 [fix] making cleaned_name a classmethod 2016-08-18 19:55:52 -04:00
Al
7b314324ca [osm/addresses] Factoring out semicolon/comma-delimited name cleanup into its own method 2016-08-17 18:45:33 -04:00
Al
48755ec218 [boundaries] Adding regex replacements for boundary names such as Lyon 2e Arrondissement where putting Lyon is the OSM convention but we might sometimes want just 2e Arrondissement to appear in the training data next to Lyon 2016-08-11 13:09:24 -04:00
Al
5ec752e887 [fix] order of ops 2016-08-06 20:43:13 -04:00
Al
3e34012e69 [fix] if the language is given already, use it as a suffix rather than choosing at random 2016-08-06 20:36:56 -04:00
Al
606c464db6 [fix] house number phrases 2016-08-06 20:11:32 -04:00
Al
0e7cb2b06c [fix] var name II 2016-08-06 20:00:35 -04:00
Al
8d88820d30 [fix] var name 2016-08-06 19:59:53 -04:00
Al
6ef54bcc6f [addresses] Adding metro stations to AddressComponents expansion 2016-08-06 19:36:57 -04:00
Al
684550ea7d [fix] only add house_number phrase to numeric inputs 2016-08-06 14:49:28 -04:00
Al
445e8082c8 [addresses] Adding per-country overrides for address component dependencies 2016-08-06 02:36:47 -04:00
Al
0ab3b13b75 [osm] Remove hanging commas, slashes, etc. Implementing a stricter rule for user-specified tags (not reverse geocoded) so that if they contain an unknown phrase followed by an unknown boundary phrase, we delete that tag and fall back to the reverse geocoded components. Moving CLDR country tagging to later in the process since those are known correct names. 2016-08-02 16:25:45 -04:00
Al
4ab60cd4fc [osm] Remove boundary names with trailing commas 2016-08-02 03:13:05 -04:00
Al
12466b12dc [osm] Removing boundary names (not including postal codes) which are simply digits 2016-08-02 02:17:25 -04:00
Al
e11c723f8b [fix] var rename 2016-08-01 17:50:00 -04:00
Al
79ce922432 [osm] Fixing sub-building components so generated numbers are not added to the address components unless cls.phrase returns non-None 2016-08-01 17:44:23 -04:00
Al
3505af4bc1 [fix] don't add phrases for non-numeric existing components 2016-07-31 22:14:37 -04:00
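
Commit 4e30a23313 above notes that toponym abbreviation was gated behind two random tests instead of one, which made abbreviated forms rarer in the training data than intended. The sketch below is not the project's code. It only illustrates the arithmetic behind that fix: if each test passes with probability p, requiring both to pass yields a rate of about p * p. The function names, the value of p, and the trial count are all assumptions chosen for illustration.

```python
import random


def abbreviate_double_test(p, rng):
    # Hypothetical buggy pattern: two independent checks must both pass,
    # so the effective probability is p * p rather than p.
    return rng.random() < p and rng.random() < p


def abbreviate_single_test(p, rng):
    # Hypothetical fixed pattern: one check, so the effective probability is p.
    return rng.random() < p


def observed_rate(decide, p, trials=100_000, seed=0):
    # Estimate how often `decide` returns True over many seeded trials.
    rng = random.Random(seed)
    hits = sum(decide(p, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    p = 0.3  # illustrative value, not taken from the project's config
    print("double test:", observed_rate(abbreviate_double_test, p))  # near p * p
    print("single test:", observed_rate(abbreviate_single_test, p))  # near p
```

With p = 0.3, the double test abbreviates in roughly 9% of cases rather than the configured 30%, which matches the commit's description of reduced frequency.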