Commit Graph

  • 5140db536a [phrases] additions to venue names dictionaries and a more restrictive version of street types dictionaries Al 2016-11-19 02:58:27 -05:00
  • 71be0fdfbc [fix] sets Al 2016-11-19 02:30:40 -05:00
  • b6f7b5b577 [fix] name Al 2016-11-19 01:38:15 -05:00
  • de9bf29af0 [addresses] allowing osm_components argument to AddressComponents.expanded Al 2016-11-19 01:38:02 -05:00
  • 1df1b60a9f [phrases] adding extract_phrases method to gazetteers, which returns a set of gazetteer phrases found in a given string Al 2016-11-18 23:35:44 -05:00
  • 8ef8d88186 [fix] don't short-circuit OSM address formatting unless there are no components and no venue names Al 2016-11-18 23:31:24 -05:00
  • 25ceeed6ef [fix] check before pop Al 2016-11-18 18:36:35 -05:00
  • 7a89c6e9ce [osm] removing dependencies for house/venue name (purely numeric names taken care of in osm formatter) Al 2016-11-18 18:32:44 -05:00
  • ca89a6ca2e [fix] args Al 2016-11-18 18:09:48 -05:00
  • 72305975eb [openaddresses] adding Nelson Mandela Bay as a pre-release download Al 2016-11-18 18:00:42 -05:00
  • 6e73d46097 [fix] typo Al 2016-11-18 00:50:18 -05:00
  • 4e30a23313 [addresses] Adding toponym abbreviation to the input admin components as well as those obtained through reverse geocoding. Also was doing two random tests before abbreviating toponyms, reducing their frequency in the training data, now correctly using a single test. Al 2016-11-17 19:53:09 -05:00
  • a9fdfee2ac [polygons] adding optional test_point for complex polygons with an admin_center, and including admin_center lat/lon as part of the properties Al 2016-11-17 19:36:32 -05:00
  • c2ccec70ad [polygons] adding lat/lon props to admin centers Al 2016-11-17 19:21:31 -05:00
  • 71d535e845 [polygons] using try/except in polygons Al 2016-11-17 17:38:54 -05:00
  • d701bb1320 [polygons] only applying the new fix-on-read solution in the OSM admin/subdivision indices Al 2016-11-17 00:33:06 -05:00
  • c1d4b03bb4 [polygons] moving polygon fixes to the to_polygon method so they get applied both at ingestion and on cache load Al 2016-11-16 23:25:48 -05:00
  • a25ae7f9ef [osm/polygons] adding fixed version of a polygon if polygon is invalid and doesn't contain its centroid Al 2016-11-16 17:38:01 -05:00
  • 0421b8b17c [boundaries] Reading, UK Al 2016-11-16 03:48:21 -05:00
  • 9c5321d240 [boundaries] Bedford, UK Al 2016-11-16 03:45:50 -05:00
  • 749e495482 [boundaries] Nottingham, UK Al 2016-11-16 03:37:21 -05:00
  • b5464f842b [boundaries] converting admin_level=10 to city in the UK and Ireland Al 2016-11-16 03:20:38 -05:00
  • 4a0ed7c703 [boundaries] adding a few more city boundary exceptions to England and Scotland Al 2016-11-16 02:55:30 -05:00
  • e85a1b906a [fix] East Asian probabilities Al 2016-11-16 02:54:56 -05:00
  • 3617b3a10c [fix] recursive merge for entries that are empty dictionaries Al 2016-11-16 02:14:28 -05:00
  • b03494a736 [boundaries] adding admin_level=6 as cities in West Midlands (county), UK Al 2016-11-16 01:53:03 -05:00
  • 07f41a7565 [boundaries] adding York as city in UK (listed as admin_level=6) Al 2016-11-16 01:35:59 -05:00
  • c5a48b4cd3 [fix] East Asian system po_box probabilities Al 2016-11-16 01:26:31 -05:00
  • 15b66f541c [fix] refactor to use ComponentDependencies class Al 2016-11-15 17:07:10 -05:00
  • 68ab69cdc3 [fix] alias in formatter config Al 2016-11-15 17:04:15 -05:00
  • dc65f518a5 [openaddresses] adding new US counties from OpenAddresses Al 2016-11-15 02:32:00 -05:00
  • 67f409cdf6 [places] adding dependencies to admin components e.g. so in some countries city_district must be accompanied by a city, etc. Al 2016-11-15 02:31:15 -05:00
  • 96fb725e54 [formatting] adding po_box insertions for East Asian addresses Al 2016-11-14 18:29:44 -05:00
  • 653b2d09c0 [addresses] moving component dependency graphs to a new module Al 2016-11-14 16:45:15 -05:00
  • 495b27470e [addresses] refactoring address component dependency graphs Al 2016-11-12 18:09:36 -05:00
  • b42159205d [openaddresses] adding some of the new US counties from OA Al 2016-11-10 10:27:11 -05:00
  • 7c9c600e07 [openaddresses] add new counties from upstream Al 2016-11-06 00:06:59 -04:00
  • a6c88f54ab [openaddresses] add Forsyth county Al 2016-11-02 23:59:50 -04:00
  • 7cdccbe31f [openaddresses] adding fixed sources in ID Al 2016-10-31 11:22:48 -04:00
  • 353c6c7b7a [openaddresses] adding Jefferson County, AL Al 2016-10-28 10:58:04 -04:00
  • e9106698d2 [fix] convert newlines Al 2016-10-27 12:01:48 -04:00
  • e48f207d10 [openaddresses] updating with new OpenAddresses sources Al 2016-10-27 11:19:30 -04:00
  • 5cabd9b4f7 [fix] country languages in OpenAddresses Al 2016-10-24 17:35:39 -04:00
  • ac0eb1776e [openaddresses] adding Brazoria County, TX Al 2016-10-24 09:27:05 -04:00
  • 35d3d8cc73 [openaddresses] countries are known a priori, so if the boundaries don't quite line up with OSM, use the country from the path Al 2016-10-23 19:50:54 -04:00
  • f429bea15b [fix] subtract abs value Al 2016-10-23 01:11:09 -04:00
  • 1658c425c5 [fix] clear country cache only at each new country, not each file Al 2016-10-23 00:57:52 -04:00
  • 7199ff17e0 [fix] truncate postcodes that are longer than specified length Al 2016-10-23 00:52:24 -04:00
  • 3934111cdf [openaddresses] 5-digit postcodes in a few counties Al 2016-10-23 00:51:43 -04:00
  • 889e914dfc [openaddresses] clear all polygon caches Al 2016-10-23 00:11:54 -04:00
  • 0fd431a9d2 [fix] abs Al 2016-10-22 23:55:30 -04:00
  • ec54d3de35 [fix] don't convert number to int/float in numeric_phrase (chops leading zeros) Al 2016-10-22 23:49:58 -04:00
  • 63edd53fb3 [openaddresses] adding clear_cache method to clear the LRU cache for point-in-polygon indices and using it in OpenAddresses import since it heavily reuses polygons and only for the current file Al 2016-10-22 20:28:59 -04:00
  • d51a1d6196 [addresses] doing hyphenation for existing components in component expansion (i.e. OSM training data) Al 2016-10-21 22:02:15 -04:00
  • 0216a991c6 [formatting] use US template insertions for Canada as well Al 2016-10-21 14:43:40 -04:00
  • 2a355b2cf8 [openaddresses] adding address only 10% of the time in OpenAddresses Al 2016-10-20 23:57:30 -04:00
  • dfbc4bf144 [openaddresses] no add_osm_boundaries for two of the recent Washington editions, only reverse geocode to OSM when no city is given Al 2016-10-20 22:46:29 -04:00
  • d965ea9371 [openaddresses] adding hyphenation/dehyphenation to the OpenAddresses formatter Al 2016-10-20 20:55:17 -04:00
  • 00ebdfed7f [osm] adding alt_place_names to the shared formatting class AddressComponents and making them classmethods Al 2016-10-20 20:41:22 -04:00
  • d9bc465c82 [osm] parsing out semicolon-delimited postal codes from OSM in countries like Poland that use hyphen delimited postcodes without treating them as number ranges Al 2016-10-19 17:46:37 -04:00
  • 91e6ca0942 [osm] adding a number of Australian city council boundaries Al 2016-10-19 16:33:42 -04:00
  • cec8168279 [osm] adding council and city council to ignorable place name suffixes Al 2016-10-19 16:33:04 -04:00
  • ec77a247fa [fix] just ignore records without the "name" tag Al 2016-10-19 13:36:15 -04:00
  • 61078eded9 [fix] checking for dictionary key Al 2016-10-19 13:34:13 -04:00
  • c2b73307de [fix] parens Al 2016-10-19 13:29:56 -04:00
  • f639151698 [osm] checking for non-admin_center nodes which are part of a lower admin level polygon with the same name Al 2016-10-19 13:27:33 -04:00
  • e380567ac4 [osm] adding alt_place_names method which does hyphenation, de-hyphenation and abbreviated toponyms with/without hyphens Al 2016-10-19 02:19:09 -04:00
  • 51afc2619b [fix] only replace whitespace between words, not for instance whitespace around an existing hyphen, and reducing to one space for spaced hyphens Al 2016-10-19 01:23:58 -04:00
  • 78f341f4f1 [osm] higher probability of hyphenation Al 2016-10-19 01:10:06 -04:00
  • e8899eafd6 [osm] adding hyphenation/de-hyphenation to OSM admin components Al 2016-10-19 01:00:29 -04:00
  • 98ac232eea [osm] hyphenating and de-hyphenating place names in places training data Al 2016-10-19 00:33:10 -04:00
  • 562caba31c [openaddresses] adding new counties in Washington Al 2016-10-19 00:30:50 -04:00
  • 72e7d3ff5b [addresses/hyphens] adding some methods to hyphenate/dehyphenate place names at random Al 2016-10-18 19:10:31 -04:00
  • 7e007a49ab [osm] removing place=district mapping globally (means city_district in Hungary) and mapping it specifically to state_district/city_district in the places where it's needed Al 2016-10-18 19:02:36 -04:00
  • 9384d8cc7e [osm] adding exception for Vienna Al 2016-10-18 02:52:02 -04:00
  • d4f4b716a0 [openaddresses] adding new counties in Oregon Al 2016-10-18 00:44:45 -04:00
  • fc9ed13bc5 [boundaries] adding Community Development Council and CDC as removable suffixes for Singapore Al 2016-10-17 16:04:17 -04:00
  • d34faf42b8 [osm] fix names with pipes in them Al 2016-10-17 02:32:25 -04:00
  • a796b41d90 [geonames] admin codes on geonames/postal_codes tables Al 2016-10-17 00:21:33 -04:00
  • ff27ee14bb [osm] only add label props if the name property is identical (counterexample, Nottinghamshire's label is listed as West Bridgford, which is really its admin_center) Al 2016-10-16 22:18:52 -04:00
  • de9e234929 [osm] adding alternate civil parish description to the UK Al 2016-10-16 22:04:17 -04:00
  • 876f575040 [geonames] adding 5 borough exceptions Al 2016-10-16 21:31:20 -04:00
  • 093e7ed120 [fix] city districts in Košice, Slovakia Al 2016-10-15 01:47:37 -04:00
  • 049b3c9ce1 [boundaries] city/wards for Dar es Salaam + admin_center Al 2016-10-15 01:47:14 -04:00
  • c4848b113d [geonames] unindenting overrides in GeoNames configs Al 2016-10-15 01:46:46 -04:00
  • c39cfec218 [boundaries] Dar es Salaam=city, wards=city_district in Tanzania Al 2016-10-15 01:40:00 -04:00
  • 876fdd11fa [fix] country/language codes in formatting config Al 2016-10-12 15:51:31 -04:00
  • 9fb936019a [geoplanet] script to create GeoPlanet postal codes training data Al 2016-10-12 15:05:45 -04:00
  • 1e6a00c573 [fix] place in UK that was parented by a postal_code Al 2016-10-12 15:00:33 -04:00
  • 1d25f08b52 [expand] adding a function to check if two place names/addresses are equivalent after token normalization (replacing hyphens, deleting final periods, lowercasing, simple transliteration, etc.) and taking into account abbreviations from any specified libpostal dictionaries. In conjunction with place name affixes, useful in data sets like GeoPlanet or GeoNames to determine if a name variant is related to the original or not Al 2016-10-12 14:55:59 -04:00
  • f8664b0deb [formatting] making regex-based tests during insert_component optional. If exact_order=True, insert the given component directly before/after the reference component, otherwise for components that already exist in the template only need to care about relative position. Adding a method to determine if template language is important for a particular country/language pair. Al 2016-10-12 14:42:15 -04:00
  • 3db6b7fbf1 [dictionaries] adding new abbreviations for Sankt in German and Scandinavian languages Al 2016-10-11 18:05:11 -04:00
  • 2663b81670 [address_formatting] caching parsed templates from pystache yields about a 2.5x speedup per call, should shave off several hours of CPU time for large training sets Al 2016-10-11 15:36:49 -04:00
  • 2314acef1b [geoplanet] bypassing Québec as a county (just city and state) Al 2016-10-11 02:33:27 -04:00
  • 02fc172b5c [geoplanet] abbreviations for UK and NYC, fixing country codes for IM, GG and JE Al 2016-10-11 02:11:26 -04:00
  • 6ff1024c02 [fix] null candidate languages Al 2016-10-07 19:49:32 -04:00
  • 30074524d8 [fix] return empty list for languages in country_and_languages Al 2016-10-07 18:57:22 -04:00
  • 29698781cb [boundaries] making Kingston parish a city and only using the name Kingston, just so the parser doesn't have to disambiguate between references to the parish vs. the city, both referred to as Kingston Al 2016-10-07 18:52:42 -04:00
  • ff7fec6ed1 [osm/polygons] need to include id/type in polygon properties now that they're getting added earlier in the pipeline Al 2016-10-07 01:21:02 -04:00
  • 169a3c3d70 [osm] drop postcode as well for address-only format Al 2016-10-07 01:10:16 -04:00