5140db536a[phrases] additions to venue names dictionaries and a more restrictive version of street types dictionaries
Al
2016-11-19 02:58:27 -05:00
71be0fdfbc[fix] sets
Al
2016-11-19 02:30:40 -05:00
b6f7b5b577[fix] name
Al
2016-11-19 01:38:15 -05:00
de9bf29af0[addresses] allowing osm_components argument to AddressComponents.expanded
Al
2016-11-19 01:38:02 -05:00
1df1b60a9f[phrases] adding extract_phrases method to gazetteers, which returns a set of gazetteer phrases found in a given string
Al
2016-11-18 23:35:44 -05:00
8ef8d88186[fix] don't short-circuit OSM address formatting unless there are no components and no venue names
Al
2016-11-18 23:31:24 -05:00
25ceeed6ef[fix] check before pop
Al
2016-11-18 18:36:35 -05:00
7a89c6e9ce[osm] removing dependencies for house/venue name (purely numeric names taken care of in osm formatter)
Al
2016-11-18 18:32:44 -05:00
ca89a6ca2e[fix] args
Al
2016-11-18 18:09:48 -05:00
72305975eb[openaddresses] adding Nelson Mandela Bay as a pre-release download
Al
2016-11-18 18:00:42 -05:00
6e73d46097[fix] typo
Al
2016-11-18 00:50:18 -05:00
4e30a23313[addresses] Adding toponym abbreviation to the input admin components as well as to those obtained through reverse geocoding. Previously, two random tests were run before abbreviating toponyms, which reduced their frequency in the training data; now a single test is correctly used.
Al
2016-11-17 19:53:09 -05:00
a9fdfee2ac[polygons] adding optional test_point for complex polygons with an admin_center, and including admin_center lat/lon as part of the properties
Al
2016-11-17 19:36:32 -05:00
c2ccec70ad[polygons] adding lat/lon props to admin centers
Al
2016-11-17 19:21:31 -05:00
71d535e845[polygons] using try/except in polygons
Al
2016-11-17 17:38:54 -05:00
d701bb1320[polygons] only applying the new fix-on-read solution in the OSM admin/subdivision indices
Al
2016-11-17 00:33:06 -05:00
c1d4b03bb4[polygons] moving polygon fixes to the to_polygon method so they get applied both at ingestion and on cache load
Al
2016-11-16 23:25:48 -05:00
a25ae7f9ef[osm/polygons] adding a fixed version of a polygon if the polygon is invalid and doesn't contain its centroid
Al
2016-11-16 17:38:01 -05:00
0421b8b17c[boundaries] Reading, UK
Al
2016-11-16 03:48:21 -05:00
9c5321d240[boundaries] Bedford, UK
Al
2016-11-16 03:45:50 -05:00
749e495482[boundaries] Nottingham, UK
Al
2016-11-16 03:37:21 -05:00
b5464f842b[boundaries] converting admin_level=10 to city in the UK and Ireland
Al
2016-11-16 03:20:38 -05:00
4a0ed7c703[boundaries] adding a few more city boundary exceptions to England and Scotland
Al
2016-11-16 02:55:30 -05:00
e85a1b906a[fix] East Asian probabilities
Al
2016-11-16 02:54:56 -05:00
3617b3a10c[fix] recursive merge for entries that are empty dictionaries
Al
2016-11-16 02:14:28 -05:00
b03494a736[boundaries] adding admin_level=6 as cities in West Midlands (county), UK
Al
2016-11-16 01:53:03 -05:00
07f41a7565[boundaries] adding York as city in UK (listed as admin_level=6)
Al
2016-11-16 01:35:59 -05:00
c5a48b4cd3[fix] East Asian system po_box probabilities
Al
2016-11-16 01:26:31 -05:00
15b66f541c[fix] refactor to use ComponentDependencies class
Al
2016-11-15 17:07:10 -05:00
68ab69cdc3[fix] alias in formatter config
Al
2016-11-15 17:04:15 -05:00
dc65f518a5[openaddresses] adding new US counties from OpenAddresses
Al
2016-11-15 02:32:00 -05:00
67f409cdf6[places] adding dependencies to admin components, e.g. so that in some countries city_district must be accompanied by a city, etc.
Al
2016-11-15 02:31:15 -05:00
96fb725e54[formatting] adding po_box insertions for East Asian addresses
Al
2016-11-14 18:29:44 -05:00
653b2d09c0[addresses] moving component dependency graphs to a new module
Al
2016-11-14 16:45:15 -05:00
495b27470e[addresses] refactoring address component dependency graphs
Al
2016-11-12 18:09:36 -05:00
b42159205d[openaddresses] adding some of the new US counties from OA
Al
2016-11-10 10:27:11 -05:00
7c9c600e07[openaddresses] add new counties from upstream
Al
2016-11-06 00:06:59 -04:00
a6c88f54ab[openaddresses] add Forsyth county
Al
2016-11-02 23:59:50 -04:00
7cdccbe31f[openaddresses] adding fixed sources in ID
Al
2016-10-31 11:22:48 -04:00
353c6c7b7a[openaddresses] adding Jefferson County, AL
Al
2016-10-28 10:58:04 -04:00
e9106698d2[fix] convert newlines
Al
2016-10-27 12:01:48 -04:00
e48f207d10[openaddresses] updating with new OpenAddresses sources
Al
2016-10-27 11:19:30 -04:00
5cabd9b4f7[fix] country languages in OpenAddresses
Al
2016-10-24 17:35:39 -04:00
ac0eb1776e[openaddresses] adding Brazoria County, TX
Al
2016-10-24 09:27:05 -04:00
35d3d8cc73[openaddresses] countries are known a priori, so if the boundaries don't quite line up with OSM, use the country from the path
Al
2016-10-23 19:50:54 -04:00
f429bea15b[fix] subtract abs value
Al
2016-10-23 01:11:09 -04:00
1658c425c5[fix] clear country cache only at each new country, not each file
Al
2016-10-23 00:57:52 -04:00
7199ff17e0[fix] truncate postcodes that are longer than specified length
Al
2016-10-23 00:52:24 -04:00
3934111cdf[openaddresses] 5-digit postcodes in a few counties
Al
2016-10-23 00:51:43 -04:00
889e914dfc[openaddresses] clear all polygon caches
Al
2016-10-23 00:11:54 -04:00
ec54d3de35[fix] don't convert number to int/float in numeric_phrase (chops leading zeros)
Al
2016-10-22 23:49:58 -04:00
63edd53fb3[openaddresses] adding clear_cache method to clear the LRU cache for point-in-polygon indices and using it in OpenAddresses import since it heavily reuses polygons and only for the current file
Al
2016-10-22 20:28:59 -04:00
d51a1d6196[addresses] doing hyphenation for existing components in component expansion (i.e. OSM training data)
Al
2016-10-21 22:02:15 -04:00
0216a991c6[formatting] use US template insertions for Canada as well
Al
2016-10-21 14:43:40 -04:00
2a355b2cf8[openaddresses] adding address only 10% of the time in OpenAddresses
Al
2016-10-20 23:57:30 -04:00
dfbc4bf144[openaddresses] no add_osm_boundaries for two of the recent Washington editions, only reverse geocode to OSM when no city is given
Al
2016-10-20 22:46:29 -04:00
d965ea9371[openaddresses] adding hyphenation/dehyphenation to the OpenAddresses formatter
Al
2016-10-20 20:55:17 -04:00
00ebdfed7f[osm] adding alt_place_names to the shared formatting class AddressComponents and making them classmethods
Al
2016-10-20 20:41:22 -04:00
d9bc465c82[osm] parsing out semicolon-delimited postal codes from OSM in countries like Poland that use hyphen-delimited postcodes, without treating them as number ranges
Al
2016-10-19 17:46:37 -04:00
91e6ca0942[osm] adding a number of Australian city council boundaries
Al
2016-10-19 16:33:42 -04:00
cec8168279[osm] adding council and city council to ignorable place name suffixes
Al
2016-10-19 16:33:04 -04:00
ec77a247fa[fix] just ignore records without the "name" tag
Al
2016-10-19 13:36:15 -04:00
61078eded9[fix] checking for dictionary key
Al
2016-10-19 13:34:13 -04:00
c2b73307de[fix] parens
Al
2016-10-19 13:29:56 -04:00
f639151698[osm] checking for non-admin_center nodes which are part of a lower admin level polygon with the same name
Al
2016-10-19 13:27:33 -04:00
e380567ac4[osm] adding alt_place_names method which does hyphenation, de-hyphenation and abbreviated toponyms with/without hyphens
Al
2016-10-19 02:19:09 -04:00
51afc2619b[fix] only replace whitespace between words, not for instance whitespace around an existing hyphen, and reducing to one space for spaced hyphens
Al
2016-10-19 01:23:58 -04:00
78f341f4f1[osm] higher probability of hyphenation
Al
2016-10-19 01:10:06 -04:00
e8899eafd6[osm] adding hyphenation/de-hyphenation to OSM admin components
Al
2016-10-19 01:00:29 -04:00
98ac232eea[osm] hyphenating and de-hyphenating place names in places training data
Al
2016-10-19 00:33:10 -04:00
562caba31c[openaddresses] adding new counties in Washington
Al
2016-10-19 00:30:50 -04:00
72e7d3ff5b[addresses/hyphens] adding some methods to hyphenate/dehyphenate place names at random
Al
2016-10-18 19:10:31 -04:00
7e007a49ab[osm] removing place=district mapping globally (means city_district in Hungary) and mapping it specifically to state_district/city_district in the places where it's needed
Al
2016-10-18 19:02:36 -04:00
9384d8cc7e[osm] adding exception for Vienna
Al
2016-10-18 02:52:02 -04:00
d4f4b716a0[openaddresses] adding new counties in Oregon
Al
2016-10-18 00:44:45 -04:00
fc9ed13bc5[boundaries] adding Community Development Council and CDC as removable suffixes for Singapore
Al
2016-10-17 16:04:17 -04:00
d34faf42b8[osm] fix names with pipes in them
Al
2016-10-17 02:32:25 -04:00
a796b41d90[geonames] admin codes on geonames/postal_codes tables
Al
2016-10-17 00:21:33 -04:00
ff27ee14bb[osm] only add label props if the name property is identical (counterexample, Nottinghamshire's label is listed as West Bridgford, which is really its admin_center)
Al
2016-10-16 22:18:52 -04:00
de9e234929[osm] adding alternate civil parish description to the UK
Al
2016-10-16 22:04:17 -04:00
876f575040[geonames] adding 5 borough exceptions
Al
2016-10-16 21:31:20 -04:00
093e7ed120[fix] city districts in Košice, Slovakia
Al
2016-10-15 01:47:37 -04:00
049b3c9ce1[boundaries] city/wards for Dar es Salaam + admin_center
Al
2016-10-15 01:47:14 -04:00
c4848b113d[geonames] unindenting overrides in GeoNames configs
Al
2016-10-15 01:46:46 -04:00
c39cfec218[boundaries] Dar es Salaam=city, wards=city_district in Tanzania
Al
2016-10-15 01:40:00 -04:00
876fdd11fa[fix] country/language codes in formatting config
Al
2016-10-12 15:51:31 -04:00
9fb936019a[geoplanet] script to create GeoPlanet postal codes training data
Al
2016-10-12 15:05:45 -04:00
1e6a00c573[fix] place in UK that was parented by a postal_code
Al
2016-10-12 15:00:33 -04:00
1d25f08b52[expand] adding a function to check if two place names/addresses are equivalent after token normalization (replacing hyphens, deleting final periods, lowercasing, simple transliteration, etc.) and taking into account abbreviations from any specified libpostal dictionaries. In conjunction with place name affixes, useful in data sets like GeoPlanet or GeoNames to determine if a name variant is related to the original or not
Al
2016-10-12 14:55:59 -04:00
f8664b0deb[formatting] making regex-based tests during insert_component optional. If exact_order=True, insert the given component directly before/after the reference component; otherwise, for components that already exist in the template, only their relative position matters. Adding a method to determine whether template language is important for a particular country/language pair.
Al
2016-10-12 14:42:15 -04:00
3db6b7fbf1[dictionaries] adding new abbreviations for Sankt in German and Scandinavian languages
Al
2016-10-11 18:05:11 -04:00
2663b81670[address_formatting] caching parsed templates from pystache yields about a 2.5x speedup per call; this should shave off several hours of CPU time for large training sets
Al
2016-10-11 15:36:49 -04:00
2314acef1b[geoplanet] bypassing Québec as a county (just city and state)
Al
2016-10-11 02:33:27 -04:00
02fc172b5c[geoplanet] abbreviations for UK and NYC, fixing country codes for IM, GG and JE
Al
2016-10-11 02:11:26 -04:00
6ff1024c02[fix] null candidate languages
Al
2016-10-07 19:49:32 -04:00
30074524d8[fix] return empty list for languages in country_and_languages
Al
2016-10-07 18:57:22 -04:00
29698781cb[boundaries] making Kingston parish a city and only using the name Kingston, so the parser doesn't have to disambiguate between references to the parish and the city, both of which are called Kingston
Al
2016-10-07 18:52:42 -04:00
ff7fec6ed1[osm/polygons] need to include id/type in polygon properties now that they're getting added earlier in the pipeline
Al
2016-10-07 01:21:02 -04:00
169a3c3d70[osm] drop postcode as well for address-only format
Al
2016-10-07 01:10:16 -04:00