Commit Graph

  • 6388a79bf0 [addresses] strip "-", etc. in addr:housenumber Al 2016-12-21 01:53:23 -05:00
  • c33db4f04d [addresses] normalize existing sub-building components Al 2016-12-21 01:28:43 -05:00
  • 3b14613f1d [fix] restore original house number for subsequent formatting after addr:conscriptionnumber/addr:streetnumber Al 2016-12-21 00:51:44 -05:00
  • 484c7ef912 [osm] adding addresses with addr:conscriptionnumber and addr:streetnumber when available Al 2016-12-21 00:36:40 -05:00
  • eafafab959 [addresses] adding function to generate phrases for addr:conscriptionnumber in OSM, e.g. č.p. 123 in the Czech Republic Al 2016-12-21 00:35:39 -05:00
  • 63006a0c8b [dictionaries] adding súpisné číslo (s.č.) in Slovak Al 2016-12-20 21:38:32 -05:00
  • 010db088ae [dictionaries] adding Konskriptionsnummer for some addresses in Austria and Germany Al 2016-12-20 21:37:42 -05:00
  • f7aebdc2ed [dictionaries] adding číslo popisné (č.p.) in Czech Al 2016-12-20 21:36:42 -05:00
  • cc4098fb05 [openaddresses] abbreviate states as well in OpenAddresses when full version is specified Al 2016-12-20 17:24:06 -05:00
  • 1cba89a99b [addresses] higher state abbreviation probability for places that use abbreviations Al 2016-12-20 16:53:59 -05:00
  • 8845609962 [openaddresses] same for the rest of the multiword abbreviated states (except bilingual multiword provinces in Canada where we'll stick to the most common abbreviated form, which gets expanded to the unabbreviated province) Al 2016-12-20 16:53:27 -05:00
  • c3db5eb1e0 [openaddresses] add full state name for Distrito Federal so all the abbreviations get considered Al 2016-12-20 16:46:14 -05:00
  • 21202869b8 [openaddresses] adding Grafschaft Bentheim, Germany and Tirol, Austria Al 2016-12-20 12:33:21 -05:00
  • cd25ca1537 [names] replace name affixes with both country/language and language-only variants Al 2016-12-20 03:10:13 -05:00
  • 9e44fcb2bb [addresses] abbreviating neighborhoods/city_districts Al 2016-12-20 03:01:34 -05:00
  • 53723bbf3d [fix] passing argument through to normalized_place_name Al 2016-12-20 02:21:36 -05:00
  • 7ff290e14c [openaddresses] adding Gatineau QC, Owensboro KY, Madison KY, St Clair MI, and Grand Forks ND Al 2016-12-20 02:17:59 -05:00
  • 2ab584ac0b [states] adding more multiword state abbreviations Al 2016-12-20 02:16:42 -05:00
  • 11444ffa34 [places] adding higher probability of city_district in Mexico (for boroughs of Mexico City) Al 2016-12-20 01:43:21 -05:00
  • 6d02fbb9b8 [addresses] switch for phrases that come from components so they only get stripped if they contain another phrase a la Washington, D.C. Consolidating always_use_full_names and random_key options Al 2016-12-20 01:42:40 -05:00
  • e35636ed77 [boundaries] higher probability for city_district in the UK (London) Al 2016-12-19 02:34:06 -05:00
  • 56ca37d1f3 [fix] openaddresses config reading Al 2016-12-19 02:18:24 -05:00
  • f2720db2f8 [osm] adding simple street name normalization for certain streets in OSM that also contain the house number (only when separated by commas and in a country/language where house number comes after street). There are other cases for normalization but need to better define them. Al 2016-12-19 02:13:39 -05:00
  • ff32321425 [formatter] adding house_number_before_road method to AddressFormatter Al 2016-12-19 02:00:06 -05:00
  • f35fd97735 [boundaries] add abbreviated state names to valid component names Al 2016-12-19 00:51:05 -05:00
  • c3dfd6530f [openaddresses] adding Skagit County, WA, USA Al 2016-12-19 00:15:06 -05:00
  • d02a18a5a8 [fix] all_names, use values instead of name keys Al 2016-12-18 17:29:15 -05:00
  • e9c7bc43e3 [fix] check fixed list of keys in all_names as well Al 2016-12-18 17:26:43 -05:00
  • 2727572822 [addresses] using the name key distribution in AddressComponents.all_names. Returning names and valid components from the new function instead of the full gazetteer (can be built later) Al 2016-12-18 17:22:13 -05:00
  • 954b6548bf [names] adding name_key_dist method to boundary names to account for certain boundaries like e.g. Kings County that have name exceptions Al 2016-12-18 17:20:03 -05:00
  • d308473686 [addresses] separating boundary phrase gazetteer construction into its own method Al 2016-12-18 15:47:16 -05:00
  • 585b203a4f [fix] /props/attrs/ Al 2016-12-18 15:32:09 -05:00
  • 82b26117aa [fix] name comparison in neighborhoods index Al 2016-12-18 15:27:21 -05:00
  • 3ac2c93e1c [utils] renaming char_array_append_vjoined to char_array_add_vjoined to follow the convention that add_* calls NUL-terminate while append_* calls do not Al 2016-12-18 15:26:58 -05:00
  • 8322e98ad3 [fix] var name II Al 2016-12-18 11:42:16 -05:00
  • 0c55bc3bb8 [fix] var name Al 2016-12-18 11:41:00 -05:00
  • e5657c5612 [fix] putting the neighborhoods check after the dupe threshold check, as it's not really needed until then anyway Al 2016-12-18 03:00:40 -05:00
  • 4314a6822d [fix] don't need to do two checks for OSM boundaries Al 2016-12-18 02:32:05 -05:00
  • 590246748f [fix] move OSM check to after ClickThatHood/Quattroshapes checks as we don't need to check the point if it doesn't match a neighborhood geometry. Should speed up neighborhood index construction Al 2016-12-18 02:27:50 -05:00
  • 0a1e69ee9b [fix] yaml config Al 2016-12-18 01:52:38 -05:00
  • 86a8315b9d [openaddresses] adding new config option to OA config for aliasing fields based on a regex Al 2016-12-18 01:50:58 -05:00
  • d357f0f37c [neighborhoods] check polygon boundaries in OSM neighborhood points for a name match at the city level or below Al 2016-12-18 01:42:34 -05:00
  • a2cf1a35df [openaddresses] aliasing Paris/Marseilles/Lyon arrondissements to city_district in OpenAddresses Al 2016-12-18 01:28:58 -05:00
  • fc57c437cb [boundaries] adding exceptions for Arrondissements in Paris, Marseilles and Lyon Al 2016-12-18 01:19:55 -05:00
  • 154a227285 [openaddresses] 5-digit postcodes for Spain, some are stored as integers stripping the initial zeros Al 2016-12-17 17:40:49 -05:00
  • 726ee2a299 [openaddresses] fixing state abbreviations for Mexico Al 2016-12-17 02:54:42 -05:00
  • 3ed95a175e [ngrams] adding function to extract an array of ngrams from a string, with optional special prefixes/suffixes for the edges Al 2016-12-17 01:33:18 -05:00
  • 3c6ed7489c [openaddresses] adding regex replacement to remove "*" from any field Al 2016-12-16 17:09:41 -05:00
  • f1a460b874 [openaddresses] adding state abbreviations for OA Switzerland Al 2016-12-16 15:56:42 -05:00
  • 10d4979f21 [states] adding Canton abbreviations for Switzerland Al 2016-12-16 15:54:08 -05:00
  • e99d76e750 [places] higher probability of adding Canton (state) for smaller cities in Switzerland Al 2016-12-16 15:53:42 -05:00
  • 05adbaca01 [places] add state_district (province) and state (region) in Italy more often Al 2016-12-16 14:49:15 -05:00
  • ba96f68b62 [fix] openaddresses formatter Al 2016-12-16 14:22:15 -05:00
  • d08e8d8dd3 [openaddresses] adding a value map for Italian province abbreviations in the countrywide file (they're commonly used in addresses and this may be a better place to handle that since the province names are given). Updating OpenAddresses config to use new dictionary field maps. Al 2016-12-16 06:57:05 -05:00
  • da3240d5f6 [openaddresses] making field maps in OpenAddresses config a dictionary rather than a list to make inheritance easier Al 2016-12-16 06:54:36 -05:00
  • 83aab5a46a [openaddresses] adding option to map values for a particular field Al 2016-12-16 06:44:19 -05:00
  • ae32645e0d [openaddresses] add city and state to Mexico City Al 2016-12-14 20:49:40 -05:00
  • 558cd2af2d [boundaries] adding a few more US non-city_districts as exceptions. Al 2016-12-14 17:53:10 -05:00
  • 846b88cde5 [addresses] let the place config take care of adding/removing neighborhoods rather than doing it as part of the add_neighborhoods method Al 2016-12-14 03:15:07 -05:00
  • 5946ead37f [addresses] using the defined component from the neighborhoods index for city_district (they're fairly rare, just NYC boroughs basically) Al 2016-12-14 03:10:02 -05:00
  • 026737cd3b [neighborhoods] adding component to neighborhoods index at construction time Al 2016-12-14 03:07:13 -05:00
  • 5846943b70 [addresses] removing place_type override requirement from the neighborhoods index (NYC boroughs, etc.) Al 2016-12-14 02:16:57 -05:00
  • 09f808ca47 [geoplanet] only add short postal codes to GeoPlanet data set if they match the Google regexes Al 2016-12-13 17:03:26 -05:00
  • 34db27b80c [openaddresses] Mendocino County, CA Al 2016-12-13 16:44:22 -05:00
  • 6b04711195 [neighborhoods] adjust cache size when building neighborhoods index Al 2016-12-13 16:11:42 -05:00
  • 40cd86c3be [addresses] only add city replacement if a city is not found first Al 2016-12-13 16:10:52 -05:00
  • 7e65661884 [openaddresses] Pierce County, WA Al 2016-12-13 14:03:16 -05:00
  • cd91068f0f [neighborhoods] fix neighborhoods index checks to include the borough points while still not letting something like Santa Monica pass as a neighborhood when it's a proper city Al 2016-12-13 02:28:59 -05:00
  • cb475d8245 [openaddresses] adding Sunshine Coast, BC and Sardegna, Italy Al 2016-12-12 17:42:43 -05:00
  • bcf6b3cc68 Merge pull request #137 from openvenues/fix_address_parser_train Al Barrentine 2016-12-12 11:54:16 -05:00
  • 8f1e69960f [fix] loading transliteration module in address_parser_test.c as well Al 2016-05-25 19:54:01 -04:00
  • 3939dd0ca6 [fix] cstring_array_split calls Al 2016-05-25 17:58:30 -04:00
  • a42d0e917a [fix] brace Al 2016-05-25 17:52:00 -04:00
  • ced8f9ae27 [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in Vulgar fractions e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent Al 2016-05-25 17:50:29 -04:00
  • b1816e9b70 [utils] Adding cstring_array_split_ignore_consecutive Al 2016-05-25 17:07:20 -04:00
  • 6baa7087fe [fix] calls and NULL checks Al 2016-05-25 15:50:53 -04:00
  • 5e07f5e8c5 [fix] tokenized_string_t should copy its source string Al 2016-05-25 15:47:57 -04:00
  • 521a094a47 [fix] Need to load transliteration module for Latin-ASCII normalization Al 2016-05-25 15:25:34 -04:00
  • d158751d92 [addresses] same rules for state_district apply to state, no alt_names etc. unless a city is present Al 2016-12-12 05:31:32 -05:00
  • bf3e9749ca [osm] during place formatting, add point-based cities for any places/polygons that are smaller than cities e.g. suburb or city_district, use admin_center as the point for reverse geocoding if available (instead of representative_point() which can be expensive or centroid which can be inaccurate) Al 2016-12-12 05:29:33 -05:00
  • 33dd9223dc [places] allowing state_district to depend on state in the US Al 2016-12-11 17:04:24 -05:00
  • 5d98f3115c [boundaries] adding two exceptions for admin_level=9 in US Al 2016-12-11 16:58:16 -05:00
  • da4fe37fb4 [addresses] option to add city points, no random keys for state_district if city or replacement is not present Al 2016-12-11 15:20:20 -05:00
  • dfc88a47b2 [fix] typo Al 2016-12-11 02:46:03 -05:00
  • e8abf44c16 [neighborhoods] check if there's no defined place-type before classifying a polygon as city_district Al 2016-12-11 02:44:02 -05:00
  • 01d6bc27b6 [fix] "District of" is only a valid prefix in the non-US Anglophone world Al 2016-12-11 02:11:51 -05:00
  • 9b95601e42 [states] adding abbreviations with internal periods for multi-word US states Al 2016-12-11 01:17:27 -05:00
  • fffc81a17a [fix] default value Al 2016-12-10 18:14:25 -05:00
  • 371198da3c [fix] typo Al 2016-12-10 18:14:11 -05:00
  • 91982528c6 [fix] normalize place names after adding admin boundaries as well Al 2016-12-10 18:07:41 -05:00
  • 34d3ae7e9e [addresses] fixing normalized_place_name so it deals with things like Washington DC where Washington DC may actually be one of the OSM names Al 2016-12-10 17:52:38 -05:00
  • 80ee34cc3a [text] adding normalization with whitespace Al 2016-12-10 17:50:53 -05:00
  • 4550f00f03 [fix] var name Al 2016-12-10 15:18:09 -05:00
  • 72771741c3 [fix] order Al 2016-12-10 15:16:35 -05:00
  • 8595d8da05 [addresses] don't add components to the trie that have the same normalized name as the given component Al 2016-12-10 15:12:40 -05:00
  • bb12d0940e [fix] options/docs in osm address training Al 2016-12-10 13:45:37 -05:00
  • ffc584f679 [states] adding all forms of the state abbreviation to the trie when doing place name normalization to handle the D.C./DC case Al 2016-12-10 13:45:22 -05:00
  • 5098599ed6 [addresses] remove Quattroshapes/GeoNames cities as they may have problematic names, and in any case we have point-based cities from OSM now Al 2016-12-10 02:08:33 -05:00
  • 18c5fd0855 [fix] check for non-None city Al 2016-12-10 01:23:06 -05:00
  • dc022f8652 [osm] adding normalized_place_name to Quattroshapes city Al 2016-12-10 01:17:38 -05:00
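
Commit 3ed95a175e in the log above describes a function that extracts an array of n-grams from a string, with optional special prefixes/suffixes for the edges. The log does not show the implementation, so the following is only a minimal sketch of one plausible reading of that message. The function name char_ngrams, the parameters start_marker and end_marker, and the choice to mark only the first and last n-gram are all assumptions made for illustration, not the repository's actual API.

```python
def char_ngrams(text, n, start_marker="", end_marker=""):
    """Return the character n-grams of `text`, in order.

    Hypothetical helper (not taken from the repository): `start_marker`
    is prepended to the n-gram that begins at the start of `text`, and
    `end_marker` is appended to the n-gram that ends at the end of
    `text`, so that edge n-grams can be told apart from interior ones.
    """
    if n <= 0:
        raise ValueError("n must be positive")

    last = len(text) - n  # start index of the final n-gram
    grams = []
    # range() is empty when the string is shorter than n, giving [].
    for i in range(last + 1):
        gram = text[i:i + n]
        if i == 0:
            gram = start_marker + gram
        if i == last:
            gram = gram + end_marker
        grams.append(gram)
    return grams


if __name__ == "__main__":
    print(char_ngrams("road", 2, start_marker="^", end_marker="$"))
```

In this sketch a string exactly n characters long yields a single n-gram carrying both markers, and a string shorter than n yields an empty list. Whether the original function behaves the same way in those cases is not stated in the commit message.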