Commit Graph

  • f218e43adc [names] remove short_name from Sao Paulo Al 2017-01-12 03:27:50 -05:00
  • 122d7b2b79 [fix] only using the revised address components for CLDR country name Al 2017-01-12 02:33:16 -05:00
  • 88a80f4e30 [fix] using normalized tags throughout in OSM formatted place data Al 2017-01-12 02:25:17 -05:00
  • eff46bd55a [fix] Ukraine country code Al 2017-01-12 02:12:46 -05:00
  • 01ef0a42dc [places] adding state_district more often in UK for population < 10000 Al 2017-01-12 01:34:21 -05:00
  • aa142d0311 [openaddresses] adding Brasil census data sets for 26 states, currently just city and postcode for Distrito Federal while another source is being investigated Al 2017-01-12 00:44:22 -05:00
  • 1f6eac5dae [boundaries] use neighbourhood with the u Al 2017-01-12 00:43:13 -05:00
  • 09b3aeb7d9 [fix] component Al 2017-01-11 16:50:54 -05:00
  • ed5dd28023 [addresses] adding some more synonyms to Brasilia street regex Al 2017-01-11 16:31:30 -05:00
  • bec569adaa [osm] adding new validity check to venue names so if the Jaccard(name tokens, street & house number tokens) == 1 and the address does not have a known venue type e.g. a restaurant, the "venue name" is actually just the street address and can be discarded Al 2017-01-11 16:23:42 -05:00
  • 7f851810d2 [addresses] formatting addresses in Brasilia, so e.g. "Bloco B" is never part of the street name or building name, it's the house number. place=neighbourhood maps to nothing in Brasilia as these are basically subdivisions whose streets are identically named Al 2017-01-11 16:17:35 -05:00
  • 0d030a98c5 [osm] adding airport polygon index Al 2017-01-11 04:25:54 -05:00
  • d528095984 [addresses] adding random unit numbers with more digits Al 2017-01-11 04:24:29 -05:00
  • 979fd16215 [osm] adding airports and terminals data sets with points and polygons, more file cleanup in OSM fetch script Al 2017-01-10 16:20:28 -05:00
  • 4bdfe5ba1d [openaddresses] add Habersham County, GA Al 2017-01-10 16:19:31 -05:00
  • 49fdb29e16 [openaddresses] add Swedish municipalities of Malmö, Vaxholm, Växjö, and Helsingborg Al 2017-01-10 07:48:08 -05:00
  • 577f26e418 Merge pull request #154 from openvenues/setup_datadir_functions Al Barrentine 2017-01-09 16:52:07 -05:00
  • bbc91722cb [version] bump version to 0.3.3 Al 2017-01-09 16:14:07 -05:00
  • a3506131fe [build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime Al 2017-01-09 16:11:26 -05:00
  • 953a26e54e [utils] char_array_add_vjoined to stay consistent (add_* methods NUL terminate) Al 2017-01-09 14:42:36 -05:00
  • 7a8f94330b [parser] only adding ngrams in a hyphenated word if the subword is not rare Al 2017-01-09 02:53:33 -05:00
  • 00cf936460 [openaddresses] adding Nordrhein-Westfalen, Germany Al 2017-01-08 12:48:45 -05:00
  • 86c7b7f3fe [addresses] no longer normalizing slashes in boundary names for places that have multilingual names, etc. Al 2017-01-08 12:41:45 -05:00
  • a6d94f998b [addresses] stripping parentheticals in admin boundary names as sometimes cities in e.g. Switzerland are like Oberwil (ZG) in OSM Al 2017-01-08 03:43:22 -05:00
  • e10c156176 [dictionaries] adding BL as an abbreviation for Boulevard Al 2017-01-07 20:22:03 -05:00
  • 828b67d4f7 [osm] adding some new training data for simple road names and their surrounding admin boundaries Al 2017-01-07 15:34:43 -05:00
  • a2b84a0177 [docs][ci skip] Adding parser label definitions to the README Al Barrentine 2017-01-07 14:17:31 -05:00
  • 83e38d9a8c [openaddresses] add OSM boundaries for Milwaukee county as many of the cities appear to be IDs Al 2017-01-07 01:42:46 -05:00
  • eab629802c [openaddresses] removing pre_release_downloads as they're all in master now, adding city_replacements for all data sets where OSM boundaries are used Al 2017-01-07 01:39:11 -05:00
  • 69f1137532 [openaddresses] adding city_replacements for Lake County, FL Al 2017-01-07 00:35:12 -05:00
  • c025b0f7d4 [openaddresses] adding correct state for Glarus, Switzerland, ignoring city in Milwaukee if it's purely numeric Al 2017-01-07 00:01:46 -05:00
  • d51f9dbb0e [addresses] stripping unit phrases from streets in OpenAddresses as well, return value wasn't getting used before Al 2017-01-06 10:19:01 -05:00
  • cfdef1788c [addresses] stripping unit from street using the libpostal dictionaries in all the address data sets. Happens surprisingly often in OpenStreetMap as well as OpenAddresses Al 2017-01-06 09:19:01 -05:00
  • 3fbd4426b7 [openaddresses] adding Swiss cantons of Grigioni/Graubünden, Glarus, Uri, and Schwyz Al 2017-01-06 08:55:32 -05:00
  • 9c14d47f24 [openaddresses] adding Campbell and Pendleton County KY and San Benito County, CA Al 2017-01-06 02:41:29 -05:00
  • 2b3a6f663e Merge pull request #152 from rinigus/master_rpc_malloc Al Barrentine 2017-01-05 17:12:51 -05:00
  • 321f2034d2 [fix] unidata file Al 2017-01-05 04:24:33 -05:00
  • 7a31802a04 [fix] also fix german-ascii transliteration on uppercase U with umlaut Al 2017-01-05 04:07:29 -05:00
  • 25723fcea2 [transliteration] making the custom rules in transliteration less repetitious and accessible from elsewhere, removing string names for common transliterators and using constants Al 2017-01-05 04:06:51 -05:00
  • 3fcaae3dbc [openaddresses] add Canton of Solothurn, Switzerland Al 2017-01-05 02:23:20 -05:00
  • 4182123fa6 [openaddresses] adding Schaffhausen, also adding language=de for the last few cantons Al 2017-01-05 01:40:30 -05:00
  • 72e6bf043b [openaddresses] add Basel-Stadt, Switzerland Al 2017-01-05 01:26:20 -05:00
  • 3d16c20d24 [openaddresses] add Boyd County, KY Al 2017-01-05 01:25:41 -05:00
  • 26aeb0ebec drop AC_FUNC_MALLOC and _REALLOC and check for them as regular functions; add extra cflags for scanner Rinigus 2017-01-05 07:34:24 +02:00
  • c5cca4c82f [openaddresses] add Canton of Basel-Landschaft, Switzerland Al 2017-01-04 02:34:15 -05:00
  • 3e7042597e [openaddresses] adding Jamaica countrywide to OpenAddresses config Al 2017-01-04 02:32:41 -05:00
  • bcd61ffbe8 [formatting] moving postcode to the beginning of the address only in countries using the continental European conventions. Creates more ambiguity than is worthwhile in the US, etc. when, say, house_number is removed from a training example and the postcode is inserted first (could very easily be a house_number) Al 2017-01-03 03:32:15 -05:00
  • 38e147d210 [fix] address configs for Greek/Hebrew Al 2017-01-03 03:07:53 -05:00
  • de2dffa315 [addresses] adding Calle to purely numeric Spanish street names in OSM as well Al 2017-01-02 23:41:01 -05:00
  • ccd555d020 [transliteration] regenerated transliteration_scripts_data.c Al 2017-01-02 13:52:48 -05:00
  • 600b40d2f6 [transliteration] adding german-ascii transliteration to Estonian to handle umlauts (ä => ae, etc.) Al 2017-01-02 13:51:56 -05:00
  • b2b7f6f155 [osm] add wikipedia:* to rail station exception Al 2017-01-02 13:13:42 -05:00
  • a99a1e759e [openaddresses] adding Rio de Janeiro, Stockholm, and Liechtenstein. Adding higher CLDR country probability for smaller countries Al 2017-01-02 03:29:36 -05:00
  • 77035fbdbd [strings] adding utf8_is_whitespace to the header so it can be referenced from multiple files Al 2017-01-02 02:23:21 -05:00
  • 400ea589ef [normalize] add NORMALIZE_STRING_SIMPLE_LATIN_ASCII option to pynormalize Al 2017-01-02 02:08:54 -05:00
  • 182976214c [logging] converting most of the steps in building the transliteration table to use debug logging Al 2017-01-02 00:41:11 -05:00
  • d8d3840700 [transliteration] constant for the html-escape transliterator Al 2017-01-02 00:40:12 -05:00
  • 4ad3a52fe1 [strings] fix lowercasing in string_utils.c Al 2017-01-01 20:08:32 -05:00
  • a78937f265 [normalize] use the new utf8proc lowercasing (as opposed to case folding), free copies since none of the string functions operate in-place any more, add minimal HTML escaping transliterator even to ASCII text Al 2017-01-01 20:06:32 -05:00
  • 5c56a44faa [strings] reverting to utf8proc v1.3.1, as 2.0 and above can chop off certain sequences Al 2017-01-01 20:03:23 -05:00
  • fe88630f78 [dictionaries] regenerating address_expansion_data.c from upstream changes Al 2017-01-01 14:26:54 -05:00
  • 101bbcc02d Merge remote-tracking branch 'origin/master' into parser-data Al 2017-01-01 14:25:37 -05:00
  • d61e90a33d [auto][ci skip] Adding data files from Travis build #188 Travis 2017-01-01 19:20:54 +00:00
  • 6048d6a71e Merge pull request #149 from iestynpryce/master Al Barrentine 2017-01-01 14:11:16 -05:00
  • 0b5cc96654 [transliteration] add decompose option when stripping accents Al 2017-01-01 13:54:17 -05:00
  • 7d6c85aeec [fix] new string tree iterator, don't decrement permutations on rollovers Al 2017-01-01 13:34:08 -05:00
  • 1780c5e053 [fix] moving enum Al 2016-12-31 03:54:12 -05:00
  • d8ee43156e Enhanced the Welsh (cy) language dictionaries. Iestyn Pryce 2016-12-31 09:46:58 +00:00
  • 475aa3dbfa [strings] fixing and simplifying string tree iterator. This version is inspired by Python's itertools.product (itertoolsmodule.c has so many goodies) Al 2016-12-31 03:22:17 -05:00
  • 261ec3888a [strings] header changes for new utf8 lower/upper functions Al 2016-12-31 03:20:43 -05:00
  • 58b063b632 [strings] making string_tree_iterator_done more meaningful (returns true if the iterator has no paths left to traverse) Al 2016-12-31 00:54:36 -05:00
  • 8978000320 [strings] adding latest utf8proc, new functions for utf8_lower (instead of case folding) and utf8_upper, and a utf8_is_whitespace that takes things like tabs into account Al 2016-12-31 00:52:12 -05:00
  • db16e656ca [parser/cli] adding .print_features option in address_parser client for debugging Al 2016-12-31 00:20:35 -05:00
  • bdb51a244e [phrases] fix case in trie search when searching for tokens in a string tail. If we're on the last token in a sequence and the token matches the tail, check that the tail is complete, and if so return the match before exiting the loop. Affects multiword phrases that tend to appear toward the end of a sequence (long country names like "United States of America", etc.) Al 2016-12-29 16:15:33 -05:00
  • 2d077699e6 [places] adding is_in property to the set of tags for the places index. This may allow us to make more granular exceptions for node-based places that are actually suburbs but classified as {hamlet, village, locality, town}, etc. if the is_in contains a city that's also a boundary or nearby point Al 2016-12-29 14:04:13 -05:00
  • cad57b94b2 [boundaries] mapping place=hamlet to suburb for all of Malaysia. place=village becomes suburb as well in the urban core Al 2016-12-29 14:01:57 -05:00
  • 21a2a7419a [addresses] only add village as city component if no city can be found in the area Al 2016-12-29 13:41:05 -05:00
  • 8080e16791 [openaddresses] adding Joinville, Brasil and adding OSM boundaries for Brasilian address data sets Al 2016-12-29 13:27:49 -05:00
  • 0b6947840c [dictionaries] removing Belarusian place_names.txt Al 2016-12-29 03:24:57 -05:00
  • 05732f6718 [build] Makefile changes for new parser feature extraction Al 2016-12-29 02:39:13 -05:00
  • 091167ed3c [api] remove geodb from libpostal.c Al 2016-12-29 02:35:43 -05:00
  • acd953ce51 [parser] first pass at new parser feature extraction Al 2016-12-29 02:17:05 -05:00
  • e62101b8bf [parser] remove geodb from address_parser_test, sort confusion matrix Al 2016-12-29 02:14:40 -05:00
  • 174529e8d0 [parser] remove geodb and fix small memory leak in address_parser_train Al 2016-12-29 02:12:06 -05:00
  • bde5fdfaad [merge] merging in master Al 2016-12-29 02:00:31 -05:00
  • 646d96e13e Merge remote-tracking branch 'origin/master' into parser-data Al 2016-12-29 01:58:38 -05:00
  • a26a01ece3 [openaddresses] adding SEMCOG counties, MI Al 2016-12-28 19:37:44 -05:00
  • 22b4a215f4 [places] additional form for West Indies Al 2016-12-28 17:58:32 -05:00
  • f58ebbdf7f [fix] var name Al 2016-12-28 14:37:00 -05:00
  • 7ee44a584b [fix] genitive case for Russian/Ukrainian toponyms, not locative (#125) Al 2016-12-28 14:34:20 -05:00
  • e6e4b28e43 [addresses] making the город/г. prefix apply to the Russian language rather than the country Al 2016-12-28 13:26:19 -05:00
  • f995fdf9d2 [fix] default None Al 2016-12-28 05:09:15 -05:00
  • 3dc6a69bf5 [openaddresses] adding locative names in OpenAddresses as well, which contains some Ukraine data sets Al 2016-12-28 04:59:55 -05:00
  • 91013fe296 [fix] moving checks inside the add_locatives function, fixing float cast Al 2016-12-28 04:59:27 -05:00
  • 6f009fb8a6 [addresses] adding pymorphy2 for converting Russian and Ukrainian place names (sticking with state and state_district for the moment) to the locative case as mentioned in #125 Al 2016-12-28 04:48:32 -05:00
  • e91907a21b [boundaries] actually, the urban okrugs/districts seem to function more like neighborhoods in St Petersburg and Moscow, calling the raions city_district and the okrugs suburb Al 2016-12-28 01:36:11 -05:00
  • 6c35eb9e65 [auto][ci skip] Adding data files from Travis build #186 Travis 2016-12-28 06:29:35 +00:00
  • a86d6d5528 [merge] merging in master Al 2016-12-28 01:11:04 -05:00
  • 47c3b0091b Merge pull request #147 from Komzpa/patch-1 Al Barrentine 2016-12-28 01:08:48 -05:00
  • e23951a90f [dictionaries] new Ukrainian place names dictionary from http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases/UK Al 2016-12-28 01:08:01 -05:00
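
Note on commit bec569adaa: the venue-name validity check it describes can be sketched roughly as below. This is only an illustration of the idea stated in the commit message, not the repository's actual code. The function names, the whitespace tokenization, the lowercasing, and the has_known_venue_type flag are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty token sets are treated as not similar.
        return 0.0
    return len(a & b) / len(a | b)


def venue_name_is_street_address(name, house_number, street,
                                 has_known_venue_type=False):
    """Return True if the "venue name" is just the street address.

    Hypothetical helper: tokens are lowercased and split on whitespace,
    which is a simplification of whatever tokenizer the project uses.
    """
    if has_known_venue_type:
        # A known venue type (e.g. a restaurant) keeps its name.
        return False
    name_tokens = name.lower().split()
    address_tokens = "{} {}".format(house_number, street).lower().split()
    # Similarity of exactly 1 means the two token sets are identical.
    return jaccard(name_tokens, address_tokens) == 1.0


if __name__ == "__main__":
    print(venue_name_is_street_address("123 Main St", "123", "Main St"))
    print(venue_name_is_street_address("Joe's Pizza", "123", "Main St"))
```

Comparing token sets rather than raw strings means a name such as "Main St 123" would also be flagged, since token order does not affect the Jaccard score.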