Commit Graph

4104 Commits

Author SHA1 Message Date
Al
763c86dcd4 [geoplanet] add County to the names of US counties outside of Louisiana and Alaska, add Parish in Louisiana 2016-12-08 02:30:37 -05:00
Al
7d0c402a31 [openaddresses] adding Douglas County and Paulding County in GA. Jackson County and Rankin County in MS 2016-12-08 02:26:39 -05:00
Al
c2c2822936 [openaddresses] adding today's changes from OpenAddresses 2016-12-07 17:51:24 -05:00
Al
55c2f18896 [dictionaries] adding US highway and US route expansions 2016-12-07 14:39:27 -05:00
Al
42861aa38c [names] adding New Zealand to places that normalize City as a suffix (not Australia though as it has some cities that actually do end in City) 2016-12-07 06:19:08 -05:00
Al
7436d9693a [names] adding new name_affixes call to replace both prefixes/suffixes in one call, using in GeoPlanet training and the generic AddressComponents normalizations 2016-12-07 05:49:16 -05:00
Al
9386a999f6 [names] adding country-specific affixes and only normalizing the word City as a suffix in UK/Ireland 2016-12-07 05:37:25 -05:00
Al
a9209fae37 [openaddresses] adding Kenton County, KY 2016-12-06 23:04:21 -05:00
Al
b69914ff18 [openaddresses] adding Kansas City, MO 2016-12-06 22:56:31 -05:00
Al
3ff472c8cf [openaddresses] fixing house numbers with multiple consecutive hyphens 2016-12-06 22:50:14 -05:00
Al
ae527ef5b1 [fix] indentation 2016-12-06 19:03:13 -05:00
Al
78615bf29c [places] higher probability of state_district for non-city Ireland 2016-12-06 18:15:38 -05:00
Al
fddf21d1c1 [boundaries] moving Ireland counties back to state_district, regions to state (as they're typically used as admin1 in ISO, etc.) 2016-12-06 17:05:29 -05:00
Al
aae8a8acf0 [boundaries] adding a few more common prefixes (looks like in Ireland it's common enough to remove the County prefix) 2016-12-06 17:04:09 -05:00
Al
fadf0ca66b [openaddresses] filename for Ward County, ND 2016-12-06 15:55:33 -05:00
Al
29590be406 [openaddresses] adding Kalmar, Sweden and Fribourg, Switzerland 2016-12-06 15:51:10 -05:00
Al
e13787a6f6 [fix] var name again 2016-12-05 18:49:23 -05:00
Al
e1c6eff5e2 [fix] var 2016-12-05 18:46:49 -05:00
Al
da36b71829 [addresses] adding new places index in OSM and OpenAddresses training data 2016-12-05 18:36:17 -05:00
Al
628fecea59 [addresses] adding point-based city/equivalent reverse geocoding for places that don't have as many defined polygons in OSM 2016-12-05 18:30:46 -05:00
Al
8509fe3ac0 [dictionaries] English dictionary fix 2016-12-05 18:24:27 -05:00
Al
f87f0df717 [places] adding generic place index for reverse geocoding to points 2016-12-05 02:05:54 -05:00
Al
e32c232c67 [localities] /planet-neighborhoods/planet-localities/ 2016-12-04 23:05:11 -05:00
Al
cca80b046c [abbreviation] fixing abbreviations within hyphenated phrases, particularly for prefix/suffix matches 2016-12-03 17:55:11 -05:00
Al
22c4e99ea0 [parser] As part of reading/tokenizing the address parser data set,
several copies of the same training example will be generated.

1. with only lowercasing
2. with simple Latin-ASCII normalization (no umlauts, only things that
are common to all languages)
3. basic UTF-8 normalizations (accent stripping)
4. language-specific Latin-ASCII transliteration (e.g. ü => ue in German)

This applies both during the initial passes that build the phrase
gazetteers and during each iteration of training. That way, only the
most basic normalizations, such as lowercasing, need to be done at
runtime.

May have a small effect on randomization as examples are created in a
deterministic order. However, this should not lead to cycles since the
base examples are shuffled, thus still satisfying the random permutation
requirement of an online/stochastic learning algorithm.
2016-12-02 13:09:03 -05:00
Al
adab232674 [osm] don't include rail stations with no venue phrases (if there's a railway station at Foo, only include it if it's named "Foo Station", not just plain "Foo") 2016-12-01 02:03:38 -05:00
Al
7bfee0c1d3 [openaddresses] adding some of the new/fixed counties from upstream OA 2016-12-01 01:42:16 -05:00
Al
87d2463c9f [openaddresses] adding Texas statewide 2016-11-30 21:42:04 -08:00
Al
4b35da629f [numex] regenerated numex data file 2016-11-30 15:58:55 -08:00
Al
4677874610 [parser] stripping postal codes of phrases like CP (in Spanish) before adding them to the gazetteers, whether it's concatenated or a separate token. Adding a command-line argument for the number of iterations 2016-11-30 15:58:03 -08:00
Al
0e29cdd9fd [parser] fixing some uninitialized value issues during parser training 2016-11-30 15:42:09 -08:00
Al
f5a6bd0f36 [fix] sparse_matrix_new_from_matrix uses new matrix types 2016-11-30 10:15:12 -08:00
Al
b639fa5127 [utils] string_replace also creates a copy 2016-11-30 10:09:33 -08:00
Al
5a7e73e2a1 [openaddresses] adding Plumas County, CA and York County, PA 2016-11-29 11:38:19 -08:00
Al
1b18fd0b36 [openaddresses] adding Tehama County, CA 2016-11-28 16:25:26 -08:00
Al
89f6611c4e [strings] string_trim makes a copy rather than modifying the pointer 2016-11-28 15:06:07 -08:00
Al
d922d9a60a [expansion] regenerated address_expansion_data.c 2016-11-28 10:47:15 -08:00
Al
f67ebe8711 [openaddresses] Tulare County, CA 2016-11-28 10:02:10 -08:00
Al
f78281456a [fix] header definition 2016-11-27 01:00:25 -08:00
Al
eea11beb6a [expansion] using easier-to-access data structure for address dictionaries 2016-11-27 00:56:48 -08:00
Al
7de2aa21cd [boundaries] increasing state probability for Venezuela and India 2016-11-26 16:20:08 -08:00
Al
df803ab53d [openaddresses] Ward County, ND 2016-11-26 12:09:17 -08:00
Al
ef243fbb18 [fix] var name 2016-11-25 13:41:07 -08:00
Al
cdbc102821 [boundaries] in addition to population, check if a city has an unambiguous Wikipedia 2016-11-25 13:36:49 -08:00
Al
78c1a40708 [boundaries] more UK exceptions (strategy here is not to confuse the parser with districts that share names with cities. Only affects addresses with no city already specified and would be similar to listing a postal city) 2016-11-25 02:52:42 -08:00
Al
08b420cd1f [boundaries] a few more UK exceptions 2016-11-25 02:23:04 -08:00
Al
06fcb73a08 [boundaries] two more exceptions in Wales and Scotland 2016-11-25 02:16:07 -08:00
Al
f8fc59e384 [boundaries] a few more UK exceptions for non-metropolitan districts which can basically be regarded as cities 2016-11-25 02:01:20 -08:00
Al
eda4358a01 [fix] exception for Eastleigh, UK 2016-11-25 01:28:05 -08:00
Al
f171d849ca [boundaries] exception for Chichester, UK 2016-11-25 01:17:10 -08:00
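
Illustrative sketch for commit 22c4e99ea0 ([parser]): the four normalization variants listed in that message could be generated roughly as below. This is not the project's code. The function names, the two replacement tables, and the choice to drop duplicate variants are all assumptions made for illustration; the commit message only says that several copies of each training example are generated.

```python
import unicodedata

# Hypothetical table of language-independent Latin-ASCII replacements
# (assumption: the real set is larger). Umlauts are deliberately left out.
SIMPLE_LATIN_ASCII = {
    "\u00df": "ss",  # sharp s
    "\u00e6": "ae",  # ae ligature
    "\u0153": "oe",  # oe ligature
}

# Hypothetical language-specific transliteration tables, keyed by language code.
LANGUAGE_TRANSLITERATIONS = {
    "de": {
        "\u00e4": "ae",  # a with diaeresis
        "\u00f6": "oe",  # o with diaeresis
        "\u00fc": "ue",  # u with diaeresis
        "\u00df": "ss",  # sharp s
    },
}


def replace_chars(text, table):
    """Replace each character found in the table, leaving others untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def strip_accents(text):
    """Decompose to NFD and drop combining marks (basic accent stripping)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalized_variants(text, language):
    """Return the distinct normalized copies of one training example."""
    lowered = unicodedata.normalize("NFC", text).lower()
    candidates = [
        lowered,                                      # 1. lowercasing only
        replace_chars(lowered, SIMPLE_LATIN_ASCII),   # 2. simple Latin-ASCII
        strip_accents(lowered),                       # 3. accent stripping
    ]
    table = LANGUAGE_TRANSLITERATIONS.get(language)
    if table is not None:
        candidates.append(replace_chars(lowered, table))  # 4. language-specific
    # Assumption: identical variants are emitted once, in first-seen order.
    seen = set()
    variants = []
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants
```

Under these assumptions, an example with no special characters yields a single copy, while a German street name containing an umlaut and a sharp s yields four distinct copies. The order of the variants is deterministic, which matches the note in the commit message about examples being created in a fixed order on top of shuffled base examples.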