Al
e1c6eff5e2
[fix] var
2016-12-05 18:46:49 -05:00
Al
da36b71829
[addresses] adding new places index in OSM and OpenAddresses training data
2016-12-05 18:36:17 -05:00
Al
628fecea59
[addresses] adding point-based city/equivalent reverse geocoding for places that don't have as many defined polygons in OSM
2016-12-05 18:30:46 -05:00
Al
8509fe3ac0
[dictionaries] English dictionary fix
2016-12-05 18:24:27 -05:00
Al
f87f0df717
[places] adding generic place index for reverse geocoding to points
2016-12-05 02:05:54 -05:00
Al
e32c232c67
[localities] /planet-neighborhoods/planet-localities/
2016-12-04 23:05:11 -05:00
Al
cca80b046c
[abbreviation] fixing abbreviations within hyphenated phrases, particularly for prefix/suffix matches
2016-12-03 17:55:11 -05:00
Al
22c4e99ea0
[parser] As part of reading/tokenizing the address parser data set,
several copies of the same training example will be generated.
1. with only lowercasing
2. with simple Latin-ASCII normalization (no umlauts, only things that
are common to all languages)
3. basic UTF-8 normalizations (accent stripping)
4. language-specific Latin-ASCII transliteration (e.g. ü => ue in German)
This applies both on the initial passes when building the phrase
gazetteers and during each iteration of training. As a result, only the
most basic normalizations, such as lowercasing, need to be done at
runtime.
May have a small effect on randomization as examples are created in a
deterministic order. However, this should not lead to cycles since the
base examples are shuffled, thus still satisfying the random permutation
requirement of an online/stochastic learning algorithm.
2016-12-02 13:09:03 -05:00
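The four copies described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names, the `SIMPLE_LATIN_ASCII` table, and the `LANGUAGE_TRANSLIT` table are all hypothetical, and the choice to drop duplicate variants is an assumption.

```python
import unicodedata

# Hypothetical tables for illustration only; not the project's actual data.
# Language-independent Latin-ASCII replacements (umlauts deliberately absent).
SIMPLE_LATIN_ASCII = {
    "\u00e6": "ae",  # æ
    "\u0153": "oe",  # œ
    "\u00f8": "o",   # ø
}

# Language-specific transliterations, keyed by language code.
LANGUAGE_TRANSLIT = {
    "de": {
        "\u00e4": "ae",  # ä
        "\u00f6": "oe",  # ö
        "\u00fc": "ue",  # ü
        "\u00df": "ss",  # ß
    },
}


def replace_chars(text, table):
    """Replace each character found in `table`, leaving the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def strip_accents(text):
    """Decompose to NFD and drop combining marks (accent stripping)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def training_variants(text, language=None):
    """Return the distinct normalized copies of one training string."""
    lowered = text.lower()
    candidates = [
        lowered,                                      # 1. lowercasing only
        replace_chars(lowered, SIMPLE_LATIN_ASCII),   # 2. simple Latin-ASCII
        strip_accents(lowered),                       # 3. accent stripping
    ]
    table = LANGUAGE_TRANSLIT.get(language)
    if table:
        candidates.append(replace_chars(lowered, table))  # 4. language-specific

    # Assumption: identical variants are emitted once, in first-seen order.
    variants = []
    for candidate in candidates:
        if candidate not in variants:
            variants.append(candidate)
    return variants
```

Because the variants are produced in a fixed order per example, shuffling would have to happen at the level of the base examples, which matches the note in the commit message about randomization.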
Al
adab232674
[osm] don't include rail stations with no venue phrases (if there's a railway station at Foo, only include it if it's named "Foo Station", not just plain "Foo")
2016-12-01 02:03:38 -05:00
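The rail station rule above amounts to a whole-token phrase check on the station name. A small sketch, assuming a hypothetical `VENUE_PHRASES` set (the real phrase lists live in the project's dictionaries and are not reproduced here):

```python
# Hypothetical venue phrases for illustration only.
VENUE_PHRASES = frozenset({"station", "railway station", "train station"})


def has_venue_phrase(name, phrases=VENUE_PHRASES):
    """True if any phrase occurs in `name` as a run of whole tokens."""
    padded = " " + " ".join(name.lower().split()) + " "
    return any(" " + phrase + " " in padded for phrase in phrases)


def include_rail_station(name):
    """Keep a rail station only if its name carries a venue phrase."""
    return has_venue_phrase(name)
```

Under this sketch "Foo Station" is kept and plain "Foo" is dropped; padding with spaces keeps "Stationers Hall" from matching on the substring "station".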
Al
7bfee0c1d3
[openaddresses] adding some of the new/fixed counties from upstream OA
2016-12-01 01:42:16 -05:00
Al
87d2463c9f
[openaddresses] adding Texas statewide
2016-11-30 21:42:04 -08:00
Al
4b35da629f
[numex] regenerated numex data file
2016-11-30 15:58:55 -08:00
Al
4677874610
[parser] stripping phrases like CP (in Spanish) from postal codes before adding them to the gazetteers, whether the phrase is concatenated or a separate token. Adding a command-line argument for the number of iterations
2016-11-30 15:58:03 -08:00
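As an illustration of the postal code cleanup above, the sketch below strips a leading "CP" marker whether it is glued to the digits or separated from them. The regular expression and the function name are assumptions made for this example; the project handles this in its own code with its own phrase dictionaries.

```python
import re

# Assumed pattern: "CP", "C.P." and similar, followed by optional separators,
# and only when digits come next (so words like "CPH" are left alone).
_POSTCODE_PREFIX_RE = re.compile(r"^c\.?p\.?[\s:-]*(?=\d)", re.IGNORECASE)


def strip_postcode_prefix(postcode):
    """Remove a leading CP-style marker from a postal code string."""
    return _POSTCODE_PREFIX_RE.sub("", postcode.strip())
```

For example, `strip_postcode_prefix("CP 28001")` and `strip_postcode_prefix("CP28001")` both yield `"28001"`, so the gazetteer would see the same postal code in either case.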
Al
0e29cdd9fd
[parser] fixing some uninitialized value issues during parser training
2016-11-30 15:42:09 -08:00
Al
f5a6bd0f36
[fix] sparse_matrix_new_from_matrix uses new matrix types
2016-11-30 10:15:12 -08:00
Al
b639fa5127
[utils] string_replace also creates a copy
2016-11-30 10:09:33 -08:00
Al
5a7e73e2a1
[openaddresses] adding Plumas County, CA and York County, PA
2016-11-29 11:38:19 -08:00
Al
1b18fd0b36
[openaddresses] adding Tehama County, CA
2016-11-28 16:25:26 -08:00
Al
89f6611c4e
[strings] string_trim makes a copy rather than modifying the pointer
2016-11-28 15:06:07 -08:00
Al
d922d9a60a
[expansion] regenerated address_expansion_data.c
2016-11-28 10:47:15 -08:00
Al
f67ebe8711
[openaddresses] Tulare County, CA
2016-11-28 10:02:10 -08:00
Al
f78281456a
[fix] header definition
2016-11-27 01:00:25 -08:00
Al
eea11beb6a
[expansion] using easier-to-access data structure for address dictionaries
2016-11-27 00:56:48 -08:00
Al
7de2aa21cd
[boundaries] increasing state probability for Venezuela and India
2016-11-26 16:20:08 -08:00
Al
df803ab53d
[openaddresses] Ward County, ND
2016-11-26 12:09:17 -08:00
Al
ef243fbb18
[fix] var name
2016-11-25 13:41:07 -08:00
Al
cdbc102821
[boundaries] in addition to population, check if a city has an unambiguous Wikipedia
2016-11-25 13:36:49 -08:00
Al
78c1a40708
[boundaries] more UK exceptions (strategy here is not to confuse the parser with districts that share names with cities. Only affects addresses with no city already specified and would be similar to listing a postal city)
2016-11-25 02:52:42 -08:00
Al
08b420cd1f
[boundaries] a few more UK exceptions
2016-11-25 02:23:04 -08:00
Al
06fcb73a08
[boundaries] two more exceptions in Wales and Scotland
2016-11-25 02:16:07 -08:00
Al
f8fc59e384
[boundaries] a few more UK exceptions for non-metropolitan districts which can basically be regarded as cities
2016-11-25 02:01:20 -08:00
Al
eda4358a01
[fix] exception for Eastleigh, UK
2016-11-25 01:28:05 -08:00
Al
f171d849ca
[boundaries] exception for Chichester, UK
2016-11-25 01:17:10 -08:00
Al
87634a36e1
[openaddresses] for cases where city populations are not known (i.e. not getting boundaries from OSM, most of the sources in OpenAddresses), place-only records should have at least two identifying components. Helps when city names, etc. are highly ambiguous and need to be qualified
2016-11-25 00:56:38 -08:00
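The "at least two identifying components" rule above could look something like the following. The component names, the record shape, and the behavior when the population is known are all assumptions for the sake of the example.

```python
# Hypothetical place-level component names for illustration.
PLACE_COMPONENTS = ("city", "state_district", "state", "country", "postcode")


def keep_place_only_record(components, population_known):
    """Decide whether a place-only training record is specific enough.

    Assumption: when the city population is known, other checks apply and
    one component suffices here; otherwise require two or more.
    """
    present = [key for key in PLACE_COMPONENTS if components.get(key)]
    required = 1 if population_known else 2
    return len(present) >= required
```

With an unknown population, a bare `{"city": "Springfield"}` is rejected while `{"city": "Springfield", "state": "IL"}` is kept, which is the qualification the commit describes for ambiguous names.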
Al
5118bbc6b9
[places] lower dropout probabilities for country field
2016-11-25 00:44:52 -08:00
Al
5c3ccc3bc6
[places] better handling of population exceptions in places config
2016-11-25 00:38:49 -08:00
Al
4e10dc47e1
[places] adding a few more exceptions to places config, making state/country required for smaller cities
2016-11-24 23:42:16 -08:00
Al
5a8ea5c3b9
[openaddresses] fix for Arapahoe County, CO. Had CO listed as the city
2016-11-24 04:18:46 -05:00
Al
89cacbdb0e
[openaddresses] El Paso County, CO
2016-11-24 04:03:21 -05:00
Al
e07c74f077
[fix] config
2016-11-24 03:57:52 -05:00
Al
f72b576f39
[fix] indentation
2016-11-24 03:55:37 -05:00
Al
46b7043dc7
[fix] typo
2016-11-24 03:50:11 -05:00
Al
da882a4195
[names] adding "District Municipality of" to ignorable prefixes
2016-11-24 03:48:54 -05:00
Al
1e1f00670b
[openaddresses] fixing some cities that I thought were counties
2016-11-24 03:45:52 -05:00
Al
fcf4717335
[openaddresses] adding city_replacements handling to OA formatter
2016-11-23 20:16:48 -05:00
Al
1ccca9086f
[openaddresses] add city_replacements for all files using OSM boundaries (replace with known county or city)
2016-11-23 16:14:25 -05:00
Al
3dc2a922fb
[addresses/languages] if there's only one default language and we don't have a road name or a unicode script to disambiguate, assume the default (e.g. English in the US unless there's a Spanish/French road name). Can affect things like state abbreviations
2016-11-22 18:27:54 -05:00
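The fallback described above can be read as a short decision rule. The sketch below is one possible reading; the function name, the argument names, and the precedence of road name over script are assumptions rather than the project's actual logic.

```python
def choose_language(default_languages, street_language=None, script_language=None):
    """Pick a language for an address, or None if it stays ambiguous.

    default_languages: the country's default languages, e.g. ["en"] for the US.
    street_language:   language inferred from the road name, if any.
    script_language:   language inferred from the unicode script, if any.
    """
    if street_language:
        return street_language
    if script_language:
        return script_language
    # No disambiguating signal: fall back only when the default is unique.
    if len(default_languages) == 1:
        return default_languages[0]
    return None
```

So `choose_language(["en"])` returns `"en"`, while `choose_language(["en"], street_language="es")` returns `"es"`, and a country with several defaults and no other signal stays undecided.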
Al
3c5e2afeed
[boundaries] exception for Cardiff, Wales
2016-11-22 13:16:21 -05:00
Al
49054932ad
[boundaries] adding city replacements for South Africa
2016-11-22 12:05:14 -05:00
Al
ee6edbbd91
[countries] take first encountered country code instead of reversing the components (for cases like Puerto Rico, Hong Kong, etc.)
2016-11-22 11:55:41 -05:00