Al
|
cd6a0ab90b
|
[geodb] Prefixing features with name for geo disambiguation (better trie compression) and removing the longer geohash prefix features
|
2015-10-09 15:16:08 -04:00 |
|
Al
|
77c4bb10c6
|
[utils] Adding kh_foreach_key
|
2015-10-09 11:51:32 -04:00 |
|
Al
|
151161cab3
|
[fix] Raising error in geonames output if a country cannot be localized
|
2015-10-07 03:45:56 -04:00 |
|
Al
|
1917816b80
|
[countries] Not relying on pycountry alpha 2 codes for localized country names as it doesn't contain Kosovo which was causing problems
|
2015-10-07 03:44:49 -04:00 |
|
Al
|
1e98932b82
|
[fix] setting array->n after reading in both graph and sparse_matrix implementations
|
2015-10-06 19:28:28 -04:00 |
|
Al
|
5a231fb709
|
[graph] Builder for graphs not constructed in vertex-sorted order
|
2015-10-06 19:03:10 -04:00 |
|
Al
|
4984352eda
|
[graph] Simple sparse graph implementation, essentially a sparse matrix with no values array
|
2015-10-06 18:58:18 -04:00 |
|
Al
|
3084fc929b
|
[geodb] Was missing country boundary type in GeoDB causing some misses in parsing
|
2015-10-06 16:01:22 -04:00 |
|
Al
|
5af6dc77d1
|
[dictionaries] Adding a few additional abbreviated names of political leaders that come up, a missing abbreviation
|
2015-10-06 15:09:50 -04:00 |
|
Al
|
5f03bc9369
|
[fix] Unit dictionaries apply to ADDRESS_UNIT component
|
2015-10-06 12:04:31 -04:00 |
|
Al
|
91f4e477ad
|
[fix] typo
|
2015-10-06 12:04:07 -04:00 |
|
Al
|
0eb9ef5bdf
|
[tokenization] Regenerating scanner.c
|
2015-10-05 01:41:48 -04:00 |
|
Al
|
50a36cc595
|
[parser] using trie_new_from_hash instead of an inline implementation in averaged perceptron training
|
2015-10-04 18:31:16 -04:00 |
|
Al
|
ff8986a287
|
[phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order
|
2015-10-04 18:28:21 -04:00 |
|
Al
|
55a5a79b4b
|
[tokenization] tokenized string with source
|
2015-10-04 18:27:04 -04:00 |
|
Al
|
aa39c45b87
|
[tokenization] skipping control characters in tokenization, comes up in OSM surprisingly
|
2015-10-04 18:25:50 -04:00 |
|
Al
|
d6480d2902
|
[utils] Adding ksort for strings by default in collections.h
|
2015-10-04 18:23:42 -04:00 |
|
Al
|
db63e6dbc3
|
[fix] making ksort methods static
|
2015-10-04 18:23:09 -04:00 |
|
Al
|
ed51fce291
|
[fix] Safe to assume Bokmål for Norwegian street addresses
|
2015-10-04 11:19:43 -04:00 |
|
Al
|
cfa57c96a3
|
[fix] untagged formatted addresses
|
2015-10-04 02:02:59 -04:00 |
|
Al
|
89d0fd5718
|
[fix] Alpha-numeric splitting
|
2015-10-03 16:40:10 -04:00 |
|
Al
|
6428c0ae20
|
[utils] cstring_array_cat
|
2015-10-03 16:00:13 -04:00 |
|
Al
|
5d2a24872a
|
[osm] Adding dependencies so single street names are not valid without at least one of {house, number, suburb, city, postcode}
|
2015-10-03 15:22:26 -04:00 |
|
Al
|
77be2fe433
|
[osm] Adjusting priors for country code expansion
|
2015-10-03 15:13:16 -04:00 |
|
Al
|
0b98a26426
|
[fix] keeping name tag in address components
|
2015-10-03 15:10:14 -04:00 |
|
Al
|
0f9ad259dc
|
[osm] Doing initial formatting after replacing country/state
|
2015-10-03 14:40:38 -04:00 |
|
Al
|
71233c9c02
|
[fix] import, initialization
|
2015-10-03 14:37:08 -04:00 |
|
Al
|
85b17d9b27
|
[fix] file encoding
|
2015-10-03 14:34:29 -04:00 |
|
Al
|
1948aa87ea
|
[fix] typo
|
2015-10-03 14:33:45 -04:00 |
|
Al
|
22efce7337
|
[osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input
|
2015-10-03 14:31:51 -04:00 |
|
Al
|
8920812055
|
[expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data
|
2015-10-03 14:25:30 -04:00 |
|
Al
|
7eb18f3538
|
[languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.)
|
2015-10-03 13:20:23 -04:00 |
|
Al
|
0aa6950b6c
|
[fix] abbreviations
|
2015-10-02 23:48:21 -04:00 |
|
Al
|
db71b65412
|
[fix] checking validity of component combination
|
2015-10-02 20:28:45 -04:00 |
|
Al
|
a2fd6e25f8
|
[fix] import
|
2015-10-02 20:25:48 -04:00 |
|
Al
|
49abb70b59
|
[fix] dictionary
|
2015-10-02 20:24:21 -04:00 |
|
Al
|
521f33d892
|
[fix] bitset for address components, only looking at valid component keys
|
2015-10-02 20:21:59 -04:00 |
|
Al
|
528285f735
|
[fix] only OSM tagged addresses need extra logic
|
2015-10-02 20:18:30 -04:00 |
|
Al
|
83aecb9f2c
|
[osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name)
|
2015-10-02 19:54:28 -04:00 |
|
Al
|
c790a2b87f
|
[fix] spoken/official
|
2015-10-02 19:50:11 -04:00 |
|
Al
|
db3364be30
|
[geonames] Using official country languages in GeoNames
|
2015-10-01 02:21:14 -04:00 |
|
Al
|
01856dd36d
|
[fix] acronyms
|
2015-10-01 00:24:04 -04:00 |
|
Al
|
562aeb497d
|
[tokenization] Regenerating scanner.c
|
2015-09-30 11:32:38 -04:00 |
|
Al
|
689b830ad2
|
[tokenization] Acronym vs abbreviation
|
2015-09-30 04:10:04 -04:00 |
|
Al
|
7dfbcce9ec
|
[languages] options for get_country_languages
|
2015-09-30 04:09:07 -04:00 |
|
Al
|
86e9166ae8
|
[doc] documentation for country_names module, fixing variable name
|
2015-09-30 03:08:04 -04:00 |
|
Al
|
42e77cb570
|
[countries] Making country official names align better with OSM/Wikipedia, plugging holes
|
2015-09-30 01:03:03 -04:00 |
|
Al
|
0cedc68a97
|
[languages] Changing Arabic to default in North African countries with two official languages. Making Danish secondary in the US Virgin Islands
|
2015-09-30 01:01:42 -04:00 |
|
Al
|
40cf247655
|
[formatting] Constants for field names, a few options in format_address
|
2015-09-29 23:03:37 -04:00 |
|
Al
|
22e8178a97
|
[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names
|
2015-09-29 21:10:38 -04:00 |
|