Al
|
ff8986a287
|
[phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order
|
2015-10-04 18:28:21 -04:00 |
|
Al
|
55a5a79b4b
|
[tokenization] tokenized string with source
|
2015-10-04 18:27:04 -04:00 |
|
Al
|
aa39c45b87
|
[tokenization] skipping control characters in tokenization, comes up in OSM surprisingly
|
2015-10-04 18:25:50 -04:00 |
|
Al
|
d6480d2902
|
[utils] Adding ksort for strings by default in collections.h
|
2015-10-04 18:23:42 -04:00 |
|
Al
|
db63e6dbc3
|
[fix] making ksort methods static
|
2015-10-04 18:23:09 -04:00 |
|
Al
|
ed51fce291
|
[fix] Safe to assume Bokmål for Norwegian street addresses
|
2015-10-04 11:19:43 -04:00 |
|
Al
|
cfa57c96a3
|
[fix] untagged formatted addresses
|
2015-10-04 02:02:59 -04:00 |
|
Al
|
89d0fd5718
|
[fix] Alpha-numeric splitting
|
2015-10-03 16:40:10 -04:00 |
|
Al
|
6428c0ae20
|
[utils] cstring_array_cat
|
2015-10-03 16:00:13 -04:00 |
|
Al
|
5d2a24872a
|
[osm] Adding dependencies so single street names are not valid without at least one of {house, number, suburb, city, postcode}
|
2015-10-03 15:22:26 -04:00 |
|
Al
|
77be2fe433
|
[osm] Adjusting priors for country code expansion
|
2015-10-03 15:13:16 -04:00 |
|
Al
|
0b98a26426
|
[fix] keeping name tag in address components
|
2015-10-03 15:10:14 -04:00 |
|
Al
|
0f9ad259dc
|
[osm] Doing initial formatting after replacing country/state
|
2015-10-03 14:40:38 -04:00 |
|
Al
|
71233c9c02
|
[fix] import, initialization
|
2015-10-03 14:37:08 -04:00 |
|
Al
|
85b17d9b27
|
[fix] file encoding
|
2015-10-03 14:34:29 -04:00 |
|
Al
|
1948aa87ea
|
[fix] typo
|
2015-10-03 14:33:45 -04:00 |
|
Al
|
22efce7337
|
[osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input
|
2015-10-03 14:31:51 -04:00 |
|
Al
|
8920812055
|
[expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data
|
2015-10-03 14:25:30 -04:00 |
|
Al
|
7eb18f3538
|
[languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.)
|
2015-10-03 13:20:23 -04:00 |
|
Al
|
0aa6950b6c
|
[fix] abbreviations
|
2015-10-02 23:48:21 -04:00 |
|
Al
|
db71b65412
|
[fix] checking validity of component combination
|
2015-10-02 20:28:45 -04:00 |
|
Al
|
a2fd6e25f8
|
[fix] import
|
2015-10-02 20:25:48 -04:00 |
|
Al
|
49abb70b59
|
[fix] dictionary
|
2015-10-02 20:24:21 -04:00 |
|
Al
|
521f33d892
|
[fix] bitset for address components, only looking at valid component keys
|
2015-10-02 20:21:59 -04:00 |
|
Al
|
528285f735
|
[fix] only OSM tagged addresses need extra logic
|
2015-10-02 20:18:30 -04:00 |
|
Al
|
83aecb9f2c
|
[osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name)
|
2015-10-02 19:54:28 -04:00 |
|
Al
|
c790a2b87f
|
[fix] spoken/official
|
2015-10-02 19:50:11 -04:00 |
|
Al
|
db3364be30
|
[geonames] Using official country languages in GeoNames
|
2015-10-01 02:21:14 -04:00 |
|
Al
|
01856dd36d
|
[fix] acronyms
|
2015-10-01 00:24:04 -04:00 |
|
Al
|
562aeb497d
|
[tokenization] Regenerating scanner.c
|
2015-09-30 11:32:38 -04:00 |
|
Al
|
689b830ad2
|
[tokenization] Acronym vs abbreviation
|
2015-09-30 04:10:04 -04:00 |
|
Al
|
7dfbcce9ec
|
[languages] options for get_country_languages
|
2015-09-30 04:09:07 -04:00 |
|
Al
|
86e9166ae8
|
[doc] documentation for country_names module, fixing variable name
|
2015-09-30 03:08:04 -04:00 |
|
Al
|
42e77cb570
|
[countries] Making country official names align better with OSM/Wikipedia, plugging holes
|
2015-09-30 01:03:03 -04:00 |
|
Al
|
0cedc68a97
|
[languages] Changing Arabic to default in North African countries with two official languages. Making Danish secondary in the US Virgin Islands
|
2015-09-30 01:01:42 -04:00 |
|
Al
|
40cf247655
|
[formatting] Constants for field names, a few options in format_address
|
2015-09-29 23:03:37 -04:00 |
|
Al
|
22e8178a97
|
[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names
|
2015-09-29 21:10:38 -04:00 |
|
Al
|
c3c6a18df8
|
[geodb] Renaming geodb
|
2015-09-29 13:07:50 -04:00 |
|
Al
|
8ca22247f9
|
[fix] labels in averaged perceptron trainer
|
2015-09-29 13:07:07 -04:00 |
|
Al
|
6666f0baf8
|
[fix] Labels in averaged perceptron tagger
|
2015-09-29 13:06:34 -04:00 |
|
Al
|
05da2ee6bd
|
[dictionaries] Adding commonly used colon form No: for Turkish addresses
|
2015-09-28 17:48:19 -04:00 |
|
Al
|
daad1a1313
|
[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)
|
2015-09-28 17:46:53 -04:00 |
|
Al
|
12816d0e95
|
[api] Setting global objects to NULL on teardown
|
2015-09-28 17:27:57 -04:00 |
|
Al
|
abfa744d59
|
[build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install
|
2015-09-28 17:26:15 -04:00 |
|
Al
|
f29f2f091b
|
[fix] PEBCAK
|
2015-09-27 22:49:27 -04:00 |
|
Al
|
93b3110a49
|
[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting
|
2015-09-27 19:25:34 -04:00 |
|
Al
|
d3bfaf6b43
|
[osm/formatting] Fixing formatting tagged addresses with comma separated fields
|
2015-09-27 03:19:23 -04:00 |
|
Al
|
d512201e2c
|
[fix] removing space from tokens in address formatting
|
2015-09-27 02:18:34 -04:00 |
|
Al
|
a3214b7914
|
[readme] Readme fixes and additions
|
2015-09-26 23:32:19 -04:00 |
|
Al
|
5b829cd5a7
|
[fix] blank values containing punctuation in formatting
|
2015-09-26 21:49:28 -04:00 |
|