Commit Graph

911 Commits

Author SHA1 Message Date
Al
1e98932b82 [fix] setting array->n after reading in both graph and sparse_matrix implementations 2015-10-06 19:28:28 -04:00
Al
5a231fb709 [graph] Builder for graphs not constructed in vertex-sorted order 2015-10-06 19:03:10 -04:00
Al
4984352eda [graph] Simple sparse graph implementation, essentially a sparse matrix with no values array 2015-10-06 18:58:18 -04:00
Al
3084fc929b [geodb] Was missing country boundary type in GeoDB causing some misses in parsing 2015-10-06 16:01:22 -04:00
Al
5af6dc77d1 [dictionaries] Adding a few additional abbreviated names of political leaders that come up, a missing abbreviation 2015-10-06 15:09:50 -04:00
Al
5f03bc9369 [fix] Unit dictionaries apply to ADDRESS_UNIT component 2015-10-06 12:04:31 -04:00
Al
91f4e477ad [fix] typo 2015-10-06 12:04:07 -04:00
Al
0eb9ef5bdf [tokenization] Regenerating scanner.c 2015-10-05 01:41:48 -04:00
Al
50a36cc595 [parser] using trie_new_from_hash instead of an inline implementation in averaged perceptron training 2015-10-04 18:31:16 -04:00
Al
ff8986a287 [phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order 2015-10-04 18:28:21 -04:00
Al
55a5a79b4b [tokenization] tokenized string with source 2015-10-04 18:27:04 -04:00
Al
aa39c45b87 [tokenization] skipping control characters in tokenization, comes up in OSM surprisingly 2015-10-04 18:25:50 -04:00
Al
d6480d2902 [utils] Adding ksort for strings by default in collections.h 2015-10-04 18:23:42 -04:00
Al
db63e6dbc3 [fix] making ksort methods static 2015-10-04 18:23:09 -04:00
Al
ed51fce291 [fix] Safe to assume Bokmål for Norwegian street addresses 2015-10-04 11:19:43 -04:00
Al
cfa57c96a3 [fix] untagged formatted addresses 2015-10-04 02:02:59 -04:00
Al
89d0fd5718 [fix] Alpha-numeric splitting 2015-10-03 16:40:10 -04:00
Al
6428c0ae20 [utils] cstring_array_cat 2015-10-03 16:00:13 -04:00
Al
5d2a24872a [osm] Adding dependencies so single street names are not valid without at least one of {house, number, suburb, city, postcode} 2015-10-03 15:22:26 -04:00
Al
77be2fe433 [osm] Adjusting priors for country code expansion 2015-10-03 15:13:16 -04:00
Al
0b98a26426 [fix] keeping name tag in address components 2015-10-03 15:10:14 -04:00
Al
0f9ad259dc [osm] Doing initial formatting after replacing country/state 2015-10-03 14:40:38 -04:00
Al
71233c9c02 [fix] import, initialization 2015-10-03 14:37:08 -04:00
Al
85b17d9b27 [fix] file encoding 2015-10-03 14:34:29 -04:00
Al
1948aa87ea [fix] typo 2015-10-03 14:33:45 -04:00
Al
22efce7337 [osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input 2015-10-03 14:31:51 -04:00
Al
8920812055 [expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data 2015-10-03 14:25:30 -04:00
Al
7eb18f3538 [languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.) 2015-10-03 13:20:23 -04:00
Al
0aa6950b6c [fix] abbreviations 2015-10-02 23:48:21 -04:00
Al
db71b65412 [fix] checking validity of component combination 2015-10-02 20:28:45 -04:00
Al
a2fd6e25f8 [fix] import 2015-10-02 20:25:48 -04:00
Al
49abb70b59 [fix] dictionary 2015-10-02 20:24:21 -04:00
Al
521f33d892 [fix] bitset for address components, only looking at valid component keys 2015-10-02 20:21:59 -04:00
Al
528285f735 [fix] only OSM tagged addresses need extra logic 2015-10-02 20:18:30 -04:00
Al
83aecb9f2c [osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name) 2015-10-02 19:54:28 -04:00
Al
c790a2b87f [fix] spoken/official 2015-10-02 19:50:11 -04:00
Al
db3364be30 [geonames] Using official country languages in GeoNames 2015-10-01 02:21:14 -04:00
Al
01856dd36d [fix] acronyms 2015-10-01 00:24:04 -04:00
Al
562aeb497d [tokenization] Regenerating scanner.c 2015-09-30 11:32:38 -04:00
Al
689b830ad2 [tokenization] Acronym vs abbreviation 2015-09-30 04:10:04 -04:00
Al
7dfbcce9ec [languages] options for get_country_languages 2015-09-30 04:09:07 -04:00
Al
86e9166ae8 [doc] documentation for country_names module, fixing variable name 2015-09-30 03:08:04 -04:00
Al
42e77cb570 [countries] Making country official names align better with OSM/Wikipedia, plugging holes 2015-09-30 01:03:03 -04:00
Al
0cedc68a97 [languages] Changing Arabic to default in North African countries with two official languages. Making Danish secondary in the US Virgin Islands 2015-09-30 01:01:42 -04:00
Al
40cf247655 [formatting] Constants for field names, a few options in format_address 2015-09-29 23:03:37 -04:00
Al
22e8178a97 [countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names 2015-09-29 21:10:38 -04:00
Al
c3c6a18df8 [geodb] Renaming geodb 2015-09-29 13:07:50 -04:00
Al
8ca22247f9 [fix] labels in averaged perceptron trainer 2015-09-29 13:07:07 -04:00
Al
6666f0baf8 [fix] Labels in averaged perceptron tagger 2015-09-29 13:06:34 -04:00
Al
05da2ee6bd [dictionaries] Adding commonly used colon form No: for Turkish addresses 2015-09-28 17:48:19 -04:00
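Commits 4984352eda and 5a231fb709 describe a sparse graph that is "essentially a sparse matrix with no values array", plus a builder for graphs not constructed in vertex-sorted order. One way to read that is a compressed sparse row (CSR) layout with only row offsets and column indices. The C sketch below assumes that layout; edge_t, csr_graph_t, csr_graph_build and the other names are hypothetical illustrations, not the project's actual API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical edge record fed to the builder. */
typedef struct {
    uint32_t src;
    uint32_t dst;
} edge_t;

/* CSR graph: a sparse matrix minus the values array. The neighbors of
   vertex v are indices[indptr[v]] through indices[indptr[v + 1] - 1]. */
typedef struct {
    uint32_t num_vertices;
    uint32_t num_edges;
    uint32_t *indptr;   /* num_vertices + 1 row offsets */
    uint32_t *indices;  /* num_edges neighbor ids */
} csr_graph_t;

/* Orders edges by source vertex, then by destination. */
static int edge_cmp(const void *a, const void *b) {
    const edge_t *x = a;
    const edge_t *y = b;
    if (x->src != y->src) return x->src < y->src ? -1 : 1;
    if (x->dst != y->dst) return x->dst < y->dst ? -1 : 1;
    return 0;
}

void csr_graph_destroy(csr_graph_t *g) {
    if (g == NULL) return;
    free(g->indptr);
    free(g->indices);
    free(g);
}

/* Builds a CSR graph from edges given in any order. Sorts the edges in
   place, then turns per-vertex edge counts into offsets. */
csr_graph_t *csr_graph_build(edge_t *edges, uint32_t num_edges, uint32_t num_vertices) {
    csr_graph_t *g = calloc(1, sizeof *g);
    if (g == NULL) return NULL;
    g->num_vertices = num_vertices;
    g->num_edges = num_edges;
    g->indptr = calloc((size_t)num_vertices + 1, sizeof *g->indptr);
    g->indices = malloc(((size_t)num_edges + 1) * sizeof *g->indices);
    if (g->indptr == NULL || g->indices == NULL) {
        csr_graph_destroy(g);
        return NULL;
    }

    qsort(edges, num_edges, sizeof *edges, edge_cmp);

    for (uint32_t i = 0; i < num_edges; i++) {
        assert(edges[i].src < num_vertices && edges[i].dst < num_vertices);
        g->indptr[edges[i].src + 1]++;  /* count out-degree of src */
        g->indices[i] = edges[i].dst;   /* grouped by src after sorting */
    }
    for (uint32_t v = 0; v < num_vertices; v++) {
        g->indptr[v + 1] += g->indptr[v];  /* prefix sum: counts become offsets */
    }
    return g;
}

/* Prints each vertex followed by its neighbors. */
void csr_graph_print(const csr_graph_t *g) {
    for (uint32_t v = 0; v < g->num_vertices; v++) {
        printf("%" PRIu32 ":", v);
        for (uint32_t j = g->indptr[v]; j < g->indptr[v + 1]; j++) {
            printf(" %" PRIu32, g->indices[j]);
        }
        printf("\n");
    }
}
```

For example, building from the unsorted edges {2,0}, {0,2}, {1,2}, {0,1} with 3 vertices gives indptr = [0, 2, 3, 4] and indices = [1, 2, 2, 0], so csr_graph_print shows "0: 1 2", "1: 2" and "2: 0". Sorting first is what lets the builder accept edges in any order; a graph already built in vertex-sorted order could skip the qsort.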
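Commit 7eb18f3538 adds a function to sample a random language from a discrete distribution (e.g. languages in a country). A common way to do this is to walk the cumulative weights until they exceed a uniform draw scaled by the total. The C sketch below shows that technique; sample_discrete and uniform_unit are hypothetical names, and the commit does not say which method the project actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Uniform draw in [0, 1) from the C library generator (not high quality,
   but enough for generating training-data variations). */
double uniform_unit(void) {
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Returns index i with probability weights[i] / sum(weights). Weights
   must be non-negative; they do not need to sum to 1. u is a uniform
   draw in [0, 1), passed in so the function is deterministic to test. */
size_t sample_discrete(const double *weights, size_t n, double u) {
    assert(n > 0 && u >= 0.0 && u < 1.0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(weights[i] >= 0.0);
        total += weights[i];
    }
    assert(total > 0.0);

    double target = u * total;
    double cumulative = 0.0;
    size_t last_positive = 0;
    for (size_t i = 0; i < n; i++) {
        cumulative += weights[i];
        if (weights[i] > 0.0) last_positive = i;
        /* a zero weight leaves cumulative unchanged, so it can never match */
        if (target < cumulative) return i;
    }
    return last_positive;  /* only reached through floating-point round-off */
}
```

A caller would pass something like the language shares for a country and uniform_unit() as u. The linear scan is fine for a handful of languages; for many draws over a large distribution, precomputing the cumulative array and binary-searching it (or using the alias method) would be the usual alternatives.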
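Commit aa39c45b87 skips control characters during tokenization because they turn up in OSM data. The real tokenizer is a generated scanner (scanner.c), so the C sketch below is only a standalone pre-filter under stated assumptions: it treats the input as UTF-8, drops C0 controls, DEL and C1 controls, and keeps tab, newline and carriage return as whitespace. The function name strip_control_chars is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the NUL-terminated UTF-8 string src into dst (capacity dst_size,
   including the terminator), dropping control characters:
   - C0 controls U+0001..U+001F except tab, newline and carriage return,
     which are kept so they can still act as whitespace;
   - DEL (U+007F);
   - C1 controls U+0080..U+009F, encoded in UTF-8 as 0xC2 0x80..0x9F.
   Returns the number of bytes written, not counting the terminator. */
size_t strip_control_chars(const char *src, char *dst, size_t dst_size) {
    assert(src != NULL && dst != NULL && dst_size > 0);
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    while (*s != '\0') {
        unsigned char c = s[0];
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c == 0x7F) {
            s++;     /* single-byte control: skip it */
            continue;
        }
        if (c == 0xC2 && s[1] >= 0x80 && s[1] <= 0x9F) {
            s += 2;  /* two-byte C1 control: skip both bytes */
            continue;
        }
        if (out + 1 >= dst_size) {
            break;   /* out of room; this may cut a multi-byte character */
        }
        dst[out++] = (char)c;
        s++;
    }
    dst[out] = '\0';
    return out;
}
```

Other multi-byte sequences pass through untouched, so "caf\xc3\xa9" keeps all 5 bytes while "x\xc2\x85y" (a NEL control between two letters) becomes "xy".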
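Commits 83aecb9f2c, 5d2a24872a and 521f33d892 together describe randomly eliminating address components subject to constraints, tracked with a bitset of components. The C sketch below encodes the two rules stated in those messages: a house number needs a street, and a street needs at least one of house, house number, suburb, city or postcode (reading "number" in 5d2a24872a as house number is an assumption). The component list, the dependency table and the greedy drop strategy are hypothetical, not the project's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical component set; the project's real labels may differ. */
enum {
    COMP_HOUSE,
    COMP_HOUSE_NUMBER,
    COMP_ROAD,
    COMP_SUBURB,
    COMP_CITY,
    COMP_POSTCODE,
    COMP_COUNT
};

#define COMP_BIT(c) ((uint32_t)1 << (c))

/* deps[c] != 0 means: if c is present, at least one component in deps[c]
   must also be present. */
static const uint32_t deps[COMP_COUNT] = {
    [COMP_HOUSE_NUMBER] = COMP_BIT(COMP_ROAD),
    [COMP_ROAD] = COMP_BIT(COMP_HOUSE) | COMP_BIT(COMP_HOUSE_NUMBER) |
                  COMP_BIT(COMP_SUBURB) | COMP_BIT(COMP_CITY) |
                  COMP_BIT(COMP_POSTCODE),
};

/* A combination is valid if it is non-empty and every present component
   has its dependencies satisfied. */
int components_valid(uint32_t present) {
    if (present == 0) return 0;
    for (int c = 0; c < COMP_COUNT; c++) {
        if ((present & COMP_BIT(c)) && deps[c] != 0 && (present & deps[c]) == 0) {
            return 0;
        }
    }
    return 1;
}

/* Visits each present component once and tries to drop it with
   probability drop_prob; a drop is kept only if the result stays valid. */
uint32_t drop_random_components(uint32_t present, double drop_prob) {
    assert(components_valid(present));
    for (int c = 0; c < COMP_COUNT; c++) {
        if (!(present & COMP_BIT(c))) continue;
        if ((double)rand() / ((double)RAND_MAX + 1.0) >= drop_prob) continue;
        uint32_t candidate = present & ~COMP_BIT(c);
        if (components_valid(candidate)) present = candidate;
    }
    return present;
}
```

Because every accepted step is checked, a valid input always yields a valid output. With drop_prob = 1.0 and all six components present, the sketch strips everything except the postcode, since removing the last component would leave an empty (invalid) address.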
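Commit 1e98932b82 fixes "setting array->n after reading" in the graph and sparse_matrix code. The sketch below illustrates that class of bug with a hypothetical growable array whose n counts elements in use and m counts allocated capacity: if the reader fills the buffer but never sets n, the loaded array looks empty. The struct, the function names and the length-prefixed, native-byte-order file format are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable array: n is the number of elements in use,
   m the allocated capacity. */
typedef struct {
    size_t n;
    size_t m;
    uint32_t *a;
} uint32_array;

uint32_array *uint32_array_new_size(size_t size) {
    uint32_array *arr = malloc(sizeof *arr);
    if (arr == NULL) return NULL;
    arr->a = malloc((size > 0 ? size : 1) * sizeof *arr->a);
    if (arr->a == NULL) {
        free(arr);
        return NULL;
    }
    arr->n = 0;     /* allocated, but nothing in use yet */
    arr->m = size;
    return arr;
}

void uint32_array_destroy(uint32_array *arr) {
    if (arr == NULL) return;
    free(arr->a);
    free(arr);
}

/* Writes a 64-bit length followed by the elements (native byte order). */
int uint32_array_write(const uint32_array *arr, FILE *f) {
    uint64_t len = arr->n;
    if (fwrite(&len, sizeof len, 1, f) != 1) return 0;
    if (arr->n > 0 && fwrite(arr->a, sizeof *arr->a, arr->n, f) != arr->n) return 0;
    return 1;
}

uint32_array *uint32_array_read(FILE *f) {
    uint64_t len;
    if (fread(&len, sizeof len, 1, f) != 1) return NULL;
    if (len > SIZE_MAX / sizeof(uint32_t)) return NULL;
    uint32_array *arr = uint32_array_new_size((size_t)len);
    if (arr == NULL) return NULL;
    if (len > 0 && fread(arr->a, sizeof *arr->a, (size_t)len, f) != (size_t)len) {
        uint32_array_destroy(arr);
        return NULL;
    }
    /* The buffer now holds len elements, but new_size left n at 0.
       Without this assignment the array would look empty to callers. */
    arr->n = (size_t)len;
    assert(arr->n <= arr->m);
    return arr;
}
```

A round trip through tmpfile() with two elements should give back an array with n == 2; deleting the arr->n assignment reproduces the bug, since the data is read but n stays 0.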