887 Commits

Author SHA1 Message Date
Al
1948aa87ea [fix] typo 2015-10-03 14:33:45 -04:00
Al
22efce7337 [osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input 2015-10-03 14:31:51 -04:00
Al
8920812055 [expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data 2015-10-03 14:25:30 -04:00
Al
7eb18f3538 [languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.) 2015-10-03 13:20:23 -04:00
Al
0aa6950b6c [fix] abbreviations 2015-10-02 23:48:21 -04:00
Al
db71b65412 [fix] checking validity of component combination 2015-10-02 20:28:45 -04:00
Al
a2fd6e25f8 [fix] import 2015-10-02 20:25:48 -04:00
Al
49abb70b59 [fix] dictionary 2015-10-02 20:24:21 -04:00
Al
521f33d892 [fix] bitset for address components, only looking at valid component keys 2015-10-02 20:21:59 -04:00
Al
528285f735 [fix] only OSM tagged addresses need extra logic 2015-10-02 20:18:30 -04:00
Al
83aecb9f2c [osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name) 2015-10-02 19:54:28 -04:00
Al
c790a2b87f [fix] spoken/official 2015-10-02 19:50:11 -04:00
Al
db3364be30 [geonames] Using official country languages in GeoNames 2015-10-01 02:21:14 -04:00
Al
01856dd36d [fix] acronyms 2015-10-01 00:24:04 -04:00
Al
562aeb497d [tokenization] Regenerating scanner.c 2015-09-30 11:32:38 -04:00
Al
689b830ad2 [tokenization] Acronym vs abbreviation 2015-09-30 04:10:04 -04:00
Al
7dfbcce9ec [languages] options for get_country_languages 2015-09-30 04:09:07 -04:00
Al
86e9166ae8 [doc] documentation for country_names module, fixing variable name 2015-09-30 03:08:04 -04:00
Al
42e77cb570 [countries] Making country official names align better with OSM/Wikipedia, plugging holes 2015-09-30 01:03:03 -04:00
Al
0cedc68a97 [languages] Changing Arabic to default in North African countries with two official languages. Making Danish secondary in the US Virgin Islands 2015-09-30 01:01:42 -04:00
Al
40cf247655 [formatting] Constants for field names, a few options in format_address 2015-09-29 23:03:37 -04:00
Al
22e8178a97 [countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names 2015-09-29 21:10:38 -04:00
Al
c3c6a18df8 [geodb] Renaming geodb 2015-09-29 13:07:50 -04:00
Al
8ca22247f9 [fix] labels in averaged perceptron trainer 2015-09-29 13:07:07 -04:00
Al
6666f0baf8 [fix] Labels in averaged perceptron tagger 2015-09-29 13:06:34 -04:00
Al
05da2ee6bd [dictionaries] Adding commonly used colon form No: for Turkish addresses 2015-09-28 17:48:19 -04:00
Al
daad1a1313 [geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate) 2015-09-28 17:46:53 -04:00
Al
12816d0e95 [api] Setting global objects to NULL on teardown 2015-09-28 17:27:57 -04:00
Al
abfa744d59 [build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install 2015-09-28 17:26:15 -04:00
Al
f29f2f091b [fix] PEBCAK 2015-09-27 22:49:27 -04:00
Al
93b3110a49 [fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting 2015-09-27 19:25:34 -04:00
Al
d3bfaf6b43 [osm/formatting] Fixing formatting tagged addresses with comma separated fields 2015-09-27 03:19:23 -04:00
Al
d512201e2c [fix] removing space from tokens in address formatting 2015-09-27 02:18:34 -04:00
Al
a3214b7914 [readme] Readme fixes and additions 2015-09-26 23:32:19 -04:00
Al
5b829cd5a7 [fix] blank values containing punctuation in formatting 2015-09-26 21:49:28 -04:00
Al
dac0440be8 [fix] rsplit 2015-09-26 21:07:54 -04:00
Al
e255ae0e09 [dictionaries] Luxembourgish dictionaries 2015-09-26 18:31:07 -04:00
Al
3fe56d029d [dictionaries] German Swiss dictionaries 2015-09-26 18:30:55 -04:00
Al
ae93552455 [osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue 2015-09-26 03:56:52 -04:00
Al
0c792a2cc3 [osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge 2015-09-26 03:21:26 -04:00
Al
856198a352 [tokenization] Regenerated scanner.c 2015-09-26 02:27:45 -04:00
Al
07f1f361e2 [transliteration] Regenerating transliteration data with new categories 2015-09-26 00:07:39 -04:00
Al
172263af58 [tokenization] Adding updated token classes to scanner.re 2015-09-26 00:05:23 -04:00
Al
5417b4e602 [unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories 2015-09-25 23:59:38 -04:00
Al
8fe791a14a [fix] ensure_dir in file downloads 2015-09-25 17:05:22 -04:00
Al
646b9f7248 [osm/formatting] Continuing to use openvenues formatter for the India fix 2015-09-25 13:36:24 -04:00
Al
5a6b47d0fd [api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h 2015-09-25 01:53:29 -04:00
Al
f5bb72c6f5 [readme] missed a dictionary type 2015-09-24 23:32:36 -04:00
Al
f243b9cfa6 [fix] phrasing 2015-09-24 23:30:03 -04:00
Al
dc31019604 [readme] Heading 2015-09-24 23:20:23 -04:00