Commit Graph

471 Commits

Author SHA1 Message Date
Al
080ccf0ddd [fix] logging warnings in transliterate 2015-10-12 13:50:42 -05:00
Al
baef090793 [logging] Wrapping logging statements in a do while (0) so the compiler always at least sees debug code 2015-10-12 13:42:10 -05:00
Al
b88f237d82 [build] Adding separate Makefile target for downloading geodb 2015-10-11 22:27:25 -05:00
Al
588cf1df86 [build] Changing options to libpostal_data script to allow downloading geodb, uploaded first version to S3 2015-10-11 22:25:37 -05:00
Al
372e952cd3 [geodb] Adding some logging to geodb 2015-10-11 01:00:08 -05:00
Al
cb334b9fb1 [geodisambig] Shaving a few hundred more megabytes off of the geodb by only adding a single geohash prefix and not indexing the neighbors (query can use its neighbors) 2015-10-11 00:45:26 -05:00
Al
2394f817e4 [phrases] Fixing fallback at the end of a string in trie search 2015-10-11 00:13:21 -05:00
Al
29bc0fd11e [build] Makefile changes for the new geodb 2015-10-09 15:54:44 -04:00
Al
a6fbd48bec [geodb] geodb builder changes to support the new, more compact geodb 2015-10-09 15:53:56 -04:00
Al
bf596b9184 [utils] integer string sizes 2015-10-09 15:40:47 -04:00
Al
4dad121334 [fix] Initializing booleans in postal code constructor 2015-10-09 15:40:28 -04:00
Al
44da2e446b [geodb] Additional filenames and struct members in geodb.h 2015-10-09 15:37:10 -04:00
Al
67d128c386 [graph] graph_load and graph_save 2015-10-09 15:36:14 -04:00
Al
9fe2250521 [geodb] Using a trie for geo disambiguation features rather than the sparkey hashtable, sparkey simply contains the ids or code/country pairs in the case of postal codes 2015-10-09 15:35:50 -04:00
Al
cd6a0ab90b [geodb] Prefixing features with name for geo disambiguation (better trie compression) and removing the longer geohash prefix features 2015-10-09 15:16:08 -04:00
Al
77c4bb10c6 [utils] Adding kh_foreach_key 2015-10-09 11:51:32 -04:00
Al
1e98932b82 [fix] setting array->n after reading in both graph and sparse_matrix implementations 2015-10-06 19:28:28 -04:00
Al
5a231fb709 [graph] Builder for graphs not constructed in vertex-sorted order 2015-10-06 19:03:10 -04:00
Al
4984352eda [graph] Simple sparse graph implementation, essentially a sparse matrix with no values array 2015-10-06 18:58:18 -04:00
Al
3084fc929b [geodb] Was missing country boundary type in GeoDB causing some misses in parsing 2015-10-06 16:01:22 -04:00
Al
5f03bc9369 [fix] Unit dictionaries apply to ADDRESS_UNIT component 2015-10-06 12:04:31 -04:00
Al
91f4e477ad [fix] typo 2015-10-06 12:04:07 -04:00
Al
0eb9ef5bdf [tokenization] Regenerating scanner.c 2015-10-05 01:41:48 -04:00
Al
50a36cc595 [parser] using trie_new_from_hash instead of an inline implementation in averaged perceptron training 2015-10-04 18:31:16 -04:00
Al
ff8986a287 [phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order 2015-10-04 18:28:21 -04:00
Al
55a5a79b4b [tokenization] tokenized string with source 2015-10-04 18:27:04 -04:00
Al
aa39c45b87 [tokenization] skipping control characters in tokenization, comes up in OSM surprisingly 2015-10-04 18:25:50 -04:00
Al
d6480d2902 [utils] Adding ksort for strings by default in collections.h 2015-10-04 18:23:42 -04:00
Al
db63e6dbc3 [fix] making ksort methods static 2015-10-04 18:23:09 -04:00
Al
89d0fd5718 [fix] Alpha-numeric splitting 2015-10-03 16:40:10 -04:00
Al
6428c0ae20 [utils] cstring_array_cat 2015-10-03 16:00:13 -04:00
Al
0aa6950b6c [fix] abbreviations 2015-10-02 23:48:21 -04:00
Al
01856dd36d [fix] acronyms 2015-10-01 00:24:04 -04:00
Al
562aeb497d [tokenization] Regenerating scanner.c 2015-09-30 11:32:38 -04:00
Al
689b830ad2 [tokenization] Acronym vs abbreviation 2015-09-30 04:10:04 -04:00
Al
c3c6a18df8 [geodb] Renaming geodb 2015-09-29 13:07:50 -04:00
Al
8ca22247f9 [fix] labels in averaged perceptron trainer 2015-09-29 13:07:07 -04:00
Al
6666f0baf8 [fix] Labels in averaged perceptron tagger 2015-09-29 13:06:34 -04:00
Al
12816d0e95 [api] Setting global objects to NULL on teardown 2015-09-28 17:27:57 -04:00
Al
abfa744d59 [build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install 2015-09-28 17:26:15 -04:00
Al
856198a352 [tokenization] Regenerated scanner.c 2015-09-26 02:27:45 -04:00
Al
07f1f361e2 [transliteration] Regenerating transliteration data with new categories 2015-09-26 00:07:39 -04:00
Al
172263af58 [tokenization] Adding updated token classes to scanner.re 2015-09-26 00:05:23 -04:00
Al
5a6b47d0fd [api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h 2015-09-25 01:53:29 -04:00
Al
accd8a57e7 [expansion] Regenerating expansion data 2015-09-24 16:38:20 -04:00
Al
f6c30778bf [normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling. 2015-09-23 19:41:01 -04:00
Al
a1d272077d [doc] Averaged perceptron tagger 2015-09-23 19:37:55 -04:00
Al
4a0da67aa1 [fix] warning 2015-09-23 04:06:54 -04:00
Al
88bd0cd158 [unicode] better segmentation on script breaks 2015-09-23 04:06:34 -04:00
Al
377c947541 [transliteration] Regenerating transliteration data files 2015-09-23 04:04:38 -04:00
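
Commit f6c30778bf above describes a normalization option that replaces digits with 'D', so that 1234 and 5678 both normalize to DDDD. The following is a minimal sketch of that idea, not the code from the commit: the names `mask_digits` and `mask_digits_demo` are hypothetical, and only ASCII digits are handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the library's actual function).
 * Copies `in` into `out`, replacing every ASCII digit with 'D'.
 * Returns 0 on success, or -1 if `out` cannot hold the result
 * including the terminating NUL byte. */
int mask_digits(const char *in, char *out, size_t out_size) {
    size_t len = strlen(in);
    if (out_size < len + 1) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        out[i] = (c >= '0' && c <= '9') ? 'D' : c;
    }
    out[len] = '\0';
    return 0;
}

/* Small self-check: two different numbers of the same length
 * collapse to the same masked pattern. */
void mask_digits_demo(void) {
    char a[8];
    char b[8];
    assert(mask_digits("1234", a, sizeof a) == 0);
    assert(mask_digits("5678", b, sizeof b) == 0);
    assert(strcmp(a, b) == 0);
}
```

Because the masked form keeps the length and position of each digit run, a feature extractor can learn patterns such as "four digits followed by a street name" without memorizing specific numbers. As the commit message notes, this kind of masking is meant for the machine learning feature extractors rather than for output returned through the public API.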