Al | baef090793 | [logging] Wrapping logging statements in a do while (0) so the compiler always at least sees debug code | 2015-10-12 13:42:10 -05:00 |
Al | b88f237d82 | [build] Adding separate Makefile target for downloading geodb | 2015-10-11 22:27:25 -05:00 |
Al | 588cf1df86 | [build] Changing options to libpostal_data script to allow downloading geodb, uploaded first version to S3 | 2015-10-11 22:25:37 -05:00 |
Al | 372e952cd3 | [geodb] Adding some logging to geodb | 2015-10-11 01:00:08 -05:00 |
Al | cb334b9fb1 | [geodisambig] Shaving a few hundred more megabytes off of the geodb by only adding a single geohash prefix and not indexing the neighbors (query can use its neighbors) | 2015-10-11 00:45:26 -05:00 |
Al | 2394f817e4 | [phrases] Fixing fallback at the end of a string in trie search | 2015-10-11 00:13:21 -05:00 |
Al | 29bc0fd11e | [build] Makefile changes for the new geodb | 2015-10-09 15:54:44 -04:00 |
Al | a6fbd48bec | [geodb] geodb builder changes to support the new, more compact geodb | 2015-10-09 15:53:56 -04:00 |
Al | bf596b9184 | [utils] integer string sizes | 2015-10-09 15:40:47 -04:00 |
Al | 4dad121334 | [fix] Initializing booleans in postal code constructor | 2015-10-09 15:40:28 -04:00 |
Al | 44da2e446b | [geodb] Additional filenames and struct members in geodb.h | 2015-10-09 15:37:10 -04:00 |
Al | 67d128c386 | [graph] graph_load and graph_save | 2015-10-09 15:36:14 -04:00 |
Al | 9fe2250521 | [geodb] Using a trie for geo disambiguation features rather than the sparkey hashtable, sparkey simply contains the ids or code/country pairs in the case of postal codes | 2015-10-09 15:35:50 -04:00 |
Al | cd6a0ab90b | [geodb] Prefixing features with name for geo disambiguation (better trie compression) and removing the longer geohash prefix features | 2015-10-09 15:16:08 -04:00 |
Al | 77c4bb10c6 | [utils] Adding kh_foreach_key | 2015-10-09 11:51:32 -04:00 |
Al | 1e98932b82 | [fix] setting array->n after reading in both graph and sparse_matrix implementations | 2015-10-06 19:28:28 -04:00 |
Al | 5a231fb709 | [graph] Builder for graphs not constructed in vertex-sorted order | 2015-10-06 19:03:10 -04:00 |
Al | 4984352eda | [graph] Simple sparse graph implementation, essentially a sparse matrix with no values array | 2015-10-06 18:58:18 -04:00 |
Al | 3084fc929b | [geodb] Was missing country boundary type in GeoDB causing some misses in parsing | 2015-10-06 16:01:22 -04:00 |
Al | 5f03bc9369 | [fix] Unit dictionaries apply to ADDRESS_UNIT component | 2015-10-06 12:04:31 -04:00 |
Al | 91f4e477ad | [fix] typo | 2015-10-06 12:04:07 -04:00 |
Al | 0eb9ef5bdf | [tokenization] Regenerating scanner.c | 2015-10-05 01:41:48 -04:00 |
Al | 50a36cc595 | [parser] using trie_new_from_hash instead of an inline implementation in averaged perceptron training | 2015-10-04 18:31:16 -04:00 |
Al | ff8986a287 | [phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order | 2015-10-04 18:28:21 -04:00 |
Al | 55a5a79b4b | [tokenization] tokenized string with source | 2015-10-04 18:27:04 -04:00 |
Al | aa39c45b87 | [tokenization] skipping control characters in tokenization, comes up in OSM surprisingly | 2015-10-04 18:25:50 -04:00 |
Al | d6480d2902 | [utils] Adding ksort for strings by default in collections.h | 2015-10-04 18:23:42 -04:00 |
Al | db63e6dbc3 | [fix] making ksort methods static | 2015-10-04 18:23:09 -04:00 |
Al | 89d0fd5718 | [fix] Alpha-numeric splitting | 2015-10-03 16:40:10 -04:00 |
Al | 6428c0ae20 | [utils] cstring_array_cat | 2015-10-03 16:00:13 -04:00 |
Al | 0aa6950b6c | [fix] abbreviations | 2015-10-02 23:48:21 -04:00 |
Al | 01856dd36d | [fix] acronyms | 2015-10-01 00:24:04 -04:00 |
Al | 562aeb497d | [tokenization] Regenerating scanner.c | 2015-09-30 11:32:38 -04:00 |
Al | 689b830ad2 | [tokenization] Acronym vs abbreviation | 2015-09-30 04:10:04 -04:00 |
Al | c3c6a18df8 | [geodb] Renaming geodb | 2015-09-29 13:07:50 -04:00 |
Al | 8ca22247f9 | [fix] labels in averaged perceptron trainer | 2015-09-29 13:07:07 -04:00 |
Al | 6666f0baf8 | [fix] Labels in averaged perceptron tagger | 2015-09-29 13:06:34 -04:00 |
Al | 12816d0e95 | [api] Setting global objects to NULL on teardown | 2015-09-28 17:27:57 -04:00 |
Al | abfa744d59 | [build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install | 2015-09-28 17:26:15 -04:00 |
Al | 856198a352 | [tokenization] Regenerated scanner.c | 2015-09-26 02:27:45 -04:00 |
Al | 07f1f361e2 | [transliteration] Regenerating transliteration data with new categories | 2015-09-26 00:07:39 -04:00 |
Al | 172263af58 | [tokenization] Adding updated token classes to scanner.re | 2015-09-26 00:05:23 -04:00 |
Al | 5a6b47d0fd | [api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h | 2015-09-25 01:53:29 -04:00 |
Al | accd8a57e7 | [expansion] Regenerating expansion data | 2015-09-24 16:38:20 -04:00 |
Al | f6c30778bf | [normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling. | 2015-09-23 19:41:01 -04:00 |
Al | a1d272077d | [doc] Averaged perceptron tagger | 2015-09-23 19:37:55 -04:00 |
Al | 4a0da67aa1 | [fix] warning | 2015-09-23 04:06:54 -04:00 |
Al | 88bd0cd158 | [unicode] better segmentation on script breaks | 2015-09-23 04:06:34 -04:00 |
Al | 377c947541 | [transliteration] Regenerating transliteration data files | 2015-09-23 04:04:38 -04:00 |
Al | 19e5457a0f | [unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness | 2015-09-23 00:36:29 -04:00 |