436 Commits

Author SHA1 Message Date
Al
c3c6a18df8 [geodb] Renaming geodb 2015-09-29 13:07:50 -04:00
Al
8ca22247f9 [fix] labels in averaged perceptron trainer 2015-09-29 13:07:07 -04:00
Al
6666f0baf8 [fix] Labels in averaged perceptron tagger 2015-09-29 13:06:34 -04:00
Al
12816d0e95 [api] Setting global objects to NULL on teardown 2015-09-28 17:27:57 -04:00
Al
abfa744d59 [build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install 2015-09-28 17:26:15 -04:00
Al
856198a352 [tokenization] Regenerated scanner.c 2015-09-26 02:27:45 -04:00
Al
07f1f361e2 [transliteration] Regenerating transliteration data with new categories 2015-09-26 00:07:39 -04:00
Al
172263af58 [tokenization] Adding updated token classes to scanner.re 2015-09-26 00:05:23 -04:00
Al
5a6b47d0fd [api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h 2015-09-25 01:53:29 -04:00
Al
accd8a57e7 [expansion] Regenerating expansion data 2015-09-24 16:38:20 -04:00
Al
f6c30778bf [normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling. 2015-09-23 19:41:01 -04:00
Al
a1d272077d [doc] Averaged perceptron tagger 2015-09-23 19:37:55 -04:00
Al
4a0da67aa1 [fix] warning 2015-09-23 04:06:54 -04:00
Al
88bd0cd158 [unicode] better segmentation on script breaks 2015-09-23 04:06:34 -04:00
Al
377c947541 [transliteration] Regenerating transliteration data files 2015-09-23 04:04:38 -04:00
Al
19e5457a0f [unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness 2015-09-23 00:36:29 -04:00
Al
4ad3fac627 [unicode] Regenerated unicode script types (ignore extraneous scripts, they're not used, just reside in the upper unicode planes) 2015-09-23 00:35:08 -04:00
Al
f13e9fad90 [tokenization] Regenerated scanner.c 2015-09-23 00:33:27 -04:00
Al
b4593b6f88 [unicode/tokenization] Using new character classes including wide chars in scanner 2015-09-23 00:33:14 -04:00
Al
cffa5a4a20 [fix] stdint include 2015-09-20 20:10:47 -04:00
Al
3fab0f984f [fix] fixing some compiler warnings, using type-specific abs functions for vector_math 2015-09-19 16:11:09 -04:00
Al
2940cc15b8 [fix] tokenized string destroy frees original string 2015-09-19 01:40:41 -04:00
Al
2b13871341 [constants] max country code length 2015-09-19 01:39:58 -04:00
Al
0396823772 [fix] geodb path separator 2015-09-19 01:39:31 -04:00
Al
17cfdb0625 [fix] adding char_array_append_* methods to header 2015-09-18 13:19:42 -04:00
Al
f2f7db92ff [fix] phrases 2015-09-18 13:19:18 -04:00
Al
b74e92adad [fix] include 2015-09-18 13:18:49 -04:00
Al
2a869894d9 [fix] geodb 2015-09-18 13:18:26 -04:00
Al
9e9131bda0 [parser] Averaged perceptron tagger 2015-09-17 05:51:24 -04:00
Al
8a86f7ec64 [parser] Adding context struct to feature function 2015-09-17 05:48:00 -04:00
Al
87ed7d9a0f [geodb] Adding trie search methods for finding geodb phrases 2015-09-16 22:11:10 -04:00
Al
e62c75b9c6 [phrases] Adding _with_phrases versions of address dictionary methods for pre-allocated phrases 2015-09-16 21:24:28 -04:00
Al
23103a21d4 [phrases] Adding with_phrases versions of trie search methods for pre-allocated phrases 2015-09-16 21:23:34 -04:00
Al
d5ec005787 [transliteration] Similar init method for transliteration 2015-09-16 21:14:02 -04:00
Al
b11362ab98 [numex] using module init method for building, otherwise passing NULL path uses the default path 2015-09-16 21:13:05 -04:00
Al
3cba2e8df3 [api] Using default setup methods for submodules in libpostal setup 2015-09-15 14:01:33 -04:00
Al
e122824448 [expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language 2015-09-15 14:00:26 -04:00
Al
c47ff1b113 [utils] Adding source string to tokenized_string struct 2015-09-15 13:21:51 -04:00
Al
b2f690b6f6 [api] Error logging if modules can't be found 2015-09-15 13:21:15 -04:00
Al
9de3029dd3 [parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize 2015-09-14 17:38:45 -04:00
Al
a5b5f80b04 [fix] new_copy 2015-09-14 16:50:23 -04:00
Al
3ea6358f77 [fix] vector zeros allocation 2015-09-14 16:50:08 -04:00
Al
c21f61b9b4 [parser] Default address parser path 2015-09-11 15:05:38 -07:00
Al
32c180528f [tokens] Adding a copy_tokens option for tokenized_string 2015-09-11 15:04:29 -07:00
Al
9ce658b7a3 [collections] Adding string_array for an array of char pointers 2015-09-10 16:34:16 -07:00
Al
35b9122a1a [utils] inlining a few functions 2015-09-10 16:33:54 -07:00
Al
6a5b01b51b [parser] Averaged perceptron training 2015-09-10 10:26:24 -07:00
Al
0ddf50cb5f [utils] add to feature array with printf syntax 2015-09-10 10:24:51 -07:00
Al
b3f89a207a [utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy 2015-09-09 18:07:31 -07:00
Al
607a607b71 [doc] documentation fix for averaged perceptron 2015-09-08 16:37:23 -07:00
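
Note on f6c30778bf: the digit-masking normalization described in that commit message (1234 and 5678 both normalize to DDDD) can be illustrated with a minimal sketch. This is not libpostal's implementation, which is written in C. The function name `mask_digits` is hypothetical and exists only for illustration. It also assumes that "digit" means any character Python's `str.isdigit` accepts, which may differ from the rule the library applies.

```python
def mask_digits(token: str) -> str:
    """Replace every digit character in a token with 'D'.

    Illustrative only: mask_digits is a hypothetical helper, not part of
    the libpostal API. Non-digit characters are passed through unchanged.
    """
    return "".join("D" if ch.isdigit() else ch for ch in token)


if __name__ == "__main__":
    # Two different numbers collapse to the same masked pattern.
    print(mask_digits("1234"))  # DDDD
    print(mask_digits("5678"))  # DDDD
```

Masking of this kind lets a feature extractor learn from the shape of a number (its length and position) rather than from its specific value, which is why the commit message restricts it to the machine learning models rather than the public API.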