Commit Graph

1306 Commits

Author SHA1 Message Date
Al
58e53cab1c [scripts] Adding the tokenize/normalize wrappers directly into the internal geodata package so pypostal can be maintained in an independent repo 2016-01-12 13:29:31 -05:00
Al
622dc354e7 [optimization] Adding learning rate to lazy sparse update in stochastic gradient descent 2016-01-12 11:04:16 -05:00
Al
79f2b7c192 [build] Removing source from libpostal shared lib 2016-01-12 10:31:22 -05:00
Al
6a9c1e8c6d [build] Adding trie_utils.c to address parser train/test 2016-01-12 10:22:34 -05:00
Al
7cc201dec3 [optimization] Moving gamma_t calculation to the header in SGD 2016-01-11 16:40:50 -05:00
Al
25ae5bed33 [unicode] Adding SCRIPT_INHERITED as a common script so diacritics like COMBINING CEDILLA don't break the current script and produce false word breaks 2016-01-11 16:39:21 -05:00
Al
3260edcf18 [math] Adding sparse dot sparse given a dense output matrix (suitable for the minibatch use case), fixing sparse dot vector 2016-01-11 13:55:54 -05:00
Al
736bc7c70d [config] language_classifier data dir 2016-01-10 03:05:36 -05:00
Al
ebaedb6bcf [language_classifier] Language classifier training using L2-regularized logistic regression and stochastic gradient descent 2016-01-10 01:31:18 -05:00
Al
56710cce21 [language_classifier] Language classifier data set I/O 2016-01-10 01:22:29 -05:00
Al
0558475a50 [language_classifier] Language classifier structs, I/O and API 2016-01-10 01:20:17 -05:00
Al
b85e454a58 [fix] var 2016-01-09 03:43:53 -05:00
Al
b13462f8ef [language_classifier] Features for address languages classification, quadgrams for most languages, unigrams for ideographic characters, script for single-script languages like Thai, Hebrew, etc. 2016-01-09 03:42:57 -05:00
Al
29930fa7b6 [fix] sort hash keys by value 2016-01-09 03:38:25 -05:00
Al
62017fd33d [optimization] Using sparse updates in stochastic gradient descent. Decomposing the updates into the gradient of the loss function (zero for features not observed in the current batch) and the gradient of the regularization term. The derivative of the regularization term in L2-regularized models is equivalent to an exponential decay function. Before computing the gradient for the current batch, we bring the weights up to date only for the features observed in that batch, and update only those values 2016-01-09 03:37:31 -05:00
Al
aa22db11b2 [math] Matrix arithmetic 2016-01-09 01:45:10 -05:00
Al
197b18f3cf [fix] NULL check 2016-01-09 01:43:25 -05:00
Al
9c4b5ccbb1 [math] Adding array_{op}_times_scalar methods 2016-01-09 01:42:54 -05:00
Al
2f1e2139ca [math] Unique columns as array for CSR sparse matrix 2016-01-09 01:40:26 -05:00
Al
023c04d78f [classification] Pre-allocating memory in logistic regression trainer, storing last updated timestamps for sparse stochastic gradient descent and using the new gradient API 2016-01-09 01:39:24 -05:00
Al
562cc06eaf [classification] Sparse version of logistic regression gradient which, given an array of the features/columns used in the input batch, only updates the gradient for that batch, even for the operations which otherwise would apply to the entire matrix (scaling by -1/m, regularization) 2016-01-09 01:33:33 -05:00
Al
5ca4bba1d5 [fix] Writing matrix dimension as 64-bit 2016-01-08 01:29:52 -05:00
Al
8f054eeeb1 [classification] Training structures for logistic regression and stochastic (minibatch) gradient descent update 2016-01-08 01:07:20 -05:00
Al
4acf10c3a4 [classification] Multinomial logistic regression, gradient and cost function 2016-01-08 01:03:09 -05:00
Al
8b70529711 [optimization] Stochastic gradient descent with gain schedule a la Leon Bottou 2016-01-08 00:54:17 -05:00
Al
6b164d263e [math] Sparse matrix from dense 2016-01-08 00:48:57 -05:00
Al
ba8fc716df [features] Functions for dealing with minibatches 2016-01-08 00:48:11 -05:00
Al
06638d2885 [fix] only strdup when necessary in feature counting functions 2016-01-08 00:46:41 -05:00
Al
31a3a2a3fa [math] Matrix scalar arithmetic functions 2016-01-08 00:44:33 -05:00
Al
b6ce94166b [sparse] Only increase size of sparse matrix on finalize row if it needs to be 2016-01-07 13:19:22 -05:00
Al
2e67afab09 [fix] adding functions to string_utils header 2016-01-06 23:03:16 -05:00
Al
a8b9a2c153 [fix] making *_hash_sort_keys_by_value static 2016-01-06 23:01:00 -05:00
Al
0d5cf0d6d7 [utils] char_array_cat_printf was forcing a doubling of the size of the buffer, which is bad if calling many times. Now only initiates a realloc if the char_array is almost full. Also adding cstring_array_from_strings which takes a list of char *s 2016-01-06 22:56:01 -05:00
Al
8c019998d7 [phrases] trie_num_keys 2016-01-05 22:02:15 -05:00
Al
22668945cb [mv] Moving trie_new_from_hash to a module 2016-01-05 16:43:17 -05:00
Al
33e9a05ebf [tokenization] is_whitespace 2016-01-05 16:40:35 -05:00
Al
6e1435ac48 [features] No copy versions of feature counts functions 2016-01-05 16:39:50 -05:00
Al
a740417cab [utils] Adding hash sort by values for numeric types 2016-01-05 14:47:48 -05:00
Al
6ef7c90278 [fix] using string_equals, handles NULLs 2016-01-05 14:08:10 -05:00
Al
c0214d6023 [fix] free normalized string in address parser data set 2016-01-05 14:06:03 -05:00
Al
6a5ad96a17 [math] Adding vector sort and vector argsort to numeric vectors 2016-01-05 14:05:27 -05:00
Al
7aea79281e [math] Floating point equality with relative epsilon comparisons 2016-01-02 15:39:49 -05:00
Al
81624f8b6d [dictionaries] All professional suffixes should use the abbreviated form as the canonical 2015-12-31 13:14:29 -05:00
Al
780966a59b [api] More spacing fixes and using language information in normalize string 2015-12-31 03:52:14 -05:00
Al
ff75c5cc50 [normalize] Adding normalize_string_languages method which can use additional transliterators 2015-12-31 03:50:36 -05:00
Al
7906f5542d [dictionaries] ulitsa is the proper transliteration for Russian 2015-12-31 03:49:51 -05:00
Al
9335d26fbd [fix] spacing 2015-12-31 02:26:28 -05:00
Al
7bd1336b3b [fix] Freeing languages in Python 2015-12-31 01:46:04 -05:00
Al
cc89b768d8 [dictionaries] New Japanese abbreviations from the OSM wiki 2015-12-31 01:32:42 -05:00
Al
ffe9c2a971 [dictionaries] Santi/SS in Italian 2015-12-31 01:32:21 -05:00