Commit Graph

845 Commits

Author SHA1 Message Date
Al
4acf10c3a4 [classification] Multinomial logistic regression, gradient and cost function 2016-01-08 01:03:09 -05:00
Al
8b70529711 [optimization] Stochastic gradient descent with gain schedule a la Leon Bottou 2016-01-08 00:54:17 -05:00
Al
6b164d263e [math] Sparse matrix from dense 2016-01-08 00:48:57 -05:00
Al
ba8fc716df [features] Functions for dealing with minibatches 2016-01-08 00:48:11 -05:00
Al
06638d2885 [fix] only strdup when necessary in feature counting functions 2016-01-08 00:46:41 -05:00
Al
31a3a2a3fa [math] Matrix scalar arithmetic functions 2016-01-08 00:44:33 -05:00
Al
b6ce94166b [sparse] Only increase size of sparse matrix on finalize row if it needs to be 2016-01-07 13:19:22 -05:00
Al
2e67afab09 [fix] adding functions to string_utils header 2016-01-06 23:03:16 -05:00
Al
a8b9a2c153 [fix] making *_hash_sort_keys_by_value static 2016-01-06 23:01:00 -05:00
Al
0d5cf0d6d7 [utils] char_array_cat_printf was forcing a doubling of the size of the buffer, which is bad if calling many times. Now only initiates a realloc if the char_array is almost full. Also adding cstring_array_from_strings which takes a list of char *s 2016-01-06 22:56:01 -05:00
Al
8c019998d7 [phrases] trie_num_keys 2016-01-05 22:02:15 -05:00
Al
22668945cb [mv] Moving trie_new_from_hash to a module 2016-01-05 16:43:17 -05:00
Al
33e9a05ebf [tokenization] is_whitespace 2016-01-05 16:40:35 -05:00
Al
6e1435ac48 [features] No copy versions of feature counts functions 2016-01-05 16:39:50 -05:00
Al
a740417cab [utils] Adding hash sort by values for numeric types 2016-01-05 14:47:48 -05:00
Al
6ef7c90278 [fix] using string_equals, handles NULLs 2016-01-05 14:08:10 -05:00
Al
c0214d6023 [fix] free normalized string in address parser data set 2016-01-05 14:06:03 -05:00
Al
6a5ad96a17 [math] Adding vector sort and vector argsort to numeric vectors 2016-01-05 14:05:27 -05:00
Al
7aea79281e [math] Floating point equality with relative epsilon comparisons 2016-01-02 15:39:49 -05:00
Al
780966a59b [api] More spacing fixes and using language information in normalize string 2015-12-31 03:52:14 -05:00
Al
ff75c5cc50 [normalize] Adding normalize_string_languages method which can use additional transliterators 2015-12-31 03:50:36 -05:00
Al
9335d26fbd [fix] spacing 2015-12-31 02:26:28 -05:00
Al
1b0567a881 [fix] Ubuntu build 2015-12-28 17:19:50 -05:00
Al
77ccd975c4 [fix] #endif 2015-12-28 17:03:12 -05:00
Al
d0b5985cb7 [build] Adding /usr/local/lib and /usr/local/include to sparkey build 2015-12-28 16:56:10 -05:00
Al
45b5e2dd6f [fix] array_zero 2015-12-28 01:24:27 -05:00
Al
fb4c984f15 [math] sparse_matrix_new_shape 2015-12-28 01:20:23 -05:00
Al
72ad01cbc3 [features] Using a str=>double hashtable for feature counts 2015-12-28 01:18:49 -05:00
Al
e4dba2297d [mv] Moving token type checking to header 2015-12-28 01:17:33 -05:00
Al
0fa1c2389c [fix] Leak in expanding strings that have a separable prefix and suffix; other than that, ran through 78 million expansions with no discernible memory issues 2015-12-26 17:19:59 -05:00
Al
deeb8f007e [fix] Check for result.len > 0 in false start continuation numex parsing, plus additional safety check during replacement 2015-12-24 02:26:53 -05:00
Al
507dd631f8 [build] Adding json_encode.c to the address parser client sources 2015-12-23 19:37:28 -05:00
Al
5e6d24ff7e [unicode] Upgrading to latest utf8proc from JuliaLang (Unicode 8) 2015-12-23 19:33:09 -05:00
Al
3fbb3c587a [fix] using a char_array instead of copying the string in normalize_string 2015-12-23 19:21:54 -05:00
Al
2eea999692 [fix] Fixing false start continuations in numex parsing 2015-12-23 19:19:14 -05:00
Al
850d82de6e [fix] In trie search, moving fall-off and tail checks inside the inner character loop, adding tail position as a separate variable from offset in the string 2015-12-23 19:16:43 -05:00
Al
19173d3a6e [transliteration] In set match checks, use the current index, not current index - char_len 2015-12-23 13:12:30 -05:00
Al
e9e05bb929 [transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules 2015-12-23 13:07:44 -05:00
Al
aaa1fc0387 [fix] Stepping through codepoints first then through chars in trie_search_prefixes_from_index (used in transliteration and numex) 2015-12-23 01:58:39 -05:00
Al
baa8e3cc3f [fix] Compare the remaining part of the current UTF-8 character using simple string comparison, since it may be in the middle of a valid UTF-8 character 2015-12-21 20:34:15 -05:00
Al
ceda863e9f [fix] Encode strings as JSON in address parser cli 2015-12-21 17:45:09 -05:00
Al
e55ff54be1 [fix] Adding Korean-Latin-BGN to excluded transliterators 2015-12-21 16:24:50 -05:00
Al
c7fb7f685d [transliteration] Fixing group replacement in transliteration in the case of multiple groups, not adding to phrase length when checking context 2015-12-21 16:06:04 -05:00
Al
ab124465e6 [fix] regenerating transliteration data 2015-12-20 15:41:42 -05:00
Al
5439f4679f [fix] Special tokens like emails/urls/phone numbers bypass normalization 2015-12-20 03:07:36 -05:00
Al
cf2a0efa11 [fix] Prefixes and suffixes that are the same length as the original token should be handled as regular expansions 2015-12-19 17:29:26 -05:00
Al
aaecd7961a [fix] Options out of order 2015-12-19 15:05:50 -05:00
Al
48cb2b5c7b [api] Node was complaining about non-trivial designated initializers (probably the bit fields), so converting to old-school initializer 2015-12-19 02:34:31 -05:00
Al
97906c86a8 [fix] Strip punctuation in final output in cases where there are no expansions 2015-12-19 02:10:41 -05:00
Al
4497c4501e [fix] do not add a token if prefix/suffix expansions are inseparable and canonical 2015-12-19 01:36:02 -05:00