388 Commits

Author SHA1 Message Date
Al
b3f89a207a [utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy 2015-09-09 18:07:31 -07:00
Al
607a607b71 [doc] documentation fix for averaged perceptron 2015-09-08 16:37:23 -07:00
Al
c80d8b8067 [parsing] Averaged perceptron model data structure for storing the finalized, averaged, sparse weights 2015-09-08 12:42:54 -07:00
Al
8d642b45b9 [fix] trie was returning early on add_at_index and not incrementing the num_keys 2015-09-08 11:41:46 -07:00
Al
ae7e30634b [features] Adding counter/bag-of-words representation of features 2015-09-08 00:17:26 -07:00
Al
49d389b9d8 [refactor] changing names in int-valued hash tables 2015-09-08 00:15:14 -07:00
Al
2fffd76af8 [fix] typo 2015-09-07 23:58:34 -07:00
Al
aa454c4430 [fix] removing char_array_copy from header 2015-09-07 23:58:05 -07:00
Al
3fd6552b44 [fix] void not void * in vector *_copy 2015-09-07 23:57:44 -07:00
Al
cddffdb65f [math] Adding column and row sums to sparse matrices 2015-09-07 00:34:00 -07:00
Al
9d2ca08fc2 [utils] Adding _copy and _new_copy methods to vectors (the former copies data to a pre-allocated vector, the latter allocates a new vector) 2015-09-06 21:01:26 -07:00
Al
49fe504201 [math] Matrix get value at row, column index 2015-09-06 12:37:10 -07:00
Al
ec3ab7234a [utils] Adding index to cstring_array_foreach, similar to Python's enumerate 2015-09-04 19:34:06 -04:00
Al
ee4d73c65d [math] sparse matrix I/O methods 2015-09-01 00:29:11 -04:00
Al
a8f6617294 [phrases] Adding num_keys attribute to trie 2015-08-31 21:41:34 -04:00
Al
aac5b37e76 [fix] Removing default dirent include 2015-08-31 21:38:29 -04:00
Al
bb50c7ea2c [math] Adding sigmoid and softmax functions 2015-08-31 21:04:21 -04:00
Al
a090a22bca [math] Adding compressed sparse row (CSR) format sparse matrix, designed for dynamic construction, just the methods needed for logistic regression for now i.e. no sparse dot products 2015-08-31 16:42:41 -04:00
Al
0f617454d3 [math] Dense matrices 2015-08-31 14:57:11 -04:00
Al
0ee72b8dfb [math] can only use memset for *_array_new_zeros 2015-08-31 14:44:43 -04:00
Al
c566eaecf1 [dictionaries] Rebuilding address expansion data and uploading new files to S3 2015-08-31 14:33:28 -04:00
Al
789150ae33 [math] Using regular C arrays instead of vectors for vector_math.h 2015-08-30 02:41:31 -04:00
Al
07b0bed602 [math] Only float vectors have *_array_log, *_array_exp, etc. 2015-08-26 17:58:07 -04:00
Al
9464670174 [scripts] Regenerating unicode_scripts_data file 2015-08-13 18:27:23 -04:00
Al
66a71ab70d [normalize] Need to do a Latin-ASCII transliteration even if the string is entirely ASCII since it may contain HTML escapes 2015-08-11 23:36:08 -04:00
Al
87b275fcab [transliteration] Regenerating transliteration data file 2015-08-11 23:11:17 -04:00
Al
9712e0fa87 [fix] phrase start in transliteration 2015-08-11 23:09:49 -04:00
Al
562a7c243d [phrases] Fixing tail searches in trie_get_prefix* 2015-08-11 23:08:21 -04:00
Al
e98a822661 [build] Order-only dependencies for downloading data files, rm-ing the tarball when done extracting 2015-08-11 12:59:37 -04:00
Al
0028c2bc53 [build] Fixing tarball uploading 2015-08-11 03:18:35 -04:00
Al
f21b767696 [build] Adding tarball back to pkgdata 2015-08-10 18:44:40 -04:00
Al
c29cf5ac9a [api] Better handling of strings with multiple scripts and strings that use more than one transliterator. Reducing complexity/allocations 2015-08-10 17:51:41 -04:00
Al
4bc6adf669 [normalize] Adding the original script as an alternative in transliteration mode as well 2015-08-10 17:48:48 -04:00
Al
a13e5117b5 [utils] string_tree_num_strings method 2015-08-10 17:46:37 -04:00
Al
219947722d [cli] delete_word_hyphens as a default option 2015-08-10 16:19:54 -04:00
Al
78a80dd86e [api] Add separable or inseparable non-canonical string affixes (e.g. foobg. => fooburg, foostrasse => foostraße|foo straße, l'ensemble => l' ensemble, etc.) in expand_address 2015-08-10 16:19:03 -04:00
Al
de5d6945b5 [expansion] Adding search_address_dictionaries_prefix/suffix for concatenated prefixes/suffixes e.g. in Germanic languages. Adding a flag to the address_expansion struct and trie value to denote separability, adding prefix/suffix keys during dictionary creation 2015-08-10 16:15:01 -04:00
Al
0f77ca1213 [normalize] Adding a char_array version of normalize token 2015-08-10 16:11:34 -04:00
Al
064b6b5898 [utils] char_array_append_reversed for adding reversed strings without a malloc 2015-08-10 16:10:05 -04:00
Al
dab181a4d7 [fix] Only the exact TRIE_PREFIX_CHAR/TRIE_SUFFIX_CHAR characters are disallowed as keys 2015-08-10 16:09:10 -04:00
Al
e511eede74 [phrases] Prefix/suffix trie search using the new characters, fixing length of matched prefixes/suffixes and exiting early on falling off the trie 2015-08-10 16:02:38 -04:00
Al
51572d6575 [phrases] Changing prefix/suffix chars so both are control characters and neither is the NUL-byte. Modifying transliteration special characters accordingly 2015-08-10 16:01:22 -04:00
Al
11a9881988 [phrases] adding _from_index_get_prefix_char/_from_index_get_suffix_char methods 2015-08-09 03:41:20 -04:00
Al
2eb67ad850 [phrases] trie_search_prefixes/trie_search_suffixes now take a length param 2015-08-09 02:01:37 -04:00
Al
bbaa302e2e [fix] NUMEX_STOPWORD_RULE define 2015-08-09 01:03:23 -04:00
Al
5383640c14 [fix] cast 2015-08-09 01:01:11 -04:00
Al
dd391eabe5 [numex] Separating rules from keys for Linux gcc compilation 2015-08-09 01:00:57 -04:00
Al
e346b831cb [build] public-read permissions when uploading to S3 2015-08-09 00:17:04 -04:00
Al
ad584671c4 [build] Not compiling with -Werror for now 2015-08-09 00:02:41 -04:00
Al
423e2c86c7 [build] builder programs are now in noinst_PROGRAMS, Makefile target to upload data tarball to S3 (with proper credentials) 2015-08-08 23:29:34 -04:00