Al | e511eede74 | [phrases] Prefix/suffix trie search using the new characters, fixing length of matched prefixes/suffixes and exiting early on falling off the trie | 2015-08-10 16:02:38 -04:00
Al | 11a9881988 | [phrases] adding _from_index_get_prefix_char/_from_index_get_suffix_char methods | 2015-08-09 03:41:20 -04:00
Al | 2eb67ad850 | [phrases] trie_search_prefixes/trie_search_suffixes now take a length param | 2015-08-09 02:01:37 -04:00
Al | 5acf7a4f3e | [phrases] resetting node position when continuation falls off the trie | 2015-08-08 22:18:05 -04:00
Al | b27030e39f | [fix] tokenized trie search was skipping tokens in some cases | 2015-08-02 14:36:21 -06:00
Al | 0f5b69c06b | [fix] transition to SEARCH_STATE_NO_MATCH in trie_search_tokens_from_index on a return to the start node | 2015-07-27 16:35:27 -04:00
Al | 8ff4ace63b | [phrases] Allowing trie_search to process tokenized input with or without whitespace, and to handle ideographic characters correctly | 2015-07-26 23:41:57 -04:00
Al | 90a91cadd0 | [search] Modifying trie_search_prefixes to use the new key schema | 2015-07-24 15:59:49 -04:00
Al | 9337bf9aea | [phrases] trie_search_suffixes uses the NUL-byte prefix by default, but the _from_index version can start from another node. Fixing single-character suffixes | 2015-06-25 17:24:19 -04:00
Al | c159f83f9b | [fix] trie_search logging | 2015-06-12 16:17:41 -04:00
Al | 6b60446dbe | [phrases] no longer ignoring spaces in the input string, just trying different methods for hyphens; getting indexes right when a space or hyphen precedes the match; backtracking on matches if the rest of the string falls off the trie | 2015-06-12 11:30:24 -04:00
Al | 6841ed8fb3 | [phrases] Ignoring separators and dashes in trie_search_prefixes so it can be used for languages like German where numbers, phrases, etc. may just be concatenated together as a single token | 2015-06-11 11:05:56 -04:00
Al | cb603562e0 | [phrases] Adding *_from_index methods to trie_search | 2015-06-09 11:14:42 -04:00
Al | 2856c2b401 | [utils] string_utils category functions take a category instead of a codepoint | 2015-06-05 16:55:21 -04:00
Al | 0177fd4b13 | [fix] trie_search using proper length in utf8proc_iterate | 2015-05-27 16:08:19 -04:00
Al | eecee39904 | [fix] giving constant trie node names more specificity | 2015-05-18 14:24:39 -04:00
Al | 1373843b86 | [fix] setting last_node in tokenized trie search in the case where a prefix phrase matches but the longer string doesn't | 2015-04-27 01:49:08 -04:00
Al | 908e3dc03c | [phrases] trie_search now only takes the original string and the token array. Fixed a bug where certain phrases were being found in string search but not in tokenized search | 2015-04-19 09:32:20 -04:00
Al | 79fd7a8ded | [tokenization/trie] simpler URL regex reduces the scanner file size; accounting for a few more variations in word tokens; making trie suffix search use iteration instead of malloc'ing a new string | 2015-04-05 16:33:14 -04:00
Al | 310acbed2c | [phrases] Adding prefix-only trie searches, primarily with Germanic languages in mind (spelled-out numbers, concatenated prefixes). Making the prefix/suffix APIs for single tokens more consistent with trie searches over longer strings/token arrays | 2015-04-01 02:52:57 -04:00
Al | 5dd3896c4a | [phrases] trie_search module for searching for millions of patterns in a trie simultaneously. Works for strings, token sequences, and can search for suffixes. | 2015-03-03 13:51:01 -05:00