Al
|
a446290829
|
[fix] IDEOGRAM class name
|
2015-03-11 17:33:53 -04:00 |
|
Al
|
a5f7c73374
|
[utils] is_relative_path
|
2015-03-11 17:31:08 -04:00 |
|
Al
|
5157a0fd8b
|
[utils] float and double arrays in collections.h
|
2015-03-11 17:30:26 -04:00 |
|
Al
|
94805fb1a7
|
[tokenization] Better scanner support for ideographic languages (Chinese, Japanese, Korean, etc.) with an IDEOGRAM token class in the scanner so we know when we're dealing with those languages vs. other random characters
|
2015-03-11 17:29:37 -04:00 |
|
Al
|
1dc0b8e07b
|
[dictionaries] Catalan dictionaries
|
2015-03-08 17:57:08 -04:00 |
|
Al
|
fce693a6b3
|
[dictionaries] additions to Portuguese dictionaries
|
2015-03-08 17:56:38 -04:00 |
|
Al
|
642d3697d4
|
[dictionaries] additions to German dictionaries, including a separable prefix dictionary
|
2015-03-08 17:55:57 -04:00 |
|
Al
|
38ec03bf2b
|
[phrases] default constructor for a trie uses a default alphabet derived from Wikipedia character frequencies for convenience. In practice the alphabet size/ordering matters only for very small tries or specialized alphabets. Mostly just use trie_new()
|
2015-03-05 13:40:52 -05:00 |
|
Al
|
939c3af293
|
[dictionaries] gazetteers.h has the config for in-memory dictionaries' directory structure
|
2015-03-04 16:01:16 -05:00 |
|
Al
|
7985a93963
|
[mv] Dutch concatenated suffixes
|
2015-03-04 01:21:37 -05:00 |
|
Al
|
b4bddfb510
|
[project] Making a work-in-progress note in the README
|
2015-03-03 23:35:15 -05:00 |
|
Al
|
163d8b7143
|
[dictionaries] first/last names apply to all languages. English gazetteers may potentially be used as a backup for all countries (most countries with non-Latin scripts transliterate, some actually translate the street name, usually to English)
|
2015-03-03 23:31:43 -05:00 |
|
Al
|
6d9c6a6fe7
|
[utils] geohash
|
2015-03-03 18:51:49 -05:00 |
|
Al
|
31910bd7b0
|
[dictionaries] fix for Dutch concatenated suffixes
|
2015-03-03 18:51:11 -05:00 |
|
Al
|
d5c14ca068
|
[dictionaries] Portuguese dictionaries
|
2015-03-03 18:46:44 -05:00 |
|
Al
|
fca161b2db
|
[dictionaries] Dutch dictionaries
|
2015-03-03 18:46:26 -05:00 |
|
Al
|
b058e9e950
|
[dictionaries] Italian dictionaries
|
2015-03-03 18:46:11 -05:00 |
|
Al
|
837557ce97
|
[dictionaries] German dictionaries (including concatenated suffixes)
|
2015-03-03 18:45:42 -05:00 |
|
Al
|
c0c6ec5b85
|
[dictionaries] French dictionaries
|
2015-03-03 18:45:21 -05:00 |
|
Al
|
99816f55b1
|
[dictionaries] Spanish dictionaries
|
2015-03-03 18:45:04 -05:00 |
|
Al
|
ff55b2eace
|
[dictionaries] English dictionaries
|
2015-03-03 18:44:36 -05:00 |
|
Al
|
5dd3896c4a
|
[phrases] trie_search module for searching for millions of patterns in a trie simultaneously. Works for strings, token sequences, and can search for suffixes.
|
2015-03-03 13:51:01 -05:00 |
|
Al
|
10777ce973
|
[fix] debug logging only in trie.c
|
2015-03-03 13:28:43 -05:00 |
|
Al
|
585baab0a5
|
[phrases] optimized implementation of a double-array trie for storing millions of phrases compactly while being extremely quick to access. Supports utf-8, stores phrase tails in a contiguous character array separated by NUL bytes and stores offsets only so the chars at that offset can be treated as a regular C string and fed to things like strncmp. Also stores suffixes (primarily for languages like German, Dutch, etc. that concatenate street names e.g. Foobarstraße, Foobarweg) by prefixing the reversed string with the NUL byte and storing it backward in the trie, so it can search forward and backward with the same data structure.
|
2015-03-03 13:18:18 -05:00 |
|
Al
|
3ed5795cff
|
[fix] fixing some formatting
|
2015-03-03 12:54:27 -05:00 |
|
Al
|
087328c321
|
[utils] logging
|
2015-03-03 12:38:10 -05:00 |
|
Al
|
09552906d3
|
[utils] util headers
|
2015-03-03 12:37:32 -05:00 |
|
Al
|
0689f936c9
|
[tokenization] scanner/tokenizer (generated with re2c)
|
2015-03-03 12:35:22 -05:00 |
|
Al
|
5216aba1b6
|
[utils] string utils, file utils, contiguous arrays of strings used for storing tokenized strings, klib for generic hashtables and vectors, antirez's sds for certain types of string building, utf8proc for iterating over utf-8 strings and unicode normalization
|
2015-03-03 12:33:13 -05:00 |
|
Al Barrentine
|
27269e18ca
|
Initial commit
|
2015-03-02 19:21:31 -05:00 |
|
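The trie commit above (585baab0a5) describes two storage ideas: phrase tails kept in one contiguous character array separated by NUL bytes and addressed by offset, and suffixes stored as the reversed string prefixed with a NUL byte. The following is a minimal sketch of those two ideas only, not the project's actual code. The names `tail_pool`, `tail_pool_add`, `tail_pool_get`, and `make_suffix_key` are hypothetical, and the reversal here works byte by byte, which is a simplification that assumes queries are reversed the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tail pool: every tail lives in one contiguous buffer,
 * each terminated by a NUL byte. Callers keep only the offset. */
typedef struct {
    char *data;
    size_t len; /* bytes in use */
    size_t cap; /* bytes allocated */
} tail_pool;

static void tail_pool_init(tail_pool *p) {
    p->data = NULL;
    p->len = 0;
    p->cap = 0;
}

/* Append s, including its terminating NUL, and return the offset where it
 * starts. Returns (size_t)-1 if memory cannot be allocated. */
static size_t tail_pool_add(tail_pool *p, const char *s) {
    size_t n = strlen(s) + 1;
    if (p->len + n > p->cap) {
        size_t cap = p->cap ? p->cap : 16;
        while (cap < p->len + n) cap *= 2;
        char *d = realloc(p->data, cap);
        if (d == NULL) return (size_t)-1;
        p->data = d;
        p->cap = cap;
    }
    size_t off = p->len;
    memcpy(p->data + off, s, n);
    p->len += n;
    return off;
}

/* The bytes at an offset form an ordinary NUL-terminated C string, so the
 * result can be passed straight to strcmp or strncmp. */
static const char *tail_pool_get(const tail_pool *p, size_t off) {
    assert(off < p->len);
    return p->data + off;
}

static void tail_pool_free(tail_pool *p) {
    free(p->data);
    tail_pool_init(p);
}

/* Build a suffix key: a leading NUL marker followed by the bytes of s in
 * reverse order. out must have room for len + 1 bytes. Returns the key
 * length. The key contains an embedded NUL, so compare it with memcmp. */
static size_t make_suffix_key(const char *s, size_t len, char *out) {
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        out[1 + i] = s[len - 1 - i];
    }
    return len + 1;
}
```

Storing offsets instead of pointers keeps the references valid when the buffer is reallocated. The leading NUL on suffix keys keeps them in a separate branch of the key space from forward phrases, which is what would let one structure serve both search directions.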