Al | 07f1f361e2 | [transliteration] Regenerating transliteration data with new categories | 2015-09-26 00:07:39 -04:00 |
Al | 172263af58 | [tokenization] Adding updated token classes to scanner.re | 2015-09-26 00:05:23 -04:00 |
Al | 5a6b47d0fd | [api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h | 2015-09-25 01:53:29 -04:00 |
Al | accd8a57e7 | [expansion] Regenerating expansion data | 2015-09-24 16:38:20 -04:00 |
Al | f6c30778bf | [normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling. | 2015-09-23 19:41:01 -04:00 |
Al | a1d272077d | [doc] Averaged perceptron tagger | 2015-09-23 19:37:55 -04:00 |
Al | 4a0da67aa1 | [fix] warning | 2015-09-23 04:06:54 -04:00 |
Al | 88bd0cd158 | [unicode] better segmentation on script breaks | 2015-09-23 04:06:34 -04:00 |
Al | 377c947541 | [transliteration] Regenerating transliteration data files | 2015-09-23 04:04:38 -04:00 |
Al | 19e5457a0f | [unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness | 2015-09-23 00:36:29 -04:00 |
Al | 4ad3fac627 | [unicode] Regenerated unicode script types (ignore extraneous scripts, they're not used, just reside in the upper unicode planes) | 2015-09-23 00:35:08 -04:00 |
Al | f13e9fad90 | [tokenization] Regenerated scanner.c | 2015-09-23 00:33:27 -04:00 |
Al | b4593b6f88 | [unicode/tokenization] Using new character classes including wide chars in scanner | 2015-09-23 00:33:14 -04:00 |
Al | cffa5a4a20 | [fix] stdint include | 2015-09-20 20:10:47 -04:00 |
Al | 3fab0f984f | [fix] fixing some compiler warnings, using type-specific abs functions for vector_math | 2015-09-19 16:11:09 -04:00 |
Al | 2940cc15b8 | [fix] tokenized string destroy frees original string | 2015-09-19 01:40:41 -04:00 |
Al | 2b13871341 | [constants] max country code length | 2015-09-19 01:39:58 -04:00 |
Al | 0396823772 | [fix] geodb path separator | 2015-09-19 01:39:31 -04:00 |
Al | 17cfdb0625 | [fix] adding char_array_append_* methods to header | 2015-09-18 13:19:42 -04:00 |
Al | f2f7db92ff | [fix] phrases | 2015-09-18 13:19:18 -04:00 |
Al | b74e92adad | [fix] include | 2015-09-18 13:18:49 -04:00 |
Al | 2a869894d9 | [fix] geodb | 2015-09-18 13:18:26 -04:00 |
Al | 9e9131bda0 | [parser] Averaged perceptron tagger | 2015-09-17 05:51:24 -04:00 |
Al | 8a86f7ec64 | [parser] Adding context struct to feature function | 2015-09-17 05:48:00 -04:00 |
Al | 87ed7d9a0f | [geodb] Adding trie search methods for finding geodb phrases | 2015-09-16 22:11:10 -04:00 |
Al | e62c75b9c6 | [phrases] Adding _with_phrases versions of address dictionary methods for pre-allocated phrases | 2015-09-16 21:24:28 -04:00 |
Al | 23103a21d4 | [phrases] Adding with_phrases versions of trie search methods for pre-allocated phrases | 2015-09-16 21:23:34 -04:00 |
Al | d5ec005787 | [transliteration] Similar init method for transliteration | 2015-09-16 21:14:02 -04:00 |
Al | b11362ab98 | [numex] using module init method for building, otherwise passing NULL path uses the default path | 2015-09-16 21:13:05 -04:00 |
Al | 3cba2e8df3 | [api] Using default setup methods for submodules in libpostal setup | 2015-09-15 14:01:33 -04:00 |
Al | e122824448 | [expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language | 2015-09-15 14:00:26 -04:00 |
Al | c47ff1b113 | [utils] Adding source string to tokenized_string struct | 2015-09-15 13:21:51 -04:00 |
Al | b2f690b6f6 | [api] Error logging if modules can't be found | 2015-09-15 13:21:15 -04:00 |
Al | 9de3029dd3 | [parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize | 2015-09-14 17:38:45 -04:00 |
Al | a5b5f80b04 | [fix] new_copy | 2015-09-14 16:50:23 -04:00 |
Al | 3ea6358f77 | [fix] vector zeros allocation | 2015-09-14 16:50:08 -04:00 |
Al | c21f61b9b4 | [parser] Default address parser path | 2015-09-11 15:05:38 -07:00 |
Al | 32c180528f | [tokens] Adding a copy_tokens option for tokenized_string | 2015-09-11 15:04:29 -07:00 |
Al | 9ce658b7a3 | [collections] Adding string_array for an array of char pointers | 2015-09-10 16:34:16 -07:00 |
Al | 35b9122a1a | [utils] inlining a few functions | 2015-09-10 16:33:54 -07:00 |
Al | 6a5b01b51b | [parser] Averaged perceptron training | 2015-09-10 10:26:24 -07:00 |
Al | 0ddf50cb5f | [utils] add to feature array with printf syntax | 2015-09-10 10:24:51 -07:00 |
Al | b3f89a207a | [utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy | 2015-09-09 18:07:31 -07:00 |
Al | 607a607b71 | [doc] documentation fix for averaged perceptron | 2015-09-08 16:37:23 -07:00 |
Al | c80d8b8067 | [parsing] Averaged perceptron model data structure for storing the finalized, averaged, sparse weights | 2015-09-08 12:42:54 -07:00 |
Al | 8d642b45b9 | [fix] trie was returning early on add_at_index and not incrementing the num_keys | 2015-09-08 11:41:46 -07:00 |
Al | ae7e30634b | [features] Adding counter/bag-of-words representation of features | 2015-09-08 00:17:26 -07:00 |
Al | 49d389b9d8 | [refactor] changing names in int-valued hash tables | 2015-09-08 00:15:14 -07:00 |
Al | 2fffd76af8 | [fix] typo | 2015-09-07 23:58:34 -07:00 |
Al | aa454c4430 | [fix] removing char_array_copy from header | 2015-09-07 23:58:05 -07:00 |