2940cc15b8 [fix] tokenized string destroy frees original string
Al
2015-09-19 01:40:41 -04:00
2b13871341 [constants] max country code length
Al
2015-09-19 01:39:58 -04:00
0396823772 [fix] geodb path separator
Al
2015-09-19 01:39:31 -04:00
17cfdb0625 [fix] adding char_array_append_* methods to header
Al
2015-09-18 13:19:42 -04:00
f2f7db92ff [fix] phrases
Al
2015-09-18 13:19:18 -04:00
b74e92adad [fix] include
Al
2015-09-18 13:18:49 -04:00
2a869894d9 [fix] geodb
Al
2015-09-18 13:18:26 -04:00
9e9131bda0 [parser] Averaged perceptron tagger
Al
2015-09-17 05:51:24 -04:00
8a86f7ec64 [parser] Adding context struct to feature function
Al
2015-09-17 05:48:00 -04:00
87ed7d9a0f [geodb] Adding trie search methods for finding geodb phrases
Al
2015-09-16 22:11:10 -04:00
e62c75b9c6 [phrases] Adding _with_phrases versions of address dictionary methods for pre-allocated phrases
Al
2015-09-16 21:24:23 -04:00
23103a21d4 [phrases] Adding with_phrases versions of trie search methods for pre-allocated phrases
Al
2015-09-16 21:23:34 -04:00
d5ec005787 [transliteration] Similar init method for transliteration
Al
2015-09-16 21:14:02 -04:00
b11362ab98 [numex] using module init method for building, otherwise passing NULL path uses the default path
Al
2015-09-16 21:13:05 -04:00
3cba2e8df3 [api] Using default setup methods for submodules in libpostal setup
Al
2015-09-15 14:01:33 -04:00
e122824448 [expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language
Al
2015-09-15 13:24:53 -04:00
c47ff1b113 [utils] Adding source string to tokenized_string struct
Al
2015-09-15 13:21:51 -04:00
b2f690b6f6 [api] Error logging if modules can't be found
Al
2015-09-15 13:21:11 -04:00
9de3029dd3 [parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize
Al
2015-09-14 17:38:45 -04:00
a5b5f80b04 [fix] new_copy
Al
2015-09-14 16:50:23 -04:00
3ea6358f77 [fix] vector zeros allocation
Al
2015-09-14 16:50:08 -04:00
c21f61b9b4 [parser] Default address parser path
Al
2015-09-11 15:05:07 -07:00
32c180528f [tokens] Adding a copy_tokens option for tokenized_string
Al
2015-09-11 15:03:09 -07:00
9ce658b7a3 [collections] Adding string_array for an array of char pointers
Al
2015-09-10 16:34:16 -07:00
35b9122a1a [utils] inlining a few functions
Al
2015-09-10 16:33:54 -07:00
35f1c02caf [polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately
Al
2015-09-10 12:44:13 -07:00
440a8158b6 [polygons] Adding in country languages for regional polygons without a default language
Al
2015-09-10 12:34:26 -07:00
22c16b43cf [languages] Italian is also the regional default in Valle D'Aosta and Trentino-Alto Adige
Al
2015-09-10 11:09:13 -07:00
fca7f21b1d [polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class
Al
2015-09-10 11:06:18 -07:00
6a5b01b51b [parser] Averaged perceptron training
Al
2015-09-10 10:25:52 -07:00
0ddf50cb5f [utils] add to feature array with printf syntax
Al
2015-09-10 10:24:51 -07:00
b3f89a207a [utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy
Al
2015-09-09 17:41:23 -07:00
c1da2fa94b [dictionaries] Adding 'Rang' to French dictionaries
Al
2015-09-09 17:21:26 -07:00
b85fe50fad [osm] Training data for toponyms only cares about valid languages for name field
Al
2015-09-08 16:38:05 -07:00
607a607b71 [doc] documentation fix for averaged perceptron
Al
2015-09-08 16:37:23 -07:00
c80d8b8067 [parsing] Averaged perceptron model data structure for storing the finalized, averaged, sparse weights
Al
2015-09-08 12:02:15 -07:00
8d642b45b9 [fix] trie was returning early on add_at_index and not incrementing the num_keys
Al
2015-09-08 11:41:46 -07:00
e566063343 [osm] Doing an all-to-nodes conversion and an additional filter on the borders data set
Al
2015-09-08 09:18:05 -07:00
ae7e30634b [features] Adding counter/bag-of-words representation of features
Al
2015-09-08 00:17:26 -07:00
49d389b9d8 [refactor] changing names in int-valued hash tables
Al
2015-09-08 00:15:14 -07:00
2fffd76af8 [fix] typo
Al
2015-09-07 23:58:34 -07:00
aa454c4430 [fix] removing char_array_copy from header
Al
2015-09-07 23:58:05 -07:00
3fd6552b44 [fix] void not void * in vector *_copy
Al
2015-09-07 23:57:44 -07:00
cddffdb65f [math] Adding column and row sums to sparse matrices
Al
2015-09-07 00:34:00 -07:00
8525529968 [osm] Not requiring qualified name tags to process OSM toponyms
Al
2015-09-06 21:03:01 -07:00
9d2ca08fc2 [utils] Adding _copy and _new_copy methods to vectors (the former copies data to a pre-allocated vector, the latter allocates a new vector)
Al
2015-09-06 20:46:29 -07:00
49fe504201 [math] Matrix get value at row, column index
Al
2015-09-06 12:37:10 -07:00
ec3ab7234a [utils] Adding index to cstring_array_foreach, similar to Python's enumerate
Al
2015-09-04 19:34:00 -04:00
df20e2cbc0 [osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language
Al
2015-09-04 14:13:26 -04:00
17fcfa8b59 [fix] adding house to ignore keys rather than aliasing it
Al
2015-09-04 12:40:08 -04:00
d64a27bc57 [osm] Converting relations to nodes in borders training data
Al
2015-09-04 12:32:25 -04:00
168b7f59da [fix] default indices in strip_component
Al
2015-09-04 12:29:47 -04:00
64db63e3eb [osm] Removing house tag
Al
2015-09-04 12:23:47 -04:00
6a20ce5e85 [language_id] Adding formatted addresses and toponyms to language training data
Al
2015-09-04 01:46:49 -04:00
294101ad80 [osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma)
Al
2015-09-03 17:46:57 -04:00
e1e5c16637 [osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity
Al
2015-09-03 16:50:30 -04:00
040a26a6f2 [fix] import
Al
2015-09-03 13:54:23 -04:00
7787427c58 [fix] typo
Al
2015-09-03 13:53:18 -04:00
23633e95dd [osm] Only adding country default language toponyms to training data
Al
2015-09-03 13:44:41 -04:00
11c01f64d2 [osm] OrderedDict of attrs in OSM training data
Al
2015-09-03 11:11:18 -04:00
27eb4e4aed [osm] Adding a toponym language training set using planet-borders.osm (all admin borders)
Al
2015-09-03 10:19:11 -04:00
db57855c95 [osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added
Al
2015-09-03 10:06:54 -04:00
a916668f28 [i18n] Local file for ISO 15924
Al
2015-09-01 23:58:36 -04:00
ee4d73c65d [math] sparse matrix I/O methods
Al
2015-09-01 00:29:11 -04:00
a8f6617294 [phrases] Adding num_keys attribute to trie
Al
2015-08-31 21:41:34 -04:00
aac5b37e76 [fix] Removing default dirent include
Al
2015-08-31 21:38:29 -04:00
bb50c7ea2c [math] Adding sigmoid and softmax functions
Al
2015-08-31 20:27:58 -04:00
a090a22bca [math] Adding compressed sparse row (CSR) format sparse matrix, designed for dynamic construction, just the methods needed for logistic regression for now i.e. no sparse dot products
Al
2015-08-31 15:52:28 -04:00
0f617454d3 [math] Dense matrices
Al
2015-08-31 14:57:11 -04:00
0ee72b8dfb [math] can only use memset for *_array_new_zeros
Al
2015-08-31 14:44:43 -04:00
c566eaecf1 [dictionaries] Rebuilding address expansion data and uploading new files to S3
Al
2015-08-31 14:33:28 -04:00
789150ae33 [math] Using regular C arrays instead of vectors for vector_math.h
Al
2015-08-30 02:41:31 -04:00
07b0bed602 [math] Only float vectors have *_array_log, *_array_exp, etc.
Al
2015-08-26 17:58:01 -04:00
a2ec8001b0 [osm] Removing postal code keys in formatted language training data
Al
2015-08-24 14:08:36 -04:00
8bbcb60aee [languages] Moving search_suffix and search_prefix into methods
Al
2015-08-24 14:04:28 -04:00
c68f56e61d [fix] paths
Al
2015-08-24 12:58:27 -04:00
d620cb6fc3 [fix] Calculating splits in Python rather than bash
Al
2015-08-24 12:47:51 -04:00