2514580611[dictionaries] Indonesian dictionaries to support new config
Yanuar Budi Baskoro
2017-05-19 18:44:32 +07:00
60cde05c3d[dictionaries] Indonesian dictionaries to support new config
Yanuar Budi Baskoro
2017-05-19 18:39:48 +07:00
87a76bf967Fix log_{debug,info} formats which expect size_t but receive int.
Iestyn Pryce
2017-05-17 22:40:53 +01:00
2a0fb69ae5Merge pull request #201 from iestynpryce/master
Al Barrentine
2017-05-14 20:53:15 -04:00
f34fc56fecFix log_debug formats which expect unsigned int but receive size_t
Iestyn Pryce
2017-05-14 17:48:26 +01:00
a7e67c4967[fix] adding maximum number of permutations for libpostal_expand_address to consider (n=100 for both the inner and outer loop, so max strings=10000), fixes #200
Al
2017-05-13 14:11:08 -04:00
5780a08b48[fix] check that possible ordinal suffix also has non-zero digit length before normalizing
Al
2017-05-12 15:48:20 -04:00
cea3ced533[fix] open files in binary format for #69
Al
2017-05-03 17:34:38 -04:00
6ea2273263[fix] terminate the char_array if input token is zero-length in add_normalized_token
Al
2017-04-28 11:25:07 -04:00
04eb2d4539Merge pull request #189 from openvenues/fix_trie_search
Al Barrentine
2017-04-21 14:39:03 -04:00
278679b7fb[fix] in tokenized trie_search, in the case of a partial failed match, reset to the root node before rolling the pointer back to phrase start + 1
Al
2017-04-21 13:51:07 -04:00
074b6ff802[auto][ci skip] Adding data files from Travis build #231
Travis
2017-04-20 02:39:39 +00:00
004d3d98c9Merge pull request #187 from openvenues/degree_symbol_ordinal_suffix
Al Barrentine
2017-04-19 22:29:10 -04:00
7bce358ca6[fix] whitespace in numex config to trigger build
Al
2017-04-19 21:14:54 -04:00
676fb9bcbc[fix] no parens in travis config grep for numex change detection
Al
2017-04-19 21:14:19 -04:00
86956db055[fix] adding numex change to trigger build
Al
2017-04-19 21:00:59 -04:00
e81580287d[test] adding tests for ordinal suffix normalization
Al
2017-04-19 20:59:36 -04:00
85297f3333[fix] numex change detection in Travis build
Al
2017-04-19 20:58:03 -04:00
4762ff2638[auto][ci skip] Adding data files from Travis build #228
Travis
2017-04-20 00:51:42 +00:00
e92c3c2867Merge pull request #186 from openvenues/degree_symbol_ordinal_suffix
Al Barrentine
2017-04-19 20:39:22 -04:00
f3adde746e[numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token
Al
2017-04-19 20:18:21 -04:00
19899b2f7d[dictionaries] adding degree symbol "°" variant for any surface forms that have "º"
Al
2017-04-19 19:25:25 -04:00
c968dd4ecc[numex] adding "°" as additional ordinal suffix for Spanish, Italian, and Portuguese
Al
2017-04-19 19:22:28 -04:00
254f3622eaMerge pull request #185 from Ironholds/master
Al Barrentine
2017-04-19 09:08:59 -04:00
18a5d06427Merge pull request #1 from Ironholds/Ironholds-patch-1
Oliver Keyes
2017-04-18 21:53:24 -07:00
35821f975eRemove unused variable
Oliver Keyes
2017-04-18 21:25:00 -07:00
e0c82b5edbMerge pull request #184 from openvenues/remove_ordinal_suffix
Al Barrentine
2017-04-18 22:33:00 -04:00
9cd3ec37f9[build] rebuild numex table in Travis if either the configs change or numex_table_builder.c changes
Al
2017-04-18 21:42:01 -04:00
f3cf119e58[build] Makefile changes to support moving numeric expression parsing to normalize.c
Al
2017-04-18 21:41:24 -04:00
cddc368533[numex] adding one form of normalization which strips ordinal suffixes so {96th, Ninety-sixth} => 96. This is an additional form of normalization, so there's still one form where the suffixes are kept. One case that's still not handled is something like "IXe Arrondissement"
Al
2017-04-18 21:39:54 -04:00
92051863ba[numex] adding ordinal suffixes themselves to the numex trie so they can be removed from strings
Al
2017-04-18 17:20:02 -04:00
63ac3cf921Merge pull request #183 from openvenues/cdn
Al Barrentine
2017-04-17 14:39:35 -04:00
d2732922c2[data] deployed model files and training data to CloudFront for easier downloading around the world and in places like China where the Great Firewall may prevent large downloads from abroad. TTL is set to 0 so it still caches the files themselves but checks with origin for the If-Modified-Since headers, allowing the files to be updated dynamically
Al
2017-04-17 14:11:44 -04:00
5699ef3da0Merge pull request #181 from eefi/bug/various/initializer
Al Barrentine
2017-04-13 16:22:33 -04:00
413c584f08[fix] need to set prev_state to the NULL state in numex parsing after a non-space/non-hyphen is encountered and the previous match, if any, is added to the result array
Al
2017-04-13 16:01:46 -04:00
f9b57dbd42[fix] don't use unnamed fields in initializers
Austin Chu
2017-04-13 14:33:50 -04:00
7bef84676eMerge pull request #180 from eefi/bug/tagger/include-guard
Al Barrentine
2017-04-13 13:58:13 -04:00
32c8662f8dMerge pull request #177 from eefi/bug/matrix/clbas
Al Barrentine
2017-04-12 20:58:00 -04:00
19a04511ba[fix] typo in compiler warning when no CBLAS found
Austin Chu
2017-04-12 20:40:08 -04:00
b464eb6c07[numex] fix numex parsing when the spelled-out number is followed by a comma or other punctuation
Al
2017-04-11 16:28:33 -04:00
fc91471434[osm/boundaries] check polygons with an ISO3166-2 as well in the country polygon index in case the country polygon is funky
Al
2017-04-09 02:15:42 -04:00
4ecd6c23c6[formatting] removing the ability to insert city between house number and road in France from discussion in #27
Al
2017-04-08 15:42:59 -04:00
7f7aada32a[build] add another housekeeping file in the datadir for data_version. Blow away the existing files if that file either doesn't exist or doesn't contain a matching version string to help with upgrades
Al
2017-04-07 17:40:27 -04:00
4f9b0ef495[docs][ci skip] adding note about using libpostal on mobile
Al
2017-04-07 00:55:39 -04:00
6984427eb9[docs][ci skip] add link to the 1.0 blog post
Al
2017-04-06 13:19:45 -04:00
5605ba3185[docs] adding note about the new language classifier trained with FTRL-Proximal (now 1/10th the size), which keeps its high accuracy while maintaining a sparse solution. This commit will trigger a build with the freshly uploaded model.
Al
2017-04-06 11:43:54 -04:00
5a96be5d5c[fix][ci skip] S3 upload paths in data upload/download script
Al
2017-04-06 00:37:09 -04:00
d8409f1f38[auto][ci skip] Adding data files from Travis build #210
Travis
2017-04-06 04:06:16 +00:00
918342d4c3Merge pull request #171 from openvenues/parser-data
Al Barrentine
2017-04-05 23:51:27 -04:00
c01e67c1e4[fix] removing one of the warnings about C90 since this is entirely C99.
Al
2017-04-05 14:51:18 -04:00
caebf4e2c9[classification] correcting cost functions in SGD and FTRL for use in parameter sweeps
Al
2017-04-05 14:08:51 -04:00
6219cc6378[numex] add dehyphenated form when building numex table
Al
2017-04-05 14:06:19 -04:00
264866d719[build/fix] autoconf syntax for Ubuntu (12.04) version of autoconf aka that used on Travis
Al
2017-04-05 09:43:24 -04:00
ef0d4c2ded[build] fixing checks in numex.py, run when the resources/numex directory changes
Al
2017-04-05 08:53:48 -04:00
0ec2e57afa[fix] adding yaml to requirements-simple.txt for CI
Al
2017-04-05 08:33:39 -04:00
64fae1e241[fix] /AC_CONFIG_MACRO_DIRS/AC_CONFIG_MACRO_DIR/
Al
2017-04-05 08:27:44 -04:00
2b3fb196a1[build] add pkg-config to packages in Travis config, remove libsnappy-dev
Al
2017-04-05 08:24:26 -04:00
8cef3c4eb9[docs] new parser GIF, featuring addresses relevant to current events
Al
2017-04-05 07:21:48 -04:00
aaae1e055e[docs] fix spacing
Al
2017-04-05 02:03:39 -04:00
9c7eac61eb[docs] merge README from master, move bindings below examples
Al
2017-04-05 02:02:59 -04:00
8ec6e546f5[test] adding more tests from the demo
Al
2017-04-04 20:50:19 -04:00
22443e31cc[parser] removing special commands other than .exit from address_parser_cli
Al
2017-04-04 20:49:37 -04:00
8742574257[parser] storing address_parser_context on the parser struct itself so it doesn't have to be allocated every time
Al
2017-04-04 20:40:55 -04:00
67157fbd98[docs] moving blog post to first paragraph
Al
2017-04-03 21:04:37 -04:00
b8f65d0a06[docs] aesthetic README changes
Al
2017-04-03 18:18:02 -04:00
f746c6eec6[openaddresses] Sampson and Yadkin counties, NC, and Union County, SC
Al
2017-04-03 18:08:55 -04:00
bca449e653[openaddresses] Rowan County, NC
Al
2017-04-03 17:57:03 -04:00
6102fd3459[openaddresses] Carteret County, NC
Al
2017-04-03 16:55:21 -04:00
342740c3a6[openaddresses] Bladen County, NC
Al
2017-04-03 16:53:43 -04:00
7c67ca6edb[openaddresses] Beaufort County, NC
Al
2017-04-03 16:52:15 -04:00
680a2e6357[openaddresses] city of Ruidoso, NM
Al
2017-04-03 16:50:27 -04:00
921e635b7a[openaddresses] add Caddo Parish, LA
Al
2017-04-03 16:48:30 -04:00
e0dc0c9b86[openaddresses] add DeSoto County, FL
Al
2017-04-03 16:45:56 -04:00
20adc591a8[openaddresses] adding OSM boundaries to Clear Creek County, CO as new data set doesn't list city
Al
2017-04-03 16:38:50 -04:00
4b16b5bccd[docs] README fixes
Al
2017-04-03 16:35:48 -04:00
97ffdbaee0[openaddresses] removing Lawrence County, SD. Covered by new statewide and has some weird addresses
Al
2017-04-03 16:16:52 -04:00
e4290a489f[openaddresses] Fall River County, SD
Al
2017-04-03 16:15:21 -04:00
c3a6445290[docs] README updates for 1.0 release, adding training data section
Al
2017-04-03 15:59:01 -04:00
65a0d82bda[openaddresses] moving Buenos Aires, adding Boulder County, CO
Al
2017-04-03 13:08:31 -04:00
eff7a7a27a[optimization] moving regularization methods to their own module
Al
2017-04-03 00:16:30 -04:00
957aa0c0c9[utils] cartesian product iterator for grid search during model selection
Al
2017-04-03 00:15:31 -04:00
4a72afc712[build] Makefile changes for new language_classifier_train
Al
2017-04-02 23:55:31 -04:00
378a11c88f[fix] expansion array destroy API in libpostal expand program
Al
2017-04-02 23:55:04 -04:00
c5e2f89ee9[fix] declaring is_common_script function as static
Al
2017-04-02 23:53:21 -04:00
5dfdd4b7eb[language_classification] Runtime language classifier can now use dense or sparse weights, with a different header signature for the sparse version (using old signature for the dense version, so backward-compatible)
Al
2017-04-02 23:51:54 -04:00
835d851310[log] log the offending line if token count does not match in language_classifier_io
Al
2017-04-02 23:47:07 -04:00
964ac15e51[language_classification] adding options to language_classifier_train for using SGD with {L2, L1} regularization or FTRL-Proximal using both.
Al
2017-04-02 23:33:58 -04:00
58661c9f27[languages] adding replace_hyphens and split_alpha_from_numeric in language classifier input normalization
Al
2017-04-02 23:32:24 -04:00