Commit Graph

  • 73d27caeb9 Fix log_* formats which expect long long uint but receive uint64_t. Iestyn Pryce 2017-05-21 10:57:20 +01:00
  • 695756d484 [dictionaries] add more option on toponyms Yanuar Budi Baskoro 2017-05-21 16:56:14 +07:00
  • 0c3ef33682 Merge latest https://github.com/openvenues/libpostal Iestyn Pryce 2017-05-21 10:30:27 +01:00
  • 6aa3cb61fd Fix log_* formats which expect long long int but receive int64_t. Iestyn Pryce 2017-05-21 10:29:34 +01:00
  • 03be9eea49 [dictionaries] Remove additional english words from ID dictionary Yanuar Budi Baskoro 2017-05-21 15:58:02 +07:00
  • 09cb28cb14 [dictionaries] Remove english words from ID dictionary Yanuar Budi Baskoro 2017-05-21 15:39:47 +07:00
  • b79934394a Merge pull request #204 from iestynpryce/master Al Barrentine 2017-05-20 21:28:28 -04:00
  • ecd07b18c1 Fix log_* formats which expect size_t but receive uint32_t. Iestyn Pryce 2017-05-19 22:31:56 +01:00
  • 3b2fb597fe [dictionaries] Fix blank synonym in numbers Yanuar Budi Baskoro 2017-05-20 01:04:12 +07:00
  • 7f14dafd21 [dictionaries] Fix blank synonym in academic degrees Yanuar Budi Baskoro 2017-05-20 01:00:28 +07:00
  • 2514580611 [dictionaries] Indonesian dictionaries to support new config Yanuar Budi Baskoro 2017-05-19 18:44:32 +07:00
  • 60cde05c3d [dictionaries] Indonesian dictionaries to support new config Yanuar Budi Baskoro 2017-05-19 18:39:48 +07:00
  • 87a76bf967 Fix log_{debug,info} formats which expect size_t but receive int. Iestyn Pryce 2017-05-17 22:40:53 +01:00
  • 2a0fb69ae5 Merge pull request #201 from iestynpryce/master Al Barrentine 2017-05-14 20:53:15 -04:00
  • f34fc56fec Fix log_debug formats which expect unsigned int but receive size_t Iestyn Pryce 2017-05-14 17:48:26 +01:00
  • a7e67c4967 [fix] adding maximum number of permutations for libpostal_expand_address to consider (n=100 for both the inner and outer loop, so max strings=10000), fixes #200 Al 2017-05-13 14:11:08 -04:00
  • 5780a08b48 [fix] check that possible ordinal suffix also has non-zero digit length before normalizing Al 2017-05-12 15:48:20 -04:00
  • cea3ced533 [fix] open files in binary format for #69 Al 2017-05-03 17:34:38 -04:00
  • 6ea2273263 [fix] terminate the char_array if input token is zero-length in add_normalized_token Al 2017-04-28 11:25:07 -04:00
  • 04eb2d4539 Merge pull request #189 from openvenues/fix_trie_search Al Barrentine 2017-04-21 14:39:03 -04:00
  • 278679b7fb [fix] in tokenized trie_search, in the case of a partial failed match, reset to the root node before rolling the pointer back to phrase start + 1 Al 2017-04-21 13:51:07 -04:00
  • 074b6ff802 [auto][ci skip] Adding data files from Travis build #231 Travis 2017-04-20 02:39:39 +00:00
  • 004d3d98c9 Merge pull request #187 from openvenues/degree_symbol_ordinal_suffix Al Barrentine 2017-04-19 22:29:10 -04:00
  • 7bce358ca6 [fix] whitespace in numex config to trigger build Al 2017-04-19 21:14:54 -04:00
  • 676fb9bcbc [fix] no parens in travis config grep for numex change detection Al 2017-04-19 21:14:19 -04:00
  • 86956db055 [fix] adding numex change to trigger build Al 2017-04-19 21:00:59 -04:00
  • e81580287d [test] adding tests for ordinal suffix normalization Al 2017-04-19 20:59:36 -04:00
  • 85297f3333 [fix] numex change detection in Travis build Al 2017-04-19 20:58:03 -04:00
  • 4762ff2638 [auto][ci skip] Adding data files from Travis build #228 Travis 2017-04-20 00:51:42 +00:00
  • e92c3c2867 Merge pull request #186 from openvenues/degree_symbol_ordinal_suffix Al Barrentine 2017-04-19 20:39:22 -04:00
  • f3adde746e [numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token Al 2017-04-19 20:18:21 -04:00
  • 19899b2f7d [dictionaries] adding degree symbol "°" variant for any surface forms that have "º" Al 2017-04-19 19:25:25 -04:00
  • c968dd4ecc [numex] adding "°" as additional ordinal suffix for Spanish, Italian, and Portuguese Al 2017-04-19 19:22:28 -04:00
  • 254f3622ea Merge pull request #185 from Ironholds/master Al Barrentine 2017-04-19 09:08:59 -04:00
  • 18a5d06427 Merge pull request #1 from Ironholds/Ironholds-patch-1 Oliver Keyes 2017-04-18 21:53:24 -07:00
  • 35821f975e Remove unused variable Oliver Keyes 2017-04-18 21:25:00 -07:00
  • e0c82b5edb Merge pull request #184 from openvenues/remove_ordinal_suffix Al Barrentine 2017-04-18 22:33:00 -04:00
  • 9cd3ec37f9 [build] rebuild numex table in Travis if either the configs change or numex_table_builder.c changes Al 2017-04-18 21:42:01 -04:00
  • f3cf119e58 [build] Makefile changes to support moving numeric expression parsing to normalize.c Al 2017-04-18 21:41:24 -04:00
  • cddc368533 [numex] adding one form of normalization which strips ordinal suffixes so {96th, Ninety-sixth} => 96. This is an additional form of normalization, so there's still one form where the suffixes are kept. One case that's still not handled is something like "IXe Arrondissement" Al 2017-04-18 21:39:54 -04:00
  • 92051863ba [numex] adding ordinal suffixes themselves to the numex trie so they can be removed from strings Al 2017-04-18 17:20:02 -04:00
  • 63ac3cf921 Merge pull request #183 from openvenues/cdn Al Barrentine 2017-04-17 14:39:35 -04:00
  • d2732922c2 [data] deployed model files and training data to CloudFront for easier downloading around the world and in places like China where the Great Firewall may prevent large downloads from abroad. TTL is set to 0 so it still caches the files themselves but checks with origin for the If-Modified-Since headers, allowing the files to be updated dynamically Al 2017-04-17 14:11:44 -04:00
  • 5699ef3da0 Merge pull request #181 from eefi/bug/various/initializer Al Barrentine 2017-04-13 16:22:33 -04:00
  • 36dc41af8c Merge branch 'master' of https://github.com/openvenues/libpostal Al 2017-04-13 16:02:06 -04:00
  • 413c584f08 [fix] need to set prev_state to the NULL state in numex parsing after a non-space/non-hyphen is encountered and the previous match, if any, is added to the result array Al 2017-04-13 16:01:46 -04:00
  • f9b57dbd42 [fix] don't use unnamed fields in initializers Austin Chu 2017-04-13 14:33:50 -04:00
  • 7bef84676e Merge pull request #180 from eefi/bug/tagger/include-guard Al Barrentine 2017-04-13 13:58:13 -04:00
  • a966712e18 [fix] add #include guard to tagger.h Austin Chu 2017-04-13 13:02:03 -04:00
  • 32c8662f8d Merge pull request #177 from eefi/bug/matrix/clbas Al Barrentine 2017-04-12 20:58:00 -04:00
  • 19a04511ba [fix] typo in compiler warning when no CBLAS found Austin Chu 2017-04-12 20:40:08 -04:00
  • b464eb6c07 [numex] fix numex parsing when the spelled-out number is followed by a comma or other punctuation Al 2017-04-11 16:28:33 -04:00
  • fc91471434 [osm/boundaries] check polygons with an ISO3166-2 as well in the country polygon index in case the country polygon is funky Al 2017-04-09 02:15:42 -04:00
  • 4ecd6c23c6 [formatting] removing the ability to insert city between house number and road in France from discussion in #27 Al 2017-04-08 15:42:59 -04:00
  • 7f7aada32a [build] add another housekeeping file in the datadir for data_version. Blow away the existing files if that file either doesn't exist or doesn't contain a matching version string to help with upgrades Al 2017-04-07 17:40:27 -04:00
  • 4f9b0ef495 [docs][ci skip] adding note about using libpostal on mobile Al 2017-04-07 00:55:39 -04:00
  • 6984427eb9 [docs][ci skip] add link to the 1.0 blog post Al 2017-04-06 13:19:45 -04:00
  • 5605ba3185 [docs] adding note about the newly-trained language classifier trained with FTRL-Proximal (now 1/10th the size), which keeps its high accuracy while maintaining a sparse solution. This commit will trigger a build with the freshly uploaded model. Al 2017-04-06 11:43:54 -04:00
  • 5a96be5d5c [fix][ci skip] S3 upload paths in data upload/download script Al 2017-04-06 00:37:09 -04:00
  • d8409f1f38 [auto][ci skip] Adding data files from Travis build #210 Travis 2017-04-06 04:06:16 +00:00
  • 918342d4c3 Merge pull request #171 from openvenues/parser-data Al Barrentine 2017-04-05 23:51:27 -04:00
  • c01e67c1e4 [fix] removing one of the warnings about C90 since this is entirely C99. Al 2017-04-05 14:51:18 -04:00
  • caebf4e2c9 [classification] correcting cost functions in SGD and FTRL for use in parameter sweeps Al 2017-04-05 14:08:51 -04:00
  • 6219cc6378 [numex] add dehyphenated form when building numex table Al 2017-04-05 14:06:19 -04:00
  • 264866d719 [build/fix] autoconf syntax for Ubuntu (12.04) version of autoconf aka that used on Travis Al 2017-04-05 09:43:24 -04:00
  • ef0d4c2ded [build] fixing checks in numex.py, run when the resources/numex directory changes Al 2017-04-05 08:53:48 -04:00
  • 0ec2e57afa [fix] adding yaml to requirements-simple.txt for CI Al 2017-04-05 08:33:39 -04:00
  • 64fae1e241 [fix] /AC_CONFIG_MACRO_DIRS/AC_CONFIG_MACRO_DIR/ Al 2017-04-05 08:27:44 -04:00
  • 2b3fb196a1 [build] add pkg-config to packages in Travis config, remove libsnappy-dev Al 2017-04-05 08:24:26 -04:00
  • 8cef3c4eb9 [docs] new parser GIF, featuring addresses relevant to current events Al 2017-04-05 07:21:48 -04:00
  • aaae1e055e [docs] fix spacing Al 2017-04-05 02:03:39 -04:00
  • 9c7eac61eb [docs] merge README from master, move bindings below examples Al 2017-04-05 02:02:59 -04:00
  • 8ec6e546f5 [test] adding more tests from the demo Al 2017-04-04 20:50:19 -04:00
  • 22443e31cc [parser] removing special commands other than .exit from address_parser_cli Al 2017-04-04 20:49:37 -04:00
  • 8742574257 [parser] storing address_parser_context on the parser struct itself so it doesn't have to be allocated every time Al 2017-04-04 20:40:55 -04:00
  • 67157fbd98 [docs] moving blog post to first paragraph Al 2017-04-03 21:04:37 -04:00
  • b8f65d0a06 [docs] aesthetic README changes Al 2017-04-03 18:18:02 -04:00
  • f746c6eec6 [openaddresses] Sampson and Yadkin counties, NC, and Union County, SC Al 2017-04-03 18:08:55 -04:00
  • bca449e653 [openaddresses] Rowan County, NC Al 2017-04-03 17:57:03 -04:00
  • 6102fd3459 [openaddresses] Carteret County, NC Al 2017-04-03 16:55:21 -04:00
  • 342740c3a6 [openaddresses] Bladen County, NC Al 2017-04-03 16:53:43 -04:00
  • 7c67ca6edb [openaddresses] Beaufort County, NC Al 2017-04-03 16:52:15 -04:00
  • 680a2e6357 [openaddresses] city of Ruidoso, NM Al 2017-04-03 16:50:27 -04:00
  • 921e635b7a [openaddresses] add Caddo Parish, LA Al 2017-04-03 16:48:30 -04:00
  • e0dc0c9b86 [openaddresses] add Desoto County, FL Al 2017-04-03 16:45:56 -04:00
  • 20adc591a8 [openaddresses] adding OSM boundaries to Clear Creek County, CO as new data set doesn't list city Al 2017-04-03 16:38:50 -04:00
  • 4b16b5bccd [docs] README fixes Al 2017-04-03 16:35:48 -04:00
  • 97ffdbaee0 [openaddresses] removing Lawrence County, SD. Covered by new statewide and has some weird addresses Al 2017-04-03 16:16:52 -04:00
  • e4290a489f [openaddresses] Fall River County, SD Al 2017-04-03 16:15:21 -04:00
  • c3a6445290 [docs] README updates for 1.0 release, adding training data section Al 2017-04-03 15:59:01 -04:00
  • 65a0d82bda [openaddresses] moving Buenos Aires, adding Boulder County, CO Al 2017-04-03 13:08:31 -04:00
  • eff7a7a27a [optimization] moving regularization methods to their own module Al 2017-04-03 00:16:30 -04:00
  • 957aa0c0c9 [utils] cartesian product iterator for grid search during model selection Al 2017-04-03 00:15:31 -04:00
  • 4a72afc712 [build] Makefile changes for new language_classifier_train Al 2017-04-02 23:55:31 -04:00
  • 378a11c88f [fix] expansion array destroy API in libpostal expand program Al 2017-04-02 23:55:04 -04:00
  • c5e2f89ee9 [fix] declaring is_common_script function as static Al 2017-04-02 23:53:21 -04:00
  • 5dfdd4b7eb [language_classification] Runtime language classifier can now use dense or sparse weights, with a different header signature for the sparse version (using old signature for the dense version, so backward-compatible) Al 2017-04-02 23:51:54 -04:00
  • 835d851310 [log] log the offending line if token count does not match in language_classifier_io Al 2017-04-02 23:47:07 -04:00
  • 964ac15e51 [language_classification] adding options to language_classifier_train for using SGD with {L2, L1} regularization or FTRL-Proximal using both. Al 2017-04-02 23:33:58 -04:00
  • 58661c9f27 [languages] adding replace_hyphens and split_alpha_from_numeric in language classifier input normalization Al 2017-04-02 23:32:24 -04:00
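
Several commits above fix log_* format strings whose conversion specifier did not match the argument type (uint64_t, int64_t, size_t, uint32_t). The sketch below shows the standard C99 way to pair each of those types with a matching specifier. It illustrates the class of fix only and is not the code from those commits: format_counts is a hypothetical helper, and it calls snprintf directly rather than libpostal's log_* macros.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of libpostal: writes one value of each
 * type into buf, using the specifier that matches that type.
 *   uint64_t -> PRIu64    int64_t  -> PRId64
 *   size_t   -> %zu       uint32_t -> PRIu32
 * Returns what snprintf returns (characters needed, excluding the NUL). */
int format_counts(char *buf, size_t buf_len,
                  uint64_t u64, int64_t i64, size_t sz, uint32_t u32)
{
    return snprintf(buf, buf_len,
                    "u64=%" PRIu64 " i64=%" PRId64 " size=%zu u32=%" PRIu32,
                    u64, i64, sz, u32);
}
```

The PRI* macros from <inttypes.h> expand to the correct length modifier for the platform, so the same format string stays valid where uint64_t is unsigned long and where it is unsigned long long. Passing a size_t to %u or a uint64_t to %llu may appear to work on one platform and is undefined behaviour on another.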