Commit Graph

  • 18eb5ef9ee Merge pull request #272 from AeroXuk/master Al Barrentine 2017-11-28 21:35:46 -05:00
  • 19ae97d527 Adding include config.h to strndup.c so that the function is not compiled and doesn't cause errors when the system has its own implementation. AeroXuk 2017-11-27 23:40:46 +00:00
  • 9090811826 Modified the libpostal API to add an extra function libpostal_parser_print_features to toggle debugging info. Updated address_parser app to use the new function. AeroXuk 2017-11-27 19:20:37 +00:00
  • 69e0d5d963 Updated linenoise to be MSys2/MinGW compatible. Updated address_parser app to use the defined libpostal api and not include internal components directly. Removed windows src Makefile as it is now the same as the standard one. AeroXuk 2017-11-27 01:42:25 +00:00
  • bb5535602a Adding libpostal.h to the AppVeyor package. AeroXuk 2017-11-25 10:13:14 +00:00
  • 26ac9ab5c2 Removing EXPORT statements from all source files and most header files, leaving only the exports for the main API in libpostal.h. Modified Makefiles so that all the test apps build without having extra functions exported from libpostal. AeroXuk 2017-11-25 04:35:28 +00:00
  • 15b3758be8 [auto][ci skip] Adding data files from Travis build #284 Travis 2017-11-24 22:29:45 +00:00
  • 7d001489ef Merge pull request #274 from openvenues/fix_oh_expansion Al Barrentine 2017-11-24 17:13:24 -05:00
  • ebe7fc9be9 [test] missing paren in Columbus, OH test. Adding test for "oh" as part of a number in Nineteen oh one W El Segundo Blvd Al 2017-11-24 16:11:07 -05:00
  • d7f22544b4 [test] adding an expansion test for the Columbus, OH case Al 2017-11-24 15:44:37 -05:00
  • ef098fd2e7 [numex] implementing the numex concat_only_if_number left context, which helps in the case of e.g. Columbus, OH in #271 Al 2017-11-24 15:42:50 -05:00
  • c276cf1529 [numex] adding a new type of left context for numeric expressions called concat_only_if_number (for something like "oh" which can be "Columbus, OH" or something like "Twenty-One Oh One") Al 2017-11-24 15:36:50 -05:00
  • f0246e7333 Fix bug in strndup fix for windows. Move all includes out of headers and into code for strndup.h and move it to be the last include. AeroXuk 2017-11-23 19:11:25 +00:00
  • d205f4d2bb Adding artifacts to AppVeyor config. AeroXuk 2017-11-23 02:24:06 +00:00
  • f07ab765cb Adding the export marker to all functions used in tests. AeroXuk 2017-11-20 20:58:37 +00:00
  • ad682b7592 Altered Makefile to include strndup.c on the other programs which require it. For the windows version of the Makefile, commented out address_parser lines as it has dependencies on includes we don't have. AeroXuk 2017-11-20 20:24:11 +00:00
  • dbf232b8f8 Fix bugs in AppVeyor config and build script. Added call to test script. AeroXuk 2017-11-19 13:35:08 +00:00
  • 2d3b420d35 Merging changes from AeroXuk/libpostal_windows. AeroXuk 2017-11-19 12:44:38 +00:00
  • 7d6e648fc3 [auto][ci skip] Adding data files from Travis build #271 Travis 2017-11-17 19:36:25 +00:00
  • 27b3e99515 Merge pull request #269 from Jeffrey04/ms-dictionary-expansion-1.0 Al Barrentine 2017-11-17 14:20:43 -05:00
  • 86c3105d44 new names with alternate spelling jeffrey04 2017-11-16 11:23:20 +08:00
  • e9d2ab6400 reordered list of synonyms jeffrey04 2017-11-16 11:22:42 +08:00
  • b3d306456f new synonyms jeffrey04 2017-11-16 11:22:14 +08:00
  • 0d76d190e1 updated street types jeffrey04 2017-11-16 11:21:39 +08:00
  • f726970d2b updated qualifiers jeffrey04 2017-11-16 11:20:20 +08:00
  • 39fd7f0cb1 list of titles update jeffrey04 2017-11-16 11:18:18 +08:00
  • 865f99a0c1 sorted place names jeffrey04 2017-11-16 11:04:49 +08:00
  • ceae1257af new place names jeffrey04 2017-11-16 11:00:07 +08:00
  • f3b76c1f28 some new company types in malay jeffrey04 2017-11-16 10:55:03 +08:00
  • c9d22d228f rearrange according to alphabetical order jeffrey04 2017-11-16 10:53:52 +08:00
  • 5e9d8f0a1e rearrange into alphabetical order as in other languages jeffrey04 2017-11-16 10:51:53 +08:00
  • 6d54cbcc82 new building types jeffrey04 2017-11-16 10:43:58 +08:00
  • 867c3b825c Merge pull request #1 from openvenues/master Choon-Siang Lai 2017-11-15 14:35:47 +08:00
  • fbf88aee88 [similarity] adding possible abbreviation functions to header, making everything const char * Al 2017-11-12 04:48:26 -05:00
  • b34e578366 [similarity] using new sequence alignment breakdown by operation to tell if any two words are an abbreviation. The loose variant requires that the alignment covers all characters in the shortest string, which matches things like Services vs. Svc, whereas the strict variant requires that either the shorter string is a prefix of the longer one (Inc and Incorporated) or that the two strings share both a prefix and a suffix (Dept and Department). Both variants require that the strings share at least the first letter in common. Al 2017-11-11 04:02:28 -05:00
  • 751873e56b [similarity] a *NEW* sequence alignment algorithm which builds on Smith-Waterman-Gotoh with affine gap penalties. Like Smith-Waterman, it performs a local alignment, and like the cost-only version of Gotoh's improvement, it needs O(mn) time and O(m) space (where m is the length of the longer string). However, this version of the algorithm stores and returns a breakdown of the number and specific types of edits it makes (matches, mismatches, gap opens, gap extensions, and transpositions) rather than rolling them up into a single cost, and without needing to return/compute the full alignment as in Needleman-Wunsch or Hirschberg's variant Al 2017-11-11 03:07:39 -05:00
  • 665b780422 [utils] adding unicode_equals function in string_utils for testing equality of unicode char arrays Al 2017-11-11 02:45:41 -05:00
  • 5f0e394ea8 [fix] README badges Al 2017-11-01 20:12:36 -04:00
  • 669e52b329 [build] adding --no-same-owner explicitly when untarring the data files for #267 Al 2017-11-01 20:05:33 -04:00
  • 3c6629ae3d [dictionaries] adding variants of & as synonyms in all languages Al 2017-10-28 17:22:14 -04:00
  • bc9f11d6e3 [similarity] exposing unicode versions of Damerau-Levenshtein and Jaro-Winkler distances Al 2017-10-28 02:45:48 -04:00
  • 2d6079b06f [expand] added search_address_dictionaries_substring to support the new use case (i.e. returns "is this substring in the trie?" regardless of whether it's stored under the special prefixes/suffixes namespaces) Al 2017-10-28 02:40:14 -04:00
  • 053dca82ba [expand] adding a normalization for a single non-acronym internal period where there's an expansion at the prefix/suffix (for #218 and https://github.com/openvenues/libpostal/issues/216#issuecomment-306617824). Helps in cases like "St.Michaels" or "Jln.Utara" without needing to specify concatenated prefix phrases for every possibility Al 2017-10-28 02:38:15 -04:00
  • 6d430f7e9b [utils] adding functions for finding the next index of a full stop/period character in a string Al 2017-10-27 04:07:28 -04:00
  • e38e57b8e8 [numex] fixing edge case where something like "IV Michael" could cause a partial Roman numeral to get added for the MI portion of "Michael" Al 2017-10-27 04:04:06 -04:00
  • e8ae3bbbaf [similarity] using NULL-terminated varargs in double metaphone instead of specifying the number of arguments, should be more maintainable Al 2017-10-23 15:20:04 -04:00
  • 5c0ecf8963 [dedupe] Jaccard similarity Al 2017-10-21 10:34:12 -04:00
  • 4ccc2a9e9f [fix] making string args const in string_similarity module Al 2017-10-21 02:45:08 -04:00
  • 5c927e780f [expand] adding ability to expand Roman numerals with ordinal suffixes like IXe in French Al 2017-10-20 02:51:26 -04:00
  • b7eda37e44 [utils] adding utf8_is_digit to string_utils.h Al 2017-10-20 02:45:55 -04:00
  • 1fbc238b60 [numex] adding functions to parse and validate a Roman numeral Al 2017-10-20 02:45:32 -04:00
  • 1c5afcafd2 [phrases] when skipping/ignoring hyphens in trie search, make sure that the new longer phrase ends at a word boundary (space, hyphen, end of string, etc.) Al 2017-10-20 02:43:39 -04:00
  • 9d2a111286 [numex] when parsing numex, bail on rules in whole_tokens_only languages if there are contiguous rules with no right context rules (example: something that wouldn't make sense like VL in Latin) Al 2017-10-20 02:34:30 -04:00
  • bd477976d1 [similarity] string similarity measures for Damerau-Levenshtein and Jaro-Winkler distances. Both operate on unicode points internally for lengths, etc. instead of byte strings and the Levenshtein distance uses only one array instead of needing to store the full matrix of transitions. Al 2017-10-19 04:51:28 -04:00
  • 245aa226e0 [utils] function to create an array of uint32_t codepoints from a UTF-8 string, a few bug fixes to string_utils Al 2017-10-19 04:48:50 -04:00
  • c61007388b [similarity] bug fixes and additional French, Spanish, Italian, and Slavic phonetics Al 2017-10-18 04:00:57 -04:00
  • 3a3aca8490 [similarity] adding basic double metaphone implementation Al 2017-10-18 03:59:05 -04:00
  • 2f2d3da722 [test] test for utf8_equal_ignore_separators Al 2017-10-14 01:42:08 -04:00
  • 09fbb02042 [utils] adding utf8_equal_ignore_separators to string utils Al 2017-10-14 01:36:56 -04:00
  • f8a808e254 [utils] adding utf8_len function for strings, and utf8_is_digit Al 2017-10-12 11:16:53 -04:00
  • 448ca6a61a [merge] merging commit from v1.1 Al 2017-08-14 04:04:58 -06:00
  • bb277fb326 [auto][ci skip] Adding data files from Travis build #268 Travis 2017-10-10 18:58:10 +00:00
  • e60139757f Merge pull request #257 from mkaranta/patch-1 Al Barrentine 2017-10-10 14:42:29 -04:00
  • c96a042e86 Add 'bld' as an abbreviation for 'building' mkaranta 2017-10-10 14:19:09 -04:00
  • c984dca459 [fix] removing log error for sequences of length 0 Al 2017-09-19 23:20:03 -04:00
  • 94a0e842e7 [fix] typo Al Barrentine 2017-08-16 15:04:15 -04:00
  • 34e2c4772e [code of conduct] adding stronger, more specific language about hate speech in code of conduct Al Barrentine 2017-08-16 15:03:38 -04:00
  • 2bfa8efefb [docs] updating README examples of normalization now that canonical forms are no longer transliterated Al Barrentine 2017-08-16 12:15:22 -04:00
  • 0c6af2b74c [fix] normalize canonical strings (after expanding abbreviations, concatenated suffixes, etc.) with Latin-ASCII, Latin-ASCII-Simple or simple UTF-8 normalization depending on the options Al 2017-08-03 14:08:05 -06:00
  • ed011e50d5 [docs][ci skip] update contributing section in README Al 2017-08-01 00:27:50 -04:00
  • caf2415938 [fix][ci skip] updates to contributions guide Al 2017-08-01 00:25:36 -04:00
  • da2affbacb [fix][ci skip] removing repetition in contributing guide Al 2017-08-01 00:13:53 -04:00
  • 2c06f26f3d [docs][ci skip] adding contributing guide for how to submit issues Al 2017-08-01 00:06:56 -04:00
  • 6ca6493d0b Merge pull request #231 from michaelkrog/patch-1 Al Barrentine 2017-07-27 11:21:34 -04:00
  • a36dcc8b9c Update is.yaml Michael Krog 2017-07-27 13:24:54 +02:00
  • 7352dc74c6 Moving language around in code of conduct Al Barrentine 2017-07-21 12:58:35 -04:00
  • 4cde250463 Adding a custom libpostal Code of Conduct Al Barrentine 2017-07-21 02:35:07 -04:00
  • dab3b95ae1 Merge pull request #229 from openvenues/32bit_numex_fix Al Barrentine 2017-07-20 18:11:02 -04:00
  • 97044f5a8b [fix] 32-bit safety in numex table loading Al 2017-07-20 17:50:53 -04:00
  • 0cb8c61fb0 Merge pull request #215 from xiamx/patch-2 Al Barrentine 2017-06-05 16:26:11 -04:00
  • abcf72be2e Add Elixir language binding to Readme Mengxuan Xia 2017-06-05 16:05:19 -04:00
  • 50cf14846c Merge pull request #214 from iestynpryce/master Al Barrentine 2017-05-30 08:45:28 -04:00
  • b96a687182 Merge https://github.com/openvenues/libpostal Iestyn Pryce 2017-05-29 18:23:03 +01:00
  • 8dd84b71ba [auto][ci skip] Adding data files from Travis build #250 Travis 2017-05-24 05:05:06 +00:00
  • e9696e9166 Merge pull request #212 from openvenues/bbraunay-master Al Barrentine 2017-05-24 00:54:05 -04:00
  • 1948634bf3 [dictionaries] adding a separable prefix for Jl. and Jln. so things like Jl.Utara get separated and expanded Al 2017-05-24 00:26:32 -04:00
  • 3b5b5d8baa [dictionaries] adding ambiguous expansions for all Indonesian abbreviations 1-2 characters as they could also be initials, etc. Al 2017-05-23 18:04:09 -04:00
  • f507102457 [dictionaries] removing English words from Indonesian unit types Al 2017-05-23 18:01:38 -04:00
  • 4b24699e1f [fix] changing national to nasional in Indonesian Al 2017-05-23 18:00:20 -04:00
  • 4df48fb412 [dictionaries] moving Kampong to normalize to Kampung in Indonesian, better if there's one canonical form Al 2017-05-23 17:57:34 -04:00
  • ec79c610eb [dictionaries] removing a few English words and dupes from Indonesian place names Al 2017-05-23 17:55:59 -04:00
  • 77365a56a5 [dictionaries] removing no fixed address from Indonesian dictionaries Al 2017-05-23 17:51:15 -04:00
  • 8a35cfcd80 [dictionaries] removing level/platform/podium from Indonesian level types Al 2017-05-23 17:50:25 -04:00
  • 364b00da01 [dictionaries] separating Mas and Abang Al 2017-05-23 17:46:45 -04:00
  • 83378049ee [dictionaries] remove Doktor from academic degrees in Indonesian dictionaries Al 2017-05-23 17:35:53 -04:00
  • 52593c6374 [dictionaries] remove nonprofit from Indonesian company types Al 2017-05-23 17:27:11 -04:00
  • 08524f4b07 [dictionaries] moving some of the existing chain stores for Indonesia to the all/chains.txt dictionary Al 2017-05-23 17:25:59 -04:00
  • 18b2fb0ec8 Merge branch 'master' of https://github.com/bbraunay/libpostal into bbraunay-master Al 2017-05-23 17:18:37 -04:00
  • 87cf7b5bca Add portable way of formatting khint_t type (from klib) Iestyn Pryce 2017-05-21 11:58:37 +01:00
  • d8239a9cc4 Revert format regression introduced in ecd07b18c1 Iestyn Pryce 2017-05-21 11:14:21 +01:00