Commit Graph

5144 Commits

Author SHA1 Message Date
Al
9d2a111286 [numex] when parsing numex, bail on rules in whole_tokens_only languages if there are contiguous rules with no right context rules (example: something that wouldn't make sense like VL in Latin) 2017-10-20 02:34:30 -04:00
Al
bd477976d1 [similarity] string similarity measures for Damerau-Levenshtein and Jaro-Winkler distances. Both operate on unicode points internally for lengths, etc. instead of byte strings and the Levenshtein distance uses only one array instead of needing to store the full matrix of transitions. 2017-10-19 04:51:33 -04:00
Al
245aa226e0 [utils] function to create an array of uint32_t codepoints from a UTF-8 string, a few bug fixes to string_utils 2017-10-19 04:48:50 -04:00
Al
c61007388b [similarity] bug fixes and additional French, Spanish, Italian, and Slavic phonetics 2017-10-18 13:31:35 -04:00
Al
3a3aca8490 [similarity] adding basic double metaphone implementation 2017-10-18 03:59:05 -04:00
Al
2f2d3da722 [test] test for utf8_equal_ignore_separators 2017-10-14 01:42:08 -04:00
Al
09fbb02042 [utils] adding utf8_equal_ignore_separators to string utils 2017-10-14 01:36:56 -04:00
Al
f8a808e254 [utils] adding utf8_len function for strings, and utf8_is_digit 2017-10-12 11:16:53 -04:00
Al
448ca6a61a [merge] merging commit from v1.1 2017-10-12 01:41:04 -04:00
Travis
bb277fb326 [auto][ci skip] Adding data files from Travis build #268 2017-10-10 18:58:10 +00:00
Al Barrentine
e60139757f Merge pull request #257 from mkaranta/patch-1
Add 'bld' as an abbreviation for 'building'
2017-10-10 14:42:29 -04:00
mkaranta
c96a042e86 Add 'bld' as an abbreviation for 'building'
I noticed this was missing while testing a batch of addresses. Hopefully it doesn't introduce much noise.
2017-10-10 14:19:09 -04:00
Al
c984dca459 [fix] removing log error for sequences of length 0 2017-09-19 23:20:03 -04:00
Al Barrentine
94a0e842e7 [fix] typo 2017-08-16 15:04:15 -04:00
Al Barrentine
34e2c4772e [code of conduct] adding stronger, more specific language about hate speech in code of conduct 2017-08-16 15:03:38 -04:00
Al Barrentine
2bfa8efefb [docs] updating README examples of normalization now that canonical forms are no longer transliterated 2017-08-16 12:15:22 -04:00
Al
0c6af2b74c [fix] normalize canonical strings (after expanding abbreviations, concatenated suffixes, etc.) with Latin-ASCII, Latin-ASCII-Simple or simple UTF-8 normalization depending on the options 2017-08-03 14:08:05 -06:00
Al
ed011e50d5 [docs][ci skip] update contributing section in README 2017-08-01 00:27:50 -04:00
Al
caf2415938 [fix][ci skip] updates to contributions guide 2017-08-01 00:25:36 -04:00
Al
da2affbacb [fix][ci skip] removing repetition in contributing guide 2017-08-01 00:13:55 -04:00
Al
2c06f26f3d [docs][ci skip] adding contributing guide for how to submit issues 2017-08-01 00:10:40 -04:00
Al Barrentine
6ca6493d0b Merge pull request #231 from michaelkrog/patch-1
Changes front matter of is.yaml to correct description
2017-07-27 11:21:34 -04:00
Michael Krog
a36dcc8b9c Update is.yaml 2017-07-27 13:24:54 +02:00
Al Barrentine
7352dc74c6 Moving language around in code of conduct 2017-07-21 12:58:35 -04:00
Al Barrentine
4cde250463 Adding a custom libpostal Code of Conduct 2017-07-21 02:35:07 -04:00
Al Barrentine
dab3b95ae1 Merge pull request #229 from openvenues/32bit_numex_fix
32-bit safety in numex table loading
2017-07-20 18:11:02 -04:00
Al
97044f5a8b [fix] 32-bit safety in numex table loading 2017-07-20 17:55:43 -04:00
Al Barrentine
0cb8c61fb0 Merge pull request #215 from xiamx/patch-2
Add Elixir language binding to README.md
2017-06-05 16:26:11 -04:00
Mengxuan Xia
abcf72be2e Add Elixir language binding to Readme 2017-06-05 16:05:19 -04:00
Al Barrentine
50cf14846c Merge pull request #214 from iestynpryce/master
Fix remaining log_* compile format warnings
2017-05-30 08:45:28 -04:00
Iestyn Pryce
b96a687182 Merge https://github.com/openvenues/libpostal 2017-05-29 18:23:03 +01:00
Travis
8dd84b71ba [auto][ci skip] Adding data files from Travis build #250 2017-05-24 05:05:06 +00:00
Al Barrentine
e9696e9166 Merge pull request #212 from openvenues/bbraunay-master
modified Indonesian dictionary updates
2017-05-24 00:54:05 -04:00
Al
1948634bf3 [dictionaries] adding a separable prefix for Jl. and Jln. so things like Jl.Utara get separated and expanded 2017-05-24 00:26:32 -04:00
Al
3b5b5d8baa [dictionaries] adding ambiguous expansions for all Indonesian abbreviations 1-2 characters as they could also be initials, etc. 2017-05-23 18:04:09 -04:00
Al
f507102457 [dictionaries] removing English words from Indonesian unit types 2017-05-23 18:01:47 -04:00
Al
4b24699e1f [fix] changing national to nasional in Indonesian 2017-05-23 18:00:20 -04:00
Al
4df48fb412 [dictionaries] moving Kampong to normalize to Kampung in Indonesian, better if there's one canonical form 2017-05-23 17:57:38 -04:00
Al
ec79c610eb [dictionaries] removing a few English words and dupes from Indonesian place names 2017-05-23 17:55:59 -04:00
Al
77365a56a5 [dictionaries] removing no fixed address from Indonesian dictionaries 2017-05-23 17:51:15 -04:00
Al
8a35cfcd80 [dictionaries] removing level/platform/podium from Indonesian level types 2017-05-23 17:50:50 -04:00
Al
364b00da01 [dictionaries] separating Mas and Abang 2017-05-23 17:46:45 -04:00
Al
83378049ee [dictionaries] remove Doktor from academic degrees in Indonesian dictionaries 2017-05-23 17:35:53 -04:00
Al
52593c6374 [dictionaries] remove nonprofit from Indonesian company types 2017-05-23 17:27:11 -04:00
Al
08524f4b07 [dictionaries] moving some of the existing chain stores for Indonesia to the all/chains.txt dictionary 2017-05-23 17:25:59 -04:00
Al
18b2fb0ec8 Merge branch 'master' of https://github.com/bbraunay/libpostal into bbraunay-master 2017-05-23 17:18:37 -04:00
Iestyn Pryce
87cf7b5bca Add portable way of formatting khint_t type (from klib) 2017-05-21 11:58:37 +01:00
Iestyn Pryce
d8239a9cc4 Revert format regression introduced in ecd07b18c1 2017-05-21 11:14:21 +01:00
Iestyn Pryce
73d27caeb9 Fix log_* formats which expect long long uint but receive uint64_t. 2017-05-21 10:57:20 +01:00
Yanuar Budi Baskoro
695756d484 [dictionaries] add more option on toponyms 2017-05-21 16:56:14 +07:00