2018-02-22 01:21:23 +00:00
2015-04-12 15:14:01 -04:00
2015-08-07 17:15:39 -04:00
2017-11-29 12:26:36 -05:00
2017-11-27 01:42:25 +00:00
2017-12-18 18:33:50 +00:00
2015-03-11 17:47:57 -04:00
2016-12-21 14:39:27 -05:00
2017-11-25 04:35:28 +00:00
2018-01-15 23:47:16 -05:00
2018-01-15 23:47:16 -05:00
2016-11-27 00:56:48 -08:00
2017-12-17 03:14:00 -05:00
2017-12-17 03:14:00 -05:00
2018-02-22 01:21:23 +00:00
2018-02-06 01:58:15 -05:00
2017-11-27 19:20:37 +00:00
2017-03-18 06:09:52 -04:00
2016-12-09 13:37:45 -05:00
2017-03-31 03:35:51 -04:00
2017-05-21 11:58:37 +01:00
2018-01-02 11:56:02 -08:00
2017-12-24 01:45:50 -05:00
2017-03-06 20:32:50 -05:00
2017-03-06 20:32:50 -05:00
2017-05-21 10:57:20 +01:00
2017-03-06 22:28:33 -05:00
2017-02-19 14:02:54 -08:00
2015-09-10 10:26:24 -07:00
2017-03-31 03:35:51 -04:00
2017-01-13 18:30:04 -05:00
2015-08-07 02:43:44 -04:00
2017-04-03 00:15:31 -04:00
2017-04-03 00:15:31 -04:00
2016-11-21 14:04:34 -05:00
2017-04-02 23:28:17 -04:00
2015-09-19 01:39:58 -04:00
2017-03-19 16:10:29 -04:00
2017-03-10 01:10:22 -05:00
2017-05-21 10:57:20 +01:00
2017-03-10 01:28:31 -05:00
2017-03-10 01:25:20 -05:00
2017-03-10 13:39:52 -05:00
2017-09-19 23:20:03 -04:00
2017-03-10 02:06:45 -05:00
2018-01-26 01:20:35 -05:00
2017-12-29 17:48:54 -05:00
2017-10-23 15:20:04 -04:00
2017-10-18 03:59:05 -04:00
2018-02-15 18:55:37 -05:00
2017-12-30 02:34:06 -05:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2017-04-02 13:41:57 -04:00
2017-04-02 13:41:57 -04:00
2018-01-22 01:56:32 -05:00
2017-04-02 14:28:25 -04:00
2018-02-15 18:52:16 -05:00
2016-07-21 17:04:57 -04:00
2015-10-11 00:45:26 -05:00
2015-10-11 00:45:26 -05:00
2017-01-13 19:58:49 -05:00
2017-01-13 18:30:04 -05:00
2015-10-09 15:37:10 -04:00
2015-07-08 17:02:59 -04:00
2016-02-25 14:53:31 -06:00
2016-07-21 17:04:57 -04:00
2017-03-18 06:05:28 -04:00
2015-10-09 15:37:10 -04:00
2017-03-18 06:05:28 -04:00
2017-03-18 06:05:28 -04:00
2017-12-29 03:08:48 -05:00
2017-12-29 03:08:48 -05:00
2015-12-17 12:25:05 -05:00
2015-12-17 12:25:05 -05:00
2018-01-19 14:24:08 -05:00
2017-04-02 23:47:07 -04:00
2016-01-17 21:11:37 -05:00
2017-05-19 22:31:56 +01:00
2017-05-19 22:31:56 +01:00
2017-04-02 23:51:54 -04:00
2017-04-02 23:51:54 -04:00
2017-04-02 23:32:24 -04:00
2016-01-09 03:42:57 -05:00
2017-01-09 16:11:26 -05:00
2018-01-17 17:31:42 -05:00
2017-12-30 02:31:25 -05:00
2017-12-30 03:24:39 -05:00
2017-04-05 14:18:13 -04:00
2017-04-05 14:18:13 -04:00
2017-05-21 11:14:21 +01:00
2017-04-02 14:32:14 -04:00
2017-04-02 23:29:52 -04:00
2016-08-06 00:40:01 -04:00
2017-04-02 23:55:04 -04:00
2017-12-29 17:46:35 -05:00
2017-04-12 20:40:08 -04:00
2016-01-09 01:43:25 -05:00
2016-01-08 00:48:11 -05:00
2015-07-25 18:41:02 -04:00
2015-08-07 02:43:44 -04:00
2017-12-30 02:33:33 -05:00
2018-02-06 03:08:37 -05:00
[dedupe] adding near-dupe hashing function, which can be thought of as the blocking function in record linkage or as a form of locality-sensitive hashing in general document deduping. The goal is that if two addresses/names are the same, they should share at least one hash. These hashes can also be used as keys in an inverted index (DB, ES, hashtable, etc.). Uses the double metaphone for name words in Latin script; otherwise uses each individual token, plus sequences of two tokens in the case of ideograms (e.g. Chinese, Japanese, Korean).
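A rough Python sketch of the blocking idea described in this commit message, not the project's actual implementation. All function names and the `ph:`/`tok:`/`bi:` key prefixes are hypothetical. `crude_phonetic_key` is a deliberately simplified stand-in for double metaphone (it is not that algorithm), used only so the example runs with the standard library.

```python
import re
import unicodedata
from collections import defaultdict


def is_latin_word(token):
    """True if the token has letters and all of them are Latin-script."""
    letters = [ch for ch in token if ch.isalpha()]
    return bool(letters) and all(
        "LATIN" in unicodedata.name(ch, "") for ch in letters
    )


def is_ideographic(token):
    """True if every character is a CJK unified ideograph."""
    return all("CJK UNIFIED IDEOGRAPH" in unicodedata.name(ch, "") for ch in token)


def crude_phonetic_key(word):
    """Stand-in for double metaphone (assumption): keep the first letter,
    drop later vowels and h/w/y, and collapse repeated letters."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    letters = [ch for ch in decomposed if ch.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for ch in letters[1:]:
        if ch in "aeiouyhw":
            continue
        if ch != key[-1]:
            key.append(ch)
    return "".join(key)


def near_dupe_hashes(name):
    """Return blocking keys; two equivalent names should share at least one."""
    hashes = set()
    for token in re.findall(r"\w+", name):
        if is_latin_word(token):
            hashes.add("ph:" + crude_phonetic_key(token))
        elif is_ideographic(token):
            chars = list(token)  # treat each ideogram as its own token
            hashes.update("tok:" + ch for ch in chars)
            hashes.update("bi:" + a + b for a, b in zip(chars, chars[1:]))
        else:
            hashes.add("tok:" + token.lower())
    return hashes


def build_inverted_index(records):
    """Map each hash to the set of record ids that produced it."""
    index = defaultdict(set)
    for record_id, name in records.items():
        for h in near_dupe_hashes(name):
            index[h].add(record_id)
    return index
```

Candidate duplicate pairs are then the records that collide under at least one key in the index; a more expensive similarity check would normally run only on those candidates.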
2017-12-24 02:47:45 -05:00
2016-12-21 18:09:45 -05:00
2016-12-21 18:09:45 -05:00
2017-12-17 19:53:15 -05:00
2017-11-23 19:11:25 +00:00
2017-11-24 22:29:45 +00:00
2015-08-09 01:00:57 -04:00
2017-04-18 17:20:02 -04:00
2018-01-02 10:24:39 -08:00
[numex] changing is_roman_numeral to is_likely_roman_numeral to get rid of most of the false positives like "La" in Spanish, which could be L (=50) + the ordinal suffix "a", but in practice it never means that. Roman numerals of two characters or fewer (whether on their own like "DC" or "MD", or attached to a potential ordinal suffix like "Ce" in French) will be ignored unless they're composed of the more likely, smaller Roman numerals I, V, and X, so VI, IX, etc. are expanded as Roman numerals but LI is not.
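The heuristic in this commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's C code: the function bodies and the validation regex are hypothetical, the length threshold is read as "two characters or fewer" (which matches the "DC", "MD", and "LI" examples), and stripping a language-specific ordinal suffix such as the "a" in "La" is assumed to have happened before the call.

```python
import re

# Standard-form Roman numerals from 1 to 3999 (assumed validation rule).
ROMAN_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

# The "more likely, smaller" numerals named in the commit message.
LIKELY_SHORT_CHARS = set("IVX")


def is_roman_numeral(s):
    """True if s is a non-empty, well-formed Roman numeral."""
    return bool(s) and ROMAN_RE.fullmatch(s.upper()) is not None


def is_likely_roman_numeral(s):
    """Reject short numerals unless they use only I, V, and X."""
    if not is_roman_numeral(s):
        return False
    if len(s) <= 2:  # assumption: threshold is two characters or fewer
        return set(s.upper()) <= LIKELY_SHORT_CHARS
    return True
```

With this sketch, "VI" and "IX" are accepted while "LI", "DC", "MD", and a bare "L" (the numeral part of "La") are rejected; longer numerals such as "XIV" pass through the plain validity check.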
2017-12-27 19:38:02 -05:00
2017-12-29 04:50:08 -05:00
2017-12-27 22:13:04 -05:00
2015-07-08 17:02:59 -04:00
2017-04-03 00:16:30 -04:00
2017-04-03 00:16:30 -04:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2017-03-10 13:39:52 -05:00
2017-04-02 13:51:45 -04:00
2018-01-26 18:04:45 -05:00
2018-01-25 14:23:18 -05:00
2017-04-02 13:48:46 -04:00
2017-04-02 13:48:46 -04:00
2017-01-13 19:58:49 -05:00
2016-08-06 00:40:01 -04:00
2018-01-22 01:56:32 -05:00
2017-04-05 14:18:13 -04:00
2018-01-25 16:26:41 -05:00
2017-11-12 04:48:26 -05:00
2018-01-15 23:47:16 -05:00
2018-01-15 23:47:16 -05:00
2017-11-29 12:21:13 -05:00
2017-11-29 12:21:13 -05:00
2017-04-13 13:02:03 -04:00
2017-10-12 01:41:04 -04:00
2017-11-23 19:11:25 +00:00
2017-12-25 01:37:42 -05:00
2018-02-06 15:08:11 -05:00
2017-11-25 04:35:28 +00:00
2017-03-17 18:28:41 -04:00
2015-05-29 16:54:05 -04:00
2017-01-02 13:52:48 -05:00
2017-01-02 00:41:11 -05:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2017-05-17 22:40:53 +01:00
2016-01-05 16:43:17 -05:00
2017-11-25 04:35:28 +00:00
2017-11-25 04:35:28 +00:00
2016-01-17 20:53:44 -05:00
2016-01-17 20:53:44 -05:00
2017-04-02 23:53:21 -04:00
2015-09-23 04:04:38 -04:00
2018-01-22 01:56:32 -05:00
2017-03-10 19:31:34 -05:00