libpostal

Author	SHA1	Message	Date
Al	d1cf253092	[osm/formatting] Higher probability of dropout for rare components like counties, etc.	2016-01-22 03:39:35 -05:00
Al	9dd965a6fa	[fix] removing gazetteer configuration from disambiguation module	2016-01-22 03:18:18 -05:00
Al	b22646ee30	[mv] Moving gazetteers into their own module	2016-01-22 03:15:56 -05:00
Al	5a68e7aeef	[fix] import	2016-01-22 03:00:43 -05:00
Al	6ac72576bc	[osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK	2016-01-22 02:56:39 -05:00
Al	f4995d4f0f	[languages] Adding several different types of dictionaries for name expansion/abbreviation in OSM	2016-01-22 00:51:32 -05:00
Al	89aa039692	[dictionaries] Adding some Italian month abbreviations	2016-01-21 15:12:46 -05:00
Al	26cbb1eb8d	[languages] Fixing multiple expansions in the same dictionary for Python trie, adding length for prefixes/suffixes	2016-01-21 04:29:14 -05:00
Al	0269d92e3d	[languages] Adding canonical string and dictionary type to Python trie, modifying disambiguate_languages accordingly, and adding lists of alternate forms	2016-01-21 02:30:59 -05:00
Al	2e15db06dd	[text] making normalize_string directly callable from Python geodata	2016-01-21 02:07:46 -05:00
Al	71e01e6133	[fix] prefix/suffix phrase search in Python trie search	2016-01-19 03:43:54 -05:00
Al	39667b73a2	[build] std=gnu99 in geodata build	2016-01-19 03:23:56 -05:00
Al	8b94a018e6	[languages] encoding in language disambiguation	2016-01-19 03:22:03 -05:00
Al	3262d2ccd3	[fix] arg count	2016-01-19 03:16:14 -05:00
Al	5d5d5713cc	[transliteration] Regenerating transliterator scripts	2016-01-18 12:04:14 -05:00
Al	fe8f3158f6	[fix] missing file in geodata	2016-01-17 22:23:44 -05:00
Al	5fd9dc7e2b	[scripts] relative dirs in setup.py for geodata	2016-01-17 22:22:50 -05:00
Al	da62ff309e	[transliteration] Fixing Malayalam script	2016-01-17 22:15:56 -05:00
Al	5385cb71d6	[languages] Adding English dictionaries to Indonesia	2016-01-17 22:08:06 -05:00
Al	8030b235e6	[languages] Changing the definition in script languages so only languages that appear on street signs will be used	2016-01-17 22:03:41 -05:00
Al	0dfd8d6439	[language_classification] Adding script feature for any non-Latin script. Even if the script doesn't directly identify the language, it can act as a modified intercept (all Han script addresses will share the Han feature, even if we haven't seen one of the > 80k Han characters)	2016-01-17 21:37:45 -05:00
Al	b9a3230f65	[language_classification] Removing the per-country classifier, text-based alone is doing close to 99% accuracy now	2016-01-17 21:13:14 -05:00
Al	f808f74271	[language_classification] Automatic hyperparameter optimization using either the cross-validation set or two distinct subsets of the training set	2016-01-17 21:11:37 -05:00
Al	af5689ee52	[fix] removing unused var	2016-01-17 21:00:17 -05:00
Al	7d727fc8f0	[optimization] Using adapted learning rate in stochastic gradient descent (if lambda > 0)	2016-01-17 20:59:47 -05:00
Al	7b300639f1	[fix] Trie prefix search tail comparison	2016-01-17 20:56:37 -05:00
Al	70dbfdd560	[unicode] Regenerating unicode_script_data.c	2016-01-17 20:53:44 -05:00
Al	de240d2b94	[fix] tokenize_add_tokens respects specified length	2016-01-17 20:51:47 -05:00
Al	10cadc67d7	[io] matrix_read using array I/O functions	2016-01-17 20:40:18 -05:00
Al	baba826d21	[io] Cutting down on system calls in trie_read	2016-01-17 20:39:19 -05:00
Al	cba2acc21f	[io] Sparse matrix using array I/O methods	2016-01-17 20:38:16 -05:00
Al	46b35c5202	[utils] Adding functions to read numeric arrays from files	2016-01-17 20:36:57 -05:00
Al	3d7dd8966e	[languages] Using unicode script in language disambiguation in addition to dictionaries. Eliminating dependency on address_normalizer	2016-01-17 18:28:28 -05:00
Al	fa32eacdd1	[phrases] Adding Python phrase filter from address_normalizer until a Python wrapper around libpostal's trie_search is available	2016-01-17 15:45:02 -05:00
Al	f79a3c5bf4	[osm/polygons] Allowing polygons that GEOS claims are invalid in OSM polygon index (there were some glaring omissions from the index like the polygons for the UK or Berlin). For some reason .buffer(0) creates weird multipolygons that no longer contain their centroids, etc. and aren't useful in reverse geocoding	2016-01-17 15:43:21 -05:00
Al	04f251c1cc	[polygons] Don't call fix_polygon (force polygon validity) by default	2016-01-16 21:21:27 -05:00
Al	19a5541a85	[polygons/osm] append polygon nodes by vertices that connect to each other	2016-01-16 21:20:49 -05:00
Al	d4143c1685	[parsing] Adding an optimization to the parser API where, if the entire input is a single known geographic phrase like New York, it returns the most likely label from the training data. That way e.g. a search for 'Florida' doesn't get tagged as 'house.' This doesn't affect training, only prediction.	2016-01-15 20:07:21 -05:00
Al	24b4a680c3	[languages] Adding English dictionaries for Bangladesh	2016-01-14 13:36:07 -05:00
Al	edebdf73e0	[dictionaries] Using long forms as canonical for English degrees as new language models may do some auto-abbreviating	2016-01-14 13:35:41 -05:00
Al	58e53cab1c	[scripts] Adding the tokenize/normalize wrappers directly into the internal geodata package so pypostal can be maintained in an independent repo	2016-01-12 13:29:31 -05:00
Al	622dc354e7	[optimization] Adding learning rate to lazy sparse update in stochastic gradient descent	2016-01-12 11:04:16 -05:00
Al	79f2b7c192	[build] Removing source from libpostal shared lib	2016-01-12 10:31:22 -05:00
Al	6a9c1e8c6d	[build] Adding trie_utils.c to address parser train/test	2016-01-12 10:22:34 -05:00
Al	7cc201dec3	[optimization] Moving gamma_t calculation to the header in SGD	2016-01-11 16:40:50 -05:00
Al	25ae5bed33	[unicode] Adding SCRIPT_INHERITED as a common script so diacritics like COMBINING CEDILLA don't break the current script and produce false word breaks	2016-01-11 16:39:21 -05:00
Al	3260edcf18	[math] Adding sparse dot sparse given a dense output matrix (suitable for the minibatch use case), fixing sparse dot vector	2016-01-11 13:55:54 -05:00
Al	736bc7c70d	[config] language_classifier data dir	2016-01-10 03:05:36 -05:00
Al	ebaedb6bcf	[language_classifier] Language classifier training using L2-regularized logistic regression and stochastic gradient descent	2016-01-10 01:31:18 -05:00
Al	56710cce21	[language_classifier] Language classifier data set I/O	2016-01-10 01:22:29 -05:00

1346 Commits