libpostal

Author	SHA1	Message	Date
Al	be6f48f109	[fix] that didn't work, set log level to CRITICAL	2017-02-15 14:06:57 -05:00
Al	26bf617a06	[fix] prevent Shapely from logging to console	2017-02-15 14:00:51 -05:00
Al	934f6247c6	[osm] options to build the streets-only training data	2017-01-16 15:26:04 -05:00
Al	bb12d0940e	[fix] options/docs in osm address training	2016-12-10 13:45:37 -05:00
Al	5098599ed6	[addresses] remove Quattroshapes/GeoNames cities as they may have problematic names, and in any case we have point-based cities from OSM now	2016-12-10 02:08:40 -05:00
Al	da36b71829	[addresses] adding new places index in OSM and OpenAddresses training data	2016-12-05 18:36:17 -05:00
Al	7b3a59878c	[fix] bracket	2016-10-05 14:27:24 -04:00
Al	432f9dd42e	[fix] format of candidate_languages in the new OSM rtree	2016-10-05 03:12:07 -04:00
Al	faf418decb	[languages] using country_and_languages method in OSM, neighborhoods and OpenAddresses	2016-10-05 02:49:55 -04:00
Al	d281e71d2c	[fix] removing metro station index as a dependency for AddressComponents	2016-08-22 15:52:27 -04:00
Al	145af9331e	[osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute each time	2016-08-17 18:11:55 -04:00
Al	e35649f09d	[fix] import	2016-08-06 20:01:38 -04:00
Al	0edfbe0d61	[osm] Adding metro stations index to training data options	2016-08-06 19:52:21 -04:00
Al	ffece04855	[osm] Place training data from OSM script	2016-07-25 02:45:16 -04:00
Al	73b2aec25e	[fix] input file	2016-07-21 17:04:57 -04:00
Al	51831e2111	[fix] add ways db dir	2016-07-21 17:04:57 -04:00
Al	0a912766e4	[fix] logging for intersections data	2016-07-21 17:04:57 -04:00
Al	8aada7086f	[intersections] intersections training data	2016-07-21 17:04:57 -04:00
Al	11d1acc3bc	[parser] Sample chain store alternate names from the cross-language dictionary	2016-07-21 17:04:57 -04:00
Al	5ea570835e	[fix] args again	2016-07-21 17:04:57 -04:00
Al	7c41d84d8f	[fix] args	2016-07-21 17:04:57 -04:00
Al	2e4ba6e6cc	[subdivisions/buildings] Adding subdivisions and buildings rtree to training data for getting building height, zone	2016-07-21 17:04:57 -04:00
Al	91db1ec371	[fix] removing unnecessary vars	2016-07-21 17:04:57 -04:00
Al	bce7004ed7	[fix] import	2016-07-21 17:04:57 -04:00
Al	e57783ff5f	[fix] constructor	2016-07-21 17:04:57 -04:00
Al	677a86224e	[fix] cli arg name	2016-07-21 17:04:57 -04:00
Al	d04a026528	[fix] no need to init language, etc. in new script	2016-07-21 17:04:57 -04:00
Al	611002ea7a	[fix] cleaning up imports	2016-07-21 17:04:57 -04:00
Al	a96e5760a9	[osm] Same great training script, only shorter	2016-07-21 17:04:57 -04:00
Al	00ce71223f	[osm] Using the default probabilities for abbreviations in ways training data	2016-01-24 00:53:41 -05:00
Al	bab7a0f961	[osm] splitting streets (way names) on semicolons	2016-01-24 00:42:25 -05:00
Al	7646adfc0f	[osm] Adding abbreviated street names in addition to the originals	2016-01-23 23:23:58 -05:00
Al	67130383ce	[fix] converting semicolons to commas in OSM house numbers and picking one at random	2016-01-23 23:16:19 -05:00
Al	1bb797f783	[fix] spacing in phrases	2016-01-23 21:59:49 -05:00
Al	3a8c3dfcf6	[fix] spacing in phrases at end of string	2016-01-23 21:51:40 -05:00
Al	78450bfad9	[fix] Spaces in abbreviation	2016-01-23 21:36:20 -05:00
Al	308ceb5a5f	[fix] convert UTF8 slices back to unicode before using with the Python trie	2016-01-23 20:20:23 -05:00
Al	5eb6bb309b	[fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string	2016-01-23 20:09:45 -05:00
Al	d61207e95a	[fix] var name	2016-01-23 18:01:02 -05:00
Al	e44cba1d06	[fix] geonames db not required in OSM training data	2016-01-23 17:59:55 -05:00
Al	4f03711e60	[osm] Adding abbreviated training examples to ways language training data	2016-01-23 14:10:47 -05:00
Al	c9fb4ee69d	[osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used	2016-01-22 17:58:24 -05:00
Al	ea9bb3f2d5	[fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled	2016-01-22 15:48:21 -05:00
Al	f9f6558e06	[fix] simple whitespace field splits for the limited format training data (used for language classification)	2016-01-22 04:34:42 -05:00
Al	cd1db7b288	[fix] Making sure rare components are dropped first, adding state and country back in	2016-01-22 04:17:19 -05:00
Al	adc3a00264	[fix] var name	2016-01-22 04:10:16 -05:00
Al	261beffa36	[fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities	2016-01-22 04:00:45 -05:00
Al	a6cc3d0114	[fix] Adding state to the more frequently dropped components	2016-01-22 03:56:38 -05:00
Al	bca3dae004	[fix] state full name probabilities for limited vs. full formatted OSM training sets	2016-01-22 03:54:20 -05:00
Al	d1cf253092	[osm/formatting] Higher probability of dropout for rare components like counties, etc.	2016-01-22 03:39:35 -05:00

202 Commits