libpostal

Author	SHA1	Message	Date
Al	22e8178a97	[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names	2015-09-29 21:10:38 -04:00
Al	daad1a1313	[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)	2015-09-28 17:46:53 -04:00
Al	f29f2f091b	[fix] PEBCAK	2015-09-27 22:49:27 -04:00
Al	93b3110a49	[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting	2015-09-27 19:25:34 -04:00
Al	d3bfaf6b43	[osm/formatting] Fixing formatting tagged addresses with comma separated fields	2015-09-27 03:19:23 -04:00
Al	d512201e2c	[fix] removing space from tokens in address formatting	2015-09-27 02:18:34 -04:00
Al	5b829cd5a7	[fix] blank values containing punctuation in formatting	2015-09-26 21:49:28 -04:00
Al	dac0440be8	[fix] rsplit	2015-09-26 21:07:54 -04:00
Al	ae93552455	[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue	2015-09-26 03:56:52 -04:00
Al	0c792a2cc3	[osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge	2015-09-26 03:21:26 -04:00
Al	5417b4e602	[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories	2015-09-25 23:59:38 -04:00
Al	8fe791a14a	[fix] ensure_dir in file downloads	2015-09-25 17:05:22 -04:00
Al	646b9f7248	[osm/formatting] Continuing to use openvenues formatter for the India fix	2015-09-25 13:36:24 -04:00
Al	9901dd2aac	[fix] Switching address formatter back to OpenCageData repo	2015-09-24 18:42:17 -04:00
Al	3ce1669c30	[fix] import	2015-09-24 01:25:00 -04:00
Al	c85ce0b11d	[osm/formatting] Tagging separators as well in tagged output of the address formatter	2015-09-24 01:22:49 -04:00
Al	abfb1d4a60	[transliteration] Wide char support in transliteration data generator	2015-09-23 03:56:12 -04:00
Al	7e057b0fb8	[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)	2015-09-23 00:42:54 -04:00
Al	8562c7a5cb	[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.	2015-09-23 00:37:59 -04:00
Al	13bcc35523	[unicode] Allowing wide chars in unicode properties	2015-09-23 00:34:07 -04:00
Al	b4593b6f88	[unicode/tokenization] Using new character classes including wide chars in scanner	2015-09-23 00:33:14 -04:00
Al	a76831df7a	[unicode] Wide version of word breaks	2015-09-22 18:55:33 -04:00
Al	25917cfb17	[fix] scripts	2015-09-22 15:15:30 -04:00
Al	b405a53fe1	[fix] chars out of range in get_string_script Python version	2015-09-22 08:14:27 -04:00
Al	ca25b48687	[fix] Not writing empty fields in formatted addresses	2015-09-22 08:13:55 -04:00
Al	747de1944b	[fix] Accounting for unknown scripts in disambiguation	2015-09-21 18:05:28 -04:00
Al	134cf616d6	[osm] Using street for language disambiguation in training data	2015-09-21 04:09:15 -04:00
Al	84cf21df88	[osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples	2015-09-20 20:05:46 -04:00
Al	6731395ca0	[osm] Separating tagged from untagged output	2015-09-19 14:11:47 -04:00
Al	35f1c02caf	[polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately	2015-09-10 12:44:13 -07:00
Al	440a8158b6	[polygons] Adding in country languages for regional polygons without a default language	2015-09-10 12:34:26 -07:00
Al	fca7f21b1d	[polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class	2015-09-10 11:06:18 -07:00
Al	b85fe50fad	[osm] Training data for toponyms only cares about valid languages for name field	2015-09-08 16:38:05 -07:00
Al	e566063343	[osm] Doing an all-to-nodes conversion and an additional filter on the borders data set	2015-09-08 09:18:08 -07:00
Al	8525529968	[osm] Not requiring qualified name tags to process OSM toponyms	2015-09-06 21:03:01 -07:00
Al	df20e2cbc0	[osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language	2015-09-04 14:13:33 -04:00
Al	17fcfa8b59	[fix] adding house to ignore keys rather than aliasing it	2015-09-04 12:40:08 -04:00
Al	d64a27bc57	[osm] Converting relations to nodes in borders training data	2015-09-04 12:32:25 -04:00
Al	168b7f59da	[fix] default indices in strip_component	2015-09-04 12:29:47 -04:00
Al	64db63e3eb	[osm] Removing house tag	2015-09-04 12:23:47 -04:00
Al	6a20ce5e85	[language_id] Adding formatted addresses and toponyms to language training data	2015-09-04 01:46:49 -04:00
Al	4ebdca0ea7	[fix] var	2015-09-03 21:01:20 -04:00
Al	8345afbcd0	[fix] exclude country toponyms where the default language is well represented	2015-09-03 20:56:58 -04:00
Al	20bb191624	[fix] chaining	2015-09-03 20:52:00 -04:00
Al	e7cf5000fe	[fix] Exclude polygons with > 1 regional language	2015-09-03 20:48:04 -04:00
Al	9a9530c1b9	[fix] unqualified names	2015-09-03 20:37:22 -04:00
Al	a5fdd911d8	[fix] only use name key for default names	2015-09-03 20:35:08 -04:00
Al	d8e1432533	[osm] Adding unqualified names in single-language countries	2015-09-03 20:31:49 -04:00
Al	b15d2d70aa	[fix] top language	2015-09-03 20:09:46 -04:00
Al	44bf94a158	[osm] Better borders training data set (only need the metadata, not the polygons)	2015-09-03 20:09:03 -04:00

1853 Commits