libpostal

Author	SHA1	Message	Date
Al	2e7f8f1ae7	[abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names	2016-08-24 18:52:00 -04:00
Al	dfa5c8e0a6	[abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten	2016-08-24 18:50:24 -04:00
Al	a6dad74a2b	[openaddresses] cleaning comma-delimited boundary components in OpenAddresses data sets	2016-08-24 15:06:04 -04:00
Al	d250f58293	[openaddresses] Also skipping addresses where street == unit	2016-08-24 14:10:41 -04:00
Al	7c3ad708d8	[openaddresses] Ensuring integer house numbers are > 0, street is not simply a numeric token (usually a copy of the house number) and that street != house number generally	2016-08-24 13:46:56 -04:00
Al	b7c600e496	[openaddresses] adding numeric_postcodes_only and add_osm_neighborhoods options	2016-08-23 02:11:21 -04:00
Al	ed0b49884e	[openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY	2016-08-23 00:38:43 -04:00
Al	8ec288d8f8	[openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields.	2016-08-23 00:29:09 -04:00
Al	99f71b718f	[openaddresses] New command-line arguments to OpenAddresses training data script	2016-08-22 22:12:47 -04:00
Al	23be122d2e	[openaddresses] Adding ability to use OSM boundaries for OpenAddresses (not turned on by default), cleaning up street names, requiring at least house number and street, validating house number to provide some assurance that it's not a badly-formatted NULL value, adding ability to strip letters from postcode for data sets like New York's statewide where there are some codes attached.	2016-08-22 22:09:00 -04:00
Al	8b57a7acf2	[osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries	2016-08-22 20:55:35 -04:00
Al	d281e71d2c	[fix] removing metro station index as a dependency for AddressComponents	2016-08-22 15:52:27 -04:00
Al	79c9694e2d	[names] Allowing for similarity-only normalization in name affixes	2016-08-22 03:47:08 -04:00
Al	cb4408fea8	[transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string.	2016-08-20 18:17:46 -04:00
Al	85ae5d4a05	[fix] name	2016-08-19 23:38:33 -04:00
Al	7951044d74	[intersections] Abbreviating street names that are not base names with random probabilities	2016-08-19 23:27:29 -04:00
Al	42808c62e3	[fix] dictionary access	2016-08-19 16:02:36 -04:00
Al	41f715d6ee	[intersections] Better handling of default languages in intersection queries	2016-08-19 15:59:58 -04:00
Al	a7118b40a7	[intersections] Allowing tags like name_1, etc. to make it into road name permutations for intersections	2016-08-19 13:12:02 -04:00
Al	0b2d3d965f	[fix] using lat/lon from the node properties in intersections data	2016-08-19 12:23:08 -04:00
Al	294316c721	[intersections] no need to store lat/lon in intersections	2016-08-19 01:58:53 -04:00
Al	9a6ec41ce6	[points] Adding __iter__ and __len__ to point index	2016-08-19 01:01:05 -04:00
Al	f43abe0846	[fix] making cleaned_name a classmethod	2016-08-18 19:55:52 -04:00
Al	defc7ffacc	[fix] arg name again	2016-08-18 18:22:06 -04:00
Al	4a28225df6	[fix] name	2016-08-18 18:20:55 -04:00
Al	86b921c629	[intersections] Adding the intersection's properties for intersections in case we want to do anything with named intersections in Japan/Korea	2016-08-18 17:14:23 -04:00
Al	87ee5f47f9	[fix] check for None in binary_search	2016-08-18 15:12:23 -04:00
Al	1675bba3f0	[intersections] highway=crossing also valid	2016-08-18 03:00:23 -04:00
Al	f137d68e12	[intersections] only junction=yes and highway=traffic_signals count as intersections, should eliminate points that are simply joining two segments of the same road	2016-08-18 02:53:49 -04:00
Al	93586c2592	[fix] aliasing all_languages	2016-08-18 02:24:59 -04:00
Al	688f103e80	[fix] languages	2016-08-18 02:24:34 -04:00
Al	e3ac3200b3	[fix] disambiguating languages using one of the default street names in intersections data	2016-08-18 02:05:13 -04:00
Al	328398813a	[fix] itertools.combinations	2016-08-18 01:26:48 -04:00
Al	737cbf4457	[fix] reference before assignment	2016-08-18 01:24:30 -04:00
Al	b41ba7374b	[intersections] intersections training data, using a Cartesian product of all names in the same language, including something like tiger:name_base	2016-08-18 01:19:14 -04:00
Al	701bcb1d79	[intersections] Using name cleanup on intersections, including tiger:name_base which sometimes has semicolon delimiters as well	2016-08-17 18:47:07 -04:00
Al	7b314324ca	[osm/addresses] Factoring out semicolon/comma-delimited name cleanup into its own method	2016-08-17 18:45:33 -04:00
Al	145af9331e	[osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute each time	2016-08-17 18:11:55 -04:00
Al	a3ae1eb330	[intersections] Adding a read classmethod to intersections to read the intermediate JSON file	2016-08-17 15:29:59 -04:00
Al	96c753e8c6	[fix] adding logging on new intersections script	2016-08-16 23:55:22 -04:00
Al	5b172ad2d7	[intersections] Caching intersection creation in an intermediate script to save time diagnosing issues downstream	2016-08-16 23:52:58 -04:00
Al	7ff0cb2704	[fix] name and a few things for intersections data	2016-08-15 21:26:54 -04:00
Al	7ab6af4335	[fix] bounds	2016-08-15 12:01:22 -04:00
Al	060d3a1f86	[fix] var name	2016-08-15 11:18:00 -04:00
Al	29fc198aba	[osm] giving parse_osm_number_range a parameter for max range and setting it to 1000 for postal codes e.g. for major cities that may have several hundred postal codes	2016-08-15 10:34:24 -04:00
Al	637baad629	[osm] Adding at least min_references entries for every selected postcode	2016-08-15 10:30:28 -04:00
Al	aa6b9cd858	[fix] var name for place tags coming from the admin rtree	2016-08-15 10:25:19 -04:00
Al	bc8acb196c	[osm] Pulling valid postal codes out into a method	2016-08-13 01:49:26 -04:00
Al	22123b80ba	[fix] refactoring geonames script a bit	2016-08-11 21:31:39 -04:00
Al	48755ec218	[boundaries] Adding regex replacements for boundary names such as Lyon 2e Arrondissement where putting Lyon is the OSM convention but we might sometimes want just 2e Arrondissement to appear in the training data next to Lyon	2016-08-11 13:09:24 -04:00
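
The [openaddresses] commits above (d250f58293, 7c3ad708d8, 23be122d2e) describe a set of row-level sanity checks: require at least a house number and a street, ensure integer house numbers are > 0, reject streets that are just a numeric token, and skip rows where the street equals the house number or the unit. A minimal sketch of how such checks might look follows. The function name, argument names, and the case-insensitive comparison are assumptions made for illustration; this is not the code from the repository.

```python
def _as_int(value):
    """Return value parsed as an int, or None if it is not a plain integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def is_valid_address_row(house_number, street, unit=None):
    """Hypothetical helper applying the checks named in the commit messages."""
    house_number = (house_number or '').strip()
    street = (street or '').strip()
    unit = (unit or '').strip()

    # Require at least a house number and a street.
    if not house_number or not street:
        return False

    # Integer house numbers must be > 0 (non-integer ones like "12A" pass through).
    number = _as_int(house_number)
    if number is not None and number <= 0:
        return False

    # The street must not be simply a numeric token
    # (usually a copy of the house number).
    if _as_int(street) is not None:
        return False

    # street != house number generally (case-insensitive here by assumption).
    if street.lower() == house_number.lower():
        return False

    # Also skip addresses where street == unit.
    if unit and street.lower() == unit.lower():
        return False

    return True
```

For example, `is_valid_address_row("12", "Main Street")` passes, while a row whose street column merely repeats the house number, such as `is_valid_address_row("12", "12")`, is rejected.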


1990 Commits