Al
|
ed0b49884e
|
[openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY
|
2016-08-23 00:38:43 -04:00 |
|
Al
|
8ec288d8f8
|
[openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields.
|
2016-08-23 00:29:09 -04:00 |
|
Al
|
99f71b718f
|
[openaddresses] New command-line arguments to OpenAddresses training data script
|
2016-08-22 22:12:47 -04:00 |
|
Al
|
23be122d2e
|
[openaddresses] Adding ability to use OSM boundaries for OpenAddresses (not turned on by default), cleaning up street names, requiring at least house number and street, validating house number to provide some assurance that it's not a badly-formatted NULL value, adding ability to strip letters from postcode for data sets like New York's statewide where there are some codes attached.
|
2016-08-22 22:09:00 -04:00 |
|
Al
|
8b57a7acf2
|
[osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries
|
2016-08-22 20:55:35 -04:00 |
|
Al
|
d281e71d2c
|
[fix] removing metro station index as a dependency for AddressComponents
|
2016-08-22 15:52:27 -04:00 |
|
Al
|
3fef3e56d5
|
[boundaries] converting Mexico City boroughs to city_district
|
2016-08-22 03:51:01 -04:00 |
|
Al
|
79c9694e2d
|
[names] Allowing for similarity-only normalization in name affixes
|
2016-08-22 03:47:08 -04:00 |
|
Al
|
72b5f6b55a
|
[dictionaries] German dictionary updates
|
2016-08-22 00:11:10 -04:00 |
|
Al
|
58851a9088
|
[normalization] Adding NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so parser can normalize punctuation and HTML entities, etc. without touching the alphanumeric parts of the original input
|
2016-08-21 19:45:32 -04:00 |
|
Al
|
8b9702b43d
|
[error handling] Checking that resize succeeded in transliterate.c
|
2016-08-21 19:43:09 -04:00 |
|
Al
|
2644fed18f
|
[transliteration] Adding LATIN_ASCII_SIMPLE constant to transliterate.h
|
2016-08-21 19:42:10 -04:00 |
|
Al
|
4375bdea3b
|
[transliteration] strduping transliterator name while building table
|
2016-08-21 19:41:34 -04:00 |
|
Al
|
bde8776bc2
|
[transliteration] Regenerating transliteration data files
|
2016-08-21 19:41:11 -04:00 |
|
Al
|
cb4408fea8
|
[transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string.
|
2016-08-20 18:17:46 -04:00 |
|
Al
|
85ae5d4a05
|
[fix] name
|
2016-08-19 23:38:33 -04:00 |
|
Al
|
7951044d74
|
[intersections] Abbreviating street names that are not base names with random probabilities
|
2016-08-19 23:27:29 -04:00 |
|
Al
|
42808c62e3
|
[fix] dictionary access
|
2016-08-19 16:02:36 -04:00 |
|
Al
|
41f715d6ee
|
[intersections] Better handling of default languages in intersection queries
|
2016-08-19 15:59:58 -04:00 |
|
Al
|
a7118b40a7
|
[intersections] Allowing tags like name_1, etc. to make it into road name permutations for intersections
|
2016-08-19 13:12:02 -04:00 |
|
Al
|
0b2d3d965f
|
[fix] using lat/lon from the node properties in intersections data
|
2016-08-19 12:23:08 -04:00 |
|
Al
|
294316c721
|
[intersections] no need to store lat/lon in intersections
|
2016-08-19 01:58:53 -04:00 |
|
Al
|
9a6ec41ce6
|
[points] Adding __iter__ and __len__ to point index
|
2016-08-19 01:01:05 -04:00 |
|
Al
|
f43abe0846
|
[fix] making cleaned_name a classmethod
|
2016-08-18 19:55:52 -04:00 |
|
Al
|
defc7ffacc
|
[fix] arg name again
|
2016-08-18 18:22:06 -04:00 |
|
Al
|
4a28225df6
|
[fix] name
|
2016-08-18 18:20:55 -04:00 |
|
Al
|
86b921c629
|
[intersections] Adding the intersection's properties for intersections in case we want to do anything with named intersections in Japan/Korea
|
2016-08-18 17:14:23 -04:00 |
|
Al
|
87ee5f47f9
|
[fix] check for None in binary_search
|
2016-08-18 15:12:23 -04:00 |
|
Al
|
1675bba3f0
|
[intersections] highway=crossing also valid
|
2016-08-18 03:00:23 -04:00 |
|
Al
|
f137d68e12
|
[intersections] only junction=yes and highway=traffic_signals count as intersections, should eliminate points that are simply joining two segments of the same road
|
2016-08-18 02:53:49 -04:00 |
|
Al
|
93586c2592
|
[fix] aliasing all_languages
|
2016-08-18 02:24:59 -04:00 |
|
Al
|
688f103e80
|
[fix] languages
|
2016-08-18 02:24:34 -04:00 |
|
Al
|
e3ac3200b3
|
[fix] disambiguating languages using one of the default street names in intersections data
|
2016-08-18 02:05:13 -04:00 |
|
Al
|
328398813a
|
[fix] itertools.combinations
|
2016-08-18 01:26:48 -04:00 |
|
Al
|
737cbf4457
|
[fix] reference before assignment
|
2016-08-18 01:24:30 -04:00 |
|
Al
|
b41ba7374b
|
[intersections] intersections training data, using a Cartesian product of all names in the same language, including something like tiger:name_base
|
2016-08-18 01:19:14 -04:00 |
|
Al
|
701bcb1d79
|
[intersections] Using name cleanup on intersections, including tiger:name_base which sometimes has semicolon delimiters as well
|
2016-08-17 18:47:07 -04:00 |
|
Al
|
7b314324ca
|
[osm/addresses] Factoring out semicolon/comma-delimited name cleanup into its own method
|
2016-08-17 18:45:33 -04:00 |
|
Al
|
145af9331e
|
[osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute each time
|
2016-08-17 18:11:55 -04:00 |
|
Al
|
a3ae1eb330
|
[intersections] Adding a read classmethod to intersections to read the intermediate JSON file
|
2016-08-17 15:29:59 -04:00 |
|
Al
|
96c753e8c6
|
[fix] adding logging on new intersections script
|
2016-08-16 23:55:22 -04:00 |
|
Al
|
5b172ad2d7
|
[intersections] Caching intersection creation in an intermediate script to save time diagnosing issues downstream
|
2016-08-16 23:52:58 -04:00 |
|
Al
|
330edc2c93
|
[utils] cstring_array_get_phrase requires a char_array to be passed in so it doesn't have to do any memory allocation
|
2016-08-16 13:11:45 -04:00 |
|
Al
|
92e66fd60c
|
[utils] string_next_hyphen_index
|
2016-08-16 12:49:52 -04:00 |
|
Al
|
7ff0cb2704
|
[fix] name and a few things for intersections data
|
2016-08-15 21:26:54 -04:00 |
|
Al
|
7ab6af4335
|
[fix] bounds
|
2016-08-15 12:01:22 -04:00 |
|
Al
|
060d3a1f86
|
[fix] var name
|
2016-08-15 11:18:00 -04:00 |
|
Al
|
29fc198aba
|
[osm] giving parse_osm_number_range a parameter for max range and setting it to 1000 for postal codes e.g. for major cities that may have several hundred postal codes
|
2016-08-15 10:34:24 -04:00 |
|
Al
|
637baad629
|
[osm] Adding at least min_references entries for every selected postcode
|
2016-08-15 10:30:28 -04:00 |
|
Al
|
aa6b9cd858
|
[fix] var name for place tags coming from the admin rtree
|
2016-08-15 10:25:19 -04:00 |
|