d137b31be3[fix] YAML config
Al
2016-08-26 22:48:26 -04:00
12d429b63d[openaddresses] Simple regex-based method to strip unit phrases tacked onto the end of a street
Al
2016-08-26 22:39:13 -04:00
318ad2a0c4[openaddresses] Removing <Null> tag from values in OpenAddresses, seeing it in Colorado county files
Al
2016-08-26 21:41:30 -04:00
cfd4537bb5[openaddresses] adding Nebraska (37th state in the union)
Al
2016-08-26 20:59:38 -04:00
0f9e8ee95d[openaddresses] Better handling of float postcodes
Al
2016-08-26 20:10:30 -04:00
0618e506c4[openaddresses] adding Nevada (36th state in the union)
Al
2016-08-26 19:59:03 -04:00
e6f4fd656a[openaddresses] Adding West Virginia (35th state in the union)
Al
2016-08-26 19:52:25 -04:00
19eceb0074[openaddresses] Adding Kansas (34th state in the union)
Al
2016-08-26 19:48:30 -04:00
219149988f[openaddresses] adding Oregon (33rd state in the union)
Al
2016-08-26 19:25:59 -04:00
56329439af[openaddresses] some postcodes in OpenAddresses are stored as floats, convert to int and then to string if that's the case
Al
2016-08-26 19:12:48 -04:00
155d46f46c[openaddresses] Adding Minnesota (32nd state in the union)
Al
2016-08-26 19:04:34 -04:00
e771e9b2d5[openaddresses] Adding Wisconsin (30th state in the union)
Al
2016-08-26 18:37:48 -04:00
0016cdbd7f[openaddresses] Adding Iowa (29th state in the union)
Al
2016-08-26 18:13:50 -04:00
0ec9593e6c[openaddresses] Adding Texas (28th state in the union, however reluctantly)
Al
2016-08-26 17:42:24 -04:00
79b4e0be90[dictionaries] service road abbreviations
Al
2016-08-26 16:46:02 -04:00
d696c792ae[openaddresses] Adding Florida (27th state in the union)
Al
2016-08-26 16:35:04 -04:00
2b9d58dcbe[openaddresses] Ignoring fields with null-like values as well (there appear to be no valid places named Null or None...yet)
Al
2016-08-26 15:48:32 -04:00
2654683af4[openaddresses] Adding quick-and-dirty regex-based exclusion list for fields containing various patterns in OpenAddresses, to be used sparingly
Al
2016-08-26 15:34:30 -04:00
7bcddeff44[openaddresses] Adding Michigan (26th state in the union)
Al
2016-08-26 13:42:13 -04:00
755a65aa14[openaddresses] Adding Arkansas (25th state in the union)
Al
2016-08-26 13:36:25 -04:00
d97bb9cd4c[openaddresses] Adding Missouri (24th state in the union)
Al
2016-08-26 13:36:09 -04:00
d4e76eac0b[openaddresses] Adding Alabama (22nd state in the Union)
Al
2016-08-26 13:12:39 -04:00
aa26277136[openaddresses] Adding Illinois (21st state in the union)
Al
2016-08-26 13:08:06 -04:00
4e9f9e8957[openaddresses] Replace multiple spaces with single space
Al
2016-08-26 12:45:49 -04:00
9e89147c83[openaddresses] removing spaces in numeric ranges in OpenAddresses, sometimes see things like '12 -23'
Al
2016-08-26 12:30:15 -04:00
a11abf2787[openaddresses] Adding Mississippi (20th state in the union)
Al
2016-08-26 10:58:16 -04:00
ebeb7f816a[openaddresses] Adding Indiana (19th state in the union)
Al
2016-08-26 10:48:05 -04:00
3b2c86d240[fix] strip values in OpenAddresses components
Al
2016-08-26 10:24:34 -04:00
472580320d[dictionaries] English synonyms update
Al
2016-08-26 10:19:51 -04:00
b2f8180d19[openaddresses] Ignore any fields in OpenAddresses which have N/A as a value
Al
2016-08-25 23:58:38 -04:00
01afbf80ef[data] Each curl process will retry the chunk up to 3 times
Al
2016-08-25 23:18:39 -04:00
c23a7a4030[openaddresses] Ditto for numeric boundary names
Al
2016-08-25 22:58:52 -04:00
34b01e203d[openaddresses] Don't allow single-letter boundary names as they're probably just typos
Al
2016-08-25 22:58:26 -04:00
3a8dee523d[openaddresses] Adding Louisiana (18th state in the union)
Al
2016-08-25 22:50:18 -04:00
9aea4451ff[openaddresses] Adding Ohio (17th state in the union)
Al
2016-08-25 20:29:26 -04:00
0b19f27d8d[openaddresses] Adding Tennessee (16th state in the union)
Al
2016-08-25 18:55:54 -04:00
59a840ab37[openaddresses] Adding Kentucky (15th state in the union)
Al
2016-08-25 18:38:53 -04:00
dc6e483067[openaddresses] Adding DC (not a state, but it goes in after the original 13 colonies)
Al
2016-08-25 18:10:02 -04:00
e251fc42fa[openaddresses] Adding North Carolina (12th state in the union)
Al
2016-08-25 18:08:19 -04:00
2009b4c992[openaddresses] Adding Virginia (10th state in the union)
Al
2016-08-25 16:37:39 -04:00
859868aea2[openaddresses] Adding option to strip non-digits from postcode. Addresses with a postcode and no house_number+street may still be useful, so keeping them around as place queries to help with postcode contexts
Al
2016-08-25 16:36:18 -04:00
da619e3cf4[osm] Adding border_type=city to override tags
Al
2016-08-25 15:21:33 -04:00
93b377c8a7[openaddresses] Fixes for California, have to remove Orange County because it's all being stuffed into the street field
Al
2016-08-25 14:39:45 -04:00
dd0ca5e008[addresses] Adding admin_center properties to place components in add_admin_boundaries (only overriding for specified areas where the boundary may otherwise not have all the properties)
Al
2016-08-25 01:19:32 -04:00
b75419d6e8[boundaries] Luxembourg quarters = city_district
Al
2016-08-24 23:37:44 -04:00
2e7f8f1ae7[abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names
Al
2016-08-24 18:52:00 -04:00
dfa5c8e0a6[abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten
Al
2016-08-24 17:32:28 -04:00
a6dad74a2b[openaddresses] cleaning comma-delimited boundary components in OpenAddresses data sets
Al
2016-08-24 15:06:04 -04:00
14bc224f25[openaddresses] Adding OSM neighborhoods across the US wherever we have them. That index is relatively small and cheap to do lookups for every point whereas the general R-tree should be used only when necessary
Al
2016-08-24 14:58:07 -04:00
4552aa380c[openaddresses] Adding South Carolina
Al
2016-08-24 14:47:07 -04:00
84bb12657b[dictionaries] Adding a variety of abbreviations/misspellings for street, road, drive, and place
Al
2016-08-24 14:19:40 -04:00
709cecd300[dictionaries] Adding some MLK synonyms after looking at the Georgia data
Al
2016-08-24 14:18:44 -04:00
d250f58293[openaddresses] Also skipping addresses where street == unit
Al
2016-08-24 14:10:41 -04:00
f66fb4a172[openaddresses] Adding Maryland
Al
2016-08-24 13:54:07 -04:00
f9ec02c8e0[openaddresses] Adding Georgia. There's a lot of weirdness in there so whitelisting files. Files that weren't added were left out deliberately
Al
2016-08-24 13:52:35 -04:00
7c3ad708d8[openaddresses] Ensuring integer house numbers are > 0, street is not simply a numeric token (usually a copy of the house number) and that street != house number generally
Al
2016-08-24 13:46:56 -04:00
ad625a46a4[openaddresses] Adding Delaware and Pennsylvania. Going with the "older states in the union will have funkier addresses" strategy.
Al
2016-08-23 22:22:29 -04:00
f36ca6a788[dictionaries] Adding Asturian language and dictionaries for the Asturias region of Spain. Realized some of the default street names/addresses in Oviedo, etc. are actually Asturian rather than Spanish
Al
2016-08-23 21:52:22 -04:00
ff06462981[dictionaries] Adding oberste etage, unterste etage and parkdeck to German dictionaries. Generating as part of the sub-building info for the address parser
Al
2016-08-23 21:49:44 -04:00
de1255af00[auto][ci skip] Adding data files from Travis build #161
Travis
2016-08-23 22:48:20 +00:00
f03df6aab8Merge pull request #108 from petacat/patch-5
Al Barrentine
2016-08-23 18:38:08 -04:00
f19c9852aa[auto][ci skip] Adding data files from Travis build #160
Travis
2016-08-23 22:24:19 +00:00
d797d6c863[auto][ci skip] Adding data files from Travis build #159
Travis
2016-08-23 22:14:07 +00:00
d1991848a3Merge pull request #106 from petacat/patch-3
Al Barrentine
2016-08-23 18:09:47 -04:00
964b440380Merge pull request #104 from petacat/patch-1
Al Barrentine
2016-08-23 17:49:36 -04:00
a787c25cdfUpdate toponyms.txt
Thomas Rosen
2016-08-23 23:09:32 +02:00
7e258f2d87Update place_names.txt
Thomas Rosen
2016-08-23 23:03:31 +02:00
bd109dc9caUpdate directionals.txt
Thomas Rosen
2016-08-23 22:56:56 +02:00
e746cbab75[openaddresses] Adding New England states (postcodes beginning with 0).
Al
2016-08-23 02:51:20 -04:00
9866614f63[openaddresses] Using new config implementation, using neighborhoods/boroughs in NYC
Al
2016-08-23 02:14:29 -04:00
b7c600e496[openaddresses] adding numeric_postcodes_only and add_osm_neighborhoods options
Al
2016-08-23 02:11:21 -04:00
ed0b49884e[openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY
Al
2016-08-23 00:38:43 -04:00
8ec288d8f8[openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields.
Al
2016-08-23 00:29:05 -04:00
99f71b718f[openaddresses] New command-line arguments to OpenAddresses training data script
Al
2016-08-22 22:12:47 -04:00
23be122d2e[openaddresses] Adding ability to use OSM boundaries for OpenAddresses (not turned on by default), cleaning up street names, requiring at least house number and street, validating house number to provide some assurance that it's not a badly-formatted NULL value, adding ability to strip letters from postcode for data sets like New York's statewide where there are some codes attached.
Al
2016-08-22 22:07:34 -04:00
8b57a7acf2[osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries
Al
2016-08-22 20:29:29 -04:00
d281e71d2c[fix] removing metro station index as a dependency for AddressComponents
Al
2016-08-22 15:52:22 -04:00
3fef3e56d5[boundaries] converting Mexico City boroughs to city_district
Al
2016-08-22 03:51:01 -04:00
79c9694e2d[names] Allowing for similarity-only normalization in name affixes
Al
2016-08-22 03:47:03 -04:00
72b5f6b55a[dictionaries] German dictionary updates
Al
2016-08-22 00:11:10 -04:00
58851a9088[normalization] Adding NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so parser can normalize punctuation and HTML entities, etc. without touching the alphanumeric parts of the original input
Al
2016-08-21 19:45:32 -04:00
8b9702b43d[error handling] Checking that resize succeeded in transliterate.c
Al
2016-08-21 19:43:09 -04:00
2644fed18f[transliteration] Adding LATIN_ASCII_SIMPLE constant to transliterate.h
Al
2016-08-21 19:42:10 -04:00
4375bdea3b[transliteration] strduping transliterator name while building table
Al
2016-08-21 19:41:34 -04:00
bde8776bc2[transliteration] Regenerating transliteration data files
Al
2016-08-21 19:41:11 -04:00
cb4408fea8[transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string.
Al
2016-08-20 18:17:35 -04:00
85ae5d4a05[fix] name
Al
2016-08-19 23:38:33 -04:00
7951044d74[intersections] Abbreviating street names that are not base names with random probabilities
Al
2016-08-19 23:27:29 -04:00
42808c62e3[fix] dictionary access
Al
2016-08-19 16:02:36 -04:00
41f715d6ee[intersections] Better handling of default languages in intersection queries
Al
2016-08-19 15:59:54 -04:00
a7118b40a7[intersections] Allowing tags like name_1, etc. to make it into road name permutations for intersections
Al
2016-08-19 13:12:02 -04:00
0b2d3d965f[fix] using lat/lon from the node properties in intersections data
Al
2016-08-19 12:23:08 -04:00
294316c721[intersections] no need to store lat/lon in intersections
Al
2016-08-19 01:58:53 -04:00
9a6ec41ce6[points] Adding __iter__ and __len__ to point index
Al
2016-08-19 01:01:05 -04:00
f43abe0846[fix] making cleaned_name a classmethod
Al
2016-08-18 19:55:52 -04:00
defc7ffacc[fix] arg name again
Al
2016-08-18 18:22:06 -04:00
4a28225df6[fix] name
Al
2016-08-18 18:20:55 -04:00
86b921c629[intersections] Adding the intersection's properties to the intersections data in case we want to do anything with named intersections in Japan/Korea
Al
2016-08-18 17:14:23 -04:00
87ee5f47f9[fix] check for None in binary_search
Al
2016-08-18 15:12:23 -04:00
1675bba3f0[intersections] highway=crossing also valid
Al
2016-08-18 03:00:23 -04:00