Al
|
472580320d
|
[dictionaries] English synonyms update
|
2016-08-26 10:19:51 -04:00 |
|
Al
|
b2f8180d19
|
[openaddresses] Ignore any fields in OpenAddresses which have N/A as a value
|
2016-08-25 23:58:38 -04:00 |
|
Al
|
c23a7a4030
|
[openaddresses] Ditto for numeric boundary names
|
2016-08-25 22:58:52 -04:00 |
|
Al
|
34b01e203d
|
[openaddresses] Don't allow single-letter boundary names as they're probably just typos
|
2016-08-25 22:58:26 -04:00 |
|
Al
|
3a8dee523d
|
[openaddresses] Adding Louisiana (18th state in the union)
|
2016-08-25 22:50:18 -04:00 |
|
Al
|
9aea4451ff
|
[openaddresses] Adding Ohio (17th state in the union)
|
2016-08-25 22:01:57 -04:00 |
|
Al
|
0b19f27d8d
|
[openaddresses] Adding Tennessee (16th state in the union)
|
2016-08-25 18:55:54 -04:00 |
|
Al
|
59a840ab37
|
[openaddresses] Adding Kentucky (15th state in the union)
|
2016-08-25 18:38:53 -04:00 |
|
Al
|
dc6e483067
|
[openaddresses] Adding DC (not a state, but added after the original 13 colonies)
|
2016-08-25 18:11:54 -04:00 |
|
Al
|
e251fc42fa
|
[openaddresses] Adding North Carolina (12th state in the union)
|
2016-08-25 18:08:19 -04:00 |
|
Al
|
2009b4c992
|
[openaddresses] Adding Virginia (10th state in the union)
|
2016-08-25 16:37:39 -04:00 |
|
Al
|
859868aea2
|
[openaddresses] Adding option to strip non-digits from postcodes. Addresses with a postcode but no house_number+street may still be useful, so keeping them around as place queries to help with postcode contexts
|
2016-08-25 16:36:18 -04:00 |
|
Al
|
da619e3cf4
|
[osm] Adding border_type=city to override tags
|
2016-08-25 15:21:33 -04:00 |
|
Al
|
93b377c8a7
|
[openaddresses] Fixes for California; have to remove Orange County because all of its address data is being stuffed into the street field
|
2016-08-25 14:39:45 -04:00 |
|
Al
|
dd0ca5e008
|
[addresses] Adding admin_center properties to place components in add_admin_boundaries (only overriding for specified areas where the boundary may otherwise not have all the properties)
|
2016-08-25 01:20:06 -04:00 |
|
Al
|
b75419d6e8
|
[boundaries] Luxembourg quarters = city_district
|
2016-08-24 23:37:44 -04:00 |
|
Al
|
2e7f8f1ae7
|
[abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names
|
2016-08-24 18:52:00 -04:00 |
|
Al
|
dfa5c8e0a6
|
[abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten
|
2016-08-24 18:50:24 -04:00 |
|
Al
|
a6dad74a2b
|
[openaddresses] cleaning comma-delimited boundary components in OpenAddresses data sets
|
2016-08-24 15:06:04 -04:00 |
|
Al
|
14bc224f25
|
[openaddresses] Adding OSM neighborhoods across the US wherever we have them. That index is relatively small and cheap enough to query for every point, whereas the general R-tree should be used only when necessary
|
2016-08-24 14:58:19 -04:00 |
|
Al
|
4552aa380c
|
[openaddresses] Adding South Carolina
|
2016-08-24 14:47:07 -04:00 |
|
Al
|
84bb12657b
|
[dictionaries] Adding a variety of abbreviations/misspellings for street, road, drive, and place
|
2016-08-24 14:19:40 -04:00 |
|
Al
|
709cecd300
|
[dictionaries] Adding some MLK synonyms after looking at the Georgia data
|
2016-08-24 14:18:44 -04:00 |
|
Al
|
d250f58293
|
[openaddresses] Also skipping addresses where street == unit
|
2016-08-24 14:10:41 -04:00 |
|
Al
|
f66fb4a172
|
[openaddresses] Adding Maryland
|
2016-08-24 13:54:40 -04:00 |
|
Al
|
f9ec02c8e0
|
[openaddresses] Adding Georgia. There's a lot of weirdness in there, so whitelisting files. Files that weren't added were left out deliberately
|
2016-08-24 13:52:35 -04:00 |
|
Al
|
7c3ad708d8
|
[openaddresses] Ensuring integer house numbers are > 0, street is not simply a numeric token (usually a copy of the house number) and that street != house number generally
|
2016-08-24 13:46:56 -04:00 |
|
Al
|
ad625a46a4
|
[openaddresses] Adding Delaware and Pennsylvania. Going with the "older states in the union will have funkier addresses" strategy.
|
2016-08-23 22:22:35 -04:00 |
|
Al
|
f36ca6a788
|
[dictionaries] Adding Asturian language and dictionaries for the Asturias region of Spain. Realized some of the default street names/addresses in Oviedo, etc. are actually Asturian rather than Spanish
|
2016-08-23 21:52:22 -04:00 |
|
Al
|
ff06462981
|
[dictionaries] Adding oberste etage, unterste etage and parkdeck to German dictionaries. Generating these as part of the sub-building info for the address parser
|
2016-08-23 21:49:44 -04:00 |
|
Al
|
e746cbab75
|
[openaddresses] Adding New England states (postcodes beginning with 0).
|
2016-08-23 02:51:20 -04:00 |
|
Al
|
9866614f63
|
[openaddresses] Using new config implementation, using neighborhoods/boroughs in NYC
|
2016-08-23 02:14:29 -04:00 |
|
Al
|
b7c600e496
|
[openaddresses] adding numeric_postcodes_only and add_osm_neighborhoods options
|
2016-08-23 02:11:21 -04:00 |
|
Al
|
ed0b49884e
|
[openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY
|
2016-08-23 00:38:43 -04:00 |
|
Al
|
8ec288d8f8
|
[openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields.
|
2016-08-23 00:29:09 -04:00 |
|
Al
|
99f71b718f
|
[openaddresses] New command-line arguments to OpenAddresses training data script
|
2016-08-22 22:12:47 -04:00 |
|
Al
|
23be122d2e
|
[openaddresses] Adding ability to use OSM boundaries for OpenAddresses (not turned on by default), cleaning up street names, requiring at least house number and street, validating house number to provide some assurance that it's not a badly-formatted NULL value, adding ability to strip letters from postcode for data sets like New York's statewide where there are some codes attached.
|
2016-08-22 22:09:00 -04:00 |
|
Al
|
8b57a7acf2
|
[osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries
|
2016-08-22 20:55:35 -04:00 |
|
Al
|
d281e71d2c
|
[fix] removing metro station index as a dependency for AddressComponents
|
2016-08-22 15:52:27 -04:00 |
|
Al
|
3fef3e56d5
|
[boundaries] converting Mexico City boroughs to city_district
|
2016-08-22 03:51:01 -04:00 |
|
Al
|
79c9694e2d
|
[names] Allowing for similarity-only normalization in name affixes
|
2016-08-22 03:47:08 -04:00 |
|
Al
|
72b5f6b55a
|
[dictionaries] German dictionary updates
|
2016-08-22 00:11:10 -04:00 |
|
Al
|
58851a9088
|
[normalization] Adding NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so parser can normalize punctuation and HTML entities, etc. without touching the alphanumeric parts of the original input
|
2016-08-21 19:45:32 -04:00 |
|
Al
|
8b9702b43d
|
[error handling] Checking that resize succeeded in transliterate.c
|
2016-08-21 19:43:09 -04:00 |
|
Al
|
2644fed18f
|
[transliteration] Adding LATIN_ASCII_SIMPLE constant to transliterate.h
|
2016-08-21 19:42:10 -04:00 |
|
Al
|
4375bdea3b
|
[transliteration] strduping transliterator name while building table
|
2016-08-21 19:41:34 -04:00 |
|
Al
|
bde8776bc2
|
[transliteration] Regenerating transliteration data files
|
2016-08-21 19:41:11 -04:00 |
|
Al
|
cb4408fea8
|
[transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string.
|
2016-08-20 18:17:46 -04:00 |
|
Al
|
85ae5d4a05
|
[fix] name
|
2016-08-19 23:38:33 -04:00 |
|
Al
|
7951044d74
|
[intersections] Abbreviating street names that are not base names with random probabilities
|
2016-08-19 23:27:29 -04:00 |
|