9c090302f7[addresses] Topological sort of address component dependencies so they get checked/removed in order
Al
2016-05-31 16:01:49 -04:00
cd7cd292b7[states] State abbreviations for Brazil and Mexico
Al
2016-05-31 15:53:40 -04:00
90a2f2b2e0[parser] road has no dependencies
Al
2016-05-31 15:52:24 -04:00
29d16c9c80[openaddresses] Country code for Belgium, removing Flanders as it has encoding issues, removing region from New Zealand formats as it appears to be conflated with districts
Al
2016-05-31 12:11:42 -04:00
419f5961a5[fix] unused var
Al
2016-05-31 11:01:37 -04:00
7612e93fdf[addresses] French address config
Al
2016-05-31 03:36:07 -04:00
4b28791bb1[addresses] Spanish PO box probabilities
Al
2016-05-31 03:35:49 -04:00
a57ace0be0[openaddresses] OpenAddresses training script
Al
2016-05-31 02:33:32 -04:00
64824b90a9[openaddresses] Only adding units for Australia, as they're known to contain both designator and number. US units seem to often have simple numbers/letters for the unit field
Al
2016-05-31 02:20:28 -04:00
584a4e0ee8[openaddresses] Added components via OA config
Al
2016-05-31 02:12:41 -04:00
55d66af422[openaddresses] Adding abbreviated unit
Al
2016-05-31 02:11:52 -04:00
2120adefff[openaddresses] Adding unit by default (only for files that have been vetted)
Al
2016-05-31 02:06:52 -04:00
d910c6ca94[fix] OpenAddresses formatting
Al
2016-05-31 02:04:06 -04:00
802a5ee534[fix] condition
Al
2016-05-31 02:00:33 -04:00
e6a1d11324[fix] validators
Al
2016-05-31 01:59:05 -04:00
caa155c9c4[fix] method name
Al
2016-05-31 01:57:34 -04:00
4d0caec3d3[fix] return value
Al
2016-05-31 01:56:14 -04:00
0e09e1222f[fix] import again
Al
2016-05-31 01:53:17 -04:00
e5267996ea[fix] import
Al
2016-05-31 01:51:16 -04:00
10662e79d5[fix] directory structure
Al
2016-05-31 01:48:45 -04:00
0c9f1aa30d[fix] import
Al
2016-05-31 01:42:27 -04:00
1d80d8b6b8[openaddresses] OpenAddresses address formatter, using the config
Al
2016-05-31 01:41:16 -04:00
cc4b7109ab[openaddresses] OpenAddresses config specifying a few files
Al
2016-05-31 01:40:21 -04:00
91b06439e2[openaddresses] Fetch script for OpenAddresses
Al
2016-05-31 01:39:04 -04:00
a32f6b5017[addresses] Making address_language a classmethod
Al
2016-05-31 01:20:05 -04:00
420ceb6c38[intersections] Only requiring a tag to share at least two ways
Al
2016-05-30 23:10:04 -04:00
cc7727b13e[intersections] Adding intersections to config
Al
2016-05-30 23:08:00 -04:00
202dc0c58a[fix] name
Al
2016-05-30 23:06:45 -04:00
73b2aec25e[fix] input file
Al
2016-05-30 22:12:56 -04:00
89f6793243[fix] args
Al
2016-05-30 22:12:39 -04:00
51831e2111[fix] add ways db dir
Al
2016-05-30 22:07:01 -04:00
f7680e9b65[fix] name
Al
2016-05-30 22:01:17 -04:00
0a912766e4[fix] logging for intersections data
Al
2016-05-30 22:00:28 -04:00
baf8fbb381[fix] import
Al
2016-05-30 22:00:14 -04:00
b4a70a9a56[fix] import
Al
2016-05-30 21:58:12 -04:00
8aada7086f[intersections] intersections training data
Al
2016-05-30 21:50:45 -04:00
5075128ada[intersections] Adding places to intersection template, intersection phrase generator
Al
2016-05-30 21:07:14 -04:00
701e67614a[fix] import
Al
2016-05-30 14:53:55 -04:00
2454b98c6d[tokenization] Reverting commit for tokenizing initial/final apostrophes as part of words as it may be more effective to handle during post-processing
Al
2016-05-30 11:59:37 -04:00
0a8f46bdc3[parser] Using new geonames designations in parser features
Al
2016-05-29 01:40:45 -04:00
c383f8af88[parser] Using NFC normalization for parser as well, @ sign not defined as separator since it may also be used in intersections
Al
2016-05-29 01:37:38 -04:00
c2ee5a45b3[geodb] Adding separate bitset for geonames place types and using NFC normalization instead of NFD (requires retraining)
Al
2016-05-29 01:36:00 -04:00
6c39c663ff[normalize] Adding NORMALIZE_STRING_COMPOSE for NFC unicode normalization
Al
2016-05-28 19:25:12 -04:00
757c6147cb[tokenization] Adding ability to tokenize 's Gravenhage
Al
2016-05-28 19:24:19 -04:00
2e8888e331[fix] warnings/size_t in libpostal.c
Al
2016-05-28 19:19:31 -04:00
e800f21f06[gazetteers] Adding new gazetteer types/address components
Al
2016-05-28 19:19:18 -04:00
95b239a5f9[dictionaries] Adding letra to Spanish numbered unit dictionaries
Al
2016-05-28 19:15:02 -04:00
9561f771ce[dictionaries] Adding new dictionary types to generator script
Al
2016-05-28 17:16:43 -04:00
7aa06c4535[boundaries] Adding Bucharest sectors as city_district
Al
2016-05-27 20:22:56 -04:00
9aeb22bfbc[dictionaries] More dictionary refactoring
Al
2016-05-27 19:40:20 -04:00
6980565698[addresses] Allowing null_phrase_probability for alpha, and alpha+digits instead of just for ordinals (mostly for Spain)
Al
2016-05-27 13:40:38 -04:00
d4d8fa81d1[addresses] Increasing null_phrase_probability for plain numerics in Spain so things like 2o B make it into the training data
Al
2016-05-27 13:37:43 -04:00
35e73d0e40[places] setting probability of including island to 0.5 for Hawaii, 0.8 seems too high given all the Honolulu, HI addresses (not often seen as Honolulu, Oahu, HI)
Al
2016-05-27 11:32:52 -04:00
605b7c2b4f[dictionaries] Italian CAP abbreviations
Al
2016-05-27 11:31:16 -04:00
4e8e08086e[dictionaries] Russian place names
Al
2016-05-27 11:28:50 -04:00
8d33b62da2[dictionaries] Adding more fleshed-out Greek dictionaries from a recent Nominatim NameFinder wiki update
Al
2016-05-27 11:28:23 -04:00
0d39cd94c2[dictionaries] Refactoring existing unit_types/level_types dictionaries to use the new more granular dictionary structure
Al
2016-05-27 11:27:34 -04:00
11d1acc3bc[parser] Sample chain store alternate names from the cross-language dictionary
Al
2016-05-26 12:09:10 -04:00
69e1c846ba[parser] Fixing config keys so OSM streets/venues get abbreviated. Selecting namespaced address fields in cases like Brussels or Hong Kong where everything is bilingual. Adding the ability to pass a known language into address component expansion
Al
2016-05-26 12:05:46 -04:00
e5e0cf3b92[fix] loading transliteration module in address_parser_test.c as well
Al
2016-05-25 19:54:01 -04:00
8e338c5ffb[fix] ON needs to be quoted in YAML, uppercase Yukon abbreviation
Al
2016-05-25 19:12:15 -04:00
b8d43dc601[fix] cstring_array_split calls
Al
2016-05-25 17:58:30 -04:00
b19cd3f60a[fix] brace
Al
2016-05-25 17:52:00 -04:00
994b2f18e4[parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in vulgar fractions, e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent
Al
2016-05-25 17:50:29 -04:00
b664ab1cea[utils] Adding cstring_array_split_ignore_consecutive
Al
2016-05-25 17:07:20 -04:00
8e90ee45d2[fix] calls and NULL checks
Al
2016-05-25 15:50:53 -04:00
e3cffaf0d1[fix] tokenized_string_t should copy its source string
Al
2016-05-25 15:47:57 -04:00
16501aba17[fix] Need to load transliteration module for Latin-ASCII normalization
Al
2016-05-25 15:25:34 -04:00
b326e209fb[places] Adding Town of to English prefixes
Al
2016-05-25 11:23:31 -04:00
366c4995af[parser] lower full-name probability for states
Al
2016-05-25 00:47:36 -04:00
d88be7ef5d[fix] use simple language code if language_script cannot be found
Al
2016-05-24 19:49:08 -04:00
90467e9098[fix] global formatter config
Al
2016-05-24 19:44:40 -04:00
16a91528d6[fix] config key name
Al
2016-05-24 19:39:12 -04:00
d3b936067e[fix] neighborhood reverse geocoder using the new OSM definitions module which keeps track of whatever the data fetching script defines as being a valid {neighborhood, admin boundary, etc.}
Al
2016-05-24 19:27:22 -04:00
b294b891dd[boundaries] lines sharing a point are added to the polygon head-to-tail, reversing the node order as needed, which produces accurate OSM polygons for reverse geocoding lookups
Al
2016-05-24 19:24:37 -04:00
75aa713792[fix] moving language code replacements out of address components
Al
2016-05-24 16:55:46 -04:00
6cb834b3a3[boundaries] admin_level=8 is city_district in Japan
Al
2016-05-24 16:53:42 -04:00
308080f6ee[formatting] Moving language country overrides to formatter config so actual language is retained
Al
2016-05-24 16:52:08 -04:00
e59e3a173c[fix] place=municipality
Al
2016-05-24 15:35:33 -04:00
3c16973cac[fix] OSM neighborhood ids
Al
2016-05-24 15:13:07 -04:00
d86443a697[fix] Adding basic Han numeral replacement to neighborhood deduping
Al
2016-05-24 14:55:54 -04:00
046f445a56[fix] component bitsets
Al
2016-05-24 13:07:32 -04:00
0dbfd79b72[fix] language format changes only apply to local languages
Al
2016-05-24 12:59:32 -04:00
12f86875e2[formatting] Increase probability of postcode before city
Al
2016-05-24 12:21:04 -04:00
890268aa87[languages] Use English formats for Romanized CJK
Al
2016-05-24 12:13:58 -04:00
ad4b197ead[fix] floor samples
Al
2016-05-24 11:16:57 -04:00
e53e61358d[fix] Don't remove chome from Japanese, as the neighborhoods are usually just plain numbers
Al
2016-05-23 18:17:04 -04:00
110be7a245[fix] args
Al
2016-05-23 17:42:34 -04:00
9772e85c87[fix] US/Canada probabilities for industrial/commercial
Al
2016-05-23 16:22:27 -04:00
d4e913c55f[boundaries] Adding CP and civil parish to English place suffixes
Al
2016-05-23 15:47:57 -04:00
a5331f7107[osm] Venue name depends on one of {house_number, road, suburb, city_district, city, postcode}
Al
2016-05-23 15:46:59 -04:00
2d1e7ca990[fix] Spanish office probabilities
Al
2016-05-23 15:35:55 -04:00
a1421d4a68[fix] floors
Al
2016-05-23 15:18:10 -04:00
5ea570835e[fix] args again
Al
2016-05-23 15:01:58 -04:00
7c41d84d8f[fix] args
Al
2016-05-23 14:59:22 -04:00
2e4ba6e6cc[subdivisions/buildings] Adding subdivisions and buildings rtree to training data for getting building height, zone
Al
2016-05-23 14:51:44 -04:00
52aa95c213[subdivisions] Adding zone types
Al
2016-05-23 14:45:55 -04:00
91db1ec371[fix] removing unnecessary vars
Al
2016-05-23 13:04:25 -04:00
694020ddf3[fix] all_names returns a list not a set
Al
2016-05-23 13:04:00 -04:00
97d2bfb508[osm] venue names
Al
2016-05-23 12:51:28 -04:00