Al | 85b402063b | [fix] escape literal backslash in address dictionaries | 2016-12-24 16:05:45 -05:00 |
Al | cca80b046c | [abbreviation] fixing abbreviations within hyphenated phrases, particularly for prefix/suffix matches | 2016-12-03 17:55:11 -05:00 |
Al | e15036fcce | [fix] if there are street types that are not venue words and not vice versa, then call the venue invalid as a standalone term | 2016-11-19 04:11:33 -05:00 |
Al | 5140db536a | [phrases] additions to venue names dictionaries and a more restrictive version of street types dictionaries | 2016-11-19 02:58:27 -05:00 |
Al | 71be0fdfbc | [fix] sets | 2016-11-19 02:30:40 -05:00 |
Al | b6f7b5b577 | [fix] name | 2016-11-19 01:38:15 -05:00 |
Al | 1df1b60a9f | [phrases] adding extract_phrases method to gazetteers, which returns a set of gazetteer phrases found in a given string | 2016-11-18 23:35:44 -05:00 |
Al | 1d25f08b52 | [expand] adding a function to check if two place names/addresses are equivalent after token normalization (replacing hyphens, deleting final periods, lowercasing, simple transliteration, etc.) and taking into account abbreviations from any specified libpostal dictionaries. In conjunction with place name affixes, useful in data sets like GeoPlanet or GeoNames to determine if a name variant is related to the original or not | 2016-10-12 14:55:59 -04:00 |
Al | 14c20091f4 | [fix] abbreviations in hyphenated phrases like Saint-Germaine. Hyphenation should use the phrase length not the token length | 2016-09-12 22:20:25 -04:00 |
Al | 551cce8cb1 | [fix] making a separate gazetteer for toponym abbreviations | 2016-09-10 01:08:58 -04:00 |
Al | bae04eb543 | [fix] int | 2016-08-28 14:11:25 -04:00 |
Al | de0a7bfe4f | [fix] /or/and/ | 2016-08-28 14:09:30 -04:00 |
Al | 44e59e8daf | [fix] return the original for already abbreviated tokens | 2016-08-28 14:05:58 -04:00 |
Al | 3cf3e401db | [fix] abbreviation recasing | 2016-08-28 12:04:36 -04:00 |
Al | 2e7f8f1ae7 | [abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names | 2016-08-24 18:52:00 -04:00 |
Al | dfa5c8e0a6 | [abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten | 2016-08-24 18:50:24 -04:00 |
Al | 8b57a7acf2 | [osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries | 2016-08-22 20:55:35 -04:00 |
Al | dd7ef6fabf | [dictionaries] Making new component for near/nearby prepositions | 2016-07-21 17:04:57 -04:00 |
Al | 9561f771ce | [dictionaries] Adding new dictionary types to generator script | 2016-07-21 17:04:57 -04:00 |
Al | 4e4686fbfe | [gazetteers] Street and synonym dictionary for catching other abbreviations that occur in street names | 2016-07-21 17:04:57 -04:00 |
Al | 38607b0a50 | [fix] var name for error case | 2016-07-21 17:04:57 -04:00 |
Al | b50120f45c | [chains] Adding chains gazetteer | 2016-07-21 17:04:57 -04:00 |
Al | 771a360a85 | [phrases] Using safe_encode/safe_decode as default trie serializer/deserializer | 2016-07-21 17:04:57 -04:00 |
Al | 3a9ac9d96f | [fix] six.u | 2016-07-21 17:04:57 -04:00 |
Al | 7b42e52c6a | [fix] token_types.PHRASE | 2016-07-21 17:04:57 -04:00 |
Al | d5dc34ec1d | [gazetteers] moving PHRASE to a token type | 2016-07-21 17:04:57 -04:00 |
Al | 62748b4644 | [dictionaries] /house_number/house_numbers/ | 2016-07-21 17:04:57 -04:00 |
Al | 6d4e54cd7a | [dictionaries] making entrances/postcodes plural for consistency | 2016-07-21 17:04:57 -04:00 |
Al | 410eb0006a | [dictionaries] Moving intersections to cross streets | 2016-07-21 17:04:57 -04:00 |
Al | 2f9a58f37b | [expansion] Add postcode dictionary to gazetteer types | 2016-07-21 17:04:57 -04:00 |
Al | e1f1e34dca | [expansion] Modifying the Python gazetteers to use new dictionaries API | 2016-07-21 17:04:57 -04:00 |
Al | 80089099e9 | [expansion] Adding number and intersections to dictionary types | 2016-07-21 17:04:57 -04:00 |
Al | 3d3aacae67 | [addresses] Adding abbreviations as a separate module so it can be used with multiple data sets | 2016-07-21 17:04:57 -04:00 |
Al | 9dd5d5c210 | [dictionaries] encapsulating reading address dictionaries so it's easy to implement sampling for the address training data | 2016-07-21 17:04:57 -04:00 |
Al | f3a9f4a257 | [fix] removing init_gazetteers, doing it at the module level | 2016-07-21 17:04:57 -04:00 |
Al | 0162194dbc | [dictionaries] Adding dictionary type enums to the generator script | 2016-07-21 17:04:57 -04:00 |
Al | 18e2c7519e | [fix] Absolute dir check in generating expansion data files | 2016-03-13 23:23:46 -04:00 |
Al | 1003832b9c | [fix] README should not be included in building address dictionaries | 2016-03-09 11:18:19 -05:00 |
Al | 52ebc9fc46 | [fix] Paths relative to the current file in address_dictionaries.py so it can be run from anywhere | 2016-02-24 13:10:44 -05:00 |
Al | b22646ee30 | [mv] Moving gazetteers into their own module | 2016-01-22 03:15:56 -05:00 |
Al | 35db855819 | [fix] canonical index in address expansion data, should be -1 for all canonical phrases | 2015-12-08 15:09:51 -05:00 |
Al | a5ce1f12dd | [fix] stdint header in address expansion rule generation script | 2015-08-08 23:28:11 -04:00 |
Al | b27af13f8a | [expansion] Adding an array of dictionaries to each (phrase, canonical) pair | 2015-07-22 20:24:14 -04:00 |
Al | 64a63fdf51 | [mv] Moving all repo data files to a resources dir, data is only for runtime files | 2015-07-21 18:11:36 -04:00 |
Al | 7f67ed7dc0 | [fix] less ambiguous variable name in the generated expansions data file | 2015-07-20 02:58:26 -04:00 |
Al | b9103a39fa | [expansion] Moving filename=>dictionary type mapping to the Python generation script and validating there | 2015-07-16 03:51:11 -04:00 |
Al | f181c04e7a | [expansion] expansion rule structs and Python script to generate rules from dictionaries tree. Note that a canonical_index of -1 indicates that a given phrase is the canonical (saves space) | 2015-07-16 02:49:53 -04:00 |