Commit Graph

30 Commits

Author SHA1 Message Date
Al dd7ef6fabf [dictionaries] Making new component for near/nearby prepositions 2016-07-21 17:04:57 -04:00
Al 9561f771ce [dictionaries] Adding new dictionary types to generator script 2016-07-21 17:04:57 -04:00
Al 4e4686fbfe [gazetteers] Street and synonym dictionary for catching other abbreviations that occur in street names 2016-07-21 17:04:57 -04:00
Al 38607b0a50 [fix] var name for error case 2016-07-21 17:04:57 -04:00
Al b50120f45c [chains] Adding chains gazetteer 2016-07-21 17:04:57 -04:00
Al 771a360a85 [phrases] Using safe_encode/safe_decode as default trie serializer/deserializer 2016-07-21 17:04:57 -04:00
Al 3a9ac9d96f [fix] six.u 2016-07-21 17:04:57 -04:00
Al 7b42e52c6a [fix] token_types.PHRASE 2016-07-21 17:04:57 -04:00
Al d5dc34ec1d [gazetteers] moving PHRASE to a token type 2016-07-21 17:04:57 -04:00
Al 62748b4644 [dictionaries] /house_number/house_numbers/ 2016-07-21 17:04:57 -04:00
Al 6d4e54cd7a [dictionaries] making entrances/postcodes plural for consistency 2016-07-21 17:04:57 -04:00
Al 410eb0006a [dictionaries] Moving intersections to cross streets 2016-07-21 17:04:57 -04:00
Al 2f9a58f37b [expansion] Add postcode dictionary to gazetteer types 2016-07-21 17:04:57 -04:00
Al e1f1e34dca [expansion] Modifying the Python gazetteers to use new dictionaries API 2016-07-21 17:04:57 -04:00
Al 80089099e9 [expansion] Adding number and intersections to dictionary types 2016-07-21 17:04:57 -04:00
Al 3d3aacae67 [addresses] Adding abbreviations as a separate module so it can be used with multiple data sets 2016-07-21 17:04:57 -04:00
Al 9dd5d5c210 [dictionaries] encapsulating reading address dictionaries so it's easy to implement sampling for the address training data 2016-07-21 17:04:57 -04:00
Al f3a9f4a257 [fix] removing init_gazetteers, doing it at the module level 2016-07-21 17:04:57 -04:00
Al 0162194dbc [dictionaries] Adding dictionary type enums to the generator script 2016-07-21 17:04:57 -04:00
Al 18e2c7519e [fix] Absolute dir check in generating expansion data files 2016-03-13 23:23:46 -04:00
Al 1003832b9c [fix] README should not be included in building address dictionaries 2016-03-09 11:18:19 -05:00
Al 52ebc9fc46 [fix] Paths relative to the current file in address_dictionaries.py so it can be run from anywhere 2016-02-24 13:10:44 -05:00
Al b22646ee30 [mv] Moving gazetteers into their own module 2016-01-22 03:15:56 -05:00
Al 35db855819 [fix] canonical index in address expansion data, should be -1 for all canonical phrases 2015-12-08 15:09:51 -05:00
Al a5ce1f12dd [fix] stdint header in address expansion rule generation script 2015-08-08 23:28:11 -04:00
Al b27af13f8a [expansion] Adding an array of dictionaries to each (phrase, canonical) pair 2015-07-22 20:24:14 -04:00
Al 64a63fdf51 [mv] Moving all repo data files to a resources dir, data is only for runtime files 2015-07-21 18:11:36 -04:00
Al 7f67ed7dc0 [fix] less ambiguous variable name in the generated expansions data file 2015-07-20 02:58:26 -04:00
Al b9103a39fa [expansion] Moving filename=>dictionary type mapping to the Python generation script and validating there 2015-07-16 03:51:11 -04:00
Al f181c04e7a [expansion] expansion rule structs and Python script to generate rules from dictionaries tree. Note that a canonical_index of -1 indicates that a given phrase is the canonical (saves space) 2015-07-16 02:49:53 -04:00