Al
|
8562c7a5cb
|
[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.
|
2015-09-23 00:37:59 -04:00 |
|
Al
|
19e5457a0f
|
[unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness
|
2015-09-23 00:36:29 -04:00 |
|
Al
|
4ad3fac627
|
[unicode] Regenerated unicode script types (ignore extraneous scripts, they're not used, just reside in the upper unicode planes)
|
2015-09-23 00:35:08 -04:00 |
|
Al
|
13bcc35523
|
[unicode] Allowing wide chars in unicode properties
|
2015-09-23 00:34:07 -04:00 |
|
Al
|
f13e9fad90
|
[tokenization] Regenerated scanner.c
|
2015-09-23 00:33:27 -04:00 |
|
Al
|
b4593b6f88
|
[unicode/tokenization] Using new character classes including wide chars in scanner
|
2015-09-23 00:33:14 -04:00 |
|
Al
|
a76831df7a
|
[unicode] Wide version of word breaks
|
2015-09-22 18:55:33 -04:00 |
|
Al
|
25917cfb17
|
[fix] scripts
|
2015-09-22 15:15:30 -04:00 |
|
Al
|
b405a53fe1
|
[fix] chars out of range in get_string_script Python version
|
2015-09-22 08:14:27 -04:00 |
|
Al
|
ca25b48687
|
[fix] Not writing empty fields in formatted addresses
|
2015-09-22 08:13:55 -04:00 |
|
Al
|
747de1944b
|
[fix] Accounting for unknown scripts in disambiguation
|
2015-09-21 18:05:28 -04:00 |
|
Al
|
3ac89d7ed9
|
[setup] fixing packaging
|
2015-09-21 17:31:15 -04:00 |
|
Al
|
236737eab3
|
[tokenization/osm] Using utf8 encoded version of string for tokens in python tokenizer
|
2015-09-21 17:27:43 -04:00 |
|
Al
|
134cf616d6
|
[osm] Using street for language disambiguation in training data
|
2015-09-21 04:09:15 -04:00 |
|
Al
|
ccac4a5a90
|
[fix] package directory
|
2015-09-21 04:01:36 -04:00 |
|
Al
|
5f912ddcd3
|
[fix] std=c99
|
2015-09-21 03:25:32 -04:00 |
|
Al
|
5b2fd0be50
|
[fix] pytokenize compilation on Ubuntu/gcc
|
2015-09-21 03:24:14 -04:00 |
|
Al
|
cffa5a4a20
|
[fix] stdint include
|
2015-09-20 20:10:47 -04:00 |
|
Al
|
25b3338600
|
[setup] setup.py for pypostal so it can be installed from the GitHub URL
|
2015-09-20 20:07:59 -04:00 |
|
Al
|
84cf21df88
|
[osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples
|
2015-09-20 20:05:46 -04:00 |
|
Al
|
5485ea2197
|
[python] Adding initial pypostal bindings for tokenize so we can remove address_normalizer dependency. Not tested on Python 3.
|
2015-09-20 14:59:39 -04:00 |
|
Al
|
3fab0f984f
|
[fix] fixing some compiler warnings, using type-specific abs functions for vector_math
|
2015-09-19 16:11:09 -04:00 |
|
Al
|
6731395ca0
|
[osm] Separating tagged from untagged output
|
2015-09-19 14:11:47 -04:00 |
|
Al
|
2940cc15b8
|
[fix] tokenized string destroy frees original string
|
2015-09-19 01:40:41 -04:00 |
|
Al
|
2b13871341
|
[constants] max country code length
|
2015-09-19 01:39:58 -04:00 |
|
Al
|
0396823772
|
[fix] geodb path separator
|
2015-09-19 01:39:31 -04:00 |
|
Al
|
17cfdb0625
|
[fix] adding char_array_append_* methods to header
|
2015-09-18 13:19:42 -04:00 |
|
Al
|
f2f7db92ff
|
[fix] phrases
|
2015-09-18 13:19:18 -04:00 |
|
Al
|
b74e92adad
|
[fix] include
|
2015-09-18 13:18:49 -04:00 |
|
Al
|
2a869894d9
|
[fix] geodb
|
2015-09-18 13:18:26 -04:00 |
|
Al
|
9e9131bda0
|
[parser] Averaged perceptron tagger
|
2015-09-17 05:51:24 -04:00 |
|
Al
|
8a86f7ec64
|
[parser] Adding context struct to feature function
|
2015-09-17 05:48:00 -04:00 |
|
Al
|
87ed7d9a0f
|
[geodb] Adding trie search methods for finding geodb phrases
|
2015-09-16 22:11:10 -04:00 |
|
Al
|
e62c75b9c6
|
[phrases] Adding _with_phrases versions of address dictionary methods for pre-allocated phrases
|
2015-09-16 21:24:28 -04:00 |
|
Al
|
23103a21d4
|
[phrases] Adding with_phrases versions of trie search methods for pre-allocated phrases
|
2015-09-16 21:23:34 -04:00 |
|
Al
|
d5ec005787
|
[transliteration] Similar init method for transliteration
|
2015-09-16 21:14:02 -04:00 |
|
Al
|
b11362ab98
|
[numex] using module init method for building, otherwise passing NULL path uses the default path
|
2015-09-16 21:13:05 -04:00 |
|
Al
|
3cba2e8df3
|
[api] Using default setup methods for submodules in libpostal setup
|
2015-09-15 14:01:33 -04:00 |
|
Al
|
e122824448
|
[expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language
|
2015-09-15 14:00:26 -04:00 |
|
Al
|
c47ff1b113
|
[utils] Adding source string to tokenized_string struct
|
2015-09-15 13:21:51 -04:00 |
|
Al
|
b2f690b6f6
|
[api] Error logging if modules can't be found
|
2015-09-15 13:21:15 -04:00 |
|
Al
|
9de3029dd3
|
[parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize
|
2015-09-14 17:38:45 -04:00 |
|
Al
|
a5b5f80b04
|
[fix] new_copy
|
2015-09-14 16:50:23 -04:00 |
|
Al
|
3ea6358f77
|
[fix] vector zeros allocation
|
2015-09-14 16:50:08 -04:00 |
|
Al
|
c21f61b9b4
|
[parser] Default address parser path
|
2015-09-11 15:05:38 -07:00 |
|
Al
|
32c180528f
|
[tokens] Adding a copy_tokens option for tokenized_string
|
2015-09-11 15:04:29 -07:00 |
|
Al
|
9ce658b7a3
|
[collections] Adding string_array for an array of char pointers
|
2015-09-10 16:34:16 -07:00 |
|
Al
|
35b9122a1a
|
[utils] inlining a few functions
|
2015-09-10 16:33:54 -07:00 |
|
Al
|
35f1c02caf
|
[polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately
|
2015-09-10 12:44:13 -07:00 |
|
Al
|
440a8158b6
|
[polygons] Adding in country languages for regional polygons without a default language
|
2015-09-10 12:34:26 -07:00 |
|