Al
|
12816d0e95
|
[api] Setting global objects to NULL on teardown
|
2015-09-28 17:27:57 -04:00 |
|
Al
|
abfa744d59
|
[build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install
|
2015-09-28 17:26:15 -04:00 |
|
Al
|
f29f2f091b
|
[fix] PEBCAK
|
2015-09-27 22:49:27 -04:00 |
|
Al
|
93b3110a49
|
[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting
|
2015-09-27 19:25:34 -04:00 |
|
Al
|
d3bfaf6b43
|
[osm/formatting] Fixing formatting tagged addresses with comma separated fields
|
2015-09-27 03:19:23 -04:00 |
|
Al
|
d512201e2c
|
[fix] removing space from tokens in address formatting
|
2015-09-27 02:18:34 -04:00 |
|
Al
|
a3214b7914
|
[readme] Readme fixes and additions
|
2015-09-26 23:32:19 -04:00 |
|
Al
|
5b829cd5a7
|
[fix] blank values containing punctuation in formatting
|
2015-09-26 21:49:28 -04:00 |
|
Al
|
dac0440be8
|
[fix] rsplit
|
2015-09-26 21:07:54 -04:00 |
|
Al
|
e255ae0e09
|
[dictionaries] Luxembourgish dictionaries
|
2015-09-26 18:31:07 -04:00 |
|
Al
|
3fe56d029d
|
[dictionaries] German Swiss dictionaries
|
2015-09-26 18:30:55 -04:00 |
|
Al
|
ae93552455
|
[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue
|
2015-09-26 03:56:52 -04:00 |
|
Al
|
0c792a2cc3
|
[osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge
|
2015-09-26 03:21:26 -04:00 |
|
Al
|
856198a352
|
[tokenization] Regenerated scanner.c
|
2015-09-26 02:27:45 -04:00 |
|
Al
|
07f1f361e2
|
[transliteration] Regenerating transliteration data with new categories
|
2015-09-26 00:07:39 -04:00 |
|
Al
|
172263af58
|
[tokenization] Adding updated token classes to scanner.re
|
2015-09-26 00:05:23 -04:00 |
|
Al
|
5417b4e602
|
[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories
|
2015-09-25 23:59:38 -04:00 |
|
Al
|
8fe791a14a
|
[fix] ensure_dir in file downloads
|
2015-09-25 17:05:22 -04:00 |
|
Al
|
646b9f7248
|
[osm/formatting] Continuing to use openvenues formatter for the India fix
|
2015-09-25 13:36:24 -04:00 |
|
Al
|
5a6b47d0fd
|
[api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h
|
2015-09-25 01:53:29 -04:00 |
|
Al
|
f5bb72c6f5
|
[readme] missed a dictionary type
|
2015-09-24 23:32:36 -04:00 |
|
Al
|
f243b9cfa6
|
[fix] phrasing
|
2015-09-24 23:30:03 -04:00 |
|
Al
|
dc31019604
|
[readme] Heading
|
2015-09-24 23:20:23 -04:00 |
|
Al
|
cfef3059bb
|
[readme] Moving paragraph
|
2015-09-24 23:19:53 -04:00 |
|
Al
|
f62cfb9551
|
[readme] README changes
|
2015-09-24 23:16:07 -04:00 |
|
Al
|
3e256404b9
|
[readme] More informative README
|
2015-09-24 23:02:09 -04:00 |
|
Al
|
9901dd2aac
|
[fix] Switching address formatter back to OpenCageData repo
|
2015-09-24 18:42:17 -04:00 |
|
Al
|
accd8a57e7
|
[expansion] Regenerating expansion data
|
2015-09-24 16:38:20 -04:00 |
|
Al
|
fa320defb7
|
[dictionaries] Afrikaans dictionaries for better disambiguation in South Africa
|
2015-09-24 16:37:16 -04:00 |
|
Al
|
050a850fb9
|
[dictionaries] Dutch directionals, separating out the west vs westen forms
|
2015-09-24 16:36:52 -04:00 |
|
Al
|
fe5d665533
|
[dictionaries] Arc in English needn't always expand to Arcade
|
2015-09-24 16:36:21 -04:00 |
|
Al
|
bcac6a41be
|
[dictionaries] Separating out Austrian toponym abbreviations
|
2015-09-24 16:35:56 -04:00 |
|
Al
|
3ce1669c30
|
[fix] import
|
2015-09-24 01:25:00 -04:00 |
|
Al
|
c85ce0b11d
|
[osm/formatting] Tagging separators as well in tagged output of the address formatter
|
2015-09-24 01:22:49 -04:00 |
|
Al
|
f6c30778bf
|
[normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling.
|
2015-09-23 19:41:01 -04:00 |
|
Al
|
a1d272077d
|
[doc] Averaged perceptron tagger
|
2015-09-23 19:37:55 -04:00 |
|
Al
|
4a0da67aa1
|
[fix] warning
|
2015-09-23 04:06:54 -04:00 |
|
Al
|
88bd0cd158
|
[unicode] better segmentation on script breaks
|
2015-09-23 04:06:34 -04:00 |
|
Al
|
377c947541
|
[transliteration] Regenerating transliteration data files
|
2015-09-23 04:04:38 -04:00 |
|
Al
|
abfb1d4a60
|
[transliteration] Wide char support in transliteration data generator
|
2015-09-23 03:56:12 -04:00 |
|
Al
|
7e057b0fb8
|
[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)
|
2015-09-23 00:42:54 -04:00 |
|
Al
|
8562c7a5cb
|
[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.
|
2015-09-23 00:37:59 -04:00 |
|
Al
|
19e5457a0f
|
[unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness
|
2015-09-23 00:36:29 -04:00 |
|
Al
|
4ad3fac627
|
[unicode] Regenerated unicode script types (ignore extraneous scripts, they're not used, just reside in the upper unicode planes)
|
2015-09-23 00:35:08 -04:00 |
|
Al
|
13bcc35523
|
[unicode] Allowing wide chars in unicode properties
|
2015-09-23 00:34:07 -04:00 |
|
Al
|
f13e9fad90
|
[tokenization] Regenerated scanner.c
|
2015-09-23 00:33:27 -04:00 |
|
Al
|
b4593b6f88
|
[unicode/tokenization] Using new character classes including wide chars in scanner
|
2015-09-23 00:33:14 -04:00 |
|
Al
|
a76831df7a
|
[unicode] Wide version of word breaks
|
2015-09-22 18:55:33 -04:00 |
|
Al
|
25917cfb17
|
[fix] scripts
|
2015-09-22 15:15:30 -04:00 |
|
Al
|
b405a53fe1
|
[fix] chars out of range in get_string_script Python version
|
2015-09-22 08:14:27 -04:00 |
|