[parser] Add a chunked shuffle as a C function: it writes each line to one of n files chosen at random, runs shuf on each file, and concatenates the results. Also add a version that takes an explicit chunk size, and use a 2GB limit for address parser training. Allow gshuf again on Mac, since the only problem there appears to have been running out of memory when testing on a Mac laptop. The new limited-memory version should be fast enough.
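
The chunked shuffle described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names chunked_shuffle, shuffle_chunk and shuffle_lines are hypothetical, and each chunk is shuffled in memory with Fisher-Yates instead of by invoking shuf or gshuf, so the sketch has no external dependency. It assumes every line ends in a newline and is shorter than the 4096-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: Fisher-Yates shuffle of an array of line pointers.
   Stands in for running shuf/gshuf on a chunk file. rand() % i has a
   slight modulo bias, which is acceptable for a sketch. */
static void shuffle_lines(char **lines, size_t n) {
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        char *tmp = lines[i - 1];
        lines[i - 1] = lines[j];
        lines[j] = tmp;
    }
}

/* Read every line of one chunk file into memory, shuffle the lines,
   and append them to out. Returns 1 on success, 0 on allocation failure. */
static int shuffle_chunk(FILE *chunk, FILE *out) {
    char **lines = NULL;
    size_t n = 0, cap = 0;
    char buf[4096]; /* assumption: lines are shorter than this buffer */
    int ok = 1;

    rewind(chunk); /* switch the temp file from writing to reading */
    while (fgets(buf, sizeof buf, chunk)) {
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 64;
            char **grown = realloc(lines, new_cap * sizeof *grown);
            if (!grown) { ok = 0; break; }
            lines = grown;
            cap = new_cap;
        }
        size_t len = strlen(buf);
        lines[n] = malloc(len + 1);
        if (!lines[n]) { ok = 0; break; }
        memcpy(lines[n], buf, len + 1);
        n++;
    }
    if (ok) {
        shuffle_lines(lines, n);
        for (size_t i = 0; i < n; i++) fputs(lines[i], out);
    }
    for (size_t i = 0; i < n; i++) free(lines[i]);
    free(lines);
    return ok;
}

/* Scatter the lines of `in` over n_chunks temporary files at random,
   shuffle each chunk on its own, and concatenate the results into `out`.
   Only one chunk is held in memory at a time. Returns 1 on success. */
int chunked_shuffle(FILE *in, FILE *out, size_t n_chunks) {
    assert(n_chunks > 0);
    FILE **chunks = calloc(n_chunks, sizeof *chunks);
    if (!chunks) return 0;

    int ok = 1;
    for (size_t i = 0; i < n_chunks; i++) {
        chunks[i] = tmpfile();
        if (!chunks[i]) { ok = 0; break; }
    }

    char buf[4096];
    while (ok && fgets(buf, sizeof buf, in)) {
        fputs(buf, chunks[(size_t)rand() % n_chunks]);
    }
    for (size_t i = 0; ok && i < n_chunks; i++) {
        ok = shuffle_chunk(chunks[i], out);
    }
    for (size_t i = 0; i < n_chunks; i++) {
        if (chunks[i]) fclose(chunks[i]);
    }
    free(chunks);
    return ok;
}
```

A caller would seed the generator with srand() first, then pass open input and output streams. A size-limited variant could presumably derive n_chunks from the input size divided by the desired chunk size (2GB for address parser training, per the message above); how this commit computes it is not shown here.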

Al
2017-03-05 02:15:03 -05:00
parent ba4052c9ba
commit b76b7b8527
4 changed files with 106 additions and 3 deletions

@@ -57,8 +57,10 @@ AC_CONFIG_FILES([Makefile
test/Makefile])
AC_CHECK_PROG([FOUND_SHUF], [shuf], [yes])
AC_CHECK_PROG([FOUND_GSHUF], [gshuf], [yes])
AS_IF([test "x$FOUND_SHUF" = xyes], [AC_DEFINE([HAVE_SHUF], [1], [shuf available])])
AS_IF([test "x$FOUND_GSHUF" = xyes], [AC_DEFINE([HAVE_GSHUF], [1], [gshuf available])])
# ------------------------------------------------------------------
# Checks for SSE2 build