[parser] Adding chunked shuffle as a C function: it writes each line to one of n files chosen at random, runs shuf on each file and concatenates the results. Also adding a version that takes a chunk size instead of a number of parts, and using a 2GB limit for address parser training. Allowing gshuf again on Mac, since the only problem there seems to have been running out of memory when testing on a Mac laptop. The new limited-memory version should be fast enough.
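The two-pass idea described above can be sketched as follows. This is a hypothetical, self-contained illustration, not the project's `shuffle_file_chunked`. The function name `chunked_shuffle_stream` is an assumption, it works on `FILE *` streams rather than a filename, and it swaps in an in-memory Fisher-Yates shuffle for the per-part call to `shuf` so it runs without external tools.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() and strdup() */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sketch: shuffle the lines of `in` into `out` via `parts`
 * temporary files, so only one part is held in memory at a time.
 * rand() is used for brevity; seed it with srand() and prefer a better
 * RNG (without modulo bias) for real use. */
bool chunked_shuffle_stream(FILE *in, FILE *out, size_t parts) {
    if (parts == 0) return false;
    FILE **tmp = calloc(parts, sizeof *tmp);
    if (tmp == NULL) return false;
    bool ok = true;
    for (size_t i = 0; i < parts && ok; i++) {
        tmp[i] = tmpfile();
        if (tmp[i] == NULL) ok = false;
    }

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;

    /* Pass 1: append each input line to a uniformly chosen part. */
    while (ok && (len = getline(&line, &cap, in)) != -1) {
        FILE *dst = tmp[(size_t)rand() % parts];
        fwrite(line, 1, (size_t)len, dst);
        if (line[len - 1] != '\n') fputc('\n', dst); /* normalize last line */
    }

    /* Pass 2: load one part at a time, Fisher-Yates shuffle it in memory
     * (standing in for running shuf on the part), and append it to `out`. */
    for (size_t i = 0; i < parts && ok; i++) {
        char **lines = NULL;
        size_t n = 0, lines_cap = 0;
        rewind(tmp[i]);
        while ((len = getline(&line, &cap, tmp[i])) != -1) {
            if (n == lines_cap) {
                lines_cap = lines_cap ? 2 * lines_cap : 64;
                char **grown = realloc(lines, lines_cap * sizeof *lines);
                if (grown == NULL) { ok = false; break; }
                lines = grown;
            }
            if ((lines[n] = strdup(line)) == NULL) { ok = false; break; }
            n++;
        }
        for (size_t j = n; j > 1; j--) {
            size_t k = (size_t)rand() % j; /* pick from the unshuffled prefix */
            char *t = lines[j - 1];
            lines[j - 1] = lines[k];
            lines[k] = t;
        }
        for (size_t j = 0; j < n; j++) {
            if (ok) fputs(lines[j], out);
            free(lines[j]);
        }
        free(lines);
    }

    free(line);
    for (size_t i = 0; i < parts; i++)
        if (tmp[i] != NULL) fclose(tmp[i]);
    free(tmp);
    return ok && !ferror(out);
}
```

Because each line's part is chosen independently and uniformly, and each part is then shuffled uniformly, the concatenated output is itself a uniformly random permutation of the input (given an ideal RNG), while peak memory is bounded by the largest part rather than the whole file.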

Al
2017-03-05 02:15:03 -05:00
parent ba4052c9ba
commit b76b7b8527
4 changed files with 106 additions and 3 deletions

@@ -5,5 +5,7 @@
#include <stdbool.h>
bool shuffle_file(char *filename);
bool shuffle_file_chunked(char *filename, size_t parts);
bool shuffle_file_chunked_size(char *filename, size_t chunk_size);
#endif
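The header declares both `shuffle_file_chunked(filename, parts)` and `shuffle_file_chunked_size(filename, chunk_size)`. One plausible way for the size-based variant to reduce to the part-based one is to turn the chunk size into a part count by ceiling division over the file size. The sketch below assumes that approach; the helpers `parts_for_chunk_size` and `file_size_bytes` and the constant `EXAMPLE_CHUNK_SIZE` are hypothetical, and reading "2GB" as 2 GiB is an assumption.

```c
#define _POSIX_C_SOURCE 200809L /* for stat() */
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed interpretation of the commit's "2GB limit": 2 GiB per part. */
#define EXAMPLE_CHUNK_SIZE ((uint64_t)2 * 1024 * 1024 * 1024)

/* Hypothetical helper: ceil(file_size / chunk_size), at least 1, so the
 * expected size of each randomly filled part is at most chunk_size.
 * Returns 0 for an invalid chunk_size of 0. */
static uint64_t parts_for_chunk_size(uint64_t file_size, uint64_t chunk_size) {
    if (chunk_size == 0) return 0;
    uint64_t parts = file_size / chunk_size + (file_size % chunk_size != 0);
    return parts > 0 ? parts : 1;
}

/* Hypothetical helper: size of `filename` in bytes via POSIX stat(),
 * or -1 if it cannot be stat'ed. */
static int64_t file_size_bytes(const char *filename) {
    struct stat st;
    if (stat(filename, &st) != 0) return -1;
    return (int64_t)st.st_size;
}
```

For a 5 GiB training file this yields 3 parts of about 1.7 GiB each on average. Because lines are assigned to parts at random, an individual part can slightly exceed the limit, so a real implementation might leave some headroom below the available memory.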