Training Orchestration
The training architecture behind OpenZL makes certain assumptions about the data. The data must parse correctly according to the parser's specification, and similar categories of data should be tagged in the same way. Ideally, the samples of data will be sourced from the same place.
Data Setup
In this example we will extend the parsing example and show how to use the serialized compressor to compress and benchmark test data. To start, running training_setup.py provides multiple samples in our custom data format. We will use /tmp/openzl as the working directory for this exercise. The script places the training samples in the train sub-directory and 95 samples in the test sub-directory.
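As a rough sketch of the layout this produces (the file names and the train-sample count below are hypothetical; only the /tmp/openzl root and the 95 test samples come from the text above):

```python
import os
import tempfile

def make_sample_dirs(root: str, n_train: int, n_test: int) -> None:
    """Create a train/test sample layout like the one training_setup.py produces."""
    for split, count in (("train", n_train), ("test", n_test)):
        split_dir = os.path.join(root, split)
        os.makedirs(split_dir, exist_ok=True)
        for i in range(count):
            # Each sample would hold data in the custom format; placeholder
            # bytes are written here since the real format is out of scope.
            with open(os.path.join(split_dir, f"sample_{i:04d}.bin"), "wb") as f:
                f.write(b"\x00")

# Using a temporary root here instead of /tmp/openzl to keep the sketch side-effect free.
root = os.path.join(tempfile.mkdtemp(), "openzl")
make_sample_dirs(root, n_train=10, n_test=95)  # 10 train samples is illustrative
print(len(os.listdir(os.path.join(root, "test"))))  # 95
```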
Running training
We can re-run the training from the previous exercise. This will save the compressor to /tmp/openzl/train/compressor.zlc. Navigate to the examples directory inside the CMake build directory. We can now run the training binary with the following command:
```cpp
openzl::Compressor compressor;
// Register dependencies
registerGraph_ParsingCompressor(compressor);
// Deserialize the compressor
compressor.deserialize(compressorFile.contents());
```
Note that the compressor's dependencies must be registered first by calling registerGraph_ParsingCompressor. This enables the compressor to be deserialized by calling compressor.deserialize and passing in the serialized compressor.
Decompression only requires that the compressor's configured format version be less than or equal to the decompressor's format version. The following two lines of code are all that is necessary to decompress.
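The version-compatibility rule above can be expressed as a simple predicate. This is an illustration of the rule only, not OpenZL API:

```python
def can_decompress(compressor_format_version: int,
                   decompressor_format_version: int) -> bool:
    # Data written at format version V is readable by any decompressor
    # that understands version V or newer.
    return compressor_format_version <= decompressor_format_version

print(can_decompress(15, 16))  # True: a newer decompressor reads older frames
print(can_decompress(16, 15))  # False: an older decompressor cannot read newer frames
```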
Alternatively, the CLI can be used for decompression.
Benchmarking
We can run the binary to benchmark the performance on each file.
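A minimal per-file benchmarking harness could look like the following sketch. Here zlib is used purely as a stand-in codec, and the directory path and metric names are assumptions, not what the benchmarking binary actually reports:

```python
import os
import time
import zlib

def benchmark_file(path: str) -> dict:
    """Compress one file with a stand-in codec and report size and speed."""
    with open(path, "rb") as f:
        data = f.read()
    start = time.perf_counter()
    compressed = zlib.compress(data)  # stand-in for the trained compressor
    elapsed = time.perf_counter() - start
    return {
        "file": os.path.basename(path),
        "ratio": len(data) / max(len(compressed), 1),
        "mb_per_s": (len(data) / 1e6) / max(elapsed, 1e-9),
    }

# Example: benchmark every sample in the test directory, if present.
test_dir = "/tmp/openzl/test"
if os.path.isdir(test_dir):
    for name in sorted(os.listdir(test_dir)):
        print(benchmark_file(os.path.join(test_dir, name)))
```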
We get the following results on the test set: