Lightweight Benchmarking
A lightweight benchmarking tool, unitBench, is provided in the codebase as a helper to benchmark some common compression use cases.
Build and Use
The unitBench binary expects a scenario and some number of input files. Here is a sample command to benchmark Zstd compression on a few files:
A full list of existing scenarios can be found with the --list option. Use -h to learn more about other options.
Creating a custom benchmark
The unitBench tool is designed to be easily extensible. To create a new benchmark, append a scenario to the scenario list here:
/* ==================================================
 * Table of scenarios
 * =============================================== */
#define NB_FUNCS (sizeof(scenarioList) / sizeof(scenarioList[0]))
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
// clang-format off
Bench_Entry const scenarioList[] = {
    { "deltaDecode8", deltaDecode8_wrapper, .outSize = out_identical },
    { "deltaDecode16", deltaDecode16_wrapper, .outSize = out_identical },
    { "deltaEncode32", deltaEncode32_wrapper, .outSize = out_identical },
Each scenario is described within a single structure, defined here. Many of its fields are optional. The structure is declared in-place within the scenarioList array.
Required (for standard scenarios only): the graph creation function; see zs2_compressor.h for its signature. Setting .graphF to a non-NULL value implies .func and triggers a round-trip scenario. Either .graphF or .func must be non-NULL for the scenario to be valid.
Optional: a function that modifies the input buffer before the benchmark. This is uncommon; it may be needed to massage or verify the input so that it matches the scenario's expectations.
Optional: tells how much memory must be allocated for dstCapacity (the output buffer of .func). If left blank, unitBench uses ZL_compressBound() by default.
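The table-driven pattern described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual Bench_Entry definition: Demo_Entry, demoList, demo_find, demo_dstCapacity and the fallback of srcSize + 64 are all hypothetical stand-ins (the real tool falls back to ZL_compressBound(), per the text). It shows how an optional field left out of an initializer ends up NULL and can then be replaced by a default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real types; names are illustrative only. */
typedef size_t (*Demo_Func)(const void* src, size_t srcSize,
                            void* dst, size_t dstCapacity,
                            void* customPayload);
typedef size_t (*Demo_OutSize)(const void* src, size_t srcSize);

typedef struct {
    const char*  name;
    Demo_Func    func;
    Demo_OutSize outSize; /* optional: NULL means "use the default" */
} Demo_Entry;

/* Trivial benchmarked function: copies as many bytes as fit. */
static size_t demo_copy(const void* src, size_t srcSize,
                        void* dst, size_t dstCapacity,
                        void* customPayload)
{
    (void)customPayload;
    size_t const n = srcSize < dstCapacity ? srcSize : dstCapacity;
    memcpy(dst, src, n);
    return n;
}

static size_t demo_identical(const void* src, size_t srcSize)
{
    (void)src;
    return srcSize;
}

/* Fields that are not named are zero-initialized, so outSize is NULL
 * in the second entry. */
static const Demo_Entry demoList[] = {
    { .name = "copy",        .func = demo_copy, .outSize = demo_identical },
    { .name = "copyDefault", .func = demo_copy },
};
#define DEMO_NB (sizeof(demoList) / sizeof(demoList[0]))

/* Linear lookup of a scenario by name; returns NULL if not found. */
static const Demo_Entry* demo_find(const char* name)
{
    for (size_t i = 0; i < DEMO_NB; ++i) {
        if (strcmp(demoList[i].name, name) == 0) return &demoList[i];
    }
    return NULL;
}

/* Picks the output capacity: the scenario's own rule if present,
 * otherwise an arbitrary placeholder default (assumption of this sketch). */
static size_t demo_dstCapacity(const Demo_Entry* e,
                               const void* src, size_t srcSize)
{
    return e->outSize ? e->outSize(src, srcSize) : srcSize + 64;
}
```

With these definitions, looking up "copy" yields a capacity equal to the input size, while "copyDefault" falls back to the placeholder default.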
There are two ways to declare a benchmarked function. The first is to pass a graph function (a "standard" scenario):
static ZL_GraphID fieldLZ32Graph(ZL_Compressor* cgraph)
{
    return ZL_Compressor_registerStaticGraph_fromNode1o(
            cgraph, ZL_NODE_CONVERT_SERIAL_TO_TOKEN4, ZL_GRAPH_FIELD_LZ);
}

// scenario definition
{ "fieldLZ32", .graphF = fieldLZ32Graph },
The second is to pass a custom function directly, as the .func field:
size_t exact2_wrapper(
        const void* src,
        size_t srcSize,
        void* dst,
        size_t dstCapacity,
        void* customPayload)
{
    (void)customPayload;
    (void)dst;
    (void)dstCapacity;
    uint8_t present[1u << 16];
    memset(present, 0, sizeof(present));
    typedef uint16_t Elt;
    size_t const nbElts = srcSize / sizeof(Elt);
    Elt const* ptr = (Elt const*)src;
    for (size_t i = 0; i < nbElts; ++i) {
        present[ptr[i]] = 1;
    }
    size_t cardinality = 0;
    for (size_t i = 0; i < sizeof(present); ++i) {
        cardinality += present[i];
    }
    return cardinality;
}

// scenario definition
{ "exact2", exact2_wrapper, .outSize = out_identical },
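The wrapper above only reads its input and ignores dst. For comparison, here is a minimal sketch of a wrapper with the same signature that writes its result into dst. The name demoDelta8_wrapper is hypothetical and this is not the codebase's delta implementation; returning 0 when the output buffer is too small is also an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical custom scenario: byte-wise delta encoding.
 * Each output byte is the difference between the current input byte
 * and the previous one (the first byte is compared against 0). */
size_t demoDelta8_wrapper(
        const void* src,
        size_t srcSize,
        void* dst,
        size_t dstCapacity,
        void* customPayload)
{
    (void)customPayload;
    const uint8_t* const in = (const uint8_t*)src;
    uint8_t* const out      = (uint8_t*)dst;
    if (dstCapacity < srcSize) {
        return 0; /* assumption: report that nothing was produced */
    }
    uint8_t prev = 0;
    for (size_t i = 0; i < srcSize; ++i) {
        out[i] = (uint8_t)(in[i] - prev); /* wraps modulo 256 */
        prev   = in[i];
    }
    return srcSize; /* number of bytes written to dst */
}
```

Following the pattern of the existing entries, such a function would presumably be registered with a line like { "demoDelta8", demoDelta8_wrapper, .outSize = out_identical }, since its output has the same size as its input.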
Note the outSize function in the scenario definition above. It tells unitBench how much space to allocate for the output. out_identical is a convenience function meaning "allocate a buffer with the same size as the input".
static size_t out_identical(const void* src, size_t srcSize)
{
    (void)src;
    return srcSize;
}
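A scenario whose output is larger than its input needs a different sizing rule. As a minimal sketch using the same signature as out_identical, the hypothetical helper below (out_quadruple is not part of the codebase) could serve a function that widens every 8-bit input value into a 32-bit output value.

```c
#include <stddef.h>

/* Hypothetical sizing rule: the output needs four bytes per input byte,
 * e.g. when each 8-bit value is expanded to a 32-bit value. */
static size_t out_quadruple(const void* src, size_t srcSize)
{
    (void)src; /* the size depends only on srcSize, not on the content */
    return srcSize * 4;
}
```

It would be attached to a scenario the same way as out_identical, through the .outSize field.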
Advanced scenario configuration
Prep
The prep function is an optional pre-processing function that is called on the input buffer. It can be used to massage the input if the scenario has special expectations about it.
static size_t splitBy_prepInternal(void* src, size_t srcSize, size_t eltSize)
{
    size_t const nbElts = srcSize / eltSize;
    uint8_t* const src8 = src;
    for (size_t n = 0; n < nbElts; n++) {
        src8[n] = src8[n] % SB8_NB_DST_BUFFERS;
    }
    return srcSize;
}

size_t splitBy8_preparation(void* src, size_t srcSize, const BenchPayload* bp)
{
    (void)bp;
    return splitBy_prepInternal(src, srcSize, 8);
}
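As a further illustration of the same idea, here is a minimal sketch of a preparation function for a hypothetical scenario that expects 7-bit ASCII input. The name demoAscii_preparation is invented for this example, and BenchPayload is forward-declared here only as an opaque stand-in so that the snippet compiles on its own. It assumes, as in the example above, that the function returns the size of the prepared input.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the real BenchPayload type (assumption). */
typedef struct BenchPayload BenchPayload;

/* Hypothetical prep function: clears the top bit of every byte in place,
 * so that the benchmarked function only ever sees values below 128. */
size_t demoAscii_preparation(void* src, size_t srcSize, const BenchPayload* bp)
{
    (void)bp;
    uint8_t* const src8 = (uint8_t*)src;
    for (size_t n = 0; n < srcSize; n++) {
        src8[n] &= 0x7F;
    }
    return srcSize; /* the input keeps its original size */
}
```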
Display
The display function is an optional function that calculates and prints benchmark results in a format that differs from the standard one. Typically, it is defined when special calculations are needed to report size or speed accurately. For instance, decompression benchmarks need to use the generated size, not the source size, when calculating speed.
/* display specialized for decompressors :
 * provide speed evaluation in relation to size generated
 * (instead of src, aka compressed size) */
static void decoderResult(
        const char* srcname,
        const char* fname,
        BMK_runTime_t rt,
        size_t srcSize)
{
    double const sec           = rt.nanoSecPerRun / 1000000000.;
    double const nbRunsPerSec  = 1. / sec;
    double const nbBytesPerSec = nbRunsPerSec * (double)rt.sumOfReturn;
    printf("decode %s (%llu KB) with %s into %llu KB (x%.2f) in %.2f ms ==> %.1f MB/s",
           srcname,
           (unsigned long long)(srcSize >> 10),
           fname,
           (unsigned long long)(rt.sumOfReturn >> 10),
           (double)rt.sumOfReturn / (double)srcSize,
           sec * 1000.,
           nbBytesPerSec / (1 << 20));
}
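The arithmetic inside decoderResult can be isolated into small pure functions, which makes it easier to check by hand. The sketch below is illustrative only: Demo_RunTime is a hypothetical mirror of the two BMK_runTime_t fields used above, and the helper names are invented. As in the function above, one MB is taken to be 1 << 20 bytes.

```c
#include <stddef.h>

/* Hypothetical mirror of the two timing fields used by decoderResult. */
typedef struct {
    double nanoSecPerRun; /* average duration of one run, in nanoseconds */
    size_t sumOfReturn;   /* bytes generated by one run */
} Demo_RunTime;

/* Decoding speed in MB/s, measured against the generated size
 * rather than the (compressed) source size. */
static double demo_decodeSpeedMBps(Demo_RunTime rt)
{
    double const sec           = rt.nanoSecPerRun / 1000000000.;
    double const nbBytesPerSec = (double)rt.sumOfReturn / sec;
    return nbBytesPerSec / (double)(1 << 20);
}

/* Expansion factor: generated size divided by source size. */
static double demo_expansionRatio(Demo_RunTime rt, size_t srcSize)
{
    return (double)rt.sumOfReturn / (double)srcSize;
}
```

For example, a run that takes 1 ms and generates 4 MB from a 1 MB source works out to roughly 4000 MB/s with an expansion factor of 4.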