Lightweight Benchmarking

A lightweight benchmarking tool, unitBench, is provided in the codebase to benchmark some common compression use cases.

Build and Use

make unitBench
The unitBench binary expects a scenario name followed by one or more input files. Here is a sample command to benchmark Zstd compression on a few files:
./unitBench zstdDirect file1.txt file2.txt file3.txt
A full list of existing scenarios can be found with the --list option. Use -h to learn more about other options.

Creating a custom benchmark

The unitBench tool is designed to be easily extensible. To create a new benchmark, simply append a scenario to the scenario list here:

benchmark/unitBench/benchList.h
/* ==================================================
 * Table of scenarios
 * =============================================== */

#define NB_FUNCS (sizeof(scenarioList) / sizeof(scenarioList[0]))
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"

// clang-format off
Bench_Entry const scenarioList[] = {
    { "deltaDecode8", deltaDecode8_wrapper, .outSize = out_identical },
    { "deltaDecode16", deltaDecode16_wrapper, .outSize = out_identical },
    { "deltaEncode32", deltaEncode32_wrapper, .outSize = out_identical },
Each scenario is a struct containing a scenario name and some number of user-defined function pointers.

struct Bench_Entry

Each scenario is described within a single structure, defined here. Many of its fields are optional. The structure is declared in-place within the scenarioList array.

const char* name;

Required: name of the scenario.

BMK_benchFn_t func;

Required (for custom scenarios only): the function to benchmark.

ZL_GraphFn graphF;

Required (for standard scenarios only): graph creation function; see zs2_compressor.h for its signature. Setting .graphF to a non-NULL value implies .func and triggers a round-trip scenario. Either .graphF or .func must be non-NULL for the scenario to be valid.

BMK_prepFn_t prep;

Optional: modifies the input buffer before the benchmark. This is uncommon; it may be needed to massage or verify the input so that it corresponds to the scenario's expectations.

BMK_initFn_t init;

Optional: this function is run only once, at the beginning of the benchmark.

BMK_outSize_f outSize;

Optional: tells how much memory must be allocated for the output buffer of .func (its size is passed as dstCapacity). If left blank, unitBench uses ZL_compressBound() by default.

BMK_display_f display;

Optional: custom result display function.
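
To see how the fields above fit together, here is a minimal, self-contained sketch of the structure. It is assembled only from the field list and typedefs shown on this page and is not the actual definition from the codebase. The stand-in types (ZL_Compressor, ZL_GraphID, BenchPayload, BMK_runTime_t), the signature of BMK_initFn_t, and the helpers Bench_Entry_isValid and noop_wrapper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations so the sketch compiles on its own.
 * In the real codebase these come from the library and unitBench headers. */
typedef struct ZL_Compressor_s ZL_Compressor;   /* opaque stand-in */
typedef struct { unsigned gid; } ZL_GraphID;    /* placeholder layout */
typedef struct BenchPayload BenchPayload;       /* opaque stand-in */
typedef struct {
    double nanoSecPerRun;                       /* field types assumed */
    size_t sumOfReturn;
} BMK_runTime_t;

typedef ZL_GraphID (*ZL_GraphFn)(ZL_Compressor* cgraph);
typedef size_t (*BMK_benchFn_t)(const void* src, size_t srcSize, void* dst,
                                size_t dstCapacity, void* customPayload);
typedef size_t (*BMK_prepFn_t)(void* src, size_t srcSize,
                               const BenchPayload* bp);
typedef void (*BMK_initFn_t)(void* customPayload); /* signature assumed */
typedef size_t (*BMK_outSize_f)(const void* src, size_t srcSize);
typedef void (*BMK_display_f)(const char* srcname, const char* fname,
                              BMK_runTime_t rt, size_t srcSize);

/* Field order follows the list above: name first, then func, which is why
 * { "exact2", exact2_wrapper, ... } can use positional initializers. */
typedef struct {
    const char* name;      /* required */
    BMK_benchFn_t func;    /* required for custom scenarios */
    ZL_GraphFn graphF;     /* required for standard scenarios */
    BMK_prepFn_t prep;     /* optional */
    BMK_initFn_t init;     /* optional */
    BMK_outSize_f outSize; /* optional */
    BMK_display_f display; /* optional */
} Bench_Entry;

/* Hypothetical helper: encodes the rule that a scenario needs a name and
 * at least one of .func or .graphF. */
static int Bench_Entry_isValid(const Bench_Entry* e)
{
    if (e == NULL || e->name == NULL) {
        return 0;
    }
    return e->func != NULL || e->graphF != NULL;
}

/* Hypothetical do-nothing scenario function, used only to fill .func. */
static size_t noop_wrapper(const void* src, size_t srcSize, void* dst,
                           size_t dstCapacity, void* customPayload)
{
    (void)src; (void)dst; (void)dstCapacity; (void)customPayload;
    return srcSize;
}
```

Fields that are left out of an initializer are zero-initialized in C, so an entry such as { "noop", noop_wrapper } leaves every optional field NULL.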

There are two ways to declare a benchmarked function. The first one is to pass a graph function (a "standard" scenario).

benchmark/unitBench/benchList.h
static ZL_GraphID fieldLZ32Graph(ZL_Compressor* cgraph)
{
    return ZL_Compressor_registerStaticGraph_fromNode1o(
            cgraph, ZL_NODE_CONVERT_SERIAL_TO_TOKEN4, ZL_GRAPH_FIELD_LZ);
}

// scenario definition
    { "fieldLZ32", .graphF = fieldLZ32Graph },
If the scenario is not representable as a graph, it is still benchmarkable, but will require some extra work. Directly declare the function to be tested (a "custom" scenario):

typedef size_t (*BMK_benchFn_t)(const void *src, size_t srcSize, void *dst,
                                size_t dstCapacity, void *customPayload);

benchmark/unitBench/scenarios/codecs/estimate_scenario.c
size_t exact2_wrapper(
        const void* src,
        size_t srcSize,
        void* dst,
        size_t dstCapacity,
        void* customPayload)
{
    (void)customPayload;
    (void)dst;
    (void)dstCapacity;
    uint8_t present[1u << 16];
    memset(present, 0, sizeof(present));
    typedef uint16_t Elt;
    size_t const nbElts = srcSize / sizeof(Elt);
    Elt const* ptr      = (Elt const*)src;

    for (size_t i = 0; i < nbElts; ++i) {
        present[ptr[i]] = 1;
    }
    size_t cardinality = 0;
    for (size_t i = 0; i < sizeof(present); ++i) {
        cardinality += present[i];
    }

    return cardinality;
}
benchmark/unitBench/benchList.h
// scenario definition
    { "exact2", exact2_wrapper, .outSize = out_identical },
This scenario also declares the outSize function. This tells unitBench how much space to allocate for the output buffer. out_identical is a convenience function meaning "allocate a buffer with the same size as the input".

typedef size_t (*BMK_outSize_f)(const void *src, size_t srcSize);
benchmark/unitBench/benchList.h
static size_t out_identical(const void* src, size_t srcSize)
{
    (void)src;
    return srcSize;
}
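
When a scenario produces more output than input, out_identical is too small and a dedicated outSize function is needed. The sketch below pairs a hypothetical custom scenario with a matching outSize function. The names widen8to32_wrapper and out_x4 are invented for illustration and do not exist in the codebase, and returning the number of bytes written is an assumption about what the return value should represent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outSize function: each input byte becomes one 32-bit value,
 * so the output buffer must be 4x the input size. */
static size_t out_x4(const void* src, size_t srcSize)
{
    (void)src;
    return srcSize * sizeof(uint32_t);
}

/* Hypothetical custom scenario matching the BMK_benchFn_t signature:
 * widens every 8-bit input value into a 32-bit output value. */
static size_t widen8to32_wrapper(
        const void* src,
        size_t srcSize,
        void* dst,
        size_t dstCapacity,
        void* customPayload)
{
    (void)customPayload;
    size_t const dstSize = srcSize * sizeof(uint32_t);
    /* The buffer is expected to have been sized by out_x4(). */
    assert(dstCapacity >= dstSize);

    uint8_t const* const in = (uint8_t const*)src;
    uint32_t* const out     = (uint32_t*)dst; /* assumes dst is 4-byte aligned */
    for (size_t i = 0; i < srcSize; ++i) {
        out[i] = in[i];
    }
    return dstSize; /* assumption: report the number of bytes written */
}

/* The matching entry in scenarioList would then look like:
 *     { "widen8to32", widen8to32_wrapper, .outSize = out_x4 },
 */
```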

Advanced scenario configuration

Prep

typedef size_t (*BMK_prepFn_t)(void *src, size_t srcSize,
                               const BenchPayload *bp);

The prep function is an optional pre-processing function that is called on the input buffer. It can be used to massage the input if the scenario has special expectations on the input.

benchmark/unitBench/scenarios/codecs/dispatch_by_tag.c
static size_t splitBy_prepInternal(void* src, size_t srcSize, size_t eltSize)
{
    size_t const nbElts = srcSize / eltSize;
    uint8_t* const src8 = src;
    for (size_t n = 0; n < nbElts; n++) {
        src8[n] = src8[n] % SB8_NB_DST_BUFFERS;
    }
    return srcSize;
}

size_t splitBy8_preparation(void* src, size_t srcSize, const BenchPayload* bp)
{
    (void)bp;
    return splitBy_prepInternal(src, srcSize, 8);
}
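
As a further illustration of the same hook, here is a minimal sketch of a prep function for a scenario that expects sorted 32-bit integers (for example, before a delta transform). The name sortedU32_preparation is hypothetical, BenchPayload is declared as an opaque stand-in, and returning srcSize unchanged simply mirrors the example above rather than a documented contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct BenchPayload BenchPayload; /* opaque stand-in for this sketch */

/* Three-way comparison of two uint32_t values, without overflow. */
static int cmpU32(const void* a, const void* b)
{
    uint32_t const x = *(const uint32_t*)a;
    uint32_t const y = *(const uint32_t*)b;
    return (x > y) - (x < y);
}

/* Hypothetical prep function matching BMK_prepFn_t: sorts the input in place,
 * treating it as an array of 32-bit values. Trailing bytes that do not form a
 * whole element are left untouched. Assumes src is 4-byte aligned. */
static size_t sortedU32_preparation(
        void* src,
        size_t srcSize,
        const BenchPayload* bp)
{
    (void)bp;
    size_t const nbElts = srcSize / sizeof(uint32_t);
    qsort(src, nbElts, sizeof(uint32_t), cmpU32);
    return srcSize;
}
```

Because prep modifies the buffer before the benchmarked function sees it, the cost of sorting is paid during preparation and does not need to be part of the scenario function itself.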

Display

typedef void (*BMK_display_f)(const char *srcname, const char *fname,
                              BMK_runTime_t rt, size_t srcSize);

The display function is an optional function to calculate and print benchmark results in a format that differs from the standard format. Typically, this will be defined if special calculations need to be done to accurately calculate size or speed. For instance, decompression benchmarks need to use the generated size and not the source size when calculating speed.

benchmark/unitBench/benchList.h
/* display specialized for decompressors :
 * provide speed evaluation in relation to size generated
 * (instead of src, aka compressed size) */
static void decoderResult(
        const char* srcname,
        const char* fname,
        BMK_runTime_t rt,
        size_t srcSize)
{
    double const sec           = rt.nanoSecPerRun / 1000000000.;
    double const nbRunsPerSec  = 1. / sec;
    double const nbBytesPerSec = nbRunsPerSec * (double)rt.sumOfReturn;

    printf("decode %s (%llu KB) with %s into %llu KB (x%.2f) in %.2f ms  ==> %.1f MB/s",
           srcname,
           (unsigned long long)(srcSize >> 10),
           fname,
           (unsigned long long)(rt.sumOfReturn >> 10),
           (double)rt.sumOfReturn / (double)srcSize,
           sec * 1000.,
           nbBytesPerSec / (1 << 20));
}
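
The speed arithmetic used by decoderResult can be isolated into a small helper, which makes the choice of byte count explicit: pass srcSize to measure speed relative to the input, or rt.sumOfReturn to measure it relative to the generated output. The sketch below is an illustration only. The helper mbPerSec, the display function ratioResult, and the stand-in layout of BMK_runTime_t (a double and a size_t) are assumptions, not code from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the real type; only the two fields used above are modeled,
 * and their exact types are assumed. */
typedef struct {
    double nanoSecPerRun;
    size_t sumOfReturn;
} BMK_runTime_t;

/* Hypothetical helper: throughput in MB/s (1 MB = 2^20 bytes) for nbBytes
 * processed in one run lasting nanoSecPerRun nanoseconds. */
static double mbPerSec(double nanoSecPerRun, size_t nbBytes)
{
    double const sec = nanoSecPerRun / 1000000000.;
    return ((double)nbBytes / sec) / (double)(1 << 20);
}

/* Hypothetical display function matching BMK_display_f: reports speed
 * relative to the source size, plus the ratio of input to output size. */
static void ratioResult(
        const char* srcname,
        const char* fname,
        BMK_runTime_t rt,
        size_t srcSize)
{
    double const ratio = rt.sumOfReturn
            ? (double)srcSize / (double)rt.sumOfReturn
            : 0.; /* avoid dividing by zero when nothing was produced */
    printf("%s with %s: %zu -> %zu bytes (x%.2f) ==> %.1f MB/s\n",
           srcname,
           fname,
           srcSize,
           rt.sumOfReturn,
           ratio,
           mbPerSec(rt.nanoSecPerRun, srcSize));
}
```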