License: Apache 2.0 | C99 | SIMD: SSE 4.1 / ARM NEON | Patent-free

Compress integers at the speed of memory.

StreamVByte is a fast, patent-free integer compression technique that brings SIMD vectorization (SSE 4.1, ARM NEON) to Google's Group Varint approach — decoding billions of integers per second.

Why StreamVByte

Built for speed, designed for production

A small, focused C library that does one thing extremely well.

Vectorized

Hand-tuned SIMD kernels for x64 (SSE 4.1) and 64-bit ARM (NEON). Separates control bytes from data for shuffle-friendly decoding.

Patent-free

Released under the Apache 2.0 license with no patent encumbrance. Use it freely in open-source and commercial projects alike.

Differential coding

Built-in fast delta encoding for sorted data, plus zigzag helpers so you can compress signed integers without losing throughput.
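
To illustrate what differential coding does to sorted data, here is a minimal scalar sketch. It is not the library's vectorized implementation; the function names `delta_encode_sketch` and `delta_decode_sketch` are hypothetical, and the `prev` parameter mirrors the trailing `0` argument shown in the quickstart calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each value by its difference from the previous one.
   `prev` is the value assumed to precede in[0] (hypothetical helper). */
static void delta_encode_sketch(const uint32_t *in, size_t n,
                                uint32_t prev, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        /* Unsigned subtraction wraps, so the transform stays reversible
           even if the input is not perfectly sorted. */
        out[i] = in[i] - prev;
        prev = in[i];
    }
}

/* Undo the transform with a running (prefix) sum. */
static void delta_decode_sketch(const uint32_t *in, size_t n,
                                uint32_t prev, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = prev;
    }
}
```

On sorted input the differences are small, so most of them fit in one or two data bytes even when the original values need three or four.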

Portable C99

Builds with GCC, Clang and MSVC on Linux, macOS and Windows. CMake and Makefile builds, with a graceful scalar fallback.

Safe decoding

Call streamvbyte_validate_stream to check a stream against its expected length before decoding, so that malformed input does not lead to undefined behavior.

Compact format

1, 2, 3 or 4 bytes per integer (or 0/1/2/4 in the zero-friendly variant), with a simple, documented and stable byte layout.
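
As a rough illustration of how these byte counts add up, the sketch below computes the size of a compressed stream without producing it. The helper names are hypothetical, and it assumes the zero-friendly variant uses the same number of control bytes, (count + 3) / 4, as the default layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Data bytes needed for one value in the default 1/2/3/4 layout. */
static size_t bytes_1234(uint32_t v) {
    if (v < (1u << 8))  return 1;
    if (v < (1u << 16)) return 2;
    if (v < (1u << 24)) return 3;
    return 4;
}

/* Data bytes needed for one value in the zero-friendly 0/1/2/4 layout. */
static size_t bytes_0124(uint32_t v) {
    if (v == 0)         return 0;
    if (v < (1u << 8))  return 1;
    if (v < (1u << 16)) return 2;
    return 4;
}

/* Total size: one control byte per group of four values (assumed for
   both layouts), plus the data bytes of every value. */
static size_t compressed_size_sketch(const uint32_t *in, size_t n,
                                     int zero_friendly) {
    size_t total = (n + 3) / 4;
    for (size_t i = 0; i < n; i++)
        total += zero_friendly ? bytes_0124(in[i]) : bytes_1234(in[i]);
    return total;
}
```

For n values that all need 4 bytes, this gives 4n + n/4 bytes against 4n bytes of input, a ratio of about 1.06; values that all fit in 1 byte give (n + n/4) / 4n, about 0.31. Both agree with the ratios quoted in the Performance table.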

Quickstart

Encode and decode in a few lines

Add the header, link the library, and you are off.

example.c

#include "streamvbyte.h"

// suppose datain is an array of N uint32_t integers
size_t compsize = streamvbyte_encode(datain, N, compressedbuffer); // encode

// the result is stored in compressedbuffer using compsize bytes
streamvbyte_decode(compressedbuffer, recovdata, N);            // decode (fast)

differential coding for sorted data

// best when your integers are sorted / slowly increasing
size_t compsize = streamvbyte_delta_encode(datain, N, compressedbuffer, 0);
streamvbyte_delta_decode(compressedbuffer, recovdata, N, 0);

// validate before decoding an untrusted stream
// (the buffer above was delta-encoded, so it is decoded with the matching delta decoder)
if (streamvbyte_validate_stream(compressedbuffer, compsize, N)) {
    streamvbyte_delta_decode(compressedbuffer, recovdata, N, 0); // safe to decode
}
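
To show what such a length check involves, here is a scalar sketch for the default 1/2/3/4 layout. It is not the library's implementation: `validate_sketch` is a hypothetical name, and requiring the buffer length to match exactly (rather than merely being large enough) is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `len` bytes at `in` form a stream of exactly `count`
   integers: control bytes first, then the data bytes they announce. */
static bool validate_sketch(const uint8_t *in, size_t len, size_t count) {
    /* Number of control bytes, written to avoid overflow in count + 3. */
    size_t nctrl = count / 4 + (count % 4 != 0);
    if (len < nctrl) return false;

    size_t avail = len - nctrl;   /* bytes left for data */
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        /* 2-bit code for integer i, lowest bits of each control byte first. */
        unsigned code = (in[i / 4] >> (2 * (i % 4))) & 3u;
        size_t need = (size_t)code + 1;
        if (need > avail - used) return false;  /* data would run past the end */
        used += need;
    }
    return used == avail;
}
```

The check only reads control bytes that are known to lie inside the buffer, so it can be run on untrusted input before any data byte is touched.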

build with CMake

cmake -DCMAKE_BUILD_TYPE=Release -DSTREAMVBYTE_ENABLE_TESTS=ON -B build
cmake --build build
ctest --test-dir build

Performance

Billions of integers per second

Measured on an Apple M4 Max (ARM NEON), 500K random uint32_t per run.

~40 GB/s   peak decode throughput
10B+       integers decoded / second
0.31×      size on byte-range data

Workload                         Encode      Decode      Compressed size
Mixed 1–4 byte (log-uniform)     26.6 GB/s   39.6 GB/s   0.64×
Full-range (mostly 4-byte)       25.2 GB/s   32.0 GB/s   1.06×
Small values [0,256) (1-byte)    26.6 GB/s   36.2 GB/s   0.31×

Reproduce it yourself: cmake --build build && ./build/perf. Numbers vary by CPU and data distribution.

Flexible

Signed integers & zero-friendly mode

StreamVByte focuses on unsigned 32-bit integers, but ships everything you need to go beyond that. Use the zigzag helpers to fold signed integers into unsigned ones, or switch to the 0124 variant when your data is full of zeros.

  • zigzag_encode / zigzag_decode for signed data
  • streamvbyte_encode_0124 uses 0, 1, 2 or 4 bytes per integer
  • Ideal when many values are expected to be zero
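
For readers unfamiliar with zigzag coding, the per-value mapping can be sketched as follows. This is the standard zigzag transform and is assumed, not confirmed, to match what the library's helpers compute; `zigzag_encode_one` and `zigzag_decode_one` are hypothetical names.

```c
#include <stdint.h>

/* Map signed to unsigned so that small magnitudes stay small:
   0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
static uint32_t zigzag_encode_one(int32_t v) {
    uint32_t u = (uint32_t)v;
    /* (0u - sign bit) is all ones for negative input, zero otherwise. */
    return (u << 1) ^ (0u - (u >> 31));
}

/* Inverse mapping. The final cast relies on two's-complement
   conversion, which mainstream compilers provide. */
static int32_t zigzag_decode_one(uint32_t u) {
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}
```

Because values near zero, positive or negative, map to small unsigned numbers, they still fit in one or two data bytes after compression.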

streamvbyte_zigzag.h

#include "streamvbyte_zigzag.h"

// signed -> unsigned
zigzag_encode(mysignedints, myunsignedints, number);

// unsigned -> signed
zigzag_decode(myunsignedints, mysignedints, number);

Specification

A simple, documented format

Two streams: control bytes followed by data bytes. No magic, no surprises.

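The data starts with an array of control bytes; there are (count + 3) / 4 of them, followed by the data bytes. Each control byte holds four 2-bit words, the first of which occupies its least significant bits. Each 2-bit word gives the number of data bytes the corresponding integer uses: 00→1 byte, 01→2 bytes, 10→3 bytes, 11→4 bytes. The data bytes of each integer are stored in little-endian order.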
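
The layout described above can be read back with a few lines of scalar C. The sketch below is illustrative only, not the library's SIMD decoder; `decode_sketch` is a hypothetical name, and it assumes the first 2-bit word sits in the lowest bits of each control byte, which is the ordering implied by the 0x40 control byte in the worked example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode `count` integers from `in` into `out`.
   Returns the number of input bytes consumed.
   The caller must ensure the stream is well-formed. */
static size_t decode_sketch(const uint8_t *in, uint32_t *out, size_t count) {
    const uint8_t *ctrl = in;                     /* control bytes */
    const uint8_t *data = in + (count + 3) / 4;   /* data bytes follow */
    for (size_t i = 0; i < count; i++) {
        /* 2-bit code for integer i: 0..3 means 1..4 data bytes. */
        unsigned code = (ctrl[i / 4] >> (2 * (i % 4))) & 3u;
        uint32_t v = 0;
        for (unsigned b = 0; b <= code; b++)
            v |= (uint32_t)data[b] << (8 * b);    /* little-endian */
        data += code + 1;
        out[i] = v;
    }
    return (size_t)(data - in);
}
```

Keeping the control bytes apart from the data is what lets a vectorized decoder load one control byte and shuffle up to 16 data bytes into four integers at once; the scalar loop here does the same work one integer at a time.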

encoding (0, 100, 200, 300, 400, 500, 600, 700)

control bytes: 0x40 0x55
// 0x40 = 0b01000000; 2-bit words, lowest bits first: 00 00 00 01  -> three 1-byte values, then a 2-byte value
// 0x55 = 0b01010101; 2-bit words, lowest bits first: 01 01 01 01  -> four  2-byte values

data bytes: 0x00 0x64 0xc8   0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02
//           0    100  200   300       400       500       600       700
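
The worked example can be reproduced with a small scalar encoder. This is a sketch under stated assumptions, not the library's code: `encode_sketch` is a hypothetical name, the first 2-bit word is placed in the lowest bits of each control byte, and unused bits in a partial final control byte are left at zero (the README, not this sketch, is the authority on partial groups).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode `count` integers into `out`, which must have room for
   (count + 3) / 4 + 4 * count bytes. Returns the bytes written. */
static size_t encode_sketch(const uint32_t *in, size_t count, uint8_t *out) {
    size_t nctrl = (count + 3) / 4;
    uint8_t *data = out + nctrl;       /* data bytes follow the control bytes */
    memset(out, 0, nctrl);
    for (size_t i = 0; i < count; i++) {
        uint32_t v = in[i];
        /* code 0..3 means the value needs 1..4 bytes */
        unsigned code = (v >= (1u << 8)) + (v >= (1u << 16)) + (v >= (1u << 24));
        out[i / 4] |= (uint8_t)(code << (2 * (i % 4)));
        for (unsigned b = 0; b <= code; b++)
            *data++ = (uint8_t)(v >> (8 * b));   /* little-endian */
    }
    return (size_t)(data - out);
}
```

Called on the eight values 0, 100, ..., 700, it writes the two control bytes 0x40 0x55 followed by the 13 data bytes listed above, 15 bytes in total.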

The full specification — including how partial final groups are handled — lives in the README.

Trusted in production

Used by