StreamVByte is a fast, patent-free integer compression technique that brings SIMD vectorization (SSE 4.1, ARM NEON) to Google's Group Varint approach — decoding billions of integers per second.
A small, focused C library that does one thing extremely well.
Hand-tuned SIMD kernels for x64 (SSE 4.1) and 64-bit ARM (NEON). Separates control bytes from data for shuffle-friendly decoding.
Released under the Apache 2.0 license with no patent encumbrance. Use it freely in open-source and commercial projects alike.
Built-in fast delta encoding for sorted data, plus zigzag helpers so you can compress signed integers without losing throughput.
Builds with GCC, Clang and MSVC on Linux, macOS and Windows. CMake and Makefile builds, with a graceful scalar fallback.
Validate a stream against its expected length before decoding with streamvbyte_validate_stream to avoid undefined behavior.
1, 2, 3 or 4 bytes per integer (or 0/1/2/4 in the zero-friendly variant), with a simple, documented and stable byte layout.
Add the header, link the library, and you are off.
example.c
    #include "streamvbyte.h"

    // suppose datain is an array of N uint32_t integers
    size_t compsize = streamvbyte_encode(datain, N, compressedbuffer); // encode
    // the result is stored in compressedbuffer using compsize bytes
    streamvbyte_decode(compressedbuffer, recovdata, N); // decode (fast)
differential coding for sorted data
    // best when your integers are sorted / slowly increasing
    size_t compsize = streamvbyte_delta_encode(datain, N, compressedbuffer, 0);
    streamvbyte_delta_decode(compressedbuffer, recovdata, N, 0);

    // validate before decoding an untrusted stream
    // (here compressedbuffer is assumed to come from streamvbyte_encode)
    if (streamvbyte_validate_stream(compressedbuffer, compsize, N)) {
        streamvbyte_decode(compressedbuffer, recovdata, N); // safe to decode
    }
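The idea behind differential coding can be sketched without the library: each value is replaced by its difference from the previous one, starting from an initial value (the trailing `0` argument in the calls above). The helper names below are hypothetical and scalar; they only illustrate the transform and are not the library's SIMD implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar helper, not part of the StreamVByte API:
   replace each value with its difference from the previous one. */
static void delta_encode_sketch(const uint32_t *in, size_t n,
                                uint32_t prev, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev; /* wraps modulo 2^32 if input is unsorted */
        prev = in[i];
    }
}

/* Inverse transform: a running sum rebuilds the original values. */
static void delta_decode_sketch(const uint32_t *in, size_t n,
                                uint32_t prev, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = prev;
    }
}
```

For the sorted input 1000, 1003, 1010, 1011, 1500 the deltas are 1000, 3, 7, 1, 489, so three of the five values shrink from two bytes to one.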
build with CMake
    cmake -DCMAKE_BUILD_TYPE=Release -DSTREAMVBYTE_ENABLE_TESTS=ON -B build
    cmake --build build
    ctest --test-dir build
Measured on an Apple M4 Max (ARM NEON), 500K random uint32_t per run.
| Workload | Encode | Decode | Compressed size |
|---|---|---|---|
| Mixed 1–4 byte (log-uniform) | 26.6 GB/s | 39.6 GB/s | 0.64× |
| Full-range (mostly 4-byte) | 25.2 GB/s | 32.0 GB/s | 1.06× |
| Small values [0,256) (1-byte) | 26.6 GB/s | 36.2 GB/s | 0.31× |
Reproduce it yourself: cmake --build build && ./build/perf. Numbers vary by CPU and data distribution.
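The compressed-size column can be sanity-checked from the format alone: one control byte per four integers, plus the data bytes. The sketch below assumes every integer uses the same number of data bytes, which only approximates the workloads in the table; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bytes for `n` integers that each take
   `bytes_per_value` data bytes, control bytes included. */
static size_t compressed_bytes_uniform(size_t n, size_t bytes_per_value) {
    size_t control = (n + 3) / 4; /* one control byte per 4 integers */
    return control + n * bytes_per_value;
}

/* Compressed size relative to the raw 4-bytes-per-integer array. */
static double size_ratio_uniform(size_t n, size_t bytes_per_value) {
    return (double)compressed_bytes_uniform(n, bytes_per_value) /
           (double)(n * sizeof(uint32_t));
}
```

With n = 500000, one byte per value gives 0.3125 and four bytes per value gives 1.0625, in line with the 0.31× and 1.06× rows. The four-byte case is also the worst case, so it bounds the output buffer an encoder needs.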
StreamVByte focuses on unsigned 32-bit integers, but ships everything you need to go
beyond that. Use the zigzag helpers to fold signed integers into unsigned ones, or switch
to the 0124 variant when your data is full of zeros.
- zigzag_encode / zigzag_decode for signed data
- streamvbyte_encode_0124 uses 0, 1, 2 or 4 bytes per integer

streamvbyte_zigzag.h
    #include "streamvbyte_zigzag.h"

    // signed -> unsigned
    zigzag_encode(mysignedints, myunsignedints, number);
    // unsigned -> signed
    zigzag_decode(myunsignedints, mysignedints, number);
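To see why zigzag keeps small negative numbers cheap, here is a minimal single-value sketch of the usual zigzag mapping (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4). The function names are hypothetical, and it is an assumption that the library's array helpers use exactly this mapping.

```c
#include <stdint.h>

/* Hypothetical single-value helpers illustrating the usual zigzag
   mapping; the library's zigzag_encode/zigzag_decode work on arrays. */
static uint32_t zigzag_encode_one(int32_t v) {
    /* Shift the magnitude up one bit and flip all bits for negatives. */
    return ((uint32_t)v << 1) ^ (v < 0 ? UINT32_MAX : 0u);
}

static int32_t zigzag_decode_one(uint32_t u) {
    /* The low bit carries the sign, the remaining bits the magnitude. */
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}
```

Without this step, -1 stored as an unsigned 32-bit value is 0xFFFFFFFF and costs four data bytes; after the mapping it becomes 1 and costs one.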
Two streams: control bytes followed by data bytes. No magic, no surprises.
The data starts with an array of control bytes — there are (count + 3) / 4 of them.
Each byte holds four 2-bit words, filled from the least significant bits upward. Each 2-bit word says how many data bytes the corresponding integer uses:
00→1 byte, 01→2 bytes, 10→3 bytes, 11→4 bytes. Data bytes are little-endian.
encoding (0, 100, 200, 300, 400, 500, 600, 700)
    control bytes: 0x40 0x55
    // 0x40 = 00 00 00 01 -> three 1-byte values, then a 2-byte value
    // 0x55 = 01 01 01 01 -> four 2-byte values
    // (2-bit words listed in integer order, lowest bits first)

    data bytes: 0x00 0x64 0xc8 0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02
    //          0    100  200  300       400       500       600       700
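The layout above is simple enough to reproduce with a scalar reference. The sketch below is a hypothetical illustration, not the library's SIMD kernels: all function names are made up, and leaving the unused 2-bit words of a partial final group at zero is an assumption that the README may specify differently.

```c
#include <stddef.h>
#include <stdint.h>

/* 2-bit length code: 0 -> 1 byte, 1 -> 2 bytes, 2 -> 3 bytes, 3 -> 4 bytes. */
static unsigned length_code(uint32_t v) {
    if (v < (1u << 8)) return 0;
    if (v < (1u << 16)) return 1;
    if (v < (1u << 24)) return 2;
    return 3;
}

/* Writes control bytes, then data bytes, into `out`; returns total bytes.
   `out` must hold at least (n + 3) / 4 + 4 * n bytes. */
static size_t svb_encode_sketch(const uint32_t *in, size_t n, uint8_t *out) {
    size_t control_len = (n + 3) / 4;
    uint8_t *data = out + control_len;
    for (size_t i = 0; i < control_len; i++) out[i] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned code = length_code(in[i]);
        /* Integer i owns bits 2*(i%4) and 2*(i%4)+1 of control byte i/4. */
        out[i / 4] |= (uint8_t)(code << (2 * (i % 4)));
        for (unsigned b = 0; b <= code; b++) {
            *data++ = (uint8_t)(in[i] >> (8 * b)); /* little-endian */
        }
    }
    return (size_t)(data - out);
}

/* Reads `n` integers back; returns the number of bytes consumed. */
static size_t svb_decode_sketch(const uint8_t *in, size_t n, uint32_t *out) {
    const uint8_t *data = in + (n + 3) / 4;
    for (size_t i = 0; i < n; i++) {
        unsigned code = (in[i / 4] >> (2 * (i % 4))) & 3u;
        uint32_t v = 0;
        for (unsigned b = 0; b <= code; b++) {
            v |= (uint32_t)(*data++) << (8 * b);
        }
        out[i] = v;
    }
    return (size_t)(data - in);
}
```

Encoding (0, 100, 200, 300, 400, 500, 600, 700) with this sketch yields the 2 control bytes and 13 data bytes listed above, 15 bytes in total instead of 32.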
The full specification — including how partial final groups are handled — lives in the README.
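Because the control bytes fully determine how many data bytes follow, a length check of the kind streamvbyte_validate_stream performs can be sketched from the format alone. The function below is a hypothetical stand-in, not the library's implementation: it only tests that a stream of `count` integers does not need more than `len` bytes, and the real function may apply stricter rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check, not the library's streamvbyte_validate_stream:
   returns true if a stream holding `count` integers fits in `len` bytes. */
static bool stream_fits_sketch(const uint8_t *in, size_t len, size_t count) {
    if (count > SIZE_MAX - 3) return false; /* avoid overflow below */
    size_t control_len = (count + 3) / 4;
    if (len < control_len) return false; /* control bytes are truncated */
    size_t needed = control_len;
    for (size_t i = 0; i < count; i++) {
        unsigned code = (in[i / 4] >> (2 * (i % 4))) & 3u;
        needed += code + 1; /* each integer uses 1 to 4 data bytes */
        if (needed > len) return false; /* data bytes would overrun */
    }
    return true;
}
```

Running a check like this first means a truncated or corrupted buffer is rejected before any decoder reads past its end.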