Decoding billions of integers per second
simdcomp is a simple C library that compresses lists of 32-bit integers with SIMD binary packing. When your numbers are small — or the gaps between sorted values are — it packs them tight and unpacks them at 15 GB/s.
#include "simdcomp.h"
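/* Pack 128 small integers, then unpack them. */
uint32_t datain[128], recovered[128]; // one block; fill datain first
__m128i buffer[32];                   // room for any b up to 32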
uint32_t b = maxbits(datain); // bit width
simdpackwithoutmask(datain, buffer, b); // 128 ints -> b*128 bits
simdunpack(buffer, recovered, b); // and back again
/* Sorted? Store deltas with differential coding. */
uint32_t b1 = simdmaxbitsd1(0, datain);
simdpackwithoutmaskd1(0, datain, buffer, b1);
simdunpackd1(0, buffer, recovered, b1);
Highlights
Blazing fast
Decode at least 4 billion compressed integers per second on most desktop and laptop processors; on a Skylake core, decoding can run at roughly 0.3 cycles per integer. Far faster than general-purpose codecs such as gzip, LZO, Snappy or LZ4.
Simple C API
A handful of clear functions over blocks of 128 integers. C99 or better, CMake-friendly, no heavy dependencies. Drop it in and pack.
x86 and ARM
SSE4.1 with optional AVX2 / AVX-512 paths on Intel and AMD, plus 64-bit ARM NEON (Apple Silicon) through a self-contained shim. Same API everywhere.
Delta coding
For sorted lists, store differences between successive integers. Tiny gaps compress to a few bits each — ideal for inverted indexes and posting lists.
Search in place
Frame-of-reference (FOR) packing lets you search and select directly over compressed data, without fully decoding it first.
Peer reviewed
Built on published research in Software: Practice & Experience and used in production databases and search engines.
Build it in two commands
simdcomp builds with CMake and selects the right SIMD backend automatically
at compile time — SSE/AVX on x86, NEON on ARM. Pull it in with
find_package after installing, or vendor it straight from source
with FetchContent.
No package manager needed — the library is a small set of C files under
src/ and include/. You can also grab it as the npm
package simdcomp, or use the demo in the go/ folder.
# Clone & build
$ git clone https://github.com/fast-pack/simdcomp
$ cd simdcomp
$ cmake -B build
$ cmake --build build
$ ctest --test-dir build # run the tests
# Or vendor it via CMake FetchContent
include(FetchContent)
FetchContent_Declare(simdcomp
  GIT_REPOSITORY https://github.com/fast-pack/simdcomp.git
  GIT_TAG master)
FetchContent_MakeAvailable(simdcomp)
target_link_libraries(myapp PRIVATE simdcomp::simdcomp)
What's in the box
Binary packing
Pack a block of 128 integers of bit width b into b 128-bit words with simdpack / simdunpack, for a compression ratio of 32/b.
Differential coding
The *d1 family stores the gap between each value and its predecessor, with the first gap taken from a caller-supplied offset, so sorted, slowly growing sequences shrink to a handful of bits each.
Frame of reference
The simdfor routines pack relative to a base value — no delta chain, so individual values stay randomly accessible.
Search & select
Find a value or pull out the i-th element directly from packed blocks, without decompressing the whole array.
Wider vectors
Optional 256-bit and 512-bit code paths kick in automatically when you build with -march=native on a capable host.
64-bit ARM
The 128-bit SSE kernels map onto ARM NEON via a small built-in shim (include/neon128.h) — no third-party translation layer.
Arbitrary arrays
The *_length helpers and simdpack_compressedbytes handle arrays that aren't a neat multiple of 128.
Go demo
A small Go binding lives in the go/ folder, with ports to Rust and Julia maintained elsewhere.
Permissive license
Released under the BSD 3-Clause license — free to use in commercial and open-source projects alike.
Trusted where every cycle counts
simdcomp powers integer compression inside databases and search engines that move billions of values.
Plus ports & siblings — Rust (bitpacking), Julia (TinyInt.jl), Go (intcomp), and the wider FastPFor / StreamVByte family.