Integer compression at memory speed

Decoding billions of integers per second

simdcomp is a simple C library that compresses lists of 32-bit integers with SIMD binary packing. When your numbers are small — or the gaps between sorted values are — it packs them tight and unpacks them at 15 GB/s.

$ git clone https://github.com/fast-pack/simdcomp
$ cd simdcomp
$ cmake -B build
$ cmake --build build
quickstart.c
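#include <stdint.h>
#include "simdcomp.h"

uint32_t datain[128];     /* input: one block of 128 integers */
uint32_t recovered[128];  /* output of the round trip */
__m128i  buffer[32];      /* packed data; 32 words covers the worst case b = 32 */

/* Pack 128 small integers, then unpack them. */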
uint32_t b = maxbits(datain);        // bit width
simdpackwithoutmask(datain, buffer, b); // 128 ints -> b*128 bits
simdunpack(buffer, recovered, b);      // and back again

/* Sorted? Store deltas with differential coding. */
uint32_t b1 = simdmaxbitsd1(0, datain);
simdpackwithoutmaskd1(0, datain, buffer, b1);
simdunpackd1(0, buffer, recovered, b1);
4 billion+
Integers decoded per second
0.3 cycles/int
On a Skylake processor
15 GB/s
Decompression throughput
32/b×
Compression ratio
Why simdcomp

Highlights

Blazing fast

Decode at least 4 billion compressed integers per second — roughly 0.3 cycles per integer on a Skylake core. Far faster than gzip, LZO, Snappy or LZ4.

Simple C API

A handful of clear functions over blocks of 128 integers. C99 or better, CMake-friendly, no heavy dependencies. Drop it in and pack.

x86 and ARM

SSE4.1 with optional AVX2 / AVX-512 paths on Intel and AMD, plus 64-bit ARM NEON (Apple Silicon) through a self-contained shim. Same API everywhere.

Delta coding

For sorted lists, store differences between successive integers. Tiny gaps compress to a few bits each — ideal for inverted indexes and posting lists.

Search in place

Frame-of-reference (FOR) packing lets you search and select directly over compressed data, without fully decoding it first.

Peer reviewed

Built on published research in Software: Practice & Experience and used in production databases and search engines.

Get started

Build it in two commands

simdcomp builds with CMake and selects the right SIMD backend automatically at compile time — SSE/AVX on x86, NEON on ARM. Pull it in with find_package after installing, or vendor it straight from source with FetchContent.


No package manager needed — the library is a small set of C files under src/ and include/. You can also grab it as the npm package simdcomp, or use the demo in the go/ folder.

build.sh
# Clone & build
$ git clone https://github.com/fast-pack/simdcomp
$ cd simdcomp
$ cmake -B build
$ cmake --build build
$ ctest --test-dir build      # run the tests

# Or vendor it via CMake FetchContent (these lines go in your CMakeLists.txt)
include(FetchContent)
FetchContent_Declare(simdcomp
  GIT_REPOSITORY https://github.com/fast-pack/simdcomp.git
  GIT_TAG master)
FetchContent_MakeAvailable(simdcomp)
target_link_libraries(myapp PRIVATE simdcomp::simdcomp)
Features

What's in the box

core

Binary packing

Pack 128 integers of at most b bits each into b 128-bit words with simdpack / simdunpack. Compared with plain 32-bit storage, the compression ratio is 32/b.
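
To make the arithmetic concrete, here is a minimal scalar sketch of binary packing. It is not the library's code: the names scalar_maxbits, scalar_pack and scalar_unpack are hypothetical, and the bit layout is a plain back-to-back stream rather than the lane-interleaved layout a SIMD kernel would use. It only illustrates why 128 values of b bits occupy b 128-bit words.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest width b such that every value fits in b bits (0 if all are zero). */
static uint32_t scalar_maxbits(const uint32_t *in, size_t n) {
    uint32_t acc = 0, b = 0;
    for (size_t i = 0; i < n; i++) acc |= in[i];
    while (acc != 0) { b++; acc >>= 1; }
    return b;
}

/* Pack n values of b bits each (1 <= b <= 32, values < 2^b) back to back.
   out must hold (n * b + 31) / 32 words; returns the number of words used. */
static size_t scalar_pack(const uint32_t *in, size_t n, uint32_t b, uint32_t *out) {
    size_t words = (n * b + 31) / 32;
    memset(out, 0, words * sizeof(uint32_t));
    for (size_t i = 0; i < n; i++) {
        size_t bitpos = i * b, w = bitpos / 32;
        uint32_t off = (uint32_t)(bitpos % 32);
        uint64_t v = (uint64_t)in[i] << off;
        out[w] |= (uint32_t)v;                                /* low part */
        if (off + b > 32) out[w + 1] |= (uint32_t)(v >> 32);  /* spill into next word */
    }
    return words;
}

/* Inverse of scalar_pack: read n fields of b bits each. */
static void scalar_unpack(const uint32_t *in, size_t n, uint32_t b, uint32_t *out) {
    uint64_t mask = (1ULL << b) - 1;
    for (size_t i = 0; i < n; i++) {
        size_t bitpos = i * b, w = bitpos / 32;
        uint32_t off = (uint32_t)(bitpos % 32);
        uint64_t v = (uint64_t)in[w] >> off;
        if (off + b > 32) v |= (uint64_t)in[w + 1] << (32 - off);
        out[i] = (uint32_t)(v & mask);
    }
}
```

With 128 values below 20, the width is b = 5, so the block shrinks from 128 words to 20 words (five 128-bit words), a ratio of 32/5 = 6.4.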

sorted

Differential coding

The *d1 family stores deltas from an offset, so sorted, slowly-growing sequences shrink to a handful of bits each.
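
The effect of differential coding on bit width can be sketched in a few lines of scalar C. The helpers below (delta_encode, delta_decode, width_of) are hypothetical names for illustration and say nothing about how the *d1 kernels are implemented internally; they only show the transform itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each value with its gap from the previous one; init precedes in[0]. */
static void delta_encode(uint32_t init, const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = init;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev;
        prev = in[i];
    }
}

/* Undo delta_encode with a running (prefix) sum starting at init. */
static void delta_decode(uint32_t init, const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = init;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = prev;
    }
}

/* Bits needed to store the largest value in the array (0 if all are zero). */
static uint32_t width_of(const uint32_t *in, size_t n) {
    uint32_t acc = 0, b = 0;
    for (size_t i = 0; i < n; i++) acc |= in[i];
    while (acc != 0) { b++; acc >>= 1; }
    return b;
}
```

For a sorted run starting at 1,000,000 with a gap of 3, the raw values need 20 bits each while the gaps need 2, provided the offset is close to the first value. With an offset of 0, the first gap alone would still need 20 bits.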

FOR

Frame of reference

The simdfor routines pack relative to a base value — no delta chain, so individual values stay randomly accessible.
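
As a rough illustration of why frame of reference keeps random access, the scalar sketch below stores each value as an offset from a base. The type for_header and the functions for_encode and for_value are hypothetical, and taking the block minimum as the base is an assumption made for this example, not a statement about what the simdfor routines do.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-block metadata: the reference value and the width of the offsets. */
typedef struct {
    uint32_t base;
    uint32_t bits;
} for_header;

/* Store each value as (value - base), with base = block minimum. Requires n >= 1. */
static for_header for_encode(const uint32_t *in, size_t n, uint32_t *offsets) {
    for_header h = { in[0], 0 };
    uint32_t hi = in[0];
    for (size_t i = 1; i < n; i++) {
        if (in[i] < h.base) h.base = in[i];
        if (in[i] > hi) hi = in[i];
    }
    for (size_t i = 0; i < n; i++) offsets[i] = in[i] - h.base;
    /* Width of the largest offset, which is hi - base. */
    for (uint32_t range = hi - h.base; range != 0; range >>= 1) h.bits++;
    return h;
}

/* Any element can be rebuilt on its own: no running sum over its neighbours. */
static uint32_t for_value(for_header h, const uint32_t *offsets, size_t i) {
    return h.base + offsets[i];
}
```

Unlike delta coding, recovering element i touches only offsets[i] and the base, and the input does not have to be sorted.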

query

Search & select

Find a value or pull out the i-th element directly from packed blocks, without decompressing the whole array.
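
The idea can be sketched in scalar C over a simple back-to-back bit stream. The functions packed_store, packed_select and packed_lower_bound are hypothetical names, and the layout is an assumption chosen for readability; the library's own search and select routines work on its SIMD layout and are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v (< 2^b) as the i-th b-bit field of a zero-initialised buffer, 1 <= b <= 32. */
static void packed_store(uint32_t *packed, uint32_t b, size_t i, uint32_t v) {
    size_t bitpos = i * b, w = bitpos / 32;
    uint32_t off = (uint32_t)(bitpos % 32);
    uint64_t shifted = (uint64_t)v << off;
    packed[w] |= (uint32_t)shifted;
    if (off + b > 32) packed[w + 1] |= (uint32_t)(shifted >> 32);
}

/* Select: read the i-th b-bit field, touching at most two words. */
static uint32_t packed_select(const uint32_t *packed, uint32_t b, size_t i) {
    size_t bitpos = i * b, w = bitpos / 32;
    uint32_t off = (uint32_t)(bitpos % 32);
    uint64_t v = (uint64_t)packed[w] >> off;
    if (off + b > 32) v |= (uint64_t)packed[w + 1] << (32 - off);
    return (uint32_t)(v & ((1ULL << b) - 1));
}

/* Search: index of the first field >= key in a non-decreasing packed sequence,
   or n if there is none. Binary search, so only O(log n) fields are decoded. */
static size_t packed_lower_bound(const uint32_t *packed, uint32_t b, size_t n, uint32_t key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (packed_select(packed, b, mid) < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}
```

For 128 sorted values stored at 9 bits each, a lookup decodes at most 7 fields instead of unpacking all 128.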

AVX2 · AVX-512

Wider vectors

Optional 256-bit and 512-bit code paths kick in automatically when you build with -march=native on a capable host.
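
Compile-time selection of this kind typically keys off macros that the compiler predefines for the target. The sketch below uses the macros GCC and Clang set (for example __AVX2__ when building with -mavx2 or a -march=native that implies it); the function widest_simd_path is hypothetical, and the macros simdcomp itself checks may differ.

```c
/* Report the widest instruction set this translation unit was compiled for.
   Only compiler-predefined target macros are consulted; nothing is probed at run time. */
static const char *widest_simd_path(void) {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__ARM_NEON) || defined(__aarch64__)
    return "NEON";
#else
    return "scalar";
#endif
}
```

Because the choice is made when the code is compiled, a binary built with -march=native on an AVX-512 host may not run on an older CPU.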

NEON

64-bit ARM

The 128-bit SSE kernels map onto ARM NEON via a small built-in shim (include/neon128.h) — no third-party translation layer.

any length

Arbitrary arrays

The *_length helpers and simdpack_compressedbytes handle arrays that aren't a neat multiple of 128.

Go

Go demo

A small Go binding lives in the go/ folder, with ports to Rust and Julia maintained elsewhere.

BSD-3

Permissive license

Released under the BSD 3-Clause license — free to use in commercial and open-source projects alike.

Used in production

Trusted where every cycle counts

simdcomp powers integer compression inside databases and search engines that move billions of values.

Plus ports & siblings — Rust (bitpacking), Julia (TinyInt.jl), Go (intcomp), and the wider FastPFor / StreamVByte family.