Integer compression at memory speed

Decoding billions of integers per second

simdcomp is a simple C library that compresses lists of 32-bit integers with SIMD binary packing. When your numbers are small — or the gaps between sorted values are — it packs them tight and unpacks them at 15 GB/s.


$ git clone https://github.com/fast-pack/simdcomp
$ cd simdcomp
$ cmake -B build
$ cmake --build build

quickstart.c

#include "simdcomp.h"

/* datain, recovered: uint32_t[128]; buffer: __m128i * with room for b 128-bit words. */

/* Pack 128 small integers, then unpack them. */
uint32_t b = maxbits(datain);        // bit width
simdpackwithoutmask(datain, buffer, b); // 128 ints -> b*128 bits
simdunpack(buffer, recovered, b);      // and back again

/* Sorted? Store deltas with differential coding. */
uint32_t b1 = simdmaxbitsd1(0, datain);
simdpackwithoutmaskd1(0, datain, buffer, b1);
simdunpackd1(0, buffer, recovered, b1);

4 billion+

Integers decoded per second

0.3 cycles/int

On a Skylake processor

15 GB/s

Decompression throughput

32/b×

Compression ratio

Why simdcomp

Highlights

Blazing fast

Decode at least 4 billion compressed integers per second, roughly 0.3 cycles per integer on a Skylake core. That is far faster than general-purpose codecs such as gzip, LZO, Snappy or LZ4.

Simple C API

A handful of clear functions over blocks of 128 integers. C99 or better, CMake-friendly, no heavy dependencies. Drop it in and pack.

x86 and ARM

SSE4.1 with optional AVX2 / AVX-512 paths on Intel and AMD, plus 64-bit ARM NEON (Apple Silicon) through a self-contained shim. Same API everywhere.

Delta coding

For sorted lists, store differences between successive integers. Tiny gaps compress to a few bits each — ideal for inverted indexes and posting lists.

Search in place

Frame-of-reference (FOR) packing lets you search and select directly over compressed data, without fully decoding it first.

Peer reviewed

Built on published research in Software: Practice & Experience and used in production databases and search engines.

Get started

Build it in two commands

simdcomp builds with CMake and selects the right SIMD backend automatically at compile time — SSE/AVX on x86, NEON on ARM. Pull it in with find_package after installing, or vendor it straight from source with FetchContent.


No package manager needed — the library is a small set of C files under src/ and include/. You can also grab it as the npm package simdcomp, or use the demo in the go/ folder.

build.sh

# Clone & build
$ git clone https://github.com/fast-pack/simdcomp
$ cd simdcomp
$ cmake -B build
$ cmake --build build
$ ctest --test-dir build      # run the tests

CMakeLists.txt

# Or vendor it via CMake FetchContent
include(FetchContent)
FetchContent_Declare(simdcomp
  GIT_REPOSITORY https://github.com/fast-pack/simdcomp.git
  GIT_TAG master)
FetchContent_MakeAvailable(simdcomp)
target_link_libraries(myapp PRIVATE simdcomp::simdcomp)

Features

What's in the box

core

Binary packing

Pack 128 integers of at most b bits each into b 128-bit words with simdpack / simdunpack, for a compression ratio of 32/b.
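To make the size arithmetic concrete, here is a minimal scalar sketch of fixed-width packing. It is not the library's SIMD kernel, and it stores values in one contiguous bit stream rather than in the library's vector layout. The names sketch_maxbits, sketch_pack and sketch_unpack are hypothetical. It only illustrates why 128 integers of b bits each fit in 128 * b bits, that is 4 * b 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK 128 /* integers per block, as in the text */

/* Smallest bit width (0..32) that can hold every value in the block. */
static uint32_t sketch_maxbits(const uint32_t *in) {
    uint32_t acc = 0, b = 0;
    for (size_t i = 0; i < SKETCH_BLOCK; i++) acc |= in[i];
    while (acc != 0) { b++; acc >>= 1; }
    return b;
}

/* Pack 128 values, each assumed to fit in b bits, into 4*b 32-bit words. */
static void sketch_pack(const uint32_t *in, uint32_t *out, uint32_t b) {
    if (b == 0) return; /* all zeros: nothing to store */
    memset(out, 0, (size_t)4 * b * sizeof(uint32_t));
    for (size_t i = 0; i < SKETCH_BLOCK; i++) {
        size_t bitpos = i * b;
        size_t word = bitpos / 32;
        uint32_t shift = (uint32_t)(bitpos % 32);
        uint64_t v = (uint64_t)in[i] << shift;
        out[word] |= (uint32_t)v;
        /* Spill the high part when the value straddles two words. */
        if (shift + b > 32) out[word + 1] |= (uint32_t)(v >> 32);
    }
}

/* Inverse of sketch_pack. */
static void sketch_unpack(const uint32_t *in, uint32_t *out, uint32_t b) {
    if (b == 0) { memset(out, 0, SKETCH_BLOCK * sizeof(uint32_t)); return; }
    const uint64_t mask = (UINT64_C(1) << b) - 1;
    for (size_t i = 0; i < SKETCH_BLOCK; i++) {
        size_t bitpos = i * b;
        size_t word = bitpos / 32;
        uint32_t shift = (uint32_t)(bitpos % 32);
        uint64_t v = (uint64_t)in[word] >> shift;
        if (shift + b > 32) v |= (uint64_t)in[word + 1] << (32 - shift);
        out[i] = (uint32_t)(v & mask);
    }
}
```

With values in the range 0..7 the width is b = 3, so the block shrinks from 128 words to 12, a ratio of 32/3.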

sorted

Differential coding

The *d1 family stores deltas from an offset, so sorted, slowly-growing sequences shrink to a handful of bits each.
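The idea behind differential coding can be sketched in plain scalar C. This is an illustration under stated assumptions, not the library's *d1 implementation: the function names are hypothetical, and the initial offset is simply the value assumed to precede the first element.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each value by its gap from the previous one; `init` precedes in[0]. */
static void sketch_delta_encode(uint32_t init, const uint32_t *in,
                                uint32_t *gaps, size_t n) {
    uint32_t prev = init;
    for (size_t i = 0; i < n; i++) {
        gaps[i] = in[i] - prev;
        prev = in[i];
    }
}

/* Rebuild the original values with a running sum over the gaps. */
static void sketch_delta_decode(uint32_t init, const uint32_t *gaps,
                                uint32_t *out, size_t n) {
    uint32_t prev = init;
    for (size_t i = 0; i < n; i++) {
        prev += gaps[i];
        out[i] = prev;
    }
}

/* Bit width needed for the largest gap (0..32). */
static uint32_t sketch_delta_maxbits(uint32_t init, const uint32_t *in, size_t n) {
    uint32_t acc = 0, prev = init, b = 0;
    for (size_t i = 0; i < n; i++) {
        acc |= in[i] - prev;
        prev = in[i];
    }
    while (acc != 0) { b++; acc >>= 1; }
    return b;
}
```

For 128 values starting at 1,000,000 and growing by 3, with an initial offset of 1,000,000, every gap is 0 or 3. Two bits per value are enough, where the raw values would need 20.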

FOR

Frame of reference

The simdfor routines pack relative to a base value — no delta chain, so individual values stay randomly accessible.
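A minimal scalar sketch of frame of reference follows, assuming the block minimum is used as the base. The struct and function names are hypothetical and do not mirror the library's API. Offsets are left unpacked here to keep the point visible: each value decodes from the base and its own offset alone.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base; /* reference value: the block minimum in this sketch */
    uint32_t bits; /* width needed for (max - min) */
} sketch_for_header;

/* Store each value as an offset from the block minimum. */
static sketch_for_header sketch_for_encode(const uint32_t *in,
                                           uint32_t *offsets, size_t n) {
    sketch_for_header h = {0, 0};
    if (n == 0) return h;
    uint32_t lo = in[0], hi = in[0];
    for (size_t i = 1; i < n; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    for (size_t i = 0; i < n; i++) offsets[i] = in[i] - lo;
    h.base = lo;
    for (uint32_t range = hi - lo; range != 0; range >>= 1) h.bits++;
    return h;
}

/* Random access: element i depends only on the base and its own offset. */
static uint32_t sketch_for_get(sketch_for_header h, const uint32_t *offsets,
                               size_t i) {
    return h.base + offsets[i];
}
```

Unlike delta coding, no running sum is needed, so reading element i never touches elements 0..i-1. The cost is a wider bit width when the values span a large range.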

query

Search & select

Find a value or pull out the i-th element directly from packed blocks, without decompressing the whole array.
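As a rough illustration of how this can work, the sketch below reads the i-th value straight out of a packed bit stream and runs a binary search on top of that. It assumes a contiguous little-endian bit layout and a sorted block, which is simpler than the library's vector layout, and the names sketch_select and sketch_search are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the i-th b-bit value (1 <= b <= 32) without decoding its neighbours. */
static uint32_t sketch_select(const uint32_t *packed, uint32_t b, size_t i) {
    size_t bitpos = i * b;
    size_t word = bitpos / 32;
    uint32_t shift = (uint32_t)(bitpos % 32);
    uint64_t v = (uint64_t)packed[word] >> shift;
    /* Pull in the next word when the value straddles a word boundary. */
    if (shift + b > 32) v |= (uint64_t)packed[word + 1] << (32 - shift);
    return (uint32_t)(v & ((UINT64_C(1) << b) - 1));
}

/* Index of the first value >= key in a sorted packed block of n values,
   or n if every value is smaller. */
static size_t sketch_search(const uint32_t *packed, uint32_t b, size_t n,
                            uint32_t key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sketch_select(packed, b, mid) < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}
```

Each probe touches at most two words, so a search over n packed values reads on the order of log2(n) values instead of unpacking all n.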

AVX2 · AVX-512

Wider vectors

Optional 256-bit and 512-bit code paths kick in automatically when you build with -march=native on a capable host.
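Compile-time selection of this kind is usually driven by macros that the compiler predefines for the target. The sketch below uses the GCC/Clang macros __AVX512F__, __AVX2__, __SSE4_1__, __ARM_NEON and __aarch64__. Which macros the library itself tests is an assumption here, and sketch_simd_width_bits is a hypothetical helper.

```c
/* Report the widest vector width (in bits) the current build targets.
   The macro checks are an assumption, not taken from the library. */
static int sketch_simd_width_bits(void) {
#if defined(__AVX512F__)
    return 512;
#elif defined(__AVX2__)
    return 256;
#elif defined(__SSE4_1__) || defined(__ARM_NEON) || defined(__aarch64__)
    return 128;
#else
    return 0; /* no SIMD target detected by this sketch */
#endif
}
```

Building the same file with and without -march=native on an AVX2-capable host should change the value this function returns, which is the effect the paragraph above describes.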

NEON

64-bit ARM

The 128-bit SSE kernels map onto ARM NEON via a small built-in shim (include/neon128.h) — no third-party translation layer.

any length

Arbitrary arrays

The *_length helpers and simdpack_compressedbytes handle arrays that aren't a neat multiple of 128.
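The bookkeeping for an arbitrary length can be sketched as follows. A full block of 128 integers at width b occupies b 128-bit words, so 16 * b bytes. For the leftover tail this sketch only gives a lower bound in bytes, because the exact padding the library applies is not stated here. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK 128

typedef struct {
    size_t full_blocks;    /* blocks of exactly 128 integers */
    size_t tail;           /* leftover integers, 0..127 */
    size_t full_bytes;     /* 16 * b bytes per full block */
    size_t tail_min_bytes; /* lower bound: tail * b bits, rounded up to bytes */
} sketch_layout;

/* Split n integers of width b into full blocks plus a tail. */
static sketch_layout sketch_plan(size_t n, uint32_t b) {
    sketch_layout p;
    p.full_blocks = n / SKETCH_BLOCK;
    p.tail = n % SKETCH_BLOCK;
    p.full_bytes = p.full_blocks * 16 * (size_t)b;
    p.tail_min_bytes = (p.tail * (size_t)b + 7) / 8;
    return p;
}
```

For 1000 integers at b = 5 this gives 7 full blocks (560 bytes) and a tail of 104 integers that needs at least 65 bytes. For real buffer sizing, rely on simdpack_compressedbytes rather than this estimate.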

Go demo

A small Go binding lives in the go/ folder, with ports to Rust and Julia maintained elsewhere.

BSD-3

Permissive license

Released under the BSD 3-Clause license — free to use in commercial and open-source projects alike.

Used in production

Trusted where every cycle counts

simdcomp powers integer compression inside databases and search engines that move billions of values.

upscaledb, EventQL, Manticore Search, SereneDB

Plus ports & siblings — Rust (bitpacking), Julia (TinyInt.jl), Go (intcomp), and the wider FastPFor / StreamVByte family.