Decoding billions of integers per second.

A C/C++ library for fast compression and intersection of sorted lists of integers using SIMD instructions — with particular attention to differential coding and novel schemes such as SIMD Galloping.

4 billion+integers decoded / second
15 GB/sdecompression throughput
> gzip · LZ4 · Snappyfar faster than generic codecs
Ubuntu CI Visual Studio CI
example.cpp
// Pick a codec and compress sorted integers
IntegerCODEC &codec =
    *CODECFactory::getFromName("s4-fastpfor-d1");

codec.encodeArray(input.data(), input.size(),
                  out.data(), outSize);

// SIMD intersection of two sorted lists
auto inter =
    IntersectionFactory::getFromName("simd");
size_t n = inter(a.data(), a.size(),
                 b.data(), b.size(), a.data());

Why this library

Fast

Decodes at least 4 billion compressed integers per second on most desktop and laptop CPUs — roughly 15 GB/s.

SIMD-accelerated

Built on SSE/AVX vector instructions to pack, unpack and intersect integers many lanes at a time.

Sorted-integer focused

Specialized for sorted lists with first-class differential (delta) coding — ideal for inverted indexes and posting lists.

SIMD Galloping

Introduces new vectorized intersection schemes, including SIMD Galloping, for very fast set intersection.

Many codecs

FastPFOR, BP128, StreamVByte, Frame-of-Reference, Masked VByte, VarintGB and more — selectable by name.

🎓

Peer reviewed

Backed by peer-reviewed research in Software: Practice & Experience, ACM TOIS and more.

From clone to running in three commands

The library builds a static archive plus a header set you can drop into your own project. On Linux, macOS and similar systems:

$ git clone https://github.com/fast-pack/SIMDCompressionAndIntersection
$ cd SIMDCompressionAndIntersection
$ make && ./unit

A static library libSIMDCompressionAndIntersection.a is produced, with headers in include/. Build the demo with make example && ./example. Windows users build with nmake -f .\makefile.vc.

roundtrip.cpp
#include "codecfactory.h"
#include "intersection.h"
using namespace SIMDCompressionLib;

IntegerCODEC &codec =
    *CODECFactory::getFromName("s4-fastpfor-d1");

// compress
size_t outSize = out.size();
codec.encodeArray(in.data(), in.size(),
                  out.data(), outSize);

// decompress — exact, lossless round-trip
size_t recovered = back.size();
codec.decodeArray(out.data(), outSize,
                  back.data(), recovered);

What's inside

Pick any scheme by name through CODECFactory or IntersectionFactory.

Bit packing

SIMD bit-packing (BP128 / s4-bp128) with optional delta coding in 1, 2 or 4 lanes.

FastPFOR

Patched frame-of-reference codec tuned for sorted data, including s4-fastpfor-d1.

Stream VByte

Faster byte-oriented integer compression using control streams and shuffles.

Varint family

VByte, Masked VByte, VarintGB and VarintG8IU variable-length encodings.

Frame of Reference

Classic and SIMD frame-of-reference compression for tightly clustered values.

SIMD intersection

Vectorized set intersection, including SIMD Galloping, for sorted posting lists.

Differential coding

In-place and on-the-fly delta coding to exploit the structure of sorted integers.

Cross-platform

Tested on Linux, macOS and Windows; SSE2 minimum, SSE 4.1 / AVX recommended.

Backed by research

The techniques in this library are described in peer-reviewed publications.

Ready to compress at SIMD speed?

Apache 2.0 licensed, patent-free as far as the authors know, and battle-tested across platforms.

Get it on GitHub ↗