Fast
Decodes at least 4 billion compressed integers per second on most desktop and laptop CPUs — roughly 15 GB/s.
A C/C++ library for fast compression and intersection of sorted lists of integers using SIMD instructions — with particular attention to differential coding and novel schemes such as SIMD Galloping.
// Pick a codec and compress sorted integers
IntegerCODEC &codec =
*CODECFactory::getFromName("s4-fastpfor-d1");
codec.encodeArray(input.data(), input.size(),
out.data(), outSize);
// SIMD intersection of two sorted lists
auto inter =
IntersectionFactory::getFromName("simd");
size_t n = inter(a.data(), a.size(),
b.data(), b.size(), a.data());
Decodes at least 4 billion compressed integers per second on most desktop and laptop CPUs — roughly 15 GB/s.
Built on SSE/AVX vector instructions to pack, unpack and intersect integers many lanes at a time.
Specialized for sorted lists with first-class differential (delta) coding — ideal for inverted indexes and posting lists.
Introduces new vectorized intersection schemes, including SIMD Galloping, for very fast set intersection.
FastPFOR, BP128, StreamVByte, Frame-of-Reference, Masked VByte, VarintGB and more — selectable by name.
Backed by peer-reviewed research in Software: Practice & Experience, ACM TOIS and more.
The library builds a static archive plus a header set you can drop into your own project. On Linux, macOS and similar systems:
$ git clone https://github.com/fast-pack/SIMDCompressionAndIntersection
$ cd SIMDCompressionAndIntersection
$ make && ./unit
A static library libSIMDCompressionAndIntersection.a is produced, with headers
in include/. Build the demo with make example && ./example.
Windows users build with nmake -f .\makefile.vc.
#include "codecfactory.h"
#include "intersection.h"
using namespace SIMDCompressionLib;
IntegerCODEC &codec =
*CODECFactory::getFromName("s4-fastpfor-d1");
// compress
size_t outSize = out.size();
codec.encodeArray(in.data(), in.size(),
out.data(), outSize);
// decompress — exact, lossless round-trip
size_t recovered = back.size();
codec.decodeArray(out.data(), outSize,
back.data(), recovered);
Pick any scheme by name through CODECFactory or IntersectionFactory.
SIMD bit-packing (BP128 / s4-bp128) with optional delta coding in 1, 2 or 4 lanes.
Patched frame-of-reference codec tuned for sorted data, including s4-fastpfor-d1.
Faster byte-oriented integer compression using control streams and shuffles.
VByte, Masked VByte, VarintGB and VarintG8IU variable-length encodings.
Classic and SIMD frame-of-reference compression for tightly clustered values.
Vectorized set intersection, including SIMD Galloping, for sorted posting lists.
In-place and on-the-fly delta coding to exploit the structure of sorted integers.
Tested on Linux, macOS and Windows; SSE2 minimum, SSE 4.1 / AVX recommended.
The techniques in this library are described in peer-reviewed publications.
Apache 2.0 licensed, patent-free as far as the authors know, and battle-tested across platforms.
Get it on GitHub ↗