C++ · SIMD · integer compression

Decode billions of integers per second.

FastPFor is a research C++ library of SIMD-accelerated integer compression schemes. On a typical x64 laptop it decompresses at over 15 GB/s, which is roughly 4 billion 32-bit integers per second and far faster than generic codecs such as gzip, LZO, Snappy or LZ4.


15+ GB/s

Decompression throughput

4B+ int/s

Integers decoded per second

128 ints

SIMD block size; blocks of 1024 integers are not required

4 languages

Ports: Python, Java, C#, Go

Why FastPFor?

A battle-tested library used in search engines, genomics tooling, and databases.

⚡

Vectorized speed

Exploits SSE/SSE4 instructions to decode at 15 GB/s+. Schemes like SIMDBinaryPacking work over blocks of just 128 integers.

🗜️

Great compression

SIMD doesn't mean weaker ratios. Many schemes both vectorize and compress very well for arrays where most integers are small.

🧩

Many codecs, one API

FastPFOR, SIMD-BP, VarintGB, Simple9/16, VByte and more — all behind a single IntegerCODEC factory interface.

🔬

Research-grade

Backs peer-reviewed papers and is used by Lucene-derived formats, GMAP/GSNAP, the zsearch engine, and code in DuckDB.

🖥️

Portable

Builds with GCC, Clang, ICC and MSVC on Linux, macOS and Windows. ARM is supported through SIMDe.

📦

Easy to build

Standard CMake. Installable with make install, with configurable portable / native SIMD modes.
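To make the "Great compression" point above concrete, here is a minimal scalar sketch of binary packing, the idea behind schemes such as SIMDBinaryPacking: find how many bits the largest value in a block needs, then store every value with exactly that many bits. The helper names (bitsNeeded, packBlock, unpackBlock) are hypothetical and are not part of the FastPFor API. The library works on whole blocks with SIMD instructions, while this sketch moves one bit at a time for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a FastPFor function): number of bits needed to
// represent the largest value in the block; 0 if every value is zero.
static uint32_t bitsNeeded(const std::vector<uint32_t> &block) {
  uint32_t all = 0;
  for (uint32_t v : block) all |= v;  // OR keeps the highest set bit
  uint32_t bits = 0;
  while (all != 0) {
    ++bits;
    all >>= 1;
  }
  return bits;
}

// Hypothetical helper: store each value using `bits` bits, back to back,
// inside 32-bit words. Assumes every value fits in `bits` bits.
static std::vector<uint32_t> packBlock(const std::vector<uint32_t> &block,
                                       uint32_t bits) {
  std::vector<uint32_t> out((block.size() * bits + 31) / 32, 0);
  uint64_t pos = 0;  // current bit position in the output
  for (uint32_t v : block) {
    for (uint32_t b = 0; b < bits; ++b, ++pos) {
      if ((v >> b) & 1u) out[pos / 32] |= 1u << (pos % 32);
    }
  }
  return out;
}

// Hypothetical helper: inverse of packBlock for `count` values.
static std::vector<uint32_t> unpackBlock(const std::vector<uint32_t> &packed,
                                         size_t count, uint32_t bits) {
  std::vector<uint32_t> out(count, 0);
  uint64_t pos = 0;  // current bit position in the input
  for (size_t i = 0; i < count; ++i) {
    for (uint32_t b = 0; b < bits; ++b, ++pos) {
      if ((packed[pos / 32] >> (pos % 32)) & 1u) out[i] |= 1u << b;
    }
  }
  return out;
}
```

With a block of 128 integers that are all below 128, each value needs 7 bits, so the block shrinks from 128 words to 28 words. A single large value in the block would raise the bit width for every value, which is the weakness that patched schemes such as FastPFOR are designed to address.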

Compress in a few lines

Pick a codec, encode your array, decode it back. That's the whole story.

#include <cassert>
#include <cstdint>
#include <vector>

#include "headers/codecfactory.h"
#include "headers/deltautil.h"

int main() {
  using namespace FastPForLib;

  CODECFactory factory;
  // Pick a codec (e.g. "simdfastpfor256", "simdbinarypacking", "varintg8iu")
  IntegerCODEC &codec = *factory.getFromName("simdfastpfor256");

  // Sample input: mostly zeros, with every 150th entry set to its index
  std::vector<uint32_t> mydata(10000);
  for (uint32_t i = 0; i < mydata.size(); i += 150) mydata[i] = i;

  // --- Encode -----------------------------------------------------------
  std::vector<uint32_t> compressed(mydata.size() + 1024); // allocate enough
  size_t compressedsize = compressed.size(); // in: capacity, out: words used
  codec.encodeArray(mydata.data(), mydata.size(),
                    compressed.data(), compressedsize);
  compressed.resize(compressedsize);

  // --- Decode -----------------------------------------------------------
  std::vector<uint32_t> recovered(mydata.size());
  size_t recoveredsize = recovered.size(); // in: capacity, out: ints decoded
  codec.decodeArray(compressed.data(), compressed.size(),
                    recovered.data(), recoveredsize);
  recovered.resize(recoveredsize);

  assert(recovered == mydata); // round-trips exactly
  return 0;
}

Working with sorted integers? Combine FastPFor with differential coding (Delta::deltaSIMD / Delta::inverseDeltaSIMD) to compress the gaps instead.
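As a rough illustration of what differential coding does, the sketch below replaces each value in a sorted array with its gap from the previous value and then restores the original with a running sum. The function names deltaEncode and deltaDecode are hypothetical and written only for this example. They are a plain scalar stand-in, not the library's Delta::deltaSIMD / Delta::inverseDeltaSIMD routines, which are vectorized and may arrange the differences differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar helper: replace each value with its difference from
// the previous value (the first value is kept as a gap from zero).
static void deltaEncode(std::vector<uint32_t> &data) {
  uint32_t prev = 0;
  for (uint32_t &v : data) {
    uint32_t cur = v;
    v = cur - prev;
    prev = cur;
  }
}

// Hypothetical scalar helper: undo deltaEncode with a running prefix sum.
static void deltaDecode(std::vector<uint32_t> &data) {
  uint32_t sum = 0;
  for (uint32_t &v : data) {
    sum += v;
    v = sum;
  }
}
```

For the sorted input {1000, 1003, 1010, 1011, 1050}, the encoded gaps are {1000, 3, 7, 1, 39}. Apart from the first entry, the gaps are small, which is the case where the integer codecs compress best.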

Install & build

You need a C++11 compiler and CMake. On most systems:

git clone https://github.com/fast-pack/FastPFOR.git
cd FastPFOR
cmake -B build
cmake --build build
ctest --test-dir build          # run the unit tests

Run a quick benchmark:

./build/codecs --clusterdynamic
./build/codecs --uniformdynamic

Tip — SIMD modes. Build portable (default, SSE4.2 baseline, safe to distribute) or native (-march=native, maximum speed on the build machine) via -DFASTPFOR_SIMD_MODE=portable|native. On x64 your CPU needs SSSE3 (anything since ~2006).

Use it from your language

FastPFor has been ported and wrapped across the ecosystem.

🐍

Python — PyFastPFor

Prefer Python? PyFastPFor gives you FastPFor's SIMD integer codecs directly from Python, with the same speed under the hood. Ideal for search, analytics and data-science pipelines.

github.com/searchivarius/PyFastPFor →

Java: JavaFastPFOR (used by ClueWeb Tools)
C#: CSharpFastPFOR (.NET integer compression)
Go: encoding (integer compression for Go)
Go: intcomp (fast integer compression)

Related libraries

Working with sorted lists, or want a lighter codec? These pair well with FastPFor.

Papers & references

FastPFor is grounded in peer-reviewed research.

D. Lemire, N. Kurz, C. Rupp. Stream VByte: Faster Byte-Oriented Integer Compression. Information Processing Letters 130, 2018. arXiv:1709.08990
D. Lemire, L. Boytsov. Decoding billions of integers per second through vectorization. Software: Practice & Experience 45(1), 2015. arXiv:1209.2137
D. Lemire, L. Boytsov, N. Kurz. SIMD Compression and the Intersection of Sorted Integers. Software: Practice & Experience 46(6), 2016. arXiv:1401.6399
W. X. Zhao, X. Zhang, D. Lemire, et al. A General SIMD-based Approach to Accelerating Compression Algorithms. ACM TOIS 33(3), 2015. arXiv:1502.01916
