C++ · SIMD · integer compression

Decode billions of integers per second.

FastPFor is a research C++ library of SIMD-accelerated integer compression schemes. On a typical x64 laptop it decompresses at over 15 GB/s, that is, 4 billion or more 32-bit integers per second. That is far faster than generic codecs such as gzip, LZO, Snappy or LZ4.

Apache 2.0 license · C++11
15+ GB/s
Decompression throughput
4B+ int/s
Integers decoded per second
128 ints
SIMD block size; blocks of 1024 integers are not required
4 languages
Ports: Python, Java, C#, Go

Why FastPFor?

A battle-tested library used in search engines, genomics tooling, and databases.

Vectorized speed

Exploits SSE/SSE4 instructions to decode at 15 GB/s+. Schemes like SIMDBinaryPacking work over blocks of just 128 integers.

🗜️

Great compression

SIMD doesn't mean weaker ratios. Many schemes both vectorize and compress very well for arrays where most integers are small.

🧩

Many codecs, one API

FastPFOR, SIMD-BP, VarintGB, Simple9/16, VByte and more, all exposed through the common IntegerCODEC interface and looked up by name via CODECFactory.

🔬

Research-grade

Backs peer-reviewed papers and is used by Lucene-derived formats, GMAP/GSNAP, the zsearch engine, and code in DuckDB.

🖥️

Portable

Builds with GCC, Clang, ICC and MSVC on Linux, macOS and Windows. ARM is supported through SIMDe.

📦

Easy to build

Standard CMake. Installable with make install, with configurable portable / native SIMD modes.
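To make the "Great compression" point above concrete, here is a minimal scalar sketch of block-based binary packing: find the widest value in a block, then store every value with only that many bits. The function names (bitsNeeded, packBlock, unpackBlock) are hypothetical and written for illustration only; this is not FastPFor's implementation, which uses SIMD instructions and more elaborate schemes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bits needed for the largest value in the block (0 if every value is zero).
uint32_t bitsNeeded(const uint32_t *block, size_t n) {
  uint32_t accumulator = 0;
  for (size_t i = 0; i < n; ++i) accumulator |= block[i];
  uint32_t bits = 0;
  while (accumulator != 0) {
    ++bits;
    accumulator >>= 1;
  }
  return bits;
}

// Pack n values into 32-bit words using `bits` bits per value.
// Assumes every value fits in `bits` bits (as reported by bitsNeeded).
std::vector<uint32_t> packBlock(const uint32_t *block, size_t n, uint32_t bits) {
  std::vector<uint32_t> out((n * bits + 31) / 32, 0);
  if (bits == 0) return out; // nothing to store for an all-zero block
  size_t bitpos = 0;
  for (size_t i = 0; i < n; ++i, bitpos += bits) {
    const size_t word = bitpos / 32;
    const uint32_t shift = static_cast<uint32_t>(bitpos % 32);
    out[word] |= block[i] << shift;
    // The value straddles a word boundary: spill the high bits into the next word.
    if (shift + bits > 32) out[word + 1] |= block[i] >> (32 - shift);
  }
  return out;
}

// Inverse of packBlock: recover n values of `bits` bits each.
std::vector<uint32_t> unpackBlock(const std::vector<uint32_t> &packed, size_t n,
                                  uint32_t bits) {
  std::vector<uint32_t> out(n, 0);
  if (bits == 0) return out;
  const uint64_t mask = (uint64_t(1) << bits) - 1;
  size_t bitpos = 0;
  for (size_t i = 0; i < n; ++i, bitpos += bits) {
    const size_t word = bitpos / 32;
    const uint32_t shift = static_cast<uint32_t>(bitpos % 32);
    uint64_t window = packed[word];
    if (shift + bits > 32) window |= uint64_t(packed[word + 1]) << 32;
    out[i] = static_cast<uint32_t>((window >> shift) & mask);
  }
  return out;
}
```

For a block of 128 integers whose largest value is 19, bitsNeeded returns 5, so the block packs into 20 words instead of 128. A single large value in the block would widen every slot, which is the problem that patched schemes such as FastPFOR are designed to address.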

Compress in a few lines

Pick a codec, encode your array, decode it back. That's the whole story.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#include "headers/codecfactory.h"
#include "headers/deltautil.h"

using namespace FastPForLib;

int main() {
  CODECFactory factory;
  // Pick a codec (e.g. "simdfastpfor256", "simdbinarypacking", "varintg8iu")
  IntegerCODEC &codec = *factory.getFromName("simdfastpfor256");

  // Sample input: mostly zeros, with a nonzero value every 150 entries
  std::vector<uint32_t> mydata(10000);
  for (uint32_t i = 0; i < mydata.size(); i += 150) mydata[i] = i;

  // --- Encode -----------------------------------------------------------
  // Output buffer with headroom; compressedsize is capacity in, words used out
  std::vector<uint32_t> compressed(mydata.size() + 1024);
  size_t compressedsize = compressed.size();
  codec.encodeArray(mydata.data(), mydata.size(),
                    compressed.data(), compressedsize);
  compressed.resize(compressedsize); // shrink to the words actually written

  // --- Decode -----------------------------------------------------------
  std::vector<uint32_t> recovered(mydata.size());
  size_t recoveredsize = recovered.size();
  codec.decodeArray(compressed.data(), compressed.size(),
                    recovered.data(), recoveredsize);
  recovered.resize(recoveredsize);

  assert(recovered == mydata); // round-trips exactly
  return 0;
}

Working with sorted integers? Combine FastPFor with differential coding (Delta::deltaSIMD / Delta::inverseDeltaSIMD) to compress the gaps instead.
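As a rough illustration of what differential coding does, the sketch below replaces each value with its gap from the previous one, then restores the original with a prefix sum. It is a plain scalar version with hypothetical helper names (deltaEncode, deltaDecode), not the library's SIMD routines; in FastPFor itself you would call Delta::deltaSIMD before encoding and Delta::inverseDeltaSIMD after decoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Replace each value by its gap from the previous one; the first value is kept.
// Iterating backwards ensures each element still sees its original predecessor.
void deltaEncode(std::vector<uint32_t> &data) {
  for (size_t i = data.size(); i-- > 1;) {
    data[i] -= data[i - 1];
  }
}

// Inverse transform: a running prefix sum restores the original values.
void deltaDecode(std::vector<uint32_t> &data) {
  for (size_t i = 1; i < data.size(); ++i) {
    data[i] += data[i - 1];
  }
}
```

For a sorted list such as {3, 10, 10, 25, 1000}, the gaps are {3, 7, 0, 15, 975}. Gaps in a dense sorted list tend to be much smaller than the values themselves, so they need fewer bits per integer once handed to a codec.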

Install & build

You need a C++11 compiler and CMake. On most systems:

git clone https://github.com/fast-pack/FastPFOR.git
cd FastPFOR
cmake -B build
cmake --build build
ctest --test-dir build          # run the unit tests

Run a quick benchmark:

./build/codecs --clusterdynamic
./build/codecs --uniformdynamic

Tip: SIMD modes. Pass -DFASTPFOR_SIMD_MODE=portable|native when configuring. portable (the default) targets an SSE4.2 baseline and is safe to distribute; native compiles with -march=native for maximum speed on the build machine. On x64 your CPU needs SSSE3 (anything since ~2006).

Use it from your language

FastPFor has been ported and wrapped across the ecosystem.

🐍

Python — PyFastPFor

Prefer Python? PyFastPFor gives you FastPFor's SIMD integer codecs directly from Python, with the same speed under the hood. Ideal for search, analytics and data-science pipelines.

github.com/searchivarius/PyFastPFor →

Related libraries

Working with sorted lists, or want a lighter codec? These pair well with FastPFor.

Papers & references

FastPFor is grounded in peer-reviewed research.