C++ · SIMD · integer compression
FastPFor is a research C++ library of SIMD-accelerated integer compression schemes. On a typical x64 laptop it decompresses at over 15 GB/s, which is roughly 4 billion 32-bit integers per second and far faster than generic codecs such as gzip, LZO, Snappy or LZ4.
A battle-tested library used in search engines, genomics tooling, and databases.
Exploits SSE/SSE4 instructions to decode at 15 GB/s+. Schemes like SIMDBinaryPacking work over blocks of just 128 integers.
SIMD doesn't mean weaker ratios. Many schemes both vectorize and compress very well for arrays where most integers are small.
FastPFOR, SIMD-BP, VarintGB, Simple9/16, VByte and more, all exposed through a single IntegerCODEC interface and obtained by name from CODECFactory.
Backs peer-reviewed papers and is used by Lucene-derived formats, GMAP/GSNAP, the zsearch engine, and code in DuckDB.
Builds with GCC, Clang, ICC and MSVC on Linux, macOS and Windows. ARM is supported through SIMDe.
Standard CMake. Installable with make install, with configurable portable / native SIMD modes.
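To illustrate why arrays of mostly small integers compress well under binary packing, here is a minimal scalar sketch of the idea: find the bit width of the largest value in a block, then store every value with exactly that many bits. The helper names (bitsNeeded, blockBitWidth, packBlock, unpackBlock) are hypothetical and written for this example only. They are not FastPFor's API, and the library's SIMD kernels lay out bits differently to suit vector instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to represent v (0 when v == 0).
inline uint32_t bitsNeeded(uint32_t v) {
    uint32_t bits = 0;
    while (v != 0) { ++bits; v >>= 1; }
    return bits;
}

// Bit width shared by the whole block: the width of its largest value.
inline uint32_t blockBitWidth(const std::vector<uint32_t>& block) {
    uint32_t maxValue = 0;
    for (uint32_t v : block) maxValue = std::max(maxValue, v);
    return bitsNeeded(maxValue);
}

// Store each value with exactly `width` bits, filling 32-bit words
// from the least significant bit upward. Simple bit-by-bit version.
inline std::vector<uint32_t> packBlock(const std::vector<uint32_t>& block,
                                       uint32_t width) {
    assert(width <= 32);
    std::vector<uint32_t> out((block.size() * width + 31) / 32, 0);
    uint64_t bitPos = 0;
    for (uint32_t v : block) {
        for (uint32_t b = 0; b < width; ++b, ++bitPos) {
            if ((v >> b) & 1u) out[bitPos / 32] |= (1u << (bitPos % 32));
        }
    }
    return out;
}

// Inverse of packBlock: read `count` values of `width` bits each.
inline std::vector<uint32_t> unpackBlock(const std::vector<uint32_t>& packed,
                                         uint32_t width, size_t count) {
    assert(width <= 32);
    std::vector<uint32_t> out(count, 0);
    uint64_t bitPos = 0;
    for (size_t i = 0; i < count; ++i) {
        for (uint32_t b = 0; b < width; ++b, ++bitPos) {
            if ((packed[bitPos / 32] >> (bitPos % 32)) & 1u) out[i] |= (1u << b);
        }
    }
    return out;
}
```

With a block of 128 values that are all below 16, the shared width is 4 bits, so the block fits in 16 words instead of 128. A single large value in the block would raise the width for every entry, which is the case that patched schemes such as FastPFOR are designed to handle.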
Pick a codec, encode your array, decode it back. That's the whole story.
#include <cassert>
#include <cstdint>
#include <vector>

#include "headers/codecfactory.h"
#include "headers/deltautil.h"

int main() {
    using namespace FastPForLib;
    CODECFactory factory;
    // Pick a codec (e.g. "simdfastpfor256", "simdbinarypacking", "varintg8iu")
    IntegerCODEC &codec = *factory.getFromName("simdfastpfor256");

    // Sample input: mostly zeros, with a nonzero value at every 150th index
    std::vector<uint32_t> mydata(10000);
    for (uint32_t i = 0; i < mydata.size(); i += 150) mydata[i] = i;

    // --- Encode -------------------------------------------------------------
    std::vector<uint32_t> compressed(mydata.size() + 1024); // allocate enough
    size_t compressedsize = compressed.size(); // in: capacity, out: words written
    codec.encodeArray(mydata.data(), mydata.size(),
                      compressed.data(), compressedsize);
    compressed.resize(compressedsize);

    // --- Decode -------------------------------------------------------------
    std::vector<uint32_t> recovered(mydata.size());
    size_t recoveredsize = recovered.size(); // in: capacity, out: integers decoded
    codec.decodeArray(compressed.data(), compressed.size(),
                      recovered.data(), recoveredsize);
    recovered.resize(recoveredsize);

    assert(recovered == mydata); // round-trips exactly
    return 0;
}
Working with sorted integers? Combine FastPFor with differential coding
(Delta::deltaSIMD / Delta::inverseDeltaSIMD) to compress the gaps instead.
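The idea behind differential coding can be shown with a short scalar sketch: a sorted list is replaced by the gaps between neighbours, which are usually much smaller than the values themselves and therefore need fewer bits. The functions deltaEncode and deltaDecode below are hypothetical names for this illustration only. They are not the library's Delta::deltaSIMD / Delta::inverseDeltaSIMD routines, which are vectorized and may arrange the differences differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Replace each value by its gap from the previous value, in place.
// The first element is kept as-is. Assumes sorted input so gaps stay small.
inline void deltaEncode(std::vector<uint32_t>& data) {
    assert(std::is_sorted(data.begin(), data.end()));
    if (data.empty()) return;
    // Walk backwards so data[i - 1] still holds the original value.
    for (size_t i = data.size() - 1; i > 0; --i) data[i] -= data[i - 1];
}

// Prefix sum over the gaps restores the original values, in place.
inline void deltaDecode(std::vector<uint32_t>& data) {
    for (size_t i = 1; i < data.size(); ++i) data[i] += data[i - 1];
}
```

For example, the sorted list {100, 105, 107, 200} becomes {100, 5, 2, 93} after deltaEncode, and deltaDecode turns it back into the original. In a real pipeline the gaps would be passed to a codec's encodeArray, and the decoded output would go through the inverse step.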
You need a C++11 compiler and CMake. On most systems:
git clone https://github.com/fast-pack/FastPFOR.git
cd FastPFOR
cmake -B build
cmake --build build
ctest --test-dir build # run the unit tests
Run a quick benchmark:
./build/codecs --clusterdynamic
./build/codecs --uniformdynamic
Choose the SIMD mode at configure time with -DFASTPFOR_SIMD_MODE=portable|native:
portable (default, SSE4.2 baseline, safe to distribute)
native (-march=native, maximum speed on the build machine)
On x64 your CPU needs SSSE3 (anything since ~2006).
FastPFor has been ported and wrapped across the ecosystem.
Prefer Python? PyFastPFor gives you FastPFor's SIMD integer codecs directly from Python, with the same speed under the hood. Ideal for search, analytics and data-science pipelines.
github.com/searchivarius/PyFastPFor
Working with sorted lists, or want a lighter codec? These pair well with FastPFor.
FastPFor is grounded in peer-reviewed research.