MaskedVByte is a fast, vectorized decoder for VByte-compressed 32‑bit integers, with optional differential (delta) coding. It turns the venerable byte-oriented varint format into a SIMD-accelerated data path.
Measured on a single core decoding 16M integers; see Performance for details.
The classic VByte (varint) format is compact and ubiquitous — MaskedVByte makes it fast to read back.
Uses SSE4.1 to decode many integers at once with a mask-driven shuffle, instead of branching byte by byte.
Reads ordinary continuation-bit varints. Compatible with the format used across search engines and databases.
Built-in delta variants for sorted sequences and small gaps — fewer bytes per integer and even faster decoding.
select and search helpers let you jump into and scan delta-coded streams without full decompression.
Plain C99 with a clean header API. No dependencies. Builds with make or CMake; vendor it or install it.
The algorithm is described in a peer-reviewed paper and shipped in production search systems such as Lucene forks.
In the VByte format, each integer is stored in one to five bytes. The high bit of each byte — the continuation bit — says whether the integer continues into the next byte. A scalar decoder walks the stream one byte at a time, branching on every bit.
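MaskedVByte instead gathers the continuation bits of a block of bytes into a mask, then uses that mask to drive a shuffle that decodes several integers in vector lanes at once.
The result is a decoder whose cost is driven by data width rather than per-byte branching, turning unpredictable branches into predictable vector work.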
// 4 integers → variable-length bytes
// high bit = "continues"
120 → [0x78]
1000 → [0xE8 0x07]
3 → [0x03]
70000 → [0xF0 0xA2 0x04]
/* scalar: branch on every byte
masked: read the mask once,
shuffle, decode in lanes */
Clone, build, and decode in a few lines. Requires an x86‑64 CPU with SSE4.1 (an ARM/NEON shim is also included).
# clone
git clone https://github.com/fast-pack/MaskedVByte
cd MaskedVByte
# build the library + tests
make
./unit # quick correctness test
# or with CMake
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
ctest --test-dir build
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "varintencode.h"
#include "varintdecode.h"
int main(void) {
    size_t N = 5000;
    uint32_t *in = malloc(N * sizeof(uint32_t));
    // worst case is 5 bytes per 32-bit integer
    uint8_t *comp = malloc(N * 5);
    uint32_t *recov = malloc(N * sizeof(uint32_t));
    if (!in || !comp || !recov) return 1;
    for (size_t k = 0; k < N; ++k) in[k] = 120;
    // encode with classic VByte ...
    size_t n = vbyte_encode(in, N, comp);
    // ... decode fast with MaskedVByte
    masked_vbyte_decode(comp, recov, N);
    printf("Compressed %zu ints to %zu bytes\n", N, n);
    free(in); free(comp); free(recov);
    return 0;
}
Throughput from the bundled benchmark decoding 16,777,216 integers, 5 repeats, single core. Rates are reported in millions of integers per second (mis/s).
MaskedVByte benchmark: 16777216 integers, 5 repeats
plain decode : 1384.94 mis/s (6.751 GB/s, 4.87 bytes/int)
delta decode : 2221.56 mis/s (3.341 GB/s, 1.50 bytes/int)
select_delta : validated (100000 random slots)
search_delta : validated (100000 random keys)
All results validated. Code looks good.
Figures above are from one representative run on Apple Silicon via the NEON shim; absolute numbers vary by CPU and data. On x86‑64 with SSE4.1, the original studies report MaskedVByte decoding at roughly twice the speed of an optimized scalar VByte decoder.
The full surface is two headers in include/. Here are the functions you will reach for most.
| Function | What it does |
|---|---|
| Encoding | |
| vbyte_encode(in, length, bout) | Encode an array of integers with classic VByte. |
| vbyte_encode_delta(in, length, bout, prev) | Delta-encode a sorted array starting from prev. |
| Decoding | |
| masked_vbyte_decode(in, out, length) | Vectorized decode of length integers. |
| masked_vbyte_decode_delta(in, out, length, prev) | Vectorized decode of a delta-coded stream. |
| masked_vbyte_decode_fromcompressedsize(in, out, inputsize) | Decode exactly inputsize compressed bytes. |
| masked_vbyte_decode_fromcompressedsize_delta(...) | Same, for a delta-coded stream. |
| Random access (delta) | |
| masked_vbyte_select_delta(in, length, prev, slot) | Return the value at a given position. |
| masked_vbyte_search_delta(in, length, prev, key, presult) | Find the first value ≥ key. |
Prefer the delta variants when your data is sorted or has small gaps — fewer bytes and faster decoding.
After cmake --install, link against the exported target — or vendor the
repository and add it as a subdirectory. Either way you get
maskedvbyte::maskedvbyte.
The library is released under the permissive Apache 2.0 license, so it drops cleanly into both open-source and commercial projects.
# installed package
find_package(maskedvbyte CONFIG REQUIRED)
target_link_libraries(your_target
PRIVATE maskedvbyte::maskedvbyte)
# or vendored as a subdirectory
add_subdirectory(path/to/MaskedVByte)
target_link_libraries(your_target
PRIVATE maskedvbyte::maskedvbyte)
If MaskedVByte helps your research, please cite the papers behind it.
Part of a broader family of high-performance integer compression tools.