ext-turbovec

ext-turbovec is a PHP 8.3+ extension for in-process vector indexing and approximate-nearest-neighbor search. It is the retrieval half of a fully local RAG stack for PHP — ext-infer generates embeddings, ext-turbovec indexes and searches them. No vector database, no Python sidecar, no data leaving the machine.

The engine is the turbovec Rust crate by Ryan Codrai, an implementation of Google Research’s TurboQuant algorithm (Zandieh et al., arXiv:2504.19874):

Data-oblivious quantization — 2 or 4 bits per coordinate with near-optimal distortion and no training phase. Add vectors and they’re searchable; the index never needs rebuilding.
SIMD search kernels — hand-written NEON on ARM, AVX2/AVX-512BW on x86 (selected at runtime), competitive with or faster than FAISS IndexPQFastScan.
Kernel-level filtering — id allowlists are honored inside the scan, so selective filters get faster, not less accurate.

This extension is a downstream bindings consumer of upstream turbovec, not a fork — index behavior, file formats, and the quantization math are all upstream’s work.

Where to go next

Installation — PIE or from source.
Quick start — an index searching in ten lines.
Packed vectors — the one contract you must understand: how vectors cross into the extension.
Semantic search with ext-infer — the full local RAG loop.

Installation

Via PIE (recommended)

PIE is PHP’s installer for extensions:

pie install displace/ext-turbovec

PIE picks the pre-built binary for your PHP minor (8.3/8.4/8.5), platform (macOS arm64, Linux x86_64, Linux arm64), and thread-safety mode, drops it into your extension directory, and enables it.

Linux: OpenBLAS

The index engine links OpenBLAS on Linux. If the extension fails to load with a libopenblas.so.0: cannot open shared object file error:

sudo apt install libopenblas0        # Debian/Ubuntu
sudo dnf install openblas            # Fedora/RHEL

macOS needs nothing — the Accelerate framework ships with the OS.

From source

Requirements: Rust 1.89+ (rustup will pick up the repo’s toolchain pin), PHP 8.3+ with development headers (php-dev / php-config on PATH), libclang (libclang-dev), and on Linux libopenblas-dev.

git clone https://github.com/DisplaceTech/ext-turbovec
cd ext-turbovec
make release             # target/release/libturbovec.{so,dylib}

Load it ad hoc:

php -d extension=$PWD/target/release/libturbovec.so your-script.php

…or permanently, by adding extension=/path/to/libturbovec.so to your php.ini. make install (via cargo-php) automates the copy-and-enable.

Supported targets

PHP 8.3 / 8.4 / 8.5 on macOS arm64, Linux x86_64 (Haswell 2013+ for the SIMD fast path; older CPUs use a scalar fallback), and Linux arm64. Windows is not supported. See the compatibility matrix.

Quick start

An index of three vectors, searched, in one script:

<?php
use Displace\Vector\TurboQuantIndex;
use Displace\Vector\Vectors;

// dim must be a positive multiple of 8 — real embeddings (384, 768,
// 1024, 1536, ...) all qualify. We'll use 8 to keep the example tiny.
$index = new TurboQuantIndex(dim: 8, bitWidth: 4);

// Vectors enter as packed float32 strings: pack('g*', ...$floats).
// Batches are plain string concatenation.
$index->add(
    pack('g*', 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) .   // id 0
    pack('g*', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) .   // id 1
    pack('g*', 0.7, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)     // id 2
);

// Search with a query vector; get back ids + scores, best-first.
$result = $index->search(pack('g*', 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), k: 2);

foreach ($result as $row) {
    printf("id %d  score %+.4f\n", $row['id'], $row['score']);
}
// id 0  score +0.9...    <- closest to the query
// id 2  score +0.7...

Three things to take away:

Vectors are packed strings. pack('g*', ...$floats) — or Vectors::pack($floats) — is the only input format. Why, and the full rules →
Ids are positional in TurboQuantIndex: the first vector added is id 0, the next is id 1, and so on. When your vectors belong to database rows, use IdMapIndex and address them by your own ids.
Scores are similarities — higher is better, results come best-first.

For something real, feed it actual embeddings: semantic search with ext-infer →

Verifying your install

php -m | grep turbovec

A deeper check that exercises construction, add, and search:

php -r '
$i = new Displace\Vector\TurboQuantIndex(8);
$i->add(pack("g*", 1,0,0,0,0,0,0,0));
$r = $i->search(pack("g*", 1,0,0,0,0,0,0,0), 1);
echo $r->ids()[0] === 0 ? "ext-turbovec OK\n" : "unexpected result\n";
'

Troubleshooting

libopenblas.so.0: cannot open shared object file (Linux) — install the OpenBLAS runtime: sudo apt install libopenblas0 (Debian/Ubuntu) or your distro’s equivalent.

undefined symbol errors at load — the binary was built for a different PHP minor. PIE matches automatically; for source builds, rebuild against the php-config of the PHP that’s loading it.

extension_loaded('turbovec') is false but no error — check which ini PHP is reading (php --ini) and that the extension= line points at an absolute path.

Old/exotic x86 CPU — pre-Haswell (2013) processors run the scalar fallback. Results are identical; throughput is lower. Nothing to configure — kernel selection is automatic at runtime.
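
The checks above can be gathered into one script. This is a minimal sketch that uses only PHP built-ins and never calls the extension itself; the module name turbovec is taken from the php -m check earlier in this section.

```php
<?php
// Diagnostic sketch: prints the facts the troubleshooting entries ask about.
// Uses only PHP built-ins; it does not call into ext-turbovec.

$report = [
    // Binaries are built per PHP minor, so this must match the build.
    'php_version'   => PHP_MAJOR_VERSION . '.' . PHP_MINOR_VERSION,
    // Thread-safety mode of the running PHP.
    'thread_safety' => PHP_ZTS ? 'ZTS' : 'NTS',
    // The ini file PHP actually read; false means none was loaded.
    'loaded_ini'    => php_ini_loaded_file() ?: '(none)',
    // Directory used to resolve relative extension= entries.
    'extension_dir' => ini_get('extension_dir'),
    // True only if a module registered under this name.
    'turbovec'      => extension_loaded('turbovec') ? 'loaded' : 'not loaded',
];

foreach ($report as $key => $value) {
    printf("%-14s %s\n", $key, $value);
}
```

Run it with the same PHP binary (CLI or FPM) that fails to load the extension; the CLI and FPM often read different ini files.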

Packed vectors

Every index method takes vectors as packed little-endian float32 binary strings — the output of PHP’s pack() with the g format code:

$one   = pack('g*', ...$floats);        // one vector
$batch = $one . $another . $third;      // batches: plain concatenation

Why packed strings, not arrays?

A PHP array of one million floats is one million zvals — each a 16-byte tagged value behind a hashtable, inflated and walked on every call. The same data as a packed string is a single contiguous buffer the extension reads in one pass. It’s the difference between an FFI boundary crossing that’s effectively a memcpy and one that allocates a small heap.

There is deliberately only one input path. Methods don’t silently accept arrays and convert them, because that would make the slow path invisible. If you have arrays, convert explicitly:

use Displace\Vector\Vectors;

$packed = Vectors::pack($floats);              // === pack('g*', ...$floats)
$floats = Vectors::unpack($packed, dim: 768);  // flat list<float> back out

unpack() returns a flat list (it round-trips pack() exactly); use array_chunk($floats, $dim) if you want per-vector rows.
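
To make the layout concrete, here is a small sketch in stock PHP (no extension calls). It shows that a batch is nothing more than concatenated 4-byte floats, and that PHP's own unpack() plus array_chunk() recovers the rows. The variable names are illustrative, and the sample values were chosen to be exactly representable in float32.

```php
<?php
// Three 8-dimensional vectors, packed as little-endian float32.
$dim  = 8;
$rows = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
];

// Batching is plain string concatenation of the per-vector buffers.
$batch = '';
foreach ($rows as $row) {
    $batch .= pack('g*', ...$row);
}

// Each coordinate is 4 bytes, so n vectors occupy 4 * dim * n bytes.
$vectorCount = intdiv(strlen($batch), 4 * $dim);

// unpack() returns a 1-indexed array; array_values() makes it a flat list,
// and array_chunk() splits that list back into per-vector rows.
$flat    = array_values(unpack('g*', $batch));
$decoded = array_chunk($flat, $dim);

printf("%d bytes, %d vectors\n", strlen($batch), $vectorCount);
// prints "96 bytes, 3 vectors"
```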

The validation rules

For an index of dimensionality dim:

Input	Rule	On violation
`add()` / `addWithIds()` payload	`strlen % (4 * dim) === 0`	`DimensionMismatchException`
`search()` query	`strlen === 4 * dim` (exactly one vector)	`DimensionMismatchException`
every coordinate	finite, `abs(value) < 1e16`	`InvalidArgumentException`

The NaN/Inf rule is not pedantry: a single NaN coordinate would silently corrupt the per-vector scale inside the quantizer — the vector would count toward count() but never match any query. The extension rejects the payload up front, and a rejected call never partially applies.
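
If you want to catch bad payloads before they reach the index (for example, to log which document produced them), the rules in the table can be mirrored in userland. The function below is a hypothetical helper, not part of the extension: it restates the documented rules as read from this page and returns a reason string instead of throwing. It assumes dim is positive.

```php
<?php
/**
 * Hypothetical pre-flight check mirroring the documented validation rules.
 * Returns null when the payload looks valid, or a short reason otherwise.
 */
function packedPayloadProblem(string $payload, int $dim): ?string
{
    // Rule 1: the length must be a whole number of vectors (4 bytes per float32).
    if (strlen($payload) % (4 * $dim) !== 0) {
        return 'length is not a multiple of 4 * dim';
    }
    if ($payload === '') {
        return null; // nothing to inspect
    }

    // Rule 2: every coordinate must be finite and below 1e16 in magnitude.
    // unpack() keys start at 1, hence the "- 1" for a zero-based index.
    foreach (unpack('g*', $payload) as $position => $value) {
        if (!is_finite($value) || abs($value) >= 1e16) {
            return sprintf('non-finite or oversized coordinate at index %d', $position - 1);
        }
    }
    return null;
}

$good = pack('g*', 1, 0, 0, 0, 0, 0, 0, 0);
$nan  = pack('g*', 0, NAN, 0, 0, 0, 0, 0, 0);

var_dump(packedPayloadProblem($good, 8));                // NULL
var_dump(packedPayloadProblem($nan, 8));                 // reports index 1
var_dump(packedPayloadProblem(substr($good, 0, 30), 8)); // reports the length problem
```

A check like this duplicates work the extension already does, so it is only worth running where the extra diagnostics matter.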

Precision

PHP floats are 64-bit doubles; the packed format is 32-bit floats. The narrowing happens once, at pack time — identical to what pack('g*') itself does, and far above the precision the 2/4-bit quantizer keeps anyway.
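
A quick way to see the narrowing for yourself, using only pack() and unpack():

```php
<?php
// 0.1 has no exact binary representation. float32 keeps fewer digits than
// PHP's 64-bit float, so the round-trip lands on the nearest float32 value.
$original  = 0.1;
$roundTrip = unpack('g', pack('g', $original))[1];

printf("%.17f\n%.17f\n", $original, $roundTrip);

// The error is tiny (under 1e-8 here), and packing the already-narrowed
// value a second time changes nothing.
$error = abs($roundTrip - $original);
$again = unpack('g', pack('g', $roundTrip))[1];
var_dump($error < 1e-8, $again === $roundTrip); // bool(true) bool(true)
```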

Endianness

The format is explicitly little-endian (g, not G). All supported platforms are little-endian, and the extension refuses to compile for big-endian targets, so pack('g*') output is portable across every machine ext-turbovec runs on — including index files moved between them.
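
The byte order is easy to confirm from PHP itself. The sketch below compares the two format codes for the value 1.0, whose IEEE 754 single-precision encoding is 0x3F800000:

```php
<?php
// 'g' = little-endian float32 (the format this page describes),
// 'G' = big-endian float32 (the same four bytes, reversed).
echo bin2hex(pack('g', 1.0)), "\n"; // 0000803f
echo bin2hex(pack('G', 1.0)), "\n"; // 3f800000

// The two encodings are byte-reversals of each other.
var_dump(strrev(pack('g', 1.0)) === pack('G', 1.0)); // bool(true)
```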

Where packed vectors come from

ext-infer: $embedding->packed() on ext-infer ≥ 0.2, or Vectors::pack($embedding->vector()) on ext-infer 0.1 (see the semantic search recipe).
Remote APIs: most embedding APIs return JSON arrays — Vectors::pack($response['embedding']).
Files/DB columns: if you stored pack('g*') blobs, pass them through unchanged; concatenation is batching.

Choosing an index

Two index classes share the same engine and differ only in how vectors are addressed.

`TurboQuantIndex` — positional ids

The Nth vector added is id N, forever. There is no removal. Use it when the corpus is append-only (or rebuilt wholesale) and you keep your own side table mapping positions to documents — or when position is the key, e.g. line numbers in a file.

$index = new TurboQuantIndex(dim: 768, bitWidth: 4);
$index->add($packedBatch);            // ids 0..n-1, in order

`IdMapIndex` — your ids, O(1) remove

Wraps the same engine with a bidirectional id table. You address vectors by arbitrary non-negative ints — SQL primary keys, content hashes truncated to 63 bits, whatever identifies a document in your system — and those ids survive any number of other insertions and removals.

$index = new IdMapIndex(dim: 768, bitWidth: 4);
$index->addWithIds($packedBatch, [1001, 1002, 1003]);
$index->remove(1002);                 // O(1)

addWithIds() rejects ids already present (and duplicates within the call) before adding anything — a failed call never partially applies. remove() of an absent id throws rather than silently doing nothing.

Removal is constant-time because it’s a swap-remove internally; the id table absorbs the reshuffling so external ids never move. The cost is a hash lookup per result row at search time — negligible against the scan.
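
For readers who have not met swap-remove before, the following toy model shows the idea in plain PHP. It is illustrative only: ToyIdMap is a made-up class, it stores string labels where the real index stores quantized codes, and it says nothing about how the extension is actually implemented.

```php
<?php
/**
 * Toy model of swap-remove behind a bidirectional id table.
 * Not the extension's code; it skips error handling for unknown ids.
 */
final class ToyIdMap
{
    /** @var list<string> dense storage, addressed by internal slot */
    private array $slots = [];
    /** @var array<int,int> external id => slot */
    private array $slotOf = [];
    /** @var list<int> slot => external id */
    private array $idAt = [];

    public function add(int $id, string $payload): void
    {
        $this->slotOf[$id] = count($this->slots);
        $this->slots[]     = $payload;
        $this->idAt[]      = $id;
    }

    public function remove(int $id): void
    {
        $slot = $this->slotOf[$id];
        $last = count($this->slots) - 1;

        // Move the last entry into the hole, then fix up its table rows.
        $this->slots[$slot] = $this->slots[$last];
        $this->idAt[$slot]  = $this->idAt[$last];
        $this->slotOf[$this->idAt[$slot]] = $slot;

        // Drop the tail and forget the removed id; no other entry moves.
        array_pop($this->slots);
        array_pop($this->idAt);
        unset($this->slotOf[$id]);
    }

    public function get(int $id): string
    {
        return $this->slots[$this->slotOf[$id]];
    }
}

$map = new ToyIdMap();
$map->add(1001, 'a');
$map->add(1002, 'b');
$map->add(1003, 'c');
$map->remove(1002);          // 'c' moves into slot 1 internally
echo $map->get(1003), "\n";  // c
```

The caller keeps using 1003 even though the entry changed slots, which is the property the id table exists to provide.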

Default to IdMapIndex for anything document-shaped. The positional index is the right choice only when you genuinely don’t need stable identity, filtering by id, or deletion.

Constructor parameters (both classes)

dim — vector dimensionality. Must be a positive multiple of 8, at most 65536. Every common embedding size qualifies (384, 512, 768, 1024, 1536, 3072, 4096). Locked at construction; payloads that disagree throw.
bitWidth — bits per coordinate after quantization, 2 or 4 (default 4). 4-bit is the right default: recall close to full-precision at an 8× size reduction. 2-bit halves memory again and is worth evaluating when your top-k feeds a reranker that can absorb a recall dip. The bit width is baked into the index (and its files) — changing it means re-adding the source vectors.

Searching

$result = $index->search($packedQuery, k: 10);

The query is exactly one packed vector (strlen === 4 * dim); k is the number of neighbors you want. The result is a SearchResult — immutable, Countable, and iterable:

count($result);                        // rows returned
$result->ids();                        // list<int>, best-first
$result->scores();                     // list<float>, parallel to ids()

foreach ($result as $row) {
    // $row = ['id' => int, 'score' => float]
}

Scores

Scores are inner-product similarities computed against the quantized codes — higher is better, and rows always arrive best-first. For unit-length (normalized) embeddings, inner product is cosine similarity, so a self-match scores ≈ 1.0 and unrelated vectors hover near 0. Most embedding models emit (or recommend) normalized vectors; normalize at embed time and the scores read naturally.

Treat absolute scores as model-specific: thresholds that mean “relevant” for one embedding model won’t transfer to another. Ranking is what the index guarantees.
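
The relationship between inner product and cosine is easy to check in plain PHP. The sketch below computes exact float scores, so it shows what the index approximates rather than what it returns; dot() and normalize() are helper names made up for this example.

```php
<?php
/** Inner product of two equal-length float lists. */
function dot(array $a, array $b): float
{
    return array_sum(array_map(fn (float $x, float $y): float => $x * $y, $a, $b));
}

/** Scale a vector to unit length (assumes it is not all zeros). */
function normalize(array $v): array
{
    $norm = sqrt(dot($v, $v));
    return array_map(fn (float $x): float => $x / $norm, $v);
}

$query = normalize([0.9, 0.1]);
$docs  = [
    'same direction' => normalize([1.8, 0.2]),   // parallel to the query
    'orthogonal'     => normalize([-0.1, 0.9]),  // at right angles to it
];

foreach ($docs as $label => $doc) {
    printf("%-15s %+.4f\n", $label, dot($query, $doc));
}
// same direction  +1.0000
// orthogonal      +0.0000
```

Without the normalize() calls the first score would depend on vector length as well as direction, which is why normalizing at embed time makes scores easier to read.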

How many rows come back

min(k, eligible) — where eligible is the index size, or the (deduplicated) allowlist size when filtering. Asking an index of 3 vectors for k: 10 returns 3 rows; there are no padded or sentinel rows to skip.

Concurrency

search() never mutates the index and is safe to call from multiple threads on the same instance (relevant under ZTS/parallel; trivially true in FPM where each worker has its own objects). The first search after add() or load() pays a one-time cost building the SIMD-blocked code layout; subsequent searches read it lock-free.

Filtered search

IdMapIndex::search() takes an optional allowlist of ids:

$result = $index->search($query, k: 10, allowlist: [1001, 1007, 1042]);

Every returned id is from the allowlist; everything else is invisible to the query. This is the hybrid-retrieval primitive: let SQL, BM25, an ACL check, or a time window pick the candidates, and let the vector index rank them.

Semantics

Row count is min(k, count(allowlist)) after deduplication. An allowlist of 5 with k: 10 returns exactly 5 rows — never padded fallbacks from outside the list.
Duplicate ids in the allowlist are fine (deduplicated internally).
An empty allowlist throws InvalidArgumentException. “No candidates” and “don’t filter” are different intents — pass null (or omit the argument) to search unfiltered.
Every allowlist id must currently be in the index; unknown ids throw rather than being silently ignored, because a stale candidate list usually means your index and your database have drifted.

Performance

Filtering happens inside the SIMD kernel at 32-vector block granularity: blocks containing no allowed slot are skipped before any scoring work, and disallowed slots within scored blocks are dropped at heap-insert. A selective allowlist therefore reduces work — you don’t pay for scanning the whole index and discarding rows afterwards, and there is no recall penalty on small candidate sets (this is exact filtering, not post-hoc).

The allowlist itself costs one O(1) contains check per entry at call time plus a bitmask build proportional to the index size. For per-request filters of thousands of ids this is noise; if you find yourself passing millions of allowed ids per query, invert the problem — search unfiltered and intersect afterwards.
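
As a rough intuition for why selective filters are cheap, the toy model below counts how many 32-slot blocks contain at least one allowed slot. It is a sketch under simplifying assumptions: blocksToScore() is a made-up function, it works on internal slot positions rather than your external ids, and it models only the block-skipping step, not the scoring.

```php
<?php
/**
 * Toy model of block-level skipping: how many blocks hold at least one
 * allowed slot and therefore need scoring. Illustrative only.
 *
 * @param list<int> $allowedSlots internal positions permitted by the filter
 */
function blocksToScore(array $allowedSlots, int $blockSize = 32): int
{
    $touched = [];
    foreach ($allowedSlots as $slot) {
        $touched[intdiv($slot, $blockSize)] = true; // duplicates collapse here
    }
    return count($touched);
}

$indexSize   = 1_000_000;
$totalBlocks = intdiv($indexSize + 31, 32);

// A filter of 1,000 slots spread evenly across the index.
$allowed = range(0, $indexSize - 1, 1000);

printf("%d of %d blocks scored\n", blocksToScore($allowed), $totalBlocks);
// prints "1000 of 31250 blocks scored"
```

Allowed slots that cluster together touch even fewer blocks, while an allowlist covering most of the index approaches the cost of an unfiltered scan.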

Worked end-to-end example: filtered search over SQL ids →

Persistence

Both index classes serialize to a single file and load back with search results preserved bit-exactly — the quantized codes round-trip unchanged, so a query against a loaded index returns identical ids and identical scores.

$index->write('corpus.tvim');
$index = IdMapIndex::load('corpus.tvim');

TurboQuantIndex writes the .tv format (codes + scales + calibration).
IdMapIndex writes .tvim (.tv payload plus the id tables — removals and id assignments survive the round-trip).

The extensions are conventions, not requirements — the loader checks magic bytes, not filenames. Loading the wrong class for a file (or a corrupt/truncated file) throws IndexIOException.

Format stability

The on-disk formats are versioned by upstream turbovec (v3 as of upstream 0.6+; v2 files load transparently; v1 is refused with a rebuild hint). This extension pins upstream exactly (=0.9.0 for this release) so a given ext-turbovec version always reads and writes one known format. When a release bumps the upstream pin, the release notes state whether existing files remain loadable.

Files are portable across all supported platforms — the format is little-endian everywhere, and so are all targets the extension compiles for.

What persistence is (and isn’t)

write() is a full snapshot, not a WAL: writing after every add() rewrites the whole file. For corpora that mutate continuously, treat the index as a cache rebuilt from your system of record (see persistence patterns for the atomic-swap and rebuild idioms). load() currently reads the file into memory; mmap-backed loading for very large indexes is on the roadmap (PLAN.md).

Memory footprint

The index stores, per vector:

dim × bitWidth / 8 bytes of quantized codes
4 bytes (one float32 scale)

Plus per-index overhead that doesn’t grow with the corpus: a dim² float32 rotation matrix and small calibration tables, materialized lazily on first search. IdMapIndex adds ~24 bytes per vector for the id tables.

Worked examples (4-bit)

Corpus	Raw float32	ext-turbovec	Ratio
100K × 1024	410 MB	~52 MB	8×
1M × 768	3.1 GB	~390 MB	8×
10M × 768	31 GB	~3.9 GB	8×

The math for the first row:

100,000 × 1024 × 4 bits = 51.2 MB   codes
100,000 × 4 bytes       =  0.4 MB   scales
                          ────────
                          ~52 MB    (vs 410 MB raw)

At bitWidth: 2 the codes halve again (100K × 1024 ≈ 26 MB) — worth evaluating when a downstream reranker can absorb a small recall dip.
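
The same arithmetic can be wrapped in a small estimator for capacity planning. This is a rough sketch built only from the figures on this page: estimateIndexBytes() is a hypothetical helper, the 24 bytes per vector for IdMapIndex is the approximate number quoted above, and the small calibration tables are ignored.

```php
<?php
/**
 * Rough footprint estimate, in bytes, from the per-vector figures in this guide.
 */
function estimateIndexBytes(int $vectors, int $dim, int $bitWidth = 4, bool $idMap = false): array
{
    $codes    = intdiv($vectors * $dim * $bitWidth, 8); // quantized codes
    $scales   = $vectors * 4;                           // one float32 per vector
    $idTables = $idMap ? $vectors * 24 : 0;             // approximate figure
    $rotation = $dim * $dim * 4;                        // per index, not per vector

    return [
        'raw_float32' => $vectors * $dim * 4,
        'codes'       => $codes,
        'scales'      => $scales,
        'id_tables'   => $idTables,
        'rotation'    => $rotation,
        'total'       => $codes + $scales + $idTables + $rotation,
    ];
}

$estimate = estimateIndexBytes(100_000, 1024, bitWidth: 4);
printf(
    "codes %.1f MB, scales %.1f MB, rotation %.1f MB\n",
    $estimate['codes'] / 1e6,
    $estimate['scales'] / 1e6,
    $estimate['rotation'] / 1e6
);
// prints "codes 51.2 MB, scales 0.4 MB, rotation 4.2 MB"
```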

Transient costs to know about

First search after add()/load() builds the SIMD-blocked layout — roughly the size of the codes, briefly held alongside them during the repack.
add() ingestion processes your packed payload through rotation and quantization; peak memory during the call is a small multiple of the batch size. Feed multi-gigabyte corpora in chunks (e.g. 100K vectors per add()) rather than one giant string.
The rotation matrix is dim² × 4 bytes — 4 MB at dim 1024, 67 MB at dim 4096. Per index instance, independent of corpus size.

FPM sizing

Each worker that loads an index holds its own copy (load() reads into private memory today). For a 50 MB index across 20 workers, budget 1 GB, or front the search with a small pool of dedicated workers instead of loading in every FPM child. mmap-backed sharing is on the roadmap.

Semantic search with ext-infer

The canonical pairing: ext-infer turns text into vectors, ext-turbovec turns vectors into search. Both run inside the PHP process — the whole retrieval loop is local.

A runnable version of this recipe ships in the repo as examples/semantic-search.php.

Indexing

use Displace\Infer\Model;
use Displace\Vector\IdMapIndex;

// Any purpose-built embedding GGUF: BGE, E5, GTE, Qwen3-Embedding, ...
$model = Model::load('models/bge-small-en-v1.5-q8_0.gguf', ['embedding' => true]);

// $documents: id => text, e.g. straight out of your database.
$index = null;
foreach ($documents as $id => $text) {
    $embedding = $model->embed($text)->normalize();   // unit length -> cosine scores
    $index   ??= new IdMapIndex(dim: $embedding->dimensions(), bitWidth: 4);
    $index->addWithIds($embedding->packed(), [$id]);
}

$index->write('corpus.tvim');     // embed once, search forever

Two details that matter:

normalize() — unit-length vectors make the index’s inner-product scores equal cosine similarity, so a perfect match reads ≈ 1.0.
packed() (ext-infer ≥ 0.2) emits the packed little-endian float32 contract directly from the Rust side — the embedding’s coordinates never inflate into PHP values on their way into the index. On ext-infer 0.1, bridge with Vectors::pack($embedding->vector()) instead.

For large corpora, batch: accumulate packed() strings and ids in PHP arrays, then call addWithIds(implode('', $packed), $ids) every few thousand documents — packed vectors batch by plain string concatenation.
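
One way to structure that batching is sketched below. flushInBatches() is a hypothetical helper, not part of either extension, and the example uses a stand-in callback so it runs without them; with a real index the callback body would be the addWithIds() call described above.

```php
<?php
/**
 * Hypothetical batching helper: buffers (id, packed vector) pairs and hands
 * them to $flush as one concatenated string plus a parallel id list.
 *
 * @param iterable<int,string> $packedById id => packed float32 vector
 * @param callable(string, list<int>): void $flush
 */
function flushInBatches(iterable $packedById, int $batchSize, callable $flush): void
{
    $buffer = [];
    $ids    = [];
    foreach ($packedById as $id => $packed) {
        $buffer[] = $packed;
        $ids[]    = $id;
        if (count($ids) === $batchSize) {
            $flush(implode('', $buffer), $ids);
            $buffer = $ids = [];
        }
    }
    if ($ids !== []) {
        $flush(implode('', $buffer), $ids); // final partial batch
    }
}

// Stand-in sink that records what each flush received.
$calls = [];
$sink  = function (string $vectors, array $ids) use (&$calls): void {
    $calls[] = [strlen($vectors), $ids];
};

// Five dummy 8-dimensional vectors keyed by document id.
$vectors = [];
foreach ([10, 11, 12, 13, 14] as $id) {
    $vectors[$id] = pack('g*', ...array_fill(0, 8, 0.0));
}

flushInBatches($vectors, 2, $sink);
echo count($calls), " flushes\n"; // 3 flushes
```

Because the helper accepts any iterable, a generator that embeds documents lazily keeps peak memory at roughly one batch.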

Long documents retrieve better as chunks than as whole-file vectors. displace/ai-toolkit ships structure-aware chunkers that pair with this loop:

use Displace\AI\Toolkit\Text\RecursiveCharacterChunker;

$chunker = new RecursiveCharacterChunker(size: 2000, overlap: 200);

foreach ($documents as $id => $text) {
    foreach ($chunker->chunk($text) as $chunk) {
        // embed $chunk, mapping your own composite id => chunk position
    }
}
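
The composite id mentioned in the comment can be anything that fits a non-negative integer. One possible scheme, shown purely as an illustration, packs the document id into the upper bits and the chunk position into the low 16 bits. The function names, the 16-bit split, and the limits are all assumptions of this sketch (it also assumes 64-bit PHP); a lookup table in your database works just as well.

```php
<?php
// Hypothetical id scheme: upper bits carry the document id, the low 16 bits
// carry the chunk position. Assumes fewer than 65,536 chunks per document
// and document ids below 2^47, so the result stays a non-negative 63-bit int.
const CHUNK_BITS = 16;

function chunkId(int $documentId, int $chunkNo): int
{
    if ($chunkNo < 0 || $chunkNo >= (1 << CHUNK_BITS)
        || $documentId < 0 || $documentId >= (1 << 47)) {
        throw new \RangeException('document id or chunk position out of range');
    }
    return ($documentId << CHUNK_BITS) | $chunkNo;
}

/** @return array{0:int,1:int} [documentId, chunkNo] */
function splitChunkId(int $id): array
{
    return [$id >> CHUNK_BITS, $id & ((1 << CHUNK_BITS) - 1)];
}

$id = chunkId(documentId: 42, chunkNo: 3);
[$doc, $chunk] = splitChunkId($id);
printf("%d -> document %d, chunk %d\n", $id, $doc, $chunk);
// prints "2752515 -> document 42, chunk 3"
```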

Querying

$result = $index->search(
    $model->embed('how do I reset my password?')->normalize()->packed(),
    k: 5,
);

foreach ($result as $row) {
    printf("%.3f  %s\n", $row['score'], $documents[$row['id']]);
}

Closing the RAG loop

Feed the hits back into a chat model — also via ext-infer — and you have retrieval-augmented generation with zero services:

$context = implode("\n\n", array_map(
    fn (array $row): string => $documents[$row['id']],
    iterator_to_array($result),
));

$chat   = Model::load('models/Qwen3-4B-Q4_K_M.gguf');
$answer = $chat->chat(
    \Displace\Infer\Prompt::system("Answer using only this context:\n{$context}")
        ->withUser($question),
    maxTokens: 512,
);
echo $answer->answer();

Use one model handle for embeddings and a separate one for chat — the embedding flag is a load-time mode in ext-infer.

Decoupling with ai-contracts

Everything above names concrete classes. If your application (or a framework you’re integrating with) should not depend on a specific engine, code against the displace/ai-contracts interfaces instead and wrap the extensions in thin adapters:

use Displace\AI\Contracts\Embedder;
use Displace\AI\Contracts\VectorIndex;
use Displace\Infer\Model;
use Displace\Vector\IdMapIndex;

final class InferEmbedder implements Embedder
{
    public function __construct(private readonly Model $model) {}

    public function embed(string $text): string
    {
        return $this->model->embed($text)->normalize()->packed();
    }

    public function embedBatch(array $texts): string
    {
        return implode('', array_map($this->embed(...), $texts));
    }

    public function dimensions(): int
    {
        return $this->model->embed(' ')->dimensions();
    }
}

final class TurbovecIndex implements VectorIndex
{
    public function __construct(private readonly IdMapIndex $index) {}

    public function add(string $vectors, array $ids): void
    {
        $this->index->addWithIds($vectors, $ids);
    }

    public function search(string $query, int $k = 10, ?array $allowlist = null): array
    {
        return iterator_to_array($this->index->search($query, $k, $allowlist));
    }

    public function remove(int $id): void
    {
        $this->index->remove($id);
    }

    public function count(): int
    {
        return $this->index->count();
    }
}

Application code then takes Embedder $embedder, VectorIndex $index and never mentions either extension — the packed-float32 buffers flow from embedBatch() straight into add() with no conversion in between. Swap in an API-backed embedder or a database-backed index without touching the call sites.

Filtered search over SQL ids

The pattern: your database decides which documents are eligible (tenancy, ACLs, status, date ranges — things SQL is good at), and the vector index ranks within that candidate set. Because IdMapIndex uses your primary keys as vector ids, the handoff is just an array of ints.

Index with primary keys

use Displace\Vector\IdMapIndex;

// One-time (or scheduled) build: vectors keyed by the documents table's PK.
$index = new IdMapIndex(dim: 768, bitWidth: 4);

$stmt = $pdo->query('SELECT id, embedding FROM documents');   // embedding: pack('g*') BLOB
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $index->addWithIds($row['embedding'], [(int) $row['id']]);
}
$index->write('documents.tvim');

Query within a SQL-defined candidate set

$index = IdMapIndex::load('documents.tvim');

// Stage 1: SQL narrows to what this user may see.
$stmt = $pdo->prepare(
    "SELECT id FROM documents WHERE tenant_id = ? AND status = 'published'"
);
$stmt->execute([$tenantId]);
$allowed = array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));

if ($allowed === []) {
    return [];                       // no candidates -> no search
    // (an empty allowlist throws by design: "no candidates" must be
    //  handled by you, not silently treated as "no filter")
}

// Stage 2: dense rerank inside the candidate set.
$result = $index->search($packedQueryVector, k: 10, allowlist: $allowed);

// Stage 3: hydrate the hits, preserving rank order.
$in   = implode(',', array_fill(0, count($result), '?'));
$stmt = $pdo->prepare("SELECT * FROM documents WHERE id IN ($in)");
$stmt->execute($result->ids());
$byId = array_column($stmt->fetchAll(PDO::FETCH_ASSOC), null, 'id');

foreach ($result as $row) {
    $hit = $byId[$row['id']];
    printf("%.3f  %s\n", $row['score'], $hit['title']);
}

Keeping index and table in sync

INSERT → embed + addWithIds($packed, [$pk])
DELETE → remove($pk) (O(1))
UPDATE of content → remove($pk) then re-add with the new embedding

An unknown id in the allowlist throws — that’s drift detection, not an inconvenience. If you see it, a row was deleted from the index (or never embedded) while SQL still returns it; reconcile rather than catching and ignoring.

Filtering is exact and happens inside the SIMD kernel, so small candidate sets are cheaper than unfiltered searches — details in the filtering guide.

Persistence patterns

write() is a full snapshot. These idioms make snapshots safe and cheap in real deployments.

Atomic swap

Never write over the file a live process might be loading. Write to a temp path on the same filesystem, then rename() — atomic on POSIX:

$tmp = $path . '.tmp.' . getmypid();
$index->write($tmp);
rename($tmp, $path);          // readers see old-or-new, never partial
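
The idiom can be wrapped so that a failed write never leaves a stray temp file behind. The helper below is a hypothetical sketch: atomicSnapshot() is not part of the extension, and the example passes a stand-in writer so it runs on its own. With a real index the callback would be fn (string $tmp) => $index->write($tmp).

```php
<?php
/**
 * Hypothetical helper around the temp-file-then-rename idiom.
 * $write receives the temp path and must create the file there.
 */
function atomicSnapshot(string $path, callable $write): void
{
    // Same directory as the target, so rename() stays on one filesystem.
    $tmp = $path . '.tmp.' . getmypid();
    try {
        $write($tmp);
        if (!rename($tmp, $path)) {
            throw new \RuntimeException("could not move snapshot into place: {$path}");
        }
    } finally {
        // After a successful rename the temp path no longer exists, so this
        // only cleans up when writing or renaming failed.
        if (file_exists($tmp)) {
            unlink($tmp);
        }
    }
}

// Stand-in writer so the sketch runs without the extension.
$target = sys_get_temp_dir() . '/corpus-demo.tvim';
atomicSnapshot($target, fn (string $tmp) => file_put_contents($tmp, 'snapshot bytes'));
echo file_get_contents($target), "\n"; // snapshot bytes
unlink($target);
```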

Embed once, serve many

Embedding is the expensive step; searching is cheap. Split the lifecycle:

A builder (cron job, queue worker, deploy step) embeds documents and writes corpus.tvim.
Servers load() the snapshot at startup (or on mtime change) and only search. search() is concurrency-safe; a loaded index serving reads needs no locking.

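// In a long-lived worker:
static $index = null, $loadedAt = 0;
// PHP caches stat results; clear the entry or the mtime never appears to change.
clearstatcache(true, 'corpus.tvim');
$mtime = filemtime('corpus.tvim');
if ($index === null || $mtime > $loadedAt) {
    $index    = IdMapIndex::load('corpus.tvim');
    $loadedAt = $mtime;
}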

Incremental updates vs rebuilds

IdMapIndex handles live addWithIds()/remove() fine — persist by re-snapshotting on a schedule (the atomic swap above), not after every mutation. Two caveats that suggest periodic rebuilds from source instead of indefinite incremental mutation:

Quantization calibration is fitted on the first batch and reused for all later adds (by design — all vectors must share a coordinate system). If your embedding distribution drifts far from that first batch, a fresh rebuild re-fits calibration.
Removals are swap-removes; space is reclaimed, but a corpus that has churned 90% since its first batch deserves a clean rebuild anyway.

A rebuild is just: new index, re-add from your system of record, write to temp, swap. With stored pack('g*') blobs (no re-embedding), it runs at ingest speed — typically seconds per million vectors.

Versioning your snapshots

Name files by content, not just corpus.tvim:

$file = sprintf('corpus-%s-d%d-b%d.tvim', $modelTag, $dim, $bitWidth);

An index is only meaningful with the embedding model that produced its vectors — encode the model identity in the filename so a model upgrade can’t silently mix old and new vector spaces. Keep the previous snapshot for instant rollback.

API surface

Everything lives in the Displace\Vector namespace. Authoritative signatures (IDE-ready, with docblocks) are in stubs/vector.stubs.php.

`TurboQuantIndex`

Positional ids: the Nth vector added is id N. No removal.

Method	Notes
`__construct(int $dim, int $bitWidth = 4)`	`dim`: positive multiple of 8, ≤ 65536. `bitWidth`: 2 or 4.
`add(string $vectors): void`	Packed batch; `strlen % (4*dim) === 0`. Empty string is a no-op.
`count(): int`	Vectors currently in the index.
`search(string $query, int $k = 10): SearchResult`	Exactly one packed vector; `k >= 1`. Returns `min(k, count)` rows, best-first.
`write(string $path): void`	Versioned `.tv` snapshot; bit-exact round-trip.
`static load(string $path): TurboQuantIndex`	Loads a `.tv` snapshot; throws `IndexIOException` on a missing, corrupt, or wrong-format file.

`IdMapIndex`

Stable external ids (0..PHP_INT_MAX), O(1) removal, allowlist filtering. Same constructor, count(), write()/load() (.tvim format) as above, plus:

Method	Notes
`addWithIds(string $vectors, array $ids): void`	One non-negative int per vector. Duplicate/known ids rejected up front; never partially applies.
`search(string $query, int $k = 10, ?array $allowlist = null): SearchResult`	With an allowlist: results ⊆ allowlist, row count `min(k, count(unique allowlist))`. Empty allowlist throws; unknown ids throw.
`remove(int $id): void`	O(1). Absent id throws.

`SearchResult`

Immutable; implements Countable and IteratorAggregate. Not directly constructible.

Method	Notes
`ids(): array`	`list<int>`, best-first. Positional slots or your external ids, per the producing index.
`scores(): array`	`list<float>`, parallel to `ids()`. Inner-product similarity — higher is better; equals cosine for unit-length vectors.
`count(): int`	Row count (also `count($result)`).
`getIterator(): SearchResultIterator`	Rows iterate as `['id' => int, 'score' => float]` (also implicit in `foreach`).

SearchResultIterator is an internal support class (@internal); it implements \Iterator and exists so getIterator() satisfies the interface — obtain it via foreach, not directly.

`Vectors`

Static helpers; not instantiable.

Method	Notes
`static pack(array $floats): string`	Byte-identical to `pack('g*', ...$floats)`. Ints accepted; anything else throws.
`static unpack(string $packed, int $dim): array`	Flat `list<float>`; validates `strlen % (4*dim) === 0`. Exact `pack()` round-trip.

Exceptions

See Exceptions for the hierarchy and which methods throw what.

Exceptions

\RuntimeException
  └── Displace\Vector\VectorException
        ├── Displace\Vector\InvalidArgumentException
        │     └── Displace\Vector\DimensionMismatchException
        └── Displace\Vector\IndexIOException

Catch VectorException for “anything this extension threw”; \RuntimeException clauses you already have keep working. A dimension mismatch is an invalid argument, hence the nesting — catch at whichever precision you need.

`InvalidArgumentException`

A malformed argument that isn’t a payload-length problem:

bitWidth other than 2 or 4; dim not a positive multiple of 8 or over 65536
k < 1
NaN/Inf/|x| >= 1e16 coordinates in a vector or query (these would silently corrupt the index, so they’re rejected up front)
negative ids; ids already present; duplicate ids within one addWithIds() call; id-count ≠ vector-count
remove() of an id not in the index
an empty allowlist (pass null to search unfiltered) or an allowlist id not in the index

`DimensionMismatchException`

The packed payload disagrees with the index’s dim:

add()/addWithIds(): strlen($vectors) not a multiple of 4 * dim
search(): strlen($query) !== 4 * dim (exactly one vector)
Vectors::unpack(): length not a multiple of 4 * dim

`IndexIOException`

write()/load() filesystem and format failures: missing or unreadable path, permissions, truncated file, wrong magic bytes (e.g. loading a .tvim with TurboQuantIndex), or an incompatible format version. Messages include the offending path.

`VectorException` (directly)

Thrown directly only for refused construction of result/helper classes (new SearchResult, new Vectors, …), which are produced by the API rather than instantiated.

Guarantees

A throwing call never partially applies: a rejected batch adds nothing, ids from a failed addWithIds() are not reserved, and the index is left exactly as it was.

Compatibility matrix

	macOS arm64	Linux x86_64	Linux arm64	Windows
PHP 8.3	✅	✅	✅	—
PHP 8.4	✅	✅	✅	—
PHP 8.5	✅	✅	✅	—

Release binaries are NTS. ZTS is enabled in composer.json and the code is thread-safe by design (search is immutable-shared; objects are request-local), but no ZTS runner exercises it in CI yet.

CPU

CPU-only by design — no GPU path exists upstream.

x86_64: binaries target the x86-64-v3 baseline (Haswell 2013+ — AVX2/FMA) for general code; the AVX-512BW kernel engages automatically at runtime where available, and pre-AVX2 CPUs fall back to a scalar path with identical results.
arm64: NEON kernels, no special requirements (Apple Silicon and Graviton-class cores both qualify).
Big-endian platforms are unsupported (the packed-vector ABI is little-endian; the extension refuses to compile there).

Runtime dependencies

Platform	Needs
Linux	`libopenblas0` (`libopenblas-dev` to build from source)
macOS	nothing — Accelerate ships with the OS

Upstream pin

ext-turbovec	turbovec crate	index format
0.1.x	=0.9.0	v3 (reads v2; refuses v1 with a rebuild hint)

The pin is exact so a given extension version always speaks one known on-disk format. Index files are portable across all supported platforms.

musl / Alpine

Not in the release matrix. Source builds should work (the repo carries the crt-static opt-out musl cdylibs need; install openblas-dev), but they’re not CI-verified.

Building from source

Prerequisites

Rust — rustup honors the repo’s rust-toolchain.toml (1.89.0; the floor comes from upstream’s AVX-512 target features).
PHP 8.3+ with dev headers — php-config must be on PATH (or set PHP_CONFIG). Debian/Ubuntu: apt install php8.3-dev; macOS: Homebrew PHP includes it.
libclang — for ext-php-rs’s bindgen. Debian/Ubuntu: apt install libclang-dev; macOS: ships with Xcode CLT.
OpenBLAS (Linux only) — apt install libopenblas-dev. macOS uses the built-in Accelerate framework.

Build

make build        # debug -> target/debug/libturbovec.{so,dylib}
make release      # optimized
make clippy       # cargo clippy --all-targets -- -D warnings (CI-enforced)
make fmt          # rustfmt

The extension builds against whichever PHP php-config describes — ext-php-rs generates version-specific bindings at build time, so a binary built against 8.3 will not load into 8.4. Rebuild per minor.

Loading the dev build

php -d extension="$PWD/target/debug/libturbovec.so" -m | grep turbovec    # on macOS: libturbovec.dylib

make install (requires cargo install cargo-php) copies the release build into your PHP’s extension dir and wires up the ini.

Repo layout

src/lib.rs          module registration + compile-time platform guards
src/error.rs        VectorError + the PHP exception hierarchy
src/packed.rs       THE packed-vector decode/validation path (read this first)
src/vectors.rs      Vectors::pack / Vectors::unpack
src/result.rs       SearchResult + SearchResultIterator
src/turboquant.rs   TurboQuantIndex wrapper
src/idmap.rs        IdMapIndex wrapper
stubs/              IDE stubs — regenerate with `make stubs` after API changes
tests/phpt/         the PHPT suite (see Testing)

Design invariants worth knowing before changing code: every upstream panic condition is guarded at the PHP boundary (see the module docs in idmap.rs/packed.rs), and the packed decode path deliberately copies — the rationale is in packed.rs’s header comment.
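To make the two invariants concrete, here is a minimal sketch of a boundary decoder that returns errors instead of panicking and copies into an owned buffer. It assumes packed input is a contiguous run of little-endian f32 values; `DecodeError`, `decode_packed`, and the specific checks are hypothetical, and packed.rs remains the authority on what the real path validates and why it copies.

```rust
// Compile-time platform guard: refuse to build where the little-endian
// assumption would not hold.
#[cfg(target_endian = "big")]
compile_error!("packed vectors are little-endian; big-endian targets are unsupported");

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    ZeroDim,
    Misaligned { len: usize },
    DimMismatch { floats: usize, dim: usize },
    NonFinite { index: usize },
}

/// Decode `bytes` into an owned Vec<f32> plus a row count.
/// Every input that could trip a later assertion is rejected here.
pub fn decode_packed(bytes: &[u8], dim: usize) -> Result<(Vec<f32>, usize), DecodeError> {
    if dim == 0 {
        return Err(DecodeError::ZeroDim);
    }
    if bytes.len() % 4 != 0 {
        return Err(DecodeError::Misaligned { len: bytes.len() });
    }
    let floats = bytes.len() / 4;
    if floats == 0 || floats % dim != 0 {
        return Err(DecodeError::DimMismatch { floats, dim });
    }
    // Copy out float by float; this works at any byte alignment.
    let mut out = Vec::with_capacity(floats);
    for (index, chunk) in bytes.chunks_exact(4).enumerate() {
        let x = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        if !x.is_finite() {
            return Err(DecodeError::NonFinite { index });
        }
        out.push(x);
    }
    Ok((out, floats / dim))
}
```

In this sketch the copy means the decoded floats do not borrow from the caller's buffer, which keeps lifetimes simple at an FFI boundary.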

Testing

End-to-end coverage lives in tests/phpt/ and runs through PHP’s own test harness against the just-built extension — cargo test would only see Rust, and the things worth testing here (zval conversions, exception classes, interface wiring) only exist inside a real PHP.

make test

That builds, sanity-loads the extension, and runs the suite. The harness itself (run-tests.php) isn’t bundled with binary PHP distributions, so the first run fetches the copy matching your PHP minor from php-src (set RUN_TESTS_PHP=/path/to/run-tests.php to use your own). Failures leave .diff/.out artifacts next to the .phpt files (gitignored).

Suite conventions

No downloads, no unseeded randomness. Vectors come from the seeded LCG in tests/phpt/vectors.inc, so the same seed produces the same packed bytes on every platform. The whole suite runs in seconds.
Every public method is covered, including its failure modes; the partial-application guarantees (“a throwing call adds nothing”) are asserted, not assumed.
Each test prints `label: yes` lines that are compared against its --EXPECT-- section, so when a test fails, the diff names exactly which property broke.
New public surface lands behind a PHPT that fails before the implementation and passes after (see PLAN.md’s working agreements).
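The determinism claim in the first convention rests on a simple property: integer LCG arithmetic and IEEE 754 float conversion give the same bytes everywhere. The sketch below shows the idea in Rust. The suite's own generator lives in tests/phpt/vectors.inc and is written in PHP; the constants here are the widely published Numerical Recipes pair, chosen purely for illustration, and `Lcg` and `packed_vectors` are hypothetical names.

```rust
/// Minimal 32-bit linear congruential generator (illustrative constants).
pub struct Lcg(u32);

impl Lcg {
    pub fn new(seed: u32) -> Self {
        Lcg(seed)
    }

    /// state = state * 1664525 + 1013904223 (mod 2^32)
    pub fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        self.0
    }

    /// Float in [-1, 1) built from the top 24 bits, which f32 holds exactly.
    pub fn next_f32(&mut self) -> f32 {
        let top24 = self.next_u32() >> 8;
        (top24 as f32) / 8_388_608.0 - 1.0
    }
}

/// `n` vectors of `dim` floats, serialized as little-endian f32 bytes.
pub fn packed_vectors(seed: u32, n: usize, dim: usize) -> Vec<u8> {
    let mut rng = Lcg::new(seed);
    let mut out = Vec::with_capacity(n * dim * 4);
    for _ in 0..n * dim {
        out.extend_from_slice(&rng.next_f32().to_le_bytes());
    }
    out
}
```

Two calls with the same seed return byte-identical buffers, which is what lets a test compare against fixed expected output without shipping fixture files.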

Rust-side checks

make clippy       # -D warnings, enforced in CI
make fmt-check

There are no Rust unit tests by design: the crate is a thin binding layer, and cargo test can’t link Zend symbols anyway (the same constraint ext-infer documents). Logic worth unit-testing lives upstream in the turbovec crate, which has its own suite.

Releasing

The full, authoritative flow lives in RELEASE.md at the repo root. The short version:

Bump the version in Cargo.toml, then run cargo update --workspace.
Run the pre-flight checks: cargo fmt --all --check, cargo clippy --all-targets -- -D warnings, make test, composer validate composer.json. PHPTs must pass on macOS arm64 locally before any tag.
Commit, push, then git tag vX.Y.Z && git push --tags.
The tag triggers .github/workflows/release.yml: nine jobs (PHP 8.3/8.4/8.5 × macos-arm64/linux-x86_64/linux-arm64) build release binaries and attach PIE-named tarballs + sha256 sidecars to a draft GitHub Release.
Review the draft, write notes (remember the Linux libopenblas0 caveat and the upstream-pin/format statement), publish.

Two release-specific rules:

Stable tags are immutable on Packagist. A broken release means a new patch version, never a re-tag.
Upstream pin bumps are release events. If turbovec moves, the notes must state whether existing .tv/.tvim files remain loadable.
