v0.16.0: Persisted HNSW Graph — 5–25× Faster Cold Load

What changed

Before v0.16.0, every load() call threw away the HNSW graph and rebuilt it from scratch. For small indexes this was a rounding error. For anything over a few thousand vectors it was a cold-start tax you paid every time your process restarted — parallel rebuild at 2.7s, serial at 13.4s, on a 40k×128-dim clustered dataset.

v0.16.0 fixes this at the file-format level. save() now serialises the prebuilt HNSW link lists directly into the .feather file. On the next load() the graph is memory-mapped back in — 48ms for that same 40k dataset. No rebuild, no warmup, no latency cliff.

File format bumps to v9. All v3–v8 files load without modification.

Numbers first

Scenario	40k×128-dim cold load	Recall@10
Serial rebuild (pre-v0.16.0 baseline)	13.4s	0.988
Parallel rebuild (pre-v0.16.0 fast path)	2.7s	0.963
Persisted graph (v0.16.0)	48ms	0.988

Across typical workloads the speedup is 5–25× depending on index size and hardware. Recall is preserved at serial-build quality (0.988) because the exact graph that was built at insert time is what gets reloaded — no approximation on the way back in.
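Because the speedup depends on index size and hardware, it is worth measuring on your own files. The following is a minimal sketch of a timing helper. `time_cold_load`, `speedup` and the `loader` argument are hypothetical names, not part of the library; in practice you would pass something like `lambda: featherdb.open("embeddings.feather")`.

```python
import statistics
import time
from typing import Callable, List


def time_cold_load(loader: Callable[[], object], repeats: int = 5) -> dict:
    """Time `loader()` several times and summarise the wall-clock cost.

    `loader` is any zero-argument callable that opens the index
    (hypothetical helper, not a featherdb API).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        loader()
        samples.append(time.perf_counter() - start)
    return {
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old load time to the new load time."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s
```

Repeated calls inside one process are warm after the first, so for a true cold-start figure take a single sample from a fresh process and compare it against the same measurement on the previous release.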

File sizes increase roughly 25% due to the embedded link lists. That is the only cost.

Before and after

Before v0.16.0 — rebuild every cold start

import featherdb

# vectors, ids, and query are your own data, defined elsewhere
db = featherdb.open("embeddings.feather")
db.add(vectors, ids)
db.save()  # saves vectors + metadata, no graph

# --- next process start ---

db = featherdb.open("embeddings.feather")
# always triggers parallel HNSW rebuild: ~2.7s for 40k vecs
results = db.search(query, k=10)

After v0.16.0 — graph persisted, instant load

import featherdb

db = featherdb.open("embeddings.feather")
db.add(vectors, ids)
db.save()  # now embeds HNSW graph into the file (format v9)

# --- next process start ---

db = featherdb.open("embeddings.feather")
# graph loaded from file: ~48ms for 40k vecs, no rebuild
results = db.search(query, k=10)

Existing calls are unchanged and there are no flags to set. If the conditions for persistence are met, save() writes the graph and load() restores it.

When the graph persists vs. when it falls back

Persistence has three hard requirements. If any one is violated, save() skips the graph and load() falls back to parallel rebuild automatically.

Index holds exactly the live set. The graph must be structurally consistent with the vector set on disk.
No on-disk quantization. Quantized modalities compress the stored vectors; the link lists reference positions in the full-precision space, so they cannot be trivially round-tripped through a quantization codec.
No pending forget/purge. A forget() or purge() that has not been flushed leaves soft-deleted vectors as ghost nodes in the graph. Persisting a graph with ghosts would restore stale routing paths on reload.

In-RAM int8 modalities are explicitly supported. The int8 graph is persisted with exact round-trip fidelity — positions are stored as int8, loaded as int8, no precision shift.

Each modality gets its own persist_graph flag in the v9 format, so a multi-modal index can persist some modalities and fall back on others independently.
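The per-modality decision described above can be sketched as a small predicate. This is an illustrative model only: `ModalityState`, its field names, `can_persist` and `persist_flags` are hypothetical and do not reflect the library's internals.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModalityState:
    """Hypothetical per-modality summary; field names are illustrative."""
    pending_deletes: bool      # forget()/purge() not yet compacted away
    on_disk_quantized: bool    # stored vectors go through a quantization codec
    in_ram_int8: bool = False  # int8 held in RAM; does not block persistence


def can_persist(state: ModalityState) -> bool:
    """Return True when the graph for this modality may be written."""
    if state.pending_deletes:
        return False  # ghost nodes would be restored on reload
    if state.on_disk_quantized:
        return False  # link lists cannot round-trip through the codec
    return True


def persist_flags(modalities: Dict[str, ModalityState]) -> Dict[str, bool]:
    """Compute an independent persist_graph flag for each modality."""
    return {name: can_persist(state) for name, state in modalities.items()}


flags = persist_flags({
    "text": ModalityState(pending_deletes=False, on_disk_quantized=False),
    "image": ModalityState(pending_deletes=False, on_disk_quantized=True),
    "audio": ModalityState(False, False, in_ram_int8=True),
})
print(flags)  # {'text': True, 'image': False, 'audio': True}
```

Here the quantized "image" modality falls back to rebuild while "text" and the in-RAM int8 "audio" modality keep fast load, which is the independence the per-modality flag is meant to provide.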

The compact() workflow

After a forget() or purge() the graph is invalidated and persistence is disabled until the index is compacted. compact() rebuilds a clean graph over the surviving vectors and re-enables fast load for subsequent saves.

import featherdb

db = featherdb.open("embeddings.feather")

# remove stale entries — disables graph persistence
db.forget(old_ids)

# compact rebuilds a clean graph over surviving vectors
db.compact()

# save() now persists the clean graph again
db.save()

# --- next process start: fast load restored ---
db2 = featherdb.open("embeddings.feather")
# 48ms, not 2.7s
results = db2.search(query, k=10)

The compact/save/fast-load cycle is the recommended pattern for long-running indexes that periodically evict stale data.
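For a long-running service, the cycle can be wrapped in one helper so that a save never follows a forget() without a compact() in between. This is a sketch under the assumption that the index object exposes forget(), compact() and save() as in the example above; `evict_and_save` itself is a hypothetical name.

```python
from typing import Iterable


def evict_and_save(db, stale_ids: Iterable) -> bool:
    """Forget stale ids, compact if anything was removed, then save.

    `db` is any object exposing forget(), compact() and save().
    Returns True when compact() was called.
    """
    stale = list(stale_ids)
    if stale:
        db.forget(stale)   # invalidates the graph
        db.compact()       # rebuilds a clean graph over the survivors
    db.save()              # eligible for graph persistence again
    return bool(stale)
```

Skipping compact() when nothing was evicted avoids an unnecessary rebuild on every periodic save.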

Why recall improves

The previous fast path was parallel HNSW construction. Parallel build trades some graph quality for wall-clock speed — edges are inserted with less global visibility, producing a graph with slightly worse navigability. Recall@10 settled at 0.963.

Serial build has full global visibility at edge insertion time and reaches 0.988 recall. The persisted graph is the serial-build graph, frozen and reloaded exactly. You get serial-build quality at sub-100ms cold load time. The only way that was previously possible was to accept the 13.4s serial rebuild on every restart.
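For readers who want to reproduce the recall comparison on their own data, Recall@10 is the mean fraction of the true ten nearest neighbours that the index returns. A minimal sketch follows; `recall_at_k` is a hypothetical helper, and it assumes you already have exact neighbour ids from a brute-force search, with at least k ids per query.

```python
from typing import Sequence


def recall_at_k(approx: Sequence[Sequence[int]],
                exact: Sequence[Sequence[int]],
                k: int = 10) -> float:
    """Mean fraction of the true top-k neighbours found by the index."""
    if len(approx) != len(exact):
        raise ValueError("need one result list per query in both inputs")
    if not exact or k < 1:
        raise ValueError("need at least one query and k >= 1")
    hits = 0
    for got, truth in zip(approx, exact):
        # Order within the top-k does not matter, only membership.
        hits += len(set(got[:k]) & set(truth[:k]))
    return hits / (k * len(exact))
```

Running it once against a parallel-rebuilt index and once against a persisted one, with the same queries, gives a like-for-like comparison.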

File format v9

v9 adds a per-modality header section that records:

persist_graph: bool — whether the link list block is present for this modality
Link list block — HNSW layer-0 and upper-layer neighbour arrays, length-prefixed per node
Entry point and ef_construction metadata needed to resume future inserts
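To make "length-prefixed per node" concrete, here is a sketch of one way such a block could be encoded and decoded. The byte layout (little-endian uint32 count followed by uint32 neighbour ids) is an assumption chosen for illustration; it is not the actual v9 layout, and `encode_links` and `decode_links` are hypothetical names.

```python
import struct
from typing import List


def encode_links(links: List[List[int]]) -> bytes:
    """Pack per-node neighbour lists as: uint32 count, then uint32 ids."""
    out = bytearray()
    for neighbours in links:
        out += struct.pack("<I", len(neighbours))
        out += struct.pack(f"<{len(neighbours)}I", *neighbours)
    return bytes(out)


def decode_links(buf: bytes) -> List[List[int]]:
    """Inverse of encode_links; walks the buffer one node at a time."""
    links: List[List[int]] = []
    offset = 0
    while offset < len(buf):
        (count,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        links.append(list(struct.unpack_from(f"<{count}I", buf, offset)))
        offset += 4 * count
    return links
```

Because every node carries its own count, nodes with different numbers of neighbours (including none) sit back to back with no padding, and a reader can walk the block sequentially without a separate offset table.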

v3–v8 files are detected on open and loaded with the existing rebuild path. No migration step required. If you want fast load on an existing file, add vectors, call compact(), then save() — the next write produces a v9 file.

What 30+ test cases cover

Persist → load → search round-trip, single and multi-modality
Fallback triggers: pending forget, pending purge, on-disk quantization
compact() re-enables persistence after each invalidation path
int8 graph round-trip fidelity (no precision shift)
Backward compat: v3, v5, v7, v8 files load and rebuild correctly under v0.16.0
Recall regression: persisted graph must match or exceed parallel-rebuild recall on reference datasets
File size delta: assert embedded graph stays within 30% overhead bound

Upgrade

pip install --upgrade feather-db

No code changes required for existing projects. The first save() after upgrade on an eligible index writes v9. Cold loads from that point on skip the rebuild.

If you need the old parallel-rebuild behaviour for any reason (e.g., you are size-constrained and cannot absorb the 25% file overhead), pass persist_graph=False to save():

db.save(persist_graph=False)  # opt out; falls back to parallel rebuild on load
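Before opting out, it can help to measure the overhead on your own index. The sketch below compares the sizes of two files on disk; `graph_overhead` and `fits_budget` are hypothetical helpers. How the two files are produced is up to you, for example by saving copies of the same index with and without persist_graph=False.

```python
import os


def graph_overhead(path_without_graph: str, path_with_graph: str) -> float:
    """Fractional size increase of the graph-bearing file over the plain one."""
    base = os.path.getsize(path_without_graph)
    if base == 0:
        raise ValueError("baseline file is empty")
    return (os.path.getsize(path_with_graph) - base) / base


def fits_budget(path_without_graph: str, path_with_graph: str,
                max_overhead: float = 0.30) -> bool:
    """True when the embedded graph stays within the overhead budget."""
    return graph_overhead(path_without_graph, path_with_graph) <= max_overhead
```

The default budget of 0.30 mirrors the 30% bound asserted in the test suite; adjust it to whatever your storage constraints allow.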
