Back to Theory
Architecture7 min read · June 16, 2026

What Is HNSW Indexing? The Algorithm Behind Fast Vector Search

HNSW (Hierarchical Navigable Small World) is the multi-layer proximity graph that makes sub-millisecond vector search possible at scale. Here is how it works, why it beats brute force at 500K+ vectors, and how Feather DB implements it — including graph persistence, adaptive capacity, int8 quantization, and SIMD distance kernels.

A
Ashwath
Founder, Feather DB
What Is HNSW Indexing? The Algorithm Behind Fast Vector Search

Architecture · Feather DB v0.16.0 · June 2026


The Problem: Brute Force Does Not Scale

Every vector database has the same core job: given a query vector, find the k most similar vectors in the index. The naive approach — compare the query against every stored vector — is called flat search or exhaustive search. It is exact. It is also O(n).

At 10,000 vectors and 768 dimensions, flat search is fine. At 500,000 vectors it starts to hurt. At 10 million it is a non-starter for anything requiring sub-second latency.

HNSW solves this. It trades a small, controlled accuracy loss for search complexity that is O(log n) — search time grows logarithmically with index size, not linearly. At 500K vectors, Feather DB achieves a p50 latency of 0.19ms with 97.2% recall@10. That is what HNSW makes possible.


What HNSW Is

HNSW stands for Hierarchical Navigable Small World. It is an approximate nearest neighbor (ANN) index built as a multi-layer proximity graph. The algorithm was introduced by Malkov and Yashunin (2016) and has since become the de facto ANN algorithm in production vector databases.

The core idea comes from two observations:

  1. Small-world graphs — a type of graph where any node can reach any other node in a small number of hops — allow fast greedy search.
  2. Layering — by building multiple copies of the graph at different densities, you can navigate from a coarse high-level view to fine-grained precision, the same way a map zooms from continent to street.

Put them together and you get a data structure where search starts at the top of a sparse hierarchy and drills down to an exact neighborhood.


The Layer Structure

An HNSW index is a stack of layers, numbered from 0 (bottom) to L (top). Each layer is a graph of nodes connected by edges to their nearest neighbors.

Top Layers — Coarse Navigation

Higher layers contain fewer nodes. A given vector is assigned to layer l with probability that decreases exponentially with l, so most vectors exist only in layer 0 and only a handful exist in the top layers. Search starts here. Because the top layers are sparse, greedy traversal covers large distances in vector space quickly — each hop skips over many nearby vectors to land closer to the query region.

Bottom Layer — Full Precision

Layer 0 contains every vector in the index, each connected to its M nearest neighbors. Once the top-layer search has navigated into the right neighborhood, search moves to layer 0 and finds the exact local nearest neighbors among a small candidate set.

The result is fast coarse navigation followed by precise local refinement — all within a single traversal.

A rough mental model:

Layer 3  ●                     ●             (2 nodes, long-range hops)
Layer 2  ●         ●     ●     ●             (4 nodes, medium hops)
Layer 1  ● ●   ● ● ●   ● ●   ●●             (many nodes, short hops)
Layer 0  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●   (all nodes, full precision)

Query enters at the top, exits at the bottom.


Key Parameters

Three parameters control the HNSW index. You set two at build time and one at query time.

M — Connections Per Node

Each node in the graph is connected to at most M neighbors per layer (2×M in layer 0). Higher M improves recall and reduces the number of hops needed to reach a query's nearest neighbors, at the cost of more memory and slower inserts.

Feather DB defaults to M=16. This is the sweet spot for most embedding dimensions (384–1536). For very high-dimensional vectors (3072+), M=32 may improve recall without much latency cost.

ef_construction — Build-Time Beam Width

ef_construction controls how many candidates the algorithm evaluates when inserting a new node. Larger values produce a denser, higher-quality graph that is more accurate at query time, but slow down index building.

Feather DB defaults to ef_construction=200. Halving it roughly halves insert time at a ~1–2pp recall cost. Doubling it past 400 yields diminishing returns.

ef_search — Query-Time Beam Width

ef_search is the only parameter you can tune without rebuilding the index. It controls how many candidates the greedy search explores before returning results. ef_search must be ≥ k (the number of results you want). Higher values improve recall at the cost of latency — you are doing more work per query.

The 97.2% recall@10 result at 0.19ms (p50) is measured with ef_search=64 on 500K vectors.

ParameterFeather DB defaultEffect of increasing
M16Higher recall, more memory, slower inserts
ef_construction200Better graph quality, slower builds
ef_search64Higher recall, higher latency (tunable per query)

Recall vs Speed: The Real Tradeoff

HNSW is an approximate nearest neighbor algorithm. It does not guarantee returning the exact top-k nearest neighbors. It returns a very good approximation of them.

Recall@10 = 97.2% means that for 97.2% of queries, all 10 results returned by HNSW are within the true 10 nearest neighbors. The 2.8% of "missed" neighbors are typically at ranks 9–10 and are often semantically equivalent to what was returned. In retrieval-augmented generation, the difference is imperceptible.

The alternative — flat search — achieves 100% recall by doing O(n) distance computations. At 500K vectors with 768-dimensional embeddings, flat search takes approximately 50–100ms per query. HNSW returns results in 0.19ms. The 2.8% recall cost is the price of a 250–500× speedup.

You can recover recall by increasing ef_search. At ef_search=128, recall@10 approaches 99% at roughly 0.35ms — still 150× faster than brute force.


How Feather DB Implements HNSW

Graph Persistence in v0.16.0

Before v0.16.0, opening a .feather file meant deserializing vectors and rebuilding the HNSW graph from scratch on every process start. On a 500K-vector index this took 2.7 seconds — acceptable for long-running servers, painful for serverless functions and short-lived agents.

v0.16.0 introduced serialized graph persistence. The HNSW adjacency structure — every node's neighbor lists across every layer — is now written directly into the .feather binary format alongside the vectors. On load, the graph is memory-mapped and ready without any rebuild step.

Cold load time: 48ms. Rebuild time: 2.7s. That is a 56× improvement on startup latency.

import feather_db

# v0.16.0: graph is loaded from disk, not rebuilt
db = feather_db.DB.open("index.feather", dim=768)
# ready in 48ms — not 2.7s
results = db.search(query_vec, k=10)

Adaptive Index Capacity

HNSW indexes pre-allocate memory for a fixed maximum number of nodes. Allocating for 1 million nodes at startup when you have 1,000 nodes wastes memory — the index structure is proportional to capacity, not occupancy.

Feather DB uses adaptive capacity growth. The index starts with 4,096 slots. When it fills, capacity doubles. This follows the same growth strategy as dynamic arrays and avoids the overhead of large pre-allocations for small indexes.

The impact: a 10,000-vector index uses 7.7× less memory than one pre-allocated for 100,000 nodes. Memory is only committed as the index grows into it.

int8 Quantization

By default, Feather DB stores vectors as 32-bit floats. int8 quantization compresses each dimension to an 8-bit integer, reducing per-vector storage from 4 bytes/dim to 1 byte/dim — a 1.7× reduction in index memory footprint (accounting for the quantization scale factor stored per vector).

HNSW works directly with int8 vectors. Distance computation uses integer arithmetic, which is faster on most hardware than floating-point. The recall cost is typically <1pp at the same ef_search.

db = feather_db.DB.open("index.feather", dim=768, quantize="int8")
db.add(id=1, vec=float_vec)   # quantized automatically on insert
results = db.search(query_vec, k=10)  # query quantized at search time

SSE/AVX SIMD Distance Kernels

Every HNSW search involves hundreds of L2 distance computations between the query vector and candidate neighbors. On x86, Feather DB dispatches SIMD-accelerated kernels for these computations at runtime.

The dispatch sequence:

  1. Check CPU feature flags at startup (SSE4.2 → AVX2 → AVX-512)
  2. Select the widest available kernel
  3. All distance computations route through it for the life of the process

AVX-512 processes 16 floats per instruction cycle versus 1 float in scalar code. On an AVX-512-capable machine (most modern x86 server CPUs), this produces a 4–8× throughput improvement for distance computation alone. The 0.19ms p50 latency is measured on AVX2-capable hardware.


FAQ

HNSW vs Flat Search — When to Use Each

Use flat search when your index is under ~50K vectors, you need exact results (not approximate), or your query budget is high and you prioritize simplicity. Flat search is exact, always correct, and has zero build cost.

Use HNSW when your index exceeds 50K vectors, latency matters, or you are operating at any scale where O(n) is a problem. At 500K vectors, HNSW is 250–500× faster than flat search with only a 2.8% recall cost.

Feather DB automatically uses HNSW for all indexes. If you need exact search on a small index, set ef_search very high (equal to index size) to force exhaustive candidate evaluation — though at that point flat search is simpler.

HNSW vs IVF (Inverted File Index)

IVF partitions the vector space into clusters (Voronoi cells) using k-means. Search probes the nearest clusters and scans vectors inside them. It requires a separate training step on representative data before the index is usable, and recall drops if the query's nearest neighbors span multiple clusters that were not probed.

HNSW needs no training step. Vectors can be inserted one at a time in any order, which makes it natural for dynamic, online-updated indexes like AI agent memory. IVF is more memory-efficient at very large scales (tens of millions of vectors) where HNSW's graph adjacency lists dominate memory. For the 10K–5M range that most production AI applications occupy, HNSW is the better default.

Why Is 97.2% Recall@10 Considered Good?

Because the alternative — achieving 100% recall — requires flat search at 250–500× the latency cost. And because the "missed" results are rarely meaningfully different from what was returned.

In semantic search and RAG, vectors at rank 10 and rank 11 are typically nearly equivalent in meaning. The LLM reading the retrieved context cannot tell the difference. What it can tell is the difference between a 0.19ms retrieval and a 100ms retrieval — the latter makes conversational agents feel sluggish.

97.2% recall@10 at 0.19ms p50 is a practical Pareto optimum for production AI workloads. You can push recall higher with a larger ef_search, or push latency lower with a smaller one, depending on your workload's sensitivity.

Does HNSW Work With Different Embedding Dimensions?

Yes. The graph structure is dimension-agnostic — it connects nodes based on their distances, not their specific coordinates. Feather DB supports dimensions from 64 (compact models like SPLADE sparse approximations) to 3072 (OpenAI text-embedding-3-large) and beyond. The dim parameter is set once at index creation and is fixed for the life of the file.

Higher dimensions generally need a slightly larger M to maintain recall, because the curse of dimensionality makes nearest neighbors less distinguishable. In practice, the default M=16 works well up to 1536 dimensions.

Can I Change HNSW Parameters After Building the Index?

M and ef_construction are fixed at build time. Changing them requires rebuilding the index. ef_search can be changed per-query without any rebuild. If you find your recall is lower than expected, try increasing ef_search first — it is free to tune at query time.


Summary

HNSW is the reason sub-millisecond vector search is possible at production scale. The algorithm's multi-layer structure provides O(log n) search complexity: sparse top layers for fast long-range navigation, a dense bottom layer for precise local lookup.

Feather DB's implementation adds four things on top of the baseline algorithm:

  • Graph persistence — 48ms cold load vs 2.7s rebuild, written directly into the .feather binary
  • Adaptive capacity — starts at 4,096 slots, grows on demand, 7.7× memory savings for small indexes
  • int8 quantization — 1.7× memory reduction with <1pp recall cost
  • SIMD dispatch — SSE/AVX L2 kernels on x86, selected automatically at startup

The result: 97.2% recall@10 at 0.19ms (p50) on 500K vectors, in an embedded database with no server, no infrastructure, and a single .feather file.

pip install feather-db
]]>