# What Is HNSW Indexing? The Algorithm Behind Fast Vector Search

> HNSW (Hierarchical Navigable Small World) is the multi-layer proximity graph that makes sub-millisecond vector search possible at scale. Here is how it works, why it beats brute force at 500K+ vectors, and how Feather DB implements it — including graph persistence, adaptive capacity, int8 quantization, and SIMD distance kernels.

- **Category**: Architecture
- **Read time**: 7 min read
- **Date**: June 16, 2026
- **Author**: Ashwath (Founder, Feather DB)
- **URL**: https://getfeather.store/theory/what-is-hnsw-graph-indexing

---

ParameterFeather DB defaultEffect of increasing
  
  
    M16Higher recall, more memory, slower inserts
    ef_construction200Better graph quality, slower builds
    ef_search64Higher recall, higher latency (tunable per query)
  

---

## Recall vs Speed: The Real Tradeoff

HNSW is an *approximate* nearest neighbor algorithm. It does not guarantee returning the exact top-*k* nearest neighbors. It returns a very good approximation of them.

Recall@10 = 97.2% means that for 97.2% of queries, all 10 results returned by HNSW are within the true 10 nearest neighbors. The 2.8% of "missed" neighbors are typically at ranks 9–10 and are often semantically equivalent to what was returned. In retrieval-augmented generation, the difference is imperceptible.

The alternative — flat search — achieves 100% recall by doing O(n) distance computations. At 500K vectors with 768-dimensional embeddings, flat search takes approximately 50–100ms per query. HNSW returns results in 0.19ms. The 2.8% recall cost is the price of a 250–500× speedup.

You can recover recall by increasing `ef_search`. At `ef_search=128`, recall@10 approaches 99% at roughly 0.35ms — still 150× faster than brute force.

---

## How Feather DB Implements HNSW

### Graph Persistence in v0.16.0

Before v0.16.0, opening a `.feather` file meant deserializing vectors and rebuilding the HNSW graph from scratch on every process start. On a 500K-vector index this took 2.7 seconds — acceptable for long-running servers, painful for serverless functions and short-lived agents.

v0.16.0 introduced serialized graph persistence. The HNSW adjacency structure — every node's neighbor lists across every layer — is now written directly into the `.feather` binary format alongside the vectors. On load, the graph is memory-mapped and ready without any rebuild step.

Cold load time: **48ms**. Rebuild time: **2.7s**. That is a 56× improvement on startup latency.

```python
import feather_db

# v0.16.0: graph is loaded from disk, not rebuilt
db = feather_db.DB.open("index.feather", dim=768)
# ready in 48ms — not 2.7s
results = db.search(query_vec, k=10)

```

### Adaptive Index Capacity

HNSW indexes pre-allocate memory for a fixed maximum number of nodes. Allocating for 1 million nodes at startup when you have 1,000 nodes wastes memory — the index structure is proportional to capacity, not occupancy.

Feather DB uses adaptive capacity growth. The index starts with 4,096 slots. When it fills, capacity doubles. This follows the same growth strategy as dynamic arrays and avoids the overhead of large pre-allocations for small indexes.

The impact: a 10,000-vector index uses **7.7× less memory** than one pre-allocated for 100,000 nodes. Memory is only committed as the index grows into it.

### int8 Quantization

By default, Feather DB stores vectors as 32-bit floats. int8 quantization compresses each dimension to an 8-bit integer, reducing per-vector storage from 4 bytes/dim to 1 byte/dim — a **1.7× reduction in index memory footprint** (accounting for the quantization scale factor stored per vector).

HNSW works directly with int8 vectors. Distance computation uses integer arithmetic, which is faster on most hardware than floating-point. The recall cost is typically <1pp at the same `ef_search`.

```python
db = feather_db.DB.open("index.feather", dim=768, quantize="int8")
db.add(id=1, vec=float_vec)   # quantized automatically on insert
results = db.search(query_vec, k=10)  # query quantized at search time

```

### SSE/AVX SIMD Distance Kernels

Every HNSW search involves hundreds of L2 distance computations between the query vector and candidate neighbors. On x86, Feather DB dispatches SIMD-accelerated kernels for these computations at runtime.

The dispatch sequence:

  - Check CPU feature flags at startup (SSE4.2 → AVX2 → AVX-512)

  - Select the widest available kernel

  - All distance computations route through it for the life of the process

AVX-512 processes 16 floats per instruction cycle versus 1 float in scalar code. On an AVX-512-capable machine (most modern x86 server CPUs), this produces a 4–8× throughput improvement for distance computation alone. The 0.19ms p50 latency is measured on AVX2-capable hardware.

---

## FAQ

### HNSW vs Flat Search — When to Use Each

**Use flat search** when your index is under ~50K vectors, you need exact results (not approximate), or your query budget is high and you prioritize simplicity. Flat search is exact, always correct, and has zero build cost.

**Use HNSW** when your index exceeds 50K vectors, latency matters, or you are operating at any scale where O(n) is a problem. At 500K vectors, HNSW is 250–500× faster than flat search with only a 2.8% recall cost.

Feather DB automatically uses HNSW for all indexes. If you need exact search on a small index, set `ef_search` very high (equal to index size) to force exhaustive candidate evaluation — though at that point flat search is simpler.

### HNSW vs IVF (Inverted File Index)

IVF partitions the vector space into clusters (Voronoi cells) using k-means. Search probes the nearest clusters and scans vectors inside them. It requires a separate training step on representative data before the index is usable, and recall drops if the query's nearest neighbors span multiple clusters that were not probed.

HNSW needs no training step. Vectors can be inserted one at a time in any order, which makes it natural for dynamic, online-updated indexes like AI agent memory. IVF is more memory-efficient at very large scales (tens of millions of vectors) where HNSW's graph adjacency lists dominate memory. For the 10K–5M range that most production AI applications occupy, HNSW is the better default.

### Why Is 97.2% Recall@10 Considered Good?

Because the alternative — achieving 100% recall — requires flat search at 250–500× the latency cost. And because the "missed" results are rarely meaningfully different from what was returned.

In semantic search and RAG, vectors at rank 10 and rank 11 are typically nearly equivalent in meaning. The LLM reading the retrieved context cannot tell the difference. What it can tell is the difference between a 0.19ms retrieval and a 100ms retrieval — the latter makes conversational agents feel sluggish.

97.2% recall@10 at 0.19ms p50 is a practical Pareto optimum for production AI workloads. You can push recall higher with a larger `ef_search`, or push latency lower with a smaller one, depending on your workload's sensitivity.

### Does HNSW Work With Different Embedding Dimensions?

Yes. The graph structure is dimension-agnostic — it connects nodes based on their distances, not their specific coordinates. Feather DB supports dimensions from 64 (compact models like SPLADE sparse approximations) to 3072 (OpenAI text-embedding-3-large) and beyond. The `dim` parameter is set once at index creation and is fixed for the life of the file.

Higher dimensions generally need a slightly larger `M` to maintain recall, because the curse of dimensionality makes nearest neighbors less distinguishable. In practice, the default `M=16` works well up to 1536 dimensions.

### Can I Change HNSW Parameters After Building the Index?

`M` and `ef_construction` are fixed at build time. Changing them requires rebuilding the index. `ef_search` can be changed per-query without any rebuild. If you find your recall is lower than expected, try increasing `ef_search` first — it is free to tune at query time.

---

## Summary

HNSW is the reason sub-millisecond vector search is possible at production scale. The algorithm's multi-layer structure provides O(log n) search complexity: sparse top layers for fast long-range navigation, a dense bottom layer for precise local lookup.

Feather DB's implementation adds four things on top of the baseline algorithm:

  - **Graph persistence** — 48ms cold load vs 2.7s rebuild, written directly into the `.feather` binary

  - **Adaptive capacity** — starts at 4,096 slots, grows on demand, 7.7× memory savings for small indexes

  - **int8 quantization** — 1.7× memory reduction with <1pp recall cost

  - **SIMD dispatch** — SSE/AVX L2 kernels on x86, selected automatically at startup

The result: 97.2% recall@10 at 0.19ms (p50) on 500K vectors, in an embedded database with no server, no infrastructure, and a single `.feather` file.

```bash
pip install feather-db

```

]]>

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/what-is-hnsw-graph-indexing](https://getfeather.store/theory/what-is-hnsw-graph-indexing). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*