# HNSW + Typed Edges: Why Feather DB Combines ANN Search with Graph Traversal

> Approximate nearest-neighbor search finds similar things. Graph traversal finds connected things. A Living Context Engine needs both — and the way Feather DB fuses them into a single retrieval kernel is the real architectural unlock.

- **Category**: Technical Deep Dive
- **Read time**: 15 min read
- **Date**: May 14, 2026
- **Author**: Feather DB Engineering (Engineering Team)
- **URL**: https://getfeather.store/theory/hnsw-typed-edges-fusion

---

# HNSW + Typed Edges: Why Feather DB Combines ANN Search with Graph Traversal

*Architecture Deep Dive · Hybrid Retrieval · May 2026*

---

## The Two Halves of a Useful Query

A useful retrieval over business context answers two questions at once: *what is similar* to my query, and *what is connected* to those similar things. Pure vector search answers the first. Pure graph traversal answers the second. Neither, on its own, answers what an agent actually needs.

Consider a concrete query: "What is our competitive response to Brand X's product launch?" A vector search will surface the competitor launch document, the response brief, and the post-mortem — semantically similar, but topologically scattered. A graph traversal will surface every document linked to the brand-x node — but only if you already know which node to start from. The agent does not know the entry point. It only knows the question.

The Feather DB `context_chain` API fuses both into a single retrieval. Phase 1 is ANN search to find seed nodes. Phase 2 is bounded BFS along typed edges from each seed. The output is not a list — it is a connected subgraph of context, returned in a single call.

## Why HNSW

HNSW (Hierarchical Navigable Small World) is the standard answer for ANN search in 2026 because it dominates the recall-vs-latency frontier at the dimensions and scales context engines see (typically 768–1536 dimensions, 10k–10M nodes).
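Feather's search kernel itself isn't reproduced here, but the per-layer best-first search at the heart of HNSW can be sketched as follows. This is a minimal illustration, not Feather's API: the adjacency dict, the distance callable, and the function name are all hypothetical stand-ins.

```python
import heapq

def greedy_search(neighbors, dist, entry, query, ef_search=8):
    """Best-first search over one HNSW layer (illustrative sketch).

    neighbors: dict node -> list of neighbor node ids
    dist: callable (node, query) -> distance (smaller is closer)
    Returns up to ef_search (distance, node) pairs, nearest first.
    """
    visited = {entry}
    d0 = dist(entry, query)
    candidates = [(d0, entry)]   # min-heap: next nodes to expand
    best = [(-d0, entry)]        # max-heap (negated): current ef_search best
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the nearest unexpanded candidate is already worse
        # than the worst result we intend to keep.
        if d > -best[0][0] and len(best) >= ef_search:
            break
        for nb in neighbors.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb, query)
            if len(best) < ef_search or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef_search:
                    heapq.heappop(best)  # evict the current worst
    return sorted((-d, n) for d, n in best)
```

Raising `ef_search` widens the kept frontier, trading latency for recall. Feather's score-blended exit condition (described below) would replace the raw-distance comparison in the break test with the final blended score.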
Feather's HNSW implementation is conventional: a multi-layer, skip-list-shaped graph, greedy descent from the entry point, and a configurable `ef_search` trading recall for latency. Three Feather-specific design choices matter:

- **Slab-allocated neighbor lists.** Each layer's neighbors are a contiguous u32 slab, indexed by node ID. Insert and delete maintain free-lists so the slab never fragments.
- **Lazy distance computation.** The actual SIMD distance is computed only for nodes the search visits — not the entire candidate set.
- **Score-blended exit condition.** Search stops when the next-best candidate's *final score* (similarity ⊗ decay ⊗ importance) falls below a threshold, not when its raw similarity does. This means stale-but-similar candidates exit the search early without polluting results.

## Why Typed Edges

Edges are not generic links. Each edge has a type that says what the link *means*:

```python
{
    "derived_from": "this was created from that",
    "responds_to": "this exists in response to that",
    "contradicts": "this conflicts with that",
    "variant_of": "this is a sibling of that",
    "supersedes": "this replaces that",
    "evidence_for": "this supports that"
}
```

Typed edges are what let a graph traversal be selective. "Find all context derived from this brief" is a different query from "find all context that contradicts this brief." Without types, the graph collapses to a single homogeneous neighbor set and the traversal returns noise.

## The Fused Kernel

The retrieval pipeline is:

```text
query_vec
    │
    ▼
ANN over HNSW ──→ top-k seed nodes (scored)
    │
    ▼
typed BFS from each seed ──→ hop-1, hop-2 neighbors (scored)
    │
    ▼
score merge + dedupe ──→ final context chain
```

The output preserves which nodes were seeds versus traversed neighbors, and which edge types each neighbor was reached through.
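Assuming an in-memory adjacency map of typed, weighted edges, the second phase of this pipeline — bounded typed BFS with an edge-type filter and a traversal budget — can be sketched as below. The function name echoes the real `context_chain` call, but the signature and data shapes are hypothetical simplifications.

```python
from collections import deque

def context_chain(seeds, edges, hops=2, edge_types=None, budget=4096):
    """Bounded typed BFS from ANN seed nodes (illustrative sketch).

    seeds: dict node_id -> similarity score (output of the ANN phase)
    edges: dict node_id -> list of (edge_type, neighbor_id, weight)
    Returns {node_id: (score, role, via_edge_type)}, preserving whether
    each node was a seed or a traversed neighbor, and via which edge type.
    """
    chain = {n: (s, "seed", None) for n, s in seeds.items()}
    frontier = deque((n, s, 0) for n, s in seeds.items())
    visited = set(seeds)
    while frontier and len(visited) < budget:  # hard cap on candidates
        node, score, depth = frontier.popleft()
        if depth == hops:
            continue  # bounded traversal: never walk past h hops
        for etype, nb, weight in edges.get(node, []):
            if edge_types is not None and etype not in edge_types:
                continue  # edge-type filter: skip unrequested relationships
            if nb in visited:
                continue
            visited.add(nb)
            nb_score = score * weight  # score decays along the walk
            chain[nb] = (nb_score, "neighbor", etype)
            frontier.append((nb, nb_score, depth + 1))
    return chain
```

Because each entry records its role and the edge type it was reached through, a consumer can reconstruct the shape of the subgraph rather than a flat ranked list.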
A consumer of the API can render not just a list of relevant chunks but the *shape* of relevance — "here is the brief, here are the executions derived from it, here are the post-mortems that responded to it."

### Bounded Traversal

BFS from k seeds with average degree d at h hops scans `k * d^h` nodes in the worst case. With k=5, d=8, h=2, that's 320 nodes; with h=3, 2,560. Feather caps total traversal candidates at a configurable budget (default 4,096) and prunes by score during the walk. In practice, h=2 captures the connected subgraph that matters for ~95% of queries; h=3 is useful only for explicitly transitive questions.

### Edge Type Filters

The `context_chain` call accepts an edge type filter. Passing `edge_types=["derived_from", "responds_to"]` restricts the traversal to that subset of relationship types. This is how an agent narrows a query from "anything connected to this" to "everything that responds to this" without two round trips.

## What This Replaces

The fusion of ANN search and typed graph traversal replaces three patterns that fail in practice:

- **The two-database pattern.** A vector DB plus a graph DB plus a service that joins them. Two systems, two indexes, two failure modes, joined by a network hop.
- **The metadata-filter pattern.** A vector DB with rich metadata, filtered post hoc. Works for simple categorical filters; fails for any query that needs traversal.
- **The agent-orchestration pattern.** The LLM iteratively retrieves, reads, then retrieves more. Burns tokens and time; only as good as the agent's prompt.

The fused kernel does one round trip, in one process, with deterministic latency. The agent gets back the connected subgraph, formatted as a chain, scored end-to-end. That is the architectural unlock — not faster vector search, but a fundamentally different shape of retrieval.

---

*Part of the architecture series. Next: [Single-File Storage](/theory/single-file-storage-context-engine).*

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/hnsw-typed-edges-fusion](https://getfeather.store/theory/hnsw-typed-edges-fusion). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*