# HNSW + Typed Edges: Why Feather DB Combines ANN Search with Graph Traversal

> Approximate nearest-neighbor search finds similar things. Graph traversal finds connected things. A Living Context Engine needs both — and the way Feather DB fuses them into a single retrieval kernel is the real architectural unlock.

- **Category**: Technical Deep Dive
- **Read time**: 15 min read
- **Date**: May 14, 2026
- **Author**: Feather DB Engineering (Engineering Team)
- **URL**: https://getfeather.store/theory/hnsw-typed-edges-fusion

---

# HNSW + Typed Edges: Why Feather DB Combines ANN Search with Graph Traversal

*Architecture Deep Dive · Hybrid Retrieval · May 2026*

---

## The Two Halves of a Useful Query

A useful retrieval over business context answers two questions at once: *what is similar* to my query, and *what is connected* to those similar things. Pure vector search answers the first. Pure graph traversal answers the second. Neither, on its own, answers what an agent actually needs.

Consider a concrete query: "What is our competitive response to Brand X's product launch?" A vector search will surface the competitor launch document, the response brief, and the post-mortem — semantically similar, but topologically scattered. A graph traversal will surface every document linked to the brand-x node — but only if you already know which node to start from. The agent does not know the entry point. It only knows the question.

The Feather DB `context_chain` API fuses both into a single retrieval. Phase 1 is ANN search to find seed nodes. Phase 2 is bounded BFS along typed edges from each seed. The output is not a list — it is a connected subgraph of context, returned in a single call.

## Why HNSW

HNSW (Hierarchical Navigable Small World) is the standard answer for ANN search in 2026 because it dominates the recall-vs-latency frontier at the dimensions and scales context engines see (typically 768–1536 dimensions, 10k–10M nodes).
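Feather's search kernel itself isn't reproduced here, but the per-layer best-first search at the heart of HNSW can be sketched as follows. This is a minimal illustration, not Feather's API: the adjacency dict, the distance callable, and the function name are all hypothetical stand-ins.

```python
import heapq

def greedy_search(neighbors, dist, entry, query, ef_search=8):
    """Best-first search over one HNSW layer (illustrative sketch).

    neighbors: dict node -> list of neighbor node ids
    dist: callable (node, query) -> distance (smaller is closer)
    Returns up to ef_search (distance, node) pairs, nearest first.
    """
    visited = {entry}
    d0 = dist(entry, query)
    candidates = [(d0, entry)]   # min-heap: next nodes to expand
    best = [(-d0, entry)]        # max-heap (negated): current ef_search best
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the nearest unexpanded candidate is already worse
        # than the worst result we intend to keep.
        if d > -best[0][0] and len(best) >= ef_search:
            break
        for nb in neighbors.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb, query)
            if len(best) < ef_search or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef_search:
                    heapq.heappop(best)  # evict the current worst
    return sorted((-d, n) for d, n in best)
```

Raising `ef_search` widens the kept frontier, trading latency for recall. Feather's score-blended exit condition (described below) would replace the raw-distance comparison in the break test with the final blended score.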
Feather's HNSW implementation is conventional: a multi-layer, skip-list-shaped graph, greedy descent from the entry point, and a configurable `ef_search` trading recall for latency. Three Feather-specific design choices matter:

- **Slab-allocated neighbor lists.** Each layer's neighbors are a contiguous u32 slab, indexed by node ID. Insert and delete maintain free-lists so the slab never fragments.
- **Lazy distance computation.** The actual SIMD distance is computed only for nodes the search visits — not the entire candidate set.
- **Score-blended exit condition.** Search stops when the next-best candidate's *final score* (similarity ⊗ decay ⊗ importance) falls below a threshold, not when its raw similarity does. This means stale-but-similar candidates exit the search early without polluting results.

## Why Typed Edges

Edges are not generic links. Each edge has a type that says what the link *means*:

```python
{
    "derived_from": "this was created from that",
    "responds_to": "this exists in response to that",
    "contradicts": "this conflicts with that",
    "variant_of": "this is a sibling of that",
    "supersedes": "this replaces that",
    "evidence_for": "this supports that"
}
```

Typed edges are what let a graph traversal be selective. "Find all context derived from this brief" is a different query from "find all context that contradicts this brief." Without types, the graph collapses to a single homogeneous neighbor set and the traversal returns noise.

## The Fused Kernel

The retrieval pipeline is:

```text
query_vec
    │
    ▼
ANN over HNSW ──→ top-k seed nodes (scored)
    │
    ▼
typed BFS from each seed ──→ hop-1, hop-2 neighbors (scored)
    │
    ▼
score merge + dedupe ──→ final context chain
```

The output preserves which nodes were seeds versus traversed neighbors, and which edge types each neighbor was reached through.
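Assuming an in-memory adjacency map of typed, weighted edges, the second phase of this pipeline — bounded typed BFS with an edge-type filter and a traversal budget — can be sketched as below. The function name echoes the real `context_chain` call, but the signature and data shapes are hypothetical simplifications.

```python
from collections import deque

def context_chain(seeds, edges, hops=2, edge_types=None, budget=4096):
    """Bounded typed BFS from ANN seed nodes (illustrative sketch).

    seeds: dict node_id -> similarity score (output of the ANN phase)
    edges: dict node_id -> list of (edge_type, neighbor_id, weight)
    Returns {node_id: (score, role, via_edge_type)}, preserving whether
    each node was a seed or a traversed neighbor, and via which edge type.
    """
    chain = {n: (s, "seed", None) for n, s in seeds.items()}
    frontier = deque((n, s, 0) for n, s in seeds.items())
    visited = set(seeds)
    while frontier and len(visited) < budget:  # hard cap on candidates
        node, score, depth = frontier.popleft()
        if depth == hops:
            continue  # bounded traversal: never walk past h hops
        for etype, nb, weight in edges.get(node, []):
            if edge_types is not None and etype not in edge_types:
                continue  # edge-type filter: skip unrequested relationships
            if nb in visited:
                continue
            visited.add(nb)
            nb_score = score * weight  # score decays along the walk
            chain[nb] = (nb_score, "neighbor", etype)
            frontier.append((nb, nb_score, depth + 1))
    return chain
```

Because each entry records its role and the edge type it was reached through, a consumer can reconstruct the shape of the subgraph rather than a flat ranked list.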
A consumer of the API can render not just a list of relevant chunks but the *shape* of relevance — "here is the brief, here are the executions derived from it, here are the post-mortems that responded to it."

### Bounded Traversal

BFS from k seeds with average degree d at h hops scans `k * d^h` nodes in the worst case. With k=5, d=8, h=2, that's 320 nodes; with h=3, 2,560. Feather caps total traversal candidates at a configurable budget (default 4,096) and prunes by score during the walk. In practice, h=2 captures the connected subgraph that matters for ~95% of queries; h=3 is useful only for explicitly transitive questions.

### Edge Type Filters

The `context_chain` call accepts an edge type filter. Passing `edge_types=["derived_from", "responds_to"]` restricts the traversal to that subset of relationship types. This is how an agent narrows a query from "anything connected to this" to "everything that responds to this" without two round trips.

## What This Replaces

The fusion of ANN search and typed graph traversal replaces three patterns that fail in practice:

- **The two-database pattern.** A vector DB plus a graph DB plus a service that joins them. Two systems, two indexes, two failure modes, joined by a network hop.
- **The metadata-filter pattern.** A vector DB with rich metadata, filtered post hoc. Works for simple categorical filters; fails for any query that needs traversal.
- **The agent-orchestration pattern.** The LLM iteratively retrieves, reads, then retrieves more. Burns tokens and time; only as good as the agent's prompt.

The fused kernel does one round trip, in one process, with deterministic latency. The agent gets back the connected subgraph, formatted as a chain, scored end-to-end. That is the architectural unlock — not faster vector search, but a fundamentally different shape of retrieval.

---

*Part of the architecture series. Next: [Single-File Storage](/theory/single-file-storage-context-engine).*

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/hnsw-typed-edges-fusion](https://getfeather.store/theory/hnsw-typed-edges-fusion). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*