Feather DB Context Graph: Typed Edges, BFS Traversal, and context_chain

The limitation of vector search alone

Vector similarity finds semantically similar content. It's powerful but incomplete. Given a query about "the authentication bug," cosine similarity can find memory nodes whose embeddings are close to the query vector — nodes that discuss authentication, bugs, or both. But it cannot find a node titled "Fix merged in PR #114" if that fix's embedding doesn't happen to be semantically close to "authentication bug" in the embedding space.

In practice, knowledge has structure. A bug leads to a fix. A hypothesis leads to an experiment. A user preference supports a design decision. A new policy supersedes an old one. These relationships are real and load-bearing — knowing the bug without knowing the fix is half the picture. But the relationships live in the structure of knowledge, not in the embedding geometry. A flat vector search can't traverse them.

Feather DB's context graph layer adds explicit, typed relationships between memory nodes. Combined with HNSW search, it enables context_chain(): a single API that retrieves semantically similar nodes and then traverses their graph neighborhood to surface structurally related context.

The nine built-in edge types

Feather DB ships with nine built-in edge types, each encoding a distinct semantic relationship:

Edge type	Meaning	Example
`supports`	Node A provides evidence for Node B	benchmark result supports design decision
`contradicts`	Node A conflicts with Node B	new finding contradicts prior hypothesis
`refines`	Node A is a more precise version of Node B	specific preference refines general preference
`leads_to`	Node A causally or temporally leads to Node B	bug report leads to PR
`same_session`	Nodes A and B occurred in the same session	two messages in one conversation
`same_ad`	Nodes A and B are from the same ad or campaign	ad copy and its performance data
`supersedes`	Node A replaces Node B	updated preference supersedes old one
`causes`	Node A is a cause of Node B	configuration change causes bug
`resolves`	Node A resolves Node B	fix resolves bug report

Edge types are directional. db.link(a, b, "leads_to") means a leads to b, not b leads to a. BFS traversal respects direction by default, though the API supports bidirectional traversal for undirected queries.
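Directionality is easiest to see in a data structure. The following is a toy model, not Feather DB's internals: it stores each link twice, once as an outgoing edge of the source and once as an incoming edge of the target, so a traversal can follow direction by default and optionally ignore it. `EdgeType`, `EdgeStore`, and the `bidirectional` flag are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from enum import Enum


class EdgeType(str, Enum):
    """The nine built-in edge types from the table, as string values."""
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    REFINES = "refines"
    LEADS_TO = "leads_to"
    SAME_SESSION = "same_session"
    SAME_AD = "same_ad"
    SUPERSEDES = "supersedes"
    CAUSES = "causes"
    RESOLVES = "resolves"


class EdgeStore:
    """Toy directed, typed adjacency lists (an illustration, not Feather DB's storage)."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # src -> [(dst, edge_type)]
        self.in_edges = defaultdict(list)   # dst -> [(src, edge_type)]

    def link(self, src: int, dst: int, edge_type: str) -> None:
        et = EdgeType(edge_type)  # raises ValueError for an unknown type
        self.out_edges[src].append((dst, et))
        self.in_edges[dst].append((src, et))

    def neighbors(self, node: int, bidirectional: bool = False) -> list:
        """Outgoing neighbors; with bidirectional=True, incoming neighbors too."""
        result = [dst for dst, _ in self.out_edges[node]]
        if bidirectional:
            result += [src for src, _ in self.in_edges[node]]
        return result


edges = EdgeStore()
edges.link(1, 2, "leads_to")                   # node 1 leads to node 2
print(edges.neighbors(1))                      # [2]
print(edges.neighbors(2))                      # [] because direction is respected
print(edges.neighbors(2, bidirectional=True))  # [1]
```

Keeping both directions indexed makes the bidirectional case as cheap as the forward one, at the cost of storing every edge twice.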

Creating edges with db.link()

Edges are created with a single call. There is no schema to define and no migration to run — edges can be added at any time between any two existing nodes.

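import feather_db as fdb
import time

db = fdb.DB.open("coding_assistant.feather", dim=768)

# embed(text) is assumed to be your embedding function, returning a 768-dim vector.
_last_id = -1

def store(text: str, importance: float = 0.5) -> int:
    """Add a text node and return its ID (a millisecond timestamp, bumped if two calls share a millisecond)."""
    global _last_id
    nid = int(time.time() * 1000) % (2**31)
    if nid <= _last_id:  # back-to-back calls would otherwise reuse the same ID
        nid = _last_id + 1
    _last_id = nid
    meta = fdb.Metadata(importance=importance)
    meta.set_attribute("text", text)
    db.add(id=nid, vec=embed(text), meta=meta)
    return nid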

# Store a sequence of related events
bug_id = store("Async race condition in payment handler — intermittent 500 errors", importance=0.8)
diagnosis_id = store("Root cause: missing await on db.commit() in payment_service.py line 247", importance=0.85)
fix_id = store("Fix: added await to db.commit(), also added explicit transaction rollback", importance=0.8)
pr_id = store("PR #114 merged — payment handler async fix, includes regression test", importance=0.75)
test_id = store("Regression test added: test_payment_handler_concurrent_commits", importance=0.6)

# Wire up the relationships
db.link(bug_id, diagnosis_id, edge_type="leads_to")
db.link(diagnosis_id, fix_id, edge_type="leads_to")
db.link(fix_id, pr_id, edge_type="leads_to")
db.link(pr_id, test_id, edge_type="leads_to")
db.link(fix_id, bug_id, edge_type="resolves")
db.link(diagnosis_id, bug_id, edge_type="causes")  # root cause causes the bug

How context_chain() works

context_chain() is a two-phase retrieval algorithm:

Phase 1: ANN search. Run HNSW approximate nearest neighbor search to find the k nodes with highest final scores (combining semantic similarity, recency, stickiness, and importance). This is the seed set.
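How these signals are combined determines which nodes become seeds. This section does not give Feather DB's formula, so the sketch below rests on assumptions: it treats half_life (a parameter of the context_chain() call in this section) as a number of days over which recency halves, treats time_weight as the share of the score given to recency, and leaves out stickiness and importance entirely. `final_score` is a hypothetical name.

```python
def final_score(similarity: float, age_days: float,
                half_life: float = 30.0, time_weight: float = 0.3) -> float:
    """Hypothetical blend of semantic similarity and recency.

    Assumes half_life is in days and that recency decays exponentially,
    halving every half_life days. Stickiness and importance are not modeled.
    """
    recency = 0.5 ** (age_days / half_life)  # 1.0 for a brand-new node
    return (1.0 - time_weight) * similarity + time_weight * recency


# A fresh, moderately similar node vs. an older, more similar one
print(round(final_score(0.80, age_days=0), 3))   # 0.86
print(round(final_score(0.90, age_days=60), 3))  # 0.705
```

Under these assumptions the fresher node outranks the more similar but two-month-old one, which is the trade-off a recency weight is meant to control.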

Phase 2: BFS traversal. Starting from each seed node, perform breadth-first search over the context graph up to hops steps. All nodes reachable within the hop limit are added to the result set, deduplicated, and returned sorted by their traversal distance (1-hop neighbors before 2-hop, etc.).

The result is a list of nodes that are either semantically similar to the query or structurally connected to semantically similar nodes — the full context cluster.
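Stripped of scoring details, the two phases can be sketched in plain Python. This illustrates the algorithm as described, not Feather DB's implementation: a brute-force cosine scan stands in for the HNSW index, seeds are ranked on similarity alone, and `context_chain_sketch`, `vectors`, and `out_edges` are hypothetical names. Running the BFS from all seeds at once gives each node its shortest distance to any seed, which is exactly the deduplicated union of the per-seed searches.

```python
import math
from collections import deque


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def context_chain_sketch(query, vectors, out_edges, k=3, hops=2):
    """Top-k seeds by cosine similarity, then directed BFS up to `hops` steps.

    vectors:   {node_id: embedding}
    out_edges: {node_id: [neighbor_id, ...]}
    Returns [(node_id, hop_distance)] sorted by distance; seeds are at 0.
    """
    # Phase 1: exhaustive nearest-neighbor scan standing in for HNSW
    seeds = sorted(vectors, key=lambda n: cosine(query, vectors[n]), reverse=True)[:k]

    # Phase 2: multi-source BFS; the first visit to a node is its shortest path
    dist = {s: 0 for s in seeds}
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        if dist[node] == hops:
            continue  # hop limit reached: do not expand this node
        for nb in out_edges.get(node, []):
            if nb not in dist:  # deduplicate
                dist[nb] = dist[node] + 1
                frontier.append(nb)
    return sorted(dist.items(), key=lambda item: item[1])


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [0.1, 0.9]}
out_edges = {1: [3], 3: [4]}
print(context_chain_sketch([1.0, 0.0], vectors, out_edges, k=1, hops=1))
# [(1, 0), (3, 1)]
```

Node 3 is orthogonal to the query (cosine 0), so vector search alone would rank it last; it is returned only because it is linked to the seed.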

# Query: developer asks about the payment bug
query = "What was the issue with the payment handler?"
query_vec = embed(query)

# context_chain: ANN seed + BFS expansion
chain = db.context_chain(
    query_vec,
    k=3,              # top-3 ANN seeds
    hops=2,           # expand 2 hops from each seed
    half_life=30,     # recency decay half-life used in seed scoring
    time_weight=0.3   # weight of recency relative to semantic similarity
)

for node in chain:
    text = node.meta.get_attribute("text") if node.meta else ""
    print(f"[score={node.score:.3f}] {text}")

# Output (paraphrased):
# [score=0.921] Async race condition in payment handler — intermittent 500 errors
# [score=0.887] Root cause: missing await on db.commit() in payment_service.py line 247
# [score=0.834] Fix: added await to db.commit(), also added explicit transaction rollback
# [score=0.791] PR #114 merged — payment handler async fix, includes regression test
# [score=0.743] Regression test added: test_payment_handler_concurrent_commits

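The ANN phase seeded the result with the nodes closest to the query, led by the bug report. From the bug report alone, two hops along leads_to edges reach only the diagnosis and the fix; the PR and the regression test are three and four hops away. They can appear only because BFS expands from every one of the k=3 seeds and another seed sits further down the chain: if the fix is a seed, for instance, the PR is one hop from it and the test two. Either way, the PR and the test surface through graph structure, even though a merge note or a test name may not embed especially close to a question about what went wrong.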
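The hop counts can be checked directly. The sketch below rebuilds the six db.link() calls from the first example as plain (source, target, type) tuples, with readable labels in place of numeric IDs, and runs a directed BFS. `hop_distances` is a hypothetical helper, not part of Feather DB.

```python
from collections import deque

# Directed edges from the first example's db.link() calls
PAYMENT_EDGES = [
    ("bug", "diagnosis", "leads_to"),
    ("diagnosis", "fix", "leads_to"),
    ("fix", "pr", "leads_to"),
    ("pr", "test", "leads_to"),
    ("fix", "bug", "resolves"),
    ("diagnosis", "bug", "causes"),
]


def hop_distances(start, edges):
    """Directed BFS: shortest hop count from `start` to every reachable node."""
    out = {}
    for src, dst, _ in edges:
        out.setdefault(src, []).append(dst)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in out.get(node, []):
            if nb not in dist:  # the visited check also handles the bug/diagnosis cycle
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


print(hop_distances("bug", PAYMENT_EDGES))
# {'bug': 0, 'diagnosis': 1, 'fix': 2, 'pr': 3, 'test': 4}
print(hop_distances("fix", PAYMENT_EDGES))
# {'fix': 0, 'pr': 1, 'bug': 1, 'test': 2, 'diagnosis': 2}
```

With hops=2, the bug report reaches only the diagnosis and the fix, while every node in the chain is within two hops of the fix.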

Worked example: a coding assistant with linked decisions

Here is a more complete example: a coding assistant that links architectural decisions to their rationale and consequences.

import feather_db as fdb
import time

db = fdb.DB.open("arch_decisions.feather", dim=768)

def node(text, importance=0.6):
    nid = int(time.time() * 1000) % (2**31)
    time.sleep(0.001)  # ensure unique ms timestamps
    meta = fdb.Metadata(importance=importance)
    meta.set_attribute("text", text)
    db.add(id=nid, vec=embed(text), meta=meta)
    return nid

# Architecture decision record
requirement = node("System must handle 10K concurrent WebSocket connections", 0.9)
benchmark = node("asyncio handles 10K concurrent connections at 0.8ms p99 on t3.xlarge", 0.8)
decision = node("Chose asyncio + FastAPI over thread-per-connection Flask", 0.9)
consequence = node("All handlers must be async — no blocking calls allowed in request path", 0.85)
violation = node("Bug: requests.get() call in auth middleware blocks event loop", 0.8)
resolution = node("Fix: replaced requests.get() with httpx.AsyncClient in auth.py", 0.8)

# Wire the knowledge graph
db.link(requirement, benchmark, "leads_to")
db.link(benchmark, decision, "supports")
db.link(decision, consequence, "leads_to")
db.link(consequence, violation, "leads_to")
db.link(resolution, violation, "resolves")
db.link(violation, consequence, "contradicts")  # the bug conflicts with the async-only constraint

# Later: developer asks about the auth middleware bug
chain = db.context_chain(embed("auth middleware blocking"), k=2, hops=3)
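# With traversal in both directions, this surfaces: violation, consequence,
# decision, benchmark, resolution. Decision and benchmark sit upstream of the
# violation, so a forward-only walk from the seeds would stop at consequence.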
# The developer sees the full chain: why async was chosen → what that requires →
# how the bug violated it → how it was fixed
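Whether decision and benchmark show up depends on traversal direction. This plain-Python check rebuilds the six links as (source, target) pairs; edge types are left out because only direction matters here. `reachable` is a hypothetical helper, and the seeds are an assumption about what the ANN phase returns for k=2.

```python
from collections import deque

# (source, target) pairs from the db.link() calls in the architecture example
ADR_EDGES = [
    ("requirement", "benchmark"),
    ("benchmark", "decision"),
    ("decision", "consequence"),
    ("consequence", "violation"),
    ("resolution", "violation"),
    ("violation", "consequence"),
]


def reachable(seeds, edges, hops, bidirectional=False):
    """Nodes within `hops` BFS steps of any seed, optionally ignoring edge direction."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        if bidirectional:
            adj.setdefault(dst, set()).add(src)
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


seeds = ["violation", "resolution"]  # assumed top-2 ANN hits for "auth middleware blocking"
print(sorted(reachable(seeds, ADR_EDGES, hops=3)))
# ['consequence', 'resolution', 'violation']
print(sorted(reachable(seeds, ADR_EDGES, hops=3, bidirectional=True)))
# ['benchmark', 'consequence', 'decision', 'resolution', 'violation']
```

Forward-only, the walk from the violation reaches the consequence and loops back. Ignoring direction, three hops climb back through decision to benchmark, while requirement, four hops out, stays excluded.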

Filtering traversal by edge type

BFS traversal can be filtered to follow only specific edge types. This is useful when you want to trace only causal chains, or only supersession history.

# Only follow 'leads_to' and 'resolves' edges in BFS
chain = db.context_chain(
    query_vec,
    k=3,
    hops=2,
    edge_types=["leads_to", "resolves"]  # filter BFS to these types
)

# Only follow 'supersedes' edges — get the history of a fact
chain = db.context_chain(
    query_vec,
    k=1,
    hops=5,
    edge_types=["supersedes"]  # walk back through up to 5 superseded versions
)
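Edge-type filtering amounts to dropping disallowed edges before the walk begins. The sketch below is a plain-Python illustration with a hypothetical `filtered_bfs` helper and made-up node names. It follows the table's direction, where "A supersedes B" means A is the newer version, so a history walk starts from the newest node.

```python
from collections import deque


def filtered_bfs(seed, typed_edges, hops, edge_types=None):
    """Directed BFS following only edges whose type is in `edge_types` (None = all types).

    Returns nodes in visit order, starting with the seed.
    """
    allowed = None if edge_types is None else set(edge_types)
    adj = {}
    for src, dst, etype in typed_edges:
        if allowed is None or etype in allowed:
            adj.setdefault(src, []).append(dst)
    dist = {seed: 0}
    order = [seed]
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # hop limit reached
        for nb in adj.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                order.append(nb)
                queue.append(nb)
    return order


# Hypothetical history: v3 replaced v2, which replaced v1; v3 also supports a design doc
history_edges = [
    ("pref_v3", "pref_v2", "supersedes"),
    ("pref_v2", "pref_v1", "supersedes"),
    ("pref_v3", "design_doc", "supports"),
]
print(filtered_bfs("pref_v3", history_edges, hops=5, edge_types=["supersedes"]))
# ['pref_v3', 'pref_v2', 'pref_v1']
print(filtered_bfs("pref_v3", history_edges, hops=5))
# ['pref_v3', 'pref_v2', 'design_doc', 'pref_v1']
```

Restricting the walk to supersedes turns a general neighborhood expansion into a clean version history; seeding from the oldest version instead would return only that node.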

The context graph as first-class architecture

Most vector databases treat their storage as a flat collection of embedding points. Relationships — if supported at all — are implemented as metadata filters, not first-class graph traversal. This means the structural knowledge encoded in relationships is invisible to retrieval.

Feather DB treats the context graph as a first-class retrieval structure, co-equal with the HNSW index. Rather than leaving graph expansion to application code after a vector query returns, context_chain() does both in one call: vector geometry chooses the entry points, and graph structure determines what else gets surfaced.

For AI agent memory, this means the agent's knowledge is not a pile of semantically indexed fragments. It's a structured graph of facts, relationships, decisions, and consequences — and retrieval can traverse that structure to surface the right context for any query.

Install: pip install feather-db · GitHub: github.com/feather-store/feather