Architecture · 7 min read · June 16, 2026

Context Graphs in Feather DB: Typed Edges for Knowledge Networks

Vectors tell you what is similar. Graphs tell you what is related. Feather DB's context graph layer adds typed, weighted, directional edges to vector nodes — so your AI agent doesn't just find nearby memories, it understands why they're connected.

Ashwath
Founder, Feather DB

Architecture Deep Dive · Feather DB v0.7.0 · June 2026


The problem with "similar"

Vector similarity is a remarkable thing. You embed a sentence, find its nearest neighbors in high-dimensional space, and the results feel almost like understanding. Conceptually adjacent ideas cluster together. The model figured it out.

But similarity is not relationship. And that gap matters more than it looks.

Consider a campaign memory with three nodes: a creative brief, a performance report, and a strategy document. The brief and the report are about the same campaign — they should be linked. The strategy document is semantically similar to the brief — but it contradicts the brief's core hypothesis. That distinction, "same campaign" vs. "contradicts," is exactly what a vector index cannot tell you.

Vectors encode what something is about. Edges encode how things relate.

That's what Feather DB's context graph layer exists to solve.


What is a context graph?

In Feather DB, every vector node can participate in a typed, weighted, directional graph. You store a vector as normal. Then you declare relationships between nodes using db.link().

import feather_db

db = feather_db.DB.open("campaign.feather", dim=768)

# Add nodes
db.add(id=1001, vec=embed("Q2 creative brief: festive hooks, Rs 999 offer"), meta=brief_meta)
db.add(id=1002, vec=embed("Q2 strategy: lead with scarcity, price anchoring"), meta=strategy_meta)
db.add(id=1003, vec=embed("Q2 performance report: CTR 3.2%, ROAS 4.1"), meta=report_meta)

# Declare relationships
db.link(from_id=1001, to_id=1002, rel_type="related_to", weight=0.9)
db.link(from_id=1001, to_id=1003, rel_type="followed_by", weight=1.0)
db.link(from_id=1002, to_id=1001, rel_type="contradicts", weight=0.6)

Those three calls add typed edges to the binary .feather file. No separate graph database. No sync job. The edges live in Metadata.edges alongside the vector, and a reverse index is rebuilt in memory on db.load().
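To make the reverse index concrete, here is a minimal sketch in plain Python of how incoming edges could be derived from stored outgoing edges at load time. The `Edge` tuple and `build_reverse_index` are hypothetical names used for illustration; this is not Feather DB's internal implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical in-memory form of one stored outgoing edge."""
    target_id: int
    rel_type: str
    weight: float


def build_reverse_index(forward):
    """Map each target id to the (source_id, rel_type, weight) triples pointing at it."""
    reverse = defaultdict(list)
    for source_id, edges in forward.items():
        for edge in edges:
            reverse[edge.target_id].append((source_id, edge.rel_type, edge.weight))
    return dict(reverse)


# Outgoing edges as declared by the three db.link() calls above.
forward_edges = {
    1001: [Edge(1002, "related_to", 0.9), Edge(1003, "followed_by", 1.0)],
    1002: [Edge(1001, "contradicts", 0.6)],
}

reverse_index = build_reverse_index(forward_edges)
print(reverse_index[1001])  # incoming edges of the brief (node 1001)
```

With a structure like this, a traversal can look up the incoming edges of a node directly instead of scanning every stored edge list.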


Edge types in Feather DB

Feather DB ships with nine predefined relationship types — plus support for arbitrary user-defined strings.

| Edge type | Meaning | Example |
|---|---|---|
| same_ad | Same piece of creative in different modalities | Text brief → image for that ad |
| related_to | Thematically related, non-hierarchical | Brief → strategy document |
| followed_by | Temporal sequence: B came after A | Strategy brief → performance report |
| contradicts | The target node negates or rebuts the source | Competitor claim → internal rebuttal |
| entity | Both nodes share a named entity (person, brand, product) | Two ads referencing the same product SKU |
| derived_from | Source was produced from target | Ad copy → source transcript |
| supports | Source provides evidence for target | Benchmark result → product claim |
| precedes | Source is a logical precondition of target | User intent node → decision node |
| references | Source explicitly cites target | Campaign post → research paper |

Any string not in that list is accepted as a user-defined type. Edge types are stored as strings in the binary format — "closed_deal", "user_approved", "blocked_by" all work exactly the same way.

# User-defined edge type — fully supported
db.link(from_id=5001, to_id=5002, rel_type="user_approved", weight=1.0)
db.link(from_id=3001, to_id=3002, rel_type="closed_deal",   weight=0.95)

Edge weight

Every edge carries a weight in [0.0, 1.0]. Weight controls how strongly the relationship holds — and it feeds directly into the BFS traversal score at each hop.

A same_ad edge between a creative brief and its image probably deserves weight=1.0 — they are the same thing in different modalities. A related_to edge between a strategy document and a loosely-connected research note might be weight=0.3.

The traversal score for a node reached at a given hop is:

hop_score = (1 / (1 + hop)) × importance × stickiness × edge_weight

So, with importance and stickiness held equal, a node two hops away reached via a weak edge scores lower than a node one hop away reached via a strong edge. Weight is how you encode confidence in the relationship.
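As a worked example, the formula can be written as a small Python function. This is only a sketch of the arithmetic: `hop_score` is a hypothetical helper, `stickiness` is treated as a given per-node multiplier because the article does not define it further, and only the weight of the edge used to reach the node is applied, exactly as the formula is written.

```python
def hop_score(hop, importance, stickiness, edge_weight):
    """Score of a node reached at `hop` via an edge of weight `edge_weight`."""
    return (1.0 / (1.0 + hop)) * importance * stickiness * edge_weight


# Same node properties, different paths (importance and stickiness fixed at 1.0).
near_strong = hop_score(hop=1, importance=1.0, stickiness=1.0, edge_weight=0.9)
far_weak = hop_score(hop=2, importance=1.0, stickiness=1.0, edge_weight=0.3)

print(f"one hop, strong edge: {near_strong:.3f}")
print(f"two hops, weak edge:  {far_weak:.3f}")
```

The one-hop node comes out at 1/2 × 0.9 and the two-hop node at 1/3 × 0.3, so both the hop decay and the edge weight push the distant, weakly linked node down the ranking.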


BFS traversal: db.context_chain()

This is where the graph becomes operationally useful. context_chain runs in two phases.

Phase 1 — Vector search (hop=0): standard HNSW ANN search returns the k nearest vectors to your query. These are your seed nodes.

Phase 2 — BFS expansion (hops 1..depth): from each seed, traverse outgoing and incoming typed edges. Each reached node is scored by its hop distance and edge weight. Already-visited nodes are skipped.

chain = db.context_chain(
    start_vec=embed("Q2 campaign performance"),
    k=3,       # how many seeds from vector search
    hops=2,    # how many BFS hops to expand
)

for node in sorted(chain.nodes, key=lambda n: (n.hop, -n.score)):
    print(f"hop={node.hop}  score={node.score:.4f}  {node.metadata.content[:80]}")

The output tells you not just what was retrieved but why: the hop at which each node was reached and the score the traversal assigned to it.

You can also start from a known node ID rather than a query vector:

# Start from a specific node, traverse outward
chain = db.context_chain(start_id=1001, hops=2)

This is useful when you already know which memory you're anchored to and want to pull the surrounding knowledge graph into context.
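The expansion phase described above can be sketched in plain Python. This is an illustrative model, not Feather DB's implementation: `expand` is a hypothetical function, the seeds are passed in directly instead of coming from an HNSW search, seeds receive a placeholder score of 1.0, and importance and stickiness are taken as 1.0 so that only hop distance and edge weight affect the score.

```python
from collections import deque


def expand(seeds, edges, hops):
    """Breadth-first expansion from seed ids over outgoing and incoming edges.

    `edges` is a list of (from_id, to_id, rel_type, weight) tuples.
    Returns {node_id: (hop, score)}.
    """
    neighbours = {}
    for src, dst, _rel, weight in edges:
        neighbours.setdefault(src, []).append((dst, weight))  # outgoing direction
        neighbours.setdefault(dst, []).append((src, weight))  # incoming direction

    reached = {seed: (0, 1.0) for seed in seeds}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, hop = queue.popleft()
        if hop == hops:
            continue  # hop limit reached, do not expand further
        for other, weight in neighbours.get(node, []):
            if other in reached:
                continue  # already-visited nodes are skipped
            next_hop = hop + 1
            reached[other] = (next_hop, weight / (1 + next_hop))
            queue.append((other, next_hop))
    return reached


edges = [
    (1001, 1002, "related_to", 0.9),
    (1001, 1003, "followed_by", 1.0),
    (1002, 1001, "contradicts", 0.6),
]
result = expand(seeds=[1003], edges=edges, hops=2)
for node_id, (hop, score) in sorted(result.items(), key=lambda item: item[1][0]):
    print(f"hop={hop}  score={score:.2f}  id={node_id}")
```

Starting from the performance report (1003), the brief (1001) is reached at hop 1 through an incoming edge, and the strategy (1002) at hop 2. A node is recorded the first time it is reached, so a second edge to the same node does not change its hop or score in this sketch.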


Multimodal edges: linking text to image

The same_ad edge type was designed specifically for multimodal knowledge graphs. When you ingest a text creative brief and its image counterpart as separate nodes, an edge makes the relationship explicit — and traversable.

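import os

import feather_db
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

# Request 768-dimensional vectors so they match the dim=768 index opened below.
EMBED_CONFIG = types.EmbedContentConfig(output_dimensionality=768)

def embed_text(text):
    res = client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents=text,
        config=EMBED_CONFIG,
    )
    return res.embeddings[0].values

def embed_image(description):
    # Embeds a text description of the image, not the pixels,
    # so both modalities land in the same embedding space.
    res = client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents=description,
        config=EMBED_CONFIG,
    )
    return res.embeddings[0].values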

db = feather_db.DB.open("multimodal.feather", dim=768)

# Text node: the creative brief
text_meta = feather_db.Metadata()
text_meta.set_attribute("modality", "text")
text_meta.set_attribute("entity_type", "ad_creative")
text_meta.content = "Festive hook, Rs 999 CTA, female lead. Diwali campaign."
db.add(id=1001, vec=embed_text(text_meta.content), meta=text_meta)

# Image node: the actual visual
image_meta = feather_db.Metadata()
image_meta.set_attribute("modality", "image")
image_meta.set_attribute("entity_type", "ad_creative")
image_meta.content = "Diwali ad image: warm lighting, product hero shot, Rs 999 overlay."
db.add(id=2001, vec=embed_image(image_meta.content), meta=image_meta)

# Link them — same piece of creative, different modalities
db.link(from_id=2001, to_id=1001, rel_type="same_ad", weight=1.0)
db.link(from_id=1001, to_id=2001, rel_type="same_ad", weight=1.0)

Now a query that retrieves the image node automatically surfaces the text brief at hop=1, and vice versa. No explicit filter on modality required.


Combining vector search with graph traversal

The standard pattern in Feather DB is: find the top-k by vector similarity, then expand the graph for surrounding context.

query = embed_text("Campaign targeting seniors, FD product, post-budget messaging")

# Step 1: vector search — find the most semantically similar nodes
results = db.search(query, k=5)

# Step 2: graph expansion — pull in related context
for r in results:
    chain = db.context_chain(start_id=r.id, hops=2)
    for node in chain.nodes:
        if node.hop > 0:
            print(f"  [hop={node.hop}, {node.metadata.get_attribute('entity_type')}]")
            print(f"  {node.metadata.content[:100]}")

The vector search handles semantic proximity. The graph expansion handles logical structure — what was produced from what, what contradicts what, what came before or after.

Together, you get retrieval that looks like memory: not just "here are similar things" but "here is this thing, the strategy it came from, the performance it produced, and the competitor it was responding to."
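Because the pattern above expands each search result separately, the same node can show up in more than one chain. One way to handle that is to merge the chains and keep a single entry per node. The sketch below assumes each chain node exposes an id, a hop, and a score; `ChainNode` and `merge_chains` are hypothetical stand-ins, and the numbers are made-up illustrative values, not measured output.

```python
from typing import NamedTuple


class ChainNode(NamedTuple):
    """Hypothetical stand-in for one entry of chain.nodes."""
    id: int
    hop: int
    score: float


def merge_chains(chains):
    """Keep one entry per node id, preferring the highest-scoring occurrence."""
    best = {}
    for nodes in chains:
        for node in nodes:
            current = best.get(node.id)
            if current is None or node.score > current.score:
                best[node.id] = node
    # Order like the examples above: nearest hop first, then highest score.
    return sorted(best.values(), key=lambda n: (n.hop, -n.score))


chain_a = [ChainNode(1001, 0, 1.4), ChainNode(3001, 1, 0.6), ChainNode(2001, 1, 0.5)]
chain_b = [ChainNode(4001, 0, 1.8), ChainNode(1001, 1, 0.4), ChainNode(3001, 1, 0.3)]

merged = merge_chains([chain_a, chain_b])
for node in merged:
    print(f"hop={node.hop}  score={node.score:.2f}  id={node.id}")
```

Keeping the highest score is only one possible policy; keeping the lowest hop, or summing scores for nodes reached from several seeds, would be equally reasonable depending on how the agent consumes the context.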


Use cases

Creative briefs linked to campaigns

A creative brief node connects to campaign strategy via related_to, to performance reports via followed_by, and to image/video assets via same_ad. When an agent asks "what worked for our FD campaigns last quarter," the vector search surfaces the brief — and the graph expansion pulls the performance outcome automatically.

User preferences linked to decisions

Each time a user makes a decision in an agentic workflow, store the decision as a node and link it to the preference signals that drove it (derived_from). Later, when the agent needs to recall why a choice was made, context_chain from the decision node surfaces the full reasoning chain.

Concepts linked to sources

Research agents can store claim nodes linked to source document nodes via references, with an optional contradicts edge to a counter-claim. Retrieval surfaces not just the claim but its provenance and any known rebuttals — automatically.

Competitor intelligence

Competitor ad nodes link to internal strategy nodes via contradicts when their messaging conflicts with yours. A query about a competitor campaign surfaces your own counter-strategy at hop=1.


Building a campaign memory graph

Here is a complete example: ingesting a campaign's lifecycle from strategy through creative to performance, with typed edges at each stage.

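import feather_db
import os

# embed() is assumed to be any function that returns a 768-dimensional vector
# for a string, such as embed_text() from the multimodal example.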

db = feather_db.DB.open("campaign_memory.feather", dim=768)

def make_node(content, entity_type, importance=0.7):
    meta = feather_db.Metadata()
    meta.set_attribute("entity_type", entity_type)
    meta.importance = importance
    meta.content = content
    return meta

# --- Ingest nodes ---

# Strategy layer
strategy_meta = make_node(
    "Q3 strategy: lead with scarcity framing for FD product. Target seniors 45+. "
    "Post-budget window. Tax-free up to Rs 1.5L. Urgency via rate expiry.",
    entity_type="strategy",
    importance=0.9
)
db.add(id=3001, vec=embed(strategy_meta.content), meta=strategy_meta)

# Creative layer
brief_meta = make_node(
    "Creative brief: Senior testimonial. Hook: 'Rate won't last.' CTA: 'Book before March 31.' "
    "Black-gold palette. 30-second video. FD_Senior_Q3_001.",
    entity_type="ad_creative",
    importance=0.8
)
db.add(id=1001, vec=embed(brief_meta.content), meta=brief_meta)

# Image asset
image_meta = make_node(
    "Ad image: Senior couple, warm lighting, FD rate overlay 8.5%, logo bottom-right.",
    entity_type="ad_creative",
    importance=0.7
)
db.add(id=2001, vec=embed(image_meta.content), meta=image_meta)

# Performance layer
perf_meta = make_node(
    "Performance report FD_Senior_Q3_001: CTR 3.8%, ROAS 5.2, CPL Rs 210. "
    "Top performer Q3. Audience: 45-60, Tier 1 cities.",
    entity_type="performance_report",
    importance=1.0
)
db.add(id=4001, vec=embed(perf_meta.content), meta=perf_meta)

# --- Declare edges ---

# Brief → strategy: the brief was derived from the strategy
db.link(from_id=1001, to_id=3001, rel_type="derived_from", weight=0.95)

# Brief ↔ image: same creative, different modality, linked in both directions
db.link(from_id=1001, to_id=2001, rel_type="same_ad", weight=1.0)
db.link(from_id=2001, to_id=1001, rel_type="same_ad", weight=1.0)

# Brief → performance: campaign outcomes followed from this creative
db.link(from_id=1001, to_id=4001, rel_type="followed_by", weight=1.0)

# Strategy → performance: direct attribution
db.link(from_id=3001, to_id=4001, rel_type="followed_by", weight=0.85)

db.save()

# --- Query: retrieve campaign context from a performance question ---
chain = db.context_chain(
    start_vec=embed("Which FD campaign worked best for seniors last quarter?"),
    k=2,
    hops=2
)

print("Campaign memory graph traversal:")
for node in sorted(chain.nodes, key=lambda n: (n.hop, -n.score)):
    etype = node.metadata.get_attribute("entity_type")
    print(f"  hop={node.hop}  [{etype}]  score={node.score:.4f}")
    print(f"    {node.metadata.content[:100]}")

Sample output:

Campaign memory graph traversal:
  hop=0  [performance_report]  score=1.8240
    Performance report FD_Senior_Q3_001: CTR 3.8%, ROAS 5.2, CPL Rs 210. Top performer Q3...

  hop=0  [ad_creative]  score=1.4110
    Creative brief: Senior testimonial. Hook: 'Rate won't last.' CTA: 'Book before March 31.'...

  hop=1  [strategy]  score=0.6650  ← derived_from edge
    Q3 strategy: lead with scarcity framing for FD product. Target seniors 45+. Post-budget...

  hop=1  [ad_creative]  score=0.5500  ← same_ad edge
    Ad image: Senior couple, warm lighting, FD rate overlay 8.5%, logo bottom-right...

One query. All four layers of the campaign — performance, creative, image, strategy — surfaced in the correct relationship order.


Performance characteristics

Graph traversal in Feather DB is O(edges visited) — proportional to the number of edges followed during BFS, not the total number of nodes in the database. For sparse knowledge graphs (the typical case), this is very fast.

The key insight: you don't need a dense graph. In practice, most nodes have 2–5 outgoing edges. At that fan-out, a BFS with depth=2 from a single seed visits roughly 6–30 nodes in total, regardless of whether the full index contains 10,000 nodes or 500,000.

| Graph density | Depth=1 traversal | Depth=2 traversal | Notes |
|---|---|---|---|
| 2 edges/node | ~2 nodes | ~6 nodes | Typical agent memory graph |
| 5 edges/node | ~5 nodes | ~30 nodes | Dense knowledge base |
| 10 edges/node | ~10 nodes | ~110 nodes | Use depth=1 for latency control |
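The figures in the table follow from simple fan-out arithmetic: with b edges per node, a traversal reaches at most b nodes at hop 1, b² at hop 2, and so on. The sketch below reproduces them under simplifying assumptions (uniform fan-out, no two paths leading to the same node, only outgoing edges counted); `nodes_reached` is a hypothetical helper, not part of Feather DB.

```python
def nodes_reached(edges_per_node, depth):
    """Upper bound on nodes reached from one seed: b + b^2 + ... + b^depth."""
    return sum(edges_per_node ** hop for hop in range(1, depth + 1))


for b in (2, 5, 10):
    print(f"{b} edges/node: depth=1 -> {nodes_reached(b, 1)}, depth=2 -> {nodes_reached(b, 2)}")
```

Real graphs usually come in under this bound, because already-visited nodes are skipped and many neighbourhoods overlap.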

The HNSW vector search (Phase 1) dominates total latency — typically 0.19ms at p50 on 500K vectors. BFS traversal on a sparse graph adds sub-millisecond overhead.

For very dense graphs, keep depth=1 and rely on the weight parameter to filter weak edges. The traversal skips edges below a configurable min_weight threshold.


What this enables that pure vector search cannot

Vector similarity can find that a performance report and a creative brief are about the same campaign. But it can't tell you that the brief preceded the report, that the strategy caused the brief, or that a competitor node contradicts your positioning.

Those distinctions are what make an AI agent's memory feel like reasoning rather than retrieval.

Typed edges are cheap to declare — one line per relationship. The payoff is a retrieval system that understands not just "what is similar" but "what is this node's role in the knowledge network."

That's the difference between a vector index and a context engine.


Next steps