# Living Context Engine for Claude, GPT, and Gemini Agents: Model-Specific Patterns

> Each frontier model has different context window, tool-use, and streaming conventions. This guide covers the model-specific patterns for wiring a Living Context Engine into Claude, GPT-5, and Gemini 2.5 Pro agents.

- **Category**: Tutorial
- **Read time**: 13 min read
- **Date**: May 15, 2026
- **Author**: Feather DB Engineering (Engineering Team)
- **URL**: https://getfeather.store/theory/living-context-engine-claude-gpt-gemini-agents

---

# Living Context Engine for Claude, GPT, and Gemini Agents: Model-Specific Patterns

*Tutorial · Claude 4.5 / GPT-5 / Gemini 2.5 Pro · May 2026*

---

## Why Model-Specific Patterns Matter

The architecture of a Living Context Engine is model-agnostic. The wiring is not. Claude, GPT, and Gemini differ in three concrete dimensions that change how you call the engine: context window size, tool-use shape, and streaming conventions. The right pattern is the one that matches each model's native conventions — fighting them costs latency and quality.

This post walks through the recommended pattern for each of the three frontier model families.

## Pattern A — Claude (Anthropic)

Claude's strengths: a very large context window (1M tokens for Claude 4.7 Opus), excellent preservation of structured input across long contexts, and first-class tool use via the Messages API. The recommended pattern leans into all three.

### Wide-Context Read

Claude tolerates large retrieved subgraphs without quality degradation. Increase your `k` and `hops` beyond the conservative defaults:

```python
chain = db.context_chain(query_vec, k=12, hops=3)
```

### Structured Context Block

Preserve the graph structure when formatting context — Claude reasons better with explicit topology. Use XML-ish tags that match Claude's documented conventions:

```python
def format_for_claude(chain):
    # The specific tag names are illustrative; Claude only needs consistent,
    # explicit topology markers.
    parts = ["<context_graph>"]
    for hop in sorted({n.hop for n in chain.nodes}):
        parts.append(f'  <hop distance="{hop}">')
        for n in [x for x in chain.nodes if x.hop == hop]:
            parts.append(f'    <node id="{n.id}" edge="{n.edge_type}">')
            parts.append(f"      {n.metadata['text']}")
            parts.append("    </node>")
        parts.append("  </hop>")
    parts.append("</context_graph>")
    return "\n".join(parts)
```

### Tool-Use Write-Back

Expose `write_back` as a tool. Claude is excellent at deciding when its own output is worth persisting — give it the agency:

```python
tools = [{
    "name": "persist_decision",
    "description": "Persist a completed decision back to the context engine. Use after producing a final answer that should inform future calls.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "input_node_ids": {"type": "array", "items": {"type": "integer"}},
            "edge_type": {"type": "string", "enum": ["derived_from", "responds_to", "contradicts"]},
        },
        "required": ["summary", "input_node_ids"],
    },
}]
```
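The schema only declares the tool; your harness still has to execute the call when Claude makes it. Below is a minimal dispatch sketch, assuming the `anthropic` Python SDK, a `query_with_context` string built from `format_for_claude`, and the `write_back` helper from the companion build post. The model id is a placeholder.

```python
import anthropic

client = anthropic.Anthropic()

resp = client.messages.create(
    model="claude-4.5-opus",  # placeholder id: use whichever Claude model you serve
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": query_with_context}],
)

for block in resp.content:
    # Claude signals a tool call with a `tool_use` content block.
    if block.type == "tool_use" and block.name == "persist_decision":
        args = block.input
        write_back(
            db,
            args["summary"],
            args["input_node_ids"],
            edge_type=args.get("edge_type", "derived_from"),
        )
```

In a multi-turn agent loop you would also return a `tool_result` block so Claude can keep going; here the persist is terminal, so the result can be discarded.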
## Pattern B — GPT (OpenAI)

GPT-5's profile: the fastest tool-call latency of the three, strong structured output via JSON schema, and a tighter context window than Claude's. The pattern adapts to all three.

### Tighter Context Block

Keep `k` smaller and `hops` at 2, and trim the retrieved subgraph aggressively:

```python
chain = db.context_chain(query_vec, k=5, hops=2)
chain.nodes = chain.nodes[:8]  # hard cap for tight context
```

### JSON-Schema Output for Write-Back

Use GPT-5's JSON-schema mode to make the write-back deterministic:

```python
import json

response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "persist": {"type": "boolean"},
        "edge_type": {"type": "string", "enum": ["derived_from", "responds_to"]},
    },
    "required": ["answer", "persist"],
}

resp = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        # The API expects a named wrapper object, not the bare schema.
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
)

parsed = json.loads(resp.choices[0].message.content)
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))
```

### Streaming-Friendly Output

GPT-5 streams well. Defer the write-back until the stream ends and you have the full output; don't write back partial deltas.
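A minimal sketch of that accumulate-then-persist flow, reusing the client, schema, and helpers from the JSON-schema example above. Nothing here is GPT-5-specific beyond the model id; it is the standard `stream=True` chat-completions interface.

```python
# Accumulate the full stream, then run the write-back exactly once.
stream = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
    stream=True,
)

chunks = []
for chunk in stream:
    # Forward deltas to your UI from here; never to the engine.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks.append(chunk.choices[0].delta.content)

# Only now, with the complete output in hand, is persisting safe.
parsed = json.loads("".join(chunks))
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))
```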
## Pattern C — Gemini (Google)

Gemini 2.5 Pro's strengths: native multimodal input, a very long context window (2M tokens), and tight integration with Gemini Embedding 2 (the same 768-dim space used for image + text + video). The pattern leans on that native multimodality.

### Multimodal Context Retrieval

The whole point of Gemini is mixed-modality reasoning. Use `modality=None` in `context_chain` to retrieve across all modalities and surface the graph that connects them:

```python
chain = db.context_chain(query_vec, k=8, hops=2, modality=None)
```

Feed Gemini both text excerpts and the actual image/video assets (referenced by node payload):

```python
import google.generativeai as genai

contents = []
for n in chain.nodes:
    if n.modality == "image":
        # Pass the raw asset bytes alongside a text marker that ties the
        # image back to its node in the graph.
        contents.append({"inline_data": {"mime_type": "image/jpeg",
                                         "data": load_image_bytes(n.metadata["asset_path"])}})
        contents.append({"text": f"[image node {n.id}, edge={n.edge_type}]"})
    else:
        contents.append({"text": f"[{n.modality} node {n.id}, hop={n.hop}, "
                                 f"edge={n.edge_type}]: {n.metadata['text']}"})

contents.append({"text": f"\nQuery: {query}"})
response = genai.GenerativeModel("gemini-2.5-pro").generate_content(contents)
```

### Long-Context Read

Gemini's 2M-token context is enormous. You can pull a wide subgraph (`k=20` or more, `hops=3`) without quality degradation. Use this for cross-modal queries where the connected subgraph spans dozens of related assets.

### Embed With the Same Model

Critically, use `gemini-embedding-exp-03-07` for the vectors stored in your Living Context Engine when you're serving via Gemini. The 768-dim space is the same one Gemini's encoders use internally; cross-encoder alignment is what makes the multimodal queries work.

## Cross-Cutting Best Practices

- **One file per agent.** Each agent gets its own `.feather` file. Cheap to spin up, hard isolation, easy to checkpoint.
- **Embed once, retrieve everywhere.** Settle on one embedding model per store. Don't mix Gemini Embedding 2 vectors with OpenAI Ada vectors in the same index.
- **Capture downstream signal.** Whichever model you use, the reinforcement step (raising importance / bumping recall on inputs that produced successful outputs) is what closes the loop. Wire it on day one.

## The Common Substrate

Underneath all three patterns is the same Living Context Engine kernel. The model-specific code lives at the formatting and write-back boundary — a few dozen lines per model. The architectural substrate doesn't change. That portability is the point: the engine is the durable component, and the models are interchangeable on top.

---

*Related: [Build a Living Context Engine in Python](/theory/how-to-build-living-context-engine-python) · [The 768-Dimension Bet](/theory/768-dimension-unified-vector-space).*

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/living-context-engine-claude-gpt-gemini-agents](https://getfeather.store/theory/living-context-engine-claude-gpt-gemini-agents). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*