# Living Context Engine for Claude, GPT, and Gemini Agents: Model-Specific Patterns

> Each frontier model has different context window, tool-use, and streaming conventions. This guide covers the model-specific patterns for wiring a Living Context Engine into Claude, GPT-5, and Gemini 2.5 Pro agents.

- **Category**: Tutorial
- **Read time**: 13 min read
- **Date**: May 15, 2026
- **Author**: Feather DB Engineering (Engineering Team)
- **URL**: https://getfeather.store/theory/living-context-engine-claude-gpt-gemini-agents

---

# Living Context Engine for Claude, GPT, and Gemini Agents: Model-Specific Patterns

*Tutorial · Claude 4.5 / GPT-5 / Gemini 2.5 Pro · May 2026*

---

## Why Model-Specific Patterns Matter

The architecture of a Living Context Engine is model-agnostic. The wiring is not. Claude, GPT, and Gemini differ in three concrete dimensions that change how you call the engine: context window size, tool-use shape, and streaming conventions. The right pattern is the one that matches each model's native conventions — fighting them costs latency and quality.

This post walks through the recommended pattern for each of the three frontier model families.

## Pattern A — Claude (Anthropic)

Claude's strengths: a very large context window (1M tokens for Claude 4.7 Opus), excellent preservation of structured input across long contexts, and first-class tool use via the Messages API. The recommended pattern leans into all three.

### Wide-Context Read

Claude tolerates large retrieved subgraphs without quality degradation. Increase your `k` and `hops` beyond the conservative defaults:

```python
chain = db.context_chain(query_vec, k=12, hops=3)
```

### Structured Context Block

Preserve the graph structure when formatting context — Claude reasons better with explicit topology. Use XML-ish tags that match Claude's documented conventions:

```python
def format_for_claude(chain):
    # The specific tag names are illustrative; Claude only needs consistent,
    # explicit topology markers.
    parts = ["<context_graph>"]
    for hop in sorted({n.hop for n in chain.nodes}):
        parts.append(f'  <hop distance="{hop}">')
        for n in [x for x in chain.nodes if x.hop == hop]:
            parts.append(f'    <node id="{n.id}" edge="{n.edge_type}">')
            parts.append(f"      {n.metadata['text']}")
            parts.append("    </node>")
        parts.append("  </hop>")
    parts.append("</context_graph>")
    return "\n".join(parts)
```

### Tool-Use Write-Back

Expose `write_back` as a tool. Claude is excellent at deciding when its own output is worth persisting — give it the agency:

```python
tools = [{
    "name": "persist_decision",
    "description": "Persist a completed decision back to the context engine. Use after producing a final answer that should inform future calls.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "input_node_ids": {"type": "array", "items": {"type": "integer"}},
            "edge_type": {"type": "string", "enum": ["derived_from", "responds_to", "contradicts"]},
        },
        "required": ["summary", "input_node_ids"],
    },
}]
```
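The schema only declares the tool; your harness still has to execute the call when Claude makes it. Below is a minimal dispatch sketch, assuming the `anthropic` Python SDK, a `query_with_context` string built from `format_for_claude`, and the `write_back` helper from the companion build post. The model id is a placeholder.

```python
import anthropic

client = anthropic.Anthropic()

resp = client.messages.create(
    model="claude-4.5-opus",  # placeholder id: use whichever Claude model you serve
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": query_with_context}],
)

for block in resp.content:
    # Claude signals a tool call with a `tool_use` content block.
    if block.type == "tool_use" and block.name == "persist_decision":
        args = block.input
        write_back(
            db,
            args["summary"],
            args["input_node_ids"],
            edge_type=args.get("edge_type", "derived_from"),
        )
```

In a multi-turn agent loop you would also return a `tool_result` block so Claude can keep going; here the persist is terminal, so the result can be discarded.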
## Pattern B — GPT (OpenAI)

GPT-5's profile: the fastest tool-call latency of the three, strong structured output via JSON schema, and a tighter context window than Claude's. The pattern adapts to all three.

### Tighter Context Block

Keep `k` smaller and `hops` at 2, and trim the retrieved subgraph aggressively:

```python
chain = db.context_chain(query_vec, k=5, hops=2)
chain.nodes = chain.nodes[:8]  # hard cap for tight context
```

### JSON-Schema Output for Write-Back

Use GPT-5's JSON-schema mode to make the write-back deterministic:

```python
import json

response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "persist": {"type": "boolean"},
        "edge_type": {"type": "string", "enum": ["derived_from", "responds_to"]},
    },
    "required": ["answer", "persist"],
}

resp = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        # The API expects a named wrapper object, not the bare schema.
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
)

parsed = json.loads(resp.choices[0].message.content)
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))
```

### Streaming-Friendly Output

GPT-5 streams well. Defer the write-back until the stream ends and you have the full output; don't write back partial deltas.
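A minimal sketch of that accumulate-then-persist flow, reusing the client, schema, and helpers from the JSON-schema example above. Nothing here is GPT-5-specific beyond the model id; it is the standard `stream=True` chat-completions interface.

```python
# Accumulate the full stream, then run the write-back exactly once.
stream = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
    stream=True,
)

chunks = []
for chunk in stream:
    # Forward deltas to your UI from here; never to the engine.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks.append(chunk.choices[0].delta.content)

# Only now, with the complete output in hand, is persisting safe.
parsed = json.loads("".join(chunks))
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))
```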
## Pattern C — Gemini (Google)

Gemini 2.5 Pro's strengths: native multimodal input, a very long context window (2M tokens), and tight integration with Gemini Embedding 2 (the same 768-dim space used for image + text + video). The pattern leans on that native multimodality.

### Multimodal Context Retrieval

The whole point of Gemini is mixed-modality reasoning. Use `modality=None` in `context_chain` to retrieve across all modalities and surface the graph that connects them:

```python
chain = db.context_chain(query_vec, k=8, hops=2, modality=None)
```

Feed Gemini both text excerpts and the actual image/video assets (referenced by node payload):

```python
import google.generativeai as genai

contents = []
for n in chain.nodes:
    if n.modality == "image":
        # Pass the raw asset bytes alongside a text marker that ties the
        # image back to its node in the graph.
        contents.append({"inline_data": {"mime_type": "image/jpeg",
                                         "data": load_image_bytes(n.metadata["asset_path"])}})
        contents.append({"text": f"[image node {n.id}, edge={n.edge_type}]"})
    else:
        contents.append({"text": f"[{n.modality} node {n.id}, hop={n.hop}, "
                                 f"edge={n.edge_type}]: {n.metadata['text']}"})

contents.append({"text": f"\nQuery: {query}"})
response = genai.GenerativeModel("gemini-2.5-pro").generate_content(contents)
```

### Long-Context Read

Gemini's 2M-token context is enormous. You can pull a wide subgraph (`k=20` or more, `hops=3`) without quality degradation. Use this for cross-modal queries where the connected subgraph spans dozens of related assets.

### Embed With the Same Model

Critically, use `gemini-embedding-exp-03-07` for the vectors stored in your Living Context Engine when you're serving via Gemini. The 768-dim space is the same one Gemini's encoders use internally; cross-encoder alignment is what makes the multimodal queries work.

## Cross-Cutting Best Practices

- **One file per agent.** Each agent gets its own `.feather` file. Cheap to spin up, hard isolation, easy to checkpoint.
- **Embed once, retrieve everywhere.** Settle on one embedding model per store. Don't mix Gemini Embedding 2 vectors with OpenAI Ada vectors in the same index.
- **Capture downstream signal.** Whichever model you use, the reinforcement step (raising importance / bumping recall on inputs that produced successful outputs) is what closes the loop. Wire it on day one.

## The Common Substrate

Underneath all three patterns is the same Living Context Engine kernel. The model-specific code lives at the formatting and write-back boundary — a few dozen lines per model. The architectural substrate doesn't change. That portability is the point: the engine is the durable component, and the models are interchangeable on top.

---

*Related: [Build a Living Context Engine in Python](/theory/how-to-build-living-context-engine-python) · [The 768-Dimension Bet](/theory/768-dimension-unified-vector-space).*

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/living-context-engine-claude-gpt-gemini-agents](https://getfeather.store/theory/living-context-engine-claude-gpt-gemini-agents). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*