# Feather DB + OpenAI Agents SDK: Persistent Memory for GPT Agents

> The OpenAI Agents SDK makes building GPT-4o agents with tools and handoffs straightforward. The one thing it doesn't give you: memory between runs. Here's how to wrap Feather DB as `search_memory` and `add_memory` tools, inject retrieved context into every response, isolate users via namespaces, and bulk-load history with `add_batch()`, all with a 48ms cold start.

- **Category**: Deploy
- **Read time**: 9 min read
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-db-openai-gpt-agent-memory

---

## The statefulness gap in the OpenAI Agents SDK

The OpenAI Agents SDK (`openai-agents` package) ships with clean primitives: function-calling tools via `@function_tool`, agent handoffs via `handoff()`, and a `Runner` that handles the tool-call loop. What it doesn't provide is a persistent, searchable memory layer. Each `Runner.run()` call starts cold: the agent has no knowledge of previous conversations, previously established user preferences, or facts it learned three sessions ago.

For a simple Q&A bot, statelessness is fine. For any agent that's supposed to know you (a personal assistant, a support agent, a coding copilot), it becomes the core failure mode. Users repeat themselves. The agent gives the same generic answer it gave last week. Trust erodes.

Feather DB plugs this gap with two tools: `search_memory` (retrieve relevant context before responding) and `add_memory` (store facts after each turn). The agent calls them, and memory persists across runs in a single `.feather` file. Cold-start load in v0.16.0 is 48ms, so memory is ready before your first API call completes.
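To make the two-tool contract concrete, here is a toy, library-free sketch of what `search_memory` and `add_memory` do underneath: store vectors per namespace, then rank only that namespace's entries by similarity. `ToyMemory` is a hypothetical stand-in, not Feather DB's API; the 2-D vectors stand in for real 1536-dimension embeddings, and the brute-force scan stands in for Feather DB's HNSW index and decay scoring.

```python
import math


class ToyMemory:
    """Hypothetical in-memory stand-in for the search/add contract.

    Not Feather DB's API: it brute-force scans one namespace with cosine
    similarity, where Feather DB uses an HNSW index and decay scoring.
    """

    def __init__(self) -> None:
        # namespace -> list of (vector, text) pairs; users never share a list
        self._store: dict[str, list[tuple[list[float], str]]] = {}

    def add(self, vec: list[float], text: str, namespace: str) -> None:
        self._store.setdefault(namespace, []).append((vec, text))

    def search(self, vec: list[float], k: int, namespace: str) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Only this namespace's entries are scored, so other users never leak in
        entries = self._store.get(namespace, [])
        ranked = sorted(entries, key=lambda e: cosine(vec, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Seed two users' memories, then retrieve for user_42 only
mem = ToyMemory()
mem.add([1.0, 0.0], "prefers bullet points", namespace="user_42")
mem.add([0.0, 1.0], "uses Python 3.12", namespace="user_42")
mem.add([1.0, 0.0], "prefers long prose", namespace="user_99")
print(mem.search([0.9, 0.1], k=1, namespace="user_42"))  # ['prefers bullet points']
```

Swapping `ToyMemory` for a real store changes the ranking machinery, not the contract the agent sees: a text query in, the top-k memories for one namespace out.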
## Install

```bash
pip install feather-db openai openai-agents
```

## Step 1: Initialize Feather DB and the embed function

```python
import os

import feather_db as fdb
from openai import OpenAI

# v0.16.0: parallel HNSW load (48ms cold start on 50k vectors).
# Set before opening the DB.
os.environ["FEATHER_LOAD_THREADS"] = "8"

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# One .feather file per deployment; namespaces isolate each user.
# dim=1536 matches text-embedding-3-small.
db = fdb.DB.open("agent_memory.feather", dim=1536)


def embed(text: str) -> list[float]:
    """Embed a single string with text-embedding-3-small."""
    resp = openai_client.embeddings.create(
        input=[text], model="text-embedding-3-small"
    )
    return resp.data[0].embedding
```

One file, all users. Namespace isolation (covered in Step 5) keeps their memories separate, with no cross-contamination.

## Step 2: Implement search_memory and add_memory

These are plain Python functions first; the Agents SDK wrapper comes in Step 3.

```python
from datetime import datetime, timezone


def search_memory(
    query: str, user_id: str, k: int = 6, half_life: int = 30
) -> str:
    """
    Retrieve the k most relevant memories for this user.
    half_life controls decay speed in days (lower means faster fade).
    """
    vec = embed(query)
    results = db.context_chain(
        vec,
        k=k,
        namespace=user_id,  # each user_id is its own namespace
        max_depth=2,
        half_life=half_life,
    )
    if not results:
        return "No relevant memories found."

    lines = [f"Retrieved {len(results)} memories:"]
    for i, mem in enumerate(results, 1):
        mem_type = mem.meta.get_attribute("type") or "message"
        lines.append(f"{i}. [{mem_type}] (score={mem.score:.3f}) {mem.text}")
    return "\n".join(lines)


def add_memory(
    text: str,
    user_id: str,
    memory_type: str = "message",
    half_life: int = 30,
    importance: float = 1.0,
) -> str:
    """
    Store a fact, preference, or message for this user.
    memory_type tags the entry; half_life and importance tune recall weight.
    """
    vec = embed(text)
    mem = db.add(vec, text=text, namespace=user_id, entity="conversation")
    mem.meta.set_attribute("type", memory_type)
    mem.meta.set_attribute("importance", importance)
    mem.meta.set_attribute("half_life", half_life)
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
    mem.meta.set_attribute("created_at", datetime.now(timezone.utc).isoformat())
    return f"Stored (id={mem.id}): {text[:80]}"
```

## Step 3: Wrap as Agents SDK tools and build the agent

```python
from agents import Agent, Runner, function_tool


# Bind user_id in a closure: the model never sees or chooses it,
# so one user's agent cannot read or write another user's namespace.
def make_memory_tools(user_id: str):
    @function_tool
    def search_memory_tool(query: str) -> str:
        """Search this user's memory for context relevant to the query.

        Call this at the start of every response before answering.

        Args:
            query: The user's message or a focused search phrase.
        """
        return search_memory(query, user_id=user_id)

    @function_tool
    def add_memory_tool(
        text: str,
        memory_type: str = "message",
        half_life: int = 30,
        importance: float = 1.0,
    ) -> str:
        """Save information to this user's memory.

        Args:
            text: The fact, preference, or message to store.
            memory_type: 'preference' | 'fact' | 'message' | 'decision'.
            half_life: Days until the memory's recall weight halves; use 180 for preferences, 7 for session facts.
            importance: 0.5 (low) to 3.0 (critical); default 1.0.
        """
        return add_memory(
            text,
            user_id=user_id,
            memory_type=memory_type,
            half_life=half_life,
            importance=importance,
        )

    return search_memory_tool, add_memory_tool


def build_agent(user_id: str) -> Agent:
    search_tool, add_tool = make_memory_tools(user_id)
    return Agent(
        name="Assistant",
        instructions="""You are a helpful assistant with persistent memory.

On every turn:
1. Call search_memory_tool with the user's message to retrieve relevant context.
2. Use that context to personalize your response; reference what you know.
3. Before giving your final answer, call add_memory_tool to save:
   - The user's message (memory_type='message', half_life=30)
   - Any preference the user revealed (memory_type='preference', half_life=180, importance=2.0)
   - Any important fact or decision (memory_type='fact', half_life=90, importance=1.5)

Be explicit when you recall something: "Based on what you told me earlier..."
Never pretend to know something you didn't retrieve from memory.""",
        tools=[search_tool, add_tool],
        model="gpt-4o",
    )
```

## Step 4: Automatic memory on every turn

The pattern below runs a full conversation loop. Memory search happens before each response, and the memory write happens before the final answer is returned (in the Agents SDK, a model response with no tool calls ends the run). The agent handles both tool calls in its internal loop; you just pass the user message and get the response.

```python
import asyncio


async def chat(user_id: str, message: str) -> str:
    agent = build_agent(user_id)
    result = await Runner.run(
        agent,
        input=message,
        max_turns=6,  # search + write + final answer = 3 model turns minimum
    )
    return result.final_output


async def demo():
    user = "user_42"

    # Turn 1: user reveals a preference
    r1 = await chat(user, "I prefer concise bullet-point answers, not long paragraphs.")
    print(f"Turn 1: {r1}\n")

    # Turn 2: different topic; the agent should still surface the preference
    r2 = await chat(user, "Explain how HNSW indexing works.")
    print(f"Turn 2: {r2}\n")

    # Turn 3: explicit recall test
    r3 = await chat(user, "What format do I prefer for answers?")
    print(f"Turn 3: {r3}\n")


asyncio.run(demo())
```

Turn 1 stores the preference with `half_life=180` and `importance=2.0`. Turn 2's `search_memory_tool` call retrieves it before the HNSW explanation, so the agent answers in bullets without being reminded. That's the payoff.

## Step 5: Namespace per user, isolation by design

Every `add_memory` and `search_memory` call passes `namespace=user_id`. Feather DB enforces strict namespace isolation at the index level: a search in `namespace="user_42"` never touches vectors stored under `namespace="user_99"`.
No query-time filtering, no risk of leakage.

```python
# Inspect what's stored for a specific user
user_vec = embed("user preferences")
results = db.search(user_vec, k=20, namespace="user_42")
print(f"Top {len(results)} matches in user_42 (k=20 caps this at 20)")

# Counts across all namespaces and for one namespace
print(f"Total vectors in file: {db.count()}")
print(f"user_42 vectors: {db.count(namespace='user_42')}")
```

One `.feather` file serves every user in your system, and each user gets their own isolated memory space: no separate databases, no per-user deployments.

## Step 6: Adaptive decay, so preferences outlast session facts

Not all memories should fade at the same rate. A user's preferred response format should still surface six months from now; a fact from today's troubleshooting session is irrelevant by next week.

Feather DB's adaptive decay is controlled per memory via `half_life` (days) and `importance` (weight multiplier). The agent's instructions encode these values; the equivalent direct calls look like this:

```python
# Preference: long-lived, high importance
add_memory(
    "User prefers bullet-point answers over long paragraphs.",
    user_id="user_42",
    memory_type="preference",
    half_life=180,  # recall weight halves every ~6 months
    importance=2.0,  # surfaces even when the semantic match is weak
)

# Session fact: short-lived, normal importance
add_memory(
    "User is debugging a KeyError on line 47 of ingest.py.",
    user_id="user_42",
    memory_type="fact",
    half_life=7,  # recall weight halves every week
    importance=1.0,
)

# Conversational message: medium decay
add_memory(
    "User asked how HNSW handles deletions.",
    user_id="user_42",
    memory_type="message",
    half_life=30,
    importance=1.0,
)
```

The agent instructions tell GPT-4o to set these values. In practice, the model applies them correctly for clear preference, fact, and session signals, so you don't need a separate classifier.
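A worked example helps show why `importance` lets a preference outrank a fresher-looking fact. The sketch below assumes a simple exponential model, `importance * 0.5 ** (age_days / half_life)`, multiplied by semantic similarity. Feather DB's actual scoring inside `context_chain()` may differ; `decayed_weight` and the similarity values 0.4 and 0.8 are hypothetical.

```python
def decayed_weight(age_days: float, half_life: float, importance: float = 1.0) -> float:
    """Hypothetical recall weight under an assumed exponential-decay model.

    importance * 0.5 ** (age_days / half_life): the weight halves every
    half_life days. This is an illustration, not Feather DB's documented formula.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return importance * 0.5 ** (age_days / half_life)


# The preference from Step 6, 60 days later, with a weak semantic match
pref = decayed_weight(age_days=60, half_life=180, importance=2.0)
# The session fact from Step 6, 14 days later, with a strong semantic match
fact = decayed_weight(age_days=14, half_life=7, importance=1.0)

# Assumed final score: similarity * decayed weight
pref_score = 0.4 * pref
fact_score = 0.8 * fact
print(f"preference: weight={pref:.3f}, score={pref_score:.3f}")  # weight=1.587, score=0.635
print(f"session fact: weight={fact:.3f}, score={fact_score:.3f}")  # weight=0.250, score=0.200
```

Under these assumptions the two-month-old preference scores about three times higher than the two-week-old fact despite a weaker semantic match, which is the behavior Step 6 describes.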
## Step 7: add_batch() for history import

If a user already has a chat history (from another system, a CSV export, or a previous session log), use `add_batch()` to load it in one parallel call instead of a sequential loop. On a 4-core machine, `add_batch()` is 3.4× faster than sequential `add()` for bulk ingest.

```python
from datetime import datetime, timezone

import numpy as np


def import_chat_history(user_id: str, messages: list[dict]) -> None:
    """
    Bulk-load existing chat history into Feather DB.
    messages: list of {"role": "user" | "assistant", "content": str}
    """
    if not messages:
        return

    texts = [m["content"] for m in messages]
    roles = [m["role"] for m in messages]

    # Embed all messages in one batch API call
    response = openai_client.embeddings.create(
        input=texts, model="text-embedding-3-small"
    )
    vecs = np.array([r.embedding for r in response.data], dtype=np.float32)

    # Build metadata; assign half_life by role
    now = datetime.now(timezone.utc).isoformat()
    metas = []
    for role in roles:
        m = fdb.Metadata(importance=1.0)
        m.set_attribute("type", "message")
        m.set_attribute("role", role)
        m.set_attribute("half_life", 30 if role == "user" else 14)
        m.set_attribute("source", "history_import")
        m.set_attribute("created_at", now)
        metas.append(m)

    # Parallel ingest; the GIL is released during HNSW graph construction.
    # IDs continue from this namespace's current count (read once).
    start = db.count(namespace=user_id)
    ids = list(range(start, start + len(texts)))
    db.add_batch(ids, vecs, metas=metas, namespace=user_id)
    db.save()
    print(f"Imported {len(texts)} messages for {user_id}")


# Usage: load 500 historical messages before the first live turn
history = [
    {"role": "user", "content": "I always want code examples in Python 3.12."},
    {"role": "assistant", "content": "Noted: I'll use Python 3.12 syntax."},
    # ... 498 more
]
import_chat_history("user_42", history)
```

After `import_chat_history()` completes, the agent's `search_memory_tool` will surface relevant historical context on the very first live turn, with no warm-up period.
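The history import above sends every message in a single embeddings request. For longer histories, a sketch like the following splits the texts into request-sized chunks first. The 2048-inputs-per-request limit is an assumption based on OpenAI's embeddings API reference (a per-request token cap may also apply), and `chunked` and `embed_many` are hypothetical helpers, not part of Feather DB or the OpenAI SDK.

```python
from typing import Any, Iterator, Sequence

# Assumption: OpenAI's embeddings endpoint accepts at most 2048 inputs per
# request; a per-request token cap may also apply, so lower this if needed.
MAX_INPUTS_PER_REQUEST = 2048


def chunked(items: Sequence[str], size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items, in order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def embed_many(
    client: Any, texts: Sequence[str], model: str = "text-embedding-3-small"
) -> list[list[float]]:
    """Embed texts in request-sized chunks, returning vectors in input order."""
    vectors: list[list[float]] = []
    for batch in chunked(texts):
        resp = client.embeddings.create(input=list(batch), model=model)
        # Each result carries the index of its input; sort to keep order stable
        vectors.extend(d.embedding for d in sorted(resp.data, key=lambda d: d.index))
    return vectors
```

Inside `import_chat_history()`, `vecs = np.array(embed_many(openai_client, texts), dtype=np.float32)` would replace the single `embeddings.create` call; the rest of the function stays the same.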
## Step 8: Inject retrieved context into the system prompt

The function-calling approach above works well and lets GPT-4o decide when to search. For tighter latency control, you can also pre-retrieve context server-side and inject it directly into the system prompt before calling the agent, which removes one tool-call round-trip.

```python
async def chat_with_preloaded_context(user_id: str, message: str) -> str:
    # Retrieve before the model call: costs one embedding request plus a
    # local ANN search, and saves one model round-trip.
    context = search_memory(message, user_id=user_id, k=6)

    _, add_tool = make_memory_tools(user_id)  # add_memory only; search already done
    agent = Agent(
        name="Assistant",
        instructions=f"""You are a helpful assistant with persistent memory.

Relevant context retrieved from this user's memory:
{context}

Use this context to personalize your response.
Before giving your final answer, call add_memory_tool to save any new preferences or facts.""",
        tools=[add_tool],
        model="gpt-4o",
    )
    result = await Runner.run(agent, input=message, max_turns=4)

    # Also store the turn explicitly, independent of what the model chose to save
    add_memory(message, user_id=user_id, memory_type="message", half_life=30)
    add_memory(
        result.final_output,
        user_id=user_id,
        memory_type="message",
        half_life=14,
        importance=0.8,
    )
    return result.final_output
```

Both patterns work. The function-calling version is more flexible because the agent decides relevance. The pre-injection version reduces round-trips and keeps total latency lower for high-traffic deployments.

## Production: combine with OpenAI file search

Feather DB handles agent memory: the dynamic, evolving knowledge that accrues from interactions. OpenAI's built-in file search tool handles static document knowledge: product manuals, codebases, and knowledge bases that don't change turn by turn. The two are complementary, not competing.

```python
from agents import Agent, FileSearchTool
from openai import OpenAI

client = OpenAI()

# Upload static documents to OpenAI for file search
vector_store = client.vector_stores.create(name="product-docs")
# ... upload your PDFs, markdown files, etc.

search_tool, add_tool = make_memory_tools("user_42")

production_agent = Agent(
    name="Production Assistant",
    instructions="""You are a helpful assistant with two knowledge sources:
1. File search (built-in): use for product documentation, technical specs, policies.
2. search_memory_tool: use for this specific user's history, preferences, and past interactions.

On every turn:
- Call search_memory_tool first for user-specific context.
- Use file search when the question requires authoritative product knowledge.
- Before giving your final answer, call add_memory_tool to persist anything new about this user.""",
    tools=[
        search_tool,
        add_tool,
        # Hosted tool: file search runs on OpenAI's side, not as a local function_tool
        FileSearchTool(vector_store_ids=[vector_store.id]),
    ],
    model="gpt-4o",
)
```

Document knowledge lives in OpenAI's infrastructure. User memory lives in Feather DB: local, fast, and owned by you.

## Performance in production

| Operation | Latency | Notes |
|---|---|---|
| Cold start (v0.16.0, 50k vectors) | 48ms | `FEATHER_LOAD_THREADS=8` |
| ANN search p50 | 0.19ms | 500k vectors, k=10 |
| ANN search p99 | 0.13ms | 500k vectors, k=10 |
| `add_batch()` 50k vectors | ~10s | 3.4× over sequential loop |
| Sequential `add()` | ~2–5ms/call | includes embed round-trip |

The 48ms cold start means memory is fully loaded before your first model call returns. In any I/O-dominated agent loop, Feather DB is not your latency bottleneck.

## What you have

- **Persistent memory across runs**: the agent remembers what it learned last session, last month, or from the history import.
- **Per-user namespace isolation**: one `.feather` file, zero cross-contamination between users.
- **Adaptive decay**: preferences keep half their recall weight after six months (`half_life=180`); session facts lose half of theirs within a week (`half_life=7`).
- **Fast history import**: `add_batch()` loads existing chat logs 3.4× faster than a sequential loop.
- **48ms cold start**: memory ready before the first API call completes.
- **Composable with OpenAI file search**: document knowledge and agent memory as separate, non-competing layers.

**Install:** `pip install feather-db openai openai-agents` · **GitHub:** [github.com/feather-store/feather](https://github.com/feather-store/feather)

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-db-openai-gpt-agent-memory](https://getfeather.store/theory/feather-db-openai-gpt-agent-memory). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*