# Feather DB + OpenAI Agents SDK: Persistent Memory for GPT Agents

> The OpenAI Agents SDK makes building GPT-4o agents with tools and handoffs straightforward. The one thing it doesn't give you: memory between runs. Here's how to wrap Feather DB as search_memory and add_memory tools, inject retrieved context into every response, isolate users via namespace, and bulk-load history with add_batch() — all with a 48ms cold start.

- **Category**: Deploy
- **Read time**: 9 min
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-db-openai-gpt-agent-memory

---

## The statefulness gap in the OpenAI Agents SDK

The OpenAI Agents SDK (`openai-agents` package) ships with clean primitives: function-calling tools via `@function_tool`, agent handoffs via `handoff()`, and a `Runner` that handles the tool-call loop. What it doesn't provide is any memory layer. Each `Runner.run()` call starts cold. The agent has no knowledge of previous conversations, previously established user preferences, or facts it learned three sessions ago.

For a simple Q&A bot, statelessness is fine. For any agent that's supposed to know you — a personal assistant, a support agent, a coding copilot — it becomes the core failure mode. Users repeat themselves. The agent gives the same generic answer it gave last week. Trust erodes.

Feather DB plugs this gap with two tools: `search_memory` (retrieve relevant context before responding) and `add_memory` (store facts after each turn). The agent calls them. Memory persists across runs in a single `.feather` file. Cold-start load in v0.16.0 is 48ms — memory is ready before your first API call completes.

## Install

```bash
pip install feather-db openai openai-agents

```

## Step 1: Initialize Feather DB and the embed function

```python
import os
import feather_db as fdb
from openai import OpenAI

# v0.16.0: parallel HNSW load — 48ms cold start on 50k vectors
os.environ["FEATHER_LOAD_THREADS"] = "8"

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# One .feather file per deployment — namespaces isolate each user
db = fdb.DB.open("agent_memory.feather", dim=1536)  # text-embedding-3-small

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(
        input=[text],
        model="text-embedding-3-small"
    )
    return resp.data[0].embedding

```

One file, all users. Namespace isolation (covered below) keeps their memories separate — no cross-contamination.

## Step 2: Implement search_memory and add_memory

These are plain Python functions first. The Agents SDK wrapper comes in Step 3.

```python
from datetime import datetime

def search_memory(
    query: str,
    user_id: str,
    k: int = 6,
    half_life: int = 30
) -> str:
    """
    Retrieve the k most relevant memories for this user.
    half_life controls decay speed in days — lower = faster fade.
    """
    vec = embed(query)
    results = db.context_chain(
        vec,
        k=k,
        namespace=user_id,   # each user_id is its own namespace
        max_depth=2,
        half_life=half_life
    )
    if not results:
        return "No relevant memories found."

    lines = [f"Retrieved {len(results)} memories:"]
    for i, mem in enumerate(results, 1):
        mem_type = mem.meta.get_attribute("type") or "message"
        lines.append(
            f"{i}. [{mem_type}] (score={mem.score:.3f}) {mem.text}"
        )
    return "\n".join(lines)

def add_memory(
    text: str,
    user_id: str,
    memory_type: str = "message",
    half_life: int = 30,
    importance: float = 1.0
) -> str:
    """
    Store a fact, preference, or message for this user.
    memory_type tags the entry; half_life and importance tune recall weight.
    """
    vec = embed(text)
    mem = db.add(
        vec,
        text=text,
        namespace=user_id,
        entity="conversation"
    )
    mem.meta.set_attribute("type", memory_type)
    mem.meta.set_attribute("importance", importance)
    mem.meta.set_attribute("half_life", half_life)
    mem.meta.set_attribute("created_at", datetime.utcnow().isoformat())
    return f"Stored (id={mem.id}): {text[:80]}"

```

## Step 3: Wrap as Agents SDK tools and build the agent

```python
from agents import Agent, Runner, function_tool

# Bind user_id at agent-construction time — one agent instance per user,
# or use a closure if you construct agents dynamically.
def make_memory_tools(user_id: str):

    @function_tool
    def search_memory_tool(query: str) -> str:
        """
        Search this user's memory for context relevant to the query.
        Call this at the start of every response before answering.
        """
        return search_memory(query, user_id=user_id)

    @function_tool
    def add_memory_tool(
        text: str,
        memory_type: str = "message",
        half_life: int = 30,
        importance: float = 1.0
    ) -> str:
        """
        Save information to this user's memory.
        memory_type: 'preference' | 'fact' | 'message' | 'decision'
        half_life: days until memory fades — use 180 for preferences, 7 for session facts
        importance: 0.5 (low) to 3.0 (critical); default 1.0
        """
        return add_memory(
            text,
            user_id=user_id,
            memory_type=memory_type,
            half_life=half_life,
            importance=importance
        )

    return search_memory_tool, add_memory_tool

def build_agent(user_id: str) -> Agent:
    search_tool, add_tool = make_memory_tools(user_id)

    return Agent(
        name="Assistant",
        instructions="""You are a helpful assistant with persistent memory.

On every turn:
1. Call search_memory_tool with the user's message to retrieve relevant context.
2. Use that context to personalize your response — reference what you know.
3. After responding, call add_memory_tool to save:
   - The user's message (memory_type='message', half_life=30)
   - Any preference the user revealed (memory_type='preference', half_life=180, importance=2.0)
   - Any important fact or decision (memory_type='fact', half_life=90, importance=1.5)

Be explicit when you recall something: "Based on what you told me earlier..."
Never pretend to know something you didn't retrieve from memory.""",
        tools=[search_tool, add_tool],
        model="gpt-4o"
    )

```
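`make_memory_tools` is a factory so that `user_id` never appears in the tool's parameter list, which means the model cannot pass a different user's id. A minimal sketch of that binding in plain Python, with a hypothetical `fake_search` standing in for `search_memory` and no Agents SDK involved:

```python
from typing import Callable

def fake_search(query: str, user_id: str) -> str:
    # Hypothetical stand-in for search_memory(); it only echoes its arguments.
    return f"[{user_id}] results for {query!r}"

def make_search_tool(user_id: str) -> Callable[[str], str]:
    # user_id is captured by the closure, so the returned function
    # exposes only `query` to whoever calls it.
    def search_tool(query: str) -> str:
        return fake_search(query, user_id=user_id)
    return search_tool

alice_search = make_search_tool("user_42")
bob_search = make_search_tool("user_99")

print(alice_search("answer format"))
print(bob_search("answer format"))
```

Each returned function carries its own `user_id`, so two agents built for two users can share the same underlying store without sharing scope.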

## Step 4: Automatic memory on every turn

The pattern below runs a full conversation loop. Memory search happens before each response; memory write happens after. The agent handles both tool calls in its internal loop — you just pass the user message and get the response.

```python
import asyncio

async def chat(user_id: str, message: str) -> str:
    agent = build_agent(user_id)
    result = await Runner.run(
        agent,
        input=message,
        max_turns=6   # search + respond + write = 3 turns minimum
    )
    return result.final_output

async def demo():
    user = "user_42"

    # Turn 1: user reveals a preference
    r1 = await chat(user, "I prefer concise bullet-point answers, not long paragraphs.")
    print(f"Turn 1: {r1}\n")

    # Turn 2: different topic — agent should still surface the preference
    r2 = await chat(user, "Explain how HNSW indexing works.")
    print(f"Turn 2: {r2}\n")

    # Turn 3: explicit recall test
    r3 = await chat(user, "What format do I prefer for answers?")
    print(f"Turn 3: {r3}\n")

asyncio.run(demo())

```

Following its instructions, the agent in Turn 1 should store the preference with `half_life=180` and `importance=2.0`. Turn 2's `search_memory_tool` call then retrieves it before the HNSW explanation, so the agent answers in bullets without being reminded. That's the payoff.

## Step 5: Namespace per user — isolation by design

Every `add_memory` and `search_memory` call passes `namespace=user_id`. Feather DB enforces strict namespace isolation at the index level — a search in `namespace="user_42"` never touches vectors stored under `namespace="user_99"`. No query-time filtering, no risk of leakage.

```python
# Inspect what's stored for a specific user
user_vec = embed("user preferences")
results = db.search(user_vec, k=20, namespace="user_42")
print(f"user_42 has {len(results)} memories")

# Count across all namespaces
print(f"Total vectors in file: {db.count()}")
print(f"user_42 vectors: {db.count(namespace='user_42')}")

```

One `.feather` file serves every user in your system. Each user gets their own isolated memory space. No separate databases, no per-user deployments.

## Step 6: Adaptive decay — preferences outlast session facts

Not all memories should fade at the same rate. A user's preferred response format should still surface six months from now. A fact from today's troubleshooting session is irrelevant by next week.

Feather DB's adaptive decay is controlled per-memory via `half_life` (days) and `importance` (weight multiplier). The agent's instructions encode these directly:

```python
# Preference: long-lived, high importance
add_memory(
    "User prefers bullet-point answers over long paragraphs.",
    user_id="user_42",
    memory_type="preference",
    half_life=180,    # fades over ~6 months
    importance=2.0    # surfaces even when semantic match is weak
)

# Session fact: short-lived, normal importance
add_memory(
    "User is debugging a KeyError on line 47 of ingest.py.",
    user_id="user_42",
    memory_type="fact",
    half_life=7,      # fades after ~a week
    importance=1.0
)

# Conversational message: medium decay
add_memory(
    "User asked how HNSW handles deletions.",
    user_id="user_42",
    memory_type="message",
    half_life=30,
    importance=1.0
)

```

The agent instructions tell GPT-4o to set these values. In practice, the model applies them correctly for clear preference vs. fact vs. session signals — you don't need a separate classifier.
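This post does not state the formula Feather DB uses to combine similarity, `half_life`, and `importance`. As an assumption for building intuition, the sketch below uses the standard exponential half-life form, `similarity * importance * 0.5 ** (age_days / half_life)`. The function name `decayed_score` is hypothetical.

```python
def decayed_score(similarity: float, importance: float,
                  age_days: float, half_life: float) -> float:
    # Assumed form: the weight halves every `half_life` days.
    return similarity * importance * 0.5 ** (age_days / half_life)

# Same raw similarity, 30 days after each memory was written.
preference = decayed_score(0.6, importance=2.0, age_days=30, half_life=180)
session_fact = decayed_score(0.6, importance=1.0, age_days=30, half_life=7)

print(f"preference:   {preference:.3f}")
print(f"session fact: {session_fact:.3f}")
```

Under this assumed form, the 180-day preference keeps roughly 89% of its weight after a month, while the 7-day session fact keeps about 5%, which matches the intent described above.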

## Step 7: add_batch() for history import

If a user already has an existing chat history — from another system, a CSV export, or a previous session log — use `add_batch()` to load it in one parallel call instead of a sequential loop. On a 4-core machine, `add_batch()` is 3.4× faster than sequential `add()` for bulk ingest.

```python
import numpy as np

def import_chat_history(user_id: str, messages: list[dict]):
    """
    Bulk-load existing chat history into Feather DB.
    messages: list of {"role": "user"|"assistant", "content": str}
    """
    if not messages:
        return

    texts = [m["content"] for m in messages]
    roles = [m["role"] for m in messages]

    # Embed all messages in one batch API call
    response = openai_client.embeddings.create(
        input=texts,
        model="text-embedding-3-small"
    )
    vecs = np.array(
        [r.embedding for r in response.data],
        dtype=np.float32
    )

    # Build metadata — assign half_life by role
    metas = []
    for role in roles:
        m = fdb.Metadata(importance=1.0)
        m.set_attribute("type", "message")
        m.set_attribute("role", role)
        m.set_attribute(
            "half_life",
            30 if role == "user" else 14
        )
        m.set_attribute("source", "history_import")
        m.set_attribute("created_at", datetime.utcnow().isoformat())
        metas.append(m)

    # Allocate ids that continue after the ones already in this namespace
    start = db.count(namespace=user_id)
    ids = list(range(start, start + len(texts)))

    # Parallel ingest: GIL released during HNSW graph construction
    db.add_batch(ids, vecs, metas=metas, namespace=user_id)
    db.save()

    print(f"Imported {len(texts)} messages for {user_id}")

# Usage: load 500 historical messages before the first live turn
history = [
    {"role": "user", "content": "I always want code examples in Python 3.12."},
    {"role": "assistant", "content": "Noted — I'll use Python 3.12 syntax."},
    # ... 498 more
]
import_chat_history("user_42", history)

```

After `import_chat_history()` completes, the agent's `search_memory_tool` will surface relevant historical context on the very first live turn. No warm-up period needed.
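`import_chat_history()` sends every message in a single embeddings request. For large exports that request may need splitting, since the embeddings endpoint limits the number of inputs per call (the API reference has listed 2048; treat that figure as an assumption and verify it against the current docs). A minimal sketch of a chunking helper, where `chunked` and its `size` default are illustrative rather than part of either library:

```python
from typing import Iterator, Sequence

def chunked(items: Sequence[str], size: int = 2048) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

texts = [f"message {i}" for i in range(5000)]
batches = list(chunked(texts))
print([len(b) for b in batches])
```

Each batch could then be passed as `input=` to `openai_client.embeddings.create(...)`, with the resulting vectors concatenated in order before the `add_batch()` call.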

## Step 8: Inject retrieved context into the system prompt

The function-calling approach above works well and lets GPT-4o decide when to search. For tighter latency control, you can also pre-retrieve context server-side and inject it directly into the system prompt before calling the agent — bypassing one tool-call round-trip.

```python
async def chat_with_preloaded_context(
    user_id: str,
    message: str
) -> str:
    # Retrieve before the agent run: costs one embeddings call plus a local
    # search, and saves one tool-call round-trip
    context = search_memory(message, user_id=user_id, k=6)

    agent = Agent(
        name="Assistant",
        instructions=f"""You are a helpful assistant with persistent memory.

Relevant context retrieved from this user's memory:
{context}

Use this context to personalize your response.
After responding, call add_memory_tool to save any new preferences or facts.""",
        tools=[make_memory_tools(user_id)[1]],  # add_memory only — search already done
        model="gpt-4o"
    )

    result = await Runner.run(agent, input=message, max_turns=4)

    # Also store the turn explicitly
    add_memory(message, user_id=user_id, memory_type="message", half_life=30)
    add_memory(result.final_output, user_id=user_id,
               memory_type="message", half_life=14, importance=0.8)

    return result.final_output

```

Both patterns work. The function-calling version is more flexible — the agent decides relevance. The pre-injection version reduces round-trips and keeps total latency lower for high-traffic deployments.
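When injecting context into the system prompt, it can help to cap how much text goes in so that a long memory list does not crowd out the instructions. A rough sketch, assuming a simple character budget; `build_instructions` and `max_chars` are hypothetical names, and a token-based budget would be more precise:

```python
def build_instructions(memories: list[str], max_chars: int = 1200) -> str:
    """Assemble a system prompt from memory lines, stopping at a character budget."""
    kept: list[str] = []
    used = 0
    for line in memories:  # assumed to be ordered best match first
        if used + len(line) > max_chars:
            break
        kept.append(f"- {line}")
        used += len(line)
    context = "\n".join(kept) if kept else "No relevant memories found."
    return (
        "You are a helpful assistant with persistent memory.\n\n"
        "Relevant context retrieved from this user's memory:\n"
        f"{context}\n\n"
        "Use this context to personalize your response."
    )

prompt = build_instructions(
    ["User prefers bullet-point answers.", "User works in Python 3.12."],
    max_chars=40,
)
print(prompt)
```

With the 40-character budget used here, only the first memory fits, so the lower-ranked line is dropped rather than truncated mid-sentence.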

## Production: combine with OpenAI file search

Feather DB handles agent memory — the dynamic, evolving knowledge that accrues from interactions. OpenAI's built-in file search tool handles static document knowledge — product manuals, codebases, knowledge bases that don't change turn by turn. The two are complementary, not competing.

```python
from openai import OpenAI
from agents import Agent, FileSearchTool, Runner, function_tool

client = OpenAI()

# Upload static documents to OpenAI for file search
vector_store = client.vector_stores.create(name="product-docs")
# ... upload your PDFs, markdown files, etc.

search_tool, add_tool = make_memory_tools("user_42")

production_agent = Agent(
    name="Production Assistant",
    instructions="""You are a helpful assistant with two knowledge sources:

1. File search (built-in): use for product documentation, technical specs, policies.
2. search_memory_tool: use for this specific user's history, preferences, and past interactions.

On every turn:
- Call search_memory_tool first for user-specific context.
- Use file search when the question requires authoritative product knowledge.
- After responding, call add_memory_tool to persist anything new about this user.""",
    tools=[
        search_tool,
        add_tool,
        # File search is a hosted tool in the Agents SDK; it goes in the
        # tools list next to the function tools (Agent has no tool_resources)
        FileSearchTool(vector_store_ids=[vector_store.id]),
    ],
    model="gpt-4o"
)

```

Document knowledge lives in OpenAI's infrastructure. User memory lives in Feather DB — local, fast, and owned by you. Neither competes for the other's role.

## Performance in production

| Operation | Latency | Notes |
|---|---|---|
| Cold start (v0.16.0, 50k vectors) | 48ms | `FEATHER_LOAD_THREADS=8` |
| ANN search p50 | 0.19ms | 500k vectors, k=10 |
| ANN search p99 | 0.13ms | 500k vectors, k=10 |
| `add_batch()` 50k vectors | ~10s | 3.4× over sequential loop |
| Sequential `add()` | ~2–5ms/call | includes embed round-trip |

The 48ms cold start means memory is fully loaded before your first `openai.chat.completions.create()` call returns. In any I/O-dominated agent loop, Feather DB is not your latency bottleneck.
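To reproduce latency figures like these on your own hardware, a small timing harness is enough. The sketch below measures any zero-argument callable and reports p50 and p99 using only the standard library. `measure_latency_ms` is a hypothetical helper, and the lambda is a placeholder workload that you would swap for something like a `db.search(...)` call.

```python
import statistics
import time
from typing import Callable

def measure_latency_ms(fn: Callable[[], object], runs: int = 200) -> dict[str, float]:
    """Time `fn` repeatedly and return p50/p99 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": statistics.median(samples), "p99": cuts[98]}

stats = measure_latency_ms(lambda: sum(range(1000)))
print(f"p50={stats['p50']:.4f}ms  p99={stats['p99']:.4f}ms")
```

Measuring the embed call and the index search separately makes it clear which part of a turn dominates.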

## What you have

- **Persistent memory across runs** — the agent remembers what it learned last session, last month, from the history import.

- **Per-user namespace isolation** — one `.feather` file, zero cross-contamination between users.

- **Adaptive decay** — preferences persist for six months (`half_life=180`); session facts fade in a week (`half_life=7`).

- **Fast history import** — `add_batch()` loads existing chat logs 3.4× faster than a sequential loop.

- **48ms cold start** — memory ready before the first API call completes.

- **Composable with OpenAI file search** — document knowledge and agent memory as separate, non-competing layers.

**Install:** `pip install feather-db openai openai-agents` · **GitHub:** [github.com/feather-store/feather](https://github.com/feather-store/feather)

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-db-openai-gpt-agent-memory](https://getfeather.store/theory/feather-db-openai-gpt-agent-memory). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*