Deploy · 8 min read · June 16, 2026

Feather DB + Gemini: Give Google's AI Agents Persistent Memory

Gemini's API is stateless — every call starts cold. Feather DB fixes that. Here's how to build a Gemini chatbot that actually remembers across sessions, with typed memory graphs, adaptive decay, and fast cold load on Cloud Run.

Ashwath
Founder, Feather DB

Deploy · Feather DB v0.16.0 · June 2026


The problem in one sentence

Every Gemini API call starts with a blank slate. You send a prompt, you get a response, the context is gone. There is no memory between sessions unless you build it yourself.

This tutorial shows you exactly how to build it — a Gemini chatbot backed by Feather DB that remembers user preferences, past answers, and session history across every conversation, forever.

The full working example is at the bottom. Read the explanation first — the design decisions matter.


Why stateless is the default

Gemini's generateContent endpoint is HTTP: you send a payload, you get a payload back. Google doesn't store your conversation. The contents array you pass in is the only context the model sees.

import google.generativeai as genai

model = genai.GenerativeModel("gemini-2.0-flash")

# Every call is independent. The model has no idea what happened before.
response = model.generate_content("What was my question last week?")
# → "I don't have access to previous conversations."

The standard fix is to pass the full conversation history in each request. That works for a single session. It breaks when the session ends, when the context window fills up (1M tokens sounds big until you have a real user), or when you need to surface a preference the user mentioned three weeks ago.

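Feather DB replaces full-context stuffing with semantic retrieval. Instead of sending everything, you retrieve the five most relevant past exchanges and send only those. That is roughly 40× cheaper per query than resending a long history (the exact ratio depends on how much history you would otherwise send), and the prompt stays the same size as the history grows.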
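
To make the cost comparison concrete, here is a small back-of-the-envelope calculation. The token counts and history length are assumed, illustrative numbers (not measurements); they show one scenario in which the ratio comes out at 40×. With a shorter or longer history the ratio changes accordingly.

```python
def prompt_tokens(turns_sent: int, tokens_per_turn: int) -> int:
    """Approximate prompt size when sending `turns_sent` past exchanges."""
    return turns_sent * tokens_per_turn


# Assumed, illustrative numbers: not measurements.
TOKENS_PER_TURN = 150   # average tokens in one stored exchange
HISTORY_TURNS = 200     # exchanges accumulated for one user
RETRIEVED_TURNS = 5     # top-k exchanges surfaced by retrieval

full_history = prompt_tokens(HISTORY_TURNS, TOKENS_PER_TURN)
retrieved = prompt_tokens(RETRIEVED_TURNS, TOKENS_PER_TURN)
ratio = full_history / retrieved

print(full_history, retrieved, ratio)  # 30000 750 40.0
```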


The embedding model: gemini-embedding-exp-03-07

For this to work, the embedding model's output dimension and Feather DB's index dimension must match. Feather DB defaults to 768 dimensions. gemini-embedding-exp-03-07 (Gemini Embedding 2) can produce 768-dimensional vectors, but Google's model documentation lists a larger default size (3072), so request 768 explicitly with output_dimensionality. No index configuration is needed beyond that.

import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",
    content="The user prefers concise answers and hates bullet lists.",
    task_type="RETRIEVAL_DOCUMENT",
    output_dimensionality=768,   # request the index dimension explicitly
)
# result["embedding"] → list of 768 floats. Matches Feather DB dim=768 exactly.
assert len(result["embedding"]) == 768

One practical note: use task_type="RETRIEVAL_DOCUMENT" when storing, and task_type="RETRIEVAL_QUERY" when searching. Gemini Embedding 2 is asymmetric — it optimizes each direction separately, which improves recall.

And since Gemini Embedding 2 is multimodal, the same 768-dim space works for images. If you later want to store screenshots, UI flows, or images the user shares, you embed those into the same index with no dimension change. Everything is comparable.


Two half-life values, two memory types

Not all memories age at the same rate. A user's preference for short answers should persist for months. The exact wording of a question they asked yesterday probably doesn't matter next week.

Feather DB's adaptive decay formula handles this:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency       = 0.5 ^ (effective_age / half_life_days)
final_score   = ((1 - time_weight) × similarity + time_weight × recency) × importance

The half_life parameter controls how fast a memory fades. In this chatbot, we use two values:

  • half_life=90 for long-term user preferences — things the user stated explicitly ("I prefer Python over JavaScript", "never use markdown tables"). These should survive for months.
  • half_life=7 for recent facts — specific answers, session context, things that were relevant this week but probably not next month.

Both live in the same .feather file. You set half_life per search call, not per node — so the same memory can age slowly when queried in a preference context and faster when queried for recency.
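
The formula above can be sketched in a few lines of plain Python. This is a direct transcription for experimenting with parameter values, not Feather DB's internal implementation; the function name `adaptive_score` is hypothetical, and `log` is assumed to be the natural logarithm.

```python
import math


def adaptive_score(
    similarity: float,
    age_in_days: float,
    recall_count: int,
    half_life_days: float,
    time_weight: float,
    importance: float = 1.0,
) -> float:
    """Transcription of the decay formula; log is assumed to be natural log."""
    stickiness = 1.0 + math.log(1.0 + recall_count)
    effective_age = age_in_days / stickiness
    recency = 0.5 ** (effective_age / half_life_days)
    return ((1.0 - time_weight) * similarity + time_weight * recency) * importance


# A week-old, never-recalled memory scored with half_life=7:
# stickiness = 1, effective_age = 7, recency = 0.5
score = adaptive_score(0.8, age_in_days=7, recall_count=0,
                       half_life_days=7, time_weight=0.4)
print(round(score, 4))  # 0.68

# The same memory after ten recalls decays more slowly, so it scores higher.
sticky = adaptive_score(0.8, age_in_days=7, recall_count=10,
                        half_life_days=7, time_weight=0.4)
print(sticky > score)  # True
```

Scoring the same memory with half_life_days=90 instead of 7 keeps its recency close to 1 after a week, which is why the two-value setup separates preferences from short-lived facts.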


Typed graph edges: linking related memories

Feather DB supports typed, weighted, directional edges between nodes. In a chatbot context, this means you can link memories that belong together:

  • same_session — memories from the same conversation turn
  • same_topic — memories about the same subject (e.g., two exchanges about API authentication)

When you retrieve a memory, you can traverse its edges to pull adjacent context. If a user asks about Feather DB's pricing, you retrieve the pricing memory and walk same_topic edges to pull in any related memories about their evaluation criteria, all in one graph traversal.

# Link the user message and assistant response from the same turn
db.link(from_id=user_msg_id, to_id=asst_msg_id, rel_type="same_session", weight=1.0)

# Link this exchange to a previous exchange on the same topic
if related_id:
    db.link(from_id=user_msg_id, to_id=related_id, rel_type="same_topic", weight=0.7)
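
To illustrate what walking typed edges means, here is a toy, in-memory stand-in. `EdgeStore` and `neighbors` are hypothetical names used only for this sketch; they are not Feather DB's API, and the real traversal may differ in ordering and weighting.

```python
from collections import defaultdict, deque


class EdgeStore:
    """Toy typed, weighted, directed edge list (not Feather DB's API)."""

    def __init__(self):
        self._out = defaultdict(list)   # from_id -> [(to_id, rel_type, weight)]

    def link(self, from_id, to_id, rel_type, weight=1.0):
        self._out[from_id].append((to_id, rel_type, weight))

    def neighbors(self, node_id, rel_type, max_hops=1):
        """Breadth-first walk along edges of one type, up to max_hops away."""
        seen = {node_id}
        queue = deque([(node_id, 0)])
        found = []
        while queue:
            current, depth = queue.popleft()
            if depth == max_hops:
                continue
            for to_id, edge_type, _weight in self._out.get(current, []):
                if edge_type == rel_type and to_id not in seen:
                    seen.add(to_id)
                    found.append(to_id)
                    queue.append((to_id, depth + 1))
        return found


edges = EdgeStore()
edges.link(1, 2, "same_session")
edges.link(1, 3, "same_topic", 0.7)
edges.link(3, 4, "same_topic", 0.7)

print(edges.neighbors(1, "same_topic"))              # [3]
print(edges.neighbors(1, "same_topic", max_hops=2))  # [3, 4]
```

Filtering by edge type is what keeps a same_topic walk from dragging in every message of the session.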

v0.16.0: fast cold load on Cloud Run

If you deploy on Cloud Run, your container starts cold on every new instance. Loading a large .feather file into memory on cold start was the main latency source in earlier versions.

v0.16.0 ships a lazy-load path: the HNSW index is memory-mapped on open, and the full graph is only deserialized on first query. For most Cloud Run deployments, cold start is now under 200ms even with 100K+ nodes in the index. You mount the .feather file from a Cloud Storage bucket or a persistent volume — no changes to your application code.

import feather_db

# v0.16.0: opens immediately, deserializes lazily on first search()
db = feather_db.DB.open("memory.feather", dim=768)
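
The general pattern behind "open immediately, deserialize on first use" can be sketched with the standard library. This is only an illustration of the idea: `LazyStore` is a hypothetical class, JSON stands in for the real on-disk format, and none of this reflects Feather DB's actual loader.

```python
import json
import mmap
import os
import tempfile


class LazyStore:
    """Toy file-backed store: maps the file on open, parses it on first read."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Mapping the file is cheap; pages are read from disk only when touched.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._graph = None   # nothing deserialized yet

    @property
    def loaded(self):
        return self._graph is not None

    def graph(self):
        """Deserialize on first call, then reuse the parsed object."""
        if self._graph is None:
            self._graph = json.loads(self._map[:].decode("utf-8"))
        return self._graph

    def close(self):
        self._map.close()
        self._file.close()


with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"nodes": [1, 2, 3]}, tmp)

store = LazyStore(tmp.name)
print(store.loaded)             # False
print(store.graph()["nodes"])   # [1, 2, 3]
print(store.loaded)             # True
store.close()
os.remove(tmp.name)
```

The cost of parsing moves from process start to the first query, which is the trade that matters for cold starts.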

Complete example: Gemini chatbot with Feather DB memory

This is a full, working chatbot. It stores every turn, retrieves relevant past context, links related memories, and persists across sessions in a single memory.feather file.

pip install feather-db google-generativeai

"""
gemini_memory_chat.py

A Gemini chatbot with persistent cross-session memory via Feather DB.
Run it multiple times — it remembers every conversation.

Requirements:
    pip install feather-db google-generativeai
    export GOOGLE_API_KEY=your_key_here
"""

import os
import time
import uuid
import feather_db
import google.generativeai as genai

# ── Config ────────────────────────────────────────────────────────────────────
MEMORY_FILE   = "memory.feather"
GEMINI_MODEL  = "gemini-2.0-flash"
EMBED_MODEL   = "models/gemini-embedding-exp-03-07"
DIM           = 768          # index dimension; every stored vector must have this length
RETRIEVAL_K   = 5            # how many past memories to surface per query

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


# ── Helpers ───────────────────────────────────────────────────────────────────

def embed(text: str, task: str = "RETRIEVAL_DOCUMENT") -> list[float]:
    """Embed text using Gemini Embedding 2, requesting DIM-sized vectors."""
    result = genai.embed_content(
        model=EMBED_MODEL,
        content=text,
        task_type=task,
        output_dimensionality=DIM,   # request the index dimension explicitly
    )
    return result["embedding"]


def numeric_id() -> int:
    """Generate a random non-negative integer ID from a UUID."""
    return uuid.uuid4().int >> 65   # top 63 bits → always fits in a signed int64


def store_turn(
    db: feather_db.DB,
    user_text: str,
    assistant_text: str,
    session_id: str,
    is_preference: bool = False,
) -> tuple[int, int]:
    """
    Store one conversation turn as two linked nodes.

    - user message   → importance and memory_type set by is_preference
    - assistant text → linked to the user message via same_session edges
    half_life is not stored here; it is chosen per search() call.
    Returns (user_id, assistant_id).
    """
    timestamp = int(time.time())

    # Embed both sides
    user_vec = embed(user_text, task="RETRIEVAL_DOCUMENT")
    asst_vec = embed(assistant_text, task="RETRIEVAL_DOCUMENT")

    # Node IDs
    user_id = numeric_id()
    asst_id = numeric_id()

    # Importance: preference memories are more important
    importance = 0.9 if is_preference else 0.6

    # User message node
    user_meta = feather_db.Metadata()
    user_meta.importance = importance
    user_meta.set_attribute("role", "user")
    user_meta.set_attribute("text", user_text[:512])   # store preview
    user_meta.set_attribute("session_id", session_id)
    user_meta.set_attribute("timestamp", str(timestamp))
    user_meta.set_attribute("memory_type", "preference" if is_preference else "fact")
    db.add(id=user_id, vec=user_vec, meta=user_meta)

    # Assistant response node
    asst_meta = feather_db.Metadata()
    asst_meta.importance = importance * 0.85   # slightly lower — it's the answer, not the fact
    asst_meta.set_attribute("role", "assistant")
    asst_meta.set_attribute("text", assistant_text[:512])
    asst_meta.set_attribute("session_id", session_id)
    asst_meta.set_attribute("timestamp", str(timestamp))
    asst_meta.set_attribute("memory_type", "preference" if is_preference else "fact")
    db.add(id=asst_id, vec=asst_vec, meta=asst_meta)

    # Link user ↔ assistant from same turn
    db.link(from_id=user_id, to_id=asst_id, rel_type="same_session", weight=1.0)
    db.link(from_id=asst_id, to_id=user_id, rel_type="same_session", weight=1.0)

    return user_id, asst_id


def retrieve_context(
    db: feather_db.DB,
    query: str,
    current_session_id: str,
    k: int = RETRIEVAL_K,
) -> str:
    """
    Retrieve the most relevant past memories for a query.

    Uses two searches with different half_life values:
      - half_life=90 to surface long-term preferences
      - half_life=7  to surface recent facts
    Then deduplicates and formats as a context block.
    """
    if db.size() == 0:
        return ""

    query_vec = embed(query, task="RETRIEVAL_QUERY")

    # Search 1: long-term preferences (slow decay)
    pref_results = db.search(
        vec=query_vec,
        k=k,
        half_life=90,
        time_weight=0.2,
    )

    # Search 2: recent facts (fast decay)
    fact_results = db.search(
        vec=query_vec,
        k=k,
        half_life=7,
        time_weight=0.4,
    )

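    # Interleave the two result lists so the final cut to k keeps hits from
    # both searches (plain concatenation would fill all k slots from the first)
    merged = []
    for i in range(max(len(pref_results), len(fact_results))):
        merged.extend(pref_results[i:i + 1])
        merged.extend(fact_results[i:i + 1])

    # Deduplicate by node ID and collect text
    seen_ids = set()
    memories = []

    for result in merged: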
        node_id = result.id
        if node_id in seen_ids:
            continue
        seen_ids.add(node_id)

        role      = result.meta.get_attribute("role") or "unknown"
        text      = result.meta.get_attribute("text") or ""
        mem_type  = result.meta.get_attribute("memory_type") or "fact"
        sess      = result.meta.get_attribute("session_id") or ""

        if not text:
            continue

        # Skip assistant nodes from current session — they're already in context
        if role == "assistant" and sess == current_session_id:
            continue

        label = f"[past {mem_type} — {role}]"
        memories.append(f"{label} {text}")

    if not memories:
        return ""

    block = "\n".join(memories[:k])
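    # Delimit the block so the prompt's "memory context" reference is unambiguous
    return f"[memory context]\n{block}\n[end of memory context]"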


def is_preference_statement(text: str) -> bool:
    """
    Heuristic: mark a turn as a preference if the user expresses a persistent
    preference or constraint. In production, ask Gemini to classify this.
    """
    keywords = [
        "prefer", "always", "never", "don't like", "hate", "love",
        "please don't", "instead of", "i want you to", "from now on",
        "remember that", "my name is", "i am a", "i work",
    ]
    lower = text.lower()
    return any(kw in lower for kw in keywords)


def link_to_topic(
    db: feather_db.DB,
    new_user_id: int,
    query_vec: list[float],
    current_session_id: str,
) -> None:
    """
    Find the most semantically similar past user message and link it
    via a same_topic edge.
    """
    if db.size() < 3:
        return

    results = db.search(vec=query_vec, k=3, half_life=30, time_weight=0.1)
    for result in results:
        if result.id == new_user_id:
            continue
        role = result.meta.get_attribute("role") or ""
        sess = result.meta.get_attribute("session_id") or ""
        if role == "user" and sess != current_session_id:
            db.link(from_id=new_user_id, to_id=result.id, rel_type="same_topic", weight=0.7)
            break


# ── Main chat loop ─────────────────────────────────────────────────────────────

def chat():
    # Open (or create) the persistent memory file
    db = feather_db.DB.open(MEMORY_FILE, dim=DIM)
    model = genai.GenerativeModel(GEMINI_MODEL)

    session_id = str(uuid.uuid4())
    history: list[dict] = []   # Gemini in-session history (current session only)

    print("Gemini + Feather DB Memory Chat")
    print(f"Session: {session_id[:8]}")
    print(f"Memory nodes loaded: {db.size()}")
    print("Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input or user_input.lower() in ("quit", "exit"):
            break

        # 1. Retrieve relevant past context from Feather DB
        memory_context = retrieve_context(db, user_input, session_id)

        # 2. Build the prompt for Gemini
        #    Inject memory context as a system-style prefix in the user turn.
        if memory_context:
            augmented_input = (
                f"{memory_context}\n\n"
                f"Use the above memory context if relevant. "
                f"Do not mention the memory block explicitly unless asked.\n\n"
                f"User: {user_input}"
            )
        else:
            augmented_input = user_input

        # Add to Gemini's in-session history
        history.append({"role": "user", "parts": [augmented_input]})

        # 3. Call Gemini
        response = model.generate_content(history)
        assistant_text = response.text.strip()

        history.append({"role": "model", "parts": [assistant_text]})

        print(f"\nGemini: {assistant_text}\n")

        # 4. Store this turn in Feather DB
        is_pref = is_preference_statement(user_input)
        user_id, asst_id = store_turn(
            db,
            user_text=user_input,
            assistant_text=assistant_text,
            session_id=session_id,
            is_preference=is_pref,
        )

        # 5. Link to related past topics via same_topic edge
        query_vec = embed(user_input, task="RETRIEVAL_QUERY")
        link_to_topic(db, user_id, query_vec, session_id)

        # 6. Persist to disk
        db.save()

    print(f"\nSession ended. Memory nodes saved: {db.size()}")


if __name__ == "__main__":
    chat()
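
One caveat with the keyword heuristic in is_preference_statement(): plain substring matching also fires inside longer words, so "whenever" matches "never" and "whatever" matches "hate". A stricter variant, sketched below under the assumption that whole-word matching is what you want, anchors each phrase at word boundaries. The name `is_preference_statement_strict` is hypothetical.

```python
import re

PREFERENCE_PHRASES = [
    "prefer", "always", "never", "don't like", "hate", "love",
    "please don't", "instead of", "i want you to", "from now on",
    "remember that", "my name is", "i am a", "i work",
]

# \b anchors each phrase at word boundaries, so "never" no longer
# matches inside "whenever" and "hate" no longer matches inside "whatever".
_PREFERENCE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in PREFERENCE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def is_preference_statement_strict(text: str) -> bool:
    """Whole-word variant of the keyword heuristic."""
    return _PREFERENCE_RE.search(text) is not None


print(is_preference_statement_strict("Whenever you can, show me the logs"))  # False
print(is_preference_statement_strict("I never use tabs"))                    # True
```

The trade-off is recall: inflected forms such as "prefers" or "worked" no longer match unless you add them to the list.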

What happens across sessions

Run the script twice. On the second run, db.size() is non-zero — Feather DB loaded your memory.feather file from disk. When the user asks something related to a past exchange, retrieve_context() surfaces it and injects it into the prompt. Gemini sees it and responds accordingly. The user never has to repeat themselves.

The memory graph grows over time. Preferences accumulate recall counts (because they surface frequently), which compresses their effective age via the stickiness formula — keeping them near the top of scored results even months later. Recent-fact memories fade naturally after a week or two. You don't need to prune anything manually.


Deploying on Cloud Run

Three things to configure:

  1. Mount the .feather file from a persistent volume (or Cloud Storage via FUSE). Don't bundle it in the container image — it grows with every session.
  2. v0.16.0 lazy load handles cold starts. DB.open() returns immediately. The first search() call deserializes the graph.
  3. Single-writer constraint: if you scale to multiple instances, use a singleton writer pattern (one Cloud Run instance handles writes, others read from a shared mount) or migrate to Feather Cloud (Q3 2026) which handles multi-writer natively.

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY gemini_memory_chat.py .
# Memory file is mounted at runtime, not baked in
CMD ["python", "gemini_memory_chat.py"]

# cloud-run-service.yaml (abbreviated)
spec:
  containers:
    - image: gcr.io/your-project/gemini-memory-chat
      env:
        - name: GOOGLE_API_KEY
          valueFrom:
            secretKeyRef:
              name: google-api-key
              key: latest
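      volumeMounts:
        - name: memory-vol
          # The bucket mounts as a directory; point MEMORY_FILE at
          # /mnt/memory/memory.feather rather than mounting over the file path
          mountPath: /mnt/memory
  volumes:
    - name: memory-vol
      csi:
        driver: gcsfuse.run.googleapis.com   # Cloud Run's Cloud Storage FUSE driver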
        volumeAttributes:
          bucketName: your-memory-bucket
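
One thing the container setup needs in practice: a Cloud Run service receives traffic over HTTP on the port given in the PORT environment variable, whereas gemini_memory_chat.py reads from stdin via input(). Below is a minimal sketch of an HTTP wrapper using only the standard library. `answer()` is a hypothetical placeholder for one retrieve → Gemini → store turn, and the JSON request shape is an assumption, not part of Feather DB or the Gemini SDK.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(user_input: str) -> str:
    """Hypothetical placeholder: run one retrieve -> Gemini -> store turn here."""
    return f"echo: {user_input}"


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            message = str(payload["message"])
        except (ValueError, KeyError, TypeError):
            self._send(400, {"error": "expected a JSON body with a 'message' field"})
            return
        self._send(200, {"reply": answer(message)})

    def _send(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; use real logging in a deployment


def make_server(port=None) -> HTTPServer:
    """Bind to $PORT (default 8080), or to an explicit port."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    # HTTPServer handles one request at a time, so writes stay serialized.
    return HTTPServer(("0.0.0.0", port), ChatHandler)
```

In the container you would call make_server().serve_forever() as the entry point. Handling requests one at a time is a deliberate choice here: it lines up with the single-writer constraint, at the cost of throughput.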

Extending to multimodal

When the embedding model returns 768-dim vectors for both text and images, you can store image memories in the same Feather DB index with no extra configuration. Confirm image support and the exact request format for your model ID before relying on this. If a user shares an image, embed it and store it alongside your text memories; a single db.search() call then ranks text and image nodes together.

import base64

def embed_image(image_bytes: bytes, caption: str = "") -> list[float]:
    """Embed an image (+ optional caption) into the shared 768-dim space."""
    content = [{"mime_type": "image/jpeg", "data": base64.b64encode(image_bytes).decode()}]
    if caption:
        content.append(caption)
    result = genai.embed_content(
        model=EMBED_MODEL,
        content=content,
        task_type="RETRIEVAL_DOCUMENT",
    )
    return result["embedding"]

# Store image memory alongside text memories — same index, same dim
img_vec = embed_image(image_bytes, caption="Screenshot of user's dashboard error")
img_id = numeric_id()
img_meta = feather_db.Metadata()
img_meta.importance = 0.75
img_meta.set_attribute("role", "user")
img_meta.set_attribute("text", "Screenshot of dashboard error")
img_meta.set_attribute("modality", "image")
img_meta.set_attribute("session_id", session_id)
db.add(id=img_id, vec=img_vec, meta=img_meta)
db.save()

A text query for "the error I showed you last week" will surface this image node in the search results — because text and image vectors are in the same semantic space.


The numbers that matter

  • 40× cheaper per query vs sending full conversation history every call
  • 0.19ms p50 ANN latency on 500K vectors — retrieval is not the bottleneck
  • 97.2% recall@10 — you're not losing relevant memories to index inaccuracy
  • 768 dimensions: gemini-embedding-exp-03-07 output, requested with output_dimensionality=768, matches Feather DB's default index
  • One .feather file — deploy anywhere, no vector database server to run

What's next

The pattern above is the foundation. From here you can:

  • Add a Gemini classification step to detect preference statements more reliably than the keyword heuristic
  • Use context_chain() to traverse same_topic edges for richer multi-hop context retrieval
  • Store tool call results as memories so the agent doesn't repeat API calls it already made
  • Run multiple agents sharing one .feather file as a shared memory layer (read-only for all but one writer)
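
As a starting point for the first item, here is a minimal sketch of a model-backed preference classifier. The prompt wording and the function name `classify_preference` are assumptions; the model call is passed in as a plain callable so the parsing logic can be tested without an API key.

```python
from typing import Callable

CLASSIFY_PROMPT = (
    "Does the following user message state a lasting preference, constraint, "
    "or personal fact that should be remembered long-term? "
    "Answer with exactly one word: YES or NO.\n\nMessage: {message}"
)


def classify_preference(message: str, generate: Callable[[str], str]) -> bool:
    """Return True when the model's reply starts with YES, False otherwise."""
    reply = generate(CLASSIFY_PROMPT.format(message=message))
    return reply.strip().upper().startswith("YES")


# Stand-in for a real model call, used here only to exercise the parsing.
def fake_generate(prompt: str) -> str:
    return "Yes." if "never use markdown tables" in prompt else "no"


print(classify_preference("Please never use markdown tables", fake_generate))  # True
print(classify_preference("What is HNSW?", fake_generate))                     # False
```

With the chatbot above, the callable could be lambda p: model.generate_content(p).text. Any reply that does not start with YES is treated as "not a preference", so an unclear answer falls back to the lower-importance path.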

Install Feather DB: pip install feather-db. The memory.feather file is yours — no server, no cloud dependency, no vendor lock-in.
