Back to Theory
Theory7 min read · June 16, 2026

Feather DB vs ChromaDB: Which Embedded Vector DB Should You Use?

ChromaDB is the fastest way to add a vector store to your prototype. Feather DB is the right engine when that prototype becomes a production agent that needs memories which fade, strengthen, and connect over time. Here is the technical breakdown.

F
Feather DB
Engineering

The one-line version

ChromaDB is the fastest way to add a vector store to your prototype. Feather DB is the right engine when that prototype becomes a production agent that needs memories which fade, strengthen, and connect over time. This post gives you the numbers to decide which situation you are actually in.


Feature comparison

Feature Feather DB ChromaDB
Adaptive decay (memories that fade and strengthen) Yes — recall-count stickiness + half-life No
Typed graph edges + BFS traversal Yes — 9 predefined edge types, context_chain() No
Hybrid BM25 + dense search via RRF Yes — built-in, sub-millisecond overhead No (dense only by default)
MCP server support Yes — feather-serve exposes 14 MCP tools No native MCP support
int8 quantization (in-RAM) Yes — set_int8_ram(), 1.76× RAM reduction No
Cold load speed 48ms (v0.16.0, typical agent index); 1.7s at 40K vectors with parallel load Varies; no published cold-start number
Python API Yes — pybind11 C++ core Yes — native Python
Rust support Yes — feather-db-cli on crates.io No
p50 ANN latency 0.19ms (500K × 128-dim SIFT, ef=50) Not published at this scale
LangChain integration Yes (feather-db-langchain) Yes (default LangChain vector store)
Embedded (no server required) Yes — single .feather file Yes — local persistent client
Community size Early-stage, growing Large — 15k+ GitHub stars
License MIT Apache 2.0

What ChromaDB does well

ChromaDB earned its position as the default starting point for RAG prototypes because it genuinely is easy. You can go from pip install chromadb to a working vector search in under 20 lines of Python, the docs are thorough, and LangChain uses it as the out-of-the-box vector store in hundreds of tutorials. That matters when you are trying to validate an idea in an afternoon.

Three concrete strengths:

  • Familiar API. The collection.add() / collection.query() pattern is the closest thing the vector DB world has to a standard interface. If you have built anything with RAG before, ChromaDB's API is probably already in your muscle memory.
  • Large community. Stack Overflow threads, Discord support, GitHub issues with fast responses, and a large corpus of tutorials all exist for ChromaDB. If you hit an edge case at 2am, someone has probably hit it before.
  • LangChain default. For any project that starts from a LangChain example — which is most projects — ChromaDB is the zero-config vector store. You do not have to change anything to use it.

What Feather DB adds

Feather DB is built around a different hypothesis: production AI agents do not just need a vector index, they need a context engine — a store that knows which memories are still relevant, which are connected to what, and which exact terms cannot be paraphrased. ChromaDB does not attempt to solve any of those problems. Feather DB is designed around all three.

Adaptive decay: memories that evolve

The core formula lives in include/scoring.h:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_days / stickiness
recency       = 0.5 ** (effective_age / half_life)
score         = ((1 - tw) * similarity + tw * recency) * importance

Every time a memory is returned by a search and acknowledged, its recall_count increments. A memory recalled ten times at day 90 has an effective age of 37.5 days instead of 90 — it behaves as though it is recent, because it keeps being useful. A memory never recalled fades naturally. No manual curation required; the retrieval pattern becomes the memory signal.

Stickiness progression at a glance:

recall_countstickinesseffective age at day 90
01.0090.0 days
51.7950.3 days
102.4037.5 days
203.0429.6 days
503.9322.9 days

ChromaDB has no equivalent. Every document in Chroma is as "relevant" on day 90 as it was on day 1, with no signal about whether it has actually been useful to anyone.

Typed graph edges and graph traversal

Feather DB stores typed directional edges between nodes (same_ad, contradicts, relates_to, and six more). The context_chain() call runs a two-phase retrieval: vector ANN search for seed nodes at hop=0, then BFS expansion across typed edges for hop=1, hop=2. The result is a ranked list that includes both semantically similar nodes and their graph-connected context — without any manual join logic.

ChromaDB has no graph layer. Connections between documents must be maintained externally, and traversal requires application-level code.

Hybrid BM25 + dense search

Dense vector search is excellent at semantic similarity but fails on precise tokens: model names like gpt-4o-mini vs gpt-4o, version strings, user IDs, and technical acronyms all get averaged into similar embedding neighborhoods. Feather DB's hybrid search runs both HNSW ANN and BM25, fused via Reciprocal Rank Fusion (RRF). BM25 only changes the ranking when it has strong exact-match signal; otherwise it adds no noise. The overhead is typically 1.2–1.5× pure dense latency — still sub-millisecond.

ChromaDB defaults to dense-only search. There is no built-in BM25 path.

Latency and cold load

Feather DB is a C++17 core with AVX2/AVX512 SIMD distance kernels. At 500K × 128-dim SIFT vectors with ef=50, p50 latency is 0.19ms and p99 is 0.13ms. Cold load for a typical agent-scale index (v0.16.0) comes in at 48ms. With FEATHER_LOAD_THREADS=8, a 40K × 128-dim index loads in 1.7s versus 7.6s serial — a 4.7× improvement that matters for serverless functions and pod restarts.

MCP server support

feather-serve starts an HTTP server that exposes 14 MCP tools — feather_add, feather_search, feather_hybrid_search, feather_context_chain, feather_add_edge, and more — directly connectable from Claude Desktop, Cursor, or any MCP-compatible client. ChromaDB has no native MCP surface.

int8 quantization

At 60K × 768-dim, calling db.set_int8_ram("text", max_abs=1.0) drops RAM from 227MB to 129MB with recall@10 staying at ~0.88 (vs 0.972 at float32). That 1.76× reduction is meaningful on memory-constrained hosts — a 2GB VPS, a Lambda function, an edge device. ChromaDB has no equivalent in-RAM quantization option.


Code comparison: adding an item with metadata

ChromaDB

import chromadb

client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("agent_memory")

collection.add(
    ids=["mem_001"],
    embeddings=[embedding_vector],   # your pre-computed vector
    documents=["User prefers concise responses."],
    metadatas=[{
        "session_id": "sess_42",
        "importance": 0.9,
        "created_at": "2026-06-16"
    }]
)

Feather DB

import feather_db as fdb

db = fdb.DB.open("agent_memory.feather", dim=768)

meta = fdb.Metadata()
meta.importance = 0.9
meta.set_attribute("session_id", "sess_42")

db.add(
    vec=embedding_vector,
    id=1001,
    text="User prefers concise responses.",
    meta=meta
)

# Feather automatically tracks recall_count and last_recalled_at.
# At query time, this memory's score is: similarity × recency × importance.
# If it gets recalled 10 times over the next 30 days, it won't decay out.

The APIs are similar in shape. The difference is what happens after ingestion: Feather's scoring engine tracks that memory over time; Chroma treats it as a static document forever.


LongMemEval: why static retrieval falls short on long-horizon tasks

LongMemEval (Xu et al., ICLR 2025) tests 500 questions against a long conversation history — ~115K tokens, ~40 sessions, most of which are distractors. Five ability axes: information-extraction, multi-session reasoning, temporal reasoning, knowledge-updates, and abstention. It is a direct measure of whether the memory system gives the answerer the right context.

Feather DB v0.8.0 + GPT-4o scores 0.693 on LongMemEval_S — beating the paper's full-context GPT-4o ceiling of 0.640, which dumps the entire 115K-token history into the model's context window. A 10-snippet retrieval from a single .feather file delivers more useful signal than the full history, at 40× lower cost per query.

A naive vector RAG pipeline — the kind you would build with ChromaDB and a standard embedding model — scores approximately 0.31 on the same benchmark (paper-reported). The gap is not a coincidence. Static retrieval has no mechanism to handle the questions LongMemEval is designed to test:

  • Temporal reasoning questions (e.g., "what did I say three weeks ago about X?") require a system that weights recency differently than static cosine similarity does. Without temporal decay, a 90-day-old mention of X is indistinguishable from a mention from last week.
  • Knowledge-update questions require the system to prefer the most recently relevant version of a fact. Without decay, a superseded preference scores identically to the current one.
  • Multi-session reasoning is improved by graph edges that link facts across sessions — the context_chain hop=1 expansion that Feather provides and ChromaDB does not.

The 0.693 vs ~0.31 gap is the measurable cost of using a storage engine that does not model time or connection.

System Answerer LongMemEval_S
Feather DB GPT-4o 0.693
Feather DB Gemini 2.5 Flash 0.657 (~$2.40/run)
Full-context GPT-4o (paper ceiling) GPT-4o 0.640
Naive vector RAG (paper-reported) GPT-4o ~0.31

What ChromaDB is missing

This is not a criticism of ChromaDB's design goals — it was built as a straightforward vector store, not as a memory layer. But if you are evaluating it for production agent memory, three gaps matter:

  • No memory decay. A preference stated by a user 180 days ago and never mentioned again scores identically to a preference from yesterday. The agent has no signal that the older one may be stale.
  • No temporal weighting. Time is not a retrieval signal. Queries that are inherently time-anchored — "what did the user say last week?" — cannot be answered correctly without temporal metadata being manually inserted and filtered. Even then, recency is not blended with similarity; it is an either/or hard filter.
  • No typed graph edges. Documents are isolated nodes. Knowing that memory A relates to memory B, that B updates C, or that D was derived from E — all of that structure must live outside the database. Every traversal is a manual application-layer join against the metadata you stored.

When to choose each

Choose ChromaDB when

  • You are building a RAG prototype and need to ship something fast
  • Your retrieval task is purely semantic — no temporal weighting, no graph structure needed
  • You are teaching an LLM course or writing tutorial content and want the zero-friction default
  • Your team already has ChromaDB deployed and the problem does not require decay
  • You need the LangChain integration with zero configuration friction

Choose Feather DB when

  • You are building a production AI agent that needs to remember things across sessions and over time
  • Memories in your system become stale and you need them to fade naturally without manual curation
  • You need frequently accessed memories to stay "sticky" even as calendar time passes
  • Your retrieval includes exact tokens — model names, version strings, user IDs — where BM25 matters
  • You want graph edges linking memories across sessions without maintaining external join tables
  • You need MCP tools for Claude Desktop or Cursor without writing a custom server
  • You are on a memory-constrained host and need int8 quantization to fit the index in RAM
  • Latency is a primary constraint and you need a C++ HNSW core, not a Python implementation

A note on complexity

Feather DB's scoring formula has more parameters than a plain cosine similarity search. half_life, time_weight, importance, and recall_count all interact. That is intentional — agent memory is more complex than document retrieval, and the formula reflects that complexity honestly rather than hiding it. The tradeoff is that there are more knobs to tune. For a RAG prototype where all documents are equally fresh and equally important, those knobs add friction you do not need. For a long-running agent where context evolves over weeks, they are the difference between a system that actually works and one that slowly fills with stale noise.

The "SQLite vs MySQL" framing applies here too: ChromaDB is excellent for the single-machine, no-administration, get-it-running use case. Feather DB adds the features that matter when the agent runs in production and time becomes a variable in your retrieval problem.


Getting started

# Feather DB
pip install feather-db

# Start the MCP server with a real embedder
feather-serve ~/memory/agent.feather --embed-provider gemini --port 7700

Feather DB is part of Hawky.ai — AI-native development tools.