Deploy · 10 min read · June 16, 2026

Feather DB in Production: Deployment Patterns and Best Practices

Embedded, feather-serve daemon, or remote MCP — Feather DB runs wherever you need it. This guide covers namespace design, file management, compaction, memory optimization, cold-start tuning, bulk ingestion, Docker, monitoring, multi-tenant patterns, and disaster recovery.

Feather DB
Engineering

Deployment modes

Feather DB ships three deployment modes. Pick one based on where your embedding logic lives and who needs to reach the index.

Embedded (single process)

The default mode. Your Python process opens a .feather file directly — no network hop, no daemon, sub-millisecond search latency. Use this for single-service backends, batch pipelines, and local development.

import feather_db as fdb

db = fdb.DB.open("memory.feather", dim=768)
db.add(vec, text="Alice prefers dark mode.", namespace="user-alice")
results = db.search(query_vec, k=5, namespace="user-alice")
db.save()

The file is created on first open and grown incrementally. You own the process lifecycle — save before exit.
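Since saving is your job in embedded mode, one option is to hook save() into interpreter shutdown. The helper below is a hypothetical sketch, not part of feather_db. It only assumes the db object has a save() method. It registers that method with atexit and turns SIGTERM (the signal docker stop and Kubernetes send) into a normal exit, so the atexit hook still runs.

```python
import atexit
import signal
import sys
from typing import Protocol


class Saveable(Protocol):
    """Anything with a save() method, e.g. an open feather_db DB."""

    def save(self) -> None: ...


def install_shutdown_save(db: Saveable) -> None:
    """Save `db` on normal interpreter exit and on SIGTERM.

    Must be called from the main thread (a signal.signal requirement).
    """
    # atexit runs on normal exit and on sys.exit(), but never on SIGKILL.
    atexit.register(db.save)

    def _on_sigterm(signum, frame):
        # The default SIGTERM action kills the process without running atexit
        # hooks; raising SystemExit instead unwinds normally so db.save() runs.
        sys.exit(0)

    signal.signal(signal.SIGTERM, _on_sigterm)


# Usage (hypothetical): db = fdb.DB.open("memory.feather", dim=768)
#                        install_shutdown_save(db)
```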

feather-serve (local daemon)

Run feather-serve as a long-lived process. It exposes a REST API at /api/v1/, an MCP endpoint at /mcp, and the Atlas admin SPA at /admin/. Other services call HTTP instead of linking the library directly. Use this when multiple processes share one context store, or when you want the admin SPA for inspection.

GOOGLE_API_KEY=your-key feather-serve memory.feather \
  --embed-provider gemini --dim 768 --port 7700

With --embed-provider, feather-serve handles embeddings server-side. Clients send raw text; Feather returns ranked memories. No embedding pipeline in your application code.

Remote MCP (network)

Connect Claude Desktop, Claude Code, or any MCP-compatible agent to a running feather-serve instance. The agent calls feather_search, feather_add, feather_context_chain, and 11 other tools as native MCP calls. Use this for persistent agent memory across conversations.

// claude_desktop_config.json
{
  "mcpServers": {
    "feather-memory": {
      "url": "http://localhost:7700/mcp"
    }
  }
}
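If the config file already lists other MCP servers, editing it by hand risks clobbering them. A small merge helper like the one below adds or updates only the feather-memory entry and keeps the file valid JSON. It is a hypothetical script, not something shipped with Feather. The location of claude_desktop_config.json differs by OS, so the helper takes the path as an argument.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, url: str) -> dict:
    """Add or update one entry under "mcpServers", preserving all others."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"url": url}  # same shape as the example config above
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Usage: add_mcp_server(Path("claude_desktop_config.json"),
#                       "feather-memory", "http://localhost:7700/mcp")
```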

The three modes compose. In production you might run feather-serve in Docker (mode 2). Your Python API then calls its REST endpoints instead of opening the file itself, and Claude connects over MCP (mode 3).

Namespace design

Namespaces are hard tenant boundaries. Memories in "user-alice" are completely invisible to searches in "user-bob". Search latency scales with the number of memories in that namespace, not across all tenants — which is what makes a shared file practical at scale.
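The toy in-memory model below shows why namespace scoping gives both isolation and latency that tracks tenant size. It uses brute-force cosine search, not HNSW, and is not Feather's internals. Vectors live in a dict keyed by namespace, so a search only ever scans that one namespace's list.

```python
import math
from collections import defaultdict


class ToyNamespacedIndex:
    """Toy model of namespace scoping: one vector list per namespace."""

    def __init__(self) -> None:
        self._by_ns: dict[str, list[tuple[list[float], str]]] = defaultdict(list)

    def add(self, vec: list[float], text: str, namespace: str) -> None:
        self._by_ns[namespace].append((vec, text))

    def search(self, query: list[float], k: int, namespace: str) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        # Only this namespace's entries are scanned: work is O(n_namespace),
        # and other tenants' memories can never appear in the results.
        entries = self._by_ns.get(namespace, [])
        ranked = sorted(entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```

Searching "user-alice" here never touches "user-bob" entries, however many Bob has. A real HNSW index changes the per-namespace cost from linear to roughly logarithmic, but the scoping idea is the same.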

The standard pattern:

  • namespace = tenant boundary — user_id, agent_id, org_id
  • entity = topic group within a namespace — "preferences", "work-context", "current-project"
  • attributes = secondary metadata — type, source, created_at, confidence

import feather_db as fdb
from datetime import datetime, timezone
from typing import Optional

db = fdb.DB.open("saas_memory.feather", dim=768)

# embed(text) is your own embedding function returning a 768-dim vector;
# it is not part of feather_db.

def add_memory(user_id: str, text: str, category: str,
               mem_type: str, importance: float = 1.0, vec=None):
    if vec is None:
        vec = embed(text)
    # namespace = tenant boundary, entity = topic group within it
    mem = db.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    mem.meta.set_attribute("created_at", datetime.now(timezone.utc).isoformat())
    return mem

def search_memory(user_id: str, query: str,
                  category: Optional[str] = None, k: int = 5):
    return db.search(embed(query), k=k,
                     namespace=user_id, entity=category)

# One tenant per user_id — strict isolation, zero cross-contamination
add_memory("user-42", "Prefers Python for backend, TypeScript for frontend.",
           category="preferences", mem_type="fact", importance=1.2)

add_memory("user-42", "Building a fintech SaaS, Series A in Q3.",
           category="work-context", mem_type="fact", importance=1.5)

results = search_memory("user-42", "What stack does this user prefer?")

Agent roles get their own namespaces too. A planner agent and a coder agent operating on the same codebase should have separate namespaces — "agent-planner" and "agent-coder" — so their context stores don't bleed into each other's retrieval.
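Keeping namespace strings consistent matters: writing to a mistyped namespace would quietly create a new, empty tenant. One way to prevent that is a single helper that builds every namespace. The prefixes below follow this guide's conventions. The allowed character set is an assumption, not a rule Feather enforces.

```python
import re

# Assumed convention: lowercase letters, digits, and hyphens only.
_SAFE_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")
_PREFIXES = {"user", "agent", "org"}


def namespace_for(kind: str, ident: str) -> str:
    """Build a namespace like "user-42" or "agent-planner", rejecting typos early."""
    if kind not in _PREFIXES:
        raise ValueError(f"unknown namespace kind: {kind!r}")
    ident = ident.strip().lower()
    if not _SAFE_ID.fullmatch(ident):
        raise ValueError(f"invalid namespace id: {ident!r}")
    return f"{kind}-{ident}"
```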

File management

Where to store .feather files

Keep .feather files on durable, fast-seek storage: SSD-backed volumes or network-attached storage with high IOPS. The file is read in full at startup and written atomically on db.save(); before v0.16.0, startup also reconstructs the HNSW graph. Read IOPS matter at startup; write latency matters at save time.

Recommended layout for a production service:

/data/feather/
  production.feather        # primary store
  production.feather.bak    # last manual snapshot
  staging.feather           # staging environment

In Docker, always use a named volume — never a bind mount to a temp directory:

volumes:
  feather-data:
    driver: local

services:
  feather-api:
    volumes:
      - feather-data:/data

Backup strategy

The .feather file is self-contained: vectors, HNSW graph, metadata, edges, and namespace index are all in one binary. A copy is a backup.

# Snapshot before a risky operation (migration, bulk delete)
cp /data/feather/production.feather \
   /data/feather/production.$(date +%Y%m%d-%H%M%S).bak

# Restore is equally simple
cp /data/feather/production.20260616-090000.bak \
   /data/feather/production.feather

For scheduled backups, copy the file to object storage (S3, GCS) nightly. Because db.save() writes atomically to a temp file then renames, a concurrent copy during a save will always get a consistent snapshot — either the old file or the new one, never a partial write.
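The temp-file-then-rename technique described above is a general POSIX pattern. The sketch below shows it in plain Python for files you write yourself, such as sidecar manifests or exported snapshots. It illustrates the technique only; it is not Feather's save() code.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```

A reader that already opened the old file keeps reading the old contents, which is why a concurrent backup copy stays consistent.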

# Nightly backup cron (crontab entries must be a single line, and % must be escaped as \%)
0 2 * * * aws s3 cp /data/feather/production.feather s3://your-bucket/backups/production.$(date +\%Y\%m\%d).feather
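Local timestamped snapshots pile up quickly. A hypothetical helper like the one below takes a snapshot using the same naming scheme as the cp command above, then keeps only the newest N. The keep=7 default is an assumption to adapt. The pattern match deliberately skips a manual production.feather.bak.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_and_prune(src: Path, keep: int = 7) -> Path:
    """Copy src to <stem>.<YYYYmmdd-HHMMSS>.bak beside it; keep the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be >= 1")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.stem}.{stamp}.bak")
    shutil.copy2(src, dest)  # copy2 also preserves the modification time

    # Match only timestamped snapshots, so a manual production.feather.bak survives.
    pattern = re.compile(rf"{re.escape(src.stem)}\.\d{{8}}-\d{{6}}\.bak")
    # Fixed-width timestamps sort lexically in chronological order.
    snapshots = sorted(p for p in src.parent.iterdir() if pattern.fullmatch(p.name))
    for old in snapshots[:-keep]:
        old.unlink()
    return dest
```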

Compaction

Feather's HNSW graph accumulates soft-deleted nodes when you call forget() or purge(). These nodes no longer appear in search results, but they still occupy space in the graph and slow traversal slightly. compact() removes them and rebuilds the HNSW graph tight. Save afterwards so the next restart loads a smaller, faster file.

db.compact()   # drop soft-deleted nodes, rebuild the HNSW graph
db.save()      # persist the compacted state to the .feather file

When to compact:

  • After a bulk forget() or purge() that removes more than ~10% of nodes
  • After onboarding data migrations where you replaced old records
  • Before a disaster recovery restore — compact the source file first so the restore loads a tight index
  • On a weekly schedule for stores with frequent deletes
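
import time

import schedule  # third-party: pip install schedule
import feather_db as fdb

db = fdb.DB.open("memory.feather", dim=768)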

def weekly_compact():
    db.purge(namespace="*", older_than_days=90)  # evict stale memories
    db.compact()
    db.save()
    print(f"Compacted. Vectors remaining: {db.count()}")

schedule.every().monday.at("03:00").do(weekly_compact)

while True:
    schedule.run_pending()
    time.sleep(60)

Compaction is also the fastest way to reduce file size before shipping a snapshot to a new environment.
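The ~10% rule of thumb can be encoded as a small guard, so a scheduled job only pays for a rebuild when enough has been deleted. This guide does not describe an API that reports soft-deleted nodes. The sketch therefore takes the deleted count as a number you track yourself, for example the drop in db.count() across your purge calls. The maintenance job later in this guide computes that drop the same way.

```python
def should_compact(deleted_since_compact: int, live_count: int,
                   threshold: float = 0.10) -> bool:
    """True when soft-deleted nodes exceed `threshold` of the graph.

    Until compaction the graph still holds live + deleted nodes, so the
    deleted fraction is measured against their sum.
    """
    total_nodes = live_count + deleted_since_compact
    if total_nodes == 0:
        return False
    return deleted_since_compact / total_nodes > threshold
```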

Memory management

Adaptive capacity (v0.15.3)

v0.15.3 ships adaptive HNSW capacity: the index grows incrementally instead of pre-allocating max_elements upfront. For typical deployments that start small and grow over weeks, this delivers 7.7× less RAM at startup compared to pre-allocating for 1M elements on an empty index. The change is automatic — no config required.

int8 RAM quantization

For memory-constrained hosts (1–2 GB VPS, edge devices, Lambda), enable in-RAM int8 quantization after load:

db = fdb.DB.open("memory.feather", dim=768)
db.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88 vs 0.972

At 60k × 768-dim float32, RAM drops from 227 MB to 129 MB and recall@10 moves from 0.972 to ~0.88. For context retrieval, which surfaces 5–10 relevant memories per query, that loss is usually acceptable.

Mode                RAM (60k × 768-dim)   Recall@10
float32 (default)   227 MB                0.972
int8 in-RAM         129 MB                ~0.88

Stick with float32 when running precision benchmarks, when RAM is not a constraint, or when your index is under 20k vectors and there's no reason to trade recall.
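The trade-off comes from storing each component in one byte instead of four. Feather's exact scheme is not documented here. The sketch below shows generic symmetric int8 quantization with a fixed max_abs clip, assuming components lie in [-1, 1] as max_abs=1.0 suggests. That range holds for unit-normalized embeddings. The sketch also shows the raw-vector arithmetic. Raw vector storage shrinks 4×, but the measured total above drops only 1.76×, presumably because the graph and metadata are not quantized.

```python
def quantize_int8(vec: list[float], max_abs: float = 1.0) -> list[int]:
    """Symmetric int8 quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    scale = 127.0 / max_abs
    # Clip first so out-of-range components saturate instead of overflowing.
    return [round(max(-max_abs, min(max_abs, x)) * scale) for x in vec]


def dequantize_int8(codes: list[int], max_abs: float = 1.0) -> list[float]:
    return [c * max_abs / 127.0 for c in codes]


def raw_vector_mb(n: int, dim: int, bytes_per_component: int) -> float:
    """Vector payload only; the HNSW graph and metadata add more on top."""
    return n * dim * bytes_per_component / 1e6


# 60k x 768-dim: 184.32 MB of float32 components vs 46.08 MB as int8.
print(raw_vector_mb(60_000, 768, 4), raw_vector_mb(60_000, 768, 1))
```

Each component's round-trip error is at most half a quantization step (max_abs / 254). That small per-component error is the source of the recall drop in the table.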

Cold start: persisted HNSW (v0.16.0)

v0.16.0 ships persisted HNSW graph state. The HNSW graph is stored in a ready-to-load binary layout inside the .feather file — no reconstruction on startup. Cold start at 500k vectors drops from 2.7s to 48ms.

This is the highest-impact change for serverless deployments and Kubernetes pods with frequent restarts. Before v0.16.0, parallel load via FEATHER_LOAD_THREADS was the primary lever:

import os
import feather_db as fdb

# v0.15.x: parallel graph reconstruction (4.7× faster than serial)
os.environ["FEATHER_LOAD_THREADS"] = "8"
db = fdb.DB.open("memory.feather", dim=768)

With v0.16.0, FEATHER_LOAD_THREADS is still respected during initial file creation and explicit rebuilds, but routine opens skip reconstruction entirely. Set it in your environment regardless — it handles fallback cases.

Version                            Cold start (500k vectors)   Notes
v0.15.x serial                     ~2.7s                       Single-threaded reconstruction
v0.15.x + FEATHER_LOAD_THREADS=8   ~0.6s                       Parallel reconstruction, 4.7×
v0.16.0                            48ms                        Persisted graph, no reconstruction

# Dockerfile — set regardless of version
ENV FEATHER_LOAD_THREADS=8
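Hard-coding 8 threads can oversubscribe small containers. A hedged alternative for Python services is to derive a default from the CPUs the process may use, while still letting an explicit setting win. This assumes Feather reads FEATHER_LOAD_THREADS when DB.open() is called, as the v0.15.x example above implies. Neither os.cpu_count() nor CPU affinity reflects a container's CPU quota, so cap the value explicitly in tightly limited containers.

```python
import os


def default_load_threads(cap: int = 8) -> int:
    """CPUs this process may run on, capped at `cap` (8 matches the guide)."""
    try:
        available = len(os.sched_getaffinity(0))  # Linux: respects CPU pinning
    except AttributeError:
        available = os.cpu_count() or 1  # macOS/Windows fallback
    return max(1, min(cap, available))


# setdefault keeps any value already set by a Dockerfile ENV or the shell.
os.environ.setdefault("FEATHER_LOAD_THREADS", str(default_load_threads()))
# ...then import feather_db and call fdb.DB.open(...) as usual.
```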

Bulk ingestion with add_batch()

For ingesting more than ~1k vectors at once — corpus imports, historical data seeding, document chunking pipelines — use add_batch(). It builds the HNSW graph in parallel with the GIL released: 3.4× faster than a sequential loop on a 4-core machine, ~5–6× on an 8-core machine.

import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

# Prepare vectors and metadata in bulk
# (load_your_documents, embed_batch and load_importance_scores are your own helpers)
texts  = load_your_documents()          # list of strings
vecs   = embed_batch(texts)             # np.ndarray shape (N, 768), float32
scores = load_importance_scores(texts)  # np.ndarray shape (N,)

metas = []
for i, score in enumerate(scores):
    m = fdb.Metadata(importance=float(min(1.0, score)))  # cap importance at 1.0
    m.set_attribute("source", "batch_import_2026")
    m.set_attribute("doc_id", str(i))
    metas.append(m)

# Reusing an existing ID would collide; offset these if the DB already holds vectors
ids = list(range(len(texts)))

# Single parallel call — no Python loop overhead
db.add_batch(ids, vecs, metas=metas)
db.save()

print(f"Ingested {db.count()} vectors")

Use add() for real-time single-item inserts (one memory after each conversation turn). Use add_batch() for everything else. The crossover point is roughly 1,000 items — below that the overhead isn't worth it; above it the speedup compounds.
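For corpora too large to embed in one go, one approach is to stream the import in chunks and feed each chunk to add_batch(). The generator below is a generic sketch. It yields (ids, texts) chunks whose IDs continue from a starting offset; pick that offset so it cannot collide with IDs already in the file. The chunk size of 5,000 is an arbitrary assumption, chosen to sit well above the ~1k crossover.

```python
from itertools import islice
from typing import Iterable, Iterator


def id_chunks(texts: Iterable[str], start_id: int,
              chunk_size: int = 5_000) -> Iterator[tuple[list[int], list[str]]]:
    """Yield (ids, texts) chunks with contiguous, non-overlapping IDs."""
    it = iter(texts)
    next_id = start_id
    while chunk := list(islice(it, chunk_size)):
        ids = list(range(next_id, next_id + len(chunk)))
        next_id += len(chunk)
        yield ids, chunk


# Per chunk (embed_batch and build_metas are your own helpers):
# for ids, batch in id_chunks(load_your_documents(), start_id=next_free_id):
#     db.add_batch(ids, embed_batch(batch), metas=build_metas(batch))
# db.save()
```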

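Important: always use meta.set_attribute(key, value), not meta.attributes[key] = value. The dict accessor silently does nothing. Because of pybind11 copy semantics, meta.attributes returns a copy, so the assignment modifies a temporary that is immediately discarded.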
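The failure mode is easy to reproduce in plain Python. The class below is only an analogy for a binding that returns attributes by value; it is not Feather's Metadata class. The property hands back a fresh dict on every access, so item assignment lands on a throwaway copy. A setter method, by contrast, writes to the real storage.

```python
class CopyingMetadata:
    """Analogy for a C++ map exposed by value: every read returns a copy."""

    def __init__(self) -> None:
        self._attrs: dict[str, str] = {}

    @property
    def attributes(self) -> dict[str, str]:
        return dict(self._attrs)  # a new dict on each access, like a by-value binding

    def set_attribute(self, key: str, value: str) -> None:
        self._attrs[key] = value  # writes to the underlying storage


m = CopyingMetadata()
m.attributes["source"] = "import"   # modifies a temporary copy, then discarded
m.set_attribute("type", "fact")     # actually stored
print(m.attributes)                 # {'type': 'fact'}
```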

Docker: self-hosted feather-serve

# Clone and build
git clone https://github.com/feather-store/feather.git
cd feather
docker compose -f feather-api/docker-compose.yml build

# feather-api/docker-compose.yml
services:
  feather-api:
    image: feather-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "${FEATHER_PORT:-7700}:7700"
    volumes:
      - feather-data:/data        # persistent across restarts and image updates
    environment:
      - FEATHER_API_KEY=${FEATHER_API_KEY}
      - FEATHER_EMBED_PROVIDER=${FEATHER_EMBED_PROVIDER:-gemini}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - FEATHER_LOAD_THREADS=8
      - FEATHER_DIM=${FEATHER_DIM:-768}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7700/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
    restart: unless-stopped

volumes:
  feather-data:
    driver: local

# feather-api/.env
FEATHER_API_KEY=your-secret-key
FEATHER_EMBED_PROVIDER=gemini
GOOGLE_API_KEY=your-google-key
FEATHER_LOAD_THREADS=8
FEATHER_DIM=768

# Start
docker compose -f feather-api/docker-compose.yml up -d

# Verify
curl http://localhost:7700/health
# {"status": "ok", "version": "0.16.0", "vectors": 0, "dim": 768}

For production HTTPS, put Nginx or Caddy in front with proxy_pass http://feather-api:7700 and a Let's Encrypt certificate. The MCP endpoint and admin SPA both work behind a reverse proxy with no additional configuration.

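The named volume feather-data persists the .feather file across container restarts, image updates, and host reboots. Don't skip it or point it at a temp directory. Without a volume, the file lives in the container's writable layer and is lost when the container is recreated (for example on an image update). A host temp directory, in turn, can be wiped on reboot or by routine tmp cleanup.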
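Deploy scripts often need to block until the container is actually serving. A small poller against the /health endpoint shown above can handle that. The timeout and interval defaults are assumptions. The fetch callable is injectable, so the retry logic can be tested without a live server.

```python
import json
import time
import urllib.request
from typing import Callable


def _get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_healthy(url: str = "http://localhost:7700/health",
                     timeout_s: float = 60.0, interval_s: float = 1.0,
                     fetch: Callable[[str], dict] = _get_json) -> dict:
    """Poll `url` until it returns {"status": "ok", ...}; raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            body = fetch(url)
            if body.get("status") == "ok":
                return body
        except (OSError, ValueError):
            # Not up yet: URLError/HTTPError and socket errors are OSErrors;
            # a non-JSON body raises json.JSONDecodeError, a ValueError.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} not healthy after {timeout_s}s")
        time.sleep(interval_s)
```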

Monitoring

Feather exposes the metrics you need through the API and in-process.

import time
import feather_db as fdb

start = time.perf_counter()
db = fdb.DB.open("memory.feather", dim=768)
load_time_ms = (time.perf_counter() - start) * 1000

# Core health metrics
record_count  = db.count()
namespace_list = db.list_namespaces()

# RAM lower bound: raw float32 vectors only (the HNSW graph and metadata add more)
ram_estimate_mb = record_count * 768 * 4 / 1e6

print(f"Load time:    {load_time_ms:.0f}ms")
print(f"Vectors:      {record_count:,}")
print(f"Namespaces:   {len(namespace_list)}")
print(f"RAM (est.):   {ram_estimate_mb:.0f} MB")

# Per-namespace counts for multi-tenant monitoring
for ns in namespace_list:
    ns_count = db.count(namespace=ns)
    print(f"  {ns}: {ns_count} vectors")

Expose these as a /metrics endpoint (Prometheus) or ship them to your observability stack on a 60-second interval. The four numbers to track in production: record count (growth rate), load time (regression signal if it spikes), RAM usage (capacity planning), and per-namespace counts (detect runaway tenants).
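If you do expose a /metrics endpoint, the Prometheus text exposition format is simple enough to render by hand. The metric names below are hypothetical; this guide does not define them for Feather. The inputs are the numbers gathered above. Serving the string over HTTP is left to your web framework or the prometheus_client library.

```python
def render_prometheus(vectors: int, load_time_ms: float,
                      ns_counts: dict[str, int]) -> str:
    """Render gauges in the Prometheus text exposition format."""
    def esc(v: str) -> str:
        # Label values must escape backslash, double quote, and newline.
        return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    lines = [
        "# HELP feather_vectors Vectors currently stored.",
        "# TYPE feather_vectors gauge",
        f"feather_vectors {vectors}",
        "# HELP feather_load_time_seconds Time taken by the last DB.open().",
        "# TYPE feather_load_time_seconds gauge",
        f"feather_load_time_seconds {load_time_ms / 1000:.3f}",
        "# HELP feather_namespace_vectors Vectors per namespace.",
        "# TYPE feather_namespace_vectors gauge",
    ]
    for ns, count in sorted(ns_counts.items()):
        lines.append(f'feather_namespace_vectors{{namespace="{esc(ns)}"}} {count}')
    return "\n".join(lines) + "\n"
```

One label value per namespace means one time series per tenant. With tens of thousands of tenants that gets expensive for Prometheus, so you may want to export only the largest namespaces.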

Via the REST API from feather-serve:

curl -H "Authorization: Bearer $FEATHER_API_KEY" \
     http://localhost:7700/api/v1/stats
# {
#   "total_vectors": 84321,
#   "namespaces": 412,
#   "file_size_mb": 247.3,
#   "load_time_ms": 48
# }
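From Python, the same stats call needs only the standard library. This sketch assumes the Bearer-token header and /api/v1/stats path from the curl example above. The response fields are whatever your feather-serve version returns.

```python
import json
import os
import urllib.request


def stats_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET /api/v1/stats request with a Bearer token."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/stats",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_stats(base_url: str = "http://localhost:7700") -> dict:
    req = stats_request(base_url, os.environ["FEATHER_API_KEY"])
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)
```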

Multi-tenant patterns

Two patterns exist for multi-tenant deployments. Choose based on your tenant count and isolation requirements.

One file per tenant

Each tenant gets a dedicated .feather file. Strong physical isolation, easy per-tenant backup and deletion, and predictable per-file memory usage. Best for: small tenant counts (<100), enterprise customers who need data residency guarantees, or tenants with very large individual corpora (>500k vectors each).

import os
import shutil
from datetime import date

import feather_db as fdb

def get_db(tenant_id: str) -> fdb.DB:
    path = f"/data/feather/tenant-{tenant_id}.feather"
    return fdb.DB.open(path, dim=768)

# Backup one tenant
def backup_tenant(tenant_id: str):
    shutil.copy(
        f"/data/feather/tenant-{tenant_id}.feather",
        f"/data/backups/tenant-{tenant_id}.{date.today().isoformat()}.feather"
    )

# Delete a tenant completely: just delete the file
def offboard_tenant(tenant_id: str):
    os.remove(f"/data/feather/tenant-{tenant_id}.feather")
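Because tenant_id ends up in a filesystem path, validate it before any of these helpers run. Otherwise an ID like ../../etc could escape /data/feather, and offboard_tenant() would delete the wrong file. The minimal guard below assumes tenant IDs are limited to letters, digits, underscores, and hyphens.

```python
import re
from pathlib import Path

DATA_DIR = Path("/data/feather")
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def tenant_path(tenant_id: str, data_dir: Path = DATA_DIR) -> Path:
    """Return data_dir/tenant-<id>.feather, rejecting IDs that could escape it."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    path = data_dir / f"tenant-{tenant_id}.feather"
    # Belt and braces: the resolved path must still sit directly in data_dir.
    if path.resolve().parent != data_dir.resolve():
        raise ValueError(f"tenant path escapes {data_dir}: {path}")
    return path
```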

Namespace-per-tenant in a shared file

All tenants share one .feather file, isolated by namespace. Best for: SaaS products with hundreds to tens of thousands of users, where per-file overhead would be impractical. Search latency scales with per-namespace vector count, not total file size.

# One file, thousands of tenants — namespace enforces isolation
db = fdb.DB.open("/data/feather/production.feather", dim=768)

def add_for_user(user_id: str, text: str, category: str):
    db.add(embed(text), text=text,
           namespace=user_id, entity=category)
    db.save()

def search_for_user(user_id: str, query: str, k: int = 5):
    return db.search(embed(query), k=k, namespace=user_id)

# Delete all memories for a user (GDPR, offboarding)
def delete_user_data(user_id: str):
    db.purge(namespace=user_id)
    db.compact()
    db.save()
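add_for_user() saves after every write. Because save() writes atomically via a temp file and rename, each call rewrites the whole file, so its cost grows with the file. Under heavy write traffic you may prefer to batch saves. The sketch below is one hedged way to do that. It wraps writes in a lock, marks the store dirty, and saves at most once per interval from a background thread. It assumes writes go only through the wrapper. The trade-off is that a crash can lose up to one interval of writes.

```python
import threading
from typing import Any, Callable


class DebouncedSaver:
    """Run writes under a lock and call db.save() at most once per interval."""

    def __init__(self, db: Any, interval_s: float = 5.0) -> None:
        self._db = db
        self._interval = interval_s
        self._lock = threading.Lock()
        self._dirty = False
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def write(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Apply a mutating call, e.g. saver.write(db.add, vec, text=...)."""
        with self._lock:
            result = fn(*args, **kwargs)
            self._dirty = True
            return result

    def flush(self) -> None:
        with self._lock:
            if self._dirty:
                self._db.save()
                self._dirty = False

    def _loop(self) -> None:
        # Event.wait doubles as an interruptible sleep.
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self) -> None:
        """Stop the background thread and save any pending writes."""
        self._stop.set()
        self._thread.join()
        self.flush()
```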

For very large deployments (100k+ namespaces), shard by namespace hash across multiple files with a routing layer. Each shard benefits from persisted HNSW load (48ms cold start) and int8 RAM quantization independently.
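The routing layer needs a hash that is stable across processes and restarts. Python's built-in hash() does not qualify, because string hashing is randomized per process unless PYTHONHASHSEED is fixed. The sketch below uses zlib.crc32, which is a reasonable choice for routing but not something Feather prescribes. The shard file naming scheme is hypothetical.

```python
import zlib


def shard_for(namespace: str, n_shards: int) -> int:
    """Map a namespace to a shard index in [0, n_shards), stable everywhere."""
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")
    return zlib.crc32(namespace.encode("utf-8")) % n_shards


def shard_path(namespace: str, n_shards: int) -> str:
    # Hypothetical layout: /data/feather/shard-00.feather, shard-01.feather, ...
    return f"/data/feather/shard-{shard_for(namespace, n_shards):02d}.feather"
```

With plain modulo, changing n_shards remaps most namespaces to different files. Pick N with headroom, or use consistent hashing if you expect to reshard.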

Disaster recovery

The .feather file is self-contained. Everything — vectors, HNSW graph, metadata, typed edges, namespace index — is in one binary. Recovery is a file copy.

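import os
import shutil

import feather_db as fdb

def restore_from_backup(backup_path: str, target_path: str):
    """Restore a .feather file from backup. No special tooling needed."""
    # Stop any process that has target_path open first. Copy next to the
    # target, then rename over it atomically, so a crash mid-copy never
    # leaves a half-written production file.
    tmp_path = target_path + ".restore-tmp"
    shutil.copy(backup_path, tmp_path)
    os.replace(tmp_path, target_path)
    # Verify the restore loaded clean
    db = fdb.DB.open(target_path, dim=768)
    print(f"Restored. Vectors: {db.count()}, Namespaces: {len(db.list_namespaces())}")
    return db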

Disaster recovery checklist:

  • Snapshot the file before every migration: cp production.feather production.$(date +%Y%m%d).bak
  • Ship nightly backups to object storage — one copy is not a backup
  • Compact before shipping a snapshot to a new environment — smaller file, faster restore load
  • Test your restore path quarterly: copy a backup, open it, verify count and namespace list
  • Never modify a .feather file directly — always go through the Feather API. The format has a checksum; corrupt files will fail to open with a clear error rather than silently returning wrong results
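When testing your restore path, a cheap first check is that the backup is byte-identical to the file you meant to ship. The sketch below compares SHA-256 digests using only the standard library. It complements, rather than replaces, opening the restored file and checking count() and list_namespaces().

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream in 1 MiB chunks so large .feather files never sit in RAM at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_copy(original: Path, copy: Path) -> bool:
    return sha256_file(original) == sha256_file(copy)
```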

Production setup: putting it together

A complete production Python service with namespace isolation, startup optimization, compaction schedule, and monitoring:

import os
import time
import logging
import schedule
import feather_db as fdb

logging.basicConfig(level=logging.INFO)  # otherwise the logger.info calls below print nothing
logger = logging.getLogger("feather")

# ── Startup ──────────────────────────────────────────────────────────────
os.environ["FEATHER_LOAD_THREADS"] = "8"   # parallel HNSW load

start = time.perf_counter()
DB = fdb.DB.open("/data/feather/production.feather", dim=768)
load_ms = (time.perf_counter() - start) * 1000

# Optional: int8 quantization for memory-constrained hosts
# DB.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88

logger.info(f"Feather ready. vectors={DB.count()} load_ms={load_ms:.0f}")

EMBED = load_your_embedder()  # your own text -> vector function (e.g. Gemini, OpenAI, Voyage)

# ── Core operations ───────────────────────────────────────────────────────
def add_memory(user_id: str, text: str, category: str,
               mem_type: str = "fact", importance: float = 1.0):
    vec = EMBED(text)
    mem = DB.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    mem.meta.set_attribute("created_at", time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))  # UTC, matching the Z suffix
    DB.save()
    return mem.id

def search_memory(user_id: str, query: str,
                  category: str = None, k: int = 5):
    vec = EMBED(query)
    return DB.search(vec, k=k, namespace=user_id, entity=category)

def delete_user(user_id: str):
    """Full GDPR delete — purge namespace, compact, save."""
    DB.purge(namespace=user_id)
    DB.compact()
    DB.save()
    logger.info(f"Deleted namespace={user_id}")

# ── Bulk ingestion ────────────────────────────────────────────────────────
def bulk_seed(user_id: str, records: list[dict]):
    """Seed a user's historical data. Use add_batch for >1k records."""
    import numpy as np
    texts = [r["text"] for r in records]
    vecs  = np.array([EMBED(t) for t in texts], dtype=np.float32)
    # Assumes existing IDs are 0..count-1; after purges, track the highest used ID instead
    ids   = list(range(DB.count(), DB.count() + len(records)))
    metas = []
    for r in records:
        m = fdb.Metadata(importance=r.get("importance", 1.0))
        m.set_attribute("source", r.get("source", "seed"))
        metas.append(m)
    DB.add_batch(ids, vecs, metas=metas, namespace=user_id)
    DB.save()
    logger.info(f"Seeded {len(records)} records for user={user_id}")

# ── Maintenance schedule ──────────────────────────────────────────────────
def weekly_maintenance():
    before = DB.count()
    DB.purge(older_than_days=90)   # evict memories not recalled in 90 days
    DB.compact()
    DB.save()
    after = DB.count()
    logger.info(f"Maintenance: {before - after} nodes pruned, {after} remaining")

schedule.every().monday.at("03:00").do(weekly_maintenance)

# ── Metrics ───────────────────────────────────────────────────────────────
def emit_metrics():
    namespaces = DB.list_namespaces()
    ram_mb = DB.count() * 768 * 4 / 1e6  # float32 vectors only: a lower bound
    logger.info(
        f"metrics vectors={DB.count()} "
        f"namespaces={len(namespaces)} "
        f"ram_estimate_mb={ram_mb:.0f}"
    )

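schedule.every(60).seconds.do(emit_metrics)

# schedule only fires jobs from run_pending(); run it on a daemon thread so the
# jobs above actually execute alongside your request handlers. This assumes DB
# calls are safe from a second thread; otherwise guard them with a lock.
import threading

def _run_scheduler():
    while True:
        schedule.run_pending()
        time.sleep(1)

threading.Thread(target=_run_scheduler, daemon=True).start()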

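This pattern runs well on a 2-core / 2 GB VPS serving up to ~10k tenants in a shared file. For larger deployments, shard across N files by a stable hash of user_id modulo N, each shard served by its own feather-serve instance behind a load balancer. Use a hash such as zlib.crc32, not Python's built-in hash(), which is randomized per process.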

Summary: decision table

Decision                      Recommendation
Deployment mode               Embedded for single service; feather-serve for multi-service or MCP
Namespace design              namespace = tenant ID, entity = topic, attributes = secondary filters
Multi-tenant file strategy    Shared file for hundreds to tens of thousands of tenants; one file per tenant for small tenant counts (<100), data residency, or very large per-tenant corpora; shard by namespace hash past ~100k namespaces
Compaction                    After bulk deletes >10%, weekly schedule, before shipping backups
Memory on constrained hosts   Enable int8 RAM: 1.76× less RAM, recall@10 ~0.88, usually fine for context retrieval
Cold start                    v0.16.0 persisted HNSW = 48ms at 500k vectors; set FEATHER_LOAD_THREADS=8 as fallback
Bulk ingestion                add_batch() for >1k items (3.4× on 4 cores); add() for real-time single inserts
Disaster recovery             Copy = backup; nightly snapshot to object storage; compact before shipping
Docker                        Named volume for /data; restart: unless-stopped; FEATHER_LOAD_THREADS in ENV

Install: pip install feather-db · GitHub: github.com/feather-store/feather