# Feather DB in Production: Deployment Patterns and Best Practices

> Embedded, feather-serve daemon, or remote MCP — Feather DB runs wherever you need it. This guide covers namespace design, file management, compaction, memory optimization, cold-start tuning, bulk ingestion, Docker, monitoring, multi-tenant patterns, and disaster recovery.

- **Category**: Deploy
- **Read time**: 10 min read
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-db-production-deployment-guide

---

## Deployment modes

Feather DB ships three deployment modes. Pick one based on where your embedding logic lives and who needs to reach the index.

### Embedded (single process)

The default mode. Your Python process opens a `.feather` file directly — no network hop, no daemon, sub-millisecond search latency. Use this for single-service backends, batch pipelines, and local development.

```python
import feather_db as fdb

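# vec, query_vec: 768-dim float32 embeddings produced by your own model
db = fdb.DB.open("memory.feather", dim=768)   # creates the file on first open
db.add(vec, text="Alice prefers dark mode.", namespace="user-alice")
results = db.search(query_vec, k=5, namespace="user-alice")
db.save()                                     # persist before the process exits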
```

The file is created on first open and grown incrementally. You own the process lifecycle — save before exit.

### feather-serve (local daemon)

Run `feather-serve` as a long-lived process. It exposes a REST API at `/api/v1/`, an MCP endpoint at `/mcp`, and the Atlas admin SPA at `/admin/`. Other services call HTTP instead of linking the library directly. Use this when multiple processes share one context store, or when you want the admin SPA for inspection.

```bash
GOOGLE_API_KEY=your-key feather-serve memory.feather \
  --embed-provider gemini --dim 768 --port 7700

```

With `--embed-provider`, feather-serve handles embeddings server-side. Clients send raw text; Feather returns ranked memories. No embedding pipeline in your application code.

### Remote MCP (network)

Connect Claude Desktop, Claude Code, or any MCP-compatible agent to a running `feather-serve` instance. The agent calls `feather_search`, `feather_add`, `feather_context_chain`, and 11 other tools as native MCP calls. Use this for persistent agent memory across conversations.

In `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "feather-memory": {
      "url": "http://localhost:7700/mcp"
    }
  }
}
```

The three modes compose: in production you might run feather-serve in Docker (mode 2), called by your Python API over REST and by Claude over MCP (mode 3).

## Namespace design

Namespaces are hard tenant boundaries. Memories in `"user-alice"` are completely invisible to searches in `"user-bob"`. Search latency scales with the number of memories *in that namespace*, not across all tenants — which is what makes a shared file practical at scale.
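As a mental model only (this is a toy, not Feather's HNSW index), namespace isolation behaves like a partitioned store: a query scans only the partition for its own namespace, so other tenants' data is never read and the work per query depends on that partition's size. `ToyNamespacedIndex` is hypothetical and uses brute-force cosine similarity in place of HNSW.

```python
import math
from collections import defaultdict


class ToyNamespacedIndex:
    """Hypothetical brute-force index partitioned by namespace (illustration only)."""

    def __init__(self) -> None:
        # namespace -> list of (vector, text) pairs
        self._parts: dict[str, list[tuple[list[float], str]]] = defaultdict(list)

    def add(self, vec: list[float], text: str, namespace: str) -> None:
        self._parts[namespace].append((vec, text))

    def search(self, query: list[float], k: int, namespace: str) -> list[tuple[float, str]]:
        # Only this namespace's partition is scanned; other tenants are never read.
        # .get() avoids creating an empty partition for an unknown namespace.
        candidates = self._parts.get(namespace, [])
        scored = [(self._cosine(query, vec), text) for vec, text in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


index = ToyNamespacedIndex()
index.add([1.0, 0.0], "Alice prefers dark mode.", namespace="user-alice")
index.add([1.0, 0.0], "Bob prefers light mode.", namespace="user-bob")
print(index.search([1.0, 0.0], k=5, namespace="user-alice"))
# [(1.0, 'Alice prefers dark mode.')]
```

Bob's identical vector never appears in Alice's results, because it lives in a partition her query never touches.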

The standard pattern:

  - **namespace = tenant boundary** — user_id, agent_id, org_id

  - **entity = topic group within a namespace** — "preferences", "work-context", "current-project"

  - **attributes = secondary metadata** — type, source, created_at, confidence

```python
import feather_db as fdb
from datetime import datetime, timezone

db = fdb.DB.open("saas_memory.feather", dim=768)
# embed(text) below stands for your embedding function: text -> 768-dim float32 vector

def add_memory(user_id: str, text: str, category: str,
               mem_type: str, importance: float = 1.0, vec=None):
    if vec is None:
        vec = embed(text)
    mem = db.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    mem.meta.set_attribute("created_at", datetime.now(timezone.utc).isoformat())
    return mem

def search_memory(user_id: str, query: str,
                  category: str = None, k: int = 5):
    return db.search(embed(query), k=k,
                     namespace=user_id, entity=category)

# One tenant per user_id — strict isolation, zero cross-contamination
add_memory("user-42", "Prefers Python for backend, TypeScript for frontend.",
           category="preferences", mem_type="fact", importance=1.2)

add_memory("user-42", "Building a fintech SaaS, Series A in Q3.",
           category="work-context", mem_type="fact", importance=1.5)

results = search_memory("user-42", "What stack does this user prefer?")

```

Agent roles get their own namespaces too. A planner agent and a coder agent operating on the same codebase should have separate namespaces — `"agent-planner"` and `"agent-coder"` — so their context stores don't bleed into each other's retrieval.

## File management

### Where to store .feather files

Keep `.feather` files on durable, fast-seek storage: SSD-backed volumes or network-attached storage with high IOPS. The file is read in full at startup (before v0.16.0, the HNSW graph is also reconstructed at that point) and written atomically on `db.save()`. Write latency matters at save time; read IOPS matter at startup.

Recommended layout for a production service:

```text
/data/feather/
  production.feather        # primary store
  production.feather.bak    # last manual snapshot
  staging.feather           # staging environment
```

In Docker, always use a named volume — never a bind mount to a temp directory:

```yaml
volumes:
  feather-data:
    driver: local

services:
  feather-api:
    volumes:
      - feather-data:/data

```

### Backup strategy

The `.feather` file is self-contained: vectors, HNSW graph, metadata, edges, and namespace index are all in one binary. A copy *is* a backup.

```bash
# Snapshot before a risky operation (migration, bulk delete)
cp /data/feather/production.feather \
   /data/feather/production.$(date +%Y%m%d-%H%M%S).bak

# Restore is equally simple
cp /data/feather/production.20260616-090000.bak \
   /data/feather/production.feather

```
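If you prefer to script snapshots from Python (for example, inside a migration job), the same copy-based approach can be sketched with the standard library. `snapshot()` and `restore_latest()` are hypothetical helpers: they use the `<name>.YYYYmmdd-HHMMSS.bak` naming from the shell commands (in UTC), keep only the newest `keep` snapshots, and deliberately ignore a manual `production.feather.bak`.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

SNAPSHOT_GLOB = "{stem}.[0-9]*-[0-9]*.bak"  # matches production.20260616-090000.bak only


def _snapshots(db_file: Path) -> list[Path]:
    # Fixed-width timestamps make lexical order equal chronological order.
    return sorted(db_file.parent.glob(SNAPSHOT_GLOB.format(stem=db_file.stem)))


def snapshot(db_path: str, keep: int = 7) -> Path:
    """Copy db_path to <stem>.<YYYYmmdd-HHMMSS>.bak beside it, keeping the newest `keep`."""
    src = Path(db_path)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.stem}.{stamp}.bak")
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    for old in _snapshots(src)[:-keep]:
        old.unlink()
    return dest


def restore_latest(db_path: str) -> Path:
    """Copy the newest snapshot back over db_path. Stop writers to the file first."""
    src = Path(db_path)
    snaps = _snapshots(src)
    if not snaps:
        raise FileNotFoundError(f"no snapshots found for {src}")
    shutil.copy2(snaps[-1], src)
    return snaps[-1]
```

Two snapshots taken within the same second share a name, so the second overwrites the first; that is usually acceptable for pre-migration snapshots.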

For scheduled backups, copy the file to object storage (S3, GCS) nightly. Because `db.save()` writes atomically to a temp file then renames, a concurrent copy during a save will always get a consistent snapshot — either the old file or the new one, never a partial write.

```bash
# Nightly backup cron: one line per job, and % must be escaped as \% in crontab
0 2 * * * aws s3 cp /data/feather/production.feather s3://your-bucket/backups/production.$(date +\%Y\%m\%d).feather

```
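The consistency guarantee above rests on the classic write-to-temp-then-rename pattern. The sketch below is a generic illustration of that pattern, not Feather's internal code, and `atomic_write` is a hypothetical helper: because `os.replace` swaps the directory entry in one step on POSIX filesystems, any reader that opens the path sees either the complete old file or the complete new one.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the bytes durable before the swap
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```

A `cp` that opened the old file before the rename keeps reading the old inode to the end, which is why a concurrent backup copy is never torn.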

## Compaction

Feather's HNSW graph accumulates soft-deleted nodes when you call `forget()` or `purge()`. These nodes no longer appear in search results but still occupy space in the graph, slowing traversal slightly. `compact()` rewrites the file clean — removed nodes gone, HNSW rebuilt tight, load time faster on the next restart.

```python
db.compact()   # rebuilds graph in-place, rewrites .feather file
db.save()      # flush the compacted state to disk

```

When to compact:

  - After a bulk `forget()` or `purge()` that removes more than ~10% of nodes

  - After onboarding data migrations where you replaced old records

  - Before a disaster recovery restore — compact the source file first so the restore loads a tight index

  - On a weekly schedule for stores with frequent deletes
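The ~10% rule of thumb is easy to encode in a maintenance job. `should_compact()` is a hypothetical helper that compares live counts taken before and after a bulk `forget()`/`purge()`; it assumes `db.count()` excludes soft-deleted nodes, as the maintenance examples in this guide imply.

```python
def should_compact(live_before: int, live_after: int, threshold: float = 0.10) -> bool:
    """True when a bulk forget()/purge() removed more than `threshold` of the live nodes."""
    if live_before <= 0:
        return False
    removed = max(0, live_before - live_after)
    return removed / live_before > threshold


print(should_compact(100_000, 85_000))  # True  (15% removed)
print(should_compact(100_000, 95_000))  # False (5% removed)
```

If you delete in many small steps, accumulate the removed counts since the last compaction rather than checking each step in isolation.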

```python
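import time

import feather_db as fdb
import schedule  # third-party scheduler: pip install schedule

db = fdb.DB.open("memory.feather", dim=768)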

def weekly_compact():
    db.purge(namespace="*", older_than_days=90)  # evict stale memories
    db.compact()
    db.save()
    print(f"Compacted. Vectors remaining: {db.count()}")

schedule.every().monday.at("03:00").do(weekly_compact)

while True:
    schedule.run_pending()
    time.sleep(60)

```

Compaction is also the fastest way to reduce file size before shipping a snapshot to a new environment.

## Memory management

### Adaptive capacity (v0.15.3)

v0.15.3 ships adaptive HNSW capacity: the index grows incrementally instead of pre-allocating `max_elements` upfront. For typical deployments that start small and grow over weeks, this delivers 7.7× less RAM at startup compared to pre-allocating for 1M elements on an empty index. The change is automatic; no configuration is required.

### int8 RAM quantization

For memory-constrained hosts (1–2 GB VPS, edge devices, Lambda), enable in-RAM int8 quantization after load:

```python
db = fdb.DB.open("memory.feather", dim=768)
db.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88 vs 0.972

```

At 60k × 768-dim float32, RAM drops from 227 MB to 129 MB, and recall@10 moves from 0.972 to ~0.88. For context retrieval, where you surface 5–10 relevant memories per query, 0.88 recall is usually acceptable.

| Mode | RAM (60k × 768-dim) | Recall@10 |
|---|---|---|
| float32 (default) | 227 MB | 0.972 |
| int8 in-RAM | 129 MB | ~0.88 |

Stick with float32 when running precision benchmarks, when RAM is not a constraint, or when your index is under 20k vectors and there's no reason to trade recall.
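To see where the savings come from, here is a sketch of symmetric int8 scalar quantization. It is an assumption that `max_abs` acts as the clipping range (values in `[-max_abs, max_abs]` mapped to `[-127, 127]`); Feather's actual scheme may differ. The sketch shows a 4× reduction on raw vector bytes, while the table's whole-index figure (1.76×) also includes graph and metadata memory that quantization does not shrink.

```python
from array import array


def quantize_int8(vec: list[float], max_abs: float = 1.0) -> array:
    """Map floats in [-max_abs, max_abs] to signed bytes in [-127, 127], clipping outliers."""
    scale = 127.0 / max_abs
    return array("b", (max(-127, min(127, round(x * scale))) for x in vec))


def dequantize_int8(q: array, max_abs: float = 1.0) -> list[float]:
    """Approximate the originals; in-range error is at most max_abs / 254 per component."""
    return [v * max_abs / 127.0 for v in q]


vec = [0.5, -0.25, 1.0, -1.3]   # the last component exceeds max_abs and is clipped
q = quantize_int8(vec)
print(list(q))                              # [64, -32, 127, -127]
print(array("f", vec).itemsize, q.itemsize)  # 4 1
```

The rounding error per component is what costs recall: nearby vectors can swap rank order once their distances are computed on the coarser grid.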

## Cold start: persisted HNSW (v0.16.0)

v0.16.0 ships persisted HNSW graph state. The HNSW graph is stored in a ready-to-load binary layout inside the `.feather` file — no reconstruction on startup. Cold start at 500k vectors drops from 2.7s to 48ms.

This is the highest-impact change for serverless deployments and Kubernetes pods with frequent restarts. Before v0.16.0, parallel load via `FEATHER_LOAD_THREADS` was the primary lever:

```python
import os
import feather_db as fdb

# v0.15.x: parallel graph reconstruction (4.7× faster than serial)
os.environ["FEATHER_LOAD_THREADS"] = "8"
db = fdb.DB.open("memory.feather", dim=768)

```

With v0.16.0, `FEATHER_LOAD_THREADS` is still respected during initial file creation and explicit rebuilds, but routine opens skip reconstruction entirely. Set it in your environment regardless — it handles fallback cases.

  
| Version | Cold start (500k vectors) | Notes |
|---|---|---|
| v0.15.x serial | ~2.7s | Single-threaded reconstruction |
| v0.15.x + `FEATHER_LOAD_THREADS=8` | ~0.6s | Parallel reconstruction, 4.7× |
| v0.16.0 | 48ms | Persisted graph, no reconstruction |

```dockerfile
# Dockerfile: set regardless of version
ENV FEATHER_LOAD_THREADS=8
```
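Hard-coding 8 threads matches the 8-thread figures above but oversubscribes a 2-core container. A hedged alternative derives the value from the CPUs the process may actually use. `default_load_threads()` is a hypothetical helper, and it assumes (as the examples above imply) that Feather reads `FEATHER_LOAD_THREADS` when `fdb.DB.open()` runs, so it must be set before opening.

```python
import os


def default_load_threads(cap: int = 8) -> str:
    """Pick a FEATHER_LOAD_THREADS value from the CPUs available to this process."""
    try:
        # Linux: honours CPU affinity (taskset, docker --cpuset-cpus);
        # CPU quotas such as docker --cpus are not reflected here.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # macOS/Windows lack sched_getaffinity; fall back to the machine total.
        available = os.cpu_count() or 1
    return str(max(1, min(cap, available)))


# setdefault keeps an explicit value from the Dockerfile or compose file if present.
os.environ.setdefault("FEATHER_LOAD_THREADS", default_load_threads())
```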

## Bulk ingestion with add_batch()

For ingesting more than ~1k vectors at once — corpus imports, historical data seeding, document chunking pipelines — use `add_batch()`. It builds the HNSW graph in parallel with the GIL released: 3.4× faster than a sequential loop on a 4-core machine, ~5–6× on an 8-core machine.

```python
import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

# Prepare vectors and metadata in bulk
texts  = load_your_documents()          # list of strings
vecs   = embed_batch(texts)             # np.ndarray shape (N, 768), float32
scores = load_importance_scores(texts)  # np.ndarray shape (N,)

metas = []
for i, (text, score) in enumerate(zip(texts, scores)):
    m = fdb.Metadata(importance=float(min(1.0, score)))
    m.set_attribute("source", "batch_import_2026")
    m.set_attribute("doc_id", str(i))
    metas.append(m)

# ids 0..N-1 assume corpus.feather starts empty; offset them when appending
ids = list(range(len(texts)))

# Single parallel call — no Python loop overhead
db.add_batch(ids, vecs, metas=metas)
db.save()

print(f"Ingested {db.count()} vectors")

```

Use `add()` for real-time single-item inserts (one memory after each conversation turn). Use `add_batch()` for everything else. The crossover point is roughly 1,000 items — below that the overhead isn't worth it; above it the speedup compounds.
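The crossover rule and the id bookkeeping fit in one small planning helper. `plan_insert()` and `InsertPlan` are hypothetical: they pick `add` or `add_batch` using the ~1,000-item threshold from this section and, for batches, reserve a contiguous id range starting at `next_id`. Where `next_id` comes from (for example, a counter persisted alongside the store) is up to you.

```python
from dataclasses import dataclass, field

BATCH_THRESHOLD = 1_000  # rough add()/add_batch() crossover quoted in this section


@dataclass
class InsertPlan:
    method: str                                    # "add" or "add_batch"
    ids: list[int] = field(default_factory=list)   # ids for add_batch(), one per item
    next_id: int = 0                               # first unused id; persist it for the next run


def plan_insert(n_items: int, next_id: int, threshold: int = BATCH_THRESHOLD) -> InsertPlan:
    """Choose the insert path and, for add_batch(), reserve a contiguous id range."""
    if n_items < threshold:
        # Per-item add() calls; the examples above pass no explicit ids to add()
        return InsertPlan("add", [], next_id)
    ids = list(range(next_id, next_id + n_items))
    return InsertPlan("add_batch", ids, next_id + n_items)


plan = plan_insert(2_500, next_id=10_000)
print(plan.method, plan.ids[0], plan.ids[-1], plan.next_id)  # add_batch 10000 12499 12500
```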

Important: always use `meta.set_attribute(key, value)`, not `meta.attributes[key] = value`. The dict accessor silently does nothing due to pybind11 copy semantics.
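Why the dict accessor fails: pybind11's automatic STL conversion hands Python a *copy* of a C++ `std::map`, so mutating that copy never reaches the underlying object. The pure-Python class below only simulates that behaviour (it is not Feather's `Metadata`); a property that returns a fresh dict on every access reproduces the trap.

```python
class SimulatedMetadata:
    """Mimics a pybind11-bound struct whose map member is converted by copy."""

    def __init__(self) -> None:
        self._attrs: dict[str, str] = {}   # stands in for the C++ std::map

    @property
    def attributes(self) -> dict[str, str]:
        # Like pybind11's STL conversion: every access returns a new Python dict.
        return dict(self._attrs)

    def set_attribute(self, key: str, value: str) -> None:
        # Like a bound C++ setter: writes to the underlying storage.
        self._attrs[key] = value


m = SimulatedMetadata()
m.attributes["source"] = "batch_import_2026"   # mutates a temporary copy, then discards it
print(m.attributes)                              # {}
m.set_attribute("source", "batch_import_2026")
print(m.attributes)                              # {'source': 'batch_import_2026'}
```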

## Docker: self-hosted feather-serve

```bash
# Clone and build
git clone https://github.com/feather-store/feather.git
cd feather
docker compose -f feather-api/docker-compose.yml build

```

```yaml
# feather-api/docker-compose.yml
version: '3.9'

services:
  feather-api:
    image: feather-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "${FEATHER_PORT:-7700}:7700"
    volumes:
      - feather-data:/data        # persistent across restarts and image updates
    environment:
      - FEATHER_API_KEY=${FEATHER_API_KEY}
      - FEATHER_EMBED_PROVIDER=${FEATHER_EMBED_PROVIDER:-gemini}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - FEATHER_LOAD_THREADS=8
      - FEATHER_DIM=${FEATHER_DIM:-768}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7700/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
    restart: unless-stopped

volumes:
  feather-data:
    driver: local

```

```bash
# Write feather-api/.env (variables consumed by docker-compose.yml)
cat > feather-api/.env <<'EOF'
FEATHER_API_KEY=your-secret-key
FEATHER_EMBED_PROVIDER=gemini
GOOGLE_API_KEY=your-google-key
FEATHER_LOAD_THREADS=8
FEATHER_DIM=768
EOF

# Start
docker compose -f feather-api/docker-compose.yml up -d

# Verify
curl http://localhost:7700/health
# {"status": "ok", "version": "0.16.0", "vectors": 0, "dim": 768}

```
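In deploy scripts it helps to block until the container reports healthy before routing traffic to it. The sketch below uses only the standard library; `wait_for_health()` is a hypothetical helper, and it assumes `/health` returns a JSON object with `"status": "ok"` as in the sample response above.

```python
import json
import time
import urllib.request
from typing import Callable


def is_healthy(body: str) -> bool:
    """True if a /health response body is a JSON object with status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # includes json.JSONDecodeError
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def fetch(url: str, timeout: float = 5.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_for_health(url: str = "http://localhost:7700/health", timeout_s: float = 60.0,
                    interval_s: float = 2.0, get: Callable[[str], str] = fetch) -> bool:
    """Poll `url` until it reports healthy or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if is_healthy(get(url)):
                return True
        except OSError:
            pass  # URLError, HTTPError and refused connections are all OSError subclasses
        time.sleep(interval_s)
    return False
```

Typical use after `docker compose up -d`: `if not wait_for_health(): raise SystemExit("feather-api did not become healthy")`.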

For production HTTPS, put Nginx or Caddy in front: `proxy_pass http://feather-api:7700` in Nginx, or `reverse_proxy feather-api:7700` in Caddy, with a Let's Encrypt certificate. The MCP endpoint and admin SPA both work behind a reverse proxy with no additional configuration.

The named volume `feather-data` persists the `.feather` file across container restarts, image updates, and host reboots. Never point `/data` at a temp directory or leave it unmounted: a cleared temp directory or a recreated container takes every memory with it.

## Monitoring

Feather exposes the metrics you need through the API and in-process.

```python
import time
import feather_db as fdb

start = time.perf_counter()
db = fdb.DB.open("memory.feather", dim=768)
load_time_ms = (time.perf_counter() - start) * 1000

# Core health metrics
record_count  = db.count()
namespace_list = db.list_namespaces()

# Memory estimate: raw float32 vectors only. The HNSW graph and metadata are
# excluded, so actual RAM use is higher than this number.
ram_estimate_mb = record_count * 768 * 4 / 1e6

print(f"Load time:    {load_time_ms:.0f}ms")
print(f"Vectors:      {record_count:,}")
print(f"Namespaces:   {len(namespace_list)}")
print(f"RAM (est.):   {ram_estimate_mb:.0f} MB")

# Per-namespace counts for multi-tenant monitoring
for ns in namespace_list:
    ns_count = db.count(namespace=ns)
    print(f"  {ns}: {ns_count} vectors")

```

Expose these as a `/metrics` endpoint (Prometheus) or ship them to your observability stack on a 60-second interval. The four numbers to track in production: record count (growth rate), load time (regression signal if it spikes), RAM usage (capacity planning), and per-namespace counts (detect runaway tenants).
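If you would rather not add a client library, the Prometheus text exposition format is simple enough to render by hand. The metric names below (`feather_vectors`, `feather_namespace_vectors`, and so on) are hypothetical choices, converted to the base units Prometheus recommends (seconds, bytes); the function expects values already collected with `db.count()` and `db.list_namespaces()` as in the snippet above.

```python
def _escape_label(value: str) -> str:
    # Label-value escaping from the exposition format: backslash, double quote, newline
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(total_vectors: int, load_time_ms: float, ram_estimate_mb: float,
                   per_namespace: dict[str, int]) -> str:
    """Render the four production numbers in Prometheus text exposition format."""
    lines = [
        "# TYPE feather_vectors gauge",
        f"feather_vectors {total_vectors}",
        "# TYPE feather_load_time_seconds gauge",
        f"feather_load_time_seconds {load_time_ms / 1000:.3f}",
        "# TYPE feather_ram_estimate_bytes gauge",
        f"feather_ram_estimate_bytes {ram_estimate_mb * 1e6:.0f}",
        "# TYPE feather_namespace_vectors gauge",
    ]
    for ns, count in sorted(per_namespace.items()):
        label = _escape_label(ns)
        lines.append(f'feather_namespace_vectors{{namespace="{label}"}} {count}')
    return "\n".join(lines) + "\n"
```

Serve the result with `Content-Type: text/plain; version=0.0.4`. With thousands of namespaces, one series per tenant gets expensive in Prometheus, so consider exporting only the largest tenants.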

Via the REST API from feather-serve:

```bash
curl -H "Authorization: Bearer $FEATHER_API_KEY" \
     http://localhost:7700/api/v1/stats
# {
#   "total_vectors": 84321,
#   "namespaces": 412,
#   "file_size_mb": 247.3,
#   "load_time_ms": 48
# }

```
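For the runaway-tenant signal, one simple heuristic flags namespaces whose vector count is far above the typical tenant. `find_runaway_namespaces()` is hypothetical; the median-multiple rule, the factor of 10, and the 1,000-vector floor are arbitrary starting points to tune. The input is the per-namespace mapping gathered with `db.count(namespace=ns)` earlier in this section.

```python
from statistics import median


def find_runaway_namespaces(counts: dict[str, int], factor: float = 10.0,
                            min_vectors: int = 1_000) -> list[str]:
    """Return namespaces holding more than `factor` x the median tenant's vectors."""
    if not counts:
        return []
    typical = median(counts.values())
    limit = max(min_vectors, factor * typical)  # the floor avoids noise on tiny tenants
    return sorted(ns for ns, n in counts.items() if n > limit)


counts = {"user-1": 120, "user-2": 95, "user-3": 110, "user-4": 48_000}
print(find_runaway_namespaces(counts))  # ['user-4']
```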

## Multi-tenant patterns

Two patterns exist for multi-tenant deployments. Choose based on your tenant count and isolation requirements.

### One file per tenant

Each tenant gets a dedicated `.feather` file. Strong physical isolation, easy per-tenant backup and deletion, and predictable per-file memory usage. Best for: small tenant counts (<100), enterprise customers who need data residency guarantees, or tenants with very large individual corpora (>500k vectors each).

```python
import os
import shutil
from datetime import date

import feather_db as fdb

DATA_DIR = "/data/feather"
BACKUP_DIR = "/data/backups"


def get_db(tenant_id: str) -> fdb.DB:
    path = f"{DATA_DIR}/tenant-{tenant_id}.feather"
    return fdb.DB.open(path, dim=768)


# Backup one tenant (one dated copy per day)
def backup_tenant(tenant_id: str):
    shutil.copy(
        f"{DATA_DIR}/tenant-{tenant_id}.feather",
        f"{BACKUP_DIR}/tenant-{tenant_id}.{date.today().isoformat()}.feather",
    )


# Delete a tenant completely: just delete the file
def offboard_tenant(tenant_id: str):
    os.remove(f"{DATA_DIR}/tenant-{tenant_id}.feather")
```
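Because the tenant id becomes part of a filesystem path, validate it before building the path: an id such as `../../etc/passwd` should never reach `fdb.DB.open()` or `os.remove()`. `tenant_path()` is a hypothetical helper, and the allowed-character pattern is an assumption to adapt to your own id scheme.

```python
import re
from pathlib import Path

TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")
DATA_DIR = Path("/data/feather")


def tenant_path(tenant_id: str, data_dir: Path = DATA_DIR) -> Path:
    """Map a tenant id to its .feather file, rejecting ids that could escape data_dir."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    path = data_dir / f"tenant-{tenant_id}.feather"
    # Defense in depth: the resolved file must still sit directly inside data_dir.
    if path.resolve().parent != data_dir.resolve():
        raise ValueError(f"tenant path escapes {data_dir}: {path}")
    return path
```

Open it with `fdb.DB.open(str(tenant_path(tenant_id)), dim=768)`; passing a `str` avoids relying on whether `open()` accepts `Path` objects.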

### Namespace-per-tenant in a shared file

All tenants share one `.feather` file, isolated by namespace. Best for: SaaS products with hundreds to tens of thousands of users, where per-file overhead would be impractical. Search latency scales with per-namespace vector count, not total file size.

```python
# One file, thousands of tenants — namespace enforces isolation
db = fdb.DB.open("/data/feather/production.feather", dim=768)

def add_for_user(user_id: str, text: str, category: str):
    db.add(embed(text), text=text,
           namespace=user_id, entity=category)
    db.save()

def search_for_user(user_id: str, query: str, k: int = 5):
    return db.search(embed(query), k=k, namespace=user_id)

# Delete all memories for a user (GDPR, offboarding)
def delete_user_data(user_id: str):
    db.purge(namespace=user_id)
    db.compact()
    db.save()

```
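`add_for_user` above calls `db.save()` after every insert. Since each save writes the file out through a temp file and rename, saving on every write can dominate latency for a large shared file (an inference from that save design, not a measured figure). A common alternative is to coalesce saves. `SaveCoalescer` is a hypothetical helper that calls a save function at most once per interval or after a fixed number of writes, whichever comes first; writes since the last save are lost on a crash, so size the window to the loss you can tolerate.

```python
import threading
import time
from typing import Callable


class SaveCoalescer:
    """Call `save` at most every `interval_s` seconds or every `max_pending` writes."""

    def __init__(self, save: Callable[[], None], interval_s: float = 5.0,
                 max_pending: int = 100, clock: Callable[[], float] = time.monotonic) -> None:
        self._save = save
        self._interval_s = interval_s
        self._max_pending = max_pending
        self._clock = clock
        self._pending = 0
        self._last_save = clock()
        self._lock = threading.Lock()

    def mark_dirty(self) -> bool:
        """Record one write; save if a threshold is reached. Returns True if it saved."""
        with self._lock:
            self._pending += 1
            due = (self._pending >= self._max_pending
                   or self._clock() - self._last_save >= self._interval_s)
            if due:
                self._flush_locked()
            return due

    def flush(self) -> None:
        """Save pending writes now. Call on shutdown and from a periodic timer."""
        with self._lock:
            if self._pending:
                self._flush_locked()

    def _flush_locked(self) -> None:
        self._save()
        self._pending = 0
        self._last_save = self._clock()
```

Usage: create `saver = SaveCoalescer(db.save)` once, replace the per-call `db.save()` in `add_for_user` with `saver.mark_dirty()`, and call `saver.flush()` periodically, because the interval check only runs when a new write arrives.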

For very large deployments (100k+ namespaces), shard by namespace hash across multiple files with a routing layer. Each shard benefits from persisted HNSW load (48ms cold start) and int8 RAM quantization independently.
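Whatever hash the routing layer uses must be stable across processes and restarts. Python's built-in `hash()` on strings is salted per process (see `PYTHONHASHSEED`), so `hash(user_id) % N` can send the same user to different shards on different workers. A minimal sketch with `hashlib` follows; `shard_for()` and the `shard-NN.feather` naming are hypothetical.

```python
import hashlib


def shard_for(namespace: str, n_shards: int) -> int:
    """Deterministically map a namespace to a shard index in [0, n_shards)."""
    digest = hashlib.blake2b(namespace.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_shards


def shard_path(namespace: str, n_shards: int, data_dir: str = "/data/feather") -> str:
    return f"{data_dir}/shard-{shard_for(namespace, n_shards):02d}.feather"


print(shard_path("user-42", n_shards=16))  # same file on every process and every restart
```

Plain modulo routing remaps most namespaces when `n_shards` changes, so plan the shard count up front or use consistent hashing if you expect to add shards later.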

## Disaster recovery

The `.feather` file is self-contained. Everything — vectors, HNSW graph, metadata, typed edges, namespace index — is in one binary. Recovery is a file copy.

```python
import shutil

import feather_db as fdb

def restore_from_backup(backup_path: str, target_path: str):
    """Restore a .feather file from backup. No special tooling needed."""
    shutil.copy(backup_path, target_path)
    # Verify the restore loaded clean
    db = fdb.DB.open(target_path, dim=768)
    print(f"Restored. Vectors: {db.count()}, Namespaces: {len(db.list_namespaces())}")
    return db

```

Disaster recovery checklist:

  - Snapshot the file before every migration: `cp production.feather production.$(date +%Y%m%d).bak`

  - Ship nightly backups to object storage — one copy is not a backup

  - Compact before shipping a snapshot to a new environment — smaller file, faster restore load

  - Test your restore path quarterly: copy a backup, open it, verify count and namespace list

  - Never modify a `.feather` file directly — always go through the Feather API. The format has a checksum; corrupt files will fail to open with a clear error rather than silently returning wrong results
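The quarterly restore drill is easier to trust if every backup carries a checksum taken at backup time, so a copy damaged in transit or storage is caught before you open it. `write_checksum()` and `verify_checksum()` are hypothetical helpers that store a SHA-256 in a `.sha256` sidecar next to the backup; this is independent of the checksum inside the `.feather` format itself.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-GB backups do not need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum(backup: Path) -> Path:
    sidecar = backup.with_name(backup.name + ".sha256")
    # "<digest>  <filename>" is the layout `sha256sum -c` understands
    sidecar.write_text(f"{sha256_file(backup)}  {backup.name}\n")
    return sidecar


def verify_checksum(backup: Path) -> bool:
    sidecar = backup.with_name(backup.name + ".sha256")
    expected = sidecar.read_text().split()[0]
    return sha256_file(backup) == expected
```

Upload the sidecar together with the backup, and run `verify_checksum()` before copying a backup into place.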

## Production setup: putting it together

A complete production Python service with namespace isolation, startup optimization, compaction schedule, and monitoring:

```python
import os
import time
import logging
import schedule
import feather_db as fdb

logger = logging.getLogger("feather")

# ── Startup ──────────────────────────────────────────────────────────────
os.environ["FEATHER_LOAD_THREADS"] = "8"   # parallel graph rebuilds; v0.16.0 opens skip reconstruction

start = time.perf_counter()
DB = fdb.DB.open("/data/feather/production.feather", dim=768)
load_ms = (time.perf_counter() - start) * 1000

# Optional: int8 quantization for memory-constrained hosts
# DB.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88

logger.info(f"Feather ready. vectors={DB.count()} load_ms={load_ms:.0f}")

EMBED = load_your_embedder()  # e.g. Gemini, OpenAI, Voyage

# ── Core operations ───────────────────────────────────────────────────────
def add_memory(user_id: str, text: str, category: str,
               mem_type: str = "fact", importance: float = 1.0):
    vec = EMBED(text)
    mem = DB.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    mem.meta.set_attribute("created_at",
                           time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))  # UTC, matches "Z"
    DB.save()
    return mem.id

def search_memory(user_id: str, query: str,
                  category: str = None, k: int = 5):
    vec = EMBED(query)
    return DB.search(vec, k=k, namespace=user_id, entity=category)

def delete_user(user_id: str):
    """Full GDPR delete — purge namespace, compact, save."""
    DB.purge(namespace=user_id)
    DB.compact()
    DB.save()
    logger.info(f"Deleted namespace={user_id}")

# ── Bulk ingestion ────────────────────────────────────────────────────────
def bulk_seed(user_id: str, records: list[dict]):
    """Seed a user's historical data. Use add_batch for >1k records."""
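    import numpy as np
    texts = [r["text"] for r in records]
    vecs  = np.array([EMBED(t) for t in texts], dtype=np.float32)
    # count()-based ids can collide with surviving ids after a purge();
    # prefer a persisted, monotonically increasing id counter in production
    start = DB.count()
    ids   = list(range(start, start + len(records)))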
    metas = []
    for r in records:
        m = fdb.Metadata(importance=r.get("importance", 1.0))
        m.set_attribute("source", r.get("source", "seed"))
        metas.append(m)
    DB.add_batch(ids, vecs, metas=metas, namespace=user_id)
    DB.save()
    logger.info(f"Seeded {len(records)} records for user={user_id}")

# ── Maintenance schedule ──────────────────────────────────────────────────
def weekly_maintenance():
    before = DB.count()
    DB.purge(older_than_days=90)   # evict memories not recalled in 90 days
    DB.compact()
    DB.save()
    after = DB.count()
    logger.info(f"Maintenance: {before - after} nodes pruned, {after} remaining")

schedule.every().monday.at("03:00").do(weekly_maintenance)

# ── Metrics ───────────────────────────────────────────────────────────────
def emit_metrics():
    namespaces = DB.list_namespaces()
    ram_mb = DB.count() * 768 * 4 / 1e6
    logger.info(
        f"metrics vectors={DB.count()} "
        f"namespaces={len(namespaces)} "
        f"ram_estimate_mb={ram_mb:.0f}"
    )

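schedule.every(60).seconds.do(emit_metrics)

# ── Scheduler ─────────────────────────────────────────────────────────────
# schedule only runs jobs when run_pending() is called, so drive it from a
# daemon thread. If your server handles requests on several threads, guard DB
# calls with a lock so maintenance and writes do not overlap.
import threading


def run_scheduler():
    while True:
        schedule.run_pending()
        time.sleep(1)


threading.Thread(target=run_scheduler, daemon=True).start()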

```

This pattern runs well on a 2-core / 2 GB VPS serving up to ~10k tenants in a shared file. For larger deployments, shard by a stable hash of `user_id` modulo N across N files (computed with `hashlib`, not the per-process-salted built-in `hash()`), each served by its own feather-serve instance behind a load balancer.

## Summary: decision table

  
| Decision | Recommendation |
|---|---|
| Deployment mode | Embedded for a single service; feather-serve for multi-service access or MCP |
| Namespace design | namespace = tenant ID, entity = topic, attributes = secondary filters |
| Multi-tenant file strategy | Shared file with namespace-per-tenant for hundreds to tens of thousands of tenants; shard by a stable namespace hash beyond ~100k namespaces; one file per tenant for small tenant counts (<100), data residency, or very large per-tenant corpora |
| Compaction | After bulk deletes >10%, on a weekly schedule, and before shipping backups |
| Memory on constrained hosts | Enable int8 RAM: 1.76× less RAM, recall@10 ~0.88; fine for context retrieval |
| Cold start | v0.16.0 persisted HNSW = 48ms at 500k vectors; set `FEATHER_LOAD_THREADS=8` as a fallback |
| Bulk ingestion | `add_batch()` for >1k items (3.4× on 4 cores); `add()` for real-time single inserts |
| Disaster recovery | Copy = backup; nightly snapshot to object storage; compact before shipping |
| Docker | Named volume for `/data`; `restart: unless-stopped`; `FEATHER_LOAD_THREADS` in ENV |

**Install:** `pip install feather-db` · **GitHub:** [github.com/feather-store/feather](https://github.com/feather-store/feather)

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-db-production-deployment-guide](https://getfeather.store/theory/feather-db-production-deployment-guide). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*