# Feather DB in Production: Deployment Patterns and Best Practices

> Embedded, feather-serve daemon, or remote MCP: Feather DB runs wherever you need it. This guide covers namespace design, file management, compaction, memory optimization, cold-start tuning, bulk ingestion, Docker, monitoring, multi-tenant patterns, and disaster recovery.

- **Category**: Deploy
- **Read time**: 10 min read
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-db-production-deployment-guide

---

## Deployment modes

Feather DB ships three deployment modes. Pick one based on where your embedding logic lives and who needs to reach the index.

### Embedded (single process)

The default mode. Your Python process opens a `.feather` file directly: no network hop, no daemon, sub-millisecond search latency. Use this for single-service backends, batch pipelines, and local development.

```python
import feather_db as fdb

# vec and query_vec are 768-dim embeddings produced by your own model
db = fdb.DB.open("memory.feather", dim=768)
db.add(vec, text="Alice prefers dark mode.", namespace="user-alice")
results = db.search(query_vec, k=5, namespace="user-alice")
db.save()
```

The file is created on first open and grown incrementally. You own the process lifecycle, so save before exit.

### feather-serve (local daemon)

Run `feather-serve` as a long-lived process. It exposes a REST API at `/api/v1/`, an MCP endpoint at `/mcp`, and the Atlas admin SPA at `/admin/`. Other services call HTTP instead of linking the library directly. Use this when multiple processes share one context store, or when you want the admin SPA for inspection.

```bash
GOOGLE_API_KEY=your-key feather-serve memory.feather \
  --embed-provider gemini --dim 768 --port 7700
```

With `--embed-provider`, feather-serve handles embeddings server-side. Clients send raw text; Feather returns ranked memories. No embedding pipeline in your application code.

### Remote MCP (network)

Connect Claude Desktop, Claude Code, or any MCP-compatible agent to a running `feather-serve` instance. The agent calls `feather_search`, `feather_add`, `feather_context_chain`, and 11 other tools as native MCP calls. Use this for persistent agent memory across conversations. Add the server to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "feather-memory": {
      "url": "http://localhost:7700/mcp"
    }
  }
}
```

The three modes compose: in production you might run feather-serve in Docker (mode 2), accessed by your Python API via REST (mode 1 semantics over HTTP) and by Claude via MCP (mode 3).

## Namespace design

Namespaces are hard tenant boundaries. Memories in `"user-alice"` are completely invisible to searches in `"user-bob"`. Search latency scales with the number of memories *in that namespace*, not across all tenants, which is what makes a shared file practical at scale.

The standard pattern:

- **namespace = tenant boundary**: user_id, agent_id, org_id
- **entity = topic group within a namespace**: "preferences", "work-context", "current-project"
- **attributes = secondary metadata**: type, source, created_at, confidence

```python
import feather_db as fdb
from datetime import datetime, timezone

db = fdb.DB.open("saas_memory.feather", dim=768)

# embed() is your own embedding function returning a 768-dim vector

def add_memory(user_id: str, text: str, category: str,
               mem_type: str, importance: float = 1.0, vec=None):
    if vec is None:
        vec = embed(text)
    mem = db.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    mem.meta.set_attribute("created_at", datetime.now(timezone.utc).isoformat())
    return mem

def search_memory(user_id: str, query: str, category: str = None, k: int = 5):
    return db.search(embed(query), k=k, namespace=user_id, entity=category)

# One tenant per user_id: strict isolation, zero cross-contamination
add_memory("user-42", "Prefers Python for backend, TypeScript for frontend.",
           category="preferences", mem_type="fact",
           importance=1.2)
add_memory("user-42", "Building a fintech SaaS, Series A in Q3.",
           category="work-context", mem_type="fact", importance=1.5)

results = search_memory("user-42", "What stack does this user prefer?")
```

Agent roles get their own namespaces too. A planner agent and a coder agent operating on the same codebase should have separate namespaces (`"agent-planner"` and `"agent-coder"`) so their context stores don't bleed into each other's retrieval.

## File management

### Where to store .feather files

Keep `.feather` files on durable, fast-seek storage: SSD-backed volumes or network-attached storage with high IOPS. The file is read fully at startup (with HNSW graph reconstruction before v0.16.0) and written atomically on `db.save()`. Write latency matters at save time; read IOPS matter at startup.

Recommended layout for a production service:

```bash
/data/feather/
  production.feather        # primary store
  production.feather.bak    # last manual snapshot
  staging.feather           # staging environment
```

In Docker, always use a named volume, never a bind mount to a temp directory:

```yaml
volumes:
  feather-data:
    driver: local

services:
  feather-api:
    volumes:
      - feather-data:/data
```

### Backup strategy

The `.feather` file is self-contained: vectors, HNSW graph, metadata, edges, and namespace index are all in one binary. A copy *is* a backup.

```bash
# Snapshot before a risky operation (migration, bulk delete)
cp /data/feather/production.feather \
   /data/feather/production.$(date +%Y%m%d-%H%M%S).bak

# Restore is equally simple (file name matches the timestamp format above)
cp /data/feather/production.20260616-090000.bak \
   /data/feather/production.feather
```

For scheduled backups, copy the file to object storage (S3, GCS) nightly. Because `db.save()` writes atomically to a temp file then renames, a concurrent copy during a save will always get a consistent snapshot: either the old file or the new one, never a partial write.
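
The write-to-temp-then-rename behaviour described above is a general pattern, not something specific to Feather. The following is a minimal Python sketch of that pattern, assuming a filesystem where a rename within one directory is atomic. `atomic_write` is a hypothetical helper written for illustration; it is not part of Feather's API and does not claim to mirror how `db.save()` is implemented internally.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the
    # target; a rename across filesystems is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap of the directory entry
    except BaseException:
        # Remove the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A backup job that opens the target file while such a write is in progress keeps reading the complete version it opened, which is why a plain copy is safe here.
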

```bash
# Nightly backup cron (assumes the AWS CLI; plain cp cannot write to s3:// URLs).
# A crontab entry must be a single line, and % must be escaped as \%.
0 2 * * * aws s3 cp /data/feather/production.feather s3://your-bucket/backups/production.$(date +\%Y\%m\%d).feather
```

## Compaction

Feather's HNSW graph accumulates soft-deleted nodes when you call `forget()` or `purge()`. These nodes no longer appear in search results but still occupy space in the graph, slowing traversal slightly. `compact()` rewrites the file clean: removed nodes gone, HNSW rebuilt tight, load time faster on the next restart.

```python
db.compact()  # rebuilds graph in-place, rewrites .feather file
db.save()     # flush the compacted state to disk
```

When to compact:

- After a bulk `forget()` or `purge()` that removes more than ~10% of nodes
- After onboarding data migrations where you replaced old records
- Before a disaster recovery restore: compact the source file first so the restore loads a tight index
- On a weekly schedule for stores with frequent deletes

```python
import schedule
import time

def weekly_compact():
    db.purge(namespace="*", older_than_days=90)  # evict stale memories
    db.compact()
    db.save()
    print(f"Compacted. Vectors remaining: {db.count()}")

schedule.every().monday.at("03:00").do(weekly_compact)

# Jobs only fire while this loop keeps polling the scheduler
while True:
    schedule.run_pending()
    time.sleep(60)
```

Compaction is also the fastest way to reduce file size before shipping a snapshot to a new environment.

## Memory management

### Adaptive capacity (v0.15.3)

v0.15.3 ships adaptive HNSW capacity: the index grows incrementally instead of pre-allocating max_elements upfront. For typical deployments that start small and grow over weeks, this delivers 7.7× less RAM at startup compared to pre-allocating for 1M elements on an empty index. The change is automatic; no config is required.

### int8 RAM quantization

For memory-constrained hosts (1–2 GB VPS, edge devices, Lambda), enable in-RAM int8 quantization after load:

```python
db = fdb.DB.open("memory.feather", dim=768)
db.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88 vs 0.972
```

At 60k × 768-dim float32, RAM drops from 227 MB to 129 MB. Recall@10 moves from 0.972 to ~0.88. For context retrieval (surfacing 5–10 relevant memories per query), 0.88 recall is usually acceptable.

| Mode | RAM (60k × 768-dim) | Recall@10 |
|---|---|---|
| float32 (default) | 227 MB | 0.972 |
| int8 in-RAM | 129 MB | ~0.88 |

Stick with float32 when running precision benchmarks, when RAM is not a constraint, or when your index is under 20k vectors and there's no reason to trade recall.

## Cold start: persisted HNSW (v0.16.0)

v0.16.0 ships persisted HNSW graph state. The HNSW graph is stored in a ready-to-load binary layout inside the `.feather` file, so there is no reconstruction on startup. Cold start at 500k vectors drops from 2.7s to 48ms. This is the highest-impact change for serverless deployments and Kubernetes pods with frequent restarts.

Before v0.16.0, parallel load via `FEATHER_LOAD_THREADS` was the primary lever:

```python
import os
import feather_db as fdb

# v0.15.x: parallel graph reconstruction (4.7× faster than serial)
os.environ["FEATHER_LOAD_THREADS"] = "8"
db = fdb.DB.open("memory.feather", dim=768)
```

With v0.16.0, `FEATHER_LOAD_THREADS` is still respected during initial file creation and explicit rebuilds, but routine opens skip reconstruction entirely. Set it in your environment regardless, since it handles fallback cases.
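
One way to set the variable without overriding a value that the deployment already provides is sketched below. This is an illustration under assumptions: `configure_load_threads` is a hypothetical helper, capping at 8 simply mirrors the value used in this guide, and deriving the default from the CPU count is a choice made here rather than a documented Feather recommendation. The guide does not state when Feather reads the variable, so the sketch sets it before the database is opened.

```python
import os

def configure_load_threads(max_threads: int = 8) -> str:
    """Set FEATHER_LOAD_THREADS unless the environment already defines it."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    default = str(max(1, min(cores, max_threads)))
    # setdefault keeps an explicit value from the Dockerfile or shell
    return os.environ.setdefault("FEATHER_LOAD_THREADS", default)

# Call this before opening the database
threads = configure_load_threads()
```

On a 2-core host this yields `"2"` rather than `"8"`, which avoids oversubscribing small machines during a fallback rebuild.
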

| Version | Cold start (500k vectors) | Notes |
|---|---|---|
| v0.15.x serial | ~2.7s | Single-threaded reconstruction |
| v0.15.x + FEATHER_LOAD_THREADS=8 | ~0.6s | Parallel reconstruction, 4.7× |
| v0.16.0 | 48ms | Persisted graph, no reconstruction |

```dockerfile
# Dockerfile: set regardless of version
ENV FEATHER_LOAD_THREADS=8
```

## Bulk ingestion with add_batch()

For ingesting more than ~1k vectors at once (corpus imports, historical data seeding, document chunking pipelines), use `add_batch()`. It builds the HNSW graph in parallel with the GIL released: 3.4× faster than a sequential loop on a 4-core machine, ~5–6× on an 8-core machine.

```python
import feather_db as fdb
import numpy as np

db = fdb.DB.open("corpus.feather", dim=768)

# Prepare vectors and metadata in bulk
texts = load_your_documents()            # list of strings
vecs = embed_batch(texts)                # np.ndarray shape (N, 768), float32
scores = load_importance_scores(texts)   # np.ndarray shape (N,)

metas = []
for i, (text, score) in enumerate(zip(texts, scores)):
    m = fdb.Metadata(importance=float(min(1.0, score)))
    m.set_attribute("source", "batch_import_2026")
    m.set_attribute("doc_id", str(i))
    metas.append(m)

ids = list(range(len(texts)))

# Single parallel call: no Python loop overhead
db.add_batch(ids, vecs, metas=metas)
db.save()
print(f"Ingested {db.count()} vectors")
```

Use `add()` for real-time single-item inserts (one memory after each conversation turn). Use `add_batch()` for everything else. The crossover point is roughly 1,000 items: below that the overhead isn't worth it; above it the speedup compounds.

Important: always use `meta.set_attribute(key, value)`, not `meta.attributes[key] = value`. The dict accessor silently does nothing due to pybind11 copy semantics.
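
The copy-semantics pitfall is easier to see in a pure-Python analogy. The sketch below does not use Feather at all: `Meta` is a toy stand-in class, written on the assumption that the real `attributes` accessor hands back a by-value copy of the underlying map, as the note above describes. Mutating the copy changes nothing in the object, while the setter writes through.

```python
class Meta:
    """Toy stand-in for a bound C++ object; not the real feather_db class."""

    def __init__(self):
        self._attrs = {}

    @property
    def attributes(self):
        # Returns a fresh copy on every access, like a by-value getter
        return dict(self._attrs)

    def set_attribute(self, key, value):
        # Writes through to the underlying storage
        self._attrs[key] = value


meta = Meta()
meta.attributes["source"] = "import"  # mutates a temporary copy; lost
meta.set_attribute("doc_id", "42")    # persists

print(meta.attributes)  # {'doc_id': '42'}
```

No exception is raised on the lost write, which is what makes this mistake hard to spot in an ingestion pipeline.
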

## Docker: self-hosted feather-serve

```bash
# Clone and build
git clone https://github.com/feather-store/feather.git
cd feather
docker compose -f feather-api/docker-compose.yml build
```

```yaml
# feather-api/docker-compose.yml
version: '3.9'

services:
  feather-api:
    image: feather-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "${FEATHER_PORT:-7700}:7700"
    volumes:
      - feather-data:/data  # persistent across restarts and image updates
    environment:
      - FEATHER_API_KEY=${FEATHER_API_KEY}
      - FEATHER_EMBED_PROVIDER=${FEATHER_EMBED_PROVIDER:-gemini}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - FEATHER_LOAD_THREADS=8
      - FEATHER_DIM=${FEATHER_DIM:-768}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7700/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
    restart: unless-stopped

volumes:
  feather-data:
    driver: local
```

```bash
# feather-api/.env (file contents)
FEATHER_API_KEY=your-secret-key
FEATHER_EMBED_PROVIDER=gemini
GOOGLE_API_KEY=your-google-key
FEATHER_LOAD_THREADS=8
FEATHER_DIM=768

# Start (shell)
docker compose -f feather-api/docker-compose.yml up -d

# Verify (shell)
curl http://localhost:7700/health
# {"status": "ok", "version": "0.16.0", "vectors": 0, "dim": 768}
```

For production HTTPS, put Nginx (`proxy_pass http://feather-api:7700`) or Caddy (`reverse_proxy feather-api:7700`) in front, with a Let's Encrypt certificate. The MCP endpoint and admin SPA both work behind a reverse proxy with no additional configuration.

The named volume `feather-data` persists the `.feather` file across container restarts, image updates, and host reboots. Never bind-mount to a temp directory, or you will lose all memories on container restart.

## Monitoring

Feather exposes the metrics you need through the API and in-process.

```python
import time
import feather_db as fdb

start = time.perf_counter()
db = fdb.DB.open("memory.feather", dim=768)
load_time_ms = (time.perf_counter() - start) * 1000

# Core health metrics
record_count = db.count()
namespace_list = db.list_namespaces()

# Memory estimate (float32 baseline, raw vectors only)
ram_estimate_mb = record_count * 768 * 4 / 1e6

print(f"Load time:  {load_time_ms:.0f}ms")
print(f"Vectors:    {record_count:,}")
print(f"Namespaces: {len(namespace_list)}")
print(f"RAM (est.): {ram_estimate_mb:.0f} MB")

# Per-namespace counts for multi-tenant monitoring
for ns in namespace_list:
    ns_count = db.count(namespace=ns)
    print(f"  {ns}: {ns_count} vectors")
```

Expose these as a `/metrics` endpoint (Prometheus) or ship them to your observability stack on a 60-second interval. The four numbers to track in production: record count (growth rate), load time (regression signal if it spikes), RAM usage (capacity planning), and per-namespace counts (detect runaway tenants).

Via the REST API from feather-serve:

```bash
curl -H "Authorization: Bearer $FEATHER_API_KEY" \
  http://localhost:7700/api/v1/stats
# {
#   "total_vectors": 84321,
#   "namespaces": 412,
#   "file_size_mb": 247.3,
#   "load_time_ms": 48
# }
```

## Multi-tenant patterns

Two patterns exist for multi-tenant deployments. Choose based on your tenant count and isolation requirements.

### One file per tenant

Each tenant gets a dedicated `.feather` file. Strong physical isolation, easy per-tenant backup and deletion, and predictable per-file memory usage. Best for: small tenant counts (<100), enterprise customers who need data residency guarantees, or tenants with very large individual corpora (>500k vectors each).
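
With one file per tenant, the tenant ID becomes part of a file path, so it is worth validating before it reaches the filesystem. The sketch below is one possible guard, not part of Feather: `tenant_path` and the allowed-character pattern are assumptions made for illustration, and `/data/feather` is the directory used elsewhere in this guide.

```python
import re
from pathlib import Path

BASE_DIR = Path("/data/feather")                # directory used in this guide
TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")  # assumed ID format

def tenant_path(tenant_id: str, base_dir: Path = BASE_DIR) -> Path:
    """Map a tenant ID to its .feather file, rejecting unsafe IDs."""
    # fullmatch rejects separators and dots, so "../" cannot escape base_dir
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return base_dir / f"tenant-{tenant_id}.feather"
```

Routing every open, backup, and delete through a single helper like this keeps the naming scheme in one place and stops an ID such as `../other` from addressing another tenant's file.
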

```python
import os
import shutil
from datetime import date

import feather_db as fdb

def get_db(tenant_id: str) -> fdb.DB:
    path = f"/data/feather/tenant-{tenant_id}.feather"
    return fdb.DB.open(path, dim=768)

# Backup one tenant
def backup_tenant(tenant_id: str):
    stamp = date.today().strftime("%Y%m%d")  # e.g. 20260616
    shutil.copy(
        f"/data/feather/tenant-{tenant_id}.feather",
        f"/data/backups/tenant-{tenant_id}.{stamp}.feather",
    )

# Delete a tenant completely: just delete the file
def offboard_tenant(tenant_id: str):
    os.remove(f"/data/feather/tenant-{tenant_id}.feather")
```

### Namespace-per-tenant in a shared file

All tenants share one `.feather` file, isolated by namespace. Best for: SaaS products with hundreds to tens of thousands of users, where per-file overhead would be impractical. Search latency scales with per-namespace vector count, not total file size.

```python
# One file, thousands of tenants: namespace enforces isolation
db = fdb.DB.open("/data/feather/production.feather", dim=768)

def add_for_user(user_id: str, text: str, category: str):
    db.add(embed(text), text=text, namespace=user_id, entity=category)
    db.save()

def search_for_user(user_id: str, query: str, k: int = 5):
    return db.search(embed(query), k=k, namespace=user_id)

# Delete all memories for a user (GDPR, offboarding)
def delete_user_data(user_id: str):
    db.purge(namespace=user_id)
    db.compact()
    db.save()
```

For very large deployments (100k+ namespaces), shard by namespace hash across multiple files with a routing layer. Each shard benefits from persisted HNSW load (48ms cold start) and int8 RAM quantization independently.

## Disaster recovery

The `.feather` file is self-contained. Everything (vectors, HNSW graph, metadata, typed edges, namespace index) is in one binary. Recovery is a file copy.

```python
import shutil

import feather_db as fdb

def restore_from_backup(backup_path: str, target_path: str):
    """Restore a .feather file from backup.
    No special tooling needed."""
    shutil.copy(backup_path, target_path)

    # Verify the restore loaded clean
    db = fdb.DB.open(target_path, dim=768)
    print(f"Restored. Vectors: {db.count()}, Namespaces: {len(db.list_namespaces())}")
    return db
```

Disaster recovery checklist:

- Snapshot the file before every migration: `cp production.feather production.$(date +%Y%m%d).bak`
- Ship nightly backups to object storage; one copy is not a backup
- Compact before shipping a snapshot to a new environment: smaller file, faster restore load
- Test your restore path quarterly: copy a backup, open it, verify count and namespace list
- Never modify a `.feather` file directly; always go through the Feather API. The format has a checksum; corrupt files will fail to open with a clear error rather than silently returning wrong results

## Production setup: putting it together

A complete production Python service with namespace isolation, startup optimization, compaction schedule, and monitoring:

```python
import os
import time
import logging
import schedule
import feather_db as fdb

logger = logging.getLogger("feather")

# ── Startup ──────────────────────────────────────────────────────────────

os.environ["FEATHER_LOAD_THREADS"] = "8"  # parallel HNSW load

start = time.perf_counter()
DB = fdb.DB.open("/data/feather/production.feather", dim=768)
load_ms = (time.perf_counter() - start) * 1000

# Optional: int8 quantization for memory-constrained hosts
# DB.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88

logger.info(f"Feather ready. vectors={DB.count()} load_ms={load_ms:.0f}")

EMBED = load_your_embedder()  # your embedding callable, e.g.
# Gemini, OpenAI, Voyage

# ── Core operations ───────────────────────────────────────────────────────

def add_memory(user_id: str, text: str, category: str,
               mem_type: str = "fact", importance: float = 1.0):
    vec = EMBED(text)
    mem = DB.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    # gmtime() so the trailing "Z" (UTC) is accurate; strftime defaults to local time
    mem.meta.set_attribute("created_at",
                           time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    DB.save()
    return mem.id

def search_memory(user_id: str, query: str,
                  category: str = None, k: int = 5):
    vec = EMBED(query)
    return DB.search(vec, k=k, namespace=user_id, entity=category)

def delete_user(user_id: str):
    """Full GDPR delete: purge namespace, compact, save."""
    DB.purge(namespace=user_id)
    DB.compact()
    DB.save()
    logger.info(f"Deleted namespace={user_id}")

# ── Bulk ingestion ────────────────────────────────────────────────────────

def bulk_seed(user_id: str, records: list[dict]):
    """Seed a user's historical data. Use add_batch for >1k records."""
    import numpy as np

    texts = [r["text"] for r in records]
    vecs = np.array([EMBED(t) for t in texts], dtype=np.float32)
    # NOTE: count-based IDs assume existing IDs are all below DB.count();
    # check this against your own ID scheme if you purge and compact.
    first_id = DB.count()
    ids = list(range(first_id, first_id + len(records)))
    metas = []
    for r in records:
        m = fdb.Metadata(importance=r.get("importance", 1.0))
        m.set_attribute("source", r.get("source", "seed"))
        metas.append(m)
    DB.add_batch(ids, vecs, metas=metas, namespace=user_id)
    DB.save()
    logger.info(f"Seeded {len(records)} records for user={user_id}")

# ── Maintenance schedule ──────────────────────────────────────────────────

def weekly_maintenance():
    before = DB.count()
    DB.purge(older_than_days=90)  # evict memories not recalled in 90 days
    DB.compact()
    DB.save()
    after = DB.count()
    logger.info(f"Maintenance: {before - after} nodes pruned, {after} remaining")

schedule.every().monday.at("03:00").do(weekly_maintenance)

# ── Metrics ───────────────────────────────────────────────────────────────

def emit_metrics():
    namespaces = DB.list_namespaces()
    ram_mb = DB.count() * 768 * 4 / 1e6
    logger.info(
        f"metrics vectors={DB.count()} "
        f"namespaces={len(namespaces)} "
        f"ram_estimate_mb={ram_mb:.0f}"
    )

schedule.every(60).seconds.do(emit_metrics)

# Scheduled jobs only fire while something calls schedule.run_pending()
# periodically, for example from a background thread in your service.
```

This pattern runs well on a 2-core / 2 GB VPS serving up to ~10k tenants in a shared file. For larger deployments, shard by a stable hash of `user_id` modulo N across N files, each served by its own feather-serve instance behind a load balancer. Use a digest such as `hashlib.sha256` for the hash rather than Python's built-in `hash()`, which is randomized per process for strings and would route the same user to different shards after a restart.

## Summary: decision table

| Decision | Recommendation |
|---|---|
| Deployment mode | Embedded for single service; feather-serve for multi-service or MCP |
| Namespace design | namespace = tenant ID, entity = topic, attributes = secondary filters |
| Multi-tenant file strategy | Shared file up to ~100k tenants, sharded by namespace hash above that; one file per tenant for small tenant counts (<100), very large per-tenant corpora, or data residency |
| Compaction | After bulk deletes >10%, weekly schedule, before shipping backups |
| Memory on constrained hosts | Enable int8 RAM: 1.76× less RAM, recall@10 ~0.88, fine for context retrieval |
| Cold start | v0.16.0 persisted HNSW = 48ms at 500k vectors; set FEATHER_LOAD_THREADS=8 as fallback |
| Bulk ingestion | add_batch() for >1k items (3.4×); add() for real-time single inserts |
| Disaster recovery | Copy = backup; nightly snapshot to object storage; compact before shipping |
| Docker | Named volume for /data; restart: unless-stopped; FEATHER_LOAD_THREADS in ENV |

**Install:** `pip install feather-db` · **GitHub:** [github.com/feather-store/feather](https://github.com/feather-store/feather)

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-db-production-deployment-guide](https://getfeather.store/theory/feather-db-production-deployment-guide). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*