# feather-serve + Real Embedders: Semantic Persona Recall Without Writing a Single Embedding Call

> Feather DB v0.15.1 adds `--embed-provider` to feather-serve. Pass text in, get semantic search out, with no embedding pipeline to maintain. Here's what changed and how to wire it up.

- **Category**: Tutorial
- **Read time**: 8 min read
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-serve-real-embedders-semantic-recall

---

## The embedding pipeline problem

Before v0.15.1, using Feather DB required an embedding step before every `add()` and `search()` call. You'd call an embedding API, get a float array, then pass it to Feather. In Python code that is fine: it takes two lines. But it creates friction in two scenarios:

- **MCP clients** (Claude Desktop, Claude Code) don't naturally generate embedding vectors. You'd need an intermediate step that Claude can't do natively.
- **REST API clients** calling `feather-serve` have to maintain their own embedding pipeline, adding latency and operational complexity.

v0.15.1 solves this: `feather-serve --embed-provider` makes Feather itself responsible for embedding. You send text; Feather handles vectors.

## What --embed-provider does

When `feather-serve` starts with `--embed-provider`, it initializes an embedding client for the chosen provider. Every `ingest_text`, `search_text`, and `feather_add` (MCP) call that receives raw text gets embedded server-side before hitting the HNSW index.

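
The embed-on-ingest and embed-on-search flow described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Feather's implementation: `ToyEmbedder` and `TextStore` are hypothetical names, the embedder is a bag-of-words stand-in for a real provider, and a brute-force cosine scan stands in for the HNSW index.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class ToyEmbedder:
    """Hypothetical stand-in for a real provider: bag-of-words, unit-normalized."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vocab: Dict[str, int] = {}

    def __call__(self, text: str) -> Vector:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Each new token gets the next slot, wrapping modulo dim.
            index = self.vocab.setdefault(token, len(self.vocab)) % self.dim
            vec[index] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]


class TextStore:
    """Hypothetical text-in store: callers never see or supply a vector."""

    def __init__(self, embed: Callable[[str], Vector], dim: int) -> None:
        self.embed = embed
        self.dim = dim
        self.rows: Dict[int, Tuple[Vector, str]] = {}

    def _embed_checked(self, text: str) -> Vector:
        vec = self.embed(text)
        if len(vec) != self.dim:
            raise ValueError(f"embedder returned {len(vec)} dims, store expects {self.dim}")
        return vec

    def ingest_text(self, id: int, text: str) -> None:
        # Embed server-side, then store vector and original text together.
        self.rows[id] = (self._embed_checked(text), text)

    def search_text(self, text: str, k: int = 5) -> List[Tuple[int, str, float]]:
        query = self._embed_checked(text)
        # Brute-force cosine similarity (vectors are already unit length).
        scored = [
            (sum(a * b for a, b in zip(query, vec)), id, stored)
            for id, (vec, stored) in self.rows.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(id, stored, score) for score, id, stored in scored[:k]]


demo = TextStore(ToyEmbedder(dim=16), dim=16)
demo.ingest_text(1, "User prefers Python")
demo.ingest_text(2, "User lives in Berlin")
print(demo.search_text("prefers python", k=1))
```

The dimension check in `_embed_checked` mirrors why the embedder's output size and the store's configured dimension have to agree: a mismatch is rejected before anything reaches the index.
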
Start the server with one provider, and set `--dim` to match that model's output dimension:

```bash
# Gemini: native 768-dim, free tier available
GOOGLE_API_KEY=… feather-serve persona.feather \
  --embed-provider gemini --dim 768 --port 8001

# OpenAI
OPENAI_API_KEY=… feather-serve persona.feather \
  --embed-provider openai --dim 1536 --port 8001

# Voyage AI
VOYAGE_API_KEY=… feather-serve persona.feather \
  --embed-provider voyage --dim 1024 --port 8001

# Cohere
COHERE_API_KEY=… feather-serve persona.feather \
  --embed-provider cohere --dim 1024 --port 8001

# Ollama: fully offline, no API key
# (nomic-embed-text produces 768-dim vectors, so --dim is 768 here)
feather-serve persona.feather \
  --embed-provider ollama --ollama-model nomic-embed-text --dim 768 --port 8001
```

## Before vs after

**Before v0.15.1 (REST add):**

```python
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: embed the text yourself
resp = client.embeddings.create(model="text-embedding-3-small", input="User prefers Python")
vec = resp.data[0].embedding  # list of floats

# Step 2: store the vector
requests.post("http://localhost:8001/v1/default/add", json={
    "id": 1,
    "vector": vec,
    "metadata": {"text": "User prefers Python"},
})
```

**After v0.15.1 (REST add with real embedder):**

```python
import requests

# One call: feather-serve embeds internally
requests.post("http://localhost:8001/v1/default/ingest_text", json={
    "id": 1,
    "text": "User prefers Python",
})
```

Same pattern for search:

```python
import requests

# Before: embed the query yourself, then search by vector
# After: search by text; feather-serve embeds the query
results = requests.post("http://localhost:8001/v1/default/search_text", json={
    "text": "programming language preference",
    "k": 5,
}).json()
```

## Semantic persona recall via MCP

Combined with the MCP backend (v0.14.0), real embedders make the Claude Desktop persona experience seamless. Claude calls `feather_search` with a natural-language query string; feather-serve embeds it, searches the HNSW index, and returns the relevant memories. No embedding step appears anywhere in the MCP tool schema.

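
To make the "no embedding step in the tool schema" point concrete, here is a hedged sketch of how a server might dispatch a text-only search call. `handle_tool_call`, the injected `embed` and `search` callables, and the default of `k=5` are illustrative assumptions, not feather-serve's actual internals.

```python
from typing import Any, Callable, Dict, List, Sequence

Vector = Sequence[float]
Hit = Dict[str, Any]


def handle_tool_call(
    call: Dict[str, Any],
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], List[Hit]],
) -> List[Hit]:
    """Hypothetical dispatcher: validate a text-only call, embed, then search."""
    if call.get("tool") != "feather_search":
        raise ValueError(f"unsupported tool: {call.get('tool')!r}")
    args = call.get("arguments", {})
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' must be a non-empty string")
    k = int(args.get("k", 5))  # default of 5 is an assumption
    if k < 1:
        raise ValueError("'k' must be at least 1")
    vec = embed(query)     # the client never sees this vector
    return search(vec, k)  # vector search happens entirely server-side


# Usage with stub callables standing in for a real embedder and index.
seen: List[str] = []

def stub_embed(text: str) -> Vector:
    seen.append(text)
    return [float(len(text))]

def stub_search(vec: Vector, k: int) -> List[Hit]:
    return [{"id": 1, "text": "User prefers Python"}][:k]

hits = handle_tool_call(
    {"tool": "feather_search", "arguments": {"query": "language preference", "k": 5}},
    stub_embed,
    stub_search,
)
```

The caller's payload contains only a query string and `k`; the vector exists solely between the `embed` and `search` calls.
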
Claude's tool call (MCP):

```json
{
  "tool": "feather_search",
  "arguments": {
    "query": "what programming language does the user prefer?",
    "k": 5
  }
}
```

Internally, feather-serve then runs:

1. `embed("what programming language does the user prefer?")`
2. `db.search(vec, k=5)`
3. return results

## Choosing a provider

| Provider | Dim | Cost | Best for |
|---|---|---|---|
| gemini | 768 | Free tier / $0.00002/1K chars | Native Feather format, low cost, multimodal |
| openai | 1536 | $0.02/1M tokens (small) | High quality, widely supported |
| voyage | 1024 | $0.06/1M tokens | Code + technical content |
| cohere | 1024 | $0.10/1M tokens | Multilingual |
| ollama | varies | Free (local compute) | Privacy, air-gap, offline |

For the MCP + Claude Desktop use case, **Gemini** is the recommended starting point: 768-dim is the native Feather format (matches on-disk int8 quantization), the free tier is generous, and the text-embedding-004 model is competitive in quality benchmarks.

## The complete persona stack

With v0.15.1, the full persona context engine stack is:

```bash
GOOGLE_API_KEY=… feather-serve persona.feather \
  --embed-provider gemini \
  --dim 768 \
  --port 8001
```

This single command gives you:

- Semantic add (text in → embedded → stored)
- Semantic search (text query → embedded → ANN → ranked results)
- Context chain (semantic search + BFS graph traversal)
- 14 MCP tools consumable by Claude Desktop/Code
- REST API at `/v1/` for programmatic access
- Admin SPA at `/admin/` for manual inspection

Add `db.set_int8_ram("text", max_abs=1.0)` at startup for 1.7× RAM savings on memory-constrained hosts.

**Install:** `pip install feather-db==0.15.1`

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-serve-real-embedders-semantic-recall](https://getfeather.store/theory/feather-serve-real-embedders-semantic-recall). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*