# feather-serve + Real Embedders: Semantic Persona Recall Without Writing a Single Embedding Call

> Feather DB v0.15.1 adds --embed-provider to feather-serve. Pass text in, get semantic search out — no embedding pipeline to maintain. Here's what changed and how to wire it up.

- **Category**: Tutorial
- **Read time**: 8 min read
- **Date**: June 16, 2026
- **Author**: Feather DB (Engineering)
- **URL**: https://getfeather.store/theory/feather-serve-real-embedders-semantic-recall

---

## The embedding pipeline problem

Before v0.15.1, using Feather DB required an embedding step before every `add()` and `search()` call. You'd call an embedding API, get a float array, then pass it to Feather. This is fine in Python code — two lines. But it creates friction in two scenarios:

- **MCP clients** (Claude Desktop, Claude Code) can't generate embedding vectors themselves, so every call would need an intermediate embedding step between Claude and Feather.

- **REST API clients** calling `feather-serve` have to maintain their own embedding pipeline, adding latency and operational complexity.

v0.15.1 solves this: `feather-serve --embed-provider` makes Feather itself responsible for embedding. You send text; Feather handles vectors.

## What --embed-provider does

When `feather-serve` starts with `--embed-provider`, it initializes an embedding client for the chosen provider. Every `ingest_text`, `search_text`, and `feather_add` (MCP) call that receives raw text gets embedded server-side before hitting the HNSW index.

```bash
# Gemini — native 768-dim, free tier available
GOOGLE_API_KEY=… feather-serve persona.feather \
  --embed-provider gemini --dim 768 --port 8001

# OpenAI
OPENAI_API_KEY=… feather-serve persona.feather \
  --embed-provider openai --dim 1536 --port 8001

# Voyage AI
VOYAGE_API_KEY=… feather-serve persona.feather \
  --embed-provider voyage --dim 1024 --port 8001

# Cohere
COHERE_API_KEY=… feather-serve persona.feather \
  --embed-provider cohere --dim 1024 --port 8001

# Ollama: fully offline, no API key (nomic-embed-text produces 768-dim vectors)
feather-serve persona.feather \
  --embed-provider ollama --ollama-model nomic-embed-text --dim 768 --port 8001
```
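The commands above all share one shape: an optional API key in the environment, then `--embed-provider`, `--dim`, and `--port`. If you launch `feather-serve` from a script, a small helper can keep provider and dimension in sync. This is a minimal sketch that uses only the flags and environment variables shown above; `PROVIDERS` and `build_serve_command` are hypothetical names, not part of Feather DB.

```python
import shlex

# Env var and dimension per provider, copied from the commands above.
# Ollama needs no key, and its dimension depends on the model you pull,
# so both are left unset here.
PROVIDERS = {
    "gemini": ("GOOGLE_API_KEY", 768),
    "openai": ("OPENAI_API_KEY", 1536),
    "voyage": ("VOYAGE_API_KEY", 1024),
    "cohere": ("COHERE_API_KEY", 1024),
    "ollama": (None, None),
}


def build_serve_command(db_path, provider, *, dim=None, port=8001, ollama_model=None):
    """Return (required_env_var, argv) for a feather-serve launch (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    env_var, default_dim = PROVIDERS[provider]
    dim = default_dim if dim is None else dim
    if dim is None:
        raise ValueError("pass dim explicitly for ollama; it depends on the model")

    argv = ["feather-serve", db_path, "--embed-provider", provider]
    if provider == "ollama":
        if not ollama_model:
            raise ValueError("ollama needs ollama_model")
        argv += ["--ollama-model", ollama_model]
    argv += ["--dim", str(dim), "--port", str(port)]
    return env_var, argv


env_var, argv = build_serve_command("persona.feather", "gemini")
print(f"needs {env_var}:", shlex.join(argv))
```

The returned `argv` list can be handed to `subprocess.run` once the named environment variable is set.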

## Before vs after

**Before v0.15.1 — REST add:**

```python
import openai
import requests

# Step 1: embed
resp = openai.embeddings.create(model="text-embedding-3-small", input="User prefers Python")
vec = resp.data[0].embedding

# Step 2: store
requests.post("http://localhost:8001/v1/default/add", json={
    "id": 1,
    "vector": vec,
    "metadata": {"text": "User prefers Python"}
})

```

**After v0.15.1 — REST add with real embedder:**

```python
import requests

# One call — feather-serve embeds internally
requests.post("http://localhost:8001/v1/default/ingest_text", json={
    "id": 1,
    "text": "User prefers Python"
})

```

Same pattern for search:

```python
# Before: embed the query yourself, then search by vector
# After: search by text — feather-serve embeds the query
results = requests.post("http://localhost:8001/v1/default/search_text", json={
    "text": "programming language preference",
    "k": 5
}).json()

```
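Since both text endpoints take small JSON bodies, they wrap naturally into a thin client. The sketch below uses only the two URLs and payload fields shown above. The class name `FeatherTextClient`, the helper `http_post_json`, and the idea that the `default` path segment is a selectable namespace are assumptions for illustration. The response format is not documented here, so the decoded JSON is returned untouched.

```python
import json
import urllib.request


def http_post_json(url, payload, timeout=10.0):
    """POST a JSON body and decode the JSON response (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None


class FeatherTextClient:
    """Hypothetical wrapper around the ingest_text / search_text endpoints."""

    def __init__(self, base_url="http://localhost:8001", namespace="default",
                 post=http_post_json):
        # Assumption: the "default" segment in the URLs above is a namespace.
        self.base = f"{base_url.rstrip('/')}/v1/{namespace}"
        self._post = post  # injectable, so the client can be tested offline

    def ingest(self, record_id, text):
        return self._post(f"{self.base}/ingest_text", {"id": record_id, "text": text})

    def search(self, text, k=5):
        return self._post(f"{self.base}/search_text", {"text": text, "k": k})
```

With a server running as shown earlier, `FeatherTextClient().ingest(1, "User prefers Python")` followed by `.search("programming language preference")` mirrors the two calls above.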

## Semantic persona recall via MCP

Combined with the MCP backend (v0.14.0), real embedders make the Claude Desktop persona experience seamless. Claude calls `feather_search` with a natural language query string — feather-serve embeds it, searches HNSW, returns relevant memories. No embedding step visible anywhere in the MCP tool schema.

```jsonc
// Claude's tool call (MCP)
{
  "tool": "feather_search",
  "arguments": {
    "query": "what programming language does the user prefer?",
    "k": 5
  }
}

// feather-serve internally:
// 1. embed("what programming language does the user prefer?")
// 2. db.search(vec, k=5)
// 3. return results

```
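The three internal steps can be sketched in a few lines of Python. This is not Feather's implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real provider call, `BruteForceIndex` does exact dot-product search in place of HNSW, and `handle_feather_search` is an invented handler name. The toy embedder only matches on shared words; a real provider is what makes the recall semantic.

```python
import math
import zlib

DIM = 64  # toy dimension; real providers return 768, 1024, 1536, ...


def toy_embed(text):
    """Stand-in embedder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class BruteForceIndex:
    """Exact dot-product search; a stand-in for the HNSW index."""

    def __init__(self):
        self._rows = []  # (id, vector, metadata)

    def add(self, record_id, vec, metadata):
        self._rows.append((record_id, vec, metadata))

    def search(self, vec, k=5):
        scored = [
            (sum(a * b for a, b in zip(vec, row_vec)), rid, meta)
            for rid, row_vec, meta in self._rows
        ]
        scored.sort(key=lambda s: s[0], reverse=True)  # highest similarity first
        return [{"id": rid, "score": score, "metadata": meta}
                for score, rid, meta in scored[:k]]


def handle_feather_search(arguments, embed, index):
    vec = embed(arguments["query"])                    # 1. embed the query text
    hits = index.search(vec, k=arguments.get("k", 5))  # 2. nearest-neighbour search
    return {"results": hits}                           # 3. return results


index = BruteForceIndex()
for rid, text in [(1, "user prefers python"), (2, "user lives in berlin")]:
    index.add(rid, toy_embed(text), {"text": text})

response = handle_feather_search({"query": "python", "k": 5}, toy_embed, index)
print(response["results"])
```

Passing the embedder in as an argument is what lets one handler serve every provider: only the `embed` callable changes.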

## Choosing a provider

| Provider | Dim | Cost | Best for |
|---|---|---|---|
| gemini | 768 | Free tier / $0.00002/1K chars | Native Feather format, low cost, multimodal |
| openai | 1536 | $0.02/1M tokens (small) | High quality, widely supported |
| voyage | 1024 | $0.06/1M tokens | Code + technical content |
| cohere | 1024 | $0.10/1M tokens | Multilingual |
| ollama | varies | Free (local compute) | Privacy, air-gap, offline |

For the MCP + Claude Desktop use case, **Gemini** is the recommended starting point: 768-dim is the native Feather format (matches on-disk int8 quantization), the free tier is generous, and the text-embedding-004 model is competitive in quality benchmarks.
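To put the listed prices in perspective, a quick back-of-the-envelope estimate helps. The sketch below uses only the paid-tier prices quoted above; the workload (100,000 memories averaging 50 tokens or 200 characters each) and the helper names are assumptions for illustration, and provider pricing may have changed since this was written.

```python
# Prices copied from the provider comparison above (USD).
PRICE_PER_MILLION_TOKENS = {"openai": 0.02, "voyage": 0.06, "cohere": 0.10}
GEMINI_PRICE_PER_1K_CHARS = 0.00002


def token_priced_cost(provider, n_texts, avg_tokens):
    """Cost of embedding n_texts once, for providers billed per token."""
    total_tokens = n_texts * avg_tokens
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[provider]


def gemini_cost(n_texts, avg_chars):
    """Cost of embedding n_texts once on Gemini's per-character pricing."""
    return n_texts * avg_chars / 1_000 * GEMINI_PRICE_PER_1K_CHARS


# Assumed workload: 100,000 short memories.
n_texts, avg_tokens, avg_chars = 100_000, 50, 200
for provider in PRICE_PER_MILLION_TOKENS:
    print(f"{provider}: ${token_priced_cost(provider, n_texts, avg_tokens):.2f}")
print(f"gemini: ${gemini_cost(n_texts, avg_chars):.2f}")
```

At this scale every provider comes in under a dollar for a one-time ingest, so dimension, latency, and privacy requirements are likely to matter more than price.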

## The complete persona stack

With v0.15.1, the full persona context engine stack is:

```bash
GOOGLE_API_KEY=… feather-serve persona.feather \
  --embed-provider gemini \
  --dim 768 \
  --port 8001

```

This single command gives you:

- Semantic add (text in → embedded → stored)

- Semantic search (text query → embedded → ANN → ranked results)

- Context chain (semantic search + BFS graph traversal)

- 14 MCP tools consumable by Claude Desktop/Code

- REST API at `/v1/` for programmatic access

- Admin SPA at `/admin/` for manual inspection

Add `db.set_int8_ram("text", max_abs=1.0)` at startup for 1.7× RAM savings on memory-constrained hosts.
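For intuition about what an int8 representation with a `max_abs` bound involves, here is a minimal sketch of symmetric int8 quantization. It is an assumption about the general technique, not a description of what `set_int8_ram` does internally; `quantize_int8` and `dequantize_int8` are hypothetical helpers.

```python
from array import array


def quantize_int8(vec, max_abs=1.0):
    """Map floats in [-max_abs, max_abs] to int8 values in [-127, 127], clipping outliers."""
    scale = 127.0 / max_abs
    out = array("b")  # signed 8-bit integers, one byte per component
    for x in vec:
        clipped = max(-max_abs, min(max_abs, x))
        out.append(int(round(clipped * scale)))
    return out


def dequantize_int8(q, max_abs=1.0):
    """Recover approximate floats from the int8 codes."""
    scale = max_abs / 127.0
    return [v * scale for v in q]


vec = [0.3, -0.72, 0.999, 0.0]
q = quantize_int8(vec)
approx = dequantize_int8(q)

# Worst-case rounding error is half a quantization step: max_abs / 254.
print(max(abs(a - b) for a, b in zip(vec, approx)))
print(array("f", vec).itemsize, "bytes per float32 component vs", q.itemsize, "per int8")
```

Raw vector storage shrinks 4× under this scheme. The 1.7× figure quoted above is the article's number for overall savings, and this sketch does not attempt to reproduce it.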

**Install:** `pip install feather-db==0.15.1`

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/feather-serve-real-embedders-semantic-recall](https://getfeather.store/theory/feather-serve-real-embedders-semantic-recall). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*