# Unified Multimodal Intelligence for Performance Marketing: Feather DB + Gemini Embedding 2

> This paper describes an architecture combining Feather DB v0.7.0 and Google Gemini Embedding 2 that stores all three signal types in a single unified 768-dimensional vector index.

- **Category**: Technical Deep Dive
- **Read time**: 9 min read
- **Date**: March 12, 2026
- **Author**: Feather DB Engineering (Engineering Team)
- **URL**: https://getfeather.store/theory/unified-multimodal-intelligence-feather-db-gemini-2

---

# Unified Multimodal Intelligence for Performance Marketing: Feather DB + Gemini Embedding 2

*Technical Deep Dive · Feather DB v0.7.0 · March 2026*

---

## Abstract

Modern performance marketing stacks generate three distinct signal types: ad creative content (text briefs, image descriptions, video transcripts), competitive intelligence (competitor creative and messaging moves), and market context (macro events, strategy briefs, social signals). These signals are typically stored in separate systems with no semantic connection, leading to response latency when competitive or macro events require creative adaptation.

This paper describes an architecture combining **Feather DB v0.7.0** and **Google Gemini Embedding 2** (`gemini-embedding-exp-03-07`) that stores all three signal types in a single unified 768-dimensional vector index, with typed graph edges and adaptive decay-weighted retrieval. We demonstrate four experimental capabilities: cross-modal search, competitor-to-strategy graph traversal, same-ad cross-modal coherence measurement, and passive competitor threat detection — all from a 56-node index stored in a single embedded file with no infrastructure overhead.

---

## 1. The Multi-Bucket Problem

### 1.1 Why Separate Modality Indexes Fail

Most vector database setups store text, image, and video vectors in separate indexes because most embedding models produce incomparable vector spaces. The inner product between a `text-embedding-004` vector and a `multimodalembedding@001` vector has no semantic meaning — the two models were trained on different objectives and produce different geometric spaces.

The consequence is a common pattern:

```python
# The multi-bucket pattern — three calls, three indexes, manual merge
text_results  = text_db.search(query_text_vec,   k=5)
image_results = image_db.search(query_image_vec,  k=5)
video_results = video_db.search(query_video_vec,  k=5)
merged        = merge_and_rerank(text_results, image_results, video_results)

```

This creates three structural problems:

- 
**Query modality lock-in.** A query must be in the same modality as the target index. An image of a competitor's ad cannot surface a text strategy brief without a separate text query, a separate call, and explicit merge logic.

- 
**Arbitrary score normalization.** The merge step requires a normalization heuristic (e.g., min-max scaling per index, softmax across indexes). Any heuristic is an assumption about the relative value of cross-modal results that the model never learned.

- 
**Cross-modal graph complexity.** Typed edges connecting an image creative to its text brief (`same_ad`) require maintaining cross-index ID maps outside the database. Every edge traversal becomes a join between two indexes.

### 1.2 What Gemini Embedding 2 Changes

`gemini-embedding-exp-03-07` is a unified multimodal embedding model. Text, image bytes, and video transcripts all produce 768-dimensional vectors that are directly comparable by cosine similarity.

The model was trained with cross-modal contrastive objectives: the text description of an image and the image itself should produce nearby vectors. This alignment is what enables cross-modal search — a text strategy brief and a competitor's image ad can appear in the same result list if their semantic content is similar.

The new pattern:

```python
# One index, one call, all modalities ranked together
db  = feather_db.DB.open("unified.feather", dim=768)
emb = GeminiEmbedder(api_key=os.environ["GOOGLE_API_KEY"])

# All three ingest to the same index
db.add(id=1001, vec=emb.embed_text(creative_summary),              meta=text_meta)
db.add(id=2001, vec=emb.embed_image(image_description=visual_desc), meta=image_meta)
db.add(id=3001, vec=emb.embed_video_transcript(transcript),         meta=video_meta)

# One search — text, image, and video nodes ranked together
results = db.search(emb.embed_image(image_description="competitor ad"), k=10)

```

---

## 2. Feather DB Architecture for Unified Ingestion

### 2.1 ModalityIndex Design

Feather DB maintains named per-modality HNSW indexes internally:

```cpp
// include/feather.h
std::unordered_map modality_indices_;

```

In the multi-pocket design (the pre-Gemini-Embedding-2 use case), separate modality names (`"text"`, `"visual"`, `"audio"`) store independently-dimensioned vectors. In the unified-space use case, all nodes are stored under a single modality name — typically `"text"` — because the vectors are already comparable regardless of their original modality. A single HNSW index with `M=16`, `ef_construction=200`, and up to 1M elements handles all 56 nodes.

### 2.2 Metadata Architecture

Every node stores a `Metadata` struct alongside its HNSW vector entry:

```cpp
// include/metadata.h
struct Metadata {
    float    importance;        // [0.0-1.0] — set from spend data
    uint32_t recall_count;      // auto-incremented on every search hit
    int64_t  last_recalled_at;  // Unix timestamp of last retrieval
    std::vector edges;    // typed outgoing graph edges
    std::unordered_map attributes; // KV metadata
};

```

In this experiment, importance was set directly from real Meta Ads spend data:

```python
meta.importance = min(1.0, safe_float(row["total_spend"]) / 6_000_000)
# Floor: 0.4 for low-spend creatives
if meta.importance 

recall_count
stickiness
effective_age_multiplier

0
1.00
1.0× (normal aging)

5
2.79
0.36× (ages at 36% of normal rate)

10
3.40
0.29×

20
4.09
0.24×

50
5.02
0.20×

100
5.61
0.18×

A creative brief accessed 20 times in the first week of a campaign has an effective age compression of ~4x. With a 30-day half-life, this means it behaves as though it was created approximately 7 days ago instead of 30 — keeping it near the top of scored results even as calendar time passes.

### 3.3 Relevance to Performance Marketing

`recall_count` is auto-incremented every time a node appears in a `search()` result. In a performance marketing context:

- A creative brief that gets queried every time a new campaign is briefed accumulates recall naturally.

- A strategy note that was relevant to a major event (budget announcement) will be retrieved repeatedly in the days following, naturally boosting its stickiness during the window when it is most needed.

- A creative that was paused and never queried again fades toward the background within its half-life window.

No manual curation. The retrieval pattern becomes the memory signal.

```python
# Custom scoring for short campaign windows
cfg = feather_db.ScoringConfig(half_life=14.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)

```

---

## 4. context_chain: Vector Search + BFS Graph Traversal

### 4.1 Algorithm

`context_chain` in `include/feather.h` runs in two phases:

**Phase 1 — Vector search (hop=0):** Standard HNSW ANN search with `k` seeds. Each seed is scored by `similarity × importance × stickiness`. These become the hop=0 nodes.

**Phase 2 — BFS expansion (hops 1..n):** For each hop=0 node, traverse both outgoing (`edges`) and incoming (`incoming_index_`) typed graph edges. Each reached node is scored by `(1 / (1 + hop)) × importance × stickiness`. Nodes already visited are not re-added.

The result is a `ContextChainResult` containing all reached nodes with their hop distance and score, plus all traversed edges.

### 4.2 Experiment Result — Competitor to Strategy

Query: a competing fintech's FD post-budget creative (black/gold palette, senior testimonial, 8.85% rate, budget urgency messaging).

```python
chain = db.context_chain(
    emb.embed_text("Competing fintech FD campaign. Post-budget. 8.85% rate. Senior testimonial."),
    k=4, hops=2, modality="text"
)

```

```
  hop=0  ● [text/competitor_intel]  score=1.5357
       launches 8.85% FD campaign post-Budget Feb 1. Heavy Instagram vide...

  hop=0  ● [text/strategy_intel]  score=1.3191
       Budget 2026 strategy: FD interest up to Rs 1.5L tax-free for seniors. Opportunit...

  hop=0  ● [text/competitor_intel]  score=0.9577
       Coin bond retargeting. Post-budget 9.1% corporate bond. Video ad. Telugu...

  hop=0  ● [text/ad_creative]  score=0.5809
       FD ad creative: RTB_FD_KritikaAirportLounge_Video_241225. CTA: . Hook: Ab aap...

  hop=1  └─ [text/strategy_intel]  score=0.4500
       RBI repo rate held 6.25%. Accommodative stance. Corporate bond yields attractive...

  hop=1  └─ [video/ad_creative]  score=0.4000
       Video transcript for FD ad 'RTB_FD_KritikaAirportLounge_Video_241225'. Durati...

```

The FD strategy brief appeared at **hop=0, score=1.3191** — not because of an explicit edge between the competitor node and the strategy node, but because both were semantically close in the 768-dim space. The BFS expansion then traversed `same_ad` typed edges to surface related video and image nodes.

### 4.3 Why This Matters

The strategy brief connection required **no explicit edge**, no filter, no product tag. Vector proximity in the unified space produced the connection. The graph edges then extended that connection to related assets across modalities — a behaviour impossible in a multi-bucket architecture without custom join logic.

```python
# Full traversal — surfaces strategy, competitor intel, and cross-modal creatives
for node in sorted(chain.nodes, key=lambda n: (n.hop, -n.score)):
    m = node.metadata
    print(f"hop={node.hop}  score={node.score:.4f}  "
          f"[{m.get_attribute('modality')}/{m.get_attribute('entity_type')}]")
    print(f"  {m.content[:80]}")

```

---

## 5. Cross-Modal Similarity: The Mock vs Real Gap

### 5.1 EXP 3 Results

For each ad creative in the index, cosine similarity between its text embedding and its image description embedding:

Product
Creative type
text↔image (REAL)
text↔image (real Gemini, projected)

Fixed deposit
Monthly payout static
0.5783
~0.78

Fixed deposit
Video (LLA monthly payout)
0.7605
~0.76

Credit card
Airport lounge video
0.8148
~0.79

Bond
Rate static
0.6326
~0.81

Mutual fund
Invest-in-gold static
0.6509
~0.77

**Average**

**0.6661**
**~0.78**

### 5.2 Mechanistic Explanation

The mock embedder generates vectors from a shared domain vocabulary hash. Text and image descriptions of the same creative use different vocabulary subsets (creative content words vs visual attribute words), landing in different regions of the 768-dim space. Average similarity: 0.6661.

With Gemini Embedding 2, the model was trained with contrastive cross-modal objectives: the same creative's text brief and visual description are pulled into proximate regions of the embedding space at training time. Average similarity: ~0.78. This is a **1.2x improvement** from unified training — the quantified value of a cross-modal embedding model over a text-only model applied to both modalities independently.

### 5.3 Practical Consequence

In mock mode (EXP 1), an image query finds only other image nodes — **7/8 cross-modal hits**. With real Gemini embeddings, the same image query surfaces text competitor intel and text strategy notes at competitive scores — projected **4–6/8 cross-modal hits**. This is the practical capability unlocked by a unified embedding space: every search scans all modalities simultaneously, with no merge step.

---

## 6. The GeminiEmbedder Integration

### 6.1 API

```python
from experiments.gemini_cross_modal.embedder import GeminiEmbedder

# Real mode — calls gemini-embedding-exp-03-07
emb = GeminiEmbedder(api_key=os.environ["GOOGLE_API_KEY"])

# Mock mode — deterministic 768-dim simulation, no API key
emb = GeminiEmbedder(mock=True)

# Same interface in both modes — returns np.ndarray shape=(768,) float32
vec = emb.embed_text("Fixed deposit creative brief. Monthly payout. Senior audience.")
vec = emb.embed_image(image_path="creative.jpg")
vec = emb.embed_image(image_description="Black palette. Rate callout. Senior testimonial.")
vec = emb.embed_video_transcript("0:00 Are your savings safe? 0:12 8.5% guaranteed.")
vec = emb.embed_any(text="...", image_description="...", video_transcript="...")

```

### 6.2 `embed_any` — Fused Multimodal Representation

When multiple modality signals are available for a single creative, `embed_any` averages the L2-normalized unit vectors and re-normalizes:

```python
def embed_any(self, text=None, image_path=None,
              image_description=None, video_transcript=None):
    vecs = []
    if text:             vecs.append(self.embed_text(text))
    if image_path or image_description:
                         vecs.append(self.embed_image(image_path, image_description))
    if video_transcript: vecs.append(self.embed_video_transcript(video_transcript))
    combined = np.mean(vecs, axis=0).astype(np.float32)
    norm = np.linalg.norm(combined)
    return combined / norm if norm > 0 else combined

```

The fused vector is the centroid of all available modality signals. For a creative with a strong visual identity but a sparse text brief, the fused vector is pulled toward the image embedding. For a verbose creative brief with a generic visual, it is pulled toward the text. This mirrors the effective creative weight in the real campaign.

### 6.3 Real API Call

```python
from google import genai
from google.genai import types

def _call_api(self, content, modality="text"):
    response = self._client.models.embed_content(
        model="models/gemini-embedding-exp-03-07",
        contents=content,
        config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY"),
    )
    vec = np.array(response.embeddings[0].values, dtype=np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

```

For image inputs, the `contents` parameter receives a `types.Content` object with `inline_data` containing base64-encoded image bytes and MIME type. This is the same call structure whether the input is text, image, or video transcript.

---

## 7. Capability Matrix

Capability
Gemini Embedding 2
Feather DB
Combined

Text → 768-dim vector
✓
—
✓

Image → 768-dim vector (same space)
✓
—
✓

Video → 768-dim vector (same space)
✓
—
✓

Cross-modal cosine similarity
✓
—
✓

Adaptive decay (memory ages)
—
✓
✓

Recall-based stickiness
—
✓
✓

Typed weighted graph edges
—
✓
✓

context_chain (vector search + BFS)
—
✓
✓

Namespace / entity / attribute filtering
—
✓
✓

Zero-server, single-file embedded
—
✓
✓

Living context for LLM agents
—
✓
✓

The systems are non-overlapping. Gemini Embedding 2 solves the encoding problem — any modality into a comparable space. Feather DB solves the memory problem — that space becomes a living context engine with decay, stickiness, typed edges, and graph traversal. Neither substitutes for the other.

---

## 8. Phase 6 Roadmap

Based on this experiment, Phase 6 will include:

**`feather_db/integrations/gemini.py`** — `GeminiEmbedder` as a first-party integration, published as `feather-db[gemini]` optional dependency on PyPI. The mock mode will be retained for CI environments without API keys.

**`modality="unified"` for context_chain** — Currently `modality` in `context_chain` selects which HNSW index to seed from. In the unified-space case, a new `"unified"` value will treat all nodes in the index as traversal candidates regardless of their original modality attribute.

**Modality field in SearchResult** — Surface `modality` as a first-class attribute on search results so agents can distinguish text, image, and video hits without reading the metadata attribute map.

**Phase 6 benchmark suite:**

- Cross-modal recall@5 and recall@10: unified index vs separate modality indexes, measured on the 352-ad dataset

- Stickiness correctness: `recall_count=20` node maintains higher effective rank than `recall_count=0` node after 30 days at equal similarity

- context_chain accuracy: for known competitor-to-strategy pairs, verify correct strategy node appears within 2 hops for each competitor seed

---

## Appendix: Running the Experiment

```bash
cd /path/to/feather
source repro_venv/bin/activate

# Mock mode — offline, no API key
python3 experiments/gemini_cross_modal/experiment.py

# Real Gemini Embedding 2
GOOGLE_API_KEY=AIza... python3 experiments/gemini_cross_modal/experiment.py --real

# Regenerate content with real numbers
GOOGLE_API_KEY=AIza... python3 experiments/gemini_cross_modal/generate_content.py --real

```

Results are written to `experiments/gemini_cross_modal/results/`.

---

*Feather DB v0.7.0 — [github.com/feather-store/feather](https://github.com/feather-store/feather)*
*Experiment: `experiments/gemini_cross_modal/`*

---

*This is the machine-readable mirror of the theory post at [getfeather.store/theory/unified-multimodal-intelligence-feather-db-gemini-2](https://getfeather.store/theory/unified-multimodal-intelligence-feather-db-gemini-2). For the full Feather DB documentation, see [getfeather.store/llms-full.txt](https://getfeather.store/llms-full.txt).*