The Memory Architecture Every Performance Marketing AI Needs

The default setup and why it fails

Most performance marketing teams that add AI memory to their stack do something straightforward: they pick a vector database, dump in campaign data, and run similarity searches at brief time. This is better than nothing. It is also consistently insufficient for the actual problem.

The specific failure modes are predictable:

A winning hook from 18 months ago surfaces ahead of one from last quarter because its text is more semantically similar to the query — the system has no concept of recency.
A brand guardrail ingested two years ago competes for retrieval budget with last week's competitor intelligence — the system has no concept of importance weighting.
The performance data that proves a hook's effectiveness is in a separate system — the retrieval returns the hook without the evidence.
A competitor move that shaped the strategic context for a successful campaign is disconnected from the creative it influenced — the causal chain is invisible to the AI.

These are not edge cases. They are the normal operating conditions of a performance marketing context system. The architecture needs to be designed for them.

The four-layer architecture

Layer 1: Namespace separation by signal type

Performance marketing knowledge lives in at least four categories that should not share a retrieval namespace:

brand::hooks — winning copy angles with spend data. Half-life: 270 days.
brand::guardrails — tone, claim, and visual identity rules. Half-life: 3,650 days.
brand::competitors — competitor creative intelligence, offer changes, campaign angles. Half-life: 45 days.
brand::audiences — segment-level behavioral insights, objections, resonance patterns. Half-life: 90 days.

import feather_db as fdb
from feather_db import ScoringConfig

db = fdb.DB.open("brand_acme.feather", dim=768)

SCORING = {
    "brand::hooks":       ScoringConfig(half_life=270.0,  weight=0.3,  min=0.0),
    "brand::guardrails":  ScoringConfig(half_life=3650.0, weight=0.05, min=0.0),
    "brand::competitors": ScoringConfig(half_life=45.0,   weight=0.45, min=0.0),
    "brand::audiences":   ScoringConfig(half_life=90.0,   weight=0.35, min=0.0),
}
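To make the half-lives concrete, here is a minimal sketch of the decay factor each namespace would apply to an item of a given age. It assumes plain exponential decay, recency = 0.5 ** (age_days / half_life), and ignores the `weight` and `min` fields of `ScoringConfig`, whose exact semantics are not spelled out here. `HALF_LIVES` and `recency` are illustrative names, not part of feather_db.

```python
# Half-lives in days, copied from the namespace list above.
HALF_LIVES = {
    "brand::hooks": 270.0,
    "brand::guardrails": 3650.0,
    "brand::competitors": 45.0,
    "brand::audiences": 90.0,
}


def recency(age_days: float, half_life: float) -> float:
    """Assumed exponential decay: the factor halves every `half_life` days."""
    return 0.5 ** (age_days / half_life)


if __name__ == "__main__":
    # Decay factor per namespace for items 45 days, 180 days and two years old
    for ns, hl in HALF_LIVES.items():
        factors = "  ".join(f"{d:>4}d={recency(d, hl):.2f}" for d in (45, 180, 730))
        print(f"{ns:<20} {factors}")
```

Under this model a competitor note is worth half its original weight after 45 days and effectively nothing after two years, while a guardrail still keeps roughly 87% of its weight after two years.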

Layer 2: Spend-weighted importance

Not all campaign history is equally evidenced. The architecture encodes this at ingestion time:

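from feather_db import MetaRecord

def compute_importance(total_spend: float, scale_at: float = 100_000) -> float:
    # Linear in spend, saturating at 1.0 once spend reaches scale_at
    raw = min(1.0, total_spend / scale_at)
    return max(0.2, raw)  # floor at 0.2 so even small tests contribute

meta = MetaRecord()  # metadata record for the campaign being ingested
meta.set_attribute("importance", compute_importance(campaign_spend))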

The importance attribute multiplies into the final retrieval score, so high-spend, high-evidence hooks consistently outrank low-evidence tests when the two are equally relevant semantically.
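A minimal sketch of that effect, assuming the score is a plain product of similarity and importance with recency held equal. `Hook` and `rank` are hypothetical helpers, and `compute_importance` is repeated from above so the block runs on its own.

```python
from dataclasses import dataclass
from typing import List, Tuple


def compute_importance(total_spend: float, scale_at: float = 100_000) -> float:
    raw = min(1.0, total_spend / scale_at)
    return max(0.2, raw)  # floor at 0.2 so even small tests contribute


@dataclass
class Hook:
    text: str
    similarity: float  # cosine similarity to the brief query
    spend: float       # total spend behind the hook


def rank(hooks: List[Hook]) -> List[Tuple[str, float]]:
    """Order hooks by similarity * importance, highest first (assumed scoring)."""
    scored = [(h.text, h.similarity * compute_importance(h.spend)) for h in hooks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Three hooks with identical semantic similarity but different spend
    hooks = [
        Hook("Small $3k test", similarity=0.82, spend=3_000),
        Hook("Scaled $180k winner", similarity=0.82, spend=180_000),
        Hook("Mid $40k hook", similarity=0.82, spend=40_000),
    ]
    for text, score in rank(hooks):
        print(f"{score:.3f}  {text}")
```

With equal similarity, the ordering is decided entirely by spend: the $180k winner ranks first, and the $3k test still contributes at the 0.2 floor instead of vanishing.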

Layer 3: Typed edges for causal chains

The graph layer is what separates a context engine from plain vector search. Performance marketing causality has a consistent structure: competitor moves precede creative pivots, audience insights inform hook selection, and performance data validates creative choices. Each of these becomes a typed edge from the creative:

# Creative proven by its performance record
db.link(from_id=hook_id, to_id=perf_record_id,
        rel_type="proven_by", weight=0.95)

# Creative informed by an audience insight
db.link(from_id=hook_id, to_id=audience_insight_id,
        rel_type="informed_by", weight=0.8)

# Creative that responded to a competitor move
db.link(from_id=hook_id, to_id=competitor_move_id,
        rel_type="responded_to", weight=0.75)
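To show what typed edges add, here is a small in-memory sketch (not feather_db's implementation) that stores outgoing edges as (target, rel_type, weight) per source id. A question like "what evidence proves this hook?" then becomes a filter on the relation type rather than another similarity search. `EdgeStore` and the string ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Edge(NamedTuple):
    to_id: str
    rel_type: str
    weight: float


class EdgeStore:
    """Directed, typed, weighted edges keyed by source id."""

    def __init__(self) -> None:
        self._out: Dict[str, List[Edge]] = defaultdict(list)

    def link(self, from_id: str, to_id: str, rel_type: str, weight: float) -> None:
        self._out[from_id].append(Edge(to_id, rel_type, weight))

    def neighbors(self, node_id: str, rel_type: Optional[str] = None) -> List[Edge]:
        """Outgoing edges of node_id, optionally restricted to one relation type."""
        edges = self._out.get(node_id, [])  # .get avoids creating empty entries
        return [e for e in edges if rel_type is None or e.rel_type == rel_type]


if __name__ == "__main__":
    store = EdgeStore()
    store.link("hook_1", "perf_1", "proven_by", 0.95)
    store.link("hook_1", "aud_1", "informed_by", 0.8)
    store.link("hook_1", "comp_1", "responded_to", 0.75)
    # Only the performance evidence, not every neighbor
    print(store.neighbors("hook_1", "proven_by"))
```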

Layer 4: Context chain retrieval

The retrieval pattern for brief generation is a two-hop context chain, not a flat similarity search:

chain = db.context_chain(
    query_vec=embedder.embed(brief_query),
    k=5,
    hops=2,
    namespace="brand::hooks",
    scoring=SCORING["brand::hooks"]
)

for node in sorted(chain.nodes, key=lambda n: (n.hop, -n.score)):
    print(f"hop={node.hop}  score={node.score:.4f}  {node.text[:100]}")

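The output at hop=0 is the winning hooks themselves. With the edges defined in Layer 3, performance records, audience insights, and competitor moves are each linked directly to a hook, so they all arrive at hop=1. The second hop reaches whatever those nodes link to in turn, for example the wider competitive context around a competitor move when that move is itself linked to other records. The brief generation model receives every hop as structured context.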
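In the setup above, `db.context_chain` performs this traversal itself, and its exact scoring is not described here. As a rough mental model only, the standalone sketch below expands seed hits breadth-first for up to `hops` steps and, as an assumption, scores each newly reached node as its parent's score times the edge weight. The function and the node ids are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def context_chain(
    seeds: Dict[str, float],
    edges: Dict[str, List[Tuple[str, float]]],
    hops: int = 2,
) -> Dict[str, Tuple[int, float]]:
    """Breadth-first expansion from hop-0 seed hits.

    seeds: node_id -> similarity score from the vector search.
    edges: node_id -> list of (neighbor_id, edge_weight).
    Returns node_id -> (hop, score). Each node keeps the hop at which it was
    first reached and the score from the parent that reached it first.
    """
    result = {nid: (0, score) for nid, score in seeds.items()}
    frontier = deque((nid, 0) for nid in seeds)
    while frontier:
        node, hop = frontier.popleft()
        if hop == hops:
            continue  # do not expand beyond the hop limit
        parent_score = result[node][1]
        for neighbor, weight in edges.get(node, []):
            if neighbor in result:
                continue  # BFS already reached it at an equal or shorter hop
            result[neighbor] = (hop + 1, parent_score * weight)
            frontier.append((neighbor, hop + 1))
    return result


if __name__ == "__main__":
    seeds = {"hook_1": 0.90}
    edges = {
        "hook_1": [("perf_1", 0.95), ("aud_1", 0.80), ("comp_1", 0.75)],
        "comp_1": [("comp_report_1", 0.60)],  # hypothetical second-hop edge
    }
    chain = context_chain(seeds, edges, hops=2)
    for node_id, (hop, score) in sorted(chain.items(), key=lambda kv: (kv[1][0], -kv[1][1])):
        print(f"hop={hop}  score={score:.4f}  {node_id}")
```

Multiplying by edge weight means evidence far from the seed, or reached through weak links, naturally sinks in the ordering.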

The LongMemEval case for this architecture

LongMemEval tests AI memory quality across long interaction histories. Feather DB scores 0.693 on this benchmark — above GPT-4o's 0.640 with full context. Decay-weighted retrieval consistently surfaces more relevant context than brute-force context dumping, and the graph layer surfaces causally connected evidence that flat retrieval misses.

The cost case: the full LongMemEval benchmark run costs $2.40 with Gemini Flash. A single 300K-token GPT-4o context call, the naive alternative for one brief, costs around $2.25. In other words, answering every question in the benchmark costs about as much as one naive brief, so the architecture produces better outputs at a fraction of the per-query cost.

The full ingestion pipeline

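import hashlib

from feather_db import MetaRecord


def stable_base_id(hook_text: str) -> int:
    # Python's built-in hash() is salted per process, so it would assign a new ID
    # to the same hook on every run. SHA-256 is deterministic. Keeping 30 bits and
    # shifting left by 2 gives each hook a block of four IDs (base_id..base_id + 3);
    # barring a hash collision, two hooks' blocks never overlap, and IDs fit in 32 bits.
    digest = hashlib.sha256(hook_text.encode("utf-8")).digest()
    return (int.from_bytes(digest[:4], "big") & 0x3FFFFFFF) << 2


def ingest_campaign_result(db, hook_text, perf_data, audience_note,
                           competitor_context, embedder):
    base_id = stable_base_id(hook_text)

    # Hook node: spend-weighted importance plus headline metrics as metadata
    # (compute_importance is defined in Layer 2)
    hook_meta = MetaRecord()
    hook_meta.set_attribute("importance", compute_importance(perf_data["spend"]))
    hook_meta.set_attribute("ctr", perf_data["ctr"])
    hook_meta.set_attribute("cpl", perf_data["cpl"])
    db.add(base_id, embedder.embed(hook_text), hook_text,
           namespace="brand::hooks", meta=hook_meta)

    # Performance record, linked as the evidence behind the hook
    perf_id = base_id + 1
    db.add(perf_id, embedder.embed(str(perf_data)), str(perf_data),
           namespace="brand::hooks")
    db.link(base_id, perf_id, "proven_by", 0.95)

    # Audience insight that informed the hook
    aud_id = base_id + 2
    db.add(aud_id, embedder.embed(audience_note), audience_note,
           namespace="brand::audiences")
    db.link(base_id, aud_id, "informed_by", 0.8)

    # Optional competitor move the hook responded to
    if competitor_context:
        comp_id = base_id + 3
        db.add(comp_id, embedder.embed(competitor_context), competitor_context,
               namespace="brand::competitors")
        db.link(base_id, comp_id, "responded_to", 0.75)

    return base_id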

FAQ

What is the minimum viable context memory setup for a performance marketing team?

One Feather DB file per brand, two namespaces (hooks and guardrails), spend-weighted importance at ingestion, and a flat semantic search at brief time. The graph layer and multi-namespace decay tuning add 40–60% more retrieval relevance but require more ingestion discipline.

How many campaigns do you need before the context engine produces useful results?

Signal is useful from the first ingested campaign. Quality improves nonlinearly: 10–20 campaigns give a useful baseline, 50+ give enough diversity to surface non-obvious patterns, 100+ start revealing the causal structures that make the graph layer valuable.

Can this architecture work for an agency with 20+ brand clients?

Yes. One .feather file per client, opened on demand. Each file is self-contained and portable, so an agency can run 20 separate memory stores without any shared infrastructure, and brand A's creative memory cannot contaminate brand B's retrieval.
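A minimal sketch of the "one file per client, opened on demand" pattern. The opener is injected rather than hard-coded, so nothing here assumes a particular feather_db call; with the setup in Layer 1 it might be something like `lambda p: fdb.DB.open(p, dim=768)`, but treat that as an assumption. `BrandMemoryPool` and the `<client>.feather` naming are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Generic, TypeVar

Store = TypeVar("Store")


class BrandMemoryPool(Generic[Store]):
    """Lazily opens one memory file per client and caches the open handle."""

    def __init__(self, root: Path, opener: Callable[[str], Store]) -> None:
        self.root = root
        self.opener = opener
        self._open: Dict[str, Store] = {}

    def get(self, client: str) -> Store:
        # Reject names like "../other" so one client cannot resolve to another's file
        if not client or not client.replace("_", "").replace("-", "").isalnum():
            raise ValueError(f"invalid client name: {client!r}")
        if client not in self._open:
            path = self.root / f"{client}.feather"
            self._open[client] = self.opener(str(path))
        return self._open[client]
```

Each client maps to exactly one path and one cached handle, which is what keeps retrieval for one brand from ever touching another brand's store.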

How does spend-weighted importance interact with temporal decay?

They multiply together in the final score: final_score = similarity * recency * importance. A high-spend old hook and a low-spend recent test can rank similarly when the older hook's lower recency is offset by its higher importance. The system does not blindly favor old winners or new tests; it balances evidence weight against freshness.
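A worked version of that formula, assuming exponential half-life decay for recency and the `compute_importance` mapping from Layer 2 (redefined here as `importance` so the block runs alone). It compares a high-spend hook that is two half-lives old with a low-spend test from last quarter, both equally similar to the query, under the 270-day hooks half-life.

```python
def recency(age_days: float, half_life: float) -> float:
    # Assumed exponential decay: halves every half_life days
    return 0.5 ** (age_days / half_life)


def importance(spend: float, scale_at: float = 100_000) -> float:
    # Same mapping as compute_importance: linear, capped at 1.0, floored at 0.2
    return max(0.2, min(1.0, spend / scale_at))


def final_score(similarity: float, age_days: float, half_life: float, spend: float) -> float:
    return similarity * recency(age_days, half_life) * importance(spend)


# Old winner: 540 days old (two half-lives), $150k spend
old_winner = final_score(0.80, age_days=540, half_life=270.0, spend=150_000)
# Recent test: 90 days old, $30k spend
recent_test = final_score(0.80, age_days=90, half_life=270.0, spend=30_000)

if __name__ == "__main__":
    print(f"old winner:  {old_winner:.3f}")
    print(f"recent test: {recent_test:.3f}")
```

The old winner scores 0.200 and the recent test about 0.190: the winner has lost three quarters of its recency but keeps full importance, while the test keeps about 79% recency but carries only 0.3 importance.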