Context Memory for Black-Box Creative Testing: Advantage+ and Performance Max

The black-box problem in modern creative testing

Meta's Advantage+ and Google's Performance Max represent the same structural shift in paid media: the platform takes control of audience selection, placement, bidding, and creative sequencing, and optimizes toward your conversion goal. The upside is real — automation reduces manual optimization work and the platform's audience signals are richer than most advertisers can build independently. The downside is equally real — the creative signal that comes back is minimal.

With manual campaigns, you could split audiences, isolate creative variables, and attribute performance differences to specific creative choices with reasonable confidence. With Advantage+ and PMax, the platform blends audiences and placements dynamically. The signal you get back is holistic campaign performance — ROAS or CPA or conversion volume — not creative-level attribution. You know what the campaign did. You do not know which creatives drove it, or why.

The result is that creative testing in black-box environments requires a different approach to learning. You are not running controlled experiments. You are observing outcomes across creative sets and trying to extract signal from a noisy, uncontrolled environment. That requires memory: a system that stores what you put in and what came out, and that can connect the two semantically over time. That is a context engine.

What you can observe in black-box campaigns

Even in Advantage+ and PMax, some creative-level signal is available. Meta Advantage+ exposes creative-level impressions, reach, and spend allocation within the campaign, even if it does not give you clean audience breakdowns. Google PMax provides asset-level performance ratings (Low / Good / Best) and some conversion attribution by asset group. Neither is the clean experiment result you would get from a manual campaign, but both are signal.

The challenge is that this signal is only useful if you can connect it to what you have learned from previous campaigns. A "Best" rating on an emotional-narrative video hook is not actionable in isolation. It becomes actionable when you can retrieve the last five emotional-narrative hooks this brand ran, their performance signals in previous campaigns, and the audience signals that correlated with their best periods. That connection requires memory.

How a context engine extracts signal from black-box environments

The context engine approach to black-box creative testing has two phases: ingestion and retrieval. At ingestion, you store the creative, its campaign context, and whatever signal was available — asset ratings, spend allocation percentages, campaign-level ROAS, the competitive environment at run time. At retrieval, you query the index at brief time to surface what similar creative inputs have historically produced in similar black-box environments.

import feather_db as fdb
from feather_db import ScoringConfig, MetaRecord

db = fdb.DB.open("brand_blackbox_memory.feather", dim=768)

def ingest_pmax_result(asset_group: dict, campaign: dict, embedder):
    """Store each text asset in a PMax asset group with its campaign context."""
    for asset in asset_group["headlines"] + asset_group["descriptions"]:
        vec = embedder.embed(asset["text"])
        meta = MetaRecord()
        meta.set_attribute("asset_rating",   asset.get("rating", "unknown"))  # Low/Good/Best
        meta.set_attribute("campaign_roas",  campaign["roas"])
        meta.set_attribute("campaign_cpa",   campaign["cpa"])
        meta.set_attribute("platform",       "pmax")
        meta.set_attribute("asset_type",     asset["type"])  # headline / description (only text assets are ingested here)
        # Spend relative to a $50K reference; not capped, so larger campaigns exceed 1.0
        meta.set_attribute("importance",     campaign["spend"] / 50_000)

        db.add(
            id=f"{campaign['id']}::{asset['id']}",
            vec=vec,
            text=asset["text"],
            namespace="brand::pmax_assets",
            meta=meta
        )

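def query_pmax_context(new_brief: str, embedder) -> dict:
    """Retrieve precedents for a new brief: top-rated assets and high-ROAS context."""
    q = embedder.embed(new_brief)
    return {
        # Assets the platform rated "Best", scored with the shorter half-life
        "best_rated_assets": db.search(
            q, k=6, namespace="brand::pmax_assets",
            scoring=ScoringConfig(half_life=60.0, weight=0.5, min=0.0),
            filters={"asset_rating": "Best"}
        ),
        # Assets from higher-ROAS campaigns, scored with a longer half-life.
        # Assumes the filter layer reads min_campaign_roas as a lower bound
        # on the stored campaign_roas attribute.
        "high_roas_context": db.search(
            q, k=4, namespace="brand::pmax_assets",
            scoring=ScoringConfig(half_life=90.0, weight=0.4, min=0.0),
            filters={"min_campaign_roas": 3.5}
        ),
    }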

With the HNSW index at 97.2% recall@10, retrieval surfaces semantically relevant precedents from the full history of black-box campaigns: not just the most recent assets, but those most relevant to the current brief, ranked with their time-decayed importance.

Building the creative brief from black-box memory

The practical output of this retrieval is a structured brief that reflects what has historically worked in black-box environments for this brand. Not what worked in a controlled experiment — what worked, as best as can be inferred, in the messy, mixed-signal environment that Advantage+ and PMax actually produce.
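As a rough illustration of what a structured brief built from retrieval could look like, the sketch below renders retrieved precedents as plain text. The function name build_brief and the shape of each hit (a dict with "text", "asset_rating", and "campaign_roas" keys) are assumptions made for this example, not the shape that a Feather DB search is known to return.

```python
from typing import Dict, List


def build_brief(new_brief: str, context: Dict[str, List[dict]], max_items: int = 3) -> str:
    """Render retrieved precedents as a plain-text brief.

    `context` maps a section name to a list of hits. Each hit is assumed to
    be a dict with "text", "asset_rating", and "campaign_roas" keys; that
    shape is hypothetical and should be adapted to the real search results.
    """
    lines = [f"Brief: {new_brief}", ""]
    for section, hits in context.items():
        # Turn a key such as "best_rated_assets" into a readable heading
        lines.append(f"{section.replace('_', ' ').title()}:")
        if not hits:
            lines.append("  (no precedents found)")
        for hit in hits[:max_items]:
            text = hit["text"]
            rating = hit.get("asset_rating", "unknown")
            roas = hit.get("campaign_roas", "n/a")
            lines.append(f'  - "{text}" (rating: {rating}, campaign ROAS: {roas})')
        lines.append("")
    return "\n".join(lines).rstrip()
```

Keeping the campaign-level ROAS next to each precedent makes it visible in the brief that the evidence is inferred from blended campaign results rather than from a controlled test.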

Hawky.ai runs this pattern for brands including Puma and Amazon, where black-box campaign environments are the norm. The 20% CTR uplift in 7 days reported by Univest, a fintech running performance campaigns, reflects the immediate impact of starting creative rounds from context-engine-informed briefs rather than cold starts. The signal that PMax and Advantage+ surface — incomplete as it is — accumulates in the context engine and improves each successive brief cycle.

Managing the noise-to-signal problem

Black-box campaign data is noisier than controlled experiment data. The context engine handles this through importance weighting. A campaign that ran for two days before being paused provides weak signal; a campaign that ran for six weeks with $150K spend provides strong signal. The importance weight assigned at ingestion time ensures that high-confidence results dominate retrievals over low-confidence ones.
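One way to turn run length and spend into a single importance weight is sketched below. The function importance_weight, the $50K reference spend, and the 14-day reference duration are illustrative assumptions; the article itself only states that short, low-spend campaigns should count for less than long, high-spend ones.

```python
def importance_weight(spend: float, days_run: int,
                      ref_spend: float = 50_000.0, ref_days: int = 14) -> float:
    """Confidence proxy in [0, 1] from campaign spend and run length.

    Both reference values are assumptions for illustration. A campaign gets
    full weight only if it spent at least `ref_spend` AND ran at least
    `ref_days` days; falling short on either dimension scales it down.
    """
    if spend < 0 or days_run < 0:
        raise ValueError("spend and days_run must be non-negative")
    spend_factor = min(1.0, spend / ref_spend)
    duration_factor = min(1.0, days_run / ref_days)
    return spend_factor * duration_factor
```

Under these assumed references, a six-week campaign with $150K spend gets the full weight of 1.0, while a two-day campaign with $2K spend gets a weight below 0.01. Multiplying the two factors, rather than averaging them, means a large budget cannot compensate for a run that was paused almost immediately.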

The half-life decay layer adds a second noise-management dimension: recent results, which reflect the current competitive environment and audience state, carry more weight than older results, which may reflect conditions that no longer apply. The combination of importance weighting and temporal decay produces a retrieval layer that surfaces high-confidence, recent evidence over low-confidence, stale evidence — without requiring manual curation.
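The interaction between importance and temporal decay can be sketched with a standard exponential half-life. This is a generic formulation, not necessarily the formula Feather DB applies internally; the functions decayed_weight and blended_score and the linear blend are assumptions, with half_life_days and weight named to echo the ScoringConfig parameters used earlier.

```python
def decayed_weight(importance: float, age_days: float, half_life_days: float) -> float:
    """Importance discounted by exponential decay: it halves every half-life."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return importance * 0.5 ** (age_days / half_life_days)


def blended_score(similarity: float, importance: float, age_days: float,
                  half_life_days: float = 60.0, weight: float = 0.5) -> float:
    """Blend semantic similarity with decayed importance.

    `weight` is the share given to the decayed importance. The linear blend
    is an assumption for illustration only.
    """
    decayed = decayed_weight(importance, age_days, half_life_days)
    return (1.0 - weight) * similarity + weight * decayed
```

With a 60-day half-life, a result with importance 1.0 contributes 0.5 after 60 days and 0.25 after 120 days, so two equally similar assets are ranked by how recent and how well-funded their campaigns were.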

Advantage+ versus PMax: different signals, same memory approach

Advantage+ (Meta) and PMax (Google) surface different types of signal, but the memory approach is the same for both. Meta Advantage+ tends to give more creative-level spend distribution data; PMax gives asset performance ratings but less spend granularity. Both can be ingested into the same context engine, tagged by platform, so that platform-specific retrieval is available when needed — but cross-platform pattern retrieval is also possible for hooks or messaging angles that appear in both environments.

Feather DB's namespace and metadata filtering system handles this cross-platform ingestion cleanly. One .feather file holds both Advantage+ and PMax history; platform is a metadata attribute used to filter or weight retrievals depending on the brief context.
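To make the shared-schema idea concrete, the sketch below maps platform-specific signals onto one metadata shape before ingestion and builds the filter used at retrieval. The function names, the "advantage_plus" label, and the input field names are hypothetical; only the "platform" and "asset_rating" attributes appear in the ingestion code earlier in this article.

```python
from typing import Optional


def normalize_signal(platform: str, raw: dict) -> dict:
    """Map platform-specific fields onto one shared metadata schema.

    Input field names ("rating", "spend_share") are assumptions chosen for
    illustration, not fields taken from either platform's reporting API.
    """
    if platform == "pmax":
        return {
            "platform": "pmax",
            "asset_rating": raw.get("rating", "unknown"),  # Low / Good / Best
            "spend_share": None,  # PMax offers less spend granularity
        }
    if platform == "advantage_plus":
        return {
            "platform": "advantage_plus",
            "asset_rating": "unknown",  # no rating label assumed for Meta
            "spend_share": raw.get("spend_share"),
        }
    raise ValueError(f"unsupported platform: {platform}")


def platform_filter(platform: Optional[str]) -> dict:
    """Metadata filter for one platform, or no filter for cross-platform retrieval."""
    return {"platform": platform} if platform else {}
```

Passing platform_filter("pmax") restricts a query to PMax history, while platform_filter(None) leaves the query open to hooks and messaging angles from both platforms.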

FAQ

How do you measure creative performance in Advantage+ if there is no clean creative-level ROAS?

Use proxy signals: spend allocation as a percentage of campaign total (more spend allocated to a creative by the algorithm is a positive signal), creative-level impression share, and the platform's own asset ratings where available. These are imperfect but directional. Stored with importance weights reflecting their confidence level, they accumulate into a usable signal over many campaigns. The context engine does not require clean controlled data — it works with the noisy, proxy-signal data that black-box environments produce.
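The spend-allocation proxy can be computed directly from creative-level spend figures. The sketch below is a minimal illustration with hypothetical function names; it treats a creative's share relative to an even split as the directional signal, which is one reasonable reading of "more spend allocated is a positive signal" rather than an established metric.

```python
from typing import Dict


def spend_shares(creative_spend: Dict[str, float]) -> Dict[str, float]:
    """Each creative's share of total campaign spend."""
    total = sum(creative_spend.values())
    if total <= 0:
        # No spend recorded: every creative gets a zero share
        return {cid: 0.0 for cid in creative_spend}
    return {cid: spend / total for cid, spend in creative_spend.items()}


def relative_allocation(creative_spend: Dict[str, float]) -> Dict[str, float]:
    """Spend share divided by the even-split share.

    Values above 1.0 mean the algorithm gave the creative more than an even
    split would have; values below 1.0 mean it gave less.
    """
    n = len(creative_spend)
    return {cid: share * n for cid, share in spend_shares(creative_spend).items()}
```

Normalizing against the even split keeps the signal comparable between a campaign with three creatives and one with twelve, where raw spend shares would differ simply because of the number of assets.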

Can a context engine improve creative performance in Advantage+ or PMax directly?

Not directly: the platform controls optimization. The context engine improves the input to the platform, namely the creative assets uploaded to the campaign. When creatives are briefed from evidence about what has historically performed well in black-box environments for this brand, the asset set entering the next Advantage+ or PMax campaign is better grounded than one built from a cold-start brief. The platform then optimizes across a better starting set.

How many black-box campaign cycles are needed before the context engine produces useful briefs?

Useful retrieval begins at 5–10 campaigns with ingested asset data. Reliable pattern detection requires 20–30 campaigns. For teams switching to Advantage+ or PMax from manual campaigns, historical manual campaign data can be ingested first as a bootstrap: creative signals from manual campaigns still inform black-box brief generation, even though the environments differ.

Does the context engine need to be retrained when creative strategy changes significantly?

No retraining — Feather DB does not use trained models. The half-life decay handles creative strategy evolution naturally: old assets from a previous creative strategy decay in importance, while new assets reflecting the updated strategy accumulate with full importance weights. The transition happens gradually over campaign cycles without manual intervention. If a complete clean break is needed, a new namespace partition can separate pre- and post-strategy assets explicitly.

How does this approach handle creative testing in Advantage+ Shopping specifically?

Advantage+ Shopping provides product-level and creative-level performance signals within the catalog. These can be ingested as product-creative pairs (which product image and copy combinations have historically received more of the platform's spend) and stored in a dedicated namespace. At brief time for the next catalog campaign, retrieval surfaces the product-creative combinations with the strongest historical signal for this brand's catalog audience.