What's wrong with prompt tracking
Synthetic prompts, missing data, and the variance nobody accounts for.
👋 Hey, I’m George Chasiotis. Welcome to GrowthWaves, your weekly dose of B2B growth insights—featuring powerful case studies, emerging trends, and unconventional strategies you won’t find anywhere else.
This note is brought to you by Ahrefs.
I spent a few hours building an internal app with Agent A by Ahrefs.
The goal? Speed up how we build keyword universes at Minuttia.
The app takes brand context, pulls thousands of keywords from Ahrefs, classifies them by search intent, and exports everything into a searchable database. One test run generated 11,805 keywords in a few minutes.
No, it doesn’t replace keyword research. And yes, it still needs cleanup and refinement on our side.
But that’s not the point.
What’s exciting is how fast you can now build useful internal marketing tools on top of real search data. And I can already see dozens of use cases for agency workflows.
If you like building systems and automating marketing work, check it out:
My agency, Minuttia, is in talks with a billion-dollar company in the US right now.
They told us we are competing against a US-based agency that most people in this space know.
The kind that positions itself as the top name in AI search, locks you in for 12 months, and charges $200K a year for whatever it is they claim to be doing.
When the prospect asked them a simple question about prompt tracking, the CEO rambled.
If you have watched this person in webinars, you would assume that if anyone has AI search figured out, it is them. Turns out even an agency at the supposed forefront is guessing.
To be fair, I do not think the problem is that they do not care. I think they care a lot.
BUT
This stuff is so new that having real conviction and communicating it clearly is hard. Everyone in this space, Minuttia included, has had to rebuild how we think about tracking and measurement from scratch.
But that conversation stuck with me. Because the question the prospect asked was not esoteric. It was the most basic question a buyer can ask.
And the answer was a ramble…
So, let’s see what’s wrong with prompt tracking and how you should think about it.
The measurement vacuum
I have written about this before.
In the AEO survey Kevin Indig and Minuttia ran with 599 respondents, 40.6% said their single biggest challenge is a lack of reliable measurement tools and attribution.
Prompt tracking is one of the industry’s attempts to solve that problem. The logic is simple: build a list of prompts your buyers might ask AI engines, run them frequently, and see where your brand appears.
Hundreds of AI search tracking tools now exist for this purpose. We cataloged 240 of them in December 2025. (I’d expect the number to be close to 300 now.)
The category is commoditized. The core use case is identical across most tools.
And yet measurement remains the number one challenge. Which tells you something.
The tools exist. But the problem persists!
Five problems with prompt tracking
First, the graphic:
And now, let’s take a look at each of these problems.
1. Synthetic prompts
Every prompt tracking tool starts with the same premise: build a list of queries your buyers might ask.
The problem is that those lists reflect how you think buyers ask questions. Not how they actually do.
Ironically, prompt lists are built by AI systems hypothesizing what customers might type. The result is measurement against a fabricated reality.
You can score “invisible” on prompts that nobody actually runs. You can score “dominant” on prompts that do not matter.
Real buyer behavior is invisible to you. And that brings us to the second problem.
2. No transparency on actual usage
When Google Search was the only game, we had keyword data. Volume estimates. Search trends. (We can argue about how useful or accurate this data was or is, but the truth is that it has been there for a long time.)
Query reports from Search Console alone told you more about search performance than anything AI platforms offer today.
AI platforms give you none of that.
OpenAI, Anthropic, and Perplexity do not publish user query logs. Only they know how often specific questions are asked, which topics generate volume, and where users spend time inside conversations.
Bing recently started sharing some query data. That is a start. But it covers a fraction of the AI search market, and even that data is limited.
Without volume signals, you cannot know which prompts matter. You cannot prioritize. You are flying blind, and every tool that shows you a “search volume” for an AI prompt is extrapolating from incomplete data.
I talked about this at the Baltic-Nordic SEO Summit earlier this year. The AEO industry has a transparency problem, and some of the numbers being sold to marketers are not grounded in anything real.
3. Commercial focus
Most AI search tracking tools focus on bottom-funnel prompts. “Best X.” “X vs Y.” “X alternatives.” “Top tools for Z.”
That is understandable. Those are the prompts closest to purchase intent.
But a significant portion of AI search usage is informational. People ask how to solve problems, how to evaluate approaches, how to think about categories.
Those informational queries are where most citations happen. They are also where perception gets formed.
If you only track commercial prompts, you are measuring a sliver of your actual exposure. It is the slice that feels most relevant to revenue, but it misses most of what AI engines say about you.
4. Prompt uniqueness
Real prompts are messy.
They include personal context. Constraints. Follow-ups.
Multi-turn refinement makes every conversation unique. Two users asking what sounds like the “same” question almost never phrase it identically.
“What’s a good project management tool for a 5-person agency that’s remote and budget-constrained?” is a different prompt from “best project management software.” But most tracking tools treat the category as a single monolithic query.
Single-shot synthetic prompts are a poor proxy for how people actually use AI search. The gap between what a tool tracks and what a buyer actually types is enormous.
5. Response variance
SparkToro ran a study with 600 volunteers who ran 12 prompts across ChatGPT, Google AI Overviews, and other AI engines. A combined 2,961 runs.
The same prompt produced the same brand list less than 1% of the time. The same list in the same order? Less than 0.1%.
Nearly every response was different. Different brands, different order, different framing.
If your tracking tool runs a prompt once and records the result, that result is a single data point from an inherently unstable distribution. Run it again in five minutes and you will probably get a different answer.
One-off checks are misleading. Measurement in this space requires repeated sampling, and most tools do not account for the variance.
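Here is a minimal sketch of what repeated sampling can look like in practice. It assumes you have already collected the brand lists from several runs of the same prompt. The brand names, the run data, and the function names below are all hypothetical, and the Wilson score interval is just one reasonable way to put an uncertainty range around a mention rate, not how any specific tool does it.

```python
import math
from collections import Counter


def mention_rates(runs, z=1.96):
    """Return {brand: (rate, low, high)} across repeated runs of one prompt.

    `runs` is a list of brand lists, one per run. The range is a Wilson
    score interval (z=1.96 is roughly a 95% interval).
    """
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run")
    # Count each brand at most once per run.
    counts = Counter(brand for run in runs for brand in set(run))
    results = {}
    for brand, k in counts.items():
        p = k / n
        denom = 1 + z**2 / n
        center = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        results[brand] = (p, max(0.0, center - half), min(1.0, center + half))
    return results


def most_common_list_share(runs):
    """Share of runs that returned the most frequent brand set (order ignored)."""
    sets = Counter(frozenset(run) for run in runs)
    return sets.most_common(1)[0][1] / len(runs)


# Hypothetical data: ten runs of the same prompt (made up for illustration).
runs = [
    ["BrandA", "BrandB"],
    ["BrandA", "BrandC"],
    ["BrandB", "BrandA"],
    ["BrandA"],
    ["BrandA", "BrandB"],
    ["BrandB"],
    ["BrandA", "BrandB", "BrandC"],
    ["BrandA"],
    ["BrandA"],
    ["BrandC"],
]

rates = mention_rates(runs)
share = most_common_list_share(runs)

for brand, (rate, low, high) in sorted(rates.items()):
    print(f"{brand}: mentioned in {rate:.0%} of runs (range {low:.0%} to {high:.0%})")
print(f"Most common brand list appeared in {share:.0%} of runs")
```

With only ten runs, the range around each rate is wide. That is the point: a single run tells you almost nothing, and even a handful of runs gives you a direction rather than a precise number.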
How to think about this
I want to set expectations here.
This is an emerging channel. The tooling is young. The methodologies are immature.
None of this is perfectly solved yet, and anyone who tells you otherwise either does not know what they are talking about or has other intentions.
Three things I keep coming back to in conversations with our clients and prospects.
First: no tool beats customer intel. The best prompt data does not come from a software platform. It comes from customers who found you through AI search and prospects who searched for products like yours in AI conversations.
Talk to them. Ask what they typed. Record how they described their problem. That signal is worth more than any synthetic prompt list.
Second: take everything directionally. AI search metrics are directionally helpful. They show you trends, give you rough positioning, tell you whether things are improving or getting worse. What they cannot do is give you the precision that Google Search metrics once offered. Treat them as a compass, not a GPS.
Third: prompt tracking still has value. I am not saying throw out the category. I am saying understand its limits.
Final Thoughts
The CEO who rambled on that call was not incompetent. They run a successful agency with real clients and real revenue.
But when a billion-dollar prospect asks “how do you come up with these prompts?” and the best answer available is a ramble, something is off.
Not with that one agency! With the state of measurement across this entire space.
Prompt tracking is the best tool we have for a problem that does not yet have a clean answer. Use it with your eyes open.
Know where it breaks down. Supplement it with real customer conversations, real buyer language, and repeated sampling over time.
The frameworks exist. The conviction comes from doing the work.
Thank you for reading today’s note, and see you again next week.
Note: In our 2-Day Intensive AEO Course, we provide a clear framework for building your Prompt Universe. The first cohort was in May, but due to demand and feedback, we’ll start offering the course on demand on Minuttia’s website. If you’re interested, please reply to this email, and I’ll drop you a note when we launch it.
Research Disclaimers and Limitations
GrowthWaves and its author are not sponsored by or compensated by any company mentioned in this note. This is independent editorial analysis and does not constitute investment, financial, or legal advice. The author may have relationships with, work with, or hold equity in companies referenced; however, no content in this piece was influenced, commissioned, or incentivized by any such relationship. AI tools were used as a research assistant in the preparation of this piece. All claims are sourced and linked throughout.
Sources
SparkToro, NEW Research: AIs are highly inconsistent when recommending brands or products
GrowthWaves x Growth Memo, The State of AEO: What 599 Marketers Told Us About AI Search
GrowthWaves, AI Search Tracking Tools Report
GrowthWaves, Perception Deviation: The most important metric you’re not tracking



