The new-product ideas have been narrowed to three. In the meeting room, the loudest person's favorite and an executive's pet idea are still standing. But the people in the meeting room aren't the ones who'll buy it. Finding out it "didn't sell" only after launch is the most expensive kind of failure there is.
A concept test is the survey where you ask your target customers "what do you think of this?" before you put the idea out into the world. It looks simple, but get one piece of the design wrong and the numbers start lying to you. "Every concept scores above 70% on purchase intent" — a familiar sight, and a textbook symptom of choosing the wrong presentation format. This guide works through how to choose your presentation format, the standard metrics worth measuring, how to read Top Box scores, and the norm comparison you need to answer the real question: "is our number high, or is it low?"
1. What a Concept Test Is — Insurance Against the Most Expensive Failure
A concept test presents a concept — an idea at the stage where there's no actual product yet — to target customers and measures how well it's received. It applies to products, services, features, advertising, and more. It has two goals.
- The Go / No-Go decision: Is this worth launching at all? Of several options, which one do you advance?
- Finding what to improve: Where does the concept land, and where does it fall flat?
The biggest payoff is finding out before you build. Validate direction before you commit to prototype development, inventory, and ad spend, and being wrong costs a small fraction of what it would after launch. Put the other way: run a sloppy test here, misread it as "we're good to go," and you invite the most expensive failure of all, pulling out of the market after launch.
A concept test also comes before the "what do we charge?" stage. Once you've confirmed acceptance, you use the Van Westendorp Price Sensitivity Meter (PSM) and conjoint analysis to nail down the optimal mix of price and features, and MaxDiff to rank the elements you should lead with. The concept test is the entry point to this whole cluster of pre-launch research.
2. Choosing the Presentation Format — Monadic / Sequential Monadic / Comparative
The single biggest design decision in a concept test is how you show it. The presentation format changes the results dramatically. Get this wrong and you get the "everything's above 70%" from the opening — or the opposite, "everything's low."
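The three ways to present a concept
- Monadic: each respondent sees and rates only one concept, and separate groups (cells) of respondents are compared with each other
- Sequential monadic: each respondent sees several concepts one at a time and rates each on the same questions before moving to the next
- Comparative: each respondent sees all the concepts side by side and rates or ranks them against one another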
Principles for choosing
- The final Go / No-Go decision → monadic: It eats sample, but it's the closest to a real purchase (people encounter one concept at a time on the shelf) and lets you read the absolute level
- Limited sample → sequential monadic with randomized order: The practical compromise. Because earlier concepts color the ratings of later ones, randomizing the order is mandatory (see the guide to order effects and question-order design)
- First-pass screening from many concepts down to a few → comparative: When all you want is the relative ranking
Never misread "comparative showed all concepts and they scored high" as the absolute level. Lining them up exaggerates the differences and bears little resemblance to the real market, where people meet one concept at a time.
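If you handle assignment in code rather than in the form tool, the monadic and sequential monadic patterns above can be sketched as follows. This is a minimal sketch under assumptions: the respondent IDs are known up front (for example, an invited panel sample), and `assign_monadic` and `sequential_order` are hypothetical helper names, not part of any tool's API.

```python
import random


def assign_monadic(respondent_ids, concepts, seed=None):
    """Give each respondent exactly one concept, keeping cell sizes balanced."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)  # random cell membership
    # Round-robin over the shuffled IDs: cell sizes differ by at most one
    return {rid: concepts[i % len(concepts)] for i, rid in enumerate(ids)}


def sequential_order(concepts, rng=None):
    """Return a fresh random presentation order for one respondent."""
    rng = rng or random.Random()
    order = list(concepts)
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    cells = assign_monadic(range(300), ["A", "B", "C"], seed=1)
    print({c: list(cells.values()).count(c) for c in "ABC"})  # 100 each
    rng = random.Random(2)
    for respondent in range(3):
        print(respondent, sequential_order(["A", "B", "C"], rng))
```

A random shuffle per respondent balances order only on average. With small samples, a rotation scheme such as a Latin square, which places each concept in each position equally often, is the safer choice.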
3. What to Measure — The Standard Concept-Test Metrics
The metrics are fairly settled across the industry. These five are the bare minimum to cover.
- Purchase Intent: "If this were launched, would you want to buy it?" A 5-point scale ("definitely would buy" to "definitely would not buy") is standard. The single most important metric
- Uniqueness / Newness: "Does this feel like something genuinely new?" When newness is low, there's no reason to switch from what people already have
- Appeal / Liking: "Overall, how appealing do you find it?" The first-impression, all-in evaluation
- Relevance: "Does this fit your needs?" Even something highly novel won't sell if it isn't relevant to me
- Differentiation: "Do you think this is different from other products?"
Balancing newness against relevance
The trade-off between newness and relevance matters most of all.
- High newness, low relevance → "Interesting, but not for me." Generates buzz, doesn't sell
- High relevance, low newness → "Looks handy, but what I have is enough." No switching happens
- Both high → ideal. New, and necessary to me
Don't ride the highs and lows of purchase intent alone. Decompose "why is purchase intent at that level?" into newness and relevance and the direction for improvement becomes visible — whether to add newness or to make it more personally relevant.
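The decomposition can be written as a small decision rule. This is a minimal sketch, assuming newness and relevance are each summarized as a T2B percentage and compared against a norm threshold for the category; the function name `diagnose` and the fourth "both low" branch are illustrative additions, not part of the framework above.

```python
def diagnose(newness_t2b, relevance_t2b, newness_norm, relevance_norm):
    """Place a concept in the newness x relevance grid against norm thresholds."""
    high_new = newness_t2b >= newness_norm
    high_rel = relevance_t2b >= relevance_norm
    if high_new and high_rel:
        return "both high: new, and necessary to me"
    if high_new:
        return "interesting, but not for me: work on relevance"
    if high_rel:
        return "handy, but what I have is enough: work on newness"
    # Illustrative extra case: neither dimension clears its norm
    return "both low: rethink the concept"


if __name__ == "__main__":
    # Hypothetical scores: strong newness, weak relevance
    print(diagnose(newness_t2b=62, relevance_t2b=35, newness_norm=50, relevance_norm=50))
```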
4. Reading Top Box Scores — The Discipline of Discounting Your Numbers
For tallying purchase intent, use Top Box / Top 2 Box (T2B). On a 5-point scale, "definitely would buy" is the Top Box, and "definitely would buy + probably would buy" is the T2B.
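The tally itself is simple arithmetic. A minimal sketch, assuming responses are coded 5 = "definitely would buy", 4 = "probably would buy", down to 1 = "definitely would not buy"; `top_box_scores` is a hypothetical helper name.

```python
def top_box_scores(responses):
    """Return (Top Box %, Top 2 Box %) for 5-point purchase-intent codes.

    Assumed coding: 5 = definitely would buy, 4 = probably would buy,
    ..., 1 = definitely would not buy. Anything else is ignored.
    """
    valid = [r for r in responses if r in (1, 2, 3, 4, 5)]
    if not valid:
        raise ValueError("no valid responses")
    n = len(valid)
    top_box = 100 * sum(1 for r in valid if r == 5) / n
    t2b = 100 * sum(1 for r in valid if r >= 4) / n
    return top_box, t2b


if __name__ == "__main__":
    print(top_box_scores([5, 5, 4, 4, 4, 3, 3, 2, 1, 1]))  # (20.0, 50.0)
```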
Purchase intent always runs high
This is the biggest pitfall. Survey-measured purchase intent always comes out higher than actual purchasing behavior. Saying "I'd buy it" costs nothing. The actual purchase rate of people who answered "definitely would buy" almost never lives up to that number.
In practice, the standard move is to weight "definitely would buy" heavily and discount "probably would buy" steeply. In some industries, companies maintain a conversion factor ("what share of T2B actually buys") built from their own historical results. That factor varies enormously by product and price band, so borrowing another company's factor won't fit.
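Both moves reduce to a few lines of arithmetic. This is a minimal sketch: the weights are deliberately left as parameters because no universal values exist, the numbers in the usage block are hypothetical placeholders rather than recommended values, and `weighted_intent` and `conversion_factor` are illustrative names.

```python
def weighted_intent(pct_definitely, pct_probably, w_definitely, w_probably):
    """Discounted purchase intent in %.

    Weights are parameters, not constants: calibrate them from your own
    past launches (w_definitely should sit well above w_probably).
    """
    return pct_definitely * w_definitely + pct_probably * w_probably


def conversion_factor(history):
    """Average 'share of T2B that actually bought' across past launches.

    history: list of (t2b_pct, actual_purchase_pct) pairs, all measured
    with the same survey design as the concept being evaluated.
    """
    ratios = [actual / t2b for t2b, actual in history if t2b > 0]
    if not ratios:
        raise ValueError("need at least one past launch with T2B > 0")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only
    print(weighted_intent(20.0, 30.0, w_definitely=0.75, w_probably=0.25))
    past = [(50.0, 10.0), (40.0, 10.0)]  # (T2B %, actual purchase %)
    factor = conversion_factor(past)
    print(f"factor={factor:.3f}, expected purchase ~{48.0 * factor:.1f}%")
```

Averaging per-launch ratios is one choice among several; pooling (total actual divided by total T2B) weights large launches differently, and either should be checked against your own track record.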
Which is why you need norm comparison (next section)
Even after discounting, you need a benchmark to judge whether the discounted number is high or low. That benchmark is the norm.
5. Norm Comparison — A "70%" on Its Own Tells You Nothing
The most common error in concept testing is judging on the absolute value of the score alone. "Purchase-intent T2B is 65% — that's high." Is it, really?
The level of purchase intent shifts dramatically with category, price band, and survey method. For a new flavor of an everyday consumer good, a T2B of 70% may be unremarkable; for a high-priced durable, 40% might be excellent. Only by comparing against past concepts, competitors, and the category average measured with the same survey design — the norm — can you say "high" or "low."
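Comparing against past concepts can be as simple as a percentile rank. A minimal sketch, assuming `norm_scores` holds T2B values from earlier concepts in the same category, measured with the same survey design; `norm_percentile` is a hypothetical name and the mid-rank tie rule is one common convention.

```python
from bisect import bisect_left, bisect_right


def norm_percentile(score, norm_scores):
    """Percentile rank of `score` among past concepts' scores.

    Ties count as half (mid-rank convention), so a score equal to the
    median of an odd-sized norm lands at 50.
    """
    if not norm_scores:
        raise ValueError("empty norm")
    s = sorted(norm_scores)
    below = bisect_left(s, score)
    equal = bisect_right(s, score) - below
    return 100 * (below + 0.5 * equal) / len(s)


if __name__ == "__main__":
    past_t2b = [30, 40, 50, 60, 70]  # hypothetical norm
    print(norm_percentile(65, past_t2b))  # 80.0
```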
How to build and use norms
- Accumulate your own past concepts under the same design: The most trustworthy norm. Use the scores of products that succeeded and those that failed as your baseline
- Plant an "anchor" inside the same survey: Alongside the test concept, have respondents evaluate one of your own existing hits or a competitor's product on the same questions. This tells you "how does the new concept compare to an existing hit?" under identical conditions
- Use a research firm's normative database: Commercial norm databases such as BASES (NielsenIQ) hold category-level benchmarks, but they're method-dependent: the benchmarks are only comparable if you measured with that firm's method
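For the anchor approach, one common way to report results is an index with the anchor set to 100. A minimal sketch, assuming each concept and the anchor already have a T2B percentage from the same survey; `index_vs_anchor` and the key names are hypothetical.

```python
def index_vs_anchor(t2b_by_concept, anchor):
    """Index each concept's T2B against an anchor rated in the same survey (anchor = 100)."""
    base = t2b_by_concept[anchor]
    if base <= 0:
        raise ValueError("anchor T2B must be positive")
    return {name: 100 * score / base
            for name, score in t2b_by_concept.items() if name != anchor}


if __name__ == "__main__":
    scores = {"existing_hit": 40.0, "A": 50.0, "B": 30.0}  # hypothetical T2B %
    print(index_vs_anchor(scores, anchor="existing_hit"))  # {'A': 125.0, 'B': 75.0}
```

An index above 100 says the concept beat the anchor in this sample; whether the gap is larger than sampling noise is a separate question for a significance test.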
A standalone score is almost meaningless. It only becomes a basis for judgment paired with a comparison. That is the iron rule of concept testing.
6. Designing the Stimulus (the Concept Statement) — The Battle Before You Measure
It's easy to overlook, but how you present the concept — the stimulus itself — drives the results. The same idea scores differently depending on how the stimulus is crafted.
The standard structure of a concept statement
A good concept statement generally has these elements.
- Insight / problem: "Don't you find this frustrating?" (the gateway to empathy)
- Benefit: how it solves that problem (the value it delivers)
- Reason to Believe (RTB): why that's possible (the grounds, technology, or track record that make it credible)
- Product form and usage occasion: concretely what it is, when, and how it's used
Watch-outs in stimulus design
- Match the information volume and craft across concepts: If concept A is polished and concept B is sloppy, you're measuring the skill of the copywriting, not the concept. Fairness of comparison is everything
- Don't turn it into an ad: Add hype or hard-sell language and you measure the power of the advertising, not the raw strength of the concept. State the value plainly
- Strip out jargon and internal terms: Use words the target can understand on one read. A low score for a stimulus they couldn't parse is a failure of communication, not a rejection of the concept
Writing the stimulus text is an extension of wording your questions. The principles for avoiding leading language and hype in the complete guide to writing survey questions apply directly.
7. From the Editor's Desk — Five Things Never to Do in a Concept Test
Drawing on the industry cases and practitioner accounts we follow, here are five accidents that keep recurring in concept testing.
1. Misreading comparative high scores as the absolute level
The most frequent. You line up all the concepts, the winner scores a T2B of 75%, and you read it as "75% of the market will buy." Lining them up exaggerates the differences; it does not produce an absolute level. Make the Go / No-Go call from monadic, after measuring the absolute level. Comparative is for first-pass screening only.
2. Judging absolute values without a norm
Declaring "65% purchase intent is high" with nothing to compare against. The level swings wildly with category and price band. Only by lining up past concepts, competitors, and the category average under the same design can you speak of high or low. A standalone score is no basis for judgment. At minimum, plant an anchor (an existing product) inside the same survey.
3. Taking purchase intent at face value
Dropping "40% definitely would buy" straight into the business plan. Purchase intent always runs high. Weight "definitely would buy" heavily and discount "probably would buy" steeply. Build the conversion factor from your own history — factors from other companies and other categories won't fit.
4. Varying the stimulus craft by concept
A clean stimulus for your favorite, a half-hearted one for the challenger. That measures the skill of the copy, not the strength of the concept. Match the information volume, tone, and craft across every concept. A study where the fairness of comparison has collapsed is meaningless no matter how much sample you collect.
5. Settling for asking people outside the target
Because they're easy to reach, you ask people who aren't the target — existing heavy users, or staff members' acquaintances. Evaluating a new product is meaningless unless you ask the target you actually want to buy it. Narrowing down to qualified respondents with screening is mandatory. For respondent design, see the guide to designing and running screening questions.
8. Running a Concept Test with the Survey Tool Kicue
A concept test splits into a design phase — "present the stimulus and measure the standard metrics" — and an analysis phase — "interpret it through norm comparison and significance testing." Kicue mainly handles the former.
- Presenting the concept stimulus: You can design the presentation of a concept statement (text) together with Likert questions for purchase intent, newness, appeal, and the rest (see question types)
- Branching / randomization for monadic designs: Use display conditions and branching logic to assign respondents by concept (monadic) and to randomize presentation order (sequential monadic) (the complete guide to branching logic)
- Including an anchor (an existing product): Put evaluation questions for an existing hit or competitor product inside the same form to build the foundation for norm comparison
- Respondent screening: Use screening questions up front to exclude people outside the target
- CSV export with respondent IDs: Output structured data so you can run Top Box tallies and cross-concept comparisons externally
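Once the data is exported, a per-concept T2B tally might look like the sketch below. The column names `concept` and `purchase_intent` are assumptions for illustration, not Kicue's actual export headers; map them to whatever your file contains. The coding assumption (5 = "definitely would buy", 4 = "probably would buy") is also an assumption.

```python
import csv
import io
from collections import defaultdict


def t2b_by_concept(rows, concept_col="concept", intent_col="purchase_intent"):
    """T2B % per concept from exported rows (hypothetical column names)."""
    counts = defaultdict(lambda: [0, 0])  # concept -> [t2b_count, valid_n]
    for row in rows:
        value = (row.get(intent_col) or "").strip()
        if not value.isdigit():
            continue  # skip blanks and non-answers
        code = int(value)
        if not 1 <= code <= 5:
            continue  # skip out-of-range codes
        tally = counts[row[concept_col]]
        tally[1] += 1
        if code >= 4:  # assumed: 5 = definitely, 4 = probably would buy
            tally[0] += 1
    return {concept: 100 * hits / n for concept, (hits, n) in counts.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "respondent_id,concept,purchase_intent\n1,A,5\n2,A,3\n3,B,4\n4,B,4\n"
    )
    print(t2b_by_concept(csv.DictReader(sample)))  # {'A': 50.0, 'B': 100.0}
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass `csv.DictReader(f)` in the same way.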
⚠️ What Kicue can't do
- Rich video and image stimuli have constraints: Presenting an elaborate video concept or a detailed package image may require workarounds such as linking to externally hosted media (confirm the presentation format in advance)
- No norm database or industry benchmarks: Comparison against commercial norms such as BASES is an external service. Kicue provides only your own survey data
- No significance testing or conversion factors: Testing the difference in purchase intent between concepts, or converting T2B to actual purchase, is done in Excel / R / Python / SPSS (see the guide to aggregation and significance testing)
- Strict quota / cell management has constraints: Strictly balancing sample sizes across the cells of a monadic design may require working with an external panel company
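For the significance testing you run outside the tool, a difference in T2B between two monadic cells can be checked with a two-proportion z-test. This is a minimal sketch using only the standard library, and `two_proportion_z_test` is a hypothetical name. It assumes independent samples, which holds for monadic cells; in a sequential monadic design the same respondents rate both concepts, so a paired test such as McNemar's test fits better.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference in T2B between two independent cells.

    Returns (z, p_value); the p-value comes from the normal approximation.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)  # proportion under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical cells: 120 of 300 vs 90 of 300 in T2B
    z, p = two_proportion_z_test(120, 300, 90, 300)
    print(f"z={z:.2f}, p={p:.4f}")
```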
For related reading, the Van Westendorp PSM design guide, conjoint analysis in practice, the MaxDiff design guide, the guide to designing and running screening questions, and the complete guide to writing survey questions together bring the whole pre-launch research pipeline into view: "evaluate the concept → nail down price and features → rank what to lead with."
Summary — Six Points That Make a Concept Test Trustworthy
- Go / No-Go from monadic — monadic is the only format that measures the absolute level. Comparative is for first-pass screening only
- Don't look at purchase intent alone — decompose it into newness × relevance to read the direction for improvement
- Discount the Top Box — purchase intent always runs high. Weight "definitely would buy" heavily and discount "probably" steeply
- You can only call it high or low against a norm — a standalone score is meaningless. Compare past concepts, competitors, and anchors under an identical design
- Craft the stimulus fairly across concepts — measure the strength of the concept, not the skill of the copy
- Ask the target — narrow down to qualified respondents with screening. Asking whoever's easy to reach gives you no basis for judgment
A concept test isn't about running a survey for its own sake. Get three things right (presentation format, norm comparison, and a fair stimulus) and you decide Go / No-Go on the voice of the market rather than the loudest voice in the room. That is the highest-ROI insurance there is before launch.
If you want to design a pre-launch concept evaluation survey, why not try Kicue — a free survey tool? From presenting a concept statement and designing Likert questions for purchase intent, newness, and appeal, to the branching logic for monadic assignment, respondent screening, and CSV export with respondent IDs, you can get the survey portion of a concept test started in a single account (norm-database comparison, statistical significance testing, and converting T2B to actual purchase are handled in combination with external norm services or R / Python / SPSS).
