You ask "Which feature is the highest priority?", you use a Likert scale, and you end up with a report where every single item is rated 'Extremely Important'. If you've done research for even a year, you've seen this at least once: that head-in-hands moment of "they're saying everything is important... I can't make a decision with this..."
The technique that structurally avoids this ceiling effect is MaxDiff (Maximum Difference Scaling, Best-Worst Scaling). In this piece, I'll walk through why Likert scales fail to produce priorities, the basic structure of MaxDiff, the conventions of experimental design, how to think about sample size, score calculation (counting vs hierarchical Bayes), and how to choose between MaxDiff, conjoint, and PSM — drawing on both the field practices of implementation vendors and the academic origins.
1. Why Likert Scales Don't Produce Priorities
Line up 10 items and ask people to "rate the importance on a 5-point scale", and almost every item ends up at 'Important' or 'Very Important'. This is the structural weakness of Likert scales.
Three main causes:
- Ceiling effect: When respondents feel "everything matters", they stack near the upper limit. If all 10 items score a 5, you can't distinguish priorities
- Social desirability bias: Items that are hard to disagree with — "security", "quality", "support" — score higher than their actual priority
- Satisficing under response burden: Tagging each of 10 items with a star is monotonous work, and respondents start cutting corners by the second half
The result: data that says "everything is important" and can't drive a decision.
We cover Likert scales themselves in the Likert Scale Design Guide, but Likert is structurally unsuited for "I want to rank things" — and that's exactly why MaxDiff exists.
2. The Basic Structure of MaxDiff — Pick Best & Worst
MaxDiff is a method where you present 4–5 items at a time and ask the respondent to pick the most important (best) and least important (worst). Repeat this for 10–15 blocks, and you can statistically estimate the relative priority of each item.
What the Question Looks Like
Example: if you want to compare 10 items, the respondent sees a screen like this 12 times.
Of the following 4 items, please select the most important and the least important.
Low price: Most important [○] / Least important [ ]
Support quality: Most important [ ] / Least important [○]
Range of features: Most important [ ] / Least important [ ]
Ease of use: Most important [ ] / Least important [ ]
Each respondent is forced to directly compare items against each other, so they can't escape into "everything is a 5" the way they can with Likert. The relative strengths between items come out cleanly.
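One way to store what a single screen captures is sketched below. The `BlockResponse` name and its fields are illustrative assumptions, not the export format of any particular tool; the point is only that each answer carries the items shown plus one best and one worst pick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockResponse:
    """One answered MaxDiff screen (hypothetical record layout)."""
    respondent_id: str
    shown: tuple[str, ...]  # items displayed in this block
    best: str               # item picked as most important
    worst: str              # item picked as least important

    def __post_init__(self) -> None:
        # Both picks must come from the displayed items.
        if self.best not in self.shown or self.worst not in self.shown:
            raise ValueError("best and worst must be among the shown items")
        # The same item cannot be both best and worst.
        if self.best == self.worst:
            raise ValueError("best and worst must differ")


response = BlockResponse(
    respondent_id="r001",
    shown=("Low price", "Support quality", "Range of features", "Ease of use"),
    best="Low price",
    worst="Support quality",
)
print(response)
```

Validating at construction time catches malformed rows before they reach the scoring step.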
Why This Format Works
The key insight in Louviere and Woodworth (1990), "Best-worst analysis", is that relative choice comes more naturally to people than absolute rating. We're terrible at deciding "this is a 7", but we can answer "do you prefer A or B?" instantly. MaxDiff is a design that leans directly into this cognitive trait.
3. Experimental Design — Conventions of Incomplete Block Design
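The heart of MaxDiff is the experimental design. With 10 items there are 45 item pairs and, shown 4 at a time, 210 possible blocks, far too many to put in front of one respondent. Instead you use a Balanced Incomplete Block Design (BIBD), which spreads the items across a small number of blocks in a controlled, balanced way rather than purely at random.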
Basic Design Rules
- 4–5 items per block: too many makes the choice hard, too few makes the comparison information thin
- Each item appears the same number of times: if you spread 10 items across 12 blocks, each item shows up about 5 times
- Each item pair co-occurs the same number of times: equalize how often "price" and "support" appear together in the same block
- Randomize item position: prevent display-order effects
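The two balance rules above can be checked mechanically. The sketch below is a minimal checker, not a design generator; the function names are illustrative, and the 7-item example is a textbook balanced design (the Fano plane) rather than anything from a MaxDiff tool.

```python
from collections import Counter
from itertools import combinations


def balance_report(blocks):
    """Count item appearances and pair co-occurrences in a design."""
    item_counts = Counter(item for block in blocks for item in block)
    pair_counts = Counter(
        frozenset(pair) for block in blocks for pair in combinations(block, 2)
    )
    return item_counts, pair_counts


def is_balanced(blocks, items):
    """True if every item and every item pair appears equally often."""
    item_counts, pair_counts = balance_report(blocks)
    all_pairs = [frozenset(p) for p in combinations(items, 2)]
    same_item_freq = len({item_counts[i] for i in items}) == 1
    same_pair_freq = len({pair_counts[p] for p in all_pairs}) == 1
    return same_item_freq and same_pair_freq


# Textbook balanced example: 7 items, 7 blocks of 3 (the Fano plane).
fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
print(is_balanced(fano, range(1, 8)))

# 10 items in 12 blocks of 4: each item would need 12 * 4 / 10 showings,
# which is not a whole number, so exact balance is impossible there.
print(12 * 4 / 10)
```

This is why the rule says "about 5 times": for many item counts a perfectly balanced design does not exist, and tools settle for the closest near-balanced one.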
The Reality of Implementation
Building a perfect BIBD by hand isn't realistic, so the standard is to use specialized tools:
- Sawtooth Software Lighthouse / Discover: the industry-standard MaxDiff vendor, auto-generates designs
- R package support.BWS: open source, widely used in academic settings
- SurveyEngine / Conjoint.ly: cloud-based, template-supported
These tools take your item count as input and generate the block design automatically. Don't build it from scratch by hand — that's the iron rule.
4. How to Decide Sample Size and Number of Repetitions
"How many people are enough?" and "how many blocks should one respondent see?" are the most debated practical questions in MaxDiff.
Rule of Thumb for Repetitions (Blocks per Respondent)
- Item count × 3 / 4 is the industry rule of thumb, which works out to each item being shown about 3 times per respondent in 4-item blocks. Example: 10 items means 7–8 blocks per person, 15 items means 11–12 blocks
- Too few repetitions makes individual-level estimation unstable; too many drives up drop-off due to response burden
- The practical move is to work backwards from a 5–10 minute response time cap
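The arithmetic behind these rules is small enough to script. In the sketch below, the 3 showings per item and 4 items per block come from the rule of thumb above, while the 30 seconds per block is purely an assumed placeholder that should be replaced with a timing measured in your own pretest.

```python
def blocks_per_respondent(n_items, items_per_block=4, showings_per_item=3):
    """Blocks needed so each item is shown `showings_per_item` times."""
    return n_items * showings_per_item / items_per_block


def estimated_minutes(n_blocks, seconds_per_block=30):
    """Rough completion time; 30 seconds per block is an assumed placeholder."""
    return n_blocks * seconds_per_block / 60


for n_items in (10, 15, 30):
    blocks = blocks_per_respondent(n_items)
    print(n_items, blocks, estimated_minutes(blocks))
```

Working backwards from a time cap is the same calculation inverted: divide the cap by the measured time per block to get the maximum number of blocks, then check how many showings per item that leaves.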
Rule of Thumb for Sample Size
- Aggregate-level analysis only: N=200–300 is enough
- Hierarchical Bayes by segment: N≥100 per segment, total N=400–500
- Individual-level estimation (deep analysis of key customers): N≥500
Orme, B. K. (2010). Getting Started with Conjoint Analysis (2nd ed.) organizes MaxDiff sample design from Sawtooth Software's implementation experience, and is referenced as the working standard guideline in the field.
For more detail, see How to Calculate Survey Sample Size, which covers the fundamentals of sample size calculation.
5. Score Calculation — Counting Analysis vs Hierarchical Bayes
There are broadly two ways to calculate "priority scores per item" from MaxDiff response data.
Counting Analysis (Simple Version)
- For each item, tally "times selected as best − times selected as worst"
- Compare across items after tallying
- Doable in Excel, simple to interpret, useful when you want a rough aggregate-level ranking
That said, counting analysis can't give you individual-level scores or fine-grained segment comparisons.
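As a minimal sketch, counting analysis fits in a few lines. One assumption beyond the description above: the best-minus-worst tally is divided by the number of times each item was shown, which keeps items comparable when the design is not perfectly balanced. The response data is made up for illustration.

```python
from collections import Counter


def counting_scores(responses):
    """Best-minus-worst count per item, divided by times the item was shown.

    `responses` is an iterable of (shown_items, best, worst) tuples.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in responses:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    # Scores range from -1 (always worst) to +1 (always best).
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Made-up responses for illustration only.
responses = [
    (("price", "support", "features", "ease"), "price", "support"),
    (("price", "support", "features", "ease"), "ease", "support"),
    (("price", "features", "ease", "security"), "price", "features"),
]
scores = counting_scores(responses)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:10s} {score:+.2f}")
```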
Hierarchical Bayes Estimation (HB)
- Estimate each respondent's individual score by combining a prior (the group-level mean) with the evidence from that respondent's own choices, so that sparse individual data is pulled toward the group
- Produces individual-level scores, so you can use it for segmentation and clustering
- Standard implementations use Sawtooth Software's HB module, or R's bayesm / ChoiceModelR packages
Marley, A. A. J., & Louviere, J. J. (2005). Some probabilistic models for best, worst, and best-worst choices lays out the mathematical models for best-worst choice (random utility models, MNL), and serves as the theoretical foundation for HB implementations.
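To make the model concrete, here is a sketch of the maxdiff form from the best-worst choice literature, in which the probability of picking item i as best and item j as worst in a block is proportional to exp(u_i − u_j). The utilities are hypothetical, and this shows only the choice probability for one block, not an HB estimator.

```python
import math
from itertools import permutations


def maxdiff_probabilities(utilities):
    """P(best=i, worst=j) for every ordered pair i != j in one block.

    Uses the maxdiff form: probability proportional to exp(u_i - u_j).
    `utilities` maps each shown item to its utility.
    """
    weights = {
        (i, j): math.exp(utilities[i] - utilities[j])
        for i, j in permutations(utilities, 2)
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical utilities for one 4-item block.
block = {"price": 1.2, "support": -0.8, "features": 0.1, "ease": 0.5}
probs = maxdiff_probabilities(block)
most_likely = max(probs, key=probs.get)
print(most_likely, round(probs[most_likely], 3))
```

Estimation runs this logic in reverse: it searches for the utilities that make the observed best and worst picks most probable.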
Choosing in Practice
- For an executive deck, you want to show "Feature A is 3× as important as Feature B" → HB estimation (individual scores → present averages)
- You want to compare priorities across 5 segments → HB estimation (segment-level posteriors)
- You just want to share "what's the top priority for Q1" internally in a concise way → counting analysis is plenty
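A caution on the "3× as important" style of claim: raw utilities from a logit-type model have an arbitrary zero, so ratios of raw utilities are not meaningful. Ratio statements need scores converted to a share-like scale first. The sketch below uses a plain softmax as one simple conversion; this is an assumption for illustration, vendors apply their own rescalings, and the utilities are invented.

```python
import math


def preference_shares(utilities):
    """Convert utilities to shares that sum to 100 via a softmax."""
    exp_u = {item: math.exp(u) for item, u in utilities.items()}
    total = sum(exp_u.values())
    return {item: 100 * value / total for item, value in exp_u.items()}


# Hypothetical average utilities, for illustration only.
shares = preference_shares({"Feature A": 1.1, "Feature B": 0.0, "Feature C": -0.4})
ratio = shares["Feature A"] / shares["Feature B"]
print({k: round(v, 1) for k, v in shares.items()}, round(ratio, 2))
```

On this scale the ratio between two items equals exp of their utility difference, so a gap of about 1.1 in utility corresponds to roughly a threefold difference in share.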
For the theoretical detail of Bayesian estimation, reading Survey Aggregation and Significance Testing — Cross-Tabs, Chi-Square, Effect Sizes alongside this guide makes the frequentist contrast easier to grasp.
6. Choosing Between MaxDiff, Conjoint, and PSM
As the big three of pricing and priority research, MaxDiff, conjoint, and PSM often get discussed side by side. Each answers a different question and fits a different scenario.
Choosing Between MaxDiff / Conjoint / PSM
Selection Flow in Practice
- "I want to decide what to develop first" → MaxDiff
- "I want to see if this price + feature bundle will sell" → Conjoint
- "Should the initial price be $50, or $80?" → PSM
There are cases where you use them in parallel. A standard pattern for mid-sized projects: use MaxDiff to narrow down feature priorities, then a conjoint that bundles the top 3 features, with PSM for price range.
For more detail, reading Conjoint Analysis in Practice and the Van Westendorp PSM Design Guide alongside this piece makes it easier to choose among these three sibling methods.
7. The Editorial View — 5 Things That Always Pay Off in MaxDiff Implementation
From the vantage of continuously tracking industry cases and public vendor articles, here are 5 things that always pay off in MaxDiff implementation.
1. Narrow Down to 10–20 Items Before You Start
"I want to throw all 30 items into MaxDiff" is a common request, but 30 items requires about 22 blocks per respondent, and response burden collapses. The field convention is to internally debate "obviously keep / obviously drop" before MaxDiff and narrow to under 20 items before running it.
2. Match Item Granularity
If you line up "low price" next to "ease of use of the inquiry form" as peers, the abstraction levels are too different and respondents can't compare them. Consciously match the granularity (level of abstraction) of items — for example, all at "feature category" level, or all at "specific touchpoint" level.
3. Don't Mix "Importance" and "Satisfaction"
If you want to capture both "importance" and "current satisfaction" in MaxDiff in the same survey, split them into separate blocks. Asking respondents to choose "important and satisfying" in the same block confuses them. If you're pairing this with Kano model analysis, designing it as a separate survey is the safe move.
4. Verify Block Display on Real Devices in Pretest
MaxDiff block screens frequently have item text that wraps awkwardly and becomes hard to read on mobile devices. Before going live, always check the display on both iOS and Android. The Survey Pre-Launch Checklist organizes the framework for pre-launch verification.
5. Reports Need the "Score + Rank + Effect Size" 3-Piece Set
Just showing executives "Feature A is 28.5 points" doesn't land. Put the 3-piece set on a single page: the score, the rank, and the size of the gap between Feature A and Feature B together with whether that gap is statistically distinguishable. With HB estimation, the overlap between the two posterior distributions gives an intuitive read on this.
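One way to turn posterior overlap into a single reportable number is the share of posterior draws in which Feature A scores above Feature B. The sketch below assumes you already have paired draws exported from an HB run; here they are simulated with made-up means and spreads purely to make the example runnable.

```python
import random


def prob_a_beats_b(draws_a, draws_b):
    """Share of paired posterior draws in which A's score exceeds B's."""
    if len(draws_a) != len(draws_b):
        raise ValueError("draws must be paired, one per posterior sample")
    wins = sum(a > b for a, b in zip(draws_a, draws_b))
    return wins / len(draws_a)


# Simulated stand-ins for real posterior draws (means and spreads are made up).
rng = random.Random(0)
draws_a = [rng.gauss(28.5, 2.0) for _ in range(4000)]
draws_b = [rng.gauss(24.0, 2.0) for _ in range(4000)]
p = prob_a_beats_b(draws_a, draws_b)
gap = sum(draws_a) / len(draws_a) - sum(draws_b) / len(draws_b)
print(f"P(A > B) = {p:.3f}, mean gap = {gap:.1f} points")
```

Reporting the gap next to the probability keeps effect size and certainty on the same line, which is what the 3-piece set is after.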
8. Implementing MaxDiff in the Survey Tool Kicue
⚠️ Important caveat: Kicue does not have a dedicated MaxDiff question type. Compared to general research-specialist tools (Sawtooth Software / SurveyEngine / Conjoint.ly), the automation of design and analysis is limited.
Two Options for Implementing MaxDiff in Kicue
Option A: Substitute implementation in Kicue
You can reproduce MaxDiff behavior with iterated blocks of single-answer questions:
- "Most important among the following 4 items" as a single-answer question, repeated 12 times
- "Least important among the following 4 items" as a single-answer question, repeated 12 times
- Vary the item set per block (generate the BIBD upfront in Excel / R and paste into each question's options)
- After collecting responses, export to CSV → HB estimation with R's bayesm / ChoiceModelR packages
This approach is practical enough for "initial projects that can't justify the cost of specialized tools" and "cases where you want quick validation with 10–15 items".
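Before estimation, the exported answers have to be joined back to the design, since each single-answer question only records the chosen option. The sketch below shows that reshaping step under loudly stated assumptions: the column names (`respondent_id`, `b1_best`, `b1_worst`, ...) are hypothetical and will not match the actual Kicue export, and the design is the same fixed set of blocks for every respondent.

```python
import csv
import io

# Hypothetical export layout: one row per respondent, with columns
# b1_best, b1_worst, b2_best, b2_worst (real export column names will differ).
raw_csv = """respondent_id,b1_best,b1_worst,b2_best,b2_worst
r001,Low price,Support quality,Ease of use,Range of features
r002,Ease of use,Low price,Security,Support quality
"""

# The design pasted into the survey, keyed by block number.
design = {
    1: ["Low price", "Support quality", "Range of features", "Ease of use"],
    2: ["Ease of use", "Range of features", "Security", "Support quality"],
}


def to_long_format(csv_text, design):
    """Return one record per respondent x block, with the items shown."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for block_no, shown in design.items():
            best, worst = row[f"b{block_no}_best"], row[f"b{block_no}_worst"]
            # Guard against answers that do not match the design for this block.
            if best not in shown or worst not in shown:
                raise ValueError(
                    f"{row['respondent_id']} block {block_no}: answer not in design"
                )
            records.append({
                "respondent_id": row["respondent_id"],
                "block": block_no,
                "shown": shown,
                "best": best,
                "worst": worst,
            })
    return records


long_rows = to_long_format(raw_csv, design)
print(len(long_rows), long_rows[0]["best"])
```

The long format (one row per respondent and block) is a convenient intermediate; the exact input layout each estimation package expects should be checked against its own documentation.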
Option B: Use it alongside specialized tools
For full-scale MaxDiff projects:
- Sawtooth Software Discover / Lighthouse: industry standard, end-to-end from design to HB analysis
- SurveyEngine / Conjoint.ly: SaaS-based, easy to adopt
- Run the main study in these tools, and use Kicue for screening questions and additional profiling questions
What Kicue Cannot Cover
- Automatic BIBD generation → pre-generate in external tools (R support.BWS / Sawtooth) and paste into Kicue
- Hierarchical Bayes estimation → CSV export → R bayesm / Sawtooth HB module
- Dashboard display of individual-level scores → external BI tools (Tableau / Looker)
- Automatic randomization of block display → partial support via Kicue's option randomization, but equalizing item-pair co-occurrence requires manual control
For related reading, the Van Westendorp PSM Design Guide, Conjoint Analysis in Practice, Likert Scale Design Guide, and Screening Question Design together give you the three-sibling method selection plus screening design that precedes MaxDiff.
References (5)
- Louviere, J. J., & Woodworth, G. (1990). Best-worst analysis: A novel method of measuring values in marketing research. Journal of Marketing Research, 27(4), 437-444.
- Marley, A. A. J., & Louviere, J. J. (2005). Some probabilistic models for best, worst, and best-worst choices. Journal of Mathematical Psychology, 49(6), 464-480.
- Orme, B. K. (2010). Getting Started with Conjoint Analysis: Strategies for Product Design and Pricing Research (2nd ed.). Research Publishers.
- Cohen, S. H. (2003). Maximum difference scaling: Improved measures of importance and preference for segmentation. Sawtooth Software Research Paper.
- Flynn, T. N., Louviere, J. J., Peters, T. J., & Coast, J. (2007). Best-worst scaling: What it can do for health care research and how to do it. Journal of Health Economics, 26(1), 171-189.
If you want to measure feature priorities or wish-list rankings with high precision, try the free survey tool Kicue. Substitute MaxDiff implementation with iterated single-answer blocks, control display order with the option randomization feature, and integrate with R / Sawtooth via CSV export — you can start the initial verification phase of MaxDiff in a single account (BIBD generation, hierarchical Bayesian estimation, and individual-level analysis require specialized tools like Sawtooth Software / SurveyEngine / R bayesm).
