MaxDiff (Maximum Difference Scaling) Design Guide — Measuring Priorities

You ask "Which feature is the highest priority?" on a Likert scale, and you end up with a report where every single item is rated 'Extremely Important'. If you've done research for even a year, you've seen this at least once: the head-in-hands moment of "they're saying everything is important, so I can't make a decision with this."

The technique that structurally avoids this ceiling effect is MaxDiff (Maximum Difference Scaling, Best-Worst Scaling). In this piece, I'll walk through why Likert scales fail to produce priorities, the basic structure of MaxDiff, the conventions of experimental design, how to think about sample size, score calculation (counting vs hierarchical Bayes), and how to choose between MaxDiff, conjoint, and PSM — drawing on both the field practices of implementation vendors and the academic origins.

1. Why Likert Scales Don't Produce Priorities

Line up 10 items and ask people to "rate the importance on a 5-point scale", and almost every item ends up at 'Important' or 'Very Important'. This is the structural weakness of Likert scales.

Three main causes:

Ceiling effect: When respondents feel "everything matters", they stack near the upper limit. If all 10 items score a 5, you can't distinguish priorities
Social desirability bias: Items that are hard to disagree with — "security", "quality", "support" — score higher than their actual priority
Satisficing under response burden: Rating 10 items one after another is monotonous work, and respondents start cutting corners by the second half

The result: data that says "everything is important" and can't drive a decision.

We cover Likert scales themselves in the Likert Scale Design Guide, but Likert is structurally unsuited for "I want to rank things" — and that's exactly why MaxDiff exists.

2. The Basic Structure of MaxDiff — Pick Best & Worst

MaxDiff is a method where you present 4–5 items at a time and ask the respondent to pick the most important (best) and least important (worst). Repeat this for 10–15 blocks, and you can statistically estimate the relative priority of each item.

What the Question Looks Like

Example: if you want to compare 10 items, the respondent sees a screen like this 12 times.

Of the following 4 items, please select the most important and the least important.

[ ] Low price             Most important [○]  Least important [ ]
[ ] Support quality       Most important [ ]  Least important [○]
[ ] Range of features     Most important [ ]  Least important [ ]
[ ] Ease of use           Most important [ ]  Least important [ ]

Each respondent is forced to directly compare items against each other, so they can't escape into "everything is a 5" the way they can with Likert. The relative strengths between items come out cleanly.
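To make the structure concrete, here is a minimal sketch of how one screen and one answer could be represented and checked. The class and function names (MaxDiffTask, MaxDiffAnswer, validate_answer) are made up for illustration and don't come from any particular survey tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxDiffTask:
    """One screen: the items shown together in a single block."""
    block_id: int
    items: tuple[str, ...]


@dataclass(frozen=True)
class MaxDiffAnswer:
    """One respondent's answer to one screen."""
    respondent_id: str
    block_id: int
    best: str
    worst: str


def validate_answer(task: MaxDiffTask, answer: MaxDiffAnswer) -> list[str]:
    """Return a list of problems; an empty list means the answer is usable."""
    problems = []
    if answer.block_id != task.block_id:
        problems.append("answer belongs to a different block")
    if answer.best not in task.items:
        problems.append("best item was not shown in this block")
    if answer.worst not in task.items:
        problems.append("worst item was not shown in this block")
    if answer.best == answer.worst:
        problems.append("best and worst must be different items")
    return problems


task = MaxDiffTask(1, ("Low price", "Support quality", "Range of features", "Ease of use"))
ok = MaxDiffAnswer("r001", 1, best="Low price", worst="Support quality")
bad = MaxDiffAnswer("r002", 1, best="Ease of use", worst="Ease of use")
print(validate_answer(task, ok))   # []
print(validate_answer(task, bad))  # ['best and worst must be different items']
```

The one rule worth enforcing at the data level is that best and worst can't be the same item, because that single constraint is what forces the trade-off.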

Why This Format Works

The key insight from Louviere & Woodworth (1990) is that relative choice comes more naturally to people than absolute rating. We're terrible at deciding "this is a 7", but we can answer "do you prefer A or B?" almost instantly. MaxDiff is a design built directly around this cognitive trait.

3. Experimental Design — Conventions of Incomplete Block Design

The heart of MaxDiff is the experimental design. With 10 items there are already 45 possible pairs, and you can't show every combination to a respondent, so you use a Balanced Incomplete Block Design (BIBD) to spread the items across blocks so that every item, and every pair of items, appears equally often.

Basic Design Rules

4–5 items per block: too many makes the choice hard, too few makes the comparison information thin
Each item appears the same number of times: if you spread 10 items across 12 blocks of 4, each item shows up about 5 times (48 slots / 10 items)
Each item pair co-occurs the same number of times: equalize how often "price" and "support" appear together in the same block
Randomize item position: prevent display-order effects
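These rules are easy to check mechanically. The sketch below audits any design, however it was produced, for item frequency, pair co-occurrence, and position balance. The function name audit_design and the shape of its report are assumptions made for this example; the sample design is a small textbook BIBD (7 items, 7 blocks of 4), not output from any vendor tool.

```python
from collections import Counter
from itertools import combinations


def audit_design(blocks):
    """Summarize how evenly a design shows items and item pairs.

    blocks: list of blocks, each a list of item indices in display order.
    """
    item_counts = Counter()
    pair_counts = Counter()
    position_counts = Counter()  # (item, position) -> times shown there
    for block in blocks:
        if len(set(block)) != len(block):
            raise ValueError(f"duplicate item in block {block}")
        item_counts.update(block)
        pair_counts.update(frozenset(p) for p in combinations(block, 2))
        position_counts.update((item, pos) for pos, item in enumerate(block))
    items = sorted(item_counts)
    # Pairs that never appear together count as 0.
    pair_values = [pair_counts[frozenset(p)] for p in combinations(items, 2)]
    return {
        "item_min": min(item_counts.values()),
        "item_max": max(item_counts.values()),
        "pair_min": min(pair_values),
        "pair_max": max(pair_values),
        "position_max": max(position_counts.values()),
    }


# A balanced design for 7 items: each item appears 4 times, each pair twice.
design = [
    [3, 4, 5, 6], [1, 2, 5, 6], [1, 2, 3, 4], [0, 2, 4, 6],
    [0, 2, 3, 5], [0, 1, 4, 5], [0, 1, 3, 6],
]
report = audit_design(design)
print(report)
# {'item_min': 4, 'item_max': 4, 'pair_min': 2, 'pair_max': 2, 'position_max': 4}
```

Item and pair balance are perfect here, but position_max equals the item count of 4, meaning at least one item always sits in the same slot. That's the display-order problem the last rule is about, and it's fixed by shuffling within each block.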

The Reality of Implementation

Building a perfect BIBD by hand isn't realistic, so the standard is to use specialized tools:

Sawtooth Software Lighthouse Studio / Discover: the industry-standard MaxDiff tools, which auto-generate designs
R package support.BWS: open source, widely used in academic settings
SurveyEngine / Conjoint.ly: cloud-based, template-supported

These tools take your item count as input and generate the block design automatically. Don't build it from scratch by hand — that's the iron rule.

4. How to Decide Sample Size and Number of Repetitions

"How many people are enough?" and "how many blocks should one respondent see?" are the most debated practical questions in MaxDiff.

Rule of Thumb for Repetitions (Blocks per Respondent)

Item count × 3 / 4 is the industry rule of thumb: with 4 items per block, this shows each item to each respondent about 3 times. Example: 10 items means 7–8 blocks per person, 15 items means 11–12 blocks
Too few repetitions makes individual-level estimation unstable; too many drives up drop-off due to response burden
The practical move is to work backwards from a 5–10 minute response time cap
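The arithmetic behind the rule of thumb is small enough to script. This sketch reads "item count × 3 / 4" as "about 3 exposures per item with 4 items per block", which is an interpretation of the rule rather than a quoted formula. The 25 seconds per block is a placeholder assumption; replace it with timing from your own pilot.

```python
import math


def blocks_per_respondent(n_items, items_per_block=4, exposures_per_item=3):
    """Blocks needed so each item is seen about `exposures_per_item` times.

    Returns the exact value rounded down and rounded up.
    """
    exact = n_items * exposures_per_item / items_per_block
    return math.floor(exact), math.ceil(exact)


def estimated_minutes(n_blocks, seconds_per_block=25):
    """Rough completion time; seconds_per_block is an assumed placeholder."""
    return n_blocks * seconds_per_block / 60


for n_items in (10, 15, 30):
    low, high = blocks_per_respondent(n_items)
    print(n_items, "items:", low, "to", high, "blocks,",
          round(estimated_minutes(high), 1), "min at the upper end")
```

Under these assumptions, 10 items lands at 7 to 8 blocks and 15 items at 11 to 12, matching the examples above, while 30 items pushes past 22 blocks.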

Rule of Thumb for Sample Size

Aggregate-level analysis only: N=200–300 is enough
Hierarchical Bayes by segment: N≥100 per segment, total N=400–500
Individual-level estimation (deep analysis of key customers): N≥500

Orme, B. K. (2010). Getting Started with Conjoint Analysis (2nd ed.) organizes MaxDiff sample design from Sawtooth Software's implementation experience, and is referenced as the working standard guideline in the field.

For more detail, see How to Calculate Survey Sample Size, which covers the fundamentals of sample size calculation.

5. Score Calculation — Counting Analysis vs Hierarchical Bayes

There are broadly two ways to calculate "priority scores per item" from MaxDiff response data.

Counting Analysis (Simple Version)

For each item, tally "times selected as best − times selected as worst"
Compare across items after tallying
Doable in Excel, simple to interpret, useful when you want a rough aggregate-level ranking

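That said, counting analysis doesn't give you reliable individual-level scores or fine-grained segment comparisons: each respondent sees each item only a handful of times, so per-person counts are too coarse to separate items.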
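As a rough illustration of the counting approach, the sketch below tallies best minus worst per item. Dividing by the number of times each item was shown is a common variant added here so that items with unequal exposure stay comparable; the function name and the toy answers are made up for the example.

```python
from collections import Counter


def counting_scores(answers):
    """Best-minus-worst counts, normalized by how often each item was shown.

    answers: iterable of (shown_items, best, worst) tuples.
    Returns {item: score}, where each score lies between -1 and +1.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in answers:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Toy answers for illustration only, not real survey data.
answers = [
    (("price", "support", "features", "ease"), "price", "support"),
    (("price", "support", "features", "ease"), "ease", "support"),
    (("price", "features", "ease", "security"), "price", "features"),
]
scores = counting_scores(answers)
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:10s} {score:+.2f}")
```

A score of +1 means the item was picked as best every time it appeared, and -1 means it was picked as worst every time.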

Hierarchical Bayes Estimation (HB)

Estimate each respondent's individual scores by combining the group-level distribution (the prior) with that individual's own choices, so respondents with few or noisy answers are pulled toward the group mean
Produces individual-level scores, so you can use it for segmentation and clustering
Standard implementations use Sawtooth Software's HB module, or R's bayesm / ChoiceModelR packages

Marley, A. A. J., & Louviere, J. J. (2005). Some probabilistic models for best, worst, and best-worst choices lays out the mathematical models for best-worst choice (random utility models, MNL), and serves as the theoretical foundation for HB implementations.
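For intuition, here is a minimal sketch of the simplest model in that family, usually called the maxdiff model: the probability of choosing item i as best and item j as worst is proportional to exp(u_i − u_j). The utilities below are invented for illustration; in a real study they are what HB estimates for each respondent.

```python
import math
from itertools import permutations


def best_worst_probabilities(utilities):
    """P(best=i, worst=j) for one block under the maxdiff model.

    utilities: {item: utility} for the items shown in the block.
    """
    weights = {
        (i, j): math.exp(utilities[i] - utilities[j])
        for i, j in permutations(utilities, 2)  # all ordered pairs with i != j
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Illustrative utilities, not estimates from real data.
utilities = {"price": 1.0, "support": -0.5, "features": 0.0, "ease": 0.5}
probs = best_worst_probabilities(utilities)
most_likely = max(probs, key=probs.get)
print(most_likely, round(probs[most_likely], 3))
```

The most likely answer pairs the highest-utility item as best with the lowest-utility item as worst, which is where the name "maximum difference" comes from.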

Choosing in Practice

For an executive deck, you want to show "Feature A is 3× as important as Feature B" → HB estimation (individual scores → present averages)
You want to compare priorities across 5 segments → HB estimation (segment-level posteriors)
You just want to share "what's the top priority for Q1" internally in a concise way → counting analysis is plenty

For the theoretical detail of Bayesian estimation, reading Survey Aggregation and Significance Testing — Cross-Tabs, Chi-Square, Effect Sizes alongside this guide makes the frequentist contrast easier to grasp.

6. Choosing Between MaxDiff, Conjoint, and PSM

As the big three of pricing and priority research, MaxDiff, conjoint, and PSM often get discussed side by side. Each answers a different question and fits a different scenario.

MaxDiff (Maximum Difference Scaling)

Measures priorities among individual items. From 10–30 features or wish-list items, identifies "what matters most". Relatively simple to design, with moderate response burden. Best for feature prioritization, concept screening, and attribute shortlisting.

Conjoint Analysis

Presents combinations of attributes and runs share simulations. Powerful for comparing product profiles (price × feature × brand). Design and analysis are harder than MaxDiff. Best for product concept evaluation and price elasticity measurement.

Van Westendorp PSM

Directly asks for 4 price points: "too cheap", "cheap (a bargain)", "expensive", and "too expensive". The simplest design, but all you get is an acceptable price range, and it tells you nothing about feature priorities. Best for initial price-range exploration for new products.

Selection Flow in Practice

"I want to decide what to develop first" → MaxDiff
"I want to see if this price + feature bundle will sell" → Conjoint
"Should the initial price be $30, $50, or $80?" → PSM

There are cases where you use them in parallel. A standard pattern for mid-sized projects: use MaxDiff to narrow down feature priorities, then a conjoint that bundles the top 3 features, with PSM for price range.

For more detail, read Conjoint Analysis in Practice and the Van Westendorp PSM Design Guide alongside this piece; together they make the choice among the three sibling methods clearer.

7. The Editorial View — 5 Things That Always Pay Off in MaxDiff Implementation

From the vantage of continuously tracking industry cases and public vendor articles, here are 5 things that always pay off in MaxDiff implementation.

1. Narrow Down to 10–20 Items Before You Start

"I want to throw all 30 items into MaxDiff" is a common request, but 30 items requires about 22 blocks per respondent, and at that length response quality collapses under the burden. The field convention is to sort items internally into "obviously keep" and "obviously drop" before MaxDiff, and to narrow the list to 20 items or fewer before running it.

2. Match Item Granularity

If you line up "low price" next to "ease of use of the inquiry form" as peers, the abstraction levels are too different and respondents can't compare them. Consciously match the granularity (level of abstraction) of items — for example, all at "feature category" level, or all at "specific touchpoint" level.

3. Don't Mix "Importance" and "Satisfaction"

If you want to capture both "importance" and "current satisfaction" in MaxDiff in the same survey, split them into separate blocks. Asking respondents to choose "important and satisfying" in the same block confuses them. If you're pairing this with Kano model analysis, designing it as a separate survey is the safe move.

4. Verify Block Display on Real Devices in Pretest

MaxDiff block screens frequently have item text that wraps awkwardly and becomes hard to read on mobile devices. Before going live, always check the display on both iOS and Android. The Survey Pre-Launch Checklist organizes the framework for pre-launch verification.

5. Reports Need the "Score + Rank + Effect Size" 3-Piece Set

Just showing executives "Feature A is 28.5 points" doesn't land. Put the 3-piece set on a single page: the score, the rank, and whether the gap between Feature A and Feature B is large enough to be statistically reliable. With HB estimation, how much the two items' posterior distributions overlap gives an intuitive read on that.
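One way to turn posterior overlap into a number for the report is sketched below, assuming you already have paired posterior draws for two items' scores (for example, exported from an HB run). The function name compare_items is hypothetical, and the draws here are simulated purely to make the example runnable; they are not results from any study.

```python
import random
import statistics


def compare_items(draws_a, draws_b, level=0.95):
    """Summarize paired posterior draws of two items' scores.

    Returns the mean difference, a credible interval for the difference,
    and the share of draws in which A scores higher than B.
    """
    diffs = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(diffs)
    lo_idx = round((1 - level) / 2 * (n - 1))  # nearest-rank percentile
    hi_idx = round((1 + level) / 2 * (n - 1))
    return {
        "mean_diff": statistics.fmean(diffs),
        "interval": (diffs[lo_idx], diffs[hi_idx]),
        "prob_a_gt_b": sum(d > 0 for d in diffs) / n,
    }


# Simulated draws stand in for real HB output (illustration only).
rng = random.Random(42)
feature_a = [rng.gauss(28.5, 2.0) for _ in range(2000)]
feature_b = [rng.gauss(21.0, 2.5) for _ in range(2000)]
summary = compare_items(feature_a, feature_b)
print(summary)
```

If the interval for the difference excludes zero and prob_a_gt_b is close to 1, the gap is safe to present as real; if the interval straddles zero, report the two features as tied.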

8. Implementing MaxDiff in the Survey Tool Kicue

⚠️ Important caveat: Kicue does not have a dedicated MaxDiff question type. Compared to general research-specialist tools (Sawtooth Software / SurveyEngine / Conjoint.ly), the automation of design and analysis is limited.

Two Options for Implementing MaxDiff in Kicue

Option A: Substitute implementation in Kicue

You can reproduce MaxDiff behavior with iterated blocks of single-answer questions:

"Most important among the following 4 items" as a single-answer question, repeated 12 times
"Least important among the following 4 items" as a single-answer question, repeated 12 times
Vary the item set per block (generate the BIBD upfront in Excel / R and paste into each question's options)
After collecting responses, export to CSV → HB estimation with R's bayesm / ChoiceModelR packages

This approach is practical enough for "initial projects that can't justify the cost of specialized tools" and "cases where you want quick validation with 10–15 items".
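Because the substitute setup asks "best" and "worst" as two separate questions, nothing stops a respondent from picking the same item for both, so it pays to check the export before analysis. The sketch below reshapes a wide CSV into one record per respondent and block. The column names (respondent_id, best_1, worst_1, ...) are hypothetical, since the actual export layout depends on how you name the questions, and reshape_export is an illustrative helper, not a Kicue feature.

```python
import csv
import io


def reshape_export(csv_text, design):
    """Turn a wide CSV export into one record per respondent and block.

    design: {block_number: tuple of items shown in that block}.
    Assumes hypothetical columns respondent_id, best_1, worst_1, best_2, ...
    Returns (usable records, rejected records).
    """
    records, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for block, items in design.items():
            best, worst = row[f"best_{block}"], row[f"worst_{block}"]
            record = (row["respondent_id"], block, items, best, worst)
            valid = best in items and worst in items and best != worst
            (records if valid else rejected).append(record)
    return records, rejected


# Toy design and export for illustration only.
design = {
    1: ("price", "support", "features", "ease"),
    2: ("price", "features", "ease", "security"),
}
export = """respondent_id,best_1,worst_1,best_2,worst_2
r001,price,support,price,features
r002,ease,ease,security,price
"""
records, rejected = reshape_export(export, design)
print(len(records), len(rejected))  # 3 1
```

The usable records carry the shown items alongside the best and worst picks, which is the information both counting analysis and HB estimation need.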

Option B: Use it alongside specialized tools

For full-scale MaxDiff projects:

Sawtooth Software Discover / Lighthouse Studio: industry standard, end-to-end from design to HB analysis
SurveyEngine / Conjoint.ly: SaaS-based, easy to adopt
Run the main study in these tools, and use Kicue for screening questions and additional profiling questions

What Kicue Cannot Cover

Automatic BIBD generation → pre-generate in external tools (R support.BWS / Sawtooth) and paste into Kicue
Hierarchical Bayes estimation → CSV export → R bayesm / Sawtooth HB module
Dashboard display of individual-level scores → external BI tools (Tableau / Looker)
Automatic randomization of block display → partial support via Kicue's option randomization, but equalizing item-pair co-occurrence requires manual control

For related reading: the Van Westendorp PSM Design Guide and Conjoint Analysis in Practice help with choosing among the three sibling methods, the Likert Scale Design Guide covers rating scales, and Screening Question Design covers the screening that precedes MaxDiff.

References (5)

Louviere, J. J., & Woodworth, G. G. (1990). Best-worst scaling: A model for largest difference judgments. Working paper, Faculty of Business, University of Alberta.
Marley, A. A. J., & Louviere, J. J. (2005). Some probabilistic models for best, worst, and best-worst choices. Journal of Mathematical Psychology, 49(6), 464-480.
Orme, B. K. (2010). Getting Started with Conjoint Analysis: Strategies for Product Design and Pricing Research (2nd ed.). Research Publishers.
Cohen, S. H. (2003). Maximum difference scaling: Improved measures of importance and preference for segmentation. Sawtooth Software Research Paper.
Flynn, T. N., Louviere, J. J., Peters, T. J., & Coast, J. (2007). Best-worst scaling: What it can do for health care research and how to do it. Journal of Health Economics, 26(1), 171-189.

If you want to measure feature priorities or wish-list rankings with high precision, try the free survey tool Kicue. Substitute MaxDiff implementation with iterated single-answer blocks, control display order with the option randomization feature, and integrate with R / Sawtooth via CSV export — you can start the initial verification phase of MaxDiff in a single account (BIBD generation, hierarchical Bayesian estimation, and individual-level analysis require specialized tools like Sawtooth Software / SurveyEngine / R bayesm).