The "very satisfied to very dissatisfied" rating familiar from every customer survey is, in industry terms, a Likert scale — a measurement technique introduced in 1932. CSAT, NPS, CES, brand evaluation, engagement surveys — almost every rating item in modern web surveys is a derivative of the Likert scale, and yet basic design questions like "5 points or 7?" and "include a midpoint?" are still settled by gut feel in most projects.
This article walks through the essence of the Likert scale, the academic basis for choosing point counts, the midpoint question, label-design pitfalls, and the long-running statistical debate about how to analyze the data. Surveys run on "5 points because that's what we always do" sometimes produce conclusions that flip the moment the scale changes; scale design is more foundational than it looks.
1. What a Likert scale is
A Likert scale is a measurement device that asks respondents to express their attitude or evaluation by choosing among multiple ordered response categories. Rensis Likert proposed the technique in his 1932 doctoral dissertation A Technique for the Measurement of Attitudes.
Typical format
Q. How satisfied are you with our service overall?
1. Very dissatisfied
2. Somewhat dissatisfied
3. Neutral
4. Somewhat satisfied
5. Very satisfied
The basic structure: two opposing poles with graded steps in between. Common derivatives in web surveys include:
- NPS (0–10, 11 points) — Reichheld (2003), recommendation likelihood
- 5-point CSAT — standard for support evaluation
- 7-point semantic differential (SD) — bipolar adjective pairs ("bright vs. dark")
- Sliders — continuous 0–100 scale
Four design decisions
The design problem reduces to four choices:
- Number of points — 5 / 7 / 9 / 11
- Midpoint — include "neutral" or not
- Labels — full text on every point or just the endpoints
- Direction — "negative → positive" or "positive → negative"
Each has its own academic literature.
2. Why "how many points" gets argued so much
The point-count debate stems from a tradeoff among measurement reliability, validity, and respondent burden.
Benefits of more points
- Higher discrimination — separates "somewhat satisfied" from "very satisfied"
- More statistical information — finer granularity for means and standard deviations
- Mitigates ceiling and floor effects — a 5-point scale that pushes everyone to "very satisfied" can be spread out on 7 points
Costs of more points
- Higher cognitive load — telling "somewhat satisfied" from "fairly satisfied" is harder
- Longer response time — taller matrices in vertical layouts
- Blurry middle steps — at 9+ points, mid-range categories become indistinguishable to respondents
- Lower test-retest reliability — same person, same question, more variation across responses with more points
Cox (1980), "The Optimal Number of Response Alternatives for a Scale: A Review", concluded that the optimal range is 5 to 9 points, and that range has anchored industry practice ever since.
3. 5 vs 7 vs 9 points — what the research actually says
Major findings
| Study | Recommended | Rationale |
|---|---|---|
| Likert (1932) original | 5 points | Sufficient discrimination at minimum burden |
| Cox (1980) | 5–9 points | Beyond 9, added cognitive load outweighs discrimination gains |
| Krosnick & Fabrigar (1997) | 7 points | Best joint reliability + validity |
| Preston & Colman (2000) | 7–10 points | Reliability stabilizes at 7+ |
| Lozano, García-Cueto & Muñiz (2008) | 4–7 points | Validity drops below 4; plateaus at 7 |
| Norman (2010) | 5 or 7 points | Parametric analysis is fine at 5+ |
The academic safe zone is 5–7 points; at 9 or more points, cognitive cost outpaces discrimination gain.
Conventions by use case
In practice, conventions differ by application:
| Use case | Standard | Why |
|---|---|---|
| CSAT | 5 points | Intuitive (5 out of 5) |
| NPS | 11 points (0–10) | Reichheld's methodology, fixed |
| CES | 5 or 7 points | Dixon et al.'s original used 5 |
| Brand evaluation | 7 points | Finer-grained differences needed |
| Engagement | 5 points | Gallup Q12 standard |
| Academic studies | 7 points | Cronbach's α stabilizes |
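The α in that last row is easy to sanity-check on your own data. A minimal stdlib sketch with synthetic responses (the function name and the data are illustrative, not taken from any of the cited studies):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a set of questionnaire items.

    items: list of equal-length lists, one list of scores per item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Three 7-point items answered by five respondents (synthetic)
items = [
    [7, 6, 5, 3, 2],
    [6, 6, 5, 4, 2],
    [7, 5, 6, 3, 1],
]
print(round(cronbach_alpha(items), 3))  # → 0.967
```

Values of α above roughly 0.7–0.8 are the conventional "acceptable" zone for multi-item scales; the point here is only that the statistic is a few lines of arithmetic, not a black box.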
When 9 or 11 points make sense
- NPS at 11 points — Reichheld argued that 0–10 specifically captures "strength of recommendation." Academically, the 11-point convention is more "industry standard practice" than "demonstrably optimal."
- 9 points — used in academic surveys and large panels for maximum discrimination. Generally not recommended for typical web surveys.
"5 or 7 if in doubt" is the consensus from both research and practitioner literature.
4. Should you include the midpoint in a Likert scale?
Whether to include a "neutral" / "neither agree nor disagree" midpoint is as important as the point-count question.
With a midpoint (odd-numbered scales)
- Pro: genuinely neutral respondents aren't forced into agree/disagree. Reduces burden.
- Con: gives "I'd rather not answer" respondents a place to hide; potential for satisficing.
Without a midpoint (even-numbered scales)
- Pro: forces respondents to express an opinion, eliminating "vaguely neutral" choices.
- Con: forces genuinely neutral people into one side or the other, distorting the data.
What the research recommends
Krosnick & Fabrigar (1997) conclude that midpoints should generally be included. Reasons:
- Truly neutral respondents exist — no knowledge / no interest / no experience.
- Forced choice increases measurement error — "vaguely positive" choices add noise.
- The evidence that midpoints inflate satisficing is weak — when point count is appropriate, the effect is small.
That said, if neutral responses dominate, the question is the problem, not the midpoint. Fix the wording, don't remove the midpoint.
5. Label-design pitfalls
How you label the categories affects data quality directly.
Fully labeled vs endpoint-labeled
Fully labeled:
1. Very dissatisfied / 2. Somewhat dissatisfied / 3. Neutral / 4. Somewhat satisfied / 5. Very satisfied
Endpoint-labeled:
1 (very dissatisfied) — 2 — 3 — 4 — 5 (very satisfied)
Krosnick & Berent (1993) showed that fully labeled scales have higher reliability and validity — respondents can't reliably interpret bare numbers, so attaching language to every category matters. Default to fully labeled.
The "equal interval" assumption
People routinely compute means assuming the steps are evenly spaced — but are they really?
Tourangeau, Rips & Rasinski (2000) The Psychology of Survey Response point out that the psychological distance from "very satisfied" to "somewhat satisfied" need not equal the distance from "somewhat satisfied" to "neutral." This is the gateway to the ordinal-vs-interval debate (next section).
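A toy illustration of how much the assumption can matter. The "psychological" spacing values below are invented for illustration (not taken from Tourangeau et al.): two groups that tie under equal spacing stop tying once the steps are allowed to be unequal.

```python
# Two groups with identical means under the equal-interval assumption
group_a = [4, 4, 4, 4]   # uniformly "somewhat satisfied"
group_b = [5, 5, 3, 3]   # polarized between "very satisfied" and "neutral"

equal = {3: 3.0, 4: 4.0, 5: 5.0}
# Hypothetical spacing: "somewhat satisfied" sits closer to "neutral"
unequal = {3: 3.0, 4: 3.6, 5: 5.0}

def mean(scores, spacing):
    return sum(spacing[s] for s in scores) / len(scores)

print(mean(group_a, equal), mean(group_b, equal))      # → 4.0 4.0, a tie
print(mean(group_a, unequal), mean(group_b, unequal))  # → 3.6 4.0, B now leads
```

The ranking of the two groups depends entirely on a spacing assumption the 1–5 coding silently makes, which is exactly the strict camp's objection in the next section.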
Direction conventions
Whether "negative → positive" or "positive → negative" reads left-to-right is a convention that varies by region and context. The non-negotiable rules: direction must be consistent within a survey, and must never change in a tracking study.
6. Ordinal or interval — the 50-year statistical debate
A debate that's been running in academia for half a century: can you compute means and standard deviations from Likert data (the 1–5 numbers)?
Strict view: "It's ordinal — means are inappropriate"
A Likert scale is fundamentally ordinal — the difference between "very satisfied" and "somewhat satisfied" is one numeric step, but not necessarily one psychological step. Therefore:
- Means are inappropriate — use median or mode.
- Use non-parametric tests (Mann-Whitney U, etc.).
- Regression and t-tests are inappropriate.
Pragmatic view: "Treat it as interval in practice"
Norman (2010) "Likert Scales, Levels of Measurement and the 'Laws' of Statistics" concludes that treating Likert scales as interval and applying parametric tests (t-tests, regression) causes essentially no problem in practice. Reasons:
- Simulation studies show robustness — even when intervals aren't equal, results are largely correct.
- Central limit theorem applies for 5+ points and large samples — distributions approximate normal.
- The vast majority of published research uses parametric tests — the strict view hasn't kept pace with practice.
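The simulation point is straightforward to reproduce. Below is a small stdlib sketch in the spirit of Norman's argument, not a replication of it: two samples are repeatedly drawn from the same skewed 5-point distribution (so the null hypothesis is true), a Welch t-test with a normal-approximated p-value is run, and the false-positive rate stays near the nominal 5% even though the data are ordinal. The distribution weights and sample sizes are arbitrary choices.

```python
import math
import random

random.seed(0)

# Skewed 5-point "satisfaction" distribution (synthetic weights)
CATS, WEIGHTS = [1, 2, 3, 4, 5], [5, 10, 20, 40, 25]

def draw(n):
    """Draw n responses; both groups share this distribution (null is true)."""
    return random.choices(CATS, weights=WEIGHTS, k=n)

def welch_p(a, b):
    """Two-sided Welch t-test p-value via a normal approximation (fine at n=100)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    t = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

reps = 2000
false_positives = sum(welch_p(draw(100), draw(100)) < 0.05 for _ in range(reps))
print(false_positives / reps)  # stays near the nominal 0.05 despite ordinal data
```

If unequal intervals badly broke the t-test here, the observed rate would drift well away from 5%; it doesn't, which is the robustness the simulation literature reports.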
Where practice lands
The synthesis from research and practitioner literature:
- 5+ point Likert with N ≥ 100 → means, SDs, and regression are fine for practical purposes.
- For papers and formal reports, explicitly state "Likert data treated as interval."
- Where ceiling or floor effects are present, validate with non-parametric tests.
CSAT averages and NPS subtraction are routine because the pragmatic view is the working standard in industry.
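Both of those routine computations are a few lines each. A minimal sketch with synthetic scores; the 9–10 promoter and 0–6 detractor cutoffs are Reichheld's standard ones:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / n

def csat_mean(scores):
    """CSAT as a plain mean of 1-5 ratings (the pragmatic, interval view)."""
    return sum(scores) / len(scores)

nps_scores = [10, 9, 9, 8, 7, 7, 6, 5, 3, 10]   # synthetic 0-10 responses
csat_scores = [5, 4, 4, 3, 5, 2, 4]             # synthetic 1-5 responses
print(nps(nps_scores))                # → 10.0 (40% promoters - 30% detractors)
print(round(csat_mean(csat_scores), 2))  # → 3.86
```

Note that the CSAT mean is only defensible under the pragmatic view above; a strict-view shop would report the median or the top-2-box percentage instead.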
7. Editorial view — five rules that move the needle
Drawing on industry reports and public cases, here are five things we'd push hard on.
1. "5 points if in doubt." Choose 7 only with a reason. Teams flip-flop on 5 vs 7, and the practical heuristic is "5 unless you have a specific reason." When you do choose 7, document the reason ("we need finer discrimination across brand-image items"). Picking 7 because it "feels more precise" is the pattern industry articles return to: teams later regret it because results were less intuitive at 7 than 5 would have been.
2. Default to including the midpoint. If "neutral" is too high, fix the question. Removing the midpoint to force a position is a workaround we see periodically — and it's usually a category error. Excessive neutrality signals an abstract or low-engagement question. Sharpen the wording, don't drop the midpoint. That's also what the Krosnick & Fabrigar research supports.
3. Default to full labeling. Endpoint-only labeling is "saved-effort" design. When you see "1 — 2 — 3 — 4 — 5 (dissatisfied — satisfied)" without labels in between, it's typically a sign someone economized on design effort. Research repeatedly shows fully labeled scales have higher reliability — the one minute it takes to add language to every category buys real downstream quality. NPS is the conventional exception (0–10 numeric); everything else: full labels.
4. In tracking studies, freeze point count, midpoint, and labels — period. We see teams "just bumping it from 5 to 7 this round" or "tweaking the wording" and then trying to compare against the previous wave. Once changed, the historical and current scores no longer share a scale, and longitudinal comparison is broken forever. Either recollect the historical wave on the new scale, or don't change it.
5. The Likert isn't magic — wording is 80%, scale design 20%. Point count and midpoint matter, but the question wording moves results far more. Whether "How satisfied are you with our service?" is on a 5- or 7-point scale, the data is meaningless if the question itself is too abstract. Polish the wording first, then think about the scale.
8. Likert scales in the Survey Tool Kicue
Kicue ships scale-related capabilities as standard.
SCALE question types
SCALE question types come in four flavors:
- LIKERT — standard Likert scale (5 / 7 points and others, fully configurable)
- NPS — optimized for the 11-point (0–10) format
- SLIDER — continuous-value slider
- SD — semantic differential (bipolar adjective pairs)
Combining with matrix questions
To rate multiple items on a shared Likert scale, combine matrix question types with SCALE. For matrix-specific pitfalls, see matrix question design.
Related design articles
Likert scales tie tightly to other survey-design topics. See also our CSAT survey design guide, NPS complete guide, CES design guide, matrix question design, and question order effects.
Choosing the right tool — Free plan limits, branching support, AI capabilities, and CSV export vary widely across tools. See our free survey tool comparison to find the right fit for this approach.
Summary
Checklist for designing and operating Likert scales:
- 5 or 7 points is the academic optimum. 9+ points cost more in load than they gain in discrimination.
- Default to including the midpoint. Forced choice raises measurement error.
- Fully label every category. Endpoint-only labels reduce reliability.
- Treat data as effectively interval in practice. Norman (2010) is the working standard.
- In tracking studies, freeze the scale design. Changing it breaks longitudinal comparison.
- Wording first, scale second. 80/20.
Teams that treat the Likert as "5 points, whatever" produce different reliability than teams that deliberately decide point count, midpoint, and labels. It's the foundational measurement device behind CSAT/NPS/CES — worth designing on purpose.
References
Academic and methodological
- Likert, R. (1932). A Technique for the Measurement of Attitudes. Archives of Psychology.
- Cox, E. P. (1980). The Optimal Number of Response Alternatives for a Scale: A Review. Journal of Marketing Research.
- Krosnick, J. A., & Fabrigar, L. R. (1997). Designing Rating Scales for Effective Measurement in Surveys. Survey Measurement and Process Quality.
- Krosnick, J. A., & Berent, M. K. (1993). Comparisons of Party Identification and Policy Preferences. American Journal of Political Science.
- Preston, C. C., & Colman, A. M. (2000). Optimal Number of Response Categories in Rating Scales. Acta Psychologica.
- Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the Number of Response Categories on the Reliability and Validity of Rating Scales. Methodology.
- Norman, G. (2010). Likert Scales, Levels of Measurement and the 'Laws' of Statistics. Advances in Health Sciences Education.
- Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The Psychology of Survey Response. Cambridge University Press.
Want to design surveys with deliberate Likert-scale choices end-to-end? Try the free survey tool Kicue. LIKERT, NPS, SLIDER, and SD question types ship as standard, with full control over point count, midpoint, and label design.
