The "very satisfied to very dissatisfied" rating familiar from every customer survey is, in industry terms, a Likert scale — a measurement technique introduced in 1932. CSAT, NPS, CES, brand evaluation, engagement surveys — almost every rating item in modern web surveys is a derivative of the Likert scale, and yet basic design questions like "5 points or 7?" and "include a midpoint?" are still settled by gut feel in most projects.
This article walks through the essence of the Likert scale, the academic basis for choosing point counts, the midpoint question, label-design pitfalls, and the long-running statistical debate about how to analyze the data. Surveys run on "5 points because that's what we always do" sometimes produce conclusions that flip the moment the scale changes; scale design is more foundational than it looks.
1. What a Likert scale is
A Likert scale is a measurement device that asks respondents to express their attitude or evaluation by choosing among multiple ordered response categories. Rensis Likert proposed the technique in his 1932 doctoral dissertation A Technique for the Measurement of Attitudes.
Typical format
Q. How satisfied are you with our service overall?
1. Very dissatisfied
2. Somewhat dissatisfied
3. Neutral
4. Somewhat satisfied
5. Very satisfied
The basic structure: two opposing poles with graded steps in between. Common derivatives in web surveys include:
- NPS (0–10, 11 points) — Reichheld (2003), recommendation likelihood
- 5-point CSAT — standard for support evaluation
- 7-point semantic differential (SD) — bipolar adjective pairs ("bright vs. dark")
- Sliders — continuous 0–100 scale
Four design decisions
The design problem reduces to four choices:
- Number of points — 5 / 7 / 9 / 11
- Midpoint — include "neutral" or not
- Labels — full text on every point or just the endpoints
- Direction — "negative → positive" or "positive → negative"
Each has its own academic literature.
2. Why "how many points" gets argued so much
The point-count debate stems from a tradeoff among measurement reliability, validity, and respondent burden.
Benefits of more points
- Higher discrimination — separates "somewhat satisfied" from "very satisfied"
- More statistical information — finer granularity for means and standard deviations
- Mitigates ceiling and floor effects — a 5-point scale that pushes everyone to "very satisfied" can be spread out on 7 points
Costs of more points
- Higher cognitive load — telling "somewhat satisfied" from "fairly satisfied" is harder
- Longer response time — taller matrices in vertical layouts
- Blurry middle steps — at 9+ points, mid-range categories become indistinguishable to respondents
- Lower test-retest reliability — same person, same question, more variation across responses with more points
Cox (1980), "The Optimal Number of Response Alternatives for a Scale: A Review", concluded that the optimal range is 5 to 9 points, and that range has anchored industry practice ever since.
3. 5 vs 7 vs 9 points — what the research actually says
Major findings
| Study | Recommended | Rationale |
|---|---|---|
| Likert (1932) original | 5 points | Sufficient discrimination at minimum burden |
| Cox (1980) | 5–9 points | Beyond 9, added cognitive load outweighs discrimination gains |
| Krosnick & Fabrigar (1997) | 7 points | Best joint reliability + validity |
| Preston & Colman (2000) | 7–10 points | Reliability stabilizes at 7+ |
| Lozano, García-Cueto & Muñiz (2008) | 4–7 points | Validity drops below 4; plateaus at 7 |
| Norman (2010) | 5 or 7 points | Parametric analysis is fine at 5+ |
The academic safe zone is 5–7 points; at 9 or more points, cognitive cost outpaces discrimination gain.
Conventions by use case
In practice, conventions differ by application:
| Use case | Standard | Why |
|---|---|---|
| CSAT | 5 points | Intuitive (5 out of 5) |
| NPS | 11 points (0–10) | Reichheld's methodology, fixed |
| CES | 5 or 7 points | Dixon et al.'s original used 5 |
| Brand evaluation | 7 points | Finer-grained differences needed |
| Engagement | 5 points | Gallup Q12 standard |
| Academic studies | 7 points | Cronbach's α stabilizes |
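The α in that last row is easy to sanity-check on your own data. A minimal stdlib sketch with synthetic responses (the function name and the data are illustrative, not taken from any of the cited studies):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a set of questionnaire items.

    items: list of equal-length lists, one list of scores per item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Three 7-point items answered by five respondents (synthetic)
items = [
    [7, 6, 5, 3, 2],
    [6, 6, 5, 4, 2],
    [7, 5, 6, 3, 1],
]
print(round(cronbach_alpha(items), 3))  # → 0.967
```

Values of α above roughly 0.7–0.8 are the conventional "acceptable" zone for multi-item scales; the point here is only that the statistic is a few lines of arithmetic, not a black box.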
When 9 or 11 points make sense
- NPS at 11 points — Reichheld argued that 0–10 specifically captures "strength of recommendation." Academically, the 11-point convention is more "industry standard practice" than "demonstrably optimal."
- 9 points — used in academic surveys and large panels for maximum discrimination. Generally not recommended for typical web surveys.
"5 or 7 if in doubt" is the consensus from both research and practitioner literature.
4. Should you include the midpoint in a Likert scale?
Whether to include a "neutral" / "neither agree nor disagree" midpoint is as important as the point-count question.
With a midpoint (odd-numbered scales)
- Pro: genuinely neutral respondents aren't forced into agree/disagree. Reduces burden.
- Con: gives "I'd rather not answer" respondents a place to hide; potential for satisficing.
Without a midpoint (even-numbered scales)
- Pro: forces respondents to express an opinion, eliminating "vaguely neutral" choices.
- Con: forces genuinely neutral people into one side or the other, distorting the data.
What the research recommends
Krosnick & Fabrigar (1997) conclude that midpoints should generally be included. Reasons:
- Truly neutral respondents exist — no knowledge / no interest / no experience.
- Forced choice increases measurement error — "vaguely positive" choices add noise.
- The evidence that midpoints inflate satisficing is weak — when point count is appropriate, the effect is small.
That said, if neutral responses dominate, the question is the problem, not the midpoint. Fix the wording, don't remove the midpoint.
5. Label-design pitfalls
How you label the categories affects data quality directly.
Fully labeled vs endpoint-labeled
Fully labeled:
1. Very dissatisfied / 2. Somewhat dissatisfied / 3. Neutral / 4. Somewhat satisfied / 5. Very satisfied
Endpoint-labeled:
1 (very dissatisfied) — 2 — 3 — 4 — 5 (very satisfied)
Krosnick & Berent (1993) showed that fully labeled scales have higher reliability and validity — respondents can't reliably interpret bare numbers, so attaching language to every category matters. Default to fully labeled.
The "equal interval" assumption
People routinely compute means assuming the steps are evenly spaced — but are they really?
Tourangeau, Rips & Rasinski (2000) The Psychology of Survey Response point out that the psychological distance from "very satisfied" to "somewhat satisfied" need not equal the distance from "somewhat satisfied" to "neutral." This is the gateway to the ordinal-vs-interval debate (next section).
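A toy illustration of how much the assumption can matter. The "psychological" spacing values below are invented for illustration (not taken from Tourangeau et al.): two groups that tie under equal spacing stop tying once the steps are allowed to be unequal.

```python
# Two groups with identical means under the equal-interval assumption
group_a = [4, 4, 4, 4]   # uniformly "somewhat satisfied"
group_b = [5, 5, 3, 3]   # polarized between "very satisfied" and "neutral"

equal = {3: 3.0, 4: 4.0, 5: 5.0}
# Hypothetical spacing: "somewhat satisfied" sits closer to "neutral"
unequal = {3: 3.0, 4: 3.6, 5: 5.0}

def mean(scores, spacing):
    return sum(spacing[s] for s in scores) / len(scores)

print(mean(group_a, equal), mean(group_b, equal))      # → 4.0 4.0, a tie
print(mean(group_a, unequal), mean(group_b, unequal))  # → 3.6 4.0, B now leads
```

The ranking of the two groups depends entirely on a spacing assumption the 1–5 coding silently makes, which is exactly the strict camp's objection in the next section.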
Direction conventions
Whether "negative → positive" or "positive → negative" reads left-to-right is a convention that varies by region and context. The non-negotiable rules: direction must be consistent within a survey, and must never change in a tracking study.
6. Ordinal or interval — the 50-year statistical debate
A debate that's been running in academia for half a century: can you compute means and standard deviations from Likert data (the 1–5 numbers)?
Strict view: "It's ordinal — means are inappropriate"
A Likert scale is fundamentally ordinal — the difference between "very satisfied" and "somewhat satisfied" is one numeric step, but not necessarily one psychological step. Therefore:
- Means are inappropriate — use median or mode.
- Use non-parametric tests (Mann-Whitney U, etc.).
- Regression and t-tests are inappropriate.
Pragmatic view: "Treat it as interval in practice"
Norman (2010) "Likert Scales, Levels of Measurement and the 'Laws' of Statistics" concludes that treating Likert scales as interval and applying parametric tests (t-tests, regression) causes essentially no problem in practice. Reasons:
- Simulation studies show robustness — even when intervals aren't equal, results are largely correct.
- Central limit theorem applies for 5+ points and large samples — distributions approximate normal.
- The vast majority of published research uses parametric tests — the strict view hasn't kept pace with practice.
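The simulation point is straightforward to reproduce. Below is a small stdlib sketch in the spirit of Norman's argument, not a replication of it: two samples are repeatedly drawn from the same skewed 5-point distribution (so the null hypothesis is true), a Welch t-test with a normal-approximated p-value is run, and the false-positive rate stays near the nominal 5% even though the data are ordinal. The distribution weights and sample sizes are arbitrary choices.

```python
import math
import random

random.seed(0)

# Skewed 5-point "satisfaction" distribution (synthetic weights)
CATS, WEIGHTS = [1, 2, 3, 4, 5], [5, 10, 20, 40, 25]

def draw(n):
    """Draw n responses; both groups share this distribution (null is true)."""
    return random.choices(CATS, weights=WEIGHTS, k=n)

def welch_p(a, b):
    """Two-sided Welch t-test p-value via a normal approximation (fine at n=100)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    t = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

reps = 2000
false_positives = sum(welch_p(draw(100), draw(100)) < 0.05 for _ in range(reps))
print(false_positives / reps)  # stays near the nominal 0.05 despite ordinal data
```

If unequal intervals badly broke the t-test here, the observed rate would drift well away from 5%; it doesn't, which is the robustness the simulation literature reports.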
Where practice lands
The synthesis from research and practitioner literature:
- 5+ point Likert with N ≥ 100 → means, SDs, and regression are fine for practical purposes.
- For papers and formal reports, explicitly state "Likert data treated as interval."
- Where ceiling or floor effects are present, validate with non-parametric tests.
CSAT averages and NPS subtraction are routine because the pragmatic view is the working standard in industry.
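Both of those routine computations are a few lines each. A minimal sketch with synthetic scores; the 9–10 promoter and 0–6 detractor cutoffs are Reichheld's standard ones:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / n

def csat_mean(scores):
    """CSAT as a plain mean of 1-5 ratings (the pragmatic, interval view)."""
    return sum(scores) / len(scores)

nps_scores = [10, 9, 9, 8, 7, 7, 6, 5, 3, 10]   # synthetic 0-10 responses
csat_scores = [5, 4, 4, 3, 5, 2, 4]             # synthetic 1-5 responses
print(nps(nps_scores))                # → 10.0 (40% promoters - 30% detractors)
print(round(csat_mean(csat_scores), 2))  # → 3.86
```

Note that the CSAT mean is only defensible under the pragmatic view above; a strict-view shop would report the median or the top-2-box percentage instead.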
7. Editorial view — five rules that move the needle
Drawing on industry reports and public cases, here are five things we'd push hard on.
1. "5 points if in doubt." Choose 7 only with a reason. Teams flip-flop on 5 vs 7, and the practical heuristic is "5 unless you have a specific reason." When you do choose 7, document the reason ("we need finer discrimination across brand-image items"). Picking 7 because it "feels more precise" is the pattern industry articles return to: teams later regret it because results were less intuitive at 7 than 5 would have been.
2. Default to including the midpoint. If "neutral" is too high, fix the question. Removing the midpoint to force a position is a workaround we see periodically — and it's usually a category error. Excessive neutrality signals an abstract or low-engagement question. Sharpen the wording, don't drop the midpoint. That's also what the Krosnick & Fabrigar research supports.
3. Default to full labeling. Endpoint-only labeling is "saved-effort" design. When you see "1 — 2 — 3 — 4 — 5 (dissatisfied — satisfied)" without labels in between, it's typically a sign someone economized on design effort. Research repeatedly shows fully labeled scales have higher reliability — the one minute it takes to add language to every category buys real downstream quality. NPS is the conventional exception (0–10 numeric); everything else: full labels.
4. In tracking studies, freeze point count, midpoint, and labels — period. We see teams "just bumping it from 5 to 7 this round" or "tweaking the wording" and then trying to compare against the previous wave. Once changed, the historical and current scores no longer share a scale, and longitudinal comparison is broken forever. Either recollect the historical wave on the new scale, or don't change it.
5. The Likert isn't magic — wording is 80%, scale design 20%. Point count and midpoint matter, but the question wording moves results far more. Whether "How satisfied are you with our service?" is on a 5- or 7-point scale, the data is meaningless if the question itself is too abstract. Polish the wording first, then think about the scale.
8. Likert scales in the Survey Tool Kicue
Kicue ships scale-related capabilities as standard.
SCALE question types
SCALE question types come in four flavors:
- LIKERT — standard Likert scale (5 / 7 points and others, fully configurable)
- NPS — optimized for the 11-point (0–10) format
- SLIDER — continuous-value slider
- SD — semantic differential (bipolar adjective pairs)
Combining with matrix questions
To rate multiple items on a shared Likert scale, combine matrix question types with SCALE. For matrix-specific pitfalls, see matrix question design.
Related design articles
Likert scales tie tightly to other survey-design topics. See also our CSAT survey design guide, NPS complete guide, CES design guide, matrix question design, and question order effects.
Choosing the right tool — Free plan limits, branching support, AI capabilities, and CSV export vary widely across tools. See our free survey tool comparison to find the right fit for this approach.
Summary
Checklist for designing and operating Likert scales:
- 5 or 7 points is the academic optimum. 9+ points cost more in load than they gain in discrimination.
- Default to including the midpoint. Forced choice raises measurement error.
- Fully label every category. Endpoint-only labels reduce reliability.
- Treat data as effectively interval in practice. Norman (2010) is the working standard.
- In tracking studies, freeze the scale design. Changing it breaks longitudinal comparison.
- Wording first, scale second. 80/20.
Teams that treat the Likert as "5 points, whatever" produce different reliability than teams that deliberately decide point count, midpoint, and labels. It's the foundational measurement device behind CSAT/NPS/CES — worth designing on purpose.
References
Academic and methodological
- Likert, R. (1932). A Technique for the Measurement of Attitudes. Archives of Psychology.
- Cox, E. P. (1980). The Optimal Number of Response Alternatives for a Scale: A Review. Journal of Marketing Research.
- Krosnick, J. A., & Fabrigar, L. R. (1997). Designing Rating Scales for Effective Measurement in Surveys. Survey Measurement and Process Quality.
- Krosnick, J. A., & Berent, M. K. (1993). Comparisons of Party Identification and Policy Preferences. American Journal of Political Science.
- Preston, C. C., & Colman, A. M. (2000). Optimal Number of Response Categories in Rating Scales. Acta Psychologica.
- Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the Number of Response Categories on the Reliability and Validity of Rating Scales. Methodology.
- Norman, G. (2010). Likert Scales, Levels of Measurement and the 'Laws' of Statistics. Advances in Health Sciences Education.
- Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The Psychology of Survey Response. Cambridge University Press.
Want to design surveys with deliberate Likert-scale choices end-to-end? Try the free survey tool Kicue. LIKERT, NPS, SLIDER, and SD question types ship as standard, with full control over point count, midpoint, and label design.
