Likert Scale Design Guide — 5-Point vs 7-Point vs 9-Point and the Midpoint Question

A research-grounded guide to designing Likert scales. Covers how to choose the number of points, whether to include a neutral midpoint, label design, and the long-running statistical debate — the foundational measurement device behind CSAT, NPS, and CES.

The "very satisfied to very dissatisfied" rating familiar from every customer survey is, in industry terms, a Likert scale — a measurement technique introduced in 1932. CSAT, NPS, CES, brand evaluation, engagement surveys — almost every rating item in modern web surveys is a derivative of the Likert scale, and yet basic design questions like "5 points or 7?" and "include a midpoint?" are still settled by gut feel in most projects.

This article walks through the essence of the Likert scale, the academic basis for choosing point counts, the midpoint question, label-design pitfalls, and the long-running statistical debate about how to analyze the data. Surveys running on "5 points because that's what we always do" sometimes produce conclusions that flip the moment you change the scale — this is closer to the foundation than people realize.

1. What a Likert scale is

A Likert scale is a measurement device that asks respondents to express their attitude or evaluation by choosing among multiple ordered response categories. Rensis Likert proposed the technique in his 1932 doctoral dissertation A Technique for the Measurement of Attitudes.

Typical format

Q. How satisfied are you with our service overall?
   1. Very dissatisfied
   2. Somewhat dissatisfied
   3. Neutral
   4. Somewhat satisfied
   5. Very satisfied

The basic structure: two opposing poles with graded steps in between. Common derivatives in web surveys include:

  • NPS (0–10, 11 points) — Reichheld (2003), recommendation likelihood
  • 5-point CSAT — standard for support evaluation
  • 7-point semantic differential (SD) — bipolar adjective pairs ("bright vs. dark")
  • Sliders — continuous 0–100 scale

Four design decisions

The design problem reduces to four choices:

  1. Number of points — 5 / 7 / 9 / 11
  2. Midpoint — include "neutral" or not
  3. Labels — full text on every point or just the endpoints
  4. Direction — "negative → positive" or "positive → negative"

Each has its own academic literature.
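The four choices interact — in particular, a true "neutral" midpoint requires an odd point count. As a minimal sketch, they could be captured in a single config object; the class and field names here are hypothetical, not any real survey tool's API:

```python
from dataclasses import dataclass

# Hypothetical config capturing the four design decisions.
# Field names are illustrative only.
@dataclass(frozen=True)
class LikertScale:
    points: int = 5            # 5 / 7 / 9 / 11
    midpoint: bool = True      # include a "neutral" category?
    full_labels: bool = True   # label every point, not just the endpoints
    ascending: bool = True     # read "negative -> positive" left to right

    def __post_init__(self):
        # A true midpoint category requires an odd number of points;
        # even-numbered (forced-choice) scales have no center category.
        if self.midpoint and self.points % 2 == 0:
            raise ValueError("a midpoint requires an odd number of points")

# A 7-point scale with midpoint and full labels is a valid combination;
# a 6-point scale with a midpoint is rejected at construction time.
scale = LikertScale(points=7)
```

Encoding the constraint at construction time means an impossible combination (even points plus midpoint) fails loudly instead of silently producing a broken questionnaire.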

2. Why "how many points" gets argued so much

The point-count debate stems from a tradeoff among measurement reliability, validity, and respondent burden.

Benefits of more points

  • Higher discrimination — separates "somewhat satisfied" from "very satisfied"
  • More statistical information — finer granularity for means and standard deviations
  • Mitigates ceiling and floor effects — a 5-point scale that pushes everyone to "very satisfied" can be spread out on 7 points

Costs of more points

  • Higher cognitive load — telling "somewhat satisfied" from "fairly satisfied" is harder
  • Longer response time — taller matrices in vertical layouts
  • Blurry middle steps — at 9+ points, mid-range categories become indistinguishable to respondents
  • Lower test-retest reliability — same person, same question, more variation across responses with more points

Cox (1980) "The Optimal Number of Response Alternatives for a Scale" concluded that the optimal range is 5 to 9 points — and that's been the industry consensus ever since.

3. 5 vs 7 vs 9 points — what the research actually says

Major findings

  • Likert (1932), original proposal — 5 points: sufficient discrimination at minimum respondent burden
  • Cox (1980) — 5–9 points: beyond 9, discrimination gains fall behind the added load
  • Krosnick & Fabrigar (1997) — 7 points: best joint reliability and validity
  • Preston & Colman (2000) — 7–10 points: reliability stabilizes at 7+
  • Lozano, García-Cueto & Muñiz (2008) — 4–7 points: validity drops below 4 and plateaus at 7
  • Norman (2010) — 5 or 7 points: parametric analysis is fine at 5+

The academic safe zone is 5–7 points; 9+ points sees cognitive cost outpace discrimination gains.

Conventions by use case

In practice, conventions differ by application:

  • CSAT — 5 points: intuitive ("5 out of 5")
  • NPS — 11 points (0–10): fixed by Reichheld's methodology
  • CES — 5 or 7 points: Dixon et al.'s original used 5
  • Brand evaluation — 7 points: finer-grained differences are needed
  • Engagement — 5 points: Gallup Q12 standard
  • Academic studies — 7 points: Cronbach's α stabilizes at 7

When 9 or 11 points make sense

  • NPS at 11 points — Reichheld argued that 0–10 specifically captures "strength of recommendation." Academically, the 11-point convention is more "industry standard practice" than "demonstrably optimal."
  • 9 points — used in academic surveys and large panels for maximum discrimination. Generally not recommended for typical web surveys.

"5 or 7 if in doubt" is the consensus from both research and practitioner literature.

4. Should you include the midpoint?

Whether to include a "neutral" / "neither agree nor disagree" midpoint is as important as the point-count question.

With a midpoint (odd-numbered scales)

  • Pro: genuinely neutral respondents aren't forced into agree/disagree. Reduces burden.
  • Con: gives "I'd rather not answer" respondents a place to hide; potential for satisficing.

Without a midpoint (even-numbered scales)

  • Pro: forces respondents to express an opinion, eliminating "vaguely neutral" choices.
  • Con: forces genuinely neutral people into one side or the other, distorting the data.

What the research recommends

Krosnick & Fabrigar (1997) conclude that midpoints should generally be included. Reasons:

  1. Truly neutral respondents exist — no knowledge / no interest / no experience.
  2. Forced choice increases measurement error — "vaguely positive" choices add noise.
  3. The evidence that midpoints inflate satisficing is weak — when point count is appropriate, the effect is small.

That said, if neutral responses dominate, the question is the problem, not the midpoint. Fix the wording, don't remove the midpoint.

5. Label-design pitfalls

How you label the categories affects data quality directly.

Fully labeled vs endpoint-labeled

Fully labeled:

1. Very dissatisfied / 2. Somewhat dissatisfied / 3. Neutral / 4. Somewhat satisfied / 5. Very satisfied

Endpoint-labeled:

1 (very dissatisfied) — 2 — 3 — 4 — 5 (very satisfied)

Krosnick & Berent (1993) showed that fully labeled scales have higher reliability and validity — respondents can't reliably interpret bare numbers, so attaching language to every category matters. Default to fully labeled.

The "equal interval" assumption

People routinely compute means assuming the steps are evenly spaced — but are they really?

Tourangeau, Rips & Rasinski (2000) The Psychology of Survey Response point out that the psychological distance from "very satisfied" to "somewhat satisfied" need not equal the distance from "somewhat satisfied" to "neutral." This is the gateway to the ordinal-vs-interval debate (next section).

Direction conventions

Whether "negative → positive" or "positive → negative" reads left-to-right is a convention that varies by region and context. The non-negotiable rules: direction must be consistent within a survey, and must never change in a tracking study.

6. Ordinal or interval — the 50-year statistical debate

A debate that's been running in academia for half a century: can you compute means and standard deviations from Likert data (the 1–5 numbers)?

Strict view: "It's ordinal — means are inappropriate"

A Likert scale is fundamentally ordinal — the difference between "very satisfied" and "somewhat satisfied" is one numeric step, but not necessarily one psychological step. Therefore:

  • Means are inappropriate — use median or mode.
  • Use non-parametric tests (Mann-Whitney U, etc.).
  • Regression and t-tests are inappropriate.
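As a minimal sketch of the strict-view toolkit — medians, modes, and a rank-based test. The response data is made up, and the hand-rolled Mann-Whitney U (rank-sum form with midranks for ties) is for exposition only; a real analysis would reach for scipy.stats.mannwhitneyu:

```python
from statistics import median, mode

# Illustrative 5-point Likert responses from two hypothetical groups.
group_a = [5, 4, 4, 3, 5, 4, 2, 5, 4, 3]
group_b = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]

# Under the strict (ordinal) view, report median/mode instead of the mean.
print(median(group_a), mode(group_a))  # 4.0 4
print(median(group_b), mode(group_b))  # 3.0 3

def mann_whitney_u(x, y):
    """Mann-Whitney U via the rank-sum definition, midranks for ties."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Values at 0-based indices i..j-1 share 1-based ranks i+1..j,
        # so each gets the midrank (average of those ranks).
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1  # U1 + U2 always equals n1 * n2

u1, u2 = mann_whitney_u(group_a, group_b)
```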

Pragmatic view: "Treat it as interval in practice"

Norman (2010) "Likert Scales, Levels of Measurement and the 'Laws' of Statistics" concludes that treating Likert scales as interval and applying parametric tests (t-tests, regression) causes essentially no problem in practice. Reasons:

  1. Simulation studies show robustness — even when the intervals aren't truly equal and distributions are skewed, Type I error rates and substantive conclusions stay essentially correct.
  2. The central limit theorem applies — with 5+ points and large samples, the sampling distribution of the mean approximates normal.
  3. The vast majority of published research uses parametric tests — the strict view hasn't kept pace with practice.
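The central-limit-theorem argument is easy to check by simulation: draw repeated samples of N=100 from a deliberately skewed 5-point distribution and see whether the sample means behave like a normal variable. A self-contained sketch with made-up response proportions:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Hypothetical, clearly skewed 5-point response distribution
# (5% / 10% / 15% / 40% / 30%) — illustrative numbers only.
population = [1] * 5 + [2] * 10 + [3] * 15 + [4] * 40 + [5] * 30

# Draw many samples of N=100 and collect the sample means.
sample_means = [mean(random.choices(population, k=100)) for _ in range(2000)]

mu = mean(population)               # population mean (3.8 here)
se = stdev(population) / 100 ** 0.5 # standard error of a mean of N=100

# If the sampling distribution of the mean is approximately normal,
# roughly 95% of sample means should fall inside mu +/- 1.96 * se.
inside = sum(mu - 1.96 * se <= m <= mu + 1.96 * se for m in sample_means)
print(f"coverage: {inside / len(sample_means):.1%}")  # close to 95%
```

Despite the strongly skewed underlying distribution, the coverage lands near the normal-theory 95% — which is the empirical core of Norman's argument.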

Where practice lands

The synthesis from research and practitioner literature:

  • 5+ point Likert with N ≥ 100 → means, SDs, and regression are fine for practical purposes.
  • For papers and formal reports, explicitly state "Likert data treated as interval."
  • Where ceiling or floor effects are present, validate with non-parametric tests.

CSAT averages and NPS subtraction are routine because the pragmatic view is the working standard in industry.
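For concreteness, those two routine computations look like this — CSAT as a plain mean of 1–5 scores, NPS as the percentage of promoters (9–10) minus the percentage of detractors (0–6). The scores are made up:

```python
# CSAT: mean of 1-5 satisfaction scores (the pragmatic, interval view).
csat_scores = [5, 4, 4, 3, 5, 2, 4, 5, 4, 4]
csat = sum(csat_scores) / len(csat_scores)

# NPS: promoters score 9-10, detractors 0-6, passives (7-8) are ignored.
nps_scores = [10, 9, 9, 8, 7, 7, 6, 5, 9, 10, 8, 3]
promoters = sum(s >= 9 for s in nps_scores)
detractors = sum(s <= 6 for s in nps_scores)
nps = 100 * (promoters - detractors) / len(nps_scores)

print(f"CSAT mean: {csat:.2f}, NPS: {nps:+.0f}")  # CSAT mean: 4.00, NPS: +17
```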

7. Editorial view — five rules that move the needle

From tracking industry reports and public cases, five things we'd push hard on.

1. "5 points if in doubt." Choose 7 only with a reason. Teams flip-flop on 5 vs 7, and the practical heuristic is "5 unless you have a specific reason." When you do choose 7, document the reason ("we need finer discrimination across brand-image items"). Picking 7 because it "feels more precise" is the pattern industry articles return to: teams later regret it because results were less intuitive at 7 than 5 would have been.

2. Default to including the midpoint. If "neutral" is too high, fix the question. Removing the midpoint to force a position is a workaround we see periodically — and it's usually a category error. Excessive neutrality signals an abstract or low-engagement question. Sharpen the wording, don't drop the midpoint. That's also what the Krosnick & Fabrigar research supports.

3. Default to full labeling. Endpoint-only labeling is "saved-effort" design. When you see "1 — 2 — 3 — 4 — 5 (dissatisfied — satisfied)" without labels in between, it's typically a sign someone economized on design effort. Research repeatedly shows fully labeled scales have higher reliability — the one minute it takes to add language to every category buys real downstream quality. NPS is the conventional exception (0–10 numeric); everything else: full labels.

4. In tracking studies, freeze point count, midpoint, and labels — period. We see teams "just bumping it from 5 to 7 this round" or "tweaking the wording" and then trying to compare against the previous wave. Once changed, the historical and current scores no longer share a scale, and longitudinal comparison is broken forever. Either recollect the historical wave on the new scale, or don't change it.

5. The Likert isn't magic — wording is 80%, scale design 20%. Point count and midpoint matter, but the question wording moves results far more. Whether "How satisfied are you with our service?" is on a 5- or 7-point scale, the data is meaningless if the question itself is too abstract. Polish the wording first, then think about the scale.

8. Likert scales in the Survey Tool Kicue

Kicue ships scale-related capabilities as standard.

SCALE question types

SCALE question types come in four flavors:

  • LIKERT — standard Likert scale (5 / 7 points and others, fully configurable)
  • NPS — optimized for the 11-point (0–10) format
  • SLIDER — continuous-value slider
  • SD — semantic differential (bipolar adjective pairs)

Combining with matrix questions

To rate multiple items on a shared Likert scale, combine matrix question types with SCALE. For matrix-specific pitfalls, see matrix question design.

Likert scales tie tightly to other survey-design topics. See also our CSAT survey design guide, NPS complete guide, CES design guide, matrix question design, and question order effects.

Choosing the right tool — Free plan limits, branching support, AI capabilities, and CSV export vary widely across tools. See our free survey tool comparison to find the right fit for this approach.

Summary

Checklist for designing and operating Likert scales:

  1. 5 or 7 points is the academic optimum. 9+ points cost more in load than they gain in discrimination.
  2. Default to including the midpoint. Forced choice raises measurement error.
  3. Fully label every category. Endpoint-only labels reduce reliability.
  4. Treat data as effectively interval in practice. Norman (2010) is the working standard.
  5. In tracking studies, freeze the scale design. Changing it breaks longitudinal comparison.
  6. Wording first, scale second. 80/20.

Teams that treat the Likert as "5 points, whatever" produce different reliability than teams that deliberately decide point count, midpoint, and labels. It's the foundational measurement device behind CSAT/NPS/CES — worth designing on purpose.



Want to design surveys with deliberate Likert-scale choices end-to-end? Try the free survey tool Kicue. LIKERT, NPS, SLIDER, and SD question types ship as standard, with full control over point count, midpoint, and label design.
