How many user interviews do you need?
How many user interviews do you need? A working answer by study type, with the saturation evidence, and what changes when interviews are async and cheap.
The number changes the moment you say what the study is for. Five for a usability test. Twelve for a thematic interview. Thirty for a behavioral pattern study. The reason the question is hard is that nobody asks it in that form. They ask "how many user interviews do you need," meaning what is the right number for me, right now, on the thing I am actually doing, and the answer is buried inside two decades of research the asker has not read.
This is a working answer. It cites the literature the answer is built on (Nielsen; Faulkner; Guest, Bunce and Johnson) and it names the thing those papers do not address: that the marginal cost of an interview has fallen by an order of magnitude over the last three years, which means the sample-size conversation is no longer a conversation about budget. It is a conversation about saturation, segmentation, and what you are actually trying to learn.
How many user interviews do you need: the short answer
How many user interviews you need depends on the type of study you are running, not on the size of your company or product. The defensible defaults, by study type:
- Usability tests: 5 participants per task per persona. From Jakob Nielsen and Tom Landauer's 1993 model, five participants surface roughly 85% of severe usability problems on a defined task.
- Thematic interviews (one homogeneous group): 6 to 12. Guest, Bunce and Johnson's 2006 Field Methods paper on saturation found that "basic elements for metathemes were present as early as six interviews," with saturation reached by twelve in a homogeneous sample.
- Thematic interviews across multiple segments: 6 to 12 per segment, not 6 to 12 total. Two personas means 12 to 24. Three means 18 to 36.
- Behavioral pattern studies / diary studies: 15 to 30. Longitudinal work needs more participants to absorb dropout and to catch patterns that only show up on the third or fourth entry.
- Concept testing: 5 to 8 per concept, more if you are comparing concepts head to head.
- Switch interviews (jobs-to-be-done): 8 to 12 recent switchers. The Re-Wired Group's switch interview protocol recommends recruiting recent switchers (within the last 90 days) for memory fidelity.
These numbers describe completed interviews, not invitations sent. Recruit 1.5 times your target to absorb the no-shows and the partial completions. Send a link to 15 people if you need 10 finished responses.
Where the "five users" rule actually applies
The most-quoted number in user research is also the most misused. Jakob Nielsen's claim that five users find about 85% of usability problems is a usability finding, derived from a curve fit to data on task-specific tests. It assumes a homogeneous user group performing the same task on the same interface. Outside those conditions, the curve does not apply.
Laura Faulkner's 2003 paper put the rule under empirical pressure: in her study, five users found between 55% and 99% of problems depending on the run. The mean was 85%, which is the number that became canonical, but the variance is real. Below five participants, the range of outcomes is too wide to make a confident call. Above ten or twelve, the marginal participant is mostly confirming what you already heard.
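For reference, the commonly cited form of the Nielsen-Landauer model is found(n) = 1 - (1 - p)^n, where p is the probability that a single participant surfaces a given problem, roughly 0.31 in their data. A minimal sketch of the curve, using that published value of p:

```python
# Nielsen-Landauer problem-discovery curve: found(n) = 1 - (1 - p)^n,
# with p ~= 0.31, the per-participant discovery rate from their 1993 data.
def problems_found(n: int, p: float = 0.31) -> float:
    """Expected share of usability problems surfaced by n participants."""
    return 1 - (1 - p) ** n

for n in range(1, 13):
    print(f"{n:2d} participants -> {problems_found(n):.0%} of problems")
# 5 participants -> ~84%, which is where the canonical "85%" comes from;
# the curve flattens hard after that, which is the case for stopping at 5.
```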
The rule was never meant for interview research. A usability test asks a person to perform a task while you watch. An interview asks a person to describe an experience. The first is observational and bounded by the interface. The second is interpretive and bounded by the participant's memory and willingness to be candid. They reach saturation at different rates, on different evidence, and they should use different sample sizes.
When somebody quotes the five-users rule for an interview study, they are usually quoting Nielsen for permission rather than for evidence. The right citation for interview work is Guest, Bunce and Johnson, not Nielsen.
Six to twelve for thematic interviews
The defensible number for a one-segment qualitative interview study is six to twelve participants. The evidence comes from Greg Guest, Arwen Bunce, and Laura Johnson's 2006 study published in Field Methods, which coded 60 in-depth interviews from a single homogeneous population. They found that 73% of codes were identified after the first six interviews, and that thematic saturation (no new codes emerging) landed by twelve. The finding has been reproduced in adjacent settings since.
Two qualifications matter. First, "homogeneous" is doing real work. The 12-interview saturation point assumes participants share the relevant traits: same product, same role, same use case. If your study mixes founders and individual contributors, or new users and power users, you are running two studies stitched together, and each one needs its own saturation budget.
Second, Braun and Clarke's thematic analysis framework treats saturation less as a fixed number and more as a judgment call about whether your codes still need refinement. The number 12 is a planning anchor. The actual stopping rule is that the themes have stopped changing. Some studies hit that at 8. Some need 16. The number is downstream of the question, not upstream of it.
For most product-discovery interview studies, 8 completed interviews is enough to run a synthesis, 12 is enough to feel confident in the strongest themes, and anything above 20 in a single segment is usually wasted unless you have decided in advance to publish.
When you need more than twelve
Five reasons to plan for a bigger sample than the saturation default.
- Multiple segments. If you are studying B2B SaaS buyers and end users on the same product, you need 6 to 12 of each. The themes that matter to a CFO are not the themes that matter to the analyst who actually uses the tool. Stratify, then sample within each stratum.
- Sensitive or low-prevalence experiences. If you are studying a behavior that only 15% of your users exhibit, your effective sample is 15% of whatever you recruit. Plan for it.
- Heterogeneous populations. Some studies (early-stage concept testing, generative research on a new persona) sit in spaces where you genuinely do not yet know what the segments are. In those, recruit broader and bigger, then re-segment on what you find.
- Longitudinal designs. A diary study with one entry per participant is an interview. A diary study with seven entries per participant over two weeks is a behavioral pattern study. The longer playbook is in how to run a diary study with voice notes; the sample-size note is that dropout compounds, and you need 1.5x to 2x the target to keep the cohort viable through the final entry.
- Quantitative validation alongside the qualitative. If you want to ship a number, not just a theme ("47% of churned customers cite onboarding as the reason"), you are running a survey nested inside an interview study, and the survey wants 100 or more responses, not 12.
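The "100 or more" figure is not arbitrary: the standard sample-size formula for estimating a proportion, n = z^2 * p(1-p) / e^2, lands at roughly 97 responses for a plus-or-minus 10-point margin of error at 95% confidence in the worst case (p = 0.5). A quick check:

```python
import math

def survey_n(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Sample size to estimate a proportion within `margin` at 95% confidence."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(survey_n(0.10))  # 97 responses for +/-10 points
print(survey_n(0.05))  # 385 responses for +/-5 points
```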
The mistake is to default to large samples because they sound rigorous. Fifty thin responses to a vague question are worth less than five thick responses to a precise one. Sample size is a function of the research question, not a substitute for one.
How async changes the math
The classical sample-size literature was written when an interview cost something close to its true price. A 45-minute moderated session at $150, plus recruitment, transcription, and the moderator's time, runs $200 to $400 per completed interview, depending on the panel and the synthesis pipeline. At those prices, the conversation about sample size is partly a conversation about budget, and "12 is the saturation point" is also "12 is the most we can afford."
That math has shifted. AI-powered async user research collapses the cost stack: there is no scheduling round, no moderator, no transcription, and no separate coding pass. A researcher shares a link, participants answer in voice, text, choice, or rating on their own time, an AI interviewer asks smart follow-ups in real time, and a synthesis engine streams themes, quotes, and citations back as the responses land. The marginal cost of the 13th interview is close to zero. The marginal cost of the 30th is not meaningfully higher.
That changes two things about sample size.
The first change is that saturation is no longer the binding constraint. If running 20 interviews costs the same as running 8, you should run 20 and stop when the themes stop changing, not when the budget runs out. The honest version of the Guest, Bunce and Johnson finding is "themes stabilize between 6 and 12," not "you cannot learn anything more after 12." You can. It is just a diminishing return, and the older literature was written for a world where the diminishing returns hit the cost ceiling fast.
The second change is that the unit of analysis can be the standing surface rather than the scoped study. A persistent in-product feedback link that returns ten voice notes a week, synthesized as they land, is not a study with a sample size. It is a continuous feedback instrument that runs all year. The longer treatment of that pattern is in how to build a customer feedback loop. The relevant point for sample size is that the question shifts from "how many do I need" to "how long do I leave the link open." Months, usually.
This applies inside the company too. Before a feature ships, share the same link in internal channels (engineering, design, support, legal, finance) and collect a synthesized cross-functional view of objections in less time than it would take to schedule the meeting. The sample size for an internal pre-launch review is "everyone on the relevant teams" rather than "twelve recruited strangers."
"Yeah, the price isn't the problem. The problem is that nobody on my team trusts me to spend money on tools without showing them six dashboards first."
The point of that quote is that the team had not heard it in interviews one through twelve. The pricing themes had been saturated by interview six. The trust theme showed up on interview thirteen. In the old cost regime, the team would have stopped at twelve and shipped a pricing change. In the new one, the thirteenth interview cost the same as the first, and the call moved off pricing entirely.
How to know when you have enough
Six steps for picking a sample size on a real study, in order.
01 · Pick the study type before the number
Decide whether you are running a usability test, a thematic interview study, a behavioral pattern study, a switch interview series, or a continuous feedback loop. The study type sets the saturation point. Skipping this step is the most common reason teams end up debating sample size in the abstract.
02 · Stratify by segment
If your population has more than one segment that matters (different roles, different tenure, different products), do not pool them. Allocate 6 to 12 per segment and treat them as parallel studies that share an analytic pass. Three segments is 18 to 36 completed interviews, not 6 to 12 total.
03 · Recruit roughly 1.5x your target
Async completion rates are higher than synchronous ones (no scheduling round, no calendar conflict), but the tail is real. For voice and text studies on a homogeneous group, plan for 60% to 80% completion of invited participants. If you need 12 completed, send the link to 15 to 20 people. The wider craft of recruitment sits in how to recruit user research participants.
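The arithmetic is just the target divided by the completion rate, rounded up. A minimal sketch, treating the 60% to 80% range above as planning assumptions rather than guarantees:

```python
import math

def invites_needed(target_completed: int, completion_rate: float) -> int:
    """Invites required to expect `target_completed` finished interviews."""
    return math.ceil(target_completed / completion_rate)

for rate in (0.6, 0.7, 0.8):
    print(f"{rate:.0%} completion -> invite {invites_needed(12, rate)} for 12 completed")
# 60% -> 20 invites, 70% -> 18, 80% -> 15
```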
04 · Tune probing depth to the audience
The AI follow-up changes how much you learn per participant. A medium-depth probe on a vague answer surfaces the specifics ("I priced it against Linear, the per-seat thing scared my CFO") that a moderator would have extracted on the second turn. Shallow probes preserve response rate for in-product feedback links and churn flows. Expert probes are appropriate for switch interviews and recruited research panels. The pattern is in AI follow-up questions for user research. The reason this changes sample size: a study with medium-depth probes reaches saturation faster than one with no probes, because each interview is doing more work.
05 · Stop when themes stop changing
The honest stopping rule, from Braun and Clarke, is that you stop when the codes you are seeing have stopped changing. In practice this looks like a synthesis view that updates as responses land, with theme weights stabilizing across the last three or four interviews. If the next response is mostly re-confirming patterns you have already seen, you are at saturation. If it is introducing a new theme, you are not.
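As a sketch of what "themes stopped changing" can look like as a check, assuming you represent each interview as the set of theme codes tagged in it (the three-interview window is an illustrative choice, not a standard):

```python
def at_saturation(codes_per_interview: list[set[str]], window: int = 3) -> bool:
    """True if the last `window` interviews introduced no new theme codes.

    codes_per_interview[i] is the set of codes tagged in interview i,
    in the order the responses landed.
    """
    if len(codes_per_interview) <= window:
        return False  # too few interviews to judge stability
    seen_before = set().union(*codes_per_interview[:-window])
    recent = set().union(*codes_per_interview[-window:])
    return recent <= seen_before  # the recent window confirms, not extends

# Response 13 in the pricing example above would flip this back to False:
# a new "trust" code appears that interviews 1 through 12 never produced.
```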
06 · Trust the second study, not the first
The first study on a question almost always over-indexes on the loudest voices in your recruitment pool. The second study, run two weeks later, on the same question, with a slightly different sample, is where the durable themes show up. Plan for two passes on anything you intend to ship from.
The honest answer
The honest answer to "how many user interviews do you need" is: enough to hear the same theme twice from people who arrived at it independently, in a sample that covers the segments you care about, with depth on each response that matches the question. For most one-segment product-discovery studies, that is 8 to 12 completed interviews. For multi-segment work, multiply by the number of segments. For longitudinal or behavioral pattern work, plan for 15 to 30. For a continuous feedback loop, the number stops being meaningful and gets replaced by a question about how long the surface stays open.
The deeper move is to stop treating sample size as a fixed cost and start treating it as a knob you can turn. The cost stack that fixed "12 interviews" as the practical ceiling for a decade is gone. The saturation evidence still holds. The economics that made saturation the ceiling do not.
FAQ
What is the minimum number of user interviews?
Five for a usability test, six for a thematic interview study on a homogeneous segment. Below five, the variance in what you learn is too wide to make a confident call. Below six on a thematic study, you are likely to miss codes that would have shown up by the eighth or twelfth interview. These floors come from Nielsen and Landauer's 1993 usability curve and Guest, Bunce and Johnson's 2006 saturation finding respectively.
Are 5 users really enough for user research?
Five users are enough for a usability test on a specific task, where you are observing how a participant interacts with an interface. They are not enough for a thematic interview study, a switch interview series, a behavioral pattern study, or a multi-segment investigation. The "five users" rule is a usability finding that gets misquoted as a general user-research finding. The right number for interview research is closer to 6 to 12 per segment.
How many user interviews do I need for product-market fit?
There is no fixed number. Product-market fit is a pattern that shows up across multiple studies and multiple signals (retention, usage, willingness to pay, the answers you get when you ask churned customers why they left), not a finding from a single interview series. A reasonable cadence is 8 to 12 interviews per quarter, run continuously, against rotating questions. The continuous discovery rhythm is covered in continuous discovery interviews.
How many participants for a diary study?
15 to 30, with 1.5x to 2x recruitment to absorb dropout across the diary period. A two-week diary study with 7 entries per participant needs more participants than a one-shot interview because completion compounds: a 90% per-entry completion rate over seven entries leaves you with about 48% of the cohort still active by the end. The full playbook is in how to run a diary study with voice notes.
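The compounding arithmetic, using the 90% per-entry figure from the answer above (and treating per-entry completion as independent, which is a simplification):

```python
import math

per_entry_rate = 0.90   # illustrative per-entry completion rate
entries = 7             # diary entries per participant
target_finishers = 20   # participants still active at the final entry

retention = per_entry_rate ** entries              # 0.9**7 ~= 0.478
recruits = math.ceil(target_finishers / retention)
print(f"Retention after {entries} entries: {retention:.0%}")  # ~48%
print(f"Recruit {recruits} to end with {target_finishers}")   # 42
```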
How many user interviews for a B2B SaaS study?
Stratify by the relevant role (decision-maker, buyer, end user, admin) and run 6 to 12 per stratum. A B2B study that pools roles together usually surfaces conflicting themes and resolves them poorly. If buyers and end users disagree, that is the finding, and it shows up cleanly only when the two segments are analyzed separately.
When can I stop recruiting?
When themes have stopped changing across the last three or four completed responses. If response 11 is mostly re-confirming themes that were already strong by response 8, you are at saturation. If response 11 is introducing a new theme that was not in responses 1 through 10, you are not, and the next two or three responses are worth the cost of recruiting them. With AI synthesis streaming as responses land, that stopping decision is observable in real time rather than at the end of a coding project.
The sample-size question has a methodological answer and an economic answer. The methodological answer (5 for usability, 6 to 12 per segment for interviews, more for longitudinal and multi-segment work) has not changed and will not. The economic answer has, which means the binding constraint for most teams is no longer "what can we afford to run" but "what are we actually trying to learn." Talkful has a free plan that is enough to run a first study at 10 participants, and the working guide to voice user research covers the rest of the practice once the responses start landing.