How to run AI moderated user interviews

How to run AI moderated user interviews: seed questions, configurable probe depth, placement, and when to keep a human in the loop.

Rizvi Haider · June 2, 2026 · 15 min read · Updated June 2, 2026

The first AI moderated user interview a product team runs almost never goes the way they expect. The team designs it as a survey with an extra clever feature bolted on, watches the AI ask a probe they would not have asked, and either decides the tool is magic or decides it is broken. Both reactions are early. The interesting version of the practice shows up in week three, when the team stops treating the AI as a moderator-replacement and starts treating it as an instrument with three knobs they actually have to turn.

This is a working guide to running AI moderated user interviews and to the operational decisions that separate a study that closes the qualitative gap with a live call from one that just produces longer transcripts of the same shallow data. The decisions are not technical. They are research-craft decisions, made per question, by the researcher or PM who knows the assumption they are trying to test.

What is an AI moderated user interview?

An AI moderated user interview is an asynchronous research session in which an AI plays the role of the moderator: it reads the participant's answer in real time, decides whether to ask a follow-up, picks the right kind of follow-up from a small taxonomy of probes (clarifying, descriptive, idiographic memory, explanatory), and stops when the answer contains the detail the seed question was after. The participant answers in voice, text, choice, or rating; the AI listens to whichever modality they used, transcribes voice if they chose voice, and treats the running thread as a conversation rather than as a form. The output is a transcript plus a synthesized view of themes, quotes, and citations across every participant who took the study.

The category exists because async surveys collect the participant's first thought on a question and almost never the second. A moderated interview produces a transcript that is, on average, two to three times longer per minute of participant time than an unmoderated written response on the same prompt, and most of that volume is the probe-and-answer pattern that does not happen when a seed question stands alone. AI moderation is the operational answer to that gap, and the AI follow-up questions post covers the underlying probe mechanism in more depth. This guide is about running the whole interview, not just the probe.

How to run AI moderated user interviews, step by step

Six steps. They are sequential. Step one is the one most teams skip and the one that most often decides whether the study will produce a decision or a document.

01 · Pick the decision the interview will inform

Before writing any seed questions, write one sentence: what decision will the team make differently if this study comes back the way I expect, versus the opposite way? If that sentence does not exist, the interview is going to produce interesting transcripts and zero product changes. An AI moderated user interview is cheap enough to run that the temptation is to run one without a decision attached. Resist it.

The sentence is also what tells you the depth setting for each question (covered in step three). A decision the team is genuinely uncertain about supports an expert-depth probe; a decision the team has already made supports nothing, and running the study is theatre.

02 · Write seed questions a senior moderator would be glad to open with

The AI moderator amplifies whatever the seed surfaces. A vague seed with a deep probe chain produces vague answers with deep clarifying probes, which is the worst of both worlds. Write four to six seed questions, not twelve, and pass each one through three rules.

Anchor to a specific recent moment. "Walk me through the last time you tried to import a CSV" works. "How do you usually use the import?" produces a generalized summary that no probe can rescue.
Open, not closed. As Nielsen Norman Group's guidance on AI tools for UX research makes clear, the probes only work on top of a seed question that gives the participant room to answer at length. Closed seed + AI moderator = the AI generating a question the participant has already answered.
No leading construction. "Was the import confusing?" presupposes the answer. "Tell me what happened the last time you imported" does not.

The companion post on writing user research questions covers prompt craft in depth. For an AI moderated study the rule of thumb is: write a seed that a senior researcher would be glad to open a moderated interview with, then let the depth setting decide how far the AI takes it.

03 · Configure probe depth per question (shallow, medium, expert)

The single most consequential decision in an AI moderated user interview is per-question depth. A probe is not the unit of design; depth is. Three settings cover most cases, and the researcher picks one per seed question based on what the question is doing in the study.

Shallow. At most one clarifying probe. The AI fires it only when the answer is short, vague, or contradictory. Pick shallow for short studies, rating sweeps with an open comment, in-product feedback links where dropoff matters, and any prompt where the seed question is doing most of the work. A two-minute feedback prompt in a churn flow is shallow by default.

Medium. A short chain of two to three probes when the participant's earlier answer is still vague, contradictory, or references something the AI cannot resolve. The default for most product-discovery work. After each probe the AI re-reads the running thread and stops the moment the answer is clear.

Expert. The cap comes off. The AI keeps probing until the answer contains the same level of context a senior researcher would dig out in a moderated interview, or until the participant disengages. Use expert when the study is the primary research artifact and the participant is invested enough to keep going. Jobs-to-be-done switch interviews, diary studies, and internal stakeholder reviews are the canonical cases.

Choice questions and rating-without-comment questions should never trigger a probe; the answer is in the choice. The participant can skip any probe at any depth.
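The depth rules above can be written down as a small gate that decides whether one more probe is allowed. This is a minimal sketch, not any tool's actual API: the names `Depth`, `SeedQuestion`, and `may_probe` are hypothetical, and the medium cap of three is the upper end of the "two to three probes" range.

```python
from dataclasses import dataclass
from enum import Enum


class Depth(Enum):
    SHALLOW = "shallow"
    MEDIUM = "medium"
    EXPERT = "expert"


# Probe caps per depth, taken from the descriptions above.
# None means no fixed cap: expert keeps going until the answer is rich
# enough or the participant disengages.
MAX_PROBES = {
    Depth.SHALLOW: 1,
    Depth.MEDIUM: 3,
    Depth.EXPERT: None,
}

# Question kinds whose answer is the choice itself; these never probe.
NO_PROBE_KINDS = {"choice", "rating"}


@dataclass
class SeedQuestion:
    text: str
    kind: str  # hypothetical labels: "open", "choice", "rating", "rating_with_comment"
    depth: Depth = Depth.MEDIUM


def may_probe(question, probes_asked, answer_is_clear, participant_skipped=False):
    """Return True if one more probe is allowed on this question."""
    if question.kind in NO_PROBE_KINDS:
        return False
    # The participant can always skip, and a clear answer ends the chain.
    if participant_skipped or answer_is_clear:
        return False
    cap = MAX_PROBES[question.depth]
    return cap is None or probes_asked < cap
```

Whether an answer counts as clear is the judgment the AI moderator makes on each turn; the sketch takes it as an input rather than trying to model it.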

04 · Choose the placement before the question count

Where the study link lives decides what it can do for the team. A study link is not a one-time research artifact; the same link is designed to live wherever the team wants ongoing signal. Before you publish, pick the placement and let it shape the seed-question count and the depth settings.

One-shot study, recruited cohort. Three to six seed questions, medium depth on most, expert on one. This is the closest shape to a traditional moderated interview, used when a specific decision needs depth.
Standing in-product feedback link. One or two seed questions, shallow or medium depth, placed in settings, the help menu, or a contextual "what's missing here?" affordance. Continuous signal from users in the moment friction is happening.
Churn and cancellation flow. One seed question on the cancel confirmation page, shallow depth. The highest-signal feedback a product team will ever get, and almost always thrown away when there is no probe at all.
Post-onboarding or activation moment. One seed question after the first session, first invoice, or day-seven retention check. Medium depth. The right shape for catching the friction the user has not yet decided to complain about.
Internal stakeholder review. Three to five seed questions sent as a link inside the company before a launch. Expert depth, because stakeholders are invested. The synthesis becomes a cross-functional view of objections before the team ships.
Owned distribution. The same link in a customer newsletter, a Slack community, a partner round-up. Same depth settings produce the same synthesis pipeline.

The longer essay on continuous discovery interviews covers the standing-instrument shape in more detail, and the broader async user research methodology playbook covers the operational rhythm once a placement is chosen.
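One way to keep the placement guidance above honest is to encode it as presets and check a draft study against them before publishing. The sketch below is illustrative only: `PlacementPreset`, `PRESETS`, and `check_study` are hypothetical names, and owned distribution is left out because the list gives it no seed-question count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementPreset:
    min_seeds: int
    max_seeds: int
    default_depth: str
    note: str = ""


# Seed-question counts and depths as described in the placement list above.
PRESETS = {
    "one_shot_cohort": PlacementPreset(3, 6, "medium", "expert on one question"),
    "in_product_link": PlacementPreset(1, 2, "shallow", "medium is also an option"),
    "churn_flow": PlacementPreset(1, 1, "shallow"),
    "post_onboarding": PlacementPreset(1, 1, "medium"),
    "stakeholder_review": PlacementPreset(3, 5, "expert"),
}


def check_study(placement, seed_count):
    """Return a list of warnings for a draft study; an empty list means it fits."""
    preset = PRESETS[placement]
    warnings = []
    if seed_count < preset.min_seeds:
        warnings.append(
            f"{placement}: {seed_count} seed question(s) is below the "
            f"suggested minimum of {preset.min_seeds}"
        )
    if seed_count > preset.max_seeds:
        warnings.append(
            f"{placement}: {seed_count} seed question(s) is above the "
            f"suggested maximum of {preset.max_seeds}"
        )
    return warnings


# A cancel-page study with three seed questions gets flagged.
for warning in check_study("churn_flow", 3):
    print(warning)
```

The presets are starting points, not limits a tool would enforce; the calibration round in the next step is where they get adjusted.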

05 · Watch the first ten responses live before scaling

The temptation, after publishing, is to open the dashboard a week later and read the synthesis. Do not. Watch the first ten responses as they come in, with the audio on and the transcript open, and judge two things: are the probes the AI is choosing the probes you would have chosen, and is the synthesis pulling the same themes you would have pulled?

If the AI is over-probing on a seed question that was already producing rich answers, lower the depth on that question to shallow. If the AI is stopping at a clarifying probe when the moment called for an explanatory one, raise the depth to expert. If the synthesis is naming a theme the team would not have named, that is either signal (worth investigating) or drift (a sign the seed question was vague). Either reading is useful, and both are only visible if a human watched the first ten responses.

This is the step that separates teams who treat AI moderation as a fire-and-forget feature from teams who treat it as an instrument they are tuning. The first study is always a calibration round.

06 · Read the synthesis at the assumption layer, not the response layer

Once the depth is tuned and responses are flowing, the synthesis engine pulls themes, quotes, and citations across every participant in real time, with each citation linked back to the source transcript. The mistake is reading the synthesis the way the team would read a deck: top to bottom, looking for surprises. The right read is at the assumption layer.

For each assumption the study was designed to test (step one), ask one question: do the responses support, contradict, or fail to address it? Three buckets, one decision per assumption. The synthesis surfaces the themes; the team supplies the decision. Quotes and clips back each theme so that any contested call can be checked against the participant's own words, and the output is in a format the team or the agents they build with can act on directly.
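The three-bucket read can be kept in something as simple as a tally per assumption. This is a minimal sketch under stated assumptions: the `Bucket` labels, the `tally_by_assumption` helper, and the sample assumption are all made up for illustration, and the coding of each response into a bucket is still done by a person reading the synthesis.

```python
from collections import Counter
from enum import Enum


class Bucket(Enum):
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    NOT_ADDRESSED = "not_addressed"


def tally_by_assumption(coded):
    """Count buckets per assumption.

    coded: iterable of (assumption, Bucket) pairs, one per response read.
    """
    tallies = {}
    for assumption, bucket in coded:
        tallies.setdefault(assumption, Counter())[bucket] += 1
    return tallies


# Illustrative data only; the assumption text and the codes are invented.
coded = [
    ("Import fails silently, so users give up", Bucket.SUPPORTS),
    ("Import fails silently, so users give up", Bucket.SUPPORTS),
    ("Import fails silently, so users give up", Bucket.NOT_ADDRESSED),
]

for assumption, counts in tally_by_assumption(coded).items():
    print(assumption, {b.value: counts[b] for b in Bucket})
```

The tally does not make the call. It only shows, per assumption, how the responses split, so the team can record one decision against each.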

The synthesis playbook covers the read in more depth. For an AI moderated study, the operational version is shorter: the synthesis is not the deliverable, the decision per assumption is.

"I tried it on Tuesday and the columns came in shifted by one and I figured I'd come back, but I didn't. I think if it had told me what was wrong I would have fixed it, but it just sat there."

Participant · #2814 · in-product feedback link, medium depth

The first sentence is what a survey would have captured. Everything that follows is what the AI moderator's clarifying probe recovered.

When AI moderation is the wrong tool

Three cases where the answer is a live call or a different method.

Highly sensitive prompts. Anything legal, medical, financial, or emotionally heavy needs a moderator who can read tone, body language, and the pause before the answer. An AI can probe well on most prompts; it cannot defer a probe because the participant's voice cracked. Default to shallow at most on sensitive seed questions, or move the study to a moderated interview.

Emergent research questions. When the script itself is in flux and the researcher is learning what to ask, the right tool is a live call where the moderator can rewrite the question between participants. AI moderation runs the same configured study across every participant; it is not the tool for the exploratory week where the research plan is being written.

Very small N studies that need deep rapport. A five-participant study where the researcher needs each participant to open up over fifteen minutes will produce better data with a live interviewer who can build rapport across the whole call. AI moderation produces depth via probe chain, not via rapport, and the difference is real on the long-tail prompts.

The voice user interview guide covers when a live interview is still the right tool, and the broader voice user research guide covers the cadence patterns we have seen across teams.

How AI moderation changes the practice

Three changes worth naming, because they show up in the second month, not the first.

The unit of design moves from the script to the depth setting. In a traditional interview the script is the artifact the researcher iterates on. In an AI moderated study the seed question is short and the depth setting is what the researcher tunes between the calibration round and the full launch. Teams who keep iterating on the seed-question wording instead of the depth setting are running the wrong loop.

The calendar comes off the participant side. A continuous discovery interview habit usually dies in month two because three trio calendars and one customer calendar do not converge weekly. With an AI moderator on the participant side, the customer answers on their own time, and only the trio's listen hour needs to coincide. The cadence survives.

Internal stakeholder review becomes a single link. A pre-launch sanity check across engineering, design, support, legal, and exec used to be a meeting series or a Slack thread. With a link and expert-depth probes, every stakeholder answers in voice, text, or rating on their own time, and the synthesis produces a cross-functional view of objections before the team ships. The same instrument runs both customer and internal cohorts.

FAQ

What is an AI moderated user interview?

An AI moderated user interview is an asynchronous research session in which an AI moderator reads each participant's answer in real time, decides whether to ask a follow-up, picks the right kind of probe from a small taxonomy, and stops when the answer contains the detail the seed question was after. Participants answer in voice, text, choice, or rating. The output is a transcript plus a synthesized view of themes, quotes, and citations across every participant who took the study.

How is an AI moderated interview different from a traditional survey?

A survey collects the participant's first thought on a question and stops. An AI moderated user interview collects the first answer and then, when the answer is vague or contradictory, asks one or more follow-up questions to recover the detail the seed missed. The probes draw from the same taxonomy a human interviewer uses (clarifying, descriptive, idiographic memory, explanatory) and are capped by a per-question depth setting (shallow, medium, expert).

How deep should the AI probe on each question?

Per-question depth is the most consequential decision in an AI moderated study. Shallow allows one clarifying probe at most and fits short surveys, in-product feedback, and churn flows. Medium allows a chain of two to three probes and is the default for product-discovery work. Expert removes the cap and probes until the answer matches the depth of a moderated interview, suiting jobs-to-be-done switch interviews and diary studies. Choice and rating-without-comment questions should never trigger a probe.

Can AI moderated user interviews replace live interviews?

Not entirely. AI moderation closes most of the qualitative gap between a survey and a moderated interview, especially at expert depth on prompts where the probe pattern is predictable. Live calls still beat AI on highly sensitive prompts (legal, medical, financial, emotionally heavy), on emergent research questions where the script is in flux, and on small-N studies where the researcher needs to build rapport across a full call. Most teams end up using both, with AI moderated studies running as the standing instrument and live calls reserved for the cases where rapport or script flexibility matters.

Where should the study link live?

The placement decides how much signal the study collects, and how often. A one-shot recruited cohort is the closest shape to a traditional moderated interview. A standing in-product feedback link, a churn or cancellation flow, a post-onboarding moment, or an internal stakeholder review all turn the same instrument into a source of continuous signal, with the depth settings adjusted to match the participant's tolerance for being asked one more thing.

How many participants do I need for an AI moderated study?

The same logic that holds for moderated interviews holds here: six to twelve responses from a homogeneous group are usually enough to read the pattern on a single assumption. For a one-shot study, recruit twelve to twenty participants and expect six to ten to complete. For a standing in-product link, the count accumulates over the week and the synthesis updates in real time as responses land.
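The recruiting figures in this answer imply that roughly half of recruited participants complete the study. A small sketch of that arithmetic follows; the 50% completion rate is inferred from the numbers above, not measured, and `recruits_needed` is a hypothetical helper.

```python
import math


def recruits_needed(target_completes, completion_rate=0.5):
    """Participants to recruit so that about target_completes finish.

    completion_rate defaults to 0.5, the rate implied by
    "recruit twelve to twenty, expect six to ten to complete".
    """
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return math.ceil(target_completes / completion_rate)


print(recruits_needed(6))   # 12
print(recruits_needed(10))  # 20
```

Swap in the completion rate from your own calibration round once you have one; the default is only a planning figure.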

AI moderated user interviews are not a replacement for the research-craft decisions a team has to make: the assumption, the seed question, the depth, the placement. They are the operational answer to the calendar cost that kills most continuous research habits in month two, and the synthesis-speed problem that turns a week of recordings into a deck nobody reads. Done well, they recover most of the qualitative gap between a survey and a live interview without the scheduling cost. Done poorly, they exhaust the participant and produce thinner data than the seed question alone would have. Talkful ships configurable depth, multimodal participant answers, and real-time synthesis on the free plan, and the longer voice user research guide covers the wider habit once the first study is in flight.