How to synthesize user research

How to synthesize user research into one decision-grade artifact: cluster themes, weigh frequency against importance, and keep the voice in the room.

Rizvi Haider · 14 min read · Updated May 7, 2026

The deliverable is a 38-slide deck nobody opened. The codebook has 142 codes. The Notion page has eight nested toggles, four of which contain only a single quote. A decision the team needs to make on Wednesday is buried somewhere in the volume, and nobody can find it. This is what happens when you treat synthesis as the act of describing what was said. It is not. This is a working method on how to synthesize user research so it produces an artifact a stakeholder acts on instead of a document a team archives.

In this guide: what synthesis actually is (and what it is not), the order of operations from coded transcripts to a one-page call, and what changes when the source material is voice rather than written summaries.

What user research synthesis is

User research synthesis is the work of compressing many participant responses into a small number of decisions a team can act on. It is not summarisation, transcript review, or theme labelling, although it depends on all three. The unit of output is not a theme. It is a choice the team can now make differently than they would have made on Monday.

The framing matters because most synthesis stalls between coding and decision. The researcher finishes thematic analysis, produces a clean codebook, and then writes a long document that lists the themes with quotes underneath each one. That document describes the data. It does not synthesize. The two activities feel similar from the inside (both involve organising qualitative material) and produce very different artifacts. A description tells you what participants said. A synthesis tells you what to do.

The academic backbone here is Braun and Clarke's six-phase thematic analysis, which ends at "writing the report." Most product teams stop one phase too early, at theme definition, and call that synthesis. The phase the literature calls "writing" is where synthesis actually happens, and it is the part that requires choosing.

How to synthesize user research, step by step

Six steps. They assume the transcripts have already been cleaned and open-coded along the lines we describe in analyzing user interview transcripts. If you have not done that yet, start there. Synthesis built on a sloppy codebook produces a confident-sounding deck about the wrong thing.

01 · Restate the question before you cluster anything

Before any clustering happens, write down the question the study was supposed to answer. One sentence. Not the prompt the participant saw. The decision the team was hoping to inform. "Should we build a calendar integration before or after the Slack one." "Why do new users churn between day three and day seven." "Is the pricing page the bottleneck or is it the trial length."

Most synthesis goes wrong because the original question has drifted by the time the data arrives. The researcher has read forty transcripts, the material inside each one is interesting, and the impulse is to surface everything interesting. Resist it. Interesting is not the same as decision-relevant. The question on the wall is the filter that separates what enters the synthesis from what stays in the codebook for next time.

A useful test: if a finding does not change a decision the team has already made or is about to make, it is not part of this synthesis. It might be part of the next one.

02 · Cluster codes by decision, not by topic

Open the codebook. Group codes into the smallest number of clusters that map onto branches of the decision. If the question is whether to build the calendar or Slack integration first, the clusters might be: pull toward Slack, pull toward calendar, hidden third option, blocker that overrides both. Four clusters. Not twelve.

The mistake here is to cluster by topic ("onboarding", "pricing", "team setup") because the topic structure is what falls out of the codebook by default. Topic clusters produce themes; decision clusters produce calls. The same set of codes will sort differently depending on which structure you pick, and the difference shows up in the meeting where stakeholders read the output. A topic-clustered synthesis prompts the question "interesting, what should we do?" A decision-clustered synthesis arrives with the answer pre-attached.
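
To make the difference concrete, here is a minimal sketch of the same codebook sorted both ways. The code names are hypothetical and nothing about the structure is specific to any tool; the point is that the sort axis, not the codes themselves, determines whether the output answers a topic or a choice.

```python
# Hypothetical codes from the calendar-vs-Slack study, sorted two ways.

# Topic clusters: the structure that falls out of the codebook by default.
by_topic = {
    "onboarding":    ["cal-invite-confusion", "slack-channel-setup"],
    "scheduling":    ["double-booking", "timezone-misses"],
    "notifications": ["slack-noise", "missed-digest"],
}

# Decision clusters: the same codes re-sorted around the branches of the choice.
by_decision = {
    "pull_toward_calendar":    ["double-booking", "timezone-misses", "cal-invite-confusion"],
    "pull_toward_slack":       ["slack-channel-setup"],
    "hidden_third_option":     ["missed-digest"],  # maybe a digest beats both integrations
    "blocker_overriding_both": ["slack-noise"],    # notification fatigue blocks any new pings
}
```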

This is also where the voice version of jobs to be done interviews earns its keep. JTBD synthesis explicitly clusters by force (push, pull, anxiety, habit) rather than by topic. The same instinct applies here for any decision: cluster around the axes of the choice, not around the surface taxonomy of the data.

03 · Weigh frequency against importance honestly

Frequency is the number of participants who raised a code. Importance is the size of the consequence. They are not the same column, and the most expensive synthesis errors come from confusing them.

Twelve participants mentioning a small UX nit and one participant describing the reason they would cancel are not equivalent inputs even though one has twelve votes and one has one. The qualitative-research literature has named this problem for years; the Nielsen Norman Group has a useful short piece on why frequency is not importance and the heuristics for sorting between them.

The practical move is to keep two columns when you cluster. Mentions and weight. A code can be high-mentions and low-weight (a polish issue), low-mentions and high-weight (a churn driver mentioned by one person who clearly speaks for many), or high on both. The synthesis surfaces the codes that score high on both columns plus the low-mentions, high-weight quadrant. The high-mentions, low-weight quadrant becomes a bug list, not a finding.
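
A minimal sketch of that triage, assuming hypothetical code names and a one-to-five weight that a researcher assigns by judgement. No script can assign the weight for you, but once it exists the sorting is mechanical.

```python
# Two columns per code: mentions (a count) and weight (a judgement call, 1-5).
codes = [
    {"code": "button-label-confusing",       "mentions": 12, "weight": 1},
    {"code": "cancel-over-missing-calendar", "mentions": 1,  "weight": 5},
    {"code": "trial-too-short",              "mentions": 7,  "weight": 4},
]

HIGH_MENTIONS, HIGH_WEIGHT = 6, 4  # thresholds are also a judgement call

# High-weight codes enter the synthesis, regardless of count.
findings = [c for c in codes if c["weight"] >= HIGH_WEIGHT]

# High-mentions, low-weight codes become a bug list, not a finding.
bug_list = [c for c in codes
            if c["mentions"] >= HIGH_MENTIONS and c["weight"] < HIGH_WEIGHT]

for c in sorted(findings, key=lambda c: (c["weight"], c["mentions"]), reverse=True):
    print(f"{c['code']}: {c['mentions']} mentions, weight {c['weight']}")
```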

The thematic-saturation literature (Guest, Bunce and Johnson, 2006) suggests most dominant themes show up in the first six interviews and are stable by twelve. If a theme appears in only one transcript out of forty, you do not have evidence of a pattern. You have evidence of a single participant. That can still be a finding, but you have to argue for it on weight, not on count.

04 · Build one artifact, three layers

The output of synthesis is a single artifact. Not a deck and a Notion page and a Loom and a Slack thread. One artifact, three layers, in this order.

The first layer is the call. One paragraph at the top of the artifact. The decision the research now suggests, in plain language, with the confidence level explicit. "Build the calendar integration first. High confidence (8 of 12 participants raised it as the top blocker). The Slack integration is wanted but not load-bearing for the trial decision."

The second layer is the evidence. Three to five clusters, each with the cluster name, the count of participants, the weight summary, and two to four verbatim quotes. The quotes are not decoration. They are the load-bearing element. A reader who only trusts the first sentence and the quotes should arrive at the same call as a reader who reads the whole thing.

The third layer is the audio, where it exists. A short embedded clip from the most representative response in each cluster. Forty-five seconds, not five minutes. The audio is what makes the synthesis trustable in the meeting where it gets read out, because nobody can argue with a participant's hesitation in their own voice.
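
If your research repository is structured, or you just want a checklist, the three layers reduce to a small data structure. The sketch below is illustrative, not a schema from any particular tool; the field names are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceCluster:
    name: str
    participant_count: int
    weight_summary: str           # one sentence on why this cluster matters
    quotes: list[str]             # two to four verbatim quotes, load-bearing
    clip_path: str | None = None  # ~45-second audio clip, embedded rather than linked

@dataclass
class SynthesisArtifact:
    call: str                     # the decision, one paragraph, plain language
    confidence: str               # e.g. "high (8 of 12 participants raised it)"
    clusters: list[EvidenceCluster] = field(default_factory=list)  # three to five
```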

05 · Keep the voice in the room

The recording is the data. The transcript is a derivative of it. Most synthesis throws the audio away after coding and walks into the stakeholder meeting with text only, then is surprised when the room argues about whether the participant "really meant" the negative thing they said. Of course they argue; the words alone are ambiguous. The pause before the word, the laugh, the audible swallow before the customer admits they were going to cancel, those are the parts of the data that carry conviction. A synthesis that includes embedded audio at the cluster level closes that ambiguity in the only way that closes it.

This is the place where async voice studies have a structural advantage over typed surveys. The artifact is already audio. Nothing has been thrown away. The synthesis can lean on a fifteen-second clip without anyone having to schedule a re-listen of the original session. We have written separately on what voice catches that text loses; for synthesis specifically, the rule is: if your final artifact has no audio in it, you have synthesized the transcript, not the research.

For voice studies, the practical move is to mark a single representative clip per cluster during coding (this is faster than choosing later, when you are tired and every clip sounds the same), and to attach those clips to the synthesis artifact directly rather than linking out. Friction matters. A clip that takes one click to play gets played. A clip behind a sign-in does not.

06 · Hold a synthesis read-out, not a synthesis review

The last step is the meeting. Most teams treat the synthesis review as a presentation: the researcher walks through the deck, stakeholders nod, and the artifact dies in the document graveyard. The shape that produces decisions is a read-out. Forty-five minutes. The artifact is shared in advance. The first ten minutes are silent reading. The next twenty are listening to the audio clips together. The final fifteen are the decision conversation, which now starts from a shared baseline rather than a presenter trying to convince the room.

The read-out structure is what teams running continuous discovery interviews tend to land on after a quarter or two. The synchronous part of the work is not the participant interviews; it is the team listening to the recordings together. Synthesis collapses into that meeting. Without it, the artifact has to do all the work alone, and one-page artifacts are not strong enough to convert a divided room into a decision.

Why voice changes user research synthesis

The Maze Future of User Research Report 2026 found that 69 percent of researchers now use AI somewhere in the pipeline, with synthesis the most-touched step. The fast version of that statistic is "AI is good at synthesis." The honest version is "AI is good at compression and speed, and synthesis is mostly a judgement task that compression alone does not solve."

What AI does well in synthesis is the mechanical layer: cleaning transcripts, pulling candidate quotes, surfacing recurring phrases, summarising long responses into short ones. What it does badly is choosing which cluster matters for the decision the team is actually making. That second job is the one that requires having read the question on the wall, talked to the stakeholder, sat in the strategy meeting last week. AI does not have access to most of those inputs. It will produce a clean-looking synthesis built on the wrong axis.

Voice changes the math because the audio survives all the way to the stakeholder. Synthesis with text-only inputs has to compress aggressively (because nobody reads forty transcripts) and the compression is where most of the signal goes. Synthesis with voice inputs can compress at the artifact layer (one page, three clusters) while preserving the original recording at the evidence layer (a single clip per cluster). The team gets a fast read of the call without giving up the source material. That is the structural improvement. Most teams notice it the first time a stakeholder says "wait, play that one again" in the read-out. The conversation that follows is the synthesis doing its job.

The wider methodology fits inside the voice user research guide, which covers where synthesis sits in a continuous research practice rather than as a one-off ceremony.

When user research synthesis goes wrong

Three failure modes worth naming, because each one looks fine until the read-out, where it falls apart.

The synthesis describes the data instead of choosing. The artifact is long, balanced, and ends without a call. Stakeholders read it and ask "so what should we do?" The fix is to write the call first and let the evidence support it, rather than writing the evidence first and hoping a call falls out at the end. If you cannot write the call at the top, you do not yet have a synthesis. You have an analysis.

The synthesis confuses frequency with importance. The top finding is "users want better onboarding," supported by twenty-eight mentions, none of which would actually change anything. Meanwhile the one transcript where a customer described why they cancelled is buried on slide 14. The fix is the two-column step in 03 above: keep mentions and weight separate, and surface the high-weight quadrant before the high-mentions one.

The synthesis omits the audio. The artifact is text only, the read-out becomes a debate about interpretation, and the team leaves without a decision. The fix is to attach the clips, not link them, and to schedule the silent listen as part of the meeting rather than treating audio as optional supplementary material. Voice carries the conviction. A synthesis that does not include it leaves the conviction at the door.

FAQ

What is user research synthesis?

User research synthesis is the work of compressing many participant responses into a small number of decisions a product team can act on. It depends on prior steps (recruitment, interviewing, transcription, coding, theming) but is distinct from any of them. The output is not a theme list or a thematic memo. It is a one-page artifact that names the call, supports it with three to five evidence clusters, and embeds verbatim quotes (and ideally audio clips) at the cluster level. If a finding does not change a decision the team is about to make, it does not belong in the synthesis.

How long should research synthesis take?

The compression itself usually takes a working day per study once the codebook is clean. The codebook itself takes most of the upstream time. A good heuristic for a 12-participant qualitative study is one day for transcript cleaning and open coding per six interviews, one day for theming, and one day for synthesis (cluster, weigh, write the call, attach clips, schedule the read-out). Async voice cuts the first half of that roughly in half because transcription and timestamping happen automatically, but the synthesis day is the same shape regardless of input format.

How is synthesis different from thematic analysis?

Thematic analysis produces themes. Synthesis produces decisions. Thematic analysis is the upstream step (Braun and Clarke's phases one through five); synthesis is the downstream one (phase six, "writing the report," extended into a stakeholder-facing artifact). Most product teams stop at theme definition and call that synthesis, which is why so many research outputs read as descriptions rather than calls. The two activities can share an underlying codebook but produce different artifacts and require different skills. Coding rewards patience; synthesis rewards judgement.

How many participants do I need before synthesis is reliable?

Dominant themes typically stabilise around six to twelve participants on a homogeneous segment, which is consistent with the Guest, Bunce and Johnson finding on thematic saturation. Below five participants you risk drawing the synthesis around one loud customer; above twelve, marginal returns drop sharply unless you are running across multiple segments. For multi-segment studies (small business versus enterprise, for example), treat them as separate syntheses rather than averaging across them.

Can AI do user research synthesis?

AI can do the mechanical layers (cleaning transcripts, surfacing candidate quotes, summarising long responses) faster than a human can. It cannot reliably do the choosing layer, which is where synthesis lives. The choice of which cluster matters for the decision the team is making depends on context the model does not have (the strategy meeting last week, the stakeholder's red lines, the engineering constraint nobody wrote down). The Maze 2026 report found 69 percent of researchers use AI in the pipeline; the same report flagged "interpretive discipline" as the differentiator that separates fast synthesis from useful synthesis. Use AI for compression, keep the call human.

What goes in a research synthesis artifact?

One page. The top paragraph names the call and the confidence level. The middle section has three to five evidence clusters, each with a cluster name, the participant count, a one-sentence weight summary, and two to four verbatim quotes. The bottom layer is audio, with one 45-second clip per cluster embedded directly in the artifact (not linked). Anything else is supporting material that lives in a separate document. The synthesis itself is the one page.


Synthesis is the part of user research that almost always gets done badly because it is judgement, not analysis. It rewards choosing over describing, audio over transcript, and a one-page artifact over a 38-slide deck. The mechanical parts of the pipeline (transcription, timestamping, candidate-quote extraction) are getting cheaper every quarter; the part that remains expensive is the human reading the question on the wall and deciding which two clusters matter. Talkful is built so the audio survives all the way to the synthesis artifact, which is the bit most tools quietly drop, and the rest of the playbook lives in the voice user research guide.