How to Do Thematic Analysis in User Research
A working guide to thematic analysis in user research: six phases, common mistakes, and how AI changes the coding pass.
The codes are color-coded and arranged on a Miro board. The themes are named. The deck has been delivered. The researcher who ran the study admits the quiet version of the truth: the codes were generated in two hours on a Sunday, the board took six, and the themes were already decided before transcript twelve. This is what a lot of product teams call thematic analysis, and it is not.
Thematic analysis is the most-cited qualitative method in user research for good reason. Done honestly, it converts a stack of interview transcripts into themes a team can defend and ship from. Done lazily, it produces a confirmation deck that ratifies whatever roadmap was already on the table. This is a working guide to the first kind: the six phases that came out of Virginia Braun and Victoria Clarke's foundational paper, the mistakes that quietly kill each one, and how an AI pass on the mechanical steps changes the math without removing the researcher's judgment from the parts that matter.
What thematic analysis is
Thematic analysis is a qualitative method for identifying, organizing, and reporting patterns of meaning across a dataset of interviews, voice notes, open-ended survey answers, or other transcripts. It works inductively (themes emerge from the data) or deductively (themes are tested against the data), and it produces findings expressed as named, defended themes anchored to verbatim quotes.
The dataset can be small (six interviews) or large (six hundred), and the method scales by adding coders, not by expanding scope. The unit of analysis is the theme, not the transcript. A theme is a pattern that captures something important about the data in relation to the research question. A code is the unit you tag a passage with before you know if it's part of a theme. The distinction matters because most teams collapse them and end up with a flat list of codes presented as themes, which is neither.
When thematic analysis is the right method
Thematic analysis is a fit when the research question is open-ended and the data is qualitative. It is not the only choice. Grounded theory builds a formal model from the data and demands theoretical sampling. Content analysis counts the frequency of pre-defined categories and produces numbers. Interpretative phenomenological analysis goes very deep on a small number of cases. Thematic analysis sits between these, which is why it suits most product-discovery work: it is rigorous enough to defend in a stakeholder review and flexible enough to apply to a churn study, a jobs-to-be-done (JTBD) interview set, or a stream of in-product feedback.
Three cues that thematic analysis is the method you want:
- You ran an exploratory study and want to know what patterns the participants surfaced.
- You need to communicate findings to a non-research audience that will read themes and quotes, not coefficients.
- You expect the analysis to be revisited as new data lands, not closed off after one writeup.
The six phases of thematic analysis
Braun and Clarke's original 2006 paper named six phases. They are sequential, but the process is recursive: phase four often sends you back to phase two. Treat them as a discipline, not a checklist.
01 · Familiarize yourself with the data
Read every transcript at least once before you tag anything. For voice studies, listen as well. Read at full attention, not at 1.5x, and not while doing something else. The familiarization pass is not optional. It is the only phase where you let the dataset shape your mental model instead of imposing your model on the dataset.
Take notes on what stood out, in your own words, while still raw. These notes are not codes. They are the field of impressions the formal coding pass will later confirm or contradict. Skipping this phase is the most common cause of theme inflation: you start coding immediately, the first few transcripts anchor the taxonomy, and by transcript thirty you are forcing the data into shapes that no longer fit.
02 · Generate initial codes
Open coding means attaching a short label (a few words, at most a short phrase) to every passage that says something about the research question. The discipline here is to stay close to the participant's words. "Didn't trust the price would actually go through" is a code that carries reasoning. "Trust" is a label that throws the reasoning away.
Two grammars are useful in the first pass:
- In-vivo coding uses the participant's verbatim phrase as the code. Best for early passes when you want the dataset's grammar to surface.
- Descriptive coding uses a short researcher gloss. Useful once you have enough in-vivo codes to see categories starting to form.
Most teams do both, in that order. Codes can be semantic (what the participant said) or latent (what the participant implied). Tag both, but flag the latent ones. They are the codes a reviewer will challenge later. Expect the first-pass coding list to be long, messy, and overlapping. That is the right shape. Merging happens in phase three, and you cannot un-merge well.
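If the codes live in a script or a spreadsheet export rather than on sticky notes, it helps to store each one as a small record so the in-vivo/descriptive and semantic/latent distinctions stay visible. The sketch below is illustrative only: the field names, the participant ID scheme, and the second sample passage are invented, not part of Braun and Clarke's method or any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    participant_id: str   # hypothetical ID scheme, e.g. "P07"
    passage: str          # verbatim transcript excerpt
    label: str            # the code itself
    kind: str             # "in_vivo" or "descriptive"
    latent: bool = False  # True when the code records what was implied


def latent_codes(codes):
    """Return the codes a reviewer is most likely to challenge."""
    return [c for c in codes if c.latent]


# Invented sample data for illustration.
codes = [
    Code("P07", "I didn't trust the price would actually go through.",
         "didn't trust the price would actually go through", "in_vivo"),
    Code("P07", "I took a screenshot of the cart, just in case.",
         "expects the total to change", "descriptive", latent=True),
]

for c in latent_codes(codes):
    print(c.participant_id, "|", c.label)
```

Keeping the verbatim passage next to the label means a later merge can always be checked against what the participant actually said.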
03 · Search for themes
Once every transcript has been coded, the codes go on a wall, a Miro board, or a spreadsheet column, and you start clustering. Codes that point at the same underlying idea become a candidate theme. Codes that don't fit anywhere become their own pile, which you'll either promote or discard later. This is where qualitative analysis stops feeling like reading and starts feeling like sorting.
A theme is a pattern, not a topic. "Pricing" is a topic. "First-time users don't trust the price until they see the confirmation email" is a theme. The first describes the data; the second explains it. Aim for the resolution at which a theme can be stated in one sentence with one verb.
Affinity diagramming is one common technique for this phase, especially in workshop settings. Whatever the tooling, the test is the same: the cluster has to mean something, and you should be able to defend that meaning when someone asks why those codes belong together.
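Once the clustering has been done by hand, a few lines of code can hold the result and flag clusters that rest on a single participant. This is a sketch under assumptions: the code-to-theme mapping is the researcher's own judgment entered manually (nothing here clusters automatically), and every label and participant ID is invented.

```python
from collections import defaultdict

# (participant_id, code_label) pairs from the coding pass; invented sample data.
coded = [
    ("P01", "price changed at checkout"),
    ("P02", "didn't trust the total"),
    ("P03", "price changed at checkout"),
    ("P02", "wanted a monthly plan"),
]

# The researcher's manual clustering: code label -> candidate theme.
assignment = {
    "price changed at checkout": "trust drops when the price changes",
    "didn't trust the total": "trust drops when the price changes",
    "wanted a monthly plan": "billing cadence mismatch",
}


def participants_by_theme(coded, assignment):
    """Map each candidate theme to the set of participants who support it."""
    themes = defaultdict(set)
    for participant, label in coded:
        # Codes the researcher has not placed yet go to their own pile.
        theme = assignment.get(label, "unclustered")
        themes[theme].add(participant)
    return dict(themes)


themes = participants_by_theme(coded, assignment)
single_voice = sorted(t for t, people in themes.items() if len(people) == 1)
print(single_voice)
```

A single-participant cluster is not automatically wrong, but it is the first candidate to promote or discard.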
04 · Review and refine themes
Themes get tested at two levels. First, against the coded passages: does every passage tagged with this theme really belong? Move the ones that don't. Second, against the entire dataset: does this theme hold up when you re-read the transcripts looking for it specifically? If the theme is real, you will see it confirmed in places you hadn't coded yet. If it isn't, you'll see it falling apart.
This is also where you run the negative case. For each candidate theme, find the participant who pushes back on it, then read their transcript more carefully than the participants who agreed. There is almost always at least one. The negative case sharpens the theme statement, because you now have to write it in a way that accounts for the dissenting voice. A theme that says "users don't trust the checkout flow" falls apart the moment a reviewer asks "all of them?". A theme that says "first-time users on mobile lose trust when the price changes between cart and checkout; returning users skip past it" survives.
Some teams use an inter-coder reliability check at this phase, where a second researcher codes a subset of transcripts independently and the two compare. It is heavier than most product teams will run, but it catches the bias of a single perspective.
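This guide does not name a statistic for that comparison. One widely used option is Cohen's kappa, which corrects raw agreement for the agreement two coders would reach by chance. A minimal sketch, assuming each coder assigned exactly one code per passage and using invented labels:

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same passages in order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    # Share of passages where both coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six passages coded independently by two researchers.
a = ["trust", "trust", "price", "price", "effort", "trust"]
b = ["trust", "price", "price", "price", "effort", "trust"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))
```

The passages where the two coders disagree are usually more useful than the score itself: they show which code definitions are still ambiguous.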
The output of phase four is usually three to seven themes. Fewer than three and you under-coded. More than ten and you have split the data too finely, into themes that don't cut clean enough to act on.
05 · Define and name themes
Each theme needs a one-sentence definition (what it is and what it is not) and a name that a non-researcher will read once and remember. "Trust collapse at price change" names a theme. "Trust" is a header in a deck.
The naming pass is where the analysis stops being internal and starts being communicable. Write the definition before the name. The definition forces precision; the name distills it. Then re-read the codes under the theme: if the codes don't all support the definition, either the definition is wrong or the codes belong elsewhere. Move them now, not later.
Each theme also needs three to six verbatim quotes attached, with the participant ID and the audio timestamp where the recording exists. Not paraphrases. Not summaries. The actual sentence the participant said. The reason verbatim matters isn't pedantic: a finding lands when the audience hears the participant in the participant's words. "Several participants mentioned trust concerns" is forgettable. "I just kept thinking, what if I press this button and the price changes?" with thirteen seconds of audio behind it is not.
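The requirements of this phase (a definition, a name, and three to six verbatim quotes with participant IDs and timestamps) are easy to check mechanically before the writeup. A sketch, with the field names and the sample timestamp invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Quote:
    participant_id: str
    text: str                              # verbatim, never a paraphrase
    audio_start_s: Optional[float] = None  # None when no recording exists


@dataclass
class Theme:
    name: str
    definition: str
    quotes: list = field(default_factory=list)

    def problems(self):
        """List what still blocks this theme from going into the writeup."""
        issues = []
        if not self.definition.strip():
            issues.append("missing definition")
        if not 3 <= len(self.quotes) <= 6:
            issues.append(f"has {len(self.quotes)} quotes, expected 3 to 6")
        if any(not q.participant_id for q in self.quotes):
            issues.append("quote without participant ID")
        return issues


theme = Theme(
    name="Trust collapse at price change",
    definition="First-time users lose trust when the price changes "
               "between cart and checkout.",
    quotes=[Quote("P07", "I just kept thinking, what if I press this "
                         "button and the price changes?", 312.0)],
)
print(theme.problems())
```

An empty list from `problems()` means the theme has its receipts attached; it says nothing about whether the theme is true.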
06 · Write the analytic narrative
The deliverable is not the analysis. The deliverable is whatever changes on Monday because the analysis happened. Before you write up anything, write one sentence that names the decision the team will make differently because of this study. If you can't, the analysis isn't done; you've stopped at description.
The writeup should foreground the themes, not the methodology. Three to five themes, each one sentence, each tied to a roadmap question. Two to three quotes per theme, with audio links if you have them. A short methodology section at the back for the reviewer who wants to see the corpus, the sampling, and the coding approach. The bulk of the document is the themes; the methodology is a footnote that exists to make the themes defensible.
The pull-quote that the deck remembers is rarely the one with the cleanest sentence. It is usually the one the participant interrupted themselves to revise. Tag for that one.
"I just kept thinking, what if I press this button and the price changes again."
A worked example
A team running churn interviews on a SaaS product asks five questions to twenty participants who downgraded or cancelled in the last sixty days. The voice answers run to about thirty thousand words of transcript across the dataset.
Phase one is a Tuesday morning. The researcher listens through every recording at 1x with the transcript open in a second window. Two pages of impressionistic notes. No codes yet.
Phase two takes Wednesday and Thursday. The researcher tags passages, mostly in-vivo, mostly short. By Thursday afternoon there are 287 codes, many of them near-duplicates. That's expected.
Phase three happens Friday in a Miro board. Codes cluster into seven candidate themes. Two of them feel weak (single-participant) and get demoted to "single-voice findings". Five strong themes survive into the weekend.
Phase four runs Monday: re-read every transcript with the five themes in mind. Two themes hold. Two collapse into a stronger combined theme. One is reframed when a negative case forces a tighter definition. Final count: four themes.
Phase five is Monday afternoon: each theme gets a definition, a name, and four to six verbatim quotes with timestamps. Audio clips attached.
Phase six is a Tuesday writeup: four themes, four roadmap questions, one decision the team will make differently. The deck is six slides. The methodology appendix is three.
Total elapsed time: a researcher week. Total focused time: roughly twelve hours.
Common mistakes that kill thematic analysis
Coding in the researcher's voice, not the participant's
"User confused by checkout" is a designer's gloss. "Didn't trust the price would actually go through" is the participant's reasoning. The first is fast to write and useless six months later. The second carries the why. The way to avoid the trap is to write the first ten codes for any new transcript in the participant's own words, then allow descriptive codes once the in-vivo vocabulary is set. The leakage of researcher language into the codebook is what later produces a deck that confirms the team's prior beliefs rather than the participants' actual reasoning. The framing of the original research questions matters here too: a leading question produces leading codes, which produce a leading theme.
Stopping at the first pattern that confirms the brief
The first pattern that surfaces is usually the one you went in looking for. Tag it, but keep coding. The interesting themes are usually the second and third surprises, which only show up after you have coded past the easy answer. A useful heuristic: if no theme in your final list contradicts the brief that started the study, you have probably stopped too early.
Skipping the negative case
A theme without a negative case is a hypothesis dressed as a finding. The participant who disagrees is the participant who teaches you where the boundary of the theme actually is. Read them harder than the participants who agree. The negative case test is the single most underused move in qualitative work; it's also the move that separates a thematic analysis from a confirmation summary.
Theme inflation
Ten themes is not better than four. Themes that don't cut clean enough to act on are research weight without research signal. If you can't say a theme in one sentence with one verb, it's a topic, not a theme. Demote it to context and keep moving.
Treating the writeup as the deliverable
A theme deck that doesn't change a decision is a description, not research. Write the decision sentence first, then the themes that support it. If the decision sentence doesn't appear, the analysis isn't done. This is the discipline that distinguishes a synthesis that ships from one that gets bookmarked and forgotten.
How AI changes thematic analysis
Thematic analysis was designed in a pre-LLM world, and the six phases reflect that. Phases one, two, and five are mechanical and slow. Phases three, four, and six are interpretive and irreducible. An AI pass on the mechanical phases changes the math without removing the researcher from the interpretive ones.
Talkful is AI-powered async user research for product teams. Participants answer in voice, text, choice, or rating. An AI interviewer asks follow-ups in real time, and a synthesis engine streams themes, quotes, and citations back as the responses land, ready for the team to ship from or for the agents you build with to act on. For thematic analysis specifically, three things compress:
- Phase one (familiarization) stays a human step, but the per-response sentiment, summary, and timestamped quote are already attached when you open the response, which means the listening pass is supported instead of starting from blank.
- Phase two (open coding) is where AI does the most work. Each response arrives with first-pass codes inferred from the transcript, anchored to the audio timestamps. The researcher accepts, edits, or rejects them; the labor saved is on the bulk-tagging that used to take Wednesday and Thursday.
- Phase five (define and name themes) gets the verbatim quotes and audio clips attached automatically. The researcher writes the definition and the name; the receipts are already in place.
The phases the researcher still owns are phase three (clustering codes into themes), phase four (the negative case test), and phase six (the decision sentence and the writeup). These are the parts where judgment changes the answer, and they are not the parts a language model is good at. A common failure mode in 2026 is to delegate phase three to a model and then present the result as a finding. The model is doing pattern matching, not theme building, and the difference shows up in how the deck holds under stakeholder questioning.
Adaptive probing is also part of the data the analysis runs on. In Talkful, probing depth is configurable per question (shallow, medium, or expert), so the researcher chooses how aggressively the AI interviewer follows up on vague or contradictory answers. Expert depth produces richer transcripts (more context, more reasoning, more negative cases surfaced in the interview itself), which makes phase four easier later. Shallow depth is fine for high-volume in-product feedback where dropoff matters more than depth. The choice is a methodology decision the researcher owns; the product just enforces it.
Because a Talkful study link is a standing instrument rather than a one-shot campaign, the corpus that thematic analysis runs on can be continuous. A link sitting in the cancel flow, the post-onboarding email, or a help-menu affordance feeds new transcripts into the same study every week. The themes evolve as the data evolves, which is the original spirit of recursive thematic analysis rather than the freeze-and-deliver simplification most teams settle for.
FAQ
What is the difference between thematic analysis and content analysis?
Thematic analysis is interpretive and produces named themes that explain the data. Content analysis is descriptive and produces frequency counts of pre-defined categories. A thematic analysis on a churn study might surface a theme like "trust collapses at the price-change moment"; a content analysis would tell you how many participants mentioned the word "price". The two methods answer different questions and are often combined, with thematic analysis on the open-ended sections and content analysis on the categorical ones.
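The content-analysis half of that comparison is the part that reduces to a few lines of code. A sketch with invented transcripts, counting participants who used a given word at least once:

```python
import re


def participants_mentioning(transcripts, word):
    """Return IDs of participants whose transcript contains the whole word."""
    # Word boundaries keep "price" from matching inside "priceless".
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sorted(pid for pid, text in transcripts.items()
                  if pattern.search(text))


# Invented sample data: participant ID -> transcript text.
transcripts = {
    "P01": "The price jumped at checkout.",
    "P02": "Support never replied.",
    "P03": "Priceless, honestly. But the PRICE was fine.",
}
mentions = participants_mentioning(transcripts, "price")
print(len(mentions), "of", len(transcripts), "participants mentioned 'price'")
```

The count says how often the word came up. It does not say that P01 and P03 meant opposite things by it, which is the question thematic analysis answers.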
How long does thematic analysis take?
For a study with twenty voice transcripts averaging a thousand words each, a full inductive thematic analysis across all six phases takes one experienced researcher between ten and fifteen hours of focused work, usually spread across three to five days to allow for the recursive moves between phases. AI assistance on phases one, two, and five compresses that by roughly 40-60% in our experience, but phases three, four, and six (the interpretive ones) take the time they always have.
How many participants do you need for thematic analysis?
Saturation on a homogeneous group typically lands between six and twelve interviews, following Guest, Bunce and Johnson's empirical work on saturation in qualitative interviewing. For mixed recruits or comparative studies, plan for more. The practical signal is when the next transcript stops adding new codes to your list. If transcript twelve still adds five new codes, you're not at saturation; if it adds none, you are. See our companion piece on sample sizing for the longer argument.
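That signal can be tracked as a running count of codes not seen in any earlier transcript. A sketch with invented code labels; it assumes near-duplicate codes have already been normalized to the same string, which is itself a judgment call:

```python
def new_codes_per_transcript(transcripts):
    """For each transcript, in order, count codes not seen in earlier ones."""
    seen = set()
    counts = []
    for codes in transcripts:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Invented sample data: one list of code labels per transcript.
transcripts = [
    ["price changed", "no confirmation email"],
    ["price changed", "support was slow"],
    ["support was slow", "price changed"],
]
print(new_codes_per_transcript(transcripts))  # [2, 1, 0]
```

A single zero is weak evidence. A run of zeros across several transcripts from the same recruit group is the stronger signal.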
Can AI do thematic analysis on its own?
It can do parts of it. Recent peer-reviewed work shows large language models at substantial agreement with human researchers on first-pass coding and quote extraction, with much weaker performance on theme clustering and the negative case test. The current honest position is: use the model for the bulk tagging, keep the researcher in the loop for the interpretive phases, and never present an LLM-generated summary as a thematic analysis. Model-assisted coding with a researcher doing the interpretation is research; an LLM-generated summary on its own is description.
What is the difference between coding and themes?
A code is a short label attached to a specific passage of transcript ("didn't trust the price would actually go through"). A theme is a pattern that emerges when you cluster codes that point at the same underlying idea ("first-time users on mobile lose trust at price change"). Codes live at the sentence level and are descriptive; themes live at the study level and are explanatory. The coding pass is slow and exhaustive; the theming pass is fast and opinionated.
Is thematic analysis qualitative or quantitative?
Qualitative. Thematic analysis is one of the most-used methods in qualitative research because it sits between rigid frameworks like grounded theory and looser interpretive approaches. It can be combined with quantitative analysis (a theme's prevalence across the corpus can be counted, sentiment can be scored), but the named themes themselves are interpretive findings, not measurements.
A thematic analysis is honest when the deck answers the same question the dataset answered, and dishonest when the deck answers the question the team wanted answered. The discipline lives in the recursive moves between phases two, three, and four. The reason we built Talkful is that the mechanical phases (the listening, the coding, the quote pulling) used to consume the time the interpretive phases needed. AI compresses the mechanical phases so the researcher can spend more time on the parts that change the answer. The free plan is enough to run a first study and watch the synthesis pipeline build the corpus in real time. What the team does with the themes is still the human's job.