How to run an affinity diagram for user research

How to run an affinity diagram that ends in a decision, not a wall of sticky notes: the KJ method, why workshops stall, and what real synthesis looks like.

Rizvi Haider · 18 min read · Updated June 16, 2026

Forty-three sticky notes on a meeting-room window, half of them in handwriting nobody can read by Thursday, the rest paraphrased from transcripts by whoever happened to be in the room on Tuesday. The PM looks at the wall, says "this was a good workshop", and the notes get photographed for the wiki. Nothing in the product roadmap changes. This is the moment most affinity diagrams die.

The method is older and better than the workshops that misuse it. An affinity diagram, run properly, is one of the most reliable ways to convert a pile of qualitative observations into a small number of decisions a product team can act on. It is also one of the easiest methods to fake, which is why most attempts produce a wall and not a roadmap. This is a working guide on how to run an affinity diagram for user research: what the method actually is, the failure modes that empty it of signal, and the step order that lands at a finding instead of a photograph.

What an affinity diagram is

An affinity diagram is a synthesis method in which a researcher writes individual observations from a body of qualitative data (interview transcripts, voice notes, survey answers, support tickets) onto separate cards, then groups the cards bottom-up by similarity until natural clusters emerge. Each cluster becomes a theme, named in the participants' own language, and the relationships between clusters become the structure of the finding. The method was formalised by Jiro Kawakita in the 1960s as the KJ method and adopted into design thinking by IDEO and the Stanford d.school as one of the canonical synthesis moves.

The point of the method is not the wall. The point is to let the structure of the data emerge before the researcher imposes one on it. Most analysis fails because the categories are decided first and the data is sorted into them; an affinity diagram inverts that, which is why it survives in qualitative research practice more than half a century after it was invented.

Why most affinity diagrams stall

Three failure modes show up across most stalled workshops. They tend to appear together.

The first is starting from paraphrases instead of verbatim. A note that reads "user wants faster onboarding" is the researcher's gloss, not the participant's observation. Cluster a wall of glosses and you get a taxonomy of researcher categories. Cluster a wall of verbatim and you get the structure the participants themselves used. The first version produces a deck that reads like every other deck; the second version produces a finding that surprises the team.

The second is top-down clustering. A facilitator walks in with five buckets already on the wall (Onboarding, Pricing, Support, Performance, Other) and asks the room to sort notes into them. This is filing, not synthesis. The buckets pre-decide the finding. The whole reason to run the method is that you do not yet know what the categories are; you are looking for them. As soon as the buckets exist before the cards do, the method is no longer an affinity diagram.

The third is closing the workshop before the synthesis step. The clusters get photographed, named loosely, and posted to a wiki where nobody reads them again. The clusters are the middle of the work, not the end. A theme that ends as a cluster name without a verbatim quote behind it and a recommendation in front of it is a description of the data, not a research finding. The team gets a wall. The roadmap gets nothing.

How to run an affinity diagram, step by step

Seven steps. Steps one and two are slower than they feel they should be. Steps three through six are where the method earns its name. Step seven is the one most teams skip and the only reason to have done the rest.

01 · Decide what decision the synthesis serves

Before any cards go on a wall, write one sentence that names the decision the synthesis will inform. "Should we rebuild the onboarding flow or extend the existing one?" is a decision. "What did users say about onboarding?" is not. The first is answerable from a clustered wall; the second produces an infinite wall.

A useful affinity diagram is downstream of a research question, not a research dump. If the data was collected without a question in mind (a quarter of support tickets, six months of churn answers, a feedback inbox), the first job is to pick the slice of the data that maps to the decision and ignore the rest until the decision is named. The piece on how to write a user research brief covers the framing side of this.

02 · Extract observations from the data, not summaries

Each card is one observation, in the participant's own language, with the participant ID and a source link attached. This is the boring step that determines whether the rest of the workshop is real.

Three rules for the extraction pass:

  • One observation per card. A card that says "user found checkout confusing and also disliked the pricing" is two cards. Combined cards lose the structure of the data because they pre-cluster two ideas that the participant said in the same breath.
  • Stay verbatim, or as close as the medium allows. A card from a voice transcript reads "I closed the tab when I saw three plans and called my partner". A card that reads "plan-comparison friction" is a researcher gloss and will cluster with other glosses, not with other observations.
  • Attach the source. Every card has the participant ID, the prompt the participant was answering, and a link back to the original transcript or audio. Without the source, the wall is unverifiable the moment somebody asks "who said that?", which is the question that turns a workshop into a critique.
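The three rules above can be enforced at the point a card is created. A minimal sketch, assuming cards live in code or a spreadsheet export instead of on paper; the `Card` class, its field names, and the example URL are hypothetical, not part of any tool mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One observation, in the participant's words, with its source attached."""
    text: str            # verbatim quote, one observation only
    participant_id: str  # who said it
    prompt: str          # the question the participant was answering
    source_url: str      # link back to the transcript or audio (hypothetical field)

    def __post_init__(self) -> None:
        # A card with a blank field cannot answer "who said that?"
        for name in ("text", "participant_id", "prompt", "source_url"):
            if not getattr(self, name).strip():
                raise ValueError(f"card is missing {name}")


card = Card(
    text="I closed the tab when I saw three plans and called my partner",
    participant_id="P07",  # illustrative ID
    prompt="Tell me about the last time you looked at the pricing page.",  # illustrative prompt
    source_url="https://example.com/transcripts/p07#t=312",  # placeholder link
)
```

The check only guarantees that the source fields are filled in. Whether a card holds one observation or two is still a judgment the researcher makes while reading.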

Voice transcripts produce the densest source data here because participants talk longer than they type, and the verbatim is right there in the timestamps. Text answers, choice answers, and rating-plus-comment answers all extract too, in roughly that order of yield. The longer case for keeping the audio in the synthesis loop sits in our piece on the parts of an answer that text loses; for an affinity diagram specifically, the rule is: if your card does not match a verbatim moment in the data, redo the extraction.
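The "matches a verbatim moment" rule is mechanical enough to check automatically, which helps most when the cards were drafted by someone (or something) other than the person who ran the interviews. A rough sketch, assuming transcripts are available as plain text; `normalise` and `is_verbatim` are illustrative names, and the normalisation (lowercasing, dropping punctuation) is an assumption about how forgiving the match should be.

```python
import re


def normalise(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def is_verbatim(card_text: str, transcript: str) -> bool:
    """True if the card's words appear, in order, somewhere in the transcript."""
    # Padding with spaces keeps the match on whole-word boundaries.
    return f" {normalise(card_text)} " in f" {normalise(transcript)} "


transcript = "Um, I closed the tab when I saw three plans, and called my partner."

print(is_verbatim("I closed the tab when I saw three plans", transcript))  # True
print(is_verbatim("plan-comparison friction", transcript))                 # False
```

A card that fails the check is not necessarily wrong (the transcript may have a typo), but it is the one to re-read against the source before it goes on the wall.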

For a study with thirty voice responses across five prompts, expect somewhere between one hundred fifty and three hundred cards. Fewer than that and you under-extracted; more and you have duplicates that will reveal themselves in step four.

03 · Read the wall once before grouping

When all cards are on the wall, the first move is not to group them. The first move is to read them. In silence, for twenty minutes, with no clustering allowed.

This step is the affinity diagram's version of the familiarisation pass in thematic analysis. The reason it exists is the same: the first three or four clusters a workshop produces anchor everything that follows, and if those clusters form on the first ten cards somebody happened to read, the rest of the wall sorts itself around an arbitrary start. Reading the whole wall first makes the structure that emerges in step four actually structural, not the artefact of a reading order.

For voice data, the equivalent of reading is listening, at 1x speed, with the transcripts visible. The hesitation before a word and the laugh that means "I'm being polite" are part of the observation; a wall built from text alone loses them.

04 · Cluster bottom-up, in silence first

The grouping pass starts in silence. Each person in the room moves cards that feel like they belong together into rough piles. No labels yet. No discussion yet. Just movement.

The silent pass exists because the moment somebody says "I think these are about pricing", the wall starts to sort itself by the loudest opinion in the room. Silence keeps the structure honest. After twenty to thirty minutes of silent movement, the natural clusters are usually visible: piles of six to fifteen cards each, with a handful of outliers that do not fit anywhere.

Two patterns to expect:

  • A pile with one or two cards is not yet a theme. It might be a single-participant story worth reading carefully, or it might be an outlier that signals a missing recruit segment. Mark it, do not force it into a larger pile.
  • A pile with more than twenty cards is probably two themes that have not yet separated. Re-read it. The split is usually obvious once the cluster is too big to hold in one sentence.

The output of this step is six to twelve candidate clusters and a small pile of outliers. If you ended at three clusters, you under-clustered (the categories are too broad to act on). If you ended at twenty, you over-clustered (the categories are too narrow to be themes). The right resolution sits between, at the level where each cluster can be named in one sentence with one verb.
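On a digital board, the size checks in this step can be run as a quick audit after the silent pass. A sketch under stated assumptions: `audit_clusters` is a hypothetical helper, piles are a mapping from a working pile name to its card texts, and the cut-offs are the ones given above (one or two cards, more than twenty cards, three clusters, twenty clusters) read as thresholds.

```python
def audit_clusters(piles: dict[str, list[str]]) -> list[str]:
    """Return plain-language warnings for a set of piles (pile name -> card texts)."""
    notes = []
    for name, cards in piles.items():
        if len(cards) <= 2:
            notes.append(f"{name}: {len(cards)} card(s), not yet a theme; mark it, do not force it")
        elif len(cards) > 20:
            notes.append(f"{name}: {len(cards)} cards, probably two themes; re-read and split")

    # Piles of one or two cards count as outliers, not candidate clusters.
    candidates = sum(1 for cards in piles.values() if len(cards) > 2)
    if candidates <= 3:
        notes.append(f"{candidates} candidate clusters: likely under-clustered")
    elif candidates >= 20:
        notes.append(f"{candidates} candidate clusters: likely over-clustered")
    return notes


piles = {
    "left to ask someone": ["card"] * 8,
    "pricing page": ["card"] * 25,
    "exported to a spreadsheet": ["card"],
}
for note in audit_clusters(piles):
    print(note)
```

The audit only flags piles worth a second look. Whether a large pile really holds two themes is decided by re-reading the cards, not by the count.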

05 · Name each cluster in the participants' phrasing

Once the clusters are stable, each one gets a name. The name is a sentence, not a label. "Users care about pricing" is a label. "First-time visitors compare two or three plans and stall when the differences feel arbitrary" is a cluster name. The first describes a topic; the second describes a finding.

The discipline is to write the name from the verbatim on the cards in front of you, not from the topic the cards seem to be about. If the cards say "I closed the tab", "I wanted to talk to someone first", "I came back the next day", the cluster name is closer to "first-time visitors leave to consult somebody before committing" than to "hesitation at the pricing page". The verbatim names the mechanism, not the symptom.

06 · Map relationships and surface the outliers

A wall of named clusters is the middle of the synthesis, not the end. The next move is to draw the relationships between them: which clusters cause which, which are symptoms of others, which contradict each other, which are downstream of the same root.

Two relationship patterns matter most.

The first is cause and effect. A cluster that reads "users do not trust the checkout total" and one that reads "users open a second tab to recompute the price" are not two themes. They are one mechanism with two behavioural traces. Drawing the arrow between them tells the team where to intervene.

The second is contradiction. A cluster that reads "new users want more guidance" and one that reads "returning users want less" are not in tension if they describe different segments. They are in tension if they describe the same segment at different times. Naming which case you are in is what keeps a synthesis from collapsing into a contradiction nobody can act on.

This is also the step where the outlier cards from step four get tested. An outlier that is genuinely one participant's story stays an outlier. An outlier that turns out to share a mechanism with a larger cluster gets folded in, often with a sharper cluster name as a result. The negative case test from our analysis guide lives here in affinity-diagram form: every named cluster needs at least one verbatim that pushes back on it, or the cluster has not been stress-tested.
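The negative case test is easy to lose track of on a busy wall, so it can help to record it explicitly. A minimal sketch, assuming each cluster keeps its pushback cards separate from its supporting ones; the `Cluster` class and `untested` function are hypothetical, and the pushback quote is an invented placeholder, not data from a study.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    supporting: list[str]                              # verbatim cards behind the name
    pushback: list[str] = field(default_factory=list)  # verbatim cards that cut against it


def untested(clusters: list[Cluster]) -> list[str]:
    """Names of clusters with no verbatim pushing back on them yet."""
    return [c.name for c in clusters if not c.pushback]


clusters = [
    Cluster(
        name="first-time visitors leave to consult somebody before committing",
        supporting=["I closed the tab", "I wanted to talk to someone first"],
        pushback=["I just picked the middle plan and moved on"],  # invented placeholder
    ),
    Cluster(
        name="users do not trust the checkout total",
        supporting=["I opened a second tab to add it up myself"],  # invented placeholder
    ),
]

# Relationships between clusters, recorded as (from, kind, to).
relations = [
    ("users do not trust the checkout total", "causes",
     "users open a second tab to recompute the price"),
]

print(untested(clusters))  # ['users do not trust the checkout total']
```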

07 · Convert clusters into one synthesized decision

The deliverable is not the wall. The deliverable is whatever changes on Monday because the wall happened.

Before writing anything up, name the decision the team will make differently because of the synthesis. Then write three to five findings, each one sentence, each tied to one or two clusters, each with two to three verbatim quotes attached. If a cluster cannot be tied to a decision, it goes in a "things we are still figuring out" section, not into the findings, and the team treats it as a question for the next study.

The structure that travels best:

  • Three to five findings, one sentence each, each anchored to a decision.
  • Two to three verbatim quotes per finding, with participant IDs and audio timestamps when the data is voice.
  • An "unresolved" section listing the outliers, the contradictions, and the clusters that did not earn a decision.
  • No methodology slide at the front. The participant count is in the footer.
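The structure above can double as a pre-flight check before the write-up goes out. A sketch, assuming the findings are drafted as structured data first; `Quote`, `Finding`, and `check_report` are illustrative names, and the numeric bounds are the ones listed in the bullets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    text: str
    participant_id: str
    timestamp: Optional[str] = None  # audio timestamp when the data is voice


@dataclass
class Finding:
    sentence: str        # the finding, one sentence
    decision: str        # the decision it is anchored to
    quotes: list[Quote]


def check_report(findings: list[Finding]) -> list[str]:
    """Return the ways a draft report departs from the structure described above."""
    problems = []
    if not 3 <= len(findings) <= 5:
        problems.append(f"expected 3 to 5 findings, got {len(findings)}")
    for i, f in enumerate(findings, start=1):
        if not f.decision.strip():
            problems.append(f"finding {i} is not anchored to a decision")
        if not 2 <= len(f.quotes) <= 3:
            problems.append(f"finding {i} has {len(f.quotes)} quote(s), expected 2 to 3")
        if any(not q.participant_id.strip() for q in f.quotes):
            problems.append(f"finding {i} has a quote without a participant ID")
    return problems
```

An empty list means the draft has the right shape, not that the findings are right. A finding that fails the decision check belongs in the unresolved section until it earns one.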

The longer version of this synthesis pass is in how to synthesize user research. For an affinity diagram specifically, the test is simple: does the document name a decision, and does each finding ship with its receipts?

"I closed the tab when I saw three plans. I wanted to ask my colleague which one she was on before I picked. By the time I came back the next day, I had forgotten which one I was going to try."

Participant · pricing study · price-anchor probe

A workshop wall would have logged this as "pricing confusion." The affinity-diagram version recovers the mechanism: the participant did not stall on price, she stalled on a comparison that required a colleague. A cluster named for the mechanism produces a different decision than a cluster named for the symptom.

Where AI accelerates affinity mapping, and where it doesn't

The honest answer to "can we just have an LLM do this" is: partially, and the parts that matter least.

A language model is good at the mechanical part of step two: reading a transcript, extracting candidate observation cards, attaching the participant ID and the timestamp, tagging the rough topic. Recent peer-reviewed work on LLM-supported thematic summarisation finds substantial agreement between model-extracted candidate codes and human-coded equivalents, at a fraction of the time cost. The model can also propose draft clusters once the cards exist. Those proposals are useful in the same way a first draft is useful: a starting point you correct, not an answer.

The parts the model cannot do are step four (the silent bottom-up grouping where the structure emerges), step five (the cluster naming in the participants' phrasing), and step seven (the conversion into a decision). These steps require judgment about what matters to the team, knowledge of the roadmap context, and the reflexive engagement with the data that separates research from pattern matching. A bullet-pointed cluster summary generated by an LLM with no researcher in the loop reads plausibly, lands flat, and quietly drops the participant's voice.

This is also where a continuously-running synthesis pass changes the shape of the work. Talkful's synthesis engine is essentially an affinity diagram running in real time: as each response lands, the verbatim gets extracted onto a card, the candidate cluster gets proposed, the existing clusters get re-shaped, and the citations stay attached to the original audio. The structured output (themes, quotes, sentiment, citations) is also agent-ready, so whatever the team builds on top (a weekly digest, a roadmap-tagged inbox, a Slack post when a cluster crosses a threshold) reads from the same synthesis pipeline. The wall is no longer a workshop artefact; it is a living view of the corpus that the team checks the way they check a dashboard.

The model does the boring half. The researcher does the half that decides what the team builds next. The split is the same one that holds for transcript analysis and for continuous discovery interviews more broadly: AI for the volume, humans for the judgment.

Where the affinity diagram lives in a continuous practice

A one-off workshop is the worst possible use of the method. The data is stale by the time the wall is photographed, the team has moved on to the next sprint, and the next pile of qualitative data starts collecting in a separate Notion page until somebody books another off-site.

The method works better as a standing instrument. The same Talkful study link sits in the in-product feedback surface, the cancellation confirmation page, the post-onboarding email, and the team's owned distribution channels (a customer newsletter, a community Slack, a partner round-up). Each placement feeds the same synthesis pipeline, which means the affinity diagram is always partially built. The workshop, when it happens, is now a review of an existing wall instead of a construction from scratch.

The same logic holds inside the company. A PM building a contested feature can share the study link in internal channels (engineering, design, support, legal, exec) and get a synthesized view of every stakeholder's input before the build starts. The affinity wall fills with internal observations the same way it fills with external ones, and the cross-functional disagreement (which usually arrives as a meeting at the wrong time) arrives as a cluster in the synthesis instead. The piece on how to build a customer feedback loop covers the standing-instrument pattern in more depth.

FAQ

What is the difference between an affinity diagram and thematic analysis?

An affinity diagram is a workshop-friendly synthesis method that groups individual observations bottom-up into clusters and names the clusters in the participants' language. Thematic analysis is the broader academic framework (formalised by Braun and Clarke) that includes coding, theming, and synthesis, of which the affinity diagram is one possible technique. In practice, an affinity diagram is often the visual front-end of a thematic analysis: the cards are the codes, the clusters are the themes, and the named clusters are the theme statements. The deeper treatment lives in our piece on thematic analysis for user research.

How many cards should an affinity diagram have?

For a typical study with twenty to thirty voice responses across five prompts, expect one hundred fifty to three hundred cards. The rule is one observation per card, in the participant's own language, so the count is driven by the density of the data rather than a target number. A wall with fewer than fifty cards usually means the extraction step was too aggressive in paraphrasing; a wall with more than four hundred usually means duplicates that have not been merged.

Can I run an affinity diagram remotely?

Yes, and most synthesis now happens on a Miro or FigJam board rather than a physical wall. The remote version preserves the method as long as the silent-grouping discipline survives: cards are extracted in advance, the room reads the wall before clustering, and the first grouping pass happens with everyone moving cards without comment. The risk in remote workshops is that the chat fills with opinions before the silent pass ends; turning the chat off for the first thirty minutes is usually enough to restore the discipline.

Should the AI tag cards or just transcribe them?

Both, with the researcher reviewing the tags before clustering. A language model is reliable at the extraction step (reading a transcript and proposing candidate observation cards with participant IDs and timestamps) and at the first-pass tagging step (suggesting a rough topic for each card). It is not reliable at the clustering step, where the structure of the data should emerge from the cards themselves, not from the model's prior. Use AI for the volume, human judgment for the structure. The longer treatment of probing depth and the AI-versus-researcher split sits in how AI follow-up questions work in user research.

How long does an affinity diagram workshop take?

For a workshop run on data that has already been extracted into cards, the clustering, naming, relationship-mapping, and synthesis steps take two to four hours with two or three team members in the room. If the extraction step happens inside the workshop, add another two to three hours per twenty voice responses. The continuous-synthesis version compresses the workshop because the cards and candidate clusters already exist when the team sits down; the meeting becomes a review and decision session rather than a construction session.
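As a rough planning aid, the ranges above turn into a small calculation. This is a sketch, assuming extraction time scales linearly with the number of voice responses (the text only gives a per-twenty figure); `workshop_hours` is a hypothetical helper.

```python
def workshop_hours(voice_responses: int, extraction_in_workshop: bool) -> tuple[float, float]:
    """Return a (low, high) estimate in hours for the workshop."""
    low, high = 2.0, 4.0  # clustering, naming, relationship-mapping, synthesis
    if extraction_in_workshop:
        batches = voice_responses / 20  # assumption: linear scaling per twenty responses
        low += 2.0 * batches
        high += 3.0 * batches
    return low, high


print(workshop_hours(30, extraction_in_workshop=False))  # (2.0, 4.0)
print(workshop_hours(30, extraction_in_workshop=True))   # (5.0, 8.5)
```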

When does an affinity diagram fail to add value?

When the decision the synthesis serves is already known. If the team has decided what to build and is collecting evidence to justify it, an affinity diagram will surface clusters that contradict the plan and the workshop will end in a debate about which clusters to ignore. The method works best when the decision is genuinely open and the team is willing to follow the structure of the data even when it cuts across the existing roadmap. If that openness is not there, run a different kind of meeting.


Synthesis gets better when the wall stops being a workshop and starts being a view. The seven steps above survive in any tool, on any board, with any team that takes the silence in steps three and four seriously. The reason we built Talkful is that voice answers carry the densest verbatim, real-time synthesis lets the affinity wall stay current instead of stale, and the structured output (themes, quotes, sentiment, citations) is the same shape whether the team reads it directly or hands it to the agents they build on top. The free plan is enough to run one study, watch the synthesis pipeline build the wall as the responses land, and decide for yourself whether the method feels like the workshop version you used to run.