What we hear when we stop asking people to write

Text responses favor confident writers. Voice responses favor honest ones. A note on the quiet difference between typing and talking in user research.

Rizvi Haider · April 19, 2026

The standard user research tool assumes a peculiar kind of person. Someone who is, at the moment you ask, sitting down. Alone. With a keyboard. Willing to type for three minutes about a frustration they half-remember. Comfortable enough with written English (or whatever language your survey shipped in) to compose something coherent in a form field that keeps nagging them for 80 more characters.

That person exists. They answer every survey. They are also, overwhelmingly, not your user.

The text-only research economy

Open any modern research tool and the same affordance greets you: a rectangle. Type here. Be articulate. Be on-brand. Be grammatical. Try to remember what happened last Tuesday.

The rectangle is cheap to build and easy to index, so an entire research economy has grown up around it: Typeform, SurveyMonkey, Dovetail, Maze, Great Question. Every one of them asks the participant for the same thing: translate a messy, embodied experience into written English fast enough that they don't close the tab.

The translation tax is real. It is also heavier for exactly the people you most need to hear from: non-native speakers, people in a hurry, people who don't think of themselves as "writers," people who would answer your question in thirty seconds on the phone but will abandon the form at seventy.

Three things voice surfaces that text doesn't

01 · Hesitation

When someone hesitates inside a text box, you don't see it. The cursor blinks, they delete a sentence, they try again, and what you receive is the third attempt, already edited into something presentable. The hesitation itself, the part that would have told you what the real blocker is, never makes it out of the draft.

When someone hesitates on a voice recording, you hear it. The pause before the word "budget." The "um" right before "honestly, I just...". The sentence that trails off because they realized halfway through that their reason wasn't the reason they thought it was.

Hesitation is signal. Text erases it. Voice preserves it.

"I mean, I... [pause]... I guess I just didn't trust that it would actually work? Which sounds stupid when I say it out loud."

Participant · #2814 · talking about onboarding

Notice what that quote would have become in text: "I didn't trust the product." Clean, forgettable, actionable by nobody. The pause is where the insight lived.

02 · Enthusiasm

The other asymmetry is on the positive side. Enthusiasm in text is a row of exclamation marks, which every researcher learns to discount because they arrive whether the person meant them or not. Enthusiasm in voice is a change of pace: someone going from slow and careful to fast, leaning into the microphone. You can hear it before they finish the sentence.

We have stopped tagging sentiment in transcripts as a separate analytic step. It's already in the clip. The transcript tells us what they liked; the clip tells us how much.

03 · Stories

Ask a participant to describe how they use your product in a text box, and you'll get a summary: three bullet points, present tense, generalized. Ask them out loud and you'll get a story: "So last Thursday I was trying to... wait, let me start over. We had this launch and..."

Stories are chronologically specific in a way that summaries aren't, and chronological specificity is where most real product decisions come from. A summary tells you the user uses a feature. A story tells you what they tried first, what failed, what they fell back on, and when they finally gave up.

Where text still wins

Text is better than voice in two specific cases, and it's worth naming them so this essay doesn't read as triumphalist.

The first is ranking. If your question is "which of these five prices feels right," a radio button is better than a recording. Voice is bad at sorting because spoken language doesn't compress into ordered lists without a lot of wasted breath.

The second is numbers. Asking someone how many times they used a feature last week is a survey question, not a research question. Use a survey. If you find yourself asking a lot of those, you might not need research. You might need analytics.

The Talkful position isn't that text is dead. It's that text is wildly overused in qualitative research, and that the tools have conditioned everyone (participants and PMs alike) to believe the rectangle is the only way to listen.

What changes when the medium changes

Three things change the day you switch from typed answers to voice answers, and you should know all three before you commit.

The mental model moves from survey to conversation. Participants stop writing sentences to impress you and start talking to a person. The tone of the data shifts before the content does.
Analysis gets harder, then easier. Raw audio is harder to work with than raw text: there's more of it, it's less searchable, and you can't Ctrl-F it. But an LLM given good transcripts plus timestamps produces far richer synthesis than one fed form responses, because there's more in them to synthesize. The work isn't heavier. It's redistributed.
You find out which of your questions were actually surveys. When the medium is voice, the questions that fail are the ones that could have been multiple-choice. That's a useful filter: it forces a research plan to focus on things worth hearing a human answer.

A small bet

We built Talkful on the suspicion that most of the research industry has optimized for the wrong constraint: the ease of collecting text rather than the honesty of what gets collected. We might be wrong. We're publishing what we learn in either direction, because we'd rather be corrected in public than comfortable in private.

If you run user research and you've been getting thin responses from forms, try it out loud. You don't have to believe us about the 2.7×. Run five participants, compare the transcripts side by side, and notice how much more of your data is made of actual sentences people would say to another person.

That's the only test that matters. Ship what you hear.