How to run pricing research that holds up at launch
How to run pricing research that returns a price band, not a flattering number: the method, the traps that hide demand, and what voice catches.
A common scene: a product team picks a price by reading the competitor's pricing page on a Tuesday, ships it with the next release, watches conversion drop, runs a one-week survey asking customers "would you pay $X for this?", reads back 73 percent yeses, holds the price, and six weeks later notices the same conversion drop. The survey said yes. The market said no. The team blames the new positioning, ships a discount, and runs the same kind of survey on the next tier.
The structural problem is not the team's. It is the method. Pricing research, as most teams run it, measures whether a number sounds reasonable in a quiet moment rather than whether a customer would pay it when the alternative is a real one. This is a working guide on pricing research that earns its conclusion: what the method actually is, the three failure modes that fake the result, how to run the study you can decide from, and what voice catches that text answers leave on the floor.
What pricing research is
Pricing research is a set of methods that estimate, before the price is set, how much a target customer would pay, where the demand starts to fall off, and which segments will absorb a higher tier without leaving. The good version measures three things: the band of acceptable prices for a segment, the anchor customers are mentally comparing against, and the reversal point at which the value proposition stops working. The bad version measures one thing, the average willingness to pay, and treats it as a decision.
Pricing research sits between concept testing and the product-market fit survey on the build timeline. Concept testing asks whether a framing of the solution lands. The product-market fit survey measures, after revenue, whether losing the product would disappoint the customer. Pricing research answers the question between them: at what number does the demand survive, and at what number does it collapse. The best short reference on the discipline is Madhavan Ramanujam and Georg Tacke's Monetizing Innovation, which makes the case that price should be designed alongside the product rather than guessed at launch.
Why most pricing research lies to you
Three failure modes show up across most pricing studies that produced a confident number and a disappointing launch. Each one is structural, not effort-related, and they tend to appear together.
The first is detached price anchoring. A participant who is asked "would you pay $49 a month for this?" in a survey is comparing $49 to nothing in particular. At most, they are comparing it to the abstract idea of $49, which most people find tolerable. The same $49 compared to the $0 they currently pay for a competitor's free tier returns a different answer. The price that sounds reasonable in a vacuum is rarely the price that converts against an alternative the participant is already using.
The second is measuring intent without action. "How likely are you to pay $X" produces numbers with little predictive value, for the same reason concept-test intent scores do not predict launches. Saying yes is cheap; paying is not. A pricing study that does not anchor to an action the participant would have to take (sign up at this price now, switch from a named current solution, enter a card) measures politeness, not demand.
The third is averaging across segments. The single biggest move available to most pricing studies is to stop reporting one number for the population and start reporting bands per segment. A blended willingness-to-pay score across enterprise and self-serve is not a price; it is two prices averaged into a number that describes neither. The team that ships from the blended average is implicitly choosing to mis-price one of the two segments, usually the one with the higher margin.
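A small arithmetic sketch of the averaging problem. The segment names and the stated willingness-to-pay figures below are invented for illustration, not data from any study; the point is only that the blended mean can land outside every segment's own range.

```python
from statistics import mean

# Hypothetical stated willingness to pay (USD per month) for two segments.
wtp_by_segment = {
    "self_serve": [15, 19, 19, 22, 25],
    "enterprise": [180, 200, 220, 240, 260],
}

def blended_average(wtp):
    """Mean across every participant, ignoring segment."""
    return mean(v for values in wtp.values() for v in values)

def segment_ranges(wtp):
    """(lowest, highest) stated willingness to pay, per segment."""
    return {seg: (min(vals), max(vals)) for seg, vals in wtp.items()}

blended = blended_average(wtp_by_segment)   # 120
ranges = segment_ranges(wtp_by_segment)     # self_serve: (15, 25), enterprise: (180, 260)

# True when the blended number falls inside no segment's own range.
describes_no_segment = all(not (lo <= blended <= hi) for lo, hi in ranges.values())
```

With these made-up numbers the blended figure is $120: far above anything the self-serve cohort named and below anything the enterprise cohort named.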
How to run pricing research, step by step
Six steps. The order is opinionated: step four (the method choice) is where most teams want to start, and starting there is what produces the study that fakes the result. Steps one through three are what make step four worth running.
01 · Decide which pricing question you're answering
The phrase "pricing research" covers four distinct decisions, and the method that fits one of them is usually the wrong tool for the others. Picking the question explicitly, before recruiting, is the cheapest move available in the entire study.
- Anchor. "What is the customer comparing us to in their head?" A qualitative pass. The output is a list of reference prices the customer already has in mind (competitors, adjacent tools, internal costs the product replaces).
- Band. "What range of prices does this segment find tolerable?" The output is a price band per segment, not a single number. Van Westendorp's Price Sensitivity Meter is the canonical instrument here.
- Elasticity. "How does demand fall off as the price rises?" The Gabor-Granger method walks participants through a sequence of prices and records the drop in stated willingness to buy at each step.
- Tiering. "Which features belong in which tier, and what is the right gap between tiers?" A conjoint or feature-prioritisation pass. Adjacent to pricing but methodologically its own thing.
If the team has not picked which of the four questions the study answers, the result will land in the wrong bucket. A van Westendorp output cannot answer a tiering question, and a conjoint study cannot answer an anchor question. The right tool depends on which decision the team owes the company.
02 · Recruit participants who currently pay for the alternative
Pricing research on people who do not currently pay for anything in the category returns a fantasy number. The willingness-to-pay answers from non-payers are systematically too low, because they have no behavioural reference point, and the willingness-to-pay answers from your existing happy customers are systematically too high, because they already chose you. The right cohort is people who currently pay for a competitor, a workaround, or an adjacent tool that solves part of the same problem.
The screener is built against the spend, not the persona. "Which of these tools do you currently pay for?" is a screener. "Would you be interested in a tool that does X?" is not. The first filters for people with observable purchasing behaviour in the category. The second filters for tire-kickers.
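As a rough sketch of a spend-based screener, the filter below qualifies a participant only on what they report paying for today. The tool names, the `ScreenerResponse` shape, and the minimum-spend threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical list of paid alternatives in the category.
CATEGORY_TOOLS = {"CompetitorA", "CompetitorB", "SpreadsheetAddon"}

@dataclass
class ScreenerResponse:
    participant_id: str
    paid_tools: dict = field(default_factory=dict)  # tool name -> monthly spend (USD)
    interested_in_concept: bool = False              # collected, but not used to qualify

def category_spend(resp, category_tools=CATEGORY_TOOLS):
    """Total monthly spend on tools that belong to the category."""
    return sum(amount for tool, amount in resp.paid_tools.items() if tool in category_tools)

def qualifies(resp, min_monthly_spend=1.0):
    """Qualify on observable spend, never on stated interest."""
    return category_spend(resp) >= min_monthly_spend

responses = [
    ScreenerResponse("p1", {"CompetitorA": 19.0}),
    ScreenerResponse("p2", {}, interested_in_concept=True),      # interested, pays nothing
    ScreenerResponse("p3", {"UnrelatedTool": 40.0}),             # pays, but outside the category
    ScreenerResponse("p4", {"SpreadsheetAddon": 5.0, "CompetitorB": 12.0}),
]
qualified_ids = [r.participant_id for r in responses if qualifies(r)]
```

The `interested_in_concept` field is kept in the record only to make the design choice visible: it is deliberately ignored by `qualifies`.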
Eight to twelve participants per target segment is usually enough for the qualitative passes (anchor, tiering). For the quantitative passes (van Westendorp, Gabor-Granger), the rule of thumb is closer to 30 to 50 per segment, because the instruments rely on distributional shape rather than thematic saturation. The operational side of recruiting that cohort without polluting the sample is covered in how to recruit user research participants.
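The per-segment rule of thumb turns into a recruiting budget with simple multiplication. The sketch below assumes, conservatively, that no participant is reused across methods; the function and key names are illustrative.

```python
# Per-segment ranges taken from the rule of thumb above.
QUAL_RANGE = (8, 12)     # anchor, tiering
QUANT_RANGE = (30, 50)   # van Westendorp, Gabor-Granger

METHOD_KIND = {
    "anchor": "qual",
    "tiering": "qual",
    "van_westendorp": "quant",
    "gabor_granger": "quant",
}

def participant_budget(segments, methods):
    """Return (low, high) total participants, budgeted per segment per method.

    Assumption: each participant answers exactly one method.
    """
    low = high = 0
    for method in methods:
        lo, hi = QUAL_RANGE if METHOD_KIND[method] == "qual" else QUANT_RANGE
        low += lo * len(segments)
        high += hi * len(segments)
    return low, high

budget = participant_budget(["self_serve", "enterprise"], ["anchor", "van_westendorp"])
```

For two segments running an anchor pass and a van Westendorp pass, this gives a range of 76 to 124 participants.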
03 · Anchor the price to the value they already buy
The single biggest move available in a pricing study, after recruitment, is to put the price in front of the participant in the same context they would actually see it. A price floated as a standalone number ("$49 a month") returns a politeness answer. The same price anchored to the value it would replace ("$49 a month, in place of the $19 you currently pay for X plus the 4 hours a week you spend stitching it to Y") returns a behavioural one.
The brief that anchors well has three pieces: the specific alternative the participant is using today, the specific work that goes away if they switch, and the specific outcome they would gain. The brief that anchors badly has one piece: the price. The first version reads the way a customer would describe the trade to a colleague. The second reads like a pricing page.
For the qualitative passes, the anchor lives in the question text. For the quantitative passes (van Westendorp's four price questions, Gabor-Granger's sequence), the anchor lives in the framing block above the questions. Either way, the price is never asked about in the abstract.
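One way to keep the price from ever appearing in the abstract is to refuse to render it without its three anchoring pieces. This is a minimal sketch: the `Anchor` fields, the wording template, and the example outcome text are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Anchor:
    current_tool: str      # the specific alternative used today
    current_price: float   # what the participant pays for it per month (USD)
    work_removed: str      # the specific work that goes away on switching
    outcome_gained: str    # the specific outcome they would gain

def framing_block(price, anchor):
    """Render a price together with the value it would replace."""
    if not all([anchor.current_tool, anchor.work_removed, anchor.outcome_gained]):
        raise ValueError("A price without its anchor is a pricing page, not a brief.")
    return (
        f"${price:g} a month, in place of the ${anchor.current_price:g} you currently "
        f"pay for {anchor.current_tool} plus {anchor.work_removed}. "
        f"In return: {anchor.outcome_gained}."
    )

anchor = Anchor(
    current_tool="X",
    current_price=19,
    work_removed="the 4 hours a week you spend stitching it to Y",
    outcome_gained="one report instead of two exports",  # placeholder outcome
)
text = framing_block(49, anchor)
```

The same rendered block can sit in the question text for a qualitative pass or above the four van Westendorp questions for a quantitative one.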
04 · Pick the method that fits the decision
Match the method to the question from step one. Mixing methods in a single study, or running the wrong one for the question, produces results that look quantitative and decide nothing.
- Anchor question. Run an open-ended qualitative interview, voice or text, with eight to twelve participants per segment. Ask what they currently pay for in the category, what they would compare a new entrant to, and what number would feel suspicious (too cheap to be real, too expensive to consider). The output is a list of reference prices in the participant's own words.
- Band question. Run van Westendorp's Price Sensitivity Meter: four price-anchored questions per participant ("at what price would this be so expensive you would not consider it?", "at what price would this be expensive but you would still consider it?", "at what price would this be a bargain?", "at what price would this be so cheap you would question the quality?"). 30 to 50 participants per segment. The output is an acceptable price band, the optimal price point, and the point of marginal cheapness.
- Elasticity question. Run Gabor-Granger: walk each participant through a sequence of anchored prices, shown one at a time with the order randomised across the sample, and record stated willingness to buy at each. 30 to 50 participants per segment. The output is a demand curve, with the elasticity coefficient as the headline.
- Tiering question. Run a feature-prioritisation pass or a small conjoint. For most product teams, a forced-choice prioritisation across feature bundles is enough; a full conjoint is overkill below the 200-participant level.
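For the band question, the four van Westendorp answers can be turned into a band with a few cumulative shares. The sketch below is a simplified reading of the Price Sensitivity Meter: it evaluates the curves on a fixed price grid and takes the first grid price where a rising curve meets a falling one. Which curve pairs define each point varies between write-ups; the pairing used here (marginal cheapness at "too cheap" against "expensive", marginal expensiveness at "too expensive" against "bargain") is one common convention and is an assumption, as are the sample answers.

```python
def share_at_or_above(values, price):
    """Falling curve: share of participants whose threshold is at or above price."""
    return sum(v >= price for v in values) / len(values)

def share_at_or_below(values, price):
    """Rising curve: share of participants whose threshold is at or below price."""
    return sum(v <= price for v in values) / len(values)

def first_crossing(grid, falling, rising):
    """First grid price at which the rising curve meets or exceeds the falling one."""
    for price in grid:
        if share_at_or_below(rising, price) >= share_at_or_above(falling, price):
            return price
    return None

def price_sensitivity_meter(responses, grid):
    too_cheap = [r["too_cheap"] for r in responses]
    bargain = [r["bargain"] for r in responses]
    expensive = [r["expensive"] for r in responses]
    too_expensive = [r["too_expensive"] for r in responses]
    return {
        "pmc": first_crossing(grid, too_cheap, expensive),      # point of marginal cheapness
        "opp": first_crossing(grid, too_cheap, too_expensive),  # optimal price point
        "ipp": first_crossing(grid, bargain, expensive),        # indifference price point
        "pme": first_crossing(grid, bargain, too_expensive),    # point of marginal expensiveness
    }

# Invented answers (USD per month) for one segment; far fewer than a real sample.
answers = [
    {"too_cheap": 10, "bargain": 20, "expensive": 30, "too_expensive": 40},
    {"too_cheap": 20, "bargain": 30, "expensive": 40, "too_expensive": 50},
    {"too_cheap": 30, "bargain": 40, "expensive": 50, "too_expensive": 60},
    {"too_cheap": 20, "bargain": 35, "expensive": 45, "too_expensive": 60},
    {"too_cheap": 25, "bargain": 30, "expensive": 50, "too_expensive": 70},
    {"too_cheap": 15, "bargain": 25, "expensive": 35, "too_expensive": 50},
]
psm = price_sensitivity_meter(answers, grid=range(5, 100, 5))
```

On this toy sample the acceptable band runs from `psm["pmc"]` to `psm["pme"]`, that is $30 to $40, with the optimal price point at $35. Run it once per segment rather than on a pooled sample.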
If you are testing multiple price points across multiple variants, decide between monadic and sequential monadic up front. Monadic shows each participant one price; sequential monadic shows each participant a small number of prices one after another. Monadic is cleaner methodologically (no anchoring or order effects) but requires more participants. Sequential monadic is what most product teams actually run; the trick is to randomise the order of presentation and report on the bias separately.
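For the elasticity question, a stated-demand curve and a step-by-step elasticity can be computed directly from yes/no answers at each price. This is a sketch under stated assumptions: the responses are invented, the elasticity uses the midpoint (arc) formula, and `presentation_order` is a hypothetical helper that shows one way to randomise the order per participant reproducibly.

```python
import random

def presentation_order(prices, participant_id, seed="study-1"):
    """Shuffle the price sequence per participant, reproducibly."""
    rng = random.Random(f"{seed}:{participant_id}")
    return rng.sample(list(prices), k=len(prices))

def demand_curve(responses, prices):
    """Share of participants who said they would buy at each price."""
    return {p: sum(r[p] for r in responses) / len(responses) for p in prices}

def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) price elasticity between two points on the curve."""
    if q1 + q2 == 0 or p1 + p2 == 0 or p1 == p2:
        return None  # undefined when nobody buys at either price or the prices match
    pct_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_q / pct_p

PRICES = [19, 29, 49]

# Invented answers: 8 of 10 would buy at $19, 6 at $29, 2 at $49.
responses = [{19: i < 8, 29: i < 6, 49: i < 2} for i in range(10)]

curve = demand_curve(responses, PRICES)
steps = [
    (lo, hi, arc_elasticity(lo, curve[lo], hi, curve[hi]))
    for lo, hi in zip(PRICES, PRICES[1:])
]
```

In this toy data demand is inelastic between $19 and $29 (about -0.69) and elastic between $29 and $49 (about -1.95), which is the kind of break the demand curve is meant to expose. The figures describe stated willingness to buy, not observed purchases.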
05 · Probe the number, not the rating
The first answer to any open pricing question is almost always the rehearsed one. The participant has heard the number, taken a moment to work out the polite response, and delivered an answer that lets them out of the conversation gracefully. The truth, if there is one, lives in the second turn. This is where adaptive follow-up probes pay off.
A well-designed pricing study treats probing depth as a per-question setting. The open-ended anchor and band questions benefit from medium-depth probing: when the participant says "$49 sounds high", the system asks "high compared to what?", and the rehearsed answer often gets replaced with a real reference price. The reaction questions on a new tier benefit from expert-depth probing, because the participant's mental comparison is usually under-described on the first pass and the comparison is the entire signal. The closed demand questions (Gabor-Granger steps, van Westendorp's four prices) benefit from shallow probing: ask one clarifier on the response, then stop, because a participant answering a forced-choice question has already done the work and over-probing wears the response out. The full treatment of the depth decision is in how AI follow-up questions work in user research.
"$49 a month? Fine, I guess. Wait, actually, no. I already pay $19 for the one we use and the team would lose their minds at that jump. I'd need it to do the report thing too at that price."
The reversal in the pull-quote is the entire point of the second turn. The first answer was polite. The probe went one layer deeper. The honest answer arrived in turn two, and the team has an actionable signal: a known anchor ($19), a budget ceiling ("the team would lose their minds"), and a missing feature that would justify the jump ("the report thing"). A static survey would have logged this participant as "fine with $49".
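Probing depth as a per-question setting can be written down as plain configuration. This is a hypothetical sketch, not any tool's actual settings: the question-type keys are invented, and only the shallow cap of one clarifier comes from the text; the caps for medium and expert depth are placeholder numbers.

```python
from enum import Enum

class Depth(Enum):
    SHALLOW = "shallow"
    MEDIUM = "medium"
    EXPERT = "expert"

# Only SHALLOW = 1 follows the text ("ask one clarifier, then stop").
# The MEDIUM and EXPERT caps are placeholder assumptions.
MAX_FOLLOW_UPS = {Depth.SHALLOW: 1, Depth.MEDIUM: 2, Depth.EXPERT: 4}

# Hypothetical question types mapped to the depth suggested above.
DEPTH_BY_QUESTION = {
    "anchor_open": Depth.MEDIUM,
    "band_open": Depth.MEDIUM,
    "tier_reaction": Depth.EXPERT,
    "gabor_granger_step": Depth.SHALLOW,
    "van_westendorp_price": Depth.SHALLOW,
}

def should_probe(question_type, follow_ups_asked):
    """Return True while the question's depth setting allows another follow-up."""
    depth = DEPTH_BY_QUESTION[question_type]
    return follow_ups_asked < MAX_FOLLOW_UPS[depth]
```

For example, `should_probe("van_westendorp_price", 1)` is false after a single clarifier, while `should_probe("tier_reaction", 3)` still allows another turn.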
06 · Synthesize by segment, not by average
A common pricing-research synthesis error is to compute a single willingness-to-pay number across all participants and report it as the price. That number is the average of two prices that describe different markets. The useful synthesis is a matrix: segment on one axis, price band on the other, with the anchor and the reversal trigger as additional columns.
Three patterns to look for in the matrix:
- Band concentration. A segment whose band is tight (small spread between "too cheap" and "too expensive") usually has high pricing confidence and is the safest segment to lead with. A diffuse band signals uncertainty, which usually means the value proposition itself has not landed yet.
- Anchor mismatch. Where the reference prices participants name diverge sharply from the team's expectation, the launch positioning is comparing the product to the wrong thing. The fix lives upstream of pricing, in the positioning work.
- Reversal frequency. Across the probed reactions, how often did participants change their stated price after one clarifying question? A high reversal rate (above roughly 25 percent) usually means the first-pass numbers are too polite to ship from, and the team should weight the second-turn answers more heavily than the first.
The general synthesis pass is covered in how to analyze user interview transcripts. For pricing specifically, the unit of analysis is the segment-band pair, not the participant.
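The segment-by-band matrix can be assembled in a few lines once each participant record carries a segment label. The sketch below makes several simplifying assumptions: the band is approximated by the median "too cheap" and "too expensive" answers rather than a full Price Sensitivity Meter, a reversal means the stated price changed after the probe, and all field names and records are invented.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median

@dataclass
class Participant:
    segment: str
    too_cheap: float
    too_expensive: float
    anchor: str          # reference price source named by the participant
    first_price: float   # price stated on the first turn
    probed_price: float  # price stated after one clarifying question

def segment_matrix(participants, reversal_threshold=0.25):
    """One row per segment: band, band width, top anchor, reversal rate."""
    rows = {}
    for seg in sorted({p.segment for p in participants}):
        group = [p for p in participants if p.segment == seg]
        low = median(p.too_cheap for p in group)
        high = median(p.too_expensive for p in group)
        rate = sum(p.first_price != p.probed_price for p in group) / len(group)
        rows[seg] = {
            "band": (low, high),
            "band_width": high - low,
            "top_anchor": Counter(p.anchor for p in group).most_common(1)[0][0],
            "reversal_rate": rate,
            "weight_second_turn": rate > reversal_threshold,
        }
    return rows

sample = [
    Participant("self_serve", 10, 30, "CompetitorA free tier", 29, 19),
    Participant("self_serve", 12, 35, "CompetitorA free tier", 29, 29),
    Participant("self_serve", 8, 25, "Spreadsheet", 19, 19),
    Participant("self_serve", 10, 30, "CompetitorA free tier", 29, 29),
    Participant("enterprise", 100, 300, "CompetitorB", 250, 200),
    Participant("enterprise", 150, 400, "CompetitorB", 300, 250),
    Participant("enterprise", 120, 350, "Internal build", 300, 300),
]
matrix = segment_matrix(sample)
```

In this toy sample the self-serve row has a tight band ($10 to $30) and a reversal rate at the threshold, while the enterprise row has a wider band ($120 to $350) and a reversal rate of two in three, so its second-turn answers would be weighted more heavily.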
When to test pricing internally before customers see it
A common way to under-use pricing research is to run it only externally. The same instrument works inside the company, and running it internally first usually saves a round of customer testing. Before the brief and the price questions go to participants, share the same brief and the same question set with engineering, design, support, sales, finance, and the executive sponsor.
The result is a synthesized view of every stakeholder's objection before customers see the number. Engineering will surface the feature whose unit cost makes the price economically wrong. Sales will surface the deal-killer that nobody outside the pipeline knows about (the procurement floor at a target account, the renewal anchor on a key logo). Support will surface the migration friction the brief is silent on. Finance will surface the contribution-margin constraint that the gross willingness-to-pay number obscures.
The async version of the same conversation is a study link shared in internal channels. Each stakeholder gets the brief and the questions; each answers in voice, text, choice, or rating on their own time; the team gets a synthesized view of every objection in less time than scheduling the meeting would have taken. The pre-launch sanity check is the use case where the internal version of pricing research pays off most reliably, because by the time customers see the number it has already survived the people who will own the launch.
When voice changes pricing research
Voice is one of four input modes in a well-run pricing study (voice, text, choice, rating), and the modality choice depends on the question. The anchor and reaction questions benefit most from voice, because the rehearsed-then-real pattern lives in the rhythm of the speech. The half-second pause before "$49 sounds fine" tells you the answer is polite. The "wait, actually..." that prefaces a reversal is a tell that gets lost in writing. A monitored voice answer to a price question returns a transcript that is roughly two to three times longer than the typed equivalent, with the energy of the answer attached. The longer essay on the modality difference is in what we hear when we stop asking people to write, and the operational comparison is in voice vs text surveys.
The band and elasticity questions often do not benefit from voice. Van Westendorp's four prices and Gabor-Granger's stepped sequence are closed-ended by design; voice on those produces longer answers that do not carry more signal. Choice with an optional voice probe on the "too expensive" and "too cheap" answers is usually the right setup.
The point is not to make every pricing question a voice question. The point is to let the participant pick the input that fits the question. A participant on a train answering a price-band question will tap a choice. The same participant at their desk answering the reaction-layer probe will record sixty seconds of voice. Forcing either of them into the other mode loses the answer.
When pricing research is the wrong tool
Three cases where pricing research is the wrong instrument, and running it anyway returns a number that pretends to be a finding.
Pre-product concepts. If the product does not exist yet, pricing research is testing imagination. The participant has nothing to anchor the price against because they cannot observe the value. The right tool upstream of pricing is concept testing, which establishes whether the concept lands; pricing comes after.
Switching costs that dominate the price. In categories where the cost of changing tools (data migration, retraining, contract lock-in) is large relative to the recurring fee, the price itself is not the binding constraint. The pricing study returns a willingness-to-pay number that is technically accurate and operationally irrelevant; the customer would pay the number and still not switch. The right tool is a jobs to be done switch interview on people who have actually switched, to surface the four forces (push, pull, anxiety, habit) that move spend in the category.
Network-effect products with one acceptable price (zero). Consumer products whose value scales with users (messaging, social, marketplaces with weak monetisation) often have a price band that collapses to zero for the consumer side and lives on the supply side. A pricing study on the consumer cohort returns "free", which is correct and useless. The pricing decision sits on the other side of the network, and the study needs to be designed there.
How pricing research connects to PMF, concept testing, and JTBD
Pricing research is one tool in a wider product-research practice. It pairs naturally with three others that show up at different stages of the build:
- Concept testing sits before pricing. It establishes whether the value proposition lands; pricing then establishes what the segment will pay for the validated concept. Running pricing without concept-testing first usually returns a band for an idea nobody wanted.
- The product-market fit survey sits after pricing. The "very disappointed" question, run on customers who are already paying, validates whether the priced product is the must-have version of the concept. The full PMF treatment is in how to run a product-market fit survey.
- Jobs to be done interviews sit alongside pricing and explain the switch. JTBD answers "why did the customer pick us over the alternative?" Pricing answers "at what number does the switch survive?" The two are mirror methods at the same point on the timeline. The piece on jobs to be done interviews covers the switch side.
All four sit inside the wider practice covered in the voice user research guide. The shorthand: concept testing validates the framing, pricing research validates the number, JTBD explains the switch, PMF confirms the fit. None of them replace the others.
FAQ
What is pricing research in product?
Pricing research is a set of methods that estimate, before a price is set, how much a target segment would pay, where the demand curve falls off, and which segments would absorb a higher tier without leaving. The good version measures three things: the band of acceptable prices for a segment, the anchor customers are mentally comparing against, and the reversal point at which the value proposition stops working. The bad version returns one average willingness-to-pay number and treats it as a decision.
How is pricing research different from a willingness-to-pay survey?
A willingness-to-pay survey is one instrument within pricing research, usually a van Westendorp Price Sensitivity Meter or a Gabor-Granger sequence. Pricing research is the wider discipline, which also covers anchor research (what the customer compares us to), tiering research (what belongs in which tier), and segment analysis (which cohort absorbs which band). A willingness-to-pay survey answers the band or elasticity question for one segment; pricing research answers all four questions across all segments.
How many participants do you need for pricing research?
It depends on the method. Qualitative passes (anchor, tiering) work with eight to twelve participants per target segment. Quantitative passes (van Westendorp, Gabor-Granger) need 30 to 50 participants per segment, because they rely on distributional shape rather than thematic saturation. A blended sample across multiple segments produces a blended price that describes none of them; budget participants by segment, not by total study.
What is the van Westendorp Price Sensitivity Meter?
The Price Sensitivity Meter is a four-question instrument that asks each participant, for a defined product or concept, the prices at which it would be too expensive, expensive but still considered, a bargain, and so cheap that quality is in doubt. Plotted across a sample, the four curves intersect at points that mark an acceptable price band, an optimal price point, and the threshold below which the offer reads as suspect. It is decades old and remains the standard tool for the band question.
Can pricing research predict whether a customer will pay?
Negative results are reliable; positive results are not. Pricing research is good at identifying prices that will not work (the band collapses, the anchor mismatches the positioning, the reversal rate is high). It is worse at predicting which prices inside a valid band will scale, because the band only describes stated willingness in a structured study; real-world conversion adds friction the study cannot reproduce. Treat the output as a filter (eliminate prices that fail), not as a forecast (pick the price that wins).
What is the difference between pricing research and a product-market fit survey?
Pricing research is pre-launch or pre-revenue: it measures the price band for a concept or product against a cohort that has not yet bought. A product-market fit survey is post-launch and post-revenue: it measures the disappointment a paying customer would feel at losing the product. The two are sequential. Pricing research sets the price; the PMF survey, after the price is live, validates whether that price made the product a must-have for the right segment. The full operational treatment is in the product-market fit survey playbook.
Pricing research fails when the price is asked in the abstract, the sample is built around the wrong cohort, and the deliverable is a single average willingness-to-pay number. It works when the price is anchored to the value the participant already buys, the sample is segmented before recruitment, the method is matched to the decision, and the synthesis returns a band per segment rather than a number for the population. Talkful is built for the second shape: a study link goes out, participants answer in voice, text, choice, or rating on their own time, the AI interviewer probes the polite first answers into the honest second ones, and the synthesis engine returns a segment-by-band matrix the team can decide a launch price from. The wider voice user research guide covers where the method sits inside a continuous practice.