What Is A/B Testing?

In 2025, 58% of companies were actively using A/B testing for conversion rate optimisation, according to Invesp’s State of A/B Testing report. The remaining 42% were making decisions based on opinion, instinct, or assumption. The gap between those two groups shows up in revenue. A/B testing is not about guessing better. It is about replacing opinion with evidence — running a structured experiment so the market tells you what works, rather than your internal team agreeing on what looks good.

Key Takeaways

  • A/B testing splits your audience between two versions of a page, email, or element and measures which performs better against a defined goal.
  • In 2025, 67% of digital teams reported that at least one in three of their tests produced measurable business impact, typically a 10–20% lift in conversion or revenue (VWO, 2025 State of Optimisation).
  • Only 1 in 8 A/B tests produces a statistically significant result, which is why testing discipline, not testing volume, determines ROI.
  • Statistical significance at 95% confidence is the minimum threshold before acting on any result.
  • A/B testing isolates one variable; multivariate testing examines how multiple elements interact simultaneously.

What Is A/B Testing and How Does It Work?

In 2025, 60% of companies rated A/B testing as “highly valuable” for conversion optimisation, according to Invesp — yet the mechanics remain widely misunderstood. A/B testing is a controlled experiment. You take one element of a digital experience, create an alternative version, split your traffic between the original (the control) and the alternative (the variant), then measure which version drives better performance against a specific goal.

The critical word is one. A proper A/B test isolates a single variable. Change the headline and only the headline. Change the button colour and only the button colour. The moment you alter two elements at once, you lose the ability to identify which change drove the result. That is not A/B testing — it is wishful thinking dressed as an experiment.

Traffic is split randomly and simultaneously. Visitors are assigned to either the control or variant group without knowing it. The test runs until it reaches the pre-determined sample size and minimum duration, not merely until the first moment the result looks significant. If the variant wins at that point, it is deployed to 100% of traffic; otherwise the control stays in place. The process then repeats with the next hypothesis.
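One common way to implement this kind of split is to hash a visitor identifier into a bucket, so assignment is effectively random across visitors but stable for any one visitor. The sketch below is illustrative only: the function name `assign_variant`, the experiment label, and the visitor ID format are all assumptions, not part of any particular testing tool.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "headline-test") -> str:
    """Deterministically assign a visitor to 'control' or 'variant' (50/50)."""
    # Hash the experiment name together with the visitor ID so the same
    # visitor always sees the same version, and different experiments
    # split the audience independently of one another.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer in the range 0..99
    return "control" if bucket < 50 else "variant"

if __name__ == "__main__":
    # Hypothetical visitor IDs, used only to show the split is roughly even.
    counts = {"control": 0, "variant": 0}
    for i in range(10_000):
        counts[assign_variant(f"visitor-{i}")] += 1
    print(counts)
```

Because the assignment depends only on the ID and the experiment name, a returning visitor never flips between versions mid-test.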

A/B testing does not confirm what looks good. It reveals what works. Those are rarely the same thing.

Citation capsule: A/B testing is a controlled experiment that exposes two audience segments to two versions of a single variable simultaneously. According to Invesp’s 2025 State of A/B Testing, 58% of companies use it as their primary conversion rate optimisation method, making it the most widely adopted structured decision-making tool in digital marketing.

What Can You A/B Test?

A/B testing can be applied to any element of a digital experience where user behaviour generates measurable data. In 2025, according to VWO’s State of Optimisation survey, product detail pages account for 38% of all experiments run on e-commerce sites — but the scope extends well beyond landing pages. Headlines, calls to action, email subject lines, pricing displays, images, form lengths, and navigation structures are all testable variables with documented impact.

The following table illustrates the most commonly tested elements and the typical conversion lift range observed when a winning variant is found. These ranges reflect results from tests that reached statistical significance — not averages across all tests run.

| Element Tested | Typical Lift Range (Winning Tests) | Source |
| --- | --- | --- |
| Call-to-action (CTA) button copy or colour | 15%–49% improvement in click-through rate | Invesp, 2024 |
| Email subject line | Up to 30% improvement in open rate | Salesforce / MailerLite, 2024 |
| Landing page headline | 10%–28% improvement in conversion rate | VWO, 2025 State of Optimisation |
| Product detail page layout | 12%–28% improvement in conversion rate | VWO, 2025 State of Optimisation |
| Checkout flow | 8%–25% improvement in completion rate | VWO, 2025 State of Optimisation |
| Pricing page structure | Variable; test-specific | HubSpot, 2024 |
| Hero image or visual | 10%–20% improvement in engagement | Optimizely, 2024 |

The principle across all these elements is identical: form a specific hypothesis (“changing the CTA from ‘Learn More’ to ‘Get My Free Quote’ will increase form submissions”), run the test, measure the result. The element itself matters less than the rigour of the hypothesis behind it.

Citation capsule: Across tested elements in 2025, CTA optimisation through A/B testing delivered up to 49% improvement in click-through rates and email subject line testing improved open rates by up to 30%, according to Invesp and Salesforce respectively. The consistent finding is that opinion-based design decisions are routinely outperformed by evidence-based alternatives.

What Is Statistical Significance and Why Does It Matter?

In 2025, only 1 in 8 A/B tests produced a statistically significant result, according to Invesp — which means the majority of tests conclude without a clear winner. Statistical significance is the mechanism that separates a real performance difference from random noise. Without it, a business acts on coincidence and calls it strategy.

Here is the concept without the mathematics. Imagine flipping a coin ten times and getting seven heads. That does not prove the coin is biased — it might simply be chance. A/B testing faces the same problem. A variant that appears to outperform the control after 200 visitors may simply have received a lucky sample. Statistical significance is the confidence threshold that tells you the observed difference is unlikely to be explained by chance alone.
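For readers who do want the arithmetic behind the coin example, it fits in a few lines. This sketch computes how often a perfectly fair coin gives seven or more heads in ten flips; the helper name `prob_at_least` is an illustrative choice.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Seven or more heads in ten flips of a fair coin.
print(f"{prob_at_least(7, 10):.3f}")  # 0.172
```

A fair coin produces this "suspicious" result about 17% of the time, which is far too often to treat seven heads as evidence of bias.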

The accepted industry standard is 95% confidence. This means that if the two versions truly performed the same, a difference at least as large as the one you observed would occur less than 5% of the time. It does not guarantee the variant is better, and it is not a 95% probability that the result is correct; it means chance alone would rarely produce a gap this large. For high-stakes decisions (pricing changes, homepage redesigns), some practitioners raise the threshold to 99%.
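To make the threshold concrete, here is a minimal sketch of one widely used check, the pooled two-proportion z-test. Testing platforms may use different methods (sequential or Bayesian, for example), so treat this as an illustration rather than what any specific tool does. The visitor and conversion counts are invented for the example.

```python
from math import erf, sqrt

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test.

    Assumes at least one conversion and one non-conversion overall,
    otherwise the standard error is zero.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)

if __name__ == "__main__":
    # Illustrative counts: control converts 200 of 10,000, variant 260 of 10,000.
    p = two_proportion_p_value(200, 10_000, 260, 10_000)
    print(f"p-value: {p:.4f}, significant at 95%: {p < 0.05}")
```

A p-value below 0.05 corresponds to the 95% threshold; a 99% threshold would require a p-value below 0.01.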

Three practical rules follow from this:

  • Pre-calculate your sample size before the test begins. Use a sample size calculator based on your baseline conversion rate and the minimum detectable effect you care about. Optimizely publishes a free tool for this purpose.
  • Do not stop the test early because the variant appears to be winning. Early termination is the single most common source of false positives in A/B testing.
  • Run tests for at least two full weeks, even if you reach your sample size sooner. This accounts for weekly behavioural cycles — a result that holds on Tuesday may not hold on Sunday.
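The second rule can be demonstrated with a small simulation. The sketch below runs A/A tests, where both groups have the same true conversion rate, so every "significant" result is a false positive. It compares checking ten times along the way against checking once at the end. The conversion rate, visitor count, number of looks, and seed are arbitrary assumptions chosen to keep the run short.

```python
import random
from math import sqrt

def is_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """True if a pooled two-proportion z-test passes the 95% threshold."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0, 1):
        return False  # no variation yet, nothing to test
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(conv_b / n_b - conv_a / n_a) / std_err > z_crit

def simulate_aa(n_sims=300, visitors=2000, rate=0.05, looks=10, seed=1):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    step = visitors // looks
    peeking_hits = fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        for n in range(1, visitors + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # "Peek" at evenly spaced checkpoints, including the final one.
            if n % step == 0 and is_significant(conv_a, n, conv_b, n):
                flagged_at_any_look = True
        peeking_hits += flagged_at_any_look
        fixed_hits += is_significant(conv_a, visitors, conv_b, visitors)
    return peeking_hits / n_sims, fixed_hits / n_sims

if __name__ == "__main__":
    peeking, fixed = simulate_aa()
    print(f"False positives when stopping at any look: {peeking:.1%}")
    print(f"False positives with a single final check: {fixed:.1%}")
```

Stopping at the first significant look can only add false positives relative to a single final check, because every test that is significant at the end is also counted under the peeking rule.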

Citation capsule: The industry-standard threshold for A/B test validity is 95% statistical significance, meaning a difference as large as the one observed would occur less than 5% of the time if the two versions truly performed the same. According to Optimizely’s sample size guidance (2024), both sample volume and test duration, recommended at a minimum of two weeks, must be satisfied before acting on any result.

What Is the Difference Between A/B Testing and Multivariate Testing?

A/B testing and multivariate testing are related but distinct tools. In 2025, according to Invesp’s analysis, multivariate testing (MVT) becomes the optimal approach when you need to understand how elements interact with each other — not simply which of two versions performs better. Choosing the wrong tool for the question is a common and costly mistake.

A/B testing answers one question: does version A or version B perform better? It is fast, requires less traffic, and produces clear, actionable results. It is the correct tool when you have a single, well-formed hypothesis about one specific element.

Multivariate testing answers a more complex question: how do multiple elements interact to drive performance? Instead of testing one headline, you might simultaneously test three headlines, two images, and two CTA colours — generating up to 12 combinations. The test identifies not just which individual elements perform best, but which combination of elements produces the optimal result.
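The combination count is simply the product of the options for each element. A short sketch makes the full set explicit; the headline, image, and colour labels are placeholders.

```python
from itertools import product

# Placeholder options for each element under test.
headlines = ["Headline A", "Headline B", "Headline C"]
images = ["Image 1", "Image 2"]
cta_colours = ["Green", "Orange"]

# Every combination of one headline, one image, and one CTA colour.
combinations = list(product(headlines, images, cta_colours))

print(len(combinations))  # 12
for combo in combinations[:3]:
    print(combo)
```

Adding a single extra option to any one element multiplies the total, which is why multivariate tests grow quickly.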

The trade-off is traffic volume. A multivariate test with 12 combinations has 12 groups to fill rather than two, so at the same sample size per group it needs roughly six times the traffic of a two-version A/B test, and more once you correct for the larger number of comparisons. This makes MVT impractical for most small and mid-sized businesses. The decision framework is straightforward:

  • Use A/B testing when you have a clear single-variable hypothesis and moderate traffic.
  • Use multivariate testing when you have high traffic, a complex page with multiple interacting elements, and the capacity to wait for results across many combinations.

Most organisations should master A/B testing before attempting multivariate testing. The two are not interchangeable. They answer different questions.

Citation capsule: Multivariate testing examines how multiple page elements interact to influence conversions, whereas A/B testing isolates a single variable. According to Invesp’s 2024 analysis, MVT becomes optimal only when element interdependencies are the primary question — and it demands significantly greater traffic volume to reach statistical validity across all variable combinations.

What Are Common A/B Testing Mistakes?

In 2025, 80% to 90% of A/B tests produced no statistically significant result, according to Invesp — and a significant portion of this failure rate is attributable to avoidable process errors, not the limits of the methodology. A/B testing is only as reliable as the discipline applied to it. Four mistakes account for the majority of wasted test cycles.

Testing too many variables at once. When two or more elements change between control and variant, it becomes impossible to attribute the result to a specific change. The test produces a number, not an insight. Businesses then act on that number as if it were evidence. It is not.

Ending tests too early. Checking results daily and stopping the moment the variant appears to lead is the most reliable way to generate false positives. A variant that leads after three days may trail after three weeks. Commit to the pre-determined sample size and duration before the test begins, and do not touch it mid-flight.

Ignoring seasonality and external events. A test run across a public holiday, a major news event, or a promotional period produces contaminated data. Visitor behaviour during those periods is not representative of normal behaviour. If an external event occurs during a test, the test should be paused or discarded and restarted under stable conditions.

Failing to document results. In 2025, 54% of companies had reached strategic or transformative levels of experimentation maturity, up from 35% in 2021, according to Convert.com’s experimentation maturity data. The difference between mature and immature programmes is almost always documentation. An undocumented test is an experience that disappears when the person who ran it leaves the organisation. A documented result is institutional knowledge.

Citation capsule: The most damaging A/B testing mistakes — early test termination, multi-variable contamination, and absent documentation — are process failures, not methodological ones. Invesp’s 2025 data shows that 80–90% of tests produce no statistically significant result, underscoring that testing discipline, not testing volume, determines whether an experimentation programme generates genuine competitive advantage.

Frequently Asked Questions

How long should an A/B test run?

The recommended minimum duration is two weeks, regardless of when you reach your pre-calculated sample size. This accounts for weekly behavioural cycles in user traffic. According to AB Tasty’s 2024 sample size guidance, tests shorter than one week risk capturing non-representative behaviour patterns, particularly when weekend and weekday traffic differ significantly in intent or conversion rate.

How much traffic do you need to run an A/B test?

Sample size requirements depend on your baseline conversion rate and the minimum detectable effect you care about. As a practical benchmark, a page converting at 2% needs roughly 21,000 visitors per variant to detect a 20% relative improvement (2.0% to 2.4%) at 95% confidence and 80% power, using the standard two-proportion approximation. Low-traffic sites should focus on fewer, higher-impact hypotheses rather than running continuous tests that will never reach significance.
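A rough version of what a sample size calculator does can be sketched with the standard normal-approximation formula for comparing two proportions. The 80% power setting is an assumption here (a common default, but not the only choice), and commercial calculators may use slightly different formulas, so expect small differences from any specific tool.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant for a two-sided test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # conversion rate you hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence by default
    z_beta = NormalDist().inv_cdf(power)           # 80% power by default
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

if __name__ == "__main__":
    # A 2% baseline and a 20% relative improvement, i.e. 2.0% versus 2.4%.
    print(sample_size_per_variant(0.02, 0.20))
```

Under these assumptions the example comes out at roughly 21,000 visitors per variant. Halving the detectable lift to 10% roughly quadruples the requirement, which is why low-traffic sites are better served by bolder hypotheses.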

Can you A/B test email campaigns?

Yes. Email is one of the highest-return channels for A/B testing. Subject line testing alone can improve open rates by up to 30%, according to MailerLite’s 2024 email optimisation data. Testable email elements include subject lines, sender names, preview text, send time, CTA copy, and email length. Most enterprise email platforms include native A/B testing functionality at no additional cost.

What is the difference between A/B testing and split testing?

The terms are used interchangeably in most industry contexts. Technically, “split testing” sometimes refers to splitting traffic between two entirely different page URLs, while “A/B testing” can refer to on-page variation testing within a single URL. In practice, both terms describe the same experimental structure: a control, a variant, randomised traffic allocation, and performance measurement against a defined metric.

How do you know if your A/B test result is reliable?

A result is considered reliable when it reaches 95% statistical significance, meaning a difference this large would occur less than 5% of the time if the two versions truly performed the same. According to Optimizely’s experimentation documentation (2024), reliability also requires meeting the pre-calculated sample size threshold and running the test for a minimum of two complete weekly cycles. Both conditions must be met before any result is acted upon.

Conclusion

A/B testing is the structured practice of replacing assumption with evidence. It does not require sophisticated technology or large teams. It requires a clear hypothesis, a single variable, sufficient traffic, and the discipline to wait for statistical confidence before drawing a conclusion. The 58% of companies actively running tests in 2025 are not necessarily smarter than their competitors. They have simply chosen to let their market — not their opinions — make decisions. For any business operating a website, running email campaigns, or making product or pricing decisions online, A/B testing is not a specialist tool. It is the minimum standard for evidence-based management.

If your organisation is not currently running structured experiments, the starting point is not software — it is a single hypothesis about one element you believe is underperforming, and a plan to test it against an alternative. Start there. Build the discipline before building the programme.
