A/B Test Sample Size Calculator

Calculate the required sample size for statistically significant A/B test results.


How to Use This A/B Test Calculator

Baseline Conversion Rate: Enter your current conversion rate (e.g., 2.5% means 2.5 out of 100 visitors convert).

Minimum Detectable Effect: The relative change you want to detect (e.g., 20% improvement on a 2.5% baseline = 3.0% new rate).

Confidence Level: The complement of the significance level α (95% confidence means α = 5%), i.e. a 5% chance of declaring a winner when there is no real difference (false positive).

Statistical Power: Probability of detecting a true effect if it exists (80% is standard, meaning 20% chance of false negative).

Daily Traffic (optional): Add your daily visitor count to estimate how long the test will take.
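The inputs above can be turned into a sample size with the standard two-proportion formula. This is a minimal sketch, not the calculator's actual code; it uses the unpooled-variance form (a pooled variant gives slightly different numbers), and the function name and traffic figure are illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided, two-proportion z-test
    (unpooled-variance approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # e.g. 2.5% * 1.20 = 3.0%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_beta = NormalDist().inv_cdf(power)           # controls Type II error
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# The running example: 2.5% baseline, 20% relative lift, 95% / 80% defaults.
n = sample_size_per_variant(0.025, 0.20)

# Optional duration estimate: two variants share the daily traffic.
daily_visitors = 1000  # hypothetical figure for illustration
days = ceil(2 * n / daily_visitors)
```

With these inputs the per-variant requirement comes out in the high four figures per arm times two, which is why small baseline rates and small lifts demand weeks of traffic.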

Why Sample Size Matters

Running an A/B test without proper sample size is like flipping a coin three times and declaring heads "the winner." You need enough data to distinguish real patterns from random noise.

Too small a sample and you'll miss real improvements (false negatives) or declare winners that aren't actually better (false positives). Too large and you're wasting time and traffic on tests that could have concluded earlier.

The calculator uses the standard two-proportion sample size formula, which accounts for both Type I errors (false positives, controlled by the significance level) and Type II errors (false negatives, controlled by statistical power). This keeps both error rates at the levels you choose rather than leaving them to chance.
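For reference, one common (unpooled-variance) form of that formula is:

```latex
n \;=\; \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \,\bigl[\,p_1(1-p_1) + p_2(1-p_2)\,\bigr]}{(p_2 - p_1)^2}
```

Here $p_1$ is the baseline rate, $p_2$ the rate after the minimum detectable effect, $\alpha$ the significance level, $1-\beta$ the power, and $z$ the standard normal quantile; $n$ is the sample size per variant. Pooled-variance variants of the formula exist and give slightly different numbers.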

Remember: hitting the calculated sample size is when you can check results, not when you must stop. If you're borderline, run the test longer. Statistical significance isn't a finish line—it's a minimum threshold.

A/B Testing Best Practices

  • Test one variable at a time: If you change the headline AND the CTA, you won't know which caused the lift.
  • Run full weeks: User behavior varies by day. Always test in complete 7-day cycles to avoid weekday bias.
  • Don't peek early: Checking results before reaching sample size increases false positive risk dramatically.
  • Account for seasonality: Black Friday data is different from January. Context matters.
  • Validate with multiple tests: One winning test might be luck. Consistent patterns across tests build confidence.