Power Analysis

1. Conduct power analysis

  • This step is about calculating the required sample size for an A/B test.

  • The goal is to make sure the experiment is:

    • Large enough to detect a real effect,

    • But not unnecessarily long or expensive.

2. The four key inputs of power analysis

2.1. Effect size

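  • The minimum change you want to be able to detect, also called the Minimum Detectable Effect (MDE), for example:
    conversion rate increasing from 5% to 5.5% (an absolute lift of 0.5 percentage points, or 10% relative).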

  • It should be practically meaningful for the business, not just statistically significant.

  • Usually defined based on input from product managers and stakeholders.

2.2. Power

  • The probability of detecting a true effect if it actually exists.

  • Typically set to 0.8 (80%).

  • This means: if there is a real effect, you have an 80% chance of detecting it.
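To make the 80% figure concrete, here is a minimal sketch that approximates the power of a two-sided test on two conversion rates, using the 5% to 5.5% example from above. The function name, the normal approximation, and the sample sizes of 10,000 and 30,000 users per group are illustrative assumptions, not values from this post. The alpha parameter is the significance level described in the next subsection.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variances and ignores
    the negligible chance of rejecting in the wrong direction.
    """
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    # Standard error of the difference between the two observed rates.
    se = sqrt(
        (p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
        / n_per_group
    )
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_treatment - p_control) / se - z_alpha)


# Hypothetical sample sizes: more users per group means higher power.
power_small = approximate_power(0.05, 0.055, n_per_group=10_000)
power_large = approximate_power(0.05, 0.055, n_per_group=30_000)
print(f"10,000 per group: power = {power_small:.2f}")
print(f"30,000 per group: power = {power_large:.2f}")
```

Under these assumptions the smaller experiment would miss a real effect more often than it detects it, while the larger one comes close to the usual 80% target.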

2.3. Significance level (alpha)

  • The probability of a false positive, i.e. concluding there is an effect when there is none.

  • Typically set to 0.05 (5%).

2.4. Variance

  • The variability of the metric being measured.

  • Usually estimated from historical data.

  • The higher the variance, the larger the required sample size.
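The four inputs above can be combined into a sample size with the standard normal-approximation formula for comparing two groups. The sketch below is one common way to do this, not necessarily the calculator your team uses. The function names are hypothetical, and it assumes a two-sided test with equally sized control and treatment groups.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, var_control, var_treatment,
                          alpha=0.05, power=0.8):
    """Users needed in EACH group (normal approximation, two-sided test).

    delta         : effect size, the minimum difference to detect
    var_control   : variance of the metric in the control group
    var_treatment : variance of the metric in the treatment group
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_power = NormalDist().inv_cdf(power)          # power
    n = (z_alpha + z_power) ** 2 * (var_control + var_treatment) / delta ** 2
    return math.ceil(n)


def sample_size_for_proportions(p_control, p_treatment,
                                alpha=0.05, power=0.8):
    """Conversion rates are a special case: the variance is p * (1 - p)."""
    return sample_size_per_group(
        p_treatment - p_control,
        p_control * (1 - p_control),
        p_treatment * (1 - p_treatment),
        alpha,
        power,
    )


# The example from section 2.1: conversion rate from 5% to 5.5%.
n_conversion = sample_size_for_proportions(0.05, 0.055)
print(n_conversion)  # roughly 31,000 users per group

# Doubling the variance roughly doubles the required sample size.
n_low_var = sample_size_per_group(delta=1.0, var_control=4.0, var_treatment=4.0)
n_high_var = sample_size_per_group(delta=1.0, var_control=8.0, var_treatment=8.0)
print(n_low_var, n_high_var)
```

Because the effect size is squared in the denominator, halving the effect you want to detect roughly quadruples the sample size, which is why the effect size should be agreed with stakeholders before the test starts.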

3. Output of power analysis

  • The main output is the required sample size to detect the Minimum Detectable Effect (MDE).

  • From the sample size and the current user traffic, you can estimate how long the experiment needs to run.
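Turning a sample size into a duration is simple arithmetic. The sketch below assumes two equally sized groups and that every day brings new, previously unseen users. The function name and the traffic numbers are hypothetical.

```python
import math


def estimate_duration_days(n_per_group, daily_users,
                           exposure_fraction=1.0, n_groups=2):
    """Days needed to collect n_per_group users in every group.

    daily_users       : new eligible users arriving per day (assumed constant)
    exposure_fraction : share of daily users enrolled in the experiment
    """
    if daily_users <= 0:
        raise ValueError("daily_users must be positive")
    if not 0 < exposure_fraction <= 1:
        raise ValueError("exposure_fraction must be in (0, 1]")
    total_needed = n_per_group * n_groups
    enrolled_per_day = daily_users * exposure_fraction
    return math.ceil(total_needed / enrolled_per_day)


# Hypothetical numbers: 31,000 users per group, 10,000 new users per day,
# half of them enrolled in the experiment.
duration_days = estimate_duration_days(31_000, 10_000, exposure_fraction=0.5)
print(duration_days)  # 13
```

If the same users return day after day, the number of new users per day shrinks over time, so treat the result as a lower bound.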

4. Other factors affecting experiment duration

  • Seasonality: user behavior may differ between weekdays and weekends.

  • Metric variability over time: day-to-day fluctuations in the metric can make results from a short run unstable.

  • App adoption: if users must update the app, the update rate affects how fast you collect samples.
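One way to account for these factors is to adjust the raw duration estimate. The sketch below makes two assumptions that are not from this post: it stretches the duration by the share of users who have adopted the required app version, and it rounds up to whole weeks so that every weekday and weekend day is covered equally. The function name and the numbers are hypothetical.

```python
import math


def adjusted_duration_days(raw_days, eligible_share=1.0):
    """Adjust a raw duration estimate for app adoption and weekly seasonality.

    raw_days       : duration computed from sample size and traffic
    eligible_share : fraction of users on an app version that can see the test
    """
    if raw_days <= 0:
        raise ValueError("raw_days must be positive")
    if not 0 < eligible_share <= 1:
        raise ValueError("eligible_share must be in (0, 1]")
    # Fewer eligible users means samples arrive more slowly.
    days = math.ceil(raw_days / eligible_share)
    # Round up to full weeks to cover weekday and weekend behavior evenly.
    return math.ceil(days / 7) * 7


# Hypothetical numbers: a 13-day estimate, with and without an adoption gap.
print(adjusted_duration_days(13))                      # 14
print(adjusted_duration_days(13, eligible_share=0.6))  # 28
```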

5. What is a ramp-up strategy?

  • A ramp-up strategy means gradually increasing the percentage of users exposed to the experiment.

  • Purpose:

    • Reduce risk if there are bugs or negative impacts,

    • Detect issues early before affecting a large population.

6. Example ramp-up strategy

  • Day 1: expose 1% of users

  • Day 2: if metrics look normal, increase to 10%

  • Day 3: if still stable, increase to 50% to reach the required sample size
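A ramp-up delays data collection, because the first days contribute very few users. The sketch below simulates the schedule above to see when the required sample size is reached. The required total of 62,000 users and the traffic of 20,000 new users per day are hypothetical, and the function assumes each day's exposed users are new.

```python
def days_to_reach_sample(required_total, daily_users, schedule, steady_fraction):
    """Return the day on which cumulative exposed users reach required_total.

    schedule        : exposure fraction for each ramp-up day, in order
    steady_fraction : exposure fraction used after the schedule ends
    """
    if daily_users <= 0 or steady_fraction <= 0:
        raise ValueError("daily_users and steady_fraction must be positive")
    collected = 0.0
    day = 0
    while collected < required_total:
        day += 1
        if day <= len(schedule):
            fraction = schedule[day - 1]
        else:
            fraction = steady_fraction
        collected += daily_users * fraction
    return day


# Ramp-up from the example: 1% on day 1, 10% on day 2, 50% from day 3 onward.
ramp_days = days_to_reach_sample(
    required_total=62_000,
    daily_users=20_000,
    schedule=[0.01, 0.10, 0.50],
    steady_fraction=0.50,
)
print(ramp_days)  # 8
```

With these numbers, starting directly at 50% would reach the same total on day 7, so the ramp-up costs about one extra day in exchange for lower risk.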

7. When is ramp-up not necessary?

  • For small, low-risk changes, it may be acceptable to start directly at the full exposure needed to reach the required sample size.

  • Ramp-up is recommended, but not mandatory.
