Glossary

Assumptions used in analysis

Sample-size estimation:

- Treatment does not affect variance
- Variance in treatment and control is identical
- Mean of delta is normally distributed

Welch t-test:

- Mean of means is t-distributed (or normally distributed)

In general:

- Sample represents underlying population
- Entities are independent
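The first two groups of assumptions can be made concrete with a minimal sketch using only the Python standard library. `sample_size_per_group` applies the textbook normal-approximation formula \(n = 2 (z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \delta^2\) for a two-sided test, which relies on a common variance in treatment and control and on a normally distributed mean of the delta. `welch_t` computes the Welch t statistic and its Welch-Satterthwaite degrees of freedom, which do not assume equal variances. Both function names are hypothetical and are not part of ExpAn's API.

```python
import math
from statistics import NormalDist, fmean, variance


def sample_size_per_group(sigma, min_detectable_delta, alpha=0.05, power=0.8):
    """Entities per group for a two-sided z-test on a difference of means.

    Assumes the same standard deviation `sigma` in treatment and control
    and a normally distributed mean of the delta.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / min_detectable_delta ** 2
    return math.ceil(n)  # round up to whole entities


def welch_t(x, y):
    """Welch t statistic for mean(x) - mean(y) and its degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (fmean(x) - fmean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


print(sample_size_per_group(sigma=1.0, min_detectable_delta=0.1))  # 1570
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The t statistic would then be compared with a t-distribution with `df` degrees of freedom (or, for large samples, with a normal distribution).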

Per-entity ratio vs. ratio of totals

There are two different definitions of a ratio metric (for example the conversion rate, which is the ratio between the number of orders and the number of visits): 1) the average of the per-entity ratios, or 2) the ratio between the total sums. ExpAn supports both.

In a nutshell, the ratio of totals can be obtained by reweighting the individual per-entity ratios. This makes it possible to use the existing statistics.delta() function to calculate both ratio statistics (using either the normal assumption or bootstrapping).

Calculating conversion rate

As an example, let’s look at how to calculate the conversion rate. Per entity, it is typically defined as the average of the ratios between the number of orders \(O_i\) and the number of visits \(V_i\) of the \(n\) entities:

\[\overline{CR}^{(pe)} = \frac{1}{n} \sum_{i=1}^n CR_i = \frac{1}{n} \sum_{i=1}^n \frac{O_i}{V_i}\]

The ratio of totals is a reweighted version of \(CR_i\): instead of every entity (e.g. every customer) contributing equally to the conversion rate, every visit contributes equally. It can be formulated as:

\[CR^{(rt)} = \frac{\sum_{i=1}^n O_i}{\sum_{i=1}^n V_i}\]
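The two definitions generally give different numbers. The following minimal sketch uses made-up orders and visits for three hypothetical entities: the entity with many visits dominates the ratio of totals but counts only once in the per-entity average.

```python
from statistics import fmean

# Hypothetical toy data: orders O_i and visits V_i for n = 3 entities.
orders = [1, 0, 6]
visits = [2, 1, 10]

# Per-entity definition: average of the individual ratios O_i / V_i.
cr_per_entity = fmean([o / v for o, v in zip(orders, visits)])

# Ratio of totals: total orders divided by total visits.
cr_ratio_of_totals = sum(orders) / sum(visits)

print(round(cr_per_entity, 4), round(cr_ratio_of_totals, 4))  # 0.3667 0.5385
```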

Overall as Reweighted Individual

\(CR^{(rt)}\) can be calculated from the per-entity ratios \(CR_i\) that make up \(\overline{CR}^{(pe)}\) by applying the following weighting factor (easily proved with pencil and paper):

\[CR^{(rt)} = \frac{1}{n} \sum_{i=1}^n \alpha_i \frac{O_i}{V_i}\]

with

\[\alpha_i = n \frac{V_i}{\sum_{j=1}^n V_j}\]
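The pencil-and-paper proof is a single substitution. Assuming every \(V_i > 0\) (so that \(O_i/V_i\) is defined) and writing the normalising sum with a separate index \(j\), inserting \(\alpha_i\) cancels both \(n\) and \(V_i\):

\[\frac{1}{n} \sum_{i=1}^n \alpha_i \frac{O_i}{V_i} = \frac{1}{n} \sum_{i=1}^n n \frac{V_i}{\sum_{j=1}^n V_j} \frac{O_i}{V_i} = \frac{\sum_{i=1}^n O_i}{\sum_{j=1}^n V_j} = CR^{(rt)}\]

The weights average to one, \(\frac{1}{n} \sum_{i=1}^n \alpha_i = 1\), so when all \(V_i\) are equal every \(\alpha_i = 1\) and the two definitions coincide.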

Weighted delta function

To offer this functionality in ExpAn in a more generic way, we can introduce a weighted delta function. Its inputs are:

- The per-entity metric, e.g. \(O_i/V_i\)
- A reference metric, on which the weighting factor is based, e.g. \(V_i\)

From these inputs, it calculates the weights \(\alpha_i\) as described above and returns the result of statistics.delta().
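A minimal sketch of such a weighted delta function, under several assumptions: `reweight` and `weighted_delta` are hypothetical names, not ExpAn's API; the weights are computed separately within each of the two groups; and instead of calling ExpAn's statistics.delta() the sketch uses a simple normal-approximation confidence interval whose standard error treats the weights as fixed, i.e. it ignores the randomness of \(\sum V_i\).

```python
import math
from statistics import NormalDist, fmean, variance


def reweight(per_entity, reference):
    """Hypothetical helper: return alpha_i * m_i with alpha_i = n * r_i / sum(r)."""
    n = len(reference)
    total = sum(reference)
    return [n * r / total * m for m, r in zip(per_entity, reference)]


def weighted_delta(x_metric, x_reference, y_metric, y_reference, significance=0.05):
    """Hypothetical weighted delta: mean(x) - mean(y) of the reweighted values.

    Weights are computed within each group. The standard error treats the
    weights as fixed, which is a simplification.
    """
    x = reweight(x_metric, x_reference)
    y = reweight(y_metric, y_reference)
    delta = fmean(x) - fmean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist().inv_cdf(1 - significance / 2)
    return delta, (delta - z * se, delta + z * se)


# Usage with made-up data: conversion rate O_i / V_i, reference metric V_i.
x_orders, x_visits = [1, 0, 6, 2], [2, 1, 10, 4]
y_orders, y_visits = [0, 1, 3, 1], [3, 2, 9, 5]
x_cr = [o / v for o, v in zip(x_orders, x_visits)]
y_cr = [o / v for o, v in zip(y_orders, y_visits)]
delta, ci = weighted_delta(x_cr, x_visits, y_cr, y_visits)
```

With visits as the reference metric, the returned delta equals the difference between the two groups' ratios of totals, here \(9/17 - 5/19\).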

Early stopping

Given samples x from the treatment group and samples y from the control group, we want to know whether there is a significant difference between the means, \(\delta=\mu(y)-\mu(x)\). To save the cost of long-running experiments, we would like to stop the test early once we are already confident that the result is statistically significant.

You can find links to our detailed documentation on the concept of early stopping and on the early stopping methods we investigated.
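Stopping early is not as simple as re-running a fixed-horizon test whenever new data arrive. The following minimal simulation is an illustration only, not one of the methods ExpAn implements: treatment and control are drawn from the same distribution (so there is no true effect), and an unadjusted two-sided z-test at the 5 % level is applied to \(\delta=\mu(y)-\mu(x)\) after each of several equally spaced looks. The unit variance, sample sizes, number of looks and the function name `peeking_false_positive_rate` are assumptions of the sketch.

```python
import math
import random
from statistics import NormalDist


def peeking_false_positive_rate(n_sims=1000, n_per_group=200, looks=10,
                                alpha=0.05, seed=1):
    """Share of A/A experiments declared 'significant' at any interim look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n_per_group // looks  # new observations per group between looks
    false_positives = 0
    for _ in range(n_sims):
        sum_x = sum_y = 0.0
        for look in range(1, looks + 1):
            # Add the next batch; both groups come from N(0, 1), i.e. no effect.
            for _ in range(step):
                sum_x += rng.gauss(0.0, 1.0)
                sum_y += rng.gauss(0.0, 1.0)
            n = look * step
            # z statistic for mean(y) - mean(x) with known unit variance.
            z = (sum_y - sum_x) / n / math.sqrt(2.0 / n)
            if abs(z) > z_crit:
                false_positives += 1
                break  # stop the experiment at the first 'significant' look
    return false_positives / n_sims
```

With the defaults the returned rate typically comes out close to 0.2, roughly four times the nominal 5 %, while `looks=1` (a single test at the end) stays near 0.05. This inflation is why early stopping needs dedicated methods rather than repeated fixed-horizon tests.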

Subgroup analysis

Subgroup analysis in ExpAn selects subgroups (segments of the data) based on the input argument and then performs a regular delta analysis within each subgroup, as described before.

That is, we do not compare subgroups with each other; instead, we compare treatment with control within each subgroup.
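A minimal sketch of this idea in plain Python, assuming the data are rows with hypothetical fields `variant`, `value` and a segmenting field such as `device`; `subgroup_deltas` is likewise a hypothetical name. A full analysis would run the delta analysis (with confidence intervals) in each subgroup; the sketch only computes the difference of means, and its sign convention (treatment minus control) is a choice of the sketch.

```python
from collections import defaultdict
from statistics import fmean


def subgroup_deltas(rows, segment_key):
    """Treatment-minus-control difference of means within each subgroup.

    Each row is a dict with 'variant' ('treatment' or 'control'), 'value',
    and the segmenting field named by `segment_key`. Every subgroup must
    contain both variants, otherwise fmean raises StatisticsError.
    """
    groups = defaultdict(lambda: {"treatment": [], "control": []})
    for row in rows:
        groups[row[segment_key]][row["variant"]].append(row["value"])
    # Compare treatment with control inside each subgroup, never across subgroups.
    return {
        segment: fmean(values["treatment"]) - fmean(values["control"])
        for segment, values in groups.items()
    }


rows = [
    {"variant": "treatment", "device": "mobile", "value": 3.0},
    {"variant": "control", "device": "mobile", "value": 2.0},
    {"variant": "treatment", "device": "desktop", "value": 5.0},
    {"variant": "control", "device": "desktop", "value": 5.5},
]
print(subgroup_deltas(rows, "device"))  # {'mobile': 1.0, 'desktop': -0.5}
```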

Multiple testing problem

ToDo