Tutorial¶
Here is a tutorial to use ExpAn. Let’s get started!
Generate demo data¶
First, let’s generate some random data for the tutorial.
from expan.core.util import generate_random_data
data, metadata = generate_random_data()
data
is a pandas DataFrame.
It must contain a column for entity identifier named entity,
a column for variant, and one column per kpi/feature.
metadata
is a python dict. It should contain the following keys:
experiment
: Name of the experiment, as known to stakeholders. It can be anything meaningful to you.sources
(optional): Names of the data sources used in the preparation of this data.experiment_id
(optional): This uniquely identifies the experiment. Could be a concatenation of the experiment name and the experiment start timestamp.retrieval_time
(optional): Time that data was fetched from original sources.primary_KPI
(optional): Primary evaluation criteria.
Currently, metadata
is only used for including more information about the experiment,
and is not taken into consideration for analysis.
Create an experiment¶
To use ExpAn for analysis, you first need to create an Experiment
object.
from expan.core.experiment import Experiment
exp = Experiment(metadata=metadata)
This Experiment
object has the following parameters:
metadata
: Specifies an experiment name as the mandatory and data source as the optional fields. Described above.
Create a statistical test¶
Now we need a StatisticalTest
object to represent what statistical test to run.
Each statistical test consist of a dataset, one kpi, treatment and control variant names, and the optional features.
Dataset should contain necessary kpis, variants and features columns.
from expan.core.statistical_test import KPI, Variants, StatisticalTest
kpi = KPI('normal_same')
variants = Variants(variant_column_name='variant', control_name='B', treatment_name='A')
test = StatisticalTest(data=data, kpi=kpi, features=[], variants=variants)
Let’s start analyzing!¶
Running an analysis is very simple:
exp.analyze_statistical_test(test)
Currently analyze_statistical_test
supports 4 test methods: fixed_horizon
(default), group_sequential
, bayes_factor
and bayes_precision
.
All methods requires different additional parameters.
If you would like to change any of the default values, just pass them as parameters to delta. For example:
exp.analyze_statistical_test(test, test_method='fixed_horizon', assume_normal=True, percentiles=[2.5, 97.5])
exp.analyze_statistical_test(test, test_method='group_sequential', estimated_sample_size=1000)
exp.analyze_statistical_test(test, test_method='bayes_factor', distribution='normal')
Here is the list of additional parameters. You may also find the description in our API page.
fixed_horizon is the default method:
assume_normal=True
: Specifies whether normal distribution assumptions can be made. A t-test is performed under normal assumption. We use bootstrapping otherwise. Bootstrapping takes considerably longer time than assuming the normality before running experiment. If we do not have an explicit reason to use it, it is almost always better to leave it off.alpha=0.05
: Type-I error rate.min_observations=20
: Minimum number of observations needed.nruns=10000
: Only used if assume normal is false.relative=False
: If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values.
group_sequential is a frequentist approach for early stopping:
spending_function='obrien_fleming'
: Currently we support only Obrient-Fleming alpha spending function for the frequentist early stopping decision.estimated_sample_size=None
: Sample size to be achieved towards the end of experiment. In other words, the actual size of data should be always smaller than estimated_sample_size.alpha=0.05
: Type-I error rate.cap=8
: Upper bound of the adapted z-score.
bayes_factor is a Bayesian approach for delta analysis and early stopping:
distribution='normal'
: The name of the KPI distribution model, which assumes a Stan model file with the same name exists. Currently we support normal and poisson models.num_iters=25000
: Number of iterations of bayes sampling.inference=sampling
: ‘sampling’ for MCMC sampling method or ‘variational’ for variational inference method to approximate the posterior distribution.
bayes_precision is another Bayesian approach similar as bayes_factor:
distribution='normal'
: The name of the KPI distribution model, which assumes a Stan model file with the same name exists. Currently we support normal and poisson models.num_iters=25000
: Number of iterations of bayes sampling.posterior_width=0.08
: The stopping criterion, threshold of the posterior width.inference=sampling
: ‘sampling’ for MCMC sampling method or ‘variational’ for variational inference method to approximate the posterior distribution.
Interpreting result¶
The output of the analyze_statistical_test
method is an instance of class core.result.StatisticalTestResult
.
Please refer to the API page for result structure as well as descriptions of all fields.
An example of the result is shown below:
{
"result": {
"confidence_interval": [
{
"percentile": 2.5,
"value": 0.1
},
{
"percentile": 97.5,
"value": 1.1
}],
"control_statistics": {
"mean": 0.0,
"sample_size": 1000,
"variance": 1.0
},
"delta": 1.0,
"p": 0.04,
"statistical_power": 0.8,
"treatment_statistics": {
"mean": 1.0,
"sample_size": 1200,
"variance": 1.0
}
},
"test": {
"features": [],
"kpi": {
"name": "revenue"
},
"variants": {
"control_name": "control",
"treatment_name": "treatment",
"variant_column_name": "variant"
}
}
}
Subgroup analysis¶
Subgroup analysis in ExaAn will select subgroup (which is a segment of data) based on the input argument, and then perform a regular delta analysis per subgroup as described before. That is to say, we don’t compare between subgroups, but compare treatment with control within each subgroup.
If you wish to perform the test on a specific subgroup,
you can use the FeatureFilter
object:
feature = FeatureFilter('feature', 'has')
test = StatisticalTest(data=data, kpi=kpi, features=[feature], variants=variants)
Statistical test suite¶
It is very common to run a suite of statistical tests.
In this case, you need to create a StatisticalTestSuite
object to represent the test suite.
A StatisticalTestSuite
object consists of a list of StatisticalTest
and a correction method:
from expan.core.statistical_test import *
kpi = KPI('normal_same')
variants = Variants(variant_column_name='variant', control_name='B', treatment_name='A')
feature_1 = FeatureFilter('feature', 'has')
feature_2 = FeatureFilter('feature', 'non')
feature_3 = FeatureFilter('feature', 'feature that only has one data point')
test_subgroup1 = StatisticalTest(data, kpi, [feature_1], variants)
test_subgroup2 = StatisticalTest(data, kpi, [feature_2], variants)
test_subgroup3 = StatisticalTest(data, kpi, [feature_3], variants)
tests = [test_subgroup1, test_subgroup2, test_subgroup3]
test_suite = StatisticalTestSuite(tests=tests, correction_method=CorrectionMethod.BH)
And then you can use the `Experiment`
instance to run the test suite.
Method analyze_statistical_test_suite
has the same arguments as analyze_statistical_test
. For example:
exp.analyze_statistical_test_suite(test_suite)
exp.analyze_statistical_test_suite(test_suite, test_method='group_sequential', estimated_sample_size=1000)
exp.analyze_statistical_test_suite(test_suite, test_method='bayes_factor', distribution='normal')
Result of statistical test suite¶
The output of the analyze_statistical_test_suite
method is an instance of class core.result.MultipleTestSuiteResult
.
Please refer to the API page for result structure as well as descriptions of all fields.
Following is an example of the analysis result of statistical test suite:
{
"correction_method": "BH",
"results": [
{
"test": {
"features": [
{
"column_name": "device_type",
"column_value": "desktop"
}
],
"kpi": {
"name": "revenue"
},
"variants": {
"control_name": "control",
"treatment_name": "treatment",
"variant_column_name": "variant"
}
},
"result": {
"corrected_test_statistics": {
"confidence_interval": [
{
"percentile": 1.0,
"value": -0.7
},
{
"percentile": 99.0,
"value": 0.7
}
],
"control_statistics": {
"mean": 0.0,
"sample_size": 1000,
"variance": 1.0
},
"delta": 1.0,
"p": 0.02,
"statistical_power": 0.8,
"treatment_statistics": {
"mean": 1.0,
"sample_size": 1200,
"variance": 1.0
}
},
"original_test_statistics": {
"confidence_interval": [
{
"percentile": 2.5,
"value": 0.1
},
{
"percentile": 97.5,
"value": 1.1
}
],
"control_statistics": {
"mean": 0.0,
"sample_size": 1000,
"variance": 1.0
},
"delta": 1.0,
"p": 0.04,
"statistical_power": 0.8,
"treatment_statistics": {
"mean": 1.0,
"sample_size": 1200,
"variance": 1.0
}
}
}
},
{
"test": {
"features": [
{
"column_name": "device_type",
"column_value": "mobile"
}
],
"kpi": {
"name": "revenue"
},
"variants": {
"control_name": "control",
"treatment_name": "treatment",
"variant_column_name": "variant"
}
},
"result": {
"corrected_test_statistics": {
"confidence_interval": [
{
"percentile": 1.0,
"value": -0.7
},
{
"percentile": 99.0,
"value": 0.7
}
],
"control_statistics": {
"mean": 0.0,
"sample_size": 1000,
"variance": 1.0
},
"delta": 1.0,
"p": 0.02,
"statistical_power": 0.8,
"stop": false,
"treatment_statistics": {
"mean": 1.0,
"sample_size": 1200,
"variance": 1.0
}
},
"original_test_statistics": {
"confidence_interval": [
{
"percentile": 2.5,
"value": 0.1
},
{
"percentile": 97.5,
"value": 1.1
}
],
"control_statistics": {
"mean": 0.0,
"sample_size": 1000,
"variance": 1.0
},
"delta": 1.0,
"p": 0.04,
"statistical_power": 0.8,
"stop": true,
"treatment_statistics": {
"mean": 1.0,
"sample_size": 1200,
"variance": 1.0
}
}
}
}
]
}
That’s it!
For API list and theoretical concepts, please read the next sections.