expan.core package¶
Submodules¶
expan.core.binning module¶
NB: This module is deprecated.
-
class
expan.core.binning.
Bin
(bin_type, *repr_args)¶ Bases:
object
Constructor for a bin object.
Parameters: - id – identifier (e.g. bin number) of the bin
- bin_type – “numerical” or “categorical”
- repr_args – arguments to represent this bin. For a numerical bin these are lower, upper, lower_closed and upper_closed; for a categorical bin, a list of categories for this bin.
-
class
expan.core.binning.
CategoricalRepresentation
(categories)¶ Bases:
object
Constructor for representation of a categorical bin.
Parameters: categories – list of categorical values that belong to this bin
-
apply_to_data
(data, feature)¶ Apply the bin to data.
Parameters: - data – pandas data frame
- feature – feature name on which this bin is defined
Returns: subset of input dataframe which belongs to this bin
-
-
class
expan.core.binning.
NumericalRepresentation
(lower, upper, lower_closed, upper_closed)¶ Bases:
object
Constructor for representation of a numerical bin.
Parameters: - upper – upper bound of the bin (exclusive)
- lower – lower bound of the bin (inclusive)
- lower_closed – boolean indicator whether lower bound is closed
- upper_closed – boolean indicator whether upper bound is closed
-
apply_to_data
(data, feature)¶ Apply the bin to data.
Parameters: - data – pandas data frame
- feature – feature name on which this bin is defined
Returns: subset of input dataframe which belongs to this bin
-
-
expan.core.binning.
create_bins
(data, n_bins)¶ Create bins from the data values.
Parameters: - data – a list or a 1-dim array of data to determine the bins
- n_bins – number of bins to create
Returns: a list of Bin objects
-
expan.core.binning.
toBinObject
(bins)¶
expan.core.correction module¶
-
expan.core.correction.
benjamini_hochberg
(false_discovery_rate, original_p_values)¶ Benjamini-Hochberg procedure.
Parameters: - false_discovery_rate (float) – proportion of significant results that are actually false positives
- original_p_values (list[float]) – p values from all the tests
Returns: new critical value (i.e. the corrected alpha)
Return type: float
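The procedure described above can be sketched in plain Python. This is a minimal illustration, not ExpAn's code: the name bh_critical_value is hypothetical, and falling back to rank 1 when no p-value passes its threshold is an assumption.

```python
def bh_critical_value(false_discovery_rate, p_values):
    """Return a Benjamini-Hochberg critical value for a list of p-values."""
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    ranked = sorted(p_values)
    # Collect every 1-based rank k whose p-value satisfies p_(k) <= k * FDR / m.
    passing = [k for k, p in enumerate(ranked, start=1)
               if p <= k * false_discovery_rate / m]
    # Assumption: use rank 1 when no p-value passes its threshold.
    k_max = max(passing) if passing else 1
    return k_max * false_discovery_rate / m


corrected_alpha = bh_critical_value(0.05, [0.03, 0.01, 0.5, 0.02])
```

In this example the three smallest p-values pass their thresholds, so the critical value is 3 * 0.05 / 4.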
-
expan.core.correction.
bonferroni
(false_positive_rate, original_p_values)¶ Bonferroni correction.
Parameters: - false_positive_rate (float) – alpha value before correction
- original_p_values (list[float]) – p values from all the tests
Returns: new critical value (i.e. the corrected alpha)
Return type: float
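As a rough sketch, the Bonferroni correction divides the uncorrected alpha by the number of tests. The function name below is hypothetical, and it is an assumption that only the number of p-values (not their values) matters.

```python
def bonferroni_critical_value(false_positive_rate, p_values):
    """Return alpha divided by the number of tests."""
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    # Only the count of tests enters the correction.
    return false_positive_rate / m


corrected_alpha = bonferroni_critical_value(0.05, [0.01, 0.2, 0.03, 0.4, 0.5])
```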
expan.core.early_stopping module¶
-
expan.core.early_stopping.
HDI_from_MCMC
(posterior_samples, credible_mass=0.95)¶ Computes highest density interval from a sample of representative values, estimated as the shortest credible interval. Takes Arguments posterior_samples (samples from posterior) and credible mass (normally .95). http://stackoverflow.com/questions/22284502/highest-posterior-density-region-and-central-credible-region
Parameters: - posterior_samples (array-like) – sample of data points from posterior distribution of some parameter
- credible_mass (float) – the probability mass covered by the credible interval; 0.95 means a 95% credible interval.
Returns: corresponding lower and upper bound for the credible interval
Return type: tuple[float]
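The "shortest credible interval" idea can be illustrated with a short standard-library sketch. It follows the approach in the linked Stack Overflow answer (slide a fixed-size window over the sorted samples and keep the narrowest one); the function name is hypothetical and the exact indexing convention is an assumption.

```python
import math


def shortest_credible_interval(posterior_samples, credible_mass=0.95):
    """Return (lower, upper) of the narrowest window covering credible_mass."""
    points = sorted(posterior_samples)
    n = len(points)
    # Number of index steps the window must span.
    span = math.ceil(credible_mass * n)
    n_candidates = n - span
    if n_candidates <= 0:
        raise ValueError("not enough samples for the requested credible mass")
    # Width of every candidate window that starts at index i.
    widths = [points[i + span] - points[i] for i in range(n_candidates)]
    start = widths.index(min(widths))
    return points[start], points[start + span]


samples = [0, 10, 11, 12, 13, 14, 15, 30, 40, 50]
interval = shortest_credible_interval(samples, credible_mass=0.5)
```

Here the samples cluster between 10 and 15, so the narrowest window is the one starting at 10.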
-
expan.core.early_stopping.
bayes_factor
(x, y, distribution='normal', num_iters=25000, inference='sampling')¶ Bayes factor computation.
Parameters: - x (pd.Series or list (array-like)) – sample of a treatment group
- y (pd.Series or list (array-like)) – sample of a control group
- distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
- num_iters (int) – number of iterations of bayes sampling
- inference (str) – sampling or variational inference method for approximating the posterior
Returns: results of type EarlyStoppingTestStatistics (without p-value and stat. power)
Return type: EarlyStoppingTestStatistics
-
expan.core.early_stopping.
bayes_precision
(x, y, distribution='normal', posterior_width=0.08, num_iters=25000, inference='sampling')¶ Bayes precision computation.
Parameters: - x (pd.Series or list (array-like)) – sample of a treatment group
- y (pd.Series or list (array-like)) – sample of a control group
- distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
- posterior_width (float) – the stopping criterion, threshold of the posterior width
- num_iters (int) – number of iterations of bayes sampling
- inference (str) – sampling or variational inference method for approximating the posterior
Returns: results of type EarlyStoppingTestStatistics (without p-value and stat. power)
Return type: EarlyStoppingTestStatistics
-
expan.core.early_stopping.
get_or_compile_stan_model
(model_file, distribution)¶ Creates a Stan model. If no compiled model file exists yet, compiles the Stan model and saves it as a .pkl file in the folder selected by the tempfile module; otherwise loads the precompiled model from that temporary directory.
Note: compiled_model_file is a hardcoded file path, which may cause issues in the future. There are 2 alternative implementations for Stan models handling:
- Using global variables
- Pre-compiling stan models and adding them as a part of ExpAn project
Using temporary files with the tempfile module is not currently possible, since it generates a unique file name which is difficult to track. However, compiled models are saved in the temporary directory chosen by the tempfile module, which varies based on the current platform and settings. Cleaning up a temp dir is done on boot.
Parameters: - model_file (str) – model file location
- distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
Returns: compiled Stan model for the selected distribution or normal distribution as a default option
Return type: Class representing a compiled Stan model
-
expan.core.early_stopping.
get_trace_normalized_effect_size
(distribution, traces)¶ Obtains Stan model statistics for the ‘normal’ or ‘poisson’ distribution.
Parameters: - distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
- traces (dict) – sampling statistics
Returns: sample of data points from posterior distribution of some parameter
Return type: array-like
-
expan.core.early_stopping.
group_sequential
(x, y, spending_function='obrien_fleming', estimated_sample_size=None, alpha=0.05, cap=8)¶ Group sequential method to determine whether to stop early.
Parameters: - x (pd.Series or array-like) – sample of a treatment group
- y (pd.Series or array-like) – sample of a control group
- spending_function (str) – name of the alpha spending function, currently supports only ‘obrien_fleming’.
- estimated_sample_size (int) – sample size to be achieved towards the end of experiment
- alpha (float) – type-I error rate
- cap (int) – upper bound of the adapted z-score
Returns: results of type EarlyStoppingTestStatistics
Return type: EarlyStoppingTestStatistics
-
expan.core.early_stopping.
make_bayes_factor
(distribution='normal', num_iters=25000, inference='sampling')¶ Closure method for the bayes_factor
-
expan.core.early_stopping.
make_bayes_precision
(distribution='normal', posterior_width=0.08, num_iters=25000, inference='sampling')¶ Closure method for the bayes_precision
-
expan.core.early_stopping.
make_group_sequential
(spending_function='obrien_fleming', estimated_sample_size=None, alpha=0.05, cap=8)¶ A closure to the group_sequential function.
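The make_* helpers are described as closures over the corresponding analysis functions. A plausible reading, and it is only an assumption, is that they fix the configuration arguments and return a callable that takes just the two samples. The sketch below shows that pattern with a hypothetical stand-in worker that does no statistics.

```python
def make_worker(worker, **fixed_kwargs):
    """Return a function of (x, y) with the configuration baked in."""
    def run(x, y):
        # The captured keyword arguments are applied on every call.
        return worker(x, y, **fixed_kwargs)
    return run


def toy_group_sequential(x, y, alpha=0.05, cap=8):
    # Hypothetical stand-in for an analysis function; it only echoes its inputs.
    return {"n_x": len(x), "n_y": len(y), "alpha": alpha, "cap": cap}


worker = make_worker(toy_group_sequential, alpha=0.01)
result = worker([1, 2, 3], [4, 5])
```

The benefit of this design is that callers can pass around one configured callable instead of threading the same keyword arguments through every call site.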
-
expan.core.early_stopping.
obrien_fleming
(information_fraction, alpha=0.05)¶ Calculate an approximation of the O’Brien-Fleming alpha spending function.
Parameters: - information_fraction (float) – share of the information amount at the point of evaluation, e.g. the share of the maximum sample size
- alpha (float) – type-I error rate
Returns: redistributed alpha value at the time point with the given information fraction
Return type: float
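A common approximation of the O'Brien-Fleming spending function is the Lan-DeMets form, alpha(t) = 2 * (1 - Phi(z_{1-alpha/2} / sqrt(t))), where t is the information fraction. The sketch below implements that form with the standard library; whether ExpAn uses exactly this expression is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def obrien_fleming_spending(information_fraction, alpha=0.05):
    """Lan-DeMets approximation of the O'Brien-Fleming alpha spending function."""
    if information_fraction <= 0 or information_fraction > 1:
        raise ValueError("information_fraction must lie in (0, 1]")
    std_normal = NormalDist()
    # Two-sided critical value for the overall type-I error rate.
    z = std_normal.inv_cdf(1 - alpha / 2)
    # Early looks inflate the critical value, so little alpha is spent early.
    return 2 * (1 - std_normal.cdf(z / information_fraction ** 0.5))


alpha_at_half = obrien_fleming_spending(0.5)
alpha_at_end = obrien_fleming_spending(1.0)
```

At the full information fraction the function returns the whole alpha; at earlier looks it returns a much smaller value.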
expan.core.experiment module¶
-
class
expan.core.experiment.
Experiment
(metadata)¶ Bases:
object
Class which adds the analysis functions to experimental data.
Constructor of the experiment object.
Parameters: metadata (dict) – additional information about the experiment. (e.g. primary KPI, source, etc) -
analyze_statistical_test
(test, test_method='fixed_horizon', include_data=False, **worker_args)¶ Runs delta analysis on one statistical test and returns statistical results.
Parameters: - test (StatisticalTest) – a statistical test to run
- test_method (str) – analysis method to perform. It can be ‘fixed_horizon’, ‘group_sequential’, ‘bayes_factor’ or ‘bayes_precision’.
- include_data (bool) – True if test results should include data, False - if no data should be included
- worker_args – additional arguments for the analysis method
Returns: statistical result of the test
Return type: StatisticalTestResult
-
analyze_statistical_test_suite
(test_suite, test_method='fixed_horizon', **worker_args)¶ Runs delta analysis on a set of tests and returns statistical results for each statistical test in the suite.
Parameters: - test_suite (StatisticalTestSuite) – a suite of statistical tests to run
- test_method (str) – analysis method to perform. It can be ‘fixed_horizon’, ‘group_sequential’, ‘bayes_factor’ or ‘bayes_precision’.
- worker_args – additional arguments for the analysis method (see signatures of corresponding methods)
Returns: statistical result of the test suite
Return type: MultipleTestSuiteResult
-
chi_square_test_result_and_statistics
(variant_column, weights, min_counts=5, alpha=0.05)¶ Tests the consistency of variant split with the hypothesized distribution.
Parameters: - variant_column – variant column from the input data frame
- weights – dict with variant names as keys, weights as values ({<variant_name>:<weight>, …})
- min_counts – minimum number of observed and expected frequencies (should be at least 5), see http://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.chisquare.html
- alpha – significance level, 0.05 by default
Returns: True (if the split is consistent with the given split) or False (if the split is not consistent with the given split)
Return type: Boolean, float, float
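The consistency check can be illustrated with a chi-square goodness-of-fit test. To stay within the standard library, this sketch handles only two variants (one degree of freedom), where the p-value equals erfc(sqrt(chi2 / 2)). The function name, the return order (flag, statistic, p-value) and the omission of the min_counts check are assumptions.

```python
import math
from collections import Counter


def split_is_consistent(variant_column, weights, alpha=0.05):
    """Two-variant chi-square goodness-of-fit check of a variant split."""
    if len(weights) != 2:
        raise ValueError("this sketch supports exactly two variants")
    counts = Counter(variant_column)
    total = sum(counts[name] for name in weights)
    weight_sum = sum(weights.values())
    chi_square = 0.0
    for name, weight in weights.items():
        # Expected count under the hypothesized split (weights assumed positive).
        expected = total * weight / weight_sum
        chi_square += (counts[name] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return p_value >= alpha, chi_square, p_value


balanced = split_is_consistent(["A"] * 500 + ["B"] * 500, {"A": 0.5, "B": 0.5})
skewed = split_is_consistent(["A"] * 600 + ["B"] * 400, {"A": 0.5, "B": 0.5})
```

A 500/500 split matches the hypothesized 50/50 weights exactly, while a 600/400 split yields a statistic of 40 and is rejected.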
-
outlier_filter
(data, kpis, percentile=99.0, threshold_type='upper')¶ Method that filters out entities whose KPIs exceed the value at a given percentile. If any of the KPIs exceeds its threshold the entity is filtered out. If kpis contains derived kpi, this method will first create these columns, and then perform outlier filtering on all given kpis.
Parameters: - kpis (list[KPI]) – list of KPI instances
- percentile (float) – percentile considered as filtering threshold
- threshold_type (str) – type of threshold used (‘lower’ or ‘upper’)
Returns: Will return data with filtered outliers.
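The filtering rule can be sketched on a list of dict rows instead of a data frame. All names here are hypothetical, the percentile uses linear interpolation between closest ranks (an assumption about the method), and treating 'lower' as "drop values below the percentile value" is also an assumption.

```python
def percentile_value(values, percentile):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * percentile / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def filter_outliers(rows, kpi_names, percentile=99.0, threshold_type="upper"):
    """Drop every row in which any KPI lies beyond its percentile threshold."""
    if threshold_type not in ("upper", "lower"):
        raise ValueError("threshold_type must be 'upper' or 'lower'")
    thresholds = {name: percentile_value([row[name] for row in rows], percentile)
                  for name in kpi_names}

    def keep(row):
        for name in kpi_names:
            if threshold_type == "upper" and row[name] > thresholds[name]:
                return False
            if threshold_type == "lower" and row[name] < thresholds[name]:
                return False
        return True

    return [row for row in rows if keep(row)]


rows = [{"entity": i, "revenue": float(i)} for i in range(1, 101)]
kept = filter_outliers(rows, ["revenue"], percentile=99.0)
```

With revenues 1 to 100, the 99th percentile is about 99.01, so only the entity with revenue 100 is removed.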
-
expan.core.results module¶
-
class
expan.core.results.
BaseTestStatistics
(control_statistics, treatment_statistics)¶ Bases:
expan.core.util.JsonSerializable
Holds only statistics for the control and treatment group.
Parameters: - control_statistics (SampleStatistics) – statistics within the control group
- treatment_statistics (SampleStatistics) – statistics within the treatment group
-
class
expan.core.results.
CombinedTestStatistics
(original_test_statistics, corrected_test_statistics)¶ Bases:
expan.core.util.JsonSerializable
Holds original and corrected statistics. This class should be used to hold statistics for multiple testing. original_test_statistics and corrected_test_statistics should have the same type. In case there is no correction specified, corrected_test_statistics == original_test_statistics.
Parameters: - original_test_statistics (SimpleTestStatistics or EarlyStoppingTestStatistics) – test result before correction
- corrected_test_statistics (SimpleTestStatistics or EarlyStoppingTestStatistics) – test result after correction or same as original_test_statistics if no correction
-
class
expan.core.results.
EarlyStoppingTestStatistics
(control_statistics, treatment_statistics, delta, ci, p, statistical_power, stop)¶ Bases:
expan.core.results.SimpleTestStatistics
In addition to SimpleTestStatistics, holds a boolean flag for early stopping.
Parameters: - control_statistics (SampleStatistics) – sample size, mean, variance for the control group
- treatment_statistics (SampleStatistics) – sample size, mean, variance for the treatment group
- ci (dict) – a dict where keys are percentiles and values are the corresponding value for the statistic.
- stop (bool) – early-stopping flag
-
class
expan.core.results.
MultipleTestSuiteResult
(results, correction_method=<CorrectionMethod.NONE: 1>)¶ Bases:
expan.core.util.JsonSerializable
This class holds the results of a MultipleTestSuite.
Parameters: - results (list[StatisticalTestResult]) – test results for all statistical testing units
- correction_method (CorrectionMethod) – method used for multiple testing correction
-
merge_with
(multiple_test_suite_result)¶ Merges two multiple test suite results.
Parameters: multiple_test_suite_result (MultipleTestSuiteResult) – multiple test suite result
Returns: merged multiple test suite results
Return type: MultipleTestSuiteResult
-
class
expan.core.results.
SampleStatistics
(sample_size, mean, variance)¶ Bases:
expan.core.util.JsonSerializable
This class holds sample size, mean and variance.
Parameters: - sample_size (int) – sample size of the control or treatment group
- mean (float) – mean of the control or treatment group
- variance (float) – variance of the control or treatment group
-
class
expan.core.results.
SimpleTestStatistics
(control_statistics, treatment_statistics, delta, ci, p, statistical_power)¶ Bases:
expan.core.results.BaseTestStatistics
In addition to BaseTestStatistics, holds delta, confidence interval, statistical power, and p value.
Parameters: - control_statistics (SampleStatistics) – sample size, mean, variance for the control group
- treatment_statistics (SampleStatistics) – sample size, mean, variance for the treatment group
- delta (float) – delta (relative or absolute difference between control and treatment, uplift)
- p (float) – p value
- statistical_power (float) – statistical power value
- ci (dict) – a dict where keys are percentiles and values are the corresponding value for the statistic.
-
class
expan.core.results.
StatisticalTestResult
(test, result)¶ Bases:
expan.core.util.JsonSerializable
This class holds the results of a single statistical test.
Parameters: - test (StatisticalTest) – information about the statistical test
- result (CombinedTestStatistics) – result of this statistical test
expan.core.statistical_test module¶
-
class
expan.core.statistical_test.
CorrectionMethod
¶ Bases:
enum.Enum
Correction methods.
-
BH
= 3¶
-
BONFERRONI
= 2¶
-
NONE
= 1¶
-
-
class
expan.core.statistical_test.
DerivedKPI
(name, numerator, denominator)¶ Bases:
expan.core.statistical_test.KPI
This class represents a derived KPI which is a ratio of two columns. Names of the two columns are passed as numerator and denominator.
Parameters: - name (str) – name of the kpi
- numerator (str) – the numerator for the derived KPI
- denominator (str) – the denominator for the derived KPI
-
make_derived_kpi
(data)¶ Create the derived kpi column if it is not yet created.
-
class
expan.core.statistical_test.
FeatureFilter
(column_name, column_value)¶ Bases:
expan.core.util.JsonSerializable
This class represents a filter, restricting a DataFrame to rows with column_value in column_name.
It can be used to specify subgroup conditions.
Parameters: - column_name (str) – name of the column to perform filter on
- column_value (str) – value of the column to perform filter on
-
apply_to_data
(data)¶
-
-
class
expan.core.statistical_test.
KPI
(name)¶ Bases:
expan.core.util.JsonSerializable
This class represents a basic kpi.
Parameters: name (str) – name of the kpi
-
class
expan.core.statistical_test.
StatisticalTest
(data, kpi, features, variants)¶ Bases:
expan.core.util.JsonSerializable
This class describes what has to be tested against what and represents a unit of statistical testing.
Parameters: - data (DataFrame) – data for statistical test
- kpi (KPI or its subclass) – the KPI to perform the test on
- features (list[FeatureFilter]) – list of features used for subgroups
- variants (Variants) – variant column name and their values
-
class
expan.core.statistical_test.
StatisticalTestSuite
(tests, correction_method=<CorrectionMethod.NONE: 1>)¶ Bases:
expan.core.util.JsonSerializable
This class consists of a number of tests plus choice of the correction method.
Parameters: - tests (list[StatisticalTest]) – list of statistical tests in the suite
- correction_method (CorrectionMethod) – method used for multiple testing correction
-
size
¶
-
class
expan.core.statistical_test.
Variants
(variant_column_name, control_name, treatment_name)¶ Bases:
expan.core.util.JsonSerializable
This class represents information of variants.
Parameters: - variant_column_name (str) – name of the column that represents variant
- control_name (str) – value of the variant that represents control group
- treatment_name (str) – value of the variant that represents treatment group
-
get_variant
(data, variant_name)¶
expan.core.statistics module¶
-
expan.core.statistics.
bootstrap
(x, y, func=<function _delta_mean>, nruns=10000, percentiles=[2.5, 97.5], min_observations=20, return_bootstraps=False, relative=False)¶ Bootstraps the Confidence Intervals for a particular function comparing two samples. NaNs are ignored (discarded before calculation).
Parameters: - x (pd.Series or list (array-like)) – sample of the treatment group
- y (pd.Series or list (array-like)) – sample of the control group
- func (function) – function of which the distribution is to be computed. The default comparison metric is the difference of means. For bootstrapping correlation: func=lambda x,y: scipy.stats.pearsonr(x,y)[0].
- nruns (int) – number of bootstrap runs to perform
- percentiles (list) – The values corresponding to the given percentiles are returned. The default percentiles (2.5% and 97.5%) correspond to an alpha of 0.05.
- min_observations (int) – minimum number of observations necessary
- return_bootstraps (bool) – If this variable is set the bootstrap sets are returned, otherwise the first return value is empty.
- relative (bool) – if relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns: (c_val, bootstraps), where c_val is a dict which contains percentile levels (index) and values, and bootstraps is a np.array containing the bootstrapping results per run
Return type: tuple
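The percentile bootstrap described above can be sketched with the standard library only. This is an illustration, not ExpAn's implementation: the function name and the seed parameter are hypothetical, NaN handling and min_observations are omitted, and the percentile is read off by nearest rank.

```python
import random


def bootstrap_delta_ci(x, y, nruns=2000, percentiles=(2.5, 97.5), seed=0):
    """Percentile bootstrap for the difference of means, mean(x) - mean(y)."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(nruns):
        # Resample each group with replacement, keeping its original size.
        xs = rng.choices(x, k=len(x))
        ys = rng.choices(y, k=len(y))
        deltas.append(sum(xs) / len(xs) - sum(ys) / len(ys))
    deltas.sort()
    ci = {}
    for p in percentiles:
        # Nearest-rank position of the requested percentile.
        index = round(p / 100.0 * (nruns - 1))
        ci[p] = deltas[index]
    return ci


control = [float(v) for v in range(100)]
treatment = [v + 5.0 for v in control]
ci = bootstrap_delta_ci(treatment, control)
```

The returned dict maps each requested percentile to a value, so ci[2.5] and ci[97.5] bracket the observed difference of 5.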
-
expan.core.statistics.
chi_square
(observed_freqs, expected_freqs, ddof=0)¶ Compute chi-square statistics and p-values given observed and expected frequencies and degrees of freedom.
Parameters: - observed_freqs (pd.Series or array-like) – observed frequencies
- expected_freqs (pd.Series or array-like) – expected frequencies
- ddof (int) – delta degrees of freedom, 0 by default
Returns: chi-square statistics and p-value
Return type: float, float
-
expan.core.statistics.
compute_p_value
(mean1, std1, n1, mean2, std2, n2)¶ Compute two-tailed p value for statistical Student’s T-test given statistics of control and treatment.
Parameters: - mean1 (float) – mean value of the treatment distribution
- std1 (float) – standard deviation of the treatment distribution
- n1 (int) – number of samples of the treatment distribution
- mean2 (float) – mean value of the control distribution
- std2 (float) – standard deviation of the control distribution
- n2 (int) – number of samples of the control distribution
Returns: two-tailed p-value
Return type: float
-
expan.core.statistics.
compute_p_value_from_samples
(x, y)¶ Calculates two-tailed p value for statistical Student’s T-test based on pooled standard deviation.
Parameters: - x (pd.Series or array-like) – samples of a treatment group
- y (pd.Series or array-like) – samples of a control group
Returns: two-tailed p-value
Return type: float
-
expan.core.statistics.
compute_statistical_power
(mean1, std1, n1, mean2, std2, n2, z_1_minus_alpha)¶ Compute statistical power given statistics of control and treatment.
Parameters: - mean1 (float) – mean value of the treatment distribution
- std1 (float) – standard deviation of the treatment distribution
- n1 (int) – number of samples of the treatment distribution
- mean2 (float) – mean value of the control distribution
- std2 (float) – standard deviation of the control distribution
- n2 (int) – number of samples of the control distribution
- z_1_minus_alpha (float) – critical value for significance level alpha. That is, z-value for 1-alpha.
Returns: statistical power (the probability of a test to detect an effect if the effect actually exists), or -1 if std is less than or equal to 0
Return type: float
-
expan.core.statistics.
compute_statistical_power_from_samples
(x, y, alpha=0.05)¶ Compute statistical power given data samples of control and treatment.
Parameters: - x (pd.Series or array-like) – samples of a treatment group
- y (pd.Series or array-like) – samples of a control group
- alpha (float) – Type I error (false positive rate)
Returns: statistical power—the probability of a test to detect an effect if the effect actually exists
Return type: float
-
expan.core.statistics.
delta
(x, y, x_denominators=1, y_denominators=1, assume_normal=True, alpha=0.05, min_observations=20, nruns=10000, relative=False)¶ Calculates the difference of means between the samples in a statistical sense. Computation is done in form of treatment minus control, i.e. x-y. Note that NaNs are treated as if they do not exist in the data.
Parameters: - x (pd.Series or array-like) – sample of the treatment group
- y (pd.Series or array-like) – sample of the control group
- x_denominators (pd.Series or array-like) – denominators for the sample of the treatment group (1 for a non-derived KPI)
- y_denominators (pd.Series or array-like) – denominators for the sample of the control group (1 for a non-derived KPI)
- assume_normal (boolean) – specifies whether normal distribution assumptions can be made
- alpha (float) – significance level (alpha)
- min_observations (int) – minimum number of observations needed
- nruns (int) – number of bootstrap runs; only used if assume_normal is False
- relative (bool) – if relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns: results of type SimpleTestStatistics
Return type: SimpleTestStatistics
-
expan.core.statistics.
estimate_sample_size
(x, mde, r, alpha=0.05, beta=0.2)¶ Estimates sample size based on sample mean and variance, given the MDE (Minimum Detectable Effect), number of variants and variant split ratio.
Parameters: - x (pd.Series or pd.DataFrame) – sample to base estimation on
- mde (float) – minimum detectable effect
- r (float) – variant split ratio
- alpha (float) – significance level
- beta (float) – type II error
Returns: estimated sample size
Return type: float or pd.Series
-
expan.core.statistics.
make_delta
(assume_normal=True, alpha=0.05, min_observations=20, nruns=10000, relative=False)¶ A closure to the delta function.
-
expan.core.statistics.
normal_difference
(mean1, std1, n1, mean2, std2, n2, percentiles=[2.5, 97.5], relative=False)¶ Calculates the difference distribution of two normal distributions. Computation is done in form of treatment minus control. It is assumed that the standard deviations of both distributions do not differ too much.
For further information visit: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals5.html
Parameters: - mean1 (float) – mean value of the treatment distribution
- std1 (float) – standard deviation of the treatment distribution
- n1 (int) – number of samples of the treatment distribution
- mean2 (float) – mean value of the control distribution
- std2 (float) – standard deviation of the control distribution
- n2 (int) – number of samples of the control distribution
- percentiles (list) – list of percentile values to compute
- relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns: percentiles and corresponding values
Return type: dict
-
expan.core.statistics.
normal_sample_difference
(x, y, percentiles=[2.5, 97.5], relative=False)¶ Calculates the difference distribution of two normal distributions given by their samples.
Computation is done in form of treatment minus control. It is assumed that the standard deviations of both distributions do not differ too much.
Parameters: - x (pd.Series or list (array-like)) – sample of a treatment group
- y (pd.Series or list (array-like)) – sample of a control group
- percentiles (list) – list of percentile values to compute
- relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns: percentiles and corresponding values
Return type: dict
-
expan.core.statistics.
normal_sample_weighted_difference
(x_numerators, y_numerators, x_denominators, y_denominators, percentiles=[2.5, 97.5], relative=False)¶ Calculates the difference distribution of two distributions given by their samples.
Computation is done in form of treatment(x) minus control(y). It is assumed that the standard deviations of both distributions do not differ too much.
The estimate of the mean difference is \(\frac{mean(x_{numerators})}{mean(x_{denominators})}-\frac{mean(y_{numerators})}{mean(y_{denominators})}\). For non-derived KPIs, the denominators will be exactly 1, and hence this will simplify to \(mean(x_{numerators})-mean(y_{numerators})\). For details on the variance calculation, see the Glossary.
Parameters: - x_numerators (pd.Series or list (array-like)) – sample of a treatment group
- y_numerators (pd.Series or list (array-like)) – sample of a control group
- x_denominators (pd.Series or list (array-like), or simply 1 as an int/float if a non-derived KPI) – sample of a treatment group
- y_denominators (pd.Series or list (array-like), or simply 1 as an int/float if a non-derived KPI) – sample of a control group
- percentiles (list) – list of percentile values to compute
- relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns: percentiles and corresponding values
Return type: dict with multiple entries:
- c_i: confidence_interval
- mean1: \(\frac{mean(x_{numerators})}{mean(x_{denominators})}\)
- mean2: \(\frac{mean(y_{numerators})}{mean(y_{denominators})}\)
- n1: sample size of x, after discarding NaNs
- n2: sample size of y, after discarding NaNs
- var1: \(var\left(\frac{x_{numerators}[i] - mean1 \cdot x_{denominators}[i]}{mean(x_{denominators})}\right)\)
- var2: \(var\left(\frac{y_{numerators}[i] - mean2 \cdot y_{denominators}[i]}{mean(y_{denominators})}\right)\)
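The mean and variance entries listed above can be computed directly from the formulas. The sketch below is a standard-library illustration with a hypothetical function name; using the sample variance (n - 1 in the denominator) is an assumption, since the text does not say which variance estimator is meant.

```python
from statistics import mean, variance


def ratio_mean_and_variance(numerators, denominators):
    """Return (mean, var) of a ratio KPI following the formulas in the text."""
    mean_den = mean(denominators)
    # mean1 / mean2: ratio of the two column means.
    ratio = mean(numerators) / mean_den
    # var1 / var2: variance of (num[i] - ratio * den[i]) / mean(den).
    residuals = [(num - ratio * den) / mean_den
                 for num, den in zip(numerators, denominators)]
    # Assumption: sample variance with n - 1 in the denominator.
    return ratio, variance(residuals)


mean1, var1 = ratio_mean_and_variance([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
mean2, var2 = ratio_mean_and_variance([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
delta = mean1 - mean2
```

In the first group every numerator is exactly twice its denominator, so the residuals and the variance are zero. In the second group the denominators are all 1, so the result reduces to the ordinary mean and variance of the numerators.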
-
expan.core.statistics.
pooled_std
(std1, n1, std2, n2)¶ Returns the pooled estimate of standard deviation.
For further information visit: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals5.html
Parameters: - std1 (float) – standard deviation of first sample
- n1 (int) – size of first sample
- std2 (float) – standard deviation of second sample
- n2 (int) – size of second sample
Returns: pooled standard deviation
Return type: float
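The textbook pooled standard deviation weights each sample variance by its degrees of freedom. The sketch below implements that formula under a hypothetical name; it is an assumption that ExpAn's function computes exactly this expression.

```python
import math


def pooled_std_sketch(std1, n1, std2, n2):
    """sqrt(((n1 - 1) * std1^2 + (n2 - 1) * std2^2) / (n1 + n2 - 2))."""
    degrees_of_freedom = n1 + n2 - 2
    if degrees_of_freedom <= 0:
        raise ValueError("need at least three observations in total")
    weighted = (n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2
    return math.sqrt(weighted / degrees_of_freedom)


same_spread = pooled_std_sketch(2.0, 10, 2.0, 30)
mixed_spread = pooled_std_sketch(1.0, 11, 3.0, 11)
```

When both samples share the same standard deviation the pooled value equals it; with equal sample sizes it is the root mean square of the two standard deviations.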
-
expan.core.statistics.
sample_size
(x)¶ Calculates valid sample size given the data.
Parameters: x (pd.Series or list (array-like)) – sample to calculate the sample size
Returns: sample size of the sample excluding NaNs
Return type: int
expan.core.util module¶
-
expan.core.util.
drop_nan
(array)¶ Drop NaN values from the given numpy array.
Parameters: array (np.ndarray) – input array
Returns: a new array without NaN values
Return type: np.ndarray
-
expan.core.util.
find_value_by_key_with_condition
(items, condition_key, condition_value, lookup_key)¶ Find the value of lookup key where the dictionary contains condition key = condition value.
Parameters: - items (list) – list of dictionaries
- condition_key (str) – condition key
- condition_value – a value for the condition key
- lookup_key (str) – lookup key or key you want to find the value for
Returns: lookup value or found value for the lookup key
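The lookup can be sketched as a linear scan over the list of dictionaries. The function name is hypothetical, and returning the first match, or None when nothing matches, is an assumption.

```python
def find_value(items, condition_key, condition_value, lookup_key):
    """Return items[i][lookup_key] for the first dict matching the condition."""
    for item in items:
        if item.get(condition_key) == condition_value:
            return item.get(lookup_key)
    # Assumption: None signals that no dictionary matched.
    return None


variants = [
    {"name": "control", "size": 1000},
    {"name": "treatment", "size": 980},
]
treatment_size = find_value(variants, "name", "treatment", "size")
```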
-
expan.core.util.
generate_random_data
()¶ Generate random data for two variants. It can be used in unit tests or demo.
-
expan.core.util.
is_nan
(obj)¶ Checks whether the input is NaN. It uses the trick that NaN is not equal to NaN.
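The self-inequality trick works because IEEE 754 defines NaN as unequal to everything, including itself. A minimal sketch with a hypothetical name:

```python
def is_nan_sketch(obj):
    """True only for values that are not equal to themselves, i.e. NaN."""
    return obj != obj


checks = [is_nan_sketch(float("nan")), is_nan_sketch(1.0), is_nan_sketch("nan")]
```

Unlike math.isnan, this form does not raise a TypeError for non-numeric inputs such as strings or None.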
expan.core.version module¶
-
expan.core.version.
git_commit_count
()¶ Returns the output of git rev-list --count HEAD as an int. Note: http://programmers.stackexchange.com/a/151558
-
expan.core.version.
git_latest_commit
()¶ Returns output of git rev-parse HEAD. Note: http://programmers.stackexchange.com/a/151558.
-
expan.core.version.
version
(format_str='{short}')¶ Returns current version number in specified format.
Parameters: format_str (str) – format string for the version
Returns: version number in the specified format
Return type: str
-
expan.core.version.
version_numbers
()¶ Returns ExpAn version.