expan.core package¶

Submodules¶

expan.core.binning module¶

NB: This module is deprecated.

class expan.core.binning.Bin(bin_type, *repr_args)¶

Bases: object

Constructor for a bin object.

Parameters:

id – identifier (e.g. bin number) of the bin
bin_type – “numerical” or “categorical”
repr_args – arguments that represent this bin. For a numerical bin these are lower, upper, lower_closed and upper_closed; for a categorical bin, a list of the categories in this bin.

class expan.core.binning.CategoricalRepresentation(categories)¶

Bases: object

Constructor for representation of a categorical bin.

Parameters:	categories – list of categorical values that belong to this bin

apply_to_data(data, feature)¶

Apply the bin to data.

Parameters:

data – pandas data frame
feature – feature name on which this bin is defined

Returns:	subset of the input data frame that belongs to this bin

class expan.core.binning.NumericalRepresentation(lower, upper, lower_closed, upper_closed)¶

Bases: object

Constructor for representation of a numerical bin.

Parameters:

upper – upper bound of the bin (exclusive)
lower – lower bound of the bin (inclusive)
lower_closed – boolean indicator whether lower bound is closed
upper_closed – boolean indicator whether upper bound is closed

apply_to_data(data, feature)¶

Apply the bin to data.

Parameters:

data – pandas data frame
feature – feature name on which this bin is defined

Returns:	subset of the input data frame that belongs to this bin

expan.core.binning.create_bins(data, n_bins)¶

Create bins from the data values.

Parameters:

data – a list or a 1-dim array of data to determine the bins
n_bins – number of bins to create

Returns:	a list of Bin objects

expan.core.binning.toBinObject(bins)¶

expan.core.correction module¶

expan.core.correction.benjamini_hochberg(false_discovery_rate, original_p_values)¶

Benjamini-Hochberg procedure.

Parameters:

false_discovery_rate (float) – proportion of significant results that are actually false positives
original_p_values (list[float]) – p values from all the tests

Returns:	new critical value (i.e. the corrected alpha)
Return type:	float
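
As a rough illustration of the procedure, the following sketch computes a Benjamini-Hochberg critical value in plain Python. The name bh_critical_value and the fallback to rank 1 when no p value passes are assumptions made for this example; they are not taken from the expan.core.correction source.

```python
def bh_critical_value(false_discovery_rate, p_values):
    """Return the largest threshold (rank / m) * q that a sorted p value satisfies."""
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p value is required")
    best_rank = 1  # assumption: fall back to the strictest threshold q / m
    for rank, p in enumerate(sorted(p_values), start=1):
        # The k-th smallest p value is compared against (k / m) * q.
        if p <= rank * false_discovery_rate / m:
            best_rank = rank
    return best_rank * false_discovery_rate / m


corrected_alpha = bh_critical_value(0.05, [0.001, 0.02, 0.03, 0.2])
```

In this example the three smallest p values pass their rank-specific thresholds, so the corrected alpha is 3 / 4 of the requested false discovery rate.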

expan.core.correction.bonferroni(false_positive_rate, original_p_values)¶

Bonferroni correction.

Parameters:

false_positive_rate (float) – alpha value before correction
original_p_values (list[float]) – p values from all the tests

Returns:	new critical value (i.e. the corrected alpha)
Return type:	float
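
For comparison, a minimal sketch of the textbook Bonferroni correction, which divides alpha by the number of tests. The function name is hypothetical, and the sketch assumes that only the number of p values matters, not their values.

```python
def bonferroni_critical_value(false_positive_rate, p_values):
    """Divide the uncorrected alpha by the number of tests performed."""
    number_of_tests = len(p_values)
    if number_of_tests == 0:
        raise ValueError("at least one p value is required")
    return false_positive_rate / number_of_tests


corrected_alpha = bonferroni_critical_value(0.05, [0.01, 0.04, 0.2, 0.5, 0.7])
```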

expan.core.early_stopping module¶

expan.core.early_stopping.HDI_from_MCMC(posterior_samples, credible_mass=0.95)¶

Computes the highest density interval (HDI) from a sample of representative values, estimated as the shortest credible interval. Takes the arguments posterior_samples (samples from the posterior) and credible_mass (normally 0.95). See http://stackoverflow.com/questions/22284502/highest-posterior-density-region-and-central-credible-region

Parameters:

posterior_samples (array-like) – sample of data points from the posterior distribution of some parameter
credible_mass (float) – the mass of the credible interval; 0.95 corresponds to a 95% credible interval

Returns:	corresponding lower and upper bound for the credible interval
Return type:	tuple[float]
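
The idea of taking the shortest interval that covers the requested mass can be sketched as follows. This is an illustrative stand-alone version with a hypothetical function name; the exact index arithmetic used by HDI_from_MCMC may differ.

```python
import math


def shortest_interval(samples, credible_mass=0.95):
    """Slide a window over the sorted samples and keep the narrowest one."""
    ordered = sorted(samples)
    n = len(ordered)
    window = int(math.ceil(credible_mass * n))  # number of points to cover
    if window < 1 or window > n:
        raise ValueError("credible_mass must lie in (0, 1] and samples must be non-empty")
    best_low, best_high = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


interval = shortest_interval([1, 2, 2.1, 10], credible_mass=0.5)
```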

expan.core.early_stopping.bayes_factor(x, y, distribution='normal', num_iters=25000, inference='sampling')¶

Bayes factor computation.

Parameters:

x (pd.Series or list (array-like)) – sample of a treatment group
y (pd.Series or list (array-like)) – sample of a control group
distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
num_iters (int) – number of iterations of Bayes sampling
inference (str) – sampling or variational inference method for approximating the posterior

Returns:	results of type EarlyStoppingTestStatistics (without p-value and stat. power)
Return type:	EarlyStoppingTestStatistics

expan.core.early_stopping.bayes_precision(x, y, distribution='normal', posterior_width=0.08, num_iters=25000, inference='sampling')¶

Bayes precision computation.

Parameters:

x (pd.Series or list (array-like)) – sample of a treatment group
y (pd.Series or list (array-like)) – sample of a control group
distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
posterior_width (float) – the stopping criterion, threshold of the posterior width
num_iters (int) – number of iterations of Bayes sampling
inference (str) – sampling or variational inference method for approximating the posterior

Returns:	results of type EarlyStoppingTestStatistics (without p-value and stat. power)
Return type:	EarlyStoppingTestStatistics

expan.core.early_stopping.get_or_compile_stan_model(model_file, distribution)¶

Creates a Stan model. Compiles the Stan model and saves it as a .pkl file in the folder selected by the tempfile module if the file does not exist yet; loads the precompiled model if a model file is already present in the temporary directory.

Note: compiled_model_file is a hardcoded file path, which may cause issues in the future. There are two alternative implementations for handling Stan models:

Using global variables
Pre-compiling Stan models and adding them as a part of the ExpAn project

Using temporary files with the tempfile module is currently not possible, since it generates a unique file name that is difficult to track. However, compiled models are saved in the temporary directory chosen by the tempfile module, which varies with the current platform and settings. The temporary directory is cleaned up on boot.

Parameters:

model_file (str) – model file location
distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists

Returns:	compiled Stan model for the selected distribution or normal distribution as a default option
Return type:	Class representing a compiled Stan model

expan.core.early_stopping.get_trace_normalized_effect_size(distribution, traces)¶

Obtains Stan model statistics for the ‘normal’ or ‘poisson’ distribution.

Parameters:

distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
traces (dict) – sampling statistics

Returns:	sample of data points from the posterior distribution of some parameter
Return type:	array-like

expan.core.early_stopping.group_sequential(x, y, spending_function='obrien_fleming', estimated_sample_size=None, alpha=0.05, cap=8)¶

Group sequential method to determine whether to stop early.

Parameters:

x (pd.Series or array-like) – sample of a treatment group
y (pd.Series or array-like) – sample of a control group
spending_function (str) – name of the alpha spending function, currently supports only ‘obrien_fleming’
estimated_sample_size (int) – sample size to be achieved towards the end of experiment
alpha (float) – type-I error rate
cap (int) – upper bound of the adapted z-score

Returns:	results of type EarlyStoppingTestStatistics
Return type:	EarlyStoppingTestStatistics

expan.core.early_stopping.make_bayes_factor(distribution='normal', num_iters=25000, inference='sampling')¶

Closure method for the bayes_factor function.

expan.core.early_stopping.make_bayes_precision(distribution='normal', posterior_width=0.08, num_iters=25000, inference='sampling')¶

Closure method for the bayes_precision function.

expan.core.early_stopping.make_group_sequential(spending_function='obrien_fleming', estimated_sample_size=None, alpha=0.05, cap=8)¶

A closure to the group_sequential function.
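
The closure pattern behind the make_* helpers can be sketched as follows: the configuration is fixed once, and the returned function only needs the two samples. Both analysis and make_analysis are hypothetical stand-ins written for this example, not ExpAn functions.

```python
def analysis(x, y, alpha=0.05, cap=8):
    """Hypothetical stand-in for a worker such as group_sequential."""
    return {"n_x": len(x), "n_y": len(y), "alpha": alpha, "cap": cap}


def make_analysis(alpha=0.05, cap=8):
    """Return a function of (x, y) with the keyword arguments already bound."""
    def worker(x, y):
        # alpha and cap are captured from the enclosing scope.
        return analysis(x, y, alpha=alpha, cap=cap)
    return worker


worker = make_analysis(alpha=0.01)
result = worker([1, 2, 3], [4, 5])
```

The benefit of this design is that code which iterates over many tests can call every worker with the same two-argument signature, regardless of the analysis method.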

expan.core.early_stopping.obrien_fleming(information_fraction, alpha=0.05)¶

Calculate an approximation of the O’Brien-Fleming alpha spending function.

Parameters:

information_fraction (float) – share of the information amount at the point of evaluation, e.g. the share of the maximum sample size
alpha (float) – type-I error rate

Returns:	redistributed alpha value at the time point with the given information fraction
Return type:	float
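
A commonly used approximation of the O’Brien-Fleming spending function is \(\alpha(t) = 2 - 2\,\Phi\left(z_{1-\alpha/2} / \sqrt{t}\right)\), where \(t\) is the information fraction. The sketch below implements that formula with the standard library; it assumes this is the approximation meant here and uses a hypothetical function name.

```python
from statistics import NormalDist


def obrien_fleming_alpha(information_fraction, alpha=0.05):
    """Alpha available at the given information fraction (0 < t <= 1)."""
    if not 0 < information_fraction <= 1:
        raise ValueError("information_fraction must lie in (0, 1]")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - std_normal.cdf(z / information_fraction ** 0.5))


alpha_at_half = obrien_fleming_alpha(0.5)
alpha_at_end = obrien_fleming_alpha(1.0)
```

Very little alpha is spent early in the experiment, and the full alpha becomes available once the information fraction reaches 1.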

expan.core.experiment module¶

class expan.core.experiment.Experiment(metadata)¶

Bases: object

Class which adds the analysis functions to experimental data.

Constructor of the experiment object.

Parameters:	metadata (dict) – additional information about the experiment. (e.g. primary KPI, source, etc)

analyze_statistical_test(test, test_method='fixed_horizon', include_data=False, **worker_args)¶

Runs delta analysis on one statistical test and returns statistical results.

Parameters:

test (StatisticalTest) – a statistical test to run
test_method (str) – analysis method to perform. It can be ‘fixed_horizon’, ‘group_sequential’, ‘bayes_factor’ or ‘bayes_precision’.
include_data (bool) – True if test results should include data, False if no data should be included
worker_args – additional arguments for the analysis method

Returns:	statistical result of the test
Return type:	StatisticalTestResult

analyze_statistical_test_suite(test_suite, test_method='fixed_horizon', **worker_args)¶

Runs delta analysis on a set of tests and returns statistical results for each statistical test in the suite.

Parameters:

test_suite (StatisticalTestSuite) – a suite of statistical tests to run
test_method (str) – analysis method to perform. It can be ‘fixed_horizon’, ‘group_sequential’, ‘bayes_factor’ or ‘bayes_precision’.
worker_args – additional arguments for the analysis method (see signatures of corresponding methods)

Returns:	statistical result of the test suite
Return type:	MultipleTestSuiteResult

chi_square_test_result_and_statistics(variant_column, weights, min_counts=5, alpha=0.05)¶

Tests the consistency of variant split with the hypothesized distribution.

Parameters:

variant_column – variant column from the input data frame
weights – dict with variant names as keys and weights as values ({<variant_name>: <weight>, …})
min_counts – minimum number of observed and expected frequencies (should be at least 5), see http://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.chisquare.html
alpha – significance level, 0.05 by default

Returns:	True (if the split is consistent with the given split) or False (if the split is not consistent with the given split)
Return type:	Boolean, float, float

outlier_filter(data, kpis, percentile=99.0, threshold_type='upper')¶

Method that filters out entities whose KPIs exceed the value at a given percentile. If any of the KPIs exceeds its threshold, the entity is filtered out. If kpis contains a derived KPI, this method first creates these columns and then performs outlier filtering on all given KPIs.

Parameters:

kpis (list[KPI]) – list of KPI instances
percentile (float) – percentile considered as filtering threshold
threshold_type (str) – type of threshold used (‘lower’ or ‘upper’)

Returns:	data with outliers filtered out
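
The filtering rule can be illustrated without pandas. The sketch below works on a list of dicts, uses linearly interpolated percentiles, and treats the threshold itself as an inlier; all three choices, as well as the function names, are assumptions made for this example.

```python
def percentile(values, q):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def filter_outliers(rows, kpi_names, q=99.0, threshold_type="upper"):
    """Drop every row in which at least one KPI lies beyond its threshold."""
    thresholds = {k: percentile([row[k] for row in rows], q) for k in kpi_names}

    def keep(row):
        for k in kpi_names:
            if threshold_type == "upper" and row[k] > thresholds[k]:
                return False
            if threshold_type == "lower" and row[k] < thresholds[k]:
                return False
        return True

    return [row for row in rows if keep(row)]


rows = [{"entity": i, "revenue": float(i)} for i in range(1, 11)]
rows.append({"entity": 11, "revenue": 1000.0})
kept = filter_outliers(rows, ["revenue"], q=90.0)
```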

expan.core.results module¶

class expan.core.results.BaseTestStatistics(control_statistics, treatment_statistics)¶

Bases: expan.core.util.JsonSerializable

Holds only statistics for the control and treatment group.

Parameters:

control_statistics (SampleStatistics) – statistics within the control group
treatment_statistics (SampleStatistics) – statistics within the treatment group

class expan.core.results.CombinedTestStatistics(original_test_statistics, corrected_test_statistics)¶

Bases: expan.core.util.JsonSerializable

Holds original and corrected statistics. This class should be used to hold statistics for multiple testing. original_test_statistics and corrected_test_statistics should have the same type. In case there is no correction specified, corrected_test_statistics == original_test_statistics.

Parameters:

original_test_statistics (SimpleTestStatistics or EarlyStoppingTestStatistics) – test result before correction
corrected_test_statistics (SimpleTestStatistics or EarlyStoppingTestStatistics) – test result after correction or same as original_test_statistics if no correction

class expan.core.results.EarlyStoppingTestStatistics(control_statistics, treatment_statistics, delta, ci, p, statistical_power, stop)¶

Bases: expan.core.results.SimpleTestStatistics

In addition to SimpleTestStatistics, holds a boolean flag for early stopping.

Parameters:

control_statistics (SampleStatistics) – sample size, mean, variance for the control group
treatment_statistics (SampleStatistics) – sample size, mean, variance for the treatment group
ci (dict) – a dict where keys are percentiles and values are the corresponding value for the statistic.
stop (bool) – early-stopping flag

class expan.core.results.MultipleTestSuiteResult(results, correction_method=<CorrectionMethod.NONE: 1>)¶

Bases: expan.core.util.JsonSerializable

This class holds the results of a MultipleTestSuite.

Parameters:

results (list[StatisticalTestResult]) – test results for all statistical testing units
correction_method (CorrectionMethod) – method used for multiple testing correction

merge_with(multiple_test_suite_result)¶

Merges two multiple test suite results.

Parameters:	multiple_test_suite_result (MultipleTestSuiteResult) – multiple test suite result
Returns:	merged multiple test suite results
Return type:	MultipleTestSuiteResult

class expan.core.results.SampleStatistics(sample_size, mean, variance)¶

Bases: expan.core.util.JsonSerializable

This class holds sample size, mean and variance.

Parameters:

sample_size (int) – sample size of the control or treatment group
mean (float) – mean of the control or treatment group
variance (float) – variance of the control or treatment group

class expan.core.results.SimpleTestStatistics(control_statistics, treatment_statistics, delta, ci, p, statistical_power)¶

Bases: expan.core.results.BaseTestStatistics

In addition to BaseTestStatistics, holds delta, confidence interval, statistical power, and p value.

Parameters:

control_statistics (SampleStatistics) – sample size, mean, variance for the control group
treatment_statistics (SampleStatistics) – sample size, mean, variance for the treatment group
delta (float) – delta (relative or absolute difference between control and treatment, uplift)
p (float) – p value
statistical_power (float) – statistical power value
ci (dict) – a dict where keys are percentiles and values are the corresponding value for the statistic.

class expan.core.results.StatisticalTestResult(test, result)¶

Bases: expan.core.util.JsonSerializable

This class holds the results of a single statistical test.

Parameters:

test (StatisticalTest) – information about the statistical test
result (CombinedTestStatistics) – result of this statistical test

expan.core.statistical_test module¶

class expan.core.statistical_test.CorrectionMethod¶

Bases: enum.Enum

Correction methods.

BH = 3¶

BONFERRONI = 2¶

NONE = 1¶

class expan.core.statistical_test.DerivedKPI(name, numerator, denominator)¶

Bases: expan.core.statistical_test.KPI

This class represents a derived KPI which is a ratio of two columns. Names of the two columns are passed as numerator and denominator.

Parameters:

name (str) – name of the kpi
numerator (str) – the numerator for the derived KPI
denominator (str) – the denominator for the derived KPI

make_derived_kpi(data)¶

Create the derived kpi column if it is not yet created.

class expan.core.statistical_test.FeatureFilter(column_name, column_value)¶

Bases: expan.core.util.JsonSerializable

This class represents a filter, restricting a DataFrame to rows with column_value in column_name.

It can be used to specify subgroup conditions.

Parameters:

column_name (str) – name of the column to perform filter on
column_value (str) – value of the column to perform filter on

apply_to_data(data)¶

class expan.core.statistical_test.KPI(name)¶

Bases: expan.core.util.JsonSerializable

This class represents a basic kpi.

Parameters:	name (str) – name of the kpi

class expan.core.statistical_test.StatisticalTest(data, kpi, features, variants)¶

Bases: expan.core.util.JsonSerializable

This class describes what has to be tested against what and represents a unit of statistical testing.

Parameters:

data (DataFrame) – data for statistical test
kpi (KPI or its subclass) – the KPI to perform the test on
features (list[FeatureFilter]) – list of features used for subgroups
variants (Variants) – variant column name and their values

class expan.core.statistical_test.StatisticalTestSuite(tests, correction_method=<CorrectionMethod.NONE: 1>)¶

Bases: expan.core.util.JsonSerializable

This class consists of a number of tests plus choice of the correction method.

Parameters:

tests (list[StatisticalTest]) – list of statistical tests in the suite
correction_method (CorrectionMethod) – method used for multiple testing correction

size¶

class expan.core.statistical_test.Variants(variant_column_name, control_name, treatment_name)¶

Bases: expan.core.util.JsonSerializable

This class represents information of variants.

Parameters:

variant_column_name (str) – name of the column that represents variant
control_name (str) – value of the variant that represents control group
treatment_name (str) – value of the variant that represents treatment group

get_variant(data, variant_name)¶

expan.core.statistics module¶

expan.core.statistics.bootstrap(x, y, func=<function _delta_mean>, nruns=10000, percentiles=[2.5, 97.5], min_observations=20, return_bootstraps=False, relative=False)¶

Bootstraps the Confidence Intervals for a particular function comparing two samples. NaNs are ignored (discarded before calculation).

Parameters:

x (pd.Series or list (array-like)) – sample of the treatment group
y (pd.Series or list (array-like)) – sample of the control group
func (function) – function of which the distribution is to be computed. The default comparison metric is the difference of means. For bootstrapping correlation: func=lambda x,y: scipy.stats.pearsonr(x,y)[0].
nruns (int) – number of bootstrap runs to perform
percentiles (list) – The values corresponding to the given percentiles are returned. The default percentiles (2.5% and 97.5%) correspond to an alpha of 0.05.
min_observations (int) – minimum number of observations necessary
return_bootstraps (bool) – If this variable is set, the bootstrap sets are returned, otherwise the first return value is empty.
relative (bool) – if relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.

Returns:	(c_val, bootstraps), where c_val is a dict which contains percentile levels (index) and values, and bootstraps is a np.array containing the bootstrapping results per run
Return type:	tuple
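
The percentile bootstrap for a difference of means can be sketched with the standard library alone. The function below is a simplified stand-in with a hypothetical name: it always uses the difference of means, ignores min_observations and relative, and takes a seed argument that the documented function does not have.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _percentile(ordered, q):
    """Linearly interpolated percentile of an already sorted list."""
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def bootstrap_delta(x, y, nruns=1000, percentiles=(2.5, 97.5), seed=0):
    # Discard NaNs first (NaN is the only value not equal to itself).
    x = [v for v in x if v == v]
    y = [v for v in y if v == v]
    rng = random.Random(seed)
    deltas = []
    for _ in range(nruns):
        # Resample each group with replacement, keeping its original size.
        resampled_x = [rng.choice(x) for _ in x]
        resampled_y = [rng.choice(y) for _ in y]
        deltas.append(_mean(resampled_x) - _mean(resampled_y))
    deltas.sort()
    c_val = {q: _percentile(deltas, q) for q in percentiles}
    return c_val, deltas


treatment = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, float("nan")]
control = [10.0, 9.0, 11.0, 12.0, 10.0, 8.0]
c_val, bootstraps = bootstrap_delta(treatment, control, nruns=500)
```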

expan.core.statistics.chi_square(observed_freqs, expected_freqs, ddof=0)¶

Compute chi-square statistics and p-values given observed and expected frequencies and degrees of freedom.

Parameters:

observed_freqs (pd.Series or array-like) – observed frequencies
expected_freqs (pd.Series or array-like) – expected frequencies
ddof (int) – delta degrees of freedom, 0 by default

Returns:	chi-square statistics and p-value
Return type:	float, float
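
To make the computation concrete, here is a small stand-alone sketch of Pearson's chi-square statistic. The p-value helper covers only the case of one degree of freedom (two categories, ddof=0), where the upper tail can be written with the complementary error function; both function names are hypothetical.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_dof(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# A 50/50 split was planned, but 520 and 480 entities were observed.
statistic = chi_square_statistic([520, 480], [500, 500])
p_value = p_value_one_dof(statistic)
```

With a p value well above 0.05, this split would not be flagged as inconsistent with the planned 50/50 allocation.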

expan.core.statistics.compute_p_value(mean1, std1, n1, mean2, std2, n2)¶

Compute two-tailed p value for statistical Student’s T-test given statistics of control and treatment.

Parameters:

mean1 (float) – mean value of the treatment distribution
std1 (float) – standard deviation of the treatment distribution
n1 (int) – number of samples of the treatment distribution
mean2 (float) – mean value of the control distribution
std2 (float) – standard deviation of the control distribution
n2 (int) – number of samples of the control distribution

Returns:	two-tailed p-value
Return type:	float

expan.core.statistics.compute_p_value_from_samples(x, y)¶

Calculates two-tailed p value for statistical Student’s T-test based on pooled standard deviation.

Parameters:

x (pd.Series or array-like) – samples of a treatment group
y (pd.Series or array-like) – samples of a control group

Returns:	two-tailed p-value
Return type:	float

expan.core.statistics.compute_statistical_power(mean1, std1, n1, mean2, std2, n2, z_1_minus_alpha)¶

Compute statistical power given statistics of control and treatment.

Parameters:

mean1 (float) – mean value of the treatment distribution
std1 (float) – standard deviation of the treatment distribution
n1 (int) – number of samples of the treatment distribution
mean2 (float) – mean value of the control distribution
std2 (float) – standard deviation of the control distribution
n2 (int) – number of samples of the control distribution
z_1_minus_alpha (float) – critical value for significance level alpha. That is, z-value for 1-alpha.

Returns:	statistical power, i.e. the probability of a test to detect an effect if the effect actually exists, or -1 if std is less than or equal to 0
Return type:	float
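
One textbook way to obtain power from these inputs is the normal approximation \(1 - \Phi\left(z_{1-\alpha} - |mean1 - mean2| / (s_p \sqrt{1/n1 + 1/n2})\right)\) with the pooled standard deviation \(s_p\). The sketch below implements that approximation under a hypothetical name; it is an assumption that compute_statistical_power follows the same formula.

```python
import math
from statistics import NormalDist


def approximate_power(mean1, std1, n1, mean2, std2, n2, z_1_minus_alpha):
    pooled = math.sqrt(((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2) / (n1 + n2 - 2))
    if pooled <= 0:
        return -1  # mirrors the documented sentinel for a non-positive std
    standard_error = pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    # Effect size expressed in standard errors of the difference of means.
    shift = abs(mean1 - mean2) / standard_error
    return 1 - NormalDist().cdf(z_1_minus_alpha - shift)


z = NormalDist().inv_cdf(0.95)  # critical value for alpha = 0.05
power = approximate_power(10.5, 2.0, 500, 10.0, 2.0, 500, z)
```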

expan.core.statistics.compute_statistical_power_from_samples(x, y, alpha=0.05)¶

Compute statistical power given data samples of control and treatment.

Parameters:

x (pd.Series or array-like) – samples of a treatment group
y (pd.Series or array-like) – samples of a control group
alpha (float) – Type I error (false positive rate)

Returns:	statistical power, i.e. the probability of a test to detect an effect if the effect actually exists
Return type:	float

expan.core.statistics.delta(x, y, x_denominators=1, y_denominators=1, assume_normal=True, alpha=0.05, min_observations=20, nruns=10000, relative=False)¶

Calculates the difference of means between the samples in a statistical sense. Computation is done in form of treatment minus control, i.e. x-y. Note that NaNs are treated as if they do not exist in the data.

Parameters:

x (pd.Series or array-like) – sample of the treatment group
y (pd.Series or array-like) – sample of the control group
x_denominators (pd.Series or array-like) – sample of the treatment group
y_denominators (pd.Series or array-like) – sample of the control group
assume_normal (boolean) – specifies whether normal distribution assumptions can be made
alpha (float) – significance level (alpha)
min_observations (int) – minimum number of observations needed
nruns (int) – only used if assume_normal is false
relative (boolean) – if relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.

Returns:	results of type SimpleTestStatistics
Return type:	SimpleTestStatistics

expan.core.statistics.estimate_sample_size(x, mde, r, alpha=0.05, beta=0.2)¶

Estimates the sample size based on sample mean and variance, given the MDE (minimum detectable effect), the number of variants and the variant split ratio.

Parameters:

x (pd.Series or pd.DataFrame) – sample to base estimation on
mde (float) – minimum detectable effect
r (float) – variant split ratio
alpha (float) – significance level
beta (float) – type II error

Returns:	estimated sample size
Return type:	float or pd.Series

expan.core.statistics.make_delta(assume_normal=True, alpha=0.05, min_observations=20, nruns=10000, relative=False)¶

A closure to the delta function.

expan.core.statistics.normal_difference(mean1, std1, n1, mean2, std2, n2, percentiles=[2.5, 97.5], relative=False)¶

Calculates the difference distribution of two normal distributions. Computation is done in form of treatment minus control. It is assumed that the standard deviations of both distributions do not differ too much.

For further information visit: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals5.html

Parameters:

mean1 (float) – mean value of the treatment distribution
std1 (float) – standard deviation of the treatment distribution
n1 (int) – number of samples of the treatment distribution
mean2 (float) – mean value of the control distribution
std2 (float) – standard deviation of the control distribution
n2 (int) – number of samples of the control distribution
percentiles (list) – list of percentile values to compute
relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.

Returns:	percentiles and corresponding values
Return type:	dict

expan.core.statistics.normal_sample_difference(x, y, percentiles=[2.5, 97.5], relative=False)¶

Calculates the difference distribution of two normal distributions given by their samples.

Computation is done in form of treatment minus control. It is assumed that the standard deviations of both distributions do not differ too much.

Parameters:

x (pd.Series or list (array-like)) – sample of a treatment group
y – sample of a control group
percentiles (list) – list of percentile values to compute
relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.

Returns:	percentiles and corresponding values
Return type:	dict

expan.core.statistics.normal_sample_weighted_difference(x_numerators, y_numerators, x_denominators, y_denominators, percentiles=[2.5, 97.5], relative=False)¶

Calculates the difference distribution of two distributions given by their samples.

Computation is done in form of treatment(x) minus control(y). It is assumed that the standard deviations of both distributions do not differ too much.

The estimate of the mean difference is \(\frac{mean(x_{numerators})}{mean(x_{denominators})}-\frac{mean(y_{numerators})}{mean(y_{denominators})}\). For non-derived KPIs, the denominators will be exactly 1, and hence this will simplify to \(mean(x_{numerators})-mean(y_{numerators})\). For details on the variance calculation, see the Glossary.

Parameters:

x_numerators (pd.Series or list (array-like)) – sample of a treatment group
y_numerators (pd.Series or list (array-like)) – sample of a control group
x_denominators (pd.Series or list (array-like), or simply 1 as an int/float if a non-derived KPI) – sample of a treatment group
y_denominators (pd.Series or list (array-like), or simply 1 as an int/float if a non-derived KPI) – sample of a control group
percentiles (list) – list of percentile values to compute
relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.

Returns:

percentiles and corresponding values

Return type:

dict with multiple entries:

c_i: confidence_interval
mean1: \(\frac{mean(x_{numerators})}{mean(x_{denominators})}\)
mean2: \(\frac{mean(y_{numerators})}{mean(y_{denominators})}\)
n1: sample size of x, after discarding NaNs
n2: sample size of y, after discarding NaNs
var1: \(var\left(\frac{x_{numerators}[i] - mean1 \cdot x_{denominators}[i]}{mean(x_{denominators})}\right)\)
var2: \(var\left(\frac{y_{numerators}[i] - mean2 \cdot y_{denominators}[i]}{mean(y_{denominators})}\right)\)
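
The mean and variance expressions listed above can be traced with a short stand-alone sketch for a single group. The function name is hypothetical, and the use of the sample variance (n - 1 in the denominator) is an assumption, since the formulas do not state which variance estimator is meant.

```python
from statistics import mean, variance


def ratio_mean_and_variance(numerators, denominators):
    """Return mean(numerators) / mean(denominators) and the variance of the linearized terms."""
    denominator_mean = mean(denominators)
    ratio = mean(numerators) / denominator_mean
    # One term per entity: (numerator[i] - ratio * denominator[i]) / mean(denominators)
    linearized = [(n - ratio * d) / denominator_mean
                  for n, d in zip(numerators, denominators)]
    return ratio, variance(linearized)


# Derived KPI: every entity has exactly the same ratio, so the variance is zero.
mean1, var1 = ratio_mean_and_variance([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])

# Non-derived KPI: denominators of 1 reduce to the plain mean and variance.
mean2, var2 = ratio_mean_and_variance([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```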

expan.core.statistics.pooled_std(std1, n1, std2, n2)¶

Returns the pooled estimate of standard deviation.

For further information visit: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals5.html

Parameters:

std1 (float) – standard deviation of first sample
n1 (int) – size of first sample
std2 (float) – standard deviation of second sample
n2 (int) – size of second sample

Returns:	pooled standard deviation
Return type:	float
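
The textbook pooled estimate weights each sample variance by its degrees of freedom: \(s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}\). A minimal sketch of that formula follows; the function name is hypothetical, and any additional checks performed by pooled_std are not reproduced.

```python
import math


def pooled_standard_deviation(std1, n1, std2, n2):
    """Combine two standard deviations, weighting by degrees of freedom."""
    weighted = (n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2
    return math.sqrt(weighted / (n1 + n2 - 2))


same = pooled_standard_deviation(2.0, 10, 2.0, 20)
mixed = pooled_standard_deviation(1.0, 50, 3.0, 50)
```

When both groups share the same standard deviation, the pooled value equals it; otherwise the result lies between the two inputs.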

expan.core.statistics.sample_size(x)¶

Calculates valid sample size given the data.

Parameters:	x (pd.Series or list (array-like)) – sample to calculate the sample size
Returns:	sample size of the sample excluding NaNs
Return type:	int

expan.core.util module¶

class expan.core.util.JsonSerializable¶

Bases: object

Interface for serializable classes.

toJson()¶

expan.core.util.drop_nan(array)¶

Drop NaN values from the given numpy array.

Parameters:	array (np.ndarray) – input array
Returns:	a new array without NaN values
Return type:	np.ndarray

expan.core.util.find_value_by_key_with_condition(items, condition_key, condition_value, lookup_key)¶

Find the value of lookup key where the dictionary contains condition key = condition value.

Parameters:

items (list) – list of dictionaries
condition_key (str) – condition key
condition_value – a value for the condition key
lookup_key (str) – lookup key or key you want to find the value for

Returns:	lookup value or found value for the lookup key
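
The lookup described above can be sketched as a simple linear search. The function name is hypothetical, and returning None when nothing matches is an assumption of this sketch; the behaviour of find_value_by_key_with_condition in that case is not documented here.

```python
def find_value(items, condition_key, condition_value, lookup_key):
    """Return lookup_key's value from the first dict whose condition_key matches."""
    for item in items:
        if item.get(condition_key) == condition_value:
            return item.get(lookup_key)
    return None  # assumption: no matching dictionary


variants = [
    {"name": "control", "size": 1000},
    {"name": "treatment", "size": 980},
]
treatment_size = find_value(variants, "name", "treatment", "size")
```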

expan.core.util.generate_random_data()¶

Generate random data for two variants. It can be used in unit tests or demo.

expan.core.util.is_nan(obj)¶

Checks whether the input is NaN. It uses the trick that NaN is not equal to NaN.
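
The self-inequality trick works because IEEE 754 defines NaN as unequal to every value, including itself. A minimal sketch with hypothetical helper names:

```python
def looks_like_nan(obj):
    """NaN is the only float value that is not equal to itself."""
    return obj != obj


def count_valid(values):
    """Number of entries that are not NaN."""
    return sum(1 for value in values if not looks_like_nan(value))


valid = count_valid([1.0, float("nan"), 3.0])
```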

expan.core.version module¶

expan.core.version.git_commit_count()¶

Returns the output of git rev-list --count HEAD as an int. Note: http://programmers.stackexchange.com/a/151558

expan.core.version.git_latest_commit()¶

Returns the output of git rev-parse HEAD. Note: http://programmers.stackexchange.com/a/151558

expan.core.version.version(format_str='{short}')¶

Returns current version number in specified format.

Parameters:	format_str (str) – format string for the version
Returns:	version number in the specified format
Return type:	str

expan.core.version.version_numbers()¶

Returns ExpAn version.

Module contents¶

ExpAn core module.

expan.core.version(format_str='{short}')¶

Returns current version number in specified format.

Parameters:	format_str (str) – format string for the version
Returns:	version number in the specified format
Return type:	str