expan.core package

Submodules

expan.core.binning module

NB: This module is deprecated.

class expan.core.binning.Bin(bin_type, *repr_args)

Bases: object

Constructor for a bin object.

Parameters:
  • bin_type (str) – “numerical” or “categorical”
  • repr_args – arguments used to represent this bin. For a numerical bin these are lower, upper, lower_closed and upper_closed; for a categorical bin this is a list of the categories belonging to the bin.

class expan.core.binning.CategoricalRepresentation(categories)

Bases: object

Constructor for the representation of a categorical bin.

Parameters:
  • categories – list of categorical values that belong to this bin

apply_to_data(data, feature)

Apply the bin to data.

Parameters:
  • data – pandas data frame
  • feature – feature name on which this bin is defined
Returns:

subset of the input data frame that belongs to this bin

class expan.core.binning.NumericalRepresentation(lower, upper, lower_closed, upper_closed)

Bases: object

Constructor for the representation of a numerical bin.

Parameters:
  • lower – lower bound of the bin
  • upper – upper bound of the bin
  • lower_closed (bool) – whether the lower bound is closed
  • upper_closed (bool) – whether the upper bound is closed

apply_to_data(data, feature)

Apply the bin to data.

Parameters:
  • data – pandas data frame
  • feature – feature name on which this bin is defined
Returns:

subset of the input data frame that belongs to this bin

expan.core.binning.create_bins(data, n_bins)

Create bins from the data values.

Parameters:
  • data – a list or 1-dimensional array of data from which to determine the bins
  • n_bins (int) – number of bins to create
Returns:

a list of Bin objects

expan.core.binning.toBinObject(bins)

expan.core.correction module

expan.core.correction.benjamini_hochberg(false_discovery_rate, original_p_values)

Benjamini-Hochberg procedure.

Parameters:
  • false_discovery_rate (float) – proportion of significant results that are actually false positives
  • original_p_values (list[float]) – p values from all the tests
Returns:

new critical value (i.e. the corrected alpha)

Return type:

float
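The step-up procedure behind this function can be sketched in a few lines of NumPy (an illustration only; `benjamini_hochberg_alpha` is a hypothetical name and ExpAn's implementation may differ in details): sort the p-values, compare each against its rank-scaled threshold, and take the largest passing p-value as the corrected alpha.

```python
import numpy as np

def benjamini_hochberg_alpha(false_discovery_rate, original_p_values):
    """Sketch of the Benjamini-Hochberg step-up procedure."""
    p_sorted = np.sort(np.asarray(original_p_values, dtype=float))
    m = len(p_sorted)
    # Rank-scaled thresholds: k/m * FDR for k = 1..m
    thresholds = np.arange(1, m + 1) / m * false_discovery_rate
    passing = p_sorted <= thresholds
    if not passing.any():
        return 0.0  # no test survives the correction
    # The largest p-value below its threshold is the new critical value
    return float(p_sorted[passing][-1])
```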

expan.core.correction.bonferroni(false_positive_rate, original_p_values)

Bonferroni correction.

Parameters:
  • false_positive_rate (float) – alpha value before correction
  • original_p_values (list[float]) – p values from all the tests
Returns:

new critical value (i.e. the corrected alpha)

Return type:

float
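Bonferroni is the simplest of the corrections: the corrected critical value is alpha divided by the number of tests (a minimal sketch; the function name is illustrative).

```python
def bonferroni_alpha(false_positive_rate, original_p_values):
    # Corrected alpha = alpha / number of tests; only the number of
    # p-values matters here, not their magnitudes.
    return false_positive_rate / len(original_p_values)
```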

expan.core.early_stopping module

expan.core.early_stopping.HDI_from_MCMC(posterior_samples, credible_mass=0.95)

Computes the highest density interval from a sample of representative values, estimated as the shortest credible interval. See http://stackoverflow.com/questions/22284502/highest-posterior-density-region-and-central-credible-region

Parameters:
  • posterior_samples (array-like) – sample of data points from posterior distribution of some parameter
  • credible_mass (float) – mass of the credible interval, e.g. 0.95 for a 95% credible interval
Returns:

corresponding lower and upper bound for the credible interval

Return type:

tuple[float]
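The "shortest credible interval" idea can be sketched directly on a sorted sample (an illustrative re-implementation, not ExpAn's exact code): slide a window covering `credible_mass` of the points and keep the narrowest one.

```python
import numpy as np

def hdi_from_samples(posterior_samples, credible_mass=0.95):
    """Shortest interval containing `credible_mass` of the samples."""
    sorted_samples = np.sort(np.asarray(posterior_samples, dtype=float))
    n = len(sorted_samples)
    window = int(np.ceil(credible_mass * n))
    # Width of every candidate interval covering `window` points
    widths = sorted_samples[window:] - sorted_samples[:n - window]
    best = int(np.argmin(widths))
    return sorted_samples[best], sorted_samples[best + window]
```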

expan.core.early_stopping.bayes_factor(x, y, distribution='normal', num_iters=25000, inference='sampling')

Bayes factor computation.

Parameters:
  • x (pd.Series or list (array-like)) – sample of a treatment group
  • y (pd.Series or list (array-like)) – sample of a control group
  • distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
  • num_iters (int) – number of iterations of bayes sampling
  • inference (str) – inference method used to approximate the posterior, either ‘sampling’ or ‘variational’
Returns:

results of type EarlyStoppingTestStatistics (without p-value and stat. power)

Return type:

EarlyStoppingTestStatistics

expan.core.early_stopping.bayes_precision(x, y, distribution='normal', posterior_width=0.08, num_iters=25000, inference='sampling')

Bayes precision computation.

Parameters:
  • x (pd.Series or list (array-like)) – sample of a treatment group
  • y (pd.Series or list (array-like)) – sample of a control group
  • distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
  • posterior_width (float) – the stopping criterion, threshold of the posterior width
  • num_iters (int) – number of iterations of bayes sampling
  • inference (str) – inference method used to approximate the posterior, either ‘sampling’ or ‘variational’
Returns:

results of type EarlyStoppingTestStatistics (without p-value and stat. power)

Return type:

EarlyStoppingTestStatistics

expan.core.early_stopping.get_or_compile_stan_model(model_file, distribution)

Creates a Stan model. Loads the precompiled model if a model file already exists in the temporary directory; otherwise compiles the Stan model and saves it as a .pkl file in the folder selected by the tempfile module.

Note: compiled_model_file is a hardcoded file path, which may cause issues in the future. There are two alternative implementations for handling Stan models:

  1. Using global variables
  2. Pre-compiling the Stan models and shipping them as part of the ExpAn project

Using temporary files via the tempfile module is currently not possible, since it generates unique file names that are difficult to track. The compiled models are, however, saved in a temporary directory selected by the tempfile module, which varies with the current platform and settings; this directory is cleaned up on reboot.

Parameters:
  • model_file (str) – model file location
  • distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
Returns:

compiled Stan model for the selected distribution or normal distribution as a default option

Return type:

Class representing a compiled Stan model

expan.core.early_stopping.get_trace_normalized_effect_size(distribution, traces)

Obtains Stan model statistics for the ‘normal’ or ‘poisson’ distribution.

Parameters:
  • distribution (str) – name of the KPI distribution model, which assumes a Stan model file with the same name exists
  • traces (dict) – sampling statistics
Returns:

sample of data points from posterior distribution of some parameter

Return type:

array-like

expan.core.early_stopping.group_sequential(x, y, spending_function='obrien_fleming', estimated_sample_size=None, alpha=0.05, cap=8)

Group sequential method to determine whether to stop early.

Parameters:
  • x (pd.Series or array-like) – sample of a treatment group
  • y (pd.Series or array-like) – sample of a control group
  • spending_function (str) – name of the alpha spending function, currently supports only ‘obrien_fleming’.
  • estimated_sample_size (int) – sample size to be achieved towards the end of experiment
  • alpha (float) – type-I error rate
  • cap (int) – upper bound of the adapted z-score
Returns:

results of type EarlyStoppingTestStatistics

Return type:

EarlyStoppingTestStatistics

expan.core.early_stopping.make_bayes_factor(distribution='normal', num_iters=25000, inference='sampling')

A closure around the bayes_factor function.

expan.core.early_stopping.make_bayes_precision(distribution='normal', posterior_width=0.08, num_iters=25000, inference='sampling')

A closure around the bayes_precision function.

expan.core.early_stopping.make_group_sequential(spending_function='obrien_fleming', estimated_sample_size=None, alpha=0.05, cap=8)

A closure around the group_sequential function.

expan.core.early_stopping.obrien_fleming(information_fraction, alpha=0.05)

Calculate an approximation of the O’Brien-Fleming alpha spending function.

Parameters:
  • information_fraction (float) – share of the information amount at the point of evaluation, e.g. the share of the maximum sample size
  • alpha (float) – type-I error rate
Returns:

redistributed alpha value at the time point with the given information fraction

Return type:

float
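A common closed-form approximation of this spending function (a sketch under the assumption that ExpAn uses the standard two-sided form) is alpha(t) = 2 - 2 Φ(z_{1-alpha/2} / sqrt(t)): it spends almost no alpha early in the experiment and reaches the full alpha at information fraction t = 1.

```python
import math
from scipy.stats import norm

def obrien_fleming_alpha(information_fraction, alpha=0.05):
    # alpha(t) = 2 * (1 - Phi(z_{1-alpha/2} / sqrt(t))); tiny for small t,
    # equal to alpha when the full information (t == 1) is reached.
    z = norm.ppf(1 - alpha / 2)
    return 2 * (1 - norm.cdf(z / math.sqrt(information_fraction)))
```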

expan.core.experiment module

class expan.core.experiment.Experiment(metadata)

Bases: object

Class which adds the analysis functions to experimental data.

Constructor of the experiment object.

Parameters:metadata (dict) – additional information about the experiment (e.g. primary KPI, source, etc.)
analyze_statistical_test(test, test_method='fixed_horizon', include_data=False, **worker_args)

Runs delta analysis on one statistical test and returns statistical results.

Parameters:
  • test (StatisticalTest) – a statistical test to run
  • test_method (str) – analysis method to perform. It can be ‘fixed_horizon’, ‘group_sequential’, ‘bayes_factor’ or ‘bayes_precision’.
  • include_data (bool) – True if the test results should include the data, False otherwise
  • worker_args – additional arguments for the analysis method
Returns:

statistical result of the test

Return type:

StatisticalTestResult

analyze_statistical_test_suite(test_suite, test_method='fixed_horizon', **worker_args)

Runs delta analysis on a set of tests and returns statistical results for each statistical test in the suite.

Parameters:
  • test_suite (StatisticalTestSuite) – a suite of statistical test to run
  • test_method (str) – analysis method to perform. It can be ‘fixed_horizon’, ‘group_sequential’, ‘bayes_factor’ or ‘bayes_precision’.
  • worker_args – additional arguments for the analysis method (see signatures of corresponding methods)
Returns:

statistical result of the test suite

Return type:

MultipleTestSuiteResult

outlier_filter(data, kpis, thresholds=None)

Filters out entities whose KPIs exceed the value at a given percentile. If any of the KPIs exceeds its threshold, the entity is filtered out. If kpis contains derived KPIs, this method first creates the corresponding columns and then performs outlier filtering on all given KPIs.

Parameters:
  • kpis (list[KPI]) – list of KPI instances
  • thresholds (dict) – dict of thresholds mapping KPI names to (type, percentile) tuples
Returns:

the data with outliers filtered out

run_goodness_of_fit_test(observed_freqs, expected_freqs, alpha=0.01, min_counts=5)

Checks the validity of observed and expected counts and runs chi-square test for goodness of fit.

Parameters:
  • observed_freqs (pd.Series) – observed frequencies
  • expected_freqs (pd.Series) – expected frequencies
  • alpha (float) – significance level
  • min_counts (int) – minimum number of observations to run chi-square test
Returns:

split_is_unbiased: True if the split is unbiased and False if it is biased; p_value: the corresponding chi-square p-value

Return type:

bool, float

expan.core.results module

class expan.core.results.BaseTestStatistics(control_statistics, treatment_statistics)

Bases: expan.core.util.JsonSerializable

Holds only statistics for the control and treatment group.

Parameters:
  • control_statistics (SampleStatistics) – statistics within the control group
  • treatment_statistics (SampleStatistics) – statistics within the treatment group
class expan.core.results.CombinedTestStatistics(original_test_statistics, corrected_test_statistics)

Bases: expan.core.util.JsonSerializable

Holds original and corrected statistics. This class should be used to hold statistics for multiple testing. original_test_statistics and corrected_test_statistics should have the same type. In case there is no correction specified, corrected_test_statistics == original_test_statistics.

Parameters:
class expan.core.results.EarlyStoppingTestStatistics(control_statistics, treatment_statistics, delta, ci, p, statistical_power, stop)

Bases: expan.core.results.SimpleTestStatistics

Additionally to SimpleTestStatistics, holds boolean flag for early stopping.

Parameters:
  • control_statistics (SampleStatistics) – sample size, mean, variance for the control group
  • treatment_statistics (SampleStatistics) – sample size, mean, variance for the treatment group
  • ci (dict) – a dict where keys are percentiles and values are the corresponding value for the statistic.
  • stop (bool) – early-stopping flag
class expan.core.results.MultipleTestSuiteResult(results, correction_method=<CorrectionMethod.NONE: 1>)

Bases: expan.core.util.JsonSerializable

This class holds the results of a MultipleTestSuite.

Parameters:
merge_with(multiple_test_suite_result)

Merges two multiple test suite results.

Parameters:multiple_test_suite_result (MultipleTestSuiteResult) – multiple test suite result to merge with
Returns:merged multiple test suite result
Return type:MultipleTestSuiteResult

class expan.core.results.SampleStatistics(sample_size, mean, variance)

Bases: expan.core.util.JsonSerializable

This class holds sample size, mean and variance.

Parameters:
  • sample_size (int) – samples size of the control or treatment group
  • mean (float) – mean of the control or treatment group
  • variance (float) – variance of the control or treatment group
class expan.core.results.SimpleTestStatistics(control_statistics, treatment_statistics, delta, ci, p, statistical_power)

Bases: expan.core.results.BaseTestStatistics

Additionally to BaseTestStatistics, holds delta, confidence interval, statistical power, and p value.

Parameters:
  • control_statistics (SampleStatistics) – sample size, mean, variance for the control group
  • treatment_statistics (SampleStatistics) – sample size, mean, variance for the treatment group
  • delta (float) – delta (relative or absolute difference between control and treatment, uplift)
  • p (float) – p value
  • statistical_power (float) – statistical power value
  • ci (dict) – a dict where keys are percentiles and values are the corresponding value for the statistic.
class expan.core.results.StatisticalTestResult(test, result)

Bases: expan.core.util.JsonSerializable

This class holds the results of a single statistical test.

Parameters:

expan.core.statistical_test module

class expan.core.statistical_test.CorrectionMethod

Bases: enum.Enum

Correction methods.

BH = 3
BONFERRONI = 2
NONE = 1
class expan.core.statistical_test.DerivedKPI(name, numerator, denominator)

Bases: expan.core.statistical_test.KPI

This class represents a derived KPI which is a ratio of two columns. The names of the two columns are passed as numerator and denominator.

Parameters:
  • name (str) – name of the kpi
  • numerator (str) – the numerator for the derived KPI
  • denominator (str) – the denominator for the derived KPI
make_derived_kpi(data)

Creates the derived KPI column if it does not exist yet.

class expan.core.statistical_test.FeatureFilter(column_name, column_value)

Bases: expan.core.util.JsonSerializable

This class represents a filter, restricting a DataFrame to rows with column_value in column_name.

It can be used to specify subgroup conditions.

Parameters:
  • column_name (str) – name of the column to filter on
  • column_value (str) – value of the column to filter on

apply_to_data(data)
class expan.core.statistical_test.KPI(name)

Bases: expan.core.util.JsonSerializable

This class represents a basic KPI.

Parameters:name (str) – name of the KPI

class expan.core.statistical_test.StatisticalTest(data, kpi, features, variants)

Bases: expan.core.util.JsonSerializable

This class describes what has to be tested against what, and represents a unit of statistical testing.

Parameters:
  • data (DataFrame) – data for statistical test
  • kpi (KPI or its subclass) – the KPI on which to perform the test
  • features (list[FeatureFilter]) – list of features used for subgroups
  • variants (Variants) – variant column name and their values
class expan.core.statistical_test.StatisticalTestSuite(tests, correction_method=<CorrectionMethod.NONE: 1>)

Bases: expan.core.util.JsonSerializable

This class consists of a number of tests plus choice of the correction method.

Parameters:
  • tests (list[StatisticalTest]) – list of statistical tests in the suite
  • correction_method (CorrectionMethod) – method used for multiple testing correction
size
class expan.core.statistical_test.Variants(variant_column_name, control_name, treatment_name)

Bases: expan.core.util.JsonSerializable

This class represents information of variants.

Parameters:
  • variant_column_name (str) – name of the column that represents variant
  • control_name (str) – value of the variant that represents control group
  • treatment_name (str) – value of the variant that represents the treatment group
get_variant(data, variant_name)

expan.core.statistics module

expan.core.statistics.bootstrap(x, y, func=<function _delta_mean>, nruns=10000, percentiles=[2.5, 97.5], min_observations=20, return_bootstraps=False, relative=False)

Bootstraps the confidence intervals for a particular function comparing two samples. NaNs are ignored (discarded before calculation).

Parameters:
  • x (pd.Series or list (array-like)) – sample of the treatment group
  • y (pd.Series or list (array-like)) – sample of the control group
  • func (function) – function whose distribution is to be computed. The default comparison metric is the difference of means. For bootstrapping the correlation: func=lambda x,y: scipy.stats.pearsonr(x,y)[0].
  • nruns (int) – number of bootstrap runs to perform
  • percentiles (list) – The values corresponding to the given percentiles are returned. The default percentiles (2.5% and 97.5%) correspond to an alpha of 0.05.
  • min_observations (int) – minimum number of observations necessary
  • return_bootstraps (bool) – if set, the bootstrap samples are returned as part of the result; otherwise that return value is empty.
  • relative (bool) – if relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns (c_val, bootstraps):

c_val is a dict mapping percentile levels to the corresponding values; bootstraps is an np.array containing the bootstrapping result of each run

Return type:

tuple
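The core of the procedure is small enough to sketch (illustrative only; `bootstrap_delta_ci` is a hypothetical helper, and ExpAn's version additionally supports arbitrary `func` arguments and the `relative` mode): resample both groups with replacement, record the difference of means for each run, and read the CI off the percentiles of the resulting distribution.

```python
import numpy as np

def bootstrap_delta_ci(x, y, nruns=1000, percentiles=(2.5, 97.5), seed=42):
    """Percentile bootstrap CI for the difference of means (x - y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x[~np.isnan(x)], y[~np.isnan(y)]   # NaNs are discarded
    rng = np.random.default_rng(seed)
    deltas = np.empty(nruns)
    for i in range(nruns):
        # Resample each group with replacement and record the delta
        deltas[i] = rng.choice(x, x.size).mean() - rng.choice(y, y.size).mean()
    return dict(zip(percentiles, np.percentile(deltas, list(percentiles))))
```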

expan.core.statistics.chi_square(observed_freqs, expected_freqs, ddof=0)

Computes chi-square statistics and p-values given observed and expected frequencies and degrees of freedom.

Parameters:
  • observed_freqs (pd.Series or array-like) – observed frequencies
  • expected_freqs (pd.Series or array-like) – expected frequencies
  • ddof (int) – delta degrees of freedom, 0 by default
Returns:

chi-square statistics and p-value

Return type:

float, float
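This maps directly onto scipy.stats.chisquare (an assumption: ExpAn delegates to or mirrors SciPy here). For example, checking observed assignment counts against an expected 50/50 split:

```python
from scipy.stats import chisquare

# Observed vs. expected assignment counts for a 50/50 split;
# a large p-value gives no evidence of a biased split.
stat, p_value = chisquare(f_obs=[5010, 4990], f_exp=[5000, 5000], ddof=0)
```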

expan.core.statistics.compute_p_value(mean1, std1, n1, mean2, std2, n2)

Compute two-tailed p value for statistical Student’s T-test given statistics of control and treatment.

Parameters:
  • mean1 (float) – mean value of the treatment distribution
  • std1 (float) – standard deviation of the treatment distribution
  • n1 (int) – number of samples of the treatment distribution
  • mean2 (float) – mean value of the control distribution
  • std2 (float) – standard deviation of the control distribution
  • n2 (int) – number of samples of the control distribution
Returns:

two-tailed p-value

Return type:

float
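A sketch of a pooled-variance Student's t-test from summary statistics (the textbook formula; ExpAn's exact implementation may differ in details such as the degrees-of-freedom choice):

```python
import math
from scipy import stats

def p_value_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Two-tailed p-value of a pooled-variance two-sample t-test."""
    pooled_var = ((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t_stat = (mean1 - mean2) / se
    # Two-tailed: twice the upper-tail probability of |t|
    return 2 * stats.t.sf(abs(t_stat), n1 + n2 - 2)
```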

expan.core.statistics.compute_p_value_from_samples(x, y)

Calculates two-tailed p value for statistical Student’s T-test based on pooled standard deviation.

Parameters:
  • x (pd.Series or array-like) – samples of a treatment group
  • y (pd.Series or array-like) – samples of a control group
Returns:

two-tailed p-value

Return type:

float

expan.core.statistics.compute_statistical_power(mean1, std1, n1, mean2, std2, n2, z_1_minus_alpha)

Compute statistical power given statistics of control and treatment.

Parameters:
  • mean1 (float) – mean value of the treatment distribution
  • std1 (float) – standard deviation of the treatment distribution
  • n1 (int) – number of samples of the treatment distribution
  • mean2 (float) – mean value of the control distribution
  • std2 (float) – standard deviation of the control distribution
  • n2 (int) – number of samples of the control distribution
  • z_1_minus_alpha (float) – critical value for significance level alpha. That is, z-value for 1-alpha.
Returns:

statistical power: the probability that the test detects an effect if the effect actually exists, or -1 if a standard deviation is less than or equal to 0

Return type:

float
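Under a normal approximation, power is Φ(|Δ|/SE - z_{1-alpha}). A hedged sketch of such a computation (the exact formula ExpAn uses may differ, e.g. in its degrees-of-freedom handling):

```python
import math
from scipy.stats import norm

def statistical_power(mean1, std1, n1, mean2, std2, n2, z_1_minus_alpha):
    """Normal-approximation power; -1 signals a degenerate std, as documented."""
    if std1 <= 0 or std2 <= 0:
        return -1
    se = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    standardized_effect = abs(mean1 - mean2) / se
    return norm.cdf(standardized_effect - z_1_minus_alpha)
```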

expan.core.statistics.compute_statistical_power_from_samples(x, y, alpha=0.05)

Compute statistical power given data samples of control and treatment.

Parameters:
  • x (pd.Series or array-like) – samples of a treatment group
  • y (pd.Series or array-like) – samples of a control group
  • alpha (float) – Type I error (false positive rate)
Returns:

statistical power: the probability that the test detects an effect if the effect actually exists

Return type:

float

expan.core.statistics.delta(x, y, x_denominators=1, y_denominators=1, assume_normal=True, alpha=0.05, min_observations=20, nruns=10000, relative=False)

Calculates the difference of means between the samples in a statistical sense. Computation is done in form of treatment minus control, i.e. x-y. Note that NaNs are treated as if they do not exist in the data.

Parameters:
  • x (pd.Series or array-like) – sample of the treatment group
  • y (pd.Series or array-like) – sample of the control group
  • x_denominators (pd.Series or array-like) – denominators of the treatment group sample (1 for non-derived KPIs)
  • y_denominators (pd.Series or array-like) – denominators of the control group sample (1 for non-derived KPIs)
  • assume_normal (boolean) – specifies whether normal distribution assumptions can be made
  • alpha (float) – significance level (alpha)
  • min_observations (int) – minimum number of observations needed
  • nruns (int) – number of bootstrap runs; only used if assume_normal is False
  • relative (bool) – if relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.

Returns:

results of type SimpleTestStatistics

Return type:

SimpleTestStatistics

expan.core.statistics.estimate_sample_size(x, mde, r, alpha=0.05, beta=0.2)

Estimates the sample size based on the sample mean and variance, given the MDE (minimum detectable effect), the number of variants, and the variant split ratio.

Parameters:
  • x (pd.Series or pd.DataFrame) – sample to base estimation on
  • mde (float) – minimum detectable effect
  • r (float) – variant split ratio
  • alpha (float) – significance level
  • beta (float) – type II error
Returns:

estimated sample size

Return type:

float or pd.Series
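The standard closed-form estimate behind such a function can be sketched as follows (an assumption about ExpAn's internals, in particular that mde is relative to the sample mean; treat it as illustrative only):

```python
from scipy.stats import norm

def estimate_sample_size(mean, var, mde, r, alpha=0.05, beta=0.2):
    """Sketch: per-variant sample size to detect a relative effect `mde`.

    Assumes the classic two-sample formula with a relative MDE;
    `r` is the variant split ratio.
    """
    z_total = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return (1 + 1 / r) * z_total ** 2 * var / (mde * mean) ** 2
```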

expan.core.statistics.make_delta(assume_normal=True, alpha=0.05, min_observations=20, nruns=10000, relative=False)

A closure around the delta function.

expan.core.statistics.normal_difference(mean1, std1, n1, mean2, std2, n2, percentiles=[2.5, 97.5], relative=False)

Calculates the difference distribution of two normal distributions. Computation is done in form of treatment minus control. It is assumed that the standard deviations of both distributions do not differ too much.

For further information visit:
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals5.html
Parameters:
  • mean1 (float) – mean value of the treatment distribution
  • std1 (float) – standard deviation of the treatment distribution
  • n1 (int) – number of samples of the treatment distribution
  • mean2 (float) – mean value of the control distribution
  • std2 (float) – standard deviation of the control distribution
  • n2 (int) – number of samples of the control distribution
  • percentiles (list) – list of percentile values to compute
  • relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns:

percentiles and corresponding values

Return type:

dict
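A sketch of the underlying computation, using the pooled standard deviation and a t distribution with n1 + n2 - 2 degrees of freedom (illustrative; ExpAn's percentile handling may differ):

```python
import math
from scipy import stats

def normal_difference_ci(mean1, std1, n1, mean2, std2, n2,
                         percentiles=(2.5, 97.5)):
    """Percentiles of the difference distribution of mean1 - mean2."""
    pooled_var = ((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    df = n1 + n2 - 2
    return {p: (mean1 - mean2) + stats.t.ppf(p / 100.0, df) * se
            for p in percentiles}
```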

expan.core.statistics.normal_sample_difference(x, y, percentiles=[2.5, 97.5], relative=False)

Calculates the difference distribution of two normal distributions given by their samples.

Computation is done in form of treatment minus control. It is assumed that the standard deviations of both distributions do not differ too much.

Parameters:
  • x (pd.Series or list (array-like)) – sample of a treatment group
  • y (pd.Series or list (array-like)) – sample of a control group
  • percentiles (list) – list of percentile values to compute
  • relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns:

percentiles and corresponding values

Return type:

dict

expan.core.statistics.normal_sample_weighted_difference(x_numerators, y_numerators, x_denominators, y_denominators, percentiles=[2.5, 97.5], relative=False)

Calculates the difference distribution of two distributions given by their samples.

Computation is done in form of treatment(x) minus control(y). It is assumed that the standard deviations of both distributions do not differ too much.

The estimate of the mean difference is \(\frac{mean(x_{numerators})}{mean(x_{denominators})}-\frac{mean(y_{numerators})}{mean(y_{denominators})}\). For non-derived KPIs, the denominators will be exactly 1, and hence this will simplify to \(mean(x_{numerators})-mean(y_{numerators})\). For details on the variance calculation, see the Glossary.

Parameters:
  • x_numerators (pd.Series or list (array-like)) – numerators of the treatment group sample
  • y_numerators (pd.Series or list (array-like)) – numerators of the control group sample
  • x_denominators (pd.Series or list (array-like), or simply 1 as an int/float for a non-derived KPI) – denominators of the treatment group sample
  • y_denominators (pd.Series or list (array-like), or simply 1 as an int/float for a non-derived KPI) – denominators of the control group sample
  • percentiles (list) – list of percentile values to compute
  • relative (bool) – If relative==True, then the values will be returned as distances below and above the mean, respectively, rather than the absolute values. In this case, the interval is mean-ret_val[0] to mean+ret_val[1]. This is more useful in many situations because it corresponds with the sem() and std() functions.
Returns:

percentiles and corresponding values

Return type:

dict with multiple entries:

  • c_i: confidence_interval
  • mean1: \(\frac{mean(x_{numerators})}{mean(x_{denominators})}\)
  • mean2: \(\frac{mean(y_{numerators})}{mean(y_{denominators})}\)
  • n1: sample size of x, after discarding NaNs
  • n2: sample size of y, after discarding NaNs
  • var1: \(var\left(\frac{x_{numerators}[i] - mean1 \cdot x_{denominators}[i]}{mean(x_{denominators})}\right)\)
  • var2: \(var\left(\frac{y_{numerators}[i] - mean2 \cdot y_{denominators}[i]}{mean(y_{denominators})}\right)\)

expan.core.statistics.pooled_std(std1, n1, std2, n2)

Returns the pooled estimate of standard deviation.

For further information visit:
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals5.html
Parameters:
  • std1 (float) – standard deviation of first sample
  • n1 (int) – size of first sample
  • std2 (float) – standard deviation of second sample
  • n2 (int) – size of second sample
Returns:

pooled standard deviation

Type:

float
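The formula is the square root of the degrees-of-freedom-weighted average of the two sample variances, which can be written directly:

```python
import math

def pooled_std(std1, n1, std2, n2):
    # sqrt of the df-weighted average of the two sample variances
    return math.sqrt(((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2)
                     / (n1 + n2 - 2))
```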

expan.core.statistics.sample_size(x)

Calculates valid sample size given the data.

Parameters:x (pd.Series or list (array-like)) – sample to calculate the sample size
Returns:sample size of the sample excluding NaNs
Return type:int

expan.core.util module

class expan.core.util.JsonSerializable

Bases: object

Interface for serializable classes.

toJson()
expan.core.util.drop_nan(array)

Drop NaN values from the given numpy array.

Parameters:array (np.ndarray) – input array
Returns:a new array without NaN values
Return type:np.ndarray
expan.core.util.find_value_by_key_with_condition(items, condition_key, condition_value, lookup_key)

Find the value of lookup key where the dictionary contains condition key = condition value.

Parameters:
  • items (list) – list of dictionaries
  • condition_key (str) – condition key
  • condition_value – a value for the condition key
  • lookup_key (str) – lookup key or key you want to find the value for
Returns:

lookup value or found value for the lookup key

expan.core.util.generate_random_data()

Generate random data for two variants. It can be used in unit tests or demos.

expan.core.util.is_nan(obj)

Checks whether the input is NaN, using the fact that NaN is the only value not equal to itself.
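The trick in one line (a sketch; it also behaves sensibly for non-float scalar inputs, which simply compare equal to themselves):

```python
def is_nan(obj):
    # NaN is the only value for which obj != obj is True
    return obj != obj
```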

expan.core.version module

expan.core.version.git_commit_count()

Returns the output of git rev-list --count HEAD as an int. Note: http://programmers.stackexchange.com/a/151558

expan.core.version.git_latest_commit()

Returns output of git rev-parse HEAD. Note: http://programmers.stackexchange.com/a/151558.

expan.core.version.version(format_str='{short}')

Returns current version number in specified format.

Parameters:format_str (str) – format string for the version
Returns:version number in the specified format
Return type:str
expan.core.version.version_numbers()

Returns ExpAn version.

Module contents

ExpAn core module.

expan.core.version(format_str='{short}')

Returns current version number in specified format.

Parameters:format_str (str) – format string for the version
Returns:version number in the specified format
Return type:str