DiscrimTwoSample

class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)
Two sample discriminability test statistic and p-value.

The two sample test measures whether the discriminability of one dataset differs from that of another. More details are given in [1].

Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach, with \(D_{x_1}\) and \(D_{x_2}\) the corresponding population discriminabilities. Then,

\[\begin{split}H_0: D_{x_1} &= D_{x_2} \\ H_A: D_{x_1} &> D_{x_2}\end{split}\]

Alternatively, tests can be done for \(D_{x_1} < D_{x_2}\) and \(D_{x_1} \neq D_{x_2}\) through the alt parameter of test().
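This page does not define the statistic itself, so here is a rough illustration. The following minimal sketch assumes the triple-based definition of sample discriminability. For every pair of samples i, j that share an id, it checks how often a sample k with a different id lies farther from i than j does, and it averages over all such comparisons. Several pieces are assumptions for illustration and are not hyppo's implementation: the helper name sample_discriminability, the Euclidean metric, and the half credit given to ties. With these choices the sketch reproduces the 0.5 and 1.0 discriminabilities shown in the test() example.

```python
import numpy as np
from scipy.spatial.distance import cdist


def sample_discriminability(x, ids):
    """Hypothetical helper: average fraction of different-id samples k that
    are farther from sample i than a same-id sample j is.

    Assumptions: Euclidean distances, and ties count as 1/2.
    """
    x = np.asarray(x, dtype=float)
    ids = np.asarray(ids)
    dist = cdist(x, x)  # (n, n) Euclidean distance matrix
    scores = []
    for i in range(dist.shape[0]):
        same = ids == ids[i]
        same[i] = False  # never compare a sample with itself
        diff = ids != ids[i]
        if not same.any() or not diff.any():
            continue  # sample i has no same-id partner or no other id
        d_between = dist[i, diff]
        for d_within in dist[i, same]:
            # full credit when the between-id distance is larger, half on ties
            wins = np.sum(d_between > d_within) + 0.5 * np.sum(d_between == d_within)
            scores.append(wins / d_between.size)
    if not scores:
        raise ValueError("need at least two ids and a repeated id")
    return float(np.mean(scores))


# Same toy data as the test() example
x1 = np.ones((100, 2))
x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
print(sample_discriminability(x1, y), sample_discriminability(x2, y))  # 0.5 1.0
```

The all-ones x1 makes every distance tie, so each comparison earns half credit. In x2 the same-id samples coincide and the two groups sit apart, so every comparison succeeds.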
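Methods Summary

| Method | Description |
|---|---|
| statistic(x, y) | Helper function that calculates the discriminability test statistic. |
| test(x1, x2, y[, reps, alt, workers, random_state]) | Calculates the test statistic and p-value for a two sample test for discriminability. |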
DiscrimTwoSample.statistic(x, y)
Helper function that calculates the discriminability test statistic.

Parameters
- x, y (ndarray) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the numbers of dimensions. Alternatively, x and y can be distance matrices, in which case both shapes must be (n, n).

Returns
- stat (float) -- The computed two sample discriminability statistic.
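This is a minimal sketch of preparing the (n, n) distance-matrix form of the inputs. It assumes SciPy's cdist with the Euclidean metric, though any metric could be substituted. The random data are made up for illustration. The two views share the same n samples but have different numbers of dimensions, p = 3 and q = 5.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=(n, 3))  # n samples, p = 3 dimensions
x2 = rng.normal(size=(n, 5))  # same n samples, q = 5 dimensions

# Pairwise Euclidean distances between all samples; each result is (n, n)
d1 = cdist(x1, x1, metric="euclidean")
d2 = cdist(x2, x2, metric="euclidean")

print(d1.shape, d2.shape)  # (20, 20) (20, 20)
```

Matrices built this way would then be passed with is_dist=True, for example DiscrimTwoSample(is_dist=True).test(d1, d2, y), where y holds the n sample ids.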
 
DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=-1, random_state=None)
Calculates the test statistic and p-value for a two sample test for discriminability.

Parameters
- x1, x2 (ndarray) -- Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the numbers of dimensions. Alternatively, x1 and x2 can be distance matrices, in which case both shapes must be (n, n) and is_dist must be set to True.
- y (ndarray) -- A vector containing the sample ids for the n samples. It should be matched to the inputs such that y[i] is the corresponding label for x1[i, :] and x2[i, :].
- reps (int, optional (default: 1000)) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
- alt ({"greater", "less", "neq"}, default: "neq") -- The alternative hypothesis for the test. It can test that the first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less"), or of unequal discriminability (alt = "neq").
- workers (int, optional (default: -1)) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
 
Returns
- discrim1, discrim2 (float) -- The sample discriminability of x1 and of x2, respectively.
- pvalue (float) -- The p-value of the two sample test.

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'
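One way to read the reps and alt parameters is as follows. The observed difference D(x1) - D(x2) is compared with reps differences recomputed under random permutations, and alt selects which tail counts as extreme. The sketch below is a generic permutation p-value, not hyppo's code. It assumes the null differences are centered at zero under H0, so "neq" compares absolute values, and it uses the common add-one correction. The helper name permutation_pvalue is hypothetical, and hyppo's exact counting rules may differ.

```python
import numpy as np


def permutation_pvalue(stat, null, alt="neq"):
    """Hypothetical helper: p-value of an observed difference against a
    permutation null distribution of the same difference.

    Assumes the null is centered at zero under H0 (used for alt="neq").
    """
    null = np.asarray(null, dtype=float)
    reps = null.size
    if alt == "greater":
        count = np.sum(null >= stat)  # first dataset more discriminable
    elif alt == "less":
        count = np.sum(null <= stat)  # first dataset less discriminable
    elif alt == "neq":
        count = np.sum(np.abs(null) >= np.abs(stat))  # either direction
    else:
        raise ValueError('alt must be "greater", "less", or "neq"')
    # Add-one correction keeps the p-value strictly above zero
    return float((count + 1) / (reps + 1))


toy_null = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])  # toy null differences
for alt in ("greater", "less", "neq"):
    print(alt, permutation_pvalue(0.15, toy_null, alt))
```

With this toy null and an observed difference of 0.15, the three alternatives give 1/3, 5/6, and 1/2. Under the add-one correction, a run with reps=1000 in which no permutation is as extreme as the observed value yields 1/1001, which rounds to 0.00 as in the example.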
