dcor: Distance Correlation and Energy Statistics
dcor is a Python library that provides efficient implementations of distance correlation and energy statistics, which measure dependence between random vectors and compare probability distributions. It includes hypothesis tests for independence and for two-sample (homogeneity) problems. Currently at version 0.7, it is actively maintained, with a focus on numerical stability and performance.
Common errors
-
ValueError: x and y must have the same number of observations.
cause: The number of samples (rows) in the input arrays `x` and `y` does not match.
fix: Ensure that `x.shape[0]` equals `y.shape[0]`. If `x` and `y` are 1D arrays, they must have the same length.
-
TypeError: unsupported operand type(s) for -: 'list' and 'list'
cause: dcor functions expect `numpy.ndarray` objects, but raw Python lists were passed directly.
fix: Convert your lists to NumPy arrays with `np.array(my_list)` before passing them to dcor functions.
-
AttributeError: module 'dcor' has no attribute 'get_distance_correlation'
cause: The code calls the old function name `get_distance_correlation`, which was deprecated and later removed.
fix: Update your code to use the current function name, `dcor.distance_correlation`.
-
AttributeError: module 'dcor' has no attribute 'independence_test'
cause: dcor has no top-level `independence_test` function; its permutation tests live in the `dcor.independence` and `dcor.homogeneity` submodules.
fix: Call `dcor.independence.distance_covariance_test(x, y, num_resamples=1000)`, setting `num_resamples` explicitly to the number of permutations you want.
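The first two errors above can be caught before any dcor call is made. A minimal sketch, assuming a hypothetical helper named `as_paired_samples` (it is not part of dcor) that converts both inputs to float arrays and checks that the row counts agree:

```python
import numpy as np


def as_paired_samples(x, y):
    """Convert x and y to float arrays and check they have equal row counts.

    Hypothetical helper for illustration only; it is not part of dcor.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 0 or y.ndim == 0:
        raise ValueError("x and y must be at least one-dimensional")
    if x.shape[0] != y.shape[0]:
        raise ValueError(
            f"x has {x.shape[0]} observations but y has {y.shape[0]}"
        )
    return x, y


# Lists (including nested lists) are accepted and come back as arrays
x, y = as_paired_samples([1, 2, 3, 4], [[0.5], [1.5], [2.5], [3.5]])
print(x.shape, y.shape)  # (4,) (4, 1)
```

Only the first axis is compared, so a 1D sample can be paired with a multivariate one as long as both have the same number of observations.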
Warnings
- breaking The function `dcor.get_distance_correlation` was renamed to `dcor.distance_correlation` in version 0.4.0. Using the old name will result in an AttributeError.
- gotcha Input data for all dcor functions (e.g., `distance_correlation`, `energy_distance`) must be NumPy arrays or objects convertible to them. Passing raw Python lists raises a TypeError.
- gotcha Permutation tests such as `dcor.independence.distance_covariance_test` take the number of resamples through the `num_resamples` parameter, which should always be set explicitly. The p-value is estimated from those resamples, so with B resamples it cannot be resolved more finely than roughly 1/B; too small a value gives unreliable results.
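- gotcha For dependence measures and independence tests (e.g., `distance_correlation`), `x` and `y` are paired samples and must have the same number of observations (rows), one row per experimental unit. Two-sample statistics such as `energy_distance` compare unpaired samples, which may differ in size but must have the same number of columns.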
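To see why the number of resamples matters for a permutation or bootstrap test, here is a minimal NumPy-only sketch of a permutation test of independence built on the sample distance covariance. It is written from the textbook definition for 1D paired samples and is not dcor's implementation; the names `centered_distances`, `dcov_sqr`, and `permutation_pvalue` are made up for this example.

```python
import numpy as np


def centered_distances(v):
    """Pairwise |v_i - v_j| for a 1D sample, double-centered."""
    d = np.abs(v[:, None] - v[None, :])
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def dcov_sqr(a, b):
    """Squared sample distance covariance (V-statistic) from centered matrices."""
    return (a * b).mean()


def permutation_pvalue(x, y, num_resamples, seed=0):
    """Permutation p-value for independence of two paired 1D samples."""
    rng = np.random.default_rng(seed)
    a = centered_distances(np.asarray(x, dtype=float))
    b = centered_distances(np.asarray(y, dtype=float))
    observed = dcov_sqr(a, b)
    exceed = 0
    for _ in range(num_resamples):
        perm = rng.permutation(len(b))
        # Permuting rows and columns of b together is equivalent to shuffling y
        if dcov_sqr(a, b[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    # The +1 counts the observed arrangement itself, so the p-value can
    # never fall below 1 / (num_resamples + 1)
    return (exceed + 1) / (num_resamples + 1)


x = np.arange(20.0)
y = x ** 2  # deterministic, nonlinear dependence on x
for n in (9, 99, 999):
    print(n, permutation_pvalue(x, y, n))
```

With 9 resamples the p-value cannot drop below 0.1 however strong the dependence, so a test at the 5% level could never reject; larger resample counts remove that floor at the cost of more computation.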
Install
-
pip install dcor
Imports
- distance_correlation
wrong: from dcor import get_distance_correlation
correct: from dcor import distance_correlation
- energy_distance
from dcor import energy_distance
- distance_covariance_test
from dcor.independence import distance_covariance_test
Quickstart
import numpy as np
import dcor
# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])
z = np.array([5, 4, 3, 2, 1])
w = np.array([1, 2, 3, 4, 6]) # Slightly different for energy_distance
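# Calculate distance correlation
dc_xy = dcor.distance_correlation(x, y)
# z is x reversed, so its pairwise distances match those of x and the
# distance correlation is again 1 (up to floating-point error), not -1
dc_xz = dcor.distance_correlation(x, z)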
print(f"Distance correlation (x, y): {dc_xy:.4f}")
print(f"Distance correlation (x, z): {dc_xz:.4f}")
# Calculate energy distance
ed_xw = dcor.energy_distance(x, w)
print(f"Energy distance (x, w): {ed_xw:.4f}")
# Perform a permutation test of independence based on distance covariance
# Note: num_resamples should be much larger (e.g. 1000 or more) for real analysis
result = dcor.independence.distance_covariance_test(x, y, num_resamples=100)
print(f"P-value for independence test (x, y): {result.pvalue:.4f}")
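For intuition about what `energy_distance` measures, the quantity can be written out directly. The sketch below computes the V-statistic estimate 2 E|X - W| - E|X - X'| - E|W - W'| with plain NumPy for 1D samples. `energy_distance_naive` is a name made up for this example, and dcor's own estimator and defaults may differ in detail.

```python
import numpy as np


def energy_distance_naive(x, w):
    """V-statistic estimate of the energy distance between two 1D samples."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    between = np.abs(x[:, None] - w[None, :]).mean()   # E|X - W|
    within_x = np.abs(x[:, None] - x[None, :]).mean()  # E|X - X'|
    within_w = np.abs(w[:, None] - w[None, :]).mean()  # E|W - W'|
    return 2 * between - within_x - within_w


x = np.array([1, 2, 3, 4, 5])
w = np.array([1, 2, 3, 4, 6])
print(f"{energy_distance_naive(x, w):.4f}")  # 0.0800
print(f"{energy_distance_naive(x, x):.4f}")  # 0.0000
```

The estimate is zero when a sample is compared with itself and grows as the two samples drift apart. Because the samples are not paired, they may have different lengths, e.g. `energy_distance_naive([0, 1], [0, 1, 2])`.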