

Beta KDE Software Rollout: Python, R, and Julia

Following the recent acceptance of my paper on Beta Kernel Density Estimation in the Journal of Computational and Graphical Statistics (JCGS), the focus has shifted to making the method immediately accessible to applied researchers.

I am pleased to share that the new closed-form "HS" bandwidth selector is now available across the three major statistical computing languages.

The goal of this research was to provide a fast, rule-of-thumb solution for boundary-corrected density estimation without relying on unstable numerical optimization. To ensure it is a true plug-and-play solution, the method has been rolled out in the following ecosystems:

  • Python: Available via pip install beta-kde (fully API-compatible with scikit-learn). [ GitHub | Documentation ]
  • R: Now integrated as the default bandwidth selector for beta kernels in the standard kdensity CRAN package (v1.2.0+). [ CRAN | GitHub ]
  • Julia: Available via the official package manager using ] add BetaKDE. [ GitHub ]

A preprint detailing the derivation of the HS bandwidth rule is available on arXiv, and the official journal version will be published in JCGS shortly.

New Preprint: A Fast Rule for Beta Kernel Density Estimation

If you have ever tried to estimate a probability density for data bounded in [0, 1] (percentages, probabilities, Gini coefficients) using a standard Gaussian KDE, you know the pain: boundary bias. The Gaussian kernel "leaks" probability mass below 0 and above 1, ruining the estimate at the edges.
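The leakage is easy to see with SciPy's `gaussian_kde`. This is a minimal sketch (the Beta(2, 5) sample and the seed are illustrative choices, not from the paper): the fitted Gaussian KDE assigns strictly positive probability mass to the impossible region below 0.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Bounded data on [0, 1], piled up toward the left boundary
rng = np.random.default_rng(0)
data = rng.beta(2, 5, 1000)

kde = gaussian_kde(data)

# The Gaussian kernel places positive density below 0:
# probability mass "leaks" outside the support
leaked = kde.integrate_box_1d(-np.inf, 0)
print(leaked)          # strictly positive
print(kde(-0.05)[0])   # positive density at an impossible value
```

Every unit of mass leaked past the boundary is mass stolen from the estimate just inside it, which is why the Gaussian KDE systematically underestimates the density near 0 and 1.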

The theoretical solution—the Beta Kernel—has existed since 1999, but it has been held back by a major practical flaw: it lacked a simple "rule-of-thumb" bandwidth selector, forcing users to rely on slow, unstable numerical optimization.
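For context, the estimator itself (Chen, 1999) is simple to sketch in NumPy/SciPy: each evaluation point x gets its own Beta(x/b + 1, (1 − x)/b + 1) kernel, whose support is exactly [0, 1], so nothing can leak past the boundaries. The bandwidth `b = 0.02` below is picked by hand, which is precisely the selection problem the preprint addresses.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def beta_kde(x, data, b):
    """Beta-kernel density estimate on [0, 1] (Chen, 1999).

    The kernel at evaluation point x is the Beta(x/b + 1, (1-x)/b + 1)
    density, so all kernel mass stays inside [0, 1].
    """
    x = np.atleast_1d(x)
    # Broadcast to shape (len(x), len(data)): each row is one kernel
    # evaluated at every data point
    pdf = beta_dist.pdf(
        data[None, :], x[:, None] / b + 1, (1 - x[:, None]) / b + 1
    )
    return pdf.mean(axis=1)

rng = np.random.default_rng(0)
data = rng.beta(2, 5, 1000)
grid = np.linspace(0, 1, 201)
dens = beta_kde(grid, data, b=0.02)  # bandwidth chosen by hand here
```

Note that the kernel's shape varies with x rather than being a fixed bump that slides along the axis; that adaptivity is what kills the boundary bias, and also what makes bandwidth selection harder than in the Gaussian case.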

In my latest preprint, "A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator", I solve this computational bottleneck.

I derive the Beta Reference Rule, a closed-form analytical formula for the optimal bandwidth. By eliminating the need for iterative optimization, this new rule matches the accuracy of the "gold standard" methods while delivering a computational speedup of over 35,000x.

This turns the Beta Kernel from a theoretical curiosity into a practical, drop-in replacement for the Gaussian KDE for bounded data.

Key Contributions

  • Derivation: A closed-form bandwidth rule based on the asymptotic mean integrated squared error (AMISE).
  • Heuristic Fallback: A principled heuristic for "hard" (U-shaped and J-shaped) distributions where standard asymptotics fail.
  • Software: A fully documented Python package, beta-kde, that is API-compatible with scikit-learn.
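To illustrate the shape of such a rule (and only the shape): the AMISE-optimal bandwidth for the beta kernel shrinks at rate n^(-2/5), and a reference rule plugs sample moments into the AMISE constant in closed form. The sketch below is a hypothetical placeholder with an arbitrary constant, not the published HS rule; the actual constants are derived in the preprint.

```python
import numpy as np

def reference_bandwidth(data, c=1.0):
    """Illustrative shape of a closed-form reference rule.

    AMISE-optimal beta-kernel bandwidths shrink at rate n^(-2/5).
    The constant here is a PLACEHOLDER, not the published HS rule.
    """
    n = len(data)
    scale = np.std(data)  # stand-in for the moment-based AMISE constant
    return c * scale * n ** (-2 / 5)

rng = np.random.default_rng(0)
data = rng.beta(2, 5, 1000)
b = reference_bandwidth(data)  # closed form: no iterative optimization
```

Whatever the constant, the point is the computational profile: a few sample moments and one arithmetic expression, versus the nested density evaluations of cross-validation, which is where the reported 35,000x speedup comes from.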

You can now fit a boundary-corrected density estimator with a closed-form, $\mathcal{O}(1)$ bandwidth selection:

from beta_kde import BetaKDE
import numpy as np

# Sample bounded data on [0, 1]
data = np.random.beta(2, 5, 1000)

# Closed-form bandwidth: fits instantly, with no boundary bias
est = BetaKDE(bandwidth='beta-reference').fit(data.reshape(-1, 1))