Research

New Preprint: A Fast Rule for Beta Kernel Density Estimation

If you have ever tried to estimate a probability density for data bounded in [0, 1] (like percentages, probabilities, or Gini coefficients) using a standard Gaussian KDE, you know the pain: Boundary Bias. The Gaussian kernel "leaks" probability mass below 0 and above 1, ruining the estimate at the edges.
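To put a number on that leak, here is a minimal sketch, assuming NumPy and SciPy are installed. It fits SciPy's standard `gaussian_kde` (default Scott's-rule bandwidth) to synthetic data piled up near 0 and measures how much of the estimated density falls outside [0, 1]. The Beta(0.8, 3) sample, the seed, and the sample size are illustrative choices, not data from the preprint.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic bounded data with a lot of mass near 0 (illustrative choice).
rng = np.random.default_rng(0)
data = rng.beta(0.8, 3.0, size=2000)

# Standard Gaussian KDE with SciPy's default (Scott's rule) bandwidth.
kde = gaussian_kde(data)

# Probability mass the estimate assigns to the valid support [0, 1].
mass_inside = kde.integrate_box_1d(0.0, 1.0)
leaked = 1.0 - mass_inside

print(f"estimated mass inside [0, 1]:  {mass_inside:.3f}")
print(f"estimated mass outside [0, 1]: {leaked:.3f}")
```

Whatever mass ends up outside the interval is missing from inside it, which is why the estimate sags near a boundary where the true density is high.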

The theoretical solution, the Beta Kernel, has existed since 1999, but it has been held back by a major practical flaw: it lacked a simple "rule-of-thumb" bandwidth selector, forcing users to rely on slow, unstable numerical optimization.

In my latest preprint, "A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator", I solve this computational bottleneck.

I derive the Beta Reference Rule, a closed-form analytical formula for the optimal bandwidth. By eliminating the need for iterative optimization, this new rule matches the accuracy of the "gold standard" methods while delivering a computational speedup of over 35,000x.

This turns the Beta Kernel from a theoretical curiosity into a practical, drop-in replacement for the Gaussian KDE for bounded data.

Key Contributions

  • Derivation: A closed-form bandwidth rule based on the asymptotic mean integrated squared error (AMISE).
  • Heuristic Fallback: A principled heuristic for "hard" (U-shaped and J-shaped) distributions where standard asymptotics fail.
  • Software: A fully documented Python package, beta-kde, that is API-compatible with scikit-learn.

You can now fit a boundary-corrected density estimator in $\mathcal{O}(1)$ time:

from beta_kde import BetaKDE
import numpy as np

# Fits instantly, no boundary bias
data = np.random.beta(2, 5, 1000)
est = BetaKDE(bandwidth='beta-reference').fit(data.reshape(-1, 1))
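For intuition about what a beta kernel estimate computes, here is a minimal sketch of the 1999 estimator as it is commonly written (following Chen): the density at x is the average of Beta(x/b + 1, (1 - x)/b + 1) densities evaluated at the data points. This is not the beta-kde package's implementation, and it does not use the Beta Reference Rule. The function name `beta_kernel_density` and the bandwidth `b = 0.02` are hand-picked assumptions for illustration.

```python
import numpy as np
from scipy.stats import beta

def beta_kernel_density(x_grid, data, b):
    """Average of Beta(x/b + 1, (1 - x)/b + 1) densities evaluated at the data.

    The kernel shape changes with the evaluation point x, and its support is
    always [0, 1], so no mass can leak past the boundaries.
    """
    x = np.asarray(x_grid, dtype=float)[:, None]   # evaluation points, shape (m, 1)
    obs = np.asarray(data, dtype=float)[None, :]   # observations, shape (1, n)
    return beta.pdf(obs, x / b + 1.0, (1.0 - x) / b + 1.0).mean(axis=1)

rng = np.random.default_rng(0)
data = rng.beta(2, 5, 1000)

grid = np.linspace(0.0, 1.0, 501)
density = beta_kernel_density(grid, data, b=0.02)  # hand-picked bandwidth

# Trapezoidal rule by hand; the raw estimator integrates to roughly 1, not exactly 1.
total_mass = np.sum((density[1:] + density[:-1]) / 2.0 * np.diff(grid))
print(f"integral over [0, 1]: {total_mass:.3f}")
```

Because every kernel lives on [0, 1], the estimate is defined right up to the endpoints. The remaining question is how to choose `b`, which is the part a closed-form rule is meant to replace.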

New Preprint: Conformal Blindness

We typically assume that if a data distribution shifts drastically, our Conformal Test Martingales (CTMs) will explode and warn us. The standard logic is simple: exchangeability implies uniform p-values; therefore, non-uniform p-values imply a break in exchangeability.
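The standard logic can be sketched in a few lines. This is an illustrative toy, not code from the note. It assumes a one-dimensional stream, a nonconformity score equal to the observation itself (larger means stranger), smoothed conformal p-values, and a simple power martingale with betting function ε·p^(ε-1) at a fixed ε = 0.5. The helper names, the seed, and the size of the shift are all assumptions.

```python
import numpy as np

def smoothed_pvalues(scores, rng):
    """Online smoothed conformal p-values; a larger score means a stranger point."""
    scores = np.asarray(scores, dtype=float)
    p = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]                 # everything observed so far
        greater = np.sum(seen > scores[n])     # points strictly stranger than the new one
        equal = np.sum(seen == scores[n])      # ties, including the new point itself
        p[n] = (greater + rng.random() * equal) / (n + 1)
    return p

def log_power_martingale(p, eps=0.5):
    """Log of the running product of eps * p**(eps - 1)."""
    p = np.clip(p, 1e-12, 1.0)                 # guard against log(0)
    return np.cumsum(np.log(eps) + (eps - 1.0) * np.log(p))

rng = np.random.default_rng(1)
stable = rng.normal(0.0, 1.0, 800)
shifted = np.concatenate([rng.normal(0.0, 1.0, 400),   # before the change
                          rng.normal(3.0, 1.0, 400)])  # mean jumps to 3

log_m_stable = log_power_martingale(smoothed_pvalues(stable, rng))
log_m_shifted = log_power_martingale(smoothed_pvalues(shifted, rng))

print(f"final log-martingale, no shift:   {log_m_stable[-1]:.1f}")
print(f"final log-martingale, with shift: {log_m_shifted[-1]:.1f}")
```

While the stream is exchangeable the p-values are uniform and this martingale drifts downward. After the mean shift the p-values pile up near 0 and it grows. That growth is the warning signal the usual argument relies on.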

But what if the p-values stay uniform while the data moves?

In my new note, "Conformal Blindness: A Note on A-Cryptic change-points", I demonstrate that this is possible.

By constructing a specific counter-example using bivariate Gaussian distributions and an oracle conformity measure, I identify a trajectory (an "A-cryptic line") along which the data can shift arbitrarily far without triggering any CTM. In this specific setting, the p-values remain perfectly uniform, and the CTM remains flat.

This finding serves as a proof-of-concept for a fundamental "blind spot" in conformal testing: we only detect shifts that are distinguishable by our specific conformity measure. If the shift aligns with the measure's blind spot, we are flying blind.
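A much simpler toy than the construction in the note shows the same kind of blind spot. It is not the oracle conformity measure or the bivariate Gaussian counter-example from the note. It assumes a hypothetical linear score, s(x) = x1 - x2, which is constant along the direction (1, 1). Shifting the data along that direction leaves every score, and therefore every p-value, unchanged, while a shift along (1, -1) is picked up immediately. All names, the seed, and the shift sizes are illustrative.

```python
import numpy as np

def smoothed_pvalues(scores, u):
    """Online smoothed conformal p-values with explicit tie-breaking uniforms u."""
    scores = np.asarray(scores, dtype=float)
    p = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]
        greater = np.sum(seen > scores[n])
        equal = np.sum(seen == scores[n])      # includes the new point itself
        p[n] = (greater + u[n] * equal) / (n + 1)
    return p

def score(x):
    # Hypothetical toy score (larger = stranger); constant along direction (1, 1).
    return x[:, 0] - x[:, 1]

rng = np.random.default_rng(2)
n, change = 800, 400
base = rng.normal(size=(n, 2))                 # standard bivariate Gaussian stream
u = rng.random(n)                              # shared tie-breakers for a fair comparison

def shift_after_change(x, direction, size):
    """Return a copy of x with points after the change-point moved along direction."""
    out = x.copy()
    out[change:] += size * np.asarray(direction, dtype=float)
    return out

blind = shift_after_change(base, (1.0, 1.0), 50.0)     # huge shift the score cannot see
visible = shift_after_change(base, (1.0, -1.0), 2.0)   # modest shift the score can see

p_base = smoothed_pvalues(score(base), u)
p_blind = smoothed_pvalues(score(blind), u)
p_visible = smoothed_pvalues(score(visible), u)

print("max |p_blind - p_base|:", np.max(np.abs(p_blind - p_base)))
print("mean post-change p-value, blind shift:  ", p_blind[change:].mean())
print("mean post-change p-value, visible shift:", p_visible[change:].mean())
```

Any test martingale built from these p-values sees the same input with or without the shift along (1, 1), so it cannot react, however far the data moves.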