Research

Betting on Moments: Legendre Jumper Martingales for Online Exchangeability Testing

In online exchangeability testing, conformal test martingales—such as the Simple Jumper—are powerful tools for detecting distributional shifts in data streams. However, they are traditionally limited to detecting location shifts (mean). Extending these martingales to detect higher-order deviations (like variance, skewness, or kurtosis) is theoretically straightforward but computationally prohibitive.
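For readers who have not met it, the Simple Jumper can be sketched in a few lines. This is a minimal illustration of the commonly described scheme, assuming three linear betting functions $f_\epsilon(p) = 1 + \epsilon(p - 0.5)$ with $\epsilon \in \{-1, 0, 1\}$ and a jump rate of 0.01. The function name and defaults are illustrative and are not the `online-cp` API.

```python
import numpy as np

def simple_jumper(p_values, jump_rate=0.01):
    """Return the Simple Jumper martingale path for a sequence of p-values."""
    epsilons = np.array([-1.0, 0.0, 1.0])   # slopes of the three linear bets
    capital = np.full(3, 1.0 / 3.0)         # capital held by each slope
    path = []
    for p in p_values:
        # Jump step: a small fraction of the total capital is spread evenly.
        capital = (1.0 - jump_rate) * capital + (jump_rate / 3.0) * capital.sum()
        # Betting step: each slope bets with f_eps(p) = 1 + eps * (p - 0.5).
        capital = capital * (1.0 + epsilons * (p - 0.5))
        path.append(capital.sum())
    return np.array(path)

rng = np.random.default_rng(0)
uniform_p = rng.uniform(size=500)         # what exchangeable data would give
skewed_p = rng.beta(3.0, 1.0, size=500)   # p-values piling up near 1
print(simple_jumper(uniform_p)[-1], simple_jumper(skewed_p)[-1])
```

Because every bet is linear in $p$, this martingale can only profit from p-values whose mean drifts away from 0.5, which is the location-only limitation described above.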

The requirement to adapt to multiple moments simultaneously leads to an exponential expansion of the martingale's state space—a "jumping tax" that makes real-time deployment impossible.

In my latest preprint, "Betting on Moments: Legendre Jumper Martingales for Online Exchangeability Testing", I bypass this combinatorial bottleneck.

By reformulating the betting function using a basis of shifted Legendre polynomials, I provide a closed-form approach to detecting complex distributional deviations. To ensure scalability, I introduce the Variational Legendre Jumper, which uses a mean-field approximation to decouple the joint adaptation. This reduces the computational scaling from exponential to $\mathcal{O}(1)$ time.
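To give a feel for why shifted Legendre polynomials suit this job, here is a hedged sketch of a betting function built from them. It assumes the form $f(p) = 1 + \sum_k \epsilon_k \tilde{P}_k(p)$ with $\tilde{P}_k(p) = P_k(2p - 1)$; the exact parametrization used in the preprint may differ, and `legendre_bet` is a hypothetical name. Since each $\tilde{P}_k$ with $k \ge 1$ integrates to zero on $[0, 1]$ and is bounded by 1 in absolute value, the bet integrates to one and stays nonnegative whenever $\sum_k |\epsilon_k| \le 1$.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_bet(p, eps):
    """Evaluate 1 + sum_k eps[k-1] * P_k(2p - 1) for p in [0, 1]."""
    eps = np.asarray(eps, dtype=float)
    if np.abs(eps).sum() > 1.0:
        raise ValueError("need sum(|eps_k|) <= 1 to keep the bet nonnegative")
    coeffs = np.concatenate(([1.0], eps))   # coefficient 1 on P_0 = 1
    return legendre.legval(2.0 * np.asarray(p, dtype=float) - 1.0, coeffs)

def integral_on_unit_interval(eps, nodes=16):
    """Integrate the bet over [0, 1] with Gauss-Legendre quadrature."""
    x, w = legendre.leggauss(nodes)         # nodes and weights on [-1, 1]
    return 0.5 * np.sum(w * legendre_bet(0.5 * (x + 1.0), eps))

eps = [0.3, -0.4, 0.2]                      # illustrative coefficients only
grid = np.linspace(0.0, 1.0, 1001)
print(integral_on_unit_interval(eps), legendre_bet(grid, eps).min())
```

With a single term, $\tilde{P}_1(p) = 2p - 1$, so the bet reduces to a linear function of $p - 0.5$ like the Simple Jumper's; the higher-degree terms are what let the bet react to spread and asymmetry in the p-values.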

Key Contributions

  • Orthogonal Basis: A Legendre-polynomial framework for capturing higher-order moments.
  • Variational Inference: A mean-field approach that eliminates the "jumping tax" while preserving statistical power.

  • Read the preprint: arXiv:2606.20859

  • Get the software: pip install online-cp | GitHub

New Preprint: A Fast Rule for Beta Kernel Density Estimation

If you have ever tried to estimate a probability density for data bounded in [0, 1] (like percentages, probabilities, or Gini coefficients) using a standard Gaussian KDE, you know the pain: Boundary Bias. The Gaussian kernel "leaks" probability mass below 0 and above 1, ruining the estimate at the edges.
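The leak is easy to quantify. A Gaussian KDE is an equal-weight mixture of normals centred on the data points, so the mass it places outside $[0, 1]$ is an average of normal tail probabilities. The sketch below uses only NumPy and the standard library; the Silverman-style bandwidth $1.06\,\hat\sigma\,n^{-1/5}$ and the Beta(1, 3) sample are illustrative choices.

```python
import math
import numpy as np

def gaussian_kde_mass_outside(data, bandwidth):
    """Mass a Gaussian KDE places below 0 and above 1."""
    normal_cdf = np.vectorize(
        lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    )
    # Each data point contributes a N(x_i, h^2) component to the mixture.
    below = normal_cdf((0.0 - data) / bandwidth).mean()
    above = 1.0 - normal_cdf((1.0 - data) / bandwidth).mean()
    return below, above

rng = np.random.default_rng(0)
data = rng.beta(1.0, 3.0, size=1000)                   # density is positive at 0
h = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)    # rule-of-thumb bandwidth
below, above = gaussian_kde_mass_outside(data, h)
print(f"mass below 0: {below:.3f}, mass above 1: {above:.3g}")
```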

The theoretical solution—the Beta Kernel—has existed since 1999, but it has been held back by a major practical flaw: it lacked a simple "rule-of-thumb" bandwidth selector, forcing users to rely on slow, unstable numerical optimization.
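For orientation, the beta kernel estimator in its usual textbook form (commonly attributed to Chen, 1999) evaluates, at each point $x$, a Beta density with parameters $x/b + 1$ and $(1 - x)/b + 1$ at every observation and averages the results. The sketch below implements that form with a hand-picked bandwidth `b`; it is not the `beta-kde` package and does not implement the Beta Reference Rule, whose formula is given in the preprint rather than here.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def beta_kernel_density(x, data, b):
    """Beta kernel estimate at points x in (0, 1), with bandwidth b."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # shape (m, 1)
    data = np.asarray(data, dtype=float)[None, :]            # shape (1, n)
    a = x / b + 1.0
    c = (1.0 - x) / b + 1.0
    # Log of the Beta(a, c) density evaluated at each observation.
    log_norm = _lgamma(a + c) - _lgamma(a) - _lgamma(c)
    log_pdf = (a - 1.0) * np.log(data) + (c - 1.0) * np.log1p(-data) + log_norm
    return np.exp(log_pdf).mean(axis=1)

rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=500)       # observations strictly inside (0, 1)
grid = (np.arange(400) + 0.5) / 400       # midpoints, avoiding the endpoints
density = beta_kernel_density(grid, data, b=0.02)   # b chosen by hand
print(density.mean())                     # midpoint-rule estimate of total mass
```

Every kernel lives on $[0, 1]$, so no mass can leak past the boundaries. What remains open in this sketch is exactly the choice of `b`.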

In my latest preprint, "A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator", I solve this computational bottleneck.

I derive the Beta Reference Rule, a closed-form analytical formula for the optimal bandwidth. By eliminating the need for iterative optimization, this new rule matches the accuracy of the "gold standard" methods while delivering a computational speedup of over 35,000x.

This turns the Beta Kernel from a theoretical curiosity into a practical, drop-in replacement for the Gaussian KDE for bounded data.

Key Contributions

  • Derivation: A closed-form bandwidth rule based on the asymptotic mean integrated squared error (AMISE).
  • Heuristic Fallback: A principled heuristic for "hard" (U-shaped and J-shaped) distributions where standard asymptotics fail.
  • Software: A fully documented Python package, beta-kde, that is API-compatible with scikit-learn.

You can now fit a boundary-corrected density estimator with a closed-form bandwidth, with no iterative optimization:

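import numpy as np
from beta_kde import BetaKDE

# Draw 1,000 samples from a Beta(2, 5) distribution, bounded in [0, 1]
data = np.random.beta(2, 5, 1000)

# Closed-form bandwidth: fits instantly, no boundary bias
# scikit-learn style estimators expect a 2-D array of shape (n_samples, n_features)
est = BetaKDE(bandwidth='beta-reference').fit(data.reshape(-1, 1))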

New Preprint: Conformal Blindness

We typically assume that if a data distribution shifts drastically, our Conformal Test Martingales (CTMs) will explode and warn us. The standard logic is simple: exchangeability implies uniform p-values; therefore, non-uniform p-values imply a break in exchangeability.
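The standard logic can be made concrete with a small simulation. The sketch below computes smoothed conformal p-values online, assuming the simplest possible nonconformity score (the observation itself, so larger values count as stranger). The function name is illustrative and not taken from any package.

```python
import numpy as np

def conformal_p_values(scores, rng):
    """Smoothed online conformal p-values; a larger score means stranger."""
    scores = np.asarray(scores, dtype=float)
    p = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]                 # everything observed so far
        greater = np.sum(seen > scores[n])
        equal = np.sum(seen == scores[n])      # includes the new point itself
        p[n] = (greater + rng.uniform() * equal) / (n + 1)
    return p

rng = np.random.default_rng(0)
iid = rng.normal(size=500)                     # exchangeable stream
shifted = iid.copy()
shifted[250:] += 3.0                           # mean shift halfway through

p_iid = conformal_p_values(iid, rng)
p_shifted = conformal_p_values(shifted, rng)
print(p_iid.mean(), p_shifted[250:].mean())
```

Under exchangeability the p-values average about 0.5, as uniform draws should. After the mean shift they crowd toward 0, which is the non-uniformity a CTM bets against.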

But what if the p-values stay uniform while the data moves?

In my new note, "Conformal Blindness: A Note on A-Cryptic change-points", I demonstrate that this is possible.

By constructing a specific counter-example using bivariate Gaussian distributions and an oracle conformity measure, I identify a trajectory (an "A-cryptic line") along which the data can shift arbitrarily far without triggering any CTM. In this specific setting, the p-values remain perfectly uniform, and the CTM remains flat.

This finding serves as a proof-of-concept for a fundamental "blind spot" in conformal testing: we only detect shifts that are distinguishable by our specific conformity measure. If the shift aligns with the measure's blind spot, we are flying blind.
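A deliberately crude toy shows the flavour of the blind spot. It is much simpler than the bivariate Gaussian and oracle conformity construction in the note: here the score just ignores one coordinate, and the shift happens entirely in that coordinate. All names and parameters are illustrative assumptions.

```python
import numpy as np

def conformal_p_values(scores, thetas):
    """Smoothed online conformal p-values; a larger score means stranger."""
    scores = np.asarray(scores, dtype=float)
    p = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]
        greater = np.sum(seen > scores[n])
        equal = np.sum(seen == scores[n])
        p[n] = (greater + thetas[n] * equal) / (n + 1)
    return p

rng = np.random.default_rng(0)
n, change = 600, 300
stream = rng.normal(size=(n, 2))
thetas = rng.uniform(size=n)                 # shared tie-breaking randomness

shifted = stream.copy()
shifted[change:, 1] += 10.0                  # drastic shift, second coordinate only

blind_score = lambda z: np.abs(z[:, 0])      # never looks at the second coordinate
seeing_score = lambda z: z[:, 1]             # looks only at the second coordinate

p_blind_before = conformal_p_values(blind_score(stream), thetas)
p_blind_after = conformal_p_values(blind_score(shifted), thetas)
p_seeing_after = conformal_p_values(seeing_score(shifted), thetas)

print(np.array_equal(p_blind_before, p_blind_after))
print(p_seeing_after[change:].mean())
```

The blind score produces identical p-values with and without the shift, so no martingale built on them can react, while the second score sees the change at once.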