Posts


Maximum Likelihood Estimation on Binned Data - 2020-05-25

If in doubt, use KL divergence

Most of the time in machine learning is spent coming up with clever ways to estimate underlying probability distributions from which the observed samples have been drawn by chance, or waiting for said clever ways to rack up a sizable computing bill. But what if we have a lot of data? In such cases, we often use histograms to get a compressed representation.

Similarly, the underlying (parametric) distribution can be discretized for faster computation, often with a negligible effect on accuracy. Such a formulation can also arise when the parametric model itself is defined as a mixture of (binned) empirical distributions (as in this real-world example).
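As a minimal sketch of what this binned setup looks like in practice (the bin edges, sample sizes, and the normal model below are illustrative assumptions, not taken from the post):

```python
import numpy as np
from scipy import stats

# Compress a large sample into a histogram (bin counts).
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=100_000)
edges = np.linspace(-10, 10, 51)            # 50 bins, an arbitrary illustrative choice
counts, _ = np.histogram(samples, bins=edges)

# Discretize a candidate parametric model over the same bins:
# each bin gets the probability mass the model assigns to it.
model = stats.norm(loc=1.0, scale=2.0)
bin_probs = np.diff(model.cdf(edges))
```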

How do we find the maximum likelihood estimate (MLE) of the distribution parameters in this binned world? My intuition suggested that MLE should be equivalent to minimizing the KL divergence between the empirical and the model distributions. Nonetheless, I felt it was worth going through a simple derivation to remove any doubt.
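For reference, the equivalence comes down to one line of algebra (the notation here is mine, not the post's: $n_i$ are the bin counts, $N = \sum_i n_i$, $\hat{p}_i = n_i / N$ the empirical bin frequencies, and $q_i(\theta)$ the model's probability mass in bin $i$):

```latex
\ell(\theta) = \sum_i n_i \log q_i(\theta)
             = N \sum_i \hat{p}_i \log q_i(\theta)
             = -N \left[ \mathrm{KL}\!\left(\hat{p} \,\|\, q_\theta\right) + H(\hat{p}) \right]
```

Since the empirical entropy $H(\hat{p})$ does not depend on $\theta$, maximizing the binned likelihood is the same as minimizing $\mathrm{KL}(\hat{p} \,\|\, q_\theta)$.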

more...

Better function transformers in ML pipelines - 2018-11-21

A transformer factory using metaprogramming

One of the most convenient features in scikit-learn is the ability to build complex models by chaining transformers and estimators into pipelines.

Importantly, all (hyper-)parameters of each transformer remain accessible and tunable. The simplicity suffers somewhat once we need to add custom preprocessing functions into the pipeline. The “standard” approach using sklearn.preprocessing.FunctionTransformer felt decidedly unsatisfactory once I tried to define some parameter search spaces, so I looked into implementing a more usable alternative:
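To make the pain point concrete, here is a rough sketch of that "standard" approach (the clip function and the grid values are my own illustrative stand-ins, not the post's example):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import Ridge

# The "standard" approach: wrap a plain function in FunctionTransformer.
# Arguments to the function have to be passed through the kw_args dict.
def clip(X, lo=0.0, hi=1.0):
    return np.clip(X, lo, hi)

pipe = Pipeline([
    ("clip", FunctionTransformer(clip, kw_args={"lo": 0.0, "hi": 1.0})),
    ("model", Ridge()),
])

# Tuning the estimator is natural: "model__alpha" addresses Ridge's alpha.
# Tuning the function's arguments is not: kw_args is a single parameter,
# so a search space has to enumerate complete dicts instead of individual values.
param_grid = {
    "model__alpha": [0.1, 1.0, 10.0],
    "clip__kw_args": [{"lo": 0.0, "hi": 1.0}, {"lo": 0.0, "hi": 5.0}],
}
```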

Beautiful is better than ugly!

more...