In Part 1 of this series of posts, we trained a Mixture Density Network to capture a heteroscedastic conditional probability distribution.
After predicting the parameters $(w_{ij}, \mu_{ij}, \sigma_{ij})$ of the $m$ mixture components, we would like to generate some - and in some cases many - samples from the learned distribution.
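Concretely, the distribution we are sampling from at each input $x_i$ is a weighted sum of Gaussians:

$$p(y \mid x_i) = \sum_{j=1}^{m} w_{ij} \, \mathcal{N}\!\left(y ; \mu_{ij}, \sigma_{ij}^2\right), \qquad \sum_{j=1}^{m} w_{ij} = 1.$$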
A fully vectorized implementation for the problem at hand required some thought, so I decided to write it up for future reference.
Being able to sample is important for two reasons:
- first, it gives us access to unlimited “synthetic” data,
- and second, any statistic you need can be computed from a big enough sample.

Imagine we want to sample `num_samples` random values from a mixture of `num_components` Gaussians, at each of `num_rows` values of $X_i$.
While `numpy.random` provides functions for efficient sampling from a number of probability distributions, we are trying to sample from `num_rows`
different distributions simultaneously.
Let us create some test data of appropriate shape (the complete code is linked in the references below):
import numpy as np

def create_data(num_rows=100, num_components=5):
    # mixture weights: softmax over random values, so each row sums to one
    w = np.random.rand(num_rows, num_components)
    w = np.exp(w) / np.exp(w).sum(axis=1, keepdims=True)
    # component means and standard deviations
    mu = np.random.rand(num_rows, num_components)
    sigma = np.random.rand(num_rows, num_components)
    return w, mu, sigma
w, mu, sigma = create_data()
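As a quick sanity check (a minimal snippet, assuming the default shapes above), each row of `w` should be a valid set of mixture weights:

```python
# shapes: (num_rows, num_components) for all three parameter arrays
assert w.shape == mu.shape == sigma.shape == (100, 5)
# mixture weights in each row sum to one
assert np.allclose(w.sum(axis=1), 1)
```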
Sampling from a single Gaussian mixture (at row `i`) is a two-step process:

- Choose a random number `choice` between `0` and `num_components - 1`, with probabilities `w[i, :]`
- Sample a random number from the chosen mixture component

This can be implemented as follows:
# Implementation 0 - two nested loops
num_rows, num_components = w.shape
num_samples = 1000

samples = np.empty((num_rows, num_samples))
for i in range(num_rows):
    _w, _mu, _sigma = w[i, :], mu[i, :], sigma[i, :]
    for j in range(num_samples):
        # pick a mixture component, then sample from it
        choice = np.random.choice(num_components, p=_w)
        samples[i, j] = _mu[choice] + _sigma[choice] * np.random.randn()
Looping through numpy arrays in Python is never a good thing, and it is easy to spot
that both `np.random.choice` and `np.random.randn` can generate an array of random values in a single call. Using numpy's fancy indexing, we can get rid of the inner loop over samples:
# Implementation 1 - a single loop over rows
samples = np.empty((num_rows, num_samples))
for i in range(num_rows):
    _w, _mu, _sigma = w[i, :], mu[i, :], sigma[i, :]
    # draw all component choices and normal samples for row i in one call each
    choice = np.random.choice(num_components, size=num_samples, p=_w)
    samples[i] = _mu[choice] + _sigma[choice] * np.random.randn(num_samples)
For reasonable values (`num_rows, num_samples, num_components = 1000, 1000, 5`), the speed gain is massive:

| implementation | sampling time |
|---|---|
| Implementation 0 | 21s |
| Implementation 1 | 0.13s |
| Implementation 2 | 0.12s |
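The exact numbers will vary by machine. A minimal sketch of how such timings can be obtained, assuming the three implementations above are wrapped in (hypothetical) functions `sample_v0`, `sample_v1` and `sample_v2` taking `(w, mu, sigma, num_samples)`:

```python
import timeit

w, mu, sigma = create_data(num_rows=1000, num_components=5)
for name, fn in [("Implementation 0", sample_v0),
                 ("Implementation 1", sample_v1),
                 ("Implementation 2", sample_v2)]:
    # hypothetical wrappers around the three implementations above
    seconds = timeit.timeit(lambda: fn(w, mu, sigma, num_samples=1000), number=1)
    print(f"{name}: {seconds:.2f}s")
```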
However, getting rid of the remaining loop over rows is harder. The only way I could come up with is as follows:
# Implementation 2 - no python loops
## First, sample which mixture component to use for each (row, sample)
choices = np.random.rand(num_rows, 1, num_samples)
# cumulative weights act as per-row thresholds for inverse-CDF sampling
thresholds = w.cumsum(axis=1)[:, :, np.newaxis]
# index of the first component whose cumulative weight exceeds the uniform draw
col_idx = (choices < thresholds).argmax(axis=1)
_, row_idx = np.meshgrid(np.arange(num_samples), np.arange(num_rows))
## Second, compute the samples
mu_chosen = mu[row_idx, col_idx]
sigma_chosen = sigma[row_idx, col_idx]
samples = mu_chosen + sigma_chosen * np.random.randn(num_rows, num_samples)
The approach hinges on numpy array broadcasting.
Note the shapes of the arrays we construct to choose a mixture component for each sample:
| array | shape |
|---|---|
| `choices` | `(num_rows, 1, num_samples)` |
| `thresholds` | `(num_rows, num_components, 1)` |
| `(choices < thresholds)` | `(num_rows, num_components, num_samples)` |
| `col_idx` | `(num_rows, num_samples)` |
This way, we can index the right `mu` and `sigma` for all samples simultaneously.
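To make the broadcasting concrete, here is a minimal toy example (values chosen purely for illustration) with one row, three components and four samples; the cumulative thresholds are `[0.2, 0.5, 1.0]`, so a uniform draw of `0.4` falls in the second component:

```python
w_row = np.array([[0.2, 0.3, 0.5]])              # (1, 3): one row of mixture weights
thresholds = w_row.cumsum(axis=1)[:, :, None]    # (1, 3, 1): [0.2, 0.5, 1.0]
choices = np.array([[[0.1, 0.4, 0.6, 0.99]]])    # (1, 1, 4): four uniform draws
col_idx = (choices < thresholds).argmax(axis=1)  # (1, 4)
print(col_idx)  # [[0 1 2 2]]
```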
All the indexing doesn’t come for free, and in fact the speed difference to the previous (looped)
implementation is not as clear-cut:
When sampling a large number of points for each row, the looped approach is in fact marginally faster!
However, vectorization comes into its own for a large number of rows, where my final
implementation's speed depends only on the total number of points sampled (i.e. `num_rows * num_samples`), as evidenced
by the straight-line contours in the log-log plot above.
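As a final check, and an illustration of the point that any statistic can be computed from a big enough sample, here is a minimal sketch (assuming `w`, `mu` and `samples` from the snippets above) comparing the empirical per-row means against the analytic mixture means $\sum_j w_{ij} \mu_{ij}$:

```python
empirical_mean = samples.mean(axis=1)      # per-row mean of the generated samples
analytic_mean = (w * mu).sum(axis=1)       # per-row mean of the mixture
print(np.abs(empirical_mean - analytic_mean).max())  # shrinks as num_samples grows
```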
Can we make it faster still? I suspect so. Is the speed-up significant in practice? Depends. Did solving this problem make me revisit vectorization and broadcasting? Definitely.
References
- Array broadcasting in `numpy`: https://docs.scipy.org/doc/numpy/user/theory.broadcasting.html
- Implicit array expansion in MATLAB: https://uk.mathworks.com/help/matlab/ref/bsxfun.html
- My code for this post: https://github.com/ig248/deepquantiles/blob/master/notebooks/mdn_sampling.ipynb
- My implementation of the `MixtureDensityRegressor`, including sampling methods: https://github.com/ig248/deepquantiles/blob/master/deepquantiles/regressors/mdn.py
- Box–Muller transform for generating normal random variables: https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
- Marsaglia polar method for generating normal random variables (as used in `numpy.random`): https://en.wikipedia.org/wiki/Marsaglia_polar_method
- Sampling discrete choices: https://en.wikipedia.org/wiki/Pseudo-random_number_sampling#Finite_discrete_distributions