This is the second part of my little series about the Numba library. This time we will take a look at how we can use custom data types inside functions we want Numba to optimize. Out of the box, Numba can handle scalars and n-dimensional Numpy arrays as input. Tuples and lists are also supported, but lists have to be strictly homogeneous (even mixing integers and floats in one list is not supported). As we have seen in the last part, Python's dictionaries are not supported. I personally like dictionaries and their key-indexing and find them really useful on many occasions. Luckily, there is a way we can use something similar inside a Numba-optimized function: numpy.dtype()
To be honest, I had never used this feature of the amazing Numpy library before, but here it comes in quite handy. If you, like me, don't know it already: you can use numpy.dtype() to specify custom structured types with key/value pairs, quite similar to dictionaries, and these can be used inside Numba-optimized functions.
On the same occasion, we will also take a first look at parallelization.
Let's have a look at how we can do this.
import timeit
import numpy as np
from numba import jit, prange
Let's use the hydrological ABC-Model again (see the introductory article for a more detailed explanation), which we tried without success to use with Python dictionaries in the last part. Remember that this model has three different model parameters (a, b, c) that we need to pass for a simulation.
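As a quick refresher, a minimal single-parameter-set implementation (a sketch following the formulation from the introductory article) looks like this:
def abc_model_single(a, b, c, rain):
    """ABC-Model for one parameter set (refresher, see part one)."""
    outflow = np.zeros(rain.size, dtype=np.float64)
    state_in = 0
    for i in range(rain.size):
        # update storage and compute outflow for time step i
        state_out = (1 - c) * state_in + a * rain[i]
        outflow[i] = (1 - a - b) * rain[i] + c * state_in
        state_in = state_out
    return outflow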
First we have to create a custom type with Numpy (here is the official documentation for this feature). For this we can pass a list of tuples to the np.dtype() function, where each tuple specifies the name of a field and the data type of the elements in that field. Here we will create three fields, one for each parameter.
abc_dtype = np.dtype([('a', np.float64),
                      ('b', np.float64),
                      ('c', np.float64)])
abc_dtype
dtype([('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
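As a side note, the resulting dtype object also exposes its structure through standard Numpy attributes, which can be handy for sanity checks:
abc_dtype.names      # ('a', 'b', 'c')
abc_dtype.itemsize   # 24, i.e. three float64 values per record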
This can now be used like any of the built-in dtypes in any of the Numpy functions, for example to create an array filled with zeros.
arr = np.zeros(3, dtype=abc_dtype)
arr
array([( 0., 0., 0.), ( 0., 0., 0.), ( 0., 0., 0.)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
We have now created an array of length three filled with zeros, where each entry consists of three values. As with dictionaries, we can access any of the fields by name, and we can index into the array as with any Numpy array. E.g. the first value of the parameter a can be accessed (and here set to 1) by:
arr['a'][0] = 1
arr
array([( 1., 0., 0.), ( 0., 0., 0.), ( 0., 0., 0.)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
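Note that the lookup works in both directions; a few illustrative lines (outputs shown as comments):
arr['a']       # all values of field 'a' as a plain float64 array: array([1., 0., 0.])
arr[0]         # the first record with one value per field: (1., 0., 0.)
arr['b'][0]    # equivalent to arr[0]['b']: 0.0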
Okay, now let's adapt the ABC-Model a little bit, so that we can pass multiple sets of model parameters as input and receive one time series of simulated streamflow for each parameter set. This also sounds like a job we could parallelize, since each simulation can be run independently of the others. The documentation states that the parallelization feature is still experimental, so this might change in the future, but as of version 0.35 there are two different ways to tell Numba to parallelize parts of our code: implicitly and explicitly.
- implicit means that we just pass another flag to the @jit decorator, namely parallel=True.
- for-loops can be marked explicitly to be parallelized by using another function of the Numba library: the prange function. This can be used like Python's range, but tells Numba that this loop can be parallelized. If you want to use prange, you have to make sure that there are no cross-iteration dependencies (see the sketch after this list).
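To illustrate explicit parallelization outside of our model, here is a minimal sketch of a prange loop (the function name and inputs are made up for illustration):
@jit(nopython=True, parallel=True)
def scale_rows(matrix, factors):
    # each iteration of the prange loop writes only to row i of the
    # output, so there are no cross-iteration dependencies
    out = np.empty_like(matrix)
    for i in prange(matrix.shape[0]):
        out[i, :] = matrix[i, :] * factors[i]
    return out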
Okay, now let's implement four different versions of the ABC-Model, which all take multiple sets of parameters as input using our custom dtype. As a reference, we implement one version that won't be optimized by Numba, to compare the speed against a pure Python implementation. A second version will use the @jit decorator but no parallelization. The third uses implicit parallelization and the fourth explicit parallelization. Note that the outer loop over parameter sets is safe to parallelize with prange, since each iteration writes only to its own column of the output array.
# Reference implementation without Numba optimization
def abc_model_py(params, rain):
    # initialize model variables
    outflow = np.zeros((rain.size, params.size), dtype=np.float64)
    # loop over parameter sets
    for i in range(params.size):
        # unpack model parameters
        a = params['a'][i]
        b = params['b'][i]
        c = params['c'][i]
        # Reset model states
        state_in = 0
        state_out = 0
        # Actual simulation loop
        for j in range(rain.size):
            state_out = (1 - c) * state_in + a * rain[j]
            outflow[j, i] = (1 - a - b) * rain[j] + c * state_in
            state_in = state_out
    return outflow
# Jit'ed but not parallelized implementation
@jit(nopython=True)
def abc_model_jit(params, rain):
    # initialize model variables
    outflow = np.zeros((rain.size, params.size), dtype=np.float64)
    # loop over parameter sets
    for i in range(params.size):
        # unpack model parameters
        a = params['a'][i]
        b = params['b'][i]
        c = params['c'][i]
        # Reset model states
        state_in = 0
        state_out = 0
        # Actual simulation loop
        for j in range(rain.size):
            state_out = (1 - c) * state_in + a * rain[j]
            outflow[j, i] = (1 - a - b) * rain[j] + c * state_in
            state_in = state_out
    return outflow
# Implementation with implicit parallelization
@jit(nopython=True, parallel=True)
def abc_model_impl(params, rain):
    # initialize model variables
    outflow = np.zeros((rain.size, params.size), dtype=np.float64)
    # loop over parameter sets
    for i in range(params.size):
        # unpack model parameters
        a = params['a'][i]
        b = params['b'][i]
        c = params['c'][i]
        # Reset model states
        state_in = 0
        state_out = 0
        # Actual simulation loop
        for j in range(rain.size):
            state_out = (1 - c) * state_in + a * rain[j]
            outflow[j, i] = (1 - a - b) * rain[j] + c * state_in
            state_in = state_out
    return outflow
# Implementation with explicit parallelization (see prange in 1st loop)
@jit(nopython=True, parallel=True)
def abc_model_expl(params, rain):
    # initialize model variables
    outflow = np.zeros((rain.size, params.size), dtype=np.float64)
    # loop over parameter sets
    for i in prange(params.size):
        # unpack model parameters
        a = params['a'][i]
        b = params['b'][i]
        c = params['c'][i]
        # Reset model states
        state_in = 0
        state_out = 0
        # Actual simulation loop
        for j in range(rain.size):
            state_out = (1 - c) * state_in + a * rain[j]
            outflow[j, i] = (1 - a - b) * rain[j] + c * state_in
            state_in = state_out
    return outflow
Okay, now we're going to generate random model parameters and a random array of precipitation. Also note how we can use our Numpy data type here in combination with the astype() function.
params = np.random.random(8).astype(abc_dtype)
rain = np.random.random(10**6)
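As a side note, if you prefer to fill each parameter field explicitly (for example to draw a, b and c independently of each other), you could also build such an array field by field. A minimal sketch (params_alt is just a made-up name for illustration):
# hypothetical alternative: allocate first, then fill each field separately
params_alt = np.zeros(8, dtype=abc_dtype)
for name in abc_dtype.names:
    params_alt[name] = np.random.random(8)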
Now we're going to use IPython's %timeit magic to compare the runtimes of each of the four implementations.
time_py = %timeit -o abc_model_py(params, rain)
7.63 s ± 59.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time_jit = %timeit -o abc_model_jit(params, rain)
66.9 ms ± 2.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time_impl = %timeit -o abc_model_impl(params, rain)
54.8 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
time_expl = %timeit -o abc_model_expl(params, rain)
22.3 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time_py.best/time_expl.best
402.9802136214722
Between the pure Python version and the explicitly parallelized version there is roughly a 400x difference in runtime! And through explicit parallelization we could improve the runtime by a factor of roughly 3 compared to the jit'ed serial version, while implicit parallelization only gave us a minor speedup for this function. The numbers above come from running this code on an Intel Xeon(R) CPU E5-1620 v3 @ 3.50GHz × 8.
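For completeness, the individual speedup factors can be read off the same way from the %timeit result objects (values are approximate and machine-dependent):
time_jit.best / time_expl.best   # ~3, explicit parallelization vs. serial jit
time_jit.best / time_impl.best   # ~1.2, implicit parallelization vs. serial jit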
Since weird stuff can happen with parallel computation, let's also make sure all function outputs are identical.
outflow_py = abc_model_py(params, rain)
outflow_jit = abc_model_jit(params, rain)
outflow_impl = abc_model_impl(params, rain)
outflow_expl = abc_model_expl(params, rain)
if (np.array_equal(outflow_py, outflow_jit) and
        np.array_equal(outflow_py, outflow_impl) and
        np.array_equal(outflow_py, outflow_expl)):
    print("All output matrices are identical.")
All output matrices are identical.
Okay, so that's it for now. I hope this post helps you understand how custom Numpy data types can be used in combination with the Numba library, and how you can make Numba parallelize your code to gain additional speedups.
You can find this entire article as a Jupyter Notebook here.