Parallelism

Skorch supports distributing work among a cluster of workers via dask.distributed. In this section we'll describe how to use Dask to efficiently distribute a grid search or a randomized search on hyperparameters across multiple GPUs and potentially multiple hosts.

Let's assume that you have two GPUs on which you want to run a hyperparameter search.

The key here is the CUDA environment variable CUDA_VISIBLE_DEVICES, which limits the devices that are visible to a CUDA application. We'll set up the Dask workers so that, thanks to this environment variable, each worker sees exactly one GPU. On the PyTorch side, we have to make sure to set device='cuda' when initializing the NeuralNet class.
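
To make this concrete, here is a minimal sketch of such a net; MyModule is a hypothetical model used only for illustration:

import torch.nn as nn
from skorch import NeuralNetClassifier

class MyModule(nn.Module):
    # hypothetical model, stands in for your actual architecture
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(20, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, X):
        return self.softmax(self.dense(X))

net = NeuralNetClassifier(
    MyModule,
    device='cuda',  # resolves to the one GPU each worker is allowed to see
)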

Let’s run through the steps. First, install Dask and dask.distributed:

python -m pip install dask distributed

Next, assuming you have two GPUs on your machine, let's start up a Dask scheduler and two Dask workers, each in its own terminal. Make sure the Dask workers are started in the right environment, that is, with access to all the packages required to do the work:

dask-scheduler
CUDA_VISIBLE_DEVICES=0 dask-worker 127.0.0.1:8786 --nthreads 1
CUDA_VISIBLE_DEVICES=1 dask-worker 127.0.0.1:8786 --nthreads 1
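
To verify that the environment variable has the intended effect, you can check how many CUDA devices PyTorch reports when it is set; restricted to a single device index, the count should be 1:

CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.device_count())"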

In your code, use joblib's parallel_backend() context manager to activate the Dask backend when you run grid searches and the like. Also instantiate a dask.distributed.Client pointing to the Dask scheduler that you want to use; creating the client registers it as the default, so the Dask backend will find it. Let's see what this could look like:

from dask.distributed import Client
from joblib import parallel_backend
from sklearn.model_selection import GridSearchCV

# connect to the scheduler; creating the client registers it as the default
client = Client('127.0.0.1:8786')

X, y = load_my_data()  # replace with your own data loading
net = get_that_net()   # replace with your own NeuralNet instance

gs = GridSearchCV(
    net,
    param_grid={'lr': [0.01, 0.03]},
    scoring='accuracy',
)

with parallel_backend('dask'):
    gs.fit(X, y)
print(gs.cv_results_)
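
The same pattern works for a randomized search. Here is a sketch reusing the hypothetical net, X, and y from above:

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

# sample learning rates log-uniformly instead of trying a fixed grid
rs = RandomizedSearchCV(
    net,
    param_distributions={'lr': loguniform(1e-4, 1e-1)},
    n_iter=10,
    scoring='accuracy',
)

with parallel_backend('dask'):
    rs.fit(X, y)
print(rs.best_params_)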

You can also use Palladium to do the job. An example is included in the source in the examples/rnn_classifier folder. Change into that directory and, after having set up your Dask workers as described above, run the following command:

PALLADIUM_CONFIG=palladium-config.py,dask-config.py pld-grid-search