3.4. GPU

GPUs and other heterogeneous accelerators are widely used to accelerate deep learning. The Dask community, in collaboration with NVIDIA, provides a GPU-based toolkit for data science that speeds up a variety of tasks.

Dask GPU Cluster

Dask-CUDA is an extension to dask.distributed that detects and manages GPU devices. Install it with pip install dask-cuda. Like dask.distributed, discussed in Section 3.3, Dask-CUDA offers a LocalCUDACluster for a single machine. The LocalCUDACluster automatically detects and registers the GPUs on the computing node and assigns a number of CPU cores to each GPU. For instance, on a machine equipped with 4 GPUs, starting a single-machine Dask cluster launches 4 Dask workers, each allocated one GPU.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
client = Client(cluster)
/fs/fast/u20200002/envs/dispy/lib/python3.11/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37111 instead
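To make the one-worker-per-GPU layout concrete, the sketch below computes a CUDA_VISIBLE_DEVICES value for each worker by rotating the device list, so that every worker sees a different GPU first. The helper name rotated_visible_devices is hypothetical and written for illustration; it mirrors the rotation scheme Dask-CUDA is understood to apply, but it is not the library's own code, and the 4-GPU machine is an assumption.

```python
def rotated_visible_devices(worker_index, device_ids):
    """Return a CUDA_VISIBLE_DEVICES string whose first entry is this worker's GPU.

    Hypothetical helper for illustration only; not part of Dask-CUDA.
    """
    ids = list(device_ids)
    if not ids:
        raise ValueError("device_ids must not be empty")
    start = worker_index % len(ids)
    # Rotate the list so that this worker's GPU comes first.
    rotated = ids[start:] + ids[:start]
    return ",".join(str(d) for d in rotated)


# Assumed environment: one machine with 4 GPUs numbered 0..3.
assignments = {i: rotated_visible_devices(i, range(4)) for i in range(4)}
for worker, visible in assignments.items():
    print(f"worker {worker}: CUDA_VISIBLE_DEVICES={visible}")
```

CUDA libraries use the first visible device by default, so with this layout each worker computes on its own GPU while the remaining devices stay reachable.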



Connection method: Cluster object    Cluster type: dask_cuda.LocalCUDACluster

To build a Dask GPU cluster that spans several nodes, first launch the Dask scheduler.

dask scheduler

Then launch a Dask-CUDA worker on each GPU node, pointing it at the scheduler address. The scheduler and these workers together form a Dask GPU cluster.

dask cuda worker tcp://scheduler:8786

Finally, connect a client to the scheduler address:

client = Client("")



Connection method: Direct

Scheduler Info
Workers: 8    Total threads: 8    Total memory: 180.00 GiB

Each of the 8 workers reports Total threads: 1 and Memory: 22.50 GiB.

Dask-CUDA only discovers and registers these GPUs; it cannot isolate them. Other non-Dask processes can still occupy the same GPUs. To isolate GPU resources, use container technologies, for example containers orchestrated by Kubernetes.

GPU Task

Not all tasks can be accelerated by GPUs. GPUs are mainly used to expedite computationally intensive tasks, such as machine learning and deep learning. At present, the frameworks supported by Dask on GPUs include:

  • Scaling CuPy to a GPU cluster.

  • Scaling cuDF DataFrames to a GPU cluster with Dask-cuDF.


When using NVIDIA GPUs, add the CUDA installation's directories to the PATH and LD_LIBRARY_PATH environment variables, because CuPy and cuDF depend on NVIDIA's GPU libraries.
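As a minimal sketch of that setup, the helper below builds an environment with the CUDA directories appended. The function name env_with_cuda, the install location /usr/local/cuda, and the bin and lib64 subdirectory names are assumptions for illustration; check where CUDA lives on your system.

```python
import os


def env_with_cuda(base_env, cuda_home):
    """Return a copy of base_env with CUDA's bin and lib64 directories appended.

    Hypothetical helper; the bin/lib64 layout is an assumption about the install.
    """
    env = dict(base_env)
    additions = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    for name, directory in additions.items():
        # Split the existing value, dropping empty entries.
        entries = [e for e in env.get(name, "").split(os.pathsep) if e]
        if directory not in entries:
            entries.append(directory)
        env[name] = os.pathsep.join(entries)
    return env


# /usr/local/cuda is an assumed install location; adjust it for your system.
env = env_with_cuda(os.environ, "/usr/local/cuda")
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The dynamic loader generally reads LD_LIBRARY_PATH when a process starts, so an environment like this is most useful when passed to the process that launches the workers (for example through the env argument of subprocess.run), or when the same variables are exported in the shell beforehand.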

Example: SVD

The following code performs Singular Value Decomposition (SVD) on a GPU, which is a task well-suited for GPU acceleration. By setting dask.config.set({"array.backend": "cupy"}), the execution backend for Dask Array can be changed to CuPy on the GPU.

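import cupy
import dask
import dask.array as da

# Back Dask Array chunks with CuPy (GPU) arrays instead of NumPy arrays.
dask.config.set({"array.backend": "cupy"})

# Generate the random chunks with CuPy's random number generator.
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random((10000, 1000), chunks=(1000, 1000))

# svd is lazy: call dask.compute(u, s, v) to run it on the workers.
u, s, v = da.linalg.svd(x)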
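Because the decomposition is lazy, it can help to check the workflow on a CPU before moving to GPUs. The sketch below is such a check under stated assumptions: it pins the backend to NumPy inside a context manager, uses a smaller matrix than the example above, and confirms that the three factors rebuild the input. The array size and seed are arbitrary choices for illustration.

```python
import numpy as np
import dask
import dask.array as da

# CPU-only check: force NumPy chunks for this block, so no GPU is needed.
with dask.config.set({"array.backend": "numpy"}):
    rs = da.random.RandomState(0)
    # Tall-and-skinny matrix with a single column of chunks.
    x = rs.random((2000, 100), chunks=(500, 100))

u, s, v = da.linalg.svd(x)  # lazy: nothing has been computed yet

# Evaluate everything in one call so the shared graph runs only once.
x_np, u_np, s_np, v_np = dask.compute(x, u, s, v)

# u * s scales the columns of u, which equals u @ diag(s).
reconstruction_error = np.abs((u_np * s_np) @ v_np - x_np).max()
print(u_np.shape, s_np.shape, v_np.shape)
print(f"max reconstruction error: {reconstruction_error:.2e}")
```

The same pattern should carry over to the CuPy backend, where the computed results would be CuPy arrays living in GPU memory rather than NumPy arrays.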