3.4. GPU#

GPUs and other heterogeneous accelerators are widely used to accelerate deep learning. The Dask community, in collaboration with NVIDIA, provides a GPU-accelerated data science toolkit (part of the RAPIDS ecosystem) to speed up a variety of tasks.

Dask GPU Cluster#

Dask-CUDA is an extension to dask.distributed that detects and manages GPU devices. Install it via pip install dask-cuda. Like the dask.distributed deployments discussed in Section 3.3, Dask-CUDA offers a LocalCUDACluster for a single machine. LocalCUDACluster automatically detects the GPUs on the node, registers one worker per GPU, and assigns a share of the CPU cores to each. For instance, on a machine equipped with 4 GPUs, starting a single-machine Dask cluster launches 4 Dask workers, each pinned to one GPU.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Start one worker per detected GPU on this machine
cluster = LocalCUDACluster()
client = Client(cluster)
client

Client

Client-5c3311bf-0ce5-11ef-bd8c-000012e4fe80

Connection method: Cluster object Cluster type: dask_cuda.LocalCUDACluster
Dashboard: http://127.0.0.1:37111/status
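
By default, LocalCUDACluster uses every GPU it detects. To restrict the cluster to a subset of devices, pass the CUDA_VISIBLE_DEVICES argument. A minimal sketch, assuming the machine has at least two GPUs:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Use only GPUs 0 and 1; one worker is started per listed device
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")
client = Client(cluster)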

For a multi-machine setup, we can launch a Dask GPU cluster by first starting the Dask scheduler on one node:

dask scheduler

Subsequently, launch a Dask-CUDA worker on each GPU node. Together, the scheduler and these workers form a Dask GPU cluster:

dask cuda worker tcp://scheduler:8786

Finally, connect a Client to the scheduler from any node that can reach it:

client = Client("10.0.0.3:8786")
client

Client

Client-6039933f-0ce3-11ef-b163-000012e4fe80

Connection method: Direct
Dashboard: http://10.0.0.3:8787/status

Scheduler

Scheduler-d073585d-dcac-41bf-9c5c-1055fe07576c

Comm: tcp://10.0.0.3:8786 Workers: 8
Dashboard: http://10.0.0.3:8787/status Total threads: 8
Started: Just now Total memory: 180.00 GiB

The client repr also lists the 8 Dask-CUDA workers, spread across the nodes 10.0.0.2 and 10.0.0.3, each with 1 thread and 22.50 GiB of memory.
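
As a quick sanity check that each worker sees exactly one GPU, we can run a small function on every worker with client.run. This is a minimal sketch and assumes CuPy is installed on all workers:

import cupy

def gpu_name():
    # Dask-CUDA pins each worker to a single GPU via CUDA_VISIBLE_DEVICES,
    # so device 0 inside a worker is that worker's own GPU
    return cupy.cuda.runtime.getDeviceProperties(0)["name"].decode()

# Returns a dict mapping each worker address to the name of its GPU
client.run(gpu_name)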

Note

Dask-CUDA only discovers and registers GPUs; it cannot isolate them, so other non-Dask processes can still preempt these GPUs. GPU resource isolation should instead be achieved with container technologies such as Kubernetes.

GPU Task#

Not all tasks can be accelerated by GPUs. GPUs mainly expedite computationally intensive workloads, such as machine learning and deep learning. At present, Dask's GPU support includes:

  • Scaling CuPy arrays to a GPU cluster.

  • Scaling cuDF DataFrames to a GPU cluster via Dask-cuDF, as sketched below.
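
For instance, a minimal Dask-cuDF sketch; the file path and the column names user and amount are placeholders for illustration:

import dask_cudf

# Read CSV files into a GPU-backed Dask DataFrame (hypothetical path)
ddf = dask_cudf.read_csv("data/transactions-*.csv")

# The groupby aggregation executes on the GPUs of the cluster
result = ddf.groupby("user")["amount"].mean().compute()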

Note

When using NVIDIA’s GPUs, the CUDA directories must be appended to the PATH and LD_LIBRARY_PATH environment variables, as CuPy and cuDF depend on NVIDIA’s GPU libraries.
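
For example, assuming CUDA is installed under /usr/local/cuda (adjust the path for your installation):

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH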

Example: SVD#

The following code performs Singular Value Decomposition (SVD) on the GPU, a task well suited to GPU acceleration. Setting dask.config.set({"array.backend": "cupy"}) switches the execution backend of Dask Array from NumPy to CuPy, so array chunks are created and processed in GPU memory.

import cupy
import dask
import dask.array as da

# Switch the Dask Array backend from NumPy to CuPy (GPU)
dask.config.set({"array.backend": "cupy"})

# A random state backed by CuPy, so chunks are generated in GPU memory
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random((10000, 1000), chunks=(1000, 1000))

# Lazily build the SVD task graph; nothing executes until compute() is called
u, s, v = da.linalg.svd(x)
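
The code above only builds the task graph; calling compute() triggers execution on the GPU. With the CuPy backend set as above, the result comes back as a cupy.ndarray rather than a NumPy array:

# Execute the graph; the singular values return as a CuPy array
singular_values = s.compute()
print(type(singular_values))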