3.4. GPU
GPUs and other heterogeneous accelerators are widely used to accelerate deep learning. The Dask community, in collaboration with NVIDIA, provides a GPU toolkit for data science that speeds up a wide range of tasks.
Dask GPU Cluster
Dask-CUDA is an extension to dask.distributed that detects and manages GPU devices. Users can install Dask-CUDA via pip install dask-cuda. Like the dask.distributed deployments discussed in Section 3.3, Dask-CUDA offers a LocalCUDACluster for a single machine. LocalCUDACluster automatically detects and registers the GPUs on the computing node, assigning a certain number of CPU cores to each GPU. For instance, on a machine equipped with 4 GPUs, initiating a single-machine Dask cluster launches 4 Dask workers, each allocated one GPU.
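A minimal sketch of starting such a single-machine GPU cluster with Dask-CUDA's LocalCUDACluster:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# start one Dask worker per GPU detected on this machine
cluster = LocalCUDACluster()
client = Client(cluster)
client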
In this 4-GPU environment, the resulting Client report (abridged) shows a LocalCUDACluster with 4 workers, each with 1 thread and 22.50 GiB of memory, plus a per-worker nanny process and dashboard. A warning notes that port 8787 was already in use, so the dashboard was hosted on port 37111 instead.
We can also build a multi-machine Dask GPU cluster. First, launch the Dask scheduler on one node:

dask scheduler

Then launch a Dask GPU worker on each GPU node, pointing it at the scheduler's address. Together they form a Dask GPU cluster:

dask cuda worker tcp://scheduler:8786
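To use the cluster from Python, connect a Client to the scheduler. A minimal sketch, assuming the scheduler address tcp://10.0.0.3:8786 shown in the report below:

from dask.distributed import Client

# connect to the already-running Dask GPU cluster
client = Client("10.0.0.3:8786")
client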
The Client report (abridged) shows the scheduler at tcp://10.0.0.3:8786 with its dashboard at http://10.0.0.3:8787/status, and 8 workers spread across the nodes 10.0.0.2 and 10.0.0.3, each with 1 thread and 22.50 GiB of memory (180.00 GiB in total).
Note
Dask-CUDA only discovers and registers GPUs; it cannot isolate them. Other, non-Dask processes may still preempt these GPUs. Isolation of GPU resources should be achieved with container technologies such as Kubernetes.
GPU Task
Not all tasks can be accelerated by GPUs. GPUs mainly expedite computationally intensive workloads, such as machine learning and deep learning. At present, the GPU frameworks supported by Dask include CuPy, which serves as a GPU backend for Dask Array, and cuDF (via Dask-cuDF), which serves as a GPU backend for Dask DataFrame.
Note
When using NVIDIA GPUs, the CUDA installation directories must be appended to the PATH and LD_LIBRARY_PATH environment variables, as CuPy and cuDF depend on NVIDIA's GPU libraries.
Example: SVD
The following code performs Singular Value Decomposition (SVD) on a GPU, a task well suited to GPU acceleration. Setting dask.config.set({"array.backend": "cupy"}) switches the execution backend for Dask Array to CuPy, so the array chunks are allocated and computed on the GPU.
import cupy
import dask
import dask.array as da

# use CuPy (GPU) arrays as the backend for Dask Array
dask.config.set({"array.backend": "cupy"})

# generate random chunks on the GPU via CuPy's RandomState
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random((10000, 1000), chunks=(1000, 1000))

# lazily build the SVD task graph; nothing is computed yet
u, s, v = da.linalg.svd(x)
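As with other Dask collections, svd only builds the task graph. A minimal usage sketch for materializing the factors, reusing the variables from the example above:

# trigger execution; the results come back as CuPy arrays
u, s, v = dask.compute(u, s, v)
print(u.shape, s.shape, v.shape)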