(sec-dask-gpu)=
# GPU

GPUs and other heterogeneous accelerators are widely used to speed up deep learning. The Dask community, in collaboration with NVIDIA, provides a GPU-based data science toolkit that accelerates a variety of tasks.

## Dask GPU Cluster

[Dask-CUDA](https://docs.rapids.ai/api/dask-cuda/stable/) is an extension to `dask.distributed` that detects and manages GPU devices. Install it with `pip install dask-cuda`. Like `dask.distributed`, discussed in {numref}`sec-dask-distributed`, Dask-CUDA offers a `LocalCUDACluster` for a single machine. The `LocalCUDACluster` automatically detects and registers the GPUs on the computing node and assigns a number of CPU cores to each GPU. For instance, in an environment equipped with 4 GPUs, starting a single-machine Dask cluster launches 4 Dask workers, each allocated one GPU.

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
client = Client(cluster)
client
```

Dask prints the warning `Perhaps you already have a cluster running? Hosting the HTTP server on port 37111 instead` and then displays the client summary:

| Field | Value |
| --- | --- |
| Connection method | Cluster object |
| Cluster type | `dask_cuda.LocalCUDACluster` |
| Dashboard | `http://127.0.0.1:37111/status` |
| Status | running |
| Using processes | True |
| Scheduler comm | `tcp://127.0.0.1:46657` |
| Started | Just now |
| Workers | 4 |
| Total threads | 4 |
| Total memory | 90.00 GiB |

| Worker comm | Dashboard | Nanny | Total threads | Memory | Local directory |
| --- | --- | --- | --- | --- | --- |
| `tcp://127.0.0.1:36681` | `http://127.0.0.1:38373/status` | `tcp://127.0.0.1:41031` | 1 | 22.50 GiB | `/tmp/dask-scratch-space/worker-jkx850hc` |
| `tcp://127.0.0.1:37987` | `http://127.0.0.1:38845/status` | `tcp://127.0.0.1:36415` | 1 | 22.50 GiB | `/tmp/dask-scratch-space/worker-gelyun5u` |
| `tcp://127.0.0.1:36139` | `http://127.0.0.1:44939/status` | `tcp://127.0.0.1:40211` | 1 | 22.50 GiB | `/tmp/dask-scratch-space/worker-c6owcg7k` |
| `tcp://127.0.0.1:46363` | `http://127.0.0.1:40611/status` | `tcp://127.0.0.1:38093` | 1 | 22.50 GiB | `/tmp/dask-scratch-space/worker-hyl9pn8_` |
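
The one-GPU-per-worker assignment can be pictured with a small sketch. As far as we understand, Dask-CUDA gives every worker a rotated `CUDA_VISIBLE_DEVICES` list, so that each worker sees a different GPU as its first (default) device. The helper `rotated_visible_devices` below is a hypothetical stand-in written for illustration; it is not Dask-CUDA's own code and runs without a GPU.

```python
def rotated_visible_devices(worker_index, devices):
    """Rotate the device list so that `worker_index` comes first.

    Hypothetical helper that mimics the CUDA_VISIBLE_DEVICES value
    each worker is assumed to receive.
    """
    devices = list(devices)
    rotated = devices[worker_index:] + devices[:worker_index]
    return ",".join(str(d) for d in rotated)


# Four GPUs, four workers: every worker gets a different first device.
for i in range(4):
    print(f"worker {i}: CUDA_VISIBLE_DEVICES={rotated_visible_devices(i, range(4))}")
# worker 0: CUDA_VISIBLE_DEVICES=0,1,2,3
# worker 1: CUDA_VISIBLE_DEVICES=1,2,3,0
# worker 2: CUDA_VISIBLE_DEVICES=2,3,0,1
# worker 3: CUDA_VISIBLE_DEVICES=3,0,1,2
```

Because CUDA libraries pick the first visible device by default, a rotation like this is enough to steer each worker to its own GPU while leaving the other GPUs reachable.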


To build a Dask GPU cluster that spans several nodes, first launch the Dask scheduler:

```
dask scheduler
```

Then launch a Dask GPU worker on each GPU node and point it at the scheduler address. Together, the scheduler and these workers form a Dask GPU cluster.

```
dask cuda worker tcp://scheduler:8786
```

Finally, connect a client to the scheduler:

```python
from dask.distributed import Client

client = Client("10.0.0.3:8786")
client
```

The client summary shows 8 workers spread over two nodes:

| Field | Value |
| --- | --- |
| Connection method | Direct |
| Dashboard | `http://10.0.0.3:8787/status` |
| Scheduler comm | `tcp://10.0.0.3:8786` |
| Started | Just now |
| Workers | 8 |
| Total threads | 8 |
| Total memory | 180.00 GiB |

| Worker comm | Dashboard | Nanny | Local directory |
| --- | --- | --- | --- |
| `tcp://10.0.0.2:34491` | `http://10.0.0.2:38385/status` | `tcp://10.0.0.2:37559` | `/tmp/dask-scratch-space/worker-p2de783n` |
| `tcp://10.0.0.2:39239` | `http://10.0.0.2:45797/status` | `tcp://10.0.0.2:36259` | `/tmp/dask-scratch-space/worker-mo04yp4a` |
| `tcp://10.0.0.2:40863` | `http://10.0.0.2:43677/status` | `tcp://10.0.0.2:32877` | `/tmp/dask-scratch-space/worker-4p9jsv4f` |
| `tcp://10.0.0.2:46243` | `http://10.0.0.2:40513/status` | `tcp://10.0.0.2:45107` | `/tmp/dask-scratch-space/worker-gt5epnxr` |
| `tcp://10.0.0.3:39647` | `http://10.0.0.3:38377/status` | `tcp://10.0.0.3:34843` | `/tmp/dask-scratch-space/worker-gqcyic7m` |
| `tcp://10.0.0.3:40155` | `http://10.0.0.3:34723/status` | `tcp://10.0.0.3:46339` | `/tmp/dask-scratch-space/worker-yo78gnof` |
| `tcp://10.0.0.3:45005` | `http://10.0.0.3:42503/status` | `tcp://10.0.0.3:34929` | `/tmp/dask-scratch-space/worker-skts4xjq` |
| `tcp://10.0.0.3:46333` | `http://10.0.0.3:36413/status` | `tcp://10.0.0.3:44405` | `/tmp/dask-scratch-space/worker-pu9uzxbg` |

| Worker comm | Total threads | Memory | CPU usage | Memory usage | Spilled bytes | Read bytes | Write bytes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `tcp://10.0.0.2:34491` | 1 | 22.50 GiB | 4.0% | 216.19 MiB | 0 B | 8.81 kiB | 14.61 kiB |
| `tcp://10.0.0.2:39239` | 1 | 22.50 GiB | 6.0% | 216.30 MiB | 0 B | 9.76 kiB | 14.86 kiB |
| `tcp://10.0.0.2:40863` | 1 | 22.50 GiB | 4.0% | 216.27 MiB | 0 B | 9.77 kiB | 14.88 kiB |
| `tcp://10.0.0.2:46243` | 1 | 22.50 GiB | 4.0% | 216.21 MiB | 0 B | 10.04 kiB | 15.00 kiB |
| `tcp://10.0.0.3:39647` | 1 | 22.50 GiB | 4.0% | 217.51 MiB | 0 B | 63.74 kiB | 58.80 kiB |
| `tcp://10.0.0.3:40155` | 1 | 22.50 GiB | 6.0% | 218.25 MiB | 0 B | 63.73 kiB | 58.80 kiB |
| `tcp://10.0.0.3:45005` | 1 | 22.50 GiB | 6.0% | 216.24 MiB | 0 B | 63.74 kiB | 58.81 kiB |
| `tcp://10.0.0.3:46333` | 1 | 22.50 GiB | 4.0% | 218.16 MiB | 0 B | 64.86 kiB | 59.93 kiB |

All workers were last seen "Just now", and the task counters (executing, in memory, ready, in flight) were empty.
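
To check how workers are spread across nodes, one option is to count worker addresses per host. The sketch below uses a hand-written dictionary that mimics the `workers` mapping returned by `Client.scheduler_info()`, filled with the worker addresses shown above. The helper name `workers_per_host` and the reduced dictionary layout are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def workers_per_host(scheduler_info):
    """Count workers on each host from a scheduler_info()-style dictionary."""
    return Counter(urlparse(addr).hostname for addr in scheduler_info["workers"])


# Hand-written sample: only the keys (worker addresses) matter for this sketch.
sample_info = {
    "workers": {
        addr: {"nthreads": 1}
        for addr in [
            "tcp://10.0.0.2:34491",
            "tcp://10.0.0.2:39239",
            "tcp://10.0.0.2:40863",
            "tcp://10.0.0.2:46243",
            "tcp://10.0.0.3:39647",
            "tcp://10.0.0.3:40155",
            "tcp://10.0.0.3:45005",
            "tcp://10.0.0.3:46333",
        ]
    }
}

counts = workers_per_host(sample_info)
print(dict(counts))  # {'10.0.0.2': 4, '10.0.0.3': 4}
```

With a live connection, calling `workers_per_host(client.scheduler_info())` should give the same kind of summary, although the exact dictionary layout may differ between `distributed` versions.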


:::{note}
Dask-CUDA only discovers and registers GPUs; it cannot isolate them. Non-Dask processes can still use the same GPUs. To isolate GPU resources, rely on container technologies such as Kubernetes.
:::

## GPU Task

Not all tasks can be accelerated by GPUs. GPUs are mainly used to expedite computationally intensive tasks, such as machine learning and deep learning. At present, the frameworks supported by Dask on GPUs include:

* [CuPy](https://cupy.dev/) arrays, scaled to a GPU cluster.
* [Dask-cuDF](https://docs.rapids.ai/api/dask-cudf/stable/) DataFrames, scaled to a GPU cluster.

:::{note}
When using NVIDIA GPUs, add the CUDA directory to the `PATH` and `LD_LIBRARY_PATH` environment variables, because CuPy and cuDF depend on NVIDIA's GPU libraries.
:::
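
A quick way to check this requirement is to inspect the two environment variables from Python. The sketch below assumes a CUDA installation under `/usr/local/cuda` with `bin` and `lib64` subdirectories, which is a common Linux layout but may differ on your system; the helper name `missing_cuda_entries` is hypothetical.

```python
import os


def missing_cuda_entries(env, cuda_home="/usr/local/cuda"):
    """Return the CUDA directories that are absent from PATH / LD_LIBRARY_PATH.

    `cuda_home` is an assumed install location; adjust it to your system.
    """
    expected = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    missing = {}
    for var, directory in expected.items():
        entries = env.get(var, "").split(os.pathsep)
        if directory not in entries:
            missing[var] = directory
    return missing


# An empty dictionary means both variables already contain the CUDA directories.
print(missing_cuda_entries(os.environ))
```

Any entry reported here would then need to be added to the corresponding variable before starting the Dask workers.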

### Example: SVD

The following code performs Singular Value Decomposition (SVD) on a GPU, which is a task well-suited for GPU acceleration. By setting `dask.config.set({"array.backend": "cupy"})`, the execution backend for Dask Array can be changed to CuPy on the GPU.

```python
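import cupy
import dask
import dask.array as da

# Use CuPy as the backend for Dask Array creation routines.
dask.config.set({"array.backend": "cupy"})

# Generate the random chunks on the GPU with CuPy's RandomState.
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random((10000, 1000), chunks=(1000, 1000))

# Build the lazy SVD graph, then execute it.
u, s, v = da.linalg.svd(x)
u, s, v = dask.compute(u, s, v)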
```
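
To make the shapes of the three factors concrete, here is a CPU-only sketch of the same decomposition on a much smaller matrix. NumPy stands in for CuPy so that the example runs without a GPU, and the 100 x 10 size is an arbitrary choice for illustration. We expect `da.linalg.svd` to return factors with the same thin layout, but treat that as an assumption to verify against your Dask version.

```python
import numpy as np

# Small tall-and-skinny matrix; sizes chosen only for illustration.
rng = np.random.default_rng(0)
x = rng.random((100, 10))

# Thin SVD: u is (100, 10), s holds 10 singular values, vh is (10, 10).
u, s, vh = np.linalg.svd(x, full_matrices=False)
print(u.shape, s.shape, vh.shape)

# Multiplying the factors back together recovers the original matrix.
reconstructed = u @ np.diag(s) @ vh
print(np.allclose(reconstructed, x))
```

The same reconstruction check can be applied to the GPU result once `u`, `s`, and `v` have been computed.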