{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(sec-dask-gpu)=\n", "# GPU\n", "\n", "GPUs and other heterogeneous accelerators are widely used to accelerate deep learning. The Dask community, in collaboration with NVIDIA, provides a GPU-based data science toolkit that accelerates a variety of tasks.\n", "\n", "## Dask GPU Cluster\n", "\n", "[Dask-CUDA](https://docs.rapids.ai/api/dask-cuda/stable/) is an extension to `dask.distributed` that detects and manages GPU devices. Install it via `pip install dask-cuda`. Like the `dask.distributed` deployments discussed in {numref}`sec-dask-distributed`, Dask-CUDA offers a `LocalCUDACluster` for a single machine. `LocalCUDACluster` automatically detects and registers all GPUs on the node, launching one worker process per GPU. For instance, in an environment equipped with 4 GPUs, initiating a single-machine Dask cluster launches 4 Dask workers, each allocated one GPU." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/fs/fast/u20200002/envs/dispy/lib/python3.11/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use.\n", "Perhaps you already have a cluster running?\n", "Hosting the HTTP server on port 37111 instead\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "
\n", "
\n", "

Client

\n", "

Client-5c3311bf-0ce5-11ef-bd8c-000012e4fe80

\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
Connection method: Cluster objectCluster type: dask_cuda.LocalCUDACluster
\n", " Dashboard: http://127.0.0.1:37111/status\n", "
\n", "\n", " \n", "\n", " \n", "
\n", "

Cluster Info

\n", "
\n", "
\n", "
\n", "
\n", "

LocalCUDACluster

\n", "

209b2784

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", "
\n", " Dashboard: http://127.0.0.1:37111/status\n", " \n", " Workers: 4\n", "
\n", " Total threads: 4\n", " \n", " Total memory: 90.00 GiB\n", "
Status: runningUsing processes: True
\n", "\n", "
\n", " \n", "

Scheduler Info

\n", "
\n", "\n", "
\n", "
\n", "
\n", "
\n", "

Scheduler

\n", "

Scheduler-39587c13-5825-4748-be18-a18f23c602bb

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " Comm: tcp://127.0.0.1:46657\n", " \n", " Workers: 4\n", "
\n", " Dashboard: http://127.0.0.1:37111/status\n", " \n", " Total threads: 4\n", "
\n", " Started: Just now\n", " \n", " Total memory: 90.00 GiB\n", "
\n", "
\n", "
\n", "\n", "
\n", " \n", "

Workers

\n", "
\n", "\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: 0

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", "
\n", " Comm: tcp://127.0.0.1:36681\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://127.0.0.1:38373/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://127.0.0.1:41031\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-jkx850hc\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: 1

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", "
\n", " Comm: tcp://127.0.0.1:37987\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://127.0.0.1:38845/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://127.0.0.1:36415\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-gelyun5u\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: 2

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", "
\n", " Comm: tcp://127.0.0.1:36139\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://127.0.0.1:44939/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://127.0.0.1:40211\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-c6owcg7k\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: 3

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", "
\n", " Comm: tcp://127.0.0.1:46363\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://127.0.0.1:40611/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://127.0.0.1:38093\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-hyl9pn8_\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "
\n", "\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dask_cuda import LocalCUDACluster\n", "from dask.distributed import Client\n", "\n", "cluster = LocalCUDACluster()\n", "client = Client(cluster)\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To deploy a Dask GPU cluster across multiple nodes, first launch the Dask scheduler on one node.\n", "\n", "```\n", "dask scheduler\n", "```\n", "\n", "Then launch a Dask GPU worker on each GPU node, pointing it at the scheduler's address. Together, the scheduler and workers form a Dask GPU cluster.\n", "\n", "```\n", "dask cuda worker tcp://scheduler:8786\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
\n", "

Client

\n", "

Client-6039933f-0ce3-11ef-b163-000012e4fe80

\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
Connection method: Direct
\n", " Dashboard: http://10.0.0.3:8787/status\n", "
\n", "\n", " \n", "\n", " \n", "
\n", "

Scheduler Info

\n", "
\n", "
\n", "
\n", "
\n", "

Scheduler

\n", "

Scheduler-d073585d-dcac-41bf-9c5c-1055fe07576c

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " Comm: tcp://10.0.0.3:8786\n", " \n", " Workers: 8\n", "
\n", " Dashboard: http://10.0.0.3:8787/status\n", " \n", " Total threads: 8\n", "
\n", " Started: Just now\n", " \n", " Total memory: 180.00 GiB\n", "
\n", "
\n", "
\n", "\n", "
\n", " \n", "

Workers

\n", "
\n", "\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.2:34491

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.2:34491\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.2:38385/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.2:37559\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-p2de783n\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 4.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 216.19 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 8.81 kiB\n", " \n", " Write bytes: 14.61 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.2:39239

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.2:39239\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.2:45797/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.2:36259\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-mo04yp4a\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 6.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 216.30 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 9.76 kiB\n", " \n", " Write bytes: 14.86 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.2:40863

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.2:40863\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.2:43677/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.2:32877\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-4p9jsv4f\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 4.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 216.27 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 9.77 kiB\n", " \n", " Write bytes: 14.88 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.2:46243

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.2:46243\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.2:40513/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.2:45107\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-gt5epnxr\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 4.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 216.21 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 10.04 kiB\n", " \n", " Write bytes: 15.00 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.3:39647

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.3:39647\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.3:38377/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.3:34843\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-gqcyic7m\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 4.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 217.51 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 63.74 kiB\n", " \n", " Write bytes: 58.80 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.3:40155

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.3:40155\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.3:34723/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.3:46339\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-yo78gnof\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 6.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 218.25 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 63.73 kiB\n", " \n", " Write bytes: 58.80 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.3:45005

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.3:45005\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.3:42503/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.3:34929\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-skts4xjq\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 6.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 216.24 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 63.74 kiB\n", " \n", " Write bytes: 58.81 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "

Worker: tcp://10.0.0.3:46333

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\n", " Comm: tcp://10.0.0.3:46333\n", " \n", " Total threads: 1\n", "
\n", " Dashboard: http://10.0.0.3:36413/status\n", " \n", " Memory: 22.50 GiB\n", "
\n", " Nanny: tcp://10.0.0.3:44405\n", "
\n", " Local directory: /tmp/dask-scratch-space/worker-pu9uzxbg\n", "
\n", " Tasks executing: \n", " \n", " Tasks in memory: \n", "
\n", " Tasks ready: \n", " \n", " Tasks in flight: \n", "
\n", " CPU usage: 4.0%\n", " \n", " Last seen: Just now\n", "
\n", " Memory usage: 218.16 MiB\n", " \n", " Spilled bytes: 0 B\n", "
\n", " Read bytes: 64.86 kiB\n", " \n", " Write bytes: 59.93 kiB\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client = Client(\"10.0.0.3:8786\")\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{note}\n", "Dask-CUDA only discovers and registers GPUs; it does not isolate them, so non-Dask workloads can still preempt these GPUs. GPU resource isolation should be achieved with container technologies such as Kubernetes.\n", ":::\n", "\n", "## GPU Task\n", "\n", "Not all tasks can be accelerated by GPUs. GPUs mainly expedite computationally intensive workloads such as machine learning and deep learning. At present, the frameworks supported by Dask on GPUs include:\n", "\n", "* Scaling [CuPy](https://cupy.dev/) to a GPU cluster.\n", "* Scaling the [Dask-cuDF](https://docs.rapids.ai/api/dask-cudf/stable/) DataFrame to a GPU cluster.\n", "\n", ":::{note}\n", "When using NVIDIA GPUs, append the CUDA installation directory to the `PATH` and `LD_LIBRARY_PATH` environment variables, as CuPy and cuDF depend on NVIDIA's GPU libraries.\n", ":::\n", "\n", "### Example: SVD\n", "\n", "The following code performs Singular Value Decomposition (SVD) on a GPU, a task well-suited for GPU acceleration. 
By setting `dask.config.set({\"array.backend\": \"cupy\"})`, the execution backend for Dask Array is switched to CuPy, so array chunks are allocated and computed in GPU memory.\n", "\n", "```python\n", "import cupy\n", "import dask\n", "import dask.array as da\n", "\n", "# Use CuPy (GPU) arrays as the backend for Dask Array\n", "dask.config.set({\"array.backend\": \"cupy\"})\n", "# Generate random chunks directly on the GPU\n", "rs = da.random.RandomState(RandomState=cupy.random.RandomState)\n", "x = rs.random((10000, 1000), chunks=(1000, 1000))\n", "# Build the SVD task graph lazily; call .compute() to execute it\n", "u, s, v = da.linalg.svd(x)\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 2 }