Introduction
A few days ago I was tinkering with Ubuntu, mainly because I wanted to put the NVIDIA card in my old laptop to use and run code on the GPU, to get a taste of many-core parallelism.
Luckily, even this old machine of mine supports CUDA:
$ sudo lshw -C display
  *-display
       description: 3D controller
       product: GK208M [GeForce GT 740M]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:35 memory:f0000000-f0ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:6000(size=128)
Installing the Tools
First, install the CUDA development tools with the following command:
$ sudo apt install nvidia-cuda-toolkit
Check the version information:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Install the required packages with Conda:
$ conda install numba && conda install cudatoolkit
Installing them with pip also works, it is just the same.
Testing and Driver Installation
A quick test immediately threw an error:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py
Traceback (most recent call last):
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 246, in ensure_initialized
    self.cuInit(0)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 319, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 387, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py", line 15, in <module>
    gpu_print[1, 2]()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 862, in __getitem__
    return self.configure(*args)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 857, in configure
    return _KernelConfiguration(self, griddim, blockdim, stream, sharedmem)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 718, in __init__
    ctx = get_context()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 153, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 487, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 284, in __getattr__
    self.ensure_initialized()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 250, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
A quick search online showed this was a driver problem. I first installed the graphics driver with Ubuntu's built-in driver tool, but it still failed:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
In the end, installing the driver from the command line solved the problem:
$ sudo apt install nvidia-driver-470
Checking again, everything looked fine:
$ nvidia-smi
Wed Dec  7 22:13:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   51C    P8    N/A /  N/A |      4MiB /  2004MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The test code could now run as well.
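As an extra sanity check (my own addition, not from the original post), Numba can also report whether it sees the GPU directly from Python:

# Illustrative sketch: confirm that Numba can now see the GPU.
from numba import cuda

print(cuda.is_available())   # should print True once the driver works
cuda.detect()                # lists the detected CUDA devices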
Testing Python Code
Printing IDs
Prepare the following code:
from numba import cuda
import os


def cpu_print():
    print('cpu print')


@cuda.jit
def gpu_print():
    dataIndex = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    print('gpu print ', cuda.threadIdx.x, cuda.blockIdx.x, cuda.blockDim.x, dataIndex)


if __name__ == '__main__':
    gpu_print[4, 4]()
    cuda.synchronize()
    cpu_print()
This code contains two functions: one runs on the CPU and one runs on the GPU, and both simply print. The key is the @cuda.jit decorator, which makes the function execute on the GPU. The output looks like this:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/print_test.py
gpu print  0 3 4 12
gpu print  1 3 4 13
gpu print  2 3 4 14
gpu print  3 3 4 15
gpu print  0 2 4 8
gpu print  1 2 4 9
gpu print  2 2 4 10
gpu print  3 2 4 11
gpu print  0 1 4 4
gpu print  1 1 4 5
gpu print  2 1 4 6
gpu print  3 1 4 7
gpu print  0 0 4 0
gpu print  1 0 4 1
gpu print  2 0 4 2
gpu print  3 0 4 3
cpu print
You can see that the GPU printed 16 times in total, each from a different thread. The order of the lines may differ from run to run, because work submitted to the GPU executes asynchronously and there is no guarantee which thread runs first. That is also why cuda.synchronize() is called: it makes sure the GPU has finished before the program continues.
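The global index computed in the kernel is also what you would normally use to address data. Here is a minimal sketch of my own (not from the original post) where each of the 16 threads writes its ID into one slot of an array:

# Illustrative sketch: each thread uses its global index to write one element.
from numba import cuda
import numpy as np

@cuda.jit
def fill_ids(out):
    i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    if i < out.size:             # guard in case there are more threads than elements
        out[i] = i

data = np.zeros(16, dtype=np.int32)
fill_ids[4, 4](data)             # 4 blocks x 4 threads = 16 threads
cuda.synchronize()
print(data)                      # expected: [ 0  1  2 ... 15 ]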
Comparing Execution Time
Let's use the following code to see the power of GPU parallelism:
from numba import jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer


# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1


# function optimized to run on gpu
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1


if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)
The results are as follows:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/time_test.py
without GPU: 3.7136273959999926
with GPU: 0.4040513340000871
The CPU version takes about 3.7 seconds while the accelerated version takes only 0.4 seconds, which is a decent speedup. Of course, this does not mean the GPU is always faster than the CPU; it depends on the type of task.
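For reference, the same increment can also be written as an explicit @cuda.jit kernel. The rough sketch below is my own illustration (not from the original post); for a task this simple, the copies between host and device usually dominate the runtime:

# Rough sketch: the same increment as an explicit CUDA kernel with Numba.
from numba import cuda
import numpy as np
from timeit import default_timer as timer

@cuda.jit
def inc_kernel(a):
    i = cuda.grid(1)             # global thread index
    if i < a.size:
        a[i] += 1

n = 10000000
a = np.ones(n, dtype=np.float64)
d_a = cuda.to_device(a)          # copy the data to the GPU once

threads = 256
blocks = (n + threads - 1) // threads

start = timer()
inc_kernel[blocks, threads](d_a)
cuda.synchronize()               # wait for the kernel to finish
print("explicit kernel:", timer() - start)

result = d_a.copy_to_host()      # copy the result back when needed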
Original post: https://www.cnblogs.com/larrydpk/p/17093627.html