YOLOv4 on an RTX 4090 with Ubuntu 24.04 LTS: Practical Notes
Author: 伍增田 Tommy WU zxpns18@126.com
root@g4090-2:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04 LTS
Release: 24.04
Codename: noble
root@g4090-2:~# nvidia-smi
Thu Apr 17 07:38:37 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
|  0%   39C    P8             12W /  450W |   19596MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    537450      C   python                                        998MiB |
|    0   N/A  N/A    537454      C   python                                        998MiB |
|    0   N/A  N/A    537458      C   python                                        998MiB |
|    0   N/A  N/A    537462      C   python                                        998MiB |
|    0   N/A  N/A    537466      C   python                                        998MiB |
|    0   N/A  N/A    537470      C   python                                        998MiB |
|    0   N/A  N/A    537474      C   python                                        998MiB |
|    0   N/A  N/A    537478      C   python                                        998MiB |
|    0   N/A  N/A   2384008      C   …r1/image_gen/.image_gen/bin/python3        11574MiB |
+-----------------------------------------------------------------------------------------+
root@g4090-2:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
On this machine gcc defaults to gcc-10, and the build fails with the following errors:
/usr/include/x86_64-linux-gnu/bits/stdio2.h(31): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(45): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(55): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(69): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(198): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(210): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(223): error: identifier "__builtin_dynamic_object_size" is undefined
/usr/include/x86_64-linux-gnu/bits/stdio2.h(238): error: identifier "__builtin_dynamic_object_size" is undefined
Fix: switch to gcc-12, as described in this issue:
https://github.com/vllm-project/vllm/issues/10179
#on debian/ubuntu system
sudo apt install gcc-12 g++-12
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12
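Before rebuilding, a quick check of the active compiler can confirm that the switch took effect. The following is only a minimal sketch: `gcc_too_old` and `MIN_GCC_MAJOR` are hypothetical names, and the threshold of 12 is simply taken from the fix above, not from an official compatibility table.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check (not part of darknet or CUDA).
# MIN_GCC_MAJOR=12 is an assumption taken from the fix described in this post.
MIN_GCC_MAJOR=12

# Succeeds (exit status 0) when the given version string is older than MIN_GCC_MAJOR.
gcc_too_old() {
    local major="${1%%.*}"   # "10.5.0" -> "10", "12" -> "12"
    [ "$major" -lt "$MIN_GCC_MAJOR" ]
}

# Only query the compiler if one is installed.
if command -v gcc >/dev/null 2>&1; then
    version="$(gcc -dumpversion)"
    if gcc_too_old "$version"; then
        echo "gcc $version is older than gcc-$MIN_GCC_MAJOR; switch compilers before running make"
    else
        echo "gcc $version meets the gcc-$MIN_GCC_MAJOR threshold used in this post"
    fi
fi
```

If the message still reports the old version after `update-alternatives --install`, running `update-alternatives --config gcc` lets you pick the active alternative interactively.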
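The cuDNN release for CUDA 12 is incompatible with this YOLOv4 source, so compilation fails. The code still references the CUDNN_CONVOLUTION_FWD_PREFER_FASTEST preference enum, which cuDNN 8 and later no longer define. The archive used was: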
cudnn-linux-x86_64-9.8.0.87_cuda12-archive.tar.xz
gcc -Iinclude/ -I3rdparty/stb/include -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -DGPU -DCUDNN -I/usr/local/cudnn/include -fPIC -c ./src/convolutional_layer.c -o obj/convolutional_layer.o
./src/convolutional_layer.c: In function ‘cudnn_convolutional_setup’:
./src/convolutional_layer.c:286:24: error: ‘CUDNN_CONVOLUTION_FWD_PREFER_FASTEST’ undeclared (first use in this function); did you mean ‘CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3’?
  286 |     int forward_algo = CUDNN_CONVOLUTION_FWD_PREFER_FASTEST;
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                        CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3

Query the GPU's compute capability:
```bash
root@g4090-2:~# nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
8.9
```
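The reported value maps directly onto nvcc's `-gencode` flags: drop the dot, so 8.9 becomes `compute_89` and `sm_89`. A minimal sketch of that mapping follows; `gencode_for_cc` is a hypothetical helper name, not something shipped with darknet or CUDA.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a compute capability such as "8.9" into the
# -gencode flags used for the ARCH variable in the YOLOv4 (darknet) Makefile.
gencode_for_cc() {
    local cc="${1//./}"   # remove the dot: "8.9" -> "89"
    # First flag emits native SASS (sm_XX), second embeds PTX (compute_XX).
    printf -- '-gencode arch=compute_%s,code=sm_%s -gencode arch=compute_%s,code=compute_%s\n' \
        "$cc" "$cc" "$cc" "$cc"
}

gencode_for_cc "8.9"
# prints: -gencode arch=compute_89,code=sm_89 -gencode arch=compute_89,code=compute_89
```

On a machine with the NVIDIA driver installed, the argument could instead come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which prints the value without the header line.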
Changes to the Makefile: disable cuDNN and target compute capability 8.9 (sm_89):
```makefile
CUDNN=0
# RTX 4090
ARCH= -gencode arch=compute_89,code=sm_89 -gencode arch=compute_89,code=compute_89
```
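Setting CUDNN=0 sidesteps the cuDNN compile error shown earlier. Before deciding, it may help to check which cuDNN major version the installed headers provide. The sketch below makes two assumptions: the header lives at `/usr/local/cudnn/include/cudnn_version.h` (inferred from the `-I/usr/local/cudnn/include` flag in the build log; `cudnn_version.h` ships with cuDNN 8 and later), and `cudnn_major` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CUDNN_MAJOR value defined in a cudnn_version.h file.
cudnn_major() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { print $3; exit }' "$1"
}

# Assumed header location, based on the include path seen in the build log.
header="/usr/local/cudnn/include/cudnn_version.h"

if [ -f "$header" ]; then
    major="$(cudnn_major "$header")"
    if [ -n "$major" ] && [ "$major" -ge 8 ]; then
        echo "cuDNN $major detected: this post's workaround is CUDNN=0 in the Makefile"
    else
        echo "cuDNN major version: ${major:-unknown}"
    fi
fi
```

Building without cuDNN keeps GPU support (GPU=1 is independent of CUDNN) but falls back to the non-cuDNN convolution code path.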