How to install ROCm on Ubuntu 22.04 (Jammy) for TensorFlow
Ubuntu 22.04 installation
I installed a minimal Ubuntu 22.04 Server system on a machine with a Radeon RX 6900 XT card.
ROCm installation
AMD provides a .deb package that sets up the apt repositories and contains the amdgpu-install tool.
I downloaded the latest .deb for Ubuntu 22.04 and installed it:
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_5.4.50400-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50400-1_all.deb
At this point I had to fix the apt repositories that were added in /etc/apt/sources.list.d/. Check that every added repository points to jammy and to the latest version.
For example:
# cat /etc/apt/sources.list.d/amdgpu.list
deb https://repo.radeon.com/amdgpu/latest/ubuntu jammy main
# cat /etc/apt/sources.list.d/rocm.list
deb [arch=amd64] https://repo.radeon.com/rocm/apt/latest jammy main
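The codename part of this check can be scripted. The following is a minimal sketch, assuming bash. The helper names check_apt_sources and system_codename are hypothetical. It only flags active deb lines that do not mention the expected codename, so you still have to check the version part (for example latest) by eye.

```shell
#!/usr/bin/env bash
# Sketch only: check_apt_sources and system_codename are hypothetical helpers.

# Print this machine's Ubuntu codename (e.g. "jammy") from /etc/os-release.
system_codename() {
  ( . /etc/os-release && printf '%s\n' "$VERSION_CODENAME" )
}

# check_apt_sources DIR CODENAME
# Print every active "deb"/"deb-src" line in DIR/*.list that does not contain
# CODENAME as a separate field; return 1 if at least one such line was found.
check_apt_sources() {
  local dir=$1 codename=$2 bad=0 file line
  for file in "$dir"/*.list; do
    [ -e "$file" ] || continue                 # glob matched nothing
    while IFS= read -r line || [ -n "$line" ]; do
      case $line in
        deb\ *|deb-src\ *) ;;                  # only check active entries
        *) continue ;;                         # skip comments and blanks
      esac
      case " $line " in
        *" $codename "*) ;;                    # codename present: OK
        *) printf '%s: %s\n' "$file" "$line"; bad=1 ;;
      esac
    done < "$file"
  done
  return "$bad"
}
```

Usage: `check_apt_sources /etc/apt/sources.list.d "$(system_codename)"` prints nothing and returns 0 when every entry targets the running release.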
In the next step I installed ROCm and its libraries:
sudo amdgpu-install --usecase=rocm,dkms
sudo apt install rocm-libs miopen-hip
At this point you should reboot your system.
It is important to set up the environment variables. ROCm is installed in /opt by default, and the symbolic link /opt/rocm points to the active installation (selected through the alternatives system). For example, you can create the file /etc/profile.d/rocm.sh:
#!/bin/sh
export PATH="${PATH:+${PATH}:}/opt/rocm/bin/"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/opt/rocm/lib/"
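A profile script like the one above appends the ROCm directories again every time it is sourced, for example in nested login shells. Here is a minimal sketch of an idempotent variant, assuming POSIX sh because files in /etc/profile.d are sourced by the login shell (which may be dash). The add_to_path helper is a hypothetical name, and _rocm_cur is just a scratch variable.

```shell
# Sketch of an idempotent /etc/profile.d/rocm.sh (POSIX sh).
# add_to_path is a hypothetical helper defined only for this example.

# add_to_path VAR DIR: append DIR to the colon-separated variable VAR
# unless DIR is already one of its entries, then export VAR.
add_to_path() {
  eval "_rocm_cur=\${$1}"
  case ":${_rocm_cur}:" in
    *":$2:"*) ;;                                   # already present
    *) eval "$1=\"\${_rocm_cur:+\${_rocm_cur}:}\$2\"" ;;
  esac
  export "$1"
  unset _rocm_cur
}

add_to_path PATH /opt/rocm/bin
add_to_path LD_LIBRARY_PATH /opt/rocm/lib
```

The `${VAR:+...:}` expansion is the same trick the original script uses to avoid a leading colon when the variable starts out empty.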
Diagnostics
Run rocminfo to check whether ROCm detects the graphics card. If it does not, make sure the amdgpu kernel module is loaded.
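Both checks can be wrapped in small helpers. The following is a minimal sketch, assuming bash and GNU grep. The names amdgpu_loaded and gpu_targets are hypothetical. The first relies on loaded (or built-in) modules appearing under /sys/module. The second simply extracts gfx target names such as gfx1030 (the target that appears in the error log further down) from whatever rocminfo prints.

```shell
#!/usr/bin/env bash
# Sketch only: amdgpu_loaded and gpu_targets are hypothetical helpers.

# amdgpu_loaded: succeed if the amdgpu kernel module is present in the
# running kernel (loaded modules show up as directories in /sys/module).
amdgpu_loaded() {
  [ -d /sys/module/amdgpu ]
}

# gpu_targets: read rocminfo output on stdin and print each distinct
# gfx target name (e.g. gfx1030) once.
gpu_targets() {
  grep -oE '\bgfx[0-9a-f]+\b' | sort -u
}
```

Usage: `amdgpu_loaded || sudo modprobe amdgpu`, then `rocminfo | gpu_targets`. An empty result means ROCm does not see any GPU agent.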
To monitor the card’s performance, use rocm-smi. I never got nvtop to work, but there is an alternative made specifically for Radeon cards: radeontop (available in the Ubuntu repositories).
TensorFlow installation
You have to install the dedicated tensorflow-rocm package, which is available on PyPI and can be installed with pip. The tensorflow-gpu package uses CUDA and will not work with a Radeon card.
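Here is a minimal sketch of installing the package and then checking that TensorFlow actually sees the card. It assumes python3 with pip and bash. The helper name tf_gpu_count is hypothetical. tensorflow-rocm is imported as `tensorflow`, so it is presumably best kept out of environments that already contain the CUDA-based tensorflow-gpu.

```shell
#!/usr/bin/env bash
# Install step (run once):
#   python3 -m pip install --user tensorflow-rocm
#
# tf_gpu_count (hypothetical helper): print how many GPUs TensorFlow can
# see, or 0 if TensorFlow cannot be imported at all.
tf_gpu_count() {
  python3 - <<'EOF' 2>/dev/null || echo 0
import tensorflow as tf
print(len(tf.config.list_physical_devices("GPU")))
EOF
}
```

Usage: `[ "$(tf_gpu_count)" -ge 1 ] || echo "TensorFlow does not see a GPU"`. TensorFlow's own log messages go to stderr, which the helper discards, so only the count is printed.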
Troubleshooting
I was missing some libraries. It turns out that amdgpu-install doesn’t install everything that is needed to use ROCm.
For example, I had to install libtinfo5 and libnuma-dev. Watch TensorFlow’s output and install whatever it reports as missing.
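Two small helpers can speed up the hunt. The following is a minimal sketch, assuming bash on a Debian-based system. Both names are hypothetical. missing_pkgs asks dpkg-query which of the given packages are not installed. unresolved_libs filters `ldd` output for shared libraries the dynamic linker could not find.

```shell
#!/usr/bin/env bash
# Sketch only: missing_pkgs and unresolved_libs are hypothetical helpers.

# missing_pkgs PKG...: print each package that dpkg does not report as
# "install ok installed" (unknown packages count as missing).
missing_pkgs() {
  local pkg
  for pkg in "$@"; do
    dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
      | grep -q 'install ok installed' || printf '%s\n' "$pkg"
  done
}

# unresolved_libs: read ldd output on stdin and print the names of the
# libraries marked "=> not found".
unresolved_libs() {
  awk '/=> not found/ { print $1 }'
}
```

Usage: `missing_pkgs libtinfo5 libnuma-dev`, or `ldd /opt/rocm/lib/libMIOpen.so | unresolved_libs` (that library path is an assumption; point ldd at whichever ROCm library TensorFlow complains about).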
Particularly hard to track down was libstdc++-12-dev. It looks like MIOpen compiles its kernels with clang (through HIPRTC) on the first run, and the C++ standard library headers it needs are missing by default.
The error looks like this:
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-112729/input/CompileSource:39:10: fatal error: 'limits' file not found
#include <limits> // std::numeric_limits
^~~~~~~~
1 error generated when compiling for gfx1030.
terminate called after throwing an instance of 'miopen::Exception'
what(): /MIOpen/src/hipoc/hipoc_program.cpp:299: Code object build failed. Source: naive_conv.cpp
Aborted (core dumped)
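The missing-header problem can be reproduced without TensorFlow, as a rough proxy, by asking a compiler to parse a file that includes `<limits>`. The following is a minimal sketch, assuming bash. The helper name cxx_headers_ok is hypothetical, and the default compiler path /opt/rocm/llvm/bin/clang++ is an assumption about where ROCm installs its LLVM toolchain.

```shell
#!/usr/bin/env bash
# cxx_headers_ok [COMPILER] (hypothetical helper): succeed if COMPILER can
# find <limits>. Only a syntax check is done; nothing is written to disk.
cxx_headers_ok() {
  local cxx=${1:-/opt/rocm/llvm/bin/clang++}
  printf '#include <limits>\nint x = std::numeric_limits<int>::max();\n' \
    | "$cxx" -x c++ -fsyntax-only - >/dev/null 2>&1
}
```

Usage: `cxx_headers_ok || sudo apt install libstdc++-12-dev`. Running the same compiler with `-v` prints a "Selected GCC installation" line. Presumably clang picks the newest GCC it finds (12 here), whose C++ headers only come with libstdc++-12-dev.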