tlaguz's webpage

13 Jan 2023

How to install ROCm on Ubuntu 22.04 (Jammy) for TensorFlow

Ubuntu 22.04 installation

I started from a minimal Ubuntu 22.04 server installation on a machine with a Radeon 6900 XT card.

ROCm installation

AMD provides a deb package that sets up the apt repositories and contains the amdgpu-install tool.

I’ve downloaded the latest deb for Ubuntu 22.04 and installed it:

wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_5.4.50400-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50400-1_all.deb

At this point I had to fix the apt repositories that the package added in /etc/apt/sources.list.d/. Check that every added repository targets jammy and the latest version. For example:

# cat /etc/apt/sources.list.d/amdgpu.list
deb https://repo.radeon.com/amdgpu/latest/ubuntu jammy main

# cat /etc/apt/sources.list.d/rocm.list 
deb [arch=amd64] https://repo.radeon.com/rocm/apt/latest jammy main
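
The check described above can be scripted. This is a minimal sketch, not part of AMD's tooling: the function name check_amd_sources is hypothetical, and it only verifies that the word jammy appears on each active deb line of the two files shown above. It does not check the version part of the URL.

```shell
# Hypothetical helper: print active "deb" lines in amdgpu.list and rocm.list
# that do not mention jammy. Returns 0 if none are found, 1 otherwise.
check_amd_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    bad=0
    for f in "$dir/amdgpu.list" "$dir/rocm.list"; do
        [ -f "$f" ] || continue
        # Keep only active deb lines, then drop the ones that mention jammy.
        # Whatever is left gets printed and marks the check as failed.
        if grep -E '^[[:space:]]*deb ' "$f" | grep -v -w 'jammy'; then
            bad=1
        fi
    done
    return "$bad"
}
```

Running check_amd_sources with no argument inspects /etc/apt/sources.list.d. Any line it prints needs editing, followed by sudo apt update.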

In the next step I installed ROCm and its libraries:

amdgpu-install --usecase=rocm,dkms
sudo apt install rocm-libs miopen-hip

At this point you should reboot your system.

It is important to set up the environment variables. ROCm is installed in /opt by default, and the symbolic link /opt/rocm points to the active installation (selected through the alternatives system). You can, for example, create the file /etc/profile.d/rocm.sh:

#!/bin/sh
export PATH="${PATH:+${PATH}:}/opt/rocm/bin/"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/opt/rocm/lib/"
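
The ${VAR:+${VAR}:} expansion in that file adds the separating colon only when the variable already has a value. This matters mostly for LD_LIBRARY_PATH, which is often unset, and where an empty entry is treated as the current directory. A small illustration on a throwaway variable (demo is a made-up name):

```shell
# Illustration only: the same expansion applied to a scratch variable.
demo=""
demo="${demo:+${demo}:}/opt/rocm/lib/"
echo "$demo"    # prints /opt/rocm/lib/ (no leading colon)

demo="/usr/local/lib"
demo="${demo:+${demo}:}/opt/rocm/lib/"
echo "$demo"    # prints /usr/local/lib:/opt/rocm/lib/
```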

Diagnostics

You can run rocminfo to check whether the graphics card is detected by ROCm. If it’s not, make sure the amdgpu kernel module is loaded.
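
Both checks can be wrapped in two small functions. This is a sketch with hypothetical helper names. It assumes rocminfo is on PATH and that the card appears in its output under a gfx name, such as gfx1030 for this card.

```shell
# Hypothetical helper: succeed if the amdgpu module is in the given module
# list (defaults to /proc/modules, the file lsmod formats).
amdgpu_loaded() {
    grep -q '^amdgpu ' "${1:-/proc/modules}"
}

# Hypothetical helper: print the distinct gfx target names reported by rocminfo.
list_gfx_targets() {
    rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
}
```

If amdgpu_loaded fails, sudo modprobe amdgpu tries to load the module. If list_gfx_targets then prints nothing, ROCm still does not see the card.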

To monitor the card’s performance you can use rocm-smi. I never got nvtop to work, but there is an alternative made specifically for Radeons: radeontop (it’s available in the Ubuntu repos).

TensorFlow installation

You have to install the dedicated tensorflow-rocm package, which is available via pip. The tensorflow-gpu package uses CUDA, so it is not the one you want here.
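
A minimal sketch of the install and a quick check, assuming python3 and pip are already present (on a minimal server install pip usually comes from the python3-pip package). The function names are hypothetical. tf.config.list_physical_devices is TensorFlow's own call for listing devices.

```shell
# Install the ROCm build of TensorFlow from PyPI.
install_tf_rocm() {
    python3 -m pip install tensorflow-rocm
}

# Print the GPUs TensorFlow can see. An empty list means the card was not picked up.
tf_list_gpus() {
    python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}
```

Run install_tf_rocm once, then tf_list_gpus. The first import is also where missing shared libraries tend to show up in the output.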

Troubleshooting

I was missing some libraries. It turns out that amdgpu-install doesn’t install everything that is needed to use ROCm. For example, I had to install libtinfo5 and libnuma-dev. Watch TF’s output and install whatever is missing.

Particularly hard to track down was libstdc++-12-dev. It looks like ROCm uses clang to compile parts of itself on the first run, and the C++ standard library is not installed by default.

The error looks like this:

MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-112729/input/CompileSource:39:10: fatal error: 'limits' file not found
#include <limits> // std::numeric_limits
         ^~~~~~~~
1 error generated when compiling for gfx1030.
terminate called after throwing an instance of 'miopen::Exception'
  what():  /MIOpen/src/hipoc/hipoc_program.cpp:299: Code object build failed. Source: naive_conv.cpp
Aborted (core dumped)
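
A sketch for checking and fixing this case. The header path is an assumption: it is where libstdc++-12-dev places limits on Jammy, and a different GCC version would use a different directory. Both function names are hypothetical.

```shell
# Assumed default: the <limits> header shipped by libstdc++-12-dev on Jammy.
LIMITS_HEADER="${LIMITS_HEADER:-/usr/include/c++/12/limits}"

# Succeed if the header the error complains about is present.
have_limits_header() {
    [ -f "${1:-$LIMITS_HEADER}" ]
}

# Install the packages named in this section.
install_missing_packages() {
    sudo apt install libtinfo5 libnuma-dev libstdc++-12-dev
}
```

For example, have_limits_header || install_missing_packages installs the packages only when the header is absent.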