tlaguz's webpage

23 Nov 2019

Debian KVM CPU pinning considering HT/SMT and NUMA domains

By default, KVM uses all processor threads, which are shared between virtual machines, the emulator, and host processes. Moreover, it can change the core assignment using some arbitrary algorithm while a virtual machine is up.
This is generally not ideal, since virtual machines are blind to these changes and don't know about the HT/SMT or NUMA topology outside. The hypervisor can change the core assignment in such a way that all of a VM's processes are forced to change NUMA domain, causing random lag.
Also, hypervisor threads are presented to the VM as cores by default, which is not always accurate, because two of them can be HT/SMT siblings on the hypervisor.

The solution is a static core assignment known as CPU pinning.

  Commands in this article have been tested on Debian 10.

Determining CPU topology

This can easily be done using likwid-topology from the likwid package.

$ sudo apt install likwid
$ likwid-topology

--------------------------------------------------------------------------------
CPU name:	AMD Ryzen 5 3600 6-Core Processor
CPU type:	AMD K17 (Zen) architecture
CPU stepping:	0
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:		1
Cores per socket:	6
Threads per core:	2
--------------------------------------------------------------------------------
HWThread	Thread		Core		Socket		Available
0		0		0		0		*
1		0		1		0		*
2		0		2		0		*
3		0		3		0		*
4		0		4		0		*
5		0		5		0		*
6		1		0		0		*
7		1		1		0		*
8		1		2		0		*
9		1		3		0		*
10		1		4		0		*
11		1		5		0		*
--------------------------------------------------------------------------------
Socket 0:		( 0 6 1 7 2 8 3 9 4 10 5 11 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:			1
Size:			32 kB
Cache groups:		( 0 6 ) ( 1 7 ) ( 2 8 ) ( 3 9 ) ( 4 10 ) ( 5 11 )
--------------------------------------------------------------------------------
Level:			2
Size:			512 kB
Cache groups:		( 0 6 ) ( 1 7 ) ( 2 8 ) ( 3 9 ) ( 4 10 ) ( 5 11 )
--------------------------------------------------------------------------------
Level:			3
Size:			16 MB
Cache groups:		( 0 6 1 7 2 8 ) ( 3 9 4 10 5 11 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:		1
--------------------------------------------------------------------------------
Domain:			0
Processors:		( 0 6 1 7 2 8 3 9 4 10 5 11 )
Distances:		10
Free memory:		25701.3 MB
Total memory:		32160 MB
--------------------------------------------------------------------------------

Here we have only one NUMA domain with 6 physical cores and 2 threads per core. Threads are paired in the following pattern: (n, n+6).
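
If likwid is not available, the sibling pairing can also be read directly from sysfs. For the Ryzen above, cpu0 is paired with cpu6:

$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,6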

A second example, from a dual-socket Xeon system:

$ likwid-topology
--------------------------------------------------------------------------------
CPU name:	Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
CPU type:	Intel Xeon IvyBridge EN/EP/EX processor
CPU stepping:	4
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:		2
Cores per socket:	10
Threads per core:	2
--------------------------------------------------------------------------------
HWThread	Thread		Core		Socket		Available
0		0		0		0		*
1		0		10		1		*
2		0		1		0		*
3		0		11		1		*
4		0		2		0		*
5		0		12		1		*
6		0		3		0		*
7		0		13		1		*
8		0		4		0		*
9		0		14		1		*
10		0		5		0		*
11		0		15		1		*
12		0		6		0		*
13		0		16		1		*
14		0		7		0		*
15		0		17		1		*
16		0		8		0		*
17		0		18		1		*
18		0		9		0		*
19		0		19		1		*
20		1		0		0		*
21		1		10		1		*
22		1		1		0		*
23		1		11		1		*
24		1		2		0		*
25		1		12		1		*
26		1		3		0		*
27		1		13		1		*
28		1		4		0		*
29		1		14		1		*
30		1		5		0		*
31		1		15		1		*
32		1		6		0		*
33		1		16		1		*
34		1		7		0		*
35		1		17		1		*
36		1		8		0		*
37		1		18		1		*
38		1		9		0		*
39		1		19		1		*
--------------------------------------------------------------------------------
Socket 0:		( 0 20 2 22 4 24 6 26 8 28 10 30 12 32 14 34 16 36 18 38 )
Socket 1:		( 1 21 3 23 5 25 7 27 9 29 11 31 13 33 15 35 17 37 19 39 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:			1
Size:			32 kB
Cache groups:		( 0 20 ) ( 2 22 ) ( 4 24 ) ( 6 26 ) ( 8 28 ) ( 10 30 ) ( 12 32 ) ( 14 34 ) ( 16 36 ) ( 18 38 ) ( 1 21 ) ( 3 23 ) ( 5 25 ) ( 7 27 ) ( 9 29 ) ( 11 31 ) ( 13 33 ) ( 15 35 ) ( 17 37 ) ( 19 39 )
--------------------------------------------------------------------------------
Level:			2
Size:			256 kB
Cache groups:		( 0 20 ) ( 2 22 ) ( 4 24 ) ( 6 26 ) ( 8 28 ) ( 10 30 ) ( 12 32 ) ( 14 34 ) ( 16 36 ) ( 18 38 ) ( 1 21 ) ( 3 23 ) ( 5 25 ) ( 7 27 ) ( 9 29 ) ( 11 31 ) ( 13 33 ) ( 15 35 ) ( 17 37 ) ( 19 39 )
--------------------------------------------------------------------------------
Level:			3
Size:			25 MB
Cache groups:		( 0 20 2 22 4 24 6 26 8 28 10 30 12 32 14 34 16 36 18 38 ) ( 1 21 3 23 5 25 7 27 9 29 11 31 13 33 15 35 17 37 19 39 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:		2
--------------------------------------------------------------------------------
Domain:			0
Processors:		( 0 20 2 22 4 24 6 26 8 28 10 30 12 32 14 34 16 36 18 38 )
Distances:		10 20
Free memory:		79311.1 MB
Total memory:		128871 MB
--------------------------------------------------------------------------------
Domain:			1
Processors:		( 1 21 3 23 5 25 7 27 9 29 11 31 13 33 15 35 17 37 19 39 )
Distances:		20 10
Free memory:		29362.2 MB
Total memory:		129019 MB
--------------------------------------------------------------------------------

Here we can see a more complex scenario: two NUMA domains with identical processors, each consisting of 10 cores with 2 threads per core. Threads are paired in the following pattern: (k, k+20), where threads with even k are in domain 0 and threads with odd k are in domain 1.
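
The domain layout can be cross-checked with lscpu from util-linux; the NODE column shows the NUMA domain of every hardware thread:

$ lscpu --extended=CPU,NODE,SOCKET,CORE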

We don’t want a virtual machine to have cores from different domains.
I will be using the first topology in the rest of this article.

Restricting the hypervisor to specific threads

We can restrict which threads the hypervisor can use for its processes with the isolcpus kernel option. This option tells the kernel scheduler NOT TO USE the specified threads for ordinary processes; they remain available for explicit pinning.

In file /etc/default/grub add it to the variable GRUB_CMDLINE_LINUX, for example:

GRUB_CMDLINE_LINUX="isolcpus=1,2,3,4,5,7,8,9,10,11"
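
The kernel's cpu-list syntax also accepts ranges, so an equivalent, shorter form is:

GRUB_CMDLINE_LINUX="isolcpus=1-5,7-11"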

Next, run update-grub and reboot.
Now the hypervisor will use only threads 0 and 6 for its processes. We can validate this by:

$ cat /sys/devices/system/cpu/isolated

This should list the isolated threads, here 1-5,7-11.
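
Ordinary host processes should now be confined to the remaining threads. For example, checking the affinity of PID 1 with taskset from util-linux should report something like:

$ taskset -cp 1
pid 1's current affinity list: 0,6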

Pinning a virtual machine to specific threads

Unfortunately, there is no way to do this using virt-manager. After executing virsh edit <vm_name>, we add/edit the following lines:

  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='11'/>
    <emulatorpin cpuset='0,6'/>
  </cputune>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='1' threads='2'/>
  </cpu>

The above configuration means:

  • vcpu – the total number of virtual CPU threads
  • vcpupin – the mapping between a virtual thread and a hypervisor thread
  • emulatorpin – pins the emulator (QEMU) threads to the given hypervisor threads
  • topology – the CPU topology the virtual machine will see

Please note that vCPUs 0 and 1 are pinned to hypervisor threads 5 and 11, which are HT/SMT siblings; thanks to the topology element they appear as siblings inside the virtual machine as well.
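
After restarting the virtual machine, the pinning can be verified from the hypervisor; both commands print the current mapping to hypervisor threads:

$ virsh vcpupin <vm_name>
$ virsh emulatorpin <vm_name>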

  CPU threads in Linux are indexed from 0, but various programs (e.g. htop) index them from 1.
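
Inside the virtual machine, lscpu should now report the declared topology, i.e. 1 socket, 1 core per socket and 2 threads per core:

$ lscpu | grep -E 'Thread|Core|Socket'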