lscpu: fix incorrect number of sockets during hotplug
lscpu sometimes shows incorrect 'Socket(s)' value if a hotplug operation
is running.
On a 32 CPU 2-socket system, the expected output is as shown below:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 2
Socket(s): 2
On the same system, if hotplug is running along with lscpu, it shows
"Socket(s):" as 3 and 4 incorrectly sometimes.
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-11,16-31
Off-line CPU(s) list: 12-15
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 3
The number of sockets is considered as the number of unique core_siblings
CPU groups. The issues causing the number of sockets to sometimes be
higher during hotplug is:
1. The core_siblings of CPUs on the same socket are different because a CPU
on the socket has been onlined/offlined in between. In the below example,
nr sockets was wrongly incremented for CPU 5 though CPU 4 and 5 are on the
same socket because their core_siblings was different as CPU 12 was onlined
in between.
CPU: 4
core_siblings: ff f0 0 0 0 0 0 0
CPU: 5
core_siblings: ff f8 0 0 0 0 0 0
2. The core_siblings file of a CPU is created when a CPU is onlined. It may
have an invalid value for some time until the online operation is fully
complete. In the below example, nr sockets is wrongly incremented because
the core_siblings of CPU 14 was 0 as it had just been onlined.
CPU: 14
core_siblings: 0 0 0 0 0 0 0 0
To fix this, make the below changes:
1. Instead of considering CPUs to be on different sockets if their
core_siblings masks are unequal, consider them to be on different sockets
only if their core_siblings masks don't have even one common CPU. Then CPUs
on the same socket will be correctly identified even if offline/online
operations happen while they are read if at least one CPU in the socket is
online during both reads.
2. Check if a CPU's hotplug operation has been completed before using its
core_siblings file
[kzak@redhat.com: - use xmalloc(),
- use ul_strtos32(),
- use err() on CPU_ALLOC() error]
Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Signed-off-by: Karel Zak <kzak@redhat.com>