git.ipfire.org Git - thirdparty/qemu.git/log

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* target/i386/kvm: Intel TDX support
* target/i386/emulate: more lflags cleanups
* meson: remove need for explicit listing of dependencies in hw_common_arch and
  target_common_arch
* rust: small fixes
* hpet: Reorganize register decoding to be more similar to Rust code
* target/i386: fixes for AMD models
* target/i386: new EPYC-Turin CPU model

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg4BxwUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroP67gf+PEP4EDQP0AJUfxXYVsczGf5snGjz
# ro8jYmKG+huBZcrS6uPK5zHYxtOI9bHr4ipTHJyHd61lyzN6Ys9amPbs/CRE2Q4x
# Ky4AojPhCuaL2wHcYNcu41L+hweVQ3myj97vP3hWvkatulXYeMqW3/4JZgr4WZ69
# A9LGLtLabobTz5yLc8x6oHLn/BZ2y7gjd2LzTz8bqxx7C/kamjoDrF2ZHbX9DLQW
# BKWQ3edSO6rorSNHWGZsy9BE20AEkW2LgJdlV9eXglFEuEs6cdPKwGEZepade4bQ
# Rdt2gHTlQdUDTFmAbz8pttPxFGMC9Zpmb3nnicKJpKQAmkT/x4k9ncjyAQ==
# =XmkU
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 29 May 2025 03:05:00 EDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (77 commits)
  target/i386/tcg/helper-tcg: fix file references in comments
  target/i386: Add support for EPYC-Turin model
  target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
  target/i386: Add couple of feature bits in CPUID_Fn80000021_EAX
  target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
  target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits
  target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits
  rust: make declaration of dependent crates more consistent
  docs: Add TDX documentation
  i386/tdx: Validate phys_bits against host value
  i386/tdx: Make invtsc default on
  i386/tdx: Don't treat SYSCALL as unavailable
  i386/tdx: Fetch and validate CPUID of TD guest
  target/i386: Print CPUID subleaf info for unsupported feature
  i386: Remove unused parameter "uint32_t bit" in feature_word_description()
  i386/cgs: Introduce x86_confidential_guest_check_features()
  i386/tdx: Define supported KVM features for TDX
  i386/tdx: Add XFD to supported bit of TDX
  i386/tdx: Add supported CPUID bits relates to XFAM
  i386/tdx: Add supported CPUID bits related to TD Attributes
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-nbd-2025-05-29' of https://repo.or.cz/qemu/ericb into staging

NBD patches for 2025-05-29

- Nir Soffer: Allow for larger Unix socket buffers in NBD
- Eric Blake: clean up mirror-sparse iotest issues

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmg42T0ACgkQp6FrSiUn
# Q2r5nwgAg4ftfPBnynqL54dQ6rPKPOwW3n4Ei26EsC86OcFIGEGuCK6UGBH4bH6d
# BgyjNWY/6/t90vnXcBGVFmxrugHGh3TwOpAY08TqW0LGmpJiwX5wZTk3cVbcwXat
# ME8oYeOQwLwqboFthlgnXsUuQrKtXrkY27154ztH354x4bi5AmHi//Or4+EdFf8L
# /cCmS7uHPiHV9l1+U1hV4i1UQ+3rWHIOcfn/sKeEwPfrlyEW+2fxWUjl7qyf/Mqz
# EwCtkjz4WsFTxYyQPN6r3NyoEIZDRK27srubVhat6Fk9gOnR5Rh2MCntyxUpXmo5
# 4xD3QkVbXVRhXv6n6rjmA/Q3bvZ1oQ==
# =yjPj
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 29 May 2025 18:01:33 EDT
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* tag 'pull-nbd-2025-05-29' of https://repo.or.cz/qemu/ericb:
  iotests: Filter out ZFS in several tests
  iotests: Improve mirror-sparse on ext4 and xfs
  iotests: Use disk_usage in more places
  nbd: Set unix socket send buffer on Linux
  nbd: Set unix socket send buffer on macOS
  io: Add helper for setting socket send buffer size

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

iotests: Filter out ZFS in several tests

Fiona reported that ZFS makes sparse file testing awkward, since:
- it has asynchronous allocation (not even 'fsync $file' makes du see
  the desired size; it takes the slower 'fsync -f $file' which is not
  appropriate for the tests)
- for tests of fully allocated files, ZFS with compression enabled
  still reports smaller disk usage

Add a new _require_disk_usage that quickly probes whether an attempt
to create a sparse 5M file shows as less than 1M usage, while the same
file with -o preallocation=full shows as more than 4M usage without
sync, which should filter out ZFS behavior.  Then use it in various
affected tests.

This does not add the new filter on all tests that Fiona is seeing ZFS
failures on, but only those where I could quickly spot that there is
at least one place where the test depends on the output of 'du -b' or
'stat -c %b'.

Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250523163041.2548675-8-eblake@redhat.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>

iotests: Improve mirror-sparse on ext4 and xfs

Fiona reported that an ext4 filesystem on top of LVM can sometimes
report over-allocation to du (based on the heuristics the filesystem
is making while observing the contents being mirrored); even though
the contents and actual size matched, about 50% of the time the size
reported by disk_usage was too large by 4k, failing the test. In
auditing other iotests, this is a common problem we've had to deal
with.

Meanwhile, Markus reported that an xfs filesystem reports disk usage
at a default granularity of 1M (so the sparse file occupies 3M, since
it has just over 2M data).

Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Fixes: c0ddcb2c ("tests: Add iotest mirror-sparse for recent patches")
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250523163041.2548675-7-eblake@redhat.com>
[eblake: Also fix xfs issue]
Signed-off-by: Eric Blake <eblake@redhat.com>

iotests: Use disk_usage in more places

Commit be9bac07 added a utility disk_usage function, but there are
a couple of other tests that could also use it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250523163041.2548675-6-eblake@redhat.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>

nbd: Set unix socket send buffer on Linux

Like macOS we have similar issue on Linux. For TCP socket the send
buffer size is 2626560 bytes (~2.5 MiB) and we get good performance.
However for unix socket the default and maximum buffer size is 212992
bytes (208 KiB) and we see poor performance when using one NBD
connection, up to 4 times slower than macOS on the same machine.

Tracing shows that for every 2 MiB payload (qemu uses 2 MiB io size), we
do 1 recvmsg call with TCP socket, and 10 recvmsg calls with unix
socket.

Fixing this issue requires changing the maximum send buffer size (the
receive buffer size is ignored). This can be done using:

    $ cat /etc/sysctl.d/net-mem-max.conf
    net.core.wmem_max = 2097152

    $ sudo sysctl -p /etc/sysctl.d/net-mem-max.conf

With this we can set the socket buffer size to 2 MiB. With the defaults
the value requested by qemu is clipped to the maximum size and has no
effect.

I tested on 2 machines:
- Fedora 42 VM on MacBook Pro M2 Max
- Dell PowerEdge R640 (Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz)

On the older Dell machine we see very little improvement, up to 1.03
higher throughput. On the M2 machine we see up to 2.67 times higher
throughput. The following results are from the M2 machine.

Reading from qemu-nbd with qemu-img convert. In this test buffer size of
4m is optimal (2.28 times faster).

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |   4.292 |   0.243 |   1.604 |
|      524288 |   2.167 |   0.058 |   1.288 |
|     1048576 |   2.041 |   0.060 |   1.238 |
|     2097152 |   1.884 |   0.060 |   1.191 |
|     4194304 |   1.881 |   0.054 |   1.196 |

Writing to qemu-nbd with qemu-img convert. In this test buffer size of
1m is optimal (2.67 times faster).

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |   3.113 |   0.334 |   1.094 |
|      524288 |   1.173 |   0.179 |   0.654 |
|     1048576 |   1.164 |   0.164 |   0.670 |
|     2097152 |   1.227 |   0.197 |   0.663 |
|     4194304 |   1.227 |   0.198 |   0.666 |

Computing a blkhash with nbdcopy. In this test buffer size of 512k is
optimal (1.19 times faster).

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |   2.140 |   4.483 |   2.681 |
|      524288 |   1.794 |   4.467 |   2.572 |
|     1048576 |   1.807 |   4.447 |   2.644 |
|     2097152 |   1.822 |   4.461 |   2.698 |
|     4194304 |   1.827 |   4.465 |   2.700 |

Computing a blkhash with blksum. In this test buffer size of 4m is
optimal (2.65 times faster).

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |   3.582 |   4.595 |   2.392 |
|      524288 |   1.499 |   4.384 |   1.482 |
|     1048576 |   1.377 |   4.381 |   1.345 |
|     2097152 |   1.388 |   4.389 |   1.354 |
|     4194304 |   1.352 |   4.395 |   1.302 |

Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-4-nirsof@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd: Set unix socket send buffer on macOS

On macOS we need to increase unix socket buffers size on the client and
server to get good performance. We set socket buffers on macOS after
connecting or accepting a client connection.

Testing shows that setting socket receive buffer size (SO_RCVBUF) has no
effect on performance, so we set only the send buffer size (SO_SNDBUF).
It seems to work like Linux but not documented.

Testing shows that optimal buffer size is 512k to 4 MiB, depending on
the test case. The difference is very small, so I chose 2 MiB.

I tested reading from qemu-nbd and writing to qemu-nbd with qemu-img and
computing a blkhash with nbdcopy and blksum.

To focus on NBD communication and get less noisy results, I tested
reading and writing to null-co driver. I added a read-pattern option to
the null-co driver to return data full of 0xff:

    NULL="json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '10g', 'read-pattern': 255}}"

For testing buffer size I added an environment variable for setting the
socket buffer size.

Read from qemu-nbd via qemu-img convert. In this test buffer size of 2m
is optimal (12.6 times faster).

    qemu-nbd -r -t -e 0 -f raw -k /tmp/nbd.sock "$NULL" &
    qemu-img convert -f raw -O raw -W -n "nbd+unix:///?socket=/tmp/nbd.sock" "$NULL"

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |  13.361 |   2.653 |   5.702 |
|       65536 |   2.283 |   0.204 |   1.318 |
|      131072 |   1.673 |   0.062 |   1.008 |
|      262144 |   1.592 |   0.053 |   0.952 |
|      524288 |   1.496 |   0.049 |   0.887 |
|     1048576 |   1.234 |   0.047 |   0.738 |
|     2097152 |   1.060 |   0.080 |   0.602 |
|     4194304 |   1.061 |   0.076 |   0.604 |

Write to qemu-nbd with qemu-img convert. In this test buffer size of 2m
is optimal (9.2 times faster).

    qemu-nbd -t -e 0 -f raw -k /tmp/nbd.sock "$NULL" &
    qemu-img convert -f raw -O raw -W -n "$NULL" "nbd+unix:///?socket=/tmp/nbd.sock"

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |   8.063 |   2.522 |   4.184 |
|       65536 |   1.472 |   0.430 |   0.867 |
|      131072 |   1.071 |   0.297 |   0.654 |
|      262144 |   1.012 |   0.239 |   0.587 |
|      524288 |   0.970 |   0.201 |   0.514 |
|     1048576 |   0.895 |   0.184 |   0.454 |
|     2097152 |   0.877 |   0.174 |   0.440 |
|     4194304 |   0.944 |   0.231 |   0.535 |

Compute a blkhash with nbdcopy, using 4 NBD connections and 256k request
size. In this test buffer size of 4m is optimal (5.1 times faster).

    qemu-nbd -r -t -e 0 -f raw -k /tmp/nbd.sock "$NULL" &
    nbdcopy --blkhash "nbd+unix:///?socket=/tmp/nbd.sock" null:

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |   8.624 |   5.727 |   6.507 |
|       65536 |   2.563 |   4.760 |   2.498 |
|      131072 |   1.903 |   4.559 |   2.093 |
|      262144 |   1.759 |   4.513 |   1.935 |
|      524288 |   1.729 |   4.489 |   1.924 |
|     1048576 |   1.696 |   4.479 |   1.884 |
|     2097152 |   1.710 |   4.480 |   1.763 |
|     4194304 |   1.687 |   4.479 |   1.712 |

Compute a blkhash with blksum, using 1 NBD connection and 256k read
size. In this test buffer size of 512k is optimal (10.3 times faster).

    qemu-nbd -r -t -e 0 -f raw -k /tmp/nbd.sock "$NULL" &
    blksum "nbd+unix:///?socket=/tmp/nbd.sock"

| buffer size | time    | user    | system  |
|-------------|---------|---------|---------|
|     default |  13.085 |   5.664 |   6.461 |
|       65536 |   3.299 |   5.106 |   2.515 |
|      131072 |   2.396 |   4.989 |   2.069 |
|      262144 |   1.607 |   4.724 |   1.555 |
|      524288 |   1.271 |   4.528 |   1.224 |
|     1048576 |   1.294 |   4.565 |   1.333 |
|     2097152 |   1.299 |   4.569 |   1.344 |
|     4194304 |   1.291 |   4.559 |   1.327 |

Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-3-nirsof@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

io: Add helper for setting socket send buffer size

Testing reading and writing from qemu-nbd using a unix domain socket
shows that the platform default send buffer size is too low, leading to
poor performance and hight cpu usage.

Add a helper for setting socket send buffer size to be used in NBD code.
It can also be used in other contexts.

We don't need a helper for receive buffer size since it is not used with
unix domain sockets. This is documented for Linux, and not documented
for macOS.

Failing to set the socket buffer size is not a fatal error, but the
caller may want to warn about the failure.

Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-2-nirsof@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

Merge tag 'pull-qapi-2025-05-28' of https://repo.or.cz/qemu/armbru into staging

QAPI patches patches for 2025-05-28

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmg3UTYSHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTz9cQALqbici35rI19BYR8XNTcIK1sS6iB9wx
# 6vLLix7a+/vsmBXiHEfo6nnlTBsU1NVP+8Tvx8+6TRBUnjK+9YaPZHE8k6lGglWm
# 5lbue7nUlzaF4TfTmqrsCeeYKvc8iwC5TUBHbsLNpf9IIvNHbRm4IrD4ySnur+mN
# vTQWEvLkT9quh5KPaiZajlQulIpaFZjwREJ2U8LC6Tb+t0qtAGL6hc+etI49re6A
# 2jJq29G+hSxK87FBFwgilh4Dl5DCDAe75Plp1Opy0wyowM06ilSATYBJ6SL4B3wg
# RKQXmHiHZCxk+MLs3vhE65bhNmMLkf+xbY/jxSNs5Hisj4Snt7bLqWRaBAhkRZOz
# ZCyGMI6lpJELo8VIEE2gB8m/kf6YAG4pfLdZkIZCuFyW2I6b3OQjOn5G0td6JtvX
# a5ygtuzi8VIxA3FcODb/EMNAPOv6B4aHgW3IaiwLB2kgeiqR+yMIE6zqZZHrEGUl
# A/S7an99vbHgSFPtJ37VaUEdDnb06b4ebIvNyBzrgtXO8ekHaXAjCh52UYkLFOJe
# S0dBrENj6M1yJ8HPwqWgP25PdlBAbCGHCsaZScrv7j08Q7sNJbQz0mmrCi0V/djV
# riZBVcODabQ9mveMc1KJplKwIg351YJk3XwHqMLKHw9srMl3z3YcZf6T3e/G3ScQ
# rlqRDslZvvgd
# =3NrB
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 May 2025 14:08:54 EDT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-qapi-2025-05-28' of https://repo.or.cz/qemu/armbru:
  qapi: use imperative style in documentation
  qapi: make all generated files common
  qapi: remove qapi_specific_outputs from meson.build
  qapi: make s390x specific CPU commands unconditionally available
  qapi: make most CPU commands unconditionally available
  qapi: Make CpuModelExpansionInfo::deprecated-props optional and generic
  qapi: remove the misc-target.json file
  qapi: make Xen event commands unconditionally available
  qapi: make SGX commands unconditionally available
  qapi: expose query-gic-capability command unconditionally
  qapi: make SEV commands unconditionally available
  qapi: expand docs for SEV commands
  qapi: expose rtc-reset-reinjection command unconditionally

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-misc-2025-05-28' of https://repo.or.cz/qemu/armbru into staging

Miscellaneous patches for 2025-05-28

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmg2+5ISHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTZnUP/jl0/gGZujGVQVPtKKF5ZlbQauTeun9q
# Odd8cAbusQhgWnMrmyPDcvrmsVCcFg+8mIRU38kY43oTIKzig4q8/qB3+zcO4r0u
# 4FZKO1sbqx9ByQTqQlXQxzqFcFtTS8vP0iQ2OkupnPBK41JcvFExnMovXHD3HTRd
# om7FKeEb//oplSWW66sHmH6Dco6AcdO+2rSMLRRLftq3QH0bXlbNdaLl+CNfoRmd
# VcqjkKHYCPDGY3u1vcbY97qKiju/Yg7lQZmtGJ2MfAFB9saLhCi2URYglUAbCmDK
# f6pp/vzVf+kNi8XYDcFoAqk85k5J8jXUXV9HekEPGIi8Jqz7bdCwLSdSGVBzLjOd
# sQcB6gZDKA5/JmK6jtRZYiHS70Izn0ZZec0B4xuzFA3saRg42H4Yj+MeoFBGI9HE
# 58S6GOz6R1tPD8ZVW366adlyjHQbJdiYz/MYNWBqEMJ1qhmAzcnkXxwWN+sEAchE
# vjQ4ZqZdcfjFydjoceYsJagIUHTQDR7ATvRg2TVYTdgzd/dS6ZC1dEiJNu18fhr1
# Io4cDhCUoKJmIJFKa/R9Egh2ZuH7TX3XapZahNiae2cXpDbbcgGml62a0cyIEzo4
# leFm0vmhpdSKQUQnTpsUAb7vFQkygoUOmRNAo44eXOxuEZWUCm/jcjBjjhbrazGY
# 40701r6nEDZv
# =BDzG
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 May 2025 08:03:30 EDT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-misc-2025-05-28' of https://repo.or.cz/qemu/armbru:
  docs/about/removed-features: Move removal notes to tidy up order
  docs/about/deprecated: Move deprecation notes to tidy up order
  qapi/migration: Deprecate migrate argument @detach
  docs/about: Belatedly document tightening of QMP device_add checking

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-tcg-20250528' of https://gitlab.com/rth7680/qemu into staging

accel/tcg: Fix atomic_mmu_lookup vs TLB_FORCE_SLOW
linux-user: implement pgid field of /proc/self/stat
target/sh4: Use MO_ALIGN for system UNALIGN()
target/microblaze: Use TARGET_LONG_BITS == 32 for system mode
accel/tcg: Add TCGCPUOps.pointer_wrap
target/*: Populate TCGCPUOps.pointer_wrap

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmg2xZAdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/VmAgAu5PHIARUuNqneUPQ
# 2JxqpZHGVbaXE0ACi9cslpfThFM/I4OXmK21ZWb1dHB3qasNiKU8cdImXSUVH3dj
# DLsr/tliReerZGUoHEtFsYd+VOtqb3wcrvXxnzG/xB761uZjFCnqwy4MrXMfSXVh
# 6w+eysWOblYHQb9rAZho4nyw6BgjYAX2vfMFxLJBcDP/fjILFB7xoXHEyqKWMmE1
# 0enA0KUotyLOCRXVEXSsfPDYD8szXfMkII3YcGnscthm5j58oc3skVdKFGVjNkNb
# /aFpyvoU7Vp3JpxkYEIWLQrRM75VSb1KzJwMipHgYy3GoV++BrY10T0jyEPrx0iq
# RFzK4A==
# =XQzq
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 May 2025 04:13:04 EDT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* tag 'pull-tcg-20250528' of https://gitlab.com/rth7680/qemu: (28 commits)
  accel/tcg: Assert TCGCPUOps.pointer_wrap is set
  target/sparc: Fill in TCGCPUOps.pointer_wrap
  target/s390x: Fill in TCGCPUOps.pointer_wrap
  target/riscv: Fill in TCGCPUOps.pointer_wrap
  target/ppc: Fill in TCGCPUOps.pointer_wrap
  target/mips: Fill in TCGCPUOps.pointer_wrap
  target/loongarch: Fill in TCGCPUOps.pointer_wrap
  target/i386: Fill in TCGCPUOps.pointer_wrap
  target/arm: Fill in TCGCPUOps.pointer_wrap
  target: Use cpu_pointer_wrap_uint32 for 32-bit targets
  target: Use cpu_pointer_wrap_notreached for strict align targets
  accel/tcg: Add TCGCPUOps.pointer_wrap
  target/sh4: Use MO_ALIGN for system UNALIGN()
  tcg: Drop TCGContext.page_{mask,bits}
  tcg: Drop TCGContext.tlb_dyn_max_bits
  target/microblaze: Simplify compute_ldst_addr_type{a,b}
  target/microblaze: Drop DisasContext.r0
  target/microblaze: Use TARGET_LONG_BITS == 32 for system mode
  target/microblaze: Fix printf format in mmu_translate
  target/microblaze: Use TCGv_i64 for compute_ldst_addr_ea
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'qemu-sparc-20250527' of https://github.com/mcayland/qemu into staging

qemu-sparc queue

# -----BEGIN PGP SIGNATURE-----
#
# iQFSBAABCgA8FiEEzGIauY6CIA2RXMnEW8LFb64PMh8FAmg2K6QeHG1hcmsuY2F2
# ZS1heWxhbmRAaWxhbmRlLmNvLnVrAAoJEFvCxW+uDzIfLHcIAJeHFKWI/CFrsuu4
# DkbizEY7g6DPROg11XfL/EPIJdCQwDM5b1uWqUq0QajtCZBM6VUEnQ/VrhAo8ZRX
# xEzK6119bZx68hGRKQIEhpRBX72OoqCb5poP/Xo8xgtSrHpD0EkW6L05YnUl3o6h
# 1pWFA5ivtKOvUoAzzBKw0EH0UuOXr1sX/SwYrmwbiOs6oY0U+sY8SaCucEhffKtb
# po1WBXNyZxGwJdLfypf0HoytKLS/09LgYnMGAcQT6VPovhyaAV1453gq8k6tmoR5
# myfJ8+Y8jeRFRinzP3DSt8+TlRJQijUYXD+MclsijuY6BV/rZYWkNBcAyPyu9m0w
# Pd04v34=
# =s84p
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 27 May 2025 17:16:20 EDT
# gpg:                using RSA key CC621AB98E82200D915CC9C45BC2C56FAE0F321F
# gpg:                issuer "mark.cave-ayland@ilande.co.uk"
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>" [full]
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C  C9C4 5BC2 C56F AE0F 321F

* tag 'qemu-sparc-20250527' of https://github.com/mcayland/qemu:
  target/sparc: don't set FSR_NVA when comparing unordered floats

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

target/i386/tcg/helper-tcg: fix file references in comments

Commit 32cad1ffb8 ("include: Rename sysemu/ -> system/") renamed
target/i386/tcg/sysemu => target/i386/tcg/system.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250526114447.1243840-1-f.ebner@proxmox.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Add support for EPYC-Turin model

Add the support for AMD EPYC zen 5 processors (EPYC-Turin).

Add the following new feature bits on top of the feature bits from
the previous generation EPYC models.

movdiri             : Move Doubleword as Direct Store Instruction
movdir64b           : Move 64 Bytes as Direct Store Instruction
avx512-vp2intersect : AVX512 Vector Pair Intersection to a Pair
                      of Mask Register
avx-vnni            : AVX VNNI Instruction
prefetchi           : Indicates support for IC prefetch
sbpb                : Selective Branch Predictor Barrier
ibpb-brtype         : IBPB includes branch type prediction flushing
srso-user-kernel-no : Not vulnerable to SRSO at the user-kernel boundary

Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Link: https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/b4fa7708a0e1453d2e9b8ec3dc881feb92eeca0b.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits

Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.

L2.self_init should be true.
L2.inclusive should be true.

L3.inclusive should not be true.
L3.no_invd_sharing should be true.

Fix these cache properties.

Also add the missing RAS and SVM features bits on AMD EPYC-Genoa model.
The SVM feature bits are used in nested guests.

perfmon-v2     : Allow guests to make use of the PerfMonV2 features.
succor         : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv           : LBR virtualization
tsc-scale      : MSR based TSC rate control
vmcb-clean     : VMCB clean bits
flushbyasid    : Flush by ASID
pause-filter   : Pause intercept filter
pfthreshold    : PAUSE filter threshold
v-vmsave-vmload: Virtualized VMLOAD and VMSAVE
vgif           : Virtualized GIF
fs-gs-base-ns  : WRMSR to {FS,GS,KERNEL_GS}_BASE is non-serializing

The feature details are available in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/afe3f05d4116124fd5795f28fc23d7b396140313.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Add couple of feature bits in CPUID_Fn80000021_EAX

Add CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or
MSR_KERNEL_GS_BASE is non-serializing amd PREFETCHI that the indicates
support for IC prefetch.

CPUID_Fn80000021_EAX
Bit    Feature description
20     Indicates support for IC prefetch.
1      FsGsKernelGsBaseNonSerializing.
       WRMSR to FS_BASE, GS_BASE and KernelGSbase are non-serializing.

Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/a5f6283a59579b09ac345b3f21ecb3b3b2d92451.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits

Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.

L2.self_init should be true.
L2.inclusive should be true.

L3.inclusive should not be true.
L3.no_invd_sharing should be true.

Fix these cache properties.

Also add the missing RAS and SVM features bits on AMD EPYC-Milan model.
The SVM feature bits are used in nested guests.

succor          : Software uncorrectable error containment and recovery capability.
overflow-recov  : MCA overflow recovery support.
lbrv            : LBR virtualization
tsc-scale       : MSR based TSC rate control
vmcb-clean      : VMCB clean bits
flushbyasid     : Flush by ASID
pause-filter    : Pause intercept filter
pfthreshold     : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif            : Virtualized GIF

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/c619c0e09a9d5d496819ed48d69181d65f416891.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits

Found that some of the cache properties are not set correctly for EPYC models.

l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.

L2.self_init should be true.
L2.inclusive should be true.

L3.inclusive should not be true.
L3.no_invd_sharing should be true.

Fix these cache properties.

Also add the missing RAS and SVM features bits on AMD EPYC-Rome. The SVM
feature bits are used in nested guests.

succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/8265af72057b84c99ac3a02a5487e32759cc69b1.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits

Found that some of the cache properties are not set correctly for EPYC models.

l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.

L2.self_init should be true.
L2.inclusive should be true.

L3.inclusive should not be true.
L3.no_invd_sharing should be true.

Fix the cache properties.

Also add the missing RAS and SVM features bits on AMD
EPYC CPU models. The SVM feature bits are used in nested guests.

succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/515941861700d7066186c9600bc5d96a1741ef0c.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rust: make declaration of dependent crates more consistent

Crates like "bilge" and "libc" can be shared by more than one directory,
so declare them directly in rust/meson.build. While at it, make their
variable names end with "_rs" and always add a subproject() statement
(as that pinpoints the error better if the subproject is missing and
cannot be downloaded).

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

docs: Add TDX documentation

Add docs/system/i386/tdx.rst for TDX support, and add tdx in
confidential-guest-support.rst

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-56-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Validate phys_bits against host value

For TDX guest, the phys_bits is not configurable and can only be
host/native value.

Validate phys_bits inside tdx_check_features().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-55-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Make invtsc default on

Because it's fixed1 bit that enforced by TDX module.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-54-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Don't treat SYSCALL as unavailable

On Intel CPU, the value of CPUID_EXT2_SYSCALL depends on the mode of
the vcpu. It's 0 outside 64-bit mode and 1 in 64-bit mode.

The initial state of TDX vcpu is 32-bit protected mode. At the time of
calling KVM_TDX_GET_CPUID, vcpu hasn't started running so the value read
is 0.

In reality, 64-bit mode should always be supported. So mark
CPUID_EXT2_SYSCALL always supported to avoid false warning.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-53-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Fetch and validate CPUID of TD guest

Use KVM_TDX_GET_CPUID to get the CPUIDs that are managed and enfored
by TDX module for TD guest. Check QEMU's configuration against the
fetched data.

Print wanring message when 1. a feature is not supported but requested
by QEMU or 2. QEMU doesn't want to expose a feature while it is enforced
enabled.

- If cpu->enforced_cpuid is not set, prints the warning message of both
1) and 2) and tweak QEMU's configuration.

- If cpu->enforced_cpuid is set, quit if any case of 1) or 2).

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-52-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Print CPUID subleaf info for unsupported feature

Some CPUID leaves have meaningful subleaf index. Print the subleaf info
in feature_word_description for CPUID features.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241217123932.948789-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Remove unused parameter "uint32_t bit" in feature_word_description()

Parameter "uint32_t bit" is not used in function feature_word_description(),
so remove it.

Signed-off-by: Lei Wang <lei4.wang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241217123932.948789-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cgs: Introduce x86_confidential_guest_check_features()

To do cgs specific feature checking. Note the feature checking in
x86_cpu_filter_features() is valid for non-cgs VMs. For cgs VMs like
TDX, what features can be supported has more restrictions.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-51-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Define supported KVM features for TDX

For TDX, only limited KVM PV features are supported.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-50-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Add XFD to supported bit of TDX

Just mark XFD as always supported for TDX. This simple solution relies
on the fact KVM will report XFD as 0 when it's not supported by the
hardware.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-49-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Add supported CPUID bits relates to XFAM

Some CPUID bits are controlled by XFAM. They are not covered by
tdx_caps.cpuid (which only contians the directly configurable bits), but
they are actually supported when the related XFAM bit is supported.

Add these XFAM controlled bits to TDX supported CPUID bits based on the
supported_xfam.

Besides, incorporate the supported_xfam into the supported CPUID leaf of
0xD.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-48-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Add supported CPUID bits related to TD Attributes

For TDX, some CPUID feature bit is configured via TD attributes. They
are not covered by tdx_caps.cpuid (which only contians the directly
configurable CPUID bits), but they are actually supported when the
related attributre bit is supported.

Note, LASS and KeyLocker are not supported by KVM for TDX, nor does
QEMU support it (see TDX_SUPPORTED_TD_ATTRS). They are defined in
tdx_attrs_maps[] for the completeness of the existing TD Attribute
bits that are related with CPUID features.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-47-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Add TDX fixed1 bits to supported CPUIDs

TDX architecture forcibly sets some CPUID bits for TD guest that VMM
cannot disable it. They are fixed1 bits.

Fixed1 bits are not covered by tdx_caps.cpuid (which only contains the
directly configurable bits), while fixed1 bits are supported for TD guest
obviously.

Add fixed1 bits to tdx_supported_cpuid. Besides, set all the fixed1
bits to the initial set of KVM's support since KVM might not report them
as supported.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-46-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Implement adjust_cpuid_features() for TDX

Maintain a TDX specific supported CPUID set, and use it to mask the
common supported CPUID value of KVM. It can avoid newly added supported
features (reported via KVM_GET_SUPPORTED_CPUID) for common VMs being
falsely reported as supported for TDX.

As the first step, initialize the TDX supported CPUID set with all the
configurable CPUID bits. It's not complete because there are other CPUID
bits are supported for TDX but not reported as directly configurable.
E.g. the XFAM related bits, attribute related bits and fixed-1 bits.
They will be handled in the future.

Also, what matters are the CPUID bits related to QEMU's feature word.
Only mask the CPUID leafs which are feature word leaf.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-45-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cgs: Rename *mask_cpuid_features() to *adjust_cpuid_features()

Because for TDX case, there are also fixed-1 bits that enforced by TDX
module.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-44-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cpu: Don't set vcpu_dirty when guest_state_protected

QEMU calls kvm_arch_put_registers() when vcpu_dirty is true in
kvm_vcpu_exec(). However, for confidential guest, like TDX, putting
registers is disallowed due to guest state is protected.

Only set vcpu_dirty to true with guest state is not protected when
creating the vcpu.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-43-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/apic: Skip kvm_apic_put() for TDX

KVM neithers allow writing to MSR_IA32_APICBASE for TDs, nor allow for
KVM_SET_LAPIC[*].

Note, KVM_GET_LAPIC is also disallowed for TDX. It is called in the path

  do_kvm_cpu_synchronize_state()
  -> kvm_arch_get_registers()
     -> kvm_get_apic()

and it's already disllowed for confidential guest through
guest_state_protected.

[*] https://lore.kernel.org/all/Z3w4Ku4Jq0CrtXne@google.com/

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-42-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs

For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
by VMM, while the features enumerated/controlled by other MSRs except
MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.

Only configure MSR_IA32_UCODE_REV for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-41-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Don't synchronize guest tsc for TDs

TSC of TDs is not accessible and KVM doesn't allow access of
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() noop for TDs,

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-40-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Set and check kernel_irqchip mode for TDX

KVM mandates kernel_irqchip to be split mode.

Set it to split mode automatically when users don't provide an explicit
value, otherwise check it to be the split mode.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-39-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Disable PIC for TDX VMs

Legacy PIC (8259) cannot be supported for TDX VMs since TDX module
doesn't allow directly interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Hence disable PIC for TDX VMs and error out if user wants PIC.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-38-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Disable SMM for TDX VMs

TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
VMM cannot manipulate TDX VM's memory.

Disable SMM for TDX VMs and error out if user requests to enable SMM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-37-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM

TDX only supports readonly for shared memory but not for private memory.

In the view of QEMU, it has no idea whether a memslot is used as shared
memory of private. Thus just mark kvm_readonly_mem_enabled to false to
TDX VM for simplicity.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-36-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Force exposing CPUID 0x1f

TDX uses CPUID 0x1f to configure TD guest's CPU topology. So set
enable_cpuid_0x1f for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-35-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f

Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
when topology level that cannot be enumerated by leaf 0xB, e.g., die or
module level, are configured for the guest, e.g., -smp xx,dies=2.

However, TDX architecture forces to require CPUID 0x1f to configure CPU
topology.

Introduce a bool flag, enable_cpuid_0x1f, in CPU for the case that
requires CPUID leaf 0x1f to be exposed to guest.

Introduce a new function x86_has_cpuid_0x1f(), which is the wrapper of
cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
to enable cpuid leaf 0x1f for the guest.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-34-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: implement tdx_cpu_instance_init()

Currently, pmu is not supported for TDX by KVM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-33-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: introduce x86_confidential_guest_cpu_instance_init()

To allow execute confidential guest specific cpu init operations.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-32-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: Check KVM_CAP_MAX_VCPUS at vm level

KVM with TDX support starts to report different KVM_CAP_MAX_VCPUS per
different VM types. So switch to check the KVM_CAP_MAX_VCPUS at vm level.

KVM still returns the global KVM_CAP_MAX_VCPUS when the KVM is old that
doesn't report different value at vm level.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-31-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility

Integrate TDX's TDX_REPORT_FATAL_ERROR into QEMU GuestPanic facility

Originated-from: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-30-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Handle KVM_SYSTEM_EVENT_TDX_FATAL

TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request
termination. KVM translates such request into KVM_EXIT_SYSTEM_EVENT with
type of KVM_SYSTEM_EVENT_TDX_FATAL.

Add hanlder for such exit. Parse and print the error message, and
terminate the TD guest in the handler.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-29-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Enable user exit on KVM_HC_MAP_GPA_RANGE

KVM translates TDG.VP.VMCALL<MapGPA> to KVM_HC_MAP_GPA_RANGE, and QEMU
needs to enable user exit on KVM_HC_MAP_GPA_RANGE in order to handle the
memory conversion requested by TD guest.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-28-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Finalize TDX VM

Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-27-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu

TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT) and KVM
provides vcpu level IOCTL KVM_TDX_INIT_VCPU for it.

KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
each vcpu after HOB list is created.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-26-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION

TDVF firmware (CODE and VARS) needs to be copied to TD's private
memory via KVM_TDX_INIT_MEM_REGION, as well as TD HOB and TEMP memory.

If the TDVF section has TDVF_SECTION_ATTRIBUTES_MR_EXTEND set in the
flag, calling KVM_TDX_EXTEND_MEMORY to extend the measurement.

After populating the TDVF memory, the original image located in shared
ramblock can be discarded.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-25-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Setup the TD HOB list

The TD HOB list is used to pass the information from VMM to TDVF. The TD
HOB must include PHIT HOB and Resource Descriptor HOB. More details can
be found in TDVF specification and PI specification.

Build the TD HOB in TDX's machine_init_done callback.

Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-24-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

headers: Add definitions from UEFI spec for volumes, resources, etc...

Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).

All values come from the UEFI specification [1], PI spec [2] and TDVF
design guide[3].

[1] UEFI Specification v2.1.0 https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf
[2] UEFI PI spec v1.8 https://uefi.org/sites/default/files/resources/UEFI_PI_Spec_1_8_March3.pdf
[3] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-23-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Track RAM entries for TDX VM

The RAM of TDX VM can be classified into two types:

- TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
   accepted by TDX guest before it can be used and will be all-zeros
   after being accepted.

- TDX_RAM_ADDED: the RAM that is ADD'ed to TD guest before running, and
   can be used directly. E.g., TD HOB and TEMP MEM that needed by TDVF.

Maintain TdxRamEntries[] which grabs the initial RAM info from e820 table
and mark each RAM range as default type TDX_RAM_UNACCEPTED.

Then turn the range of TD HOB and TEMP MEM to TDX_RAM_ADDED since these
ranges will be ADD'ed before TD runs and no need to be accepted runtime.

The TdxRamEntries[] are later used to setup the memory TD resource HOB
that passes memory info from QEMU to TDVF.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-22-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Track mem_ptr for each firmware entry of TDVF

For each TDVF sections, QEMU needs to copy the content to guest
private memory via KVM API (KVM_TDX_INIT_MEM_REGION).

Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
pointer of each TDVF sections. So that QEMU can add/copy them to guest
private memory later.

TDVF sections can be classified into two groups:
- Firmware itself, e.g., TDVF BFV and CFV, that located separately from
guest RAM. Its memory pointer is the bios pointer.

- Sections located at guest RAM, e.g., TEMP_MEM and TD_HOB.
mmap a new memory range for them.

Register a machine_init_done callback to do the stuff.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-21-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Don't initialize pc.rom for TDX VMs

For TDX, the address below 1MB are entirely general RAM. No need to
initialize pc.rom memory region for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-20-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Parse TDVF metadata for TDX VM

After TDVF is loaded to bios MemoryRegion, it needs parse TDVF metadata.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-19-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdvf: Introduce function to parse TDVF metadata

TDX VM needs to boot with its specialized firmware, Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
guest memory prior to running the TDX VM.

A TDVF Metadata in TDVF image describes the structure of firmware.
QEMU refers to it to setup memory for TDVF. Introduce function
tdvf_parse_metadata() to parse the metadata from TDVF image and store
the info of each TDVF section.

TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only an 4-byte field that is the offset of TDX metadata to the end
of firmware file.

Select X86_FW_OVMF when TDX is enable to leverage existing functions
to parse and search OVMF's GUID-ed structures.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-18-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: load TDVF for TD guest

TDVF(OVMF) needs to run at private memory for TD guest. TDX cannot
support pflash device since it doesn't support read-only private memory.
Thus load TDVF(OVMF) with -bios option for TDs.

Use memory_region_init_ram_guest_memfd() to allocate the MemoryRegion
for TDVF because it needs to be located at private memory.

Also store the MemoryRegion pointer of TDVF since the shared ramblock of
it can be discared after it gets copied to private ramblock.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-17-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Implement user specified tsc frequency

Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and call VM
scope VM_SET_TSC_KHZ to set the tsc frequency of TD before KVM_TDX_INIT_VM.

Besides, sanity check the tsc frequency to be in the legal range and
legal granularity (required by TDX module).

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-16-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Set APIC bus rate to match with what TDX module enforces

TDX advertises core crystal clock with cpuid[0x15] as 25MHz for TD
guests and it's unchangeable from VMM. As a result, TDX guest reads
the APIC timer at the same frequency, 25MHz.

While KVM's default emulated frequency for APIC bus is 1GHz, set the
APIC bus rate to match with TDX explicitly to ensure KVM provide correct
emulated APIC timer for TD guest.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-15-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Support user configurable mrconfigid/mrowner/mrownerconfig

Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
can be provided for TDX attestation. Detailed meaning of them can be
found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/

Allow user to specify those values via property mrconfigid, mrowner and
mrownerconfig. They are all in base64 format.

example
-object tdx-guest, \
  mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
  mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
  mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-14-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Validate TD attributes

Validate TD attributes with tdx_caps that only supported bits are
allowed by KVM.

Besides, sanity check the attribute bits that have not been supported by
QEMU yet. e.g., debug bit, it will be allowed in the future when debug
TD support lands in QEMU.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20250508150002.689633-13-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Wire CPU features up with attributes of TD guest

For QEMU VMs,
- PKS is configured via CPUID_7_0_ECX_PKS, e.g., -cpu xxx,+pks and
- PMU is configured by x86cpu->enable_pmu, e.g., -cpu xxx,pmu=on

While the bit 30 (PKS) and bit 63 (PERFMON) of TD's attributes are also
used to configure the PKS and PERFMON/PMU of TD, reuse the existing
configuration interfaces of 'cpu' for TD's attributes.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-12-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Make sept_ve_disable set by default

For TDX KVM use case, Linux guest is the most major one. It requires
sept_ve_disable set. Make it default for the main use case. For other use
case, it can be enabled/disabled via qemu command line.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-11-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Add property sept-ve-disable for tdx-guest object

Bit 28 of TD attribute, named SEPT_VE_DISABLE. When set to 1, it disables
EPT violation conversion to #VE on guest TD access of PENDING pages.

Some guest OS (e.g., Linux TD guest) may require this bit as 1.
Otherwise refuse to boot.

Add sept-ve-disable property for tdx-guest object, for user to configure
this bit.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-10-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Initialize TDX before creating TD vcpus

Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu() that
KVM_TDX_INIT_VM configures global TD configurations, e.g. the canonical
CPUID config, and must be executed prior to creating vCPUs.

Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-9-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: Introduce kvm_arch_pre_create_vcpu()

Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.

The specific implementation for i386 will be added in the future patch.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-8-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object

It will need special handling for TDX VMs all around the QEMU.
Introduce is_tdx_vm() helper to query if it's a TDX VM.

Cache tdx_guest object thus no need to cast from ms->cgs every time.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-7-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES

KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.

Since there is no interface reporting how many cpuid configs contains in
KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
and abort when it exceeds KVM_MAX_CPUID_ENTRIES.

Besides, introduce the interfaces to invoke TDX "ioctls" at VCPU scope
in preparation.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-6-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context

Implement TDX specific ConfidentialGuestSupportClass::kvm_init()
callback, tdx_kvm_init().

Mark guest state is proctected for TDX VM. More TDX specific
initialization will be added later.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-5-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/tdx: Implement tdx_kvm_type() for TDX

TDX VM requires VM type to be KVM_X86_TDX_VM. Implement tdx_kvm_type()
as X86ConfidentialGuestClass->kvm_type.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-4-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Introduce tdx-guest object

Introduce tdx-guest object which inherits X86_CONFIDENTIAL_GUEST,
and will be used to create TDX VMs (TDs) by

qemu -machine ...,confidential-guest-support=tdx0 \
-object tdx-guest,id=tdx0

It has one QAPI member 'attributes' defined, which allows user to set
TD's attributes directly.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

rocker: do not pollute the namespace

Do not leave the __le* macros defined, in fact do not use them at all.  Fixes a
build failure on Alpine with the TDX patches:

In file included from ../hw/net/rocker/rocker_of_dpa.c:25:
../hw/net/rocker/rocker_hw.h:14:16: error: conflicting types for 'uint64_t'; have '__u64' {aka 'long long unsigned int'}
   14 | #define __le64 uint64_t
      |                ^~~~~~~~
In file included from /usr/include/stdint.h:20,
                 from ../include/qemu/osdep.h:111,
                 from ../hw/net/rocker/rocker_of_dpa.c:17:
/usr/include/bits/alltypes.h:136:25: note: previous declaration of 'uint64_t' with type 'uint64_t' {aka 'long unsigned int'}
  136 | typedef unsigned _Int64 uint64_t;
      |                         ^~~~~~~~

because the Linux headers include a typedef of __leNN.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qapi: use imperative style in documentation

As requested by Markus:
> We prefer imperative mood "Return" over "Returns".

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-14-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Change several more]

qapi: make all generated files common

Monolithic files (qapi_nonmodule_outputs) can now be compiled just
once, so we can remove qapi_util_outputs logic.
This removes the need for any specific_ss file.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-13-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: remove qapi_specific_outputs from meson.build

There is no more QAPI files that need to be compiled per target, so we
can remove this. qapi_specific_outputs is now empty, so we can remove
the associated logic in meson.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-12-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: make s390x specific CPU commands unconditionally available

This removes the TARGET_S390X and CONFIG_KVM conditions from the
CPU commands that are conceptually specific to s390x. Top level
stubs are provided to cope with non-s390x targets, or builds
without KVM.

The removal of CONFIG_KVM is justified by the fact there is no
conceptual difference between running 'qemu-system-s390x -accel tcg'
on a build with and without KVM built-in, so apps only using TCG
can't rely on the CONFIG_KVM in the schema.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-11-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: make most CPU commands unconditionally available

This removes the TARGET_* conditions from all the CPU commands
that are conceptually target independent. Top level stubs are
provided to cope with targets which do not currently implement
all of the commands. Adjust the doc comments accordingly.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-10-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: Make CpuModelExpansionInfo::deprecated-props optional and generic

We'd like to have some unified QAPI schema. Having a structure field
conditional to a target being built in is not very practical.

While @deprecated-props is only used by s390x target, it is generic
enough and could be used by other targets (assuming we expand
CpuModelExpansionType enum values).

Let's always include this field, regardless of the target, but make it
optional. This is not a compatibility break only because the field
remains present always on S390x.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-9-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: remove the misc-target.json file

This file is now empty and can thus be removed.

Observe the pre-existing bug with s390-skeys.c and target/i386/monitor.c
both including qapi-commands-misc-target.h despite not requiring it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-8-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: make Xen event commands unconditionally available

This removes the TARGET_I386 condition from the Xen event channel
commands, moving them to the recently introduced misc-i386.json
QAPI file, given they are inherantly i386 specific commands.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-7-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: make SGX commands unconditionally available

This removes the TARGET_I386 condition from the SGX confidential
virtualization commands, moving them to the recently introduced
misc-i386.json QAPI file, given they are inherantly i386 specific
commands.

Observe a pre-existing bug that the "SGXEPCSection" struct lacked
a TARGET_I386 condition, despite its only usage being behind a
TARGET_I386 condition.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-6-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: expose query-gic-capability command unconditionally

This removes the TARGET_ARM condition from the query-gic-capability
command. This requires providing a QMP command stub for non-ARM targets.
This in turn requires moving the command out of misc-target.json, since
that will trigger symbol poisoning errors when built from target
independent code.

Following the earlier precedent, this creates a misc-arm.json file to
hold this ARM specific command.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-5-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: make SEV commands unconditionally available

This removes the TARGET_I386 condition from the SEV confidential
virtualization commands, moving them to the recently introduced
misc-i386.json QAPI file, given they are inherantly i386 specific
commands.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-4-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

qapi: expand docs for SEV commands

This gives some more context about the behaviour of the commands in
unsupported guest configuration or platform scenarios.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-3-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Tweak query-sev doc, turn error descriptions into Errors sections,
delate a stray #, normalize whitespace, wrap lines]

qapi: expose rtc-reset-reinjection command unconditionally

This removes the TARGET_I386 condition from the rtc-reset-reinjection
command. This requires providing a QMP command stub for non-i386 target.
This in turn requires moving the command out of misc-target.json, since
that will trigger symbol poisoning errors when built from target
independent code.

Rather than putting the command into misc.json, it is proposed to create
misc-$TARGET.json files to hold commands whose impl is conceptually
only applicable to a single target. This gives an obvious docs hint to
consumers that the command is only useful in relation a specific target,
while misc.json is for commands applicable to 2 or more targets.

The current impl of qmp_rtc_reset_reinject() is a no-op if the i386
RTC is disabled in Kconfig, or if the running machine type lack any
RTC device.

The stub impl for non-i386 targets retains this no-op behaviour.
However, it is now reporting an Error mentioning this command is not
available for current target.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250522190542.588267-2-pierrick.bouvier@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

accel/tcg: Assert TCGCPUOps.pointer_wrap is set

All targets now provide the function, so we can
make the call unconditional.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/sparc: Fill in TCGCPUOps.pointer_wrap

Check address masking state for sparc64.

Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/s390x: Fill in TCGCPUOps.pointer_wrap

Use the existing wrap_address function.

Cc: qemu-s390x@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/riscv: Fill in TCGCPUOps.pointer_wrap

Check 32 vs 64-bit and pointer masking state.

Cc: qemu-riscv@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/ppc: Fill in TCGCPUOps.pointer_wrap

Check 32 vs 64-bit state.

Cc: qemu-ppc@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/mips: Fill in TCGCPUOps.pointer_wrap

Check 32 vs 64-bit addressing state.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/loongarch: Fill in TCGCPUOps.pointer_wrap

Check va32 state.

Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/i386: Fill in TCGCPUOps.pointer_wrap

Check 32 vs 64-bit state.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Fill in TCGCPUOps.pointer_wrap

For a-profile, check A32 vs A64 state.
For m-profile, use cpu_pointer_wrap_uint32.

Cc: qemu-arm@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target: Use cpu_pointer_wrap_uint32 for 32-bit targets

M68K, MicroBlaze, OpenRISC, RX, TriCore and Xtensa are
all 32-bit targets. AVR is more complicated, but using
a 32-bit wrap preserves current behaviour.

Cc: Michael Rolnik <mrolnik@gmail.com>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Tested-by Bastian Koppelmann <kbastian@mail.uni-paderborn.de> (tricore)
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>