gpu: host1x: Skip redundant syncpoint loads in host1x_syncpt_wait()

In host1x_syncpt_wait(), the hardware syncpoint value was loaded once
for the initial expiry check, and then loaded a second time to populate
the caller's value pointer. Reuse a single load for both purposes.
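
The single-load idea can be sketched in a small user-space model. This
is an illustration only, not the driver code: struct model_syncpt,
model_load(), model_is_expired() and model_wait_entry() are
hypothetical names, the register read is simulated with a counter, and
the expiry check is reduced to a wrap-safe comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a syncpoint; not the kernel struct. */
struct model_syncpt {
	uint32_t hw_value;       /* value the simulated register holds */
	uint32_t min_cached;     /* software shadow, refreshed on each load */
	unsigned int mmio_reads; /* number of simulated register reads */
};

/* Simulated MMIO load: read the register and refresh the shadow. */
static uint32_t model_load(struct model_syncpt *sp)
{
	sp->mmio_reads++;
	sp->min_cached = sp->hw_value;
	return sp->min_cached;
}

/* Simplified, wrap-safe check: has cur reached thresh? */
static bool model_is_expired(uint32_t cur, uint32_t thresh)
{
	return (int32_t)(cur - thresh) >= 0;
}

/* Entry path: one load feeds both the expiry check and *value. */
static bool model_wait_entry(struct model_syncpt *sp, uint32_t thresh,
			     uint32_t *value)
{
	uint32_t cur;

	assert(sp != NULL);

	cur = model_load(sp);
	if (value)
		*value = cur;

	return model_is_expired(cur, thresh);
}
```

In this model each call to model_wait_entry() increments mmio_reads
exactly once, whether or not the caller passes a value pointer.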

After dma_fence_wait_timeout() returns, the previous code reloaded the
syncpoint value for the expiry check, but that reload is only required
in the timeout case. On success (i.e., return value > 0, or return
value == 0 with zero jiffies remaining), the ISR has already cached the
value before signaling the fence, so the value pointer can be populated
from the cached value via host1x_syncpt_read_min() without an MMIO
access.

Since only the timeout path requires a fresh load, move
host1x_syncpt_load() under that path.
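
A minimal user-space sketch of the post-wait path described above,
under stated assumptions: all sim_* names and SIM_EAGAIN are
hypothetical, the `signaled` flag stands in for however the driver
detects a successful wait, and the ISR and register are simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_EAGAIN 11 /* stand-in error code for "threshold not reached" */

/* Hypothetical model of a syncpoint; not the kernel struct. */
struct sim_syncpt {
	uint32_t hw_value;              /* simulated hardware register */
	uint32_t min_cached;            /* shadow value kept in memory */
	unsigned int waiter_mmio_reads; /* register reads done by the waiter */
};

/* Simulated ISR: caches the hardware value before the fence is signaled. */
static void sim_isr(struct sim_syncpt *sp)
{
	sp->min_cached = sp->hw_value;
}

/* Cached read: no register access. */
static uint32_t sim_read_min(const struct sim_syncpt *sp)
{
	return sp->min_cached;
}

/* Fresh load: register access performed by the waiter. */
static uint32_t sim_load(struct sim_syncpt *sp)
{
	sp->waiter_mmio_reads++;
	sp->min_cached = sp->hw_value;
	return sp->min_cached;
}

/* Post-wait path: cached value on success, fresh load only on timeout. */
static int sim_wait_finish(struct sim_syncpt *sp, uint32_t thresh,
			   bool signaled, uint32_t *value)
{
	uint32_t cur;

	assert(sp != NULL);

	if (signaled) {
		if (value)
			*value = sim_read_min(sp);
		return 0;
	}

	/* Timeout: the shadow may be stale, so reload before deciding. */
	cur = sim_load(sp);
	if (value)
		*value = cur;

	return (int32_t)(cur - thresh) >= 0 ? 0 : -SIM_EAGAIN;
}
```

In the success branch waiter_mmio_reads stays unchanged, which is the
saving the change aims for; the counter only increases when the wait
times out.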

Measured syncpoint wait latency (50000 samples):
  Average latency:            12.2 us  -> 10.6 us
  99.99th percentile latency: 62.96 us -> 51.90 us

Signed-off-by: Tanmay Patil <tanmayp@nvidia.com>
Acked-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260514103153.766343-2-tanmayp@nvidia.com