--- /dev/null
+From 415d832497098030241605c52ea83d4e2cfa7879 Mon Sep 17 00:00:00 2001
+From: Hector Martin <marcan@marcan.st>
+Date: Tue, 16 Aug 2022 16:03:11 +0900
+Subject: locking/atomic: Make test_and_*_bit() ordered on failure
+
+From: Hector Martin <marcan@marcan.st>
+
+commit 415d832497098030241605c52ea83d4e2cfa7879 upstream.
+
+These operations are documented as always ordered in
+include/asm-generic/bitops/instrumented-atomic.h. Producer-consumer
+type use cases, where one side needs to ensure a flag is left pending
+after some shared data was updated, rely on this ordering even in the
+failure case.
+
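+A simplified sketch of such a producer/consumer pattern (the names
+queue_flags, PENDING_BIT, publish_item() and the helpers below are
+illustrative, not the actual workqueue code):
+
+    static unsigned long queue_flags;
+    #define PENDING_BIT 0
+
+    /* producer */
+    static void publish_item(struct item *it)
+    {
+        add_item_to_queue(it);          /* update the shared data */
+        if (test_and_set_bit(PENDING_BIT, &queue_flags)) {
+            /*
+             * Already pending: rely on the failed test_and_set_bit()
+             * ordering the queue update before the bit read, so the
+             * consumer cannot clear the bit and still miss the item.
+             */
+            return;
+        }
+        wake_consumer();
+    }
+
+    /* consumer */
+    static void consume_pending(void)
+    {
+        while (test_and_clear_bit(PENDING_BIT, &queue_flags))
+            process_items();            /* must observe the new item */
+    }
+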
+This is the case with the workqueue code, which currently suffers from a
+reproducible ordering violation on Apple M1 platforms (which are
+notoriously out-of-order) that ends up causing the TTY layer to fail to
+deliver data to userspace properly under the right conditions. This
+change fixes that bug.
+
+Change the documentation to restrict the "no order on failure" story to
+the _lock() variant (for which it makes sense), and remove the
+early-exit from the generic implementation, which is what causes the
+missing barrier semantics in that case. Without the early exit, the
+remaining atomic op is fully ordered (including on ARM64 LSE, as of
+recent versions of the architecture spec).
+
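+For reference, the early exit reduced the failed path to a plain load.
+The first two code lines below are the ones this patch removes from the
+generic test_and_set_bit() (comments added here for illustration):
+
+    /*
+     * Plain load, no barrier: nothing orders the caller's earlier
+     * stores against this read, so the "failed" case is unordered.
+     */
+    if (READ_ONCE(*p) & mask)
+        return 1;
+
+    /* fully ordered atomic RMW */
+    old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
+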
+Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: stable@vger.kernel.org
+Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs")
+Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()")
+Signed-off-by: Hector Martin <marcan@marcan.st>
+Acked-by: Will Deacon <will@kernel.org>
+Reviewed-by: Arnd Bergmann <arnd@arndb.de>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ Documentation/atomic_bitops.txt     |    2 +-
+ include/asm-generic/bitops/atomic.h |    6 ------
+ 2 files changed, 1 insertion(+), 7 deletions(-)
+
+--- a/Documentation/atomic_bitops.txt
++++ b/Documentation/atomic_bitops.txt
+@@ -59,7 +59,7 @@ Like with atomic_t, the rule of thumb is
+  - RMW operations that have a return value are fully ordered.
+ 
+  - RMW operations that are conditional are unordered on FAILURE,
+-   otherwise the above rules apply. In the case of test_and_{}_bit() operations,
++   otherwise the above rules apply. In the case of test_and_set_bit_lock(),
+    if the bit in memory is unchanged by the operation then it is deemed to have
+    failed.
+ 
+--- a/include/asm-generic/bitops/atomic.h
++++ b/include/asm-generic/bitops/atomic.h
+@@ -35,9 +35,6 @@ static inline int test_and_set_bit(unsig
+ 	unsigned long mask = BIT_MASK(nr);
+ 
+ 	p += BIT_WORD(nr);
+-	if (READ_ONCE(*p) & mask)
+-		return 1;
+-
+ 	old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
+ 	return !!(old & mask);
+ }
+@@ -48,9 +45,6 @@ static inline int test_and_clear_bit(uns
+ 	unsigned long mask = BIT_MASK(nr);
+ 
+ 	p += BIT_WORD(nr);
+-	if (!(READ_ONCE(*p) & mask))
+-		return 0;
+-
+ 	old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
+ 	return !!(old & mask);
+ }