From: Greg Kroah-Hartman
Date: Wed, 1 Sep 2021 08:24:56 +0000 (+0200)
Subject: 5.10-stable patches
X-Git-Tag: v4.4.283~40
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=b95c43a1aa6b1e29917cd843c5a5d2d0a9f82d7c;p=thirdparty%2Fkernel%2Fstable-queue.git

5.10-stable patches

added patches:
    btrfs-fix-race-between-marking-inode-needs-to-be-logged-and-log-syncing.patch
    mtd-spinand-fix-incorrect-parameters-for-on-die-ecc.patch
    pipe-avoid-unnecessary-epollet-wakeups-under-normal-loads.patch
    pipe-do-fasync-notifications-for-every-pipe-io-not-just-state-changes.patch
    tipc-call-tipc_wait_for_connect-only-when-dlen-is-not-0.patch
---

diff --git a/queue-5.10/btrfs-fix-race-between-marking-inode-needs-to-be-logged-and-log-syncing.patch b/queue-5.10/btrfs-fix-race-between-marking-inode-needs-to-be-logged-and-log-syncing.patch
new file mode 100644
index 00000000000..b8bd9c07631
--- /dev/null
+++ b/queue-5.10/btrfs-fix-race-between-marking-inode-needs-to-be-logged-and-log-syncing.patch
@@ -0,0 +1,213 @@
+From bc0939fcfab0d7efb2ed12896b1af3d819954a14 Mon Sep 17 00:00:00 2001
+From: Filipe Manana
+Date: Tue, 23 Feb 2021 12:08:48 +0000
+Subject: btrfs: fix race between marking inode needs to be logged and log syncing
+
+From: Filipe Manana
+
+commit bc0939fcfab0d7efb2ed12896b1af3d819954a14 upstream.
+
+We have a race between marking that an inode needs to be logged, either
+at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and
+btrfs_sync_log(). The following steps describe how the race happens.
+
+1) We are at transaction N;
+
+2) Inode I was previously fsynced in the current transaction so it has:
+
+   inode->logged_trans set to N;
+
+3) The inode's root currently has:
+
+   root->log_transid set to 1
+   root->last_log_commit set to 0
+
+   Which means only one log transaction was committed so far, log
+   transaction 0. When a log tree is created we set ->log_transid and
+   ->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());
+
+4) One more range of pages is dirtied in inode I;
+
+5) Some task A starts an fsync against some other inode J (same root), and
+   so it joins log transaction 1.
+
+   Before task A calls btrfs_sync_log()...
+
+6) Task B starts an fsync against inode I, which currently has the full
+   sync flag set, so it starts delalloc and waits for the ordered extent
+   to complete before calling btrfs_inode_in_log() at btrfs_sync_file();
+
+7) During ordered extent completion we have btrfs_update_inode() called
+   against inode I, which in turn calls btrfs_set_inode_last_trans(),
+   which does the following:
+
+     spin_lock(&inode->lock);
+     inode->last_trans = trans->transaction->transid;
+     inode->last_sub_trans = inode->root->log_transid;
+     inode->last_log_commit = inode->root->last_log_commit;
+     spin_unlock(&inode->lock);
+
+   So ->last_trans is set to N and ->last_sub_trans set to 1.
+   But before setting ->last_log_commit...
+
+8) Task A is at btrfs_sync_log():
+
+   - it increments root->log_transid to 2
+   - starts writeback for all log tree extent buffers
+   - waits for the writeback to complete
+   - writes the super blocks
+   - updates root->last_log_commit to 1
+
+   It's a lot of slow steps between updating root->log_transid and
+   root->last_log_commit;
+
+9) The task doing the ordered extent completion, currently at
+   btrfs_set_inode_last_trans(), then finally runs:
+
+     inode->last_log_commit = inode->root->last_log_commit;
+     spin_unlock(&inode->lock);
+
+   Which results in inode->last_log_commit being set to 1.
+   The ordered extent completes;
+
+10) Task B is resumed, and it calls btrfs_inode_in_log() which returns
+    true because we have all the following conditions met:
+
+    inode->logged_trans == N which matches fs_info->generation &&
+    inode->last_sub_trans (1) <= inode->last_log_commit (1) &&
+    inode->last_sub_trans (1) <= root->last_log_commit (1) &&
+    list inode->extent_tree.modified_extents is empty
+
+    And as a consequence we return without logging the inode, so the
+    existing logged version of the inode does not point to the extent
+    that was written after the previous fsync.
+
+It should be impossible in practice for one task to make so much
+progress in btrfs_sync_log() while another task is at
+btrfs_set_inode_last_trans() right after it reads root->log_transid and
+before it reads root->last_log_commit. Even if kernel preemption is
+enabled we know the task at btrfs_set_inode_last_trans() cannot be
+preempted because it is holding the inode's spinlock.
+
+However there is another place where we do the same without holding the
+spinlock, which is in the memory mapped write path at:
+
+  vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
+  {
+     (...)
+     BTRFS_I(inode)->last_trans = fs_info->generation;
+     BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
+     BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
+     (...)
+
+So with preemption happening after setting ->last_sub_trans and before
+setting ->last_log_commit, it is less of a stretch to have another task
+make enough progress at btrfs_sync_log() such that the task doing the
+memory mapped write ends up with ->last_sub_trans and ->last_log_commit
+set to the same value. It is still a big stretch to get there, as the
+task doing btrfs_sync_log() has to start writeback, wait for its
+completion and write the super blocks.
+
+So fix this in two different ways:
+
+1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the
+   value of ->last_sub_trans minus 1;
+
+2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just
+   like we do for buffered and direct writes at btrfs_file_write_iter(),
+   which is all we need to make sure multiple writes and fsyncs to an
+   inode in the same transaction never result in an fsync missing that
+   the inode changed and needs to be logged. Turn this into a helper
+   function and use it both at btrfs_page_mkwrite() and at
+   btrfs_file_write_iter() - this also fixes the problem that at
+   btrfs_page_mkwrite() we were setting those fields without the
+   protection of the inode's spinlock.
+
+This is an extremely unlikely race to happen in practice.
+
+Signed-off-by: Filipe Manana
+Signed-off-by: David Sterba
+Signed-off-by: Anand Jain
+Signed-off-by: Greg Kroah-Hartman
+---
+ fs/btrfs/btrfs_inode.h |   15 +++++++++++++++
+ fs/btrfs/file.c        |   11 ++---------
+ fs/btrfs/inode.c       |    4 +---
+ fs/btrfs/transaction.h |    2 +-
+ 4 files changed, 19 insertions(+), 13 deletions(-)
+
+--- a/fs/btrfs/btrfs_inode.h
++++ b/fs/btrfs/btrfs_inode.h
+@@ -308,6 +308,21 @@ static inline void btrfs_mod_outstanding
+ 			  mod);
+ }
+ 
++/*
++ * Called every time after doing a buffered, direct IO or memory mapped write.
++ *
++ * This is to ensure that if we write to a file that was previously fsynced in
++ * the current transaction, then try to fsync it again in the same transaction,
++ * we will know that there were changes in the file and that it needs to be
++ * logged.
++ */
++static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
++{
++	spin_lock(&inode->lock);
++	inode->last_sub_trans = inode->root->log_transid;
++	spin_unlock(&inode->lock);
++}
++
+ static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
+ {
+ 	int ret = 0;
+--- a/fs/btrfs/file.c
++++ b/fs/btrfs/file.c
+@@ -1862,7 +1862,6 @@ static ssize_t btrfs_file_write_iter(str
+ 	struct file *file = iocb->ki_filp;
+ 	struct inode *inode = file_inode(file);
+ 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+-	struct btrfs_root *root = BTRFS_I(inode)->root;
+ 	u64 start_pos;
+ 	u64 end_pos;
+ 	ssize_t num_written = 0;
+@@ -2006,14 +2005,8 @@ static ssize_t btrfs_file_write_iter(str
+ 
+ 	inode_unlock(inode);
+ 
+-	/*
+-	 * We also have to set last_sub_trans to the current log transid,
+-	 * otherwise subsequent syncs to a file that's been synced in this
+-	 * transaction will appear to have already occurred.
+-	 */
+-	spin_lock(&BTRFS_I(inode)->lock);
+-	BTRFS_I(inode)->last_sub_trans = root->log_transid;
+-	spin_unlock(&BTRFS_I(inode)->lock);
++	btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
++
+ 	if (num_written > 0)
+ 		num_written = generic_write_sync(iocb, num_written);
+ 
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -8449,9 +8449,7 @@ again:
+ 	set_page_dirty(page);
+ 	SetPageUptodate(page);
+ 
+-	BTRFS_I(inode)->last_trans = fs_info->generation;
+-	BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
+-	BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
++	btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
+ 
+ 	unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
+ 
+--- a/fs/btrfs/transaction.h
++++ b/fs/btrfs/transaction.h
+@@ -171,7 +171,7 @@ static inline void btrfs_set_inode_last_
+ 	spin_lock(&inode->lock);
+ 	inode->last_trans = trans->transaction->transid;
+ 	inode->last_sub_trans = inode->root->log_transid;
+-	inode->last_log_commit = inode->root->last_log_commit;
++	inode->last_log_commit = inode->last_sub_trans - 1;
+ 	spin_unlock(&inode->lock);
+ }
+ 
diff --git a/queue-5.10/mtd-spinand-fix-incorrect-parameters-for-on-die-ecc.patch b/queue-5.10/mtd-spinand-fix-incorrect-parameters-for-on-die-ecc.patch
new file mode 100644
index 00000000000..432083329b2
--- /dev/null
+++ b/queue-5.10/mtd-spinand-fix-incorrect-parameters-for-on-die-ecc.patch
@@ -0,0 +1,93 @@
+From frieder@fris.de Wed Sep 1 10:16:58 2021
+From: Frieder Schrempf
+Date: Mon, 30 Aug 2021 15:02:10 +0200
+Subject: mtd: spinand: Fix incorrect parameters for on-die ECC
+To: Miquel Raynal
+Cc: Frieder Schrempf, stable@vger.kernel.org, voice INTER connect GmbH, Alexander Lobakin, Felix Fietkau, Greg Kroah-Hartman, linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, Richard Weinberger, YouChing Lin
+Message-ID: <20210830130211.445728-1-frieder@fris.de>
+
+
+From: Frieder Schrempf
+
+The new generic NAND ECC framework stores the configuration and
+requirements in separate places since commit 93ef92f6f422 ("mtd: nand: Use
+the new generic ECC object"). In 5.10.x the SPI NAND layer still uses only
+the requirements to track the ECC properties. This mismatch leads to
+values of zero being used for ECC strength and step_size in the SPI NAND
+layer wherever nanddev_get_ecc_conf() is used and therefore breaks the SPI
+NAND on-die ECC support in 5.10.x.
+
+By using nanddev_get_ecc_requirements() instead of nanddev_get_ecc_conf()
+for SPI NAND, we make sure that the correct parameters for the detected
+chip are used. In later versions (5.11.x) this is fixed anyway with the
+implementation of the SPI NAND on-die ECC engine.
+
+Cc: stable@vger.kernel.org # 5.10.x
+Reported-by: voice INTER connect GmbH
+Signed-off-by: Frieder Schrempf
+Acked-by: Miquel Raynal
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ drivers/mtd/nand/spi/core.c     |    6 +++---
+ drivers/mtd/nand/spi/macronix.c |    6 +++---
+ drivers/mtd/nand/spi/toshiba.c  |    6 +++---
+ 3 files changed, 9 insertions(+), 9 deletions(-)
+
+--- a/drivers/mtd/nand/spi/core.c
++++ b/drivers/mtd/nand/spi/core.c
+@@ -419,7 +419,7 @@ static int spinand_check_ecc_status(stru
+ 		 * fixed, so let's return the maximum possible value so that
+ 		 * wear-leveling layers move the data immediately.
+ 		 */
+-		return nanddev_get_ecc_conf(nand)->strength;
++		return nanddev_get_ecc_requirements(nand)->strength;
+ 
+ 	case STATUS_ECC_UNCOR_ERROR:
+ 		return -EBADMSG;
+@@ -1090,8 +1090,8 @@ static int spinand_init(struct spinand_d
+ 	mtd->oobavail = ret;
+ 
+ 	/* Propagate ECC information to mtd_info */
+-	mtd->ecc_strength = nanddev_get_ecc_conf(nand)->strength;
+-	mtd->ecc_step_size = nanddev_get_ecc_conf(nand)->step_size;
++	mtd->ecc_strength = nanddev_get_ecc_requirements(nand)->strength;
++	mtd->ecc_step_size = nanddev_get_ecc_requirements(nand)->step_size;
+ 
+ 	return 0;
+ 
+--- a/drivers/mtd/nand/spi/macronix.c
++++ b/drivers/mtd/nand/spi/macronix.c
+@@ -84,11 +84,11 @@ static int mx35lf1ge4ab_ecc_get_status(s
+ 	 * data around if it's not necessary.
+ 	 */
+ 	if (mx35lf1ge4ab_get_eccsr(spinand, &eccsr))
+-		return nanddev_get_ecc_conf(nand)->strength;
++		return nanddev_get_ecc_requirements(nand)->strength;
+ 
+-	if (WARN_ON(eccsr > nanddev_get_ecc_conf(nand)->strength ||
++	if (WARN_ON(eccsr > nanddev_get_ecc_requirements(nand)->strength ||
+ 		    !eccsr))
+-		return nanddev_get_ecc_conf(nand)->strength;
++		return nanddev_get_ecc_requirements(nand)->strength;
+ 
+ 	return eccsr;
+ 
+--- a/drivers/mtd/nand/spi/toshiba.c
++++ b/drivers/mtd/nand/spi/toshiba.c
+@@ -90,12 +90,12 @@ static int tx58cxgxsxraix_ecc_get_status
+ 	 * data around if it's not necessary.
+ 	 */
+ 	if (spi_mem_exec_op(spinand->spimem, &op))
+-		return nanddev_get_ecc_conf(nand)->strength;
++		return nanddev_get_ecc_requirements(nand)->strength;
+ 
+ 	mbf >>= 4;
+ 
+-	if (WARN_ON(mbf > nanddev_get_ecc_conf(nand)->strength || !mbf))
+-		return nanddev_get_ecc_conf(nand)->strength;
++	if (WARN_ON(mbf > nanddev_get_ecc_requirements(nand)->strength || !mbf))
++		return nanddev_get_ecc_requirements(nand)->strength;
+ 
+ 	return mbf;
+ 
diff --git a/queue-5.10/pipe-avoid-unnecessary-epollet-wakeups-under-normal-loads.patch b/queue-5.10/pipe-avoid-unnecessary-epollet-wakeups-under-normal-loads.patch
new file mode 100644
index 00000000000..66fda4dd222
--- /dev/null
+++ b/queue-5.10/pipe-avoid-unnecessary-epollet-wakeups-under-normal-loads.patch
@@ -0,0 +1,119 @@
+From 3b844826b6c6affa80755254da322b017358a2f4 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds
+Date: Thu, 5 Aug 2021 10:04:43 -0700
+Subject: pipe: avoid unnecessary EPOLLET wakeups under normal loads
+
+From: Linus Torvalds
+
+commit 3b844826b6c6affa80755254da322b017358a2f4 upstream.
+
+I had forgotten just how sensitive hackbench is to extra pipe wakeups,
+and commit 3a34b13a88ca ("pipe: make pipe writes always wake up
+readers") ended up causing a quite noticeable regression on larger
+machines.
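+
+[ Editor's note: for readers unfamiliar with the "crazy legacy
+  expectations" discussed below, here is a minimal, hypothetical user
+  space sketch (not part of this patch) of the pattern at issue: an
+  edge-triggered consumer that deliberately reads less than what is
+  available and then calls epoll_wait() again, so it only makes
+  progress if writes keep producing wakeups even while the pipe is
+  already readable. Strict EPOLLET usage would instead drain the pipe
+  until read() returns EAGAIN.
+
+	#include <unistd.h>
+	#include <sys/epoll.h>
+
+	int main(void)
+	{
+		int pfd[2];
+		char buf[64];
+		struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
+		struct epoll_event out;
+		int ep = epoll_create1(0);
+
+		if (ep < 0 || pipe(pfd) < 0)
+			return 1;
+		ev.data.fd = pfd[0];
+		epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);
+
+		write(pfd[1], "hello world", 11);
+		for (;;) {
+			/* under strict edge triggering this times out
+			 * with data still queued: no new edge occurred */
+			if (epoll_wait(ep, &out, 1, 1000) <= 0)
+				break;
+			/* short read on purpose: bytes stay in the pipe */
+			read(pfd[0], buf, 4);
+		}
+		return 0;
+	}
+]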
+ +Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's +not clear that this matters in real life all that much, but as Mel +points out, it's used often enough when comparing kernels and so the +performance regression shows up like a sore thumb. + +It's easy enough to fix at least for the common cases where pipes are +used purely for data transfer, and you never have any exciting poll +usage at all. So set a special 'poll_usage' flag when there is polling +activity, and make the ugly "EPOLLET has crazy legacy expectations" +semantics explicit to only that case. + +I would love to limit it to just the broken EPOLLET case, but the pipe +code can't see the difference between epoll and regular select/poll, so +any non-read/write waiting will trigger the extra wakeup behavior. That +is sufficient for at least the hackbench case. + +Apart from making the odd extra wakeup cases more explicitly about +EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe +write, not at the first write chunk. That is actually much saner +semantics (as much as you can call any of the legacy edge-triggered +expectations for EPOLLET "sane") since it means that you know the wakeup +will happen once the write is done, rather than possibly in the middle +of one. + +[ For stable people: I'm putting a "Fixes" tag on this, but I leave it + up to you to decide whether you actually want to backport it or not. + It likely has no impact outside of synthetic benchmarks - Linus ] + +Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ +Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers") +Reported-by: kernel test robot +Tested-by: Sandeep Patil +Tested-by: Mel Gorman +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman +--- + fs/pipe.c | 15 +++++++++------ + include/linux/pipe_fs_i.h | 2 ++ + 2 files changed, 11 insertions(+), 6 deletions(-) + +--- a/fs/pipe.c ++++ b/fs/pipe.c +@@ -444,9 +444,6 @@ pipe_write(struct kiocb *iocb, struct io + #endif + + /* +- * Epoll nonsensically wants a wakeup whether the pipe +- * was already empty or not. +- * + * If it wasn't empty we try to merge new data into + * the last buffer. + * +@@ -455,9 +452,9 @@ pipe_write(struct kiocb *iocb, struct io + * spanning multiple pages. + */ + head = pipe->head; +- was_empty = true; ++ was_empty = pipe_empty(head, pipe->tail); + chars = total_len & (PAGE_SIZE-1); +- if (chars && !pipe_empty(head, pipe->tail)) { ++ if (chars && !was_empty) { + unsigned int mask = pipe->ring_size - 1; + struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask]; + int offset = buf->offset + buf->len; +@@ -590,8 +587,11 @@ out: + * This is particularly important for small writes, because of + * how (for example) the GNU make jobserver uses small writes to + * wake up pending jobs ++ * ++ * Epoll nonsensically wants a wakeup whether the pipe ++ * was already empty or not. + */ +- if (was_empty) { ++ if (was_empty || pipe->poll_usage) { + wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); + kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); + } +@@ -654,6 +654,9 @@ pipe_poll(struct file *filp, poll_table + struct pipe_inode_info *pipe = filp->private_data; + unsigned int head, tail; + ++ /* Epoll has some historical nasty semantics, this enables them */ ++ pipe->poll_usage = 1; ++ + /* + * Reading pipe state only -- no need for acquiring the semaphore. 
+ 	 *
+--- a/include/linux/pipe_fs_i.h
++++ b/include/linux/pipe_fs_i.h
+@@ -48,6 +48,7 @@ struct pipe_buffer {
+  * @files: number of struct file referring this pipe (protected by ->i_lock)
+  * @r_counter: reader counter
+  * @w_counter: writer counter
++ * @poll_usage: is this pipe used for epoll, which has crazy wakeups?
+  * @fasync_readers: reader side fasync
+  * @fasync_writers: writer side fasync
+  * @bufs: the circular array of pipe buffers
+@@ -70,6 +71,7 @@ struct pipe_inode_info {
+ 	unsigned int files;
+ 	unsigned int r_counter;
+ 	unsigned int w_counter;
++	unsigned int poll_usage;
+ 	struct page *tmp_page;
+ 	struct fasync_struct *fasync_readers;
+ 	struct fasync_struct *fasync_writers;
diff --git a/queue-5.10/pipe-do-fasync-notifications-for-every-pipe-io-not-just-state-changes.patch b/queue-5.10/pipe-do-fasync-notifications-for-every-pipe-io-not-just-state-changes.patch
new file mode 100644
index 00000000000..d63d9e7ff5c
--- /dev/null
+++ b/queue-5.10/pipe-do-fasync-notifications-for-every-pipe-io-not-just-state-changes.patch
@@ -0,0 +1,127 @@
+From fe67f4dd8daa252eb9aa7acb61555f3cc3c1ce4c Mon Sep 17 00:00:00 2001
+From: Linus Torvalds
+Date: Tue, 24 Aug 2021 10:39:25 -0700
+Subject: pipe: do FASYNC notifications for every pipe IO, not just state changes
+
+From: Linus Torvalds
+
+commit fe67f4dd8daa252eb9aa7acb61555f3cc3c1ce4c upstream.
+
+It turns out that the SIGIO/FASYNC situation is almost exactly the same
+as the EPOLLET case was: user space really wants to be notified after
+every operation.
+
+Now, in a perfect world it should be sufficient to only notify user
+space on "state transitions" when the IO state changes (ie when a pipe
+goes from unreadable to readable, or from unwritable to writable). User
+space should then do as much as possible - fully emptying the buffer or
+what not - and we'll notify it again the next time the state changes.
+
+But as with EPOLLET, we have at least one case (stress-ng) where the
+kernel sent SIGIO due to the pipe being marked for asynchronous
+notification, but the user space signal handler then didn't actually
+necessarily read it all before returning (it read more than what was
+written, but since there could be multiple writes, it could leave data
+pending).
+
+The user space code then expected to get another SIGIO for subsequent
+writes - even though the pipe had been readable the whole time - and
+would only then read more.
+
+This is arguably a user space bug - and Colin King already fixed the
+stress-ng code in question - but the kernel regression rules are clear:
+it doesn't matter if kernel people think that user space did something
+silly and wrong. What matters is that it used to work.
+
+So if user space depends on specific historical kernel behavior, it's a
+regression when that behavior changes. It's on us: we were silly to
+have that non-optimal historical behavior, and our old kernel behavior
+was what user space was tested against.
+
+Because of how the FASYNC notification was tied to wakeup behavior, this
+was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix
+and clarify pipe read/write wakeup logic"), but at the time it seems
+nobody noticed. Probably because the stress-ng problem case ends up
+being timing-dependent too.
+
+It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe
+writes always wake up readers") only to be broken again by commit
+3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
+loads").
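+
+[ Editor's note: a minimal, hypothetical sketch (not part of this
+  patch) of the stress-ng-style pattern described above: the reader
+  requests SIGIO on a pipe and counts notifications. With per-IO
+  notification, two writes should be observable as two signals even
+  though the pipe was already readable when the second write happened.
+
+	#include <fcntl.h>
+	#include <signal.h>
+	#include <stdio.h>
+	#include <unistd.h>
+
+	static volatile sig_atomic_t sigio_count;
+
+	static void on_sigio(int sig)
+	{
+		(void)sig;
+		sigio_count++;
+	}
+
+	int main(void)
+	{
+		int pfd[2];
+		char buf[16];
+
+		if (pipe(pfd) < 0)
+			return 1;
+		signal(SIGIO, on_sigio);
+
+		/* route SIGIO for the read end to this process and
+		 * enable asynchronous notification (FASYNC/O_ASYNC) */
+		fcntl(pfd[0], F_SETOWN, getpid());
+		fcntl(pfd[0], F_SETFL, fcntl(pfd[0], F_GETFL) | O_ASYNC);
+
+		write(pfd[1], "one", 3);
+		write(pfd[1], "two", 3);	/* pipe already readable */
+
+		read(pfd[0], buf, sizeof(buf));
+		printf("SIGIO count: %d\n", (int)sigio_count);
+		return 0;
+	}
+]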
+
+And at that point the kernel test robot noticed the performance
+regression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag
+below is somewhat ad hoc, but it matches when the issue was noticed.
+
+Fix it for good (knock wood) by simply making the kill_fasync() case
+separate from the wakeup case. FASYNC is quite rare, and we clearly
+shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
+
+Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
+Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
+Reported-by: kernel test robot
+Tested-by: Oliver Sang
+Cc: Eric Biederman
+Cc: Colin Ian King
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+---
+ fs/pipe.c |   20 ++++++++------------
+ 1 file changed, 8 insertions(+), 12 deletions(-)
+
+--- a/fs/pipe.c
++++ b/fs/pipe.c
+@@ -363,10 +363,9 @@ pipe_read(struct kiocb *iocb, struct iov
+ 	 * _very_ unlikely case that the pipe was full, but we got
+ 	 * no data.
+ 	 */
+-	if (unlikely(was_full)) {
++	if (unlikely(was_full))
+ 		wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
+-		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
+-	}
++	kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
+ 
+ 	/*
+ 	 * But because we didn't read anything, at this point we can
+@@ -385,12 +384,11 @@ pipe_read(struct kiocb *iocb, struct iov
+ 	wake_next_reader = false;
+ 	__pipe_unlock(pipe);
+ 
+-	if (was_full) {
++	if (was_full)
+ 		wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
+-		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
+-	}
+ 	if (wake_next_reader)
+ 		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
++	kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
+ 	if (ret > 0)
+ 		file_accessed(filp);
+ 	return ret;
+@@ -565,10 +563,9 @@ pipe_write(struct kiocb *iocb, struct io
+ 	 * become empty while we dropped the lock.
+ 	 */
+ 	__pipe_unlock(pipe);
+-	if (was_empty) {
++	if (was_empty)
+ 		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
+-		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
+-	}
++	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
+ 	wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
+ 	__pipe_lock(pipe);
+ 	was_empty = pipe_empty(pipe->head, pipe->tail);
+@@ -591,10 +588,9 @@ out:
+ 	 * Epoll nonsensically wants a wakeup whether the pipe
+ 	 * was already empty or not.
+ 	 */
+-	if (was_empty || pipe->poll_usage) {
++	if (was_empty || pipe->poll_usage)
+ 		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
+-		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
+-	}
++	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
+ 	if (wake_next_writer)
+ 		wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
+ 	if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) {
diff --git a/queue-5.10/series b/queue-5.10/series
index b0be7890757..50eb20bdb2e 100644
--- a/queue-5.10/series
+++ b/queue-5.10/series
@@ -68,3 +68,8 @@ drm-copy-drm_wait_vblank-to-user-before-returning.patch
 drm-nouveau-disp-power-down-unused-dp-links-during-i.patch
 drm-nouveau-kms-nv50-workaround-efi-gop-window-chann.patch
 net-rds-dma_map_sg-is-entitled-to-merge-entries.patch
+btrfs-fix-race-between-marking-inode-needs-to-be-logged-and-log-syncing.patch
+pipe-avoid-unnecessary-epollet-wakeups-under-normal-loads.patch
+pipe-do-fasync-notifications-for-every-pipe-io-not-just-state-changes.patch
+mtd-spinand-fix-incorrect-parameters-for-on-die-ecc.patch
+tipc-call-tipc_wait_for_connect-only-when-dlen-is-not-0.patch
diff --git a/queue-5.10/tipc-call-tipc_wait_for_connect-only-when-dlen-is-not-0.patch b/queue-5.10/tipc-call-tipc_wait_for_connect-only-when-dlen-is-not-0.patch
new file mode 100644
index 00000000000..2ef960eb62d
--- /dev/null
+++ b/queue-5.10/tipc-call-tipc_wait_for_connect-only-when-dlen-is-not-0.patch
@@ -0,0 +1,45 @@
+From 7387a72c5f84f0dfb57618f9e4770672c0d2e4c9 Mon Sep 17 00:00:00 2001
+From: Xin Long
+Date: Sun, 15 Aug 2021 03:13:36 -0400
+Subject: tipc: call tipc_wait_for_connect only when dlen is not 0
+
+From: Xin Long
+
+commit 7387a72c5f84f0dfb57618f9e4770672c0d2e4c9 upstream.
+
+__tipc_sendmsg() is called to send the SYN packet by either tipc_sendmsg()
+or tipc_connect(). The difference is that tipc_connect() will call
+tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting
+is done. So there's no need to wait in __tipc_sendmsg() for this case.
+
+This patch is to fix it by calling tipc_wait_for_connect() only when dlen
+is not 0 in __tipc_sendmsg(), which means it's called by tipc_sendmsg()
+rather than tipc_connect().
+
+Note this also fixes the failure in tipcutils/test/ptts/:
+
+  # ./tipcTS &
+  # ./tipcTC 9
+  (hang)
+
+Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
+Reported-by: Shuang Li
+Signed-off-by: Xin Long
+Acked-by: Jon Maloy
+Signed-off-by: David S. Miller
+Cc: Paul Gortmaker
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/tipc/socket.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/net/tipc/socket.c
++++ b/net/tipc/socket.c
+@@ -1511,7 +1511,7 @@ static int __tipc_sendmsg(struct socket
+ 
+ 	if (unlikely(syn && !rc)) {
+ 		tipc_set_sk_state(sk, TIPC_CONNECTING);
+-		if (timeout) {
++		if (dlen && timeout) {
+ 			timeout = msecs_to_jiffies(timeout);
+ 			tipc_wait_for_connect(sock, &timeout);
+ 		}
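+
+[ Editor's note: a hypothetical user space sketch (not part of this
+  patch) of the tipc_connect() path this fixes. connect() on a TIPC
+  stream socket sends a SYN with dlen == 0; with the fix, the wait for
+  the connection happens only in tipc_wait_for_connect() as called
+  from tipc_connect(), not a second time inside __tipc_sendmsg().
+  The service type and instance values below are arbitrary.
+
+	#include <stdio.h>
+	#include <string.h>
+	#include <sys/socket.h>
+	#include <linux/tipc.h>
+
+	int main(void)
+	{
+		struct sockaddr_tipc srv;
+		int sd = socket(AF_TIPC, SOCK_STREAM, 0);
+
+		if (sd < 0) {
+			perror("socket");
+			return 1;
+		}
+
+		memset(&srv, 0, sizeof(srv));
+		srv.family = AF_TIPC;
+		srv.addrtype = TIPC_ADDR_NAME;	/* service address */
+		srv.addr.name.name.type = 18888;
+		srv.addr.name.name.instance = 17;
+		srv.addr.name.domain = 0;
+
+		/* sends a zero-length SYN, like the hanging tipcTC case */
+		if (connect(sd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
+			perror("connect");
+		return 0;
+	}
+]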