From: Greg Kroah-Hartman
Date: Tue, 7 Aug 2018 13:23:51 +0000 (+0200)
Subject: 4.9-stable patches
X-Git-Tag: v4.17.14~11
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=f57a4a2b35166b91a8b5c41c11ce9580e93e1898;p=thirdparty%2Fkernel%2Fstable-queue.git

4.9-stable patches

added patches:
	acpi-pci-bail-early-in-acpi_pci_add_bus-if-there-is-no-acpi-handle.patch
	btrfs-fix-file-data-corruption-after-cloning-a-range-and-fsync.patch
	ext4-fix-false-negatives-and-false-positives-in-ext4_check_descriptors.patch
	i2c-imx-fix-reinit_completion-use.patch
	ring_buffer-tracing-inherit-the-tracing-setting-to-next-ring-buffer.patch
	tcp-add-tcp_ooo_try_coalesce-helper.patch
---

diff --git a/queue-4.9/acpi-pci-bail-early-in-acpi_pci_add_bus-if-there-is-no-acpi-handle.patch b/queue-4.9/acpi-pci-bail-early-in-acpi_pci_add_bus-if-there-is-no-acpi-handle.patch
new file mode 100644
index 00000000000..28b2ad27df1
--- /dev/null
+++ b/queue-4.9/acpi-pci-bail-early-in-acpi_pci_add_bus-if-there-is-no-acpi-handle.patch
@@ -0,0 +1,42 @@
+From a0040c0145945d3bd203df8fa97f6dfa819f3f7d Mon Sep 17 00:00:00 2001
+From: Vitaly Kuznetsov
+Date: Thu, 14 Sep 2017 16:50:14 +0200
+Subject: ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle
+
+From: Vitaly Kuznetsov
+
+commit a0040c0145945d3bd203df8fa97f6dfa819f3f7d upstream.
+
+Hyper-V instances support PCI pass-through, which is implemented through
+the PV pci-hyperv driver. When a device is passed through, a new root PCI
+bus is created in the guest. The bus sits on top of VMBus and has no
+associated information in ACPI. acpi_pci_add_bus() in this case proceeds
+all the way to acpi_evaluate_dsm(), which reports
+
+  ACPI: \: failed to evaluate _DSM (0x1001)
+
+While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected
+against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not
+and gives us the error. It seems the correct fix is to not do anything in
+acpi_pci_add_bus() in such cases.
+
+Signed-off-by: Vitaly Kuznetsov
+Signed-off-by: Bjorn Helgaas
+Cc: Sinan Kaya
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ drivers/pci/pci-acpi.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/pci/pci-acpi.c
++++ b/drivers/pci/pci-acpi.c
+@@ -567,7 +567,7 @@ void acpi_pci_add_bus(struct pci_bus *bu
+ 	union acpi_object *obj;
+ 	struct pci_host_bridge *bridge;
+ 
+-	if (acpi_pci_disabled || !bus->bridge)
++	if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge))
+ 		return;
+ 
+ 	acpi_pci_slot_enumerate(bus);
diff --git a/queue-4.9/btrfs-fix-file-data-corruption-after-cloning-a-range-and-fsync.patch b/queue-4.9/btrfs-fix-file-data-corruption-after-cloning-a-range-and-fsync.patch
new file mode 100644
index 00000000000..ea6eb28b2e7
--- /dev/null
+++ b/queue-4.9/btrfs-fix-file-data-corruption-after-cloning-a-range-and-fsync.patch
@@ -0,0 +1,105 @@
+From bd3599a0e142cd73edd3b6801068ac3f48ac771a Mon Sep 17 00:00:00 2001
+From: Filipe Manana
+Date: Thu, 12 Jul 2018 01:36:43 +0100
+Subject: Btrfs: fix file data corruption after cloning a range and fsync
+
+From: Filipe Manana
+
+commit bd3599a0e142cd73edd3b6801068ac3f48ac771a upstream.
+
+When we clone a range into a file we can end up dropping existing
+extent maps (or trimming them) and replacing them with new ones if the
+range to be cloned overlaps with a range in the destination inode.
+When that happens we add the new extent maps to the list of modified
+extents in the inode's extent map tree, so that a "fast" fsync (the flag
+BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
+and log corresponding extent items. However, at the end of the range
+cloning operation we do truncate all the pages in the affected range (in
+order to ensure future reads will not get stale data). Sometimes this
+truncation will release the corresponding extent maps besides the pages
+from the page cache. If this happens, then a "fast" fsync operation will
+miss logging some extent items, because it relies exclusively on the
+extent maps being present in the inode's extent tree, leading to data
+loss/corruption if the fsync ends up using the same transaction used by
+the clone operation (that transaction was not committed in the meanwhile).
+An extent map is released through the callback btrfs_invalidatepage(),
+which gets called by truncate_inode_pages_range(), and it calls
+__btrfs_releasepage(). The latter ends up calling
+try_release_extent_mapping(), which will release the extent map if some
+conditions are met, like the file size being greater than 16Mb, the gfp
+flags allowing blocking, the range not being locked (which is the case
+during the clone operation), and the extent map not being flagged as
+pinned (also the case for cloning).
+
+The following example, turned into a test for fstests, reproduces the
+issue:
+
+  $ mkfs.btrfs -f /dev/sdb
+  $ mount /dev/sdb /mnt
+
+  $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
+  $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
+
+  $ xfs_io -c "fsync" /mnt/bar
+  # reflink destination offset corresponds to the size of file bar,
+  # 2728Kb minus 4Kb.
+  $ xfs_io -c "reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
+  $ xfs_io -c "fsync" /mnt/bar
+
+  $ md5sum /mnt/bar
+  95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar
+
+  <power fail>
+
+  $ mount /dev/sdb /mnt
+  $ md5sum /mnt/bar
+  207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
+  # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
+  # power failure
+
+In the above example, the destination offset of the clone operation
+corresponds to the size of the "bar" file minus 4Kb. So during the clone
+operation, the extent map covering the range from 2572Kb to 2728Kb gets
+trimmed so that it ends at offset 2724Kb, and a new extent map covering
+the range from 2724Kb to 11724Kb is created. So at the end of the clone
+operation when we ask to truncate the pages in the range from 2724Kb to
+2724Kb + 15908Kb, the page invalidation callback ends up removing the new
+extent map (through try_release_extent_mapping()) when the page at offset
+2724Kb is passed to that callback.
+
+Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
+map is removed at try_release_extent_mapping(), forcing the next fsync to
+search for modified extents in the fs/subvolume tree instead of relying on
+the presence of extent maps in memory. This way we can continue doing a
+"fast" fsync if the destination range of a clone operation does not
+overlap with an existing range or if any of the criteria necessary to
+remove an extent map at try_release_extent_mapping() is not met (file
+size not bigger than 16Mb or gfp flags do not allow blocking).
+
+CC: stable@vger.kernel.org # 3.16+
+Signed-off-by: Filipe Manana
+Signed-off-by: David Sterba
+Signed-off-by: Sudip Mukherjee
+Signed-off-by: Greg Kroah-Hartman
+---
+ fs/btrfs/extent_io.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -4298,6 +4298,7 @@ int try_release_extent_mapping(struct ex
+ 	struct extent_map *em;
+ 	u64 start = page_offset(page);
+ 	u64 end = start + PAGE_SIZE - 1;
++	struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
+ 
+ 	if (gfpflags_allow_blocking(mask) &&
+ 	    page->mapping->host->i_size > SZ_16M) {
+@@ -4320,6 +4321,8 @@ int try_release_extent_mapping(struct ex
+ 					    extent_map_end(em) - 1,
+ 					    EXTENT_LOCKED | EXTENT_WRITEBACK,
+ 					    0, NULL)) {
++				set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
++					&btrfs_inode->runtime_flags);
+ 				remove_extent_mapping(map, em);
+ 				/* once for the rb tree */
+ 				free_extent_map(em);
diff --git a/queue-4.9/ext4-fix-false-negatives-and-false-positives-in-ext4_check_descriptors.patch b/queue-4.9/ext4-fix-false-negatives-and-false-positives-in-ext4_check_descriptors.patch
new file mode 100644
index 00000000000..eddc980e745
--- /dev/null
+++ b/queue-4.9/ext4-fix-false-negatives-and-false-positives-in-ext4_check_descriptors.patch
@@ -0,0 +1,55 @@
+From 44de022c4382541cebdd6de4465d1f4f465ff1dd Mon Sep 17 00:00:00 2001
+From: Theodore Ts'o
+Date: Sun, 8 Jul 2018 19:35:02 -0400
+Subject: ext4: fix false negatives *and* false positives in ext4_check_descriptors()
+
+From: Theodore Ts'o
+
+commit 44de022c4382541cebdd6de4465d1f4f465ff1dd upstream.
+
+ext4_check_descriptors() was getting called before s_gdb_count was
+initialized. So for file systems without the meta_bg feature, allocation
+bitmaps could overlap the block group descriptors and ext4 wouldn't
+notice.
+
+For file systems with the meta_bg feature enabled, there was a
+fencepost error which would cause ext4_check_descriptors() to
+incorrectly believe that the block allocation bitmap overlaps with the
+block group descriptor blocks, and it would reject the mount.
+
+Fix both of these problems.
+
+Signed-off-by: Theodore Ts'o
+Cc: stable@vger.kernel.org
+Signed-off-by: Benjamin Gilbert
+Signed-off-by: Greg Kroah-Hartman
+---
+ fs/ext4/super.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -2231,7 +2231,7 @@ static int ext4_check_descriptors(struct
+ 	struct ext4_sb_info *sbi = EXT4_SB(sb);
+ 	ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block);
+ 	ext4_fsblk_t last_block;
+-	ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1;
++	ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0);
+ 	ext4_fsblk_t block_bitmap;
+ 	ext4_fsblk_t inode_bitmap;
+ 	ext4_fsblk_t inode_table;
+@@ -3941,13 +3941,13 @@ static int ext4_fill_super(struct super_
+ 			goto failed_mount2;
+ 		}
+ 	}
++	sbi->s_gdb_count = db_count;
+ 	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
+ 		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
+ 		ret = -EFSCORRUPTED;
+ 		goto failed_mount2;
+ 	}
+ 
+-	sbi->s_gdb_count = db_count;
+ 	get_random_bytes(&sbi->s_next_generation, sizeof(u32));
+ 	spin_lock_init(&sbi->s_next_gen_lock);
+ 
diff --git a/queue-4.9/i2c-imx-fix-reinit_completion-use.patch b/queue-4.9/i2c-imx-fix-reinit_completion-use.patch
new file mode 100644
index 00000000000..6ef12e96457
--- /dev/null
+++ b/queue-4.9/i2c-imx-fix-reinit_completion-use.patch
@@ -0,0 +1,53 @@
+From 9f9e3e0d4dd3338b3f3dde080789f71901e1e4ff Mon Sep 17 00:00:00 2001
+From: Esben Haabendal
+Date: Mon, 9 Jul 2018 11:43:01 +0200
+Subject: i2c: imx: Fix reinit_completion() use
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Esben Haabendal
+
+commit 9f9e3e0d4dd3338b3f3dde080789f71901e1e4ff upstream.
+
+Make sure to call reinit_completion() before DMA is started, to avoid a
+race condition where reinit_completion() is called after complete() and
+before wait_for_completion_timeout().
+
+Signed-off-by: Esben Haabendal
+Fixes: ce1a78840ff7 ("i2c: imx: add DMA support for freescale i2c driver")
+Reviewed-by: Uwe Kleine-König
+Signed-off-by: Wolfram Sang
+Cc: stable@kernel.org
+Signed-off-by: Sudip Mukherjee
+Signed-off-by: Greg Kroah-Hartman
+---
+ drivers/i2c/busses/i2c-imx.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+--- a/drivers/i2c/busses/i2c-imx.c
++++ b/drivers/i2c/busses/i2c-imx.c
+@@ -376,6 +376,7 @@ static int i2c_imx_dma_xfer(struct imx_i
+ 		goto err_desc;
+ 	}
+ 
++	reinit_completion(&dma->cmd_complete);
+ 	txdesc->callback = i2c_imx_dma_callback;
+ 	txdesc->callback_param = i2c_imx;
+ 	if (dma_submit_error(dmaengine_submit(txdesc))) {
+@@ -619,7 +620,6 @@ static int i2c_imx_dma_write(struct imx_
+ 	 * The first byte must be transmitted by the CPU.
+ 	 */
+ 	imx_i2c_write_reg(msgs->addr << 1, i2c_imx, IMX_I2C_I2DR);
+-	reinit_completion(&i2c_imx->dma->cmd_complete);
+ 	time_left = wait_for_completion_timeout(
+ 				&i2c_imx->dma->cmd_complete,
+ 				msecs_to_jiffies(DMA_TIMEOUT));
+@@ -678,7 +678,6 @@ static int i2c_imx_dma_read(struct imx_i
+ 	if (result)
+ 		return result;
+ 
+-	reinit_completion(&i2c_imx->dma->cmd_complete);
+ 	time_left = wait_for_completion_timeout(
+ 				&i2c_imx->dma->cmd_complete,
+ 				msecs_to_jiffies(DMA_TIMEOUT));
diff --git a/queue-4.9/ring_buffer-tracing-inherit-the-tracing-setting-to-next-ring-buffer.patch b/queue-4.9/ring_buffer-tracing-inherit-the-tracing-setting-to-next-ring-buffer.patch
new file mode 100644
index 00000000000..40702efb899
--- /dev/null
+++ b/queue-4.9/ring_buffer-tracing-inherit-the-tracing-setting-to-next-ring-buffer.patch
@@ -0,0 +1,103 @@
+From 73c8d8945505acdcbae137c2e00a1232e0be709f Mon Sep 17 00:00:00 2001
+From: Masami Hiramatsu
+Date: Sat, 14 Jul 2018 01:28:15 +0900
+Subject: ring_buffer: tracing: Inherit the tracing setting to next ring buffer
+
+From: Masami Hiramatsu
+
+commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream.
+
+Maintain the tracing on/off setting of the ring_buffer when switching
+to the trace buffer snapshot.
+
+Taking a snapshot is done by swapping the backup ring buffer
+(max_tr_buffer). But since the tracing on/off setting is defined
+by the ring buffer, when swapping it, the tracing on/off setting
+can also be changed. This causes a strange result like below:
+
+  /sys/kernel/debug/tracing # cat tracing_on
+  1
+  /sys/kernel/debug/tracing # echo 0 > tracing_on
+  /sys/kernel/debug/tracing # cat tracing_on
+  0
+  /sys/kernel/debug/tracing # echo 1 > snapshot
+  /sys/kernel/debug/tracing # cat tracing_on
+  1
+  /sys/kernel/debug/tracing # echo 1 > snapshot
+  /sys/kernel/debug/tracing # cat tracing_on
+  0
+
+We don't touch tracing_on, but the snapshot changes the tracing_on
+setting each time. This is an anomaly, because the user doesn't know
+that each "ring_buffer" stores its own tracing-enable state and
+the snapshot is done by swapping ring buffers.
+
+Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
+
+Cc: Ingo Molnar
+Cc: Shuah Khan
+Cc: Tom Zanussi
+Cc: Hiraku Toyooka
+Cc: stable@vger.kernel.org
+Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
+Signed-off-by: Masami Hiramatsu
+[ Updated commit log and comment in the code ]
+Signed-off-by: Steven Rostedt (VMware)
+Signed-off-by: Sudip Mukherjee
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/ring_buffer.h | 1 +
+ kernel/trace/ring_buffer.c | 16 ++++++++++++++++
+ kernel/trace/trace.c | 6 ++++++
+ 3 files changed, 23 insertions(+)
+
+--- a/include/linux/ring_buffer.h
++++ b/include/linux/ring_buffer.h
+@@ -162,6 +162,7 @@ void ring_buffer_record_enable(struct ri
+ void ring_buffer_record_off(struct ring_buffer *buffer);
+ void ring_buffer_record_on(struct ring_buffer *buffer);
+ int ring_buffer_record_is_on(struct ring_buffer *buffer);
++int ring_buffer_record_is_set_on(struct ring_buffer *buffer);
+ void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
+ void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
+
+--- a/kernel/trace/ring_buffer.c
++++ b/kernel/trace/ring_buffer.c
+@@ -3137,6 +3137,22 @@ int ring_buffer_record_is_on(struct ring
+ }
+ 
+ /**
++ * ring_buffer_record_is_set_on - return true if the ring buffer is set writable
++ * @buffer: The ring buffer to see if write is set enabled
++ *
++ * Returns true if the ring buffer is set writable by ring_buffer_record_on().
++ * Note that this does NOT mean it is in a writable state.
++ *
++ * It may return true when the ring buffer has been disabled by
++ * ring_buffer_record_disable(), as that is a temporary disabling of
++ * the ring buffer.
++ */
++int ring_buffer_record_is_set_on(struct ring_buffer *buffer)
++{
++	return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
++}
++
++/**
+  * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
+  * @buffer: The ring buffer to stop writes to.
+  * @cpu: The CPU buffer to stop
+--- a/kernel/trace/trace.c
++++ b/kernel/trace/trace.c
+@@ -1323,6 +1323,12 @@ update_max_tr(struct trace_array *tr, st
+ 
+ 	arch_spin_lock(&tr->max_lock);
+ 
++	/* Inherit the recordable setting from trace_buffer */
++	if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer))
++		ring_buffer_record_on(tr->max_buffer.buffer);
++	else
++		ring_buffer_record_off(tr->max_buffer.buffer);
++
+ 	buf = tr->trace_buffer.buffer;
+ 	tr->trace_buffer.buffer = tr->max_buffer.buffer;
+ 	tr->max_buffer.buffer = buf;
diff --git a/queue-4.9/series b/queue-4.9/series
index 9f026ed8034..01f1300fa0a 100644
--- a/queue-4.9/series
+++ b/queue-4.9/series
@@ -5,3 +5,9 @@ nohz-fix-local_timer_softirq_pending.patch
 netlink-do-not-subscribe-to-non-existent-groups.patch
 netlink-don-t-shift-with-ub-on-nlk-ngroups.patch
 netlink-don-t-shift-on-64-for-ngroups.patch
+ext4-fix-false-negatives-and-false-positives-in-ext4_check_descriptors.patch
+acpi-pci-bail-early-in-acpi_pci_add_bus-if-there-is-no-acpi-handle.patch
+ring_buffer-tracing-inherit-the-tracing-setting-to-next-ring-buffer.patch
+i2c-imx-fix-reinit_completion-use.patch
+btrfs-fix-file-data-corruption-after-cloning-a-range-and-fsync.patch
+tcp-add-tcp_ooo_try_coalesce-helper.patch
diff --git a/queue-4.9/tcp-add-tcp_ooo_try_coalesce-helper.patch b/queue-4.9/tcp-add-tcp_ooo_try_coalesce-helper.patch
new file mode 100644
index 00000000000..c353e137292
--- /dev/null
+++ b/queue-4.9/tcp-add-tcp_ooo_try_coalesce-helper.patch
@@ -0,0 +1,73 @@
+From 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c Mon Sep 17 00:00:00 2001
+From: Eric Dumazet
+Date: Mon, 23 Jul 2018 09:28:21 -0700
+Subject: tcp: add tcp_ooo_try_coalesce() helper
+
+From: Eric Dumazet
+
+commit 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c upstream.
+
+In case an skb in the out_of_order_queue is the result of
+multiple skbs coalescing, we would like to get proper gso_segs
+counter tracking, so that a future tcp_drop() can report an accurate
+number.
+
+I chose not to implement this tracking for skbs in the receive queue,
+since they are not dropped unless the socket is disconnected.
+
+Signed-off-by: Eric Dumazet
+Acked-by: Soheil Hassas Yeganeh
+Acked-by: Yuchung Cheng
+Signed-off-by: David S. Miller
+Signed-off-by: David Woodhouse
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/ipv4/tcp_input.c | 23 +++++++++++++++++++++--
+ 1 file changed, 21 insertions(+), 2 deletions(-)
+
+--- a/net/ipv4/tcp_input.c
++++ b/net/ipv4/tcp_input.c
+@@ -4370,6 +4370,23 @@ static bool tcp_try_coalesce(struct sock
+ 	return true;
+ }
+ 
++static bool tcp_ooo_try_coalesce(struct sock *sk,
++			     struct sk_buff *to,
++			     struct sk_buff *from,
++			     bool *fragstolen)
++{
++	bool res = tcp_try_coalesce(sk, to, from, fragstolen);
++
++	/* In case tcp_drop() is called later, update to->gso_segs */
++	if (res) {
++		u32 gso_segs = max_t(u16, 1, skb_shinfo(to)->gso_segs) +
++			       max_t(u16, 1, skb_shinfo(from)->gso_segs);
++
++		skb_shinfo(to)->gso_segs = min_t(u32, gso_segs, 0xFFFF);
++	}
++	return res;
++}
++
+ static void tcp_drop(struct sock *sk, struct sk_buff *skb)
+ {
+ 	sk_drops_add(sk, skb);
+@@ -4493,7 +4510,8 @@ static void tcp_data_queue_ofo(struct so
+ 	/* In the typical case, we are adding an skb to the end of the list.
+ 	 * Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup.
+ 	 */
+-	if (tcp_try_coalesce(sk, tp->ooo_last_skb, skb, &fragstolen)) {
++	if (tcp_ooo_try_coalesce(sk, tp->ooo_last_skb,
++				 skb, &fragstolen)) {
+ coalesce_done:
+ 		tcp_grow_window(sk, skb);
+ 		kfree_skb_partial(skb, fragstolen);
+@@ -4543,7 +4561,8 @@ coalesce_done:
+ 			tcp_drop(sk, skb1);
+ 			goto merge_right;
+ 		}
+-	} else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) {
++	} else if (tcp_ooo_try_coalesce(sk, skb1,
++					skb, &fragstolen)) {
+ 		goto coalesce_done;
+ 	}
+ 	p = &parent->rb_right;
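
A note on the gso_segs arithmetic in tcp_ooo_try_coalesce() above: each
coalesced skb counts as at least one segment (a non-GSO skb may carry
gso_segs == 0), and the sum is clamped to 0xFFFF because gso_segs is a
16-bit field in struct skb_shared_info. The following standalone C sketch
reproduces just that saturating accounting in userspace; the function name
and test values are illustrative only, and nothing here is kernel code:

  #include <stdint.h>
  #include <stdio.h>

  /* Sketch of the accounting in tcp_ooo_try_coalesce(): treat each skb
   * as at least one segment and saturate at 0xFFFF instead of wrapping. */
  static uint16_t coalesced_gso_segs(uint16_t to_segs, uint16_t from_segs)
  {
          uint32_t sum = (uint32_t)(to_segs ? to_segs : 1) +
                         (from_segs ? from_segs : 1);

          return sum > 0xFFFF ? 0xFFFF : (uint16_t)sum;
  }

  int main(void)
  {
          printf("%u\n", (unsigned)coalesced_gso_segs(0, 0));           /* 2 */
          printf("%u\n", (unsigned)coalesced_gso_segs(3, 5));           /* 8 */
          printf("%u\n", (unsigned)coalesced_gso_segs(0xFFFF, 0xFFFF)); /* 65535 */
          return 0;
  }

Saturating rather than wrapping means a later tcp_drop() may undercount
segments after extreme coalescing, but it can never report a tiny count
caused by u16 overflow, which is the failure mode the min_t() clamp in
the patch avoids.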