+++ /dev/null
-From aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c Mon Sep 17 00:00:00 2001
-From: Filipe Manana <fdmanana@suse.com>
-Date: Fri, 6 Nov 2015 13:33:33 +0000
-Subject: Btrfs: fix race leading to incorrect item deletion when dropping extents
-
-From: Filipe Manana <fdmanana@suse.com>
-
-commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.
-
-While running a stress test I got the following warning triggered:
-
- [191627.672810] ------------[ cut here ]------------
- [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
- (...)
- [191627.701485] Call Trace:
- [191627.702037] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
- [191627.702992] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
- [191627.704091] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
- [191627.705380] [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
- [191627.706637] [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
- [191627.707789] [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
- [191627.709155] [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
- [191627.712444] [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
- [191627.714162] [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
- [191627.715887] [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
- [191627.717287] [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
- [191627.728865] [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
- [191627.730045] [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
- [191627.731256] [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
- [191627.732661] [<ffffffff81061119>] process_one_work+0x24c/0x4ae
- [191627.733822] [<ffffffff810615b0>] worker_thread+0x206/0x2c2
- [191627.734857] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
- [191627.736052] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
- [191627.737349] [<ffffffff810669a6>] kthread+0xef/0xf7
- [191627.738267] [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
- [191627.739330] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
- [191627.741976] [<ffffffff81465592>] ret_from_fork+0x42/0x70
- [191627.743080] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
- [191627.744206] ---[ end trace bbfddacb7aaada8d ]---
-
- $ cat -n fs/btrfs/file.c
- 691 int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
- (...)
- 758 btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
- 759 if (key.objectid > ino ||
- 760 key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
- 761 break;
- 762
- 763 fi = btrfs_item_ptr(leaf, path->slots[0],
- 764 struct btrfs_file_extent_item);
- 765 extent_type = btrfs_file_extent_type(leaf, fi);
- 766
- 767 if (extent_type == BTRFS_FILE_EXTENT_REG ||
- 768 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
- (...)
- 774 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
- (...)
- 778 } else {
- 779 WARN_ON(1);
- 780 extent_end = search_start;
- 781 }
- (...)
-
-This happened because the item we were processing did not match a file
-extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
-case we cast the item to a struct btrfs_file_extent_item pointer and
-then find a type field value that does not match any of the expected
-values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
-due to a tiny time window where a race can happen as exemplified below.
-For example, consider the following scenario where we're using the
-NO_HOLES feature and we have the following two neighbour leafs:
-
- Leaf X (has N items) Leaf Y
-
-[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 EXTENT_DATA 8192), ... ]
- slot N - 2 slot N - 1 slot 0
-
-Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
-than explicit because NO_HOLES is enabled). Now if our inode has an
-ordered extent for the range [4K, 8K[ that is finishing, the following
-can happen:
-
- CPU 1 CPU 2
-
- btrfs_finish_ordered_io()
- insert_reserved_file_extent()
- __btrfs_drop_extents()
- Searches for the key
- (257 EXTENT_DATA 4096) through
- btrfs_lookup_file_extent()
-
- Key not found and we get a path where
- path->nodes[0] == leaf X and
- path->slots[0] == N
-
- Because path->slots[0] is >=
- btrfs_header_nritems(leaf X), we call
- btrfs_next_leaf()
-
- btrfs_next_leaf() releases the path
-
- inserts key
- (257 INODE_REF 4096)
- at the end of leaf X,
- leaf X now has N + 1 keys,
- and the new key is at
- slot N
-
- btrfs_next_leaf() searches for
- key (257 INODE_REF 256), with
- path->keep_locks set to 1,
- because it was the last key it
- saw in leaf X
-
- finds it in leaf X again and
- notices it's no longer the last
- key of the leaf, so it returns 0
- with path->nodes[0] == leaf X and
- path->slots[0] == N (which is now
- < btrfs_header_nritems(leaf X)),
- pointing to the new key
- (257 INODE_REF 4096)
-
- __btrfs_drop_extents() casts the
- item at path->nodes[0], slot
- path->slots[0], to a struct
- btrfs_file_extent_item - it does
- not skip keys for the target
- inode with a type less than
- BTRFS_EXTENT_DATA_KEY
- (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
-
- sees a bogus value for the type
- field triggering the WARN_ON in
- the trace shown above, and sets
- extent_end = search_start (4096)
-
- does the if-then-else logic to
- fixup 0 length extent items created
- by a past bug from hole punching:
-
- if (extent_end == key.offset &&
- extent_end >= search_start)
- goto delete_extent_item;
-
- that evaluates to true and it ends
- up deleting the key pointed to by
- path->slots[0], (257 INODE_REF 4096),
- from leaf X
-
-The same could happen for example for a xattr that ends up having a key
-with an offset value that matches search_start (very unlikely but not
-impossible).
-
-So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
-skipped, never casted to struct btrfs_file_extent_item and never deleted
-by accident. Also protect against the unexpected case of getting a key
-for a lower inode number by skipping that key and issuing a warning.
-
-Signed-off-by: Filipe Manana <fdmanana@suse.com>
-Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
----
- fs/btrfs/file.c | 16 ++++++++++++----
- 1 file changed, 12 insertions(+), 4 deletions(-)
-
---- a/fs/btrfs/file.c
-+++ b/fs/btrfs/file.c
-@@ -736,8 +736,16 @@ next_slot:
- }
-
- btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-- if (key.objectid > ino ||
-- key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
-+
-+ if (key.objectid > ino)
-+ break;
-+ if (WARN_ON_ONCE(key.objectid < ino) ||
-+ key.type < BTRFS_EXTENT_DATA_KEY) {
-+ ASSERT(del_nr == 0);
-+ path->slots[0]++;
-+ goto next_slot;
-+ }
-+ if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
- break;
-
- fi = btrfs_item_ptr(leaf, path->slots[0],
-@@ -755,8 +763,8 @@ next_slot:
- extent_end = key.offset +
- btrfs_file_extent_inline_len(leaf, fi);
- } else {
-- WARN_ON(1);
-- extent_end = search_start;
-+ /* can't happen */
-+ BUG();
- }
-
- if (extent_end <= search_start) {
+++ /dev/null
-From foo@baz Fri Dec 11 11:39:46 EST 2015
-From: Daniel Borkmann <daniel@iogearbox.net>
-Date: Wed, 11 Nov 2015 23:25:43 +0100
-Subject: packet: infer protocol from ethernet header if unset
-
-From: Daniel Borkmann <daniel@iogearbox.net>
-
-[ Upstream commit c72219b75fde768efccf7666342282fab7f9e4e7 ]
-
-In case no struct sockaddr_ll has been passed to packet
-socket's sendmsg() when doing a TX_RING flush run, then
-skb->protocol is set to po->num instead, which is the protocol
-passed via socket(2)/bind(2).
-
-Applications only xmitting can go the path of allocating the
-socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
-TX_RING with sll_protocol of 0. That way, register_prot_hook()
-is neither called on creation nor on bind time, which saves
-cycles when there's no interest in capturing anyway.
-
-That leaves us however with po->num 0 instead and therefore
-the TX_RING flush run sets skb->protocol to 0 as well. Eric
-reported that this leads to problems when using tools like
-trafgen over bonding device. I.e. the bonding's hash function
-could invoke the kernel's flow dissector, which depends on
-skb->protocol being properly set. In the current situation, all
-the traffic is then directed to a single slave.
-
-Fix it up by inferring skb->protocol from the Ethernet header
-when not set and we have ARPHRD_ETHER device type. This is only
-done in case of SOCK_RAW and where we have a dev->hard_header_len
-length. In case of ARPHRD_ETHER devices, this is guaranteed to
-cover ETH_HLEN, and therefore being accessed on the skb after
-the skb_store_bits().
-
-Reported-by: Eric Dumazet <edumazet@google.com>
-Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-Acked-by: Willem de Bruijn <willemb@google.com>
-Signed-off-by: David S. Miller <davem@davemloft.net>
-Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
----
- net/packet/af_packet.c | 11 +++++++++++
- 1 file changed, 11 insertions(+)
-
---- a/net/packet/af_packet.c
-+++ b/net/packet/af_packet.c
-@@ -1945,6 +1945,15 @@ static void tpacket_destruct_skb(struct
- sock_wfree(skb);
- }
-
-+static void tpacket_set_protocol(const struct net_device *dev,
-+ struct sk_buff *skb)
-+{
-+ if (dev->type == ARPHRD_ETHER) {
-+ skb_reset_mac_header(skb);
-+ skb->protocol = eth_hdr(skb)->h_proto;
-+ }
-+}
-+
- static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
- void *frame, struct net_device *dev, int size_max,
- __be16 proto, unsigned char *addr, int hlen)
-@@ -2031,6 +2040,8 @@ static int tpacket_fill_skb(struct packe
- dev->hard_header_len);
- if (unlikely(err))
- return err;
-+ if (!skb->protocol)
-+ tpacket_set_protocol(dev, skb);
-
- data += dev->hard_header_len;
- to_write -= dev->hard_header_len;