Future work in this driver would like to look at priv->active_offloads &
ENETC_F_QBV to determine whether a tc-taprio qdisc offload was
installed, but this does not produce the intended effect.
All the other flags in priv->active_offloads are managed dynamically,
except ENETC_F_QBV which is set statically based on the probed SI capability.
This change makes priv->active_offloads & ENETC_F_QBV really track the
presence of a tc-taprio schedule on the port.
Some existing users, like the enetc_sched_speed_set() call from
phylink_mac_link_up(), are best kept using the old logic: the tc-taprio
offload does not re-trigger another link mode resolve, so the scheduler
needs to be functional from the get go, as long as Qbv is supported at
all on the port. So to preserve functionality there, look at the static
station interface capability from pf->si->hw_features instead.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 290b5fe096e7 ("net: enetc: preserve TX ring priority across reconfiguration") Signed-off-by: Sasha Levin <sashal@kernel.org>
The size of the TOD programmable field was incorrectly increased from
four to eight bytes with commit 1a2c5840acf9 ("s390/dump: cleanup CPU
save area handling").
This leads to an elf notes section NT_S390_TODPREG which has a size of
eight instead of four bytes in case of kdump, however even worse is
that the contents is incorrect: it is supposed to contain only the
contents of the TOD programmable field, but in fact contains a mix of
the TOD programmable field (32 bit upper bits) and parts of the CPU
timer register (lower 32 bits).
Fix this by simply changing the size of the todpreg field within the
save area structure. This will implicitly also fix the size of the
corresponding elf notes sections.
This also gets rid of this compile time warning:
in function ‘fortify_memcpy_chk’,
inlined from ‘save_area_add_regs’ at arch/s390/kernel/crash_dump.c:99:2:
./include/linux/fortify-string.h:413:25: error: call to ‘__read_overflow2_field’
declared with attribute warning: detected read beyond size of field
(2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 1a2c5840acf9 ("s390/dump: cleanup CPU save area handling") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ACPI buffer memory (string.pointer) should be freed as the buffer is
not used after returning from bgx_acpi_match_id(), free it to prevent
memory leak.
Fixes: 46b903a01c05 ("net, thunder, bgx: Add support to get MAC address from ACPI.") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Link: https://lore.kernel.org/r/20221123082237.1220521-1-liaoyu15@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The first validation check for EVT_TRANSACTION has two different checks
tied together with logical AND. One is a check for minimum packet length,
and the other is for a valid aid_tag. If either condition is true (fails),
then an error should be triggered. The fix is to change && to ||.
In com20020_probe(), if com20020_config() fails, dev and info
will not be freed, which will lead to a memory leak.
This patch adds freeing dev and info after com20020_config()
fails to fix this bug.
Compile tested only.
Fixes: 15b99ac17295 ("[PATCH] pcmcia: add return value to _config() functions") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The main arcnet interrupt handler calls arcnet_close() then
arcnet_open(), if the RESET status flag is encountered.
This is invalid:
1) In general, interrupt handlers should never call ->ndo_stop() and
->ndo_open() functions. They are usually full of blocking calls and
other methods that are expected to be called only from drivers
init and exit code paths.
2) arcnet_close() contains a del_timer_sync(). If the irq handler
interrupts the to-be-deleted timer, del_timer_sync() will just loop
forever.
3) arcnet_close() also calls tasklet_kill(), which has a warning if
called from irq context.
4) For device reset, the sequence "arcnet_close(); arcnet_open();" is
not complete. Some children arcnet drivers have special init/exit
code sequences, which then embed a call to arcnet_open() and
arcnet_close() accordingly. Check drivers/net/arcnet/com20020.c.
Run the device RESET sequence from a scheduled workqueue instead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20210128194802.727770-1-a.darwish@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 1c40cde6b517 ("arcnet: fix potential memory leak in com20020_probe()") Signed-off-by: Sasha Levin <sashal@kernel.org>
For DASD devices in raw_track_access mode only full track images are
read and written.
For this purpose it is not necessary to do search operation in the
locate record extended function. The documentation even states that
this might fail if the searched record is not found on a track.
Currently the driver sets a value of 1 in the search field for the first
record after record zero. This is the default for disks not in
raw_track_access mode but record 1 might be missing on a completely
empty track.
There has not been any problem with this on IBM storage servers but it
might lead to errors with DASD devices on other vendors storage servers.
Fix this by setting the search field to 0. Record zero is always available
even on a completely empty track.
Fixes: e4dbb0f2b5dd ("[S390] dasd: Add support for raw ECKD access.") Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Link: https://lore.kernel.org/r/20221123160719.3002694-4-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
In fib_table_insert(), if the alias was already inserted, but node not
exist, the error code should be set before return from error handling path.
Fixes: a6c76c17df02 ("ipv4: Notify route after insertion to the routing table") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/r/20221120072838.2167047-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When connect() is called on a socket bound to the wildcard address,
we change the socket's saddr to a local address. If the socket
fails to connect() to the destination, we have to reset the saddr.
However, when an error occurs after inet_hash6?_connect() in
(dccp|tcp)_v[46]_conect(), we forget to reset saddr and leave
the socket bound to the address.
From the user's point of view, whether saddr is reset or not varies
with errno. Let's fix this inconsistent behaviour.
Note that after this patch, the repro [0] will trigger the WARN_ON()
in inet_csk_get_port() again, but this patch is not buggy and rather
fixes a bug papering over the bhash2's bug for which we need another
fix.
For the record, the repro causes -EADDRNOTAVAIL in inet_hash6_connect()
by this sequence:
nf_flow_table_block_setup and the driver TC_SETUP_FT call can modify the flow
block cb list while they are being traversed elsewhere, causing a crash.
Add a write lock around the calls to protect readers
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Racing conflict could be:
task A task B
list_for_each_entry
strcmp(h->name))
list_for_each_entry
strcmp(h->name)
kzalloc kzalloc
...... .....
device_create device_create
list_add
list_add
The root cause is that task B has no idea about the fact someone
else(A) has inserted heap with same name when it calls list_add,
so a potential collision occurs.
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put(). Call pci_dev_put() before returning from
bnx2x_vf_is_pcie_pending() to avoid refcount leak.
Fixes: b56e9670ffa4 ("bnx2x: Prepare device and initialize VF database") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221119070202.1407648-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In former times, info->feature was populated via the parent driver
by pdata/regulator_init_data->driver_data for all regulators when
USB_PRODUCT_ID_LSB indicates a TWL6032.
Today, the information is not set, so re-add it at the regulator
definitions.
Fixes: 25d82337705e2 ("regulator: twl: make driver DT only") Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Link: https://lore.kernel.org/r/20221120221208.3093727-2-andreas@kemnade.info Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit f11fe1dae1c4 ("net/sched: Make NET_ACT_CT depends on NF_NAT"),
it fixed the build failure when NF_NAT is m and NET_ACT_CT is y by
adding depends on NF_NAT for NET_ACT_CT. However, it would also cause
NET_ACT_CT cannot be built without NF_NAT, which is not expected. This
patch fixes it by changing to use "(!NF_NAT || NF_NAT)" as the depend.
One extra conn_get() is needed in tipc_conn_alloc(), as after
tipc_conn_alloc() is called, tipc_conn_close() may free this
con before deferencing it in tipc_topsrv_accept():
This patch fixes it by holding it in tipc_conn_alloc(), then after
all accessing in tipc_topsrv_accept() releasing it. Note when does
this in tipc_topsrv_kern_subscr(), as tipc_conn_rcv_sub() returns
0 or -1 only, we don't need to check for "> 0".
Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It was caused by !con->sock in tipc_conn_close(). In tipc_topsrv_accept(),
con is allocated in conn_idr then its sock is set:
con = tipc_conn_alloc();
... <----[1]
con->sock = newsock;
If tipc_conn_close() is called in anytime of [1], the null-pointer-def
is triggered by con->sock->sk due to con->sock is not yet set.
This patch fixes it by moving the con->sock setting to tipc_conn_alloc()
under s->idr_lock. So that con->sock can never be NULL when getting the
con from s->conn_idr. It will be also safer to move con->server and flag
CF_CONNECTED setting under s->idr_lock, as they should all be set before
tipc_conn_alloc() is called.
Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case command interface is down, or the command is not allowed, driver
did not increment the entry refcount, but might have decrement as part
of forced completion handling.
Fix that by always increment and decrement the refcount to make it
symmetric for all flows.
Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reported-by: Jack Wang <jinpu.wang@ionos.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a bug in calculation of FW tracer timestamp. Decreasing one in the
calculation should effect only bits 52_7 and not effect bits 6_0 of the
timestamp, otherwise bits 6_0 are always set in this calculation.
This patch introduced a regression: commit 48596a8ddc46 ("netfilter:
ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
The variable e.ip is passed to adtfn() function which finally adds the
ip address to the set. The patch above refactored the for loop and moved
e.ip = htonl(ip) to the end of the for loop.
What this means is that if the value of "ip" changes between the first
assignement of e.ip and the forloop, then e.ip is pointing to a
different ip address than "ip".
Test case:
$ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000
$ ipset add jdtest_tmp 10.0.1.1/31
ipset v6.21.1: Element cannot be added to the set: it's already added
The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])"
block but e.ip is still pointing to the old value.
Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The range size of consecutive elements were not limited. Thus one could
define a huge range which may result soft lockup errors due to the long
execution time. Now the range size is limited to 2^20 entries.
If device_register() returns error in vmbus_device_register(),
the name allocated by dev_set_name() must be freed. As comment
of device_register() says, it should use put_device() to give
up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup().
Fixes: 09d50ff8a233 ("Staging: hv: make the Hyper-V virtual bus code build") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221119081135.1564691-3-yangyingliang@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the error path of vmbus_device_register(), device_unregister()
is called, which calls vmbus_device_release(). The latter frees
the struct hv_device that was passed in to vmbus_device_register().
So remove the kfree() in vmbus_add_channel_work() to avoid a double
free.
Fixes: c2e5df616e1a ("vmbus: add per-channel sysfs info") Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221119081135.1564691-2-yangyingliang@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Setting of the port flag `NFP_PORT_CHANGED`, introduced
to ensure the correct reading of EEPROM data, causes a
fatal kernel NULL pointer dereference in cases where
the target netdev type cannot be determined.
Add validation of port struct pointer before attempting
to set the `NFP_PORT_CHANGED` flag. Return that operation
is not supported if the netdev type cannot be determined.
Fixes: 4ae97cae07e1 ("nfp: ethtool: fix the display error of `ethtool -m DEVNAME`") Signed-off-by: Jaco Coetzee <jaco.coetzee@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The error is reflected in that it shows wrong splittable status of
port when executing "devlink port show".
The reason which leads the error is that the assigned operation of
splittable is just a simple negation operation of split and it does
not consider port lanes quantity. A splittable port should have
several lanes that can be split(lanes quantity > 1).
If without the judgement, it will show wrong message for some
firmware, such as 2x25G, 2x10G.
Fixes: a0f49b548652 ("devlink: Add a new devlink port split ability attribute and pass to netlink") Signed-off-by: Diana Wang <na.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().
In pch_gbe_probe(), pci_get_domain_bus_and_slot() is called,
so in error path in probe() and remove() function, pci_dev_put()
should be called to avoid refcount leak. Compile tested only.
If mtk_start_dma() fails, invoke phylink_disconnect_phy() to perform
cleanup. phylink_disconnect_phy() contains the put_device action. If
phylink_disconnect_phy is not performed, the Kref of netdev will leak.
Fixes: b8fc9f30821e ("net: ethernet: mediatek: Add basic PHYLINK support") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221117111356.161547-1-liujian56@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
nf_conn:mark can be read from and written to in parallel. Use
READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
compiler optimizations.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The process is as follows:
Thread A: Thread B:
p9_poll_workfn() p9_client_create()
... ...
p9_conn_cancel() p9_fd_cancel()
list_del() ...
... list_del() //list_del
corruption
There is no lock protection when deleting list in p9_conn_cancel(). After
deleting list in Thread A, thread B will delete the same list again. It
will cause issue of list_del corruption.
Setting req->status to REQ_STATUS_ERROR under lock prevents other
cleanup paths from trying to manipulate req_list.
The other thread can safely check req->status because it still holds a
reference to req at this point.
Link: https://lkml.kernel.org/r/20221110122606.383352-1-shaozhengchao@huawei.com Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier") Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
[Dominique: add description of the fix in commit message] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In pch_gbe_xmit_frame(), NETDEV_TX_OK will be returned whether
pch_gbe_tx_queue() sends data successfully or not, so pch_gbe_tx_queue()
needs to free skb before returning. But pch_gbe_tx_queue() returns without
freeing skb in case of dma_map_single() fails. Add dev_kfree_skb_any()
to fix it.
Fixes: 77555ee72282 ("net: Add Gigabit Ethernet driver of Topcliff PCH") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously we leverage NCI_UNREG and the lock inside nci_close_device to
prevent the race condition between opening a device and closing a
device. However, it still has problem because a failed opening command
will erase the NCI_UNREG flag and allow another opening command to
bypass the status checking.
This fix corrects that by making sure the NCI_UNREG is held.
Reported-by: syzbot+43475bf3cfbd6e41f5b7@syzkaller.appspotmail.com Fixes: 48b71a9e66c2 ("NFC: add NCI_UNREG flag to eliminate the race") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
After rxrpc_unbundle_conn() has removed a connection from a bundle, it
checks to see if there are any conns with available channels and, if not,
removes and attempts to destroy the bundle.
Whilst it does check after grabbing client_bundles_lock that there are no
connections attached, this races with rxrpc_look_up_bundle() retrieving the
bundle, but not attaching a connection for the connection to be attached
later.
There is therefore a window in which the bundle can get destroyed before we
manage to attach a new connection to it.
Fix this by adding an "active" counter to struct rxrpc_bundle:
(1) rxrpc_connect_call() obtains an active count by prepping/looking up a
bundle and ditches it before returning.
(2) If, during rxrpc_connect_call(), a connection is added to the bundle,
this obtains an active count, which is held until the connection is
discarded.
(3) rxrpc_deactivate_bundle() is created to drop an active count on a
bundle and destroy it when the active count reaches 0. The active
count is checked inside client_bundles_lock() to prevent a race with
rxrpc_look_up_bundle().
(4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle().
Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: zdi-disclosures@trendmicro.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 3bcd6c7eaa53 ("rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]") Signed-off-by: Sasha Levin <sashal@kernel.org>
Allow the list of in-use local UDP endpoints in the current network
namespace to be viewed in /proc.
To aid with this, the endpoint list is converted to an hlist and RCU-safe
manipulation is used so that the list can be read with only the RCU
read lock held.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 3bcd6c7eaa53 ("rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]") Signed-off-by: Sasha Levin <sashal@kernel.org>
cocci warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1797:54-56: WARNING !A || A && B is equivalent to !A || B
drivers/net/ethernet/cavium/liquidio/lio_main.c:1827:54-56: WARNING !A || A && B is equivalent to !A || B
Fixes: 8979f428a4af ("net: liquidio: release resources when liquidio driver open failed") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
We set the PIOC to GPIO mode. This way the pin becomes an
input signal will be usable by the controller. Without
this change the udc on the 9g20ek does not work.
If device_register() returns error in optee_register_device(),
the name allocated by dev_set_name() need be freed. As comment
of device_register() says, it should use put_device() to give
up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup(),
and optee_device is freed in optee_release_device().
Fixes: c3fa24af9244 ("tee: optee: add TEE bus device enumeration support") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When communicating with a PMIC during system poweroff (pm_power_off()),
IRQs are disabled and we are in a RCU read-side critical section, so we
cannot use wait_for_completion_io_timeout(). Instead, poll the status
register for transfer completion.
Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221114015749.28490-3-samuel@sholland.org Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
processor A processor B
regulator_register()
rdev_init_debugfs()
rdev->debugfs = debugfs_create_dir()
devm_regulator_get()
rdev = regulator_dev_lookup()
create_regulator(rdev)
// using rdev->debugfs as parent
debugfs_create_dir(rdev->debugfs)
In devm_regulator_get(), after getting rdev, the refcount
is get, so fix this by moving debugfs_remove_recursive()
to regulator_dev_release(), then it can be proctected by
the refcount, the 'rdev->debugfs' can not be freed until
the refcount is 0.
pci_get_device() will increase the reference count for the returned
pci_dev. Since 'dma_dev' is only used to filter the channel in
dw_spi_dma_chan_filer() after using it we need to call pci_dev_put() to
decrease the reference count. Also add pci_dev_put() for the error case.
Fixes: 7063c0d942a1 ("spi/dw_spi: add DMA support") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Acked-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20221116093204.46700-1-wangxiongfeng2@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Here is a warning report about lack of registered release()
from kobject lib:
Device '(null)' does not have a release() function, it is broken and must be fixed.
WARNING: CPU: 0 PID: 48430 at drivers/base/core.c:2332 device_release+0x104/0x120
Call Trace:
kobject_put+0xdc/0x180
put_device+0x1b/0x30
regulator_register+0x651/0x1170
devm_regulator_register+0x4f/0xb0
When regulator_register() returns fail and directly goto `clean` symbol,
rdev->dev has not registered release() function yet (which is registered
by regulator_class in the following), so rdev needs to be freed manually.
If rdev->dev.of_node is not NULL, which means the of_node has gotten by
regulator_of_get_init_data(), it needs to call of_node_put() to avoid
refcount leak.
Otherwise, only calling put_device() would lead memory leak of rdev
in further:
When regulator_register() returns fail and goto `wash` symbol,
rdev->dev has registered release() function, so directly call
put_device() to cleanup everything.
Fixes: d3c731564e09 ("regulator: plug of_node leak in regulator_register()'s error path") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Link: https://lore.kernel.org/r/20221116074339.1024240-1-zengheng4@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Current handling of the srb_status is incorrect. Commit 52e1b3b3daa9
("scsi: storvsc: Correctly handle multiple flags in srb_status")
is based on srb_status being a set of flags, when in fact only the
2 high order bits are flags and the remaining 6 bits are an integer
status. Because the integer values of interest mostly look like flags,
the code actually works when treated that way.
But in the interest of correctness going forward, fix this by treating
the low 6 bits of srb_status as an integer status code. Add handling
for SRB_STATUS_INVALID_REQUEST, which was the original intent of commit 52e1b3b3daa9. Furthermore, treat the ERROR, ABORTED, and INVALID_REQUEST
srb status codes as essentially equivalent for the cases we care about.
There's no harm in doing so, and it isn't always clear which status code
current or older versions of Hyper-V report for particular conditions.
Treating the srb status codes as equivalent has the additional benefit
of ensuring that capacity change events result in an immediate rescan
so that the new size is known to Linux. Existing code checks SCSI
sense data for capacity change events when the srb status is ABORTED.
But capacity change events are also being observed when Hyper-V reports
the srb status as ERROR. Without the immediate rescan, the new size
isn't known until something else causes a rescan (such as running
fdisk to expand a partition), and in the meantime, tools such as "lsblk"
continue to report the old size.
Fixes: 52e1b3b3daa9 ("scsi: storvsc: Correctly handle multiple flags in srb_status") Reported-by: Juan Tian <juantian@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1668019722-1983-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The DAI tx_mask and rx_mask are set by snd_soc_dai_set_tdm_slot()
and used by later code that depends on the TDM settings. So
__soc_pcm_open() should not be obliterating those mask values.
The code in __soc_pcm_hw_params() uses these masks to calculate the
active channels so that only the AIF_IN/AIF_OUT widgets for the
active TDM slots are enabled. The zeroing of the masks in
__soc_pcm_open() disables this functionality so all AIF widgets
were enabled even for channels that are not assigned to a TDM slot.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 2e5894d73789 ("ASoC: pcm: Add support for DAI multicodec") Link: https://lore.kernel.org/r/20221104132213.121847-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit bf2aebccddef ("ASoC: sgtl5000: Fix noise on shutdown/remove"),
the device power control registers are reset when the driver is
removed/shutdown.
This is an issue when the device is configured to use the PLL clock. The
device will stop responding if it is still configured to use the PLL
clock but the PLL clock is powered down.
When rebooting linux, the probe function will show:
sgtl5000 0-000a: Error reading chip id -11
Make sure that the CHIP_CLK_CTRL is reset to its default value before
powering down the device.
Fixes: bf2aebccddef ("ASoC: sgtl5000: Fix noise on shutdown/remove") Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20221110190612.1341469-1-detlev.casanova@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When KASAN is enabled, below log might be dumped with Intel EHL hardware:
[ 48.583597] ==================================================================
[ 48.585921] BUG: KASAN: slab-out-of-bounds in hdac_hda_dai_hw_params+0x20a/0x22b [snd_soc_hdac_hda]
[ 48.587995] Write of size 4 at addr ffff888103489708 by task pulseaudio/759
Remove the regulators node and define fixed regulators in the root node.
Prevents the sdhci-omap driver from waiting in probe deferral forever
because of the missing vmmc-supply and keeps am335x-pcm-953 consistent with
the other Phytec AM335 boards.
Fixes: bb07a829ec38 ("ARM: dts: Add support for phyCORE-AM335x PCM-953 carrier board") Signed-off-by: Dominik Haller <d.haller@phytec.de>
Message-Id: <20221011143115.248003-1-d.haller@phytec.de> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using GSO it can happen that the wrong seq_hi is used for the last
packets before the wrap around. This can lead to double usage of a
sequence number. To avoid this, we should serialize this last GSO
packet.
The commit in the "Fixes" tag tried to avoid a case where policy check
is ignored due to dst caching in next hops.
However, when the traffic is locally consumed, the dst may be cached
in a local TCP or UDP socket as part of early demux. In this case the
"disable_policy" flag is not checked as ip_route_input_noref() was only
called before caching, and thus, packets after the initial packet in a
flow will be dropped if not matching policies.
Fix by checking the "disable_policy" flag also when a valid dst is
already available.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216557 Reported-by: Monil Patel <monil191989@gmail.com> Fixes: e6175a2ed1f1 ("xfrm: fix "disable_policy" flag use when arriving from different devices") Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
----
v2: use dev instead of skb->dev Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
get_port_from_cmdline() returns an int, yet is assigned to a char, which
is wrong in its own right, but also, with char becoming unsigned, this
poses problems, because -1 is used as an error value. Further
complicating things, fw_init_early_console() is only ever called with a
-1 argument. Fix this up by removing the unused argument from
fw_init_early_console() and treating port as a proper signed integer.
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Recently, ld.lld moved from '--undefined-version' to
'--no-undefined-version' as the default, which breaks the compat vDSO
build:
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_gettimeofday' failed: symbol not defined
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_gettime' failed: symbol not defined
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_getres' failed: symbol not defined
These symbols are not present in the compat vDSO or the regular vDSO for
32-bit but they are unconditionally included in the version section of
the linker script, which is prohibited with '--no-undefined-version'.
Fix this issue by only including the symbols that are actually exported
in the version section of the linker script.
Add the same change for ARM64 as done in the commit 9440c4294160
("x86/syscall: Include asm/ptrace.h in syscall_wrapper header") to
make sure all syscalls see 'struct pt_regs' definition and resulted
BTF for '__arm64_sys_*(struct pt_regs *regs)' functions point to
actual struct.
Without this patch, the BPF verifier refuses to load a tracing prog
which accesses pt_regs.
blkcg_unpin_online
blkcg_destroy_blkgs
blkg_destroy
list_del_init(&blkg->q_node)
// blkg is removed from queue list
6) switch elevator back to bfq
bfq_init_queue
bfq_create_group_hierarchy
blkcg_activate_policy
list_for_each_entry_reverse(blkg, &q->blkg_list)
// blkg is removed from list, hence bfq policy is still NULL
7) throttled io is dispatched to bfq:
bfq_insert_requests
bfq_init_rq
bfq_bic_update_cgroup
bfq_bio_bfqg
bfqg = blkg_to_bfqg(blkg)
// bfqg is NULL because bfq policy is NULL
The problem is only possible in bfq because only bfq can be deactivated and
activated while queue is online, while others can only be deactivated while
the device is removed.
Fix the problem in bfq by checking if blkg is online before calling
blkg_to_bfqg().
Like the Acer Switch One 10 S1003, for which there already is a quirk,
the Acer Switch V 10 (SW5-017) has a 800x1280 portrait screen mounted
in the tablet part of a landscape oriented 2-in-1. Add a quirk for this.
- RC BASIS = 0: The RETURNED LOGICAL BLOCK ADDRESS field indicates the
highest LBA of a contiguous range of zones that are not sequential write
required zones starting with the first zone.
- RC BASIS = 1: The RETURNED LOGICAL BLOCK ADDRESS field indicates the LBA
of the last logical block on the logical unit.
The current scsi_debug READ CAPACITY response does not comply with the
above if there are one or more sequential write required zones. SCSI
initiators need a way to retrieve the largest valid LBA from SCSI
devices. Reporting the largest valid LBA if there are one or more
sequential zones requires to set the RC BASIS field in the READ CAPACITY
response to one. Hence this patch.
Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221102193248.3177608-1-bvanassche@acm.org Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix an issue reported when performing a live migration when multipath is
configured with a short fast fail timeout of 5 seconds and also to have
no_path_retry set to fail. In this scenario, all paths would go into the
devloss state while the ibmvfc driver went through discovery to log back
in. On a loaded system, the discovery might take longer than 5 seconds,
which was resulting in all paths being marked failed, which then resulted
in a read only filesystem.
This patch changes the migration code in ibmvfc to avoid deleting rports at
all in this scenario, so we avoid losing all paths.
That commit tried to improve the performance of macsec offload by
taking advantage of some of the NIC's features, but in doing so, broke
macsec offload when the lower device supports both macsec and ipsec
offload, as the ipsec offload feature flags (mainly NETIF_F_HW_ESP)
were copied from the real device. Since the macsec device doesn't
provide xdo_* ops, the XFRM core rejects the registration of the new
macsec device in xfrm_api_check.
Example perf trace when running
ip link add link eni1np1 type macsec port 4 offload mac
ip 737 [003] 795.477687: probe:xfrm_dev_event__return ret=0x8002 (NOTIFY_BAD)
notifier_call_chain+0x47
register_netdevice+0x846
macsec_newlink+0x25a
dev->features includes NETIF_F_HW_ESP (0x04000000000000), so
xfrm_api_check returns NOTIFY_BAD because we don't have
dev->xfrmdev_ops on the macsec device.
We could probably propagate GSO and a few other features from the
lower device, similar to macvlan. This will be done in a future patch.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a test case to ensure that released pointer registers will not be
leaked into the map.
Before fix:
./test_verifier 984
984/u reference tracking: try to leak released ptr reg FAIL
Unexpected success to load!
verification time 67 usec
stack depth 4
processed 23 insns (limit 1000000) max_states_per_insn 0 total_states 2
peak_states 2 mark_read 1
984/p reference tracking: try to leak released ptr reg OK
Summary: 1 PASSED, 0 SKIPPED, 1 FAILED
After fix:
./test_verifier 984
984/u reference tracking: try to leak released ptr reg OK
984/p reference tracking: try to leak released ptr reg OK
Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
When this driver is used with a driver that uses preallocated spi_transfer
structs. The speed_hz is halved by every run. This results in:
spi_stm32 44004000.spi: SPI transfer setup failed
ads7846 spi0.0: SPI transfer failed: -22
Example when running with DIV_ROUND_UP():
- First run; speed_hz = 1000000, spi->clk_rate 125000000
div 125 -> mbrdiv = 7, cur_speed = 976562
- Second run; speed_hz = 976562
div 128,00007 (roundup to 129) -> mbrdiv = 8, cur_speed = 488281
- Third run; speed_hz = 488281
div 256,000131072067109 (roundup to 257) and then -EINVAL is returned.
Use DIV_ROUND_CLOSEST to allow to round down and allow us to keep the
set speed.
When trying to transmit an data frame with tx_status to a destination
that have no route in the mesh, then it is dropped without recrediting
the ack_status_frames idr.
Once it is exhausted, wpa_supplicant starts failing to do SAE with
NL80211_CMD_FRAME and logs "nl80211: Frame command failed".
Use ieee80211_free_txskb() instead of kfree_skb() to fix it.
With char becoming unsigned by default, and with `char` alone being
ambiguous and based on architecture, we get a warning when assigning the
unchecked output of hex_to_bin() to that unsigned char. Mark `key` as a
`u8`, which matches the struct's type, and then check each call to
hex_to_bin() before casting.
Cc: Kalle Valo <kvalo@kernel.org> Cc: linux-wireless@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221024162843.535921-1-Jason@zx2c4.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in kernel/auditfilter.c:179:23
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
audit_register_class+0x9d/0x137
audit_classes_init+0x4d/0xb8
do_one_initcall+0x76/0x430
kernel_init_freeable+0x3b3/0x422
kernel_init+0x24/0x1e0
ret_from_fork+0x1f/0x30
</TASK>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
[PM: remove bad 'Fixes' tag as issue predates git, added in v2.6.6-rc1] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This adds the 4 PWM controlled green LEDs to the HiFive Unleashed device
tree. The schematic doesn't specify any special function for the LEDs,
so they're added here without any default triggers and named d1, d2, d3
and d4 just like in the schematic.
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
We will only track the uppest parent snapshot realm from which we
need to rebuild the snapshot contexts _downward_ in hierarchy. For
all the others having no new snapshot we will do nothing.
This fix will avoid calling ceph_queue_cap_snap() on some inodes
inappropriately. For example, with the code in mainline, suppose there
are 2 directory hierarchies (with 6 directories total), like this:
/dir_X1/dir_X2/dir_X3/
/dir_Y1/dir_Y2/dir_Y3/
Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
root snapshot under /.snap/root_snap. Every time we make snapshots under
/dir_Y1/..., the kclient will always try to rebuild the snap context for
snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
and dir_Y3, which makes no sense.
That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
creating a new snapshot under /dir_Y1/... the new seq will be 4, and
the mds will send the kclient a snapshot backtrace in _downward_
order: seqs 4, 3.
When ceph_update_snap_trace() is called, it will always rebuild the from
the last realm, that's the root_snap. So later when rebuilding the snap
context, the current logic will always cause it to rebuild the snap_X2
realm and then try to queue cap snaps for all the inodes related in that
realm, even though it's not necessary.
This is accompanied by a lot of these sorts of dout messages:
Also, the 'invalidate' word is not precise here. In actuality, it will
cause a rebuild of the existing snapshot contexts or just build
non-existent ones. Rename it to 'rebuild_snapcs'.
URL: https://tracker.ceph.com/issues/44100 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Stable-dep-of: 51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
When using multiple instances of this driver the compensation PROM was
overwritten by the last initialized sensor. Now each sensor has own PROM
storage.
Signed-off-by: Mitja Spes <mitja@lxnav.com> Fixes: 9690d81a02dc ("iio: pressure: ms5611: add support for MS5607 temperature and pressure sensor") Link: https://lore.kernel.org/r/20221021135827.1444793-2-mitja@lxnav.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ms5611 passes &indio_dev->dev as a parameter to all its IO callbacks
only to directly cast the struct device back to struct iio_dev. And the
struct iio_dev is then only used to get the drivers state struct.
Simplify this a bit by passing the state struct directly. This makes it a
bit easier to follow what the code is doing.
Current dual mode adaptor ("DP++") detection code assumes that all
adaptors support i2c sub-addressing for read operations from the
DP-HDMI adaptor ID buffer. It has been observed that multiple
adaptors do not in fact support this, and always return data starting
at register 0. On affected adaptors, the code fails to read the proper
registers that would identify the device as a type 2 adaptor, and
handles those as type 1, limiting the TMDS clock to 165MHz, even if
the according register would announce a higher TMDS clock.
Fix this by always reading the ID buffer starting from offset 0, and
discarding any bytes before the actual offset of interest.
We tried finding authoritative documentation on whether or not this is
allowed behaviour, but since all the official VESA docs are paywalled,
the best we could come up with was the spec sheet for Texas Instruments'
SNx5DP149 chip family.[1] It explicitly mentions that sub-addressing is
supported for register writes, but *not* for reads (See NOTE in
section 8.5.3). Unless TI openly decided to violate the VESA spec, one
could take that as a hint that sub-addressing is in fact not mandated
by VESA.
The other two adaptors affected used the PS8409(A) and the LT8611,
according to the data returned from their ID buffers.
The bridge driver can offload VLANs to the underlying hardware either
via switchdev or the 8021q driver. When the former is used, the VLAN is
marked in the bridge driver with the 'BR_VLFLAG_ADDED_BY_SWITCHDEV'
private flag.
To avoid the memory leaks mentioned in the cited commit, the bridge
driver will try to delete a VLAN via the 8021q driver if the VLAN is not
marked with the previously mentioned flag.
When the VLAN protocol of the bridge changes, switchdev drivers are
notified via the 'SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL' attribute, but
the 8021q driver is also called to add the existing VLANs with the new
protocol and delete them with the old protocol.
In case the VLANs were offloaded via switchdev, the above behavior is
both redundant and buggy. Redundant because the VLANs are already
programmed in hardware and drivers that support VLAN protocol change
(currently only mlx5) change the protocol upon the switchdev attribute
notification. Buggy because the 8021q driver is called despite these
VLANs being marked with 'BR_VLFLAG_ADDED_BY_SWITCHDEV'. This leads to
memory leaks [1] when the VLANs are deleted.
Fix by not calling the 8021q driver for VLANs that were already
programmed via switchdev.
Drivers that support bridge offload need to be notified about changes to
the bridge's VLAN protocol so that they could react accordingly and
potentially veto the change.
Add a new switchdev attribute to communicate the change to drivers.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 9d45921ee4cb ("bridge: switchdev: Fix memory leaks when changing VLAN protocol") Signed-off-by: Sasha Levin <sashal@kernel.org>
While the ATA specification states that a device should return command
aborted for all commands queued after the device has entered error state,
since ATA only keeps the sense data for the latest command (in non-NCQ
case), we really don't want to send block layer commands to the device
after it has entered error state. (Only ATA EH commands should be sent,
to read the sense data etc.)
Currently, scsi_queue_rq() will check if scsi_host_in_recovery()
(state is SHOST_RECOVERY), and if so, it will _not_ issue a command via:
scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
-> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
when receiving a TFES error IRQ, the call chain looked like this:
ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY
would be set.
However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
scsi_times_out() will instead call scsi_abort_command() which will queue
delayed work, and the worker function scmd_eh_abort_handler() will call
scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
So now, after the TFES error IRQ has been serviced, we need to wait for
the SCSI workqueue to run its work before SHOST_RECOVERY gets set.
It is worth noting that, even before commit e494f6a72839 ("[SCSI] improved
eh timeout handler"), we could receive an error IRQ from the time when
scsi_queue_rq() checks scsi_host_in_recovery(), to the time when
ata_scsi_queuecmd() is actually called.
In order to handle both the delayed setting of SHOST_RECOVERY and the
window where we can receive an error IRQ, add a check against
ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ),
inside ata_scsi_queuecmd() itself, while holding the ap->lock.
(Since the ap->lock is held while servicing IRQs.)
Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Tested-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch cleans up the code of __ata_scsi_queuecmd(). Since each
branch of the "if" condition check that scmd->cmd_len is not zero, move
this check out of the "if" to simplify the conditions being checked in
the "else" branch.
While at it, avoid the if-else-if-else structure using if-else if
structure and remove the redundant rc local variable.
This patch does not change the function logic.
Signed-off-by: Wenchao Hao <haowenchao@huawei.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Stable-dep-of: e20e81a24a4d ("ata: libata-core: do not issue non-internal commands once EH is pending") Signed-off-by: Sasha Levin <sashal@kernel.org>
If transport_add_device() fails in sas_phy_add(), the kernel will crash
trying to delete the device in transport_remove_device() called from
sas_remove_host().
Fix this by checking and handling return value of transport_add_device()
in sas_phy_add().
Fixes: c7ebbbce366c ("[SCSI] SAS transport class") Suggested-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221107124828.115557-1-yangyingliang@huawei.com Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
It causes runtime failures as reported by Naresh and Arnd writes:
Greg, please just revert fbb4e8e6dc7b ("net: broadcom: Fix BCMGENET Kconfig")
in stable/linux-5.10.y: it depends on e5f31552674e ("ethernet: fix
PTP_1588_CLOCK dependencies"), which we probably don't want backported
from 5.15 to 5.10.
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find().
Because the ATTR_RECORDs are next to each other, kernel can get the next
ATTR_RECORD from end address of current ATTR_RECORD, through current
ATTR_RECORD length field.
The problem is that during iteration, when kernel calculates the end
address of current ATTR_RECORD, kernel may trigger an integer overflow bug
in executing `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`. This
may wrap, leading to a forever iteration on 32bit systems.
This patch solves it by adding some checks on calculating end address
of current ATTR_RECORD during iteration.
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). To
ensure access on these ATTR_RECORDs are within bounds, kernel will do some
checking during iteration.
The problem is that during checking whether ATTR_RECORD's name is within
bounds, kernel will dereferences the ATTR_RECORD name_offset field, before
checking this ATTR_RECORD strcture is within bounds. This problem may
result out-of-bounds read in ntfs_attr_find(), reported by Syzkaller:
==================================================================
BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
This patch solves it by moving the ATTR_RECORD strcture's bounds checking
earlier, then checking whether ATTR_RECORD's name is within bounds.
What's more, this patch also add some comments to improve its
maintainability.
Patch series "ntfs: fix bugs about Attribute", v2.
This patchset fixes three bugs relative to Attribute in record:
Patch 1 adds a sanity check to ensure that, attrs_offset field in first
mft record loading from disk is within bounds.
Patch 2 moves the ATTR_RECORD's bounds checking earlier, to avoid
dereferencing ATTR_RECORD before checking this ATTR_RECORD is within
bounds.
Patch 3 adds an overflow checking to avoid possible forever loop in
ntfs_attr_find().
Without patch 1 and patch 2, the kernel triggersa KASAN use-after-free
detection as reported by Syzkaller.
Although one of patch 1 or patch 2 can fix this, we still need both of
them. Because patch 1 fixes the root cause, and patch 2 not only fixes
the direct cause, but also fixes the potential out-of-bounds bug.
This patch (of 3):
Syzkaller reported use-after-free read as follows:
==================================================================
BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
Kernel will loads $MFT/$DATA's first mft record in
ntfs_read_inode_mount().
Yet the problem is that after loading, kernel doesn't check whether
attrs_offset field is a valid value.
To be more specific, if attrs_offset field is larger than bytes_allocated
field, then it may trigger the out-of-bounds read bug(reported as
use-after-free bug) in ntfs_attr_find(), when kernel tries to access the
corresponding mft record's attribute.
This patch solves it by adding the sanity check between attrs_offset field
and bytes_allocated field, after loading the first mft record.
Functions implementing the a_ops->write_end() interface accept the `void
*fsdata` parameter that is supposed to be initialized by the corresponding
a_ops->write_begin() (which accepts `void **fsdata`).
However not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to a_ops->write_end(),
resulting in undefined behavior.
Fix this by initializing fsdata with NULL before the call to
write_begin(), rather than doing so in all possible a_ops implementations.
This patch covers only the following cases found by running x86 KMSAN
under syzkaller:
- generic_perform_write()
- cont_expand_zero() and generic_cont_expand_simple()
- page_symlink()
Other cases of passing uninitialized fsdata may persist in the codebase.
Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.
Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open()
does not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when
the file descriptor refers to a pipe. In other words, pipe file descriptor
needs to be handled as if socket file descriptor.
We somehow need to interrupt kernel_read()/kernel_write() on pipes.
A minimal change, which this patch is doing, is to set O_NONBLOCK flag
from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing
of regular files. But this approach changes O_NONBLOCK flag on userspace-
supplied file descriptors (which might break userspace programs), and
O_NONBLOCK flag could be changed by userspace. It would be possible to set
O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still
remains small race window for clearing O_NONBLOCK flag.
If we don't want to manipulate O_NONBLOCK flag, we might be able to
surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are
processed by kernel threads which process global system_wq workqueue,
signals could not be delivered from remote threads when p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling
set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be
needed if we count on signals for making kernel_read()/kernel_write()
non-blocking.
Switch from strlcpy to strscpy and make sure that @count is the size of
the smaller of the source and destination buffers. This prevents
reading beyond the end of the source buffer when the source string isn't
null terminated.
Found by a modified version of syzkaller.
Suggested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fuzzers like to scribble over sb_bsize_shift but in reality it's very
unlikely that this field would be corrupted on its own. Nevertheless it
should be checked to avoid the possibility of messy mount errors due to
bad calculations. It's always a fixed value based on the block size so
we can just check that it's the expected value.
Tested with:
mkfs.gfs2 -O -p lock_nolock /dev/vdb
for i in 0 -1 64 65 32 33; do
gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb
mount /dev/vdb /mnt/test && umount /mnt/test
done
and with UBSAN configured we also get complaints like
[ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19
[ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
After the patch, these complaints don't appear, mount fails immediately
and we get an explanation in dmesg.
Reported-by: syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sk->sk_receive_queue is protected by skb queue lock, but for KCM
sockets its RX path takes mux->rx_lock to protect more than just
skb queue. However, kcm_recvmsg() still only grabs the skb queue
lock, so race conditions still exist.
We can teach kcm_recvmsg() to grab mux->rx_lock too but this would
introduce a potential performance regression as struct kcm_mux can
be shared by multiple KCM sockets.
So we have to enforce skb queue lock in requeue_rx_msgs() and handle
skb peek case carefully in kcm_wait_data(). Fortunately,
skb_recv_datagram() already handles it nicely and is widely used by
other sockets, we can just switch to skb_recv_datagram() after
getting rid of the unnecessary sock lock in kcm_recvmsg() and
kcm_splice_read(). Side note: SOCK_DONE is not used by KCM sockets,
so it is safe to get rid of this check too.
I ran the original syzbot reproducer for 30 min without seeing any
issue.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+278279efdd2730dd14bf@syzkaller.appspotmail.com Reported-by: shaozhengchao <shaozhengchao@huawei.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20221114005119.597905-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot found that kcm_tx_work() could crash [1] in:
/* Primarily for SOCK_SEQPACKET sockets */
if (likely(sk->sk_socket) &&
test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
<<*>> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_space(sk);
}
I think the reason is that another thread might concurrently
run in kcm_release() and call sock_orphan(sk) while sk is not
locked. kcm_tx_work() find sk->sk_socket being NULL.
[1]
BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53
Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]
Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.
[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567