Just like the Fujitsu CELSIUS H730, the H760 also has an Elantech
touchpad with the same quirks. Without this patch, the touchpad is
useless out of the box, as the mouse pointer won't move.
This patch makes the driver aware of both the crc_enabled=1 requirement and
the middle button, making the touchpad fully functional out-of-the-box.
On a suspend/resume cycle, a selftest is executed to reset the i8042
controller. But when this is done on Asus devices, subsequent calls to
the elantech driver's detect/init functions fail. Skipping the selftest
fixes this problem.
An easier way to reproduce this problem is to add i8042.reset=1 as a
kernel parameter. On Asus laptops, it makes the system start with the
touchpad already stuck, since psmouse_probe() forcibly calls the
selftest function.
This patch was inspired by John Hiesey's change[1], but since this
problem affects many Asus models, let's avoid running the selftest on
all of them (see the sketch after the model list below).
All models affected by this problem:
A455LD
K401LB
K501LB
K501LX
R409L
V502LX
X302LA
X450LCP
X450LD
X455LAB
X455LDB
X455LF
Z450LA
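A minimal sketch of the DMI-table approach this implies; the table name
and match strings here are illustrative, not taken from the actual
patch, which lists the models above:

  static const struct dmi_system_id i8042_dmi_noselftest_table[] = {
  	{
  		/* Illustrative match; the real table enumerates the models above. */
  		.matches = {
  			DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
  		},
  	},
  	{ }
  };

  static bool i8042_skip_selftest(void)
  {
  	return dmi_check_system(i8042_dmi_noselftest_table) != 0;
  }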
Currently regs_return_value() always negates regs[2] if it determines
that the syscall has failed, but when called in kernel context this
check is invalid and may result in returning a wrong value.
This fixes errors reported by CONFIG_KPROBES_SANITY_TEST.
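The fix amounts to skipping the negation when the saved registers are
not from user context; a sketch using the usual MIPS ptrace helpers:

  static inline long regs_return_value(struct pt_regs *regs)
  {
  	/* The syscall error convention only applies to user context. */
  	if (is_syscall_success(regs) || !user_mode(regs))
  		return regs->regs[2];
  	else
  		return -regs->regs[2];
  }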
Fixes: d7e7528bcd45 ("Audit: push audit success and retcode into arch ptrace.h") Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14381/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The native ABI vDSO linker script vdso.lds is built by preprocessing
vdso.lds.S, with the native -mabi flag passed in to get the correct ABI
definitions. Unfortunately, certain toolchains choke on -mabi=64
without a corresponding compatible -march flag, for example:
cc1: error: ‘-march=mips32r2’ is not compatible with the selected ABI
scripts/Makefile.build:338: recipe for target 'arch/mips/vdso/vdso.lds' failed
Fix this by including ccflags-vdso in the KBUILD_CPPFLAGS for vdso.lds,
which includes the appropriate -march flag.
Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-by: Maciej W. Rozycki <macro@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14368/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On a Dell laptop, there is no global ADC for all input devices, so the
input devices use different ADCs; as a result, dyn_adc_switch is set to
true.
In this situation, it is safe to control the micmute LED according to
the user's choice of muting/unmuting the current input device, since
only the current input device path is active, while the other input
device paths are inactive and powered down.
Fixes: 00ef99408b6c ('ALSA: hda - add mic mute led hook for dell machines') Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver should not ignore errors while registering the I2C buses,
as this device can't even minimally work without them: it uses those
buses internally to talk to the several IP blocks inside the chip.
The cx231xx_set_agc_analog_digital_mux_select() callers expect it to
return 0 or an error. Returning a positive value makes the first
attempt to switch between analog/digital fail.
With the current settings, only one channel locks properly. That's
likely because, when this driver was written, Brazil was still using
experimental transmissions.
Change the settings to reproduce those used by the newer drivers. That
makes it lock on the other channels.
Tested with both PixelView SBTVD Hybrid (cx231xx-based) and
C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices.
On this frontend, it takes a while to start outputting normal TS data.
That only happens in state S9; in S8, the TS output is enabled, but it
is not reliable enough. However, the zigzag loop is too fast to let it
sync.
As, in practical tests, the zigzag software loop doesn't seem to help
but just slows down the tuning, let's switch to the hardware algorithm,
as the tuners used on such devices are capable of handling frequency
drifts without any help from software.
Be defensive about what underlying fs provides us in the returned xattr
list buffer. strlen() may overrun the buffer, so use strnlen() and WARN if
the contents are not properly null terminated.
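A minimal sketch of the defensive iteration over a listxattr-style
buffer; the consumer function is hypothetical, and the -EIO policy
follows the description above:

  ssize_t walk_xattr_list(const char *list, size_t list_size)
  {
  	const char *s = list;

  	while (s < list + list_size) {
  		size_t remaining = list_size - (s - list);
  		size_t slen = strnlen(s, remaining);

  		/* Each name must be NUL-terminated inside the buffer. */
  		if (WARN_ON(slen == remaining))
  			return -EIO;
  		handle_xattr_name(s, slen);	/* hypothetical consumer */
  		s += slen + 1;
  	}
  	return 0;
  }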
The function uses the memory address of a struct dentry as a unique
id. While the address-based directory entry is only visible to root, it
is IMHO still worth fixing, since the temporary name does not have to
be a kernel address; it can be any unique number. Replace it with an
atomic integer that is allowed to wrap around.
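A sketch of the replacement, with an illustrative helper name:

  static atomic_t temp_id = ATOMIC_INIT(0);

  static void make_temp_name(char *name, size_t len)
  {
  	/* Unique enough for a temporary name, wraps around, and leaks
  	 * no kernel address into the directory listing. */
  	snprintf(name, len, "#%x", (unsigned int)atomic_inc_return(&temp_id));
  }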
When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered
in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green
(#50bc50) and neighboring pixels have slightly different values
(such as #50bc78).
The reason is that fbcon loads its 16 color palette through
efifb_setcolreg(), which in turn calculates a 32-bit value to write
into memory for each palette index.
Until now, this code could only handle 8-bit visuals and didn't mask
overlapping values when ORing them.
With this patch, fbcon displays the correct colors when a qemu VM is
booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16").
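A sketch of per-channel packing that works for any visual, based on the
standard fbdev pseudo-palette pattern rather than the verbatim patch:

  static u32 pack_pixel(struct fb_info *info, u16 red, u16 green, u16 blue)
  {
  	/* Scale each 16-bit channel down to its bitfield length before
  	 * shifting it into position, so the channels can no longer
  	 * overlap when ORed together. */
  	return ((u32)(red   >> (16 - info->var.red.length))   << info->var.red.offset) |
  	       ((u32)(green >> (16 - info->var.green.length)) << info->var.green.offset) |
  	       ((u32)(blue  >> (16 - info->var.blue.length))  << info->var.blue.offset);
  }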
Fixes: 7c83172b98e5 ("x86_64 EFI boot support: EFI frame buffer driver") # v2.6.24+ Signed-off-by: Max Staudt <mstaudt@suse.de> Acked-By: Peter Jones <pjones@redhat.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We accidentally overwrite the original saved value of "flags" so that we
can't re-enable IRQs at the end of the function. Presumably this
function is mostly called with IRQs disabled or it would be obvious in
testing.
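The bug pattern, in reduced form (lock names are illustrative):

  unsigned long flags;

  spin_lock_irqsave(&outer_lock, flags);
  /* ... */
  spin_lock_irqsave(&inner_lock, flags);	/* BUG: clobbers the saved flags */
  spin_unlock_irqrestore(&inner_lock, flags);
  /* ... */
  spin_unlock_irqrestore(&outer_lock, flags);	/* restores "IRQs off" */

The fix is simply a second flags variable for the inner critical
section.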
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was lost with commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
but is necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.
For the large GPN_FT response (4 pages), save space by not dumping
any empty residual entries.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
started to add FC_CT_HDR_LEN which made zfcp dump random data
out of bounds for RSPN GS responses because u.rspn.rsp
is the largest and last field in the union of struct zfcp_fc_req.
Other request/response types only happened to stay within bounds
due to the padding of the union or
due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD.
An example record, decoded with zfcpdbf from s390-tools:
Timestamp : ...
Area : SAN
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : ...
Record id : 2
Tag : fsscth2
Request id : 0x...
Destination ID : 0x00fffffc
Payload short : 01000000fc0200008002000000000000
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <=== 00000000000000000000000000000000
Payload length : 32 <===
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
we lost the N_Port-ID where an ELS response comes from.
With commit 7c7dc196814b9e1d5cc254dc579a5fa78ae524f7
("[SCSI] zfcp: Simplify handling of ct and els requests")
we lost the N_Port-ID where a CT response comes from.
It's especially useful if the request SAN trace record
with D_ID was already lost due to trace buffer wrap.
GS uses an open WKA port handle and ELS just a D_ID, and
only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req.
To cover both cases, add a new field to zfcp_fsf_ct_els
and fill it in on request to use in SAN response trace.
Strictly speaking the D_ID on SAN response is the FC frame's S_ID.
We don't need a field for the other end which is always us.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Fixes: 7c7dc196814b ("[SCSI] zfcp: Simplify handling of ct and els requests") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This information was lost with
commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
but is required to debug e.g. invalid handle situations.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
HBA records no longer contain WWPN, D_ID, or LUN
to reduce duplicate information which is already in REC records.
In contrast to "regular" target ports, we don't use recovery to open
WKA ports such as directory/nameserver, so we don't get REC records.
Therefore, introduce pseudo REC running records without any
actual recovery action but including D_ID of WKA port on open/close.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While retaining the actual filtering according to trace level,
the following commits started to write such filtered records
with a hardcoded record level of 1 instead of the actual record level:
commit 250a1352b95e1db3216e5c5d4f4365bea5122f4a
("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Now we can distinguish written records again for offline level filtering.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On a successful end of reopen port forced,
zfcp_erp_strategy_followup_success() re-uses the port erp_action
and the subsequent zfcp_erp_action_cleanup() now
sees ZFCP_ERP_SUCCEEDED with
erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT
instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
but must not perform zfcp_scsi_schedule_rport_register().
We can detect this because the fresh port reopen erp_action
is in its very first step ZFCP_ERP_STEP_UNINITIALIZED.
Otherwise this opens a time window with unblocked rport
(until the followup port reopen recovery would block it again).
If a scsi_cmnd timeout occurs during this time window
fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents
a clean and timely path failover.
This should not happen if the path issue can be recovered
on FC transport layer such as path issues involving RSCNs.
Also, unnecessary and repeated DID_IMM_RETRY for pending and
undesired new requests occur because internally zfcp still
has its zfcp_port blocked.
As follow-on errors with scsi_eh, it can cause,
in the worst case, permanently lost paths due to one of:
sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk!
sd <scsidev>: Device offlined - not ready after error recovery
For fix validation and to aid future debugging with other recoveries
we now also trace (un)blocking of rports.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the hardware data router case, introduced with kernel 3.2
commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router")
the ELS/GS request&response length needs to be initialized
as in the chained SBAL case.
Otherwise, the FCP channel rejects ELS requests with
FSF_REQUEST_SIZE_TOO_LARGE.
Such ELS requests can be issued by user space through BSG / HBA API,
or zfcp itself uses ADISC ELS for remote port link test on RSCN.
The latter can cause a short path outage due to
unnecessary remote target port recovery because the always
failing ADISC cannot detect extremely short path interruptions
beyond the local FCP channel.
The example below is decoded with zfcpdbf from s390-tools:
Timestamp : ...
Area : SAN
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : zfcp_dbf_san_req+0408
Record id : 1
Tag : fssels1
Request id : 0x<reqid>
Destination ID : 0x00<target d_id>
Payload info : 5200000000000000 <our wwpn > [ADISC]
<our wwnn > 00<s_id> 00000000 00000000000000000000000000000000
Timestamp : ...
Area : HBA
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : zfcp_dbf_hba_fsf_res+0740
Record id : 1
Tag : fs_ferr
Request id : 0x<reqid>
Request status : 0x00000010
FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS]
FSF sequence no: 0x...
FSF issued : ...
FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE]
FSF stat qual : 00000000000000000000000000000000
Prot stat : 0x00000100
Prot stat qual : 00000000000000000000000000000000
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For an NPIV-enabled FCP device, zfcp can erroneously show
"NPort (fabric via point-to-point)" instead of "NPIV VPORT"
for the port_type sysfs attribute of the corresponding
fc_host.
s390-tools that can be affected are dbginfo.sh and ziomon.
zfcp_fsf_exchange_config_evaluate() ignores
fsf_qtcb_bottom_config.connection_features indicating NPIV
and only sets fc_host_port_type to FC_PORTTYPE_NPORT if
fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC.
Only the independent zfcp_fsf_exchange_port_evaluate()
evaluates connection_features to overwrite fc_host_port_type
to FC_PORTTYPE_NPIV in case of NPIV.
Code was introduced with upstream kernel 2.6.30
commit 0282985da5923fa6365adcc1a1586ae0c13c1617
("[SCSI] zfcp: Report fc_host_port_type as NPIV").
This works during FCP device recovery (such as set online)
because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by
FSF_QTCB_EXCHANGE_PORT_DATA in sequence.
However, the zfcp-specific scsi host sysfs attributes
"requests", "megabytes", or "seconds_active" trigger only
zfcp_fsf_exchange_config_evaluate() resetting fc_host
port_type to FC_PORTTYPE_NPORT despite NPIV.
The zfcp-specific scsi host sysfs attribute "utilization"
triggers only zfcp_fsf_exchange_port_evaluate() correcting
the fc_host port_type again in case of NPIV.
Evaluate fsf_qtcb_bottom_config.connection_features
in zfcp_fsf_exchange_config_evaluate() where it belongs.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 0282985da592 ("[SCSI] zfcp: Report fc_host_port_type as NPIV") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When Fastmap is used, we can face an -EBADMSG here, since Fastmap
cannot know about unmaps.
If the erasure was interrupted, the PEB may show ECC errors and UBI
would go to ro-mode, as it assumes that the PEB was checked during
attach time, which is not the case with Fastmap.
Function ib_create_qp() was failing to return an error when
rdma_rw_init_mrs() fails, causing a crash further down in
ib_create_qp() when trying to dereference the qp pointer, which was
actually a negative errno.
Avoid that mapping an sg-list in which the first element has a
non-zero offset triggers an infinite loop when using FMR. This
patch makes the FMR mapping code similar to that of ib_sg_to_pages().
Note: older Mellanox HCAs do not support non-zero offsets for FMR.
See also commit 8c4037b501ac ("IB/srp: always avoid non-zero offsets
into an FMR").
Reported-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 2b4e3ad8f579 ("powerpc/mm/hash64: Don't test for machine type
to detect HEA special case") we changed the logic in might_have_hea()
to check FW_FEATURE_SPLPAR rather than machine_is(pseries).
However the check was incorrectly negated, leading to crashes on
machines with HEA adapters, such as:
mm: Hashing failure ! EA=0xd000080080004040 access=0x800000000000000e current=NetworkManager
trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003cc033e701ae
Unable to handle kernel paging request for data at address 0xd000080080004040
Call Trace:
.ehea_create_cq+0x148/0x340 [ehea] (unreliable)
.ehea_up+0x258/0x1200 [ehea]
.ehea_open+0x44/0x1a0 [ehea]
...
Fix it by removing the negation.
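In other words (a sketch, not the verbatim diff):

  static bool might_have_hea(void)
  {
  	/* HEA only exists on pseries, i.e. under SPLPAR firmware. */
  	return firmware_has_feature(FW_FEATURE_SPLPAR);
  	/* was: return !firmware_has_feature(FW_FEATURE_SPLPAR); */
  }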
Fixes: 2b4e3ad8f579 ("powerpc/mm/hash64: Don't test for machine type to detect HEA special case") Reported-by: Denis Kirjanov <kda@linux-powerpc.org> Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit fixes a stack corruption in the pseries-specific code
dealing with huge pages.
In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:
Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.
This bug has been pending since v3.11 but was hidden if a caller of
the caller of __pSeries_lpar_hugepage_invalidate() had pushed the
corrupted register (r18 in this case) on the stack and did not use it
until restoring it. GCC 6.2.0 seems to trigger it more frequently.
This commit also changes the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).
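A sketch of the sizing change, with the surrounding hcall code elided:

  /* The buffer passed to plpar_hcall9() must be able to hold nine
   * return values; size it with the global define instead of a
   * hand-counted (and here, too small) array. */
  unsigned long param[PLPAR_HCALL9_BUFSIZE];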
Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should. Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy. In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.
It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16. Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3. To fix it, this moves the label 179 to
the point where we add 16 to R3. I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).
This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For hugetlb to work with 4K page size, we need MAX_ORDER to be 13 or
more. When switching from a 64K page size to 4K linux page size using
make oldconfig, we end up with a CONFIG_FORCE_MAX_ZONEORDER value of 9.
This results in a 16M hugepage being considered a gigantic huge page,
which in turn results in failure to set up hugepages if gigantic
hugepage support is not enabled.
This also results in kernel crash with 4K radix configuration. We
hit the below BUG_ON on radix:
The hub diag-data type is filled with big-endian data by the OPAL call
opal_pci_get_hub_diag_data(). We need to convert it to a CPU-endian
value before using it. The issue was reported by sparse, as pointed out
by Michael Ellerman:
eeh-powernv.c:1309:21: warning: restricted __be16 degrades to integer
This converts hub diag-data type to CPU-endian before using it in
pnv_eeh_get_and_dump_hub_diag().
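A sketch of the conversion at the point of use; the struct field and
constant names follow the OPAL headers to the best of my knowledge:

  switch (be16_to_cpu(data->type)) {	/* was: switch (data->type) */
  case OPAL_P7IOC_DIAG_TYPE_RGC:
  	/* ... dump RGC-specific diag data ... */
  	break;
  /* ... other types ... */
  }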
Fixes: 2a485ad7c88d ("powerpc/powernv: Drop PHB operation next_error()") Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.
Fix this by checking NULL everywhere eeh_pe_bus_get() is called.
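The check added at each call site follows the usual pattern; a sketch,
with the error handling depending on the call site:

  struct pci_bus *bus = eeh_pe_bus_get(pe);

  if (!bus) {
  	pr_err("%s: Cannot find PCI bus for PHB#%d-PE#%x\n",
  	       __func__, pe->phb->global_number, pe->addr);
  	return;	/* or an error code, depending on the caller */
  }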
Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event") Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PE number (@frozen_pe_no), filled by opal_pci_next_error(), is in
big-endian format. It should be converted to CPU-endian before it is
passed to opal_pci_eeh_freeze_clear() when clearing the frozen state if
the PE is an invalid one. As Michael Ellerman pointed out, the issue is
also detected by sparse:
also detected by sparse:
eeh-powernv.c:1541:41: warning: incorrect type in argument 2 (different base types)
This passes a CPU-endian PE number to opal_pci_eeh_freeze_clear(); it
should have been part of commit 0f36db77643b ("powerpc/eeh: Fix wrong
printed PE number"), which was merged in the 4.3 kernel.
Fixes: 71b540adffd9 ("powerpc/powernv: Don't escalate non-existing frozen PE") Suggested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed-in pointer is non-zero. cmpli maps to a 32-bit
compare on binutils, so we ignore the top 32 bits.
A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:
The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.
I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.
Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 31cdd0c39c75 ("powerpc/xmon: Fix SPR read/write commands and
add command to dump SPRs") I added two uses of the "ld" instruction in
spr_access.S. "ld" is a 64-bit instruction, so shouldn't be used on
32-bit CPUs.
Replace it with PPC_LL which is a macro that gives us either "ld" or
"lwz" depending on whether we're 64 or 32-bit.
Fixes: 31cdd0c39c75 ("powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs") Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As the documentation for kthread_stop() says, "if threadfn() may call
do_exit() itself, the caller must ensure task_struct can't go away".
dm-crypt does not ensure this and therefore crashes when crypt_dtr()
calls kthread_stop(). The crash is trivially reproducible by adding a
delay before the call to kthread_stop() and just opening and closing a
dm-crypt device.
This problem was introduced by bcbd94ff481e ("dm crypt: fix a possible
hang due to race condition on exit").
Looking at the description of that patch (excerpted below), it seems
like the problem it addresses can be solved by just using
set_current_state instead of __set_current_state, since we obviously
need the memory barrier.
| dm crypt: fix a possible hang due to race condition on exit
|
| A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
| __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
| It is possible that the processor reorders memory accesses so that
| kthread_should_stop() is executed before __set_current_state(). If
| such reordering happens, there is a possible race on thread
| termination: [...]
So this patch just reverts the aforementioned patch and changes the
__set_current_state(TASK_INTERRUPTIBLE) to set_current_state(...). This
fixes the crash and should also fix the potential hang.
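A minimal sketch of the resulting loop shape; the work-related names
are illustrative, not dm-crypt's:

  static int writer_thread(void *data)
  {
  	for (;;) {
  		/*
  		 * set_current_state() implies a memory barrier, so the
  		 * checks below cannot be reordered before the state
  		 * change -- the race the original patch worried about.
  		 */
  		set_current_state(TASK_INTERRUPTIBLE);

  		if (kthread_should_stop()) {
  			__set_current_state(TASK_RUNNING);
  			break;
  		}
  		if (have_work(data)) {		/* hypothetical */
  			__set_current_state(TASK_RUNNING);
  			do_work(data);		/* hypothetical */
  			continue;
  		}
  		schedule();
  	}
  	/* Returning (rather than calling do_exit()) lets kthread_stop()
  	 * wait on a task_struct that is guaranteed to still exist. */
  	return 0;
  }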
Fixes: bcbd94ff481e ("dm crypt: fix a possible hang due to race condition on exit") Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If pg_init_retries is set and a request is queued against a multipath
device with all underlying block device request_queues in the "dying"
state then an infinite loop is triggered because activate_path() never
succeeds and hence never calls pg_init_done().
This change avoids that device removal triggers an infinite loop by
failing the activate_path() which causes the "dying" path to be failed.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Every call of queue_flag_clear_unlocked() after block device
initialization has finished is wrong if blk_cleanup_queue() can be
called concurrently. Convert queue_flag_clear_unlocked() into
queue_flag_clear() and protect it by the block layer queue lock.
Also, factor out dm_mq_start_queue().
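A sketch of the factored-out helper, close to what the description
implies:

  static void dm_mq_start_queue(struct request_queue *q)
  {
  	unsigned long flags;

  	spin_lock_irqsave(q->queue_lock, flags);
  	queue_flag_clear(QUEUE_FLAG_STOPPED, q);	/* now under the queue lock */
  	spin_unlock_irqrestore(q->queue_lock, flags);

  	blk_mq_start_stopped_hw_queues(q, true);
  	blk_mq_kick_requeue_list(q);
  }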
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MTC packet provides an 8-bit slice of CTC, which is related to TSC
by the TMA packet; however, the TMA packet only provides the lower 16
bits of CTC. If mtc_shift > 8, then some of the MTC bits are not in the
CTC provided by the TMA packet. Fix up the last_mtc calculated from the
TMA packet by copying the missing bits from the current MTC, assuming
the least difference between the two and that the current MTC comes
after last_mtc.
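A hypothetical helper illustrating the fix-up logic described above;
the decoder's real field names differ:

  /* ctc16: lower 16 bits of CTC from the TMA packet.
   * cur_mtc: current 8-bit MTC value, assumed to come after last_mtc. */
  static unsigned int fixup_last_mtc(unsigned int ctc16, unsigned int mtc_shift,
  				   unsigned int cur_mtc)
  {
  	unsigned int last_mtc = (ctc16 >> mtc_shift) & 0xff;

  	if (mtc_shift > 8) {
  		/* TMA only supplies the low (16 - mtc_shift) MTC bits. */
  		unsigned int mask = (1u << (16 - mtc_shift)) - 1;

  		/* Copy the missing high bits from the current MTC ... */
  		last_mtc = (cur_mtc & ~mask) | (last_mtc & mask);
  		/* ... choosing the least difference, last_mtc <= cur_mtc. */
  		if (last_mtc > cur_mtc)
  			last_mtc = (last_mtc - (mask + 1)) & 0xff;
  	}
  	return last_mtc;
  }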
In cycle-accurate mode, timestamps can be calculated from CYC packets.
The decoder also estimates timestamps based on the number of
instructions since the last timestamp. For that to work in
cycle-accurate mode, the instruction count needs to be reset to zero
when a timestamp is calculated from a CYC packet, but that wasn't
happening, so fix it.
Fix occasional decoder errors decoding trace data collected in snapshot
mode.
Snapshot mode can take successive snapshots of trace which might overlap.
The decoder checks whether there is an overlap but only looks at the
current and previous buffer. However buffers that do not contain
synchronization (i.e. PSB) packets cannot be decoded or used for overlap
checking. That means the decoder actually needs to check overlaps between
the current buffer and the previous buffer that contained usable data.
Make that change.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lkml.kernel.org/r/1474641528-18776-10-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ramoops buffer may be mapped as either I/O memory or uncached
memory. On ARM64, this results in a device-type (strongly-ordered)
mapping. Since unaligned accesses to device-type memory will
generate an alignment fault (regardless of whether or not strict
alignment checking is enabled), it is not safe to use memcpy().
memcpy_fromio() is guaranteed to only use aligned accesses, so use
that instead.
persistent_ram_update uses vmap / iomap based on whether the buffer is
in a memory region or a reserved region. However, both map it as
non-cacheable
memory. For armv8 specifically, non-cacheable mapping requests use a
memory type that has to be accessed aligned to the request size. memcpy()
doesn't guarantee that.
I have here an FPGA behind PCIe which exports SRAM that I use for
pstore. Now it seems that the FPGA no longer supports cmpxchg-based
updates; it writes back 0xff…ff and returns the same. This leads to a
crash during a crash, rendering pstore useless.
Since I doubt that there is much benefit from using cmpxchg() here, I
am dropping this atomic access and using the spinlock-based version.
Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Rabin Vincent <rabinv@axis.com> Tested-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
[kees: remove "_locked" suffix since it's the only option now] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 34f0ec82e0a9 ("pstore: Correct the max_dump_cnt clearing
of ramoops"), ->max_dump_cnt is set to zero before looping over ->przs,
but we didn't use it before that either.
And since commit ee1d267423a1 ("pstore: add pstore unregister") we free
that memory on rmmod.
But even then, we looped until a NULL pointer or an ERR pointer. I
don't see where it is ensured that the last member is NULL. Let's try
this instead: simple error recovery and free. In the error case, clean
up only the resources that were allocated, and in the free path rely on
->max_dump_cnt.
Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Architecturally we need to keep __gp below 0x1000000.
But because of ftrace and tracepoint support, the RO_DATA_SECTION now
gets much bigger than it was before. By moving the linkage tables
before RO_DATA_SECTION we can avoid __gp being positioned at too high
an address.
The config option HAVE_UNSTABLE_SCHED_CLOCK is set automatically when compiling
for SMP. There is no need to clear the stable-clock flag via
clear_sched_clock_stable() when starting secondary CPUs, and even worse,
clearing it triggers wrong self-detected CPU stall warnings on 64bit Mako
machines.
Increase the initial kernel default page mapping size for SMP kernels to 32MB
and add a runtime check which panics early if the kernel is bigger than the
initial mapping size.
This fixes boot crashes of 32-bit SMP kernels. Due to the introduction
of huge page support in kernel 4.4 and its required initial kernel
layout in memory, a 32-bit SMP kernel usually got bigger (in layout,
not size) than 16MB.
'best' is always less than or equal to 'pos', so `best - pos' returns
a negative value which is then cast to `unsigned int' and passed to
__cpufreq_driver_target()->acpi_cpufreq_target() for
policy->freq_table selection. This results in
There is a requirement that the MSR MSR_PM_ENABLE must be set to 0x01
before reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init()
is scheduled on a CPU which is not the same as policy->cpu, or migrates
to a different CPU before calling the MSR read for MSR_HWP_CAPABILITIES,
it is possible that MSR_PM_ENABLE was not yet set to 0x01 on that CPU.
This will cause a GP fault. So, like other places in this path,
rdmsrl_on_cpu() should be used instead of rdmsrl().
Moreover, the scope of MSR_HWP_CAPABILITIES is per thread, so it should
be read from the same CPU for which MSR_HWP_REQUEST is being set.
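A sketch of the corrected read; rdmsrl_on_cpu() pins the read to the
given CPU:

  u64 cap;

  /* Read on policy->cpu, where MSR_PM_ENABLE has already been set and
   * whose per-thread MSR_HWP_REQUEST we are about to program. */
  rdmsrl_on_cpu(policy->cpu, MSR_HWP_CAPABILITIES, &cap);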
Commit d352cf47d93e (cpufreq: conservative: Do not use transition
notifications) overlooked the case when the "frequency step" used
by the conservative governor is small relative to the distances
between the available frequencies and broke the algorithm by
using policy->cur instead of the previously requested frequency
when computing the next one.
As a result, the governor may not be able to go outside of a narrow
range between two consecutive available frequencies.
Fix the problem by making the governor save the previously requested
frequency and select the next one relative to that value (unless it is
out of range, in which case policy->cur will be used instead).
Fixes: d352cf47d93e (cpufreq: conservative: Do not use transition notifications) Link: https://bugzilla.kernel.org/show_bug.cgi?id=177171 Reported-and-tested-by: Aleksey Rybalkin <aleksey@rybalkin.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that cpufreq-dt-platdev is used to create the cpufreq-dt platform
device for all OMAP platforms, and the platform code that did this
before has been removed, add ti,am33xx and ti,dra7xx to the machine
list in cpufreq-dt-platdev, which had relied on the removed platform
code to do this previously.
Fixes: 7694ca6e1d6f (cpufreq: omap: Use generic platdev driver) Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_irq_get[_byname]() return 0 iff the irq_create_of_mapping() call
fails. Returning both an error code and 0 on failure is a sign of a
misdesigned API; it makes the failure check unnecessarily complex and
error prone. We should rely on the platform IRQ resource in this case,
not return 0, especially as 0 can be a valid IRQ resource too...
Fixes: aff008ad813c ("platform_get_irq: Revert to platform_get_resource if of_irq_get fails") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tegra_pcie_phy_disable() path called pads_writel() with arguments in
the wrong order. Swap them to be the "value, offset" order expected by
pads_writel().
Similar to the AR93xx and the AR94xx series, the AR95xx also has the
same quirk for the bus reset. It will lead to an instant system reset
if the device is assigned via VFIO to a KVM VM. I've been able to
reproduce this behavior with a MikroTik R11e-2HnD.
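The fix follows the existing quirk pattern from c3e59ee4e766; a sketch,
with the device ID left as a placeholder rather than the real one from
the patch:

  static void quirk_no_bus_reset(struct pci_dev *dev)
  {
  	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
  }

  /* 0xabcd is a placeholder; the real AR95xx device ID is in the patch. */
  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ATHEROS, 0xabcd, quirk_no_bus_reset);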
Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Signed-off-by: Maik Broemme <mbroemme@libmpq.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The potentially overflowing expression 1000000 * data->timeout_clks,
with type unsigned int, is evaluated using 32-bit arithmetic and then
used in a context that expects an expression of type unsigned long
long. To avoid overflow, cast 1000000U to type unsigned long long.
Special thanks to Coverity.
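The pattern, reduced to its essence:

  unsigned long long timeout_ns;

  timeout_ns = 1000000 * data->timeout_clks;	/* BUG: 32-bit multiply, widened too late */
  timeout_ns = 1000000ULL * data->timeout_clks;	/* OK: widened before multiplying */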
CMD23, aka SET_BLOCK_COUNT, was introduced with MMC v3.1. Older
versions of the specification allowed multi-block transfers to be
terminated only with CMD12.
The patch fixes the following problem:
mmc0: new MMC card at address 0001
mmcblk0: mmc0:0001 SDMB-16 15.3 MiB
mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900
...
blk_update_request: I/O error, dev mmcblk0, sector 0
Buffer I/O error on dev mmcblk0, logical block 0, async page read
mmcblk0: unable to read partition table
Some RTL8821AE devices sold in Great Britain have the country code of
0x25 encoded in their EEPROM. This value is not tested in the routine
that establishes the regulatory info for the chip. The fix is to set
this code to have the same capabilities as the EU countries. In addition,
the channels allowed for COUNTRY_CODE_ETSI were more properly suited
for China and Israel, not the EU. This problem has also been fixed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Firmware runs a watchdog timer tracking the copy engine ring index and
write index. Whenever both indices are stuck at the same location for a
given duration, the watchdog triggers to assert the target. While
updating the copy engine destination ring write index, the driver
ensures that the write index will not be the same as the read index by
finding the delta between these two indices (CE_RING_DELTA).
The HTT target-to-host copy engine (CE5) is a special case where ring
buffers are reused and the delta check is not applied while updating
the write index. In a rare scenario, whenever the CE5 ring is full,
both indices will refer to the same location, and this causes the CE
ring stuck issue explained above. This issue was originally reported on
IPQ4019 during long-hour stress testing and during Veriwave max-clients
test suites. The same issue is also observed on other chips. Fix this
by ensuring that the write index is one less than the read index, which
means that the full ring is available for receiving data.
Tested-by: Tamizh chelvam <c_traja@qti.qualcomm.com> Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two definitions of the devfreq_event_get_drvdata() function
in devfreq-event.h; when CONFIG_PM_DEVFREQ_EVENT is disabled, this
leads to a build failure. So remove the duplicate
devfreq_event_get_drvdata() function.
The current clock tree only implements the minimal set of differences
between the i.MX6Q and the i.MX6DL, but that doesn't really reflect
reality.
Apply the following fixes to match the RM:
- DL has no GPU3D_SHADER_SEL/PODF, the shader domain is clocked by
GPU3D_CORE
- GPU3D_SHADER_SEL/PODF has been repurposed as GPU2D_CORE_SEL/PODF
- GPU2D_CORE_SEL/PODF has been repurposed as MLB_SEL/PODF
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Initialize the GPU clock muxes to sane inputs. Until now they have
not been changed from their default values, which means that both
GPU3D shader and GPU2D core were fed by clock inputs whose rates
exceed the maximium allowed frequency of the cores by as much as
200MHz.
This fixes a severe GPU stability issue on i.MX6DL.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
http://www.ti.com/lit/pdf/SWCZ010:
DCDC o/p voltage can go higher than programmed value
Impact:
VDD1, VDD2, and VIO output programmed voltage levels can go higher
than expected or crash, when coming out of PFM to PWM mode or using
DVFS.
Description:
When DCDC CLK SYNC bits are 11/01:
* VIO 3-MHz oscillator is the source clock of the digital core and input
clock of VDD1 and VDD2
* Turn-on of the VDD1 and VDD2 HSD PFETs is synchronized or at a
constant phase shift
* Current pulled through VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
* The 3 HSD PFETs will be turned on at the same time, causing the highest
possible switching noise on the application. This noise level depends
on the layout, the VBAT level, and the load current. The noise level
increases with improper layout.
When DCDC CLK SYNC bits are 00:
* VIO 3-MHz oscillator is the source clock of digital core
* VDD1 and VDD2 are running on their own 3-MHz oscillator
* Current pulled through VCC1+VCC2 averages Iload(VDD1) + Iload(VDD2)
* The switching noise of the 3 SMPS will be randomly spread over time,
causing lower overall switching noise.
Workaround:
Set DCDCCTRL_REG[1:0]= 00.
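A sketch of the workaround in driver terms; the regmap handle, register,
and mask names follow the TPS65910 headers as I recall them, so treat
them as assumptions:

  /* Clear DCDCCTRL_REG[1:0] so VDD1/VDD2 run on their own oscillators. */
  regmap_update_bits(tps65910->regmap, TPS65910_DCDCCTRL,
  		   DCDCCTRL_DCDCCKSYNC_MASK, 0);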
Signed-off-by: Jan Remmet <j.remmet@phytec.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From the beginning of gpio-mpc8xxx.c, "handle_level_irq" has been used
to handle GPIO interrupts on the PowerPC/Layerscape platforms. But
actually, almost all PowerPC/Layerscape platforms assert an interrupt
request upon either a high-to-low change or any change on the state of
the signal.
So "handle_level_irq" is not reasonable for PowerPC/Layerscape GPIO
interrupts; it should be "handle_edge_irq". Otherwise the system may
lose some interrupts from the pin's state changes.
Signed-off-by: Liu Gang <Gang.Liu@nxp.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While debugging timeouts happening in my application workload (ScyllaDB), I have
observed calls to open() taking a long time, ranging everywhere from 2 seconds -
the first ones that are enough to time out my application - to more than 30
seconds.
The problem seems to happen because XFS may block on pending metadata updates
under certain circumstances, and that's confirmed with the following backtrace
taken by the offcputime tool (iovisor/bcc):
Inspecting my run with blktrace, I can see that the xfsaild kthread
exhibits very high "Dispatch wait" times, in the dozens-of-seconds
range and consistent with the open() times I saw in that run.
Still from the blktrace output, we can, after searching a bit,
identify the request that wasn't dispatched:
8,0 11 152 81.092472813 804 A WM 141698288 + 8 <- (8,1) 141696240
8,0 11 153 81.092472889 804 Q WM 141698288 + 8 [xfsaild/sda1]
8,0 11 154 81.092473207 804 G WM 141698288 + 8 [xfsaild/sda1]
8,0 11 206 81.092496118 804 I WM 141698288 + 8 ( 22911) [xfsaild/sda1]
<==== 'I' means Inserted (into the IO scheduler) ===================================>
8,0 0 289372 96.718761435 0 D WM 141698288 + 8 (15626265317) [swapper/0]
<==== Only 15s later the CFQ scheduler dispatches the request ======================>
As we can see above, in this particular example CFQ took 15 seconds to dispatch
this request. Going back to the full trace, we can see that the xfsaild queue
had plenty of opportunity to run, and it was selected as the active queue many
times. It would just always be preempted by something else (example):
8,0 1 0 81.117912979 0 m N cfq1618SN / insert_request
8,0 1 0 81.117913419 0 m N cfq1618SN / add_to_rr
8,0 1 0 81.117914044 0 m N cfq1618SN / preempt
8,0 1 0 81.117914398 0 m N cfq767A / slice expired t=1
8,0 1 0 81.117914755 0 m N cfq767A / resid=40
8,0 1 0 81.117915340 0 m N / served: vt=1948520448 min_vt=1948520448
8,0 1 0 81.117915858 0 m N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0
where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB
IO dispatchers.
The requests preempting the xfsaild queue are synchronous requests. That's a
characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests.
While it can be argued that preempting ASYNC requests in favor of SYNC is part
of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's
goal.
Moreover, unless I am misunderstanding something, that breaks the expectation
set by the "fifo_expire_async" tunable, which in my system is set to the
default.
Looking at the code, it seems to me that the issue is that after we
make an async queue active, there is no guarantee that it will execute
any request. When the queue itself checks cfq_may_dispatch(), it can
bail if it sees SYNC requests in flight. An incoming request from
another queue can also preempt it in such a situation before we have
the chance to execute anything (as seen in the trace above).
This patch sets the must_dispatch flag if we notice that we have
requests that are already fifo_expired. This flag is always cleared
after cfq_dispatch_request() returns from cfq_dispatch_requests(), so
it won't pin the queue for subsequent requests (unless they are
themselves expired).
Care is taken during preempt to still allow rt requests to preempt us
regardless.
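A sketch of where the flag gets set, using CFQ's existing helpers; the
details are approximate, not the verbatim patch:

  struct request *rq;

  if (!list_empty(&cfqq->fifo)) {
  	/* If the oldest queued request has already fifo-expired, pin
  	 * the queue so preemption cannot starve it. */
  	rq = rq_entry_fifo(cfqq->fifo.next);
  	if (ktime_get_ns() > rq->fifo_time)
  		cfq_mark_cfqq_must_dispatch(cfqq);
  }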
Testing my workload with this patch applied produces much better results.
From the application side I see no timeouts, and the open() latency histogram
generated by systemtap looks much better, with the worst outlier at 131ms:
Commit 209851649dc4 "acpi: nfit: Add support for hot-add" added
support for _FIT notifications, but it neglected to verify the
notification event code matches the one in the ACPI spec for
"NFIT Update". Currently there is only one code in the spec, but
once additional codes are added, older kernels (without this fix)
will misbehave by assuming all event notifications are for an
NFIT Update.
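A sketch of the added check; 0x80 is the "NFIT Update" notification
value to the best of my recollection, so treat the constant as an
assumption:

  #define NFIT_NOTIFY_UPDATE	0x80

  static void acpi_nfit_notify(struct acpi_device *adev, u32 event)
  {
  	if (event != NFIT_NOTIFY_UPDATE)
  		return;	/* unknown codes must not trigger a re-scan */

  	/* ... re-evaluate _FIT and merge the new tables ... */
  }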
Fixes: 209851649dc4 ("acpi: nfit: Add support for hot-add") Cc: <linux-acpi@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before commit a325725633c2 ("drm: Lobotomize set_busid nonsense for !pci
drivers"), several DRM drivers for platform devices used to expose an
explicit "drm_driver.set_busid" callback, invariably backed by
drm_platform_set_busid().
Commit a325725633c2 removed drm_platform_set_busid(), along with the
referring .set_busid field initializations. This was justified because
interchangeable functionality had been implemented in drm_dev_alloc() /
drm_dev_init(), which DRM_IOCTL_SET_VERSION would rely on going forward.
However, commit a325725633c2 also removed drm_virtio_set_busid(), for
which the same consolidation was not appropriate: this .set_busid callback
had been implemented with drm_pci_set_busid(), and not
drm_platform_set_busid(). The error regressed Xorg/xserver on QEMU's
"virtio-vga" card; the drmGetBusid() function from libdrm would no longer
return stable PCI identifiers like "pci:0000:00:02.0", but rather unstable
platform ones like "virtio0".
Reinstate drm_virtio_set_busid() with judicious use of
A NULL-pointer dereference happens in cachefiles_mark_object_inactive()
when it tries to read i_blocks so that it can tell the cachefilesd
daemon how much space it is making available.
The problem is that cachefiles_drop_object() calls
cachefiles_mark_object_inactive() after calling cachefiles_delete_object()
because the object being marked active staves off attempts to (re-)use the
file at that filename until after it has been deleted. This means that
d_inode is NULL by the time we come to try to access it.
To fix the problem, have the caller of cachefiles_mark_object_inactive()
supply the number of blocks freed up.
This fixes a bug where the permission was not properly checked in
overlayfs. The testcase is ltp/utimensat01.
It is also cleaner and safer to do the permission checking in the vfs
helper instead of the caller.
This patch introduces an additional ia_valid flag ATTR_TOUCH (since
touch(1) is the most obvious user of utimes(NULL)) that is passed into
notify_change whenever the conditions for this special permission checking
mode are met.
After backporting commit ee44b4bc054a ("dlm: use sctp 1-to-1 API")
series to a kernel with an older workqueue which didn't use RCU yet, it
was noticed that we are freeing the workqueues in dlm_lowcomms_stop()
too early as free_conn() will try to access that memory for canceling
the queued works if any.
This issue was introduced by commit 0d737a8cfd83; before it, no such
attempt to cancel the queued works was performed, so the issue was not
present.
This patch fixes it by simply inverting the free order.
This patch changes the p8_ghash driver to use ghash-generic as a fixed
fallback implementation. This allows the correct value of descsize to be
defined directly in its shash_alg structure and avoids problems with
incorrect buffer sizes when its state is exported or imported.
Reported-by: Jan Stancek <jstancek@redhat.com> Fixes: cc333cd68dfa ("crypto: vmx - Adding GHASH routines for VMX module") Signed-off-by: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When zeroing blocks for DAX allocations, we also have to unmap aliases
in the block device mappings. Otherwise writeback can overwrite zeros
with stale data from block device page cache.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently when doing a DAX hole punch with ext4 we fail to do a writeback.
This is because the logic around filemap_write_and_wait_range() in
ext4_punch_hole() only looks for dirty page cache pages in the radix tree,
not for dirty DAX exceptional entries.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pages have their buffers cleared after an ext4 delayed block
allocation failure; however, their pte_dirty flags are not cleaned.
If the pages are then unmapped, then according to pte_dirty,
unmap_page_range() may try to call __set_page_dirty(), which may lead
to the BUG_ON at mpage_prepare_extent_to_map():
head = page_buffers(page);.
This patch just calls clear_page_dirty_for_io() to clean pte_dirty in
mpage_release_unused_pages() for mmapped pages.
Steps to reproduce the bug:
(1) mmap a file in ext4
addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
fd, 0);
memset(addr, 'i', 4096);
Now, ext4_do_update_inode() clears the high 16-bit fields of the
uid/gid of a deleted and evicted inode to fix up interoperability with
old kernels. However, it checks only the i_dtime of an inode to
determine whether the inode was deleted and evicted, and this is very
risky, because i_dtime can also be used for the pointer maintaining the
orphan inode list. We need to further check whether the i_dtime is
being used for the orphan inode list, even if the i_dtime is not NULL.
We found that the high 16-bit fields of the uid/gid of an inode are
unintentionally and permanently cleared when inode truncation is just
triggered but not finished, the inode metadata, whose high uid/gid bits
are cleared, is written to disk, and a sudden power-off follows, in
that order.
Online defragging of encrypted files is not currently implemented.
However, the move extent ioctl can still return successfully when
called. For example, this occurs when xfstest ext4/020 is run on an
encrypted file system, resulting in a corrupted test file and a
corresponding test failure.
Until the proper functionality is implemented, fail the move extent
ioctl if either the original or donor file is encrypted.
Thomas has reported a lockdep splat hitting in
add_transaction_credits(). The problem is that that function calls
jbd2_might_wait_for_commit() while holding j_state_lock which is wrong
(we do not really wait for transaction commit while holding that lock).
Fix the problem by moving jbd2_might_wait_for_commit() into places where
we are ready to wait for transaction commit and thus j_state_lock is
unlocked.
Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The filesystem used in this case is ubifs, but it can be triggered on
many other filesystems.
When preadv64() is called with offset=0xffffffff000, a page with
index=0xffffffff will be added to the radix tree of ->mapping. Then
this page can be found in ->mapping with pagevec_lookup(). After that,
truncate_inode_pages_range(), which is called in ftruncate(), will fall
into an infinite loop:
- find a page with index=0xffffffff, since index>=end, this page won't
be truncated
- index++, and index becomes 0
- the page with index=0xffffffff will be found again
The data type of index is unsigned long, so index won't overflow to 0
on a 64-bit architecture in this case, and the dead loop won't happen
there.
Since truncate_inode_pages_range() is executed with holding lock of
inode->i_rwsem, any operation related with this lock will be blocked,
and a hung task will happen, e.g.:
INFO: task truncate_test:3364 blocked for more than 120 seconds.
...
call_rwsem_down_write_failed+0x17/0x30
generic_file_write_iter+0x32/0x1c0
ubifs_write_iter+0xcc/0x170
__vfs_write+0xc4/0x120
vfs_write+0xb2/0x1b0
SyS_write+0x46/0xa0
The page with index=0xffffffff added to ->mapping is useless. Fix this
by checking the read position before allocating pages.
Link: http://lkml.kernel.org/r/1475151010-40166-1-git-send-email-fangwei1@huawei.com Signed-off-by: Wei Fang <fangwei1@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch series "mm/hugetlb: memory offline issues with hugepages", v4.
This addresses several issues with hugepages and memory offline. While
the first patch fixes a panic, and is therefore rather important, the
last patch is just a performance optimization.
The second patch fixes a theoretical issue with reserved hugepages,
while still leaving some ugly usability issue, see description.
This patch (of 3):
dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a "gigantic"
hugetlb page with a size > memory block size.
When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().
To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page(). This means
that an unused pre-allocated gigantic page that has any part of itself
inside the memory block that is going offline will be dissolved
completely. Losing an unused gigantic hugepage is preferable to failing
the memory offline, for example in the situation where a (possibly
faulty) memory DIMM needs to go offline.
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-2-gerald.schaefer@de.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests without a resulting unblock until the init
complete happens, which may never occur, and we end up hanging I/O
requests. This patch ensures the host action stays set to
IBMVFC_HOST_ACTION_TGT_DEL, so we move all rports into devloss state
and unblock, unless we receive an init complete.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do the user_len check first and then the ver_addr allocation so that
we save the kfree() on the error path when user_len is >
ARCMSR_API_DATA_BUFLEN.
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Marco Grassi <marco.gra@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tomas Henzl <thenzl@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.
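The shape of the fix, with identifiers approximate and the error path
illustrative:

  if (user_len > ARCMSR_API_DATA_BUFLEN) {
  	retvalue = ARCMSR_MESSAGE_FAIL;
  	goto message_out;
  }
  memcpy(ver_addr, pcmdmessagefld->messagedatabuffer, user_len);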
Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Seth Forshee reports that in 4.8-rcN some automounts are failing
because the requesting the automount changed.
The relevant call path is:
follow_automount()
->d_automount
autofs4_d_automount
autofs4_mount_wait
autofs4_wait
In autofs4_wait(), wq_uid and wq_gid are set to current_uid() and
current_gid() respectively. With follow_automount() now overriding
creds, the uid that we export to userspace changes, and that breaks
existing setups.
To remove the regression set wq_uid and wq_gid from
current_real_cred()->uid and current_real_cred()->gid respectively.
This restores the current behavior as current->real_cred is identical
to current->cred except when override creds are used.
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds") Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we hold the superblock lock while calling reiserfs_quota_on_mount(),
we can deadlock our own worker: mount blocks kworker/3:2, which sleeps
forever.
Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>