can: flexcan: use correct clock as base for bit rate calculation
The flexcan IP core uses the peripheral clock ("per") as the base clock for the
bit timing calculation. However, the driver uses the wrong clock ("ipg").
This leads to wrong bit rates if the rates of the two clocks differ.
This patch fixes the problem by using the correct clock for the bit rate
calculation.
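A minimal sketch of what this means in the probe path (clock handle names
follow the usual "ipg"/"per" device tree convention; "priv" and the surrounding
code are simplified, not the exact driver source):

	struct clk *clk_ipg, *clk_per;

	clk_ipg = devm_clk_get(&pdev->dev, "ipg");	/* register/bus clock */
	clk_per = devm_clk_get(&pdev->dev, "per");	/* clock feeding the CAN core */
	if (IS_ERR(clk_ipg) || IS_ERR(clk_per))
		return -ENODEV;

	/* derive the bit timing from "per", not "ipg" */
	priv->can.clock.freq = clk_get_rate(clk_per);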
Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tim Harvey [Sun, 19 Oct 2014 15:07:21 +0000 (17:07 +0200)]
sky2: allow mac to come from dt
The driver reads the MAC address from the device registers, which must have
been programmed by the bootloader. This patch adds the ability to pull the
MAC address from the device tree via the aliases/sky2 node.
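Roughly, once the aliases/sky2 node has been looked up, the generic OF helper
can be used (a sketch; node lookup and error handling omitted, not necessarily
the exact sky2 implementation):

	const void *mac = of_get_mac_address(np);	/* np: node found via the alias */

	if (mac)
		memcpy(dev->dev_addr, mac, ETH_ALEN);
	else	/* keep the old behaviour: what the bootloader programmed */
		memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1, ETH_ALEN);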
Tim Harvey [Sun, 19 Oct 2014 15:06:35 +0000 (17:06 +0200)]
PCI: imx6: add support for legacy irqs
The i.MX6 supports legacy IRQs via interrupts 155, 154, 153 and 152. When devices
are behind a PCIe-to-PCIe switch (at least for the TI XIO2001), the
mapping is reversed compared to devices that are not behind a switch.
This patch still needs some review and clarification before going
upstream.
Tim Harvey [Thu, 17 Oct 2013 22:52:16 +0000 (15:52 -0700)]
PCI: imx6: fix imprecise abort handler
An imprecise abort is triggered when a port behind a switch is accessed
and no device is present. At enumeration time, imprecise aborts are not enabled,
so this ends up being deferred until the kernel has completed init. At
that point we must not adjust the PC: the handler must do nothing, but a handler
must exist.
This fixes random crashes that occur right after init memory is freed.
This is against linux-pci/host-imx6.
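For reference, a do-nothing hook of this kind looks roughly like the sketch
below (the fault-status number and strings are illustrative):

static int imx6q_pcie_abort_handler(unsigned long addr,
				    unsigned int fsr, struct pt_regs *regs)
{
	/* do nothing, in particular do not adjust the PC */
	return 0;
}

	/* at init time */
	hook_fault_code(16 + 6, imx6q_pcie_abort_handler, SIGBUS, 0,
			"imprecise external abort");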
Acked-by: Marek Vasut <marex@denx.de> Tested-by: Marek Vasut <marex@denx.de> Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Pratyush Anand [Sun, 19 Oct 2014 15:05:48 +0000 (17:05 +0200)]
pp->io_base, which is the input to the outbound I/O address translation
unit, should be the CPU address; it was wrongly programmed with the realio
address.
We should pass global_io_offset rather than sys->io_offset to
pci_ioremap_io, so we map the new window into the first available spot
in the Linux view of the I/O space.
We must also pass the CPU address instead of the realio address to
pci_ioremap_io.
This patch fixes the above issue. It has been tested with a Lecroy PTC in AIC
mode and a Pericom PI7C9X2G303EL PCIe switch, which do not work
otherwise.
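The corrected call then looks roughly like this (a sketch using only the names
mentioned above; surrounding setup omitted):

	/* map the new I/O window into the first free spot of the Linux I/O
	 * space, at its CPU (physical) address */
	pci_ioremap_io(global_io_offset, pp->io_base);	/* not sys->io_offset / realio */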
Signed-off-by: Pratyush Anand <pratyush.anand@st.com> Tested-by: Mohit Kumar <mohit.kumar@st.com> Tested-by: Tim Harvey <tharvey@gateworks.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marek Vasut <marex@denx.de> Cc: Richard Zhu <Hong-Xing.Zhu@freescale.com> Cc: linux-pci@vger.kernel.org Cc: spear-devel@list.st.com
Simplify the equation to calculate ramp_delay.
The equations below are equivalent:
ramp_delay = 25000 / (2 * ramp_delay);
ramp_delay = 50000 / (4 * ramp_delay);
ramp_delay = 12500 / ramp_delay;
So we don't need to read BIT6 of rdev->desc->vsel_reg to decide which
equation to apply.
Also, use rdev->desc->vsel_reg instead of calculating the register address at
run time.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-by: Robin Gong <b38343@freescale.com> Signed-off-by: Mark Brown <broonie@linaro.org>
Axel Lin [Sun, 19 Oct 2014 15:05:04 +0000 (17:05 +0200)]
regulator: pfuze100: Fix n_voltages setting for SW2~SW4 with high bit set
The current code adjusts min_uV and uV_step but misses adjusting the
n_voltages setting.
When BIT6 is clear:
n_voltages = (1975000 - 400000) / 25000 + 1 = 64
When BIT6 is set:
n_voltages = (3300000 - 800000) / 50000 + 1 = 51
The n_voltages setting needs updating because when BIT6 is set, 0x73 ~ 0x7f are reserved.
When using regulator_list_voltage_linear, the n_voltages setting does matter
because a wrong n_voltages setting makes the equation return a wrong result.
e.g. if the selector is 63, regulator_list_voltage_linear returns
800000 + (50000 * 63) = 3950000
It should return -EINVAL if the selector is in the range 51 ~ 63.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
Axel Lin [Sun, 19 Oct 2014 15:04:13 +0000 (17:04 +0200)]
regulator: pfuze100: REGULATOR_PFUZE100 needs to select REGMAP_I2C
This fixes the build errors below:
CC [M] drivers/regulator/pfuze100-regulator.o
drivers/regulator/pfuze100-regulator.c:342:21: error: variable 'pfuze_regmap_config' has initializer but incomplete type
drivers/regulator/pfuze100-regulator.c:343:2: error: unknown field 'reg_bits' specified in initializer
drivers/regulator/pfuze100-regulator.c:343:2: warning: excess elements in struct initializer [enabled by default]
drivers/regulator/pfuze100-regulator.c:343:2: warning: (near initialization for 'pfuze_regmap_config') [enabled by default]
drivers/regulator/pfuze100-regulator.c:344:2: error: unknown field 'val_bits' specified in initializer
drivers/regulator/pfuze100-regulator.c:344:2: warning: excess elements in struct initializer [enabled by default]
drivers/regulator/pfuze100-regulator.c:344:2: warning: (near initialization for 'pfuze_regmap_config') [enabled by default]
drivers/regulator/pfuze100-regulator.c:345:2: error: unknown field 'max_register' specified in initializer
drivers/regulator/pfuze100-regulator.c:345:2: warning: excess elements in struct initializer [enabled by default]
drivers/regulator/pfuze100-regulator.c:345:2: warning: (near initialization for 'pfuze_regmap_config') [enabled by default]
drivers/regulator/pfuze100-regulator.c:346:2: error: unknown field 'cache_type' specified in initializer
drivers/regulator/pfuze100-regulator.c:346:2: warning: excess elements in struct initializer [enabled by default]
drivers/regulator/pfuze100-regulator.c:346:2: warning: (near initialization for 'pfuze_regmap_config') [enabled by default]
drivers/regulator/pfuze100-regulator.c: In function 'pfuze100_regulator_probe':
drivers/regulator/pfuze100-regulator.c:370:2: error: implicit declaration of function 'devm_regmap_init_i2c' [-Werror=implicit-function-declaration]
drivers/regulator/pfuze100-regulator.c:370:21: warning: assignment makes pointer from integer without a cast [enabled by default]
cc1: some warnings being treated as errors
make[2]: *** [drivers/regulator/pfuze100-regulator.o] Error 1
make[1]: *** [drivers/regulator] Error 2
make: *** [drivers] Error 2
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Robin Gong <b38343@freescale.com> Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de> Signed-off-by: Mark Brown <broonie@linaro.org>
Shawn Guo [Mon, 24 Jun 2013 06:30:44 +0000 (14:30 +0800)]
thermal: add imx thermal driver support
This is based on the initial imx thermal work done by
Rob Lee <rob.lee@linaro.org> (Not sure if the email address is still
valid). Since he is no longer interested in the work and I have
rewritten a significant amount of the code, I just took the authorship
over from him.
It adds imx thermal support using the Temperature Monitor (TEMPMON)
block found on some Freescale i.MX SoCs. The driver uses the syscon regmap
interface to access the TEMPMON control registers and calibration data, and
supports cpufreq as the cooling device.
Andrew Murray [Sun, 19 Oct 2014 15:02:33 +0000 (17:02 +0200)]
of/pci: Provide support for parsing PCI DT ranges property
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture-specific
DT handling. This will make it easier to write architecture-independent host bridge
drivers and avoid further duplication of DT parsing code.
Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).
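A host bridge driver then uses the parser roughly as follows (a sketch; helper
names as introduced by this series, error handling trimmed):

	struct of_pci_range_parser parser;
	struct of_pci_range range;
	struct resource res;

	if (of_pci_range_parser_init(&parser, np))
		return -EINVAL;		/* no usable "ranges" property */

	for_each_of_pci_range(&parser, &range) {
		of_pci_range_to_resource(&range, np, &res);
		/* register 'res' (an I/O or memory window) with the PCI core */
	}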
Signed-off-by: Andrew Murray <Andrew.Murray@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reviewed-by: Rob Herring <rob.herring@calxeda.com> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Tejun Heo [Sun, 19 Oct 2014 15:02:21 +0000 (17:02 +0200)]
ahci_imx: depend on CONFIG_MFD_SYSCON
ahci_imx makes use of regmap but the dependency wasn't specified in
Kconfig, leading to build failures if CONFIG_AHCI_IMX is enabled but
CONFIG_MFD_SYSCON is not. Add the Kconfig dependency.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Richard Zhu [Sun, 19 Oct 2014 15:01:53 +0000 (17:01 +0200)]
ahci_imx: add ahci sata support on imx platforms
imx6q contains one Synopsys AHCI SATA controller, but it can't share the
ahci_platform driver with other controllers because of several deviations
from the generic AHCI controller: the bit definitions of
the HBA registers, the vendor-specific registers, the AHCI PHY clock
and the AHCI signal adjustment window (GPR13 register).
- CAP_SSS (bit 20) of HOST_CAP is writable; the default value is '0' but it
should be configured to '1'.
- bit 0 of HOST_PORTS_IMPL should be set to '1' (default '0'); there is only
one AHCI SATA port on imx6q.
- The vendor-specific register HOST_TIMER1MS (offset 0xe0) should be
configured according to the frequency of the AHB bus clock.
- The AHCI PHY clock and the signal parameters of GPR13 need to be
configured.
Set up a dedicated ahci sata driver that contains the imx6q-specific
initialization code, re-uses the generic ahci_platform driver, and keeps
the generic ahci_platform driver as clean as possible.
Lothar Waßmann [Sun, 19 Oct 2014 15:01:15 +0000 (17:01 +0200)]
usb: chipidea: improve kconfig 2.0
This patch provides a cleaner solution to the problem described in
commit 20a677fd ("usb: chipidea: improve kconfig").
The goal to be achieved is to force USB_CHIPIDEA=m if either
USB_EHCI_HCD=m or USB_GADGET=m.
If both are 'y' USB_CHIPIDEA may be selected to be 'm' or 'y'.
The old patch had the drawback that USB_CHIPIDEA could be chosen as
'y' even though USB_EHCI_HCD or USB_GADGET (or both) were 'm', leading to a
situation where USB_CHIPIDEA_HOST or USB_CHIPIDEA_UDC vanished from
the config options, producing a compilable but dysfunctional driver.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de> Reviewed-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Chen [Sun, 19 Oct 2014 15:00:55 +0000 (17:00 +0200)]
usb: chipidea: improve kconfig
Randy Dunlap <rdunlap@infradead.org> reported this problem
on i386:
> drivers/built-in.o: In function `ci_hdrc_host_init':
> (.text+0x2ce75c): undefined reference to `ehci_init_driver'
>
> When USB_EHCI_HCD=m and USB_CHIPIDEA=y.
In fact, this problem exists on all platforms that use the
chipidea driver. The root cause is that the chipidea host
uses a symbol exported from ehci-hcd, but the chipidea core
does not depend on USB_EHCI_HCD. So, the chipidea driver
will not be compiled as a module if USB_EHCI_HCD=m.
It is very hard to give a perfect solution, since the chipidea core
depends on USB || USB_GADGET, and the chipidea host depends on
both USB_EHCI_HCD and USB_CHIPIDEA; the same problem exists for the
gadget.
To fix this problem, we have to make the assumptions below:
- If USB_EHCI_HCD=y && USB_GADGET=y, USB_CHIPIDEA can be 'y'.
- If USB_EHCI_HCD=m && USB_GADGET=y, USB_CHIPIDEA=m
or USB_CHIPIDEA_HOST can't be seen if USB_CHIPIDEA=y.
It will cause a compile error because there is no glue layer for ehci:
> error: #error "missing bus glue for ehci-hcd"
So, we have to compile USB_CHIPIDEA=m if USB_EHCI_HCD=m; the
current ehci-hcd core guarantees it.
- If USB_EHCI_HCD=y && USB_GADGET=m, USB_CHIPIDEA=m
or USB_CHIPIDEA_UDC can't be seen if USB_CHIPIDEA=y.
Of course, the gadget will not work in this situation,
so the user has to compile USB_CHIPIDEA=m.
- USB_EHCI_HCD=m && USB_GADGET=m, we can't see
USB_CHIPIDEA_HOST and USB_CHIPIDEA_UDC unless
USB_CHIPIDEA=m.
The reasons for the above assumptions:
- If both the ehci core and the gadget core are built as modules,
chipidea has to be built as a module.
- If one of the ehci core and the gadget core is built in and the other
is built as a module, we can only enable the function that
is built in, or enable both roles as modules (USB_CHIPIDEA=m),
since the chipidea core driver takes care of both host and device roles.
Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Added support for the Ketra N1 wireless interface, which uses the
Silicon Labs CP2104 USB-to-UART bridge with the customized PID 8946.
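In cp210x terms this is a single new entry in the device ID table (a sketch;
0x10C4 is the Silicon Labs vendor ID, and the PID is assumed to be hexadecimal):

	{ USB_DEVICE(0x10C4, 0x8946) },	/* Ketra N1 Wireless Interface */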
Signed-off-by: Joe Savage <joe.savage@goketra.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This full-speed USB device generates spurious remote wakeup events
as soon as the USB_DEVICE_REMOTE_WAKEUP feature is set. As a result,
Linux can't enter system suspend or S0ix power saving modes once
this keyboard is used.
This patch introduces the USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk.
With this quirk set, the wakeup capability will be ignored during
device configuration.
This patch could be back-ported to kernels as old as 2.6.39.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the addrconf router is moved to the garbage list when the lo device
goes down, we should release this router and allocate
a new one for the ipv6 address when the lo device comes up.
This patch solves bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951
Changes from v1:
use ip6_rt_put to replace ip6_del_rt, thanks Hannes!
change code style, suggested by Sergei.
CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a problem observed when losing a FIN segment that does not
contain data. In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.
Signed-off-by: Per Hurtig <per.hurtig@kau.se> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently association restarts do not take into consideration the
state of the socket. When a restart happens, the current association
simply transitions into the established state. This creates a condition
where a remote system, through the restart procedure, may create a
local association that is in no way reachable by the user. The conditions
to trigger this are as follows:
1) The remote does not acknowledge some data, causing data to remain
outstanding.
2) Local application calls close() on the socket. Since data
is still outstanding, the association is placed in SHUTDOWN_PENDING
state. However, the socket is closed.
3) The remote tries to create a new association, triggering a restart
on the local system. The association moves from SHUTDOWN_PENDING
to ESTABLISHED. At this point, it is no longer reachable by
any socket on the local system.
This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk. This way, the restarted association immediately
enters the shutdown procedure and forces the termination of the
unreachable association.
Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the xmit path, we build a flowi6 which will be used for the output route lookup.
We are sending a GRE packet, not an IPv4 or IPv6 encapsulated packet, thus the
protocol should be IPPROTO_GRE.
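A minimal sketch of what that means when building the flow (illustrative, not
the full ip6gre xmit path):

	struct flowi6 fl6;

	memset(&fl6, 0, sizeof(fl6));
	fl6.flowi6_proto = IPPROTO_GRE;	/* GRE, regardless of the encapsulated payload */
	/* fill in saddr/daddr, then do the output route lookup with fl6 */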
Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the packet is successfully sent, we should not touch the skb
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.
In this version of the patch I have fixed issues pointed out by David.
David, please queue this up for stable.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Long Li <longli@microsoft.com> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When receiving a vlan-tagged frame that still contains
a vlan header, the length of the packet will be greater
than MTU+ETH_HLEN since it accounts for the extra
vlan header. TG3 checks this for the 802.1Q case,
but not for 802.1ad. As a result, full-sized 802.1ad
frames get dropped by the card.
Add a check for the 802.1ad protocol when receiving full-sized
frames.
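Conceptually, the oversize check becomes something like the sketch below
(simplified, not the exact tg3_rx() code):

	if (len > (dev->mtu + ETH_HLEN) &&
	    skb->protocol != htons(ETH_P_8021Q) &&
	    skb->protocol != htons(ETH_P_8021AD)) {
		/* genuinely oversized frame: drop it */
		dev_kfree_skb_any(skb);
	}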
Suggested-by: Prashant Sreedharan <prashant@broadcom.com> CC: Prashant Sreedharan <prashant@broadcom.com> CC: Michael Chan <mchan@broadcom.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TG3 appears to have an issue performing TSO and checksum offloading
correctly when the frame has been vlan-encapsulated (non-accelerated).
In these cases, the TCP checksum is not correctly updated.
This patch attempts to work around this issue. After the patch,
802.1ad vlans start working correctly over tg3 devices.
CC: Prashant Sreedharan <prashant@broadcom.com> CC: Michael Chan <mchan@broadcom.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
could return NULL if tunnel->sock->sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:
Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may still be non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
This leads to two things:
1. Part of the resulting ethernet header is in the non-linear part of the
skb. When eth_type_trans is called later as the result of
OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
is in fact accessing random data when it reads past the TPID.
2. network_header points into the ethernet header instead of behind it.
mac_len is set to a wrong value (10), too.
Reported-by: Yulong Pei <ypei@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
af_packet can currently overwrite kernel memory by out-of-bound
accesses, because it assumed a [new] block can always hold one frame.
This is not generally the case, even if most existing tools do it right.
This patch clamps too-long frames as the API permits, and issues a one-time
error on syslog.
[ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
In this example, the packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len).
Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().
Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.
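A sketch of the address-family-aware dispatch described above (member names
are my recollection of this fix, treat them as assumptions):

	/* in tcp_release_cb(), when an MTU reduction was deferred */
	inet_csk(sk)->icsk_af_ops->mtu_reduced(sk);
	/* resolves to tcp_v4_mtu_reduced() or tcp_v6_mtu_reduced() according
	 * to the ops installed for the connection, not the socket family */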
Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, tunnel's underlying interface (t->parms.link)
is verified to match incoming traffic's ingress device.
However the comparison was incorrectly based on skb->dev->iflink.
Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.
This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On IOMMU systems DMA mapping can fail; we need to check for
that possibility.
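The usual pattern for such a check looks roughly like this (a generic sketch,
not the exact driver code):

	dma_addr_t mapping = dma_map_single(dev, skb->data, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, mapping)) {
		/* mapping failed (e.g. IOMMU space exhausted): drop cleanly */
		dev_kfree_skb_any(skb);
		return NETDEV_TX_OK;
	}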
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ondemand governor calculates load in terms of frequency and
increases it only if load_freq is greater than up_threshold
multiplied by the current or average frequency. This appears to
produce oscillations of frequency between min and max because,
for example, a relatively small load can easily saturate minimum
frequency and lead the CPU to the max. Then, it will decrease
back to the min due to small load_freq.
Change the calculation method of load and target frequency on the
basis of the following two observations:
- Load computation should not depend on the current or average
measured frequency. For example, absolute load of 80% at 100MHz
is not necessarily equivalent to 8% at 1000MHz in the next
sampling interval.
- It should be possible to increase the target frequency to any
value present in the frequency table proportional to the absolute
load, rather than to the max only, so that:
Target frequency = C * load
where we take C = policy->cpuinfo.max_freq / 100.
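For example, with policy->cpuinfo.max_freq = 3400000 kHz, an absolute load of
50% gives a target of 1700000 kHz, which is then rounded up to the next
available table frequency. A simplified sketch of the selection (busy_time and
wall_time are illustrative names, this is not the exact ondemand code):

	unsigned int load = 100 * busy_time / wall_time;	/* absolute load, 0..100 */
	unsigned int freq_next = load * policy->cpuinfo.max_freq / 100;

	/* pick the lowest table frequency at or above freq_next */
	__cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);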
Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
that middle frequencies are used more, with this patch. Highest
and lowest frequencies were used less by ~9%.
[rjw: We have run multiple other tests on kernels with this
change applied and in the vast majority of cases it turns out
that the resulting performance improvement also leads to reduced
consumption of energy. The change is additionally justified by
the overall simplification of the code in question.]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The time spent by a CPU under a given frequency is stored in jiffies in
the per-cpu variable cpufreq_stats_table->time_in_state[i], i being the index of
the frequency.
This is what is displayed in the following file on the right column:
Now cpufreq converts this jiffies unit delta to clock_t before returning it
to the user, as in the above file. And that conversion is achieved using the API
cputime64_to_clock_t().
Although it accidentally works on traditional tick-based cputime accounting, where
cputime_t maps directly to jiffies, it doesn't work with other types of cputime
accounting such as CONFIG_VIRT_CPU_ACCOUNTING_*, where cputime_t can map to nsecs
or any granularity preferred by the architecture.
For example we get a buggy zero delta on full dyntick configurations:
Fix this by using the proper jiffies_64 to clock_t conversion.
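A sketch of the corrected conversion when emitting time_in_state (field names
as in cpufreq_stats are assumptions; output formatting simplified):

	for (i = 0; i < stat->state_num; i++)
		len += sprintf(buf + len, "%u %llu\n", stat->freq_table[i],
			       (unsigned long long)
			       jiffies_64_to_clock_t(stat->time_in_state[i]));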
Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In testmode and vendor command reply/event SKBs we use the
skb cb data to store nl80211 parameters between allocation
and sending. This causes the code for CONFIG_NETLINK_MMAP
to get confused, because it takes ownership of the skb cb
data when the SKB is handed off to netlink, and it doesn't
explicitly clear it.
Clear the skb cb explicitly when we're done and before it
gets passed to netlink to avoid this issue.
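I.e., something like this right before the handoff (sketch):

	/* we are done using skb->cb for nl80211 bookkeeping; clear it so the
	 * netlink/mmap code starts from a clean control buffer */
	memset(skb->cb, 0, sizeof(skb->cb));
	/* now pass the skb to netlink (genlmsg_multicast()/nlmsg_unicast()) */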
Reported-by: Assaf Azulay <assaf.azulay@intel.com> Reported-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since linux kernel 3.13, kthread_run() internally uses
wait_for_completion_killable(). We may sometimes use kthread_run()
while we still have a signal pending, which we used to kick our threads
out of potentially blocking network functions, causing kthread_run() to
mistake that for a new fatal signal and fail.
Fix: flush_signals() before kthread_run().
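Roughly (a sketch; the thread function and argument names are illustrative):

	flush_signals(current);	/* a stale signal may still be pending here */
	task = kthread_run(drbd_thread_fn, thi, "drbd_%s", name);
	if (IS_ERR(task))
		return false;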
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly when repeatedly stopping and starting an itimer:
restoring the saved value with setitimer() would add a full tick to val,
_even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding). Doing
this repeatedly would cause unbounded growth in val. So fix the math.
Here's what was wrong with the conversion: we essentially computed
(eliding seconds)
jiffies = usec * (NSEC_PER_USEC/TICK_NSEC)
by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with a denominator of 2^USEC_JIFFIE_SC, i.e.
NSEC_PER_USEC/TICK_NSEC ~= x/(2^USEC_JIFFIE_SC), and computed:
jiffies = (usec * x) >> USEC_JIFFIE_SC
and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)
In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.
We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies. This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.
Tested: the following program:
int main() {
	struct itimerval zero = {{0, 0}, {0, 0}};
	/* Initially set to 10 ms. */
	struct itimerval initial = zero;
	initial.it_interval.tv_usec = 10000;
	setitimer(ITIMER_PROF, &initial, NULL);
	/* Save and restore several times. */
	for (size_t i = 0; i < 10; ++i) {
		struct itimerval prev;
		setitimer(ITIMER_PROF, &zero, &prev);
		/* on old kernels, this goes up by TICK_USEC every iteration */
		printf("previous value: %ld %ld %ld %ld\n",
		       prev.it_interval.tv_sec, prev.it_interval.tv_usec,
		       prev.it_value.tv_sec, prev.it_value.tv_usec);
		setitimer(ITIMER_PROF, &prev, NULL);
	}
	return 0;
}
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Reported-by: Aaron Jacobs <jacobsa@google.com> Signed-off-by: Andrew Hunter <ahh@google.com>
[jstultz: Tweaked to apply to 3.17-rc] Signed-off-by: John Stultz <john.stultz@linaro.org>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint. Some devices in some cases don't do what it
says on the label.
The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero. If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks. If all were zero, this would
be the case. If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.
As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.
As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to override this caution and cause
DISCARD to work if discard_zeroes_data is set.
If a site wants to enable DISCARD on some arrays but not on others, they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter:
raid456.devices_handle_discard_safely=Y
As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7
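The opt-in is a plain module parameter, roughly (sketch):

static bool devices_handle_discard_safely = false;
module_param(devices_handle_discard_safely, bool, 0644);
MODULE_PARM_DESC(devices_handle_discard_safely,
		 "Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");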
Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637 Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.10: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent conversion of saa7134 to vb2 uncovered a poll() bug that
broke the teletext applications alevt and mtt. These applications
expect that calling poll() without having called VIDIOC_STREAMON will
cause poll() to return POLLERR. That did not happen in vb2.
This patch fixes that behavior. It also fixes what should happen when
poll() is called when STREAMON is called but no buffers have been
queued. In that case poll() will also return POLLERR, but only for
capture queues since output queues will always return POLLOUT
anyway in that situation.
This brings the vb2 behavior in line with the old videobuf behavior.
This patch reverts 1ba6e0b50b ("mm: numa: split_huge_page: transfer the
NUMA type from the pmd to the pte"). If a huge page is being split due
to a protection change and the tail will be in a PROT_NONE vma, then NUMA
hinting PTEs are temporarily created in the protected VMA.
VM_RW|VM_PROTNONE
|-----------------|
^
split here
In the specific case above, it should get fixed up by change_pte_range()
but there is a window of opportunity for weirdness to happen. Similarly,
if a huge page is shrunk and split during a protection update but before
pmd_numa is cleared then a pte_numa can be left behind.
Instead of adding complexity trying to deal with the case, this patch
will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
will not be triggered which is marginal in comparison to the complexity
in dealing with the corner cases during THP split.
In __split_huge_page_map(), the check for page_mapcount(page) is
invariant within the for loop. Because the macro is
implemented using atomic_read(), the redundant check cannot be optimized
away by the compiler, leading to an unnecessary read of the page structure.
This patch moves the invariant bug check out of the loop so that it will
be done only once. On a 3.16-rc1 based kernel, the execution time of a
microbenchmark that broke up 1000 transparent huge pages using munmap()
had an execution time of 38,245us and 38,548us with and without the
patch respectively. The performance gain is about 1%.
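In outline, the change hoists the check out of the per-subpage loop
(illustrative sketch, not the exact __split_huge_page_map() code):

	/* page_mapcount() is an atomic_read() under the hood, so the compiler
	 * cannot hoist this check itself; do it once before the loop */
	BUG_ON(page_mapcount(page) != 1);

	for (i = 0; i < HPAGE_PMD_NR; i++) {
		/* set up the pte for subpage i, no per-iteration check */
	}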
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.
A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
it's not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.
The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.
Commit 651e22f2701b changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is supposed to set the cache to the
current value and there are locks that keep a consuming reader from
having access to the data.
Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in
the middle of the options for the EXPERT menu. However,
HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
placing items in the EXPERT menu, and displays the remaining several
EXPERT items (starting with EPOLL) directly in the General Setup menu.
Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the
subsequent items display as part of the EXPERT menu again; the EMBEDDED
menu now appears as the next top-level item in the General Setup menu,
which makes General Setup much shorter and more usable.
Oleg noticed that a cleanup by Sylvain actually uncovered a bug: by
calling perf_event_free_task() when sched_fork() fails, we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try to
'free' the inherited contexts, which are still in use by the parent
process. This is bad.
We did not implement any bound on the number of indirect ICBs we follow when
loading an inode. Thus a corrupted medium could cause the kernel to go into an
infinite loop, possibly causing a stack overflow.
Fix the possible stack overflow by removing recursion from
__udf_read_inode() and limiting the number of indirect ICBs we follow to avoid
infinite loops.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
find_lock_task_mm() expects it is called under rcu or tasklist lock, but
it seems that at least oom_unkillable_task()->task_in_mem_cgroup() and
mem_cgroup_out_of_memory()->oom_badness() can call it lockless.
Perhaps we could fix the callers, but this patch simply adds the rcu lock
into find_lock_task_mm(). This also allows us to simplify one of its
callers, oom_kill_process(), a bit.
At least out_of_memory() calls has_intersects_mems_allowed() without
even rcu_read_lock(), this is obviously buggy.
Add the necessary rcu_read_lock(). This means that we can not simply
return from the loop, we need "bool ret" and "break".
While at it, swap the names of task_struct's (the argument and the
local). This cleans up the code a little bit and avoids the unnecessary
initialization.
while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.
1. Unless g == current, the lockless while_each_thread() is not safe.
while_each_thread(g, t) can loop forever if g exits, next_thread()
can't reach the unhashed thread in this case. Note that this can
happen even if g is the group leader, it can exec.
2. Even if while_each_thread() itself was correct, people often use
it wrongly.
It was never safe to just take rcu_read_lock() and loop unless
you verify that pid_alive(g) == T, even the first next_thread()
can point to the already freed/reused memory.
This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head. The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.
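The helper is essentially a list walk over the new thread_head (a sketch of
the definition; the exact form in the patch may differ):

#define for_each_thread(p, t)	\
	list_for_each_entry_rcu(t, &(p)->signal->thread_head, thread_node)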
Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().
Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node. But
we can't do this right now because this will lead to subtle behavioural
changes. For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died. Or thread_group_empty(), currently its semantics is not clear
unless thread_group_leader(p) and we need to audit the callers before we
can change it.
So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.
Move the "if (clone_flags & CLONE_THREAD)" code down under "if
(likely(p->pid))" and turn it into into the "else" branch. This makes the
process/thread initialization more symmetrical and removes one check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8e3dffc651cb "Ext2: mark inode dirty after the function
dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block()
called from ext2_get_xip_mem(). That function called ext2_get_block()
mistakenly asking it to map 0 blocks while 1 was intended. Before the
above-mentioned commit things worked out fine by luck, but after that commit
we started returning that we had allocated 0 blocks while we had in fact
allocated 1 block, and thus allocation looped until all blocks in
the filesystem were exhausted.
Fix the problem by properly asking for one block and also add assertion
in ext2_get_blocks() to catch similar problems.
Reported-and-tested-by: Andiry Xu <andiry.xu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Wang Nan <wangnan0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>