git.ipfire.org Git - thirdparty/kernel/stable-queue.git/blob - releases/5.0.19/pci-reset-lenovo-thinkpad-p50-nvgpu-at-boot-if-necessary.patch
From e0547c81bfcfad01cbbfa93a5e66bb98ab932f80 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude@redhat.com>
Date: Tue, 12 Feb 2019 17:02:30 -0500
Subject: PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary

From: Lyude Paul <lyude@redhat.com>

commit e0547c81bfcfad01cbbfa93a5e66bb98ab932f80 upstream.

On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode. The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:

  1. Boot up the laptop normally in Hybrid Graphics mode
  2. Make sure nouveau is loaded and that the GPU is awake
  3. Allow the Nvidia GPU to runtime suspend itself after being idle
  4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
  5. If nouveau loads up properly, reboot the machine again and go back to
     step 2 until you reproduce the issue

This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot. This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:

  nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002

This causes a timeout trying to bring up the GR ctx:

  nouveau 0000:01:00.0: timeout
  WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
  Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
  Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
  ...
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]

The GPU never manages to recover. Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other devices' IRQs to get disabled by the kernel:

  irq 16: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<000000007faa9e99>] i801_isr [i2c_i801]
  Disabling IRQ #16
  ...
  serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!

This causes the touchpad and sometimes other things to get disabled.
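
For illustration only, the "nobody cared" path above can be modelled in
userspace. This is a simplified sketch, not the kernel code: it assumes the
window of 100000 interrupts and the threshold of 99900 unhandled ones used by
note_interrupt() in kernel/irq/spurious.c, and the names irq_model and
irq_model_note() are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed thresholds, mirroring note_interrupt() in kernel/irq/spurious.c. */
#define IRQ_WINDOW        100000u
#define IRQ_UNHANDLED_MAX  99900u

/* Hypothetical per-line bookkeeping; not a kernel structure. */
struct irq_model {
	uint32_t irq_count;      /* interrupts seen in the current window */
	uint32_t irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;           /* set once the line has been shut off */
};

/* Record one interrupt; returns true if this call disabled the line. */
static bool irq_model_note(struct irq_model *m, bool handled)
{
	if (m->disabled)
		return false;

	if (!handled)
		m->irqs_unhandled++;

	if (++m->irq_count < IRQ_WINDOW)
		return false;

	/* Window complete: judge it, then start a fresh one. */
	m->irq_count = 0;
	if (m->irqs_unhandled > IRQ_UNHANDLED_MAX) {
		m->disabled = true; /* corresponds to "Disabling IRQ #16" */
		return true;
	}

	m->irqs_unhandled = 0;
	return false;
}
```

Feeding the model 100000 interrupts that no handler claims trips the check.
That is roughly what the stale GPU does to IRQ 16, and it is why i801_smbus,
whose handler is the only one listed on that line, then times out waiting for
its own interrupts.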

Since this happens without nouveau, we can't fix this problem from nouveau
itself.

Add a PCI quirk for the specific P50 variant of this GPU. Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected). Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot. Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.
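
The checks described above boil down to a small predicate. The following is a
userspace sketch of that decision only, with no MMIO or PCI access. The
subsystem ID, register offset, bit mask and map length are the values used in
this patch; struct gpu_state, its field names and p50_gpu_needs_reset() are
hypothetical, and can_reset merely stands in for pdev->reset_fn.

```c
#include <stdbool.h>
#include <stdint.h>

#define VENDOR_ID_LENOVO 0x17aa  /* value of PCI_VENDOR_ID_LENOVO */
#define P50_SUBSYSTEM_ID 0x222e  /* subsystem device ID checked by the quirk */
#define NV_MAP_LEN       0x23000 /* length passed to pci_iomap() */
#define NV_POST_REG      0x2240c /* "magic" register offset */
#define NV_POST_BIT      0x2     /* bit 1: device has been POSTed */

/* The 32-bit read at NV_POST_REG must fit inside the mapped window. */
_Static_assert(NV_POST_REG + 4 <= NV_MAP_LEN, "register outside mapping");

/* Hypothetical snapshot of the facts the quirk looks at. */
struct gpu_state {
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
	bool can_reset;    /* models pdev->reset_fn (NoReset- case) */
	uint32_t post_reg; /* value read from BAR 0 + NV_POST_REG */
};

/* Returns true only when every condition for a reset holds. */
static bool p50_gpu_needs_reset(const struct gpu_state *s)
{
	if (s->subsystem_vendor != VENDOR_ID_LENOVO ||
	    s->subsystem_device != P50_SUBSYSTEM_ID ||
	    !s->can_reset)
		return false;

	return (s->post_reg & NV_POST_BIT) != 0;
}
```

Note that the mask 0x2 tests bit 1 only, so a register value of 0x1 does not
trigger a reset while 0x3 does.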

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/pci/quirks.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5122,3 +5122,61 @@ SWITCHTEC_QUIRK(0x8573); /* PFXI 48XG3
 SWITCHTEC_QUIRK(0x8574); /* PFXI 64XG3 */
 SWITCHTEC_QUIRK(0x8575); /* PFXI 80XG3 */
 SWITCHTEC_QUIRK(0x8576); /* PFXI 96XG3 */
+
+/*
+ * On Lenovo Thinkpad P50 SKUs with a Nvidia Quadro M1000M, the BIOS does
+ * not always reset the secondary Nvidia GPU between reboots if the system
+ * is configured to use Hybrid Graphics mode. This results in the GPU
+ * being left in whatever state it was in during the *previous* boot, which
+ * causes spurious interrupts from the GPU, which in turn causes us to
+ * disable the wrong IRQ and end up breaking the touchpad. Unsurprisingly,
+ * this also completely breaks nouveau.
+ *
+ * Luckily, it seems a simple reset of the Nvidia GPU brings it back to a
+ * clean state and fixes all these issues.
+ *
+ * When the machine is configured in Dedicated display mode, the issue
+ * doesn't occur. Fortunately the GPU advertises NoReset+ when in this
+ * mode, so we can detect that and avoid resetting it.
+ */
+static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
+{
+	void __iomem *map;
+	int ret;
+
+	if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
+	    pdev->subsystem_device != 0x222e ||
+	    !pdev->reset_fn)
+		return;
+
+	if (pci_enable_device_mem(pdev))
+		return;
+
+	/*
+	 * Based on nvkm_device_ctor() in
+	 * drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
+	 */
+	map = pci_iomap(pdev, 0, 0x23000);
+	if (!map) {
+		pci_err(pdev, "Can't map MMIO space\n");
+		goto out_disable;
+	}
+
+	/*
+	 * Make sure the GPU looks like it's been POSTed before resetting
+	 * it.
+	 */
+	if (ioread32(map + 0x2240c) & 0x2) {
+		pci_info(pdev, FW_BUG "GPU left initialized by EFI, resetting\n");
+		ret = pci_reset_function(pdev);
+		if (ret < 0)
+			pci_err(pdev, "Failed to reset GPU: %d\n", ret);
+	}
+
+	iounmap(map);
+out_disable:
+	pci_disable_device(pdev);
+}
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
+			      PCI_CLASS_DISPLAY_VGA, 8,
+			      quirk_reset_lenovo_thinkpad_p50_nvgpu);
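
The fixup registration at the end of the hunk only fires for devices whose
class code matches after the shift. The following is a rough userspace sketch
of that matching, assuming the comparison performed by pci_do_fixups() in
drivers/pci/quirks.c: the 24-bit class code is shifted right by class_shift
before being compared, and all-ones values act as wildcards. struct fixup and
fixup_matches() are hypothetical names, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define ANY_CLASS         0xffffffffu /* stands in for (u32)PCI_ANY_ID */
#define ANY_ID16          0xffffu     /* stands in for (u16)PCI_ANY_ID */
#define CLASS_DISPLAY_VGA 0x0300u     /* value of PCI_CLASS_DISPLAY_VGA */

/* Hypothetical, trimmed-down view of one fixup table entry. */
struct fixup {
	uint16_t vendor;
	uint16_t device;
	uint32_t class_code;  /* compared against dev_class >> class_shift */
	unsigned class_shift;
};

/*
 * dev_class is the 24-bit value (base class << 16 | subclass << 8 |
 * prog-if). With class_shift = 8 the programming interface byte is ignored.
 */
static bool fixup_matches(const struct fixup *f, uint16_t vendor,
			  uint16_t device, uint32_t dev_class)
{
	bool class_ok = f->class_code == ANY_CLASS ||
			f->class_code == (dev_class >> f->class_shift);
	bool vendor_ok = f->vendor == ANY_ID16 || f->vendor == vendor;
	bool device_ok = f->device == ANY_ID16 || f->device == device;

	return class_ok && vendor_ok && device_ok;
}
```

With an entry of { 0x10de, 0x13b1, CLASS_DISPLAY_VGA, 8 }, a device reporting
class 0x030000 (VGA controller) matches, while the same vendor and device IDs
with class 0x030200 (3D controller) do not.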