[people/teissler/ipfire-2.x.git] / src / patches / suse-2.6.27.31 / patches.arch / pseries-set-error_state-correctly-in-eeh_report_reset
From: Mike Mason <mmlnx@us.ibm.com>
Date: Fri, 10 Apr 2009 08:57:03 +0000 (+0000)
Subject: powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
Patch-mainline: 2.6.30
Git-commit: c58dc575f3c8bdc69fb868ec51e1c80ee7cae5e7
References: bnc#509407

powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()

While adding native EEH support to Emulex and QLogic drivers, it was
discovered that dev->error_state was set to pci_channel_io_normal too
late in the recovery process. These drivers rely on error_state to
determine if they can access the device in their slot_reset callback,
thus error_state needs to be set to pci_channel_io_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) as to why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS, the
adapter will be shut down. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function, which returns a Boolean value
based on pci_dev->error_state, to determine if PCI MMIO or
DMA accesses are safe. If the wrapper returns TRUE, drivers must not
make PCI MMIO or DMA accesses to their hardware.

The pci_dev structure member error_state takes one of three values:
1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns
TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.
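The rule can be sketched in a few lines of user-space C. This is only an illustration: the type and function names (channel_state, pci_dev_model, channel_offline) and the enum values are stand-ins chosen for this sketch, not the kernel's own definitions.

```c
#include <stdbool.h>

/* Simplified user-space stand-ins for illustration only; the real
 * definitions live in the kernel's PCI headers and differ in detail. */
enum channel_state {
	channel_io_normal = 1,
	channel_io_frozen,
	channel_io_perm_failure,
};

struct pci_dev_model {
	enum channel_state error_state;
};

/* Models the rule described above: any state other than "normal"
 * means the driver must not issue MMIO or DMA to the device. */
static bool channel_offline(const struct pci_dev_model *dev)
{
	return dev->error_state != channel_io_normal;
}
```

A driver path that shares code between normal operation and recovery would check channel_offline() before every hardware access, which is why the value of error_state at each recovery step matters.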

The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume() before
calling the driver's resume callback. However, when the EEH driver calls
the driver's slot_reset callback from eeh_report_reset(), error_state
incorrectly indicates that the channel is still pci_channel_io_frozen.

Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for the Emulex and QLogic FC drivers and
any other drivers which are designed to use common code paths in these
two cases: i) paths called after the driver's slot_reset callback and
ii) paths called after the PCI slot is frozen but before the driver's
slot_reset callback is called. Case i) covers all driver paths executed to
reinitialize the hardware after a reset; case ii) covers all code paths
executed by driver kernel threads that run asynchronously to the main
driver thread, such as interrupt handlers and worker threads that process
driver work queues.

Emulex and QLogic FC drivers are designed with common code paths which
require that pci_channel_offline(dev) reflect the true state of the
hardware. The state transitions that the hardware takes from Normal
Operations to Slot Frozen to Reset to Normal Operations are documented
in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75,
PE State Control.

PAPR defines the following 3 states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)
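The three states listed above can be written down as a small lookup table. The state numbers come from the list; the identifiers (pe_state, pe_state_info, pe_io_allowed) are hypothetical names used only for this sketch and do not appear in PAPR or in the kernel.

```c
#include <stdbool.h>

/* State numbers as listed above; the identifiers are hypothetical. */
enum pe_state {
	PE_STATE_NORMAL = 0,	/* not reset, not EEH stopped */
	PE_STATE_RESET  = 1,	/* reset, not EEH stopped */
	PE_STATE_FROZEN = 2,	/* not reset, EEH stopped */
};

struct pe_state_info {
	bool reset;
	bool eeh_stopped;
};

static const struct pe_state_info pe_state_table[] = {
	[PE_STATE_NORMAL] = { .reset = false, .eeh_stopped = false },
	[PE_STATE_RESET]  = { .reset = true,  .eeh_stopped = false },
	[PE_STATE_FROZEN] = { .reset = false, .eeh_stopped = true  },
};

/* MMIO load/store and DMA are allowed only when the PE is neither
 * held in reset nor EEH stopped, that is, only in state 0. */
static bool pe_io_allowed(enum pe_state s)
{
	const struct pe_state_info *info = &pe_state_table[s];

	return !info->reset && !info->eeh_stopped;
}
```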

An EEH error places the slot in state 2 (Frozen) and the adapter driver
is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset), and
eeh_reset_device() completes by placing the slot into state 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset(), which then calls the adapter's slot_reset callback. At
the time the adapter's slot_reset callback is called, the true state of
the hardware is Normal Operations, which should be accurately reflected by
setting dev->error_state to pci_channel_io_normal.
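The effect of the ordering can be modelled in user space. In the sketch below, model_report_reset() stands in for eeh_report_reset() and model_slot_reset() for a driver's slot_reset callback; all names, the reinit_ok flag, and the set_normal_first switch are assumptions made for illustration, not kernel code.

```c
#include <stdbool.h>

enum channel_state {
	channel_io_normal = 1,
	channel_io_frozen,
	channel_io_perm_failure,
};

struct dev_model {
	enum channel_state error_state;
	bool reinit_ok;	/* set if the callback was able to touch the hardware */
};

/* Hypothetical driver callback: it shares code with the normal I/O
 * paths, so it refuses to touch hardware while the channel looks
 * offline. */
static void model_slot_reset(struct dev_model *dev)
{
	dev->reinit_ok = (dev->error_state == channel_io_normal);
}

/* Stand-in for eeh_report_reset(); set_normal_first selects the
 * behaviour with the change applied. */
static void model_report_reset(struct dev_model *dev, bool set_normal_first)
{
	if (set_normal_first)
		dev->error_state = channel_io_normal;
	model_slot_reset(dev);
}
```

With set_normal_first false, the callback still sees a frozen channel and skips reinitialization even though the slot has already returned to state 0. With it true, the callback sees the true hardware state.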

The current implementation of the EEH driver does not do so; this
change corrects that deficiency.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Jeff Mahoney <jeffm@suse.com>
---
 arch/powerpc/platforms/pseries/eeh_driver.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/arch/powerpc/platforms/pseries/eeh_driver.c
+++ b/arch/powerpc/platforms/pseries/eeh_driver.c
@@ -152,6 +152,8 @@ static void eeh_report_reset(struct pci_
 	if (!driver)
 		return;
 
+	dev->error_state = pci_channel_io_normal;
+
 	if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) {
 		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED;
 		enable_irq(dev->irq);