From: Mike Mason <mmlnx@us.ibm.com>
Date: Fri, 10 Apr 2009 08:57:03 +0000 (+0000)
Subject: powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
Patch-mainline: 2.6.30
Git-commit: c58dc575f3c8bdc69fb868ec51e1c80ee7cae5e7
References: bnc#509407

powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()

While adding native EEH support to the Emulex and QLogic drivers, it was
discovered that dev->error_state was set to pci_channel_io_normal too
late in the recovery process. These drivers rely on error_state to
determine whether they can access the device in their slot_reset callback,
so error_state needs to be set to pci_channel_io_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) of why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS, the
adapter will be shut down. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function, which returns a Boolean value
based on pci_dev->error_state, to determine whether PCI MMIO or DMA
accesses are safe. If the wrapper returns TRUE, drivers must not
perform PCI MMIO or DMA accesses to their hardware.

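The guard pattern described above can be illustrated with a small user-space model. This is not kernel code: struct sim_pci_dev, sim_channel_offline(), sim_run_io() and the limit SIM_MAX_FAILS are hypothetical stand-ins (SIM_MAX_FAILS plays the role of EEH_MAX_FAILS, whose real value is not assumed here). The model assumes that every MMIO access to a frozen slot adds exactly one EEH error.

```c
#include <stdbool.h>

/* Simplified user-space stand-ins for the three error_state values. */
enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* Hypothetical device model; only error_state mirrors struct pci_dev. */
struct sim_pci_dev {
	enum pci_channel_state error_state;
	bool slot_frozen;     /* true while the platform has frozen the slot */
	int extra_eeh_errors; /* errors caused by touching a frozen slot */
};

/* Hypothetical limit standing in for EEH_MAX_FAILS. */
#define SIM_MAX_FAILS 3

/* Offline unless error_state is pci_channel_io_normal. */
static bool sim_channel_offline(const struct sim_pci_dev *dev)
{
	return dev->error_state != pci_channel_io_normal;
}

/* One MMIO access: touching a frozen slot adds another EEH error. */
static void sim_mmio_access(struct sim_pci_dev *dev)
{
	if (dev->slot_frozen)
		dev->extra_eeh_errors++;
}

/* Perform n accesses, optionally guarded by sim_channel_offline().
 * Returns true if the adapter would be shut down. */
static bool sim_run_io(struct sim_pci_dev *dev, int n, bool guarded)
{
	for (int i = 0; i < n; i++) {
		if (guarded && sim_channel_offline(dev))
			break; /* channel offline: skip all further MMIO/DMA */
		sim_mmio_access(dev);
	}
	return dev->extra_eeh_errors > SIM_MAX_FAILS;
}
```

With the slot frozen and error_state set to pci_channel_io_frozen, an unguarded sim_run_io(&dev, 10, false) accumulates ten extra errors and crosses the hypothetical limit. The guarded call stops before the first access and accumulates none.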
The pci_dev structure member error_state holds one of three values:
1) pci_channel_io_normal, 2) pci_channel_io_frozen, or 3)
pci_channel_io_perm_failure. The function pci_channel_offline(dev) returns
TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.

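Here is a minimal sketch of the three error_state values and the offline test. It assumes a simplified user-space enum. The kernel's definitions in <linux/pci.h> may carry annotations and numeric values that this model does not reproduce. sim_channel_offline() is a hypothetical stand-in for pci_channel_offline().

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified user-space copy of the three error_state values. */
enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* Only the member this sketch needs from struct pci_dev. */
struct sim_pci_dev {
	enum pci_channel_state error_state;
};

static const char *const sim_state_names[] = {
	[pci_channel_io_normal]       = "pci_channel_io_normal",
	[pci_channel_io_frozen]       = "pci_channel_io_frozen",
	[pci_channel_io_perm_failure] = "pci_channel_io_perm_failure",
};

/* TRUE for frozen or permanent failure, as described in the text. */
static bool sim_channel_offline(const struct sim_pci_dev *dev)
{
	return dev->error_state == pci_channel_io_frozen ||
	       dev->error_state == pci_channel_io_perm_failure;
}

/* Print the offline result for every error_state value. */
static void sim_print_offline_table(void)
{
	for (int s = pci_channel_io_normal; s <= pci_channel_io_perm_failure; s++) {
		struct sim_pci_dev d = { .error_state = (enum pci_channel_state)s };
		printf("%-28s -> %s\n", sim_state_names[s],
		       sim_channel_offline(&d) ? "TRUE" : "FALSE");
	}
}
```

sim_print_offline_table() prints FALSE for pci_channel_io_normal and TRUE for the other two states. With only three values, testing for "frozen or perm_failure" is equivalent to testing "not normal".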
The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume(), before
calling the driver's resume() callback. However, when the EEH driver calls
the driver's slot_reset() callback from eeh_report_reset(), error_state is
still pci_channel_io_frozen, which incorrectly indicates that the slot is
frozen.

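The ordering problem can be modeled in a few lines of user-space C. sim_report_reset() and sim_report_resume() are hypothetical simplifications of eeh_report_reset() and eeh_report_resume() that keep only the error_state handling. The real functions do more, such as the driver check and IRQ re-enabling visible in the diff. The patched flag selects whether the assignment added by this patch is applied.

```c
#include <stdbool.h>

enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* Hypothetical device that records what each driver callback observes. */
struct sim_pci_dev {
	enum pci_channel_state error_state;
	enum pci_channel_state seen_in_slot_reset;
	enum pci_channel_state seen_in_resume;
};

/* Driver callbacks: they only record the error_state they see. */
static void drv_slot_reset(struct sim_pci_dev *dev)
{
	dev->seen_in_slot_reset = dev->error_state;
}

static void drv_resume(struct sim_pci_dev *dev)
{
	dev->seen_in_resume = dev->error_state;
}

/* Simplified eeh_report_reset(); 'patched' applies the new assignment. */
static void sim_report_reset(struct sim_pci_dev *dev, bool patched)
{
	if (patched)
		dev->error_state = pci_channel_io_normal;
	drv_slot_reset(dev);
}

/* Simplified eeh_report_resume(): error_state restored before resume(). */
static void sim_report_resume(struct sim_pci_dev *dev)
{
	dev->error_state = pci_channel_io_normal;
	drv_resume(dev);
}

/* Slot frozen, then (after the slot reset) report reset and resume. */
static void sim_recover(struct sim_pci_dev *dev, bool patched)
{
	dev->error_state = pci_channel_io_frozen;
	sim_report_reset(dev, patched);
	sim_report_resume(dev);
}
```

Without the patch, the driver's slot_reset() callback records pci_channel_io_frozen and only resume() sees pci_channel_io_normal. With the patch, slot_reset() already sees pci_channel_io_normal.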
Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for the Emulex and QLogic FC drivers and
for any other drivers designed to share common code paths between these
two cases: i) paths called after the driver's slot_reset() callback, and
ii) paths called after the PCI slot is frozen but before the driver's
slot_reset() callback is called. Case i) covers all driver paths executed to
reinitialize the hardware after a reset. Case ii) covers all code paths
executed asynchronously to the main driver thread, such as interrupt
handlers and worker threads that process driver work queues.

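To show why a shared code path is affected, here is a sketch of a hypothetical driver in which one helper, drv_hw_init(), serves both case i) and case ii). The names drv_hw_init(), drv_slot_reset(), drv_worker() and the mmio_writes counter are illustrative only. They do not come from the Emulex or QLogic drivers.

```c
#include <errno.h>
#include <stdbool.h>

enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* Hypothetical device model with a counter standing in for MMIO writes. */
struct sim_pci_dev {
	enum pci_channel_state error_state;
	int mmio_writes;
};

static bool sim_channel_offline(const struct sim_pci_dev *dev)
{
	return dev->error_state != pci_channel_io_normal;
}

/* Hypothetical common path shared by case i) and case ii). */
static int drv_hw_init(struct sim_pci_dev *dev)
{
	if (sim_channel_offline(dev))
		return -EIO;    /* must not touch MMIO/DMA */
	dev->mmio_writes++; /* stands in for real register programming */
	return 0;
}

/* Case i): hardware reinitialization from the slot_reset() callback. */
static int drv_slot_reset(struct sim_pci_dev *dev)
{
	return drv_hw_init(dev);
}

/* Case ii): a worker that may run while the slot is frozen. */
static int drv_worker(struct sim_pci_dev *dev)
{
	return drv_hw_init(dev);
}
```

Because one check serves both paths, the helper correctly refuses to touch a frozen slot from the worker. It also refuses to reinitialize the hardware from slot_reset() unless error_state has already been set to pci_channel_io_normal by that point.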
The Emulex and QLogic FC drivers are designed with common code paths that
require pci_channel_offline(dev) to reflect the true state of the
hardware. The state transitions the hardware goes through, from Normal
Operations to Slot Frozen to Reset and back to Normal Operations, are
documented in the Power Architecture™ Platform Requirements+ (PAPR+),
Table 75, "PE State Control".

PAPR defines the following three states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
     (Reset)
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)

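The three PE states can be encoded as a lookup table. This is an illustrative encoding of the list above, not an interface from PAPR+ or the kernel. struct papr_pe_state, papr_lookup() and papr_io_allowed() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of the PE state list above. */
struct papr_pe_state {
	int id;
	bool reset;
	bool eeh_stopped;
	bool mmio_allowed;
	bool dma_allowed;
	const char *name;
};

static const struct papr_pe_state papr_states[] = {
	{ 0, false, false, true,  true,  "Normal Operations" },
	{ 1, true,  false, false, false, "Reset" },
	{ 2, false, true,  false, false, "Slot Frozen" },
};

/* Return the entry for a state id, or NULL if the id is unknown. */
static const struct papr_pe_state *papr_lookup(int id)
{
	for (size_t i = 0; i < sizeof(papr_states) / sizeof(papr_states[0]); i++)
		if (papr_states[i].id == id)
			return &papr_states[i];
	return NULL;
}

/* True only if the state allows both MMIO load/store and DMA. */
static bool papr_io_allowed(int id)
{
	const struct papr_pe_state *s = papr_lookup(id);

	return s != NULL && s->mmio_allowed && s->dma_allowed;
}
```

Only state 0 reports both MMIO and DMA as allowed. This matches the argument here that error_state should read pci_channel_io_normal once the slot is back in state 0.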
An EEH error places the slot in state 2 (Slot Frozen), and the adapter
driver is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset), and
eeh_reset_device() completes by placing the slot into state 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset(), which then calls the adapter's slot_reset() callback.
At the time the adapter's slot_reset() callback is called, the true state
of the hardware is Normal Operations. This should be accurately reflected
by setting dev->error_state to pci_channel_io_normal.

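The recovery sequence above can be traced with a small state model. sim_eeh_recovery() and sim_reset_device() are hypothetical simplifications of the EEH recovery path and of eeh_reset_device(). The model assumes the driver has already returned PCI_ERS_RESULT_NEED_RESET. It checks one invariant at the point where slot_reset() would run: error_state reads pci_channel_io_normal exactly when the PE is in state 0.

```c
#include <stdbool.h>

enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* PE states numbered as in the PAPR list above. */
enum papr_state { PE_NORMAL = 0, PE_RESET = 1, PE_FROZEN = 2 };

/* Hypothetical model of one slot and the error_state drivers observe. */
struct sim_slot {
	enum papr_state hw;                 /* true hardware (PE) state */
	enum pci_channel_state error_state; /* what drivers see */
	enum papr_state trace[3];           /* PE states visited, in order */
	int ntrace;
	bool consistent_at_slot_reset;
};

static void sim_set_hw(struct sim_slot *s, enum papr_state st)
{
	s->hw = st;
	if (s->ntrace < 3)
		s->trace[s->ntrace++] = st;
}

/* error_state is accurate when it reads "normal" exactly in state 0. */
static bool sim_state_consistent(const struct sim_slot *s)
{
	return (s->hw == PE_NORMAL) ==
	       (s->error_state == pci_channel_io_normal);
}

/* Stand-in for eeh_reset_device(): state 1 (Reset), then state 0. */
static void sim_reset_device(struct sim_slot *s)
{
	sim_set_hw(s, PE_RESET);
	sim_set_hw(s, PE_NORMAL);
}

/* EEH error, reset, then eeh_report_reset(); 'patched' applies the fix. */
static void sim_eeh_recovery(struct sim_slot *s, bool patched)
{
	s->ntrace = 0;
	sim_set_hw(s, PE_FROZEN);           /* EEH error: state 2 */
	s->error_state = pci_channel_io_frozen;
	/* Assumed: the driver returned PCI_ERS_RESULT_NEED_RESET. */
	sim_reset_device(s);
	if (patched)                        /* inside eeh_report_reset() */
		s->error_state = pci_channel_io_normal;
	/* The driver's slot_reset() callback would run here. */
	s->consistent_at_slot_reset = sim_state_consistent(s);
}
```

Both runs visit states 2, 1 and 0 in that order. Only the patched run leaves error_state consistent with the hardware when slot_reset() is reached.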
The current implementation of the EEH driver does not do so, and this
change corrects that deficiency.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Jeff Mahoney <jeffm@suse.com>
---
 arch/powerpc/platforms/pseries/eeh_driver.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/powerpc/platforms/pseries/eeh_driver.c
+++ b/arch/powerpc/platforms/pseries/eeh_driver.c
@@ -152,6 +152,8 @@ static void eeh_report_reset(struct pci_
 	if (!driver)
 		return;
 
+	dev->error_state = pci_channel_io_normal;
+
 	if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) {
 		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED;
 		enable_irq(dev->irq);
95 | + | |
96 | if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) { | |
97 | PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED; | |
98 | enable_irq(dev->irq); |