releases/4.9.132/mac80211-fix-a-race-between-restart-and-csa-flows.patch

   1 From foo@baz Thu Oct  4 12:38:43 PDT 2018
   2 From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
   3 Date: Fri, 31 Aug 2018 11:31:06 +0300
   4 Subject: mac80211: fix a race between restart and CSA flows
   5
   6 From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
   7
   8 [ Upstream commit f3ffb6c3a28963657eb8b02a795d75f2ebbd5ef4 ]
   9
  10 We hit a problem with iwlwifi that was caused by a bug in
  11 mac80211. A bug in iwlwifi caused the firmware to crash in
  12 certain cases in channel switch. Because of that bug,
  13 drv_pre_channel_switch would fail and trigger the restart
  14 flow.
  15 Now we had the hw restart worker which runs on the system's
  16 workqueue and the csa_connection_drop_work worker that runs
  17 on mac80211's workqueue that can run together. This is
  18 obviously problematic since the restart work wants to
  19 reconfigure the connection, while the csa_connection_drop_work
  20 worker does the exact opposite: it tries to disconnect.
  21
  22 Fix this by cancelling the csa_connection_drop_work worker
  23 in the restart worker.
  24
  25 Note that this can sound racy: we could have:
  26
  27 driver   iface_work   CSA_work   restart_work
  28 +++++++++++++++++++++++++++++++++++++++++++++
  29               |
  30  <--drv_cs ---|
  31 <FW CRASH!>
  32 -CS FAILED-->
  33               |                       |
  34               |                 cancel_work(CSA)
  35            schedule                   |
  36            CSA work                   |
  37                          |            |
  38                         Race between those 2
  39
  40 But this is not possible because we flush the workqueue
  41 in the restart worker before we cancel the CSA worker.
  42 That would be bullet proof if we could guarantee that
  43 we schedule the CSA worker only from the iface_work
  44 which runs on the workqueue (and not on the system's
  45 workqueue), but unfortunately we do have an instance
  46 in which we schedule the CSA work outside the context
  47 of the workqueue (ieee80211_chswitch_done).
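
The ordering argument above can be sketched with a toy, single-threaded
model in plain C. This is only an illustration under stated assumptions:
none of the toy_* names are mac80211 or workqueue APIs, the "flush" only
models the iface work, and the late_chswitch_done flag stands in for the
ieee80211_chswitch_done path that queues the CSA work from outside the
workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy work item: only tracks whether it is queued. */
struct toy_work {
	bool pending;
};

struct toy_work iface_work, csa_drop_work;

void toy_reset(void)
{
	iface_work.pending = false;
	csa_drop_work.pending = false;
}

void toy_queue(struct toy_work *w)
{
	w->pending = true;
}

void toy_cancel(struct toy_work *w)
{
	w->pending = false;
}

/*
 * Toy flush: run the iface work if it is queued. In this model the
 * iface work always queues the CSA drop work (channel switch failed).
 */
void toy_flush_iface(void)
{
	if (iface_work.pending) {
		iface_work.pending = false;
		toy_queue(&csa_drop_work);
	}
}

/*
 * Toy restart path: flush first, then cancel.
 * Returns 1 if the CSA drop work is still queued afterwards, else 0.
 */
int toy_restart(bool late_chswitch_done)
{
	toy_flush_iface();
	assert(!iface_work.pending);

	/* Removes whatever the iface work queued during the flush. */
	toy_cancel(&csa_drop_work);

	/* Out-of-band scheduler: nothing orders it before the cancel. */
	if (late_chswitch_done)
		toy_queue(&csa_drop_work);

	return csa_drop_work.pending ? 1 : 0;
}
```

With only the iface work as a source, nothing is left queued after the
restart path. With the out-of-band source enabled, the CSA drop work can
still be queued after the cancel, which is the remaining window described
above.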
  48
  49 Note also that we should probably cancel other workers
  50 like beacon_connection_loss_work and possibly others
  51 for different types of interfaces, at the very least,
  52 IBSS should suffer from the exact same problem, but for
  53 now, do the minimum to fix the bug that was actually
  54 experienced and reproduced.
  55
  56 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
  57 Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
  58 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
  59 Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
  60 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  61 ---
  62  net/mac80211/main.c |   21 ++++++++++++++++++++-
  63  1 file changed, 20 insertions(+), 1 deletion(-)
  64
  65 --- a/net/mac80211/main.c
  66 +++ b/net/mac80211/main.c
  67 @@ -254,8 +254,27 @@ static void ieee80211_restart_work(struc
  68              "%s called with hardware scan in progress\n", __func__);
  69
  70         rtnl_lock();
  71 -       list_for_each_entry(sdata, &local->interfaces, list)
  72 +       list_for_each_entry(sdata, &local->interfaces, list) {
  73 +               /*
  74 +                * XXX: there may be more work for other vif types and even
  75 +                * for station mode: a good thing would be to run most of
  76 +                * the iface type's dependent _stop (ieee80211_mg_stop,
  77 +                * ieee80211_ibss_stop) etc...
  78 +                * For now, fix only the specific bug that was seen: race
  79 +                * between csa_connection_drop_work and us.
  80 +                */
  81 +               if (sdata->vif.type == NL80211_IFTYPE_STATION) {
  82 +                       /*
  83 +                        * This worker is scheduled from the iface worker that
  84 +                        * runs on mac80211's workqueue, so we can't be
  85 +                        * scheduling this worker after the cancel right here.
  86 +                        * The exception is ieee80211_chswitch_done.
  87 +                        * Then we can have a race...
  88 +                        */
  89 +                       cancel_work_sync(&sdata->u.mgd.csa_connection_drop_work);
  90 +               }
  91                 flush_delayed_work(&sdata->dec_tailroom_needed_wk);
  92 +       }
  93         ieee80211_scan_cancel(local);
  94
  95         /* make sure any new ROC will consider local->in_reconfig */