From: Daniel Axtens Date: Tue, 17 Mar 2020 13:59:13 +0000 (+1100) Subject: REST: massively improve the patch counting query under filters X-Git-Tag: v2.1.6~4 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=913f1957f9341d5edc7c0b78c9c6d4a5b78f27fb;p=thirdparty%2Fpatchwork.git REST: massively improve the patch counting query under filters The DRF web view counts the patches as part of pagination. The query it uses is a disaster zone: SELECT Count(*) FROM ( SELECT DISTINCT `patchwork_submission`.`id` AS Col1, `patchwork_submission`.`msgid` AS Col2, `patchwork_submission`.`date` AS Col3, `patchwork_submission`.`submitter_id` AS Col4, `patchwork_submission`.`project_id` AS Col5, `patchwork_submission`.`name` AS Col6, `patchwork_patch`.`submission_ptr_id` AS Col7, `patchwork_patch`.`commit_ref` AS Col8, `patchwork_patch`.`pull_url` AS Col9, `patchwork_patch`.`delegate_id` AS Col10, `patchwork_patch`.`state_id` AS Col11, `patchwork_patch`.`archived` AS Col12, `patchwork_patch`.`hash` AS Col13, `patchwork_patch`.`patch_project_id` AS Col14, `patchwork_patch`.`series_id` AS Col15, `patchwork_patch`.`number` AS Col16, `patchwork_patch`.`related_id` AS Col17 FROM `patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`) WHERE `patchwork_submission`.`project_id`=1 ) This is because django-filters adds a DISTINCT qualifier on a ModelMultiChoiceFilter by default. I guess it makes sense and they do a decent job of justifying it, but it causes the count to be made with this awful subquery. (The justification is that they don't know if you're filtering on a to-many relationship, in which case there could be duplicate values that need to be removed.) While fixing that, we can also tell the filter to filter on patch_project rather than submission's project, which allows us in some cases to avoid the join entirely. The resultant SQL is beautiful when filtering by project only: SELECT COUNT(*) AS `__count` FROM `patchwork_patch` WHERE `patchwork_patch`.`patch_project_id` = 1 On my test setup (2x canonical kernel mailing list in the db, warm cache, my laptop) this query goes from >1s to ~10ms, a ~100x improvement. If we filter by project and date the query is still nice, but still also very slow: SELECT COUNT(*) AS `__count` FROM `patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`) WHERE (`patchwork_patch`.`patch_project_id`=1 AND `patchwork_submission`.`date`>='2010-11-01 00:00:00') This us from ~1.3s to a bit under 400ms - still not ideal, but I'll take the 3x improvement! Reported-by: Konstantin Ryabitsev Signed-off-by: Daniel Axtens Reviewed-by: Stephen Finucane (backported from commit 97155c0bc8881787f6c536031b678a4c3f89bda6 old django-filters uses 'name' instead of 'field_name') Signed-off-by: Daniel Axtens --- diff --git a/patchwork/api/filters.py b/patchwork/api/filters.py index ddf527fd..d374b585 100644 --- a/patchwork/api/filters.py +++ b/patchwork/api/filters.py @@ -182,7 +182,8 @@ class CoverLetterFilterSet(TimestampMixin, FilterSet): class PatchFilterSet(TimestampMixin, FilterSet): - project = ProjectFilter(queryset=Project.objects.all()) + project = ProjectFilter(queryset=Project.objects.all(), distinct=False, + name='patch_project') # NOTE(stephenfin): We disable the select-based HTML widgets for these # filters as the resulting query is _huge_ series = BaseFilter(queryset=Series.objects.all(),