Stephen Finucane [Sun, 29 Nov 2020 12:50:22 +0000 (12:50 +0000)]
REST: Null out previous, current relation info
These fields don't work like we expect them to. Because we're linking to
a non-idempotent entity, an instance of 'relation', what we're storing
in either of these fields is subject to change as patches are added and
removed. This makes the information pretty much useless after the fact.
It's best to just state the patch and request that people query the
information themselves if necessary. We don't want to remove the field
entirely from the API - that would be a truly breaking change - so
instead we null it out like we do for patch tags. In a v2 API (i.e. a
major version bump) we can remove this entirely.
A small bug with the schema generation is corrected.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Related: #379
This wasn't writeable for reasons I haven't been able to figure out.
However, it's not actually needed: the 'PatchSerializer' can do the job
just fine, given enough information. This exposes a bug in DRF, which
has been reported upstream [1]. While we wait for that fix, or some
variant of it, to be merged, we must monkey patch the library.
tests: Add tests for 'patch-relation-changed' events
This event is rather odd. If you have two patches then the way a
relation is created is by creating a 'PatchRelation' instance and then
setting the 'related' attribute on the first patch followed by the
second patch. Because the event uses the 'Patch' model's 'pre_save'
signal, we'll only see events for the patch being currently saved. This
means no event will be raised for the first patch and only one event,
the one for the second patch, will be raised when the second patch is
being added to the relationship.
In hindsight, the structure of the event is off. We should have had
something like a 'patch-added-to-relationship' and a
'patch-removed-from-relationship' event, both with the same fields:
'project', 'actor', 'patch' and 'related', the latter of which would
have listed all of the _other_ patches in the relationship. Sadly, this
is an API change which means we can't do it now. We may well wish to do
so in the future though.
Signed-off-by: Stephen Finucane <stephen@that.guru>
This wasn't actually creating just a patch relation object - it was also
creating patches, which is something we already have an explicit helper
for. Clean this thing up.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sat, 28 Nov 2020 17:32:32 +0000 (17:32 +0000)]
lib: Grant SELECT on auth_user
If a mail arrives with the 'X-Patchwork-Delegate' hint header, the
'patchwork.parser' script will need to index the users table to find the
appropriate user. This should be okay from a security perspective since
passwords are hashed and salted and the rest of the information is
mostly accessible publicly via the web UI and REST API.
Signed-off-by: Stephen Finucane <stephen@that.guru> Suggested-by: Ali Alnubani <alialnu@mellanox.com> Closes: #365
The patch_mbox view returns text/plain data without specifying a character
set, which means clients will assume the default of iso-8559-1 as defined
in the HTTP/1.1 standard. Since the data being returned is in fact utf-8
encoded, set the encoding accordingly in the HTTP Content-Type header.
Reported-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Stephen Finucane [Sat, 28 Nov 2020 16:34:01 +0000 (16:34 +0000)]
parser: Update reference to PatchComment
Commit 0686a736fbf6d869bd31bd135ba38080ac96de22 split out 'CoverLetter'
from the old 'Submission' model, removing the common 'Comment' model in
favour of distinct 'CoverComment' and 'PatchComment' models in the
process. Unfortunately we misssed some references to the old model in
the 'patchwork.parser' module. Correct these now, adding unit tests to
prevent regressions.
Signed-off-by: Stephen Finucane <stephen@that.guru> Fixes: 0686a736 ("models: Split 'CoverLetter' from 'Submission'") Closes: #384
models: Validate Project.linkname does not contain forward slash
I started by creating a project that contained a forward slash
(importing patches from https://lists.sr.ht/~sircmpwn/sr.ht-dev/) and
it fails to render the "projects" main page.
The specific error reads:
NoReverseMatch at /
Reverse for 'patch-list' with keyword arguments
'{'project_id': 'foo/bar'}' not found. 1 pattern(s) tried:
['project/(?P<project_id>[^/]+)/list/$']
which appears to explicitly disallow forward slashes.
So I think it makes sense to validate that project linkname doesn't
contain forward slahes.
This implementation uses the validate_unicode_slug validator instead of just
rejecting inputs that contain forward slashes.
Signed-off-by: Thomas Bracht Laumann Jespersen <t@laumann.xyz> Signed-off-by: Stephen Finucane <stephen@that.guru> Closes: #380
Django 3.1 adds a new admin sidebar feature that requires the
django.template.context_processors.request context processor to be enabled
in the settings.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stephen Finucane <stephen@that.guru>
Andrew Donnellan [Thu, 27 Aug 2020 14:14:03 +0000 (00:14 +1000)]
urls: Update url pattern functions
Django 3.1 deprecates django.conf.urls.url() as an alias for
django.urls.re_path(). Also switch to using django.urls.include() rather
than django.conf.urls.include().
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stephen Finucane <stephen@that.guru>
Rohit Sarkar [Sat, 1 Aug 2020 06:24:15 +0000 (11:54 +0530)]
management: introduce replacerelations command
The replacerelations script is used to ingest relations into Patchwork's
patch database. A patch groups file is taken as input, which on each
line contains a space separated list of patchwork ids denoting a
relation. All the existing relations in Patchwork's database are removed
and the relations read from the patch groups file are ingested.
Signed-off-by: Rohit Sarkar <rohitsarkar5398@gmail.com>
[dja: pep8, drop relations directory as empty dirs don't get stored by git,
comment about how lines are generated.] Signed-off-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Thu, 30 Apr 2020 21:22:10 +0000 (22:22 +0100)]
tox: Add default Django version
I occasionally forget myself and run e.g. 'tox -e pyNN' when I want to
sanity check something instead of 'tox -e pyNN-djangoMM'. Add fallback
Django versions so that this doesn't crash and burn. It's less succict
than it could be since tox doesn't seem to support '!django{22,30}'
(yet!).
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sun, 26 Apr 2020 13:17:57 +0000 (14:17 +0100)]
migrations: Resolve issues with other DB backends for 0042, 0043
0042 was using MySQL-specific SQL to delete entries in the
'patchwork_comment' table that were associated with entries in the
'patchwork_coverletter' table, while 0043 only considered MySQL and
PostgrSQL when attempting to copy fields from 'patchwork_patch' to
'patchwork_submission'. Both issues are resolved.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Fri, 10 Apr 2020 11:08:32 +0000 (12:08 +0100)]
migrations: Squash 0001 to 0040
Now that we're moving a new major version, we can squash the migrations
we have to date. This coexists alongside the old migrations, which we
can remove and replace entirely when we release 4.0, per the advice of
the 'squashmigrations' tool.
You should commit this migration but leave the old ones in place; the
new migration will be used for new installs. Once you are sure all
instances of the codebase have applied the migrations you squashed,
you can delete them.
The 'squashmigrations' tool can't parse the output of 'RunPython' blocks
so the output of the tool was less optimized than it could be. As a
result, we've manually modified this change to remove the 'RunPython'
block and unnecessary 'AlterField' entries. This was done by removing
all migrations and generating a new "initial" migration, which was then
modified to mark all strings as byte strings (as they were when we were
using Python 2 to generate these migrations) so that 0041 would apply
cleanly.
The main benefit of this change is that it significantly reduces the
startup time for unit tests. Executed on my host, the run time for a
single test goes from ~ 22 seconds to ~ 14 seconds. This is obviously
reduced for additional tests.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Oh, the follies of youth. Time to undo the damage of 2.0.0, specifically
commit 86172ccc16, which split Patch into two separate models using
concrete inheritance. As noted previously, this introduced a large
number of unavoidable JOINs across large tables and the performance
impacts these introduce are blocking other features we want, such as
improved tagging functionality. To combine these two models, we must do
the following:
- Update any references to the 'Patch' model to point to the
'Submission' model instead
- Move everything from 'Patch' to 'Submission', including both fields
and options
- Delete the 'Patch' model
- Rename the 'Submission' model to 'Patch'
With this change, our model "hierarchy" goes from:
Submission
Patch
PatchComment
Cover
CoverComment
To a nice, flat:
Patch
PatchComment
Cover
CoverComment
I expect this migration to be intensive, particularly for MySQL users
who will see their entire tables rewritten. Unfortunately I don't see
any way to resolve this in an easier manner.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sat, 29 Feb 2020 17:20:50 +0000 (17:20 +0000)]
models: Split 'CoverLetter' from 'Submission'
We want to get rid of the split between 'Patch' and 'Submission' because
of the cost of using JOINs basically everywhere we use 'Patch'. Before
we do that, we need to move the other users of 'Submission' to other
models and other models that rely on these users sharing the common
'Submission' base. For the former, there is only one user,
'CoverLetter', while for the latter there is only the 'Comment' model.
As a result, we must do the following:
- Create a new 'Cover' model
- Create a new 'CoverComment' model
- Move everything from 'CoverLetter' to 'Cover' and all entries
associated with a 'CoverLetter' from 'Comment' to 'CoverComment'
- Delete the 'CoverLetter' model
- Rename the 'Comment' model to 'PatchComment'
This means our model "hierarchy" goes from:
Submission
Patch
CoverLetter
Comment
To:
Submission
Patch
PatchComment
Cover
CoverComment
A future change will flatten the 'Submission' and 'Patch' model.
Note that this actually highlighted a bug in Django, which has since
been reported upstream [1]. As noted there, the issue stems from MySQL's
refusal to remove an index from a foreign key when DB constraints are
used and the workaround is to remove the foreign key constraint before
altering the indexes and then re-add the constraint after.
[1] https://code.djangoproject.com/ticket/31335
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sat, 18 Apr 2020 12:21:13 +0000 (13:21 +0100)]
travis: Resolve warnings, info messages from Travis
The following were reported by Travis' build config validation:
- root: deprecated key 'sudo' (The key `sudo` has no effect anymore.)
- env: key 'matrix' is an alias for 'jobs', using 'jobs'
- root: key 'matrix' is an alias for 'jobs', using 'jobs'
- root: missing 'os', using the default 'linux'
Resolve all of the above.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Jeremy Kerr [Thu, 16 Apr 2020 01:29:25 +0000 (09:29 +0800)]
tests: ensure we don't see database errors during duplicate insert
Currently, the parser causes IntegrityErrors while inserting duplicate
patches; these tend to pollute database logs.
This change adds a check, which currently fails, to ensure we do not
cause errors during a duplicate patch parse.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stephen Finucane <stephen@that.guru>
[stephenfin: Add 'expectedFailure' marker to keep all tests green]
Andrew Donnellan [Wed, 15 Apr 2020 09:06:56 +0000 (19:06 +1000)]
parser: Don't crash when From: is list email but has weird mangle format
get_original_sender() tries to demangle DMARC-mangled From headers, in
the case where the email's From address is the list address. It knows how
to handle Google Groups and Mailman style mangling, where the original
submitter's name will be turned into e.g. "Andrew Donnellan via
linuxppc-dev".
If an email has the From header set to the list address but has a name that
doesn't include " via ", we'll throw an exception because stripped_name
hasn't been set. Sometimes this is because the list name is seemingly
empty, resulting in a mangled name like "Andrew Donnellan via"
without the space after "via" that we detect. Handle this as well as we can
instead, and add a test.
Fixes: 8279a84238c10 ("parser: Unmangle From: headers that have been mangled for DMARC purposes") Reported-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 15:52:08 +0000 (16:52 +0100)]
tests: Switch to openapi-core 0.13.x
We've done the necessary work here already so this is a relatively easy
switchover. However, we do have to work around an issue whereby the
first possible matching route is used rather than the best one [1]. In
addition, we have to install from master since there are fixes missing
from the latest release, 0.13.3. Hopefully both issues will be resolved
in a future release.
Stephen Finucane [Wed, 15 Apr 2020 15:44:32 +0000 (16:44 +0100)]
tests: Drop Django 1.x support
openapi-core 0.13.x has added support for Django validation. Before we
migrate to that version and presumably remove most of this code, remove
the stuff that is *definitely* dead.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Thu, 16 Apr 2020 13:03:54 +0000 (14:03 +0100)]
docs: Resolve issues with 'relations'
Two issues here:
- 'PATCH /patches/{id}' and 'PUT /patches/{id}' expect a list of
integers on the 'related' field - not strings
- 'GET /patches' and 'GET /patches/{id}' return a list of embedded patch
objects on the 'related' field - not strings
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Thu, 16 Apr 2020 12:45:28 +0000 (13:45 +0100)]
docs: Resolve issues with 'events'
Four things to change here:
- The response is any array that can contain any type of event, not one
of them.
- The 'actor' field is nullable.
- The 'cover' field of the 'cover-created' event is an embedded cover
letter, not a string.
- The specifications for the 'current_delegate' and 'previous_delegate'
fields of the 'patch-delegated' field were apparently invalid.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 23:51:17 +0000 (00:51 +0100)]
docs: Resolve issues with 'covers'
Two issues to correct:
- Each header in the 'headers' field can be either a string or a list
value.
- We state that the 'content' field will always have content but our
tests were configuring otherwise.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 22:36:59 +0000 (23:36 +0100)]
docs: Resolve issues with 'patches'
Four issues to resolve:
- The 'tags' field is a key-value mapping, not an array.
- Each header in the 'headers' field can be either a string or a list
value.
- Errors are reported as a mapping of the field name to an array of
errors, not a string.
- The security type information isn't complete and doesn't account for
security types. Skip it for now.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 22:37:17 +0000 (23:37 +0100)]
docs: Make embedded fields nullable by default
As discussed at [1], "subtypes can add restrictions, but they cannot
relax restrictions that are already in place." These fields need to be
marked nullable and then "subclassed" to set non-nullability if
required.
Stephen Finucane [Fri, 17 Apr 2020 22:07:27 +0000 (23:07 +0100)]
REST: Allow update of bundle without patches
Presently, when updating a patch we assume that patches are provided.
This isn't necessary - you might just want to make it public - and isn't
enforced by the API itself. However, because we make this assumption, we
see a HTTP 500. Resolve the issue and add tests to prevent a regression.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Resolves: #357
Daniel Axtens [Tue, 14 Apr 2020 05:34:40 +0000 (15:34 +1000)]
api: allow filtering patches and covers by msgid
In the process of fixing the previous bug, I realised that:
a) /api/patches/msgid is a perfectly reasonable thing to attempt
b) We have no way of finding a patch by message id in the API
We can't actualy make /api/patches/msgid work because it may not
be unique, but we can add a filter.
I'm shoehorning this into stable/2.2, even though it's technically
an API change: it's minor, not incompatible and in hindsight a
glaring hole.
Cc: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Tue, 14 Apr 2020 04:26:58 +0000 (14:26 +1000)]
api: do not fetch every patch in a patch detail view 404
mpe and jk and sfr found that the OzLabs server was melting due
to some queries downloading every patch.
Turns out if you 404 the patch detail view in the API, d-r-f attempts
to render a listbox with every single patch to fill in the 'related'
field. The bundle API also has a similar field.
Replace the multiple selection box with a text field. You can still
(AIUI) populate the relevant patch IDs manually.
This is the recommended approach per
https://www.django-rest-framework.org/topics/browsable-api/#handling-choicefield-with-large-numbers-of-items
Reported-by: Jeremy Kerr <jk@ozlabs.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Jeremy Kerr <jk@ozlabs.org> Server-no-longer-on-fire-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Fri, 10 Apr 2020 16:33:26 +0000 (17:33 +0100)]
pre-commit: Use Python 3 for everything
This lets us use e.g. f-strings. We also bump the version of the default
pre-commit lib and migrate to the upstream flake8 plugin, since the old
one is now deprecated.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Note that we need to use a subclass because the 'ServerProxy' class,
rather annoyingly, does not expose a 'close()' method. Instead, you're
expected to use a context manager, which isn't useful from the context
of a 'setUp' call. We could call '__enter__' and '__exit__' manually but
this seems cleaner. Also note that 'Server' was an alias of
'ServerProxy' [1], and we're taking the opportunity to switch here.
Daniel Axtens [Tue, 17 Mar 2020 13:59:14 +0000 (00:59 +1100)]
REST: fix patch listing query
The patch listing query is punishingly slow under even very simple
filters.
The new data model in 3.0 will help _a lot_, so this is a simple fix: I
did try indexes but haven't got really deeply into the weeds of what we can do
with them.
Move a number of things from select_related to prefetch_related: we trade off
one big, inefficient query for a slightly larger number of significantly more
efficient queries.
On my laptop with 2 copies of the canonical kernel team list loaded into the
database, and considering only the API view (the JSON view is always faster)
with warm caches and considering the entire set of SQL queries:
- /api/patches/?project=1
~1.4-1.5s -> <100ms, something like 14x better
- /api/patches/?project=1&since=2010-11-01T00:00:00
~1.7-1.8s -> <560ms, something like 3x better (now dominated by the
counting query only invoked on the HTML API view,
not the pure JSON API view.)
The things I moved:
* project: this was generating SQL that looked like:
INNER JOIN `patchwork_project` T5
ON (`patchwork_submission`.`project_id` = T5.`id`)
This is correct but we've already had to join the patchwork_submission
table and perhaps as a result it seems to be inefficient.
* series__project: Likewise we've already had to join the series table,
doing another join is possibly why it is inefficient.
* delegate: I do not know why this was tanking performance. I think it
might relate to the strategy mysql was using.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Stephen Finucane <stephen@that.guru>
Daniel Axtens [Tue, 17 Mar 2020 13:59:13 +0000 (00:59 +1100)]
REST: massively improve the patch counting query under filters
The DRF web view counts the patches as part of pagination.
The query it uses is a disaster zone:
SELECT Count(*) FROM (
SELECT DISTINCT
`patchwork_submission`.`id` AS Col1,
`patchwork_submission`.`msgid` AS Col2,
`patchwork_submission`.`date` AS Col3,
`patchwork_submission`.`submitter_id` AS Col4,
`patchwork_submission`.`project_id` AS Col5,
`patchwork_submission`.`name` AS Col6,
`patchwork_patch`.`submission_ptr_id` AS Col7,
`patchwork_patch`.`commit_ref` AS Col8,
`patchwork_patch`.`pull_url` AS Col9,
`patchwork_patch`.`delegate_id` AS Col10,
`patchwork_patch`.`state_id` AS Col11,
`patchwork_patch`.`archived` AS Col12,
`patchwork_patch`.`hash` AS Col13,
`patchwork_patch`.`patch_project_id` AS Col14,
`patchwork_patch`.`series_id` AS Col15,
`patchwork_patch`.`number` AS Col16,
`patchwork_patch`.`related_id` AS Col17
FROM `patchwork_patch`
INNER JOIN `patchwork_submission`
ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`)
WHERE `patchwork_submission`.`project_id`=1
)
This is because django-filters adds a DISTINCT qualifier on a
ModelMultiChoiceFilter by default. I guess it makes sense and they do a
decent job of justifying it, but it causes the count to be made with
this awful subquery. (The justification is that they don't know if you're
filtering on a to-many relationship, in which case there could be
duplicate values that need to be removed.)
While fixing that, we can also tell the filter to filter on patch_project
rather than submission's project, which allows us in some cases to avoid
the join entirely.
The resultant SQL is beautiful when filtering by project only:
SELECT COUNT(*) AS `__count`
FROM `patchwork_patch`
WHERE `patchwork_patch`.`patch_project_id` = 1
On my test setup (2x canonical kernel mailing list in the db, warm cache,
my laptop) this query goes from >1s to ~10ms, a ~100x improvement.
If we filter by project and date the query is still nice, but still also
very slow:
SELECT COUNT(*) AS `__count`
FROM `patchwork_patch`
INNER JOIN `patchwork_submission`
ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`)
WHERE (`patchwork_patch`.`patch_project_id`=1 AND `patchwork_submission`.`date`>='2010-11-01 00:00:00')
This us from ~1.3s to a bit under 400ms - still not ideal, but I'll take
the 3x improvement!
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Stephen Finucane <stephen@that.guru>
Mete Polat [Thu, 27 Feb 2020 23:29:32 +0000 (23:29 +0000)]
REST: Add patch relations
View relations and add/update/delete them as a maintainer. Maintainers
can only create relations of patches which are part of a project they
maintain. Because this is a writable many-many nested relationship, it
behaves a little unusually. In short:
- All operations use PATCH to the 'related' field of a patch
- To relate a patch to another patch, say 7 to 19, either:
PATCH /api/patch/7 related := [19]
PATCH /api/patch/19 related := [7]
- To delete a patch from a relation, say 1, 21 and 42 are related but we
only want it to be 1 and 42:
PATCH /api/patch/21 related := []
* You _cannot_ remove a patch from a relation by patching another
patch in the relation: I'm trying to avoid read-modify-write loops.
* Relations that would be left with just 1 patch are deleted. This is
only ensured in the API - the admin interface will let you do this.
- Break-before-make: if you have [1, 12, 24] and [7, 15, 42] and you want
to end up with [1, 12, 15, 42], you have to remove 15 from the second
relation first:
PATCH /api/patch/1 related := [15] will fail with 409 Conflict.
Instead do:
PATCH /api/patch/15 related := []
PATCH /api/patch/1 related := [15]
-> 200 OK, [1, 12, 15, 42] and [7, 42] are the resulting relations
Signed-off-by: Mete Polat <metepolat2000@gmail.com> Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Daniel Axtens <dja@axtens.net>