Stephen Finucane [Sat, 18 Apr 2020 12:21:13 +0000 (13:21 +0100)]
travis: Resolve warnings, info messages from Travis
The following were reported by Travis' build config validation:
- root: deprecated key 'sudo' (The key `sudo` has no effect anymore.)
- env: key 'matrix' is an alias for 'jobs', using 'jobs'
- root: key 'matrix' is an alias for 'jobs', using 'jobs'
- root: missing 'os', using the default 'linux'
Resolve all of the above.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Jeremy Kerr [Thu, 16 Apr 2020 01:29:25 +0000 (09:29 +0800)]
tests: ensure we don't see database errors during duplicate insert
Currently, the parser causes IntegrityErrors while inserting duplicate
patches; these tend to pollute database logs.
This change adds a check, which currently fails, to ensure we do not
cause errors during a duplicate patch parse.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stephen Finucane <stephen@that.guru>
[stephenfin: Add 'expectedFailure' marker to keep all tests green]
Andrew Donnellan [Wed, 15 Apr 2020 09:06:56 +0000 (19:06 +1000)]
parser: Don't crash when From: is list email but has weird mangle format
get_original_sender() tries to demangle DMARC-mangled From headers, in
the case where the email's From address is the list address. It knows how
to handle Google Groups and Mailman style mangling, where the original
submitter's name will be turned into e.g. "Andrew Donnellan via
linuxppc-dev".
If an email has the From header set to the list address but has a name that
doesn't include " via ", we'll throw an exception because stripped_name
hasn't been set. Sometimes this is because the list name is seemingly
empty, resulting in a mangled name like "Andrew Donnellan via"
without the space after "via" that we detect. Handle this as well as we can
instead, and add a test.
Fixes: 8279a84238c10 ("parser: Unmangle From: headers that have been mangled for DMARC purposes") Reported-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 15:52:08 +0000 (16:52 +0100)]
tests: Switch to openapi-core 0.13.x
We've done the necessary work here already so this is a relatively easy
switchover. However, we do have to work around an issue whereby the
first possible matching route is used rather than the best one [1]. In
addition, we have to install from master since there are fixes missing
from the latest release, 0.13.3. Hopefully both issues will be resolved
in a future release.
Stephen Finucane [Wed, 15 Apr 2020 15:44:32 +0000 (16:44 +0100)]
tests: Drop Django 1.x support
openapi-core 0.13.x has added support for Django validation. Before we
migrate to that version and presumably remove most of this code, remove
the stuff that is *definitely* dead.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Thu, 16 Apr 2020 13:03:54 +0000 (14:03 +0100)]
docs: Resolve issues with 'relations'
Two issues here:
- 'PATCH /patches/{id}' and 'PUT /patches/{id}' expect a list of
integers on the 'related' field - not strings
- 'GET /patches' and 'GET /patches/{id}' return a list of embedded patch
objects on the 'related' field - not strings
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Thu, 16 Apr 2020 12:45:28 +0000 (13:45 +0100)]
docs: Resolve issues with 'events'
Four things to change here:
- The response is any array that can contain any type of event, not one
of them.
- The 'actor' field is nullable.
- The 'cover' field of the 'cover-created' event is an embedded cover
letter, not a string.
- The specifications for the 'current_delegate' and 'previous_delegate'
fields of the 'patch-delegated' field were apparently invalid.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 23:51:17 +0000 (00:51 +0100)]
docs: Resolve issues with 'covers'
Two issues to correct:
- Each header in the 'headers' field can be either a string or a list
value.
- We state that the 'content' field will always have content but our
tests were configuring otherwise.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 22:36:59 +0000 (23:36 +0100)]
docs: Resolve issues with 'patches'
Four issues to resolve:
- The 'tags' field is a key-value mapping, not an array.
- Each header in the 'headers' field can be either a string or a list
value.
- Errors are reported as a mapping of the field name to an array of
errors, not a string.
- The security type information isn't complete and doesn't account for
security types. Skip it for now.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 15 Apr 2020 22:37:17 +0000 (23:37 +0100)]
docs: Make embedded fields nullable by default
As discussed at [1], "subtypes can add restrictions, but they cannot
relax restrictions that are already in place." These fields need to be
marked nullable and then "subclassed" to set non-nullability if
required.
Stephen Finucane [Fri, 17 Apr 2020 22:07:27 +0000 (23:07 +0100)]
REST: Allow update of bundle without patches
Presently, when updating a patch we assume that patches are provided.
This isn't necessary - you might just want to make it public - and isn't
enforced by the API itself. However, because we make this assumption, we
see a HTTP 500. Resolve the issue and add tests to prevent a regression.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Resolves: #357
Daniel Axtens [Tue, 14 Apr 2020 05:34:40 +0000 (15:34 +1000)]
api: allow filtering patches and covers by msgid
In the process of fixing the previous bug, I realised that:
a) /api/patches/msgid is a perfectly reasonable thing to attempt
b) We have no way of finding a patch by message id in the API
We can't actualy make /api/patches/msgid work because it may not
be unique, but we can add a filter.
I'm shoehorning this into stable/2.2, even though it's technically
an API change: it's minor, not incompatible and in hindsight a
glaring hole.
Cc: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Tue, 14 Apr 2020 04:26:58 +0000 (14:26 +1000)]
api: do not fetch every patch in a patch detail view 404
mpe and jk and sfr found that the OzLabs server was melting due
to some queries downloading every patch.
Turns out if you 404 the patch detail view in the API, d-r-f attempts
to render a listbox with every single patch to fill in the 'related'
field. The bundle API also has a similar field.
Replace the multiple selection box with a text field. You can still
(AIUI) populate the relevant patch IDs manually.
This is the recommended approach per
https://www.django-rest-framework.org/topics/browsable-api/#handling-choicefield-with-large-numbers-of-items
Reported-by: Jeremy Kerr <jk@ozlabs.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Jeremy Kerr <jk@ozlabs.org> Server-no-longer-on-fire-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Fri, 10 Apr 2020 16:33:26 +0000 (17:33 +0100)]
pre-commit: Use Python 3 for everything
This lets us use e.g. f-strings. We also bump the version of the default
pre-commit lib and migrate to the upstream flake8 plugin, since the old
one is now deprecated.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Note that we need to use a subclass because the 'ServerProxy' class,
rather annoyingly, does not expose a 'close()' method. Instead, you're
expected to use a context manager, which isn't useful from the context
of a 'setUp' call. We could call '__enter__' and '__exit__' manually but
this seems cleaner. Also note that 'Server' was an alias of
'ServerProxy' [1], and we're taking the opportunity to switch here.
Daniel Axtens [Tue, 17 Mar 2020 13:59:14 +0000 (00:59 +1100)]
REST: fix patch listing query
The patch listing query is punishingly slow under even very simple
filters.
The new data model in 3.0 will help _a lot_, so this is a simple fix: I
did try indexes but haven't got really deeply into the weeds of what we can do
with them.
Move a number of things from select_related to prefetch_related: we trade off
one big, inefficient query for a slightly larger number of significantly more
efficient queries.
On my laptop with 2 copies of the canonical kernel team list loaded into the
database, and considering only the API view (the JSON view is always faster)
with warm caches and considering the entire set of SQL queries:
- /api/patches/?project=1
~1.4-1.5s -> <100ms, something like 14x better
- /api/patches/?project=1&since=2010-11-01T00:00:00
~1.7-1.8s -> <560ms, something like 3x better (now dominated by the
counting query only invoked on the HTML API view,
not the pure JSON API view.)
The things I moved:
* project: this was generating SQL that looked like:
INNER JOIN `patchwork_project` T5
ON (`patchwork_submission`.`project_id` = T5.`id`)
This is correct but we've already had to join the patchwork_submission
table and perhaps as a result it seems to be inefficient.
* series__project: Likewise we've already had to join the series table,
doing another join is possibly why it is inefficient.
* delegate: I do not know why this was tanking performance. I think it
might relate to the strategy mysql was using.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Stephen Finucane <stephen@that.guru>
Daniel Axtens [Tue, 17 Mar 2020 13:59:13 +0000 (00:59 +1100)]
REST: massively improve the patch counting query under filters
The DRF web view counts the patches as part of pagination.
The query it uses is a disaster zone:
SELECT Count(*) FROM (
SELECT DISTINCT
`patchwork_submission`.`id` AS Col1,
`patchwork_submission`.`msgid` AS Col2,
`patchwork_submission`.`date` AS Col3,
`patchwork_submission`.`submitter_id` AS Col4,
`patchwork_submission`.`project_id` AS Col5,
`patchwork_submission`.`name` AS Col6,
`patchwork_patch`.`submission_ptr_id` AS Col7,
`patchwork_patch`.`commit_ref` AS Col8,
`patchwork_patch`.`pull_url` AS Col9,
`patchwork_patch`.`delegate_id` AS Col10,
`patchwork_patch`.`state_id` AS Col11,
`patchwork_patch`.`archived` AS Col12,
`patchwork_patch`.`hash` AS Col13,
`patchwork_patch`.`patch_project_id` AS Col14,
`patchwork_patch`.`series_id` AS Col15,
`patchwork_patch`.`number` AS Col16,
`patchwork_patch`.`related_id` AS Col17
FROM `patchwork_patch`
INNER JOIN `patchwork_submission`
ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`)
WHERE `patchwork_submission`.`project_id`=1
)
This is because django-filters adds a DISTINCT qualifier on a
ModelMultiChoiceFilter by default. I guess it makes sense and they do a
decent job of justifying it, but it causes the count to be made with
this awful subquery. (The justification is that they don't know if you're
filtering on a to-many relationship, in which case there could be
duplicate values that need to be removed.)
While fixing that, we can also tell the filter to filter on patch_project
rather than submission's project, which allows us in some cases to avoid
the join entirely.
The resultant SQL is beautiful when filtering by project only:
SELECT COUNT(*) AS `__count`
FROM `patchwork_patch`
WHERE `patchwork_patch`.`patch_project_id` = 1
On my test setup (2x canonical kernel mailing list in the db, warm cache,
my laptop) this query goes from >1s to ~10ms, a ~100x improvement.
If we filter by project and date the query is still nice, but still also
very slow:
SELECT COUNT(*) AS `__count`
FROM `patchwork_patch`
INNER JOIN `patchwork_submission`
ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`)
WHERE (`patchwork_patch`.`patch_project_id`=1 AND `patchwork_submission`.`date`>='2010-11-01 00:00:00')
This us from ~1.3s to a bit under 400ms - still not ideal, but I'll take
the 3x improvement!
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Stephen Finucane <stephen@that.guru>
Mete Polat [Thu, 27 Feb 2020 23:29:32 +0000 (23:29 +0000)]
REST: Add patch relations
View relations and add/update/delete them as a maintainer. Maintainers
can only create relations of patches which are part of a project they
maintain. Because this is a writable many-many nested relationship, it
behaves a little unusually. In short:
- All operations use PATCH to the 'related' field of a patch
- To relate a patch to another patch, say 7 to 19, either:
PATCH /api/patch/7 related := [19]
PATCH /api/patch/19 related := [7]
- To delete a patch from a relation, say 1, 21 and 42 are related but we
only want it to be 1 and 42:
PATCH /api/patch/21 related := []
* You _cannot_ remove a patch from a relation by patching another
patch in the relation: I'm trying to avoid read-modify-write loops.
* Relations that would be left with just 1 patch are deleted. This is
only ensured in the API - the admin interface will let you do this.
- Break-before-make: if you have [1, 12, 24] and [7, 15, 42] and you want
to end up with [1, 12, 15, 42], you have to remove 15 from the second
relation first:
PATCH /api/patch/1 related := [15] will fail with 409 Conflict.
Instead do:
PATCH /api/patch/15 related := []
PATCH /api/patch/1 related := [15]
-> 200 OK, [1, 12, 15, 42] and [7, 42] are the resulting relations
Signed-off-by: Mete Polat <metepolat2000@gmail.com> Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Daniel Axtens <dja@axtens.net>
Mete Polat [Thu, 27 Feb 2020 23:29:31 +0000 (23:29 +0000)]
models, templates: Add patch relations
Introduces the ability to add relations between patches. Relations are
displayed in the details page of a patch under 'Related'. Related
patches located in another projects can be viewed as well.
Changes to relations are tracked in events. Currently the display of
this is very bare in the API but that will be fixed in a subsequent patch:
this is the minimum required to avoid throwing errors when you view the
events feed.
Signed-off-by: Mete Polat <metepolat2000@gmail.com>
[dja: address some review comments from Stephen, add an admin view,
move to using Events, misc tidy-ups.] Signed-off-by: Daniel Axtens <dja@axtens.net>
Migration 0039 attempts to move patches that have ended up in an
arbitrary series due to race conditions into the correct series.
However, there are a number of race conditions that can occur here that
make this particularly tricky to do. Given that series are really just
arbitary metadata, it's not really necessary to do this...so don't.
Instead, just delete the series references that identical message IDs
and below to the same project, allowing us to add the uniqueness
constraint and prevent the issue bubbling up in the future. This means
we're still left with orphaned series but these could be fixed manually,
if necessary.
Signed-off-by: Stephen Finucane <stephen@that.guru> Tested-by: Ali Alnubani <alialnu@mellanox.com> Closes: #340
parser: Don't group patches with different versions in a series
As noted in #340 [1], if a patch from a series is dropped or
miscategorised, patches from a later revision of that series can end up
included in the earlier series rather than in their own series. This was
actually intentional as part of the fix for #105 [2]. However,
completely ignoring this information can be problematic. Refine things
by checking for versions and, if they don't match, using timeboxing to
try guess if they should be kept together. This would resolve the issue
seen in #340 while preventing a regression for #105.
Daniel Axtens [Tue, 28 Jan 2020 13:55:20 +0000 (00:55 +1100)]
pyenv: also install requirements for python2
The first time you do a migration with python3, you get a whole
lot of seemingly null changes. This is a bit annoying so I use
py2 to generate the changes. To do that, first fix the pyenv
transition so requirements are still installed for python2.
Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Stephen Finucane <stephen@that.guru>
Pranav Annam [Wed, 12 Feb 2020 03:35:47 +0000 (04:35 +0100)]
docker: update dependency for current build
The dependency libssl1.0-dev in the Dockerfile makes docker build fail:
The following packages have unmet dependencies:
libmysqlclient-dev : Depends: libssl-dev (>= 1.1.1-1ubuntu2.1~18.04.5~)
but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
There seems to be a conflict with different versions of libssl and
libmysqlclient that did not exist previously with Ubuntu 18.04.
Just use the current libssl-dev from Ubuntu 18.04 to fix the build.
Mete Polat [Wed, 29 Jan 2020 19:01:22 +0000 (20:01 +0100)]
REST: Fix duplicate project queries
Eliminates duplicate project queries caused by calling
get_absolute_url() in the embedded serializers. Following foreign keys
with 'series__project' will cache the project of the series as well as
the series itself.
Signed-off-by: Mete Polat <metepolat2000@gmail.com> Signed-off-by: Stephen Finucane <stephen@that.guru> Closes: #335
Another fix for copy-pasted pull requests, this time for cases
when something is copy-pasted from a terminal and retains all
the bogus trailing whitespace.
Add tests for the recent changes we made to how we parse multiple series
received at once. These tests actually highlighted what appeared to be
the test failure that's been intermittently breaking our CI for years
now, so the 'expectedFailure' marker has been removed in the hope that
this is actually the case.
Signed-off-by: Stephen Finucane <stephen@that.guru>
parser: Use a second query to weed out duplicate series
Annoyingly, not all email clients properly thread emails using the
message ID fields originally specified in RFC 822 [1]. Worse, some MTAs
(cough, outlook.com, cough) actually override what the client
configures, breaking the world in the process. Realising this is an
issue, Patchwork supports threading using arbitrary metadata in addition
to the RFC 822 metadata. Specifically, it uses a combination of
submitter and list-id extracted from the headers along with the series
version and total count metadata extracted from the subject. In addition
to this, we timebox things so that two or more series that match on all
of this metadata but which are sent some time apart from each other
aren't combined by accident. This does leave one edge case - duplicate
series received within the timebox will be combined. We've resigned
ourselves to this fact on the basis that it's extremely unlikely for all
of these things to go wrong at once.
Given all the above, there should be no reason that attempting to find
series by series markers should return more than one series. The
timeboxing will prevent us grouping similar looking series by accident
and the only other reason for this to happen is because we lost a race
and we should try again.
[1] https://tools.ietf.org/html/rfc822
Signed-off-by: Stephen Finucane <stephen@that.guru> Cc: Daniel Axtens <dja@axtens.net>
models: Use database constraints to prevent split Series
Currently, the 'SeriesReference' object has a unique constraint on the
two fields it has, 'series', which is a foreign key to 'Series', and
'msgid'. This is the wrong constraint. What we actually want to enforce
is that a patch, cover letter or comment is referenced by a single
series, or rather a single series per project the submission appears on.
As such, we should be enforcing uniqueness on the msgid and the project
that the patch, cover letter or comment belongs to.
This requires adding a new field to the object, 'project', since it's
not possible to do something like the following:
unique_together = [('msgid', 'series__project')]
This is detailed here [1]. In addition, the migration needs a precursor
migration step to merge any broken series.
These are failing due to differences in behavior of the backend. Since
this will never be used for production, we can simply skip these unit
tests and rely on the CI to catch potential issues.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Fri, 20 Dec 2019 10:49:21 +0000 (10:49 +0000)]
models: Add State.slug field
This reduces a lot of the tech debt we had built up around this. This
field is populated from a slugified representation of State.name and has
a uniqueness constraint. As a result, we also need to a uniqueness
constraint to the State.name field. This shouldn't be an issue and
probably should have been used from day one.
Note that there is no REST API impact for this since we've been using
state slugs all along.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sat, 30 Nov 2019 18:48:35 +0000 (18:48 +0000)]
tests: Provide a way to disable API schema
The API schema validation is strict, in that it will error out with
invalid keys in either the request or response. Unfortunately the API
itself is not. We're hopefully going to fix this in a distant v2.0, but
for now we need a way to ensure that the API does what it's supposed to,
namely not set fields that don't exist or that the user isn't allowed to
set, even if proper error codes aren't raised.
This isn't actually used yet. That will come later.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Mete Polat [Sat, 7 Dec 2019 16:46:20 +0000 (17:46 +0100)]
docs: Add missing series index schema
Fixes: 7d8e24bc84bd ("docs: Start documenting API using OpenAPI") Signed-off-by: Mete Polat <metepolat2000@gmail.com> Reviewed-by: Stephen Finucane <stephen@that.guru>
requirements: Add pyup markers to prevent dumb PRs
Until [1] is merged, we're going to have to override what these markers
are doing. Perhaps it would be easier to just specify the markers in the
comments as the actual marker, but I like using pip's features and the
comments *should* be temporary.
[1] https://github.com/pyupio/pyup/pull/367
Signed-off-by: Stephen Finucane <stephen@that.guru>
Johan Herland [Sun, 1 Dec 2019 01:49:53 +0000 (02:49 +0100)]
Include the responsible actor in applicable events
We want to use the events as an audit log. An important part of this is
recording _who_ made the changes that the events represent.
To accomplish this, we need to know the current user (aka. request.user)
at the point where we create the Event instance. Event creation is
currently triggered by signals, but neither the signal handlers, nor the
model classes themselves have easy access to request.user.
For some Patch-based events (patch-state-changed, patch-delegated), we
can do the following hack: The relevant events are created in signal
handlers that are all hooked up to either the pre_save or post_save
signals sent by Patch.save(). But before calling Patch.save(),
Patchwork must naturally query Patch.is_editable() to ascertain whether
the patch can in fact be changed by the current user. Thus, we only
need a way to communicate the current user from Patch.is_editable()
to the signal handlers that create the resulting Events. The Patch
object itself is available in both places, so we simply add an
'_edited_by' attribute to the instance (which fortunately is not
detected as a persistent db field by Django).
For the check-created event the current user always happens to be the
same as the 'user' field recorded in the Check object itself.
For the other Patch-based events (patch-created, patch-completed, and
series-completed), although they are also triggered by Patch.save(),
they are triggered as a result of incoming emails, hence have no real
actor as such, so we simply leave the actor as None/NULL. The same
argument also applies to the cover-created and series-created events.
Signed-off-by: Johan Herland <johan@herland.net> Reviewed-by: Stephen Finucane <stephen@that.guru> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Stephen Finucane [Sat, 30 Nov 2019 17:11:13 +0000 (17:11 +0000)]
templates: Use 'en' locale
As discussed at [1], the UI was originally written in Australian English
but as it's been through a couple of pairs of hands since the chances
are things are more than a little messed up. Just use 'en' as our locale
rather than 'en-US', 'en-AU' or anything else.
Jeremy Cline [Tue, 15 Oct 2019 21:30:11 +0000 (17:30 -0400)]
Allow ordering events by date
By default, the events API orders events by date in descending order
(newest first). However, it's useful to be able to order the events by
oldest events first. For example, when a client is polling the events
API for new events since a given date and wishes to process them in
chronological order.
Signed-off-by: Jeremy Cline <jcline@redhat.com> Reviewed-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sat, 30 Nov 2019 16:24:36 +0000 (16:24 +0000)]
REST: Exclude filters added in later version
If a person requests API version 1.1, they should get the exact same
behavior regardless of the base Patchwork version. We already do this
for fields in the output, so now extend this to filters in the
querystring.
Signed-off-by: Stephen Finucane <stephen@that.guru> Cc: Daniel Axtens <dja@axtens.net>