Stephen Finucane [Thu, 11 Oct 2018 10:47:56 +0000 (11:47 +0100)]
REST: Allow setting of values using embedded serializers
Unfortunately, the use of embedded serializers for some fields breaks
the ability to update these fields, either via the HTML interface (where
the widget is totally busted) or via a client like 'git-pw'. What we
actually want is to be able to update these fields like normal primary
key but show them using the embedded serializer. We do just this by
using a modified variant of the PrimaryKeyRelatedField and using the
serializers simply for displaying.
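The read/write asymmetry can be sketched in plain Python (hypothetical names; the real change subclasses DRF's PrimaryKeyRelatedField, which is not reproduced here):

```python
class DisplayOnlyRelatedField:
    """Accept a primary key on write, but render the embedded
    representation on read. Method names follow DRF conventions,
    but this is an illustrative sketch, not DRF code."""

    def __init__(self, objects, serialize):
        self.objects = objects        # pk -> instance
        self.serialize = serialize    # instance -> dict

    def to_internal_value(self, pk):
        # Writes look like a normal primary key field.
        return self.objects[pk]

    def to_representation(self, instance):
        # Reads use the embedded serializer purely for display.
        return self.serialize(instance)


users = {1: {'id': 1, 'name': 'stephen'}}
field = DisplayOnlyRelatedField(
    users, lambda u: {'id': u['id'], 'name': u['name']})
```

A client can then PATCH with a bare integer while GET responses still show the nested object.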
Stephen Finucane [Thu, 11 Oct 2018 13:53:25 +0000 (14:53 +0100)]
REST: Validate patch delegate
At present, a patch may only be delegated to users who are maintainers of the
patch's project. Validate this. The check is currently broken due to #216, but
that will be fixed in a future change.
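A minimal sketch of the rule being validated (hypothetical data model; the real check lives in the REST serializer):

```python
def validate_delegate(user, project):
    # Only a maintainer of the patch's project may be its delegate.
    if project not in user['maintainer_of']:
        raise ValueError(
            '%s is not a maintainer of %s' % (user['name'], project))
    return user


maintainer = {'name': 'stephen', 'maintainer_of': {'patchwork'}}
```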
Jiri Benc [Thu, 28 Jun 2018 19:42:11 +0000 (15:42 -0400)]
parser: fix parsing of patches with headings
Some people tend to use lines full of '=' as a fancy way to format headings
in their commit messages, in an RST-like style. However, the current parser
treats such lines as the beginning of a diff.
The only tool in current use that produces diffs with '=' lines is quilt in
its default configuration. However, even with quilt, the diff looks this
way:
It's enough to match on the "Index:" line. The state machine's state is
still 1 when it encounters the '=' line, so it's safe to remove the
match on '=' completely.
[This prevents us from properly parsing metadata out of the changelog. -dcz ]
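The resulting rule can be sketched as a rough, hypothetical simplification (the real parser is a larger state machine):

```python
import re

# Only 'Index:' (plus the usual diff markers) may start a diff;
# a bare '=' ruler line no longer does.
DIFF_START = re.compile(r'^(Index:\s|diff\s|--- )')

def starts_diff(line):
    return bool(DIFF_START.match(line))
```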
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
(cherry picked from commit 67faf96ab96d93252c89967ef766bcbe8214c0fc)
Petr Vorel [Thu, 26 Jul 2018 07:11:14 +0000 (09:11 +0200)]
pwclient: Fix pwclient am output formatting
repr() prints a unicode prefix for strings:
$ pwclient git-am N
Applying patch #N using u'git am'
Remove it:
$ pwclient git-am N
Applying patch #918868 using "git am"
git itself mixes single and double quotes; use double quotes, which are
more frequently used.
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
(cherry picked from commit ea5847ada3ac79908a2b251839e77e1b2f3dc6d2)
Stephen Finucane [Sat, 29 Sep 2018 20:11:22 +0000 (21:11 +0100)]
filters: Pre-populate delegate, submitter filters
This appears to have been lost in the transition to 'selectize.js'. In
brief, this ensures that a previously selected delegate or submitter
is still enabled post-filtering. For more information, see [1].
We had registered an event handler on a checkbox in the table header which
would call a function, 'checkboxes', on all checkboxes within that
table. This function, in turn, does its work and then triggers the
event handlers for all modified checkboxes, which include the original
table header checkbox. This resulted in the original event handler
calling itself recursively.
Resolve this by only modifying the checkboxes in the table body.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Fixes: 44fe7bae ("js: Allow shift-select of checkboxes")
(cherry picked from commit ae154148c78a75ff73c3c22f0ff0c6b3a3d01408)
Yuri Volchkov [Wed, 20 Jun 2018 12:21:42 +0000 (14:21 +0200)]
parsemail: ignore html part of multi-part comments
Currently, HTML protection is present only for patch emails. If a
multi-part comment email arrives, it messes up Patchwork. In my case,
the symptom was an unintended 'Signed-off-by' in the downloaded
patches, along with HTML-like junk.
This patch makes parsemail skip all parts of a comment which are not
text/plain.
Of course, this will drop HTML-only emails completely, but those
cannot be parsed anyway.
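The behaviour can be illustrated with the stdlib email package (a sketch of the idea, not Patchwork's actual parser code):

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def extract_comment(msg):
    """Collect only text/plain payloads, ignoring text/html parts."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            chunks.append(part.get_payload())
    return '\n'.join(chunks)

# A typical multi-part mail: plain and HTML alternatives.
msg = MIMEMultipart('alternative')
msg.attach(MIMEText('Looks good to me.', 'plain'))
msg.attach(MIMEText('<p>Looks good to me.</p>', 'html'))
```

Only the plain-text alternative survives; an HTML-only mail would yield an empty comment.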
Signed-off-by: Yuri Volchkov <yuri.volchkov@gmail.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
(cherry picked from commit e27ff061dc01e51967a978884a5c59152863ab9c)
This provides an easy way for clients to navigate to the web view. The
URL is added to four resources: bundles, comments, cover letters and
series. We could also extend this to projects and users in the future,
but the former would require renaming an existing property while the
latter would require a public "user" page, which does not currently
exist.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Wed, 13 Jun 2018 21:53:22 +0000 (22:53 +0100)]
Don't discard values from 'archive' filter
The pagination functionality used in 'patchwork.views.generic_list'
generates the filter querystring from scratch. To do this, it calls the
'patchwork.filters.Filters.params' function, which in turn calls the
'patchwork.filters.Filter.key' function for each filter. If any of these
'key' functions returns None, the relevant filter is not included in the
querystring. This ensures we don't end up with a load of filters like
the below:
There is one exception to this rule, however: ArchiveFilter. This one is
a little unusual in that it is active by default, excluding "archived"
patches from the list. As a result, its 'key' function should return
None for this active state, not for the disabled state. This had been
the case up until commit d848f046, which falsely equated 'is False'
with 'is None'. This small typo resulted in the filter being ignored
when generating pagination links and essentially broke pagination for
some use cases.
Fix this up. We could probably simplify this thing greatly by not
recalculating filters for pagination at least or, better yet, by using
django-filter here too. That is a change for another day though.
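The distinction can be sketched as follows (a hypothetical simplification, not the actual Filter class):

```python
class ArchiveFilter:
    """Active by default: None means 'hide archived patches'."""

    def __init__(self, archive_state=None):
        self.archive_state = archive_state

    def key(self):
        # Returning None omits the filter from pagination links.
        # The buggy version tested 'is False' here instead, so an
        # explicit archive=false selection was silently dropped.
        if self.archive_state is None:
            return None
        return 'true' if self.archive_state else 'false'
```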
Signed-off-by: Stephen Finucane <stephen@that.guru>
Reported-by: John McNamara <john.mcnamara@intel.com>
Reported-by: Eli Schwartz <eschwartz93@gmail.com>
Fixes: d848f046 ("trivial: Don't shadow built-ins")
Closes: #184
views: Raise 404 if downloading non-existent dependencies
If a patch was processed by Patchwork before series support was added,
it will not have a series associated with it. As a result, it is not
possible to extract the dependencies for that patch from the series and
a 404 should be raised. This was not previously handled correctly.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reported-by: John McNamara <john.mcnamara@intel.com>
Fixes: e2dfd490 ("views: Add 'series' parameter to '/mbox' endpoint")
Closes: #189
docs: Update deployment installation guide for v2.1
There are no actual changes needed from the Patchwork side so this is
mostly a cleanup.
- Use Ubuntu 18.04 (including package names)
- Resolve some minor issues with commands
- Remove use of "trust" authentication for PostgreSQL
- Minor style changes
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Wed, 30 May 2018 10:57:59 +0000 (11:57 +0100)]
sql: Update 'grant-all.mysql' script with missing tables
These tables were all introduced in 2.0 and, while the PostgreSQL script
was fixed in commit 234bc7c3, the MySQL one was not. This suggests either
(a) no one is using this or (b) people are carrying local changes, but
for now we just resolve the issue.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Daniel Axtens [Fri, 11 May 2018 17:00:44 +0000 (03:00 +1000)]
REST: Disable control for filtering patches by series in web view
As with the events view, creating and rendering the control for
filtering patches by series creates a massive slowdown. It's a little
sad not to be able to do this in the web UI as filtering patches
by series does make sense, but hopefully people figure out you can
still do it, just not from the web view.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stephen Finucane <stephen@that.guru>
docs: Add additional information about API versions
As we're soon going to support a v1.1 API, we should document which
versions are available and whether they're still supported (hint: we
support both v1.0 and v1.1 at present).
Signed-off-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Daniel Axtens <dja@axtens.net>
The root cause of this performance issue was not the use of the
JSONRenderer but rather the population of filter forms. The latter is
now disabled, meaning we can start using the original renderer.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Cc: Daniel Axtens <dja@axtens.net>
Acked-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Thu, 10 May 2018 14:45:05 +0000 (15:45 +0100)]
REST: Resolve performance issues with '/events' web view
The dropdown select-based filters in the web view of the REST API have
stung us a few times. In this case, populating these filters for the
'/events' endpoint results in a huge query that hammers the database and
results in seriously laggy responses.
The root cause of this performance issue was erroneously identified as
an issue with the JSON renderer, so that particular patch can now be
reverted. This will be done separately.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Cc: Daniel Axtens <dja@axtens.net>
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Wed, 11 Apr 2018 16:13:38 +0000 (17:13 +0100)]
REST: Use DRF-specific filterset
This variant of 'FilterSet' will convert a 'django.forms.ValidationError',
which wouldn't be handled by DRF, to a
'rest_framework.exceptions.ValidationError', which would be.
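The conversion can be sketched in plain Python (stand-in exception classes; the real code uses django-filter's DRF-aware FilterSet):

```python
class FormValidationError(Exception):
    """Stands in for django.forms.ValidationError."""

class APIValidationError(Exception):
    """Stands in for rest_framework.exceptions.ValidationError."""

def clean_filter_value(raw):
    # Re-raise form-layer validation errors as API-layer ones so
    # the framework can turn them into a 400 rather than a 500.
    try:
        if not raw.isdigit():
            raise FormValidationError('Enter a whole number.')
        return int(raw)
    except FormValidationError as exc:
        raise APIValidationError(str(exc))
```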
Signed-off-by: Stephen Finucane <stephen@that.guru>
[dja: commit message]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Stephen Finucane [Wed, 11 Apr 2018 16:13:36 +0000 (17:13 +0100)]
REST: Rename Filter -> FilterSet
FilterSets are to Forms as Filters are to Fields: the former is
made up of the latter. We have a FilterSet for each resource that we
wish to support filtering on (i.e. all of them).
Rename our "Filters" to the more appropriate FilterSets.
The old name was confusing and will conflict with some forthcoming changes.
Signed-off-by: Stephen Finucane <stephen@that.guru>
[dja: commit message]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stephen Finucane <stephen@that.guru>
Fixes: 81e6f8e4 ("docs: Prepare for 2.1.0-rc1")
Cc: Daniel Axtens <dja@axtens.net>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Yuri Volchkov [Mon, 7 May 2018 15:57:55 +0000 (01:57 +1000)]
skip original Content-Transfer-Encoding for mbox
In commit 01b9cbb9, all original mail headers are copied into the
resulting mbox file. This means that some headers are going to be
present twice in the generated mbox. That is fine unless the original
email arrived in base64 encoding.
Apparently git relies on the last Content-Transfer-Encoding key, and
since the downloaded patch's actual encoding is '7bit', git fails to
apply it with the message 'Patch is empty'.
Since Patchwork adds a proper 'Content-Transfer-Encoding' anyway,
let's skip this field while copying headers from the original mail.
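The skip can be illustrated with the stdlib email module (a sketch of the copying logic, not the actual view code):

```python
from email.message import Message

SKIPPED = ('content-transfer-encoding',)

def copy_headers(orig, mbox_msg):
    # Copy original headers, but let Patchwork's own CTE win so git
    # doesn't trust a stale 'base64' value from the incoming mail.
    for key, value in orig.items():
        if key.lower() in SKIPPED:
            continue
        mbox_msg[key] = value
    return mbox_msg

orig = Message()
orig['Subject'] = '[PATCH] example'
orig['Content-Transfer-Encoding'] = 'base64'

out = Message()
out['Content-Transfer-Encoding'] = '7bit'  # added by Patchwork
copy_headers(orig, out)
```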
Explicitly distinguish between comments on patches and cover letters
reverse() gets confused when the same view name and kwargs are passed to
it, ignoring which endpoint the request originated from. Fix this by
using different view names for the cover letter and patch comment views.
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Daniel Axtens [Mon, 30 Apr 2018 15:44:59 +0000 (01:44 +1000)]
parsemail: Clarify exit codes
jk reports that the patchwork error codes are really unhelpful for
correct integration with an MDA. In particular they make sorting out
failures into a separate queue very difficult. Make this better and
clearer: only return 1 on a genuinely unexpected case that requires
administrator intervention.
Update the comment for parse_mail regarding return values and exceptions
to line up with how the function actually works and how we use the
results.
Update the tests. None of the existing tests should exit 1; they're
all 'expected' failures: unknown project etc. Also we removed the
exit(0) from the success path, so stop expecting that exception to
be raised.
Add a test for duplicates. That should also succeed without raising
an exception: dups are part of life.
Update parsearchive to deal with the fact that we can no longer
differentiate duplicates.
Reported-by: Jeremy Kerr <jk@ozlabs.org>
Fixes: #171
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Tue, 17 Apr 2018 08:58:00 +0000 (09:58 +0100)]
docker-compose: Switch to 3.0 syntax
Ubuntu 18.04 (Bionic Beaver) provides 'docker-compose' 1.17.1 [1] at
release, which supports the 3.0 syntax [2]. Using this allows some users
(me) to resolve a long-standing issue caused by a UID that's not 1000.
Stephen Finucane [Tue, 17 Apr 2018 08:50:44 +0000 (09:50 +0100)]
tests: Remove Selenium tests
These were added quite some time ago in order to allow some level of UI
testing. However, I've personally never used them, they're not used by
the CI, and no one has shown any desire to extend them in their time
here. It is time to bid these tests adieu.
Removing these allows us to remove a whole load of wiring that existed
just to enable these. Some of this, like the '--quick-tox' option for
the Dockerfile, is retained so we don't need to use different commands
for various versions of Patchwork, but the majority is just stripped
out.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Cc: Daniel Axtens <dja@axtens.net>
While the code on our side returns all (key, value) pairs for email
headers, Django REST framework probably uses dictionaries behind the
scenes. This means that when multiple headers share the same key (e.g.
'Received', which is a totally valid and common situation), only one of
them is visible in the REST API.
Let's hack around this by returning a list of values in case the key is
present multiple times.
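The hack can be sketched like this (an assumed shape, not the serializer's exact code):

```python
def headers_to_dict(pairs):
    # Collapse (key, value) pairs into a dict, but keep every value
    # when a key such as 'Received' occurs more than once.
    result = {}
    for key, value in pairs:
        if key in result:
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result
```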
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Tue, 17 Apr 2018 08:37:44 +0000 (09:37 +0100)]
tests: Replace incorrect tests
In commit 683792d1, the 'test_api.py' was split into multiple
'api/test_xyz.py' files. As part of this change, the tests for cover
letter were mistakenly included in place of tests for checks. Correct
this oversight.
Stephen Finucane [Wed, 11 Apr 2018 16:36:18 +0000 (17:36 +0100)]
tox: Add 'docs' to default environments
I'd simply run 'tox' (via docker) to validate some previous patches.
Sadly that didn't catch a release note issue. Make sure this doesn't
happen again by always running 'docs'.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Daniel Axtens <dja@axtens.net>
The bundle tests broke after the subject in the mbox was changed from the
parsed version to the original one, because the tests checked for the
presence of the patch's name in the response. Fixing this turned out to
be a bit tricky since the tests check the mbox attachment and the HTML
response separately, so we need a string that is present in both
(the intuitive idea of checking X-Patchwork-Id won't work well).
Add the patch's name to the content of the test patch so we can continue
testing things the same way, checking for the presence of the patch's
name.
Also add a release note about the inclusion of the original headers.
Reverts: b2a25342 ("Use parsed subject for mboxes")
Fixes: 01b9cbb9 ("Include all email headers in mboxes")
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sun, 25 Mar 2018 18:28:21 +0000 (19:28 +0100)]
docs: Add information on REST API versioning
This isn't too prescriptive, given that so far we've only dealt with
adding new fields. However, it should serve as a guide to alert devs
that this stuff exists and should be a concern.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Stephen Finucane [Sun, 25 Mar 2018 18:28:20 +0000 (19:28 +0100)]
REST: Use versioning for modified responses
This ensures clients are getting a consistent response if they request
the old version of the API. We do this by way of extensions to the
'HyperlinkedModelSerializer' class rather than duplicating the
serializers as it results in far less duplication. This approach won't
work for a MAJOR version bump but, all going well, it will be a while
before we have to deal with one of these.
The only two fields added since API 1.0 was released, 'cover.mbox' and
'project.subject_match', are handled accordingly.
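The effect on responses can be sketched as (hypothetical helper and field map; the real change extends HyperlinkedModelSerializer):

```python
# Fields keyed by the API version that introduced them. String
# comparison is good enough while versions stay single-digit.
NEW_FIELDS = {'1.1': ('mbox', 'subject_match')}

def filter_fields(data, version):
    # Strip fields introduced after the requested API version so a
    # v1.0 client keeps seeing the v1.0 response shape.
    removed = set()
    for introduced_in, fields in NEW_FIELDS.items():
        if version < introduced_in:
            removed.update(fields)
    return {k: v for k, v in data.items() if k not in removed}
```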
Signed-off-by: Stephen Finucane <stephen@that.guru>
With a recent change, we started using the original subject header
instead of the one we had already cleaned up at the parsing stage.
Revert this aspect of that change.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Fixes: 01b9cbb9 ("Include all email headers in mboxes")
Solves issue #165 (Exported mboxes should include In-Reply-To,
References, etc headers). Instead of including only a few chosen ones,
all received headers are added to mboxes.
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
The docs suggested accounting for the git prefixes (a/, b/) using e.g.
?/patchwork/views/*. My rules didn't work, so I tried a bare path
(patchwork/views/*) instead. Looking at the code, the prefix really is
stripped away (filename = '/'.join(filename.split('/')[1:])). Fix the
documentation to reflect what is really happening.
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
[dja: see 7bb0ebd78ff7 ("parser: Add patch_get_filenames()")]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Tue, 20 Mar 2018 22:34:15 +0000 (09:34 +1100)]
docker: set timezone to Australia/Canberra
The tzinfo package isn't installed in docker, which makes the
default timezone UTC. This is unfortunate: the Django TZ in
settings/base.py is Australia/Canberra, and having a non-UTC
TZ is good for exposing faulty assumptions about what is and
isn't UTC.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Daniel Axtens [Mon, 12 Mar 2018 12:08:54 +0000 (23:08 +1100)]
api: Only provide JSON version of events list
Something is very, very slow in the d-r-f browsable API events renderer.
In my MySQL test (~33k patches), the CPU time to render the events list
is ~11s, and the time taken by SQL queries is only ~3s. If the JSON
renderer is used, that drops to 0.2s for the entire page (because less
CPU is used, and - for some as yet unknown reason - a *very* expensive
db query is dropped.)
In my PostgreSQL test (~100k patches), the results are even more stark:
30s of CPU time and 0.2s of DB time goes to 0.25s for the entire page.
Something is seriously, seriously wrong with whatever d-r-f is doing.
So, simply render the event list as unlinked JSON for now.
There are a few followups we should do, but this is an important start -
no-one should be able to DoS a patchwork server by just enumerating the
events!
In particular, we should find out:
- why postgres and mysql behaviour is so different.
- what on earth d-r-f is doing that makes rendering the pretty-printed
version so incredibly slow.
Daniel Axtens [Mon, 12 Mar 2018 11:07:15 +0000 (22:07 +1100)]
api: EventList: change select_related() to prefetch_related()
select_related() creates a single giant query that JOINs the required
tables together in the DB. prefetch_related() does a similar thing,
but at the Django layer - for all referenced models, it makes a
separate query to the DB to fetch them.
This massively, massively simplifies the job the DB has to do:
instead of creating a massive, sparse results table with many
columns, we do 1 query for the events, and then query for only
patches/cover letters/series/projects etc referenced in those 30
events.
Tested with cURL + JSON renderer + Postgres w/ ~100k patches,
request time went from 1.5s to 0.25s, a 6x speedup.
Tested with cURL + JSON renderer + MySQL w/ ~33k patches,
request time went from ~2.2s to ~0.20s, an ~11x speedup.
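The difference in query shape can be demonstrated with an in-memory SQLite database (illustration only; the real change is a one-word swap on the Django queryset, and the table/column names below are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE patch (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, patch_id INTEGER);
    INSERT INTO patch VALUES (1, 'fix foo'), (2, 'fix bar');
    INSERT INTO event VALUES (10, 1), (11, 1), (12, 2);
""")

# select_related(): one wide JOINed result set built by the DB.
joined = conn.execute(
    "SELECT event.id, patch.name FROM event "
    "JOIN patch ON patch.id = event.patch_id ORDER BY event.id").fetchall()

# prefetch_related(): fetch the events, then batch-fetch only the
# referenced patches with a second query, joining in Python.
events = conn.execute("SELECT id, patch_id FROM event ORDER BY id").fetchall()
patch_ids = {pid for _, pid in events}
rows = conn.execute(
    "SELECT id, name FROM patch WHERE id IN (%s)" %
    ','.join('?' * len(patch_ids)), sorted(patch_ids)).fetchall()
patches = dict(rows)
prefetched = [(eid, patches[pid]) for eid, pid in events]
```

Both strategies produce identical rows; the second simply trades one sparse JOIN for a handful of narrow queries.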
Daniel Axtens [Thu, 8 Mar 2018 01:28:22 +0000 (12:28 +1100)]
Fix slow Patch counting query
Stephen Rothwell noticed (way back in September - sorry Stephen!) that
the following query is really slow on OzLabs:
SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
INNER JOIN "patchwork_submission" ON
("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
WHERE ("patchwork_submission"."project_id" = 14 AND
"patchwork_patch"."state_id" IN
(SELECT U0."id" AS Col1 FROM "patchwork_state" U0
WHERE U0."action_required" = true
ORDER BY U0."ordering" ASC));
I think this is really slow because we have to join the patch and
submission tables to get the project id, which we need to filter the
patches.
Duplicate the project id in the patch table itself, which allows us to
avoid the JOIN.
The new query reads as:
SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
WHERE ("patchwork_patch"."patch_project_id" = 1 AND
"patchwork_patch"."state_id" IN
(SELECT U0."id" AS Col1 FROM "patchwork_state" U0
WHERE U0."action_required" = true
ORDER BY U0."ordering" ASC));
Very simple testing on a small, artificial Postgres instance (3
projects, 102711 patches) shows speed gains of ~1.5-5x for this
query. Looking at Postgres' cost estimates (EXPLAIN) for the first
query vs the second, we see a ~1.75x improvement there too.
I suspect the gains will be bigger on OzLabs.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Patchwork saves patches, comments, etc. with the UTC timezone and reports
this time when opening the patch details. However, internally generated
items such as events are reported with the instance's local time.
There's nothing wrong with that, and making PW timezone-aware would add
useless complexity, but in world-wide collaboration a lot of confusion
may arise as the timezone is not reported at all. The instance's local
time might be very different from the local time of a CI system
integrating with PW, which is different from the local time of the
person dealing with it, and so on.
Use UTC everywhere by default, instead of UTC for submissions and the
local timezone for internally generated events (which is not reported).
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
[dja:
- squash 2 patches: https://patchwork.ozlabs.org/patch/876744/
https://patchwork.ozlabs.org/patch/877815/
- minor changes to both patches - rejig order of migrations and
adjust wording: "happened sooner" -> "happened earlier"]
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Mon, 19 Feb 2018 14:34:15 +0000 (01:34 +1100)]
parser: don't fail on multiple SeriesReferences
Parallel parsing would occasionally fail with:
patchwork.models.MultipleObjectsReturned: get() returned more than one SeriesReference -- it returned 2!
I think these happen when different processes parse e.g. 1/3 and 2/3
simultaneously: both will have a reference to 1/3; in the case of 1/3
it will be the msgid, and in the case of 2/3 it will be in References.
So when we come to parse 3/3, .get() finds 2 and throws the exception.
This does not fix the creation of multiple series references; it
just causes them to be ignored. We still have serious race conditions
with series creation, but I don't yet have clear answers for them.
With this patch, they will at least not stop patches from being
processed - they'll just lead to wonky series, which we already have.
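The tolerant lookup can be sketched in plain Python (dicts standing in for SeriesReference rows):

```python
def find_series(refs, msgid):
    # A strict get() raises when racing parsers have created two
    # references for the same msgid; instead, take the first match
    # and ignore any duplicates.
    matches = [r for r in refs if r['msgid'] == msgid]
    return matches[0]['series'] if matches else None


refs = [
    {'msgid': '<1-3@example.com>', 'series': 'series-A'},
    {'msgid': '<1-3@example.com>', 'series': 'series-A'},  # racy duplicate
]
```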
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Sun, 18 Feb 2018 01:57:03 +0000 (12:57 +1100)]
parser: use Patch.objects.create instead of save()
Attempts to do parallel parsing with MySQL threw the following errors:
_mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
Looking at the code, it was thrown when we created a patch like this:
patch = Patch(...)
patch.save()
The SQL statements that were being generated were weird:
UPDATE "patchwork_patch" SET ...
INSERT INTO "patchwork_patch" (...) VALUES (...)
As far as I can tell, the UPDATE could never work, because it was
trying to update a patch that didn't exist yet. My hypothesis is
that Django somehow didn't quite 'get' that because of the backend
complexity of the Patch model, so it tried to do an UPDATE, failed,
and then tried an INSERT.
Change the code to use Patch.objects.create, which makes the UPDATEs
and the weird MySQL errors go away.
Also move it up a bit earlier in the process so that if things go wrong
later at least we've committed the patch to the db.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Sun, 18 Feb 2018 08:06:23 +0000 (19:06 +1100)]
parser: avoid an unnecessary UPDATE of Person
Analysis of SQL statements showed that when parsing an email, the row
for the Person who sent the email was always getting updated. This is
because the test for updating it only checks whether the incoming mail
has *a* name attached to the email address, not whether it has a new
name. Django is not smart enough to figure that out, and so it
unconditionally UPDATEs the model when asked to save.
Give it a hand - only update the model and save it if the new name is
in fact different.
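A plain-Python sketch of the guard (a dict and a callback stand in for the Person model and its save()):

```python
def update_name(person, new_name, save):
    # Only touch the model when the incoming mail carries a name
    # that actually differs; previously *any* name triggered a save.
    if new_name and new_name != person['name']:
        person['name'] = new_name
        save(person)
        return True
    return False


saves = []
person = {'name': 'Old Name'}
```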
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Sat, 17 Feb 2018 01:54:51 +0000 (12:54 +1100)]
parser: close a TOCTTOU bug on Person creation
find_author looks up a person by email, and if they do not exist,
creates a Person model, which may be saved later if the message
contains something valuable.
Multiple simultaneous processes can race here: both can do the SELECT,
find there is no Person, and create the model. One will succeed in
saving, the other will get an IntegrityError.
Reduce the window by making find_author into get_or_create_author, and
plumb that through. (Remove a test that specifically required find_author
*not* to create.)
More importantly, cover the case where we lose the race, by using
get_or_create which handles the race case, catching the IntegrityError
internally and fetching the winning Person model.
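The shape of the fix, sketched with a lock-protected dict standing in for the database:

```python
import threading

people = {}
_lock = threading.Lock()

def get_or_create_author(email, name):
    # Like Django's get_or_create: the lookup and the insert happen
    # as one atomic step, so the loser of a race gets the winner's
    # row rather than an IntegrityError.
    with _lock:
        created = email not in people
        if created:
            people[email] = {'email': email, 'name': name}
        return people[email], created
```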
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
[dja: post review cleanup of now-unused import]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Daniel Axtens [Mon, 19 Feb 2018 14:33:59 +0000 (01:33 +1100)]
parser: Handle even more exotically broken headers
An archive of the Ubuntu kernel team mailing list contains a
fascinating email that causes the following parse error:
email.errors.HeaderParseError: header value appears to contain an embedded header:
'4Mf^tnii7k\\_EnR5aobBm6Di[DZ9@AX1wJ"okBdX-UoJ>:SRn]c6DDU"qUIwfs98vF>...
The broken bit seems related to a UTF-8 quoted-printable encoded
section, and appears to come from an internal attempt to break it over
multiple lines; here's a snippet from the error message:
'\n\t=?utf-8?q?Tnf?=\n'
but interestingly, the header itself does not contain the newlines, so
clearly something quite weird is happening behind the scenes!
This only throws on header.encode(): the value actually makes it through
sanitise_header and into find_headers before throwing the assertion.
So, try to encode in sanitise_header as a final step.
Also, fix a hilarious* python bug that this exposes: whitespace-only
headers cause an index error!
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Sometimes, multiple projects reside on the same mailing list. So far,
Patchwork only allowed a single project per mailing list, which made it
impossible for such projects to use Patchwork (unless they resorted to
dirty hacks).
Add a new property `subject_match` to projects and filter on the
(list_id, subject_match) pair instead of solely on list_id. The instance
admin can specify a regex on a per-project basis when the project is
created.
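The routing can be sketched as (hypothetical project data; an empty regex keeps the old catch-all behaviour):

```python
import re

projects = [
    {'name': 'subproject-a', 'list_id': 'dev.example.com',
     'subject_match': r'\[sub-a'},
    {'name': 'mainline', 'list_id': 'dev.example.com',
     'subject_match': ''},   # '' matches everything: the fallback
]

def find_project(list_id, subject):
    # First project whose (list_id, subject_match) pair matches wins.
    for project in projects:
        if project['list_id'] != list_id:
            continue
        if re.search(project['subject_match'], subject):
            return project['name']
    return None
```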
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Stephen Finucane <stephen@that.guru>