]>
Commit | Line | Data |
---|---|---|
bdae2828 GKH |
1 | From ebded02788b5d7c7600f8cff26ae07896d568649 Mon Sep 17 00:00:00 2001 |
2 | From: Mel Gorman <mgorman@techsingularity.net> | |
3 | Date: Tue, 15 Mar 2016 14:55:39 -0700 | |
4 | Subject: mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read | |
5 | ||
6 | From: Mel Gorman <mgorman@techsingularity.net> | |
7 | ||
8 | commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream. | |
9 | ||
10 | In the generic read paths the kernel looks up a page in the page cache | |
11 | and if it's up to date, it is used. If not, the page lock is acquired | |
12 | to wait for IO to complete and then check the page. If multiple | |
13 | processes are waiting on IO, they all serialise against the lock and | |
14 | duplicate the checks. This is unnecessary. | |
15 | ||
16 | The page lock in itself does not give any guarantees to the callers | |
17 | about the page state as it can be immediately truncated or reclaimed | |
18 | after the page is unlocked. It's sufficient to wait_on_page_locked and | |
19 | then continue if the page is up to date on wakeup. | |
20 | ||
21 | It is possible that a truncated but up-to-date page is returned but the | |
22 | reference taken during read prevents it disappearing underneath the | |
23 | caller and the data is still valid if PageUptodate. | |
24 | ||
25 | The overall impact is small as even if processes serialise on the lock, | |
26 | the lock section is tiny once the IO is complete. Profiles indicated | |
27 | that unlock_page and friends are generally a tiny portion of a | |
28 | read-intensive workload. An artificial test was created that had | |
29 | instances of dd access a cache-cold file on an ext4 filesystem and | |
30 | measure how long the read took. | |
31 | ||
32 | paralleldd | |
33 | 4.4.0 4.4.0 | |
34 | vanilla avoidlock | |
35 | Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%) | |
36 | Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%) | |
37 | Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%) | |
38 | Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%) | |
39 | Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%) | |
40 | Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%) | |
41 | Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%) | |
42 | Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%) | |
43 | Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%) | |
44 | Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%) | |
45 | ||
46 | The impact is small but intuitively, it makes sense to avoid unnecessary | |
47 | calls to lock_page. | |
48 | ||
49 | Signed-off-by: Mel Gorman <mgorman@techsingularity.net> | |
50 | Reviewed-by: Jan Kara <jack@suse.cz> | |
51 | Cc: Hugh Dickins <hughd@google.com> | |
52 | Signed-off-by: Andrew Morton <akpm@linux-foundation.org> | |
53 | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | |
54 | Signed-off-by: Mel Gorman <mgorman@suse.de> | |
55 | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | |
56 | ||
57 | --- | |
58 | mm/filemap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ | |
59 | 1 file changed, 49 insertions(+) | |
60 | ||
61 | --- a/mm/filemap.c | |
62 | +++ b/mm/filemap.c | |
63 | @@ -1581,6 +1581,15 @@ find_page: | |
64 | index, last_index - index); | |
65 | } | |
66 | if (!PageUptodate(page)) { | |
67 | + /* | |
68 | + * See comment in do_read_cache_page on why | |
69 | + * wait_on_page_locked is used to avoid unnecessarily | |
70 | + * serialisations and why it's safe. | |
71 | + */ | |
72 | + wait_on_page_locked_killable(page); | |
73 | + if (PageUptodate(page)) | |
74 | + goto page_ok; | |
75 | + | |
76 | if (inode->i_blkbits == PAGE_CACHE_SHIFT || | |
77 | !mapping->a_ops->is_partially_uptodate) | |
78 | goto page_not_up_to_date; | |
79 | @@ -2253,12 +2262,52 @@ filler: | |
80 | if (PageUptodate(page)) | |
81 | goto out; | |
82 | ||
83 | + /* | |
84 | + * Page is not up to date and may be locked due one of the following | |
85 | + * case a: Page is being filled and the page lock is held | |
86 | + * case b: Read/write error clearing the page uptodate status | |
87 | + * case c: Truncation in progress (page locked) | |
88 | + * case d: Reclaim in progress | |
89 | + * | |
90 | + * Case a, the page will be up to date when the page is unlocked. | |
91 | + * There is no need to serialise on the page lock here as the page | |
92 | + * is pinned so the lock gives no additional protection. Even if the | |
93 | + * the page is truncated, the data is still valid if PageUptodate as | |
94 | + * it's a race vs truncate race. | |
95 | + * Case b, the page will not be up to date | |
96 | + * Case c, the page may be truncated but in itself, the data may still | |
97 | + * be valid after IO completes as it's a read vs truncate race. The | |
98 | + * operation must restart if the page is not uptodate on unlock but | |
99 | + * otherwise serialising on page lock to stabilise the mapping gives | |
100 | + * no additional guarantees to the caller as the page lock is | |
101 | + * released before return. | |
102 | + * Case d, similar to truncation. If reclaim holds the page lock, it | |
103 | + * will be a race with remove_mapping that determines if the mapping | |
104 | + * is valid on unlock but otherwise the data is valid and there is | |
105 | + * no need to serialise with page lock. | |
106 | + * | |
107 | + * As the page lock gives no additional guarantee, we optimistically | |
108 | + * wait on the page to be unlocked and check if it's up to date and | |
109 | + * use the page if it is. Otherwise, the page lock is required to | |
110 | + * distinguish between the different cases. The motivation is that we | |
111 | + * avoid spurious serialisations and wakeups when multiple processes | |
112 | + * wait on the same page for IO to complete. | |
113 | + */ | |
114 | + wait_on_page_locked(page); | |
115 | + if (PageUptodate(page)) | |
116 | + goto out; | |
117 | + | |
118 | + /* Distinguish between all the cases under the safety of the lock */ | |
119 | lock_page(page); | |
120 | + | |
121 | + /* Case c or d, restart the operation */ | |
122 | if (!page->mapping) { | |
123 | unlock_page(page); | |
124 | page_cache_release(page); | |
125 | goto repeat; | |
126 | } | |
127 | + | |
128 | + /* Someone else locked and filled the page in a very small window */ | |
129 | if (PageUptodate(page)) { | |
130 | unlock_page(page); | |
131 | goto out; |