Currently we leave dentry on the list until we are done with
lock_for_kill(). That guarantees that it won't have been
even scheduled for removal until we remove it from the list
and drop ->d_lock. We grab ->d_lock and rcu_read_lock()
and call lock_for_kill(). There are four possible cases:
1) lock_for_kill() has succeeded; dentry and its inode
(if any) are locked, dentry refcount is zero and we can
remove it from shrink list and feed it to shrink_kill().
2) lock_for_kill() fails since dentry has become busy.
Nothing to do, rcu_read_unlock(), remove from shrink list,
drop ->d_lock and move on.
3) lock_for_kill() fails since dentry is currently
being killed - already entered __dentry_kill(), but hasn't
reached dentry_unlist() yet. Nothing to do, we should just
do rcu_read_unlock(), remove from shrink list so that
whoever's executing __dentry_kill() would free it once they
are done, drop ->d_lock and move on - same actions as in
case (2).
4) lock_for_kill() fails since dentry has been killed
(reached dentry_unlist(), DCACHE_DENTRY_KILLED set in ->d_flags).
In that case whoever had been killing it had already seen it
on our shrink list and skipped freeing it. At that point it's
just a passive chunk of memory; rcu_read_unlock(), remove from
the list, drop ->d_lock and use dentry_free() to schedule
freeing.
While that works, there's a simpler way to do it:
* grab ->d_lock
* remove dentry from our shrink list
* if DCACHE_DENTRY_KILLED is already set, drop ->d_lock,
call dentry_free() and move on.
* otherwise grab rcu_read_lock() and call lock_for_free()
* if lock_for_kill() succeeds, feed dentry
to shrink_kill(), otherwise drop the locks and move on.
The end result is equivalent to the old variant. The only difference
arises if at the time we grab ->d_lock dentry had refcount 0 and
lock_for_kill() had failed spin_trylock() and had to drop and regain
->d_lock. Otherwise nobody can observe at which point within the
unbroken ->d_lock scope dentry had been removed from the shrink list -
all accesses to ->d_lru are under ->d_lock.
If ->d_lock had been dropped and regained, it is possible for another
thread to feed that dentry to __dentry_kill(); if it doesn't get to
dentry_unlist() before we regain ->d_lock, behaviour is still identical -
it's case (3) and by the time __dentry_kill() would've gotten around
to checking if the victim is on shrink list, it would've been already
removed from ours.
If __dentry_kill() from another thread *does* get to dentry_unlist(),
in the old variant we would have __dentry_kill() leave calling
dentry_free() to us and in the new one __dentry_kill() would've called
dentry_free() itself. Since we are under rcu_read_lock(), we are
guaranteed that actual freeing won't happen until we get around to
rcu_read_unlock(). IOW, the new variant is still safe wrt UAF, if
not for the same reason as the old one, and overall result is the same;
the only difference is which threads ends up scheduling the actual
freeing of dentry.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
dentry = list_entry(list->prev, struct dentry, d_lru);
spin_lock(&dentry->d_lock);
+ d_shrink_del(dentry);
+ if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
+ spin_unlock(&dentry->d_lock);
+ dentry_free(dentry);
+ continue;
+ }
rcu_read_lock();
if (!lock_for_kill(dentry)) {
- bool can_free;
- rcu_read_unlock();
- d_shrink_del(dentry);
- can_free = dentry->d_flags & DCACHE_DENTRY_KILLED;
spin_unlock(&dentry->d_lock);
- if (can_free)
- dentry_free(dentry);
- continue;
+ rcu_read_unlock();
+ } else {
+ shrink_kill(dentry);
}
- d_shrink_del(dentry);
- shrink_kill(dentry);
}
}
EXPORT_SYMBOL(shrink_dentry_list);