When a node is created via rate-new with tc-bw and a parent node,
devlink_nl_rate_set() executes the sequence of ops. It bails out on the
first failure and doesn't rollback anything. For most things that is
fine (setting some numbers), but the parent set can leak if there's
another failure after that.
That is precisely what happens when parent setting isn't the last block
in the function. After the referenced "Fixes" commit, when tc-bw fails
to be set the function bails out after having set the parent and
incremented its refcount.
There are two callers:
- devlink_nl_rate_set_doit() is fine, it just reports the error.
- but devlink_nl_rate_new_doit() frees the newly created node and leaks
the parent refcnt.
Fix that by reordering the blocks so parent setting is last and adding a
comment explaining this so future modification preserve the ordering
(hopefully).
Fixes: 566e8f108fc7 ("devlink: Extend devlink rate API with traffic classes bandwidth management")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260616110633.1449432-3-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink_rate->tx_weight = weight;
}
- nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
- if (nla_parent) {
- err = devlink_nl_rate_parent_node_set(devlink_rate, info,
- nla_parent);
+ if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
+ err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
if (err)
return err;
}
- if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
- err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
+ /* Keep parent setting last because it takes a reference. This function
+ * has no rollback, so failing after taking the ref would leak it.
+ */
+ nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
+ if (nla_parent) {
+ err = devlink_nl_rate_parent_node_set(devlink_rate, info,
+ nla_parent);
if (err)
return err;
}