From: Patrick Steinhardt
Date: Thu, 6 Nov 2025 08:52:54 +0000 (+0100)
Subject: object: fix performance regression when peeling tags
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=7048e74609fbef2c91bfa3a80e3a9c4fc0ac04c9;p=thirdparty%2Fgit.git

object: fix performance regression when peeling tags

Our Bencher dashboards [1] have recently alerted us about a bunch of
performance regressions when writing references, specifically with the
reftable backend. There is a 3x regression when writing many refs with
preexisting refs in the reftable format, and a 10x regression when
migrating refs between backends in either of the formats.

Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled
object IDs for invalid tags, 2025-10-23). The gist of that commit is
that we may end up storing peeled objects in both reftables and
packed-refs for corrupted tags, where the claimed tagged object type
differs from the actual tagged object type. This then causes us to
create the `struct object *` with the wrong type as well, and obviously
nothing good comes out of that.

The fix for this issue was to introduce a new flag to `peel_object()`
that causes us to verify the tagged object's type before writing it
into the refdb -- if the tag is corrupt, we skip writing the peeled
value. To verify whether the peeled value is correct we have to look up
the object type via the ODB and compare the actual type with the
claimed type, and that additional object lookup is costly.

This also explains why we see the regression only when writing refs
with the reftable backend, but with both backends when migrating refs:

  - The reftable backend stores peeled values in the new table
    immediately, so it has to try to peel every ref it is about to
    write in the transaction. The performance regression is thus
    visible for all writes.

  - The files backend only stores peeled values when writing the
    packed-refs file, so it doesn't hit the performance regression for
    normal writes. But ref migrations write all new values into the
    packed-refs file directly, which is why we see the regression for
    both backends there.

Taking a step back though reveals an oddity in the new verification
logic: we not only verify the _tagged_ object's type, but also the type
of the tag itself. This isn't needed, as we only hit the bug with
corrupt tags that claim an invalid type for the tagged object. The
consequence is that we started to look up the target object of every
single reference we're about to write, regardless of whether it is a
tag at all, and that is of course quite costly.

Fix the issue by only verifying the type of the tagged object. This
means that we still take a performance hit for actual tags, but that
only happens for writes anyway, and I'd claim it is preferable not to
store corrupted data in the refdb rather than to be fast here. Rename
the flag accordingly to clarify that we only verify the tagged object's
type.
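For reference, the caller pattern in the ref backends then looks
roughly like the following minimal sketch, modeled on the packed-refs
and reftable hunks below and assuming it lives inside the Git tree.
`maybe_store_peeled()` and `store_peeled_value()` are hypothetical
helpers used purely for illustration, not functions that exist in Git:

  #include "object.h"

  /* Hypothetical backend-specific write, for illustration only. */
  static void store_peeled_value(const struct object_id *peeled);

  static void maybe_store_peeled(struct repository *repo,
                                 const struct object_id *oid)
  {
          struct object_id peeled;
          int peel_error;

          /*
           * Only the tagged object's type gets verified: refs that do
           * not point at a tag are peeled without any extra ODB lookup,
           * while corrupt tags make peel_object() return PEEL_INVALID,
           * in which case no peeled value is written to the refdb.
           */
          peel_error = peel_object(repo, oid, &peeled,
                                   PEEL_OBJECT_VERIFY_TAGGED_OBJECT_TYPE);
          if (!peel_error)
                  store_peeled_value(&peeled);
  }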
This fix brings performance back to previous levels:

  Benchmark 1: baseline
    Time (mean ± σ):      46.0 ms ±  0.4 ms    [User: 40.0 ms, System: 5.7 ms]
    Range (min … max):    45.0 ms … 47.1 ms    54 runs

  Benchmark 2: regression
    Time (mean ± σ):     140.2 ms ±  1.3 ms    [User: 77.5 ms, System: 60.5 ms]
    Range (min … max):   138.0 ms … 142.7 ms    20 runs

  Benchmark 3: fix
    Time (mean ± σ):      46.2 ms ±  0.4 ms    [User: 40.2 ms, System: 5.7 ms]
    Range (min … max):    45.0 ms … 47.3 ms    55 runs

  Summary
    update-ref: baseline
      1.00 ± 0.01 times faster than fix
      3.05 ± 0.04 times faster than regression

[1]: https://bencher.dev/perf/git/plots

Signed-off-by: Patrick Steinhardt
Signed-off-by: Junio C Hamano
---

diff --git a/object.c b/object.c
index e72b0ed436..b08fc7a163 100644
--- a/object.c
+++ b/object.c
@@ -214,7 +214,7 @@ enum peel_status peel_object(struct repository *r,
 {
 	struct object *o = lookup_unknown_object(r, name);
 
-	if (o->type == OBJ_NONE || flags & PEEL_OBJECT_VERIFY_OBJECT_TYPE) {
+	if (o->type == OBJ_NONE) {
 		int type = odb_read_object_info(r->objects, name, NULL);
 		if (type < 0 || !object_as_type(o, type, 0))
 			return PEEL_INVALID;
@@ -228,7 +228,7 @@ enum peel_status peel_object(struct repository *r,
 
 	if (o && o->type == OBJ_TAG && ((struct tag *)o)->tagged) {
 		o = ((struct tag *)o)->tagged;
-		if (flags & PEEL_OBJECT_VERIFY_OBJECT_TYPE) {
+		if (flags & PEEL_OBJECT_VERIFY_TAGGED_OBJECT_TYPE) {
 			int type = odb_read_object_info(r->objects, &o->oid, NULL);
 			if (type < 0 || !object_as_type(o, type, 0))
 				return PEEL_INVALID;
diff --git a/object.h b/object.h
index 1499f63d50..e9baade1e0 100644
--- a/object.h
+++ b/object.h
@@ -289,13 +289,13 @@ enum peel_status {
 
 enum peel_object_flags {
 	/*
-	 * Always verify the object type, even in the case where the looked-up
-	 * object already has an object type. This can be useful when the
-	 * stored object type may be invalid. One such case is when looking up
-	 * objects via tags, where we blindly trust the object type declared by
-	 * the tag.
+	 * Always verify the object type of the tagged object, even in the case
+	 * where the looked-up object already has an object type. This can be
+	 * useful when the tagged object type may be invalid. One such case is
+	 * when looking up objects via tags, where we blindly trust the object
+	 * type declared by the tag.
 	 */
-	PEEL_OBJECT_VERIFY_OBJECT_TYPE = (1 << 0),
+	PEEL_OBJECT_VERIFY_TAGGED_OBJECT_TYPE = (1 << 0),
 };
 
 /*
diff --git a/ref-filter.c b/ref-filter.c
index d8667c569a..d7454269e8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2654,7 +2654,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		if (!is_null_oid(&ref->peeled_oid)) {
 			oidcpy(&oi_deref.oid, &ref->peeled_oid);
 		} else if (!peel_object(the_repository, &oi.oid, &oi_deref.oid,
-					PEEL_OBJECT_VERIFY_OBJECT_TYPE)) {
+					PEEL_OBJECT_VERIFY_TAGGED_OBJECT_TYPE)) {
 			/* We managed to peel the object ourselves. */
 		} else {
 			die("bad tag");
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 1ab0c50393..5aa615011a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1528,7 +1528,7 @@ static enum ref_transaction_error write_with_updates(struct packed_ref_store *re
 		} else {
 			struct object_id peeled;
 			int peel_error = peel_object(refs->base.repo, &update->new_oid,
-						     &peeled, PEEL_OBJECT_VERIFY_OBJECT_TYPE);
+						     &peeled, PEEL_OBJECT_VERIFY_TAGGED_OBJECT_TYPE);
 
 			if (write_packed_entry(out, update->refname,
 					       &update->new_oid,
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 6bbfd5618d..1ac1f6156f 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1633,7 +1633,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			ref.update_index = ts;
 
 			peel_error = peel_object(arg->refs->base.repo, &u->new_oid, &peeled,
-						 PEEL_OBJECT_VERIFY_OBJECT_TYPE);
+						 PEEL_OBJECT_VERIFY_TAGGED_OBJECT_TYPE);
 			if (!peel_error) {
 				ref.value_type = REFTABLE_REF_VAL2;
 				memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);