diff: improve scaling of filenames in diffstat to handle UTF-8 chars
author LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
Fri, 16 Jan 2026 00:05:03 +0000 (01:05 +0100)
committer Junio C Hamano <gitster@pobox.com>
Fri, 16 Jan 2026 16:24:35 +0000 (08:24 -0800)
The `show_stats()` function tries to scale the filenames in the diffstat to
ensure they don't exceed the given `name-width`. It does so by calculating
the "display width" of the characters to be dropped, but then advances the
filename pointer by that number of bytes.

However, the "display width" of a character is not always equal to its byte
count. As a result, filenames that contain multi-byte UTF-8 characters can
exceed the given `name-width`, and the pointer frequently lands in the
middle of a character, leaving stray bytes in the output.

The following is an example of the issue, where the 2 files are "HelloHi" and
"Hello你好", and `name-width=6`:

    ...oHi | 0
    ...<BD><A0>好 | 0
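
The second line can be reproduced by hand. The sketch below is only an
illustration of the old arithmetic, not code from diff.c: `old_truncate()`
is a hypothetical helper, and the display width is passed in by the caller
(9 columns for "Hello你好", assuming each CJK character occupies 2 columns).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of diff.c: mimics the old pointer
 * arithmetic, which treated a number of columns as a number of bytes.
 * The display width is supplied by the caller to keep the sketch small.
 */
static const char *old_truncate(const char *name, int display_width,
				int name_width)
{
	int len = name_width - 3;        /* columns left after "..." */
	int skip = display_width - len;  /* columns that must be dropped */

	assert(skip >= 0 && (size_t)skip <= strlen(name));
	return name + skip;              /* columns misused as a byte offset */
}
```

"Hello你好" is 11 bytes long (5 ASCII bytes plus two 3-byte sequences,
E4 BD A0 and E5 A5 BD), so skipping 9 - 3 = 6 bytes lands on the second
byte of "你" and leaves BD A0 in front of "好".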

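Advance the filename pointer by the actual number of bytes of the
characters being dropped, rather than by their display width. Do this one
character at a time with the `utf8_width()` function, which moves the
pointer past a whole character and returns that character's display width.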

Clamp `len` so that it is never less than 0 (which happens if the given
`name-width` is 2 or less); otherwise an infinite loop is entered.
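
The new loop can be tried outside of Git with the minimal sketch below. It
is not the code in diff.c: `char_width()` is a hypothetical stand-in for
`utf8_width()` that assumes valid UTF-8 input and treats only the CJK
Unified Ideographs block (U+4E00..U+9FFF) as 2 columns wide, and
`drop_leading()` is a hypothetical wrapper around the clamp and the loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for utf8_width(): advance *s past one character
 * and return its display width. Assumes valid UTF-8; only U+4E00..U+9FFF
 * is treated as wide, which is far cruder than a real width table.
 */
static int char_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned int cp;
	int n;

	if (p[0] < 0x80) {
		cp = p[0];
		n = 1;
	} else if (p[0] < 0xE0) {
		cp = p[0] & 0x1F;
		n = 2;
	} else if (p[0] < 0xF0) {
		cp = p[0] & 0x0F;
		n = 3;
	} else {
		cp = p[0] & 0x07;
		n = 4;
	}
	for (int i = 1; i < n; i++)
		cp = (cp << 6) | (p[i] & 0x3F);

	*s += n; /* move by bytes, not by columns */
	return (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;
}

/*
 * Hypothetical wrapper: drop leading characters until the remaining
 * display width fits in name_width minus the 3 columns used by "...".
 */
static const char *drop_leading(const char *name, int name_len,
				int name_width)
{
	int len = name_width - 3;

	assert(name != NULL);
	if (len < 0)
		len = 0; /* without the clamp, name_len could never reach len */

	while (name_len > len)
		name_len -= char_width(&name);

	return name;
}
```

Under these assumptions, `drop_leading("Hello你好", 9, 6)` returns a
pointer to "好" and `drop_leading("HelloHi", 7, 6)` returns a pointer to
"oHi"; with a `name_width` of 2 the whole name is consumed and the loop
stops at the terminating NUL.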

Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.c

diff --git a/diff.c b/diff.c
index a1961526c0dab1af182d4f400468bf5617f5175c..86fdf4d8d738fd29c5751abbc9cd23d616bcd370 100644
--- a/diff.c
+++ b/diff.c
@@ -2823,17 +2823,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
                        char *slash;
                        prefix = "...";
                        len -= 3;
-                       /*
-                        * NEEDSWORK: (name_len - len) counts the display
-                        * width, which would be shorter than the byte
-                        * length of the corresponding substring.
-                        * Advancing "name" by that number of bytes does
-                        * *NOT* skip over that many columns, so it is
-                        * very likely that chomping the pathname at the
-                        * slash we will find starting from "name" will
-                        * leave the resulting string still too long.
-                        */
-                       name += name_len - len;
+                       if (len < 0)
+                               len = 0;
+
+                       while (name_len > len)
+                               name_len -= utf8_width((const char**)&name, NULL);
+
                        slash = strchr(name, '/');
                        if (slash)
                                name = slash;