==== Benchmark setup (on GNU/Linux) ====
$ yes áááááááááááááááááááá | head -n100000 > mbc.txt
$ yes 12345678901234567890 | head -n100000 > num.txt
==== Before ====
$ time src/wc -Lm < mbc.txt
real 0m0.186s
$ time src/wc -m < mbc.txt
real 0m0.186s
$ time src/wc -Lm < num.txt
real 0m0.055s
$ time src/wc -m < num.txt
real 0m0.056s
==== After ====
$ time src/wc -Lm < mbc.txt
real 0m0.196s
$ time src/wc -m < mbc.txt
real 0m0.173s
$ time src/wc -Lm < num.txt
real 0m0.031s
$ time src/wc -m < num.txt
real 0m0.028s
* src/wc.c (wc): Only call wide-character functions like
iswprint() and wcwidth() for characters that are not is_basic(),
i.e., characters outside the ISO C "basic character set".
Also only call wcwidth() when the maximum line length (-L)
is requested.  This is especially significant on macOS, where
wcwidth() is very expensive (about 10x in tests).
* NEWS: Mention the improvement.
Suggested by Eric Fischer.
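As a rough illustration of the fast path described in the ChangeLog entry, here is a minimal C sketch (not the coreutils code) that counts characters and the widest line.  The names basic_char(), struct text_counts and count_text() are hypothetical.  basic_char() only approximates gnulib's is_basic(), and mbsinit() stands in for wc's in_shift flag.  As a simplification, tabs and other control characters get zero width here, whereas wc -L expands tabs.

```c
#define _XOPEN_SOURCE 700 /* expose wcwidth() in <wchar.h> on glibc */
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for gnulib's is_basic(): printable ASCII plus
   the ASCII whitespace controls.  The real is_basic() uses a bit table
   of the ISO C basic character set, which is somewhat smaller.  */
static bool
basic_char (unsigned char c)
{
  return (c >= 0x20 && c <= 0x7e)
         || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

struct text_counts
{
  size_t chars;     /* characters decoded; invalid bytes are skipped */
  size_t max_width; /* widest line in columns, or 0 if not requested */
};

static void
count_text (const char *p, size_t len, bool want_width,
            struct text_counts *out)
{
  mbstate_t state;
  size_t linepos = 0;

  assert (out != NULL);
  memset (&state, 0, sizeof state);
  out->chars = 0;
  out->max_width = 0;

  while (len > 0)
    {
      wchar_t wc;
      size_t n = 1;
      bool wide = true;

      if (mbsinit (&state) && basic_char ((unsigned char) *p))
        {
          /* Fast path: a basic character is a single byte that maps to
             itself, so mbrtowc() can be skipped.  */
          wc = (unsigned char) *p;
          wide = false;
        }
      else
        {
          n = mbrtowc (&wc, p, len, &state);
          if (n == (size_t) -1 || n == (size_t) -2)
            {
              /* Invalid or truncated sequence: skip one byte.  */
              memset (&state, 0, sizeof state);
              p++;
              len--;
              continue;
            }
          if (n == 0) /* a NUL byte decodes with length 0 */
            n = 1;
        }

      if (want_width)
        {
          if (wc == L'\n')
            {
              if (linepos > out->max_width)
                out->max_width = linepos;
              linepos = 0;
            }
          else if (wide)
            {
              /* Slow path: only non-basic characters reach iswprint()
                 and wcwidth().  */
              if (iswprint (wc))
                {
                  int w = wcwidth (wc);
                  if (w > 0)
                    linepos += w;
                }
            }
          else if (isprint ((unsigned char) wc))
            linepos++;
        }

      out->chars++;
      p += n;
      len -= n;
    }

  if (want_width && linepos > out->max_width)
    out->max_width = linepos;
}
```

With want_width false, this sketch calls neither wcwidth() nor iswprint(), in the spirit of the print_linelength guard; the real loop still needs iswprint() for word counting.  A caller would normally run setlocale (LC_ALL, "") first so that mbrtowc() decodes the user's multibyte locale.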
version of XFS. stat -f --format=%T now reports the file system type,
and tail -f uses inotify.
+ wc avoids redundant processing of ASCII text in multibyte locales,
+ which is especially significant on macOS.
+
* Noteworthy changes in release 8.29 (2017-12-27) [stable]
{
wchar_t wide_char;
size_t n;
+ bool wide = true;
if (!in_shift && is_basic (*p))
{
mbrtowc(). */
n = 1;
wide_char = *p;
+ wide = false;
}
else
{
n = 1;
}
}
- p += n;
- bytes_read -= n;
- chars++;
+
switch (wide_char)
{
case '\n':
in_word = false;
break;
default:
- if (iswprint (wide_char))
+ if (wide && iswprint (wide_char))
{
- int width = wcwidth (wide_char);
- if (width > 0)
- linepos += width;
+ /* wcwidth can be expensive on OSX for example,
+ so avoid it if unneeded. */
+ if (print_linelength)
+ {
+ int width = wcwidth (wide_char);
+ if (width > 0)
+ linepos += width;
+ }
if (iswspace (wide_char))
goto mb_word_separator;
in_word = true;
}
+ else if (!wide && isprint (to_uchar (*p)))
+ {
+ linepos++;
+ if (isspace (to_uchar (*p)))
+ goto mb_word_separator;
+ in_word = true;
+ }
break;
}
+
+ p += n;
+ bytes_read -= n;
+ chars++;
}
while (bytes_read > 0);
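The patched loop also splits word classification: basic bytes are tested with isprint()/isspace() on the byte itself, and only other characters go through mbrtowc() and iswprint()/iswspace().  A minimal sketch of just that classification follows, assuming a hypothetical count_words() helper and the same kind of approximate basic_char() predicate in place of is_basic().  It is similar to, not identical with, wc's rules: here any whitespace ends a word, while non-printable, non-space characters neither start nor end one.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for gnulib's is_basic(): printable ASCII plus
   the ASCII whitespace controls.  */
static bool
basic_char (unsigned char c)
{
  return (c >= 0x20 && c <= 0x7e)
         || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

/* Count maximal runs of printable, non-space characters.  */
static size_t
count_words (const char *p, size_t len)
{
  mbstate_t state;
  size_t words = 0;
  bool in_word = false;

  assert (p != NULL || len == 0);
  memset (&state, 0, sizeof state);
  while (len > 0)
    {
      size_t n = 1;
      bool print, space;

      if (mbsinit (&state) && basic_char ((unsigned char) *p))
        {
          /* Fast path: narrow <ctype.h> tests on the byte itself.  */
          unsigned char c = (unsigned char) *p;
          print = isprint (c) != 0;
          space = isspace (c) != 0;
        }
      else
        {
          wchar_t wc;
          n = mbrtowc (&wc, p, len, &state);
          if (n == (size_t) -1 || n == (size_t) -2)
            {
              /* Invalid or truncated sequence: skip one byte.  */
              memset (&state, 0, sizeof state);
              n = 1;
              print = space = false;
            }
          else
            {
              if (n == 0) /* a NUL byte decodes with length 0 */
                n = 1;
              print = iswprint (wc) != 0;
              space = iswspace (wc) != 0;
            }
        }

      if (space)
        in_word = false;
      else if (print && !in_word)
        {
          in_word = true;
          words++;
        }

      p += n;
      len -= n;
    }
  return words;
}
```

Checking isspace() before isprint() matters because a space character is both printable and a separator.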