ls --dired now implies long format output without hyperlinks enabled,
and will take precedence over previously specified formats or hyperlink mode.
+ wc no longer ignores encoding errors when counting words.
+ Instead, it treats them as non white space.
+
** New features
tail now supports following multiple processes, with repeated --pid options.
@opindex -m
@opindex --chars
Print only the character counts, as per the current locale.
-Invalid characters are not counted.
+Encoding errors are not counted.
@item -w
@itemx --words
@opindex --words
Print only the word counts. A word is a nonempty sequence of non white
space delimited by white space characters or by start or end of input.
+The current locale determines which characters are white space.
+GNU @command{wc} treats encoding errors as non white space.
@item -l
@itemx --lines
bytes_read--;
mbszero (&state);
in_shift = false;
+
+ /* Treat encoding errors as non white space.
+ POSIX says a word is "a non-zero-length string of
+ characters delimited by white space". This is
+ imprecise, as a word can also be delimited by the
+ start or end of input, and it is unclear what the
+ definition means when the input contains encoding errors.
+ Since encoding errors are not white space,
+ treat them that way here. */
+ words += !in_word;
+ in_word = true;
continue;
}
if (mbsinit (&state))
/* Count words by counting word starts, i.e., each
white space character (or the start of input)
- followed by non white space.
-
- POSIX says a word is "a non-zero-length string of
- characters delimited by white space". This is certainly
- wrong in some sense, as the string can be delimited
- by start or end of input, and it is not clear
- what it means when the input contains encoding errors.
- Although GNU wc ignores encoding errors when determining
- word boundaries, this behavior is not documented or
- portable and should not be relied upon. */
+ followed by non white space. */
words += !in_word;
in_word = true;
break;