@item
First the initial part of each string consisting entirely of non-digit
-characters is determined.
+bytes is determined.
-@enumerate
+@enumerate A
@item
-These two parts (one of which may be empty) are compared lexically.
+These two parts (either of which may be empty) are compared lexically.
If a difference is found it is returned.
@item
-The lexical comparison is a comparison of ASCII values modified so that:
+The lexical comparison is a lexicographic comparison of byte strings,
+except that:
-@enumerate
+@enumerate a
@item
-Letters sort before non-letters.
+ASCII letters sort before other bytes.
@item
-A tilde sorts before anything, even the end of a part.
+A tilde sorts before anything, even an empty string.
@end enumerate
@end enumerate
@item
-Then the initial part of the remainder of each string which consists
-entirely of digit characters is determined. The numerical values of
+Then the initial part of the remainder of each string that contains
+all the leading digits is determined. The numerical values represented by
these two parts are compared, and any difference found is returned as
the result of the comparison.
-@enumerate
+
+@enumerate A
@item
For these purposes an empty string (which can only occur at the end of
one or both version strings being compared) counts as zero.
+
+@item
+Because the numerical value is used, non-identical strings can compare
+equal. For example, @samp{123} compares equal to @samp{00123}, and
+the empty string compares equal to @samp{0}.
@end enumerate
@item
each string:
@example
-foo @r{vs} foo @r{(rule 2, non-digit characters)}
+foo @r{vs} foo @r{(rule 2, non-digits)}
07 @r{vs} 7 @r{(rule 3, digits)}
. @r{vs} a. @r{(rule 2)}
7 @r{vs} 7 @r{(rule 3)}
@enumerate
@item
-The first parts (@samp{foo}) are identical in both strings.
+The first parts (@samp{foo}) are identical.
@item
The second parts (@samp{07} and @samp{7}) are compared numerically,
-and are identical.
+and compare equal.
@item
The third parts (@samp{.} vs @samp{a.}) are compared
-lexically by ASCII value (rule 2.2).
+lexically by ASCII value (rule 2.B).
@item
-The first character of the first string (@samp{.}) is compared
-to the first character of the second string (@samp{a}).
+The first byte of the first string (@samp{.}) is compared
+to the first byte of the second string (@samp{a}).
@item
-Rule 2.2.1 dictates that ``all letters sorts earlier than all non-letters''.
+Rule 2.B.a says letters sort before non-letters.
Hence, @samp{a} comes before @samp{.}.
@item
Numeric sort (@samp{sort -n}) treats the entire string as a single numeric
value, and compares it to other values. For example, @samp{8.1}, @samp{8.10} and
@samp{8.100} are numerically equivalent, and are ordered together. Similarly,
-@samp{8.49} is numerically smaller than @samp{8.5}, and appears before first.
+@samp{8.49} is numerically less than @samp{8.5}, and appears first.
Version sort (@samp{sort -V}) first breaks down the string into digit and
non-digit parts, and only then compares each part (see annotated
-example in Version-sort ordering rules).
+example in @ref{Version-sort ordering rules}).
Comparing the string @samp{8.1} to @samp{8.01}, first the
-@samp{8} characters are compared (and are identical), then the
+@samp{8}s are compared (and are identical), then the
dots (@samp{.}) are compared and are identical, and lastly the
remaining digits are compared numerically (@samp{1} and @samp{01}) -
-which are numerically equivalent. Hence, @samp{8.01} and @samp{8.1}
+which are numerically equal. Hence, @samp{8.01} and @samp{8.1}
are grouped together.
Similarly, comparing @samp{8.5} to @samp{8.49} -- the @samp{8}
assigning versions to computer programs (while perhaps not intuitive
or ``natural'' for people).
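
The part-by-part comparison described above (rules 2 and 3) can be
sketched in Python. This is an illustrative sketch only, not the
Coreutils implementation: the names @code{byte_rank},
@code{split_parts}, @code{cmp_nondigit} and @code{version_compare} are
hypothetical, and the sketch deliberately omits the special priorities
and the file-extension handling.

```python
import re

def byte_rank(b):
    """Rank one byte of a non-digit part (rule 2.B); None means end of part."""
    if b is None:
        return 0                      # end of part (the empty string)
    if b == ord("~"):
        return -1                     # tilde sorts before anything, even end
    if chr(b).isascii() and chr(b).isalpha():
        return b                      # ASCII letters keep their byte value
    return b + 256                    # every other byte sorts after letters

def split_parts(data):
    """Split bytes into alternating (non-digit part, digit part) pairs."""
    return re.findall(rb"([^0-9]*)([0-9]*)", data)

def cmp_nondigit(x, y):
    """Compare two non-digit parts byte by byte using byte_rank."""
    for k in range(max(len(x), len(y))):
        rx = byte_rank(x[k] if k < len(x) else None)
        ry = byte_rank(y[k] if k < len(y) else None)
        if rx != ry:
            return -1 if rx < ry else 1
    return 0

def version_compare(s1, s2):
    """Return -1, 0 or 1 according to rules 2 and 3 only (assumed sketch)."""
    p1, p2 = split_parts(s1.encode()), split_parts(s2.encode())
    for k in range(max(len(p1), len(p2))):
        n1, d1 = p1[k] if k < len(p1) else (b"", b"")
        n2, d2 = p2[k] if k < len(p2) else (b"", b"")
        r = cmp_nondigit(n1, n2)      # rule 2: non-digit parts
        if r:
            return r
        v1, v2 = int(d1 or b"0"), int(d2 or b"0")   # rule 3: empty counts as 0
        if v1 != v2:
            return -1 if v1 < v2 else 1
    return 0
```

With this sketch, @code{version_compare("8.1", "8.01")} returns 0,
because the final digit parts have the same numerical value, while
@code{version_compare("8.5", "8.49")} returns a negative value,
because 5 is less than 49.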
-@node Punctuation characters
-@subsection Punctuation characters
+@node Version sort punctuation
+@subsection Version sort punctuation
-Punctuation characters are sorted by ASCII order (rule 2.2).
+Punctuation is sorted by ASCII order (rule 2.B).
@example
$ touch 1.0.5_src.tar.gz 1.0_src.tar.gz
into the following parts:
@example
- 1 @r{vs} 1 @r{(rule 3, all digit characters)}
- . @r{vs} . @r{(rule 2, all non-digit characters)}
+ 1 @r{vs} 1 @r{(rule 3, all digits)}
+ . @r{vs} . @r{(rule 2, all non-digits)}
0 @r{vs} 0 @r{(rule 3)}
. @r{vs} _src.tar.gz @r{(rule 2)}
- 5 @r{vs} empty string @r{(no more character in the file name)}
+ 5 @r{vs} empty string @r{(no more bytes in the file name)}
_src.tar.gz @r{vs} empty string
@end example
The fourth parts (@samp{.} and @samp{_src.tar.gz}) are compared
-lexically by ASCII order. The character @samp{.} (ASCII value 46) is
-smaller than @samp{_} (ASCII value 95) -- and should be listed before it.
+lexically by ASCII order. The @samp{.} (ASCII value 46) is
+less than @samp{_} (ASCII value 95) -- and should be listed before it.
Hence, @file{1.0.5_src.tar.gz} is listed first.
-If a different character appears instead of the underscore (for
-example, percent sign @samp{%} ASCII value 37, which is smaller
+If a different byte appears instead of the underscore (for
+example, percent sign @samp{%} ASCII value 37, which is less
than dot's ASCII value of 46), that file will be listed first:
@example
@end example
The same reasoning applies to the following example, as @samp{.} with
-ASCII value 46 is smaller than @samp{/} with ASCII value 47:
+ASCII value 46 is less than @samp{/} with ASCII value 47:
@example
$ cat input5
@node Punctuation vs letters
@subsection Punctuation vs letters
-Rule 2.2.1 dictates that letters sorts earlier than all non-letters
+Rule 2.B.a says letters sort before non-letters
(after breaking down a string to digit and non-digit parts).
@example
@end example
The input strings consist entirely of non-digits, and based on the
-above algorithm have only one part, all non-digit characters
+above algorithm have only one part, all non-digits
(@samp{a%} vs @samp{az}).
Each part is then compared lexically,
-character-by-character. @samp{a} compares identically in both
+byte-by-byte; @samp{a} compares identically in both
strings.
-Rule 2.2.1 dictates that letters (@samp{z}) sorts earlier than all
-non-letters (@samp{%}) -- hence @samp{az} appears first (despite
+Rule 2.B.a says a letter like @samp{z} sorts before
+a non-letter like @samp{%} -- hence @samp{az} appears first (despite
@samp{z} having ASCII value of 122, much larger than @samp{%}
with ASCII value 37).
-@node The tilde @samp{~} character
-@subsection The tilde @samp{~} character
+@node The tilde @samp{~}
+@subsection The tilde @samp{~}
-Rule 2.2.2 dictates that the tilde character @samp{~} (ASCII 126) sorts
-before all other non-digit characters, including an empty part.
+Rule 2.B.b says the tilde @samp{~} (ASCII 126) sorts
+before other bytes, and before an empty string.
@example
$ cat input7
in the input file start with a digit -- their first non-digit part is
empty.
-Based on rule 2.2.2, tilde @samp{~} sorts before all other non-digits
-including the empty part -- hence it comes before all other strings,
+Based on rule 2.B.b, tilde @samp{~} sorts before other bytes
+and before the empty string -- hence it comes before all other strings,
and is listed first in the sorted output.
The remaining lines (@samp{1}, @samp{1%}, @samp{1.2}, @samp{1~})
follow similar logic: The digit part is extracted (1 for all strings)
-and compares identical. The following extracted parts for the remaining
+and compares equal. The following extracted parts for the remaining
input lines are: empty part, @samp{%}, @samp{.}, @samp{~}.
Tilde sorts before all others, hence the line @samp{1~} appears next.
Ignoring the first letter (@samp{a}) which is identical in all
strings, the compared values are:
-@samp{a} and @samp{z} are letters, and sort earlier than
-all other non-digit characters.
+@samp{a} and @samp{z} are letters, and sort before
+all other non-digits.
Then, percent sign @samp{%} (ASCII value 37) is compared to the
first byte of the UTF-8 sequence of @samp{α} (which is 0xCE or 206). The
and file name listing.
-@node Hyphen-minus and colon characters
-@subsection Hyphen-minus @samp{-} and colon @samp{:} characters
+@node Hyphen-minus and colon
+@subsection Hyphen-minus @samp{-} and colon @samp{:}
In Debian's version string syntax the version consists of three parts:
@example
@end example
If the @samp{debian_revision} part is not present,
-hyphen characters @samp{-} are not allowed.
+hyphens @samp{-} are not allowed.
If epoch is not present, colons @samp{:} are not allowed.
If these parts are present, hyphens and/or colons can appear only once
have many hyphens, a line of text can have many colons).
As a result, in GNU Coreutils hyphens and colons are treated exactly
-like all other punctuation characters, i.e., they are sorted after
-letters. @xref{Punctuation characters}.
+like all other punctuation, i.e., they are sorted after
+letters. @xref{Version sort punctuation}.
In Debian, these characters are treated differently than in Coreutils:
a version string with hyphen will sort before similar strings without
For further details, see @ref{Comparing two strings using Debian's
algorithm} and @uref{https://bugs.gnu.org/35939,GNU Bug 35939}.
-@node Additional hard-coded priorities in GNU Coreutils version sort
-@subsection Additional hard-coded priorities in GNU Coreutils version sort
+@node Special priority in GNU Coreutils version sort
+@subsection Special priority in GNU Coreutils version sort
In GNU Coreutils version sort, the following items have
-special priority and sort earlier than all other characters (listed in
-order);
+special priority and sort before all other strings (listed in order):
@enumerate
@item The empty string
-@item The string @samp{.} (a single dot character, ASCII 46)
+@item The string @samp{.} (a single dot, ASCII 46)
-@item The string @samp{..} (two dot characters)
+@item The string @samp{..} (two dots)
-@item Strings start with a dot (@samp{.}) sort earlier than
-strings starting with any other characters.
+@item Strings starting with dot (@samp{.}) sort before
+strings starting with any other byte.
@end enumerate
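
The priority list above can be pictured as a rank that is consulted
before any other comparison. The following Python sketch is an
illustration under that assumption; @code{special_priority} is a
hypothetical name, and ordering @emph{within} a rank still requires the
version-sort rules, which are not implemented here.

```python
def special_priority(s):
    """Return a rank for the special cases; lower ranks sort first."""
    if s == "":
        return 0          # the empty string
    if s == ".":
        return 1          # a single dot
    if s == "..":
        return 2          # two dots
    if s.startswith("."):
        return 3          # any other string starting with a dot
    return 4              # everything else

items = ["a", "", "b", ".", "c", "..", ".d20", ".d3"]
# sorted() is stable, so strings of equal rank keep their input order;
# a real version sort would further order them (e.g. .d3 before .d20).
grouped = sorted(items, key=special_priority)
```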
Example:
@example
$ printf '%s\n' a "" b "." c ".." ".d20" ".d3" | sort -V
-
.
..
.d3
@subsection Special handling of file extensions
GNU Coreutils version sort implements specialized handling
-of file extensions (or strings that look like file names with
-extensions).
-
-This nuanced implementation enables slightly more natural ordering of files.
+of strings that look like file names with extensions.
+This enables slightly more natural ordering of file
+names.
-The additional rules are:
+The following additional rules apply when comparing two strings where
+both begin with non-@samp{.}. They also apply when comparing two
+strings where both begin with @samp{.} but neither is @samp{.} or @samp{..}.
@enumerate
@item
-A suffix (i.e., a file extension) is defined as: a dot, followed by a
-letter or tilde, followed by one or more letters, digits, or tildes
-(possibly repeated more than once), until the end of the string
-(technically, matching the regular expression
-@code{(\.[A-Za-z~][A-Za-z0-9~]*)*}).
+A suffix (i.e., a file extension) is defined as: a dot, followed by an
+ASCII letter or tilde, followed by zero or more ASCII letters, digits,
+or tildes; all repeated zero or more times, and ending at string end.
+This is equivalent to matching the extended regular expression
+@code{(\.[A-Za-z~][A-Za-z0-9~]*)*$} in the C locale.
@item
-If the strings contains suffixes, the suffixes are temporarily
-removed, and the strings are compared without them (using the
-@ref{Version-sort ordering rules,algorithm,algorithm} above).
+The suffixes are temporarily removed, and the strings are compared
+without them, using version sort (see @ref{Version-sort ordering
+rules}) without special priority (see @ref{Special priority in GNU
+Coreutils version sort}).
@item
-If the suffix-less strings are identical, the suffix is restored and
-the entire strings are compared.
+If the suffix-less strings do not compare equal, this comparison
+result is used and the suffixes are effectively ignored.
@item
-If the non-suffixed strings differ, the result is returned and the
-suffix is effectively ignored.
+If the suffix-less strings compare equal, the suffixes are restored
+and the entire strings are compared using version sort.
@end enumerate
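
The suffix definition in rule 1 can be tried out with a short Python
sketch. This is an illustration based on the regular expression quoted
above, not the Coreutils code: @code{split_suffix} is a hypothetical
helper, and Python's @code{\Z} is used in place of @code{$} so that the
match is anchored at the very end of the string.

```python
import re

# A dot, then an ASCII letter or tilde, then ASCII letters, digits or
# tildes; all repeated zero or more times, up to the end of the string.
SUFFIX_RE = re.compile(r"(\.[A-Za-z~][A-Za-z0-9~]*)*\Z")

def split_suffix(name):
    """Split name into (stem, suffix); the suffix may be empty."""
    # search() returns the leftmost match, which is the longest suffix.
    m = SUFFIX_RE.search(name)
    return name[:m.start()], name[m.start():]
```

For example, @code{split_suffix("hello-8.2.txt")} yields the stem
@samp{hello-8.2} and the suffix @samp{.txt}, because @samp{.2} begins
with a digit and so cannot be part of the suffix.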
Examples for rule 1:
@item
@samp{gcc-c++-10.8.12-0.7rc2.fc9.tar.bz2}: the suffix is
@samp{.fc9.tar.bz2} (@samp{.7rc2} is not included as it begins with a digit)
+
+@item
+@samp{.autom4te.cfg}: the suffix is the entire string.
@end itemize
Examples for rule 2:
to the following parts:
@example
-hello- @r{vs} hello- @r{(rule 2, all non-digit characters)}
-8 @r{vs} 8 @r{(rule 3, all digit characters)}
+hello- @r{vs} hello- @r{(rule 2, all non-digits)}
+8 @r{vs} 8 @r{(rule 3, all digits)}
.txt @r{vs} . @r{(rule 2)}
empty @r{vs} 2
empty @r{vs} .txt
broken down into the following parts:
@example
-hello- @r{vs} hello- @r{(rule 2, all non-digit characters)}
-8 @r{vs} 8 @r{(rule 3, all digit characters)}
+hello- @r{vs} hello- @r{(rule 2, all non-digits)}
+8 @r{vs} 8 @r{(rule 3, all digits)}
empty @r{vs} . @r{(rule 2)}
empty @r{vs} 2
@end example
To illustrate the different handling of hyphens between Debian and
Coreutils algorithms (see
-@ref{Hyphen-minus and colon characters}):
+@ref{Hyphen-minus and colon}):
@example
$ compver abb ab-cd 2>/dev/null $ printf 'abb\nab-cd\n' | sort -V