On Mon, Feb 12, 2007 at 09:53:24AM -0800, Roger Marquis wrote:
> I could not find this explanation in previous emails.
>> because the LC_COLLATE rules for the C locale don't ignore
>> whitespace, so you're effectively sorting on initial month and then
>> sorting all 1-digit dates followed by all 2-digit dates (because
>> the 1-digit dates are preceded by a space, which sorts first). In
>> the en_US locale (which your bug report says you're using) whitespace
>> is ignored when sorting.
> 'env LANG="en_US.UTF-8" sort -M' is also broken in Solaris 2.8.
Not broken; it's working as designed (as is coreutils). Your
expectations may not match the behavior, but it is consistent with the
design (and documentation) of the sort command.
> This is not an issue, however, because Solaris does not set the
> default LANG to UTF-8.
Neither does Debian; that's an install-time choice. (Solaris has the
same kind of install-time locale selection.)
>> (Note that the Solaris man page also defines -M as "month sort"
>> and says nothing about days.)
> Obviously a bug in the man page. "-M" has sorted by date,
> including the day of the month and hour of the day, in pre-UTF
> implementations going back decades.
Again, I think I already explained why it would behave that way in the C
locale. I also pointed out that -M is documented as sorting by month and
does not say anything about dates.
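A minimal sketch of the C-locale behavior, assuming GNU coreutils sort. The log lines and the variable names (input, c_sorted) are invented for illustration, with 1-digit days space-padded the way syslog writes them. The en_US.UTF-8 run only happens if that locale is installed, and its exact order depends on the installed collation tables, so it is shown but not relied on.

```shell
#!/bin/sh
# Invented syslog-style lines; 1-digit days are padded with a space.
input='Feb 10 00:00:01 host a
Feb  9 00:00:02 host b
Feb  2 00:00:03 host c'

# C locale: -M ranks all three lines equal (same month), so sort falls back
# to comparing whole lines byte by byte; the padding space (0x20) sorts
# before any digit, so every 1-digit day lands before every 2-digit day.
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -M)
printf 'C locale:\n%s\n' "$c_sorted"

# en_US.UTF-8 (only if installed): collation may ignore the space, in which
# case " 9" is compared as "9" against "10" and the order changes.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
  printf 'en_US.UTF-8:\n%s\n' "$(printf '%s\n' "$input" | LC_ALL=en_US.UTF-8 sort -M)"
fi
```

The C-locale order only looks like a date sort because of the padding byte; -M itself looked at nothing past the month name.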
>> The syntax I described before (sort -k 1,1M -k 2,2n -k 3) will
>> be more reliable because it will work regardless of the current
>> locale settings.
> It is reliable but not accurate. "-k 3" stops sorting at the first ":"
> in the hour:minute:second field.
I can't duplicate that:
(102)osgiliath:/tmp> cat test3.fil
Jan 1 08:17:37 annuminas sshd[21294]:
Jan 1 08:15:37 annuminas sshd[21294]:
Jan 1 08:14:37 annuminas sshd[21294]:
Jan 1 08:19:37 annuminas sshd[21294]:
Jan 1 08:16:37 annuminas sshd[21294]:
Jan 1 08:16:38 annuminas sshd[21294]:
Jan 1 08:16:36 annuminas sshd[21294]:
Jan 1 08:16:47 annuminas sshd[21294]:
Jan 1 08:16:57 annuminas sshd[21294]:
Jan 1 08:16:46 annuminas sshd[21294]:
Jan 1 09:24:43 annuminas sshd[21581]:
Jan 1 16:34:11 annuminas sshd[23406]:
(103)osgiliath:/tmp> sort -k 1,1M -k 2,2n -k 3 test3.fil
Jan 1 08:14:37 annuminas sshd[21294]:
Jan 1 08:15:37 annuminas sshd[21294]:
Jan 1 08:16:36 annuminas sshd[21294]:
Jan 1 08:16:37 annuminas sshd[21294]:
Jan 1 08:16:38 annuminas sshd[21294]:
Jan 1 08:16:46 annuminas sshd[21294]:
Jan 1 08:16:47 annuminas sshd[21294]:
Jan 1 08:16:57 annuminas sshd[21294]:
Jan 1 08:17:37 annuminas sshd[21294]:
Jan 1 08:19:37 annuminas sshd[21294]:
Jan 1 09:24:43 annuminas sshd[21581]:
Jan 1 16:34:11 annuminas sshd[23406]:
-k 3 means "sort from the third field to the end of the line", since no
ending field (as in -k 3,3) is specified.
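A sketch of the same keys applied across a day boundary (9 vs. 10), which the test file above does not exercise, assuming GNU coreutils sort. The lines and the variable name keyed are invented. By default fields are separated at blanks, not at ":" (that would need -t ':'), so field 3 is the whole HH:MM:SS token.

```shell
#!/bin/sh
# Invented syslog-style lines; note the space-padded 1-digit day.
input='Feb 10 08:00:00 host b
Feb  9 23:59:59 host a
Jan 31 12:00:00 host c
Feb 10 07:59:59 host d'

# -k 1,1M  field 1 only, compared as a month name
# -k 2,2n  field 2 only, compared numerically (leading blanks skipped)
# -k 3     field 3 to end of line, plain string comparison; field 3 is
#          the whole HH:MM:SS token because fields split on blanks, not ':'
# LC_ALL=C is set only to make this run reproducible.
keyed=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1,1M -k 2,2n -k 3)
printf '%s\n' "$keyed"
```

Because the day is compared numerically, "Feb  9" sorts before "Feb 10" whether or not the collation ignores the padding space.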
> For clarification, are you saying that the output of "-M" should
> differ between LANG=C and LANG=UTF-8 even though the input fields
> are identical?
Yes. As documented in both coreutils and Solaris, -M is locale-specific
(different languages don't spell months the same way), and the basic
collation rules (specifically those dealing with non-ASCII and whitespace
characters) differ between locales.
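A small sketch of the month-name part, assuming GNU coreutils sort, whose documentation says names it does not recognize compare lower than JAN. "Mai" (German for May) and the variable name c_months are hypothetical choices. The de_DE.UTF-8 run only happens if that locale is installed, and is expected (not guaranteed) to treat "Mai" as the fifth month.

```shell
#!/bin/sh
# Hypothetical input: "Mai" is the German abbreviation for May.
input='Feb
Mai
Jan'

# C locale: -M only knows JAN..DEC; GNU sort ranks unknown names below JAN,
# so "Mai" comes first.
c_months=$(printf '%s\n' "$input" | LC_ALL=C sort -M)
printf 'C locale:\n%s\n' "$c_months"

# de_DE.UTF-8 (only if installed): month names come from the locale,
# so "Mai" is expected to be recognized as the fifth month.
if locale -a 2>/dev/null | grep -qiE '^de_DE\.utf-?8$'; then
  printf 'de_DE.UTF-8:\n%s\n' "$(printf '%s\n' "$input" | LC_ALL=de_DE.UTF-8 sort -M)"
fi
```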
Mike Stone