Parag Kalra wrote:
Hello Folks,
Hello,
This is my first post here.
I am trying to emulate the Linux 'sort' command in Perl. I found the
following code on the Internet to sort a text file:
# cat sort.pl
my $column_number = 2; # Sorting by 3rd column since 0-origin based
my $prev = "";
for (
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [$_, (split)[$column_number]] }
    <>
) {
    print unless $_ eq $prev;
    $prev = $_;
}
Suppose I want to sort a text file that has the following rows and
columns:
# cat test.out
jhvXgF U13GWt 3OvMCf VMkAWj
4ewejk pFnjd4 ie0hZF pPipQJ
4ewejk 4sqprx ie0hZF cqtexi
FT9mWp d4fgMB gvZRJU XRRu0N
hnzI2c GXAXWF 6xKH7A 3dLh18
When I sort it by the 3rd column using the 'sort' command, I get the
following output:
# sort -u -k 3 test.out
jhvXgF U13GWt 3OvMCf VMkAWj
hnzI2c GXAXWF 6xKH7A 3dLh18
FT9mWp d4fgMB gvZRJU XRRu0N
4ewejk 4sqprx ie0hZF cqtexi
4ewejk pFnjd4 ie0hZF pPipQJ
However, when I sort the same text file by the 3rd column using the code
above, I get the following:
jhvXgF U13GWt 3OvMCf VMkAWj
hnzI2c GXAXWF 6xKH7A 3dLh18
FT9mWp d4fgMB gvZRJU XRRu0N
4ewejk pFnjd4 ie0hZF pPipQJ
4ewejk 4sqprx ie0hZF cqtexi
The difference is in the last 2 rows: their 2nd-column values appear in
the opposite order.
The reason is that 'ie0hZF' appears twice in the 3rd column, and the
corresponding values in the 1st column are also the same ('4ewejk'), so
the discrepancy shows up in the 2nd column.
Can anybody help me fix the bug in the above code?
This is not a bug. Because you are sorting on only one column, the
relative order of lines with equal values in that column is
indeterminate. If you want to sort on the whole line as well, like
'sort' does, then change:
sort { $a->[1] cmp $b->[1] }
To:
sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
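For reference, here is a minimal sketch of the whole script with that change applied. The sub name sort_unique_by_column and the hard-coded sample lines (copied from test.out above) are assumptions made for illustration. The original script reads its input from <> instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): sort lines by one
# whitespace-separated column, break ties by comparing the whole line,
# then drop adjacent duplicate lines.
sub sort_unique_by_column {
    my ($column_number, @lines) = @_;

    my @sorted =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
        map  { [ $_, (split ' ', $_)[$column_number] // '' ] }
        @lines;

    my $prev = '';
    my @unique;
    for my $line (@sorted) {
        push @unique, $line unless $line eq $prev;
        $prev = $line;
    }
    return @unique;
}

# Sample data taken from test.out in the question.
my @lines = map { "$_\n" } (
    'jhvXgF U13GWt 3OvMCf VMkAWj',
    '4ewejk pFnjd4 ie0hZF pPipQJ',
    '4ewejk 4sqprx ie0hZF cqtexi',
    'FT9mWp d4fgMB gvZRJU XRRu0N',
    'hnzI2c GXAXWF 6xKH7A 3dLh18',
);

my @result = sort_unique_by_column(2, @lines);   # 2 = 3rd column, 0-based
print @result;
```

With this data the two '4ewejk' lines come out with '4sqprx' before 'pFnjd4', the same order the 'sort' output above shows. Note that cmp compares strings by character code, while 'sort' follows the current locale's collation, so other inputs may still be ordered differently.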
John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway