Arrays often need sorting. Perl has built-in ways to help you, but as usual, there's more than one way to do it.
To play with the examples shown here, you'll need a file containing a few lines of words. I'll use this:
Each and everyone listen: Exactly one way is not the only way except when it is
and we'll call it "unsorted". Our first Perl sort is simple:
#!/usr/bin/perl my @words=<>; foreach(sort @words) { print; }
and in our usually highly creative way, we'll call it "t.pl". Running it with "unsorted" as an argument gives us these results:
$ ./t.pl unsorted Each Exactly and everyone except is is it listen: not one only the way way when
In some cases, that might be good enough for our needs, but it might not be (and if it were, we'd have to end this article here). So what do we need to do if we need more? Let's say we want to ignore the upper/lower case distinctions and sort in "dictionary" order, like command line "sort -f".
Well, Perl provides a way for us to do part of the sorting. That is, we can provide a subroutine that the Perl "sort" will call to decide whether one thing is greater or smaller than another. Perl will still take care of shuffling things around for us.
The subroutine is a little strange because we don't have to process arguments. For efficiency reasons, the two things we get are always $a and $b and it's up to us to compare them and return a numeric result indicating equality (0), that $a is less than $b (a negative number) or greater (a positive number). So here's our new t.pl:
#!/usr/bin/perl my @words=<>; foreach(sort mysort @words) { print; } sub mysort { lc($a) cmp lc($b); }
The "cmp" does the job of comparing, and "lc" translates to lower case so that the result is case insensitive:
and Each everyone Exactly except is is it listen: not one only the way way when
We can get as complicated as we like in the comparing subroutine. Here's one that sorts in order of the number of "e"'s in each word. A pretty artificial example, but it shows what you can do:
#!/usr/bin/perl my @words=<>; foreach(sort mysort @words) { print; } sub mysort { $aa=$a;$bb=$b; ($aa =~ tr/eE/eE/) <=> ($bb=~ tr/eE/eE/) || lc($a) cmp lc($b); }
The counting of the "e"'s is done using the "tr" operator and the comparison needs to use the arithmetic compare <=>. We copy $a and $b to temporary variables because if we didn't, tr would change them and we'd just get numbers as our result (though in other situations, that might be just what you wanted).
By using "||" and retaining the dictionary method, words with the same number of "e"'s stay in order. Without that, we'd get this:
not and only way it is way is Each the listen: Exactly one when except everyone
but with it:
and is is it not only way way Each Exactly listen: one the when except everyone
Now we get to the more complicated ways. If our "unsorted" were very large, it could take a while to run. At https://perlmonks.thepen.com/145659.html, I found this:
#!/usr/bin/perl my @words=<>; #Guttman Rosler Transform or GRT my @sorted=map { substr($_,4) } sort map { pack("LA*",tr/eE/eE/,$_) } @words; foreach(@sorted) { print; }
This needs a lot of explanation. First, "map". Perl's "map" function is like a "foreach" loop: whatever you want to do to an array is in the first argument, the array you work on is its second. It's better than a foreach loop though, because it gives us back a new array. As a very simple example:
@lcwords=map {lc($_)} @words;
So the "map { pack("LA*",tr/eE/eE/,$_) } @words;" part of the example above returns a new array that consists of a packed 4 byte number followed by the orginal contents of each line. That number is, of course, the count provided by "tr". We use pack here because it's very quick, but if we had more complex needs, we could use sprintf or even build the string ourselves. We just have to make sure we can un-build it at the end.
It's then ordered by the "sort" on the previous line, and then the first "map { substr($_,4) }" works on that array, stripping out the packed 4 byte number that made our previous sorting work.
In spite of the double map use, this will actually run much faster than what we did before. Even on a relatively small file, "time" will show that this can be twice as fast.
For more on this, see:
https://tlc.perlarchive.com/articles/perl/sc0001.shtml
A Fresh Look
at Efficient Perl Sorting
You might also be interested in the opposite task: "unsorting" or randomizing a file.
Got something to add? Send me email.
More Articles by Tony Lawrence © 2011-04-03 Tony Lawrence
The best of us would rather be popular than right. (Mark Twain)
Sun Apr 10 11:49:39 2005: 315 anonymous
try Sort::Key from CPAN
thatīs all folks!
Sun Apr 10 13:13:08 2005: 316 TonyLawrence
Thanks, Sort::Key ( (link) ) does look useful and easy.
------------------------
Printer Friendly Version
Perl sorting Copyright © September 2003 Tony Lawrence
Have you tried Searching this site?
This is a Unix/Linux resource website. It contains technical articles about Unix, Linux and general computing related subjects, opinion, news, help files, how-to's, tutorials and more.
Contact us
Printer Friendly Version