APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Perl sorting

Sun Aug 1 12:17:27 2004 Perl Search Keys: perl,programming

Perl has an easy-to-use "sort" function. For example, you might have an array like this:


@month=("January","February","March","April","May","June","July",
"August","September","October","November","December");
 

To list that out alphabetically, you could do:


#!/usr/bin/perl
@month=("January","February","March","April","May","June","July",
"August","September","October","November","December");

foreach (sort @month) {
  print "$_\n";
}
 

If you wanted it reversed, just do:

foreach (reverse sort @month) {
  print "$_\n";
}
 

By the way, "reverse" isn't just for sorting. It simply reverses whatever list you give it, so you could print that array starting at December and ending with January with:

foreach (reverse @month) {
  print "$_\n";
}
 

But suppose the array was more difficult:

@stuff=("Camel","Aardvark","zoo","animal");
 

You want that sorted, but you want to ignore case. This will work:

foreach (sort { lc($a) cmp lc($b) }  @stuff) {
  print "$_\n";
}
 

That works because Perl lets you supply a code block that tells sort how to compare. The block is called with two elements of your list at a time, and sees them as $a and $b. Your code simply decides how $a and $b compare, and returns a positive number if $a is greater, 0 if they are equal, and a negative number if $a is less than $b. The "cmp" operator does exactly that for strings, returning 1, 0 or -1.
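
Here is a small sketch of the difference the code block makes, using the same @stuff array. It assumes plain ASCII ordering (no "use locale"); the variable names @plain and @nocase are just mine for illustration.

```perl
use strict;
use warnings;

my @stuff = ("Camel", "Aardvark", "zoo", "animal");

# Default sort compares character codes, so uppercase sorts before lowercase.
my @plain = sort @stuff;

# Lowercasing both sides inside the block makes the comparison ignore case.
my @nocase = sort { lc($a) cmp lc($b) } @stuff;

print "plain:  @plain\n";    # plain:  Aardvark Camel animal zoo
print "nocase: @nocase\n";   # nocase: Aardvark animal Camel zoo
```

Note that lc() is only applied for the comparison; the elements themselves come back with their original capitalization.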

If the array is numeric, sort by itself doesn't work well, because by default it compares the elements as strings:

@stuff=("1","70","100","200","3");
foreach (sort @stuff) {
  print "$_\n";
}
 

That would produce:

1
100
200
3
70
 

Using a code block fixes that:

foreach (sort { $a <=> $b } @stuff) {
  print "$_\n";
}
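
To see why that helps, it can be useful to compare the two operators directly. This is only a sketch; $as_string, $as_number and @sorted are names I made up for the example.

```perl
use strict;
use warnings;

my @stuff = ("1", "70", "100", "200", "3");

# cmp looks at characters, so "100" sorts before "3" ('1' comes before '3').
# <=> looks at numeric value, so 100 sorts after 3.
my $as_string = "100" cmp "3";    # -1
my $as_number = "100" <=> "3";    #  1

my @sorted = sort { $a <=> $b } @stuff;

print "cmp: $as_string  <=>: $as_number\n";   # cmp: -1  <=>: 1
print "@sorted\n";                             # 1 3 70 100 200
```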
 

You can replace the code block with a subroutine if you need to. For example, the Consultants List is stored in a file where each line looks like this:

- U.S.A. -|Massachusetts|A.P. Lawrence|Sharon|(781) 784-7547|(781)
658-2012|Tony Lawrence||http://www.pcunix.com/services.html|Sales,
Service SCO Unix, Linux, Macintosh OS X, Mitel SME server (email,
virus scan, web access control, firewwall, spam control,
groupware)|07/21/04
 

I want to sort that list, but by very special rules. First, I want to sort by country and "state" within the country, and then by the very last field, which is the update date. If the record hasn't been updated, I want that to sort under records which have been updated, and I only want to compare the year, not the month and day. After that, I want listings that have a company name to appear before those that don't. It's a pretty complicated sort, but Perl can handle it:

open(CONSULTS,"confile") or die "Can't open confile: $!";
@scons=<CONSULTS>;
close CONSULTS;

foreach (sort consort @scons) {

  ..

}
 

Because we say "sort consort" instead of putting code between braces, Perl's sort will call our "consort" subroutine for each comparison:


sub consort {
  # Work on copies of our variables because our sort routine affects them.
  # If I "chomp $a" that will actually change the line seen in the foreach
  # loop that calls this.
  my $aa=$a;
  my $bb=$b;

  chomp $aa;
  chomp $bb;

  # strip leading spaces
  $aa=~ s/^  *//;
  $bb=~ s/^  *//;

  # split up our fields
  my ($country,$state,$company,$city,$phone,$fax,$contact,
      $email,$web,$notes,$adate,$junk)=split /\|/,$aa;

  my ($bcountry,$bstate,$bcompany,$bcity,$bphone,$bfax,$bcontact,
      $bemail,$bweb,$bnotes,$badate,$bjunk)=split /\|/,$bb;

  # isolate the update date; the third piece of mm/dd/yy is the year
  my @adate=split /\//,$adate;
  my @bdate=split /\//,$badate;
  my $acdate="$adate[2]";
  $acdate="00" if not $acdate;
  my $bcdate="$bdate[2]";
  $bcdate="00" if not $bcdate;

  # and the company name; a missing name sorts after real ones
  $company="ZZZZZ" if not $company;
  $bcompany="ZZZZZ" if not $bcompany;

  # the value of this last expression is what sort sees
  uc($country) cmp uc($bcountry) || uc($state) cmp uc($bstate) ||
  $bcdate <=> $acdate || uc($company) cmp uc($bcompany)
  || uc($aa) cmp uc($bb);
}
 

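Notice that "$bcdate <=> $acdate"? That effectively reverses the sort. Ordinarily, you would compare the "$a" value to the "$b" value, but by doing it backwards, we reverse that part of the sort, so "04" is listed before "03" and of course "00" comes last. Setting a missing company name to "ZZZZZ" pushes those listings after the ones that do have a name. Each comparison returns 0 when its two values are equal, and 0 is false, so or'ing the criteria together with "||" means the next test is only consulted when everything before it was a tie. That produces the ordering I want. The final "uc($aa) cmp uc($bb)" just picks up any differences should all the other criteria be equal.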
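
The same idea is easier to see on a cut-down example. This is a sketch only: the three-field records (state|company|two-digit year) and the subroutine name by_state_year_company are invented for illustration and are not the real Consultants List format.

```perl
use strict;
use warnings;

# Hypothetical records: state|company|two-digit year
my @records = (
    "Ohio|Acme|03",
    "Maine|Zeta|04",
    "Ohio|Beta|04",
    "Maine|Alpha|04",
);

sub by_state_year_company {
    my ($astate, $acompany, $ayear) = split /\|/, $a;
    my ($bstate, $bcompany, $byear) = split /\|/, $b;

    return $astate   cmp $bstate      # state, ascending
        || $byear    <=> $ayear       # year, descending ($b comes first)
        || $acompany cmp $bcompany;   # company, ascending
}

my @sorted = sort by_state_year_company @records;
print "$_\n" for @sorted;
# Maine|Alpha|04
# Maine|Zeta|04
# Ohio|Beta|04
# Ohio|Acme|03
```

Within Ohio, the "04" record comes ahead of the "03" one even though "Acme" sorts before "Beta", because the company comparison is only reached when the state and the year both tie.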




© Tony Lawrence


