Why I love Perl

This article is written for people who have at least some experience writing shell scripts or who have at least a basic understanding of another programming or scripting language. To understand it, you will need to have Perl installed so that you can test these ideas for yourself and see what happens.

I'm a fairly recent convert to Perl, having only started using it a few years ago. Switching to something new is always somewhat uncomfortable; there's new syntax to learn, and sometimes whole new ways of doing things. That was certainly the case with Perl, but the pain was offset by the sheer joy of being able to do so many formerly clumsy tasks so simply and elegantly.

Let's dispose of one thing first: I'm not a Perl expert. I'm not an expert at anything; so many things in the world catch my attention that I can never spend the time necessary to become really proficient at any one of them. So I am a Perl dabbler: I write a lot of my scripts with it, but I don't for a minute pretend that these are shining examples of Perl at its best.

However, I have learned a few things, and if you are getting ready to start using Perl, you might find my experiences useful.

Those wonderful <>'s

Let's start with a really simple program that just emulates "cat".

#!/usr/bin/perl5
while (<>) {
  print $_;
}
 

Never mind that "$_" for now; we'll get to that later. For now, just accept that it's the line read. See those <> inside the ()'s? That's the entire magic. That will read data from standard input or from a file given on the command line. That means you can use this as a filter or give it an argument; all of these do the same thing:

cat.pl < somefile 
cat somefile | cat.pl 
cat.pl somefile
 

That's pretty cool all by itself. Most languages would make you jump through hoops to do just that. But here's the most wonderful part: you can give it multiple filenames

cat.pl file1 file2 file3
 

and those magic <>'s will just keep on reading with absolutely no effort on your part. If you don't need to, there's no reason to pay any attention at all to the arguments; Perl handles them for you.

If you do need to know when one file closes and another opens, the "eof" command will tell you. Try this with multiple files:

#!/usr/bin/perl5
while (<>) {
  print $_;
  print "--------------------------\n" if eof;
}
 

You can even get the file names if you want them:

#!/usr/bin/perl5
while (<>) {
  print $_;
  print "--- End of $ARGV ----\n" if eof;
}
 

Are you starting to like this? It gets better. Those angle brackets have more magic: they can read an entire file in one gulp. You can do this, for example:

#!/usr/bin/perl5
@files=<>;
print @files;
 

Everything got read into "@files", which is an array with one line of input per element. Here we just printed it, but there's much more you can do.
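As one illustration of what you might do once the lines are in an array, here is a minimal sketch that counts and sorts them. The sub name summarize_lines and the sample data are my own inventions for the example; in a real script the array would come from <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines (as <> returns them in
# list context) and returns the line count followed by a sorted copy.
sub summarize_lines {
    my @lines  = @_;
    my @sorted = sort @lines;          # plain string sort
    return (scalar(@lines), @sorted);
}

# Stand-in for "my @lines = <>;" so the sketch runs without any input.
my @lines = ("pear\n", "apple\n", "fig\n");
my ($count, @sorted) = summarize_lines(@lines);
print "$count lines\n";
print @sorted;
```

Swap the stand-in array for <> and the same few lines will work on any file or pipe.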

There's more magic in those angle brackets, too. Take a look at this little snippet:

while (<[A-Z]*/*.html [a-e]*.html [g-z]*.html>) {
...
}
 

That loops through the names of files matched by the wildcards. What could be easier?
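The angle-bracket wildcard form is shorthand for Perl's built-in glob function. The sketch below spells that out; the sub name matching_files is made up, and the directory and files are created on the fly purely so the example can run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: names in $dir matching the wildcard $pattern, sorted.
# glob splits its argument on whitespace, so this assumes $dir has no spaces.
sub matching_files {
    my ($dir, $pattern) = @_;
    my @found = glob("$dir/$pattern");
    return sort @found;
}

# Build a throwaway directory with a few empty files to match against.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.html b.html notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close $fh;
}

print "$_\n" for matching_files($dir, "*.html");
```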

What if you actually want to open a specific file? Still easy:

open(MYFILE,".profile") or die "Can't open .profile: $!";
while (<MYFILE>) {
...
}
 

That "open" isn't limited to files. Here's something you'll see a lot of:

open(MAIL,"|/usr/bin/mail myaddress\@mydomain.com") or die "Can't run mail: $!";
print MAIL "Special message from a Perl program\n";
close MAIL;
 

In general, Perl goes out of its way to make things easy for you. Look at this sequence:

open(INFOFILE,"totalclicks");
$totalclicks=<INFOFILE>;
open(INFOFILE,"totalhits");
$totalhits=<INFOFILE>;
open(INFOFILE,"costperyear");
$costperyear=<INFOFILE>;
 

Did you notice that I didn't bother to "close INFOFILE"? Perl assumes that if you are opening the same filehandle again, you must want to close the file you had open previously, so it just does it: no whining, no crashing out, no nagging.
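If you find yourself reading one value from each of several files, the same job can be wrapped in a small sub. This is only a sketch in a more recent style (three-argument open with a lexical filehandle, which Perl closes when the variable goes out of scope). The sub name first_line is my own, and the chomp assumes you don't want the trailing newline in the value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first line of a file, without its newline.
sub first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $line = <$fh>;
    chomp $line if defined $line;   # an empty file yields undef
    return $line;                   # $fh is closed when it goes out of scope
}

# Demo with a throwaway file standing in for "totalclicks".
my ($out, $file) = tempfile(UNLINK => 1);
print $out "1234\n";
close $out;

my $totalclicks = first_line($file);
print "$totalclicks\n";
```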

That's true throughout the entire language. As another example, Perl makes no hard and fast distinction between numbers and strings that look like numbers. If you have "713" in a variable, you can treat it as a number or a string and Perl will do the right thing:

$whatisit="713";
$whatisit++;
print $whatisit;
# prints 714
print "\n";
$whatisit .= " apples";
print $whatisit;
# prints '714 apples'
print "\n";
$whatisit++;
print $whatisit;
# back to just a number again: 715
print "\n";
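What decides between "number" and "string" is the operator you use, not the variable. A small sketch of that idea, with values invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $value = "10";

my $sum    = $value + 5;    # + is numeric, so this is 15
my $joined = $value . 5;    # . is string concatenation, so this is "105"
print "$sum $joined\n";

# Comparisons work the same way: == compares as numbers, eq/ne as strings.
print "numerically equal\n" if "10" == 10.0;
print "different strings\n" if "10" ne "10.0";
```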
 

Easy arrays

Perl has two kinds of arrays, and you are going to love them. The first kind is the traditional type you might know from Basic or C; it's indexed by numbers. This should make sense to anyone who's worked with arrays in any other language:

@month=("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec");
print "$month[11]\n";
# prints Dec (array starts at 0, which is Jan)
 

The other kind of array is a "hash". If you know "awk", you already know about these, but if not, this might give you the idea:

%names= (
"scotest" => "Unix Skills Test",
"linuxtest" => "Linux Skills Test",
"quickppp" => "PPP HOW-TO",
"ipfilter" => "IPFILTER Firewalls",
);
print $names{"scotest"};
 

At first, this is confusing, because we refer to the array in two different ways, using "@month" for the whole array and "$month[somenumber]" for a particular element. Hashes are worse, because they use "%arrayname" when we're referring to the whole thing and "$arrayname{some_element}" for one element (notice the squiggly brackets).

Here's how I remembered the difference when I first started Perl. Square brackets are "square", or "conservative", so they are the old, traditional arrays. An element is "at" a particular position in such an array, so "@" is its type. "Hash", on the other hand, is all ground up; the brackets get distorted by the grinding, so they are squiggly. And if you use your imagination to squish an "@" symbol, you might get a "%".

None of that helps with learning to use the "$" sign when you want an element. You'll just have to get used to it.
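One way to think about it, offered as a rule of thumb rather than gospel: the symbol in front describes what you get back, and a single element is a single value, hence "$". The sketch below uses cut-down copies of the arrays from above and a few everyday operations on them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @month = ("Jan", "Feb", "Mar");
my %names = ("scotest" => "Unix Skills Test");

my $count = @month;        # an array in scalar context gives its length
my $last  = $month[-1];    # negative indexes count from the end

$names{"linuxtest"} = "Linux Skills Test";    # add one element to the hash
my $known = exists $names{"quickppp"} ? "yes" : "no";

print "$count $last $known\n";    # prints: 3 Mar no
```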

If that were all there was to arrays, they'd be useful, but Perl also gives you some great ways to loop through them. Traditional, numerically indexed arrays are easy, of course. But how do you run through all the elements in a hash array?

foreach (%names ) {
  print "$_\n";
}
 

That's all it takes. It works, but it's a little strange, and not very useful (try it). Of course, Perl has a better way:

foreach (keys %names ) {
  print "$_ is $names{$_}\n";
}
 

That's better, but this is better yet:

foreach (sort keys %names ) {
  print "$_ is $names{$_}\n";
}
 

And how about this?

foreach (reverse sort keys %names ) {
  print "$_ is $names{$_}\n";
}
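
The sort can also take a comparison block, which makes it possible to order the keys by their values instead. A short sketch using the same %names data; the array name @by_value is just something I picked for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names = (
    "scotest"   => "Unix Skills Test",
    "linuxtest" => "Linux Skills Test",
    "quickppp"  => "PPP HOW-TO",
    "ipfilter"  => "IPFILTER Firewalls",
);

# Inside the block, $a and $b are the two keys being compared;
# looking each one up in the hash sorts by value rather than by key.
my @by_value = sort { $names{$a} cmp $names{$b} } keys %names;

foreach (@by_value) {
    print "$_ is $names{$_}\n";
}
```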
 

Pattern Matching

Perl's pattern matching is an absolute joy. It can be a little confusing at first, but once the concept clicks in, it becomes natural, and so much easier than anything else you've ever worked with. If you are used to "sed" and "awk", Perl is those tools super-charged. Let's look through a file for a certain word:

while (<>) {
  print "$_" if /\bhello\b/i;
}
 

That "\b" is a neat little helper. It says that "hello" has to be at a "word boundary", which is not necessarily a space. It could be the beginning of a line, the end, or it could follow punctuation. The little "i" says "ignore case". There are more little modifiers like that, but I'm not going to cover them here.

In this case, the /\bhello\b/ tests against "$_" (which I still haven't fully explained). It can test any variable, though:

foreach $line (@files) {
  print $line if $line =~ /hello/i;
}
 

That weird little "=~" is what makes the match test work against $line. Did you notice the "do something if.." way of testing? You could also do:

if ( $line =~ /hello/i ) {
  print $line;
}
 

There's another thing to notice about that: I didn't use "$_". That's because the loop doesn't set it when I specifically say "foreach $line": each element lands in "$_" only when I don't specify a variable (as in the earlier examples).

There are many places where you can just assume "$_" holds the current item, but you do have to watch out for things like this that put the value somewhere else.
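
To make the difference concrete, here is a small side-by-side sketch. The sample lines and the array names @implicit and @explicit are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files = ("hello world\n", "goodbye\n", "Hello again\n");

# No loop variable: each element lands in $_, and a bare /.../ tests $_.
my @implicit;
foreach (@files) {
    push @implicit, $_ if /hello/i;
}

# Named loop variable: the match has to be bound to it with =~.
my @explicit;
foreach my $line (@files) {
    push @explicit, $line if $line =~ /hello/i;
}

print @implicit;
print @explicit;
```

Both loops pick out the same two lines; the only difference is where the current element lives.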

Substitutions

Add an "s" in front and it starts working like "sed":

$line =~ s/hello/greetings/;
 

will change the first "hello" in $line to "greetings", if there is one (add a "g" after the final slash to change every occurrence). But it's really much more powerful than that. I don't have the space in this article to go into the incredible power of Perl's pattern matching and substitution features, but believe me, it is just incredible. I'll just give one little example without explanation:

Some of you may use "uncgi" for your cgi scripts. That's fine, but it's so easy to do in Perl. Here's what I use for POST scripts:

#!/usr/bin/perl5
$query=""; # simply to prevent warning in read about uninitialized
read (STDIN,$query,$ENV{'CONTENT_LENGTH'});
@pairs=split(/&/,$query);
foreach $keyv (@pairs) {
        ($key,$value)=split(/=/,$keyv);
        $value =~ tr/+/ /;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C",hex($1))/eg;
        $formdata{$key}=$value;
        }
foreach $key (keys %formdata) {
 $$key=$formdata{$key};
}
 

That works very much like "uncgi". For example, if you have a form element called "search", its value will be in "$search", etc. It's that "$$key=" that pulls off that trick. But it's that

        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C",hex($1))/eg;
 

that does most of the work. As I said, I'm not going to explain it here, but if you know what the POST method delivers to your script, you should really appreciate the power.

(Actually, it's even easier, because what you'd really use is the CGI module, which means you don't have to worry about any of it, but this shows you how you COULD do such things.)
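
For the curious, here is a rough sketch of what that substitution does, pulled out into a sub so it can be tried on its own. The sub name url_decode and the sample string are mine, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the two decoding steps from the loop above.
sub url_decode {
    my ($value) = @_;
    $value =~ tr/+/ /;    # form encoding sends spaces as "+"
    # Each %XX is a percent sign plus two hex digits. The parentheses
    # capture the digits in $1, hex() converts them to a number, and
    # pack("C", ...) turns that number into the single byte it names.
    # /e runs the replacement as code; /g repeats it for every match.
    $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
    return $value;
}

print url_decode("Perl+is+fun%21"), "\n";    # prints: Perl is fun!
```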

For scripts that get passed arguments on the command line, it's even easier: Perl stores all the arguments in an array called @ARGV. Therefore, you can refer to $ARGV[0] to get the first, or you can extract the arguments and remove them from the array with something like

$first=shift @ARGV;
$second=shift @ARGV;
 

Or you can just run through the whole thing with

foreach (@ARGV) {
  print "$_\n";
}
 

or:

# defined() keeps an argument of "0" from ending the loop early
print "$p\n" while defined($p = shift @ARGV);
 

That's Perl: there are a dozen ways to do it, and you use what makes sense at the time.


Split and join

You have a file like this that you want to extract elements from:

field|more|stuff
one|this|that
two|the other|more data
 

Piece of cake:

while (<>) {
 chomp;   # drop the trailing newline so it doesn't end up in the last field
 @stuff=split /\|/, $_;
 print "$stuff[0] $stuff[2]\n";
}
 

The opposite of split is join:

while (<>) {
 chomp;   # drop the trailing newline; we add one back when printing
 @stuff=split /\|/, $_;
 $f=join "+",@stuff;
 print "$f\n";
}
 

That changes the "|" separators to "+"'s.
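
Here is the same idea as a small sub you can call on one line at a time. It is only a sketch: the name change_separator is made up, and the -1 passed to split is my addition so that empty fields at the end of a line are kept instead of silently dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: re-join the "|"-separated fields of one line with $sep.
sub change_separator {
    my ($line, $sep) = @_;
    chomp $line;                              # drop the newline before splitting
    my @stuff = split /\|/, $line, -1;        # -1 keeps trailing empty fields
    return join $sep, @stuff;
}

print change_separator("two|the other|more data\n", "+"), "\n";
# prints: two+the other+more data
```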

So much more

You could write useful programs with just the few little ideas you've learned here. That is, in fact, one of the other things I love about Perl: you can get started using it with a very minimal understanding and with lots of things still confusing you. Many of the early Perl programs I wrote did things like this:

print "<p align=\"center\"><a href=\"/index.html\">
<img src=\"/image21.gif\" BORDER=0 WIDTH=69 
HEIGHT=76></a> <br><p align=center><font size=2>
<b>A.P. Lawrence Home</b></font>";
 

There's a lot of confusing quoting in that print statement, and (of course) there are easier ways to do it:

print <<EOF;
<p align="center"><a href="http://aplawrence.com/index.html">
<img src="/image21.gif" BORDER=0 WIDTH=69 
HEIGHT=76></a> <br><p align=center><font size=2>
<b>A.P. Lawrence Home</b></font>
EOF
 

Or even:

print q?<p align="center"><a href="http://aplawrence.com/index.html">
<img src="/image21.gif" BORDER=0 WIDTH=69 
HEIGHT=76></a> <br><p align=center><font size=2>
<b>A.P. Lawrence Home</b></font>?;
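
The two forms are not quite interchangeable. A here-document with a bare tag such as EOF interpolates variables the way double quotes do, while q?...? takes its text literally, like single quotes. A minimal sketch, with the variable $title invented for the demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $title = "A.P. Lawrence Home";

# Bare-tag here-document: $title is replaced by its value.
my $interpolated = <<EOF;
<b>$title</b>
EOF

# q?...? behaves like single quotes: the text is kept exactly as typed.
my $literal = q?<b>$title</b>?;

print $interpolated;
print "$literal\n";
```

Which one you want depends on whether the text is supposed to pick up values from your program.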
 

If you want to get started with Perl, you'll need some books. See these for starters:

Learning Perl
Programming Perl
Perl Cookbook
Advanced Perl





© Tony Lawrence


