


Why I love Perl

March 2000



This article is written for people who have at least some experience writing shell scripts or who have at least a basic understanding of another programming or scripting language. To understand it, you will need to have Perl installed so that you can test these ideas for yourself and see what happens.

I'm a fairly recent convert to Perl, having only started using it a few years ago. Switching to something new is always somewhat uncomfortable; there's new syntax to learn, and sometimes whole new ways of doing things. That was certainly the case with Perl, but the pain was offset by the sheer joy of being able to do so many formerly clumsy tasks so simply and elegantly.

Let's dispose of one thing first: I'm not a Perl expert. I'm not an expert at anything; so many things in the world catch my attention that I can never spend the time necessary to become really proficient at any one of them. So I am a Perl dabbler: I write a lot of my scripts with it, but I don't for a minute pretend that these are shining examples of Perl at its best.

However, I have learned a few things, and if you are getting ready to start using Perl, you might find my experiences useful.

Those wonderful <>'s

Let's start with a really simple program that just emulates "cat".

#!/usr/bin/perl5
while (<>) {
  print $_;
}
 

Never mind that "$_" for now; we'll get to that later. For now, just accept that it's the line read. See those <> inside the ()'s? That's the entire magic. That will read data from standard input or from a file given on the command line. That means you can use this as a filter or give it an argument; all of these do the same thing:

cat.pl < somefile 
cat somefile | cat.pl 
cat.pl somefile
 

That's pretty cool all by itself. Most languages would make you jump through hoops to do just that. But here's the most wonderful part: you can give it multiple filenames

cat.pl file1 file2 file3
 

and those magic <>'s will just keep on reading with absolutely no effort on your part. If you don't need to, there's no reason to pay any attention at all to the arguments; Perl handles them for you.

If you do need to know when one file closes and another opens, the "eof" function will tell you. Written without parentheses, as it is here, it is true at the end of each file. Try this with multiple files:

#!/usr/bin/perl5
while (<>) {
  print $_;
  print "--------------------------\n" if eof;
}
 

You can even get the file names if you want them:

#!/usr/bin/perl5
while (<>) {
  print $_;
  print "--- End of $ARGV ----\n" if eof;
}
 

Are you starting to like this? It gets better. Those angle brackets have more magic: they can read an entire file in one gulp. You can do this, for example:

#!/usr/bin/perl5
@files=<>;
print @files;
 

Everything got read into "@files", which is an array with one element per line of input (despite the name, it holds lines, not files). Here we just printed it, but there's much more you can do.
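
As one sketch of what that "much more" might look like, the script below slurps its input in list context and then works on the array as a whole. To keep it runnable without any input files, it reads from an in-memory string through a filehandle I've called $in; that stand-in and the variable names are my assumptions, and swapping <$in> for <> would give the command-line behavior described above.

```perl
use strict;
use warnings;

# Stand-in data so the sketch runs without any input files.
my $text = "first\nsecond\nthird\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!";

# In list context the angle brackets return every remaining line.
my @lines = <$in>;
close $in;

# An array in scalar context yields its element count.
my $count = @lines;
print "Read $count lines\n";

# Print the lines last-to-first.
print reverse @lines;
```

Reading everything at once is convenient, but it assumes the input fits comfortably in memory; the line-at-a-time "while" loop has no such limit.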





There's more magic in those angle brackets, too. Take a look at this little snippet:

while (<[A-Z]*/*.html [a-e]*.html [g-z]*.html>) {
...
}
 

That loops through the names of files matched by the wildcards. What could be easier?
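
The wildcard form of the angle brackets has a function spelling, glob(), which can be easier to read in longer scripts. Here is a minimal sketch of it; the scratch directory and the file names in it are invented purely so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Build a scratch directory with a few empty files to match against.
# (The file names are made up for this illustration.)
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(index.html about.html notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $name: $!";
    close $fh;
}

# glob() is the function form of the wildcard angle brackets.
my @pages = sort map { basename($_) } glob("$dir/*.html");
print "$_\n" for @pages;
```

Only the two .html names are printed; notes.txt doesn't match the pattern.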

What if you actually want to open a specific file? Still easy:

# $! holds the reason the open failed
open(MYFILE,".profile") or die "Can't open .profile: $!";
while (<MYFILE>) {
...
}
 

That "open" isn't limited to files. Here's something you'll see a lot of:

open(MAIL,"|/usr/bin/mail myaddress\@mydomain.com") or die "Can't run mail: $!";
print MAIL "Special message from a Perl program\n";
close MAIL;
 

In general, Perl goes out of its way to make things easy for you. Look at this sequence:

open(INFOFILE,"totalclicks");
$totalclicks=<INFOFILE>;
open(INFOFILE,"totalhits");
$totalhits=<INFOFILE>;
open(INFOFILE,"costperyear");
$costperyear=<INFOFILE>;
 

Did you notice that I didn't bother to "close INFOFILE"? Perl assumes that if you are opening the same filehandle again, you must want to close the file you had open previously, so it just does it: no whining, no crashing out, no nagging. (Each value read this way still carries its trailing newline; "chomp" removes it.)
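
Perl's forgiveness doesn't extend to files that aren't there, so in anything longer than a quick script it may be worth checking each open. The following is only a sketch of one way to do that: the helper name first_line is hypothetical, and the scratch file stands in for "totalclicks" so the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the first line of a file without its newline.
sub first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line if defined $line;   # an empty file yields undef
    return $line;
}

# Scratch file standing in for "totalclicks".
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/totalclicks") or die "Can't write: $!";
print $out "1234\n";
close $out;

my $totalclicks = first_line("$dir/totalclicks");
print "clicks: $totalclicks\n";
```

With the newline chomped off, $totalclicks can go straight into arithmetic or a print statement without any surprises.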

That easygoing attitude runs through the entire language. As another example, Perl makes no hard and fast distinction between numbers and strings that look like numbers. If you have "713" in a variable, you can treat it as a number or a string and Perl will do the right thing:

$whatisit="713";
$whatisit++;
print $whatisit;
# prints 714
print "\n";
$whatisit .= " apples";
print $whatisit;
# prints '714 apples'
print "\n";
$whatisit++;
print $whatisit;
# back to just a number again: 715
print "\n";
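
Because a value can be either a number or a string, it is the operator that decides which one you get. A short sketch of that idea, with variable names of my own choosing:

```perl
use strict;
use warnings;

my $x = "713";
my $y = "713.0";

# == compares as numbers: 713 and 713.0 are the same number.
my $same_number = ($x == $y) ? "yes" : "no";

# eq compares as strings: the characters differ.
my $same_string = ($x eq $y) ? "yes" : "no";

print "numerically equal: $same_number\n";
print "equal as strings:  $same_string\n";

# sort compares as strings unless it is told otherwise.
my @nums      = (10, 9, 100);
my @by_string = sort @nums;
my @by_number = sort { $a <=> $b } @nums;
print "@by_string\n";
print "@by_number\n";
```

The string sort puts 100 ahead of 9 because "1" sorts before "9"; the numeric comparison with <=> gives the order you'd expect for numbers.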
 

Easy arrays

Perl has two kinds of arrays, and you are going to love them. The first kind is the traditional type you might know from Basic or C; it's indexed by numbers. This should make sense to anyone who's worked with arrays in any other language:

@month=("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec");
print "$month[11]\n";
# prints Dec (array starts at 0, which is Jan)
 

The other kind of array is a "hash". If you know "awk", you already know about these, but if not, this might give you the idea:

%names= (
"scotest" => "Unix Skills Test",
"linuxtest" => "Linux Skills Test",
"quickppp" => "PPP HOW-TO",
"ipfilter" => "IPFILTER Firewalls",
);
print $names{"scotest"};
 

At first, this is confusing, because we refer to the array in two different ways, using "@month" for the whole array and "$month[somenumber]" for a particular element. Hashes are worse, because they use "%arrayname" when we're referring to the whole thing and "$arrayname{some_element}" for one element (notice the squiggly brackets).

Here's how I remembered the difference when I first started Perl. Square brackets are "square", or "conservative", so they are the old, traditional arrays. An element is "at" a particular position in such an array, so "@" is its type. "Hash", on the other hand, is all ground up; the brackets get distorted by the grinding, so they are squiggly. And if you use your imagination to squish an "@" symbol, you might get a "%".

None of that helps with learning to use the "$" sign when you want an element. You'll just have to get used to it.
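
Seeing the different spellings side by side may help them sink in. This is only an illustrative sketch that reuses shortened versions of the arrays from above:

```perl
use strict;
use warnings;

my @month = ("Jan", "Feb", "Mar");
my %names = ("scotest" => "Unix Skills Test", "quickppp" => "PPP HOW-TO");

my $first   = $month[0];             # one element: $ and square brackets
my $last    = $month[-1];            # negative indexes count from the end
my $howmany = scalar(@month);        # whole array in scalar context: its length
my $title   = $names{"quickppp"};    # one hash element: $ and curly brackets
my $pairs   = scalar(keys %names);   # number of keys in the hash

print "$first $last $howmany\n";
print "$title ($pairs entries)\n";
```

One way to think about the "$" is that it says you are getting a single value back, whichever kind of container it came out of.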

If that were all there was to arrays, they'd be useful enough, but Perl also gives you some great ways to loop through them. Traditional, numerically indexed arrays are easy, of course. But how do you run through all the elements in a hash?

foreach (%names ) {
  print "$_\n";
}
 

That's all it takes. It works, but it's a little strange, and not very useful (try it): you get each key followed by its value, one per line, with the pairs in no particular order. Of course, Perl has a better way:

foreach (keys %names ) {
  print "$_ is $names{$_}\n";
}
 

That's better, but this is better yet:

foreach (sort keys %names ) {
  print "$_ is $names{$_}\n";
}
 

And how about this?

foreach (reverse sort keys %names ) {
  print "$_ is $names{$_}\n";
}
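
Sorting doesn't have to be by key. As a sketch of one more variation, a comparison block lets you order the keys by the values they point to; the hash is a shortened copy of the one above and the name @by_title is mine.

```perl
use strict;
use warnings;

my %names = (
    "scotest"   => "Unix Skills Test",
    "linuxtest" => "Linux Skills Test",
    "quickppp"  => "PPP HOW-TO",
);

# Order the keys by comparing the values they point to.
my @by_title = sort { $names{$a} cmp $names{$b} } keys %names;

foreach my $key (@by_title) {
    print "$names{$key} ($key)\n";
}
```

Inside the block, $a and $b are the two items being compared; "cmp" compares them as strings, and "<=>" would compare them as numbers.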
 

Pattern Matching

Perl's pattern matching is an absolute joy. It can be a little confusing at first, but once the concept clicks, it becomes natural, and so much easier than anything else you've ever worked with. If you are used to "sed" and "awk", Perl is those tools super-charged. Let's look through a file for a certain word:

while (<>) {
  print "$_" if /\bhello\b/i;
}
 

That "\b" is a neat little helper. It says that "hello" has to be at a "word boundary", which is not necessarily a space. It could be the beginning of a line, the end, or it could follow punctuation. The little "i" says "ignore case". There are more little modifiers like that, but I'm not going to cover them here.
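
To see what the word boundary buys you, it may help to run the same pattern over a few sample strings. The strings here are invented for the illustration, and "grep" (Perl's list filter, not the shell command) is used only to collect the matches.

```perl
use strict;
use warnings;

my @lines = (
    "Hello, world",
    "Say hello.",
    "Othello is a play",
    "helloworld",
);

# Keep only lines where "hello" stands alone as a word, in any case.
my @hits = grep { /\bhello\b/i } @lines;
print "$_\n" for @hits;
```

Only the first two strings get through: in "Othello" there is no boundary in front of the "h", and in "helloworld" there is none after the "o".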

In this case, the /\bhello\b/ tests against "$_" (which I still haven't fully explained). It can test any variable, though:

foreach $line (@files) {
  print $line if $line =~ /hello/i;
}
 

That weird little "=~" is what makes the match test work against $line. Did you notice the "do something if.." way of testing? You could also do:

if ( $line =~ /hello/i ) {
  print $line;
}
 

There's another thing to notice about that: I didn't use "$_". That's because the loop doesn't fill it in when I specifically say "foreach $line"; each element goes into "$line" instead, and "$_" is set only when I don't name a variable (as in the earlier examples).

There are many places where you can just assume "$_" will hold the current item, but you do have to watch out for constructs like this that bypass it.
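
A small experiment makes the point concrete. In this sketch "$_" is given a marker value before the loop so you can see that a loop with a named variable leaves it alone; the sample data is made up.

```perl
use strict;
use warnings;

my @files = ("hello there\n", "goodbye\n");
my @seen;

$_ = "untouched";
foreach my $line (@files) {
    # The loop fills $line, not $_, so a bare /hello/ would test "untouched".
    push @seen, $line if $line =~ /hello/i;
}
print @seen;
print "\$_ is still '$_'\n";
```

Had the test been written as a bare /hello/i, it would have been checked against the marker value every time and nothing would have matched.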

Substitutions

Add an "s" in front and it starts working like "sed":

$line =~ s/hello/greetings/;
 

will change the first "hello" in $line to "greetings" (add a "g" after the last slash to change every occurrence). But it's really much more powerful than that. I don't have the space in this article to go into Perl's pattern matching and substitution features in any depth, but believe me, they are incredible. I'll just give one little example without explanation:

Some of you may use "uncgi" for your cgi scripts. That's fine, but it's so easy to do in Perl. Here's what I use for POST scripts:

#!/usr/bin/perl5
$query=""; # simply to prevent warning in read about uninitialized
# POST data arrives on standard input; CONTENT_LENGTH says how many bytes
read (STDIN,$query,$ENV{'CONTENT_LENGTH'});
@pairs=split(/&/,$query);
foreach $keyv (@pairs) {
        # the limit of 2 keeps an empty value as "" instead of undef
        ($key,$value)=split(/=/,$keyv,2);
        $value =~ tr/+/ /;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C",hex($1))/eg;
        $formdata{$key}=$value;
        }
# create a variable named after each form field
foreach $key (keys %formdata) {
 $$key=$formdata{$key};
}
 

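That works very much like "uncgi". For example, if you have a form element called "search", its value will be in "$search", etc. It's that "$$key=" that pulls off that trick (it is a symbolic reference, so it won't run under "use strict", and it lets whoever submits the form set any global variable in your script by name). But it's that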

        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C",hex($1))/eg;
 

that does most of the work. As I said, I'm not going to explain it here, but if you know what the POST method delivers to your script, you should really appreciate the power.

(Actually, it's even easier than that, because what you'd really use is the CGI module, which means you don't have to worry about any of it, but this shows you how you COULD do such things.)
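
For readers who do want a peek inside that substitution, here is a sketch that runs the same two decoding steps on a single value outside of any CGI environment. The sample string is invented, and capturing the return value in $count is my addition to show how many escapes were replaced.

```perl
use strict;
use warnings;

my $value = "A.P.+Lawrence+%26+friends%21";

# Form encoding turns spaces into "+", so turn them back first.
$value =~ tr/+/ /;

# Each %XX is a byte written as two hex digits. The /e modifier runs the
# replacement as Perl code and /g repeats it for every match in the string.
my $count = ($value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg);

print "$value\n";
print "decoded $count escapes\n";
```

Here %26 becomes "&" and %21 becomes "!": hex() turns the two captured digits into a number, and pack("C", ...) turns that number into the single character with that code.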

For scripts that get passed arguments on the command line, it's even easier: Perl stores all the arguments in an array called @ARGV. Therefore, you can refer to $ARGV[0] to get the first one, or you can extract the arguments and remove them from the array with something like

$first=shift @ARGV;
$second=shift @ARGV;
 

Or you can just run through the whole thing with

foreach (@ARGV) {
  print "$_\n";
}
 

or:

# defined() so that an argument of "0" doesn't end the loop early
print "$p\n" while defined($p = shift @ARGV);
 

That's Perl: there are a dozen ways to do it, and you use what makes sense at the time.
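
Putting those pieces together, a script might peel a flag off the argument list and keep everything else. This is only a sketch: the "-v" flag and the file name are hypothetical, and @ARGV is filled in by hand so the example runs without real command-line arguments.

```perl
use strict;
use warnings;

# Stand-in for real command-line arguments so the sketch runs on its own.
@ARGV = ("-v", "0", "report.txt");

my $verbose = 0;
my @rest;

# Testing with defined() keeps the loop going when an argument is "0".
while (defined(my $arg = shift @ARGV)) {
    if ($arg eq "-v") {
        $verbose = 1;
    } else {
        push @rest, $arg;
    }
}

print "verbose: $verbose\n";
print "other arguments: @rest\n";
```

For anything beyond a flag or two, the standard Getopt::Std and Getopt::Long modules handle this sort of parsing for you.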


Split and join

You have a file like this that you want to extract elements from:

field|more|stuff
one|this|that
two|the other|more data
 

Piece of cake:

while (<>) {
 chomp;   # remove the newline so it doesn't ride along on the last field
 @stuff=split /\|/, $_;
 print "$stuff[0] $stuff[2]\n";
}
 

The opposite of split is join:

while (<>) {
 chomp;   # remove the newline so it isn't printed twice
 @stuff=split /\|/, $_;
 $f=join "+",@stuff;
 print "$f\n";
}
 

That changes the "|" separators to "+"'s.
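
Since split hands back an ordinary list, anything you can do to a list can happen between the split and the join. As a sketch, this version reverses the fields before gluing them back together; the records are copied from the sample file above and held in an array so the example runs without it.

```perl
use strict;
use warnings;

my @records = ("field|more|stuff\n", "two|the other|more data\n");
my @out;

foreach my $record (@records) {
    chomp $record;                      # drop the trailing newline
    my @fields = split /\|/, $record;   # "|" is special in patterns, so escape it
    push @out, join(" / ", reverse @fields);
}
print "$_\n" for @out;
```

The first argument to split is a pattern while the first argument to join is a plain string, which is why only the split needs the backslash.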

So much more

You could write useful programs with just the few little ideas you've learned here. That is, in fact, one of the other things I love about Perl: you can get started using it with a very minimal understanding and with lots of things still confusing you. Many of the early Perl programs I wrote did things like this:

print "<p align=\"center\"><a href=\"/index.html\">
<img src=\"/image21.gif\" BORDER=0 WIDTH=69 
HEIGHT=76></a> <br><p align=center><font size=2>
<b>A.P. Lawrence Home</b></font>";
 

There's a lot of confusing quoting in that print statement, and (of course) there are easier ways to do it:

print <<EOF;
<p align="center"><a href="/index.html">
<img src="/image21.gif" BORDER=0 WIDTH=69 
HEIGHT=76></a> <br><p align=center><font size=2>
<b>A.P. Lawrence Home</b></font>
EOF
 

Or even:

print q?<p align="center"><a href="/index.html">
<img src="/image21.gif" BORDER=0 WIDTH=69 
HEIGHT=76></a> <br><p align=center><font size=2>
<b>A.P. Lawrence Home</b></font>?;
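
One difference between those two forms is worth knowing: the "here document" with a bare EOF behaves like a double-quoted string and fills in variables, while q (or a here document whose terminator is in single quotes) leaves the text exactly as typed. A brief sketch, using a variable name I've made up:

```perl
use strict;
use warnings;

my $site = "A.P. Lawrence Home";

# An unquoted (or double-quoted) terminator interpolates variables.
my $with = <<EOF;
<b>$site</b>
EOF

# A single-quoted terminator, like q, leaves the text exactly as typed.
my $without = <<'EOF';
<b>$site</b>
EOF

print $with;
print $without;
```

The first print shows the site name inside the bold tags; the second shows the literal characters "$site". If you want q-style convenience with interpolation, qq is the double-quoted counterpart.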
 

If you want to get started with Perl, you'll need some books. See these for starters:

Learning Perl
Programming Perl
Perl Cookbook
Advanced Perl




