Jim Mohr's SCO Companion


Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.

Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/

5 Running An Internet Server

Setting up an Internet server is just the first step. Providing web pages or files to download via ftp may make for an interesting site, but you need to get people to keep coming back. The best way to do that is to make the site interesting as well as interactive. In this chapter, we are going to talk about some of the ways you can make a really interesting site.

One of the features of many sites that makes them interesting is the fact that they are interactive. The visitor inputs some information and gets back a response. This is handled by the Common Gateway Interface (CGI). Essentially, the only criterion for a CGI program is that it understands standard input and standard output. This means that CGI programs can be written in C or as shell scripts. Although you have a copy of the SCO dev sys on the CD, writing them in C is not always the easiest thing to do. You could write them as shell scripts, but then you lack some power. The solution is the perl scripting language. Perl has almost all the power of C, but it is a scripting language that makes development easy.
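To make this concrete, here is a minimal sketch of what a CGI program can look like in perl. The file name and the form field are made up for illustration, and the query parsing is deliberately crude (a real script would also decode URL-encoded characters):

```perl
#!/usr/bin/perl
# query.pl - minimal CGI sketch: the web server hands form data
# to us in the QUERY_STRING environment variable; we answer on stdout.
$query = $ENV{'QUERY_STRING'} || 'name=jimmo';   # demo fallback value
($label, $value) = split('=', $query);           # crude one-field parse
print "Content-type: text/plain\n\n";            # header, blank line, body
print "You sent $label = $value\n";
```

The only "interface" here is standard input, standard output and the environment, which is exactly why a scripting language is such a comfortable fit for CGI.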

Perl — The Language of the Web

If you plan to do anything serious on the Web, then I suggest that you learn perl. In fact, if you plan to do anything serious on your machine, then learning perl is also a good idea. The downside of all this is that perl is not part of the standard SCO distribution. However, it is available on the SCO Skunkware CD, or you can download it from the SCO FTP site at ftp.sco.com/Skunk2/CD-ROM/bin/perl.

Now, I am not saying that you shouldn't learn sed, awk and shell programming. Rather, I am saying that you should learn all four. Both sed and awk have been around for quite a while, so they are deeply ingrained in the thinking of most system administrators. Although you could easily find a shell script on the system that has no elements of sed or awk in it, you would be very hard pressed to find scripts that have no shell programming in them. On the other hand, most of the scripts that process information from other programs use either sed or awk. Therefore, it is likely that you will eventually come across one or the other.

Perl is another matter all together. None of the standard scripts have perl in them. This does not say anything about the relative value of perl, but rather the relative availability of it. Since it can be expected that awk and sed are available, it makes sense that they are commonly used. Perl may not be on your machine and by including it in a system shell script you might run into trouble.

In this section we are going to talk about the basics of perl. We'll go through the mechanics of creating perl scripts and the syntax of the perl language. There are many good books on perl, so I would direct you to them to get into the nitty-gritty. Here we are just going to cover the basics. Later on, we'll address some of the issues involved with making perl scripts to use on your web site.

I was once asked why one should program in perl instead of C when developing Web pages, especially since you have a copy of the SCO development system on the CD and don't need to go out and buy it before you start programming. First of all, perl is an interpreted language. That is, you do not need to compile it or link in different libraries to make it go. This (in my mind) shortens the development time, as you do not need to wait for the program to compile.

Another aspect is that there are many things built into perl that make programming for the Web easier. The name perl stands for Practical Extraction and Report Language. There are several constructs and functions in perl that make it ideal for CGI programming.

I also think it is easier to learn and to code. Whereas you would need several lines of code in C, you can accomplish the same thing in perl with just a single line. Finally, perl code is essentially the same on every system. Unless you make extensive use of the system commands and programs, there is usually no porting work required.

In most programming texts that I remember using, there was always an introduction that covered the "basics" to make sure we were all starting at the same point. Although I wish I had the space to do that with perl, I don't think it is necessary. We have already discussed most of the issues that will come into play, and those that we haven't talked about yet, we'll get to.

I am going to make assumptions as to what level of programming background you have. If you read and understood the sections on sed, awk and the shell, then you should be ready for what comes next. In this chapter, I am going to jump right in.

Let's create a perl script called hello.pl. The 'pl' extension has no real meaning, although I have seen many places where it is always used as the extension. It is more or less conventional to do this, just as text files traditionally have the ending 'txt', shell scripts the ending 'sh', and so on.

We'll start off with the traditional:

print "Hello, World!\n";

This script consists of a single perl statement, whose purpose is to output the text inside the double quotes. Each statement in perl is terminated by a semi-colon. Here we are using the perl print function to output the literal string "Hello, World!\n" (including the trailing newline). Although we don't see it, there is an implied file handle, STDOUT. The equivalent with the explicit reference would be:

print STDOUT "Hello, World!\n";

Once the script is created, we can run it in one of two ways. The first is by passing the script as an argument to the perl program, as in:

perl hello.pl

The other way is to have the system do the work for us by telling it which interpreter to use. To do this we add #!/usr/bin/perl to the top of the script, so the entire script would look like this:

#!/usr/bin/perl
print "Hello, World!\n";

Then we can make the script executable and run it like this:

chmod +x hello.pl
./hello.pl
Along with STDOUT, perl has the default file handles STDIN and STDERR. Here is a quick script to demonstrate all three, as well as to introduce a couple of familiar programming constructs:

while (<STDIN>) {
    if ( $_ eq "\n" ) {
        print STDERR "Error: \n";
    } else {
        print STDOUT "Input: $_ \n";
    }
}

Functioning the same as in C and most shells, the while line at the top says that as long as there is something coming from stdin, do the loop. Here we have the special format (<STDIN>), which tells perl where to get input. If we wanted, we could use a file handle other than STDIN. However, we'll get to that in a little bit.

One thing that you need to watch out for is that you have to include blocks of statements (such as after while or if statements) inside of the curly brackets ({}). This is different from the way you can do it in C, where a single line can follow a while or if. For example, this statement is not valid in perl:

while ( $a < $b )
    $a++;
You would need to write it something like this:

while ( $a < $b ) {
    $a++;
}
Inside of the while loop, we get to an if statement. We compare the value of the special variable $_ to see if it is empty. The variable $_ serves several functions. In this case, it represents the line we are reading from STDIN. In other cases, it represents the pattern space, as in sed. If the line we just read in is equal to the newline character (that is, just the enter key was pressed, giving a blank line), we use the print function, which has the syntax:

print [FILEHANDLE] "text_to_print";

In the first case, the file handle is STDERR; in the second, it is STDOUT. In either case, we could have left off the file handle and the output would have gone to stdout.

Each time we print a line, we need to include a newline ("\n") ourselves.

We can format the print line in different ways. In the second print statement, where the input was not a blank line, we print "Input: " before we print the line just input. Although this is a very simple way of outputting lines, it does get the job done. More complex formatting is possible with the perl printf function. Like its counterpart in C or awk, you can come up with some very elaborate outputs. We'll get into more details later.
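As a small taste, here is a sketch of printf at work; the format specifiers are the same C-style ones you may know from awk (the book title and price are invented values):

```perl
# %-10s pads the string to 10 characters, left-justified;
# %5.2f prints the number 5 wide with 2 decimal places.
$title = "Dune";          # hypothetical title
$price = 5.99;            # hypothetical price
printf "%-10s %5.2f\n", $title, $price;
```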

One of the more useful functions for processing lines of input is split. The split function is used, as its name implies, to split a line based on a field separator that you define, such as a space. The fields are then stored in an array as individual elements. So, in our example, if we wanted to input multiple words and have them parsed correctly, we could change the script to look like this:

while (<STDIN>) {
    @field = split(' ',$_);
    if ( $_ eq "\n" ) {
        print STDERR "Error: \n";
    } else {
        print STDOUT "$_ \n";
        print $field[0];
        print $field[1];
        print $field[2];
    }
}

The split function has the syntax:

@array = split(pattern, line);
where 'pattern' is our field separator and 'line' is the input line. So our line:

@field = split(' ',$_);

says to split the line we just read in (stored in $_) and use a space (' ') as the field separator. Each field is then placed into an element of the array field. The at-sign (@) is needed in front of the variable (field) to indicate it is an array. In perl, there are several types of variables. The first kind we have already met. The special variable $_ is an example of a scalar variable. Each scalar variable is preceded by a dollar-sign ($) and can contain a single value, whether a character string or a number. How does perl tell the difference? It depends on the context. Perl will behave correctly by looking at what you tell it to do with the variable. Other examples of scalars are:

$name = "jimmo";

$initial = 'j';

$answertolifetheuniverseandeverything = 42;
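Here is a short sketch of that context-dependence: the same scalar is treated as a number with a numeric operator and as a string with a string operator:

```perl
$x = "42";            # assigned as a string
$sum = $x + 8;        # numeric context: 50
$text = $x . "nd";    # string context (. concatenates): "42nd"
print "$sum $text\n";
```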

Another kind of variable is an array, as we mentioned before. If we precede a variable with a percent-sign (%), we have an array. But don't we have an array with the at-sign? Yes, so what's the difference? The difference is that arrays starting with the at-sign are referenced by numbers, while those starting with the percent-sign are referenced by a string. We’ll get to how that works as we move along.

In our example, we are using the split function to fill up the array @field. This array will be referenced by number. We see the way it is referenced in the three print statements towards the end of the script.

If our input line had a different field separator (for example, '%'), the line might look like this:

@field = split('%',$_);

In this example, we are just outputting the first three words that are input. But what if there are more words? Obviously we just add more print statements. What if there are fewer words? Now we run into problems. In fact, we run into problems when adding more print statements. The question is where do we stop? Do we set a limit on the number of words that can be input? Well, we can avoid all of those problems by letting the system count for us. By changing the script a little, we’ll get:

while (<STDIN>) {
    @field = split(' ',$_);
    if ( $_ eq "\n" ) {
        print STDERR "Error: \n";
    } else {
        foreach $word (@field){
            print $word,"\n";
        }
    }
}

We introduce the 'foreach' construct. This has the same behavior as a "for" loop. In fact, in perl, "for" and "foreach" are interchangeable, provided you have the right syntax. In this case the syntax is:

foreach $variable (@array)

Where '$variable' is our loop variable, and '@array' is the name of the array. When the script is run, the @array is expanded to its individual components. So, if we had input four fruits, our line might have looked like this:

foreach $word ('apple','banana','cherry','orange') {

Since I don't know how many elements there are in the array field, foreach comes in handy. In this example, every word separated by a space will be printed on a line by itself. Like this:

perl script.pl
one two three
one
two
three
^D
The ^D is shorthand for saying that you press the CTRL key and the 'd' (lowercase) at the same time. This tells the script that it has reached the end of the file.

Our next enhancement is to change the field separator. This time we'll use an ampersand (&) instead. The split line now looks like this:

@field = split('&',$_);

When we run the script again with the same input, what we get is a bit different:


# perl script.pl

one two three

one two three

The reason why we get the output on one line is because the space is no longer a field separator. If we run it again, this time using an ampersand, we get something different:

# perl script.pl
one&two&three
one
two
three
^D
In this case, the three words were recognized as separate fields.

Although it doesn't seem too likely that you would be inputting data like this from the keyboard, it is not unthinkable that you might want to read a file that has data stored this way. To make things easy, I have provided a file that represents a simple database of books. Each line is a record and represents a single book, with the fields separated by an ampersand.

To be able to read from a file, we have to create a file handle. To do this we add a line and change the while statement so it now looks like this:

open ( INFILE,"< bookdata.txt");

while (<INFILE>)

The syntax of the open function is:

open(FILEHANDLE, expression);
The way we open a file depends on the way we want to read it. Here, we use standard shell redirection symbols to indicate how we want to read the specified file. In our example, we indicate redirection from the file bookdata.txt. This says we want to read from the file. If we want to open for writing, the line would look like this:

open ( INFILE,"> bookdata.txt");

If we want to append to the file, we change the redirections so the line

would look like this:

open ( INFILE,">> bookdata.txt");

Remember I said that we use standard redirection symbols? This also includes the pipe symbol. As the need presents itself, your perl script can open a pipe for either reading or writing. Suppose we want to open a pipe for writing that sends the output through sort. The line might look like this:

open ( INFILE,"| sort ");

Remember that this would work the same as from the command line. Therefore, the output is not being written to a file, just being piped through sort. However, we could do so if we wanted to. For example:

open ( INFILE,"| sort > output_file");

opens the file output_file for writing, but the output is first piped through sort. In our example, we are opening the file bookdata.txt for reading. The while loop continues through and outputs each line read. However, instead of being on a single line, the individual fields (separated by an ampersand) are output on a separate line.
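The pipe works for reading, too. A sketch, assuming the ps command is available on your system: a trailing pipe symbol tells perl to read the command's output as if it were a file:

```perl
# Open a pipe for reading and count the lines ps produces.
open(PS, "ps -e |") || die "cannot start ps: $!";
$count = 0;
while (<PS>) {
    $count++;        # one line per process (plus the header line)
}
close(PS);
print "ps produced $count lines\n";
```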

We can now take this one step further. Let's assume that a couple of the fields are actually composed of sub-fields. These sub-fields are separated by a plus sign (+). We want to break up every field containing a plus sign into its individual sub-fields.

As you probably guessed, we use the split command again. This time we use a different variable and instead of reading out of the input line ($_), we are reading out of the string $field. Therefore, the line would look like this:

@subfield = split('\+',$field);

Aside from changing the search pattern, I add the backslash. This is because the plus-sign is used in the search pattern to represent one or more occurrences of the preceding character. If we don't escape it, we generate an error. The whole script now looks like this:


open(INFILE,"< bookdata.txt");
while (<INFILE>) {
    @data = split('&',$_);
    if ( $_ eq "\n" ) {
        print STDERR "Error: \n";
    } else {
        foreach $field (@data){
            @subfield = split('\+',$field);
            foreach $word (@subfield){
                print $word,"\n";
            }
        }
    }
}
If we wanted to, we could have written the script to split the incoming lines at both the ampersand and the plus-sign, which would have given us a split line that looked like this:

@data = split('[&\+]',$_);

The reason for writing the script as we did is that it is easier to separate sub-fields and still maintain their relationship. Note that the search pattern here can be any regular expression. For example, we could split the strings every place there is the pattern 'Di' followed by an 'e', 'g' or an 'r', but *not* if that is followed by an 'i'. The regular expression would be:

Di[reg][^i]

so the split function would be:

@data = split('Di[reg][^i]',$_);

At this point, we can read in lines from an ASCII file, separate the line based on what we have defined as fields and then output each line. However, the lines don't look very interesting. All we are seeing is the content of each field and do not know what each field represents. Let's change the script once again. This time we will make the output show us the field names as well as their content.

Let's also change the script so that we have control over where the fields end up. We still use the split statement to extract individual fields from the input string. This is not strictly necessary, since we could do it all in one step, but I am doing it this way to demonstrate the different constructs and to illustrate the adage that in perl there is always more than one way to do something. So we end up with the script:

open(INFILE,"< bookdata.txt");
while (<INFILE>) {
    @data = split('&',$_);
    if ( $_ eq "\n" ) {
        print STDERR "Error: \n";
    } else {
        $fields = 0;
        foreach $field (@data){
            $fieldarray[$fields] = $field;
            print $fieldarray[$fields++]," ";
        }
    }
}
Each time we read a line, we reset our counter (the variable $fields) to 0, split the line into the array @data, and then copy it element by element into @fieldarray. Note that there is no newline in the print statement, so each field will be printed followed by a space. The newline read at the end of each input line is then output.

Although the array is re-filled every time through the loop and we lose the previous values, we could assign the values to specific variables.

Now let's make the output a little prettier, by outputting the field headings first. To make things simpler, let's label the fields as follows:

title, author, publisher, char0, char1, char2, char3, char4, char5

where char0-char5 are simply characteristics about the book. We need a handful of if statements to make the assignment that will look like this:

foreach $field (@data){
    if ( $fields == 0 ){
        print "Title: ",$field;
    }
    if ( $fields == 1 ){
        print "Author: ",$field;
    }

    . . .

    if ( $fields == 8 ){
        print "Char 5: ",$field;
    }
    $fields++;
}
Here, too, we would be losing the value of each variable every time through the loop as it gets overwritten. Let's just assume we only want to save this information from the first line (why will become clear in a minute). First we need a counter to keep track of what line we are on, and an if statement to enter the block where we make the assignment. Rather than a print statement, we change the line to an assignment, so the first line might look like this:

$title = $field;

When we read subsequent lines, we can then output headers for each of the fields. We do this by having another set of if statements that output the header and then the value, based on its position.

Actually, there is a way of doing things a little more efficiently. When we read the first line, we can assign the values to variables in a single statement. Instead of the line:

foreach $field (@data) {

we add the if-statement to check if this is the first line and add the line:

($title,$author,$publisher,$char0,$char1,$char2,$char3,$char4,$char5) = split('&',$_);

Rather than assigning values to elements in an array, we are assigning them to specific variables. (Note that if there are more fields generated by the split command than we specify variables for, the remainder will be ignored.) The other advantage of this is that we have saved ourselves a lot of space. We could also call these $field1, $field2, and so on, thereby making the field names a little more generic. We could also modify the split line so that instead of several separate variables, we have a single array called field and use the number as the offset into the array. Therefore, the first field would be referenced like this:

$field[0]

and the split command for this would look like:

@field = split('&',$_);
Kind of looks like something we already had. It is. This is just another example of the fact that there are always different ways to do something in perl.

At this point, we still need the series of if-statements inside of the foreach loop to print out the line. However, that seems like a lot of wasted space. Instead, we introduce the concept of an associative array. An associative array is just like any other array, except that you reference the elements by a label rather than by a number.

Another difference is that associative arrays, also referred to as associative lists, always have an even length. This is because elements come in pairs: label and value. For example, we have:

%list = ('name','James Mohr', 'logname','jimmo', 'department','IS');

Note that instead of $ or @ to indicate this is an array, we use a %. This specifies that this is an associative array, so we can refer to each value by its label; however, when we finally reference an individual value, we do use the $. To print out the name, the line would look like this:

print "Name:",$list{name};

Also different are the brackets we use. Here we use curly brackets instead of square brackets.

The introduction of the associative array allows us to define the field labels within the data itself and access the values using these labels. As I mentioned, the first line of the data file contains the field labels. We can use these labels to reference the values. Let's look at the program itself:

open(INFILE,"< bookdata.txt");
$lines = 0;
while (<INFILE>) {
    chop;
    @data = split('&',$_);
    if ( $lines == 0 ) {
        @headlist = @data;
        foreach $field (0..@headlist-1){
            %headers = ( $headlist[$field],'' );
        }
        $lines++;
    } else {
        foreach $field (0..@data-1){
            $headers{$headlist[$field]} = $data[$field];
            print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
        }
    }
}
At the beginning of the script, we added the chop function, which 'chops' off the last character of a variable (or of each element of a list) and returns that character. If you don't name a variable, chop operates on $_. This function is useful for chopping off the newline character that gets read in. The next change we made was to remove the block that checked for blank lines and generated an error.
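A quick sketch of chop in isolation. Note that chop blindly removes the last character, whatever it happens to be (perl 5 also provides chomp, which removes only a trailing newline):

```perl
$line = "hello\n";
$removed = chop($line);   # chop returns the removed character
print "line is now [$line], removed a newline\n";

$word = "hello";
chop($word);              # no newline here, so the 'o' is lost: "hell"
print "word is now [$word]\n";
```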

The first time we read a line, we enter the appropriate block. Here we have just read in the line containing the field labels and we put each entry into the array headlist via the split function. The foreach loop also adds some new elements:

foreach $field (0..@headlist-1){
    %headers = ( $headlist[$field],'' );
}

The first addition is the element (0..@headlist-1). Two numbers separated by two dots indicate a range. We can use @headlist in a numeric context to find out how many elements are in the array headlist. This returns a count starting at 1, not a zero-based index. Since I chose to access all my elements starting at 0, I need to subtract 1 from the value of @headlist. There are 9 elements per line in the file bookdata.txt, so the range is 0..9-1.

However, we don't need to know that! In fact, we don't even need to know how many elements there are to make use of this functionality. The system knows how many elements it read in, so we don't have to. We just use @headlist-1 (or whatever the array is called).
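A sketch of both ideas together: an array name used where a number is expected gives the element count, and the range operator builds the index list for us:

```perl
@headlist = ('title', 'author', 'publisher');
$count = @headlist;                  # numeric context: 3
print "the array has $count elements\n";
foreach $field (0..@headlist-1) {    # the range 0..2
    print "$field: $headlist[$field]\n";
}
```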

The next line fills in the elements of our associative array:

%headers = ( $headlist[$field],'' );

However, we are only filling in the labels and not the values themselves. Therefore, the second element of the pair is empty (''). One by one, we write the label into the first element of each pair.

After the first line is read, we load the values themselves. Here again we have a foreach loop that goes from 0 to the last element of the array. Like the first loop, we don't need to know how many elements were read in, as we let the system keep track of this for us. The second element in each pair of the associative list is loaded with this line:

$headers{$headlist[$field]} = $data[$field];

Let's take a look at this line, starting on the right-hand side. From the array @data (which holds the line we just read in), we are accessing the element at the offset specified by the variable $field. Since this is just the counter used for our foreach loop, we go through the elements of the array one by one. The value retrieved is then assigned to the left-hand side.

On the left, we have an array offset being referred to by an array offset. Inside we have:

$headlist[$field]
The array headlist is what we filled up in the first block; in other words, the list of field headings. When we reference the offset with the $field variable, we get the field heading. This will be used as the string for the associative array. The element specified by:

$headers{$headlist[$field]}

corresponds to the field value. For example, if the expression

$headlist[$field]

evaluated to 'title', then the second time through the loop, the expression

$headers{$headlist[$field]}

might evaluate to "2010: Odyssey Two."

At this point we are now ready to make our next jump. We are going to add the functionality to search for specific values in the data. Let's assume that we know what the fields are and wish to search for a particular value. For example, we want all books that have scifi as field char0. Assuming that the script was called book.pl, we would specify the field label and value like this:

book.pl char0=scifi

The completed script looks like this:

($searchfield,$searchvalue) = split('=',$ARGV[0]);
open(INFILE,"< bookdata.txt");
$lines = 0;
while (<INFILE>) {
    chop;
    @data = split('&',$_);
    if ( $_ eq "\n" ) {
        print STDERR "Error: \n";
    } else {
        if ( $lines == 0 ) {
            @headlist = @data;
            foreach $field (0..@headlist-1){
                %headers = ( $headlist[$field],'' );
            }
            $lines++;
        } else {
            foreach $field (0..@data-1){
                $headers{$headlist[$field]} = $data[$field];
                if ( ($searchfield eq $headlist[$field] ) &&
                     ($searchvalue eq $headers{$headlist[$field]} )) {
                    $found = 1;
                }
            }
            if ( $found == 1 ) {
                foreach $field (0..@data-1){
                    print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
                }
                $found = 0;
            }
        }
    }
}

We added a line at the top of the script that splits the first argument on the command line:

($searchfield,$searchvalue) = split('=',$ARGV[0]);

Note that we are accessing $ARGV[0]. This is not the command being called, as one would expect from C or shell programming. Our command line had the string char0=scifi as its $ARGV[0]. After the split, $searchfield contains char0 and $searchvalue contains scifi.
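A sketch showing the difference: in perl the program name lives in the special variable $0, while @ARGV holds only the real arguments (the fallback argument here is just a demo value):

```perl
# args.pl - run as: perl args.pl char0=scifi
@ARGV = ('char0=scifi') unless @ARGV;      # demo default when run bare
print "program name: $0\n";                # NOT $ARGV[0], unlike C
print "first argument: $ARGV[0]\n";
($searchfield, $searchvalue) = split('=', $ARGV[0]);
print "field=$searchfield value=$searchvalue\n";
```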

Some other new code looks like this:

if ( ($searchfield eq $headlist[$field] ) &&
     ($searchvalue eq $headers{$headlist[$field]} )) {
    $found = 1;
}

Instead of outputting each line in the second foreach loop, we change it so that here we are checking to see whether the field we input ($searchfield) is the one we just read in ($headlist[$field]), and whether the value we are looking for ($searchvalue) equals the one we just read in.

Here we add another new concept, that of logical operators. These are just like in C, where && means a logical AND and || means a logical OR. If we want to test whether two variables each have a specific value, we would just use the logical AND, like:

if ( $a == 1 && $b == 2 )

which says if $a equals 1 AND $b equals 2, then execute the following block. If we wrote it like this:

if ( $a == 1 || $b == 2 )

this says that if $a equals 1 OR $b equals 2, then execute the block. In our example, we are saying that if the search field ($searchfield) equals the corresponding value in the heading list ($headlist[$field]) AND the search value we input ($searchvalue) equals the value from the file ($headers{$headlist[$field]}), we then execute the following block. Our block simply sets a flag to say we found a match.

Later, after we read in all the values for each record, we check the flag. If the flag was set, the foreach loop is executed:

if ( $found == 1 ) {
    foreach $field (0..@data-1){
        print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
    }
}
Here we output the headings and then their corresponding values. But what if we aren't sure of the exact text we are looking for? For example, I want all books by the author Eddings, but do not know that his first name is David. It's now time to introduce the perl function index. As its name implies, it delivers an index: the offset of one string within another. The syntax is:

index(STRING, SUBSTRING, POSITION);
where STRING is the string that we are looking in, SUBSTRING is the substring that we are looking for, and POSITION is where to start looking. If POSITION is left off, the function starts at the beginning of STRING. For example:

index("applepie","pie");
will return 5, as the substring 'pie' starts at offset 5 of the string 'applepie' (offsets start at 0). To take advantage of this, we only need to change one line. We change this:

if ( ($searchfield eq $headlist[$field] ) &&

($searchvalue eq $headers{$headlist[$field]} )) {

to this:

if ( (index($headlist[$field],$searchfield)) != -1 &&

index($headers{$headlist[$field]},$searchvalue) != -1 ) {

Here we are testing for an offset of -1, the value index returns when the substring is not found within the string. So, if we were to run the script like this:

script.pl author=Eddings

we would look through the author field for any entry containing the string Eddings. Since there are records with an author named Eddings, we find them; even if we looked for Edding, we would still find them, since Edding is also a substring of "David Eddings".

As you may have noticed, we have a limitation in this mechanism. We have to make sure that we spell things with the right case. Since "Eddings" is uppercase both on the command line and in the file, there is no problem. Normally names are capitalized, so it would make sense to input them like that. But what about the title of a book? Often words like "the" or "and" are not capitalized. However, what if the person who input the data, input them as capitals? If you looked for them in lowercase, but they were in the file as uppercase, you'd never find them.

In order to consider this possibility, we need to compare both the input and the fields in the file in the same case. We do this by using the tr (translate) function. It has the syntax:

tr/SEARCHLIST/REPLACEMENTLIST/
Where SEARCHLIST is the list of characters to look for and REPLACEMENTLIST is the characters to use to replace those in SEARCHLIST. To see what options are available, check the perl man-page. We change part of the script to look like this:

foreach $field (0..@data-1){
    $headers{$headlist[$field]} = $data[$field];
    ($search1 = $searchfield) =~ tr/A-Z/a-z/;
    ($search2 = $headlist[$field]) =~ tr/A-Z/a-z/;
    ($search3 = $searchvalue) =~ tr/A-Z/a-z/;
    ($search4 = $headers{$headlist[$field]}) =~ tr/A-Z/a-z/;
    if ( (index($search2,$search1) != -1) && (index($search4,$search3) != -1) ) {
        $found = 1;
    }
}

In the middle of this section there are four lines where we do the translations. These demonstrate a special aspect of the tr function: we can do the translation as we assign one variable to another, which is useful since the original strings are left unchanged. We then had to change the statement with the index function and comparisons to use the new variables.
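As a side note, tr also returns the number of characters it translated (or, with an empty replacement list, merely matched), which gives a handy way to count characters. A sketch:

```perl
$text = "Hello World";
($lower = $text) =~ tr/A-Z/a-z/;    # copy-and-translate: $text unchanged
print "$lower\n";                   # hello world
$vowels = ($text =~ tr/aeiou//);    # empty replacement: just count
print "lowercase vowels: $vowels\n";
```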

When writing conditional statements, you have to be sure of what it is you are testing as your condition. Truth, like many other things, is in the eye of the beholder. In this case, it is the perl interpreter that is beholding your concept of true. It may not always be what you expect. In general, you can say that a value is true unless it is the null string (''), the number zero (0) or the literal string zero ("0").
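These rules are easy to verify with a short sketch. Note the classic gotcha: the string "0" is false, but "00" and "0.0" are true, because only the exact string "0" matches the rule:

```perl
foreach $value ('', '0', '00', '0.0', 'perl') {
    if ($value) {
        print "'$value' is true\n";
    } else {
        print "'$value' is false\n";
    }
}
```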

One important aspect is the comparison operators. Unlike C, there are different operators for numeric comparison and for string comparison. They're all easy to remember and you have certainly seen both sets before, but keep in mind that they are different. Table 0\1 contains a list of the perl comparison operators:






Numeric   String    Meaning

==        eq        equal to

!=        ne        not equal to

>         gt        greater than

<         lt        less than

>=        ge        greater than or equal to

<=        le        less than or equal to

<=>       cmp       compare, returning a signed result
                    (0 - operands equal, -1 - first operand less,
                    1 - first operand greater)

Table 0\1 - Perl comparison operators

Another important aspect to keep in mind is that perl makes no real distinction between numeric and string variables. Perl is capable of converting between the two without you interfering. If a variable is used in a context where it can only be a string, then that's the way perl will interpret it, as a string.

Let's take two variables: $a=2 and $b=10. As you might expect, the expression $a < $b evaluates to true because we are using the numeric comparison operator <. However, the expression $a lt $b evaluates to false. This is because the string "10" sorts before "2" lexicographically (the character '1' comes before '2'), even though the number 10 is larger.
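You can check this behavior with a couple of lines of perl:

```perl
#!/usr/bin/perl
$a = 2;
$b = 10;

# Numeric comparison: 2 < 10, so this prints.
print "numerically, \$a is smaller\n" if $a < $b;

# String comparison: "2" gt "10", because the character '2'
# sorts after the character '1'. So this prints as well.
print "string-wise, \$a is larger\n" if $a gt $b;
```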

Besides simply translating sets of letters, perl can also do substitution. To show this, I am going to use another neat trick of perl. Being designed as a text and file processing language, it is very common to be reading in a number of lines of data and processing them all in turn. We can tell perl that it should assume we want to read in lines, even though we don't explicitly say so. Let's take a script that we call fix.pl that looks like this:


s/pattern/replacement/g;

The syntax is the same as you find in sed; however, perl has a much larger set of regular expressions. Trying to run this as a script by itself will generate an error, so instead we run it like this:

perl -p fix.pl bookdata.pl

The -p option tells perl to put a wrapper around your script. Therefore, our script would behave as if we had written it like this:

while (<>) {

    s/pattern/replacement/g;

} continue {

    print;

}
This would read each line from a file specified on the command line, carry out the substitution and then print out each line, changed or not. We could also take advantage of the ability to specify the interpreter with #!. If this is the first line of a script, the system will use the program named after it as the interpreter. The script would then look like:

#!/usr/bin/perl -p


s/pattern/replacement/g;

Another command line option is -i. This stands for "in-place", and with it you can edit files "in-place." In the example above, the changed lines would be output to the screen, and we would have to redirect them to a file ourselves if we wanted to keep them. The -i option takes an argument, which indicates the extension you want for the old version of the file. So, we change the first line, like this:

#!/usr/bin/perl -pi.old

With perl you can also make your own subroutines. These subroutines can be written to return values, so you now have functions as well. Subroutines are first defined with the 'sub' keyword and are called using the ampersand (&). For example:


sub usage {

    print "Invalid arguments: @ARGV\n";

    print "Usage: $0 [-t] filename\n";

}

if ( @ARGV < 1 || @ARGV > 2 ) {

    &usage;

}
This says that if the number of arguments from the command line @ARGV is less than 1 or greater than 2, we call the subroutine usage which prints out a usage message.

To create a function, we first create a subroutine. When we call the subroutine, we call it as part of an expression. The value returned by the subroutine/function is the value of the last expression evaluated.

Let's create a function that prompts you for a yes/no response:


if (&getyn("Do you *really* want to remove all the files in this directory? ")

    eq "y\n" ) {

    print "Don't be silly!\n";

}

sub getyn {

    print @_;

    $response = (<STDIN>);

}
This is a very simple example. In the subroutine getyn, we output everything that is passed to the subroutine. This serves as a prompt. We then assign a line we get from stdin to the variable $response. Since this is the last expression inside the subroutine to be evaluated, this is the value that is returned to the calling statement. If we enter "y" (which would include the new-line from the enter key), this is all passed back to the caller.

The calling if-statement passes the actual prompt as an argument to the subroutine, so the getyn subroutine could be used in other circumstances as well. As mentioned, the value returned includes the new-line; therefore we have to check for "y\n". This is not simply 'y', but rather 'y' followed by a new-line.

Alternatively, we could check the response inside of the subroutine. We could have added the line:

$response =~ /^y/i;

We addressed the =~ operator earlier in connection with the tr function. Here as well, the variable on the left-hand side is the one the expression on the right operates on. In this case, we use a pattern matching construct: /^y/i. The pattern behaves as it would in sed: we are looking for a 'y' at the beginning of the line. The trailing 'i' simply says to ignore the case. If the line does begin with a 'y' or 'Y', the match evaluates to 1; if not, it evaluates to a null string.

We now change the calling statement and simply leave off the comparison to "y\n". Since the return value of the subroutine is the value of the last expression evaluated, the value returned now is either 1 or ''. Therefore, we don't have to do any kind of comparison, as the if-statement reacts directly to the return value.
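To see the whole idea in one place, here is a sketch of the reworked logic. To keep the example self-contained, the answer is passed in as an argument rather than read from <STDIN>, and the subroutine name is_yes is my own; the return value works the same way as described above.

```perl
#!/usr/bin/perl
# The check happens inside the subroutine, so the caller no
# longer compares against "y\n".
sub is_yes {
    $answer = shift(@_);
    $answer =~ /^y/i;    # last expression evaluated: 1 or ''
}

print "Don't be silly!\n" if &is_yes("y\n");
print "this line never prints\n" if &is_yes("no\n");
```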

I wish I could go on. I haven't even hit on a quarter of what perl can do. Unfortunately, like the sections on sed and awk, more details are beyond the scope of this book. Instead I want to refer you to a few other sources. First, there are two books from O'Reilly and Associates. The first is Learning Perl by Randal Schwartz. This is a tutorial. The other is Programming Perl by Larry Wall and Randal Schwartz. If you are familiar with other UNIX scripting languages, I feel you would be better served by getting this one. It takes the same approach that I do by explaining "this is how perl does things" rather than trying to explain to you what those "things" are. Also PERL by Example by Ellie Quigley from Prentice Hall provides an excellent tutorial.

The next place is the PERL CDROM from Walnut Creek CDROM (www.cdrom.com). This is loaded with hundreds of megabytes of perl code and the April 1996 version which I used contains the source code for perl4 (4.036) and perl5 (5.000m). In many cases, I like this approach more since I can see how to do things I need to do. Books are useful to get the basics and reminders of syntax, options, and the like. However, seeing someone else's code shows me how to do it.

Another good CD is the Mother of PERL CD from InfoMagic (www.infomagic.com). It, too, is loaded with hundreds of megabytes of perl scripts and information.

There are a lot of places to find sample scripts while you are waiting for the CD to arrive. One place is the Computers and Internet: Programming Languages: Perl hierarchy at Yahoo. (www.yahoo.com). You can use this as a spring board to many sites that not only have information on perl but use perl on the Web (e.g., in CGI scripts).

Assignment Operations

$x = $y			Assign the value of $y to $x

$x += $y		Add the value of $y to $x

$x -= $y		Subtract the value of $y from $x

$x .= $y		Append string $y onto $x

String Operations

index($x,$y)		Delivers the offset of string $y in string $x

substr($x,$y,$len)	Delivers the substring of $x, starting at $y, of length $len

$x . $y			$x and $y considered a single string, but each remains unchanged

$x x $y			String $x is repeated $y times

Pattern Matching

$var =~ /pattern/	True if $var contains "pattern"

$var =~ s/pat/repl/	Substitutes "repl" for "pat"

$var =~ tr/a-z/A-Z/	Translates lowercase to uppercase

Math Operations

$x + $y			Sum of $x and $y

$x - $y			Difference of $x and $y

$x * $y			Product of $x and $y

$x / $y			Quotient of $x divided by $y

$x % $y			Remainder of $x divided by $y

$x ** $y		$x raised to the power of $y

$x++, ++$x		Increment $x by one

$x--, --$x		Decrement $x by one

Logic Operations

$x && $y	logical AND	True if both $x and $y are true

$x || $y	logical OR	True if either $x or $y is true

! $x		logical NOT	True if $x is not true

Table 0\2 Perl Operations

Building Our Pages

In this section, we are going to talk about the basics of putting together Web pages. Entire books have been published that go into more detail than I do here. However, as with perl scripts, I feel that the best way to create good Web pages is to practice. I can give you the tools, but it is up to you to become good at using them.

HTML - The Language of The Web

Web pages are written in the Hypertext Markup Language (HTML). An HTML document is a "plain-text" file that can be edited by any editor, like vi. The HTML commands are similar to, though simpler than, those used by troff (a text processing language available from various places on the Web). In addition to formatting commands, there are built-in commands that tell the Web browser to go out and retrieve a document. We can also create links to specific locations (labels) within that document. Access to the document is by means of a Uniform Resource Locator (URL).

There are several types of URLs that perform different functions. Several different programs can be used to access these resources, such as ftp, http, gopher, or even telnet. If we leave off the program name, the Web browser may assume that it refers to a file on our local system (it depends on the browser). However, just like with ftp or telnet, we can make specific references to the local machine. I encourage using absolute names like this, as it makes transferring Web pages that much easier.

As with any document with special formatting, we have to be able to tell the medium that there is something special. When we print something, we may have to tell the printer. When a word processor displays something on the screen, we have to tell it. When the man command is supposed to display something in bold, we have to tell it as well. Each medium has its own language, and this applies to the medium of the World Wide Web as well. The language that the Web uses is HTML. HTML is what is used to format all those headings and bold fonts; even the links themselves are formatted using HTML.

Like other languages, HTML has its own syntax and vocabulary. It shouldn't be surprising that there are even "dialects" of HTML. What does the interpreting of this language is our Web browser, or simply browser. Like the word processor or the man command, our browser sees the formatting information and converts it into the visual images that we see on our screen. If our browser doesn't understand the dialect of the document we are trying to read, we might end up with garbage or maybe even nothing at all. Therefore, it is important to understand these dialects.

HTML is similar to other formatting languages in that it is used to define the structure of the document. However, it is the viewer (in this case the browser) that gives the structure its form.

Most browsers support the HTML 2 standard, although the newest standard is HTML 3.2. Some vendors (such as Netscape) have browser-specific additions, which sometimes makes pages designed for those browsers unreadable by others. However, as of this writing, Netscape has become pretty much the standard. Many sites with web pages designed specifically for Netscape have links back to Netscape where you can download the latest version.

The Web is a client-server system in its truest sense. That is, there are some machines that provide services (the servers) to another set of machines (the clients). From our perspective, however, there is a single client (our browser) and tens of thousands of servers spread out all over the world. Keep in mind that the server doesn't have to be at some other location. In fact, the server doesn't even need to be another machine. Our own machine can serve documents locally, with them still being delivered by scohttpd.

As with any server, it sits and waits for requests. In this case, the requests are for documents. When it gets a request, it looks for the appropriate document and passes it back to our client. To be able to communicate, the client and server need to speak the same language: the hypertext transfer protocol, or HTTP.

What I am going to do in the next section is to give you a crash course on the basics of HTML. This is not an in-depth tutorial, nor are we going to cover every aspect of the language. However, we should cover enough information to get you on your way to creating fairly interesting Web pages.

HTML uses tags (formatting markers) to tell the Web browser how to display the text. These consist of pairs of angle brackets ( < > ) with the tag name inside. Following this is the text to be formatted. Formatting is turned off using the same tag, but the tag name is preceded by a slash (/). For example, the tag to create a level 1 header is <H1>, therefore to turn off the level 1 header, the tag is </H1>. In a document, it might look like this:

<H1>Welcome to My Home Page</H1>

Note that here and throughout the section, I use only capital letters in the tags. You don't have to if you don't want to, as the browser can interpret them either way. I find that by using capital letters, it is easier to see the tags when I am editing the source file.

HTML documents are usually broken into two sections: a header and a body. Each has its own tag: <HEAD> and <BODY>. The header usually contains information that may not actually get displayed on the screen, but it is still part of the document and does get copied to the client machine. If both of these are omitted, then all of the text belongs to the body.

By convention, every HTML document has a title, which is primarily used to identify the document and is usually the same as the first heading. The title is not displayed in the document area of the screen, but rather at the top of the window.
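Putting the pieces so far together, a minimal document might look like this, with the <TITLE> tag setting the text shown at the top of the window (the wording itself is just a placeholder):

```html
<HTML>
<HEAD>
<TITLE>Jimmo's WWW Home Page</TITLE>
</HEAD>
<BODY>
<H1>Welcome to My Home Page</H1>
This text belongs to the body and is displayed in the window;
the title appears in the window's title bar instead.
</BODY>
</HTML>
```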

While browsing the Web, we have certainly clicked on some text or an image and had some other document appear on our screen. As we talked about a moment ago, this is the concept of a hypertext link. Links are defined by two pieces of HTML. The first one is the <A> tag, which stands for anchor. Like a heading or any other formatting tag, the anchor is started using the tag and closed by the same tag with a leading slash (</A>).

The text or image will then appear somewhat highlighted. However, if we click on it, nothing happens, because we haven't said where to go yet. This is where the HREF attribute comes in. It is a reference to the document that should be loaded. That doesn't have to be another HTML page, but it does have to be something that the Web server and browser understand, like a CGI script (more on those later).

If the document is another HTML page, it will be loaded just like the first and will probably have its own links. If the link points to an image, that image gets loaded into our browser. It is also possible that the link points to something like a tar archive or a pkzipped file. Clicking on it could cause our browser to start a particular program such as PKUNZIP.EXE, or simply ask us if we want to save the file. This depends on how our browser is configured.

Many browsers have an option "View source" where we can see the HTML source for the document we are currently viewing. This way we can see what each reference is anchored to. In addition, this is a great learning tool as you can see how pages were put together.

In general, these links have the format:

<A HREF="URL">highlighted text</A>

The line pointing to our homepage may look like this in a document:

<H1><A HREF="http://www.our.domain/index.html">Jimmo's WWW Home Page</A> </H1>

When we look at the document we don't see any of the tags, just a line that looks like this:

Jimmo's WWW Home Page

When we click on this link, the browser loads the document specified by HREF. In this case, www.our.domain/index.html.

Oftentimes a link is identified by the fact that it is a different color than the other text, and links that we haven't visited yet are underlined. When we click on a link and later return to that same page, the link will have a different color and the underline is gone. In some cases there is no underline at all and the link just changes color. This is normally configurable in the browser.

The <H1> entry says that this line is to be formatted as a Header, level 1. The anchor is indicated by the <A> entry, and refers to the page index.html on the machine www.our.domain. Remember from our previous discussion, using the file index.html is a convention. If we had defined our DirectoryIndex to be some other file, this would replace index.html.

There are six levels of headings, numbered 1 through 6, with H1 being the largest and most prominent. Headings are generally displayed in larger or bolder fonts than the normal text. Normal text is simply anything not enclosed within tags. Like troff and similar editing languages, HTML does not care about formatting like carriage returns and other white spaces. Therefore we can include a carriage return anywhere in the text, but it will be displayed according to the rules of the tags as well as the width of the viewing window. This allows smaller windows to display the same text, without it getting messed up.

With lower resolutions, the level one header (<H1>) is way too large. I cannot remember ever seeing a site where a level one header really looked good. In most cases, unless we have a 17" monitor or larger and a high resolution, just a couple of words in a level one header take up the whole width of the screen and are overwhelming. (We'll get more into some tips and techniques later.) The best way to see what it looks like is to try it yourself with different browsers at different resolutions. You can see how the headers relate to each other in size in Figure 0-1.

Figure 0-1 Examples of Headers

HTML also provides explicit changes to the physical style, such as <B> for bold, <UL> for creating lists, and <PRE> for forcing the Web browser not to do any formatting. Some of the more common tags are listed in Table 0\1.

A compendium of HTML elements can be found at http://www.synapse.net/~woodall/html.htm. Not only does it list all the known HTML tags, but each has a description with a "compatibility" matrix describing the HTML versions that include the tag, as well as which versions of the more common Web browsers support it.

<TITLE>		Describes the content of the document

<Hn>		Heading with the specified level (1-6)

<P>		Starts a new paragraph

<EM>		Logical tag for displaying emphasis, defaults to italics

<STRONG>	Logical tag for displaying something of significance, defaults to bold

<DT>		The item that you are defining

<DL>		Start of a list of definitions

<DD>		The definition of the term

<B>		Physical tag enabling bold text

<U>		Physical tag enabling underlined text

<I>		Physical tag enabling italic text

<PRE>		Disables formatting by the Web browser

<BR>		Forces a line break

<A>		Logical tag to enclose a link, normally used with the HREF attribute

<UL>		Beginning of an unnumbered list

<OL>		Beginning of a numbered or ordered list

<HR>		Draws a line across the page

<HEAD>		Indicates the document header

<BODY>		Indicates the document body

<LI>		Item within a list

<IMG>		Defines the source/path of an image to be displayed

Table 0\1 Basic HTML Tags

Here, too, when we want to stop the formatting, we use the same tag, but preceded with a slash. For example, to turn on bold it would look like this:

<B>

To turn it off, like this:

</B>
We could also include multiple formatting on the same line:

<H3><I>Welcome to <U><B>Jim's</B> Wonderful</U> Web Site</I> </H3>

which would look like this:

Welcome to Jim's Wonderful Web Site

Here we have several different tags nested inside each other. You don't have to match the pairs exactly like this. For example, the <U><B> or the </I></H3> at the end of the line could have been reversed. We could have also had something like this:

<H3><I>Welcome to <U><B>Jim's</B> Wonderful</I> Web Site</U> </H3>

This would stop the italic formatting before it stopped the underline, although we had started the underline before we started the italic. The result would then look like this:

Welcome to Jim's Wonderful Web Site

We see that the underline continues under "Web Site," but the italics now stopped at the word "Wonderful."

As I mentioned, the convention is that the Web server's machine name is www.domain.name. To access a site's home page, the URL would be http://www.domain.name. For example, to get to SCO's home page, the URL is http://www.sco.com.

The Page Itself

Using HTML you have the ability to control the overall appearance of the page. One simple way of doing this is to specify a color to use as your background. This is done using the BGCOLOR option within the <BODY> tag like this:

<BODY BGCOLOR="#rrggbb">

The color is specified as a red-green-blue (RGB) triplet. The value of the color is a hexadecimal value between 00 and FF. This specifies the intensity of each color. The higher the intensity, the closer it is to white. So, if we set all three colors to FF, we end up with a pure white background. Setting them all to 00 gives us black background.

By default, the text is black. So, if we were to specify a black background, the text would be invisible. To get around this problem we use the TEXT option within the <BODY> tag. This too is an RGB-triplet. So, to specify white text on a black background, the line might look like this:

<BODY BGCOLOR="#000000" TEXT="#FFFFFF">
You have undoubtedly seen Web sites where there is an image as the background, rather than a single color. This is done with the BACKGROUND option. Note that if you specify a background image, it will override the background color. Also, you need to pay attention to what the text color is. Sometimes the default color (black) doesn't come out right against an image, but slightly modifying it does. An example of using a background image and setting the text color to red would look like this:

<BODY BACKGROUND="background.gif" TEXT="#FF0000">

You can also define the color of your links within the <BODY> tag. The LINK option is the color of a link before anything happens. The ALINK option defines the color of the link when you click on it (A for “active”). The VLINK option defines the color of links that you have gone to already (V for “visited”). All three of these are RGB-triplets.
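For example, a <BODY> tag giving a page blue unvisited links that flash red while being clicked and turn purple once visited might look like this (the particular color values are just an illustration):

```html
<BODY BGCOLOR="#FFFFFF" LINK="#0000FF" ALINK="#FF0000" VLINK="#800080">
```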


As indicated above, there are two types of lists that you can create: ordered and unordered. An unordered list is often called a bullet list, as there is a symbol called a bullet at the beginning of each line. There are three types of bullets that you can use: DISC, CIRCLE and SQUARE.

For example, to specify an unordered list using circles, the tag would look like this:

<UL TYPE=CIRCLE>
Ordered lists can also show different types. The type indicates what symbols are used to identify each entry. Possible symbols are:

1 - Numbers (the default)

a - lowercase letters

A - Uppercase letters

I - Uppercase roman numerals

i - Lowercase roman numerals

To specify an ordered list using lowercase roman numerals, it would look like this:

<OL TYPE=i>
If you use the TYPE attribute within the list tag, the type is valid for all list items. If you use the TYPE attribute within a list item tag, it changes the type for that item and all subsequent items.
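As a sketch of the difference: the first list below numbers every item with lowercase roman numerals, while in the second the TYPE on the middle item switches the numbering to uppercase letters from that point on (the counting itself continues, so the second item displays as "B"):

```html
<OL TYPE=i>
<LI>first item     <!-- displayed as i -->
<LI>second item    <!-- displayed as ii -->
<LI>third item     <!-- displayed as iii -->
</OL>

<OL>
<LI>first item            <!-- displayed as 1 -->
<LI TYPE=A>second item    <!-- displayed as B -->
<LI>third item            <!-- displayed as C -->
</OL>
```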