Jim Mohr's SCO Companion

Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.

Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/

5 Running An Internet Server


Setting up an Internet server is just the first step. Providing web pages or files to download via ftp may make for an interesting site, but you need to get people to keep coming back. The best way to do that is to make the site interesting as well as interactive. In this chapter, we are going to talk about some of the ways you can make a really interesting site.

One of the features of many sites that makes them interesting is the fact that they are interactive. The visitor inputs some information and gets back a response. This is handled by the Common Gateway Interface (CGI). Essentially, the only criterion for a CGI program is that it understands standard input and standard output. This means that CGI programs can be written in C or as a shell script. Although you have a copy of the SCO dev sys on the CD, writing them in C is not always the easiest thing to do. You could write them as a shell script, but you then lack some power. The solution is the perl scripting language. It has almost all the power of C, but is a script language that makes development easy.
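To make the standard input and standard output requirement concrete, here is a minimal sketch of a CGI program in perl. It assumes the web server hands over the form data of a GET request in the QUERY_STRING environment variable, as the CGI convention specifies. The subroutine name cgi_response is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI response: header line, blank line, then the body.
# (cgi_response is a hypothetical helper name, not part of perl.)
sub cgi_response {
    my ($query) = @_;
    $query = '' unless defined $query;
    return "Content-type: text/plain\n\n" . "You sent: $query\n";
}

# Everything the program prints to standard output goes back to the browser.
print cgi_response($ENV{QUERY_STRING});
```

Run from the command line, the script simply prints the header and an empty "You sent:" line, which is a convenient way to check it before installing it on the server.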

Perl — The Language of the Web

If you plan to do anything serious on the Web, then I suggest that you learn perl. In fact, if you plan to do anything serious on your machine, then learning perl is also a good idea. The downside of all this is that perl is not part of the standard SCO distribution. However, it is available on the SCO Skunkware CD, or you can download it from the SCO FTP site at ftp.sco.com/Skunk2/CD-ROM/bin/perl.

Now, I am not saying that you shouldn't learn sed, awk and shell programming. Rather, I am saying that you should learn all four. Both sed and awk have been around for quite a while so they are deeply ingrained in the thinking of most system administrators. Although you could easily find a shell script on the system that had no elements of sed or awk in it, you would be very hard pressed to find scripts that had no shell programming in them. On the other hand, most of the scripts that process information from other programs use either sed or awk. Therefore it is likely that you will eventually come across one or the other.

Perl is another matter altogether. None of the standard scripts have perl in them. This does not say anything about the relative value of perl, but rather the relative availability of it. Since it can be expected that awk and sed are available, it makes sense that they are commonly used. Perl may not be on your machine, and by including it in a system shell script you might run into trouble.

In this section we are going to talk about the basics of perl. We'll go through the mechanics of creating perl scripts and the syntax of the perl language. There are many good books on perl, so I would direct you to them to get into the nitty-gritty. Here we are just going to cover the basics. Later on, we'll address some of the issues involved with making perl scripts to use on your web site.

I was once asked why one should program in perl instead of C when developing Web pages. Since you have a copy of the SCO development system on the CD, you don’t need to go out and buy it before you start programming. First of all, perl is an interpreted language. That is, you do not need to compile it or link in different libraries to make it go. This (in my mind) shortens the development time as you do not need to wait for the program to compile.

Another aspect is that there are many things built into perl that make programming for the Web easier. The name perl stands for Practical Extraction and Report Language. There are several constructs and functions in perl that make it ideal for CGI programming.

I also think it is easier to learn and to code. Whereas you would need several lines of code in C, you can accomplish the same thing in perl with just a single line. Finally, perl code is essentially the same on every system. Unless you make extensive use of the system commands and programs, there is usually no porting work required.

In most programming texts that I remember using, there was always an introduction that covered the "basics" to make sure we were all starting at the same point. Although I wish I had the space to do that with perl, I don't think it is necessary. We have already discussed most of the issues that will come into play, and those that we haven't talked about yet, we'll get to.

I am going to make assumptions as to what level of programming background you have. If you read and understood the sections on sed, awk and the shell, then you should be ready for what comes next. In this chapter, I am going to jump right in.

Let's create a script called hello.pl. The 'pl' extension has no real meaning, although I have seen many places where this is always used as the extension. It is more or less conventional to do this, just as text files traditionally have the ending 'txt', shell scripts the ending 'sh', and so on.

We'll start off with the traditional:

print "Hello, World!\n";

This script consists of a single perl statement, whose purpose is to output the text inside the double-quotes. Each statement in perl is followed by a semi-colon. Here we are using the perl print function to output the literal string "Hello, World!\n" (including the trailing new-line). Although we don't see it, there is an implied file handle to stdout. The equivalent with the explicit reference would be:

print STDOUT "Hello, World!\n";

Once the script is created, we can start it in one of two ways. The first is passing the script as an argument to the perl program, as in:

perl hello.pl

The other way is to have the system do the work for us by telling it which interpreter to use. To do this we add #!/usr/bin/perl to the top of the script, so the entire script would look like this:

#!/usr/bin/perl

print "Hello, World!\n";

Then we can make the script executable and run it like this:

./hello.pl

Along with STDOUT, perl has the default file handles STDIN and STDERR. Here is a quick script to demonstrate all three, as well as introduce a couple of familiar programming constructs:

while (<STDIN>)
{
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        print STDOUT "Input: $_ \n";
    }
}

Functioning the same as in C and most shells, the while line at the top says that as long as there is something coming from stdin, do the loop. Here we have the special format (<STDIN>), which tells perl where to get input. If we wanted, we could use a file handle other than STDIN. However, we'll get to that in a little bit.

One thing that you need to watch out for is that you have to include blocks of statements (such as after while or if statements) inside of the curly brackets ({}). This is different from the way you can do it in C, where a single line can follow a while or if. For example, this statement is not valid in perl:

while ( $a < $b )
    $a++;

You would need to write it something like this:

while ( $a < $b ) {
    $a++;
}

Inside the while loop, we get to an "if" statement. We check the special variable $_ to see whether the line is empty. The variable $_ serves several functions. In this case, it holds the line we just read from STDIN. In other cases, it represents the pattern space, as in sed. If the line is equal to the new-line character, then just the Enter key was pressed (a blank line), and we print an error message. Both branches use the print function, which has the syntax:

print [filehandle] "text_to_print";

In the first print statement the file handle is STDERR; in the second it is STDOUT. In either case, we could have left off the file handle and the output would then go to stdout.

Each time we print a line, we need to include a newline ("\n") ourselves.

We can format the print line in different ways. In the second print line, where the input was not a blank line, we print "Input: " before we print the line just input. Although this is a very simple way of outputting lines, it does get the job done. More complex formatting is possible with the perl printf function. Like its counterpart in C or awk, it lets you come up with some very elaborate outputs. We'll get into more details later.
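As a small preview of printf-style formatting, the following sketch uses sprintf, which takes the same format string as printf but returns the result instead of printing it. The values are made up for the illustration.

```perl
use strict;
use warnings;

# %-10s : a string, left-justified in 10 columns
# %5d   : an integer, right-justified in 5 columns
# %6.2f : a floating point number, 6 columns wide, 2 decimal places
my $line = sprintf("%-10s|%5d|%6.2f", "apple", 42, 3.14159);

print $line, "\n";
```

Replacing sprintf with printf (and dropping the assignment) would send the formatted text straight to stdout.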

One of the more useful functions for processing lines of input is split. The split function is used, as its name implies, to split the line based on a field separator that you define, for example a space. The pieces are then stored in an array as individual elements. So, in our example, if we wanted to input multiple words and have them parsed correctly, we could change the script to look like this:

while (<STDIN>)
{
    @field = split(' ',$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        print STDOUT "$_ \n";
        print $field[0];
        print $field[1];
        print $field[2];
    }
}

The split function has the syntax

split(pattern,line);

where 'pattern' is our field separator and 'line' is the input line. So our line:

@field = split(' ',$_);

says to split the line we just read in (stored in $_) and use a space (' ') as the field separator. Each field is then placed into an element of the array field. The at-sign (@) is needed in front of the variable (field) to indicate it is an array. In perl, there are several types of variables. The first kind we have already met. The special variable $_ is an example of a scalar variable. Each scalar variable is preceded by a dollar-sign ($) and can contain a single value, whether a character string or a number. How does perl tell the difference? It depends on the context. Perl will behave correctly by looking at what you tell it to do with the variable. Other examples of scalars are:

$name = "jimmo";

$initial = 'j';

$answertolifetheuniverseandeverything = 42;
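To see what "depends on the context" means in practice, consider this short sketch. The variable names are chosen for the illustration only.

```perl
use strict;
use warnings;

my $value = "10";           # stored as a character string

my $sum    = $value + 5;    # + is a numeric operator, so "10" is used as a number
my $joined = $value . 5;    # . joins strings, so 5 is used as the string "5"

print "sum:    $sum\n";
print "joined: $joined\n";
```

The same scalar gives 15 with the numeric operator and 105 with the string operator, without any conversion step on our part.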

Another kind of variable is an array, as we mentioned before. If we precede a variable with a percent-sign (%), we have an array. But don't we have an array with the at-sign? Yes, so what's the difference? The difference is that arrays starting with the at-sign are referenced by numbers, while those starting with the percent-sign are referenced by a string. We'll get to how that works as we move along.

In our example, we are using the split function to fill up the array @field. This array will be referenced by number. We see the way it is referenced in the three print statements towards the end of the script.
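A short sketch of referencing such an array by number follows. The input line is invented for the example; note that with the special pattern ' ' split works on runs of white space and ignores leading white space, so the extra blanks and the trailing new-line do no harm.

```perl
use strict;
use warnings;

my $line  = "  one two   three\n";
my @field = split(' ', $line);   # ' ' splits on any run of white space

my $count = @field;              # an array used as a number gives its length
my $last  = $field[$#field];     # $#field is the offset of the last element

print "count: $count\n";
print "first: $field[0]\n";
print "last:  $last\n";
```

Offsets start at 0, so the three words live in $field[0], $field[1] and $field[2].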

If our input line had a different field separator (for example, '%'), the line might look like this:

@field = split('%',$_);

In this example, we are just outputting the first three words that are input. But what if there are more words? Obviously we just add more print statements. What if there are fewer words? Now we run into problems. In fact, we run into problems when adding more print statements. The question is where do we stop? Do we set a limit on the number of words that can be input? Well, we can avoid all of those problems by letting the system count for us. By changing the script a little, we'll get:

while (<STDIN>)
{
    @field = split(' ',$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        foreach $word (@field){
            print $word,"\n";
        }
    }
}

We introduce the 'foreach' construct. This has the same behavior as a "for" loop. In fact, in perl, "for" and "foreach" are interchangeable, provided you have the right syntax. In this case the syntax is:

foreach $variable (@array)

Where '$variable' is our loop variable, and '@array' is the name of the array. When the script is run, @array is expanded to its individual components. So, if we had input four fruits, our line might have looked like this:

foreach $word ('apple','banana','cherry','orange')

Since I don't know how many elements there are in the array field, foreach comes in handy. In this example, every word separated by a space will be printed on a line by itself. Like this:

perl script.pl
one two three
one
two
three
^D

The ^D is shorthand for saying that you press the CTRL key and the 'd' (lowercase) at the same time. This tells the script that you have reached the end of the file.

Our next enhancement is to change the field separator. This time we'll use an ampersand (&) instead. The split line now looks like this:

@field = split('&',$_);

When we run the script again with the same input, what we get is a bit different:

# perl script.pl
one two three
one two three

The reason why we get the output on one line is because the space is no longer a field separator. If we run it again, this time using an ampersand, we get something different:

# perl script.pl
one&two&three
one
two
three

In this case, the three words were recognized as separate fields.

Although it doesn't seem too likely that you would be inputting data like this from the keyboard, it is not unthinkable that you might want to read a file that has data stored this way. To make things easy, I have provided a file that represents a simple database of books. Each line is a record and represents a single book, with the fields separated by an ampersand.

To be able to read from a file, we have to create a file handle. To do this we add a line and change the while statement so it now looks like this:

open ( INFILE,"< bookdata.txt");

while (<INFILE>)

The syntax of the open function is:

open(file_handle,openwhat_&_how);

The way we open a file depends on the way we want to read it. Here, we use standard shell redirection symbols to indicate how we want to read the specified file. In our example, we indicate redirection from the file bookdata.txt. This says we want to read from the file. If we want to open for writing, the line would look like this:

open ( INFILE,"> bookdata.txt");

If we want to append to the file, we change the redirections so the line would look like this:

open ( INFILE,">> bookdata.txt");

Remember I said that we use standard redirection symbols? This also includes the pipe symbol. As the need presents itself, your perl script can open a pipe for either reading or writing. Assume that we want to open a pipe for writing that sends the output through sort. The line might look like this:

open ( INFILE,"| sort ");

Remember that this would work the same as from the command line. Therefore, the output is not being written to a file, just being piped through sort. However, we could do so if we wanted to. For example:

open ( INFILE,"| sort > output_file");

opens the file output_file for writing, but the output is first piped through sort. In our example, we are opening the file bookdata.txt for reading. The while loop continues through and outputs each line read. However, instead of being on a single line, the individual fields (separated by an ampersand) are each output on a separate line.
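None of the open lines above checks whether the open actually worked. The following sketch shows one common way to do so, using the return value of open and the special variable $!, which holds the system's error message. The subroutine read_lines and the file name sample_books.txt are invented for this illustration, and the record is sample data, not the contents of bookdata.txt.

```perl
use strict;
use warnings;

# Read every line of a file into a list; stop with a message if the
# open fails. (read_lines is a hypothetical helper name.)
sub read_lines {
    my ($path) = @_;
    open(INFILE, "< $path") or die "Cannot open $path: $!\n";
    my @lines = <INFILE>;
    close(INFILE);
    return @lines;
}

# Create a small sample file first so the example can run anywhere.
my $sample = "sample_books.txt";     # hypothetical file name
open(OUTFILE, "> $sample") or die "Cannot write $sample: $!\n";
print OUTFILE "title&author\n";
print OUTFILE "2010: Odyssey Two&Arthur C. Clarke\n";
close(OUTFILE);

my @lines = read_lines($sample);
print "Read ", scalar(@lines), " lines\n";

unlink($sample);                     # clean up the sample file
```

Without the check, a misspelled file name would simply produce a loop that never runs, which can be hard to track down.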

We can now take this one step further. Let's assume that a couple of the fields are actually composed of sub-fields. These sub-fields are separated by a plus sign (+). We want to break up every field containing a plus sign into its individual sub-fields.

As you probably guessed, we use the split command again. This time we use a different variable and instead of reading out of the input line ($_), we are reading out of the string $field. Therefore, the line would look like this:

@subfield = split('\+',$field);

Aside from changing the search pattern, I add the backslash. This is because the plus-sign is used in the search pattern to represent one or more occurrences of the preceding character. If we don't escape it, we generate an error. The whole script now looks like this:

open(INFILE,"<bookdata.txt");
while (<INFILE>)
{
    @data = split('&',$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        foreach $field (@data){
            @subfield = split('\+',$field);
            foreach $word (@subfield){
                print $word,"\n";
            }
        }
    }
}

If we wanted to, we could have written the script to split the incoming lines at both the ampersand and the plus-sign, which would have given us a split line that looked like this:

@data = split('[&\+]',$_);

The reason for writing the script as we did was that it is easier to separate sub-fields and still maintain their relationship. Note that the search pattern here can be any regular expression. For example, we could split the strings every place there is the pattern 'Di', when it is followed by an 'e', 'g' or an 'r', but *not* if that is followed by an 'i'. The regular expression would be:

Di[reg][^i]

so the split function would be:

@data = split('Di[reg][^i]',$_);
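The two patterns can be tried out on short strings. This is only a sketch with invented test data, and it writes the patterns between slashes rather than quotes, which perl also accepts for split.

```perl
use strict;
use warnings;

# Split on either separator in one pass: the character class matches
# an ampersand or a plus sign (no backslash is needed inside brackets).
my @flat = split(/[&+]/, "title&author+coauthor&publisher");
print join("|", @flat), "\n";

# The longer pattern removes everything it matches, including the
# character matched by [^i], so the 't' of "Dirt" disappears as well.
my @parts = split(/Di[reg][^i]/, "xxDirtyy");
print join("|", @parts), "\n";
```

The second case is worth remembering: whatever the pattern matches is treated as the separator and is thrown away, not just the part you were thinking of as the delimiter.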

At this point, we can read in lines from an ASCII file, separate the line based on what we have defined as fields and then output each line. However, the lines don't look very interesting. All we are seeing is the content of each field and do not know what each field represents. Let's change the script once again. This time we will make the output show us the field names as well as their content.

Let's also change the script so that we have control over where the fields end up. We still use the split statement to extract individual fields from the input string. This is not necessary since we can do it all in one step, but I am doing it this way to demonstrate the different constructs and to illustrate the adage that in Perl there is always more than one way to do something. So we end up with the script:

open(INFILE,"< bookdata.txt");
while (<INFILE>)
{
    @data = split('&',$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        $fields = 0;
        foreach $field (@data){
            $fieldarray[$fields] = $field;
            print $fieldarray[$fields++]," ";
        }
    }
}

Each time we read a line, we first split it into the array @data, which is then copied element by element into the array @fieldarray. Note that there is no new-line in the print statement, so each field will be printed followed by a space. The new-line read at the end of each input line will then be output. Each time through the loop, we reset our counter (the variable $fields) to 0.

Although the array is re-filled every time through the loop and we lose the previous values, we could assign the values to specific variables.

Now let's make the output a little prettier, by outputting the field headings first. To make things simpler, let's label the fields as follows:

title, author, publisher, char0, char1, char2, char3, char4, char5

where char0-char5 are simply characteristics about the book. We need a handful of if statements to make the assignment that will look like this:

foreach $field (@data){
    if ( $fields == 0 ){
        print "Title: ",$field;
    }
    if ( $fields == 1 ){
        print "Author: ",$field;
    }
    # ... and so on for the fields in between ...
    if ( $fields == 8 ){
        print "Char 5: ",$field;
    }
    $fields++;    # move the counter on to the next field
}

Here, too, we would be losing the value of each variable every time through the loop as they get overwritten. Let's just assume we only want to save this information from the first line (why will become clear in a minute). First we need a counter to keep track of what line we are on, and an if statement to enter the block where we make the assignment. Rather than a print statement, we change the line to an assignment, so the first line might look like this:

$title = $field;

When we read subsequent lines, we can then output headers for each of the fields. We do this by having another set of if statements that output the header and then the value, based on its position.

Actually, there is a way of doing things a little more efficiently. When we read the first line, we can assign the values to variables on a single line. Instead of the line:

foreach $field (@data) {

we add the if-statement to check if this is the first line and add the line:

($field0,$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8)=
    split('&',$_);

Rather than assigning values to elements in an array, we are assigning them to specific variables. (Note that if there are more fields generated by the split command than we specify variables for, the remainder will be ignored.) The other advantage of this is that we have saved ourselves a lot of space. We could also call these $field1, $field2, and so on, thereby making the field name a little more generic. We could also modify the split line so that, instead of several separate variables, we have them in a single array called field and then use the number as the offset into the array. Therefore, the first field would be referenced like this:

$field[0]

and the split command for this would look like:

@field=split('&',$_);

Kind of looks like something we already had. It is. This is just another example of the fact that there are always different ways to do something in perl.
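Both forms can be seen side by side in the sketch below. The record is sample data made up for the example (it is not taken from bookdata.txt), and the variable names are chosen to be descriptive rather than generic.

```perl
use strict;
use warnings;

my $record = "2010: Odyssey Two&Arthur C. Clarke&scifi&paperback";  # sample record

# Only three variables are listed, so the fourth field is ignored.
my ($title, $author, $genre) = split('&', $record);

# The same record captured as an array and addressed by offset.
my @field = split('&', $record);

print "Title:  $title\n";
print "Author: $author\n";
print "Fields: ", scalar(@field), "\n";
print "First:  $field[0]\n";
```

The list form reads well when the layout of the record is fixed; the array form is the better choice when the number of fields is not known in advance.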

At this point, we still need the series of if-statements inside of the foreach loop to print out the line. However, that seems like a lot of wasted space. Instead, we introduce the concept of an associative array. An associative array is just like any other list, except that you reference the elements by a label rather than a number.

Another difference is that associative arrays, also referred to as associative lists, are always an even length. This is because elements come in pairs: label and value. For example, we have:

%list= ('name','James Mohr', 'logname','jimmo', 'department','IS');

Note that instead of $ or @ to indicate this is an array, we use a %. This specifies that this is an associative array, so we can refer to the value by label, however when we finally reference the value, we do use the $. To print out the name, the line would look like this:

print "Name:",$list{name};

Also different are the brackets we use. Here we use curly brackets instead of square brackets.
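Here is a short sketch of working with such an array: adding a pair, walking through all labels, and testing whether a label is present. The 'phone' and 'shoesize' labels are invented for the illustration.

```perl
use strict;
use warnings;

my %list = ('name', 'James Mohr', 'logname', 'jimmo', 'department', 'IS');

$list{'phone'} = 'unknown';          # add a new label/value pair

# keys returns the labels in no particular order, so sort them for display.
foreach my $label (sort keys %list) {
    print "$label: $list{$label}\n";
}

# exists tests whether a label is present at all.
print "no shoe size on file\n" unless exists $list{'shoesize'};
```

Note the same switch of symbols as in the text: % when we mean the whole array, $ and curly brackets when we mean one value.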

The introduction of the associative array allows us to define the field labels within the data itself and access the values using these labels. As I mentioned, the first line of the data file contains the field labels. We can use these labels to reference the values. Let's look at the program itself:

open(INFILE,"< bookdata.txt");
$lines=0;
while (<INFILE>)
{
    chop;
    @data = split('&',$_);
    if ( $lines == 0 )
    {
        @headlist=split('&',$_);
        foreach $field (0..@headlist-1){
            %headers = ( $headlist[$field],'' );
        }
        $lines++;
    } else {
        foreach $field (0..@data-1){
            $headers{$headlist[$field]}=$data[$field];
            print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
        }
    }
}

At the beginning of the script, we added the chop function, which 'chops' off the last character of a list or variable and returns that character. If you don't mention the list or variable, chop affects the $_ variable. This function is useful to chop off the new-line character that gets read in. The next change we made was to remove the block that checked for blank lines and generated an error.

The first time we read a line, we enter the appropriate block. Here we have just read in the line containing the field labels and we put each entry into the array headlist via the split function. The foreach loop also adds some new elements:

foreach $field (0..@headlist-1){
    %headers = ( $headlist[$field],'' );
}

The first addition is the element (0..@headlist-1). Two numbers separated by two dots indicate a range. Used where a number is expected, @headlist gives the number of elements in the array headlist. This is a count as a person would make it, starting at 1, not an offset as the computer uses it, starting at 0. Since I chose to access all my variables starting with 0, I need to subtract 1 from the value of @headlist. There are 9 elements per line in the file bookdata.txt, therefore the range is 0..9-1.

However, we don't need to know that! In fact, we don't even know how many elements there are to make use of this functionality. The system knows how many elements it read in, so we don't have to. We just use @headlist-1 (or whatever).

The next line fills in the elements of our associative array:

%headers = ( $headlist[$field],'' );

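However, we are only filling in the labels and not the values themselves. Therefore, the second element of the pair is empty (''). One by one, we write the label into the first element of a pair. Be aware that assigning a list to %headers replaces the entire array each time, so after the loop only the last label is left in it. The script still works because the assignment in the second block creates each label as it stores the value.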
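It is worth checking what a loop like this actually leaves behind. The sketch below compares assigning a list to the whole associative array with assigning one element at a time; the three labels and the names %replaced and %kept are made up for the comparison.

```perl
use strict;
use warnings;

my @headlist = ('title', 'author', 'publisher');

# Version 1: list assignment to the whole array inside the loop.
my %replaced;
foreach my $i (0 .. @headlist - 1) {
    %replaced = ($headlist[$i], '');     # discards the pairs stored earlier
}

# Version 2: assign one element at a time.
my %kept;
foreach my $i (0 .. $#headlist) {        # $#headlist is the last offset
    $kept{$headlist[$i]} = '';
}

print "list assignment leaves ", scalar(keys %replaced), " label(s)\n";
print "element assignment leaves ", scalar(keys %kept), " label(s)\n";
```

If you want every label present before any values are loaded, the element form is the one to use. The range 0 .. $#headlist is simply another way of writing 0 .. @headlist - 1.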

After the first line is read, we load the values themselves. Here again we have a foreach loop that goes from 0 to the last element of the array. Like the first loop, we don't need to know how many elements were read in, as we let the system keep track of this for us. The second element in each pair of the associative list is loaded with this line:

$headers{$headlist[$field]}=$data[$field];

Let's take a look at this line starting at the right end. From the array @data (which is the line we just read in), we are accessing the element at the offset specified by the variable $field. Since this is just the counter used for our foreach loop, we go through each element of the array data one by one. The value retrieved is then assigned to the left hand side.

On the left, we have an array offset being referred to by an array offset. Inside we have:

$headlist[$field]

The array headlist is what we filled up in the first block. In other words, the list of field headings. When we reference the offset with the $field variable, we get the field heading. This will be used as the string for the associative array. The element specified by:

$headers{$headlist[$field]}

corresponds to the field value. For example, if the expression

$headlist[$field]

evaluated to 'title', then the second time through the loop, the expression

$headers{$headlist[$field]}

might evaluate to "2010: Odyssey Two."

At this point we are now ready to make our next jump. We are going to add the functionality to search for specific values in the data. Let's assume that we know what the fields are and wish to search for a particular value. For example, we want all books that have scifi as field char0. Assuming that the script was called book.pl, we would specify the field label and value like this:

book.pl char0=scifi

The completed script looks like this:

($searchfield,$searchvalue) = split('=',$ARGV[0]);
open(INFILE,"< bookdata.txt");
$lines=0;
while (<INFILE>)
{
    chop;
    @data = split('&',$_);
    if ( $_ eq "" )    # after chop, a blank line is an empty string
    {
        print STDERR "Error: \n";
    } else {
        if ( $lines == 0 )
        {
            @headlist=split('&',$_);
            foreach $field (0..@headlist-1){
                %headers = ( $headlist[$field],'' );
            }
            $lines++;
        } else {
            foreach $field (0..@data-1){
                $headers{$headlist[$field]}=$data[$field];
                if ( ($searchfield eq $headlist[$field] ) &&
                     ($searchvalue eq $headers{$headlist[$field]} )) {
                    $found=1;
                }
            }
        }
    }
    if ( $found == 1 )
    {
        foreach $field (0..@data-1){
            print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
        }
    }
    $found=0;
}

We added a line at the top of the script that splits the first argument on the command line:

($searchfield,$searchvalue) = split('=',$ARGV[0]);

Note that we are accessing $ARGV[0]. This is not the name of the command being called, as one would expect from C or shell programming; it is the first argument. Our command line had the string char0=scifi as its $ARGV[0]. After the split, $searchfield is char0 and $searchvalue is scifi.
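The split of the argument can be made a little more forgiving. The sketch below wraps it in a subroutine, here called parse_search (a name invented for this example), that copes with a missing argument and with values that themselves contain an equal sign.

```perl
use strict;
use warnings;

# Split "label=value" into its two parts. The limit of 2 keeps any
# further '=' characters inside the value. Returns an empty list if
# the argument is missing or contains no '=' at all.
sub parse_search {
    my ($arg) = @_;
    return () unless defined $arg && $arg =~ /=/;
    return split('=', $arg, 2);
}

my ($searchfield, $searchvalue) = parse_search($ARGV[0]);
if (defined $searchfield) {
    print "field: $searchfield, value: $searchvalue\n";
} else {
    print STDERR "usage: book.pl label=value\n";
}
```

Called as book.pl char0=scifi, this reports the field char0 and the value scifi; called with no argument, it prints the usage message on stderr instead of searching for nothing.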

Some other new code looks like this:

if ( ($searchfield eq $headlist[$field] ) &&
     ($searchvalue eq $headers{$headlist[$field]} )) {
    $found=1;

Instead of outputting each line in the second foreach loop, we change it so that here we are checking to see if the field we input ($searchfield) is the one we just read in $headlist[$field], and if the value we are looking for ($searchvalue) equals the one we just read in.

Here we add another new concept, that of logical operators. These are just like in C, where && means a logical AND and || is a logical OR. If we want to test whether two variables each have a specific value, we would just use the logical AND, like:

if ( $a == 1 && $b == 2)

which says if $a equals 1 AND $b equals 2, then execute the following block. If we wrote it like this:

if ( $a == 1 || $b == 2)

this says that if $a equals 1 OR $b equals 2, then execute the block. In our example, we are saying that if the search field ($searchfield) equals the corresponding value in the heading list ($headlist[$field]) AND the search value we input ($searchvalue) equals the value from the file ($headers{$headlist[$field]}), we then execute the following block. Our block simply sets a flag to say we found a match.
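A frequent slip with these tests is typing a single = where == was meant. The sketch below shows the difference; it uses $x and $y rather than $a and $b only because those two names have a special role in perl's sort function, and all values are made up.

```perl
use strict;
use warnings;

my ($x, $y) = (1, 5);
my $wanted  = 2;

# Correct: both sides are comparisons.
my $both   = ($x == 1 && $y == $wanted) ? "yes" : "no";
my $either = ($x == 1 || $y == $wanted) ? "yes" : "no";

# Pitfall: a single = assigns. The "test" below stores 2 in $copy and
# counts as true because the assigned value is true, no matter what
# $copy held before.
my $copy = $y;
my $oops = ($x == 1 && ($copy = $wanted)) ? "yes" : "no";

print "AND: $both, OR: $either\n";
print "with assignment: $oops (copy is now $copy)\n";
```

The AND test fails because $y is 5, the OR test succeeds because $x is 1, and the version with the assignment succeeds for the wrong reason while quietly changing the variable.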

Later, after we read in all the values for each record, we check the flag. If the flag was set, the foreach loop is executed:

if ( $found == 1 )
{
    foreach $field (0..@data-1){
        print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
    }

Here we output the headings and then their corresponding values. But what if we aren't sure of the exact text we are looking for? For example, I want all books by the author Eddings, but do not know that his first name is David. It's now time to introduce the perl function index. As its name implies, it delivers an index. The index it delivers is the offset of one string in another. The syntax is:

index(STRING,SUBSTRING,POSITION)

where STRING is the name of the string that we are looking in, SUBSTRING is the substring that we are looking for and POSITION is where to start looking. That is, what position to start from. If POSITION is left off, the function starts at the beginning of STRING. For example:

index('applepie','pie');

will return 5, as the substring 'pie' starts at position 5 of the string 'applepie' (positions are counted from 0). To take advantage of this, we only need to change one line. We change this:

if ( ($searchfield eq $headlist[$field] ) &&
     ($searchvalue eq $headers{$headlist[$field]} )) {

to this:

if ( (index($headlist[$field],$searchfield)) != -1 &&
     index($headers{$headlist[$field]},$searchvalue) != -1 ) {

Here we are checking for an offset of -1, which indicates that the substring is not within the string. (The offset comes before the start of the string.) As long as the result is not -1, we have a match. So, if we were to run the script like this:

book.pl author=Eddings

we would look through the field author for any entry containing the string Eddings. Since there are records with an author named Eddings, if we looked for Edding, we would still find it since Edding is also a substring of "David Eddings".
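A few calls to index show the behavior described here. The subroutine contains is a hypothetical wrapper written for this sketch, and the strings are sample values.

```perl
use strict;
use warnings;

# Return 1 if $needle occurs anywhere in $haystack, 0 otherwise.
# (contains is a made-up helper name, not a perl built-in.)
sub contains {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) != -1 ? 1 : 0;
}

print index("applepie", "pie"), "\n";            # offsets count from 0
print index("David Eddings", "Edding"), "\n";    # a partial name still matches
print index("David Eddings", "eddings"), "\n";   # wrong case: not found
print index("banana", "an", 2), "\n";            # start looking at offset 2

print "match\n" if contains("David Eddings", "Eddings");
```

The third call already hints at a limitation: index compares the characters exactly, so upper and lower case must agree.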

As you may have noticed, we have a limitation in this mechanism. We have to make sure that we spell things with the right case. Since "Eddings" is capitalized the same way both on the command line and in the file, there is no problem. Normally names are capitalized, so it would make sense to input them like that. But what about the title of a book? Often words like "the" or "and" are not capitalized. However, what if the person who input the data input them as capitals? If you looked for them in lowercase, but they were in the file as uppercase, you'd never find them.

In order to consider this possibility, we need to compare both the input and the fields in the file in the same case. We do this by using the tr (translate) function. It has the syntax:

tr/SEARCHLIST/REPLACEMENTLIST/[options]

Where SEARCHLIST is the list of characters to look for and REPLACEMENTLIST is the characters to use to replace those in SEARCHLIST. To see what options are available, check the perl man-page. We change part of the script to look like this:

foreach $field (0..@data-1){
    $headers{$headlist[$field]}=$data[$field];

    ($search1 = $searchfield) =~ tr/A-Z/a-z/;
    ($search2 = $headlist[$field] ) =~ tr/A-Z/a-z/;
    ($search3 = $searchvalue)=~tr/A-Z/a-z/;
    ($search4 = $headers{$headlist[$field]})=~tr/A-Z/a-z/;

    if ( (index($search2,$search1) != -1) && (index($search4,$search3) != -1) ) {
        $found=1;
    }
}

In the middle of this section there are four lines where we do the translations. This demonstrates a special aspect of the tr function. We can do the translation as we are assigning one variable to another. This is useful since the original strings are left unchanged. We then had to change the statement with the index function and comparisons to reflect the changes in the variables.
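The copy-and-translate idiom is easy to try on its own. In this sketch the title is a sample value; the second use of tr relies on the fact that tr returns the number of characters it matched, and that an empty replacement list leaves the string unchanged.

```perl
use strict;
use warnings;

my $title = "The Belgariad";       # sample value

# Copy first, then translate the copy; $title itself is untouched.
(my $lower = $title) =~ tr/A-Z/a-z/;

# With an empty replacement list nothing is changed, so tr simply
# counts the characters that appear in the search list.
my $lowercase_count = ($title =~ tr/a-z//);

print "original: $title\n";
print "lowered:  $lower\n";
print "lowercase letters in original: $lowercase_count\n";
```

Perl 5 also has a built-in lc function that returns a lowercase copy of a string, which can stand in for the tr line when only case folding is needed.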

When writing conditional statements, you have to be sure of what it is you are testing as your condition. Truth, like many other things, is in the eye of the beholder. In this case, it is the perl interpreter that is beholding your concept of true. It may not always be what you expect. In general, you can say that a value is true unless it is the null string (''), the number zero (0) or the literal string zero ("0").
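The rule can be checked directly. The subroutine truth below is a made-up helper that reports how perl judges a value when it is used as a condition.

```perl
use strict;
use warnings;

# Report how perl judges a value in a condition.
# (truth is a hypothetical helper name.)
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print "''    is ", truth(''),    "\n";   # the null string
print "0     is ", truth(0),     "\n";   # the number zero
print "'0'   is ", truth('0'),   "\n";   # the string consisting of just 0
print "'0.0' is ", truth('0.0'), "\n";   # a string other than '0'
print "'00'  is ", truth('00'),  "\n";
print "' '   is ", truth(' '),   "\n";   # a single space
```

The first three are false; the last three are true, even though '0.0' and '00' would both be zero if used as numbers. That is the kind of surprise to watch for when a condition tests a value read from a file.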

One important aspect is the comparison operators. Unlike C, there are different operators for numeric comparison and for string comparison. They're all easy to remember and you have certainly seen both sets before, but keep in mind that they are different. Table 0\1 contains a list of the perl comparison operators:


Numeric   String   Comparison
==        eq       equal to
!=        ne       not equal to
>         gt       greater than
<         lt       less than
>=        ge       greater than or equal to
<=        le       less than or equal to
<=>       cmp      comparison with a signed value returned
                   (-1 - first value less, 0 - values equal,
                   1 - first value greater)

Table 0\1 - Perl comparison operators

Another important aspect to keep in mind is that there is no such thing as a numeric variable; well, not really. Perl converts between numbers and strings without you interfering. If the variable is used in a context where it can only be a string, then that's the way perl will interpret it, as a string.

Let's take two variables: $a=2 and $b=10. As you might expect, the expression $a < $b evaluates to true because we are using the numeric comparison operator <. However, the expression $a lt $b evaluates to false. This is because the string "10" comes before "2" lexicographically (the character "1" sorts before the character "2"), even though 10 is the larger number.
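The difference is easy to demonstrate. The sketch below uses $x and $y in place of the $a and $b from the text, because $a and $b are the special variables that perl's sort function uses. The list of values to sort is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = (2, 10);   # $x and $y stand in for the $a and $b of the text

print "numeric: $x <  $y is ", ($x <  $y ? "true" : "false"), "\n";
print "string:  $x lt $y is ", ($x lt $y ? "true" : "false"), "\n";

# The signed comparison operators return -1, 0 or 1.
print "numeric <=> gives ", ($x <=> $y), "\n";
print "string  cmp gives ", ($x cmp $y), "\n";

# The same distinction decides how a list is sorted.
my @values  = (10, 9, 2, 33);
my @numeric = sort { $a <=> $b } @values;
my @string  = sort { $a cmp $b } @values;
print "numeric sort: @numeric\n";
print "string sort:  @string\n";
```

The numeric sort gives 2 9 10 33, while the string sort gives 10 2 33 9, which is rarely what you want for numbers.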

Besides simply translating sets of letters, perl can also do substitution. To show you this, I am going to show you another neat trick of perl. Because perl was designed as a text and file processing language, it is very common to read in a number of lines of data and process them all in turn. We can tell perl that it should assume we want to read in lines, although we don't explicitly say so. Let's take a script that we call fix.pl that looks like this:

s/James/JAMES/g;

s/Eddings/EDDINGS/g;

The syntax is the same as you find in sed; however, perl has a much larger set of regular expressions. Run as a script by itself, this does nothing useful, because nothing ever reads in a line for the substitutions to work on. So instead we run it like this:

perl -p fix.pl bookdata.pl

The -p option tells perl to put a wrapper around your script. Therefore, our script would behave as if we had written it like this:

while (<>) {

s/James/JAMES/g;

s/Eddings/EDDINGS/g;

} continue {

print;

}

This would read each line from a file specified on the command line, carry out the substitution and then print out each line, changed or not. We could also take advantage of the ability to specify the interpreter with the #!. If this is the first line of a script, the system will use the program named after it as the interpreter. The script would then look like:

#!/usr/bin/perl -p

s/James/JAMES/g;

s/Eddings/EDDINGS/g;

Another command line option is '-i'. This stands for "in-place" and with it you can edit files "in-place." In the example above, the changed lines would be output to the screen and we would have to redirect them to a file ourselves if we wanted to keep them. The -i option takes an optional argument, which indicates the extension you want for the old version of the file. So, we change the first line, like this:

#!/usr/bin/perl -pi.old

With perl you can also make your own subroutines. These subroutines can be written to return values, so you now have functions as well. Subroutines are first defined with the 'sub' keyword and are called using the ampersand (&). For example:

#!/usr/bin/perl

sub usage {

print "Invalid arguments: @ARGV\n";

print "Usage: $0 [-t] filename\n";

}

if ( @ARGV < 1 || @ARGV > 2 ) {

&usage;

}

This says that if the number of arguments from the command line @ARGV is less than 1 or greater than 2, we call the subroutine usage which prints out a usage message.

To create a function, we first create a subroutine. When we call the subroutine, we call it as part of an expression. The value returned by the subroutine/function is the value of the last expression evaluated.

Let's create a function that prompts you for a yes/no response:

#!/usr/bin/perl

if (&getyn("Do you *really* want to remove all the files in this directory? ")

eq "y\n" )

{

print "Don't be silly!\n"

}


sub getyn{

print @_;

$response = (<STDIN>);

}

This is a very simple example. In the subroutine getyn, we output everything that is passed to the subroutine. This serves as a prompt. We then assign a line we get from stdin to the variable $response. Since this is the last expression inside the subroutine to be evaluated, this is the value that is returned to the calling statement. If we enter "y" (which would include the new-line from the enter key), this is all returned by the function.

The calling if-statement is passing the actual prompt as an argument to the subroutine. The getyn subroutine could then be used in other circumstances. As mentioned, the value returned includes the new-line, therefore we have to check for "y\n". This is not ‘y’ or ‘n’, but rather ‘y’ followed by a new-line.

Alternatively, we could check the response inside of the subroutine. We could have added the line:

$response =~ /^y/i;

We addressed the =~ characters earlier in connection with the tr function. Here as well, they bind the variable on the left-hand side to the operation on the right. In this case, we use a pattern matching construct: /^y/i. This has the same behavior as in sed, where we are looking for a 'y' at the beginning of the line. The trailing 'i' simply says to ignore the case. Unlike tr, a pattern match does not change $response. Instead, the expression as a whole has a value: 1 if the first character is a 'y' or 'Y', and a null string if not.

We now change the calling statement and simply leave off the comparison to "y\n". Since the return value of the subroutine is the value of the last expression evaluated, the value returned now is either '1' or ''. Therefore, we don't have to do any kind of comparison, as the if-statement will react according to the return value.
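Put together, the changed subroutine and calling statement might look like the sketch below. It is not the only way to write it. The optional filehandle argument is an addition made here (it is not in the original getyn) so that the example can run without anyone at the keyboard, and reading from an in-memory string in this way assumes perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Prompt, read one line, and return 1 for a "yes" answer or '' otherwise.
# The optional second argument (a filehandle) is an addition made here so the
# example can run without a keyboard; by default the answer comes from STDIN.
sub getyn {
    my ($prompt, $fh) = @_;
    $fh = \*STDIN unless defined $fh;
    print $prompt;
    my $response = <$fh>;
    $response = '' unless defined $response;   # end of input counts as "no"
    return $response =~ /^y/i ? 1 : '';
}

# Simulate a user typing "Yes" followed by the enter key.
my $typed = "Yes\n";
open(my $fake_input, '<', \$typed) or die "cannot open in-memory input: $!";

if (getyn("Do you *really* want to remove all the files? ", $fake_input)) {
    print "\nDon't be silly!\n";
}
close($fake_input);
```

In real use you would leave off the second argument, and the answer would be read from the keyboard as before.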

I wish I could go on. I haven't even hit on a quarter of what perl can do. Unfortunately, like the sections on sed and awk, more details are beyond the scope of this book. Instead I want to refer you to a few other sources. First, there are two books from O'Reilly and Associates. The first is Learning Perl by Randal Schwartz. This is a tutorial. The other is Programming Perl by Larry Wall and Randal Schwartz. If you are familiar with other UNIX scripting languages, I feel you would be better served by getting this one. It takes the same approach that I do by explaining "this is how perl does things" rather than trying to explain to you what those "things" are. Also PERL by Example by Ellie Quigley from Prentice Hall provides an excellent tutorial.

The next place is the PERL CDROM from Walnut Creek CDROM (www.cdrom.com). This is loaded with hundreds of megabytes of perl code and the April 1996 version which I used contains the source code for perl4 (4.036) and perl5 (5.000m). In many cases, I like this approach more since I can see how to do things I need to do. Books are useful to get the basics and reminders of syntax, options, and the like. However, seeing someone else's code shows me how to do it.

Another good CD is the Mother of PERL CD from InfoMagic (www.infomagic.com). It, too, is loaded with hundreds of megabytes of perl scripts and information.

There are a lot of places to find sample scripts while you are waiting for the CD to arrive. One place is the Computers and Internet: Programming Languages: Perl hierarchy at Yahoo (www.yahoo.com). You can use this as a springboard to many sites that not only have information on perl but use perl on the Web (e.g., in CGI scripts).


Example               Function         Result

Assignment
$x = $y               Assignment       Assign the value of $y to $x
$x += $y              Addition         Add the value of $y to $x
$x -= $y              Subtraction      Subtract the value of $y from $x
$x .= $y              Append           Append string $y onto $x

String Operations
index($x,$y)          Index            Delivers offset of string $y in string $x
substr($x,$y,$len)    Substring        Delivers substring of $x, starting at $y of length $len
$x . $y               Concatenation    $x and $y considered a single string, but each remains unchanged
$x x $y               Repetition       String $x is repeated $y times

Pattern Matching
$var =~ /pattern/     Match            True if $var contains “pattern”
$var =~ s/pat/repl/   Substitution     Substitutes “repl” for “pat”
$var =~ tr/a-z/A-Z/   Translation      Translates lowercase to uppercase

Math Operations
$x + $y               Addition         Sum of $x and $y
$x - $y               Subtraction      Difference of $x and $y
$x * $y               Multiplication   Product of $x and $y
$x / $y               Division         Quotient of $x and $y
$x % $y               Modulus          Remainder of $x divided by $y
$x ** $y              Exponentiation   $x raised to the power of $y
$x++, ++$x            Increment        Adds 1 to $x
$x--, --$x            Decrement        Subtracts 1 from $x

Logic Operations
$x && $y              logical AND      True if both $x and $y are true
$x || $y              logical OR       True if either $x or $y is true
! $x                  logical NOT      True if $x is not true

Table 0\2 Perl Operations

Building Our Pages

In this section, we are going to talk about the basics of putting together Web pages. Entire books have been published that go into more detail than I do here. However, as with perl scripts, I feel that the best way to create good Web pages is to practice. I can give you the tools, but it is up to you to become good at using them.

HTML - The Language of The Web

Web pages are written in the Hypertext Markup Language (HTML). This is a "plain-text" file that can be edited by any editor, like vi. The HTML commands are similar to, and also simpler than, those used by troff (a text processing language available from various places on the Web). In addition to formatting commands, there are built-in commands that tell the Web browser to go out and retrieve a document. We can also create links to specific locations (labels) within that document. Access to the document is by means of a Uniform Resource Locator (URL).

There are several types of URLs that perform different functions. Several different protocols can be used to access these resources, such as ftp, http, gopher, or even telnet. If we leave off the protocol name, the Web browser may assume that it refers to a file on our local system (it depends on the browser). However, just like ftp or telnet, we can make specific references to the local machine. I encourage using absolute names like this as it makes transferring Web pages that much easier.

As with any document with special formatting, we have to be able to tell the medium that there is something special. When we print something we may have to tell the printer. When a word processor displays something on the screen we have to tell it. When the man command is supposed to display something in bold, we have to tell it as well. Each medium has its own language. This applies to the medium of the World Wide Web. The language that the Web uses is HTML. HTML is what is used to format all those headings and the bold fonts; even the links themselves are formatted using HTML.

Like other languages, HTML has its own syntax and vocabulary. It shouldn't be surprising that there are even "dialects" of HTML. What does the interpretation of this language is our Web browser, or simply browser. Like the word processor or the man command, our browser sees the formatting information and converts it into the visual images that we see on our screen. If our browser doesn't understand the dialect of the document we are trying to read, we might end up with garbage or maybe even nothing at all. Therefore, it is important to understand these dialects.

HTML is similar to other formatting languages in that it is used to define the structure of the document. However, it is the viewer (in this case the browser) that gives the structure its form.

Most browsers support the HTML 2 standard, although the newest standard is HTML 3.2. Some vendors (such as Netscape) have specific additions of their own, which sometimes makes pages designed for their browsers unreadable by others. However, as of this writing, Netscape has become pretty much the standard. Many sites with web pages that were designed specifically for Netscape have links back to Netscape where you can download the latest version.

The Web is a client-server system in its truest sense. That is, there are some machines that provide services (the servers) to another set of machines (the clients). From our perspective, however, there is a single client (our browser) and tens of thousands of servers spread out all over the world. Keep in mind that the server doesn't have to be at some other location. In fact, the server doesn't even need to be another machine. Our own machine can serve documents locally, even though they are loaded with scohttpd.

As with any server, it sits and waits for requests. In this case, the requests are for documents. When it gets the request it looks for the appropriate document and passes it back to our client. To be able to communicate, the client and server need to speak the same language. This is the hypertext transfer protocol or HTTP.

What I am going to do in the next section is to give you a crash course on the basics of HTML. This is not an in-depth tutorial, nor are we going to cover every aspect of the language. However, we should cover enough information to get you on your way to creating fairly interesting Web pages.

HTML uses tags (formatting markers) to tell the Web browser how to display the text. These consist of pairs of angle brackets ( < > ) with the tag name inside. Following this is the text to be formatted. Formatting is turned off using the same tag, but the tag name is preceded by a slash (/). For example, the tag to create a level 1 header is <H1>, therefore to turn off the level 1 header, the tag is </H1>. In a document, it might look like this:

<H1>Welcome to My Home Page</H1>

Note that here and throughout the section, I use only capital letters in the tags. You don’t have to if you don’t want to, as the browser can interpret them either way. I find that by using capital letters, it is easier to see the tag when I am editing the source file.

HTML documents are usually broken into two sections: a header and a body. Each has its own tag: <HEAD> and <BODY>. The header usually contains information that may not actually get displayed on the screen, but is still part of the document and does get copied to the client machine. If both of these are omitted, then all of the text belongs to the body.

By convention, every HTML document has a title, which is primarily used to identify the document and is usually the same as the first heading. The title is not displayed within the page itself, but rather at the top of the window.

While browsing the Web, we have certainly clicked on some text or an image and had some other document appear on our screen. As we talked about a moment ago, this is the concept of a hypertext link. Links are defined by an HTML tag and one of its attributes. The tag is the <A> tag, which stands for anchor. Like a heading or any other formatting tag, the anchor is started using the tag and closed by the same tag with a leading slash (</A>).

The text or image will then appear somewhat highlighted. When we click on it, nothing happens. This is because we haven't said what to do yet. This is where the HREF attribute comes in. It is written inside the <A> tag and is a reference to the document that should be loaded. It doesn't have to be another HTML page, but does have to be something that the Web server and browser understand, like a CGI script (more on those later).

If the document is another HTML page, it will be loaded just like the first and will probably have its own links. If the link points to an image, that image gets loaded into our browser. It is also possible that the link points to something like a tar archive or a pkzipped file. Clicking on it could cause our browser to start a particular program such as PKUNZIP.EXE, or simply ask us whether we want to save the file. This is dependent on how our browser is configured.

Many browsers have an option "View source" where we can see the HTML source for the document we are currently viewing. This way we can see what each reference is anchored to. In addition, this is a great learning tool as you can see how pages were put together.

In general, these links have the format:

HREF="http://[machine_name]/[document_name]"

The line pointing to our homepage may look like this in a document:

<H1><A HREF="http://www.our.domain/index.html">Jimmo's WWW Home Page</A> </H1>

When we look at the document we don't see any of the tags, just a line that looks like this:

Jimmo's WWW Home Page

When we click on this link, the browser loads the document specified by HREF. In this case, www.our.domain/index.html.

Oftentimes the link is identified by the fact that it is a different color than the other text and links that we haven't visited are underlined. When we click on a link, and later return to that same page, the link will have a different color and the underline is gone. I have seen in some cases where there is no underline and the link just changes color. This is normally configurable by the browser.

The <H1> entry says that this line is to be formatted as a Header, level 1. The anchor is indicated by the <A> entry, and refers to the page index.html on the machine www.our.domain. Remember from our previous discussion, using the file index.html is a convention. If we had defined our DirectoryIndex to be some other file, this would replace index.html.

There are six levels of headings, numbered 1 through 6, with H1 being the largest and most prominent. Headings are generally displayed in larger or bolder fonts than the normal text. Normal text is simply anything not enclosed within tags. Like troff and similar editing languages, HTML does not care about formatting like carriage returns and other white spaces. Therefore we can include a carriage return anywhere in the text, but it will be displayed according to the rules of the tags as well as the width of the viewing window. This allows smaller windows to display the same text, without it getting messed up.

With lower resolutions, the level one header (<H1>) is way too large. I cannot remember ever seeing a site where a level one header really looked good. In most cases, unless we have a 17" monitor or larger and a high resolution, just a couple of words in a level one header take up the whole width of the screen and are overwhelming. (We'll get more into some tips and techniques later.) The best way to see what it looks like is to try it yourself with different browsers at different resolutions. You can see how the headers relate to each other in size in Figure 0-1.

Figure 0-1 Examples of Headers

HTML also provides explicit changes to the physical style such as <B> for bold, creating lists with <UL>, and forcing the Web browser not to do any formatting with <PRE>. Some of the more common tags are listed in Table 0\1.

A compendium of HTML elements can be found at http://www.synapse.net/~woodall/html.htm. Not only does it list all the known HTML tags, but each has a description with a “compatibility” matrix describing the HTML versions that include this tag, as well as which versions of the more common Web browsers support this tag.



Tag           Expansion          Definition
<TITLE>       TITLE              Describes the content of the document
<H1>-<H6>     HEADING            Heading with the specific level (1-6)
<P>           PARAGRAPH          Starts a new paragraph
<EM>          EMPHASIS           Logical tag for displaying emphasis, defaults to italics
<STRONG>      STRONG             Logical tag for displaying something of significance, defaults to bold
<DT>          DEFINITION TERM    The term that you are defining
<DL>          DEFINITION LIST    Start of a list of definitions
<DD>          DEFINITION         The definition of the term
<B>           BOLD               Physical tag enabling bold text
<U>           UNDERLINE          Physical tag enabling underlined text
<I>           ITALICS            Physical tag enabling italic text
<PRE>         PREFORMATTED       Disables formatting by Web browser
<BR>          LINE BREAK         Forces a line break
<A>           ANCHOR             Logical tag to enclose a link, normally used with HREF attribute
<UL>          UNNUMBERED LIST    Beginning of an unnumbered list
<OL>          ORDERED LIST       Beginning of a numbered or ordered list
<HR>          HORIZONTAL RULE    Draws a line across the page
<HEAD>        TEXT HEADER        Indicates the document header
<BODY>        TEXT BODY          Indicates the document body
<LI>          LIST ITEM          Item within a list
<IMG SRC=>    IMAGE SOURCE       Defines the source/path of an image to be displayed

Table 0\1 Basic HTML Tags

Here, too, when we want to stop the formatting, we use the same tag, but preceded with a slash. For example, to turn on bold it would look like this:

<B>

To turn it off, like this:

</B>

We could also include multiple formatting on the same line:

<H3><I>Welcome to <U><B>Jim's</B> Wonderful</U> Web Site</I> </H3>

which would look like this:

Welcome to Jim's Wonderful Web Site

Here we have several different tags that are laid inside each other. You don't have to match the pairs exactly like this. For example, the <U><B> or the </I> </H3> at the end of the line could have been reversed. We could have also had something like this:

<H3><I>Welcome to <U><B>Jim's</B> Wonderful</I> Web Site</U> </H3>

This would stop the italic formatting before it stopped the underline, although we had started the underline before we started the italic. The result would then look like this:

Welcome to Jim's Wonderful Web Site

We see that the underline continues under "Web Site," but the italics now stopped at the word "Wonderful."

As I mentioned, the convention is that the Web server's machine name is www.domain.name. To access their home page, the URL would be http://www.domain.name. For example, to get to SCO's home page, the URL is http://www.sco.com.

The Page Itself

Using HTML you have the ability to control the overall appearance of the page. One simple way of doing this is to specify a color to use as your background. This is done using the BGCOLOR option within the <BODY> tag like this:

<BODY BGCOLOR="#rrggbb">

The color is specified as a red-green-blue (RGB) triplet. Each of the three values is a hexadecimal number between 00 and FF that specifies the intensity of that color. The higher the intensities, the closer the result is to white. So, if we set all three colors to FF, we end up with a pure white background. Setting them all to 00 gives us a black background.

By default, the text is black. So, if we were to specify a black background, the text would be invisible. To get around this problem we use the TEXT option within the <BODY> tag. This too is an RGB-triplet. So, to specify white text on a black background, the line might look like this:

<BODY BGCOLOR="#000000" TEXT="#FFFFFF">
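If you think of colors as three decimal intensities between 0 and 255, perl's sprintf can build the hexadecimal triplet for you. This is only a sketch; the helper name rgb_triplet is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an RGB triplet such as "#FF0000" from three intensities (0-255).
# (rgb_triplet is a hypothetical helper, not an HTML or perl built-in.)
sub rgb_triplet {
    my ($red, $green, $blue) = @_;
    foreach my $level ($red, $green, $blue) {
        die "intensity $level is outside 0-255\n" if $level < 0 || $level > 255;
    }
    # %02X prints each value as two uppercase hexadecimal digits.
    return sprintf("#%02X%02X%02X", $red, $green, $blue);
}

my $background = rgb_triplet(0, 0, 0);         # black
my $text       = rgb_triplet(255, 255, 255);   # white

print "<BODY BGCOLOR=\"$background\" TEXT=\"$text\">\n";
```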

You have undoubtedly seen Web sites where there is an image as the background, rather than a single color. This is simply done with the BACKGROUND option. Note that if you specify a background image, this will override the background color. Also, you need to pay attention to what the text color is. Sometimes the default color (black) doesn’t come out right, but slightly modifying it does. An example of using a background image and setting the text color to red would look like this:

<BODY BACKGROUND="background.gif" TEXT="#FF0000">

You can also define the color of your links within the <BODY> tag. The LINK option is the color of a link before anything happens. The ALINK option defines the color of the link when you click on it (A for “active”). The VLINK option defines the color of links that you have gone to already (V for “visited”). All three of these are RGB-triplets.

Lists

As indicated above, there are two types of lists that you can create: ordered and unordered. An unordered list is often called a bullet list as there is a symbol at the beginning of each line called a bullet. There are three types of bullets that you can use: DISC, CIRCLE and SQUARE.

For example, to specify an unordered list using circles, the line would look like this:

<UL TYPE=CIRCLE>

Ordered lists can also show different types. The types indicate what symbols are used to identify each entry. Possible symbols are:

1 - Numbers (the default)

a - lowercase letters

A - Uppercase letters

I - Large roman numbers

i - Small roman numbers

To specify an ordered list using small roman numbers, it would look like this:

<OL TYPE=i>

If you use the TYPE attribute within the list tag, the type is valid for all list items. If you use the type attribute within the list item tag, it changes the type for all subsequent items.

An interesting thing to note is that with an ordered list, the system keeps track of the numbers, even if you change types. For example, you could start off with uppercase letters, switch to lowercase letters and then finish off with numbers. You can see some examples in Figure 0-2, Figure 0-3 and Figure 0-4.

Figure 0-2 An Example of Unordered Lists

Figure 0-3 An Example of Ordered Lists

Figure 0-4 Another Example of Ordered Lists

Tables

Tables can also be created using HTML. The <TABLE> tag is used to start the table, and each row and each cell needs to be defined as well. The tag <TR> is used to mark a table row and <TD> is used for the cells. In addition, you can also create borders around your table, change the alignment and even specify that a certain cell span multiple rows or multiple columns.

Within the table cell tag, you can also use several attributes. Each has the syntax:

attribute = value

For example, to set the horizontal alignment to center, you would use:

<TD align=center>

The attributes used in the example that follows are align, valign, rowspan and colspan.

Below is a table describing the technical specifications for a theoretical compressor. As you can see, in many cases we specified both a vertical and a horizontal alignment, and there are cases where we spanned multiple rows or columns.

<TABLE border>

<TR><TD rowspan=2 align=center valign=middle>Type<TD rowspan=2 align=center valign=middle>Air Volume

<TD rowspan=2 valign=middle align=center>Highest Pressure

<TD rowspan=2 valign=middle align=center>Tank Vol.

<TD COLSPAN=2 align=center>Motor performance

<TD COLSPAN=2 align=center>Fuse Amperage

<TD valign=middle align=center>Measurements

<TD rowspan=2 valign=middle align=center>weight kg</TR>

<TR><TD align=center valign=middle>230V AC

<TD align=center valign=middle>400V DC

<TD align=center valign=middle>230V AC

<TD align=center valign=middle>400V DC


<TD align=center valign=middle>L x W x H mm</TR>

<TR><TD valign=middle align=center>TNG L/42

<TD valign=middle align=center>260

<TD valign=middle align=center>10

<TD valign=middle align=center>40

<TD valign=middle align=center>1,3

<TD valign=middle align=center>1,2

<TD valign=middle align=center>16

<TD valign=middle align=center>6

<TD valign=middle align=center>920x410x700

<TD valign=middle align=center>40

</TR>

</TABLE>


Figure 0-5 Example HTML Table

Be careful with tables. These are one of the things that not every browser can handle. Even if the browser can’t, the table may still be usable. However, the table in this example may not be.
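Since typing all of those <TR> and <TD> tags by hand is tedious, a table like this is a natural candidate for a perl script. The sketch below is just one possible approach: the helper name html_table is made up, the sample rows borrow a few values from the compressor table, and no attempt is made to escape special characters such as < or & in the cell text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a list of rows (each a reference to a list of cells) into an HTML table.
# (html_table is a hypothetical helper; cell text is inserted as-is.)
sub html_table {
    my (@rows) = @_;
    my $html = "<TABLE border>\n";
    foreach my $row (@rows) {
        $html .= "<TR>";
        foreach my $cell (@$row) {
            $html .= "<TD align=center>$cell</TD>";
        }
        $html .= "</TR>\n";
    }
    $html .= "</TABLE>\n";
    return $html;
}

print html_table(
    [ "Type",     "Tank Vol.", "Weight kg" ],
    [ "TNG L/42", 40,          40          ],
);
```

A simple table like this one, without rowspan or colspan, is also the kind that stands the best chance of remaining readable in a browser with limited table support.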

Bookmarks

I call them bookmarks, but a lot of other documentation calls them named anchors. Because you specify them within an anchor tag and you give them names, I guess this is the correct term. However, I find that calling them bookmarks is more descriptive of the function they serve.

To use a named anchor, we first have to name the anchor. The name is an attribute given to that anchor. Although we don't have to use headers or any special formatting, specifying a named anchor might look like this:

<h1><a name="bm1">Here is bookmark 1</a></h1>

In the document that has the link, the reference might look like this:

<a href="doc.html#bm1">Jump to bookmark 1</a><br>

When we click on the text:

Jump to bookmark 1

We are brought to the label name=bm1. Since the target is itself an anchor, it can carry a reference of its own, so the line might look like this:

<h1><a name="bm1" HREF="bookmark.html">Here is bookmark 1</a></h1>

The text "Here is bookmark 1" would now be a link, and when we clicked on it, we would jump to the document bookmark.html.

Let's look at an example consisting of two files. One contains the links and the other contains the named anchors. The first one looks like this:

<H1>Here is an example of jumping to specific places within a document.</H1>

<a href="doc.html#bm1">Jump to bookmark 1</a><br>

<a href="doc.html#bm2">Jump to bookmark 2</a><br>

<a href="doc.html#bm3">Jump to bookmark 3</a><br>

<a href="doc.html#bm4">Jump to bookmark 4</a><br>

<a href="doc.html#bm5">Jump to bookmark 5</a><br>

The document that we jump to looks like this:

<h1><a name="bm1" HREF="bookmark.htm">Here is bookmark 1</h1></a>

<a HREF="bookmark.htm">Back to bookmark.html</A>


<HR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>


<h1><a name="bm2">Here is bookmark 2</h1></a>

<a HREF="bookmark.htm">Back to bookmark.html</A>


<HR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>


<h1><a name="bm3">Here is bookmark 3</h1></a>

<a HREF="bookmark.htm">Back to bookmark.html</A>


<HR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>


<h1><a name="bm4">Here is bookmark 4</h1></a>

<a HREF="bookmark.htm">Back to bookmark.html</A>


<HR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>


<h1><a name="bm5">Here is bookmark 5</h1></a>

<a HREF="bookmark.htm">Back to bookmark.html</A>

In this example, I include the lines with the horizontal rule and the breaks to demonstrate the effect. If all of the bookmarks are visible on one screen, you don't see the fact that you actually move to the bookmark.


Menus

Menus are not actually something built into HTML, but you can create some very interesting and useful ones on your own. Unfortunately, you cannot create the pull-down and pop-up menus that we know in other applications, but we can design something that allows easy navigation through the web site.

The simplest form of a menu is simply text on the screen:

Products

History

Customer Service

Sales

When you click on one of the lines, this brings you to another document. For example, if we clicked on the link Products, we would be taken to the products page, which has another menu of its own. The HTML source for this would look like this:

<UL>

<LI><A HREF="http://www.our.domain/products">Products</A>

<LI><A HREF="http://www.our.domain/history">History</A>

<LI><A HREF="http://www.our.domain/custserv">Customer Service</A>

<LI><A HREF="http://www.our.domain/Sales">Sales</A>

</UL>

This is nothing more than just a series of links that just happen to be included in an unordered list. In this example, we did not specify a file to load, just a directory. Therefore, the server would look for a file with the name specified in the DirectoryIndex directive.
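If the same menu is to appear on many pages, it can be easier to let a perl script write it out than to keep several copies in step. The following is a sketch only; the helper name menu_html is made up, and www.our.domain is the same placeholder host name used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an unordered-list menu from pairs of (directory, label).
# (menu_html is a hypothetical helper; www.our.domain is the placeholder
# host name used in the text.)
sub menu_html {
    my ($base, @entries) = @_;
    my $html = "<UL>\n";
    foreach my $entry (@entries) {
        my ($dir, $label) = @$entry;
        $html .= "<LI><A HREF=\"$base/$dir\">$label</A>\n";
    }
    $html .= "</UL>\n";
    return $html;
}

print menu_html(
    "http://www.our.domain",
    [ "products", "Products" ],
    [ "history",  "History" ],
    [ "custserv", "Customer Service" ],
    [ "Sales",    "Sales" ],
);
```

Adding a new department then means adding one line to the list rather than editing the HTML by hand.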

Maps

Maps are another very effective means of moving around a web site. These are images that are linked to a map file, and it's possible to click on the image to be brought to another document. These don't need to be "maps" in the conventional sense. Instead, they can be images of any kind. The location where we click determines where to go.

Maps consist of two parts: an image and a map file. The image can be anything that you want to display and the map file is a text file that gives coordinates and specifies what document should be loaded when that area of the image is clicked.

Let's assume that you have a site with a picture of your company's building. You would like a map so that when someone clicks on a particular part of the building, they are brought to a page that describes the department situated within that part of the building. We might then have a link on our page that looks like this:

<A HREF="/maps/dept.map"> <IMG SRC="/images/company1.gif" ismap></A>

Here the entire line is an anchor, but we are showing just an image (/images/company1.gif) and no text. There are two things to note here. First, the link that we are referencing (/maps/dept.map) is a map file and not an HTML page. The next thing is the reference to the image. Notice the ismap attribute at the end. This marks the image as a map, so the coordinates of the click are sent along with the request and the server is able to process the map file correctly.

Let's now take a look at the map file:

default /dept/management.htm

rect (49,186) (178,196) /dept/export.htm

rect (250,195) (417,203) /dept/is_it.htm

rect (309,183) (415,194) /dept/marketing.htm

rect (181,190) (250,200) /dept/finance.htm

rect (48,197) (180,209) /dept/service.htm

rect (50,211) (177,222) /dept/purchasing.htm

rect (250,205) (416,217) /dept/sales.htm

The first line indicates what file should be accessed if you click on some location that is not defined. (In other words, the default document.) The other lines have the following format:

shape (upper_left) (lower_right) document

Here shape is the actual shape of the area that is being referenced. To define a rectangle, all we need is two corners. Here we give the upper left corner and lower right.

If we wanted, we could also define a polygon, which needs to be defined by all of its corners. In this way, we can define any common shape like a triangle or pentagon, and even irregular shapes.

Circles are also possible but are not as intuitively obvious as rectangles or polygons. Here you have two coordinates as well, but these are obviously not the corners. Instead, the first is the center of the circle and the second is a point on its edge. Both of these examples are written in the NCSA format, which is described below:

circle survey.htm 249,170 180,173

circle survey.htm 180,173 249,170

When you click on the image, the coordinates are passed by the client to the server, which then parses the map file to deliver you the appropriate document.
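The lookup the server performs can be sketched in a few lines of perl. This is only an illustration, not the server's actual code: it assumes Perl 5, handles just the default and rect lines of a CERN-style map, and the helper name lookup_click is made up for the example.

```perl
use strict;
use warnings;

# A few lines from the map file shown above, held in a string for the example.
my $map_text = <<'END_MAP';
default /dept/management.htm
rect (49,186) (178,196) /dept/export.htm
rect (181,190) (250,200) /dept/finance.htm
END_MAP

# lookup_click is a hypothetical helper: it returns the document for a click
# at ($x,$y), or the default document if no rectangle contains the point.
sub lookup_click {
    my ($map, $x, $y) = @_;
    my $default = "";
    foreach my $entry (split /\n/, $map) {
        if ($entry =~ /^default\s+(\S+)/) {
            $default = $1;
        }
        elsif ($entry =~ /^rect\s+\((\d+),(\d+)\)\s+\((\d+),(\d+)\)\s+(\S+)/) {
            my ($x1, $y1, $x2, $y2, $doc) = ($1, $2, $3, $4, $5);
            # The first rectangle that contains the point wins.
            return $doc if $x >= $x1 && $x <= $x2 && $y >= $y1 && $y <= $y2;
        }
    }
    return $default;
}

print lookup_click($map_text, 100, 190), "\n";   # inside the first rectangle
print lookup_click($map_text, 5, 5), "\n";       # outside every rectangle
```

The first call prints /dept/export.htm and the second falls through to the default, /dept/management.htm.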

There are actually two different map types: CERN and NCSA. The rectangle examples above are for a CERN map file because the document name comes at the end. In NCSA type maps, the document name comes between the shape and the coordinates. The same map in both formats would look like this:

NCSA:

circle survey.html 224,104 224,166

rect survey.html 406,67 495,168

poly survey.html 74,91 47,162 105,167 149,115 136,76 72,91

CERN:

circle (224,104) 62 survey.htm

rect (406,67) (495,168) survey.htm

poly (74,91) (47,162) (105,167) (149,115) (136,76) (72,91) survey.htm

Aside from the order of the information, another difference is the way circles are defined. An NCSA circle is defined with the center and a point on the edge of the circle. CERN circles are defined by the center and the radius. Also, CERN maps have parentheses around the coordinates while NCSA maps do not.
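To see how the two circle definitions relate, the CERN radius is simply the distance between the NCSA center and edge point. Here is a small sketch, assuming Perl 5; the helper name ncsa_to_cern_radius is made up for the example.

```perl
use strict;
use warnings;

# ncsa_to_cern_radius is a hypothetical helper: given the center and a point
# on the edge (the NCSA form), it returns the radius (the CERN form).
sub ncsa_to_cern_radius {
    my ($cx, $cy, $ex, $ey) = @_;
    return sqrt(($ex - $cx)**2 + ($ey - $cy)**2);
}

# The NCSA circle from the example above: center 224,104 and edge 224,166.
my $radius = int(ncsa_to_cern_radius(224, 104, 224, 166) + 0.5);
printf "circle (224,104) %d survey.htm\n", $radius;
```

This prints the CERN line from the example, with a radius of 62. The result is rounded to a whole pixel, since the edge point will not always sit exactly above or beside the center.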

These image maps can be used in a lot of different ways. Some sites have fancy graphics that, when clicked, bring you to different pages. These behave like a menu. Maybe you have technical information on your product, such as a compressor. By clicking on a part of the image of the compressor, you get details of that component.

One thing I have to mention is that you should make it clear which images are maps and which are just images. I have been to some sites where you click on something that looks like a map and nothing happens. However, clicking on the images that look like decorations loads new documents.

Images

Images require two things in order to be displayed. The first is the image tag itself (<IMG>) and the second is the source, or path, of the image, given with SRC. For example:

<IMG SRC="/images/pictures.gif">

Since these always go together, I usually think of this as a single tag.

There are several different things that you can do with images. Some are newer and may not work with specific browsers. The possible values for the ALIGN attribute are:

texttop - Aligns the image with the top of the "tallest" text on the line.

absmiddle - Aligns the middle of the current line with the middle of the image.

baseline - Aligns the bottom of the image with the baseline of the current line.

bottom - Same as baseline.

left - Aligns the image with the left margin. Text will wrap to the right.

right - Aligns the image with the right margin. Text will wrap to the left.

You can also change the appearance of the image with a couple of other attributes. For example, the HEIGHT and WIDTH attributes are used to change the physical size of the image. Normally, the browser will display the image on the screen based on the resolution and the image itself. With these two attributes you define how large the image should be.

This is useful when you have a "hole" to fill in your page. If you use the default size, this hole might be the wrong size for the image and your text looks weird. When you specify the width and height, you can have the image fill the hole exactly.

You can also use the VSPACE and HSPACE attributes to leave some space around your images. Normally, the text is right up against the image. These attributes change the vertical and horizontal space, respectively.

The BORDER attribute is used to place a border around the image. Although this looks good in some cases, I would use this with care. Depending on what the browser does and how wide the border is, this might make it appear as if the image is a link. I can remember a couple of instances where I clicked on an image and nothing happened. I then looked at the source and discovered it was not a link.

One thing I need to point out with images is that they don't behave like other elements of your document.

For example, you might have:

Here is some text <BR>

<HR>

This would cause the horizontal line to be completely under the text. However, if we had an image:

<IMG SRC="/images/pictures.gif"><BR>

<HR>

The horizontal line might be sticking out from the side of the image if you are aligning to one side or the other. To avoid this, you can use the CLEAR attribute to the break tag (<BR>). This takes the values ALL, LEFT or RIGHT.

It's best to use the attribute that matches the alignment of your image. For example, if you aligned the image to the right (ALIGN=RIGHT) then also clear to the right (CLEAR=RIGHT).

Text Attributes

When you use the paragraph tag (<P>), you start a new paragraph. Normally, this means you start a new line as well. Although you could use the <BR> (break) tag to start a new paragraph, the convention is to use the paragraph tag, if you are really starting a new paragraph.

Each type of text uses a particular font size, such as heading and normal text. However, you can specify different fonts without making them a specific header type. In the table at the beginning of this section, I mentioned bold, italic, strong, and so on. These can be used to change the characteristics of specific words or even individual letters. However, you can also change the size of each letter using the <FONT> tag.

The <FONT> tag has an attribute which is the "size" of the font. For example, to specify a font of size 6, the line might look like this:

<FONT SIZE=6>Here is some text in font size 6.

There is also a default font size that you can make the size relative to. This is called the base size for the fonts. By default, the base size is 3. If you wanted a font size that was one larger than this, you could specify a font size of 4, or do it like this:

<FONT SIZE=+1>Here is some text in font size 4.

You could also specify a font that was one smaller than the base like this:

<FONT SIZE=-1>Here is some text in font size 2.

The basefont is something else that you can define. This is done with the <BASEFONT> tag. To set the base font size to 6, the line would look like this:

<BASEFONT SIZE=6>Here is some text in font size 6.

<FONT SIZE=+1>Here is some text in font size 7.

In the line following the change, we set the font to one larger than the base. Now one larger is no longer 4, but 7.

Depending on your browser, you can also change the color of the text. For example, if I wanted red text, the line might look like this:

<FONT SIZE=6 COLOR="FF0000">Here is some text that would be in red.

The colors are specified in the same manner as the color for the background. In other words: RRGGBB.

Frames

Frames are a very useful tool in making your site both interesting and useful. However, I have encountered a lot of sites where it seems that they have used frames just to be able to say that they used frames. As with any other aspect of your web pages, you need to use them only where they are appropriate.

In case you haven’t seen any Web pages with frames or are not sure what I am talking about, a frame breaks up the Web page into multiple sections. Each is completely independent of the other so that you can scroll them separately or even load the contents of a page into a specific frame without changing the content of the other(s). The content of the pages can even come from different sites, as each is treated as its own page.

Although you could create a page with just a single frame, that would not be the point. Instead, we’ll make the assumption that each page consists of two frames. With that in mind, we have to say that each page actually consists of at least three documents. There is one main page in which the frames are built and one document for each of the frames.

In the main page, frames are set up using the FRAMESET tag, which replaces the BODY tag. After the FRAMESET tag, we must define whether the frames are to split the page horizontally or vertically. That is, whether we want columns or rows. In addition, we need to specify how much of the page we want each frame to take up. This can be specified either in pixels or in terms of a percentage of the window. Let’s assume we wanted two frame rows, each taking up exactly half of the screen. The line would look like this:

<FRAMESET ROWS="50%,50%">

To split the window in half, but with two frames side-by-side, the same definition would look like this:

<FRAMESET COLS="50%,50%">

If we wanted to split the screen so that the first (top) row takes up just 25% of the screen, we could specify the line like this:

<FRAMESET ROWS="25%,75%">

Although it would be simple in this case for us to do the math, we can let the browser do it for us. This is done by using an asterisk in place of the value that should be calculated. So, the previous example might look like this:

<FRAMESET ROWS="25%,*">

With percentages, this seems like wasted effort. However, remember that you can specify the size in pixels as well. This is useful if one frame has a particular size that you want to keep, but it doesn’t matter how big the other is. An example might look like this:

<FRAMESET ROWS="100,*">

This would create two rows, where the first is 100 pixels high and the second takes up the rest. That means if the content area is 500 pixels high, the lower frame uses 400 pixels. However, if the content area is only 101 pixels high, the lower frame only gets one. If the content area is less than 100 pixels the entire area is taken up by the first frame.

Each frame is referred to with its own tag along with the name of the document, like this:

<FRAME SRC="frame_top.html">

<FRAME SRC="frame_btm.html">

In this case, the browser will write the first page into the top frame and the second page into the bottom frame. If there are links in either of these pages, they are simply written to the same frame as the original page. For example, if frame_top.html were to contain a link, clicking on it would write the page into the top frame.

There are several options that you can add to the FRAME tag. The first is SCROLLING. This can be set to yes, no, or auto. If set to “yes”, scroll bars will appear on the right and at the bottom, no matter how big the window is. If set to “no”, scroll bars will never appear. If set to “auto”, the browser will determine if scroll bars are needed based on the size of the frame, content area and window.

The MARGINWIDTH and MARGINHEIGHT options define how many pixels the gap is between the contents and the edge of the frame. MARGINWIDTH determines how wide the left and right margins are. MARGINHEIGHT determines how high the top and bottom margins are.

If NORESIZE is set, the user cannot change the size of the frame. This does not mean scrolling, since when you scroll the size of the frame is not changing, just what is being displayed in it. If NORESIZE is not set, then you can grab the border between frames and actually change the size of the frame. This would mean that even though you split the window in half, you could drag the border to make one frame twice as large as the other, for example.

Using the NAME option, you begin to see one of the really nice things about frames. Once a name has been given to a frame, you can direct subsequent pages into that frame, even if the link is in another frame. For example, let’s assume our two frames are defined like this:

<FRAMESET COLS="10%,*">

<FRAME SRC="frame_left.html" NAME="menu">

<FRAME SRC="frame_right.html" NAME="body">

</FRAMESET>

We end up with two frames, side-by-side, with the left frame taking up 10% of the window and the right frame taking up the rest. As you may have guessed, the left window contains a menu. Our intention is to click on an item in the menu and have it display in the other window. The link would look something like this:

<A HREF="menu_item1.html" TARGET="body" >

When we click on the link (either text or an image), the document is written to the right-hand frame. This is because we specified the target as being the frame “body”. Note that the TARGET attribute is not used with the FRAME tag, but rather with the anchor tag (<A>), where we specify the link.

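There are several pre-defined target names that you can use. These are:

_blank - loads the page into a new, unnamed window

_self - loads the page into the same frame as the link

_parent - loads the page into the parent frameset of the current frame

_top - loads the page into the full browser window, replacing all frames

Another useful tag is BASE TARGET (that is, the BASE tag with a TARGET attribute). This will assign a specific frame as being the target for all links without a target specified. You could use this in our menu example above. If the BASE TARGET was set to the “body” frame, then anything clicked on in the “menu” would automatically go to the right place.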

When creating documentation for our users, we took advantage of the fact that each window is treated individually. That is, within each frame you can set background colors or images, load pages, change fonts, and so on. Since it is a separate page, what stops you from creating a new frame set? Nothing. In fact, that’s exactly what we did.

Using the menu example, the page that was loaded into the “body” frame also contained a frameset. This frameset broke the frame into two horizontal frames like this:

<FRAMESET ROWS="*,100">

This created two frame rows, with the lower one taking up 100 pixels. In this frame we wrote the administrative information about the page, such as the date last updated, who was responsible for the content and who wrote it. Using a mailto: link, we could click on the person’s name and send them a mail message if there was something that needed to be addressed.

Each time we created a new page, there were essentially three pages to create. One was the contents of the “body” page and then the two frames it contained. We found that although there was some more work initially.

One last thing about frames: You need to be careful. Not every browser can handle them. The version of Netscape that you can download from the SCO Web page can. However, not everyone has it. Internally, this is less of a problem, since you know which browser each user has.

Connecting Your Pages

You can have links that point to things other than just HTML pages or images. For example, you could have compressed files or tar archives. Based on how your browser is configured, these could then be copied to your local machine and even uncompressed.

You will often see a link pointing to an email address. When you click on this link, it starts the mail program that is connected to your browser.

A very common thing to have links point to is forms. These are HTML documents that have input fields. The input is then passed to a program which parses the information.

At this point, I am going to assume that you are familiar with the basic constructs of HTML and perl. Hopefully, you have already created a few pages and some perl scripts. Now is the time to combine these two to create some really interactive Web pages.

Information is passed from a Web page to the Web server through the Common Gateway Interface (CGI). The "Common" means that the information is passed in the same way no matter what is on the receiving end. If you want, you can write your CGI programs in a compiled language such as C, or even as a bash shell script. However, perl is quicker to use than a compiled language and is much more powerful than shell scripts. Since you have a perl interpreter on your system and you are already a perl expert, there should be no problems.

In order to get the Web server to pass the information, we have to tell it that it's a good thing to do. In other words, we have to create an environment in which the server knows that it should pass this information. We do this by creating a form. A form is simply another section of a document. It is possible that the form takes up the whole document, but it doesn't have to. As with other sections, forms are started with <FORM> and ended with </FORM>.

Passing information to the CGI program is done from an HTML form using a process called a method, which can either be the POST or the GET method. Each has its own way of handing the data off to the CGI program. Knowing which way the data is coming in is important to be able to process it. This is specified when you define the form. Here, too, you specify what action is to be taken. That is, what script or program will be started.

The GET method passes the data via the environment variable QUERY_STRING. This string is then parsed to be able to access the individual variables. The POST method passes the data as an input stream to the program, which comes in via STDIN. In both cases the format is the same, so once you have the string, parsing it is the same no matter what method was used.

In many books, magazines and other sources this string is referred to as a query string. This comes from the fact that in many (most) cases this string is passed to the CGI program and is used to query a database. However, this does not need to be the case as the information can simply be passed along without any kind of queries.

One nice thing (OK, it was designed that way) is that the information comes across in a known form. Looking at the query string, you (as a human) can easily see that there are patterns to the way data is grouped in the string. The variable name and its value are separated by an equal sign. This is equivalent to saying, "This variable equals this value." The variable-value pairs are separated with an ampersand. This says, "I have this pair AND this pair AND this pair," and so forth. Should one of the values have a literal space in it, that space is replaced by a plus sign (+). When the variable-value pair is read, this plus sign needs to be considered.
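As a rough sketch of how such a string can be taken apart, the following splits on the ampersand and the equal sign, turns each plus sign back into a space and decodes any %XX hexadecimal escapes. It assumes Perl 5, and the query string and field names are made up for illustration.

```perl
use strict;
use warnings;

# A made-up query string for illustration only.
my $query_string = "firstname=Mary+Ann&lastname=O%27Brien&size=small";

my %form;    # field name => decoded value
foreach my $pair (split /&/, $query_string) {
    # Split on the first equal sign only, in case the value contains one.
    my ($name, $value) = split /=/, $pair, 2;
    $value = "" unless defined $value;
    foreach ($name, $value) {
        tr/+/ /;                                # a plus sign is a space
        s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %27 becomes an apostrophe
    }
    $form{$name} = $value;
}

foreach my $name (sort keys %form) {
    print "$name = $form{$name}\n";
}
```

The plus signs are converted before the hexadecimal escapes are decoded, so a real plus sign (sent as %2B) is not mistaken for a space.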

The four primary types of form elements are:

text - the input is any text you want

checkbox - can select any number of these options

radio button - can select one of these

multiple - can select one or more of these (list of items)

Each form element can be used as many times as you would like. That is, you can have several different text variables, checkboxes, or whatever. Although you can have multiple instances of the same elements, you should have each with a different variable name. (How are you going to keep track of which is which?) Therefore, you could have a dozen text elements as long as they have different names.

One gotcha (at least, it got me) is that every checkbox that is checked will be passed to the CGI program. Suppose we have 4 checkboxes, all with the name USAGE. Obviously, this is one case where you can use the same name over and over again. You could label each of the checkboxes with a different name (usage1, usage2, usage3, and so on). However, you don't need to. While parsing the query string, you can read each one in turn and (for example) set flags based on its value.

The gotcha is that if you decide to use the same name for each of the checkboxes, then each one that is checked will be passed along as a separate variable, even though they have the same name. It is up to you to parse them. If you only check for the existence of that variable (i.e., if $variable_name eq "usage") and don't look for individual values, you'll lose all but the first (or last) value.
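One way to keep every value is to collect them in a list for each name. This is only a sketch, assuming Perl 5 (it relies on references) and a made-up query string in which two boxes named "usage" were checked.

```perl
use strict;
use warnings;

# Made-up input: two of the "usage" boxes were checked.
my $query_string = "usage=fun&usage=work&lastname=Smith";

my %values;    # field name => reference to a list of every value seen
foreach my $pair (split /&/, $query_string) {
    my ($name, $value) = split /=/, $pair, 2;
    push @{ $values{$name} }, $value;
}

# Set one flag per checked box, as described above.
my %used_for;
foreach my $use (@{ $values{"usage"} || [] }) {
    $used_for{$use} = 1;
}

print "usage: ", join(", ", @{ $values{"usage"} }), "\n";
```

Here both "fun" and "work" survive, where a plain assignment to a single variable would have kept only the last one.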

One thing that I like to do with radio buttons is to pre-set one. This is done with the CHECKED attribute. Do this when one of the values should always be selected.

Keep in mind that unlike text, checkboxes and radio buttons can only take on discrete values. That is, there is only a limited number of values they can have. In fact, they are either set or they are not set. In contrast, text variables can be anything you want. You can then check for each of the possible values. For example:

if ( $value eq "fun" )

{

$for_fun = 1;

}

if ( $value eq "work" )

{

$for_work = 1;

}

*

*

*

It makes sense to assign radio buttons and checkboxes to a string of the same name. That way it is easier to keep track of what variable in the CGI program is associated with what field in the form. You can easily assign the value to the appropriate variable:

@data = split(/&/,$query_string);

foreach $line ( @data) {

($field_name,$value) = split (/=/, $line);

$$field_name = $value;

}

The line $$field_name = $value says to expand the variable $field_name; with the other $ in front of it, the result becomes a new variable whose name is whatever $field_name expanded to. For example, if $field_name = "lastname", then this line is equivalent to saying:

$lastname = $value;

We could do this to ensure the readability of the script and to ensure that the variable names are what the script expects. However, IMHO forcing the variable names in the CGI program to be the same as from the form ensures that there is no confusion.

Note that this assumes that you do not have any multiple variables with the same name such as with checkboxes. The solution, in this case, would be to either check specifically for these variables, or give each box a unique name.

For names, keep them as two or three separate elements. Although you could parse the string, it's a lot more efficient to have them as separate fields to begin with. By this I mean that you shouldn't have a single field "name", but rather one field "last_name" and one field "first_name". This is a lot easier to parse than the single variable "name".

Also, someone might add an extra space to the end of a name; unfortunately, this appears automatically as a plus sign (+). Therefore, it might be a good idea to strip out all the plus signs from people's names. (I have yet to see someone with a plus sign in their name. Besides, a real + gets changed to %2B anyway, 2B being the hexadecimal value of '+'.)

Another good idea would be to set all variables you use to a default value within the CGI script, before they are first referenced. That way you do not have to worry about any unexpected values if the form does not pass that variable. By setting it to NULL ($variable = ""), you can check to make sure all "required" fields have been filled out. That is, they are not NULL. If you don't set them to anything and the user doesn't fill in the field, the variable may never reach the CGI script. When you access that variable, you end up with an undefined variable.
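Setting defaults and checking the required fields might be sketched like this. It assumes Perl 5; the field names are hypothetical and the hash %form stands in for whatever the script parsed out of the query string.

```perl
use strict;
use warnings;

# Hypothetical field names; %form stands in for the parsed form data.
my @required = ("firstname", "lastname");
my %form = (firstname => "Jim");     # "lastname" was never sent

my @missing;
foreach my $field (@required) {
    # Default every expected field to the empty string first ...
    $form{$field} = "" unless defined $form{$field};
    # ... then note the ones that are still empty.
    push @missing, $field if $form{$field} eq "";
}

if (@missing) {
    print "Please fill in: ", join(", ", @missing), "\n";
}
```

Because every expected field gets a value before it is tested, the comparison never touches an undefined variable.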

If the information that a user has input is supposed to be saved (i.e., in a customer database), it is recommended that you first check this information with the user to verify it. (For example, create a new page with this information and display it to the user.) You can also check the validity of the values, such as names with numbers in them. (NOTE: I first thought about limiting zip-codes to only numbers. However, countries like Canada and England do have letters in their "postal codes.")

So, let's look at our first form. This is a very simple form that searches a text file for an input value. The source for the form looks like this:

<HTML>

<HEAD>

<TITLE>FORM TEST</TITLE>

</HEAD>


<BODY>

<FORM METHOD=POST action="http://www/cgi-bin/phone.pl">



<B>Name: </B>

<BR>


<INPUT NAME="name" type=text maxlength=50 size=50 >

<BR><BR>

<input TYPE=submit>

<input TYPE=reset>

</FORM>

</BODY>

</HTML>

When we load the page in a browser, the page looks like figure XYZ. Here we are prompted to input a single value: NAME. This is accomplished with the line:

<INPUT NAME="name" type=text maxlength=50 size=50 >

This says that the variable's name is "name," it is of type "text" (as compared to radio button or checkbox), has a maximum length of 50 characters and the input field displayed on the form has a size of 50 characters. We show two buttons with the lines:

<input TYPE=submit>

<input TYPE=reset>

When we press the "submit" button, the information that we input into the form is passed to the script specified by the form action line. In other words, what is the action to be taken when the button is pressed? In our form above, it looks like this:

<FORM METHOD=POST action="http://www/cgi-bin/phone.pl">

This is passed to the script phone.pl using the POST method. The POST method passes the string to the script via STDIN and not through the environment variable QUERY_STRING as the GET method would. The script then parses what it reads from STDIN. In our example script, which we'll get to shortly, input using either method is possible; we check which method is used and act accordingly.

The other button is a reset button. Rather than sending information to our script, the reset button tells the system to present the form in its default state. This normally means that all the text fields are blank.

The following script searches through a data file looking for the values we put into the form. In this example, the data file is a company phone book. The perl script looks like this:

# Here we set things up

srand(time|$$);

$random=int(rand(1000))+1;

$last_name = "";

$title = "";

$dept = "";


$DocumentRoot = "/var/scohttpd/html/";


# here we get the information passed through the CGI interface and

# Load it into the variable form_info

$request_method = $ENV{'REQUEST_METHOD'};


if ( $request_method eq "GET" ) {

$form_info = $ENV{'QUERY_STRING'};

} else {

$size_of_info = $ENV{'CONTENT_LENGTH'};

read (STDIN, $form_info, $size_of_info);

}


# Here we define the output files. Note that the directories must already

# exist

$TEMPFILE="/temp/".$random.".htm";

$LOCATION="Location: http://www".$TEMPFILE."\n\n";

$OUT="> ".$DocumentRoot.$TEMPFILE;


# We open our input file 'telbook.txt' and die if we can't.

# We also open our output file.

open (INPUT, $DocumentRoot."telbook.txt") || die "ooops: $! \n";

open (PAGE, $OUT);


# This creates standard header information for the HTML page that is created.

print PAGE "<HTML>\n";

print PAGE "<HEAD>\n";

print PAGE "<TITLE>Search results</TITLE>\n";

print PAGE "</HEAD>\n";


# Here we are splitting the form_info variable (i.e. the QUERY_STRING) and loading

# it into the array 'data'.

@data = split (/&/, $form_info) ;


# The 'foreach' loop reads each entry in the array 'data' and then one

# at a time assigns the entry to the two variables field_name and value

foreach $line ( @data ){

($field_name, $value ) = split (/=/, $line) ;


# here we convert everything to lower case.

$value =~ tr/A-Z/a-z/ ;

}


# Here we read the input file a line at a time and after converting the line to all

# lowercase we determine if the string from the form is a sub-set of the input line.

while ( $line = <INPUT> ){

$line =~ s/;/ /g;

$line =~ tr/A-Z/a-z/ ;

if ( index($line,$value) != -1)

{

print PAGE $line,"<BR>\n" ;

}

}


print PAGE "<BR>\n";


print PAGE "</HTML>\n";

close PAGE;

close INPUT;

print $LOCATION;

The first two lines of the script generate a random number where the seed for the random number is based on the time and the process ID ($$). This random number will then be used to name the file where the script outputs its information. This normally results in a unique random number and therefore a unique file name.

This section does the actual reading of the information passed from the form to the CGI script:

$request_method = $ENV{'REQUEST_METHOD'};


if ( $request_method eq "GET" ) {

$form_info = $ENV{'QUERY_STRING'};

} else {

$size_of_info = $ENV{'CONTENT_LENGTH'};

read (STDIN, $form_info, $size_of_info);

}

First, we check for the method that is used to pass the information ($request_method = $ENV{'REQUEST_METHOD'};). Based on the method, we read the information in a different way. If the information was passed using the GET method, then we need to parse the QUERY_STRING. If we used the POST method (handled by the else block), the input comes via STDIN. In both cases, we assign the information from the form to the variable $form_info, which we parse as we move along.

We next set up the input and output and write the header into the output file. We then split the input line into the elements of the array data, which are then parsed one at a time. We change each value to lower case to ensure that the input value and the data are both in the same case.

In the while loop, we read each line from the data file (telbook.txt) one at a time. Our data file consists of a simple ASCII file where each record is on a single line and the fields are separated by a semi-colon. We change the semi-colon to a space ($line =~ s/;/ /g;) and then convert it to all lowercase ($line =~ tr/A-Z/a-z/;). This is so that everything is all the same case.

If the value we received from the form is a sub-string of the line we read from the input file (if ( index($line,$value) != -1) ), we then write that line into the output file.

What the index function does is deliver us an offset into a string. In our example, index would deliver the starting position of the $value string within the $line string. If a value other than -1 is returned, then $value is a sub-string of $line. This allows us to input the department name or other piece of information (although the field name is "name"), since we can see if the telephone number, department, and so on are sub-strings of the line we read in.

If we find what we are looking for (i.e., $value is a sub-string of $line), the line we read in is written to our output file. Once we have completed the input file, we finish up the output and tell the browser the location of the output file (print $LOCATION;).

We could actually output everything to STDOUT rather than to a file. I like writing to a file better, since you have a kind of buffer between your script and the page. I can prepare everything before I send it to the user.

One expansion of this script is to search for individual fields. In its current state, we look for a string anywhere in the record. We could search in just specific fields. For example, you could search for departments. This could be done by having a radio button and then switching which field is searched based on what button is active.
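A sketch of such a field-specific search follows. It assumes Perl 5, and the record layout (last name, first name, department, phone) is an assumption, since the layout of telbook.txt is not shown; the sample records and the helper name search_field are made up as well.

```perl
use strict;
use warnings;

# Assumed record layout (not given in the text): last;first;department;phone
my %field_index = (lastname => 0, firstname => 1, dept => 2, phone => 3);

# Two made-up records in the semi-colon separated format.
my @records = (
    "Smith;John;Sales;1234",
    "Jones;Sally;Service;5678",
);

# search_field is a hypothetical helper: it returns the records in which
# $value is a sub-string of the one named field, ignoring case.
sub search_field {
    my ($field, $value, @lines) = @_;
    my $i = $field_index{$field};
    return () unless defined $i;    # unknown field name: nothing matches
    return grep {
        my @fields = split /;/, $_;
        defined $fields[$i] && index(lc $fields[$i], lc $value) != -1;
    } @lines;
}

print "$_\n" foreach search_field("dept", "sal", @records);
```

Searching the whole record for "sal" would match both lines (Sales and Sally). Restricting the search to the department field returns only the Smith record. The field name passed in could come straight from the value of the radio button.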

Another expansion would be checkboxes that determine what fields are displayed. You may only want to see just the name and phone number, for example.

Remember that you can execute system commands from within a perl script. If you wanted to, you could execute a program that reads information from a database. Provided the output of the database query was in a form that the CGI interface understood, you could use the query program directly.

My personal opinion is that you should create some kind of "wrapper" around the database query, such as a shell script. That way you can parse all input and make sure that the input is valid. This helps prevent instances where input is passed along to the database program, which ends up executing something unintentional.
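The validity check itself can be quite small. As a sketch, assuming Perl 5, the following accepts only a short list of harmless characters and rejects everything else. The helper name is_safe_value and the exact list of allowed characters are assumptions for the example, so adjust them to whatever your data really needs.

```perl
use strict;
use warnings;

# is_safe_value is a hypothetical helper. It accepts only letters, digits,
# spaces, dots, apostrophes and hyphens, and at most 50 characters (the
# maxlength used in the form above).
sub is_safe_value {
    my ($value) = @_;
    return $value =~ /^[A-Za-z0-9 .'-]{1,50}\z/ ? 1 : 0;
}

foreach my $value ("O'Brien", "smith; cat /etc/passwd") {
    printf "%-24s %s\n", $value, is_safe_value($value) ? "ok" : "rejected";
}
```

Listing what is allowed, rather than trying to list everything that is dangerous, means that characters such as the semi-colon or slash never reach the program being called.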

Our next example is a bit more complicated. The HTML for the form looks like this:

<HTML>
<HEAD>
<TITLE>Pick-a-Car</TITLE>
</HEAD>

<BODY>
<H1>Pick-a-Car</H1>

<FORM METHOD=POST action="http://www/cgi-bin/pick.pl">
<BR>
Where do you work?
<input NAME="work" VALUE="Admin" type=radio checked>Administration
<input NAME="work" VALUE="Personnel" type=radio>Personnel
<input NAME="work" VALUE="IS/IT" type=radio>IS/IT
<BR>
What is your form of address?
<input NAME="title" VALUE="Mr." type=radio checked>Mr.
<input NAME="title" VALUE="Mrs." type=radio>Mrs.
<input NAME="title" VALUE="Ms." type=radio>Ms.
<input NAME="title" VALUE="Miss" type=radio>Miss
<BR>
<BR>
<SELECT multiple name="occupation">
<OPTION>Clerical
<OPTION>Consultant
<OPTION>Corporate Executive
<OPTION>Educator
<OPTION>Lawyer
<OPTION>Manager
<OPTION>Physician
<OPTION>Student
<OPTION>Technical Specialist
<OPTION>Unemployed
<OPTION SELECTED>Other
</SELECT><P>
<BR>
I use my car for
<input NAME="use" VALUE="fun" type=checkbox>fun
<input NAME="use" VALUE="work" type=checkbox>work
<input NAME="use" VALUE="education" type=checkbox>education
<BR>

<B>
First Name:
<INPUT NAME="firstname" type=text maxlength=50 size=50>
<BR>
Last Name:
<INPUT NAME="lastname" type=text maxlength=50 size=50>
<BR>
I would like a car that seats:
<BR>
<input NAME="size" VALUE="tiny" type=radio checked>2<BR>
<input NAME="size" VALUE="small" type=radio>4<BR>
<input NAME="size" VALUE="medium" type=radio>6<BR>
<input NAME="size" VALUE="large" type=radio>8 or more<BR>
</B>
<HR>
<input TYPE=submit>
</FORM>
</BODY>
</HTML>


The form we show in Figure 0-6 might be used to get information about a car. You input the necessary criteria and the form finds you the right car. (Well, sort of.) Here we introduce the input types radio and checkbox. A radio button is so named because it behaves like the buttons on your car radio. You can only have one pressed at a time. If you press one, the one that was pressed before pops out.

Figure 0-6 Example Form

Here, when you select one, the other is de-selected. In our example, there are several variables that are defined via radio buttons, such as the form of address and the department. Normally, you are called by a single form of address and belong to a single department. Additionally, when looking for a car, you have an expectation of how many people it can seat. This too is a radio button.

On the other hand, what you use the car for could be several different things. For example, you might use it for both fun and work. That is why it is a checkbox instead of a radio button. In this example, we see again that each field is split out of the QUERY_STRING by the line:

@data = split (/&/, $form_info) ;

This is then further split into the field name and value in the loop:

foreach $line ( @data ){
    ($field_name, $value ) = split (/=/, $line) ;

Each time through the loop we check for various field names, such as firstname, lastname and so on. The split line separates each entry in the array data one at a time into the two variables: $field_name and $value. We then check each field_name to see if it is one of the fields we are looking for. Here I used an interesting trick that perl allows us to use. For example, the block that I used to get the first name looks like this:

if ( $field_name eq "firstname" )
{
    $$field_name = $value;
}

In this example, if the $field_name variable has the value "firstname", we enter that block. We then assign $value to a new variable, $$field_name. There are two dollar signs ($$) here on purpose. This is something we talked about earlier. The variable $field_name is expanded to the name of the field that is passed through the QUERY_STRING, in this case "firstname". Because there is a second dollar sign, we now have the variable $firstname, to which we assign the appropriate value. In this way we have a variable that has the same name as the field that was passed from the form.
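The $$field_name trick is what perl calls a symbolic reference, and it only works when strict references are not enabled. A common alternative is to collect the fields in a hash keyed by field name. The sketch below shows that approach; parse_form is a hypothetical helper name, and the general decoding of + and %XX goes beyond what the script in this chapter does (it only translates %2F).

```perl
use strict;
use warnings;

# Split a form-encoded string into a hash of field name => value.
sub parse_form {
    my ($form_info) = @_;
    my %form;
    foreach my $pair (split(/&/, $form_info)) {
        next if $pair eq "";                        # skip empty pairs such as "a=1&&b=2"
        my ($field_name, $value) = split(/=/, $pair, 2);
        $value = "" unless defined $value;
        for ($field_name, $value) {
            tr/+/ /;                                # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX encodes a single byte
        }
        $form{$field_name} = $value;                # a repeated name keeps the last value
    }
    return %form;
}

my %form = parse_form("firstname=John&lastname=Smith&work=IS%2FIT");
print "Hello $form{firstname} $form{lastname} of $form{work}\n";
```

Because a repeated name overwrites the earlier value here, fields that can legitimately occur more than once, such as the "use" checkboxes, would need a list per field instead of a single value.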

At the end of the script, we write a few lines to the output file that create a link to the appropriate HTML page. First we write a simple greeting, using some of the values we got from the form. These lines look like this:

print PAGE "<B>Hello ".$title." ".$firstname." ".$lastname." of the ".$dept." Department</B><BR>\n";
print PAGE "<BR>\n";
print PAGE "You requested information on a car that seats ";

We then output the URL to the file containing the information for the car that we selected. For example, if we had chosen a car that seats 2, we would want the page tiny.htm. The block would look like this:

if ( $size eq "tiny" )
{
    print PAGE "2<BR>\n";
    print PAGE "That would be the <A HREF=\"http://www/Products/tiny.htm\">Tiny Series</A><BR>\n";
}

Note that we need to escape the double-quotes (using \") to keep them from being interpreted by perl. When the page is displayed, we can click on the words "Tiny Series" and are brought to the right page. Finally, at the end, we tell the browser the location of the generated page with:

print $LOCATION;
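As an aside on the escaped double-quotes mentioned above: perl also offers the qq operator, which behaves like a double-quoted string but lets you pick the delimiters, so the quotes inside the HTML need no backslashes. A small sketch, printing to STDOUT instead of PAGE to keep it self-contained:

```perl
use strict;
use warnings;

my $url = "http://www/Products/tiny.htm";

# qq{...} interpolates variables and escapes such as \n just like "...",
# but double-quotes inside it can be written as they are.
my $link = qq{That would be the <A HREF="$url">Tiny Series</A><BR>\n};
print $link;
```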

Adding all the components together, we get a perl script that looks like this:

# Seed the random number generator and pick a number for the temporary page.
srand(time|$$);
$random=int(rand(1000))+1;

# Defaults, in case a field is missing from the form data.
$firstname = "";
$lastname = "";
$title = "";
$dept = "";
$size = "";

$request_method = $ENV{'REQUEST_METHOD'};

# GET passes the form data in QUERY_STRING; POST passes it on standard input.
if ( $request_method eq "GET" ) {
    $form_info = $ENV{'QUERY_STRING'};
} else {
    $size_of_info = $ENV{'CONTENT_LENGTH'};
    read (STDIN, $form_info, $size_of_info);
}

# $LOCATION is the URL of the generated page; $OUT is the same file as
# seen from the file system (adjust the path to your document root).
$TEMPFILE="/temp/".$random.".htm";
$LOCATION="Location: http://www.our.domain".$TEMPFILE."\n\n";
$OUT="> "."//www.our.domain/".$TEMPFILE;

open (PAGE, $OUT) || die "Cannot open ".$OUT.": ".$!."\n";
print PAGE "<HTML>\n";
print PAGE "<HEAD>\n";
print PAGE "<TITLE>Just a third Test</TITLE>\n";
print PAGE "</HEAD>\n";
print PAGE "<BODY>\n";

print PAGE "Your Query String is: ".$form_info."<BR>\n";

# Split the form data into name=value pairs, then each pair into its parts.
@data = split (/&/, $form_info) ;

foreach $line ( @data ){
    ($field_name, $value ) = split (/=/, $line) ;

    # $$field_name sets the variable named by the field: $firstname, $lastname or $title.
    if ( $field_name eq "firstname" )
    {
        $$field_name = $value;
    }
    if ( $field_name eq "lastname" )
    {
        $$field_name = $value;
    }
    if ( $field_name eq "title" )
    {
        $$field_name = $value;
    }
    if ( $field_name eq "work" )
    {
        # The browser sends the slash in "IS/IT" as %2F.
        $value =~ s/\%2F/\//;
        if ( $value eq "" ) {
            $dept = "Unknown";
        } else {
            $dept = $value;
        }
    }
    if ( $field_name eq "size" )
    {
        $size = $value;
    }
}

# Fill in anything the visitor left empty.
if ( $lastname eq "" ){
    $lastname = "Man without a name";
}
if ( $title eq "" ) {
    $title = "Mr.";
}
if ( $dept eq "" ) {
    $dept = "Kitchen";
}

print PAGE "<B>Hello ".$title." ".$firstname." ".$lastname." of the ".$dept." Department</B><BR>\n";
print PAGE "<BR>\n";
print PAGE "You requested information on a car that seats ";

# Link to the product page that matches the requested size.
if ( $size eq "tiny" )
{
    print PAGE "2<BR>\n";
    print PAGE "That would be the <A HREF=\"http://www/Products/tiny.htm\">Tiny Series</A><BR>\n";
}
elsif ( $size eq "small" )
{
    print PAGE "4<BR>\n";
    print PAGE "That would be the <A HREF=\"http://www/Products/compact.htm\">Compact Series</A><BR>\n";
}
elsif ( $size eq "medium" )
{
    print PAGE "6<BR>\n";
    print PAGE "That would be the <A HREF=\"http://www/Products/medium.htm\">Medium Series</A><BR>\n";
}
elsif ( $size eq "large" )
{
    print PAGE "8 or more<BR>\n";
    print PAGE "That would be the <A HREF=\"http://www/Products/large.htm\">Large Series</A><BR>\n";
}
else
{
    print PAGE "Sorry no car like that.<BR>\n";
}

print PAGE "</BODY>\n";
print PAGE "</HTML>\n";
close PAGE;

# Redirect the browser to the page we just wrote.
print $LOCATION;


The result returned to you from this form can be seen in Figure 0-7.

Figure 0-7 Results of CGI Script

Think back to the section on writing perl scripts. Remember, there was a script that would search through a list of books looking for specific information. You could even tell the script what field it should look in for the information you were searching for.

Well, imagine that you are a bookstore, selling many (if not all) of your books through mail order. You decide to start selling books on the Internet as well. Since you are an exclusive book store and only sell a small list of selected titles, the bookdata.txt file that we used earlier by coincidence happens to be your complete list of titles.

Wouldn’t it be nice if you could let your users search through your database for books? Once they found the books they were looking for, they could click on a link and order that book. Fortunately, it’s not that hard. Many businesses are doing it, not just bookstores.

We can create a very simple form and then modify our perl script to access the input from the form.

Tips and Techniques

As we are creating our Web site, we need to keep in mind how different Web browsers are going to present our pages. Although there is a standard (HTML) that defines specific aspects of the formatting, how it really looks on the screen will differ from browser to browser. Be careful with formatting that cannot be used on all browsers.

You need to test all your pages with more than one browser and from more than one system. Many times I have visited a Web site whose developer assumed that I had the latest version of Netscape or Microsoft Internet Explorer, and the page looked terrible without it. One common problem is the use of frames. Newer versions of Netscape and Internet Explorer can handle them, but older browsers may not.

Another problem is the resolution of the video card and monitor. I have written pages myself and have come up with a great looking graphic only to find it looks terrible on another system with a lower resolution. Here again, it is an issue of testing your Web pages.

Every browser can display any ASCII text file. However, the only formatting ASCII knows is spaces, carriage returns, and so forth. It is the browser that does a word wrap if the line gets too long.

Always include the date of the latest version of the site. It may be sufficient to have a date on the home page to indicate when the site was last updated. It does not take much work to do this for every page, but whether it is appropriate depends on the layout of each page and how the date looks there.

Remember that our page may be viewed by an international audience. Dates are not written the same way everywhere. What does the date 01/09/96 mean to you? Is this the first of September or the ninth of January? In the US it means January 9th, 1996. However, in Germany it means September 1st. It would be clear if we said Jan. 9, 1996, 9 Jan. 1996, or something else that names the month. No matter what order it is in, it will be clear.
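If the pages are generated by a script, the date can be produced in such an unambiguous form automatically. A minimal sketch, assuming English month abbreviations and Greenwich time are acceptable; page_date is a hypothetical helper name:

```perl
use strict;
use warnings;

# Format an epoch time as day, abbreviated month name and four-digit year,
# for example "9 Jan 1996", so the month can never be mistaken for the day.
sub page_date {
    my ($epoch) = @_;
    my @t = gmtime($epoch);    # (sec, min, hour, mday, mon, year, ...)
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    return sprintf("%d %s %d", $t[3], $months[$t[4]], $t[5] + 1900);
}

print "Last updated: ", page_date(time), "\n";
```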

Always include some way to send comments. This could be a form, or simply a link to an email address. In most cases, the person who receives the comments is called the Webmaster, and the link points to the Webmaster's email address. If you have contracted someone to create your web site, this email address normally belongs to someone at the developer's site who can make the changes.

Consider an interactive form. Have a spot for the person's email address, but don't make it required. Putting an email address on the page is the obvious way to collect comments, but you may get more responses from a form with checkboxes and fill-in boxes.

An icon is a small picture or graphic that is used to represent something. Icons could be buttons on a toolbar or bullets in a list. A set of icons with a similar appearance gives our pages a consistent look and feel. Icons should be small (less than 1 KB), so that they load quickly. Use the same icon repeatedly, so that each image needs to be loaded only once. Most browsers will store the image locally and re-use it each time a page calls for it, rather than loading it from the server each time.

Use the <TITLE> tag so that the title of the page shows at the top of the browser. This can help to remind users where they are.

Be careful about the <PRE> pre-formatting tag. It may look fine on our machine with a specific resolution, but it may come out weird on another machine. Test it with several resolutions and several browsers.

Also, make it obvious where to click next. I have been to some pages where there is no logic as to where to click. Things look clickable but aren’t, and things that don’t look clickable are.

You might also want to consider having text-only pages for people with a slow line or a text browser, such as lynx. Also, if you have frames on your page, you might consider a non-frames page as well. When I created my first Web page with frames, I wrote something improperly. Some browsers ignored it, while others did not show the frames properly. This taught me to have a non-frames version as well, and to test my site with a couple of different browsers.

Writing styles

Writing for web pages is not the same as writing brochures or data sheets. The Web is meant to be accessed in a non-linear fashion, that is, bouncing around. That's part of why it's called a Web. Like a spider’s web, there is no top or start to the web. The flow of access is in the hands of the reader and not the writer. We must write with this in mind.

Start with summaries, overviews or the "big picture" and then go into details later, using links or images. Let the reader decide how much to read and when. The information that we present to a visitor needs to be downloaded fairly quickly. If there is something on our site that is exceptionally large, we might consider a summary of the document and then a link to download the entire document. We should think about compressing the document using PKZIP.EXE, if we are a DOS head, or gzip if our site will be accessed by UNIX people. Even if our site is running on UNIX, we need to consider who our audience is.

By having summaries rather than full text, we can get more information into the hands of our visitor more quickly. Alternatively, we can have a summary at the top of the document followed by the full text. This gives the reader the chance to see if this is the right document for them.

Document size considerations

There is no rule as to how big a document should be. However, we need to consider that larger documents take longer to load. If a visitor has a relatively slow dial-up connection (14.4 kbps or less), it can be irritating if a home page is too large. Visitors might get upset and stop downloading the page even before it's finished. I know I have. If the page is too small, there might not be enough to tempt the visitor to look further.

I suggest that "main" pages are one or two screens. This gives a quick introduction to what the rest of the site offers. Later, when we have captured the visitor’s interest and get into more detailed information, our pages can be longer.

If the document is large, consider bookmarks at top and bottom to different sections within the page. Perhaps include links at regular intervals throughout the document that jump up to the top of the document. We could also have links at top and bottom, such as "Up to {section_name}" or "Down to.."

I have seen many pages that have links to within the same page. There might be technical terms that are defined elsewhere on the page, or other information related to the current document. This is very effective if done properly.

If a browser cannot show an image, it will normally replace the image with some symbol. Although this tells us that an image could not be loaded, it doesn’t say which one. We can tell the browser to display specific text if the image cannot be loaded. This is done with the ALT attribute of the <IMG> tag (the file name here is only a placeholder):

<IMG SRC="picture.gif" ALT="image description">

Be careful with the sizes of images. They may look good and we may think that we need that particular image. However, the size might be overwhelming. For example, when preparing a web site for one company, I included an aerial view of the factory. Although this was a very impressive shot, it was over 300 Kilobytes. On my machine it was no problem since I was connected via ethernet to the Web server in another room. However, if a customer were to try to view that page across even a 14.4 modem, they would immediately lose interest.
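To put a number on that, the transfer time can be estimated with simple arithmetic. The sketch below ignores protocol overhead and modem compression, so it is only a rough estimate; download_seconds is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rough transfer time in seconds for a file of $kilobytes over a line
# running at $bits_per_second (1 kilobyte = 1024 bytes, 1 byte = 8 bits).
sub download_seconds {
    my ($kilobytes, $bits_per_second) = @_;
    return ($kilobytes * 1024 * 8) / $bits_per_second;
}

printf "300 KB at 14.4 kbps: about %.0f seconds\n", download_seconds(300, 14400);
```

For the 300 KB aerial photograph this works out to nearly three minutes on a 14.4 modem, which is far longer than most visitors are willing to wait for a single image.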

There is probably little we can do with the size of a lot of images. However, we can change the resolution. Even on the best monitors, I have found that a resolution of as low as 75dpi provides the necessary clarity.

In my opinion, a link that says "click here" looks silly. There is no need to tell the visitor where to click. Take an elevator button as an example. Wouldn't it look silly if one button was labeled:

press here if you want to go up

and the other was labeled:

press here if you want to go down

Instead, the buttons are labeled with just up and down or maybe even just arrows. We know that to go up we press the button. The same applies to links on a web page. We know that clicking on the link performs some action. Therefore it would be better to have something like this:

Download a 1200dpi picture of our company president (4.6Mb)

We know that clicking this link will download the image. It may be via ftp or http, but that doesn't matter. It's the result that's important.

Writing Our Pages

One thing I haven’t really addressed is writing the pages. I mentioned that they can be written with vi. However, this requires that you not only learn vi, but also learn HTML, as you must input each HTML tag by hand. Recently, as a result of the increasing popularity of the Web, several commercial HTML editors have become available. If you download Netscape Navigator Gold from the SCO Web site, you will find that it comes with one. (see Figure 0-8)

Figure 0-8 Netscape Navigator Gold Editor

In principle, Netscape Navigator Gold looks just like Netscape Navigator. The first difference you will notice is that there is a button on the toolbar labeled “Edit”. This will load the current page into the editor. However, if the page you have loaded contains frames, the editor cannot load them. Instead, you will have to load each frame by hand to edit it. (You can also edit them from the “Edit” entry in the File menu.)

In essence, this editor is WYSIWYG. That is, What You See Is What You Get. One difference is the editor allows you to insert any kind of HTML tag, even those that are not displayed. Therefore, you will see them in the editor, but not in the browser.

One very nice feature is the ability to “publish” your pages. This means that you can develop them on one machine and have the editor copy them to another. This can be done using FTP. In the Options menu is a new entry called Editor Preferences. Here you can configure a default location to publish your pages to, how to maintain the links, and even the user name and password the editor should log in with.

The Next Step

I’ve said it a couple of times already, but the only way to get good is to practice. The best thing is to create a bunch of pages on your local machine and get a feeling for how they look. Make changes and notice how the changes to the source change the appearance in the browser. Since you are doing this all locally, you don’t have to wait for the long transfers across the net.

Java & JavaScript

One of the disadvantages of perl CGI scripts is that everything is run on the server. Also, there is no way to interact with the browser. All the perl script can do is create pages that are then loaded by the browser. Java and JavaScript solve that problem by running on the client, enabling substantially more interaction with the visitor.

Java

Java is a programming language, like C, Fortran or Pascal. If you already know C, then you have a head start in learning Java. Java is similar to C in that it shares some of the syntax, and both Java and C programs are compiled. However, Java programs can be run from within a Web browser and C programs cannot. Java programs that run this way are called applets.

A key difference between Java applets and traditional programs is that Java is limited in what it can do. You can create programs in C or another language that directly access the hard disk, or access arbitrary locations in memory, but this is not possible in Java. This is because many of the features of C and C++ have been removed from Java.

Perhaps the most significant benefit of Java is that it is platform independent. This means I can write my Java applet on an SCO machine, but you could still access it from a Windows 95 machine (provided your browser supports Java). This is because although Java is compiled, it is compiled into an intermediate form called byte code. On the local machine the Java interpreter reads the byte code and executes the instructions appropriate for the local architecture.

Here I need to point out an important difference between perl and Java. Although there are a lot of differences in how you program each (as well as a lot of similarities), the biggest difference is where the program runs. Java is included in the web page itself. It is run locally and therefore, your browser must support it. It can, and does, interact directly with your machine. On the other hand, perl is run on the server.

For you, the Web site developer, all of this is interesting information, but does little to help you decide if Java is right for you. Up to now, we’ve talked about making your Web site interactive. You get responses from your visitor and the site changes accordingly. Java helps you make a Web site that changes dynamically, without intervention from the visitor. Rather than pressing a link to get the page to change, it changes as you watch.

Another key advantage is that you are no longer limited to content types that the client understands. For example, you can have videos running on your Web pages without the clients needing to have an extra viewer. With Java, you send both the video and the viewer.

This can actually be expanded to include any type of program such as a spreadsheet or database. Instead of having the client application on your machine, it will be carried across the network to your machine. Because of its design, Java needs only to load those parts of the program that you need, so it’s a lot faster than loading the entire application from a file server. In contrast to GUI frontends for other applications, the program is local so you don’t need to wait for the request to be sent to the server, evaluated and sent back.

So, is Java right for you? Well, that depends. Although it does make for an exciting Web site, it isn’t necessary to get your business up and running on the Internet. Like other Web page components, you could try creating a couple of Java applets and see what they can do for you.

SCO provides a Java Development Kit. You can find more information on how to get a copy from the SCO Web site (www.sco.com).

JavaScript

As its name implies, JavaScript is a scripting language. That is, the program is not compiled. The nice thing about this is that you do not need any special tools to create JavaScript applets to run in your Web page. You can use the same tools you use to create the pages to begin with. Therefore, you can begin writing with JavaScript now.

As with Java, an advantage of JavaScript is that everything runs locally. Once the script is written, it no longer has to go across the net. A disadvantage is that it cannot interact with the system to any great extent. This limits the functionality, but has an extra level of security since you cannot write a JavaScript program that does anything too nasty to your machine.

Among the various things that you can do with JavaScript are creating new windows, loading these windows with pages from different sites, and even interacting with users. JavaScript also allows you to access many different functions of the browser, to do things like automatically traverse the browser history, and so on.

In one network, I combined JavaScript on the client with perl on the server to create a dynamically updated page. This page displayed the results of a series of pings to about two dozen servers. At regular intervals the page was reloaded by the JavaScript applet so we could monitor connectivity to our remote servers. Based on the average response time, the entry would change colors. If the machine was not reachable at all, the JavaScript applet would create a pop-up saying which server was down.
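The server side of such a page might look roughly like the following sketch. It is not the script described above: the thresholds, the colors and the helper names status_color and status_row are assumptions for illustration, and the ping results are hard-coded sample values rather than real measurements.

```perl
use strict;
use warnings;

# Map an average ping time in milliseconds to a cell color.
# undef means the host did not answer at all. The thresholds are made up.
sub status_color {
    my ($avg_ms) = @_;
    return "red"    unless defined $avg_ms;
    return "green"  if $avg_ms < 100;
    return "yellow" if $avg_ms < 500;
    return "orange";
}

# Build one table row for a host and its average response time.
sub status_row {
    my ($host, $avg_ms) = @_;
    my $text = defined $avg_ms ? "$avg_ms ms" : "unreachable";
    return qq{<TR><TD>$host</TD><TD BGCOLOR="} . status_color($avg_ms) . qq{">$text</TD></TR>\n};
}

# Hypothetical sample results; a real script would collect these from ping.
my @results = ( ["server1", 40], ["server2", 750], ["server3", undef] );

print "<TABLE>\n";
print status_row(@$_) foreach @results;
print "</TABLE>\n";
```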


