Jim Mohr's SCO Companion

Index

Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.

Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/

Shells and Basic Utilities

Basic Shell Scripting

Previous: Commonly Used Utilities

By now we have a pretty good idea of how commands can be put together to do a wide variety of tasks. However, in order to create more complicated scripts, we need more than just a few commands. There are several shell constructs that you need to be familiar with in order to make really complicated scripts. A couple (the while and for-in constructs) we already covered. However, there are several more that can be very useful in a wide range of circumstances.

There are several things we need to talk about before we can jump into things. The first is the idea of arguments. As with binary programs, you can pass arguments to shell scripts and have them use these arguments as they work. For example, let's assume we have a script called myscript that takes three arguments: the first is the name of a directory, the second is a file name, and the third is a word to search for. The script will look for all files in that directory whose names contain the file name, and then search those files for the specified word. A very simple version of the script might look like this:

ls $1 | grep $2 | while read file
do
    grep $3 ${2}/${file}
done

The syntax is:

myscript directory file_name word

We discussed the while-do-done construct at the beginning of the chapter when we were talking about different commands. The one difference here is that we are sending the output of a command through a second pipe before we send it to the while.

This also brings up a new construct: ${2}/${file}. By enclosing a variable name inside curly braces, we can combine variables with other text. In this case we take the name of the directory ( ${2} ), tack on a '/' as a directory separator, followed by the name of a file that the grep found ( ${file} ). This builds up the path name to the file.
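The curly braces really matter when a variable is followed immediately by text that could be part of a variable name. Here is a quick illustration (the variable name is just for demonstration):

FILE=letter
echo ${FILE}s     # prints "letters"
echo $FILEs       # prints an empty line; the shell looks for a variable named FILEs

Without the braces, the shell cannot tell where the variable name ends and the literal text begins.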

When we run the program like this:

myscript /usr/jimmo trip boat

The three arguments /usr/jimmo, trip, and boat are assigned to the positional parameters 1, 2, and 3, respectively. "Positional" because the number each one is assigned is based on the position in which it appears on the command line. Since the positional parameters are shell variables, we need to refer to them with the leading dollar-sign ($).

When the shell interprets the command, what is actually run is:

ls /usr/jimmo | grep trip | while read file
do
    grep boat /usr/jimmo/${file}
done

If we wanted, we could make the script a little more self-documenting by assigning the values of the positional parameters to variables. The new script would look like this:

DIR=$1
FILENAME=$2
WORD=$3

ls $DIR | grep $FILENAME | while read file
do
    grep $WORD ${DIR}/${file}
done

If we started the script again with the same arguments, /usr/jimmo would get assigned to the variable DIR, trip would get assigned to the variable FILENAME, and boat would get assigned to WORD. When the command was interpreted and run, it would still be evaluated the same way.

Being able to assign the positional parameters to variables is useful for a couple of reasons. First is the issue of self-documenting code. In this example, the script is very small, and since we know what the script is doing, we probably would not have made the assignments to the variables. However, in a larger script, making the assignments is very valuable in terms of keeping track of things.

The next issue is that you can only reference ten positional parameters directly. The first, $0, refers to the script itself; what this can be used for, we'll get to in a minute. The others, $1-$9, refer to the arguments that are passed to the script. Well, what happens if you have more than nine arguments? This is where the shift instruction comes in. What it does is move the arguments "down" in the positional parameter list.

For example, let's assume we changed the first part of the script like this:

DIR=$1
shift
FILENAME=$1


On the first line, the value of positional parameter 1 is /usr/jimmo and we assign it to the variable DIR. On the next line, the shift moves every positional parameter down. Since $0 remains unchanged, what was in $1 (/usr/jimmo) drops out of the bottom. Now the value of positional parameter 1 is trip, which is assigned to the variable FILENAME, and positional parameter 2 (boat) would then be assigned to WORD.

If we had 10 arguments, the 10th would initially be unavailable to us. However, once we do the shift, what was the 10th argument is shifted down and becomes the 9th. It is now accessible through the positional parameter 9. If we had more than 10, there are a couple of ways to get access to them. First, we could issue enough shifts until the arguments all moved down far enough. Or, we could use the fact that shift can take the number of shifts to do as an argument. Therefore, using

shift 9

makes the 10th argument positional parameter 1.

What about the other nine arguments? Are they gone? Well, if you never assigned them to a variable, then yes, they are gone. However, by assigning them to a variable before you make the shift, you still have access to their values.

Being able to shift positional parameters comes in handy in other instances. This brings up a new parameter: $*. This refers to all the positional parameters (except for $0). So, let's assume we have 10 positional parameters and do a shift 2 (ignoring whatever we did with the first two). Now the parameter $* contains the values of the last eight arguments.

In our sample script above, what if we wanted to search for a phrase and not just a single word? We could change the script to look like this:

DIR=$1
FILENAME=$2
shift 2
WORD=$*

ls $DIR | grep $FILENAME | while read file
do
    grep "$WORD" ${DIR}/${file}
done

The first change is that after assigning positional parameters 1 and 2 to variables, we shift twice, effectively removing the first two arguments. We then assign the remaining arguments to the variable WORD ( WORD=$* ). Since this could be a phrase, we need to enclose the variable in double-quotes ( "$WORD" ). We can now search for phrases as well as single words. If we did not include the double-quotes, grep would treat each word of the phrase as a separate argument, taking the first as the pattern and the rest as file names.

Another useful parameter keeps track of the total number of arguments: $#. In the script above, what would happen if we only had two arguments? Well, the grep would fail because there would be nothing for it to search for. Therefore, it would be a good thing to keep track of the number of arguments.
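For example, $# combined with shift in a loop lets a script work through every argument, no matter how many there are. A minimal sketch (the bracketed test is explained just below):

while [ $# -gt 0 ]
do
    echo $1     # work with the current first argument
    shift       # move the rest down one position
done

Each pass through the loop works with whatever is currently the first argument, then shifts the remainder down, until none are left.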

We first need to introduce a new construct: if-then-fi. This is similar to the while-do-done construct in that the if-fi pair marks the beginning and end of a block (fi is simply if reversed). The difference is that instead of repeating the commands within the block while a condition is true, we execute them once if the condition is true. In general, it looks like this:

if [ condition ]
then
    do something
fi

The conditions are all defined in the test(C) man-page and can be string comparisons, arithmetic comparisons, and even tests on specific files, such as whether they are writable. Check out the test(C) man-page for more examples.
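For instance, here is a minimal sketch of two such tests (the directory name is just for illustration):

if [ -w ./letters ]        # true if ./letters exists and is writable
then
    echo "./letters is writable"
fi

if [ "$WORD" = "boat" ]    # a string comparison
then
    echo "Searching for boats"
fi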

Since we want to check the number of arguments passed to our script, we will do an arithmetic comparison. We can check whether two values are equal, whether the first is less than the second, whether the first is greater than or equal to the second, and so on. In our case, we want to ensure that there are at least three arguments, since having more is valid if we are going to be searching for a phrase. Therefore, we want to check whether the number of arguments is greater than or equal to 3. So, we might have something like this:

if [ $# -ge 3 ]
then
    body_of_script
fi

If we only have two arguments, the test inside the brackets is false, the if fails, and we do not enter the block. Instead, the program simply exits silently. However, to me, this is not enough. We want to know what's going on. Therefore, we use another construct: the else. When used with if-then-fi, we are saying that if the test evaluates to true, do one thing; otherwise, do something else. In our example program, we might have something like this:

if [ $# -ge 3 ]
then
    DIR=$1
    FILENAME=$2
    shift 2
    WORD=$*
    ls $DIR | grep $FILENAME | while read file
    do
        grep "$WORD" ${DIR}/${file}
    done
else
    echo "Insufficient number of arguments"
fi

Note that we test $# before doing the shift; once we shift, the two arguments we removed no longer count toward $#. If we only put in two arguments, the test fails and the commands between the else and the fi are executed. To make the script a little friendlier, we usually tell the user what the correct syntax is, so we might change the end of the script to look like this:

else
    echo "Insufficient number of arguments"
    echo "Usage: $0 <directory> <file_name> <word>"
fi

The important part of this change is the use of $0. As I mentioned a moment ago, this refers to the program itself. Not just its name, but rather the way it was called. Had we hard-coded the line to look like this:

echo "Usage: myscript <directory> <file_name> <word>"

then no matter how we started the script, the output would always be:

Usage: myscript <directory> <file_name> <word>

However, if we use $0 instead, we could start the program like this:

/usr/jimmo/bin/myscript /usr/jimmo file

and the output would be:

Usage: /usr/jimmo/bin/myscript <directory> <file_name> <word>

On the other hand, if we started it like this:

./bin/myscript /usr/jimmo file

The output would be:

Usage: ./bin/myscript <directory> <file_name> <word>

One thing to keep in mind is that the else needs to be within the matching if-fi pair. The key here is the word matching. We could nest if-then-else-fi constructs several layers deep if we wanted; we just need to keep track of things. The key issues are that each fi closes the most recently opened if, and that every else is enclosed within its own if-fi pair. Here is what multiple sets might look like:

if [ "$condition1" = "TRUE" ]
then
    if [ "$condition2" = "TRUE" ]
    then
        if [ "$condition3" = "TRUE" ]
        then
            echo "Conditions 1, 2 and 3 are true"
        else
            echo "Only Conditions 1 and 2 are true"
        fi
    else
        echo "Only Condition 1 is true"
    fi
else
    echo "No conditions are true"
fi

Now, this doesn't take into account the possibility that condition1 is false but condition2 or condition3 is true. However, hopefully you see how to construct nested conditional statements.

What if we had a single variable that could take on several values, and depending on the value, the program should behave differently? This could be used as a menu, for example. Many system administrators build such a menu into their users' .profile (or .login) so that the users never need to get to a shell. They simply input the number of the program that they want to run and away they go.

In order to do something like this, we need to introduce yet another construct. This is the case-esac pair. Like the if-fi pair, esac is the reverse of case. So to implement a menu, we might have something like this:

read choice

case $choice in
    a) program1;;
    b) program2;;
    c) program3;;
    *) echo "No such Option";;
esac

If the value of choice that we input is a, b, or c, then the appropriate program is started. The things to note are the in on the first line, that each expected value is followed by a closing parenthesis, and that there are two semi-colons at the end of each block.

It is the closing parenthesis that marks the end of the pattern. If we wanted, we could include other possibilities for the different options. In addition, since the double semi-colons mark the end of the block, we can simply add other commands before we get to the end of the block. For example, if we wanted our script to recognize either upper- or lowercase, we could change it to look like this:

read choice

case $choice in
    a|A) program1
         program2
         program3;;
    b|B) program2
         program3;;
    c|C) program3;;
    *) echo "No such Option";;
esac


If necessary, we could also include a range of characters, as in:


case $choice in
    [a-z] ) echo "Lowercase";;
    [A-Z] ) echo "Uppercase";;
    [0-9] ) echo "Number";;
esac

Now, whatever is called as the result of one of these choices does not have to be a UNIX command. Since each line is interpreted as if it were executed from the command line, we can include anything we could have typed at the command line. Provided they are known to the shell script, this includes aliases, variables, and even shell functions.

A shell function behaves similarly to functions in other programming languages. It is a portion of the script that is set off from the rest of the program and is accessed through its name. These are the same functions we talked about in our discussion of shells. The only apparent difference is that functions created inside a shell script disappear when the shell exits. To prevent this, start the script with a . (dot).

For example, if we had a function inside a script called myscript we would start it like this:

. myscript

The result is that although the script executes normally, a sub-shell is not started. Therefore, anything you set or define remains. This includes both functions and variables.

Shell functions behave like small shell scripts in that they can accept arguments, and the positional parameters behave in the same way. Actually, they don't have to be small at all. In fact, the only limitations we can find are the same ones that apply to shell scripts in general, such as the length of the command line and the overall size of the file. However, one thing to keep in mind is that a function must be defined in the script before it is called.

The basic syntax of the function is:

function_name()
{
    what the function does
}

When you call a function, you simply use its name, just as you would from the command line. If the function takes any arguments, these are passed just as they are to shell scripts. In fact, shell functions have their own positional parameters, separate from those belonging to the script itself. For example, let's look at this script, called (what else?) myscript:


funct1()
{
    echo $1
}

funct1 two

If we started it like this:

myscript one

the output would be:

two

This is because positional parameter 1 inside the function funct1() refers to the arguments used to call the function. So, if funct1 were called with no arguments, then $1 would not be set once we got inside funct1.

One thing I commonly use functions for is to clean up for me when things go wrong. In fact, there are quite a few shell scripts on a standard SCO system containing "clean-up" functions. Such functions are necessary to return the system to the state it was in before the shell script was started. What happens when the user hits the delete key in the middle of a script? Unless it has been disabled or trapped, the script terminates immediately. This can leave some unwanted things lying around the system. Instead, we can catch, or trap, the delete key and run a special clean-up function before we exit.

This is done with the trap instruction. The syntax is:

trap 'command' signals

where command is the command to run if any of the signals listed in signals are received. For example, if we wanted to trap the delete key (signal 2) and run a cleanup function, the line might look like this:

trap 'cleanup' 2

After we start the script, any time we press the delete key it will first run the function cleanup. You can also set up different traps for different signals, like this:

trap 'cleanup1' 1
trap 'cleanup2' 2
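To make this concrete, here is a minimal sketch of a script with a clean-up function; the temporary file name is just an assumption for illustration:

TMPFILE=/tmp/myscript.$$     # a scratch file named after our process ID

cleanup()
{
    rm -f $TMPFILE           # remove the scratch file
    exit 1
}

trap 'cleanup' 2             # run cleanup if the delete key is pressed

# the rest of the script can now work with $TMPFILE safely

Because the function is defined before the trap is set, the shell already knows what cleanup means by the time the signal can arrive.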

Okay. So you know some of the basic commands and how to put them together into a script. The biggest problem up to this point is figuring out how to create that script. You could continue to use cat. However, that will get old fast, especially if you make a lot of typos like we do or want to make changes to your scripts. Therefore, you need a better tool. What you really need is a text editor, which is the subject of the next section.

In a shell script (or from the command line for that matter), you can input multiple commands on the same line. For example:

date; ls | wc

would run the date command and then give the word count of the ls command. Note that this does not write the output of both commands on the same line; it just allows you to have multiple commands on the same line. First the date command is executed, then the ls | wc. Each time, the system creates an extra process to run that program.

We can prevent the shell from creating an extra process by grouping the commands with curly braces. For example:

{ date; ls | wc; }

A group in braces is run by the current shell itself. Had we used parentheses instead, as in ( date; ls | wc ), the whole group would be run in a separate sub-shell.
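One handy side effect of grouping, whichever form we use, is that a single redirection applies to all of the commands at once. A small sketch (the output file name is just an illustration):

( date; ls | wc ) > /tmp/report

Both the date and the word count end up together in /tmp/report.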



File Management

cd                   change directory
chgrp                change the group of a file
chmod                change the permissions (mode) of a file
chown                change the owner of a file
cp                   copy files
file                 determine a file's contents
l, lc, lf, ls, lx    list files or directories
ln                   make a link to a file
mkdir                make a directory
mv                   move (rename) a file
rm                   remove a file
rmdir                remove a directory

File Manipulation

awk       pattern-matching language
cat       display a file
cmp       compare two files
csplit    split a file
cut       display columns of a file
diff      find differences between two files
dircmp    compare two directories
find      find files
head      show the top portion of a file
more      display screenfuls of a file
pg        display screenfuls of a file
sed       non-interactive text editor
sort      sort a file
tail      display the bottom portion of a file
tr        translate characters in a file
uniq      find unique or repeated lines in a file
xargs     process multiple arguments

Table 0.1 Commonly Used Commands

Odds and Ends

Here are a few little tidbits that I wasn't sure where to put.

You can get the shell to help you debug your script. If you place a set -x in your script, each command, along with its arguments, is printed as it is executed. If you want to trace just one section of your script, include set -x before that section and set +x at the end; the set +x turns the output off again.
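A minimal sketch, reusing the variables from myscript:

set -x                       # start tracing
ls $DIR | grep $FILENAME     # this pipeline is echoed as it is executed
set +x                       # stop tracing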

If you want, you can capture this output in a file without having it go to the screen. This makes use of the fact that the output generated by set -x goes to stderr and not stdout. If you redirect stdout somewhere, the output from set -x still goes to the screen. On the other hand, if you redirect stderr, stdout still goes to your screen. To redirect stderr to a file, start the script like this:

myscript 2>/tmp/output

This says to send file descriptor 2 (stderr) to the file /tmp/output.
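If you wanted the trace and the normal output together in one file, you could also redirect stderr to wherever stdout is going:

myscript > /tmp/output 2>&1

The 2>&1 says to send file descriptor 2 (stderr) to the same place as file descriptor 1 (stdout).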

If you want to create a directory several levels deep, you do not have to change to the parent directory and then run mkdir from there. The mkdir command takes as its argument the path name of the directory you want to create; it doesn't matter whether it is a relative or an absolute path. Also, if you want to create several levels of directories, you don't have to make each parent directory before you make the sub-directories. Instead, you can use the -p option to mkdir, which will automatically create all the necessary intermediate directories.

For example, say we want to create the sub-directory ./letters/personal/john, but the sub-directory letters does not exist yet. This also means that the sub-directory personal doesn't exist. If we run mkdir like this:

mkdir -p ./letters/personal/john

then the system will create ./letters, then ./letters/personal and then ./letters/personal/john.


Assume that you want to remove a file that has multiple links; for example, ls, lc, lx, lf, etc. are links to the same file. The system keeps track of how many names reference the file through the link count (more on that later). Such links are called hard links. If you remove one of them, the file still exists, as there are other names that reference it. Only when we remove the last link (and with that, the link count goes to zero) is the file removed.
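A quick way to watch the link count at work (the file names are just for illustration):

ln /usr/jimmo/letter.john /usr/jimmo/letter.bak     # create a second hard link
l /usr/jimmo/letter.john                            # the link count now shows 2
rm /usr/jimmo/letter.bak                            # the count drops back to 1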

There is also the issue of symbolic links. A symbolic link (also called a soft link) is nothing more than a path name that points to some other file, or even a directory. It is not until the link is accessed that the path is translated into the "real" file. This has some interesting effects. For example, if we create a link like this:

ln -s /usr/jimmo/letter.john /usr/jimmo/text/letter.john

You would see the symbolic link as something like this:

lrw-r--r-- 1 jimmo support 29 Sep 15 10:06 letter.john -> /usr/jimmo/letter.john

The file /usr/jimmo/text/letter.john is then a symbolic link to /usr/jimmo/letter.john. Note that the link count on /usr/jimmo/letter.john doesn't change, since the system sees these as two separate files. It is easier to think of the file /usr/jimmo/text/letter.john as a text file that contains the path of /usr/jimmo/letter.john. If we remove /usr/jimmo/letter.john, then /usr/jimmo/text/letter.john will still exist; however, it points to something that doesn't exist. Even if there are other hard links that point to the same file as /usr/jimmo/letter.john did, that doesn't matter. The symbolic link /usr/jimmo/text/letter.john points to the path /usr/jimmo/letter.john, and since that path no longer exists, the file can no longer be accessed via the symbolic link. It is also possible to create a symbolic link to a file that does not exist, since the system does not check until you access the file.


When you create a file, the access permissions are determined by your file creation mask. This is defined by the UMASK variable and can be set using the umask command. One thing to keep in mind is that this is a mask; that is, it masks out permissions rather than assigning them. If you remember, permissions on a file can be set using the chmod command and a three-digit value. For example:

chmod 600 letter.john

explicitly sets the permissions on the file letter.john to 600 (read and write permission for the owner and nothing for everyone else). If we create a new file, the permissions might be 660 (read/write for owner and group). This is determined by the UMASK. To understand how the UMASK works, you need to remember that the permissions are octal values, which are determined by the permission bits. Looking at one set of permissions, we have:

bit:      2   1   0
value:    4   2   1
symbol:   r   w   x

This means that if the bit with value 4 is set (bit 2), the file can be read; if the bit with value 2 is set (bit 1), the file can be written to; and if the bit with value 1 is set (bit 0), the file can be executed. If multiple bits are set, their values are added together. For example, if bits 2 and 1 are set (read/write), the value is 4+2=6, just as in the example above. If all three are set, we have 4+2+1=7. Since there are three sets of permissions (owner, group, others), the permissions are usually given as a triplet, just as in the chmod example above.
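As a quick worked example, combining three such triplets into a single chmod:

chmod 754 myscript      # owner: rwx (4+2+1=7), group: r-x (4+1=5), others: r-- (4)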

The UMASK value masks out these bits. Each position in the UMASK masks the same permissions as the corresponding position in the file permissions: the left-most position masks the owner's permissions, the middle position the group's, and the right-most position masks everyone else's. If we have UMASK=007, the permissions for owner and group are not touched. However, for others we have the value 7, which is all three bits set; since this is a mask, all three permission bits are removed.

The problem many people have is that the umask does not force permissions, but rather limits them. For example, with UMASK=007 we might assume that any file created has permissions 770. However, this depends on the program that is creating the file. If the program creates the file with permissions 777, then the umask masks out the last three bits and the permissions will, in fact, be 770. However, if the program creates the file with permissions 666, the last bits are still masked out, but the new file will have permissions 660, not 770. Some programs, like the C compiler, do generate files with the execute bit (bit 0) set; however, most do not. Therefore, setting UMASK=007 does not force the creation of executable programs (unless the program creating the file sets the execute bit itself).

Let's look at a more complicated example. Assume our UMASK is 047. If our program creates a file with permissions 777, the UMASK does nothing to the first digit, but masks out the 4 from the second digit, giving us 3. Since the last digit of the UMASK is 7, it masks out everything, so the permissions there are 0. As a result, the permissions for the file are 730. However, if the program creates the file with permissions 666, the resulting permissions are 620. The easy way to figure out the effect of the UMASK is to subtract the UMASK from the permissions that the program sets. (Note that any negative digit becomes 0.)
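You can see this for yourself with something like the following sketch (the file name is just an illustration); most commands, like touch, create files with permissions 666:

umask 047          # set the mask
touch newfile      # 666 minus 047, digit by digit, gives 620
l newfile          # shows -rw--w----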

As I mentioned, one way the UMASK is set is through the environment variable UMASK. You can change it at any time using the umask command. The syntax is simply:

umask <new_umask>

Here the <new_umask> can either be the numeric value (e.g. 007) or symbolic. For example, to set the umask to 047 using the symbolic notation we have:

umask u=,g=r,o=rwx

This has the effect of removing no permissions from the user, removing read permission from the group, and removing all permissions from others.


Changing your current group

Group control is carried out using the sg(C) (supplementary group) command. Type id (see ``Finding out your group'') or sg to obtain a list of the groups of which you are a member, as follows:

$ sg
Current effective supplemental groups:
1014(techpubs)

You can change your current group by using the sg -g option, as follows:

$ sg -g techpubs

You must be recognized as a member of the new group before you can switch to it. Group memberships are listed in the file /etc/group; each group has a line in the file, followed by the names of those users who are authorized to work in it. After successfully changing group, you work within the new group for the remainder of the login session (or until you run sg -g again).



Getting help when you are uncertain of the topic

If you know the keyword but do not want to read all the reference text, you can use the whatis(C) command to list the description of the item. For example, to read the description of man, type the following:

$ whatis man
man(C) - prints reference pages in this guide

If you are not sure of the keyword to use for a topic, you can use the apropos(C) command (which is the same as man -k). Each entry in the reference manual has a description associated with it; apropos searches the descriptions for the word you give as a subject. For example, to find reference entries concerned with searching, type apropos search. The following entries are among those displayed:

egrep(C) - Search a file for one or more patterns
fgrep(C) - Search a file for a fixed string
grep(C) - Search a file for a pattern

You can then use man C grep, for example, to display the manual page on the grep command.

Next: Interactively Editing Files with vi

Next Chapter: Users and User Accounts
