Jim Mohr's SCO Companion

Index

Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.

Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/

Introduction to Operating Systems


Files and Directories


Another key aspect of any operating system is the concept of a file. A file is nothing more than a related set of bytes on disk or other media. These bytes are labeled with a name, which is then used as a means of referring to that set of bytes. In most cases, it is through the name that the operating system is able to track down the file's exact location on the disk.

There are three kinds of files that most people are familiar with: programs, text files and data files. However, on a UNIX system there are other kinds of files. One of the most common is the device file, often referred to as a device node. Under UNIX, every device is treated as a file. The operating system gains access to the hardware through the device files, which tell the system what specific device driver needs to be used in order to access the hardware.

Another kind of file is a pipe. Like a real pipe, stuff goes in one end and out the other. Some are named pipes, that is, they have a name and are located permanently on the hard disk. Others are temporary, unnamed pipes. Although these do not exist once the process using them has ended, they do take up physical space on the hard disk. We'll talk more about these later.
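To make these kinds of files a little more concrete, here is a minimal sketch, assuming Python 3 on a UNIX-like system (Python is my choice for illustration and is not something discussed in the original text). The function name describe and the file names letter.txt and mypipe are made up for the example, and I am assuming that /dev/null exists and is a character device, as it is on most systems.

```python
import os
import stat
import tempfile

def describe(path):
    """Return a short label for the kind of file found at path."""
    mode = os.lstat(path).st_mode  # lstat: look at the file itself, not a link target
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    return "other"

with tempfile.TemporaryDirectory() as workdir:
    text_path = os.path.join(workdir, "letter.txt")   # hypothetical text file
    pipe_path = os.path.join(workdir, "mypipe")       # hypothetical named pipe
    with open(text_path, "w") as f:
        f.write("Dear Chris,\n")
    os.mkfifo(pipe_path)  # create a named pipe in the directory

    kinds = {
        "letter.txt": describe(text_path),
        "mypipe": describe(pipe_path),
        "/dev/null": describe("/dev/null"),  # assumed to be a device file
    }

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

The point is only that the kind of file is recorded by the system itself and has nothing to do with what the file is called.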

Unlike operating systems like DOS, UNIX neither expects nor enforces any pattern for file names. DOS will not even attempt to execute programs that do not end with .EXE, .COM or .BAT. UNIX, on the other hand, is just as happy to execute a program called "program" as it is a program called "program.txt". In fact, you can use any character in a file name except for "/" and the NUL character (the zero byte).

However, completely random things can happen if the operating system tries to execute a text file as if it were a binary program. To prevent this, UNIX has two mechanisms to ensure that text does not get randomly executed. The first is the file's permission bits. The permission bits determine who can read, write and execute a particular file. You can see the permissions of a file by doing a long listing of that file (ls -l). We will get into what the permissions are all about a little later. The second is that the system must recognize a magic number within the program indicating it is a binary executable.

Even if a file is set to allow you to execute it, the beginning portion of the file must contain the right information to tell the operating system how to start the program. If that information is missing, the system will attempt to start the file as a shell script (similar to a DOS batch file). If the lines in the file do not belong to a shell script and you try to execute it, you end up with a screen full of errors.
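The two checks just described can be sketched roughly as follows, assuming Python 3 on a UNIX-like system. This is only an illustration of the idea and not how any particular kernel or shell does it. In particular, the magic number I test for is the ELF signature used by many current systems, which is an assumption on my part; other binary formats start with different bytes. The function name how_it_would_start is hypothetical.

```python
import os
import stat
import tempfile

ELF_MAGIC = b"\x7fELF"  # assumption: ELF binaries; other formats use other magic numbers

def how_it_would_start(path):
    """Return (owner_may_execute, start_method) for the file at path."""
    # First mechanism: the execute permission bit (only the owner's bit is checked here).
    owner_may_execute = bool(os.stat(path).st_mode & stat.S_IXUSR)
    # Second mechanism: look for a magic number at the very start of the file.
    with open(path, "rb") as f:
        head = f.read(len(ELF_MAGIC))
    if head == ELF_MAGIC:
        start_method = "binary program"
    else:
        start_method = "tried as a shell script"
    return owner_may_execute, start_method

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "program.txt")  # the name does not matter to UNIX
    with open(path, "w") as f:
        f.write("echo hello\n")

    os.chmod(path, 0o644)            # read/write only: no execute bit
    before = how_it_would_start(path)
    os.chmod(path, 0o755)            # now the owner may execute it
    after = how_it_would_start(path)

print("before chmod:", before)
print("after chmod: ", after)
```

Note that changing the permissions changes only the first answer. What is inside the file decides the second.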

What you name your file is up to you. You are not limited by the eight-letter name and three-letter extension as you are in DOS. You can still use periods as separators, but that's all they are. They do not have the same "special" meaning that they do under DOS. For example, you could have files called:

letter.txt

letter.text

letter_txt

letter_to_jim

letter.to.jim

Only the first one is valid under DOS, but all are valid under SCO UNIX. Note that even though file names prior to SCO UNIX 3.2 v4.0 were limited to 14 characters, all of these are still valid. With SCO UNIX 3.2 v4.0 and later, file names can be as long as 254 characters.
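The naming rules are simple enough to write down as a check. Here is a small sketch, assuming Python 3 on a UNIX-like system. The function is_valid_name is something I made up for illustration; the length limit is not hard-coded but asked of the system for the current directory, since it differs between systems and filesystems.

```python
import os

def is_valid_name(name, max_length):
    """Check a single file name (one path component, not a whole path)."""
    if not name:
        return False
    if "/" in name or "\0" in name:        # the only two forbidden characters
        return False
    return len(name.encode()) <= max_length  # the limit applies to bytes

# Ask the system how long a name may be in the current directory.
name_max = os.pathconf(".", "PC_NAME_MAX")

candidates = [
    "letter.txt",
    "letter.text",
    "letter_txt",
    "letter_to_jim",
    "letter.to.jim",
]
results = {name: is_valid_name(name, name_max) for name in candidates}
# A slash always separates directories, so this is a path and not a single name.
results["letters/chris.txt"] = is_valid_name("letters/chris.txt", name_max)

print("longest name allowed here:", name_max)
for name, ok in results.items():
    print(f"{name}: {'valid' if ok else 'not valid'}")
```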

There is one naming convention that does have special meaning in SCO UNIX, and that is creating "dot" files. These are files where the first character is a '.' (dot). If you have such a file, it will by default be invisible to you. However, unlike the DOS concept of "hidden" files, "dot" files can be seen by simply using the -a (all) option to ls, as in ls -a. (ls is a command used to list the contents of directories.) One thing to note is that the Superuser (root) is magic. It can see these files whether it uses the -a or not, because ls recognizes that you are root and adds the -a.
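The "hiding" of dot files is purely a convention of the program doing the listing. A minimal sketch of that convention, assuming Python 3 (the function list_names and the file names are hypothetical, and this is not how ls itself is implemented):

```python
import os
import tempfile

def list_names(directory, show_all=False):
    """Return the sorted names in directory, skipping dot files unless show_all is set."""
    # os.listdir already leaves out the special entries "." and "..",
    # which a real ls -a would show.
    names = sorted(os.listdir(directory))
    if show_all:
        return names
    return [name for name in names if not name.startswith(".")]

with tempfile.TemporaryDirectory() as workdir:
    for name in (".profile", "letter.txt"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write("example\n")

    visible = list_names(workdir)                      # like plain ls
    everything = list_names(workdir, show_all=True)    # like ls -a

print("ls:   ", visible)
print("ls -a:", everything)
```

The dot file is in the directory the whole time. The listing program simply chooses not to mention it.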

The ability to group your files together into some kind of organization is very helpful. Instead of having to wade through thousands of files on your hard disk to find the one you want, SCO UNIX, along with other operating systems, allows you to group the files into a directory. Under SCO UNIX, a directory is actually nothing more than a file itself with a special format. It contains the names of the files associated with it and some pointers or other information to tell the system where the data for the file actually resides on the hard disk.

Directories do not actually "contain" the files that are associated with them. Physically (that is, how they exist on the disk), directories are just files in a certain format. The directory structure is imposed on them by the program you use. For example, the hd program in ODT 3.0 will output the contents of a directory "file" without regard to the format that might be imposed by something like ls. (Note that the hd program in OpenServer does not do that any more and is, in my mind, broken.)

The directories have information that points to where the real files are. In comparison, you might consider a phonebook. A phonebook does not contain the people listed in it, just their names and telephone numbers. A directory has the same information: the names of files and their numbers. In this case, instead of a telephone number there is an information node number, or inode number.

The logical structure in a telephone book is that names are grouped alphabetically. It is very common for two entries (names) that appear next to each other in the phone book to be in different parts of the city. Just like names in the phonebook, names that are next to each other in a directory may be in distant parts of the hard disk.
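You can look at the "phonebook" yourself. The following sketch, assuming Python 3 on a UNIX-like system, reads a directory and collects nothing but each name and the inode number listed next to it. The file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    for name in ("alice", "bob", "carol"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write(f"letter to {name}\n")

    # Each directory entry pairs a name with an inode number, much as a
    # phonebook entry pairs a name with a telephone number.
    with os.scandir(workdir) as entries:
        phonebook = {entry.name: entry.inode() for entry in entries}

for name in sorted(phonebook):
    print(f"{name:10} inode {phonebook[name]}")
```

The numbers you see will differ from system to system and from run to run. What matters is that the directory gives you a name and a number, and nothing of the file's contents.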

As I mentioned, directories are logical groupings of files. It is common to say that the directory "contains" those files or the file is "in" a particular directory. In a sense this is true. The file that is the directory "contains" the name of the file. However, this is the only connection between the directory and the file. Even so, we will continue to use this terminology.

One of the kinds of files a directory can contain is more directories. These, in turn, can contain still more directories. The result is a hierarchical tree structure of directories, files, more directories and more files. Directories that contain other directories are referred to as the parent directory of the child or sub-directory that they contain. (Most references I have seen refer only to parent and sub-directories. Rarely have I seen references to child directories.)

When referring to directories under UNIX, there is often either a leading or trailing slash ("/") and sometimes both. The top of the directory tree is referred to with a single "/" and called the "root" directory. Sub-directories are referred to by this slash followed by their name, such as /bin or /dev. As you proceed down the directory tree, each subsequent directory is separated by a slash. The concatenation of slashes and directory names is referred to as a path. Several levels down, you might end up with a path such as /usr/jimmo/letters/personal/chris.txt, where chris.txt is the actual file and /usr/jimmo/letters/personal is all of the directories leading to that file. The directory /usr contains the sub-directory jimmo, which contains the sub-directory letters, which contains the sub-directory personal. This directory contains the file chris.txt.
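Taking the example path apart shows how little there is to it. This sketch assumes Python 3 and uses only its standard pathlib module; nothing is read from disk, since a path is just text until you ask the system to look it up.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/usr/jimmo/letters/personal/chris.txt")

components = path.parts          # the root, then one entry per level of the tree
directory = str(path.parent)     # all of the directories leading to the file
file_name = path.name            # the last component: the file itself

print("components:", components)
print("directory: ", directory)
print("file name: ", file_name)
```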

Movement up and down the tree is accomplished by means of the cd (change directory) command, which is part of your shell. Although this is often difficult to grasp at first, you are not actually moving anywhere. One of the things that the operating system keeps track of within the context of each process is that process' current directory, also referred to as the current working directory. This is merely the name of a directory on the system. Your process has no physical contact with this directory; it simply keeps the directory's name in memory.

When you change directories, this portion of the process' memory is changed to reflect your new "location." You can 'move' up and down the tree or make jumps to completely unrelated parts of the directory tree. However, all that really happens is that the current working directory portion of your process gets changed.
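Any process can change its own current working directory, not just the shell. Here is a minimal sketch, assuming Python 3 on a UNIX-like system, in which the process "moves" down two levels, up one, and then jumps back to where it started. The directory names letters and personal are borrowed from the example path above and are created in a temporary location.

```python
import os
import tempfile

start = os.getcwd()  # remember where this process "is" right now

with tempfile.TemporaryDirectory() as workdir:
    os.makedirs(os.path.join(workdir, "letters", "personal"))
    try:
        os.chdir(workdir)             # jump to an unrelated part of the tree
        os.chdir("letters/personal")  # move down two levels (relative path)
        deep = os.getcwd()
        os.chdir("..")                # move up one level
        up_one = os.getcwd()
    finally:
        os.chdir(start)               # jump back before the directory is removed

print("down:", deep)
print("up:  ", up_one)
print("back:", os.getcwd())
```

Nothing on the disk changed during any of this apart from creating the directories. Only the one piece of information inside the process was updated each time.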

Although there can be many files with the same name, each combination of directories and file name must be unique. This is because the operating system refers to every file on the system by this unique combination of directories and file name. In the example above, I have a personal letter called chris.txt. I might also have a business letter by the same name. Its path would be: /usr/jimmo/letters/business/chris.txt. Someone else might also have a business letter to Chris. Their path might be: /usr/john/letters/business/chris.txt.

Figure 0-4 Diagram of directory tree structure

One thing to note is that John's business letter to Chris may be the exact same file as Jim's. I am not talking about one being a copy of the other. Rather, I am talking about a situation where both names point to the same physical locations on the hard disk. Since both files are referencing the same bits on the disk, they must therefore be the same file.

This is accomplished through the concept of a link. Like a chain link, a file link connects two pieces together. I mentioned above the "telephone number" for a file was its inode. This number actually points to a special place on the disk called the inode table, with the inode number being the offset into this table. Each entry in this table not only contains the file's physical location on the disk, but the owner of the file, the access permissions and the number of links, as well as many other things. In the case where the two files are referencing the same entry in the inode table, these are referred to as hard links. A soft link or symbolic link is where a file is created that contains the path of the other file. We get into details about this in chapter 6.


Figure 0-5 - Files and inodes in a directory

However, the inode does not contain the name of the file. The name is only contained within the directory. Therefore, it is possible to have multiple directory entries that have the same inode, just as there can be multiple entries in the phone book, all with the same phone number. We'll get into a lot more detail about inodes in the section on filesystems.

Let's think about the telephone book analogy once again. Although it is not too common for an individual to have multiple listings, there might be two people with the same number. For example, if you were sharing a house with three of your friends, there might be only one telephone. However, each of you would have an entry in the phone book. I could get the same phone to ring by dialing the telephone number of four different people, just as I could get the same inode with four different file names.
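Here is a small sketch of two names sharing one inode, assuming Python 3 on a UNIX-like system with a filesystem that supports links. The directories jimmo and john stand in for the two users from the earlier example and are created in a temporary location. For contrast, the sketch also makes a symbolic link, which is a separate file that merely holds the path of the other one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    jim_dir = os.path.join(workdir, "jimmo")
    john_dir = os.path.join(workdir, "john")
    os.mkdir(jim_dir)
    os.mkdir(john_dir)

    jim_letter = os.path.join(jim_dir, "chris.txt")
    john_letter = os.path.join(john_dir, "chris.txt")
    shortcut = os.path.join(workdir, "shortcut.txt")

    with open(jim_letter, "w") as f:
        f.write("Dear Chris,\n")

    os.link(jim_letter, john_letter)   # hard link: a second name for the same inode
    os.symlink(jim_letter, shortcut)   # symbolic link: a new file holding a path

    same_inode = os.stat(jim_letter).st_ino == os.stat(john_letter).st_ino
    link_count = os.stat(jim_letter).st_nlink  # how many names point at this inode

    # Write through one name, read through the other.
    with open(john_letter, "a") as f:
        f.write("P.S. Call me.\n")
    with open(jim_letter) as f:
        seen_by_jim = f.read()

    symlink_target = os.readlink(shortcut)
    symlink_is_separate = os.lstat(shortcut).st_ino != os.stat(jim_letter).st_ino

print("same inode:        ", same_inode)
print("number of links:   ", link_count)
print("symlink points to: ", symlink_target)
print("symlink own inode: ", symlink_is_separate)
```

What John appended shows up when Jim reads his letter, because there is only one set of bytes on the disk with two names attached to it.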

Under SCO UNIX, files and directories are grouped into units called filesystems. A filesystem is a portion of your hard disk that is administered as a single unit. Filesystems exist within a section of the hard disk called a partition. Each hard disk can be broken down into multiple partitions and each partition can be broken down into multiple filesystems. Each has a specific starting and end point that is managed by the system.
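One way to see which filesystem a file belongs to is the device number the system reports for it. The sketch below assumes Python 3 on a UNIX-like system and is meant only as an illustration: it creates two files side by side, checks that they report the same device number, and asks for the size of the filesystem they live on. The file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    paths = []
    for name in ("one.txt", "two.txt"):
        path = os.path.join(workdir, name)
        with open(path, "w") as f:
            f.write("example\n")
        paths.append(path)

    # st_dev identifies the filesystem that holds the file.
    devices = [os.stat(path).st_dev for path in paths]
    same_filesystem = devices[0] == devices[1]

    # statvfs describes the filesystem as a whole, the unit being administered.
    usage = os.statvfs(workdir)
    total_bytes = usage.f_frsize * usage.f_blocks
    free_bytes = usage.f_frsize * usage.f_bfree

print("same filesystem:", same_filesystem)
print("total bytes:    ", total_bytes)
print("free bytes:     ", free_bytes)
```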

In an operating system such as SCO UNIX, the file is more than just the basic unit of data. Instead, almost everything is either treated as a file or is only accessed through files. For example, in order to read the contents of a data file, the operating system must access the hard disk. SCO UNIX treats the hard disk as if it were a file. It opens it like a file, reads it like a file and closes it like a file. The same applies to other hardware such as tape drives and printers. Even memory is treated as a file. The files used to access the physical hardware are the device files that I mentioned earlier.

When the operating system wants to access any hardware device, it first opens a file that "points" toward that device (the device node). Based on information it finds in the inode, the operating system determines what kind of device it is and can therefore access it in the proper manner. This includes opening, reading and closing, just like any other file.
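A harmless device to try this on is /dev/null. The following sketch assumes Python 3 and a system where /dev/null exists as a character device (true on most UNIX-like systems, but still an assumption). It reads the device information out of the inode and then opens, writes, reads and closes the device exactly as it would a regular file.

```python
import os
import stat

DEVICE = "/dev/null"  # assumed to exist; it discards writes and reads as empty

info = os.stat(DEVICE)
is_char_device = stat.S_ISCHR(info.st_mode)   # the inode says what kind of file this is
major = os.major(info.st_rdev)                # identifies the driver to use
minor = os.minor(info.st_rdev)                # identifies the device within that driver

fd = os.open(DEVICE, os.O_RDWR)               # open it like a file
try:
    written = os.write(fd, b"discarded\n")    # write to it like a file
    data = os.read(fd, 16)                    # read from it like a file
finally:
    os.close(fd)                              # close it like a file

print("character device:", is_char_device)
print("major, minor:    ", major, minor)
print("bytes written:   ", written)
print("bytes read back: ", len(data))
```

The major and minor numbers you get depend on your system, so I have not shown any here.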

If, for example, you are reading a file from the hard disk, not only do you have the file open that you are reading, but the operating system has open the file that relates to the filesystem within the partition, the partition on the hard disk and to the hard disk itself. (More about these in later chapters.) There are three additional files that are opened every time you log in or start a shell. These are the files that relate to input, output and error messages.

Normally, when you log in, you get to a shell prompt. When you type a command on the keyboard and press enter, a moment later something comes onto your screen. If you made a mistake or the program otherwise encounters an error, there will probably be some message on your screen to that effect. The keyboard where you are typing in your data is the input, referred to as standard input (standard in or stdin) and that is where input comes from by default. The program displays a message on your screen, which is the output, referred to as standard output (standard out or stdout). Although it appears on that same screen, the error message appears on standard error (stderr).

Although stdin and stdout appear to be separate physical devices (keyboard and monitor), there is only one connection to the system. This is one of those device files I talked about a moment ago. When you log in, the file (device) is opened for reading, so you can get data from the keyboard, and for writing so that output can go to the screen and you can see the error messages.
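From inside a program, these three files are simply the first three open files the process has, numbered 0, 1 and 2 by long-standing UNIX convention. Here is a minimal sketch, assuming Python 3 on a UNIX-like system; the constant names and the messages are made up for the example.

```python
import os

STDIN_FD, STDOUT_FD, STDERR_FD = 0, 1, 2  # conventional descriptor numbers

OUT_MESSAGE = b"this is normal output\n"
ERR_MESSAGE = b"this is an error message\n"

# Both messages normally land on the same screen, but they travel
# through two different open files.
out_bytes = os.write(STDOUT_FD, OUT_MESSAGE)
err_bytes = os.write(STDERR_FD, ERR_MESSAGE)

# True when the descriptor is connected to a terminal, False when it has
# been pointed somewhere else, such as a file or a pipe.
stdout_is_terminal = os.isatty(STDOUT_FD)
stderr_is_terminal = os.isatty(STDERR_FD)

print("stdout is a terminal:", stdout_is_terminal)
print("stderr is a terminal:", stderr_is_terminal)
```

If you run this and send standard output to a file, the error message still shows up on your screen, which is the whole reason for keeping the two apart.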

These three concepts (standard in, standard out, and standard error) may be somewhat difficult to understand at first. At this point, it suffices to understand that these represent input, output and error messages. We'll get into the details a bit later.

