Jim Mohr's SCO Companion


Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.

Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/

Introduction to Operating Systems


Processes


One of the basic concepts of an operating system is the process. If we think of the program as the file stored on the hard disk or floppy and the process as that program in memory, we can better understand the difference between a program and a process. Although these two terms are often interchanged or even misused in "casual" conversation, the difference is very important for issues that we will talk about later.

A process is more than just a program. Especially in a multi-user, multi-tasking operating system such as UNIX, there is much more to consider. Each program has a set of data that it uses to do its work. Often this data is not part of the program. For example, if you are using a text editor, the file you are editing is not part of the program on disk, but is part of the process in memory. If someone else were using the same editor, both of you would be using the same program. However, each of you would have a different process in memory.

Under UNIX, many different users can be on the system at the same time. In other words, they have processes that are in memory all at the same time. The system needs to keep track of what user is running what process, which terminal the process is being run on and what other resources the process has (such as open files). All of this is part of the process.

When you log into a UNIX system, you usually get access to a command line interpreter, or shell. This takes your input and runs programs for you. If you are familiar with DOS, then you already have used a command line interpreter. This is the COMMAND.COM program. Under DOS, your shell gives you the C:> prompt (or something similar). Under UNIX, the prompt is usually something like $, # or %. This shell is a process and it belongs to you. That is, the in-memory (or in-core) copy of the shell program belongs to you.

If you were to start up an editor, your file would be loaded and you could edit your file. The interesting thing is that the shell has not gone away. It is still in memory. Unlike what operating systems like DOS do with some programs, the shell remains in memory. The editor is simply another process that belongs to you. Since it was started by the shell, the editor is considered a "child" process of the shell. The shell is the parent process of the editor. (A process has only one parent, but may have many children.)
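The parent and child relationship can be observed directly. What follows is only a minimal sketch in Python (a language the original text does not use), assuming a UNIX-like system with a Python 3 interpreter available. The function name spawn_child_and_get_ppid is made up for this illustration. The script plays the role of the shell, and a second interpreter plays the role of the editor.

```python
import os
import subprocess
import sys


def spawn_child_and_get_ppid():
    """Start a child process and return the parent PID that the child reports."""
    # The child is a second Python interpreter that prints its parent's PID.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("my PID (the parent):", os.getpid())
    print("parent PID seen by the child:", spawn_child_and_get_ppid())
```

Both numbers should match, since the child was started by this process and therefore has it as its one and only parent.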

An example might be encountered by a system administrator performing a backup. When you log in, you have the shell. From the shell you enter the commands for the sysadmsh utility, which starts a new process. When you choose the 'Backups' option, a third process is started. Once you have chosen the parameters for the backup, the backup process calls the cpio command, which starts a fourth process and starts the backup. Figure 0-1 shows how this might look graphically.

The nice thing about UNIX is that while the administrator is backing up the system, you could be continuing to edit your file. This is because UNIX knows how to take advantage of the hardware to have more than one process in memory at a time. (Note: It is not a good idea to do a backup with people on the system as data may become inconsistent. This was only used as an illustration.)

As you continue to edit, you delete words, insert new lines, sort your text, and write it out occasionally to the disk. All this time, the backup is continuing. Someone else on the system may be adding figures to a spreadsheet, while a fourth person is inputting orders into a database. No one seems to notice that there are other people on the system. For them, the processor is working for them alone. Well, that's the way it looks.



Figure 0-1 Relationship of multiple processes

As I am writing this sentence, the operating system needs to know whether the characters I press are part of the text or commands I want to pass to the editor. Each key that I press needs to be interpreted. Despite the fact that I can clip along at about 30 words per minute, the Central Processing Unit (CPU) is spending approximately 95% of its time doing nothing.

The reason for this is that for a computer, the time between successive keystrokes is an eternity. Let's take my Intel 80486 running at a clock speed of 50 MHz as an example. The clock speed of 50 MHz means that there are 50 million(!) clock cycles per second. Since the 80486 gets close to one instruction per clock cycle, this means that within one second, the CPU can get close to executing 50 million instructions! No wonder it is spending most of its time idle. (Note that this is an oversimplification of what is going on.)

A single computer instruction doesn't really do much. However, being able to do 50 million little things in one second allows the CPU to give each user the impression of being the only one on the system. It is simply switching between the different processes so fast that no one is aware of it happening.

Each user, that is, each process, gets complete access to the CPU for an incredibly short period of time. On SCO UNIX 3.2v4 and later, this period of time (referred to as a time-slice) is 1/100th of a second! That means, at the end of that 1/100th of a second, it's someone else's turn and your process is forced to give up the CPU. (In reality it is much more complicated than this and we'll get into more details later.)

Compare this to an operating system like standard Windows (not Windows NT or Windows 95). The program will hang onto the CPU until it decides to give it up. An ill-behaved program can hold onto the CPU forever. This is the cause of many system hangs, since nothing, not even the operating system itself, can gain control of the CPU.

Depending on the load of the system (how busy it is) a process may get several time-slices per second. However, after it has run for its time-slice, the operating system checks to see if some other process needs a turn. If so, that process gets to run for a time-slice and then it's someone else's turn. Maybe the first process. Maybe a new one.
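The turn-taking just described can be sketched as a simple round-robin loop. This is a deliberately simplified model written in Python, not how the SCO UNIX scheduler is actually implemented: it ignores priorities, sleeping processes and interrupts. The function name round_robin and the process names are invented for the illustration, and "work" is measured in whole time-slices.

```python
from collections import deque


def round_robin(work, time_slice=1):
    """Return the order in which processes get a turn on the CPU.

    work maps a process name to the number of time units it still needs.
    """
    ready = deque(work.items())  # processes waiting for a turn
    order = []
    while ready:
        name, remaining = ready.popleft()
        order.append(name)  # this process runs for one time-slice
        remaining -= time_slice
        if remaining > 0:
            ready.append((name, remaining))  # not finished: back of the line
    return order


if __name__ == "__main__":
    print(round_robin({"editor": 2, "backup": 3, "spreadsheet": 1}))
```

Each process runs for one slice and then goes to the back of the line if it still has work to do, so no single process can keep the others from getting a turn.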

As your process is running, it will be given full use of the CPU for the entire 1/100th of a second unless one of three things happens. The first is that your process needs to wait for some event. For example, the editor I am writing this in is waiting for me to type in characters. I said that I type about 30 words per minute, so if we assume an average of 6 letters per word, that's 180 characters per minute or three characters per second. That means that on average, a character is pressed once every 1/3 of a second. Since a time-slice is 1/100th of a second, over 30 processes can have a turn on the CPU between each keystroke! Rather than tying everything up, the program waits until the next key is pressed. It puts itself to sleep until it is awoken by some external event, such as me pressing a key. Compare this to a "busy loop" where the process keeps checking for a key being pressed.
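The arithmetic in this paragraph is easy to check. The sketch below uses only the figures quoted in the text (30 words per minute, 6 letters per word, a 1/100th of a second time-slice and the 50 MHz clock mentioned earlier). The variable names are my own, and the script is Python purely for convenience.

```python
from fractions import Fraction

WORDS_PER_MINUTE = 30          # typing speed quoted in the text
LETTERS_PER_WORD = 6           # average assumed in the text
TIME_SLICE = Fraction(1, 100)  # seconds per time-slice
CLOCK_HZ = 50_000_000          # the 50 MHz 80486 used as an example

chars_per_second = Fraction(WORDS_PER_MINUTE * LETTERS_PER_WORD, 60)
seconds_per_keystroke = 1 / chars_per_second
slices_per_keystroke = seconds_per_keystroke / TIME_SLICE
cycles_per_keystroke = CLOCK_HZ * seconds_per_keystroke

if __name__ == "__main__":
    print("characters per second:", chars_per_second)
    print("seconds between keystrokes:", seconds_per_keystroke)
    print("time-slices between keystrokes:", float(slices_per_keystroke))
    print("clock cycles between keystrokes:", float(cycles_per_keystroke))
```

The result is 100/3, or a little over 33 time-slices between keystrokes, which is where "over 30 processes" comes from. In the same third of a second the clock ticks more than 16 million times.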

When I want to write to the disk to save my file, it may appear that it happens instantaneously, but like the "complete-use-of-the-CPU myth" this is only appearance. The system will gather requests to write to or read from the disk and do it in chunks. This is much more efficient than satisfying everyone's request when they ask for it.

Gathering up requests and accessing the disk at once has another advantage. Oftentimes the data that was just written is needed again, for example in a database application. If the system wrote everything to the disk immediately, you would have to perform another read to get back that same data. Instead the system holds that data in a special buffer. It "caches" that data in the buffer. This is called the buffer cache.


Figure 0-2 The flow of file access

If a file is being written to or read from, the system first checks the buffer cache. If on a read it finds what it's looking for in the buffer cache, it has just saved itself a trip to the disk. Since the buffer cache is in memory, reading from it is substantially faster than reading from the disk. Writes normally go to the buffer cache, which is then written out to the disk in larger chunks. If the data being written already exists in the buffer cache, it is overwritten.
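To make the flow concrete, here is a toy model of a buffer cache in Python. It is a sketch of the idea only and says nothing about how the SCO UNIX kernel really implements it: the "disk" is just a dictionary, the class name BufferCache and its methods are invented for this example, and the cache never runs out of room.

```python
class BufferCache:
    """Toy write-back cache in front of a slow 'disk' (a plain dict here)."""

    def __init__(self, disk):
        self.disk = disk      # block number -> data
        self.cache = {}       # blocks currently held in memory
        self.dirty = set()    # cached blocks not yet written to the disk
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, block):
        if block not in self.cache:  # miss: we have to go to the disk
            self.cache[block] = self.disk[block]
            self.disk_reads += 1
        return self.cache[block]     # hit: served from memory

    def write(self, block, data):
        self.cache[block] = data     # overwrites any copy already cached
        self.dirty.add(block)

    def flush(self):
        for block in sorted(self.dirty):  # write everything out in one batch
            self.disk[block] = self.cache[block]
            self.disk_writes += 1
        self.dirty.clear()


if __name__ == "__main__":
    cache = BufferCache({0: "old data"})
    cache.write(0, "new data")
    print(cache.read(0), "| disk reads so far:", cache.disk_reads)
    cache.flush()
    print("disk writes after flush:", cache.disk_writes)
```

Reading a block that was just written never touches the disk, and two writes to the same block cost only one disk write once the cache is flushed.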

When your process is running and you make a request to read from the hard disk, you can't do anything until the read from the disk has completed. If you haven't completed your time slice yet, it would be a waste not to let someone else have a turn. That's exactly what the system does. If you decide you need access to some resource that the system cannot immediately give to you, you are "put to sleep" to wait. It is said that you are put to sleep waiting on an event, the event here being the disk access. This is the second case where you may not get your full time on the CPU.
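The difference between sleeping on an event and the "busy loop" mentioned earlier can be sketched in a few lines of Python. This is an analogy only: it uses threads inside one process and Python's threading.Event rather than real UNIX processes sleeping in the kernel, and the function names are my own.

```python
import threading


def busy_wait(event):
    """Poll until the event is set; return how many times we had to check."""
    checks = 0
    while not event.is_set():
        checks += 1  # every check is CPU time someone else could have used
    return checks


def sleeping_wait(event):
    """Block until the event is set; the waiter does no work in the meantime."""
    return event.wait()


def demo(delay=0.05):
    results = {}
    for name, waiter in (("busy", busy_wait), ("sleeping", sleeping_wait)):
        event = threading.Event()
        timer = threading.Timer(delay, event.set)  # the "external event"
        timer.start()
        results[name] = waiter(event)
        timer.join()
    return results


if __name__ == "__main__":
    print(demo())
```

The busy version typically checks the event many thousands of times before it arrives, while the sleeping version makes a single call and is woken up when there is something to do.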

The third way that you might not get your full time slice is also the result of an external event. If a device (such as a keyboard, the clock, hard disk, etc.) needs to communicate with the operating system, it signals this need through the use of an interrupt. When an interrupt is generated, the CPU itself will stop execution of the process and immediately start executing a routine in the operating system to handle interrupts. Once the operating system has satisfied this interrupt, it returns to its regularly scheduled process. (Note: Things are much more complicated than that. The "priority" of both the interrupt and the process are a factor here. We will go into more detail about this later.)

As I mentioned earlier, there are certain things that the operating system keeps track of as a process is running. The information the operating system is keeping track of is referred to as the process' context. This might be the terminal you are running on or what files you have open. The context even includes the internal state of the CPU, that is, what the content of each register is.

What happens when a process' time slice has run out or for some other reason another process gets to run? Well, if things go right (and they usually do) eventually that process gets a turn again. However, to do things right the process must be allowed to return to the exact place where it left off. Any difference could result in disaster.

You may have heard of the classic banking problem when deducting from your account. If the process returned to a place before it made the deduction, you would have it deducted twice. If it hadn't yet made the deduction, but the process started up again at a point after it would have made the deduction, it appears as if the deduction was made. Good for you, not so good for the bank. Therefore, everything must be put back the way it was.
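The banking problem can be played through with a toy model. In this Python sketch, a "process" is just a list of steps and its context is nothing more than a program counter ("pc") saying which step comes next. A real context also holds registers, open files and so on. All names here (run, simulate, STEPS) are invented for the illustration.

```python
def run(steps, account, context, max_steps):
    """Run up to max_steps steps, starting where context['pc'] points."""
    for _ in range(max_steps):
        if context["pc"] >= len(steps):
            break
        steps[context["pc"]](account)  # execute the current step
        context["pc"] += 1             # remember where to resume
    return context


def deduct_50(account):
    account["balance"] -= 50


def log_transaction(account):
    account["log"].append("deducted 50")


STEPS = [deduct_50, log_transaction]


def simulate(steps_before_switch, resume_pc=None):
    """Interrupt the process after some steps, then resume it from its context."""
    account = {"balance": 100, "log": []}
    context = run(STEPS, account, {"pc": 0}, max_steps=steps_before_switch)
    saved = dict(context)        # the context switch saves the state
    if resume_pc is not None:
        saved["pc"] = resume_pc  # deliberately restore the wrong place
    run(STEPS, account, saved, max_steps=len(STEPS))
    return account


if __name__ == "__main__":
    print("restored correctly:  ", simulate(1))
    print("resumed too early:   ", simulate(1, resume_pc=0))
    print("resumed too far along:", simulate(0, resume_pc=1))
```

Restoring the saved context exactly leaves a balance of 50. Resuming one step too early deducts twice, and resuming one step too far along records a deduction that was never made.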

The processors used by SCO UNIX (Intel 80386 and later) have built-in capabilities to manage both multiple users and multiple tasks. We will get into the details of this in later chapters. So for now, just be aware of the fact that the CPU assists the operating system in managing users and processes. Figure 0-3 shows how multiple processes might look in memory.

Figure 0-3 Multiple processes in memory

In addition to user processes, such as shells, text editors and databases, there are system processes running. These are processes that were started by the system. Several of these deal with managing memory and scheduling turns on the CPU. Others deal with delivering mail, printing and other tasks that we take for granted. In principle, both of these kinds of processes are identical. However, system processes can run at much higher priorities and therefore run more often than user processes.

Many of these system processes are referred to as daemon processes or background processes as they run behind the scenes without intervention from users. It is also possible for a user to put one of his or her processes "in the background." This is done by using the ampersand (&) metacharacter at the end of the command-line. (I'll talk more about metacharacters in the section on shells.)

What normally happens when you enter a command is that the shell will wait for that command to finish before it will accept a new command. By putting a command in the background, the shell does not wait, but rather is ready immediately for the next command. If you wanted to, you could put the next command in the background as well.
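The same distinction between waiting and not waiting can be shown from a program's point of view. This is a minimal Python sketch, not a description of how the shell is written: the "command" is simply a second Python interpreter that sleeps for half a second, and the names SLOW_COMMAND, run_in_foreground and run_in_background are made up for the example.

```python
import subprocess
import sys

# A stand-in for any long-running command: a child that sleeps briefly.
SLOW_COMMAND = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def run_in_foreground():
    """Like typing the command alone: wait for it, then return its exit code."""
    return subprocess.run(SLOW_COMMAND).returncode


def run_in_background():
    """Like typing the command followed by &: start it and return at once."""
    return subprocess.Popen(SLOW_COMMAND)


if __name__ == "__main__":
    child = run_in_background()
    # poll() returns None while the child is still running.
    print("still running right after start:", child.poll() is None)
    print("exit code once it finishes:", child.wait())
    print("foreground exit code:", run_in_foreground())
```

In the background case the caller gets control back while the child is still running, which is exactly why a background process keeps using resources even though you no longer see it.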

I have talked to customers who have complained about the system grinding to a halt after they put dozens of processes in the background. The misconception is that since they didn't see the process running, it must not be taking up any resources. (No news is good news. Out of sight, out of mind. Or whatever.) The issue here is that even though the process is running in the background and you can't see it, it still behaves like any other process.

