Jim Mohr's SCO Companion

Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.

Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/

Starting and Stopping the System


Almost every user, and many administrators, never see what is happening as the system boots. Those who do are often not sure what they are looking at. From the time you flip the power switch to the time you get that first Login: prompt, dozens of things must happen. Many happen long before the system knows that it's SCO UNIX that's running. Knowing what happens as the system boots, and in what order, is very useful when your system is not starting the way it should.

In this chapter we are going to first talk about starting your system. Although you can get it going by flipping on the power switch and letting the system boot by itself, there are many ways to change the behavior of your system as it boots. How the system boots may depend on the situation. As we move along through the chapter we'll talk about the different ways you have of influencing the way the system boots.

After we talk about how to start your system, we'll talk about how to stop it. There are few choices in terms of how you stop your system. However, the few that are available allow you to alter the system's behavior when shutting down.

The Boot Process

The very first thing that happens is the Power-On Self-Test (POST). Here the hardware checks itself to see that things are all right. One of the checks compares the hardware settings in the CMOS to what is physically in the system. Some errors, like the floppy types not matching, are annoying, but your system can still boot. Others, like the lack of a video card, can keep the system from continuing. Oftentimes, there is nothing to indicate what the problem is, only a few little "beeps".

Once the POST is completed, the hardware jumps to a specific, pre-defined location in memory. The instructions located there are relatively simple and basically tell the hardware to go look for a boot device. Depending on how your CMOS is configured, first your floppy is checked and then your hard disk.

When a boot device is found (let's assume that it's a hard disk), the hardware is told to go to the very first sector of the disk (cylinder 0, head 0, sector 1), then load and execute the instructions there. This is the master boot block, or MBR for you DOS-heads. This 512-byte sector holds code that is intelligent enough to read the partition table (located at the end of the same sector) and find the active partition. Once it finds the active partition, it begins reading and executing the instructions contained within the first block of that partition.
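As a rough illustration of what the master boot block code has to do, the sketch below scans a 512-byte sector for the active partition. It assumes the standard PC layout (four 16-byte partition entries starting at byte offset 446, a boot flag of 0x80 marking the active entry, and the signature bytes 55 AA closing the sector). The sample sector, its type byte and its sizes are made up for the example and do not come from a real disk.

```python
import struct

TABLE_OFFSET = 446   # partition table starts here in a standard PC boot sector
ENTRY_SIZE = 16      # four entries of 16 bytes each
ACTIVE_FLAG = 0x80   # boot-indicator value that marks the active partition


def find_active_partition(sector):
    """Return (slot, type, first_sector, sector_count) for the active entry, or None."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid 512-byte master boot block")
    for slot in range(4):
        offset = TABLE_OFFSET + slot * ENTRY_SIZE
        if sector[offset] == ACTIVE_FLAG:
            part_type = sector[offset + 4]
            # Starting sector and length are little-endian 32-bit values.
            first_sector, count = struct.unpack_from("<II", sector, offset + 8)
            return slot, part_type, first_sector, count
    return None


def make_sample_sector():
    """Build a made-up boot block whose second slot is marked active."""
    sector = bytearray(512)
    offset = TABLE_OFFSET + 1 * ENTRY_SIZE
    sector[offset] = ACTIVE_FLAG
    sector[offset + 4] = 0x63                 # arbitrary type byte for the example
    struct.pack_into("<II", sector, offset + 8, 63, 204800)
    sector[510:512] = b"\x55\xaa"             # boot signature
    return bytes(sector)


print(find_active_partition(make_sample_sector()))
```

The real boot code does the same search in a handful of machine instructions and then loads the first block of the partition it found.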

It is at this point that viruses can infect SCO systems. The master boot block has the same format for essentially all PC-based operating systems. All the master boot block does is find and execute code at the beginning of the active partition. Instead, the master boot block could contain code that told it to go to the very last sector of the hard disk and execute the code there. If that last sector contained code that told the system to find and execute code at the beginning of the active partition, you would never know anything was wrong.

Let's assume that the instructions at the very end of the disk are larger than a single, 512-byte sector. If they took up a couple of kilobytes, you could get some fairly complicated code. Since it is at the end of the disk, you would probably never know it was there. What if that code checked the date in the CMOS and, if the day of the week was Friday and the day of the month was the 13th, erased the first few kilobytes of your hard disk? If that were the case, then your system would be infected with the Friday the 13th virus and you could no longer boot from your hard disk.

Viruses that behave in this way are called "boot viruses", as they affect the master boot block and can only damage your system if this is the disk you are booting from. These kinds of viruses can affect all PC-based systems. Some computers will allow you to configure the CMOS (more on that later) so that you cannot write to the master boot block. Although this is a good safeguard against older viruses, the newer ones can change the CMOS to allow writing to the master boot block. So, just because you have enabled this feature does not mean your system is safe. However, I must point out that boot viruses can only affect SCO systems if you boot from an infected disk. This will usually be a floppy, more than likely a DOS floppy. Therefore, you need to be especially careful when booting from floppies.

Now back to our story...

As I mentioned, the code in the master boot block finds the active partition and begins executing the code there. On an MS-DOS system, this is the IO.SYS and MSDOS.SYS files. On an SCO UNIX system, this is referred to as boot0. Although IO.SYS and MSDOS.SYS are "real" files that you can look at and even remove if you want to, the boot0 program is not. It is part of the partition, but not part of any division. Therefore, it is not part of any filesystem and is not a "real" file. Next, boot0 reads boot1. Boot1 then reads the first "real" file: /boot.

The /boot program is not only a file, it is also a program. The key aspect is that it is a "stand-alone" program, often referred to as a stand-alone loader, as it must load the specified operating system into memory. Because of this, /boot must implement its own system calls and memory management. These are mostly handled by making use of the system BIOS. Once /boot has finished loading the SCO UNIX operating system, UNIX begins to run and the BIOS is no longer used. (At least for the most part. We talk about this later.)

In comparison to the other pieces of code, /boot is a genius. It is /boot that reads the /etc/default/boot file to determine the default boot string (more on that in a moment) and the default boot options, such as how long to wait before automatically booting. /boot also displays the now famous "boot:" prompt.

Figure 0-1 The stages of booting the system

When you just press ENTER or wait for the time-out to automatically boot, /boot executes the instructions defined in the default boot string. If you look inside of /etc/default/boot, you will see an entry DEFBOOTSTR. This is the default boot string. You will hear it referred to as either "default boot string" or "defbootstr". I am going to talk about some of these options later, but if you want to get more information now, check out the boot(HW) man-page.
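To make the lookup of DEFBOOTSTR concrete, here is a minimal sketch that reads KEY=VALUE settings and pulls out the default boot string. The sample contents (including the values shown for DEFBOOTSTR, AUTOBOOT and TIMEOUT) are illustrative assumptions, not copied from a real /etc/default/boot, and the parsing is only a guess at the file's general shape; see boot(HW) for the real options.

```python
def parse_default_file(text):
    """Collect KEY=VALUE lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# sample settings
DEFBOOTSTR=hd(40)unix
AUTOBOOT=YES
TIMEOUT=30
"""

settings = parse_default_file(SAMPLE)
print("default boot string:", settings["DEFBOOTSTR"])
```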

In OpenServer, you have a new program: bootos. This can be called from /boot and is used to boot other operating systems, including DOS, Windows NT and OS/2. Rather than breaking the flow of this discussion, I will get into the bootos program later.

Assuming that you simply pressed <ENTER> or waited for the time-out specified in /etc/default/boot, the first thing that is done is "memory sizing". Memory sizing is when the /boot program figures out how much memory you have and displays it on the screen, such as:

Memory sizing ................... Memory found: 0k-640k,1m-16m,16m-32m/n

In this case, /boot recognized the base 640K (0k-640k) of RAM as well as the rest of my 32Mb (1m-16m,16m-32m). Why didn't it just say 1m-32m, rather than splitting it into those two ranges? The answer is the /n at the end of the line. This says that the range of memory between 16 and 32 megabytes is not DMAable. That is, the Direct Memory Access controller cannot access memory above 16Mb. Because of this, all of the kernel's data must be below the 16Mb mark. If you only had 20Mb, this would read: 16m-20m/n. DMA is a way of having a device access memory directly without the intervention of the CPU, which saves time. We talk more about DMA in the chapter on hardware.
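The range notation can be read mechanically. The following sketch splits the "Memory found" list into ranges, converts them to kilobytes and marks each one as DMAable or not. It handles only the k and m suffixes and the /n flag seen in the example above; any other flags /boot might print are not covered.

```python
def to_kb(token):
    """Convert '640k' or '16m' to kilobytes."""
    value, unit = int(token[:-1]), token[-1].lower()
    return value * 1024 if unit == "m" else value


def parse_memory_found(ranges):
    """Return a list of (start_kb, end_kb, dmaable) tuples."""
    result = []
    for part in ranges.split(","):
        flags = ""
        if "/" in part:
            part, flags = part.split("/", 1)
        start, end = part.split("-")
        # An 'n' flag marks the range as not reachable by DMA.
        result.append((to_kb(start), to_kb(end), "n" not in flags))
    return result


found = parse_memory_found("0k-640k,1m-16m,16m-32m/n")
dma_kb = sum(end - start for start, end, dmaable in found if dmaable)
print(found)
print("DMAable memory in KB:", dma_kb)
```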

Usually what your defbootstr says to do next is to load and execute the /unix program. This is the operating system itself (the kernel). As the system is loading the kernel, you will see several rows of dots moving across the screen. This is more for its pacifying effect than anything else. Since the boot procedure takes a few moments, the moving dots are an indicator that, yes, it is doing something.

In ODT, each dot represents 4K being loaded. Because the OpenServer kernel is a lot larger, there are more 4K chunks to load. So, instead of having more dots, each dot in OpenServer represents 3 dots in ODT. Therefore, don't think that your system is loading slower. There's just more to load.

The first set of dots follows the message: loading .text. If you remember from the discussion on the kernel internals, the text is the segment of the program that contains the instructions. Next, we see the message loading .data. Like other programs, this segment contains the kernel's initialized data. Lastly, we see: loading .bss. This segment contains the kernel's declared, but uninitialized, data.

Shortly thereafter, the screen clears and you see what is referred to as the "hardware screen." At the top is the date, the operating system, the kernel ID number and a few copyright notices.

Depending on your hardware, you might then see the message:

10 bits of I/O address decoding

Some buses only have 10 bits to decode I/O address lines. This means that base addresses can only fall into the range of 000h-3FFh. This also means that since there are 6 bits left over, there are 63 "image" addresses that could cause problems with other boards if the lower-order 10 bits matched. The above message indicates that on this system there are only 10 I/O address lines and, therefore, only 10 bits for I/O decoding.
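The arithmetic behind the 63 image addresses can be checked with a few lines. This sketch assumes a 16-bit I/O address space, which is what "6 bits left over" implies when only 10 bits are decoded. The base address 0x3F8 is simply the serial port address from the hardware listing, used as an example.

```python
def image_addresses(base, decoded_bits=10, bus_bits=16):
    """List every other address whose low `decoded_bits` bits match `base`."""
    mask = (1 << decoded_bits) - 1          # 0x3FF when 10 bits are decoded
    low = base & mask
    images = []
    for high in range(1 << (bus_bits - decoded_bits)):
        address = (high << decoded_bits) | low
        if address != base:
            images.append(address)
    return images


images = image_addresses(0x3F8)
print(len(images), "images, first few:", [hex(a) for a in images[:3]])
```

A board that decodes only 10 bits would respond to every one of these addresses as if it were its own base address.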

Next, we see a table of the hardware devices that the system recognizes. You can see what each of the columns represents by the header at the top of each column. We will talk about what the individual entries in the comments column mean in the chapter on hardware. This table is basically the same as what you get by running 'hwconfig -h'.

We next have a listing of how much memory is in the system and how it is broken down:

mem: total = 16256k, kernel = 4852k, user = 11404k

This listing is the deciding factor in determining how much memory your system is running with. Regardless of what the hardware shows as it boots or what the /boot program shows during memory sizing, this is what the kernel sees. In this case, there are 16 Mb of RAM available. However, I have never seen a case where the hardware reports a different amount of memory than what is displayed here.

Here we can see just how much memory the kernel takes up and how much is left over for users. In this case, not quite 5Mb are being used by the kernel, leaving a little over 11Mb for user processes. Be careful! This can be deceptive. The amount of user memory does not say how much is left over for users logging in. Instead it would be more informative to refer to it simply as non-kernel memory.
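As a quick check of these figures, the sketch below pulls the three numbers out of the mem: line and confirms that the kernel and user amounts add up to the total. The regular expression is written only against the line format shown above.

```python
import re


def parse_mem_line(line):
    """Return the 'name = NNNk' pairs from a mem: line as integers (KB)."""
    return {name: int(value) for name, value in re.findall(r"(\w+) = (\d+)k", line)}


mem = parse_mem_line("mem: total = 16256k, kernel = 4852k, user = 11404k")
print(mem)
print("kernel + user equals total:", mem["kernel"] + mem["user"] == mem["total"])
print("kernel share of memory: %.1f%%" % (100.0 * mem["kernel"] / mem["total"]))
```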

Remember in our discussion of kernel internals, I mentioned that a process can operate in either user mode or kernel mode. A process that is running in user mode is executing user instructions. This is taking up user memory. Any program that is not the kernel is a process. All system processes, no matter what they do, will take a portion of this remaining 11Mb. This includes such things as vhand, sched and init.

On my OpenServer system, there are so many system processes that the amount available for "normal" users is almost the exact opposite of what it appears from the message during boot. With nothing running other than the system processes, almost 12Mb is used. (That's why I upgraded from 16Mb to 32Mb.)

We next have details of the important system devices along with their major and minor numbers, such as the root filesystem (rootdev = 1/40) and the swap device (swapdev = 1/41). The pipe device (pipedev = 1/40) is where the system gets the data blocks for creating pipes. We talked about this in the section on files and filesystems. Lastly, there is the dump device (dumpdev = 1/41). Hope that you never have to use this.

Should something go wrong on your system and it needs to panic (remember our discussion on kernel internals?), the kernel tries to help you figure out what went wrong. It does so by saving an image of all of physical RAM. That way, you can later go back and see what the kernel was doing when it panicked.

Have you ever wondered why the SCO doc says that you need at least as much swap as you have RAM? This is the reason. If you panic and there is not enough space on your swap device, you will not get a valid dump image. Granted, you could set aside some other area of your hard disk for dump and only use the swap area for swap. However, this is a waste of space, as during normal operations the dump area is not used and, when the system panics, it's not using the swap device.

In addition to being told where the swap device is, we are also told how big it is. In my case, the entry looks like this:

nswap = 34000

Along with this, there is the number of times the clock generates an interrupt per second (Hz = 100) and the size of your I/O buffers (i/o bufs = 1472k). We talked about these I/O buffers in both the section on device nodes and the section on kernel internals. However, there we used a different name. These I/O buffers are your buffer cache.
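The nswap value can be turned into a size and compared against RAM to see whether a full memory dump would fit. The sketch assumes that nswap counts 512-byte blocks. That unit is an inference from the OpenServer listing in this chapter, where nswap = 50000 appears next to swapmem = 25000k, and is not something the text states outright.

```python
BLOCK_BYTES = 512  # assumed unit, inferred from "nswap = 50000, swapmem = 25000k"


def swap_kb(nswap_blocks):
    """Size of the swap area in kilobytes."""
    return nswap_blocks * BLOCK_BYTES // 1024


def dump_fits(nswap_blocks, ram_kb):
    """True if an image of all of RAM would fit on the swap/dump device."""
    return swap_kb(nswap_blocks) >= ram_kb


print(swap_kb(34000), "KB of swap")
print("dump of 16256 KB fits:", dump_fits(34000, 16256))
print("dump of 32768 KB fits:", dump_fits(34000, 32768))
```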

When the system boots, you see all the entries in nice, neat little rows, like this:


device    address        vector  dma  comment
----------------------------------------------------------------------------
%fpu      -              13      -    type=80387
%serial   0x03F8-0x03FF  4       -    unit=0 type=Standard nports=1
%floppy   0x03F2-0x03F7  6       2    unit=0 type=135ds18
%console  -              -       -    unit=vga type=0 12 screens=68k
%adapter  0x0330-0x0332  11      5    type=ad rev=01 ha=0 id=7 fts=s
%tape     -              -       -    type=S ha=0 id=2 lun=0 bus=0 ht=ad
%disk     -              -       -    type=S ha=0 id=0 lun=0 bus=0 ht=ad
%Sdsk     -              -       -    cyls=1170 hds=64 secs=32 fts=sb
mem: total = 16256k, kernel = 4852k, user = 11404k
swapdev = 1/41, swplo = 0, nswap = 50000, swapmem = 25000k
rootdev = 1/42, pipedev = 1/42, dumpdev = 1/41
kernel: Hz = 100, i/o bufs = 1472k


The column headers are: the type of device, base address range in hexadecimal, the interrupt vector (IRQ), DMA channel, and comments (which contains other details about the hardware).
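Since every device line follows the same five-column layout, it can be split apart with very little code. The sketch below is one way to do it. It treats "-" as "not applicable" and leaves the comment column as raw text, because not every item in it is a key=value pair (the console line contains a bare 12, for instance).

```python
def parse_hw_line(line):
    """Split one '%device address vector dma comment' line into its columns."""
    device, address, vector, dma, comment = line.split(None, 4)
    return {
        "device": device.lstrip("%"),
        "address": None if address == "-" else address,
        "vector": None if vector == "-" else int(vector),
        "dma": None if dma == "-" else int(dma),
        "comment": comment,
    }


serial = parse_hw_line("%serial 0x03F8-0x03FF 4 - unit=0 type=Standard nports=1")
floppy = parse_hw_line("%floppy 0x03F2-0x03F7 6 2 unit=0 type=135ds18")
print(serial)
print(floppy)
```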

This is also what the file /usr/adm/messages (where this boot information eventually ends up) looks like on an OpenServer system. However, if you look in /usr/adm/messages on an ODT 3.0 system, things look a lot sloppier. Like this:


D 10 bits of I/O address decoding

Sat Feb 26 14:35:16

g

E device address vector dma comment

----------------------------------------------------------------------------

%fpu - 13 - type=80387

F0 F1 F2 F3 F4 F5 %serial 0x03F8-0x03FF 4 - unit=0 type=Standard nports=1

F6 F7 %floppy 0x03F2-0x03F7 6 2 unit=0 type=135ds18

%floppy - - - unit=1 type=96ds15

F8 F9 F10 %console - - - unit=vga type=0 12 screens=68k

F11 F12 %adapter 0x0330-0x0332 11 5 type=ad ha=0 id=7 fts=s

F13 F14 F15 F16 F17 F18 %tape - - - type=S ha=0 id=2 lun=0 ht=ad

F19 G H0 H

Sat Feb 26 14:35:17

1 H2 H3 H4 H5 H6 %disk - - - type=S ha=0 id=0 lun=0 ht=ad fts=s

%Sdsk - - - cyls=1170 hds=64 secs=32

H7 H8 H9 H10 H11 H12 H13 I0 mem: total = 16256k, kernel = 3428k, user = 12828k

J K L M rootdev = 1/40, swapdev = 1/41, pipedev = 1/40, dumpdev = 1/41

nswap = 34000, swplo = 0, Hz = 100

kernel: i/o bufs = 600k

Interspersed among the information that we already talked about, you will see sets of letters, some followed by numbers. These are essentially checkpoints that the kernel has reached as it checks the hardware it expects to find on your system, as well as when it first mounts the root filesystem, prints the hardware configuration information above and does the other things it must do at start-up. Because some of the start-up procedures occur rather quickly, you may not see the letter for that stage on the screen as the system boots; however, it is there. Table 0.1 contains the boot letters you see as your system is coming up.


Letter  Action
------  ------
D       Check for 10 bits of I/O decoding and perform certain machine-specific initializations.
E       Print configuration information for the math co-processor if there is one.
F       Initialize I/O devices and pseudo-devices.
G       Initialize the PICs and multiprocessors as well as configure the root disk driver, and reset the keyboard.
H       Initialize various system resources such as the kernel inode table, streams, and clists.
I       Print machine-specific information, start certain devices and print the total kernel and user memory.
J       Initialize the floating point emulator.
K       Open the swap device and add the swap file table.
M       Initialize machine-specific memory ECC support, as well as display the primary devices (root, pipe, and dump), clock interrupt rate (Hz), kernel I/O buffers (the buffer cache), and additional CPUs found.

Table 0.1 Boot Letters
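If you want to read an ODT 3.0 messages file without the checkpoint letters getting in the way, they can be peeled off the front of each line. This is a minimal sketch that assumes a checkpoint is a single letter from D through M, optionally followed by digits, as in the sample listing. It only looks at the start of a line, so a letter sequence interrupted by a timestamp (as happens once in the sample) would need extra handling.

```python
import re

# One letter D..M, optionally followed by digits, and nothing else in the token.
CHECKPOINT = re.compile(r"[D-M]\d*$")


def split_checkpoints(line):
    """Return (list_of_checkpoint_tokens, remaining_text) for one line."""
    tokens = line.split()
    letters = []
    while tokens and CHECKPOINT.match(tokens[0]):
        letters.append(tokens.pop(0))
    return letters, " ".join(tokens)


letters, rest = split_checkpoints("F6 F7 %floppy 0x03F2-0x03F7 6 2 unit=0 type=135ds18")
print(letters)
print(rest)
```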

It is around here that the kernel loads and starts the init program. One of the first things that init does is read the /etc/inittab file. It looks for any entry that should be run when the system is initializing (the entry has sysinit in the third field) and then executes the corresponding command. (See the inittab(F) man-page for more details.)

The first thing init runs out of the inittab is /etc/bcheckrc. This is a shell script, so you can take a look at it if you want. The first thing it does is to determine if you have a memory dump image on your swap device. If you should have the misfortune of having your system panic, you may have the good fortune to have the system spill its guts into the dump device (usually the same as the swap device). This contains a complete image of what was in physical memory at the time the system panicked. There is useful information here that can often be used to determine why the system panicked.

Obviously, the system needs some place to put the memory image. As I mentioned before, this is the dump device. Unless you have changed it, this will be the same place as the swap device. Because the system needs to dump all of memory in order to ensure it got everything it needed, the dump device must be at least the same size as the amount of RAM you have. Since the dump and the swap device are usually the same thing, this is one reason why the swap device must be at least the same size as RAM.

If the system panics and was successful in writing the memory image to the dump device, you will probably see the following message when the system reboots:

There may be a system dump memory image in the swap device.

Do you want to save it? (y/n)

Enter y to save the image and continue. If you respond n, you are then asked if you want to delete the image.

Another likely possibility when the system panics is that one or more filesystems will be "dirty", meaning that they may be in an inconsistent state. In many cases no serious problems arise as a result. However, before the filesystem can be accessed it must be checked, as the system has no way of knowing whether there are any problems until it has checked and any problems have been corrected. This is referred to as cleaning the filesystem.

If the system was not shut down properly (through a panic or simply turning off the power), then every mounted filesystem will need to be cleaned. Depending on how you configured them initially, non-root filesystems could be cleaned automatically. However, unless you changed the default, cleaning of the root filesystem will need to be done manually. At the very least you will be prompted to start fsck. Look at the chapter on filesystems for a discussion of fsck. See the filesys(F) man-page for more details on cleaning filesystems automatically.

Checking to see if the root filesystem is dirty is another function of the /etc/bcheckrc script. The indication that the filesystem was not unmounted correctly occurs at boot time when you see the message:

fsstat: root filesystem needs checking

OK to check the root filesystem (/dev/root) (y/n)?

To clean the filesystem, enter y (for "yes"). This starts the fsck utility, which cleans the filesystem. The lengths the fsck utility goes to in order to clean the filesystem depend on the extent of the problems. We'll get into details about this later.

If you have intent logging enabled on your filesystem in OpenServer, the cleaning process is shortened considerably. Normally, fsck must look through every directory and check every file in the inode table to ensure things are consistent. If intent logging is enabled, the log can be replayed and outstanding transactions completed. As a result, the time spent to both check and clean the filesystem is only a fraction of what it was previously (seconds compared to minutes).

After we have cleaned the root filesystem, bcheckrc adds the root filesystem to the mount table by hand and then exits. It is here that we are at the point we would have been if we hadn't needed to clean the filesystem. Since bcheckrc is now finished, we need to look in inittab again to see what is run next. Here we find /etc/smmck.

It is the responsibility of /etc/smmck to ensure that files in the TCB are in a consistent state. Many TCB programs create temporary files as they work. It is the job of smmck to determine which of these temporary files is the 'correct' one.

Next, init looks through inittab for the line with initdefault in the third field. The initdefault entry tells the system what run-level to enter initially. It is here we are given the prompt:

INIT: SINGLE USER MODE

Type CONTROL-d to continue with normal startup,

(or give the root password for system maintenance):

This allows us to choose one of two operating modes. If you were to type in the root password, you would enter system maintenance mode. Since the root user is the only one who has access in this mode, this is also referred to as single-user mode.

In maintenance mode, virtually nothing is going on in the system. There are, of course, the system processes such as init, sched, vhand and bdflush. However, that's about it. As a result, what we might consider as normal operations such as printing, or network access are not active. This allows the system administrator to work on the system without fear that his or her actions will conflict with those of others on the system.

Well, what kind of actions? One with the most impact is adding new software or updating existing software. There are often cases where new software will affect the old software in such a way that it is better not to have other users on the system. In such cases, the installation procedures for that software should keep you from installing unless you are in maintenance mode.

This is also a good place to configure hardware that you added or change kernel parameters. Although these rarely impact users, you will have to do a kernel relink. This takes up a lot of system resources and overall performance is degraded. In addition, some utilities such as ps do not work after a kernel relink until the system is rebooted.

Figure 0-2 The flow from single-user to multi-user

If the changes you made do not require you to relink the kernel (say, adding new software), you can go directly from single-user to multi-user mode. This is done by pressing CTRL-D from the command prompt. You also enter multi-user mode when you press CTRL-D at the prompt:

Type CONTROL-d to continue with normal startup,

(or give the root password for system maintenance):

Kind of makes sense, doesn't it?

The very next thing that happens is that you are told that you are now entering a new "run level", with the message:

INIT: New run level: 2

We now go back to the inittab file. Init looks for any entry that has a 2 in the second field. This 2 corresponds to the run-level we are currently at. Run-level 2 is the same as multi-user mode. The first thing init finds is the program /etc/asktimerc. This is a shell script that asks you to input the system time, like this:

Current System Time is Fri May 12 16:56:58 MET 1995

Enter new time ([YYMMDD]hhmm[ss]):

The time the system is displaying here is what it obtained from the hardware clock. In most cases, you can simply press enter, as the hardware clock is correct. If not, then you need to enter the correct time. At a minimum you need to enter the hour and minutes (hhmm) using a 24-hour clock and specifying two digits for each. If you want, you can also include the year (YY), the month (MM) and the day (DD), or the seconds (ss).

There are a couple of things to note here. First, you can't just specify the year or month, for example. You need to specify all three (year, month and day). Otherwise, the system has no way of determining which one you mean. Also, the ability to set the seconds is new to SCO OpenServer. In previous releases, you could only get minute accuracy. The asktime utility can also be used to set the system clock after the system has started. See the asktime(ADM) man-page for details on the format of these values.
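The [YYMMDD]hhmm[ss] format allows exactly four input lengths, which the sketch below makes explicit. It is only an illustration of the format as described here. The range checks are simple assumptions and not a description of what asktime itself validates.

```python
def parse_time_entry(text):
    """Parse a [YYMMDD]hhmm[ss] string into its parts."""
    # Valid lengths: hhmm (4), hhmmss (6), YYMMDDhhmm (10), YYMMDDhhmmss (12).
    if not text.isdigit() or len(text) not in (4, 6, 10, 12):
        raise ValueError("expected [YYMMDD]hhmm[ss]")
    date = None
    if len(text) >= 10:
        year, month, day = int(text[0:2]), int(text[2:4]), int(text[4:6])
        if not (1 <= month <= 12 and 1 <= day <= 31):
            raise ValueError("month or day out of range")
        date = (year, month, day)
        text = text[6:]          # what remains is hhmm or hhmmss
    hour, minute = int(text[0:2]), int(text[2:4])
    second = int(text[4:6]) if len(text) == 6 else None
    if hour > 23 or minute > 59 or (second is not None and second > 59):
        raise ValueError("time out of range")
    return {"date": date, "hour": hour, "minute": minute, "second": second}


print(parse_time_entry("1656"))
print(parse_time_entry("9505121656"))
```

Note that a six-digit entry can only mean hhmmss: the date part is optional as a whole, while hhmm is always required.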

The next thing init finds in inittab is /etc/authckrc. This script checks the security databases. As it is doing this, you see four messages on your screen if you have ODT 3.0:

Checking tcb ...

Checking auth database ...

Checking protected subsystem database ...

Checking ttys database ...

and three if you have SCO OpenServer:

Checking tcb ...

Checking protected password and protected subsystems databases ...

Checking ttys database ...

The change is because the middle two under ODT are the same program (/tcb/bin/authck), which was actually run twice. If you had a lot of users on your system, running it twice took a long time. In SCO OpenServer, authck is simply run once with the -a option for "all". Rather than having to load the tables twice, they are already there, and that saves time.

The reason for these checks is to ensure the integrity of the TCB itself. As I mentioned a moment ago, if the system goes down unexpectedly, your filesystem could be damaged. Among the things that could be damaged are the TCB files. If that is the case, and authck cannot correct the problem, you are advised of the situation and told to restore from backups. If you have OpenServer, what might be damaged is just the symbolic link, and you simply need to restore the link.

If the system can correct problems in the Protected Password Database (perhaps the name exists in /etc/passwd but there is no TCB file), authck asks you if it should correct the problem:

There are errors for this user

Fix them (y/n)?

Next, the Protected Subsystem Database files are compared to the Protected Password Database to check for discrepancies.

You then see this message:

Checking ttys database ...

This message is shown just prior to running /etc/ttyupd. The ttyupd command is run to ensure that all ttys in /etc/inittab have entries in the Terminal Control Database (/etc/auth/system/ttys). If any files were reported missing, you must now log in on the override terminal to restore them. By default, the override terminal is defined as tty01, also known as the first multiscreen. If you removed the default entry in /etc/default/login, you will have to shut the system off, reboot and enter single-user mode, where you can restore the files that are missing or corrupt. When you log in on the override tty, this message is displayed:

The security databases are corrupt.

However, root login at terminal tty01 is allowed.

The next line in inittab simply cats the contents of the files in /etc/copyrights. As you would guess, these are files containing copyright information. Normally, there is one line per file. Because of the large number of components in OpenServer, it is no longer practical to display all of the copyrights. Instead, you can use the copyright command to display them.

The last thing init reads out of inittab is /etc/rc2. This is also a shell script and does several things. First, it executes all the scripts in /etc/rc2.d. (Note that very often a .d at the end of a name indicates the directory associated with a particular command or function.) It is in these files that almost all of the system processes get started. For example, it is here where cron, the print spooler, and the networking processes are started.

The /etc/rc2 executes each startup script within a for-loop. Each startup script is started with a command line like this:

/bin/sh ${f} start

We see here that each of these scripts is started with the Bourne shell. Therefore, it is important that any changes made to these scripts maintain Bourne shell syntax. The ${f} is the variable used to keep track of the file name. The key is the trailing start. This is passed as an argument to the start-up script.

Well, if it's a "startup" script, why do we need to tell the script that we want to start? If you look in /etc/rc2.d, you'll see that each file has multiple links. In virtually every case, these are linked to files in /etc/rc0.d. So when the system goes into run-level 0 (shutdown), the scripts in /etc/rc0.d are called. They are started in similar fashion from /etc/rc0. The difference is that when the system is shut down, the scripts are called with stop and not start. In each script, the flow of execution is controlled by the argument passed to it. If it is start, the script starts things up. If it is stop, the script stops everything.
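To show how the same set of scripts serves both start-up and shutdown, here is a small sketch that builds (but does not run) the command lines an rc-style loop would issue. The script names are hypothetical, and the sorted order is an assumption based on how a shell expands a wildcard in a for-loop. The real scripts are Bourne shell; Python is used here only to keep the example self-contained.

```python
def rc_commands(directory, script_names, action):
    """Build the '/bin/sh <script> <action>' command lines for one directory."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    # Assumed order: sorted by name, as a shell wildcard expansion would be.
    return [["/bin/sh", "%s/%s" % (directory, name), action]
            for name in sorted(script_names)]


names = ["S80lp", "S75cron"]          # hypothetical script names
for command in rc_commands("/etc/rc2.d", names, "start"):
    print(" ".join(command))
for command in rc_commands("/etc/rc0.d", names, "stop"):
    print(" ".join(command))
```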

Next, /etc/rc2 executes the commands in /etc/idrc.d. In most cases this directory is empty and is provided for compatibility reasons. After that there are the scripts in the /etc/rc.d directory. These carry out a few more system initialization functions and are mostly there for compatibility reasons. This also applies to the /etc/rc script, which is the last thing run by /etc/rc2 before it says:

The system is ready.

After it has completed running /etc/rc2, init now runs a getty on all enabled ports. This is what finally gives us our login prompt.

Run-Levels

Most users are only familiar with two run-states or run-levels. The more commonly experienced one is what is referred to as multi-user mode. This is where logins are enabled on terminals, the network is running and the system is behaving "normally". The other is system maintenance or single-user mode, where there is only a single user on the system (root) who is probably doing some kind of maintenance task. Hence their names.

It is generally talked about that the "system" is in a particular run-level. However, it is more accurate to say that the init process is in a particular run-level, as it is init that determines what other processes get started at each run-level.

In addition to the run-levels most of us are familiar with, there are several others that the system can run in. Despite this fact, most of them are hardly ever used. For more details on what these run-levels are, take a look at the init(M) man-page.

The system administrator can change to a particular run-level by using that run-level as the argument to init. For example, running init 2 would change the system to run-level 2. To determine what processes to start in each run-level, init reads the /etc/inittab file. The run-level for each entry is defined by the second field of that entry. Init reads this file and executes each program defined for that run-level in order.

The fields in the inittab file are:

  • id: a unique identifier for that entry

  • rstate: the run-level in which this entry will be processed

  • action: tells init how to treat the process

  • process: what process will be started
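The four fields can be pulled apart by splitting each line on colons. The sketch below does this for a few sample entries and then selects entries by action and by run-level. The sample lines are illustrative, modeled on the programs discussed in this chapter rather than copied from a real /etc/inittab, and the level filter only matches entries that list the level explicitly.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "id rstate action process")

# Illustrative entries only; a real inittab will differ.
SAMPLE = """\
bchk::sysinit:/etc/bcheckrc </dev/console >/dev/console 2>&1
is:S:initdefault:
r2:2:wait:/etc/rc2 1>/dev/console 2>&1 </dev/console
"""


def parse_inittab(text):
    """Turn id:rstate:action:process lines into Entry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first three colons only; the command may contain more.
        entries.append(Entry(*line.split(":", 3)))
    return entries


def entries_for_level(entries, level):
    """Entries whose rstate field explicitly lists the given run-level."""
    return [e for e in entries if level in e.rstate]


entries = parse_inittab(SAMPLE)
print([e.id for e in entries if e.action == "sysinit"])
print([e.process for e in entries_for_level(entries, "2")])
```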

When the system boots, it decides what run level to go into based on the DEFAULT_LEVEL variable in /etc/default/boot. If this is not set, then the /bin/sulogin program is run which asks whether the system should be brought into maintenance or multi-user mode. If brought into maintenance mode, this is run-level S. If, for whatever reason, the /etc/inittab is corrupt or otherwise unreadable, then this is the only valid run-level.

Keep in mind that you do not have to reboot the system in order to enter run-level S. You can do so by passing S as an argument to init (init S), as I mentioned above. In addition, you can use the shutdown command to enter a specific run-level, which will ultimately use init.

If you are in maintenance mode (run-level S) and type in exit, this will kill the shell you are running and the system returns you to the "Type CONTROL-D to proceed with normal startup" prompt. Here, you can press CTRL-D to begin the startup into run-level 2, or multi-user mode. If you want, you could simply type in init 2, which will start the process of bringing the system into run-level 2. I have also seen cases where typing exit or CTRL-D in single-user mode brings you directly into run-level 2 without prompting you.

If we look in /etc/inittab, we see quite a few entries that have a 2 in the second column. Many of these, you will notice, are the same programs we talked about earlier when we described the boot process in general. The /etc/inittab file is where they are started from.

One thing I need to point out is that the entries in inittab are not run exactly in the order they appear. If you are entering a run-level other than S for the first time since boot-up, init will first execute those entries with a boot or bootwait in the third column. These are the processes that should be started before users are allowed access to the system, such as checking the status of the filesystems and then mounting them.

In run-level 2, one of the things done is to start a /etc/getty process on the terminals specified. It is the getty process that gives you your login: prompt. When you have entered your logname for the first time, getty starts the login process which asks you for your password. If incorrect, you are prompted to input your logname again. Note that this time the prompt is different. The first time you (usually) see the system name as part of the prompt. If you input an incorrect logname or password, the second login: prompt does not contain the system name. This is because it is the login process that is giving you the prompt the second time, not getty.

If your password is correct, then the system starts your "login shell." Note that what gets started may not be a shell at all, but some other program. The term "login shell" is the generic term for whatever program is started when you log in. This is defined by the last field of the appropriate entry in /etc/passwd.
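To make the point about the last field concrete, here is a small Python sketch that pulls the login program out of a passwd-style line. The function name and the sample entry (user name, IDs and paths) are invented for illustration.

```python
def login_program(passwd_line: str) -> str:
    """Return the last field of a passwd entry: the program run at login."""
    fields = passwd_line.rstrip("\n").split(":")
    return fields[-1]


# Hypothetical entry; the values are made up for this example.
sample = "jim:x:200:50:Jim Mohr:/usr/jim:/bin/ksh"
print(login_program(sample))
```

Nothing requires that last field to be a shell. It could just as well be the path to a menu program or an application.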

Keep in mind that you can move in either direction, that is, from a lower run-level to a higher run-level or from a higher to a lower, without having to first reboot. Init will read the inittab and start the necessary processes. If a particular process is not defined at a particular run-level then init will kill it. For example, assume you are in run-level 2 and switch to run-level 1. Many of the processes that are running do not have a 1 in the second field. Therefore, they and all their children will be stopped.
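The selection logic can be sketched in a few lines of Python. This is only a simplified model of what init does, not its actual implementation: it looks at nothing but the second field and ignores the action field entirely. The r2 and co entries in the sample are hypothetical; the other three are taken from a standard inittab.

```python
def entries_for_level(inittab_lines, level):
    """Return the ids of entries whose second field contains the run-level."""
    ids = []
    for line in inittab_lines:
        ident, rstate, action, process = line.split(":", 3)
        # The second field may list several run-levels, e.g. "056".
        if level in rstate:
            ids.append(ident)
    return ids


def stopped_on_switch(inittab_lines, old, new):
    """Ids defined for the old run-level but not for the new one."""
    keep = set(entries_for_level(inittab_lines, new))
    return [i for i in entries_for_level(inittab_lines, old) if i not in keep]


sample = [
    "r0:056:wait:/etc/rc0 1> /dev/console 2>&1 </dev/console",
    "r1:1:wait:/etc/rc1 1> /dev/console 2>&1 </dev/console",
    'co1:1:respawn:/bin/sh -c "sleep 20; exec /etc/getty tty01 sc_m"',
    "r2:2:wait:/etc/rc2 1> /dev/console 2>&1 </dev/console",  # hypothetical
    "co:2:respawn:/etc/getty tty01 sc_m",                      # hypothetical
]

print(entries_for_level(sample, "1"))
print(stopped_on_switch(sample, "2", "1"))
```

Going from run-level 2 to run-level 1, the two entries that only list 2 drop out, and the two that list 1 are picked up.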

Once we are in multi-user mode, we can return to maintenance mode either through the shutdown command or by running init directly. Here I need to emphasize that running init directly is not really a good idea. The shutdown script is designed to be a little more gentle. If you run, for example, init S, the system is suddenly in maintenance mode. No warning is given; all user processes simply cease to exist. Since shutdown at least gives some warning, it is less likely to have angry co-workers calling you up.

If we have shut down from run-level 2 into run-level 1, for example, we see two entries in inittab with a 1 in the second field. These are:

r1:1:wait:/etc/rc1 1> /dev/console 2>&1 </dev/console

co1:1:respawn:/bin/sh -c "sleep 20; exec /etc/getty tty01 sc_m"

The first entry (r1) runs the script /etc/rc1. If we look inside that script, we see at the end that all processes are stopped (/etc/killall 9) and all filesystems are unmounted (/etc/umountall). The very last thing it does is to run init and switch to run-level S. Here is where we see a difference between run-level 1 and run-level S. Although they are functionally the same for most users, it is the transition to run-level 1 that unmounts the filesystems. So, if you shut down into run-level S, all your filesystems will remain mounted.

To shut down the system immediately, you could run:

init 0

Which brings the system immediately into run-level 0. If we look in /etc/inittab, we find that there are only two lines that have a 0 in the second column. These are:

r0:056:wait:/etc/rc0 1> /dev/console 2>&1 </dev/console

sd:0:wait:/etc/uadmin 2 0 >/dev/console 2>&1 </dev/console

The first thing that is processed is the r0 line, which runs the /etc/rc0 script. If we look in that script, we see that it runs the killall command, which stops (almost) all other processes on the system. Shortly thereafter, it exits.

At this point, init continues with the next entry for the specified run-level, in our case run-level 0. This line runs the program uadmin, which is what actually shuts the system down. This means nothing is running and the system stops. Note that uadmin can only be executed from the system console.

Let's back up and look at that first line again:

r0:056:wait:/etc/rc0 1> /dev/console 2>&1 </dev/console

Here there are three numbers in the second column and not just one. This means that if init changes to any one of those three, this line will get executed. If you look in /etc/inittab, you will see that there are several lines that are started in multiple run levels.

After it has started the necessary processes from inittab, init just hangs out and waits. When one of its "descendants" dies (that is, a process that init started, or a child of such a process, and so on down the line), init re-reads the inittab to see what should be done. If, for example, there is a respawn entry in the third field, init will start the specified process again. This is the reason why when you log out, you immediately get a new login: prompt.
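The respawn idea can be imitated with a short Python sketch: start a child, wait for it to die, start it again. This is a toy model and not how init is written. Real init keeps going indefinitely and watches many entries at once, whereas the loop here is limited to a fixed number of runs so that the example terminates. The function name respawn is made up.

```python
import subprocess
import sys


def respawn(command, runs):
    """Run command `runs` times in a row, restarting it each time it exits."""
    exit_codes = []
    for _ in range(runs):
        # run() blocks until the child has died, much as init waits
        # for the death of a process before acting on the entry again.
        finished = subprocess.run(command)
        exit_codes.append(finished.returncode)
    return exit_codes


# A stand-in child that exits right away, like a getty whose user logs out.
child = [sys.executable, "-c", "raise SystemExit(0)"]
codes = respawn(child, 3)
print(codes)
```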

Because init just hangs around waiting for processes to die, you cannot simply add an entry to inittab and expect the process to start up. You have to tell init to go and re-read the inittab, which you can do by running init (or telinit) Q. This is the only time you should use the init program yourself.

In addition to the run-levels we discussed here, there are several more that are possible. Unfortunately, this is one of those cases where I have to put off further discussion since these other run-levels are rarely, if ever, used. If you're curious, take a look at the init(M) man-page.

Boot Magic

A very useful facet of SCO UNIX is the ability to boot in many different ways. At the beginning of this section I mentioned the default bootstring (DEFBOOTSTR). As its name implies, this contains the default parameters the system uses to boot. Sometimes it is necessary to change the behavior of the system when it boots, that is, to change the default.

One alternative is to edit the DEFBOOTSTR (in /etc/default/boot). This is useful if the changes you want to make are somewhat permanent. It is not necessarily a good idea when you want to make a quick test, as you need to change the file again once you're done. The other disadvantage is that there is always the possibility of mistakes. If you make a mistake, then you have to type in a boot string at the boot prompt anyway, so why not type it in by hand the first time?

One of the more common things that I do when changing how the system boots is to define specific memory ranges. I do this in a couple of cases. The first is when I suspect anti-caching. Anti-caching occurs when you do not have enough cache for the amount of RAM you have. Normally, this is 64K of cache per 16MB of RAM. If, for example, you had 32MB of RAM, but only 64K of cache, you wouldn't have enough. Some motherboards recognize this and then disable all cache. This includes the internal cache on the CPU. This has a dramatic effect on performance, since every instruction must now be taken from main memory and not the cache. You thought that adding RAM was going to speed things up, but instead it slowed things down. See the section on hardware for more details.

Rather than pulling out the extra memory, you can tell the kernel to use only a particular part of memory with different boot strings. This is the mem= option. To tell the kernel that you want it to use only the RAM below 16MB, the bootstring would look like this:

boot : defbootstr mem=1-16m

The defbootstr tells the kernel to read the default boot string from /etc/default/boot, but to use only the memory from 1-16MB. Note that we don't need to specify the memory below 640K, as the kernel will use that anyway. If we specified this boot string and the system ran faster, despite the fact we had less memory, then this would indicate anti-caching.

I also use the mem= option to exclude chunks of memory when I suspect that I have bad RAM. In certain motherboards, like the one I have, you cannot just pull out a SIMM to see if your memory problems go away. This is because each bank needs to be completely full in order for things to work. However, if you tell the kernel to skip a certain range, you might be able to figure out which SIMM is bad.

For example, let's say I have four 4MB SIMMs and have a bad spot of memory somewhere. If I tell the kernel at boot to avoid the third SIMM, as in mem=1-8m,12-16m, and my problem goes away, then this is the bad SIMM.
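The arithmetic behind that example can be checked with a small Python sketch. It only understands the simple megabyte form used in this chapter (such as 1-8m,12-16m), not everything the real mem= option accepts, and the function names are invented. Mapping a gap to a SIMM number assumes the SIMMs are filled one after another, 4MB each, as in the example.

```python
def parse_mem_ranges(spec):
    """Parse a list such as '1-8m,12-16m' into (start, end) pairs in MB."""
    ranges = []
    for part in spec.split(","):
        start, end = part.rstrip("mM").split("-")
        ranges.append((int(start), int(end)))
    return ranges


def excluded_gaps(ranges, total_mb):
    """Return the stretches between 1MB and total_mb that no range covers."""
    gaps = []
    cursor = 1
    for start, end in sorted(ranges):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_mb:
        gaps.append((cursor, total_mb))
    return gaps


ranges = parse_mem_ranges("1-8m,12-16m")
gaps = excluded_gaps(ranges, 16)
# Assumption: 4MB SIMMs used sequentially, so 8-12MB is the third SIMM.
suspects = [start // 4 + 1 for start, _ in gaps]
print(ranges, gaps, suspects)
```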

There is a problem with checking for bad RAM like this. Some motherboards read memory from all SIMMs in parallel, not sequentially. That is, they read 32 bits at a time, eight from each SIMM, rather than reading the first SIMM completely before starting on the second.

If you have a system that reads from memory in parallel, then blocking out segments at boot probably won't work. This is because you are blocking out portions of every SIMM and not just one. Unless you know that the system reads in parallel, this is worth a try. However, you can only be certain if you get a positive result, that is, if the problem goes away.

Another very common thing to do with the boot string is to boot different kernels. Very often adding a new driver will toast your kernel to the point that it cannot be booted. Another possibility is that your new kernel is beyond the 1024th cylinder and therefore can no longer boot. By typing in the name of another kernel you can boot that one instead. More than likely there is a unix.old if you just relinked the kernel. SCO OpenServer maintains a couple of copies, such as unix.install and unix.safe.

One very useful trick that I have used is the ability to specify the root filesystem and swap device as well as the kernel that I want to boot. When I was working in SCO Support, I needed quick access to many different operating systems and environments. Rather than having several different machines, I had a single one, with the different operating systems on different filesystems or partitions. When I needed to boot one, I could simply type in the appropriate boot string.

You tell the /boot program where to look by specifying the driver name, the minor number and the program in the form:

driver(minor)program

Even in SCO OpenServer the only two driver names it accepts are hd (for the hard disk) and fd (for the floppy). See the section on major and minor numbers or the hd(HW) and fd(HW) man-pages for details on what minor numbers are all about.

To specify other devices, the format is:

device=driver(minor)

Now, let's assume that my hard disk contains only two partitions. The first is for SCO UNIX and the second for SCO XENIX. Since the UNIX partition is active, its filesystems have minor numbers in the range 40-47. This was also the first partition, so it also has minor numbers in the range 8-15. The second partition had minor numbers in the range 16-23.

Originally, my default boot string simply said hd(40)unix. This meant it would load the unix program from the hd device with a minor number of 40. Once I started adding the other operating systems, I expanded my boot string to specify each of the devices explicitly, as in:

hd(40)unix root=hd(40) swap=hd(41)

Normally, the unix program is located on the root filesystem. That's why they both have a minor number of 40. I could also have specified the absolute minor numbers as in:

hd(8)unix root=hd(8) swap=hd(9)

To boot my SCO XENIX system, the boot string would look like this:

hd(16)unix root=hd(16) swap=hd(17)
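The pattern in these three boot strings can be written down as a short Python sketch. The function names are made up. The rule of eight minor numbers per partition is extended here to partitions 3 and 4 by analogy with the ranges given for the first two, which is an assumption on my part; the hd(HW) man-page is the authority. Putting root on the first minor number and swap on the next one matches the examples here, but is not a rule.

```python
ACTIVE_MINORS = (40, 47)  # range used for the active partition on the first drive


def partition_minors(partition):
    """Return (first, last) absolute hd minor numbers for partition 1-4."""
    if not 1 <= partition <= 4:
        raise ValueError("partition must be between 1 and 4")
    first = 8 * partition
    return first, first + 7


def boot_string(base_minor, kernel="unix"):
    """Build a boot string with root on base_minor and swap on the next minor."""
    return (
        f"hd({base_minor}){kernel} "
        f"root=hd({base_minor}) swap=hd({base_minor + 1})"
    )


print(partition_minors(1), partition_minors(2))
print(boot_string(ACTIVE_MINORS[0]))
print(boot_string(partition_minors(2)[0]))
```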

After the third or fourth time you enter one of these strings, you realize that it is bothersome. Fortunately, there is an easier way. You can actually assign names to these strings and have /boot make the translation for you. This is done in the /etc/default/boot file and is called boot aliasing.

Figure 0-3 Loading the system from different filesystems

By default, there is already one boot string alias: DEFBOOTSTR. Although this is what /boot looks for by default, you could just as easily type in defbootstr at the boot prompt. The syntax is the same as for variable definitions in sh or ksh: simply variable_name=value. In the case of the default boot string, the variable name is DEFBOOTSTR and the value is hd(40)unix (or something similar). You can type in defbootstr and then add something else, such as:

defbootstr Stp=ad(0,3,0)

This uses the default boot string and tells the system to add a SCSI tape drive to the ad (Adaptec) driver at host adapter 0, SCSI ID 3 and LUN 0.

In my /etc/default/boot file, I created two additional aliases. One for my SCO UNIX partition that said:

UNX=hd(8)unix root=hd(8) swap=hd(9)

And one for my SCO XENIX partition that looked like this:

XNX=hd(16)unix root=hd(16) swap=hd(17)

If you are running OpenServer, you can also use the System Startup Manager or change the DEFBOOTSTR in /etc/default/boot. This way I can define aliases for boot strings that are fairly complicated, without having to type them in each time. For example, the two I have on my hard disk are:

OS5=hd(40)unix swap=hd(41) dump=hd(41) root=hd(42)

ODT=hd(24)unix swap=hd(25) dump=hd(25) root=hd(24)

When I want to start up OpenServer, I type in OS5. This is interpreted by /boot and it is as if I had typed in:

hd(40)unix swap=hd(41) dump=hd(41) root=hd(42)

I really don't need to include all these values. However, to make sure things are set the way they are supposed to be, I aliased it like this.

When I want to start up ODT 3.0, I use the ODT entry. Notice that I don't use the same minor numbers. OpenServer is installed on the active partition and, if you remember from the discussion of major and minor numbers, the range for minor numbers on the active partition of the first drive is 40-47. Since ODT 3.0 is not on the active partition, I have to use the absolute minor numbers. Although I could have used absolute minor numbers with OpenServer, this shows me right away which partition is active.
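Boot aliasing amounts to a simple substitution, which the following Python sketch imitates. It is a model of the idea, not of how /boot is actually written. The function names are invented, every NAME=value line is treated as an alias (the real file also holds other settings), and matching the typed name without regard to case is an assumption based on DEFBOOTSTR being typed as defbootstr.

```python
def load_aliases(text):
    """Collect NAME=value lines into a dictionary keyed by upper-case name."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split only at the first "=", since the value contains more of them.
        name, value = line.split("=", 1)
        aliases[name.upper()] = value
    return aliases


def expand(typed, aliases):
    """Replace the first word with its alias value, keeping any extra arguments."""
    first, _, rest = typed.strip().partition(" ")
    expanded = aliases.get(first.upper(), first)
    return f"{expanded} {rest}".strip()


BOOT_FILE = """\
DEFBOOTSTR=hd(40)unix
OS5=hd(40)unix swap=hd(41) dump=hd(41) root=hd(42)
ODT=hd(24)unix swap=hd(25) dump=hd(25) root=hd(24)
"""

aliases = load_aliases(BOOT_FILE)
print(expand("OS5", aliases))
print(expand("defbootstr Stp=ad(0,3,0)", aliases))
```

Anything that is not a known alias is passed through unchanged, so a boot string typed out in full still works.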

The bootos Program

I remember a rather "excited" post to the SCOFORUM on CompuServe. The poster was quite upset at what he perceived as a major shortcoming in SCO. Despite the fact that SCO simply reads and writes a standard bootblock and partition table, he was upset that this prevented him from booting into his OS/2 boot manager. Unfortunately for him, this was one of those cases where a little bit of preparation could have saved him some heartache.

A major advance of OpenServer is the ability not only to boot DOS, like ODT, but to boot other operating systems, such as Windows NT, OS/2 and even CP/M. This is accomplished by the bootos program, which is called from /boot, depending on the boot options you give it. The bootos program is started simply by entering it at the Boot: prompt. One very useful option is ?, which is used like this:

bootos ?

This displays your partition table and includes details such as whether the partition is active, the type of operating system that is recognized, and the size of the partition. By giving it the number of the partition, bootos will attempt to boot the operating system that it finds. The id= option can be used to get bootos to find the first operating system of that type and boot it, for example:

bootos id=dos_12

looks for the first DOS 12-bit filesystem. Or we try:

bootos id=os2

which causes bootos to look for an OS/2 filesystem. Note that OpenServer cannot distinguish between a Windows NT NTFS, an OS/2 filesystem and an OS/2 HPFS, so the above example could also be specified as:

bootos id=nt

or

bootos id=os2_hpfs

Therefore, if you have both an OS/2 and an NT partition, you will need to specify the partition number.

Those of us who have had DOS partitions on our systems are used to being able to start DOS by simply typing dos at the boot prompt. This option is still available. However, it is equivalent to starting the system with:

bootos dos

For more details on this, see the bootos(HW) man-page.


Stopping the System

For those of you who hadn't noticed, neither SCO ODT nor SCO OpenServer is like DOS. Despite the superficial similarity at the command prompt, they have little in common. One very important difference is the way you stop the system.

Under DOS, you are completely omnipotent. You know everything that's going on. You have complete control over everything. If you decide that you've had enough and flip the power switch, you are the only one it is going to affect. However, with dozens of people working on an SCO UNIX system and dozens more using its resources, simply turning off the machine is not something you want to do. Aside from the fact that you will annoy quite a few people, it can cause damage to your system, depending on exactly what was happening when you killed the power. (Okay, you could also create problems with a DOS system, but with only one person, the chances are smaller.)

In order to make sure that things are stopped safely, you need to shut down your system "properly". What is considered proper can be a couple of things, depending on the circumstances. SCO provides several tools to stop the system and allows you to decide what is proper for your particular circumstance. Flipping the power switch is not shutting down properly.

The first two are actually two links to the same file: /etc/haltsys and /etc/reboot. These are shell scripts, so you can take a look at the insides and see what they do.

The first thing is to flush the buffer cache and update the superblock with the sync command. If this is not done and the system stops, there is a high likelihood that what is on the hard disk is not what should be. There are two issues here. First, if you are writing to the disk, more than likely you are going through the buffer cache. If the system stops before that information gets written to the disk, the last changes you made won't have been written to the file.

Second, if you look in either file (haltsys or reboot) you'll see that sync is actually called twice. This was originally included to ensure that more data is written to the hard disk and things are more likely to be consistent. If something was written as the first sync was proceeding, there could still be some information that was not written to the disk. Granted, this cannot catch everything. There is still the chance that after the second sync, something gets written to the buffer cache. However, running sync twice decreases the likelihood that something will get lost. It is not only there for historical reasons.
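The gap between "written" and "on the disk" can also be seen from a program's point of view. The Python sketch below forces one file's data out to the disk with os.fsync, which is the per-file counterpart of what the sync command requests for the whole system. The function name and file name are invented, and this is an illustration of the buffering idea rather than anything haltsys itself does.

```python
import os
import tempfile


def write_durably(path, text):
    """Write text and ask the kernel to push it out to the disk."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()             # hand the data from the program's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write this file's buffers to disk


with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "notes.txt")
    write_durably(target, "last change\n")
    with open(target) as f:
        saved = f.read()

print(repr(saved))
```

Without the last two calls, the data could still be sitting in a buffer when the power goes off.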

Next, all file systems are unmounted. Part of this process is also to ensure that any pending I/O is completed. This is obviously necessary to prevent any more data loss.

The last thing done is to actually shut the system down using the /etc/uadmin command. Which arguments are passed to uadmin depends on the command and arguments we used to shut down the system. It's possible to pass an argument to the haltsys command that prevents you from rebooting, for example. At this point the system is down and, if you started reboot instead of haltsys, it is rebooted for you. This reboot is one time where SCO does use the BIOS.

One thing to keep in mind is that with both haltsys and reboot, the system simply stops. You are not given any warning nor are processes allowed to exit gracefully. All processes cease to exist because the system no longer exists, not because they were sent any signal.

This is really only an issue if you have other users accessing your system, either logged in locally or using system resources across the net. If on the other hand, you brought the system into maintenance mode to add a new driver (or whatever) and need to reboot to make the changes take effect, then there is no harm in either using haltsys or reboot.

When you need to give your users notice that the system is going to come down, or have processes running that need to be stopped gracefully, then neither haltsys nor reboot is a good idea. A better program is /etc/shutdown, which not only gives users notice, but allows programs to stop fairly gracefully, without pulling the rug out from underneath them.

Like haltsys and reboot, shutdown is a shell script. However, it is not linked to the other two but rather to /tcb/files/rootcmds/shutdown. This allows it to be used by non-root users to shut down the system, if necessary. Putting commands in the rootcmds directory is necessary because one of the first things done is to check the UID of the user starting it. If it's not root, then shutdown exits.

I won't go into too many details about the various options, since they are all listed in the manual. It suffices to say that you can tell shutdown to wait a certain amount of time before stopping the system, as well as send a specific message to the users that the shutdown is taking place.

The important thing to note is that shutdown does not actually kill any processes; rather, it lets init do it. One of the arguments you pass to shutdown is the new init state (run-level) you want to run in. After warning users that a shutdown is coming and finally telling them to log off or risk losing data, shutdown calls init to bring the system to the specified run-level. If no run-level is specified, it defaults to run-level 0 and the system is brought all the way down.

One advantage this has is the ability to go from normal, multi-user mode (run-level 2) to maintenance mode (run-level S or 1). Here we get to a common misconception. It is often thought that run-level S and run-level 1 are the same, since in both the system is brought into "single-user" mode. Well, it is true that in both cases the only user on the system is root. However, that's where the similarity ends.

Run-level S is essentially the same run-level as when we first boot up and go into maintenance mode. Here only the root filesystem is mounted (all others are unmounted if going from run-level 2 to run-level 1, which ends by switching to run-level S) and all processes are killed except those connected to the system console.

If you switch directly from run-level 2 to run-level S, then basically the only thing that happens is that user processes are tossed off the system. All filesystems remain mounted. Although for users this means the same thing, it is useful to keep filesystems mounted without having users on the system. However, as I mentioned before, in run-level 1 all the filesystems are unmounted and it is more like the initial boot-up.

If you've decided to shut down the system completely (run-level 0), all filesystems are unmounted, all processes are killed and on your screen you see:

** Safe to Power Off **

-or-

** Press Any Key to Reboot **


The System Shutdown Manager provides a graphic interface to the shutdown command. The functions are the same as from the command line. However, you do get to point and click.



