Copyright 1996-1998 by James Mohr. All rights reserved. Used by permission of the author.
Be sure to visit Jim's great Linux Tutorial web site at http://www.linux-tutorial.info/
Hardware is my life. I love working with it. I love installing it. I love reading about it. I am by no means an expert in such a way that I can tell you about every chip on the motherboard. In fact, I enjoy being a "jack-of-all-trades." Of all the trades I am a jack of (of which I am a jack?), I enjoy hardware the most.
It's difficult to say why. There is, of course, the fact that without the hardware nothing works. Software, without hardware, is just words on a page. However, it's something more than just that. I like the idea that it all started out as rocks and sand and now it can send men to the moon and look inside of atoms.
I think that this is what it's all about. Between the hardware and the operating system (I also love operating systems) you've pretty much got the whole ball of wax.
During the several years I spent on the phone, it was common to have people call in with no idea of what kind of computer they had. I remember one conversation with a customer where he answered "I don't know" to every question I asked about his hardware. Finally, he got so frustrated he said, "Look! I'm not a computer person. I just want you to tell me what's wrong with my system."
Imagine calling your mechanic to say there is something wrong with your car. He asks you whether the car has 4 or 8 cylinders, whether it has fuel injection or not, whether it is automatic or manual, and whether it uses unleaded or leaded gas. You finally get frustrated and say, "Look. I'm not an engine person, I just want you to tell me what's wrong with my car."
The solution is to drive your car to the mechanic and have it checked. However, you can't always do that with your computer system. You have dozens of people who rely on it to do their work. Without it, the business stops. In order to better track down and diagnose hardware problems, you need to know what to look for.
This section should serve as a background for many issues we cover elsewhere. This chapter is designed more to familiarize you with the concepts than to make you an expert on any aspect of the hardware. If you want to read more about PC hardware, a good place to start is the Winn Rosch Hardware Bible from Brady Books.
A key concept for this discussion is the bus. So, just what is a bus? Well, in computer terms it has a meaning similar to that of your local county public transit bus: it is used to move something from one place to another. For the county transit bus, what it moves is people. For a computer bus, what it moves is information.
The information is transmitted along the bus as electric signals. If you have ever opened up a computer, you probably saw that there was one central printed circuit board with the CPU, the expansion cards and loads of chips sticking out of it. The electronic connections between these parts are referred to as a bus.
The signals that move along a computer bus come in two basic forms: control and data. Control signals do just that: they control things. Data signals are just that: data. How this happens and what each part does we will get to as we move along.
In today's PC market, there are several buses, which have many of the same functions but approach things quite differently. In this section, we are going to talk about what goes on between the different devices on the bus and what the main components are that communicate along the bus, and then talk about the different bus types.
Despite differences in bus types, there are certain aspects of the hardware that are common among all PCs. The Basic Input Output System (BIOS), interrupts, Direct Memory Access channels and base addresses are just a few. Although SCO UNIX almost never needs the system BIOS once the kernel is loaded, understanding its function and purpose is useful in understanding the process that the computer goes through from the time you hit the power switch to when SCO UNIX has full control of the hardware.
The BIOS is the mechanism DOS uses to access the hardware. DOS (or a DOS application) makes BIOS calls, which then transfer the data to and from the devices. Except for the first few moments of the boot process and the last moment of a shutdown, SCO UNIX may never use it again.
The "standard" BIOS for PCs is the IBM BIOS, but that's simply because "PC" is an IBM standard. However, "standard" does not mean "most common," as there are several other BIOS vendors, such as Phoenix and AMI.
DOS or a DOS application makes device-independent calls to the BIOS in order to transfer data. The BIOS then translates these into device-dependent instructions. For example, DOS (or an application) requests that the hard disk read a certain block of data. The application does not care what kind of hard disk hardware there is, nor should it. It is the job of the BIOS to translate the request into something the specific hard drive can understand.
In SCO UNIX, on the other hand, there is a special program called a device driver that handles the functions of the BIOS. As we talked about in the section on the kernel, device drivers are sets of routines that directly access the hardware, just as the BIOS does. It is important to note that although the SCO UNIX kernel accesses devices primarily through device drivers, there are circumstances where the BIOS is accessed. For example, certain video card drivers use it, as does the SCO UNIX kernel itself when it is rebooting the system after you issue the reboot or shutdown command.
The fact that SCO UNIX bypasses the BIOS and goes directly to the hardware is one reason why some hardware will work under DOS and not under SCO UNIX. In some instances, the BIOS has been specially designed for the machine that it runs on. Because of this, it can speak the same dialect of "machine language" that the rest of the hardware speaks. However, since UNIX does not speak the same dialect, things get lost in the translation.
The Intel 80x86 family of processors (which SCO runs on) has an I/O space that is distinct from memory space. What this means is that memory (or RAM) is treated differently than I/O. Other machine architectures, such as the Motorola 68000 family, see accessing memory and I/O as the same thing. On the 80x86, although the addresses for I/O devices appear as "normal" memory addresses and the CPU is performing a read or write as it would to RAM, the result is completely different.
When accessing I/O, either for a read or write, the CPU utilizes the same address and data lines as it does when accessing RAM. The difference lies in the M/IO# line on the CPU. For those not familiar with digital electronics, this can also be described as the "Memory/Not I/O" line. That is, if the line is high, the CPU is addressing memory. If it is low, it is addressing an I/O device.
Although the SCO UNIX operating system is much different from DOS, it still must access the hardware in the same fashion. There are assembly language instructions that allow an operating system (or any program, for that matter) to access the hardware correctly. When these instructions are passed the base address of the I/O device, the CPU knows to keep the M/IO# line low and therefore to access the device and not memory.
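The effect of the M/IO# line can be pictured with a small Python sketch. This is only a toy model, not how any real hardware or SCO driver is written: the class name ToyBus and its methods are invented for illustration. It shows that the same numeric address lands in two different places depending on the state of that one line.

```python
class ToyBus:
    """Toy model of separate memory and I/O spaces (hypothetical names)."""

    def __init__(self):
        self.memory = {}   # stands in for RAM
        self.ports = {}    # stands in for I/O device registers

    def _target(self, m_io):
        # M/IO# high (True) selects memory, low (False) selects I/O.
        return self.memory if m_io else self.ports

    def write(self, address, value, m_io):
        self._target(m_io)[address] = value

    def read(self, address, m_io):
        # Unwritten locations read back as 0 in this toy model.
        return self._target(m_io).get(address, 0)


bus = ToyBus()
bus.write(0x3F8, 0x41, m_io=True)    # memory cycle at address 0x3F8
bus.write(0x3F8, 0x42, m_io=False)   # I/O cycle at the same address
print(hex(bus.read(0x3F8, m_io=True)), hex(bus.read(0x3F8, m_io=False)))
```

The two writes do not disturb each other, which is the sense in which the I/O space is "distinct" from the memory space.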
You can see the base address of each device on the system every time you boot. The hardware screen shows you the devices it recognizes along with certain values such as the base address, the interrupt vector and the DMA channel. You can also see this same information by running the hwconfig command.
Although there are 16 I/O address lines coming from the 80386, some PCs only have 10 of these wired. So instead of having 64K of I/O address space (2^16), there is only 1K (2^10). When the system detects this, you see the message "10 bits of I/O address decoding" when the system is booting. Some machines have 11 or more address lines and, therefore, have a larger I/O space.
If your motherboard only uses 10 address lines, devices on the motherboard that have I/O addresses (such as the DMA controller and PIC) will appear at their normal addresses as well as at "image" addresses. This is because the high 6 bits are ignored, so any 16-bit address where the lower ten bits match will show up as an "image" address. Since there are 6 bits that are ignored, there are 63 possible "image" addresses (64 minus the one for the "real" address).
These "image" addresses may cause conflicts with hardware that has I/O addresses higher than 0x3FF (1023), which is the highest possible with only 10 address lines. Therefore, if your motherboard only has 10 bits of I/O addresses, you shouldn't put devices at addresses higher than 0x3FF.
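To make the arithmetic behind "image" addresses concrete, here is a minimal Python sketch. The function names are made up for this example; the only facts it relies on are the ones above: 16 address lines from the CPU, of which only the low 10 are decoded.

```python
DECODED_BITS = 10                       # address lines actually wired
TOTAL_BITS = 16                         # I/O address lines from the CPU
LOW_MASK = (1 << DECODED_BITS) - 1      # 0x3FF, the bits the board looks at


def image_addresses(real_addr):
    """Every 16-bit address whose low 10 bits match real_addr,
    leaving out real_addr itself."""
    low = real_addr & LOW_MASK
    images = []
    for high in range(1 << (TOTAL_BITS - DECODED_BITS)):   # 64 patterns
        addr = (high << DECODED_BITS) | low
        if addr != real_addr:
            images.append(addr)
    return images


def conflicts(addr_a, addr_b):
    """True if a 10-bit decoder cannot tell the two addresses apart."""
    return (addr_a & LOW_MASK) == (addr_b & LOW_MASK)


images = image_addresses(0x3F8)
print(len(images), hex(images[0]), hex(images[-1]))
print(conflicts(0x3F8, 0x7F8))
```

A card jumpered to 0x7F8, for example, would collide with whatever already sits at 0x3F8, because the two differ only in bits that the motherboard ignores.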
When installing, it is vital that no two devices have overlapping (or identical) base addresses. Whereas you can share interrupts and DMA channels on some machines, you can never share base addresses. If you attempt to read a device that has an overlapping base address, you may end up getting information from both devices.
If you are installing a board whose default base address is the same as that of one already on the system, one of them needs to be changed before both can work. Additionally, you are almost always asked for the base address of a card during its installation. Therefore, you will need to keep track of this. See the section on troubleshooting for tips on maintaining a notebook with this kind of information.
Table 0.1 contains a list of the more common devices and the base address ranges that they use:
| Hex Range | Device                                                      |
|-----------|-------------------------------------------------------------|
| 000-0ff   | Motherboard devices (DMA Controller, PIC, timer chip, etc.) |
| 1f0-1f8   | Fixed disk controller (WD10xx)                              |
| 278-27f   | Parallel port 2                                             |
| 2f8-2ff   | Serial port 2                                               |
| 378-37f   | Parallel port 1                                             |
| 3bc-3bf   | Monochrome display and parallel port 2                      |
| 3c0-3cf   | EGA or VGA adapter                                          |
| 3d0-3df   | CGA, EGA or VGA adapter                                     |
| 3f0-3f7   | Floppy disk controller                                      |
| 3f8-3ff   | Serial port 1                                               |
Table 0.1 Common hex addresses
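Since base addresses can never be shared, it can help to check a new card's range against the ones already in use before you install it. The following Python sketch does this against the ranges in Table 0.1. It is only an illustration: the function name is invented, and on a real system you would use the addresses your own machine reports rather than this generic list.

```python
# Ranges copied from Table 0.1 (inclusive, in hex).
COMMON_RANGES = [
    (0x000, 0x0FF, "Motherboard devices"),
    (0x1F0, 0x1F8, "Fixed disk controller"),
    (0x278, 0x27F, "Parallel port 2"),
    (0x2F8, 0x2FF, "Serial port 2"),
    (0x378, 0x37F, "Parallel port 1"),
    (0x3BC, 0x3BF, "Monochrome display and parallel port 2"),
    (0x3C0, 0x3CF, "EGA or VGA adapter"),
    (0x3D0, 0x3DF, "CGA, EGA or VGA adapter"),
    (0x3F0, 0x3F7, "Floppy disk controller"),
    (0x3F8, 0x3FF, "Serial port 1"),
]


def overlapping(start, end, ranges=COMMON_RANGES):
    """Names of the table entries whose range overlaps [start, end]."""
    # Two inclusive ranges overlap when each one starts before the other ends.
    return [name for lo, hi, name in ranges if start <= hi and lo <= end]


print(overlapping(0x300, 0x31F))   # a range that is free in this table
print(overlapping(0x3F0, 0x3FF))   # a range that collides with two entries
```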
It is generally understood that the speed and capabilities of the CPU are directly related to the performance of the system as a whole. In fact, the CPU is a major selling point of PCs, especially among less experienced users. One aspect of the machine that is less understood and therefore less likely to be an issue is the expansion bus.
The expansion bus, simply put, is the set of connections and slots that allow users to add to, or expand, their system. Although not really an "expansion" of the system, you often find video cards and hard disk controllers attached to the "expansion bus."
Anyone who has opened up their machine has seen parts of the expansion bus. The slots used to connect cards to the system are part of this bus. A thing to note is that people will often refer to this bus as "the bus." While it will be understood what is meant, there are other buses on the system. Just keep this in mind as you go through this chapter.
Most people are aware of the differences in CPUs. This could be whether the CPU is 16-bit or 32-bit, what the speed of the processor is, whether there is a math co-processor, and so on. The concepts of BIOS and interrupts are also very commonly understood.
One part of the machine's hardware that is somewhat less known and often causes confusion is the bus architecture. This is the basic way in which the hardware components (usually on the motherboard) all fit together. There are three different bus architectures on which SCO operating systems will currently run. (Note: Here I am referring to the main system bus, although SCO can access devices on other buses.)
The three major types of bus architectures used are the Industry Standard Architecture (ISA), the Extended Industry Standard Architecture (EISA), and the Micro-Channel Architecture (MCA). Both ISA and EISA machines are manufactured by a wide range of companies, but only a few (primarily IBM) manufacture MCA machines.
In addition to the three mentioned above, there are a few other bus types that can be used in conjunction with, or as a supplement to, these three. These include the Small Computer System Interface (SCSI), Peripheral Component Interconnect (PCI) and the Video Electronics Standards Association Local Bus (VL-Bus or VLB).
Both PCI and VLB exist as separate buses on the computer motherboard. Expansion cards exist for both these types of buses. You will usually find either PCI or VLB in addition to either ISA or EISA. Sometimes, however, you can also find both PCI and VLB in addition to the primary bus. In addition, it is possible to have machines that only have PCI, since it is a true system bus and not an expansion bus like VLB. However, as of this writing few machines provide PCI-only expansion buses.
SCSI, on the other hand, complements the existing bus architecture by adding a hardware controller to the system. There are SCSI controllers (more commonly referred to as host adapters) that fit in ISA, EISA, MCA, PCI or VL-Bus slots.
As I mentioned before, most people are generally aware of the relationship between CPU performance and system performance. However, every system is only as strong as its weakest component. Therefore, the expansion bus also sets limits on the system performance.
There were several drawbacks with the original expansion bus in the original IBM PC. First, it was limited to only 8 data lines. This meant that only 8 bits could be transferred at a time. Second, the expansion bus was, in a way, directly connected to the CPU. Therefore, it operated at the same speed as the CPU. This meant that in order to improve performance with the CPU, the expansion bus had to be altered as well. The result would have been that existing expansion cards would be obsolete.
In the early days of PC computing, IBM was not known to want to cut its own throat. It had already developed quite a following with the IBM PC among users and developers. If it decided to change the design of the expansion bus, developers would have to re-invent the wheel and users would have to buy all new equipment. Instead of sticking with IBM, there was the risk that users and developers would switch to another platform.
Rather than risking that, IBM decided that backward compatibility was a paramount issue. One of the key changes was severing the direct connection between the expansion bus and CPU. As a result expansion boards could operate at a different speed than the CPU. This allowed users to keep existing hardware and allowed manufacturers to keep producing their expansion cards. As a result, the IBM standard became the industry standard and the bus architecture became known as the Industry Standard Architecture, or ISA.
In addition to this change, IBM added more address and data lines. They doubled the data lines to 16 and increased the address lines to 24. This meant that the system could address up to 16 megabytes of memory, the maximum that the 80286 CPU (Intel's newest central processor at the time) could handle.
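The relationship between the number of address lines and the amount of address space is simply a power of two. As a quick check of the figures used in this chapter, here is a short Python sketch (the function name is just for illustration):

```python
KB = 1024
MB = 1024 * KB


def addressable_bytes(address_lines):
    """Number of distinct addresses that a given count of lines can select."""
    return 2 ** address_lines


print(addressable_bytes(10) // KB, "K")    # 10 wired I/O address lines
print(addressable_bytes(16) // KB, "K")    # all 16 I/O address lines
print(addressable_bytes(24) // MB, "MB")   # 24 memory address lines on ISA
```

Each additional address line doubles the space, which is why going from 10 to 16 lines takes the I/O space from 1K to 64K.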
When the 80386 came out, the connection between the CPU and bus clocks was severed completely, since no expansion board could operate at the 16MHz or more that the 80386 could. The bus speed does not need to be an exact fraction of the CPU speed, but an attempt has been made to keep it there, since by keeping the bus and CPU synchronized it is easier to transfer data. The CPU will only accept data when it coincides with its own clock. If an attempt is made to speed up the bus a little, the data must wait until the right moment in the CPU's clock cycle before it can be passed. Therefore, nothing has been gained by making it faster.
One method used to speed up the transfer of data is Direct Memory Access or DMA. Although DMA existed in the IBM XT, the ISA bus provided some extra lines. DMA allows the system to move data from place to place without the intervention of the CPU. In that way, data can be transferred from, let's say, the hard disk to memory while the CPU is working on something else. Keep in mind that in order to make the transfer, the DMA controller must have complete control of both the data and the address lines, so the CPU cannot be accessing memory itself at this time.
Figure 0
Let's step back here a minute. It is somewhat of a misnomer to say that a DMA transfer occurs without intervention from the CPU, as it is the CPU that must initiate the transfer. However, once the transfer is started, the CPU is free to continue with other activities. DMA controllers on ISA-Bus machines use "pass-through" or "fly-by" transfers. That is, the data is not latched, or held internally, but rather simply passes through the controller. If it were latched, two cycles would be needed: one to latch it into the DMA controller and the second to pass it to the device or memory (depending on which way it was headed).
Devices tell the DMA controller that they wish to make DMA transfers through the use of one of three "DMA Request" lines, numbered 1-3. Each of these lines is given a priority based on its number, with 1 being the highest. The ISA-Bus includes two sets of DMA controllers. There are four 8-bit channels and four 16-bit channels. The channels are labeled 0-7, with 0 having the highest priority.
Each device on the system that is capable of doing DMA transfers is given its own DMA channel. The channel is set on the expansion board, usually by means of jumpers. The pins that these jumpers are connected to are usually labeled DRQ, for DMA Request.
The two DMA controllers (both Intel 8237), each with 4 DMA channels, are cascaded together. The master DMA controller is the one that is connected directly to the CPU. One of its DMA channels is used to connect to the slave controller. Because of this, there are actually only seven channels available.
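The channel count can be written out as a tiny Python sketch. Note the assumptions: the text above does not say which channel carries the cascade or which channel numbers are the 8-bit ones. The sketch assumes channel 4 is the cascade link and channels 0-3 are the 8-bit channels, which is the conventional AT arrangement, but treat both as assumptions rather than something taken from this chapter.

```python
CHANNELS_PER_CONTROLLER = 4
NUM_CONTROLLERS = 2

# ASSUMPTION: channel 4 links the two controllers and channels 0-3 are the
# 8-bit ones. The text only says one channel is used for the cascade and
# that there are four channels of each width.
CASCADE_CHANNEL = 4


def usable_channels():
    """Channels left over for devices once the cascade link is removed."""
    total = CHANNELS_PER_CONTROLLER * NUM_CONTROLLERS
    return [ch for ch in range(total) if ch != CASCADE_CHANNEL]


def transfer_width(channel):
    """Width in bits of a channel, under the assumption stated above."""
    return 8 if channel < CHANNELS_PER_CONTROLLER else 16


print(usable_channels())
print(len(usable_channels()), "channels available to devices")
```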
Everyone who has had a baby knows what an interrupt driven operating system like SCO UNIX goes through on a regular basis. Just like a baby when it needs its diaper changed, when a device on the expansion bus needs servicing it tells the system by generating an interrupt. For example, when the hard disk has transferred the requested data to or from memory, it signals the CPU by means of an interrupt. When keys are pressed on the keyboard, the keyboard interface also generates an interrupt.
Upon receipt of such an interrupt, the system executes a set of functions commonly referred to as an Interrupt Service Routine, or ISR. Since the reaction to a key being pressed on the keyboard is different from the reaction when data is transferred from the hard disk, there need to be different ISRs for each device. Although the behavior of ISRs is different under DOS than under UNIX, their functionality is basically the same. For details of how this works under SCO, see the chapter on the kernel.
On the CPU there is a single interrupt request line. This does not mean that every device on the system is connected to the CPU via this single line. Just like there is a DMA controller to handle DMA requests, there is also an interrupt controller to handle interrupt requests. This is the Intel 8259 Programmable Interrupt Controller, or PIC.
On the original IBM PC, there were six "Interrupt Request" lines, numbered 2-7. Here again, the higher the number, the lower the priority. (Interrupts 0 and 1 are used internally and are not available for expansion cards.)
The ISA-Bus also added an additional PIC, which is "cascaded" off the first one. With this addition, there were now 16 interrupt values on the system. However, not all of these were available to devices. Interrupts 0 and 1 were still used internally, as were interrupts 8 and 13. Interrupt 2 was something special. It too was reserved for system use, but instead of being assigned to a device of some kind, an interrupt on line 2 actually means that an interrupt is coming from the second PIC, similar to the way cascading works on the DMA controller.
A question that I brought up when I first started learning about interrupts is "What happens when the system is servicing an interrupt and another one comes in?" Well, there are two mechanisms to help with this.
Remember that the 8259 is a "programmable" interrupt controller. There is a machine instruction called 'Clear Interrupt Enable' or CLI. If a program is executing what is called a critical section of code (one that should not be stopped in the middle), the programmer can call the CLI instruction and disable acknowledgment of all incoming interrupts. As soon as the critical section is left, the program should execute a 'Set Interrupt Enable', or STI, instruction in a timely manner.
I say "should" because the programmer doesn't have to. There could be a CLI instruction in the middle of a program somewhere and if the STI is never called, no more interrupts will be serviced. Nothing, aside from common sense, prevents him or her from doing this. Should the program take too long before it calls the STI, interrupts could get lost. This is common on busy systems when characters from the keyboard 'disappear'.
The second mechanism is that the interrupts are priority based. The lower the interrupt request level, or IRQ, the higher the priority. This has an interesting side effect since the second PIC (or slave) is bridged off the first PIC (or master) at IRQ2. The interrupts on the first PIC are numbered 0-7 and on the second PIC 8-15. However, interrupt 2 is where the slave PIC is attached to the master. Therefore, the actual priority is 0,1,8-15,3-7.
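The priority ordering that results from the cascade can be derived mechanically. The Python sketch below is only an illustration with invented function names: it walks the master PIC's inputs in order and, when it reaches the input where the slave is attached, slots in all of the slave's interrupts at that point.

```python
def priority_order(cascade_irq=2, per_pic=8):
    """IRQ numbers from highest to lowest priority for two cascaded PICs."""
    order = []
    for irq in range(per_pic):                 # master inputs 0-7
        if irq == cascade_irq:
            # The slave's inputs (8-15) take the place of the cascade input.
            order.extend(range(per_pic, 2 * per_pic))
        else:
            order.append(irq)
    return order


def highest_priority(pending):
    """Pick which of several pending IRQs would be serviced first."""
    for irq in priority_order():
        if irq in pending:
            return irq
    return None


print(priority_order())
print(highest_priority({3, 14}))   # the hard disk on 14 beats COM 2 on 3
```

This is why a device on IRQ 14 is serviced ahead of one on IRQ 3, even though 14 is the larger number.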
Table 0.2 contains a list of the standard interrupts.
| IRQ | Device              |
|-----|---------------------|
| 0   | system timer        |
| 1   | keyboard            |
| 2   | 2nd level interrupt |
| 3   | COM 2               |
| 4   | COM 1               |
| 5   | Printer 2           |
| 6   | floppy              |
| 7   | Printer 1           |
| 8   | clock               |
| 9   | not assigned        |
| 10  | not assigned        |
| 11  | not assigned        |
| 12  | not assigned        |
| 13  | math co-processor   |
| 14  | Hard Disk           |
| 15  | Hard Disk           |
One further consideration had to be made when dealing with interrupts. On XT machines, IRQ 2 was a valid interrupt. On AT machines, IRQ 2 is bridged to the second PIC. So, in order to ensure that devices configured to IRQ 2 worked properly, the IRQ 2 pin on all the expansion slots was connected to the IRQ 9 input of the second PIC. In addition, all the devices attached to the second PIC have an IRQ value associated with where they are attached to the PIC, and they also generate an IRQ 2 on the first PIC.
The PICs on an ISA machine are edge-triggered. This means that they react only when the interrupt signal is transitioning from low to high. That is, it is on a transition edge. This becomes an issue when you attempt to share interrupts. This is where two devices use the same interrupt.
Assume you have a serial port and floppy controller both at interrupt 6. If the serial port generates an interrupt, the system will "service" it. If the floppy controller generates an interrupt before the system has finished servicing the interrupt for the serial port, the interrupt from the floppy gets lost. There is another way to react to interrupts called "level-triggered," which we will get to shortly.
As I mentioned earlier, a primary consideration in the design of the AT Bus (as the changed PC bus came to be called) was that it maintained compatibility with its predecessors. It maintains compatibility with the PC expansion cards, but takes advantage of 16-bit technology. In order to do this, connectors were not changed, only added. Therefore, cards designed for the 8-bit PC bus could be slid right into a 16-bit slot on the ISA-Bus and no one would know the difference.
The introduction of IBM's Micro Channel Architecture (MCA) was a redesign of the entire bus architecture. Although IBM was the developer of the original AT architecture, which later became ISA, there were many companies producing machines that followed this standard. The introduction of MCA meant that IBM could produce machines that it alone had the patent rights to.
One of the most obvious differences is the smaller slots required for MCA cards. ISA cards are 4.75 x 13.5 inches, compared with the 3.5 x 11.5 inches of MCA cards. As a result, the same number of cards can fit into a smaller area. The drawback is that ISA cards cannot fit into MCA slots and MCA cards cannot fit into ISA slots. Although this might seem like IBM had decided to cut its own throat, the changes made in creating MCA made it very appealing.
Part of the decrease in size was a result of surface mount components or surface mount technology (SMT). Previously, cards used "through-hole" mounting, where holes were drilled through the system board (hence the name). Chips were mounted in these holes or in holders that were mounted in the holes. Surface mount does not use this and, as a result, looks "flattened" by comparison. This not only saves space, but also time and money, as SMT cards are easier to produce. In addition, the spacing between the pins on the card (0.050") corresponds to the spacing on the chips. This makes designing the boards much easier.
Micro Channel also gives increases in speed since there is a ground on every fourth pin. This reduces interference and, as a result, the MCA bus can operate at ten times the speed of non-MCA machines and still comply with FCC regulations in terms of radio frequency interference.
Another major improvement was the expansion of the data and address buses to 32 bits. With 32 address lines, machines were no longer limited to 16 megabytes of memory, but could now access 4 gigabytes.
One of the key changes in the MCA architecture was the concept of hardware-mediated bus arbitration. With ISA machines, devices could share the bus, and the OS was required to arbitrate who got a turn. With MCA, that arbitration is done at the hardware level, freeing the OS to work on other things. This also enables multiple processors to use the bus. To implement this, there are several new lines on the bus. Four lines determine the arbitration bus priority level, which allows for 16 different priority levels that a device could have. Who gets the bus depends on the priority.
From the user's perspective, installation of MCA cards is much easier than for ISA cards. This is due to the introduction of the Programmable Option Select, or POS. With this, the entire hardware configuration is stored in the CMOS. When new cards are added, you are required to run the machine's reference disk. In addition, each card comes with an options disk which contains configuration information for the card. With the combination of reference disk and options disk, conflicts are all but eliminated.
Part of the MCA spec is that each card has its own unique identifying number encoded into the firmware. When the system boots, the settings in the CMOS are compared to the cards that are found on the bus. If one has been added or removed, the system requires you to boot using the reference disk to ensure things are set up correctly.
As I mentioned, on each options disk is the necessary configuration information. This is contained within the Adapter Description File (ADF). The ADF contains all the necessary information to get the expansion card to be recognized by your system. Because it is only a few kilobytes in size, many ADF files can be stored on a floppy. This is useful in situations like we had in SCO Support. There were several MCA machines in the department, with dozens of expansion cards, each with its own ADF file. Rather than having copies of each of the diskettes, the analysts who supported MCA machines (myself included) each had a single disk with all the ADF files. (Eventually that too became burdensome, so we copied the ADF files into a central directory where we could copy them as needed.) Any time we needed to add a new card to our machines for testing, we didn't need to worry about the ADF files, as they were all in one place.
Since each device has its own identification number and this number is stored in the ADF, the reference diskette can find the appropriate one with no problem. All ADF files have names such as @BFDF.ADF, so it isn't obvious what kind of card the ADF file is for, just by looking at the name. However, since the ADF files are simply text files, it is easy to figure out by looking at the contents.
Unlike ISA machines, the MCA architecture allows for interrupt sharing. Since many expansion boards are limited to a small range of interrupts, it is often difficult, if not impossible, to configure every combination on your system. Interrupt sharing is possible on MCA machines because they use something called level-triggered interrupts or level-sensitive interrupts.
With edge-triggered interrupts, or edge-sensitive interrupts, which the standard ISA bus uses, an interrupt is generated and then drops. This sets a flag in the PIC, which figures out which device generated the interrupt and services it. If interrupts were shared with edge-triggered interrupts, any interrupt that arrived between the time the first one is generated and serviced would be lost. This is because the PIC has no means of knowing that a second one occurred. All it sees is that an interrupt occurred.
Figure 0
With level-triggered interrupts, when an interrupt is generated it is held high until the PIC forces it low after the interrupt has been serviced. If another device were on the same interrupt, the PIC would try to pull down the interrupt line; however, the second device would keep it high. The PIC would then see that it was high and would be able to service the second device.
Despite the many obvious advantages of the MCA, there are a few drawbacks. One of the primary drawbacks is the interchangeability of expansion cards between architectures. MCA cards can only fit in MCA machines. However, it is possible to use an ISA card in an EISA machine, and EISA machines are what we will talk about next.
In order to break the hold that IBM had on the 32-bit bus market with the Micro-Channel Architecture, a consortium of computer companies, led by Compaq, issued their own standard in September, 1988. This new standard was an extension of the ISA bus architecture and was (logically) called the Extended Industry Standard Architecture (EISA). EISA offered many of the same features as MCA, but with a different approach.
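The difference between the two schemes is easier to see side by side. The following Python sketch is a deliberately simplified model of the behavior described above, not of the real 8259: the class and function names are invented, and "servicing" is reduced to recording a device name. In both cases, two devices on the same line raise an interrupt before the first one has been serviced.

```python
class EdgeTriggeredLine:
    """One shared IRQ line where the PIC keeps a single 'pending' flag."""

    def __init__(self):
        self.pending = False
        self.source = None
        self.serviced = []

    def raise_irq(self, device):
        if self.pending:
            return False          # flag already set: this interrupt is lost
        self.pending = True
        self.source = device
        return True

    def service(self):
        if self.pending:
            self.serviced.append(self.source)
            self.pending = False
            self.source = None


class LevelTriggeredLine:
    """One shared IRQ line that stays high while any device holds it."""

    def __init__(self):
        self.holding = []         # devices currently holding the line high
        self.serviced = []

    def raise_irq(self, device):
        self.holding.append(device)
        return True

    def service(self):
        # Keep servicing for as long as the line is still high.
        while self.holding:
            self.serviced.append(self.holding.pop(0))


def two_devices_share(line):
    """Both devices interrupt before the first one has been serviced."""
    line.raise_irq("serial")
    line.raise_irq("floppy")
    line.service()
    return line.serviced


print(two_devices_share(EdgeTriggeredLine()))
print(two_devices_share(LevelTriggeredLine()))
```

In the edge-triggered case only the first device is ever serviced; in the level-triggered case the line is still high after the first device has been handled, so the second one gets its turn.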
Although EISA provides some major improvements, it has maintained backward compatibility with ISA boards. Therefore, existing ISA boards can be used in EISA machines. In some cases, such boards can even take advantage of the features that EISA offers.
In order to maintain this compatibility, EISA boards are the same physical size as their ISA counterparts and provide connections to the bus in the same locations. The original design called for an extension of the bus slot, similar to the way the AT slots were an extension of the XT slots. However, this was deemed impractical, as some hardware vendors had additional contacts that extended beyond the ends of the slots. There was also the issue that in most cases, the slots would extend the entire length of the motherboard. This meant that the motherboard would need to be either longer or wider to handle the longer slots.
Instead, the current spec calls for the additional connections to be "intertwined" with the old ones and extending lower. In what used to be gaps between the connectors, there are now leads to the new connectors. Therefore, EISA slots are deeper than those for ISA machines. By looking at EISA cards you can easily tell them from ISA cards by the two rows of connectors.
Figure 0 shows what the ISA and EISA connections look like. Note that this is not to scale.
Figure 0
Another major improvement of EISA over ISA is the issue of bus arbitration. Bus arbitration is the process by which devices "discuss" whose turn it is on the bus and then let one of them go. In XT and AT class machines, control of the bus was completely managed by the CPU. EISA includes additional control hardware to take this job away from the CPU. This does two important things. First, the CPU is now "free" to carry on more important work, and second, the CPU gets to use the bus only when its turn comes around.
Hmmm. Does that sound right? Since the CPU is the single most important piece of hardware on the system, shouldn't it get the bus whenever it needs it? Well, yes and no. The key issue of contention is the use of the word "single." EISA was designed with multi-processing in mind, that is, computers with more than one CPU. If there is more than one CPU, which one is more important?
This is where bus arbitration comes in. Each of the six devices that EISA allows to take control of the bus has its own priority level. A device signals its desire for the bus by sending a signal to the Centralized Arbitration Control (CAC) unit. If conflicts arise (i.e., multiple requests), the CAC unit resolves them according to the priority of the requesting devices. Certain activities, such as DMA and memory refresh, have the highest priority, with the CPU following close behind. Such devices are called "bus mastering devices" or "bus masters" as they become the master of the bus.
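To make the idea of priority-based arbitration a little more concrete, here is a minimal sketch in Python. It is only an illustration: the device names and the numeric priority values are invented for the example, and the real CAC is a piece of hardware, not a program. The only thing taken from the text is the ordering, with memory refresh and DMA ahead of the CPU.

```python
# Hypothetical priority table: a lower number means a higher priority.
# The ordering (refresh and DMA ahead of the CPU) follows the text;
# the names and the numbers themselves are invented for this sketch.
PRIORITY = {
    "memory_refresh": 0,
    "dma": 1,
    "cpu": 2,
    "bus_master_card": 3,
}

def arbitrate(requests):
    """Return the requesting device that is granted the bus, or None."""
    if not requests:
        return None
    return min(requests, key=lambda device: PRIORITY[device])

print(arbitrate(["cpu", "bus_master_card"]))   # cpu
print(arbitrate(["cpu", "dma"]))               # dma
```

When only the CPU and an add-in card want the bus, the CPU wins. As soon as a DMA transfer is requested at the same time, the CPU has to wait its turn.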
The EISA DMA controller was designed for devices that cannot take advantage of the bus mastering capabilities of EISA. The DMA controller supports ISA, with ISA timing and 24-bit addressing as the default mode. However, it can be configured by EISA devices to take full advantage of the 32-bit capabilities.
Another advantage that EISA has is the concept of dual buses. Since cache memory is considered a basic part of the EISA specification, the CPU can often continue working for some time even if it does not have access to the bus.
A major drawback of EISA (as compared with MCA) is that in order to maintain compatibility with ISA, EISA speed improvements cannot extend into memory. This is because the ISA bus cannot handle the speed requirements of the high-speed CPUs. Therefore, EISA requires separate memory buses. This results in every manufacturer having its own memory expansion cards.
In our discussion on ISA we talked about the problems with sharing edge-triggered interrupts. MCA, on the other hand, uses level-triggered interrupts, which allow interrupt sharing. EISA uses a combination of the two. Obviously, it needs to support edge-triggered interrupts to maintain compatibility with ISA cards. However, it allows EISA boards to configure a particular interrupt as either edge- or level-triggered.
As with MCA, EISA allows each board to be identified at boot-up. Each manufacturer is assigned a prefix code to ease identification of the board. EISA also provides a configuration utility, similar to the MCA reference disk, to allow configuration of the cards. This is often referred to as the EISA-config, EISA Configuration Utility or ECU. In addition, EISA supports automatic configuration, which allows the system to recognize the hardware at boot-up and configure itself accordingly. This can present problems for SCO systems, as drivers in the kernel rely on the configuration remaining constant. Since each slot on an EISA machine is given a particular range of base addresses, it is necessary to modify your kernel prior to making such changes.
As I've said before and I'll say again, the system is only as good as its weakest part. With computer systems, that weakest link has been the I/O subsystem for many years. CPUs got faster, but the system was still limited by slow communication with the outside world. The 32-bit buses of MCA and EISA made significant advances and increased throughput by a factor of 5 or more. However, this was not enough.
The Video Electronics Standards Association, or VESA (a consortium of over 120 companies), came up with an immediate solution to this problem. Although originally intended as a means of speeding up video transfer, the VESA local bus, or VL-Bus, can achieve data transfer speeds that make it a worthy partner to fast 80386 and 80486 CPUs and even the Intel Pentium.
Like EISA, the VL-Bus is a hybrid. That is, it is not a complete change from ISA as MCA is. Whereas EISA interleaves the new connections with the old, the VL-Bus extends the existing slots, something EISA decided not to do. Because of the load put on the system by the VL-Bus, usually only three slots on the motherboard have the VL-Bus extension. The others remain just ISA, EISA or MCA.
The reason for the three-card limit is one of performance. There is a slight cost increase for adding the extra connectors and traces; however, the lure of the increased performance would outweigh the cost. Alas, things are not that easy. The CPU directly accesses the control, address and data pins of the VL-Bus cards. (That's why it's called local.) However, unless you want to reduce the speed of the CPU (ya, right), the CPU just can't handle more than three external loads. In practice, this means that although there are three slots, the CPU can't drive more than one or two at speeds greater than 33MHz.
On the other hand, it is relatively inexpensive to change an existing ISA or EISA design into a VL-Bus design. There are a few new chips, a couple of new traces on the motherboard and two or three new connectors. There isn't even a change to the BIOS.
VL-Bus is not intended as a replacement for ISA, although MCA and EISA sell themselves as such (or as a replacement for each other, depending on whose literature you read). Current technology doesn't seem to allow it. As I mentioned, you can only have one or two VL-Bus devices before you have to consider reducing your CPU speed. Therefore, you have to have some other kind of bus slots as well.
ISA/EISA slots are the same length, with VLB slots hanging down "below" them. Because the VL-Bus slots are an extension of the existing slots, it is not necessary to leave those slots empty if you have only one or two VL-Bus cards. In fact, all the slots with the VL-Bus extension can be filled with other cards (ISA, EISA or MCA).
Watch out for machines that are advertised as "local bus". It is true that they might be; however, there is a catch. Sometimes they have an SVGA chip or hard disk controller built onto the motherboard. These are connected directly to the CPU and are therefore "local", but they do not adhere to the VL-Bus spec.
More and more machines on the market today include PCI local buses. The advantages that PCI offers over VL-Bus are higher performance, automatic configuration of peripheral cards, and superior compatibility. A major drawback of the other bus types (ISA, EISA, MCA) is the I/O bottleneck. Local buses overcome this by accessing memory using the same signal lines as the CPU. As a result, they can operate at the full speed of the CPU as well as utilize the 32-bit data path. Therefore, I/O performance is limited by the card and not the bus.
Although PCI is referred to as a local bus, it actually lies somewhere "above" the system bus. As a result, it is often referred to as a "mezzanine bus" and has electronic "bridges" between the system bus and the expansion bus. Because of this, the PCI bus can support up to 5 PCI devices, whereas the VL-Bus can only support two or three. In addition, the PCI bus can reach transfer speeds four times that of EISA or MCA.
Despite PCI being called a mezzanine bus, it could replace ISA, EISA or MCA buses, although in most cases PCI is offered as a supplement to the existing bus type. If you look at a motherboard with PCI slots, you will see that they are completely separate from the other slots, whereas VLB slots are extensions of the existing slots.
PCI offers additional advantages over the VLB, as the VLB cannot keep up with the speed of the faster CPUs, especially if there are multiple VLB devices on the system. Because PCI works together with the CPU, it is much better suited to multi-tasking operating systems like UNIX, whereas the CPU cannot work independently if a VLB device is running.
Like EISA and MCA, PCI boards have configuration information built into the card. As the computer is booting, the system can configure each card individually based on system resources. This configuration is done "around" existing ISA, EISA and MCA cards on your system.
To overcome a shortcoming PCI has when transferring data, Intel (designer and chief proponent of PCI) has come up with PCI-specific chip sets, which allow data to be stored on the PCI controller, freeing the CPU to do other work. Although this may delay the start of the transfer, once the data flow starts, it should continue uninterrupted.
A shortcoming of PCI (at least from SCO's perspective) is that, whereas ISA and EISA cards can be swapped for VLB cards without any major problems, this is not so for PCI cards. Significant changes need to be made to both the kernel and device drivers to account for the differences.
The SCSI bus is an extension of your existing bus. A controller card, called a host adapter, is placed into one of your expansion slots. A ribbon cable, containing both data and control signals, then connects the host adapter to your peripheral devices.
There are several advantages to having SCSI in your system. If you have a limited number of bus slots, then the addition of a single SCSI host adapter allows you to add up to seven more devices while taking up only one slot with older SCSI systems, and up to 15 devices with wide-SCSI. SCSI has higher throughput than either IDE or ESDI. SCSI also supports many more different types of devices.
There are five different types of SCSI devices. The original SCSI specification is commonly referred to as SCSI-1. The newer specification, SCSI-2, offers speed and performance increases over SCSI-1 as well as adding new commands. Fast-SCSI increases throughput to over 10MB/second. Fast-Wide SCSI provides a wider data path and throughput of up to 40MB/second and up to 15 devices. The last type, SCSI-3, is still being developed as of this writing and will provide the same functionality as Fast-Wide SCSI as well as support longer cables and more devices.
Each SCSI device has its own controller and can send, receive and execute SCSI commands. As long as it communicates with the host adapter using proper SCSI commands, internal data manipulation is not an issue. In fact, most SCSI hard disks have an IDE controller with a SCSI interface built onto them.
Because there is a standard set of SCSI commands, new and different kinds of devices can be added to the SCSI family with little trouble. IDE and ESDI, however, are limited to disk-type devices. Because the SCSI commands need to be "translated" by the device, there is a slight overhead. This is compensated for by the fact that SCSI devices are intrinsically faster than non-SCSI devices. SCSI devices also have higher data integrity than non-SCSI devices. The SCSI cable consists of 50 pins, half of which are ground. Since every signal pin has its own ground, the cable is less prone to interference and therefore has higher data integrity.
On each SCSI host adapter there are two connectors. One is at the top of the card (opposite the bus connectors) and is used for internal devices. A flat ribbon cable is used to connect each device to the host adapter. On internal SCSI devices, there is only one connector on the device itself. Should you have external SCSI devices, there is a connector on the end of the card (where it attaches to the chassis). Here SCSI devices are "daisy-chained" together.
The SCSI bus needs to be closed in order to work correctly. By this I mean that each end of the bus must be terminated. There is usually a set of resistors (or slots for resistors) on each device. The device that is physically at either end of the SCSI bus needs to have such resistors. This is referred to as terminating the bus, and the resistors are called terminating resistors.
It's fine to say that the SCSI bus needs to be terminated. However, that doesn't do much to help your understanding of the issue. As with other kinds of devices, SCSI devices react to commands sent along the cable to them. Unless otherwise impeded, the signals reach the end of the cable and bounce back. There are two outcomes, both of which are undesirable: either the bounced signal interferes with the valid one, or the devices react to a second (unique in their minds) command. By placing a terminator at the end of the bus, the signals are "absorbed" and, therefore, don't bounce back.
Figure 02836 and Figure 02837 show examples of how the SCSI bus should be terminated. Note that Figure 02836 says that it is an example of "all external devices." Keep in mind that the principle is still the same for internal devices. If all the devices are internal, then the host adapter would still be terminated, as would the last device in the chain.
Figure 0
Figure 0
If you don't have any external devices (or only external) then the host adapter is at one end of the bus. Therefore, it too must be terminated. Many host adapters today can be terminated in software; therefore, there is no need for terminating resistors (also known as resistor packs).
Each SCSI device is "identified" by a unique pair of addresses: the controller address and the device address. The controller address is also referred to as the SCSI ID and is usually set by jumpers or DIP switches on the device itself. Keep in mind that the ID is something that is set on the device itself and is not related to its location on the bus. Note that in Figure 02836, above, the SCSI IDs of the devices are ordered 0, 6 and 5.
Care must be taken when setting the SCSI ID. It is important that you are sure of what the setting is; otherwise the system will not be able to talk to the device. OpenServer supports SCSI host adapters with multiple buses, so this is a triplet of numbers rather than a pair. This increases the possibility of mistakes by 50%.
This sounds pretty obvious, but some people don't make sure. They make assumptions about what they see on the device as to how the ID is set and do not fully understand what it means. For example, I have an Archive 5150 SCSI tape drive. On the back are three jumpers, labeled 0,1 and 2.
I have had customers call in with similar hardware with their SCSI tape drive set at 2. After running 'mkdev tape' and rebooting, they still cannot access the tape drive. Nothing else is set at ID 2, so there are no conflicts. The system can access other devices on the SCSI bus, so the host adapter is probably okay. Different SCSI devices can be plugged into the same spot on the SCSI cable, so it's not the cable. The SCSI bus is terminated correctly, so that's not the problem.
Rather than simply giving up and saying that it was a hardware problem, I suggested that the customer change the SCSI ID to 3 or 4 to see if that works. Well, he can't. The jumpers on the back only allow him to change the SCSI ID to 0, 1 or 2. Then it dawns on me what the problem is. The jumpers in the back are in binary! In order to set the ID to 2, the jumper needs to be on jumper 1 and not jumper 2. Once we switched it to jumper 1 and rebooted, all was well. (Note: I had this customer before I bought the Archive tape drive. When I got my drive home and wanted to check the SCSI ID, I saw only three jumpers. I then did something that would appall most users: I read the manual! Sure enough, it explained that the jumpers for the SCSI ID were binary.)
Figure 0
An additional problem with this whole SCSI ID business is that manufacturers are not consistent. Some might label the jumpers (or switches) 0, 1 and 2. Others label them 1, 2 and 4. Still others label them ID0, ID1, ID2. I have even seen some with a dial on them with 8 settings, which makes configuration a lot easier. The key is that no matter how they are labeled, three pins or switches are binary, and their values are added to give you the SCSI ID.
Let's look at Figure 02838. This represents the jumper settings on a SCSI device. In the first example, none of the jumpers are set, so the SCSI ID is 0. In the second example, the jumper labeled 1 is set. This is 2^1 or 2, so the ID here is 2. In the last example, the jumpers labeled 2 and 0 are set. This is 2^2 + 2^0 = 4 + 1 or 5.
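If you'd rather let the machine do the arithmetic, the same calculation can be sketched in a couple of lines of Python. The function name scsi_id is made up for this example, and it assumes the jumpers are labeled with bit positions (0, 1 and 2), as on my tape drive.

```python
def scsi_id(jumpers_set):
    """Compute a SCSI ID from the jumper positions that are closed.

    jumpers_set holds bit positions (0, 1, 2), i.e. the labels used on
    drives that mark their jumpers 0, 1 and 2.
    """
    return sum(2 ** position for position in jumpers_set)

print(scsi_id([]))       # 0
print(scsi_id([1]))      # 2
print(scsi_id([0, 2]))   # 5
```

On a drive whose jumpers are labeled 1, 2 and 4, the labels are already the values, so you simply add up the labels of the jumpers that are set.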
On an AT-bus, the number of devices added is limited only by the number of slots (Granted the AT-Bus is limited in how far the slot can be away from the CPU and therefore is limited in the number of slots). However, on a SCSI bus, there can be only seven devices in addition to the host adapter. Whereas devices on the AT-bus are distinguished by their base address, devices on the SCSI bus are distinguished by their ID number.
ID numbers range from 0-7 and, unlike base addresses, the higher the ID the higher the priority. Therefore, the ID of the host adapter should always be 7. Since it manages all the other devices, it should have the highest priority. On the newer Wide-SCSI buses, there can be up to 15 devices, plus the host adapter, with SCSI IDs from 0-15.
Now back to our story...
The device address is known as the logical unit number (LUN). On devices with embedded controllers, such as hard disks, the LUN is always 0. All the SCSI devices directly supported by SCO UNIX have embedded controllers. Therefore, you are not likely to see devices set at LUNs other than 0.
In theory, a single-channel SCSI host adapter can support 56 devices. There are devices called bridge adapters that connect devices without embedded controllers to the SCSI bus. Devices attached to the bridge adapter have LUNs from 0 to 7. If there are 7 bridge adapters, each with 8 LUNs (relating to 8 devices), there are 56 total devices possible.
The original SCSI-1 spec only defined the connection to hard disks. The SCSI-2 spec has extended this to such devices as CD-ROMs, tape drives, scanners and printers. Provided these devices all adhere to the SCSI-2 standard, they can be mixed and matched, even with older SCSI-1 hard disks.
One common problem with external SCSI devices is the fact that the power supply is external as well. If you boot your system with the power to that external device turned off, then once the kernel gets past the initialization routines for that device (the hardware screen), it can no longer recognize that device. The only solution is to reboot. To prevent this problem, it is a good idea to have all your SCSI devices internal. (This doesn't help for scanners and printers, but since SCO doesn't yet have drivers for them, it's a moot point.)
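The figure of 56 is easy to check by listing every possible ID/LUN pair. This is just a sketch of the arithmetic; it assumes the host adapter sits at ID 7, which leaves IDs 0 through 6 for the bridge adapters.

```python
HOST_ADAPTER_ID = 7   # assumed: the host adapter takes the highest-priority ID

target_ids = [i for i in range(8) if i != HOST_ADAPTER_ID]   # IDs 0-6
luns = range(8)                                              # LUNs 0-7

# Every (SCSI ID, LUN) pair that a device could be addressed by.
addresses = [(target, lun) for target in target_ids for lun in luns]
print(len(addresses))   # 56
```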
There are two basic ways a computer stores the data it works with. Both are often referred to as memory. Long term memory, the kind that remains in the system even if there is no power, is called non-volatile memory and exists in such places as on hard disks or floppies. This is often referred to as secondary storage. Short term, or volatile, memory is stored in memory chips called RAM, for Random Access Memory. This is often referred to as primary storage.
There is a third class of memory that is often ignored, or at least not thought of often. This is memory that exists in hardware on the system, but does not disappear when power is turned off. This is called ROM, or Read Only Memory.
We need to clarify one thing before we go on. Read-only memory is as it says, read-only. For the most part it cannot be written to. However, like Random-Access Memory the locations within it can be accessed in a "random" order, that is, at the discretion of the programmer. Also read-only memory isn't always read-only, but that's a different story that goes beyond this book.
The best way of referring to memory to keep things clear (at least the best way in my opinion) is to refer to that memory we traditionally call RAM as "main" memory. This is where our programs and the operating system actually reside.
There are two broad classes of memory: Dynamic RAM or DRAM (read Dee-Ram) and Static RAM or SRAM (read Es-Ram). DRAM is composed of tiny capacitors that can hold their charge only a short while before they require a "boost." SRAM is static because it does not require this periodic boost to keep its contents. As a result of the way it works internally, SRAM is faster and more expensive than DRAM. Because of the cost, the RAM that composes main memory is typically DRAM.
DRAM chips hold memory in ranges from 64k up to 16Mb and more. In older systems, individual DRAM chips were laid out in parallel rows called banks. The chips themselves were called DIPPs, for Dual In-Line Pin Package. These look like your average, run-of-the-mill computer chip, with two rows of parallel pins, one on each side of the chip. If memory ever went bad in one of these banks, it was usually necessary to replace (or test) dozens of individual chips. Since the maximum for most of these chips was 256 kilobits (32 kilobytes), it took 32 of them for each megabyte!
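In case the jump from kilobits to "32 chips per megabyte" went by too quickly, here is the arithmetic spelled out as a short Python sketch. It counts data chips only and ignores any extra chips used for parity.

```python
BITS_PER_BYTE = 8

chip_bits = 256 * 1024                    # one 256-kilobit DRAM chip
chip_bytes = chip_bits // BITS_PER_BYTE   # 32768 bytes, i.e. 32 kilobytes
chips_per_megabyte = (1024 * 1024) // chip_bytes

print(chip_bytes, chips_per_megabyte)     # 32768 32
```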
On newer systems, the DIPP chips have been replaced by Single In-Line Memory Modules, or SIMMs. Technological advances have decreased the size considerably. Whereas a few years ago you needed an area the size of a standard piece of binder paper to hold just a few megabytes, today's SIMMs can squish twice that much into an area the size of a stick of gum.
SIMMs come in powers of 2 (1, 2, 4, 8, etc.) megabytes and are generally arranged in banks of four or eight. Because of the way the memory is accessed, you sometimes cannot mix sizes. That is, if you have four 2Mb SIMMs, you cannot simply add an 8Mb SIMM to get up to 16Mb. Bear this in mind when ordering your system or ordering more memory. You should first check the documentation that came with the motherboard or check with the manufacturer.
Many hardware salespeople are not aware of this distinction. Therefore, if you order a system with 8Mb that's "expandable" to 128Mb, you may be in for a big surprise. True, there are 8 slots that can contain 16Mb each. However, if the vendor fills all eight slots with 1Mb SIMMs to give you your 8Mb, you may have to throw everything out if you ever want to increase your RAM.
However, this is not always the case. My motherboard has some strange configurations. The memory slots on my motherboard consist of two banks of four slots each. (This is typical of many machines.) Originally, I had one bank completely full with four 4Mb SIMMs. When I installed OpenServer this was barely enough. Once I decided to start X-Windows and Wabi, this was much too little. I could have increased this by 1Mb by filling the first bank with four 256K SIMMs and moving the four 4Mb SIMMs to the second bank. However, if I wanted to move up to 20Mb, I could use 1Mb instead of 256K. So, here is one example where everything does not have to match. In the end, I added four 4Mb SIMMs to bring my total up to 32Mb. The moral of the story: read the manual!
Another issue that needs to be considered with SIMMs is that the motherboard design may require you to put in memory in either multiples of two or multiples of four. The reason for this is the way the motherboard accesses that memory. Potentially, a 32-bit machine could read a byte from four SIMMs at once, essentially reading the full 32 bits in one read. Keep in mind that the 32 bits are probably not being read simultaneously. However, being able to read them in succession is faster than reading one bank and then waiting for it to reset.
Even so, this requires special circuitry for each of the slots, called address decode logic. The address decode logic receives a memory address from the CPU and determines which SIMM it's in and where on the SIMM. In other words, it decodes the address to determine which SIMM is needed for a particular physical address.
This extra circuitry makes the machine more expensive as this is not just an issue with the memory, but rather the motherboard design as well. Accessing memory in this fashion is called "page mode" as the memory is broken up into sets of bytes, or pages. Because the address decode logic is designed to access memory in only one way, the memory that is installed must fit the way it is read. For example, my motherboard requires each bank to be either completely filled or completely empty. Now, this requires a little bit of explanation.
As I mentioned earlier, DRAM consists of little capacitors for each bit of information. If the capacitor is charged, then the bit is 1; if there is no charge, the bit is 0. Capacitors have a tendency to drain over time, and for capacitors this small, that time is very short. Therefore they must be regularly (or dynamically) recharged.
When a memory location is read, there must be some way of determining if there is a charge in the capacitor or not. The only way of doing that is to discharge the capacitor. If it can be discharged, that means there was a charge to begin with and the system knows the bit was a 1. Once discharged, internal circuitry recharges the capacitor.
Now, assume the system wants to read two consecutive bytes from a single SIMM. Since there is no practical way for the address decode logic to tell that the second read is not just a re-read of the first byte, the system must wait until the first byte has recharged itself. Only then can the second byte be read.
By taking advantage of the fact that programs run sequentially and rarely read the same byte more than once at any given time, the memory subsystem can interleave its reads. That is, while the first bank is recharging, it can be reading from the second; while the second is recharging, it can be reading from the third, and so on. Since subsequent reads must wait until previous ones have completed, this method is obviously not as fast as simultaneous reads. This is referred to as "interleaved" or "banked" memory.
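One simple way to picture interleaving is to let the low-order part of the address choose the bank, so that consecutive addresses always land in different banks. The sketch below assumes four banks that are each one byte wide. That layout is my assumption for the sake of the example; your motherboard may well do it differently.

```python
NUM_BANKS = 4   # assumed: four banks, each one byte wide

def bank_and_offset(address):
    """Low-order interleaving: consecutive addresses go to consecutive banks."""
    return address % NUM_BANKS, address // NUM_BANKS

for address in range(6):
    print(address, bank_and_offset(address))
# 0 (0, 0)
# 1 (1, 0)
# 2 (2, 0)
# 3 (3, 0)
# 4 (0, 1)
# 5 (1, 1)
```

A program reading addresses 0, 1, 2 and 3 in sequence touches each bank once, so bank 0 has had three reads' worth of time to recharge before address 4 comes back to it.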
Figure 0
Since all of these issues are motherboard dependent, it is best to check the hardware documentation when changing or adding memory. Additionally, settings, or jumpers, may need to be adjusted on the motherboard to tell it how much RAM you have and in what configuration.
Another issue that affects speed is the physical layout of the SIMM. SIMMs are often described as being arranged in a "by-9" or "by-36" configuration. This refers to the number of bits that are immediately accessible. So, in a "by-9" configuration, 9 bits are immediately accessible, with one used for parity. In a "by-36" configuration, 36 bits are available, with 4 bits for parity (1 for each 8 bits). The "by-9" configuration comes on SIMMs with 30 pins, whereas the "by-36" comes on SIMMs with 72 pins. The 72-pin SIMMs can be read 32 bits at a time, so they are even faster than 30-pin SIMMs rated at the same speed.
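What is that ninth bit good for? Here is a minimal sketch of how a parity bit catches a single flipped bit. I am assuming even parity purely for illustration, as the convention used by any given system isn't covered here, and the function names are made up.

```python
def parity_bit(byte):
    """Even-parity bit: 1 if the byte contains an odd number of 1 bits."""
    return bin(byte).count("1") % 2

def parity_ok(byte, stored_parity):
    """Compare the parity of the byte read with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity

original = 0b10110010             # four 1 bits
stored = parity_bit(original)     # what would go into the ninth bit

print(parity_ok(original, stored))      # True
print(parity_ok(0b10110011, stored))    # False: one bit has flipped
```

Note that parity only tells you that something went wrong. It cannot tell you which bit flipped, and two flipped bits in the same byte cancel each other out.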
There are also different physical sizes for the SIMM. The SIMMs with 30 pins are slightly smaller than those with 72 pins. The larger, 72-pin variety are called PS/2 SIMMs as they are used in IBM's PS/2 machines. Aside from being slightly larger, these have a notch in the center so it is physically impossible to mix up the two. In both cases there is a notch on one end. This fits into a key in the slot on the motherboard, which makes putting the SIMM in backwards almost impossible.
SIMMs come in several different speeds; the most common today are between 60 and 80 nanoseconds. Although there is usually no harm in mixing speeds, there is little to be gained. However, I want to emphasize the word usually. Mixing speeds has been known to cause panics. Therefore, if you mix speeds, it is best to keep all the SIMMs within a single bank at a single speed. If your machine does not have multiple banks, then it is best not to mix speeds. Even if you do, remember that the system is only as fast as its slowest component.
Based on the principle of spatial locality, a program is more likely to spend its time executing code around the same set of instructions. This is demonstrated by the fact that tests have shown that most programs spend 80% of their time executing 20% of their code. Cache memory takes advantage of that.
Cache memory, or sometimes just cache, is a small set of very high speed memory. Typically it uses SRAM which can be up to ten times more expensive than DRAM, which usually makes it prohibitive for anything other than cache.
When the IBM PC first came out, DRAM was fast enough to keep up with even the fastest processor. However, as CPU technology increased, so did its speed. Soon, the CPU began to outrun its memory. The advances in CPU technology could not be utilized unless the system was filled with the more expensive, faster SRAM.
The solution to this was a compromise. Using the locality principle, manufacturers of fast 386 and 486 machines began including a set of cache memory consisting of SRAM, but still populated main memory with the slower, less expensive DRAM.
To better understand the advantages of this scheme, let's cover the principle of locality in a little more detail. For a computer program we deal with two types of locality: temporal (time) and spatial (space). Since programs tend to run in loops (repeating the same instructions over and over), the same set of instructions needs to be read in over and over. The longer a set of instructions is in memory without being used, the less likely it is to be used again. This is the principle of temporal locality. What cache memory does is allow us to keep those regularly used instructions "closer" to the CPU, making access to them much faster.
Spatial locality is the relationship between consecutively executed instructions. We just said that a program spends more of its time executing the same set of instructions. Therefore, in all likelihood, the next instruction the program will be executing lies in the next memory location. By filling cache with more than just one instruction at a time, the principle of spatial locality can be taken advantage of.
Is there really such a major advantage to cache memory? Cache performance is evaluated in terms of cache hits. A hit occurs when the CPU requests a memory location and it is already in cache (that is, it does not have to go to main memory to get it). Since most programs run in loops (including the OS), the principle of locality results in a hit ratio of 85%-95%. Not bad!
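To see what a hit ratio like that buys you, here is a rough back-of-the-envelope sketch. The access times are assumptions picked for illustration (20 nanoseconds for the SRAM cache and 70 for DRAM), and the model is deliberately simple: a hit costs one cache access, a miss costs one main memory access.

```python
CACHE_NS = 20   # assumed SRAM access time, for illustration only
DRAM_NS = 70    # assumed DRAM access time, for illustration only

def average_access_ns(hit_percent):
    """Average access time for a given hit ratio (in whole percent)."""
    miss_percent = 100 - hit_percent
    return (hit_percent * CACHE_NS + miss_percent * DRAM_NS) / 100

for hit_percent in (85, 90, 95):
    print(hit_percent, average_access_ns(hit_percent))
# 85 27.5
# 90 25.0
# 95 22.5
```

Under these assumptions, even the low end of the range brings the average access time much closer to the speed of the cache than to the speed of main memory.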
On most 486 machines, two levels of cache are used. They are called (logically) first level cache and second level cache. First level cache is internal to the CPU. Although nothing (other than cost) prevents it from being any larger, Intel has limited the first level cache in the 486 to 8k.
Figure 0
Second level cache is the kind that you buy extra with your machine. This is often part of the ad you see in the paper and is usually what people are talking about when they say how much cache is in their system. This kind of cache is external to the CPU and can be increased at any time, whereas first level cache is an integral part of the CPU and the only way to get more is to buy a different CPU. Typical sizes of second level cache range from 64K-256K. This is usually in increments of 64K.
A major problem exists when dealing with cache memory and that is the issue of consistency. What happens when main memory is updated and cache is not? What happens when cache is updated and main memory is not? This is where the cache's write policy comes in.
The write policy determines if and when the contents of the cache are written back to memory. Write-Through cache simply writes the data through the cache directly into memory. This slows things down on writes, but you are assured that the data is consistent. Buffered write through is a slight modification of this, where data is collected and everything is written at once. Write-Back improves cache performance by only writing to main memory when necessary. Write-Dirty is when it writes to main memory only when it has been modified.
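The difference between write-through and write-back is easiest to see by counting how often main memory actually gets written. The following is only a toy model: a "cache" with a single line, with class and function names invented for the example. Real caches are a good deal more involved.

```python
class TinyCache:
    """One-line cache that counts the writes reaching main memory."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.address = None      # address currently held in the line
        self.value = None
        self.dirty = False
        self.memory = {}         # stands in for main memory
        self.memory_writes = 0

    def _store(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def flush(self):
        """Write the line back if it has been written to since it was loaded."""
        if self.dirty:
            self._store(self.address, self.value)
            self.dirty = False

    def write(self, address, value):
        if self.address != address:
            self.flush()                 # evict the old line first
            self.address = address
        self.value = value
        if self.write_back:
            self.dirty = True            # defer the write to main memory
        else:
            self._store(address, value)  # write-through: update memory now

def memory_writes(write_back):
    cache = TinyCache(write_back)
    for value in range(10):
        cache.write(0x100, value)        # ten writes to the same location
    cache.flush()
    return cache.memory_writes

print(memory_writes(write_back=False))   # 10
print(memory_writes(write_back=True))    # 1
```

With write-through, every one of the ten writes goes out to main memory. With write-back, only the final value is written, when the line is flushed.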
Cache (or main memory for that matter) is referred to as "dirty" when it is written to. Unfortunately, the system has no way of telling whether anything has changed, just that it is being written to. Therefore it is possible, but not likely, that a block of cache is written back to memory even if it is not "really" dirty.
Another aspect of cache is its organization. Without going into detail (that would take most of a chapter itself) we can generalize by saying there are four different types of cache organization.
The first kind is fully associative. This means that every entry in the cache has an slot in the "cache directory" indicating where it came from in memory. Usually these are not individual bytes, but chunks of four bytes or more. Since each "slot" in the cache has a separate directory slot, any location in RAM can be placed anywhere in the cache. This is the simplest scheme, but also the slowest since each cache directory entry must be searched until a match (if any) is found. Therefore, this kind of cache is often limited to just 4Kb.
Direct-mapped or 1-way set associative cache requires that only a single directory entry be searched. This speeds up access time considerably. The location in the cache is related to the location in memory and is usually based on blocks of memory equal to the size of the cache. For example, if the cache could hold 4K 32-bit (4-byte) entries, then the block that each entry is associated with is also 4K x 32 bits. The first 32 bits in each block are read into the first slot of the cache. The second 32 bits in each block are read into the second slot, and so on. The size of each entry, or line, usually ranges from 4 to 16 bytes.
There is a mechanism, called a tag, that tells us which of the blocks an entry came from. Also, because of the very nature of this method, the cache cannot hold data from multiple blocks for the same offset. If, for example, slot 1 was already filled with the data from block 1 and a program wanted to read the data at the same location from block 2, the data in the cache would be overwritten. Therefore, the shortcoming in this scheme is that when data is read at intervals that are the size of these blocks, the cache gets constantly overwritten. Keep in mind that this does not occur too often, due to the principle of spatial locality.
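The slot and tag arithmetic for a direct-mapped cache can be sketched in a few lines of Python. The sizes follow the example above (4K entries of 4 bytes each); the function and class names are hypothetical, and the cache only tracks tags, not the data itself.

```python
ENTRY_SIZE = 4                        # bytes per cache entry (32 bits)
NUM_SLOTS = 4 * 1024                  # the cache holds 4K entries
BLOCK_SIZE = ENTRY_SIZE * NUM_SLOTS   # memory block that maps onto the cache


def slot_and_tag(addr):
    """Return (slot, tag) for a byte address in a direct-mapped cache."""
    slot = (addr // ENTRY_SIZE) % NUM_SLOTS   # position within the block
    tag = addr // BLOCK_SIZE                  # which block it came from
    return slot, tag


class DirectMappedCache:
    def __init__(self):
        self.tags = {}      # slot -> tag currently held in that slot
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        slot, tag = slot_and_tag(addr)
        if self.tags.get(slot) == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[slot] = tag   # overwrite whatever was there


# Reading the same 16 entries twice: each misses once, then hits.
sequential = DirectMappedCache()
for _ in range(2):
    for addr in range(0, 64, ENTRY_SIZE):
        sequential.access(addr)

# Alternating reads exactly one block apart: same slot, different tags.
thrashing = DirectMappedCache()
for _ in range(16):
    thrashing.access(0)
    thrashing.access(BLOCK_SIZE)

print(sequential.hits, sequential.misses)
print(thrashing.hits, thrashing.misses)
```

The second loop is the worst case described above: two addresses exactly one block apart compete for the same slot, so every single access is a miss.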
The third type is an extension of the 1-way set associative cache, called the 2-way set associative cache. Here there are two entries per slot. Again, data can end up in only a particular slot, but there are two places to go within that slot. Granted, the system is slowed a little by having to look at the tags for both entries. However, this scheme allows data at the same offset from multiple blocks to be in the cache at the same time. This is also extended to 4-way set associative cache. In fact, the cache internal to the 486 and Pentium is a 4-way set associative cache.
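Here is a rough Python sketch of why the extra entry per slot helps. It is only an illustration: the names are made up, the number of slots is kept the same in both cases for simplicity, and I assume that the least recently used entry in a slot is the one that gets replaced, which is my assumption and not something every cache does.

```python
ENTRY_SIZE = 4                        # bytes per cache entry
NUM_SLOTS = 4 * 1024                  # number of slots in the cache
BLOCK_SIZE = ENTRY_SIZE * NUM_SLOTS   # addresses this far apart share a slot


class SetAssociativeCache:
    def __init__(self, ways):
        self.ways = ways    # entries per slot (1 = direct-mapped)
        self.slots = {}     # slot -> list of tags, most recently used last
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        slot = (addr // ENTRY_SIZE) % NUM_SLOTS
        tag = addr // BLOCK_SIZE
        tags = self.slots.setdefault(slot, [])
        if tag in tags:
            self.hits += 1
            tags.remove(tag)        # re-appended below as most recent
        else:
            self.misses += 1
            if len(tags) == self.ways:
                tags.pop(0)         # assumed policy: evict least recently used
        tags.append(tag)


def alternate_between_blocks(ways, rounds=16):
    """Alternate between the same offset in two different blocks."""
    cache = SetAssociativeCache(ways)
    for _ in range(rounds):
        cache.access(0)
        cache.access(BLOCK_SIZE)
    return cache.hits, cache.misses


one_way = alternate_between_blocks(1)
two_way = alternate_between_blocks(2)
print(one_way, two_way)
```

With one entry per slot, the two addresses keep evicting each other. With two entries per slot, both fit, and only the first access to each one misses.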
Although this is interesting stuff (at least to me), you may be asking yourself "Why is this memory stuff important to me as a system administrator?" Well, first, knowing about the differences in RAM (main memory) can aid you in making decisions about your upgrade. Also, as I mentioned earlier, it may be necessary to set switches on the motherboard if you change memory configuration.
Knowledge about cache memory is also important for the same reason, but also because this may be adjustable by you. On many machines, the write policy can be adjusted through the CMOS. For example, on my machine I have a choice of Write-Back, Write-Through and Write-Dirty. Depending on the applications you are running, you may want to change this to improve performance.
In most memory today, an extra bit is added for each byte. This is a parity bit. Parity is a simple way of detecting errors within a memory chip (among other things). If there is an odd number of bits set, the parity bit will be set to make the total number of bits set an even number. (Most memory uses even parity.) For example, if three bits are set, the parity bit will also be set to make the total number of bits set four.
When data is written, the number of set bits is calculated and the parity bit set accordingly. When the data is read, the parity bit is also read. If the total number of bits set is even, all is well. However, if there is an odd number of data bits set and the parity bit is not set, or if there is an even number of data bits set and the parity bit is set, this is not the way it ought to be. A parity error has just occurred.
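The even parity check described here is easy to sketch in Python. The function names and the sample byte are my own, chosen just for illustration.

```python
def count_set_bits(byte):
    """Number of bits set to 1 in the byte."""
    return bin(byte).count("1")


def even_parity_bit(byte):
    """Parity bit that makes the total number of set bits even."""
    return count_set_bits(byte) % 2


def parity_ok(byte, parity_bit):
    """True if the data bits plus the parity bit hold an even number of 1s."""
    return (count_set_bits(byte) + parity_bit) % 2 == 0


data = 0b00010011                        # three bits set
stored_parity = even_parity_bit(data)    # so the parity bit is set as well

one_bit_flipped = data ^ 0b00000100      # a single bit goes bad
two_bits_flipped = data ^ 0b00000110     # two bits go bad

print(parity_ok(data, stored_parity))
print(parity_ok(one_bit_flipped, stored_parity))
print(parity_ok(two_bits_flipped, stored_parity))
```

Note the last case: with two bits flipped the total is even again, so a single parity bit cannot see the error. Parity only catches an odd number of flipped bits.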
When a parity error occurs in memory, the state of the system is uncertain. In order to prevent any further problems, the parity checking logic generates a Non-Maskable Interrupt (NMI) and the CPU immediately jumps to special code called the NMI service routine.
When SCO UNIX is interrupted with an NMI as the result of a parity error, it too realizes things are not good and the system panics. The panic causes the system to stop everything and shut down. Certain machines support ECC RAM, which corrects parity problems before they kill your system.
Even as I wrote this section, the computer industry was shifting away from the old SIMMs toward extended data out RAM or EDORAM. Although as of this writing (November 1995), EDORAM is somewhat more expensive than SIMMs, it is expected that by early 1996, the demand for EDORAM will be such that the price difference will disappear.
The principle behind EDORAM is an extension of fast page mode (FPM) RAM. With FPM RAM, you rely on the fact that memory is generally read sequentially. Since you don't "really" need to wait for each memory location to recharge itself, you can read the next location without waiting. Since you have to wait until the signal is stabilized, there is still some wait. However, this is much less than waiting for the memory to recharge. At CPU speeds greater than 33 MHz, the CPU is requesting memory faster than memory can deliver it and the CPU needs to wait.
EDORAM works by "latching" the memory, which means that secondary memory cells are added. These detect the data being read out from memory and store the signals so the CPU can retrieve them. This works at bus speeds of 66 MHz. This process can be sped up even further by including "burst" EDORAM. This extends the locality principle even further. Since we are going to read sequentially, why don't we anticipate the processor and read more than just that single location? In some cases the system will read 128 bits at once.
Keep in mind, however, you cannot just install EDORAM in your machine and expect it to work. You need a special chip-set on your motherboard. One such chip-set is the Intel Triton chip-set.
Sometimes you get people that just don't understand. At first, I thought that they "didn't have a clue", but that wasn't really the problem. They had a clue, but a single clue doesn't solve a crime, nor does it help you run an SCO UNIX system.
It seems like a simple thing. You use doscp to copy a program from a DOS diskette onto an SCO UNIX system. In all likelihood the permissions are already set to be executable. So you type in the name of the program and press enter. Nothing happens or you get an error about incorrect format. Hmmm. The software says it runs on a 386 or higher (which you have), a VGA monitor (which you have) and at least 2 Mb of hard disk space (which you have). Why doesn't it work?
Yes, this is a true story. A customer called in saying that our operating system (SCO UNIX) was broken. This customer had a program that worked fine on his DOS PC at home. It, too, was a 386, so there shouldn't be a problem, right? Unfortunately, wrong. Granted, in both cases the CPU is reading machine instructions and executing them. In fact, they are the same machine instructions. They have to be.
The problem is comparable to German and English. Although both use (basically) the same alphabet, words (sets of characters) written in German are not understandable by someone reading them as English and vice versa. Sets of machine instructions that were designed to be interpreted under DOS are not going to be understood under SCO UNIX. (Actually, the problem is a little more complicated, but you get the basic idea.)
Just like your brain has to be told (taught) the difference between German and English, a computer needs to be told the difference between DOS and UNIX programs.
In this section we talk about the CPU, the brains of the outfit. It is perfectly reasonable for users and administrators alike to have no understanding of what the CPU is doing internally. However, a basic knowledge of some of the key issues is important, in order to completely understand some of the issues I get into elsewhere.
It's like trying to tune-up your car. Now you don't really need to know how oxygen mixes with the gasoline in order to be able to adjust the carburetor. However, knowing about it makes adjusting the carburetor that much easier.
I don't go into details about the instruction cycle of the CPU, that is, how it gets and executes instructions. While I like things like that and would love to talk about them, it isn't really necessary to understand what we need to talk about here. Instead we are going to talk mostly about how the CPU enables the operating system to create a scheme whereby many programs can be in memory simultaneously. These are the concepts of paging and multi-tasking.
Although it is an interesting subject, the ancient history of microprocessors is not really important to the issues at hand. It might be nice to learn how the young PC grew from a small, budding 4-bit system to the gigantic, strapping 64-bit Pentium. However, there are many books that covered this subject and unfortunately I don't have the space. Besides, you can read it elsewhere and SCO UNIX only runs on Intel 80386 (or 100% compatible clones) and higher processors.
So, instead of setting the Way-Back machine to Charles Babbage and his Analytical Engine, we leap ahead to 1985 and the introduction of the Intel 80386. Even compared to its immediate predecessor, the 80286, the 80386 (386 for short) was a powerhouse. Not only could it handle twice the amount of data at once (now 32 bits), its speed rapidly increased well beyond that of the 286.
New advances were added to increase the 386's power. Internal registers were added and their size was increased. Built into the 386 was the concept of virtual memory. This was a way to make it appear as if there was much more memory on the system than there actually was. This substantially increased the system efficiency. Another major advance was the inclusion of a 16-byte pre-fetch cache. With this, the CPU would load instructions before it actually processed them, thereby speeding things up even more. Then the most obvious speed increase came by increasing the speed of the processor from 8 MHz to 16 MHz.
Although the 386 had major advantages over its predecessors, at first its cost seemed relatively prohibitive. In order to allow users access to the multi-tasking capability and still make the chip fit within their customers' budgets, Intel made an interesting compromise. By making a new chip where the interface to the bus was 16 bits instead of 32 bits, Intel made their chip a fair bit cheaper.
Internally this new chip, designated the 80386SX, is identical to the standard 386. All the registers are there and are the full 32-bits wide. However, data and instructions are accessed 16-bits at a time, therefore requiring two bus accesses to fill the registers. Despite this "short-coming", the 80386SX is still faster than the 286.
Perhaps the most significant advance of the 386 for SCO is its paging abilities. We talked a little about paging in the section on operating system basics, so you already have a general idea of what it's about. We will also go into more detail about paging in the section on the kernel. However, we need to talk about it a little here to fully understand the power that the 386 has given us and to see how the CPU helps the OS.
SCO does have a product, SCO XENIX, that does run on 286s. In fact, there was even a version of SCO XENIX that ran on the 8086. Because SCO UNIX was first released for the 386, we are not going to go into any more details about the 286 nor the differences between the 286 and 386. Instead I will just be describing the CPU used by SCO UNIX as sort of an abstract entity. In addition, since most of what I will be talking about is valid for the 486 and Pentium as well as the 386, I will simply call it "the CPU" instead of 386, 486, or Pentium.
(Note: SCO will also run on non-Intel CPUs. However, the issues we are going to talk about are all common to Intel-based or Intel-derived CPUs.)
I need to take a side-step here for a minute. On PC buses, multiple things are happening at once. The CPU is busily processing while much of the hardware is being accessed via DMA. Although these are multiple tasks that are occurring simultaneously on the system, this is not what is referred to by "multi-tasking".
When we talk about multi-tasking we are referring to multiple processes being in memory at the same time. Because the time it takes the computer to switch between these processes, or tasks, is much faster than the human brain can recognize, it appears as if they are running simultaneously. In reality, what is happening is that each process gets to use the CPU and other system resources for a brief time and then it's someone else's turn.
As it runs, the process could use any part of the system memory it needed. The problem with this is that a portion of RAM that one process wants may already contain code from another process. Rather than allowing each process to access any part of memory it wants, protections are needed to keep one program from overwriting another one. This protection is built-in as part of the CPU and is called, quite logically, "protected mode." Without it, SCO UNIX could not function.
Note, however, that just because the CPU is in protected mode, does not necessarily mean that the protections are being utilized. It simply means that the operating system can take advantage of the built in abilities if it wants.
Although this capability is built into the CPU, it is not the default mode. Instead, the CPU starts up in what I like to call "DOS compatibility mode." However, the correct term is "real mode." Real mode is a real danger to an operating system like UNIX. In this mode, there are no protections (which makes sense, since protections exist in protected mode). A process running in real mode has complete control over the entire system and can do anything it wants. Therefore, trying to run a multi-user system on a real mode system would be a nightmare. All the protections would have to be built into the process, as the operating system couldn't prevent a process from doing what it wanted.
Also built in is a third mode. This is called "virtual mode." In virtual mode, the CPU behaves, to a limited degree, as if it were in real mode. However, when a process attempts to directly access registers or hardware, the instruction is caught, or trapped, and the operating system is allowed to take over.
Let's get back to protected mode as this is what makes multitasking possible.
When in protected mode, the CPU can use virtual memory. As I mentioned, this is a way to trick the system into thinking there is more memory than there really is. There are two ways of doing this. The first is called swapping. Here, the entire process is loaded into memory. It is allowed to run its course for a certain amount of time. When its turn is over, another process is allowed to run. What happens when there is not enough room for both processes to be in memory at the same time? The only solution is that the first process is copied out to a special part of the hard disk called the swap space or swap device. Then, the next process is loaded into memory and allowed its turn.
Because it takes such a large portion of the system resources to swap processes in and out of memory, this can be very inefficient, especially when you have a lot of processes running. Let's take this a step further: what happens if there are too many processes and the system spends all of its time swapping? Not good.
In order to avoid this problem, a mechanism was devised whereby only those parts of the process that are needed are in memory. As it goes about its business, a program may only need to access a small portion of its code. In fact, empirical tests show that a program spends 80% of its time executing 20% of its code. So why bother bringing in those parts that aren't being used? Why not wait and see if they are used?
To make things more efficient, only those parts of the program that are needed (or expected to be needed) are brought into memory. Rather than accessing memory in random units, it is divided into 4K chunks, called pages. Although there is nothing magic about 4K, per se, this value is easily manipulated. In the CPU, data is referenced in 32-bit (4-byte) chunks, and 1K (1024) of them make up a page (4096 bytes). Later you will see how this helps things work out.
As I mentioned, only that part of the process currently being used needs to be in memory. When the process wants to read something that is not currently in RAM, it needs to go out to the hard disk to pull in the other parts of the process. That is, it goes out and reads in new pages. This process is called "paging". When the process attempts to read from a part of the process that is not in physical memory, a "page fault" occurs.
One thing we must bear in mind is the fact that a process can jump around a lot. Functions are called, which send the process off somewhere completely different. It is possible, likely for that matter, that the page containing the memory location to which the process needs to jump is not currently in memory. Since it is trying to read a part of the process not in physical memory, this too is called a page fault. As memory fills up, pages that haven't been used in some time are replaced by new ones. (Much more on this whole business later.)
Assume that a process has just made a call to a function somewhere else in the code and the page needed is brought into memory. Now there are two pages of the process from completely different parts of the code. Should the process take another jump or return from the function, it needs to know if where it is going is in memory or not. The operating system could keep track of this. However, it doesn't need to. The CPU will keep track for it.
Stop here for a minute! This is not entirely true. The OS must first set up the structures that the CPU uses. However, it is the CPU that uses these structures to determine if a section of a program is in memory or not. Although they are not part of the CPU, but rather of RAM, the CPU administers RAM utilization through page tables. As their name implies, they are simply tables of pages. In other words, they are memory locations in which other memory locations are stored.
Confused? I was at first, so let's look at this concept another way. Each running process has a certain part of its code currently in memory. The system uses these page tables to keep track of what is currently in memory and where it is physically located. To limit the amount the CPU has to work, each of these page tables is only 4K, or one page, in size. Since each contains a set of 32-bit addresses, a page table can contain only 1024 entries.
Although this would imply that a process can only have 4K*1024, or 4Mb loaded at a time, there is more to it. Page tables are grouped into page directories. Like the page table, the entries in a page directory point to memory locations. However, rather than pointing to a part of the process, page directories point to page tables. Again, to reduce the work of the CPU, a page directory is only one page. Since each entry in the page directory points to a page, this means that a process can only have 1024 page tables.
Is this enough? Let's see. A page is 4K or 4096 bytes, which is 2^12. Each page table can refer to 1024 pages. This is 2^10. Each page directory can refer to 1024 page tables. This is also 2^10. Multiply this out we have:
page_size * pages_in_page_table * page_tables_in_page_directory
or
(2^12) * (2^10) * (2^10) = 2^32
Since the CPU is only capable of accessing 2^32 bytes, this scheme allows access to every possible memory address that the system can generate.
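If you want to check the multiplication above for yourself, a few lines of Python will do it. The constant names are mine; the numbers are the ones used in the text.

```python
PAGE_SIZE = 4 * 1024             # 4K bytes per page
ENTRY_SIZE = 4                   # each table entry is a 32-bit (4-byte) address

# A page table and a page directory are each exactly one page in size.
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE
TABLES_PER_DIRECTORY = PAGE_SIZE // ENTRY_SIZE

bytes_per_table = PAGE_SIZE * ENTRIES_PER_TABLE          # memory one table maps
addressable = bytes_per_table * TABLES_PER_DIRECTORY     # memory one directory maps

print(ENTRIES_PER_TABLE, TABLES_PER_DIRECTORY)
print(bytes_per_table, addressable)
```

One page table maps 4Mb, and a full page directory maps exactly 2^32 bytes, which is everything a 32-bit address can reach.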
Are you still with me?
Inside of the CPU is a register called the Control Register 0 or CR0 for short. There is a single bit in this register that turns on this paging mechanism. If turned on, any memory reference that the CPU gets is interpreted as a combination of page directories, page tables and offsets, rather than an absolute, linear address.
Built into the CPU is a special unit that is responsible for making the translation from the virtual address of the process to physical pages in memory. It's called (what else?) the Paging Unit. To understand more about the work the Paging Unit saves the operating system or other parts of the CPU, let's see how the address is translated.
Figure 0
When paging is turned on, the Paging Unit receives a 32-bit value that represents a virtual memory location within a process. The Paging Unit takes these values and translates them as shown in Figure 0. At the top we see that the virtual address is handed to the paging unit, which converts it to a linear address. This is not the physical address in memory. As you see, the 32-bit linear address is broken down into three components. The first 10 bits (22-31) are the offset into the page directory. The location in memory of the page directory is determined by the Page Directory Base Register (PDBR).
The page directory entry contains 32 bits (4 bytes), which point to a specific page table. The entry in the page table, as you see, is determined by bits 12-21. Here again, we have 10 bits, which means each entry is 32 bits. These 32 bits point to a specific page in physical memory. Which byte we are referencing in physical memory is determined by the offset portion of the linear address, which is bits 0-11. These twelve bits represent the 4096 (4K) bytes in each physical page.
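The bit-slicing is easier to follow in code. This is a minimal Python sketch, assuming the bit layout just described; nested dictionaries stand in for the page directory and page tables (in the real thing these are arrays of 32-bit entries in RAM), and the sample mapping is invented for the example.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into its three components."""
    directory_index = (addr >> 22) & 0x3FF   # bits 22-31: entry in page directory
    table_index = (addr >> 12) & 0x3FF       # bits 12-21: entry in page table
    offset = addr & 0xFFF                    # bits 0-11: byte within the page
    return directory_index, table_index, offset


def translate(addr, page_directory):
    """Walk directory -> table -> page and return the physical address.

    page_directory maps a directory index to a page table, and each page
    table maps a table index to the physical base address of a page.
    """
    directory_index, table_index, offset = split_linear_address(addr)
    page_table = page_directory[directory_index]
    page_base = page_table[table_index]
    return page_base + offset


# Hypothetical mapping: directory entry 1 -> a table whose entry 2 points
# at the physical page starting at 0x0009A000.
page_directory = {1: {2: 0x0009A000}}

linear = (1 << 22) | (2 << 12) | 0x345
physical = translate(linear, page_directory)
print(hex(linear), hex(physical))
```

Only the upper 20 bits are translated. The 12-bit offset is carried over unchanged, since it just picks a byte within the 4K page.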
Keep in mind a couple of things. First, page tables and page directories are not part of the CPU. They can't be. If a page directory were full, it would contain 1024 references to 4K chunks of memory. You would need 4Mb just for the page tables! Since this would create a CPU hundreds of times larger than it is, page tables and directories are stored in RAM.
Next, page tables and page directories are abstract concepts that the CPU knows how to utilize. They occupy physical RAM, and operating systems such as SCO UNIX know how to switch on this capability within the CPU. All the CPU is doing is the "translation" work. When it starts, SCO UNIX turns on this capability and sets up all the structures. These structures are then handed off to the CPU, where the Paging Unit does the work.
As I just said, a process with all of its page directory entries full would require 4Mb just for the page tables. However, this would imply that the entire process is somewhere in memory. Since each of the page table entries points to a physical page in RAM, you would need 4Gb of RAM (1024 * 1024 pages of 4K each). Not that I would mind having that much RAM, but it is a bit costly, and even if you had 16Mb SIMMs you would need 256 of them.
Like pages of the process, it's possible that a linear address passed to the Paging Unit translates to a page table or even a page directory that is not in memory. Since the system is trying to access a page (which contains a page table and not part of the process) that is not in memory, a page fault occurs and the system must go get that page.
Since page tables and the page directory are not really part of the process, but are important only to the operating system, a page fault causes these structures to get created rather than read in from the hard disk or elsewhere. In fact, as the process is starting up, all is without form and void. No pages, no page tables and no page directory.
The system accesses a memory location as it starts the process. The system translates the address as we described above and tries to read the page directory. It's not there. A page fault occurs and the page directory must be created. Now that the directory is there, the system finds the entry that points to the page table. Since no page tables exist, the slot is empty and another page fault occurs. So, the system needs to create a page table. The entry in the page table for the physical page is found to be empty, therefore another page fault occurs. Finally, the system can read in the page that was referenced in the first place.
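The chain of page faults just described can be played through in a small Python simulation. This is only a sketch of the sequence as the text describes it: the Process class, its attribute names, and the sample addresses are hypothetical, and "creating" a structure is reduced to adding an entry to a dictionary.

```python
class Process:
    """Tracks which paging structures exist and records each page fault."""

    def __init__(self):
        self.page_directory = None   # nothing exists when the process starts
        self.faults = []             # what had to be created or read in

    def access(self, addr):
        directory_index = (addr >> 22) & 0x3FF
        table_index = (addr >> 12) & 0x3FF

        if self.page_directory is None:
            self.faults.append("page directory")
            self.page_directory = {}             # create the directory

        if directory_index not in self.page_directory:
            self.faults.append("page table")
            self.page_directory[directory_index] = {}   # create the table

        page_table = self.page_directory[directory_index]
        if table_index not in page_table:
            self.faults.append("page")
            page_table[table_index] = True       # the page is read in


proc = Process()
proc.access(0x00402345)              # the very first memory reference
first_access_faults = list(proc.faults)

proc.access(0x00402FF0)              # same page: no fault at all
proc.access(0x00403000)              # next page, same page table
proc.access(0x00800000)              # a page covered by a different table

print(first_access_faults)
print(proc.faults)
```

The first reference costs three faults in a row. After that, a reference to a page already in memory costs nothing, a new page under an existing table costs one fault, and only a jump into an area covered by a different page table costs two.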
Now this whole process sounds a bit cumbersome, but bear in mind that this amount of page faulting only occurs as the process is being started. Once the table is created for a given process, it won't page fault again on that table. Based on the principle of locality, the page tables will hold enough entries for a while, unless of course the process goes bouncing around a lot.
The potential for bouncing around brings up an interesting aspect of page tables. Since page tables translate to physical RAM in the same way all the time, virtual addresses in the same area of the process end up in the same page tables. Therefore, page tables get filled up since the process is more likely to execute code in the same part of a process than elsewhere (this is spatial locality).
There is quite a lot there, huh? Well, don't get up yet as we're not finished. There are a few issues that we haven't addressed.
First, I often referred to page tables and the page directory. Each process has a single page directory (it doesn't need any more). Although the CPU supports multiple page directories, there is only one for the entire system. When a process needs to be switched out, the entries in the page directory for the old process are overwritten by the ones for the new process. The location of the page directory in memory is maintained in Control Register 3 (CR3) in the CPU.
There is something here that bothered me in the beginning and may still be bothering you. As I described above, each time a memory reference is made, the CPU has to look at the page directory, then a page table, then calculate the physical address. This means that for every memory reference, the CPU has to make two more references just to find out where the next instruction or data is coming from. I thought that was pretty stupid.
Well, so did the designers of the CPU. They have included a functional unit called the Translation Lookaside Buffer, or TLB. The TLB contains 32 entries and, just as the internal and external caches point to sets of instructions, the TLB points to pages. If a page that is being looked for is in the TLB, a TLB hit occurs (just like a cache hit). As a result of the principle of spatial locality, there is a 98% hit rate using the TLB.
When you think about it, this makes a lot of sense. The CPU does not just execute one instruction for a program, then switch to something else. It executes hundreds or even thousands before it is someone else's turn. If each page contains 1024 instructions and the CPU executes 1000 before it's someone else's turn, all 1000 will most likely be in the same page. Therefore they are all TLB hits.
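You can convince yourself of this with a toy TLB in Python. Treat it as a sketch under stated assumptions: I use 32 entries as in the text, 4-byte instructions (so 1024 fit in a page), and I simply throw out the least recently used entry when the TLB is full, which is my own choice and not necessarily what the hardware does. The addresses are made up.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
TLB_ENTRIES = 32


class TLB:
    def __init__(self, size=TLB_ENTRIES):
        self.size = size
        self.entries = OrderedDict()   # page number -> True, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, addr):
        page = addr // PAGE_SIZE
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)        # now the most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.size:
                self.entries.popitem(last=False)  # assumed: drop the oldest
            self.entries[page] = True


# 1000 consecutive 4-byte instructions starting at a page boundary.
sequential = TLB()
for i in range(1000):
    sequential.lookup(0x00400000 + 4 * i)

# The opposite extreme: touch 33 different pages in turn, twice over.
bouncing = TLB()
for _ in range(2):
    for page in range(33):
        bouncing.lookup(page * PAGE_SIZE)

print(sequential.hits, sequential.misses)
print(bouncing.hits, bouncing.misses)
```

The sequential run misses once and then hits 999 times. The second run shows what happens when a process bounces across more pages than the TLB has entries: with this replacement policy it never gets a hit.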
Now, let's take a closer look at the page table entries themselves. Each is a 32-bit value, pointing to a 4K location in RAM. Since it is pointing to an area of memory larger than a byte, it does not need all of the 32 bits to do it. Therefore, it has some bits left over. Since the page table entry points to an area that has 2^12 bytes (4096 bytes = 1 page), there are 12 bits that it doesn't need. These are the low order 12 bits and the CPU uses them for other purposes related to that page. A few of them are unused and the operating system can, and does, use them for its own purposes. There are also a couple that are reserved by Intel and should not be used.
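Pulling a page table entry apart is just a matter of masking. Here is a short Python sketch; the function name and the sample entry value are made up for illustration.

```python
def split_page_table_entry(entry):
    """Separate the page address from the 12 leftover low-order bits."""
    page_base = entry & 0xFFFFF000   # high-order 20 bits: where the page starts
    low_bits = entry & 0x00000FFF    # low-order 12 bits: used for other purposes
    return page_base, low_bits


entry = 0x0009A007                   # hypothetical page table entry
page_base, low_bits = split_page_table_entry(entry)
print(hex(page_base), bin(low_bits))
```

Because every page starts on a 4K boundary, the low 12 bits of its address are always zero, which is why they can be borrowed for other things without losing any information.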
One of the bits, the 0th bit, is the present bit. If this bit is set, the CPU knows that the page being referenced is in memory. If not set, the page is not in memory and if the CPU tries to access it, a page fault occurs. Also,