First, certain Fox News reporters and some military types drive me nuts mispronouncing this. It's "kash", not "kash-ay" (cachet, which is pronounced that way, is related by origin but is quite distinct in meaning) and that's all that needs to be said. Except perhaps "Foo!" toward those who perpetuate this barbarism. I get grumpy about this kind of thing: I also deplore the recent trend of using "monetize" where "commercialize" is the proper word. Anyway...
The barbarians are referring to weapons caches, and we'll be looking at data caching (instructions being just another form of data), but it's the same idea: storing something where you can get it when you need it. In the case of computers, we aren't trying to hide data, just make our access to it quicker or more convenient. Speed sometimes comes from faster storage: RAM is faster than a disk drive, CPU cache memory is faster than RAM, and on-chip pipelines are faster still. Other times it comes from using indexes (often hashes), and very often both are used.
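Here's a minimal sketch of the "both are used" case: a hash index (a Python dict) held in RAM, sitting in front of a slower source. Everything in it is made up for illustration; `slow_lookup` just sleeps to stand in for a disk or network fetch, and `HashCache` is my own name, not any real library's.

```python
import time

def slow_lookup(key):
    """Hypothetical slow backing store: sleeps to mimic disk or network latency."""
    time.sleep(0.01)
    return key.upper()

class HashCache:
    """Keep fetched values in a dict (a hash index in RAM) in front of a slower source."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:        # fast path: one hash lookup in memory
            self.hits += 1
            return self.store[key]
        self.misses += 1             # slow path: go to the backing store
        value = self.fetch(key)
        self.store[key] = value      # remember it for next time
        return value

cache = HashCache(slow_lookup)
for name in ["termcap", "passwd", "termcap", "termcap"]:
    cache.get(name)
print(cache.hits, cache.misses)      # 2 2
```

Only the first request for each key pays the slow price; the repeats are answered from memory. Note that this sketch never evicts anything, which a real cache with limited space would have to do.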
There are multiple levels of caching going on in any modern computer system. Consider just a simple "cat /etc/termcap". There's so much going on there I'll probably miss something, but we'll give it a try:
No doubt I've forgotten something, but you probably get the idea: there are a LOT of caches in use. Maintaining those caches and keeping them accurate is the hard part. If good chunks of /etc/termcap are in cache, and some other program now modifies that file, future requests for the affected data blocks need to return the correct bytes. Keep in mind that the writing of data is buffered too, so this has to be very carefully managed. It would be unusual for a program like "cat" to care about data changes invalidating what it has in its own buffers, but some programs do have to worry about such things.
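To make the invalidation problem concrete, here's a rough sketch of what a program that does care might do: keep file contents in memory, but check the file's modification time and size before trusting the cached copy. `FileCache` is a hypothetical name, the demo writes to a temporary file rather than the real /etc/termcap, and this is only one simple approach; it is not how the kernel's own buffer cache works.

```python
import os
import tempfile

class FileCache:
    """Cache file contents in memory; re-read when mtime or size changes."""

    def __init__(self):
        self._entries = {}           # path -> ((mtime_ns, size), data)

    @staticmethod
    def _stamp(path):
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def read(self, path):
        stamp = self._stamp(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == stamp:
            return entry[1]          # cached copy still looks current
        with open(path, "rb") as f:  # stale or missing: go back to the file
            data = f.read()
        self._entries[path] = (stamp, data)
        return data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "termcap-demo")
    with open(path, "wb") as f:
        f.write(b"old entry\n")
    cache = FileCache()
    first = cache.read(path)         # miss: read from the file
    with open(path, "wb") as f:      # "some other program" modifies the file
        f.write(b"new, longer entry\n")
    second = cache.read(path)        # stamp changed, so the file is re-read

print(first, second)
```

Even this is not airtight: a write that lands between the `os.stat` call and the read, or one that leaves both the timestamp and the size unchanged, would slip past it. That's a small taste of why keeping caches accurate is the hard part.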
CPU caches have their own data consistency issues. While part of that housekeeping is up to the OS, on multiprocessor systems lower level hardware support has to be involved too. The biggest problem is that each CPU has its own cache(s) and consistency simply has to be maintained between them (a very good book that will tell you more than you may want to know about this is reviewed at Unix Systems for Modern Architecture).
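As a toy illustration of the idea only (real hardware uses far more elaborate protocols), here's a sketch of two per-CPU caches kept consistent by invalidating the other copies on every write. The `Bus` and `CpuCache` classes are invented for this example, and the write-through, write-invalidate scheme is my simplifying assumption.

```python
class Bus:
    """Shared memory plus a list of attached caches (toy model, not real hardware)."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def invalidate(self, addr, writer):
        # Tell every cache except the writer to drop its copy of addr.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)

class CpuCache:
    """Private per-CPU cache using write-through plus write-invalidate."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                  # miss: fill from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate(addr, writer=self)      # other copies are now stale
        self.lines[addr] = value
        self.bus.memory[addr] = value               # write through to memory

bus = Bus()
cpu0, cpu1 = CpuCache(bus), CpuCache(bus)
cpu0.write(0x10, 1)
seen_first = cpu1.read(0x10)     # cpu1 now caches the value 1
cpu0.write(0x10, 2)              # this drops cpu1's stale copy
seen_second = cpu1.read(0x10)    # miss, so cpu1 re-reads from memory
print(seen_first, seen_second)   # 1 2
```

Without the `invalidate` call, the second read on `cpu1` would happily return the stale 1 from its own cache.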
Caching becomes more important as larger disparities in hardware speed occur. CPU speed exceeded RAM speed quite some time ago: without deep caches, modern processors would constantly stall while waiting for RAM to provide the next instruction. While not completely accurate and blatantly ignoring some rather important details, a 100 MHz CPU would need 10 nanosecond memory if you weren't caching, because one clock cycle at 100 MHz lasts 10 nanoseconds. That's a lousy 100 MHz CPU - what we run nowadays is just a little bit faster, right? But our RAM hasn't kept up at all. Without lots of caching, these gigahertz CPUs would be pointless (in many applications, they are pointless anyway).
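The arithmetic is easy to check. The sketch below computes the clock period for a given speed, plus the usual back-of-the-envelope average access time (hit time plus miss rate times miss penalty). The 1 ns hit time, 5% miss rate and 60 ns penalty are numbers I made up for illustration, not measurements of any real machine.

```python
def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz

def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays hit_ns, misses pay extra."""
    return hit_ns + miss_rate * miss_penalty_ns

print(cycle_time_ns(100e6))      # 10.0 ns per cycle at 100 MHz
print(cycle_time_ns(3e9))        # about a third of a nanosecond at 3 GHz

# Assumed figures: 1 ns cache hit, 5% of accesses miss, 60 ns to reach RAM.
with_cache = average_access_ns(1.0, 0.05, 60.0)
without_cache = average_access_ns(0.0, 1.0, 60.0)   # every access goes to RAM
print(with_cache, without_cache)
```

With those assumed numbers the cached average works out to roughly 4 ns against 60 ns uncached, which is the whole argument in one line.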
Of course disk drives are so slow that they'd seem nearly motionless from the CPU's point of view.
When we bring network access into the picture, even more caches and buffers come into play (some might argue that a buffer isn't a cache, but in fact it is: it's just more temporary). Your web browser caches the very bytes you are looking at now, and both your machine and my server went through the various caches described above to get that data on the screen to start with. Clusters and other forms of parallelism maintain caches between themselves also. Like the problems of CPU caches touched on above, maintaining coherency of data without sacrificing speed is likely to be a major part of the design goals.
You might say that the opposite of caching is archiving: moving less frequently accessed data to a slower or less convenient location. However, that is a form of caching too: the archival cache is more convenient in that it doesn't sap more expensive or more limited resources. There are archival systems that are at least somewhat transparent to the user: if a resource is not available from cache memory, and isn't on disk, it will be automatically loaded from tape or CD-ROM or wherever else it might be stored. In effect, the virtual memory concept is extended beyond the disk to slower media, though the wait time might be quite long.
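A transparent archive like that can be sketched as a chain of tiers searched fastest first, with anything found in a slow tier copied up into the faster ones. This is a toy with plain dicts standing in for memory, disk and tape; `TieredStore` and the file names are hypothetical, and a real system would also need to push things back down when the fast tiers fill up.

```python
class TieredStore:
    """Look a key up in each tier, fastest first, promoting it on the way back."""

    def __init__(self, tiers):
        self.tiers = tiers           # list of (name, dict), fastest tier first

    def get(self, key):
        for level, (name, data) in enumerate(self.tiers):
            if key in data:
                value = data[key]
                # Copy into every faster tier so the next access is quick.
                for _, faster in self.tiers[:level]:
                    faster[key] = value
                return value, name   # also report which tier answered
        raise KeyError(key)

memory = {}
disk = {"report.txt": "this quarter"}
tape = {"old-report.txt": "five years ago"}
store = TieredStore([("memory", memory), ("disk", disk), ("tape", tape)])

first = store.get("old-report.txt")    # the long wait: comes from tape
second = store.get("old-report.txt")   # now answered from memory
print(first, second)
```

The caller never says where the data lives; it only notices that the first request takes longer than the second.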
You might also be interested in cache data corruption and Invalidating the Linux buffer cache.
More Articles by Tony Lawrence © 2012-03-11 Tony Lawrence
Caches Copyright © September 2003 Tony Lawrence