This is part of a series of articles that covers the booting of an OSR5 machine. See Booting OSR5 for other related articles.
Just about anything you are going to do with a computer involves files. Unix in general treats just about everything as a file: devices, directories, named pipes; they are all just files. Other OSes don't necessarily do that; for example, there's no visible /dev directory on a Windows system, though if you are programming in C you can open "/dev/lp" and print to it just as you would in Unix! Beyond that, though, Windows doesn't usually let you have direct access to its devices: a tape backup program, for example, knows how to write to your tape drive, but you'd have no way to directly access it yourself as you do in Unix. The concept of "everything is a file" is one of the distinguishing characteristics of Unix systems. Files are created on filesystems, filesystems are created on divisions (SCO's terminology), and divisions are created within partitions.
Partitions are created with "fdisk". That's true whether you are running SCO, NT or Linux. You might use multiple partitions so that you can "dual boot", that is, run more than one OS on the same computer. It doesn't matter which OS goes in which partition (though it's usually advisable to install Windows first), but one of the partitions must be marked "active": that's the partition that will boot by default. DOS allows four partitions, one of which can be an "extended partition". DOS, Windows, and Linux make use of extended partitions; SCO does not.
There are two reasons for installing Windows before Unix or Linux. The first is that older versions of Windows were a little stupid about where their partition actually ended, and would sometimes scribble a little extra data beyond where they should have. We hope they've fixed that by now, but the disk geometry problem still remains:
One thing NOT to forget: a lot of advice suggests leaving root fairly small. That's OK in and of itself, but keep in mind that on OSR5, /tmp is NOT a separate filesystem by default, and many things make heavy use of /tmp for (duh!) temporary files. I've had numerous "incidents" of mysterious failures on Linux systems from exactly this, and I've seen it a handful of times on SCO systems that I didn't install. Given the current cost of disk space, I routinely give 2-4GB for root. It's massive overkill, but I then have one less thing to worry about. Even my home Linux box has a 2GB root and is typically only about 40% used, which is fine by me; cheap insurance.
A hard drive is composed, of course, of cylinders, heads, and sectors, but nowadays that is more fiction than reality. In other words, the actual physical design of the disk is unimportant and may not even be known, and the disk can be addressed through numerous different combinations of cylinders, heads and sectors per track that all calculate out to the proper size. The problem is that DOS/Windows have certain specific ideas about what sort of numbers to use, and those may not be what the Unix/Linux fdisk would choose in the absence of other information. However, if DOS/Windows is installed first, then the Unix fdisk knows what geometry was used and can match it.
Should you find yourself with a drive that has lost its concept of geometry, you can force it with the "dparam" command: see "man dparam". Boot programs can be restored with "instbb". If you can't boot, of course, you'll need a boot floppy or something else: see Emergency Boot.
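To make the "different combinations, same size" point concrete, here is a small shell sketch. The function name chs_bytes and the three sample geometries are my own illustrative assumptions (not values any particular fdisk would report), and 512-byte sectors are assumed throughout.

```shell
# chs_bytes CYLINDERS HEADS SECTORS_PER_TRACK
# Prints the capacity in bytes implied by a geometry, assuming 512-byte sectors.
chs_bytes() {
    echo $(( $1 * $2 * $3 * 512 ))
}

# Three different (hypothetical) geometries describing the same disk:
chs_bytes 2048 16 63   # prints 1056964608
chs_bytes 1024 32 63   # prints 1056964608
chs_bytes 512 64 63    # prints 1056964608
```

All three describe the same number of sectors; they differ only in how the address is carved up, which is exactly why two fdisk programs can disagree about where a partition boundary falls.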
You can create more than one Unix partition within fdisk, but you generally wouldn't need to unless you will need more filesystems than will fit in the first partition. Each Unix partition can be divided into up to 7 file systems, but since the first partition usually has swap, boot and recover as well as root, that only leaves 3 additional filesystems. If you will need more, you'll need multiple Unix partitions (or, of course, just more physical drives, each of which in turn can have 4 fdisk partitions and 7 filesystems within each partition).
If you have multiple partitions, you'll need to run divvy on them AFTER the installation. The command would be "divvy /dev/hd02" for the second partition on your primary disk, "divvy /dev/hd12" for the second partition on the second drive, etc. See "man HW hd" for more examples. You'd then create and name filesystems, and then run "mkdev fs" to finish up. See Adding another hard drive for more details.
Remember, if you are using divvy on a disk with existing data, DO NOT use the (c) option to create filesystems. Simply name the divisions you want to access. You have to use a name that doesn't already exist; you can't use "root", but "oldroot" is fine. Also, "mkdev fs" is not destructive in any way: it will create and populate /lost+found (see below) if necessary, but it will not harm anything; it will just add the filesystem to /etc/default/filesys so that it can be mounted. You can do this manually if you wish.
Bela Lubkin summed up divvy in a newsgroup posting:
From - Sat Sep 29 07:10:25 2001
Newsgroups: comp.unix.sco.misc
From: Bela Lubkin <[email protected]>
Subject: Re: divvy
Message-ID: <[email protected]>
References: <[email protected]>
Sender: [email protected]

Lodo Nicolino wrote:

> I would like to know where divvy write the information. Ex: block,
> filesystem name etc

Please see these Technical Articles:

http://aplawrence.com/cgi-bin/ta.pl?arg=106296
http://aplawrence.com/cgi-bin/ta.pl?arg=104384
http://aplawrence.com/cgi-bin/ta.pl?arg=107180

I'll add one thing, which is probably covered in one of those articles but worth mentioning again: what you see when you run `divvy` is actually a compendium of information compiled from three different sources.

First, there's the divvy table on the partition. This tells us the start and end block numbers of each division.

Second, `divvy` searches the /dev directory on the active root filesystem, looking for device nodes whose major/minor numbers correspond to those of the various divisions being looked at. For instance, on an OSR5 root partition, /dev/root is usually device 1/42. When you run divvy it does not find the string "root" in the division table. It computes that the device number of division #2 on this partition would be 1/42. Then it looks in /dev, notices /dev/root is 1/42, and displays "root" in the on-screen table. This is significant because if you boot off a recovery floppy, it will only know the device names of your divisions if their device nodes have been copied to the floppy.

Third, it actually _reads_ the first few K bytes of each of the divisions in order to comment on what _type_ of data is present. In the case of 1/42, it opens /dev/root, reads a bit, and (in most cases) determines that it's an HTFS filesystem. So it displays "HTFS". It reads /dev/boot and learns that it's "EAFS"; it reads /dev/swap and doesn't recognize it as any particular filesystem type, so displays "NON FS".

When you _change_ division start/end points, divvy writes the new information to the division table at the beginning of the partition.

When you change division _names_, divvy deletes /dev/oldname and /dev/roldname and creates /dev/newname, /dev/rnewname with the right device numbers.

When you "change the type" of a division in divvy, it has no effect. Only when you also tell it to "create a new filesystem" does it do anything. Then, when you tell it to act on your wishes (i.e. when you q[uit], i[nstall]), it runs `mkfs` to create a filesystem of the requested type. Assuming that succeeds, next time you enter divvy it will show the new type. (This of course destroys the previous contents of the filesystem; as would changing a division's start/end points. Be careful while experimenting with divvy!)

>Bela<
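The name lookup Bela describes (find the /dev nodes that carry a given major/minor pair) can be imitated by hand. This is only a rough sketch of the idea, not how divvy itself is implemented: the function name nodes_for is hypothetical, the sample listing is made up, and it assumes "ls -l" prints device numbers in the usual "major, minor" form in the size column.

```shell
# nodes_for MAJOR MINOR
# Reads "ls -l" output on stdin and prints the names of block or character
# device nodes whose major/minor numbers match.
nodes_for() {
    awk -v maj="$1" -v min="$2" '
        /^[bc]/ {
            sub(/,/, ", ")      # tolerate "1,42" as well as "1, 42"
            if ($5 == (maj ",") && $6 == min) print $NF
        }'
}

# Made-up sample listing; on a live system you would use: ls -l /dev | nodes_for 1 42
nodes_for 1 42 <<'EOF'
brw-------   1 root     sys        1, 42 Sep 29  2001 root
crw-------   1 root     sys        1, 42 Sep 29  2001 rroot
brw-------   1 root     sys        1, 41 Sep 29  2001 swap
EOF
```

With the sample input this prints "root" and "rroot", the block and raw nodes for the same division. If the nodes are missing (as on a recovery floppy), nothing matches, which is why divvy then has no name to show.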
With a 5.0.x or Unixware install, you are going to have at least two "real" filesystems: /stand and / (root). You'll also have swap and recover, and possibly a "scratch" filesystem. Both recover and scratch are related to "fsck" which is discussed further below.
An HTFS filesystem can be as large as one terabyte. A Unixware 7 file system can also be that large. However, on OSR5 an individual file is limited to 2GB, while Unixware allows a file to be 1 TB for at least some purposes (not all Unixware commands can handle files over 2GB). At the time this was written, Linux systems also limited file size to 2GB.
I don't think I've heard any mention of "low level format" in many a year now, but there was a time when that phrase was tossed around loosely and inaccurately.
This old post spelled out the reality of disk design and is interesting in a historical context.
This reminded me of another utility from that era: Gibson Research's Spinrite. Hard drives have become so reliable that I was really surprised to see that Gibson Research still sells that product! Their FAQ page has this note:
I hope no one does still have those drives!
Newsgroups: comp.unix.sco.misc
From: [email protected] (Bill Vermillion)
Subject: Re: How do I low-level-format an IDE Drive?
Message-ID: <[email protected]>
Date: Fri, 9 Apr 1999 15:38:56 GMT

In article <[email protected]>, Tony Earnshaw <[email protected]> wrote:

>Frank Overstreet wrote:

>> ... Now I want to low level format the drive and am wondering
>> if the Western Digital utility wddiag.exe is what I need. When
>> the readme describes writing all 00's is that the same as a
>> low-level-format. If not please help.

>If you even attempt to low-level format a modern IDE disk, you'll ruin
>it. This has long been the case. Writing all 00s is not the same as a
>low-level format, it's what it says.

>Low level formatting was possible, after manufacture, with (now) almost
>prehistoric drives to remagnetize surfaces (thereby removing corruption
>and thus sometimes repairing some bad spots on the surfaces). These
>drives did not carry translation tables, as modern drives do.

The old wives tale of 'remagnetizing' the drives is just a myth. Magnetic media is quite stable - it's the environment that does them in re-acting with binders in coated media. Media for the most part is plated/sputtered today, so the only problem is the decaying of the particles. It just doesn't happen - at least in a computers lifetime. This excludes catastophic events of course, and high heat levels - above 150F you are going to have problems.

One of the ways the myth seemed to get started was on the old MFM drives of the ST-506 heritage. These were all 'stepper' drives. eg - a motor turned x degrees and ratchets the head across the drive surface. (In the floppy arena it was typically to have to re-aline a 5.25" disk every 6 months when used in heavy duty service. I did that but was pushing them 24x7x365. The first drives would last about a year, and when the 1/2 heights came out you could expect 4 years approximately - MTBF was about 20,000 hours for those).

The mechanism would wear over time and when the drive was issued commands to pulse/step the drive to the track, after a time it the head would not be positioned exactly in the center of the track set by the original format, and a reformat would then bring back the performance as first seen as the platter to stepper were now in sync with the worn portions.

To try to improve performance embedded servos were being used. This was a servo burst in between sectors. Doing a real low-level format meant the drive had to go back to the factor for a new format and servo. It was expensive. Typically the servo looked like a 'wedge' if you viewed it magnetically as the outer tracks had the bits spread further apart.

Then came the dedicated servo drive - with the bottom platter being used only for servo. This is why you'd see drives with and even number of platters, but one less than the total for data. These are the drives that perform the thermal recal because as the enivornment changes the metals contract and expand and the bottom head is controlling the position of all other heads on the stalk.

Current technolgy is embeded servo again - but there's no way a user can screw these up - as the old drives were controlled by cards external to the drive, and the new ones are integral to the units. This eleminated thermal recallibration, ZBR (zone bit recording) gives a different number of sectors available on different track groups.

Low-level reformating really needs never to be done. Worst case - to get rid of some pesky droppings by some ill-behaved program, or programming concept, would be the destructive verify in the controller. But 'reformatting to refresh the format' is something left over from DOS circa 1985.

--
Bill Vermillion bv @ wjv.com
Typically, the BIOS can only access the first 1024 cylinders of the drive. This is not a problem once Unix is up and running, but the built-in BIOS starts the boot process and is used for its first stages, so it's usually important to be sure that the /stand filesystem will always fall under that limit. Remember, this is a limitation of the BIOS, not of SCO Unix.
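How many bytes sit under that 1024-cylinder ceiling depends on the heads and sectors-per-track values the BIOS is using. The sketch below works out the arithmetic; the function name bios_limit_bytes is hypothetical, 512-byte sectors are assumed, and the two geometries are common examples rather than anything read from your hardware.

```shell
# bios_limit_bytes HEADS SECTORS_PER_TRACK
# Bytes reachable within the first 1024 cylinders for a given geometry,
# assuming 512-byte sectors.
bios_limit_bytes() {
    heads=$1
    sectors=$2
    echo $(( 1024 * heads * sectors * 512 ))
}

bios_limit_bytes 16 63    # prints 528482304  (untranslated IDE geometry)
bios_limit_bytes 255 63   # prints 8422686720 (a common translated geometry)
```

Either way, a small /stand placed at the very front of the disk is comfortably inside the limit, which is the point of keeping it small and first.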
In most cases nowadays, that's where I leave it, a small /stand and the rest of the disk or partition as one big file system.
There are, however, reasons to have separate filesystems:
There isn't space on the hard drive to fit everything you need. This, of course, was very common in the days of small hard drives, but seldom is an issue nowadays. It certainly isn't likely to be an issue during the installation- nothing you'd install initially requires anything but a fraction of even a 4 gig disk, and you are probably using something even larger.
After the install is a different story, of course. You may want to install all kinds of things that take up all kinds of space, and you may need or want another drive to hold it all. For example, you might want to install SKUNKWARE. Normally this installs to /var/opt/K/SKUNK98 (or SKUNK99, etc). You could force that to go elsewhere by making /var/opt/K/SKUNK99 a symbolic link pointing to another filesystem. See Secondary drives and Disk space for more ideas about that.
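The symbolic link trick looks roughly like this. It is a sketch only: the function name link_elsewhere is hypothetical, /u stands in for whatever roomier filesystem you actually have, and the demonstration runs in a scratch directory so it can be tried without touching a real system.

```shell
# link_elsewhere TARGET LINK
# Creates directory TARGET on the filesystem with room to spare, then makes
# LINK (the path the installer expects) a symbolic link to it.
# LINK must not already exist, so do this before installing the package.
link_elsewhere() {
    mkdir -p "$1" && ln -s "$1" "$2"
}

# Demonstration under a scratch directory; on a real system the arguments
# might be something like /u/SKUNK99 and /var/opt/K/SKUNK99.
demo=$(mktemp -d)
mkdir -p "$demo/u" "$demo/var/opt/K"
link_elsewhere "$demo/u/SKUNK99" "$demo/var/opt/K/SKUNK99"
touch "$demo/var/opt/K/SKUNK99/README"   # written through the link
ls "$demo/u/SKUNK99"                     # prints README: the file lives here
```

Anything later written under the expected path lands on the other filesystem, and programs that look in the original location never know the difference.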
Related to that is the situation where you WANT the data on a separate drive for performance reasons, or because the other drive is going to be physically removable, etc.
You want to control how much data gets put on a drive. For example, in some environments, I'll make /var/spool/lp/temp a small filesystem of its own. This causes it to fill up if there are too many unfulfilled print jobs, which calls attention to the problem before it really gets out of hand and fills up something more important. The idea here is that it's better not to print than not to work at all.
A similar concept might apply to temporary directories, but keep in mind that the booting system is apt to need those too, particularly /tmp and possibly even /usr/tmp, so those need to be available during boot and in single user mode.
You can't fit an entire drive on your backup media, and want to keep volatile data on separate filesystems to make it easier to use archaic backup programs like "dump". This is unlikely to be a problem nowadays except for the very largest systems. Unless it is simply impossible to do it because of time or medium constraints, you should ALWAYS be backing up everything every day. If you can't do that, you still probably do not want to be using programs like dump that depend on organization into separate filesystems. Modern backup utilities let you easily specify what to include and what to leave out. While on this subject, I'll mention in passing that while incremental backups (backing up data that has been modified today) are tempting, they are not fun when it comes time to restore. Aside from the physical annoyance of having to restore multiple tapes, you are also likely to end up restoring data that was actually deleted and should not be restored. See /Reviews/supertars.html
You want to be able to do upgrades or reinstalls and leave some filesystems untouched. This remains a valid reason for separating certain areas from others, but with the speed and capacity of modern backup systems, it is hardly as compelling as it used to be.
You want to be able to clean the filesystem in the event of a crash. Older large filesystems took a long time to clean, and fsck needed more memory than was likely to be present, so it would need scratch files, which slowed it down further. It was not at all unusual for these filesystems to get confused for no particular reason (not from a crash, just because), so it was obviously better to clean one or two small filesystems now and then as opposed to having to clean one big filesystem every time this happened. Linux filesystems still have that mentality, btw, and will automatically run fsck after x number of boots and/or x number of days. Older Sun filesystems would run it on EVERY boot. Modern filesystems very seldom need to run fsck anyway, so this is not an issue. I imagine it won't be an issue for Linux, either, once it catches up in this area.
Partial drive failure
You want to contain the damage. On older systems, it was often observed that if you had physical or electronic damage, it was sometimes unrecoverable by fsck, but that it was very apt to be confined to one filesystem. Therefore, spreading the filesystems out made it more likely that more of your data survived a crash. Again, this is unlikely to be an issue with modern filesystems, and as we tend to back up more data more often from and to more reliable media, it's even less important.
Treat data differently
John Dubois pointed out something I hadn't thought of:
Using separate filesystems for certain data also means that you can use mount options appropriate for that data: tmp, nolog, ronly, etc.
None of these issues may apply to you, and if they don't there's nothing at all wrong with one large root filesystem.
Different OSes have different ideas about what needs to be available as the system boots. OSR5, for example, is very fussy: it must have /usr and /tmp, which means these CANNOT be separate filesystems. People accustomed to other Unix versions sometimes get themselves in trouble here. With OSR5, don't relocate system directories elsewhere. If you want to make /u, /home, or /users (not /usr), that's fine, but leave the system stuff on the root.
On Unixware 7, /tmp is usually a ram disk. Linux doesn't mind you putting /usr elsewhere because it keeps its necessary programs in /sbin. In short, you have to know what's important and why, and very often the install manuals don't help you very much.
Modern file systems are highly resistant to damage. Although you should shut down properly (man shutdown) before powering off, and should have your machine attached to a UPS so it doesn't go off unexpectedly, it is very likely that absolutely nothing bad will happen if shutdown isn't run.
When there is a crash, or just a power off, the filesystem will probably be marked "dirty", meaning that there was data in memory that had not yet been flushed out to the disk when the system went down. On older systems (and on Linux through at least December 1999) that situation required running "fsck" to check and repair the damage.
Repairing is what "fsck" does, and it does it incredibly well. It needs to be run on an unmounted filesystem, or if that's impossible (root is always mounted), in single user mode. NEVER RUN FSCK on root in multi-user mode. You'll very likely cause irrecoverable damage. Think about what's happening here: fsck is reading through all the data and directories on the disk, trying to make sense of everything, and at the same time, some daemon running in multi-user mode is changing things! You'd be lucky not to really screw things up, and how often are you that lucky?
If you have a large filesystem and a small amount of memory, fsck may need a "scratch" file. That scratch file may have already been created for you during the original install, but fsck doesn't necessarily know to use that automatically. There's a flag you can add to /etc/default/filesys that will tell fsck to use that, but you would have had to add that yourself (see "man fsck"). Most likely, though, you've been ambushed: you are booting a crashed system, fsck says it wants a scratch file, and you don't know if you have one. What do you do? If it's the root file system that fsck is working on, throw a floppy in (it doesn't have to be formatted for Unix, but it must not have bad blocks) and tell it to use /dev/fd0. If it's a secondary filesystem, you can just tell it the name of a file on the root filesystem; /tmp/scratch is fine.
The "recover" filesystem mentioned above is used by fsck to save its output in the case of an autoboot after a crash where it runs automatically (assuming that you've said that's OK in /etc/default/boot). You can see this data get picked up in /etc/rc.d/9/reserved.
The lost+found directory (each filesystem needs its own /lost+found) is used by fsck as a place to put files that are valid (have valid disk blocks allocated and a non-zero reference count) but somehow aren't listed in any directory. Ordinarily, these would be temporary files caused by a common programming trick of opening a temporary file, and then immediately removing it. The file remains "open" for the program to read and write to, but the data blocks will be immediately removed when the program exits, unless, of course, the system crashes. So, normally, things you find in lost+found won't be anything you need or care about, but in the rare case of more serious problems, these could be. If you can identify what the files are (they'll only be identified by their original inode number) you can move them back where they belong. If an entire directory ends up in lost+found, then the files within it will have names (because, of course, the names are stored in that directory) and that may be helpful in determining where the directory belongs.
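The open-then-remove trick is usually done from C, but it can be shown from the shell too. This is just an illustration of the idea; the file descriptor numbers 3 and 4 are arbitrary choices, and mktemp is assumed to be available.

```shell
tmp=$(mktemp)               # create a temporary file
exec 3>"$tmp" 4<"$tmp"      # open it twice: fd 3 for writing, fd 4 for reading
rm "$tmp"                   # the name is gone, but the open file lives on

echo "scratch data" >&3     # still writable through fd 3
read -r line <&4            # still readable through fd 4
echo "$line"                # prints: scratch data

exec 3>&- 4<&-              # closing the descriptors finally frees the blocks
```

If the system crashed between the rm and the final close, the file would have allocated blocks but no directory entry, and that is exactly the kind of orphan fsck drops into lost+found.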
It is fairly rare to see fsck needed on modern HTFS filesystems. This is because the driver keeps logs of work it needs to do with regard to data blocks that need to be written, and it can quickly read those logs and determine exactly what, if anything, needs to be done. In fact, fsck doesn't usually do any more than examine those logs- if you need to force it to really do its work, you'll need to run it as "fsck -ofull".
In the rare event that there is damage that isn't fixed automatically, you may need to run "fsck -ofull" manually. In such situations, consider adding the "-y" flag to give default answers to any and all questions that fsck may ask. It is unlikely that you have better knowledge or ability to fix any problem and in fact answering "n" to any question is just going to leave you with unresolved issues that you probably need exotic knowledge and lots of experience to fix manually (with fsdb or something like it). For most of us mortals, the best choice is to let fsck make its own decisions.
The File System Debugger is "fsdb". As alluded to above, someone with deep expertise could repair a damaged filesystem by hand using it. Real use of this requires intimate knowledge of filesystem design, but even those of us who'd rather not know that much about the internals will find it useful now and then. For example, suppose you suspect that a certain large file is excessively fragmented. To find out, you just need its inode number (get that from "ls -li") and what filesystem it is on. Let's say it is inode number 54618 and it is on the root filesystem:
echo "54618i" | fsdb /dev/root
That dumps the entire inode data, including the locations of the data blocks. To follow these blocks all the way, you need to understand their structure; see the article bfind.c: Finds which file contains a block for an introduction to that. Steve Pate's Unix Internals tells a more complete story of OSR5 filesystems.
If you want to play with fsdb, do so on a disposable filesystem. You could use a floppy, or spare division or partition if you have one.
Speaking of fragmentation, it seldom makes sense to "defrag" a modern filesystem. In the first place, it's much less likely to be fragmented because of its design. But more importantly, if this is a typical multi-user system, consider that disk block requests are coming from all the different users, and are likely to be scattered all about the drive anyway. The OS and sometimes the underlying hardware will do the best they can to arrange those requests for the best access, but since the users will have different requests, it isn't likely to help much to have files in sequential blocks. If you still feel you need to defragment your drive, use one of the Supertars to do it: just wipe everything out and restore it, and everything will be sequential.
Other than low level tools like fsdb, you generally have to mount a filesystem to have any useful access to it. That can be done manually, but it's usually easier to have it done automatically from /etc/default/filesys. You can add entries manually, or you can let "mkdev fs" do it for you. One disadvantage of "mkdev fs" is that it doesn't allow you to add optional flags you may want (such as to specify a scratch file- see "man filesys" and "man mount" for other options). Nothing says you can't use "mkdev fs" and then go in by hand to add what you need, though.
When a filesystem is listed in /etc/default/filesys, you can use a shorthand form of "mount". For example, suppose I had this entry in /etc/default/filesys:
bdev=/dev/jaz cdev=/dev/rjaz \
        mountdir=/jazdrive mount=yes fstyp=HTFS \
        fsck=dirty fsckflags= rcmount=yes \
        rcfsck=dirty mountflags=
I could mount this by just saying "mount /jazdrive" (referring to the mountpoint).
Even better, since the Jaz drive is removable media, and I use it for both DOS and Unix files, I can leave out the "fstyp=HTFS":
bdev=/dev/jaz cdev=/dev/rjaz \
        mountdir=/jazdrive mount=yes \
        fsck=dirty fsckflags= rcmount=yes \
        rcfsck=dirty mountflags=
and when I just say "mount /jazdrive" as above, "mount" will figure out what sort of filesystem I have and mount it.
A damaged file system won't mount until it has been cleaned with fsck, but if fsck won't do it, you might be able to mount it read-only and at least recover your data with a quick backup. It's certainly worth a try.
SCO Unix had a long history of CD confusion. Originally only SCSI CD drives were supported, and even those had issues.
IDE support became available, but it was confusing to configure as the system did not auto-detect; you had to show it where the devices were connected. The IDE driver itself went through several iterations. See the "wd.delay" section of Booting OSR5 - Definitions for some of the EIDE issues.
Strange problems likely had something to do with a timing issue in the wd driver.
I'm not sure that 3.2v4.2 could read Windows CDs. ISO 9660 compatible systems are supposed to just ignore Joliet extensions (see ISO 9660 extensions), but I'm not sure that it did. Possibly 3.2v4.2 lacked that ability. Windows 95/98 couldn't always read Rock Ridge CDs, so that's certainly possible.
Even as late as Windows 2000 there were incompatibilities and the early versions of OS X didn't create Windows readable CD's by default.
If you want portability for older systems (and don't need file system specific metadata), just use flat ISO-9660.
On non-root HTFS filesystems, there's a very interesting and unusual file that is usually invisible, or almost invisible, and that's one of the strange things about it. You can't see it unless you are completely specific:
ls -l .slog0000
(remember, you only have this on non-root HTFS filesystems). If you try wild cards, you won't see it. You can't read it and you can't remove it: it's metadata, that is, it is supposed to be there for the use of the file system drivers only (I have no idea why it isn't on the root filesystem). Its stated purpose is to speed up synchronous writes.
It is possible (under really strange conditions) for this file to become corrupt and/or visible (visible under ordinary listings). This can cause backup programs (which ordinarily wouldn't even notice it) to complain that they cannot open it, or (depending on the design of the program) even to fail entirely. The OSS497C and OSS497D patches supposedly prevent this from happening on OSR5.0.4/5.
If this does happen, the file system can be unmounted and then remounted using an undocumented "-o noslog" argument. When mounted in this way, the .slog0000 file can be removed, and when the filesystem is remounted normally, it should be recreated.
Most modern drives are extremely reliable, and the better drives will even automatically and transparently map out bad blocks so you never even know anything happened. When they see that a disk block is starting to fail, they copy data to a spare block, re-jigger their internal tables so that any reference to the old block now goes to the new one, and lock out the old bad block forever. That can all be transparent to you. For drives without that feature, you use "badtrk", and like fsck, you use it in single-user mode. It lets you scan your drive non-destructively and will try to recover data from bad blocks.
I'd say with the current state of the art, it's OK to have a bad block or two. If you are seeing more than that, though, you have a defective drive. A quick way to check for unreadable blocks if you can't unmount right now is to use dd. For example, to check my jaz media, I might do:
dd if=/dev/jaz of=/dev/null bs=1024k
If there are any unreadable blocks, I'll get error messages. But "badtrk" is a better way to do this.
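If you want a yes/no answer instead of watching for messages, the dd check can be wrapped in a small function. This is a sketch, not a replacement for badtrk: the function name scan_readable is hypothetical, and it relies only on dd exiting non-zero when a read fails. The demonstration uses an ordinary temporary file so it can be run anywhere.

```shell
# scan_readable DEVICE_OR_FILE
# Reads the whole thing and throws the data away; reports whether dd
# got through without a read error.
scan_readable() {
    if dd if="$1" of=/dev/null bs=1024k 2>/dev/null; then
        echo "$1: no read errors"
    else
        echo "$1: read errors found" >&2
        return 1
    fi
}

# Demonstration on a scratch file; on the real system the argument would
# be a device such as /dev/jaz.
sample=$(mktemp)
echo "some data" > "$sample"
scan_readable "$sample"
```

Hiding dd's own messages keeps the output short, at the cost of not telling you where the failure was; drop the 2>/dev/null if you want the details.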
Do not mess around with anything less than one of the Super Tars.
© 2013-08-11 Tony Lawrence