Yesterday I went out to try to recover a SCO system running
Medical Manager. What I found was an old system with a DPT
controller set up as a RAID 5 with three drives, but one
dead.. and it had crashed, and crashed hard.
According to the customer, the drive had failed some time ago..
and at some point they had tried to replace it, but it had never come
back online. The system was sitting at a single-user root
prompt, and she explained that if they continued to normal startup,
they would not be able to log in. The first thing I looked
at was /etc/passwd - it was gone..
I don't mean "gone" as in the symbolic link being missing
(SCO uses a lot of symlinks). Nor do I mean that the real
password file down in /var/opt/K was missing.. it was listed
in the proper directory there.. but the inode that directory
entry pointed at was trashed.. we had disk corruption.
And it was not limited to that. Looking in /dev and other
system directories, "ls -l" would show strange characters in
the permissions field.. obviously there had been a hit to
the inode table. I turned fsck loose and it quickly filled /lost+found.
Well, that's OK. They use Microlite Edge for backup, so
I figured I'd be restoring data anyway. I could not get "edgemenu"
to run, but "edge" at the command line worked fine, so after
fsck finished, I started restoring files. I ran into two
immediate problems: some files that should have been directories
were not, and we were just about out of disk space after the restore.
The first problem was easily fixed by removing the corrupt ex-directories,
but the second turned out to be more of a problem: I started
removing files in /lost+found and about ten minutes in, the system panicked.
Ooops.. so the lost+found files themselves were also corrupt.. that's
really bad.. but the system did reboot and we were back at "CTRL-D
for normal system startup". I asked for the root password (remember,
it had been at a shell prompt when I arrived) and the person I was
working with said she didn't know..
But she must know.. she had obviously typed it in before..
well, apparently she had been on the phone with Sage Software (Medical
Manager) when she had done that, but couldn't remember what they
had her do. No problem, we'll just get Sage back on the phone..
Nope, not available right now.. we left a message and I
sat down to wait. But.. maybe I could go multi-user, because
I had restored /etc/passwd, so I tried it and was able to log in as "ccmenu"..
on this system, that's a superuser account and it gave me
a menu that included "Unix Utilities", and that menu included "Unix
Shell" as an option.. unfortunately that wanted a password too.. but
"Read Mail" did not, so I did "!/bin/sh" within mail and got to
a "#" prompt. I changed the password, did an "init 1" and ran
"fsck -ofull -y /dev/root". That found a fair number of problems, but
most were in /lost+found and the ones that were not were on text
report files.. so not so bad. When it finished,
I ran it again and all was clean.. we went multi-user and I had
them run reports to prove out the data. Those all passed, so things were
While they were checking more reports, I talked with the owners
about the foolishness of continuing with this system. I explained
that SCO was in dire straits, might not last much longer, and that
their old (3.2v5.0.5) OS could not be installed on modern
hardware.. I strongly suggested that they see what Sage could offer for
It turns out that Sage now offers Medical Manager on RedHat Linux.
That's great news, because the owners did NOT want to move to a Windows
system.. they've seen too many problems to fall for that. So they
have asked Sage for an upgrade quote and will move on that very quickly.
About then we ran into a problem: someone had tried to log in on a second
screen and got a message saying that they couldn't use /dev/ttyp35.
I took a quick look and saw the problem. The /dev/ttyp entries on this
system should be major number 58 and the minor number should match the
pty number. So ttyp0 is 58,0 and ttyp35 should be 58,35 - but it
was 54,19 instead.. again, inode corruption.
That's easy to fix, though: rm /dev/ttyp35; mknod /dev/ttyp35 c 58 35
will do it. However, a few minutes later it happened to a different
pty, so the corruption is ongoing: this is a real hardware issue, not
the result of a crash. That puts a certain urgency into changing machines.
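
The check described above (major 58, minor equal to the pty number) could be automated so that newly corrupted nodes are spotted before a user trips over them. What follows is only a minimal sketch in Python, which an OS of this vintage is unlikely to have installed, so treat it as an illustration of the logic rather than something to run on this box. The function names are hypothetical, and the major number 58 is simply what this particular system uses.

```python
import os
import re
import stat

PTY_MAJOR = 58  # assumption: the major number for /dev/ttyp* on this system
PTY_NAME = re.compile(r"^ttyp(\d+)$")


def expected_numbers(name, major=PTY_MAJOR):
    """Return the (major, minor) pair a ttypN node should have, or None
    if the name is not of the form ttypN."""
    m = PTY_NAME.match(name)
    if m is None:
        return None
    return (major, int(m.group(1)))


def find_mismatches(entries, major=PTY_MAJOR):
    """entries is an iterable of (name, major, minor) tuples.
    Return a list of (name, actual_pair, expected_pair) for every
    ttypN entry whose device numbers are wrong."""
    bad = []
    for name, maj, mnr in entries:
        want = expected_numbers(name, major)
        if want is not None and (maj, mnr) != want:
            bad.append((name, (maj, mnr), want))
    return bad


def scan_dev(dev_dir="/dev"):
    """Collect (name, major, minor) for character devices named ttypN.
    Nodes that are no longer character devices are skipped here, so a
    fuller check would want to report those separately."""
    found = []
    for name in sorted(os.listdir(dev_dir)):
        if PTY_NAME.match(name) is None:
            continue
        st = os.lstat(os.path.join(dev_dir, name))
        if stat.S_ISCHR(st.st_mode):
            found.append((name, os.major(st.st_rdev), os.minor(st.st_rdev)))
    return found
```

Calling find_mismatches(scan_dev()) returns one tuple per bad node. Each one gives the name and the expected numbers needed for the same "rm" and "mknod" repair shown above. Keeping the comparison separate from the directory scan means the logic can be tried against a hand-written list before it is pointed at a real /dev.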
I explained that SCO 5.0.5 would not work with any modern hardware and
that while we certainly could still find systems that it would work
with, that would take a few days at least.. and seemed to me to be throwing
good money after bad. If they could just limp along with this system,
being aware of the potential danger of corrupting customer data, I felt
they'd be better off just switching to Linux and abandoning this as quickly
as possible.
We were able to speak to Sage soon after that, and they said that
although a new machine would take a few weeks, they felt they had
some used boxes that
they could supply in the meantime - a little extra expense, but probably
less than wasting money on SCO.. so that's the plan for the moment.
I saw no point in trying to get the third drive working: the
controller could be bad, and it could get much worse if I tried to
change anything.. in this case, I think it's best to just let
sleeping dogs lie.
I checked with the customer this morning; no more corruption in the ttyp's,
and all data is still proving out.. so maybe they can survive this
without much downtime. I sure hope so.
It's good that Sage can move them to Linux. It makes no sense to
throw more money into the old box, and it certainly makes no sense
to stay on SCO.