A SCO Medical Manager System Dies

Yesterday I went out to try to recover a SCO system running Medical Manager. What I found was an old system with a DPT controller set up as a RAID 5 with three drives, but one dead.. and it had crashed, and crashed hard.

According to the customer, the drive had failed some time ago.. and at some point they had tried to replace it, but it had never come back online. The system was sitting at a single-user root prompt, and she explained that if they continued to normal startup, they would not be able to log in. The first thing I looked at was /etc/passwd - it was gone..

I don't mean "gone" as in the symbolic link being missing (SCO uses a lot of symlinks). Nor do I mean that the real password file down in /var/opt/K was missing.. it was listed in the proper directory there.. but the inode that directory entry pointed at was trashed.. we had disk corruption.
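To make that distinction concrete, here is a minimal sketch of how those cases could be told apart from a shell prompt. The function name classify_path is made up for illustration (it is not a SCO utility), and it assumes a POSIX shell whose "test" supports -L. A trashed inode might show up as either "not-regular-file" or "cannot-stat", depending on what garbage ended up in the mode bits.

```shell
# Hypothetical helper: report what kind of "gone" a path is.
# Prints one of: dangling-symlink, cannot-stat, not-regular-file, ok
classify_path() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo dangling-symlink    # the link is there, its target is not
    elif [ ! -e "$1" ]; then
        echo cannot-stat         # no entry, or the inode cannot be read
    elif [ ! -f "$1" ]; then
        echo not-regular-file    # entry exists but is not a plain file
    else
        echo ok
    fi
}
```

Something like "classify_path /etc/passwd" would be the first thing to try; an answer other than "ok" says where to look next.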

And it was not limited to that. Looking in /dev and other system directories, "ls -l" would show strange characters in the permissions field.. obviously there had been a hit to the inode table. I turned fsck loose and it quickly filled up /lost+found..

Well, that's OK. They use Microlite Edge for backup, so I figured I'd be restoring data anyway. I could not get "edgemenu" to run, but "edge" at the command line worked fine, so after fsck finished, I started restoring files. I ran into two immediate problems: some files that should have been directories were not, and we were just about out of disk space after the restore. The first problem was easily fixed by removing the corrupt ex-directories, but the second turned out to be more of a problem: I started removing files in /lost+found and about ten minutes in, the system panicked.

Ooops.. so the lost+found files themselves were also corrupt.. that's really bad.. but the system did reboot and we were back at "CTRL-D for normal system startup". I asked for the root password (remember, it had been at a shell prompt when I arrived) and the person I was working with said she didn't know..

But she must know.. she had obviously typed it in before.. well, apparently she had been on the phone with Sage Software (Medical Manager) when she had done that, but couldn't remember what they had her do. No problem, we'll just get Sage back on the phone..

Nope, not available right now.. we left a message and I sat down to wait. But.. maybe I could go multi-user, because I had restored /etc/passwd, so I tried it and was able to log in as "ccmenu".. on this system, that's a superuser account and it gave me a menu that included "Unix Utilities", and that menu included "Unix Shell" as an option.. unfortunately that wanted a password too.. but "Read Mail" did not, so I did "!/bin/sh" within mail and got to a "#" prompt. I changed the password, did an "init 1" and ran "fsck -ofull -y /dev/root".

That found a fair number of problems, but most were in /lost+found and the ones that were not were in text report files.. so not so bad. When it finished, I ran it again and all was clean.. we went multi-user and I had them run reports to prove out the data. That all passed, so things were looking good.

While they were checking more reports, I talked with the owners about the foolishness of continuing with this system. I explained that SCO was in dire straits, might not last much longer, and that their old (3.2v5.0.5) OS could not be installed on modern hardware.. I strongly suggested that they see what Sage could offer for an upgrade.

It turns out that Sage now offers Medical Manager on RedHat Linux. That's great news, because the owners did NOT want to move to a Windows system.. they've seen too many problems to fall for that. So they have asked Sage for an upgrade quote and will move on that very quickly.

About then we ran into a problem: someone had tried to log in on a second screen and got a message saying that they couldn't use /dev/ttyp35.

I took a quick look and saw the problem. The /dev/ttyp entries on this system should be major number 58 and the minor number should match the pty number. So ttyp0 is 58,0 and ttyp35 should be 58,35 - but it was 54,19 instead.. again, inode corruption.

That's easy to fix, though: "rm /dev/ttyp35; mknod /dev/ttyp35 c 58 35" will do it. However, a few minutes later it happened to a different pty, so the corruption was ongoing: this was a real hardware issue, not the result of a crash. That put a certain urgency into changing machines. I explained that SCO 5.0.5 would not work with any modern hardware and that while we certainly could still find systems that it would work with, that would take a few days at least.. and seemed to me to be throwing good money after bad. If they could just limp along with this system, being aware of the potential danger of corrupting customer data, I felt they'd be better off just switching to Linux and abandoning this as quickly as possible.
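The check I did by eye can be sketched as a small filter over "ls -l" output. This is only an illustration: check_ptys is a made-up name, major number 58 is what this particular system used, and the assumption that the listing shows "major, minor" somewhere ahead of the device name may not hold for every ls. It only prints the repair commands; nothing is removed or created until you run them yourself.

```shell
# Hypothetical helper: read "ls -l" lines for /dev/ttyp* on stdin and
# print an rm/mknod pair for every entry that is not major 58, minor N.
check_ptys() {
    awk '
    {
        name = $NF                      # last field is the device path
        n = name
        sub(/^.*ttyp/, "", n)           # pty number taken from the name
        if (n !~ /^[0-9]+$/) next       # skip anything that is not ttypN
        # Assumption: the listing contains a "major, minor" pair
        if (!match($0, /[0-9]+, *[0-9]+/)) next
        split(substr($0, RSTART, RLENGTH), mm, /, */)
        if (mm[1] + 0 != 58 || mm[2] + 0 != n + 0)
            printf "rm %s; mknod %s c 58 %s\n", name, name, n
    }'
}
```

Usage would be along the lines of "ls -l /dev/ttyp* | check_ptys", reviewing the printed commands before running any of them. With corruption still ongoing, running it again after a few minutes would show whether new entries have gone bad.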

We were able to speak to Sage soon after that, and they said that although a new machine would take a few weeks, they felt they had some used boxes that they could supply in the meantime - a little extra expense, but probably less than wasting money on SCO.. so that's the plan for the moment.

I saw no point in trying to get the third drive working: the controller could be bad, and it could get much worse if I tried to change anything.. in this case, I think it's best to just let sleeping dogs lie.

I checked with the customer this morning; no more corruption in the ttyp's, and all data is still proving out.. so maybe they can survive this without much downtime. I sure hope so.

It's good that Sage can move them to Linux. It makes no sense to throw more money into the old box, and it certainly makes no sense to stay on SCO.





8 comments




© Anthony Lawrence







Sat Jan 12 02:35:07 2008: 3438   bruceg2004


Love the little hack to get a shell prompt from the 'Read Mail' option -- nice! Their management of this whole fiasco just proves that some people really should not be managing IT assets, when they are not competent enough to properly manage them. Our 5.0.5 machine is probably going to head down this road soon enough; older hardware, disk drives that have been spinning for over 7 years, and other wear and tear will have the RAID asking for another drive soon. I have already replaced one drive in the RAID, but it rebuilt just fine. This is the first year that I have finally had management agree that our servers should have a FIVE year limited lifetime, and should be upgraded to prevent these things from happening.



Sat Jan 12 11:23:19 2008: 3440   TonyLawrence

Well, they don't have an IT person.. that's how things got this bad.

Yeah, five years is plenty.. I explained to them that it just needs to be in their budget.



Sat Jan 12 11:46:50 2008: 3441   TonyLawrence

And what are you going to do about SCO?



Thu Aug 27 06:00:49 2009: 6795   Jason

Interesting. I had a client who had a SCO 5.0.6 system die just this week. Initially set up in 2001. Wow, eh? SCO boots to the boot prompt, but after that just sits there with a kernel dump. I saw a message about the Compaq Array 431 overheating. Attempted to boot the system using the drivers from a floppy disk boot: defbootstr link=fd(60)clad .. etc. Then got a memory corrupt kernel panic. Anyway, that system is done for. Ironically, Windows BartPE and my Ubuntu CD work okay and start up on this system. The RAID controller took an online firmware update. Hard to tell what caused it.



Thu Aug 27 10:31:30 2009: 6796   TonyLawrence

Yeah, RAID 5 is great - until it isn't.



Thu Aug 27 14:37:53 2009: 6801   BigDumbDinosaur

This article was posted when I was very sick from an aggressive form of ITP, so I missed reading it (as well as others from around that time).

I was surprised to see the client still running on OSR5.0.5 long after two more iterations of OSR5 had been released. This is a classic case of getting too far behind the technology curve. However, as a matter of academic interest, OSR5.0.5 will run on current hardware -- I've tested it on one of our Opteron servers. The main problem, as always, is getting drivers for new host adapters, etc. So in a pinch, this client could have moved to new hardware and still run their old version of OSR5 by using compatible network cards, etc. But, why would anyone do that when Linux can be had for a very reasonable cost?

As for the ongoing filesystem corruption you described, that sounds like either a cache coherency problem in the DPT RAID controller or flaky SCSI bus termination between one of the ports on the controller and the associated drive. Having diagnosed that sort of trouble on several machines in the past, I would have removed the RAID controller, substituted a standard SCSI host adapter and tried doing a minimal OS install to see if the machine was stable -- most likely a crash during installation would have occurred if the mobo or memory were to blame. If it did stabilize, I'd replace the controller, although that in itself would have been a challenge due to differing drive mapping schemes.

BTW, DPT's products were never first-rate, and we have never used them in any server we built with RAID.



Thu Aug 27 14:44:58 2009: 6802   TonyLawrence

Illness is no excuse :-)



Thu Aug 27 15:37:49 2009: 6803   BigDumbDinosaur

"Illness is no excuse :-)"

Sure it is! Society uses illness to excuse all sorts of bad behavior. Why should I be any different? <Grin> If I'm sick and I don't want to read your articles, I won't! So there!

Jocosity aside, those who would use illness as an excuse to not do anything useful should consider the late Ted Kennedy. In the last year of his life, desperately ill as he was, he didn't lay around and take up space. Although I personally found his politics (and his past behavior) distasteful, I do admire the man for taking the cards he had been dealt and continuing despite them. Some of our "entitled" loafers would do well to take a hint from the senator's final days.
