SCO hpsas failed


ToddPorter

I got an error from one of our many SCO servers in the area which said:
hpsas <slot 0>: controller heart beat counter stopped
I have never encountered this error before. All I could do was reboot the server, which came back up fine, but how worried should I be? Is there any sort of utility I can run, like a SCO "chkdsk"? Is there any log that would give me more detailed info?




Wed Jan 28 14:45:16 2009: 5258   TonyLawrence

As "hpsas" is the disk driver, it may not have been able to write any logs. You should contact HP for debugging info on that.
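If the driver did manage to log anything, one way to look for it is to scan a saved copy of the kernel messages for the exact text quoted above. The following is only a minimal sketch: the function name `find_heartbeat_errors` is made up for illustration, and the assumption that the messages live in a plain text file (on OpenServer this is commonly /usr/adm/messages, but check your own system) may not hold on every installation.

```python
import re

# Message text as quoted in this thread; the slot number is captured.
HEARTBEAT = re.compile(
    r"hpsas <slot (\d+)>: controller heart beat counter stopped"
)


def find_heartbeat_errors(lines):
    """Return a list of (line_number, slot) for lines containing the message.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = HEARTBEAT.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits
```

To use it, open the log with `open(path, errors="replace")` and pass the file object to `find_heartbeat_errors`. An empty result does not prove the controller was healthy, since the disk driver may have been unable to write anything at the time of the failure.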

Your question about "chkdsk" is amusing because Unix had such programs long before Microsoft existed. The command you want is "fsck". If there is definite corruption, it will run itself at reboot.

If you want to force that, it has to be run on unmounted file systems, or in single-user mode (init 1) for the root file system. You'd want

fsck -ofull







Fri Mar 13 13:08:58 2009: 5679   anonymous

We have the same problem with a SCO server: every 60 days the server goes down with the error "heart beat counter stopped".








Fri Apr 3 14:14:16 2009: 5965   ToddPorter

Thanks for the reply. I'm getting more familiar with UNIX now than I ever thought I would. I had this happen again at another customer site and rebooted; all is well now. I'm beginning to think that either the controllers are flaky or we should stop selling the ML350s to our UNIX clients. In your experience, what's the most stable controller for SAS?

Also, I want to say that this website has been a godsend over the past year or so since I started in the UNIX world. MUCH valuable information, and I appreciate all the hard work you have put into it!



Fri Apr 3 14:18:36 2009: 5966   TonyLawrence

Unfortunately, you are likely to run into driver problems with SCO on anything today. Nobody cares about them; everyone expects them to disappear shortly. The manufacturers certainly can't expect much help from SCO Engineering now.



Fri Apr 3 15:24:06 2009: 5967   BigDumbDinosaur

I'm curious as to what hardware these SCO installations are running. I wonder if these are cases of running old drivers on newer hardware revisions. As Tony pointed out, there's little of today's hardware that has adequate support for anything SCO.

If you want to force that, it has to be run on unmounted file systems, or in single-user mode (init 1) for the root file system. You'd want


fsck -ofull

You might want to make that fsck -y -ofull if you are a novice at this sort of thing.



Mon Apr 6 12:53:30 2009: 6015   ToddPorter

Both times, the customer was running one of our newly installed systems (installed no more than a year ago). Both customers are running an HP ML350 G5 with the built-in "HP Smart Array E200i" controller. Does anyone have a recommendation for new hardware (entire machine) that plays nice with SCO 5.0.7? We do support for sites where that's all their software will support.



Mon Apr 6 13:06:09 2009: 6016   TonyLawrence

Certainly you can put together systems that will work with SCO. Obviously you can't go to Dell or any other brand name - I don't think any of them care about SCO now. You can find white box distributors - for example Seneca Data still builds SCO boxes last I checked. However, I can't think that any of them will continue this for very long.

Your customers need to move. See
(link) for a list of resources toward that end.



Fri Sep 4 13:33:21 2009: 6849   greg

hpsas <slot 0>: controller heart beat counter stopped
I am in the process of moving over to the ML350 G5 with the E200i from the ML350 G4p. I would like to know if the issue behind this error message was ever resolved. I did a Google search and went to HP and SCO and found nothing. It just seems weird that the same person had it on 2 different boxes.



Fri Sep 4 13:39:06 2009: 6850   TonyLawrence

I have no more information on this, sorry.



Fri Sep 4 14:05:56 2009: 6851   greg

Can anyone email the poster? I just worry that the ML350 G5 will not work. It's been certified by HP and SCO using the EFS. Maybe this guy did not do the install correctly? I have had SCO 5.0.7 running on a G4p with no problem, but that used a SCSI 640 RAID card, not the SAS driver hpsas. It just seems weird that it worked for 60 days and then shut down. If it was a driver problem, why wait 60 days? It should have broken right away or never worked.



Fri Sep 4 14:08:15 2009: 6852   TonyLawrence

Unfortunately, he didn't provide an address.



Fri Sep 4 16:22:45 2009: 6854   ToddPorter

I have done maybe 10 of these installs with a ML350. I only had the issue with 2 sites. 1 site had the issue last year, only once. A reboot fixed the issue; never had it again. The second customer had the same issue twice, a year ago. Reboot "fixed" the issue. I have not had that since either. I am very familiar with SCO now and believe it to be related to the particular hard drives we ordered part# ST373455SS (Seagate Cheetah). We have used other brands since and have not had the issue on any other ML350 systems.



Fri Sep 4 16:27:11 2009: 6855   TonyLawrence

Thanks, Todd.

I'm glad you followed this!



Fri Sep 4 16:58:58 2009: 6856   ToddPorter

no problem. I hope others do the same for me. This site is definitely the most useful and fastest responding site I use. Keep up the good work in maintaining it!



Fri Sep 4 17:09:12 2009: 6857   TonyLawrence

"Giving back" is part of the reason I started this.

I do work hard at this, trying to keep up with updates, keeping it fast and so on. It's a lot of work and I really can't come close to doing it all, so I really appreciate additions like this.



Wed Aug 29 14:17:30 2012: 11258   ToddPorter



I actually have some valuable details to add:

After MUCH research I have found that Seagate drives are not compatible with this controller. HP went so far as to tell me now that Seagate drives are not supported on the E200i. All the ML350 G5/G6 servers we sold were on an older firmware of 1.2-1.5 (somewhere around there, I don't remember exactly). I used the HP firmware CD to update the E200i firmware to 1.8 on 3 or 4 servers starting around a year ago, and none of them have had ANY issues with the Seagate drives. I am not sure if FW 1.8 fixed the issue or if it is just coincidence, but I would definitely update the firmware to the latest on any E200i controller using HP's boot CD!



Wed Aug 29 14:21:36 2012: 11259   TonyLawrence



Thanks! That may well save someone some grief in the future.



Wed Aug 29 15:23:52 2012: 11260   BigDumbDinosaur



After MUCH research I have found that Seagate drives are not compatible with this controller. HP went so far as to tell me now that Seagate drives are not supported on the E200i.

Rubbish! That's a cop-out by HP for the defective firmware in their host adapters, something that I have run into numerous times with their products. The Seagate ST373455SS SAS drive is 100 percent compliant with the ANSI SCSI specs, including all bus timing requirements. How typical of HP (and to a lesser extent, Dell) to blame the manufacturer of a peripheral for host adapter (and cable) issues!

SCSI of any kind is manufacturer-agnostic. The only way that the server would know that a Seagate drive is at the other end of the bus would be by reading the vendor fields during an inquiry (SCSI command 0x12). To quote from the Seagate programming manual:

5.1.1.3 Inquiry Command (12h)
The INQUIRY command requests that information regarding parameters of the drive be sent to the initiator. An option Enable Vital Product Data (EVPD) allows the initiator to request additional information about the drive. See paragraph 5.1.1.3.1. Several Inquiry commands may be sent to request the vital product data pages instead of the standard data shown in Table 5.1.1-8.


I'm suspecting that the H-P host adapter's firmware is enabling EVPD and looking for something specific to H-P-manufactured drives during device enumeration (the process when the host adapter lists SCSI devices), and not seeing it, somehow marking the drive as either not present or not fully ANSI compliant. Most drive manufacturers implicitly advise against enabling EVPD until after the drive has been spun up. Here are Seagate's remarks on this matter:

To minimize delays after a reset or power-up condition, the standard INQUIRY data is available without incurring any media access delays. Since the drive stores some of the INQUIRY data on the device media it may return zeros or ASCII spaces (20h) in those fields until the data is available.

Seagate, especially, is very careful about following the SCSI protocol and has been 100 percent standards-compliant since 1986, when X3.131 was ratified. I have never known any SCSI device from Seagate or other vendors that we regularly use (e.g., Tandberg) to have these kinds of issues with non-HP hardware. Inevitably, it's either a host adapter issue, a buggy driver (again, quite possible with HP, who'd rather you run HP-UX) or a defective bus cable (the latter usually due to a faulty active terminator in parallel bus systems, so not applicable to SAS). I'd bet that if you used a third-party host adapter (e.g., an Adaptec product), this problem would go away.
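
For anyone who wants to see what the INQUIRY exchange discussed above looks like at the byte level, here is a rough sketch. It only builds the 6-byte command block and decodes a standard 36-byte reply; it does not talk to a controller or a drive, since sending the command needs an OS-specific pass-through interface that is not covered here. The function names are made up for illustration, and the field offsets (vendor in bytes 8-15, product in bytes 16-31, revision in bytes 32-35) are the ones defined for standard INQUIRY data in the SCSI specs.

```python
import struct

INQUIRY_OPCODE = 0x12  # "Inquiry Command (12h)" in the manual excerpt


def build_inquiry_cdb(allocation_length=36, evpd=False, page_code=0):
    """Build a 6-byte INQUIRY command descriptor block.

    With evpd=False the target returns the standard INQUIRY data.
    With evpd=True it returns the vital product data page `page_code`.
    """
    if not evpd and page_code != 0:
        raise ValueError("page_code must be 0 when EVPD is not set")
    if not 0 <= allocation_length <= 0xFFFF:
        raise ValueError("allocation_length must fit in 16 bits")
    if not 0 <= page_code <= 0xFF:
        raise ValueError("page_code must fit in 8 bits")
    return struct.pack(
        ">BBBHB",
        INQUIRY_OPCODE,
        0x01 if evpd else 0x00,  # byte 1, bit 0 is the EVPD flag
        page_code,               # byte 2
        allocation_length,       # bytes 3-4, big-endian
        0x00,                    # byte 5, control
    )


def parse_standard_inquiry(data):
    """Decode the identification fields of standard INQUIRY data.

    Fields that a drive has not filled in yet (zeros or ASCII spaces,
    as the Seagate remark describes) come back as empty strings.
    """
    if len(data) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")

    def text(field):
        return bytes(field).decode("ascii", errors="replace").strip(" \x00")

    return {
        "device_type": data[0] & 0x1F,  # 0 means direct-access (disk)
        "vendor": text(data[8:16]),
        "product": text(data[16:32]),
        "revision": text(data[32:36]),
    }
```

A host adapter that only looks at the standard data sees nothing more vendor-specific than the vendor and product strings decoded here. Whether the E200i firmware goes further and checks vital product data pages is speculation in this thread, not something this sketch can confirm.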



Wed Aug 29 15:30:03 2012: 11261   BigDumbDinosaur



Also, here's the Seagate product manual for your disk.
(link)



Wed Aug 29 17:10:44 2012: 11262   ToddPorter



Let me rephrase that... HP has told me they are not compatible, which I agree is ridiculous, but what are you gonna do? I am very familiar with Unix/SCO now and have talked with a SCO engineer about this issue. I believe it IS HP's problem but I cannot reproduce the issue yet after a firmware update which of course is the first thing HP recommends, so I am at a standstill until the newer firmware allows the same issue to re-occur.



