From: Mike Brown <mike@tkg.ca>
Subject: Update: Compaq ML530 G2 redundant power supply
Date: Thu, 09 Jan 2003 22:08:13 GMT

Compaq (oops, the new HP) tech support RMA'd the server, and a new ML530 was shipped in. It has the identical problem. The problem was escalated to engineering and determined to be a bug, reportedly in the UNIX kernel's handling of IRQs, which the EFS5.54 cpqcasm daemon was interfacing with. Since I don't get to play with the source, I will have to take their word for it.
So anyone with an ML530 or ML570, with any combination of hardware or OSR5 release, running EFS5.50 or later may want to prevent the CASM daemon from starting up. The trade-off is that there will be no error notification on a hardware failure. Most of the issue will be resolved with EFS5.56, for which there is no ETA.

Below is a cut and paste of the previous thread:
_____________________________________________________________

>
> Bela Lubkin wrote:
> >
> > Mike Brown wrote:
> >
> > > If anyone else has this combination would
> > > you test this problem for me.
> > >
> > > Compaq ML530 with redundant power supply.
> > > Xeon 2.4GHz, 1GB ram, 5302 Raid Controller.
> > > 6 x 18GB 15K hard drives.
> > >
> > > SCO 5.0.6a, MSTPP, most OSS patches
> > > and EFS5.52 or EFS5.50
> > >
> > > Dual network cards on different subnets.
> > >
> > > The system is up and running, and has passed
> > > all normal break in tests, and it is
> > > impressively fast.
> > >
> > > The test it failed was a large FTP transfer
> > > when a power cord was pulled from one of the
> > > power supplies. I normally load a system and
> > > do things like test UPS shutdown, HD failure,
> > > network card disconnect ...
> > >
> > > With nothing else running as a load, a FTP
> > > get was started on a 500MB file. A few seconds
> > > in the power cord was pulled, and this message
> > > displayed on the graphic screen:
> > >
> > > WARNING: fpexterrflt - No process owns floating point unit
> > >
> > > and the FTP session drops back to a shell
> > > prompt without completing the transfer.
> > >
> > > The system experiences no problems with the EFS's
> > > removed, but fails this test with 5.52 or 5.50
> > > loaded.
> >
> > Boy, you're good at digging up the weird ones. ;-}
> >
> > Have you done this test more than once? The message you show means that
> > a floating point exception was received for a process, but by the time
> > it was sufficiently digested to know what process to send it to, that
> > process had already exited. It should take a very unusual timing
> > sequence for that to happen.
> >
> > I'm thinking that if you run this test again, you'll find that the
> > symptom of the FTP session dying is repeatable, but the kernel warning
> > message is not (or, at least, doesn't come every time). Thus, the
> > warning message is a secondary symptom; the FTP process dying is a
> > primary symptom.
> >
> > I haven't looked at the details of a Compaq EFS since something like ODT
> > 2.0, so I can't give specific advice. But it should consist of kernel
> > portions and user daemons. Can you boot the kernel with the Compaq
> > stuff in it, yet avoid starting the daemons? Put "exit 0" at the top of
> > its rc script(s) or something. Then see if the primary symptom is
> > repeatable. The answer to this tells us whether it's a kernel or daemon
> > problem.
> >
> > Then you could try starting limited subsets of the daemons. They
> > probably depend on each other so you'll either have to read the doc or
> > experiment. See if you can narrow it down to a single daemon whose
> > presence seems to cause the problem.
> >
> > Report back. And bring it up with Compaq as well.
> >
> > >Bela<
>
> Some days I feel like 'WEIRD ONES `R US'.
>
> The test is very repeatable, but since this is a new platform I do
> not have multiple boxes to play on, yet. In 1999 Compaq had a
> similar problem with the then current EFS and the ProLiant 1600.
> File copies were aborted when the power cord was momentarily
> disconnected. The problem went away with a new EFS release,
> but the details and solution have never been posted. And the 1600
> had no detectable problems with the EFS removed.
>
> A case has been opened with Compaq, and has been sent to escalations,
> but I was curious if anyone else is seeing this problem.
>
> There are two daemons running, casmd and cevtd, which at run time
> do not die with a kill -9 command. I will prevent them from starting
> and perform the same tests tomorrow.
>
> The floating point error does not show up on a text screen, only on
> the X window and in syslog. Another quirk is that the X window session
> may be killed and the screen set back to the scologin.
>
> Mike
>
> --
> Michael Brown
>
> The Kingsway Group

The problem is not related to any particular combination of patches or drivers, but to the IRQ that the BIOS assigns to the Compaq Advanced Server Management (ASM) hardware. On the particular server under test, assigning IRQ 5 or 15 to the ASM hardware produces the problem described above, but everything works perfectly with the IRQ set to 10 or 11. Compaq tech support wants me to replace the system board and retest, since Compaq's lab unit does not have the same issue.

I can move the NIC, SCSI controller or RAID IRQs to any available value without a problem, so this may be a subtle timing or race condition rather than an outright hard failure. Maybe if I orient the backplane to magnetic north ...

Mike
_____________________________________________________________

Mike
--
Michael Brown
The Kingsway Group
Copyright 1997-2004 (various authors). All Rights Reserved.