APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Tips on Hard Drive Problems

Back in the 1980s, I could count on earning a few hundred dollars every month from hard disk failures and related problems (slow performance, lost files and so on). That may be a slight exaggeration, but crashing disk drives and performance complaints were much more common then than they are now.

Today, I can't even remember the last time I had to replace a disk drive; it may have been several years ago. Disks are now rated with an MTBF (Mean Time Between Failures) of many years. Good engineering and many years of experience to draw from have made drive failure a very rare event.

Performance has also become much less of an issue - today's systems and the software that drives them are usually not performance bound. There are always exceptions, of course, and specialized applications may benefit from specific tuning, but for most of us, "out of the box" is just fine.

However, problems do still happen.

Grinding noise in computer

Are you hearing noises from your computer? It is probably NOT the disk; more often, the culprit is a worn cooling fan bearing. If you are a little technically savvy, you can pinpoint the source of odd noises by temporarily unplugging the drive or the fans (after powering everything off, of course, and perhaps even unplugging from the wall for extra safety).

Do you need to replace a noisy fan? You probably should, because overheating computers can run more slowly and heat can cause component failures. On the other hand, I have seen very noisy fans run on for years and years.

System crashes frequently

Try shutting the system off completely. If it is not a laptop, also unplug it from the wall. Now go get a cup of hot coffee or tea. You don't have to drink it; just wait until it has cooled off, then plug everything back in and try again. The point is to let the system cool off completely. Don't rush it.

If this works, your problem may be heat. That could be from a failed fan, some airflow obstruction or a buildup of dust. Unless you can correct the cause, it's just going to come back.
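If the machine runs Linux, you may not have to guess about heat: the kernel usually reports its temperature sensors under /sys/class/thermal. The following is a minimal Python sketch, assuming a Linux system that exposes thermal_zone*/temp files in millidegrees Celsius (not every machine does, and disk temperatures are normally reported through S.M.A.R.T. instead). The 80 C warning threshold is a hypothetical value chosen for illustration, not a manufacturer limit.

```python
import glob
import os

# Hypothetical warning threshold in degrees Celsius, for illustration only.
WARN_CELSIUS = 80.0


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs reading such as "45000" (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Return {"thermal_zoneN (type)": degrees_C} for each readable zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                celsius = millidegrees_to_celsius(f.read())
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or not reporting a number
        temps[f"{os.path.basename(zone)} ({kind})"] = celsius
    return temps


if __name__ == "__main__":
    zones = read_thermal_zones()
    if not zones:
        print("No thermal zones found (not Linux, or no sensors exposed).")
    for label, celsius in zones.items():
        flag = "  <-- hot" if celsius >= WARN_CELSIUS else ""
        print(f"{label}: {celsius:.1f} C{flag}")
```

Running it before shutting down and again after the cool-down gives you a rough before-and-after comparison.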

I had a customer lose three servers one very hot summer weekend because she shut off the air conditioning to save money. The servers were in a tiny room with the door closed, and the offices were on the top floor of a building with a flat asphalt roof. The sun beat down and the temperature rose and rose.

For a very temporary fix, you can try setting up a floor fan to blow directly onto the computer, with or without its covers open. This can sometimes keep a system running while you plan a more permanent solution.

For laptops, you can buy cooling stands. I use an Antec cooler, but there are many others to choose from.

Disk won't boot

Try the full power-off and cool-down for this as well. If that doesn't work, check your computer's BIOS to determine whether it is "seeing" the drive. Sometimes simply resetting the BIOS to defaults can fix problems like this, but of course that can also lose other necessary configuration changes.

If the drive will boot after cooling down, with luck it can stay running long enough for you to back up your data. In the old days, I even put hard drives in a refrigerator for a few hours to cool them! Cranking up the A/C and pointing external fans at the machine might buy you a few critical minutes.

I'd check the internal cabling and power connections before resetting the BIOS. This is much harder to do on a laptop, but not impossible if you have a little mechanical skill. If you are comfortable working with the covers off and have enough knowledge and experience to feel safe, you may be able to feel whether the hard drive has powered on (a spinning drive vibrates slightly). You can also judge whether it feels unusually hot; extreme heat can indicate a failed drive.

Cables

It's possible that the power connector or data cable is bad. You can try swapping in a power connector from something you know has power (like your CD or DVD drive). Data cables are cheap and easily available. Watch out for bent pins!

For IDE drives (most common in desktop and home computers), the master/slave jumper setting is important. If disconnecting everything else on the same cable makes the drive work, you probably have a master/slave conflict.

SATA drives have no master/slave settings, since each drive has its own cable, though you may need to update your computer's BIOS if it can't see the installed drives.

For SCSI drives, the interrupt assigned to the controller might change if you do something as simple as adding a new internal device or switching from an older PS/2 mouse to a USB mouse. If the operating system's driver doesn't expect the SCSI controller's interrupt to change, the system can fail to boot. SCSI ID conflicts can also cause failures, as can improper SCSI termination, whether too much or too little! Termination issues are more apt to cause flaky behavior than complete failure, but you do need to understand termination if you have SCSI drives.

Controller failure

A controller failure can look just like a drive failure. If you have another machine with the same configuration, moving the drive into that machine is often a quick way to determine the real cause.

RAID failure

RAID failures can be simple (replace the failed drive and rebuild the RAID) or difficult (a failing controller has scrambled the RAID).

If a drive needs replacing, the rebuild may be automatic or may need to be initiated by you. Because RAID can be implemented in hardware or software, your procedure will vary; with Linux software RAID, for example, the replacement is added to the array with mdadm and the kernel then resyncs it.

Performance

It's rare to do much performance tuning today. The only task most people do at all is defragmentation, which is usually unnecessary.


If performance is an issue, more RAM (for the buffer cache) is an easy solution. RAID configurations that spread reads and writes across several drives are another way to increase drive throughput.
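On Linux you can see how much memory the kernel is already using to cache disk data by reading /proc/meminfo, where the Buffers and Cached fields (reported in kB) together roughly correspond to the buffer cache mentioned above. This Python sketch assumes Linux, and treating Buffers plus Cached as "the cache" is a simplification. If that figure stays small because programs are using most of the memory, more RAM may well help.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style lines ("Cached:  4567890 kB") into {field: int}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # most fields are in kB
    return values


def cache_summary(info: dict) -> tuple:
    """Return (total_kb, cache_kb, fraction), counting Buffers + Cached as cache."""
    total = info["MemTotal"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return total, cache, cache / total


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            total, cache, fraction = cache_summary(parse_meminfo(f.read()))
        print(f"{cache} kB of {total} kB ({fraction:.0%}) is in use as disk cache")
    else:
        print("No /proc/meminfo: this sketch only covers Linux.")
```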

Monitoring drive health

One reason hard drives last longer than they used to is that modern drives can remap failing sectors to spare sectors. This usually happens automatically, without your knowledge.

Many of today's hard drives support S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). This can be very helpful in warning you that a disk may be nearing failure.

If your operating system doesn't already display this S.M.A.R.T. information for you, you can find software that will; on Unix and Linux, smartctl from the smartmontools package is a common choice.
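With smartmontools installed, smartctl -A followed by a device name prints the drive's attribute table. The Python sketch below runs it and flags attributes often treated as warning signs: anything other than "-" in the WHEN_FAILED column, a normalized value at or below its threshold, or a non-zero raw count of reallocated (ID 5), pending (197) or offline-uncorrectable (198) sectors. It assumes an ATA/SATA drive (NVMe drives report health in a different format) and root privileges, and /dev/sda is only a placeholder device name. The list of watched attributes is a common rule of thumb, not a vendor specification.

```python
import re
import subprocess

# Attributes whose non-zero raw counts are commonly treated as early warnings.
WATCH_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector",
             198: "Offline_Uncorrectable"}


def parse_smart_attributes(text: str) -> list:
    """Parse the ATA attribute table printed by `smartctl -A`."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            value, worst, thresh = int(fields[3]), int(fields[4]), int(fields[5])
        except ValueError:
            continue
        raw = re.match(r"\d+", " ".join(fields[9:]))  # "36 (Min/Max 20/48)" -> 36
        attrs.append({"id": int(fields[0]), "name": fields[1], "value": value,
                      "worst": worst, "thresh": thresh, "when_failed": fields[8],
                      "raw": int(raw.group()) if raw else None})
    return attrs


def smart_warnings(attrs: list) -> list:
    """Return human-readable warnings for suspicious attributes."""
    found = []
    for a in attrs:
        if a["when_failed"] != "-":
            found.append(f"{a['name']}: WHEN_FAILED={a['when_failed']}")
        elif a["thresh"] > 0 and a["value"] <= a["thresh"]:
            found.append(f"{a['name']}: value {a['value']} <= threshold {a['thresh']}")
        elif a["id"] in WATCH_IDS and a["raw"]:
            found.append(f"{a['name']}: raw count {a['raw']}")
    return found


if __name__ == "__main__":
    device = "/dev/sda"  # placeholder: adjust for your system
    try:
        # smartctl's exit status is a bit mask, so a non-zero status is not fatal here.
        output = subprocess.run(["smartctl", "-A", device],
                                capture_output=True, text=True).stdout
    except FileNotFoundError:
        output = ""
        print("smartctl not found; install smartmontools.")
    attrs = parse_smart_attributes(output)
    if not attrs:
        print(f"No attribute table read from {device}.")
    else:
        for warning in smart_warnings(attrs) or ["no warnings"]:
            print(warning)
```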

Recovering data

If the drive simply won't boot, it could be that only the boot sectors are damaged or missing. Adding the drive as a secondary disk in a working machine might let you access it to copy important data. Remember master/slave or SCSI ID settings!
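Once the old drive is mounted in the working machine, it helps to copy what you can without letting one unreadable file stop the whole job. The Python sketch below walks a source directory and copies each file with shutil.copy2, recording failures instead of aborting. It assumes the old drive's filesystem is already mounted; the script name salvage.py and the paths in the usage line are hypothetical.

```python
import os
import shutil
import sys


def salvage_copy(src_root: str, dst_root: str) -> tuple:
    """Copy every readable file under src_root into dst_root.

    Returns (copied, failed) as paths relative to src_root. Files or
    directories that cannot be read are recorded in `failed` and the walk
    continues.
    """
    copied, failed = [], []

    def record_dir_error(err: OSError) -> None:
        # Called by os.walk when a directory cannot be listed.
        failed.append(os.path.relpath(err.filename, src_root) if err.filename else str(err))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=record_dir_error):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                # copy2 also preserves timestamps where possible.
                shutil.copy2(os.path.join(dirpath, name), os.path.join(target_dir, name))
                copied.append(rel_path)
            except OSError:
                failed.append(rel_path)  # read error, permissions, broken link...
    return copied, failed


if __name__ == "__main__":
    if len(sys.argv) == 3:
        ok, bad = salvage_copy(sys.argv[1], sys.argv[2])
        print(f"copied {len(ok)} files, {len(bad)} failures")
        for path in bad:
            print("FAILED:", path)
    else:
        print("usage: salvage.py SOURCE_DIR DEST_DIR")
```

For example, python3 salvage.py /mnt/olddisk /home/me/rescue would copy everything readable from the old disk and list whatever it could not read, so you know which files to restore from backup.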

There are tools like SpinRite and Tom's Root and Boot disk that you can use to diagnose and fix common issues. Others include Parted Magic and RIP (Recovery Is Possible).

If all else fails, data recovery firms can often do amazing work recovering most or all of your data. If you are using Linux or Windows, be sure the firm you choose has experience with those filesystems!

Hard drive problems are rare today. Often you will have replaced your computer long before the hard drive causes you any problem at all. Just keep these tips in mind in case it ever does happen to you.

The best protection for your data is a good backup strategy. Don't neglect it!




© Anthony Lawrence







Mon Feb 28 16:22:24 2011: 9338   BigDumbDinosaur



For IDE drives (most common in desktop and home computers) ...

Not anymore. We haven't shipped anything with an IDE drive since late 2004, and I haven't seen anything come through our shop in the last several years with an IDE drive. Everything in desktop machines is SATA. Increasingly, low-end servers are also shipping with SATA drives, which is a cost-cutting measure that produces lame performance in some cases. We still continue to build all our servers with SCSI hardware. If a client wants a cheap server with SATA drives, we suggest they look at H-P or (ugh!) Dell.

For SCSI drives, the interrupt assigned to the controller might change if you did something as simple as adding a new internal device or switching to a USB mouse from an older PS/2 style. If the operating system driver doesn't know that the SCSI disk controller can change its interrupt, it can fail to boot.

In my experience, this has yet to be an issue with Linux. I can move a host adapter from one slot to another and Linux always finds it. OpenServer, however, has occasionally had trouble with shifting host adapter IRQs. It's a consistent problem with network cards in OSR5 (dunno about OSR6). Of course, OSR5's support for USB hardware was spotty at best.





Mon Feb 28 16:26:07 2011: 9339   TonyLawrence



Yes, new machines have been shipping with SATA for a while, but IDE is still found, mostly in home computers.



Tue Mar 1 13:52:26 2011: 9341   joe



Tony! I remember how much money we could make in those years.

Today everything is plug-and-play boxes that need no expertise to replace.

I still remember those "sweat-money" computer years; I'm 52 years old.

Sorry for my poor English.

joe



Tue Mar 1 21:26:23 2011: 9344   AndrewSmallshaw



It appears to me there's a lot of misinformed rubbish floating about concerning cooling. It seems mostly to emanate from gamers, with an implicit assumption that more fans are better and that more impressive-looking heatsinks are better. The result is cases with half a dozen or more fans all working against each other, and heatsinks with shiny chrome finishes: it is literally high school physics that dark matte finishes make far better radiators.

With regard to external fans and removal of covers, it has always struck me as dubious. Directing a room fan over a computer strikes me as a good way of disrupting the airflow into and out of the case, which personally I'd rather avoid. Running without covers is similar: a correctly designed system of airflow will circulate air throughout an enclosed case. Take the cover off and you lose that: you no longer have forced-air cooling.

On the other hand, was it you that came to computers via heat exchangers? I can't remember now - it was definitely someone from c.u.s.m. It may not be a direct fit but there's some relevant real expertise there.



Tue Mar 1 21:58:59 2011: 9345   TonyLawrence



I'm not referring to normal situations. Removing covers to use external fans is an emergency response to overheating that is occurring because of some other issue - failed fans, failing hard drive.

It works. In the short run :)






Wed Mar 2 16:03:15 2011: 9348   BigDumbDinosaur



Tony, didn't you sell steam generators at one time?

Regarding forced-air cooling of hard disks, all manufacturers of 10,000 RPM and 15,000 RPM SCSI drives recommend forced-air cooling. These drives tend to get quite hot without it and, as any student of high capacity disk storage surely must know, thermal expansion of the platters can get to a point where the drive's recalibration function will not be able to accommodate the changes, causing a plethora of errors.

I have here an application note from Fujitsu on their 15,000 RPM SCSI drives, and in it is a recommended airflow table that is based on the ambient temperature inside the server enclosure (these aren't drives most folks would install in desktop PCs, BTW). It's clear from reading this app note that Fujitsu really doesn't expect convection cooling to be adequate, which supposition has been borne out by my own experience in building servers. So Tony's idea of directing a fan into a server's interior to mitigate thermal problems isn't "wrong." Any airflow is better than none.

Running without covers is similar: a correctly designed system of airflow will circulate air throughout an enclosed case. Take the cover off and you lose that: you no longer have forced-air cooling.

Most systems aren't really well designed from an air movement perspective. The general approach is to put a big enough fan (or fans) in the rear bulkhead, maybe a fan or two in the drive bays, and let 'er rip. True, removing the cover negates the exhausting capability of the rear bulkhead fan(s), but if another fan is blowing into the enclosure, you're still going to get cooling. All of the hardware in the server -- even the MPU cooler -- sheds unwanted heat via convection. Air movement merely hastens the process and prevents stagnant pockets of hot air from developing.
