APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Backup mistakes you might be making

Yesterday I found myself in a situation no one ever wants to be in: I needed to restore a file from a customer's backups, but could not because no backups existed.

How did we get to this abominable condition? Through a series of errors and bad practices. Are you making any of these mistakes?

First, the box being backed up is a Linux box. That's unimportant - the mistakes that followed have nothing to do with Linux. I chose Microlite Backup Edge as the backup software so that we could do a complete bare metal restore if necessary. That wasn't a mistake.

I configured the software to do an FTP backup to a Windows server that I don't administer. Of course I needed the cooperation of the Windows administrator to set up the FTP directory for the backups. No mistakes so far.

I configured the software to notify only in the event of failure. That's not my usual habit: ordinarily I would notify by mail or printout on either success or failure. I do not remember why I set this for failure only; perhaps I was convinced that they did not need the "Success" notifications. I think that is a mistake, because the absence of an expected notification could signal a problem. If notification of success had been expected, someone would have noticed the subsequent failures.

On the other hand, I have had customers completely ignore failures because all they were looking for was a piece of paper and were not actually reading it, so it is possible this would not have helped.
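A check that does not depend on anyone reading a report is one way around both problems. Here is a minimal sketch in Python, assuming the backup job touches a marker file each time it succeeds. The marker file, its path, and the function name `backup_is_stale` are my own inventions for illustration, not anything BackupEdge provides.

```python
import os
import time

def backup_is_stale(marker_path, max_age_days, now=None):
    """Return True if the success marker is missing or older than max_age_days.

    marker_path is a hypothetical file that the backup job is assumed to
    touch after every successful run.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(marker_path)
    except OSError:
        # No marker at all is treated the same as a very old one.
        return True
    return (now - mtime) > max_age_days * 86400
```

With two backups a week, a threshold of five days or so would flag a missed backup within the week. Whatever runs this check should complain through a different channel than the backup report itself.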

I configured the failure notification to print to a networked LaserJet printer. I did test the printer, but I did not test the failure report specifically. That's probably out of habit - I subconsciously expected that the "test" would be from the first backup's "Success" report, but that wasn't configured, so it was never tested. The failure notifications did print when they eventually occurred, but the printer was not configured to handle Unix LF's correctly, so only one line printed and that of course was incomplete. Probably these looked like junk and were thrown away. Bad mistake.
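If the printer itself cannot be reconfigured, a small filter that converts line endings before the report is sent is one possible workaround. This is only a sketch of the conversion step; how the result gets to the printer depends on the print setup, and the function name `lf_to_crlf` is mine.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF without doubling existing CRLFs."""
    # Normalize any CRLF to LF first, then expand every LF to CRLF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Normalizing first means a report that already contains some CRLF endings does not end up with stray extra carriage returns.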

Over the next 53 weeks, the system did 106 backups (two a week, which we had established as reasonable) and failed once. I know that from looking at backup logs. Of course the single failure wasn't noticed, but that wasn't at all critical. No real issue here.

Sometime in March of 2007 the company renumbered this part of its LAN, going from a 192.168 network to 172.30. Of course they did notify me of that, but none of us remembered that the backups were going to the old 192.168 address. From that day onward, the backups failed, but as the reports were not being seen, nobody noticed. The renumbering of course was not a mistake, but not remembering that the backup was hardcoded to an old IP certainly was.
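One way to catch this kind of thing before a renumbering is to audit configured backup targets for literal IP addresses. The sketch below only tests whether a host string is a literal address; where the host value comes from is up to you, and the addresses in the usage comment are made-up examples, not the customer's real ones.

```python
import ipaddress

def is_literal_ip(host: str) -> bool:
    """Return True if host is a literal IPv4 or IPv6 address, not a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

# Hypothetical usage:
#   is_literal_ip("192.168.1.10")       is True, so it will break on renumbering
#   is_literal_ip("backup.example.com") is False, so it follows DNS
```

A name in DNS or the hosts file still has to be updated when the network changes, but at least there is one obvious place to do it.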

Again, if the printer had been producing a readable report, this might have been noticed. It wasn't. Week after week, backups failed. I may have even logged onto this system for minor changes during this period, but I never checked the backup logs. Another mistake. Technically, I was not responsible for checking, but I easily could have. I did not, until yesterday, when I needed a file. At that time I asked the Windows admin for an early tape because I had forgotten that we did this by FTP. He remembered that I was backing up over the network and reminded me. It was then that I looked at the logs and realized we had a problem.

However, we still had backups on the Windows server, right? These actually would have been fine for what I needed as the file I wanted hadn't changed in years. Unfortunately, sometime in August the Windows admin was looking for some extra space and came across the FTP directory that had not been used in three months. He conferred with a person at the company who has some responsibility in this area, but neither recognized that this was our Linux backup so they removed everything to gain space. Big mistake, but perhaps not too bad, because the Windows server itself also gets backed up to tape.

Unfortunately, the tapes are retained for only thirty days. That's yet another mistake: a good data retention policy keeps monthly and yearly media. Ideally the yearly media would go back a number of years, for forensic purposes if nothing else. So this was the killing mistake: we had no backups whatsoever.
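To make "monthly and yearly media" concrete, here is a rough sketch of a tiered retention rule: keep everything recent, plus the newest backup from each of the last several months and years that have one. The function name and the default counts (30 days, 12 months, 7 years) are assumptions for illustration; pick numbers that suit the business.

```python
from datetime import date, timedelta

def backups_to_keep(dates, today, daily_days=30, monthly=12, yearly=7):
    """Return the set of backup dates to retain under a simple tiered policy."""
    keep = set()
    newest_first = sorted(dates, reverse=True)

    # Tier 1: every backup taken within the last daily_days days.
    cutoff = today - timedelta(days=daily_days)
    keep.update(d for d in newest_first if d >= cutoff)

    # Tier 2: the newest backup in each of the most recent `monthly`
    # months that contain at least one backup.
    months = {}
    for d in newest_first:
        months.setdefault((d.year, d.month), d)
    keep.update(list(months.values())[:monthly])

    # Tier 3: the newest backup in each of the most recent `yearly`
    # years that contain at least one backup.
    years = {}
    for d in newest_first:
        years.setdefault(d.year, d)
    keep.update(list(years.values())[:yearly])

    return keep
```

Anything not in the returned set is a candidate for reuse or deletion. Under a rule like this, a thirty-day tape rotation would still have left older monthly and yearly copies to fall back on.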

So that's it: a series of small mistakes that led to a disaster. Well, not a complete disaster: I can manually fix whatever problem there is in this file, but it would have been quicker and easier to just restore it.

If you have not completely reviewed your backup policies and strategies recently, that is something that needs to be done, probably at least once a year. That's the final mistake: if the policies had been reviewed and backups checked yearly, this might have been less damaging.


© Anthony Lawrence

Tue Jan 29 16:26:17 2008: 3547   rbailin

Note that Microlite BackupEdge also allows you to send out success/failure notices via email as well as to a printer. Normally it just sends it to root on the local machine, but it's prudent to add your own email to the list.

Tue Jan 29 16:30:11 2008: 3548   rbailin

On an unrelated note, when I click on my username to edit a comment, I'm brought to a page listing all my comments. When I click on the one I wish to edit, I end up with only a blank page. This behavior happens in both IE7 and Firefox2.0. Whyizzit?


Tue Jan 29 16:50:54 2008: 3549   TonyLawrence

My fault.. editing comments is not working well right now..

I couldn't send email because that machine wasn't allowed to send outbound mail :-)
