Yesterday I found myself in a situation no one ever wants to be in: I needed to restore a file from a customer's backups, but could not because no backups existed.
How did we get to this abominable condition? Through a series of errors and bad practices. Are you making any of these mistakes?
First, the box being backed up is a Linux box. That's unimportant - the mistakes that followed have nothing to do with Linux. I chose Microlite Backup Edge as the backup software so that we could do a complete bare metal restore if necessary. That wasn't a mistake.
I configured the software to do an FTP backup to a Windows server that I don't administer. Of course I needed the cooperation of the Windows administrator to set up the FTP directory for the backups. No mistakes so far.
I configured the software to notify only in the event of failure. That's not my usual habit: ordinarily I would report both success and failure, by mail or printout. I do not remember why I set this for failure only; perhaps I was convinced that they did not need the "Success" notifications. I think that was a mistake, because the absence of an expected notification can signal a problem. If a notification of success had been expected, someone would have noticed when it stopped arriving.
On the other hand, I have had customers completely ignore failures because all they were looking for was a piece of paper and were not actually reading it, so it is possible this would not have helped.
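One way to avoid depending on anyone reading a report is to have a separate machine check how old the newest successful backup is. The following is only a sketch of that idea, not something Backup Edge provides: the function name, the five-day threshold (chosen because backups ran twice a week) and the way you obtain the last success time are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical helper: "last_success" would come from wherever you record
# completed backups (a log entry, a file timestamp on the FTP server, etc.).
def backup_is_stale(last_success, now, max_age=timedelta(days=5)):
    """Return True when the newest successful backup is older than max_age.

    A missing timestamp (None) counts as stale, so "never ran" is flagged too.
    """
    if last_success is None:
        return True
    return now - last_success > max_age
```

Run from cron on a different box, a check like this complains about silence as well as about explicit failures.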
I configured the failure notification to print to a networked LaserJet printer. I did test the printer, but I did not test the failure report specifically. That's probably out of habit: I subconsciously expected that the "test" would be the first backup's "Success" report, but that wasn't configured, so it was never tested. The failure notifications did print when they eventually occurred, but the printer was not configured to handle Unix LF line endings correctly, so only one line printed, and that of course was incomplete. Probably these looked like junk and were thrown away. Bad mistake.
Over the next 53 weeks, the system did 106 backups (two a week, which we had established as reasonable) and failed once. I know that from looking at the backup logs. Of course the single failure wasn't noticed, but that wasn't at all critical. No real issue here.
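If the printer cannot be reconfigured, the report can be filtered before it is sent. This is a minimal sketch under that assumption; the function name is made up, and how the filter is hooked into the print path depends on your spooler.

```python
def lf_to_crlf(report: bytes) -> bytes:
    """Convert bare LF line endings to CRLF, leaving existing CRLF pairs alone."""
    # Normalize first so an existing CRLF does not become CR CR LF.
    normalized = report.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")
```

Whichever fix is used, the real test is to force a failure report and read what comes out of the printer.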
Sometime in March of 2007 the company renumbered this part of its LAN, going from a 192.168 network to 172.30. Of course they did notify me of that, but none of us remembered that the backups were going to the old 192.168 address. From that day onward, the backups failed, but as the reports were not being seen, nobody noticed. The renumbering of course was not a mistake, but not remembering that the backup was hardcoded to an old IP certainly was.
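A renumbering checklist could include a scan of configured backup targets for literal addresses that no longer belong to the LAN. The sketch below uses Python's standard `ipaddress` module; the function name, the example addresses and the /16 prefix are assumptions for illustration, since all I have said here is that we moved from 192.168 to 172.30.

```python
import ipaddress

def target_outside_lan(target, lan_networks):
    """Return True if a literal IP target is not inside any current LAN network.

    Raises ValueError if target is a hostname rather than an IP address.
    """
    addr = ipaddress.ip_address(target)
    return not any(addr in ipaddress.ip_network(net) for net in lan_networks)

# Hypothetical values: an old-style target checked against the new network.
print(target_outside_lan("192.168.1.10", ["172.30.0.0/16"]))
```

Pointing the backup at a hostname instead of an address would also have helped, provided the name was updated when the network changed.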
Again, if the printer had been producing a readable report, this might have been noticed. It wasn't. Week after week, backups failed. I may even have logged onto this system for minor changes during this period, but I never checked the backup logs. Another mistake. Technically, I was not responsible for checking, but I easily could have. I did not, until yesterday, when I needed a file. At that time I asked the Windows admin for an early tape because I had forgotten that we did this by FTP. He remembered that I was backing up over the network and reminded me. It was then that I looked at the logs and realized we had a problem.
However, we still had backups on the Windows server, right? These actually would have been fine for what I needed as the file I wanted hadn't changed in years. Unfortunately, sometime in August the Windows admin was looking for some extra space and came across the FTP directory that had not been used in three months. He conferred with a person at the company who has some responsibility in this area, but neither recognized that this was our Linux backup so they removed everything to gain space. Big mistake, but perhaps not too bad, because the Windows server itself also gets backed up to tape.
Unfortunately, the tapes are retained for only thirty days. That's yet another mistake: a good data retention policy keeps monthly and yearly media. Ideally the yearly media would go back a number of years, for forensic purposes if nothing else. So this was the killing mistake: we had no backups whatsoever.
So that's it: a series of small mistakes that led to a disaster. Well, not a complete disaster: I can manually fix whatever problem there is in this file, but it would have been quicker and easier to just restore it.
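A retention rule of that kind can be written down precisely. The sketch below keeps everything from the last 30 days, the first backup of each month for the past year, and the first backup of every year. The function name and the 30-day and 12-month windows are assumptions; adjust them to your own policy.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, today, daily_days=30, monthly_months=12):
    """Return the sorted backup dates that a simple tiered policy retains."""
    keep = set()
    first_of_month = {}
    first_of_year = {}
    for d in sorted(backup_dates):
        # Tier 1: everything recent.
        if today - d <= timedelta(days=daily_days):
            keep.add(d)
        # Remember the earliest backup seen in each month and each year.
        first_of_month.setdefault((d.year, d.month), d)
        first_of_year.setdefault(d.year, d)
    # Tier 2: one backup per month for the last monthly_months months.
    for (year, month), d in first_of_month.items():
        months_old = (today.year - year) * 12 + (today.month - month)
        if months_old <= monthly_months:
            keep.add(d)
    # Tier 3: one backup per year, kept indefinitely.
    keep.update(first_of_year.values())
    return sorted(keep)
```

Anything not in the returned list is a candidate for reuse or deletion. With a rule like this in place, a backup from before the March renumbering would still have existed months later.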
If you have not completely reviewed your backup policies and strategies recently, that is something that needs to be done, probably at least once a year. That's the final mistake: if the policies had been reviewed and backups checked yearly, this might have been less damaging.
More Articles by Anthony Lawrence - Find me on Google+
Tue Jan 29 16:26:17 2008: rbailin
Note that Microlite BackupEdge also allows you to send out success/failure notices via email as well as to a printer. Normally it just sends it to root on the local machine, but it's prudent to add your own email to the list.
Tue Jan 29 16:30:11 2008: rbailin
On an unrelated note, when I click on my username to edit a comment, I'm brought to a page listing all my comments. When I click on the one I wish to edit, I end up with only a blank page. This behavior happens in both IE7 and Firefox2.0. Whyizzit?
--Bob
Tue Jan 29 16:50:54 2008: TonyLawrence
My fault. Editing comments is not working well right now.
I couldn't send email because that machine wasn't allowed to send outbound mail :-)