I get this question frequently. It's usually triggered either because the tape device can't hold an entire backup set or because the time required for backup interferes with productive work. Most of the time this can be easily remedied by a larger or faster storage device, but someone is bound to bring up the idea of differential backups.
The idea is that you create a full backup that has everything, and from then on, you back up only the files that have changed. Presumably that's a smaller set of files and therefore this solves the space or time problems. Usually the full backup is refreshed on some schedule and the process starts again. There are variants on the theme; for example, the differential may include all files that have changed since the last full backup rather than just those that have changed since the last differential. That sort of scheme eventually ends up with the differential containing any and all files that ever change, no matter how infrequently; the full backup is the source of everything else.
Often the term "Incremental" is used to describe what I call true differential. I'll use that term for the rest of this article. Remember that a Differential will always have everything that has changed since the last complete backup; an Incremental will only have files that have changed since the previous Incremental backup. Right after the full backup, an Incremental and a Differential would be exactly the same; after that they will probably contain different files. An Incremental CAN be smaller than a Differential but could never be larger.
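A minimal sketch of the distinction, assuming files are selected purely by modification time. The file names, the timestamps, and the helper select_for_backup are hypothetical and only illustrate the two definitions; real backup programs may use other criteria.

```python
from typing import Dict, List


def select_for_backup(mtimes: Dict[str, float], since: float) -> List[str]:
    """Return the paths modified strictly after the reference time `since`."""
    return sorted(path for path, mtime in mtimes.items() if mtime > since)


# Hypothetical timeline: full backup at t=100, most recent Incremental at t=200.
last_full = 100.0
last_incremental = 200.0

# Hypothetical modification times for four files.
mtimes = {
    "ledger.db": 250.0,   # changed after the most recent Incremental
    "orders.db": 220.0,   # changed after the most recent Incremental
    "prices.dat": 150.0,  # changed after the full, before the last Incremental
    "letter.txt": 50.0,   # unchanged since the full backup
}

# A Differential compares against the last FULL backup.
differential = select_for_backup(mtimes, last_full)
# An Incremental compares against the previous backup of any kind.
incremental = select_for_backup(mtimes, last_incremental)

print("Differential:", differential)
print("Incremental: ", incremental)
```

Here prices.dat lands in the Differential but not in the Incremental, because it was already captured by an earlier Incremental. The Incremental is always a subset of the Differential taken at the same moment.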
Differential or Incremental backups always seem like a great idea to people who haven't experienced the negative aspects. Admittedly, there can be circumstances where you have no other choice, but consider these points:
With differentials, it's more difficult. You need to keep a master off site, and if you are doing Incrementals (not Differentials), you need to keep ALL of those off site. That makes it inconvenient if you need occasional access to the tapes on site, and it may also mean that you need to make TWO full backups each time you reach that point in your cycle, which is very time consuming and can use a lot of media.
More sophisticated backup programs can avoid this by deleting files that are not present on the next tape - however that depends on the integrity of the set and simple backups like tar or cpio cannot do this at all.
Worse news: damaged or lost media in the middle of an Incremental set like this can mean disaster. If a file happens to exist on only one piece of media because it is modified infrequently, the modifications may be lost forever.
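To make the risk concrete, here is a small simulation, offered as a sketch under stated assumptions rather than a model of any real backup program. Each tape is represented as a hypothetical mapping of file name to version number, and a restore simply replays the tapes oldest to newest.

```python
from typing import Dict, List, Optional

Tape = Dict[str, int]  # file name -> version number stored on that tape


def restore(tapes: List[Optional[Tape]]) -> Dict[str, int]:
    """Replay tapes oldest to newest; None stands for a lost or unreadable tape."""
    restored: Dict[str, int] = {}
    for tape in tapes:
        if tape is not None:
            restored.update(tape)  # newer copies overwrite older ones
    return restored


# Hypothetical full backup followed by three nightly Incrementals.
full = {"ledger.db": 1, "prices.dat": 1, "letter.txt": 1}
inc_mon = {"ledger.db": 2}
inc_tue = {"ledger.db": 3, "prices.dat": 2}  # the only copy of prices.dat v2
inc_wed = {"ledger.db": 4}

# The Differential taken Wednesday holds everything changed since the full.
diff_wed = {"ledger.db": 4, "prices.dat": 2}

# Tuesday's tape is lost: the Incremental chain has a hole in it.
incremental_result = restore([full, inc_mon, None, inc_wed])
# The Differential scheme needs only the full plus the latest Differential.
differential_result = restore([full, diff_wed])

print("Incremental restore: ", incremental_result)
print("Differential restore:", differential_result)
```

With Tuesday's tape gone, the Incremental restore silently hands back the old prices.dat, while the Differential restore still has the current one because every Differential repeats it. A Differential scheme is not immune either: lose the full backup, or the latest Differential, and you are in trouble too.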
Wherever possible, doing a complete, full backup every day is easiest and gives the most data redundancy. If you absolutely cannot do that, then the modified Incremental, which is the Differential as defined above (everything modified since the last full backup), is better than true Incrementals. However, don't neglect having multiple full backups in either case.
By the way, my aversion to differential or incremental backups is based on many years of painful field experience. Although it is rare nowadays, not too many years ago I would be involved with drive failures about once a month: I have seen these problems for myself. I STRONGLY RECOMMEND FULL BACKUPS IF AT ALL POSSIBLE. Backup media gets larger and faster and cheaper every year, so most people CAN do complete backups, and should.
While attractive in principle, the time element isn't all that good and you also lose several important capabilities:
Consider this also: you have set up rsync or whatever to keep two machines up to date. Now you have a memory or motherboard problem on the main machine that scrambles database data. It's not bad enough to crash instantly, but it is bad enough to damage the database extensively. That bogus data will of course get transferred to the other machine: effectively a hardware problem on one box causes the identical problems on the other.
Sometimes the easiest way to fix such a problem is to go back in time to a point where the data was not corrupted. This may be because it's too corrupt to fix with ordinary tools, but more often it's just because it is too difficult to figure out where all the problems are: the only sure solution is to revert to some previous state. The ONLY way to do that is to have multiple sets of removable backups that extend back in time.
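The point about going back in time can be sketched in a few lines. This assumes you know (or can estimate) the date the corruption began; the dates and the helper newest_clean_backup are hypothetical.

```python
from datetime import date
from typing import List, Optional


def newest_clean_backup(backup_dates: List[date], corrupted_on: date) -> Optional[date]:
    """Return the most recent backup taken before the corruption began, if any."""
    candidates = [d for d in backup_dates if d < corrupted_on]
    return max(candidates) if candidates else None


# Hypothetical: ten nightly removable sets, September 5 through September 14.
nightly_sets = [date(2005, 9, day) for day in range(5, 15)]
# A mirrored machine only ever holds the latest state.
mirror_only = [date(2005, 9, 14)]

# Assume the bad memory started scrambling data on September 10.
corrupted_on = date(2005, 9, 10)

print("From nightly sets:", newest_clean_backup(nightly_sets, corrupted_on))
print("From mirror only: ", newest_clean_backup(mirror_only, corrupted_on))
```

The rotating sets still offer a clean copy from the night before the trouble started; the mirror offers nothing, because its one and only copy postdates the damage.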
Remember, I'm not saying that having the backup machine is a bad idea. It's not, and it can be very convenient. But you need removable media SOMEWHERE.
There are now inexpensive removable hard drives. They are still a little expensive, but you CAN do this.
Removable media is still the intelligent choice for backup and will remain so until solid state, non-volatile disk drives are common, and I'm not even sure if it's a bad idea then.
Wed Sep 14 15:16:46 2005: Subject: anonymous
Nice Utopian view but not realistic in a large or enterprise environment, and you charge how much?
Wed Sep 14 17:49:03 2005: Subject: TonyLawrence
I don't think you read this article. I said, if you can't do it, either because of time or space, obviously you have to go to a differential or incremental scheme. But if you CAN, a full backup is a far better idea. There's nothing "Utopian" about it - either you can do full backups or you cannot. If you can, you should. Period.
Oh, and I charge $150.00 per hour. But opinions like this are free.
Wed Sep 14 22:26:10 2005: Subject: BigDumbDinosaur
"There are now inexpensive removable hard drives. They are still a little expensive, but you CAN do this."
Any hard drive is an inferior alternative to tape or optical media (if the backup can fit therein). The primary reason we do backups is to protect ourselves from hard drive failure, which as Tony indicated in his article, used to be a routine problem some years ago. A hard drive is an inherently delicate device, and all it takes is one good hard knock to damage the mechanism and render the data inaccessible. While using a removable hard disk as a backup device is fairly convenient, it's not a final solution.
"Nice Utopian view but not realistic in a large or enterprise environment, and you charge how much?"
There's nothing Utopian about using the right methods. Just how valuable is your data to you anyhow?
As for what one might charge for one's services, I'm not sure what relevance that has to this article, since the information is free for the reading. I personally haven't worked with Tony, so I don't have an opinion on whether his rate is fair or not. However, I'd be willing to wager that with his range of experience in this business, he would be worth every cent of what he charges if your system went belly-up and you needed it back on-line in a jiffy.
In my case, my clients pay me well to keep their computing machinery greased and oiled, and none has ever complained about what I charge. If something were to go kaput and employees were suddenly unable to get anything accomplished, that would not be the best time to be pinching pennies. Which would be cheaper: hiring a high priced but very experienced technician who can come in, quickly size up the situation and promptly restore operation, or paying the salaries of numerous employees who are sitting around doing nothing while an inexperienced technician piddles around trying to figure out what's wrong?
Thu Oct 5 16:05:34 2006: Subject: anonymous
The weighting of this article is in the title but... As Mr. Utopia pointed out, this isn't always feasible in a corp environment. Corps will generally aim to have a centralised backup infrastructure and that means backing up over networks. Capacity is no longer the limiting factor, speed is. It may be out of the article's scope, but perhaps a reason to do differentials is to try and guarantee that a backup will complete before the data starts changing again. There’s also the issue of data that is changing 24/7. I agree with the gist of the article, but Utopia has a point.
Thu Oct 5 17:21:06 2006: Subject: TonyLawrence
Once again: I said, if you can't do it, either because of time or space, obviously you HAVE to go to a differential or incremental scheme. But if you CAN, a full backup is a far better idea. There's nothing "Utopian" about it - either you can do full backups or you cannot. If you can, you should. Period.