Why not differential backups, Internet backups, disk to disk?

I get this question frequently. It's usually triggered either because the tape device can't hold an entire backup set or because the time required for backup interferes with productive work. Most of the time this can be easily remedied by a larger or faster storage device, but someone is bound to bring up the idea of differential backups.

The idea is that you create a full backup that has everything, and from then on you only back up the files that have changed. Presumably that's a smaller set of files, and therefore this solves the space or time problems. Usually the full backup is refreshed on some schedule and the process starts again. There are variants on the theme; for example, the differential may include all files that have changed since the last full backup rather than just those that have changed since the last differential. That sort of scheme eventually ends up with the differential containing any and all files that ever change, no matter how infrequently; the full backup is the source of everything else.

Often the term "Incremental" is used to describe what I call a true differential, and I'll use that term for the rest of this article. Remember that a Differential will always have everything that has changed since the last complete backup; an Incremental will only have files that have changed since the previous backup, whether that was the full backup or an earlier Incremental. Right after the full backup, an Incremental and a Differential would be exactly the same; after that they will probably contain different files. An Incremental CAN be smaller than a Differential but can never be larger.
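To make the distinction concrete, here is a minimal sketch of how each scheme might pick files by modification time. The function name, the paths, and the timeline are all made up for illustration; real backup programs track this in their own ways.

```python
def select_for_backup(mtimes, last_full, last_backup, mode):
    """Return the sorted paths a backup run would copy.

    mtimes:      dict of path -> modification time (any comparable number)
    last_full:   time of the most recent full backup
    last_backup: time of the most recent backup of any kind
    mode:        "full", "differential" or "incremental"
    """
    if mode == "full":
        return sorted(mtimes)
    if mode not in ("differential", "incremental"):
        raise ValueError(f"unknown mode: {mode}")
    # A Differential compares against the last full backup; an Incremental
    # compares against the most recent backup of any kind.
    cutoff = last_full if mode == "differential" else last_backup
    return sorted(p for p, t in mtimes.items() if t > cutoff)


# Hypothetical timeline: full backup at t=0, one Incremental already run at t=10.
files = {"/etc/passwd": -5, "/data/ledger": 4, "/data/orders": 12}
differential = select_for_backup(files, last_full=0, last_backup=10, mode="differential")
incremental = select_for_backup(files, last_full=0, last_backup=10, mode="incremental")
print(differential)  # ['/data/ledger', '/data/orders']
print(incremental)   # ['/data/orders']
```

The Incremental's list is always a subset of the Differential's, which is why it can be smaller but never larger.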

Differential or Incremental backups always seem like a great idea to people who haven't experienced the negative aspects. Admittedly, there can be circumstances where you have no other choice, but consider these points:

  • Nowadays, this may be a futile effort. The unchanging Operating System files aren't what is exceeding your space or time capacity; in most situations it's surely your data. So any style of differential backup is still likely to be more data than you want, because the OS files are often a puny and insignificant part of your data set.

  • Differential backups complicate off-site storage. The whole point of moving backups off site is to provide safety in the event of a fire or other complete physical loss. With complete backups, most small companies rotate the media in and out daily: Wednesday night's backup goes off site Thursday night, and Tuesday's is brought back in Friday morning. This is simple.

    With differentials, it's more difficult. You need to keep a master off site, and if you are doing Incrementals (not Differentials), you need to keep ALL of those off site as well. That is inconvenient if you need occasional access to the tapes on site, and it may also mean making TWO full backups each time you reach that point in your cycle, which is very time consuming and can use a lot of media.

  • Incrementals (which are often the only method that will solve the time or space constraints) introduce another problem if it becomes necessary to restore. You start with the most recent full backup, and then restore each Incremental in order. More than once I've seen people run out of disk space doing this because of temporary files. Each Incremental will include temporary or transient files that may have been removed before the next Incremental, but those files will be restored faithfully just the same. You have to be very careful about excluding temporary files with this scheme.

    More sophisticated backup programs can avoid this by deleting files that are not present on the next tape. However, that depends on the integrity of the set, and simple backups like tar or cpio cannot do this at all.

    Worse news: damaged or lost media in the middle of an Incremental set like this can mean disaster. If a file happens to only exist on one piece of media because it is modified infrequently, the modifications may be lost forever.

  • Differentials give more redundancy than Incrementals to the changing data, but often have little or no redundancy for the full backup. Because system files are modified very infrequently, loss of a full backup (media damage or physical loss) can be quite serious.
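The restore problems with Incrementals can be shown with a toy model. This is only a sketch under simple assumptions: tapes are plain dictionaries, the paths are invented, and the "present" set stands in for whatever record a more sophisticated program keeps of what existed at backup time.

```python
def restore_chain(full, incrementals, honor_deletions=False):
    """Replay a full backup and then each Incremental, oldest first.

    full:         dict of path -> contents from the full backup
    incrementals: list of (changed, present) pairs, oldest first, where
                  changed is a dict of path -> contents on that tape and
                  present is the set of every path that existed at backup
                  time (only smarter tools record this)
    """
    restored = dict(full)
    for changed, present in incrementals:
        restored.update(changed)
        if honor_deletions:
            # Drop anything that no longer existed when this tape was written.
            restored = {p: c for p, c in restored.items() if p in present}
    return restored


# Hypothetical set: one full backup followed by two nightly Incrementals.
full = {"/data/ledger": "v1", "/data/rates": "v1"}
monday = ({"/tmp/sort1234": "scratch", "/data/ledger": "v2", "/data/rates": "v2"},
          {"/data/ledger", "/data/rates", "/tmp/sort1234"})
tuesday = ({"/data/ledger": "v3"}, {"/data/ledger", "/data/rates"})

naive = restore_chain(full, [monday, tuesday])
careful = restore_chain(full, [monday, tuesday], honor_deletions=True)
lost_monday = restore_chain(full, [tuesday])

print(sorted(naive))               # ['/data/ledger', '/data/rates', '/tmp/sort1234']
print(sorted(careful))             # ['/data/ledger', '/data/rates']
print(lost_monday["/data/rates"])  # v1
```

The naive replay brings the temporary file back, the careful one does not, and losing Monday's tape silently reverts a file that only changed that day.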


Wherever possible, doing a complete, full backup every day is easiest and gives the most data redundancy. If you absolutely cannot do that, then the Differential (everything modified since the last full backup) is better than true Incrementals. However, don't neglect having multiple full backups in either case.

By the way, my aversion to differential or incremental backups is based on many years of painful field experience. Although it is rare nowadays, not too many years ago I would be involved with drive failures about once a month: I have seen these problems for myself. I STRONGLY RECOMMEND FULL BACKUPS IF AT ALL POSSIBLE. Backup media gets larger and faster and cheaper every year, so most people CAN do complete backups, and should.

What about Network Backups to another hard drive?

While attractive in principle, it usually doesn't save much time, and you also lose several important capabilities:

  • The ability to take media off site.

  • The ability to restore completely to a fresh drive from the media without reinstalling the OS (though see Supertars).

  • "deep" backup stretching as far back in time as you need. You can simulate that with a large enough drive at the receiving end, but then all your backups are in one mechanical device: if that device fails, you lose all backup.


Consider this also: you have set up rsync or whatever to keep two machines up to date. Now you have a memory or motherboard problem on the main machine that scrambles database data. It's not bad enough to crash instantly, but it is bad enough to damage the database extensively. That bogus data will of course get transferred to the other machine: effectively a hardware problem on one box causes the identical problems on the other.

Sometimes the easiest way to fix such a problem is to go back in time to a point where the data was not corrupted. This may be because it's too corrupt to fix with ordinary tools, but more often it's just because it is too difficult to figure out where all the problems are: the only sure solution is to revert to some previous state. The ONLY way to do that is to have multiple sets of removable backup that extend back in time.
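A small sketch shows why a mirror alone can't take you back in time. The function and the labels for each night are hypothetical; the only point is how many past states survive.

```python
from collections import deque


def newest_clean_copy(nightly_states, keep, is_clean):
    """Return the newest retained nightly copy that passes is_clean, or None.

    keep=1 models a mirror on a second machine (only the latest state
    survives); a larger keep models a rotation of that many dated sets.
    """
    retained = deque(maxlen=keep)  # older copies fall off the far end
    for state in nightly_states:
        retained.append(state)
    for state in reversed(retained):
        if is_clean(state):
            return state
    return None


def looks_clean(state):
    # Stand-in for whatever check tells you the data is good.
    return state.endswith("-ok")


# Hypothetical week: corruption starts Wednesday and is noticed on Friday.
nights = ["mon-ok", "tue-ok", "wed-corrupt", "thu-corrupt", "fri-corrupt"]
print(newest_clean_copy(nights, keep=1, is_clean=looks_clean))  # None
print(newest_clean_copy(nights, keep=5, is_clean=looks_clean))  # tue-ok
```

With only the mirror there is nothing clean left to restore; with five dated sets, Tuesday's is still on the shelf.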

Remember, I'm not saying that having the backup machine is a bad idea. It's not, and it can be very convenient. But you need removable media SOMEWHERE.

There are now inexpensive removable hard drives. They are still a little expensive, but you CAN do this.

Removable media is still the intelligent choice for backup and will remain so until solid state, non-volatile disk drives are common, and I'm not even sure if it's a bad idea then.

Why not Internet backups?

The problem here is two-fold: one, you probably can't back up ALL your data because the connection isn't fast enough and two, you are depending on the Internet being available for restore. I do think Internet backup is a great adjunct to in-house removable media, but that's all it is.

Maybe you have multiple redundant T3 connections and can do this, but even then, I think you should have in-house removable media for utmost safety.
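You can check whether your connection is fast enough with simple arithmetic. The sketch below assumes a made-up 100 GB data set and a made-up 10 Mbit/s uplink; the T3 figure is the nominal 44.736 Mbit/s line rate, and real throughput will be lower than any nominal number.

```python
def transfer_hours(data_gigabytes, link_megabits_per_second, efficiency=1.0):
    """Hours needed to move the data over the link.

    efficiency is the fraction of the nominal rate you actually get
    (1.0 is the unrealistic best case).
    """
    bits = data_gigabytes * 8 * 1000**3
    usable_bits_per_second = link_megabits_per_second * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical 100 GB data set over a 10 Mbit/s uplink and over one T3.
print(round(transfer_hours(100, 10), 1))      # 22.2
print(round(transfer_hours(100, 44.736), 1))  # 5.0
```

Plug in your own data size and measured upload speed. If the answer doesn't fit in your backup window, a full Internet backup isn't practical, and the same arithmetic applies to how long a full restore would take.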

Your data is critical. Don't put it at risk.





6 comments




© Tony Lawrence







Wed Sep 14 15:16:46 2005: 1086   anonymous


Nice Utopian view but not realistic in a large or enterprise environment, and you charge how much?



Wed Sep 14 17:49:03 2005: 1089   TonyLawrence

I don't think you read this article. I said, if you can't do it, either because of time or space, obviously you have to go to a differential or incremental scheme. But if you CAN, a full backup is a far better idea. There's nothing "Utopian" about it - either you can do full backups or you cannot. If you can, you should. Period.

Oh, and I charge $150.00 per hour. But opinions like this are free.



Wed Sep 14 22:26:10 2005: 1090   BigDumbDinosaur


"There are now inexpensive removable hard drives. They are still a little expensive, but you CAN do this."

Any hard drive is an inferior alternative to tape or optical media (if the backup can fit therein). The primary reason we do backups is to protect ourselves from hard drive failure, which as Tony indicated in his article, used to be a routine problem some years ago. A hard drive is an inherently delicate device, and all it takes is one good hard knock to damage the mechanism and render the data inaccessible. While using a removable hard disk as a backup device is fairly convenient, it's not a final solution.

"Nice Utopian view but not realistic in a large or enterprise environment, and you charge how much?"

There's nothing Utopian about using the right methods. Just how valuable is your data to you anyhow?

As for what one might charge for one's services, I'm not sure what relevance that has to this article, since the information is free for the reading. I personally haven't worked with Tony, so I don't have an opinion on whether his rate is fair or not. However, I'd be willing to wager that with his range of experience in this business, he would be worth every cent of what he charges if your system went belly-up and you needed it back on-line in a jiffy.

In my case, my clients pay me well to keep their computing machinery greased and oiled, and none has ever complained about what I charge. If something were to go kaput and employees were suddenly unable to get anything accomplished, that would not be the best time to be pinching pennies. Which would be cheaper: hiring a high priced but very experienced technician who can come in, quickly size up the situation and promptly restore operation, or paying the salaries of numerous employees who are sitting around doing nothing while an inexperienced technician piddles around trying to figure out what's wrong?



Thu Oct 5 16:05:34 2006: 2505   anonymous


The weighting of this article is in the title but... As Mr. Utopia pointed out, this isn't always feasible in a corp environment. Corps will generally aim to have a centralised backup infrastructure and that means backing up over networks. Capacity is no longer the limiting factor, speed is. It may be out of the article's scope, but perhaps a reason to do differentials is to try and guarantee that a backup will complete before the data starts changing again. There's also the issue of data that is changing 24/7. I agree with the gist of the article, but Utopia has a point.



Thu Oct 5 17:21:06 2006: 2506   TonyLawrence

Once again: I said, if you can't do it, either because of time or space, obviously you HAVE to go to a differential or incremental scheme. But if you CAN, a full backup is a far better idea. There's nothing "Utopian" about it - either you can do full backups or you cannot. If you can, you should. Period.



Fri Sep 18 16:49:10 2009: 6932   TonyLawrence

I wrote this over seven years ago and really nothing has changed.

A lot of people are running around pushing Internet backup. I think that's great as an adjunct to in-house, but as I explained above, it shouldn't be your only solution even if your data is small enough to stream out every night.

I still get the "another hard drive" and the "copy it to another machine" people too and while that can be convenient, it isn't BACKUP.

What I do get asked a lot is what software I recommend. Unfortunately, every damn backup app I've seen for Windows has caused me grief somewhere. I suspect this is Microsoft's fault more than the app vendors, but still: they are all very good at giving you pretty reports showing what they backed up and how long it took. If only they were equally good when it comes time to restore a dead box!

Because of the frailty, when it comes to Windows, my normal advice that you have to TEST full restores has to be even stronger: you have to test regularly and consistently. That's annoying and time consuming but my experience says you just can't poke your head in the sand and trust. I wish it were otherwise. With Microlite on Unix/Linux, I *know* that if I get a backup and have bootable restore media, I *can* rebuild. I wish Windows backup apps gave me the same confidence, but they do not.




