Why not differential backups, Internet backups, disk to disk??
I get this question frequently. It's usually triggered either
because the tape device can't hold an entire backup set or because
the time required for backup interferes with productive work. Most
of the time this can be easily remedied by a larger or faster
storage device, but someone is bound to bring up the idea of
differential backups.
The idea is that you create a full backup that has everything,
and from then on, you only back up the files that have changed.
Presumably that's a smaller set of files and therefore this solves
the space or time problems. Usually the full backup is refreshed on
some schedule and the process starts again. There are variants on
the theme; for example the differential may include all files that
have changed since the last full backup rather than just those that
have changed since the last differential. That sort of scheme
eventually ends up with the differential containing any and all
files that ever change, no matter how infrequently; the full backup
is the source of everything else.
Often the term "Incremental" is used to describe what I call
true differential. I'll use that term for the rest of this article.
Remember that a Differential will always have everything that has
changed since the last complete backup; an Incremental will only
have files that have changed since the previous Incremental backup.
Right after the full backup, an Incremental and a Differential
would be exactly the same; after that they will probably contain
different files. An Incremental CAN be smaller than a Differential
but could never be larger.
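The distinction can be sketched in a few lines of Python. This is only an illustration, not how any particular backup program works: the file names, the day numbers and the changed_since helper are all made up for the example. The only difference between the two schemes is the cutoff that files are compared against.

```python
def changed_since(mtimes, cutoff):
    """Return the files whose last-modified time is later than `cutoff`."""
    return sorted(name for name, mtime in mtimes.items() if mtime > cutoff)

# Hypothetical catalog: file name -> day on which it was last modified.
# Assumption: the full backup ran on day 0, the previous Incremental on day 2.
mtimes = {
    "os/kernel": 0,
    "data/report.txt": 1,
    "home/notes": 2,
    "data/orders.db": 3,
}
last_full = 0
last_incremental = 2

# A Differential compares against the last FULL backup.
differential_set = changed_since(mtimes, last_full)
# An Incremental compares against the previous Incremental.
incremental_set = changed_since(mtimes, last_incremental)

print("Differential:", differential_set)
print("Incremental: ", incremental_set)
```

With this sample data the Incremental holds one file and the Differential holds three, and the Incremental is always a subset of the Differential.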
Differential or Incremental backups always seem like a great
idea to people who haven't experienced the negative aspects.
Admittedly, there can be circumstances where you have no other
choice, but consider these points:
- Nowadays, this may be a futile effort. The unchanging Operating
System files aren't what is exceeding your space or time capacity-
it's surely your data in most situations. So any style of
differential backup is still likely to be more data than you want-
the OS files are often a puny and insignificant part of your data.
- Differential backups complicate off-site storage. The whole
point of moving backups off site is to provide safety in the event
of a fire or other complete physical loss. With complete
backups, most small companies simply rotate the media in and out daily:
Wednesday night's backup goes off site Thursday night, and Tuesday's
is brought back in Friday morning. This is simple.
With differentials, it's more difficult. You need to keep a
master off site and if you are doing Incrementals (not
Differentials), you need to keep ALL of those off site. That makes
it inconvenient if you need to have occasional access to the tapes
on site, and that may also mean that you need to make TWO full
backups each time you reach that point in your cycle- that makes it
very time consuming and can use a lot of media.
- Incrementals (which are often the only method that will solve
the time or space constraints) introduce another problem if it
becomes necessary to restore. You start with the most recent full
backup, and then restore each Incremental in order. More than once
I've seen people run out of disk space doing this because of
temporary files. Each Incremental will include temporary or
transient files that may have been removed before the next
Incremental, but those files will be restored faithfully just the
same. You have to be very careful about excluding temporary files
with this scheme.
More sophisticated backup programs can avoid this by deleting
files that are not present on the next tape - however that depends
on the integrity of the set and simple backups like tar or cpio
cannot do this at all.
Worse news: damaged or lost media in the middle of an Incremental
set like this can mean disaster. If a file happens to only exist on
one piece of media because it is modified infrequently, the
modifications may be lost forever.
- Differentials give more redundancy than Incrementals to the
changing data, but often have no or limited redundancy for the full
backup. Because system files are usually modified very infrequently,
loss of a full backup (media damage or physical loss) can be quite
Wherever possible, doing a complete, full backup every day is
easiest and gives the most data redundancy. If you absolutely
cannot do that, then a Differential (everything modified
since the last full backup) is better than true Incrementals.
However, don't neglect having multiple full backups in either
case.
By the way, my aversion to differential or incremental backups
is based on many years of painful field experience. Although it is
rare nowadays, not too many years ago I would be involved with
drive failures about once a month: I have seen these problems for
myself. I STRONGLY RECOMMEND FULL BACKUPS IF AT ALL POSSIBLE.
Backup media gets larger and faster and cheaper every year, so most
people CAN do complete backups, and should.
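The restore problems with Incrementals described above can be modelled with plain dictionaries. This is a toy sketch under stated assumptions: backup sets are dicts of path to contents, the restore function and the per-set "manifests" (the list of paths that existed when each Incremental was taken) are invented for the illustration, and no real backup program is being described.

```python
def restore(full, incrementals, manifests=None):
    """Replay a full backup, then each Incremental in order.

    If `manifests` is given (one set of paths per Incremental), files that
    did not exist when that Incremental was taken are deleted again, which
    is roughly what the more sophisticated programs do.
    """
    disk = dict(full)
    for index, incremental in enumerate(incrementals):
        disk.update(incremental)
        if manifests is not None:
            keep = manifests[index]
            disk = {path: data for path, data in disk.items() if path in keep}
    return disk

# Hypothetical backup sets.
full = {"db": "v1", "notes.txt": "original"}
inc1 = {"db": "v2", "notes.txt": "edited", "tmp/sort.1": "scratch"}
inc2 = {"db": "v3"}
manifests = [{"db", "notes.txt", "tmp/sort.1"}, {"db", "notes.txt"}]

naive = restore(full, [inc1, inc2])              # temporary file comes back
pruned = restore(full, [inc1, inc2], manifests)  # temporary file is removed
missing_tape = restore(full, [inc2])             # inc1 was lost or damaged

print(naive)
print(pruned)
print(missing_tape)
```

The naive restore faithfully brings back tmp/sort.1 even though it was gone before the second Incremental, and the restore with inc1 missing silently returns the old notes.txt: the edit existed on only one piece of media.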
Disk to disk backup (copying to another machine or drive) is
attractive in principle, but the time element isn't all that
good and you also lose several important capabilities:
- The ability to take media off site.
- The ability to restore completely to a fresh drive from the
media without reinstalling the OS (though see Supertars).
- "deep" backup stretching as far back in time as you need. You
can simulate that with a large enough drive at the receiving end,
but then all your backups are in one mechanical device: if that
device fails, you lose all backup.
Consider this also: you have set up rsync or whatever to keep
two machines up to date. Now you have a memory or motherboard
problem on the main machine that scrambles database data. It's not
bad enough to crash instantly, but it is bad enough to damage the
database extensively. That bogus data will of course get
transferred to the other machine: effectively a hardware problem on
one box causes the identical problems on the other.
Sometimes the easiest way to fix such a problem is to go back in
time to a point where the data was not corrupted. This may be
because it's too corrupt to fix with ordinary tools but more often
it's just because it is too difficult to figure out where all the
problems are: the only sure solution is to revert to some previous
state. The ONLY way to do that is to have multiple sets of
removable backup that extend back in time.
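Going back in time amounts to picking the newest backup set taken before the damage began. Here is a minimal sketch, assuming you know (or can estimate) when the corruption started; the dates and the latest_clean_backup helper are hypothetical.

```python
from datetime import date

def latest_clean_backup(backup_dates, corruption_date):
    """Return the newest backup taken strictly before the corruption began,
    or None if every available backup is suspect."""
    candidates = [d for d in backup_dates if d < corruption_date]
    return max(candidates) if candidates else None

# Hypothetical: nightly removable media versus a single mirrored machine.
removable_sets = [date(2012, 7, 9), date(2012, 7, 10),
                  date(2012, 7, 11), date(2012, 7, 12)]
mirror_only = [date(2012, 7, 12)]   # a mirror only ever holds the latest state
corruption_began = date(2012, 7, 11)

print(latest_clean_backup(removable_sets, corruption_began))
print(latest_clean_backup(mirror_only, corruption_began))
```

With the rotated sets there is a clean copy from the day before the corruption; with only the mirror there is nothing to fall back on.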
Remember, I'm not saying that having the backup machine is a bad
idea. It's not, and it can be very convenient. But you need
removable media SOMEWHERE.
There are now fairly inexpensive removable hard drives. They still
cost a little more than you might like, but you CAN do this.
Removable media is still the intelligent choice for backup and
will remain so until solid state, non-volatile disk drives are
common, and I'm not even sure if it's a bad idea then.
The problem with Internet backup is two-fold: one, you probably can't back up ALL your
data because the connection isn't fast enough, and two, you are depending on
the Internet being available for restore. I do think Internet backup is a great adjunct to in-house removable media, but that's all it is.
Maybe you have multiple redundant T3 connections and can do this, but even then, I think you should have in-house removable media for utmost safety.
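The bandwidth point is easy to check with rough arithmetic. The sketch below is only an estimate: the 500 GB data size and the 80% usable-throughput figure are assumptions chosen for illustration, and transfer_hours is a made-up helper. The line rates are the nominal T1 (1.544 Mbit/s) and T3 (44.736 Mbit/s) speeds.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate the hours needed to move `data_gb` gigabytes over a link.

    `efficiency` is an assumed fraction of the nominal line rate that is
    actually usable after protocol overhead and other traffic.
    """
    bits = data_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600

T1_MBPS = 1.544    # nominal T1 line rate
T3_MBPS = 44.736   # nominal T3 line rate

data_gb = 500      # hypothetical size of a full backup
print(f"T1: {transfer_hours(data_gb, T1_MBPS):.0f} hours")
print(f"T3: {transfer_hours(data_gb, T3_MBPS):.0f} hours")
```

Under these assumptions a single T3 needs more than a day for one full backup (about 31 hours), and a T1 needs over a month. The same arithmetic applies to the restore.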
Your data is critical. Don't put it at risk.
© 2012-07-14 Tony Lawrence