I had a conversation with a client yesterday about virtualization and backup. As there was some definite confusion in that conversation, I realized other people may need some orientation.
Today's VM systems can use "snapshots" to effectively capture an image of the VM at a moment in time. Microsoft calls this "Shadow Copy" but it's all the same idea: copy on write. The snapshot software creates a file that looks like a copy of the VM image, but it isn't. At that moment, it's actually working like a hard link: it is pointing to exactly the same disk blocks as the original file. If some process writes data to any block in the original, the target block is first copied to a new block that the snapshot file will use.
At any point in time, the snapshot file inode points to a mix of blocks: some are the same blocks that are still pristine in the original file and some are new blocks that hold copies of that file's data as it was before it was overwritten.
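To make the block bookkeeping concrete, here is a minimal sketch of the copy on write idea in Python. It is a toy model, not how any particular hypervisor or filesystem implements snapshots: the Volume and Snapshot classes, their methods, and the four-block "disk" are all invented for illustration.

```python
class Volume:
    """Toy block device: a list of block contents plus any open snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Creating a snapshot copies no data at all, which is why it is fast.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy on write: before overwriting a block, save its old contents
        # for every snapshot that has not already saved that block.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


class Snapshot:
    """Hypothetical snapshot: shares pristine blocks, owns overwritten ones."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # Use the preserved copy if the original block has been overwritten,
        # otherwise read straight through to the shared original block.
        if index in self.saved:
            return self.saved[index]
        return self.volume.blocks[index]

    def image(self):
        return [self.read(i) for i in range(len(self.volume.blocks))]


vol = Volume(["A", "B", "C", "D"])
snap = vol.snapshot()
vol.write(1, "b2")
vol.write(1, "b3")  # second write to the same block saves nothing new
vol.write(3, "d2")

print(snap.image())     # the volume as it was when the snapshot was taken
print(vol.blocks)       # the live volume with the new data
print(len(snap.saved))  # blocks the snapshot has had to keep for itself
```

In this model the snapshot starts out owning nothing and only takes on a block the first time that block is overwritten, so its space use grows with the number of distinct blocks changed, not with the number of writes.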
This copy on write scheme leads to a few important facts. First, the snapshot will always represent the original file at the moment in time that it was created. Because no data is actually copied when the snapshot is first created, that can take place very quickly - the running system will be frozen briefly, but it doesn't take long. However, the necessary copy on write will have some impact on the VM's performance.
The snapshot may take up far less disk space than would be required for an actual copy. Over time, as new data is written to the original, the snapshot will need more new disk blocks and eventually could end up consuming as much space as its source.
There are two uses for these snapshots. One is to be able to quickly roll back to a particular point in time. The other use is for backup, and that's where the confusion comes in.
The snapshot file itself is a backup of sorts, but in this context its purpose is to give you the leisure of copying to other media (tape, another disk, another directory, whatever) without having to be concerned about data changing during the process. You back up or copy the snapshot file, which stays frozen at the moment it was created.
That's great, but for most small business software it does NOT eliminate the necessity of stopping user activity. It cuts it down: creating a snapshot usually is literally a matter of a second or two. But it does not eliminate it.
Let's imagine an accounting application where a user has just entered a new invoice. There are several things that the software needs to do, but of course each action needs to be performed serially. In the midst of that, a snapshot of the data is requested.
The snapshot process will, of course, flush any pending disk writes, but the application software may have only asked to write some of what it needs to write to finish the transaction. You've frozen the data, but there are still things the application hasn't gotten to yet. The snapshot then holds a half-finished transaction, so as far as the application is concerned the data in it may be corrupt.
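The invoice scenario can be simulated in a few lines of Python. This is only a sketch under made-up assumptions: the three data stores (invoices, balances, ledger), the order of the writes, and the consistency rule are invented for the example, and a deep copy stands in for the snapshot.

```python
import copy


def post_invoice_steps(data, invoice_id, customer, amount):
    """Perform the writes for one invoice serially, pausing after each one."""
    data["invoices"][invoice_id] = {"customer": customer, "amount": amount}
    yield "invoice written"
    data["balances"][customer] = data["balances"].get(customer, 0) + amount
    yield "balance updated"
    data["ledger"].append((invoice_id, amount))
    yield "ledger posted"


def is_consistent(data):
    """Invented rule: invoices, customer balances, and ledger must agree."""
    invoiced = sum(inv["amount"] for inv in data["invoices"].values())
    balances = sum(data["balances"].values())
    posted = sum(amount for _, amount in data["ledger"])
    return invoiced == balances == posted


data = {"invoices": {}, "balances": {}, "ledger": []}
steps = post_invoice_steps(data, "INV-1", "acme", 100)

next(steps)                   # only the first write has happened so far
frozen = copy.deepcopy(data)  # the "snapshot" is requested mid-transaction

for _ in steps:               # the application then finishes its work
    pass

print(is_consistent(frozen))  # the snapshot holds a partial transaction
print(is_consistent(data))    # the live data is complete
```

The frozen copy is a perfectly faithful image of the data at that instant. The trouble is that the instant fell between two writes that the application considers a single unit of work.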
Application software may be ready to deal with that situation. If you don't know that it is, you MUST get users to be inactive before creating the snapshot. For small businesses that typically aren't doing any user input during the backup window, that's no problem. For larger businesses it may be - but of course they are more apt to have more sophisticated application software that can roll back or complete partial transactions.
The snapshot software may allow you to automatically call custom scripts to shut down databases or do any other "pre-freeze" and "post-thaw" work. For example, see VMware's Virtual Machine Backup Guide for an overview of their snapshot/backup methods.
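The general shape of that pre-freeze and post-thaw arrangement can be sketched in Python. This is not VMware's mechanism or anyone else's real API: the quiesced helper and the stop_database, start_database, and create_snapshot functions are hypothetical placeholders that only record what they would do.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(pre_freeze, post_thaw):
    """Run pre_freeze, hand control to the caller, then always run post_thaw."""
    pre_freeze()
    try:
        yield
    finally:
        # Restart the application even if the snapshot step raised an error.
        post_thaw()


log = []  # records the order of events for this demonstration


def stop_database():    # hypothetical pre-freeze action
    log.append("pre-freeze: database stopped")


def start_database():   # hypothetical post-thaw action
    log.append("post-thaw: database started")


def create_snapshot():  # stand-in for the real snapshot request
    log.append("snapshot created")


with quiesced(stop_database, start_database):
    create_snapshot()

print(log)
```

The point of wrapping the snapshot this way is that the application is only down for the second or two the snapshot takes, and the slow copy to other media happens afterward from the frozen image.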
Snapshots do make backup easier, but you still have to consider your application software.