Using rsync for machine replication?

Yesterday I was at a client site where they explained they wanted to keep a stand-by server up to date and ready to take over in case of main system failure. Fine, lots of people do that, and currently they are doing it by restoring backups every morning. What they were asking about was using rsync or some other mechanism to keep the machines more current.

My first reaction was to question them about their app: it's apparently a mess of Basic programs that work on hundreds of different data files 24 x 7. I asked if they shut down the users for backup. They don't. I explained that the backup can't really be guaranteed consistent if users are writing data, because files are going to be backed up while they are being written to. As the files are related to each other (A/R header and detail files, indexes, etc.), you can have inconsistent versions on the backup media. Somebody told them (obviously incorrectly) that rsync could prevent this. There are databases that will let you replicate data while you run, but that's application level, and this app has no such ability. Rsync can't do any better than a backup for that.

I talked about snapshots, and asked if they could shut down users for the very brief time it takes to do that. Nope. Can't ever stop the flow of data. I explained that rsync can't answer that problem any better than a tape backup can: files and indexes may be inconsistent with each other.

I also wonder whether rsync's rolling checksum makes things better or worse. On the one hand, less data is transferred than with a plain rcp or similar copy; on the other, open files that are getting writes during the checksum make things even more confusing.
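For anyone who hasn't looked at it, the rolling checksum is what lets rsync find matching blocks at any byte offset without re-summing the whole window each time. Below is a minimal Python sketch of the weak checksum as described in Tridgell and Mackerras's rsync technical report; real rsync also uses a strong per-block checksum and differs in implementation details, and the function names here are made up for illustration. Note that it is computed over whatever bytes happen to be read at that moment: nothing in it knows whether those bytes agree with some other file.

```python
from typing import Iterator, Tuple

M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the two running sums (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window forward by one byte in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_sums(data: bytes, n: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, checksum) for every n-byte window in data."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield 0, a + (b << 16)
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
        yield k, a + (b << 16)
```

In rsync the receiving side sends these per-block values for its old copy, and the sender rolls over its current file looking for matches, so only unmatched data has to cross the wire.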

They can't change the app. They can't shut down users. Their bank says they have to have better disaster recovery. These things seem impossible to reconcile. My feeling is that if the bank's demands have to be met, then the user community HAS to put up with periods where they can't use the app. If they use snapshots, that period can be brief; otherwise it's going to be fairly long (13 GB of data to transfer).

Comments?

More comments at http://groups.google.com/groups?dq=&hl=en&lr=&threadm=1097927504.117610.261580%40f14g2000cwb.googlegroups.com&rnum=1



© Tony Lawrence




"They can't change the app. They can't shut down users. Their bank says they have to have better disaster recovery. These things seem impossible to reconcile. My feeling is that if the banks demands have to be met, then the user community HAS to put up with periods where they can't use the app. If they use snapshots, that period can be brief, otherwise its going to be fairly long (13 GB of data to transfer)."

These people are being unreasonable -- and stupid. In any case, 13 GB can be transferred in a reasonable amount of time with DDS4, DDS5 or DLT. Other formats would be a problem, time-wise.

An alternative, assuming there's enough disk space, would be to kick the users off the system, create a tarball, let the users back in and then transfer the tarball to tape. You're assured that all data files are in a consistent state and disruption is kept to a reasonable level.

Otherwise, your client is playing Russian roulette with their data.

--BigDumbDinosaur
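The kick-everyone-off-and-tar approach above might look something like this Python sketch. `stop_user_logins()` and `allow_user_logins()` are hypothetical placeholders for whatever actually locks users out on a given system. The archive is left uncompressed so the outage lasts as short a time as possible; writing it to tape happens afterwards, with users back on.

```python
import tarfile
from pathlib import Path
from typing import Callable


def stop_user_logins() -> None:
    """Hypothetical hook: log users off and block new logins."""


def allow_user_logins() -> None:
    """Hypothetical hook: let users back in."""


def quiesced_archive(
    data_dir: str,
    archive_path: str,
    stop: Callable[[], None] = stop_user_logins,
    start: Callable[[], None] = allow_user_logins,
) -> str:
    """Archive data_dir to a local tarball while nobody can write to it."""
    stop()
    try:
        # Plain "w" (no compression) keeps the outage window short;
        # compress or copy to tape later, after users are back on.
        with tarfile.open(archive_path, "w") as tar:
            tar.add(data_dir, arcname=Path(data_dir).name)
    finally:
        # Let users back in even if archiving failed.
        start()
    return archive_path
```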

---October 18, 2004

After reading that Usenet thread, I'm still left with one question (if you can answer it in a way that doesn't breach client confidentiality): What sort of business is this client in that they operate 24/7 with absolutely no possibility of downtime? How did they cope with the August 2003 blackout, for heaven's sake?

(Have you ever checked the uptime value on their server to see if they're telling the truth?)

Basically (pun intended), I can't imagine any such mission-critical application being written haphazardly in Basic. (This coming from someone who works on such a system.) If you performed an activity analysis of their site, would it really show a constant level of activity from 00:00:00 to 23:59:59, with nary a dip at 3:49AM when the night shift collectively stops for their mid-morning pizza break? Or is it more a matter of people not wanting to give up the convenience of having all their data entry and inquiry screens available at a moment's notice when that $50 sales call comes in at 4AM?

To me it sounds like management is distorting reality to make themselves appear much more important than they truly are. Plus, they sound like a cheap outfit, in more ways than one.

--Bob

---October 18, 2004

I don't know the reason, but I bet it is 100% political: some upper manager wants no downtime and that's it. This is a food manufacturer, and I suspect it's simply a case of upper management having no clue (what else is new?), perhaps made more complicated by another consulting firm's apparent assertion that rsync would solve the integrity issue, which of course it will not.

This isn't a regular client, but I'm certainly not going to sign off on "rsync is just what you want."

--TonyLawrence


---October 18, 2004

Can't shut down... but what about a backup app that detects periods of inactivity of duration X (thus ensuring indices align with databases), then breaks the mirror on the data drives (this requires a RAID/mirror arrangement), snapshots the data on the inactive half of the mirror, resyncs the mirror and then, in its own sweet time, backs up the snapshot?

-Buck
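Buck's "period of inactivity" test could be approximated by watching file modification times, as in the Python sketch below (all names are made up for illustration). Keep Tony's objection in mind, though: a quiet period only shows that nothing was written recently, not that no user is halfway through a multi-file update, and a write can start the instant after the check passes.

```python
import os
import time
from typing import Optional


def newest_mtime(data_dir: str) -> float:
    """Return the most recent modification time of any file under data_dir."""
    newest = 0.0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            try:
                mtime = os.stat(os.path.join(root, name)).st_mtime
            except FileNotFoundError:
                continue  # file was removed while we were scanning
            newest = max(newest, mtime)
    return newest


def is_quiet(data_dir: str, quiet_seconds: float,
             now: Optional[float] = None) -> bool:
    """True if no file under data_dir has changed in the last quiet_seconds."""
    now = time.time() if now is None else now
    return now - newest_mtime(data_dir) >= quiet_seconds


def wait_for_quiet(data_dir: str, quiet_seconds: float,
                   poll: float = 5.0, timeout: Optional[float] = None) -> bool:
    """Poll until the directory has been quiet long enough; False on timeout."""
    start = time.monotonic()
    while not is_quiet(data_dir, quiet_seconds):
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        time.sleep(poll)
    return True
```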




---October 19, 2004

I guess these guys may have to experience an extended hardware outage, where the backups are needed, and then maybe they could accept a little downtime? How fast is the hardware? I do not envy the position you are in, Tony. I am guessing you have put some of the facts in writing for them, to cover yourself. They will have to live with the possibility that, without shutting down for a short period of time (say, in the middle of the night) to perform a proper backup, they may have data that cannot be fully restored should the hardware catch the flu and not recover because of its age :-)

- Bruce Garlock

---October 19, 2004




Maybe run the 2-3 servers with a distributed file system like AFS or Coda?

That way they could do the backups, and have replication in case one server gets fubar'd.
http://www.coda.cs.cmu.edu/misc/databases.html

(Oh, and only have them use one server at a time.)

--Drag

---October 19, 2004

1. A set of mirrored drives - disconnect the mirror at a slow time and back up the mirror, reconnect when the backup is done, resync happens on the fly so no one is the wiser.

2. A set of mirrored hot swap drives - at backup time simply pull half a set of drives and immediately replace them with the spare set, resync happens on the fly so no one is the wiser.

3. Replace the system with a redundant pair of servers (http://www.marathon1.com/) that operate in lockstep with each other.

4. It seems to me that some databases can suspend commits for the duration of the backup. Actual implementation is left as an exercise for the reader. N.B.: Most Business Basic applications do not actually use a database product such as SighBase or snOracle but implement their own structured file types. Rewriting the application may be required, and I am available.

--Dirk
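Dirk's item 4, suspending commits while the backup runs, only works if the application funnels every multi-file update through something like the gate sketched below in Python. `CommitGate` is a hypothetical name, not a feature of any database product. While the backup holds the gate, new transactions wait (in effect, the activity is buffered), and the backup itself waits for in-flight transactions to finish. Retrofitting that into every program that writes is exactly the app-level change this client says it can't make.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class CommitGate:
    """Hypothetical gate: writers share it, a backup takes it exclusively."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active = 0       # transactions currently in progress
        self._paused = False   # True while a backup holds the gate

    @contextmanager
    def transaction(self) -> Iterator[None]:
        """Wrap one complete multi-file update (header, detail, indexes)."""
        with self._cond:
            while self._paused:
                self._cond.wait()  # held here until the backup finishes
            self._active += 1
        try:
            yield
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    @contextmanager
    def paused(self) -> Iterator[None]:
        """Hold off new transactions and wait for running ones to finish."""
        with self._cond:
            while self._paused:
                self._cond.wait()  # one backup at a time
            self._paused = True
            while self._active:
                self._cond.wait()
        try:
            # Every writer that uses the gate is between transactions here:
            # take the snapshot or copy now.
            yield
        finally:
            with self._cond:
                self._paused = False
                self._cond.notify_all()
```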


Mirrored drives don't change anything. You can't know that the app data is consistent when you disconnect the mirror. It's not filesystem consistency that matters, it's application consistency.

HA servers can't help either unless they share storage, and even then that still leaves the backup problem, which can't be solved without shutting down users. Period.

--TonyLawrence

If it were my client, I'd spell out in detail -- in writing -- exactly what will happen if the system "catches the flu," and that I accept no responsibility of any kind in such an event. I've often found that when dealing with a client who, for whatever reason, doesn't want to deal with backups, a certain amount of scare tactics is in order. Presenting a worst-case scenario and taking the story as high up the chain of command as possible often shakes out complacency. These people are just asking for a data disaster.

--BigDumbDinosaur

---October 22, 2004
How about something with LVM, as this link explains:

"The goal is to get a Mail server that will never be taken offline for nightly backups (as our old system used to). We found Linux, XFS, LVM, and snapshots to be perfect for this situation"

http://arstechnica.com/columns/linux/linux-20041013.ars

- Bruce Garlock

---October 22, 2004

We talked about snapshots (it's mentioned in the original article). But a snapshot doesn't help application file consistency: a snapshot of an app with partial transactions is no better than a tar of the same data.

Apps like this use multiple files. For example, you might have a customer file, an a/r transaction file, an a/r detail file, and probably associated indexes for each. When you add a transaction, all of these need to be updated, but obviously that will be a serial procedure: first one, then another, and so on. If you catch data in the middle, you have an inconsistent state when you restore that data.

Real databases handle that problem with a transaction file that knows how far things have gotten, so an update can be completed or rolled back as necessary. They also usually provide methods to replicate while in use, which might be done by buffering requested activity until the replication is done, making it transparent to the users. But this is app-level support that many older programs don't have at all. Those programs can't be backed up without stopping input.
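The "transaction file" idea can be sketched as a rollback journal: save the before-images of every file a transaction will touch, make the changes, then delete the journal as the commit point. The Python below is a minimal sketch under that assumption; the names are hypothetical, and a real database does more (checksumming its journal, syncing directories, and so on). After restoring a copy that captured all the files at the same instant, running `recover()` rolls back any half-finished transaction.

```python
import json
import os
from typing import Dict, Optional

JOURNAL = "txn.journal"  # hypothetical journal file name


def _write_durably(path: str, text: str) -> None:
    """Write text and force it to disk before returning."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())


def run_transaction(data_dir: str, updates: Dict[str, str],
                    crash_after: Optional[int] = None) -> None:
    """Apply updates (file name -> new contents) all-or-nothing.

    crash_after simulates a crash after that many files were rewritten.
    """
    before = {}
    for name in updates:
        with open(os.path.join(data_dir, name)) as f:
            before[name] = f.read()
    journal = os.path.join(data_dir, JOURNAL)
    tmp = journal + ".tmp"
    _write_durably(tmp, json.dumps(before))
    os.replace(tmp, journal)  # the journal appears complete or not at all
    for done, (name, text) in enumerate(updates.items()):
        if crash_after is not None and done == crash_after:
            raise RuntimeError("simulated crash mid-transaction")
        _write_durably(os.path.join(data_dir, name), text)
    os.remove(journal)  # commit point: no journal means the update stands


def recover(data_dir: str) -> bool:
    """Roll back an unfinished transaction; return True if one was found."""
    journal = os.path.join(data_dir, JOURNAL)
    if not os.path.exists(journal):
        return False
    with open(journal) as f:
        before = json.load(f)
    for name, text in before.items():
        _write_durably(os.path.join(data_dir, name), text)
    os.remove(journal)
    return True
```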

--TonyLawrence



