Zerto Bitmap Sync

December 8, 2015 | High Availability, Blog Articles

Here’s a situation that might sound familiar to you. You have Zerto installed, and things are running smoothly – at first. Then one morning you look at your Zerto console and realize that a few of your VPGs are in a “Bitmap Sync” state and seem to be syncing gigabytes of data. You might be thinking that Zerto somehow isn’t running properly, or wondering how a user changed so much data. However, neither of these explanations is actually the case. Allow us to explain.

The Bitmap Sync and Your Backed Up Data

First and foremost, Zerto is not broken, and you probably didn’t actually generate as much change as Zerto is uploading. To understand what’s going on, let’s start with the bitmap sync itself. A bitmap sync is not a volume sync or a delta sync: it does not transmit the entire volume of a virtual machine (VM) like a Volume Sync would, nor does it perform a target-side scan like a Delta Sync would. A bitmap sync is a normal operation that Zerto uses when a Virtual Protection Group (VPG) becomes bandwidth constrained. In other words, a bitmap sync is the tool Zerto falls back on when the Virtual Replication Appliance (VRA) cannot send its data to the replication site quickly enough.

Zerto captures changes to VMs at the input/output (I/O) layer in real time and stores them briefly on VRAs until they are transported to your recovery site. Each VRA is programmed with a maximum memory allocation of 3GB; this 3GB buffer is used as a sort of queue for that data. When the source VRAs cannot communicate with the target VRAs quickly enough, Zerto instead does a bitmap caching of the changes made to your data, which it will then push when the connection becomes available again. Now you might be wondering: “why can’t my source communicate with the target quickly enough?”
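To make the idea of that buffer concrete, here is a minimal sketch of a bounded staging queue. This is purely illustrative and not Zerto code; the tiny capacity stands in for the roughly 3GB allocation described above, and all function names are invented for the example.

```python
# Minimal sketch, not Zerto's implementation: the VRA stages captured writes
# in a bounded in-memory buffer (roughly 3GB per the article) and drains it by
# sending data to the target VRA. If writes arrive faster than they can be
# sent, the buffer fills and the VPG falls back to bitmap tracking.

from collections import deque

BUFFER_CAPACITY = 4  # tiny stand-in for the ~3GB memory allocation

staging = deque()

def capture_write(block: int) -> bool:
    """Stage a captured write; return False if the buffer is already full."""
    if len(staging) >= BUFFER_CAPACITY:
        return False  # out of room: time to switch to bitmap tracking
    staging.append(block)
    return True

def transmit_one() -> None:
    """Send the oldest staged write to the target VRA (simulated)."""
    if staging:
        block = staging.popleft()
        print(f"sent block {block} to target VRA")
```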

What’s Causing The Backed Up Backup?

There are a few possible reasons why your source can’t efficiently communicate with your target. Because Zerto is using real-time I/O-level captures, the rate at which changes can be offloaded to the target depends greatly on two factors (a rough sketch of how they interact appears after the list):

  1. Upstream bandwidth availability
  2. The quantity of data that is changing
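Whether a VPG slips into a bitmap sync is essentially a race between those two factors. Below is a rough, hypothetical back-of-the-envelope sketch; the 3GB buffer figure comes from this article, while the change-rate and bandwidth numbers are invented purely for illustration.

```python
# Rough back-of-the-envelope sketch, not Zerto code: if the sustained change
# rate outpaces the upstream bandwidth, the VRA's in-memory buffer (roughly
# 3GB, per this article) eventually fills and the VPG is likely to drop into
# a bitmap sync.

BUFFER_MB = 3 * 1024  # approximate VRA memory allocation described above

def seconds_until_buffer_full(change_rate_mb_s: float, upstream_mb_s: float) -> float:
    """How long the buffer lasts, or infinity if replication keeps up."""
    backlog_mb_s = change_rate_mb_s - upstream_mb_s  # data piling up per second
    if backlog_mb_s <= 0:
        return float("inf")  # the link keeps up; no bitmap sync expected
    return BUFFER_MB / backlog_mb_s

# Hypothetical workload: 80 MB/s of change over a link that can move 50 MB/s.
print(seconds_until_buffer_full(change_rate_mb_s=80, upstream_mb_s=50))  # ~102 s
```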

For example, imagine you have 4 VPGs and one is bitmap syncing, as shown in the image below.

[Image: bitmap_sync, showing four VPGs with one in a Bitmap Sync state]

When a VM generates an I/O action, Zerto captures that change and then offloads it to the appropriate VRA. Imagine that one of the VMs in the affected VPG changes storage blocks 5, 8, 9, 15, 90, 102, 109, 301, and 505. Under normal operating conditions, where plenty of bandwidth is available and change rates are low, those individual data blocks would be written to the memory of the VRA, flagged for transport, and then written to the target VRA. However, when the change rate increases dramatically or the bandwidth decreases (or both), what happens when the VRA runs out of memory to track changes? The granularity of tracking decreases: as the VRA’s memory reaches its maximum, it flags broader regions of the disk rather than every individual block. Essentially, Zerto does not create massive journals on the source system, preventing your data from creating a traffic jam.
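The following sketch illustrates that coarsening step. It is not Zerto’s actual implementation: the block limit is deliberately tiny, and the naive gap-based grouping is an assumption chosen only to show the idea, so its output will not exactly match the example ranges discussed below.

```python
# Illustrative sketch only (not Zerto's code): remember each changed block
# exactly while memory allows; once the limit is hit, collapse the exact
# block list into coarse ranges, the way the article describes the VRA
# coarsening its tracking when its buffer fills.

MAX_TRACKED_BLOCKS = 8   # tiny stand-in for the VRA's memory cap
RANGE_GAP = 100          # blocks farther apart than this start a new range

def coalesce(blocks):
    """Collapse a set of exact block numbers into coarse (start, end) ranges."""
    ordered = sorted(blocks)
    ranges = []
    start = prev = ordered[0]
    for b in ordered[1:]:
        if b - prev > RANGE_GAP:
            ranges.append((start, prev))
            start = b
        prev = b
    ranges.append((start, prev))
    return ranges

# The writes from the article's example.
writes = [5, 8, 9, 15, 90, 102, 109, 301, 505]

tracked = set()
for block in writes:
    tracked.add(block)
    if len(tracked) > MAX_TRACKED_BLOCKS:
        print("buffer full, coarsening to ranges:", coalesce(tracked))
        break
```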

If your system generated 50GB of change, Zerto does not require space to store 50GB of changed data before transport. Instead, it uses the 3GB memory buffer as a short-term way to track changes while it attempts to get back to real-time transmission. If the VRA becomes unable to store every individual block in its memory, it starts tracking groups of blocks instead. So rather than the individual blocks mentioned above, the groups might look something like 5-102 and 301-505. Now imagine that not all of the blocks from 5-102 actually changed, but only 6 of them did. Since Zerto is attempting to deliver HA in real time, it will replicate all blocks that may have changed, ensuring that the most recent version of your data is accounted for. This is why a bitmap sync can appear to move far more data than your users actually changed.
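A small, purely illustrative calculation shows the amplification. The ranges and the count of actually changed blocks come from the example above; the block size is a made-up assumption for arithmetic only, not a documented Zerto value.

```python
# Purely illustrative arithmetic (the block size is an assumption, not a
# documented Zerto value): once tracking has coarsened to ranges, every block
# inside each range is resent, even though only a handful actually changed.

BLOCK_SIZE_KB = 256  # hypothetical block size, for illustration only

coarse_ranges = [(5, 102), (301, 505)]   # ranges from the example above
actually_changed = 9                      # blocks the VMs really touched

blocks_resent = sum(end - start + 1 for start, end in coarse_ranges)
print(f"blocks resent: {blocks_resent}")                              # 303
print(f"data resent:   {blocks_resent * BLOCK_SIZE_KB / 1024:.1f} MB")
print(f"amplification: {blocks_resent / actually_changed:.0f}x")
```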

Unexpected Efficiency

Although it may look like you’re syncing gigabytes of data, this operation is actually much more efficient than alternative methods of handling queued data. Zerto does not require massive amounts of storage to operate, nor does it impose the overhead of large journals or the CPU-intensive task of tracking all blocks at all times.

Instead, Zerto aims to be as efficient as possible and only handle the data that is changing, as it changes. Unfortunately, that real-time replication is not always possible as change rates increase or available upstream bandwidth decreases. By using the bitmap sync, Zerto is able to avoid costly overhead on your infrastructure while maintaining relatively low RPOs, even when bandwidth is constrained. Just when you were worried that something was wrong, it turns out Zerto was doing exactly what it was designed to do.


