Frequently asked questions (FAQ)

What should be installed on the customer side to start replication of an infrastructure?

On the customer side, install and run a replication agent on each ESXi host in your VMware vSphere environment, or an internal agent inside each machine running Windows or Linux. Each VMware agent replicates the machines on its own host, which distributes the data flow and makes the replication process more reliable. You download a VMware agent through the new host replication wizard, where the agent is bound to a group. Machines replicated by that agent are added to this group automatically, and the VMware vSphere access settings are shared as well. For detailed information, refer to the section ACP - Protecting machines.

How often should you update your Disaster Recovery plans?

The more often you update your Disaster Recovery plans, the better prepared you are when an accident happens, and the less time it takes to update them after the accident has already occurred. The recommended frequency is every 2 to 3 weeks. For detailed information, refer to the section ACP - Disaster Recovery plans.

How should I group machines, and why should I create groups?

Groups combine machines logically. Machines can be grouped by different principles: a common purpose, geographical location, or shared snapshot retention rules and replication schedule. A particularly important and convenient feature of groups is the ability to manage the common replication schedule and snapshot retention parameters of all machines in a group at once, which significantly simplifies the already flexible configuration of business application replication.

How much data could be lost in case of disaster?

Recovery point objective (RPO) measures how much data can be lost in case of disaster, and it depends directly on a machine's replication schedule (its frequency). If an accident occurs, all data accumulated since the last successful replica is lost. So for critical applications, set a minimal interval between replications or select Continuous replication, where a new snapshot is taken immediately after the previous one completes. Keep in mind, however, that such frequency can lead to high network load and affect machine performance because consistent snapshots are created constantly.
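The link between the replication schedule and the amount of lost data can be illustrated with a short calculation. The sketch below is a hypothetical helper, not part of the product: it takes a history of successful replicas, each recorded as a (snapshot time, completion time) pair, and returns how much data, expressed as a time span, would be lost if a disaster happened at a given moment. It assumes a replica is usable only after it has completed, and that loss is measured from the moment its snapshot was taken.

```python
from datetime import datetime, timedelta


def data_loss_window(replicas, disaster_time):
    """Return the time span of data lost if a disaster happens at disaster_time.

    replicas: hypothetical list of (snapshot_time, completed_time) pairs for
    successful replicas. Only replicas that completed before the disaster can
    be used for recovery; everything written after their snapshot is lost.
    """
    usable = [snap for snap, done in replicas if done <= disaster_time]
    if not usable:
        raise ValueError("no successful replica completed before the disaster")
    return disaster_time - max(usable)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    # Hourly schedule; each replica takes 10 minutes to complete.
    history = [
        (start, start + timedelta(minutes=10)),
        (start + timedelta(hours=1), start + timedelta(hours=1, minutes=10)),
    ]
    # Disaster while the second replica is still in progress.
    print(data_loss_window(history, start + timedelta(hours=1, minutes=5)))  # 1:05:00
```

With an interval I between replica starts and a replication time D per replica, the worst case is a disaster just before a replica completes, so up to roughly I + D of data can be lost. Shortening the interval shrinks this window at the cost of more network load.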

How do I verify that a replicated business application will work in case of disaster?

To verify that replication and recovery of a business application work correctly in case of disaster, periodically test restoring the infrastructure. At regular intervals, run cloud sites from the latest recovery points using up-to-date Disaster Recovery plans, perform a set of tests on the restored application (similar to the checks you run in production), and make sure that the tests pass and the application works as expected in the cloud site. If you encounter problems, contact the support team for a prompt resolution.
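Part of such a test can be automated. The sketch below is a hypothetical smoke test, not a product feature: it checks that each service of the restored application accepts a TCP connection in the cloud site. The endpoint list is an assumption you would fill with the addresses and ports of your own restored machines; a successful connection only proves the service is listening, so application-level tests are still needed.

```python
import socket


def unreachable_endpoints(endpoints, timeout=3.0):
    """Return the (host, port) pairs that did not accept a TCP connection.

    endpoints: hypothetical list of (host, port) pairs for services of the
    application restored in the cloud site, e.g. a web front end and a
    database listener.
    """
    failed = []
    for host, port in endpoints:
        try:
            # create_connection resolves the host and opens a TCP connection;
            # refusals and timeouts are raised as OSError subclasses.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append((host, port))
    return failed
```

For example, `unreachable_endpoints([("10.0.0.10", 443), ("10.0.0.11", 5432)])` (hypothetical addresses) returns an empty list when both services respond; any pair it returns points to a machine or service to investigate before relying on the cloud site.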

Some machines changed their status to Error. What is the reason, and how do I fix it?

Make sure that the replication agents are running, that there are no errors in their console, and that the Internet connection between the production and DR sites works.

How do I set a replication schedule for critical parts of a business application, and how do I set snapshot retention rules?

Replication schedules and snapshot retention policies can be configured for all machines at once, for individual groups, or for individual machines. For a detailed overview of the functionality, refer to the following ACP sections: Edit replication settings and schedule; Edit retention policies.
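Because settings can be set at three levels, it helps to think about which value a given machine ends up with. The FAQ does not state the product's precedence rules, so the sketch below simply assumes that the most specific level wins (machine over group over all machines). The class and parameter names are hypothetical and only illustrate the layering.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ReplicationSettings:
    # Hypothetical parameters; None means "not set at this level".
    interval_minutes: Optional[int] = None
    retention_days: Optional[int] = None


def effective_settings(*layers):
    """Merge settings layers ordered from most to least specific.

    For each parameter, the first layer that sets it (is not None) wins,
    e.g. effective_settings(machine, group, all_machines).
    """
    merged = {}
    for f in fields(ReplicationSettings):
        merged[f.name] = next(
            (getattr(layer, f.name) for layer in layers
             if layer is not None and getattr(layer, f.name) is not None),
            None,
        )
    return ReplicationSettings(**merged)


if __name__ == "__main__":
    all_machines = ReplicationSettings(interval_minutes=60, retention_days=7)
    critical_group = ReplicationSettings(interval_minutes=15)
    database_vm = ReplicationSettings(retention_days=30)
    print(effective_settings(database_vm, critical_group, all_machines))
    # ReplicationSettings(interval_minutes=15, retention_days=30)
```

Under this assumption, a critical group can tighten the schedule for all its members while a single machine still overrides only the parameter it needs, such as a longer retention period.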

How do I properly plan a failback to production, and what downtime should I expect during the procedure?

Failback to production can be initiated once the cause of the accident on the production platform has been eliminated and the main site is ready to receive the business application back. The procedure is as follows: download an agent to fail back the application, complete downloading data from the DR site, and test production; then stop the machines in the cloud site and download all remaining changes from it, start production, and carry out the final testing; finally, redirect traffic to production. The expected downtime depends on the size of the business application and the amount of changes accumulated in the cloud site since the last synchronization between sites. On average, it ranges from 1 hour to 1-2 days. For a detailed description of the process, refer to the section ACP - Failback to production.
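Downtime covers the stage from stopping the cloud machines to redirecting traffic, so it is dominated by transferring the accumulated changes plus the final testing. The sketch below is a rough, hypothetical estimate of that stage: the link efficiency factor and the verification time are assumptions you would replace with your own measurements, and effects such as compression are ignored.

```python
def estimate_failback_downtime_hours(changed_gb, bandwidth_mbps,
                                     link_efficiency=0.8, verification_hours=1.0):
    """Rough downtime estimate for the final synchronization stage of failback.

    changed_gb: changes accumulated in the cloud site since the last sync
        (decimal gigabytes, 10**9 bytes).
    bandwidth_mbps: link speed between the DR site and production (megabits/s).
    link_efficiency: hypothetical fraction of the bandwidth actually achieved.
    verification_hours: hypothetical time to start production and run final tests.
    """
    if bandwidth_mbps <= 0 or not 0 < link_efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits_to_transfer = changed_gb * 8 * 10**9
    effective_bits_per_second = bandwidth_mbps * 10**6 * link_efficiency
    transfer_hours = bits_to_transfer / effective_bits_per_second / 3600
    return transfer_hours + verification_hours


if __name__ == "__main__":
    # 100 GB of changes over a 100 Mbit/s link at 80% efficiency,
    # plus one hour of final testing.
    print(round(estimate_failback_downtime_hours(100, 100), 2))  # 3.78
```

This also shows why synchronizing shortly before the switchover helps: the less data changes after the last synchronization, the shorter the transfer part of the downtime.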

A user has removed some resources (a cloud site, a Disaster Recovery plan, etc.). How do I get an audit log of recent user actions?

The client audit log is available to cloud provider administrators, or to customer administrators in case of an on-premises installation. Contact them to access and download the required data.

How do I create a system user with limited rights for viewing and editing resources?

The system supports flexible role management, including fine-grained control over the actions allowed within each role. Contact the solution administrator to create a role with the necessary access parameters and assign it to users. For a detailed overview of the functionality, refer to the section ACP - Roles management.

One of the VMs goes into the parked state with the reason 'The machine will be parked: Create snapshots failed because of expected provider error. Check the event log for details'. What is the proper plan to fix the issue?

Typically, it is sufficient to find and eliminate the cause recorded in the Windows logs. First, check the VSS service log in the Event Viewer and the Volsnap service events in the System log. You can also do the following:

  • Restart the VSS service,
  • Reboot the server,
  • Check disks for errors,
  • Check the VSS settings (make sure enough space is allocated for shadow copy storage).
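The last check can be partly scripted. On the affected Windows machine, `vssadmin list shadowstorage` (run from an elevated command prompt) lists the shadow copy storage association of each volume with its used, allocated, and maximum space. The sketch below parses that text and flags volumes whose storage looks too small. It assumes the English output layout (other locales or Windows versions may word the lines differently), and both thresholds are hypothetical values to adjust for your environment.

```python
import re

# Patterns for the English output of `vssadmin list shadowstorage`.
VOLUME_RE = re.compile(r"For volume:\s*\(([^)]*)\)")
USED_RE = re.compile(r"Used Shadow Copy Storage space:.*\((\d+)%\)")
MAX_RE = re.compile(r"Maximum Shadow Copy Storage space:.*\((\d+)%\)")


def volumes_low_on_shadow_storage(output, min_max_percent=10, usage_ratio=0.9):
    """Return volumes whose shadow copy storage looks too small.

    A volume is flagged when its maximum shadow storage is below
    min_max_percent of the volume size, or when used space has reached
    usage_ratio of that maximum. Both thresholds are hypothetical.
    """
    flagged = []
    volume, used_percent = None, None
    for line in output.splitlines():
        if m := VOLUME_RE.search(line):
            # A new association block starts; reset the per-volume state.
            volume, used_percent = m.group(1), None
        elif m := USED_RE.search(line):
            used_percent = int(m.group(1))
        elif (m := MAX_RE.search(line)) and volume is not None:
            max_percent = int(m.group(1))
            too_small = max_percent < min_max_percent
            nearly_full = used_percent is not None and used_percent >= usage_ratio * max_percent
            if too_small or nearly_full:
                flagged.append(volume)
    return flagged
```

On Windows, the text can be captured with `subprocess.run(["vssadmin", "list", "shadowstorage"], capture_output=True, text=True).stdout`. For a flagged volume, the storage limit can then be raised, for example with `vssadmin resize shadowstorage`, before restarting the VSS service and retrying replication.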