Your Mission-Critical Applications Deserve Real Backup Validation!


Every organisation knows how important data protection is. Yet most organisations never test their backups. Why? Because it is a complex task. But if you don't test, how do you know you can really survive a disaster?

A Modern Approach to Data Protection

The first step is to think about data protection in a modern way. Even though most restores from backups are still individual files and/or folders, your organisation has to prepare for bigger disasters. What if half of your 500-VM production environment is hit by a ransomware attack? How do you recover from that kind of disaster? Restore times with so-called legacy backups might take days, even weeks. Can your business survive without access to those VMs for weeks? Probably not.

Data protection depends on timely backups, fast global search, and rapid recovery. Cohesity DataProtect reduces your recovery point objectives to minutes. Its unique SnapTree technology eliminates chain-based backups and delivers instantaneous application- or file-level recovery, with a full catalog of always-ready snapshots even at full capacity. This approach can dramatically reduce recovery times.

However, even modern data protection is not enough if you don't know that your backups are actually recoverable. Most modern technologies handle file-system-level data integrity, but there is still no way to really know that your backups are fully recoverable without testing them.

From Data Protection to Recovery Testing

Typically, organisations approach recovery testing by simply recovering single (or multiple) virtual machines. This ensures that you can recover individual VMs, but it doesn't ensure that you can recover something that truly works. Some backup vendors implement recovery testing, but it is mostly limited to VM-level restores or basic uptime checks.

The other way is to manually restore application setups and test them by hand. This is very costly because it requires a lot of manual work, and it also introduces several risks. However, it lets your organisation truly test application workflows with proper testing: does your three-layer web application actually respond, do your DB queries return answers, and so on. What if you could keep this method of running complex testing, but without any need for manual labour?

Automating Recovery Testing

Because modern hypervisor platforms are API-driven, it is fairly easy to automate things at the VM level. When you add an API-driven data protection platform like Cohesity, you can automate full recovery testing, including very complex test cases. How to automate complex recovery testing is a question I hear from most of my Service Provider customers, but also from bigger enterprise customers. Let's see…
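As a small taste of what API-driven means in practice, a VM-level operation with VMware PowerCLI is a one-liner once you are connected. This is just a sketch, assuming PowerCLI is installed and the vCenter name matches your environment:

# Connect to vCenter through the vSphere API and power on a VM by name
Connect-VIServer -Server vcenter-01.organisation.com -Credential (Get-Credential)
Start-VM -VM "Win2012" -Confirm:$false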

Cohesity Backup Validation Toolkit

To make things simpler, you can download the Cohesity Backup Validation toolkit from here; with minimal scripting knowledge, it is easy to automate the validation process.

After downloading it, it is time to create some configuration files. Let's start with the environment.json file. This file contains connection information for both the Cohesity and VMware vSphere environments. Create the file with this content:

{
        "cohesityCluster": cohesity-01.organisation.com",
        "cohesityCred": "./cohesity_cred.xml",
        "vmwareServer": "vcenter-01.organisation.com",
        "vmwareResourcePool": "Resources",
        "vmwareCred": "./vmware_cred.xml"
}
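For reference, a script might consume this file roughly as follows. This is a hedged sketch, assuming the Cohesity PowerShell module and VMware PowerCLI are installed (Connect-CohesityCluster and Connect-VIServer are their respective connection cmdlets); the toolkit's actual internals may differ:

# Read connection settings from environment.json
$environment = Get-Content -Path ./environment.json -Raw | ConvertFrom-Json

# Credential files are created with Export-Clixml (see below)
$cohesityCred = Import-Clixml -Path $environment.cohesityCred
$vmwareCred   = Import-Clixml -Path $environment.vmwareCred

# Connect to both platforms
Connect-CohesityCluster -Server $environment.cohesityCluster -Credential $cohesityCred
Connect-VIServer -Server $environment.vmwareServer -Credential $vmwareCred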

After this, we need to create the actual config.json file, containing information about each virtual machine we are about to test.

This file also defines the tests per VM, so it is easy to define multiple tests but run only selected ones on each VM. The script also lets you attach each VM to the desired test network and change its IP address for testing, so you don't need to test with overlapping production IPs or create siloed networking in VMware.

Note that the VMs don't need to be protected by the same protection job, which makes this approach more scalable, since you probably have separate jobs for web frontends and the actual backend databases.

{
    "virtualMachines": [
        {
            "name": "Win2012",
            "guestOS": "Windows",
            "backupJobName": "VM_Job",
            "guestCred": "./guestvm_cred.xml",
            "VmNamePrefix": "0210-",
            "testIp": "10.99.1.222",
            "testNetwork": "VM Network",
            "testSubnet": "24",
            "testGateway": "10.99.1.1",
            "tasks": ["Ping","getWindowsServicesStatus"]
        },
        {
            "name": "mysql",
            "guestOS": "Linux",
            "linuxNetDev": "eth0",
            "backupJobName": "VM_Job",
            "guestCred": "./guestvm_cred_linux.xml",
            "VmNamePrefix": "0310-",
            "testIp": "10.99.1.223",
            "testNetwork": "VM Network",
            "testSubnet": "24",
            "testGateway": "10.99.1.1",
            "tasks": ["Ping","MySQLStatus"]
        }
    ]
}
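To illustrate how this configuration might drive a testing loop, here is a hedged sketch; the task dispatch is simplified and assumed for illustration, not taken from the toolkit's actual internals:

# Load the per-VM test configuration
$config = Get-Content -Path ./config.json -Raw | ConvertFrom-Json

foreach ($vmConfig in $config.virtualMachines) {
    # One credential file per VM, or a shared one
    $guestCred = Import-Clixml -Path $vmConfig.guestCred

    # Clones are named with the configured prefix, e.g. "0210-Win2012"
    $cloneName = "$($vmConfig.VmNamePrefix)$($vmConfig.name)"

    foreach ($task in $vmConfig.tasks) {
        switch ($task) {
            "Ping"  { Test-Connection -ComputerName $vmConfig.testIp -Count 2 -Quiet }
            default { Write-Host "Task '$task' would run against $cloneName here" }
        }
    }
}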

The final step is to create the actual credential files. To avoid having usernames and passwords in the configuration files in plaintext, we can use a simple PowerShell script to create them. You can have one shared credential file for all VMs, or one per VM. Note that these users must have administrator-level access to the VMs, since the scripts change the IP configuration to the test network.

To create the credential files, you can use the included createCredentials.ps1 script, which creates just one guestvm_cred.xml file. If you want to create more, simply run this PowerShell command:

Get-Credential | Export-Clixml -Path guestvm_more.xml

Since this file is encrypted with the Windows Data Protection API, it can only be read by the same user (on the same machine) who created it, so make sure you create the credential files with the same user account you use for running the testing scripts.
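When the scripts need the credential, it is read back into a PSCredential object with the matching cmdlet:

# Re-import the encrypted credential; only works for the user who exported it
$guestCred = Import-Clixml -Path ./guestvm_cred.xml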

So How Does it Work?

Here is an example run that clones two virtual machines (one Linux and one Windows) and runs a different set of tests on each VM.

First, the script reads the configuration files and connects to the Cohesity cluster and the VMware vSphere vCenter environment. Then it starts the clone process for the VMs.


After the clone process is done, the script moves on to the actual validation phase, where it first checks that the clone task completed successfully and that the cloned VMware VMs are powered on with VMware Tools running.


Once the VMs are up and VMware Tools is running, the script runs a per-VM test to ensure that it can push scripts through VMware Tools. The next task is to move each VM to the correct VM network and then change the IP configuration of each VM.
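PowerCLI is a natural fit for this step, since Invoke-VMScript pushes commands through VMware Tools without requiring network access to the clone. A minimal sketch for the Windows VM from the example config (the toolkit's implementation may differ, and the interface alias is an assumption):

# Attach the clone's NIC to the test network
$vm = Get-VM -Name "0210-Win2012"
Get-NetworkAdapter -VM $vm | Set-NetworkAdapter -NetworkName "VM Network" -Confirm:$false

# Set the test IP inside the guest via VMware Tools
# ('Ethernet0' is an assumed interface alias; adjust for your template)
$ipScript = "New-NetIPAddress -InterfaceAlias 'Ethernet0' -IPAddress 10.99.1.222 -PrefixLength 24 -DefaultGateway 10.99.1.1"
Invoke-VMScript -VM $vm -GuestCredential $guestCred -ScriptType Powershell -ScriptText $ipScript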


After moving the VMs to the correct network, the script runs the configured tests against each VM.
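As an illustration, the tasks named in the example config could be implemented roughly like this; a sketch only ($vm and the credential variables are assumed to be set as in the earlier snippets, and the actual implementations ship with the toolkit):

# 'Ping' task: basic reachability check from the host running the scripts
Test-Connection -ComputerName 10.99.1.222 -Count 2 -Quiet

# 'getWindowsServicesStatus' task: flag automatic services that are not running
Invoke-VMScript -VM $vm -GuestCredential $guestCred -ScriptType Powershell `
    -ScriptText 'Get-Service | Where-Object { $_.StartType -eq "Automatic" -and $_.Status -ne "Running" }'

# 'MySQLStatus' task: check the database service inside the Linux guest
Invoke-VMScript -VM (Get-VM -Name "0310-mysql") -GuestCredential $linuxCred -ScriptType Bash -ScriptText "systemctl is-active mysql"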


After the tests have run, the clones are automatically cleaned up from both the Cohesity and VMware environments.


Notes

This automation toolkit is not officially provided or supported by Cohesity; any bugs can be reported to me directly. There are some limitations:

It works only with VMware environments running vCenter.

It can run tests only against Linux and Windows virtual machines, and Windows machines need to have PowerShell installed.

I hope this simple toolkit helps you automate your organisation's backup validations!
