Your Mission Critical Applications Deserve Real Backup Validation!

Photo by Pixabay on Pexels.com

Every organisation knows how important data protection is, yet most organisations still never test their backups. Why? Because it is a complex task. But if you do not test, how do you know that you can really survive a disaster?

A Modern Approach to Data Protection

The first step is, of course, to think about data protection in a modern way. Even though most restores are still individual files and folders, your organisation has to prepare for bigger disasters. What if half of your 500-VM production environment is hit by a ransomware attack? How do you really survive that kind of disaster? Restore times with so-called legacy backups might take days, even weeks. Can your business survive without access to those VMs for weeks? Probably not.

Data protection depends on timely backups, fast global search, and rapid recovery. Cohesity DataProtect reduces your recovery point objectives to minutes. Its unique SnapTree technology eliminates chain-based backups and delivers instantaneous application- or file-level recovery, with a full catalog of always-ready snapshots even at full capacity. This approach can dramatically reduce recovery times.

However, even modern data protection is not enough if you don't know that you actually have something you can recover. Most modern technologies handle data integrity at the file-system level, but there is still no way to know that your backups are fully recoverable without testing them.

From Data Protection to Recovery Testing

Typically, organisations approach recovery testing by simply recovering a single virtual machine (or a few of them). This confirms that individual VMs can be restored, but it doesn't ensure that you can recover something that actually works as a whole. Some backup vendors implement recovery testing, but it is mostly limited to restoring VMs or running basic uptime checks.

The other way to do this is to manually restore application setups and test them by hand. This is very costly because it requires a lot of manual work, and it also introduces several risks. On the other hand, it lets your organisation genuinely exercise application workflows: do you actually get a response from your three-tier web application, do your database queries return results, and so on. What if you could keep this method of running complex tests but remove the manual labour?

Automating Recovery Testing

Because modern hypervisor platforms are API driven, it is fairly easy to automate things at the VM level. When you add an API-driven data protection platform like Cohesity, you can automate full recovery testing, including very complex test cases. This is a question I hear from most of my service provider customers, and from larger enterprise customers as well: how do you automate complex recovery testing? Let's see.

Cohesity Backup Validation Toolkit

To make things simpler, you can download the Cohesity Backup Validation toolkit from here, and with minimal scripting knowledge it is easy to automate the validation process.

After downloading it, it is time to create some configuration files. Let's start with the environment.json file. It contains connection information for both the Cohesity and VMware vSphere environments. Create the file with the following content:

{
        "cohesityCluster": cohesity-01.organisation.com",
        "cohesityCred": "./cohesity_cred.xml",
        "vmwareServer": "vcenter-01.organisation.com",
        "vmwareResourcePool": "Resources",
        "vmwareCred": "./vmware_cred.xml"
}
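
To give an idea of how these values are consumed, here is a minimal sketch of the connection step, assuming the Cohesity PowerShell module and VMware PowerCLI are installed (the variable names are mine, not necessarily the toolkit's, and the credential files referenced here are created later in this post):

# Read the environment definition created above
$environment = Get-Content -Raw -Path ./environment.json | ConvertFrom-Json

# Credentials were exported earlier with Export-Clixml (see the credential section below)
$cohesityCred = Import-Clixml -Path $environment.cohesityCred
$vmwareCred   = Import-Clixml -Path $environment.vmwareCred

# Connect to the Cohesity cluster (Cohesity PowerShell module)
Connect-CohesityCluster -Server $environment.cohesityCluster -Credential $cohesityCred

# Connect to vCenter (VMware PowerCLI)
Connect-VIServer -Server $environment.vmwareServer -Credential $vmwareCred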

After this we need to create the actual config.json file, containing information about each virtual machine we are about to test.

This file also defines the tests per VM, so it is easy to define multiple tests but run only selected ones on each VM. The script also lets you attach each VM to the desired test network and change its IP address for the duration of the test, so you don't need to test with overlapping production IPs or build siloed networking in VMware.

Note that the VMs don't need to be protected by the same protection job, which makes this more scalable, since you probably have different jobs for web front ends and the actual back-end databases.

{
    "virtualMachines": [
        {
            "name": "Win2012",
            "guestOS": "Windows",
            "backupJobName": "VM_Job",
            "guestCred": "./guestvm_cred.xml",
            "VmNamePrefix": "0210-",
            "testIp": "10.99.1.222",
            "testNetwork": "VM Network",
            "testSubnet": "24",
            "testGateway": "10.99.1.1",
            "tasks": ["Ping","getWindowsServicesStatus"]
        },
        {
            "name": "mysql",
            "guestOS": "Linux",
            "linuxNetDev": "eth0",
            "backupJobName": "VM_Job",
            "guestCred": "./guestvm_cred_linux.xml",
            "VmNamePrefix": "0310-",
            "testIp": "10.99.1.223",
            "testNetwork": "VM Network",
            "testSubnet": "24",
            "testGateway": "10.99.1.1",
            "tasks": ["Ping","MySQLStatus"]
        }
    ]
}
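
With the configuration in place, the validation loop itself is conceptually simple. The following is only an illustrative sketch (not the toolkit's actual code) of how config.json drives which tests run against which cloned VM; the Invoke-* dispatch is a hypothetical naming convention:

$config = Get-Content -Raw -Path ./config.json | ConvertFrom-Json

foreach ($vmConfig in $config.virtualMachines) {
    # Cloned VMs are named prefix + original name, e.g. 0210-Win2012
    $cloneName = "$($vmConfig.VmNamePrefix)$($vmConfig.name)"
    Write-Host "Validating $cloneName (protection job: $($vmConfig.backupJobName))"

    # Each entry in 'tasks' maps to a test implementation, e.g. Ping or MySQLStatus
    foreach ($task in $vmConfig.tasks) {
        Write-Host " - running test: $task"
        # & "Invoke-$task" -VmConfig $vmConfig   # hypothetical dispatch to the matching test
    }
}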

The final step is to create the actual credential files. To avoid having usernames and passwords in the configuration files in plain text, we can use a simple PowerShell script to create them. You can have one shared credential file for all VMs, or one per VM. Note that these accounts must have administrator-level access to the VMs, since the script changes their IP configuration over to the test network.

To create credential files you can use the included createCredentials.ps1 script, which creates a single guestvm_cred.xml file. If you need more, just run this simple PowerShell command:

Get-Credential | Export-Clixml -Path guestvm_more.xml

Since this file is encrypted with the Windows Data Protection API, it can only be decrypted by the same user (on the same machine) who created it, so make sure you create the credential files with the same user account you will use to run the testing scripts.
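
The toolkit reads these files back with Import-Clixml, which returns a normal PSCredential object. A quick way to verify that a credential file is usable under the current account (file name taken from the example above):

# Throws a decryption error if run as a different user than the one who created the file
$guestCred = Import-Clixml -Path ./guestvm_cred.xml
$guestCred.UserName    # should print the stored account name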

So How Does it Work?

Here is an example run that clones two virtual machines (one Linux and one Windows) and runs a different set of tests on each VM.

First, the script reads the configuration files and connects to the Cohesity cluster and the VMware vSphere vCenter environment. Then it starts the clone process for the VMs.

After the clone process is done, the script moves to the actual validation phase, where it first checks that the clone task is in a success state and that the cloned VMware VMs are powered on with VMware Tools running.

When the VMs are up and VMware Tools are running, a quick test is run per VM to ensure that scripts can be pushed through VMware Tools. The next task is to move each VM to the correct VM network and then change its IP configuration.
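
As a rough sketch of what that re-networking step can look like with PowerCLI (the toolkit's own implementation may differ; the VM name, credential variable, and the Windows adapter name "Ethernet" are assumptions based on the example config):

# Cloned VM from the example config (prefix 0210- plus original name)
$vm = Get-VM -Name "0210-Win2012"

# Attach the clone's NIC to the isolated test port group
Get-NetworkAdapter -VM $vm |
    Set-NetworkAdapter -NetworkName "VM Network" -Connected:$true -Confirm:$false

# Push the test IP configuration into the guest through VMware Tools
$ipCmd = 'netsh interface ip set address name="Ethernet" static 10.99.1.222 255.255.255.0 10.99.1.1'
Invoke-VMScript -VM $vm -GuestCredential $guestCred -ScriptType Bat -ScriptText $ipCmd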

After moving the VMs to the correct network, the configured tests are run for each VM.
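
To give an idea of what the tasks listed in config.json can look like, here are simplified stand-ins for the three tests used in this run (illustrative only; $vm, $linuxVm and the credential variables are assumed from the earlier steps, and the MySQL service name may differ per distribution):

# Ping: basic reachability of the clone's test IP
Test-Connection -ComputerName 10.99.1.222 -Count 2 -Quiet

# getWindowsServicesStatus: report service states inside the Windows guest
Invoke-VMScript -VM $vm -GuestCredential $guestCred -ScriptType PowerShell `
    -ScriptText 'Get-Service | Select-Object -Property Name, Status'

# MySQLStatus: check that the MySQL service is active inside the Linux guest
Invoke-VMScript -VM $linuxVm -GuestCredential $linuxGuestCred -ScriptType Bash `
    -ScriptText 'systemctl is-active mysql'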

After the tests have run, the clones are automatically cleaned up from both the Cohesity and VMware environments.
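
If you ever need to do the same teardown from your own script, it could look roughly like the following. I'm assuming the clone was created via the Cohesity PowerShell module and that your module version includes Remove-CohesityClone, so treat this as a sketch and check the module documentation:

# Tearing down the Cohesity clone task also removes the cloned VMs and their backing datastore
# $cloneTaskId is assumed to be the task id returned when the clone was created
Remove-CohesityClone -TaskId $cloneTaskId -Confirm:$false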

Notes

This automation toolkit is not officially provided or supported by Cohesity; any bugs can be reported to me directly. There are some limitations:

You can use it only for VMware environments running vCenter

You can run tests only against Linux and Windows virtual machines, and the Windows machines need to have PowerShell installed.

I hope this simple toolkit helps you automate your organisation's backup validations!

Building a Modern Data Platform by Exploiting the True Possibilities of Public Cloud

Photo by Pixabay on Pexels.com

Building a modern, next-generation datacenter requires a specific approach and an understanding of automation. When we design a modern datacenter, we have to understand that data is the central element of today's business, and that automation not only saves time but also dramatically reduces the human error factor. The on-premises datacenter, even a next-generation one, is still only one element of the data platform. No modern data platform would be complete without the option to use the public cloud; in fact, the public cloud plays a significant role in building a modern data platform, providing capabilities we simply couldn't get any other way.

In this post we look at the benefits of the public cloud while making sure we overcome the challenges we might face in cloud adoption, embracing the cloud as a key functional element of our platform.

Why does our data need the public cloud?

Modern storage systems are very good, and they have evolved a lot over the past couple of years. Still, a modern data-centric approach and fast-changing business landscapes require flexibility, scalability, data movement, and a different commercial model, and it quickly becomes clear that the cloud can potentially answer all of these challenges.

While these business challenges are common to pretty much all traditional systems, they are exactly where the public cloud is strongest. The cloud can, in theory, scale infinitely and provide a consumption model where organisations move CAPEX investments to OPEX by paying only for what they need, while retaining the flexibility to scale up or down based on current business requirements. But the cloud can do much more. We can easily take a copy of our data and do pretty interesting things with it once it has been copied or moved to the public cloud. Typically, organisations start with the low-hanging fruit: backups. They are easy to move from on-premises to the cloud, since pretty much every modern backup software supports extension to the cloud (if yours doesn't, it may be a very good time to look for something better). Once we back up our data to the public cloud, we can get more out of it: we can use this cold data for business analytics or artificial intelligence, and it can also serve as a disaster recovery copy. With proper design, this can be far cheaper than building a disaster recovery site. In the end, flexibility is the most compelling reason for any organisation to consider leveraging the public cloud.

But if these benefits are so clear, why do so many organisations fail to realise them by not moving to the cloud?

Why do organisations resist moving to the cloud?

It's not about what the public cloud can do; it is more about what it doesn't do that tends to stop organisations from wholeheartedly embracing the cloud when it comes to their most valuable asset: data.

As we've worked through the different areas of building a modern data platform, our approach to data has been about much more than just storage. It is about insight into data, protection, security, availability, and privacy, and these are things not normally associated with native cloud storage. Traditionally, native cloud storage is not built to handle these kinds of requirements; it is built to be easily scalable and cheap. And because organisations have grown so used to these requirements, they don't want to move their data to the cloud if it means losing those capabilities, or having to implement and learn a new set of tools to deliver them.

Of course, there is also the "data gravity" problem: we can't have our cloud-based data siloed away from the rest of our platform; it has to be part of it. We need to be able to move data into the cloud, but also ensure that we can move it back on-premises again, and even between cloud providers, while still retaining the key elements enterprise organisations require: control and management.

So is there really a way to overcome these challenges and make the cloud a fundamental part of a modern data platform? Yes, there is.

Making the cloud part of the enterprise data platform

There are dozens and dozens of companies trying to solve this issue. Most of them start from the top without really looking at the real problem: data mobility. If you look at the storage category of the AWS Marketplace, you will see almost 300 different options, so the question is how anyone can know which one really gives an organisation the full potential of true hybrid cloud. The answer is that, without deep knowledge, one really can't. I won't point at any single vendor, but quite a few claim they can give you data mobility and let you leverage your data to its full potential, while only a few of them really can.

There are two things that make this very hard.

The first is data movement between on-premises and the cloud. It's pretty easy to copy data from point A to point B, but how do you make it cost-efficient and fast? Moving huge amounts of data takes time even over very fast internet connections, so having built-in capabilities to move only the needed blocks can make a significant difference, not only in migration times but also in cost: since pretty much every cloud vendor charges for egress traffic, moving data back on-premises or to another cloud vendor can otherwise be very expensive.

The second is the ability to use the migrated data for several purposes. Using the cloud as a backup target is quite inefficient if you cannot use the same data as a source for DR, analytics, AI, or test and development. Cloud storage doesn't cost that much, but if you can use it efficiently in more than one use case, you reduce the total cost considerably.

Both of these are the foundation of enterprise capabilities. And while adding enterprise capabilities is great, the idea of a modern data platform relies on having our data in the location we need it, when we need it, while maintaining management and control. This is where efficient technology provides a real advantage. You can achieve this in many ways, one example being NetApp's ONTAP storage system as a consistent endpoint, allowing organisations to use the same tools, policies, and procedures at the core of the data platform and extend them to the organisation's data in the public cloud. This is possible when the vendor has a modern software-defined approach.

NetApp's integrated SnapMirror provides the data movement capability, so one can simply move data into and out of the cloud, and between clouds. Replicating data in this way means that while the on-premises version can be the authoritative copy, it doesn't have to be the only one. Replicating a copy of the data to a location for a one-off task, and destroying it once the task completes, is a powerful capability and an important element of simplifying the extension of an organisation's data platform into the cloud.

So does technology matter?

The short answer is no. You don't need technology from vendor X to deliver a true hybrid cloud service. You do not need to use NetApp; I have used it as an example because it has nice cloud integration features built in, which let it deliver a modern data platform by providing consistent data services across multiple locations (on-premises and cloud) while maintaining all the critical enterprise controls. Of course, this means that you need to have NetApp both on-premises and in the cloud.

When you evaluate vendor Y for your next-generation datacenter, it is critical to think about how you can build your enterprise data platform so that you have the option to expand your business to the cloud. While there are other data service providers with somewhat similar offerings, I think NetApp's story and capabilities are in line with the requirements of a modern data platform. There are other solutions that can achieve something similar, and even go a bit further, and I will cover one of them in my next post.

In the end, the most important thing in your design strategy, if it is to include the public cloud, is to ensure that you have appropriate access to data services, integration, control, and data management. It is crucial that you don't put your organisation's most valuable asset, data, at risk or diminish the capabilities of your data platform by using the cloud. The cloud will play a huge role in future data platforms, so make sure you have an easy option to move workloads to the cloud, and back.