Software-Defined Storage is something every storage vendor on the planet talks about today. If we stop and think about it for a bit, however, we realise that software-defined is just yet another buzzword. This is a 101 on SDS, so we are not going to go deep into every area; I will write a more in-depth article per characteristic later. So the question is: what makes a software-defined storage software-defined?
A bit of background research
To really understand what software-defined actually means, we must first do a bit of background research. If we take any modern storage – and by modern I mean something released in the past 10 years – we see a product with some kind of hardware component, a number of hard disk drives / SSDs, and yes, a software component. So by that definition every storage platform released in recent years is actually software-defined storage, since there is software handling all the nice things in it.
Well, what the hell is software-defined storage then?
Software-defined doesn't actually mean a product where software defines the features, but rather:
Software-defined storage (SDS) is an evolving concept for computer data storage software to manage policy-based provisioning and management of data storage independent of the underlying hardware. Software-defined storage definitions typically include a form of storage virtualization to separate the storage hardware from the software that manages the storage infrastructure. The software enabling a software-defined storage environment may also provide policy management for feature options such as deduplication, replication, thin provisioning, snapshots and backup. SDS definitions are sometimes compared with those of Software-based Storage.
Above is a quote from Wikipedia defining what SDS actually means. So let's go deeper and look at the characteristics you would expect from an SDS product. Note that there are products marketed as SDS which don't have all of these. It's not necessary to have them all, but the more you have the better – or is it?
Automation with policy-driven storage provisioning and service-level agreements
Automation of storage systems is not that new a feature, but it's one that is really useful – and it's required if you plan to build any kind of modern cloud architecture on top of your storage systems. Traditional storage-admin tasks like creating and mapping a LUN cannot be done by humans in a modern cloud environment; they must be done by automation. What this means in practice is that in a VMware environment a VM admin is able to create new storage based on the service level needed, directly from his/her VMware tools. Most modern storage products have at least some kind of plugin for VMware vCenter – but not all have the same features. VMware released vVols with vSphere 6, and I think it will be the dominant model for storage deployments in the future. It lets VMware admins create storage with the needed features (deduplication, compression, protection, replication, etc.) per VM. This isn't that new either, since OpenStack has used this kind of model for quite a long time, and there are even storage vendors who built their business on it (Tintri).
Policy-based provisioning and automation are, in my opinion, the most important features of software-defined storage, and you should look at them carefully if achieving a cloud-like environment is your short- or long-term plan.
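To make the idea concrete, here is a minimal sketch of what policy-driven provisioning looks like underneath: an admin or orchestrator asks for a service level, and the storage software resolves it into concrete features and creates the volume. All names, service levels and the function itself are made up for illustration – real products expose this via vCenter plugins, vVols policies or OpenStack Cinder volume types.

```python
# Hypothetical service-level catalogue: each policy maps a name the admin
# understands to the storage features the software must deliver.
POLICIES = {
    "gold":   {"replication": "sync",  "dedup": True,  "snapshots": True},
    "silver": {"replication": "async", "dedup": True,  "snapshots": True},
    "bronze": {"replication": None,    "dedup": False, "snapshots": False},
}

def provision_volume(name: str, size_gb: int, service_level: str) -> dict:
    """Resolve a service-level request into a concrete volume description."""
    policy = POLICIES.get(service_level)
    if policy is None:
        raise ValueError(f"unknown service level: {service_level}")
    # A real product would now pick a backend pool, create the LUN/share
    # and map it; here we just return the resolved request.
    return {"name": name, "size_gb": size_gb, **policy}

vol = provision_volume("vm42-data", 100, "gold")
print(vol["replication"])  # sync
```

The point is that the admin never names a pool, a RAID group or a datastore – only a policy. Everything below that line is the storage software's problem.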
Abstraction of logical storage services and capabilities from the underlying physical storage systems
Logical storage services and capabilities, and their abstraction, are really not a new thing. Most modern storage architectures have used this kind of abstraction for several years. What this means is, for example, virtualisation of the underlying storage (one or more RAID groups in a generic storage pool) to reclaim unused capacity and give flexibility. I can't even remember a modern storage product that doesn't virtualise the underlying storage; this has been a really basic feature for years and doesn't have that much to do with modern software-defined thinking.
What makes this a modern software-defined storage feature is things like VMware vVols. They give you more granular control, per VM, over what kind of storage it needs and which features. Not all VMs, for example, need to be replicated synchronously to a second location, and having a separate datastore for replicated VMs is just not enough for most companies – and it's really not enough in modern cloud-like architectures. You must have per-VM control, and all the magic must happen without the VM admin knowing which datastore is replicated and which is not. This is quite a new feature in VMware but has been available to OpenStack users for a while, mainly because OpenStack was built for modern cloud environments whereas VMware was mainly built for datacenter virtualisation and implemented cloud-like features later on.
Commodity hardware with storage logic abstracted into a software layer
Traditionally most storage vendors had their own hardware – some of them even designed their own ASICs, but all had at least some form of engineered hardware combined with commodity hard disk drives / SSDs running custom firmware. Legacy vendors, as they are nowadays called, did indeed invest lots of money in developing engineered hardware to meet their special needs, and this typically meant longer release cycles and a different go-to-market strategy, since introducing a new feature might mean developing a new ASIC and new hardware to support it.
Some startup vendors claim that using engineered hardware is expensive and means customers pay too much. This might be true, but there is a long list of advantages to using engineered hardware instead of commodity hardware – and frankly you, as a customer, shouldn't care that much about this area. Commodity hardware can be as good as engineered hardware, and engineered hardware can be as cheap as commodity hardware.
If the vendor you are looking at uses commodity hardware, they must do many things in their software layer that could otherwise be done in the hardware layer with microcode. Whichever route they go, though, the storage logic was abstracted into a software layer years ago. All modern storage products use software for most of the logic, but some use ASICs for certain clever things – HP, for example, uses an ASIC for deduplication while most competitors do it in software.
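Deduplication is a good example of logic that can live in either layer, because at its core it is simple: fingerprint each block and store identical blocks only once. Here is a hedged, toy sketch of fixed-block dedup in software – real products use far more sophisticated chunking and metadata, and an ASIC would compute the fingerprints in hardware instead.

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks, store each unique block once
    (keyed by its SHA-256 fingerprint) and keep a recipe to rebuild it."""
    store, recipe = {}, []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)            # order of blocks in the original
    return store, recipe

data = b"A" * 8192 + b"B" * 4096  # two identical 4 KiB blocks + one unique
store, recipe = dedup_blocks(data)
print(len(recipe), len(store))    # 3 2 -> three blocks written, two stored
```

The original data comes back by concatenating `store[d]` for each digest in `recipe` – which is exactly why dedup metadata must be protected as carefully as the data itself.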
Scale-Out storage architecture
Scale-out storage architecture is not that new either, since the first commercial scale-out storage products came to market over 10 years ago. It is still not that common, however, for a commercial storage vendor to have a good scale-out product in their portfolio that solves most customer needs; typically it serves just one purpose (like scale-out NFS).
You can think of scale-out in two forms.
Isolation Domain Scale-Out
The traditional method of deploying storage controllers is to use HA-pairs, where two controllers share the same backend disk storage and, in case of failover, one can handle all the traffic. This by itself is not scale-out, but you can build a scale-out type of storage architecture by relying on HA-pairs and connecting them together. This allows moving data between HA-pairs and doing all kinds of maintenance and hardware-refresh tasks without taking your storage down. The method does, however, place several limitations on scalability: when you add new capacity to the cluster, you must select one isolation domain / HA-pair to add it to, rather than just adding capacity to one pool. On the other hand, it handles failures much better, because faults are contained within separate isolation domains.
This method typically also means the vendor uses engineered hardware. Think of this as the method NetApp Clustered Data ONTAP uses.
Shared Nothing Scale-Out
Shared nothing is a scale-out method where you have N storage controllers that do not share their backend storage at all. This method is more common with vendors and products built on commodity hardware, since it solves most of the problems commodity hardware has and is, in most cases, a much more understandable way to scale than the typical isolation-domain type of scaling (HA-pairs, etc.). All virtual SAN products on the market rely on this type of scaling, since it doesn't need any specific hardware components to achieve scalability and high availability. This is also the type of scaling all hyper-converged infrastructure (HCI) vendors use, claiming their method is the same one Google, Facebook and Amazon use in their big environments – even though that's not 100% true.
However, this is by far the best method if you design your architecture from scratch for commodity hardware. It also handles rebuild scenarios much better, because the performance hit is smaller, and self-healing is possible with this kind of approach.
Think of this as the method Nutanix, SimpliVity, SolidFire, Isilon, etc. use.
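The reason rebuilds hurt less in a shared-nothing design is the placement scheme: each block's replicas are spread deterministically across the whole cluster, so when a node dies, every surviving node contributes to the rebuild instead of one HA partner taking the full hit. Here is a deliberately simplified sketch (hash-then-successor placement; real products such as those above use their own, more refined schemes):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # illustrative cluster

def placement(block_id: str, replicas: int = 2) -> list:
    """Deterministically pick `replicas` distinct nodes for a block:
    hash the block id to a starting node, then take successive nodes.
    Because placement depends only on the id, any node can locate any
    block, and rebuild work after a failure spreads across the cluster."""
    digest = hashlib.sha1(block_id.encode()).hexdigest()
    start = int(digest, 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]

copies = placement("vm42-block-0007")
print(len(set(copies)))  # 2 -> two distinct nodes hold this block
```

With many blocks, the starting positions distribute across all four nodes, which is what makes self-healing a cluster-wide, parallel operation.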
So is one better than the other?
No. Both have good and bad sides. This is a design choice you make, and you can build a reliable and scalable system with either method. When architected correctly, either of them can give you the performance, reliability and scalability you need.
Conclusion: why should I care?
Understanding what Software-Defined Storage (SDS) means helps you understand the competitive landscape and avoid the typical traps of competitor FUD. Any modern storage, whether NAS or SAN, is software-defined, but not all products have the same features or the same kind of approach. Still, all of them solve most of the problems modern infrastructure has nowadays, and most of them can help your organisation store more data than 10 years ago with enough speed to meet the requirements of most of your applications. When selecting a storage vendor, do it based on your organisational needs rather than marketing jargon.