What makes software-defined storage software-defined?


Software-defined storage is something that every storage vendor on the planet talks about today. If we stop and think about it for a moment, however, we realise that "software-defined" is just yet another buzzword. This is a 101 on SDS, so we are not going to go deep into every area; I will write more in-depth articles per characteristic later. So the question is: what makes software-defined storage software-defined?

A bit of background research

To really understand what software-defined actually means, we must first do a bit of background research. If we take any modern storage system – and by modern I mean something released in the past 10 years – we see a product consisting of some kind of hardware component, a number of hard disk drives or SSDs and, yes, a software component. So by that definition every storage platform released in the past years is actually software-defined storage, since it is software that handles all the nice things in them.

Well, what the hell is software-defined storage then?


Software-defined doesn't actually mean something where software defines the features, but:

Software-defined storage (SDS) is an evolving concept for computer data storage software to manage policy-based provisioning and management of data storage independent of the underlying hardware. Software-defined storage definitions typically include a form of storage virtualization to separate the storage hardware from the software that manages the storage infrastructure. The software enabling a software-defined storage environment may also provide policy management for feature options such as deduplication, replication, thin provisioning, snapshots and backup. SDS definitions are sometimes compared with those of Software-based Storage.

The above is a quote from Wikipedia defining what SDS actually means. So let's go deeper and look at the characteristics you would expect from an SDS product. Note that there are products marketed as SDS which don't have all of these. It's not necessary to have them all, but the more you have the better it is – or is it?

Automation: policy-driven storage provisioning with service-level agreements

Automation of storage systems is not that new a feature, but it is a genuinely useful one, and it is a required feature if you plan to build any kind of modern cloud architecture on top of your storage. Traditional storage admin tasks, like creating and mapping a LUN, cannot be done by humans in a modern cloud environment; they must be done by automation. What this means in practice is that in a VMware environment a VM admin is able to create new storage based on the service level needed, directly from his or her VMware tools. Most modern storage products have at least some kind of plugin for VMware vCenter – but not all have the same features. VMware released vVols with vSphere 6, and I think it will become the dominant model for storage deployments in the future. It lets VMware admins create storage with the required features (deduplication, compression, protection, replication, etc.) per VM. This is not that new a concept either, since OpenStack has been using this kind of model for quite a long time, and there are even storage vendors that have built their business on it (Tintri).

Policy-based provisioning and automation are in my opinion the most important features of software-defined storage, and you should look at these carefully if a cloud-like environment is part of your short- or long-term plan.
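
As an illustration, here is a minimal sketch of policy-based provisioning on the OpenStack side using the openstacksdk library. The cloud name "mycloud" and the volume type "gold" are assumptions for this example: the volume type is what the storage admin would map to the wanted service level (replication, deduplication and so on), and the consumer never has to know which array or pool ends up serving the volume.

# Minimal sketch: policy-based volume provisioning with openstacksdk.
# "mycloud" (a clouds.yaml entry) and the "gold" volume type are
# placeholders; the type is what carries the service-level policy.
import openstack

conn = openstack.connect(cloud="mycloud")

# The consumer only states a size and a policy (volume type); the
# backend decides where and how the volume is actually placed.
volume = conn.block_storage.create_volume(
    name="app01-data",
    size=100,               # GiB
    volume_type="gold",     # service level defined by policy
)
print(volume.id, volume.status)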

Abstraction of logical storage services and capabilities from the underlying physical storage systems

Abstraction of logical storage services and capabilities is really not a new thing. Most modern storage architectures have been using this kind of abstraction for several years. What it means is, for example, virtualisation of the underlying storage (one or more RAID groups in a generic storage pool) to reclaim unused capacity and give flexibility. I can't even recall a modern storage product that does not virtualise the underlying storage in some way; this has been a basic feature for years and as such has little to do with modern software-defined thinking.

What turns this into a modern software-defined storage feature is something like VMware vVols. They enable more granular, per-VM control over what kind of storage a VM needs and which features it gets. Not all VMs, for example, need to be replicated synchronously to a second location, and having a separate datastore for replicated VMs is just not enough for most companies – and it's really not enough in modern cloud-like architectures. You must have per-VM control, and all the magic must happen without the VM admin having to know which datastore is replicated and which is not. This is a fairly new feature in VMware but has been available to OpenStack users for a while, mainly because OpenStack was built for modern cloud environments, whereas VMware was mainly built for datacenter virtualisation and has implemented cloud-like features later on.

Commodity hardware with storage logic abstracted into a software layer

Traditionally most storage vendors had their own hardware: some of them even designed their own ASICs, but at minimum there was some form of engineered hardware combined with commodity hard disk drives / SSDs running custom firmware. Legacy vendors, as they are nowadays called, did indeed invest a lot of money in developing engineered hardware to meet their special needs, and this typically meant longer release cycles and a different go-to-market strategy, since introducing a new feature might mean developing a new ASIC and new hardware to support it.

Some of the startup vendors claim that using engineered hardware is expensive and means that customers pay too much. This might be true, but there is also a long list of advantages to engineered hardware over commodity hardware – and frankly, as a customer you shouldn't care that much about this area. Commodity hardware can be as good as engineered hardware, and engineered hardware can be as cheap as commodity hardware.

If the vendor you are looking at uses commodity hardware, they must do many things in their software layer that could otherwise be done in the hardware layer with microcode. Whichever route they go, though, the storage logic is, and has been for years, abstracted into a software layer. All modern storage products use software to do most of the logic, but some use ASICs for certain clever things – HP, for example, uses an ASIC to do deduplication while most of the competitors do it in software.

Scale-Out storage architecture

Scale-out storage architecture is not that new either, since the first commercial scale-out storage products came to market over 10 years ago. However, it is still not that common for a commercial storage vendor to have a good scale-out product in their portfolio that solves most customer needs; typically it is used for just one purpose (like scale-out NFS).

You can think of scale-out in two forms.

Isolation Domain Scale-Out

The traditional way to deploy storage controllers is to use HA pairs, where two controllers share the same backend disk storage and, in case of a failover, one can handle all the traffic. This by itself is not scale-out, but you can build a scale-out type of architecture by relying on HA pairs and then connecting them together. What this allows is moving data between HA pairs and doing all kinds of maintenance and hardware-refresh tasks without having to take your storage down. This method does, however, put several limits on your scalability: when you add new capacity to the cluster, you must select the isolation domain / HA pair where that capacity goes, rather than just adding capacity to one pool. On the other hand it handles failures better, because faults are contained within separate isolation domains.

This method typically also means the vendor is using engineered hardware. Think of this as the method NetApp Clustered Data ONTAP uses.

Shared Nothing Scale-Out

Shared nothing is a scale-out method where you have N storage controllers that do not share their backend storage at all. This method is more common with vendors and products built on commodity hardware, since it solves most of the problems commodity hardware has and is in most cases a much more understandable way to scale than the typical isolation-domain type of scaling (HA pairs, etc.). All of the virtual SAN products on the market rely on this type of scaling, since it doesn't need any specific hardware components to achieve scalability and high availability. This is also the type of scaling all hyper-converged infrastructure (HCI) vendors use, claiming that their method is the same one used in Google, Facebook and Amazon style environments – even though that's not 100% true.

However, this is by far the best method to use if you design your architecture from scratch for commodity hardware. It is also much better at handling rebuild scenarios, because the performance hit is far smaller and self-healing is possible with this kind of approach.

Think of this as the method Nutanix, SimpliVity, SolidFire, Isilon, etc. use; a simplified sketch of the data-placement principle follows below.
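
To make the shared-nothing idea more concrete, here is a simplified sketch of how such a cluster can decide which nodes own a piece of data. This is not any vendor's actual placement algorithm – just a plain consistent-hashing illustration of the principle that every node owns a slice of the data and no shared backend or central controller is needed.

# Simplified illustration of shared-nothing data placement using
# consistent hashing. Real products use far more elaborate schemes;
# this only shows the principle: any node can locate data without
# a shared backend or a central metadata controller.
import hashlib
from bisect import bisect_right

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each physical node gets several virtual points on the ring
        # so capacity is spread evenly.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def owners(self, block_id: str, replicas=2):
        """Return the nodes holding the block and its replica copies."""
        start = bisect_right(self.keys, self._hash(block_id)) % len(self.ring)
        owners, i = [], start
        while len(owners) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

ring = HashRing(["node1", "node2", "node3", "node4"])
print(ring.owners("vm-disk-42/block-0007"))   # e.g. ['node3', 'node1']

Adding a node to the ring only moves a fraction of the blocks, which is one reason this style of scaling handles capacity growth and rebuilds so gracefully.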

So is one better than the other?

No. Both have their good and bad sides. This is a design choice you make, and you can build a reliable and scalable system with either method. When architected correctly, either of them can give you the performance, reliability and scalability you need.

Conclusion. Why would I care?

Understanding what software-defined storage (SDS) means helps you understand the competitive landscape better and helps you avoid the typical FUD traps set by competitors. Any modern storage system, NAS or SAN, is software-defined, but not all have the same features or the same kind of approach. Still, all of them solve most of the problems modern infrastructure has nowadays, and most of them can help your organisation store more data than 10 years ago and get enough speed to meet the requirements of most of your applications. When selecting a storage vendor, do it based on your organisational needs rather than marketing jargon.

Note: I work for a "legacy" storage vendor, but this post has nothing to do with my job. I say the same things to every customer I talk to. This is my personal view and does not represent my employer.

Seven rules to help you prepare for problems on storage area networks

I wrote this article because every SAN administrator should know how to be better prepared for problems in their storage area network. As centralised storage is nowadays the rule rather than the exception, storage area networking plays a really important role in everyday computing.

Note that these seven rules are my suggestions rather than generic rules to be followed strictly. Try to find at least something you can implement in your environment, and please let me know if there are more relevant things than these, or something to add.

Rule #1 – Implement NTP

In my opinion this is the most important thing. Anyone who has had to find out what happened in an environment where every clock shows a different time understands the relevance of NTP. When all your devices are on the same time, finding the root cause is usually much easier. Implement an NTP server on your management network and sync all your devices from there. You can keep your management network's NTP on proper time by syncing it from the internet, but it is more important that all devices are on the same time than that they are exactly on the right second of the world clock.
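
As a small practical aid, here is a hedged sketch of spot-checking clock drift from your management station with Python and the ntplib package. The server name and the one-second threshold are placeholder assumptions for this example.

# Quick clock-drift check from the SAN management station.
# Server name and the one-second threshold are placeholders.
import ntplib

NTP_SERVER = "ntp.mgmt.example.local"   # management-network NTP server
client = ntplib.NTPClient()

response = client.request(NTP_SERVER, version=3)
print(f"offset vs {NTP_SERVER}: {response.offset:+.3f} s")

if abs(response.offset) > 1.0:
    print("WARNING: this station is drifting - check its NTP client config")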

Rule #2 – Implement a good naming schema

In the past, servers usually got their names from action heroes and movie stars. This might be nice, but if you want easy-to-remember names, use them as CNAMEs rather than the proper names. In a problem situation it is nice to see from the name exactly where a device or server is located, so I suggest using something like Helsinki-DC1-BC01-BL01-spiderman rather than just spiderman. From this example you can easily see that the server is located in Helsinki, in datacenter 1, in blade chassis one, as blade number one.

Use consistent naming in zoning. I usually name zones ZA_host1_host2: this shows immediately that it is a zone on fabric A and that it is between host1 and host2. For aliases I prefer the same kind of naming schema: AA_host1 is the alias on fabric A for host1.
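
Here is a tiny Python sketch of that convention, just to show how mechanical such a schema can be (the helper names are mine, not from any SAN toolkit):

# Tiny helpers for the naming schema described above.
# ZA_host1_host2 = zone on fabric A between host1 and host2
# AA_host1       = alias on fabric A for host1

def zone_name(fabric: str, member1: str, member2: str) -> str:
    return f"Z{fabric.upper()}_{member1}_{member2}"

def alias_name(fabric: str, host: str) -> str:
    return f"A{fabric.upper()}_{host}"

print(zone_name("a", "esx01", "netapp01"))   # ZA_esx01_netapp01
print(alias_name("a", "esx01"))              # AA_esx01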

In storage area networking the domain ID is like a phone number: domain IDs should always be unique. This is not usually a problem if you have separate SANs, but if you ever move something between SANs, unique IDs are crucial, so use unique IDs from the beginning. The domain ID is also used for several other things, such as the Fibre Channel addresses of device ports.

Rule #3 – Create a generic SAN management station

This is usually done in all bigger environments, but every now and then I see environments where no generic SAN management station has been implemented. Almost every company has implemented virtualization at least at some level, so creating a generic SAN management station should not be a problem. You can go easily with a virtualized Windows Server, or maybe even just a virtualized Windows 7 with RDP enabled, but I would go with a server so there can be more than one admin on the station at a time.

This station should have at least the following:

  • An SSH and telnet tool that lets you log the session output to a text file; on Windows I usually go with PuTTY
  • An FTP server (and maybe TFTP as well). I usually go with FileZilla Server, which is really easy to configure and use
  • An NTP server for your SAN environment
  • Management tools for your SAN (Fabric Manager for Cisco and DCFM for Brocade) – this is really important for troubleshooting in larger environments
  • Enough disk space to store all firmware images and log files from the switches (see Rules #4 and #6)
  • Internet access for cases where you need to download something new, or just want to use Google while firefighting 😉

Rule #4 – Implement monitoring on your SAN environment

This can be done with the same software you use for your server environment, but I would go with Fabric Manager on Cisco SANs and DCFM on Brocade SANs, because these include other useful features and are really helpful when your environment gets bigger. Configure your management software to send email/SMS when something happens – don't just trust your eyes!

You should also implement automatic log collection for your environment. This helps a lot, for example, when you try to track down physical link problems or slow-drain devices. Configure your management station to pull all logs from the switches daily or weekly and then clear all counters, so the next log starts from zero. This can be implemented with a few lines of Perl and an SSH library, and there are plenty of existing scripts on Google if you don't want to write your own; a Python sketch of the same idea follows below.
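
Here is a hedged sketch of the same idea in Python with paramiko instead of Perl. The host names, credentials and the exact show commands are assumptions – verify the right commands for your own switch platform and firmware before relying on anything like this.

# Sketch: pull logs/counters from SAN switches over SSH and archive them.
# Hosts, credentials and the commands are placeholders.
import datetime
import pathlib
import paramiko

SWITCHES = ["san-sw-a1", "san-sw-b1"]                    # assumption: DNS names
COMMANDS = ["show logging", "show interface counters"]   # platform-dependent
ARCHIVE = pathlib.Path("/srv/san-logs")

today = datetime.date.today().isoformat()
for host in SWITCHES:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="admin", password="********")
    outdir = ARCHIVE / host
    outdir.mkdir(parents=True, exist_ok=True)
    for cmd in COMMANDS:
        _, stdout, _ = client.exec_command(cmd)
        fname = outdir / f"{today}_{cmd.replace(' ', '_')}.log"
        fname.write_text(stdout.read().decode(errors="replace"))
    # here you would also issue the platform-specific command to clear counters
    client.close()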

Rule #5 – Design your SAN layout properly

This is really easy to achieve and doesn't need much time to keep up to date. Create a layout sketch of your SAN – even in smaller environments – and share it with all admins. You don't need every server in the sketch; include just your SAN switches and storage systems. You can include the servers too, but that usually makes the sketch big and unreadable. In a dual-SAN environment (having two separate SANs should be the de facto standard!) always plug your servers and storage into the same ports, so if you connect your storage system to ports 1-4 on switch one in fabric A, connect it to ports 1-4 on the corresponding switch in fabric B as well.

Rule #6 – Update your firmware

Don't just hang on to a working firmware forever. No software is absolutely free of bugs, which is why you should update your firmware regularly. I am not saying you should jump on a new release as soon as it hits the download page, but try to stay on as new a version as you reasonably can. A lot of storage systems set requirements for switch firmware levels, so always follow your manufacturer's advice. If your manufacturer doesn't support anything newer than something released a year ago, it might be time to change vendors!

If you have a properly designed SAN with two separate fabrics, you can do firmware upgrades without any break in production, and most enterprise-class SAN switches (usually called SAN directors) have two redundant controllers, so you can update them on the fly without any interruption to production!

Rule #7 – Take backups!!!

Take this seriously. Taking backups is not hard. You can build it into your daily statistics-collection scripts or do it periodically by hand – whichever way you choose, take your backups regularly. I have seen plenty of cases where there was no backup of a switch and, after a crash, the admins had to recreate everything from scratch. Do the same for your storage systems if possible; at least IBM's high-end storage systems have features that allow you to back up their configs. Config files are usually really small, and there should be no place without enough disk or tape space for backups of something as important as SAN switches and storage systems. From the SAN switches you might also want to keep a backup of your license files, since getting new license files from Cisco or Brocade can take a while.
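
As an example, here is a hedged sketch that follows the same pattern as the log-collection script above, but grabs a dated copy of the running configuration from Cisco MDS/Nexus switches. The hosts and credentials are placeholders, and Brocade FOS switches would use their own mechanism (configupload) instead.

# Sketch: daily copy of the running configuration from Cisco MDS/Nexus
# switches. Hosts and credentials are placeholders.
import datetime
import pathlib
import paramiko

SWITCHES = ["san-sw-a1", "san-sw-b1"]
BACKUP_DIR = pathlib.Path("/srv/san-config-backups")

stamp = datetime.date.today().isoformat()
for host in SWITCHES:
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username="admin", password="********")
    _, stdout, _ = ssh.exec_command("show running-config")
    target = BACKUP_DIR / host
    target.mkdir(parents=True, exist_ok=True)
    (target / f"{stamp}_running-config.txt").write_text(
        stdout.read().decode(errors="replace")
    )
    ssh.close()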

Why IBM's XIV matters

IBM's XIV has raised a lot of discussion in the market, mostly because IBM claims it is an enterprise-class storage system even though it runs on SATA disks, which are usually considered midrange stuff.

There are plenty of cases where customers have evaluated IBM's XIV storage system and realized it is an amazing product. Here are a couple of examples of why XIV really matters:

  • A service provider that had used EMC disk systems for over 10 years evaluated the IBM XIV against an upgrade to EMC V-Max. The three-year total cost of ownership (TCO) of EMC's V-Max was 7 million US dollars higher, so EMC counter-proposed a CLARiiON CX4 instead. The customer selected XIV.
  • A large US bank holding company managed to get 5.3 GB/sec from a pair of XIV boxes for their analytics environment. That's amazing performance from SATA disks!

I have seen IBM's XIV in a couple of customer environments, and it has really proven to be enterprise storage. IBM recently upgraded XIV to its third generation. At the same time they announced that XIV will in the future get an option for SSD caching, which is claimed to take performance to the next level at a fraction of typical SSD storage costs. Will this really happen? I'll bet the feature arrives quite soon. They also made the internal cache larger, added faster disk controllers and changed the internal interconnect to InfiniBand.

IBM has proven that you can create a really well-performing storage system by looking at the problem from another point of view and using generic Intel x64 hardware and generic SATA disks. In fact XIV was not invented by IBM but by a company founded by Moshe Yanai (who earlier led EMC's Symmetrix development).

Read more about XIV from IBM’s XIV page.

Could iSCSI provide enough performance?

Almost every day I face the question: could iSCSI provide enough performance? The honest answer depends on the environment and the architecture design, but in most cases iSCSI can provide enough IOPS. I'll try to explain why.

Storage systems share their actual storage through several methods, but the most used ones are Fibre Channel and NFS, the first of which is block based. There are also quite a few other ways to share storage, like iSCSI and FCoE; FCoE has been a hot topic for quite a while now, and its big bang has been awaited for a couple of years. From a performance point of view the biggest improvement for the last two has been 10 Gbit/s Ethernet technology, which provides a good pipe for data movement.

At the 2010 Microsoft Management Summit, Intel showed iSCSI performance where they managed to get over 1 million IOPS (IO operations per second) out of software iSCSI and 10 Gbit/s Ethernet technology, which is quite nice. At the same summit they had a nice demo on their booth showing an environment with Intel's Xeon 5600 chipset and the Microsoft software iSCSI initiator doing more than 1.2 million IO operations per second at almost 100% CPU usage. It's important to understand that when CPU utilization is near 100% you can't really do anything besides that IO, but it shows that you can get really massive performance using iSCSI and 10 Gbit/s Ethernet.

In the past, iSCSI's bottleneck was the 1 Gbit/s Ethernet connection. Of course there were ways to get better performance by designing the architecture correctly, but most iSCSI storage systems had only four 1 Gbit/s connections. When 10 Gbit/s connections became common on storage systems, iSCSI became a comparable alternative to Fibre Channel in more and more cases. There also used to be dedicated iSCSI cards on the market, but they are mostly gone, because CPUs got so good that the CPU overhead of iSCSI is no longer that relevant. Nowadays most 10 Gbit/s Ethernet cards can do iSCSI encapsulation on their own chip, so it doesn't affect the CPU much either.

10 Gbit/s Ethernet technology has helped a lot, and you don't need separate SAN networks anymore if you go with iSCSI or FCoE. You can use the existing 10 Gbit/s connections, which are now common and mostly standard on blade systems. In big environments you should still have separation between your data networks and storage networks, but this can be done with proper network architecture and VLANs. I would still like to separate storage and data networking (at least at the core level) to avoid cases where problems in the data network affect your storage systems.

FCoE is coming for sure, but there are still some limitations, and the lack of native FCoE storage is the main reason. However, if you are investing in a network renewal, I would keep FCoE in mind and build all new networks so that FCoE can be implemented with less work when the time comes. While waiting, iSCSI can be a good alternative to FC.

…but I still prefer old-school Fibre Channel. Why? Brocade just released 16 Gbit/s FC switches, and again FC is faster 😉

Read more about Intel's iSCSI performance test here.

How to enable FCoE on Cisco Nexus switches

Fibre Channel over Ethernet (FCoE) is a technology that greatly simplifies datacenter cabling by carrying both traditional IP traffic and SAN traffic over the same fibre connection. The benefit comes from the very fast 10 Gbit/s link and from using it properly. Traditionally, blade environments have had separate 1 Gbit/s network connections plus 4/8 Gbit/s FC modules. Of the biggest vendors, HP has for a long time shipped its blade chassis with fast 10 Gbit/s connectivity, and now modules for this purpose are also available, for example, for IBM chassis. Enabling FCoE is very easy, provided the CNA card in use is supported by the operating system. On the switch side, enabling FCoE is, in short, the following process:

1. Enable the FCoE feature.
2. Map the VSAN used for FCoE traffic to a suitable VLAN.
3. Create a virtual Fibre Channel interface that carries the FCoE frames.

Cisco Nexus makes enabling features very easy, so the first step is child's play – just run a single command:

switch(config)# feature fcoe

The next task is to map the VSAN to the VLAN being used. In practice the VSAN is easy to determine: typically the Nexus 5000 series switches connect to Cisco MDS switches where the storage systems themselves are attached, so the VSAN must be exactly the same one used on the MDS side, otherwise the traffic will never reach its destination. This step is mandatory in any case – without it the FCoE interface will not come up. This example uses VSAN 210 and VLAN 210.

switch(config)# vlan 210
switch(config-vlan)# fcoe vsan 210
switch(config-vlan)# exit

Once the VSAN has been mapped to a suitable VLAN, continue by creating a virtual Fibre Channel interface (vfc). Every Nexus port that is meant to carry FCoE traffic must have a VFC defined. Typically each port gets its own VFC whose number matches the port number. This example uses port one:

switch(config)# interface vfc 1
switch(config-if)# bind interface ethernet 1/1
switch(config-if)# no shutdown
switch(config-if)# exit

Even though you might assume otherwise at this point, the connection does not work yet, because the created VFC must also be added to the VSAN in use. This can also be seen by looking at the interface with the "show interface vfc" command:

vfc1 is down (VSAN not mapped to an FCoE enabled VLAN)

Once the FCoE VSAN has been mapped to a suitable VLAN, the only thing left is to add the created VFC to the correct VSAN:

switch(config)# vsan database
switch(config-vsan-db)# vsan 210 interface vfc 1
switch(config-vsan-db)# exit

At this point the VFC should be up, and the host's login should appear in the output of the "show flogi database" command. The FCoE part is now done, and the only remaining task is to create suitable aliases for the WWNs and add them to the correct zone definitions so that the host sees the storage system. These steps are done exactly the same way regardless of whether you use traditional FC or FCoE.