The Winds of Change

My reason for writing this article is that a wind of change is blowing through the storage market. The success of cloud storage such as Amazon S3 and Syncplicity has opened the way to new methods of archiving, making backups, and even disaster recovery. But the biggest disruptor is of course flash memory, and more specifically PCIe SSDs.

PCIe SSDs are not bandwidth limited by the SATA/SAS wiring and (if implemented well) protocol overhead. As a result, PCIe drives have up to three times as many channels of flash memory. And well-designed PCIe SSDs do not have to carry the burden of RAID controllers and protocols that were architectured for hard drives with completely different characteristics than flash memory. But even if they use a PCIe/SAS bridge, PCIe SSDs offer higher reliability and vastly superior performance than the best enterprise drives. But there is much more going on.

As PCIe SSDs offer large capacities (up to 10TB!) and performance in a very small form factor, they open new markets. It is interesting to see the completely new solutions that are now available, solutions that are much better suited for certain workloads. One example of a workload where traditional SANs fall short is virtual desktops.

Virtual Desktops

Virtual desktops like Xendesktop or VMware View have been promising significant energy and cost savings, but these savings almost never materialize in reality. The energy saving claims made a few years ago were ridiculous; they were based on the assumption that we are still using power hogging desktops. Replace those with thin clients and you magically get massive energy savings.

The reality is that most of the IT professionals already use a 20-30W portable instead of an old 150W desktop, and the extra server load was not helping save energy either. Even if portables were not used, many business desktops today sip small amounts of energy. And if there was any miraculous energy saving, the additional complex storage system would be the final blow. The end result of desktop virtualization is often higher instead of lower energy bills. But perhaps worse is that knowledge workers hated most of the virtual desktops project with a passion. Suddenly several actions that used to complete without any noticeable response time became laggy.

Although there were serious costs savings if your desktop deployment and management was just organized chaos, every organization that replaced PCs with virtual desktops faced the need for huge investments. As lots of people boot up their virtual desktop in the morning, massive amounts of data is written and read in a rather random way: the so called “boot storm”. The solution was to boot up the desktops in a staggered way, tens of minutes before the arrival of the users, and to perform all kinds of special optimizations all over the software stack. But that is hardly more than a band-aid: what about unexpected hot fix patches, or what if people arrive a little bit earlier on occasion?

Data source: NetApp News 2013

Astute readers understand that the administration of virtual desktops is quite a bit more complex than the traditional setup with roaming profiles and saving files on a centralized file server. Only the most recent and high-end SANs could really deal with these specific requirements. Granted, some of the essential storage tasks like backup and archiving are a lot easier once you have a SAN in place… but mostly after you have invested in all kinds of expensive management software. When you start to invest in a complex SAN platform, the costs seem to multiply like rabbits.

In short, although a fast SAN seemed to be an enabler, they were also a deal breaker in the virtual desktop world. They're too slow and/or too expensive, and they're also power hungry.

Several companies feel they have a much better alternative and it is very interesting to see how the Fusion–IO and Intel PCIe SSDs are being turned into innovative and specialized alternatives for the typical SAN solution. Let's discuss a few of these over the next several pages.

Introduction: Enterprise Storage 101 Nutanix: No More SAN
POST A COMMENT

60 Comments

View All Comments

  • WeaselITB - Tuesday, August 6, 2013 - link

    Fascinating perspective piece. I look forward to the CouldFounders review -- that stuff seems pretty interesting.

    Thanks,
    -Weasel
    Reply
  • shodanshok - Tuesday, August 6, 2013 - link

    Very interesting article. It basically match my personal option on SAN market: it is an overprice one, with much less performance per $$$ then DAS.

    Anyway, with the advent of thin pools / thin volumes in RHEL 6.4 and dmcache in RHEL 7.0, commodity, cheap Linux distribution (CentOS costs 0, by the way) basically matche the feature-set exposed by most low/mid end SAN. This means that a cheap server with 12-24 2.5'' bays can be converted to SAN-like works, with very good results also.

    In this point of view, the recent S3500 / Crucial M500 disks are very interesting: the first provide enterprise-certified, high performance, yet (relatively) low cost storage, and the second, while not explicitly targeted at the enterprise market, is available at outstanding capacity/cost ratio (the 1TB version is about 650 euros). Moreover it also has a capacitor array to prevent data loss in the case of power failure.

    Bottom line: for high performance, low cost storage, use a Linux server with loads of SATA SSDs. The only drawback is that you _had_ to know the VGS/LVS cli interface, because good GUIs tend to be commercial products and, anyway, for data recovery the cli remains your best friend.

    A note on the RAID level: while most sysadmins continue to use RAID5/6, I think it is really wrong in most cases. The R/M/W penalty is simply too much on mechanincal disks. I've done some tests here: http://www.ilsistemista.net/index.php/linux-a-unix...

    Maybe on SSDs the results are better for RAID5, but the low-performance degraded state (and very slow/dangerous reconstruction process) ramain.
    Reply
  • Kyrra1234 - Wednesday, August 7, 2013 - link

    The enterprise storage market is about the value-add you get from buying from the big name companies (EMC, Netapp, HP, etc...). All of those will come with support contracts for replacement gear and to help you fix any problems you may run into with the storage system. I'd say the key reasons to buy from some of these big players:

    * Let someone else worry about maintaining the systems (this is helpful for large datacenter operations where the customer has petabytes of data).
    * The data reporting tools you get from these companies will out-shine any home grown solution.
    * When something goes wrong, these systems will have extensive logs about what happened, and those companies will fly out engineers to rescue your data.
    * Hardware/Firmware testing and verification. The testing that is behind these solutions is pretty staggering.

    For smaller operations, rolling out an enterprise SAN is probably overkill. But if your data and uptime is important to you, enterprise storage will be less of a headache when compared to JBOD setups.
    Reply
  • Adul - Wednesday, August 7, 2013 - link

    We looked at Fusion-IO ioDrive and decided not to go that route as the work loads presented by virtualize desktops we offer would have killed those units in a heartbeat. We opted instead for a product by greenbytes for our VDI offering. Reply
  • Adul - Wednesday, August 7, 2013 - link

    See if you can get one of these devices for review :)

    http://getgreenbytes.com/solutions/vio/

    we have hundreds of VDI instances running on this.
    Reply
  • Brutalizer - Sunday, August 11, 2013 - link

    These Greenbyte servers are running ZFS and Solaris (illumos)
    http://www.virtualizationpractice.com/greenbytes-a...
    Reply
  • Brutalizer - Sunday, August 11, 2013 - link

    GreenByte:
    http://www.theregister.co.uk/2012/10/12/greenbytes...

    Also, Tegile is using ZFS and Solaris:
    http://www.theregister.co.uk/2012/06/01/tegile_zeb...

    Who said ZFS is not the future?
    Reply
  • woogitboogity - Sunday, August 11, 2013 - link

    If there is one thing I absolutely adore about real capitalism it is these moments where the establishment goes down in flames. Just the thought of their jaws dropping and stammering "but that's not fair!" when they themselves were making mockery of fair prices with absurd profit margins... priceless. Working with computers gives you so very many of these wonderful moments of truth...

    On the software end it is almost as much fun as watching plutocrats and dictators alike try to "contain" or "limit" TCP/IP's ability to spread information.
    Reply
  • wumpus - Wednesday, August 14, 2013 - link

    There also seems to be a disconnect in what Reed-Solomon can do and what they are concerned about (while RAID 6 uses Reed Solomon, it is a specific application and not a general limitation).

    It is almost impossible to scale rotating discs (presumably magnetic, but don't ignore optical forever) to the point where Reed-Solomon becomes an issue. The basic algorithm scales (easily) to 256 disks (or whatever you are striping across) of which typically you want about 16 (or less) parity disks. Any panic over "some byte of data was mangled while a drive died" just means you need to use more parity disks. Somehow using up all 256 is silly (for rotating media) as few applications access data in groups of 256 sectors a time (current 1MB, possibly more by the time somebody might consider it).

    All this goes out the window if you are using flash (and can otherwise deal with the large page clear requirement issue), but I doubt that many are up to such large sizes yet. If extreme multilevel optical disks ever take over, things might get more interesting on this front (I will still expect Reed Solomon to do well, but eventually things might reach the tipping point).
    Reply
  • equals42 - Saturday, August 17, 2013 - link

    The author misunderstands how NetApp uses NVRAM. NVRAM is not a cache for the hottest data. Writes are always to DRAM memory. The writes are committed to NVRAM (which is mirrored to another controller) before being acknowledged to the host but the write IO and its commitment to disk or SSD via WAFL sequential CP writes is all from DRAM. While any data remains in DRAM, it can be considered cached but the contents of NVRAM do not constitute nor is it used for caching for host reads.

    NVRAM is only to make sure that no writes are ever lost due to a controller loss. This is important to recognize since most mid-range systems (and all the low-end ones I've investigated) do NOT protect from write losses in event of failure. Data loss like this can lead to corruption in block-based scenarios and database corruption in nearly any scenario.
    Reply

Log in

Don't have an account? Sign up now