The shifting dynamics of infrastructure

In my former role, I had the final word on which technologies our firm would include in our integrated offerings to clients.  I spoke to dozens of technology manufacturers each month, evaluated their products in my lab, spoke to clients who used or evaluated the technologies, and so on.  Many of these hopefuls left empty-handed even though their technologies may have been quite exciting, even performing better than the standard-bearers in the markets they were trying to break into.  If that market was the Enterprise Storage market, they had two strikes against them before they ever came to the plate, and I will discuss why.

Some may know that I’ve been a NetApp advocate for quite some time, but that’s NOT the reason the newer firms had a disadvantage.  I have ALWAYS submitted that if someone else came to the table with the best demonstrable breadth and depth of feature/functionality/efficiency, with some proof of staying power, that would be the solution that I’d prefer to integrate and bring to my clients.

That’s not what I’m talking about here.

I’m talking about the market dynamics that present a real danger to today’s storage startups.  I can’t look a client in the eye three years from now if I’ve recommended a product to them that is either gone, or subsumed into a black pit of a mega-conglomerate and de-emphasized.  Even if there is a 25% risk of that happening, that’s more than zero and must be factored into the product selection process.

Let’s discuss the dynamic.  Please note that what follows are general statements; I’m well aware that it won’t play out exactly the way this flows, but I argue it doesn’t have to for the dynamic to hold true.

The Storage market has a continuum that’s typically broken up into a few groups:

  • Service Providers
  • Large Enterprises
  • Medium-sized Enterprises (MSE)
  • Small-to-Medium sized Businesses (SMB)
  • Startups

If you are starting a business today (or have done so in the past 2-3 years), the overwhelming odds are that you aren’t going to buy any infrastructure. Why should you?  With Microsoft or Google, you’ve got 50-75% of the functionality you need, and you can SaaS most of the rest with other vendors (Salesforce, NetSuite, Box, Dropbox, you name it).  Even if you need servers, you’ve got the hyperscalers.  So Startups can do whatever they want IT-wise in ways that simply couldn’t be done ten or twenty years ago.

Startups, of course, become SMBs, then MSEs, as they succeed and grow.  Sure, some of them get bought by established, larger firms that have their own infrastructure, but over time that pressure to be hyper-responsive to compete with smaller competitors will push everyone in the bottom three segments – Startups, SMB, and MSE – to move to cloud.

Of course, “cloud” falls under the Service Provider moniker.  So as the infrastructure market for the SMB and MSE buckets shrinks, the data’s gotta live somewhere, right?

Almost EVERY Storage startup first needs to establish a beachhead in SMB/MSE – that’s because Large Enterprises won’t give any of them the time of day until they’ve got a solid, stable, and reference-able customer base.  This means that all of these storage startups (and more pop up every week, it seems) are fighting over a market segment that is shrinking before our very eyes and will continue to shrink at an accelerating rate, as the startups (using all cloud) of today grow into the SMB/MSEs of tomorrow, and the current SMB/MSEs transition to cloud in order to compete with them.  Only when they transform into Large/Medium Enterprises will these folks need any significant storage infrastructure, especially as they’ve likely optimized their organization and processes to use cloud resources for nearly everything IT.  Note, we’re talking about a few years down the road here, but this movement is real.

What of the other side of the market continuum?  Large Enterprise, and Service Providers?

If the above dynamic holds true, Large Enterprise will, despite some leveraging of the cloud, still maintain a sizable, if not expanding, on-premises infrastructure.  What would have been hyper-growth of on-prem storage may now be only logarithmic growth.  But Large Enterprises are still buying large storage systems from the large storage vendors in large quantities, despite what the trade rags and tech blogs opine about the market.  As this segment of the market is hesitant to deploy startup technologies, and has established relationships with the existing/established “big 5” storage vendors, there’s not much of a chance for others to break in- you’ll see mostly the “big 5” trading market share of this segment back and forth (which presents its own problems from a growth perspective for the “big 5”, but that’s another story).

The Service Providers, however, are seeing storage growth at accelerating rates never before seen in the history of IT, and that acceleration will itself increase as the SMB/MSEs flock to the cloud and abandon infrastructure.  Most of the storage startups won’t get to see much if any of that business, as they’re going to be too busy picking through the remains of what was once the SMB/MSE market they were architected to serve.  THIS segment represents the real growth opportunity for the “big 5”- finding ways to leverage their platforms in physical and virtual form to support and sell into this Service Provider market segment.

So if you look at the choices made by the big storage companies (NetApp included), I believe you see a recognition of this dynamic and where their companies are going to succeed long-term, rather than solving yesterday’s problems cheaper or faster.

This dynamic is what gives me pause when evaluating today’s storage startups.  Don’t get me wrong, the tech in most cases is amazing, and there are definitely things they do very, very well.  The difference between these startups and the storage startups of 20 years ago is that the older startups had the TIME to develop their enterprise cred.  Today’s startups do not have that luxury, as they will need to establish that cred before the clock runs out on the SMB/MSE testing ground.

NetApp All-Flash FAS (AFF) – What does this mean?

A bunch of my contemporaries have published excellent technical blogs on NetApp’s recent release of their All-Flash FAS systems and simultaneous massive reduction in the acquisition and support prices of those same systems.  I’ll pop those on my blog later.

So there’s great new info on how well these platforms perform, and how their costs are now in line with (or better than) those of competitors who have been flash-focused for a while now.

Assuming the performance is there (and based on performance numbers I’m seeing in real-world scenarios, it is), and those costs are well understood, does this development mean anything important to the storage or converged infrastructure market?

You betcha.  

AFF now provides the SUPERSET of ALL STORAGE FEATURES offered by all the other flash storage providers, COMBINED.  

Think about that for a second.  Every protocol, file or block. Data movement across nodes. Data access across any node. Every data protection method desirable- no-penalty snapshots, async replication, SYNC replication, cross-site real-time replication with ZERO RTO/RPO failover, and vSphere 6/vVols support. Dedupe. Compression. Space-efficient cloning of volumes, LUNs, or files.  Thin Provisioning.  Non-disruptive, no-migration upgrades. Don’t forget the multi-tenancy features required by service providers- including the ability to magically and correctly route data to conflicting IP subnets using IPspaces and broadcast domains.  Add to that a myriad of ‘little’ features that allow enterprises to just say “yes, I can do it that way if I want”.

Now that NetApp (perhaps finally) has figured out that they can physically do flash as well as anyone else in the business within OnTAP, AND price it to sell, they can now RE-FOCUS the conversation on what’s been most important all along- the DATA – and the need to deliver and protect it in all sorts of form factors (datacenter, cloud, devOps, archive).

NetApp still has the alternative flash platforms (EF, and one day FlashRay) for those particular environments that don’t require all of the functionality that OnTAP excels at providing, so they’ve got their bases covered. I suspect most customers will opt for the features of OnTAP, in case they are needed later- especially if it doesn’t mean a significant price difference. 

But the conversation about performance and price vis-à-vis other flash vendors is now OVER, and we can get back to solving problems.  Which is where NetApp has been dominant for a long, long time.

On Change, Hysteria, and Continuity

Change is scary.  Good or bad, it’s ALWAYS a learning opportunity, whether the change happened to you directly or to people you know or are merely aware of.

Personally, I find myself at a point of very positive change, joining the esteemed firm Red8, LLC, after over 21 years of running the IT infrastructure VAR practice of another company.  The scary part of this change is that I need to quickly improve my team-building and team-belonging skills, and re-learn how to utilize the resources available to me without abusing or misusing them (and, of course, become a valuable resource to others!).  My success in this new venture will directly correlate to my success in improving those skills.  I’m excited beyond words.

Other changes are afoot at a technology partner I work with.  Whether these changes are necessary, overdue, or unavoidable is a question I’m going to think about for a while.  This technology company (NetApp) has, in my opinion, the best technology on the market for solving the biggest problems on the minds of CIOs and CTOs today.  They’re not going to come to customers and ask what they want to buy; they’re going to ask what the problems are and come back with a way to utilize their technology to solve those problems.  That aligns perfectly with how I approach customers, so it should be no surprise that I work with NetApp…a LOT.

This week there was a layoff at NetApp, and recently they’ve made some public-facing missteps that have nothing (really) to do with how they solve customer problems.  There are market dynamics at work across the whole enterprise storage market – namely, the move to cloud and the emergence of many second/third-tier storage vendors – and those dynamics are going to impact the biggest storage vendors most markedly.  Again- this should be no surprise.  These particular changes appear to have caught NetApp more flat-footed than we’ve come to expect from them in the past.

NetApp, as it has historically done, has leaned right into the cloud dynamic, working to create solutions that will work with its existing technology, but allow for data to move from those platforms in and out of the public cloud providers.  They are correct (IMO) in their guess that enterprises will NOT choose to put ALL of their data in the cloud, and will be forced to maintain at least some infrastructure to support their data management and production needs.

They are also guessing that customers will want their data in the cloud to look like, and be managed like, their data residing on-premises.  This remains to be seen.  Unstructured data has been moving to SaaS platforms like Box.com and Dropbox (among others) at accelerating rates.  Further, making cloud storage look like it’s on a NetApp reminds me a bit of VTL technology, forcing a tape construct onto large-capacity disks.  There will be use cases where this is imperative, but it’s not the most efficient way to use the cloud (though it may be more functional in some cases).

What has confused many CTOs (and technology providers) is the simultaneous rise of cloud…and FLASH.  So, we’ve got one dynamic that is operationalizing unstructured (and some structured) data and getting it out of our administrative hands, and then we’ve got this other dynamic that’s geared towards our on-premise structured data, pushing the in-house application bottlenecks back up to the CPU and RAM where they belong.   I can imagine having a single conversation with a CIO convincing him on the one hand to migrate his data and apps to the cloud to optimize budget and elasticity, and before it’s over perhaps discussing the benefits of moving his most demanding internal workloads to flash-based arrays which eliminates elasticity altogether.  (Should it stay, or should it go??)

Now imagine you are the CTO at a technology company like NetApp, one that solves many problems within the enterprise problem set, for a wide array (no pun intended) of customers.  How are you supposed to set a solid technology direction when the arrows of the industry’s future all point different ways, and the arrows constantly move?  As a market leader, you’re expected to be right on EVERY guess, and if you’re not, the trolls have a field day and the market kicks you in the tail. The “direction thing” (think GHWB’s “vision thing”) has contributed to the current NetApp malaise; the outside world sees multiple flash arrays, an object storage platform, Clustered OnTAP slowly (perhaps too slowly) replacing 7-mode systems, E-Series storage, a cloud-connected backup target product acquired from Riverbed…it’s hard to see a DIRECTION here.  Internally, there have been re-orgs upon re-orgs.  But what you DO see is the multiple moving arrows pointing multiple ways- and that’s the result of NetApp trying to solve the entire enterprise problem set with regard to data storage and management.  If NetApp can be accused of a fault, it’s that it perhaps tries to solve too MANY problems at once; it’s then faced with keeping up with the changing nature of each problem, and sometimes the multiple problems force it into conflicting or multiple solution sets.

If you are a technology provider that has only one product- say, an all-flash array- you don’t have this direction problem, because your car doesn’t even have a steering wheel.  It’s a train on a track, and you’d better hope that the track leads you to the promised land (yes, I’m aware I mixed metaphors there. Moving on…).  Given the history of IT invention and innovation, I wouldn’t make that bet. If you’re not leading the market to where you’re going, you can only hope it keeps heading in your current direction.  So the tier-2 and tier-3 vendors don’t worry me too much long-term, as they’re not solving problems that others haven’t solved already; sure, they’ll grab some market share in the mid-market, but the cloud dynamic is going to hit them just as hard as the big storage vendors, and they’ll be hard-pressed to innovate into that dynamic. Combine that with the impact of newer, radically different non-volatile media already hitting early adopters, and I believe we’ll get a thinning of that herd over the next few years.

So sure, this week has been a tough one for NetApp; and I feel horrible for those that lost their jobs.  Given the rate at which other companies had been poaching NetApp’s talent prior to the layoff, I suspect any pain for these individuals will be short-lived.  (I do find it amazing that trolls that bash NetApp celebrate when someone from NetApp leaves for another firm; if they liked them so much, why have they been bashing them before? Shouldn’t they now bash their new firm?)

I do want to caution those who say that NetApp has stopped innovating or is in decline.  That could not be further from the truth.  Clustered OnTAP is an evolution of OnTAP, not simply a new release.  It’s not totally different, but it is… MORE.  It’s uniquely positioned for the cloud-based, multi-tenant, CYA, do-everything-in-every-way workload that is being demanded of storage by large enterprise and service providers.  Everyone else is still focused on solving yesterday’s problems faster (’cause it isn’t cheaper, folks).  The technology portfolio has never been stronger.  NetApp has historically been a year or more ahead of new problem sets- which certainly has negative revenue impacts as the market catches up (think iSCSI, Primary de-dupe, multi-protocol NAS, Multi-tenancy, vault-to-disk) –  but count NetApp out at your own peril.  People have been doing it since 1992 and NetApp has made fools of them all.

LACP aggregate on an Extreme Switch

Little helpful tidbit.

Setting up an LACP aggregate on an Extreme switch in the CLI is a piece of cake.

# enable sharing X grouping Y-Z lacp

Where X is the first or ‘master’ port in the aggregate, and Y-Z are the ports that will be included in the aggregate.  (Y should be the same as X in most cases.)
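
For a concrete example (the port numbers are purely illustrative), building an aggregate out of ports 1 through 4 with port 1 as the master looks like this:

# enable sharing 1 grouping 1-4 lacp

A quick “show sharing” afterwards will display the group and its LACP state.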

Remember to save, and if your LACP partner on the other end is already configured, you’ll see that everything is up!

Stop counting lines in sysconfig -r

Thank you to @scottygelb for this tidbit timesaver.

Instead of looking at sysconfig -r to evaluate what the RAID group sizes are across your NetApp (which I’ve been doing for a LONG time), use

sysconfig -V

This will give you output like this-

volume aggr0 (1 RAID group):
group 0: 7 disks
volume aggr1_fc (1 RAID group):
group 0: 13 disks

This proves that no matter how long you’ve been doing something, you can still learn more from others. Keep the mind open!

OS X built-in emergency HTTP server

Sometimes you need to serve up a file via HTTP, for instance when upgrading a NetApp Cluster via the Automated method.

With Windows, I always had to install a lightweight HTTP Server, or install IIS (ugh).

I have a MacBook Pro, and I don’t have to install anything!

I can just CD to the directory that the desired file is in, and issue the following command:

python -m SimpleHTTPServer xxxx

where “xxxx” is the TCP port number you want your laptop to listen on for HTTP requests.  If you leave it off, Python defaults to port 8000.
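
A quick worked example (the directory, port, and filename are just for illustration)- to serve something sitting in ~/Downloads on port 8080:

cd ~/Downloads
python -m SimpleHTTPServer 8080

Then point the cluster (or a quick curl from another machine) at http://<your laptop’s IP>:8080/<filename> to confirm the file is reachable.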

Just make sure you CTRL-C when you’re done, or you’re going to leave a wide-open entryway into that directory.

Hope you find this helpful, I know I did when I learned it!

NetApp connects Hadoop to NFS

The link to Val Bercovici’s article is here.

Here’s the gist-

Hadoop natively uses HDFS, a file system that’s designed to be node-level redundant. Data is replicated by default THREE times across nodes in a Hadoop cluster.  The nodes themselves, at least in the “tradition” of Hadoop, do not perform any RAID at all; if a node’s filesystem fails, the data is already contained elsewhere and any running MapReduce jobs are simply started over.
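
For reference, that three-copy behavior is simply HDFS’s default replication factor (the dfs.replication setting), and it can be inspected or changed per path from the Hadoop CLI.  A quick illustration, using a made-up path:

hdfs dfs -stat %r /data/clickstream/part-00000
hdfs dfs -setrep -w 2 /data/clickstream/part-00000

The first command shows the file’s current replication factor; the second changes it and waits for the re-replication to complete.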

This is great if you have a few thousand nodes and the people you’re crunching data for are at-large consumers who aren’t paying for your service and as such cannot expect service levels of any kind.

Enterprise, however, is a different story.  Once business units start depending on reduced results from Hadoop, they start depending on the timeframe in which it’s delivered as well.  Simply starting jobs over is NOT going to please anyone and could interrupt business processes.  Further, Enterprises don’t have the space or budget to put up Hadoop clusters with the scale the Facebooks and Yahoos do (they also don’t typically have the justifiable use cases). In fact, the Enterprises I’m working with are taking a “build it and the use cases will come” approach to Hadoop.

NetApp’s NFS connector for Hadoop significantly lowers the cost of entry for businesses who want to vet Hadoop and justify use cases.  One of the traditional problems with Hadoop is that one needs to build a siloed architecture- servers, storage, and network- at a scale that proves the worth of Hadoop.

Now, businesses can throw compute (physical OR virtual) into a Hadoop cluster and connect to existing NFS datastores – whether they are on NetApp or not!   NetApp has created this connector and thrown it upon the world as open source on GitHub.

This removes a huge barrier to entry for any NetApp (or NFS!) customer who is looking to perform analytics against an existing dataset without moving it or creating duplicate copies.

Great move.

Rant #1: Data-At-Rest Encryption

Subtitled: Data-at-rest encryption compare and contrast, Netapp FAS & VNX2

So every once in a while I run across a situation at a client where I get to see the real differences in approach between technology manufacturers.

The specific focus is on data-at-rest encryption.  Encrypting data that resides on hard drives is a good practice, provided it can be done cost-effectively, with minimal impact on performance and availability and minimal added administrative complexity. Probably the best reason I can come up with for implementing data-at-rest encryption is the drive failure case- you’re expected to send that ‘failed’ hard drive back to the manufacturer, where they will mount the drive to see just how ‘failed’ it really is.  Point is, you’ve still got data on there. If you’re a service provider, you’ve got your client’s data on there. Not good. Unless you’ve got an industrial-grade degausser, you don’t have many options here.  Some manufacturers offer a special support upgrade that lets you throw away failed drives instead of returning them, but that’s a significant bump up in cost.

OK, so now you’ve decided, sure, I want to do data-at-rest encryption.  Great!  Turn it on!

Not so fast.

The most important object in the world of encryption is the key.  Without the key, all that business-saving data you have on that expensive enterprise-storage solution is useless.  Therefore, you need to implement a key-management solution to make sure that keys are rotated, backed up, remembered, and most importantly available for a secure restore.  The key management solution, like every other important piece of IT equipment, needs a companion in DR in case the first one takes a dive.

Wait, secure restore?  Well, what’s the point of encrypting data if you make it super-simple to steal the data and then decrypt it?  Most enterprise-grade key management solutions implement quorums for key management operations, complete with smart cards, etc.   This helps prevent the all-too-common occurrence of “inside man” data theft.

NetApp’s answer to data-at-rest encryption is the self-encrypting drive.  The drives themselves perform all the encryption work and pass data to the controller as any drive would.  The biggest caveat here is that all drives in a NetApp HA pair must be of the NSE drive type, and you can’t use Flash Pool or MetroCluster.

NetApp partners with a couple of key management vendors, but OEMs the SafeNet KeySecure solution for key management.  Having the keys stored off-box ensures that if some bad guy wheels away with your entire storage device, they’ve got nuthin’.  It won’t boot without being able to reach the key management system.  SafeNet’s KeySecure adheres to many (if not all) industry standards, and can simultaneously manage keys for other storage and PKI-enabled infrastructure.  I consider this approach strategic because it thinks holistically- one place to manage keys across a wide array of resources.  Scalable administration, high availability, maximum security.  Peachy.

I had the opportunity to contrast this approach against EMC’s VNX2 data-at-rest solution.  EMC took its typical tactical approach, as I will outline next.

Instead of using encrypting drives, they chose the path of the encrypting SAS adapter – they use PMC-Sierra’s Tachyon adapter with its inline ASIC-based encryption.  So it’s important to note that this encryption technology actually has nothing to do with the VNX2 itself. More on this architectural choice later.

Where the encryption is done in a solution is just as important as the key management portion of the solution- without the keys, you’re toast.  EMC, owner of RSA, took a version of RSA key management software and implemented it in the storage processors of the VNX2.  This is something they tout as a great feature- “embedded key management”.   The problem is, they have totally missed the point.  Having a key manager on the box that contains the data you’re encrypting is a little like leaving your keys in your car.  If someone takes the entire array, they have your data.  Doesn’t this go against the very notion of encrypting data? Sure, you’re still protected from someone swiping a drive.  But entire systems get returned off lease all the time.

Now, of course if you’ve got a VNX2, you’ve got a second one in DR.  That box has its OWN embedded key manager.  Great.  Now I’ve got TWO key managers to back up every time I make a drive config change to the system (and if I change one, I’m probably changing the other).

What?  You say you don’t like this and you want to use a third-party key manager?  Nope.  VNX2 will NOT support any third-party, industry-standard-compliant key manager.  You’re stuck with the embedded one.  This embedded key manager sounds like more of an albatross than a feature.  Quite frankly, I’m very surprised that EMC would limit clients in this way, as the PMC-Sierra encryption technology that’s in the VNX2 DOES support third-party key managers!  Gotta keep RSA competitors away though; that’s more important than doing right by clients, right?
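
To make the “keys in the car” point concrete, here is a deliberately over-simplified sketch, with openssl standing in for the array’s encryption engine and a made-up key manager URL:

# Embedded key management: the key lives on the same box as the ciphertext.
openssl rand -hex 32 > /array/local.key
openssl enc -aes-256-ctr -pass file:/array/local.key -in lun0.img -out lun0.enc
# Walk off with the whole box and you have lun0.enc AND local.key- game over.

# External key management: the box must reach the key server to get its key.
curl -s https://kms.example.internal/keys/array01 > /dev/shm/array01.key
openssl enc -aes-256-ctr -pass file:/dev/shm/array01.key -in lun0.img -out lun0.enc
# Steal the (powered-off) array and all you have is ciphertext.

Same cipher, same data; the only thing that changes is where the key lives when somebody carts the hardware away.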

OK. On to the choice of the SAS encrypting adapter vs. the encrypting hard drives.

Encrypting at the SAS layer has the great advantage of being able to encrypt to any drive available on the market.  That’s a valid architectural advantage from a cost and product choice perspective. That’s where the advantages stop, however.

It should seem obvious that having many devices working in parallel on a split-up data set is much more efficient than having 100% of the data load worked on at one point (I’ll call it the bottleneck!) in the data chain.  Based on the performance-hit data supplied by the vendors, I’m probably correct.  EMC states a <5% performance hit using encryption- but with a caveat that “large block operations > 256KB” and high-throughput situations could result in higher performance degradation.  NetApp has no such performance restriction (its back-end ops are always smaller), and the encryption work is being done by many, many spindles at a time, not a single ASIC (even if there are multiple ASICs, the same point applies).  I can see how implementing an encrypting SAS adapter would be much easier to get to market quickly, and the allure of encrypt-enabling all existing drives is strong.  But architecturally, it’s way too risky to purposely introduce a single bottleneck that affects an entire system.

Coming to the end of this rant.  It just never ceases to amaze me that when you dig into the facts and the architecture, you’ll find that some manufacturers always think strategically, and others can’t stop thinking tactically.

Death of the Expert

Last week, I had the pleasure of attending the first #RWDevCon, an iOS Development/Tutorial conference in Washington, DC.  About 200 developers converged on the Liaison Hotel, and for two days engaged in some intensive demo- and lab-oriented learning.  Sessions also included some great inspirational talks from folks that have experienced, survived, and thrived in the coding business.

While I’ve done some coding myself over the years (and secretly wish I could do more of it), I was there for other reasons, primarily accompanying my 15-year-old son who has achieved a level of coding I can’t even dream of attaining.  I figured instead of holing up in my hotel room while my kid received all the golden knowledge, I’d attend the ‘beginner’ sessions and get what I could out of the experience.

One thing I definitely noticed is that programming, er… coding, has changed.  I remember the days when a programmer would write a program and would be responsible for all facets of it over time.  This included the user interface, the data flow, communications, etc.  Even at the beginning of the “App” explosion, independent developers were the cool guys to emulate.

The term that was used to describe the change to this paradigm was the “Indiepocalypse” (credit for that goes to the esteemed Ray Wenderlich, the “RW” in RWDevCon and principal at raywenderlich.com).  Projects are now all done by teams.  The ability to work effectively in a development team is now just as important as, if not more important than, the ability to code.  No more lone wolves, no more genius-in-a-closed-closet writing awesome code nobody else can understand.  Different team members now focus on what they do best- user experience, architectural design, database, etc.   It’s gotten too complex for one person to be “the expert” at all of these concepts.

So what does this have to do with Infrastructure? Infrastructure has gotten so converged that each discipline within IT architecture affects the others in ways the players and managers don’t often recognize.  There are a few lessons we can take from the “Indiepocalypse” and apply to our world:

  • Don’t rely on a single “Expert”

    All of your organization’s talent can’t be locked in one person’s head, no matter how good they are.  There are obvious “hit by a bus” consequences to this, but it creates a huge imbalance in the organization, and every decision that gets made will have that person’s stamp all over it, at the cost of everyone else’s opinions.
     
  • Constant learning and team building

    Learning, especially when done as a team, ensures that everyone gets the same opportunities to effectively contribute, and gives the team a common “code base” from which to “code” the future of the infrastructure.
     
  • Embracing Change

    Developers at this conference were there to embrace Apple’s move from Objective C to Swift, as well as newer technologies and best practices.  Staying put is not an option for anyone that wants to be needed for the next project.  Why should infrastructure be any different?  Teams need to understand ALL new technologies, whether they’re going to use them or not- otherwise how are they to effectively provide the business with the best options going forward?  How are they to adequately filter through all of the tech marketing thrown at them and bring real solutions to real problems?
     

There are plenty more parallels that can be laid out; I’ll leave it to the reader to think those out.  The point is, we need to start treating our IT architecture more like a software development environment than we have, especially since everything is becoming “Software-Defined” – there are tons of learned lessons out there, but they’re all “over the wall” in the software development departments.  Time to tear down those walls.