Gridstore was able to bring the first All-Flash HyperConverged Appliance to market, at a highly cost-effective price, by eliminating the need for wasteful replicas. Flash is an expensive resource, and throwing away 66% of the Flash a customer pays for on replicas is an extraordinary waste. But that is just part of the story.
As I wrote previously in “Problems with Server Side SAN”, all HyperConverged products on the market today use replication between nodes to protect against one or two faults. 3-way replication (the original data plus two full replicas, allowing two faults without data loss or disruption) provides a fault tolerance level of two and is recommended when using commodity hardware components that do not have redundancy built into the individual systems. Yes, all vendors allow you to use a replication factor of just one (a single replica, tolerating one fault) – but this is living close to the edge. There is widespread research, based on component failure rates of commodity hardware at companies such as Google and Amazon, that strongly recommends a replication factor of 2, that is, two full replicas.
The 3-way replica was really pioneered by Google with the Google File System. Google search is 99.999999% read-only. They write indexes once every night and read them billions of times a day. Having replicas of indexes spread across multiple nodes allows more parallel searches. Leveraging cheap capacity to do so seems like a very smart thing to do here. Google has scaled to be one of the largest infrastructures ever created. So if it works for Google – why not you? Or so goes the story from vendors selling this architecture.
Well – there are a number of reasons.
First – to state the obvious – not many enterprises, of any size, have workloads that resemble Google.
Second – the concept of replication is founded on the principle that “capacity is cheap”. This makes a certain amount of sense when you’re talking about spinning disk. But Flash storage breaks this paradigm fully. Depending on the Flash, it can cost upwards of 50X as much as “cheap capacity”. With a 3-way replica, you are literally burning two thirds (about 66%) of one of the most expensive resources in the infrastructure. Worse, not only is it an extraordinary waste of a very expensive resource, but Flash also wears out with the number of writes. With this 3-way replication model, each write I/O is amplified 3X. This costs CPU cycles and network traffic and, most importantly, wears the Flash resource 3X faster. The result is that you either pay up now by over-provisioning Flash to account for the write amplification, or pay later by replacing Flash that wore out long before its sell-by date. So not only do you throw away 66% of that Flash – you need to replace it even faster than you thought you would.
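To put rough numbers on that argument, here is a minimal sketch of the arithmetic. The function names and the sample figures (30 TB of raw Flash, 3,000 rated write cycles, 10 TB of application writes per day) are made up for illustration and do not describe any particular product. It also assumes writes are spread evenly across the pool and ignores write amplification inside the drives themselves.

```python
def replication_overhead(copies: int) -> dict:
    """Capacity and write cost of keeping `copies` full copies of every block.

    copies=3 is a 3-way replica: the original plus two full replicas.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return {
        "usable_fraction": 1 / copies,      # share of raw Flash holding unique data
        "wasted_fraction": 1 - 1 / copies,  # share of raw Flash spent on replicas
        "write_amplification": copies,      # physical writes per application write
        "faults_tolerated": copies - 1,     # copies that can be lost without data loss
    }


def flash_lifetime_years(raw_tb: float, rated_cycles: int,
                         app_writes_tb_per_day: float, copies: int) -> float:
    """Years until the Flash pool reaches its rated write cycles.

    Hypothetical model: total endurance is raw capacity times rated cycles,
    and every application write lands on `copies` devices.
    """
    total_endurance_tb = raw_tb * rated_cycles
    physical_writes_tb_per_day = app_writes_tb_per_day * copies
    return total_endurance_tb / physical_writes_tb_per_day / 365


if __name__ == "__main__":
    print(replication_overhead(3))
    for copies in (1, 3):
        years = flash_lifetime_years(30, 3000, 10, copies)
        print(f"{copies} copies: about {years:.1f} years of rated endurance")
```

Whatever sample figures you plug in, the ratio between the two lifetimes stays at 3X, which is the point: the same pool of Flash reaches its rated endurance three times sooner under a 3-way replica.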
Third – when replication is used in the context of Hyper-Converged Infrastructures, the costs literally explode. Again, the cost of replication when using “cheap capacity” may not be material. But when you replace cheap capacity with very expensive Flash, it’s a very different economic equation. While that is bad enough on its own, consider what is being replicated with Hyper-Converged. With Hyper-Converged Infrastructure (all layers of infrastructure converged into a single component), you are literally replicating the entire infrastructure stack 3X: Storage, Network, Compute and a full stack of software licenses on every one of these nodes. Replicas based on “cheap capacity” can make sense. Replicating your entire infrastructure 3X makes no sense – it is an extraordinary cost and excessive waste.
So why not switch to Erasure Encoding like Gridstore and eliminate the replicas? Because this concept, how and where data is placed, is the foundation of the architecture. Every element in the system relies on it – it is the data path, it is the inter-node protocol, it is every corner case that needs to be turned into code. Foundational architectural principles are bet-the-farm decisions – once you have committed to a path, it is almost impossible to undo unless you have the time and resources to start again.
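For readers who want to see why erasure encoding changes the economics, the sketch below compares a 3-way replica with a generic erasure-coded layout that tolerates the same two faults. The 4 data + 2 parity split is a hypothetical example chosen for round numbers, not Gridstore's actual configuration, and the formulas assume a code in which any `data_fragments` of the stored fragments are enough to rebuild the data.

```python
def replica_layout(copies: int) -> dict:
    """Overhead of keeping `copies` full copies (3 = original plus two replicas)."""
    return {
        "usable_fraction": 1 / copies,
        "write_amplification": float(copies),
        "faults_tolerated": copies - 1,
    }


def erasure_layout(data_fragments: int, parity_fragments: int) -> dict:
    """Overhead of splitting each block into data fragments plus parity fragments.

    Assumes any `data_fragments` of the stored fragments can rebuild the block,
    so up to `parity_fragments` fragments may be lost.
    """
    total = data_fragments + parity_fragments
    return {
        "usable_fraction": data_fragments / total,
        "write_amplification": total / data_fragments,
        "faults_tolerated": parity_fragments,
    }


if __name__ == "__main__":
    layouts = {
        "3-way replica": replica_layout(3),
        "4+2 erasure coded (hypothetical)": erasure_layout(4, 2),
    }
    for name, stats in layouts.items():
        print(f"{name}: usable {stats['usable_fraction']:.0%}, "
              f"writes x{stats['write_amplification']:.1f}, "
              f"tolerates {stats['faults_tolerated']} faults")
```

Under these assumptions both layouts survive two faults, but the erasure-coded one keeps about two thirds of the raw Flash usable instead of one third and writes 1.5X instead of 3X per application write. The sketch says nothing about the data path or inter-node protocol needed to get there, which is exactly the part that is hard to retrofit.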