February 22, 2016
The trade-off between flash storage and spinning disk has always been the same. Flash is far cheaper per I/O operation per second (IOPS), but per byte of data it simply could not compete. That added cost per byte long relegated flash storage to thumb drives, some laptops, cell phones, and other gadgets where a spinning disk was physically too large to be practical.
Founded in 2009, Pure Storage built an all-flash array to change that tired dynamic, and the startup has been a major disruptor within the storage industry ever since. In an industry that has remained relatively unchanged for twenty years, Pure Storage manages to deliver the performance of flash at a price that can compete with traditional hard drive storage.
To reduce the price of flash, Pure Storage had to reduce the cost per byte on its arrays, and it accomplished this through several data reduction techniques, chief among them data deduplication.
What is Flash Storage?
Flash storage is an alternative to a hard disk drive (HDD). Commonly packaged as a solid-state drive (SSD), flash storage replaces the spinning disk with memory chips. Unlike system memory, flash holds its electrical charge after power has been removed, and so it “remembers” the data that has been stored.
Data Deduplication in Pure Storage All Flash Arrays
All data can be boiled down to binary code, a string of 1s and 0s. With only two possible values in play, identical blocks of data are bound to show up over and over again. Data deduplication recognizes these redundant blocks as new data is written and, rather than storing them again, records a reference to the copy that already exists.
Through this technique, a block of data may be used in fifteen different files but exist only once in the array. When a server asks the storage array to retrieve a file, the array gathers that file's unique blocks and follows references to the shared blocks as needed.
- Global Reduction: An important element to note in Pure Storage’s deduplication capabilities is that the deduplication procedure is applied globally across the entire array and not just a single drive. If a block of data is repeated anywhere in the array, the optimization software will quickly locate it and eliminate it.
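The idea can be sketched in a few lines of Python. This is a toy model, not Pure Storage's implementation: the fixed 4 KB block size, the SHA-256 fingerprint, and the `DedupStore` class are all assumptions made for illustration. A single block table is shared by every file, which is what makes the deduplication "global."

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Toy block store: one block table shared by every file (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, stored exactly once
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Keep the block only if no file has written identical data before.
            self.blocks.setdefault(fingerprint, block)
            refs.append(fingerprint)
        self.files[name] = refs

    def read(self, name):
        # Rebuild the file by following its references into the shared table.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def reduction_ratio(self):
        # Logical bytes written by all files vs. bytes physically stored.
        logical = sum(len(self.blocks[fp])
                      for refs in self.files.values() for fp in refs)
        physical = sum(len(block) for block in self.blocks.values())
        return logical / physical if physical else 1.0


store = DedupStore()
shared = b"A" * BLOCK_SIZE
store.write("a.bin", shared + b"B" * BLOCK_SIZE)
store.write("b.bin", shared + b"C" * BLOCK_SIZE)
print(len(store.blocks))  # 3 blocks stored for 4 blocks written
```

The two files write four blocks between them, but the shared block is kept once, so only three are stored.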
Data Deduplication Problems with Disk
Data reduction techniques and data deduplication are not new. They’ve been around in the storage industry for some time, but data deduplication has always been a mixed bag on a spinning disk.
When redundant blocks of data are eliminated and consolidated, a file's data ends up dispersed across the physical surface of the spinning disk, so the file is no longer all in one place. In flash that isn’t much of an issue, but spinning disk relies on a mechanical arm and a spinning motor. The arm now needs to physically seek out the data in more than one place, and that back-and-forth movement drives up the latency on an already comparatively slow medium.
As a result, deduplication on disk is generally reserved for archiving, secondary storage, and other tasks where latency is not a concern.
However, in an all-flash array, the same problem does not arise. There is no mechanical arm needed to seek out data. Everything is done electronically on a chip. Furthermore, flash storage already performs at speeds far beyond a spinning disk, so flash can afford to take a small hit to performance in the name of data reduction.
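A rough way to picture the scattering is to count how often a read has to jump to a non-adjacent location. The sketch below is only a simplified model: the `count_seeks` helper and the block addresses are made up for illustration, and real drives have caches and schedulers that this ignores.

```python
def count_seeks(block_addresses):
    """Count jumps between non-adjacent addresses when reading blocks in order.

    The initial positioning of the arm is not counted.
    """
    seeks = 0
    for prev, cur in zip(block_addresses, block_addresses[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks


contiguous_file = [100, 101, 102, 103]    # blocks laid out side by side
deduplicated_file = [100, 7342, 101, 58]  # two blocks now live elsewhere

print(count_seeks(contiguous_file))    # 0
print(count_seeks(deduplicated_file))  # 3
```

On a spinning disk each of those jumps is a physical movement of the arm; on flash, every address costs about the same to reach.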
Other Data Reduction Technologies That Compete With Spinning Disk
Aside from data deduplication, Pure Storage uses four other distinct data reduction techniques to compress data, save space, and overcome the perennial problem holding flash back from seizing the market.
- Pattern Removal: Identifies and removes repetitive binary patterns
- Inline Compression: Changes the format of the data to use less capacity
- Deep Reduction: A heavier-weight compression algorithm
- Copy Reduction: Instead of rewriting the file, Pure Storage uses metadata to create copies
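The techniques in this list can be loosely illustrated with Python's standard library. The source does not say which algorithms Pure Storage uses, so everything here is a stand-in: `zlib` at a low and a high compression level plays the role of the lighter and heavier compression passes, a single repeated byte plays the role of a "pattern," and a dictionary of block references plays the role of metadata. All function names are hypothetical.

```python
import zlib


def is_repeating_pattern(block):
    """Pattern removal stand-in: True if the block is one byte value repeated."""
    return len(block) > 0 and block == block[:1] * len(block)


def inline_compress(block):
    """Inline compression stand-in: fast, light compression at write time."""
    return zlib.compress(block, 1)


def deep_reduce(block):
    """Deep reduction stand-in: slower, heavier compression applied later."""
    return zlib.compress(block, 9)


def clone(file_table, source, copy_name):
    """Copy reduction stand-in: the copy is a new list of block references.

    No block data is duplicated; only the metadata grows.
    """
    file_table[copy_name] = list(file_table[source])


data = b"flash storage " * 500
assert zlib.decompress(inline_compress(data)) == data  # lossless round trip
assert zlib.decompress(deep_reduce(data)) == data

file_table = {"vol1": ["fp-a", "fp-b"]}  # made-up block fingerprints
clone(file_table, "vol1", "vol1-copy")
```

A block that matches a known pattern, such as all zeros, never needs to be stored at all, and a clone costs only a few references no matter how large the original is.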
Data Reduction Results
Though Pure Storage recognizes that all environments are different and those differences will greatly affect how much compression will occur, they confidently claim a reduction ratio as high as 10:1.
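To see what a reduction ratio means for cost per byte, the arithmetic can be sketched as follows. The raw capacity and price below are hypothetical numbers chosen only to make the math easy to follow; they are not Pure Storage figures, and the helper names are invented for this example.

```python
def effective_capacity_tb(raw_capacity_tb, reduction_ratio):
    """Logical data that fits on the array at a given reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return raw_capacity_tb * reduction_ratio


def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Cost per gigabyte of logical data after reduction."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


# Hypothetical inputs: 10 TB of raw flash at $5.00 per raw GB, reduced 10:1.
print(effective_capacity_tb(10, 10))    # 100
print(effective_cost_per_gb(5.00, 10))  # 0.5
```

At a lower ratio, say 4:1, the same hypothetical array would hold 40 TB at $1.25 per GB, which is why the achieved ratio matters so much to the comparison with disk.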