Storage lies at the heart of enterprise IT infrastructure, so reliability and availability are essential criteria. No storage medium is perfect, and flash is no exception. Flash delivers exceptional performance, power efficiency, and hardware reliability, but the silicon itself wears out over time: each cell supports only a fixed number of erase operations, or “erase cycles”. Carefully managing erase cycles is therefore paramount for any enterprise flash-based storage system. Nimbus addresses the erase-cycle issue with a multi-pronged strategy of quality hardware, smart software, and classic redundancy.

There are three grades of NAND silicon: MLC (multi-level cell), EMLC (enterprise multi-level cell), and SLC (single-level cell). The grades are distinguished primarily by the number of erase cycles each supports before the silicon wears out. MLC silicon provides the fewest, typically 1,000 to 5,000 erase cycles, and as a result it is commonly used in consumer, non-mission-critical devices such as tablets, smartphones, netbooks, and cameras. EMLC provides 30,000 erase cycles, roughly 10 times the durability of MLC. EMLC is a variant of MLC technology that is harvested from the highest-quality portion of the NAND wafer and programmed differently to increase its erase-cycle rating. SLC, the rarest and most expensive NAND, provides 100,000 erase cycles. Because it is produced with a completely different wafer process, it cannot leverage the economies of scale of MLC fabrication and costs significantly more than EMLC NAND. In fact, SLC accounts for less than 5% of worldwide NAND output, and some analysts question whether such low manufacturing volumes can sustain SLC production in the future. The industry is clearly converging on the multi-level cell platform: MLC and EMLC.

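The “roughly 10 times” figure depends on which end of the MLC range is used as the baseline. The short Python sketch below simply tabulates the ratings quoted above and computes the multiples; the 3,000-cycle “mid” MLC entry is an illustrative assumption, not a figure from the text.

```python
# Erase-cycle ratings quoted in the text. MLC is given as a range, so both
# ends are listed; the 3,000-cycle midpoint is an illustrative assumption.
ERASE_CYCLES = {
    "MLC (low)": 1_000,
    "MLC (mid, assumed)": 3_000,
    "MLC (high)": 5_000,
    "EMLC": 30_000,
    "SLC": 100_000,
}


def endurance_multiple(grade: str, baseline: str) -> float:
    """How many times more erase cycles `grade` supports than `baseline`."""
    return ERASE_CYCLES[grade] / ERASE_CYCLES[baseline]


if __name__ == "__main__":
    for mlc in ("MLC (low)", "MLC (mid, assumed)", "MLC (high)"):
        print(f"EMLC vs {mlc}: {endurance_multiple('EMLC', mlc):.0f}x")
```

This prints 30x, 10x, and 6x respectively, so the 10x figure corresponds to an MLC part rated at about 3,000 cycles.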
The differences in NAND durability are analogous to the different grades of hard drives, such as consumer-grade SATA, enterprise SATA, and SAS/Fibre Channel drives, each of which has significantly different MTBF ratings and error-recovery capabilities. Just as you would not deploy consumer-grade SATA drives in a mission-critical, high-IO storage environment, you should not deploy MLC NAND there either. Furthermore, as NAND geometries shrink, the erase cycles available from MLC NAND decrease further, whereas EMLC NAND delivers a predictable 30,000 erase cycles. This is why Nimbus uses EMLC silicon in its S-Class and E-Class systems: starting with silicon that is 10 times more durable reduces risk and significantly extends the achievable lifespan.

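To make “significantly longer lifespan” concrete, a common back-of-the-envelope endurance estimate divides the total NAND writes the media can absorb (capacity × erase cycles) by the NAND writes per day (host writes × write amplification). The sketch below assumes perfectly even wear leveling, ignores reserve NAND, and uses a hypothetical workload (10 TB usable, 20 TB of host writes per day, write amplification of 2) and a hypothetical 3,000-cycle MLC part; none of these numbers come from Nimbus.

```python
def estimated_lifetime_years(
    capacity_tb: float,
    erase_cycles: int,
    host_writes_tb_per_day: float,
    write_amplification: float,
) -> float:
    """Rough endurance estimate assuming perfectly even wear leveling.

    Total NAND writes the media can absorb = capacity * erase cycles.
    NAND writes per day = host writes * write amplification.
    Reserve NAND is ignored, so the estimate is conservative.
    """
    if host_writes_tb_per_day <= 0 or write_amplification < 1:
        raise ValueError("need positive host writes and write amplification >= 1")
    total_writable_tb = capacity_tb * erase_cycles
    nand_writes_per_day = host_writes_tb_per_day * write_amplification
    return total_writable_tb / nand_writes_per_day / 365


if __name__ == "__main__":
    # Hypothetical workload: 10 TB usable, 20 TB of host writes/day, WA of 2.
    for grade, cycles in (("MLC", 3_000), ("EMLC", 30_000)):
        years = estimated_lifetime_years(10, cycles, 20, 2.0)
        print(f"{grade}: about {years:.1f} years")
```

Under these assumptions the MLC configuration lasts about 2.1 years and the EMLC configuration about 20.5 years. Because the estimate is linear in erase cycles, the 10x endurance advantage carries straight through to lifespan.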
Next, write and erase activity must be spread evenly across all of the NAND silicon and the cells and bits within it. In the Nimbus S-Class and E-Class, this vital function is fully offloaded to a dedicated flash management processor in each flash module. This ASIC is designed specifically for NAND management, including wear leveling, ECC, safeguarding against errant reprogramming of cells (disturb management), and garbage collection (the process by which previously deleted data is actually erased from the flash media). In write-intensive environments this offload is vital, because the system processors are too far removed from the NAND to perform these functions effectively.

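The core idea of wear leveling can be illustrated with a toy allocator that always hands out the free block with the lowest erase count (often called dynamic wear leveling). This is a hypothetical sketch, not the algorithm used by the Nimbus flash management ASIC; real controllers also perform static wear leveling, which periodically moves rarely-changing data off lightly worn blocks. The class and method names are illustrative.

```python
import heapq


class WearLevelingAllocator:
    """Toy dynamic wear leveling: always hand out the least-erased free block."""

    def __init__(self, num_blocks: int) -> None:
        # Min-heap of (erase_count, block_id) for blocks that are erased and free.
        self._free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self._free)
        self.erase_counts = [0] * num_blocks

    def allocate(self) -> int:
        """Return the free block with the lowest erase count."""
        if not self._free:
            raise RuntimeError("no free blocks; garbage collection needed")
        _, block = heapq.heappop(self._free)
        return block

    def erase(self, block: int) -> None:
        """Erase a block (one erase cycle) and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self._free, (self.erase_counts[block], block))


if __name__ == "__main__":
    alloc = WearLevelingAllocator(8)
    for _ in range(1000):
        alloc.erase(alloc.allocate())  # write a block, later erase it
    print(min(alloc.erase_counts), max(alloc.erase_counts))
```

With 8 blocks and 1,000 write/erase rounds, every block ends up with exactly 125 erase cycles; without the least-worn-first policy, a hot block could absorb most of the wear.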
Nimbus flash modules are further equipped with 28% reserve NAND, that is, extra NAND beyond the advertised capacity of the module. For example, a 10 TB Nimbus solid-state storage enclosure contains close to 13 TB of physical NAND (10 TB × 1.28 = 12.8 TB). This reserve serves two purposes. First, it provides spare capacity in case any NAND cells age prematurely, similar to the bad-block reserve found on hard drives: if a particular NAND cell is determined to be unrecoverable, a cell from the reserve pool is transparently provisioned in its place. Second, the reserve provides free space in which garbage collection can run intelligently without degrading IO performance. Consumer-grade SSDs typically lack such a large reserve and can fall victim to a “write cliff”, in which garbage collection overwhelms the SSD and causes a sudden, significant drop in performance. Nimbus systems are engineered to avoid a write cliff.

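Both the capacity arithmetic and the spare-pool idea can be sketched briefly. The version below is hypothetical: it remaps at the granularity of whole blocks (as flash controllers typically do) rather than individual cells, uses illustrative names, and does not model the second purpose of the reserve, garbage-collection headroom.

```python
def raw_nand_tb(advertised_tb: float, reserve_fraction: float = 0.28) -> float:
    """Physical NAND when the reserve is a fraction of advertised capacity."""
    return advertised_tb * (1 + reserve_fraction)


class SparePool:
    """Toy bad-block remapping: a failed block is replaced from the reserve."""

    def __init__(self, spare_blocks) -> None:
        self._spares = list(spare_blocks)
        self.remap = {}  # logical block id -> physical spare block id

    def retire(self, logical_block: int) -> int:
        """Back `logical_block` with a fresh spare; return the spare's id."""
        if not self._spares:
            raise RuntimeError("reserve NAND exhausted")
        spare = self._spares.pop()
        self.remap[logical_block] = spare
        return spare

    def resolve(self, logical_block: int) -> int:
        """Physical block currently backing `logical_block`."""
        return self.remap.get(logical_block, logical_block)


if __name__ == "__main__":
    print(f"{raw_nand_tb(10):.1f} TB of NAND behind 10 TB advertised")
    pool = SparePool([100, 101])
    pool.retire(5)          # block 5 found unrecoverable
    print(pool.resolve(5))  # now served from a spare block
```

Because lookups go through `resolve`, the host keeps using the same block address after a failure, which is what makes the substitution transparent.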
The HALO operating system also supports TRIM, a command designed to reduce write amplification and enable more intelligent garbage collection. When data is deleted on the system, the Nimbus software can tell the flash module exactly which NAND cells are affected. The flash module then no longer treats that data as live, so it does not relocate it during garbage collection, avoiding the read-erase-modify-write cycle that garbage collection would otherwise invoke. The result is fewer writes to the flash and improved endurance.

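The benefit of TRIM shows up in how many pages garbage collection must copy before erasing a block. In this hypothetical sketch, a fully written 64-page block has had 48 pages' worth of data deleted by the host. Without TRIM the device still believes all 64 pages are live and relocates them; with TRIM it relocates only the 16 that remain. Page states and names are illustrative, not HALO's interface.

```python
from enum import Enum


class Page(Enum):
    FREE = "free"
    VALID = "valid"      # device believes the page holds live data
    INVALID = "invalid"  # stale; can be discarded when the block is erased


class FlashBlock:
    def __init__(self, pages_per_block: int) -> None:
        self.pages = [Page.FREE] * pages_per_block

    def write_all(self) -> None:
        self.pages = [Page.VALID] * len(self.pages)

    def trim(self, page_indices) -> None:
        """Host informs the device that these pages no longer hold live data."""
        for i in page_indices:
            if self.pages[i] is Page.VALID:
                self.pages[i] = Page.INVALID

    def garbage_collect(self) -> int:
        """Copy still-valid pages elsewhere, erase the block, return pages copied."""
        relocated = sum(1 for p in self.pages if p is Page.VALID)
        self.pages = [Page.FREE] * len(self.pages)
        return relocated


def relocations_after_delete(pages: int, deleted: int, use_trim: bool) -> int:
    block = FlashBlock(pages)
    block.write_all()
    if use_trim:
        block.trim(range(deleted))  # without TRIM the device never learns of the delete
    return block.garbage_collect()


if __name__ == "__main__":
    print("without TRIM:", relocations_after_delete(64, 48, use_trim=False))
    print("with TRIM:   ", relocations_after_delete(64, 48, use_trim=True))
```

Every relocated page is an extra NAND write, so cutting relocations from 64 to 16 directly lowers write amplification.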
Nimbus’ HALO software provides insight into flash reliability by periodically collecting vital flash-life statistics, such as the total gigabytes written, the percentage of reserve NAND remaining, and the overall flash life expectancy. HALO makes this data available to end users who want to track these statistics, and Nimbus’ built-in event notification feature monitors the same data, notifying administrators if certain thresholds are breached.
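Threshold-based notification of this kind amounts to comparing each collected statistic against a limit. The sketch below is hypothetical: the field names, the 20% reserve and 10% life-remaining thresholds, and the alert wording are assumptions for illustration, not HALO's actual configuration or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlashHealth:
    gigabytes_written: float
    reserve_remaining_pct: float  # percentage of reserve NAND still unused
    life_remaining_pct: float     # estimated flash life remaining


@dataclass
class Thresholds:
    # Hypothetical default limits; real values would be site-specific.
    min_reserve_pct: float = 20.0
    min_life_pct: float = 10.0


def check_thresholds(health: FlashHealth, limits: Thresholds) -> list[str]:
    """Return one alert message per breached threshold (empty if healthy)."""
    alerts = []
    if health.reserve_remaining_pct < limits.min_reserve_pct:
        alerts.append(
            f"reserve NAND at {health.reserve_remaining_pct:.0f}% "
            f"(threshold {limits.min_reserve_pct:.0f}%)"
        )
    if health.life_remaining_pct < limits.min_life_pct:
        alerts.append(
            f"flash life remaining at {health.life_remaining_pct:.0f}% "
            f"(threshold {limits.min_life_pct:.0f}%)"
        )
    return alerts


if __name__ == "__main__":
    sample = FlashHealth(gigabytes_written=250_000, reserve_remaining_pct=15, life_remaining_pct=60)
    for message in check_thresholds(sample, Thresholds()):
        print("ALERT:", message)
```

A monitoring loop would collect a fresh `FlashHealth` snapshot periodically and forward any returned messages to administrators.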