What Is Z-NAND?

When Samsung first announced Z-NAND in 2016, it was a year after 3D XPoint memory was announced and before any Optane products had shipped. Samsung was willing to preview some information about the Z-NAND based drives that were on the way, but for a year and a half they kept almost all information about Z-NAND itself under wraps. Initially, the company would only state that Z-NAND was a high-performance derivative of their V-NAND 3D NAND flash memory. At Flash Memory Summit 2017, they confirmed that Z-NAND is an SLC (one bit per cell) memory, and announced that a second generation of Z-NAND would introduce an MLC (two bit per cell) version. (For reference, mainstream NAND flash is now almost always three bit per cell TLC.)

If simply operating existing NAND as SLC were all there is to Z-NAND, then we would expect Toshiba, Western Digital, and SK Hynix to have delivered their own competitors by now. But further tweaks are required to challenge 3D XPoint. A year ago at IEEE's International Solid-State Circuits Conference (ISSCC), Samsung pulled back the veil a bit and shared more information about Z-NAND. The full presentation was not made public, but PC Watch's coverage captured the important details. Samsung's first-generation Z-NAND is a 48-layer part with a capacity of 64Gb per die. Samsung's mainstream capacity-optimized NAND is currently transitioning from 64 layers to what's officially "9x" layers, most likely 96. There are probably several reasons why Z-NAND is lagging behind by almost two generations of manufacturing tech, but one important factor is that adding layers can be detrimental to performance.

Samsung 3D NAND Comparison

Generation              | 48L SLC Z-NAND | 48L TLC      | 64L TLC      | 9xL TLC
Nominal Die Capacity    | 64Gb (8GB)     | 256Gb (32GB) | 512Gb (64GB) | 256Gb (32GB)
Read Latency (tR)       | 3 µs           | 45 µs        | 60 µs        | 50 µs
Program Latency (tPROG) | 100 µs         | 660 µs       | 700 µs       | 500 µs
Page Size               | 2kB, 4kB       | 16kB         | 16kB         | 16kB?

Compared to their past few generations of TLC NAND, Samsung's SLC Z-NAND improves read latency by a factor of 15-20, but program latency only improves by a factor of 5-7. Note, however, that the read and program times shown above denote how long it takes to transfer information between the flash memory array and the on-chip buffers, so that 3µs read time doesn't include transferring the data to the SSD controller, let alone shipping it over the PCIe link to the CPU.
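
As a quick sanity check, those improvement factors follow directly from the latencies in the table above; a minimal sketch using only the figures quoted there:

```python
# Sanity check of the improvement factors quoted above, using the
# latencies from Samsung's ISSCC disclosure as listed in the table.
tlc_read_us = [45, 60, 50]      # tR for 48L, 64L, 9xL TLC
tlc_prog_us = [660, 700, 500]   # tPROG for 48L, 64L, 9xL TLC
znand_read_us = 3               # tR for 48L SLC Z-NAND
znand_prog_us = 100             # tPROG for 48L SLC Z-NAND

read_factors = [round(t / znand_read_us, 1) for t in tlc_read_us]
prog_factors = [round(t / znand_prog_us, 1) for t in tlc_prog_us]

print("Read latency improvement:", read_factors)     # [15.0, 20.0, 16.7]
print("Program latency improvement:", prog_factors)  # [6.6, 7.0, 5.0]
```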

With Samsung using 16kB pages for their TLC NAND, the 4kB page size for SLC Z-NAND seems like a reasonable choice, representing only a slight shrink in the number of memory cells per page. However, the ability to instead operate with a 2kB page size indicates that small page sizes are an important part of the performance enhancements Z-NAND is supposed to offer.
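
To see why the 4kB SLC page is only a slight shrink, here is a rough back-of-the-envelope estimate. It assumes the usual NAND organization, where a TLC wordline stores three logical pages in the same set of cells and an SLC wordline stores one; that organization is an assumption on our part, not something Samsung has detailed for Z-NAND.

```python
# Back-of-the-envelope estimate of memory cells per page. Assumes the
# common NAND organization: a wordline has one cell per bit of a single
# page, and with TLC those same cells are shared across three logical
# pages. This organization is an assumption, not a Samsung disclosure.
def cells_per_page(page_kib, bits_per_cell):
    page_bits = page_kib * 1024 * 8
    return page_bits / bits_per_cell  # cells attributable to one page

tlc_16k = cells_per_page(16, 3)   # ~43,691 cells per 16kB TLC page
slc_4k = cells_per_page(4, 1)     # 32,768 cells per 4kB SLC page
slc_2k = cells_per_page(2, 1)     # 16,384 cells per 2kB SLC page

print(f"4kB SLC page uses {slc_4k / tlc_16k:.0%} as many cells as a 16kB TLC page")
print(f"2kB SLC page uses {slc_2k / tlc_16k:.0%} as many cells as a 16kB TLC page")
```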

Missing from this data set is information about the erase block size and erase time. Erasing flash memory is a much slower process than the program operation and it requires activating large and power-hungry charge pumps to generate the high voltages necessary. For this reason, all NAND flash memory groups many pages together to form each erase block, which nowadays tends to be at least several megabytes.
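
To put that grouping in perspective, here is a small illustrative calculation with assumed sizes; neither the erase block size nor the pages-per-block count has been disclosed for Z-NAND.

```python
# Illustration only: the erase block size and page count per block are
# not disclosed for Z-NAND, so both figures here are assumptions.
erase_block_bytes = 4 * 1024 * 1024   # assume a 4MB erase block
page_bytes = 4 * 1024                 # assume the 4kB Z-NAND page size

pages_per_block = erase_block_bytes // page_bytes
print(pages_per_block, "pages share a single erase operation")  # 1024
```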

Samsung's Z-NAND may be able to offer far better read and program times than mainstream NAND, but they may not have been able to improve erase times as much. And shrinking erase blocks would significantly inflate the die space required for peripheral circuitry, further harming memory density that is already at a steep disadvantage for 48L SLC compared to mainstream 64L+ TLC.

Test System

Intel provided our enterprise SSD test system, one of their 2U servers based on the Xeon Scalable platform (codenamed Purley). The system includes two Xeon Gold 6154 18-core Skylake-SP processors, and 16GB DDR4-2666 DIMMs on all twelve memory channels for a total of 192GB of DRAM. Each of the two processors provides 48 PCI Express lanes plus a four-lane DMI link. The allocation of these lanes is complicated. Most of the PCIe lanes from CPU1 are dedicated to specific purposes: the x4 DMI plus another x16 link go to the C624 chipset, and there's an x8 link to a connector for an optional SAS controller. This leaves CPU2 providing the PCIe lanes for most of the expansion slots, including most of the U.2 ports.

Enterprise SSD Test System
System Model   | Intel Server R2208WFTZS
CPU            | 2x Intel Xeon Gold 6154 (18C, 3.0GHz)
Motherboard    | Intel S2600WFT
Chipset        | Intel C624
Memory         | 192GB total, Micron DDR4-2666 16GB modules
Software       | Linux kernel 4.19.8, fio version 3.12

Thanks to StarTech for providing a RK2236BKF 22U rack cabinet.

The enterprise SSD test system and most of our consumer SSD test equipment are housed in a StarTech RK2236BKF 22U fully-enclosed rack cabinet. During testing for this review, the front door on this rack was generally left open to allow better airflow, since the rack doesn't include exhaust fans of its own. The rack is currently installed in an unheated attic and it's the middle of winter, so this setup provided a reasonable approximation of a well-cooled datacenter.

The test system is running a Linux kernel from the most recent long-term support branch. This brings in the latest Meltdown/Spectre mitigations, though strategies for dealing with Spectre-style attacks are still evolving. The benchmarks in this review are all synthetic benchmarks, with most of the IO workloads generated using FIO. Server workloads are too widely varied for it to be practical to implement a comprehensive suite of application-level benchmarks, so we instead try to analyze performance on a broad variety of IO patterns.

Enterprise SSDs are specified for steady-state performance and don't include features like SLC caching, so the duration of a benchmark run has little effect on the score as long as the drive has been thoroughly preconditioned. Except where otherwise specified, for tests that include random writes the drives were prepared with at least two full drive writes of 4kB random writes, and for all other tests they were prepared with at least two full sequential write passes.
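
For readers who want to replicate this kind of preconditioning, here is a minimal sketch of how it might be scripted around fio. The device path, queue depth, and block sizes are assumptions for illustration rather than the exact scripts used for this review, and running it will destroy all data on the target drive.

```python
# Minimal preconditioning sketch built around fio. Device path, queue
# depth, and block sizes are illustrative assumptions, not the review's
# actual scripts. WARNING: this overwrites the entire target device.
import subprocess

DEVICE = "/dev/nvme0n1"  # hypothetical target drive

def precondition(device: str, rw: str, bs: str, passes: int = 2) -> None:
    """Cover the whole drive `passes` times with the given write pattern."""
    subprocess.run([
        "fio",
        "--name=precondition",
        f"--filename={device}",
        "--direct=1",              # bypass the page cache
        "--ioengine=libaio",
        f"--rw={rw}",              # 'randwrite' or 'write'
        f"--bs={bs}",
        "--iodepth=32",
        f"--loops={passes}",       # each loop covers the full device once
    ], check=True)

if __name__ == "__main__":
    # Before random-write tests: at least two full drive writes of 4kB random writes.
    precondition(DEVICE, rw="randwrite", bs="4k")
    # Before other tests: at least two full sequential write passes.
    precondition(DEVICE, rw="write", bs="128k")
```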

Our drive power measurements are conducted with a Quarch XLC Programmable Power Module. This device supplies power to drives and logs both current and voltage simultaneously. With a 250kHz sample rate and precision down to a few mV and mA, it provides a very high resolution view into drive power consumption. For most of our automated benchmarks, we are only interested in averages over time spans on the order of at least a minute, so we configure the power module to average together its measurements and only provide about eight samples per second, but internally it is still measuring at 4µs intervals so it doesn't miss out on short-term power spikes.
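
The averaging step can be illustrated with a short sketch. This is not Quarch's actual API, just a toy model of downsampling 250kHz samples into roughly eight reported values per second, which shows why short spikes still contribute to the reported averages.

```python
# Toy model of the downsampling described above: raw samples arrive at
# 250kHz (one every 4µs) but only ~8 averaged readings per second are
# reported. Not Quarch's API -- just an illustration of the averaging.
import random

SAMPLE_RATE_HZ = 250_000                    # one raw sample every 4 microseconds
REPORT_RATE_HZ = 8                          # roughly eight averaged readings per second
WINDOW = SAMPLE_RATE_HZ // REPORT_RATE_HZ   # 31,250 raw samples per reported value

def downsample(raw_watts):
    """Average consecutive windows of raw samples into reported readings."""
    reported = []
    for i in range(0, len(raw_watts) - WINDOW + 1, WINDOW):
        window = raw_watts[i:i + WINDOW]
        reported.append(sum(window) / len(window))
    return reported

# One second of fake raw data: ~5W idle with rare brief spikes toward 9W.
raw = [5.0 + (4.0 if random.random() < 0.001 else 0.0) for _ in range(SAMPLE_RATE_HZ)]
print(downsample(raw))  # 8 averaged values; the spikes still pull the averages up slightly
```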

Comments

  • patrickjp93 - Tuesday, February 19, 2019

    Okay, no, just no. Pursuing normalisation beyond 3rd normal form is lunacy. You actually start losing ground on compression at that point, and your queries get ridiculously more verbose. 4th & 5th normal form are touted by academics who never have to work with them and DBAs who have more time on their hands than sense.

    Anyone who's ever worked even in 3rd normal form knows those pivot tables that are just key-key pairings are ridiculous wastes of space and code for an extra join or the smartass who'll use Unpivot+Roll-up which 90% of SQL users will not understand and which doesn't perform any better than just using another join!
  • prisonerX - Wednesday, February 20, 2019

    You really don't have a clue about databases, normalized or otherwise. You're entirely FOS.
  • JTBM_real - Thursday, February 21, 2019

    Database servers have evolved a lot. High-end database servers keep the whole database in memory and have scalable CPU capacity - in practice, multiple CPUs. Any processing and storage that is not the database itself can be pushed to other purpose-built servers. A purpose-built server can be processing- or storage-heavy as needed.

    If you go to the extremes and cannot build a bigger piece of iron, you can split your database and have two (or more). This is probably only an issue at Google, for example.
  • Opencg - Tuesday, February 19, 2019

    no price is too high to stop me from instalocking pharah
  • npz - Tuesday, February 19, 2019

    Intel Optane P4800X which it's competing against is $3,420.99 for only 750GB
    https://www.newegg.com/Product/Product.aspx?Item=3...

    and the 1.5TB version is almost twice as much.

    so the 983 ZET 960GB is a relative bargain, haha.
  • npz - Tuesday, February 19, 2019

    you also have to remember that since they are going back to SLC instead of trying to stuff as many bits onto the same silicon as possible, plus the new technology, of course it's going to be expensive.
  • Samus - Tuesday, February 19, 2019

    Going 'back' to SLC is relatively simple. You don't need to manufacture NAND any different to dictate the voltage states. That is configured at the firmware level. Hence SLC caching occurs on "TLC" NAND.

    The other improvements that define Z-NAND are some manufacturing tweaks, but there is nothing stopping Samsung from running the NAND in this drive in MLC or TLC mode, effectively giving you 2-3x the storage capacity. Performance and endurance would obviously take a hit.

    But the big difference between THIS SLC and previous SLC is that the manufacturing technologies that have arrived since SLC was effectively retired (3D NAND, newer process nodes, etc.) have all improved tremendously. Ten years ago there was no way to physically fit enough NAND dies running in SLC mode on even a PCIe card to reach 1TB.

    The only other company doing anything like this with modern SLC is SoliData, and it's mostly for mission-critical military applications (SLC mode, inherently running fewer voltage states, tolerates high temperatures better).
  • FunBunny2 - Tuesday, February 19, 2019

    "Going 'back' to SLC is relatively simple. You don't need to manufacture NAND any different to dictate the voltage states. That is configured at the firmware level. Hence SLC caching occurs on "TLC" NAND."

    I seem to recall that some place (here?) described whether the physical cell is constructed the same whatever the xLC. I don't remember the answer. But don't the control structures have to be different (more complicated) for [M/T/Q]LC than SLC? There are [T/Q]LC SSDs with SLC cache, yes? The question is whether embedding SLC into a [T/Q]LC SSD behaves just like a native SLC drive. I don't know, but I'd guess it matters.
  • Samus - Tuesday, February 19, 2019

    So, I'm not a NAND engineer, but my understanding is that NAND cells are optimized for voltage translation, but at the same time NAND, let's say TLC, isn't manufactured with a specific SLC 'zone'; the controller firmware simply decides to run a portion of the NAND, whatever it may be, in a single voltage state.

    That said, I don't see why you couldn't just run an entire "TLC" NAND die in SLC mode and reap the benefits at a loss of capacity. It would simply be a reprovisioning.
  • Samus - Tuesday, February 19, 2019

    I would say the 900P is more similar to this based on endurance, and the 900P is half the price.
