Serial NAND Flash: the Perfect Companion for Zephyr OS

In this article, we explain why serial SLC NAND flash is an ideal companion for Zephyr OS. We discuss some of the pitfalls to look out for when it comes to integrating NAND flash, and lay out the key ingredients of a successful and reliable integration. Finally, we introduce what are, in our opinion, two of the best NAND file systems available right now, Yaffs2 and TSFS, weighing the pros and cons of each solution.

Why NAND Flash?

Many applications built around Zephyr OS perform some form of data collection. These applications are oftentimes energy-constrained and have a limited RAM budget. They typically require more storage capacity and a higher write throughput than NOR flash can offer. Finally, they tend to have strict storage access timing requirements. Here are a few reasons why NAND flash, in particular the serial (e.g. QSPI) SLC variant, is a perfect fit for these applications:

Fast write performance. Writing to NAND flash is fast, much faster than NOR and likely faster than SD card or eMMC depending on the file system and access pattern. TSFS (see Figure 1) on QSPI NAND, for instance, can sustain over 2MB/s of page-aligned random-like accesses. By comparison, the fastest NOR flash devices sustain around 200KB/s. SD card and eMMC performance varies widely from one device to another. Sustaining 2MB/s of small random-like accesses is possible on high-end devices but will require <em>lots</em> of buffering RAM (see next point on write latency).

Low write latency. NAND flash allows for a very low maximum write latency, typically below 10ms with the right file system. For context, the maximum write latency on NOR flash (as dictated by the block erase time) is typically close to one second. On SD/MMC, it goes up to 250ms. The maximum write latency has huge implications in terms of buffering requirements and thus RAM usage (see this article for more details).
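
To make the link between write latency and RAM usage concrete, here is a rough back-of-the-envelope model. It rests on a simplifying assumption of ours (a constant input data rate and a single worst-case stall at a time), and the helper name `min_buffer_bytes` is purely illustrative: while the storage device is stalled, incoming data must be held in RAM, so the buffer has to cover at least the data rate multiplied by the maximum write latency.

```c
#include <stdint.h>

/* Rough lower bound on buffering RAM: the number of bytes that arrive
 * while the storage device is stalled for its worst-case write latency.
 * Simplified model (assumption): constant input rate, one stall at a time. */
static uint32_t min_buffer_bytes(uint32_t rate_bytes_per_s,
                                 uint32_t max_latency_ms)
{
    /* 64-bit intermediate avoids overflow for high rates or long stalls. */
    return (uint32_t)(((uint64_t)rate_bytes_per_s * max_latency_ms) / 1000u);
}
```

With a 2MB/s input stream (taken here as 2,000,000 bytes per second), this model gives 20,000 bytes for a 10ms maximum latency, 500,000 bytes for 250ms and 2,000,000 bytes for one second. Real systems need some margin on top of these figures.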

Low energy consumption. For write-intensive applications, the energy consumption of NAND flash is much lower than that of NOR flash and SD/MMC. NOR flash has lower standby and read currents, so it can be more energy-efficient for very light workloads or workloads dominated by read accesses. For data collection, though, NAND is almost certainly a better choice.

Low cost per byte. Serial SLC NAND devices offer more storage space than NOR flash counterparts, with capacities up to 1GiB available at a very low cost per byte. Higher capacities are available with a parallel interface although costs rise steeply beyond 1GiB. Figure 1 illustrates how NOR flash, SLC NAND flash and MLC NAND flash compare in terms of capacity and cost.

Ease of use. With its high endurance (100k erase/program cycles) and built-in ECC engine, serial SLC NAND is without a doubt the easiest way to access all the benefits of NAND flash. Thanks to its ubiquitous QSPI (or other SPI variants) interface, serial SLC NAND can be attached to any modern MCU, offering a lower pin count and taking up less board space than its parallel counterpart. Admittedly, NAND flash requires a bit more care than NOR flash, but in the serial SLC form, and with the right management software, it really is nothing to be afraid of.

Figure 1 – NOR, SLC NAND and MLC NAND cost versus capacity

The Keys to Reliability

The use of proper NAND management software (or a NAND file system) makes the difference between a seemingly functional solution and a truly reliable one. Crucially, the NAND management software takes care of the following:

Bad block identification. In order to minimize production cost, NAND flash may leave the factory with a small number of defective blocks (bad blocks). A block is considered bad if it cannot operate properly within strict testing parameters set by the manufacturer. These blocks are marked by the manufacturer with a special value (different from 0xFF) at a specified location within the bad block. Note that out-of-factory bad blocks may not fail upon programming or erasing, but may still exhibit lower than specified data retention, which could result in data corruption. The NAND management software is responsible for tracking these blocks and making sure that they are never used.
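
As an illustration, a factory bad block scan can be sketched as follows. This is a minimal sketch, not the code of any particular file system: the function names are hypothetical, and we assume the driver has already read one marker byte per block from the location given in the device datasheet. Because an erase can wipe the factory marker, the scan should run before any block is erased for the first time.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* markers[b] is the factory marker byte read from block b by the driver.
 * Which page and which spare-area offset hold that byte is device-specific
 * (assumption: check the datasheet). Any value other than 0xFF means "bad".
 * bad_bitmap must hold at least (block_count + 7) / 8 bytes.
 * Returns the number of factory bad blocks found. */
static uint32_t scan_factory_bad_blocks(const uint8_t *markers,
                                        uint32_t block_count,
                                        uint8_t *bad_bitmap)
{
    uint32_t bad = 0;

    memset(bad_bitmap, 0, (block_count + 7u) / 8u);
    for (uint32_t b = 0; b < block_count; b++) {
        if (markers[b] != 0xFF) {
            bad_bitmap[b / 8u] |= (uint8_t)(1u << (b % 8u));
            bad++;
        }
    }
    return bad;
}

/* Query the bitmap before allocating a block. */
static bool block_is_bad(const uint8_t *bad_bitmap, uint32_t block)
{
    return ((bad_bitmap[block / 8u] >> (block % 8u)) & 1u) != 0u;
}
```

A bitmap keeps the RAM cost at one bit per block. The resulting table is typically persisted so that the information survives even if a marker is later lost.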

Failing block tracking. In addition to the bad blocks already present out of the factory, bad blocks can form during operation. A failing block will trigger a program or erase error. It is important that the NAND driver reads the device status register after each program/erase operation to identify potential errors. Any error must be reported to the NAND management software, which is responsible for permanently retiring the failing block and managing replacement blocks.
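
The status check described above can be sketched as follows. The bit positions used here are the ones commonly found in serial NAND status registers, but this is an assumption to verify against your device's datasheet; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Status register bits as commonly laid out on serial NAND devices.
 * Assumption: confirm the positions in your device's datasheet. */
#define NAND_STATUS_BUSY        (1u << 0) /* operation in progress */
#define NAND_STATUS_ERASE_FAIL  (1u << 2) /* last block erase failed */
#define NAND_STATUS_PROG_FAIL   (1u << 3) /* last page program failed */

typedef enum {
    NAND_OP_OK,      /* operation finished without error */
    NAND_OP_BUSY,    /* still in progress, poll the register again */
    NAND_OP_FAILED   /* report to the NAND management software */
} nand_op_result_t;

/* Interpret a status byte read after a program or erase command. */
static nand_op_result_t nand_check_status(uint8_t status)
{
    if (status & NAND_STATUS_BUSY) {
        return NAND_OP_BUSY;
    }
    if (status & (NAND_STATUS_ERASE_FAIL | NAND_STATUS_PROG_FAIL)) {
        return NAND_OP_FAILED;
    }
    return NAND_OP_OK;
}
```

A driver would poll until the result is no longer `NAND_OP_BUSY`, then propagate `NAND_OP_FAILED` upwards instead of silently ignoring it.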

Wear-levelling. NAND blocks can only be erased a certain number of times before they eventually fail. This requires two things from the file system or NAND management software: writing (and thus erasing) should be minimized (low write amplification), and erasing should be as uniformly distributed as possible across all the blocks. The latter is required in order to avoid prematurely wearing out blocks that would otherwise be erased more often than others. The algorithm used to distribute wear across the entire flash is referred to as wear-levelling and is an essential part of proper NAND management.
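
One simple way to spread erases, shown here only as an illustrative sketch, is to always allocate the free block with the lowest erase count. This is not how Yaffs2 or TSFS implement wear-levelling; the structure and function names are hypothetical, and the sketch ignores blocks holding rarely updated data, which never return to the free pool on their own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-block bookkeeping (hypothetical layout, for illustration only). */
typedef struct {
    uint32_t erase_count; /* number of erases performed so far */
    bool is_free;         /* erased and available for writing */
    bool is_bad;          /* retired, must never be used */
} block_info_t;

/* Return the index of the usable free block with the lowest erase count,
 * or -1 if no block is available. */
static int32_t pick_block_to_write(const block_info_t *blocks,
                                   uint32_t block_count)
{
    int32_t best = -1;

    for (uint32_t b = 0; b < block_count; b++) {
        if (!blocks[b].is_free || blocks[b].is_bad) {
            continue;
        }
        if (best < 0 || blocks[b].erase_count < blocks[best].erase_count) {
            best = (int32_t)b;
        }
    }
    return best;
}
```

A linear scan is fine for a sketch. A real implementation would more likely keep free blocks sorted or bucketed by erase count to avoid walking the whole table on every allocation.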

Read error management. SLC NAND flash has a very low, but non-zero, bit error rate (BER). These errors must be compensated for with error correction codes (ECC). Many serial SLC NAND devices come with built-in ECC, which automatically performs ECC calculation upon programming, and verification/correction upon reading. One widespread misconception, though, is that on-chip ECC means no software intervention at all. This is not true. When a certain threshold of bit errors is exceeded, the error is deemed critical, meaning that any further errors may exceed the correcting capabilities of the ECC and cause unrecoverable data corruption. When a critical read error is detected, the NAND management software is responsible for moving the block content to a new block, a process that is commonly referred to as block refreshing.
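
The decision taken after each page read can be sketched as follows. This is a minimal sketch under stated assumptions: the driver is assumed to report whether the read was uncorrectable and how many bits the ECC corrected (how this information is exposed varies between devices), and the names and threshold value are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    READ_OK,             /* data is good, nothing to do */
    READ_REFRESH_BLOCK,  /* data is good, but move the block content soon */
    READ_UNCORRECTABLE   /* data is corrupted beyond ECC capabilities */
} read_action_t;

/* Decide what to do after a page read.
 * uncorrectable:     the ECC engine reported an unrecoverable error
 * corrected_bits:    number of bit errors the ECC corrected
 * refresh_threshold: corrected-bit count above which the read is critical
 *                    (assumption: chosen below the ECC correction strength) */
static read_action_t classify_read(bool uncorrectable,
                                   uint8_t corrected_bits,
                                   uint8_t refresh_threshold)
{
    if (uncorrectable) {
        return READ_UNCORRECTABLE;
    }
    if (corrected_bits > refresh_threshold) {
        return READ_REFRESH_BLOCK;
    }
    return READ_OK;
}
```

For example, with an ECC able to correct 8 bits and a threshold of 4, a read with 5 corrected bits still returns valid data but schedules a block refresh.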

NAND File Systems for Zephyr

LittleFS

Zephyr OS comes with a fully integrated flash file system called LittleFS. In its current version (version 2 at the time of writing this article), LittleFS can run on NAND flash although with serious limitations, including the lack of bad block tracking, the lack of critical read error handling, and extremely high write amplification. For these reasons, LittleFS version 2 is not recommended for NAND flash. It should be noted that version 3 is now underway, which promises to improve many aspects of the current version, including NAND support.

Yaffs2

Yaffs2 is a dual-licensed (GPL and commercial) file system with a long record of successful deployments on SLC NAND flash. It can run on a variety of host platforms, including Linux-based systems, as well as bare-metal and RTOS-based systems. Yaffs2 offers full support for SLC NAND flash, including bad block management, read error management and wear-levelling. Only dynamic wear-levelling is available though (as opposed to dynamic <em>and</em> static wear-levelling) which may require a bit of extra care to make sure that all content is updated (and thus moved around) from time to time.

Another key benefit of Yaffs2 is its low write amplification and high write throughput. On the flip side, Yaffs2 keeps large amounts of file system metadata in RAM. For a typical 128MiB SLC NAND with 2KiB pages, Yaffs2 requires around 160KB of RAM assuming a matching chunk size (a chunk being the Yaffs term for a sector). A larger chunk size will decrease the RAM footprint, although likely at the expense of write performance, depending on the access pattern.
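
To get a feel for the scaling, the number of chunks that must be tracked is simply the device capacity divided by the chunk size. Assuming (our simplification, not a Yaffs2 specification) that the RAM footprint is roughly proportional to the chunk count, the 160KB figure above works out to roughly 2.5 bytes per chunk, and doubling the chunk size would roughly halve the footprint. The helper name is hypothetical.

```c
#include <stdint.h>

/* Number of chunks on a device of the given capacity.
 * Assumption: the whole device is managed and chunk_bytes divides it evenly. */
static uint32_t chunk_count(uint64_t device_bytes, uint32_t chunk_bytes)
{
    return (uint32_t)(device_bytes / chunk_bytes);
}
```

For the 128MiB device above, 2KiB chunks give 65,536 chunks, while 4KiB chunks give 32,768.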

As far as we know, there is no Yaffs2 support for Zephyr OS, but putting together a minimal port for memory allocation and thread-safety is straightforward. Writing the NAND driver is somewhat trickier. In particular, the driver must provide spare area access and proper error handling which, depending on your experience with NAND, may require a bit more effort.

TSFS

TSFS is a high-performance flash file system targeting small, bare-metal and RTOS-based embedded systems. It was introduced back in 2016 with the specific goal of providing solid NAND flash support.

TSFS provides bad block tracking (both out-of-factory bad blocks and blocks that fail during operation), block refreshing on critical read errors, both dynamic and static wear-levelling, and an impressively low write amplification. It delivers high sustained read/write throughput (as detailed in this article) and a low write latency (typically below 10ms) on NAND flash. TSFS requires around 20KB of RAM in total on NAND flash.

TSFS is offered with a minimal kernel port for Zephyr OS and full hardware support for popular platforms and raw NAND devices. A complete integration with Zephyr’s Virtual Filesystem Switch is currently in the works. In the meantime, evaluation examples for Zephyr OS are available on request.

Conclusion

Serial SLC NAND devices come with many compelling benefits for all sorts of data collection applications. Used with the right NAND management software, NAND flash provides cheap, reliable and fast storage. At the time of writing this article, Zephyr OS only comes with very limited NAND flash support, but either Yaffs2 or TSFS can be integrated with minimal effort. If you have questions regarding NAND flash support or anything else embedded-related, please reach out to us.

