Taming the Flash Beast

This article is the first in an introductory series on flash memory. A high-level introduction, shall we say. Not the kind that takes you straight to the electron and drags you through the depths of quantum physics. No. The purpose of this series is to provide useful information from an operational perspective. Things that you should know before incorporating flash-based devices into your design. Things that, if ignored, can really backfire on you.

This first article discusses what makes flash devices special. What makes them so compelling for embedded applications but also, what makes them somewhat intimidating at first. Because, yes, flash devices are capricious little things that require care and attention. But with a proper design and the help of dedicated management software, flash memories can become trustworthy allies capable of supporting an astoundingly large array of applications.

The Goods and the Bads

At first sight, flash memories have everything an embedded application designer could ask for. They are small, they withstand considerable shock and vibration as well as extreme temperatures, they are cheap (although the actual cost per byte can vary quite a lot based on application requirements) and they exhibit very low energy consumption. Above all, they come in many different shapes and flavours, matching as many different applications, from tiny battery-powered sensor devices up to the most demanding high-speed computing platforms.

But flash memories have a dark side. Their programming procedure is anything but straightforward, they can suffer significant bit error rates (which typically grow with age and wear), they are subject to weird side effects, like program and read disturb, and they can be corrupted by untimely power losses. Not quite heartening. Because unless flash is your profession, solving the flash puzzle is probably not a priority. This is where flash management software comes in.

It’s All About Software

Perhaps as important as the flash device itself is the flash management software. It can make the difference between a successful design and a catastrophic failure. Indeed, it is no exaggeration to say that the performance and reliability of a flash device are only as good as those of the overlying flash management. This comes as no surprise given that flash management involves many critical tasks, including:

  • Copy-on-write. In-place updates are fundamentally incompatible with the way that flash arrays operate. This means that the seemingly trivial act of overwriting data is, on flash, in fact quite complex, because data must be moved to a new location as it is updated. This translates into all sorts of elaborate mechanisms such as address translation and garbage collection. These algorithms are often collectively referred to as copy-on-write (COW) and are integral components of the flash management software.
  • Bad block management. Some flash blocks are already faulty when they leave the factory, at least for NAND flash. In addition, new bad blocks can appear within the expected lifetime of the flash memory. These are identified upon erasing or programming, based on the operation status code. In either case, bad blocks must not be accessed any further. The flash management software is responsible for locating factory bad blocks and keeping track of new bad blocks as they appear. It is also responsible for relocating valid data out of bad blocks when needed.
  • Wear levelling. Flash blocks can only be erased so many times before problems happen. What problems, you ask? All kinds. Let’s just say that beyond its expected lifetime, a block is not guaranteed to maintain the same level of performance and reliability. Up to the point where, eventually, it cannot even operate properly. To maintain consistent performance and reliability levels across the whole flash, and to prevent blocks from prematurely turning bad, wear disparities between blocks must be minimized. Clearly, the end result cannot be left to the vagaries of top-level access patterns, which are likely far from uniform. Here, the flash management software can help in two different ways. First, it can choose update destinations (remember that updates are performed out-of-place) so as to even out the wear distribution. Second, it can force data out of blocks that hold rarely updated (static) data, so that these blocks are also erased and written to from time to time. The former strategy is called dynamic wear levelling, while the latter is known as static wear levelling. Both are critical aspects of reliable flash management.
  • Error correction and block refreshing. Flash devices (NAND flash devices at least) are not 100% reliable. Bit errors can occur for different reasons and it is the responsibility of the flash management software to detect and correct them. Different error-correcting codes (ECC) are used to match different bit error rates. When a block contains a critical number of bit errors (on the verge of exceeding the correcting power of the ECC), its content must be copied to a new block before it becomes irremediably corrupted. This procedure is called block refreshing and is typically under the control of the flash management software as well.
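
To make the first three tasks a little more concrete, here is a minimal sketch in C of an out-of-place update that skips bad blocks and picks its destination by wear. It is a toy model only, not the way any particular flash management product works: the device is a plain array in RAM, one logical unit maps to one whole block, and every name in it (phys_block_t, l2p, pick_destination, ftl_write, ftl_read) is hypothetical and made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS  8     /* physical blocks in the toy device (assumption) */
#define NUM_LOGICAL 4     /* logical blocks seen by the application (assumption) */
#define UNMAPPED    (-1)

/* Hypothetical per-block bookkeeping kept by the management layer. */
typedef struct {
    uint32_t erase_count; /* how many times the block has been erased */
    bool     bad;         /* factory bad, or failed an erase/program */
    bool     in_use;      /* currently holds valid data */
    int      data;        /* stand-in for the block payload */
} phys_block_t;

static phys_block_t blocks[NUM_BLOCKS];
static int l2p[NUM_LOGICAL]; /* address translation: logical -> physical */

static void ftl_init(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        blocks[i].erase_count = 0;
        blocks[i].bad = false;
        blocks[i].in_use = false;
        blocks[i].data = 0;
    }
    for (int i = 0; i < NUM_LOGICAL; i++) {
        l2p[i] = UNMAPPED;
    }
}

/* Dynamic wear levelling: among free, good blocks, take the least erased. */
static int pick_destination(void)
{
    int best = UNMAPPED;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (blocks[i].bad || blocks[i].in_use) {
            continue; /* bad blocks are never accessed again */
        }
        if (best == UNMAPPED ||
            blocks[i].erase_count < blocks[best].erase_count) {
            best = i;
        }
    }
    return best;
}

/* Out-of-place update: program a fresh block, remap, then reclaim the old one.
 * Returns the physical block written, or -1 if no free block is left. */
static int ftl_write(int logical, int data)
{
    assert(logical >= 0 && logical < NUM_LOGICAL);

    int dst = pick_destination();
    if (dst == UNMAPPED) {
        return -1;
    }
    blocks[dst].data = data;   /* "program" the new location first */
    blocks[dst].in_use = true;

    int old = l2p[logical];
    l2p[logical] = dst;        /* only then switch the mapping */

    if (old != UNMAPPED) {     /* simplified garbage collection */
        blocks[old].in_use = false;
        blocks[old].erase_count++; /* erasing is what wears the block */
    }
    return dst;
}

/* Read through the translation table. Returns 0 on success, -1 if unmapped. */
static int ftl_read(int logical, int *out)
{
    assert(logical >= 0 && logical < NUM_LOGICAL);
    if (l2p[logical] == UNMAPPED) {
        return -1;
    }
    *out = blocks[l2p[logical]].data;
    return 0;
}
```

Calling ftl_write three times on the same logical block, with block 0 marked bad, lands the data on three different physical blocks. The third write deliberately passes over the block that was just erased, even though it is free, in favour of one that has never been erased. A real implementation would also have to persist the mapping and erase counts on the flash itself, work at page granularity and survive a power loss at any step, none of which is attempted here.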

What’s Next?

A lot, quite frankly. But you have probably seen enough by now to understand the kind of creature we are dealing with. And enough to understand that proper flash management is a cautious first step towards taming the flash beast. Stay tuned as we dive deeper into the subject in articles to come. In the meantime, the TSFS User Manual contains a wealth of information on flash memories, including basic concepts, design guidelines and in-depth performance analysis.

Thank you for reading. If you have any questions or comments regarding flash technologies, embedded storage in general or any other embedded topic, please feel free to reach out to us.
