5 Things to Consider When Choosing an Embedded File System

In this article, we cover five aspects of embedded file systems that you should keep in mind when the time comes to decide which file system to base your next design on. This is by no means a comprehensive guide to embedded file systems (that would be a topic for a book), but these five bare essentials cover enough ground to avoid the most common mistakes, mistakes that we, as embedded consultants, witness all too often, particularly among application engineers who have little or no prior experience with embedded file systems.

1) Storage Device Support

As important as it is, the file system is only a fraction of what it takes to achieve efficient data storage. There is also the device driver, the controller driver, the flash translation layer (if required) and, last but not least, the storage device itself.

Choosing the perfect combination of software and hardware components can be tricky. In an ideal world, components would be selected in isolation, guided by independent sets of criteria, and then assembled into an optimal solution for the application at hand. Unfortunately, there are a number of interdependencies between the various storage components, especially between the storage device and the file system. As a consequence, proper device selection based on application requirements is a complex task that benefits from a systematic approach. That, in itself, will be the subject of a separate article. In the meantime, let’s simply point out that not all storage devices are equally supported by every file system.

Figure 1 shows the three most common storage stack configurations used in embedded systems. Figure 1a shows a storage stack based on managed flash (e.g., SD card or eMMC). In this case, a file system that supports a block device driver interface — let’s call it a block device file system — is needed. FAT32 is a very popular block device file system for embedded applications, although there are better solutions available nowadays (more on this later on).

Both Figure 1b and Figure 1c show storage stacks based on bare flash. The most common devices used in embedded applications are QSPI NOR, QSPI SLC NAND and parallel SLC NAND. Here, there are two possibilities in terms of file system. The first one (Figure 1b) is to use a block device file system, as with managed flash, but this time with the addition of a special adaptation layer, commonly referred to as the flash translation layer, or FTL for short. This extra adaptation layer is needed because a raw flash device is nothing like a block device (again, this is a topic for a separate article). The alternative (Figure 1c) is to opt for a native flash file system which, as its name implies, can run directly on top of a raw flash device.

**Figure 1 –** The three most common embedded storage stack configurations:
a) block device file system on top of a managed flash
b) block device file system on top of a flash translation layer
c) native flash file system on top of a bare NOR or NAND flash.

Beyond device compatibility, another thing to consider is the level of integration readily available for your combination of file system, target platform and storage device. Is there a driver available for the specific device that you want to use? What about the controller driver? If you need an FTL, is the integration with the selected file system already available? If that is not the case, make sure to factor the additional time, cost and overall complexity into your planning. When in doubt, you might want to consider a more tightly integrated solution or consult with embedded storage experts.

2) Performance

Oftentimes, the term performance is narrowly used in reference to average read/write throughput (usually measured in megabytes per second). Although average throughput is an important performance metric, other metrics should not be overlooked. Here are some of the most significant:

read/write throughput
read/write latency
read/write energy consumption
mount time
file creation/deletion/truncation time
directory creation/read time

Evaluating and comparing file system performance is tricky because the conclusion depends on so many factors. Be wary of performance data given without any form of context. Performance numbers are meaningless without a complete picture, including, at the very least, the file system buffer/cache configuration, the MCU/SoC configuration, the storage device fill level, the controller and bus configuration, and the access pattern (size and distribution).

Let’s take the fill level as an example of how performance can change drastically based on seemingly unrelated parameters. The fill level is the ratio between the size of the stored data and the total device size, typically expressed as a percentage. The higher the fill level is, the less room the FTL (or flash file system) has for its own internal purposes. That turns out to be a huge deal because, on bare flash, you must erase an entire block before you can rewrite any of the pages that it contains (64 pages per block is a typical configuration). As a result, all the valid data in the block to be erased must first be relocated elsewhere, a process commonly referred to as garbage collection. The more free space you have, the less often garbage collection must be performed and, thus, the higher the write throughput is.
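To put rough numbers on this effect, here is a minimal back-of-the-envelope sketch. It assumes, pessimistically, that the block picked for garbage collection holds the same fraction of valid pages as the device fill level. Real FTLs and flash file systems pick emptier blocks and keep spare capacity, so treat this as an illustration of the trend, not a prediction. The function name is ours and purely illustrative.

```python
def write_amplification(valid_fraction: float) -> float:
    """Flash page writes needed per page of new data.

    Reclaiming a block with N pages, of which valid_fraction * N are still
    valid, costs valid_fraction * N relocation writes and frees only
    (1 - valid_fraction) * N pages for new data. Writing those new pages
    therefore costs N page writes in total, i.e. 1 / (1 - valid_fraction)
    writes per new page.
    """
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1.0 / (1.0 - valid_fraction)


PAGES_PER_BLOCK = 64  # the typical configuration mentioned above

for fill in (0.50, 0.75, 0.90):
    relocated = round(fill * PAGES_PER_BLOCK)
    freed = PAGES_PER_BLOCK - relocated
    print(f"fill {fill:.0%}: relocate {relocated} pages to free {freed}, "
          f"write amplification ~{write_amplification(fill):.1f}x")
```

Under this simplified model, going from a half-full device to a three-quarters-full device doubles the number of flash writes needed for the same amount of application data.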

The detailed mechanics of how the fill level impacts write performance go beyond this simple explanation, but hopefully it gives you a sense of why performance cannot be measured and interpreted in a vacuum. For application engineers who have limited experience with embedded storage technologies, these subtleties can be overwhelming. When in doubt, consult with embedded storage experts or discuss your application requirements, in terms of expected performance and resource constraints, with your file system vendor.

3) Resource Usage

When discussing resources in general, we may refer to many different things including memory, CPU, internal bus bandwidth and energy. In the context of embedded file systems, though, the single most important resource to consider is RAM. The minimum RAM requirement is a good place to start because, obviously, if you cannot meet the minimum, it’s not worth going any further. But don’t stop there.

Also contributing to the minimum RAM requirement — but often not counted in the advertised file system footprint — is the RAM requirement for the FTL if one is needed. Depending on how the FTL is designed, the extra footprint may be more or less significant. Anyhow, this is certainly something to watch out for.

In addition to the fixed minimum RAM requirement, many file systems exhibit variable memory usage based on configuration. Oftentimes, the added memory increases linearly with one or more configuration parameters, the most common being the device size, the logical block size and the number of open files/directories. We have even seen, in one case, the RAM requirement increase with the number of stored files, which is really something that you don’t want to miss!
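A quick way to avoid surprises is to write the RAM budget down as a formula and plug in your own configuration. The sketch below is only a template: every coefficient in it (fixed footprint, bytes per open file, bytes of mapping data per logical block) is a made-up placeholder, not a figure from any real file system or FTL. Replace them with the numbers documented by your vendor or measured on your target.

```python
from dataclasses import dataclass


@dataclass
class RamModel:
    """Linear RAM model. All default coefficients are hypothetical placeholders."""
    base_bytes: int = 4096                # fixed minimum footprint
    per_open_file_bytes: int = 256        # handle and bookkeeping per open file
    map_bytes_per_logical_block: int = 2  # e.g. FTL mapping data, if any

    def estimate(self, device_bytes: int, block_bytes: int,
                 open_files: int, cache_blocks: int) -> int:
        """Return the estimated RAM usage in bytes for a given configuration."""
        logical_blocks = device_bytes // block_bytes
        return (self.base_bytes
                + open_files * self.per_open_file_bytes
                + cache_blocks * block_bytes
                + logical_blocks * self.map_bytes_per_logical_block)


model = RamModel()
total = model.estimate(device_bytes=128 * 1024 * 1024, block_bytes=4096,
                       open_files=4, cache_blocks=8)
print(f"estimated RAM: {total} bytes")
```

With these placeholder values, the term that scales with device size dominates, which is exactly the kind of hidden cost that does not show up in an advertised minimum footprint.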

Now, here is the tricky part: performance and memory footprint usually go hand in hand. For instance, the more cache and/or buffers you have, the better the performance you get. Another way that performance and memory footprint are related is through the maximum write latency. Generally speaking, the higher the write latency is, the more buffering you need at the application level. It just goes to show, yet again, how interdependent file system characteristics can be and how they should be considered as a whole (easier said than done!) to find the optimal tradeoff for your specific application.
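The link between write latency and application-level buffering comes down to simple arithmetic: while a write call is stalled, the application keeps producing data that has to sit somewhere. A minimal sketch, assuming a constant data rate and a known worst-case write latency (both values below are illustrative, not measurements):

```python
def min_buffer_bytes(data_rate_bytes_per_s: int, max_write_latency_ms: int) -> int:
    """Smallest buffer that absorbs the data produced during one worst-case stall.

    Integer arithmetic with rounding up, so the result is never too small.
    """
    return (data_rate_bytes_per_s * max_write_latency_ms + 999) // 1000


# Illustrative numbers: a 200 kB/s data stream and a 50 ms worst-case write latency.
needed = min_buffer_bytes(data_rate_bytes_per_s=200_000, max_write_latency_ms=50)
print(f"buffer at least {needed} bytes")
```

In practice you would add some margin on top, but the proportionality is the point: double the worst-case latency and the buffer requirement doubles as well.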

4) Fail-Safety and Transactionality

Another crucial characteristic of a file system is its behaviour in case of a power failure or other untimely interruption, like the physical removal of the device by the end user (e.g., an SD card). In such a scenario, the outcome can vary widely depending on the file system implementation, ranging from complete corruption and loss of the whole file system to losing only the small amount of data waiting in RAM to be written. For applications that cannot tolerate any amount of data loss, the requirements extend far beyond the file system itself, so we will not discuss that case here.

A popular choice of file system among embedded application designers is FAT32. Although the FAT file system can be a reasonable choice in some cases, it is not appropriate for applications where power failures are expected and file system corruption is not tolerable. Depending on the exact implementation, you can end up with something mild like cluster leakage, or something way more severe like cross-linked or cyclic cluster chains.

To mitigate these problems, some implementations of the FAT file system are offered with an extra journaling module. The fundamental idea behind journaling is as follows: each file operation is logged on disk before it is actually performed. In the event of a sudden interruption, the logged operations are read back by the file system upon recovery, and either completed or reverted before normal operation is resumed. In the end, partial operations are never allowed, thus guaranteeing file system integrity.
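The journaling idea can be illustrated with a toy model. The sketch below is not how any particular FAT journaling module works: it is an in-memory stand-in where a dictionary plays the role of the on-disk directory and a single variable plays the role of the journal. The class and method names are ours, and the "power failure" is simulated with an exception.

```python
class ToyVolume:
    """In-memory stand-in for on-disk metadata plus a one-record journal."""

    def __init__(self):
        self.directory = {}  # file name -> first cluster ("persistent" metadata)
        self.journal = None  # pending operation record ("persistent" as well)

    def rename(self, old, new, crash_midway=False):
        self.journal = ("rename", old, new)        # 1. log the intent first
        self.directory[new] = self.directory[old]  # 2. add the new entry...
        if crash_midway:
            raise RuntimeError("simulated power failure")
        del self.directory[old]                    # ...then remove the old one
        self.journal = None                        # 3. done, clear the log

    def recover(self):
        """Run at mount time: complete any logged operation, then clear the log."""
        if self.journal is None:
            return
        _, old, new = self.journal
        if old in self.directory:  # the operation was interrupted before step 2 ended
            self.directory[new] = self.directory[old]
            del self.directory[old]
        self.journal = None


vol = ToyVolume()
vol.directory["log.tmp"] = 7
try:
    vol.rename("log.tmp", "log.txt", crash_midway=True)
except RuntimeError:
    pass  # at this point both names point to cluster 7: an inconsistent state

vol.recover()
print(vol.directory)
```

Note that the recovery step is written so that it can safely run no matter where the interruption happened, including during a previous recovery attempt. That property matters because a second power failure can occur while the journal is being replayed.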

Note, however, that journaling does not protect against data corruption. If a power failure occurs midway through a file update where some portion of the file is being overwritten, you can end up with an undefined mix of old and new data, likely translating into an inconsistent state from the application perspective, even though the file system itself remains perfectly coherent. For some applications that might be okay, but if your application requires data consistency under all circumstances, you need a transactional file system.

Transactional file systems offer the highest level of protection against untimely interruptions. Simply put, a transaction is a group of file system operations that either executes completely (as a whole) or not at all. By appropriately defining transactions at the application level (transactional file systems offer dedicated APIs for that purpose), you can make sure that your data remains in a consistent state no matter how and when individual operations are interrupted.
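To make the all-or-nothing behaviour concrete, here is a toy in-memory model. It does not reproduce the API of any real transactional file system (those APIs differ from one product to the next); the class, method and file names are hypothetical. Changes are staged on a private copy and become visible in a single step at commit time, so an interruption before the commit leaves the previous state untouched.

```python
from contextlib import contextmanager


class ToyTransactionalStore:
    """Hypothetical store: file name -> content, updated only at commit."""

    def __init__(self):
        self._committed = {}  # last committed state

    def read(self, name):
        return self._committed[name]

    @contextmanager
    def transaction(self):
        staged = dict(self._committed)  # private working copy
        yield staged                    # the caller modifies the copy only
        self._committed = staged        # commit: reached only if nothing failed


def update_config(store, payload, crash_midway=False):
    """Update a data file and its companion checksum file as one transaction."""
    with store.transaction() as files:
        files["config.bin"] = payload
        if crash_midway:
            raise RuntimeError("simulated power failure")
        files["config.crc"] = b"crc of " + payload


store = ToyTransactionalStore()
update_config(store, b"v1")
try:
    update_config(store, b"v2", crash_midway=True)
except RuntimeError:
    pass  # the half-finished update is discarded

print(store.read("config.bin"), store.read("config.crc"))
```

The design choice worth noticing is where the transaction boundary sits: it wraps both files, because from the application perspective the data and its checksum are only meaningful together.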

5) Support and Cost

We have already mentioned the importance of seeking external advice and expertise when needed, and this cannot be overemphasized. However, consulting fees can quickly add up, and sometimes it just makes more sense to go with a commercial solution and the customer support that comes with it. Now, full disclosure: we do sell licenses for our own embedded transactional flash file system, TSFS, so we are admittedly biased here. On the other hand, we also offer consulting services, so we know how much time and money can be spent on fixing poorly integrated solutions that were initially selected to avoid licensing costs.

Just to be perfectly clear: there are many great open source alternatives out there. But is an open source solution right for you? Does your level of experience and expertise match the solution that you are considering? Have you planned for the extra cost and time required to evaluate, integrate and tune a full storage stack, including the file system, the FTL, the drivers and the storage device, on your own? Those are crucially important questions and, unless you answered each of them with a resounding yes, chances are you would be better off with a commercial solution.

6) Interoperability

We know, we were supposed to stop at five, but here is an extra one anyway: interoperability. Some applications require that the user be able to access the content of the embedded storage media from a Linux or Windows host computer. This can be achieved through a network file access protocol such as FTP, regardless of the file system in use. Another way of sharing data is through removable media, such as an SD card. In that case, however, the file system must be supported by the host computer, which leaves very few options. The easiest option is to go with one of the FAT file system variants. Otherwise, some embedded file system vendors provide a Linux or Windows driver that can be installed on the host computer in order to support their proprietary file system. This certainly is another option, although it might not be acceptable from the end user experience standpoint. In any case, if interoperability is a requirement, it should be taken into account at the very beginning of the file system selection process.

Conclusion

We do hope that you found this article instructive. Hopefully, it will keep you from making some of the most common mistakes that we, as consultants, see being repeated again and again. However, don’t forget that there is much more to embedded file systems than we could possibly cover in this article, especially when considered in relation to specific application requirements. If you need expert advice on embedded file systems or file system integration, or if you want to know more about our embedded transactional flash file system, TSFS, please feel free to contact us.