This article is the first part of a two-part series on one of the most distinctive TSFS features: snapshotting. In this first article, we show how snapshots can ease application development by giving the application designer an elegant way of handling concurrent read/write accesses. We also introduce a simple firmware update example to give a concrete sense of what snapshots can do. In the second article, we will discuss the specifics of the TSFS snapshot API and show how it can be used to implement the solution designed in the first article. But first, let’s begin with a general overview.
What are Snapshots?
Snapshotting is a widely available feature among server-grade file systems. It comes in various shapes and colours, but the goal remains essentially the same: saving the current state of the file system (or part of it), so that it can later be read from or reverted to. Snapshots cannot be modified, only deleted. Once created, snapshots remain the same for their entire lifetime, independent of subsequent modifications to the file system.
As they provide a consistent, immutable view of the file system without preventing further updates, snapshots offer a natural and efficient way to cope with concurrent read/write accesses. To understand how, let’s consider a simple example.
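To make these properties concrete, here is a minimal in-memory sketch in Python. It is not TSFS and does not use the TSFS API: the class `ToyFileSystem` and its methods `write`, `snapshot_create`, `snapshot_delete` and `snapshot_revert` are hypothetical names chosen for illustration only. The toy model copies the whole state on each snapshot, which a real file system would avoid.

```python
from types import MappingProxyType


class ToyFileSystem:
    """Hypothetical in-memory stand-in for a snapshot-capable file system."""

    def __init__(self):
        self.files = {}        # live, writable state: path -> bytes
        self.snapshots = {}    # snapshot name -> read-only view

    def write(self, path, data):
        self.files[path] = data

    def snapshot_create(self, name):
        # Freeze a copy of the live state behind a read-only mapping.
        self.snapshots[name] = MappingProxyType(dict(self.files))

    def snapshot_delete(self, name):
        del self.snapshots[name]

    def snapshot_revert(self, name):
        # Replace the live state with the content of the snapshot.
        self.files = dict(self.snapshots[name])


fs = ToyFileSystem()
fs.write("version.txt", b"1.0")
fs.snapshot_create("v1")

# Later modifications to the live file system do not affect the snapshot.
fs.write("version.txt", b"2.0")
snap_content = fs.snapshots["v1"]["version.txt"]

# Attempting to modify the snapshot itself is rejected.
try:
    fs.snapshots["v1"]["version.txt"] = b"tampered"
    snapshot_is_writable = True
except TypeError:
    snapshot_is_writable = False

# The live state can be returned to what the snapshot captured.
fs.snapshot_revert("v1")
```

In this sketch, the snapshot keeps returning the original content after the live file is rewritten, and reverting brings the live file back to that content.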
The Firmware Upgrade Problem
Consider a firmware upgrade procedure in which both the program code and the accompanying read-only data files (say version info, fonts, lookup tables, etc.) are updated. The new firmware version is retrieved through some network connection. To save bandwidth, let’s further assume that the upgrade is differential, that is, unchanged files (and perhaps even unchanged portions of files) are not retransmitted each time.
The procedure must also fulfill the following requirements:
- The device must be able to maintain normal operation while the upgrade is being performed.
- The upgrade procedure must be fail-safe, i.e. the device must be able to fully recover from an untimely interruption.
- The upgrade procedure must include a revert path, i.e. the firmware can be returned to the previous version if the new version is bogus (say, networking does not work correctly).
Obviously, overwriting the firmware files without precaution is not an option. Although it can be fail-safe (at least when built on top of a transactional file system such as TSFS), this simple design does not provide the required revert path. It does not allow for normal operation during the upgrade either.
A better approach would be to create an alternate directory tree for the updated version. The application could switch to the new directory tree after the upgrade is complete. Again, write transactions could be used to make the switch fail-safe and atomic from the running application’s standpoint.
While this approach does work, it introduces complex and error-prone firmware switch/revert logic. Also, it is not space-efficient as unchanged files must be copied to build a full firmware directory tree each time.
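To illustrate where the extra logic and the extra storage come from, here is a rough Python sketch of the alternate-tree approach. It runs on an ordinary host file system and swaps in an atomic rename (`os.replace`) where the text suggests a write transaction, so it is an analogy rather than TSFS code. The `active` pointer file, the `fw_a`/`fw_b` directory names and the helper functions are all assumptions made up for this example.

```python
import os
import shutil
import tempfile


def active_tree(root):
    """Return the directory the application should currently load from."""
    with open(os.path.join(root, "active")) as f:
        return os.path.join(root, f.read().strip())


def switch_to(root, tree_name):
    """Repoint the 'active' file by writing a temp file and renaming it."""
    tmp = os.path.join(root, "active.tmp")
    with open(tmp, "w") as f:
        f.write(tree_name)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, os.path.join(root, "active"))


def upgrade(root, new_name, changed_files):
    """Build a complete new tree next to the old one, then switch to it."""
    old = active_tree(root)
    new = os.path.join(root, new_name)
    shutil.copytree(old, new)          # unchanged files are duplicated too
    for rel_path, data in changed_files.items():
        with open(os.path.join(new, rel_path), "wb") as f:
            f.write(data)
    switch_to(root, new_name)
    return os.path.basename(old)       # kept so the caller can revert later


# Initial installation: one firmware tree and a pointer to it.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "fw_a"))
for name, data in {"version.txt": b"1.0", "font.bin": b"\x00" * 1024}.items():
    with open(os.path.join(root, "fw_a", name), "wb") as f:
        f.write(data)
switch_to(root, "fw_a")

# Differential upgrade: only version.txt changes, yet font.bin is copied.
previous = upgrade(root, "fw_b", {"version.txt": b"2.0"})
```

Even in this small form, the application has to resolve the active tree on every start, the updater has to track which tree is the old one, and the unchanged `font.bin` now exists twice on the storage medium.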
A Fail-Safe Design Using Snapshots
Using TSFS snapshots, a possible design could be as follows:
- Update the files (or portions of files) as needed.
- Create a “current” snapshot for the updated firmware (overwriting the previous “current” snapshot).
- Commit the above modifications (snapshot creation and file updates).
- Reboot or reload the firmware.
Figure 1 depicts the update procedure. At all times, the firmware files are accessed by the running application through the “current” snapshot (we assume that an initial “current” snapshot was created upon installing the firmware files for the first time). Because of that, updates performed as part of an ongoing firmware upgrade (step 1) are hidden from the application. The application can thus run normally using the currently installed firmware version, oblivious to any concurrent modifications.
After the snapshot is created (step 2), the “current” snapshot contains the new firmware version. However, this snapshot is not visible to the application as long as it remains uncommitted.
When the commit is performed (step 3), the newly created “current” snapshot becomes effective and the new firmware is revealed to the application. Because the firmware files remain at the same location, the application can keep using the same access path.
Still, some caution is needed here. As the transition to the new firmware version is abrupt, the running application should probably be stopped before committing. Although normal operation cannot be sustained at this point, this situation is perfectly acceptable as committing is very fast. Also, the device is likely to be reset anyway to complete the upgrade procedure.
Now, consider what happens in the event of an unexpected failure (say a power loss). If the interruption occurs before committing, the previous firmware version is automatically restored upon mount cycling and the upgrade procedure can start over again. Otherwise, the update procedure is complete and the application can resume using the new firmware version.
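The upgrade steps and the failure behaviour described above can be modelled with a small Python sketch. This is a toy model, not the TSFS API (which is the subject of the next article): `ToyTransactionalFS`, its methods and the `upgrade` helper are hypothetical names, and the power loss is simulated by simply discarding uncommitted state.

```python
import copy


class ToyTransactionalFS:
    """Toy model: a committed state plus uncommitted working changes."""

    def __init__(self):
        self._committed = {"files": {}, "snapshots": {}}
        self._working = copy.deepcopy(self._committed)

    def write(self, path, data):
        self._working["files"][path] = data

    def snapshot_create(self, name):
        # Overwrites any existing snapshot with the same name.
        self._working["snapshots"][name] = dict(self._working["files"])

    def commit(self):
        # Pending file updates and snapshot changes take effect together.
        self._committed = copy.deepcopy(self._working)

    def remount(self):
        # Models a power loss followed by a mount: uncommitted work is lost.
        self._working = copy.deepcopy(self._committed)

    def read_snapshot(self, name, path):
        # The application only ever sees committed snapshots.
        return self._committed["snapshots"][name][path]


def upgrade(fs, changed_files, fail_before_commit=False):
    for path, data in changed_files.items():   # step 1: update the files
        fs.write(path, data)
    fs.snapshot_create("current")               # step 2: new "current" snapshot
    if fail_before_commit:
        fs.remount()                            # simulated power loss
        return False
    fs.commit()                                 # step 3: commit
    return True                                 # step 4 (reboot) is up to the caller


fs = ToyTransactionalFS()
upgrade(fs, {"version.txt": b"1.0", "lut.bin": b"abc"})   # initial installation

# An upgrade interrupted before the commit leaves the old firmware in place.
interrupted_ok = upgrade(fs, {"version.txt": b"2.0"}, fail_before_commit=True)
after_failure = fs.read_snapshot("current", "version.txt")

# Running the procedure again to completion reveals the new firmware.
completed_ok = upgrade(fs, {"version.txt": b"2.0"})
after_commit = fs.read_snapshot("current", "version.txt")
```

In this model the application keeps reading version 1.0 after the interrupted attempt and sees version 2.0 only once the second attempt commits, while the unchanged `lut.bin` is carried over untouched.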
So far we have covered two crucial requirements: the device can operate normally during the upgrade and the procedure is fail-safe. As a bonus, the design is space-efficient, because unaltered files (and portions of files) are not duplicated between snapshots (at least this is the case for TSFS).
However, once the transition to the new firmware is done, the old firmware is lost and rolling back to the previous version is not possible. Fortunately, a revert procedure can be easily added using snapshots.
Adding the Revert Path
To implement the revert path, an additional “previous” snapshot of the firmware is created at the beginning of the upgrade procedure. If something goes wrong with the new firmware, the file system state is reverted to that snapshot. Then, a “current” snapshot is created from the reverted state and committed. At this point, the application can resume using the old firmware. This procedure is illustrated in Figure 2.
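The revert procedure can be sketched in the same spirit. Again, this is a self-contained toy model in Python rather than TSFS code: `ToyFS`, `app_read`, `upgrade` and `revert` are hypothetical names, and the model assumes that the “previous” snapshot is committed together with the rest of the upgrade.

```python
import copy


class ToyFS:
    """Toy stand-in for a transactional, snapshot-capable file system."""

    def __init__(self, files):
        self.files = dict(files)                    # live working state
        self.snapshots = {"current": dict(files)}   # uncommitted snapshot table
        self.commit()

    def snapshot_create(self, name):
        self.snapshots[name] = dict(self.files)

    def snapshot_revert(self, name):
        self.files = dict(self.snapshots[name])

    def commit(self):
        self.committed = copy.deepcopy(self.snapshots)

    def app_read(self, path):
        # The application reads through the committed "current" snapshot.
        return self.committed["current"][path]


def upgrade(fs, changed_files):
    fs.snapshot_create("previous")   # keep the old firmware reachable
    fs.files.update(changed_files)   # update the files
    fs.snapshot_create("current")    # overwrite the "current" snapshot
    fs.commit()


def revert(fs):
    fs.snapshot_revert("previous")   # live state returns to the old firmware
    fs.snapshot_create("current")    # rebuild "current" from the reverted state
    fs.commit()


fs = ToyFS({"version.txt": b"1.0", "net.cfg": b"dhcp"})

# The new firmware ships a faulty network configuration.
upgrade(fs, {"version.txt": b"2.0", "net.cfg": b"broken"})
after_upgrade = fs.app_read("version.txt")

# Something went wrong: go back to the previous version.
revert(fs)
after_revert = fs.app_read("version.txt")
```

After `revert`, both the live files and the “current” snapshot seen by the application match the firmware that was installed before the upgrade.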
Conclusion
In this article we have shown how snapshots can be used to gracefully handle concurrent read/write accesses. More concretely, we have used snapshots, together with write transactions, to design a straightforward, yet flexible and robust firmware upgrade scheme, with minimal impact on normal device operation.
In the next article, we will implement the designed solution using TSFS snapshots and present some peculiarities of the TSFS snapshot API.
Questions or comments?
Do not hesitate to contact us at blog@jblopen.com. Your questions, comments, and suggestions are appreciated.