Linux Filesystem Durability
Write-then-fsync
There are two ways to request that data be made durable on disk:

- Write the data to the disk’s volatile cache, and then send it a FLUSH command to force it to non-volatile storage.
- Write the data to disk with the Force Unit Access (FUA) bit set, forcing the data directly to non-volatile storage.
If a filesystem does neither of these, it is highly suspect: it likely does not support durability at all.
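As a concrete example of the first pattern, a userspace write-then-fsync looks like the following (a minimal sketch; the helper name and file path are illustrative). A plain write() only reaches the page cache; it is the fsync() that causes a durability-supporting filesystem to issue a FLUSH (or FUA write) to the device.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 iff the data was written and fsync reported success
 * (on a filesystem that actually implements fsync durability). */
int durable_write(const char *path, const void *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* write() alone only dirties the page cache... */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    /* ...fsync() is what pushes it to non-volatile storage. */
    return close(fd);
}
```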
From kernel filesystem code, there are a few standard ways to request that a FLUSH be issued:

- Call blkdev_issue_flush, the most "official" public API for this.
- Invoke generic_file_fsync, a small wrapper around blkdev_issue_flush in libfs.c, meant to be used from other filesystem code.
- Invoke generic_buffers_fsync, which is similar but, for some reason, only used by the ext filesystems.
- Submit a block IO request with REQ_PREFLUSH set, which is what blkdev_issue_flush does internally.

FUA-based durability requires submitting a block IO request with the REQ_FUA bit set.
Scraping Linux’s fs/ directory for uses of these functions and flags, we can construct a list of the filesystems that do each:
| Mechanism | Filesystems |
|---|---|
| Call blkdev_issue_flush | exfat, ext4, f2fs, fat, hfsplus, jbd2, nilfs2, ocfs2, xfs, zonefs |
| Call generic_file_fsync | adfs, bfs, exfat, fat, minix, ntfs3, omfs, qnx4, qnx6, sysv, udf, ufs |
| Call generic_buffers_fsync | ext2, ext4 |
| Use REQ_PREFLUSH | bcachefs, btrfs, exfat, ext4, f2fs, gfs2, jbd2, nilfs2, xfs |
| Use REQ_FUA | bcachefs, btrfs, exfat, ext4, f2fs, gfs2, iomap, jbd2, nilfs2, xfs |
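A scrape like the one above can be sketched with a grep pipeline (run from the root of a Linux kernel checkout; exact results will vary by kernel version):

```shell
# For each durability mechanism, list the fs/ subdirectories
# (or top-level fs/ files) that mention it.
for pat in blkdev_issue_flush generic_file_fsync generic_buffers_fsync \
           REQ_PREFLUSH REQ_FUA; do
  echo "== $pat =="
  grep -rl "$pat" fs/ | cut -d/ -f2 | sort -u
done
```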
This leaves a long list of subdirectories in fs/ that don’t superficially contain any way of invoking a FLUSH command: 9p, affs, afs, autofs, befs, cachefiles, ceph, coda, configfs, cramfs, crypto, debugfs, devpts, dlm, ecryptfs, efivarfs, efs, erofs, exportfs, ext2, freevxfs, fuse, hfs, hostfs, hpfs, hugetlbfs, isofs, jffs2, jfs, kernfs, lockd, netfs, nfs, nfsd, nls, openpromfs, orangefs, overlayfs, proc, pstore, quota, ramfs, romfs, smb, squashfs, sysfs, tests, tracefs, ubifs, unicode, vboxsf, verity. For many of these, durability doesn’t make sense anyway: romfs is read-only, nfs is networked, sysfs isn’t really a filesystem, and ubifs has its own bespoke storage stack. For some, it is suspicious: the implementations of historic filesystems (Plan 9’s 9p, BeFS, EFS, HPFS, etc.) mostly appear to lack real durability.