Sync Write (Log every committed write to a ZIL device)
ZFS always collects small, slow random writes in a RAM-based write cache for a few seconds and writes them
together as a single large sequential write. A commit to the writing application then means only that the data
is in cache; it does not mean that the data is on disk. This behaviour is essential for performance, but a power
loss can result in a few seconds of lost data. As the writing application has no control over this, it can
affect transactions: for example, the first transaction may be on disk after a write while the next, dependent
transaction is still in cache and is lost in the power loss.
If you need transaction-safe behaviour where a commit must mean that the data is on disk, you can enforce
sync for a filesystem (sync=always) or allow it at the request of the writing application (the default,
sync=standard). In that case every committed small random write is logged directly to stable storage in a
special log called the ZIL (ZFS Intent Log), followed by the regular fast sequential cached write of multiple
transactions to the pool.
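
As a minimal sketch of what "on request of a writing application" means in practice, the Python example below
contrasts a plain write with a write followed by fsync. The function names and the temporary file are
hypothetical and chosen for illustration only. The assumption is that the file lives on a filesystem with the
default sync setting, where the fsync call is what asks the storage layer to confirm the data is on stable
storage before returning.

```python
import os
import tempfile


def write_cached(path, data):
    """Write without requesting durability.

    When this returns, the data has been handed to the OS, which may still
    hold it only in RAM (the "yes, data is in cache" case).
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # empties Python's buffer into the OS; no durability request


def write_synced(path, data):
    """Write and explicitly request a sync.

    os.fsync() blocks until the OS reports the data as written to stable
    storage. Assumption: with the default sync setting this is the request
    that makes the filesystem log the write before acknowledging it.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()  # hypothetical location, any directory works
    write_cached(os.path.join(demo_dir, "cached.dat"), b"transaction 1")
    write_synced(os.path.join(demo_dir, "synced.dat"), b"transaction 2")
    print("wrote demo files to", demo_dir)
```

With sync=always the filesystem treats both functions alike, and with sync=disabled the fsync request returns
without waiting for stable storage, so the application-side call only matters with the default setting.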
At first sight you have gained nothing. On the contrary: you now have the fast combined cached writes and,
in addition, uncached committed random writes of every single data block, and both write actions go to the same
pool. This is why you sometimes discover that enabling sync on a slow pool reduces your pool performance to
10% of the value that you can achieve with sync=disabled.
Dedicated Slog (ZIL on a separate device)
This is why you can add an Slog to the pool: an additional DRAM- or flash-based disk or NVMe device. If you
use an Slog device with power-loss protection, much lower latency and much higher iops than your pool,
you can ensure safe uncached sync write behaviour without such a dramatic loss of performance.
Remember: the ZIL is not a write cache device. ZFS already provides a write cache in RAM. The ZIL is an
additional log device that holds the committed data that has not yet been written to the pool, which means
that it only needs to be able to store about 10 s of writes. Even with a single 10G connection, about 8GB is
enough. This is why one of the fastest Slog devices, the ZeusRAM, has only 8GB of battery-buffered DRAM. The
Slog content is only read on a reboot after a crash, to redo committed writes. A good newer Slog device is an
Intel DC S3700 or DC P3700. You do not need to mirror the Slog unless you need to maintain performance after an
Slog failure, as ZFS will otherwise revert to the slow on-pool ZIL. With current ZFS even a pool import with a
missing Slog is possible.
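
The sizing argument above can be checked with a few lines of arithmetic. The sketch below is a rough upper
bound only: it assumes the network link is completely filled with sync writes and ignores protocol overhead,
and the function names are hypothetical.

```python
def slog_bytes_needed(link_gbit_per_s, seconds):
    """Upper bound on the data that can arrive over the link in `seconds`."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8  # gigabits/s -> bytes/s
    return bytes_per_second * seconds


def seconds_covered(slog_gb, link_gbit_per_s):
    """How many seconds of full line-rate traffic fit into an Slog of slog_gb GB."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return slog_gb * 1e9 / bytes_per_second


if __name__ == "__main__":
    need_gb = slog_bytes_needed(10, 10) / 1e9
    print(f"10 s at 10 Gbit/s line rate: {need_gb:.1f} GB")
    print(f"8 GB at 10 Gbit/s line rate: {seconds_covered(8, 10):.1f} s")
```

At full line rate, 10 s of traffic comes to 12.5 GB and 8GB covers 6.4 s. Read this way, the 8GB figure
corresponds to sustained sync traffic somewhat below line rate or to a shorter window. That is an
interpretation of the numbers, not something the text states.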
SSD power-loss problems
In the past, SSDs were mainly used as an L2ARC cache device or an Slog, as a single spindle disk offers only
about 100 iops while enterprise SSDs can offer 40000 - 80000 iops under constant steady load. Do not believe
the 100000 iops claimed for cheap desktop SSDs: they deliver this only for a very short time, after which
performance degrades to a few thousand iops, while enterprise SSDs can hold their performance under a steady,
constant write load.
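
To put these iops figures in perspective, the sketch below estimates how long a batch of sync writes takes when
each write has to be committed before the next one starts. This is a deliberately simplified model (one I/O per
write, no queueing or caching), and the function name and the batch size of 10000 writes are assumptions made
for illustration.

```python
def seconds_for_sync_writes(num_writes, device_iops):
    """Time to commit num_writes one after another at a sustained iops rate."""
    if device_iops <= 0:
        raise ValueError("device_iops must be positive")
    return num_writes / device_iops


if __name__ == "__main__":
    writes = 10_000  # illustrative batch size
    devices = [
        ("single spindle disk", 100),
        ("enterprise SSD, low end", 40_000),
        ("enterprise SSD, high end", 80_000),
    ]
    for name, iops in devices:
        t = seconds_for_sync_writes(writes, iops)
        print(f"{name}: {t:.3f} s for {writes} sync writes")
```

Under these assumptions the spindle disk needs 100 s for the batch, while the enterprise SSDs need 0.25 s and
0.125 s.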
While SSDs improve IO performance dramatically over spindles, they come with a serious data safety problem. As
writes can only be done in pages and an erase cycle is needed before data can be rewritten, the SSD firmware is
constantly reorganising data in the background (garbage collection). A power outage can therefore mean data
loss at any time with SSDs, whether or not there is storage activity, and not even CopyOnWrite can protect
you against it. Raid and checksums can help to repair problems on access.
For a professional setup, you should avoid this problem by using SSDs with power-loss protection. All enterprise
SSDs offer this as a feature. Some cheaper desktop SSDs also offer power-loss protection, but it is sometimes
not clear whether it covers both the data written from the OS and the background data from garbage collection.
I prefer HGST, Intel or Samsung in their datacenter editions, e.g. a small Intel S3510 as boot disk, the Intel
S3510 for read-optimized pools and the S3610 or S3700/S3710 models where better write iops values are needed.