15. Data Scrubbing
With older filesystems like Ext4 or NTFS you had the following problem:
You edit a file and a crash or powerloss occurs while you write/update the file. As these
filesystems do inline modifications (they modify parts of the current data on disk), the following can happen:
Regarding your data:
1. nothing is written, the old data remains valid
2. the new data is written, the modifications are correct
3. the modified data is only partly written, with no chance to detect or repair the problem
Regarding the metadata:
1. the metadata is updated correctly
2. the metadata is corrupt = a corrupt filesystem; an offline fsck is needed to repair the filesystem structure
There is no way to detect metadata problems besides an offline fsck that can take days, and even this does not
help to detect or repair data corruption. The result is only a valid metadata structure.
ZFS, a new generation filesystem
https://en.wikipedia.org/wiki/ZFS
ZFS stands for a new generation of filesystems that do not try to reduce these problems but to avoid them
completely with two basic principles: CopyOnWrite and end-to-end checksums on data/OS level.
CopyOnWrite means that old data is never updated in place; modified data blocks are always written anew.
In the above powerloss scenario, ZFS behaves as follows:
Regarding your data and metadata:
1. modified data is written completely anew, data pointers are updated, the new data is valid and verified
2. modified data is not written completely, data pointers are not updated, the old data stays valid and verified
3. if anything goes wrong, this is detected by checksums on the next read and auto-repaired (self-healing)
That does not mean that you cannot have data loss. If a powerloss hits while you write/update a large file,
this file is corrupt and damaged (ZFS works no miracles here), but your filesystem is not damaged and always stays valid.
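The result of this detection and self-healing can be checked from the command line. A minimal sketch, assuming a pool simply named "tank" (the pool name is an example, not taken from this manual):

```
# Show pool health, per-device READ/WRITE/CKSUM error counters and,
# with -v, any files that could not be repaired from redundancy.
zpool status -v tank

# After repaired errors have been reviewed, reset the error counters
# so that new problems are easy to spot.
zpool clear tank
```

A healthy pool shows all counters at 0 and reports "No known data errors"; growing CKSUM counters on a single disk usually point to that disk, its cabling or backplane.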
Reasons for corrupted data on disk
- Powerloss or a system crash
- I/O errors due to a bad disk, controller, cabling, backplane or PSU
- On-disk data corruption due to cosmic rays, magnetic or electromagnetic fields
- Driver bugs resulting in data being transferred to or from the wrong location
- A user or a virus modifying data by accident or intention
All but the last problem can at least be detected and mostly auto-repaired by checksums and redundancy.
For the last point ZFS offers snapshots. On a multi-terabyte array you will always find corrupted data over time.
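As a sketch of how snapshots protect against this last point, accidental or intentional modification (the filesystem name tank/data and the snapshot name are assumptions for illustration):

```
# Create a read-only snapshot of the current state (names are examples).
zfs snapshot tank/data@before-update

# List the snapshots of this filesystem.
zfs list -t snapshot -r tank/data

# If a user or malware has modified or deleted data, roll the
# filesystem back to the snapshot state.
zfs rollback tank/data@before-update
```

Because of CopyOnWrite a snapshot is created instantly and only consumes space for blocks that are modified afterwards.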
Scrub on every read, on demand or as a planned job
On every read, data is checked and repaired from RAID redundancy when a checksum error is detected (auto
self-healing filesystem). If you want to check the whole pool, you can start a scrub manually or as a planned
napp-it job. With desktop disks I would do this once a month, e.g. on a low-I/O day like Saturday. Unlike a
traditional fsck that requires an offline filesystem for quite a long time without a real repair option,
a scrub is an online process that runs in the background with low priority and verifies/repairs all data.
I own many servers and systems and see checksum repairs quite often: this feature is mandatory!
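A minimal command-line sketch of the same task outside the napp-it GUI; the pool name "tank" is an example, and the cron line is only an assumed way to schedule a monthly scrub by hand rather than the napp-it job mechanism:

```
# Start a scrub of the whole pool; it runs online with low priority.
zpool scrub tank

# Check progress, repaired data and checksum errors while it runs.
zpool status tank

# Stop a running scrub if it disturbs production I/O.
zpool scrub -s tank

# Example crontab entry (assumption): start a scrub at 03:00 on the
# first day of every month.
# 0 3 1 * * /usr/sbin/zpool scrub tank
```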