Copyright © Acronis, Inc., 2000-2010
Deduplication at target
After a backup to a deduplicating vault is completed, the storage node runs the indexing task to deduplicate data in the vault as follows:
1. It moves the items (disk blocks or files) from the archives to a special file within the vault, storing duplicate items there only once. This file is called the deduplication data store. If there are both disk-level and file-level backups in the vault, there are two separate data stores for them. Items that cannot be deduplicated remain in the archives.
2. In the archives, it replaces the moved items with the corresponding references to them.
As a result, the vault contains a number of unique, deduplicated items, with each item having one or
more references to it from the vault's archives.
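
The two indexing steps above can be illustrated with a minimal sketch. It is not the storage node's actual implementation: the function name `index_archives`, the use of SHA-256 digests as references, and the in-memory dictionaries standing in for archives and the deduplication data store are all assumptions made for illustration. The sketch also ignores items that cannot be deduplicated, which in the real product remain in the archives.

```python
import hashlib

def index_archives(archives, data_store):
    """Illustrative only: move items into data_store once, leave references behind.

    archives   -- dict mapping an archive name to a list of items (bytes)
    data_store -- dict mapping a reference (hex digest) to the stored item
    """
    for name, items in list(archives.items()):
        refs = []
        for item in items:
            # Assumption: an item is identified by the SHA-256 of its content.
            key = hashlib.sha256(item).hexdigest()
            # Step 1: store the item only if it is not in the data store yet.
            data_store.setdefault(key, item)
            # Step 2: the archive keeps a reference instead of the item.
            refs.append(key)
        archives[name] = refs
    return archives, data_store

# Two machines deployed from the same image share most of their blocks.
archives = {
    "machine_a": [b"os-block", b"app-block", b"data-a"],
    "machine_b": [b"os-block", b"app-block", b"data-b"],
}
data_store = {}
index_archives(archives, data_store)
unique_items = len(data_store)                     # 4 unique items for 6 backed-up items
shared_ref = archives["machine_a"][0] == archives["machine_b"][0]
```

In this example six items are backed up, but only four unique items end up in the store, and both archives refer to the same stored copy of the shared blocks.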
The indexing task may take considerable time to complete. You can see this task's state in the Tasks view on the management server.
Compacting
After one or more backups or archives have been deleted from the vault, either manually or during cleanup, the vault may contain items that are no longer referred to from any archive. Such items are deleted by the compacting task, which is a scheduled task performed by the storage node.
By default, the compacting task runs every Sunday night at 03:00. You can re-schedule the task as described in Actions on storage nodes (p. 325), under "Change the compacting task schedule". You can also manually start or stop the task from the Tasks view.
Because deletion of unused items is resource-consuming, the compacting task performs it only when a sufficient amount of data to delete has accumulated. The threshold is determined by the Compacting Trigger Threshold (p. 341) configuration parameter.
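
The threshold-gated cleanup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's algorithm: the function `compact`, the parameter `trigger_threshold`, and the choice to measure the threshold as the fraction of unreferenced bytes in the data store are hypothetical. How the real Compacting Trigger Threshold is measured is defined in the parameter's own description.

```python
def compact(data_store, archives, trigger_threshold):
    """Illustrative only: delete unreferenced items once enough have accumulated.

    data_store        -- dict mapping a reference to the stored item (bytes)
    archives          -- dict mapping an archive name to a list of references
    trigger_threshold -- hypothetical fraction (0..1) of unreferenced bytes
                         that must be reached before anything is deleted
    Returns the number of bytes freed (0 if the threshold was not reached).
    """
    referenced = {key for refs in archives.values() for key in refs}
    unused = [key for key in data_store if key not in referenced]
    unused_bytes = sum(len(data_store[key]) for key in unused)
    total_bytes = sum(len(item) for item in data_store.values())

    # Skip the expensive deletion pass until enough garbage has accumulated.
    if total_bytes == 0 or unused_bytes / total_bytes < trigger_threshold:
        return 0

    for key in unused:
        del data_store[key]
    return unused_bytes

# One archive was deleted earlier, leaving "k2" and "k3" unreferenced.
store = {"k1": b"aaaa", "k2": b"bbbb", "k3": b"cc"}
remaining_archives = {"archive_1": ["k1"]}

freed_high = compact(dict(store), remaining_archives, 0.9)  # 6 of 10 bytes unused: below 0.9
freed_low = compact(store, remaining_archives, 0.5)         # 0.6 >= 0.5: items are deleted
```

With the stricter threshold nothing is deleted, while the lower threshold lets the task remove both unreferenced items in a single pass.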
2.14.6.3 When deduplication is most effective
The following are cases when deduplication produces the maximum effect:
• When backing up similar data from different sources in the full backup mode. Such is the case when you back up operating systems and applications deployed from a single source over the network.
• When performing incremental backups of similar data from different sources, provided that the changes to the data are also similar. Such is the case when you deploy updates to these systems and apply the incremental backup. Again, it is recommended that you first back up one machine and then the others, all at once or one by one.
• When performing incremental backups of data that does not change itself, but changes its location. Such is the case when multiple pieces of data circulate over the network or within one system. Each time a piece of data moves, it is included in the incremental backup, which becomes sizeable even though it contains no new data. Deduplication helps to solve the problem: each time an item appears in a new place, a reference to the item is saved instead of the item itself.
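
The last case, data that moves without changing, can be illustrated with a small sketch. It assumes file-level items identified by a SHA-256 digest of their content. The function `store_incremental` and the paths are hypothetical and do not reflect the product's internal format.

```python
import hashlib

def store_incremental(changed_files, data_store):
    """Illustrative only: save a reference for every file, new data only when needed.

    changed_files -- dict mapping a path to file content (bytes) picked up
                     by an incremental backup
    data_store    -- dict mapping a reference (hex digest) to stored content
    Returns (references by path, number of new bytes added to the store).
    """
    refs = {}
    new_bytes = 0
    for path, content in changed_files.items():
        key = hashlib.sha256(content).hexdigest()
        if key not in data_store:
            # Content not seen before: it has to be stored.
            data_store[key] = content
            new_bytes += len(content)
        # Known or not, the backup records only a reference for this path.
        refs[path] = key
    return refs, new_bytes

data_store = {}
payload = b"x" * 1000

# First backup: the file is new, so its content is stored.
_, first_added = store_incremental({"/share/a/report.bin": payload}, data_store)
# Later backup: the same file appears at a new location.
_, second_added = store_incremental({"/share/b/report.bin": payload}, data_store)
```

The second backup sees the file as changed because its location is new, yet it adds no data to the store, only a reference.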
Deduplication and incremental backups
If the changes to the data are random, deduplication at incremental backup will not produce much effect because:
The deduplicated items that have not changed are not included in the incremental backup.