Chapter 5: Protection
In training your staff, you should note that restoring to another online server while the
first is online is a very different procedure from the majority of disaster recovery procedures,
because you have to recover to a different forest under those circumstances. The best way
to simulate the type of restore you may have to perform in an emergency is to use a test
network that is completely separate from the main network. This allows you to simulate
anything from failures of stores to total hardware failures of servers, and to learn what to do
under those circumstances.
However, this does not mean you should not perform alternate server restores. These
restores tell you other things, such as whether the backup software, tapes, and storage
procedures are working properly, and whether a particular live database can be backed up and
restored with no hitches. After all, there is no point in having highly trained staff to do a
restore if the restores themselves will fail because of faulty tapes. You should ensure that
every one of your databases has been restored to an alternate server at least once every six
months.
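The six-month rule above is easy to check mechanically if you keep a log of test restores. The following is a minimal sketch only: the database names, the layout of the log, and the reading of "six months" as 183 days are all assumptions made for illustration, not part of any Exchange tool.

```python
from datetime import date, timedelta

# Assumption: "six months" is treated as a fixed 183-day window.
MAX_RESTORE_AGE = timedelta(days=183)


def overdue_restores(last_restored, today, max_age=MAX_RESTORE_AGE):
    """Return the sorted names of databases whose test restore is overdue.

    last_restored maps a database name to the date of its last successful
    alternate server restore, or None if it has never been restored.
    """
    overdue = []
    for name, restored_on in last_restored.items():
        # A database that has never been restored is always overdue.
        if restored_on is None or today - restored_on > max_age:
            overdue.append(name)
    return sorted(overdue)


# Hypothetical restore log; these database names are invented.
restore_log = {
    "SG1-Mailbox1": date(2024, 1, 15),
    "SG1-Public1": date(2023, 5, 2),
    "SG2-Mailbox2": None,
}

print(overdue_restores(restore_log, today=date(2024, 3, 1)))
```

A report like this only tells you which databases are due for a test restore. It does not replace the restore itself, which is the only real proof that the tapes and procedures work.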
The Operations Manager should be responsible for ensuring that the organization is fully
prepared for disaster recovery. This involves having the staff who would carry out a real
recovery perform regular test restores for each Exchange server and each backup
device.
Summary
In an ideal world, Exchange would never suffer problems. However, we live in a world of
very diverse hardware and software, viruses and hackers, so it is inevitable that sometimes
you are going to run into difficulties with your Exchange configuration.
As you have seen here, to help you meet your SLAs it is vital to minimize the recovery time
in the event of a failure. However, in some cases it is very important that you shut down
services to protect the system, even though this will affect the availability measurement
defined in your SLA.
To protect operations against the unforeseen, it is important to factor unusual circumstances
into your service level agreements. For example, you may have established that you
have such resilient hardware and such efficient restore technology that you are able to
achieve 99.998 percent uptime in your organization (around 10 minutes downtime across
your organization per year). However, if a new virus hits your company because the
anti-virus vendor hasn’t informed you about it, then you may end up having to shut down
Exchange services just to prevent more damage. You can deal with this eventuality in two
ways. Either you can carry out a risk assessment on the effect of such unforeseen circumstances
and reduce the SLA accordingly, or you can simply modify the SLA so it states that if the
corporation is the victim of an unforeseen hacker attack or virus attack and you are able to
show that you used your best efforts to combat the problem, then this downtime will not
count against the service level agreement.
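The arithmetic behind these figures can be sketched as follows. This is an illustrative calculation only: it assumes a 365-day year and that availability is measured in minutes, and the function names and the sample outage figures are invented for the example. It shows that 99.998 percent uptime allows roughly 10.5 minutes of downtime per year, and how an exclusion clause changes the measured figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def allowed_downtime_minutes(uptime_percent):
    """Downtime per year, in minutes, permitted by an uptime target."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


def measured_uptime_percent(downtime_minutes, excluded_minutes=0.0):
    """Uptime over one year after removing downtime the SLA excludes.

    excluded_minutes is downtime that does not count against the SLA,
    for example a protective shutdown during an unforeseen virus attack.
    """
    counted = max(downtime_minutes - excluded_minutes, 0.0)
    return 100.0 * (1.0 - counted / MINUTES_PER_YEAR)


# The 99.998 percent target from the text.
print(round(allowed_downtime_minutes(99.998), 1))

# Hypothetical year: 250 minutes of downtime in total, of which 240
# were a deliberate shutdown during a virus outbreak.
print(round(measured_uptime_percent(250), 3))
print(round(measured_uptime_percent(250, excluded_minutes=240), 3))
```

Under the first approach described above, you would lower the target to absorb an outage of this kind. Under the second, the target stays the same and the 240 minutes are removed from the measurement.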