Galaxy® GHDX2-ADA DICOM Storage and Archive Appliance Admin Guide
Appendix C: Recovery from a Multiple Hard Drive Failure
Failure Discovery
A complete data disk failure will be detected by several subsystems, for example
PostgreSQL, QStar HSM, and JBoss, and will be flagged by warning symbols in the Web Interface.
A complete failure of the data disks may be the result of a RAID controller failure. In
this case the failure should be classified as a “catastrophic failure” and the computer shall be
replaced. For instructions on recovery from a catastrophic failure, see Appendix D.
If both hard disks fail at the same time (or a single hard disk failure was neglected for a long
time), the system cannot be restored to the state it was in at the moment of the failure.
The only option is to restore data from the ADA optical disks. This means the DICOM
appliance can be recovered only to the state most recently saved to the ADA optical disks. The ADA
optical disks contain two important pieces of data: the PostgreSQL database backups and the
pacs_nearline tar files.
In the event of a failure, the DICOM Appliance will email the Administrator. The
Administrator should contact their hardware support organization to make the needed repair.
The following procedure is designed to be used by Hardware Support personnel.
Overview of Recovery Procedures for a Multiple Hard Drive Failure
A hard drive failure is indicated by the light on the hard drive changing color from green to
yellow or red. A warning is also posted in the GUI to make the user aware
of the failure. If both hard drives fail, the Monitoring and RAID menu items will display a
RED warning sign to alert the user that there has been a catastrophic failure. Portions of the
GalaxyDICOM Appliance Web Interface will continue to report; however, any page that
requires information from the RAID will produce PHP errors.
The service procedure to recover from a multiple hard drive failure is carried out with the
script:
/usr/bin/recoverGalaxyFromRAIDFailure.sh
The majority of the operations are implemented in the script, but some portions of
the procedure may require manual intervention.
The “recoverGalaxyFromRAIDFailure.sh” script can be run multiple times. The script
works step by step: for each step it verifies whether that item is already recovered and, if so,
goes on to the next step.
For example, if the RAID is already recovered, the script should go to the next step:
verification of the partitions. If pacs_nearline is restored (/mnt/pacs_nearline is mounted
and the set is recovered), the script shall go on to the PostgreSQL database verification and
recovery.
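As an illustration of the "is /mnt/pacs_nearline mounted" condition mentioned above, the following is a minimal sketch of a mount check in shell. It is not taken from the recovery script; the function name is_mounted and its optional second argument are assumptions made for this example, and it checks only the mount, not whether the set is recovered.

```shell
#!/bin/sh
# Illustrative sketch only; not part of recoverGalaxyFromRAIDFailure.sh.

# is_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR is listed as a mount point in MOUNTS_FILE
# (defaults to /proc/mounts, where the mount point is the second field).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted /mnt/pacs_nearline; then
    echo "pacs_nearline is mounted"
else
    echo "pacs_nearline is NOT mounted"
fi
```

On a normally running appliance this check would be expected to report that pacs_nearline is mounted; after a multiple drive failure it would be expected to report that it is not.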
In short, if everything is already recovered, or if the script is started on a normally running
GalaxyDICOM appliance, it will print an OK message for each step and exit with the message
“All systems are in operational state”.
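The re-runnable, check-then-recover behavior described above can be sketched as follows. This is a simplified illustration of the pattern, not the contents of recoverGalaxyFromRAIDFailure.sh: the function names, the "RECOVERING" and "FAILED" messages, and the marker files under STATE_DIR (which stand in for the real health checks and recovery actions) are all assumptions. Only the step order and the final message come from this guide.

```shell
#!/bin/sh
# Illustrative sketch of an idempotent recovery sequence (hypothetical code).
# Marker files in STATE_DIR simulate "this subsystem is healthy".
STATE_DIR=${STATE_DIR:-$(mktemp -d)}

is_done()   { [ -e "$STATE_DIR/$1" ]; }
mark_done() { : > "$STATE_DIR/$1"; }

# Placeholder checks and recovery actions, one pair per step.
check_raid()         { is_done raid; }
recover_raid()       { mark_done raid; }
check_partitions()   { is_done partitions; }
recover_partitions() { mark_done partitions; }
check_nearline()     { is_done nearline; }
recover_nearline()   { mark_done nearline; }
check_db()           { is_done db; }
recover_db()         { mark_done db; }

# run_step NAME CHECK_FN RECOVER_FN
# Runs RECOVER_FN only when CHECK_FN reports the item is not yet recovered,
# then re-checks before declaring the step OK.
run_step() {
    step_name=$1; step_check=$2; step_recover=$3
    if "$step_check"; then
        echo "OK: $step_name"
        return 0
    fi
    echo "RECOVERING: $step_name"
    "$step_recover" || { echo "FAILED: $step_name"; return 1; }
    "$step_check"   || { echo "FAILED: $step_name"; return 1; }
    echo "OK: $step_name"
}

recover_all() {
    run_step "RAID" check_raid recover_raid || return 1
    run_step "partitions" check_partitions recover_partitions || return 1
    run_step "pacs_nearline" check_nearline recover_nearline || return 1
    run_step "PostgreSQL database" check_db recover_db || return 1
    echo "All systems are in operational state"
}

recover_all
```

Because each step is guarded by its own check, running recover_all a second time skips every recovery action and prints only the OK lines followed by the final message, which mirrors the behavior the guide describes for a normally running appliance.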