1.7.3 Fault Detection
A redundant system can be less reliable than a non-redundant system if its faults go undetected. The system must detect and
annunciate faults so that they can be repaired before a forced outage occurs. Fault detection confirms that a component or group
of components is operating properly. It is achieved through one or more of the following methods:
• Operator inspection of the process
• Operator inspection of the equipment
• Special hardware circuits to monitor operation
• Hardware and software watchdogs
• Software logic
• Software heartbeats
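As an illustration of the last method in the list, a software heartbeat can be sketched as a timer that a monitored task resets periodically; if the timer expires, the monitor declares a fault. This is a minimal sketch, not the Mark VIe implementation: the class name `HeartbeatMonitor`, its methods, and the timeout value are hypothetical.

```python
import time


class HeartbeatMonitor:
    """Hypothetical heartbeat check: a peer is considered faulted when no
    heartbeat has been received within timeout_s seconds."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock            # injectable so the logic can be tested
        self._last_beat = clock()      # treat construction time as the first beat

    def beat(self):
        """Called by the monitored task each time it completes a cycle."""
        self._last_beat = self._clock()

    def is_healthy(self):
        """True while the most recent heartbeat is newer than the timeout."""
        return (self._clock() - self._last_beat) <= self.timeout_s


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=0.5)
    monitor.beat()
    print("healthy:", monitor.is_healthy())
```

A watchdog works the same way in principle, except that the expiry typically triggers a hardware action (such as a reset) rather than only raising an alarm.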
Complex control systems have many potential failure points, so creating foolproof fault detection can be very costly and time
consuming. Failure to control the outputs of a system is the most damaging type of failure. Fault detection must therefore be
performed as close to the output as possible to achieve the highest level of reliability. With the Mark VIe control configured
using triple redundant controllers and I/O modules, a high level of detection and fault masking is provided by voting the
outputs of all three controllers and monitoring discrepancies.
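The idea of voting three outputs while monitoring discrepancies can be sketched as follows. This is an illustrative sketch only: it assumes a two-out-of-three majority for logic signals and a median select for analog signals, which are common voting schemes but are not confirmed here as the exact Mark VIe algorithms. The function names and the tolerance parameter are hypothetical.

```python
from statistics import median


def vote_logic(a, b, c):
    """Two-out-of-three majority vote: one wrong input is masked."""
    return (a and b) or (a and c) or (b and c)


def vote_analog(values, tolerance):
    """Median-select three analog values (assumed scheme).

    Returns the voted value and the indices of any inputs that differ
    from it by more than `tolerance`, so the discrepancy can be alarmed.
    """
    voted = median(values)
    discrepancies = [i for i, v in enumerate(values) if abs(v - voted) > tolerance]
    return voted, discrepancies


if __name__ == "__main__":
    # The third controller disagrees; its value is masked and flagged.
    print(vote_logic(True, True, False))
    print(vote_analog([10.0, 10.1, 14.0], tolerance=0.5))
```

The voted result masks the single bad value, while the discrepancy list gives maintenance staff the information needed to locate the faulty channel.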
All Mark VIe control systems benefit from the fault detection design of the I/O packs. Every pack includes function-specific
fault detection methods that attempt to confirm correct operation. This is made possible by the powerful local processing
present in each input and output pack. Examples include:
• Analog to digital (A/D) converters are compared to a reference standard each conversion cycle. If the converted
calibration input falls outside of acceptable ranges, the pack declares bad health.
• Analog 0-20 mA outputs use a small current-sense resistor on the output terminal board. The output current is read back
through a separate A/D converter and compared to the commanded value. A difference between the commanded and actual
values that exceeds an acceptable level results in the output being declared in bad health.
• Discrete input opto-isolators are periodically forced to an on condition, then forced off. This is done independently of the
actual input and is fast enough not to interfere with the sequence of events (SOE) time capture. If any signal path is stuck
and does not respond to the test command, the signal is declared in bad health.
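The three self-tests described above reduce to simple comparisons, sketched below. This is a hedged illustration, not pack firmware: the function names, the tolerance values, and the `force_and_read` callback are hypothetical stand-ins for hardware access.

```python
def adc_reference_ok(measured_ref, expected_ref, tolerance):
    """A/D check: the converted calibration input must stay near the reference."""
    return abs(measured_ref - expected_ref) <= tolerance


def output_readback_ok(commanded_ma, readback_ma, tolerance_ma):
    """Analog output check: the current read back through the sense resistor
    must agree with the commanded value."""
    return abs(commanded_ma - readback_ma) <= tolerance_ma


def input_path_ok(force_and_read):
    """Discrete input check: force the path on, then off, and confirm it follows.

    `force_and_read(state)` is a hypothetical callback that forces the
    opto-isolator to `state` and returns the value the signal path reports.
    """
    return force_and_read(True) is True and force_and_read(False) is False


if __name__ == "__main__":
    print(adc_reference_ok(2.499, 2.5, 0.005))    # reference within range
    print(output_readback_ok(12.0, 4.0, 0.1))     # output not following command
    print(input_path_ok(lambda state: True))      # path stuck on
```

In each case a failed check would lead to the pack or signal being declared in bad health, as the list describes.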
1.7.4 Online Repair
When a component failure on a critical path is detected and healed in the control system, a potential failure has been avoided.
Subsequent actions can include:
Option 1 - Continue running until the backup component fails.
Option 2 - Continue running until the system is brought down in a controlled manner to replace the failed component.
Option 3 - Replace the component online.
Option 1 is not recommended. A redundant system in which the MTTR is infinite can have a lower total reliability than a
simplex system.
Option 2 is a valid procedure for some processes needing predictable mission times. However, many controlled processes
cannot easily be scheduled for a shutdown.
Note
As MTTR increases from the expected four hours toward infinity, the system reliability can decline from significantly
greater than that of a simplex system to less than that of a simplex system. Repair should be accomplished as soon as possible.
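The effect of MTTR on a redundant system can be illustrated with a standard Markov model of a two-out-of-three system. This is a sketch under stated assumptions, not a calculation from this guide: it assumes three identical, independent components with a constant failure rate, repair of one failed component at a time, and system failure when a second component fails before the first is repaired. The 100,000-hour component MTBF is a hypothetical figure chosen only for illustration; the four-hour MTTR is the expected value given in the note above.

```python
import math


def mttf_simplex(mtbf_h):
    """Mean time to failure of a single, non-redundant component."""
    return mtbf_h


def mttf_two_out_of_three(mtbf_h, mttr_h):
    """Mean time to system failure of a 2-out-of-3 system with repair.

    Markov model result: MTTF = (5*lam + mu) / (6*lam**2), where lam is the
    component failure rate and mu is the repair rate. Pass math.inf as
    mttr_h to model a system that is never repaired (mu = 0).
    """
    lam = 1.0 / mtbf_h
    mu = 0.0 if math.isinf(mttr_h) else 1.0 / mttr_h
    return (5.0 * lam + mu) / (6.0 * lam ** 2)


if __name__ == "__main__":
    mtbf = 100_000.0  # hypothetical component MTBF in hours
    print("simplex:            ", mttf_simplex(mtbf))
    print("2oo3, MTTR = 4 h:   ", mttf_two_out_of_three(mtbf, 4.0))
    print("2oo3, never repaired:", mttf_two_out_of_three(mtbf, math.inf))
```

Under these assumptions, the never-repaired case gives a mean time to failure of five-sixths of the component MTBF, which is lower than the simplex value, while prompt repair raises it by several orders of magnitude. This is consistent with the recommendation to repair as soon as possible.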
Option 3 is required to get the maximum benefit from redundant systems with long mission times. In dual or triple redundant
Mark VIe controller applications, the controllers and redundant I/O packs can be replaced online.
Control System Overview
GEH-6721_Vol_I_BP System Guide 51
Public Information