Application ReStart (AppRS) Overview
ClusterPack
3.4.1 What is AppRS?
AppRS is a collection of software that works in conjunction with Platform Computing's
Clusterware™ to provide a fail-over system that preserves the contents of an application's
current working directory (CWD) in the event of a fail-over. Many technical applications
provide application-level checkpoint/restart facilities, in which the application can save its
state to, and restore it from, a set of files. Checkpoint/restart is particularly helpful for
long-running applications because it can minimize the computing time lost to computer failure.
However, two factors diminish the usefulness of this capability. First, a computer failure
frequently leaves the restart files inaccessible. Using a shared file system does not preclude
data loss and can degrade performance, and redundant hardware solutions are often financially
impractical for the large clusters used in technical computing. Second, applications affected
by a computer failure generally require human detection and intervention before they can be
restarted from their restart files, and valuable compute time is often lost between the time
the job fails and the time a user becomes aware of the failure. Together, Clusterware™ and
AppRS migrate and restart applications affected by an unreachable host and ensure that the
contents of each such application's CWD are preserved across the migration.
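To show why preserving the CWD matters, here is a minimal Python sketch of application-level checkpoint/restart. It assumes the application keeps its restart file in its working directory; the file name restart.json and the function run are hypothetical illustrations, not part of AppRS or Clusterware™. If the job is restarted with its CWD contents intact, the second run resumes from the last saved step instead of starting over.

```python
import json
import os
import tempfile

# Hypothetical restart file name; a real application defines its own file set.
CHECKPOINT = "restart.json"


def run(total_steps, checkpoint_every=10, workdir="."):
    """Sum the integers 0..total_steps-1, checkpointing progress in workdir.

    If a restart file already exists in workdir (for example, after the job
    was restarted on another host with its CWD preserved), resume from the
    saved state instead of starting from step 0.
    """
    path = os.path.join(workdir, CHECKPOINT)
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # restore the saved state
    while state["step"] < total_steps:
        state["total"] += state["step"]  # one unit of "work"
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            # Rename over the old file so a crash mid-write never leaves a
            # half-written restart file (rename is atomic on POSIX).
            os.replace(tmp, path)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cwd:
        print(run(100, workdir=cwd))  # first run computes from step 0: 4950
        print(run(100, workdir=cwd))  # a "restart" reuses the saved state: 4950
```

If the restart file were lost along with the failed host, the restarted job would silently begin again at step 0; keeping the file in a directory whose contents survive the migration is what makes the resume possible.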
AppRS is accessed by submitting jobs to AppRS-enabled queues, whose names generally end
in "_apprs". A number of utilities are also available for monitoring a job and its files:
- apprs_hist
- apprs_ls
- apprs_clean
- apprs_mpijob
More information is available in the apprs man page or in the HP Application ReStart User's Guide:
% man apprs
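The queue-naming convention above can be applied programmatically. The following Python sketch picks out queues whose names end in "_apprs" and builds a submission command line. It assumes an LSF-style `bsub -q <queue> <command>` submission command is available with Clusterware™; the queue names and the solver command are hypothetical, and because the convention is only "generally" followed, the filter may miss AppRS-enabled queues named differently on a given cluster.

```python
def apprs_queues(queue_names):
    """Return the queues that follow the usual AppRS naming convention."""
    return [q for q in queue_names if q.endswith("_apprs")]


def bsub_argv(queue, command):
    """Build an LSF-style bsub argument list submitting `command` to `queue`.

    Assumes the site's submission command is `bsub -q <queue> <command...>`.
    """
    return ["bsub", "-q", queue, *command]


if __name__ == "__main__":
    # Hypothetical queue names; check the real ones on your cluster.
    queues = ["normal", "long_apprs", "short_apprs"]
    target = apprs_queues(queues)[0]
    print(bsub_argv(target, ["./my_solver", "restart.dat"]))
```

The argument list is only printed here; passing it to `subprocess.run(argv, check=True)` would perform the actual submission on a host where `bsub` is installed.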