Seagate Enterprise Performance - 3.5 HDD SAS Product Manual, Rev. B
5.2.3 S.M.A.R.T.
S.M.A.R.T. is an acronym for Self-Monitoring Analysis and Reporting Technology. This technology is intended to recognize conditions that indicate
imminent drive failure and is designed to provide sufficient warning of a failure to allow an application to back up the data before an actual failure
occurs.
Each monitored attribute has been selected to monitor a specific set of failure conditions in the operating performance of the drive and the
thresholds are optimized to minimize “false” and “failed” predictions.
Controlling S.M.A.R.T.
The operating mode of S.M.A.R.T. is controlled by the DEXCPT and PERF bits on the Informational Exceptions Control mode page (1Ch). Use the
DEXCPT bit to enable or disable the S.M.A.R.T. feature. Setting the DEXCPT bit disables all S.M.A.R.T. functions. When enabled, S.M.A.R.T. collects
on-line data as the drive performs normal read and write operations. When the PERF bit is set, the drive is considered to be in “On-line Mode Only” and
will not perform off-line functions.
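The DEXCPT and PERF settings described above can be read back from the mode page data returned by a Mode Sense command. The following is a minimal sketch, not drive firmware or a Seagate tool. It assumes the page layout defined in the SCSI Primary Commands standard (PERF in bit 7 and DEXCPT in bit 3 of byte 2, MRIE in the low four bits of byte 3); those bit positions are not stated in this manual and should be checked against the SCSI interface manual for the drive. The names IecSettings and parse_iec_page are hypothetical.

```python
from dataclasses import dataclass

IEC_PAGE_CODE = 0x1C  # Informational Exceptions Control mode page


@dataclass
class IecSettings:
    perf: bool    # PERF bit as read from the page
    dexcpt: bool  # DEXCPT bit as read from the page
    mrie: int     # MRIE field (reporting method)

    @property
    def smart_enabled(self) -> bool:
        # Setting DEXCPT disables all S.M.A.R.T. functions.
        return not self.dexcpt

    @property
    def online_mode_only(self) -> bool:
        # With S.M.A.R.T. enabled and PERF set, no off-line functions run.
        return self.smart_enabled and self.perf


def parse_iec_page(page: bytes) -> IecSettings:
    """Decode the first bytes of mode page 1Ch (assumed SPC layout)."""
    if len(page) < 4:
        raise ValueError("mode page data too short")
    if page[0] & 0x3F != IEC_PAGE_CODE:
        raise ValueError("not the Informational Exceptions Control page")
    flags = page[2]
    return IecSettings(
        perf=bool(flags & 0x80),    # assumed: byte 2, bit 7
        dexcpt=bool(flags & 0x08),  # assumed: byte 2, bit 3
        mrie=page[3] & 0x0F,        # assumed: byte 3, bits 3..0
    )
```

The sketch only decodes bytes already in memory; issuing Mode Sense and stripping the mode parameter header is left to the host's SCSI pass-through layer.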
Applications can measure off-line attributes and force the drive to save the data by using the Rezero Unit command. Forcing S.M.A.R.T. in this way resets the
timer so that the next scheduled interrupt occurs one hour later.
Applications can interrogate the drive through the host to determine the time remaining before the next scheduled measurement and data
logging process occurs. To accomplish this, issue a Log Sense command to log page 0x3E. This allows applications to control when S.M.A.R.T.
interruptions occur. Forcing S.M.A.R.T. with the Rezero Unit (RTZ) command resets the timer.
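As an illustration of the two commands mentioned in this section, the sketch below builds the command descriptor blocks (CDBs) a host might send. It assumes the standard SCSI encodings (Log Sense opcode 4Dh as a 10-byte CDB, Rezero Unit opcode 01h as a 6-byte CDB); these encodings are not given in this manual. The function names are hypothetical, and the layout of the data returned for log page 0x3E is not described here, so the sketch stops at building the request.

```python
LOG_SENSE_OPCODE = 0x4D    # assumed: standard SCSI Log Sense opcode
REZERO_UNIT_OPCODE = 0x01  # assumed: standard SCSI Rezero Unit opcode


def build_log_sense_cdb(page_code: int, alloc_len: int,
                        page_control: int = 0x01) -> bytes:
    """Build a 10-byte Log Sense CDB for the given log page.

    page_control 01b requests current cumulative values (assumed default).
    """
    if not 0 <= page_code <= 0x3F:
        raise ValueError("page code must fit in 6 bits")
    if not 0 <= page_control <= 0x03:
        raise ValueError("page control must fit in 2 bits")
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in 16 bits")
    return bytes([
        LOG_SENSE_OPCODE,
        0x00,                              # option bits left clear
        (page_control << 6) | page_code,   # PC and page code
        0x00,                              # subpage code
        0x00,                              # reserved
        0x00, 0x00,                        # parameter pointer
        (alloc_len >> 8) & 0xFF,           # allocation length, high byte
        alloc_len & 0xFF,                  # allocation length, low byte
        0x00,                              # control
    ])


def build_rezero_unit_cdb() -> bytes:
    """Build a 6-byte Rezero Unit CDB with all optional fields clear."""
    return bytes([REZERO_UNIT_OPCODE, 0x00, 0x00, 0x00, 0x00, 0x00])


# Request log page 0x3E with a 512-byte buffer (buffer size is arbitrary).
timer_query = build_log_sense_cdb(0x3E, 512)
force_measurement = build_rezero_unit_cdb()
```

A host could query page 0x3E first and send Rezero Unit only when a measurement interruption is acceptable, which is the scheduling control this section describes.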
Performance impact
S.M.A.R.T. attribute data is saved to the disk so that the events that caused a predictive failure can be recreated. The drive measures and saves
parameters once every hour, subject to an idle period on the drive interfaces. The process of measuring off-line attribute data and saving data to the
disk is interruptible. The maximum processing delay is summarized in the Maximum processing delay table.
Reporting control
Reporting is controlled by the MRIE bits in the Informational Exceptions Control mode page (1Ch). For example, if MRIE is set to one, the
firmware issues an 01-5D00 sense code to the host. The FRU field contains the type of predictive failure that occurred. The error code is preserved
through bus resets and power cycles.
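A host-side check for the 01-5D00 report might look like the sketch below. It assumes that "01-5D00" means sense key 01h, additional sense code 5Dh, and additional sense code qualifier 00h, and that the drive returns fixed-format sense data as laid out in the SCSI standard (sense key in byte 2, ASC in byte 12, ASCQ in byte 13, FRU code in byte 14). Neither assumption is spelled out in this manual, and the names SenseInfo, decode_fixed_sense, and is_predictive_failure are hypothetical.

```python
from typing import NamedTuple, Optional


class SenseInfo(NamedTuple):
    sense_key: int
    asc: int
    ascq: int
    fru: int


def decode_fixed_sense(sense: bytes) -> Optional[SenseInfo]:
    """Decode fixed-format sense data; return None if it is not usable."""
    if len(sense) < 15:
        return None
    if sense[0] & 0x7F not in (0x70, 0x71):  # fixed-format response codes
        return None
    return SenseInfo(
        sense_key=sense[2] & 0x0F,
        asc=sense[12],
        ascq=sense[13],
        fru=sense[14],
    )


def is_predictive_failure(info: SenseInfo) -> bool:
    """True for the 01-5D00 report described in the text (assumed reading)."""
    return info.sense_key == 0x01 and info.asc == 0x5D and info.ascq == 0x00
```

When the check succeeds, the fru value carries the type of predictive failure; its encoding is not listed in this section, so the sketch returns it undecoded.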
Determining rate
S.M.A.R.T. monitors the rate at which errors occur and signals a predictive failure if the rate of degraded errors increases to an unacceptable level. To
determine rate, error events are logged and compared to the number of total operations for a given attribute. The interval defines the number of
operations over which to measure the rate. The counter that keeps track of the current number of operations is referred to as the Interval Counter.
S.M.A.R.T. measures error rates. All errors for each monitored attribute are recorded. A counter keeps track of the number of errors for the current
interval. This counter is referred to as the Failure Counter.
Error rate is the number of errors per operation. To evaluate error rates, S.M.A.R.T. sets a threshold for the number of errors and an interval, measured in
operations, over which they are counted. If the number of errors exceeds the threshold before the interval expires, the error rate is considered to be unacceptable. If
the number of errors does not exceed the threshold before the interval expires, the error rate is considered to be acceptable. In either case, the
interval and failure counters are reset and the process starts over.
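The interval and failure counter scheme can be sketched as follows. This is an illustrative model only, not the drive's firmware: the class name AttributeMonitor, the record method, and the example interval and threshold values are hypothetical, and the choice to test the threshold before the interval on the same operation is an assumption.

```python
from typing import Optional


class AttributeMonitor:
    """Model of one monitored attribute with its two counters."""

    def __init__(self, interval: int, threshold: int) -> None:
        self.interval = interval      # operations per measurement interval
        self.threshold = threshold    # errors tolerated within one interval
        self.interval_counter = 0     # operations seen in current interval
        self.failure_counter = 0      # errors seen in current interval

    def _reset(self) -> None:
        self.interval_counter = 0
        self.failure_counter = 0

    def record(self, error: bool) -> Optional[bool]:
        """Record one operation.

        Returns None while the interval is still running, True if the
        error rate was judged unacceptable, False if it was acceptable.
        """
        self.interval_counter += 1
        if error:
            self.failure_counter += 1
        if self.failure_counter > self.threshold:
            # Threshold exceeded before the interval expired.
            self._reset()
            return True
        if self.interval_counter >= self.interval:
            # Interval expired without exceeding the threshold.
            self._reset()
            return False
        return None
```

With interval=10 and threshold=2, three errors in a row produce an unacceptable verdict on the third operation, while ten error-free operations produce an acceptable verdict on the tenth; both outcomes reset the counters.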
Note: The drive’s firmware monitors specific attributes for degradation over time but cannot predict instantaneous drive failures.
Maximum processing delay

                          Fully-enabled delay (DEXCPT = 0)
S.M.A.R.T. delay times    75 ms