IBM z13s Technical Guide
B.1 Troubleshooting in complex IT environments
In a 24x7 operating environment, a system problem or incident can drive up operations costs
and disrupt service to clients for hours or even days. Current IT environments cannot afford
recurring problems or outages that take too long to repair. These outages can result in
damage to a company’s reputation and limit the company’s ability to remain competitive in the
marketplace.
However, as systems become more complex, errors can occur anywhere. Some problems
begin with symptoms that can go undetected for long periods of time. Systems often
experience “soft failures” (sick but not dead) that are much more difficult to detect. Moreover,
problems can grow, cascade, and get out of control.
The following everyday activities can introduce system anomalies and trigger either hard or
soft failures in complex, integrated data centers:
• Increased volume of business activity
• Application modifications to comply with changing regulatory requirements
• IT efficiency efforts, such as consolidating images
• Standard operational changes:
  – Adding or upgrading hardware
  – Adding or upgrading software, such as operating systems, middleware, and
    independent software vendor products
  – Modifying network configurations
  – Moving workloads (provisioning, balancing, deploying, disaster recovery (DR) testing,
    and so on)
Using a combination of existing system management tools helps to diagnose problems.
However, these tools cannot quickly identify messages that precede system problems, and
they cannot detect every possible combination of change and failure.
When using these tools, you might need to look through message logs to understand the
underlying issue. However, the sheer number of messages makes this process challenging,
skills-intensive, and error-prone.
To meet IT service challenges and to effectively sustain high levels of availability, a proven
way is needed to identify, isolate, and resolve system problems quickly. Information and
insight are vital to understanding baseline system behavior along with possible deviations.
Having this knowledge reduces the time that is needed to diagnose problems and helps you
address them quickly and accurately.
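
The idea of comparing current behavior against a baseline can be illustrated with a minimal sketch. It is not taken from this guide and does not represent how any IBM product analyzes messages. It assumes that each log line starts with a message ID, that both logs cover time windows of equal length, and that a threefold increase in frequency is worth flagging. The function names, the threshold, and the message IDs shown are all hypothetical.

```python
from collections import Counter


def message_id(line: str) -> str:
    """Return the first whitespace-delimited token, assumed to be the message ID."""
    return line.split(maxsplit=1)[0]


def find_deviations(baseline_lines, current_lines, ratio=3.0):
    """Flag message IDs that are new, or far more frequent, than in the baseline.

    Assumes both inputs cover equally long time windows, so raw counts
    can be compared directly. The ratio threshold is an arbitrary choice.
    """
    baseline = Counter(message_id(l) for l in baseline_lines if l.strip())
    current = Counter(message_id(l) for l in current_lines if l.strip())

    # IDs that never appeared while the baseline was recorded
    unseen = sorted(mid for mid in current if mid not in baseline)

    # Known IDs whose count grew by at least the given ratio
    surging = sorted(
        mid for mid, n in current.items()
        if mid in baseline and n >= ratio * baseline[mid]
    )
    return unseen, surging


# Hypothetical message IDs and text, for illustration only
baseline_log = [
    "MSG0001I job started",
    "MSG0001I job started",
    "MSG0002I job ended",
]
current_log = [
    "MSG0001I job started",
    "MSG0002I job ended",
    "MSG0002I job ended",
    "MSG0002I job ended",
    "MSG0099E unexpected condition",
]

unseen, surging = find_deviations(baseline_log, current_log)
print("Not in baseline:", unseen)
print("Surging:", surging)
```

Even this simple comparison surfaces two kinds of deviation: a message that was never seen during the baseline period, and a known message that is issued much more often than usual. A realistic analysis would also need to account for time of day, workload mix, and message context.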
The current complex, integrated data centers require a team of experts to monitor systems
and perform the real-time diagnosis of events. However, it is not always possible to afford this
level of skill for these reasons:
• A z/OS sysplex might produce more than 40 GB of message traffic per day for its images
  and components alone. Application messages can significantly increase that number.
• There are more than 40,000 unique message IDs defined in z/OS and the IBM software
  that runs on z/OS. Independent software vendor (ISV) or client messages can increase
  that number.
International Technical Support Organization, IBM z13s Technical Guide, June 2016, SG24-8294-00, ISBN 0738441678