
SG24-5131-00

International Technical Support Organization

http://www.redbooks.ibm.com

IBM Certification Study Guide
AIX HACMP  

David Thiessen, Achim Rehor, Reinhard Zettler
 

Contents of AIX HACMP SG24-5131-00

Page 3: ...IBM Certification Study Guide AIX HACMP May 1999 SG24-5131-00 International Technical Support Organization...

Page 4: ...r 5765-D28 for use with the AIX Operating System Version 4.3.2 and later. Comments may be addressed to: IBM Corporation, International Technical Support Organization, Dept. JN9B Building 003 Internal Zip 2...

Page 5: ...4; 2.3 Cluster Disks 16; 2.3.1 SSA Disks 16; 2.3.2 SCSI Disks 26; 2.4 Resource Planning 28; 2.4.1 Resource Group Options 28; 2.4.2 Shared LVM Components 30; 2.4.3 IP Address Takeover 34; 2.4.4 NFS Exports and...

Page 6: ...m 88; 3.4.6 Alternate Method - TaskGuide 90; Chapter 4. HACMP Installation and Cluster Definition 93; 4.1 Installing HACMP 93; 4.1.1 First Time Installs 93; 4.1.2 Upgrading From a Previous Version 96; 4.2 Defi...

Page 7: ...ters 132; 6.1.3 Process State 132; 6.1.4 Network State 132; 6.1.5 LVM State 133; 6.1.6 Cluster State 133; 6.2 Simulate Errors 134; 6.2.1 Adapter Failure 134; 6.2.2 Node Failure / Reintegration 137; 6.2.3 Networ...

Page 8: ...ze Cluster Resources 168; 8.5.3 DARE Resource Migration Utility 169; 8.6 Applying Software Maintenance to an HACMP Cluster 174; 8.7 Backup Strategies 176; 8.7.1 Split-Mirror Backups 176; 8.7.2 Using Events...

Page 9: ...luster Technology (RSCT) 200; 10.2.2 Enhanced Cluster Security 201; 10.3 High Availability for Network File System for AIX 201; 10.4 Similarities and Differences 202; 10.5 Decision Criteria 202; Appendix A. S...

Page 11: ...necting Networks to a Hub 61; 10. 7135-110 RAIDiant Arrays Connected on Two Shared 8-Bit SCSI Buses 74; 11. 7135-110 RAIDiant Arrays Connected on Two Shared 16-Bit SCSI Buses 77; 12. Termination on the SCSI...

Page 13: ...7133 Models 010, 020, 500, 600, D40, T40 Specifications 18; 8. SSA Disks 19; 9. SSA Adapters 19; 10. The Advantages and Disadvantages of the Different RAID Levels 24; 11. Necessary APAR Fixes 55; 12. AIX Prerequisi...

Page 15: ...icable to the job role and is a meaningful and appropriate assessment of skill. Subject Matter Experts who successfully perform the job participate throughout the entire development process. These job i...

Page 16: ...certification content, this publication can also be used as a desk-side reference. So, whether you are planning to take the AIX HACMP certification exam, or just want to validate your HACMP skills, this bo...

Page 17: ...ontributions to this project: Marcus Brewer, International Technical Support Organization, Austin Center; Rebecca Gonzalez, IBM AIX Certification Project Manager, Austin; Milos Radosavljevic, International Te...

Page 19: ...Available Clusters. Certification Requirement (two Tests): To attain the IBM Certified Specialist - AIX HACMP certification, candidates must first obtain the AIX System Administration or the AIX System Suppo...

Page 20: ...ions; Assist customers in identifying HA applications. Evaluate the Customer Environment and Tailorable Components: Evaluate the configuration and identify Single Points of Failure (SPOF); Define and analyz...

Page 21: ...led shared disk tests; Troubleshoot a failed application; Troubleshoot failed Pre/Post event scripts; Troubleshoot failed error notifications; Troubleshoot errors reported by cluster verification. Section...

Page 22: ...eb site: http://www.ibm.com/certify. Table 1. AIX Version 4 HACMP Installation and Implementation. Course Number: Q1054 (USA), AU54 (Worldwide). Course Duration: Five days. Course Abstract: This course provides a det...

Page 23: ...this course include: Integrating the cluster with existing network services (DNS, NIS, etc.); Monitoring tools for the cluster, including HAView for Netview; Maintaining user IDs and passwords across the clust...

Page 25: ...or AIX and HACMP/ES. We realize that the rapid pace of change in products will almost certainly render any snapshot of the options out of date by the time it is published. This is true of almost all tec...

Page 26: ...ture of normal RS/6000 machines and RS/6000 SP nodes is possible. 2.1.2 Cluster Node Considerations. It is important to understand that selecting the system components for a cluster requires careful con...

Page 27: ...he requirements of highly available applications, not only in terms of CPU cycles, but also of memory and possibly disk space. Approximately 50 MB of disk storage is required for full installation of the...

Page 28: ...ature is available for token ring, FDDI, or ATM, you must use an I/O slot to provide token ring adapter redundancy. Table 4. Number of Adapter Slots in Each Model. 1. The switch adapter is onboard and does n...

Page 29: ...Version 4.3 product does not use non-TCP/IP networks for node-to-node communications in triggering, synchronizing, and executing event reactions. This can be an issue if you are configuring a cluster wit...

Page 30: ...m availability in that the more communication paths that connect clustered nodes and clients, the greater the degree of network availability. 2.2.1.2 Special Network Considerations. Each type of interfac...

Page 31: ...umference of 100 kilometers. ATM is a point-to-point connection network. It currently supports the OC3 and the OC12 standard, which is 155 Mbps or 622 Mbps. You cannot use hardware address swapping with A...

Page 32: ...the SP Switch. For IP Address Takeover (IPAT), in general, two adapters per cluster node and network are recommended in order to eliminate single points of failure. The only exception to this rule is...

Page 33: ...requires two serial ports per node. Table 5 shows a list of possible cluster nodes and the number of native serial ports for each. Table 5. Number of Available Serial Ports in Each Model. 1. serial port ca...

Page 34: ...not use more than 4 target mode SCSI networks in a cluster. Target mode SSA: If you are using shared SSA devices, target mode SSA is the third possibility for a serial network within HACMP. In order to us...

Page 35: ...ode can be either an initiator or a target. An initiator issues commands, while a target responds with data and status information. The SSA nodes in the adapter are, therefore, initiators, while the SSA nod...

Page 36: ...Multi-Storage Tower Specifications. Table 7. 7133 Models 010, 020, 500, 600, D40, T40 Specifications. Item / Specification. Transfer rate: SSA interface 80 MB. Configuration: 2 to 5 disk drives (2.2 GB, 4.5 GB, or 9...

Page 37: ...k: Yes, and hot-swappable redundant power and cooling. Name / Capacities (GB) / Buffer size (KB) / Maximum Transfer rate (MBps): Starfire 1100 / 1.1 / 0 / 20; Starfire 2200 / 2.2 / 0 / 20; Starfire 4320 / 4.5 / 512 / 20; Scorpion 4500 / 4...

Page 38: ...tors A1 and A2 or Connectors B1 and B2. Only one of the two pairs of connectors on an adapter card can be connected in a single SSA loop. A maximum of 48 devices can be connected in a single SSA loop. A...

Page 39: ...mum of two adapters can be connected in a particular loop if one or more of the disk drives in the loop are array disk drives that are not configured for fast-write operations. The adapters can be two...

Page 40: ...ttern skew is eliminated due to the distribution of the data. This means that with data distributed evenly across a number of disks, random accesses will most likely find the required information spread...

Page 41: ...ve that the required data is actually on. This means that simultaneous, as well as independent, reads are possible. Write requests, however, require a read-modify-update cycle that creates a bottleneck at t...

Page 42: ...the 7133 Disk Subsystem. The only RAID level supported by the 7133 SSA disk subsystem is RAID 5. RAID 0 and RAID 1 can be achieved with the striping and mirroring facility of the Logical Volume Manager...

Page 43: ...unning the same applications. The advantages of SSA are summarized as follows: Dual paths to devices; Simplified cabling (cheaper, smaller cables and connectors, no separate terminators); Faster interconnect...

Page 44: ...I-2 Differential 8-bit or 16-bit bus. 2.3.2.1 Capacities. Disks: There are four disk sizes available for the 7135 RAIDiant Array Models 110 and 210: 1.3 GB, 2.0 GB, 2.2 GB (only supported by Dual Active Soft...

Page 45: ...volume groups, different parts of the subsystem can be logically attached to different systems at any one time. Redundant Power Supply: Redundant power supplies provide alternative sources of power. If on...

Page 46: ...g nodes. HACMP considers the following as resource types: Volume Groups; Disks; File Systems; File Systems to be NFS mounted; File Systems to be NFS exported; Service IP addresses; Applications. The following...

Page 47: ...to false, the first node in a group's resource chain to join the cluster acquires all the resources in the resource group only if it is the node with the highest priority for that group. If the first no...

Page 48: ...anager makes the following assumptions about the acquisition of resource groups: Cascading: The active node with the highest priority controls the resource group. Concurrent: All active nodes have access...
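
The cascading rule quoted in this excerpt (the active node with the highest priority controls the resource group) can be illustrated with a small shell sketch. This is not HACMP code: the function name, the node names, and the way priorities are passed in (a space-separated list, highest priority first) are assumptions made only for the example.

```shell
#!/bin/sh
# Illustrative sketch only, not part of HACMP.
# $1 = nodes of the resource chain in priority order (highest first)
# $2 = nodes that are currently active
# Prints the node that would control a cascading resource group.
cascading_owner() {
    for candidate in $1; do
        for active in $2; do
            if [ "$candidate" = "$active" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    return 1   # no node of the chain is active
}

# Hypothetical three-node chain in which the highest-priority node is down
cascading_owner "nodeA nodeB nodeC" "nodeB nodeC"
```

When nodeA rejoins, the same call with nodeA in the active list returns nodeA again, which mirrors the reacquisition behavior described for cascading groups.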

Page 49: ...nfiguration is that you can shift from a single-system environment to an HACMP cluster at a low cost by adding a less powerful processor. Of course, this assumes that you are willing to accept a lower l...

Page 50: ...tion. Figure 3. Mutual Takeover Configuration. In this configuration, there are two cascading resource groups: A and B. Resource group A consists of two disks, hdisk1 and hdisk3, and one volume group, sharedvg...

Page 51: ...look at it from the point of view of performance, this is the best thing to do, since you have one node doing the work of two when any one of the nodes is down. Third-Party Takeover Configuration. Figure...

Page 52: ...roup have no priorities assigned to them. If a 7135 RAIDiant Array Subsystem is used for storage, you can have a maximum of four nodes concurrently accessing a set of storage resources. If you are using...

Page 53: ...rk Topology. The following sections cover topics of network topology. Single Network: In a single-network setup, each node in the cluster is connected to only one network and has only one service adapter...

Page 54: ...o another. In normal cluster activity, however, each network is separate, both logically and physically. Keep in mind that a client, unless it is connected to more than one network, is susceptible to network...

Page 55: ...ibute. Network Name: The network name is a symbolic value that identifies a network in an HACMP for AIX environment. Cluster processes use this information to determine which adapters are connected to th...

Page 56: ...ters in an HACMP cluster have a label and a function (service, standby, or boot). The maximum number of network interfaces per node is 24. Adapter Label: A network adapter is identified by an adapter label. F...

Page 57: ...n and hardware slot constraints determine the actual number of standby adapters that a node can support. The standby adapter is configured on a different subnet from any service adapters on the same sy...
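
The subnet rule in this excerpt (a standby adapter sits on a different subnet from the service adapters of the same node) is easy to check before defining adapters. The following is a minimal sketch in plain shell arithmetic; the function names and the sample addresses are hypothetical, and it assumes dotted-decimal IPv4 addresses without leading zeros.

```shell
#!/bin/sh
# Illustrative sketch only; names and addresses are made up for the example.

# Convert a dotted-decimal IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Network part of address $1 under netmask $2.
network_of() {
    echo $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# Succeeds if addresses $1 and $2 are in the same subnet under netmask $3.
same_subnet() {
    [ "$(network_of "$1" "$3")" -eq "$(network_of "$2" "$3")" ]
}

# Hypothetical service and standby addresses of one node
if same_subnet 192.168.10.1 192.168.11.1 255.255.255.0; then
    echo "standby shares the service subnet: reconfigure"
else
    echo "standby is on its own subnet"
fi
```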

Page 58: ...de A reclaims the address and reintegrates it into the cluster. Reintegration, however, fails if Node A has not been configured to boot using its boot address. The boot address does not use a separate phy...

Page 59: ...provide a highly available environment for mission-critical applications. These applications must remain available at all times in many organizations. For example, an HACMP cluster could run a database s...

Page 60: ...uster could be. Under normal conditions, the load is serviced by a cluster node that was designed for this application's needs. In case of a failover, another node has to handle its own work plus the appl...

Page 61: ...ce with other Applications. In case of a failover, a node might have to handle several applications concurrently. This means the applications' data or resources must not conflict with each other. Again, the...

Page 62: ...s. The HACMP for AIX, Version 4.3: Installation Guide, SC23-4278, describes how to configure event processing for a cluster. You cannot define additional cluster events. You can, however, define multiple pre- a...

Page 63: ...sue, you can insert a recovery command with a retry count high enough to be sure to cover for the problem. 2.6.2 Error Notification. The AIX Error Notification facility detects errors that are logged to...

Page 64: ...HACMP, there is a SMIT screen to make it easier to set up an error notification object. This is much easier than the traditional AIX way of adding a template file to the ODM class. Under smit hacmp > RAS S...

Page 65: ...ly, you can always customize any cluster event to enable a Notify Command whenever this event is triggered, through the SMIT screen for customizing events. 2.6.2.3 Application Failure. Even application fa...

Page 66: ...ing problems caused by mismatches in the user or group IDs. System administrators typically keep user accounts synchronized across cluster nodes by copying the key system account and security files to...

Page 67: ...ferent approaches to that. You could either put them on a shared volume and handle them within a resource group, or you could use NFS mounts. 2.7.3.1 Home Directories on Shared Volumes. Within an HACMP cl...

Page 68: ...urce where they are physically residing, they have to be NFS exported from the resource group and imported on all the other nodes, in case any application is running there needing access to the users' fi...

Page 69: ...d a computer system, physical disk devices are usually the most susceptible to failure. Because of this, disk mirroring is a frequently used technique for increasing system availability. File system mirro...

Page 70: ...y. If the dump device is mirrored, you may not be able to capture the dump image from a crash, or the dump image may be corrupted. The design of LVM prevents mirrored writes of the dump device. Only one of...

Page 71: ...so mirror those logical volumes in addition to hd6. If hd5 consists of more than one logical partition, then, after mirroring hd5, you must verify that the mirrored copy of hd5 resides on contiguous physi...

Page 72: ...command. This is so that the Quorum OFF functionality takes effect. syncvg -v rootvg; bosboot -a -d /dev/hdisk?; bootlist -m normal hdisk0 hdisk1. Even though this command identifies the list of possible boot d...

Page 73: ...Prerequisite LPPs. The Prerequisites for the HACMP component HAView 4.2 are: xlC.rte 3.1.3.0; nv6000.base.obj 4.1.0.0. AIX Version / APARs needed: 4.1: IX56564, IX61184, IX60521; 4.2: IX62417, IX68483, IX70884, IX7...

Page 74: ...n poor interactive performance from some applications when another application on the system is doing heavy input/output. Under certain conditions, I/O can take several seconds to complete. While the hea...

Page 75: ...ry from system to system, an initial high-water mark of 33 and a low-water mark of 24 provides a good starting point. These settings only slightly reduce write times and consistently generate correct fa...
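
As a hedged illustration of the starting values quoted in this excerpt, the sketch below builds the AIX command that would set them. It assumes that the high-water and low-water marks correspond to the maxpout and minpout attributes of the sys0 device, which is how AIX exposes disk I/O pacing; the function name is hypothetical, and the command is only printed, not executed, so it can be reviewed first.

```shell
#!/bin/sh
# Illustrative sketch only: prints, but does not run, the chdev command.
# Assumption: I/O pacing is controlled by the sys0 attributes
# maxpout (high-water mark) and minpout (low-water mark).
io_pacing_cmd() {
    high=$1
    low=$2
    if [ "$low" -ge "$high" ]; then
        echo "low-water mark must be below the high-water mark" >&2
        return 1
    fi
    echo "chdev -l sys0 -a maxpout=$high -a minpout=$low"
}

# Starting point suggested in the text: high 33, low 24
io_pacing_cmd 33 24
```

Printing the command first is a deliberate choice here: the values should be tuned per system, and the same setting has to be repeated on every cluster node.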

Page 76: ...abled, users that are known only in the NIS-managed version of the /etc/passwd file will not be able to create crontabs. This is because cron is started with the /etc/inittab file with run level 2 (for exa...

Page 77: ...gr daemon, however, does not depend on /.rhosts file entries. The /.rhosts file is not required on SP systems running the HACMP Enhanced Security. This feature removes the requirement of TCP/IP access control...

Page 78: ...f your clusters, you have to check whether your network cabling allows you to put two cluster nodes away from each other, or even in different buildings. There's one additional point with cabling that sh...

Page 79: ...g Networks to a Hub. 3.2.1.2 IP Addresses and Subnets. The design of the HACMP for AIX software specifies that: All client traffic be carried over the service adapter; Standby adapters be hidden from clie...

Page 80: ...oice of a transmission route also facilitates identifying an adapter failure. See Chapter 2.4.3, "IP Address Takeover" on page 34 for more detailed information. 3.2.1.3 Testing. After setting up all adapter...

Page 81: ...kind. Therefore, when we are talking about HACMP network definitions, a serial network could also be a target mode SCSI or target mode SSA network. The following describes some cabling issues on each typ...

Page 82: ...parent device cannot be changed as long as there are child devices present and active, you have to set all the disks on that bus to Defined with the rmdev -l hdiskx command before you can enable that f...

Page 83: ...p tmssa. The Target Mode SCSI or SSA serial network can now be configured into an HACMP cluster. 3.2.2.5 Testing RS232 and Target Mode Networks. Testing of the serial networks functionality is similar. Ba...

Page 84: ...second command. 3.3 Cluster Disk Setup. The following sections relate important information about cluster disk setup. 3.3.1 SSA. The following sections describe cabling, AIX configuration, microcode loading...

Page 85: ...ot time, the configuration manager of AIX configures all the device drivers needed to have the SSA disks available for usage. The configuration manager can't do this configuration if the SSA Subsystem i...

Page 86: ...ault, one pdisk is always configured for each physical disk drive. One hdisk is configured for each disk drive that is connected to the using system, or for each array. By default, all disk drives are conf...

Page 87: ...SSA Service Aids. This will give you the following options: Set Service Mode: This option enables you to determine the location of a specific SSA disk drive within a loop and to remove the drive from the...

Page 88: ...e for your SSA disk subsystem. The latest information and downloadable files can be found under http://www.hursley.ibm.com/~ssa. Upgrade Instructions: Follow these steps to perform an upgrade: 1. Login as roo...

Page 89: ...s in other systems, please repeat this procedure on all systems as soon as possible. 17. In order to install the disk microcode, run ssadload -u from each system in turn. You must ensure that: You do not att...

Page 90: ...SSA Enhanced Raid adapters, but with the Logical Volume Manager (LVM), RAID0 and RAID1 can be configured on non-RAID disks. In order to create a RAID5 on SSA Disks, use the command smitty ssaraid. This will...

Page 91: ...on protects you against any failure (SCSI adapter, cables, or RAID controller) on either SCSI bus. Because of cable length restrictions, a maximum of two 7135s on a shared SCSI bus are supported by HACMP. 3...

Page 92: ...en an SCSI-2 Differential Y-Cable and a Differential SCSI Cable going to the 7135 unit, as shown in Figure 10. Figure 10 shows four RS/6000s, each represented by two SCSI-2 Differential Controllers, conne...

Page 93: ...N 67G1262; OR FC 2914 or 9214 (14m), PN 67G1263; OR FC 2918 or 9218 (18m), PN 67G1264. 16-Bit Terminator (T): Included in FC 2426 (Y-Cable), PN 61G8324. Figure 11 shows four RS/6000s, each represented by two SCSI-2 Di...

Page 94: [Figure: SCSI cabling diagram. Recoverable labels: terminators (T), 16-bit cable feature codes 2416, 2424, 2426. Maximum total cable length 25m.]

Page 95: ...of shared disks, there should be no termination anywhere on the bus except at the extremities. Therefore, you should remove the termination resistor blocks from the SCSI-2 Differential Controller and the...

Page 96: ...t/Wide Adapter/A are shown in Figure 12 and Figure 13, respectively. Figure 12. Termination on the SCSI-2 Differential Controller. Figure 13. Termination on the SCSI-2 Differential Fast/Wide Adapters. 4 2 P...

Page 97: ...e list presented to you. 3. Enter the new ID (any integer from 0 to 7) for this adapter in the Adapter card SCSI ID field. Since the device with the highest SCSI ID on a bus gets control of the bus, set the...
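
The constraints stated in this excerpt (an ID is an integer from 0 to 7) together with the requirement that adapters on a shared bus have unique SCSI addresses can be checked with a short shell sketch before changing any adapter. The function name and the sample IDs are hypothetical; on a real system the IDs would be read from the adapter attributes.

```shell
#!/bin/sh
# Illustrative sketch only: validates a planned set of SCSI IDs for one
# shared bus. Arguments are the IDs of all adapters on that bus.
check_scsi_ids() {
    seen=""
    for id in "$@"; do
        case $id in
            [0-7]) ;;
            *) echo "ID $id is outside the range 0-7" >&2; return 1 ;;
        esac
        case " $seen " in
            *" $id "*) echo "ID $id is used more than once" >&2; return 1 ;;
        esac
        seen="$seen $id"
    done
    return 0
}

# Hypothetical plan: two nodes sharing one bus with IDs 7 and 6
check_scsi_ids 7 6 && echo "SCSI IDs are valid and unique"
```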

Page 98: ...an ascsi device. Also, as shown below, you need to change the external SCSI ID only. Change/Show Characteristics of a SCSI Adapter. Type or select values in entry fields. Press Enter AFTER making all desir...

Page 99: ...e nodes in an HACMP cluster requires that you perform steps on all nodes in the cluster. In general, you define the components on one node (referred to in the text as the source node) and then import the...

Page 100: ...que within the cluster. Activate volume group AUTOMATICALLY at system restart: Set to no so that the volume group can be activated as appropriate by the cluster event scripts. ACTIVATE volume group after...

Page 101: ...a non-concurrent access volume group. A concurrent access volume group can be activated (varied on) in either non-concurrent mode or concurrent access mode. To define logical volumes on a concurrent acce...

Page 102: ...cal volume it creates. Examples of logical volume names are /dev/lv00 and /dev/lv01. Within an HACMP cluster, the name of any shared logical volume must be unique. Also Options / Description: VOLUME GROUP name...

Page 103: ...system in the volume group and make sure that it has the new jfslog name. Check the dev attribute for the logical volume that you renamed and make sure that it has the new logical volume name. Adding Co...

Page 104: ...be AIX mirrored; the disk array provides its own data redundancy. The copies should reside on separate disks that are controlled by different disk adapters and are located in separate drawers or units...

Page 105: ...e the volume group so that it is not activated automatically at system restart. Use the smit chvg fastpath to change the characteristics of a volume group. Table 18. smit crjfs Options. Options / Descriptio...

Page 106: ...o physical partitions. The varyonvg command reads information from this area. VGSA: Maintains the status of all physical volumes and physical partitions in the volume group. It stores information regardin...

Page 107: ...Quorum has nothing to do with the availability of mirrored data. It is possible to have failures that result in loss of all copies of a logical volume, yet the volume group remains varied on because a...

Page 108: ...ailability, quorum provides very little actual protection in non-concurrent access configurations. In fact, enabling quorum may mask failures by allowing a volume group to varyon with missing resources. A...
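
To make the point of these excerpts concrete (quorum counts descriptor areas, not copies of data), here is a minimal sketch of the majority test. It rests on two assumptions that are not spelled out in the visible text: that quorum means a strict majority of VGDAs is reachable, and that AIX LVM places 2 VGDAs on the disk of a one-disk volume group, 2 plus 1 on a two-disk volume group, and 1 per disk for three or more disks. The function name is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only.
# Assumption: quorum holds when more than half of all VGDAs are reachable.
# $1 = number of reachable VGDAs, $2 = total number of VGDAs
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Two-disk mirrored volume group: 2 VGDAs on the first disk, 1 on the second.
# Losing the second disk leaves 2 of 3 VGDAs reachable.
if has_quorum 2 3; then echo "second disk lost: quorum kept"; fi
# Losing the first disk leaves 1 of 3, although a full mirror copy survives.
if ! has_quorum 1 3; then echo "first disk lost: quorum lost"; fi
```

The second case shows why the text says quorum has nothing to do with the availability of mirrored data: the surviving disk still holds a complete copy, yet the volume group would not keep quorum.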

Page 109: ...are on a graphics-capable terminal. 3.4.6.2 Starting the TaskGuide. You can start the TaskGuide from the command line by typing /usr/sbin/cluster/tguides/bin/cl_ccvg, or you can use the SMIT interface as...

Page 111: ...for example) and the required free space in /usr must be confirmed. For parts of the product, like HAView, there are prerequisites for other lpps (nv6000 in this case) that have to be ensured. You can instal...

Page 112: ...emos: HACMP Client Demos; cluster.adt.client.samples.demos: HACMP Client Demos Samples; cluster.adt.client.samples.clinfo: HACMP Client clinfo Samples; cluster.adt.client.samples.clstat: HACMP Client clstat...

Page 113: ...er.msg.en_US.haview: This fileset contains the US English messages for the HAView component. cluster.msg.en_US.haview: HACMP HAView Messages. cluster.taskguides: This is the fileset that contains the taskg...

Page 114: ...he prerequisites are met. For details, look into Chapter 8 of the HACMP for AIX, Version 4.3: Installation Guide, SC23-4278. Archive any localized script and configuration files to prevent losing them durin...

Page 115: ...79. 2. Shut down the first node (gracefully with takeover) using the smit clstop fastpath. For this example, shut down Node A. Node B will take over Node A's resources and make them available to clients. See...

Page 116: ...cluster. 8. Repeat Steps 2 through 7 on Node B (on remaining cluster nodes, one at a time). 9. When the last node has been upgraded to both AIX 4.3.2 and HACMP 4.3, the cluster install/upgrade process is com...

Page 117: ...running an earlier version of HACMP for AIX without de-installing the server, the results are unpredictable. To determine if there is a mismatch between the HACMP client and server software installed on...

Page 118: ...lity automatically updates the HACMP ODM object classes to the 4.3 version. 6. Reboot Node A. 7. Start the HACMP for AIX software on Node A using the smit clstart fastpath and verify that Node A successfu...

Page 119: ...and the cluster name is a text string of up to 31 alphanumeric characters, including underscores. It doesn't necessarily need to match the hostname. The HACMP software uses this information to create th...

Page 120: ...ic characters, underscores, and hyphens (up to 31 characters). If IP address takeover is defined for that adapter, a boot adapter address label has to be defined for it. Use a consistent naming convention fo...
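
The naming limits quoted in the two excerpts above can be checked before typing names into SMIT. The sketch below is only an illustration: the function name is hypothetical, and it assumes that the second rule (which also permits hyphens) applies to adapter labels, since the beginning of that sentence is cut off in the excerpt.

```shell
#!/bin/sh
# Illustrative sketch only.
# $1 = name to check
# $2 = "cluster" (letters, digits, underscores)
#      or "adapter" (additionally hyphens; assumed from the truncated text)
valid_name() {
    name=$1
    [ -n "$name" ] || return 1
    [ "${#name}" -le 31 ] || return 1          # at most 31 characters
    case $2 in
        adapter) case $name in *[!A-Za-z0-9_-]*) return 1 ;; esac ;;
        *)       case $name in *[!A-Za-z0-9_]*)  return 1 ;; esac ;;
    esac
    return 0
}

# Hypothetical names
valid_name "cluster_1" cluster && echo "cluster_1 is acceptable"
valid_name "nodea-svc" adapter && echo "nodea-svc is acceptable"
```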

Page 121: ...is service, standby, or boot. Press Tab to toggle the values. A node has a single service adapter for each public or private network. A serial network has only a single service adapter. A node can have none...

Page 122: ...defining a service adapter, and the adapter has a boot address, and you want to use hardware address swapping. See the chapter on planning TCP/IP networks in the HACMP for AIX, Version 4.3: Planning Guide...

Page 123: ...detection rate. Each network module maintains a connection to other network modules in the cluster. The Cluster Managers on cluster nodes send messages to each other through these connections. Each netw...

Page 124: ...er Topology screen, just like the NIM tuning options. 4.2.5 Synchronizing the Cluster Definition Across Nodes. Synchronization of the cluster topology ensures that the ODM data on all cluster nodes is in...

Page 125: ...it is possible to get it and configure it on a non-SP RS/6000 node. This is not very common, though, so you will almost always see HACMP Enhanced Security used on the SP system. When you synchronize the c...

Page 126: ...ship with a set of nodes. Depending on this relationship, resources can be defined as one of three types: cascading, concurrent access, or rotating. See 2.4.1, "Resource Group Options" on page 28 for details. A...

Page 127: ...rs are not supported by HACMP for AIX 4.3. File Systems: Identify the file systems to include in this resource group. Press F4 to see a list of the file systems. When you enter a file system in this field...

Page 128: ...hopefully meaningful name in order to enable the cluster manager to identify the application server uniquely, as well. File Systems Consistency Check: Identify the method for checking consistency of file...

Page 129: ...4.4 Initial Testing. After installing and configuring your cluster, it is recommended that you do some initial testing in order to verify that the cluster is acting as it should. 4.4.1 Clverify. Running /u...

Page 130: ...mation daemon: true. Reissue either the ps command (see above) or look for the interface state with the netstat -i command. Now you should see that the boot interface is gone in favor of the service interfa...

Page 131: ...lar cluster configuration (a process called applying a snapshot), provided the cluster is configured with the requisite hardware and software to support the configuration. You can perform many of the clus...

Page 132: ...ices are inactive on all cluster nodes, applying the snapshot changes the ODM data stored in the system default configuration directory (DCD). If cluster services are active on the local node, applying a s...

Page 135: ...e provides an event customization facility that allows you to tailor event processing to your site. This facility can be used to include the following types of customization: Adding, changing, and removin...

Page 136: ...ress to be released because a standby adapter on the local node is masquerading as the service address of the remote node. Reconfigures the local standby adapter to its original address (and hardware ad...

Page 137: ...original IP address (and hardware address, if necessary). release_vg_fs: Releases volume groups and file systems that are part of a resource group the local node is serving. release_service_addr: If configu...

Page 138: ...lost contact with a network. It is assumed in this case that a network-related failure has occurred rather than a node-related failure. The network_down event mails a notification to the system administ...

Page 139: ...sole message indicating that a standby adapter has failed or is no longer available. join_standby: This event occurs if a standby adapter becomes available. The join_standby event displays a console mess...

Page 140: ...1.3 Event Notification. You can specify a command or user-defined script that provides notification (for example, mail) that an event is about to happen and that an event has just occurred, along with the...

Page 141: ...ents. However, the name of these scripts, their location in the file system, and their permission bits have to be identical. 5.1.6 Event Emulator. To test the effect of running an event on your cluster, HACM...

Page 142: ...defined to the Error Notification facility, however, an executable that shuts down the node with the failed adapter could be run, allowing the surviving node to take over the disk. 5.3 Network Modules. To...

Page 143: ...4284. 5.4 NFS considerations. For NFS to work correctly in an HACMP cluster environment, you have to take care of some special NFS characteristics. The HACMP scripts have only minimal NFS support. You may...

Page 144: ...ity that uses the exportfs command with the -i flag and specifies the file system names stored in the HACMP ODM object class. Therefore, export options specified in the /etc/exports file are ignored. Howev...

Page 145: ...exported afs locally mounted afs nfs exported. Ensure that the shared volume groups have the same major number on the server nodes. This allows the clients to re-establish the NFS mount transparently af...

Page 146: ...an application that issues lock requests using the flock() system call. Node A fails. Node B then attempts to unmount the NFS-mounted file system, mount it as a local file system, and export it for client u...

Page 147: ...fs in FILELIST do Is the filesystem mounted s says only return status x says exact match we use awk instead of cut because mount outputs lots of leading blanks that confuse cut etc mount awk print 2...

Page 148: ...time to die Only wait if at least one filesystem is mounted if MOUNTED true then sleep SLEEP fi FILELIST for i in do bin echo i done bin sort r for fs in FILELIST do Is the filesystem mounted s says o...

Page 149: ...the command errpt | more or errpt -a | more. Check that all devices are in the available state (lsdev -C | more). Check that the SCSI addresses of adapters on shared buses are unique (lsattr -E -l ascsi0). If you are...

Page 150: ...1.3 Process State. Check the paging space usage by issuing lsps -a. Look for all expected processes with ps -ef | more. Check that the run queue is < 5 and that the CPU usage is at an acceptable level (vmstat...

Page 151: ...auto-varyon are correctly defined and that the shared VGs are in the correct state (lsvg and lsvg -o). Check that there are no stale partitions (lsvg -l). Check that all appropriate file systems have been m...
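
The stale-partition check mentioned in this excerpt can be scripted so that it is repeatable during testing. The sketch below is an assumption-laden example rather than a documented procedure: it assumes the usual lsvg -l layout (a volume group line, a header line, then one line per logical volume with the LV STATE in the sixth column, for example open/syncd or open/stale), and both the function name and the sample data are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only.
# Reads `lsvg -l <vgname>` style output on standard input and prints the
# names of logical volumes whose LV STATE column contains "stale".
# Assumption: two heading lines, LV STATE in column 6.
stale_lvs() {
    awk 'NR > 2 && $6 ~ /stale/ { print $1 }'
}

# Hypothetical sample instead of a live `lsvg -l sharedvg | stale_lvs`
stale_lvs <<'EOF'
sharedvg:
LV NAME   TYPE    LPs  PPs  PVs  LV STATE     MOUNT POINT
lv00      jfs     10   20   2    open/syncd   /data
lv01      jfs     5    10   2    open/stale   /logs
loglv00   jfslog  1    2    2    open/syncd   N/A
EOF
```

An empty result means no logical volume is reported as stale; any printed name is a candidate for syncvg.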

Страница 152: ...mands Note that the tmp hacmp out file is the most useful to monitor especially if the Debug Level of the HACMP Run Time Parameters for the nodes has been set to high and if the Application Server Scr...

Page 153: ...wap adapter has occurred. Reconnect the network cable to the service interface. This will cause the original service interface to become the standby interface. Initiate a swap adapter back to the origina...

Page 154: ...appuid for application processes, and Eprimary for Eprimary. Start HACMP on NodeF (smit clstart). NodeT will release NodeF's cascading Resource Groups and NodeF will take them back over, but NodeT or a lowe...

Page 155: ...NodeT. Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U appuid for application processes). Power cycle NodeF. If HACMP is not conf...

Page 156: ...ote that you should record the values for sb_max and thewall prior to modifying them, and as an extra check, you may want to add the original values to the end of /etc/rc.net. The TCP/IP subsystem failure...

Page 157: ...disk1, if, for example, hdisk1 is the mirror of hdisk0 (bootlist -m normal -o). Optional: Prune the error log on NodeF (errclear 0). Monitor cluster logfiles on NodeT if HACMP has been customized to monitor SCSI...

Page 158: ...ant RAIDiant Disk Array Manager > List all SCSI RAID Arrays. Verify that all sharedvg file systems and paging spaces are accessible (df and lsps -a). If using RAID5 with Hot Spare, verify that reconstruction...

Page 159: ...isk back in, then sync the volume group (syncvg -v NodeFvg). Verify that all NodeFvg file systems and paging spaces are accessible (df -k and lsps -a) and that the partitions are not stale (lsvg -l NodeFvg). 6.2.5 A...

Page 161: ...sages written to the system console may scroll off screen before you notice them. The following paragraphs provide an overview of the log files which are to be consulted for cluster troubleshooting, as...

Page 162: ...ted messages generated by HACMP for AIX clstrmgr activity. Information in this file is used by IBM Support personnel when the clstrmgr is in debug mode. Note that this file is overwritten every time clu...

Page 163: ...running for more than 360 seconds can still be working on something and eventually get the job done. Therefore, it is essential to look at the /tmp/hacmp.out file to find out what is actually happening. 7...

Page 164: ...high-water mark, it must wait until enough I/O operations have finished to make the low-water mark. See the AIX Performance Monitoring and Tuning Guide, SC23-2365, for more information on I/O pacing. 7.3.2 Ex...

Page 165: ...te with the other. Let's consider a two-node cluster where all networks have failed between the two nodes, but each node remains up and running. The problem with a partitioned cluster is that each node i...

Page 166: ...nd the start of IP address takeover scripts. As the disks are being acquired by the takeover node (or after the disks have been acquired and applications are running), the missing node completes its proce...

Page 167: ...ind a solution to a problem in the cluster, some sort of strategy is helpful for pinpointing the problem. The following guidelines should make the troubleshooting process more productive: Save the log fi...

Page 168: ...If you do, and one of the changes corrects the problem, you have no way of knowing which change actually fixed the problem. Make one change, test the change, and then, if necessary, make the next change. Do n...

Page 169: ...ter (clstat) utility, which reports the status of key cluster components: the cluster itself, the nodes in the cluster, and the network adapters connected to the nodes. The HAView utility, which monitors HACM...

Page 170: ...he client. The clstat utility reports whether the cluster is up, down, or unstable. It also reports whether a node is up, down, joining, leaving, or reconfiguring, and the number of nodes in the cluster. For ea...

Page 171: ...the time and date when they occurred. 8.1.3.2 /tmp/hacmp.out. The /tmp/hacmp.out file records the output generated by the configuration and startup scripts as they execute. This information supplements an...
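When monitoring /tmp/hacmp.out, it can help to list events that have started but not yet completed. The sketch below is illustrative only: the function name `unfinished_events` is hypothetical, and it assumes that event boundaries are logged as lines containing "EVENT START: name args" and "EVENT COMPLETED: name args" with identical text after the marker. If your HACMP level appends extra fields to the completion line, the matching would need to be adjusted.

```shell
#!/bin/sh
# unfinished_events
# Hypothetical helper: reads hacmp.out style text on stdin and prints each
# event that has an "EVENT START:" line but no matching "EVENT COMPLETED:"
# line. The assumed line format is described in the accompanying text.
unfinished_events() {
    awk '
        /EVENT START:/     { sub(/^.*EVENT START: */, "");     pending[$0] = 1 }
        /EVENT COMPLETED:/ { sub(/^.*EVENT COMPLETED: */, ""); delete pending[$0] }
        END { for (e in pending) print e }
    ' | LC_ALL=C sort
}
```

A possible use is `unfinished_events < /tmp/hacmp.out`, which shows at a glance which event script a node is still working on.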

Page 172: ...ns timestamped messages in ASCII format. These track the execution of internal activities of the grpsvcs daemon. IBM support personnel use this information for troubleshooting. The file gets trimmed regu...

Page 173: ...s the status of the nodes and their interfaces, and invokes the appropriate scripts in response to node or network events. All cluster nodes must run the clstrmgr daemon. 8.2.1.2 Cluster SMUX Peer daemon...

Page 174: ...s required for cluster operation. All HACMP/ES cluster nodes must run the grpsvcsd daemon. 8.2.1.8 Cluster Globalized Server Daemon (grpglsmd). This daemon operates as a grpsvcs client; its function...

Page 175: ...CP/IP interfaces and to set the required network options. 8.2.3 Stopping Cluster Services on a Node. You stop cluster services on a node by executing the HACMP /usr/sbin/cluster/etc/clstop script. Use the...

Page 176: ...r. You have the following options: Graceful: In a graceful stop, the HACMP software shuts down its applications and releases its resources. The other nodes do not take over the resources of the stopped nod...

Page 177: ...no hostnames or addresses. HACMP server addresses must be provided by the user at installation time. This file should contain all boot and service names or addresses of HACMP servers from any cluster ac...

Page 178: ...particular processor or architecture, ensure that the new node is the same type of system. Uniprocessor applications may run slower on SMP systems. Slot capacity of the new node must be the same or bette...

Page 179: ...nt maintenance. No command line intervention should be necessary to replace a failed disk in a RAID array. Do the following steps in order to replace a disk that is a member of a RAID array: 1. Remove the...

Page 180: ...MP 4.3 enhancements to the C-SPOC LVM utilities, the disk replacement does not cause system downtime, as long as the failed disk was part of a RAID array or if all the LVs on it are mirrored to other d...

Page 181: ...hared volume group and the information stored in the ODM are equal. After changes in the volume group (e.g., increasing the size of a file system), the information about the volume group in ODM and in the V...

Page 182: ...lly owning the shared volume group. 8.4.2 Lazy Update. For LVM components under the control of HACMP for AIX, you do not have to explicitly export and import to bring the other cluster nodes up to date. I...

Page 183: ...Starting and Stopping HACMP on a Node or a Client on page 154. Without C-SPOC functionality, the system administrator must spend time executing administrative tasks individually on each cluster node. Us...

Page 184: ...oncurrent mode and with HACMP 4.3. Remove a logical volume. Shared file systems (only applicable for non-concurrent VGs): List all shared file systems; Change/View the characteristics of a shared file syste...

Page 185: ...fore you start the TaskGuide, make sure that: You have a configured HACMP cluster in place. You are on a graphics-capable terminal. 8.4.4.2 Starting the TaskGuide. You can start the TaskGuide from the comm...

Page 186: ...iguration of cluster resources in the ODM on one node, you must synchronize the change across all cluster nodes. 8.5.2 Synchronize Cluster Resources. You perform a synchronization by choosing the Synchro...

Page 187: ...ldare command. The command lets you move the ownership of a series of resource groups to a specific node in that resource group's node list, as long as the requested arrangement is not incompatible with...

Page 188: ...lted at the time the sticky location fails to find the highest-priority node active. After finding the active node, cascading resource groups will continually migrate to the highest-priority node in the...

Page 189: ...the placement of migrated resource groups: default and stop. The default and stop locations are special locations that determine resource group behavior and whether the resources can be reacquired. Defau...

Page 190: ...des. Resource migration first releases all specified resources (wherever they reside in the cluster); then it reacquires these resources on the newly specified nodes. You can also use this command to swap...

Page 191: ...ode, the DARE Resource Migration utility includes a command, clfindres, that makes a best-guess estimate (within the domain of current HACMP configuration policies) of the state and location of specified r...

Page 192: ...below, you might even be able to keep your mission-critical application up and running during the update process, provided that the takeover node is designed to carry its own load and the takeover load...

Page 193: ...er Node. Along with the normal rules for applying updates, the following general points should be observed for HACMP clusters: Cluster nodes should be kept at the same AIX maintenance levels wherever pos...

Page 194: ...using the AIX cron facility. While this is a very good procedure, the HACMP cluster environment presents some special challenges. The problem is, you never know which machine has your application data onl...
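One way to cope with not knowing which node currently holds the data is to let a cron job on every node check first whether the shared volume group is varied on locally. The sketch below is an illustration under assumptions, not the guide's procedure: the function name `run_if_vg_online` is hypothetical, and it expects `lsvg -o` style output (one active volume group name per line) on standard input.

```shell
#!/bin/sh
# run_if_vg_online VG COMMAND [ARGS...]
# Hypothetical helper: reads `lsvg -o` style output on stdin. If VG is
# listed as an exact line match, COMMAND is run; otherwise a short notice
# is printed and nothing else happens.
run_if_vg_online() {
    vg=$1
    shift
    if grep -F -x -q -- "$vg"; then
        "$@"
    else
        echo "skipping: $vg is not varied on here"
    fi
}
```

A crontab entry on each node could then call something like `lsvg -o | run_if_vg_online sharedvg /usr/local/bin/backup_sharedvg`, where the volume group name and the script path are placeholders.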

Page 195: ...d you are back to a mirrored mode of operation with fully updated data. The splitlvcopy command of AIX does much of the work required to implement this solution. We can summarize the steps to do a split...

Page 196: ...ta is backed up, i.e., the data this cluster node cares about during normal operations, or, in case of another node's failure and a subsequent takeover of this node's resources, backing up both of the clust...

Page 197: ...on one of the cluster nodes, the cl_lsuser command outputs a warning message but continues execution of the command on other cluster nodes. 8.8.2 Adding User Accounts on all Cluster Nodes. Adding a user...

Page 198: ...file and the files in the /etc/security directory. To change the attributes of a user account on one or more cluster nodes, you can either use the AIX chuser command in rsh to one cluster node after the...

Page 199: ...commands. The restrictions on NIS are just the same as for users, and therefore are not explained here in detail. For more detailed information, please refer to Chapter 12 of the HACMP for AIX Version 4...

Page 201: ...an SP system. Also, the failure of the control workstation could cause the switch network to fail. HACWS covers the following cases with a fully functional environment: Continues running your SP system a...

Page 202: ...CWS Environment. 9.1.2 Software Requirements. Both of the control workstations must have the same software installed; that is, they must be on the same AIX level, use the same PSSP software level, and have...

Page 203: ...10 of the HACMP for AIX Version 4.3 Installation Guide, SC23-4278, should be performed. For HACWS control workstations, the ssp.hacws fileset has to be installed as well. 9.1.5 HACWS Configuration. Since t...

Page 204: ...ncluded in a resource group. Recommended settings for this resource group are: Resource Group Name: hacws_group1; Node Relationship: rotating; Participating Node Names: nodename of primary cws, nodename of ba...

Page 205: ...r services startup with the following command: grep "SPCW_APPS COMPLETE" /tmp/hacmp.out. Now you can cause a failover by stopping cluster services on the primary cws and see whether cws services are still...

Page 206: ...erberos server database. When a client needs the services of a server, the client must prove its identity to the server so that the server knows to whom it is talking. Tickets are the means the Kerberos...

Page 207: ...ros principals so that remote kerberized commands will work. On an SP, the setup_authent command does the SP-related Kerberos setup, which is based on the IP labels found in the SDR. Since the SDR does no...

Page 208: ...hysically connected to one node to be transparently accessed by other nodes. Importantly, VSD supports only raw logical volumes, not file systems. The VSD facility is included in the ssp.csd.vsd fileset o...

Page 209: ...recommend disabling VSD cache because its management becomes counterproductive. 2. From lv_X, in which case the VSD device driver exploits Node X's normal LVM and Disk Device Driver (Disk DD) pathway to f...

Page 210: ...ned in the SDR and managed by either SP SMIT panels or the VSD Perspective. VSDs can be in one of five states, as shown in Figure 18 on page 192. Figure 18: VSD State Transitions. This figure shows the pos...

Page 211: ...and provide transparent failover of VSDs among the nodes. RVSD is a separately priced IBM LPP. Figure 19: RVSD Function. With reference to Figure 19 above, Nodes X, Y, and Z form a group of nodes using VSD. R...

Page 212: ...ecovery. Communication adapter failures are treated the same as node failures. The hc daemon is also called the Connection Manager. It supports the development of recoverable applications. The hc daemon m...

Page 213: ...l the others continue working without even noticing that something has happened on the switch network. 9.4.1 Switch Basics Within HACMP. Although it has already been mentioned in other places, the follow...

Page 214: ...ge. As the SP switch has its availability concept built in, there is no need to do it outside the PSSP software, so HACMP doesn't have to take care of it any more. 9.4.3 Switch Failures. As mentioned befor...

Page 215: ...In case this node was the Eprimary node on the switch network, and it is an SP switch, then the RS/6000 SP software would have chosen a new Eprimary independently from the HACMP softw...

Page 217: ...logy for heartbeating is called HACMP Extended Scalability (HACMP/ES; see below for details). Basically, these two versions differ only in the way the cluster manager keeps track of the status of nodes, ada...

Page 218: ...ftware stack. Packaging these services with HACMP/ES makes it possible to run this software on all RS/6000s, not just on SP nodes. RSCT Services include the following components: Event Manager: A distribut...

Page 219: ...rk File System for AIX. The HANFS for AIX software provides a reliable NFS server capability by allowing a backup processor to recover current NFS activity should the primary NFS server fail. The HANFS...

Page 220: ...t on an RS/6000 SP, you need to have PSSP Version 3.1 installed. As the HPS Switch is no longer supported with PSSP Version 3.1, you need to upgrade to the SP Switch, in case you haven't already, or you wi...

Page 221: ...you can define custom events. These events can act on anything that haemd can detect, which is virtually anything measurable on an AIX system. How to customize events is explained in great detail in the...

Page 223: ...hardware and software products and levels. IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to...

Page 224: ...them as completely as possible, the examples contain the names of individuals, companies, brands, and products. All of these names are fictitious, and any similarity to the names and addresses used by an a...

Page 225: ...ion under license. Pentium, MMX, ProShare, LANDesk, and ActionMedia are trademarks or registered trademarks of Intel Corporation in the U.S. and other countries. Network File System and NFS are trademarks of...

Page 227: ...SP, SG24-5145. Monitoring and Managing IBM SSA Disk Subsystems, SG24-5251. AIX Version 4.3 Migration Guide, SG24-5116. B.2 Redbooks on CD-ROMs. Redbooks are also available on CD-ROMs. Order a subscription an...

Page 228: ...1877. AIX Performance Monitoring and Tuning Guide, SC23-2365. AIX HACMP for AIX Version 4.3 Concepts and Facilities, SC23-4276. AIX HACMP for AIX Version 4.3 Planning Guide, SC23-4277. AIX HACMP for AIX Vers...

Page 233: ...trol; DMS: Deadman Switch; DNS: Domain Name Service; DSMIT: Distributed System Management Interface Tool; FDDI: Fiber Distributed Data Interface; F/W: Fast and Wide (SCSI); GB: Gigabyte; GODM: Global Object Data Mana...

Page 234: ...SC: Reduced Instruction Set Computer; SCSI: Small Computer Systems Interface; SLIP: Serial Line Interface Protocol; SMIT: System Management Interface Tool; SMP: Symmetric Multi-Processor; SMUX: SNMP (see below) Mu...

Page 240: ...Printed in the U.S.A. SG24-5131-00 IBM Certification Study Guide AIX HACMP SG24-5131-00...