Cluster Hardware and Software Preparation 

For more information regarding adapters and cabling rules, see 2.3.1, “SSA 
Disks” on page 16, or the following documents:

  • 7133 SSA Disk Subsystems: Service Guide, SY33-0185-02

  • 7133 SSA Disk Subsystem: Operator Guide, GA33-3259-01

  • 7133 Models 010 and 020 SSA Disk Subsystems: Installation Guide, GA33-3260-02

  • 7133 Models 500 and 600 SSA Disk Subsystems: Installation Guide, GA33-3263-02

  • 7133 SSA Disk Subsystems for Open Attachment: Service Guide, SY33-0191-00

  • 7133 SSA Disk Subsystems for Open Attachment: Installation and User's Guide, SA33-3273-00

3.3.1.2  AIX Configuration
During boot time, the AIX configuration manager configures all the device 
drivers needed to make the SSA disks available for use. The configuration 
manager cannot do this if the SSA subsystem is not properly connected or if 
the SSA software is not installed. If the SSA software is not already 
installed, the configuration manager reports the missing filesets. You can 
either install the missing filesets with smit, or call the configuration 
manager with the -i flag.

The configuration manager configures the following devices:

  • SSA Adapter Router

  • SSA Adapter

  • SSA Disks
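To see which of these devices the configuration manager has created, the device listing can be filtered for SSA entries. The following is a minimal sketch, not an official procedure: ssa_devices is a hypothetical helper name, it assumes that `lsdev -C` prints the device name and state as the first two columns, and it assumes that SSA devices carry "SSA" in their description. The sample input is illustrative only and was not captured from a real system.

```shell
#!/bin/sh
# Sketch: list SSA-related devices and their states from `lsdev -C` style input.
# ssa_devices is a hypothetical helper, not part of AIX.

ssa_devices() {
    # Print "name state" for the ssar device and for any line mentioning SSA.
    awk '$1 == "ssar" || /SSA/ { print $1, $2 }'
}

# On a live AIX system one would run:  lsdev -C | ssa_devices
# Illustrative sample input (made up for this example):
ssa_devices <<'EOF'
hdisk0 Available 00-00-0S-0,0 1.0 GB SCSI Disk Drive
ssar   Defined                SSA Adapter Router
ssa0   Available 00-07        SSA Enhanced Adapter
pdisk0 Available 00-07-P      SSA Physical Disk Drive
hdisk1 Available 00-07-L      SSA Logical Disk Drive
EOF
```

Reading from standard input keeps the filter usable both on a live system and against saved listings from other cluster nodes.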

Adapter Router

The Adapter Router (ssar) is only a conceptual configuration aid and is 
always in a “Defined” state. It cannot be made “Available.” You can list the 
ssar with the following command:

# lsdev -C | grep ssar
ssar    Defined    SSA Adapter Router
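Because “Defined” is the normal state for ssar, a scripted check should treat that state as success rather than as an error. The sketch below illustrates the idea under stated assumptions: check_ssar is a hypothetical helper name, and it assumes the device name and state are the first two columns of the `lsdev -C` output, as in the listing above.

```shell
#!/bin/sh
# Sketch: confirm that ssar is in its expected "Defined" state.
# check_ssar is a hypothetical helper; it reads `lsdev -C` style output on stdin.

check_ssar() {
    # Take the state column of the first ssar line, if there is one.
    state=$(awk '$1 == "ssar" { print $2; exit }')
    case "$state" in
        Defined) echo "ssar: Defined (expected)"; return 0 ;;
        "")      echo "ssar: not listed"; return 1 ;;
        *)       echo "ssar: unexpected state $state"; return 2 ;;
    esac
}

# Live usage on AIX would be:  lsdev -C | check_ssar
# Demonstration with the sample line from the text:
echo "ssar Defined SSA Adapter Router" | check_ssar
```

The distinct return codes let a calling script tell a missing ssar entry apart from one in an unexpected state.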

From: IBM Certification Study Guide AIX HACMP, SG24-5131-00, International Technical Support Organization, May 1999. Authors: David Thiessen, Achim Rehor, Reinhard Zettler. http://www.redbooks.ibm.com

Page 4: ...r 5765 D28 for use with the AIX Operating System Version 4 3 2 and later Comments may be addressed to IBM Corporation International Technical Support Organization Dept JN9B Building 003 Internal Zip 2...

Page 5: ...4 2 3 Cluster Disks 16 2 3 1 SSA Disks 16 2 3 2 SCSI Disks 26 2 4 Resource Planning 28 2 4 1 Resource Group Options 28 2 4 2 Shared LVM Components 30 2 4 3 IP Address Takeover 34 2 4 4 NFS Exports and...

Page 6: ...m 88 3 4 6 Alternate Method TaskGuide 90 Chapter 4 HACMP Installation and Cluster Definition 93 4 1 Installing HACMP 93 4 1 1 First Time Installs 93 4 1 2 Upgrading From a Previous Version 96 4 2 Defi...

Page 7: ...ters 132 6 1 3 Process State 132 6 1 4 Network State 132 6 1 5 LVM State 133 6 1 6 Cluster State 133 6 2 Simulate Errors 134 6 2 1 Adapter Failure 134 6 2 2 Node Failure Reintegration 137 6 2 3 Networ...

Page 8: ...ze Cluster Resources 168 8 5 3 DARE Resource Migration Utility 169 8 6 Applying Software Maintenance to an HACMP Cluster 174 8 7 Backup Strategies 176 8 7 1 Split Mirror Backups 176 8 7 2 Using Events...

Page 9: ...luster Technology RSCT 200 10 2 2 Enhanced Cluster Security 201 10 3 High Availability for Network File System for AIX 201 10 4 Similarities and Differences 202 10 5 Decision Criteria 202 Appendix A S...

Page 10: ...viii IBM Certification Study Guide AIX HACMP...

Page 11: ...necting Networks to a Hub 61 10 7135 110 RAIDiant Arrays Connected on Two Shared 8 Bit SCSI Buses 74 11 7135 110 RAIDiant Arrays Connected on Two Shared 16 Bit SCSI Buses77 12 Termination on the SCSI...

Page 12: ...x IBM Certification Study Guide AIX HACMP...

Page 13: ...7133 Models 010 020 500 600 D40 T40 Specifications 18 8 SSA Disks 19 9 SSA Adapters 19 10 The Advantages and Disadvantages of the Different RAID Levels 24 11 Necessary APAR Fixes 55 12 AIX Prerequisi...

Page 14: ...xii IBM Certification Study Guide AIX HACMP...

Page 15: ...icable to the job role and is a meaningful and appropriate assessment of skill Subject Matter Experts who successfully perform the job participate throughout the entire development process These job i...

Page 16: ...certification content this publication can also be used as a desk side reference So whether you are planning to take the AIX HACMP certification exam or just want to validate your HACMP skills this bo...

Page 17: ...ontributions to this project Marcus Brewer International Technical Support Organization Austin Center Rebecca Gonzalez IBM AIX Certification Project Manager Austin Milos Radosavljevic International Te...

Page 18: ...xvi IBM Certification Study Guide AIX HACMP...

Page 19: ...Available Clusters Certification Requirement two Tests To attain the IBM Certified Specialist AIX HACMP certification candidates must first obtain the AIX System Administration or the AIX System Suppo...

Page 20: ...ions Assist customers in identifying HA applications Evaluate the Customer Environment and Tailorable Components Evaluate the configuration and identify Single Points of Failure SPOF Define and analyz...

Page 21: ...led shared disk tests Troubleshoot a failed application Troubleshoot failed Pre Post event scripts Troubleshoot failed error notifications Troubleshoot errors reported by cluster verification Section...

Page 22: ...eb site http www ibm com certify Table 1 AIX Version 4 HACMP Installation and Implementation Course Number Q1054 USA AU54 Worldwide Course Duration Five days Course Abstract This course provides a det...

Page 23: ...this course include Integrating the cluster with existing network services DNS NIS etc Monitoring tools for the cluster including HAView for Netview Maintaining user IDs and passwords across the clust...

Page 24: ...6 IBM Certification Study Guide AIX HACMP...

Page 25: ...or AIX and HACMP ES We realize that the rapid pace of change in products will almost certainly render any snapshot of the options out of date by the time it is published This is true of almost all tec...

Page 26: ...ture of normal RS 6000 machines and RS 6000 SP nodes is possible 2 1 2 Cluster Node Considerations It is important to understand that selecting the system components for a cluster requires careful con...

Page 27: ...he requirements of highly available applications not only in terms of CPU cycles but also of memory and possibly disk space Approximately 50 MB of disk storage is required for full installation of the...

Page 28: ...ature is available for token ring FDDI or ATM you must use an I O slot to provide token ring adapter redundancy Table 4 Number of Adapter Slots in Each Model 1 The switch adapter is onboard and does n...

Page 29: ...Version 4 3 product does not use non TCP IP networks for node to node communications in triggering synchronizing and executing event reactions This can be an issue if you are configuring a cluster wit...

Page 30: ...m availability in that the more communication paths that connect clustered nodes and clients the greater the degree of network availability 2 2 1 2 Special Network Considerations Each type of interfac...

Page 31: ...umference of 100 kilometers ATM is a point to point connection network It currently supports the OC3 and the OC12 standard which is 155 Mbps or 625 Mbps You cannot use hardware address swapping with A...

Page 32: ...the SP Switch For IP Address Takeover IPAT in general there are two adapters per cluster node and network recommended in order to eliminate single points of failure The only exception to this rule is...

Page 33: ...requires two serial ports per node Table 3 shows a list of possible cluster nodes and the number of native serial ports for each Table 5 Number of Available Serial Ports in Each Model 1 serial port ca...

Page 34: ...not use more than 4 target mode SCSI networks in a cluster Target mode SSA If you are using shared SSA devices target mode SSA is the third possibility for a serial network within HACMP In order to us...

Page 35: ...ode can be either an initiator or a target An initiator issues commands while a target responds with data and status information The SSA nodes in the adapter are therefore initiators while the SSA nod...

Page 36: ...Multi Storage Tower Specifications Table 7 7133 Models 010 020 500 600 D40 T40 Specifications Item Specification Transfer rate SSA interface 80 MB Configuration 2 to 5 disk drives 2 2 GB 4 5 GB or 9...

Page 37: ...k Yes and hot swappable redundant power and cooling Name Capacities GB Buffer size KB Maximum Transfer rate MBps Starfire 1100 1 1 0 20 Starfire 2200 2 2 0 20 Starfire 4320 4 5 512 20 Scorpion 4500 4...

Page 38: ...tors A1 and A2 or Connectors B1 and B2 Only one of the two pairs of connectors on an adapter card can be connected in a single SSA loop A maximum of 48 devices can be connected in a single SSA loop A...

Page 39: ...mum of two adapters can be connected in a particular loop if one or more of the disk drives in the loop are array disk drives that are not configured for fast write operations The adapters can be two...

Page 40: ...ttern skew is eliminated due to the distribution of the data This means that with data distributed evenly across a number of disks random accesses will most likely find the required information spread...

Page 41: ...ve that the required data is actually on This means that simultaneous as well as independent reads are possible Write requests however require a read modify update cycle that creates a bottleneck at t...

Page 42: ...the 7133 Disk Subsystem The only RAID level supported by the 7133 SSA disk subsystem is RAID 5 RAID 0 and RAID 1 can be achieved with the striping and mirroring facility of the Logical Volume Manager...

Page 43: ...unning the same applications The advantages of SSA are summarized as follows Dual paths to devices Simplified cabling cheaper smaller cables and connectors no separate terminators Faster interconnect...

Page 44: ...I 2 Differential 8 bit or 16 bit bus 2 3 2 1 Capacities Disks There are four disk sizes available for the 7135 RAIDiant Array Models 110 and 210 1 3 GB 2 0 GB 2 2 GB only supported by Dual Active Soft...

Page 45: ...volume groups different parts of the subsystem can be logically attached to different systems at any one time Redundant Power Supply Redundant power supplies provide alternative sources of power If on...

Page 46: ...g nodes HACMP considers the following as resource types Volume Groups Disks File Systems File Systems to be NFS mounted File Systems to be NFS exported Service IP addresses Applications The following...

Page 47: ...to false the first node in a group s resource chain to join the cluster acquires all the resources in the resource group only if it is the node with the highest priority for that group If the first no...

Page 48: ...anager makes the following assumptions about the acquisition of resource groups Cascading The active node with the highest priority controls the resource group Concurrent All active nodes have access...

Page 49: ...nfiguration is that you can shift from a single system environment to an HACMP cluster at a low cost by adding a less powerful processor Of course this assumes that you are willing to accept a lower l...

Page 50: ...tion Figure 3 Mutual Takeover Configuration In this configuration there are two cascading resource groups A and B Resource group A consists of two disks hdisk1 and hdisk3 and one volume group sharedvg...

Page 51: ...look at it from the point of view of performance this is the best thing to do since you have one node doing the work of two when any one of the nodes is down Third Party Takeover Configuration Figure...

Page 52: ...roup have no priorities assigned to them If a 7135 RAIDiant Array Subsystem is used for storage you can have a maximum of four nodes concurrently accessing a set of storage resources If you are using...

Page 53: ...rk Topology The following sections cover topics of network topology Single Network In a single network setup each node in the cluster is connected to only one network and has only one service adapter...

Page 54: ...o another In normal cluster activity however each network is separate both logically and physically Keep in mind that a client unless it is connected to more than one network is susceptible to network...

Page 55: ...ibute Network Name The network name is a symbolic value that identifies a network in an HACMP for AIX environment Cluster processes use this information to determine which adapters are connected to th...

Page 56: ...ters in an HACMP cluster have a label and a function service standby or boot The maximum number of network interfaces per node is 24 Adapter Label A network adapter is identified by an adapter label F...

Page 57: ...n and hardware slot constraints determine the actual number of standby adapters that a node can support The standby adapter is configured on a different subnet from any service adapters on the same sy...

Page 58: ...de A reclaims the address and reintegrates it into the cluster Reintegration however fails if Node A has not been configured to boot using its boot address The boot address does not use a separate phy...

Page 59: ...provide a highly available environment for mission critical applications These applications must remain available at all times in many organizations For example an HACMP cluster could run a database s...

Page 60: ...uster could be Under normal conditions the load is serviced by a cluster node that was designed for this application s needs In case of a failover another node has to handle its own work plus the appl...

Page 61: ...ce with other Applications In case of a failover a node might have to handle several applications concurrently This means the applications data or resources must not conflict with each other Again the...

Page 62: ...s The HACMP for AIX Version 4 3 Installation Guide SC23 4278 describes how to configure event processing for a cluster You cannot define additional cluster events You can however define multiple pre a...

Page 63: ...sue you can insert a recovery command with a retry count high enough to be sure to cover for the problem 2 6 2 Error Notification The AIX Error Notification facility detects errors that are logged to...

Page 64: ...HACMP there is a SMIT screen to make it easier to set up an error notification object This is much easier than the traditional AIX way of adding a template file to the ODM class Under smit hacmp RAS S...

Page 65: ...ly you can always customize any cluster event to enable a Notify Command whenever this event is triggered through the SMIT screen for customizing events 2 6 2 3 Application Failure Even application fa...

Page 66: ...ing problems caused by mismatches in the user or group IDs System administrators typically keep user accounts synchronized across cluster nodes by copying the key system account and security files to...

Page 67: ...ferent approaches to that You could either put them on a shared volume and handle them within a resource group or you could use NFS mounts 2 7 3 1 Home Directories on Shared Volumes Within an HACMP cl...

Page 68: ...urce where they are physically residing they have to be NFS exported from the resource group and imported on all the other nodes in case any application is running there needing access to the users fi...

Page 69: ...d a computer system physical disk devices are usually the most susceptible to failure Because of this disk mirroring is a frequently used technique for increasing system availability File system mirro...

Page 70: ...y If the dump device is mirrored you may not be able to capture the dump image from a crash or the dump image may be corrupted The design of LVM prevents mirrored writes of the dump device Only one of...

Page 71: ...so mirror those logical volumes in addition to hd6 If hd5 consists of more than one logical partition then after mirroring hd5 you must verify that the mirrored copy of hd5 resides on contiguous physi...

Page 72: ...command This is so that the Quorum OFF functionality takes effect syncvg v rootvg bosboot a d dev hdisk bootlist m normal hdisk0 hdisk1 Even though this command identifies the list of possible boot d...

Page 73: ...Prerequisite LPPs The Prerequisites for the HACMP component HAView 4 2 are xlC rte 3 1 3 0 nv6000 base obj 4 1 0 0 AIX Version APARs needed 4 1 IX56564 IX61184 IX60521 4 2 IX62417 IX68483 IX70884 IX7...

Page 74: ...n poor interactive performance from some applications when another application on the system is doing heavy input output Under certain conditions I O can take several seconds to complete While the hea...

Page 75: ...ry from system to system an initial high water mark of 33 and a low water mark of 24 provides a good starting point These settings only slightly reduce write times and consistently generate correct fa...

Page 76: ...abled users that are known only in the NIS managed version of the etc passwd file will not be able to create crontabs This is because cron is started with the etc inittab file with run level 2 for exa...

Page 77: ...gr daemon however does not depend on rhosts file entries The rhosts file is not required on SP systems running the HACMP Enhanced Security This feature removes the requirement of TCP IP access control...

Page 78: ...f your clusters you have to check whether your network cabling allows you to put two cluster nodes away from each other or even in different buildings There s one additional point with cabling that sh...

Page 79: ...g Networks to a Hub 3 2 1 2 IP Addresses and Subnets The design of the HACMP for AIX software specifies that All client traffic be carried over the service adapter Standby adapters be hidden from clie...

Page 80: ...oice of a transmission route also facilitates identifying an adapter failure See Chapter 2 4 3 IP Address Takeover on page 34 for more detailed information 3 2 1 3 Testing After setting up all adapter...

Page 81: ...kind Therefore when we are talking about HACMP network definitions a serial network could also be a target mode SCSI or target mode SSA network The following describes some cabling issues on each typ...

Page 82: ...parent device cannot be changed as long as there are child devices present and active you have to set all the disks on that bus to Defined with the rmdev l hdiskx command before you can enable that f...

Page 83: ...p tmssa The Target Mode SCSI or SSA serial network can now be configured into an HACMP cluster 3 2 2 5 Testing RS232 and Target Mode Networks Testing of the serial networks functionality is similar Ba...

Page 84: ...second command 3 3 Cluster Disk Setup The following sections relate important information about cluster disk setup 3 3 1 SSA The following sections describe cabling AIX configuration microcode loading...

Page 85: ...ot time the configuration manager of AIX configures all the device drivers needed to have the SSA disks available for usage The configuration manager can t do this configuration if the SSA Subsystem i...

Page 86: ...ault one pdisk is always configured for each physical disk drive One hdisk is configured for each disk drive that is connected to the using system or for each array By default all disk drives are conf...

Page 87: ...SSA Service Aids This will give you the following options Set Service Mode This option enables you to determine the location of a specific SSA disk drive within a loop and to remove the drive from the...

Page 88: ...e for your SSA disk subsystem The latest information and downloadable files can be found under http www hursley ibm com ssa Upgrade Instructions Follow these steps to perform an upgrade 1 Login as roo...

Page 89: ...s in other systems please repeat this procedure on all systems as soon as possible 17 In order to install the disk microcode run ssadload u from each system in turn You must ensure that You do not att...

Page 90: ...SSA Enhanced Raid adapters but with the Logical Volume Manager LVM RAID0 and RAID1 can be configured on non RAID disks In order to create a RAID5 on SSA Disks use the command smitty ssaraid This will...

Page 91: ...on protects you against any failure SCSI adapter cables or RAID controller on either SCSI bus Because of cable length restrictions a maximum of two 7135s on a shared SCSI bus are supported by HACMP 3...

Page 92: ...en an SCSI 2 Differential Y Cable and a Differential SCSI Cable going to the 7135 unit as shown in Figure 10 Figure 10 shows four RS 6000s each represented by two SCSI 2 Differential Controllers conne...

Page 93: ...N 67G1262 OR FC 2914 or 9214 14m PN 67G1263 OR FC 2918 or 9218 18m PN 67G1264 16 Bit Terminator T Included in FC 2426 Y Cable PN 61G8324 Figure 11 shows four RS 6000s each represented by two SCSI 2 Di...

Page 94: ...76 IBM Certification Study Guide AIX HACMP T T T T 6 bit 6 16 bit 2416 16 2424 6 bit 6 16 bit 2426 2416 16 b 2416 16 bit 2426 Maximum total cable length 25m...

Page 95: ...of shared disks there should be no termination anywhere on the bus except at the extremities Therefore you should remove the termination resistor blocks from the SCSI 2 Differential Controller and the...

Page 96: ...t Wide Adapter A are shown in Figure 12 and Figure 13 respectively Figure 12 Termination on the SCSI 2 Differential Controller Figure 13 Termination on the SCSI 2 Differential Fast Wide Adapters 4 2 P...

Page 97: ...e list presented to you 3 Enter the new ID any integer from 0 to 7 for this adapter in the Adapter card SCSI ID field Since the device with the highest SCSI ID on a bus gets control of the bus set the...

Page 98: ...an ascsi device Also as shown below you need to change the external SCSI ID only Change Show Characteristics of a SCSI Adapter Type or select values in entry fields Press Enter AFTER making all desir...

Page 99: ...e nodes in an HACMP cluster requires that you perform steps on all nodes in the cluster In general you define the components on one node referred to in the text as the source node and then import the...

Page 100: ...que within the cluster Activate volume group AUTOMATICALLY at system restart Set to no so that the volume group can be activated as appropriate by the cluster event scripts ACTIVATE volume group after...

Page 101: ...a non concurrent access volume group A concurrent access volume group can be activated varied on in either non concurrent mode or concurrent access mode To define logical volumes on a concurrent acce...

Page 102: ...cal volume it creates Examples of logical volume names are dev lv00 and dev lv01 Within an HACMP cluster the name of any shared logical volume must be unique Also Options Description VOLUME GROUP name...

Page 103: ...system in the volume group and make sure that it has the new jfslog name Check the dev attribute for the logical volume that you renamed and make sure that it has the new logical volume name Adding Co...

Page 104: ...be AIX mirrored the disk array provides its own data redundancy The copies should reside on separate disks that are controlled by different disk adapters and are located in separate drawers or units...

Page 105: ...e the volume group so that it is not activated automatically at system restart Use the smit chvg fastpath to change the characteristics of a volume group Table 18 smit crjfs Options Options Descriptio...

Page 106: ...o physical partitions The varyonvg command reads information from this area VGSA Maintains the status of all physical volumes and physical partitions in the volume group It stores information regardin...

Page 107: ...Quorum has nothing to do with the availability of mirrored data It is possible to have failures that result in loss of all copies of a logical volume yet the volume group remains varied on because a...

Page 108: ...ailability quorum provides very little actual protection in non concurrent access configurations In fact enabling quorum may mask failures by allowing a volume group to varyon with missing resources A...

Page 109: ...are on a graphics capable terminal 3 4 6 2 Starting the TaskGuide You can start the TaskGuide from the command line by typing usr sbin cluster tguides bin cl_ccvg or you can use the SMIT interface as...

Page 110: ...92 IBM Certification Study Guide AIX HACMP...

Page 111: ...for example and the required free space in usr must be confirmed For parts of the product like HAView there are prerequisites for other lpps nv6000 in this case that have to be ensured You can instal...

Page 112: ...emos HACMP Client Demos cluster adt client samples demos HACMP Client Demos Samples cluster adt client samples clinfo HACMP Client clinfo Samples cluster adt client samples clstat HACMP Client clstat...

Page 113: ...er msg en_US haview This fileset contains the US English messages for the HAView component cluster msg en_US haview HACMP HAView Messages cluster taskguides This is the fileset that contains the taskg...

Page 114: ...he prerequisites are met For details look into Chapter 8 of the HACMP for AIX Version 4 3 Installation Guide SC23 4278 Archive any localized script and configuration files to prevent losing them durin...

Page 115: ...79 2 Shut down the first node gracefully with takeover using the smit clstop fastpath For this example shut down Node A Node B will take over Node A s resources and make them available to clients See...

Page 116: ...cluster 8 Repeat Steps 2 through 7 on Node B on remaining cluster nodes one at a time 9 When the last node has been upgraded to both AIX 4 3 2 and HACMP 4 3 the cluster install upgrade process is com...

Page 117: ...running an earlier version of HACMP for AIX without de installing the server the results are unpredictable To determine if there is a mismatch between the HACMP client and server software installed on...

Page 118: ...lity automatically updates the HACMP ODM object classes to the 4 3 version 6 Reboot Node A 7 Start the HACMP for AIX software on Node A using the smit clstart fastpath and verify that Node A successfu...

Page 119: ...and the cluster name is a text string of up to 31 alphanumeric characters including underscores It doesn t necessarily need to match the hostname The HACMP software uses this information to create th...

Page 120: ...ic characters underscores and hyphens up to 31 characters If IP address takeover is defined for that adapter a boot adapter address label has to be defined for it Use a consistent naming convention fo...

Page 121: ...is service standby or boot Press Tab to toggle the values A node has a single service adapter for each public or private network A serial network has only a single service adapter A node can have none...

Page 122: ...defining a service adapter and the adapter has a boot address and you want to use hardware address swapping See the chapter on planning TCP IP networks in the HACMP for AIX Version 4 3 Planning Guide...

Page 123: ...detection rate Each network module maintains a connection to other network modules in the cluster The Cluster Managers on cluster nodes send messages to each other through these connections Each netw...

Page 124: ...er Topology screen just like the NIM tuning options 4 2 5 Synchronizing the Cluster Definition Across Nodes Synchronization of the cluster topology ensures that the ODM data on all cluster nodes is in...

Page 125: ...it is possible to get it and configure it on a non SP RS 6000 node This is not very common though so you will almost always see HACMP Enhanced Security used on the SP system When you synchronize the c...

Page 126: ...ship with a set of nodes Depending on this relationship resources can be defined as one of three types cascading concurrent access or rotating See 2 4 1 Resource Group Options on page 28 for details A...

Page 127: ...rs are not supported by HACMP for AIX 4 3 File Systems Identify the file systems to include in this resource group Press F4 to see a list of the file systems When you enter a file system in this field...

Page 128: ...hopefully meaningful name in order to enable the cluster manager to identify the application server uniquely as well File Systems Consistency Check Identify the method for checking consistency of file...

Page 129: ...4 4 Initial Testing After installing and configuring your cluster it is recommended that you do some initial testing in order to verify that the cluster is acting as it should 4 4 1 Clverify Running u...

Page 130: ...mation daemon true Reissue either the ps command see above or look for the interface state with the netstat i command Now you should see that the boot interface is gone in favor of the service interfa...

Page 131: ...lar cluster configuration a process called applying a snapshot provided the cluster is configured with the requisite hardware and software to support the configuration You can perform many of the clus...

Page 132: ...ices are inactive on all cluster nodes applying the snapshot changes the ODM data stored in the system default configuration directory DCD If cluster services are active on the local node applying a s...

Page 133: ...HACMP Installation and Cluster Definition 115...

Page 134: ...116 IBM Certification Study Guide AIX HACMP...

Page 135: ...e provides an event customization facility that allows you to tailor event processing to your site This facility can be used to include the following types of customization Adding changing and removin...

Page 136: ...ress to be released because a standby adapter on the local node is masquerading as the service address of the remote node Reconfigures the local standby adapter to its original address and hardware ad...

Page 137: ...original IP address and hardware address if necessary release_vg_fs Releases volume groups and file systems that are part of a resource group the local node is serving release_service_addr If configu...

Page 138: ...lost contact with a network It is assumed in this case that a network related failure has occurred rather than a node related failure The network_down event mails a notification to the system administ...

Page 139: ...sole message indicating that a standby adapter has failed or is no longer available join_standby This event occurs if a standby adapter becomes available The join_standby event displays a console mess...

Page 140: ...1 3 Event Notification You can specify a command or user defined script that provides notification for example mail that an event is about to happen and that an event has just occurred along with the...

Page 141: ...ents However the name of these scripts their location in the file system and their permission bits have to be identical 5 1 6 Event Emulator To test the effect of running an event on your cluster HACM...

Page 142: ...defined to the Error Notification facility however an executable that shuts down the node with the failed adapter could be run allowing the surviving node to take over the disk 5 3 Network Modules To...

Page 143: ...4284 5 4 NFS considerations For NFS to work correctly in an HACMP cluster environment you have to take care of some special NFS characteristics The HACMP scripts have only minimal NFS support You may...

Page 144: ...ity that uses the exportfs command with the i flag and specifies the file system names stored in the HACMP ODM object class Therefore export options specified in the etc exports file are ignored Howev...

Page 145: ...exported afs locally mounted afs nfs exported Ensure that the shared volume groups have the same major number on the server nodes This allows the clients to re establish the NFS mount transparently af...

Page 146: ...an application that issues lock requests using the flock system call Node A fails Node B then attempts to unmount the NFS mounted file system mount it as a local file system and export it for client u...

Page 147: ...fs in FILELIST do Is the filesystem mounted s says only return status x says exact match we use awk instead of cut because mount outputs lots of leading blanks that confuse cut etc mount awk print 2...

Page 148: ...time to die Only wait if at least one filesystem is mounted if MOUNTED true then sleep SLEEP fi FILELIST for i in do bin echo i done bin sort r for fs in FILELIST do Is the filesystem mounted s says o...

Page 149: ...the command errpt | more or errpt -a | more. Check that all devices are in the available state: lsdev -C | more. Check that the SCSI addresses of adapters on shared buses are unique: lsattr -E -l ascsi0. If you are...

Page 150: ...1.3 Process State Check the paging space usage by issuing lsps -a. Look for all expected processes with ps -ef | more. Check that the run queue is < 5 and that the CPU usage is at an acceptable level: vmstat...

Page 151: ...auto varyon are correctly defined and that the shared VGs are in the correct state: lsvg and lsvg -o. Check that there are no stale partitions: lsvg -l. Check that all appropriate file systems have been m...

Page 152: ...mands. Note that the /tmp/hacmp.out file is the most useful to monitor, especially if the Debug Level of the HACMP Run Time Parameters for the nodes has been set to high and if the Application Server Scr...

Page 153: ...wap adapter has occurred Reconnect the network cable to the service interface This will cause the original service interface to become the standby interface Initiate a swap adapter back to the origina...

Page 154: ...appuid for application processes and Eprimary for Eprimary. Start HACMP on NodeF: smit clstart. NodeT will release NodeF's cascading Resource Groups, and NodeF will take them back over, but NodeT or a lowe...

Page 155: ...NodeT. Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U appuid for application processes). Power cycle NodeF. If HACMP is not conf...

Page 156: ...ote that you should record the values for sb_max and thewall prior to modifying them, and as an extra check you may want to add the original values to the end of /etc/rc.net. The TCP/IP subsystem failure...

Page 157: ...disk1 if, for example, hdisk1 is the mirror of hdisk0; bootlist -m normal -o. (Optional) Prune the error log on NodeF: errclear 0. Monitor cluster logfiles on NodeT if HACMP has been customized to monitor SCSI...

Page 158: ...ant RAIDiant Disk Array Manager List all SCSI RAID Arrays. Verify that all sharedvg file systems and paging spaces are accessible (df and lsps -a). If using RAID5 with Hot Spare, verify that reconstruction...

Page 159: ...isk back in, then sync the volume group (syncvg NodeFvg). Verify that all NodeFvg file systems and paging spaces are accessible (df -k and lsps -a) and that the partitions are not stale (lsvg -l NodeFvg). 6.2.5 A...

Page 161: ...sages written to the system console may scroll off screen before you notice them The following paragraphs provide an overview of the log files which are to be consulted for cluster troubleshooting as...

Page 162: ...ted messages generated by HACMP for AIX clstrmgr activity Information in this file is used by IBM Support personnel when the clstrmgr is in debug mode Note that this file is overwritten every time clu...

Page 163: ...running for more than 360 seconds can still be working on something and eventually get the job done. Therefore, it is essential to look at the /tmp/hacmp.out file to find out what is actually happening. 7...

Page 164: ...high water mark it must wait until enough I O operations have finished to make the low water mark See the AIX Performance Monitoring Tuning Guide SC23 2365 for more information on I O pacing 7 3 2 Ex...

Page 165: ...te with the other. Let's consider a two-node cluster where all networks have failed between the two nodes, but each node remains up and running. The problem with a partitioned cluster is that each node i...

Page 166: ...nd the start of IP address takeover scripts As the disks are being acquired by the takeover node or after the disks have been acquired and applications are running the missing node completes its proce...

Page 167: ...ind a solution to a problem in the cluster some sort of strategy is helpful for pinpointing the problem The following guidelines should make the troubleshooting process more productive Save the log fi...

Page 168: ...If you do and one of the changes corrects the problem you have no way of knowing which change actually fixed the problem Make one change test the change and then if necessary make the next change Do n...

Page 169: ...ter clstat utility which reports the status of key cluster components the cluster itself the nodes in the cluster and the network adapters connected to the nodes The HAView utility which monitors HACM...

Page 170: ...he client The clstat utility reports whether the cluster is up down or unstable It also reports whether a node is up down joining leaving or reconfiguring and the number of nodes in the cluster For ea...

Page 171: ...the time and date when they occurred. 8.1.3.2 /tmp/hacmp.out The /tmp/hacmp.out file records the output generated by the configuration and startup scripts as they execute. This information supplements an...

Page 172: ...ns timestamped messages in ASCII format These track the execution of internal activities of the grpsvcs daemon IBM support personnel use this information for troubleshooting The file gets trimmed regu...

Page 173: ...s the status of the nodes and their interfaces and invokes the appropriate scripts in response to node or network events All cluster nodes must run the clstrmgr daemon 8 2 1 2 Cluster SMUX Peer daemon...

Page 174: ...s required for cluster operation All HACMP ES cluster nodes must run the grpsvcsd daemon 8 2 1 8 Cluster Globalized Server Daemon daemon grpglsmd This daemon operates as a grpsvcs client its function...

Page 175: ...CP/IP interfaces and to set the required network options. 8.2.3 Stopping Cluster Services on a Node You stop cluster services on a node by executing the HACMP /usr/sbin/cluster/etc/clstop script. Use the...

Page 176: ...r You have the following options Graceful In a graceful stop the HACMP software shuts down its applications and releases its resources The other nodes do not take over the resources of the stopped nod...

Page 177: ...no hostnames or addresses HACMP server addresses must be provided by the user at installation time This file should contain all boot and service names or addresses of HACMP servers from any cluster ac...

Page 178: ...particular processor or architecture ensure that the new node is the same type of system Uniprocessor applications may run slower on SMP systems Slot capacity of the new node must be the same or bette...

Page 179: ...nt maintenance No command line intervention should be necessary to replace a failed disk in a RAID array Do the following steps in order to replace a disk that is a member of a RAID array 1 Remove the...

Page 180: ...MP 4 3 enhancements to the C SPOC LVM utilities the disk replacement does not cause system down time as long as the failed disk was part of a RAID array or if all the LVs on it are mirrored to other d...

Page 181: ...hared volume group and the information stored in the ODM are equal After changes in the volume group e g increasing the size of a file system the information about the volume group in ODM and in the V...

Page 182: ...lly owning the shared volume group 8 4 2 Lazy Update For LVM components under the control of HACMP for AIX you do not have to explicitly export and import to bring the other cluster nodes up to date I...

Page 183: ...Starting and Stopping HACMP on a Node or a Client on page 154 Without C SPOC functionality the system administrator must spend time executing administrative tasks individually on each cluster node Us...

Page 184: ...oncurrent mode and with HACMP 4 3 Remove a logical volume Shared file systems only applicable for non concurrent VGs List all shared file systems Change View the characteristics of a shared file syste...

Page 185: ...fore you start the TaskGuide make sure that You have a configured HACMP cluster in place You are on a graphics capable terminal 8 4 4 2 Starting the TaskGuide You can start the TaskGuide from the comm...

Page 186: ...iguration of cluster resources in the ODM on one node you must synchronize the change across all cluster nodes 8 5 2 Synchronize Cluster Resources You perform a synchronization by choosing the Synchro...

Page 187: ...ldare command The command lets you move the ownership of a series of resource groups to a specific node in that resource group s node list as long as the requested arrangement is not incompatible with...

Page 188: ...lted at the time the sticky location fails to find the highest priority node active After finding the active node cascading resource groups will continually migrate to the highest priority node in the...

Page 189: ...the placement of migrated resource groups default and stop The default and stop locations are special locations that determine resource group behavior and whether the resources can be reacquired Defau...

Page 190: ...des Resource migration first releases all specified resources wherever they reside in the cluster then it reacquires these resources on the newly specified nodes You can also use this command to swap...

Page 191: ...ode the DARE Resource Migration utility includes a command clfindres that makes a best guess estimate within the domain of current HACMP configuration policies of the state and location of specified r...

Page 192: ...below you might even be able to keep your mission critical application up and running during the update process provided that the takeover node is designed to carry its own load and the takeover load...

Page 193: ...er Node Along with the normal rules for applying updates the following general points should be observed for HACMP clusters Cluster nodes should be kept at the same AIX maintenance levels wherever pos...

Page 194: ...using the AIX cron facility While this is a very good procedure the HACMP cluster environment presents some special challenges The problem is you never know which machine has your application data onl...

Page 195: ...d you are back to a mirrored mode of operation with fully updated data The splitlvcopy command of AIX does much of the work required to implement this solution We can summarize the steps to do a split...

Page 196: ...ta is backed up i e the data this cluster node cares about during normal operations or in case of another s node failure and a subsequent takeover of this node s resources backing up both of the clust...

Page 197: ...on one of the cluster nodes the cl_lsuser command outputs a warning message but continues execution of the command on other cluster nodes 8 8 2 Adding User Accounts on all Cluster Nodes Adding a user...

Page 198: ...file and the files in the /etc/security directory. To change the attributes of a user account on one or more cluster nodes, you can either use the AIX chuser command in rsh to one cluster node after the...

Page 199: ...commands The restrictions on NIS are just the same as for users and therefore are not explained here in detail For more detailed information please refer to Chapter 12 of the HACMP for AIX Version 4...

Page 201: ...an SP system Also the failure of the control workstation could cause the switch network to fail HACWS covers the following cases with a fully functional environment Continues running your SP system a...

Page 202: ...CWS Environment 9 1 2 Software Requirements Both of the control workstations must have the same software installed that is they must be on the same AIX level use the same PSSP software level and have...

Page 203: ...10 of the HACMP for AIX, Version 4.3: Installation Guide, SC23-4278, should be performed. For HACWS control workstations, the ssp.hacws fileset has to be installed as well. 9.1.5 HACWS Configuration Since t...

Page 204: ...ncluded in a resource group Recommended settings for this resource group are Resource Group Name hacws_group1 Node Relationship rotating Participating Node Names nodename of primary cws nodename of ba...

Page 205: ...r services startup with the following command: grep "SPCW_APPS COMPLETE" /tmp/hacmp.out. Now you can cause a failover by stopping cluster services on the primary cws and see whether cws services are still...

Page 206: ...erberos server database When a client needs the services of a server the client must prove its identity to the server so that the server knows to whom it is talking Tickets are the means the Kerberos...

Page 207: ...ros principals so that remote kerberized commands will work On an SP the setup_authent command does the SP related kerberos setup which is based on the IP labels found in the SDR Since the SDR does no...

Page 208: ...hysically connected to one node to be transparently accessed by other nodes. Importantly, VSD supports only raw logical volumes, not file systems. The VSD facility is included in the ssp.csd.vsd fileset o...

Page 209: ...recommend disabling VSD cache because its management becomes counterproductive 2 From lv_X in which case the VSD device driver exploits Node X s normal LVM and Disk Device Driver Disk DD pathway to f...

Page 210: ...ned in the SDR and managed by either SP SMIT panels or the VSD Perspective VSDs can be in one of five states as shown in Figure 18 on page 192 Figure 18 VSD State Transitions This figure shows the pos...

Page 211: ...and provide transparent failover of VSDs among the nodes RVSD is a separately priced IBM LPP Figure 19 RVSD Function With reference to Figure 19 above Nodes X Y and Z form a group of nodes using VSD R...

Page 212: ...ecovery Communication adapter failures are treated the same as node failures The hc daemon is also called the Connection Manager It supports the development of recoverable applications The hc daemon m...

Page 213: ...l the others continue working without even noticing that something has happened on the switch network 9 4 1 Switch Basics Within HACMP Although it has already been mentioned in other places the follow...

Page 214: ...ge. As the SP switch has its availability concept built in, there is no need to do it outside the PSSP software, so HACMP doesn't have to take care of it any more. 9.4.3 Switch Failures As mentioned befor...

Page 215: ...6000 SP Topics 197 In case this node was the Eprimary node on the switch network and it is an SP switch then the RS 6000 SP software would have chosen a new Eprimary independently from the HACMP softw...

Page 217: ...logy for heartbeating is called HACMP Extended Scalability HACMP ES see below for details Basically these two versions differ only in the way the cluster manager keeps track of the status of nodes ada...

Page 218: ...ftware stack Packaging these services with HACMP ES makes it possible to run this software on all RS 6000s not just on SP nodes RSCT Services include the following components Event Manager A distribut...

Page 219: ...rk File System for AIX The HANFS for AIX software provides a reliable NFS server capability by allowing a backup processor to recover current NFS activity should the primary NFS server fail The HANFS...

Page 220: ...t on an RS 6000 SP you need to have PSSP Version 3 1 installed As the HPS Switch is no longer supported with PSSP Version 3 1 you need to upgrade to the SP Switch in case you haven t already or you wi...

Page 221: ...you can define custom events These events can act on anything that haemd can detect which is virtually anything measurable on an AIX system How to customize events is explained in great detail in the...

Page 223: ...hardware and software products and levels IBM may have patents or pending patent applications covering subject matter in this document The furnishing of this document does not give you any license to...

Page 224: ...them as completely as possible the examples contain the names of individuals companies brands and products All of these names are fictitious and any similarity to the names and addresses used by an a...

Page 225: ...ion under license Pentium MMX ProShare LANDesk and ActionMedia are trademarks or registered trademarks of Intel Corporation in the U S and other countries Network File System and NFS are trademarks of...

Page 227: ...SP SG24 5145 Monitoring and Managing IBM SSA Disk Subsystems SG24 5251 AIX Version 4 3 Migration Guide SG24 5116 B 2 Redbooks on CD ROMs Redbooks are also available on CD ROMs Order a subscription an...

Page 228: ...1877 AIX Performance Monitoring and Tuning Guide SC23 2365 AIX HACMP for AIX Version 4 3 Concepts and Facilities SC23 4276 AIX HACMP for AIX Version 4 3 Planning Guide SC23 4277 AIX HACMP for AIX Vers...

Page 229: ...ands TOOLCAT REDPRINT TOOLS SENDTO EHONE4 TOOLS2 REDPRINT GET SG24xxxx PACKAGE TOOLS SENDTO CANVM2 TOOLS REDPRINT GET SG24xxxx PACKAGE Canadian users only To get BookManager BOOKs of redbooks type the...

Page 230: ...edish IBM Publications Publications Customer Support P O Box 29570 Raleigh NC 27626 0570 USA IBM Publications 144 4th Avenue S W Calgary Alberta T2P 3N5 Canada IBM Direct Services Sortemosevej 21 DK 3...

Page 233: ...trol DMS Deadman Switch DNS Domain Name Service DSMIT Distributed System Management Interface Tool FDDI Fiber Distributed Data Interface F W Fast and Wide SCSI GB Gigabyte GODM Global Object Data Mana...

Page 234: ...SC Reduced Instruction Set Computer SCSI Small Computer Systems Interface SLIP Serial Line Interface Protocol SMIT System Management Interface Tool SMP Symmetric Multi Processor SMUX SNMP see below Mu...

Page 235: ...groups NFS crossmounting issues 126 changing user accounts 180 cl_lsuser command using 179 cl_mkuser command using 179 cldare command 172 clfindres 173 clinfo 156 cllockd 155 clsmuxpd 155 clstat 152 c...

Page 236: ...stopping on clients 159 HACWS 183 HANFS for AIX 201 Hardware Address Swapping 12 hardware address swapping 40 planning 40 HAView 151 heartbeats 11 home directories 49 Hot Standby Configuration 30 hot...

Page 237: ...un Time Parameters 110 RVSD 193 S SCSI target mode 38 SCSI Disks 26 service adapter 38 service ticket 188 Shared LVM Component Configuration 81 Shared LVs and Filesystems 84 Shared VGs 82 single point...

Page 238: ...Guide AIX HACMP Token Ring 13 Topology Service 200 topsvcsd 156 U Upgrading 96 user accounts adding 179 changing 180 creating 179 removing 180 User and Group IDs 48 V VGDA 88 VGSA 88 Virtual Shared D...

Page 240: ...Printed in the U S A SG24 5131 00 IBM Certification Study Guide AIX HACMP SG24 5131 00...
