POWER7 and POWER7+ Optimization and Tuning Guide

Although the PAM document is the official source for any SAP-related platform support, the 
SAP Community Network topic, Supported Platforms/PARS for SAP Business Objects 
(http://www.sdn.sap.com/irj/boc/articles?rid=/webcontent/uuid/e01d4e05-6ea5-2c10-1bb6-a8904ca76411), 
provides a good overview of SAP BusinessObjects releases and supported platforms.

Sizing for optimum performance

Adequate system sizing is essential for successfully running an SBOP BI solution. Therefore, 
SAP includes SBOP BI in its Quick Sizer tool. Based on the number of users who plan to use 
the SBOP BI applications, the Quick Sizer tool provides an SAP Application Performance 
Standard (SAPS) number and the amount of memory necessary to run the applications. IBM can 
then provide a system configuration that is based on these numbers. The Quick Sizer tool is 
available at http://service.sap.com/quicksizer (registration required).

Also, the SAP BusinessObjects BI 4 - Companion Guide is available on the SAP Quick Sizer 
landing page at http://service.sap.com/quicksizer (registration required). To download the 
guide, open the web page and click Sizing Guidelines > Solutions & Platforms > SAP 
BusinessObjects. This guide explains the sizing process in detail using a sizing example.
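
The step from a SAPS requirement to a rough core count can be sketched as simple arithmetic. The following Python fragment is only an illustration: the function name, the SAPS-per-core rating, and the 65% target utilization are hypothetical placeholders, not values from the Quick Sizer tool or from any published benchmark. The actual system configuration should come from IBM, based on the Quick Sizer output.

```python
import math


def cores_needed(required_saps, saps_per_core, target_utilization=0.65):
    """Estimate the whole number of cores needed for a SAPS requirement.

    All three inputs are assumptions supplied by the caller:
    required_saps      - SAPS number reported by a sizing tool
    saps_per_core      - SAPS that one core of the candidate system delivers
    target_utilization - fraction of capacity to plan for (headroom above it)
    """
    if required_saps < 0 or saps_per_core <= 0:
        raise ValueError("SAPS values must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Usable SAPS per core shrinks when headroom is reserved.
    usable_per_core = saps_per_core * target_utilization
    return math.ceil(required_saps / usable_per_core)


# Hypothetical example: 12000 SAPS required, 2500 SAPS per core (placeholder).
print(cores_needed(12000, 2500))  # prints 8
```

The memory figure that the Quick Sizer tool reports is a separate input and is not derived from the core count.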

Landscape design

For general considerations about how to design an SBOP BI landscape, see the Master Guide 
(registration required), available at:

http://service.sap.com/~sapidb/011000358700001237052010E/xi4_master_en.pdf

Six reference architecture landscapes are available. Each one demonstrates how to implement 
an SBOP BI solution for a particular type of enterprise resource planning (ERP) and data 
warehouse solution.

The architecture documents are in the Getting Started with SAP BusinessObjects BI Solutions 
topic on the SAP Community Network (http://scn.sap.com/docs/DOC-27193) in the 
Implementation and Upgrade section.

Содержание Power System POWER7 Series

Страница 1: ...John MacMillan Sudhir Maddali K Madhusudanan Bruce Mealey Steve Munroe Francis P O Connell Sergio Reyes Raul Silvera Randy Swanberg Brian Twichell Brian F Veale Julian Wang Yaakov Yaari Discover simp...

Страница 2: ......

Страница 3: ...International Technical Support Organization POWER7 and POWER7 Optimization and Tuning Guide November 2012 SG24 8079 00...

Страница 4: ...cted by GSA ADP Schedule Contract with IBM Corp First Edition November 2012 This edition pertains to Power Systems servers based on POWER7 and POWER7 processor based technology Specific software level...

Страница 5: ...POWER7 features 25 2 3 1 Page sizes 4 KB 64 KB 16 MB and 16 GB 25 2 3 2 Cache sharing 29 2 3 3 SMT priorities 35 2 3 4 Storage synchronization sync lwsync lwarx stwcx and eieio 37 2 3 5 Vector Scalar...

Страница 6: ...6 2 1 Common prerequisites 109 6 2 2 XL compiler family 110 6 2 3 GCC compiler family 112 6 3 IBM Feedback Directed Program Restructuring 114 6 3 1 Introduction 114 6 3 2 FDPR supported environments...

Страница 7: ...performance tooling 143 8 6 1 High level investigation 143 8 6 2 Low level investigation 144 8 7 Conclusion 144 8 8 Related publications 144 Chapter 9 WebSphere Application Server 147 9 1 IBM WebSpher...

Страница 8: ...Oracle 11gR2 preferred practices for AIX V6 1 and AIX V7 1 on Power Systems 188 Migrating Sybase ASE to POWER7 193 Implementing Sybase IQ to POWER7 195 Environment variables 195 Special consideration...

Страница 9: ...o not in any manner serve as an endorsement of those websites The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk IBM may u...

Страница 10: ...M Micro Partitioning Power Architecture POWER Hypervisor Power Systems Power Systems Software POWER6 POWER7 Systems POWER7 POWER7 PowerLinux PowerPC PowerVM POWER PureSystems Rational Redbooks Redbook...

Страница 11: ...who are responsible for performing migration and implementation activities on IBM POWER7 based servers which includes system administrators system architects network administrators information archite...

Страница 12: ...th IBM STG Systems Solution Development Her responsibilities include end to end system and software design performance from application middleware and operating system to hardware Her fields of expert...

Страница 13: ...as written extensively about IBM POWER performance and Java performance Francis P O Connell is a member of the IBM Systems and Technology Group in Austin Texas He is a Senior Technical Staff Member in...

Страница 14: ...in Computer Science from the University of Oklahoma where he was a Graduate Assistance in Areas of National Need GAANN Fellow and a lecturer teaching courses in Computer Science and Electrical and Co...

Страница 15: ...Sanjay Ruprell IT Specialist Foster City California Bruce P Semple STG Power and z Systems Architecture Gaithersburg Maryland Maneesh Sharma POWER Sales Representative Englewood Cliffs New Jersey Wol...

Страница 16: ...o IBM Corporation International Technical Support Organization Dept HYTD Mail Station P099 2455 South Road Poughkeepsie NY 12601 5400 Stay connected to IBM Redbooks Find us on Facebook http www facebo...

Страница 17: ...nd tuning on IBM POWER7 and IBM POWER7 This chapter describes the optimization and tuning of the IBM POWER7 system It covers the the following topics Introduction Outline of this guide Conventions tha...

Страница 18: ...given for the POWER7 and POWER7 processors the general guidance is applicable to the IBM POWER6 POWER5 and even to earlier processors This guide is directed at personnel who are responsible for perfo...

Страница 19: ...provide full support and usage of POWER7 technologies and systems Linux is based on community efforts that are focused not only on the Linux kernel but also all of the complementary packages tools too...

Страница 20: ...esign are making it more important than ever to consider analyzing and working to improve application performance In the past two of the ways in which newer processor chips delivered higher performanc...

Страница 21: ...ories 1 Lightweight tuning and optimization guidelines Lightweight tuning covers simple prescriptive steps for tuning application performance on POWER7 These simple steps can be carried out without de...

Страница 22: ...cs that are shared among processor chips in the same family each generation of processor chip has unique performance characteristics Optimizing code for POWER7 requires that you set up a test bed on a...

Страница 23: ...ormance testing initially should be done in a non virtualized environment to minimize the factors that affect performance Ensure that the LPAR is running an up to date version of the operating system...

Страница 24: ...t link executable image such as one produced by static compilers and applies additional optimizations FDPR is another tool that can be considered for optimizing applications that are based on an execu...

Страница 25: ...Not a Number NaN signed zeros infinities floating point expression reorganization or setting the errno variable The new Ofast GCC option includes O3 and ffast math and might include other options in...

Страница 26: ...ery large Java heap is being used On Power Systems the Xcodecache option often delivers a small improvement in performance especially in a large Java application This option specifies the size of each...

Страница 27: ...the pool front end Larger allocations are handled with good scalability by the multiheap malloc A simple example of specifying the pool and multiheap combination is by using the environment variable s...

Страница 28: ...64k bdatapsize 64k bstackpsize 64k executable 3 Using linker options at build time cc btextpsize 64k bdatapsize 64k bstackpsize 64k ld btextpsize 64k bdatapsize 64k bstackpsize 64k All of these mechan...

Страница 29: ...ployment guidelines This section discusses deployment guidelines as they relate to virtualized and non virtualized environments and the effect of partition size and affinity on deployments Virtualized...

Страница 30: ...apter 2 The POWER7 processor on page 21 This short description provides some background to help understand two important performance issues that are known as affinity effects Cache affinity The hardwa...

Страница 31: ...ions are started first It is a preferred practice to start higher priority partitions first so that there is a better opportunity for them to obtain good affinity characteristics in their core and mem...

Страница 32: ...n about ASO the MEMORY_AFFINITY environment variable the execrset command and related environment variables and commands see Chapter 4 AIX on page 67 The same forced affinity can be established on Lin...

Страница 33: ...va methods are compiled to binary code With considerably small partitions there might be a long warm up period before reaching steady state performance where a 0 05 LPAR cannot get additional cycles f...

Страница 34: ...8 routines This situation is associated with a locking issue This locking might ultimately arise at the system level as seen with malloc locking issues on AIX or at the application level in Java code...

Страница 35: ...lts with a GUI The GUI presents information about thread states and has powerful features to drill down to see call chains The WAIT tool results combine many of the features of a time based profile a...

Страница 36: ...20 POWER7 and POWER7 Optimization and Tuning Guide...

Страница 37: ...R7 processor This chapter introduces the POWER7 processor and describes some of the technical details and features of this product It covers the the following topics Introduction to the POWER7 process...

Страница 38: ...e particular POWER7 system Figure 2 1 The POWER7 processor chip Each core is a 64 bit implementation of the IBM Power ISA Version 2 06 Revision B and has the following features Multi threaded design c...

Страница 39: ...e There are no new instructions in POWER7 over POWER7 The differences in POWER7 are Manufactured with 32 nm technology A 10 MB L3 cache per core On chip encryption accelerators On chip compression acc...

Страница 40: ...e and thread boundaries can improve application scaling Details about operating system binding facilities are available in 4 1 AIX and system libraries on page 68 and include Affinity topology binding...

Страница 41: ...tion Lookaside Buffer TLB Misses A single large page that is being constantly referenced remains in memory This feature eliminates the possibility of several small pages often being swapped out Unhind...

Страница 42: ...y based on processor chip type AIX V6 1 supports segments with two page sizes 4 KB and 64 KB By default processes use these variable page size segments This configuration is overridden by the existing...

Страница 43: ...4 KB page size The PSPA can be set for the whole system by using the vmm_default_pspa vmo tunable or for a specific process by using the vm_pattr system call 8 In addition to 4 KB and 64 KB page sizes...

Страница 44: ...4K bstackpsize 64K sub1 o sub2 o The ldedit command can be used to set these page size options in an existing executable command ldedit btextpsize 4K bdatapsize 64K bstackpsize 64K mpsize out We can s...

Страница 45: ...ou can use to scale the hardware to many nodes of processor chips and memory One advantage is that systems can be used for multiple workloads and workloads that are large However these characteristics...

Страница 46: ...a condition to be satisfied and then being resumed on a different core Any application data that is in the cache local to the original core is no longer in the local cache because the application thre...

Страница 47: ...mory Table 2 6 shows the cache sizes and related geometry information for POWER7 Figure 2 2 POWER7 chip and local memory17 Table 2 6 POWER7 storage hierarchy18 17 Ibid Cache POWER7 POWER7 L1 i cache C...

Страница 48: ...ur CPU power from invisibly going down the drain 20 it is also important to carefully assess the impact of this strategy especially when applied to systems where there are a high number of CPU cores a...

Страница 49: ...y done by the POWER7 hardware and is configurable as described in 2 3 7 Data prefetching using d cache instructions and the Data Streams Control Register DSCR on page 46 Alignment of data Processors a...

Страница 50: ...rning data can end up being routed through busses that connect multiple chips and memory which have particular bandwidth and latency characteristics The goal for scaling across multiple cores then is...

Страница 51: ...read_set_smt_priority system call in AIX The result can be boosted performance for the sibling SMT threads on the same processor core Concepts and benefits The POWER processor architecture uses SMT to...

Страница 52: ...ified the SMT thread priority see APIs on page 37 Where to use SMT thread priority can be used to improve the performance of a workload by lowering the SMT thread priority that is being used on an SMT...

Страница 53: ...iate data consistency because of their inherent heavyweight nature Concepts and benefits The Power Architecture defines a storage model that provides weak ordering of storage accesses The order in whi...

Страница 54: ...provides an order for storage accesses caused by load store dcbz eciwx and ecowx instructions 42 Where to use Care must be taken when you use synchronization mechanisms in any processor architecture b...

Страница 55: ...oint Operations conforming to the IEEE 754 Standard for Floating Point Arithmetic The introduction of VSX in to the Power Architecture increases the parallelism by providing Single Instruction Multipl...

Страница 56: ...Environment of Power ISA v2 06 a white paper from Power org available at https www power org documentation whats new in the server environment of power isa v2 06 registration required FPR0 VSR0 FPR1...

Страница 57: ...r data types and the size and possible values for each type Table 2 9 Vector data types Type Interpretation of content Range of values vector unsigned char 16 unsigned char 0 255 vector signed char 16...

Страница 58: ...or literal or any expression that has the same vector type For example 55 vector unsigned int v1 vector unsigned int v2 vector unsigned int 10 XL only not GCC v1 v2 The number of values in a braced in...

Страница 59: ...O3 For Fortran xlf qarch pwr7 qtune pwr7 O3 qhot qsimd gfortran mcpu power7 mtune power7 O3 Using Engineering and Scientific Subroutine ESSL libraries with vectorization support Select routines have v...

Страница 60: ...t in performance because it is usually implemented in software IBM POWER6 and POWER7 processor based systems provide hardware support for DFP arithmetic The POWER6 and POWER7 microprocessor cores incl...

Страница 61: ...e Toolchain compiler and run time The Advance Toolchain runtime libraries can also be integrated with recent XL V9 compilers for DFP exploitation The latest Advance Toolchain compiler and run times ca...

Страница 62: ...your applications are using DFP There are two AIX commands that are used for monitoring hpmstat for monitoring the whole system hpmcount for monitoring a single program The PM_DFU_FIN DFU instruction...

Страница 63: ...bt and dcbtst instructions provide hints about a sequence of accesses to data elements or indicate the expected use Such a sequence is called a data stream and a dcbt or dcbtst instruction in which TH...

Страница 64: ...ater but can also degrade performance by expending bandwidth on cache lines that are not later referenced or by displacing cache lines that are later referenced by the program Similarly setting DPFD t...

Страница 65: ...returns the values in the output buffer struct dscr_properties defined in sys machine h DSCR_SET_DEFAULT Sets a 64 bit DSCR value in a buffer pointed to by buf_p as the operating system default Retur...

Страница 66: ...ause the system call writes the new value both in the process context and in the DSCR When a thread runs dcsr_ctl to change the prefetch depth for the process the new value is written into the AIX pro...

Страница 67: ...haracteristic Performance can be improved by disabling hardware prefetching in these cases by running the following command dscrctl n s 1 This system partition wide disabling is only appropriate if it...

Страница 68: ...r to the following sections Section 3 1 Program Priority Registers Section 3 2 or Instruction Section 4 3 4 Program Priority Register Section 4 4 3 OR Instruction Section 5 3 4 Program Priority Regist...

Страница 69: ...htm splat Command found at http publib boulder ibm com infocenter aix v7r1 index jsp topic com ibm aix cmds doc aixcmds5 splat htm trace Daemon found at http publib boulder ibm com infocenter aix v7r...

Страница 70: ...54 POWER7 and POWER7 Optimization and Tuning Guide...

Страница 71: ...ved 55 Chapter 3 The POWER Hypervisor This chapter introduces the POWER7 Hypervisor and describes some of the technical details for this product It covers the the following topics Introduction to the...

Страница 72: ...ng tool that takes virtualization impacts into consideration such as the IBM Workload Estimator to estimate capacity for each partition One of the goals of virtualization is maximizing usage This usag...

Страница 73: ...inter partition communication VLANs option that is used for higher network performance Shared Ethernet versus host Ethernet Virtual disk I O Virtual small computer system interface vSCSI N_Port ID Vir...

Страница 74: ...if the partition is defined to run in a specific virtual shared processor pool the number of virtual processors ought not to exceed the maximum that is defined for the specific virtual shared processo...

Страница 75: ...virtual processors Matching entitlement of an LPAR close to its average usage for better performance The aggregate entitlement minimum or wanted processor capacity of all LPARs in a system is a facto...

Страница 76: ...ld value of 49 the primary thread of a core is used before unfolding another virtual processor to consume another core from the shared pool on POWER7 Systems If free cores are available in the shared...

Страница 77: ...ut also makes the table become sparse which results in the following situations A dense page table tends to help with better cache affinity because of reloads Less memory that is consumed by the hyper...

Страница 78: ...mains that are assigned to the LPAR Setting lpar_placement 0 is the default setting and follows the existing rules when SPPL is set to MAX How to determine if an LPAR is contained within a domain From...

Страница 79: ...y so that the performance impact of resource addition or deletion is minimal Planning for growth helps alleviate the fragmentation that is caused by DLPAR operations Knowing the LPARs that must grow o...

Страница 80: ...y on demand as business needs for compute capacity grows Therefore a Power System might not have all resources that are licensed which poses a challenge to allocate both cores and memory from a local...

Страница 81: ...le called the Dynamic Platform Optimizer This optimizer automates the manual steps to improve resource placement For more information visit the following Web site and select the Doc type Word document...

Страница 82: ...Guide Virtual I O VIO and Virtualization found at http www ibm com developerworks wikis display virtualization VIO Virtualization Best Practice found at http www ibm com developerworks wikis display...

Страница 83: ...This chapter describes the optimization and tuning of a POWER7 processor based server running the AIX operating system It covers the the following topics AIX and system libraries AIX Active System Opt...

Страница 84: ...are Default allocator The default allocator is selected when the MALLOCTYPE environment variable is unset This setting maintains a consistent performance even in a worst case scenario but might not b...

Страница 85: ...ts This suboption is similar to the built in bucket allocator of the Watson allocator However with this option you can have fine grained control over the number of buckets number of blocks per bucket...

Страница 86: ...d is scalable for small allocations while multiheap ensures scalability for larger and less frequent allocations 8 If you notice high memory usage in the application process even after you run free th...

Страница 87: ...For more information about this topic see 4 4 Related publications on page 94 File system performance benefits AIX Enhanced Journaled File System JFS2 is the default file system for 64 bit kernel envi...

Страница 88: ...h improves performance because I O operations and applications processing can run simultaneously Many applications such as databases and file servers take advantage of the ability to overlap processin...

Страница 89: ...address space The mmap subroutine provides a unique object address for each process that maps to an object The software accomplishes this task by providing each process with a unique virtual address w...

Страница 90: ...point registers by the ABI the high 32 bits of all fixed point registers are treated as volatile or undefined by the ABI The 32 bit ABI preserves only 32 bit fixed point context across subroutine lin...

Страница 91: ...ituation is in contrast to the manual usage of the affinity APIs documented in this section Processor affinity bindprocessor Processor affinity is the probability of dispatching of a thread to the log...

Страница 92: ...plications receive a handle to the RSET The RSET handle datatype rsethandle_t in sys rset h is then used in RSET APIs to manipulate or attach the RSET Summary of RSET commands Here is a summary of the...

Страница 93: ...n about an RSET rs_getrad Get resource allocation domain information from an input RSET rs_numrads Returns the number of system resource allocation domains at the specified system detail level that ha...

Страница 94: ...n POWER7 Systems Enhanced Affinity extends the AIX existing memory affinity support AIX V6 1 technology level 6100 05 contains AIX Enhanced Affinity support Enhanced Affinity status is determined duri...

Страница 95: ...ory by using the sra_detach function new Hybrid thread and core AIX provides facilities to customize simultaneous multi threading SMT characteristics of CPUs running within a partition The features re...

Страница 96: ...o take advantage of this hybrid mode are Asymmetric workload where the performance of one thread serializes an entire workload For example one master thread dispatches work to many subordinate threads...

Страница 97: ...all to thread_wait is not blocked but returns with success immediately Multiple posts to the same thread without an intervening wait by the specified thread counts only as a single post The posting re...

Страница 98: ...e software package must be split into shareable and non shareable files Shareable files such as executable code and message catalogs must be installed into the shared global file systems that are read...

Страница 99: ...are described in this section there are no application visible changes or awareness required AIX encrypted file system EFS Integrated with the AIX Journaled File System JFS2 is the ability to create...

Страница 100: ...ptimizer ASO and Dynamic System Optimizer DSO attempt to address the optimization of both the operating system and server autonomously 4 2 1 Concepts DSO is built on the Active System Optimizer ASO fr...

Страница 101: ...ystem Optimization strategies Two optimization strategies are provided with ASO Cache affinity optimization Memory affinity optimization DSO adds two more optimizations to the ASO framework Large page...

Страница 102: ...affinity ASO analyzes the cache access patterns that are based on information from the kernel and the PMU to identify potential improvements in cache affinity by moving threads of workloads closer tog...

Страница 103: ...optimization the workload must pass certain minimum criteria as described in this section Ideal workloads Workload characteristics for each optimization are Cache affinity optimization and memory affi...

Страница 104: ...e placed normally CPU usage The CPU usage of the workload should be above 0 1 cores Workload age Workloads must be at least 10 seconds of age to be considered for cache affinity and 5 minutes of age f...

Страница 105: ...Memory prefetch optimization 30 minutes 4 2 4 The asoo command The ASO framework is off by default in an AIX installation The asoo command must be used to enable the ASO framework The command syntax...

Страница 106: ...erSaver mode on the HMC causes virtual processor management in a dedicated environment to be enabled forcing cache and memory affinity optimizations to be disabled Enabling active memory sharing disab...

Страница 107: ...aso log This file lists major ASO events such as when it is enabled or disabled or when it hibernates It also contains a basic audit trail of optimizations that are performed to workloads Example 4 1...

Страница 108: ...rred practices that are applicable to all Power Systems generations AIX preferred practices that are applicable to POWER7 POWER7 mid range and high end High Impact or Pervasive advisory 4 3 1 AIX pref...

Страница 109: ...1 For more information about this topic see 4 4 Related publications on page 94 4 3 3 POWER7 mid range and high end High Impact or Pervasive advisory IBM maintains a strong focus on the quality and r...

Страница 110: ...ibm com infocenter aix v7r1 topic com ibm aix baseadmn do c baseadmndita excluseprocrecset htm execrset command found at http publib boulder ibm com infocenter aix v7r1 topic com ibm aix cmds doc ai...

Страница 111: ...ory index html thread_post Subroutine found at http publib boulder ibm com infocenter aix v7r1 index jsp topic com ibm aix basetechref doc basetrf2 thread_post htm thread_post_many Subroutine found at...

Страница 112: ...96 POWER7 and POWER7 Optimization and Tuning Guide...

Страница 113: ...7 Chapter 5 Linux This chapter describes the optimization and tuning of the POWER7 processor based server running the Linux operating system It covers the following topics Linux and system libraries L...

Страница 114: ...s are enabled to run on small Power Micro Partitioning partitions through the broad range of IBM Power offerings from low cost PowerLinux servers and Flex System nodes up through the largest IBM Power...

Страница 115: ...cpu and mtune compiler flags might be the best option For example mcpu power7 allows the compiler to use all the new instructions such as the Vector Scalar Extended category The mcpu power7 option als...

Страница 116: ...instructions or POWER6 no vector double instructions machines You can optimize all three Power platforms if you build and install your application and libraries correctly by completing the following...

Страница 117: ...e best used with the libhugetlbfs API Large segments can be used to back shared memory malloc storage and main program text and data segments incorporating large pages for shared library text or data...

Страница 118: ...default malloc implementation uses trylock techniques to detect contentions between POSIX threads and then tries to assign each thread its own arena This action works well when the same thread frees s...

Страница 119: ...e html Massif a heap profiler available at http valgrind org docs manual ms manual html For more details about memory management tools see Empirical performance analysis using the IBM SDK for PowerLin...

Страница 120: ...commands export TCMALLOC_MEMFS_MALLOC_PATH libhugetlbfs export HUGETLB_ELFMAP RW export HUGETLB_MORECORE yes Where TCMALLOC_MEMFS_MALLOC_PATH libhugetlbfs defines the libhugetlbfs mount point HUGETLB...

Страница 121: ...uild and improves overall performance Previously the TOC mfull toc defaulted to a single instruction access form that restricts the total size of the TOC to 64 KB This configuration can cause large pr...

Страница 122: ...ific logical processors The setaffinity API allows processes and threads to have affinity to specific logical processors The number and numbering of logical processors is a product of the number of pr...

Страница 123: ...or C C and Fortran This chapter describes the optimization and tuning of the POWER7 processor based server using compilers and tools It covers the following topics Compiler versions and optimization l...

Страница 124: ...dvantage of the more advanced compiler optimization For numerical or compute intensive codes the XL compiler options O3 or qhot O3 enable loop transformations which improve program performance by rest...

Страница 125: ...analysis and transformations improve runtime performance by changing the translation of the program source into assembly code Changes in these translations might cause the application to behave differ...

Страница 126: ...This situation can be useful for older code that is written without following these rules The options to request this optimization are qalias noansi for C C and qalias nostd for Fortran High order tra...

Страница 127: ...nk step can increase significantly Optimization that is based on Profile Directed Feedback Profile based optimization allows the compiler to collect information about the program behavior and use that...

Страница 128: ...ions include fpeel loops funroll loops ftree vectorize fvect cost model mcmodel medium Specifying the mveclibabi mass option and linking to the MASS libraries enables more loops for ftree vectorize Th...

Страница 129: ...ormation in the resulting object file Then at application link time the linker can collect all the objects with additional information and pass them back to the compiler GCC for whole program IPA and...

Страница 130: ...the executable binary file of a program by collecting information about the behavior of the program while the program is used for a typical workload and then creates a new version of the program that...

Страница 131: ...n optimized version of the input 6 3 2 FDPR supported environments FDPR is available on the following platforms AIX and Power Systems Part of the AIX 5L V5 operating system and higher for both 32 bit...

Страница 132: ...roof Run man fdpr for more information about this wrapper Special input and output files FDPR has a number of options that control input and output files One option that controls the input files is ig...

Страница 133: ...This information might also be interspersed with warning and debugging messages Use the quiet q option to avoid progress information To limit the warning information use the warning l w l option 6 3...

Страница 134: ...e or absolute location where it was created in the instrumentation step or where specified originally by fdir Use the FDPR_PROF_NAME environment variable to specify the profile file name if the profil...

Страница 135: ...en the basic blocks are mostly not taken This configuration makes instruction prefetching more efficient Chains are terminated when the heat that is execution count goes below a certain threshold rela...

Страница 136: ...it de virtualizes the virtual method calls by calling the actual targets directly The optimized code compares the address of the function descriptor which is used for the indirect call against the add...

Страница 137: ...termine the data that is contained in each register at each point in the function and whether this value is used later The function optimizations are killed regs kr A register is considered killed at...

Страница 138: ...ream The most common place is following a function call in code Because the call might have modified the TOC anchor register R2 the compiler inserts a load instruction that resets R2 to its correct va...

Страница 139: ...ational found at http www ibm com rational cafe community ccpp FDPR Post Link Optimization for Linux on Power found at https www ibm com developerworks mydeveloperworks groups service html communi tyv...

Page 140: ...124 POWER7 and POWER7+ Optimization and Tuning Guide...

Page 141: ...describes the optimization and tuning of Java-based applications that are running on a POWER7 processor-based server. It covers the following topics: Java levels; 32-bit versus 64-bit Java; Memory and pa...

Page 142: ...R7 supports prefetch instructions for transient data, which is needed, but this data must be evacuated from the CPU caches with priority, which results in more efficient usage of CPU caches and leads to...

Page 143: ...age of medium (64 KB) and large (16 MB) page sizes that are supported by the current AIX versions and POWER processors. Using medium or large pages instead of the default 4 KB page size can improve applica...

Page 144: ...ministrator can add this capability by running chuser: chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE user_id. On Linux, 1 GB of 16 MB pages are configured by running echo: echo 64 > /proc/sys/vm/nr_hu...
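The Linux figure quoted in this snippet (64 pages for 1 GB of 16 MB pages) is plain division. The following is a minimal sketch of that arithmetic, not something taken from the guide; the function name `hugepages_needed` and its parameters are hypothetical, and the 16 MB page size is assumed from the snippet.

```python
# Hypothetical helper: how many huge pages cover a given amount of memory.
# The 16 MB default page size is an assumption taken from the snippet above.
MB = 1024 ** 2
GB = 1024 ** 3


def hugepages_needed(total_bytes, page_bytes=16 * MB):
    """Return the number of pages needed to cover total_bytes, rounded up."""
    return -(-total_bytes // page_bytes)  # ceiling division on integers


if __name__ == "__main__":
    # 1 GB of 16 MB pages, the case described in the text
    print(hugepages_needed(1 * GB))
```

For 1 GB this evaluates to 64, which matches the value written by the echo command in the snippet.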

Page 145: ...ompiler has a cap on how much memory it can allocate at run time to store compiled code, and for most applications the default cap is more than sufficient. However, certain programs, especially those p...

Page 146: ...independent of any running JVM and persists until it is deleted. A shared cache can contain: Bootstrap classes; Application classes; Metadata that describes the classes; Ahead-of-time (AOT) compiled code. 7.4...

Page 147: ...y default, up to 25% of the heap is dedicated to the new space. The division between the new space and the old space can be controlled with the -Xmn option, which specifies the size of the new space; the re...

Page 148: ...ds of the application can lower the overall GC impact. If an application requires more flexibility than can be achieved with a constant-sized heap, it might be beneficial to tune the sizing parameters f...

Page 149: ...fault SMT modes. Table 7-3 SMT mode on POWER7 is dependent upon AIX and compatibility mode. Most applications benefit from SMT. However, some applications do not scale with an increased number of logical...

Page 150: ...T threads that belong to one core. Create an RSET with eight logical CPUs by selecting eight SMT threads that belong to two cores. The smtctl command can be used to determine which logical CPUs belong t...

Page 151: ...ses, the cost of acquiring the monitor can be reduced by using the -XlockReservation option. With this option, it is assumed that the last thread to acquire the monitor is also likely to be the next threa...

Page 152: ...concurrentlevel number option, which specifies the ratio between the amount of heap that is allocated and the amount of heap that is marked. The default value is 8. The number of low-priority mark threads can be set with the...

Page 153: ...ocessor-based server running IBM DB2. It covers the following topics: DB2 and the POWER7 processor; Taking advantage of the POWER7 processor; Capitalizing on the compilers and optimization tools for POWER...

Page 154: ...on POWER7 Systems through the DB2 registry variable DB2_RESOURCE_POLICY. In general, this variable defines which operating system resources are available for DB2 databases, or assigns specific resources...

Page 155: ...er for large pages support by running vmo: vmo -r -o lgpg_size=LargePageSize -o lgpg_regions=LargePages. LargePageSize is the size in bytes of the hardware-supported large pages, and LargePages specifies th...

Page 156: ...ons. On the AIX platform, whole-program analysis (IPA) and profile-based optimizations (PDF) compiler options are used to optimize DB2 using a set of customer-representative workloads. This technique produce...

Page 157: ...k for available memory in the system when instance_memory is set to automatic. DB2 also supports the PowerVM Live Partition Mobility (LPM) feature when virtual I/O is configured. LPM allows an active data...

Page 158: ...tems requires many individual function calls, typically as many as the number of EDUs being woken up. 8.5.2 File systems. DB2 uses most of the advanced features within the AIX file systems. These features...

Page 159: ...vers configuration parameter. By default, this parameter is automatically tuned during database startup. For more information about how to monitor and tune AIO for DB2, see Best Practices for DB2 on AIX 6...

Page 160: ...le is a system profiling tool, similar in nature to tprof, that is popular on the Linux platform. OProfile uses hardware counters to provide functional-level profiling in both the kernel and user space. L...

Page 161: ...SOURCE_POLICY and DB2_LARGE_PAGE_MEM: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=/com.ibm.db2.luw.admin.regvars.doc/doc/r0005665.html. DB2 Virtualization, SG24-7805. DECFLOAT: The data typ...

Page 163: ...ght IBM Corp. 2012. All rights reserved. 147 Chapter 9. WebSphere Application Server. This chapter describes the optimization and tuning of the POWER7 processor-based server running WebSphere Application S...

Page 164: ...rations. Table 9-1 Installation considerations. 9.1.2 Deployment. When you start the WebSphere Application Server, there is an option to bind the Java processors to specific CPU processor cores to circumv...

Page 165: ...WER hardware (POWER5 or POWER6), you might experience scalability issues because the default SMT mode on POWER7 is SMT4, but on POWER5 and POWER6 the default is SMT and SMT2 mode. As some of these applicat...

Page 166: ...nce of WebSphere Application Server on POWER7 Systems. For an example of using the taskset and numactl commands in a Linux environment, see "Partition sizes and affinity" on page 14. More information about...

Page 167: ...pendix A. Analyzing malloc usage under AIX. This appendix describes the optimization and tuning of the POWER7 processor-based server by using the AIX malloc subroutine. It covers the following topics: Int...

Page 168: ...resented here presents a basic view. How to collect malloc usage information: To discover the distribution of allocation sizes, set the following environment variable: export MALLOCOPTIONS=buckets,bucket_...

Page 169: ...hows a sample output. Example A-2 Sample output from the malloc subroutine: (dbx) malloc. The following options are enabled: Implementation Algorithm: Default Allocator (Yorktown); Malloc Log Stack Depth: 4; Stat...

Page 171: ...l performance analysis. This appendix describes the optimization and tuning of the POWER7 processor-based server from the perspective of performance tooling and empirical performance analysis. It covers...

Page 172: ...in "Expert system advisors" on page 156. The fourth advisor is part of the IBM Rational Developer for Power Systems Software. It is a component of an integrated development environment (IDE), which provides...

Page 173: ...plains why a particular topic was monitored, and provides a definition of the performance metric or setting. 2. Why is it Important? This report entry explains why the topic is relevant and how it impacts...

Page 174: ...capture from the VIOS Advisor. Figure B-2 The VIOS Advisor. Virtualization Performance Advisor: The Virtualization Performance Advisor provides guidance for various aspects of an LPAR, both dedicated and...

Page 175: ...ser in determining the best possible configuration. The LPAR Performance Advisor can be found at https://www.ibm.com/developerworks/wikis/display/WikiPtype/PowerVM+Virtualization+performance+advisor. Figu...

Page 176: ...and the user's expertise level. Figure B-4 is a snapshot of Java and WebSphere Application Server recommendations from a sample run, indicating the best JVM optimization and WebSphere Application Server...

Page 177: ...s generated by the compiler allows this data to be matched back to the original source code. XLC compilers can generate XML report files that provide information about optimizations that were performed...

Page 178: ...s. Here, the fundamental measure of performance is throughput: the number of transactions that are run over a period with an acceptable response time. Other applications are more batch-oriented, where few...

Page 179: ...flag instructs tprof to report CPU time in the number of ticks (that is, samples) instead of percentages. The -x sleep 10 argument instructs tprof to collect profiling data during the running of the sleep...

Page 180: ...76 606265 1 1 0 0 0 /usr/bin/trcstop 245976 606263 1 1 0 0 0 swapper 0 3 1 1 0 0 0 rmcd 155876 348337 1 1 0 0 0 Total 7504 5865 1637 2 0 Total Samples = 7504 Total Elapsed Time = 18.76s Example B-3 from th...

Page 181: ...es the system call was run during the monitoring interval. Total Time: Amount of CPU time, in milliseconds, consumed in running the system call. % sys time: Percentage of overall CPU capacity that is spent in...

Page 182: ...tal Time, Avg Time, Min Time, Max Time (all in msec), SVC (Address). 492 157.0663 0.3192 0.0032 0.6596 listio64(516ea40); 494 3.3656 0.0068 0.0002 0.0163 GetMultipleCompletionStatus(549a6a8); 12 0.0238 0.0...

Page 183: ...xample of a mutex report from splat is shown in Example B-7. Example B-7 Mutex report from splat: [pthread MUTEX] ADDRESS: 00000000F0154CD0 Parent Thread: 0000000000000001 creation time: 26.232305 Pid: 18396...

Page 184: ...or suspended. Percent Held: This field contains the following subfields: Real CPU: The percentage of the cumulative processor time the lock was held by a running thread. Real Elapsed: The percentage of the...

Page 185: ...ommand runs. To detect alignment issues that are handled by microcode, run hpmcount to collect data for group 38. An example is provided in Example B-8. Example B-8 Example of the results of the hpmcount...

Page 186: ...stat.htm. Finding emulation issues: Over the 20-year evolution of the Power instruction set, a few instructions were removed. Instead of trapping programs that run these instructions, AIX emulates them in...

Page 187: ...cts performance data on a system-wide basis, rather than just for the execution of a command. Further documentation about hpmcount and hpmstat can be found at http://publib.boulder.ibm.com/infocenter/aix...

Page 188: ...e. OProfile can be run directly as a command-line tool or under the IBM SDK for PowerLinux. The OProfile tools can monitor the whole system (LPAR), including all the tasks and the kernel. This action requir...

Page 189: ...n the kernel. Reserve POSIX pthread_spinlock and sched_yield for applications that have exclusive use of the system and with carefully designed thread affinity (assigning specific threads to specific co...

Page 190: ...roduct owner to provide a build using higher optimization. Alternatively, for open source library packages, you can build your own optimized binary version of those packages. Deeper empirical analysis: If...

Page 191: ...shows a nested set of twisties, starting with the event (cycles by default), then program/library, function, and source line (within function). The developer drills down by opening the twisties in the profile...

Page 192: ...esulting FDPR journal is used to drive the SCA analysis. Running FDPR and retrieving the journal can be automated by clicking Profile as → Profile with Source Code Advisor. Java (either AIX or Linux): Foc...

Page 193: ...er using graphs and other diagrams, allowing trends and totals to be easily and quickly recognized. The graphs can be used to determine the minimum and maximum heap usage, growth and shrink rates over ti...

Page 194: ...it is more common to profile Java after a warm-up period, so that JIT compilation activity has generally completed. To profile after a warm-up, start Java and wait an appropriate interval until steady s...

Page 195: ...create lock contention A = new Double[Num]; B = new Double[Num]; C = new Double[Num]; initialize(); calculate(); d = C[0].doubleValue(); return d; // use the calculated values } public static void initialize() { // Initialize A and B fo...

Page 196: ...ple B-13 as ticks in the libj9jit24.so helper routine jitMonitorEntry, in the AIX pthreads library libpthreads.a, and in the AIX kernel routine _check_lock. This Java program clearly has excessive lock co...

Page 197: ...ation levels by the JIT compiler. Most of the ticks appear in the final, highly optimized version of the doWork()D method, into which the initialize()V and calculate()V methods are inlined by the JIT compil...

Page 198: ...on # Make sure we start from scratch: opcontrol --shutdown # Load the OProfile module if required and make the OProfile driver interface available: opcontrol --init # Clear out data from current session: opcontro...

Page 199: ...r routine analysis" on page 177. Thread state analysis: Multi-threaded Java applications, especially applications that are running on top of WebSphere Application Server, often have many threads that might...

Page 200: ...more information. Manuals, animated demonstrations, and sample reports are also available on the WAIT website. For more information about WAIT, go to http://wait.researchlabs.ibm.com. This site also has sampl...

Page 201: ...tions. This appendix describes the optimization and tuning of the POWER7 processor-based server running third-party applications. It covers the following topics: Migrating Oracle to POWER7; Migrating Syba...

Page 202: ...nstallation information, AIX patches, tuning, implementation suggestions, and so on. Oracle 11gR2 Readme/Release Notes: http://docs.oracle.com/cd/E11882_01/relnotes.112/e10853/toc.htm. Oracle Database Release...

Page 203: ...nsf/WebIndex/WP101176 (11gR2 planning and implementation). Oracle DB and RAC 10gR2 on IBM AIX: Tips and Considerations: http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101089 (10gR2 planning and imp...

Page 204: ...cture and Tuning on AIX (v2.20), found at http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883. The following sections describe memory, CPU, I/O, network, and miscellaneous settings. In addition, th...

Page 205: ...OCK_SGA = TRUE. The AIX parameters to enable pinned memory and 16 MB large pages are: vmo -p -o v_pinshm=1 (allows pinned memory and requires reboot); vmo -p -o lgpg_size=16777216 -o lgpg_regions=number_of_large_...

Page 206: ...s that are sensitive to a degree of parallelism might change behavior because of the migration to POWER7. Review the PARALLEL_MAX_SERVERS parameter after migration, but set DB_WRITER_PROCESSES to the...

Page 207: ...with no options. Redo logs: Create the logs with agblksize of 512 and mount with no options. With SETALL, use direct I/O for redo logs. Control files: Create the files with agblksize of 512 and mount with n...

Page 208: ...p://docs.oracle.com/cd/E18283_01/server.112/e16102/asminst.htm#CHDBBIBF. Network: This section outlines the minimum values that are applicable to network configurations. Kernel configurations: The following...

Page 209: ...unning iostat -Dl might indicate a need to increase queue depth. max_transfer might need to be adjusted upward, depending on the largest I/O requested by Oracle; a typical starting point for Oracle on AIX...

Page 210: ...wer Systems, available at http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102105. The subtitle of this paper explains its purpose: it presents the case for POWER7 as the optimal platform for Sy...

Page 211: ...AD_SCOPE=S (same as AIXTHREAD_MNRATIO=1:1): Firm suggestion. Setting AIXTHREAD_SCOPE=S means that user threads created with default attributes are placed into system-wide contention scope. If a user thread...

Page 212: ...o 60*(min(iqnumbercpus, 4)) + 50*(iqnumbercpus - 4) + 2*(numConnections + 2) + 1. On a 4-core, single-threaded system with -gm (number of connections) set to 20, that is: iqmt = 60*(4) + 50*(4 - 4) + 2*(20 + 2) + 1 = 285. However, if SMT4 mo...
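The iqmt arithmetic in this snippet can be checked with a short calculation. The following is a minimal sketch, assuming the formula reads 60*min(iqnumbercpus, 4) + 50*(iqnumbercpus - 4) + 2*(numConnections + 2) + 1, as reconstructed from the worked numbers in the text; the function name `default_iqmt` is hypothetical and is not part of any Sybase IQ interface.

```python
# Hypothetical helper reproducing the iqmt formula quoted in the snippet.
# Assumption: the formula is applied exactly as written, with no clamping
# of the (iqnumbercpus - 4) term, because the snippet shows none.
def default_iqmt(iqnumbercpus, num_connections):
    """Evaluate the quoted iqmt formula for a CPU and connection count."""
    first_four = 60 * min(iqnumbercpus, 4)   # 60 threads for each of the first four CPUs
    remainder = 50 * (iqnumbercpus - 4)      # 50 threads for each CPU beyond four
    per_connection = 2 * (num_connections + 2)
    return first_four + remainder + per_connection + 1


if __name__ == "__main__":
    # The worked case from the text: 4 cores, 20 connections
    print(default_iqmt(4, 20))
```

With 4 CPUs and 20 connections the terms are 240 + 0 + 44 + 1, which gives the 285 stated in the text. Because the CPU count feeds two of the terms, the result changes when the database server counts logical rather than physical CPUs.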

Page 213: ...ur results suggest that IQ 15.2 can attain dedicated-level performance with a 1:4 entitlement to virtual processor ratio, providing that the other configuration suggestions are followed. Migrating SAS t...

Page 214: ...user process. If the maximum number of processes for a single user exceeds 2000, increase the value of maxuproc to prevent SAS processes from abnormal shutdown or delay. Increase the maxuproc setting by...

Page 215: ...file systems and SAS data file systems on physically separate disks. Use multiple storage server controllers to further separate and isolate the I/O traffic between SAS temporary and data spaces. Use mu...

Page 216: ...isks in the array. For example, strip size x number of disks = stripe size. The AIX LVM stripe size that you can select from the smit lv create panel is the single strip size (not stripe). It is the size of d...
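The strip and stripe relationship stated in this snippet is a single multiplication. The following is a minimal sketch of it; the function name and the sample values (a 64 KB strip across 8 disks) are hypothetical and chosen only for illustration, not taken from the guide.

```python
# Hypothetical helper for the relationship quoted in the snippet:
#   strip size x number of disks = stripe size
KB = 1024


def stripe_size(strip_bytes, number_of_disks):
    """Return the full stripe size for a given per-disk strip size."""
    return strip_bytes * number_of_disks


if __name__ == "__main__":
    # Illustrative values only: 64 KB strip on each of 8 disks
    print(stripe_size(64 * KB, 8) // KB, "KB")
```

With these sample values the full stripe is 512 KB, while the per-disk strip size that the snippet says the LVM panel asks for would still be 64 KB.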

Page 217: ...Os that cannot be serviced in the disk queues go into the single wait queue of the dpo device. The benefit of this situation is that the dpo device provides fault-tolerant error handling. This situation...

Page 218: ...SDD DPO: qdepth_enable. lsattr -El dpo displays the current value. Run datapath to change the parameters if at SDD 1.6 or greater. Otherwise, run chdev. For example: datapath set qdepth disable. Available doc...

Page 219: ...om many supported database systems, including text or multi-dimensional online analytical processing (OLAP) systems. You can publish in many different formats to many different publishing systems. Platform...

Page 220: ...Quick Sizer tool is available at http://service.sap.com/quicksizer (registration required). Also, the SAP BusinessObjects BI 4 - Companion Guide is available on the SAP Quick Sizer landing page at http://servic...

Page 224: ...ent types of code that runs under the IBM AIX and Linux operating systems, focusing on the more pervasive performance opportunities that are identified, and how to capitalize on them. The technical infor...
