
Administrator's Guide

Release 5.0.5

Published April 2010

Summary of Contents for ParaStation5

Page 1: ...Administrator's Guide, Release 5.0.5, Published April 2010 ...

Page 2: ...arTec logo and the ParaStation logo are trademarks of ParTec Cluster Competence Center GmbH. Linux is a registered trademark of Linus Torvalds. All other marks and names mentioned herein may be trademarks or registered trademarks of their respective owners. ParTec Cluster Competence Center GmbH, Possartstr. 20, D-81679 München. Phone: +49-89-99809-0, Fax: +49-89-99809-555, http://www.par-tec.com, info@par-tec.com...

Page 3: ...sys ps4 local 19
5.2.4. p4stat 19
5.3. Controlling process placement 19
5.4. Using the ParaStation5 queuing facility 20
5.5. Exporting environment variables for a task 20
5.6. Using non-ParaStation applications 20
5.7. ParaStation5 TCP bypass 21
5.8. Controlling ParaStation5 communication paths 21
5.9. Authentication within ParaStation5 22
5.10. Homogeneous user ID space 23
5.11. Single system view 23
5.12. ...

Page 4: ...startup 31
6.8. Problem: pssh fails 31
6.9. Problem: psid does not startup, reports port in use 31
6.10. Problem: processes cannot access files on remote nodes 32
I. Reference Pages 33
parastation.conf 35
psiadmin 47
psid 63
test_config 65
test_nodes 67
test_pse 69
p4stat 71
p4tcp 73
psaccounter 75
psaccview 77
mlisten 81
A. Quick Installation Guide 83
B. ParaStation license 85
C. Upgrading ParaStation4 to P...

Page 5: ...ure software project. The communication platform used then was Myrinet, a Gigabit interconnect developed by Myricom. The development of ParaStation2 still took place at the University of Karlsruhe. ParaStation became commercial in 1999, when ParTec AG was founded. This spin-off from the University of Karlsruhe now owns all rights and patents connected with the ParaStation software. ParTec promotes the fu...

Page 6: ... part of its portfolio. At the end of 2007, ParaStation5 was released, supporting MPI2 and even more interconnects and especially protocols like DAPL. ParaStation5 is backward compatible with the previous ParaStation4 version. 1.3. About this document: This manual discusses installation, configuration and administration of ParaStation5. Furthermore, all the system management utilities are described. For a det...

Page 7: ...n addition, a couple of libraries providing communication and management functionality must be installed. All libraries are provided as static versions, which will be linked to the application at compile time, or as shared (dynamic) versions, which are pre-linked at compile time and folded in at runtime. There is also a set of management and test tools installed on the cluster. ParaStation5 comes with its...

Page 8: ...work drivers. These drivers are based on standard device drivers for the corresponding NICs and especially tuned for best performance within a cluster environment. They will also support all standard communication and protocols. To enable best performance within an Ethernet-based cluster, these drivers should replace their counterparts currently configured within the kernel. ParaStation currently comes...

Page 9: ...irst, a so-called administration network, which is used to handle all the administrative tasks that have to be dealt with within a cluster. Besides commonly used services like sharing of NFS partitions or NIS tables, on a ParaStation cluster this also includes the inter-daemon communication used to implement the effective cluster administration and parallel task handling mechanisms. This administration...

Page 10: ...thin the 2.4 and 2.6 kernel streams. Using InfiniBand and Myrinet requires additional modules and may restrict the supported kernels. 3.2. Directory structure: The default location to install ParaStation5 is /opt/parastation. Underneath this directory, several subdirectories are created containing the actual ParaStation5 installation: bin contains all executables and scripts forming the ParaStation system...

Page 11: ...ore system packages supplying MPIch for GNU, Intel, Portland Group and Pathscale compilers are available. A documentation package is also obtainable. The full names of the RPM files follow a simple structure: name-x.y.z-n.arch.rpm, where name denotes the name and thus the content of the package, x.y.z describes the version number, n the build number and arch is the architecture, i.e. one of i586, ia64, x86_64 ...
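The naming structure described above can be taken apart mechanically. The following is a minimal Python sketch, assuming the usual RPM separators (hyphens between name, version and build number, dots before the architecture and the rpm suffix). The function name, the `RpmName` type and the sample file names are illustrative and not part of ParaStation.

```python
import re
from typing import NamedTuple


class RpmName(NamedTuple):
    """Components of an RPM file name of the form name-x.y.z-n.arch.rpm."""
    name: str
    version: str
    build: str
    arch: str


# The package name may itself contain hyphens (e.g. "pscom-modules"),
# so the pattern is anchored on the version-build-arch tail.
_RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-(?P<build>\d+)"
    r"\.(?P<arch>[^.]+)\.rpm$"
)


def parse_rpm_name(filename: str) -> RpmName:
    """Split an RPM file name into name, version, build number and arch."""
    match = _RPM_RE.match(filename)
    if match is None:
        raise ValueError(f"not a name-x.y.z-n.arch.rpm file name: {filename!r}")
    return RpmName(match["name"], match["version"], match["build"], match["arch"])
```

A call such as `parse_rpm_name("psmpi2-5.0.0-1.i586.rpm")` separates the package name from version, build number and architecture, which can help when checking that all nodes received matching builds.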

Page 12: ... are built on. While compiling the package, support for Infiniband will be included if one of the following files was found:
File                                   Version
/usr/mellanox/include/vapi/evapi.h     Mellanox
/usr/include/infiniband/verbs.h        OpenFabrics
/usr/local/ofed/include/verbs.h        OpenFabrics (Voltaire)
Table 3.1. Supported Infiniband implementations
To enable Myrinet/GM, the environment variable GM_HOME must be set. To generate ...

Page 13: ...crease performance and to minimize latency, it's highly recommended. Using the provided drivers does not influence other network communication. While installing the ParaStation management RPM, the file /etc/xinetd.d/psidstarter is installed. This enables remote startup of ParaStation daemons using the xinetd(8). The xinetd daemon will be triggered to read this file by executing: /etc/init.d/xinetd reload. Refe...

Page 14: ...ble which are built using different compilers, like the PGI or Intel compilers on the Intel IA32 platform, the Intel compiler on the IA64 platform and the PGI, Intel and Pathscale compilers on the X86_64 platform. These packages, of course, will depend on the corresponding compilers to be installed. Keep in mind that the use of these compilers might require further licenses. After downloading the correct MPI pac...

Page 15: ...esting. These steps will be discussed in Chapter 4, Configuration. 3.7. Uninstalling ParaStation5: After stopping the ParaStation daemons, the corresponding packages can be removed using: /etc/init.d/parastation stop; rpm -e psmgmt pscom psdoc psmpi2 on all nodes of the cluster. ...

Page 16: ...12 ParaStation5 Administrator's Guide ...

Page 17: ...sid(8). Most of these parameters are set to their default value within lines marked as comments. Only those that have to be modified in order to adapt ParaStation to the local environment are enabled. Additionally, all parameters are exemplified using comments. A more detailed description of all the parameters can be found in the parastation.conf(5) manual page. The template file is a good starting point ...

Page 18: ...starter and accounter may be ignored for now. For a detailed description of these parameters, refer to the parastation.conf(5) manual page. Usually the nodes will be listed in order of increasing ParaStation IDs, beginning with 0 for the first node. If a front-end node exists and should be integrated into the ParaStation system, it usually should be configured with ID 0. Within an Ethernet cl...

Page 19: ...To reload the new version of the network drivers, it is necessary to reboot the system. 4.3. Testing the installation: After installing and configuring ParaStation on each node of the cluster, the ParaStation daemons can be started up. These daemons will set up all necessary communication relations and thus will form the virtual cluster consisting of the available nodes. The ParaStation daemons are starte...

Page 20: ...nging up all nodes, the communication can be tested using: /opt/parastation/bin/test_nodes -np nodes, where nodes has to be replaced by the actual number of nodes within the cluster. After a while, a result like
Master node 0
Process 0-31 to 0-31 ( node 0-31 to 0-31 ) OK
All connections ok
PSIlogger: done
should be reported. Of course, the number 31 will be replaced by the actual number of nodes given on the ...
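The check described above can be scripted. The following is a minimal Python sketch that looks for the success line quoted in the manual; it assumes the phrase "All connections ok" appears verbatim on one output line and that the node count is passed as `-np`. The helper names and the timeout value are illustrative, not part of ParaStation.

```python
import subprocess
from typing import List

# Success phrase as quoted in the manual's sample output (assumed verbatim).
SUCCESS_LINE = "All connections ok"


def connections_ok(output: str) -> bool:
    """Return True if the captured test_nodes output reports success."""
    return any(SUCCESS_LINE in line for line in output.splitlines())


def build_command(num_nodes: int,
                  binary: str = "/opt/parastation/bin/test_nodes") -> List[str]:
    """Build the test_nodes command line for the given number of nodes."""
    return [binary, "-np", str(num_nodes)]


def run_test_nodes(num_nodes: int, timeout_s: float = 120.0) -> bool:
    """Run test_nodes and report success; a timeout counts as failure.

    The timeout is a guard chosen for this sketch, so that a script does
    not wait forever if some connection never comes up.
    """
    try:
        result = subprocess.run(build_command(num_nodes), capture_output=True,
                                text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return connections_ok(result.stdout)
```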

Page 21: ...nsfer data across various networks like Infiniband or 10G Ethernet using a vendor-provided libdapl. QsNet: The libpscom supports the QsNetII transport layer. Using the libpscom4elan plug-in, it may transfer data using the libelan. The interconnect and protocol used between two distinct processes is chosen while opening the connection between those processes. Depending on available hardware, configuration...

Page 22: ...onnections. polling: returns the current value for the polling flag: 0 = never poll, 1 = poll if otherwise idle (number of runnable processes < number of CPUs), 2 = always poll. Writing this value will immediately change the polling strategy. recv_net_ack: number of received ACKs. recv_net_ctrl: number of received control packets (ACK, NACK, SYN, SYNACK). recv_net_data: number of received data packets. recv_net_nack: number of...

Page 23: ..._idx 0 refs 10 Socket 2 Addr 70 6f 72 74 31 port144 last_idx 0 refs 10
/opt/parastation/bin/p4stat -n
net_idx SSeqNo SWindow RSeqNo RWindow lusridx lnetidx rnetidx snq rnq refs
84      30107  30467   30109  30468   84      84      230     0   0   2
85      30106  30466   30106  30465   85      85      231     0   0   2
86      30107  30467   30109  30468   86      86      84      0   0   2
87      30106  30466   30106  30465   87      87      85      0   0   2
88      30107  30467   30109  30468   88      88      217     0   0   2
89      30106  304...

Page 24: ...escribed procedure will be circumvented and the processes will be run on the user-defined nodes. For a detailed discussion of placing processes within ParaStation5, please refer to process_placement(7), ps_environment(5), pssh(8) and mpiexec(8). 5.4. Using the ParaStation5 queuing facility: ParaStation is able to queue task start requests if required resources are currently in use. This queuing facility is di...

Page 25: ... -n 1. To run an administrative task, use pssh or mpiexec -A -n 1. For more details on how to start up serial and parallel jobs, refer to mpiexec(8), pssh(8) and the ParaStation5 User's Guide. 5.7. ParaStation5 TCP bypass: ParaStation5 offers a feature called TCP bypass, enabling applications based on TCP to use the efficient p4sock protocol. The data will be redirected within the kernel to the p4sock protocol. N...

Page 26: ...ion/lib64/libpscomopenib.so. This variable is automatically exported to all processes started by ParaStation. Refer to Section 5.1, ParaStation5 pscom communication library, for a full list of available library variants. If more than one path for a particular interconnect exists, e.g. if the nodes are connected by two Gigabit Ethernet networks in parallel, it is desirable to pretend the interface and there...

Page 27: ... in parallel. The output of the individual commands is presented in a sophisticated manner, showing common parts and differences. psh may also be used to copy files to all nodes of the cluster in parallel. This command is not intended to run interactive commands in parallel, but to run a single task in parallel on all or a bunch of nodes and prepare the output to be easily read by the user. 5.13. Nodes a...

Page 28: ...cript tok2env: #!/bin/bash tmp IFS IFS export AFS_TOKEN GetToken uuencode /dev/stdout IFS tmp Script env2tok: #!/bin/bash IFS echo AFS_TOKEN uudecode SetToken exec 5.15. Integrating external queuing systems: ParaStation can be easily integrated with batch queuing and scheduling systems. In this case, the queuing system will decide where and when to run a parallel task. ParaStation will then start, monitor and te...

Page 29: ...tion with PBS PRO. 5.15.4. Integration with LSF: Similar to Section 5.15.1, Integration with PBS PRO, ParaStation will also recognize the variable LSB_HOSTS provided by LSF. This variable holds a list of nodes for the parallel task. It is copied to the ParaStation variable PSI_HOSTS; consequently, it will be used for starting up the task. The environment variable PSI_NODES_SORT is set to none, thus no sortin...
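The LSF mapping described above amounts to a small environment transformation. The following Python sketch illustrates it under stated assumptions: the function name is hypothetical, and overwriting an already present PSI_HOSTS is a choice made for this sketch, since the excerpt does not say which value takes precedence.

```python
from typing import Dict, Mapping


def apply_lsf_hosts(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with the LSF node list handed to ParaStation.

    If LSB_HOSTS is set, its value is copied to PSI_HOSTS and
    PSI_NODES_SORT is set to "none" so the node order is kept as given.
    Without LSB_HOSTS the environment is returned unchanged.
    """
    result = dict(env)
    hosts = env.get("LSB_HOSTS")
    if hosts:
        # Assumption of this sketch: an existing PSI_HOSTS is overwritten.
        result["PSI_HOSTS"] = hosts
        result["PSI_NODES_SORT"] = "none"
    return result
```

For example, an environment containing `LSB_HOSTS="node1 node1 node2"` yields the same list in `PSI_HOSTS`, with sorting disabled.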

Page 30: ...e conf or /etc/sysconfig/networks/routes, depending on the type of Linux distribution in use. 5.17. Copying files in parallel: To copy large files to many or all nodes in a cluster at once, pscp is very handy. It overlaps storing data to disk and transferring data on the network, therefore it scales very well with respect to the number of nodes. Files of arbitrary size may be copied, even archives containing...

Page 31: ...s on NUMA-based systems. This will give hints to the memory management subsystem of the operating system to select the nearest memory, if available. Memory binding may be enabled or disabled globally or on a per-node basis. Refer to the bindMem entry in parastation.conf and the set bindmem directive of psiadmin for details. See also parastation.conf(5) and psiadmin(1) for more information. 5.21. Spawning processes...

Page 32: ...d psidstarter to reflect the newly assigned port numbers. In addition, the ParaStation daemon psid(8) uses the UDP port 886 for RDP connections. To change this port, use the RDPPort directive within parastation.conf. See parastation.conf(5) for details. The port numbers must be identical on all cluster nodes. Restart xinetd and psid on all nodes to activate the modifications. ...

Page 33: ...ms to be ok up to now, check for recent entries within the log file /var/log/messages. Be aware that the log facility can be modified using the LogDestination within the config file parastation.conf. Look for lines like
Mar 24 17:19:12 pan psid[7361]: Starting ParaStation DAEMON
Mar 24 17:19:12 pan psid[7361]: Protocol Version 329
Mar 24 17:19:12 pan psid[7361]: (c) Cluster Competence Center GmbH
These lines indic...

Page 34: ...nodes. Verify that the program is executable on all nodes. 6.4. Problem: bad performance: Verify that the proper interconnect and/or transport is used: check for environment variables controlling transport, see Section 5.8, Controlling ParaStation5 communication paths, and ps_environment(5). Watch protocol counters, e.g. counters indicating timeouts, retries, errors or other bad conditions. For p4sock, check recv_...

Page 35: ... /tmp/username is accessible on each node, or change your current directory to a globally accessible directory. 6.8. Problem: pssh fails: Problem: users other than root cannot run commands on remote nodes using the pssh command:
pssh -n 0 date
PSI: dospawn: spawn to node 0 failed: Permission denied
By default, only root may spawn processes which are not consuming CPUs. The command pssh uses this way to run a pr...

Page 36: ...cesses cannot access files on remote nodes. Problem: processes created by ParaStation on remote nodes are not able to access files if these files are accessible only to a supplementary group the current user belongs to. By default, only the primary group is set for newly created processes. To add all groups to a process, set the supplGrps flag within parastation.conf or use the supplementaryGroups ...

Page 37: ...ator's Guide 33 Reference Pages: This appendix lists all reference pages related to ParaStation5 administration tasks. For reference pages describing user-related commands and information, refer to the ParaStation5 User's Guide. ...


Page 39: ...figuration file template parastation.conf.tmpl contained in the distributed ParaStation system. The template file can be found in /opt/parastation/config. Parameters: The different parameters are discussed in the order they should appear within the configuration file. Dependencies between parameters resulting in a defined order of parameters are marked explicitly. Some parameters may be modified using ...

Page 40: ... communication hardware. This is mainly used in order to generate the lines shown by the status counter directive of the ParaStation administration tool psiadmin(1). headerscript: Define a script called in order to get a header line for the status message produced by the above discussed statusscript. All further parameters defined within a Hardware section are interpreted as environment variables when ...

Page 41: ...recognized. gm: Use communication over GM (Myrinet). The script ps_gm will load the Myrinet gm driver. PS_IPENABLED: If set to 1, the IP device myri0 is enabled after loading. elan: Use communication over QsNet (libelan). No script is currently implemented for this communication protocol, therefore no environment variables are recognized. This communication layer is currently not supported by the ParaStation com...

Page 42: ...The default value of HWType is none. starter { true | yes | 1 | false | no | 0 }: If the argument is one of yes, true or 1, all nodes declared within a Node statement will be allowed to start parallel tasks, unless otherwise stated. If the argument is one of no, false or 0, starting will not be allowed. It might be useful to prohibit the startup of parallel tasks from the frontend machine if a batch system is used. This will f...
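The on/off parameters of parastation.conf accept three spellings for each truth value. As an illustration of that rule, here is a minimal Python sketch of a value parser; the function name is hypothetical, and matching the spellings exactly in lowercase is an assumption, since the excerpt does not state whether the real parser is case-sensitive.

```python
# Accepted spellings as listed in the manual for flags such as starter.
_TRUE_VALUES = {"true", "yes", "1"}
_FALSE_VALUES = {"false", "no", "0"}


def parse_flag(value: str) -> bool:
    """Map a parastation.conf style on/off value to a Python bool.

    Raises ValueError for anything outside the six documented spellings.
    """
    token = value.strip()
    if token in _TRUE_VALUES:
        return True
    if token in _FALSE_VALUES:
        return False
    raise ValueError(f"not a valid flag value: {value!r}")
```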

Page 43: ... as the stand-alone commands to set the corresponding default value. E.g. the line: Node node17 16 HWType ethernet p4sock starter yes runJobs no will define the node node17 to have the ParaStation ID 16. Furthermore, it is expected to have an Ethernet communication using both TCP and p4sock protocols. It is allowed to start parallel tasks from this node, but the node itself will not run any process of any...

Page 44: ...n logs a huge amount of messages in the logging destination, which is usually the syslog(3). This parameter can be set during runtime via the set psiddebug directive within the ParaStation administration and management tool psiadmin(1). LogDest { LOG_DAEMON | LOG_KERN | LOG_LOCAL[0-7] }, LogDestination { LOG_DAEMON | LOG_KERN | LOG_LOCAL[0-7] }: Set the logging output's destination for the ParaStation daemon psid(8). Usually...

Page 45: ... number, the string infinity or the string unlimited. In the two latter cases the data size is set to RLIM_INFINITY. DataSize size: Set the maximum data size to size kilobytes. size is an integer number, the string infinity or the string unlimited. In the two latter cases the data size is set to RLIM_INFINITY. MemLock size: Set the maximum amount of memory that might be locked into RAM to size kilobytes. si...
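The size argument of these limit parameters follows one rule: an integer number of kilobytes, or one of two keywords meaning no limit. The following Python sketch shows that rule using the standard `resource` module (Unix only). The function name and the conversion to bytes are choices of this sketch, not a description of the daemon's internals.

```python
import resource


def parse_rlimit_kb(value: str) -> int:
    """Convert a size argument (kilobytes or a keyword) to a byte limit.

    "infinity" and "unlimited" map to RLIM_INFINITY; any other value must
    be a non-negative integer number of kilobytes.
    """
    token = value.strip()
    if token in ("infinity", "unlimited"):
        return resource.RLIM_INFINITY
    kilobytes = int(token)  # raises ValueError for non-numeric input
    if kilobytes < 0:
        raise ValueError(f"size must not be negative: {value!r}")
    return kilobytes * 1024
```

The returned value has the form that `resource.setrlimit` expects for byte-based limits such as `RLIMIT_DATA`.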

Page 46: ...up to 4.0.6 will be enabled. Keep in mind that this behavior might collide with the freeOnSuspend feature. If the argument is one of no, false or 0, ParaStation will disable compatibility mode. UseMCast { true | yes | 1 | false | no | 0 }: If the argument is one of yes, true or 1, keep-alive messages from the ParaStation daemon psid(8) are sent using Multicast messages. If the argument is one of no, false or 0, ParaStation...

Page 47: ...those CPU slots and physical CPUs and cores is made using a mapping list. See CPUmap below. The pinProcs parameter can be set during runtime via the set pinprocs directive within the ParaStation administration and management tool psiadmin(1). bindMem { true | yes | 1 | false | no | 0 }: This parameter must be set to true if nodes providing non-uniform memory access (NUMA) should use local memory for the tasks. This par...

Page 48: ...steers the actual load introduced by RDP. Within the daemon there is a lower limit for all timeout timers of 100 msec. Thus the minimal value here is 100, too. deadLimit number: Dead limit of the RDP status module. After this number of consecutively missing RDP pings, the master declares the node to be dead. Only relevant if MCast is not used. statusTimeout ms: Timeout of the RDP status module. After this nu...
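The deadLimit rule can be pictured as a counter of consecutive missed pings. The Python sketch below is an illustration of that idea only and not the daemon's implementation: the class and method names are hypothetical, a node is treated as dead once the counter reaches the limit, and a received ping simply resets the counter.

```python
class PingTracker:
    """Track consecutive missed status pings for one node (illustrative)."""

    def __init__(self, dead_limit: int) -> None:
        if dead_limit < 1:
            raise ValueError("dead_limit must be at least 1")
        self.dead_limit = dead_limit
        self.missed = 0

    def ping_received(self) -> None:
        """A ping arrived in time: the run of misses is broken."""
        self.missed = 0

    def ping_missed(self) -> bool:
        """Record one missed ping; return True if the node now counts as dead."""
        self.missed += 1
        return self.is_dead

    @property
    def is_dead(self) -> bool:
        # Assumption: "dead" means the limit has been reached, not exceeded.
        return self.missed >= self.dead_limit
```

With `dead_limit=3`, two misses followed by a received ping leave the node alive; only three misses in a row mark it dead.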

Page 49: ...de 45 ACK is sent piggyback within the next regular packet to this node or as soon as a retransmission occurred. If set to 1, each RDP packet received is acknowledged by an explicit ACK. Errors: No known errors. See also: psid(8), psiadmin(1) ...


Page 51: ...g down single nodes or the whole system requires root privilege. Options: -c, --command=command: Execute the single directive command and exit. -d: Do not automatically start up the local psid(8). -e, --echo: Echo each executed directive to stdout. -f, --file=program-file: Read commands from the file program-file. Exit as soon as EOF is reached. It might be useful to enable echoing (-e) when acting on a script file. This opti...

Page 52: ...h. Comments begin with the # character and continue to the end of the line. Comments and blank lines are ignored by psiadmin. Upon startup, psiadmin tries to find the file .psiadminrc, first in the current directory and then in the user's home directory. Only the first one found is really considered. Each directive found within this file is handled silently before going either into interactive or batch mode usi...
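The startup file lookup described above is a first-match search over two directories. The Python sketch below mirrors that order; it is illustrative only. The function name is hypothetical, the default file name `.psiadminrc` (with a leading dot) is an assumption, and the `exists` parameter is there solely so the logic can be exercised without touching the file system.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def first_startup_file(directories: Iterable[Path],
                       name: str = ".psiadminrc",
                       exists: Callable[[Path], bool] = Path.is_file
                       ) -> Optional[Path]:
    """Return the first directory/name candidate that exists, else None.

    Only the first match is used; later directories are not consulted.
    """
    for directory in directories:
        candidate = Path(directory) / name
        if exists(candidate):
            return candidate
    return None
```

Calling `first_startup_file([Path.cwd(), Path.home()])` reproduces the documented order: current directory first, then the home directory.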

Page 53: ...dware load mcast memory node proc cnt count rdp summary max max up version nodes list jobs state running state pending state suspended slots tid: Report various states of the selected node(s) or job(s). Depending on the given argument, different information can be requested from the ParaStation system. If no argument is given, the node information is retrieved. all: Show the information given by node, count...

Page 54: ...s follows: The total number of processes contains all processes managed by the ParaStation system, including Logger, Forwarder and psiadmin(1) processes. Furthermore, of course, the actual working processes started by the users are included. The latter ones are the normal processes, additionally displayed in the last column of the output. mcast: List the status of the MCast facility of the ParaStation daemon...

Page 55: ... e root processes of parallel tasks which converted to a ParaStation Logger process are tagged with L after the user ID System processes which are not counted are marked as Accounting processes are indicated by C Other helper processes are marked with S jobs state running state pending state suspended slots tid Show all or selected jobs managed by the ParaStation system If selected only jobs with ...

Page 56: ...ap bindmem adminuser admingroup rl_addressspace rl_core rl_cpu rl_data rl_fsize rl_locks rl_memlock rl_msgqueue rl_nofile rl_nproc rl_rss rl_sigpending rl_stack supplementaryGroups statusBroadcasts rdpTimeout deadLimit statusTimeout rdpClosedTimeout rdpResendTimeout rdpMaxACKPend nodes: Show various parameters of the ParaStation system. accounters nodes: Show information on which node(s) ParaStation a...

Page 57: ...me the job continues to run; this is the behavior as long as the flag has the value 0. Since the master node does all the resource management within the cluster, only the value on this node actually steers the behavior. handleOldBins nodes: Show the compatibility flag for applications linked against version 4.0.x of ParaStation on the selected nodes. nodesSort nodes: Show the default sorting strategy use...

Page 58: ... unaccounted tasks.
rl_addressspace nodes: Show RLIMIT_AS on this node.
rl_core nodes: Show RLIMIT_CORE on this node.
rl_cpu nodes: Show RLIMIT_CPU on this node.
rl_data nodes: Show RLIMIT_DATA on this node.
rl_fsize nodes: Show RLIMIT_FSIZE on this node.
rl_locks nodes: Show RLIMIT_LOCKS on this node.
rl_memlock nodes: Show RLIMIT_MEMLOCK on this node.
rl_msgqueue nodes: Show RLIMIT_MSGQUEUE on this node.
rl_nofi...

Page 59: ...hin the RDP facility in milliseconds. See also parastation.conf(5). rdpResendTimeout nodes: Show the resend timeout within the RDP facility in milliseconds. See also parastation.conf(5). rdpMaxACKPend nodes: Show the maximum ACK pending counter within the RDP facility. See also parastation.conf(5). sleep sec: Sleep for sec seconds before continuing to parse the input. version: Print various version numbers. Pr...

Page 60: ...e node. In principle, nodes might contain an unlimited number of ranges. If the value of nodes is all, all nodes of the ParaStation cluster are selected. If nodes is empty, the node range preselected via the range command is used. The default preselected node range contains all nodes of the ParaStation cluster. As an extension, nodes might also be a hostname that can be resolved into a valid ParaStation ID. reset ...

Page 61: ... user name or to any user. If name is preceded by a + or -, this user is added to or removed from the list of users, respectively. group { name | any } nodes: Grant exclusive access on the selected node(s) to the special group name or to any group. If name is preceded by a + or -, this group is added to or removed from the list of groups, respectively. maxproc { num | any } nodes: Limit the number of running ParaStation proce...

Page 62: ...010000 PSID_LOG_COMM General daemon communication
0x0020000 PSID_LOG_OPTION Option handling
0x0040000 PSID_LOG_INFO Handling of info request messages
0x0080000 PSID_LOG_PART Partition creation and management
0x0100000 PSID_LOG_ECHO Echo each line to parse
0x0200000 PSID_LOG_FILE Logs concerning the file to parse
0x0400000 PSID_LOG_CMNT Comment handling
0x0800000 PSID_LOG_NODE Info concerning each ...

Page 63: ...02 MCAST_LOG_INTR Interrupted syscalls
0x0004 MCAST_LOG_CONN T_CLOSE and new pings
0x0008 MCAST_LOG_5MIS Every 5th missing ping
0x0010 MCAST_LOG_MSNG Every missing ping
0x0020 MCAST_LOG_MSNG Every received ping
0x0040 MCAST_LOG_SENT Every sent ping
Table 4. Multicast debug flags
freeOnSuspend { 0 | 1 } nodes: Switch the freeOnSuspend flag on or off on the selected nodes. The freeOnSuspend flag steers the b...

Page 64: ...is only comes into play if the user does not define a sorting strategy explicitly via PSI_NODES_SORT. Be aware of the fact that using a batch system like PBS or LSF will set the strategy explicitly, namely to NONE. overbook { 0 | 1 } nodes: Define if these nodes shall be overbooked upon user request (if flag is true) or if overbooking should be denied at all (false). starter { 0 | 1 } nodes: Define if starting jobs fr...

Page 65: ...See also parastation.conf(5). rdpTimeout ms nodes: Set the RDP timeout in ms for all selected nodes. See also parastation.conf(5). deadLimit num nodes: Set the dead limit of the RDP status module. After this number of consecutively missing RDP pings, the master declares the node to be dead. Only relevant if MCast is not used. See also parastation.conf(5). statusTimeout ms nodes: Set the Timeout of the RDP statu...

Page 66: ...runtime. Files: Upon startup, psiadmin tries to find .psiadminrc in the current directory or in the user's home directory. The first file found is parsed and the directives within are executed. Afterwards, psiadmin goes into interactive mode unless the -f option is used. This file might be used to set some default ranges whenever psiadmin is invoked. The startup file is ignored if the option -c is used. Errors: No kn...

Page 67: ... must always run with root privileges. Before a process can communicate with the ParaStation system, it has to register with the daemon. Access may be granted or denied. The daemon can deny the access due to several reasons: the ParaStation system library of the process and the ParaStation daemon are incompatible; the daemon is in a state where it does not accept new connections; insufficient resources; t...

Page 68: ...debug command of psiadmin(1). Be aware of the fact that high values of level lead to excessive debugging output, spoiling the syslog(3) or the logfile. -f, --configfile=file: Choose file to be the ParaStation configuration file. The default is to use /etc/parastation.conf. -l, --logfile=file: Choose file to be the destination for logging output. file may be the name of an ordinary file or stdin or stdout. The ...

Page 69: ... filename. Description: test_config reads and analyses the ParaStation4 configuration file. Any errors or anomalies are reported. By default, the configuration file /etc/parastation.conf will be used. Options: -f filename: Use configuration file filename. -d num: Set debug level to num. -v: Output version information and exit. -h, --usage: Show a help message. ...


Page 71: ...ry node has received data from any node, i.e. an all-to-all communication was executed, a success message is printed and test_nodes exits. Otherwise, after a certain timeout, a message concerning the current status of the tested connection is posted. test_nodes will run as long as any connection between two tested nodes is unable to transport the test packets. Options: -np num: Run the testing program on ...


Page 73: ...n a cluster. Synopsis: test_pse -np num. Description: This command spawns num processes within the cluster. It's intended to test the process spawning capabilities of ParaStation. It does not test any communication facilities within ParaStation. Options: -np num: Spawn num processes. See also: psid(8) ...


Page 75: ... Display information for sockets and network connections using the ParaStation4 protocol p4sock. Options: -s, --sock: Display information about open p4sock sockets. -n, --net: Display information of network connections using p4sock. -v, --version: Output version information and exit. --help: Show a help message. --usage: Display a brief usage message. See also: p4tcp(8), parastation.conf(5) ...


Page 77: ...lib64 must be pre-loaded by both processes using: export LD_PRELOAD=/opt/parastation/lib64/libp4tcp.so. For parallel and serial tasks launched by ParaStation, this environment variable is exported to all processes by default. Please refer to ps_environment(5). Options: -a, --add: Add an address or an address range to the list of redirected addresses. New TCP connections directed to a node within this address ra...


Page 79: ...n. -d, --debug=flag: Print debug information. Pattern can be a combination of the following bits:
Pattern  Description
0x010    More warning messages
0x020    Show process information (start, exit)
0x040    Show received messages
0x080    Very verbose output
0x100    Show node information on startup
Table 5. Psaccounter debug flags
As the accounter is typically not run directly, but started by the psid(8), the start script opt ...
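Because the debug pattern is a combination of bits, a value such as 0x0a0 stands for several table rows at once. The following Python sketch decodes a pattern against the five bits listed in the psaccounter table; the dictionary and function names are illustrative, and the descriptions are copied from the table rather than taken from the program itself.

```python
from typing import List

# Bit values and descriptions as listed in the psaccounter debug flag table.
PSACCOUNTER_DEBUG_BITS = {
    0x010: "More warning messages",
    0x020: "Show process information (start, exit)",
    0x040: "Show received messages",
    0x080: "Very verbose output",
    0x100: "Show node information on startup",
}


def describe_debug_mask(mask: int) -> List[str]:
    """List the descriptions of all documented bits set in mask."""
    return [text for bit, text in PSACCOUNTER_DEBUG_BITS.items() if mask & bit]


def unknown_bits(mask: int) -> int:
    """Return the bits of mask that are not in the documented table."""
    known = 0
    for bit in PSACCOUNTER_DEBUG_BITS:
        known |= bit
    return mask & ~known
```

A pattern is built by OR-ing the wanted bits, e.g. `0x010 | 0x080` for extra warnings plus very verbose output.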

Page 80: ...re Define that a core file should be written in case of a catastrophe. By default, the core file will be written to /tmp. --coredir=dir: Defines where to save core files. -v, --version: Output version information and exit. --help: Show this help message. --usage: Display brief usage message. Files: /var/account/yyyymmdd: Accounting files, one per day. See also: psid(8), psaccview(8) and parastation.conf ...

Page 81: ...ing output. -h, --human: Print times and timestamps in more human readable form. -nh, --noheader: Suppress headers. -st, --stotopt=optstring: Defines columns displayed within the user list, group list and the total summary list. Valid entries are user, group, walltime, qtime, mem, vmem, cputime, jobs, cpuweight, aqtime and usage. -sj, --sjobopt=optstring: Defines columns displayed within the detailed job list. Valid entries are user...

Page 82: ...he job list is sorted by. Valid entries are user, group, jobid, jobname, start, end, walltime, qtime, mem, vmem, cputime, queue, procs and exit. --usort=criteria: Selects the criteria by which the user list is sorted. Valid entries are user, jobs, walltime, qtime, mem, vmem, cputime, procs and cpuweight. --gsort=criteria: Selects the criteria by which the group list is sorted. Valid entries are group, jobs, walltime, qtime, mem, vm...

Page 83: ...group (group list) or as a total summary of all jobs. Multiple lists can be selected; by default, all information is shown. Lists may be sorted by columns and may be filtered to only show information about a particular user, group, queue, jobname or job exit code. The columns to be printed may be defined using formatting options. Available column names are: aqtime: Average queue time, only for total summary. cpu...

Page 84: ...strator's Guide. These column names may also be used for sorting lists, where applicable. Files: /var/account/*, /var/account/*.gz, /var/account/*.bz2: Accounting files, one per day. $HOME/.psaccviewrc: Initialization file. See also: psaccounter(8) ...

Page 85: ...spinning bars. Instead, a detailed message about each received multicast ping is displayed. -m, --mcast=MCAST: Listen to multicast group MCAST. Set this to the value of MCastGroup in the ParaStation configuration file parastation.conf(5). The default is 237, which is also the default within psid(8). -p, --port=PORT: Listen to UDP port PORT. Set this to the value of MCastPort in the ParaStation configuration file para...


Page 87: ...om-5.0.0-0.i586.rpm
rpm -U pscom-modules-5.0.0-0.i586.rpm
rpmbuild --rebuild psmpi2-5.0.0-1.src.rpm
rpm -U psmpi2-5.0.0-1.i586.rpm
The psmgmt package must be installed before the pscom package may be built; the same applies to pscom and psmpi2. If you only want to rebuild the kernel modules for the p4sock protocol, use: rpmbuild --rebuild --with modules pscom-5.0.0-0.src.rpm. This will render a RPM package with the Pa...

Page 88: ...tation/bin/psiadmin psiadmin> add. Alternatively, you can start psiadmin(1) with the -s option. To install the ParaStation daemon as a system service started up at boot time, use: chk_config -a /etc/init.d/parastation. This step must be repeated for each node. 7. Testing: A brief test of the entire communication and management system can be accomplished by using the test_nodes(1) command. For a detailed descripti...

Page 89: ...wever the opportunity to use the software according to this license one time for a limited period of three (3) months. It is acknowledged that ParTec has invested a massive amount of labour and financial means into the development of the software. It is therefore requested from each licensee to return the results of their studies, amendments and enhancements free of charge to ParTec in return for the ...

Page 90: ...iality obligation which complies with this agreement. 3. Furthermore, Licensee promises not to publish the Software as object code or as source code, nor the corresponding comments, either totally or in part, on his own publications or other documentation. Any functional description of Licensee's Modifications, in particular source code of Modifications which shows Know-how such as the structure of the So...

Page 91: ... indirect or subsequent damages due to errors of the licensed Software. 2. ParTec is not aware of any rights of third parties which would oppose University Use or Commercial Use. ParTec is not liable, however, for the licensed Software and the licensed Know-how being free of rights of third parties. 3. If Licensee is accused by third parties of infringing intellectual property rights due to the use of th...

Page 92: ...International Sale of Goods (CISG) and International Private Law. Attachment I: Declaration of Origin. Material covered by this certificate (version, release, etc.): _________________________________________________ Was any portion of the software material written by anyone other than you or your employees within the scope of their employment? YES / NO. Was any portion of the software material (e.g. Code, associate...

Page 93: ...w features like process pinning should be used, adjust the existing configuration file. Look for pinProcs, CPUmap, bindMem, supplGrps and RLimit Core entries in the new template file parastation.conf.tmpl, copy them to the current configuration file and adjust them to your needs. The configuration file of ParaStation5, located in /etc, is no longer a symbolic link to /opt/parastation/config/parastation.conf ...

Page 94: ...araStation4 can be run using the new mpiexec command. In this case, the option -b or --bnr is required. The environment variable PSP_P4SOCK was renamed to PSP_P4S, but is still recognized. Within this version of ParaStation, both names may be used. Likewise, the environment variable PSP_SHAREDMEM was renamed to PSP_SHM, but is also still recognized. ...
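Scripts that inspect these settings need to accept both spellings as well. The following Python sketch looks up a variable by its new name and falls back to the old one. The function name is hypothetical, and letting the new name win when both are set is an assumption of this sketch; the excerpt does not state the precedence ParaStation itself applies.

```python
import os
from typing import Mapping, Optional

# New name -> old name, as given in the upgrade notes.
RENAMED_VARIABLES = {
    "PSP_P4S": "PSP_P4SOCK",
    "PSP_SHM": "PSP_SHAREDMEM",
}


def lookup_setting(name: str,
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value for name, falling back to its pre-rename spelling.

    Assumption: if both spellings are set, the new name takes precedence.
    """
    if name in env:
        return env[name]
    old_name = RENAMED_VARIABLES.get(name)
    if old_name is not None:
        return env.get(old_name)
    return None
```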

Page 95: ...g MPI. This task will not be accounted within the ParaStation process management, i.e. it will not allocate a dedicated CPU. Thus, administration tasks may be started in addition to parallel tasks. See also Serial Task for tasks accounted with ParaStation. admin-task: See Administrative Task. ARP: See Address Resolution Protocol. Data Network: The data network is used for exchanging data between the compute pr...

Page 96: ... different memory addresses may vary. Parallel Task: A bunch of processes distributed within the cluster forming an instance of a parallel application. E.g. an MPI program running on several nodes of a cluster can only act as a whole, but consists of individual processes on each node. ParaStation knows about their relationship and can handle them as a distributed parallel task running on the cluster. Some...

Page 97: ...he compute nodes within the cluster. This process does not communicate with other processes using MPI. ParaStation knows about this process and where it is started from. A serial task may use multiple threads to execute, but all these threads have to share a common address space within a node. ...

