ParaStation5 Administrator's Guide

Appendix A. Quick Installation Guide

This appendix gives a brief overview of how to install ParaStation5 on a cluster. A detailed description can be
found in Chapter 3, Installation, and Chapter 4, Configuration.

1. Shutdown

If this is an update of ParaStation, first shut down the ParaStation system. To do this, start
psiadmin and issue a shutdown command.

  # /opt/parastation/bin/psiadmin

  psiadmin> shutdown

This will terminate all currently running tasks controlled by ParaStation, including psiadmin.
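
For scripted updates, the same shutdown can be issued without an interactive session. The following is a minimal sketch, assuming the -c option of psiadmin (described in the psiadmin(1) reference page as executing a single directive and exiting). The function name ps_shutdown and the variables PSIADMIN and DRY_RUN are illustrative and not part of ParaStation.

```shell
#!/bin/sh
# Hypothetical helper: shut down ParaStation before an update.
# PSIADMIN and DRY_RUN are illustrative names, not ParaStation settings.
PSIADMIN="${PSIADMIN:-/opt/parastation/bin/psiadmin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command (default), 0 = run it

ps_shutdown() {
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed without touching the system.
        echo "$PSIADMIN -c shutdown"
    elif [ -x "$PSIADMIN" ]; then
        # Assumes -c executes a single directive and exits.
        "$PSIADMIN" -c shutdown
    else
        # No psiadmin binary: presumably a fresh installation.
        echo "psiadmin not found at $PSIADMIN; nothing to shut down" >&2
    fi
}
```

By default the function only prints the command. Calling it as root with DRY_RUN=0 set beforehand would perform the actual shutdown.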

2. Get the installation packages

Get the necessary installation packages from the download section of the ParaStation homepage
www.parastation.com. Required packages are psmgmt, pscom and psmpi2. The documentation
package psdoc is optional.

If you want to compile the packages yourself, download the source packages (*.src.rpm) and rebuild them
using the rpmbuild command, e.g.:

  # rpmbuild --rebuild psmgmt.5.0.0-0.src.rpm

  # rpm -U psmgmt.5.0.0-0.i586.rpm

  # rpmbuild --rebuild pscom.5.0.0-0.src.rpm

  # rpm -U pscom.5.0.0-0.i586.rpm

  # rpm -U pscom-modules.5.0.0-0.i586.rpm

  # rpmbuild --rebuild psmpi2.5.0.0-1.src.rpm

  # rpm -U psmpi2.5.0.0-1.i586.rpm

The psmgmt package must be installed before the pscom package can be built; likewise, pscom must be
installed before psmpi2 can be built. If you only want to rebuild the kernel modules for the p4sock protocol, use

  # rpmbuild --rebuild --with modules pscom.5.0.0-0.src.rpm

This will produce an RPM package with the ParaStation kernel modules suitable for your setup.
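
The build order described in this step (psmgmt before pscom, pscom before psmpi2) can be sketched as a loop. This is an illustration only: the file names are taken from the example in this step, while RPMDIR, DRY_RUN and the helper functions are hypothetical. Where rpmbuild actually writes the binary packages depends on the local rpm configuration, so RPMDIR must be adjusted accordingly.

```shell
#!/bin/sh
# Sketch: rebuild and install the source packages in dependency order.
# RPMDIR and DRY_RUN are illustrative names, not ParaStation settings.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only (default), 0 = run them
RPMDIR="${RPMDIR:-.}"     # assumed location of the rebuilt binary RPMs

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

rebuild_all() {
    # Each package is installed before the next one is built.
    for pkg in psmgmt.5.0.0-0 pscom.5.0.0-0 psmpi2.5.0.0-1; do
        run rpmbuild --rebuild "$pkg.src.rpm" || return 1
        run rpm -U "$RPMDIR/$pkg.i586.rpm" || return 1
        case "$pkg" in
            # The pscom rebuild also yields the kernel module package.
            pscom.*) run rpm -U "$RPMDIR/pscom-modules.${pkg#pscom.}.i586.rpm" || return 1 ;;
        esac
    done
}
```

Stopping at the first failure matters here, because a later package cannot be built without its predecessor being installed.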

3. Install software on the server

Install the ParaStation distribution files on the server machine, if not yet done:

  # rpm -U psmgmt.5.0.0-0.i586.rpm pscom.5.0.0-0.i586.rpm \
    pscom-modules.5.0.0-0.i586.rpm psmpi2.5.0.0-1.i586.rpm \
    psdoc.5.0.0-0.noarch.rpm

4. Install software on the compute nodes

Repeat step 3 for each node. You may omit the documentation package.
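
On larger clusters, repeating step 3 by hand becomes tedious. The sketch below loops over the nodes with ssh. It rests on several assumptions that are not part of this guide: root ssh access to every node, the RPM files being reachable under PKGDIR on each node, and the host names in NODES, which are placeholders.

```shell
#!/bin/sh
# Sketch: repeat the step 3 installation on every compute node.
# NODES, PKGDIR and DRY_RUN are illustrative names; adjust to the local site.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only (default)
NODES="${NODES:-node01 node02}"         # hypothetical host names
PKGDIR="${PKGDIR:-/root/parastation}"   # hypothetical package directory
# Documentation package omitted, as permitted for compute nodes.
PKGS="psmgmt.5.0.0-0.i586.rpm pscom.5.0.0-0.i586.rpm pscom-modules.5.0.0-0.i586.rpm psmpi2.5.0.0-1.i586.rpm"

install_on_nodes() {
    for node in $NODES; do
        cmd="cd $PKGDIR && rpm -U $PKGS"
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$node $cmd"
        else
            # Report a failing node but continue with the remaining ones.
            ssh "root@$node" "$cmd" || echo "installation failed on $node" >&2
        fi
    done
}
```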

5. Configuration

Next, the configuration file parastation.conf has to be adapted to the local settings. The
template file /opt/parastation/config/parastation.conf.tmpl should be copied to
/etc/parastation.conf and adjusted to the local needs. The configuration can be verified using the
command test_config(1) located in /opt/parastation/bin.

This configuration file must be copied to all other nodes.
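
The configuration step can be outlined as a short script: seed the configuration from the template, verify it, and distribute it. This is a sketch under assumptions: NODES holds placeholder host names, scp with root access is an assumed distribution method, and test_config is called without arguments on the understanding that it reads /etc/parastation.conf by default.

```shell
#!/bin/sh
# Sketch of step 5: copy the template, verify it, distribute it.
# NODES and DRY_RUN are illustrative names, not ParaStation settings.
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only (default)
NODES="${NODES:-node01 node02}"   # hypothetical host names
TMPL=/opt/parastation/config/parastation.conf.tmpl
CONF=/etc/parastation.conf

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

configure() {
    # Keep an existing configuration; only seed it from the template once.
    [ -e "$CONF" ] || run cp "$TMPL" "$CONF" || return 1
    # The file must be edited for the local cluster before verification.
    run /opt/parastation/bin/test_config || return 1
    for node in $NODES; do
        run scp "$CONF" "root@$node:$CONF" || return 1
    done
}
```

Verifying before distributing keeps a faulty configuration from reaching the other nodes.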

6. Startup ParaStation

ParaStation5 Administrator's Guide, Release 5.0.5, published April 2010.
