Compilers & MPI Libraries

Intel® Compilers

To use the Intel compiler installations on POD, you will need a valid Intel license that can be accessed from inside the POD cluster. There are three ways for customers to access Intel licenses:

1. License File - Upload a valid license file into your $HOME/intel/licenses directory to enable immediate access.

2. FlexLM Server on POD - Penguin will host FlexLM servers to serve your license free of charge. If you wish to migrate your Intel license to POD and have it hosted on a FlexLM server, please contact the POD support team: pod@penguincomputing.com.

3. Remote FlexLM Server - Alternatively, the POD support team can help you establish a secure tunnel to access an existing FlexLM server in your organization. Please contact the POD support team to help establish a tunnel to your license server: pod@penguincomputing.com.

The different versions of Intel software distributions available on POD can be listed by running the module avail intel command. The example below shows a license file installed in the appropriate folder inside a user’s home directory and the versions currently available on the MT1 cluster.

$ ls $HOME/intel/licenses
mypod.lic

$ module -l avail intel
- Package -----------------------------+- Versions -+- Last mod. ------
/public/modulefiles:
intel/11.1.0                                         2013/04/25 13:19:03
intel/12.1.0                                         2013/12/13 18:37:52
intel/2015                                           2016/06/09 16:24:13
intel/2016                                           2016/10/06  1:17:13
intel/2018                                           2017/10/09 20:15:42
intel_aps/2018.beta                                  2017/05/05 19:05:31
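
As a minimal sketch of the license-file option, the helper below lists the license files found in a directory so that a job script can check for one before loading an Intel module. The function name list_intel_licenses is hypothetical, and the .lic extension is an assumption based on the mypod.lic example above.

```bash
# List Intel license files in a licenses directory (default: $HOME/intel/licenses).
# Assumption: license files end in .lic, as in the mypod.lic example.
list_intel_licenses() {
    dir="${1:-$HOME/intel/licenses}"
    for f in "$dir"/*.lic; do
        # When nothing matches, the glob stays literal, so confirm the file exists.
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Possible use in a job script (module name taken from the listing above):
#   if [ -n "$(list_intel_licenses)" ]; then module load intel/2018; fi
```

The check only confirms that a file is present. It does not validate the license itself, and it does not apply if your license is served by a FlexLM server.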

Portland Group PGI® Compilers

Please contact the POD admin team to enable access to the PGI compilers on the MT1 & MT2 clusters. Use the module avail pgi command to see the current list of available PGI distributions installed on POD. The example below shows the different versions currently available on the MT1 cluster.

$ module -l avail pgi
- Package -----------------------------+- Versions -+- Last mod. ------
/public/modulefiles:
pgi/10.9                                             2015/12/14 23:24:44
pgi/11.9                                             2015/12/14 23:24:18
pgi/13.6                                             2015/12/14 23:23:11
pgi/15.10                                            2016/11/08 19:51:22

Running MPI Jobs

Scaling up your jobs to use multiple CPU cores on multiple machines requires the ability to share data and messages between individual processes. A Message Passing Interface (MPI) library provides this functionality, but your application must be written and compiled against an MPI library to use it. Several library implementations and versions are provided by default on both POD clusters for use at both compile time and run time. Common implementations available on POD include OpenMPI, Platform MPI, and Intel MPI.

MPI Job Syntax

OpenMPI is an open-source implementation of an MPI library and is readily available on POD for your MPI-enabled jobs. Because OpenMPI is already optimized for POD, there is no need to specify a machine file or -np when using mpirun at runtime. By default, mpirun launches one MPI rank per core on the nodes provided by the scheduler. As with any other application on POD, you need to load the module for the specific implementation and version of MPI you intend to use. The following example launches 96 MPI ranks on 8 nodes (8 nodes x 12 cores per node) using OpenMPI 1.5.5.

#PBS -S /bin/bash
#PBS -l nodes=8:ppn=12
#PBS -l walltime=01:00:00

module load openmpi/1.5.5/gcc.4.4.7
mpirun /path/to/binary

exit $?
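
To see how the default rank count follows from the scheduler allocation, the sketch below counts ranks and nodes from a TORQUE node file, which lists one hostname per allocated core. The function names count_ranks and count_nodes are hypothetical helpers, not part of POD or OpenMPI.

```bash
# Count MPI ranks and distinct nodes from a TORQUE node file.
# The file holds one hostname per allocated core, so nodes=8:ppn=12
# produces 96 lines naming 8 distinct hosts.
count_ranks() {
    # Number of lines = number of allocated cores.
    wc -l < "$1" | tr -d ' '
}

count_nodes() {
    # Number of distinct hostnames = number of allocated nodes.
    sort -u "$1" | wc -l | tr -d ' '
}

# Possible use inside a job script:
#   echo "ranks=$(count_ranks "$PBS_NODEFILE") nodes=$(count_nodes "$PBS_NODEFILE")"
```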

System Memory/CPU Core

MPI-enabled jobs run multiple parallel ranks on a single node so they will need to share the available system resources including memory. Your application may require a specific amount of memory-per-core to run efficiently. Detailed below are the current memory-per-core capacities, available by default, for each of the queues on the POD MT1 and MT2 clusters.

Queue  Compute Node Architecture                  Memory  Cores  Mem/Core
S30    Dual Intel® Xeon® Gold 6148 (Skylake)      384 GB  40     9.6 GB/core
B30    Dual Intel® E5-2600v4 Series (Broadwell)   256 GB  28     9.1 GB/core
T30    Dual Intel® E5-2600v3 Series (Haswell)     128 GB  20     6.4 GB/core
M40    Dual Intel® X5600 Series (Westmere)        48 GB   12     4.0 GB/core
H30    Dual Intel® E5-2600 Series (Sandy Bridge)  64 GB   16     4.0 GB/core
H30G   Dual Intel® E5-2600 Series (Sandy Bridge)  64 GB   16     4.0 GB/core
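
The Mem/Core figures are simply node memory divided by core count. The short sketch below reproduces that arithmetic, with mem_per_core as a hypothetical helper name.

```bash
# Memory per core in GB, rounded to one decimal place.
# $1 = node memory in GB, $2 = cores per node
mem_per_core() {
    awk -v mem="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", mem / cores }'
}

mem_per_core 384 40   # S30: prints 9.6
mem_per_core 48 12    # M40: prints 4.0
```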

Use Fewer Ranks for More Memory-Per-Core

A single CPU core on POD provides up to 9.6 GB of dedicated memory. If your application requires more memory than this, you must request the resources required to satisfy your memory requirements and also limit the number of MPI ranks launched. Use the mpirun arguments --npernode or --loadbalance along with -np to customize your memory availability and MPI rank count.

For instance, a job with 24 MPI ranks that needs 8 GB per rank requires 4 nodes from the M40 queue (4 x 48 GB = 192 GB of memory and 48 cores in total) while launching only 24 MPI ranks. In this configuration half the CPU cores are unused, but each MPI rank has twice the memory available (8 GB instead of 4 GB).

#PBS -S /bin/bash
#PBS -N FewerRanksMPI-example
#PBS -q M40
#PBS -j oe
#PBS -l nodes=4:ppn=12         # 4 nodes x 12 cores = 48 cores
#PBS -l walltime=01:00:00

# load OpenMPI
module load openmpi/1.5.5/gcc.4.4.7

# Enter the PBS folder from which qsub is run
cd $PBS_O_WORKDIR

# limit mpirun to 24 cores and loadbalance the MPI ranks over all 4 nodes
mpirun -np 24 --loadbalance /path/to/binary

# alternatively, use --npernode
# mpirun -np 24 --npernode 6 /path/to/binary

exit $?
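
The node count and --npernode value in the example above can be worked out from the memory requirement. The following is a rough sketch of that calculation using whole-GB integer arithmetic. The function name plan_ranks and its argument order are hypothetical.

```bash
# Work out how many nodes to request and how many ranks to place per node.
# $1 = total MPI ranks      $2 = memory needed per rank (GB)
# $3 = memory per node (GB) $4 = cores per node
plan_ranks() {
    ranks=$1; mem_per_rank=$2; node_mem=$3; node_cores=$4

    # Ranks that fit in one node's memory, capped at the core count.
    per_node=$(( $node_mem / $mem_per_rank ))
    if [ "$per_node" -gt "$node_cores" ]; then
        per_node=$node_cores
    fi
    if [ "$per_node" -lt 1 ]; then
        echo "a single rank does not fit in one node's memory" >&2
        return 1
    fi

    # Round up so that every rank has a slot.
    nodes=$(( ($ranks + $per_node - 1) / $per_node ))
    echo "nodes=$nodes npernode=$per_node"
}

plan_ranks 24 8 48 12   # M40 example: prints nodes=4 npernode=6
```

The result corresponds to requesting nodes=4:ppn=12 and running mpirun with -np 24 --npernode 6.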

OpenMPI

More application-specific example templates can be found in /public/examples on MT1 and MT2. This example will run best on the MT1 cluster because it is using the M40 queue. If you plan on using this script you will need to update the call to mpirun with the path to your MPI binary.

#PBS -S /bin/bash
#PBS -N OpenMPI-example
#PBS -q M40
#PBS -j oe
#PBS -l nodes=4:ppn=12
#PBS -l walltime=01:00:00

# Load the ompi environment.  Use 'module avail' from the
# command line to see all available modules.
module load openmpi/1.5.5/gcc.4.7.2

# Display some basics about the job
echo
echo "================== nodes ===================="
cat $PBS_NODEFILE
echo
echo "================= job info  ================="
echo "Date:   $(date)"
echo "Job ID: $PBS_JOBID"
echo "Queue:  $PBS_QUEUE"
echo "Cores:  $PBS_NP"
echo "mpirun: $(which mpirun)"
echo
echo "=================== run ====================="

# Enter the PBS folder from which qsub is run
cd $PBS_O_WORKDIR

# Run your application with mpirun. Note that no -mca btl options
# should be used to ensure optimal performance.  Jobs will use
# InfiniBand by default.
time mpirun /path/to/binary
retval=$?

# Display date and return value
echo
echo "================== done ====================="
echo "Date:   $(date)"
echo "retval: $retval"
echo

# vim: syntax=sh

OpenMPI is strongly encouraged on POD as the OpenMPI releases seen in module avail openmpi are optimized for the POD InfiniBand environment. In the rare case where a commercial application requires the use of a different MPI implementation, below are some special considerations.

IBM/Platform MPI

If your application requires running on MT1 using the Platform MPI library, it is necessary to add the following variables and update the call to mpirun with the following options. Please note that the $PBS_NODEFILE will be automatically generated by the scheduler for use inside the PBS TORQUE job.

#PBS -S /bin/bash
#PBS -N PlatformMPI-example
#PBS -q M40
#PBS -j oe
#PBS -l nodes=4:ppn=12
#PBS -l walltime=01:00:00

# load Platform MPI
module load platform_mpi/09.01.02

# Platform MPI-specific environment variables
export MPI_MAX_REMSH=32
export MPI_REMSH=/usr/bin/bprsh

# Enter the PBS folder from which qsub is run
cd $PBS_O_WORKDIR

# Platform MPI-specific call to mpirun
mpirun -psm -hostfile $PBS_NODEFILE /path/to/binary

exit $?

Intel MPI

If your application requires an older version of Intel MPI, you will need to configure your job to use a TMI configuration that leverages InfiniBand and the QLogic/Intel PSM libraries. The following example can be used as a template to run Intel MPI jobs with the optimal TMI configuration on MT1.

#PBS -S /bin/bash
#PBS -N IntelMPI-example
#PBS -q M40
#PBS -j oe
#PBS -l nodes=4:ppn=12
#PBS -l walltime=01:00:00

# load Intel MPI
module load intel/11.1.0

# Intel MPI-specific environment variables
export I_MPI_FABRICS=shm:tmi
export TMI_CONFIG=/etc/tmi.conf
export I_MPI_TMI_LIBRARY=/usr/lib64/libtmi.so
export I_MPI_TMI_PROVIDER=psm
export I_MPI_MPD_RSH=/usr/bin/ssh
export I_MPI_DEBUG=5  # optional

# PBS_NUM_NODES = number of compute nodes allocated for the job
# PBS_NP = number of MPI ranks (cores in nodes=X:ppn=Y)
# PBS_NODEFILE = /var/spool/torque/aux/<jobid> from mother superior

# Enter the PBS folder from which qsub is run
cd $PBS_O_WORKDIR

# Start mpd processes on each compute node
mpdboot -n $PBS_NUM_NODES -f $PBS_NODEFILE -r $I_MPI_MPD_RSH

## Run your program here and save its exit status
mpdrun -np $PBS_NP /path/to/binary
retval=$?

# Stop mpd processes on each compute node
mpdallexit

# Exit with the status of mpdrun rather than mpdallexit
exit $retval
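
The Intel MPI template depends on variables that the scheduler sets only inside a running job. As a minimal sketch, a guard like the one below could be called before mpdboot to fail early when the script is run outside of qsub. The function name require_pbs_env is hypothetical.

```bash
# Check that the PBS variables used by the Intel MPI template are set.
require_pbs_env() {
    for name in PBS_NUM_NODES PBS_NP PBS_NODEFILE; do
        # Look up the variable whose name is stored in $name (POSIX sh indirection).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing $name: submit this script with qsub" >&2
            return 1
        fi
    done
    return 0
}

# Possible use before starting the mpd processes:
#   require_pbs_env || exit 1
```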