Colibri user's documentation

The Colibri cluster runs the Rocks+ distribution from StackIQ, the same as Gross, so the Gross user's documentation applies to some extent, subject to local modifications and differences in software versions.

Note: The data on the research clusters is not backed up. Please back up your important data as needed.

Using the front end

The front end has 16 cores and 32 GB of memory. It is meant for interactive work, in particular building code. It has no GPUs.

Using the compute nodes

There are 24 compute nodes, compute-0-0 to compute-0-23, with 16 cores, 64 GB of memory, and 2 NVIDIA Fermi GPUs each. Compute nodes are assigned by the scheduler and nothing should run on them directly, except on nodes that are taken out of the scheduler queues.

The following compute nodes are currently taken out of the queues and may be used by logging into them directly:

compute-0-0
compute-0-26
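
For example, from the front end you can reach one of these nodes directly over SSH (shown here for compute-0-0; purely an illustration):

# open a shell on a node that has been taken out of the queues
ssh compute-0-0
# confirm which node you are on
hostname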

There are two queues available: all.q has 17 nodes allocated and hex.q has 6 nodes allocated. If you do not specify a queue, the scheduler will consider both queues when placing the job.

When submitting a job, use -q [queuename] to specify the queue. The -q switch must be the first switch on the command line.
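
For example, to send a batch script (here a hypothetical job.sh) to the hex.q queue:

# -q must be the first switch; hex.q and all.q are the available queues
qsub -q hex.q job.sh
# with no -q, the scheduler may place the job in either queue
qsub job.sh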

  • To get a more detailed look at what jobs are running at which node in which queue, use qstat -f -u '*'
  • To look at the detailed status of a job, use qstat -f -j [job id]

Using special nodes

The following nodes have been taken out of the job scheduler for interactive use:

interactivegpu-0-0 is a node for GPU use.

bigmem-0-0 is a large-memory node with 32 cores and 128 GB of memory. It has no GPUs.
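
For example, to work interactively on the GPU node (a minimal sketch; nvidia-smi is the standard NVIDIA status tool and should be available wherever the NVIDIA driver is installed):

# log in to the interactive GPU node from the front end
ssh interactivegpu-0-0
# list the GPUs visible on that node and their current utilization
nvidia-smi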

Compiling and running MPI jobs

gcc-mvapich

MVAPICH is currently not working; the cause has not been identified.

gcc-openmpi

  • use mpi-selector-menu to choose openmpi-1.7.4, then log out and back in.
  • build your executable using gcc, mpicc, or mpif90 (a sample compile command is shown after this list).
  • create a batch job script like the following:
#!/bin/bash
#$ -pe orte 144
#$ -cwd
#$ -j y
#$ -S /bin/bash
# echo this script into the job output for reference
cat $0
# convert the SGE host list into an OpenMPI-compatible hostfile (see the note below)
cp $PE_HOSTFILE pe_hostfile
awk '{print $1 " slots=" $2}' < pe_hostfile | sed 's/\.local//' > hostfile
echo hostfile:
cat hostfile
echo NSLOTS=$NSLOTS
/usr/mpi/gcc/openmpi-1.7.4/bin/mpirun -np $NSLOTS -hostfile hostfile <your code>
  • make the script executable: chmod +x <your batch job script file>
  • submit: qsub <your batch job script file>
  • use qstat -j [job id] to check where and how the job is running
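
The compile step referenced above can look like this (a sketch; the file names are placeholders, and after selecting openmpi-1.7.4 the mpicc/mpif90 wrappers resolve to that OpenMPI installation):

# C source
mpicc -O2 hello.c -o hello
# Fortran source
mpif90 -O2 hello.f90 -o hello_f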

Note that the hostfile provided by SGE in $PE_HOSTFILE is not directly compatible with OpenMPI, so the script converts it into a hostfile with lines like

compute-0-1 slots=16
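
For illustration, the conversion done by the awk/sed line maps roughly as follows (the exact fields in $PE_HOSTFILE beyond the hostname and slot count may vary with the SGE version):

# a line of $PE_HOSTFILE as provided by SGE:
compute-0-1.local 16 all.q@compute-0-1.local UNDEFINED
# the same line after the awk/sed processing, as written to hostfile:
compute-0-1 slots=16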

PGI compilers

The PGI compilers are set up and working on the front end.

MPI for PGI over InfiniBand is not available yet. The MPI environment below is the generic Ethernet version that ships with the compilers; if you use it, it conflicts with the system MPI set up by mpi-selector-menu.

Unlike gfortran, PGI Fortran comes with a working Fortran debugger, pgdbg.

Because the hardware varies across the cluster, use the compiler switch -tp=x64 to ensure that compiled code will run on all nodes. For example, create a binary using:

pgf90 -tp=x64 source.F -o binary

Otherwise the binary may crash with an illegal instruction on nodes other than the front end.

On Colibri, users can set up their environment to use PGI by sourcing one of the following files, depending on which shell they are using:

/share/apps/pgi/linux86-64/14.3/pgi.csh
/share/apps/pgi/linux86-64/14.3/pgi.sh
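
For example, bash users can activate the PGI environment and verify it as follows (a minimal sketch; pgf90 -V simply prints the compiler version):

# add the PGI 14.3 compilers to the current shell environment (bash)
source /share/apps/pgi/linux86-64/14.3/pgi.sh
# confirm that the PGI Fortran compiler is on the PATH and report its version
which pgf90
pgf90 -V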

You can also try the bundled MPI, but it works over Ethernet only; source one of:

/share/apps/pgi/linux86-64/14.3/mpi.csh
/share/apps/pgi/linux86-64/14.3/mpi.sh

To set up the environment for building WRF, source /share_home/jmandel/bin/pgi

Python 2.7

Add the following line to your .bashrc:

export PATH=/export/apps/anaconda2/bin:$PATH
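
After adding the line, start a new shell or re-source .bashrc and check that the Anaconda interpreter is picked up, e.g.:

# reload the updated .bashrc in the current shell
source ~/.bashrc
# both should now point at the Anaconda installation in /export/apps/anaconda2
which python
python --version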
