Interactive Jobs

Interactive jobs should not be used unless needed for a specific reason. If you have to use an interactive session on the compute nodes, please keep it short (do not request more than 8h per session) and log out as soon as you are done with the interactive work!

Interactive Login and Jobs

Method 1: Start a bash shell on a compute node and reserve four CPUs/cores (with default run time and memory):

hpcl001$ srun --pty -p carl.p --ntasks=1 --cpus-per-task=4 bash
mpcs001$ <execute commands on the compute node>

Of course, you can change the requested resources with the usual command-line options, e.g. to request more memory or even GPUs (with --gres=gpu:1 and the appropriate partition). A typical use case could be the extensive testing of a thread-parallel program.
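
For example, the following sketches request more memory and a longer run time, or a single GPU (the memory and time values are placeholders to adapt to your needs; the GPU example assumes the partition mpcg.p mentioned further below):

hpcl001$ srun --pty -p carl.p --ntasks=1 --cpus-per-task=4 --mem=8G --time=2:00:00 bash
hpcl001$ srun --pty -p mpcg.p --ntasks=1 --gres=gpu:1 bash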

Method 2: Allocate resources on the compute nodes and then run applications interactively with srun:

hpcl001$ salloc -p carl.p --nodes=2 --ntasks-per-node=24
hpcl001$ srun ./mpi_program [options]

With the combination of salloc and srun, you can launch MPI-parallel applications interactively, which again can be useful for testing. As before, you can change the requested resources, such as memory and runtime, in the usual way. srun replaces the perhaps more familiar mpirun here and makes sure that the MPI program runs on the allocated resources. Please note that Intel MPI requires you to set

hpcl001$ export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib64/libpmi.so

to work properly (see also Parallel Jobs with MPI).
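
Putting these steps together, a typical interactive MPI test session could look like the following sketch (mpi_program is a placeholder for your own executable, and the time limit is just an example; the final exit ends the salloc session and releases the allocation):

hpcl001$ salloc -p carl.p --nodes=2 --ntasks-per-node=24 --time=1:00:00
hpcl001$ export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib64/libpmi.so
hpcl001$ srun ./mpi_program [options]
hpcl001$ exit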

Method 3: Not strictly an interactive job, but once a SLURM job (batch or interactive) is running on a compute node, you can ssh into this node. This can help you monitor your jobs better; for example, you can check the CPU utilization of your processes or review the output written to a log file in /scratch. Important note: Do not misuse this option and do not interfere with the jobs of other users! Log out from the compute node once you have the information you wanted.
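
A possible sketch of this workflow: first look up the node your job runs on with squeue, then ssh to it and check your own processes (the node name mpcs001 is only an example; use the node shown in the NODELIST column for your job):

hpcl001$ squeue -u $USER
hpcl001$ ssh mpcs001
mpcs001$ top -u $USER
mpcs001$ exit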

Hints: Typically, you should be able to get a few resources for interactive jobs immediately. You can use sinfo to see which partition has free resources to offer (it usually helps to request a short time limit). If the cluster is exceptionally busy, or if you need more than just a few resources, it may help to plan your interactive work in advance. This can be done using the --begin command-line option of salloc:

hpcl001$ salloc -p carl.p --nodes=2 --ntasks-per-node=24 --begin=2019-08-01T10:00 &

This would create an allocation that will not start before the given date and time (see the manpage of salloc for details). You can log out afterwards and the requested allocation will remain in the queue:

hpcl001$ squeue -u $USER
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
8004770    carl.p     bash abcd1234 PD       0:00      2 (BeginTime)
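
To see when the allocation is scheduled to start, you can check the StartTime field of the job, for example (a sketch using the job ID from the squeue output above):

hpcl001$ scontrol show job 8004770 | grep StartTime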

Note that the allocation will not be granted before the given time, but it may be granted later if the resources are not free at that time (check the StartTime field as shown above). Once the allocation has been granted, you can use srun to run applications on the allocated resources:

hpcl001$ srun --jobid=<jobid> <application>

Once you are finished with your interactive work, you can use the command

hpcl001$ scancel <jobid>

to remove your allocation.

Please only use this option if you are absolutely sure that you will be able to use the resources at the time when they get allocated. Otherwise they will remain idle while the jobs of other users are waiting in the queue for no reason.

Interactive Login with X11-forwarding

If you need graphical output during an interactive session, things become more complicated. SLURM does not natively support X11-forwarding for jobs running on the compute nodes. Here is a workaround that can be used:

First, log in to the cluster with X11-forwarding enabled:

ssh -X abcd1234@carl.hpc.uni-oldenburg.de

Next, get a copy of srun.x11 with the command

git clone https://github.com/jbornschein/srun.x11.git

which will create a directory srun.x11 with some files in it (you may want to cd to a preferred directory before using git clone). One of the files is named srun.x11, and it is recommended to modify it as follows ((-) marks the old line, (+) the new line):

(-) trap "{ /usr/bin/scancel -Q $JOB; exit; }" SIGINT SIGTERM EXIT
(+) trap "{ scancel -Q $JOB; exit; }" SIGINT SIGTERM EXIT

and

(-)    sleep 1s
(+)    sleep 2s

After that, you can create an interactive session using, e.g., the command

/path/to/srun.x11 -p carl.p -n 1

You can add any of the options you would also use with the sbatch command (e.g. --gres=gpu:1 if you additionally use the partition mpcg.p). In the interactive session, this

$ module load gnuplot
$ gnuplot
> plot sin(x)

should open a window showing the plot of sin(x) on your machine (if you have problems loading the module, a module restore may help).
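
As noted above, additional options can be passed to srun.x11 in the same way. For example, an interactive X11 session with one GPU could be requested with the following sketch (based on the partition and GPU option mentioned above):

/path/to/srun.x11 -p mpcg.p -n 1 --gres=gpu:1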