JupyterLab


Introduction

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computational environment for creating notebook documents. Jupyter Notebook is built using several open-source libraries, including IPython, ZeroMQ, Tornado, jQuery, Bootstrap, and MathJax. A Jupyter Notebook application is a browser-based REPL containing an ordered list of input/output cells which can contain code, text (using GitHub Flavored Markdown), mathematics, plots and rich media.
...
JupyterLab is a newer user interface for Project Jupyter, offering a flexible user interface and more features than the classic notebook UI.[1]

Available modules

On CARL and EDDY, you can find the available modules with

$ module spider JupyterLab

and to use JupyterLab you can use e.g.

$ module load hpc-env/12.2
$ module load JupyterLab

to load the most recent version of JupyterLab.

Jupyter notebooks are also available through other modules like IPython. In the future, we plan to offer a JupyterHub for easier use of notebooks on the cluster.

Usage

Jupyter notebooks are typically intended for interactive usage, which is a bit at odds with using an HPC cluster. Interactive use of the HPC cluster is of course possible, but you should keep in mind that the allocated resources should be freed as soon as they are no longer needed. It is also possible to run notebooks in batch mode.
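For the batch mode mentioned above, one possible approach is to let jupyter nbconvert execute all cells of a notebook from within a job script. The following is only a sketch: the notebook name analysis.ipynb is hypothetical, and it is assumed that nbconvert is available after loading the JupyterLab module.

```shell
# Sketch: execute a notebook non-interactively, e.g. at the end of a job script.
# "analysis.ipynb" is a hypothetical file name, replace it with your own notebook.
notebook="analysis.ipynb"
output="${notebook%.ipynb}-executed"    # base name of the result file

if command -v jupyter > /dev/null 2>&1 && [ -f "$notebook" ]
then
        # run all cells and write the result to a new notebook file
        jupyter nbconvert --to notebook --execute "$notebook" --output "$output"
else
        echo "jupyter or $notebook not found, nothing was executed"
fi
```

The executed copy contains the cell outputs and can be opened later in JupyterLab, so the compute resources are only held while the notebook is actually running.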

If you want to use JupyterLab on the HPC cluster, you need to be able to access the web interface. This can be accomplished by following these instructions, which are repeated here in a slightly updated form. Basically, three steps are needed:

  1. start a JupyterLab job on the cluster
  2. establish an SSH tunnel from your local computer, via the login nodes of the cluster, to the compute node where your JupyterLab job is running
  3. open the web interface

In more detail:

Start a JupyterLab job: This can be done by simply submitting the following job script with sbatch

#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4      # adjust the number of cpus (cores) as needed
#SBATCH --mem-per-cpu 5000     # memory per core in MB; adjust as needed, this is the default in carl.p
#SBATCH --time 2:00:00
#SBATCH --job-name JupyterLab
#SBATCH --output jupyterlab-%J.out

## load modules (additional modules can be added if needed)
module load hpc-env/12.2
module load JupyterLab/3.5.0-GCCcore-12.2.0
module load nodejs/18.12.1-GCCcore-12.2.0     # used by Jupyter

## workspaces location
export JUPYTERLAB_WORKSPACES_DIR=$HOME/.local/share/jupyter/workspaces

## get tunneling info
XDG_RUNTIME_DIR=""
port_test=blocked
nport_test=0
while [[ $port_test == "blocked" && $nport_test -lt 10 ]]
do
        ipnport=$(shuf -i18000-19999 -n1)
        nport_test=$((nport_test + 1))
        port_test=$(netstat -tulpn 2> /dev/null | grep -q ":$ipnport" && echo blocked || echo free)
        echo "Attempt $nport_test: Checked port $ipnport, port is $port_test ..."
done
if [[ $port_test == "blocked" ]]
then
        echo "Failed to find an unused port."
        exit 1
fi
ipnip=$(hostname -i)

## print tunneling instructions to jupyterlab-{jobid}.out
echo -e "
   Paste this ssh command in a terminal on local host (i.e., laptop)
   -----------------------------------------------------------------
   ssh -N -L $ipnport:$ipnip:$ipnport {user@host}

   Open this address in a browser on local host; see token below.
   -----------------------------------------------------------------
   localhost:$ipnport  (prepend with https:// if using a password)
   "

## launch a jupyter server on the specified port & ip
jupyter lab --no-browser --port=$ipnport --ip=$ipnip

In this job script, you can of course adjust the requested resources (partition, number of CPUs, memory, time limit, and so on) as well as the modules that are loaded (for example, if you load SciPy-bundle, you can also use numpy in your notebooks). Once the job is running, check the output file jupyterlab-{jobid}.out, which provides the information for the next steps.
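Instead of copying the tunneling command by hand, it can also be extracted from the output file. The helper below is a minimal sketch, not part of the job script: the function name tunnel_command is made up for illustration, and it assumes that the compute node reports a single IPv4 address.

```shell
# Sketch: pull the ssh command out of the job's output file and fill in the login.
# tunnel_command is a hypothetical helper name.
tunnel_command() {
        local outfile=$1 login=$2 cmd
        # the job script prints one line of the form "ssh -N -L port:ip:port {user@host}"
        cmd=$(grep -m1 -o -E 'ssh -N -L [0-9]+:[0-9.]+:[0-9]+' "$outfile") || return 1
        echo "$cmd $login"
}

# usage (the job id 1234567 is hypothetical):
#   tunnel_command jupyterlab-1234567.out abcd1234@carl.hpc.uni-oldenburg.de
```

The function only prints the command, so you can review it before running it on your local computer.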

Establish an SSH tunnel: In the output file from your JupyterLab job you will find a line like

ssh -N -L 18496:10.151.9.23:18496 {user@host}

where the port number (here 18496) and IP (here 10.151.9.23) will be different for your job. Run this command on your local computer with abcd1234@carl.hpc.uni-oldenburg.de replacing {user@host}. You can run this in a terminal (Linux, macOS) or in the PowerShell (Windows). Note that the command does not print any output after you have entered your password successfully (you could add the option -v to change that). To terminate the tunnel, simply press Ctrl-C.
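To make the meaning of the three colon-separated values explicit, the following sketch splits the forwarding argument of ssh -L into its parts. The values are the example values from above; the variable names are chosen only for illustration.

```shell
# Sketch: the three parts of the forwarding argument given to "ssh -L".
forward="18496:10.151.9.23:18496"     # example values from the line above

local_port=${forward%%:*}             # port opened on your local computer
node_port=${forward##*:}              # port JupyterLab listens on (compute node)
node_ip=${forward#*:}                 # drop the local port ...
node_ip=${node_ip%:*}                 # ... and the remote port, the IP remains

echo "localhost:$local_port is forwarded to $node_ip:$node_port"
```

If the first port is already in use on your local computer, you can choose a different local port (for example 8888:10.151.9.23:18496) and then use that port in the browser address.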

Open Web Interface: This can be done by using the HTTP address from the output file, which looks like

http://127.0.0.1:18496/lab?token=5ca....

where 127.0.0.1 is the standard IP address of your local host, which is tunneled to the compute node on the given port (here again 18496). Alternatively, you can enter

http://localhost:18496/

in your web browser and then copy the token into the provided form.
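The address including the token can also be looked up in the output file with grep. This is a sketch under the assumption that the address has the form shown above (http://127.0.0.1:port/lab?token=...); the function name lab_url is hypothetical.

```shell
# Sketch: print the first JupyterLab address (including the token) from the output file.
# Assumes the form http://127.0.0.1:<port>/lab?token=<token>; lab_url is a made-up name.
lab_url() {
        grep -m1 -o 'http://127\.0\.0\.1:[0-9]*/lab?token=[[:alnum:]]*' "$1"
}

# usage (the job id 1234567 is hypothetical):
#   lab_url jupyterlab-1234567.out
```

Keep in mind that anyone who knows the token and can reach the port can use your JupyterLab session, so do not share the output file.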

After you have completed these steps, you should see your JupyterLab interface, which allows you to open Notebook files (*.ipynb) or interactively run Python code.