Globus Data Transfer


This is highly experimental and not an official service offered by the University.

Introduction

Globus provides a secure, unified interface to your research data. Use Globus to 'fire and forget' high-performance data transfers between systems within and across organizations. [1]

Members of the University of Oldenburg can log in to the service with their university credentials (abcd1234 and password). Using the web interface, you can manage so-called endpoints and transfer large amounts of data between endpoints (which can be shared by or with you). See the Globus How-tos for details.

Tip: You can SMB-mount your HPC directories (e.g. $DATA or $GROUP) on your local computer to make them the default directory for your personal endpoint.

Using a Virtual Machine to create a Personal Endpoint

It might be useful to use a virtual machine (VM) for creating a personal endpoint so that data transfer does not interfere with your normal computer use. Here is a summary of what needs to be done:

1. Request VM: You can request a VM by following the instructions on the IT services web pages. Make sure that the VM is visible worldwide.

2. Prepare VM: As root, use SMB to mount a filesystem (e.g. $DATA or $GROUP) that will later serve as the directory for data transfers. Note that the mount will disappear when the system is rebooted. Next, set up Python 3 as the default Python (not strictly needed, but recommended):

# alternatives --install /usr/bin/python python /usr/bin/python2 50
# alternatives --install /usr/bin/python python /usr/bin/python3 60
# alternatives --install /usr/bin/pip pip /usr/bin/pip3 60
# python --version

Normally, Python 3 is already installed; if not, use yum install to install it.
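The SMB mount mentioned in step 2 is not spelled out above, so here is a minimal sketch of what it could look like. The server name smb.example.org, the share name data and the mount point /mnt/data are placeholders (assumptions, not the real names of the University file servers); the helper smb_mount_cmd is hypothetical and only prints the command so that it can be reviewed before it is run as root.

```shell
# Print (do not run) a CIFS mount command for a given share.
# Arguments: server, share, mount point, user name.
# All values used in the example call are placeholders.
smb_mount_cmd() {
    printf 'mount -t cifs //%s/%s %s -o username=%s\n' "$1" "$2" "$3" "$4"
}

# Example: review the printed command, then run it as root.
smb_mount_cmd smb.example.org data /mnt/data abcd1234
```

Mounting a CIFS share this way usually requires the cifs-utils package, and mount will prompt for the password of the given user.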

3. Install Globus CLI: The globus-cli is needed to connect to the Globus network and manage your personal endpoints. It is a Python package, so it can be installed with

# pip install --upgrade globus-cli

(Using pip as root is not really recommended, but it is still the easiest way to install a package for all users.) See the Globus CLI Documentation for details.

4. Install Globus Connect Personal: This runs a Globus service on your server that allows the Globus network to connect to it. The commands are

# cd /opt   # usually a good place for non-standard software 
# wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
--2018-05-22 15:32:32--  https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
Resolving downloads.globus.org (downloads.globus.org)... 52.84.122.197, 52.84.122.3, 52.84.122.100, ...
Connecting to downloads.globus.org (downloads.globus.org)|52.84.122.197|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14500802 (14M) [application/x-tar]
Saving to: ‘globusconnectpersonal-latest.tgz’ 

globusconnectpersonal-latest.tgz             
100%[=====================================================================================>]  13.83M   3.63MB/s    in 3.9s
# tar xzf globusconnectpersonal-latest.tgz
# ln -s globusconnectpersonal-x.y.z globusconnectpersonal
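The x.y.z in the last command stands for the version that was actually unpacked. As a sketch, the directory name can be read from the archive itself instead of being typed by hand; the helper top_level_dir is a hypothetical name, and the approach assumes that the archive contains a single top-level directory.

```shell
# Print the name of the top-level directory stored in a gzipped tar archive.
# Assumes the first archive entry lies inside that single top-level directory.
top_level_dir() {
    tar tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Usage (as root, in /opt, after unpacking the archive):
#   ln -s "$(top_level_dir globusconnectpersonal-latest.tgz)" globusconnectpersonal
```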

If you now add to the file /etc/skel/.bashrc the two lines

# Globus Connect Personal
PATH=/opt/globusconnectpersonal:$PATH

somewhere near the end, new users should be able to run the commands in the steps below. Existing users have to add the two lines to their own ~/.bashrc.
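For existing users, the change to ~/.bashrc can be scripted. The following is a minimal sketch, assuming the installation path /opt/globusconnectpersonal from above; the function name add_globus_path is hypothetical. It appends the two lines only if the PATH line is not already present, so it is safe to run more than once.

```shell
# Append the Globus PATH entry to the given rc file unless it is already there.
add_globus_path() {
    rcfile=$1
    # Single quotes keep $PATH literal so it is expanded at login, not now.
    line='PATH=/opt/globusconnectpersonal:$PATH'
    # grep: -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n%s\n' '# Globus Connect Personal' "$line" >> "$rcfile"
    fi
}

# Usage:
#   add_globus_path ~/.bashrc
```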

5. Create User Account: It is recommended to finalize the setup of the personal endpoint as a non-root user, which has to be created first. It might be useful to use your university login as the username and to match the uid and gid used by the IDM of the University. To find the two numbers, log in to the cluster and run the command

carl$ id
uid=4567(abcd1234) gid=14567(agyourgroup) groups=...

With this information, go back to your server and first add the group, then the user:

# groupadd -g 14567 agyourgroup
# useradd -g 14567 -u 4567 abcd1234
# passwd abcd1234
New password: ...
# id abcd1234
uid=4567(abcd1234) gid=14567(agyourgroup) groups=...

The last command is just a double-check. Additional groups listed after groups= are generally not needed.
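To avoid copying the numbers by hand, the groupadd and useradd commands can be derived from the id output. This is a sketch only: the function name account_commands is hypothetical, and it assumes that the output starts with uid=N(name) gid=N(group) as in the example above. It prints the commands rather than running them, so they can be checked first.

```shell
# Turn the first two fields of `id` output into groupadd/useradd commands.
# Prints nothing if the input does not start with "uid=N(name) gid=N(group)".
account_commands() {
    pattern='^uid=([0-9]+)\(([^)]+)\) gid=([0-9]+)\(([^)]+)\).*'
    # Backreferences: \1 = uid, \2 = user name, \3 = gid, \4 = group name
    printf '%s\n' "$1" | sed -n -E "s/$pattern"'/groupadd -g \3 \4/p'
    printf '%s\n' "$1" | sed -n -E "s/$pattern"'/useradd -g \3 -u \1 \2/p'
}

# Usage: paste the line printed by `id` on the cluster as the argument.
#   account_commands 'uid=4567(abcd1234) gid=14567(agyourgroup) groups=...'
```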

6. Set Up Personal Endpoint: Now log in to the server as your new abcd1234 user. Then use the commands (follow this guide and references therein)

$ globus login
...
$ globus session show

to log in to the Globus network and later check the status. The login process prints a link that you can use to log in via the University SSO service. This generates a token (authorization code), which needs to be copied and pasted at the prompt.

Once you are logged in to the Globus network, you can set up an endpoint with the commands

$ globus endpoint create --default-directory /path/to/share --personal abcd1234@uol.de
Message:     Endpoint created successfully
Endpoint ID: 922e9552-0000-00xx-971b-021304b0cca7
Setup Key:   24aa0566-000x-0x0x-9a2d-e000b55fb7cf
$ globusconnectpersonal -setup 24aa0566-000x-0x0x-9a2d-e000b55fb7cf
Configuration directory: /home/abcd1234/.globusonline/lta
Contacting relay.globusonline.org:2223
Done!

where the second command uses the Setup Key printed by the first command (last line of its output). Finally, the Globus Connect Personal service needs to be started with