Globus Data Transfer

From HPC users
Revision as of 10:57, 20 February 2020 by Harfst

This is highly experimental and not an official service offered by the University.

Introduction

Globus provides a secure, unified interface to your research data. Use Globus to 'fire and forget' high-performance data transfers between systems within and across organizations. [1]

Members of the University of Oldenburg can log in to the service with their university credentials (a login of the form abcd1234 and the corresponding password). In the web interface you can manage so-called endpoints and transfer large amounts of data between different endpoints (which can be shared by or with you). See the Globus How-tos for details.

Tip: You can SMB-mount your HPC directories (e.g. $DATA or $GROUP) on your local computer to make them the default directory for your personal endpoint.

Using a Virtual Machine to create a Personal Endpoint

It might be useful to use a virtual machine (VM) for creating a personal endpoint so that data transfer does not interfere with your normal computer use. Here is a summary of what needs to be done:

1. Request VM: You can request a VM by following the instructions on the IT services web pages. Make sure that the VM is visible worldwide.

2. Prepare VM: As root, use SMB to mount a filesystem (e.g. $DATA or $GROUP) which will later serve as the directory for data transfer. Note that the mount will disappear when the system is rebooted. Next, set up Python 3 as the default Python (not strictly needed, but recommended):

# alternatives --install /usr/bin/python python /usr/bin/python2 50
# alternatives --install /usr/bin/python python /usr/bin/python3 60
# alternatives --install /usr/bin/pip pip /usr/bin/pip3 60
# python --version

Normally, Python 3 is already installed; if not, use yum install to install it.
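
Because the SMB mount from this step does not survive a reboot, a small helper that remounts the share when it is missing can be handy. The following is only a sketch: the function names is_mounted and ensure_mounted, the server and share names, and the mount point are made up for illustration, and mount -t cifs assumes that the cifs-utils package is installed on the VM.

```shell
#!/usr/bin/env bash
# Sketch: remount an SMB share if it is not mounted (e.g. after a reboot).
# All names and paths here are placeholders chosen for illustration.

# Succeeds (exit status 0) if the given directory is a mount point.
is_mounted() {
    mountpoint -q "$1"
}

# Mount SHARE on TARGET as USER unless TARGET is already a mount point.
# Must be run as root; mount will prompt for the SMB password.
ensure_mounted() {
    local share="$1" target="$2" user="$3"
    if is_mounted "$target"; then
        echo "already mounted: $target"
    else
        mount -t cifs "$share" "$target" -o "username=$user"
    fi
}

# Hypothetical usage (server and share names are made up):
#   ensure_mounted //smb.example.org/data /mnt/data abcd1234
```

The check comes first so that the helper can be run repeatedly without stacking a second mount on top of an existing one.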

3. Install Globus CLI: The globus-cli is needed to connect to the Globus network and manage your personal endpoints. It is a Python package, so it can be installed with

# pip install --upgrade globus-cli

(Using pip as root is not really recommended, but it is still the easiest way to install a package for everyone.) See the Globus CLI Documentation for details.

4. Install Globus Connect Personal: This lets you run a Globus service on your server so that the Globus network can connect to it. The commands are

# cd /opt   # usually a good place for non-standard software 
# wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
--2018-05-22 15:32:32--  https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
Resolving downloads.globus.org (downloads.globus.org)... 52.84.122.197, 52.84.122.3, 52.84.122.100, ...
Connecting to downloads.globus.org (downloads.globus.org)|52.84.122.197|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14500802 (14M) [application/x-tar]
Saving to: ‘globusconnectpersonal-latest.tgz’ 

globusconnectpersonal-latest.tgz             
100%[=====================================================================================>]  13.83M   3.63MB/s    in 3.9s
# tar xzf globusconnectpersonal-latest.tgz
# ln -s globusconnectpersonal-x.y.z globusconnectpersonal
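
The x.y.z in the ln command stands for the version number of the directory that tar unpacked. A minimal sketch for finding that directory automatically is shown below. The helper name latest_gcp_dir is hypothetical, and sort -V assumes a sort implementation that supports version sorting (e.g. GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: find the newest unpacked globusconnectpersonal-x.y.z directory.
# latest_gcp_dir is a hypothetical helper, not part of Globus.

latest_gcp_dir() {
    local base="${1:-/opt}"
    # Only directories match, so the downloaded .tgz file is ignored.
    # sort -V orders by version number; tail keeps the highest one.
    find "$base" -maxdepth 1 -type d -name 'globusconnectpersonal-*' \
        | sort -V | tail -n 1
}

# Hypothetical usage (as root):
#   ln -sfn "$(latest_gcp_dir /opt)" /opt/globusconnectpersonal
```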

If you now add the two lines

# Globus Connect Personal
PATH=/opt/globusconnectpersonal:$PATH

near the end of the file /etc/skel/.bashrc, new users will be able to run the commands in the steps below. Existing users have to add the two lines to their own ~/.bashrc.
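
If the PATH entry has to be added for several existing users, it helps to append it only when it is not already there. The following sketch does that for one file. The function name add_gcp_path is hypothetical, and the file to edit is passed as an argument.

```shell
#!/usr/bin/env bash
# Sketch: append the PATH entry to a bashrc file unless it is already there.
# add_gcp_path is a hypothetical helper.

add_gcp_path() {
    local rc="$1"
    # Single quotes keep $PATH literal so it is expanded at login time.
    local line='PATH=/opt/globusconnectpersonal:$PATH'
    # grep -F matches the text literally; -x requires a whole-line match.
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n%s\n' '# Globus Connect Personal' "$line" >> "$rc"
    fi
}

# Hypothetical usage:
#   add_gcp_path ~/.bashrc
```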

5. Create User Account: It is recommended to finalize the setup of a personal endpoint as a non-root user, which has to be created first. It might be useful to use the university login as the username and to match the uid and gid used by the IDM of the University. To find the two numbers, log in to the cluster and run the command

carl$ id
uid=4567(abcd1234) gid=14567(agyourgroup) groups=...

With this information, go back to your server and add first the group and then the user:

# groupadd -g 14567 agyourgroup
# useradd -g 14567 -u 4567 abcd1234
# passwd abcd1234
New password: ...
# id abcd1234
uid=4567(abcd1234) gid=14567(agyourgroup) groups=...

The last command is just to double-check. Additional groups listed after groups= are generally not needed.
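
To avoid typos when copying the numbers, the groupadd and useradd commands can be derived from the id output directly. The sketch below only prints the commands so that they can be checked before being run as root. The helper name id_to_commands is hypothetical, and it assumes the id output starts with uid=... gid=... as in the example above.

```shell
#!/usr/bin/env bash
# Sketch: derive the groupadd/useradd commands from one line of `id` output.
# id_to_commands is a hypothetical helper; it only prints the commands.

id_to_commands() {
    local line="$1"
    local uid_part gid_part rest uid user gid group
    uid_part=${line%% *}          # e.g. uid=4567(abcd1234)
    rest=${line#* }
    gid_part=${rest%% *}          # e.g. gid=14567(agyourgroup)

    # Split each NUMBER(NAME) pair into its number and its name.
    uid=${uid_part#uid=}; uid=${uid%%"("*}
    user=${uid_part#*"("}; user=${user%")"}
    gid=${gid_part#gid=}; gid=${gid%%"("*}
    group=${gid_part#*"("}; group=${group%")"}

    echo "groupadd -g $gid $group"
    echo "useradd -g $gid -u $uid $user"
}

# Hypothetical usage on the cluster:
#   id_to_commands "$(id)"
```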

6. Set up Personal Endpoint: Now log in to the server as your new abcd1234 user. Then use the commands (follow this guide and references therein)

$ globus login
...
$ globus session show

to log in to the Globus network and later check the status. The login process prints a link that you can use to log in via the University SSO service. This generates a token (authorization code), which needs to be copied and pasted at the prompt.

Once you are logged in to the Globus network, you can set up an endpoint with the commands

$ globus endpoint create --default-directory /path/to/share --personal abcd1234@uol.de
Message:     Endpoint created successfully
Endpoint ID: 922e9552-0000-00xx-971b-021304b0cca7
Setup Key:   24aa0566-000x-0x0x-9a2d-e000b55fb7cf
$ globusconnectpersonal -setup 24aa0566-000x-0x0x-9a2d-e000b55fb7cf
Configuration directory: /home/abcd1234/.globusonline/lta
Contacting relay.globusonline.org:2223
Done!

where the second command uses the Setup Key from the output of the first command (last line of output). You also need to make the directory for the endpoint accessible by editing the following file (e.g. using the vim editor)

$ vim ~/.globusonline/lta/config-paths 
$ cat ~/.globusonline/lta/config-paths
~/,0,1
/path/to/share/,0,1

and add the second line to the file (by default, your $HOME is automatically configured to be accessible).
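
Each line in config-paths has three comma-separated fields. The Globus Connect Personal documentation describes them as the path, a sharing flag, and a read/write flag, so /path/to/share/,0,1 should mean: sharing disabled, read/write access. Instead of editing the file by hand, the line can also be appended with a small helper. This is only a sketch; the function name allow_path is hypothetical and the flags 0,1 are simply taken from the example above.

```shell
#!/usr/bin/env bash
# Sketch: make a directory accessible by adding it to config-paths.
# allow_path is a hypothetical helper; flags 0,1 follow the example above.

allow_path() {
    local cfg="$1" dir="$2"
    # Normalize to exactly one trailing slash, then add the two flags.
    local entry="${dir%/}/,0,1"
    # Append only if the exact line is not in the file yet.
    grep -qxF "$entry" "$cfg" 2>/dev/null || printf '%s\n' "$entry" >> "$cfg"
}

# Hypothetical usage:
#   allow_path ~/.globusonline/lta/config-paths /path/to/share
```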

Finally, the Globus Connect Personal service needs to be started with

$ globusconnectpersonal -start &

which will continue running even if you log out. Check with

$ globusconnectpersonal -status
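
Whether a job started with & survives a logout can depend on the shell and terminal settings. If the service stops when you log out, a common and more defensive pattern is to start it with nohup and send its output to a log file. The sketch below shows this pattern. The helper name start_detached and the log file location are assumptions, not part of Globus.

```shell
#!/usr/bin/env bash
# Sketch: start a command detached from the terminal and log its output.
# start_detached is a hypothetical helper.

start_detached() {
    local log="$1"
    shift
    # nohup makes the command ignore the hangup signal sent at logout;
    # stdout and stderr go to the log file instead of the terminal.
    nohup "$@" > "$log" 2>&1 < /dev/null &
    echo $!   # print the process ID of the background job
}

# Hypothetical usage:
#   start_detached ~/globusconnectpersonal.log globusconnectpersonal -start
```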

7. Transferring data: This can now be done using the web interface. Once you have logged in, you should see your newly configured endpoint, which allows you to initiate data transfers. In principle, you can also use the Globus CLI on the server.
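
With the Globus CLI, a transfer is submitted with globus transfer, which takes the source and the destination in the form ENDPOINT_ID:PATH. The sketch below only prints such a command so that it can be reviewed before it is run. The helper name transfer_cmd is hypothetical, and the endpoint IDs and paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build (but do not run) a recursive `globus transfer` command.
# transfer_cmd is a hypothetical helper; IDs and paths are placeholders.

transfer_cmd() {
    local src_id="$1" src_path="$2" dst_id="$3" dst_path="$4"
    printf 'globus transfer --recursive %s:%s %s:%s\n' \
        "$src_id" "$src_path" "$dst_id" "$dst_path"
}

# Hypothetical usage: print the command, check it, then run it by hand.
#   transfer_cmd SOURCE_ENDPOINT_ID /path/to/share/ DEST_ENDPOINT_ID /target/dir/
```

Once a transfer has been submitted, the CLI prints a task ID, and globus task show with that ID can be used to check on its progress.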