Globus Data Transfer

From HPC users
Revision as of 17:11, 19 February 2020 by Harfst

This is highly experimental and not an official service offered by the University.

Introduction

Globus provides a secure, unified interface to your research data. Use Globus to 'fire and forget' high-performance data transfers between systems within and across organizations. [1]

Members of the University of Oldenburg can use their university credentials (a login such as abcd1234 and the corresponding password) to log in to the service. Using the web interface, you can manage so-called endpoints and transfer large amounts of data between different endpoints (which can be shared by you or with you). See the Globus How-tos for details.

Tip: You can SMB-mount your HPC directories (e.g. $DATA or $GROUP) on your local computer to make them the default directory for your personal endpoint.

Using a Virtual Machine to create a Personal Endpoint

It might be useful to use a virtual machine (VM) for creating a personal endpoint so that data transfer does not interfere with your normal computer use. Here is a summary of what needs to be done:

1. Request VM: You can request a VM by following the instructions on the IT services web pages. Make sure that the VM is visible worldwide.

2. Prepare VM: As root, use SMB to mount a filesystem (e.g. $DATA or $GROUP) which will later serve as the directory for data transfer. Note that the mount will disappear when the system is rebooted, so it has to be mounted again after each reboot. Next, set up Python 3 as the default Python (not strictly needed, but recommended):

# alternatives --install /usr/bin/python python /usr/bin/python2 50
# alternatives --install /usr/bin/python python /usr/bin/python3 60
# alternatives --install /usr/bin/pip pip /usr/bin/pip3 60
# python --version

Normally, Python 3 is already installed; if not, use yum install to install it.
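The SMB mount at the start of this step is not spelled out above. As a minimal sketch, the helper below only prints a CIFS mount command so that it can be reviewed before running it as root. The function name smb_mount_cmd and all example values are hypothetical; the real server and share names are site-specific, and mount -t cifs assumes the cifs-utils package is installed.

```shell
# Print (not run) a CIFS mount command so it can be reviewed first.
# All arguments are placeholders: the real server and share names are
# site-specific and are not given on this page.
smb_mount_cmd() {
    share="$1"        # e.g. //fileserver.example.org/data (hypothetical)
    mount_point="$2"  # e.g. /mnt/data (must exist before mounting)
    user="$3"         # e.g. abcd1234
    printf 'mount -t cifs %s %s -o username=%s\n' "$share" "$mount_point" "$user"
}

# Example (prints the command only):
#   smb_mount_cmd //fileserver.example.org/data /mnt/data abcd1234
```

Depending on the file server, additional mount options may be needed; mount will prompt for the password when the printed command is run.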

3. Install Globus CLI: The globus-cli is needed to connect to the Globus network and manage your private endpoints. It is a Python package, so it can be installed with pip:

# pip install --upgrade globus-cli

(Using pip as root is not really recommended, but it is still the easiest way to install a package for everyone.) See the Globus CLI Documentation for details.
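Since running pip as root is not really recommended, one possible alternative is to install globus-cli into a dedicated virtual environment and link the globus command into a shared bin directory. This is only a sketch, not the method used on this page: the function names, the venv location, and the DRY_RUN switch are assumptions made for illustration.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install globus-cli into its own virtual environment and expose the
# "globus" command through a symlink in a directory that is on PATH.
install_globus_cli_venv() {
    venv_dir="$1"                  # e.g. /opt/globus-cli-venv (assumed location)
    bin_dir="${2:-/usr/local/bin}" # where the symlink is created
    run python3 -m venv "$venv_dir" || return 1
    run "$venv_dir/bin/pip" install --upgrade globus-cli || return 1
    run ln -sfn "$venv_dir/bin/globus" "$bin_dir/globus"
}

# Example (as root, prints the commands first):
#   DRY_RUN=1; install_globus_cli_venv /opt/globus-cli-venv
```

With DRY_RUN unset, the same call performs the installation; the system Python packages stay untouched either way.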

4. Install Globus Connect Personal: This runs a Globus service on your server, which allows the Globus network to connect to it. The commands are

# cd /opt   # usually a good place for non-standard software 
# wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
--2018-05-22 15:32:32--  https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
Resolving downloads.globus.org (downloads.globus.org)... 52.84.122.197, 52.84.122.3, 52.84.122.100, ...
Connecting to downloads.globus.org (downloads.globus.org)|52.84.122.197|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14500802 (14M) [application/x-tar]
Saving to: ‘globusconnectpersonal-latest.tgz’ 

globusconnectpersonal-latest.tgz  100%[===================>]  13.83M   3.63MB/s    in 3.9s
# tar xzf globusconnectpersonal-latest.tgz
# ln -s globusconnectpersonal-x.y.z globusconnectpersonal
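The x.y.z in the ln command above stands for whatever version the tarball unpacks to. As a sketch under that assumption, the hypothetical helper link_gcp_latest picks the highest-versioned globusconnectpersonal-* directory and points the symlink at it, so the version does not have to be typed by hand. It relies on sort -V from GNU coreutils.

```shell
# Point <base_dir>/globusconnectpersonal at the newest unpacked version.
# link_gcp_latest is a hypothetical helper, not part of Globus.
link_gcp_latest() {
    base_dir="${1:-/opt}"
    # Collect versioned directories only (the trailing slash skips the .tgz),
    # sort them by version number, and keep the last (highest) one.
    latest="$(
        for d in "$base_dir"/globusconnectpersonal-*/; do
            [ -d "$d" ] && basename "$d"
        done | sort -V | tail -n 1
    )"
    if [ -z "$latest" ]; then
        echo "no globusconnectpersonal-* directory found in $base_dir" >&2
        return 1
    fi
    # -n replaces an existing symlink instead of descending into it.
    ln -sfn "$latest" "$base_dir/globusconnectpersonal"
}

# Example (as root, after unpacking the tarball in /opt):
#   link_gcp_latest /opt
```

Because the symlink is replaced in place, the same call can be repeated after unpacking a newer release.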

If you now add the two lines

# Globus Connect Personal
PATH=/opt/globusconnectpersonal:$PATH

somewhere near the end of the file /etc/skel/.bashrc, new users should be able to run the commands in the steps below. Existing users have to add the two lines to their own ~/.bashrc.
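Adding the PATH line by hand can easily lead to duplicates if it is done more than once. The following sketch appends the line only when it is not already present; the function name add_gcp_to_path is hypothetical, and the file to modify is passed as an argument.

```shell
# Append the PATH line to a bashrc-style file unless it is already there.
# add_gcp_to_path is a hypothetical helper for illustration.
add_gcp_to_path() {
    rc_file="$1"
    # Single quotes keep $PATH literal so it is expanded at login time.
    path_line='PATH=/opt/globusconnectpersonal:$PATH'
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if grep -qxF "$path_line" "$rc_file" 2>/dev/null; then
        return 0
    fi
    printf '\n# Globus Connect Personal\n%s\n' "$path_line" >> "$rc_file"
}

# Examples:
#   add_gcp_to_path /etc/skel/.bashrc   # as root, for new users
#   add_gcp_to_path "$HOME/.bashrc"     # for an existing user
```

Running the function a second time on the same file leaves it unchanged.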