Data Transfer HERO/FLOW to CARL/EDDY

What will happen with the data on HERO/FLOW?

It depends on where the data is stored:

Data in $HOME (/user/.../abcd1234) and in /data/work/.../abcd1234 will remain accessible for at least another few months. It will not be deleted until we have confirmation that it can be deleted.

Data on the GPFS (/data/work/gpfs, FLOW users only) will remain there and stay accessible until April 30th. After that date, the GPFS will no longer be accessible, and we plan to update the system for future use, which will delete all data remaining on it.

How can I access my data on HERO/FLOW?

Currently, you can access the data on the old systems by logging in to HERO/FLOW. Data on the Isilon system ($HOME and /data/work) can also be reached, read-only, through the links provided in the $HOME directories on CARL/EDDY. The login nodes of HERO/FLOW will be accessible until April 30th. After that, the data remains accessible from the new system CARL/EDDY, with the exception of the old GPFS (/data/work/gpfs).

What is the best way of migrating my data?

There are several options, but most importantly you should first clean up your data as much as possible. Delete all data you no longer need. If you have large amounts of data that you would like to keep but do not actively use at the moment, you can:

  1. combine many smaller files into larger archives (e.g. using the tar command)
  2. compress large data files if you expect a good compression rate (e.g. using gzip or bzip2)
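
The two clean-up steps above could look roughly like the following sketch. The directory and file names (old_project, results.txt) are made up for illustration, and a temporary directory stands in for your own data so that the commands can be tried safely.

```shell
# Hypothetical demo directory standing in for a folder with many small files
demo=$(mktemp -d)
mkdir -p "$demo/old_project"
for i in 1 2 3; do
    echo "sample $i" > "$demo/old_project/run_$i.dat"
done

# 1. Combine many small files into one gzip-compressed tar archive
tar -czf "$demo/old_project.tar.gz" -C "$demo" old_project

# List the archive contents to check it before deleting the originals
tar -tzf "$demo/old_project.tar.gz"

# 2. Compress a single large data file; gzip replaces it with results.txt.gz
seq 1 1000 > "$demo/results.txt"
gzip "$demo/results.txt"
```

Only remove the original files after you have verified that the archive lists everything you expect. bzip2 can be used in the same way as gzip if it compresses your data better.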

Once you have cleaned up your data, you can copy it to the new system in one of the following ways:

  1. using rsync
  2. using the cp command on the new system with the links to the old $HOME and /data/work directory
  3. using remote copy commands like scp or sftp
  4. using SMB mounted shares on your local computer

We recommend the first two methods, and rsync in particular if you have large amounts of data, because an rsync transfer can be interrupted and continued later.

Please keep in mind that copying data puts a lot of stress on the system, in particular on the bandwidth of the Ethernet connections. This may affect the usability of the cluster for others. Try to:

  1. avoid copying data between 9am and 6pm
  2. use the nice command with your copy command
  3. take your time copying your data (unless it is on the old GPFS); there is plenty of time, and one copy at a time should be enough
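
Using nice with a copy command could look like the sketch below. The directories are placeholders created in a temporary location; the same prefix works in front of rsync or any other copy command.

```shell
# Placeholder source directory; replace with your real source and target
work=$(mktemp -d)
mkdir -p "$work/src"
echo "data" > "$work/src/a.dat"

# Run the copy with the lowest scheduling priority (niceness 19)
nice -n 19 cp -a "$work/src" "$work/dst"
```

nice lowers the CPU scheduling priority of the copy process. It does not limit network bandwidth by itself, so the other two recommendations still apply.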

Examples

will follow