Latest revision as of 09:33, 13 June 2023
Introduction
In our Wiki, you will find a lot of detailed information about working in our HPC environment. For beginners, however, it can be challenging to know where to start.
This is where our FAQ is meant to help. It is designed for beginners and links to our Wiki resources when needed.
Advanced users may also find some of the answers helpful.
If you think that important questions or answers are missing, please let us know. This Wiki is a permanent work in progress.
F.A.Q.
The very basics - about your account
Question: What exactly is an HPC Cluster?
Answer: An HPC cluster is a group of several high-performance computers. HPC clusters are used when the performance of one ordinary computer is no longer sufficient to perform (scientific) computations.
For comparison: An average, well-equipped PC has a processor (CPU) with about 4 cores and 8 - 12 gigabytes of RAM.
Our standard nodes have 24 cores and 256 GB of RAM, all of which a job can use if it occupies the complete node. (A node can be seen as a single computer within the cluster.)
Question: Am I permitted to work on the cluster?
Answer: Basically, every student or scientific staff member has the right to work on the HPC cluster, as long as the computations serve a legitimate scientific purpose (which of course includes students' education).
Nonetheless, there are a few things to consider. There are three common use cases, which we will briefly describe:
- You are writing your Bachelor / Master / PhD thesis. In this case, it is very likely that you are already part of a workgroup. Just tell your thesis advisor that you need to take your work to the cluster and create a request (see next question).
- You are taking part in a seminar that works with the cluster. In this case, you don't have to do anything. Your lecturer will take care of everything regarding the HPC login, and you will get temporary login data. Your personal user account will not be touched. But after the course, you will no longer be able or permitted to use that account!
- You are not writing a thesis, not taking part in a seminar, and not part of a workgroup, but you want to use the HPC cluster anyway. In this case, please contact us at hpcsupport@uol.de. Either you will be assigned to a suitable group (after consulting the corresponding professor) or you can get your own workgroup. Either way, we will very likely find a solution that fits your needs.
Question: I decided to work on the cluster. How do I get access?
Answer: If you want to get access to our cluster, you need to be part of a workgroup as mentioned above. If you are, you can request access via the Self Service Portal of our ServiceDesk. Since we already have a step-by-step description of how to start a request, we refer to that instruction page. If your workgroup situation is unclear, just write to us at hpcsupport@uol.de.
Question: I now have access rights for the cluster. How do I log in?
Answer: First of all, congratulations on your new HPC membership!
Now you can start working on our cluster. Depending on your operating system (Windows, Linux, or Mac), the procedure is slightly different.
If you have the choice, we always recommend Linux, since connecting from Linux to Linux is the least prone to problems (the HPC cluster environment is based on the Red Hat Enterprise Linux distribution).
To make it short: On Linux, you open a terminal and type in
ssh abcd1234@carl.hpc.uni-oldenburg.de
On Windows, you use the same address, but you additionally need an SSH client such as MobaXterm or PuTTY.
To avoid redundancy, we refer to our wiki page about login, where you can find a more detailed description of how to access the cluster with Linux and/or Windows.
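If you log in frequently from Linux or macOS, a host alias in your local OpenSSH configuration saves typing the full address. The following is a sketch, assuming OpenSSH on your own computer; the alias carl is a hypothetical name and abcd1234 stands for your own username. Run it on your local device, not on the cluster.

```bash
#!/bin/bash
# Add a host alias for the cluster to your local OpenSSH configuration.
# "carl" is a hypothetical alias; replace abcd1234 with your own username.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/config
chmod 600 ~/.ssh/config
# Only append the entry if no "Host carl" block exists yet,
# so running this twice does not create duplicates.
if ! grep -q '^Host carl$' ~/.ssh/config; then
  cat >> ~/.ssh/config <<'EOF'

Host carl
    HostName carl.hpc.uni-oldenburg.de
    User abcd1234
EOF
fi
```

Afterwards, ssh carl should behave like ssh abcd1234@carl.hpc.uni-oldenburg.de, and tools that use SSH underneath, such as rsync, accept the alias as well.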
Working on the cluster
Question: I want to start computing. What are the first steps?
Answer: Basically, you need two things:
(1) A software module and (2) a job script.
(1) Let's assume that you already successfully logged in. The first thing you need to know is which software you will need to use.
If you need an overview of the software that is currently available on our cluster, take a look at our software register or type in ml av to get a software list for your current environment.
Next, let's assume that you have chosen the software you need for your calculations, say EGSnrc.
If you want to use specific software, you always have to load it first.
Looking at the software's page, we see that it is installed in the environment hpc-env/6.4. (Fortunately, EGSnrc has a detailed software page on our wiki.)
This means, we have to load the environment first and then the software module:
module load hpc-env/6.4
module load EGSnrc
(You can abbreviate module load with ml.)
Now that EGSnrc is loaded, you could start to use it. But you are currently logged in to one of our five login nodes (hpcl001-hpcl005), so you need to move your calculations to another node. For this, you need (2) an sbatch script (or job script), with which you bundle your commands into one job, send it to another node, and allocate specific system resources to the job. After writing the script, you submit it with SLURM. We describe this procedure here, but for EGSnrc there is an additional script example.
Creating job scripts is mandatory on our cluster!
Job scripts do not just allocate system resources to your tasks; SLURM also queues every job so that the resources are shared fairly.
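To give an idea of what step (2) looks like, here is a minimal job script sketch for the EGSnrc example. The job name, task count, memory, time limit, and the file name egsnrc_job.sh are illustrative assumptions rather than recommendations, and the actual EGSnrc commands are left out; see the job script pages and the EGSnrc page of our wiki for the details.

```bash
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=egsnrc_test
# Partition to run in; carl.p is only an example
#SBATCH --partition=carl.p
# One task, 2 GB of memory (assumed values)
#SBATCH --ntasks=1
#SBATCH --mem=2G
# Time limit as days-hours:minutes:seconds (assumed value)
#SBATCH --time=0-01:00:00

# Load the environment first, then the software module
module load hpc-env/6.4
module load EGSnrc

# ... your EGSnrc commands go here (see the EGSnrc wiki page) ...
```

You would then submit it with sbatch egsnrc_job.sh and follow its progress with squeue -u $USER.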
Question: I need to work with specific software (versions). What can I do?
Answer: There are three different ways to get new software. But first, you should check whether your software is already installed:
module spider desired_software
You should also check our software register. If you are sure that we currently don't provide the software you need, you have the following options:
- Ask us to install your software package as a module. Write to us at hpcsupport@uol.de and name the software and, if you have it at hand, the source address (e.g. GitHub, homepage, etc.).
- Install it yourself with Conda.
- Create a container with your desired software package using Singularity.
Especially if you think that other scientists could also need the software, you should prefer the first option and write to us. That way, we can provide the software for everybody.
Question: I'm not at the university right now. Can I use the cluster from home?
Answer: Yes, you can.
But since the cluster may only be used from within the university's network, you will need to use a VPN client. The VPN connects your computer to the university's network, so that it acts as if you were working on campus.
When the connection is set up, you can start working as usual.
Question: I work with a significant amount of data. Does it matter where I store it?
Answer: YES, it matters!
There are four different file systems:
- $HOME (1TB): Here you store the most important and frequently used data, like scripts, results from data analysis, etc.
- Snapshots and backup system.
- $DATA (20TB): Here you can store data from simulations for ongoing analysis, etc.
- Fast read/write access, snapshots, backup system.
- $WORK (50TB): Here, you store the files during the simulations. If you need the same files or results again within the next few weeks, keep them here. If you won't touch them for a long time, please transfer the important data to $DATA and delete the rest.
- Fast read/write access, neither snapshots nor backup system.
- $SCRATCH (1-2TB per node/per job): This file system differs significantly from the others: all data on $SCRATCH is deleted immediately after the job has ended. In return, this storage is extremely fast. So use it only for high-I/O jobs (e.g. random access) and ALWAYS write your job scripts so that the results are moved to $DATA before the job ends; otherwise everything was in vain.
- USE WITH CARE! Extremely fast, but volatile storage. Neither backup nor snapshots (naturally).
For more information, take a look at the corresponding wiki page.
TLDR: Use $WORK to do simulations and store the results to $DATA.
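One way to make sure results leave $SCRATCH in time is a trap in the job script that copies them to $DATA when the script exits. The sketch below assumes your program writes into a directory $SCRATCH/results; that directory and the target $DATA/myjob_results are hypothetical names. The trap runs at the normal end of the script and also when it aborts early (e.g. under set -e), but it may not get the chance to finish if the job is killed at its time limit.

```bash
#!/bin/bash
# Copy results from the node-local $SCRATCH to $DATA before $SCRATCH is wiped.
# $SCRATCH/results and $DATA/myjob_results are hypothetical directory names.
save_results() {
  mkdir -p "$DATA/myjob_results"
  # "/." copies the directory's contents, including hidden files
  cp -r "$SCRATCH/results/." "$DATA/myjob_results/"
}

# Run save_results whenever this script exits (normal end, exit, or set -e abort).
trap save_results EXIT

# ... in the real job script: cd "$SCRATCH" and run your program, writing to results/ ...
```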
Question: How do I download files from the cluster to my local device at home/office?
Answer: That depends on the file count and size of your desired directory. In any case, you will probably end up using `rsync` to get the files.
For small to medium-sized folders with a few hundred files, you can go ahead and transfer your directory to your device directly. You can either use this command from within a bash session on the cluster:
rsync -avzh /path/to/data you@hostname:/path/to/remote/
... or the other way around, call this command from a bash session on your local device:
rsync -avzh abcd1234@carl.hpc.uni-oldenburg.de:/path/to/remote/ /path/to/local/destination
For more extensive directories, e.g. if the desired folder contains thousands or even millions of files, you will want to compress all corresponding files into an archive first and then transfer the archive file to your device:
tar -I zstd -cvf /path/to/compressed/file.tar.zst /path/to/your/data
rsync -avzh /path/to/compressed/file.tar.zst you@hostname:/path/to/remote/
This has two additional, substantial advantages:
1) Especially with many files, compression is highly efficient and reduces the amount of data being transferred. Depending on the type and number of files, a 4TB directory can easily shrink into a 2.5TB archive.
2) Transferring a few large files is significantly faster than moving many small files.
On your local device, you will have to decompress the archive accordingly, either on Windows with tools like PeaZip or 7-Zip, or from a Linux shell:
tar -I zstd -xvf /path/to/compressed/file.tar.zst -C /path/to/file/destination
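To decide which of the two approaches above fits your directory, it helps to look at the file count and size first. This is a minimal sketch; the function name transfer_advice and the threshold of 1000 files are illustrative assumptions, not a fixed rule of our cluster.

```bash
#!/bin/bash
# Rough check before downloading: how many files and how much data?
# The default threshold of 1000 files is an illustrative assumption.
transfer_advice() {
  local dir=$1 threshold=${2:-1000}
  local nfiles
  nfiles=$(find "$dir" -type f | wc -l)
  echo "files: $nfiles, size: $(du -sh "$dir" | cut -f1)"
  if [ "$nfiles" -lt "$threshold" ]; then
    echo "rsync the directory directly"
  else
    echo "pack it into a .tar.zst archive first"
  fi
}

# Example (on the cluster): transfer_advice "$DATA/my_results"
```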
Question: I accidentally deleted / overwrote an important file of mine! Is there any way to undo this mistake?
Answer: Yes, there is! (But it depends on how quickly you notice the deletion.)
On $HOME and $DATA we have a snapshot system for this very issue. Just navigate to the missing file's directory and type in
cd .snapshots
Here you will see the snapshots of the last 30 days. If the accident happened on $HOME on the same day, you even have access to hourly snapshots. Once you have chosen the right point in time, navigate to the lost file and copy it back to its original location. For more on this, see the corresponding wiki page File_system_and_Data_Management#Snapshots_on_the_ESS.
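To see at a glance which snapshots still hold a copy of a lost file, you can loop over the snapshot directories. This sketch assumes, as described above, that .snapshots contains one subdirectory per snapshot mirroring the current directory; the function name find_in_snapshots and the file name lost_file.txt are hypothetical.

```bash
#!/bin/bash
# List every snapshot that still contains a given file.
# Run this in the directory where the file used to be.
find_in_snapshots() {
  local name=$1 snap
  for snap in .snapshots/*/; do
    if [ -e "${snap}${name}" ]; then
      ls -l "${snap}${name}"
    fi
  done
}

# Example: find_in_snapshots lost_file.txt
# then copy the version you want back, e.g.
#   cp -p .snapshots/<snapshot>/lost_file.txt .
```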
Question: Why do I regularly get error messages concerning the language setting and locale?
Answer: This error is caused by a difference between the language settings of the host and the client system. It can occur, for example, if the English-configured cluster is accessed from a German-configured operating system. The solution is a single command on the cluster side:
export LC_ALL="en_US.utf8"
You can also append this command to your ~/.bashrc file if you are often confronted with the error message.
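If you add the setting to ~/.bashrc, a guarded append avoids ending up with the same line several times when you run it more than once. A minimal sketch; it takes effect at your next login or after running source ~/.bashrc.

```bash
#!/bin/bash
# Append the locale setting to ~/.bashrc only if the exact line is not there yet.
line='export LC_ALL="en_US.utf8"'
touch ~/.bashrc
grep -qxF "$line" ~/.bashrc || echo "$line" >> ~/.bashrc
```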
Groups and Accounts
Question: I'm in a new workgroup. How do I change the Unix group?
Answer: You can follow the same steps as described above under "How do I get access?". Just request access to your new workgroup and your groups will be changed automatically. It is best to mention in the web form that this is a change request.
Question: I started working at the university and now have a second account. How can I transfer files between my two accounts?
Answer: With your old account, you can transfer the files you need to your new account with rsync:
rsync -avz $HOME/source_directory abcd1234@carl.hpc.uni-oldenburg.de:/user/abcd1234/target_directory
where abcd1234 is the username of your new account.
See also: File System and Data Management
Question: How can I share files with other users on the cluster?
Answer: There are multiple ways of sharing and transferring data between users, but chmod is the most straightforward option. As an example:
# Make your $DATA directory accessible (others may enter it, but not list it)
chmod o+x $DATA
# Create a folder to share and copy your files into it
mkdir $DATA/share
cp $HOME/<your_files> $DATA/share
# Make the shared folder accessible and readable
chmod o+rx $DATA/share
With this method, every user has access to the shared folder in your $DATA directory. Should you want to share it only with colleagues from your workgroup, use g+x and g+rx instead of o+x and o+rx. This way, the files are accessible exclusively to your group members.
You can revoke the access rights by replacing the "+" with "-":
chmod o-rx $DATA/share
chmod o-x $DATA
For more information on this, you can visit our corresponding wiki page.
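To check what such chmod commands actually change, stat can print the permission string. The sketch below tries the group variant on a throwaway directory created with mktemp instead of your real $DATA, so it is safe to experiment with.

```bash
#!/bin/bash
# Try the group-sharing permissions on a temporary directory and inspect them.
# mktemp -d creates the directory with mode 700 (owner only).
demo=$(mktemp -d)
mkdir "$demo/share"
# Group members may enter (but not list) the parent directory
chmod g+x "$demo"
# Group members may list and enter the shared folder
chmod g+rx "$demo/share"
stat -c '%A %n' "$demo" "$demo/share"
```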
Question: I recently changed my research group. How can I also change my unix group on the cluster?
Answer: You can log in to the selfservice-desk of the university and request a group change there (go to IT Services and then Wissenschaftliches Rechnen, finally click on Zugang beantragen). Please note that you will have to manually change the group membership of your files and directories after you have been assigned to the new unix group.
Question: My group has changed from agold to agnew, how can I change the group ownership of my files accordingly?
Answer: You can use the command chgrp to change the group ownership of your files. For example, the command
$ chgrp -R agnew $HOME
will change the group ownership of all files and subdirectories in your $HOME to the new group agnew, regardless of which group they were assigned to before. If you only want to change the files and directories that belong to agold, use the command
$ find $HOME -group agold -exec chgrp agnew {} \;
In both cases, you probably want to use the same command also with $DATA, $WORK, and $OFFSITE.
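Instead of repeating the find command for each storage area, you can loop over them. This is a sketch with a hypothetical helper regroup; it only touches files you own (other users' files in agold cannot be changed by you anyway), skips variables that are unset on your account, and uses "-exec ... {} +" so that chgrp receives many files per call.

```bash
#!/bin/bash
# Change the group of your own files from the old to the new group
# in each given directory. Empty or non-existing directories are skipped.
regroup() {
  local old=$1 new=$2 dir
  shift 2
  for dir in "$@"; do
    [ -n "$dir" ] && [ -d "$dir" ] || continue
    # "{} +" passes many files per chgrp call (faster than "{} \;")
    find "$dir" -user "$(id -un)" -group "$old" -exec chgrp "$new" {} +
  done
}

# Example: regroup agold agnew "$HOME" "$DATA" "$WORK" "$OFFSITE"
```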
Jobs and Queue
Question: How can I see the status of my jobs in the job queue?
Answer: To list your own jobs in the queue, you can use the command
$ squeue -u $USER
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
12345678    carl.p  JobName abcd1234 PD       0:00      1 (Resources)
The state of the job is shown in the column ST and is typically R for running and PD for pending. Other states are possible but should not last very long (if this happens, please contact Scientific Computing). If you add the option -l to the squeue command, the state will be written out in full.
Using squeue --help will display all options of squeue and briefly describe what they do.
Question: My job is in the pending (PD) state, why?
Answer: The squeue command (see the previous question) prints the node list for running jobs and, in brackets, the reason why a job is pending. The most common reasons are Resources and Priority, which basically mean that your job is waiting for the requested resources. Other reasons may show up; here is a list with explanations:
- Resources: The job is waiting for resources and will start when they become available.
- Priority: Other jobs have higher priority; your job will start afterwards when resources are available.
- PartitionTimeLimit: Your job has a time limit longer than 21 days and will not start. Change the time limit to 21 days or less.
- ReqNodeNotAvail: This typically shows up when a downtime for maintenance is scheduled and a reservation is in place. Your job time limit is longer than the time until the maintenance. Unless you reduce the time limit, your job will start after the maintenance. Note that squeue will also list nodes that are unavailable for other reasons, which can be misleading.
- MaxJobsPerAccount: A limit of 250 running jobs per account is enforced. The account is your unix group agyourgroup, and all the running jobs from users of that group are counted towards this limit.
All possible reasons can be found in the SLURM documentation.
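If you have many pending jobs, a short summary per reason is easier to read than the full list. This sketch uses a hypothetical helper count_reasons that reads one reason per line, as printed by squeue with the options -h (no header), -t PENDING, and the format field %r (reason).

```bash
#!/bin/bash
# Count your pending jobs per reason, most frequent first.
# Usage on the cluster:
#   squeue -u "$USER" -h -t PENDING -o "%r" | count_reasons
count_reasons() {
  sort | uniq -c | sort -rn
}
```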
Question: My job is in the pending (PD) state, with the reason PartitionTimeLimit. What can I do?
Answer: As stated in the previous question, the time limit of your job is too long. In most partitions, the time limit is 21 days, so you need to reduce the time limit of your job to at most 21 days. This can be done with the command
scontrol update job <jobid> TimeLimit=21-0
where you have to replace <jobid> with the ID of your job (as shown, e.g., by the squeue command). You can of course set any other time limit below 21 days (it must also be shorter than the current time limit of the job).
Another common problem occurs when jobs are submitted without specifying a partition such as carl.p or eddy.p. In this case, the job is queued in the partition all_nodes.p, which allows a maximum time limit of only 1 day. You probably want to assign a different partition to the job, which can be achieved with
scontrol update job <jobid> Partition=carl.p
where again you have to replace <jobid> with the ID of your job. You can choose any available partition here; carl.p is only an example.
The commands above work on all pending jobs, so they can be used in other situations as well.
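When several jobs are affected, you can filter the job IDs by reason and feed them to scontrol in a loop. This is a sketch with a hypothetical helper jobs_with_reason that reads "JOBID REASON" lines from squeue (format fields %i and %r). Pending job arrays may show up with IDs such as 1234_[1-100]; it is safer to check those by hand.

```bash
#!/bin/bash
# Print the IDs of jobs whose pending reason equals $1.
# Input: "JOBID REASON" lines as produced by
#   squeue -u "$USER" -h -t PENDING -o "%i %r"
jobs_with_reason() {
  awk -v reason="$1" '$2 == reason { print $1 }'
}

# Example: shorten every job stuck on PartitionTimeLimit to 21 days
#   squeue -u "$USER" -h -t PENDING -o "%i %r" \
#     | jobs_with_reason PartitionTimeLimit \
#     | while read -r id; do scontrol update job "$id" TimeLimit=21-0; done
```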
Question: My job is in the pending (PD) state, with the reason MaxJobsPerAccount. What can I do?
Answer: First of all, you do not necessarily need to do anything as your jobs will start sooner or later, when other jobs running under the same account are completed. The limitation of 250 running jobs per account or group agsomegroup has been introduced to improve the fair scheduling of large parallel jobs. You can check all jobs in queue (running and waiting) for a given account with
$ squeue -A agsomegroup
You can then identify other users from your group with active jobs and discuss a fair usage of the 250 running jobs limitation within your group. You can use the job array task limit, e.g.
#SBATCH --array 1-100%20
to limit job arrays (here to 20 running tasks) or you can submit multiple jobs (or job arrays) with dependencies
$ jid=$(sbatch --parsable array_job_1.sh)
$ jid=$(sbatch --parsable --dependency afterany:$jid array_job_2.sh)
in which case job 2 would only start after job 1 is completed. You can also refer to the page How to Manage Many Jobs, in particular section 4 about the Linux parallel command.
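The two sbatch lines above generalize to a chain of any length. This sketch uses a hypothetical helper submit_chain; each script starts only after the previous job has ended (afterany, regardless of its exit state), and since --parsable may print "jobid;cluster" on multi-cluster setups, only the part before the semicolon is kept.

```bash
#!/bin/bash
# Submit job scripts one after another as a dependency chain.
# Prints the ID of the last job in the chain.
submit_chain() {
  local jid="" script
  for script in "$@"; do
    if [ -z "$jid" ]; then
      jid=$(sbatch --parsable "$script")
    else
      jid=$(sbatch --parsable --dependency=afterany:"$jid" "$script")
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the ID
    jid=${jid%%;*}
  done
  echo "$jid"
}

# Example: submit_chain array_job_1.sh array_job_2.sh array_job_3.sh
```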
Error Messages
Some error messages are hard to understand. Here are some possible solutions:
Error: "srun: error: PMK_KVS_Barrier duplicate request from task 0"
Cause: Probably you have started an Intel MPI application with mpirun while I_MPI_PMI_LIBRARY was set in your environment.
Solution: Use srun instead of mpirun, or run unset I_MPI_PMI_LIBRARY before calling mpirun.
Error: "slurmstepd: error: execve():bad interpreter(/bin/bash): no such file or directory"
Cause: This mostly happens when a script has been created or edited and saved on a Windows system. If you encounter this or a similar error message directly after submitting or executing a script, the file very likely has Windows (CRLF) line endings, which bash cannot process.
Solution: Run the command dos2unix on the corresponding script file. With file, you can check whether the file uses DOS line endings and whether the conversion worked:
$ file test_win.sh
test_win.sh: Bourne-Again shell script, ASCII text executable, with CRLF line terminators   # "with CRLF line terminators" indicates the wrong file format
$ dos2unix test_win.sh
dos2unix: converting file test_win.sh to Unix format...
$ file test_win.sh
test_win.sh: Bourne-Again shell script, ASCII text executable
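If dos2unix happens not to be available, GNU sed can remove the trailing carriage returns as well. A minimal sketch with a hypothetical helper strip_crlf; it edits the given files in place.

```bash
#!/bin/bash
# Fallback if dos2unix is not available: strip the trailing carriage
# return (\r) from every line, editing the files in place (GNU sed).
strip_crlf() {
  sed -i 's/\r$//' "$@"
}

# Find scripts in the current directory that still contain CRLF endings:
#   grep -l $'\r' *.sh
```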