Non-public templates: Coordination of Scientific Computing

 
== Usage Statistics ==
{| class="wikitable" style="text-align:right;"
!Slots/Job
!#Users
!#Jobs
!WallClock
!uTime
!sTime
!cpuTime
|-
!1               
|      22 || 137818 ||      374651 ||      178093 ||        3373 ||      343442
|-
!2               
|        5 || 4066 ||        7875 ||        7406 ||          163 ||        7606
|-
!3               
|        5 ||  109 ||          56 ||          17 ||            1 ||          19
|-
!4               
|        6 ||  105 ||        3215 ||        6785 ||          10 ||        10635
|-
!5               
|        1 ||    2 ||            0 ||            0 ||            0 ||            0
|-
!6               
|        4 ||  47 ||        1632 ||        4730 ||          70 ||        7103
|-
!8               
|        3 ||  71 ||          240 ||          429 ||            7 ||          437
|-
!9               
|        1 ||  197 ||        1086 ||        1404 ||          34 ||        1439
|-
!10             
|        2 ||    3 ||          72 ||            1 ||            0 ||            1
|-
!11             
|        1 ||  17 ||          485 ||        2613 ||          30 ||        2643
|-
!12             
|      13 ||  254 ||        16023 ||        50379 ||          491 ||      158119
|-
!13             
|        1 ||  24 ||          10 ||          65 ||          25 ||          90
|-
!15             
|        1 ||    3 ||          21 ||          215 ||            0 ||          220
|-
!16             
|        4 ||  64 ||          763 ||        8753 ||          55 ||        8858
|-
!18             
|        1 ||    1 ||            0 ||          12 ||            0 ||          12
|-
!19             
|        1 ||    1 ||            0 ||            0 ||            0 ||            0
|-
!20             
|        2 ||  28 ||          216 ||        3086 ||            9 ||        3187
|-
!25             
|        1 ||    3 ||            6 ||          54 ||          103 ||          159
|-
!32             
|        4 ||  39 ||          498 ||        5860 ||        3858 ||        9938
|-
!33             
|        1 ||    4 ||          11 ||            0 ||            0 ||            0
|-
!35             
|        1 ||    6 ||          18 ||          405 ||          28 ||          433
|-
!36             
|        2 ||    8 ||          37 ||          897 ||            0 ||          898
|-
!40             
|        1 ||    1 ||            0 ||            0 ||            0 ||            0
|-
!50             
|        1 ||    1 ||            0 ||            0 ||            0 ||            0
|-
!60             
|        1 ||    1 ||            0 ||            0 ||            0 ||            0
|-
!64             
|        2 ||    6 ||          34 ||          326 ||          302 ||          653
|-
!PE
!#Users
!#Jobs
!WallClock
!uTime
!sTime
!cpuTime
|-
!NONE           
|      16 || 10690 ||      214535 ||        49075 ||        1812 ||      150323
|-
!impi           
|        2 ||  27 ||          909 ||        2613 ||          30 ||        2643
|-
!impi41         
|        3 ||  111 ||          30 ||          302 ||          60 ||          363
|-
!mdcs           
|      10 ||  892 ||        1874 ||        3509 ||          96 ||        3606
|-
!molcas         
|        2 || 83113 ||      114559 ||      106715 ||        1106 ||      156397
|-
!mpich           
|        1 ||    1 ||            0 ||            0 ||            0 ||            0
|-
!openmpi         
|        6 ||  135 ||        1439 ||        17841 ||        4284 ||        22515
|-
!smp             
|      16 || 47910 ||        73613 ||        91483 ||        1177 ||      220052
|-
!AG
!#Users
!#Jobs
!WallClock
!uTime
!sTime
!cpuTime
|-
!agtheochem     
|        7 || 83958 ||      146407 ||      126533 ||        1665 ||      248955
|-
!agmolchem       
|        4 ||  137 ||        8594 ||        45045 ||          90 ||        90296
|-
!agses           
|        1 || 5362 ||      110898 ||          139 ||            0 ||        83363
|-
!agcompphys     
|        6 || 2418 ||        94021 ||        44115 ||        1839 ||        59053
|-
!agmediphys     
|        5 || 49938 ||        38752 ||        42342 ||          617 ||        48727
|-
!agphysocean     
|        3 ||  57 ||          536 ||        6095 ||        4255 ||        10597
|-
!agcompchem     
|        1 ||    2 ||          380 ||            0 ||            0 ||        4558
|-
!agcondmat       
|        1 ||  53 ||        5008 ||        3646 ||            0 ||        4327
|-
!agcoordchem     
|        1 ||    2 ||          374 ||            0 ||            0 ||        2301
|-
!agmodelling     
|        2 ||  223 ||        1252 ||        1484 ||          35 ||        1519
|-
!agsigproc       
|        3 ||  20 ||          305 ||          982 ||            2 ||          985
|-
!aggeneralpsych 
|        1 ||  118 ||          74 ||          586 ||          39 ||          626
|-
!agcomplexsys   
|        2 ||  49 ||          186 ||          476 ||            8 ||          484
|-
!agpsychdh       
|        1 ||  463 ||          145 ||          92 ||          11 ||          103
|-
!agancp         
|        1 ||    8 ||          24 ||            0 ||            0 ||            0
|-
!agcompint       
|        1 ||  71 ||            0 ||            0 ||            0 ||            0
|}
{| class="wikitable" style="text-align:right;"
!Node Type
!#Users
!#Jobs
!WallClock
!uTime
!sTime
!cpuTime
!Usage
|-
!mpcb           
|        4 || 43083 ||        49316 ||        28438 ||          438 ||        60210 ||  0.381850443742
|-
!mpcs           
|      40 || 99737 ||      353534 ||      241653 ||        8043 ||      459615 ||  0.451916224907
|-
!uv100           
|        2 ||  59 ||        4110 ||        1450 ||          85 ||        36076 ||  0.457593344615
|-
!TOTAL           
|      40 || 142879 ||      406961 ||      271541 ||        8568 ||      555901 || 0.440688
|}


== HPC tutorial ==

Latest revision as of 09:00, 21 March 2014

Here, for documentation, completeness and availability, I list some e-mail templates and other items I use on a regular basis.

Application for a new user account

To apply for a new user account, an eligible user needs to specify three things:

  • his/her anonymous user name in the form abcd1234,
  • the working group (or, ideally, the Unix group) he/she will be associated with, and
  • an approximate date until which the user account will be needed.
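The three items can be checked for completeness before a request is forwarded to the IT Services. The following Python sketch is a hypothetical helper, not part of any existing HPC tooling; it assumes that anonymous user names always consist of four lowercase letters followed by four digits, as in the example abcd1234.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List

# Assumed form of an anonymous university user name: four lowercase letters
# followed by four digits, as in the example "abcd1234".
USERNAME_PATTERN = re.compile(r"[a-z]{4}[0-9]{4}")


@dataclass
class AccountRequest:
    """Hypothetical container for the three items a user has to supply."""
    username: str       # anonymous user name, e.g. "abcd1234"
    unix_group: str     # working group or, ideally, its Unix group
    needed_until: date  # approximate date until which the account is needed


def request_problems(req: AccountRequest, today: date) -> List[str]:
    """Return a list of problems; an empty list means the request looks complete."""
    problems = []
    if not USERNAME_PATTERN.fullmatch(req.username):
        problems.append("user name does not have the form abcd1234")
    if not req.unix_group.strip():
        problems.append("no working group or Unix group given")
    if req.needed_until <= today:
        problems.append("end date is not in the future")
    return problems


# Example: a mistyped user name is reported, everything else is accepted.
print(request_problems(AccountRequest("abcd124", "agtheochem", date(2014, 12, 31)),
                       today=date(2014, 3, 21)))
# ['user name does not have the form abcd1234']
```

Returning a list of messages rather than a single boolean makes it easy to paste all missing items into one reply to the user.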

No university user account yet

If the user does not yet have a university-wide anonymous user account, he/she first needs to apply for one. An example e-mail with advice on how to obtain such a (guest) user account is listed below:

 
Dear Mr. NAME,

in order to obtain a user account for the HPC system, you must already have a
university-wide, anonymous user account. As a guest of a working group, you
can apply for a corresponding guest account at the IT Services. To do so,
please visit the page

http://www.uni-oldenburg.de/itdienste/services/nutzerkonto/gaeste-der-universitaet/

and select the option "Gastkonto einrichten" (set up a guest account). Start
the workflow for creating a guest account. Enter the head of the university
organizational unit that supports your project as the responsible person. Ask
this person to open the e-mail he/she receives, to follow the link it contains
and to approve the application. The account will then be created
automatically. Your anonymous user account will have the form "abcd1234".

To have your user account for the HPC system activated, please send me the
following details:

1) the anonymous user name for which the HPC account should be created,
2) the name of the working group to which you should be assigned,
3) an estimated period of validity for the required HPC account.

As soon as your HPC account has been activated, I will get back to you with
further information.

Kind regards
Oliver Melchert
  



User account HPC system: Mail to IT-Services

Once the user has supplied the above information, you can apply for an HPC user account at the IT Services using an e-mail similar to:

 
Mail to: felix.thole@uni-oldenburg.de; juergen.weiss@uni-oldenburg.de
Subject: [HPC-HERO] Setting up a user account

Dear Mr. Thole,
dear Mr. Weiss,

I hereby request the creation of an HPC account for
Mr. NAME

abcd1234; UNIX-GROUP

The account will probably be needed until DATE.

Kind regards
Oliver Melchert
   

If no suitable Unix group exists yet, send an e-mail similar to the following instead:

 
Mail to: felix.thole@uni-oldenburg.de; juergen.weiss@uni-oldenburg.de
Subject: [HPC-HERO] Setting up a user account

Hello Felix,
hello Jürgen,

I hereby request the creation of an HPC account for Mr. NAME

abcd1234

The account will probably be needed until DATE.

Mr. NAME is a member of the working group "GROUP-NAME" (GROUP-URL) headed by
Prof. GROUP-LEADER. This working group does not have its own Unix group yet!
Could a new Unix group therefore be created for the working group and
integrated into the existing group hierarchy?

I suggest the name

agUNIX-GROUP-NAME

for the Unix group. The working group belongs to Faculty FACULTY.

Kind regards
Oliver Melchert
  

User account HPC system: Mail back to user

As soon as the IT Services confirm that the account has been created, send the user an e-mail similar to the following:

 
Subject: [HPC-HERO] HPC user account

Dear Mr. NAME,

the IT Services have now activated your HPC account. Your login name
is

abcd1234

and you are assigned to the Unix group

UNIX-GROUP-NAME

You have 100GB of disk space on the local filesystem (with full
backup). If you need more storage for a limited period of time, feel
free to contact me. You can check your current disk usage on the HPC
system with "iquota". Every Sunday you will receive an e-mail with the
subject "Your weekly HPC Quota Report", which summarizes your current
disk usage.

Below is a link to our HPC user wiki, where you can find further
details about the local HPC system:
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Main_Page

The article "Brief Introduction to HPC Computing" at
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Brief_Introduction_to_HPC_Computing
illustrates some simple examples of how to use the various (mainly
parallel) application environments available on HERO and is therefore
particularly recommended. It also discusses some other topics, such as
proper resource allocation and debugging.

If you plan to use the parallel resources of MATLAB on HERO, I
recommend the articles "MATLAB Distributed Computing Server" (MDCS) at
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=MATLAB_Distributing_Computing_Server 
and "MATLAB Examples using MDCS" at
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Matlab_Examples_using_MDCS
The first article shows how to configure your local user profile for
using MATLAB on HERO, and the second contains some examples and
discusses problems that occasionally occur when working with MDCS.

Best regards
Oliver Melchert
  

Alternative English version of the above e-mail:

 
Betreff: [HPC-HERO] HPC user account

Dear NAME,

the IT Services have now activated your HPC account. Your login name for
the HPC system is

abcd1234

and you are assigned to the Unix group

UNIX-GROUP-NAME

By default, you have 100GB of storage on the local filesystem, which is fully
backed up. If you need more storage for a limited period of time, you can
contact me. Note that you can check your disk usage on the HPC system via the
command "iquota". In addition, every Sunday you will receive an e-mail titled
"Your weekly HPC Quota Report", summarizing your current disk usage.

Below is a link to the HPC user wiki, where you can find further details on
the HPC system:
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Main_Page

In particular, I recommend the "Brief Introduction to HPC Computing" at
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Brief_Introduction_to_HPC_Computing
which illustrates several basic examples related to the different (mostly
parallel) environments the HPC system HERO offers and discusses a variety of
other topics, e.g., proper resource allocation and debugging.

Further, if you plan to use the parallel capabilities of MATLAB on HERO, I
recommend the "MATLAB Distributed Computing Server" (MDCS) page at 
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=MATLAB_Distributing_Computing_Server 
and the "MATLAB Examples using MDCS" wiki page at
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Matlab_Examples_using_MDCS
These pages summarize how to properly set up your profile for using MATLAB on
HERO and discuss some frequently encountered problems.

With kind regards
Oliver
  

User account HPC system: Mail back to user; Faculty 2 (STATA users)

New users from Faculty 2 most likely want to use the STATA software. An adapted version of the above e-mail reads:

 
Dear USER_NAME,

the IT Services have already activated your HPC account. Your login name for
the HPC system is

LOGIN_NAME

and you are assigned to the Unix group

UNIX_GROUP

This is also reflected in the directory structure of the filesystem on the HPC system.

By default, you have 100GB of storage on the local filesystem, which is fully
backed up. If you need more storage for a limited period of time, you can
contact me. Note that you can check your disk usage on the HPC system via the
command "iquota". In addition, every Sunday you will receive an e-mail titled
"Your weekly HPC Quota Report", summarizing your current disk usage.

Below is a link to the HPC user wiki, where you can find further details on
the HPC system:

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Main_Page

If you plan to use the parallel capabilities of STATA on HERO, I recommend the
"STATA" entry at

Main Page > Application Software and Libraries > Mathematics/Scripting > STATA

see: http://wiki.hpcuser.uni-oldenburg.de/index.php?title=STATA
The above page summarizes how to access the HPC system and how to successfully
submit a STATA job.

With kind regards
Dr. Oliver Melchert
  

Temporary extension of disk quota

Sometimes a user from the theoretical chemistry group needs a temporary extension of the available backed-up disk space. Ask the user to provide

  • the total amount of disk space needed (the user can check the current limit with the Unix command iquota), and
  • an estimated date until which the extension is required.

Mail to IT Services

Then send an e-mail similar to the one listed below to the IT Services:

 
Mail to: felix.thole@uni-oldenburg.de; juergen.weiss@uni-oldenburg.de
Subject: [HPC-HERO] Increase of the available disk space of a user

Hello Felix,
hello Jürgen,

the HPC user NAME

abcd1234; UNIX-GROUP

has asked for a temporary increase of his disk quota. He requests an
increase to a total volume of

500GB

which will be needed until the end of December 2013. Afterwards, he can
archive the data accordingly, and the disk quota can be reset again.

Best regards,
Oliver
  

List of users with nonstandard quota

Users that currently enjoy an extended disk quota:

 
NAME                                      ID        QUOTA  VALID UNTIL
jan.mitschker@uni-oldenburg.de            dumu7717  1TB    no limit given
hendrik.spieker@uni-oldenburg.de          rexi0814  300GB  end of September 2013
wilke.dononelli@uni-oldenburg.de          juro9204  700GB  end of December 2013
eike.mayland.quellhorst@uni-oldenburg.de  auko1937  500GB  end of March 2014
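Since temporary extensions are meant to be reset once they run out, the list above has to be checked periodically. A minimal Python sketch, assuming that a month given as the limit means the last day of that month; the data structure and function names are hypothetical:

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical transcription of the list above:
# (user ID, extended quota, (year, month) until which it is granted, or None).
EXTENDED_QUOTAS: List[Tuple[str, str, Optional[Tuple[int, int]]]] = [
    ("dumu7717", "1TB", None),  # no limit given
    ("rexi0814", "300GB", (2013, 9)),
    ("juro9204", "700GB", (2013, 12)),
    ("auko1937", "500GB", (2014, 3)),
]


def end_of_month(year: int, month: int) -> date:
    """Return the last day of the given month."""
    if month == 12:
        return date(year, 12, 31)
    return date(year, month + 1, 1) - timedelta(days=1)


def expired_extensions(entries, today: date) -> List[str]:
    """Return the user IDs whose quota extension ended before `today`."""
    return [uid for uid, _quota, until in entries
            if until is not None and end_of_month(*until) < today]


print(expired_extensions(EXTENDED_QUOTAS, today=date(2014, 3, 21)))
# ['rexi0814', 'juro9204']
```

Entries without a given limit are never reported, so they still need to be reviewed by hand.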

Cluster downtime

If a maintenance downtime of the cluster is necessary, send an e-mail similar to the following to the HPC users' mailing list:

 
Mail to: hpc-hero@listserv.uni-oldenburg.de
Betreff: [HPC-HERO] Maintenance downtime 11-13 June 2013 (announcement)

Dear Users of the HPC facilities,

this is to inform you about an overdue THREE-DAY MAINTENANCE DOWNTIME

FROM: Tuesday 11th June 2013, 7 am
TO: Thursday 13th June 2013, 4 pm

This downtime window is required for essential maintenance work on particular
hardware components of HERO. The scheduled downtime will ultimately fix
longstanding issues caused by malfunctioning network switches. Please note
that all running jobs that have not finished by 11th June, 7 am will be
killed. During the scheduled downtime, all queues and filesystems will be
unavailable. We expect the HPC facilities to resume service on Thursday
afternoon.

I will remind you about the upcoming three-day maintenance downtime at
irregular intervals.

Please accept my apologies for any inconvenience caused
Oliver Melchert
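Since running jobs that have not finished when the window opens are killed, users can work out the latest safe start time from the run time they request for a job. A minimal Python sketch using the window announced above; it assumes that the requested run time (for example a hard limit set via the SGE resource h_rt) is an upper bound on the actual run time, and the function names are hypothetical:

```python
from datetime import datetime, timedelta

# Maintenance window from the announcement above.
DOWNTIME_START = datetime(2013, 6, 11, 7, 0)   # Tuesday, 7 am
DOWNTIME_END = datetime(2013, 6, 13, 16, 0)    # Thursday, 4 pm


def latest_start(requested_runtime: timedelta,
                 downtime_start: datetime = DOWNTIME_START) -> datetime:
    """Latest start time at which a job with the given requested run time
    still ends before the downtime begins."""
    return downtime_start - requested_runtime


def finishes_in_time(start: datetime, requested_runtime: timedelta,
                     downtime_start: datetime = DOWNTIME_START) -> bool:
    """True if a job started at `start` ends no later than the downtime start."""
    return start + requested_runtime <= downtime_start


print(DOWNTIME_END - DOWNTIME_START)      # 2 days, 9:00:00 (57 hours)
print(latest_start(timedelta(hours=48)))  # 2013-06-09 07:00:00
```

Queueing delays are not taken into account: a job submitted before the latest start time may still begin too late.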
  

If the downtime needs to be extended, send an e-mail similar to:

 
Mail to: hpc-hero@listserv.uni-oldenburg.de
Betreff: [HPC-HERO] Delay returning the HPC system HERO to production status

Dear Users of the HPC Facilities,

we are currently experiencing a DELAY IN RETURNING THE HPC SYSTEM TO PRODUCTION
STATUS, since the necessary replacement of hardware components took longer than
originally expected. The HPC facilities are now expected to resume service
by

Friday 14th June 2013, 15:00 

We will notify you as soon as everything is back online. 

With kind regards
Oliver Melchert
  

At this stage, you do not need to supply many details. However, if a further extension becomes necessary, you should provide some details; otherwise, be prepared for complaints from the users. Such an e-mail could look similar to:

 
Mail to: hpc-hero@listserv.uni-oldenburg.de
Betreff: [HPC-HERO] Further delay returning the HPC system HERO to production status

Dear Users of the HPC Facilities,

as already communicated yesterday, we are currently experiencing a DELAY IN
RETURNING THE HPC SYSTEM TO PRODUCTION STATUS. The delay results from
difficulties related to the maintenance work on the hardware components of HERO.

The original schedule for the maintenance work could not be kept. Some details
of the maintenance process are listed below:

According to the IT Services, the replacement of the old (malfunctioning)
network switches by IBM engineers worked out well (with no delay). However, the
configuration of the components by pro-com engineers took longer than the
previously estimated single day, causing the current delay. Once the
configuration process is completed, the IT Services staff needs to perform
several tests, firmware updates and application tests, which will take
approximately one day. After this last step is completed, the HPC facilities
will finally return to production status.

In view of the above difficulties, we ask for your understanding that the HPC
facilities will not be back up by 15:00 today. We hope that the HPC facilities
will resume service by

Monday 17th June 2013, 16:00 

We will notify you as soon as everything is back online and apologize for the 
inconvenience.
 
With kind regards
Oliver Melchert
  

Once the HPC system is up and running again, send an e-mail similar to:

 
Mail to: hpc-hero@listserv.uni-oldenburg.de
Betreff: [HPC-HERO] HPC systems have returned to production

Dear Users of the HPC Facilities,

this is to inform you that the maintenance work on the HPC systems has been
completed and the HPC component HERO has returned to production: HERO accepts
logins and has already started to process jobs.

Thank you for your patience, and please accept my apologies for the extension
of the maintenance downtime and any inconvenience this might have caused.
Oliver Melchert
  


MOLCAS academic license

My question to the MOLCAS contact

 
Dear Dr. Veryazov,

my name is Oliver Melchert, and I am currently in charge of the coordination of
scientific computing at the University of Oldenburg. This position was
previously held by Reinhard Leidl, who corresponded with you.

I am writing to you because I have a question regarding a licensed software
product that was purchased earlier for our local HPC facilities.

The software product I am referring to is the quantum chemistry software MOLCAS,
for which we hold an academic group license that will expire on 18.10.2013.

My question is: what steps do I have to follow in order to extend the validity
of the license, and could you guide me through these steps?

With kind regards
Dr. Oliver Melchert  

And their response

 
Dear Dr. Melchert,
In order to update the academic license for Molcas you should place a 
new order http://www.molcas.org/order.html
Please, print and sign the forms generated during the ordering.
These forms should be sent to me (e-mail attachment is OK).
After receiving the forms I will send you the updated license file.

There are two possibilities for the payment. By default - we will send 
you an invoice to be paid via bank transfer.
It is also possible to pay by a credit card.

     Best Regards,
                 Valera.

-- 
=================================================================
Valera Veryazov         * Tel:   +46-46-222 3110
Theoretical Chemistry   * Fax:   +46-46-222 8648
Chemical Center,        *
P.O.B. 124              * Valera.Veryazov@teokem.lu.se
S-221 00 Lund, Sweden   * http://www.teokem.lu.se/~valera

About MOLCAS: http://www.molcas.org/
-----------------------------------------------------------------
  


Large Matlab Jobs

Some Matlab users submit jobs with the maximum allowed number of workers (the Matlab term for what the scheduler calls slots), i.e. 36. Such jobs usually get distributed over many hosts, e.g.:

 
job-ID  prior   name       user         state submit/start at     queue                  master ja-task-ID 
----------------------------------------------------------------------------------------------------------
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs004 MASTER        
                                                                  mpc_std_shrt.q@mpcs004 SLAVE         
                                                                  mpc_std_shrt.q@mpcs004 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs008 SLAVE         
                                                                  mpc_std_shrt.q@mpcs008 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs032 SLAVE         
                                                                  mpc_std_shrt.q@mpcs032 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs034 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs036 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs038 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs043 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs045 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs052 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs066 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs070 SLAVE         
                                                                  mpc_std_shrt.q@mpcs070 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs076 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs080 SLAVE         
                                                                  mpc_std_shrt.q@mpcs080 SLAVE         
                                                                  mpc_std_shrt.q@mpcs080 SLAVE         
                                                                  mpc_std_shrt.q@mpcs080 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs087 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs089 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs090 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs091 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs099 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs107 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs110 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs111 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs112 SLAVE         
                                                                  mpc_std_shrt.q@mpcs112 SLAVE         
1040328 0.51109 Job16      nixi9106     r     10/07/2013 18:19:48 mpc_std_shrt.q@mpcs117 SLAVE         
                                                                  mpc_std_shrt.q@mpcs117 SLAVE         
                                                                  mpc_std_shrt.q@mpcs117 SLAVE         
                                                                  mpc_std_shrt.q@mpcs117 SLAVE         
                                                                  mpc_std_shrt.q@mpcs117 SLAVE         
                                                                  mpc_std_shrt.q@mpcs117 SLAVE    
  

If such jobs perform a lot of I/O, this puts a heavy strain on the filesystem. Moreover, these large jobs suffer from the "parallel job memory issue": the master process has to account (in terms of memory) for all connections to the other hosts, and if it runs out of memory, the job gets killed. Jobs with 8 slots are more common, and jobs with even fewer slots are more common still.
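To see how widely such a job is spread, and thus how many remote hosts the master process has to stay connected to, the listing can be summarized per host. A minimal Python sketch; it assumes the qstat output layout shown above, in which each task line contains a queue instance of the form queue@host followed by MASTER or SLAVE, and it treats each SLAVE entry as one slot. The function name is hypothetical:

```python
import re
from collections import Counter
from typing import Optional, Tuple

# A task line contains a queue instance such as "mpc_std_shrt.q@mpcs004"
# followed by the task type (MASTER or SLAVE).
TASK_RE = re.compile(r"\S+\.q@(\S+)\s+(MASTER|SLAVE)\b")


def summarize_job_layout(listing: str) -> Tuple[Optional[str], Counter]:
    """Return (master host, number of SLAVE entries per host) for the
    qstat-style listing of a single job."""
    master: Optional[str] = None
    slaves: Counter = Counter()
    for line in listing.splitlines():
        match = TASK_RE.search(line)
        if match is None:
            continue  # header, separator or unrelated line
        host, task_type = match.groups()
        if task_type == "MASTER":
            master = host
        else:
            slaves[host] += 1
    return master, slaves


# Usage on an excerpt of the listing above.
EXCERPT = """\
1040328 0.51109 Job16 nixi9106 r 10/07/2013 18:19:48 mpc_std_shrt.q@mpcs004 MASTER
                                                     mpc_std_shrt.q@mpcs004 SLAVE
1040328 0.51109 Job16 nixi9106 r 10/07/2013 18:19:48 mpc_std_shrt.q@mpcs008 SLAVE
"""
master, slaves = summarize_job_layout(EXCERPT)
print(master, sum(slaves.values()), len(slaves))  # mpcs004 2 2
```

Applied to the full listing above, this gives the master host mpcs004 and 36 SLAVE entries spread over 23 hosts.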


Login problems

Every now and then, external guests or regular users try to log in to the HPC system from outside the university, and the straightforward attempt via

 ssh abcd1234@hero.hpc.uni-oldenburg.de

fails, of course. The user might then report:

 
Dear Oliver,

My name is Pavel Paulau, 
I tried today for the first time to log in in cluster: 

ssh exwi4008@hero.hpc.uni-oldenburg.de 

and got message: 
"Permission denied, please try again."

Could You say what is the reason? What should I do to get access?

Thanks.
Kind wishes,
Pavel
  

A possible response might then read:

 
Dear Pavel,

at first sight, your command looks right, provided that you try to log in to
the HPC system from a terminal within the University of Oldenburg. I also
checked that your HPC account indeed exists (and it does :)).

As pointed out in the HPC user wiki, it makes a difference whether you attempt
to log in from a terminal within the University of Oldenburg or from outside
the university:
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Logging_in_to_the_system#From_within_the_University_.28intranet.29 

If you want to log in to the HPC system from outside the university, I
recommend setting up a VPN connection via the gateway vpn2.uni-oldenburg.de, as
pointed out in the above user-wiki entry. However, sometimes the login
procedure fails even though the VPN tunnel is set up correctly, due to problems
resolving the hostname. In that case, you might try to access the cluster via
the IP address of the master node. Just establish the VPN tunnel and try to
access HERO via

ssh exwi4008@10.140.1.61

This should work around the name resolution issue.

With kind regards
Oliver
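The workaround from the reply can be scripted. A minimal Python sketch that prefers the host name and falls back to the IP address of the master node when name resolution fails; the helper name is hypothetical, the addresses are those quoted above and may have changed, and it does not check whether the VPN tunnel is actually up:

```python
import socket
from typing import Callable

# Host name and master-node IP address quoted in the reply above
# (both may have changed since).
HOSTNAME = "hero.hpc.uni-oldenburg.de"
FALLBACK_IP = "10.140.1.61"


def login_target(user: str,
                 hostname: str = HOSTNAME,
                 fallback_ip: str = FALLBACK_IP,
                 resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Return the user@host argument for ssh: the host name if it can be
    resolved, otherwise the IP address of the master node."""
    try:
        resolve(hostname)
        return f"{user}@{hostname}"
    except socket.gaierror:
        # Name resolution failed; fall back to the IP address.
        return f"{user}@{fallback_ip}"
```

The returned string is the argument for ssh, e.g. exwi4008@10.140.1.61 when the host name cannot be resolved. The resolver is a parameter so that the fallback can be tested without network access.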
  

HPC tutorial

Requesting seminar rooms

 
Hello Mr. Melchert,
I have entered the bookings!

Regards
Silke Harms


Room and Event Office
Division 4 / Facility Management
Carl von Ossietzky Universität Oldenburg

Phone: 0441 / 798-2483


-----Original Message-----
From: Oliver Melchert
Sent: Thursday, 24 October 2013 10:28
To: Silke Ulrike Harms
Subject: RE: Room request for a single/block event

Hello Ms. Harms,

thank you very much for the list of available slots. After consulting my colleague, we would like to book the following rooms/times:

W04 1-162:
Tue: 19.11.13 - 14-16 h
Wed: 20.11.13 - 16-18 h

W01 0-008:
Thu: 21.11.13 - 09-12 h

No course number is assigned to the single/block event; it is a pilot project that, if successful, is to be offered regularly (then with a course number) in the coming semesters. The title of the event is "A brief HPC Tutorial", and it is offered by Dr. Oliver Melchert and Dr. Stefan Albensoeder.

With warm regards
Oliver Melchert

________________________________________
From: Silke Ulrike Harms
Sent: Wednesday, October 23, 2013 10:09 AM
To: Oliver Melchert
Subject: RE: Room request for a single/block event

Hello Mr. Melchert,
you can view the available times for the individual rooms in Stud.IP (under room bookings).
I have now looked up the current gaps for you:
W01 0-008
Mon 11.11.13 - 10-12 h, Tue 12.11.13 - 12-14 h, Wed 13.11.13 - 08-10 + 16-18 h, Thu 14.11.13 - 08-10 + 14-16 h, Fri 15.11.13 - 12-14 h, Mon 18.11.13 - from 16 h, Tue 19.11.13 - 12-14 h, Wed 20.11.13 - 08-10 h, Thu 21.11.13 - 08-12 + 14-16 h, Fri 22.11.13 - from 12 h, Mon 25.11.13 - from 16 h, Tue 26.11.13 - 12-14 h, Wed 27.11.13 - 08-10 h, Thu 28.11.13 - 08-12 + 14-16 h

W04 1-162
Tue 12./19./26.11.13 - 14-16 h each
Wed 13./20./27.11.13 - from 16 h each
Fri 15./22./29.11.13 - from 14 h each

Please decide quickly, as there are currently still many requests/bookings.

Regards
Silke Harms


-----Original Message-----
From: Oliver Melchert
Sent: Tuesday, 22 October 2013 11:13
To: Silke Ulrike Harms
Subject: RE: Room request for a single/block event

Dear Ms. Harms,

thank you very much for your reply. Due to vacation/illness, my colleague and I were unable to commit to any of the proposed dates.

I would therefore like to submit a new request for the same rooms (see the e-mail attached below) for the period

11.11.2013 - 29.11.2013

With warm regards
Oliver Melchert

________________________________________
From: Silke Ulrike Harms
Sent: Monday, September 23, 2013 1:42 PM
To: Oliver Melchert
Subject: RE: Room request for a single/block event

Hello Mr. Melchert,
I can offer you the following rooms:
Monday 14 October 2013
14-20 h - W04 1-162 + 16-20 h - W01 0-008
Tuesday 15 October 2013
14-16 h - W04 1-162 + 18-20 h - W01 0-008
Wednesday 16 October 2013
16-20 h - W04 1-162 + 08-12 h - W01 0-008

Please let me know which bookings I should make and under which number I can find your event (and should book the rooms).

Regards
Silke Harms


-----Original Message-----
From: Oliver Melchert
Sent: Friday, 20 September 2013 10:00
To: Silke Ulrike Harms
Subject: Room request for a single/block event

Dear Ms. Harms,

my name is Oliver Melchert, and I currently hold the position of "Coordinator for Scientific Computing". Together with a colleague, I would like to offer the single/block event "A brief HPC tutorial" for the users of the Oldenburg high-performance computer. We are planning a duration of 4 x 1.5 hours for the event. Ideally, we could offer 2 x 1.5 h on each of two consecutive days.

We expect at most 30 participants and are looking for a suitable room for our event. If possible, we would like to offer the event on two consecutive days in October/November, in the weeks

14.10. - 19.10.
or
28.10. - end of November

In addition to lectures, the event is also meant to include practical exercises. It would therefore be ideal if we could move to a computer lab for the last 1.5 h block. Since most users are located at the Wechloy campus, it would be great if we could find a seminar room there. Suitable rooms would be, e.g.,

W2-1-143
W2-1-148
W3-1-156
W4-1-162

A suitable computer lab would be, e.g.,

W1-0-008

We are flexible regarding the time of day. Is there still room to accommodate our planned event?

Kind regards
Dr. Oliver Melchert
  

Mail to users

 
Subject: [HPC-HERO] Tutorial on High Performance Computing (19.-21. Nov)

Dear User of the HPC System,

this is to announce the first tutorial on "High Performance Computing", which
will take place from 19.11.2013 to 21.11.2013. The tutorial will be split into
three sessions. The first two sessions cover parts 0.-IV. (listed below) and
will be held on the following dates:

Seminar room W04 1-162:
Tue, 19.11.13 - 14:00-16:00
Wed, 20.11.13 - 16:00-18:00

The third session (part V.) comprises practical exercises that illustrate some
of the content presented in the earlier parts and will be held in:

Computer lab W01 0-008:
Thu, 21.11.13 - 09:00-12:00

The target audience of this first HPC tutorial is new users of the local HPC
system. To benefit from the tutorial, participants should be able to read and
write C programs. However, we are optimistic that we will soon be able to
announce a similar tutorial for Matlab-focused users. If you would like to
attend the HPC tutorial, please send a brief reply to this email.

The planned programme of this first HPC tutorial is:

0. Introduction to HPC
   1. Motivation
   2. Architectures
   3. Overview of parallel models

I. Cluster Overview:
   1. System Overview
   2. Modification of user environments via "module"
   3. Available compiler
   4. Available parallel environments
   5. Available Libraries
   6. Performance hints

II. Introduction to the usage of SGE:
    1. Introduction
    2. General job submission
    3. Single-slot jobs
    4. Parallel jobs
    5. Monitoring and controlling jobs
      
III. Debugging and Profiling:
    1. Compiling programs for debugging
    2. Tracking memory issues
    3. Profiling

IV. Misc:
    1. Logging in from outside the university
    2. Mounting the HPC home directory
    3. Parallel environment memory issue
    4. Importance of allocating proper resources
   
V. Exercises (Computer-Lab):
    1. Try out the examples given in part II
    2. Estimate pi using a Monte Carlo simulation
       (code provided in serial and parallel (MPI) versions;
       compile, submit and monitor jobs for different
       parameter settings)

With kind regards 
Oliver Melchert and Stefan Albensoeder
  

Confirmation for users

 
Dear USER,

this is to confirm your registration for the first tutorial on
"High Performance Computing", which will be held on the following
dates:

Seminar room W04 1-162:
Tue, 19.11.13 - 14:00-16:00
Wed, 20.11.13 - 16:00-18:00

Computer lab W01 0-008:
Thu, 21.11.13 - 09:00-12:00

Thank you for signing up
Oliver Melchert and Stefan Albensoeder
  

Mail to IT Services

Contact the IT Services and ask them to make sure that the participants of the HPC tutorial can log on to the HPC system from the computer lab.

 
Hi Oliver,

we have enabled access for the subnet.
Could you check whether everything works?

I can't attend your session today, as I already have an appointment at 17:00.

Best regards
Felix

-----Original Message-----
From: Oliver Melchert
Sent: Wednesday, 20 November 2013 10:14
To: Jürgen Weiß; Felix Thole
Subject: IP addresses in room W01-0-008

Hi Jürgen,
hi Felix,

I looked up the IP addresses of the computers in room W01-0-008.
Their first 3 octets are:

134.106.45.XXX

The exercises are to take place in this room tomorrow
from 9:00 to 12:00.

Is the above information sufficient, or should I
send an exact list of the complete IP addresses?

Best regards
Oliver

User-Wiki entry

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=HPC_Tutorial_No1

Corresponding mail to all users


 
Subject: [HPC-HERO] Accompanying documents for HPC Tutorial

Dear User of the HPC System,

this is to inform you that the User Wiki page that collects the material 
related to the first tutorial on "High Performance Computing", which
took place from 19.11.2013 to 21.11.2013, is available at

Main Page > Basic Information > Examples > HPC Tutorial No1

under the link

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=HPC_Tutorial_No1

We would like to thank all users of the HPC components FLOW/HERO who
attended this first HPC tutorial. We are looking forward to hosting a
further workshop tailored to users of the "MATLAB Distributed Computing
Server" (MDCS) at the end of January 2014 (more information will follow
in due time).

Best regards
Oliver Melchert and Stefan Albensoeder
  

List of HPC publications

This page is intended to list publications that were supported by simulations on the HPC components FLOW/HERO. If you want to contribute to this list, please send an e-mail with subject:

[HPC-HERO or HPC-FLOW] Contribution to the list of HPC publications

to the coordinator of scientific computing (position currently filled on an interim basis by: oliver.melchert@uni-oldenburg.de). It would be highly appreciated if you could include the digital object identifier (DOI) of your article in that mail. If the journal in which you published your article(s) offers a citation export, you may alternatively send the citation in one of the formats it supports (preferably BibTeX).

NOTE: We kindly ask you to acknowledge the HPC components FLOW/HERO within research articles that were supported by simulations on the HPC facilities.

2012

  1. Claussen, G. and Apolo, L. and Melchert, O. and Hartmann, A. K.,
    Analysis of the loop length distribution for the negative-weight percolation problem in dimensions d=2 through d=6,
    Physical Review E 86, 5 (2012), 10.1103/PhysRevE.86.056708.

2013

  1. Melchert, O.,
    Percolation thresholds on planar Euclidean relative-neighborhood graphs,
    Physical Review E 87, 4 (2013), 10.1103/PhysRevE.87.042106.
  2. Melchert, O. and Hartmann, A. K.,
    Information-theoretic approach to ground-state phase transitions for two- and three-dimensional frustrated spin systems,
    Physical Review E 87, 2 (2013), 10.1103/PhysRevE.87.022107.
  3. Melchert, O.,
    Universality class of the two-dimensional randomly distributed growing-cluster percolation model,
    Physical Review E 87, 2 (2013), 10.1103/PhysRevE.87.022115.
  4. Norrenbrock, C. and Melchert, O. and Hartmann, A. K.,
    Paths in the minimally weighted path model are incompatible with Schramm-Loewner evolution,
    Physical Review E 87, 3 (2013), 10.1103/PhysRevE.87.032142.
  5. Melchert, O. and Hartmann, A. K.,
    Typical and large-deviation properties of minimum-energy paths on disordered hierarchical lattices,
    The European Physical Journal B 86, 7 (2013), 10.1140/epjb/e2013-40230-1.


List of user wiki pages

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Brief_Introduction_to_HPC_Computing

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Matlab_Examples_using_MDCS

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Queues_and_resource_allocation

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Unix_groups

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Mounting_Directories_of_FLOW_and_HERO#OSX

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=File_system (Snapshot functionality)

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=STATA

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Memory_Overestimation

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Debugging

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Profiling_using_gprof

http://wiki.hpcuser.uni-oldenburg.de/index.php?title=HPC_Tutorial_No1

MISC

Limited resource quota sets

Slot limits for the different user groups are set using resource quota sets (RQS). The current rule sets can be listed with qconf -srqs:

 
alxo9476@hero01:~$ qconf -srqs
{
   name         max_slots_for_express_queue_FLOW
   description  "limits number of slots for express queue on FLOW"
   enabled      TRUE
   limit        users {@flowusers} queues {cfd_xtr_expr.q} to slots=40
}
{
   name         max_slots_for_pe_mdcs
   description  "limits number of slots for PE mdcs"
   enabled      TRUE
   limit        users {*} pes {mdcs} to slots=36
}
{
   name         max_slots_for_user_groups_HERO
   description  "limits number of slots of users on HERO"
   enabled      TRUE
   limit        users {@herousers} to slots=360
}