
Minerva Quick Start

The Minerva HPC complex consists of two major partitions:

The Chimera partition:

  • 286 compute nodes - 48 Intel 8168 cores (2.7GHz) and 192 GB memory
  • 4 high memory nodes - 48 Intel 8168 cores (2.7GHz) and 1.5 TB memory
  • 48 V100 GPUs in 12 nodes - 32 Intel 6142 cores (2.6GHz) and 384 GB memory per node, with 4x V100 GPUs (16 GB each)

The BODE2 partition:

  • 78 compute nodes - 48 Intel 8268 cores (2.9GHz) and 192 GB memory
  • Only BODE-enabled users have access to the BODE2 partition

Connecting to Minerva

    For security, Minerva uses the Secure Shell (ssh) protocol and two-factor authentication. Unix systems typically have an ssh client already installed. Windows users can download one of several free ssh clients, such as PuTTY.

    Two-factor authentication requires you to enter a password that combines your Sinai password with a generated token (either a software or a hardware token).

    Software token:
    On Android and iPhone, the application is called "VIP Access" and is published by Symantec. BlackBerry, Windows Mobile, and other platforms are also supported.

    Hardware token:
    Hardware tokens are normally obtained from the IT Helpdesk, but none are currently available.

    To set up two-factor authentication, visit the ASCIT website.

    Logging in from on-site and off-site

    All users can log in to the Minerva cluster via ssh to minerva.hpc.mssm.edu. When you log in from off-site, you will not be able to reach the Sinai campus network, so some services will not be accessible.

    For example:

    > ssh your_userid@minerva.hpc.mssm.edu
    Password: > your_Sinai_password123456

    (The > sign indicates what you type; 123456 represents the numeric code obtained from your token.)

    External groups and visiting faculty or students who have a YubiKey instead of a token should modify the ssh command and password as follows:

    > ssh your_us+yldap@minerva.hpc.mssm.edu
    Password: > your_Sinai_passwordYUBIKEY

    YUBIKEY represents pressing the button on your YubiKey while it is inserted into a USB port on your computer.

    For more information on logging in, see the Logging In page.

    File System

    • /hpc/users/<userid>: User HOME directories. 20GB quota. NOT purged; backed up. Generally used for the ‘rc’ and configuration files of various programs.
    • /sc/hydra/work/<userid>: A WORK directory for each user. 100GB quota. NOT purged and NOT backed up. Can be used for any purpose.
    • /sc/hydra/scratch/<userid>: A folder for each user inside /sc/hydra/scratch. The /sc/hydra/scratch area has a 100TB quota shared by all users. Use it instead of /tmp for temporary files and for short-term storage of up to 14 days. Files older than 14 days are purged automatically by the system.
    • /sc/hydra/projects/<projectid>: A directory for each approved project. The quota is set to the project's approved allocation. NOT purged, but NOT backed up.
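Because /sc/hydra/scratch is purged after 14 days, it can help to check which of your files are approaching that limit. The sketch below assumes the purge is based on file modification time, which is not stated here; list_old_files is a hypothetical helper name, and by default it lists files in your scratch folder last modified more than 12 whole days ago.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose modification
# time is more than N whole days (24-hour periods) in the past.
# Assumption: the scratch purge uses modification time; check with the SC staff.
list_old_files() {
  dir="${1:-/sc/hydra/scratch/$USER}"   # default: your scratch folder
  days="${2:-12}"                       # default threshold: 12 days
  # find -mtime +N matches files whose age, truncated to whole days, exceeds N
  find "$dir" -type f -mtime +"$days" -print
}

# Example: list_old_files /sc/hydra/scratch/"$USER" 12
```

Copying anything still needed to WORK or a project directory before the 14-day mark avoids losing it to the purge.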

    Queues

    The following queues are available. The default memory per core is 3000MB for all queues.

    Queue         Description                                                                   Max walltime
    premium       High-priority jobs (APS doubled to 200), charged at 150% of allocation rate   144 hrs
    express       Jobs requiring less than 12 hours of walltime                                 12 hrs
    interactive   Jobs running in interactive mode                                              12 hrs
    long          Jobs requiring more than 144 hours of walltime                                2 weeks
    gpu           Jobs running on GPU nodes                                                     144 hrs
    private       Jobs using dedicated resources                                                unlimited
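To catch a walltime request that exceeds a queue's limit before submitting, the limits in the queue table can be transcribed into a small check. This is a sketch, not an LSF feature: queue_max_minutes, to_dec and fits_queue are hypothetical helper names, the limits are copied from the table (2 weeks taken as 336 hours), and the walltime is parsed in the [HHH:]MM form used by bsub -W.

```shell
#!/bin/sh
# Max walltime in minutes for each queue, copied from the queue table.
queue_max_minutes() {
  case "$1" in
    express|interactive) echo $((12 * 60)) ;;
    premium|gpu)         echo $((144 * 60)) ;;
    long)                echo $((14 * 24 * 60)) ;;   # 2 weeks = 336 hours
    *)                   return 1 ;;                  # unknown queue
  esac
}

# Strip leading zeros so "08" is not read as an invalid octal number.
to_dec() {
  n=$(printf '%s' "$1" | sed 's/^0*//')
  echo "${n:-0}"
}

# Succeed if a walltime in [HHH:]MM form fits within the queue's limit
# (a request equal to the limit is allowed).
fits_queue() {
  [ "$1" = private ] && return 0            # private queue: unlimited
  max=$(queue_max_minutes "$1") || return 1
  case "$2" in
    *:*) h=$(to_dec "${2%%:*}"); m=$(to_dec "${2#*:}") ;;
    *)   h=0; m=$(to_dec "$2") ;;
  esac
  [ $((h * 60 + m)) -le "$max" ]
}

# Example: fits_queue express 11:30 && echo "fits in express"
```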


    LSF

    Minerva uses LSF for batch submission, and bsub is the submission command. Options can be given on the command line or in the submission script. However, if the options are placed in the submission script, you must feed the script into bsub via stdin for the options to be read. For example:

    cat MyLSF.script | bsub
    or
    bsub < MyLSF.script
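A minimal sketch of what MyLSF.script might contain, written out and then submitted through stdin. The job name, output file names and the project name acc_myproject are hypothetical placeholders, and whether rusage[mem=...] is counted per core or per job depends on the site's LSF configuration (assumed here to be per core, matching the 3000MB default). The options used (-J, -P, -q, -n, -W, -R, -o, -e) are standard bsub options; the submission step runs only where bsub is available.

```shell
#!/bin/sh
# Write a minimal LSF job script. The #BSUB lines are read by bsub only when
# the script is fed to it on stdin; to the shell they are ordinary comments.
cat > MyLSF.script <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#BSUB -J myjob
# Project/allocation account (hypothetical name; use your own)
#BSUB -P acc_myproject
# Queue and number of cores
#BSUB -q express
#BSUB -n 1
# Walltime as HHH:MM (here 2 hours)
#BSUB -W 2:00
# Memory in MB (assumed per core; 3000MB is the default)
#BSUB -R "rusage[mem=3000]"
# Save stdout and stderr to files; %J expands to the job ID
#BSUB -o myjob.%J.out
#BSUB -e myjob.%J.err

echo "Job $LSB_JOBID running on $(hostname)"
EOF

# Submit by feeding the script through stdin, as required for #BSUB options.
if command -v bsub >/dev/null 2>&1; then
  bsub < MyLSF.script
else
  echo "bsub not found; run this on a Minerva login node" >&2
fi
```

The -o line also covers the output problem noted below, since LSF email delivery is not yet working.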


    Some important points of interest:

    • By default, LSF emails job output and logs to you. This is not working yet, so you must use the "-o" option to save the output.
    • In general, the shortest unit of time in LSF is 1 minute. Wall time is expressed as HHH:MM, with no seconds field. Durations are generally given in minutes.
    • System-level checkpoints are supported by LSF. There are some "gotchas" (e.g., the default method does not work on our system), so check with the SC staff if you need or want to use checkpointing.
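Because LSF walltimes have minute resolution and no seconds, a duration has to be rounded to whole minutes before it is passed to bsub -W. A minimal sketch; minutes_to_walltime and seconds_to_walltime are hypothetical helper names, and rounding seconds up (so a job is never given less time than intended) is a design choice, not an LSF rule.

```shell
#!/bin/sh
# Format a duration in whole minutes as H:MM for bsub -W (no seconds field).
minutes_to_walltime() {
  printf '%d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Round a duration in seconds up to the next whole minute, then format it.
seconds_to_walltime() {
  minutes_to_walltime $(( ($1 + 59) / 60 ))
}

# Example: bsub -W "$(minutes_to_walltime 90)" < MyLSF.script   # requests 1:30
```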

    A quick conversion guide from the PBS qsub to the LSF bsub can be found here.

    Some useful commands:

    bjobs - shows your jobs in the queue
    bpeek - shows your job's output before the job ends
    bqueues - lists the available queues
    bkill - kills a job
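When scripting around these commands, it helps to capture the job ID that bsub reports. A sketch, assuming bsub's usual confirmation line of the form "Job <12345> is submitted to queue <express>."; jobid_from_bsub is a hypothetical helper, and the follow-up commands run only where bsub and MyLSF.script exist.

```shell
#!/bin/sh
# Extract the numeric job ID from bsub's confirmation line on stdin,
# e.g. "Job <12345> is submitted to queue <express>." -> 12345
jobid_from_bsub() {
  sed -n 's/^Job <\([0-9]*\)> is submitted.*/\1/p'
}

if command -v bsub >/dev/null 2>&1 && [ -f MyLSF.script ]; then
  jobid=$(bsub < MyLSF.script | jobid_from_bsub)
  bjobs "$jobid"      # status of this job
  bpeek "$jobid"      # output so far (works once the job is running)
  # bkill "$jobid"    # uncomment to kill the job
fi
```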

    Check the man pages for all the options.
