Running HPC jobs with containers

Overview

Teaching: 40 min
Exercises: 40 min
Questions
  • How can I execute commands in a container with Singularity or Shifter?

  • How are variables and directories shared between host and container?

  • Is it possible to simplify the container user experience?

Objectives
  • Download and run containers on a supercomputer

  • Manage sharing of variables and directories with the host

  • Run a real-world bioinformatics application in a container

  • Use container modules to hide the container runtime syntax

Download and run containers

Before starting, let us cd into the exercises subdirectory of the tutorial repository (see also the Setup page):

cd ~/sc-tutorials/exercises

Singularity

Download the Ubuntu 20.04 image using:

singularity pull docker://ubuntu:20.04
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
...
Writing manifest to image destination
Storing signatures
...
INFO:    Creating SIF file...

Note how you need to prepend docker:// to the image name to tell Singularity you’re downloading an image in Docker format (the default would be to download a SIF image).

The image file is saved in your current directory:

ls
ubuntu_20.04.sif
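
By default the file is named after the image name and tag. If you prefer a different filename, you can optionally pass one to singularity pull (the name myubuntu.sif below is just an arbitrary example):

singularity pull myubuntu.sif docker://ubuntu:20.04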

Now let’s execute some Linux commands from within the container, whoami and cat /etc/os-release:

singularity exec ubuntu_20.04.sif whoami
trainingxx
singularity exec ubuntu_20.04.sif cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Note how with Singularity the user in the container is the same as in the host machine.
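
You can verify this by comparing the numeric user ID on the host and inside the container; the two commands below should print the same value (the actual number depends on your account):

id -u
singularity exec ubuntu_20.04.sif id -u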

Singularity has a dedicated syntax to open an interactive shell prompt in a container:

singularity shell ubuntu_20.04.sif
Singularity>

Exit the shell by typing exit or hitting Ctrl-D.

Finally, note you can request Singularity to execute a container straight away: if the image is not available locally, it will be pulled under the hood and stored in the Singularity cache:

singularity exec docker://ubuntu:16.04 cat /etc/os-release
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 92473f7ef455 done
Copying blob fb52bde70123 done
Copying blob 64788f86be3f done
Copying blob 33f6d5f2e001 done
Copying config f911e561e9 done
Writing manifest to image destination
Storing signatures

[..]

INFO:    Creating SIF file...

NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.7 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

By default, the cache is stored in ~/.singularity; this location can be customised using the environment variable SINGULARITY_CACHEDIR.
A subcommand, singularity cache, can be used to manage the cache.
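
For instance, you can list the cached images, wipe the cache, or redirect it to another filesystem (the path below is just an illustrative example):

singularity cache list
singularity cache clean
export SINGULARITY_CACHEDIR=/scratch/$USER/singularity_cache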

Shifter (Optional: Use Perlmutter)

Let’s download the same Ubuntu image as above, using shifterimg:

shifterimg pull ubuntu:20.04
2022-10-11T05:25:11 Pulling Image: docker:ubuntu:20.04, status: READY

Locally stored images are managed by Shifter itself:

shifterimg images|grep ubuntu:20.04
perlmutter docker     READY    8e5c4f0285   2024-11-11T12:43:04 ubuntu:20.04

What’s the container user with Shifter? Let’s check with both whoami and id -u (note: the output depends on your training account):

shifter --image=ubuntu:20.04 whoami
train5
shifter --image=ubuntu:20.04 id -u
12962

Again, these come from the host.

You can try more Linux commands:

shifter --image=ubuntu:20.04 cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

NOTE: If you need to open an interactive shell in the container with Shifter, just execute bash in the container.
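
For instance (exit the shell as usual by typing exit or hitting Ctrl-D):

shifter --image=ubuntu:20.04 bash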

Share host environment variables

By default, host variables are shared with the container:

export HELLO="world"

You can access that variable both with Singularity:

singularity exec ubuntu_20.04.sif bash -c 'echo $HELLO'
world

and with Shifter (on Perlmutter):

export HELLO="world"
shifter --image=ubuntu:20.04 bash -c 'echo $HELLO'
world

There are some additional user options worth discussing for further tuning of the shell environment. In some cases, e.g. when using Python containers, you might need to isolate the container shell from the host. To this end, use -e with Singularity:

singularity exec -e ubuntu_20.04.sif bash -c 'echo $HELLO'

and -E with Shifter:

shifter -E --image=ubuntu:20.04 bash -c 'echo $HELLO'

If you need to pass variables to the container in this situation, you can use a dedicated syntax. With Singularity it looks like:

export SINGULARITYENV_BYE="moon"
singularity exec -e ubuntu_20.04.sif bash -c 'echo $BYE'
moon

or, from version 3.6.x on:

singularity exec -e --env BYE="moon" ubuntu_20.04.sif bash -c 'echo $BYE'
moon

And with Shifter:

shifter -E --env BYE="moon" --image=ubuntu:20.04 bash -c 'echo $BYE'
moon

What about Podman?

By default and similar to Docker, Podman isolates host and container shell environments:

export HELLO="world"
podman run ubuntu:20.04 bash -c 'echo $HELLO'

You can pass specific variables to the container by using the flag -e:

podman run -e HELLO ubuntu:20.04 bash -c 'echo $HELLO'
world

Or even redefine variables, with the same flag:

podman run -e HELLO=moon ubuntu:20.04 bash -c 'echo $HELLO'
moon

Use host directories

There is a difference in how container engines set the default working directory. Let’s see this with an example: the container image marcodelapierre/ubuntu_workdir:18.04, which has a custom WORKDIR defined in the Dockerfile:

FROM ubuntu:18.04

WORKDIR "/workdir"

How does Singularity behave?

singularity exec docker://marcodelapierre/ubuntu_workdir:18.04 pwd
/home/tutorial/sc-tutorials/exercises

Singularity always uses the host current working directory.
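
If you need a different starting directory, Singularity provides the --pwd flag to set the initial working directory of the containerised process; for instance, the following should report the image’s /workdir:

singularity exec --pwd /workdir docker://marcodelapierre/ubuntu_workdir:18.04 pwd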

Now, how about Shifter?

shifterimg pull marcodelapierre/ubuntu_workdir:18.04
shifter --image=marcodelapierre/ubuntu_workdir:18.04 pwd
/global/u1/t/train5

Shifter always uses the host current working directory, too.

With both Singularity and Shifter, the container filesystem is read-only, so if you want to write output files you must do it in a bind-mounted host directory.
Typically, HPC administrators will configure the container engine for you, so that host filesystems containing data and software are mounted by default.
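
For instance, with Singularity the host current directory is bind-mounted by default (unless the site configuration disables it), so writing an output file there just works; a quick check using the ubuntu_20.04.sif image from earlier:

singularity exec ubuntu_20.04.sif bash -c 'echo hello >output.txt'
cat output.txt
hello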

In the unlikely scenario where you need to bind-mount additional paths, Singularity offers handy methods for users. For instance:

singularity exec ubuntu_20.04.sif ls /NODUS
ls: cannot access /NODUS: No such file or directory
singularity exec -B /NODUS ubuntu_20.04.sif ls /NODUS
bin  jobs  provision  scripts  share

or

export SINGULARITY_BINDPATH="/NODUS"
singularity exec ubuntu_20.04.sif ls /NODUS
bin  jobs  provision  scripts  share
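
Both forms also accept pairs of paths in the format source:destination, to mount a host directory at a different location inside the container; for example, re-using the path from above:

singularity exec -B /NODUS:/data ubuntu_20.04.sif ls /data
bin  jobs  provision  scripts  share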

What about Podman?

podman run marcodelapierre/ubuntu_workdir:18.04 pwd
/workdir
podman run ubuntu:20.04 pwd
/

Similar to Docker, Podman follows WORKDIR as defined in the Dockerfile; if undefined it defaults to the root dir /.

Also remember that, similar to Docker, Podman does not mount any host directories (although the system administrators may enable default mounting of relevant directories):

podman run -w $(pwd) marcodelapierre/ubuntu_workdir:18.04 pwd
Error: workdir "/home/tutorial/sc-tutorials/exercises" does not exist on container d42af9a0aaed56b840b5a6be3aa0e2ba19114ead456036573491e54816e185ab

So, if you want to run the container in the host current directory, you need to use -w to specify the work directory, and -v to mount the desired host directory:

podman run -v $(pwd):$(pwd) -w $(pwd) marcodelapierre/ubuntu_workdir:18.04 pwd
/home/trainxx

Do It Yourself: BLAST example

Now you’re going to run a BLAST (Basic Local Alignment Search Tool) example with a container from BioContainers.
BLAST is a tool bioinformaticians use to compare a sample genetic sequence to a database of known sequences; it’s one of the most widely used bioinformatics packages.
This example is adapted from the BioContainers documentation.

For this exercise, use Singularity.
Try to achieve what you’re asked to do; use the solutions only if you get lost.

Before you start, change directory to blast:

cd ~/sc-tutorials/exercises/blast

Pull the image

First, download the following container image:

quay.io/biocontainers/blast:2.9.0--pl526h3066fca_4

Solution

singularity pull docker://quay.io/biocontainers/blast:2.9.0--pl526h3066fca_4

Run a test command

Now, run a simple command using that image, for instance blastp -help, to verify that it actually works.

Solution

singularity exec blast_2.9.0--pl526h3066fca_4.sif blastp -help
USAGE
  blastp [-h] [-help] [-import_search_strategy filename]

[..]

 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?

Now, the exercise directory contains a human prion FASTA sequence, P04156.fasta, and a gzipped reference database to BLAST against, zebrafish.1.protein.faa.gz.
First, uncompress the database (you can use host commands for this):

gunzip zebrafish.1.protein.faa.gz

Run the analysis

We need to perform two tasks:

  1. prepare the zebrafish database with makeblastdb:
      makeblastdb -in zebrafish.1.protein.faa -dbtype prot
    
  2. run the alignment with blastp:
      blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt
    

Start from these commands and adapt them so you can run them from within a container.

Solution

singularity exec blast_2.9.0--pl526h3066fca_4.sif makeblastdb -in zebrafish.1.protein.faa -dbtype prot
Building a new DB, current time: 11/16/2019 19:14:43
New DB name:   /home/ubuntu/singularity-containers/demos/blast_db/zebrafish.1.protein.faa
New DB title:  zebrafish.1.protein.faa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 52951 sequences in 1.34541 seconds.
singularity exec blast_2.9.0--pl526h3066fca_4.sif blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt

The final results are stored in results.txt:

less results.txt
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  XP_017207509.1 protein piccolo isoform X2 [Danio rerio]             43.9    2e-04
  XP_017207511.1 mucin-16 isoform X4 [Danio rerio]                    43.9    2e-04
  XP_021323434.1 protein piccolo isoform X5 [Danio rerio]             43.5    3e-04
  XP_017207510.1 protein piccolo isoform X3 [Danio rerio]             43.5    3e-04
  XP_021323433.1 protein piccolo isoform X1 [Danio rerio]             43.5    3e-04
  XP_009291733.1 protein piccolo isoform X1 [Danio rerio]             43.5    3e-04
  NP_001268391.1 chromodomain-helicase-DNA-binding protein 2 [Dan...  35.8    0.072
[..]

When you’re done, quit the viewer by pressing the q key.

Container modules with SHPC (Reference only)

Singularity Registry HPC, or SHPC for short, is an extremely interesting project by some of the original creators of Singularity. This utility enables the automatic deployment of so-called Container Modules, using either Lmod or Environment Modules together with bash functions within modulefiles. The latter are used to wrap the container runtime syntax inside aliases that match the application executable names.

The key advantage of SHPC is that it automates the process of downloading the container and creating the corresponding modulefile with bash function definitions. It does so by means of a registry of recipes (currently over 300) that are ready for use. If a recipe for a container does not exist, writing one is relatively straightforward, although out of scope for this episode.

Let’s see how we can install BLAST using SHPC. First, let’s look for available BLAST versions with shpc show:

shpc show --versions -f blast
ncbi/blast:2.11.0
ncbi/blast:2.12.0
ncbi/blast:latest
quay.io/biocontainers/blast:2.10.1--pl526he19e7b1_3
quay.io/biocontainers/blast:2.11.0--pl5262h3289130_1
quay.io/biocontainers/blast:2.12.0--pl5262h3289130_0

And now let’s install the latest BLAST biocontainer (copy-pasting the image and tag from the output above) with shpc install:

shpc install quay.io/biocontainers/blast:2.12.0--pl5262h3289130_0
singularity pull --name /opt/singularity-hpc/containers/quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/quay.io-biocontainers-blast-2.12.0--pl5262h3289130_0-sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18.sif docker://quay.io/biocontainers/blast@sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18
INFO:    Using cached SIF image
INFO:    Using cached SIF image
/opt/singularity-hpc/containers/quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/quay.io-biocontainers-blast-2.12.0--pl5262h3289130_0-sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18.sif
Module quay.io/biocontainers/blast:2.12.0--pl5262h3289130_0 was created.

That’s it! We now have a BLAST module:

module avail quay

-------------------------------------------- /opt/singularity-hpc/modules ---------------------------------------------
quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/module.tcl

Which we can load and use (well, the newer version does not like the -help flag):

module load quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0
blastp -help
BLAST query/options error: Either a BLAST database or subject sequence(s) must be specified
Please refer to the BLAST+ user manual.

We can also see that this command is indeed a bash function wrapping the singularity syntax:

type blastp
blastp is a function
blastp () 
{ 
    singularity ${SINGULARITY_OPTS} exec ${SINGULARITY_COMMAND_OPTS} -B /opt/singularity-hpc/modules/quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/99-shpc.sh:/.singularity.d/env/99-shpc.sh /opt/singularity-hpc/containers/quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/quay.io-biocontainers-blast-2.12.0--pl5262h3289130_0-sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18.sif /usr/local/bin/blastp
}

NOTE: a limitation of this approach is that bash functions cannot be passed as arguments to mpirun or srun, and hence are not usable for MPI applications. A simple workaround is to wrap MPI applications in plain bash scripts instead.
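
As a minimal sketch of that workaround, a wrapper for a hypothetical MPI-enabled executable (here called mpi_app; the image path is illustrative) could be saved as an executable file named mpi_app somewhere in your PATH:

#!/bin/bash
# mpi_app wrapper: forward all arguments to the containerised executable
exec singularity exec /path/to/mpi_app.sif mpi_app "$@"

This way mpirun or srun can launch the wrapper script like a regular binary.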

Key Points

  • Download a container image with singularity pull or shifterimg pull

  • Execute commands in a container with singularity exec or shifter

  • By default Singularity and Shifter pass all host variables to the container

  • By default Singularity and Shifter use the host current directory as the container working directory

  • Define container-specific shell variables with Singularity by prefixing them with SINGULARITYENV_

  • Mount additional host directories with Singularity with the flag -B, or the variable SINGULARITY_BINDPATH