Getting started with containers on LC

Use Cases

What is Docker?

Docker has gotten so much attention in the last few years for creating easily reproducible software and computing environments that it is sometimes conflated with container technology in general. But, just as not every tissue is a Kleenex, your workflow can benefit from container technology without using Docker.

Livermore Computing supports the use of Singularity rather than Docker. While you will not be able to work with Docker on LC machines, you will still be able to use containers originally created with Docker via Singularity.

Singularity vs. Docker?

While great for many use cases, Docker creates some security concerns in a shared environment such as a cluster. Singularity offers a flavor of container technology designed for HPC systems that addresses these concerns.

Grabbing a Docker container via Singularity

Often the containers we are interested in live in a registry somewhere, such as Docker Hub, the Singularity Library, or Singularity Hub. Let us say we want a Docker container like the one on Docker Hub at https://hub.docker.com/r/bitnami/python. We can use the singularity pull command (more detail below) to grab this container with syntax that specifies both the user offering the container (here, bitnami) and the name of the container (here, python):

singularity pull docker://bitnami/python

By default, this creates a file with a .sif extension called python_latest.sif. Alternatively, you could give the output file a name of your choice (and also a different extension, such as .img) like my_python_container_name.img via the syntax

singularity pull my_python_container_name.img docker://bitnami/python

Independent of whether you create a .sif or .img file, you can work with your output file using Singularity commands, as described in the sections below. It no longer matters that this container originally came from Docker Hub.

Where Singularity can help you

Containers are pre-packaged compute environments in which you can largely ignore the underlying hardware.

You can share these environments with a colleague or a future version of yourself. If you are using the same container, the two of you can avoid many common concerns tied to reproducibility, dependencies, and versions, and at least one of you will not need to build the software yourself. This also means that containers can get you up and running with new software relatively quickly. One caveat to the utility of containers is that the software running in them may not be as performant or efficient as it would be if it were optimized for the particular architecture on which you are running calculations.

Examples of times you might want to use a container include:

  • You want to quickly reproduce the workflow of a collaborator to check or extend that collaborator's results.
  • You want to try out a new piece of software or software chain on LC systems that isn't currently installed on LC.
  • You want to work with a specific version of a software package; LC offers the software you want, but not the right version.

Working with a Singularity container

How to get a container

Sometimes the container we want to work with will come from a collaborator, and sometimes, as mentioned above, it will come from a registry such as Docker Hub. For example, if we search for julia on Docker Hub, we see an “Official Image” published by Docker that will include a binary for the Julia programming language.

Clicking through to this container image's Docker Hub page, we see that the way to obtain this container via Docker is docker pull julia. This indicates that, relative to Docker Hub, the path to this container image is simply julia. To reach this container image from Singularity, we expand the path to docker://julia.

Similarly, getting the pytorch image from its Docker Hub page would require docker pull pytorch/pytorch; the “full” path of the image at pytorch/pytorch on Docker Hub is docker://pytorch/pytorch.
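For example, to grab the pytorch image with Singularity, we would pull it using that full path:

singularity pull docker://pytorch/pytorch

The sections below walk through the same steps for the julia image.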

Pulling a container

In “pulling” a container, we download a pre-built image from a container registry. Pulling the container with singularity pull via

singularity pull docker://julia

generates a file called julia_latest.sif. The .sif extension indicates a Singularity-specific version of a SquashFS image, a type of compressed file. Alternatively we could rename the output file or create a different file type by adding an input argument to our call to singularity pull:

singularity pull my_julia.img docker://julia 

This generates the container image file my_julia.img.

Building a container

Alternatively, we could use the command singularity build to interact with a registry. Because this command takes at least two input arguments, grabbing a container image file from a registry with singularity build looks more like our second example using singularity pull:

singularity build my_julia.img docker://julia

and similarly can generate an output file my_julia.img. Unlike singularity pull, singularity build can be used for a variety of tasks. In addition to grabbing a container image file from a container registry, you can use singularity build to create a container with a Singularity definition file from “scratch” or to convert between formats. For example, with singularity build, you can make a container writeable or create a sandbox, as described below.
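As a preview of these different uses, here are representative singularity build invocations, reusing the file names that appear later in this tutorial (note that building from a definition file on LC also requires --fakeroot, as shown near the end of this tutorial):

singularity build my_julia.img docker://julia              # grab an image from a container registry
singularity build --sandbox julia_sandbox/ my_julia.img    # convert an image into a writable sandbox directory
singularity build my_updated_julia.img julia_sandbox/      # convert a sandbox back into an image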

What do you have now?

Image files vs. containers

Often containers and their images are referred to interchangeably, but there is a distinction. The output .sif or .img files you might obtain from singularity pull or singularity build are images, which are generally immutable. A container, on the other hand, is an instance of a runtime environment created from an image. In other words, a container is only a container if it’s running.
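As a small illustration of the distinction, the image is just a file on disk that you can list like any other, while a container exists only for the lifetime of a command run against that image:

ls -lh my_julia.img                    # the image: an ordinary file sitting on disk
singularity exec my_julia.img true     # a container: instantiated from the image, runs true, then exits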

Looking inside the container via singularity exec

One way to look inside the container is to use the singularity exec command, which is described in greater detail in the next section. For example, the command

singularity exec my_julia.img cat /singularity

cats to standard output (prints to your screen) the contents of the runscript (more about this soon) that lives inside the container. Similarly, you can list all the directories inside the container via

singularity exec my_julia.img ls /

and all the files within the container’s /etc directory via singularity exec my_julia.img ls /etc.

Via a sandbox

Alternatively, you could look inside the container by creating a “sandbox”, which dumps the contents of a container image into a directory on your system. You can then browse the directory structure and files of the container directly, rather than querying the image with singularity commands. You can create a sandbox using singularity build with the --sandbox flag. For example,

singularity build --sandbox julia_sandbox/ my_julia.img

creates a directory called julia_sandbox inside my working directory. Now,

ls julia_sandbox

returns the same results as

singularity exec my_julia.img ls /

How to interact with a container

Executing a container

As we began to see in the last section, executing a container with singularity exec allows us to run commands inside the container with the syntax

singularity exec $(container_name) $(commands_we_want_to_run)

For example,

singularity exec my_julia.img julia

runs the julia binary inside the container created from the my_julia.img image and brings up the Julia interpreter. Similarly,

singularity exec my_julia.img echo "hello"

instantiates a container from my_julia.img and runs echo "hello" inside that container, printing “hello” to stdout.

Shelling into a container

We can also run commands inside the container by first opening a shell in the container:

singularity shell my_julia.img

In response, the shell prompt Singularity> pops up, at which you could run, for example, echo "hello" or julia.
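For example, a short interactive session might look like the following (using the same prompt shown in the bind-mount examples later in this tutorial); exit returns you to the host shell:

janeh@oslic9:~/Singularity$ singularity shell my_julia.img
Singularity> echo "hello"
hello
Singularity> exit
janeh@oslic9:~/Singularity$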

Running a container

Singularity containers contain a “runscript” at the path /singularity within the container. Running the container means calling or executing this runscript. We can run the container created from my_julia.img via either

./my_julia.img

or

singularity run my_julia.img

Either of these commands therefore yields the same result as

singularity exec my_julia.img /singularity

The runscript we see via singularity exec my_julia.img cat /singularity is

#!/bin/sh
OCI_ENTRYPOINT=''
OCI_CMD='"julia"'
CMDLINE_ARGS=""
# prepare command line arguments for evaluation 
for arg in "$@"; do
    CMDLINE_ARGS="${CMDLINE_ARGS} \"$arg\""
done
# ENTRYPOINT only - run entrypoint plus args
if [ -z "$OCI_CMD" ] && [ -n "$OCI_ENTRYPOINT" ]; then
    if [ $# -gt 0 ]; then 
        SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT} ${CMDLINE_ARGS}"
    else 
        SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT}"
    fi 
fi
# CMD only - run CMD or override with args
if [ -n "$OCI_CMD" ] && [ -z "$OCI_ENTRYPOINT" ]; then
    if [ $# -gt 0 ]; then 
        SINGULARITY_OCI_RUN="${CMDLINE_ARGS}"
    else 
        SINGULARITY_OCI_RUN="${OCI_CMD}"
    fi 
fi
# ENTRYPOINT and CMD - run ENTRYPOINT with CMD as default args 
# override with user provided args
if [ -n "$OCI_ENTRYPOINT" ] && [ -n "$OCI_CMD" ]; then
    if [ $# -gt 0 ]; then
        SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT} ${CMDLINE_ARGS}"
    else
        SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT} ${OCI_CMD}"
    fi
fi
# Evaluate shell expressions first and set arguments accordingly, 
# then execute final command as first container process
eval "set ${SINGULARITY_OCI_RUN}"
exec "$@"

When we execute singularity run my_julia.img, the Julia interpreter starts up in our terminal.

How to change the filesystems a container sees

By default, a Singularity container running on LC will see your home directory and its contents, but not other filesystems, such as our /p/lustre# and /usr/workspace filesystems. For example,

janeh@oslic9:~/Singularity$ singularity shell my_julia.img
Singularity> pwd
/g/g0/janeh/Singularity
Singularity> ls /p/lustre1/janeh
ls: cannot access '/p/lustre1/janeh': No such file or directory

You can change this by binding or mounting a particular directory path in your container via the --bind or -B flag.

janeh@oslic9:~/Singularity$ singularity shell -B /p/lustre1/janeh my_julia.img
Singularity> ls /p/lustre1/janeh
0_LC_AutoDelete  GaAs

Note that you can bind multiple directory paths in your container by using the -B flag multiple times, for example,

singularity shell -B /p/lustre1/janeh -B /usr/workspace/janeh my_julia.img

The -B or --bind flag can follow any of the singularity commands mentioned above, and should precede the name of the container with which you are working. For example, you might use -B with singularity exec or singularity run as in

singularity exec --bind /p/lustre1/janeh my_julia.img ls /p/lustre1/janeh

or

singularity run -B /p/lustre1/janeh my_julia.img

Note: Symlinks can create ambiguity as to where to find a directory you might like to mount. For example, /usr/workspace/<username> on LC systems links either to /usr/WS1/<username> or /usr/WS2/<username>, depending on the username. Binding /usr/workspace/<your-username> to your container will work, but if you simply try to bind /usr/workspace, you may not be able to see your workspace directory. (Imagine your workspace lives in /usr/WS1 and binding /usr/workspace mounts /usr/WS2.)
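To avoid this ambiguity, bind your own workspace path explicitly, replacing <your-username> with your LC username:

singularity shell -B /usr/workspace/<your-username> my_julia.img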

How to change what a container does

Sometimes we’ll find ourselves wanting to change the contents of a container or the behavior exhibited by our container when it is run. For example, maybe I want a container that uses Julia to remind me of the value of pi at runtime, rather than to simply start the interpreter.

Let’s create a file in our working directory called calc_pi.jl which contains

"""
     function calc_pi(N)

This function calculates pi with a Monte Carlo simulation using N samples.
"""
function calc_pi(N)
    # Generate `N` pairs of x,y coordinates on a grid defined 
    # by the extrema (1, 1), (-1,-1 ), (1, -1), and (-1, 1) 
    samples = rand([1, -1], N, 2) .* rand(N, 2)
    # how many of these sample points lie within the circle
    # of max size bounded by the same extrema
    samples_in_circle = sum([sqrt(samples[i, 1]^2 + samples[i, 2]^2) < 1.0 for i in 1:N])

    pi = 4*samples_in_circle/N
end

# print the estimate of pi calculated with 10,000 samples 
println(calc_pi(10_000))

Then update that file’s permissions so it’s broadly accessible for reading, writing, and executing via

chmod 777 calc_pi.jl

Now we’ll show a couple ways to build a container that runs a copy of this file by default.

Sandboxes (again)

Using sandboxes is one way to edit a container and its behavior, though editing a container recipe (below) is the preferred method. In using a sandbox, we will typically

  1. Create a sandbox from an image with singularity build
  2. Change the contents of the container by editing within the sandbox
  3. Write a new image from the sandbox, again with singularity build

First, as already seen, we can create a sandbox from my_julia.img via singularity build --sandbox julia_sandbox/ my_julia.img.

Second, let’s change the contents of the sandbox inside julia_sandbox/ by copying calc_pi.jl into the sandbox’s /opt/ directory:

cp ./calc_pi.jl julia_sandbox/opt/

Now I’ll update the runscript at /singularity within the sandbox (the file julia_sandbox/singularity) to read

#!/bin/sh
# if there are no command line arguments, run the calc_pi.jl script 
if [ $# -eq 0 ]; then
    julia /opt/calc_pi.jl
# otherwise, there may be command line arguments provided -- such as a julia script
else
    julia "$@" 
fi

so that we run calc_pi.jl and print our estimate for pi whenever we run this container without command line arguments. Finally, I can create an updated container image, my_updated_julia.img, from the edited sandbox via

singularity build my_updated_julia.img julia_sandbox/

Now, when I run the new container via ./my_updated_julia.img, an approximation to pi prints to stdout.

Container recipes

Rather than creating a sandbox from an image, manually editing that sandbox, and then creating a new image from the altered sandbox, a better documented and more reproducible way to create a new container image is to use a container recipe. For example, we could create a file my-recipe containing the text

Bootstrap: docker
From: julia
%files
    ./calc_pi.jl /opt/
%runscript
    #!/bin/sh

    # if there are no command line arguments, run the calc_pi.jl script 
    if [ $# -eq 0 ]; then
        julia /opt/calc_pi.jl
    # otherwise, there may be command line arguments provided -- such as a julia script
    else
        julia "$@"
    fi

This recipe creates a new container by bootstrapping off the julia container found on Docker Hub (docker://julia). Unlike the original, the new container stores a copy of the calc_pi.jl file from my working directory in its /opt/ directory, and it has an updated runscript dictating that julia /opt/calc_pi.jl be executed when the container is run without input arguments.

To build a container called julia_from_recipe.simg from this recipe, we can run the following line on, for example, any LC machine that uses SLURM:

srun -t 00:05:00 -p pdebug --userns singularity build --fakeroot julia_from_recipe.simg my-recipe

More generally, we want to run something of the form

srun -t <walltime to build container> -p <queue to build container> --userns singularity build --fakeroot <image name> <recipe name>

As expected, running the new container via ./julia_from_recipe.simg prints an approximation to pi to stdout.

How to use a container in batch system calculations

Running a container via the batch scheduler

Running a container via the queue/batch system in an HPC environment is as simple as passing your preferred run syntax, singularity run $(image_name) or ./$(image_name), to the scheduler. For example, using Slurm, a call to srun requesting 16 tasks within your batch script might look like

srun -n16 ./my_updated_julia.img

or

srun -n16 singularity run my_updated_julia.img

Though you may need to specify a bank with #SBATCH -A, an example submission script submit.slurm that you could run on one of LC’s CTS systems (like Quartz) might contain the following

#!/bin/bash
#SBATCH -J test_container 
#SBATCH -p pdebug 
#SBATCH -t 00:01:00 
#SBATCH -N 1

srun -n16 ./my_updated_julia.img

Running sbatch submit.slurm at the command line on Quartz (in the same directory where the container image lives) submits the job to the queue. Once this job has run, a slurm-*.out file is written that will contain 16 different approximations to pi.
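Once the job completes, you can check the results from the command line; the exact file name includes the job ID assigned by Slurm:

cat slurm-*.out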

Running a binary that lives inside the container via batch schedulers

You can also call software that lives inside the container from within a batch script, rather than relying on the container’s runscript.

For example, perhaps I just want access to the julia binary so that I can use it to run various scripts that live in my home directory. Let’s say that I have a file in my home directory called hello.jl that simply contains the line println("hello world!"). I can run this via the batch system using my container if I update submit.slurm to read

#!/bin/bash
#SBATCH -J test_container 
#SBATCH -p pdebug 
#SBATCH -t 00:01:00 
#SBATCH -N 1

srun -n16 singularity run my_updated_julia.img hello.jl

# Alternatively, the following line produces the same result:
# srun -n16 singularity exec my_updated_julia.img julia hello.jl

The output file created by the job run via sbatch submit.slurm will contain “hello world!” 16 times.

A note about performance

By abstracting away hardware concerns, containers can create an easy, friendly user experience. A cost of this ease of use is that containerized applications commonly will not achieve the same performance as natively built applications, which can, for example, call libraries optimized for the specific hardware and underlying architecture.