AiiDA | HPC @ LLNL

What is AiiDA?🔗

AiiDA is an open-source workflow manager geared towards automating complex calculations. At its core it is a generic framework that handles the submission of calculations, manages the storage of calculation results, and enables complex error handling, to name a few features.

AiiDA requires several other services to run, such as a Postgres database and a RabbitMQ service. Both of these can be found at LaunchIT. Before starting your AiiDA installation you should start both of these services and keep their connection information readily available.

Getting started with AiiDA🔗

AiiDA was originally developed around the idea of letting a personal laptop act as the central repository. The laptop would log into other computers or servers to submit jobs and retrieve the results after they had finished. With the advancement of firewalls and security, especially at the lab, this type of installation is no longer as feasible. The following instructions outline the complete installation process on LC machines at the lab.

Cloning and basic AiiDA Setup🔗

There are two ways to install the core AiiDA package on your system. In either case, it is highly recommended to create a Python virtual environment or a conda environment so that your AiiDA installation stays separate and does not break other dependencies. This can be done with either of the following:

python -m venv /path/to/environment/aiida
source /path/to/environment/aiida/bin/activate

conda create -n aiida
conda activate aiida

Once you have your environment initialized, you can install AiiDA through pip

pip install aiida-core

or download and install from the AiiDA git repository.

git clone https://github.com/aiidateam/aiida-core.git
cd aiida-core
git checkout v2.6.2 # Decide if there is a more recent version you will use.
pip install -e .

Any time you work with AiiDA you will need to activate this environment to gain access to the AiiDA commands. Activating the environment prepends its name to the command prompt. It also changes the default `python` and `pip` binaries to those found in your environment.
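A quick way to confirm that an environment is active before running any AiiDA command is to check it from Python. The following is a minimal sketch using only the standard library. The helper names `in_virtualenv` and `find_verdi` are made up for this illustration, and the conda check assumes that conda exports the `CONDA_DEFAULT_ENV` variable on activation.

```python
import os
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True if a venv or a conda environment appears to be active."""
    # venv: the interpreter prefix differs from the base installation prefix.
    venv_active = sys.prefix != sys.base_prefix
    # conda: activation exports CONDA_DEFAULT_ENV ("base" means no dedicated env).
    conda_env = os.environ.get("CONDA_DEFAULT_ENV", "")
    conda_active = conda_env not in ("", "base")
    return venv_active or conda_active


def find_verdi():
    """Return the full path of the verdi executable on PATH, or None if absent."""
    return shutil.which("verdi")


if __name__ == "__main__":
    print("environment active:", in_virtualenv())
    print("verdi found at:", find_verdi())
```

If `find_verdi()` returns `None`, the environment that holds aiida-core is most likely not the one currently activated.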

Creating a profile🔗

The first thing you'll need to do is create a profile that will be associated with all the simulations launched through AiiDA. The profile will control which Postgres database and RabbitMQ service you will be connecting to. This can be used to separate projects if needed.

The main command to interact with AiiDA is `verdi`. This command should be available to you after installing the AiiDA package. You can list all possible commands with `verdi --help`.

There are several ways to set up a profile. The first is to type

verdi profile setup core.psql_dos

This will prompt you for the relevant information to set up your profile. A somewhat easier method is to pass a YAML file that the command will read to create your profile.

verdi profile setup core.psql_dos --config profile.yaml

This file should look something like this:

non_interactive: y
profile: <profile_name>
email: <email>
first_name: <First>
last_name: <Last>
institution: LLNL
set_as_default: true
use_rabbitmq: y
database_engine: core.psql_dos
database_hostname: <hostname> # Postgres LaunchIT
database_port: <port> # Postgres LaunchIT
database_name: <name> # Postgres LaunchIT
database_username: <username> # Postgres LaunchIT
database_password: <password> # Postgres LaunchIT
repository_uri: file:///path/to/where/you/want/the/repository

The last line, `repository_uri`, should point to a location with a large amount of storage. Your /usr/workspace can work for this. At some point, AiiDA will create a hidden folder called `.aiida`. Its location can vary, but it is either in your home folder or near your installation of aiida-core. The `repository_uri` can also point inside that folder, for example `~/.aiida/repository/main`, where you replace `~` with the full path. At this point you should be able to see your new profile with `verdi profile list`, which should return your chosen profile name.
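Since a config file such as profile.yaml starts out full of `<...>` placeholders, it can help to check that none are left before passing it to `verdi`. The sketch below is one possible way to do that with the standard library only. The function name `find_placeholders` and the sample text are illustrative and not part of AiiDA.

```python
import re

# Matches template placeholders such as <hostname> or <profile_name>.
PLACEHOLDER = re.compile(r"<[A-Za-z_]+>")


def find_placeholders(text: str) -> list:
    """Return (line_number, placeholder) pairs for every unfilled placeholder."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Drop trailing comments so notes like "# Postgres LaunchIT" are ignored.
        content = line.split(" #", 1)[0]
        for match in PLACEHOLDER.findall(content):
            hits.append((number, match))
    return hits


# Illustrative snippet of a half-filled config file.
example = """profile: <profile_name>
email: someone@example.com
database_port: <port> # Postgres LaunchIT
"""
print(find_placeholders(example))  # [(1, '<profile_name>'), (3, '<port>')]
```

For a real file, the text could be read with `pathlib.Path("profile.yaml").read_text()` and the profile created only when the returned list is empty.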

AiiDA has recently changed so that the RabbitMQ server is configured separately. At the time of writing this tutorial, the command to configure RabbitMQ did not appear to be working correctly. To configure RabbitMQ manually, find the file named `config.json` inside the `.aiida` folder. Find your profile and edit the default RabbitMQ values that AiiDA provides. The correct values can be found in your LaunchIT instance. The result should look similar to the following example.

"process_control": {
    "backend": "core.rabbitmq",
    "config": {
        "broker_protocol": "amqps",
        "broker_username": "<username>",
        "broker_password": "<password>",
        "broker_host": "<hostname>",
        "broker_port": <port>,
        "broker_virtual_host": "<virtual_host>",
        "broker_parameters": {
            "no_verify_ssl": "1",
            "cafile": "/etc/pki/tls/cert.pem"
        }
    }
}

This block must be inserted at the same level as the "storage" value in the config.json file. Additionally, on LC the `broker_parameters` entry shown above must also be added to the file.
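The manual edit described above can also be scripted. The following is a minimal sketch, assuming that config.json keeps each profile under a top-level `"profiles"` key. Verify this against your own file before relying on it. The function name `add_process_control` and the profile name `demo` are hypothetical, and the broker values are the same placeholders used above.

```python
import json


def add_process_control(config: dict, profile: str, broker: dict) -> dict:
    """Set the RabbitMQ block of one profile, next to its "storage" entry."""
    # Assumption: profiles live under a top-level "profiles" key in config.json.
    profile_config = config["profiles"][profile]
    profile_config["process_control"] = {
        "backend": "core.rabbitmq",
        "config": broker,
    }
    return config


# Stand-in for the parsed contents of config.json (placeholder values only).
config = {"profiles": {"demo": {"storage": {"backend": "core.psql_dos"}}}}
broker = {
    "broker_protocol": "amqps",
    "broker_host": "<hostname>",
    "broker_parameters": {"no_verify_ssl": "1", "cafile": "/etc/pki/tls/cert.pem"},
}
updated = add_process_control(config, "demo", broker)
print(json.dumps(updated["profiles"]["demo"], indent=4))
```

With a real file, the dictionary would come from `json.load` and be written back with `json.dump`, ideally after making a backup copy of config.json.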

At this point you can test the status of the AiiDA installation by typing `verdi status`. The only service that should not yet be running is the daemon, which interacts with the RabbitMQ server. It can be started with

verdi daemon start

By default the daemon starts with a single worker. You can start more by giving a number at the end of the command, or increase or decrease the current number with

verdi daemon incr 1
verdi daemon decr 1

Additionally, a nice way to interact with the Python API is to use

verdi shell

which essentially starts an IPython instance with many of the AiiDA modules pre-loaded.

Set up and configure a computer🔗

Set up computer🔗

A computer within the AiiDA context is where calculations will be run. This can be any machine that you can ssh into and submit jobs on. As an example, we will create a computer for the Ruby server here at the lab. When setting up a computer, you will want to create a directory where all the simulations will be run. Ideally, this would be some kind of scratch file system that has plenty of space for your simulations. For Ruby we have the Lustre file system, which can be found at "/p/lustre1/<username>". I would recommend creating a folder in that space named AiiDA, or something else that will remind you not to delete it in the future.

Similarly to the profile setup process, we can provide a yaml file to simplify the installation process. This can be done with

verdi computer setup --config ruby.yaml

where ruby.yaml is the file that will contain the following settings

label: ruby
hostname: "ruby.llnl.gov" # Make sure you can ssh to this before installation by typing `ssh ruby.llnl.gov`
description: "Ruby server at LLNL"
transport: core.ssh
scheduler: "core.slurm"
work_dir: "/p/lustre1/<username>/AiiDA" # You will need to confirm which lustre folder you are a part of.
mpirun_command: "srun -n {tot_num_mpiprocs}"
mpiprocs_per_machine: "56"
prepend_text: ""
append_text: " "
shebang: "#!/bin/bash"
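The `{tot_num_mpiprocs}` placeholder in `mpirun_command` is filled in from the resources requested for each job. The sketch below only illustrates that idea with plain Python string formatting. AiiDA performs the substitution itself at submission time, and the helper `render_mpirun` is a hypothetical name used here for demonstration.

```python
# Template copied from the mpirun_command entry in ruby.yaml.
mpirun_command = "srun -n {tot_num_mpiprocs}"


def render_mpirun(template: str, num_machines: int, mpiprocs_per_machine: int) -> str:
    """Fill the placeholder with the total number of MPI processes."""
    total = num_machines * mpiprocs_per_machine
    return template.format(tot_num_mpiprocs=total)


print(render_mpirun(mpirun_command, num_machines=1, mpiprocs_per_machine=56))  # srun -n 56
print(render_mpirun(mpirun_command, num_machines=2, mpiprocs_per_machine=56))  # srun -n 112
```

With `mpiprocs_per_machine` set to 56 as in the file above, a job that requests two nodes would be launched with 112 MPI processes.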

Once the initial setup step is done, you will need to finish configuring the computer with

verdi computer configure core.ssh ruby --config configure_ruby.yaml

where configure_ruby.yaml contains

username: "<lc_username>"
port: 22
look_for_keys: true
key_filename: "/g/g1/<username>/.ssh/id_ecdsa" # Path to where your SSH key is at
timeout: 60
allow_agent: true
proxy_command: ""
compress: true
gss_auth: false
gss_kex: false
gss_deleg_creds: false
gss_host: "ruby"
load_system_host_keys: true
key_policy: "RejectPolicy"
use_login_shell: true
safe_interval: 10.0

AiiDA will ask you to set a maximum memory per machine (or node in this case), but this does not play well with the Slurm scheduler on LC. You can tell it to ignore this value by entering ! as the value. If you already entered a value, you can reset it as follows.

verdi shell # Loads a python shell with AiiDA functions loaded.

computer = load_computer("ruby") # Should be the name of the server you are modifying.
computer.set_default_memory_per_machine(None)

computer.get_default_memory_per_machine() # This should return nothing at this point.

Test computer🔗

Before testing the computer, you will probably need to download a known_hosts file which contains the information of all the different servers on LC. This should replace your current known_hosts file on LC which should be located in the ~/.ssh folder.

Another possible issue can come from spurious output that occurs as you log into and out of the server. There is a line in your ~/.profile.linux file that echoes 'logout' after every session and will cause issues with AiiDA. Go into the file and comment out the last line to make it look like

# Set up the shell environment:
    #trap "echo 'logout'" 0

To make sure this computer is configured correctly, run:

verdi computer test ruby

All tests should pass at this point. If they do not, a common cause is a .bashrc file that does not handle non-interactive sessions. One way to fix this is to prepend the following text to your .bashrc file.

if [[ $- != *i* ]] ; then
    return
fi
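Both fixes above target the same symptom: text printed during a non-interactive login that AiiDA does not expect. A rough way to look for such output is to run a command that should print nothing and inspect what comes back. The sketch below assumes nothing about AiiDA itself. The function name `spurious_output` is made up, and a do-nothing local Python process stands in for a real ssh login.

```python
import subprocess
import sys


def spurious_output(command: list) -> str:
    """Run a command that should print nothing and return whatever it printed."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr


# Local stand-in: a Python process that does nothing and should print nothing.
quiet = [sys.executable, "-c", "pass"]
print(repr(spurious_output(quiet)))

# On LC the same check could wrap a real login, for example:
# spurious_output(["ssh", "ruby.llnl.gov", "true"])
```

Any non-empty string returned for the ssh variant (for example a stray `logout`) points to a login script that still needs one of the changes described above.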

Configure code🔗

At this point AiiDA can connect to your database and RabbitMQ service, and to the servers to which you want to submit calculations. The next step is to configure a code executable so that AiiDA knows where it is located. As an example, we will show how INQ and VASP can be configured to run within AiiDA.

AiiDA relies on plugins to specify how a code interacts with the AiiDA framework. Many of the popular code packages already have a community-supported plugin, which can be found in the AiiDA plugin registry.

Before installing either of these plugins, we can check which calculation plugins are currently installed with something like:

verdi plugin list aiida.calculations
Registered entry points for aiida.calculations:
* arithmetic.add
* core.transfer
* templatereplacer

Report: Pass the entry point as an argument to display detailed information
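The listing above reports entry points registered in the `aiida.calculations` group. As a rough cross-check, the same group can be queried from Python through the standard library's `importlib.metadata`. This is only a sketch of the idea and not a replacement for `verdi plugin list`. The helper name `list_entry_points` is hypothetical.

```python
from importlib.metadata import entry_points


def list_entry_points(group: str) -> list:
    """Return the sorted names of all entry points registered in a group."""
    found = entry_points()
    if hasattr(found, "select"):
        # Python 3.10+ returns a selectable collection.
        selected = found.select(group=group)
    else:
        # Python 3.9 returns a dict keyed by group name.
        selected = found.get(group, [])
    return sorted(entry.name for entry in selected)


print(list_entry_points("aiida.calculations"))
```

Run inside the activated AiiDA environment, the printed names should match the ones reported by `verdi plugin list aiida.calculations`. In an environment without AiiDA the list is simply empty.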

Install plugin for INQ🔗

To install the plugin for INQ, execute the following commands:

git clone https://github.com/LLNL/aiida-inq
cd aiida-inq
pip install -e .

After installing aiida-inq, the plugin list should be updated to show the new calculations that can be performed.

Set up INQ code🔗

To set up INQ with a script, create a file named inq_code.yml with the following content

label: inq
description: "INQ code from LLNL"
default_calc_job_plugin: "inq.inq"
filepath_executable: "/bin/bash"
computer: ruby
use_double_quotes: false
with_mpi: true
prepend_text: ""
append_text: ""

You might notice that the path to the executable has been set to "/bin/bash". The AiiDA implementation for INQ works by creating a bash script. Inside of that script it will make reference to the INQ binary. With that being the case, make sure that the INQ binary can be found in your $PATH. This can be done by appending the following line to your .bashrc file

export PATH="/path/to/inq/binary:$PATH"

To install the code within AiiDA, you can then execute the following command:

verdi code create core.code.installed --config inq_code.yml

To check if the code has correctly installed and is recognized by AiiDA, you can execute:

verdi code list

Run an INQ calculation🔗

Running a job from a script🔗

One of the easiest ways to launch a job within AiiDA is through simple python scripts. An example script, launch.py, is shown below

from aiida import load_profile
from aiida.orm import load_code, Dict
from aiida.plugins import CalculationFactory, DataFactory
from aiida.engine import run,submit
from ase.build import bulk

# Initiate the default profile
load_profile()

# Get the calculator from AiiDA
InqCalculation = CalculationFactory('inq.inq')

# Find the code you will use for the calculation
code = load_code('inq@ruby')

# Create a structure
StructureData = DataFactory('core.structure')
atoms = bulk('Si', crystalstructure='diamond', a=5.43)
structure = StructureData(ase=atoms)

inputs = {
    'code': code,
    'structure': structure,
    'parameters' : Dict(dict={
        'electrons': {
            'cutoff': '35.0 Ha',
            'extra-states': 3
        },
        'kpoints': {
            'gamma': '',
            'insert': '-0.5 -0.5 -0.5 0.0'
        },
        'ground-state': {
            'tolerance': 1e-8
        },
        'run': {
            'ground-state': ''
        }
    }),
    'metadata': {
        #'dry_run': True, # If uncommented, will only create the files.
        #'store_provenance': False, # Will not store any of the provenance in the database.
        'options': {
            'resources': {
                'tot_num_mpiprocs': 4
            }
        }
    }
}

# Will show detailed results 
run(InqCalculation, **inputs)
# Comment the previous line and use the following if you want to get the pk value to follow along with.
#calc = submit(InqCalculation, **inputs)
#print(f'Created calculation with PK={calc.pk}')

which can then be run via

verdi run launch.py

You can look up the pk of this calculation by running `verdi process list`, or `verdi process list -a` to show all calculations. Check the help documentation, `verdi process list --help`, to see all the possible options.

Install plugin for VASP🔗

In order to run VASP you will need to have a current license. If you are not sure whether you have access, reach out to your group or project leaders. To install the plugin for VASP, execute the following command:

pip install aiida-vasp

This will install the latest stable version of aiida-vasp. After installing aiida-vasp, the plugin list should be updated to show the new calculations that can be performed.

Set up VASP code🔗

To set up VASP with a script, create a file named vasp_code.yml with the following content

label: vasp_std
description: "VASP code Ruby server at LLNL"
default_calc_job_plugin: "vasp.vasp"
filepath_executable: "/path/to/vasp/binary"
computer: ruby
use_double_quotes: false
with_mpi: true
prepend_text: ""
append_text: ""

To install the code within AiiDA, you can then execute the following command:

verdi code create core.code.installed --config vasp_code.yml

To check if the code has correctly installed and is recognized by AiiDA, you can execute:

verdi code list

Install VASP POTCAR files🔗

VASP comes with proprietary PAW pseudopotential files, which are named POTCAR. An element can have multiple pseudopotential files depending on the number of electrons that were included when creating the file. To make these available to your AiiDA instance you will need to upload them. This can be done by providing a tar file with all the POTCAR files. The format should be the same as provided with the VASP software.

% verdi data vasp-potcar uploadfamily --path=/path/to/potpaw_PBE.54.tar --name=PBE.54 --description="PBE potentials version 54"
POTCAR files found: 327. New files uploaded: 327, Added to Family: 327

The name that you provide here is how the pseudopotential family will be referenced by AiiDA later.

Run a VASP calculation🔗

Running a job from a script🔗

One of the easiest ways to launch a job within AiiDA is through simple python scripts. An example script, launch.py, is shown below

from aiida import load_profile
from aiida.orm import load_code, load_group, Str, Group, Int
from aiida.plugins import DataFactory, WorkflowFactory
from aiida.common.extendeddicts import AttributeDict
from ase.io import read
from ase.build import bulk, sort
from aiida.engine import submit, run

# Initiate the default profile
load_profile()

# Initiate workchain and other inputs
workchain = WorkflowFactory('vasp.relax')
inputs = AttributeDict()
settings = AttributeDict()
dict_data = DataFactory('core.dict')
kpoints_data = DataFactory('core.array.kpoints')
Bool = DataFactory('core.bool')

# Settings
settings.parser_settings = {
    'include_node': ['energies', 'trajectory'],
    'include_quantity': ['forces', 'stress'],
    'electronic_step_energies': True
}

inputs.settings = dict_data(dict=settings)

# Find the code you will use for the calculation
code = load_code('vasp_std@ruby') # This will change based on the computer you install on.

# Structure information
StructureData = DataFactory('core.structure')
atoms = bulk('Si', crystalstructure='diamond', a=5.43)
structure = StructureData(ase=atoms)
inputs.structure = structure

# KPOINTS
kpoints = kpoints_data()
kpoints.set_kpoints_mesh([3,3,3])
inputs.kpoints = kpoints

# INCAR
inputs.parameters = dict_data(dict={
    'incar': {
        'algo': 'Conjugate',
        'encut': 500,
        'prec': 'ACCURATE',
        'ediff': 1E-4,
        'ispin': 2,
        'magmom': [0]*len(atoms),
        'lorbit': 11,
        'ismear': 0,
        'sigma': 0.1,
        'gga': 'PS',
        'kpar': 2,
        'ncore': 14,
        'nelm': 500
    }
})

inputs.converge = AttributeDict({'pwcutoff_samples': Int(15)})

# POTCAR information
inputs.potential_family = Str('PBE.54') # Name you previously specified when uploading POTCAR
inputs.potential_mapping = dict_data(dict={'Si': 'Si'})

# Submission options
options = AttributeDict()
options.account = 'bank_name' # Name of the bank on LC
options.queue_name = 'pbatch'
options.max_wallclock_seconds = 60 * 60 * 12
options.resources = {'num_machines': 1} # Number of nodes
inputs.options = dict_data(dict=options)

# Relax options
relax = AttributeDict()
relax.perform = Bool(True)
# Select relaxation algorithm
relax.algo = DataFactory('core.str')('cg')
# Set force cutoff limit (EDIFFG, but no sign needed)
relax.force_cutoff = DataFactory('core.float')(0.01)
# Turn on relaxation of positions (strictly not needed as the default is on)
# The three next parameters correspond to the well known ISIF=3 setting
relax.positions = DataFactory('core.bool')(True)
# Turn on relaxation of the cell shape (defaults to False)
relax.shape = DataFactory('core.bool')(True)
# Turn on relaxation of the volume (defaults to False)
relax.volume = DataFactory('core.bool')(True)
# Set maximum number of ionic steps
relax.steps = DataFactory('core.int')(100)
inputs.relax = relax

# Label
inputs.label = Str('Si structure optimization')
inputs.description = Str('Structure optimization of bulk silicon.')

inputs.clean_workdir = False

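# Submit the workchain to the daemon; submit() returns the process node,
# whereas run() returns only the outputs and has no pk attribute.
calc = submit(workchain, **inputs)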

print(f'Launched geometry optimization with PK={calc.pk}')

This script launches a geometry optimization of bulk silicon. There are several parameters to consider when launching. The settings portion of the inputs contains the parser settings and tells AiiDA which quantities to parse. The list of available quantities is rather large, so consult the aiida-vasp documentation to see how to get a particular property. The options section contains the job submission details such as bank, queue, resources, and walltime. There are many options for specifying how to optimize the geometry of the structure. Most of them are listed here to show the full capabilities of the relax workflow.
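When several jobs share the same bank and queue, the submission options described above can be gathered in a small helper. The following is a minimal sketch that builds a plain dictionary with the same keys used in the script. The function name `make_options` is hypothetical and `bank_name` is the same placeholder as in the script.

```python
def make_options(account: str, queue: str, hours: float, nodes: int) -> dict:
    """Build a plain dictionary of scheduler options for one job."""
    if hours <= 0 or nodes < 1:
        raise ValueError("hours must be positive and nodes at least 1")
    return {
        "account": account,  # Name of the bank on LC
        "queue_name": queue,
        "max_wallclock_seconds": int(hours * 60 * 60),
        "resources": {"num_machines": nodes},  # Number of nodes
    }


options = make_options("bank_name", "pbatch", hours=12, nodes=1)
print(options["max_wallclock_seconds"])  # 43200
```

The resulting dictionary could be passed to `dict_data(dict=options)` in the script in place of the hand-built `AttributeDict`.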

The launch.py script can then be run via

verdi run launch.py

The pk of the launched job can be found by running `verdi process list` or `verdi process list -a` to show all calculations. Check the help documentation, `verdi process list --help`, to see all the possible options.

Checking job status in AiiDA🔗

If you run `verdi process list`, the current calculation will show up at the bottom of the list, with its identifying “pk” number in the leftmost column:

verdi process list 
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
3339  13m ago    InqCalculation   ⏹ Finished [0]

You can then take that number and run `verdi process show <pk>` to learn more about the process.

Property     Value
-----------  -------------------------------------------------------------
type         InqTDDFTWorkChain
state        Finished [11] The process did not register a required output.
pk           3339
uuid         c9d7fde2-12bf-4901-9893-3954695ff78c
label
description
ctime        2024-10-01 17:58:04.092209+00:00
mtime        2024-10-01 17:58:18.093927+00:00

Inputs              PK    Type
------------------  ----  -------------
gs
    inq
        code        1     InstalledCode
        structure   3331  StructureData
        parameters  3332  Dict
    clean_workdir   3333  Bool
    max_iterations  3334  Int
tddft
    inq
        code        1     InstalledCode
        structure   3331  StructureData
        parameters  3335  Dict
    clean_workdir   3336  Bool
    max_iterations  3337  Int
clean_workdir       3338  Bool
structure           3331  StructureData

Outputs              PK  Type
-----------------  ----  -------------
output_parameters  3354  Dict
output_structure   3353  StructureData

Called          PK  Type
------------  ----  ----------------
Ground_State  3341  InqBaseWorkChain
TDDFT         3348  InqBaseWorkChain

Log messages
---------------------------------------------
There are 3 log messages for this calculation
Run 'verdi process report 3339' to see them

Troubleshooting a job🔗

Once you have the pk number of a job, you can learn more about the calculation and its outputs via

verdi process show <pk>

verdi process report <pk>

verdi calcjob gotocomputer <pk>

The last option takes you to a directory where you’ll see the outputs from the job and the script that was actually submitted to Slurm to run the desired calculations.

If necessary, you can kill a process with `verdi process kill <pk>`.

Getting results🔗

To look at the results of the calculation you can either write a script or start a verdi shell instance. The following code will get the results of the previous calculation.

verdi shell

node = load_node(3339)
outputs = node.outputs.output_parameters.get_dict()
print(outputs)

Conclusion🔗

Hopefully this brief tutorial is enough to get you started running calculations using AiiDA on the LC servers at LLNL. For further information, we highly recommend the documentation and tutorials on the AiiDA website.