Note: These instructions should work for AiiDA v1.5.2 on LC systems. Instructions for v1.3 are here
For more info, see the full AiiDA docs.
Getting started with AiiDA
Grabbing your credentials
First, log in to the production machine from which you'd like to use AiiDA. On the CZ, you'll find the credentials necessary to configure AiiDA in your workspace, in the directory /usr/workspace/<lcusername>/.lciaas/cz-<lcusername>. You should see subdirectories for rabbit-mq and postgresql, each with a *.info file.
Note that for groups, credentials will be stored in /usr/workspace/<lcgroupname>/.lciaas/cz-<lcgroupname>.
Cloning and basic AiiDA Setup
Run the following commands to install AiiDA.
```
$ git clone https://github.com/aiidateam/aiida-core.git
$ cd aiida-core/
$ git checkout v1.4.3 -b v1.4.3
$ python3 -m venv aiidavenv
$ source aiidavenv/bin/activate
$ pip install --upgrade pip   # this takes it from pip 18.1 to ~20.2
$ pip install -e .            # Note the `.` here! The current (`aiida-core`) directory is the argument to `pip install -e`.
$ verdi --version
```
Creating a profile
You'll create your AiiDA profile using the credentials you found in /usr/workspace/<lcusername>/.lciaas/cz-<lcusername>. Note that postgresql is the database: use its credentials when you are queried about your database host, port, etc. rabbitmq is the broker: use its credentials when you are queried for a broker username, password, etc.
Now, run verdi setup --profile <profile_name> (the example below uses the profile name janes_example) and provide the requested information as shown below.
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi setup --profile janes_example
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
Email Address (for sharing data): herriman1@llnl.gov
First name: Jane
Last name: Herriman
Institution: LLNL
Database engine (postgresql_psycopg2) [postgresql_psycopg2]:   # Leave this blank and hit enter to take the default
Database backend (django, sqlalchemy) [django]:                # Leave this blank and hit enter to take the default
Database host: postgresql-janeh.apps.czapps.llnl.gov
Database port [5432]: 32212
Database name: aiida
Database username: janeh
Database password:            # Enter the password from `postgresql.info`
Broker protocol (amqp, amqps) [amqp]: amqps
Broker username [guest]: janeh
Broker password [guest]:      # Enter the password from `rabbitmq.info`
Broker host [127.0.0.1]: rabbitmq-janeh.apps.czapps.llnl.gov
Broker port [5672]: 32200
Broker virtual host name []: aiida
Repository directory [/g/g0/janeh/.aiida/repository/janes_example]: /g/g0/janeh/aiida-core
Success: created new profile `janes_example`.
Info: migrating the database.
Success: database migration completed.
```
To complete setting up the profile, edit the config file. Open it with your favorite text editor
```
vi ~/.aiida/config.json
```
and add a broker_parameters entry after the broker_virtual_host field:
```
"broker_port": 32200,
"broker_virtual_host": "aiida",
"broker_parameters": {
    "no_verify_ssl": "1",
    "cafile": "/etc/pki/tls/cert.pem"
},
```
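If you'd rather script this edit than make it by hand, a minimal Python sketch can insert the entry. This assumes the default config location (~/.aiida/config.json) and the standard layout of that file, with profiles stored under a top-level "profiles" key; check your own file before running it.

```python
import json
from pathlib import Path

# TLS parameters required by LC's rabbitmq service (values from the snippet above).
BROKER_PARAMETERS = {"no_verify_ssl": "1", "cafile": "/etc/pki/tls/cert.pem"}

def add_broker_parameters(config: dict, profile: str) -> dict:
    """Add broker_parameters to one profile of an AiiDA config dict."""
    config["profiles"][profile]["broker_parameters"] = BROKER_PARAMETERS
    return config

if __name__ == "__main__":
    path = Path.home() / ".aiida" / "config.json"
    config = json.loads(path.read_text())
    # Replace "janes_example" with your own profile name.
    updated = add_broker_parameters(config, "janes_example")
    path.write_text(json.dumps(updated, indent=4))
```

Re-open ~/.aiida/config.json afterwards to confirm the entry landed where you expect.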
Testing
Now test some verdi commands to see if setting up AiiDA and a profile has worked.
```
verdi status
verdi profile list
verdi daemon start 1
verdi shell
```
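If you'd like to script these checks, a minimal sketch using only the standard library is below. It assumes verdi is on your PATH, i.e. the virtual environment is active; the function names are illustrative.

```python
import subprocess

def command_succeeds(cmd: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0

if __name__ == "__main__":
    # Sanity-check the basic verdi commands used above.
    for cmd in (["verdi", "status"], ["verdi", "profile", "list"]):
        status = "ok" if command_succeeds(cmd) else "FAILED"
        print(" ".join(cmd), "->", status)
```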
Set up and configure a computer
Set up computer
Create a file computer.yml with the following content:
```yaml
label: flashdebug
hostname: "flash"
description: ""
transport: ssh
scheduler: "slurm"
work_dir: "/usr/workspace/janeh"
mpirun_command: "srun -n {tot_num_mpiprocs}"
mpiprocs_per_machine: "2"
prepend_text: |
    #SBATCH -p pdebug
append_text: " "
shebang: "#!/bin/bash"
```
If you are working on quartz, you might replace "flash" with "quartz" in both the label and hostname above; also set work_dir to your own workspace directory. Then, run verdi computer setup --config computer.yml. You should then get a message of the form
```
Success: Computer<4> flashdebug created
Info: Note: before the computer can be used, it has to be configured with the command:
Info:   verdi computer configure ssh flashdebug
```
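Since only the label, hostname, and workspace path change between LC machines, you might generate computer.yml from a template if you use several systems. A sketch (the field values mirror the example above; the "janeh" username is illustrative):

```python
# Template for a computer.yml-style config targeting an LC debug queue.
# {tot_num_mpiprocs} is escaped so it survives formatting as a literal.
COMPUTER_TEMPLATE = """\
label: {label}
hostname: "{hostname}"
description: ""
transport: ssh
scheduler: "slurm"
work_dir: "/usr/workspace/{username}"
mpirun_command: "srun -n {{tot_num_mpiprocs}}"
mpiprocs_per_machine: "2"
prepend_text: |
    #SBATCH -p pdebug
append_text: " "
shebang: "#!/bin/bash"
"""

def render_computer_yaml(hostname: str, username: str) -> str:
    """Render the YAML text for one machine's debug-queue computer."""
    return COMPUTER_TEMPLATE.format(
        label=f"{hostname}debug", hostname=hostname, username=username
    )

if __name__ == "__main__":
    print(render_computer_yaml("quartz", "janeh"))
```

Write the result to computer.yml and pass it to verdi computer setup --config as above.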
Alternatively, you can set up a computer manually by running verdi computer setup:
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi computer setup
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
Computer label: flashdebug
Hostname: flash
Description []:
Transport plugin: ssh
Scheduler plugin: slurm
Shebang line (first line of each script, starting with #!) [#!/bin/bash]:
Work directory on the computer [/scratch/{username}/aiida/]: /usr/workspace/janeh
Mpirun command [mpirun -np {tot_num_mpiprocs}]: srun -n {tot_num_mpiprocs}
Default number of CPUs per machine: 2
```
After entering the above info for manual setup, a file will open. Feel free to save without adding anything! Here, I choose to add #SBATCH -p pdebug in the “prepend” section so that jobs submitted to this computer will only use the debug queue:
```
#==========================================================================#
#= PREPEND_TEXT: if there is any bash commands that should be prepended to
#= the executable call in all submit scripts for this computer, type that
#= between the equal signs below and save the file.
#==========================================================================#
#SBATCH -p pdebug
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
After closing this file, a second file will open, which I chose to leave blank and save:
```
#==========================================================================#
#= APPEND_TEXT: if there is any bash commands that should be appended to
#= the executable call in all submit scripts for this computer, type that
#= between the equal signs below and save the file.
#==========================================================================#
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
You should then get a message of the form
```
Success: Computer<4> flashdebug created
Info: Note: before the computer can be used, it has to be configured with the command:
Info:   verdi computer configure ssh flashdebug
```
Configure computer
Similarly, you can configure your computer with a script or manually at the command line. To do this via a script, create a file conf.yml for computer flashdebug with the following form:
```yaml
---
username: "janeh"
port: 622
look_for_keys: true
key_filename: "/g/g0/janeh/.ssh/id_rsa"
timeout: 60
allow_agent: true
proxy_command: ""
compress: true
gss_auth: false
gss_kex: false
gss_deleg_creds: false
gss_host: "flash"
load_system_host_keys: true
key_policy: "RejectPolicy"
use_login_shell: true
safe_interval: 10.0
```
but replace the field username with your LC username and key_filename with the path to your id_rsa file, both in quotation marks. After this, run verdi computer configure ssh flashdebug --config conf.yml. To configure the same computer manually at the command line instead, run verdi computer configure ssh flashdebug and then enter information as below (again using your own LC username and SSH key file):
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi computer configure ssh flashdebug
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
User name [janeh]:
Port number [22]: 622
Look for keys [True]:
SSH key file []: /g/g0/janeh/.ssh/id_rsa
Connection timeout in s [60]:
Allow ssh agent [True]:
SSH proxy command []:
Compress file transfers [True]:
GSS auth [False]:
GSS kex [False]:
GSS deleg_creds [False]:
GSS host [flash]:
Load system host keys [True]:
Key policy (RejectPolicy, WarningPolicy, AutoAddPolicy) [RejectPolicy]:
Use login shell when executing command [True]:
Connection cooldown time (s) [30.0]: 10.0
Info: Configuring computer flashdebug for user herriman1@llnl.gov.
Success: flashdebug successfully configured for herriman1@llnl.gov
```
Test computer
To make sure this computer is correctly set up and reachable, execute verdi computer test flashdebug and check that all tests pass:
```
(aiidavenv) janeh@flash21:~/.ssh$ verdi computer test flashdebug
Info: Testing computer<flashdebug> for user<herriman1@llnl.gov>...
* Opening connection... [OK]
* Checking for spurious output... [OK]
* Getting number of jobs from scheduler... [OK]: 5 jobs found in the queue
* Determining remote user name... [OK]: janeh
* Creating and deleting temporary file... [OK]
Success: all 5 tests succeeded
```
Configure Quantum Espresso
Install plugins for Quantum Espresso
```
verdi plugin list                      # to see options, including aiida.calculations
verdi plugin list aiida.calculations   # shows the plugins currently registered
pip install aiida-quantumespresso      # use pip to install a new plugin
reentry scan                           # verdi plugin list won't register the newly installed plugin until after this scan
verdi plugin list aiida.calculations   # now this shows a long list of Quantum Espresso related plugins, including the one we'll use next: quantumespresso.pw
```
Set up Quantum Espresso code
To set up Quantum Espresso with a script, create a file code.yml with the following content,
```yaml
label: pw
description: "quantum_espresso"
input_plugin: "quantumespresso.pw"
on_computer: true
remote_abs_path: "/g/g0/janeh/pw.x"
computer: flashdebug
prepend_text: |
    module purge
    module load mkl impi intel
append_text: " "
```
replacing the field remote_abs_path with the path to your pw.x executable. After running verdi code setup --config code.yml you should see the message
```
Success: Code<4> pw@flashdebug created
```
If you'd prefer manual configuration, you can simply run verdi code setup and then provide information for the following prompts, as below:
```
(aiidavenv) janeh@flash21:~$ verdi code setup
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
Label: pw
Description []:
Default calculation input plugin: quantumespresso.pw
Installed on target computer? [True]:
Computer: flashdebug
Remote absolute path: /g/g0/janeh/pw.x
```
After entering the above, a text file will open. Add the following module lines to the “prepend text” section:
```
#==========================================================================#
#= PREPEND_TEXT: if there is any bash commands that should be prepended to
#= the executable call in all submit scripts for this code, type that
#= between the equal signs below and save the file.
#==========================================================================#
module purge
module load intel mkl impi
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
After saving and closing this file, another file will open, prompting you to append text. Leave this file blank and save and close it:
```
#==========================================================================#
#= APPEND_TEXT: if there is any bash commands that should be appended to
#= the executable call in all submit scripts for this code, type that
#= between the equal signs below and save the file.
#==========================================================================#
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
and then you’ll get a message like:
```
Success: Code<4> pw@flashdebug created
```
Upload pseudopotentials
First grab a tarball of the pseudopotential files
```
wget https://legacy-archive.materialscloud.org/file/2018.0001/v4/SSSP_1.1_PBE_efficiency.tar.gz
```
then extract these files to a newly created directory
```
mkdir aiida-core/sssp
tar -zxvf SSSP_1.1_PBE_efficiency.tar.gz -C aiida-core/sssp   # extract tar file
```
and upload those files so that AiiDA can find them
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi data upf uploadfamily sssp/ sssp "sssp pseudos"
Success: UPF files found: 85. New files uploaded: 85
```
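If the count AiiDA reports looks off, you can cross-check it against the extracted files locally. A quick sketch (the directory path follows the steps above; the function name is illustrative):

```python
from pathlib import Path

def count_upf(names):
    """Count filenames with a .upf suffix (case-insensitive)."""
    return sum(1 for n in names if n.lower().endswith(".upf"))

if __name__ == "__main__":
    # List whatever was extracted from the SSSP tarball and count UPF files.
    names = [p.name for p in Path("aiida-core/sssp").iterdir()]
    print(f"{count_upf(names)} UPF files found locally")
```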
Run a Quantum Espresso calculation
How to run a test job
You should now be able to run a test calculation via
```
aiida-quantumespresso calculation launch pw -X pw@flashdebug -p sssp -i -d
```
If you run verdi process list -a, the most recently run calculation will show up at the bottom of the list, with its identifying “pk” number in the leftmost column:
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi process list -a
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
  93  36s ago    PwCalculation    ⏹ Created

Total results: 1

Info: last time an entry changed state: 36s ago (at 00:12:00 on 2020-11-19)
```
You can then take that number and run verdi process show <pk> to learn more about the process.
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi process show 93
Property     Value
-----------  ------------------------------------
type         PwCalculation
state        Created
pk           93
uuid         8fa69585-2f7e-4add-990e-963a61d33d9f
label
description
ctime        2020-11-19 00:12:00.303898+00:00
mtime        2020-11-19 00:12:00.599236+00:00
computer     [4] flashdebug

Inputs        PK  Type
----------  ----  -------------
pseudos
    Si        71  UpfData
code           4  Code
kpoints       91  KpointsData
parameters    92  Dict
structure     90  StructureData
```
Syntactic sugar for running a job
The pk values obtained from verdi process show... can be used to write a launch.py script containing info such as
```python
# Note: `verdi run` injects names like load_code and CalculationFactory
# automatically, but importing them explicitly keeps the script portable.
from aiida.orm import load_code, load_node
from aiida.engine import submit
from aiida.plugins import CalculationFactory

PwCalculation = CalculationFactory('quantumespresso.pw')

inputs = {
    'code': load_code('pw@quartz'),
    'structure': load_node(158),
    'pseudos': {
        'Si': load_node(111),
    },
    'parameters': load_node(253),
    'kpoints': load_node(252),
    'metadata': {
        'options': {
            'resources': {
                'num_machines': 1,
            },
            'max_wallclock_seconds': 500,
            'withmpi': True,
        }
    }
}

submit(PwCalculation, **inputs)
```
which can then be run via
```
verdi run launch.py
```
instead of aiida-quantumespresso calculation launch pw -X pwdebug@quartzdebug -p sssp -i -d. You can again find the pk of this process by running verdi process list -a.
Troubleshooting a job
Once you have the pk number of a job, you can learn more about the calculation and its outputs via
```
verdi process show <pk>
verdi process report <pk>
verdi calcjob gotocomputer <pk>
```
The last option takes you to a directory where you'll see the outputs from the job and the script actually submitted to Slurm to run the desired calculations. You can also dig into the details of various inputs to the calculation via, for example, verdi data dict show <pk>, using the pk of the parameters Dict reported by verdi process show <pk>.
If necessary, you can kill a process with verdi process kill <pk>.
Troubleshooting
QE version errors
Error messages
```
Both the stdout and XML output files could not be read or parsed.
parser returned exit code<305>: Both the stdout and XML output files could not be read or parsed.
```
Potential fix: You may be using a version of Quantum Espresso that isn't supported by your version of AiiDA. For example, the messages below were generated by running QE version 6.6 before it was supported. Switching to a different version of Quantum Espresso may resolve these error messages.
More details
```
+-> WARNING at 2020-08-24 16:44:52.714821+00:00
 | key 'symmetries' is not present in raw output dictionary
+-> ERROR at 2020-08-24 16:44:52.770446+00:00
 | ERROR_OUTPUT_STDOUT_INCOMPLETE
+-> ERROR at 2020-08-24 16:44:52.774858+00:00
 | Both the stdout and XML output files could not be read or parsed.
+-> ERROR at 2020-08-24 16:44:52.777151+00:00
 | parser returned exit code<305>: Both the stdout and XML output files could not be read or parsed.
```
Daemon restart needed
Error messages
```
aiida.common.exceptions.MissingEntryPointError: Entry point 'quantumespresso.pw' not found in group 'aiida.parsers'. Try running `reentry scan` to update the entry point cache.
```
Potential fix: Restart the daemon (assuming reentry scan does not fix the problem).
More details
Below is an example of full output from verdi process report <pk>
```
$ verdi process report 85
*** 85: None
*** (empty scheduler output file)
*** Scheduler errors:
The following modules were not unloaded:
  (Use "module --force purge" to unload all):
  1) intel/19.0.4   2) mvapich2/2.3   3) texlive/2016   4) StdEnv

Lmod is automatically replacing "mvapich2/2.3" with "impi/2019.8".

*** 1 LOG MESSAGES:
+-> REPORT at 2020-09-09 21:37:14.776260+00:00
 | [85|PwCalculation|on_except]: Traceback (most recent call last):
 |   File "/g/g12/keilbart/aiida_core/aiidavenv/lib/python3.6/site-packages/plumpy/process_states.py", line 220, in execute
 |     result = self.run_fn(*self.args, **self.kwargs)
 |   File "/g/g12/keilbart/aiida_core/aiida/engine/processes/calcjobs/calcjob.py", line 262, in parse
 |     exit_code = execmanager.parse_results(self, retrieved_temporary_folder)
 |   File "/g/g12/keilbart/aiida_core/aiida/engine/daemon/execmanager.py", line 412, in parse_results
 |     parser_class = process.node.get_parser_class()
 |   File "/g/g12/keilbart/aiida_core/aiida/orm/nodes/process/calculation/calcjob.py", line 489, in get_parser_class
 |     return ParserFactory(parser_name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/factories.py", line 158, in ParserFactory
 |     entry_point = BaseFactory(entry_point_group, entry_point_name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/factories.py", line 46, in BaseFactory
 |     return load_entry_point(group, name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/entry_point.py", line 202, in load_entry_point
 |     entry_point = get_entry_point(group, name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/entry_point.py", line 264, in get_entry_point
 |     'the entry point cache.'.format(name, group)
 | aiida.common.exceptions.MissingEntryPointError: Entry point 'quantumespresso.pw' not found in group 'aiida.parsers'. Try running `reentry scan` to update the entry point cache.
```
Setting up a code or computer
Error messages
```
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
```
Potential fix: Try adding the code or computer again. This error message appeared intermittently for me but would then go away on a new attempt, without any change on my part.
Jobs stuck in "Created" state
Error messages: This issue doesn't come with explicit error messages. Instead, you will see via verdi process list -a that your job remains in the "Created" state indefinitely:
```
(aiidavenv) janeh@flash21:~$ verdi process list
  PK  Created    Process label             Process State    Process status
----  ---------  ------------------------  ---------------  ----------------------------------
  93  54D ago    PwCalculation             ⏹ Created
```
In this case, verdi process report 93 will not show any error messages in the log. If you try verdi calcjob gotocomputer 93, however, you'll see this will not work:
```
(aiidavenv) janeh@flash21:~$ verdi calcjob gotocomputer 93
Critical: no remote work directory for this calcjob, maybe the daemon did not submit it yet
```
Potential fix: It's possible that there is something wrong with your rabbitmq configuration. If running verdi status shows that you are connected to rabbitmq, try restarting the daemon and resubmitting your calculation.
If newly submitted jobs continue to enter and persist in the "Created" state, there is probably another verdi process somewhere interfering with the verdi daemon you're using to submit this calculation. Start by checking for stale verdi processes on your current login node by running ps -u <lc username>. This should show only two entries labeled verdi. If there are more, kill all verdi processes and then restart the daemon (or simply kill the stale processes, if you can distinguish them from the active verdi processes). You will also want to use ps -u <lc username> to identify (and subsequently kill) stale verdi processes on any system that connects to the rabbitmq instance you are using. For example, if you have used only one instance of rabbitmq and have run AiiDA on quartz and on flash, you will want to kill stale verdi processes on all quartz and flash login nodes. Reach out to the LC Hotline if you need help with this.
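To automate that check, a small sketch that counts verdi entries in ps output is below; run it on each login node you have used. It assumes the usual column layout of ps -u output, and the function name is illustrative.

```python
import getpass
import subprocess

def count_verdi(ps_output: str) -> int:
    """Count lines of `ps` output whose command field mentions verdi."""
    return sum(1 for line in ps_output.splitlines() if "verdi" in line)

if __name__ == "__main__":
    # List this user's processes on the current login node.
    result = subprocess.run(
        ["ps", "-u", getpass.getuser()], capture_output=True, text=True
    )
    n = count_verdi(result.stdout)
    print(f"{n} verdi process(es) on this login node; expect 2 when only the daemon is running")
```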
Jobs stuck in “Waiting for transport task: upload” state
Error messages: Running verdi process report <pk> gives an error of the form FileNotFoundError: [Errno 2] No such file or directory: ''.
Potential fix: Try creating and then using a new pseudopotential family: download and extract the SSSP tarball as described above and add the family with a new label. If this does not solve the issue, contact LC WEG via the hotline. We are still searching for a more elegant solution to this problem but have workarounds that should help.