Note: These instructions should work for AiiDA v1.5.2 on LC systems. Instructions for v1.3 are here
For more info, see the full AiiDA docs.
Getting started with AiiDA
Grabbing your credentials
First, log in to the production machine from which you'd like to use AiiDA. On the CZ, you'll find the credentials necessary to configure AiiDA in your workspace, in the directory /usr/workspace/<lcusername>/.lciaas/cz-<lcusername>. You should see subdirectories for rabbit-mq and postgresql, each with a *.info file.
Note that for groups, credentials will be stored in /usr/workspace/<lcgroupname>/.lciaas/cz-<lcgroupname>.
Cloning and basic AiiDA Setup
Run the following commands to install AiiDA.
```
$ git clone https://github.com/aiidateam/aiida-core.git
$ cd aiida-core/
$ git checkout v1.4.3 -b v1.4.3
$ python3 -m venv aiidavenv
$ source aiidavenv/bin/activate
$ pip install --upgrade pip   # this takes it from pip 18.1 to ~20.2
$ pip install -e .            # Note the `.` here! The current (`aiida-core`) directory is the argument to `pip install -e`.
$ verdi --version
```
Creating a profile
You'll create your AiiDA profile using the credentials you found in /usr/workspace/<lcusername>/.lciaas/cz-<lcusername>. Note that postgresql is the database: use its credentials when you are queried about your database host, port, etc. rabbitmq is the broker: use its credentials when you are queried for a broker username, password, etc.
Now, run verdi setup --profile <profile_name> (the example below uses the profile name janes_example) and provide the requested information as shown below.
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi setup --profile janes_example
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
Email Address (for sharing data): herriman1@llnl.gov
First name: Jane
Last name: Herriman
Institution: LLNL
Database engine (postgresql_psycopg2) [postgresql_psycopg2]:   # Leave this blank and hit enter to take the default
Database backend (django, sqlalchemy) [django]:                # Leave this blank and hit enter to take the default
Database host: postgresql-janeh.apps.czapps.llnl.gov
Database port [5432]: 32212
Database name: aiida
Database username: janeh
Database password:            # Enter the password from `postgresql.info`
Broker protocol (amqp, amqps) [amqp]: amqps
Broker username [guest]: janeh
Broker password [guest]:      # Enter the password from `rabbitmq.info`
Broker host [127.0.0.1]: rabbitmq-janeh.apps.czapps.llnl.gov
Broker port [5672]: 32200
Broker virtual host name []: aiida
Repository directory [/g/g0/janeh/.aiida/repository/janes_example]: /g/g0/janeh/aiida-core
Success: created new profile `janes_example`.
Info: migrating the database.
Success: database migration completed.
```
To complete setting up the profile, edit the config file. Open it with your favorite text editor
```
vi ~/.aiida/config.json
```
and add a broker_parameters entry after the broker_virtual_host field:
```
"broker_port": 32200,
"broker_virtual_host": "aiida",
"broker_parameters": {
    "no_verify_ssl": "1",
    "cafile": "/etc/pki/tls/cert.pem"
},
```
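If you'd rather script this edit than make it by hand, a minimal Python sketch can insert the entry. This assumes the default config location (~/.aiida/config.json) and the standard layout of that file, with profiles stored under a top-level "profiles" key; check your own file before running it.

```python
import json
from pathlib import Path

# TLS parameters required by LC's rabbitmq service (values from the snippet above).
BROKER_PARAMETERS = {"no_verify_ssl": "1", "cafile": "/etc/pki/tls/cert.pem"}

def add_broker_parameters(config: dict, profile: str) -> dict:
    """Add broker_parameters to one profile of an AiiDA config dict."""
    config["profiles"][profile]["broker_parameters"] = BROKER_PARAMETERS
    return config

if __name__ == "__main__":
    path = Path.home() / ".aiida" / "config.json"
    config = json.loads(path.read_text())
    # Replace "janes_example" with your own profile name.
    updated = add_broker_parameters(config, "janes_example")
    path.write_text(json.dumps(updated, indent=4))
```

Re-open ~/.aiida/config.json afterwards to confirm the entry landed where you expect.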
Testing
Now test some verdi commands to see if setting up AiiDA and a profile has worked.
```
verdi status
verdi profile list
verdi daemon start 1
verdi shell
```
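If you'd like to script these checks, a minimal sketch using only the standard library is below. It assumes verdi is on your PATH, i.e. the virtual environment is active; the function names are illustrative.

```python
import subprocess

def command_succeeds(cmd: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0

if __name__ == "__main__":
    # Sanity-check the basic verdi commands used above.
    for cmd in (["verdi", "status"], ["verdi", "profile", "list"]):
        status = "ok" if command_succeeds(cmd) else "FAILED"
        print(" ".join(cmd), "->", status)
```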
Set up and configure a computer
Set up computer
Create a file computer.yml with the following content:
```yaml
label: flashdebug
hostname: "flash"
description: ""
transport: ssh
scheduler: "slurm"
work_dir: "/usr/workspace/janeh"
mpirun_command: "srun -n {tot_num_mpiprocs}"
mpiprocs_per_machine: "2"
prepend_text: |
    #SBATCH -p pdebug
append_text: " "
shebang: "#!/bin/bash"
```
If you are working on quartz, you might replace "flash" with "quartz" in both the label and hostname above; also set work_dir to your own workspace directory. Then, run verdi computer setup --config computer.yml. You should then get a message of the form
```
Success: Computer<4> flashdebug created
Info: Note: before the computer can be used, it has to be configured with the command:
Info:   verdi computer configure ssh flashdebug
```
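Since only the label, hostname, and workspace path change between LC machines, you might generate computer.yml from a template if you use several systems. A sketch (the field values mirror the example above; the "janeh" username is illustrative):

```python
# Template for a computer.yml-style config targeting an LC debug queue.
# {tot_num_mpiprocs} is escaped so it survives formatting as a literal.
COMPUTER_TEMPLATE = """\
label: {label}
hostname: "{hostname}"
description: ""
transport: ssh
scheduler: "slurm"
work_dir: "/usr/workspace/{username}"
mpirun_command: "srun -n {{tot_num_mpiprocs}}"
mpiprocs_per_machine: "2"
prepend_text: |
    #SBATCH -p pdebug
append_text: " "
shebang: "#!/bin/bash"
"""

def render_computer_yaml(hostname: str, username: str) -> str:
    """Render the YAML text for one machine's debug-queue computer."""
    return COMPUTER_TEMPLATE.format(
        label=f"{hostname}debug", hostname=hostname, username=username
    )

if __name__ == "__main__":
    print(render_computer_yaml("quartz", "janeh"))
```

Write the result to computer.yml and pass it to verdi computer setup --config as above.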
Alternatively, you can set up a computer manually by running verdi computer setup:
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi computer setup
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
Computer label: flashdebug
Hostname: flash
Description []:
Transport plugin: ssh
Scheduler plugin: slurm
Shebang line (first line of each script, starting with #!) [#!/bin/bash]:
Work directory on the computer [/scratch/{username}/aiida/]: /usr/workspace/janeh
Mpirun command [mpirun -np {tot_num_mpiprocs}]: srun -n {tot_num_mpiprocs}
Default number of CPUs per machine: 2
```
After entering the above info for manual setup, a file will open. Feel free to save without adding anything! Here, I choose to add #SBATCH -p pdebug in the “prepend” section so that jobs submitted to this computer will only use the debug queue:
```
#==========================================================================#
#= PREPEND_TEXT: if there is any bash commands that should be prepended to
#= the executable call in all submit scripts for this computer, type that
#= between the equal signs below and save the file.
#==========================================================================#
#SBATCH -p pdebug
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
After closing this file, a second file will open, which I chose to leave blank and save:
```
#==========================================================================#
#= APPEND_TEXT: if there is any bash commands that should be appended to
#= the executable call in all submit scripts for this computer, type that
#= between the equal signs below and save the file.
#==========================================================================#
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
You should then get a message of the form
```
Success: Computer<4> flashdebug created
Info: Note: before the computer can be used, it has to be configured with the command:
Info:   verdi computer configure ssh flashdebug
```
Configure computer
Similarly, you can configure your computer with a script or manually at the command line. To do this via a script, create a file conf.yml for computer flashdebug with the following form:
```yaml
---
username: "janeh"
port: 622
look_for_keys: true
key_filename: "/g/g0/janeh/.ssh/id_rsa"
timeout: 60
allow_agent: true
proxy_command: ""
compress: true
gss_auth: false
gss_kex: false
gss_deleg_creds: false
gss_host: "flash"
load_system_host_keys: true
key_policy: "RejectPolicy"
use_login_shell: true
safe_interval: 10.0
```
but replace the field username with your LC username and key_filename with the path to your id_rsa file, both in quotation marks. After this, run verdi computer configure ssh flashdebug --config conf.yml. To configure the same computer manually at the command line instead, run verdi computer configure ssh flashdebug and then enter information as below (again using your own LC username and SSH key file):
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi computer configure ssh flashdebug
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
User name [janeh]:
Port number [22]: 622
Look for keys [True]:
SSH key file []: /g/g0/janeh/.ssh/id_rsa
Connection timeout in s [60]:
Allow ssh agent [True]:
SSH proxy command []:
Compress file transfers [True]:
GSS auth [False]:
GSS kex [False]:
GSS deleg_creds [False]:
GSS host [flash]:
Load system host keys [True]:
Key policy (RejectPolicy, WarningPolicy, AutoAddPolicy) [RejectPolicy]:
Use login shell when executing command [True]:
Connection cooldown time (s) [30.0]: 10.0
Info: Configuring computer flashdebug for user herriman1@llnl.gov.
Success: flashdebug successfully configured for herriman1@llnl.gov
```
Test computer
To make sure this computer is correctly set up and reachable, execute verdi computer test flashdebug and check that all tests pass:
```
(aiidavenv) janeh@flash21:~/.ssh$ verdi computer test flashdebug
Info: Testing computer<flashdebug> for user<herriman1@llnl.gov>...
* Opening connection... [OK]
* Checking for spurious output... [OK]
* Getting number of jobs from scheduler... [OK]: 5 jobs found in the queue
* Determining remote user name... [OK]: janeh
* Creating and deleting temporary file... [OK]
Success: all 5 tests succeeded
```
Configure Quantum Espresso
Install plugins for Quantum Espresso
```
verdi plugin list                      # to see options, including aiida.calculations
verdi plugin list aiida.calculations   # shows the plugins currently registered
pip install aiida-quantumespresso      # use pip to install a new plugin
reentry scan                           # verdi plugin list won't register the newly installed plugin until after this scan
verdi plugin list aiida.calculations   # now this shows a long list of Quantum Espresso related plugins, including the one we'll use next: quantumespresso.pw
```
Set up Quantum Espresso code
To set up Quantum Espresso with a script, create a file code.yml with the following content,
```yaml
label: pw
description: "quantum_espresso"
input_plugin: "quantumespresso.pw"
on_computer: true
remote_abs_path: "/g/g0/janeh/pw.x"
computer: flashdebug
prepend_text: |
    module purge
    module load mkl impi intel
append_text: " "
```
replacing the field remote_abs_path with the path to your pw.x executable. After running verdi code setup --config code.yml you should see the message
```
Success: Code<4> pw@flashdebug created
```
If you'd prefer manual configuration, you can simply run verdi code setup and then provide information for the following prompts, as below:
```
(aiidavenv) janeh@flash21:~$ verdi code setup
Info: enter "?" for help
Info: enter "!" to ignore the default and set no value
Label: pw
Description []:
Default calculation input plugin: quantumespresso.pw
Installed on target computer? [True]:
Computer: flashdebug
Remote absolute path: /g/g0/janeh/pw.x
```
After entering the above, a text file will open. Add the following module lines to the “prepend text” section:
```
#==========================================================================#
#= PREPEND_TEXT: if there is any bash commands that should be prepended to
#= the executable call in all submit scripts for this code, type that
#= between the equal signs below and save the file.
#==========================================================================#
module purge
module load intel mkl impi
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
After saving and closing this file, another file will open, prompting you to append text. Leave this file blank and save and close it:
```
#==========================================================================#
#= APPEND_TEXT: if there is any bash commands that should be appended to
#= the executable call in all submit scripts for this code, type that
#= between the equal signs below and save the file.
#==========================================================================#
#==========================================================================#
#= All lines that start with `#=` will be ignored.
#==========================================================================#
```
and then you’ll get a message like:
```
Success: Code<4> pw@flashdebug created
```
Upload pseudopotentials
First grab a tarball of the pseudopotential files
```
wget https://legacy-archive.materialscloud.org/file/2018.0001/v4/SSSP_1.1_PBE_efficiency.tar.gz
```
then extract these files to a newly created directory
```
mkdir aiida-core/sssp
tar -zxvf SSSP_1.1_PBE_efficiency.tar.gz -C aiida-core/sssp   # extract tar file
```
and upload those files so that AiiDA can find them
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi data upf uploadfamily sssp/ sssp "sssp pseudos"
Success: UPF files found: 85. New files uploaded: 85
```
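If the count AiiDA reports looks off, you can cross-check it against the extracted files locally. A quick sketch (the directory path follows the steps above; the function name is illustrative):

```python
from pathlib import Path

def count_upf(names):
    """Count filenames with a .upf suffix (case-insensitive)."""
    return sum(1 for n in names if n.lower().endswith(".upf"))

if __name__ == "__main__":
    # List whatever was extracted from the SSSP tarball and count UPF files.
    names = [p.name for p in Path("aiida-core/sssp").iterdir()]
    print(f"{count_upf(names)} UPF files found locally")
```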
Run a Quantum Espresso calculation
How to run a test job
You should now be able to run a test calculation via
```
aiida-quantumespresso calculation launch pw -X pw@flashdebug -p sssp -i -d
```
If you run verdi process list -a, the most recently run calculation will show up at the bottom of the list, with its identifying “pk” number in the leftmost column:
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi process list -a
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
  93  36s ago    PwCalculation    ⏹ Created

Total results: 1

Info: last time an entry changed state: 36s ago (at 00:12:00 on 2020-11-19)
```
You can then take that number and run verdi process show <pk> to learn more about the process.
```
(aiidavenv) janeh@flash21:~/aiida-core$ verdi process show 93
Property     Value
-----------  ------------------------------------
type         PwCalculation
state        Created
pk           93
uuid         8fa69585-2f7e-4add-990e-963a61d33d9f
label
description
ctime        2020-11-19 00:12:00.303898+00:00
mtime        2020-11-19 00:12:00.599236+00:00
computer     [4] flashdebug

Inputs        PK  Type
----------  ----  -------------
pseudos
    Si        71  UpfData
code           4  Code
kpoints       91  KpointsData
parameters    92  Dict
structure     90  StructureData
```
Syntactic sugar for running a job
The pk values obtained from verdi process show... can be used to write a launch.py script containing info such as
```python
# Note: `verdi run` injects names like load_code and CalculationFactory
# automatically, but importing them explicitly keeps the script portable.
from aiida.orm import load_code, load_node
from aiida.engine import submit
from aiida.plugins import CalculationFactory

PwCalculation = CalculationFactory('quantumespresso.pw')

inputs = {
    'code': load_code('pw@quartz'),
    'structure': load_node(158),
    'pseudos': {
        'Si': load_node(111),
    },
    'parameters': load_node(253),
    'kpoints': load_node(252),
    'metadata': {
        'options': {
            'resources': {
                'num_machines': 1,
            },
            'max_wallclock_seconds': 500,
            'withmpi': True,
        }
    }
}

submit(PwCalculation, **inputs)
```
which can then be run via
```
verdi run launch.py
```
instead of aiida-quantumespresso calculation launch pw -X pwdebug@quartzdebug -p sssp -i -d. You can again find the pk of this process by running verdi process list -a.
Troubleshooting a job
Once you have the pk number of a job, you can learn more about the calculation and its outputs via
```
verdi process show <pk>
verdi process report <pk>
verdi calcjob gotocomputer <pk>
```
The last option takes you to a directory where you'll see the outputs from the job and the script actually submitted to Slurm to run the desired calculations. You can also dig into the details of various inputs to the calculation via, for example, verdi data dict show <pk>, using the pk of the parameters Dict reported by verdi process show <pk>.
If necessary, you can kill a process with verdi process kill <pk>.
Troubleshooting
QE version errors
Error messages
```
Both the stdout and XML output files could not be read or parsed.
parser returned exit code<305>: Both the stdout and XML output files could not be read or parsed.
```
Potential fix: You may be using a version of Quantum Espresso that isn't supported by your version of AiiDA. For example, the messages below were generated by running QE version 6.6 before it was supported. Switching to a different version of Quantum Espresso may resolve these error messages.
More details
```
+-> WARNING at 2020-08-24 16:44:52.714821+00:00
 | key 'symmetries' is not present in raw output dictionary
+-> ERROR at 2020-08-24 16:44:52.770446+00:00
 | ERROR_OUTPUT_STDOUT_INCOMPLETE
+-> ERROR at 2020-08-24 16:44:52.774858+00:00
 | Both the stdout and XML output files could not be read or parsed.
+-> ERROR at 2020-08-24 16:44:52.777151+00:00
 | parser returned exit code<305>: Both the stdout and XML output files could not be read or parsed.
```
Daemon restart needed
Error messages
```
aiida.common.exceptions.MissingEntryPointError: Entry point 'quantumespresso.pw' not found in group 'aiida.parsers'. Try running `reentry scan` to update the entry point cache.
```
Potential fix: Restart the daemon (assuming reentry scan does not fix the problem).
More details
Below is an example of full output from verdi process report <pk>
```
$ verdi process report 85
*** 85: None
*** (empty scheduler output file)
*** Scheduler errors:
The following modules were not unloaded:
  (Use "module --force purge" to unload all):
  1) intel/19.0.4   2) mvapich2/2.3   3) texlive/2016   4) StdEnv

Lmod is automatically replacing "mvapich2/2.3" with "impi/2019.8".

*** 1 LOG MESSAGES:
+-> REPORT at 2020-09-09 21:37:14.776260+00:00
 | [85|PwCalculation|on_except]: Traceback (most recent call last):
 |   File "/g/g12/keilbart/aiida_core/aiidavenv/lib/python3.6/site-packages/plumpy/process_states.py", line 220, in execute
 |     result = self.run_fn(*self.args, **self.kwargs)
 |   File "/g/g12/keilbart/aiida_core/aiida/engine/processes/calcjobs/calcjob.py", line 262, in parse
 |     exit_code = execmanager.parse_results(self, retrieved_temporary_folder)
 |   File "/g/g12/keilbart/aiida_core/aiida/engine/daemon/execmanager.py", line 412, in parse_results
 |     parser_class = process.node.get_parser_class()
 |   File "/g/g12/keilbart/aiida_core/aiida/orm/nodes/process/calculation/calcjob.py", line 489, in get_parser_class
 |     return ParserFactory(parser_name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/factories.py", line 158, in ParserFactory
 |     entry_point = BaseFactory(entry_point_group, entry_point_name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/factories.py", line 46, in BaseFactory
 |     return load_entry_point(group, name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/entry_point.py", line 202, in load_entry_point
 |     entry_point = get_entry_point(group, name)
 |   File "/g/g12/keilbart/aiida_core/aiida/plugins/entry_point.py", line 264, in get_entry_point
 |     'the entry point cache.'.format(name, group)
 | aiida.common.exceptions.MissingEntryPointError: Entry point 'quantumespresso.pw' not found in group 'aiida.parsers'. Try running `reentry scan` to update the entry point cache.
```
Setting up a code or computer
Error messages
```
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
```
Potential fix: Try adding the code or computer again. This error message appeared intermittently for me but would then go away on a new attempt, without any change on my part.
Jobs stuck in "Created" state
Error messages: This issue doesn't come with explicit error messages. Instead, you will see via verdi process list -a that your job remains in the "Created" state indefinitely:
```
(aiidavenv) janeh@flash21:~$ verdi process list
  PK  Created    Process label             Process State    Process status
----  ---------  ------------------------  ---------------  ----------------------------------
  93  54D ago    PwCalculation             ⏹ Created
```
In this case, verdi process report 93 will not show any error messages in the log. If you try verdi calcjob gotocomputer 93, however, you'll see this will not work:
```
(aiidavenv) janeh@flash21:~$ verdi calcjob gotocomputer 93
Critical: no remote work directory for this calcjob, maybe the daemon did not submit it yet
```
Potential fix: It's possible that there is something wrong with your rabbitmq configuration. If running verdi status shows that you are connected to rabbitmq, try restarting the daemon and resubmitting your calculation.
If newly submitted jobs continue to enter and persist in the "Created" state, there is probably another verdi process somewhere interfering with the verdi daemon you're using to submit this calculation. Start by checking for stale verdi processes on your current login node by running ps -u <lc username>. This should show only two entries labeled verdi. If there are more, kill all verdi processes and then restart the daemon (or simply kill the stale processes, if you can distinguish them from the active verdi processes). You will also want to use ps -u <lc username> to identify (and subsequently kill) stale verdi processes on any system that connects to the rabbitmq instance you are using. For example, if you have used only one instance of rabbitmq and have run AiiDA on quartz and on flash, you will want to kill stale verdi processes on all quartz and flash login nodes. Reach out to the LC Hotline if you need help with this.
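To automate that check, a small sketch that counts verdi entries in ps output is below; run it on each login node you have used. It assumes the usual column layout of ps -u output, and the function name is illustrative.

```python
import getpass
import subprocess

def count_verdi(ps_output: str) -> int:
    """Count lines of `ps` output whose command field mentions verdi."""
    return sum(1 for line in ps_output.splitlines() if "verdi" in line)

if __name__ == "__main__":
    # List this user's processes on the current login node.
    result = subprocess.run(
        ["ps", "-u", getpass.getuser()], capture_output=True, text=True
    )
    n = count_verdi(result.stdout)
    print(f"{n} verdi process(es) on this login node; expect 2 when only the daemon is running")
```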
Jobs stuck in “Waiting for transport task: upload” state
Error messages: Running verdi process report <pk> gives an error of the form FileNotFoundError: [Errno 2] No such file or directory: ''.
Potential fix: Try creating and then using a new pseudopotential family: download and extract the SSSP tarball as described above and add the family with a new label. If this does not solve the issue, contact LC WEG via the hotline. We are still searching for a more elegant solution to this problem but have workarounds that should help.