FireWorks is a free, open-source code for defining, managing, and executing scientific workflows. It has been designed to coordinate scientific analysis on large systems like Livermore Computing (LC) HPC.
Strengths of FireWorks
- Well suited to LC; we have many FireWorks users
- Easy to install via conda
- Server/worker model is inherently load-balancing
- Extremely flexible and able to handle complex job structures
Configuration of FireWorks
- Requires a MongoDB database (See these docs for more details.)
- FireWorks requires some time to learn and understand (more powerful tools are more complex)
Overview of FireWorks
FireWorks uses a centralized server model where the server manages the workflows and workers run the jobs.
To use FireWorks you must request a MongoDB database. Please see our docs on LaunchIT for more details.
Below is an example of how to use FireWorks at LC. It is based heavily on the FireWorks tutorial.
FireWorks is the primary workflow engine for the Materials Project, but the tool is general, well documented, and suitable for a wide variety of scientific applications. For more information on FireWorks visit: https://materialsproject.github.io/fireworks/index.html
FireWorks uses a number of terms to describe the different parts of the workflow manager.
- FireServer: The MongoDB database that controls the workflow, also referred to as the LaunchPad. It contains all the tasks to be run and records their state, such as whether they have run successfully.
- FireTask: A Firetask is an atomic computing job. It can call a single shell script or execute a single Python function that you define (either within FireWorks, or in an external package).
- FireWork: A Firework contains the JSON spec that includes all the information needed to bootstrap your job. For example, the spec contains an array of Firetasks to execute in sequence. The spec also includes any input parameters to pass to your Firetasks. You can easily perform the same function over different input data by creating Fireworks with identical Firetasks but different input parameters in the spec. You can design your spec however you’d like, as long as it’s valid JSON. The JSON format used for Firework specs is extremely flexible, very easy to learn (Python users familiar with dicts and arrays essentially already know JSON), and immediately makes rich searches over the input data available to end users through MongoDB’s JSON document search capabilities.
- Workflows: A Workflow is a set of FireWorks with dependencies between them. For example, you might need a parent Firework to finish and generate some output files before running two child FireWorks.
- Rocket: A Rocket fetches a FireWork from the LaunchPad and runs it. It can be run on a separate machine (FireWorker) or through a batch system.
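As a rough illustration of how these terms fit together, the sketch below models a three-Firework workflow in plain Python, without importing FireWorks or contacting a LaunchPad. The `_tasks` / `_fw_name` layout follows the spec format used in the FireWorks tutorial; the helper `make_firework`, the `links` mapping, and the `temperature` parameter are hypothetical names chosen for this example. The real library resolves dependencies itself, so treat this only as a mental model.

```python
import json
from graphlib import TopologicalSorter


def make_firework(fw_id, script, **params):
    """Return a dict that mimics a Firework: an id plus a JSON-serializable spec."""
    spec = {
        # A single Firetask that runs one shell command.
        "_tasks": [{"_fw_name": "ScriptTask", "script": script}],
    }
    spec.update(params)  # input parameters sit next to the tasks in the spec
    return {"fw_id": fw_id, "spec": spec}


# Identical Firetasks in the two children, but different input parameters.
parent = make_firework(1, "echo 'generate inputs'")
child_a = make_firework(2, "echo 'analyze'", temperature=300)
child_b = make_firework(3, "echo 'analyze'", temperature=600)
fireworks = {fw["fw_id"]: fw for fw in (parent, child_a, child_b)}

# Workflow: the parent must finish before either child may run.
links = {1: [2, 3]}

# A spec has to be valid JSON, so it must survive a round trip unchanged.
for fw in fireworks.values():
    assert json.loads(json.dumps(fw["spec"])) == fw["spec"]

# TopologicalSorter expects node -> predecessors, so invert parent -> children.
predecessors = {fw_id: set() for fw_id in fireworks}
for parent_id, children in links.items():
    for child_id in children:
        predecessors[child_id].add(parent_id)

# An order in which Rockets could run the Fireworks; the parent comes first.
run_order = list(TopologicalSorter(predecessors).static_order())
print(run_order)
```

In this toy model the two children have no dependency on each other, which is what allows separate workers to pick them up concurrently once the parent has completed.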
The configuration is accessed through the FW_CONFIG_FILE environment variable. Since LC systems in the same zone (e.g. quartz or ruby in the CZ) share the same home directory but may have different compiled binaries, we should create different configuration directories for each system.
On quartz, for example, a configuration directory might be created at $HOME/.atomate/quartz. For user shen9, this is /g/g20/shen9/.atomate/quartz. Within this directory, two YAML files should be created: a "launchpad" file with your authentication credentials from LaunchIT (my_launchpad.yaml) and a FireWorks configuration file (FW_config.yaml).
The credentials you obtained from LaunchIT can be placed into my_launchpad.yaml with the following format:
    host: [HOSTNAME]
    port: [PORT]
    name: [DBNAME]
    username: [USERNAME]
    password: [PASSWORD]
    strm_lvl: INFO
    mongoclient_kwargs:
      ssl: true
      tlsAllowInvalidCertificates: true
FW_config.yaml references my_launchpad.yaml with the following contents for user shen9:
    ECHO_TEST: "ATOMATE ON QUARTZ"
    CONFIG_FILE_DIR: /g/g20/shen9/.atomate/quartz
    LAUNCHPAD_LOC: /g/g20/shen9/.atomate/quartz/my_launchpad.yaml
and the following general format:
    ECHO_TEST: "Some string to test configuration"
    CONFIG_FILE_DIR: [FireWorks config directory for LC system]
    LAUNCHPAD_LOC: [Path to launchpad yaml with LaunchIT credentials]
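Because each LC system needs its own configuration directory, it can be convenient to generate the two files from a small script. The following is a minimal sketch using only the Python standard library. The function name `write_fireworks_config` and its arguments are hypothetical, and the keys simply mirror the formats shown in this section. It uses plain string templating, so a password containing YAML special characters would need quoting by hand.

```python
from pathlib import Path

# Mirrors the my_launchpad.yaml format described in this section.
LAUNCHPAD_TEMPLATE = """\
host: {host}
port: {port}
name: {name}
username: {username}
password: {password}
strm_lvl: INFO
mongoclient_kwargs:
  ssl: true
  tlsAllowInvalidCertificates: true
"""

# Mirrors the FW_config.yaml format described in this section.
FW_CONFIG_TEMPLATE = """\
ECHO_TEST: "{echo}"
CONFIG_FILE_DIR: {config_dir}
LAUNCHPAD_LOC: {launchpad}
"""


def write_fireworks_config(base_dir, system, credentials):
    """Create <base_dir>/<system>/ with both YAML files; return the FW_config.yaml path.

    `credentials` is a dict with the keys host, port, name, username, password.
    """
    config_dir = Path(base_dir) / system
    config_dir.mkdir(parents=True, exist_ok=True)

    launchpad = config_dir / "my_launchpad.yaml"
    launchpad.write_text(LAUNCHPAD_TEMPLATE.format(**credentials))
    launchpad.chmod(0o600)  # the file holds a password, so keep it private

    fw_config = config_dir / "FW_config.yaml"
    fw_config.write_text(
        FW_CONFIG_TEMPLATE.format(
            echo=f"ATOMATE ON {system.upper()}",
            config_dir=config_dir,
            launchpad=launchpad,
        )
    )
    return fw_config
```

Under these assumptions, calling `write_fireworks_config(Path.home() / ".atomate", "quartz", credentials)` with your LaunchIT credentials would produce the quartz layout described above, and calling it again with `"ruby"` would create a sibling directory for that system.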
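Once you've created a configuration directory and the two YAML files described above, set the FW_CONFIG_FILE environment variable to the path of your FireWorks configuration file. For example, in a bash shell for user shen9 on quartz:
    export FW_CONFIG_FILE=/g/g20/shen9/.atomate/quartz/FW_config.yaml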
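Since systems in the same zone share a home directory but need different configuration directories, one option is to pick the configuration file from the hostname of the node you are on. The sketch below assumes that node names are the system name followed by a node number (as in quartz2300) and that the configuration directories live under $HOME/.atomate; both are assumptions for illustration, and `system_name` and `fw_config_path` are hypothetical helpers.

```python
import os
import re
import socket
from pathlib import Path


def system_name(hostname):
    """Drop any domain and the trailing node number, e.g. 'quartz2300' -> 'quartz'."""
    short = hostname.split(".")[0]
    return re.sub(r"\d+$", "", short)


def fw_config_path(hostname, base_dir):
    """Return the per-system FW_config.yaml path under base_dir."""
    return Path(base_dir) / system_name(hostname) / "FW_config.yaml"


config_file = fw_config_path(socket.gethostname(), Path.home() / ".atomate")

# This only affects the current Python process and any child processes it
# starts (for example lpad run through subprocess), not the parent shell.
os.environ["FW_CONFIG_FILE"] = str(config_file)
```

Setting the variable this way would need to happen before FireWorks is imported or before a FireWorks command is launched from the script, since the variable is read when the configuration is loaded.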
Note: Once the FW_CONFIG_FILE variable is set, the string you assigned to ECHO_TEST will print to stdout whenever the FireWorks command-line tool lpad is called:
    (mp) [shen9@quartz2300:quartz]$ lpad report
    ATOMATE ON QUARTZ
Initializing the database
The database is initialized with the lpad reset command. Be extremely careful when running this command! It will wipe all existing entries in the fireworks, workflows, and launches collections of your FireWorks database.
You should see the output of the echo test followed by a prompt asking you to confirm that the database should be deleted and reset.
Now on LaunchIT
Users can now provision a FireWorks instance with a MongoDB backend on LaunchIT. For further information please visit our documentation on LaunchIT.