FireWorks is a free, open-source code for defining, managing, and executing scientific workflows. It is designed to coordinate scientific analyses on large HPC systems such as those at Livermore Computing (LC).
Strengths of FireWorks
- Well-suited to LC: FireWorks already has many users on LC systems
- Easy to install via conda
- Server/worker model is inherently load-balancing
- Extremely flexible and able to handle complex job structures
Configuration of FireWorks
- Requires a MongoDB database (See these docs for more details.)
- Takes some time to learn and understand (more powerful tools tend to be more complex)
Overview of FireWorks
FireWorks uses a centralized server model where the server manages the workflows and workers run the jobs.
To use FireWorks, you must request a MongoDB database. Please see our docs on LaunchIT for more details.
Below is an example of how to use FireWorks at LC. It is based heavily on the FireWorks tutorial, which you will find here.
FireWorks is the primary workflow engine for the Materials Project, but it is general-purpose, well documented, and suitable for a wide variety of scientific applications. For more information on FireWorks, visit: https://materialsproject.github.io/fireworks/index.html
FireWorks uses a number of terms to describe the different parts of the workflow manager.
- FireServer: the MongoDB database that controls the workflow, also referred to as the LaunchPad. It stores all the tasks to be run and records their status, such as whether they ran successfully.
- FireTask: A Firetask is an atomic computing job. It can call a single shell script or execute a single Python function that you define (either within FireWorks, or in an external package).
- FireWork: A Firework contains the JSON spec that includes all the information needed to bootstrap your job. For example, the spec contains an array of Firetasks to execute in sequence, as well as any input parameters to pass to those Firetasks. You can easily perform the same function over different input data by creating Fireworks with identical Firetasks but different input parameters in the spec. You can design your spec however you like, as long as it is valid JSON. The JSON format used for Firework specs is flexible and easy to learn (Python users familiar with dicts and lists essentially already know JSON), and it makes the input data immediately searchable through MongoDB's JSON document queries.
- Workflow: A Workflow is a set of Fireworks with dependencies between them. For example, you might need a parent Firework to finish and generate some output files before running two child Fireworks.
- Rocket: A Rocket fetches a Firework from the LaunchPad and runs it. It can be run on a separate machine (FireWorker) or through a batch system.
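To see how these pieces fit together, here is a minimal sketch based on the FireWorks quickstart, assuming FireWorks is installed and FW_CONFIG_FILE already points at a working LaunchPad. `lpad add_scripts` wraps each shell command in its own Firework and chains them into a linear Workflow, and `rlaunch` launches Rockets that pull ready Fireworks from the LaunchPad. The function names are hypothetical wrappers, not part of FireWorks.

```shell
# Hypothetical wrappers around the FireWorks CLI; assumes FireWorks is installed
# and FW_CONFIG_FILE points at a LaunchPad you are allowed to write to.

# Add a two-step linear Workflow named "test_workflow": each quoted command becomes
# one Firework (named by -n), and "goodbye" runs only after "hello" completes.
add_hello_workflow() {
  lpad add_scripts 'echo "hello"' 'echo "goodbye"' -n hello goodbye -w test_workflow
}

# Launch a single Rocket on this machine: it runs one ready Firework and exits.
run_one_rocket() {
  rlaunch singleshot
}

# Keep launching Rockets until no ready Fireworks remain.
run_all_rockets() {
  rlaunch rapidfire
}
```

For example, calling add_hello_workflow followed by run_all_rockets should run both steps in order; `lpad get_wflows` then shows the state of the Workflow.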
FireWorks finds its configuration file through the FW_CONFIG_FILE environment variable. Because LC systems in the same zone (e.g., ruby and the other systems in the CZ) share the same home directory but may have different compiled binaries, you should create a separate configuration directory for each system. On quartz, for example, a configuration directory might be created at $HOME/.atomate/quartz; for user shen9, this is /g/g20/shen9/.atomate/quartz. Within this directory, create two yaml files: a "launchpad" file with your authentication credentials from LaunchIT (my_launchpad.yaml) and a FireWorks configuration file.
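Creating the per-system directory can be scripted. The sketch below is a hypothetical helper (make_fw_config_dir is not part of FireWorks) that takes the system name as an argument or, if it is omitted, derives it from the node name by keeping everything before the first digit (e.g. quartz2300 becomes quartz). That node-naming pattern is an assumption. The helper also makes the directory private to your user, since it will hold database credentials.

```shell
# Hypothetical helper: create $HOME/.atomate/<system> for FireWorks config files.
# Usage: make_fw_config_dir [system]
# Without an argument, the system name is the node name up to its first digit.
make_fw_config_dir() {
  local node="${HOSTNAME:-$(uname -n)}"
  local system="${1:-${node%%[0-9]*}}"
  if [ -z "$system" ]; then
    echo "make_fw_config_dir: could not determine the system name; pass it explicitly" >&2
    return 1
  fi
  local dir="$HOME/.atomate/$system"
  mkdir -p "$dir" || return 1
  # The directory will hold LaunchIT credentials, so keep it private to your user.
  chmod 700 "$dir"
  echo "$dir"
}
```

For example, `make_fw_config_dir quartz` creates and prints $HOME/.atomate/quartz.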
Place the credentials you obtained from LaunchIT into my_launchpad.yaml using the following format:
host: [HOSTNAME]
port: [PORT]
name: [DBNAME]
username: [USERNAME]
password: [PASSWORD]
strm_lvl: INFO
mongoclient_kwargs:
  ssl: true
  tlsAllowInvalidCertificates: true
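A hypothetical helper (write_launchpad_yaml, not part of FireWorks) that writes this file from your LaunchIT values might look like the sketch below. It reads the password from an FW_DB_PASSWORD environment variable (a name chosen here for illustration) or prompts for it silently, so the password does not end up in your shell history, and it creates the file readable only by you. Values containing YAML special characters may need quoting.

```shell
# Hypothetical helper: write my_launchpad.yaml in the given config directory.
# Usage: write_launchpad_yaml DIR HOST PORT DBNAME USERNAME
# The password comes from $FW_DB_PASSWORD if set, otherwise from a silent prompt.
write_launchpad_yaml() {
  if [ "$#" -ne 5 ]; then
    echo "usage: write_launchpad_yaml DIR HOST PORT DBNAME USERNAME" >&2
    return 1
  fi
  local dir="$1" host="$2" port="$3" dbname="$4" user="$5"
  local pw="${FW_DB_PASSWORD:-}"
  if [ -z "$pw" ]; then
    read -rsp "LaunchIT password: " pw
    echo >&2
  fi
  local f="$dir/my_launchpad.yaml"
  # umask 077 in a subshell makes a newly created file readable by you only.
  ( umask 077 && cat > "$f" ) <<EOF
host: $host
port: $port
name: $dbname
username: $user
password: $pw
strm_lvl: INFO
mongoclient_kwargs:
  ssl: true
  tlsAllowInvalidCertificates: true
EOF
  # Also tighten permissions in case the file already existed.
  chmod 600 "$f"
}
```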
The FireWorks configuration file for user shen9 on quartz has the following contents:
ECHO_TEST: "ATOMATE ON QUARTZ"
CONFIG_FILE_DIR: /g/g20/shen9/.atomate/quartz
LAUNCHPAD_LOC: /g/g20/shen9/.atomate/quartz/my_launchpad.yaml
In general, the configuration file has the following format:
ECHO_TEST: "Some string to test configuration"
CONFIG_FILE_DIR: [FireWorks config directory for LC system]
LAUNCHPAD_LOC: [Path to launchpad yaml with LaunchIT credentials]
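The configuration file can be generated the same way. This sketch assumes the file is named FW_config.yaml, the default name FireWorks looks for (substitute your own name if it differs), and write_fw_config is a hypothetical helper. ECHO_TEST is set to the upper-cased system name, mirroring the quartz example.

```shell
# Hypothetical helper: write a FireWorks config file pointing at my_launchpad.yaml.
# Usage: write_fw_config DIR SYSTEM
# Assumes the config file name FW_config.yaml; change it if you use another name.
write_fw_config() {
  if [ "$#" -ne 2 ]; then
    echo "usage: write_fw_config DIR SYSTEM" >&2
    return 1
  fi
  local dir="$1" system="$2" label
  # Upper-case the system name for the ECHO_TEST string, e.g. quartz -> QUARTZ.
  label="$(printf '%s' "$system" | tr '[:lower:]' '[:upper:]')"
  cat > "$dir/FW_config.yaml" <<EOF
ECHO_TEST: "ATOMATE ON $label"
CONFIG_FILE_DIR: $dir
LAUNCHPAD_LOC: $dir/my_launchpad.yaml
EOF
}
```

For example, `write_fw_config /g/g20/shen9/.atomate/quartz quartz` would reproduce the three lines shown for user shen9.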
Once you've created a configuration directory and the two yaml files described above, set the FW_CONFIG_FILE environment variable to the path of your FireWorks configuration file.
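For example, lines like the following in your ~/.bashrc could select the right file on each system. This is a sketch assuming bash, the $HOME/.atomate/<system> layout described above, the file name FW_config.yaml, and node names made of the system name followed by digits (such as quartz2300).

```shell
# Pick the config directory for the current LC system, e.g. quartz2300 -> quartz.
# Assumes node names are the system name followed by digits and the file name FW_config.yaml.
fw_node="${HOSTNAME:-$(uname -n)}"
fw_system="${fw_node%%[0-9]*}"
export FW_CONFIG_FILE="$HOME/.atomate/$fw_system/FW_config.yaml"
unset fw_node fw_system
```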
Note: Once the FW_CONFIG_FILE variable is set, the string you assigned to ECHO_TEST will print to stdout whenever the FireWorks command-line tool lpad is called:
(mp) [shen9@quartz2300:quartz]$ lpad report
ATOMATE ON QUARTZ
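A quick sanity check before using lpad for real might look like this hypothetical helper (check_fw_setup is not part of FireWorks). It fails early if FW_CONFIG_FILE is unset or points at an unreadable file, prints the ECHO_TEST and LAUNCHPAD_LOC lines so you know what to expect, and then runs `lpad report`, which should echo the same ECHO_TEST string.

```shell
# Hypothetical helper: confirm FW_CONFIG_FILE is usable, then run `lpad report`.
check_fw_setup() {
  if [ -z "${FW_CONFIG_FILE:-}" ]; then
    echo "FW_CONFIG_FILE is not set" >&2
    return 1
  fi
  if [ ! -r "$FW_CONFIG_FILE" ]; then
    echo "Cannot read $FW_CONFIG_FILE" >&2
    return 1
  fi
  # Show what lpad should echo and which launchpad file it will use.
  grep -E '^(ECHO_TEST|LAUNCHPAD_LOC):' "$FW_CONFIG_FILE"
  lpad report
}
```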
Initializing the database
Be extremely careful when running this command! It will wipe all existing entries in your FireWorks database in the fireworks, workflows, and launches collections.
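The usual FireWorks command for initializing a LaunchPad is `lpad reset`. Assuming that is the command meant here, a cautious, hypothetical wrapper (reset_launchpad is not part of FireWorks) can show which configuration is active and require you to type RESET before calling it. This helps avoid wiping the wrong database when you switch between systems or configuration files.

```shell
# Hypothetical safety wrapper around `lpad reset`, which deletes all fireworks,
# workflows, and launches in the LaunchPad selected by FW_CONFIG_FILE.
reset_launchpad() {
  if [ -z "${FW_CONFIG_FILE:-}" ] || [ ! -r "$FW_CONFIG_FILE" ]; then
    echo "Set FW_CONFIG_FILE to a readable FireWorks config file first" >&2
    return 1
  fi
  # Show which configuration (and therefore which database) is about to be wiped.
  echo "Config file: $FW_CONFIG_FILE" >&2
  grep -E '^(ECHO_TEST|LAUNCHPAD_LOC):' "$FW_CONFIG_FILE" >&2
  local answer
  read -rp "Type RESET to wipe this LaunchPad: " answer
  if [ "$answer" != "RESET" ]; then
    echo "Aborted; nothing was deleted." >&2
    return 1
  fi
  lpad reset
}
```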