SCONTROL(1)                     Slurm components                    SCONTROL(1)

NAME
       scontrol - Used to view and modify Slurm configuration and state.

SYNOPSIS
       scontrol [OPTIONS...] [COMMAND...]

DESCRIPTION
       scontrol is used to view or modify Slurm configuration including:
       job, job step, node, partition, and overall system configuration.
       Most of the commands can only be executed by user root. If an
       attempt to view or modify configuration information is made by an
       unauthorized user, an error message will be printed and the
       requested action will not occur. If no command is entered on the
       execute line, scontrol will operate in an interactive mode and
       prompt for input. It will continue prompting for input and
       executing commands until explicitly terminated. If a command is
       entered on the execute line, scontrol will execute that command and
       terminate. All commands and options are case-insensitive, although
       node names and partition names are case-sensitive (node names "LX"
       and "lx" are distinct). Commands can be abbreviated to the extent
       that the specification is unique.

OPTIONS
       -a, --all
              When the show command is used, display all partitions, their
              jobs and job steps. This causes information to be displayed
              about partitions that are configured as hidden and
              partitions that are unavailable to the user's group.

       -h, --help
              Print a help message describing the usage of scontrol.

       --hide
              Do not display information about hidden partitions, their
              jobs and job steps. By default, neither partitions that are
              configured as hidden nor partitions unavailable to the
              user's group will be displayed (i.e. this is the default
              behavior).

       -o, --oneliner
              Print information one line per record.

       -q, --quiet
              Print no warning or informational messages, only fatal error
              messages.

       -v, --verbose
              Print detailed event logging. This includes time-stamps on
              data structures, record counts, etc.

       -V, --version
              Print version information and exit.

COMMANDS
       all    Show all partitions, their jobs and job steps.
              This causes information to be displayed about partitions
              that are configured as hidden and partitions that are
              unavailable to the user's group.

       abort  Instruct the Slurm controller to terminate immediately and
              generate a core file.

       checkpoint CKPT_OP ID
              Perform a checkpoint activity on the job step(s) with the
              specified identification. CKPT_OP may be disable (disable
              future checkpoints), enable (enable future checkpoints),
              able (test if presently not disabled, report start time if
              checkpoint in progress), create (create a checkpoint and
              continue the job step), vacate (create a checkpoint and
              terminate the job step), error (report the result of the
              last checkpoint request, error code and message), or restart
              (restart execution of the previously checkpointed job
              steps). ID can be used to identify a specific job (e.g.
              "job_id", which applies to all of its existing steps) or a
              specific job step (e.g. "job_id.step_id").

       completing
              Display all jobs in a COMPLETING state along with associated
              nodes in either a COMPLETING or DOWN state.

       delete SPECIFICATION
              Delete the entry with the specified SPECIFICATION. The only
              supported SPECIFICATION presently is of the form
              PartitionName=.

       exit   Terminate the execution of scontrol.

       help   Display a description of scontrol options and commands.

       hide   Do not display partition, job or job step information for
              partitions that are configured as hidden or partitions that
              are unavailable to the user's group. This is the default
              behavior.

       oneliner
              Print information one line per record.

       pidinfo PROC_ID
              Print the Slurm job id and scheduled termination time
              corresponding to the supplied process id, PROC_ID, on the
              current node. This will only work for processes which Slurm
              spawns and their descendants.

       ping   Ping the primary and secondary slurmctld daemons and report
              whether they are responding.

       quiet  Print no warning or informational messages, only fatal error
              messages.

       quit   Terminate the execution of scontrol.
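       Several commands and specifications in this page (show, and the
       ReqNodeList=, NodeName= and Nodes= specifications below) accept
       simple node range expressions such as "lx[10-20]". The following
       shell sketch is illustrative only and not part of scontrol: it
       shows what such an expression denotes, handling only the single
       "prefix[lo-hi]" form, whereas Slurm's own parser also accepts
       zero-padded and comma-separated ranges such as "lx[0031-0040]".

```shell
# Illustrative sketch: expand a node range expression of the simple
# form "prefix[lo-hi]" (e.g. "lx[10-20]") into individual node names.
expand_range() {
    prefix=${1%%\[*}                    # text before "[", e.g. "lx"
    range=${1#*\[}; range=${range%\]}   # text inside brackets, "10-20"
    lo=${range%-*}; hi=${range#*-}      # numeric bounds, 10 and 20
    # Generate prefix10 .. prefix20; echo joins them with spaces.
    echo $(seq -f "${prefix}%g" "$lo" "$hi")
}
expand_range "lx[10-20]"
```

       Running the sketch prints the eleven node names lx10 through lx20
       on a single line, which is the node set that scontrol would
       operate on when given the expression.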
       reconfigure
              Instruct all Slurm daemons to re-read the configuration
              file. This command does not restart the daemons. This
              mechanism would be used to modify configuration parameters
              (Epilog, Prolog, SlurmctldLogFile, SlurmdLogFile, etc.),
              register the physical addition or removal of nodes from the
              cluster, or recognize the change of a node's configuration,
              such as the addition of memory or processors. The Slurm
              controller (slurmctld) forwards the request to all other
              daemons (the slurmd daemon on each compute node). Running
              jobs continue execution. Most configuration parameters can
              be changed by just running this command; however, SLURM
              daemons should be shut down and restarted if any of these
              parameters are to be changed: AuthType, BackupAddr,
              BackupController, ControlAddr, ControlMach, PluginDir,
              StateSaveLocation, SlurmctldPort or SlurmdPort.

       resume job_id
              Resume a previously suspended job.

       show ENTITY ID
              Display the state of the specified entity with the specified
              identification. ENTITY may be config, daemons, job, node,
              partition or step. ID can be used to identify a specific
              element of the identified entity: the configuration
              parameter name, job ID, node name, partition name, or job
              step ID for entities config, job, node, partition, and step
              respectively. Multiple node names may be specified using
              simple node range expressions (e.g. "lx[10-20]"). All other
              ID values must identify a single element. The job step ID is
              of the form "job_id.step_id" (e.g. "1234.1"). By default,
              all elements of the entity type specified are printed.

       shutdown
              Instruct all Slurm daemons to save current state and
              terminate. The Slurm controller (slurmctld) forwards the
              request to all other daemons (the slurmd daemon on each
              compute node).

       suspend job_id
              Suspend a running job. Use the resume command to resume its
              execution. User processes must stop on receipt of the
              SIGSTOP signal and resume upon receipt of SIGCONT for this
              operation to be effective.
              Not all architectures and configurations support job
              suspension.

       update SPECIFICATION
              Update job, node or partition configuration per the supplied
              specification. SPECIFICATION is in the same format as the
              Slurm configuration file and the output of the show command
              described above. It may be desirable to execute the show
              command (described above) on the specific entity you wish to
              update, then use cut-and-paste tools to enter updated
              configuration values to the update. Note that while most
              configuration values can be changed using this command, not
              all can be changed using this mechanism. In particular, the
              hardware configuration of a node or the physical addition or
              removal of nodes from the cluster may only be accomplished
              through editing the Slurm configuration file and executing
              the reconfigure command (described above).

       verbose
              Print detailed event logging. This includes time-stamps on
              data structures, record counts, etc.

       version
              Display the version number of scontrol being executed.

       !!     Repeat the last command executed.

SPECIFICATIONS FOR UPDATE COMMAND, JOBS
       Account=
              Account name to be changed for this job's resource use.
              Value may be cleared with blank data value, "Account=".

       Contiguous=
              Set the job's requirement for contiguous (consecutive) nodes
              to be allocated. Possible values are "YES" and "NO".

       Dependency=
              Defer job's initiation until the specified job_id completes.
              Cancel dependency with a job_id value of "0",
              "Dependency=0".

       Features=
              Set the job's required features on nodes to the specified
              value. Multiple values may be comma separated if all
              features are required (AND operation) or separated by "|" if
              any of the specified features are required (OR operation).
              Value may be cleared with blank data value, "Features=".

       JobId=
              Identify the job to be updated. This specification is
              required.

       MinMemory=
              Set the job's minimum real memory required per node to the
              specified value.
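       As an example of the specification format, several of the job
       options in this section combine into a single line given to the
       update command; the JobId= and updated values below are
       illustrative, and the feature names are hypothetical (valid names
       depend on the cluster's configuration):

```
update JobId=65539 TimeLimit=200 Priority=500
update JobId=65539 MinMemory=1024 Features="opteron|athlon"
```

       The same lines may be entered at the interactive "scontrol:"
       prompt, or passed on the command line as arguments to scontrol.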
       MinProcs=
              Set the job's minimum number of processors per node to the
              specified value.

       MinTmpDisk=
              Set the job's minimum temporary disk space required per node
              to the specified value.

       Name=  Set the job's name to the specified value.

       Partition=
              Set the job's partition to the specified value.

       Priority=
              Set the job's priority to the specified value.

       Nice[=delta]
              Adjust job's priority by the specified value. Default value
              is 100.

       ReqNodeList=
              Set the job's list of required nodes. Multiple node names
              may be specified using simple node range expressions (e.g.
              "lx[10-20]"). Value may be cleared with blank data value,
              "ReqNodeList=".

       ReqNodes=
              Set the job's count of required nodes to the specified
              value.

       ReqProcs=
              Set the job's count of required processors to the specified
              value.

       Shared=
              Set the job's ability to share nodes with other jobs.
              Possible values are "YES" and "NO".

       StartTime=
              Set the job's earliest initiation time. It accepts times of
              the form HH:MM:SS to run a job at a specific time of day
              (seconds are optional). (If that time is already past, the
              next day is assumed.) You may also specify midnight, noon,
              or teatime (4pm), and you can have a time-of-day suffixed
              with AM or PM for running in the morning or the evening. You
              can also say what day the job will be run, by giving a date
              in the form month-name day with an optional year, or giving
              a date of the form MMDDYY or MM/DD/YY or DD.MM.YY. You can
              also give times like now + count time-units, where the
              time-units can be minutes, hours, days, or weeks, and you
              can tell SLURM to run the job today with the keyword today
              and to run the job tomorrow with the keyword tomorrow.

       TimeLimit=
              Set the job's time limit to the specified value.

       Connection=
              Reset the node connection type. Possible values on Blue
              Gene are "MESH", "TORUS" and "NAV" (mesh else torus).

       Geometry=
              Reset the required job geometry. On Blue Gene the value
              should be three digits separated by "x" or ",".
              The digits represent the allocation size in the X, Y and Z
              dimensions (e.g. "2x3x4").

       Rotate=
              Permit the job's geometry to be rotated. Possible values
              are "YES" and "NO".

SPECIFICATIONS FOR UPDATE COMMAND, NODES
       NodeName=
              Identify the node(s) to be updated. Multiple node names may
              be specified using simple node range expressions (e.g.
              "lx[10-20]"). This specification is required.

       Reason=
              Identify the reason the node is in a "DOWN", "DRAINED" or
              "DRAINING" state. Use quotes to enclose a reason having more
              than one word.

       State=
              Identify the state to be assigned to the node. Possible
              values are "NoResp", "DRAIN", "RESUME", "DOWN", "IDLE",
              "ALLOC", and "ALLOCATED". "RESUME" is not an actual node
              state, but will return a DRAINED, DRAINING, or DOWN node to
              service, in either an IDLE or ALLOCATED state as
              appropriate. The "NoResp" state will only set the "NoResp"
              flag for a node without changing its underlying state.

SPECIFICATIONS FOR UPDATE AND DELETE COMMANDS, PARTITIONS
       AllowGroups=
              Identify the user groups which may use this partition.
              Multiple groups may be specified in a comma separated list.
              To permit all groups to use the partition specify
              "AllowGroups=ALL".

       Default=
              Specify if this partition is to be used by jobs which do
              not explicitly identify a partition to use. Possible values
              are "YES" and "NO".

       Hidden=
              Specify if the partition and its jobs should be hidden from
              view. Hidden partitions will by default not be reported by
              SLURM APIs or commands. Possible values are "YES" and "NO".

       Nodes=
              Identify the node(s) to be associated with this partition.
              Multiple node names may be specified using simple node range
              expressions (e.g. "lx[10-20]"). Note that jobs may only be
              associated with one partition at any time. Specify a blank
              data value to remove all nodes from a partition: "Nodes=".

       PartitionName=
              Identify the partition to be updated. This specification is
              required.

       RootOnly=
              Specify if only allocation requests initiated by user root
              will be satisfied.
              This can be used to restrict control of the partition to
              some meta-scheduler. Possible values are "YES" and "NO".

       Shared=
              Specify if nodes in this partition can be shared by
              multiple jobs. Possible values are "YES", "NO" and "FORCE".

       State=
              Specify if jobs can be allocated nodes in this partition.
              Possible values are "UP" and "DOWN". If a partition has
              allocated nodes to running jobs, those jobs will continue
              execution even after the partition's state is set to "DOWN".
              The jobs must be explicitly canceled to force their
              termination.

       MaxNodes=
              Set the maximum number of nodes which will be allocated to
              any single job in the partition. Specify a number or
              "INFINITE".

       MinNodes=
              Set the minimum number of nodes which will be allocated to
              any single job in the partition.

ENVIRONMENT VARIABLES
       Some scontrol options may be set via environment variables. These
       environment variables, along with their corresponding options, are
       listed below. (Note: Command line options will always override
       these settings.)

       SCONTROL_ALL        -a, --all

       SLURM_CONF          The location of the SLURM configuration file.

EXAMPLE
       # scontrol
       scontrol: show part class
       PartitionName=class TotalNodes=10 TotalCPUs=20 RootOnly=NO
          Default=NO Shared=NO State=UP MaxTime=30 Hidden=NO
          MinNodes=1 MaxNodes=2 AllowGroups=students
          Nodes=lx[0031-0040] NodeIndices=31,40,-1
       scontrol: update PartitionName=class MaxTime=99 MaxNodes=4
       scontrol: show job 65539
       JobId=65539 UserId=1500 JobState=PENDING TimeLimit=100
          Priority=100 Partition=batch Name=job01 NodeList=(null)
          StartTime=0 EndTime=0 Shared=0 ReqProcs=1000 ReqNodes=400
          Contiguous=1 MinProcs=4 MinMemory=1024 MinTmpDisk=2034
          ReqNodeList=lx[3000-3003] Features=(null)
          JobScript=/bin/hostname
       scontrol: update JobId=65539 TimeLimit=200 Priority=500
       scontrol: quit

COPYING
       Copyright (C) 2002 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf,
       DISCLAIMER). UCRL-CODE-217948.

       This file is part of SLURM, a resource management program.
       For details, see .

       SLURM is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       SLURM is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

FILES
       /etc/slurm.conf

SEE ALSO
       scancel(1), sinfo(1), squeue(1), slurm_checkpoint(3),
       slurm_delete_partition(3), slurm_load_ctl_conf(3),
       slurm_load_jobs(3), slurm_load_node(3), slurm_load_partitions(3),
       slurm_reconfigure(3), slurm_resume(3), slurm_shutdown(3),
       slurm_suspend(3), slurm_update_job(3), slurm_update_node(3),
       slurm_update_partition(3), slurm.conf(5)

scontrol 1.0                    December 2005                    SCONTROL(1)