Job

Note

This supersedes the pyslurm.job class, which will be removed in a future release.

`pyslurm.Job`

A Slurm Job.

All attributes in this class are read-only.

Parameters:

Name	Type	Description	Default
`job_id`	`int`	An Integer representing a Job-ID.	required

Attributes:

Name	Type	Description
`steps`	`JobSteps`	Steps this Job has. Before you can access the Steps data for a Job, you have to call the `reload()` method of a Job instance or the `load_steps()` method of a Jobs collection.
`stats`	`JobStatistics`	Real-time statistics of a Job. Before you can access the stats data for a Job, you have to call the `load_stats` method of a Job instance or the Jobs collection.
`pids`	`dict[str, list]`	Current Process-IDs of the Job, organized by node name. Before you can access the pids data for a Job, you have to call the `load_stats` method of a Job instance or the Jobs collection.
`name`	`str`	Name of the Job
`id`	`int`	Unique ID of the Job.
`association_id`	`int`	ID of the Association this Job runs with.
`account`	`str`	Name of the Account this Job runs with.
`user_id`	`int`	UID of the User who submitted the Job.
`user_name`	`str`	Name of the User who submitted the Job.
`group_id`	`int`	GID of the Group that Job runs under.
`group_name`	`str`	Name of the Group this Job runs under.
`priority`	`int`	Priority of the Job.
`nice`	`int`	Nice Value of the Job.
`qos`	`str`	QOS Name of the Job.
`min_cpus_per_node`	`int`	Minimum Amount of CPUs per Node the Job requested.
`state`	`str`	State this Job is currently in.
`state_reason`	`str`	A Reason explaining why the Job is in its current state.
`is_requeueable`	`bool`	Whether the Job is requeuable or not.
`requeue_count`	`int`	Amount of times the Job has been requeued.
`is_batch_job`	`bool`	Whether the Job is a batch job or not.
`node_reboot_required`	`bool`	Whether the Job requires the Nodes to be rebooted first.
`dependencies`	`dict`	Dependencies the Job has to other Jobs.
`time_limit`	`int`	Time-Limit, in minutes, for this Job.
`time_limit_min`	`int`	Minimum Time-Limit in minutes for this Job.
`submit_time`	`int`	Time the Job was submitted, as unix timestamp.
`eligible_time`	`int`	Time the Job is eligible to start, as unix timestamp.
`accrue_time`	`int`	Job accrue time, as unix timestamp
`start_time`	`int`	Time this Job has started execution, as unix timestamp.
`resize_time`	`int`	Time the job was resized, as unix timestamp.
`deadline`	`int`	Time when a pending Job will be cancelled, as unix timestamp.
`preempt_eligible_time`	`int`	Time the Job is eligible for preemption, as unix timestamp.
`preempt_time`	`int`	Time the Job was signaled for preemption, as unix timestamp.
`suspend_time`	`int`	Last Time the Job was suspended, as unix timestamp.
`last_sched_evaluation_time`	`int`	Last time evaluated for Scheduling, as unix timestamp.
`pre_suspension_time`	`int`	Amount of seconds the Job ran prior to suspension.
`mcs_label`	`str`	MCS Label for the Job
`partition`	`str`	Name of the Partition the Job runs in.
`submit_host`	`str`	Name of the Host this Job was submitted from.
`batch_host`	`str`	Name of the Host where the Batch-Script is executed.
`num_nodes`	`int`	Amount of Nodes the Job has requested or allocated.
`max_nodes`	`int`	Maximum amount of Nodes the Job has requested.
`allocated_nodes`	`str`	Nodes the Job is currently using. This is only valid when the Job is running. If the Job is pending, it will always return None.
`required_nodes`	`str`	Nodes the Job is explicitly requiring to run on.
`excluded_nodes`	`str`	Nodes that are explicitly excluded for execution.
`scheduled_nodes`	`str`	Nodes the Job is scheduled on by the slurm controller.
`derived_exit_code`	`int`	The derived exit code for the Job.
`derived_exit_code_signal`	`int`	Signal for the derived exit code.
`exit_code`	`int`	Code with which the Job has exited.
`exit_code_signal`	`int`	The signal which has led to the exit code of the Job.
`batch_constraints`	`list`	Features that node(s) should have for the batch script. Controls where it is possible to execute the batch-script of the job. Also see 'constraints'
`federation_origin`	`str`	Federation Origin
`federation_siblings_active`	`int`	Federation siblings active
`federation_siblings_viable`	`int`	Federation siblings viable
`cpus`	`int`	Total amount of CPUs the Job is using. If the Job is still pending, this will be the amount of requested CPUs.
`cpus_per_task`	`int`	Number of CPUs per Task used.
`cpus_per_gpu`	`int`	Number of CPUs per GPU used.
`boards_per_node`	`int`	Number of boards per Node.
`sockets_per_board`	`int`	Number of sockets per board.
`sockets_per_node`	`int`	Number of sockets per node.
`cores_per_socket`	`int`	Number of cores per socket.
`threads_per_core`	`int`	Number of threads per core.
`ntasks`	`int`	Number of parallel processes.
`ntasks_per_node`	`int`	Number of parallel processes per node.
`ntasks_per_board`	`int`	Number of parallel processes per board.
`ntasks_per_socket`	`int`	Number of parallel processes per socket.
`ntasks_per_core`	`int`	Number of parallel processes per core.
`ntasks_per_gpu`	`int`	Number of parallel processes per GPU.
`delay_boot_time`	`int`	Delay-boot time for the Job, in minutes. See https://slurm.schedmd.com/sbatch.html#OPT_delay-boot
`constraints`	`list`	A list of features the Job requires nodes to have. In contrast, the 'batch_constraints' option only focuses on the initial batch-script placement. This option restricts the nodes the Job is able to execute on in general, beyond the initial batch-script placement.
`cluster`	`str`	Name of the cluster the job is executing on.
`cluster_constraints`	`list`	A List of features that a cluster should have.
`reservation`	`str`	Name of the reservation this Job uses.
`resource_sharing`	`str`	Mode controlling how a job shares resources with others.
`requires_contiguous_nodes`	`bool`	Whether the Job has allocated a set of contiguous nodes.
`licenses`	`list`	List of licenses the Job needs.
`network`	`str`	Network specification for the Job.
`command`	`str`	The command that is executed for the Job.
`working_directory`	`str`	Path to the working directory for this Job.
`admin_comment`	`str`	An arbitrary comment set by an administrator for the Job.
`system_comment`	`str`	An arbitrary comment set by the slurmctld for the Job.
`container`	`str`	The container this Job uses.
`comment`	`str`	An arbitrary comment set for the Job.
`standard_input`	`str`	The path to the file for the standard input stream.
`standard_output`	`str`	The path to the log file for the standard output stream.
`standard_error`	`str`	The path to the log file for the standard error stream.
`required_switches`	`int`	Number of switches required.
`max_wait_time_switches`	`int`	Amount of seconds to wait for the switches.
`burst_buffer`	`str`	Burst buffer specification
`burst_buffer_state`	`str`	Burst buffer state
`cpu_frequency_min`	`Union[str, int]`	Minimum CPU-Frequency requested.
`cpu_frequency_max`	`Union[str, int]`	Maximum CPU-Frequency requested.
`cpu_frequency_governor`	`Union[str, int]`	CPU-Frequency Governor requested.
`billable_tres`	`float`	Amount of billable trackable resources.
`wckey`	`str`	Name of the WCKey this Job uses.
`mail_user`	`list`	Users that should receive Mails for this Job.
`mail_types`	`list`	Mail Flags specified by the User.
`heterogeneous_id`	`int`	Heterogeneous job id.
`heterogeneous_offset`	`int`	Heterogeneous job offset.
`temporary_disk_per_node`	`int`	Temporary disk space in Mebibytes available per Node.
`array_id`	`int`	The master Array-Job ID.
`array_tasks_parallel`	`int`	Max number of array tasks allowed to run simultaneously.
`array_task_id`	`int`	Array Task ID of this Job if it is an Array-Job.
`array_tasks_waiting`	`str`	Array Tasks that are still waiting.
`end_time`	`int`	Time at which this Job will end, as unix timestamp.
`run_time`	`int`	Amount of seconds the Job has been running.
`cores_reserved_for_system`	`int`	Amount of cores reserved for System use only.
`threads_reserved_for_system`	`int`	Amount of Threads reserved for System use only.
`memory`	`int`	Total Amount of Memory this Job has, in Mebibytes
`memory_per_cpu`	`int`	Amount of Memory per CPU this Job has, in Mebibytes
`memory_per_node`	`int`	Amount of Memory per Node this Job has, in Mebibytes
`memory_per_gpu`	`int`	Amount of Memory per GPU this Job has, in Mebibytes
`gres_per_node`	`dict`	Generic Resources (e.g. GPU) this Job is using per Node.
`profile_types`	`list`	Types for which detailed accounting data is collected.
`gres_binding`	`str`	Binding Enforcement of a Generic Resource (e.g. GPU).
`gres_tasks_per_sharing`	`str`	Task Sharing of a Generic Resource (e.g. GPU).
`kill_on_invalid_dependency`	`bool`	Whether the Job should be killed on an invalid dependency.
`spreads_over_nodes`	`bool`	Whether the Job should be spread over as many nodes as possible.
`is_cronjob`	`bool`	Whether this Job is a cronjob.
`cronjob_time`	`str`	The time specification for the Cronjob.
`elapsed_cpu_time`	`int`	Amount of CPU-Time used by the Job so far. This is the result of multiplying the run_time with the amount of cpus requested.
`run_time_remaining`	`int`	The amount of seconds the job has still left until hitting the `time_limit`.
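
The time-related attributes above are plain integers: unix timestamps for points in time, seconds for `run_time`, and minutes for `time_limit`. The following is a minimal sketch of how such values might be made readable or combined. The helper names `to_datetime`, `seconds_left` and `cpu_seconds` are hypothetical and not part of pyslurm, and `seconds_left` only assumes that the remaining time is the time limit minus the run time; pyslurm may treat unset or unlimited time limits differently.

```python
from datetime import datetime, timezone
from typing import Optional


def to_datetime(unix_timestamp: int) -> datetime:
    """Convert a unix timestamp (e.g. a submit_time value) to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)


def seconds_left(time_limit_minutes: Optional[int], run_time_seconds: int) -> Optional[int]:
    """Estimate the seconds left before a Job hits its time limit.

    Assumption: the limit is given in minutes and the run time in seconds,
    as the attribute table states. Returns None when no limit is known.
    """
    if time_limit_minutes is None:
        return None
    # Never report a negative amount of remaining time.
    return max(time_limit_minutes * 60 - run_time_seconds, 0)


def cpu_seconds(run_time_seconds: int, cpus: int) -> int:
    """CPU time as described for elapsed_cpu_time: run time multiplied by CPUs."""
    return run_time_seconds * cpus


# Hypothetical values standing in for attributes of a loaded Job.
print(to_datetime(0).isoformat())  # 1970-01-01T00:00:00+00:00
print(seconds_left(60, 900))       # 2700
print(cpu_seconds(900, 4))         # 3600
```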

`cancel()` `method descriptor`

Cancel a Job.

Implements the slurm_kill_job RPC.

Raises:

Type	Description
`RPCError`	When cancelling the Job was not successful.

Examples:

>>> import pyslurm
>>> pyslurm.Job(9999).cancel()

`get_batch_script()` `method descriptor`

Return the content of the script for a Batch-Job.

Returns:

Type	Description
`str`	The content of the batch script.

Raises:

Type	Description
`RPCError`	When retrieving the Batch-Script for the Job was not successful.

Examples:

>>> import pyslurm
>>> script = pyslurm.Job(9999).get_batch_script()

`get_resource_layout_per_node()` `method descriptor`

Retrieve the resource layout of this Job on each node.

Warning

Return type may still be subject to change in the future

Returns:

Type	Description
`dict`	Resource layout, where the key is the name of the node and the value another dict with the keys `cpu_ids`, `memory` and `gres`.
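
As an illustration of working with a dict of this shape, the sketch below sums the `memory` entry over all nodes. The sample data is hypothetical: the node names and the value types (a string for `cpu_ids`, an int for `memory`, a dict for `gres`) are assumptions, and since the return type may still change, the same caveat applies to this helper. `total_memory` is not a pyslurm function.

```python
from typing import Any, Dict


def total_memory(layout: Dict[str, Dict[str, Any]]) -> int:
    """Sum the `memory` value over all nodes in a resource layout dict."""
    return sum(node_info["memory"] for node_info in layout.values())


# Hypothetical layout; real values would come from get_resource_layout_per_node().
sample_layout = {
    "node001": {"cpu_ids": "0-3", "memory": 4096, "gres": {}},
    "node002": {"cpu_ids": "0-1", "memory": 2048, "gres": {}},
}

# Walk the nodes in a stable order and show what each one contributes.
for node_name, node_info in sorted(sample_layout.items()):
    print(node_name, node_info["cpu_ids"], node_info["memory"])

print(total_memory(sample_layout))  # 6144
```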

`hold(mode=None)` `method descriptor`

Hold a currently pending Job, preventing it from being scheduled.

Parameters:

Name	Type	Description	Default
`mode`	`str`	Determines in which mode the Job should be held. Possible values are `user` or `admin`. By default, the Job is held in `admin` mode, meaning only an Administrator will be able to release the Job again. If you specify the mode as `user`, the User will also be able to release the job.	`None`

Raises:

Type	Description
`RPCError`	When holding the Job was not successful.

Examples:

>>> import pyslurm
>>>
>>> # Holding a Job (in "admin" mode by default)
>>> pyslurm.Job(9999).hold()
>>>
>>> # Holding a Job in "user" mode
>>> pyslurm.Job(9999).hold(mode="user")

`load(job_id)` `staticmethod`

Load information for a specific Job.

Implements the slurm_load_job RPC.

Note

If the Job is not pending, the related Job steps will also be loaded. Job statistics are however not loaded automatically.

Parameters:

Name	Type	Description	Default
`job_id`	`int`	An Integer representing a Job-ID.	required

Returns:

Type	Description
`Job`	Returns a new Job instance

Raises:

Type	Description
`RPCError`	If requesting the Job information from the slurmctld was not successful.

Examples:

>>> import pyslurm
>>> job = pyslurm.Job.load(9999)

`load_stats()` `method descriptor`

Load realtime statistics for a Job and its steps.

Calling this function returns the Job statistics, and additionally populates the `stats` and `pids` attributes of the instance.

Returns:

Type	Description
`JobStatistics`	The statistics of the job.

Raises:

Type	Description
`RPCError`	When receiving the Statistics was not successful.

Examples:

>>> import pyslurm
>>> job = pyslurm.Job.load(9999)
>>> stats = job.load_stats()
>>>
>>> # Print the CPU Time Used
>>> print(stats.total_cpu_time)
>>>
>>> # Print the Process-IDs for the whole Job, organized by hostname
>>> print(job.pids)
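
Since `pids` is documented as a `dict[str, list]` keyed by node name, it can be summarized with ordinary dict operations. The following sketch counts processes per node on hypothetical sample data; `count_processes` is an illustrative helper and not part of pyslurm.

```python
from typing import Dict, List


def count_processes(pids: Dict[str, List[int]]) -> Dict[str, int]:
    """Map each node name to the number of process IDs recorded for it."""
    return {node: len(node_pids) for node, node_pids in pids.items()}


# Hypothetical data in the shape of the pids attribute after load_stats().
sample_pids = {"node001": [4101, 4102, 4110], "node002": [977]}

print(count_processes(sample_pids))  # {'node001': 3, 'node002': 1}
```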

`modify(changes)` `method descriptor`

Modify a Job.

Implements the slurm_update_job RPC.

Parameters:

Name	Type	Description	Default
`changes`	`JobSubmitDescription`	A JobSubmitDescription object which contains all the modifications that should be done on the Job.	required

Raises:

Type	Description
`RPCError`	When updating the Job was not successful.

Examples:

>>> import pyslurm
>>>
>>> # Setting the new time-limit to 20 days
>>> changes = pyslurm.JobSubmitDescription(time_limit="20-00:00:00")
>>> pyslurm.Job(9999).modify(changes)
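
The string passed as `time_limit` above uses Slurm's `days-hours:minutes:seconds` notation, whereas the `time_limit` attribute of a Job is reported in minutes. The sketch below relates the two. `time_limit_to_minutes` is a hypothetical helper that handles only this one form and rounds any started minute up, which is a choice made for this sketch; pyslurm performs its own parsing.

```python
def time_limit_to_minutes(spec: str) -> int:
    """Convert a 'days-hours:minutes:seconds' string to whole minutes.

    Only this one form is handled. A started minute counts as a full one,
    which is a choice of this sketch.
    """
    days_part, _, clock_part = spec.partition("-")
    hours, minutes, seconds = (int(field) for field in clock_part.split(":"))
    total_seconds = ((int(days_part) * 24 + hours) * 60 + minutes) * 60 + seconds
    return -(-total_seconds // 60)  # ceiling division


print(time_limit_to_minutes("20-00:00:00"))  # 28800
print(time_limit_to_minutes("0-01:30:30"))   # 91
```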

`notify(msg)` `method descriptor`

Send a message to the Job's stdout.

Implements the slurm_notify_job RPC.

Parameters:

Name	Type	Description	Default
`msg`	`str`	The message that should be sent.	required

Raises:

Type	Description
`RPCError`	When sending the message to the Job was not successful.

Examples:

>>> import pyslurm
>>> pyslurm.Job(9999).notify("Hello Friends!")

`release()` `method descriptor`

Release a currently held Job, allowing it to be scheduled again.

Raises:

Type	Description
`RPCError`	When releasing a held Job was not successful.

Examples:

>>> import pyslurm
>>> pyslurm.Job(9999).release()

`requeue(hold=False)` `method descriptor`

Requeue a currently running Job.

Implements the slurm_requeue RPC.

Parameters:

Name	Type	Description	Default
`hold`	`bool`	Controls whether the Job should be put in a held state or not. Default for this is `False`, so it will not be held.	`False`

Raises:

Type	Description
`RPCError`	When requeuing the Job was not successful.

Examples:

>>> import pyslurm
>>>
>>> # Requeuing a Job while allowing it to be
>>> # scheduled again immediately
>>> pyslurm.Job(9999).requeue()
>>>
>>> # Requeuing a Job while putting it in a held state
>>> pyslurm.Job(9999).requeue(hold=True)

`send_signal(signal, steps='children', hurry=False)` `method descriptor`

Send a signal to a running Job.

Implements the slurm_signal_job RPC.

Parameters:

Name	Type	Description	Default
`signal`	`Union[str, int]`	Any valid signal which will be sent to the Job. Can be either a str like `SIGUSR1`, or simply an int.	required
`steps`	`str`	Selects which steps should be signaled. Valid values are `all`, `batch` and `children`. The default value is `children`, where all steps except the batch-step will be signaled. The value `batch`, in contrast, means that only the batch-step will be signaled. With `all`, every step is signaled.	`'children'`
`hurry`	`bool`	If True, no burst buffer data will be staged out. The default value is False.	`False`

Raises:

Type	Description
`RPCError`	When sending the signal was not successful.

Examples:

Specifying the signal as a string:

>>> from pyslurm import Job
>>> Job(9999).send_signal("SIGUSR1")

or passing in a numeric signal:

>>> Job(9999).send_signal(9)
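
To show how a signal name and a signal number relate, and which `steps` values are documented, here is a small sketch built on Python's standard `signal` module. The helpers `signal_number` and `check_steps` are hypothetical and only illustrate input checking before a call to `send_signal`; they are not how pyslurm converts or validates its arguments.

```python
import signal
from typing import Union

# The three values documented for the `steps` parameter.
VALID_STEPS = ("all", "batch", "children")


def signal_number(sig: Union[str, int]) -> int:
    """Return the numeric value for a signal given as a name or a number."""
    if isinstance(sig, int):
        return sig
    # Look the name up in the standard library's signal enumeration.
    return signal.Signals[sig].value


def check_steps(steps: str) -> str:
    """Raise ValueError unless steps is one of the documented values."""
    if steps not in VALID_STEPS:
        raise ValueError(f"steps must be one of {VALID_STEPS}, got {steps!r}")
    return steps


print(signal_number("SIGTERM"))  # 15
print(signal_number(9))          # 9
print(check_steps("batch"))      # batch
```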

`suspend()` `method descriptor`

Suspend a running Job.

Implements the slurm_suspend RPC.

Raises:

Type	Description
`RPCError`	When suspending the Job was not successful.

Examples:

>>> import pyslurm
>>> pyslurm.Job(9999).suspend()

`to_dict()` `method descriptor`

Job information formatted as a dictionary.

Returns:

Type	Description
`dict`	Job information as dict

`unsuspend()` `method descriptor`

Unsuspend a currently suspended Job.

Implements the slurm_resume RPC.

Raises:

Type	Description
`RPCError`	When unsuspending the Job was not successful.

Examples:

>>> import pyslurm
>>> pyslurm.Job(9999).unsuspend()

`pyslurm.Jobs`

Bases: pyslurm.xcollections.MultiClusterMap

A Multi Cluster collection of pyslurm.Job objects.

Parameters:

Name	Type	Description	Default
`jobs`	`Union[list[int], dict[int, Job], str]`	Jobs to initialize this collection with.	`None`
`frozen`	`bool`	Control whether this collection is `frozen` when reloading Job information.	`False`

Attributes:

Name	Type	Description
`memory`	`int`	Total amount of memory requested for all Jobs in this collection, in Mebibytes
`cpus`	`int`	Total amount of cpus requested for all Jobs in this collection.
`ntasks`	`int`	Total amount of tasks requested for all Jobs in this collection.
`elapsed_cpu_time`	`int`	Total amount of CPU-Time used by all the Jobs in the collection. This is the result of multiplying the run_time with the amount of cpus requested for each job.
`frozen`	`bool`	If this is set to True and the `reload()` method is called, then ONLY Jobs that already exist in this collection will be reloaded. New Jobs that are discovered will not be added to this collection, but old Jobs which have already been purged from the Slurm controller's memory will not be removed either. The default is False, so old jobs will be removed, and new Jobs will be added - basically the same behaviour as doing Jobs.load().
`stats`	`JobStatistics`	Real-time statistics of all Jobs in this collection. Before you can access the stats data for this, you have to call the `load_stats` method on this collection.
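
The effect of `frozen` on `reload()` can be pictured with plain dicts standing in for the collection and for the controller's current view. This is only a model of the behaviour described above, not pyslurm's implementation; `simulate_reload` and the job states used are hypothetical.

```python
from typing import Dict


def simulate_reload(
    collection: Dict[int, str], controller: Dict[int, str], frozen: bool
) -> Dict[int, str]:
    """Return what a collection might hold after reload().

    Both dicts map a job ID to a job state string.
    """
    if not frozen:
        # Not frozen: mirror the controller, like a fresh Jobs.load().
        return dict(controller)
    # Frozen: keep exactly the existing job IDs, refreshing those the
    # controller still knows and leaving purged ones untouched.
    return {
        job_id: controller.get(job_id, old_state)
        for job_id, old_state in collection.items()
    }


collection = {1: "PENDING", 2: "RUNNING"}
controller = {2: "COMPLETED", 3: "PENDING"}  # job 1 purged, job 3 is new

print(simulate_reload(collection, controller, frozen=True))
# {1: 'PENDING', 2: 'COMPLETED'}
print(simulate_reload(collection, controller, frozen=False))
# {2: 'COMPLETED', 3: 'PENDING'}
```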

`load(preload_passwd_info=False, frozen=False)` `staticmethod`

Retrieve all Jobs from the Slurm controller

Parameters:

Name	Type	Description	Default
`preload_passwd_info`	`bool`	Decides whether to query passwd and groups information from the system. Could potentially speed up access to attributes of the Job where a UID/GID is translated to a name. If True, the information will be fetched and stored in each of the Job instances.	`False`
`frozen`	`bool`	Decide whether this collection of Jobs should be frozen.	`False`

Returns:

Type	Description
`Jobs`	A collection of Job objects.

Raises:

Type	Description
`RPCError`	When getting all the Jobs from the slurmctld failed.

Examples:

>>> import pyslurm
>>> jobs = pyslurm.Jobs.load()
>>> print(jobs)
pyslurm.Jobs({1: pyslurm.Job(1), 2: pyslurm.Job(2)})
>>> print(jobs[1])
pyslurm.Job(1)

`load_stats()` `method descriptor`

Load realtime stats for this collection of Jobs.

This function additionally fills in the stats attribute for all Jobs in the collection, and also populates its own stats attribute. Implicitly calls load_steps().

Note

Pending Jobs will be ignored, since they don't have any Stats yet.

Returns:

Type	Description
`JobStatistics`	The statistics of this job collection.

Raises:

Type	Description
`RPCError`	When retrieving the stats for all the Jobs failed.

Examples:

>>> import pyslurm
>>> jobs = pyslurm.Jobs.load()
>>> stats = jobs.load_stats()
>>>
>>> # Print the CPU Time Used
>>> print(stats.total_cpu_time)

`load_steps()` `method descriptor`

Load all Job steps for this collection of Jobs.

This function fills in the steps attribute for all Jobs in the collection.

Note

Pending Jobs will be ignored, since they don't have any Steps yet.

Raises:

Type	Description
`RPCError`	When retrieving the information for all the Steps failed.

`reload()` `method descriptor`

Reload the information for jobs in a collection.

Returns:

Type	Description
`Jobs`	Returns self

Raises:

Type	Description
`RPCError`	When getting the Jobs from the slurmctld failed.
