Slurm: Open Source HPC and AI Workload Manager

The standard for HPC and AI orchestration.

Overview

Open Source Workload Management

Slurm is an open source workload manager built to efficiently manage nearly any workload and deliver proven throughput at massive scale. It uses a hierarchical structure consisting of a controller, nodes, and partitions to allocate jobs based on policies and resources, optimizing workload distribution, maximizing cluster utilization, and ensuring efficient job execution. Developed and maintained by engineers at SchedMD (now part of NVIDIA) with deep high-performance computing (HPC) and AI expertise, Slurm is the scheduler of choice for over half of the top 100 systems in the TOP500.

Get Support for Slurm

Slurm and Slinky support, training, and consultation services are now available from NVIDIA. From implementation to customization, get direct-to-engineering help from the experts to utilize Slurm to its full capacity.

Slurm for Kubernetes

Slinky provides a powerful set of tools for bringing Slurm’s capabilities into Kubernetes. It offers users flexibility and ease of use for managing HPC, cloud-native, and AI training workloads.

What Is Slurm?

Slurm is the market-leading open source workload manager for HPC and AI, trusted by many of the world’s largest supercomputing and AI environments.

Slurm allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. It then provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, Slurm arbitrates conflicting requests for resources by managing a queue of pending work.
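
As a rough illustration of the allocation request described above, the sketch below renders a small batch script. The `JobRequest` class, the `render_batch_script` helper, and the `./my_app` command are hypothetical names made up for this example; the `#SBATCH` options shown (`--job-name`, `--nodes`, `--ntasks-per-node`, `--time`, `--exclusive`) are standard `sbatch` options.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Hypothetical container for the resources one job asks for."""
    name: str
    nodes: int
    tasks_per_node: int
    time_limit: str          # Slurm time format, e.g. "00:10:00"
    command: str             # placeholder program launched with srun
    exclusive: bool = False  # ask for whole nodes instead of sharing them


def render_batch_script(job: JobRequest) -> str:
    """Render a batch script whose #SBATCH lines describe the allocation."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --ntasks-per-node={job.tasks_per_node}",
        f"#SBATCH --time={job.time_limit}",
    ]
    if job.exclusive:
        lines.append("#SBATCH --exclusive")
    # srun starts the parallel tasks on the nodes Slurm allocated.
    lines.append(f"srun {job.command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = JobRequest("demo", 2, 4, "00:10:00", "./my_app", exclusive=True)
    print(render_batch_script(demo))
```

A script like this would typically be submitted with `sbatch`, after which `squeue` lists it among the pending or running work.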

Features

A Closer Look at Slurm

The workload manager for the world’s top supercomputers.

Proven Scalability and Throughput for HPC and AI Clusters

Efficiently manage millions of jobs across the largest heterogeneous CPU and GPU clusters with the leading workload manager. Achieve high utilization and consistent performance across environments, from small labs to leadership-class, exascale supercomputers.

Optimized Resource Allocation

Accelerate job execution and improve productivity with sophisticated scheduling and prioritization capabilities, including complex policy management, quality of service, and balanced resource allocation that aligns with organizational service-level agreements and priorities.
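
To make the idea of policy-driven prioritization concrete, here is a minimal sketch of a weighted priority score, loosely modeled on the way a multifactor priority policy combines normalized factors. The weights, the `PendingJob` fields, and the three factors chosen are assumptions for illustration only; a real site configures its own factors and values.

```python
from dataclasses import dataclass

# Hypothetical weights; a real site would tune its own values.
WEIGHTS = {"age": 1000, "fairshare": 5000, "qos": 2000}


@dataclass
class PendingJob:
    job_id: int
    age: float        # 0.0 (just submitted) to 1.0 (waited the maximum counted time)
    fairshare: float  # 0.0 (account over-served) to 1.0 (account under-served)
    qos: float        # 0.0 (lowest quality of service) to 1.0 (highest)


def priority(job: PendingJob) -> int:
    """Weighted sum of the job's normalized factors."""
    total = (
        WEIGHTS["age"] * job.age
        + WEIGHTS["fairshare"] * job.fairshare
        + WEIGHTS["qos"] * job.qos
    )
    return int(total)


def order_queue(jobs):
    """Highest priority first; ties broken by lower job ID."""
    return sorted(jobs, key=lambda j: (-priority(j), j.job_id))


if __name__ == "__main__":
    queue = [
        PendingJob(1, age=1.0, fairshare=0.1, qos=0.0),
        PendingJob(2, age=0.0, fairshare=0.9, qos=0.5),
        PendingJob(3, age=0.5, fairshare=0.5, qos=1.0),
    ]
    for job in order_queue(queue):
        print(job.job_id, priority(job))
```

With these made-up weights, fair share dominates: a freshly submitted job from an under-served account can outrank an older job from an account that has already used more than its share.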

Advanced Topology Awareness and Planning

Leverage Slurm’s understanding of complex network and system topologies to enable efficient workload placement on multi‑tier interconnects. Minimize latency, maximize bandwidth, and improve end‑to‑end job performance.

Widely Accessible: On Prem and Cloud Deployments

Build and expand over time with an open source workload manager that provides transparent code, active development, efficient cost, agile innovation, and a strong user community. Support on‑prem, cloud, and hybrid deployments.

Download Slurm

Slurm is fully open source and hardware agnostic, providing complete transparency and flexibility for resource management and job scheduling. Deploy Slurm, contribute to its growth, and seamlessly integrate it into your infrastructure stack.

Check it out on GitHub and join the community!

Technology

Resource Management and Job Scheduling

At its core, Slurm allocates resources, manages pending work, and executes jobs, but it's the details of Slurm's architecture that make it the leading management system for HPC and AI workloads.

GPU Resource Management

With leading-class GPU resource management, Slurm lets users request GPU and CPU resources together, ensuring jobs execute quickly and efficiently with maximum utilization.
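
A combined GPU and CPU request might look like the sketch below, which assembles an `sbatch` command line. The `gpu_job_args` helper and the `train.sh` script name are hypothetical; `--nodes`, `--gpus-per-node`, and `--cpus-per-gpu` are existing `sbatch` options, though which ones a given cluster accepts depends on how it is configured.

```python
import shlex


def gpu_job_args(nodes: int, gpus_per_node: int, cpus_per_gpu: int, script: str) -> list:
    """Build an sbatch argument list that asks for GPUs and matching CPUs."""
    if min(nodes, gpus_per_node, cpus_per_gpu) < 1:
        raise ValueError("nodes, gpus_per_node and cpus_per_gpu must all be >= 1")
    return [
        "sbatch",
        f"--nodes={nodes}",
        f"--gpus-per-node={gpus_per_node}",
        f"--cpus-per-gpu={cpus_per_gpu}",
        script,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without a cluster.
    print(shlex.join(gpu_job_args(2, 8, 4, "train.sh")))
```

Tying the CPU count to the GPU count keeps the two in proportion as the job is scaled up or down.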

Cloud Integration

Slurm automatically spins up cloud instances based on queue depth and job requirements using autoscaling and hybrid cloud bursting, enabled by representational state transfer (REST) APIs and integration with major cloud providers.
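
The scaling decision can be pictured with a small sketch. This is not Slurm's implementation, only an illustration of sizing a burst from queue depth; the `nodes_to_add` function and all of its parameters are hypothetical.

```python
def nodes_to_add(pending_node_requests, idle_nodes, max_cloud_nodes, running_cloud_nodes):
    """Return how many cloud nodes to start for the current pending queue.

    pending_node_requests: node counts requested by each pending job.
    idle_nodes: nodes already available (on prem or cloud) with nothing to run.
    max_cloud_nodes: budget cap on concurrently running cloud nodes.
    running_cloud_nodes: cloud nodes that are already up.
    """
    demand = sum(pending_node_requests)
    # Only burst for the part of the demand that idle nodes cannot absorb.
    shortfall = max(0, demand - idle_nodes)
    # Never exceed the configured cloud budget.
    headroom = max(0, max_cloud_nodes - running_cloud_nodes)
    return min(shortfall, headroom)


if __name__ == "__main__":
    print(nodes_to_add([4, 2, 1], idle_nodes=3, max_cloud_nodes=10, running_cloud_nodes=8))
```

A production policy would also weigh job requirements such as instance type, how long idle cloud nodes should linger before being released, and startup latency.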

Hardware Agnostic

Slurm runs on nearly any CPU- or GPU-accelerated cluster, with broad hardware support designed for modern, heterogeneous data centers running a variety of workloads.

Use Cases

Managing Workloads With Slurm

Find out how you can manage compute resources using the open source workload manager trusted by research labs and frontier AI leaders.

Massive-Scale Systems

Managing hundreds of thousands of cores, millions of jobs, and diverse hardware simultaneously requires more than basic scheduling. Slurm handles extreme concurrency with hierarchical job queues, topology-aware routing, and intelligent job packing that maximizes throughput. Built-in power management, policy enforcement, and detailed reporting keep massive deployments running efficiently and accountably at any scale.
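
One common packing technique is backfill scheduling, in which lower-priority jobs start early as long as they do not delay the job at the head of the queue. The sketch below is a heavily simplified version of that idea, not Slurm's scheduler: it assumes the head job needs every node once its reservation begins, and the `Job` class and `backfill` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    nodes: int
    time_limit: int  # requested wall time in minutes


def backfill(queue, free_nodes, minutes_until_reservation):
    """Return IDs of jobs that can start now without delaying the head job.

    queue: jobs waiting behind the blocked head job, in priority order.
    free_nodes: nodes idle right now.
    minutes_until_reservation: time left before the head job is due to start.
    """
    started = []
    for job in queue:
        fits = job.nodes <= free_nodes
        finishes_in_time = job.time_limit <= minutes_until_reservation
        if fits and finishes_in_time:
            started.append(job.job_id)
            free_nodes -= job.nodes
    return started


if __name__ == "__main__":
    waiting = [Job(11, 4, 30), Job(12, 2, 120), Job(13, 2, 45), Job(14, 3, 10)]
    print(backfill(waiting, free_nodes=6, minutes_until_reservation=60))
```

The sketch also shows why accurate time limits matter: job 12 fits on the idle nodes but is skipped because its requested wall time would overrun the reservation.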

HPC and AI Training

When training large AI models or running multi-physics simulations, job placement matters as much as raw compute. Slurm's topology-aware scheduling places multi-node workloads on multi-layered interconnects by assigning jobs to nodes that are closest to one another in the network fabric, which improves performance by reducing communication overhead. Combined with GPU-aware and policy-driven resource allocation, this lets teams run distributed workloads predictably without waiting on lower-priority or poorly placed jobs.
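
The placement idea can be sketched for a simple two-level network in which each node hangs off one leaf switch. This is an illustrative best-fit heuristic under that assumption, not Slurm's actual algorithm; the `pick_nodes` function and the switch and node names are hypothetical.

```python
from typing import Dict, List


def pick_nodes(free_by_switch: Dict[str, List[str]], count: int) -> List[str]:
    """Choose `count` free nodes, preferring nodes under a single leaf switch."""
    # Leaf switches that can hold the whole job, smallest first to limit fragmentation.
    fitting = sorted(
        (sw for sw, nodes in free_by_switch.items() if len(nodes) >= count),
        key=lambda sw: (len(free_by_switch[sw]), sw),
    )
    if fitting:
        return free_by_switch[fitting[0]][:count]

    # Otherwise span switches, fullest first, so the job touches as few as possible.
    chosen: List[str] = []
    for sw in sorted(free_by_switch, key=lambda sw: (-len(free_by_switch[sw]), sw)):
        chosen.extend(free_by_switch[sw][: count - len(chosen)])
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough free nodes")


if __name__ == "__main__":
    free = {
        "leaf1": ["n1", "n2"],
        "leaf2": ["n3", "n4", "n5", "n6"],
        "leaf3": ["n7", "n8", "n9"],
    }
    print(pick_nodes(free, 3))
    print(pick_nodes(free, 6))
```

Keeping a job under one switch means its traffic never crosses the upper tiers of the fabric; when that is impossible, using the fewest switches keeps the number of cross-switch hops low.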

Kubernetes Clusters

Slinky is a toolkit of components that enables Slurm operation in Kubernetes environments, bridging the gap between traditional HPC and cloud-native environments. Teams can run Slurm and Kubernetes workloads on shared node pools, with Kubernetes resource requests translated into Slurm jobs. This gives researchers and developers familiar Kubernetes workflows while they benefit from Slurm's superior batch scheduling and resource governance.

FAQs

FAQs About Slurm

An open source workload manager is software that automates the scheduling, execution, and monitoring of computing jobs across shared infrastructure such as clusters or cloud environments. Because it is open source, organizations can freely use, customize, and extend it to fit their performance, scalability, and operational needs without subscriptions or enterprise licenses.

The TOP500 is a ranking of the world's most powerful non-distributed computer systems. Slurm is the scheduler of choice for over half of the top 100 systems in the TOP500 list, which highlights its proven scalability and throughput at massive scale.

Yes, Slurm offers leading-class GPU resource management, allowing users to request both GPU and CPU resources to ensure jobs execute quickly and efficiently while maximizing utilization.

Official quick-start guides for users and administrators, release notes, and other detailed documentation are available on the SchedMD (now part of NVIDIA) website. NVIDIA also provides technical blog posts and on-demand videos related to Slurm integration and features.

Support tickets can be submitted through the support portal on the SchedMD (now part of NVIDIA) website. An email address with your organization's domain is required to validate your support entitlement. Slurm and Slinky support, training, and consultation services are available from NVIDIA. This provides direct-to-engineering help from experts for implementation and customization.

Slurm leverages its understanding of complex network and system topologies to enable efficient workload placement on multi-tier interconnects. This minimizes latency, maximizes bandwidth, and improves end-to-end job performance, which is especially critical for HPC and AI training workloads.

SchedMD (now part of NVIDIA) developed Slinky as an open source toolkit of components that enables Slurm operation in Kubernetes environments, bridging the gap between traditional HPC and cloud-native environments. It allows teams to run Slurm and Kubernetes workloads on shared node pools, translating Kubernetes resource requests into Slurm jobs.

Slurm is optimized for queue-based batch scheduling of large, parallel jobs, prioritizing throughput and hardware efficiency. Kubernetes is designed for declarative, event-driven orchestration of containerized microservices.

Resources

The Latest in Workload Management

Orchestrate Next-Generation AI Workloads With Open Source Slurm

This GTC San Jose 2026 session explored the current architecture, recent enhancements, and ongoing community-driven work that are helping Slurm target higher efficiency, portability, and interoperability for supercomputing workloads.

Running Large-Scale GPU Workloads on Kubernetes with Slurm

Most organizations have years of investment in Slurm job scripts and face challenges transitioning onto Kubernetes without maintaining two separate environments. Slinky, an open source project, provides a new approach to managing Kubernetes environments at scale.

From Hardware to Topology-Aware Scheduling

AI architects and HPC operators face the challenge of transforming racked hardware into safe, performant, and easily consumable resources for end users. A validated software stack, like NVIDIA Mission Control™, offers tools for multi-node scheduling, supporting both Slurm and Kubernetes.

Next Steps

Ready to Get Started?

Download on GitHub and join the community!

Slurm Support

Stay up to date with new releases and get direct support from Slurm engineers.

Slurm Documentation

Access release notes and quick-start guides for Slurm.