Slinky - Slurm and Kubernetes GPU integration

Slinky

Slurm workload management for Kubernetes.

Overview

Bring Slurm Capabilities Into Kubernetes

Slinky, an open source project developed by SchedMD (now part of NVIDIA), enables seamless interoperability between Slurm and Kubernetes. It introduces tools that allow users to run and manage Slurm clusters inside Kubernetes environments built on nearly any GPU-accelerated cluster, providing broad hardware support designed for today’s heterogeneous data centers. Whether you're managing high-performance computing (HPC) workloads or operating within cloud-native environments, Slinky helps bring together the best of both worlds for efficient resource management and scheduling.

Get Support for Slinky

Slurm and Slinky support, training, and consultation services are now available from NVIDIA. From implementation to customization, get direct-to-engineering help from the experts to use Slinky to its full potential.

Running Large-Scale GPU Workloads

Most organizations have years of investment in Slurm job scripts and struggle to move to Kubernetes without maintaining two separate environments. In this blog post, see how Slinky manages Kubernetes environments at scale.

What Is Slinky?

Slinky is an open source toolkit for integrating Slurm with Kubernetes, making it ideal for hybrid compute scenarios and offering flexibility and ease of use for both HPC and cloud-native AI users.

Technology

A Closer Look at Slinky

The main components of the Slinky toolkit include Slurm Operator and Slurm Bridge. Slurm Operator runs full Slurm clusters on Kubernetes infrastructure, managing the complete lifecycle of Slurm daemons as pods. Slurm Bridge brings Slurm scheduling to native Kubernetes workloads, allowing Slurm to act as a Kubernetes scheduler for pods.

Slurm Operator

Slurm Operator is core to Slinky functionality. It manages the scaling of Slurm nodes within Kubernetes. Slinky uses Slurm Operator to bring Slurm capabilities such as job allocation, accounting, job dependencies, fair-share, and priority scheduling into Kubernetes.

Slurm Bridge

Slurm Bridge brings quick, intelligent scheduling of workloads across a Kubernetes cluster. Slinky uses Slurm Bridge to support the co-location of Slurm and Kubernetes workloads, bringing the advantages of Slurm's scheduling and scale to both.
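
To make the idea of Slurm acting as a scheduler for pods more concrete, here is a minimal sketch in Python that builds a pod manifest and routes it to a non-default scheduler. It relies on the standard Kubernetes `spec.schedulerName` field. The scheduler name `slurm-bridge-scheduler`, the helper `build_pod_manifest`, the image name, and the `nvidia.com/gpu` resource request are illustrative assumptions, not values confirmed by this page, so check the Slinky documentation for the names your deployment actually uses.

```python
import json


def build_pod_manifest(name, image, scheduler_name="slurm-bridge-scheduler", gpus=1):
    """Build a pod manifest (as a dict) that asks a named scheduler to place it.

    `scheduler_name` is an assumed value for illustration only.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Standard Kubernetes field: pods without it go to the default scheduler.
            "schedulerName": scheduler_name,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Assumed GPU resource name; it depends on the installed device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, shown only to illustrate the output shape.
    manifest = build_pod_manifest("train-job", "example.com/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with the usual Kubernetes tooling. Only the `schedulerName` line differs from an ordinary pod definition, which is what lets existing Kubernetes workloads opt in without being rewritten as Slurm job scripts.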

Download Slinky

Slinky is fully open source and hardware agnostic, providing complete transparency and flexibility for resource management and job scheduling on Kubernetes. Deploy Slinky, contribute to its growth, and seamlessly integrate it into your infrastructure stack.

Check it out on GitHub and join the community!

Benefits

Explore the Benefits of Slinky

Slinky is ideal for organizations running AI training and large-scale GPU workloads, scientific simulations, or data-intensive tasks alongside modern, cloud-native applications. It removes the need to maintain separate clusters, simplifying workload management and boosting efficiency.

Unified Resource Management

Run Slurm and Kubernetes workloads on the same node pool without duplicating infrastructure. Slinky eliminates the need to partition clusters between HPC and cloud-native teams, letting both operate on shared hardware under a single scheduling layer.

Topology-Aware GPU Scheduling

Slinky uses Slurm's topology-aware scheduling to place distributed workloads on nodes that are physically closest in the network fabric. This minimizes communication overhead for large-scale AI training and HPC workloads where inter-node latency directly impacts performance.
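
The placement idea can be illustrated with a short Python sketch. This is a simplified best-fit heuristic written for this page, not Slurm's actual topology implementation: the function `pick_nodes`, the node and switch names, and the flat node-to-leaf-switch mapping are all hypothetical. It prefers a single leaf switch that can hold the whole job and otherwise spans as few switches as possible.

```python
from collections import defaultdict


def pick_nodes(free_nodes, node_to_switch, count):
    """Return `count` free nodes, preferring nodes that share a leaf switch."""
    # Group the free nodes by the leaf switch they hang off.
    by_switch = defaultdict(list)
    for node in sorted(free_nodes):
        by_switch[node_to_switch[node]].append(node)

    # Best fit: the smallest single switch that can hold the whole job.
    fitting = [nodes for nodes in by_switch.values() if len(nodes) >= count]
    if fitting:
        return min(fitting, key=len)[:count]

    # Otherwise fill from the switches with the most free nodes first,
    # so the job spans as few switches as possible.
    chosen = []
    for nodes in sorted(by_switch.values(), key=len, reverse=True):
        chosen.extend(nodes[: count - len(chosen)])
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough free nodes")


if __name__ == "__main__":
    # Hypothetical fabric: three leaf switches with 2, 3, and 1 free nodes.
    topology = {"n1": "s1", "n2": "s1", "n3": "s2", "n4": "s2", "n5": "s2", "n6": "s3"}
    print(pick_nodes(set(topology), topology, 3))
```

A three-node job in this example lands entirely under switch `s2`, so its inter-node traffic never crosses a higher tier of the fabric. A real scheduler also has to weigh multi-level switch hierarchies, job priorities, and fragmentation, which this sketch ignores.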

Kubernetes-Native Deployment

Because Slinky runs Slurm inside Kubernetes, clusters benefit from Kubernetes-native tooling for autoscaling, observability, and lifecycle management. Teams can adopt Slurm's world-class scheduling capabilities while continuing to work within their existing Kubernetes tooling and workflows.

Broad Hardware Compatibility

Slinky is designed to run on nearly any GPU-accelerated cluster, from on-premises supercomputers to major cloud providers. This hardware-agnostic approach gives organizations the flexibility to deploy consistent scheduling policies across heterogeneous data center environments without vendor lock-in.

Next Steps

Ready to Get Started?

Download on GitHub and join the community!

Slurm and Slinky Support

Stay up to date with new releases and get direct-to-engineering support.

Slinky Documentation

Access release notes and quick-start guides for Slinky.