Overview
Slinky, an open source project developed by SchedMD (now part of NVIDIA), enables seamless interoperability between Slurm and Kubernetes. It introduces tools that allow users to run and manage Slurm clusters inside Kubernetes environments built on nearly any GPU-accelerated cluster, providing broad hardware support designed for today’s heterogeneous data centers. Whether you're managing high-performance computing (HPC) workloads or operating within cloud-native environments, Slinky helps bring together the best of both worlds for efficient resource management and scheduling.
Slinky is an open source toolkit for integrating Slurm with Kubernetes, making it ideal for hybrid compute scenarios and offering flexibility and ease of use for both HPC and cloud-native AI users.
Technology
The main components of the Slinky toolkit include Slurm Operator and Slurm Bridge. Slurm Operator runs full Slurm clusters on Kubernetes infrastructure, managing the complete lifecycle of Slurm daemons as pods. Slurm Bridge brings Slurm scheduling to native Kubernetes workloads, allowing Slurm to act as a Kubernetes scheduler for pods.
Slurm Operator is core to Slinky functionality. It manages the scaling of Slurm nodes within Kubernetes and gives Slinky access to Slurm features such as job allocation, accounting, job dependencies, fair-share, and priority scheduling.
Slurm Bridge brings fast, intelligent scheduling to workloads across a Kubernetes cluster. Slinky uses Slurm Bridge to co-locate Slurm and Kubernetes workloads, extending the advantages of Slurm's scheduling and scale to both.
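To make the fair-share and priority scheduling mentioned above more concrete, here is a minimal Python sketch of how a multifactor job priority can be computed. It is an illustration, not Slinky or Slurm code: the `Job` class, the weights, and the sample values are hypothetical, and the fair-share factor uses a simplified form of Slurm's classic formula (2 raised to the negative ratio of usage to shares), which differs from the Fair Tree algorithm that recent Slurm releases use by default.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    name: str
    age_factor: float  # 0.0..1.0, grows the longer the job waits
    usage: float       # normalized recent usage of the job's account
    shares: float      # normalized share the account is entitled to


def fair_share_factor(usage: float, shares: float) -> float:
    """Simplified classic fair-share: 1.0 with no usage, 0.5 when usage equals shares."""
    if shares <= 0:
        return 0.0
    return 2.0 ** (-usage / shares)


def priority(job: Job, weight_age: float = 1000, weight_fairshare: float = 10000) -> float:
    """Weighted sum of factors; the weights here are assumed, not Slurm defaults."""
    return (weight_age * job.age_factor
            + weight_fairshare * fair_share_factor(job.usage, job.shares))


jobs = [
    Job("heavy-user", age_factor=0.9, usage=0.8, shares=0.4),
    Job("light-user", age_factor=0.2, usage=0.1, shares=0.5),
]

# Highest priority first: the account that has used less than its share wins,
# even though its job has waited for less time.
ranked = sorted(jobs, key=priority, reverse=True)
print([job.name for job in ranked])
```

With these assumed weights, fair-share dominates age, so an account that has consumed little of its entitlement is scheduled ahead of a heavy consumer.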
Slinky is fully open source and hardware agnostic, providing complete transparency and flexibility for resource management and job scheduling on Kubernetes. Deploy Slinky, contribute to its growth, and seamlessly integrate it into your infrastructure stack.
Check it out on GitHub and join the community!
Benefits
Slinky is ideal for organizations running AI training and large-scale GPU workloads, scientific simulations, or data-intensive tasks alongside modern, cloud-native applications. It removes the need to maintain separate clusters, simplifying workload management and boosting efficiency.
Run Slurm and Kubernetes workloads on the same node pool without duplicating infrastructure. Slinky eliminates the need to partition clusters between HPC and cloud-native teams, letting both operate on shared hardware under a single scheduling layer.
Slinky uses Slurm's topology-aware scheduling to place distributed workloads on nodes that are physically closest in the network fabric. This minimizes communication overhead for large-scale AI training and HPC workloads where inter-node latency directly impacts performance.
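The idea behind topology-aware placement can be sketched in a few lines of Python. This is a toy best-fit search under stated assumptions, not Slurm's actual algorithm: the `TOPOLOGY` map, the switch and node names, and the `place` function are all hypothetical. It picks the smallest switch subtree that still has enough free nodes, so a job spans as few network hops as possible.

```python
from typing import Dict, List, Optional, Set

# Hypothetical two-level fabric: leaf switches connect nodes, a spine connects leaves.
TOPOLOGY: Dict[str, List[str]] = {
    "spine": ["leaf1", "leaf2"],
    "leaf1": ["n1", "n2", "n3", "n4"],
    "leaf2": ["n5", "n6", "n7", "n8"],
}


def nodes_under(switch: str, topology: Dict[str, List[str]]) -> List[str]:
    """Return every compute node reachable below a switch."""
    found: List[str] = []
    for child in topology[switch]:
        if child in topology:      # child is another switch: descend
            found.extend(nodes_under(child, topology))
        else:                      # child is a compute node
            found.append(child)
    return found


def place(job_size: int, free: Set[str],
          topology: Dict[str, List[str]]) -> Optional[List[str]]:
    """Choose nodes from the smallest subtree that has enough free nodes."""
    best_total: Optional[int] = None
    best_nodes: Optional[List[str]] = None
    for switch in topology:
        members = nodes_under(switch, topology)
        candidates = [node for node in members if node in free]
        if len(candidates) < job_size:
            continue
        # Subtree size is used as a proxy for network distance.
        if best_total is None or len(members) < best_total:
            best_total = len(members)
            best_nodes = candidates[:job_size]
    return best_nodes


free_nodes = {"n1", "n2", "n5", "n6", "n7"}
print(place(3, free_nodes, TOPOLOGY))  # fits under a single leaf switch
print(place(4, free_nodes, TOPOLOGY))  # must span both leaves via the spine
```

A three-node job fits entirely under `leaf2`, while a four-node job has to cross the spine because no single leaf has four free nodes.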
Because Slinky runs Slurm inside Kubernetes, clusters benefit from Kubernetes-native tooling for autoscaling, observability, and lifecycle management. Teams can adopt Slurm's scheduling capabilities while continuing to work within their existing Kubernetes workflows.
Slinky is designed to run on nearly any GPU-accelerated cluster, from on-premises supercomputers to major cloud providers. This hardware-agnostic approach gives organizations the flexibility to deploy consistent scheduling policies across heterogeneous data center environments without vendor lock-in.
Stay up to date with new releases and get direct-to-engineering support.
Access release notes and quick-start guides for Slinky.