NVIDIA Unified Fabric Manager (UFM)

Explore the network management platforms for cyber intelligence and analytics.

The NVIDIA® UFM® platforms revolutionize data center networking management by combining enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to support scale-out InfiniBand data centers.

 

Data Center Management Made Easy

UFM platforms empower research and industrial data center operators to efficiently provision, monitor, manage, and preventatively troubleshoot and maintain their InfiniBand data center fabric. UFM platforms comprise multiple solution levels and a comprehensive feature set to meet the broadest range of modern, scale-out data center requirements. Using UFM, you can realize higher utilization of fabric resources and gain a competitive advantage, while reducing opex.

UFM platforms feature robust graphical user interfaces (GUIs).

Highlights

UFM Platforms

UFM Telemetry
Real-Time Monitoring

The UFM Telemetry platform provides network validation tools to monitor network performance and conditions, capturing and streaming rich real-time network telemetry information, application workload usage, and system configuration to an on-premises or cloud-based database for further analysis.

 

Platforms: Software containers or dedicated appliances

 

Key features:

  • Switches, adapters, and cables telemetry

  • System validation

  • Network performance tests

  • Streaming of telemetry information to on-premises or cloud-based database

UFM Enterprise
Fabric Visibility and Control

The UFM Enterprise platform combines the benefits of UFM Telemetry with enhanced network monitoring and management. It performs automated network discovery and provisioning, traffic monitoring, and congestion discovery. It also enables job scheduler provisioning and integrates with industry-leading job schedulers and cloud and cluster managers, including Slurm and Platform Load Sharing Facility (LSF).

 

Platforms: Software containers or dedicated appliances

 

Key features:

  • Includes UFM Telemetry features

  • Automated network discovery and validation

  • Secure cable management

  • Congestion tracking to identify traffic bottlenecks

  • Problem identification and resolution

  • Global software updates

  • Job scheduler provisioning, integrated with Slurm and Platform LSF

  • Advanced reporting and comprehensive representational state transfer (REST) APIs

  • Rich web-based GUI

UFM Cyber-AI
Cyber Intelligence and Analytics

The UFM Cyber-AI platform builds on UFM Telemetry and UFM Enterprise, adding preventive maintenance and cybersecurity capabilities to lower supercomputing opex.

 

Platform: Dedicated UFM Cyber-AI appliance on premises

 

Key features:

  • Includes UFM Telemetry and UFM Enterprise features

  • Detects performance degradations or usage profile changes over time

  • Detects abnormal cluster behavior

  • Uses AI to correlate seemingly unrelated phenomena

  • Alerts when preventive maintenance is required

  • Optimizes predictability with continuous system data collection

Additional Services

NVIDIA Networking Care—Monitoring and Network Operations Center (NOC) Services

Regular performance analysis is essential to ensuring that your NVIDIA networking solution is aligned with your business objectives and the latest technologies. Our monitoring and NOC services continuously examine your solution for potential faults, identifying and addressing issues before they become problems. The result is higher ROI and lower system maintenance costs.

 
