NVIDIA NetQ

Introduce holistic, real-time visibility, troubleshooting, and DevOps into your modern data center network.

Introduction

AI Factory Network Operations With NetQ

NVIDIA NetQ™ is a highly scalable network operations toolset that provides real-time visibility, troubleshooting, correlation, and validation for your NVIDIA NVLink™ Switches and NVIDIA® Cumulus® fabrics. NetQ uses telemetry to deliver actionable insights about the health of your data center network, helping keep your AI network fabric operating smoothly.

Overview

How NetQ works

Data Collection, Processing, and Visualization

NetQ uses agents on the switches and hosts to collect telemetry data across the entire network. As a central control point, NetQ stores and processes information to provide actionable insights and complete visibility. Its rich graphical user interface (GUI) quickly highlights issues and alerts, simplifying operations and increasing efficiency.
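The collect, process, and highlight flow described above can be pictured with a small sketch. This is not NetQ code or its API: the `Sample` and `Collector` names, the metric names, and the thresholds are all hypothetical, chosen only to illustrate agents reporting telemetry to a central point that flags issues.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One telemetry reading reported by a (hypothetical) agent."""
    device: str
    metric: str
    value: float


class Collector:
    """Hypothetical central point: keeps the latest value per device and metric."""

    def __init__(self, thresholds):
        self.thresholds = thresholds          # metric name -> alert limit
        self.latest = defaultdict(dict)       # device -> {metric: value}

    def ingest(self, sample):
        # Store only the most recent reading for each device/metric pair.
        self.latest[sample.device][sample.metric] = sample.value

    def alerts(self):
        # Report every stored reading that exceeds its configured limit.
        out = []
        for device, metrics in sorted(self.latest.items()):
            for metric, value in sorted(metrics.items()):
                limit = self.thresholds.get(metric)
                if limit is not None and value > limit:
                    out.append(f"{device}: {metric}={value} exceeds {limit}")
        return out


# Illustrative data only: two switches and one host reporting in.
collector = Collector({"cpu_pct": 90.0, "rx_drops": 0.0})
for s in [
    Sample("leaf01", "cpu_pct", 42.0),
    Sample("leaf02", "cpu_pct", 97.5),
    Sample("server01", "rx_drops", 12.0),
]:
    collector.ingest(s)

alerts = collector.alerts()
for line in alerts:
    print(line)
```

In this toy model, only the devices whose readings cross a threshold surface as alerts, which mirrors the idea of a central view that highlights issues rather than raw data.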

Benefits

Why Choose NetQ for AI Factory Network Operations?

NetQ is a holistic observability platform that natively supports streaming telemetry for hardware-accelerated detection and reporting of data plane anomalies and intermittent network issues. It ensures the highest-performance networking for AI training and inference.

Streamline Upgrades

Experience push-button simplicity for network management with NetQ’s intuitive GUI.

Gain Real-Time Intelligence

Correlate configuration and operational status, and instantly identify and track state changes for your entire data center.

Reduce Downtime

Optimize AI operations with quick alerts, faster troubleshooting, and proactive detection.

Remediate Faster

Detect faulty network states and get alerts with precise fault location data.

Remove Complexity

Simplify operations and increase operator efficiency by quickly highlighting issues through visualizations and alerts.

Diagnose Root Causes

Trace network paths, replay the network state at any time in the past, review fabric-wide event change logs, and diagnose the root cause of state deviation.

Key Features

What You Get With NetQ

NetQ supports continuous integration and continuous deployment (CI/CD) workflows, making it easier to manage and provision network elements within your AI fabric. Its suite of operations capabilities includes visibility, troubleshooting, validation, trace, and comparative look-back.

  • Network Management: Access powerful tools to manage your NVIDIA Cumulus Linux and NVOS environments with the push of a button.
  • Advanced Telemetry: Collect real-time data that enables deep troubleshooting, visibility, and automated workflows from a single GUI.
  • Snapshot and Compare: Compare network configurations from before and after a change to reduce the risk of disruption.
  • Network-Wide Visibility: See real-time visualizations of the health of your network with NetQ’s rich GUI.
  • Flow Telemetry: Analyze fabric-wide latency and buffer occupancy data of all the paths of a 4-tuple or 5-tuple flow to identify congestion points.
  • Preventive Validation: Reduce manual errors before they’re rolled into production.
  • Diagnostic Troubleshooting: Diagnose the root cause of state deviations with advanced diagnostic tools.
  • gNMI Collection: Use the gRPC Network Management Interface (gNMI) specification to stream What Just Happened (WJH) telemetry data from the NetQ agent.
  • RoCE Support: Monitor your RDMA (remote direct memory access) over Converged Ethernet (RoCE) environment with NetQ to gain actionable insights into your AI network fabric.
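To make the Snapshot and Compare idea concrete, here is a minimal sketch of comparing two network state snapshots. It does not use NetQ's interfaces: the `compare_snapshots` function and the interface-state dictionaries are hypothetical, and a real snapshot would carry far more than link state.

```python
from typing import Any


def compare_snapshots(before: dict[str, Any], after: dict[str, Any]) -> dict[str, dict]:
    """Return the entries added, removed, or changed between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only: interface state keyed by "device:port".
before = {"leaf01:swp1": "up", "leaf01:swp2": "up", "leaf02:swp1": "up"}
after = {"leaf01:swp1": "up", "leaf01:swp2": "down", "leaf03:swp1": "up"}

diff = compare_snapshots(before, after)
for kind, entries in diff.items():
    print(kind, entries)
```

Reviewing the three buckets after a change makes unintended side effects, such as a link that went down, easy to spot before they reach production.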

Resources

Continue Exploring NetQ

NVIDIA NetQ Datasheet

Learn about the features and benefits of NetQ, a modern operations tool that enables holistic, real-time visibility and troubleshooting of your data center network.

NVIDIA NetQ User Guide

Explore documentation on deploying, configuring, monitoring, and troubleshooting your network in your data center environment.

Next Steps

Ready to Get Started?

Get a Free Trial of NVIDIA Networking Software

Simulate a fully automated network topology using NVIDIA Air.

Discover Networking for the Era of AI

The network is ultimately responsible for AI performance, acting as the backbone of the data center to harness the power of generative AI.

Learn About the Spectrum-X Ethernet Platform

Featuring the NVIDIA Spectrum-X™ Ethernet switch, the Spectrum-X Ethernet platform is designed specifically to improve the performance and efficiency of Ethernet-based AI infrastructure.