Understanding Web-Scale Networking

In the move to private, public, and hybrid cloud environments, more and more organizations are realizing the tremendous cost savings, flexibility, and scalability of open, disaggregated, modular data center networks. The modern architecture of web-scale networking provides scalability and agility in the data center while lowering total cost of ownership (TCO) by drastically reducing operational expenditure (OpEx).


  • Software-Defined Networking Versus Web-Scale Networking

    Web-scale networking is a modern architectural approach to infrastructure with a few key constructs. Businesses can design cost-effective, agile networks for the modern era by adhering to these three constructs:

    • Open and modular
    • Intelligence in software
    • Scalable and efficient


    Open and modular software is being adopted more and more widely by mainstream enterprise customers. These enterprises have begun using this open approach to achieve the efficiency of mega data center designs without building the networking stack themselves. The early adopters of these systems have been able to abandon the expensive and inflexible systems of the past. In doing so, they’ve increased the speed at which they can expand and modify their networks, delivering more applications more quickly and at a lower cost. Open networking works hand in hand with DevOps. Together, they let teams configure and operate the network nimbly using automation and provisioning tools. DevOps has influenced IT culture, tools, processes, and organizational structure, accelerating application delivery and fostering continuous experimentation, and it has led organizations to rethink the conventional wisdom of IT operations. Web-scale IT creates a more resilient architecture to support those applications, enabling IT operations teams to implement and support leaner, more efficient, and more agile processes.


    Intelligence in software essentially means that businesses can standardize on a smart operating system model that gives them the freedom to choose any open application. Putting the intelligence in software lets a business disaggregate software from hardware, providing unprecedented choice, flexibility, and efficiency.

    As in the PC industry today, where you choose the operating system (such as Windows or Linux) and then choose the hardware to run it on (from Dell or HP, for example), the same can now be done with data center network switches. If an enterprise prefers specific software, it isn’t limited to one hardware vendor to run it on. If it prefers specific hardware, it isn’t limited to one software vendor. This additional choice can significantly reduce costs and opens the door to an array of applications and tools for automation, scale, and efficiency.


    Building modular networks in the leaf/spine (also known as Clos) architecture is exceptionally scalable and efficient. A Clos fabric can grow to vast scale if needed, because it works like LEGO building blocks for the modern data center. Leaf/spine architecture provides predictable latency, because every host is the same number of hops from any host on another leaf, and it natively uses all links through equal-cost multipath routing (ECMP) with standardized, mature routing protocols. Using layer-3 routing for redundancy and load sharing removes the constraints of multichassis link aggregation (MLAG). Because MLAG isn’t standardized, each vendor’s implementation is proprietary, so both paired switches must come from the same vendor. MLAG also limits redundancy to two switches: if one switch fails, you lose 50 percent of the bandwidth. By deploying routing with ECMP, you can choose how many paths each leaf has, and therefore how much bandwidth and redundancy you get.
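    To make the ECMP-versus-MLAG point concrete, here is a minimal Python sketch, assuming one equal-speed uplink from a leaf to each spine. The function names are hypothetical, and SHA-256 stands in for the vendor-specific hash a switch ASIC actually uses. It illustrates the properties the paragraph relies on: each flow sticks to one path (so its packets stay in order), different flows spread across every spine, and losing one of N paths costs 1/N of the bandwidth rather than the fixed half an MLAG pair loses.

```python
import hashlib
from collections import Counter


def ecmp_next_hop(flow: tuple, next_hops: list[str]) -> str:
    """Pick a next hop for a flow by hashing its 5-tuple.

    SHA-256 is a stand-in (assumption): switch ASICs use their own hash
    functions. Every packet of a flow maps to the same next hop, which keeps
    packets in order, while different flows spread across all next hops.
    """
    key = "|".join(map(str, flow)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


def surviving_bandwidth(paths: int, failed: int) -> float:
    """Fraction of uplink bandwidth left when `failed` of `paths` equal links are down."""
    return (paths - failed) / paths


spines = ["spine1", "spine2", "spine3", "spine4"]
# 10,000 synthetic TCP flows (protocol 6) that differ only in source port.
flows = [("10.1.1.10", "10.2.2.20", 6, 40000 + i, 443) for i in range(10_000)]
print(Counter(ecmp_next_hop(f, spines) for f in flows))  # each spine gets a similar share

# An MLAG pair is the 2-path case; ECMP lets you pick 4, 8, or more spines.
for paths in (2, 4, 8):
    print(paths, surviving_bandwidth(paths, failed=1))  # prints 2 0.5, then 4 0.75, then 8 0.875
```

    Real fabrics hash on header fields selected in the switch configuration, so treat the distribution here as illustrative; the bandwidth arithmetic holds for any set of equal-speed paths.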

    Open, flexible architecture isn’t only efficient for the growing volume of east-west traffic (server-to-server or server-to-storage traffic within the data center); it also suits networks of every size, from very small deployments up to the largest mega data centers, so organizations of any size can benefit from the cost reduction and efficiency.

    Web-scale networking principles can have a lasting impact on an organization. With this new approach, businesses usually find themselves rethinking their network design, operations, and practices in a way that brings freedom and scale to the data center.

    Organizations can increase the number of switches each engineer manages, reduce network complexity, standardize tools and programs, build automated methods to detect issues, improve time to market, and, overall, build a better network. These improvements make it easier for an organization to scale efficiently and affordably.

    Think of Google, Amazon, Facebook, and Netflix, to name a few, which have led the way in realizing benefits that traditional models simply couldn’t deliver. The same model can be applied to organizations of every size and in every industry, so they can also reap the benefits of these new, agile environments.

  • Comparing Private, Public, and Hybrid Clouds

    When an organization hosts its own data center to run its applications, it’s considered a private cloud. Although it requires maintenance, it’s often the most secure option, and the enterprise has full control over the operations, the design, and the applications it runs. With the advent of web-scale networking, this model is becoming much more affordable and is often less expensive than public cloud.

    A public cloud is a data center hosted and maintained by an external provider, such as Amazon Web Services (AWS) or Microsoft Azure, that leases capacity to other companies. In this case, an enterprise typically pays a fee and receives access and rights to use specific hardware or software in the provider’s data center. As a high-level example, say you're running your company’s online store in Microsoft Azure: the application actually runs on servers in Microsoft data centers, and you pay an ongoing fee for access. The public cloud is often an easy choice, but its costs can grow quickly, and those costs can make it inflexible and limit an organization's ability to scale.

    Companies using a public cloud environment often also maintain a smaller private cloud, a model known as hybrid cloud. With a hybrid cloud, your company provides some applications from its own data center and leases others from a public cloud provider for a fee. Organizations use this approach for many reasons:

    • Testing the waters: They may be testing and building out a private cloud environment while maintaining a public one.
    • Resource constraints: They may not have enough internal capacity to move every workload into a private cloud.
    • Diversification: They may divide different types of organizational operations into each environment for efficiency.
    • Cost savings: They may find the public cloud environment too pricey for all operations, so they've divided accordingly.
    • Backup: They may simply use a private cloud for development, testing, backup, redundancy, and control.

    It depends on your organization, but many companies adopt private cloud gradually to test the waters. For example, a company with a traditional network may want to create a web-scale IT network to improve agility and lower costs. Often, these companies first migrate non-critical workloads into the private cloud to get comfortable with the new environment. Then, when they’re satisfied with the results, they begin moving critical workloads into the private cloud.

    When built on a modern architecture, private cloud deployments offer easy provisioning, ample capacity, strong performance, and lower cost. When enterprises build a private cloud environment, they emulate large-scale IT organizations.

    Companies may use public cloud environments for testing applications or to deliver fast, frictionless services that enable agility and spur innovation. Other companies deploy a combination of private and public cloud environments in a hybrid cloud, depending on their application delivery methods or the internal services their organization requires.

  • The Benefits of Web-Scale Networking and NetDevOps

    Moving to a web-scale IT architecture provides many benefits to an organization. Primarily:

    • Ability to scale efficiently
    • Ability to automate easily
    • Cost reduction on both CapEx and OpEx
    • Complete IT agility: deliver internal and external applications at record speed
    • Freedom to choose any combination of vendors based on needs and budget


    It’s easy to scale a web-scale network as your business grows. The leaf/spine architecture works well for anything from three racks to hundreds of racks. As the need arises, add racks with leaf switches and connect them to the spine switches. If the fabric outgrows the spine layer, you can add another tier to the network. It’s a simple, modular design that’s optimized for both east-west traffic (inter-server communication) and north-south traffic (traffic entering or leaving the data center). This model allows expansion without touching the existing network.
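    The scaling rules in the paragraph above reduce to a little arithmetic. The sketch below assumes a two-tier fabric in which every leaf has exactly one uplink to every spine and all ports of a given role run at the same speed; the class and field names are hypothetical. It computes how many leaves the spines can hold, how many server-facing ports that yields, and the leaf’s oversubscription ratio (server-facing bandwidth divided by uplink bandwidth).

```python
from dataclasses import dataclass


@dataclass
class LeafSpineFabric:
    """Hypothetical two-tier fabric: each leaf has one uplink to every spine."""
    spine_count: int      # number of spine switches
    spine_ports: int      # ports per spine; each one connects a single leaf
    leaf_up_ports: int    # spine-facing ports available on each leaf
    leaf_down_ports: int  # server-facing ports per leaf
    down_gbps: int        # speed of each server-facing port
    up_gbps: int          # speed of each leaf-to-spine uplink

    def __post_init__(self) -> None:
        # A leaf needs one uplink per spine, so it must have enough spine-facing ports.
        if self.leaf_up_ports < self.spine_count:
            raise ValueError("each leaf needs one uplink per spine")

    @property
    def max_leaves(self) -> int:
        # Every leaf consumes one port on every spine, so spine port count caps the leaves.
        return self.spine_ports

    @property
    def max_server_ports(self) -> int:
        return self.max_leaves * self.leaf_down_ports

    @property
    def oversubscription(self) -> float:
        # Server-facing bandwidth divided by uplink bandwidth on one leaf.
        return (self.leaf_down_ports * self.down_gbps) / (self.spine_count * self.up_gbps)


fabric = LeafSpineFabric(spine_count=4, spine_ports=32, leaf_up_ports=8,
                         leaf_down_ports=48, down_gbps=25, up_gbps=100)
print(fabric.max_leaves, fabric.max_server_ports, fabric.oversubscription)  # 32 1536 3.0
```

    Under these assumptions, growing the fabric means adding leaves up to `max_leaves` without reconfiguring existing racks, and adding spines lowers oversubscription as long as leaves have spare uplinks. Once you need more leaves than a spine has ports, that is where the additional tier mentioned above comes in.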


    When you standardize your network on Linux-based switches, you can leverage your existing automation tools and DevOps practices. This is exactly what the term NetDevOps refers to.

    NVIDIA enables a consistent experience across the network and compute. You can automate the complete operational life cycle of network devices, from configuration and provisioning to policy-based change management. By applying these modern approaches to network automation with NVIDIA® Cumulus Linux, you can reduce costs, improve efficiency, raise the number of switches each operator manages, and reduce complexity and issues.


    Modern data center networks are doubling down on automation. Deploying a new rack, and thus the applications and services on that rack, is much quicker than in the traditional model. The Open Network Install Environment (ONIE) automatically loads an operating system, zero-touch provisioning (ZTP) applies the initial configuration, and DevOps tools configure and maintain all the servers and switches on the rack. As a result, the servers and the switches connecting them can be brought online simultaneously, reducing the time needed to deliver applications. Your IT team can then deliver applications with higher service-level agreements (SLAs) and vastly shorter deployment times.
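    One common way ONIE and ZTP learn what to load is through DHCP. The sketch below, a hedged example rather than a reference configuration, renders ISC dhcpd host entries for a new rack: option 114 (`default-url`) points ONIE at an installer image, and option 239, which the Cumulus Linux documentation uses for its ZTP script URL, points the freshly installed switch at a provisioning script. The inventory, MAC addresses, IP addresses, and URLs are hypothetical placeholders; confirm the option names against your DHCP server and NOS release. Cumulus Linux also expects the ZTP script itself to contain the string `CUMULUS-AUTOPROVISIONING`.

```python
# Hypothetical inventory for one new rack: hostname -> (management MAC, fixed IP).
RACK = {
    "leaf01": ("44:38:39:00:00:01", "192.0.2.11"),
    "leaf02": ("44:38:39:00:00:02", "192.0.2.12"),
}

# Declares option 239 so dhcpd knows how to encode it (name assumed from the Cumulus docs).
ZTP_OPTION_DECL = "option cumulus-provision-url code 239 = text;\n"


def host_entry(name: str, mac: str, ip: str, installer_url: str, ztp_url: str) -> str:
    """One dhcpd host block: ONIE reads default-url, Cumulus ZTP reads option 239."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option default-url "{installer_url}";\n'
        f'  option cumulus-provision-url "{ztp_url}";\n'
        "}\n"
    )


def render_dhcpd(rack: dict, installer_url: str, ztp_url: str) -> str:
    """Render the option declaration followed by one host block per switch."""
    entries = [host_entry(n, mac, ip, installer_url, ztp_url)
               for n, (mac, ip) in sorted(rack.items())]
    return ZTP_OPTION_DECL + "\n" + "\n".join(entries)


print(render_dhcpd(RACK, "http://192.0.2.1/onie-installer", "http://192.0.2.1/ztp.sh"))
```

    In practice the same inventory can also drive the server entries, which is what lets a whole rack come up at once instead of switch by switch.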

    DevOps has combined the knowledge and processes of software development, QA, and IT operations professionals, leading to a newer, more collaborative, and agile method of developing and testing software that reaches general availability (GA) faster. Bringing this methodology into the network is often referred to as NetDevOps.

    With this NetDevOps mindset, the IT organization becomes a trusted, proactive partner rather than a 24/7 support team called in only to put out fires when software and hardware systems fail. This mindset has naturally flowed into the web-scale IT architecture being adopted in data centers around the world today.


    An enterprise's ability to choose software and hardware independently, all the way down to the optics, not only reduces costs but also lets it pick the best option for each of its needs.

  • The Technology Available for Web-Scale Networking

    NVIDIA develops intelligent software and technology to provide solutions designed for automation and scale. Our network operating system (NOS), Cumulus Linux, works with more than 50 vendor platforms and offers a flexible open architecture, faster IT delivery, automation, and scale. We're continuously expanding the technology to optimize the web-scale networking approach, with features we've developed in house, like Prescriptive Topology Manager (PTM) and Network Command Line Utility (NCLU), and with support for standards such as Ethernet VPN (EVPN).

    We also understand that the NOS is just one piece of the puzzle, so we offer a range of services, products, and education to get your modern data center network up and running quickly and easily.

    When building your web-scale data center, you should consider which technologies you'll want to leverage for efficiency, power, and your overall organizational needs. We offer information on several of our use cases and solutions, but we also believe that the beauty of web-scale IT is that it lets you build your network precisely to the needs of your organization. Here are a few use cases to consider.


    Network virtualization decouples network services from the underlying hardware. For example, a virtual local area network (VLAN) is a form of layer-2 virtualization that divides a local network into separate broadcast domains, often used to separate traffic that shares the same physical link. Network virtualization can also entail running layer-2 segments over a layer-3 network. Just as cloud provisioning works for compute, virtualization enables on-demand network provisioning, so you can create as many instances as you need and scale as you grow.
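    The difference between the two forms of virtualization in the paragraph above shows up directly in the header fields. This Python sketch packs an IEEE 802.1Q VLAN tag and a VXLAN header. The paragraph doesn't name a layer-2-over-layer-3 encapsulation; VXLAN (RFC 7348) is used here as one common choice, and the function names are illustrative. The 12-bit VLAN ID allows 4,094 usable segments, while VXLAN's 24-bit network identifier (VNI) allows about 16.7 million, which is a large part of why overlays scale further than VLANs alone.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: 16-bit TPID, then 3-bit PCP, 1-bit DEI, 12-bit VLAN ID."""
    if not 1 <= vid <= 4094:  # VLAN IDs 0 and 4095 are reserved
        raise ValueError("VLAN ID must be 1-4094")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte with the I bit set (0x08),
    24 reserved bits, the 24-bit VNI, then 8 more reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


print(vlan_tag(100).hex())        # 81000064
print(vxlan_header(10100).hex())  # 0800000000277400
print(2**12 - 2, 2**24)           # 4094 16777216
```

    In a real packet, the VXLAN header sits inside a UDP datagram (IANA assigns destination port 4789) and is followed by the original Ethernet frame; the switch or hypervisor performing the encapsulation builds these bytes in hardware or in the kernel.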

    NVIDIA interoperates with network virtualization tools from vendors such as VMware, Midokura, Akanda, and others through Cumulus Linux, so you don't have to worry about which model and architecture to choose; the networking component stays simple and seamless under the hood.


    Automation is a key component of web-scale networking, as it reduces resource demands, mitigates errors, and helps operators manage more switches. By choosing an open networking model, businesses can leverage existing automation tools as well as DevOps practices.

    Many of the automation tools that have been used in the compute world for years have migrated to the networking world. Using Cumulus Linux, you can automate the complete operational life cycle of network devices, from configuration and provisioning to policy-based change management. Automating a network makes configuration and operation easier and faster, eliminating the inconsistencies and misconfigurations caused by human error and by changes made outside version control.
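    As a minimal sketch of what eliminating inconsistencies can look like in practice, the Python below renders each leaf's BGP configuration from a single inventory and then diffs the rendered (desired) configuration against what is running. Tools such as Ansible typically do this with templates; plain Python keeps the example self-contained. The inventory, ASNs, addresses, and `swp` port names are hypothetical, and the output imitates FRRouting's BGP unnumbered syntax, which Cumulus Linux uses for routing, but it isn't a complete or validated switch configuration.

```python
import difflib

# Hypothetical inventory: the single source of truth for every leaf.
LEAVES = {
    "leaf01": {"asn": 65101, "router_id": "10.0.0.11", "uplinks": ["swp51", "swp52"]},
    "leaf02": {"asn": 65102, "router_id": "10.0.0.12", "uplinks": ["swp51", "swp52"]},
}


def render_bgp(leaf: dict) -> str:
    """Render an FRR-style BGP unnumbered stanza for one leaf."""
    lines = [f"router bgp {leaf['asn']}", f" bgp router-id {leaf['router_id']}"]
    for port in leaf["uplinks"]:
        lines.append(f" neighbor {port} interface remote-as external")
    lines += [
        " address-family ipv4 unicast",
        f"  network {leaf['router_id']}/32",
        " exit-address-family",
    ]
    return "\n".join(lines) + "\n"


def drift(desired: str, running: str, name: str) -> list[str]:
    """Return unified-diff lines from running to desired; an empty list means no drift."""
    return list(difflib.unified_diff(
        running.splitlines(), desired.splitlines(),
        fromfile=f"{name}/running", tofile=f"{name}/desired", lineterm=""))


desired = render_bgp(LEAVES["leaf01"])
# Simulate a hand edit that dropped one uplink session on the switch.
running = desired.replace(" neighbor swp52 interface remote-as external\n", "")
print("\n".join(drift(desired, running, "leaf01")))
```

    Because every leaf is rendered by the same function, two leaves can only differ where their inventory entries differ, and any change made by hand on a switch shows up as a diff rather than going unnoticed.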


    Until open networking became an available methodology, OpenStack clusters were only as flexible as the top-of-rack switch. Customers could take advantage of the open compute and storage standards, but networking remained a bottleneck.

    This completely changes with web-scale networking. With Cumulus Linux, the entire stack, from servers to switches, can run Linux. With compute and network speaking the same language, there's no need to translate between complex compute application programming interfaces (APIs) and proprietary networking command-line interfaces (CLIs).
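    Because a Linux-based switch exposes its ports as ordinary Linux network interfaces, the same script can inspect a server or a switch. This minimal sketch calls iproute2's `ip -j link show`, which prints interface details as JSON, and maps each interface to its operational state. It assumes an iproute2 version with JSON output (`-j`); port names such as `swp1` follow the Cumulus Linux convention, and the helper names are hypothetical.

```python
import json
import subprocess


def parse_links(ip_json: str) -> dict[str, str]:
    """Map interface name -> operational state from `ip -j link show` output."""
    return {link["ifname"]: link.get("operstate", "UNKNOWN") for link in json.loads(ip_json)}


def link_states() -> dict[str, str]:
    """Run the same iproute2 command a server admin would, on any Linux host or switch."""
    result = subprocess.run(["ip", "-j", "link", "show"],
                            capture_output=True, text=True, check=True)
    return parse_links(result.stdout)
```

    On a host with iproute2 installed, `link_states()` returns a dictionary such as `{'lo': 'UNKNOWN', 'eth0': 'UP'}` on a server; on a Cumulus Linux switch the same call also lists the switch ports, such as `swp1`. Keeping the parsing separate from the command makes the logic easy to test without a live device.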

  • How Cumulus Linux and SONiC Support Web-Scale IT

    NVIDIA is the leader in web-scale networking. We enable a web-scale IT architectural approach in the network—a critical part of the data center that has traditionally been a bottleneck for rapid deployment of applications.

    With Cumulus Linux and/or SONiC, our customers can run their data center networks the way Google and Facebook have done for years—highly automated—without all the development time or expensive, specialized hardware. However, for customers that don't wish to automate, we have also developed NCLU, which provides a modern command-line interface to configure the switch.

    Hardware can cost as little as a sixth of what it did in legacy IT models, but the operational difference is even more dramatic: One admin can oversee 500 switches, versus 20 previously, and provisioning takes hours or minutes instead of days or weeks.
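    The staffing figures above are easier to appreciate as arithmetic. This sketch applies the two ratios quoted in the paragraph (20 switches per admin in legacy models, 500 with web-scale operations) to a hypothetical fleet of 1,000 switches; the fleet size is an illustrative assumption, not a measured result.

```python
import math


def admins_needed(switches: int, switches_per_admin: int) -> int:
    # Round up: a partial workload still needs a whole person.
    return math.ceil(switches / switches_per_admin)


FLEET = 1_000  # hypothetical fleet size
legacy = admins_needed(FLEET, 20)     # ratio quoted for legacy operations
webscale = admins_needed(FLEET, 500)  # ratio quoted for web-scale operations
print(legacy, webscale, 500 // 20)    # 50 2 25
```

    The two ratios differ by a factor of 25, so at this fleet size the quoted figures imply 50 admins versus 2.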

    Interested in how NVIDIA can help your business? Talk to a web-scale IT expert today.

Prefer to try before you buy? Try NVIDIA Cumulus Linux for free.