NetworkX

NetworkX is a package for the Python programming language that’s used to create, manipulate, and study the structure, dynamics, and functions of complex graph networks. 

 

What Is NetworkX?

NetworkX is a Python package for complex graph network analysis. In order to understand NetworkX functionality, you first need to understand graphs. Graphs are mathematical structures used to model many types of relationships and processes in physical, biological, social and information systems. A graph consists of nodes or vertices (representing the entities in the system) that are connected by edges (representing relationships between those entities). Working with graphs is a function of navigating edges and nodes to discover and understand complex relationships and/or optimize paths between linked data in a network.

Graph to optimize paths between linked data in a network.
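
As a minimal sketch of the ideas above, the following builds a small graph of people (nodes) and their connections (edges), then asks NetworkX for the route with the fewest hops between two of them. The names and the connections are made up for illustration.

```python
import networkx as nx

# Nodes are entities; edges are relationships between them.
G = nx.Graph()
G.add_edge("Alice", "Bob")
G.add_edge("Bob", "Carol")
G.add_edge("Carol", "Dave")
G.add_edge("Dave", "Erin")
G.add_edge("Alice", "Erin")

# Navigate the edges to find the path with the fewest hops.
path = nx.shortest_path(G, source="Bob", target="Erin")
hops = nx.shortest_path_length(G, source="Bob", target="Erin")

print(path, hops)
```

Here the route through Alice takes two hops, while the route through Carol and Dave takes three, so the shorter one is returned.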

There are many uses of graph network analysis, such as analyzing relationships in social networks, cyber threat detection, and identifying the people most likely to buy a product based on shared preferences. 

In the real world, nodes can be people, groups, places, or things such as customers, products, members, cities, stores, airports, ports, bank accounts, devices, mobile phones, molecules, or web pages.

Examples of edges, or relationships between nodes, include friendships, network connections, hyperlinks, roads, routes, wires, phone calls, emails, “likes,” payments, transactions, and social networking messages. Edges can be directed, representing a one-way relationship from one node to another, such as Janet “liking” a social media post of Jeanette’s. They can also be undirected: if Bob is a Facebook friend of Alice, then Alice is also a friend of Bob.
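
The difference between one-way and two-way relationships can be sketched with the two basic NetworkX graph classes. This is an illustrative example only; the people are the ones named in the paragraph above.

```python
import networkx as nx

# Directed graph: an edge points from one node to another.
likes = nx.DiGraph()
likes.add_edge("Janet", "Jeanette")  # Janet liked Jeanette's post

janet_to_jeanette = likes.has_edge("Janet", "Jeanette")
jeanette_to_janet = likes.has_edge("Jeanette", "Janet")

# Undirected graph: an edge holds in both directions.
friends = nx.Graph()
friends.add_edge("Bob", "Alice")

bob_to_alice = friends.has_edge("Bob", "Alice")
alice_to_bob = friends.has_edge("Alice", "Bob")

print(janet_to_jeanette, jeanette_to_janet, bob_to_alice, alice_to_bob)
```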

NetworkX nodes can be any hashable object, meaning an object whose hash value never changes during its lifetime. These can be text strings, images, XML objects, entire graphs, and customized node objects. The base package includes many functions to generate, read, and write graphs in multiple formats.
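
A short sketch of what "hashable" means in practice: strings, numbers, tuples, frozensets, and even other graph objects can serve as nodes, while a mutable list cannot. The node values below are arbitrary placeholders.

```python
import networkx as nx

G = nx.Graph()
G.add_node("web-page")             # a text string
G.add_node(42)                     # a number
G.add_node(("lat", 51.5))          # a tuple
G.add_node(frozenset({"a", "b"}))  # an immutable set

inner = nx.path_graph(3)
G.add_node(inner)                  # an entire graph used as a node

# A list is mutable and therefore unhashable, so it is rejected.
try:
    G.add_node(["a", "b"])
    rejected = False
except TypeError:
    rejected = True

print(G.number_of_nodes(), rejected)
```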

NetworkX has the capacity to operate on very large graphs with more than 10 million nodes and 100 million edges. The core package, which is free software under the BSD license, includes data structures for representing such things as simple graphs, directed graphs, and graphs with parallel edges and self-loops. NetworkX also has a large community of developers who maintain the core package and contribute to a third-party ecosystem.
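
The data structures mentioned above map onto separate NetworkX classes. The sketch below, using placeholder node names and a made-up "route" attribute, shows how a simple graph, a directed graph, and a multigraph treat repeated edges and self-loops.

```python
import networkx as nx

# Simple graph: adding the same edge twice keeps a single edge.
simple = nx.Graph()
simple.add_edge("A", "B")
simple.add_edge("A", "B")

# Directed graph: A->B and B->A are two distinct edges.
directed = nx.DiGraph([("A", "B"), ("B", "A")])

# Multigraph: parallel edges between the same pair are kept separately.
multi = nx.MultiGraph()
multi.add_edge("A", "B", route="road")
multi.add_edge("A", "B", route="rail")
multi.add_edge("A", "A")  # a self-loop

print(simple.number_of_edges())    # edges in the simple graph
print(directed.number_of_edges())  # edges in the directed graph
print(multi.number_of_edges("A", "B"), nx.number_of_selfloops(multi))
```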

Among the principal uses of NetworkX are:

  • Study of the structure and dynamics of social, biological, and infrastructure networks
  • Standardized programming environment for graphs
  • Rapid development of collaborative, multidisciplinary projects
  • Integration with algorithms and code written in C, C++, and FORTRAN
  • Working with large nonstandard data sets

NetworkX is considered relatively easy to install and use, particularly for Python developers.

Why Graph Analytics?

Graph analytics can be used to determine the strength and direction of relationships between objects in a graph. The demand for tools to analyze relationships has nearly limitless potential given the growing role of networks in our information ecosystem. The influence of social networks on everything from buying decisions to national elections has catalyzed interest in graph analysis. It’s particularly useful in discovering relationships that aren’t obvious because of the complexity of the network or the number of paths between nodes. 

Graph analytics has been useful to achieve the following:

  • Detect financial crimes such as money laundering
  • Identify fraudulent transactions and activities
  • Perform influencer analysis in social network communities
  • Do recommendation analysis from customer ratings or purchases
  • Identify weaknesses in power grids, water grids, and transportation networks
  • Optimize routes in the airlines, retail, and manufacturing industries
  • Identify people who encountered infected individuals during a given period, as in COVID-19 contact tracing, an application with life-and-death consequences
  • Understand how influence works so marketers can target the people who are most likely to create word-of-mouth awareness for their products
  • Deliver social marketing content based on relationships between users, even if the users don’t know each other, by mapping similar interests and shared connections
  • Help political campaigns and political scientists better understand the factors that contribute to information virality and the dissemination of fake news
  • Let search engines serve up results based on preferences derived from the behavior of people with similar information demands
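
Two of the tasks listed above, influencer analysis and finding weak points in a network, can be sketched on a toy graph. The node names and connections are invented, and degree centrality and articulation points are only two of several measures that could be used for these purposes.

```python
import networkx as nx

# A small made-up network: a tight triangle attached to a chain.
G = nx.Graph()
G.add_edges_from([
    ("Ana", "Ben"), ("Ana", "Cy"), ("Ben", "Cy"),  # triangle
    ("Ana", "Di"), ("Di", "Eli"), ("Eli", "Fay"),  # chain
])

# Influencer analysis: degree centrality is the fraction of other
# nodes that each node is directly connected to.
centrality = nx.degree_centrality(G)
most_connected = max(centrality, key=centrality.get)

# Weak points: removing an articulation point splits the network.
weak_points = set(nx.articulation_points(G))

print(most_connected, sorted(weak_points))
```

Ana touches three of the five other nodes, so she scores highest, and removing Ana, Di, or Eli would disconnect part of the network.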


Why NetworkX?

NetworkX provides a standardized way for data scientists and other users of graph mathematics to collaborate, build, design, analyze, and share graph network models. As free software that’s notable for its scalability and portability, NetworkX has been widely adopted by Python enthusiasts. It’s also the most popular graph framework used by data scientists, who contribute to a vibrant ecosystem of Python packages that extend NetworkX with features such as numerical linear algebra and drawing.

Facebook User Graph

Why NetworkX Matters to Data Scientists

Data Science Teams

Big data science projects like machine learning and deep learning often require collaboration between many team members. The availability of standardized tools and formats greatly simplifies information sharing. With its roots in Python, one of the most popular data science languages, NetworkX provides a graph analysis extension to Python libraries that requires minimal training for Python users and can be deployed across teams in different companies and continents.

Accelerating Graph Analytics with GPUs

GPUs provide a great way to accelerate data-intensive analytics—and graph analytics in particular—because of the massive degree of parallelism and the memory access bandwidth advantages. A GPU’s massively parallel architecture, consisting of thousands of small cores designed for handling multiple tasks simultaneously, makes it well suited for the computational task of “for every X do Y”, which can apply to sets of vertices or edges within a large graph.

The difference between a CPU and GPU.
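
To make the "for every X do Y" pattern concrete, here is a plain-Python sketch of PageRank, an algorithm this article names among those accelerated on GPUs. It runs sequentially on the CPU and is not how cuGraph or NetworkX implement it; the function name, the damping value, and the tiny link graph are assumptions for illustration. The point is that each pass over the vertices does the same independent work per vertex, which is the kind of loop a GPU can spread across many cores.

```python
def pagerank_sketch(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of targets."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # For every vertex: start from the random-jump share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}

        # For every vertex: push its current rank along its out-edges.
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A vertex with no out-edges spreads its rank evenly.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: D links to C but nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank_sketch(links)
top = max(ranks, key=ranks.get)
print(top)
```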

Accelerating NetworkX with RAPIDS cuGraph

NVIDIA RAPIDS cuGraph delivers an accelerated graph analytics library that integrates the RAPIDS ecosystem with NetworkX. The vision of RAPIDS cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks.

The compute power of the latest NVIDIA GPUs makes graph analytics faster. Moreover, the internal memory speed within a GPU allows cuGraph to rapidly switch the data structure to best suit the needs of the analytic rather than being restricted to a single data structure.

RAPIDS graph algorithms such as PageRank, along with NetworkX-like functions, make efficient use of the massive parallelism of GPUs to accelerate analysis of large graphs by over 1000X. Users can explore up to 200 million edges on a single NVIDIA A100 Tensor Core GPU and scale to billions of edges on NVIDIA DGX™ A100 clusters.

NVIDIA GPU-Accelerated, End-to-End Data Science

RAPIDS combines the ability to perform high-speed ETL, graph analytics, machine learning, and deep learning. It’s a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs, and it can reduce training times from days to minutes. RAPIDS relies on NVIDIA CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high memory bandwidth through user-friendly Python interfaces.

NVIDIA RAPIDS, end-to-end GPU-accelerated data science.

RAPIDS cuGraph integrates into the RAPIDS data science ecosystem so that data scientists can call graph algorithms directly on data stored in a GPU DataFrame. With the RAPIDS GPU DataFrame, data can be loaded onto GPUs using a pandas-like interface and then used for various connected machine learning and graph analytics algorithms without ever leaving the GPU. This level of interoperability is made possible through libraries like Apache Arrow, and it allows acceleration for end-to-end pipelines, from data prep to machine learning to deep learning. RAPIDS and Dask allow cuGraph to scale to multiple GPUs to support multi-billion-edge graphs.