2  Degree, Walks, and Paths

Open the live notebook in Google Colab here.

In our previous lecture we introduced the fundamentals of different representations of networks: mathematically through graphs and their matrix representations, and computationally through the Python package NetworkX. We’re now ready to start defining and measuring some more complex properties of networks.

The fundamental point of networks is that they are connected – we can “move along the edges” in order to understand the network structure. Our unifying theme in this set of lecture notes is how the idea of “taking steps in the network” unifies some important ideas: node degrees, walks between nodes, and paths between nodes.

Degree of a Node

Definition 2.1 The degree of a node in an undirected network is the number of edges connected to it: \[ k_i = \sum_{j=1}^n A_{ij} \,. \]

We can collect all the node degrees together into the degree sequence \(\mathbf{k} \in \mathbb{R}^n\) whose \(i\)th entry is \(k_i\), the degree of node \(i\).

A small complication in this definition is that, if there is a self-loop on node \(i\), then \(A_{ii} = 2\). So, when we say that the degree is the number of connected edges, we need to remember that self-loops count twice!

There is an important relationship between the degree sequence and the total number of edges \(m\) in the network:

Exercise

Give a formula for \(m\) in terms of the degree sequence \(\mathbf{k}\).

Exercise

The mean degree of a network is the average of the degree sequence: \(c = \frac{1}{n} \sum_{i = 1}^n k_i\). Give a formula for the mean degree in terms of \(m\) and \(n\) alone.

Computing with Degrees

Let’s grab some sample data. We’ll use the Les Miserables network, which is a network of coappearances of characters in the book Les Miserables by Victor Hugo. Nodes represent characters and edges represent characters who appear within the same chapter. This data set is supplied as a built-in example in NetworkX, and for this reason we’ll use it several times throughout these notes.

This hidden code cell imports several packages and defines an unweight function which we’ll use to convert the network from its native weighted format to an unweighted format.
Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

plot_kwargs = {"node_size" : 100, "edgecolors" : 'white', "node_color" : "steelblue", "width" : 0.5, "edge_color" : "darkgrey"}
# This is a coding exercise in the live notes.

def unweight(G):
    for source, target in G.edges():
        G[source][target]['weight'] = 1
    return G

Here’s how this network looks:

Code
fig, ax = plt.subplots(1, 1, figsize = (7, 4))
G_LesMis = unweight(nx.les_miserables_graph())
nx.draw(G_LesMis, **plot_kwargs)
Figure 2.1: A network of coappearances of characters in the book Les Miserables by Victor Hugo. Nodes represent characters and edges represent characters who appear within the same chapter.

Now let’s compute the degree of each node. There are multiple ways to achieve this task: we can work directly using the adjacency matrix, or we can use the built-in NetworkX function nx.degree().

One way to compute the degrees of the nodes in a graph is to use the adjacency matrix, as directly described by Definition 2.1. With convenient functions from NetworkX and NumPy, this is a two-liner:

For undirected graphs, we could equally do np.sum(A, axis = 0) because \(\mathbf{A}\) is a symmetric matrix. When we discuss directed graphs soon, it will become necessary to be careful!
# computing degrees directly from the adjacency matrix

A = nx.adjacency_matrix(G_LesMis)   # extract the adjacency matrix
degree_vector = np.sum(A, axis = 1) # sum the adjacency matrix over rows
print(f"The degree vector has shape {degree_vector.shape}")
The degree vector has shape (77,)

Networkx also has a helpful built-in function to calculate the complete set of degrees:

degree_view = G_LesMis.degree()

The result is a DegreeView object which behaves much like a Python dictionary (and which can be easily converted to a dictionary using the dict constructor).

Let’s take a moment to compare the first few nodes to make sure that our two methods agree:

i = 0
for name, degree in dict(degree_view).items():
    if i < 5:
        print(f"{name} has degree: {degree} (networkx) and {degree_vector[i]} (adjacency matrix)")
        i += 1
Napoleon has degree: 1 (networkx) and 1 (adjacency matrix)
Myriel has degree: 10 (networkx) and 10 (adjacency matrix)
MlleBaptistine has degree: 3 (networkx) and 3 (adjacency matrix)
MmeMagloire has degree: 3 (networkx) and 3 (adjacency matrix)
CountessDeLo has degree: 1 (networkx) and 1 (adjacency matrix)

Degree in directed graphs

We have to be a little more subtle in how we define degree in a directed network because there is a distinction between in-edges and out-edges in these networks.

Definition 2.2 In a directed network, the in-degree is the number of ingoing edges to a node and the out-degree is the number of outgoing edges. These are defined by the formulas \[ k_i^{\text{in}} = \sum_{j=1}^n A_{ij} \] and the out-degree is \[ k_j^{\text{out}} = \sum_{i=1}^n A_{ij} \]

Just as before, we can also define the in-degree sequence \(\mathbf{k}^{\mathrm{in}}\) and out-degree sequence \(\mathbf{k}^{\mathrm{out}}\).

We will repeat the exercises above for directed networks.

Exercise
  1. Express the total number of edges \(m\) in a directed graph in terms of the degree sequences \(\mathbf{k}^{\text{in}}\) and \(\mathbf{k}^{\text{out}}\).
  2. Determine the mean in-degree and mean out-degree of a directed graph in terms of \(m\) and \(n\).

Let’s see these definitions in action on a directed graph. This time, we’ll use the mention graph from the musical Hamilton. Each node in this graph is a character. There is a directed edge from character \(i\) to character \(j\) if character \(i\) mentions character \(j\) in a song.

Code
import pandas as pd

# import data for our directed network
df = pd.read_csv("https://philchodrow.github.io/PIC16A/homework/HW3-hamilton-data.csv", 
names = ["mentioner", "mentioned"])

df = df[df["mentioner"] != df["mentioned"]]

G_Hamilton = nx.from_pandas_edgelist(df, 
                            source = "mentioner", 
                            target = "mentioned", 
                            edge_attr=None, 
                            create_using=nx.DiGraph())

layout = nx.arf_layout(G_Hamilton)

fig, ax = plt.subplots(1, 1, figsize = (10, 7))
nx.draw(G_Hamilton, pos = layout,  with_labels = True, edge_color = "darkgrey", width = 0.5, node_color = "lavender", edgecolors = "white")

# This is a coding exercise in the live notes.

# Create and print a numpy array of node in-degrees and out-degrees directly from the adjacency matrix
Figure 2.2: A network of mentions in the musical Hamilton. Nodes represent characters; there is an edge from character \(i\) to character \(j\) if \(i\) mentions \(j\). This data set was collected and publicly shared by The Hamilton Project.

Let’s calculate the in-degrees of the nodes in the Hamilton network using the adjacency matrix and compare to the networkx built-ins.

Code
A = nx.adjacency_matrix(G_Hamilton)       # grab the adjacency matrix
in_degree_vector = np.sum(A, axis = 0)    # manual in-degrees from A
in_degree_view   = G_Hamilton.in_degree() # networkx builtin

i = 0
for name, degree in dict(in_degree_view).items():
    if i < 5:
        print(f"{name} has degree: {degree} (networkx) and {in_degree_vector[i]} (adjacency matrix)")
        i += 1
burr has degree: 13 (networkx) and 13 (adjacency matrix)
hamilton has degree: 14 (networkx) and 14 (adjacency matrix)
weeks has degree: 1 (networkx) and 1 (adjacency matrix)
madison has degree: 4 (networkx) and 4 (adjacency matrix)
jay has degree: 1 (networkx) and 1 (adjacency matrix)
Exercise

Repeat this calculation and check that you can also compute out-degrees, with a matching result.

Regular Graphs

The degree sequence can contain a lot of information about the graph structure. One important class of graphs has constant degree:

Definition 2.3 A network in which all nodes have the same degree is called a regular graph or regular network. A regular graph where all nodes have degree \(k\) is called \(k\)-regular.

Some special cases of regular graphs are lattices (e.g., a square lattice is 4-regular) and the complete graph where every node is connected to every other node (which is \((n-1)\)-regular).

Code
fig, ax = plt.subplots(1, 1, figsize = (2, 2))
G = nx.complete_graph(5)
nx.draw(G, **plot_kwargs)

A complete graph on 5 nodes. Each node has degree exactly equal to \(5-1 = 4\).

Regular graphs are of great interst in theoretical mathematics and computer science. We don’t usually spend a lot of time studying them in network science because most interesting systems and data sets are not regular.

Density and Sparsity

Definition 2.4 The density or connectance \(\rho\) of a simple network is the fraction of possible edges that are actually present. That is,

\[ \rho = \frac{\text{number of edges}}{\text{possible edges}} = \frac{m}{\binom{n}{2}} \,. \]

One way to interpret density is to think of it as a probability that a pair of nodes picked uniformly at random is connected by an edge.

We can rewrite density in terms of expected degree using our earlier exercises :

This simplification comes from the binomial coefficient formula \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\). Also, we can notice the cool fact that \(n \choose 2\) is equivalent to the sum of the first \(n-1\) integers!

\[ \begin{align} \rho &= \frac{m}{\binom{n}{2}} \\ &= \frac{m}{\frac{1}{2}n(n-1)} \\ &= \frac{2m}{n(n-1)} \\ &= \frac{c}{n-1} \,. \end{align} \]

If a network is sufficiently large, you can approximate the density as \(\rho \approx \frac{c}{n}.\)

Let’s compute density in network below using three different strategies:

  • Calculating directly using number of edges and number of nodes;
  • Calculating directly using mean degree and number of nodes;
  • Using the built-in NetworkX function nx.density().
Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

# Create an unweighted version of the Les Mis network
G = G_LesMis

# This is a coding exercise in the live notes.

# Calculate density directly using number of edges and number of nodes

n = G.number_of_nodes()
m = G.number_of_edges()
print(f"Density using nodes and edges: {2*m/(n*(n-1)):.4f}")

# Calculate density directly using mean degree and number of nodes
# Hint: you may want to calculate degree from the adjacency matrix so that you can calculate mean using numpy

A = nx.adjacency_matrix(G)
degree = np.sum(A, axis = 1)
c = np.mean(degree)
print(f"Calculating using mean degree {c/(n-1):.4f}")
#----

# Use the built-in NetworkX function nx.density()

density = nx.density(G)
print(f"Density with NetworkX built-in: {density:.4f}")
Density using nodes and edges: 0.0868
Calculating using mean degree 0.0868
Density with NetworkX built-in: 0.0868

Informally, we call a network sparse when it has low density. This is pretty tricky to formally define: how low is “low enough?” There isn’t a universally agreed upon threshold for density below which a real-world network would be considered sparse. However, we can create a definition which applies for certain theoretical models of networks. Many models are defined for graphs on \(n\) nodes. If we have a model where we can take a formal limit as \(n\) grows large, then such a network is sparse if \(\rho \to 0\) as \(n \to \infty\). In this scenario, the mean degree grows (much) more slowly than the number of nodes. We’ll study limiting arguments about sparse graphs much more when we discuss random graphs.

Exercise

Let \(\{G_n\}\) be a sequence of networks on \(n\) nodes which are \(k\)-regular for some fixed constant \(k\). Show that this sequence is sparse in the sense defined above.

There is a small cheat in this problem statement, since if \(k\) and \(n\) are both odd then it is not possible to form a valid network (since \(kn = 2m\), \(kn\) must always ben even). This kind of detail does not usually pose difficulties in the study of sparse network.

Walks and Paths

We may like to know if it it is possible to reach one node from another by traversing edges. For this task, we introduce the notion of a walk.

Definition 2.5 A walk of length \(k \geq 2\) is a set of edges \(\{ (i_1,j_1), (i_2, j_2), \dots, (i_k, j_k)\}\) with the property that \(i_l = j_{l-1}\) for each \(2 \leq l \leq k\). We say this is a walk from node \(i_1\) to node \(j_k.\)

The length of a walk is the number of edges it contains.

A single edge \((i,j)\) is always considered a walk of length 1 from \(i\) to \(j\).

Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

fig, ax = plt.subplots(1, 1, figsize = (3, 3))

graph_dict = {1: [2], 2: [1, 3], 3:[2], 4:[5,6], 7: [4]}
G = nx.Graph(graph_dict) 
nx.draw(G, with_labels = True, font_color = 'white', font_weight = 'bold', edgecolors = "white", ax = ax, node_size = 200, font_size = 10, edge_color = "darkgrey")
Figure 2.3: Not all pairs of nodes in this network have a walk between them.

Connected Graphs

It is not always the case that there exists a walk between two specified nodes, as shown in Figure 2.3. In many networks, it’s very important that there exist a walk between any two nodes (think, for example, of a power grid!), and we have a special definition to reflect this:

Definition 2.6 A connected graph is a graph with the property that, between every pair of nodes \(i\) and \(j\), there exists a walk from \(i\) to \(j\).

There are several ways to determine computationally whether a graph is connected.

Exercise

Outline an algorithm for determining whether a simple undirected graph is connected by progressively querying node neighbors. Assume that you know ahead of time how many total nodes exist in the network.

Counting Walks

A question that pops up a lot in network analysis is “How many walks of length \(r\) exist between nodes \(i\) and \(j\)?” The adjacency matrix gives a concise way to address this question. First, let’s consider \(r=1\). That’s just the number of edges from node \(j\) to \(i\), which is exactly \(A_{ij}\). Said in a slightly more cumbersome way,

The \(ij\)th entry of \({\bf A}^1\) counts the number of walks of length 1 from node \(j\) to node \(i\).

This observation generalizes by induction.

Theorem 2.1 (Counting Walks with Powers of \(\mathbf{A}\)) The \(ij\)th entry of the matrix \({\bf A}^r\) contains the number of walks of length \(r\) from \(j\) to \(i\).

There are many applications of this idea:

Exercise

What is \(\frac{1}{2}\mathrm{trace} \; \mathbf{A}^2\), in terms of quantities we’ve seen before?

Exercise

Give a mathematical criterion for a graph to be connected by considering the matrix

\[ \mathbf{M} = \sum_{i = 1}^{n-1} \mathbf{A}^i \,. \]

Paths

Paths are special walks which do not repeat any edges.

Definition 2.7 A path is a walk that is not self-intersecting. That is, any edge \((i,j)\) shows up in a path at most once.

A geodesic path or shortest path is from \(i\) to \(j\) is a path fom \(i\) to \(j\) of minimum length; i.e. a path such that no other path has shorter length.

The length of a geodesic path is called the (geodesic) distance between \(i\) and \(j\). If two nodes are not path-connected, their geodesic distance is undefined.

Here is an example of a geodesic path between two nodes in Zachary’s Karate Club network. The geodesic distance between these two nodes is 5.

Code
# code example adapted from: https://stackoverflow.com/questions/24024411/highlighting-the-shortest-path-in-a-networkx-graph

np.random.seed(123)

G = nx.karate_club_graph()
pos = nx.spring_layout(G)
nx.draw(G,pos, **plot_kwargs)

path = nx.shortest_path(G, source = 14, target = 16)
pos_path = {i: pos[i] for i in path}
path_edges = list(zip(path,path[1:]))
nx.draw_networkx_nodes(G,pos_path,nodelist=path,node_color='steelblue', node_size = 100, edgecolors = "black")
nx.draw_networkx_edges(G,pos_path,edgelist=path_edges,edge_color='k',width=1)

A geodesic path between two nodes in the Karate Club network.

Shortest paths are not necessarily unique; there may be more than one path of the same length between two nodes. There exist several algorithms for computing the shortest path between a pair of nodes or between every pair of nodes in a graph.

  • Dijkstra’s algorithm (Dijkstra 1956) is frequently used to compute shortest paths with a single fixed node as the point of origin.
  • The Floyd-Warshall algorithm (Rosen 2011) is a common choice for computing the shortest path between all pairs of nodes in a graph.

Explain briefly why shortest paths are self-avoiding: the same node cannot appear twice in a shortest path.

Cyclic and Acyclic graphs

Many algorithms and measures of graph structure reflect the existence of cycles within a graph.

Definition 2.8 (Cycles) A cycle is a path from a node \(i\) to itself.

A network with no cycles is acyclic.

By definition, self-edges are cycles of length 1. Multigraphs without self-edges can have cycles of length 2. Cycles in simple graphs are always of length 3 or more.
Exercise

How could we use the adjacency matrix \(\mathbf{A}\) to compute the number of cycles of length \(r\) in a network starting and ending at node \(i\)?

In some cases, it is useful to check a network to determine whether it contains any cycles. While the adjacency matrix can be used to count cycles of a specific length, it could be quite inefficient to use to detect whether we have a cycle of any length (because we may have to check the diagonal entries of \({\bf A}^r\) for all possible cycle lengths \(r\)). We can construct a simple algorithm to determine computationally whether a network is cyclic or acyclic. For the purposes of this algorithm, we’ll consider directed graphs; the algorithm can be easily adapted to undirected graphs. Here it is:

While \(n > 0\):

  • Find a node \(i\) with no out-edges (i.e. \(k_i^{\mathrm{out}} = 0\)).
    • If no such node exists, then the network is cyclic.
  • Remove node \(i\) and all its edges.

If all nodes can be removed this way, then the network is acyclic.

Exercise

Give a simple argument for why the algorithm above works. In particular:

  1. Why is this algorithm guaranteed to terminate?
  2. Why is this algorithm guaranteed to give the correct answer upon termination?

Now that we’re convinced that the algorithm works, let’s go ahead and implement it in Python. Our implementation accepts a NetworkX DiGraph object as an argument, returning True if the network is cyclic and False if the network is acyclic.

def is_cyclic(G):
    while G.number_of_nodes() > 0:
        zero_out_degree = [node for node in G.nodes if G.out_degree(node) == 0]
        if len(zero_out_degree) == 0:
            return True
        G.remove_nodes_from(zero_out_degree)
    else: 
        return False

Now let’s test our algorithm on a few examples. First, we’ll construct a random toy directed graph that we know does not possess any cycles by its construction. Indeed, our implemented cycle detector indicates that no cycles are present.

Code
print('Binomial Tree:', end = " ")

fig, ax = plt.subplots(1, 1, figsize = (3, 3))
G_Tree = nx.binomial_tree(4, create_using = nx.DiGraph)
nx.draw(G_Tree,  **plot_kwargs)
if is_cyclic(G_Tree): 
    print("cyclic.")
else: 
    print("acyclic.")
Binomial Tree: acyclic.
Figure 2.4: The binomial tree, a directed network containing no cycles.

On the other hand, the Hamilton mentions network contains at least one cycle:

Code
print('Hamilton Network:', end = " ")
layout = nx.arf_layout(G_Hamilton)
fig, ax = plt.subplots(1, 1, figsize = (3, 3))
nx.draw(G_Hamilton, layout, **plot_kwargs)
is_cyclic(G_Hamilton)
if is_cyclic(G_Hamilton): 
    print("cyclic.")
else: 
    print("acyclic.")
Hamilton Network: cyclic.
Figure 2.5: The Hamilton mention network, a directed network which contains many cycles.

This algorithm has a nice mathematical consequence. If we label the nodes in an acyclic network according to the order we removed them, we will end up with an adjacency matrix that is strictly upper triangular. This is because each node that is removed could only have out-edges that were already removed previously, i.e., nonzero entries of the \(i\)th column could only occur between columns 1 and \(i-1\). There exists at least one such labeling for any acyclic network.

Trees

Definition 2.9 A tree is a connected, acyclic network. By “connected”, we mean every node is reachable from every other node by traversing a sequence of edges, i.e., there exists a walk between any two nodes.

Trees can be both directed and undirected. When we refer to trees with no further descriptors, we usually mean undirected trees.

All trees are necessarily simple graphs, because self- and multiedges would create cycles. Trees always have exactly \(n-1\) edges, as can be proven via induction. Furthermore, any connected graph with \(n-1\) edges is a tree.

Trees are often drawn as rooted trees with a root node at the top and leaf nodes below. Any node can be chosen as the root of a tree.

Code
edges = [(0,1), (0,2), (1,3), (1,4), (2,5), (2,6), (2, 7), (7, 8)]

pos = {
    0 : (0, 0),
    1 : (-1, -1),
    2 : (1, -1),
    3 : (-1.5, -2),
    4 : (-0.5, -2),
    5 : (0.5, -2),
    6 : (1.0, -2), 
    7 : (1.5, -2), 
    8:  (1.5, -3)
}

fig, ax = plt.subplots(1, 1, figsize = (3, 3))
G_Tree = nx.Graph(edges)
nx.draw(G_Tree, pos = pos, **plot_kwargs)

A simple rooted tree with 9 nodes.

Trees play important roles, especially in math and computer science. Trees have many useful properties that we can exploit for network analysis. One of the most important ones is the uniqueness of paths: there is exactly one path between any pair of nodes (as long as we don’t allow “backtracking”). Many calculations on networks with this property are simple(r).

References

Dijkstra, Edsger W. 1956. “A Note on Two Problems in Connexion with Graphs.” In Edsger Wybe Dijkstra: His Life, Work, and Legacy, 287–90.
Rosen, Kenneth H. 2011. Discrete Mathematics and Its Applications (7th Edition). William C Brown Publishing.



© Heather Zinn Brooks and Phil Chodrow, 2025