2  Degree, Walks, and Paths

Open the live notebook in Google Colab here.

In our previous lecture we introduced the fundamentals of different representations of networks: mathematically through graphs and their matrix representations, and computationally through the Python package NetworkX. We’re now ready to start measuring some properties of different networks.

In this lecture we’ll focus on two different ways to measure features of connectivity in a network. The first is degree, which is a property of a node that gives us information about how many edges are connected to a node. We will also explore the notion of a walk, which tells us something about the ways to traverse between nodes.

Degree

Definition 2.1 The degree of a node in an undirected network is the number of edges connected to it. The degree of node \(i\) is, equivalently, \[ k_i = \sum_{j=1}^n A_{ij} \,. \]

Unsurprisingly, degree is directly related to the number of edges in an undirected network.

Exercise

Use degree to calculate the total number of edges in an undirected network.

Exercise

Calculate the mean degree of a node in an undirected network.

In the code below we use NetworkX to find the degrees of all the nodes in an undirected network: by definition using the adjacency matrix, and with the built-in function nx.degree().

Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

# This is a coding exercise in the live notes.

# Create an unweighted version of the Les Mis network
def unweight(G):
    for source, target in G.edges():
        G[source][target]['weight'] = 1
    return G

G_LesMis = unweight(nx.les_miserables_graph())

nx.draw(G_LesMis)

# Create and print a numpy array of node degrees directly from the adjacency matrix

A = nx.adjacency_matrix(G_LesMis)
print('Degrees computed from adjacency matrix:', np.sum(A, axis = 1))

# Print the degrees using the built in function in NetworkX. 

degree_vec = G_LesMis.degree()
print('Built-in NetworkX:', degree_vec)
#----
Degrees computed from adjacency matrix: [ 1 10  3  3  1  1  1  1  1  1 36  1  2  1  1  1  7  9  7  7  7  7  7 15
 11 16 11 17  4  8  2  4  1  2  6  6  6  6  6  3  1 11  3  3  2  1  2  1
 22  7  2  7  2  1  4 19  2 11 15 11  9 11 13 12 13 12 10  1 10 10 10  9
  3  2  2  7  7]
Built-in NetworkX: [('Napoleon', 1), ('Myriel', 10), ('MlleBaptistine', 3), ('MmeMagloire', 3), ('CountessDeLo', 1), ('Geborand', 1), ('Champtercier', 1), ('Cravatte', 1), ('Count', 1), ('OldMan', 1), ('Valjean', 36), ('Labarre', 1), ('Marguerite', 2), ('MmeDeR', 1), ('Isabeau', 1), ('Gervais', 1), ('Listolier', 7), ('Tholomyes', 9), ('Fameuil', 7), ('Blacheville', 7), ('Favourite', 7), ('Dahlia', 7), ('Zephine', 7), ('Fantine', 15), ('MmeThenardier', 11), ('Thenardier', 16), ('Cosette', 11), ('Javert', 17), ('Fauchelevent', 4), ('Bamatabois', 8), ('Perpetue', 2), ('Simplice', 4), ('Scaufflaire', 1), ('Woman1', 2), ('Judge', 6), ('Champmathieu', 6), ('Brevet', 6), ('Chenildieu', 6), ('Cochepaille', 6), ('Pontmercy', 3), ('Boulatruelle', 1), ('Eponine', 11), ('Anzelma', 3), ('Woman2', 3), ('MotherInnocent', 2), ('Gribier', 1), ('MmeBurgon', 2), ('Jondrette', 1), ('Gavroche', 22), ('Gillenormand', 7), ('Magnon', 2), ('MlleGillenormand', 7), ('MmePontmercy', 2), ('MlleVaubois', 1), ('LtGillenormand', 4), ('Marius', 19), ('BaronessT', 2), ('Mabeuf', 11), ('Enjolras', 15), ('Combeferre', 11), ('Prouvaire', 9), ('Feuilly', 11), ('Courfeyrac', 13), ('Bahorel', 12), ('Bossuet', 13), ('Joly', 12), ('Grantaire', 10), ('MotherPlutarch', 1), ('Gueulemer', 10), ('Babet', 10), ('Claquesous', 10), ('Montparnasse', 9), ('Toussaint', 3), ('Child1', 2), ('Child2', 2), ('Brujon', 7), ('MmeHucheloup', 7)]
Figure 2.1: A network of coappearances of characters in the book Les Miserables by Victor Hugo. Nodes represent characters and edges represent characters who appear within the same chapter.

We have to be a little more subtle in how we define degree in a directed network because there is a distinction between in-edges and out-edges in these networks.

Definition 2.2 In a directed network, the in-degree is the number of ingoing edges to a node and the out-degree is the number of outgoing edges. That is, the in-degree is defined to be \[ k_i^{\text{in}} = \sum_{j=1}^n A_{ij} \] and the out-degree is \[ k_j^{\text{out}} = \sum_{i=1}^n A_{ij} \]

We will repeat the exercises above for directed networks.

Exercise

Use degree to calculate the total number of edges in a directed network, and use this to calculate the mean (expected) in-degree and out-degree of a directed network.

Code
import networkx as nx
import numpy as np
import pandas as pd

# import data for our directed network
df = pd.read_csv("https://philchodrow.github.io/PIC16A/homework/HW3-hamilton-data.csv", 
names = ["mentioner", "mentioned"])

df = df[df["mentioner"] != df["mentioned"]]

G_Hamilton = nx.from_pandas_edgelist(df, 
                            source = "mentioner", 
                            target = "mentioned", 
                            edge_attr=None, 
                            create_using=nx.DiGraph())

nx.draw(G_Hamilton)

# This is a coding exercise in the live notes.

# Create and print a numpy array of node in-degrees and out-degrees directly from the adjacency matrix

A = nx.adjacency_matrix(G_Hamilton)
print('In-Degrees computed from adjacency matrix:', np.sum(A, axis = 0))
print('Out-Degrees computed from adjacency matrix:', np.sum(A, axis = 1))

# Print the in- and out-degrees using the built in function in NetworkX. 

out_degree_vec = G_Hamilton.out_degree()
in_degree_vec = G_Hamilton.in_degree()
print('In-degrees built-in NetworkX:', in_degree_vec)
print('Out-degrees built-in NetworkX:', out_degree_vec)
#----
In-Degrees computed from adjacency matrix: [13 14  1  4  1  1  1  1 10  1  6  1  2  2  7  6  4  1  9  9  1  1  2  4
  1  2  3  1  4  2  1  1  0  0  1  0  0  1  1  1  1  1  1  1  0  0]
Out-Degrees computed from adjacency matrix: [21 20  0  5  0  0  0  0  9  0  0  0  0  1  7  6  0  0  9  0  0  0  3  2
  5  2  1  0  2  0  0  0  6 13  0  5  6  0  0  0  0  0  0  0  1  1]
In-degrees built-in NetworkX: [('burr', 13), ('hamilton', 14), ('weeks', 1), ('madison', 4), ('jay', 1), ('theodosiaDaughter', 1), ('betsy', 1), ('theodosiaMother', 1), ('washington', 10), ('marthaWashington', 1), ('schuylerSis', 6), ('generalMontgomery', 1), ('philipS', 2), ('peggy', 2), ('angelica', 7), ('eliza', 6), ('reynolds', 4), ('generalMercer', 1), ('jefferson', 9), ('jAdams', 9), ('ness', 1), ('pendleton', 1), ('philipH', 2), ('lafayette', 4), ('laurens', 1), ('mulligan', 2), ('lee', 3), ('conway', 1), ('kingGeorge', 4), ('eacker', 2), ('kingLouis', 1), ('maria', 1), ('ensemble', 0), ('company', 0), ('admiralHowe', 1), ('men', 0), ('women', 0), ('franklin', 1), ('paine', 1), ('rochambeau', 1), ('green', 1), ('knox', 1), ('sAdams', 1), ('sally', 1), ('seabury', 0), ('doctor', 0)]
Out-degrees built-in NetworkX: [('burr', 21), ('hamilton', 20), ('weeks', 0), ('madison', 5), ('jay', 0), ('theodosiaDaughter', 0), ('betsy', 0), ('theodosiaMother', 0), ('washington', 9), ('marthaWashington', 0), ('schuylerSis', 0), ('generalMontgomery', 0), ('philipS', 0), ('peggy', 1), ('angelica', 7), ('eliza', 6), ('reynolds', 0), ('generalMercer', 0), ('jefferson', 9), ('jAdams', 0), ('ness', 0), ('pendleton', 0), ('philipH', 3), ('lafayette', 2), ('laurens', 5), ('mulligan', 2), ('lee', 1), ('conway', 0), ('kingGeorge', 2), ('eacker', 0), ('kingLouis', 0), ('maria', 0), ('ensemble', 6), ('company', 13), ('admiralHowe', 0), ('men', 5), ('women', 6), ('franklin', 0), ('paine', 0), ('rochambeau', 0), ('green', 0), ('knox', 0), ('sAdams', 0), ('sally', 0), ('seabury', 1), ('doctor', 1)]
Figure 2.2: A network of mentions in the musical Hamilton. Nodes represent characters; there is an edge from character \(i\) to character \(j\) if \(i\) mentions \(j\).

Definition 2.3 A network in which all nodes have the same degree is called a regular graph or regular network. A regular graph where all nodes have degree \(k\) is called \(k\)-regular.

Some special cases of regular graphs are lattices (e.g., a square lattice is 4-regular) and the complete graph where every node is connected to every other node (which is \((n-1)\)-regular).

Density and sparsity

Definition 2.4 The density or connectance \(\rho\) of a simple network is the fraction of possible edges that are actually present. That is,

\[ \rho = \frac{\text{number of edges}}{\text{possible edges}} = \frac{m}{\binom{n}{2}} \,. \]

One way to interpret density is to think of it as a probability that a pair of nodes picked uniformly at random is connected by an edge.

We can rewrite density in terms of expected degree using our earlier exercises :

This simplification comes from the binomial coefficient formula \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\). Also, we can notice the cool fact that \(n \choose 2\) is equivalent to the sum of the first \(n-1\) integers!

\[\begin{align} \rho &= \frac{m}{\binom{n}{2}} \\ &= \frac{m}{\frac{1}{2}n(n-1)} \\ &= \frac{2m}{n(n-1)} \\ &= \frac{c}{n-1} \,. \end{align}\]

If a network is sufficiently large, you can approximate the density as \(\rho \approx \frac{c}{n}.\)

Let’s compute density in network below using three different strategies:

  • Calculating directly using number of edges and number of nodes;
  • Calculating directly using mean degree and number of nodes;
  • Using the built-in NetworkX function nx.density().
Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

# Create an unweighted version of the Les Mis network
G = G_LesMis

# This is a coding exercise in the live notes.

# Calculate density directly using number of edges and number of nodes

n = G.number_of_nodes()
m = G.number_of_edges()
print('Density using nodes and edges: ', 2*m/(n*(n-1)))

# Calculate density directly using mean degree and number of nodes
# Hint: you may want to calculate degree from the adjacency matrix so that you can calculate mean using numpy

A = nx.adjacency_matrix(G)
degree = np.sum(A, axis = 1)
c = np.mean(degree)
print('Calculating using mean degree', c/(n-1))
#----

# Use the built-in NetworkX function nx.density()

density = nx.density(G)
print('Density with NetworkX built-in:', density)
Density using nodes and edges:  0.08680792891319207
Calculating using mean degree 0.08680792891319207
Density with NetworkX built-in: 0.08680792891319207

While it’s pretty straightforward to calculate the density of a network, it’s more complicated to determine whether a network is dense or sparse. There isn’t a universally agreed upon threshold for density below which a real-world network would be considered sparse. However, we can create a definition which applies for certain theoretical models of networks. If we have a model where we can take a formal limit, then such a network is sparse if \(\rho \to 0\) as \(n \to \infty\). In this scenario, the mean degree grows (much) more slowly than the number of nodes.

Walks and paths

We may like to know if it it is possible to reach one node from another by traversing edges. For this task, we introduce the notion of a walk.

Definition 2.5 A walk of length \(k \geq 2\) is a set of edges \(\{ (i_1,j_1), (i_2, j_2), \dots, (i_k, j_k)\}\) with the property that \(i_l = j_{l-1}\) for each \(2 \leq l \leq k\). We say this is a walk from node \(i_1\) to node \(j_k.\)

The length of a walk is the number of edges it contains.

A single edge \((i,j)\) is always considered a walk of length 1 from \(i\) to \(j\).

A walk between two nodes is not always well-defined. Consider the example below, where there is no walk from node 7 to node 3.

Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

graph_dict = {1: [2], 2: [1, 3], 3:[2], 4:[5,6], 7: [4]}
G = nx.Graph(graph_dict) 
nx.draw(G, with_labels = True, font_color = 'white', font_weight = 'bold')
Figure 2.3: Not all pairs of nodes in this network have a walk between them.

A question that pops up a lot in network analysis is “How many walks of length \(r\) exist between nodes \(i\) and \(j\)?”

The adjacency matrix gives a concise way to address this question. First, let’s consider \(r=1\). That’s just the number of edges from node \(j\) to \(i\), which is exactly \(A_{ij}\). Said another way,

The \(ij\)th entry of \({\bf A}^1\) counts the number of walks of length 1 from node \(j\) to node \(i\).

This observation generalizes by induction.

Theorem 2.1 (Counting Walks) The \(ij\)th entry of the matrix \({\bf A}^r\) contains the number of walks of length \(r\) from \(j\) to \(i\).

Definition 2.6 A path is a walk that is not self-intersecting. That is, any edge \((i,j)\) shows up in a path at most once.

A geodesic path or shortest path is from \(i\) to \(j\) is a walk fom \(i\) to \(j\) of minimum length; i.e. a walk such that no other walk has shorter length.

The length of a geodesic path is called the (geodesic) distance between \(i\) and \(j\) If two nodes are not path-connected, their geodesic distance is undefined.

Remarks

  • Shortest paths are not necessarily unique.
  • Shortest paths are self-avoiding. This is because if a shortest path intersected itself, this would create a loop which could be removed to create a shorter path.

Cyclic and acyclic graphs

Definition 2.7 A cycle is a path from a node \(i\) to itself.

A network with no cycles is acyclic.

By our definition, self-edges are cycles of length 1.

Exercise

What is the number of cycles of length \(r\) in a network starting and ending at node \(i\)?

While the formula above is very useful to look for cycles of a specific length, it could be quite inefficient to use to detect whether we have a cycle of any length (because we may have to check the diagonal entries of \({\bf A}^r\) for all possible cycle lengths \(r\)). We can construct a simple algorithm to determine computationally whether a network is cyclic or acyclic.

Use the sketch below to write this algorithm.

  • Find a node with no out-edges.
  • If no such node exists, the network is cyclic. Otherwise, remove the node and repeat the process.
  • If all nodes can be removed using the strategy above, the network is acyclic.
Code
from matplotlib import pyplot as plt
import networkx as nx
import numpy as np
plt.style.use('seaborn-v0_8-whitegrid')

# Implement the algorithm above to create a function is_cyclic(G). Your function should take a NetworkX graph G as an argument and print out whether a graph G is cyclic or acyclic.

def is_cyclic(G):
    while G.number_of_nodes() > 0:
        zero_out_degree = [node for node in G.nodes if G.out_degree(node) == 0]
        if len(zero_out_degree) == 0:
            print('Network is cyclic')
            return
        G.remove_nodes_from(zero_out_degree)
    else: 
        print('Network is acyclic')
        return

print('Binomial Tree:')
G_Tree = nx.binomial_tree(4, create_using = nx.DiGraph)
is_cyclic(G_Tree)

print('Hamilton Network:')
is_cyclic(G_Hamilton)
Binomial Tree:
Network is acyclic
Hamilton Network:
Network is cyclic
Figure 2.4

This algorithm has a nice mathematical consequence. If we label the nodes in an acyclic network according to the order we removed them, we will end up with an adjacency matrix that is strictly upper triangular. There exists at least one such labeling for any acyclic network.

This is because each node that is removed could only have out-edges that were already removed previously, i.e., nonzero entries of the \(i\)th column could only occur between columns 1 and \(i-1\).

Trees

Definition 2.8 A tree is a connected, acyclic, undirected network.

By “connected”, we mean every node is reachable from every other node by traversing a sequence of edges, i.e., there exists a walk between any two nodes

Trees are often drawn as rooted trees with a root node at the top and leaf nodes below.

A few remarks:

  • Topologically, the root of a tree is not unique
  • All trees are necessarily simple graphs, because self- and multiedges would create cycles.

Trees play important roles, especially in math and computer science. Trees have several useful properties that we can exploit for network analysis:

  • Because trees have no cycles, there is exactly one path between any pair of nodes (as long as we don’t allow ``backtracking’’’). Many calculations on networks with this property are simple(r).
  • A tree of \(n\) nodes has exactly \(n-1\) edges. Furthermore, any connected network with \(n-1\) edges and \(n\) nodes is a tree.

References



© Heather Zinn Brooks and Phil Chodrow, 2024