Configuration models are models of random graphs with a given degree sequence.
From a mathematical perspective, the idea of a configuration model is elegant and flexible. However, sampling from these models can be challenging and subtle. There are two primary strategies for thinking about configuration models.
Types of configuration models
Similar to \(G(n,p)\) models, we can understand a configuration model by understanding what is fixed (and what is random).
One way to set up a configuration model is to fix a degree sequence \(\{k_1, \dots, k_n\}\). Note that this also fixes the number of edges: \[
m = \frac{1}{2} \sum_j k_j \,.
\]
Definition 10.1 Consider a graph with \(n\) nodes. Let \({\bf k} \in \mathbb{N}^n\) be the degree sequence, where \(k_i\) is the degree of node \(i\).
A configuration model is a uniform distribution over graphs with degree sequence \({\bf k}.\)
The classic algorithmic strategy for sampling from such a configuration model is through stub-matching. Recall that each edge has two ends of edges, or stubs. This means that node \(i\) has \(k_i\) stubs.
Choose two stubs uniformly at random and connect these stubs to form an edge.
Choose another pair from the remaining unmatched stubs; repeat the process until all stubs are matched.
Show code
import networkx as nximport numpy as npfrom matplotlib import pyplot as pltfig, axarr = plt.subplots(1, 2)axarr[0].set(title ="Karate Club Graph")G_Karate = nx.karate_club_graph()nx.draw(G_Karate, ax = axarr[0], node_size =80)## Follow the steps below to create a stub-matching algorithm for a configuration model based on the Karate Club Graph.# Get degree sequence for Karate club graphdegree_sequence = [d for n, d in G_Karate.degree()]# Create new edgelist via stub matching algorithmstubs = np.repeat(np.arange(len(degree_sequence)), degree_sequence)mixed_stubs = np.random.permutation(stubs)edge_list = mixed_stubs.reshape(len(mixed_stubs)//2, 2)# Create a network G from your edgelistG = nx.MultiGraph()G.add_edges_from(edge_list)# Visualize our configuration model graphnx.draw(G, node_size =80)axarr[1].set(title ="Configuration Model Graph")
The stub-matching algorithm can produce multiedges and self-loops, which can cause the graph to not be simple. However, Bollobás (1980) proved that, when the graph is sparse, the expected number of multi-edges and self-loops does not grow with network size. One can, as a result, show these structures are rare, and can often be ignored in arguments. Another challenge with this algorithm is we require an even number of stubs to get a graph realization that exactly matches the degree sequence; although in practice if we are working from an empirical network this should not be a problem.
Fosdick et al. (2018) discuss the subtleties that arise due to whether we allow or disallow multiedges and self-loops and whether we choose to label stubs distinctly (as opposed to only labeling vertices). These choices result in different spaces of graphs from which we are sampling.
Fixing a degree distribution
When making mathematical arguments, you may want to sample from a particular degree distribution \(p_k\) instead of sampling graphs with a particular degree sequence \(k\).
We can do this using a slight modification to the strategy described above.
Draw a degree sequence \(\{k_i\}\) from the given distribution \(p_k.\)
In practice, this is most likely achieved by \(n\) independent draws from \(p_k\). A particular degree sequence then appears with probability \(\Pi_i p_{k_i}.\)
Construct a graph with this degree sequence by proceeding via stub matching as described above.
Once again, we can easily run into some challenges here. With the algorithm described above it is very possible to generate a degree sequence with an odd number of stubs, and such degree sequences would need to be discarded. The concerns about self-edges, multiedges, and labeling still apply.
While the two models we describe here are different, we expect them to behave similarly in the large-\(n\) limit, where a sequence drawn from a degree distribution more accurately captures the underlying distribution.
There are some important special cases that we’ve already encountered:
Using a Poisson degree distribution nearly recovers the \(G(n,p)\) model, excepting that we are able to generate self- and multiedges with this configuration model variant.
Using a power-law degree distribution helps us mathematically study the properties of scale-free networks, like those generated by preferential attachment.
Fixing a degree sequence in expectation (Chung–Lu model)
Some of the challenges with stub-matching arise from the requirement that we generate a graph with a specified degree sequence \({\bf k}.\)Chung and Lu (2002) relax this constraint and generate networks whose degree sequences are \({\bf k}\) in expectation, which avoids some of these issues.
Consider two nodes \(i\) and \(j\) with degress \(k_i\) and \(k_j\) respectively. From the perspective of node \(i\), the total number of stubs it could connect to is \(2m-k_i\) (if we exclude the possibility of self-loops). Thus, \[
\mathbb{P}(\text{node $i$ stub connects to node $j$}) = \frac{k_j}{2m-k_i}.
\]
Assuming that our network is large and sparse, we expect the argument above to hold independently for each stub of \(i\). Defining \(\mathbb{P}_{ij}\) as the probability the node \(i\) is connected to node \(j\), we have
We might notice that this feels a little similar to the \(G(n,p)\) model, and in fact, this is by construction. The degree of node \(i\) will be Poisson distributed with mean \(k_i\) under the assumptions given above. This means that, instead of stub-matching, we can use an algorithm like for a \(G(n,p)\) network by placing an edge between two nodes with probability \(\mathbb{P}_{ij}\).
Another assumption required here is that the maximum degree is not too large.
Excess Degree Distribution
Consider a configuration model with degree distribution \(p_k\) (so a fraction \(p_k\) of nodes have degree \(k\)). This means that \(p_k\) can be viewed as the probability that a node chosen uniformly at random from our network has degree \(k\).
Now, suppose we choose a node uniformly at random and follow an edges to this node’s neighbor (if it has one). What is the probability that the neighbor will have degree \(k\)?
Exercise
Explain why the probability that the neighbor has degree \(k\) is not also \(p_k.\)
First, let’s calculate the probability that we have a stub connected to a particular degree \(k\) node. A node with degree \(k\) has \(k\) stubs, and there are \(2m-1\) possible stubs.
Thus, because in expectation there are \(np_k\) nodes of degree \(k\), we have
\[
\mathbb{P}(\text{edge ends at any degree $k$ node}) = \frac{k}{2m-1}n p_k \approx \frac{k}{2m}np_k \,.
\]
Noting that our mean degree is \(c = \frac{2m}{n},\) we have that the probability that we reach a node of degree \(k\) is approximately \(\frac{1}{c}k p_k.\) In other words, this probability is proportional to \(kp_k\) (not \(p_k\) as we might expect). This quantity is called the edge-biased degree distribution.
We can calculate the additional number of edges attached to a neighbor other than the edge we arrived along using the process described above, which gives \[
q_k = \frac{(k+1)p_{k+1}}{c} \,.
\] This quantity is called the excess degree distribution.
The Friendship Paradox
Suppose we model a social network of friends with a configuration model. How many friends does your friend have?
To see this, let’s calculate the average degree of a neighbor of a randomly chosen node:
\[
\sum_k k \frac{1}{c}kp_k = \frac{1}{c}\sum_k k^2 p_k\,.
\] Let \(n\) be the total number of nodes and \(n_k\) be the number of nodes of degree \(k\). Then, \(p_k = \frac{1}{n}n_k.\)
Now we’ll simplify our expression by doing a clever re-indexing trick. As written, we’ve indexed over node degrees \(k\), but we could choose to index over node labels \(i\) instead. \[
\frac{1}{c}\sum_k k^2 \frac{1}{n}n_k = \frac{1}{cn}\sum_i k_i^2 = \frac{\langle k^2 \rangle}{c}\,,
\] where \(\langle k^2 \rangle = \frac{1}{n} \sum_i k_i^2\). To get the second equality, we notice that the set \(\{n_1, (2)^2n_2, (3)^2n_3, \dots, (k_{max})^2 n_{k_{max}}\}\) has the same sum as \(\{k_1^2, k_2^2, ..., k_n^2\}\).
Let’s compare this value to the expected degree of a randomly chosen node:
\[
\frac{\langle k^2 \rangle}{c} - c = \frac{1}{c}\left(\langle k^2 \rangle - c^2\right) = \frac{\sigma_k^2}{c}\,,
\] where \(\sigma_k^2\) is the variance of the degree distribution.
Unless we have a \(k\)-regular network, \(\sigma_k^2 >0.\) Thus,
That is, the expected degree of a node’s neighbor is greater than the expected node degree \(c\) … in other words, your friends have more friends than you do!
The intuition behind this idea is that a node with degree \(k\) appears as a neighbor to \(k\) other nodes, so high degree nodes are over-represented in the calculations.
References
Bollobás, Béla. 1980. “A Probabilistic Proof of an Asymptotic Formula for the Number of Labelled Regular Graphs.”European Journal of Combinatorics 1 (4): 311–16.
Chung, Fan, and Linyuan Lu. 2002. “Connected Components in Random Graphs with Given Expected Degree Sequences.”Annals of Combinatorics 6 (2): 125–45.
Fosdick, Bailey K, Daniel B Larremore, Joel Nishimura, and Johan Ugander. 2018. “Configuring Random Graph Models with Fixed Degree Sequences.”Siam Review 60 (2): 315–55.