7  Random Graphs: Erdős–Rényi

Open the live notebook in Google Colab here.

Why Random Graphs?

Recently, we measured several properties of real-world networks. Some of our measurements included the number of connected components, the proportion of possible triangles realized, and the mean length of paths between nodes. When interpreting these measurements, we face two important questions:

  1. Is a given observation of an empirical network interesting or surprising in some way?
  2. What properties of the graph would be sufficient to explain that observation?

For example, suppose that we measure the transitivity (proportion of realized triangles) in a graph and find a value of \(0.4\). Is that interesting or surprising? If so, how would we explain it? For example, would the degree sequence alone be sufficient to explain this observation, or would we need to consider more detailed graph properties in order to explain it?

The way that we usually approach these questions is with mathematical models of graphs. We determine whether measurements are surprising by comparing to mathematical models. We can also design models that hold certain features fixed (like the degree sequence) in order to see whether those features are sufficient on their own to explain the value of other measurements (like transitivity). Because real data is usually messy and noisy, it’s helpful for us to allow our mathematical models to be messy and noisy too! Our models are therefore random models of graphs, or just random graphs for short.

In addition to the role of random graphs in contextualizing and explaining network measurements, they are also very interesting mathematical objects in their own right! Indeed, there is an entire field of mathematics devoted to the theory of random graphs. Random graph models are also helpful in network data analysis, informing both the design and the analysis of algorithms.

The \(G(n,p)\) Model (Erdős–Rényi)

The model of Erdős and Rényi (1960) is the simplest and most fundamental model of a random graph. Our primary interest in the ER model is for mathematical insight and null modeling. Most of the ER model's major mathematical properties are well understood, and the model is almost never used directly in statistical inference.

The ER model is, however, a building block of models that are used in statistical inference. The stochastic blockmodel, for example, is often used for graph clustering and community detection. In its simplest form, it’s a bunch of ER graphs glued together.

Definition 7.1 (Erdős–Rényi Random Graph) An Erdős–Rényi random graph with \(n\) nodes and connection probability \(p\), written \(G(n,p)\), is a random graph constructed by placing an edge with probability \(p\) independently between each pair of distinct nodes.

We can imagine visiting each possible pair of nodes \((i,j)\) and flipping an independent coin with probability of heads \(p\). If heads, we add \((i,j) \in E\); otherwise, we don’t. Indeed, one (relatively inefficient) implementation of the \(G(n,p)\) model works exactly like this.

The hidden code block prepares for our implementation of \(G(n,p)\) by importing several libraries and setting plotting styles.
import networkx as nx
import numpy as np
from itertools import combinations
from matplotlib import pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')

np.random.seed(1234)
def gnp(n, p):
    G = nx.Graph()
    G.add_nodes_from(range(n))                 # include all n nodes, even those with no edges
    for i, j in combinations(range(n), 2):     # all pairs of distinct nodes
        if np.random.rand() < p:               # add the edge with probability p
            G.add_edge(i, j)
    return G

Having implemented this function, we can now generate a graph.

G = gnp(100, 0.03)
nx.draw(G, 
        node_size = 100, 
        edge_color = "gray", 
        width = 0.5, 
        edgecolors = "white", 
        node_color = "steelblue")
Figure 7.1: A \(G(n,p)\) random graph with \(n = 100\) and \(p = 0.03\).

This approach to generating \(G(n,p)\) graphs by looping through all pairs of edges is not the most efficient possible; see Ramani, Eikmeier, and Gleich (2019) for faster algorithms.
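
One of the key ideas behind these faster algorithms is to hop directly from one edge to the next instead of flipping a coin for every pair: the gap between consecutive successes in a sequence of independent coin flips is geometrically distributed. Below is a minimal sketch of this geometric-skipping (“grass-hopping”) idea; the function name gnp_fast and the index bookkeeping are our own, and the sketch is an illustration rather than the exact algorithm from the paper.

import numpy as np
import networkx as nx

def gnp_fast(n, p, seed=None):
    rng = np.random.default_rng(seed)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    n_pairs = n * (n - 1) // 2                 # total number of candidate edges
    idx = -1
    while True:
        idx += rng.geometric(p)                # index of the next successful coin flip
        if idx >= n_pairs:
            break
        # invert the flat index: pairs (i, j) with j < i are ordered
        # (1,0), (2,0), (2,1), (3,0), ..., so (i, j) has index i(i-1)/2 + j
        i = int((1 + np.sqrt(1 + 8 * idx)) // 2)
        while i * (i - 1) // 2 > idx:          # guard against floating-point rounding
            i -= 1
        while (i + 1) * i // 2 <= idx:
            i += 1
        j = idx - i * (i - 1) // 2
        G.add_edge(i, j)
    return G

Each edge is still present independently with probability \(p\), but the expected running time scales with the number of edges rather than with the number of node pairs.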

Basic Properties

Let’s now explore some of the fundamental mathematical properties of \(G(n,p)\) graphs.

Exercise: Let \(i\) be a fixed node in a \(G(n,p)\) graph, and let \(K_i\) be its (random) degree. Show that \(K_i\) has binomial distribution with success probability \(p\) and \(n-1\) trials.

Exercise: Show that, for any node \(i\), \(\mathbb{E}[K_i] = p(n-1)\).
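
We can also check the claims in these exercises empirically. The following sketch is our own illustration (it uses NetworkX’s built-in \(G(n,p)\) generator, which we will meet again below, and arbitrary parameter choices): it compares the empirical degree frequencies of one sample against the \(\text{Binomial}(n-1, p)\) probability mass function.

import numpy as np
import networkx as nx
from math import comb

n, p = 2000, 0.005
G = nx.fast_gnp_random_graph(n, p, seed=1234)
degrees = np.array([d for _, d in G.degree()])

print(f"Mean degree: {degrees.mean():.2f} (prediction: {p * (n - 1):.2f})")

# compare empirical degree frequencies to the Binomial(n-1, p) pmf
for k in range(6, 15):
    empirical = np.mean(degrees == k)
    theoretical = comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)
    print(f"k = {k:2d}: empirical {empirical:.3f}, binomial {theoretical:.3f}")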

Clustering Coefficient of \(G(n,p)\)

Let’s work an example of estimating a property of the \(G(n,p)\) model. We’re going to estimate the transitivity, also called the global clustering coefficient of the \(G(n,p)\) model. Recall that the transitivity is defined as

\[ \begin{aligned} T = \frac{\text{number of triangles}}{\text{number of wedges}}\;. \end{aligned} \]

Here, a wedge on node \(i\) is a possible triangle: two edges \((i,j)\) and \((i,k)\). The addition of the edge \((j,k)\) would complete the triangle.

Let’s estimate the value of \(T\) for the \(G(n,p)\) model. Analyzing ratios using probability theory can get tricky, but we can get a pretty reliable picture of things by computing the expectations of the numerator and denominator separately.

How many triangles are there? We can choose a “base” node \(i\) in \(n\) ways and then choose a pair of additional nodes \(j\) and \(k\) in \(\binom{n-1}{2}\) ways. The probability that all three edges exist is \(p^3\). So, in expectation, there are \(n\binom{n-1}{2}p^3\) triangles in the graph.

On the other hand, how many wedges are there? We can again choose the base node \(i\) and the two additional nodes in a total of \(n\binom{n-1}{2}\) ways. A wedge (potential triangle) exists if the edges \((i,j)\) and \((i,k)\) are present, which occurs with probability \(p^2\).

The ratio between these two expectations is \(p\). Indeed, it’s possible to prove using concentration inequalities that, in this case, the expected transitivity is indeed very close to this ratio of expectations.

Let’s compute the transitivity of a \(G(n,p)\) graph with NetworkX. We’ll use a larger graph size for this experiment:

G = nx.fast_gnp_random_graph(1000, 0.03)
nx.transitivity(G)
0.030319640037110504

The observed value of the clustering coefficient is close to the prediction of our approximate calculation.

Sparse \(G(n,p)\)

Recall that it is possible to define sparsity more explicitly when we have a theoretical model (where it makes sense to take limits). Indeed, the sparse Erdős–Rényi model is very useful. Intuitively, the idea of sparsity is that there are not very many edges in comparison to the number of nodes.

It is also possible to consider other functional forms for \(p\) which are also often called “sparse” and lead to interesting behavior; for example, \(p = \frac{c \log n}{n}\).

Definition 7.2 We say that a \(G(n,p)\) graph is sparse when \(p = c/(n-1)\) for some constant \(c\).

A consequence of sparsity is that \(\mathbb{E}[K_i] = c\); i.e. the expected degree of a node in sparse \(G(n,p)\) is constant. When studying sparse \(G(n,p)\), we are almost always interested in reasoning in the limit \(n\rightarrow \infty\).

Degrees

It is possible to show that, when \(p = c/(n-1)\), the degree of a node in sparse \(G(n,p)\) converges (in distribution) to a Poisson random variable with mean \(c\). For this reason, the \(G(n,p)\) model is also sometimes called the Poisson random graph.

A Poisson random variable \(K\) with mean \(\lambda\) has probability mass function \(p_K(k) = \frac{\lambda^k e^{-\lambda}}{k!}\).
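
As an empirical check (our own illustration, with an arbitrary choice of \(n\) and \(c\)), we can compare the degree frequencies of a large sparse \(G(n,p)\) sample against the \(\text{Poisson}(c)\) probability mass function.

import numpy as np
import networkx as nx
from math import exp, factorial

n, c = 100_000, 5
G = nx.fast_gnp_random_graph(n, c / (n - 1), seed=1234)
degrees = np.array([d for _, d in G.degree()])

# compare empirical degree frequencies to the Poisson(c) pmf
for k in range(11):
    empirical = np.mean(degrees == k)
    poisson = c**k * exp(-c) / factorial(k)
    print(f"k = {k:2d}: empirical {empirical:.4f}, Poisson {poisson:.4f}")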

Clustering

What does this imply for our estimation of the global clustering coefficient from before? Well, we expect the global clustering coefficient to be about \(p\), and if \(p = c/(n-1)\), then \(p \rightarrow 0\) as \(n \to \infty\) for sparse \(G(n,p)\). We’ve given a heuristic argument showing that \(T \approx p\), and so our analysis predicts:

Sparse Erdős–Rényi graphs have vanishing clustering coefficients.

The need to move beyond the ER model to develop sparse graphs with nonvanishing clustering coefficients was part of the motivation of Watts and Strogatz (1998), a famous paper that introduced the “small world model.”

Figure 7.2 shows how the global clustering coefficient of a sparse Erdős–Rényi random graph decays as we increase \(n\).

c = 5
N = np.repeat(2**np.arange(5, 15), 10)

def global_clustering(n, c):
    G = nx.fast_gnp_random_graph(n, c/(n-1))
    return nx.transitivity(G)

T = [global_clustering(n, c) for n in N]

fig, ax = plt.subplots(1)

theory = ax.plot(N, c/(N-1), color = "black", label = "Estimate")

exp = ax.scatter(N, T, label = "Experiments", color = "steelblue", s = 10)
semil = ax.loglog()
labels = ax.set(xlabel = "Number of nodes", 
       ylabel = "Global clustering coefficient")
plt.show()
Figure 7.2: Each point gives the global clustering coefficient of an ER graph with mean degree \(c = 5\) and specified number of nodes. We’ve repeated the calculation for each size of graph 10 times. The black line gives the estimate \(T = p = c/(n-1)\) for the clustering coefficient that we computed earlier.

Although the estimate that the global clustering coefficient should be equal to \(p\) was somewhat informal, experimentally it works quite well.

Cycles and Local Tree-Likeness

Recall the following definition:

Definition 7.3 A cycle is a walk that does not repeat edges and ends at the same node where it begins.

A triangle is an example of a cycle of length \(3\). Our argument about the graph transitivity shows that triangles are rare in sparse \(G(n,p)\) graphs: if I look at any given wedge, the probability that the wedge is completed to form a triangle shrinks to zero with rate \(n^{-1}\). In fact, it is true more generally that cycles of any length are rare in sparse \(G(n,p)\) graphs.

Theorem 7.1 In the sparse \(G(n,p)\) model, for any length \(k\), the probability that there exists a cycle of length \(k\) attached to node \(i\) shrinks to 0 as \(n \rightarrow \infty\).

Graphs in which cycles are rare are often called locally tree-like. A tree is a graph without cycles; if cycles are very rare, then we can often use techniques and algorithms that are normally guaranteed to work only on trees without running into (too much) trouble.

Path Lengths

How far apart are two nodes in \(G(n,p)\)? Again, exactly computing the length of geodesic paths involves some challenging mathematical detail. However, we can get a big-picture view of the situation by asking a slightly different question:

See Riordan and Wormald (2010) and references therein.

Given two nodes \(i\) and \(j\), what is the expected number of paths of length \(k\) between them?

Let \(R(k)\) denote the number of \(k\)-paths between \(i\) and \(j\). Let \(r(k) = \mathbb{E}[R(k)]\). Let’s estimate \(r(k)\).

First, we know that \(r(1) = p\). For higher values of \(k\), we’ll use the following idea: in order for there to be a path of length \(k\) from \(i\) to \(j\), there must be a node \(\ell\) such that:

  • There exists a path from \(i\) to \(\ell\) of length \(k-1\). In expectation, there are \(r(k-1)\) of these.
  • There exists a path from \(\ell\) to \(j\) of length \(1\). This happens with probability \(p\).

There are \(n-2\) possibilities for \(\ell\) (excluding \(i\) and \(j\)), and so we obtain the approximate relation

\[ r(k) \approx (n-2)r(k-1)p\;. \]

Why is this an approximation? Well, some of the paths between \(i\) and \(\ell\) that are counted in \(r(k-1)\) could actually include the edge \((j, \ell)\) already. An example is \((i,j), (j,\ell)\). In this case, the presence of edge \((j,\ell)\) is not independent of the presence of the path between \(i\) and \(\ell\). The derivation above implicitly treats these two events as independent. Again, because cycles are rare in large, sparse ER, this effect is small when \(k\) is small.

Proceeding inductively and approximating \(n-2 \approx n-1\) for \(n\) large, we have the relation

\[ r(k) \approx (n-1)^{k-1}p^{k-1}r(1) = c^{k-1}p \tag{7.1}\]

in the sparse ER model.
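
We can probe Equation 7.1 numerically. The sketch below is our own experiment with arbitrary parameter choices: it counts simple paths of length exactly \(k\) between randomly chosen pairs of nodes with NetworkX’s all_simple_paths and compares the average count to \(c^{k-1}p\). The agreement is statistical, so expect some noise.

import numpy as np
import networkx as nx

n, c, k = 2000, 5, 3
p = c / (n - 1)
G = nx.fast_gnp_random_graph(n, p, seed=1234)
rng = np.random.default_rng(1234)

# average the number of length-k simple paths over many random node pairs
counts = []
for _ in range(2000):
    i, j = rng.choice(n, size=2, replace=False)
    paths = nx.all_simple_paths(G, source=int(i), target=int(j), cutoff=k)
    counts.append(sum(1 for path in paths if len(path) == k + 1))  # k edges = k+1 nodes

print(f"Average number of {k}-paths: {np.mean(counts):.4f}")
print(f"Prediction c^(k-1) p:        {c**(k-1) * p:.4f}")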

Using this result, let’s ask a new question:

What path length \(k\) do I need to allow to be confident that there’s a path of length \(k\) between nodes \(i\) and \(j\)?

Well, suppose we want there to be \(q\) paths. Then, we can solve \(q = c^{k-1}p\) for \(k\), which gives us:

\[ \begin{aligned} q &= c^{k-1}p \\ \log q &= (k-1)\log c + \log p \\ \log q &= (k-1)\log c + \log c - \log n \\ \frac{\log q + \log n}{\log c} &= k \end{aligned} \]

Here, we’ve also approximated \(\log (n-1) \approx \log n\) for \(n\) large. This result tells us that distances between nodes grow very slowly in \(G(n,p)\), roughly at the rate \(\log n\). This is very slow growth. For example, supposing that I want there to be at least one path in expectation (\(q = 1\)), I need to allow \(k = \frac{\log n}{\log c}\). The population of the world is about \(8\times 10^9\), and Newman estimates that an average individual knows around 1,000 other people; that is, \(c = 10^3\) in the world social network. The resulting value of \(k\) here is around 3.3. If, on the other hand, we wanted to be quite confident that a path exists, we might instead wish to require \(q = 10^6\) paths, leading to a required path length of roughly 5.3.

In other words, this calculation suggests that, if the world were an ER network, it would be very likely for any two given individuals to be connected by paths of length no longer than 6.

Results in this spirit have been borne out by experiments in the social sciences, such as the “small world” experiment of Milgram (1967).

More formal calculations regarding the diameter (length of longest shortest path) of the ER graph confirm that the diameter of the ER graph grows slowly as a function of \(n\).
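
We can also check the \(\log n\) growth of distances directly. The sketch below is our own experiment with arbitrary parameter choices: it computes the mean shortest-path length within the largest component of a sparse ER graph and compares it to \(\log n / \log c\).

import numpy as np
import networkx as nx

n, c = 2000, 10
G = nx.fast_gnp_random_graph(n, c / (n - 1), seed=1234)

# restrict to the largest connected component so that all distances are finite
giant = G.subgraph(max(nx.connected_components(G), key=len))

mean_distance = nx.average_shortest_path_length(giant)
print(f"Mean shortest-path length: {mean_distance:.2f}")
print(f"Prediction log(n)/log(c):  {np.log(n) / np.log(c):.2f}")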

A Caveat

If you spend some time looking at Equation 7.1, you might find yourself wondering:

Hey, what happens if \(c \leq 1\)?

Indeed, something very interesting happens here. Let’s assume \(c < 1\) (i.e. we’re ignoring the case \(c = 1\)), and estimate the expected number of paths between \(i\) and \(j\) of any length. Using Equation 7.1, we get

\[ \mathbb{E}\left[\sum_{k = 1}^{\infty} R(k)\right] = \sum_{k = 1}^\infty c^{k-1}p = \sum_{k = 0}^\infty c^kp = \frac{p}{1-c}\;. \]

If we now use Markov’s inequality, we find that the probability that there is a path of any length between nodes \(i\) and \(j\) is no larger than \(\frac{p}{1-c}.\) In the sparse regime, we can substitute \(p = \frac{c}{n-1}\) to see that

Markov’s inequality states that \(\mathbb{P}(X \geq a) \leq \frac{\mathbb{E}(X)}{a}\).

\[ \frac{c}{(1-c)(n-1)}\rightarrow 0 \text{ as } n \to \infty\;. \]

So, this suggests that, if \(c < 1\), any two nodes are likely to be disconnected! On the other hand, if \(c > 1\), we’ve argued that we can make \(k\) large enough to have high probability of a path of length \(k\) between those nodes.

So, what’s special about \(c = 1\)? This question brings us to one of the first and most beautiful results in the theory of random graphs. To get there, let’s study in a bit more detail the sizes of the connected components of the ER graph.

Component Sizes and the Branching Process Approximation

We’re now going to ask ourselves about the size of a “typical” component in the Erdős–Rényi model. In particular, we’re going to be interested in whether there exists a component that fills up “most” of the graph, or whether components tend to be vanishingly small in relation to the overall graph size.

Our first tool for thinking about this question is the branching process approximation. Informally, a branching process is a process of random generational growth. We’ll get to a formal mathematical definition in a moment, but the easiest way to get insight is to look at a diagram:

Figure 7.3: Schematic of a branching process.

We start with a single entity, \(X_0\). Then, \(X_0\) has a random number of “offspring”: \(X_1\) in total. Then, each of those \(X_1\) offspring has some offspring of their own; the total number of these offspring is \(X_2\). The process can continue indefinitely, although there is always a chance that at some point no more offspring are produced. In this case, we often say that the process “dies out.”

Some of this exposition in this section draws on these notes by David Aldous.

Definition 7.4 (Branching Process) Let \(p\) be a probability distribution on the nonnegative integers \(\mathbb{Z}_{\geq 0}\), called the offspring distribution.

A branching process with distribution \(p\) is a sequence of random variables \(X_0, X_1, X_2, \ldots\) such that \(X_0 = 1\) and, for \(t \geq 1\),

\[ X_t = \sum_{i = 1}^{X_{t-1}} Y_i\;, \]

where each \(Y_i\) is distributed i.i.d. according to \(p\).

Technically, this is a Galton-Watson branching process, named after the two authors who first proposed it (Watson and Galton 1875).

History note: Galton, one of the founders of modern statistics, was a eugenicist. The cited paper is explicit about its eugenicist motivation: the guiding question was about whether certain family names associated with well-to-do aristocrats were giving way to less elite surnames.

Application to Erdős–Rényi

Branching processes create trees – graphs without cycles. The reason that branching processes are helpful when thinking about Erdős–Rényi models is that cycles are rare in Erdős–Rényi random graphs. So, if we can understand the behavior of branching processes, then we can learn something about the Erdős–Rényi random graph as well.

Here’s the particular form of the branching process approximation that we will use:

Definition 7.5 (Branching Process Approximation for ER Component Sizes) Sample a single node \(j\) at random from a large, sparse ER graph with mean degree \(c\), and let \(S\) be the size (number of nodes) of the component in which \(j\) lies. Note that \(S\) is random: it depends both on \(j\) and on the realization of the ER graph.

Then, \(S\) is distributed approximately as \(T\), where \(T = \sum_{t = 0}^{\infty}X_t\) is the total number of individuals produced in a GW branching process with offspring distribution \(\text{Poisson}(c)\).

The idea behind this approximation is:

  • We start at \(j\), whose number of neighbors \(\sim \text{Poisson}(c)\).
  • Each of these neighbors has a number of new neighbors \(\sim \text{Poisson}(c)\), and so on.
  • We keep visiting new neighbors until we run out, and add up the number of neighbors we’ve visited to obtain \(S\).
  • Since cycles are rare in ER, we are unlikely to double-count any nodes (doing so would create a cycle), and so this whole process also approximately describes \(T\) in a branching process with a \(\text{Poisson}(c)\) offspring distribution.

The Subcritical Case

The mean of a \(\text{Poisson}(c)\) random variable is again \(c\). This implies that \(X_t\), the number of offspring in generation \(t\), satisfies \(\mathbb{E}[X_t] = c\,\mathbb{E}[X_{t-1}] = c^{t}\). It follows that, when \(c < 1\), \(\mathbb{E}[T] = \sum_{t=0}^{\infty} \mathbb{E}[X_t] = \sum_{t=0}^{\infty}c^t = \frac{1}{1-c}\).

Now using Markov’s inequality, we obtain the following results:

In a \(\text{Poisson}(c)\) branching process with \(c < 1\), \[ \begin{aligned} \mathbb{P}(X_t > 0) = \mathbb{P}(X_t \geq 1) \leq \frac{\mathbb{E}[X_t]}{1} = c^t\;. \end{aligned} \]

So, the probability that the branching process hasn’t yet “died out” decays exponentially with timestep \(t\). In other words, the branching process becomes very likely to die out very quickly. What about the total number of nodes produced in the branching process? Again using Markov’s inequality, we find that

In a \(\text{Poisson}(c)\) branching process with \(c < 1\), \[\mathbb{P}(T \geq a) \leq \frac{\mathbb{E}[T]}{a} = \frac{1}{a}\frac{1}{1-c}\]

In particular, for \(a\) very large, we are guaranteed that \(\mathbb{P}(T > a)\) is very small.

Summing up, when \(c < 1\), the GW branching process dies out quickly and contains a relatively small number of nodes: \(\frac{1}{1-c}\) in expectation.

In this setting, the branching process is called subcritical.
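
We can simulate the subcritical branching process directly. The sketch below is our own simulation (the helper total_progeny is not part of the original text): it draws Poisson(\(c\)) offspring counts generation by generation and averages the total progeny \(T\) over many runs, which should be close to \(\frac{1}{1-c}\).

import numpy as np

rng = np.random.default_rng(1234)

def total_progeny(c, max_generations=10_000):
    # simulate one Poisson(c) Galton-Watson process and return the total
    # number of individuals ever produced, including the root
    total, current = 1, 1
    for _ in range(max_generations):
        if current == 0:
            break
        current = rng.poisson(c, size=current).sum()
        total += current
    return total

c = 0.8
sims = [total_progeny(c) for _ in range(10_000)]
print(f"Average total progeny: {np.mean(sims):.2f}")
print(f"Theoretical 1/(1-c):   {1 / (1 - c):.2f}")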

Back to ER

If we now translate back to the Erdős–Rényi random graph, the branching process approximation now suggests the following heuristic:

Heuristic: In a sparse ER random graph with mean degree \(c < 1\), the expected size of the component containing a randomly selected node is roughly \(\frac{1}{1-c}\).

In particular, since this quantity is independent of \(n\), we find that the expected fraction of the graph occupied by this component is \(\frac{1}{n}\frac{1}{1-c}\) and therefore vanishes as \(n\rightarrow \infty\).

We can also turn this into a statement about probabilities: Markov’s inequality implies that, if \(S\) is the size of a component containing a randomly selected node,

\[ \mathbb{P}(S/n > a) \rightarrow 0 \]

for any constant \(a > 0\). In other words, for large \(n\), the component containing a randomly selected node is vanishingly small in relation to the graph as a whole.

Let’s check this experimentally. The following code block computes the size of the component in an ER graph containing a random node, and averages the result across many realizations. The experimental result is quite close to the theoretical prediction.

import networkx as nx
import numpy as np

def component_size_of_node(n, c):
    # sample a sparse ER graph and return the size of the component
    # containing node 1 (all nodes are equivalent by symmetry)
    G = nx.fast_gnp_random_graph(n, c/(n-1))
    return len(nx.node_connected_component(G, 1))

c = 0.8
sizes = [component_size_of_node(5000, c) for i in range(1000)]

out = f"""
Average over experiments is {np.mean(sizes):.2f}.\n
Theoretical expectation is {1/(1-c):.2f}.
"""

print(out)

Average over experiments is 4.82.

Theoretical expectation is 5.00.

Note that the expected (and realized) component size is very small, even though the graph contains 5,000 nodes!

For this reason, we say that subcritical ER contains only small connected components, in the sense that each component contains approximately 0% of the graph as \(n\) grows large.

This explains our result from earlier about path lengths. The probability that any two nodes have a path between them is the same as the probability that they are in the same connected component. But if every connected component is small, then the probability that two nodes occupy the same one vanishes.

The Giant Component

Definition 7.6 (Giant Component) Let \(S\) be the size of the component containing a randomly selected node. We say that \(G(n,p)\) has a giant component if \[ \mathbb{P}(S/n > a) \rightarrow b \] for some constants \(a, b > 0\).

Intuitively, this means that there is a possibility of a connected component that takes up a nonzero fraction \(b\) of the graph.

So far, we’ve argued using the branching process approximation that there is no giant component in the Erdős–Rényi model with \(c < 1\). The theory of branching processes also suggests to us that there could be a giant component when \(c > 1\).

The proof of this fact is usually done in terms of generating functions and is beyond our scope, but you can check Wikipedia for an outline.

Fact: when \(c > 1\), there is a nonzero probability that the \(\text{Poisson}(c)\) branching process continues forever; that is, never goes extinct.

Using our correspondence between components of the ER model and branching processes, this suggests that, if we pick a random node, the component it is in has the potential to be very large. In fact (and this requires some advanced probability to prove formally), when \(c > 1\), there is a giant component. This is our first example of a phase transition.

It is also possible to prove that, with high probability, there is only one giant component; Newman does this in 11.5.1.

Definition 7.7 (Phase Transition) A phase transition is a qualitative change in response to a small variation in a quantitative parameter.

Examples of phase transitions include freezing, in which a liquid undergoes a qualitative change into a solid in response to a small variation in temperature.

Figure 7.4 shows two sparse ER random graphs on either side of the \(c = 1\) transition. We observe an apparent change in qualitative behavior between the two cases.

import networkx as nx
from matplotlib import pyplot as plt

fig, axarr = plt.subplots(1, 2, figsize = (6, 3))

n = 500
c = [0.7, 1.3]

for i in range(2):
    G = nx.fast_gnp_random_graph(n, c[i]/(n-1))
    nx.draw(G, ax = axarr[i], node_size = 1)
    axarr[i].set(title = f"c = {c[i]}")
Figure 7.4: Two sparse ER graphs with 500 nodes and varying mean degree.

Size of the Giant Component

Perhaps surprisingly, while it’s difficult to prove that there is a giant component, it’s not hard at all to estimate its size.

This argument is reproduced from Newman, pages 349-350.

Let \(S\) be the size of the giant component in an Erdős–Rényi random graph, assuming there is one. Then, \(s = S/n\) is the probability that a randomly selected node is in the giant component. Let \(u = 1 - s\) be the probability that a given node is not in the giant component.

Let’s take a random node \(i\) and ask for the probability that it is not in the giant component. Well, one answer to that question is just “\(u\).” On the other hand, we can also answer that question by looking at \(i\)’s neighbors. If \(i\) is not in the giant component, then it can’t be connected to any node that is in the giant component. So, for each other node \(j\neq i\), it must be the case that either:

  1. \(i\) is not connected to \(j\). This happens with probability \(1-p\).
  2. \(i\) is connected to \(j\), but \(j\) is not in the giant component either. \(i\) is connected to \(j\) with probability \(p\), and \(j\) is not in the giant component with probability \(u\).

There are \(n-1\) nodes other than \(i\), and so the probability that \(i\) is not connected to any other node in the giant component is \((1 - p + pu)^{n-1}\). We therefore have the equation

\[ u = (1 - p + pu)^{n-1}\;. \]

Let’s take the righthand side and use \(p = c/(n-1)\): \[ \begin{aligned} u &= (1 - p(1-u))^{n-1} \\ &= \left(1 - \frac{c(1-u)}{n-1}\right)^{n-1}\;. \end{aligned} \] This is a good time to go back to precalculus and remember the limit definition of the function \(e^x\): \[ e^x = \lim_{n \rightarrow \infty}\left(1 + \frac{x}{n}\right)^{n}\;. \] Since we are allowing \(n\) to grow large in our application, we approximate

\[ u \approx e^{-c(1-u)}\;. \] So, now we have a description of the fraction of nodes that aren’t in the giant component. We can get a description of how many nodes are in the giant component by substituting \(s = 1-u\), after which we get the equation we’re really after: \[ s = 1- e^{-cs} \tag{7.2}\]

This equation doesn’t have a closed-form solution for \(s\), but we can still plot it and compare the result to simulations (Figure 7.5). Not bad!

import numpy as np
from matplotlib import pyplot as plt
import networkx as nx

# experiment: compute the size of the largest connected 
# component as a function of graph size for a range of mean degrees. 

def largest_component(n, p):
    G = nx.fast_gnp_random_graph(n, p)
    S = max(nx.connected_components(G), key=len)
    return len(S) / n

n = 50000
C = np.repeat(np.linspace(0.5, 1.5, 11), 10)
U = np.array([largest_component(n, c/(n-1)) for c in C])

# theory: prediction based on Newman 11.16

S = np.linspace(-.001, .6, 101)
C_theory = -np.log(1-S)/S

# plot the results to compare

plt.plot(C_theory, 
         S, 
         color = "black", 
         label = "Theoretical prediction", 
         zorder = -1, 
         linestyle = "--")

plt.scatter(C, 
            U, 
            label = "Experiment", 
            facecolor = "steelblue", 
            color = "white", 
            alpha = 0.8, 
            s = 150)

plt.gca().set(xlabel = "Mean degree", 
              ylabel = "Proportion of graph in largest component")

plt.legend()
Figure 7.5: Each point gives the fraction of an ER graph with 50,000 nodes occupied by the largest component. The mean degree is on the horizontal axis. The black line gives the theoretical prediction of Equation 7.2.
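
Although Equation 7.2 has no closed-form solution, it is easy to solve numerically for any given \(c\). Here is a minimal sketch (our own) using fixed-point iteration; any standard root-finder would work just as well.

import numpy as np

def giant_component_fraction(c, n_iter=200):
    # iterate the map s -> 1 - exp(-c*s); starting from s = 1, the iteration
    # converges to the positive solution of Equation 7.2 when c > 1
    s = 1.0
    for _ in range(n_iter):
        s = 1 - np.exp(-c * s)
    return s

for c in [1.3, 2.0, 5.0]:
    print(f"c = {c}: predicted giant component fraction {giant_component_fraction(c):.3f}")

For \(c \leq 1\), the iteration collapses to \(s = 0\), consistent with the absence of a giant component.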

Limitations of \(G(n,p)\)

The Erdős–Rényi model \(G(n,p)\) is an important object in random graph theory, and many mathematicians have devoted their careers to studying it. Many of its properties are tractable using tools from probability theory, and it even reproduces some interesting realistic behaviors, such as short path lengths and the existence of a giant component. However, \(G(n,p)\) also has severe limitations as a modeling framework. Since the degree of every node is approximately Poisson distributed with the same mean, the overall degree distribution is itself approximately Poisson. Many empirical graphs have distinctly non-Poisson degree distributions, and \(G(n,p)\) cannot reproduce that feature. The scarcity of cycles, especially triangles, in \(G(n,p)\) is also unlike many empirical networks. Finally, \(G(n,p)\) can’t reproduce networks with any kind of modular or community structure.

\(G(n,p)\) is an extremely rich object of mathematical study and an important first model in the study of random graphs. That said, we’ll now move on to models that are more able to reproduce some of the features we observe in empirical networks.

References

Erdős, Paul, and Alfréd Rényi. 1960. “On the Evolution of Random Graphs.” Publ. Math. Inst. Hung. Acad. Sci 5 (1): 17–60.
Milgram, Stanley. 1967. “The Small World Problem.” Psychology Today 2 (1): 60–67.
Ramani, Arjun S, Nicole Eikmeier, and David F Gleich. 2019. “Coin-Flipping, Ball-Dropping, and Grass-Hopping for Generating Random Graphs from Matrices of Edge Probabilities.” SIAM Review 61 (3): 549–95.
Riordan, Oliver, and Nicholas Wormald. 2010. “The Diameter of Sparse Random Graphs.” Combinatorics, Probability and Computing 19 (5-6): 835–926.
Watson, Henry William, and Francis Galton. 1875. “On the Probability of the Extinction of Families.” The Journal of the Anthropological Institute of Great Britain and Ireland 4: 138–44.
Watts, Duncan J, and Steven H Strogatz. 1998. “Collective Dynamics of ‘Small-World’ Networks.” Nature 393 (6684): 440–42.



© Heather Zinn Brooks and Phil Chodrow, 2025