6 Homophily, assortativity, and modularity
Open the live notebook in Google Colab here.
The tendency of individuals to associate with others whom they perceive to be like themselves is called homophily. Supposing that we have a network where nodes can be categorized by different classes, types, or groups, we may like to develop a way to measure the prevalence of homophily in our network. Informally speaking, a network is said to be assortative if a significant fraction of edges are between nodes of the same “type” and disassortative if a significant fraction of edges are between nodes of different “types.”
In this lecture, we’ll examine how to quantify these concepts.
Assortative mixing and modularity
Suppose we have a network where nodes are classified by a finite set of descriptive values.
This definition gives us some inspiration on a possible strategy on how to assortativity. Assortativity is defined by a comparison to a counterfactual version (or versions) of the network that has the same degree distribution but edges are positioned at random:
\[ \left[ \text{fraction of same-type edges} \right] - \left[ \text{expected fraction of same-type edges}\right] \,. \]
Let’s formalize this notation. Let \(Z\) be the set of possible group labels. For example, if there are \(g\) groups, then we have \(z_i \in \{z_1, \dots, z_g\}\) representing the type of node \(i\). To count the total number of same-type edges, we’d like a way to encode 1 every time an edge is between same-type edges.
We now have nice mathematical notation to count the total number of same-type edges that are present in our network: \[ \frac{1}{2} \sum_i \sum_j A_{ij}\delta_{z_iz_j} \,. \tag{6.1}\]
Next we need to calculate the expected number of same-type edges. Let’s suppose that for this comparison we are keeping some important structural properties of our network the same: we want to make sure that the number of edges \(m\) is preserved, as well as the degrees \(k_i\) of each of the nodes.
We need to know the probability that an edge from any node connects to an edge from node \(j\):
\[ P((i,j) \in E) = \frac{k_j}{2m-1} \approx \frac{k_j}{2m} \,. \]
With this, we can calculate the expected number of edges between node \(i\) and node \(j\) is approximately \(k_i\frac{k_j}{2m}\). Summing over all possible \(i,j\) combinations and again using our Kronecker delta to only count same type edges gives us an approximation for the expected number of edges between same-type node pairs
\[ \frac{1}{2} \sum_i \sum_j \frac{k_ik_j}{2m}\delta_{z_iz_j} \,. \tag{6.2}\]
We can combine Equation 6.1 and Equation 6.2 to get a count of the difference between the actual and expected number of same-type edges: \[\begin{align} \frac{1}{2} \sum_i \sum_j A_{ij}\delta_{z_iz_j} - \frac{1}{2} \sum_i \sum_j \frac{k_ik_j}{2m}\delta_{z_iz_j} \\ = \frac{1}{2}\sum_i \sum_j \left( A_{ij} -\frac{k_ik_j}{2m}\right)\delta_{z_iz_j} \,. \end{align}\]
If we divide by our total number of edges \(m\) we achieve our desired result. This quantity that measures assortativity in a network is called modularity.
In some sense modularity measures the extent to which same-type nodes are connected in a network.
This expression highlights two things:
- First, we are comparing the actual adjacency matrix \(\mathbf{A}\) of the graph to the expected adjacency matrix with entries \(k_ik_j/2m\).
- Second, we are performing this comparison only on the edges in which \(\delta_{z_iz_j}=1\). These are the edges on which \(z_i = z_j\); i.e. the edge joins two nodes in the same group.
Another Perspective on Modularity
The idea of comparing to a random graph where certain properties are preserved is a pretty useful way to think about the modularity, but there are also others. Let’s take another point of view.
Let \(Z\) be the set of possible group labels. For example, \(Z = \{z_1,z_2,\ldots,z_g \}\) for some \(g\). For each label \(\ell \in Z\), define \[ e_\ell \triangleq \frac{1}{2m}\sum_{i,j\in V}A_{ij}\delta_{z_i, \ell}\delta_{z_j, \ell} \quad \text{and} \quad f_\ell \triangleq \frac{1}{2m}\sum_{i\in V} k_i \delta_{z_i, \ell}\;. \]
We’re going to find copies of these expressions in \(Q\). The “trick” is to note that we can do fancy things with the \(\delta\)-function, like this: \[ \delta_{z_i, z_j} = \sum_{\ell \in Z}\delta_{z_i,\ell}\delta_{z_j,\ell} \tag{6.3}\] Inserting Equation 6.3 and doing some algebra, we find \[ \begin{aligned} Q(G, \mathbf{z}) &= \frac{1}{2m}\sum_{i,j \in V}\left[A_{ij} - \frac{k_ik_j}{2m}\right]\delta_{z_i, z_j} \\ &= \frac{1}{2m}\sum_{i,j \in V}\left[A_{ij} - \frac{k_ik_j}{2m}\right]\sum_{\ell \in Z}\delta_{z_i,\ell}\delta_{z_j,\ell} \\ &= \frac{1}{2m}\sum_{\ell \in Z}\sum_{i,j \in V}\left[A_{ij}\delta_{z_i,\ell}\delta_{z_j,\ell} - \frac{k_ik_j}{2m}\delta_{z_i,\ell}\delta_{z_j,\ell}\right] \\ &= \sum_{\ell \in Z}\left[e_\ell - \frac{1}{(2m)^2}\sum_{i,j \in V}k_i\delta_{z_i,\ell}k_j\delta_{z_j,\ell}\right] \\ &= \sum_{\ell \in Z}\left[e_\ell - \frac{1}{(2m)^2}\sum_{i \in V}k_i\delta_{z_i,\ell}\sum_{j \in V}k_j\delta_{z_j,\ell}\right] \\ &= \sum_{\ell \in Z}\left[e_\ell - f_\ell^2\right]\;. \\ \end{aligned} \tag{6.4}\] This compact expression for the modularity helps us interpret the expression in a new way. Remember that we consider the network to be assortative when \(Q(G, \mathbf{z})\) is large, i.e., when \(e_\ell\) is large and \(f_\ell\) small. What does it mean here?
Well, \(\sum_{\ell \in Z} e_\ell\) is the fraction of all edges that join nodes in the same group. One extreme case is when every node is in the same group: Then, \(\sum_{\ell \in Z} e_\ell = 1\). The term \(f_\ell\) tell us the fraction of ends of edges attached to nodes of type \(\ell\). Because of this, \(\sum_{\ell}f_\ell = 1.\) This means that \(\sum_{\ell} f_\ell^2\) can be reasonably small in the case where groups have approximately equal sizes.
References
© Heather Zinn Brooks and Phil Chodrow, 2024