Monte-Carlo Eigengap
Most graph community detection techniques require the number of communities as an input parameter. Unfortunately, this number is very rarely known in advance. But fortunately, an educated guess may be obtained by using the eigengap heuristic. However, as its name indicates, this heuristic requires the computation of the eigenvalues of a graph's Laplacian matrix and its output's accuracy is not guaranteed. In addition, obtaining eigenvalues is a computationally expensive operation, with worst-case complexity of $O(N^3)$ (here, $N$ is the number of vertices). Even in the case of sparse matrices, the complexity only decreases to $O(dN^2)$ (here, d is the average number of non-zero elements in rows), which remains very high. Moreover, with very large graphs, which are now commonplace in the context of ``big data'', the need for efficient computation techniques has become ever more pressing.
To address this problem, we examine the accuracy of the eigengap heuristic in providing an estimate of the number of communities in a graph and the possibility of approximating it by sampling smaller subgraphs of $n \ll N$ vertices. We begin by testing the usefulness of the eigengap, by comparing it to the known number of communities in our test graphs. We continue our analysis with a comparison of the approximations obtained by sampling to the known location of the full graph's eigengap. We then examine the sample size to number samples trade-off. Results obtained by sampling fewer large subgraphs were compared to those obtained by sampling a larger number of smaller subgraphs. Finally, we end with a discussion of other graph sampling techniques from literature, namely sampling vertices via a degree-weighted distribution, sampling by traversal and sampling of edges.
Our initial results show that the eigengap is an accurate estimate of the number of communities on a graph and can be accurately estimated by sampling sub-matrices of the original adjacency matrix. These results are independent of the original graph's structure and are both theoretically and empirically demonstrated.
Joint work with A. Shestopaloff and Y. Lawryshyn. This work was supported by a Mitacs-Accelerate PhD award IT05806.