Software! Math! Data! The blog of R. Sean Bowman
The blog of R. Sean Bowman

# Simplicial Complexes from Data

February 11 2016

In a previous article, we saw how to compute the (zeroth) persistent homology of a filtered simplicial complex. But how do we obtain such a complex, anyway? Scientists and others who work with data often start with a bunch of points in a high dimensional space. What are some good ways to turn that information into a filtered simplicial complex we can compute the persistent homology from?

The main tool used is called the Vietoris-Rips (VR) complex. Let $E$ be a collection of points in $\R^n$, the input data points we’d like to examine. The VR complex $V_{\epsilon}(E)$ is the simplicial complex whose $1$–skeleton (the underlying graph of the complex) consists of all the points as well as line segments between any two points that are within distance $\epsilon$ of each other. We form a simplicial complex from this graph by adding a $2$–simplex wherever we see a triangle, a $3$–simplex wherever we see a complete graph on four vertices, and so forth. (This type of complex is often called a flag complex).

It’s not difficult to see that if we choose numbers $\epsilon_1<\epsilon_2<\dots < \epsilon_n$, we get inclusions $V_{\epsilon_1}(E)\subseteq V_{\epsilon_2}(E)\subseteq\dots\subseteq V_{\epsilon_n}(E)$. This is the filtered complex we want to take the persistent homology of.

### A Visualization of VR complexes

Below is an example of several simple VR complexes. You can choose a data set from the drop down box (the default, noisy circle, is a good one to start with). The slider marked “scale parameter” corresponds to what we called $\epsilon$ above. As you slide the slider to the right, $\epsilon$ increases. We visualize $\epsilon$ as transparent disks centered at the points. When a disk centered at one point touches another point, an edge is drawn between the two.

The zeroth and first Betti numbers of the complex are shown below the slider. The zeroth Betti number is the number of connected components of the complex. It starts out as the number of points, but as the points become connected to one another by edges, it decreases. The first Betti number describes the number of “holes” in the data. For the circle example, notice that when each point is connected to its neighbor by an edge, this number changes from 0 to 1. (If you keep moving the slider to the right, it changes back to zero. Why is that? It’s because, although we haven’t shown it, there are $2$-simplices (triangles) in this picture as well. Once they fill in the disk bounded by the circle, the "hole" disappears, and the first Betti number goes back to zero).

minmax

,

Play with the different data sets and see if the Betti numbers match with your intuition. Can you describe the barcodes for each of these data sets?

While we’re at it, we can define a “dual” graph to a VR complex. This graph consists of a vertex for every maximal simplex (a simplex that is not a subsimplex of any other simplex in the complex), with two vertices joined if their simplices overlap. Since the VR complex is determined by its $1$–skeleton, and each clique of vertices in the $1$–skeleton is filled in by a simplex, this could also be called the “clique graph” of our complex. This clique graph can be important in applications, both theoretical and computational. (Warning: the terminology “clique graph” is not standard, I don’t think.)