Questions:

  • What is the problem we are trying to solve?
  • What does the “birth time” of a simplex mean? Answer: Smallest time such that is created in the filtration.
  • How to pick the intervals for Mapper construction?
  • How to visualize bottleneck metric on covers?
  • How does interleaved filtration relate to stability?
  • Why does an isomorphism between the 1-skeleton of VR and cover imply isomorphism between filtrations?
  • How to interpret Pareto frontiers?

Full paper can be found at: https://arxiv.org/abs/1906.08256.

Introduction

Two important concepts in TDA are persistentence and the Mapper construction.

It is typically unclear as to how one should interpret higher dimensional homology. One method is via persistence diagrams.

In most TDA applications we are under the assumption that the data has a metric. However, assigning a meaningful metric is not always clear, e.g., Netflix movie recommendations.

Mapper construction is defined to be the nerve of a cover of data:

invert

We begin with a set of overlapping intervals covering the possible values of some parameter (shown here as the -axis). Next we cluster the data set such that each cluster falls into an interval. We then take the nerve of the clusters.

The full algorithm is given here.

The choice of intervals may drastically affect the resulting complex. For example, consider the two constructions below:

invert

By slightly shrinking an interval, the single point that was in both yellow and cyan regions is now no longer in the yellow region. This results in the yellow-cyan edge in the nerve being removed. From a path perspective, the path from yellow to cyan goes from one edge to six edges.

Cover Filtrations

We construct a filtration given by a distance on covers.

Steinhaus Distance

Given a measure , the Steinhaus distance between two sets and is

The Steinhaus distance is bounded on and is based on the Jaccard distance. Indeed, since , the ratio of measures is between 0 and 1. In a sense, the Steinhaus distance measures “how little” the two sets intersect. When they don’t intersect at all, we get a distance of one. When , the distance is zero.

Naturally, this can be extended to a collection of sets:

When constructing the nerve of our cover, we keep track of the Steinhaus distance.

Steinhaus Nerve

The Steinhaus nerve of a cover , denoted , is the nerve of where each simplex is assigned a weight given by

We claim that whenever . Notice that if and is generated from the cover , then must be generated by the subcover , . Thus, when taking the unions and intersections:

As a result we get the filtration

where for .

This construction can have upto vertices and dimension . Computationally, we need to compute the volume of both intersections an unions of sets. If and are the computational cost of computing the volumes of unions and intersections, respectively, then each simplex costs . Resulting in a worst-case scenario of .

Stability

We focus on stability in terms of changes in the cover.

Bottleneck metric on covers

Let and be two finite covers of a finite set such that . Let be the set of all possible matchings. Then the bottleneck distance between two covers is defined as

where is the symmetric difference.

One may verify that is a metric on the space of finite covers of a finite set.

The following theorem, shows that for two covers of the same size, the Steinhaus nerve are interleaved, i.e.,

where and .

Thm

Suppose that and are two covers of with and . Given , and are interleaved filtrations.

The idea of the theorem is the following question: what is the largest change in Steinhaus distance possible between collections and ?

When , then there is some vertex not present in . As a result they cannot be interleaved.

Equivalence

Let be the cover of by balls of radius . Then the Cech complex is the nerve of this cover. The Cech filtration is the sequence of complexes given by letting .

Conjecture: Given finite and , the Cech filtration is isomorphic to the Steinhaus filtration constructed from under the Lebesgue measure.

The paper starts with proving there is a bijection between -skeletons in the 1-dimensional case. See daily entry 2024-03-29 for details. The bijection is given by

where is the Cech birth radius of the simplex defined by vertices and is the set of -balls centered at each vertex .

The paper then moves on to the 1-skeleton in any arbitrary dimension. Notice that if the 1-skeletons of the two constructions are isomorphic, then the cover filtration is isomorphic to the Vietoris-Rips filtration since VR is formed by initially finding the 1-skeleton of the Cech complex.

Lemma

The Vietoris-Rips filtration completely determines the cover filtration in arbitrary dimensions.

While the paper doesn’t provide a bijection between the two, they do provide a mapping from the birth time of edges in the Cech filtration to birth time of edges in the cover filtration. More specifically, this map is given by the Steinhaus distance (with Lebesgue measure) of two spheres in with radius that have a Euclidean distance of :

where is the volume of a hypersphere of radius and is the volume of intersection between two hyperspheres both with radius whose centers have distance . Both terms involve the gamma function and are messy to compute. The term was derived by Li in 2011.

The paper does not demonstrate that this mapping is bijective, and the existence of an inverse is not clear from the map. However, the existence of the map does suggest we can derive the cover filtration from the VR filtration. More precisely, once the VR filtration is constructed, we can construct the cover filtration by evaluating given above for each radius in the VR filtration.

While the conjecture was not definitely proven, experimental results are given that reinforce the validity of the claim:

center

The left most plot shows a uniform data set of 20,000 points with 50 sampled landmark points. The middle plot gives the persistence diagram of dimensions 0 and 1 for the VR filtration. Whereas the right plot gives an approximated persitence diagrams (dimensions 0 and 1) for the Steinhaus filtration based on the landmarks. The selected cover was balls of radii 0.5.

The two persistence diagrams have minimal differences, which is to be expected if the conjecture holds true as the Steinhaus filtration is approximated.

Stable Paths

We want a method to find “stable” paths in the 1-skeleton of a cover filtration. That is, paths with the least distance. That is, the Steinhaus distance. This differs from the shortest path which would be the path with fewest edges.

\rho-Stable path

Given a Steinhaus distance , a path is defined to -stable if

In other words, a path is -stable if each step along the path is at most far. Clearly, if then a -stable path is also -stable. Moreover, when , then we have a higher confidence that a -stable path is not due to noise than a less stable -stable path . We want to find the maximally stable paths between two vertices.

Maximally Stable Path

Given a pair of vertices and , a maximally stable s-t path is a -stable path between and for the smallest value of . In the case of multiple s-t paths with the same minimum -value, the shortest of the paths is used.

Finding the maximally stable path is a minimax path problem on undirected graphs. That is, we want to minimize the maximum edge weight. See daily note 2024-04-02 for more details.

There are two paths of relevant: the shortest path, and the most stable. We want to compare the two by computing the Pareto frontier between the two paths:

center

We repeatedly compute the shortest path (orange dots above) while keeping track of the cheapest known path and removing all edges with distance larger than .

Input: 1-skeleton G of cover filtration and vertices s,t

set LIST = [] // stores path P and cost p pairs
while s and t are connected in G:
    compute shortest path P between s and t
    set p = max edge distance in P

    if LIST has no path P' where |P|=|P'|:
        add (P,p) to LIST
    else if p < p' for path (P',p') in LIST where |P|=|P'|:
        replace (P',p') with (P,p) in LIST

    remove all edges e from G with distance >= p

Application: Recommendation Systems

Consider the following question: What movies should I show my friend first, to wean them into my favorite movie?