Authors: Kaleb Ruscitti and Leland McInnes
Full paper can be found at https://arxiv.org/abs/2410.03862.
Recall that in the typical application of Mapper, we first project the data onto the real line. We then cover the real-line with uniform size intervals whose size we call the resolution . Mapper is sensitive to the resolution, which determines the scale of topological features the Mapper can detect. Finding the choice for can be difficult apriori.
Typically the resolution is a global parameter that is fixed across the entire sample. The authors propose an improvement to Mapper by using a locally-varying choice of resolution. They provide a methodology of computing said variation from the data set.
Density Based Mapper
Herein, we denote by samples taken from some distribution on a topological space with known pairwise distances and Morse-type function .
Morse-type
A continuous real-valued function on a topological space is of Morse type if:
- There is a finite set , called the set of critical values, such that over every open interval there is a compact and locally connected space and a homeomorphism such that , where is the projection onto the second factor;
- , extends to a continuous function ;
- Each levelset has a finitely generated homology.
Kernel Perspective on Mapper
Let be a finite open cover of , then the pullback cover of is . We cluster each pullback and label the clusters . Then the Mapper graph has vertices
and edges
Let be the characteristic function of , i.e., if and only if . Then .
Thus, one may generalize the Mapper by defining new kernel functions .
For example, if , then the kernel function given by
gives .
Density-Sensitive Kerneled Covers
We wish to construct a Mapper which is sensitive to density information of .
f-kernel
An -kernel function centered at is a function such that:
- If then
- is continuous on the set .
- For a fixed , is monotone non-increasing in . [UNCLEAR]
For a fixed , we say that a choice of kernel and has sufficient width if .
For example, the function given by
is a -kernel function centered at . This kernel centered at 0 is drawn below:
Notice that if , then
i.e., for the kernel has sufficient width.
Let be a generic open maximal interval cover (gomic for short) of , i.e., a cover of open intervals where no more than two intervals intersect at a time and the overlap defined by
where is the Lebesgue measure satisfies for all .
For each interval let be its midpoint. We say a family of kernel functions has sufficient width relative to for threshold if centered at has sufficient width with radius for every .
Def
Given , and as above, the kerneled cover of associated to , with threshold , is the cover consisting of sets
for every .
Question: Are we defining the family of kernels as ?
Prp
Let be an open cover of consisting of open balls . Let be the kerneled cover for with threshold . If the have sufficient width relative to , then is an open cover of .
Proof: Since is continuous, is open. By the sufficient width condition, if then which implies that so . Hence,
Let be the distance in from to its th nearest neighbor in . We define to be a normalized sigmoid function,
where are the mean and standard deviation of respectively.
Prp
Suppose is the kernel function associated to the open ball . Then has sufficient width for threshold .
Proof: Assume that . Then since ,
This kernel was found by taking the Gaussian kernel and solving for a normalizing constant which guarantees sufficient width.
Def
Let be an open cover of a manifold , with Morse-type function . Let the resolution of be
When is the pullback under of an open cover for , this reduces to the resolution of the regular Mapper, .
Density-Based Mapper Algorithm
Let be a set of samples, and let denote the associated values of . Fix a clustering algorithm and parameters (number of nearest neighbors to find), (number of cover elements), (cover overlap) and an -kernel function . The algorithm has the following steps:
- Compute the inverse approximate Morse density for .
- Determine an open cover of intervals that have overlap .
- For each interval
- Compute the kernel for each and define .
- Run the clustering algorithm to assign each point in a cluster label .
- Add vertices to
- For ,
- Compute the intersection ,
- For each point , add weight to the edge between and .
To approximate the Morse density of at a point in step 1, we pick an appropriate open set around and compute the density of by using nearest neighbors, i.e., . To avoid division-by-zero problems, we use the inverse approximate Morse density . The algorithm below computes :
- For each , find the set of nearest neighbors . Let .
- For each , define
- Smooth by convolving with a window function , i.e.,
The author’s implementation uses a cosine window.
Convergence to the Reeb Graph
Recall that as the resolution goes to zero, the Mapper converges to the Reeb graph. The paper claims this property is preserved by the density-based Mapper.
Recall that the family , where , defines a filtration. If flip the interval we get another filtration where . Define ordered for all and . Define the extended filtration to have spaces
Applying the homology functor defines the extended persistence module .
Proposition 13
Suppose is a topological space and that is a Morse function. Then the endpoints of a persistence interval of only occur at critical values of .
If the critical points of are , then is the sequence
A zigzag persistence module is a generalization of persistence modules in which one allows some arrows to go backwards. For example, if is a Morse type function with critical values ordered in increasing order and for any set of values with , then the levelset zigzag persistence module is the sequence
where .
Mapper Graphs from Zigzag Modules
Let be a topological space with cover . This gives the zigzag module
where each , with , is given by inclusion. We can construct the Mapper graph associated to cover as follows:
- For each of the upper spaces, , we choose the basis consisting of the connected components of .
- For each of the lower spaces, , we choose the basis . Then the multinerve Mapper graph associated to the zig-zag module is a multigraph defined by vertex sets
and edge sets
To obtain the Mapper graph , one may take and collapse all parallel edges into single edges.
Assuming is Morse type, this module decomposes into interval modules:
We define a partial matching between persistence diagrams and as a subset such that if and then and if then . Let be the diagonal. Then the cost of is defined as
where where , otherwise . Then the bottleneck distance between and is defined by
which ranges over all partial matchings .