Authors: Kaleb Ruscitti and Leland McInnes

Full paper can be found at https://arxiv.org/abs/2410.03862.


Recall that in the typical application of Mapper, we first project the data onto the real line. We then cover the real-line with uniform size intervals whose size we call the resolution . Mapper is sensitive to the resolution, which determines the scale of topological features the Mapper can detect. Finding the choice for can be difficult apriori.

Typically the resolution is a global parameter that is fixed across the entire sample. The authors propose an improvement to Mapper by using a locally-varying choice of resolution. They provide a methodology of computing said variation from the data set.

Density Based Mapper

Herein, we denote by samples taken from some distribution on a topological space with known pairwise distances and Morse-type function .

Morse-type

A continuous real-valued function on a topological space is of Morse type if:

  1. There is a finite set , called the set of critical values, such that over every open interval there is a compact and locally connected space and a homeomorphism such that , where is the projection onto the second factor;
  2. , extends to a continuous function ;
  3. Each levelset has a finitely generated homology.

Kernel Perspective on Mapper

Let be a finite open cover of , then the pullback cover of is . We cluster each pullback and label the clusters . Then the Mapper graph has vertices

and edges

Let be the characteristic function of , i.e., if and only if . Then .

Thus, one may generalize the Mapper by defining new kernel functions .

center

For example, if , then the kernel function given by

gives .

Density-Sensitive Kerneled Covers

We wish to construct a Mapper which is sensitive to density information of .

f-kernel

An -kernel function centered at is a function such that:

  1. If then
  2. is continuous on the set .
  3. For a fixed , is monotone non-increasing in . [UNCLEAR]

For a fixed , we say that a choice of kernel and has sufficient width if .

For example, the function given by

is a -kernel function centered at . This kernel centered at 0 is drawn below:

center

Notice that if , then

i.e., for the kernel has sufficient width.

Let be a generic open maximal interval cover (gomic for short) of , i.e., a cover of open intervals where no more than two intervals intersect at a time and the overlap defined by

where is the Lebesgue measure satisfies for all .

For each interval let be its midpoint. We say a family of kernel functions has sufficient width relative to for threshold if centered at has sufficient width with radius for every .

Def

Given , and as above, the kerneled cover of associated to , with threshold , is the cover consisting of sets

for every .

Question: Are we defining the family of kernels as ?

Prp

Let be an open cover of consisting of open balls . Let be the kerneled cover for with threshold . If the have sufficient width relative to , then is an open cover of .

Proof: Since is continuous, is open. By the sufficient width condition, if then which implies that so . Hence,

Let be the distance in from to its th nearest neighbor in . We define to be a normalized sigmoid function,

where are the mean and standard deviation of respectively.

Prp

Suppose is the kernel function associated to the open ball . Then has sufficient width for threshold .

Proof: Assume that . Then since ,

This kernel was found by taking the Gaussian kernel and solving for a normalizing constant which guarantees sufficient width.

Def

Let be an open cover of a manifold , with Morse-type function . Let the resolution of be

When is the pullback under of an open cover for , this reduces to the resolution of the regular Mapper, .

Density-Based Mapper Algorithm

Let be a set of samples, and let denote the associated values of . Fix a clustering algorithm and parameters (number of nearest neighbors to find), (number of cover elements), (cover overlap) and an -kernel function . The algorithm has the following steps:

  1. Compute the inverse approximate Morse density for .
  2. Determine an open cover of intervals that have overlap .
  3. For each interval
    1. Compute the kernel for each and define .
    2. Run the clustering algorithm to assign each point in a cluster label .
    3. Add vertices to
  4. For ,
    1. Compute the intersection ,
    2. For each point , add weight to the edge between and .

To approximate the Morse density of at a point in step 1, we pick an appropriate open set around and compute the density of by using nearest neighbors, i.e., . To avoid division-by-zero problems, we use the inverse approximate Morse density . The algorithm below computes :

  1. For each , find the set of nearest neighbors . Let .
  2. For each , define
  3. Smooth by convolving with a window function , i.e.,

The author’s implementation uses a cosine window.

Convergence to the Reeb Graph

Recall that as the resolution goes to zero, the Mapper converges to the Reeb graph. The paper claims this property is preserved by the density-based Mapper.

Recall that the family , where , defines a filtration. If flip the interval we get another filtration where . Define ordered for all and . Define the extended filtration to have spaces

Applying the homology functor defines the extended persistence module .

Proposition 13

Suppose is a topological space and that is a Morse function. Then the endpoints of a persistence interval of only occur at critical values of .

If the critical points of are , then is the sequence

A zigzag persistence module is a generalization of persistence modules in which one allows some arrows to go backwards. For example, if is a Morse type function with critical values ordered in increasing order and for any set of values with , then the levelset zigzag persistence module is the sequence

where .

Mapper Graphs from Zigzag Modules

Let be a topological space with cover . This gives the zigzag module

center

where each , with , is given by inclusion. We can construct the Mapper graph associated to cover as follows:

  1. For each of the upper spaces, , we choose the basis consisting of the connected components of .
  2. For each of the lower spaces, , we choose the basis . Then the multinerve Mapper graph associated to the zig-zag module is a multigraph defined by vertex sets

and edge sets

To obtain the Mapper graph , one may take and collapse all parallel edges into single edges.


Assuming is Morse type, this module decomposes into interval modules:

We define a partial matching between persistence diagrams and as a subset such that if and then and if then . Let be the diagonal. Then the cost of is defined as

where where , otherwise . Then the bottleneck distance between and is defined by

which ranges over all partial matchings .