Original Paper: https://research.math.osu.edu/tgda/mapperPBG.pdf

Authors: Singh, Mémoli, Carlsson


The Mapper preserves the notion of nearness, but can distort large scale distances.

The Mapper begins with a data set and real-valued function used to produce a graph. But the method can easily be modified to deal with maps to other parameter spaces such as or . In the case the parameter space is , we get a stochastic version of the Reeb graph associated with . If the covering is too coarse, we get an image of the Reeb graph. If it is too fine, we get exactly the Reeb graph.

A key step of the Mapper is to apply standard clustering algorithms to subsets of the original data set, and then understand the interaction of the partial clusters.

The goal is to construct a low-dimensional image of the data that is easy to understand.

Construction

The Mapper is motivated by the following construction. Given a finite covering of a space , we define the nerve of the covering to be the simplicial complex whose vertex set is the indexing set , and where a family spans a -simplex if and only if . Given a partition of unity, one can obtain a map from to .

Partition of Unity

A partition of unity of a topological space is a set of continuous functions such that for all

  1. there is a neighborhood of where all but a finite number of functions in are 0, and
  2. .

A partition of unity subordinate the finite open covering is partition of unity where the closure of the set is contained in the open set . Recall that if are the vertices of a simplex, then the points in the simplex correspond to the set of ordered -tuples with and . We call the numbers the barycentric coordinates.

For any point , we let be the set of all so that . We define to the point in the simplex spanned by the vertices , whose barycentric coordinates are given by our partition of unity. The map can be shown to be continuous and provides a coordinatization of .

In simpler terms, we get a map which maps each point to a point inside the -simplex whose vertices are defined by our partition of unity maps that have overlapping covers.

Suppose is continuous and we have a covering of . Since is continuous, the pullback forms an open cover of . For each we decompose into its path connected components . Giving a new cover of consisting of all the path connected components.

Multiresolution Structure

Given two coverings and of a space , a map of coverings from to is a function so that for all we have .

Example: For and the sets for and for form open coverings and of . Define the map by . Then induces a map of coverings whenever .

Given a map of coverings from to there is an induced map of simplicial complexes acting on the vertices by . That is, given a space equipped with a function and a map of coverings , then there is a corresponding map of coverings .

Implementation

We assume that the point cloud contains points and that we a function , which we call a filter, whose value is known for all data points. We also assume we can compute inter-point distances for the points in the cloud. More specifically, we should be able to construct a distance matrix of inter-point distances.

  1. We first find the range of the function () restricted to the given points.
  2. To find a covering of the given data, we divide this range into a set of smaller intervals () which overlap. This can be done by controlling two parameters: resolution (length of intervals) and gain percentage overlap of two successive intervals.
  3. For each we find the set of points.
  4. For each set we find clusters .
  5. We treat each cluster as a vertex and draw an edge between vertices whenever .

Clustering

The choice of clustering algorithm affects the outcome of the Mapper. The paper lists the following desired characteristics of the clustering algorithm:

  1. Takes the inter-point distance matrix () as an input.
  2. Do not require specifying the number of clusters beforehand.

The paper chooses to use single-linkage clustering. KeplerMapper defaults to DBSCAN.