Original Paper: https://research.math.osu.edu/tgda/mapperPBG.pdf

Authors: Singh, Mémoli, Carlsson

The Mapper preserves the notion of nearness, but can distort large scale distances.

The Mapper begins with a data set $X$ and a real-valued function $f : X \to R$, which are used to produce a graph. The method can easily be modified to deal with maps to other parameter spaces such as $R^{2}$ or $S^{1}$. When the parameter space is $R$, the construction amounts to a stochastic version of the Reeb graph associated with $f$. If the covering of $R$ is too coarse, we get an image of the Reeb graph. If it is fine enough, we recover the Reeb graph exactly.

A key step of the Mapper is to apply standard clustering algorithms to subsets of the original data set, and then understand the interaction of the partial clusters.

The goal is to construct a low-dimensional image of the data that is easy to understand.

Construction

The Mapper is motivated by the following construction. Given a finite covering $U = \{U_{α}\}_{α \in A}$ of a space $X$, we define the nerve of the covering $U$ to be the simplicial complex $N (U)$ whose vertex set is the indexing set $A$, and where a family $\{α_{0}, α_{1}, \dots, α_{k}\}$ spans a $k$-simplex if and only if $\bigcap_{i = 0}^{k} U_{α_{i}} \neq \emptyset$. Given a partition of unity subordinate to $U$, one can obtain a map from $X$ to $N (U)$.
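The nerve definition can be checked on a small finite example. The following is a minimal sketch, assuming each covering set is given as a finite, non-empty Python set; the function name `nerve` and the `max_dim` parameter are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Simplices of the nerve of a finite cover, up to dimension max_dim.

    `cover` maps each index alpha to the finite set U_alpha.
    A family of k + 1 indices spans a k-simplex iff the sets share a point.
    """
    labels = sorted(cover)
    simplices = []
    for k in range(max_dim + 1):
        for family in combinations(labels, k + 1):
            common = set.intersection(*(set(cover[a]) for a in family))
            if common:  # non-empty common intersection
                simplices.append(family)
    return simplices

# Three sets that overlap pairwise but have no common point.
example_cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 1}}
example_nerve = nerve(example_cover)
print(example_nerve)
```

Here the three sets overlap pairwise but not all at once, so the nerve has three vertices and three edges but no 2-simplex: a hollow triangle.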

Partition of Unity

A partition of unity of a topological space $X$ is a set $R$ of continuous functions $ρ : X \to [0, 1]$ such that for all $x \in X$

there is a neighborhood of $x$ where all but a finite number of functions in $R$ are 0, and

$\sum_{ρ \in R} ρ (x) = 1$ .

A partition of unity subordinate to the finite open covering $U$ is a partition of unity $\{ρ_{α}\}_{α \in A}$, with one function per covering set, such that the closure of the set $\{x \in X ∣ ρ_{α} (x) > 0\}$ is contained in the open set $U_{α}$. Recall that if $\{v_{0}, v_{1}, \dots, v_{k}\}$ are the vertices of a simplex, then the points $v$ in the simplex correspond to the ordered $(k + 1)$-tuples $(r_{0}, r_{1}, \dots, r_{k})$ with $0 \leq r_{i} \leq 1$ and $\sum_{i = 0}^{k} r_{i} = 1$. We call the numbers $r_{i}$ the barycentric coordinates of $v$.

For any point $x \in X$, we let $T (x) \subset A$ be the set of all $α \in A$ such that $x \in U_{α}$. We define $ρ (x) \in N (U)$ to be the point in the simplex spanned by the vertices $α \in T (x)$ whose barycentric coordinates are the values $ρ_{α} (x)$ of the partition of unity at $x$. The map $ρ$ can be shown to be continuous and provides a coordinatization of $X$.

In simpler terms, we get a map $ρ : X \to N (U)$ that sends each point $x \in X$ into the simplex $σ$ whose vertices are the covering sets containing $x$. The weight that $ρ (x)$ places on the vertex $α$ is $ρ_{α} (x)$.
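To make the map into the nerve concrete, here is a small sketch for a cover of $X = [0, 3]$ by three open intervals. The tent-shaped bump functions, the `margin` parameter and the name `barycentric_map` are assumptions made for illustration. The margin shrinks each support so that its closure stays inside the open interval, and the sketch assumes the shrunken intervals still cover $X$.

```python
def barycentric_map(x, cover, margin=0.25):
    """Barycentric coordinates of the image of x in the nerve.

    `cover` maps alpha to an open interval (lo, hi). Each rho_alpha is a tent
    function supported on [lo + margin, hi - margin], normalised to sum to 1.
    Only vertices with non-zero weight are returned.
    """
    weights = {}
    for alpha, (lo, hi) in cover.items():
        bump = min(x - (lo + margin), (hi - margin) - x)
        if bump > 0:
            weights[alpha] = bump
    total = sum(weights.values())
    return {alpha: w / total for alpha, w in weights.items()}

interval_cover_of_X = {0: (-0.5, 1.5), 1: (0.5, 2.5), 2: (1.5, 3.5)}
print(barycentric_map(0.5, interval_cover_of_X))  # inside one set: a vertex
print(barycentric_map(1.0, interval_cover_of_X))  # inside two sets: a point on an edge
```

A point lying in only one covering set lands on a vertex of the nerve, while a point in an overlap lands in the interior of the edge joining the two corresponding vertices.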

Suppose $f : X \to Z$ is continuous and we have an open covering $U = \{U_{α}\}_{α \in A}$ of $Z$. Since $f$ is continuous, the preimages $f^{- 1} (U_{α})$ form an open cover of $X$. For each $α$ we decompose $f^{- 1} (U_{α})$ into its path connected components $V (α, 1), \dots, V (α, j_{α})$, where $j_{α}$ is the number of components. This gives a new cover $\overline{U}$ of $X$ consisting of all the path connected components.
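A discrete analogue of this refinement can be sketched as follows. As an assumption for illustration, $X$ is replaced by a cycle graph on 8 vertices (a coarse circle), $f$ by a height function on the vertices, and path connectivity by graph connectivity. The names `components`, `cover_Z` and `refined` are hypothetical.

```python
def components(vertices, adjacency):
    """Connected components of the subgraph induced on `vertices`."""
    vertices = set(vertices)
    seen, result = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(w for w in adjacency[v] if w in vertices and w not in comp)
        seen |= comp
        result.append(comp)
    return result

# A circle sampled as an 8-cycle, with f the "height" of each vertex.
adjacency = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
f = [0, 1, 2, 3, 4, 3, 2, 1]
cover_Z = {"low": (-1, 1.5), "mid": (0.5, 3.5), "high": (2.5, 5)}

refined = {}
for alpha, (lo, hi) in cover_Z.items():
    preimage = [v for v in adjacency if lo < f[v] < hi]  # f^{-1}(U_alpha)
    for j, comp in enumerate(components(preimage, adjacency)):
        refined[(alpha, j)] = comp  # plays the role of V(alpha, j)

print(refined)
```

The preimage of the middle interval splits into two components, one on each side of the circle, so the refined cover has four sets whose overlaps form a 4-cycle.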

Multiresolution Structure

Given two coverings $U = \{U_{α}\}_{α \in A}$ and $V = \{V_{β}\}_{β \in B}$ of a space $X$, a map of coverings from $U$ to $V$ is a function $f : A \to B$ such that for all $α \in A$ we have $U_{α} \subset V_{f (α)}$.

Example: For $X = [0, 2 N]$ and $ϵ > 0$, the sets $I_{l}^{ϵ} = (l - ϵ, l + 1 + ϵ) \cap X$ for $l = 0, 1, \dots, 2 N - 1$ and $J_{m}^{ϵ} = (2 m - ϵ, 2 m + 2 + ϵ) \cap X$ for $m = 0, 1, \dots, N - 1$ form open coverings $I_{ϵ}$ and $J_{ϵ}$ of $X$. Define the map $f : \{0, 1, \dots, 2 N - 1\} \to \{0, 1, \dots, N - 1\}$ by $f (l) = ⌊ l / 2 ⌋$. Then $f$ induces a map of coverings $I_{ϵ} \to J_{ϵ'}$ whenever $ϵ \leq ϵ'$.
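The inclusion $I_{l}^{ϵ} \subset J_{f (l)}^{ϵ'}$ in this example can be spot-checked numerically. The sketch below tests membership on a finite grid of sample points in $X$, so it is a sanity check rather than a proof. The function names and the grid size are assumptions made for illustration.

```python
def in_I(x, l, eps, N):
    """Is x in I_l^eps = (l - eps, l + 1 + eps) intersected with [0, 2N]?"""
    return 0 <= x <= 2 * N and l - eps < x < l + 1 + eps

def in_J(x, m, eps, N):
    """Is x in J_m^eps = (2m - eps, 2m + 2 + eps) intersected with [0, 2N]?"""
    return 0 <= x <= 2 * N and 2 * m - eps < x < 2 * m + 2 + eps

def is_map_of_coverings(N, eps, eps_prime, steps=200):
    """Check on a grid that every sampled point of I_l lies in J_{l // 2}."""
    grid = [2 * N * i / steps for i in range(steps + 1)]
    return all(
        in_J(x, l // 2, eps_prime, N)
        for l in range(2 * N)
        for x in grid
        if in_I(x, l, eps, N)
    )

print(is_map_of_coverings(N=3, eps=0.1, eps_prime=0.2))  # eps <= eps'
print(is_map_of_coverings(N=3, eps=0.3, eps_prime=0.1))  # eps > eps'
```

With $ϵ > ϵ'$ the check fails, for instance because $I_{1}^{ϵ}$ extends past the right end of $J_{0}^{ϵ'}$.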

Given a map of coverings $f : A \to B$ from $U$ to $V$, there is an induced map of simplicial complexes $N (f) : N (U) \to N (V)$ acting on the vertices by $f$. Consequently, given a space $X$ equipped with a function $f : X \to Z$ and a map of coverings $U \to V$ of $Z$, there is a corresponding map of coverings $\overline{U} \to \overline{V}$ of $X$.

Implementation

We assume that the point cloud contains $N$ points $x \in X$ and that we have a function $f : X \to R$, which we call a filter, whose value is known for all $N$ data points. We also assume we can compute inter-point distances for the points in the cloud. More specifically, we should be able to construct a distance matrix of inter-point distances.

We first find the range of the function ( $I$ ) restricted to the given points.
To find a covering of the given data, we divide this range into a set of smaller, overlapping intervals ( $S$ ). This is controlled by two parameters: the resolution (the length of the intervals) and the gain (the percentage overlap of two successive intervals).
For each $I_{j} \in S$ we find the set of points $X_{j} = \{x ∣ f (x) \in I_{j}\}$.
For each set $X_{j}$ we find clusters $\{X_{j k}\}$.
We treat each cluster as a vertex and draw an edge between two vertices whenever $X_{j k} \cap X_{l m} \neq \emptyset$.
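The steps above can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the cover is parameterised by a number of intervals and an overlap fraction, the clusterer is a stand-in that joins points closer than a fixed `cutoff`, and all function names are hypothetical.

```python
import math

def interval_cover(lo, hi, n_intervals, overlap):
    """n equal-length intervals covering [lo, hi]; neighbours share the
    fraction `overlap` (0 <= overlap < 1) of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + j * step, lo + j * step + length) for j in range(n_intervals)]

def threshold_clusters(indices, points, cutoff):
    """Stand-in clusterer: connected components of the graph that joins
    points closer than `cutoff`."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) < cutoff}
            remaining -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters

def mapper_graph(points, filter_values, n_intervals, overlap, cutoff, tol=1e-9):
    lo, hi = min(filter_values), max(filter_values)             # range I of the filter
    vertices = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):   # overlapping intervals S
        # tol guards against rounding at the interval endpoints
        X_j = [i for i, v in enumerate(filter_values) if a - tol <= v <= b + tol]
        vertices.extend(threshold_clusters(X_j, points, cutoff))  # clusters X_jk
    edges = {(u, v)
             for u in range(len(vertices)) for v in range(u + 1, len(vertices))
             if vertices[u] & vertices[v]}                      # clusters sharing a point
    return vertices, edges

# 24 points on the unit circle, filtered by height (the y-coordinate).
circle = [(math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24)) for k in range(24)]
heights = [p[1] for p in circle]
vertices, edges = mapper_graph(circle, heights, n_intervals=3, overlap=0.4, cutoff=0.3)
print(len(vertices), len(edges))
```

On this sampled circle the middle interval splits into a left and a right cluster, so the resulting graph is a 4-cycle, which reflects the loop in the data.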

Clustering

The choice of clustering algorithm affects the outcome of the Mapper. The paper lists the following desired characteristics of the clustering algorithm:

Takes the inter-point distance matrix ( $D \in R^{N \times N}$ ) as an input.
Does not require specifying the number of clusters beforehand.

The paper chooses to use single-linkage clustering. KeplerMapper defaults to DBSCAN.
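A naive single-linkage procedure that meets both criteria can be sketched as follows. It takes only a distance matrix and stops merging once the closest pair of clusters is farther apart than a cutoff. The fixed `cutoff` argument is an assumption made here for illustration, and the quadratic search per merge is chosen for clarity rather than speed.

```python
def single_linkage(D, cutoff):
    """Agglomerative single-linkage clustering on a distance matrix.

    D is a symmetric N x N list of lists. Returns the clusters (sets of
    indices) and the list of distances at which merges happened.
    """
    clusters = [{i} for i in range(len(D))]
    merge_heights = []
    while len(clusters) > 1:
        # Single-linkage distance between two clusters is the smallest
        # inter-point distance between them.
        d, a, b = min(
            (min(D[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > cutoff:
            break
        merge_heights.append(d)
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters, merge_heights

# Five points on a line, described only through their pairwise distances.
xs = [0, 1, 3, 50, 52]
D = [[abs(p - q) for q in xs] for p in xs]
clusters, merge_heights = single_linkage(D, cutoff=10)
print(clusters, merge_heights)
```

The number of clusters is never specified: it falls out of where the merge distances jump, here from 2 to 47.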


Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition
