Authors: Wako Bungula and Isabel Darcy

Full paper can be found at https://arxiv.org/abs/2409.17360.

Given a topological space $X$ equipped with a continuous function $f : X \to \mathbb{R}$, a filtration of covers of $\mathbb{R}$ gives rise to a filtration of covers of $X$, which in turn gives rise to a filtration of Mapper graphs. This filtration of Mapper graphs induces a filtration of homology groups. The paper, however, is concerned with point cloud data rather than with a topological space.

Filtrations via Nerves of a Cover

Def

Given a family of covers $U = \{U_{λ}\}$ indexed by a parameter $λ$ and equipped with a family of maps $\{u^{λ_{i}, λ_{j}} : U_{λ_{i}} \to U_{λ_{j}} \mid λ_{i} \leq λ_{j}\}$ such that $u^{λ_{i}, λ_{j}}(U) = V$ implies $U \subseteq V$, we say there is a filtration of covers if $u^{λ_{j}, λ_{k}} \circ u^{λ_{i}, λ_{j}} = u^{λ_{i}, λ_{k}}$ for all $λ_{i} \leq λ_{j} \leq λ_{k}$.

Let $X$ be a topological space with $N$ covers $\{U^{i}\}_{i = 1}^{N} = \{\{U_{α}^{i}\}_{α \in A_{i}}\}_{i = 1}^{N}$, and let $h$ be the family of functions

$$\dots \to A_{i} \xrightarrow{h^{i, j}} A_{j} \xrightarrow{h^{j, k}} A_{k} \to \dots$$

where $i \leq j \leq k$, such that for $α \in A_{i}$, $h^{i, j}(α) = β$ implies $U_{α}^{i} \subseteq U_{β}^{j}$, and $h^{j, k} \circ h^{i, j} = h^{i, k}$. Then $h$ induces a filtration of simplicial maps

$$\dots \to \mathrm{Nrv}(U^{i}) \xrightarrow{\mathrm{Nrv}(h^{i, j})} \mathrm{Nrv}(U^{j}) \xrightarrow{\mathrm{Nrv}(h^{j, k})} \mathrm{Nrv}(U^{k}) \to \dots$$

where $\mathrm{Nrv}(h^{i, j})$ is the simplicial map defined on the vertex set of $\mathrm{Nrv}(U^{i})$ by sending the vertex $v_{α}$ corresponding to $U_{α}^{i}$ to the vertex $v_{h^{i, j}(α)}$ corresponding to $U_{h^{i, j}(α)}^{j}$. Thus, if $[v_{α_{0}}, v_{α_{1}}, \dots, v_{α_{n}}] \in \mathrm{Nrv}(U^{i})$ then $[v_{h^{i, j}(α_{0})}, \dots, v_{h^{i, j}(α_{n})}] \in \mathrm{Nrv}(U^{j})$.
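The construction above can be checked on a small example. The following is a minimal sketch, assuming finite covers stored as Python sets; the covers `U_i`, `U_j`, the index map `h`, and the helper names `nerve` and `push_forward` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Return the simplices of the nerve as frozensets of cover indices.

    `cover` maps an index to a set of points; a set of indices spans a
    simplex when the corresponding cover elements share at least one point.
    """
    simplices = set()
    keys = sorted(cover)
    for size in range(1, max_dim + 2):
        for idx in combinations(keys, size):
            if set.intersection(*(cover[a] for a in idx)):
                simplices.add(frozenset(idx))
    return simplices

def push_forward(simplices, h):
    """Apply the vertex map h to every simplex (images may drop in dimension)."""
    return {frozenset(h[a] for a in s) for s in simplices}

# Hypothetical toy covers of the point set {0, ..., 5}.
U_i = {"a": {0, 1, 2}, "b": {2, 3}, "c": {4, 5}}
U_j = {"A": {0, 1, 2, 3}, "B": {3, 4, 5}}
h = {"a": "A", "b": "A", "c": "B"}  # requires U_i[x] to be a subset of U_j[h[x]]

# The containment condition from the definition of h.
assert all(U_i[a] <= U_j[h[a]] for a in U_i)

# Every simplex of Nrv(U_i) must land on a simplex of Nrv(U_j).
image = push_forward(nerve(U_i), h)
is_simplicial = image <= nerve(U_j)
```

Here the edge between `a` and `b` collapses to the single vertex `A`, which is allowed for a simplicial map.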

Theorem 1

A filtration of covers induces a well-defined filtration of simplicial complexes.

Theorem 2

A filtration of simplicial complexes induces a filtration of homology groups.

Filtration of Mapper Graphs

The traditional Mapper depends on several parameters such as bin size and percent overlap. Recall that in the Mapper we make use of a filter function $f : X \to Z$ and form a cover of $Z$, which is then pulled back along $f$ to create a cover of $X$. Each set in the pullback cover is then clustered, and the resulting clusters form the cluster cover.
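The pipeline just described can be sketched in a few lines. This is only an illustration under stated assumptions: the data is one-dimensional, the cover consists of closed intervals, and clustering is single-linkage with a fixed cutoff (the function names `pullback`, `single_linkage_1d`, and `mapper_graph` are hypothetical, not taken from the paper or from any library).

```python
def pullback(points, f, intervals):
    """Pull each interval (lo, hi) back to the points whose filter value lies in it."""
    return [[x for x in points if lo <= f(x) <= hi] for lo, hi in intervals]

def single_linkage_1d(values, cutoff):
    """Split sorted 1D values wherever consecutive points are more than `cutoff` apart."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= cutoff:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def mapper_graph(points, f, intervals, cutoff):
    """Nodes are clusters of the pullback sets; edges join clusters that share a point."""
    nodes = [frozenset(c)
             for bin_points in pullback(points, f, intervals)
             for c in single_linkage_1d(bin_points, cutoff)]
    edges = {(i, j)
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}
    return nodes, edges

# Hypothetical data with the identity filter and two overlapping bins.
X = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0]
nodes, edges = mapper_graph(X, lambda x: x, [(0.0, 5.0), (3.0, 8.0)], cutoff=1.5)
```

The list `nodes` is the cluster cover, and the graph `(nodes, edges)` is the 1-skeleton of its nerve.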

While a filtration of covers of $Z$ induces a filtration of pullback covers of $X$, it is not clear that it induces a filtration of cluster covers. The paper claims there are instances in which we do not get a filtration of cluster covers. Clearly, a filtration of cluster covers induces a filtration of their nerves and hence of their homology groups. Thus, we only need to check when a filtration of covers induces a filtration of cluster covers.


Remark: This is all for the filtration as defined above (i.e., the Multiscale mapper). We can still construct a filtration starting with the cluster covers (see Steinhaus Filtration and Stable Paths in the Mapper).

No Filtration of Cluster Covers: Complete/Average-Linkage, bin size.

Two common clustering algorithms used in Mapper are single-linkage and DBSCAN, as they provide the correct connected components of a dataset. However, two points far from each other may be put in the same cluster if there is a chain of points connecting them.

The authors construct an example in which the Mapper parameterized by bin size fails to produce a filtration. More specifically, they take a dataset $X \subset \mathbb{R}$ and the filter function $f(x) = x$ and construct two covers of $f(X)$:

Two intervals $I_{1}$ and $I_{2}$ with 20% overlap
Two intervals $J_{1}$ and $J_{2}$ with 50% overlap

Note that $f^{-1}(I_{1}) \subseteq f^{-1}(J_{1})$ and $f^{-1}(I_{2}) \subseteq f^{-1}(J_{2})$.

This forms a simple filtration of pullback covers of the dataset $X$:

$$\{f^{-1}(I_{1}), f^{-1}(I_{2})\} \to \{f^{-1}(J_{1}), f^{-1}(J_{2})\}.$$

They then apply single-linkage clustering to each set in these covers. After clustering, there is a cluster of $f^{-1}(I_{2})$ that is not contained in any cluster of $f^{-1}(J_{2})$. In other words, a filtration of pullback covers parameterized by bin size fails to induce a filtration of cluster covers.

[Figure: Mapper graphs for the covers $\{I_{1}, I_{2}\}$ (B) and $\{J_{1}, J_{2}\}$ (C).]

Notice that the cluster $\{8.4, 10.2\}$ in the Mapper graph corresponding to $I_{1}, I_{2}$ (B) is lost in the Mapper graph corresponding to $J_{1}, J_{2}$ (C).
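Whether two cluster covers fit into a filtration can be tested mechanically: every cluster of the finer cover must sit inside some cluster of the coarser one. The sketch below assumes clusters are stored as frozensets; the helper `unmatched_clusters` is hypothetical, and the cluster lists are made-up values (only the set $\{8.4, 10.2\}$ is taken from the text, the rest is not the paper's data).

```python
def unmatched_clusters(fine, coarse):
    """Return the clusters in `fine` that are not contained in any cluster of `coarse`."""
    return [c for c in fine if not any(c <= d for d in coarse)]

# Hypothetical cluster covers; only {8.4, 10.2} comes from the notes above.
clusters_I = [frozenset({8.4, 10.2}), frozenset({11.5, 12.0})]
clusters_J = [frozenset({7.0, 8.4}), frozenset({10.2, 11.5, 12.0})]

bad = unmatched_clusters(clusters_I, clusters_J)
is_filtration = not bad  # False here: {8.4, 10.2} is split across two clusters
```

An empty result means an inclusion map between the two cluster covers exists; a non-empty result lists the clusters that break it.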

DBSCAN

The general idea of DBSCAN is to group densely packed points into clusters and to treat points in low-density regions as noise. DBSCAN is configured by two parameters: $\mathrm{MinPts} \in \mathbb{Z}$ and a radius $ϵ \in \mathbb{R}$. The $ϵ$-neighborhood of a point $p$ is considered dense if it contains at least $\mathrm{MinPts}$ points, i.e.,

$$\lvert N_{ϵ}(p) \rvert = \lvert \{q \in X \mid d(p, q) \leq ϵ\} \rvert \geq \mathrm{MinPts}.$$

KeplerMapper defaults to $ϵ = 0.5$ and $\mathrm{MinPts} = 3$.

A point $p$ is a core point if $\lvert N_{ϵ}(p) \rvert \geq \mathrm{MinPts}$.
A point $q$ is a border point if $\lvert N_{ϵ}(q) \rvert < \mathrm{MinPts}$ and there is a core point $p$ such that $q \in N_{ϵ}(p)$.
A point $r$ is noise if $r$ is neither a core point nor a border point.
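The three cases translate directly into code. The following is a minimal brute-force sketch, assuming points are tuples of floats and the Euclidean metric; the names `neighborhood` and `classify` and the sample data are hypothetical.

```python
import math

def neighborhood(points, p, eps):
    """The eps-neighborhood of p (it always contains p itself)."""
    return [q for q in points if math.dist(p, q) <= eps]

def classify(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    core = {p for p in points if len(neighborhood(points, p, eps)) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighborhood(points, p, eps)):
            labels[p] = "border"   # not dense itself, but within eps of a core point
        else:
            labels[p] = "noise"
    return labels

# Hypothetical 1D data, using the eps = 0.5, MinPts = 3 values quoted above.
pts = [(0.0,), (0.4,), (0.8,), (1.2,), (5.0,)]
labels = classify(pts, eps=0.5, min_pts=3)
```

In this example the two middle points are core, the two end points of the chain are border points, and the isolated point is noise.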


Def

A point $q$ is directly density-reachable from a point $p$ w.r.t. $ϵ$ and $\mathrm{MinPts}$ if $p$ is a core point and $q \in N_{ϵ}(p)$.

Def

A point $q$ is density-reachable from a point $p$ w.r.t. $ϵ$ and $\mathrm{MinPts}$ if there is a sequence of points $p = α_{1}, α_{2}, \dots, α_{n} = q$ such that $α_{i + 1}$ is directly density-reachable from $α_{i}$.

Def

A point $p$ is density-connected to a point $q$ w.r.t. $ϵ$ and $\mathrm{MinPts}$ if there is a point $o$ such that both $p$ and $q$ are density-reachable from $o$ w.r.t. $ϵ$ and $\mathrm{MinPts}$.

Question: Why the distinction between density-reachable and density-connected? It seems like $q$ being density-reachable from $p$ implies that $p$ and $q$ are density-connected.

Answer: Density-reachability is not a symmetric relation. For example, in the dataset below

[Figure: a dataset with core points $p$ and $q$ and a non-core point $s$.]

the point $s$ is (directly) density-reachable from $p$ and from $q$, but neither $p$ nor $q$ is density-reachable from $s$, because $s$ is not a core point ($\lvert N_{ϵ}(s) \rvert < \mathrm{MinPts}$). However, $p$ is density-connected to $s$ because there is a point $o = p$ such that both $p$ and $s$ are density-reachable from $o$.
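The asymmetry is easy to reproduce. The sketch below is one possible brute-force implementation of the two relations, assuming tuple-valued points and the Euclidean metric; the function names and the three-point dataset are hypothetical and are not the dataset from the figure.

```python
import math
from collections import deque

def density_reachable_from(points, p, eps, min_pts):
    """Return every point density-reachable from p (empty set if p is not core)."""
    def nbhd(x):
        return [y for y in points if math.dist(x, y) <= eps]

    def is_core(x):
        return len(nbhd(x)) >= min_pts

    if not is_core(p):
        return set()
    reached, queue = {p}, deque([p])
    while queue:
        x = queue.popleft()
        for y in nbhd(x):
            if y not in reached:
                reached.add(y)
                if is_core(y):      # only core points can extend the chain
                    queue.append(y)
    return reached

def density_connected(points, a, b, eps, min_pts):
    """True if some point o has both a and b density-reachable from it."""
    return any({a, b} <= density_reachable_from(points, o, eps, min_pts)
               for o in points)

# Hypothetical data: p is core, s is a border point of p.
pts = [(-1.0,), (0.0,), (1.0,)]
p, s = (0.0,), (1.0,)
from_p = density_reachable_from(pts, p, eps=1.0, min_pts=3)
from_s = density_reachable_from(pts, s, eps=1.0, min_pts=3)
```

Here `s` lies in `from_p` while `from_s` is empty, yet `density_connected(pts, p, s, 1.0, 3)` holds with `o = p`.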

Def

A dataset $X$ is partitioned as $X = C_{1} ⊔ C_{2} ⊔ \dots ⊔ C_{n} ⊔ N$, where $N$ is the set of noise points and each $C_{i}$ is a cluster w.r.t. $ϵ$ and $\mathrm{MinPts}$ satisfying:

(Maximality) $\forall p, q$, if $p \in C_{i}$, $q \notin \bigcup_{j = 1}^{i - 1} C_{j}$, and $q$ is density-reachable from $p$ w.r.t. $ϵ$ and $\mathrm{MinPts}$, then $q \in C_{i}$.

(Connectivity) $\forall p, q \in C_{i}$, $p$ is density-connected to $q$ w.r.t. $ϵ$ and $\mathrm{MinPts}$.

In summary, a DBSCAN cluster of a dataset $X$ is a set of points $C_{i}$ such that any two points $p, q \in C_{i}$ are density-connected and there is no larger such set.

The paper constructs an example in which the cluster assignment of a point depends on the order in which the data is listed.

Def

A point $s$ is a free-border point w.r.t. $ϵ$ and $\mathrm{MinPts}$ if there exist two points $p$ and $q$ such that

$s \in N_{ϵ}(p)$ and $\lvert N_{ϵ}(p) \rvert \geq \mathrm{MinPts}$,

$s \in N_{ϵ}(q)$ and $\lvert N_{ϵ}(q) \rvert \geq \mathrm{MinPts}$,

$\lvert N_{ϵ}(s) \rvert < \mathrm{MinPts}$, and

$p$ is not density-connected to $q$.

For example, the dataset below

[Figure: a dataset in which the point $s$ lies between the points $p$ and $q$.]

contains a free-border point $s$ when $ϵ = d(p, s) = d(s, q)$ and $\mathrm{MinPts} = 5$.
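Free-border points can be detected by brute force. The sketch below relies on one observation: two core points are density-connected exactly when a chain of core points with consecutive distances at most $ϵ$ joins them, so it labels the connected components of core points and then looks for non-core points within $ϵ$ of two different components. The function names and the coordinates are hypothetical, and the example uses $\mathrm{MinPts} = 4$ rather than the value $5$ from the figure.

```python
import math

def core_components(points, eps, min_pts):
    """Map each core point to the id of its connected component of core points."""
    def nbhd(x):
        return [y for y in points if math.dist(x, y) <= eps]

    core = [p for p in points if len(nbhd(p)) >= min_pts]
    label = {}
    next_id = 0
    for seed in core:
        if seed in label:
            continue
        label[seed] = next_id
        stack = [seed]
        while stack:
            x = stack.pop()
            for y in nbhd(x):
                if y in core and y not in label:
                    label[y] = next_id
                    stack.append(y)
        next_id += 1
    return label

def free_border_points(points, eps, min_pts):
    """Non-core points lying within eps of core points from two or more components."""
    label = core_components(points, eps, min_pts)
    free = []
    for s in points:
        if s in label:          # core points are never free-border points
            continue
        touching = {label[c] for c in label if math.dist(s, c) <= eps}
        if len(touching) >= 2:
            free.append(s)
    return free

# Hypothetical 1D data: q = (-1.0,), s = (0.0,), p = (1.0,), eps = d(p, s) = d(s, q).
pts = [(-2.0,), (-1.5,), (-1.0,), (0.0,), (1.0,), (1.5,), (2.0,)]
free = free_border_points(pts, eps=1.0, min_pts=4)
```

With these values only `q` and `p` are core points, they lie in different components, and `s` is within `eps` of both.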

The paper claims that, as long as there are no free-border points, a filtration of covers of $X$ gives a filtration of cluster covers of $X$ as the bin size increases, as $ϵ$ increases, or as $\mathrm{MinPts}$ decreases.

Lemma 2

Let $X$ be a dataset.

If $p$ is a core point, then there is a cluster $C$ containing $p$ and if $q \in C$ then $q$ is density-reachable from $p$ .

Let $C$ be a cluster. Then there is a point $p \in C$ that is a core point. If $q \in C$ then $q$ is density-reachable from $p$ .

In other words, the Lemma above states that a cluster is determined by any of its core points and the core points do not change depending on the order of the dataset.

Filtration of cluster covers: DBSCAN, bin size.

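Free-border points can cause the cluster covers to fail to form a filtration. Consider the example below:

[Figure: a dataset with nested bins $\mathrm{bin}_{1} \subset \mathrm{bin}_{2}$ and the points $p$, $q$, and $s$.]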

Assume that $q$ is ordered before $s$ and $p$, $\mathrm{MinPts} = 5$, and $ϵ = d(p, s) = d(q, s)$. When DBSCAN is applied to $\mathrm{bin}_{1}$, a cluster $C_{p}^{\mathrm{bin}_{1}}$ is formed w.r.t. $ϵ$ and $\mathrm{MinPts}$ containing $s$, $p$, and all points to the right of $p$. When we apply DBSCAN to $\mathrm{bin}_{2}$, we get two clusters w.r.t. $ϵ$ and $\mathrm{MinPts}$: $C_{q}^{\mathrm{bin}_{2}}$, containing $s$, $q$, and the points to the left of $q$, and $C_{p}^{\mathrm{bin}_{2}}$, containing $p$ and the points to the right of $p$.

This is an issue, as $\mathrm{bin}_{1} \subset \mathrm{bin}_{2}$ but $C_{p}^{\mathrm{bin}_{1}} \nsubseteq C_{p}^{\mathrm{bin}_{2}}$. Thus, the free-border point $s$ prevents the cluster covers parameterized by bin size from forming a filtration.
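The failure can be reproduced with a small order-dependent DBSCAN. The sketch below assumes that a border point joins the first cluster that reaches it when seeds are visited in data order; that rule, the function names, the coordinates, and the choice $\mathrm{MinPts} = 4$ (instead of $5$) are all assumptions made for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Order-dependent DBSCAN: a border point joins the first cluster that reaches it."""
    def nbhd(x):
        return [y for y in points if math.dist(x, y) <= eps]

    core = {p for p in points if len(nbhd(p)) >= min_pts}
    label, clusters = {}, []
    for seed in points:                      # data order matters here
        if seed not in core or seed in label:
            continue
        cid = len(clusters)
        clusters.append(set())
        label[seed] = cid
        stack = [seed]
        while stack:
            x = stack.pop()
            clusters[cid].add(x)
            for y in nbhd(x):
                if y not in label:
                    label[y] = cid
                    if y in core:
                        stack.append(y)      # core points keep expanding the cluster
                    else:
                        clusters[cid].add(y) # border points are claimed but not expanded
    return clusters

def cluster_of(point, clusters):
    return next(c for c in clusters if point in c)

# Hypothetical coordinates: q = (-1.0,), s = (0.0,), p = (1.0,).
q_side = [(-2.0,), (-1.5,), (-1.0,)]
s = (0.0,)
p_side = [(1.0,), (1.5,), (2.0,)]
bin_1 = [s] + p_side
bin_2 = q_side + [s] + p_side                # q is listed before s and p

C_p_bin1 = cluster_of((1.0,), dbscan(bin_1, eps=1.0, min_pts=4))
C_p_bin2 = cluster_of((1.0,), dbscan(bin_2, eps=1.0, min_pts=4))
```

In `bin_1` the point `s` is claimed by the cluster of `p`; in `bin_2` the cluster of `q` is built first and claims `s`, so `C_p_bin1 <= C_p_bin2` is false even though `bin_1` is a subset of `bin_2`.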

The critical point here is the free-border point $s$ . In the absence of free-border points the issue disappears:

Lemma 4

Suppose there are no free-border points when DBSCAN is used to cluster. If $\mathrm{bin}_{1} \subseteq \mathrm{bin}_{2}$, then $C_{p}^{\mathrm{bin}_{1}} \subseteq C_{p}^{\mathrm{bin}_{2}}$.

Therefore, if there are no free-border points then there is a filtration of cluster covers parameterized by bin size. As a corollary, if $\mathrm{MinPts} \in \{1, 2\}$ then there is a filtration of cluster covers parameterized by bin size, as there will be no free-border points: any point $s$ in the $ϵ$-neighborhood of a core point $p \neq s$ satisfies $\{s, p\} \subseteq N_{ϵ}(s)$ and is therefore itself a core point.

Filtration of cluster covers: DBSCAN, ϵ

Notice that if $ϵ_{0} \leq ϵ_{1}$ and $p$ is a core point w.r.t. $ϵ_{0}$, then $p$ is also a core point w.r.t. $ϵ_{1}$. Thus, one could potentially construct a filtration parameterized by $ϵ$.
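The first claim is a one-line consequence of neighborhoods being nested. Spelled out, for $ϵ_{0} \leq ϵ_{1}$ and a core point $p$ w.r.t. $ϵ_{0}$:

$$
N_{ϵ_{0}}(p) = \{q \in X \mid d(p, q) \leq ϵ_{0}\} \subseteq \{q \in X \mid d(p, q) \leq ϵ_{1}\} = N_{ϵ_{1}}(p)
\quad \Longrightarrow \quad
\lvert N_{ϵ_{1}}(p) \rvert \geq \lvert N_{ϵ_{0}}(p) \rvert \geq \mathrm{MinPts}.
$$

Note that this only shows that the set of core points grows with $ϵ$; it says nothing yet about how border points are assigned.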

Lemma 5

Suppose $X$ is a dataset, $ϵ_{0} \leq ϵ_{1}$, and $\mathrm{MinPts}$ and $B = \{\mathrm{bin}_{i}\}$ are fixed. If there are no free-border points w.r.t. $ϵ_{1}$ and $\mathrm{MinPts}$, then $C_{p}^{ϵ_{0}} \subseteq C_{p}^{ϵ_{1}}$.

Therefore, the absence of free-border points indicates that there is a filtration of cluster covers parameterized by $ϵ$ .

Filtration of cluster covers: DBSCAN, MinPts

Notice that if $\mathrm{MinPts}_{0} \geq \mathrm{MinPts}_{1}$ and $p$ is a core point w.r.t. $ϵ$ and $\mathrm{MinPts}_{0}$, then $p$ is also a core point w.r.t. $ϵ$ and $\mathrm{MinPts}_{1}$.

Lemma 6

Suppose $X$ is a dataset, $\mathrm{MinPts}_{0} \geq \mathrm{MinPts}_{1}$, and $ϵ$ and $B$ are fixed. If there are no free-border points w.r.t. $ϵ$ and $\mathrm{MinPts}_{1}$, then $C_{p}^{\mathrm{MinPts}_{0}} \subseteq C_{p}^{\mathrm{MinPts}_{1}}$.

Therefore, the absence of free-border points indicates that there is a filtration of cluster covers parameterized by decreasing $\mathrm{MinPts}$.

Filtration of Simplicial Complexes and Homology Groups

If there are no free-border points, then we can construct simplicial and homological filtrations parameterized by

Bin size
$ϵ$
$\mathrm{MinPts}$ (decreasing)

Bi-Filtrations and Stability

The paper claims that DBSCAN is not stable under small perturbations. Let $X$ be a dataset and $X_{δ}$ the dataset obtained by perturbing $X$ by at most $δ$, i.e., there is some function $Δ : X \to X_{δ}$ such that $d(x, Δ(x)) \leq δ$ for all $x \in X$. Then

$$d(X, X_{δ}) = \max_{x \in X} \min_{y \in X_{δ}} d(x, y) \leq δ.$$

How does applying DBSCAN to $X$ and $X_{δ}$ vary?

For example, consider the following two datasets:

[Figure: the datasets $X$ and $X_{δ}$.]

If we let $ϵ$ be the distance between two points in $X$ and $\mathrm{MinPts} = 2$, then $X$ is clustered into a single set, whereas $X_{δ}$ is clustered into three disjoint sets.
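A one-dimensional toy version of this instability is sketched below. It assumes that with $\mathrm{MinPts} = 2$ (counting the point itself) DBSCAN clusters are exactly the connected components of the $ϵ$-neighborhood graph with at least two points, and isolated points are noise. The integer coordinates and the function name `dbscan_1d_minpts2` are hypothetical and are not the datasets from the figure.

```python
def dbscan_1d_minpts2(values, eps):
    """DBSCAN on 1D data with MinPts = 2: chains of points with gaps <= eps."""
    components = []
    for v in sorted(values):
        if components and v - components[-1][-1] <= eps:
            components[-1].append(v)
        else:
            components.append([v])
    return [c for c in components if len(c) >= 2]   # singletons are noise

# Hypothetical data: equally spaced points, and a perturbation by at most 2.
eps = 10
X = [0, 10, 20, 30, 40, 50]
X_delta = [0, 10, 21, 31, 42, 52]

# The directed distance max over x in X of min over y in X_delta of d(x, y).
delta = max(min(abs(x - y) for y in X_delta) for x in X)

clusters_X = dbscan_1d_minpts2(X, eps)
clusters_X_delta = dbscan_1d_minpts2(X_delta, eps)
```

Although no point moves by more than `delta`, two gaps grow just past `eps`, so the single cluster of `X` breaks into three clusters in `X_delta`.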

We construct a two-dimensional filtration (bi-filtration) parameterized by $ϵ$ and the bins $B$, leaving $\mathrm{MinPts}$ fixed. Note that each parameter alone gives a filtration (assuming no free-border points); that is, given a dataset $X$ there is a filtration of cluster covers

$$\{c^{B_{i}, B_{j}} : C_{B_{i}} \to C_{B_{j}} \mid B_{i} \leq B_{j}\}$$

which induces a filtration of simplicial complexes

$$\{Φ^{B_{i}, B_{j}} : \mathrm{Nrv}(C_{B_{i}}) \to \mathrm{Nrv}(C_{B_{j}}) \mid B_{i} \leq B_{j}\}$$

which induces a filtration of $k$-th homology groups

$$\{f^{B_{i}, B_{j}} : H_{k}(\mathrm{Nrv}(C_{B_{i}})) \to H_{k}(\mathrm{Nrv}(C_{B_{j}})) \mid B_{i} \leq B_{j}\}.$$

A similar set of filtrations exists parameterized by $ϵ$. Combining the two gives a set of bi-filtrations $C_{(B_{i}, ϵ_{j})}$ where $B$ is one dimension and $ϵ$ is the other.


We can perform the same with $X_{δ}$ to get another set of bi-filtrations $D_{(B_{i}, ϵ_{j})}$ .

Interleaving of Bi-filtrations

See The structure and stability of persistence modules for more details.

Def

Let $P_{n}$ be a polynomial ring in $n$ variables $x = (x_{1}, x_{2}, \dots, x_{n})$. An $n$-graded module is a $P_{n}$-module $M$ such that $M ≅ \bigoplus_{a \in \mathbb{R}^{n}} M_{a}$ and $x^{b}(M_{a}) \subset M_{a + b}$ for all $a \in \mathbb{R}^{n}$, $b \in [0, \infty)^{n}$, where each $M_{a}$ is a vector space over some field $k$. The action of $x^{b - a}$ gives rise to a linear map $φ : M_{a} \to M_{b}$ for all $a \leq b$ in $\mathbb{R}^{n}$.

This is unnecessarily complicated. We’re just taking persistence modules parameterized by two variables. Thus, we have a set of vector spaces $M_{(x, y)}$ and a set of linear maps $φ : M_{(x, y)} \to M_{(x', y')}$ for all $(x, y) \leq (x', y')$.

Def

For $M$ an $n$ -graded module and $v \in R^{n}$ , $M (v)$ is the shifted module such that $M (v)_{u} = M_{v + u}$ .

Def

For $M$ an $n$-graded module and $\overline{ξ} = (ξ, ξ, \dots, ξ) \in \mathbb{R}_{+}^{n}$, the (diagonal) $ξ$-transition morphism is the map
$φ_{M}^{\overline{ξ}} : M \to M(\overline{ξ})$
whose restriction to each $M_{a}$ is the linear map $φ : M_{a} \to M_{a + \overline{ξ}}$.

This is just $ϵ$ -homomorphisms but now $ϵ \in R^{n}$ .

Def

Let $ξ \geq 0$. Two $n$-graded modules $M$ and $N$ are $ξ$-interleaved if there are morphisms $f : M \to N(ξ)$ and $g : N \to M(ξ)$ such that $φ_{N}^{2 ξ} = f(ξ) \circ g$ and $φ_{M}^{2 ξ} = g(ξ) \circ f$.

Again, this is just $ϵ$ -interleaving but now $ϵ \in R^{n}$ .
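Unwinding the definition degree by degree may help. Writing $\overline{ξ} = (ξ, \dots, ξ)$ and $φ_{M}^{a \to b}$ for the linear map $M_{a} \to M_{b}$ (notation introduced here only for this display), a $ξ$-interleaving amounts to linear maps $f_{a}$ and $g_{a}$ for every $a \in \mathbb{R}^{n}$ with

$$
f_{a} : M_{a} \to N_{a + \overline{ξ}}, \qquad
g_{a} : N_{a} \to M_{a + \overline{ξ}}, \qquad
g_{a + \overline{ξ}} \circ f_{a} = φ_{M}^{a \to a + 2\overline{ξ}}, \qquad
f_{a + \overline{ξ}} \circ g_{a} = φ_{N}^{a \to a + 2\overline{ξ}},
$$

where the $f_{a}$ and $g_{a}$ are also required to commute with the maps $φ_{M}$ and $φ_{N}$, since $f$ and $g$ are morphisms of modules.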

Stability Against Perturbation

Assume there are no free-border points. Then, as described above, we get a filtration for $X$

$$C : \dots \to C_{(B_{i}, ϵ_{j})} \to C_{(B_{k}, ϵ_{ℓ})} \to \dots$$

and a filtration for $X_{δ}$

$$D : \dots \to D_{(B_{i}, ϵ_{j})} \to D_{(B_{k}, ϵ_{ℓ})} \to \dots.$$

According to Proposition 1, there are families of maps $ϕ$ and $ψ$ such that the diagram below commutes:

[Figure: commutative diagram relating the filtrations $C$ and $D$ via the maps $ϕ$ and $ψ$.]

In other words, the filtrations $C$ and $D$ are $2 δ$-interleaved. This $2 δ$-interleaving induces a $2 δ$-interleaving between their homologies $H_{k}(C)$ and $H_{k}(D)$.


Bi-Filtration and Stability of TDA Mapper for Point Cloud Data
