Questions:

Unsure how the weights $w_{x}$ for a point $x \in N$ are determined
How to solve the LP for box expansion?
What if a point $x \in X$ lies on the boundary of two or more pixels?

Full paper can be found at: https://arxiv.org/abs/2404.05859v2

Notation used:

Notation	Definition/Explanation
$σ$	unit pixel
$m_{σ}, w_{σ}, θ (σ)$	centroid, weight, and number of points in pixel $σ$
$w_{x}$	weight of point $x$
$α, π$	$α \in [0, 1], π \in R_{+}$ are parameters for linear optimization
$U (0)$	initial collection of boxes that covers $X$
$V$	$V = [l_{1}, u_{1}] \times \dots \times [l_{n}, u_{n}]$ is a box in $U (0)$ .
$V [jπ]$	$j$ -th expansion of $V$
$B (V, π)$	$π$ -neighborhood box of $V$ : $(l_{1} - π, u_{1} + π) \times \dots \times (l_{n} - π, u_{n} + π)$
$C_{α} (V, N)$	cost of $V$ in the neighborhood $N \supset V$
$Sol_{α} (V, N)$	set of optimal solutions for input box $V$ and neighborhood $N \supset V$
$U (jπ)$	$j$ -th cover, consisting of the $j$ -th expansions $V [jπ]$ of all $V \in U (0)$
$Θ (V)$	set of pixels with $m_{σ} \in V$ , $θ (σ) \neq 0$
$Ψ_{1} (V), Ψ_{2} (V), Ψ_{3} (V)$	rounded boxes for given box $V$
$K (U)$	filtration corresponding to cover $U$

Introduction

Many recent approaches address the outlier problem. One is the distance-to-measure (DTM) class of filtrations, which uses density functions and kernel density estimates to grow balls guided by where the measure is greater and, as a result, ignores isolated outliers. Another is a bi-filtration in which both distance and density thresholds are treated as parameters. All of these approaches use balls centered at the data points to control the filtration. Since balls grow uniformly in every direction, a symmetry bias may occur.

Building filtrations by growing hyper-rectangles (boxes) non-uniformly in different directions based on the distribution of points may be a better approach to capturing the data’s topological features. Since boxes are still convex, the nerve lemma still applies, i.e., the simplicial complex defined as the nerve of the boxes has the same homotopy type as the union of the boxes.

The paper defines a new framework called the box filtration of a point-cloud data (PCD) set $X \subset R^{n}$ , built by growing boxes as the convex sets covering $X$ . Two approaches to handle boxes are provided: a point cover, where each point is assigned a box at the start, and a pixel cover that “works with a pixelization of the space of the PCD”. A filtration is built by expanding the boxes in a manner that minimizes an objective function. An expansion algorithm is provided.

The box filtration can produce results that are more resilient to noise and have less symmetry bias than VR and distance-to-measure (DTM) filtrations. Any box cover of $X$ also gives a mapper, hence the box filtration can function as a mapper framework. For example, in the accompanying figure the top row shows the point cloud ( $X$ ) and the box covers, and the bottom row shows the nerve of each.
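To make the nerve of a box cover concrete, here is a minimal Python sketch. It is not taken from the paper: the box representation (a list of `(lower, upper)` pairs) and the function names `boxes_intersect` and `nerve` are assumptions made for illustration. A set of closed boxes has a common point exactly when, in every coordinate, the largest lower bound does not exceed the smallest upper bound.

```python
from itertools import combinations

def boxes_intersect(boxes):
    """True if the closed boxes share at least one common point.

    Each box is a list of (lower, upper) pairs, one pair per coordinate.
    """
    dims = range(len(boxes[0]))
    return all(
        max(b[i][0] for b in boxes) <= min(b[i][1] for b in boxes)
        for i in dims
    )

def nerve(cover, max_dim=2):
    """Simplices (as index tuples) of the nerve of `cover` up to `max_dim`."""
    simplices = []
    for k in range(1, max_dim + 2):          # k boxes span a (k-1)-simplex
        for idx in combinations(range(len(cover)), k):
            if boxes_intersect([cover[j] for j in idx]):
                simplices.append(idx)
    return simplices

# Hypothetical cover: two overlapping boxes and one isolated box.
cover = [
    [(0, 2), (0, 2)],   # box 0
    [(1, 3), (1, 3)],   # box 1, overlaps box 0
    [(5, 6), (5, 6)],   # box 2, isolated
]
print(nerve(cover))     # three vertices and the edge (0, 1)
```

The enumeration is exponential in `max_dim`, so this is only meant for small covers.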


Construction

Definition 2.1 (Box)

A box in $R^{n}$ is defined as the $n$ -fold Cartesian product $[l_{1}, u_{1}] \times \dots \times [l_{n}, u_{n}]$ where $l_{i} \leq u_{i}$ , $\forall i \in \{1, \dots, n\}$ .

Note that a box's dimension may be lower than $n$ if $l_{i} = u_{i}$ for some $i \in \{1, \dots, n\}$ .

Point Cover

Given a finite PCD $X \subset R^{n}$ , the initial cover $U (0)$ of its point cover consists of a collection of hypercubes ( $ℓ_{\infty}$ -balls) or boxes such that each point is located in a single box. A box may contain more than one point. Every box $V \in U (0)$ is called a pivot box. We expand each pivot box by linear optimization with two parameters, $π \in R_{+}$ and $α \in [0, 1]$ , where $π$ is the step size by which the boxes are expanded and $α$ controls the relative weight of the two terms in the objective function. The set of optimal solutions for an input box $V$ and neighborhood $N = B (V, π) = (l_{1} - π, u_{1} + π) \times \dots \times (l_{n} - π, u_{n} + π)$ is denoted $Sol_{α} (V, N)$ .
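The setup can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's code: it uses the simplest initial cover, one degenerate pivot box $[x_{1}, x_{1}] \times \dots \times [x_{n}, x_{n}]$ per point, and the helper names are hypothetical. The neighborhood $B (V, π)$ is an open box, so membership is tested with strict inequalities.

```python
def initial_point_cover(points):
    """One degenerate pivot box [x_1, x_1] x ... x [x_n, x_n] per point."""
    return [[(c, c) for c in p] for p in points]

def neighborhood(box, pi):
    """Bounds of B(V, pi): every interval widened by pi on both sides."""
    return [(l - pi, u + pi) for (l, u) in box]

def points_in_open_box(points, box):
    """Points strictly inside the open box (l_1, u_1) x ... x (l_n, u_n)."""
    return [p for p in points
            if all(l < c < u for c, (l, u) in zip(p, box))]

X = [(0.0, 0.0), (1.0, 0.5), (4.0, 4.0)]
U0 = initial_point_cover(X)
N = neighborhood(U0[0], pi=2.0)
print(N)                          # bounds of the neighborhood of the first pivot box
print(points_in_open_box(X, N))   # points the expansion of this box can see
```

Only the points inside $N$ enter the optimization for a given pivot box, which keeps each problem local.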


Let $\tilde{V} = [\tilde{l_{1}}, \tilde{u_{1}}] \times \dots \times [\tilde{l_{n}}, \tilde{u_{n}}] \supseteq V$ be an expanded box in the neighborhood $N$ , and let $I = \{1, \dots, n\}$ denote the set of coordinate directions. The objective function of the linear program is denoted by $C_{α} (\tilde{V}, N)$ . The goal is to cover all points, so each point in $N$ that is not covered by $\tilde{V}$ contributes a cost to $C_{α} (\tilde{V}, N)$ . The farther a non-covered point is from $\tilde{V}$ , the higher its cost. Points inside $\tilde{V}$ contribute zero cost. The weight $w_{x} \in R$ of a point $x \in N$ satisfies

$$w_{x} \leq \min \left( \{x_{i} - \tilde{l_{i}} \mid i \in I\} \cup \{\tilde{u_{i}} - x_{i} \mid i \in I\} \cup \{0\} \right) .$$

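Notice that if $x \notin \tilde{V}$ , then $w_{x} < 0$ . For $α > 0$ the objective rewards larger weights, so an optimal $w_{x}$ attains this upper bound. Hence $x \in \tilde{V}$ if and only if $w_{x} = 0$ , since $x \in \tilde{V}$ exactly when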

$$\min \left( \{x_{i} - \tilde{l_{i}} \mid i \in I\} \cup \{\tilde{u_{i}} - x_{i} \mid i \in I\} \right) \geq 0.$$

If we minimized only this cost, the box would grow to the maximum extent possible. Hence, we also want to keep $\tilde{V}$ small by imposing a cost equal to the sum of the lengths of its edges. The full linear program is given by

$$
\begin{aligned}
\min_{\tilde{V} \supseteq V} \quad & C_{α} (\tilde{V}, N) = - α \sum_{x \in N} w_{x} + (1 - α) \sum_{i \in I} (\tilde{u_{i}} - \tilde{l_{i}}) \\
\text{subject to} \quad & \tilde{u_{i}} \geq u_{i}, \quad \forall i \in I \\
& \tilde{l_{i}} \leq l_{i}, \quad \forall i \in I \\
& w_{x} \leq x_{i} - \tilde{l_{i}}, \quad \forall i \in I, x \in N \\
& w_{x} \leq \tilde{u_{i}} - x_{i}, \quad \forall i \in I, x \in N \\
& w_{x} \leq 0, \quad \forall x \in N .
\end{aligned}
$$
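For a fixed candidate box, the objective can be evaluated directly without an LP solver, because for $α > 0$ each $w_{x}$ is pushed up to the smallest of its upper bounds. The Python sketch below does exactly that. The function names and the small data set are assumptions for illustration, and this only evaluates the cost of a given box; it does not search for the optimal one.

```python
def point_weight(x, box):
    """Largest w_x allowed by the constraints: min over all slacks and 0."""
    slacks = [0.0]
    for c, (l, u) in zip(x, box):
        slacks.append(c - l)   # w_x <= x_i - l_i
        slacks.append(u - c)   # w_x <= u_i - x_i
    return min(slacks)

def cost(box, points_in_N, alpha):
    """C_alpha(box, N) with every w_x set to its largest feasible value."""
    uncovered = -sum(point_weight(x, box) for x in points_in_N)
    size = sum(u - l for (l, u) in box)
    return alpha * uncovered + (1 - alpha) * size

# Hypothetical data: a pivot box at the origin and one more point at (2, 0).
pts = [(0.0, 0.0), (2.0, 0.0)]
pivot = [(0.0, 0.0), (0.0, 0.0)]
grown = [(0.0, 2.0), (0.0, 0.0)]
print(cost(pivot, pts, alpha=0.75))  # pays for the uncovered point
print(cost(grown, pts, alpha=0.75))  # pays for the longer edge instead
```

With $α = 0.75$ the grown box is cheaper, so covering the second point is worth the extra edge length. To solve the full LP, the variables $\tilde{l_{i}}, \tilde{u_{i}}, w_{x}$ and the constraints above can be passed to any general-purpose LP solver.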

Example 2.2: Let $X = \{a, b\}$ with $a < b$ be a one-dimensional point cloud. Let the initial point cover $U (0)$ be a single pivot box $V = [a, a]$ . If $\tilde{V} = [a, x]$ with $x \leq b$ and $N = B (V, π = b - a + δ)$ for some small $δ > 0$ , then

$$C_{α} (\tilde{V}, N) = α (b - x) + (1 - α) (x - a) \Rightarrow \frac{\partial C_{α} (\tilde{V}, N)}{\partial x} = 1 - 2 α .$$

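The partial derivative is zero when $α = 0.5$ , so the cost is then constant in $x$ and every box $\tilde{V} = [a, x]$ with $a \leq x \leq b$ is optimal. Since multiple boxes $\tilde{V} \supseteq V$ are solutions to the LP, the solution is not unique in general.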
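The example can be checked numerically. The sketch below evaluates the cost of $[a, x]$ on a small grid of candidate values of $x$. It uses the form $α (b - x) + (1 - α)(x - a)$, which follows from the LP constraints since the only uncovered point is $b$ with weight $x - b$. The function names, the grid, and the values $a = 0$, $b = 4$ are illustrative assumptions.

```python
def example_cost(x, a, b, alpha):
    """Cost of the box [a, x] for the point cloud {a, b}, with a <= x <= b."""
    uncovered = b - x          # distance from b to the box; a is always covered
    length = x - a             # the single edge length of the box
    return alpha * uncovered + (1 - alpha) * length

def minimizers(a, b, alpha, steps=4):
    """Grid values of x in [a, b] that attain the minimum cost."""
    candidates = [a + k * (b - a) / steps for k in range(steps + 1)]
    costs = [example_cost(x, a, b, alpha) for x in candidates]
    best = min(costs)
    return [x for x, c in zip(candidates, costs) if abs(c - best) < 1e-12]

for alpha in (0.25, 0.5, 0.75):
    print(alpha, minimizers(0.0, 4.0, alpha))
```

For $α < 0.5$ the box does not grow, for $α > 0.5$ it grows all the way to $b$ , and at $α = 0.5$ every grid value is a minimizer.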

Definition 2.3 (Union of boxes)

Let $V^{1} = \prod [l_{i}^{1}, u_{i}^{1}]$ and $V^{2} = \prod [l_{i}^{2}, u_{i}^{2}]$ be two boxes. Their union is the box $V^{1} \cup V^{2} = \prod [\hat{l}_{i}, \hat{u}_{i}]$ where $\hat{l}_{i} = \min \{l_{i}^{1}, l_{i}^{2}\}$ and $\hat{u}_{i} = \max \{u_{i}^{1}, u_{i}^{2}\}$ for each $i \in I$ .

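For example in 2D, the union of $[0, 1] \times [0, 1]$ and $[2, 3] \times [0.5, 2]$ is the box $[0, 3] \times [0, 2]$ , i.e., the smallest box containing both, which is generally larger than their set-theoretic union.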
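Definition 2.3 translates directly into code. In this sketch a box is represented as a list of `(lower, upper)` pairs, which is a representation chosen here for illustration, and `box_union` is a hypothetical helper name.

```python
def box_union(box1, box2):
    """Smallest box containing both inputs (coordinate-wise min and max)."""
    return [(min(l1, l2), max(u1, u2))
            for (l1, u1), (l2, u2) in zip(box1, box2)]

V1 = [(0, 2), (0, 1)]
V2 = [(1, 4), (-1, 0.5)]
print(box_union(V1, V2))   # [(0, 4), (-1, 1)]
```

The result can contain points that lie in neither input box, for instance $(3, 0.75)$ here, so this operation is a bounding box rather than a set-theoretic union.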

Let $I_{i} = \{1, \dots, i\}$ and $V \subseteq \tilde{V}$ . Then $S (V, \tilde{V})$ is the ordered sequence whose $i$ -th entry is the union of $V$ and the projection of $\tilde{V}$ onto the set of directions $I_{i}$ , i.e., $S (V, \tilde{V})$ is the ordered sequence of boxes from $V$ to $\tilde{V}$ obtained by expanding in one direction at a time. Let $\tilde{c_{i}}$ denote the change in the cost function resulting from the expansion in the additional $i$ -th direction.


Proposition 2.4

Let $V^{l} \supseteq V$ , $V^{k} \supseteq V$ , and $\hat{V} = V^{l} \cup V^{k}$ be expansions of a box $V$ in some neighborhood $N$ such that $V = V^{k} \cap V^{l}$ . Let $S (V, V^{l})$ , $S (V, V^{k})$ , and $S (V, \hat{V})$ be the sequences with $c_{i}^{l}$ , $c_{i}^{k}$ , and $\hat{c}_{i}$ being the corresponding changes in the cost function at the $i$ -th step. Then
$$\sum_{i \in I} \hat{c}_{i} \leq \sum_{i \in I} (c_{i}^{k} + c_{i}^{l}) .$$

One may imagine expanding from $V$ to $V^{k}$ and then to $\hat{V}$ . The proposition states that the total cost change of this procedure is bounded above by the sum of the total cost changes of the separate expansions to $V^{k}$ and $V^{l}$ , since the two expansions overlap only in $V$ ( $V = V^{k} \cap V^{l}$ ).

Theorem 2.5

The following results hold:

If $V^{l}, V^{k} \in Sol_{α} (V, N)$ , then $V^{l} \cap V^{k} \in Sol_{α} (V, N)$

If $V^{l}, V^{k} \in Sol_{α} (V, N)$ , then $V^{l} \cup V^{k} \in Sol_{α} (V, N)$

In other words, the union and intersection of two optimal solutions for the input box $V$ and neighborhood $N$ are also optimal solutions. For example, if $[a, c]$ and $[a, d]$ are optimal solutions such that $c \leq d$ in Example 2.2 (the two points $a < b$ example), then both $[a, c] \cap [a, d] = [a, c]$ and $[a, c] \cup [a, d] = [a, d]$ are also optimal.
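The closure property can be checked by brute force on a tiny instance. The sketch below is an illustration, not the paper's algorithm. It restricts the candidates to boxes $[0, u_{1}] \times [0, u_{2}]$ on an integer grid with the lower corner fixed at the pivot, assumes a neighborhood large enough to contain both points, and evaluates the cost using the $ℓ_{\infty}$ distance from a point to a box, which equals $-w_{x}$ at the optimum. All names and data are hypothetical.

```python
from itertools import product

def dist_to_box(x, box):
    """l-infinity distance from point x to the box (0 if x is inside)."""
    return max([0.0] + [max(l - c, c - u) for c, (l, u) in zip(x, box)])

def cost(box, pts, alpha):
    """Uncovered-point term plus edge-length term of the LP objective."""
    return (alpha * sum(dist_to_box(x, box) for x in pts)
            + (1 - alpha) * sum(u - l for l, u in box))

def union(b1, b2):
    return tuple((min(l1, l2), max(u1, u2))
                 for (l1, u1), (l2, u2) in zip(b1, b2))

def intersection(b1, b2):
    return tuple((max(l1, l2), min(u1, u2))
                 for (l1, u1), (l2, u2) in zip(b1, b2))

# Pivot box V = {(0, 0)} and one extra point at (2, 0); alpha = 0.5.
pts = [(0.0, 0.0), (2.0, 0.0)]
alpha = 0.5
grid = [0.0, 1.0, 2.0]
# Candidate expansions [0, u1] x [0, u2] (a simplifying restriction).
candidates = [((0.0, u1), (0.0, u2)) for u1, u2 in product(grid, grid)]
costs = {box: cost(box, pts, alpha) for box in candidates}
best = min(costs.values())
optimal = {box for box, c in costs.items() if abs(c - best) < 1e-12}

closed = all(union(p, q) in optimal and intersection(p, q) in optimal
             for p, q in product(optimal, repeat=2))
print(len(optimal), closed)
```

Here the cost works out to $1 + 0.5 u_{2}$ , so the three boxes with $u_{2} = 0$ are optimal and the set is closed under both operations.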

Theorem 2.6

Let $N = B (V, π)$ and $\tilde{N} = B (V, \tilde{π})$ . If $\tilde{π} \geq π$ then

$C (V^{l}, \tilde{N}) \leq C (V^{'}, \tilde{N})$ where $V^{'} \subseteq V^{l}$ and $V^{l} \in Sol (V, N)$

$C (⋂_{V^{l} \in Sol(V,N)} V^{l}, \tilde{N}) < C (V^{'}, \tilde{N})$ where $V^{'} \subset ⋂_{V^{l} \in Sol (V, N)} V^{l}$ and $V \subseteq V^{'}$

$\forall V^{l} \in Sol (V, N)$ , $\exists V^{k} \in Sol (V, \tilde{N})$ such that $V^{l} \subseteq V^{k}$ .

$\forall V^{k} \in Sol (V, \tilde{N})$ , $\exists V^{l} \in Sol (V, N)$ such that $V^{l} \subseteq V^{k}$ .

Interpretation of statements:

The cost of a subset of an optimal solution cannot be lower than the cost of that optimal solution, even when the neighborhood is enlarged.
The cost of a candidate solution that is strictly smaller than the intersection of all optimal solutions is strictly larger than the cost of that intersection when the neighborhood is enlarged.
When the neighborhood is enlarged, we get an optimal solution that contains the original optimal solution.
Conversely, every optimal solution for the enlarged neighborhood contains some optimal solution for the original neighborhood.

Lemma 2.7

The largest optimal solution of $Sol_{α} (V, N)$ is contained in any $V^{k} \in Sol_{\tilde{α}} (V, N)$ when $\tilde{α} > α$ .

Lemma 2.8

Let $V^{k} \in Sol_{α} (M, Δ N)$ where $Δ N = B (M, \tilde{π} - π)$ and $M \in Sol_{α} (V, N)$ is a largest optimal solution. Then $V^{k} \in Sol_{α} (V, \tilde{N})$ .

Lemma 2.9

Let $M$ be a largest optimal solution in $Sol_{α} (V, N)$ such that $M \neq V$ . With $γ = (1/ α) - 1$ we get that
$\frac{θ (N \setminus M) + θ (\partial M)}{p} \geq γ \geq \frac{θ (N \setminus M)}{q}$
where $p, q \in {1, \dots, 2 n}$ are the numbers of facets of $M$ that do not intersect $V$ and $N$ , respectively.

If there are $m$ points in the neighborhood, the running time of the LP is $O (q^{3} \log q)$ where $q = mn$ .

Pixel Cover

With the pixel cover we instead work over a discretization of $X \subset R^{n}$ where each pixel is a unit cube with integer vertices. We also assume that $π \in Z$ when defining neighborhoods. All results shown above for point covers also hold true for pixel covers.

Define the integer ceiling by

$$↾ x ↿ = \begin{cases} ⌈ x ⌉ & \text{if } x \notin Z, \\ x + 1 & \text{if } x \in Z. \end{cases}$$

For $x = (x_{1}, \dots, x_{n}) \in X$ we define the pixel $σ = [⌊ x_{1} ⌋, ↾ x_{1} ↿] \times \dots \times [⌊ x_{n} ⌋, ↾ x_{n} ↿]$ . With this convention, a point with an integer coordinate $x_{i}$ is assigned the interval $[x_{i}, x_{i} + 1]$ in that direction, so every point belongs to exactly one pixel even when it lies on a pixel boundary. We denote the centroid of a pixel $σ$ by $m_{σ} = (m_{σ}^{1}, \dots, m_{σ}^{n})$ and define $θ (σ)$ to be the number of points in $X$ that are in $σ$ . We denote by $Θ (\tilde{V})$ the set of pixels $σ$ such that $m_{σ} \in \tilde{V}$ and $θ (σ) \neq 0$ , i.e., the set of nonempty pixels whose centroids are in the box $\tilde{V}$ .
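A minimal Python sketch of the pixelization follows. Because the integer ceiling always equals $⌊ x ⌋ + 1$ , a pixel is determined by its lower corner, and the sketch uses that corner as the pixel's key. The function names and the sample points are assumptions made for illustration.

```python
import math
from collections import Counter

def integer_ceiling(x):
    """ceil(x) for non-integers, x + 1 for integers."""
    return math.ceil(x) if x != math.floor(x) else int(x) + 1

def pixel_of(point):
    """Lower corner of the unit pixel [floor(x_i), floor(x_i) + 1] per axis."""
    return tuple(math.floor(c) for c in point)

def pixelize(points):
    """Map each nonempty pixel (by lower corner) to its point count theta."""
    return Counter(pixel_of(p) for p in points)

def centroid(pixel):
    """Center of the unit pixel with the given lower corner."""
    return tuple(c + 0.5 for c in pixel)

X = [(0.2, 0.7), (0.9, 0.1), (1.0, 0.5), (2.0, 3.0)]
theta = pixelize(X)
print(theta)                 # point counts per nonempty pixel
print(centroid((0, 0)))      # (0.5, 0.5)
```

The point $(1.0, 0.5)$ lies on the boundary between two pixels and is assigned to the one with lower corner $(1, 0)$ .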


For a given input box $V$ , let $\tilde{V} \supseteq V$ be a box in the neighborhood $N = B (V, π)$ . The total width of box $\tilde{V}$ is given by $∣ \tilde{V} ∣ = \sum_{i \in I} (\tilde{u_{i}} - \tilde{l_{i}})$ . Let $w_{σ}$ be the weight corresponding to pixel $σ \in Θ (N)$ for a given expansion $\tilde{V}$ ,

$$w_{σ} \leq \min \left( \{m_{σ}^{i} - \tilde{l_{i}} \mid i \in I\} \cup \{\tilde{u_{i}} - m_{σ}^{i} \mid i \in I\} \cup \{0.5\} \right) .$$

Note that, like before, $m_{σ} \in \tilde{V}$ if and only if

$$\min \left( \{m_{σ}^{i} - \tilde{l_{i}} \mid i \in I\} \cup \{\tilde{u_{i}} - m_{σ}^{i} \mid i \in I\} \right) \geq 0.$$

For $V = [l_{1}, u_{1}] \times \dots \times [l_{n}, u_{n}]$ we define the following LP:

$$
\begin{aligned}
\min_{\tilde{V} \supseteq V} \quad & \overline{C}_{α} (\tilde{V}, N) = - α \sum_{σ \in Θ (N)} w_{σ} θ (σ) + (1 - α) \sum_{i \in I} (\tilde{u_{i}} - \tilde{l_{i}}) \\
\text{subject to} \quad & \tilde{u_{i}} \geq u_{i}, \quad \forall i \in \{1, \dots, n\} \\
& \tilde{l_{i}} \leq l_{i}, \quad \forall i \in \{1, \dots, n\} \\
& w_{σ} \leq m_{σ}^{i} - \tilde{l_{i}}, \quad \forall i \in \{1, \dots, n\}, σ \in Θ (N) \\
& w_{σ} \leq \tilde{u_{i}} - m_{σ}^{i}, \quad \forall i \in \{1, \dots, n\}, σ \in Θ (N) \\
& w_{σ} \leq 0.5, \quad \forall σ \in Θ (N) .
\end{aligned}
$$
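As in the point cover case, the pixel objective can be evaluated for a fixed box by setting each $w_{σ}$ to the smallest of its upper bounds (valid for $α > 0$ ). The cap of $0.5$ is reached exactly when the centroid is at least half a pixel away from every face of $\tilde{V}$ , i.e., when the whole unit pixel lies inside the box. The Python sketch below is illustrative only: the function names are assumptions, pixels are keyed by their lower corner, and the sample counts are made up.

```python
def pixel_weight(m, box, cap=0.5):
    """Largest w_sigma allowed: min over all centroid slacks and the cap."""
    slacks = [cap]
    for c, (l, u) in zip(m, box):
        slacks.append(c - l)   # w_sigma <= m_i - l_i
        slacks.append(u - c)   # w_sigma <= u_i - m_i
    return min(slacks)

def pixel_cost(box, theta, alpha):
    """Pixel-cover objective for `box`.

    theta maps the lower corner of each nonempty pixel in N to its count.
    """
    weighted = sum(
        pixel_weight(tuple(c + 0.5 for c in pixel), box) * count
        for pixel, count in theta.items()
    )
    width = sum(u - l for l, u in box)
    return -alpha * weighted + (1 - alpha) * width

# Hypothetical counts: 2 points in pixel (0, 0), 3 points in pixel (2, 0).
theta = {(0, 0): 2, (2, 0): 3}
small = [(0, 1), (0, 1)]
large = [(0, 3), (0, 1)]
print(pixel_cost(small, theta, alpha=0.5))  # second pixel is uncovered
print(pixel_cost(large, theta, alpha=0.5))  # both pixels fully covered
```

Unlike the point cover, a fully covered pixel contributes a negative term to the cost, scaled by its point count, so dense pixels pull the box toward them more strongly.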

An optimal solution $[l_{1}^{*}, u_{1}^{*}] \times \dots \times [l_{n}^{*}, u_{n}^{*}] \in \overline{Sol}_{α} (V, N)$ may have non-integer coordinates. The paper claims that in such a case, there is another optimal solution with integer coordinates.

Recall that $frac (x) = x - ⌊ x ⌋ \in [0, 1)$ is the fractional part of $x \in R$ .

Rounding Functions

Given a box $V = [l_{1}, u_{1}] \times \dots \times [l_{n}, u_{n}]$ we define three rounded boxes $Ψ_{r} (V) = [ψ_{r} (l_{1}), ψ_{r} (u_{1})] \times \dots \times [ψ_{r} (l_{n}), ψ_{r} (u_{n})]$ for $r = 1, 2, 3$ where
$$ψ_{1} (l_{i}) = ⌈ l_{i} ⌉, \quad ψ_{1} (u_{i}) = ⌊ u_{i} ⌋$$

$$ψ_{2} (l_{i}) = \begin{cases} ⌈ l_{i} ⌉ & \text{if } frac (l_{i}) \in (0.5, 1) \\ ⌊ l_{i} ⌋ + 0.5 & \text{if } frac (l_{i}) \in (0, 0.5] \\ ⌊ l_{i} ⌋ & \text{if } frac (l_{i}) = 0 \end{cases}, \quad ψ_{2} (u_{i}) = \begin{cases} ⌊ u_{i} ⌋ + 0.5 & \text{if } frac (u_{i}) \in [0.5, 1) \\ ⌊ u_{i} ⌋ & \text{if } frac (u_{i}) \in [0, 0.5) \end{cases}$$

$$ψ_{3} (l_{i}) = \begin{cases} ⌈ l_{i} ⌉ & \text{if } frac (l_{i}) \in (0.5, 1) \\ ⌊ l_{i} ⌋ & \text{if } frac (l_{i}) \in [0, 0.5] \end{cases}, \quad ψ_{3} (u_{i}) = \begin{cases} ⌈ u_{i} ⌉ & \text{if } frac (u_{i}) \in [0.5, 1) \\ ⌊ u_{i} ⌋ & \text{if } frac (u_{i}) \in [0, 0.5) \end{cases}$$
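The three rounding maps can be written out case by case. The Python sketch below applies each map to one interval $[l_{i}, u_{i}]$ at a time. The function names `psi1`, `psi2`, `psi3`, and `round_box` and the sample box are assumptions for illustration, and floating-point inputs very close to a half-integer may fall into a different case than intended.

```python
import math

def frac(x):
    """Fractional part of x, in [0, 1)."""
    return x - math.floor(x)

def psi1(l, u):
    """Round the lower bound up and the upper bound down to integers."""
    return math.ceil(l), math.floor(u)

def psi2(l, u):
    """Round both bounds to the half-integer grid, case by case."""
    fl, fu = frac(l), frac(u)
    if fl > 0.5:
        new_l = math.ceil(l)
    elif fl > 0:
        new_l = math.floor(l) + 0.5
    else:
        new_l = math.floor(l)
    new_u = math.floor(u) + 0.5 if fu >= 0.5 else math.floor(u)
    return new_l, new_u

def psi3(l, u):
    """Round both bounds to integers by their fractional parts."""
    new_l = math.ceil(l) if frac(l) > 0.5 else math.floor(l)
    new_u = math.ceil(u) if frac(u) >= 0.5 else math.floor(u)
    return new_l, new_u

def round_box(box, psi):
    """Apply a rounding map to every interval of the box."""
    return [psi(l, u) for l, u in box]

box = [(0.25, 2.75), (1.0, 3.5)]
print(round_box(box, psi1))   # [(1, 2), (1, 3)]
print(round_box(box, psi2))   # [(0.5, 2.5), (1, 3.5)]
print(round_box(box, psi3))   # [(0, 3), (1, 4)]
```

On this box, $ψ_{1}$ shrinks every interval to integer bounds, $ψ_{2}$ moves the bounds to the half-integer grid on which pixel centroids lie, and $ψ_{3}$ rounds each bound to an integer, which here enlarges the box.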
