Questions:
- Unsure how the weights for a point are determined
- How to solve the LP for box expansion?
- What if a point lies on the boundary of two or more pixels?
Full paper can be found at: https://arxiv.org/abs/2404.05859v2
Notation used:
Notation | Definition/Explanation |
---|---|
unit pixel | |
centroid, weight, and number of points in pixel | |
weight of point | |
are parameters for linear optimization | |
initial collection of boxes that covers | |
is a box in . | |
-th expansion of | |
-neighborhood box of : | |
cost of in the neighborhood | |
set of optimal solutions for input box and neighborhood | |
-th cover | |
set of pixels with , | |
rounded boxes for given box | |
filtration corresponding to cover |
Introduction
There are many newly proposed approaches to solve the outlier problem, such as, a distance to measure class of filtration which uses density functions and kernel density estimates to grow the balls guided by where the measure is greater; as a result ignoring isolated outliers. Another proposed approach is a bi-filtration where both distance and density thresholds are treated as parameters. All of these approaches center on using balls centered around points to control the filtration. Since balls grow uniformly, a symmetry bias may occur.
Building filtrations by growing hyper-rectangles (boxes) non-uniformly in different directions based on the distribution of points may be a better approach to capturing the data’s topological features. Since boxes are still convex, the nerve lemma still applies, i.e., the simplicial complex defined as the nerve of the boxes has the same homotopy type as the collection.
The paper defines a new framework called the box filtration of a point-cloud data (PCD) built by growing boxes as the convex sets covering . Two approaches to handle boxes are provided: a point cover where each point is assigned a box at start, and a pixel cover that “works with a pixelization of the space of the PCD”. A filtration is built by expanded the boxes in a manner that minimizes an objective function. An expansion algorithm is provided.
The box filtration can produce results that are more resilient to noise and with less symmetry bias than VR and distance-to-measure (DTM) filtrations. Any box cover of also gives a mapper, hence the box filtration can function as a mapper framework. For example, the top row above is the point cloud () and the box covers. The bottom row is the nerve of each.
Construction
Definition 2.1 (Box)
A box in is defined as the -fold Cartesian product where , .
Note that a boxes dimension may be lower than if for some .
Point Cover
Given a finite PCD , the initial cover of its point cover consists of a collection of hypercubes (-balls) or boxes such that each point is located in a single box. A box may contain more than one point. Every box is called a pivot box. We want to expand each pivot box using two parameters and using linear optimization where represents a step size to expand the boxes by and controls the relative weight in the objective function. The set of optimal solutions for an input box and neighborhood is denoted .
Let be an expanded box in the neighborhood . The objective function of the linear program is denoted by . The goal is to cover all points, the points in that are not covered by result in a cost in . The farther away a non-covered point is from , the higher the cost. Inside of the cost is zero. The weight of a point is given by
Notice that if , then . We get that if and only if since
If we just minimized this cost, we could grow the box to the maximum extend possible. Hence, we also want to minimize the size of by opposing a cost by the sum of lengths of its edges. The full linear program is given by
Example 2.2: Let with be a one-dimensional point cloud. Let the initial point cover be a single pivot box . If with and for some small , then
The partial derivative is zero when . Since multiple boxes are solutions to the LP, the solution is not unique.
Definition 2.3 (Union of boxes)
Let and be two boxes. Their union is the box where and for each
For example in 2D:
Let and . Then is the ordered sequence whose -th entry is the union of and the projections of onto the set of directions , i.e., is the ordered sequence of boxes from to by expanding in each direction. Let denote the change in the cost function resulting from the expansion of in the one additional -th direction.
Proposition 2.4
Let , , and be expansions of a box such that for some neighborhood . Let , , and be the sequences with , , and being the corresponding changes in the cost function at the -th step. Then
One may imagine expanding from to to . The proposition is stating the corresponding total cost of this procedure is bounded above by the sum of each total cost of expansions and since the expansions are disjoint ().
Theorem 2.5
The following results hold:
- If , then
- If , then
In other words, the union and intersection of two optimal solutions for the input box and neighborhood are also optimal solutions. For example, if and are optimal solutions such that in Example 2.2 (the two points example), then both and are also optimal.
Theorem 2.6
Let and . If then
- where and
- where and
- , such that .
- , such that .
Interpretation of statements:
- The cost of a subset of an optimal solution cannot be lower than the optimal solution even when the neighborhood is expanded.
- The cost for a candidate solution that is strictly smaller than the intersection of all optimal solutions is strictly larger when the neighborhood is expanded.
- When the neighborhood is enlarged, we get an optimal solution that contains the original optimal solution.
Lemma 2.7
The largest optimal solution of is contained in any when .
Lemma 2.8
Let where and is a largest optimal solution. Then .
Lemma 2.9
Let be a largest optimal solution in such that . With we get that
where are the numbers of facets of that do not intersect and , respectively.
If there are points in the neighborhood, the running time of the LP is where .
Pixel Cover
With the pixel cover we instead work over a discretization of where each pixel is a unit cube with integer vertices. We also assume that when defining neighborhoods. All results shown above for point covers also hold true for pixel covers.
Define the integer ceiling by
For we define the pixel . We denote the centroid of a pixel by and define to be the number of points in that are in . We denote by the set of pixels such that and , i.e., the set of nonempty pixels whose centroids are in the box .
For a given input box , let be a box in the neighborhood . The total width of box is given by . Let be the weight corresponding to pixel for a given expansion ,
Note that, like before, if and only if
For we define the following LP:
An optimal solution may have non-integer coordinates. The paper claims that in such a case, there is another optimal solution with integer coordinates.
Recall that is the fractional part of .
Rounding Functions
Given a box we define three rounded boxes for where