Authors: Ross Whitaker, Mahsa Mirzargar, and Robert Kirby
Original Paper: https://ieeexplore.ieee.org/document/6634129
Introduction
A crucial strategy within uncertainty quantification (UQ) is the construction of ensembles, i.e., collections of different simulations of the same physical process with varying parameters (think weather forecasting). In many cases these simulations provide level sets highlighting a desired feature.
How does one visualize an ensemble of data? One natural strategy is to compute various statistics such as the mean and variance/covariance. However, these methods do not capture the underlying shape, position, and variability of the level sets. The paper proposes a new methodology for shape analysis with the following goals in mind:
- Informative about regions: visualization should apply to contours and convey statistical properties about their shapes, positions, etc.
- Qualitative interpretation: visualizations should provide high-level qualitative interpretations.
- Quantitative interpretation: visualizations should display well-defined statistical content.
- Statistical robustness: aggregate quantities in visualizations should not be sensitive to a small number of examples in the ensemble.
- Aggregation preserving shape: visualizations should display summary information but not hide details.
Band Depth Method and its Generalization to Contours
Order statistics such as median require an ordering on the data. For scalar data, this is easily obtained by simply sorting; however, for non-scalar data this ordering is less clear. One method is data depth, which provides a center-outward ordering of multivariate data. Data depth quantifies how central a particular sample is within a point cloud, with deeper samples considered more representative of the data.
Another method is known as band depth. Given an ensemble of functions $\{f_1, \dots, f_n\}$, $f_i : D \to R$ (herein $D$ and $R$ are intervals in $\mathbb{R}$), the band depth of each function is the probability that the function lies within the band defined by a random selection of $j$ functions from the distribution. For instance, a function $f$ lies in the band of $j$ randomly selected functions $f_{i_1}, \dots, f_{i_j}$ if it satisfies the following:

$$\min_{k = 1, \dots, j} f_{i_k}(x) \;\le\; f(x) \;\le\; \max_{k = 1, \dots, j} f_{i_k}(x) \qquad \text{for all } x \in D.$$
I assume what we are saying here is that the graph of $f$ lies in the band of $f_{i_1}, \dots, f_{i_j}$ if at every point $x$ the value $f(x)$ lies between the smallest and largest of the $f_{i_k}(x)$. That would match the picture given in Figure 2a. As a set, we can visualize the band by

$$B(f_{i_1}, \dots, f_{i_j}) = \left\{ (x, y) : x \in D,\; \min_k f_{i_k}(x) \le y \le \max_k f_{i_k}(x) \right\}.$$
For a given $j$, we define the band depth as the probability that a function falls into the band formed by an arbitrary set of $j$ other functions chosen at random from the ensemble, i.e.,

$$BD^j(f) = P\left[ \operatorname{graph}(f) \subseteq B(f_{i_1}, \dots, f_{i_j}) \right].$$
Band depth is more robust if one takes the sum over all band sizes up to some $J$:

$$BD(f) = \sum_{j=2}^{J} BD^j(f).$$
In practice we compute $BD^j(f_i)$ by a sample mean of the indicator function formed by evaluating the band condition over all appropriately sized subsets from the ensemble (excluding $f_i$). One should pick subset sizes so that all bands are sufficiently wide to contain functions and not too many examples have the same depth. For instance, for an ensemble containing 10 functions and bands of size $j = 3$, one should test $\binom{9}{3} = 84$ bands to compute $BD^3(f_i)$.
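To make the computation concrete, here is a minimal sketch (not the authors' code) of $BD^j$ for an ensemble of functions sampled on a common grid; the function name `band_depth` and the toy sine ensemble are illustrative only.

```python
from itertools import combinations
import numpy as np

def band_depth(ensemble, j=2):
    """ensemble: (n, m) array of n functions sampled at m common points.
    Returns, for each function, the fraction of size-j bands that fully contain it."""
    n = ensemble.shape[0]
    depths = np.zeros(n)
    for i in range(n):
        others = [k for k in range(n) if k != i]
        subsets = list(combinations(others, j))
        count = 0
        for subset in subsets:
            band = ensemble[list(subset)]
            lower, upper = band.min(axis=0), band.max(axis=0)
            # f_i lies in the band if it stays between the pointwise min and max
            if np.all((lower <= ensemble[i]) & (ensemble[i] <= upper)):
                count += 1
        depths[i] = count / len(subsets)
    return depths

# Toy example: 10 noisy sine curves; the deepest curve acts as a functional median.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
curves = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=(10, 1)) + 0.05 * rng.normal(size=(10, 100))
print(band_depth(curves, j=3))
```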
The paper generalizes this notion of band depth and provides a modification that handles small sample sizes and high variability better. Let $\mathcal{E} = \{E_1, \dots, E_n\}$, where $E_i \subseteq U$ for some universal set $U$, be an ensemble of sets. We say that

$$E \in_b \{E_{i_1}, \dots, E_{i_j}\} \iff \bigcap_{k=1}^{j} E_{i_k} \;\subseteq\; E \;\subseteq\; \bigcup_{k=1}^{j} E_{i_k},$$
i.e., a set is in the band defined by $j$ other sets if it lies between the intersection and the union of those sets. For example, consider the figure below:
The set defined by the red contour lies in the contour band of the three blue contours as it lies between the intersection (gray) and the union (light gray).
The set band depth is then defined by

$$sBD^j(E) = P\left[ \bigcap_{k=1}^{j} E_{i_k} \subseteq E \subseteq \bigcup_{k=1}^{j} E_{i_k} \right], \qquad sBD(E) = \sum_{j=2}^{J} sBD^j(E).$$
Similar to $BD$, $sBD$ is computed by taking all appropriately sized subsets of $\mathcal{E}$.
The paper outlines an algorithm for applying $sBD$ to a set of level sets by considering the subsets of the plane enclosed by said level sets. Let $F = \{f_1, \dots, f_n\}$, $f_i : \mathbb{R}^2 \to \mathbb{R}$, be a set of fields. For a given level value $q$, the algorithm for computing the $sBD$ of the level sets with value $q$ is
- Compute the sets $E_i = \{x : f_i(x) > q\}$ (as binary functions on a grid) for $i = 1, \dots, n$.
- For $i = 1$ to $n$:
  - Initialize $sBD_i = 0$.
  - For each subset $S$ of $\mathcal{E}$ of size $j$ not containing $E_i$:
    - Compute $\bigcap S$ and $\bigcup S$ (can do via min/max operations on the grid).
    - If $\bigcap S \subseteq E_i \subseteq \bigcup S$, then increment $sBD_i$.
  - Normalize $sBD_i$ by dividing by the number of subsets, $\binom{n-1}{j}$.
- Sort the values of $sBD_i$.
They call this application of set band depth to level sets contour band depth ($cBD$). Note that nothing in the above algorithm requires that we work in 2D domains.
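A minimal sketch of the algorithm above, assuming the ensemble has already been thresholded into binary masks $E_i = (f_i > q)$ on a common grid; the name `contour_band_depth` is mine, not the paper's.

```python
from itertools import combinations
import numpy as np

def contour_band_depth(masks, j=2):
    """masks: (n, H, W) boolean array of thresholded fields E_i = (f_i > q).
    Returns, for each member, the fraction of size-j bands that contain it."""
    n = masks.shape[0]
    depths = np.zeros(n)
    for i in range(n):
        others = [k for k in range(n) if k != i]
        subsets = list(combinations(others, j))
        count = 0
        for subset in subsets:
            band = masks[list(subset)]
            inter = band.all(axis=0)   # intersection of the subset (pointwise min)
            union = band.any(axis=0)   # union of the subset (pointwise max)
            # E_i is in the band if  intersection <= E_i <= union  pointwise
            if np.all(inter <= masks[i]) and np.all(masks[i] <= union):
                count += 1
        depths[i] = count / len(subsets)
    return depths
```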
This formulation may produce unsatisfactory results if the ensemble is relatively small and the contours vary significantly in shape. Thus, they relax the definition of subset:

$$A \subseteq_\epsilon B \iff A = \emptyset \;\text{ or }\; \frac{|A \setminus B|}{|A|} \le \epsilon,$$
i.e., either $A$ is the empty set or the fraction of elements of $A$ not in $B$ is at most $\epsilon$. Under this notion, the epsilon set band membership is then

$$E \in_b^\epsilon \{E_{i_1}, \dots, E_{i_j}\} \iff \bigcap_{k=1}^{j} E_{i_k} \subseteq_\epsilon E \;\text{ and }\; E \subseteq_\epsilon \bigcup_{k=1}^{j} E_{i_k},$$

and $sBD^\epsilon$ is defined from this relaxed condition exactly as before.
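As a small illustration (my own helper, not from the paper), the relaxed subset test on binary masks might look like:

```python
import numpy as np

def epsilon_subset(A, B, eps):
    """A subseteq_eps B: true if A is empty, or if the fraction of A outside B is at most eps."""
    a = A.sum()
    if a == 0:
        return True
    return (A & ~B).sum() / a <= eps
```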
Methods
The introduction of $sBD^\epsilon$ gives rise to a free parameter $\epsilon$. The authors present a method in which $\epsilon$ can be tuned automatically to produce the most informative ordering of the data. If there are $N$ bands of $\mathcal{E}$ against which to compare contours, then this gives an $n \times N$ matrix $M$ where $m_{ij}$ gives the fraction of mismatch for $E_i$ against band $B_j = \{E_{j_1}, E_{j_2}\}$ (where $B_j$ is a set of 2 contours from $\mathcal{E}$), taking the worse of the mismatches against the intersection and the union, i.e.,

$$m_{ij} = \max\left( \frac{\left| (E_{j_1} \cap E_{j_2}) \setminus E_i \right|}{\left| E_{j_1} \cap E_{j_2} \right|},\; \frac{\left| E_i \setminus (E_{j_1} \cup E_{j_2}) \right|}{\left| E_i \right|} \right).$$
If we sum along the row for a sample, thresholding by $\epsilon$, and divide by $N$, we get $sBD^\epsilon$ for that sample, i.e., if

$$sBD^\epsilon(E_i) = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}\left[ m_{ij} \le \epsilon \right],$$
then the sample mean of the band depth for a particular $\epsilon$ is given by

$$\overline{sBD^\epsilon} = \frac{1}{n} \sum_{i=1}^{n} sBD^\epsilon(E_i).$$
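A minimal sketch of this construction for pairs of contours ($j = 2$); the helper names `mismatch_matrix` and `depth_for_eps` are hypothetical, and skipping the pairs that contain $E_i$ itself is my assumption about the convention:

```python
from itertools import combinations
import numpy as np

def mismatch_matrix(masks):
    """masks: (n, H, W) boolean array. Returns an (n, N) matrix of mismatch fractions."""
    n = masks.shape[0]
    pairs = list(combinations(range(n), 2))
    M = np.zeros((n, len(pairs)))
    flat = masks.reshape(n, -1).astype(bool)
    for col, (a, b) in enumerate(pairs):
        inter = flat[a] & flat[b]
        union = flat[a] | flat[b]
        for i in range(n):
            if i in (a, b):
                M[i, col] = np.inf   # assumption: bands containing E_i itself are skipped
                continue
            # fraction of the intersection that E_i misses
            miss_inter = (inter & ~flat[i]).sum() / max(inter.sum(), 1)
            # fraction of E_i that escapes the union
            miss_union = (flat[i] & ~union).sum() / max(flat[i].sum(), 1)
            M[i, col] = max(miss_inter, miss_union)
    return M

def depth_for_eps(M, eps):
    """Per-sample sBD^eps: fraction of valid bands whose mismatch is within eps."""
    valid = np.isfinite(M)
    return ((M <= eps) & valid).sum(axis=1) / valid.sum(axis=1)
```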
Aside:
Recall that given a one-dimensional random variable $X$, a function $f$ is the probability density of $X$ if

$$P[a \le X \le b] = \int_a^b f(x)\, dx.$$
Moreover, $X$ has cumulative distribution $F$ if

$$F(x) = P[X \le x].$$
Notice that

$$F(x) = \int_{-\infty}^{x} f(t)\, dt,$$

i.e., by the fundamental theorem of calculus, $F' = f$.
TODO: Relearn measure theoretic probability.
Assume that a one-dimensional probability density, $f$, has cumulative distribution $F$, i.e., $F' = f$. Then (for $j = 2$) a particular sample $x$ is in the band of two randomly chosen samples $X_1, X_2$ if one sample is less than $x$ and the other sample is greater, where the probabilities are given by $F$. That is,

$$P\left[ x \in B(X_1, X_2) \right] = 2\, F(x)\left( 1 - F(x) \right).$$
Thus the expected value of the band depth is

$$E\left[ BD^2 \right] = \int_{-\infty}^{\infty} 2\, F(x)\left( 1 - F(x) \right) f(x)\, dx = 2 \int_0^1 t (1 - t)\, dt = \frac{1}{3}.$$
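As a quick sanity check of this distribution-free value (my own snippet, not from the paper), a Monte Carlo estimate gives roughly 1/3 for several different distributions:

```python
import numpy as np

rng = np.random.default_rng(1)
for sampler in (rng.normal, rng.exponential, rng.uniform):
    # draw a test sample x and two band-defining samples a, b from the same distribution
    x, a, b = (sampler(size=1_000_000) for _ in range(3))
    inside = (np.minimum(a, b) <= x) & (x <= np.maximum(a, b))
    print(sampler.__name__, inside.mean())   # each prints roughly 0.333
```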
As a result, the probability distribution does not matter. Hence we pick the $\epsilon$ that satisfies

$$\epsilon^* = \arg\min_{\epsilon} \left| \overline{sBD^\epsilon} - \frac{1}{3} \right|.$$
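Putting the pieces together, a sketch of the automatic selection, reusing the hypothetical `mismatch_matrix` and `depth_for_eps` helpers from above and scanning the observed mismatch values as candidate thresholds:

```python
import numpy as np

def select_epsilon(masks, target=1/3, candidates=None):
    """Pick the eps whose mean relaxed band depth is closest to the target value."""
    M = mismatch_matrix(masks)          # hypothetical helper sketched earlier
    if candidates is None:
        # candidate thresholds: the observed (finite) mismatch values themselves
        candidates = np.unique(M[np.isfinite(M)])
    means = np.array([depth_for_eps(M, eps).mean() for eps in candidates])
    best = int(np.argmin(np.abs(means - target)))
    return candidates[best], depth_for_eps(M, candidates[best])
```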