Authors: Yuyang Tao and Shufei Ge
Full paper can be found at https://arxiv.org/abs/2412.11631.
A major limitation of the Mapper algorithm is the fixed length and spacing of its cover intervals. The paper introduces a probabilistic model of Mapper that requires fewer parameter selections and allows a more flexible interval partitioning. The interval assignments are governed by a mixture distribution whose parameters are tuned automatically via gradient descent.
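To make the limitation concrete, here is a minimal sketch (not from the paper; the function and parameter names are illustrative) of the rigid cover used by classical Mapper, where every interval has the same length and the same fractional overlap:

```python
import numpy as np

def fixed_cover(filter_values, n_intervals=4, overlap=0.3):
    """Classical Mapper cover: equal-length intervals with a fixed overlap
    fraction.  Changing either knob changes every interval at once."""
    lo, hi = filter_values.min(), filter_values.max()
    # Solve for the length so the n intervals exactly span [lo, hi].
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]

f = np.linspace(0.0, 1.0, 50)   # toy filter values
intervals = fixed_cover(f)
```

Every data set gets the same uniform grid regardless of how the filter values are actually distributed, which is precisely what the probabilistic model relaxes.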
Preliminary: Soft Mapper
Instead of intervals on the filtered data, the soft Mapper works with a “hidden assignment matrix” H. That is an n × K binary matrix depicting the allocation between the n data points and K groups, in which H_ij = 1 indicates that the i-th point belongs to the j-th interval.
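In the classical setting this matrix is deterministic: it can be read off directly from a cover. A minimal sketch (helper name is mine, not the paper's) of building H from a list of intervals:

```python
import numpy as np

def assignment_matrix(filter_values, intervals):
    """Hidden assignment matrix H: H[i, j] = 1 iff point i's filter value
    falls inside interval j.  Overlapping intervals give rows with
    multiple ones."""
    n, K = len(filter_values), len(intervals)
    H = np.zeros((n, K), dtype=int)
    for j, (a, b) in enumerate(intervals):
        H[:, j] = ((filter_values >= a) & (filter_values <= b)).astype(int)
    return H

f = np.array([0.1, 0.5, 0.9])
H = assignment_matrix(f, [(0.0, 0.6), (0.4, 1.0)])
```

The soft Mapper's departure is to treat the entries of H as random rather than fixed by a grid.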
Given a hidden assignment matrix H, a Mapper function is a map from H to a Mapper graph, which includes the pullback and clustering operations.
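The two operations can be sketched as follows; this is a toy implementation under my own conventions (the clustering routine is pluggable, and here the test below uses a trivial one-cluster-per-interval stand-in), not the paper's code:

```python
import numpy as np
from itertools import combinations

def mapper_graph(H, points, cluster_fn):
    """Mapper function: pull back each interval's points via the columns of H,
    cluster within each pullback, and connect clusters sharing data points."""
    nodes = []  # each node is a set of data-point indices
    for j in range(H.shape[1]):
        idx = np.flatnonzero(H[:, j])          # pullback of interval j
        if idx.size == 0:
            continue
        for cluster in cluster_fn(points[idx]):  # local indices into idx
            nodes.append(set(idx[cluster]))
    # Edge whenever two clusters share at least one data point.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges

points = np.array([[0.1], [0.5], [0.9]])
H = np.array([[1, 0], [1, 1], [0, 1]])
single_cluster = lambda pts: [np.arange(len(pts))]  # stand-in clusterer
nodes, edges = mapper_graph(H, points, single_cluster)
```

Because the graph is a deterministic function of H, randomizing H is enough to randomize the whole graph, which is the idea exploited next.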
If one lets H be a random matrix, the soft Mapper can be seen as a stochastic version of Mapper, parameterized by the Mapper function and a probability distribution defined over H.
Suppose H follows a Bernoulli distribution with parameter matrix Q, i.e., each element H_ij is drawn independently from a Bernoulli distribution with probability of success q_ij. Under this view, model inference simplifies to estimating the probability matrix Q, which in turn yields the distribution of the Mapper graph. However, estimating Q directly can be challenging.
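A minimal sketch of this sampling step (symbols follow the reconstruction above; the helper name is mine): each draw of H is one realization of the random assignment, and the empirical mean of many draws recovers Q.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_H(Q, rng):
    """Draw one hidden assignment matrix, H[i, j] ~ Bernoulli(Q[i, j]),
    with all entries independent."""
    return (rng.random(Q.shape) < Q).astype(int)

Q = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])
samples = [sample_H(Q, rng) for _ in range(1000)]
# Averaging the samples approximates Q (law of large numbers).
```

Each sampled H induces one Mapper graph, so Q induces a distribution over graphs; the difficulty the paper addresses is how to estimate Q itself.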