Topology for Data Science

Rips Filtration

Tutorial on Rips Filtrations


Objectives

  • Gain familiarity with simplices and simplicial complexes
  • Be able to conceptualize a rips filtration, and understand what a rips filtration tells us about a set of data
  • Learn about a few exciting applications of the Rips filtration
  • Utilize the R package TDA to conduct a rips filtration on a miniature bobcat.

Triangle Appreciation

Of course we are all familiar with triangles, and to understand the Rips filtration, we must first conceptualize triangles of differing dimensions. that is triangles of increasing dimension ranging from 0 to n. (The -1nth dimensional triangle being a null value). n dimensional triangles A simplex is synonymous with an n dimensional triangle, that is it is just a triangle of any dimension.


Now, consider a set of vertices, each of which being the center of a circle of some generic, fixed radius, as follows: vertices


If we were to generate a simplex between each set of vertices whose circles intersect, we can visualize the result in this data as primarily being 1 simplices, with one 0 simplex, one 2 simplex, and one 3 simplex. simplicial complex The term for this creation is a simplicial complex, and it can actually tell us quite a lot about a set of data.


Now, visualize a similar set of data, but with circles whose radii simultaneously change over time. Every point has a circle with an equivalent radius, but the entire set of radii changes constantly over time. Features will emerge and disappear as the universal radius changes. rips gif

Finally, the process will terminate over a data set of n points once an n - 1 dimensional simplex has been created (this is a simplex connecting all of the points).


We refer to the emergence of features as “births,” and the disappearance of features as “deaths” in the data set. Over time, we can trace births and deaths of features through something called a persistence diagram.


Persistence diagrams allow us to pick up on trends in data that could be of importance. We can track the lifespan of features throughout the filtration in order to separate high persistence, theoretically more important features in a data set from low persistence features, thought to be statistical noise.


Persistence diagrams simply operate with one axis dedicated to birth time and one axis dedicated to death time. As time passes, we can visualize data points who immediately die upon being born as falling along the line y = x, if y were to represent time along the death axis and x were to represent time along the birth axis. Features of higher persistence are born and remain for longer periods of time, so their points can be plotted above this line. Features that never die are plotted at time = infinity on the death axis. To visualize:

persistence diagram


Rips filtrations are especially useful because they can indicate meaning in data without depending on a direction to consider the persistent homology of a data set.


Rips filtration as a technique certainly has benefits, however a major shortcoming of using a rips filtration is that rips filtrations can get very large very quickly.


One pertinent example of utilizing a rips filtration as a useful tool to understand data comes in shape reconstruction; the ability to categorize matching complex shapes mathematically. This often arises in graphics and machine learning, in order for a machine to more effectively understand an nature of its surroundings. Because rips filtrations take into account the entire data set simultaneously, they are advantageous in categorizing shapes which could be rotated, since the rips filtration doesn’t favor any specific direction.

Because the major shortcoming of using rips filtration is its time complexity, a current important source of progress in the topological data analysis community is coming from groups attempting to approximate the rips filtration effectively with more efficient methods. One interesting paper came out of Ohio State in 2016, which gives a good overview of various techniques to estimate a rips filtration and proposes a new, efficient method which retains most of the original information in a proper rips filtration.


Rips Filtration in the R Package ‘TDA’

R package

R

Works Cited: https://aqjaffe.github.io/VRPolygons/assets/CechFiltration.gif

This project is maintained by benholmgren