notebooks/ICML_Code/PathsonRecSys.ipynb

Need to download MovieLens 20m Data Set (190 MB).

Data sets:

ratings.csv
- 3 columns: userId, movieId, rating
movies.csv
- 3 columns: movieId, title, genres (| separated)

We load the ratings.csv and movies.csv data sets and create a subset of movies based on the number of ratings.

Create a map movies_map from movieId → list of user ids
Create a subset slim_movies_map of movies_map by keeping movies with at least 10 ratings
Create a subset slimer_movies_map of movies_map randomly including some movies and making sure to include a few select movies. The number of movies goes from 22,884 to ~216.

Question: Is the slimming of movies_map just for shorter computational time? Or is there some statistical advantage of a smaller sample size?

Building 1-skeleton:

The code simply generates a Cech complex but instead of balls we use Jaccard distance.

J ({C_{i}}) = 1 - \frac{∣ \cap _{i} C _{i} ∣}{∣ \cup _{i} C _{i} ∣}

The covers $C_{i}$ contain all users who rated movie $i$ . Thus, the intersection would give the users who rated a set of movies. Hence, if $J ({C_{i}}) < 1$ , it follows that at least two people rated the same movie, that is, the covers intersect.

In terms of the mapper, our filter function is given by $viewer ratings \mapsto movies .$ We cover the space of movies with singleton sets, resulting in the preimage giving all ratings of a movie.

The movies_map represents all users who rated a particular movie.

Starts with Cover.build:

Create 0-simplices, one for each movie.
For all $k$ -combinations of movies ( $k = 2, \dots, ma x_d im + 1$ ) Jaccard distance of the covers. If it is less than 1, add $(k - 1)$ -simplex.
Sample result:

[([3], 0.0), ([100], 0.0), ([253], 0.0), ([447], 0.0), ([574], 0.0), ([688], 0.0), ([702], 0.0), ([753], 0.0), ([1127], 0.0), ([1197], 0.0), ([1269], 0.0), ([1314], 0.0), ([1386], 0.0), ([1459], 0.0), ([1659], 0.0), ([1907], 0.0), ([1909], 0.0),...((117533, 118930), 0.9782608695652174)]

Generate a nx.Graph representing the 1-skeleton

Notes

Explorer

2024-04-08

notebooks/ICML_Code/PathsonRecSys.ipynb

Building 1-skeleton:

Graph View

Table of Contents

Backlinks

Source code