Today I reimplemented the code described in 2024-04-08 in Rust. It behaves the same as the code in the Jupyter notebook. My goal was to speed things up by switching to a more performant language. With the smaller subset used in the notebook, the entire process (compiling, loading the data, building the filtration, and finding stable paths) takes about 23 seconds.
To test it, I found the path from my favorite movie (Sweeney Todd) to Toy Story using the smaller subset:
- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Transformers (2007)
- Donnie Darko (2001)
- Toy Story (1995) (stability: 0.8425772)
Using the entire data set (after removing movies with fewer than 10 ratings) gives the following most stable path:
- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Corpse Bride (2005)
- Charlie and the Chocolate Factory (2005)
- Star Wars: Episode III - Revenge of the Sith (2005)
- Lord of the Rings: The Fellowship of the Ring, The (2001)
- Toy Story (1995) (stability: 0.7412176)
The shortest path for the same data set is:
- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Toy Story (1995) (stability: 0.9505234)
The above output took ~6.5 hours to compute.
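The pre-filtering step used for the full data set (dropping movies with fewer than 10 ratings) could look roughly like the sketch below. This is not the code from my implementation, just a minimal illustration: the `Rating` struct, its field names, and the function `filter_by_rating_count` are all hypothetical, and it assumes the ratings are already loaded into memory as a slice.

```rust
use std::collections::HashMap;

/// One rating record. The struct and its field names are hypothetical,
/// chosen only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Rating {
    user_id: u32,
    movie_id: u32,
    score: f32,
}

/// Keep only the ratings whose movie has at least `min_ratings` ratings.
fn filter_by_rating_count(ratings: &[Rating], min_ratings: usize) -> Vec<Rating> {
    // First pass: count how many ratings each movie has.
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for r in ratings {
        *counts.entry(r.movie_id).or_insert(0) += 1;
    }

    // Second pass: keep ratings of movies that meet the threshold.
    ratings
        .iter()
        .filter(|r| counts[&r.movie_id] >= min_ratings)
        .cloned()
        .collect()
}
```

Calling `filter_by_rating_count(&ratings, 10)` would correspond to the threshold used above. Two passes keep the logic simple; the cost is linear in the number of ratings, so this step should be negligible next to building the filtration.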
All paths:

- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Toy Story (1995) (stability: 0.9505234)

- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- V for Vendetta (2006)
- Toy Story (1995) (stability: 0.8516998)

- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Charlie and the Chocolate Factory (2005)
- Finding Nemo (2003)
- Toy Story (1995) (stability: 0.7909518)

- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Corpse Bride (2005)
- Charlie and the Chocolate Factory (2005)
- Incredibles, The (2004)
- Toy Story (1995) (stability: 0.7585832)

- Sweeney Todd: The Demon Barber of Fleet Street (2007)
- Corpse Bride (2005)
- Charlie and the Chocolate Factory (2005)
- Star Wars: Episode III - Revenge of the Sith (2005)
- Lord of the Rings: The Fellowship of the Ring, The (2001)
- Toy Story (1995) (stability: 0.7412176)