A similarity-based recommender in Python

A metric-based recommender in Python

Run and modify the following code and share your experiences.

import numpy as np
import pandas as pd

# Load the movie basic information dataset in a Pandas dataframe.
# https://datasets.imdbws.com/title.basics.tsv.gz
movies = pd.read_csv('data.tsv', sep='\t', usecols=['tconst', 'primaryTitle'])

# Load the users' rating dataset into a Numpy array.
# The dataset contains 4,669,820 ratings from 1,499,238 users to
# 351,109 movies on the imdb.com website.
# https://ieee-dataport.org/open-access/imdb-users-ratings-dataset
ratings_array = np.load('Dataset.npy')

# Convert the numpy array to a pandas DataFrame
ratings = pd.DataFrame([x.split(',') for x in ratings_array],
                       columns=['userID', 'movieID', 'rating', 'review_date'])
ratings = ratings.iloc[:, 0:3]

# Convert the rating column to numeric type.
ratings['rating'] = pd.to_numeric(ratings['rating'])
# Optional: to use review dates, keep the review_date column above and parse it:
# ratings['review_date'] = pd.to_datetime(ratings['review_date'])

print(ratings.shape)
print(ratings.head())

# Sample a fraction of 0.0025 of the original dataset to reduce the computational cost.
# The sample contains about 11,674 ratings.
frac_ratings = ratings.sample(frac=0.0025)

print(frac_ratings.shape)

# Pivot the ratings into a user-by-movie table: one row per user, one column per movie,
# and NaN wherever a user has not rated a movie.
user_ratings = frac_ratings.pivot_table(index='userID', columns='movieID', values='rating')

# Compute the mean rating for each user
user_means = user_ratings.mean(axis=1)

# Center the ratings by subtracting the mean rating for each user
centered_ratings = user_ratings.sub(user_means, axis=0)

# Calculate movie-to-movie similarities as the Pearson correlation between the
# centered rating columns (each pair uses only the users who rated both movies).
movie_sims = centered_ratings.corr()

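# Recommend movies for a user based on the movies they have already rated.
def recommend_movies(user_id, n=10):
    # Movies this user has rated (unrated movies are NaN and are dropped).
    user_history = centered_ratings.loc[user_id].dropna()
    # For every movie, sum its similarity to each movie in the user's history.
    sim_scores = movie_sims.loc[user_history.index].sum()
    # Exclude the movies the user has already rated.
    sim_scores = sim_scores.drop(user_history.index)
    # Return the IDs of the n highest-scoring movies.
    rec_movies = sim_scores.nlargest(n).index
    return rec_movies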

# Function to create a DataFrame of tconst and primaryTitle for the recommended movies.
def movie_df(rec_movies):
    titles = []
    for movieId in rec_movies:
        # Look up the title; .iloc[0] raises IndexError if the ID is missing from movies.
        title = movies.loc[movies['tconst'] == movieId, 'primaryTitle'].iloc[0]
        titles.append(title)
    # Create the DataFrame of tconst and primaryTitle.
    rec_movies_df = pd.DataFrame({'tconst': rec_movies, 'primaryTitle': titles})
    return rec_movies_df
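
The listing above defines its functions but never calls them, and it needs two large downloads before anything runs. As a quick way to experiment, here is a minimal sketch that applies the same steps (pivot, center per user, correlate movie columns, sum similarities over the user's history) to a tiny hand-made table. Everything in it is an assumption made for illustration: the IDs, titles, and ratings are invented, and the names `toy_movies`, `toy_ratings`, and `recommend_from_matrix` are hypothetical and not part of the course code.

```python
import pandas as pd

# Hypothetical toy data: every ID, title, and rating below is invented.
toy_movies = pd.DataFrame({
    'tconst': ['tt1', 'tt2', 'tt3', 'tt4'],
    'primaryTitle': ['Alpha', 'Beta', 'Gamma', 'Delta'],
})
toy_ratings = pd.DataFrame(
    [
        ('u1', 'tt1', 9), ('u1', 'tt2', 8), ('u1', 'tt3', 2), ('u1', 'tt4', 3),
        ('u2', 'tt1', 8), ('u2', 'tt2', 9), ('u2', 'tt3', 3), ('u2', 'tt4', 2),
        ('u3', 'tt1', 2), ('u3', 'tt2', 3), ('u3', 'tt3', 9), ('u3', 'tt4', 8),
        ('u4', 'tt1', 9),  # u4 has rated a single movie
    ],
    columns=['userID', 'movieID', 'rating'],
)

# Same steps as the listing: pivot, center per user, correlate movie columns.
toy_user_ratings = toy_ratings.pivot_table(
    index='userID', columns='movieID', values='rating')
toy_centered = toy_user_ratings.sub(toy_user_ratings.mean(axis=1), axis=0)
toy_sims = toy_centered.corr()


def recommend_from_matrix(centered, sims, user_id, n=2):
    """Rank unrated movies by summed similarity to the user's rated movies."""
    history = centered.loc[user_id].dropna()
    scores = sims.loc[history.index].sum().drop(history.index)
    return scores.nlargest(n).index


recs = recommend_from_matrix(toy_centered, toy_sims, 'u4', n=2)

# Map the recommended IDs to titles through a tconst-indexed Series.
title_by_id = toy_movies.set_index('tconst')['primaryTitle']
titles = [title_by_id[movie_id] for movie_id in recs]

print(list(recs))
print(titles)
```

In this toy table, tt2 ranks first for u4 because its centered ratings rise and fall with those of tt1, the only movie u4 has rated. On the real 0.0025 sample, expect many similarities to be NaN, since few pairs of movies share raters; pandas skips NaN when summing, so a movie with no computable similarity to the user's history gets a score of 0.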

This article is from the free online course Recommender Systems in Python, created by FutureLearn.