Building a recommendations system
Today is a weekend day with good weather. Based on the data you just
saw, how many loaves will you sell? Let’s use KNN, where K = 4. First,
figure out the four nearest neighbors for this point.
Here are the distances. A, B, D, and E are the closest.
Take an average of the loaves sold on those days, and you get 218.75.
That’s how many loaves you should make for today!
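Here is a minimal sketch of that calculation in Python. The data table isn't repeated on this page, so the feature values and loaf counts below are assumptions (weather on a 1 to 5 scale, weekend or holiday as 0/1, game on as 0/1), chosen to be consistent with the result above. The names history, today, and knn_predict are made up for this example.

```python
import math

# Assumed data: (weather 1-5, weekend/holiday 0 or 1, game on 0 or 1) -> loaves sold
history = {
    "A": ((5, 1, 0), 300),
    "B": ((3, 1, 1), 225),
    "C": ((1, 1, 0), 75),
    "D": ((4, 0, 1), 200),
    "E": ((4, 0, 0), 150),
    "F": ((2, 0, 0), 50),
}

# Assumed point for today: good weather, weekend, no game
today = (4, 1, 0)


def knn_predict(history, point, k):
    """Return the k nearest days and the average loaves sold on them."""
    # Sort the day names by straight-line distance from the point
    by_distance = sorted(
        history, key=lambda name: math.dist(history[name][0], point)
    )
    nearest = by_distance[:k]
    loaves = [history[name][1] for name in nearest]
    return sorted(nearest), sum(loaves) / k


neighbors, prediction = knn_predict(history, today, k=4)
print(neighbors)   # ['A', 'B', 'D', 'E']
print(prediction)  # 218.75
```

Predicting a number by averaging the neighbors' values like this is called regression. math.dist needs Python 3.8 or later.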
Cosine similarity
So far, you’ve been using the distance formula to measure how far apart
two users are. Is this the best formula to use?
A common one used in practice is cosine similarity. Suppose two users are
similar, but one of them is more conservative in their ratings. They both
loved Manmohan Desai’s Amar Akbar Anthony. Paul rated it 5 stars, but Rowan
rated it 4 stars. If you keep using the distance formula, these two users
might not be each other’s neighbors, even though they have similar taste.
Cosine similarity doesn’t measure the distance between two vectors.
Instead, it compares the angle between the two vectors, so it’s better at
dealing with cases like this. Cosine similarity is out of the scope of this
book, but look it up if you use KNN!
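If you want a taste of it, here is a rough sketch. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it is close to 1 when the vectors point in nearly the same direction. The ratings below are made up for illustration: only the first rating for Paul and Rowan (5 and 4 stars) comes from the example above, and the third user, Sam, is hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)


# Made-up ratings for the same four movies
paul = [5, 5, 3, 5]
rowan = [4, 4, 2, 4]  # same taste as Paul, but one star lower on everything
sam = [5, 4, 4, 4]    # hypothetical user with a somewhat different ranking

# By distance, Sam looks closer to Paul than Rowan does
print(euclidean(paul, rowan), euclidean(paul, sam))

# By cosine similarity, Rowan is the better match for Paul
print(cosine_similarity(paul, rowan), cosine_similarity(paul, sam))
```

With these numbers, the distance formula picks Sam as Paul's nearer neighbor, while cosine similarity picks Rowan. That is the conservative-rater problem in miniature.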