<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>brain of mat kelcey</title>
    <link>http://matpalm.com/blog</link>
    <description>thoughts from a data scientist wannabe</description>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>brutally short intro to collaborative filtering</title>
      <link>http://matpalm.com/blog/2010/03/18/brutally-short-intro-to-collaborative-filtering/</link>
      <category><![CDATA[recommendations]]></category>
      <category><![CDATA[brutally short intro]]></category>
      <category><![CDATA[data mining]]></category>
      <guid>http://matpalm.com/blog/?p=346</guid>
      <description>brutally short intro to collaborative filtering</description>
      <content:encoded><![CDATA[<p>my favourite recommendation system is the collaborative filter; it gives good results
and is easy to understand and extend as required.</p>
<p>it works on the intuition that
if i like coffee, chocolate and ice cream
and you like coffee and chocolate
you might also like ice cream</p>
<p>so we need a little bit of terminology; <em>users</em> (me and you), <em>items</em> (coffee, chocolate and ice cream)</p>
<p>in a user based collaborative filter the process is</p>
<pre>
to calculate recommendations for user1
 for each other user (user2)
  calculate user_similarity_score between user1 and user2 (a value from 0 to 1)
  if the user_similarity_score is non zero
   for each item user2 has that user1 doesn't
    add to user1's recommendations, weighted by the user_similarity_score
</pre>
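<p>as a rough sketch, the loop above might look like this in python. the names <em>recommend</em>, <em>likes</em> and <em>always_similar</em> are made up for illustration, and the similarity measure is passed in as a function; the stand-in used here just returns 1.0 so the coffee / chocolate / ice cream intuition can be checked.</p>

```python
def recommend(target, likes, similarity):
    """score items the target lacks, weighted by each other user's similarity."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        score = similarity(likes[target], items)  # expected to be a 0 to 1 value
        if score == 0:
            continue  # nothing in common, so ignore this user
        for item in items.difference(likes[target]):
            scores[item] = scores.get(item, 0) + score
    return scores


# hypothetical data matching the intuition: i like all three, you like two
likes = {
    "me": {"coffee", "chocolate", "ice cream"},
    "you": {"coffee", "chocolate"},
}


def always_similar(a, b):
    # placeholder similarity (an assumption, not a real measure)
    return 1.0


print(recommend("you", likes, always_similar))
```

<p>with that placeholder, ice cream is the only item suggested for "you".</p>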

<p>e.g. say alice, bob, charlie and dave have listed the things they like...</p>
<p>alice likes coffee and chocolate
bob likes coffee, chocolate and ice cream
charlie likes coffee, ice cream and carob
dave likes carob and fruit cake</p>
<p>to recommend items for alice we first need a way to calculate a similarity between users</p>
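<p>a reasonable measure when we just have these sets of items is the jaccard coefficient, defined simply as the number of items two users have in common divided by the total number of distinct items across both users (the size of the intersection over the size of the union).</p>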
<pre>
Jaccard(alice,bob) = 2/3
Jaccard(alice,charlie) = 1/4
Jaccard(alice,dave) = 0/4 = 0
</pre>
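<p>a minimal python sketch of the same calculation, using sets and exact fractions. the function name <em>jaccard</em> is just a label chosen here, and returning 0 for two empty sets is an assumption (the measure is undefined in that case).</p>

```python
from fractions import Fraction


def jaccard(a, b):
    """items in common divided by distinct items across both sets."""
    union = a.union(b)
    if not union:
        return Fraction(0)  # assumption: two empty sets count as not similar
    return Fraction(len(a.intersection(b)), len(union))


alice = {"coffee", "chocolate"}
bob = {"coffee", "chocolate", "ice cream"}
charlie = {"coffee", "ice cream", "carob"}
dave = {"carob", "fruit cake"}

for name, items in [("bob", bob), ("charlie", charlie), ("dave", dave)]:
    print("Jaccard(alice," + name + ") =", jaccard(alice, items))
```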

<p>we can then build up a list of alice's recommendations based on the items the others like that alice doesn't have</p>
<p>from bob we get ice cream for a value of 2/3
from charlie we get ice cream for 1/4 and carob for 1/4
we can ignore dave since alice had nothing in common with them.</p>
<p>so ice cream is the highest recommended item with a score of 2/3 + 1/4 = 11/12, about 0.92
carob is also recommended but with a much lower score of 1/4 = 0.25</p>
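<p>to double check the arithmetic, here is a small python tally of the weighted contributions. the item and weight pairs are copied from the worked example; the variable names are made up for illustration.</p>

```python
from fractions import Fraction

# (item, weight) contributions taken from the worked example
contributions = [
    ("ice cream", Fraction(2, 3)),  # from bob
    ("ice cream", Fraction(1, 4)),  # from charlie
    ("carob", Fraction(1, 4)),      # from charlie
]

# add up the weights per item
totals = {}
for item, weight in contributions:
    totals[item] = totals.get(item, Fraction(0)) + weight

# rank the items, strongest recommendation first
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for item, total in ranked:
    print(item, total, round(float(total), 2))
```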
<p>easy peasy and a great place to start!</p>]]></content:encoded>
    </item>
  </channel>
</rss>

