<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>brain of mat kelcey</title>
    <link>http://matpalm.com/blog</link>
    <description>thoughts from a data scientist wannabe</description>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>e12.2 entity set expansion</title>
      <link>http://matpalm.com/blog/2010/01/28/e12-2-entity-set-expansion/</link>
      <category><![CDATA[linguistics]]></category>
      <category><![CDATA[e12]]></category>
      <guid>http://matpalm.com/blog/?p=259</guid>
      <description>e12.2 entity set expansion</description>
      <content:encoded><![CDATA[<p>i've been doing some reading for my statistical synonyms project and have uncovered a heap of cool papers. most of them build on an idea (from the 1950s!) called <a href="http://en.wikipedia.org/wiki/Distributional_hypothesis">the distributional hypothesis</a>, which simply states that words that appear in similar contexts often have similar meanings.</p>
<p>the coolest paper so far is <a href="http://scholar.google.com.au/scholar?q=Web-Scale+Distributional+Similarity+and+Entity+Set+Expansion&amp;hl=en">'Web-Scale Distributional Similarity and Entity Set Expansion' by Pantel, Crestan, Borkovsky et al.</a>, which introduced me to an area of research i didn't really know existed: entity set expansion.</p>
<p>entity set expansion is a bit like thesaurus building for proper nouns; given a seed set of related items, can you expand the set to include other semantically similar items?</p>
<p>an example might be brands of japanese motorbikes. starting with 'yamaha' and 'kawasaki' we might expect the set to be expanded to include 'honda'.</p>
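<p>to make that concrete, here's a minimal sketch of the idea in python. it is not the method from the paper and not the code in my repo; the toy corpus is made up, and the names <code>context_vectors</code>, <code>cosine</code> and <code>expand</code> are just ones i picked for illustration. each term gets a bag of the words that appear next to it, and candidates are ranked by cosine similarity to the combined context of the seeds.</p>

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each term to counts of the words within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors.setdefault(term, Counter()).update(left + right)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[feature] for feature, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def expand(seeds, vectors, top_n=1):
    """Rank non-seed terms by similarity to the summed context of the seeds."""
    centroid = Counter()
    for seed in seeds:
        centroid.update(vectors.get(seed, Counter()))
    candidates = [term for term in vectors if term not in seeds]
    ranked = sorted(candidates, key=lambda t: cosine(centroid, vectors[t]), reverse=True)
    return ranked[:top_n]


# made-up toy corpus, purely for illustration
sentences = [
    "i ride a yamaha motorbike",
    "i ride a kawasaki motorbike",
    "i ride a honda motorbike",
    "i eat a banana sandwich",
]

vectors = context_vectors(sentences)
print(expand(["yamaha", "kawasaki"], vectors))
```

<p>on this toy corpus 'honda' shares exactly the same contexts as the seeds ('a' and 'motorbike') so it ranks first, while 'banana' only shares 'a'. on real text you'd want far more data and a smarter weighting than raw counts.</p>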
<p>i started hacking around in <a href="http://hadoop.apache.org/pig/">pig</a> but today switched back to ruby for slightly quicker prototyping. who knows, i might give <a href="http://github.com/iconara/piglet">piglet</a> a go!</p>
<p>the code is on <a href="http://github.com/matpalm/statistical_synonyms">github</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>

