<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>brain of mat kelcey</title>
    <link>http://matpalm.com/blog</link>
    <description>thoughts from a data scientist wannabe</description>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>a pig screencast</title>
      <link>http://matpalm.com/blog/2010/01/17/a-pig-screencast/</link>
      <category><![CDATA[screencast]]></category>
      <category><![CDATA[hadoop]]></category>
      <category><![CDATA[pig]]></category>
      <guid>http://matpalm.com/blog/?p=248</guid>
      <description>a pig screencast</description>
      <content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=8789251&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://vimeo.com/moogaloop.swf?clip_id=8789251&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p><a href="http://vimeo.com/8789251">pig demo</a> from <a href="http://vimeo.com/user2935988">Mat Kelcey</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<p>based on a talk i gave at work recently</p>]]></content:encoded>
    </item>
    <item>
      <title>e11.2 aggregating tweets by time of day</title>
      <link>http://matpalm.com/blog/2009/10/24/e11-2-aggregating-tweets-by-time-of-day/</link>
      <category><![CDATA[e11]]></category>
      <category><![CDATA[twitter]]></category>
      <category><![CDATA[hadoop]]></category>
      <category><![CDATA[pig]]></category>
      <guid>http://matpalm.com/blog/?p=144</guid>
      <description>e11.2 aggregating tweets by time of day</description>
      <content:encoded><![CDATA[<p>for v3 lets aggregate by time of the day, should make for an interesting animation</p>
<p>browsing the data there are lots of other lat longs in there, not just iPhone: and ÜT:. there are also ones tagged with Coppó:, Pre:, etc. perhaps we should just try to take anything that looks like a lat long.</p>
<p>furthermore let's switch to a bigger dataset again, 4.7e6 tweets from Oct 13 07:00 thru Oct 19 17:00.</p>
<p>i've been streaming all my tweets ( <a href="http://matpalm.com/blog/2009/10/16/e11-0-tweets-around-the-world/">as previously discussed</a> ) and been storing them in a directory json_stream</p>
<p>here are the steps...</p>
<h2>1. extract locations</h2>
<p>use a streaming script to take a tweet in json form and emit the tweet time and location string</p>
<pre>export HADOOP_STREAMING_JAR=$HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar
hadoop jar $HADOOP_STREAMING_JAR \
 -mapper <a href="http://github.com/matpalm/rtw_tweet/blob/master/v3/extract_locations.rb">./extract_locations.rb</a> -reducer /bin/cat \
 -input json_stream -output locations</pre>

<p>sample output (4.7e6 tuples) { time, location string }</p>
<pre>Wed Oct 14 22:01:41 +0000 2009    iPhone: -23.492420,-46.846916
Wed Oct 14 22:01:41 +0000 2009    Ottawa
Wed Oct 14 22:01:41 +0000 2009    DA HOOD
Wed Oct 14 22:01:42 +0000 2009    Earth</pre>
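<p>as a rough idea of what a mapper like this does, here's a minimal sketch. it is not the extract_locations.rb linked above; the function name is made up and it assumes each input line is one tweet in json with the time in created_at and the free text location in user.location</p>

```ruby
require 'json'

# hypothetical stand-in for the real mapper, for illustration only.
# takes one line of json and returns "time<TAB>location", or nil
# when the line can't be parsed or the user has no location set
def time_and_location(line)
  tweet = JSON.parse(line)
  return nil unless tweet.is_a?(Hash)
  user = tweet['user']
  location = user.is_a?(Hash) ? user['location'].to_s.strip : ''
  return nil if location.empty?
  "#{tweet['created_at']}\t#{location}"
rescue JSON::ParserError
  nil # skip anything that isn't valid json, eg a truncated record
end

sample = '{"created_at":"Wed Oct 14 22:01:41 +0000 2009","user":{"location":"Ottawa"}}'
puts time_and_location(sample)
```

<p>a streaming mapper would call this for every line on stdin and print the non nil results.</p>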

<h2>2. pluck lat longs from locations</h2>
<p>make another pass and extract possible lat lons from the location strings</p>
<pre>hadoop jar $HADOOP_STREAMING_JAR \
 -mapper <a href="http://github.com/matpalm/rtw_tweet/blob/master/v3/extract_lat_longs_from_locations.rb">./extract_lat_longs_from_locations.rb</a> -reducer /bin/cat \
 -input locations -output lat_lons</pre>

<p>sample output (reduces down to 320e3 data points) { time, lat, lon }</p>
<pre>Wed Oct 14 22:01:41 +0000 2009    -23.49242    -46.846916
Wed Oct 14 22:05:25 +0000 2009    35.670086    139.740766
Wed Oct 14 22:11:35 +0000 2009    41.37731257    -74.68153942
Wed Oct 14 22:15:18 +0000 2009    51.503212    5.478329</pre>
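<p>the "take anything that looks like a lat long" idea could be sketched roughly as below. this is an assumption about the approach, not the linked extract_lat_longs_from_locations.rb; the pattern and the function name are made up for illustration</p>

```ruby
# hypothetical pattern: two signed decimals separated by a comma,
# with any prefix (iPhone:, UT:, Pre:, ...) ignored
LAT_LON = /(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)/

# returns [lat, lon] as floats, or nil if nothing plausible is found
def lat_lon(location)
  m = LAT_LON.match(location)
  return nil unless m
  lat = m[1].to_f
  lon = m[2].to_f
  # discard pairs that fall outside the valid ranges
  return nil unless lat.between?(-90.0, 90.0)
  return nil unless lon.between?(-180.0, 180.0)
  [lat, lon]
end

p lat_lon('iPhone: -23.492420,-46.846916')
p lat_lon('Ottawa')
```

<p>a permissive pattern like this will also accept pairs of numbers that were never meant to be coordinates, so some noise in the output is to be expected.</p>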

<h2>3. bucket data into timeslices and points for a map</h2>
<p>we need to project the times into 10min slots; ie 00:05 will be slot 0, 00:12 will be slot 1.</p>
<p>also use this step to project the lat lons to x and y coords (0-&gt;1) using a simple <a href="http://en.wikipedia.org/wiki/Mercator_projection">mercator</a> projection</p>
<pre>hadoop jar $HADOOP_STREAMING_JAR \
 -mapper <a href="http://github.com/matpalm/rtw_tweet/blob/master/v3/lat_long_to_merc_and_bucket.rb">./lat_long_to_merc_and_bucket.rb</a> -reducer /bin/cat \
 -cmdenv BUCKET_SIZE=0.005 \
 -input lat_lons -output x_y_points</pre>

<p>sample output { timeslice, normalised x position, normalised y position }</p>
<pre>122     0.48    0.205
122     0.295   0.26
122     0.29    0.26
123     0.265   0.265</pre>
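<p>the arithmetic for this step might look something like the sketch below. the function names are made up and this is one plausible reading of the description, not the linked lat_long_to_merc_and_bucket.rb; in particular the y axis running from the top of the map and the rounding to 3 decimal places are assumptions</p>

```ruby
SLOT_MINUTES = 10

# 00:05 falls in slot 0, 00:12 in slot 1, 23:59 in slot 143
def time_slot(hour, minute)
  (hour * 60 + minute) / SLOT_MINUTES
end

# project lat lon in degrees to x,y in the range 0 -> 1.
# latitudes beyond roughly 85 degrees land outside that range
def mercator(lat, lon)
  x = (lon + 180.0) / 360.0
  lat_rad = lat * Math::PI / 180.0
  y = 0.5 - Math.log(Math.tan(Math::PI / 4 + lat_rad / 2)) / (2 * Math::PI)
  [x, y]
end

# snap a value down to the start of its bucket; the round(3) tidies up
# float noise and assumes the bucket size has at most 3 decimal places
def bucket(value, size)
  ((value / size).floor * size).round(3)
end

x, y = mercator(35.670086, 139.740766)
puts [time_slot(22, 5), bucket(x, 0.005), bucket(y, 0.005)].join("\t")
```

<p>ten minute slots over a day give 144 timeslices, one per frame of the animation.</p>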

<p>as a slight digression before we move onto aggregating per timeslice here's a pic of all 320e3 tweets on a heatmap.</p>
<p>some interesting noise on the greenwich meridian, must be incorrectly identified lat lons during the <a href="http://github.com/matpalm/rtw_tweet/blob/master/v3/extract_lat_longs_from_locations.rb">./extract_lat_longs_from_locations.rb</a> step.</p>
<h3>log10 tweet location (click for a hires version)</h3>
<p><a href="http://matpalm.com/rtw_tweet/v3/hi_res_320e3_log.jpg"><img class="size-full wp-image-149" title="lo_res_320e3_log" src="/blog/imgs/2009/10/lo_res_320e3_log.jpg" alt="log10 tweet location, click for a hires version" width="640" height="496" /></a></p>
<h2>4. aggregate (x,y) pairs per timeslice</h2>
<p>next we aggregate, per timeslice, the frequency of each x,y point.
we'll do this with a pig script, <a href="http://github.com/matpalm/rtw_tweet/blob/master/v3/aggregate_per_timeslice.pig">aggregate_per_timeslice.pig</a></p>
<pre>
-- aggregating per timeslice
pts = load 'x_y_points/part-00000' as (timeslice:int, x:float, y:float);
pts2 = group pts by (timeslice,x,y);
pts3 = foreach pts2 generate $0, COUNT($1) ;
pts4 = foreach pts3 generate $0.$0, $0.$1, $0.$2, $1 as freq;
pts5 = order pts4 by timeslice;
store pts5 into 'aggregated_freqs';</pre>

<p>results in the tuples in 'aggregated_freqs' { timeslice, normalised x position, normalised y position, frequency }</p>
<pre>0    0.0    0.32    1
0    0.06    0.325    9
0    0.065    0.33    1
0    0.08    0.17    2
0    0.155    0.225    8</pre>

<p>we need to normalise each frequency value for drawing on the map and would have liked to do this in pig too, but it turns out there isn't a log function in v0.3 of pig (??)</p>
<p>will have to do scaling when generating the images. isn't such a big deal since the dataset is quite small at this stage but was trying to use this whole thing as an excuse to learn pig :(</p>
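<p>the scaling itself is only a line or two. here's a minimal sketch, assuming a log10 scale normalised against the largest frequency; the function name is made up and adding 1 before taking the log is my assumption, so that a single tweet still gets a visible intensity</p>

```ruby
# hypothetical helper: map a frequency to an intensity between 0 and 1
# on a log10 scale. max_freq is the largest frequency being drawn and
# is assumed to be at least 1
def log_intensity(freq, max_freq)
  Math.log10(1 + freq) / Math.log10(1 + max_freq)
end

[1, 2, 8, 9].each do |freq|
  puts "#{freq}\t#{log_intensity(freq, 9).round(3)}"
end
```

<p>the max could be taken per frame or across all frames; across all frames keeps the colours comparable from one timeslice to the next.</p>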
<h2>5. take aggregated_freqs and make 144 heat map images</h2>
<p>use a simple script to read through the aggregated_freqs and generate a heat map for each frame</p>
<pre><a href="http://github.com/matpalm/rtw_tweet/blob/master/v3/heat_maps.rb">heat_maps.rb</a> aggregated_freqs 0.005 frames</pre>

<h2>6. convert to animation</h2>
<p>next bundle stills into an animation and upload to youtube</p>
<pre>mencoder "mf://frames/*" -mf fps=25 -o rtw_tweet_v3.avi -ovc x264 -x264encopts bitrate=750</pre>

<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/cSnGI33CwP0&amp;hl=en&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/cSnGI33CwP0&amp;hl=en&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h2>7. conclusions</h2>
<ol>
<li>didn't really end up using hadoop's power that much; streaming jobs that use just cat as a reducer are just a parallel way of doing 1:1 string mapping</li>
<li>aggregation was really easy in pig but lack of Log function is annoying; could have written a <a href="http://wiki.apache.org/pig/UDFManual">UDF</a>, and there probably already is one but i couldn't find it</li>
<li>this visualisation came out pretty lame; funny to see how the really swish visualisations rely far more on pretty colours and smooth lines than the data itself. there are a bundle of things i could do with this one but it's time to move on to something else.</li>
</ol>]]></content:encoded>
    </item>
    <item>
      <title>e11.1 from bash scripts to hadoop</title>
      <link>http://matpalm.com/blog/2009/10/18/e11-1-from-bash-scripts-to-hadoop/</link>
      <category><![CDATA[e11]]></category>
      <category><![CDATA[maps]]></category>
      <category><![CDATA[twitter]]></category>
      <category><![CDATA[hadoop]]></category>
      <category><![CDATA[pig]]></category>
      <guid>http://matpalm.com/blog/?p=100</guid>
      <description>e11.1 from bash scripts to hadoop</description>
      <content:encoded><![CDATA[<p>let's rewrite <a href="http://matpalm.com/blog/2009/10/16/e11-0-tweets-around-the-world/">v1</a> using hadoop tooling, code is on <a href="http://github.com/matpalm/rtw_tweet/tree/master/v2/">github</a></p>
<p>we'll run hadoop in non distributed <a href="http://hadoop.apache.org/common/docs/r0.20.0/quickstart.html#Local">standalone mode</a>. in this mode everything runs in a single jvm so it's nice and simple to dev against.</p>
<h3>step 1: extract the locations strings from the json stream</h3>
<p>in v1 it was</p>
<div class="pygments_murphy"><pre>bzcat sample.bz2 | ./extract_locations.pl &gt; locations
</pre></div>

<p>using the awesome <a href="http://hadoop.apache.org/common/docs/current/streaming">hadoop streaming</a> interface it's not too different. this interface allows you to specify any app as the mapper or reducer. the main difference is that it works on directories not just files.</p>
<p>for the mapper we'll use exactly the same script as before, extract_locations.pl, and since there is no reduce component to this job we use an "identity" script, ie cat, as the reduce phase.</p>
<div class="pygments_murphy"><pre>mkdir json_stream
bzcat sample.bz2 | gzip - &gt; json_stream/input.gz
# hadoop supports gzip out of the box but not bzip2 :(
export HADOOP_STREAMING_JAR=$HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar
hadoop jar $HADOOP_STREAMING_JAR \
  -mapper ./extract_locations.pl -reducer /bin/cat \
  -input json_stream -output locations
</pre></div>

<p>this gives us the locations in a single file locations/part-00000</p>
<h3>step 2: extract iphone and ut lat longs strings</h3>
<p>the second step is another text munging problem where we extract just the lat longs for the iPhone and UT tagged locations</p>
<p>ie for strings of the form</p>
<div class="pygments_murphy"><pre>iPhone: 21.320328,-157.877579
\u00dcT: 41.727877,-91.626323
</pre></div>

<p>we want to extract</p>
<div class="pygments_murphy"><pre>21.320328 -157.877579
41.727877 -91.626323
</pre></div>

<p>since this is just text manipulation we'll use streaming again</p>
<p>in v1 it was</p>
<div class="pygments_murphy"><pre>cat locations | ./extract_lat_longs_from_locations.rb iphone &gt; locations.iphone
cat locations | ./extract_lat_longs_from_locations.rb ut &gt; locations.ut
</pre></div>

<p>for hadoop streaming it's</p>
<div class="pygments_murphy"><pre>hadoop jar $HADOOP_STREAMING_JAR \
  -mapper &#39;./extract_lat_longs_from_locations.rb iphone&#39; -reducer /bin/cat \
  -input locations -output locations.iphone
hadoop jar $HADOOP_STREAMING_JAR \
  -mapper &#39;./extract_lat_longs_from_locations.rb ut&#39; -reducer /bin/cat \
  -input locations -output locations.ut
</pre></div>

<h3>step 3: convert from lat long to mercator coordinates and aggregate into buckets for the heat map</h3>
<p>in v1 it was</p>
<div class="pygments_murphy"><pre>cat locations.{ut,iphone} | ./lat_long_to_merc.rb | ./bucket.rb | sort | uniq -c
</pre></div>

<p>this converts the three tuples { lat, long }</p>
<div class="pygments_murphy"><pre>35.670086 139.740766
-23.492420 -46.846916
35.657570 139.744858
</pre></div>

<p>into two tuples { frequency, left-offset, top-offset }</p>
<div class="pygments_murphy"><pre>1 0.36 0.45
2 0.88 0.28
</pre></div>

<p>the first two parts, converting to mercator (lat_long_to_merc.rb) and the bucketing (bucket.rb), i'll combine into one script.</p>
<div class="pygments_murphy"><pre>hadoop jar $HADOOP_STREAMING_JAR \
  -mapper ./lat_long_to_merc_and_bucket.rb -reducer /bin/cat \
  -input locations.iphone -input locations.ut -output x_y_points
</pre></div>

<p>but the use of sort and uniq to aggregate the data is represented by the shuffle and reduce stages of hadoop.</p>
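<p>to make that mapping concrete, here's a sketch of what a counting reducer could look like. it's hypothetical (the function name is made up and this isn't part of the pipeline used here) but it leans on the same fact uniq -c does: after the sort, identical lines sit next to each other</p>

```ruby
# hypothetical reducer logic: hadoop streaming sorts the mapper output by
# key before the reducer sees it, so equal lines arrive as one run and
# can be counted in a single pass, just like uniq -c after sort
def count_runs(sorted_lines)
  sorted_lines.chunk_while { |a, b| a == b }.map { |run| [run.size, run.first] }
end

points = ['0.36 0.45', '0.88 0.28', '0.88 0.28']
count_runs(points).each { |count, point| puts "#{count} #{point}" }
```

<p>for the three points above this gives the same two { frequency, left-offset, top-offset } tuples as the v1 pipeline.</p>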
<p>we could use the aggregate functionality of the streaming interface but i'm trying to learn more pig so we'll use that instead. <a href="http://hadoop.apache.org/pig/">pig</a> is a scripting language that translates a pig latin query language into map reduce jobs. my main motivation for using it has been that it's great at doing joins, something i've found to be a <a href="http://matpalm.com/sip/take2_term_frequency.html#hadoop+part+2">big pain</a> to represent in plain map reduce jobs.</p>
<p>( note we didn't do the conversion to mercator and bucketing in pig, the arithmetic operations provided are a bit lacking. )</p>
<p>enter a pig shell running in standalone (ie non hadoop distributed) mode</p>
<div class="pygments_murphy"><pre>bash&gt; pig -x local
</pre></div>

<p>load the points</p>
<div class="pygments_murphy"><pre>grunt&gt; pts = load &#39;x_y_points/part-00000&#39; as (x:float, y:float);
grunt&gt; describe pts;
pts: {x: float,y: float}
grunt&gt; dump pts
(0.06F,0.32F)
(0.15F,0.27F)
(0.16F,0.27F)
...
</pre></div>

<p>group them together</p>
<div class="pygments_murphy"><pre>grunt&gt; buckets = group pts by (x,y);
grunt&gt; describe buckets;
buckets: {group: (x: float,y: float),pts: {x: float,y: float}}
grunt&gt; dump buckets;
((0.06F,0.32F),{(0.06F,0.32F)})
((0.15F,0.27F),{(0.15F,0.27F)})
((0.16F,0.27F),{(0.16F,0.27F),(0.16F,0.27F),(0.16F,0.27F),(0.16F,0.27F)})
...
</pre></div>

<p>from the groups emit the size of each bucket, this corresponds to the frequency</p>
<div class="pygments_murphy"><pre>grunt&gt; freq = foreach buckets { generate group, SIZE(pts) as size; }
grunt&gt; describe freq;
freq: {group: (x: float,y: float),size: long}
grunt&gt; dump freq
((0.06F,0.32F),1L)
((0.15F,0.27F),1L)
((0.16F,0.27F),4L)
...
</pre></div>

<p>and based on the sizes we can evaluate the min and max frequencies which we'll use in the colour coding of the heat map</p>
<div class="pygments_murphy"><pre>grunt&gt; freqs = group freq all;
grunt&gt; describe freqs;
freqs: {group: chararray,freq: {group: (x: float,y: float),size: long}}
grunt&gt; dump freqs;
(all,{((0.06F,0.32F),1L),((0.15F,0.27F),1L), ... })
grunt&gt; store freq into &#39;freqs&#39;;
grunt&gt; min_max = foreach freqs { generate MAX(freq.size) as max, MIN(freq.size) as min; };
grunt&gt; describe min_max;
min_max: {max: long,min: long}
grunt&gt; dump min_max;
(7L,1L)
grunt&gt; store min_max into &#39;min_max&#39;;
bash&gt; cat freqs
(0.06,0.32)   1
(0.15,0.27)   1
(0.16,0.27)   4
</pre></div>

<p>these can all be run as one command</p>
<div class="pygments_murphy"><pre>bash&gt; pig -x local -f freqs.pig
</pre></div>

<p>we just need our final conversion to a javascript snippet to jam into a map page</p>
<div class="pygments_murphy"><pre>bash&gt; cat freqs | ./as_draw_square.rb 1 7
</pre></div>

<p>win!</p>
<p>to make things a little different let's use a bigger sample of 475e3 tweets from oct 13 07:00 to 20:00. this results in 10e3 iphone locations (7e3 unique) and 22e3 ut locations (15e3 unique)</p>
<p>lat longs are bucketed into only 478 pixels for map</p>
<p>here's one plot with the raw numbers; highest freq is 9e3 in jakarta</p>
<h4>raw frequencies</h4>
<img class="size-full wp-image-111" title="raw frequencies" src="/blog/imgs/2009/10/raw1.jpg" alt="raw frequencies" width="682" height="529" />

<p>scaling down by log 10 gives a smoother map</p>
<h4>log10 frequencies</h4>
<img class="size-full wp-image-113" title="log10 frequencies" src="/blog/imgs/2009/10/log10.jpg" alt="log10 frequencies" width="682" height="529" />

<p>and here is a comparison of iphone vs ut. without knowing what ut is i can see it's not big in northern europe or japan but it's popular in indonesia.</p>
<h4>iphones</h4>
<img class="size-medium wp-image-120" title="iphones" src="/blog/imgs/2009/10/iphones-300x232.jpg" alt="iphones" width="300" height="232" />

<h4>ut</h4>
<img class="size-medium wp-image-119" title="ut" src="/blog/imgs/2009/10/ut-300x232.jpg" alt="ut" width="300" height="232" />

<p>next steps, animating based on the hour of the day</p>]]></content:encoded>
    </item>
    <item>
      <title>e10.3 twitter crawl progress</title>
      <link>http://matpalm.com/blog/2009/09/29/e10-3-twitter-crawl-progress/</link>
      <category><![CDATA[e10]]></category>
      <category><![CDATA[twitter]]></category>
      <category><![CDATA[algorithms]]></category>
      <category><![CDATA[hadoop]]></category>
      <guid>http://matpalm.com/blog/?p=70</guid>
      <description>e10.3 twitter crawl progress</description>
<content:encoded><![CDATA[<p>since the twitter api is rate limited it's quite slow to crawl twitter and after most of a week i've still only managed to get info on 8,000 users. i probably should subscribe to get a 20,000 an hr limit instead of the 150 i'm on now. i'll just let it chug along in the background of my pvr.</p>
<p>while the crawl has been going on i've been trying some things on the data to decide what to do with it.</p>
<p>i've managed to write a version of pagerank using <a href="http://hadoop.apache.org/pig/">pig</a> which has been very interesting. (for those who haven't seen it before pig is a query language that sits on top of hadoop's mapreduce). my initial feel for pig is that it's pretty awesome. it was <em>much</em> quicker to write this script than to write the <a href="http://matpalm.com/sip/">statistically improbable phrases</a>. in fact i'm reinspired to have another crack at the sip stuff using pig. my final result wasn't great for the performance of hadoop and after some <a href="http://mail-archives.apache.org/mod_mbox/hadoop-general/200909.mbox/%3C93d501de0909141814vaa8c9c0wc5a47ee05baae7de@mail.gmail.com%3E">great feedback on the hadoop mailing list</a> i've got a number of other things to try including writing my joins in pig.</p>
<p>anyways, here's my pagerank in pig</p>
<!--more-->

<p>done once</p>
<table class="pygments_murphytable"><tr><td class="code"><div class="pygments_murphy"><pre>edges = load &#39;edges&#39; as (from:chararray, to:chararray);
nodes = group edges by from;
node_contribs = foreach nodes generate group, 1.0 / (double)SIZE(edges) as contrib;
store node_contribs into &#39;node_contribs&#39;;
zero_contribs = foreach nodes generate group, (double)0 as contrib;
store zero_contribs into &#39;zero_contribs&#39;;
</pre></div>
</td></tr></table>

<p>done until convergence</p>
<table class="pygments_murphytable"><tr><td class="code"><div class="pygments_murphy"><pre>page_rank = load &#39;$input&#39; as (node:chararray, rank:float);
node_contribs = load &#39;node_contribs&#39; as (node:chararray, contrib:double);
nodes_page_rank = join node_contribs by node, page_rank by node;
contribs = foreach nodes_page_rank {
  generate node_contribs::node, (double)node_contribs::contrib*(double)page_rank::rank as contrib;
}
edges = load &#39;edges&#39; as (from:chararray, to:chararray);
joined_divy_groups = join edges by from, contribs by node_contribs::node;
page_rank_contributions = foreach joined_divy_groups generate edges::to, contribs::contrib;
zero_contribs = load &#39;zero_contribs&#39; as (node:chararray, contrib:double);
page_rank_contributions_with_zero = union page_rank_contributions, zero_contribs;
group_page_ranks = group page_rank_contributions_with_zero by edges::to;
next_page_rank = foreach group_page_ranks {
  generate group, 0.15+(0.85*SUM(page_rank_contributions_with_zero.contribs::contrib));
}
store next_page_rank into &#39;$output&#39;;
</pre></div>
</td></tr></table>
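<p>for anyone who finds the joins hard to follow, here's a small in-memory sketch of a single iteration. it's my own hypothetical rewrite for illustration (the function name and data shapes are made up), following the same steps as the script above: divide each node's rank across its outgoing edges, sum what arrives at each node, then apply the 0.15 / 0.85 damping</p>

```ruby
# hypothetical in-memory version of one iteration.
# edges is an array of [from, to] pairs, ranks is a hash of node => rank
def next_page_rank(edges, ranks)
  out_degree = Hash.new(0)
  edges.each { |from, _to| out_degree[from] += 1 }

  # plays the role of zero_contribs: every node with outgoing edges
  # gets an entry even if nothing links to it
  sums = Hash.new(0.0)
  out_degree.each_key { |node| sums[node] = 0.0 }

  edges.each do |from, to|
    next unless ranks.key?(from) # like the inner join, skip nodes with no rank yet
    sums[to] += ranks[from] / out_degree[from]
  end

  sums.transform_values { |sum| 0.15 + 0.85 * sum }
end

edges = [%w[a b], %w[a c], %w[b c]]
ranks = { 'a' => 1.0, 'b' => 1.0, 'c' => 1.0 }
p next_page_rank(edges, ranks)
```

<p>running this repeatedly, feeding each result back in as ranks, corresponds to the "done until convergence" part.</p>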

<p>as for all my projects code is on <a href="http://github.com/matpalm/tgraph">github</a></p>]]></content:encoded>
    </item>
    <item>
      <title>e10.0 introducing tgraph</title>
      <link>http://matpalm.com/blog/2009/09/19/e10-0-introducing-tgraph/</link>
      <category><![CDATA[big data]]></category>
      <category><![CDATA[e10]]></category>
      <category><![CDATA[twitter]]></category>
      <category><![CDATA[hadoop]]></category>
      <category><![CDATA[pig]]></category>
      <category><![CDATA[algorithms]]></category>
      <guid>http://matpalm.com/blog/?p=47</guid>
      <description>e10.0 introducing tgraph</description>
      <content:encoded><![CDATA[<p>so <a href="http://matpalm.com/sip/">e9 sip</a> is on hold for a bit while i kick off e10 tgraph. was looking for another problem to try hadoop with and came across a classic graph one, <a title="pagerank" href="http://en.wikipedia.org/wiki/PageRank">pagerank</a>. a well understood algorithm like page rank will be a  great chance to try <a href="http://hadoop.apache.org/pig/">pig</a>, the query language that sits on top of hadoop mapreduce.</p>
<p>so we need a graph to work on. my first thoughts were using one of the <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2596">wikipedia linkage dumps</a> but it feels a bit sterile. instead it's a good excuse to do a little crawl of the following graph of twitter.</p>
<p>this will also be a chance to try to document a project via a blog. <a href="http://www.skorks.com/">skorks</a>' incessant blog rambling has convinced me to give it a go.</p>]]></content:encoded>
    </item>
    <item>
      <title>first hadoop experiment</title>
      <link>http://matpalm.com/blog/2009/09/16/first-hadoop-experiment/</link>
      <category><![CDATA[ec2]]></category>
      <category><![CDATA[big data]]></category>
      <category><![CDATA[hadoop]]></category>
      <guid>http://matpalm.com/blog/?p=43</guid>
      <description>first hadoop experiment</description>
      <content:encoded><![CDATA[<p>just finished my first hadoop experiment.</p>
<p><a href="http://matpalm.com/sip">matpalm.com/sip</a></p>
<p>not fantastic results but heaps of feedback from the hadoop mailing list</p>
<p>more results coming soon</p>]]></content:encoded>
    </item>
  </channel>
</rss>

