brain of mat kelcey
a quick study in tf/icf
June 09, 2010 at 09:58 PM | categories: frequency normalisation
while doing some more research on trending algorithms i came across a cool little paper about term frequency normalisation for streaming data: TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams. i'm finding streaming related algorithms quite interesting lately and think they are the way forward for dealing with large amounts of constantly arriving data. it's just not feasible to use algorithms that expect you to have all the data at any given time; doing so forces you to reprocess all the data you've ever seen as you get new examples. my thinking is the best solutions are the ones...
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment