brain of mat kelcey
finding phrases with mutual information
November 15, 2011 at 11:00 PM | categories: nlp, phrase-extraction, collocations, mutual-information
continuing on with my series of mutual information experiments, how might we extend the technique to identify sequences longer than just two terms?

one novel way is to identify the bigrams of interest, replace them with a single token and simply repeat the entire process. (thanks ted for the idea)

so say we had the 6 term sentence; i went to new york city

it has 5 bigrams; ('i went', 'went to', 'to new', 'new york', 'york city')

running the mutual information algorithm over this might identify new york as a bigram of interest. we can swap the two terms with a single token...
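a rough sketch of the replace-and-repeat idea, not the code from the experiments themselves. it assumes pointwise mutual information as the score, and the names (bigrams, pmi_scores, merge, find_phrases), the underscore joiner, and the threshold and min_count defaults are all made up for illustration.

```python
import math
from collections import Counter


def bigrams(tokens):
    """adjacent pairs of tokens, in order."""
    return list(zip(tokens, tokens[1:]))


def pmi_scores(sentences, min_count=2):
    """pointwise mutual information (log base 2) for each bigram.

    assumption: plain pmi from raw frequencies; min_count is an
    illustrative cutoff to skip very rare bigrams.
    """
    unigram, bigram = Counter(), Counter()
    for s in sentences:
        unigram.update(s)
        bigram.update(bigrams(s))
    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())
    scores = {}
    for (a, b), count in bigram.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigram[a] / n_uni
        p_b = unigram[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def merge(tokens, pair, sep="_"):
    """replace each occurrence of the bigram `pair` with a single token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def find_phrases(sentences, rounds=2, threshold=1.0, min_count=2):
    """each round: score bigrams, merge the best one, then repeat."""
    for _ in range(rounds):
        scores = pmi_scores(sentences, min_count)
        keep = [pair for pair, s in scores.items() if s >= threshold]
        if not keep:
            break
        best = max(keep, key=scores.get)
        sentences = [merge(s, best) for s in sentences]
    return sentences


tokens = "i went to new york city".split()
once = merge(tokens, ("new", "york"))
twice = merge(once, ("new_york", "city"))
```

here `once` holds the sentence with new york collapsed to the single token new_york, so a second pass can treat ('new_york', 'city') as just another bigram. merging only the single best bigram per round is one choice among several; merging every bigram above the threshold would need some rule for overlapping pairs.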
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment