
brain of mat kelcey


finding phrases with mutual information

November 15, 2011 at 11:00 PM | categories: nlp, phrase-extraction, collocations, mutual-information

continuing on with my series of mutual information experiments, how might we extend the technique to identify sequences longer than just two terms?

one novel way is to identify the bigrams of interest, replace each of them with a single token and simply repeat the entire process. (thanks ted for the idea)

so say we had the 6 term sentence "i went to new york city". it has 5 bigrams: ('i went', 'went to', 'to new', 'new york', 'york city')

running the mutual information algorithm over this might identify 'new york' as a bigram of interest. we can swap the two terms with a single token...
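the merge-and-repeat idea can be sketched in python roughly as follows. this is a minimal illustration, not the code from this series: it assumes pointwise mutual information as the scoring function, and the names `bigram_pmi` and `merge_bigrams`, along with the `_` joiner, are made up for the example.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Score every adjacent token pair by pointwise mutual information.

    `sentences` is a list of token lists. PMI is an assumed stand-in for
    whatever mutual information score is actually used.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def merge_bigrams(tokens, chosen, joiner="_"):
    """Replace each chosen bigram with a single joined token, left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in chosen:
            merged.append(tokens[i] + joiner + tokens[i + 1])
            i += 2  # skip both halves of the merged bigram
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "i went to new york city".split()
first_pass = list(zip(tokens, tokens[1:]))           # the 5 bigrams
merged = merge_bigrams(tokens, {("new", "york")})    # 'new york' chosen by hand here
second_pass = list(zip(merged, merged[1:]))          # bigrams for the next round
```

in a real run the set of chosen bigrams would come from thresholding the scores over a whole corpus rather than being picked by hand. after the merge, the second pass sees pairs like ('new_york', 'city'), which is how longer phrases get a chance to surface.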
