brain of mat kelcey
fastmap and the jaccard distance
October 31, 2008 at 11:31 AM | categories: algorithms, deduplication, c++
given a set of pairwise distances, how do you determine what points correspond to those distances? my latest experiment considers this problem in relation to jaccard distances, a resemblance measure derived from the jaccard coefficient used in a previous experiment (distance = 1 - coefficient). by using the fastmap algorithm we get points from distances, and once you have points you have visualisation!...
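a rough c++ sketch of the idea, not the code from the experiment: `jaccard_distance`, `residual_sq` and `fastmap` are hypothetical names, distances are assumed to arrive as a dense square matrix, and the pivots are picked with the usual farthest-point heuristic (starting from object 0 rather than a random object). each new coordinate projects every object onto the line through the two pivots a and b using x_i = (d(a,i)² + d(a,b)² - d(b,i)²) / (2·d(a,b)), then later dimensions work on the distances left over after removing the coordinates found so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B|
double jaccard_distance(const std::set<std::string>& a, const std::set<std::string>& b) {
    if (a.empty() && b.empty()) return 0.0;  // treat two empty sets as identical
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));
    double union_size = static_cast<double>(a.size() + b.size() - common.size());
    return 1.0 - common.size() / union_size;
}

using Matrix = std::vector<std::vector<double>>;

// squared distance between objects i and j once the first `dim` coordinates are projected out
double residual_sq(const Matrix& d, const Matrix& x, std::size_t i, std::size_t j, std::size_t dim) {
    double r = d[i][j] * d[i][j];
    for (std::size_t k = 0; k < dim; ++k) {
        double diff = x[i][k] - x[j][k];
        r -= diff * diff;
    }
    return std::max(r, 0.0);  // clamp: non-euclidean inputs can make this go negative
}

// map n objects with pairwise distance matrix d to points in `dims` dimensions
Matrix fastmap(const Matrix& d, std::size_t dims) {
    const std::size_t n = d.size();
    for (const auto& row : d) assert(row.size() == n);  // distance matrix must be square
    Matrix x(n, std::vector<double>(dims, 0.0));
    if (n == 0) return x;
    for (std::size_t k = 0; k < dims; ++k) {
        // object farthest from `from` in the current residual space
        auto farthest = [&](std::size_t from) {
            std::size_t best = from;
            double best_d = -1.0;
            for (std::size_t j = 0; j < n; ++j) {
                double r = residual_sq(d, x, from, j, k);
                if (r > best_d) { best_d = r; best = j; }
            }
            return best;
        };
        std::size_t a = farthest(0);  // pivot heuristic, starting from object 0
        std::size_t b = farthest(a);
        double dab_sq = residual_sq(d, x, a, b, k);
        if (dab_sq == 0.0) break;  // nothing left to spread out; remaining coords stay 0
        double dab = std::sqrt(dab_sq);
        for (std::size_t i = 0; i < n; ++i) {
            // project object i onto the line through pivots a and b (cosine law)
            x[i][k] = (residual_sq(d, x, a, i, k) + dab_sq - residual_sq(d, x, b, i, k)) / (2.0 * dab);
        }
    }
    return x;
}
```

to get something plottable you would fill `d[i][j] = jaccard_distance(set_i, set_j)` for every pair and call `fastmap(d, 2)`. since jaccard distances aren't guaranteed to be euclidean, the 2d layout is only an approximation of the original distances.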
shingling and the jaccard index
October 06, 2008 at 11:30 AM | categories: ruby, algorithms, deduplication, c++
on the weekend i did another experiment using shingling and the jaccard index to try to determine whether two sets of data were “duplicates”. it works quite well, and includes a ruby and a c++ version with low-level bit operations. project page is www.matpalm.com/resemblance, code at github.com/matpalm/resemblance...
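a hedged c++ sketch of the general technique, not the ruby or c++ code in the repo: `shingle_bits`, `jaccard_index`, character 3-shingles and the 1024-bit width are all assumptions made for illustration. each shingle is hashed to one bit of a fixed-width bitset, so intersection and union become bitwise AND / OR and their sizes are popcounts.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

constexpr std::size_t kBits = 1024;  // hypothetical bitset width

// hash every k-character shingle of `text` into one bit of a fixed-width bitset
std::bitset<kBits> shingle_bits(const std::string& text, std::size_t k) {
    assert(k > 0);
    std::bitset<kBits> bits;
    std::hash<std::string> h;
    if (text.size() <= k) {
        if (!text.empty()) bits.set(h(text) % kBits);  // short text is a single shingle
        return bits;
    }
    for (std::size_t i = 0; i + k <= text.size(); ++i)
        bits.set(h(text.substr(i, k)) % kBits);
    return bits;
}

// jaccard index |A ∩ B| / |A ∪ B| via bitwise AND / OR and popcounts
double jaccard_index(const std::bitset<kBits>& a, const std::bitset<kBits>& b) {
    std::size_t union_count = (a | b).count();
    if (union_count == 0) return 1.0;  // two empty inputs are treated as identical
    return static_cast<double>((a & b).count()) / union_count;
}
```

because different shingles can hash to the same bit, this only approximates the exact jaccard index of the shingle sets; a wider bitset reduces collisions at the cost of memory. two inputs might then be flagged as duplicates when the index exceeds some threshold, which would need tuning on real data.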