- This topic has 1 reply, 2 voices, and was last updated 4 years, 7 months ago by .
Week 1, section 1.5: Nearest and farthest neighbour clustering
I might have missed this, but how do we decide whether to use single linkage or complete linkage in agglomerative hierarchical clustering? Assignment 2 showed both graphs, but how would I, as a data person, decide which one to use?
Thanks for the question! Yes, reading assignment 2 of section 1.5 mentions both single linkage and complete linkage, which are two different ways for agglomerative clustering to merge smaller clusters into bigger ones. Although we did not intend to go into their definitions and differences in this course, we are happy to see them discussed on this forum.
To recap, single linkage measures the distance between two clusters as the smallest distance between any member of one cluster and any member of the other, while complete linkage uses the largest such distance. At every step, the algorithm merges the pair of clusters whose linkage distance is smallest. As you have noticed, the two choices can yield very different results.
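As a minimal sketch of the two definitions (assuming Euclidean distance; the helper names and the two small clusters are hypothetical, chosen only for illustration), each linkage distance can be computed by checking every pair of points with one point taken from each cluster:

```python
import math
from itertools import product


def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(a, b))


# Two small hypothetical clusters in the plane.
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 1), (3, 4)]

print(single_linkage(cluster_a, cluster_b))    # 3.0, from (0, 1) to (3, 1)
print(complete_linkage(cluster_a, cluster_b))  # 5.0, from (0, 0) to (3, 4)
```

An agglomerative run computes one of these values for every pair of current clusters and merges the pair with the smallest value; the only difference between the two methods is min versus max over the cross-cluster pairs.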
By definition, single linkage merges locally: at each step it joins whichever two objects (points or clusters) are closest to each other. I think of it like water droplets merging on a surface. Once smaller droplets have formed a bigger droplet, it grows towards the next closest droplet and merges with it. The edge-to-edge distance between clusters is what decides each merge here.
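To make the droplet picture concrete, here is a minimal sketch (assuming Euclidean distance; the function name and the chain of points are hypothetical) that sorts all pairwise distances and joins clusters in that order, stopping once k clusters remain. This is Kruskal's minimum-spanning-tree procedure stopped early, which matches single linkage when no two distances are tied.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, k):
    """Single-linkage clustering into k clusters via Kruskal-style union-find."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of points, closest first: the "next closest droplets".
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    n_clusters = len(points)
    for _, i, j in edges:
        if n_clusters == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # skip pairs that are already in the same cluster
            parent[root_i] = root_j
            n_clusters -= 1

    # Group points by their representative, keeping the input order inside each group.
    groups = {}
    for idx, p in enumerate(points):
        groups.setdefault(find(idx), []).append(p)
    return sorted(groups.values())


# Hypothetical chain: the gaps between neighbours are 1.0, 1.1, 1.2, 1.3, 1.4.
chain = [(0.0, 0), (1.0, 0), (2.1, 0), (3.3, 0), (4.6, 0), (6.0, 0)]
print(single_linkage_clusters(chain, k=2))
# [[(0.0, 0), (1.0, 0), (2.1, 0), (3.3, 0), (4.6, 0)], [(6.0, 0)]]
```

Because the gaps grow slowly from left to right, single linkage keeps extending the same cluster along the chain and leaves only the last point, behind the widest gap, on its own.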
In contrast, complete linkage merges the pair of clusters whose farthest members are closest, i.e. it looks for the smallest maximum distance between clusters. Because every member counts, it does not merge just because one point happens to lie close to a neighbouring cluster, so it is less prone to chaining through stray points between clusters; instead of looking only locally for the closest neighbour, it looks more globally for the most similar cluster as a whole. As a result, complete linkage tends to give more compact clusters. That is not to say that single linkage is without merit: it can follow elongated or irregularly shaped groups and keep unusual points attached to whichever cluster they are closest to.
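Here is a minimal, deliberately naive sketch of that contrast (assuming Euclidean distance; agglomerate, diameter and the chain of points are hypothetical). It runs the same agglomerative loop twice, passing Python's built-in min as the single-linkage rule and max as the complete-linkage rule, and reports each cluster's diameter, meaning its largest internal distance.

```python
import math
from itertools import combinations, product


def agglomerate(points, k, linkage):
    """Naive agglomerative clustering; linkage is min (single) or max (complete)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Linkage distance for every pair of current clusters; keep the smallest.
        _, i, j = min(
            (linkage(math.dist(p, q) for p, q in product(clusters[i], clusters[j])), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters.pop(j)  # j > i, so removing it does not shift index i
        clusters[i].extend(merged)
    return clusters


def diameter(cluster):
    """Largest distance between two members of one cluster (0.0 for a singleton)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


# Same hypothetical chain: gaps between neighbours are 1.0, 1.1, 1.2, 1.3, 1.4.
chain = [(0.0, 0), (1.0, 0), (2.1, 0), (3.3, 0), (4.6, 0), (6.0, 0)]
for name, link in [("single", min), ("complete", max)]:
    result = agglomerate(chain, k=2, linkage=link)
    print(name, result, [round(diameter(c), 1) for c in result])
# single [[(0.0, 0), (1.0, 0), (2.1, 0), (3.3, 0), (4.6, 0)], [(6.0, 0)]] [4.6, 0.0]
# complete [[(0.0, 0), (1.0, 0), (2.1, 0), (3.3, 0)], [(4.6, 0), (6.0, 0)]] [3.3, 1.4]
```

On the same data, single linkage strings five points into one long cluster, while complete linkage splits the chain into two tighter clusters. Recomputing every pair at each step costs roughly cubic time in the number of points, which is fine for a toy example; for real data, library routines such as SciPy's scipy.cluster.hierarchy.linkage with method='single' or method='complete' are the usual choice.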
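So you can choose the linkage for your analysis based on your goal and on the nature of your data. For example, if you expect compact, roughly round groups, complete linkage is a natural fit; if your groups may be long or irregularly shaped but well separated, single linkage may recover them better.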