This topic has 1 reply, 2 voices, and was last updated 1 year, 5 months ago.
The outliers in clustering
Hi, I am glad to have the chance to study clustering here, but I am concerned about data points at abnormal positions or distances (what we call outliers). Some algorithms, such as K-means and hierarchical clustering, may tolerate outliers and include them in the clusters. Other algorithms, such as DBSCAN, leave outliers out of the clusters (from what I learned in YouTube videos). So I would like to ask: how should we handle outliers? Are they something we need to account for in the clusters, or do we just ignore them? Thank you.
Hi Abdillah,
You are right to be concerned about outliers. Outliers turn up in all kinds of datasets! When we are trying to make sense of the data, we probably don’t want to dwell on exceptional cases like outliers, because they can distort our understanding of the nature of the data.
The exception is when we want to study those extreme cases; then outliers are important. For clustering, though, algorithms that are sensitive to outliers, like k-means or hierarchical clustering, need to be treated with care. If we know the data has outliers, one option is to run the clustering with and without those cases and see whether the results change.
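To make the include / exclude idea concrete, here is a rough sketch in plain Python. It is only an illustration under my own assumptions: the toy points, the `kmeans` helper, and the way the starting centroids are picked (first and last point) are all made up for this example, and a real analysis would more likely use a library implementation.

```python
from math import dist


def kmeans(points, centroids, max_iter=50):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Toy data (hypothetical): two tight groups plus one far-away point.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(5, 5), (5, 6), (6, 5), (6, 6)]
outlier = (30, 30)

clean = group_a + group_b
noisy = clean + [outlier]

# Simplified, deterministic start: first and last point of each dataset.
clean_centroids, clean_labels = kmeans(clean, [clean[0], clean[-1]])
noisy_centroids, noisy_labels = kmeans(noisy, [noisy[0], noisy[-1]])

print("without outlier:", clean_centroids)  # [(0.5, 0.5), (5.5, 5.5)]
print("with outlier:   ", noisy_centroids)  # [(3.0, 3.0), (30.0, 30.0)]
```

In this toy case the single far-away point takes a whole cluster for itself, and the two real groups get merged into one. Comparing the two runs side by side is the kind of check I mean.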
I hope that makes sense. Thanks for sparking the discussion!
Pimwadee