The effect of data set characteristics on the choice of clustering validity index type

Tugba Taskaya Temizel, Mehrdad A. Mizani, Tulin Inkaya, Sait Can Yucebas

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Clustering techniques are widely used to gain insight into the similarities and dissimilarities between the items of a data set. Most algorithms require the user to tune parameters such as the number of clusters or the cut-off threshold in a dendrogram, and these parameters affect clustering quality. In a good clustering, intra-cluster similarity should be high and inter-cluster similarity should be low. Several cluster validity methods have been proposed to determine the optimal number of clusters. However, there is no guideline on which clustering validity methods can be used in conjunction with which clustering algorithms. In this paper, the Dunn and SD validity indices were applied to Kohonen self-organizing maps, k-means, and agglomerative clustering algorithms, and their limitations were shown empirically.
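As an illustration of the kind of validity index the abstract refers to, the sketch below computes a Dunn index in plain Python. It assumes the common formulation (smallest distance between points in different clusters divided by the largest within-cluster diameter, with Euclidean distance); the paper may use a different variant, and the function name `dunn_index` is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter.

    Assumes Euclidean distance, single-linkage separation between clusters,
    and complete diameter within clusters. Higher values suggest compact,
    well-separated clusters.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest distance between any two points that share a cluster.
    max_diameter = max(
        (
            math.dist(a, b)
            for members in clusters.values()
            for a, b in combinations(members, 2)
        ),
        default=0.0,
    )

    # Smallest distance between any two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )

    # All clusters are singletons or duplicates: treat as perfectly compact.
    if max_diameter == 0.0:
        return math.inf
    return min_separation / max_diameter


# Toy data: two tight pairs far apart (illustrative values, not from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # pairs grouped correctly
bad = dunn_index(points, [0, 1, 0, 1])   # pairs split across clusters
```

A candidate number of clusters would typically be chosen by running the clustering algorithm for several values and keeping the one with the highest index; here the correct grouping scores higher than the mixed one.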

Original language: English
Title of host publication: 22nd International Symposium on Computer and Information Sciences, ISCIS 2007 - Proceedings
Pages: 169-174
Number of pages: 6
Publication status: Published - 1 Dec 2007
Event: 22nd International Symposium on Computer and Information Sciences, ISCIS 2007 - Ankara, Turkey
Duration: 7 Nov 2007 - 9 Nov 2007

Conference

Conference: 22nd International Symposium on Computer and Information Sciences, ISCIS 2007
Country/Territory: Turkey
City: Ankara
Period: 7/11/07 - 9/11/07
