Improved K-means algorithm for automatic acquisition of initial clustering center

Authors:

Guangbin Sun, China University of Petroleum, Beijing, China

Hongqi Li, China University of Petroleum, Beijing, China

Haiying Huang, Daqing Oilfield Engineering Co., Ltd., Daqing, Heilongjiang, China

Abstract:

Purpose. The traditional K-means algorithm requires the number of clusters K to be specified in advance and is sensitive to the choice of initial clustering centers: different initial centers often lead to different clustering results. To address these shortcomings, the article proposes a method for obtaining the clustering centers based on object density and the max-min distance method. Selection of the clustering centers and classification of the objects can be carried out simultaneously.

Methodology. Noise was eliminated according to the densities of the objects, and the densest object was selected as the first clustering center. The max-min distance method was then used to search for the remaining cluster centers; at the same time, the cluster to which each object belongs was determined.
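
The Methodology steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define how density is measured or how θ enters the max-min test. The sketch assumes that density is the number of neighbours within a hypothetical `radius`, that objects with fewer than `min_neighbors` neighbours are noise, and that the classic max-min rule applies, where a candidate becomes a new center if its distance to the nearest existing center exceeds `theta` times the distance between the first two centers. The names `density_maxmin_centers`, `radius`, `min_neighbors` and `theta` are illustrative.

```python
import numpy as np


def density_maxmin_centers(X, radius, min_neighbors, theta):
    """Pick initial centers: densest object first, then max-min distance.

    Assumptions (not taken from the paper): neighbour-count density,
    a fixed noise cut-off, and the classic max-min acceptance rule.
    X must be a 2-D array-like of shape (n_objects, n_features).
    """
    if not 0 < theta < 1:
        raise ValueError("theta is assumed to lie strictly between 0 and 1")
    X = np.asarray(X, dtype=float)

    # Pairwise Euclidean distances between all objects.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Density of an object: number of other objects within `radius`.
    density = (dist <= radius).sum(axis=1) - 1

    # Objects with too few neighbours are treated as noise and dropped.
    keep = np.flatnonzero(density >= min_neighbors)
    if keep.size == 0:
        raise ValueError("every object was classified as noise")

    # First center: the densest remaining object.
    first = keep[np.argmax(density[keep])]
    centers = [first]

    # Distance from each kept object to its nearest center so far,
    # and the index of that center (the object's current cluster).
    nearest = dist[keep, first].copy()
    labels = np.zeros(keep.size, dtype=int)

    # The farthest object from the first center becomes the second center;
    # that distance is the reference scale for all later candidates.
    scale = nearest.max()
    while nearest.max() > theta * scale:
        cand = keep[np.argmax(nearest)]
        centers.append(cand)
        d_new = dist[keep, cand]
        # Objects closer to the new center are reassigned immediately,
        # so center selection and classification proceed together.
        closer = d_new < nearest
        labels[closer] = len(centers) - 1
        nearest[closer] = d_new[closer]

    return X[centers], keep, labels
```

The function returns the chosen centers, the indices of the objects kept after noise removal, and a cluster label for each kept object. The centers could then be used to seed a standard K-means run, so that K does not have to be supplied by hand.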

Findings. Clustering results depend on the choice of the parameter θ. If the sample distribution is unknown, θ can only be chosen by trial, optimizing it over multiple test runs. With prior knowledge to guide the selection of θ, the algorithm converges quickly. Therefore, θ should be optimized.
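
To illustrate how the result depends on θ, the following Python sketch runs a plain max-min center search for several candidate values and reports how many centers each one yields. It is a hedged illustration only: the abstract does not define θ precisely, so the sketch assumes the classic max-min rule (a candidate is accepted while its distance to the nearest center exceeds `theta` times the distance between the first two centers), omits density-based noise removal, and takes the first center as a given index. The names `maxmin_center_count` and `sweep_theta` are hypothetical.

```python
import numpy as np


def maxmin_center_count(X, theta, first=0):
    """Count the centers a classic max-min search finds for a given theta.

    Assumption: theta is a ratio in (0, 1) applied to the distance
    between the first two centers. X has shape (n_objects, n_features).
    """
    if not 0 < theta < 1:
        raise ValueError("theta is assumed to lie strictly between 0 and 1")
    X = np.asarray(X, dtype=float)

    # Distance from every object to its nearest center chosen so far.
    nearest = np.linalg.norm(X - X[first], axis=1)
    scale = nearest.max()  # distance from the first center to the farthest object
    count = 1

    # Keep promoting the farthest object while it is far enough away.
    while nearest.max() > theta * scale:
        cand = int(np.argmax(nearest))
        nearest = np.minimum(nearest, np.linalg.norm(X - X[cand], axis=1))
        count += 1
    return count


def sweep_theta(X, thetas):
    """Trial-and-error scan: number of centers found for each theta."""
    return {theta: maxmin_center_count(X, theta) for theta in thetas}
```

In such a scan, a small θ tends to split the data into many centers and a large θ into few. How the best value is then chosen (for example, by clustering accuracy on labeled test data) is not specified in the abstract and is left open here.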

Originality. The article proposes a new density-based method for obtaining the first initial clustering center, followed by a new method based on the max-min distance for obtaining the remaining centers. Experimental analysis shows that the improved algorithm ensures higher and more stable accuracy.

Practical value. The experiments showed that the algorithm obtains the K clustering centers automatically and achieves higher clustering accuracy when processing unknown datasets.


Files:
2016_02_Guangbin
Date: 2016-06-21. File size: 1.22 MB.

Registration data

ISSN (print) 2071-2227,
ISSN (online) 2223-2362.
Journal was registered by Ministry of Justice of Ukraine.
Registration number КВ No.17742-6592PR dated April 27, 2011.

Contacts

D.Yavornytskyi ave.,19, pavilion 3, room 24-а, Dnipro, 49005
Tel.: +38 (066) 379 72 44.
Issue: No. 2, 2016. Section: Information technologies, systems analysis and administration.