Adaptive normalized weighted KNN text classification based on PSO

Authors:

Wu Fenlin, Xiamen Medical College, Xiamen, China

Zheng Yifei, Xiamen Medical College, Xiamen, China

Wang Cheng, HuaQiao University, Xiamen, China

Abstract:

Purpose. The classical KNN text classifier has two shortcomings: it assigns the same weight to every feature, which lowers classification accuracy, and it works in a high-dimensional feature space, which makes the running time excessive on large datasets. To address these problems, a normalized feature-weighted KNN text classifier, named the NPSOKNN algorithm, is proposed.

Methodology. The overall accuracy of the classifier was used as the global optimization objective for the feature weights, and particle swarm optimization (PSO) was used to search for the globally optimal weights. To reduce the number of features and the time cost of the KNN text classifier, a threshold was set and features below the threshold were dropped.
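
The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sum-to-one normalization, the weighted Euclidean distance, and the PSO parameters (inertia, c1, c2, swarm size) are all assumptions chosen for illustration, and a plain textbook PSO stands in for the improved variant the paper refers to.

```python
import math
import random
from collections import Counter


def normalize(weights):
    """Scale non-negative weights to sum to 1 (assumed normalization scheme)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (assumed form of the weighted distance)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k training vectors nearest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: weighted_distance(train_X[i], query, weights))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def accuracy(weights, train_X, train_y, val_X, val_y, k=3):
    """Fitness: overall accuracy of the weighted KNN on held-out vectors."""
    w = normalize(weights)
    hits = sum(knn_predict(train_X, train_y, x, w, k) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def pso_search(fitness, dim, n_particles=10, n_iter=20,
               inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook PSO maximizing `fitness` over weights clamped to [0, 1]."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_fit = [fitness(p) for p in pos]      # personal best fitness values
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return normalize(g_pos), g_fit
```

A caller would pass `lambda w: accuracy(w, train_X, train_y, val_X, val_y)` as `fitness`, so that each particle is scored by the accuracy its weight vector achieves.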

Findings. The globally optimal feature weights were obtained first. Applying these weights together with the feature reduction method then produced a new feature vector whose dimension is much smaller than that of the original text vector, while retaining high text classification accuracy.
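
As a rough illustration of the reduction step, the sketch below assumes that the threshold is compared against each feature's learned weight and that features whose weight falls below it are removed from every text vector. The function name `reduce_features` and the example values are hypothetical; the abstract does not state the threshold value or whether the remaining weights are renormalized.

```python
def reduce_features(vectors, weights, threshold):
    """Keep only the features whose weight is at least `threshold`.

    Returns the reduced vectors, the weights of the kept features,
    and the original indices of the kept features.
    """
    keep = [i for i, w in enumerate(weights) if w >= threshold]
    reduced = [[v[i] for i in keep] for v in vectors]
    kept_weights = [weights[i] for i in keep]
    return reduced, kept_weights, keep


# Hypothetical example: the second feature has a small weight and is dropped.
vectors = [[1, 2, 3], [4, 5, 6]]
weights = [0.5, 0.05, 0.45]
reduced, kept_weights, keep = reduce_features(vectors, weights, threshold=0.1)
```

Because distance computation in KNN scales with the number of features, shortening every vector this way is what reduces the time cost.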

Originality. The text classifier was improved by combining an improved PSO with KNN. Normalized feature weights, a weighted distance function, and feature dimension reduction are discussed. No prior research on this aspect has been found to date.

Practical value. The results of 10-fold cross-validation experiments showed that the average accuracy of NPSOKNN is higher than that of the classical KNN text classifier, and that the time cost was reduced significantly owing to feature reduction.
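
For readers unfamiliar with the evaluation protocol, the following sketch shows one common way to compute an average accuracy over 10 folds. It is a generic illustration, not the authors' experimental code: the helper names are hypothetical, the folds are contiguous and unshuffled for simplicity, and `evaluate` stands for any function that trains on one index set and returns accuracy on the other.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(evaluate, n_samples, k=10):
    """Average the per-fold scores returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

In practice the samples are usually shuffled, and often stratified by class, before the folds are formed.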

Registration data

ISSN (print) 2071-2227,
ISSN (online) 2223-2362.
Journal was registered by Ministry of Justice of Ukraine.
Registration number КВ No.17742-6592PR dated April 27, 2011.

Contacts

D.Yavornytskyi ave.,19, pavilion 3, room 24-а, Dnipro, 49005
Tel.: +38 (056) 746 32 79.

Published in: 2016, No. 1, section "Information technologies, systems analysis and administration".