Adaptive normalized weighted KNN text classification based on PSO
- Category: Information technologies, systems analysis and administration
- Last Updated on 02 April 2016
- Published on 02 April 2016
Authors:
Wu Fenlin, Xiamen Medical College, Xiamen, China
Zheng Yifei, Xiamen Medical College, Xiamen, China
Wang Cheng, HuaQiao University, Xiamen, China
Abstract:
Purpose. The classical KNN text classifier has two shortcomings: it assigns the same weight to every feature, which lowers classification accuracy, and it works in a high-dimensional feature space, which makes the run time excessive on large datasets. To address both problems, a normalized feature-weighted KNN text classifier, named the NPSOKNN algorithm, is proposed.
Methodology. The overall accuracy of the classifier was used as the global optimization objective for the feature weights, and PSO was used to search for the globally optimal weights. To reduce the number of features and the time cost of the KNN text classifier, a threshold was set and the features that fall below it were dropped.
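The weight search described in the methodology can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard inertia-weight PSO update (the abstract does not specify the improved variant), it keeps weights normalized by clipping and rescaling (an assumption), and `toy_fitness` is a hypothetical stand-in for the classifier accuracy that the paper uses as the objective. All function names and parameter values are illustrative.

```python
import random

def normalize(weights):
    """Clip weights to be non-negative and rescale them to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]

def pso_search(fitness, dim, n_particles=20, n_iters=50,
               inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness(weights) over normalized weight vectors."""
    rng = random.Random(seed)
    positions = [normalize([rng.random() for _ in range(dim)])
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]          # personal bests
    best_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    gbest_pos, gbest_fit = best_pos[g][:], best_fit[g]  # global best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                    + c2 * r2 * (gbest_pos[d] - positions[i][d]))
            moved = [x + v for x, v in zip(positions[i], velocities[i])]
            positions[i] = normalize(moved)
            fit = fitness(positions[i])
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, positions[i][:]
                if fit > gbest_fit:
                    gbest_fit, gbest_pos = fit, positions[i][:]
    return gbest_pos, gbest_fit

# Hypothetical stand-in objective: in the paper the fitness is the
# overall accuracy of the KNN classifier under the candidate weights.
def toy_fitness(w):
    target = [0.6, 0.3, 0.1, 0.0]
    return -sum((a - b) ** 2 for a, b in zip(w, target))

weights, score = pso_search(toy_fitness, dim=4)
print(weights, score)
```

In a real experiment, `toy_fitness` would be replaced by a function that runs the weighted KNN classifier on validation data and returns its accuracy.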
Findings. The globally optimal feature weights were obtained first. Applying these weights together with the feature reduction method then yielded a new feature vector whose dimension is much smaller than that of the original text vector while retaining high text classification accuracy.
Originality. The study improves the text classifier by combining an improved PSO with KNN. It covers normalized feature weights, a weighted distance function, and feature dimension reduction. The authors found no existing research on this combination.
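The three ingredients named above (normalized feature weights, a weighted distance function, and feature dimension reduction) might fit together as in the sketch below. It rests on assumptions the abstract does not confirm: the distance is taken to be a weighted Euclidean distance, the threshold is applied to the feature weights, and classification is by majority vote. The data, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def reduce_features(weights, threshold):
    """Return the indices of features whose weight is at least threshold."""
    return [i for i, w in enumerate(weights) if w >= threshold]

def weighted_distance(a, b, weights, keep):
    """Weighted Euclidean distance over the kept feature indices only."""
    return math.sqrt(sum(weights[i] * (a[i] - b[i]) ** 2 for i in keep))

def knn_predict(train_x, train_y, query, weights, keep, k=3):
    """Label the query by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(train_x, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights, keep),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy term-weight vectors (made up for illustration).
train_x = [[1.0, 0.0, 0.9], [0.9, 0.1, 0.1],
           [0.1, 1.0, 0.8], [0.0, 0.9, 0.2]]
train_y = ["sports", "sports", "politics", "politics"]

weights = [0.55, 0.40, 0.05]                    # normalized: sums to 1
keep = reduce_features(weights, threshold=0.1)  # drops the third feature
label = knn_predict(train_x, train_y, [0.8, 0.2, 0.5], weights, keep)
print(keep, label)
```

Because dropped features are skipped in every distance computation, the cost of each comparison scales with the number of kept features rather than with the original dimension.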
Practical value. In 10-fold cross-validation experiments, the average accuracy of NPSOKNN was higher than that of the classical KNN text classifier, and the time cost was reduced significantly because of the feature reduction.