Ensemble classification algorithm based improved SMOTE for imbalanced data
- Details
- Category: Information technologies, systems analysis and administration
- Last Updated on 23 June 2016
- Published on 23 June 2016
Authors:
Liu Ning, Shangluo University, Shangluo, China
Abstract:
Purpose. In practical applications, accuracy on the minority class is often critical, and learning from imbalanced data has become an active research topic. To improve classification performance on imbalanced data, a classification algorithm combining data sampling with ensemble (integration) techniques was proposed.
Methodology. First, the traditional SMOTE algorithm was extended to K-SMOTE, an over-sampling method based on SMOTE and K-means. In K-SMOTE, the dataset was clustered, and synthetic samples were interpolated along the line connecting each original data point to its cluster center. Second, ECA-IBD (an ensemble classification algorithm based on improved SMOTE for imbalanced data) was proposed. In ECA-IBD, K-SMOTE performed the over-sampling, and random under-sampling reduced the problem scale; together they formed a new dataset. A number of weak classifiers were then generated and combined by ensemble techniques into the final strong classifier.
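The K-SMOTE interpolation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say which samples are clustered, how the number of clusters is chosen, or how the interpolation weight is drawn, so the sketch clusters only the minority class, uses a fixed `k`, and draws the weight uniformly from [0, 1). The function names `kmeans` and `k_smote` are hypothetical.

```python
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means on lists of floats; returns (centers, labels)."""
    rng = rng or random.Random(0)
    # Initialise centers with k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
        # Move each center to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def k_smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples on the segments from original
    minority points to their cluster centers (assumed reading of K-SMOTE)."""
    rng = random.Random(seed)
    centers, labels = kmeans(minority, k, rng=rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))      # pick an original point
        x, c = minority[i], centers[labels[i]]
        gap = rng.random()                    # assumed: uniform weight in [0, 1)
        synthetic.append([xi + gap * (ci - xi) for xi, ci in zip(x, c)])
    return synthetic


minority = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
new_samples = k_smote(minority, n_new=8, k=2)
```

Because every synthetic point lies between an original sample and a cluster center, the new samples stay inside the region already occupied by the minority class, which is the apparent motivation for interpolating toward centers instead of toward arbitrary neighbours.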
Findings. Experiments were carried out on imbalanced data from the UCI repository. With the F-value and G-mean as evaluation metrics, the results showed that the proposed algorithm was effective.
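For readers unfamiliar with the two evaluation metrics, the sketch below computes them from true and predicted labels. It assumes the commonly used definitions, which the abstract does not spell out: the F-value as the weighted harmonic mean of precision and recall on the minority (positive) class, and the G-mean as the geometric mean of recall and specificity. The function name and the `beta` parameter are illustrative.

```python
import math


def f_value_and_g_mean(y_true, y_pred, positive=1, beta=1.0):
    """Return (F-value, G-mean), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean


y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
f_value, g_mean = f_value_and_g_mean(y_true, y_pred)
```

Both metrics drop to zero when the classifier ignores the minority class entirely, which is why they are preferred over plain accuracy on imbalanced data.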
Originality. The paper improved the SMOTE algorithm and combined over-sampling, under-sampling, and boosting to solve the classification problem for imbalanced data.
Practical value. The proposed algorithm is useful for imbalanced data classification and can be applied to a range of imbalanced classification tasks, such as fault detection and intrusion detection.