
Study on SVM Classifiers for Imbalanced Data Classification Using Quasi-Linear Kernel

Liang Peifeng, Waseda University

2021.07.29

Abstract

The classification problem is a key part of machine learning. In the real world, most classification problems are nonlinear and complicated. When building a classification model, the dataset is generally assumed to be balanced, and the target is to minimize a loss function so as to achieve high accuracy and good generalization. However, in most real-world applications the datasets suffer from imbalance, which causes the learned classification model to have a decision boundary that leans toward the minority class, so that many minority samples are misclassified. This results in poor global classification performance, since the minority class, which contains far fewer samples than the other class, is often much more important and valuable. Therefore, simply minimizing a loss function to achieve high accuracy is unsuitable for classifying imbalanced datasets. Imbalanced data classification has been a very hot research issue in machine learning. In this thesis, high-performance support vector machine (SVM) based classifiers are developed for imbalanced datasets by taking advantage of the quasi-linear SVM.
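To make the boundary-lean phenomenon concrete, here is a minimal, self-contained sketch (not taken from the thesis) that trains a plain linear SVM and a cost-weighted one on a synthetic 1000-vs-50 dataset; the Gaussian data, the class sizes, and the use of scikit-learn's `class_weight="balanced"` are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# 1000 majority vs. 50 minority samples drawn from overlapping Gaussians.
X = np.vstack([rng.normal(0.0, 1.0, (1000, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 1000 + [1] * 50)

# A plain SVM minimizes the overall loss, so its boundary drifts toward the minority region.
plain = SVC(kernel="linear").fit(X, y)
# Cost-sensitive weighting is one standard remedy (cf. the weighted SVM baseline in Chapter 2).
weighted = SVC(kernel="linear", class_weight="balanced").fit(X, y)

print("plain    F-score:", f1_score(y, plain.predict(X), zero_division=0))
print("weighted F-score:", f1_score(y, weighted.predict(X), zero_division=0))
```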

The quasi-linear SVM is a model based on a divide-and-conquer strategy. It uses a multi-local linear model to solve nonlinear classification problems. Unlike SVMs with standard kernel functions, which may be considered black-box models, the quasi-linear SVM first learns information about the data structure as prior knowledge and then builds a multi-local linear model, with an interpolation function or a piecewise linear model, to represent the nonlinear classification boundary. From the viewpoint of modeling, the quasi-linear SVM is a two-step modeling method for nonlinear classification, consisting of model building and model optimization. In the model building step, the information of the data structure is learned and represented as a model in linear regression form. In the model optimization step, an SVM formulation is applied to optimize the parameters of the model. From the viewpoint of SVM, the quasi-linear SVM is an SVM with a quasi-linear kernel, which involves kernel composition and SVM optimization. In the kernel composition step, the learned information is used to compose a kernel function; the second step can then be considered a common kernel SVM for further model optimization. Although the quasi-linear SVM is effective and flexible in classifying nonlinear datasets, it assumes balanced datasets, and simply applying it to imbalanced data classification will not achieve desirable results. Therefore, the goal of this thesis is to develop high-performance SVM classifiers for imbalanced datasets in three different scenarios by taking advantage of the quasi-linear SVM: 1) a quasi-linear SVM classifier with local offset adjustment for solving the within-class imbalance problem; 2) a quasi-linear SVM classifier with oversampling in feature space to avoid overlapping problems; 3) a quasi-linear SVM classifier for one-class classification with high performance.
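As a rough illustration of the two-step idea, the sketch below composes a quasi-linear kernel as a gated sum of local linear kernels and hands it to a standard SVM as a precomputed kernel. The gate definition (Gaussian memberships over k-means centres) and all parameter values are illustrative assumptions, not the thesis's partitioning method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def gate_signals(X, centers, beta=1.0):
    """Soft membership of each sample to each local linear partition."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-beta * d2)
    return g / g.sum(axis=1, keepdims=True)

def quasi_linear_kernel(X1, X2, centers, beta=1.0):
    """K(x, z) = sum_i g_i(x) g_i(z) * (x.z + 1): a gated sum of local linear kernels."""
    G1 = gate_signals(X1, centers, beta)
    G2 = gate_signals(X2, centers, beta)
    return (G1 @ G2.T) * (X1 @ X2.T + 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.sin(2.0 * X[:, 0]) > X[:, 1]).astype(int)       # a nonlinear boundary

# Step 1 (model building): learn the data structure, here crudely via k-means.
centers = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X).cluster_centers_
# Step 2 (model optimization): a standard SVM with the composed kernel.
K = quasi_linear_kernel(X, X, centers)
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```

Note that the composed function is a valid kernel: the gate term is an inner product of gate vectors and the linear term is positive semidefinite, and a product of kernels is a kernel.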

The dissertation contains five chapters as follows:

Chapter 1 briefly describes the problems of imbalanced data classification and the SVM with a quasi-linear kernel. Different problems in imbalanced data classification are discussed, such as the within-class imbalance problem, the overlapping problem caused by the SMOTE method, and the one-class classification problem.

Chapter 2 develops a quasi-linear SVM classifier with local offset adjustment for solving the within-class imbalance problem by leveraging the multiple local linear models embedded in the quasi-linear SVM. Within-class imbalance problems often occur in imbalanced data classification; they worsen the imbalanced distribution and increase the learning complexity. However, most existing methods for imbalanced data focus only on rectifying the between-class imbalance problem, which is insufficient and inappropriate in many scenarios. The proposed method addresses this by developing a simple yet effective SVM classifier with local offset adjustment for imbalanced classification problems. First, a geometry-based partitioning method is modified for imbalanced datasets to divide the input space into multiple linearly separable partitions along the potential separation boundary. Then an F-measure SVM is applied to estimate local offsets optimized in each local linear partition. Finally, by composing a quasi-linear kernel based on the partitioning information, a quasi-linear SVM classifier with local offsets is constructed. On 14 real-world datasets, the proposed method outperforms traditional imbalanced classification methods that do not consider the within-class imbalance problem, such as the weighted SVM (WSVM) and the SVM with weighted harmonic mean (WHM) offset (win/tie/lose = 14/0/0 and 13/0/1 for the F-score and Gmean indexes). Meanwhile, the proposed method performs better than previous solutions (local clustering with sampling methods), with win/tie/lose = 12/0/2 and 10/0/4 on the overall evaluation metrics F-score and Gmean, respectively.
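The following minimal sketch conveys only the local-offset idea: hard gate signals from a k-means partition add a per-partition offset b_i to a base SVM score, and each offset is tuned by a crude grid search on the training F-score. The partitioning, the grid search, and the synthetic data are illustrative stand-ins for the geometry-based partitioning and the F-measure SVM described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
# Minority class split into two sub-clusters of different sizes (30 vs. 10):
# a toy case of within-class imbalance on top of the 600-vs-40 between-class imbalance.
X = np.vstack([rng.normal(0.0, 1.0, (600, 2)),
               rng.normal(2.5, 0.6, (30, 2)),
               rng.normal(-2.5, 0.3, (10, 2))])
y = np.array([0] * 600 + [1] * 40)

clf = LinearSVC(dual=False).fit(X, y)                 # base classifier
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
G = np.eye(4)[km.labels_]                             # hard gate signals, one column per partition

def predict(b_vec):
    """f(x) = base SVM score + sum_i g_i(x) * b_i, thresholded at zero."""
    return (clf.decision_function(X) + G @ b_vec > 0).astype(int)

offsets = np.zeros(4)
grid = np.linspace(-2.0, 2.0, 41)
for i in range(4):                                    # tune one local offset at a time
    offsets[i] = max(grid, key=lambda b: f1_score(
        y, predict(np.where(np.arange(4) == i, b, offsets)), zero_division=0))

print("F-score without offsets:", f1_score(y, predict(np.zeros(4)), zero_division=0))
print("F-score with offsets   :", f1_score(y, predict(offsets), zero_division=0))
```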

Chapter 3 develops a quasi-linear SVM classifier with oversampling in a multi-linear feature space (MLFS) to solve the overlapping problem caused by implementing SMOTE. SMOTE is a useful and effective oversampling method that balances datasets by creating synthetic minority samples using linear interpolation. However, when dealing with nonlinear imbalanced datasets, SMOTE may create wrong samples that fall inside the majority class, causing the decision boundary to spread further into the majority class. To solve this problem, an MLFS is built based on the quasi-linear kernel, which is composed from a pretrained neural network. Since data points are separated linearly in the MLFS, implementing SMOTE in the MLFS does not cause the overlapping problem. By using the quasi-linear kernel, the proposed oversampling method avoids computing Euclidean distances among the samples directly when mapping the samples to the MLFS and oversampling the minority class there, which makes it easy to apply to high-dimensional datasets. The proposed method uses unsupervised learning to pretrain the neural network, which makes it possible to ignore the imbalance problem at the pretraining stage. Finally, in order to oversample the minority class quickly and effectively, the proposed method implements SMOTE at the kernel level to avoid computing very high-dimensional feature vectors directly. On 18 real-world datasets, the proposed method is very competitive and outperforms the traditional SMOTE method on the overall evaluation metrics F-score and Gmean, with win/tie/lose = 18/0/0 and 12/0/6. Compared with other previous solutions, the proposed method also achieves better performance, with win/tie/lose = 16/0/2 and 14/0/4 for the F-score and Gmean indexes on average.
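To show what "SMOTE at the kernel level" means, the sketch below treats a synthetic point as a convex combination φ(z) = (1−δ)φ(x_i) + δφ(x_j) in feature space, so its kernel values against all training points follow by bilinearity and no explicit feature vector is ever formed. An RBF kernel and random minority pairs stand in for the quasi-linear kernel and SMOTE's nearest-neighbour selection; both are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_min = rng.normal(2.0, 0.5, size=(20, 5))              # minority samples
X_all = np.vstack([rng.normal(0.0, 1.0, (200, 5)), X_min])

K_min_all = rbf_kernel(X_min, X_all)                    # K(x_i, x) for every minority x_i

def synthetic_kernel_rows(n_new):
    """Kernel values between synthetic minority points and all training points.
    phi(z) = (1-d) phi(x_i) + d phi(x_j)  =>  K(z, x) = (1-d) K(x_i, x) + d K(x_j, x),
    so synthetic rows are built purely from existing kernel entries."""
    i = rng.integers(0, len(X_min), n_new)              # seed minority sample
    j = rng.integers(0, len(X_min), n_new)              # partner (random here; SMOTE uses k-NN)
    d = rng.random(n_new)[:, None]                      # interpolation ratio in [0, 1)
    return (1 - d) * K_min_all[i] + d * K_min_all[j]

K_syn = synthetic_kernel_rows(50)                       # 50 synthetic rows, no feature vectors
print(K_syn.shape)                                      # (50, 220)
```

Because only kernel entries are combined, the cost of generating a synthetic row is independent of the feature-space dimension, which is exactly why this formulation scales to high-dimensional data.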

Chapter 4 develops a high-performance quasi-linear SVM classifier for one-class classification by building a piecewise linear model in feature space. The proposed classifier builds a piecewise linear separation boundary in the feature space to separate the data from the origin, so as to capture a more compact region in the input space. A one-class classifier with a more compact region decreases the probability of outlier objects falling inside the domain of the classifier and thus achieves better performance. For this purpose, the input space is first divided into a group of partitions by using the partitioning mechanism of an s% winner-take-all autoencoder. A gated linear network is then designed to implement a group of linear classifiers, one for each partition, in which the gate signals are generated from the autoencoder. By applying a one-class SVM formulation to optimize the parameter set of the gated linear network, the one-class classifier is implemented in exactly the same way as a standard one-class SVM with a quasi-linear kernel, composed using a base kernel together with the gate signals. Numerical experiments on various real-world datasets demonstrate the effectiveness of the proposed method. On 14 real-world datasets, the classification results of the proposed one-class classifier are win/lose = 13/1 and 14/0 for the F-score and accuracy indexes compared with the traditional one-class SVM.
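A minimal sketch of this construction, under simplifying assumptions: gate signals (here from k-means memberships, standing in for the winner-take-all autoencoder's gates) compose a quasi-linear kernel with a linear base kernel, which is then passed to a standard one-class SVM as a precomputed kernel.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (150, 2)),          # normal data with two modes
               rng.normal(5.0, 1.0, (150, 2))])

centers = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X).cluster_centers_

def gates(A, beta=1.0):
    """Soft gate signals: normalised Gaussian memberships to each partition centre."""
    d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-beta * d2)
    return g / g.sum(axis=1, keepdims=True)

def qlk(A, B):
    """Quasi-linear kernel: gated sum of local linear kernels, K = (G_A G_B^T) * (A B^T + 1)."""
    return (gates(A) @ gates(B).T) * (A @ B.T + 1.0)

ocsvm = OneClassSVM(kernel="precomputed", nu=0.05).fit(qlk(X, X))
X_out = rng.normal(2.5, 0.3, size=(20, 2))              # outliers between the two modes
print("inliers accepted :", (ocsvm.predict(qlk(X, X)) == 1).mean())
print("outliers accepted:", (ocsvm.predict(qlk(X_out, X)) == 1).mean())
```

The gated kernel lets each partition carry its own local linear boundary, which is what allows the accepted region to stay compact around multi-modal normal data instead of enclosing the gap between the modes.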

Chapter 5 concludes the dissertation. In this thesis, by taking advantage of the quasi-linear SVM, three high-performance SVM classifiers are developed for imbalanced data classification problems: a quasi-linear SVM classifier with local offset adjustment for solving within-class imbalance, a quasi-linear SVM classifier with oversampling in feature space to avoid overlapping problems, and a quasi-linear SVM classifier for one-class classification. Numerical simulation results demonstrate the effectiveness of the proposed SVM classifiers for imbalanced data classification.
