Oversampling the minority class in the feature space

Year:
2016
Type of Publication:
Article
Keywords:
Over-sampling, imbalanced classification, kernel methods, empirical feature space, support vector machines
Authors:
Journal:
IEEE Transactions on Neural Networks and Learning Systems
Volume:
27
Number:
9
Pages:
1947-1961
Month:
September
ISSN:
2162-237X
Note:
JCR (2016): 6.108; Position: 3/104 (Q1); Category: COMPUTER SCIENCE, THEORY & METHODS
Abstract:
The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach over-samples the minority class through convex combinations of its patterns. We explore the general idea of synthetic over-sampling in the feature space induced by a kernel function (as opposed to the input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie in the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (a Euclidean space isomorphic to the feature space) for over-sampling. The proposed method is framed in the context of support vector machines, where imbalanced datasets can pose a serious hindrance. The idea is investigated in three scenarios: 1) over-sampling in the full and reduced-rank empirical feature spaces; 2) a kernel learning technique maximising the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); 3) a unified framework for preferential over-sampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced datasets.
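As a rough illustration of the idea described in the abstract, the sketch below maps training patterns into an empirical feature space obtained from the eigendecomposition of the kernel matrix, then creates synthetic minority patterns as convex combinations of existing ones. It is not the authors' implementation: the function names, the Gaussian kernel, the value of `gamma`, the toy data, and the choice to pair minority patterns uniformly at random (rather than through nearest neighbours or a preferential scheme) are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def empirical_feature_map(K, rank=None, tol=1e-10):
    """Return Z such that Z @ Z.T approximates K.

    With K = U diag(lam) U^T, row i of Z = U * sqrt(lam) is the image of
    training pattern i in the empirical feature space. Passing `rank`
    keeps only the leading components (a reduced-rank space).
    """
    lam, U = np.linalg.eigh(K)              # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]           # reorder to descending
    lam, U = lam[order], U[:, order]
    keep = lam > tol * lam[0]               # drop numerically null directions
    lam, U = lam[keep], U[:, keep]
    if rank is not None:
        lam, U = lam[:rank], U[:, :rank]
    return U * np.sqrt(lam)

def oversample_minority(Z_min, n_new, rng):
    """Synthetic patterns as convex combinations of random minority pairs."""
    i = rng.integers(0, len(Z_min), size=n_new)
    j = rng.integers(0, len(Z_min), size=n_new)
    delta = rng.random((n_new, 1))          # mixing weights in [0, 1)
    return Z_min[i] + delta * (Z_min[j] - Z_min[i])

# Toy imbalanced problem (assumed data): 40 majority vs 8 minority patterns.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(2.0, 1.0, (8, 2))])
y = np.array([0] * 40 + [1] * 8)

K = rbf_kernel(X, X, gamma=0.5)
Z = empirical_feature_map(K)                # full-rank empirical feature space
Z_new = oversample_minority(Z[y == 1], n_new=32, rng=rng)

# Balanced training set, expressed in the empirical feature space.
Z_bal = np.vstack([Z, Z_new])
y_bal = np.concatenate([y, np.ones(32, dtype=int)])
```

Because inner products between rows of `Z` reproduce the kernel values, a linear classifier trained on `Z_bal` (for instance a linear support vector machine) plays the role of a kernel classifier trained on the over-sampled data. Passing a `rank` argument to `empirical_feature_map` gives a reduced-rank variant of the same sketch.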