Authors | Hamid Saadatfar |
---|---|
Journal | Knowledge-Based Systems |
Pages | 108217-108217 |
Serial number | 241 |
Volume number | 1 |
Article type | Full Paper |
Publication date | 2022 |
Journal rank | ISI |
Journal type | Electronic |
Country of publication | Iran |
Journal indexing | JCR, Scopus |
Abstract
An imbalanced dataset consists of a majority class and a minority class, where the majority class has substantially more samples than the other classes. This disparity disrupts the learning process and biases learning algorithms toward modeling the majority class. Overlap between the classes can further exacerbate this already complicated problem, which is commonly addressed with oversampling and undersampling approaches. This paper proposes two novel density-based algorithms that eliminate class overlap and noise while balancing the dataset and normalizing the class distribution. The first algorithm uses undersampling, whereas the second combines undersampling and oversampling. Both algorithms delete high-density samples from the majority class and eliminate noise in both classes. The two proposed algorithms and other popular algorithms were run on 16 imbalanced datasets covering a variety of scenarios. The datasets balanced by these algorithms were then modeled with Random Forest and SVM classifiers. The models obtained from the two proposed algorithms outperformed those of the other algorithms on all evaluation criteria. They also achieved balance while preserving the structure and shape of the classes as far as possible, which protects the quality of learning.
Keywords: Imbalanced dataset; Density; Undersampling; Oversampling; Overlapping
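The abstract describes removing noise from both classes and deleting high-density majority samples, but it does not state the exact density measure or noise rule. The Python sketch below only illustrates that general idea for a binary dataset and is not the authors' algorithm. It assumes a k-nearest-neighbour density (the inverse of the mean distance to the k nearest majority neighbours) and treats a sample as noise when none of its k nearest neighbours shares its label. The names `density_undersample` and `_knn` and the parameter `k` are hypothetical, and the oversampling step of the second algorithm is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _knn(idx: int, X: Sequence[Point], candidates: Sequence[int], k: int) -> List[int]:
    """Hypothetical helper: indices of the k candidates nearest to X[idx] (Euclidean), excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def density_undersample(X: Sequence[Point], y: Sequence[int],
                        majority: int, k: int = 3) -> List[int]:
    """Return indices kept after noise removal and density-based undersampling.

    Hypothetical sketch for a binary dataset, NOT the paper's algorithm.
    """
    all_idx = range(len(X))

    # Step 1 (assumed noise rule): drop a sample if none of its k nearest
    # neighbours, searched over the whole dataset, shares its label.
    kept = [i for i in all_idx
            if any(y[j] == y[i] for j in _knn(i, X, all_idx, k))]

    maj = [i for i in kept if y[i] == majority]
    mino = [i for i in kept if y[i] != majority]

    def density(i: int) -> float:
        # Step 2 (assumed density measure): inverse mean distance to the k
        # nearest surviving majority samples; exact duplicates get infinity.
        neigh = _knn(i, X, maj, k)
        if not neigh:
            return 0.0
        mean_d = sum(math.dist(X[i], X[j]) for j in neigh) / len(neigh)
        return math.inf if mean_d == 0 else 1.0 / mean_d

    # Step 3: remove the densest majority samples (densities computed once,
    # before any removal) until both classes have the same size.
    surplus = len(maj) - len(mino)
    if surplus > 0:
        densest = set(sorted(maj, key=density, reverse=True)[:surplus])
        kept = [i for i in kept if i not in densest]
    return kept


if __name__ == "__main__":
    # Toy data: a tight majority cluster, two sparse majority points,
    # one majority point sitting inside the minority cluster, and 3 minority points.
    X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (2, 0), (0, 2),
         (5.1, 5.1), (5, 5), (5.2, 5), (5, 5.2)]
    y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    print(density_undersample(X, y, majority=0, k=3))
```

On the toy data, the majority point inside the minority cluster is discarded as noise, and the surplus is removed from the tight cluster while the two sparse majority points survive. Dropping the densest samples first thins out redundant, tightly packed regions, which is one plausible way to balance the classes while disturbing their overall shape as little as possible; a closer implementation might recompute densities after each removal.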