Authors | Hamid Saadatfar |
---|---|
Journal | Expert Systems with Applications |
Page number | 117501-117501 |
Serial number | 203 |
Volume number | 1 |
Paper Type | Full Paper |
Published At | 2022 |
Journal Grade | ISI |
Journal Type | Electronic |
Journal Country | Iran, Islamic Republic Of |
Journal Index | ISI, JCR, Scopus |
Abstract
Today, data is generated at high speed, and managing large volumes of it has become a challenge. Clustering is one method for analyzing data generated on the Internet. Many approaches to data clustering have been proposed; among them, DBSCAN is one of the best-known density-based clustering algorithms. It can detect clusters of different shapes and does not require prior knowledge of the number of clusters. A major part of DBSCAN's run time is spent calculating the distances between data points in order to find the neighbors of each sample in the dataset. The time complexity of the algorithm is O(n²); therefore, it is not suitable for processing big datasets. In this paper, DBSCAN is improved so that it can be applied to big datasets. The proposed method accurately calculates the density of each sample based on a reduced set of data, called the operational set, which is updated periodically. Using local samples to calculate the density greatly reduces the computational cost of clustering. Empirical results on various datasets of different sizes and dimensions show that the proposed algorithm increases clustering speed compared to recent related works while achieving accuracy similar to that of the original DBSCAN algorithm.
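
To illustrate the quadratic cost mentioned in the abstract, the following is a minimal sketch of the brute-force density step in standard DBSCAN: every pair of points is compared once, so the work grows as O(n²). It is not the paper's operational-set method, which the abstract does not specify in detail. The function names `neighbor_counts` and `core_points` are hypothetical; `eps` and `min_pts` stand for the usual DBSCAN radius and minimum-neighbor parameters.

```python
import math


def neighbor_counts(points, eps):
    """Return, for each point, the number of points within distance eps.

    Each point counts itself as a neighbor. The nested loop visits every
    pair once, which is the O(n^2) cost of naive DBSCAN neighbor search.
    """
    n = len(points)
    counts = [1] * n  # every point is within eps of itself
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                counts[i] += 1
                counts[j] += 1
    return counts


def core_points(points, eps, min_pts):
    """Return indices of points whose eps-neighborhood holds at least min_pts points."""
    counts = neighbor_counts(points, eps)
    return [i for i, c in enumerate(counts) if c >= min_pts]


# Illustrative data (made up for this sketch): three nearby points and one outlier.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(neighbor_counts(sample, eps=1.5))         # [3, 3, 3, 1]
print(core_points(sample, eps=1.5, min_pts=3))  # [0, 1, 2]
```

Restricting the inner loop to a smaller candidate set, as the abstract describes for the operational set, is what would reduce the number of distance computations; how that set is built and updated is left to the paper itself.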
tags: Data Mining; Clustering; Big Data; DBSCAN Algorithm