A fast DBSCAN algorithm for big data based on efficient density calculation

Hamid Saadatfar

Authors	Hamid Saadatfar
Journal	Expert Systems with Applications
Pages	117501-117501
Serial number	203
Volume number	1
Article type	Full Paper
Publication date	2022
Journal rank	ISI
Journal type	Electronic
Country of publication	Iran
Journal indexing	ISI, JCR, Scopus

Abstract

Today, data is being generated at high speed, and managing large volumes of data has become a challenge. Clustering is one method for analyzing data generated on the Internet, and various approaches to data clustering have been presented. Among them, DBSCAN is one of the best-known density-based clustering algorithms. It can detect clusters of different shapes and does not require prior knowledge of the number of clusters. A major part of the DBSCAN run time is spent calculating the distances between data points in order to find the neighbors of each sample in the dataset. The time complexity of this algorithm is O(n²); therefore, it is not suitable for processing big datasets. In this paper, DBSCAN is improved so that it can be applied to big datasets. The proposed method accurately calculates the density of each sample based on a reduced set of data, called the operational set, which is updated periodically. Using only local samples to calculate the density greatly reduces the computational cost of clustering. Empirical results on various datasets of different sizes and dimensions show that the proposed algorithm increases the clustering speed compared to recent related works while achieving accuracy similar to that of the original DBSCAN algorithm.
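
To make the quadratic cost concrete, here is a minimal sketch of the brute-force density step that the abstract identifies as the bottleneck of the original DBSCAN. It is not the proposed operational-set method, whose details are not given in this record. The function names `neighbor_counts` and `core_points` are illustrative, and `eps` and `min_pts` follow the usual DBSCAN parameter naming rather than anything stated here.

```python
import math


def neighbor_counts(points, eps):
    """Count, for every point, the points within distance eps (itself included).

    Every pair of points is compared once, so the work grows as O(n^2).
    """
    n = len(points)
    counts = [1] * n  # each point counts as its own neighbor
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                counts[i] += 1
                counts[j] += 1
    return counts


def core_points(points, eps, min_pts):
    """Return the indices of points whose eps-neighborhood holds at least min_pts points."""
    counts = neighbor_counts(points, eps)
    return [i for i, c in enumerate(counts) if c >= min_pts]


# Three nearby points and one isolated point.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(neighbor_counts(sample, eps=1.5))
print(core_points(sample, eps=1.5, min_pts=3))
```

The nested loop compares each sample with every other sample. Restricting the density calculation to a smaller set of local samples, as the abstract describes, would avoid most of those comparisons.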

tags: Data Mining; Clustering; Big Data; DBSCAN Algorithm