Effective clustering of big text data: Empirical evaluation and comparative analysis

Hamed Vahdat-Nejad, Mohadese Jamalian

Authors	Hamed Vahdat-Nejad, Mohadese Jamalian
Journal	Egyptian Informatics Journal
Pages	1-6
Serial number	32
Volume number	1
Article type	Full Paper
Publication date	2025
Journal rank	Scientific - Review
Journal type	Print
Country of publication	Kyrgyzstan
Journal index	ISI, Scopus
Keywords	Text clustering, Big data, Feature extraction, Dimension reduction

Abstract

Text is currently the most important form of big data, generated continuously by social network users. Compared with classical datasets, big text datasets have two new characteristics, large volume and high variety, which translate into very large vector spaces. As a result, the clustering methods used in previous studies do not work effectively on big text data. This paper proposes a novel indirect approach to text clustering that defines and extracts features from the text. This heuristic approach extracts the most significant features, which are then used to produce more efficient and accurate clustering. The main innovation of this research is an indirect, exploratory approach to extracting key features from large-scale textual data that specifically addresses the challenges of high volume and diversity. Comparisons show that the proposed approach outperforms classical methods commonly employed in recent research on clustering evaluation criteria, including the Silhouette Score, Davies-Bouldin Index, and Dunn Index. The research is applicable in areas that require analysis and categorization of large-scale textual data, such as sentiment analysis in social networks, text content classification in content management systems, clustering of customer feedback for service improvement, and recommender systems.
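The three evaluation criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names, the Euclidean distance, and the six toy 2-D points are assumptions made for illustration only, and a real study would compute these metrics on feature vectors extracted from text. A higher Silhouette Score and Dunn Index and a lower Davies-Bouldin Index indicate better-separated clusters.

```python
import math
from itertools import combinations


def group(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio."""
    clusters = group(points, labels)
    cents = {
        k: tuple(sum(axis) / len(v) for axis in zip(*v))
        for k, v in clusters.items()
    }
    # scatter: mean distance of a cluster's points to its centroid
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    pairs = list(combinations(zip(points, labels), 2))
    min_between = min(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    max_within = max(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return min_between / max_within


# Toy data (invented): two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(
        name,
        round(silhouette_score(points, labels), 3),
        round(davies_bouldin_index(points, labels), 3),
        round(dunn_index(points, labels), 3),
    )
```

On this toy data the labeling that matches the two visible groups should score better on all three criteria than the interleaved labeling, which is the kind of comparison the abstract reports between the proposed approach and classical methods.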

Associate Professor Hamed Vahdat-Nejad
