eprintid: 11264 rev_number: 9 eprint_status: archive userid: 2 dir: disk0/00/01/12/64 datestamp: 2024-03-14 23:30:43 lastmod: 2024-03-14 23:30:43 status_changed: 2024-03-14 23:30:43 type: article metadata_visibility: show creators_name: Rizwan, Muhammad creators_name: Mushtaq, Muhammad Faheem creators_name: Rafiq, Maryam creators_name: Mehmood, Arif creators_name: Diez, Isabel de la Torre creators_name: Gracia Villar, Mónica creators_name: Garay, Helena creators_name: Ashraf, Imran creators_id: creators_id: creators_id: creators_id: creators_id: creators_id: monica.gracia@uneatlantico.es creators_id: helena.garay@uneatlantico.es creators_id: title: Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble ispublished: pub subjects: uneat_eng subjects: uneat_ps divisions: uneatlantico_produccion_cientifica divisions: unincol_produccion_cientifica divisions: uninimx_produccion_cientifica divisions: uninipr_produccion_cientifica divisions: unic_produccion_cientifica full_text_status: public keywords: Depression classification; deep learning; FastText; machine learning abstract: Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge in predicting depression using social media posts is that the existing studies do not focus on predicting the intensity of depression in social media texts but rather only perform the binary classification of depression and moreover noisy data makes it difficult to predict the true depression in the social media text. This study intends to begin by collecting relevant Tweets and generating a corpus of 210000 public tweets using Twitter public application programming interfaces (APIs). A strategy is devised to filter out only depression-related tweets by creating a list of relevant hashtags to reduce noise in the corpus. Furthermore, an algorithm is developed to annotate the data into three depression classes: ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Further FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning to produce the tuned model, which significantly increases the depression classification performance to an 84% F1 score and 90% accuracy compared to baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost the model’s performance by combining several other classifiers and assigning weights to individual models according to their individual performances. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 of 89%, 5% higher than FastText alone, and an accuracy of 93%, 3% higher than FastText alone. The proposed model better captures the contextual features of the relatively small sample class and aids in the detection of early depression intensity prediction from tweets with impactful performances. date: 2024-02 publication: Computers, Materials & Continua volume: 78 number: 2 pagerange: 2047-2066 id_number: doi:10.32604/cmc.2024.037347 refereed: TRUE issn: 1546-2226 official_url: http://doi.org/10.32604/cmc.2024.037347 access: open language: en citation: Artículo Materias > Ingeniería Materias > Psicología Universidad Europea del Atlántico > Investigación > Producción Científica Fundación Universitaria Internacional de Colombia > Investigación > Producción Científica Universidad Internacional Iberoamericana México > Investigación > Producción Científica Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica Universidad Internacional do Cuanza > Investigación > Producción Científica Abierto Inglés Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge in predicting depression using social media posts is that the existing studies do not focus on predicting the intensity of depression in social media texts but rather only perform the binary classification of depression and moreover noisy data makes it difficult to predict the true depression in the social media text. This study intends to begin by collecting relevant Tweets and generating a corpus of 210000 public tweets using Twitter public application programming interfaces (APIs). A strategy is devised to filter out only depression-related tweets by creating a list of relevant hashtags to reduce noise in the corpus. Furthermore, an algorithm is developed to annotate the data into three depression classes: ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Further FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning to produce the tuned model, which significantly increases the depression classification performance to an 84% F1 score and 90% accuracy compared to baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost the model’s performance by combining several other classifiers and assigning weights to individual models according to their individual performances. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 of 89%, 5% higher than FastText alone, and an accuracy of 93%, 3% higher than FastText alone. The proposed model better captures the contextual features of the relatively small sample class and aids in the detection of early depression intensity prediction from tweets with impactful performances. metadata Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica; Garay, Helena y Ashraf, Imran mail SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, monica.gracia@uneatlantico.es, helena.garay@uneatlantico.es, SIN ESPECIFICAR (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226 document_url: http://repositorio.unini.edu.mx/id/eprint/11264/1/TSP_CMC_37347.pdf