Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Open access. Language: English.

Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica; Garay, Helena and Ashraf, Imran (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226

Contact: Mónica Gracia Villar (monica.gracia@uneatlantico.es), Helena Garay (helena.garay@uneatlantico.es)

Full text: TSP_CMC_37347.pdf (861kB)
Available under License Creative Commons Attribution.

Abstract

Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including the early prediction of psychological disorders and stress in individuals or the general public. Two challenges stand out: existing studies mostly perform binary classification of depression rather than predicting its intensity, and noisy data makes it difficult to identify genuine depression in social media text. This study begins by collecting a corpus of 210,000 public tweets using Twitter's public application programming interfaces (APIs). To reduce noise in the corpus, a list of relevant hashtags is created and used to retain only depression-related tweets. An algorithm is then developed to annotate the data into three depression classes, ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on the International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Several baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Next, a FastText-based model is applied and tuned with different preprocessing techniques and hyperparameter settings; the tuned model significantly improves on the baselines, reaching an 84% F1 score and 90% accuracy. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed that boosts performance by combining the model with several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperforms all baselines as well as FastText alone, with an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher than FastText alone).
The proposed model better captures the contextual features of the relatively small sample class and aids the early detection of depression intensity from tweets.
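The weighted soft voting step described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the three example models, their probability outputs, and the use of validation F1 scores as weights are assumptions made for the example. The abstract states only that weights follow each model's individual performance.

```python
from typing import List, Sequence

# Class order used for every probability vector below.
CLASSES = ["Mild", "Moderate", "Severe"]


def weighted_soft_vote(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Average per-model class probabilities using normalized weights."""
    if not probabilities or len(probabilities) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probabilities, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for i, p in enumerate(probs):
            combined[i] += (weight / total) * p
    return combined


def predict_class(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> str:
    """Return the class with the highest weighted average probability."""
    combined = weighted_soft_vote(probabilities, weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers for one tweet.
    model_probs = [
        [0.1, 0.7, 0.2],  # assumed FastText output
        [0.5, 0.3, 0.2],  # assumed second classifier
        [0.4, 0.4, 0.2],  # assumed third classifier
    ]
    # Assumed weights: each model's validation F1 score.
    model_weights = [0.84, 0.70, 0.65]
    print(predict_class(model_probs, model_weights))  # prints: Moderate
```

Because the weights are normalized, the combined vector is still a probability distribution, and a stronger model can outvote two weaker ones when it is confident.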

Document Type: Article
Keywords: Depression classification; deep learning; FastText; machine learning
Subject Classification: Subjects > Engineering
Subjects > Psychology
Divisions: Universidad Europea del Atlántico > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Universidad Internacional Iberoamericana México > Research > Articles and Books
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Deposited: 14 Mar 2024 23:30
Last Modified: 14 Mar 2024 23:30
URI: https://repositorio.unini.edu.mx/id/eprint/11264


Influence of E-learning training on the acquisition of competences in basketball coaches in Cantabria

The main aim of this study was to analyse the influence of e-learning training on the acquisition of competences in basketball coaches in Cantabria. The current landscape of basketball coach training shows an increasing demand for innovative training models and emerging pedagogies, including e-learning-based methodologies. The study sample consisted of fifty students from these courses, all above 16 years of age (36 males, 14 females). Among them, 16% resided outside the autonomous community of Cantabria, 10% resided more than 50 km from the city of Santander, 36% between 10 and 50 km, 14% less than 10 km, and 24% resided within Santander city. Data were collected through a Google Forms survey distributed by the Cantabrian Basketball Federation to training course students. Participation was voluntary and anonymous. The survey, consisting of 56 questions, was validated by two sports and health doctors and two senior basketball coaches. The collected data were processed and analysed using Microsoft® Excel version 16.74, and the results were expressed in percentages. The analysis revealed that 24.60% of the students trained through the e-learning methodology considered themselves fully qualified as basketball coaches, contrasting with 10.98% of those trained via traditional face-to-face methodology. The results of the study provide insights into important characteristics that can be adjusted and improved within the investigated educational process. Moreover, the study concludes that e-learning training effectively qualifies basketball coaches in Cantabria.

Scientific Production

Authors: Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Álvaro Velarde-Sotres (alvaro.velarde@uneatlantico.es), Javier Jorge, Kamil Giglio

Ultra-Wide Band Radar Empowered Driver Drowsiness Detection with Convolutional Spatial Feature Engineering and Artificial Intelligence

Driving while drowsy poses significant risks, including reduced cognitive function and the potential for accidents, which can lead to severe consequences such as trauma, economic losses, injuries, or death. The use of artificial intelligence can enable effective detection of driver drowsiness, helping to prevent accidents and enhance driver performance. This research aims to address the crucial need for real-time and accurate drowsiness detection to mitigate the impact of fatigue-related accidents. Ultra-wideband radar data were collected over five minutes, segmented into one-minute chunks, and transformed into grayscale images. Spatial features were extracted from the images using a two-dimensional Convolutional Neural Network and then used to train and test multiple machine learning classifiers. The ensemble classifier RF-XGB-SVM, which combines Random Forest, XGBoost, and Support Vector Machine using a hard voting criterion, achieved an accuracy of 96.6%. Additionally, the proposed approach was validated with a k-fold score of 97% and a standard deviation of 0.018. When the dataset was augmented using Generative Adversarial Networks, accuracy improved for all models, and the RF-XGB-SVM model again outperformed the rest with an accuracy score of 99.58%.
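The hard voting criterion mentioned above can be illustrated with a short sketch. It is written in plain Python and assumes that each of the three classifiers has already produced a label; the label names and the tie-breaking rule (the earliest model in the list wins) are assumptions for the example, not details taken from the paper.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most models.

    Ties are broken in favor of the label that appears first in
    `predictions` (an assumed rule for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: some label always has the top count")


if __name__ == "__main__":
    # Hypothetical labels from Random Forest, XGBoost, and SVM for one sample.
    votes = ["drowsy", "awake", "drowsy"]
    print(hard_vote(votes))  # prints: drowsy
```

Unlike soft voting, this rule ignores how confident each model is; only the predicted labels count.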

Scientific Production

Authors: Hafeez Ur Rehman Siddiqui, Ambreen Akmal, Muhammad Iqbal, Adil Ali Saleem, Muhammad Amjad Raza, Kainat Zafar, Aqsa Zaib, Sandra Dudley, Jon Arambarri (jon.arambarri@uneatlantico.es), Ángel Gabriel Kuc Castilla, Furqan Rustam

From by-products to new application opportunities: the enhancement of the leaves deriving from the fruit plants for new potential healthy products

In the last decades, the world population and demand for any kind of product have grown exponentially. The rhythm of production to satisfy the request of the population has become unsustainable and the concept of the linear economy, introduced after the Industrial Revolution, has been replaced by a new economic approach, the circular economy. In this new economic model, the concept of “the end of life” is substituted by the concept of restoration, providing a new life to many industrial wastes. Leaves are a by-product of several agricultural cultivations. In recent years, the scientific interest regarding leaf biochemical composition grew, recording that plant leaves may be considered an alternative source of bioactive substances. Plant leaves’ main bioactive compounds are similar to those in fruits, i.e., phenolic acids and esters, flavonols, anthocyanins, and procyanidins. Bioactive compounds can positively influence human health; in fact, it is no coincidence that the leaves were used by our ancestors as a natural remedy for various pathological conditions. Therefore, leaves can be exploited to manufacture many products in food (e.g., being incorporated in food formulations as natural antioxidants, or used to create edible coatings or films for food packaging), cosmetic and pharmaceutical industries (e.g., promising ingredients in anti-aging cosmetics such as oils, serums, dermatological creams, bath gels, and other products). This review focuses on the leaves’ main bioactive compounds and their beneficial health effects, indicating their applications until today to enhance them as a harvesting by-product and highlight their possible reuse for new potential healthy products.

Scientific Production

Authors: Lucia Regolo, Francesca Giampieri (francesca.giampieri@uneatlantico.es), Maurizio Battino (maurizio.battino@uneatlantico.es), Yasmany Armas Diaz, Bruno Mezzetti, Maria Elexpuru Zabaleta (maria.elexpuru@uneatlantico.es), Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Kilian Tutusaus (kilian.tutusaus@uneatlantico.es), Luca Mazzoni

Efficient deep learning-based approach for malaria detection using red blood cell smears

Malaria is an extremely malignant disease and is caused by the bites of infected female mosquitoes. This disease is not only infectious among humans, but among animals as well. Malaria causes mild symptoms like fever, headache, sweating and vomiting, and muscle discomfort; severe symptoms include coma, seizures, and kidney failure. The timely identification of malaria parasites is a challenging and chaotic endeavor for health staff. An expert technician examines the schematic blood smears of infected red blood cells through a microscope. The conventional methods for identifying malaria are not efficient. Machine learning approaches are effective for simple classification challenges but not for complex tasks. Furthermore, machine learning involves rigorous feature engineering to train the model and detect patterns in the features. On the other hand, deep learning works well with complex tasks and automatically extracts low and high-level features from the images to detect disease. In this paper, EfficientNet, a deep learning-based approach for detecting Malaria, is proposed that uses red blood cell images. Experiments are carried out and performance comparison is made with pre-trained deep learning models. In addition, k-fold cross-validation is also used to substantiate the results of the proposed approach. Experiments show that the proposed approach is 97.57% accurate in detecting Malaria from red blood cell images and can be beneficial practically for medical healthcare staff.

Scientific Production

Authors: Muhammad Mujahid, Furqan Rustam, Rahman Shafique, Elizabeth Caro Montero (elizabeth.caro@uneatlantico.es), Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Isabel de la Torre Diez, Imran Ashraf

Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms

In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have treated the severity level as a multiclass variable. Moreover, data are rarely distributed equally among all classes in practice, so class imbalance across multiple classes is a substantial challenge in this domain, and this research emphasizes the significance of addressing it. We introduce a new approach, Feature group partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. The issue of overfitting was addressed by evaluating accuracy on the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and F1-score indices are used. Overall, a comprehensive analysis demonstrates the difference between the Conventional Depression Screening (CDS) and FGP approaches. In summary, the results show that the stacking classifier for FGP with the SMOTE approach yields the highest balanced accuracy, with a rate of 92.81%.
The empirical evidence demonstrates that the FGP approach, when combined with SMOTE, is able to produce better performance in predicting the severity of depression. Most importantly, the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.
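To show why balanced accuracy is reported for an imbalanced multiclass problem, the sketch below computes it as the mean of per-class recall, which is the usual multiclass definition. It is assumed here that the paper uses the same definition; the class labels and the toy predictions are hypothetical.

```python
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    recalls = []
    for cls in set(y_true):
        # Positions of all samples whose true label is this class.
        indices = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy imbalanced data: 8 "mild" samples and 2 "severe" samples.
    truth = ["mild"] * 8 + ["severe"] * 2
    # A degenerate model that always predicts the majority class.
    guesses = ["mild"] * 10
    plain = sum(t == g for t, g in zip(truth, guesses)) / len(truth)
    print(plain)                              # prints: 0.8
    print(balanced_accuracy(truth, guesses))  # prints: 0.5
```

The majority-class predictor looks strong on plain accuracy but scores only 0.5 on balanced accuracy, because each class contributes equally regardless of its size.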

Scientific Production

Authors: Tumpa Rani Shaha, Momotaz Begum, Jia Uddin, Vanessa Yélamos Torres (vanessa.yelamos@funiber.org), Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Imran Ashraf, Md. Abdus Samad