relation: http://repositorio.unini.edu.mx/id/eprint/17885/ canonical: http://repositorio.unini.edu.mx/id/eprint/17885/ title: Dual-modality fusion for mango disease classification using dynamic attention based ensemble of leaf & fruit images creator: Mohsin, Muhammad creator: Hashmi, Muhammad Shadab Alam creator: Delgado Noya, Irene creator: Garay, Helena creator: Abdel Samee, Nagwan creator: Ashraf, Imran subject: Food description: Mango is one of the most beloved fruits and plays an indispensable role in the agricultural economies of many tropical countries, including Pakistan, India, and other Southeast Asian countries. Like other fruits, mango cultivation is threatened by various diseases, including Anthracnose and Red Rust. Although farmers try to mitigate such outbreaks in time, early and accurate detection of mango diseases remains challenging due to multiple factors, such as limited understanding of disease diversity, similarity in symptoms, and frequent misclassification. To address these challenges, this study proposes a multimodal deep learning framework that leverages both leaf and fruit images to improve classification performance and generalization. Individual CNN-based pre-trained models, including ResNet-50, MobileNetV2, EfficientNet-B0, and ConvNeXt, were trained separately on curated datasets of mango leaf and fruit diseases. A novel Modality Attention Fusion (MAF) mechanism was introduced to dynamically weight and combine predictions from both modalities based on their discriminative strength, since some diseases are more prominent on leaves than on fruits, and vice versa. To reduce overfitting and improve generalization, a class-aware augmentation pipeline was integrated, which applies augmentation according to the specific characteristics of each class.
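The record does not give the exact MAF formulation, but the idea of dynamically weighting leaf and fruit predictions by their discriminative strength can be illustrated with a minimal sketch. Here, the attention weight for each modality is derived from its softmax confidence and normalized with a softmax gate; the function name, the confidence-based gating, and the scaling parameters `w_leaf` and `w_fruit` are all hypothetical, not the paper's actual mechanism.

```python
import numpy as np

def modality_attention_fusion(leaf_probs, fruit_probs, w_leaf=1.0, w_fruit=1.0):
    """Illustrative sketch: fuse per-class probability vectors from a leaf
    model and a fruit model using confidence-derived attention weights."""
    # Confidence score per modality: its maximum class probability
    scores = np.array([leaf_probs.max(), fruit_probs.max()])
    # Scale by (hypothetical) learnable per-modality parameters
    scaled = np.array([w_leaf, w_fruit]) * scores
    # Softmax over the two scores gives the attention weights
    e = np.exp(scaled - scaled.max())
    alpha = e / e.sum()
    # Convex combination of the two modality predictions
    fused = alpha[0] * leaf_probs + alpha[1] * fruit_probs
    return fused / fused.sum(), alpha

# Leaf model is confident (e.g. Anthracnose shows strongly on leaves),
# fruit model is uncertain, so the fused prediction follows the leaf.
leaf = np.array([0.90, 0.05, 0.05])
fruit = np.array([0.40, 0.35, 0.25])
fused, alpha = modality_attention_fusion(leaf, fruit)
```

Because the gate is a softmax over confidence scores, the more discriminative modality receives the larger weight for each input, which is the behavior the abstract attributes to MAF.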
The proposed attention-based fusion strategy significantly outperformed individual models and static fusion approaches, achieving a test accuracy of 99.08%, an F1 score of 99.03%, and a near-perfect ROC-AUC of 99.96% with EfficientNet-B0 as the base model. To evaluate the model’s real-world applicability, an interactive web application was developed using the Django framework and evaluated through out-of-distribution (OOD) testing on diverse mango samples collected from public sources. These findings underline the importance of combining visual cues from multiple plant organs and adapting model attention to contextual features for real-world agricultural diagnostics. date: 2025-11 type: Article type: PeerReviewed format: text language: en rights: cc_by_nc_nd_4 identifier: http://repositorio.unini.edu.mx/id/eprint/17885/1/s41598-025-26052-7.pdf
metadata Mohsin, Muhammad; Hashmi, Muhammad Shadab Alam; Delgado Noya, Irene; Garay, Helena; Abdel Samee, Nagwan and Ashraf, Imran mail UNSPECIFIED, UNSPECIFIED, irene.delgado@uneatlantico.es, helena.garay@uneatlantico.es, UNSPECIFIED, UNSPECIFIED (2025) Dual-modality fusion for mango disease classification using dynamic attention based ensemble of leaf & fruit images. Scientific Reports, 15 (1).
ISSN 2045-2322 relation: http://doi.org/10.1038/s41598-025-26052-7 relation: doi:10.1038/s41598-025-26052-7 language: en