Article. Subject: Engineering.
Open access. Language: English.

Autonomous unmanned aerial vehicles (UAVs) offer cost-effective and flexible solutions for a wide range of real-world applications, particularly in hazardous and time-critical environments. Their ability to navigate autonomously, communicate rapidly, and avoid collisions makes UAVs well suited to emergency response scenarios. However, real-time path planning in dynamic and unpredictable environments remains a major challenge, especially in confined tunnel infrastructures where accidents may trigger fires, smoke propagation, debris, and rapid environmental changes. In such conditions, conventional preplanned or model-based navigation approaches often fail because of limited visibility, narrow passages, and the absence of reliable localization signals. To address these challenges, this work proposes an end-to-end emergency response framework for tunnel accidents based on Multi-Agent Reinforcement Learning (MARL). Each UAV operates as an independent learning agent under an Independent Q-Learning paradigm, enabling real-time decision-making with limited computational resources. To mitigate premature convergence and local optima during exploration, Grey Wolf Optimization (GWO) is integrated as a policy-guidance mechanism within the reinforcement learning (RL) framework. A customized reward function is designed to prioritize victim discovery, penalize unsafe behavior, and explicitly discourage redundant exploration among agents. The proposed approach is evaluated using a frontier-based exploration simulator in both single-agent and multi-agent settings with multiple goals. Extensive simulation results demonstrate that the proposed framework achieves faster goal discovery, improved map coverage, and reduced rescue time compared with state-of-the-art GWO-based exploration and random search algorithms.
These results highlight the effectiveness of lightweight MARL-based coordination for autonomous UAV-assisted tunnel emergency response.

Authors: ur Rehman, Hafiz Muhammad Raza; Gul, M. Junaid; Younas, Rabbiya; Jhandir, Muhammad Zeeshan; Álvarez, Roberto Marcelo; Miró Vera, Yini Airet and Ashraf, Imran
Email: roberto.alvarez@uneatlantico.es (Álvarez, Roberto Marcelo), yini.miro@uneatlantico.es (Miró Vera, Yini Airet); not specified for the other authors
Full text: <http://repositorio.unini.edu.mx/id/eprint/27154/1/s41598-026-37191-w_reference.pdf>
Citation: (2026) End-to-end emergency response protocol for tunnel accidents augmentation with reinforcement learning. Scientific Reports. ISSN 2045-2322
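
The abstract describes each UAV as an independent Q-learning agent with a reward that favors victim discovery, penalizes unsafe behavior, and discourages redundant exploration. The following Python sketch illustrates that general idea on a toy grid; it is not the authors' implementation. The grid world, the state representation (agent position only), the reward weights, and all function and class names are assumptions made for illustration, and the GWO policy-guidance step is omitted because the abstract does not specify how it is integrated.

```python
import random
from collections import defaultdict

# Hypothetical reward weights; the abstract gives no numeric values.
R_VICTIM = 10.0     # reaching a cell that contains an undiscovered victim
R_COLLISION = -5.0  # unsafe move into a wall or obstacle
R_REVISIT = -1.0    # entering a cell that any agent has already explored
R_STEP = -0.1       # small per-step cost, favoring shorter rescue times

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # moves on an assumed 2D grid


def shaped_reward(next_cell, hit_obstacle, victims, visited):
    """Reward with the three terms named in the abstract (weights assumed)."""
    if hit_obstacle:
        return R_COLLISION
    if next_cell in victims:
        return R_VICTIM
    if next_cell in visited:
        return R_REVISIT
    return R_STEP


def step(pos, action, width, height, obstacles):
    """Apply a move; a blocked agent stays in place and is flagged."""
    dx, dy = ACTIONS[action]
    nxt = (pos[0] + dx, pos[1] + dy)
    inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
    blocked = (not inside) or nxt in obstacles
    return (pos if blocked else nxt), blocked


class IndependentQAgent:
    """Tabular Q-learner that keeps its own table and ignores other agents."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy action selection.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def run_episode(agents, starts, victims, width, height, obstacles, max_steps=200):
    """Run one episode; return the step count when all victims are found."""
    positions = list(starts)
    visited = set(starts)      # shared map of explored cells
    remaining = set(victims)   # victims not yet discovered
    for t in range(max_steps):
        for i, agent in enumerate(agents):
            state = positions[i]
            action = agent.act(state)
            nxt, blocked = step(state, action, width, height, obstacles)
            reward = shaped_reward(nxt, blocked, remaining, visited)
            agent.update(state, action, reward, nxt)
            positions[i] = nxt
            visited.add(nxt)
            remaining.discard(nxt)
        if not remaining:
            return t + 1
    return max_steps
```

In this sketch the only coupling between agents is the shared `visited` set used by the reward, which is one simple way to discourage redundant exploration while keeping each learner's Q-table independent.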