Automatic Detection and Description Generation of Drones and Their Actions via YOLOv7 and LLMs
Gustavo Garcia-Vargas
Co-Presenter: Phil Ho Combatir
College: The Dorothy and George Hennings College of Science, Mathematics and Technology
Major: Computer Science
Faculty Research Mentor: Yulia Kumar
Abstract:
The proliferation of drones and the diversity of their applications demand robust real-time detection and description systems to enhance safety, prevent collisions, and support autonomy in multi-drone environments. This study introduces an autonomous drone detection and description framework that integrates the YOLOv7 object detection model with advanced Large Language Models (LLMs). The system processes a curated dataset of 1,359 drone images from Kaggle, programmatically labeled with OpenAI's GPT-4o via its API. The labeling process employed a novel classification scheme proposed by leading LLMs, including Google's Gemini-1.5-Pro-002. The annotated dataset was divided into training, validation, and testing subsets, enabling fine-tuning of YOLOv7 for stronger detection performance. Initial results with pre-trained weights were suboptimal; fine-tuning, however, significantly improved generalization and accuracy across diverse scenarios. Drone videos were analyzed by extracting frames at 30 frames per second; detected drones were then described through the LLM pipeline, and the resulting descriptions were overlaid onto the videos to provide autonomous, real-time analysis of drone activity. Preliminary trials demonstrate high detection accuracy and computational efficiency, and ongoing experiments are evaluating alternative YOLO models to optimize results. By combining state-of-the-art object detection with LLM-driven descriptions, this research establishes a foundation for AI-driven drone monitoring systems with applications in airspace management, regulatory compliance, and security, and advances autonomous unmanned aerial vehicle (UAV) detection and situational-awareness technologies.
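The dataset-splitting step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state the split ratios, so the 70/20/10 train/validation/test proportions, the seed, and the file names below are assumptions.

```python
import random

def split_dataset(paths, train=0.7, val=0.2, seed=42):
    """Shuffle labeled image paths and split them into train/val/test
    subsets for YOLOv7 fine-tuning. Ratios are illustrative assumptions;
    the study only reports that three subsets were used."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = paths[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],                    # training subset
            shuffled[n_train:n_train + n_val],     # validation subset
            shuffled[n_train + n_val:])            # testing subset

# Hypothetical file names standing in for the 1,359 Kaggle drone images.
paths = [f"img_{i:04d}.jpg" for i in range(1359)]
train_set, val_set, test_set = split_dataset(paths)
```

With these assumed ratios the 1,359 images yield 951 training, 271 validation, and 137 test images; YOLOv7's data config would then point at the three resulting lists.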
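The overlay step, in which LLM-generated descriptions are drawn onto extracted video frames, could look roughly like the sketch below using OpenCV. The caption text, placement, and colors are assumptions; in the actual pipeline the caption would come from the YOLOv7 detections passed through the LLM.

```python
import numpy as np
import cv2

def overlay_caption(frame, caption):
    """Draw a drone description onto one video frame.
    `caption` stands in for the LLM-generated text (hypothetical here)."""
    annotated = frame.copy()
    cv2.putText(annotated, caption, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                0.8, (0, 255, 0), 2, cv2.LINE_AA)  # green text, top-left
    return annotated

# In the full pipeline this would run per frame: read frames with
# cv2.VideoCapture at the video's native rate (30 fps in this study),
# detect drones with YOLOv7, describe them via the LLM, then overlay
# and write the frame back out with cv2.VideoWriter.
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # synthetic black frame
out = overlay_caption(frame, "Quadcopter hovering at low altitude")
changed = bool((out != frame).any())
```

Returning a copy keeps the original frame untouched, so the same frame can be re-annotated if the LLM description is refreshed.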