A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI
Mengtian Lin
Co-Presenters: Guohao Yang, Dan Li, Christopher Paredes
College: The Dorothy and George Hennings College of Science, Mathematics and Technology
Major: Computer Science
Faculty Research Mentor: Yulia Kumar
Abstract:
This study is devoted to testing Artificial Intelligence Linguistic Systems (AILS) like ChatGPT. It builds upon the original testFAILS benchmarking framework [1] by introducing testFAILS-2, a more comprehensive evaluation method for assessing the real-world performance and practical value of AILS [2]. A team of researchers tested more than 10 leading AI models across multiple domains, broadening the evaluation scope from traditional areas, such as multilingual text generation, pair programming, and security, to emerging fields like multimodal machine learning and emotional AI. This broader approach provides deeper insights into each model’s strengths and weaknesses under real-world conditions. Empirical results showed that ChatGPT-4o mini reached the highest user productivity and satisfaction score (85.37%), suggesting that certain models can effectively enhance efficiency and user experience in multilingual collaboration and automated code generation. A bot-based application trial further demonstrated these models’ adaptability and safety in educational and corporate environments. Future work will focus on expanding the dataset and exploring diverse industry use cases, while also reinforcing bias detection and security measures. By doing so, the study aims to offer a more systematic foundation for the development, deployment, and governance of AI systems, guiding progress toward AGI.
[1] Kumar Y, Morreale P, Sorial P, Delgado J, Li JJ, Martins P. A Testing Framework for AI Linguistic Systems (testFAILS). Electronics. 2023; 12(14):3095. https://doi.org/10.3390/electronics12143095
[2] Kumar Y, Lin M, Paredes C, Li D, Yang G, Kruger D, Li JJ, Morreale P. A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI. Electronics. 2024; 13(24):4991. https://doi.org/10.3390/electronics13244991