A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI

Mengtian Lin

Co-Presenters: Guohao Yang, Dan Li, Christopher Paredes

College: The Dorothy and George Hennings College of Science, Mathematics and Technology

Major: Computer Science

Faculty Research Mentor: Yulia Kumar

Abstract:

This study is devoted to testing Artificial Intelligence Linguistic Systems (AILS) like ChatGPT. It builds upon the original testFAILS benchmarking framework [1] by introducing testFAILS-2, a more comprehensive evaluation method for assessing the real-world performance and practical value of AILS [2]. A team of researchers tested more than 10 leading AI models across multiple domains, broadening the evaluation scope from traditional areas, such as multilingual text generation, pair programming, and security, to emerging fields like multimodal machine learning and emotional AI. This broader approach provides deeper insights into each model’s strengths and weaknesses under real-world conditions.

Empirical results showed that ChatGPT-4o mini reached the highest user productivity and satisfaction score (85.37%), suggesting that certain models can effectively enhance efficiency and user experience in multilingual collaboration and automated code generation. A bot-based application trial further demonstrated these models’ adaptability and safety in educational and corporate environments.

Future work will focus on expanding the dataset and exploring diverse industry use cases, while also reinforcing bias detection and security measures. By doing so, the study aims to offer a more systematic foundation for the development, deployment, and governance of AI systems, guiding progress toward AGI.

[1] Kumar Y, Morreale P, Sorial P, Delgado J, Li JJ, Martins P. A Testing Framework for AI Linguistic Systems (testFAILS). Electronics. 2023; 12(14):3095. https://doi.org/10.3390/electronics12143095

[2] Kumar Y, Lin M, Paredes C, Li D, Yang G, Kruger D, Li JJ, Morreale P. A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI. Electronics. 2024; 13(24):4991. https://doi.org/10.3390/electronics13244991
