Mechanistically Interpretable QSAR and Machine Learning Models for Predicting Ames TA98 + S9 Mutagenicity of Aromatic and Heteroaromatic Amines

Lihui Xin

Apr 15

Co-Presenters: Individual Presentation

College: Hennings College of Science Mathematics and Technology

Major: BA.BIOLOGY

Faculty Research Mentor: Kar, Supratik

Abstract:

Assessing chemical mutagenicity remains a major bottleneck in pharmaceutical discovery and environmental risk assessment because experimental testing is slow, costly, and dependent on biological materials. This study focuses on the Ames TA98 + S9 endpoint (Salmonella typhimurium TA98 with metabolic activation), a critical assay for identifying frameshift mutagens among aromatic and heteroaromatic amines that require bioactivation. Using a curated dataset of 305 compounds enriched in aromatic/heteroaromatic amines and their derivatives, we developed mechanistically interpretable, regression-based QSAR and machine learning models to support screening, prioritization, and confidence-guided decision-making. The best-performing QSAR model was a multiple linear regression equation comprising nine descriptors that capture (i) N-O topological patterns, (ii) detour matrix connectivity, (iii) secondary/tertiary nitrogen edge distance relationships, and (iv) polarizability-weighted autocorrelations. These features provide interpretable links to metabolic activation potential and DNA interaction propensity. Internal validation showed good fit and robustness (R² = 0.75; Q²LOO = 0.73). The model remained predictive on the external test set (R² = 0.72; Q²F1/R²pred = 0.72) and satisfied the standard Golbraikh–Tropsha criteria, while an MAE-based diagnostic indicated scope to further reduce error (MAE95 = 0.60). The combined QSAR and ML models were subsequently applied to a large external library of heteroaromatic amines to fill mutagenicity data gaps, prioritize candidates for confirmatory testing, and support safer chemical design and regulatory screening. The next step is deployment as an open-source application with applicability domain flags and batch prediction capabilities for community use.
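For readers unfamiliar with the validation statistics quoted above, the following is a minimal sketch of how R², leave-one-out Q², Q²F1, and an MAE95-style diagnostic are commonly computed for a multiple linear regression model. It is not the study's code: the data are synthetic, the 229/76 train/test split and all function names are assumptions made for illustration, and MAE95 is assumed here to mean the mean absolute error after discarding the 5% of test compounds with the largest absolute residuals.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for descriptor matrix X using fitted coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2(y, y_hat):
    """Coefficient of determination relative to the mean of y."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = predict_mlr(fit_mlr(X[mask], y[mask]), X[i:i + 1])[0]
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_f1(y_train, y_test, y_test_hat):
    """Q2F1: test-set residuals scaled by test deviations from the training mean."""
    ss_res = np.sum((y_test - y_test_hat) ** 2)
    return 1.0 - ss_res / np.sum((y_test - y_train.mean()) ** 2)

def mae95(y, y_hat):
    """Assumed definition: MAE after dropping the 5% largest absolute residuals."""
    abs_err = np.sort(np.abs(y - y_hat))
    keep = len(abs_err) - len(abs_err) // 20
    return abs_err[:keep].mean()

# Synthetic stand-in for a 305-compound, nine-descriptor dataset (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(305, 9))
y = X @ rng.normal(size=9) + rng.normal(scale=0.5, size=305)
X_tr, y_tr, X_te, y_te = X[:229], y[:229], X[229:], y[229:]  # assumed split

coef = fit_mlr(X_tr, y_tr)
y_te_hat = predict_mlr(coef, X_te)
print(f"R2 (train) = {r2(y_tr, predict_mlr(coef, X_tr)):.2f}")
print(f"Q2 LOO     = {q2_loo(X_tr, y_tr):.2f}")
print(f"Q2 F1      = {q2_f1(y_tr, y_te, y_te_hat):.2f}")
print(f"MAE95      = {mae95(y_te, y_te_hat):.2f}")
```

Because leave-one-out residuals for least squares are never smaller in magnitude than the fitted residuals, Q²LOO cannot exceed the training R², which is consistent with the ordering of the internal validation values reported in the abstract.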

