In silico Modeling and Large-Scale Screening of PFASs Oral Toxicity against Rodents
Melanie Rios
Co-Presenters: Individual Presentation
College: Hennings College of Science, Mathematics and Technology
Major: BA.BIOLOGY
Faculty Research Mentor: Kar, Supratik
Abstract:
Per- and polyfluoroalkyl substances (PFAS) constitute a large and structurally diverse chemical class for which mammalian acute oral toxicity data remain limited, hindering effective risk assessment and prioritization. This study develops transparent, reproducible quantitative Read-Across Structure-Activity Relationship (q-RASAR) models for predicting rat and mouse acute oral toxicity (pLD50) from integrated structural and similarity-derived descriptors. Following endpoint harmonization and 1D/2D descriptor generation, Best Subset Selection and genetic-algorithm-driven feature reduction were applied to derive compact, interpretable multilinear regression models. The final rat q-RASAR model combines three mechanistically relevant structural descriptors with a similarity concordance metric, achieving high goodness-of-fit and external predictivity (R² = 0.92; Q² = 0.90). The mouse model relies on four kernel similarity descriptors that capture neighborhood-driven toxicity patterns and shows similarly strong performance (R² = 0.93; Q² = 0.90). Robustness was confirmed through applicability-domain (AD) evaluation and Y-randomization. Both models were then applied to more than 2,500 PFAS from the NORMAN SusDat database to fill critical toxicity data gaps. These screening predictions showed that highly toxic candidates were enriched in heteroatom-containing, functionalized, or long-chain PFAS structures, whereas lower-toxicity chemicals were dominated by simpler aliphatic perfluoroalkyl ethers and esters. Species-specific toxicity patterns were evident, with the rat model exhibiting sharper structural discrimination and the mouse model reflecting stronger neighborhood-based effects. These results provide a scalable, regulatory-aligned framework for screening, triage, and prioritization of PFAS, delivering ranked hazard indicators supported by AD-based reliability metrics. The models offer practical tools for data-gap filling while highlighting the need for expanded experimental datasets to further strengthen regulatory confidence.
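To make the idea of a similarity-derived (read-across) descriptor concrete, the following minimal sketch computes a similarity-weighted mean pLD50 from a query compound's nearest training neighbors. It is a generic illustration, not the descriptors or software used in this study: the function names `tanimoto` and `read_across`, the set-based "fingerprints", the choice of k = 3, and all pLD50 values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def read_across(query, training, k=3):
    """Similarity-weighted mean response of the k most similar training compounds.

    `training` is a list of (feature_set, pLD50) pairs.
    Returns (weighted_mean_pLD50, mean_similarity_to_neighbors).
    """
    # Score every training compound against the query, keep the top k.
    scored = sorted(
        ((tanimoto(query, fp), y) for fp, y in training), reverse=True
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        # No structural overlap at all: no read-across estimate is possible.
        return None, 0.0
    weighted = sum(s * y for s, y in scored) / total
    return weighted, total / len(scored)


# Toy training set: made-up feature sets and made-up pLD50 values.
training = [
    ({"CF2", "CF3", "COOH"}, 2.5),
    ({"CF2", "CF3", "SO3H"}, 3.0),
    ({"CF2", "CF3", "ether"}, 1.5),
    ({"CF3", "ester"}, 1.0),
]
query = {"CF2", "CF3", "COOH", "ether"}

pred, mean_sim = read_across(query, training, k=3)
print(f"read-across pLD50 estimate: {pred:.3f}, mean similarity: {mean_sim:.3f}")
```

In a q-RASAR-style workflow, quantities like these would be fed into the regression alongside conventional structural descriptors. A low mean similarity to the nearest neighbors could also serve as a rough signal that a compound lies near the edge of the applicability domain, although the AD method used in the study is not specified here.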