More Signal, Less Noise: AIM-MASH AI Assist as a solution to the scoring variability problem in MASH clinical trials

The landscape of metabolic dysfunction-associated steatohepatitis (MASH) drug development reached a potential turning point in 2025. This was the year that PathAI’s AIM-MASH AI Assist* digital pathology tool achieved landmark regulatory recognition, securing both EMA qualification and FDA Drug Development Tool (DDT) qualification [1,2]. These regulatory milestones have brought the MASH field closer to a new era in which the integration of AI-powered scoring tools into clinical trials provides the standardized, reproducible scoring necessary for accurate trial enrollment and endpoint assessment.

To understand the significance of this shift, this blog will review the historical evidence and challenges of manual pathology that necessitated this innovation, as well as the potential advantages provided by AIM-MASH AI Assist.

MASH histology scoring challenges

While liver biopsy assessment remains the gold standard for both trial enrollment and endpoint evaluation, opportunities remain to improve MASH histology scoring in the clinical trial setting.

A notable challenge in the field is scoring variability – inter-reader and intra-reader – even among expert liver pathologists [3-5]. In the clinical trial setting, this variability can have repercussions:

Enrollment: MASH trial enrollment typically relies on a single central reader. Scoring subjectivity and inconsistency can result in the improper inclusion or exclusion of patients from a trial [3]. If patients who do not meet the MASH enrollment criteria are included, the trial population will be diluted, obscuring responder data [3,6]. Furthermore, re-evaluation of baseline biopsies for endpoint analysis frequently reveals that patients did not meet the initial enrollment criteria, leading to their exclusion from primary analysis populations [7].

Treatment effect: Phase 2 MASH trials often use single central readers or dual readers with a third pathologist for adjudication [7,8]. The United States FDA recommends that Phase 3 MASH trials utilize dual readers with a third pathologist for adjudication [9], while some sponsors incorporate consensus panels of three pathologists into their study designs [10]. Despite the movement toward pathologist panels for endpoint evaluation in late-stage trials, scoring variability persists. It has been hypothesized that this variability can artificially lower the observed effect size, making potentially effective drugs appear unsuccessful [6,7,11].

Trial planning and comparison: The lack of scoring reliability complicates the design of late-phase MASH trials. Specifically, scoring variability makes it harder to use early-phase trial data to appropriately power later-phase studies. Furthermore, this variability can affect response rates in both the placebo and treatment arms of MASH trials, complicating the comparison of effect sizes across therapeutic candidates.
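To make the hypothesized link between scoring variability and a lower observed effect size concrete, here is a toy calculation in Python. It is only a sketch under simplifying assumptions that do not come from any trial: the endpoint is a binary responder call, and readers misclassify responders and non-responders at the same rates in both arms. The response rates, sensitivity, and specificity values are hypothetical.

```python
def observed_rate(true_rate: float, sensitivity: float, specificity: float) -> float:
    """Expected responder rate once imperfect reading is applied.

    True responders are detected with probability `sensitivity`;
    true non-responders are wrongly called responders with
    probability `1 - specificity`.
    """
    return true_rate * sensitivity + (1.0 - true_rate) * (1.0 - specificity)


def observed_effect(p_treat: float, p_placebo: float,
                    sensitivity: float, specificity: float) -> float:
    """Observed difference in responder rates (treatment minus placebo)."""
    return (observed_rate(p_treat, sensitivity, specificity)
            - observed_rate(p_placebo, sensitivity, specificity))


if __name__ == "__main__":
    # Hypothetical true response rates, chosen for illustration only.
    true_treat, true_placebo = 0.40, 0.20
    for sens, spec in [(1.0, 1.0), (0.9, 0.9), (0.8, 0.8)]:
        placebo = observed_rate(true_placebo, sens, spec)
        effect = observed_effect(true_treat, true_placebo, sens, spec)
        print(f"sens={sens:.1f} spec={spec:.1f} "
              f"placebo rate={placebo:.2f} effect={effect:.2f}")
```

Under these assumptions the observed difference equals the true difference multiplied by (sensitivity + specificity - 1), so any reading error shrinks the apparent effect while also inflating the apparent placebo response rate.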

A Solution: AI-powered MASH scoring

AIM-MASH is an AI-powered tool that yields predicted MASH CRN grades and stages from input whole slide images of MASH biopsies [6,13]. This tool was developed using an extensive dataset of ~8,700 H&E-stained slides and ~7,600 Masson Trichrome-stained slides from six completed phase 2b and phase 3 MASH clinical trials [6]. Roughly 60 expert MASH pathologists contributed over 100,000 annotations of relevant histologic substances – including tissue features, fibrosis, and regions of artifact – for model training. From these model predictions, AIM-MASH yields overlays highlighting pixel-level predictions of relevant histologic substances on input whole slide images, as well as predicted MASH CRN grades and stages for each cardinal MASH feature (Figure 1).

[Figure 1]

Given the known issues with manual scoring variability [3-5], we evaluated the repeatability and reproducibility of AIM-MASH (Table 1). Ten sequential deployments of AIM-MASH on a set of whole-slide images (WSIs) revealed that the algorithm was 100% repeatable for each of the four cardinal MASH histology features [6]. We also assessed the repeatability of AIM-MASH on WSIs from the same glass slides scanned multiple times on different days, as well as the reproducibility of AIM-MASH on WSIs from the same glass slides scanned at three different laboratories [12]. For each of the four cardinal MASH features, repeatability was at least 92% (range: 92.6-96.3%) and reproducibility was at least 84% (range: 84.7-91.2%). Given that a single site typically serves as the central pathology laboratory (including slide scanning), the repeatability results are especially representative of the potential reliability of AIM-MASH in trial settings. Furthermore, these repeatability and reproducibility values are all much higher than the mean pairwise agreement between pathologists, which varies by feature but falls between 45% and 70%.

[Table 1]
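For readers less familiar with agreement statistics, the sketch below shows one simple way such numbers can be computed. It assumes exact-match percent agreement on ordinal scores; the statistics used in the cited studies may be defined differently. The function names and the example scores are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of slides on which two sets of scores match exactly."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_agreement(reads: Dict[str, Sequence[int]]) -> float:
    """Average exact-match agreement over every pair of readers."""
    pairs = list(combinations(reads.values(), 2))
    if not pairs:
        raise ValueError("need at least two readers")
    return mean([percent_agreement(a, b) for a, b in pairs])


if __name__ == "__main__":
    # Hypothetical ballooning grades (0-2) from three readers on five slides.
    manual_reads = {
        "reader_a": [0, 1, 2, 1, 0],
        "reader_b": [0, 1, 1, 1, 0],
        "reader_c": [1, 1, 2, 0, 0],
    }
    print("mean pairwise agreement:", mean_pairwise_agreement(manual_reads))

    # Hypothetical algorithm output on two scans of the same glass slides.
    scan_day_1 = [0, 1, 2, 1, 0]
    scan_day_2 = [0, 1, 2, 1, 1]
    print("scan-to-scan agreement:", percent_agreement(scan_day_1, scan_day_2))
```

The same `percent_agreement` helper covers both cases: comparing two readers on the same slides, or comparing one algorithm across repeated scans.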

Using AIM-MASH as an AI-Assist tool in clinical trials

The performance of AIM-MASH as a standalone tool was promising: it was non-inferior to manual pathologist scoring for steatosis and fibrosis, and superior to manual pathologist scoring for lobular inflammation and hepatocellular ballooning, the most variable of the CRN score components [5,6,10-13].

In prospective clinical trials, however, the intended use of AIM-MASH is to assist pathologists with scoring for trial enrollment and endpoint assessment. In these settings, AIM-MASH AI Assist can support pathologists in a manner that 1) reduces inter-pathologist variability and 2) anchors pathologists to a consistent, validated definition of the MASH cardinal histologic characteristics, especially for features for which interpretation can be subjective. In other words, a human in the loop is necessary for the AIM-MASH clinical trial scoring workflow, preserving the ability of clinicians to override algorithm outputs and assess the slides for any additional safety findings.

The EMA qualification included an AI-assisted workflow for AIM-MASH, shown in Figure 2 [12]. Primary pathologists were tasked with reviewing AIM-MASH-derived scores. If AI-assisted pathologists agreed with the model score, the score was finalized. Secondary review was necessary if the pathologist and model disagreed by 2+ points, and consensus calls were needed if disagreement persisted after secondary review. Using this workflow, fewer than 2% of all cases require secondary review, and fewer than 1% require consensus calls, preserving the high level of precision demonstrated by the algorithm, which is key for detecting true response signals and for comparisons across trials.

[Figure 2]
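The routing logic of the review workflow described above can be sketched in a few lines of Python. This is an illustration, not PathAI's implementation: the names `Route` and `next_step` are hypothetical, and the handling of 1-point disagreements (treated here as finalized without secondary review) is an assumption, since the text only specifies what happens on agreement and on disagreements of 2+ points.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    FINALIZED = "finalized"
    SECONDARY_REVIEW = "secondary_review"
    CONSENSUS = "consensus"


def next_step(model_score: int,
              primary_score: int,
              secondary_resolved: Optional[bool] = None,
              threshold: int = 2) -> Route:
    """Decide where a case goes next in the assisted-read workflow.

    secondary_resolved is None until a secondary review has happened,
    then True if it settled the disagreement and False if it did not.
    """
    if abs(model_score - primary_score) < threshold:
        # Agreement finalizes the score. ASSUMPTION: a 1-point difference
        # is also finalized without secondary review.
        return Route.FINALIZED
    if secondary_resolved is None:
        return Route.SECONDARY_REVIEW
    return Route.FINALIZED if secondary_resolved else Route.CONSENSUS


if __name__ == "__main__":
    print(next_step(3, 3))                            # agreement
    print(next_step(1, 3))                            # 2-point disagreement
    print(next_step(1, 3, secondary_resolved=False))  # disagreement persists
```

Keeping the outcome of secondary review as an explicit input avoids guessing how a persisting disagreement is determined, which the text does not spell out.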

Performance of AIM-MASH as an AI Assist tool

Using this workflow, we evaluated AIM-MASH as an AI assist tool in a large dataset (1,400 biopsies) from three MASH clinical trials, comparing AIM-MASH AI Assist to single unassisted pathologists and a 3-reader panel of unassisted pathologists [12].

AIM-MASH AI Assist vs. single unassisted pathologist: AIM-MASH-assisted pathologist agreement with consensus ground truth was non-inferior to the agreement of single unassisted pathologists across all four cardinal MASH features; AIM-MASH-assisted pathologists had greater agreement with ground truth than single unassisted pathologists for lobular inflammation and hepatocellular ballooning. Notably, the features for which AIM-MASH AI Assist achieved superiority are those that are the most susceptible to scoring variability, highlighting the potential of AIM-MASH to standardize scoring.

AIM-MASH AI Assist vs. panel of three unassisted pathologists: When assessing agreement with ground truth, the AIM-MASH-assisted workflow achieved non-inferiority compared to the median of three unassisted reads. In other words, AIM-MASH-assisted pathologists performed comparably to a multi-reader panel for the scoring of MASH CRN grades and stages, suggesting that an AI-assisted workflow could potentially replace these panels without sacrificing accuracy. Given that pathologist panels were introduced to MASH clinical trials to reduce scoring variability, the non-inferior performance of AIM-MASH AI Assist suggests that the algorithm may have the same impact in a trial setting.
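As a rough illustration of the panel comparison, the sketch below takes the median of three unassisted reads per slide and compares its agreement with ground truth against that of an assisted read. All scores are hypothetical, and the helper names are ours. A real non-inferiority analysis would compare a confidence bound on the difference against a pre-specified margin rather than looking at the point estimate alone.

```python
from statistics import median
from typing import List, Sequence, Tuple


def panel_median(reads_per_slide: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Median of three integer reads per slide (the middle value)."""
    return [int(median(reads)) for reads in reads_per_slide]


def agreement_with_truth(scores: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of slides whose score exactly matches ground truth."""
    if len(scores) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(s == t for s, t in zip(scores, truth)) / len(truth)


if __name__ == "__main__":
    # Hypothetical fibrosis stages (0-4) on four slides.
    panel_reads = [(1, 2, 2), (0, 1, 3), (3, 3, 4), (2, 1, 1)]
    assisted_reads = [2, 2, 3, 2]
    ground_truth = [2, 1, 3, 2]

    panel_scores = panel_median(panel_reads)
    panel_acc = agreement_with_truth(panel_scores, ground_truth)
    assisted_acc = agreement_with_truth(assisted_reads, ground_truth)
    print("panel median scores:", panel_scores)
    print("difference (assisted - panel):", assisted_acc - panel_acc)
```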


Summary and potential impact

The landmark regulatory qualifications achieved by PathAI’s AIM-MASH AI Assist solution [1,2] mark a significant advancement in MASH drug development. By integrating this AI-powered tool into MASH clinical trial workflows, the field may overcome the long-standing challenges of manual scoring variability, a factor that has historically complicated patient enrollment and obscured potential treatment effects.

The evidence in support of AIM-MASH AI Assist demonstrates that this tool is not only repeatable and reproducible, but also non-inferior to a labor-intensive, multi-reader consensus panel [5,12]. AIM-MASH AI Assist therefore provides a standardized, efficient, and accurate method for assessing MASH CRN grades and stages. Already, the potential impact of AIM-MASH AI Assist has been seen in the phase 2 WAYFIND trial, which enrolled patients with compensated cirrhosis – a population of patients for which no therapies are currently approved. Retrospective analysis of this trial, which did not meet its primary endpoint by manual scoring, demonstrated that it would have met this primary endpoint if the validated AIM-MASH AI Assist read workflow had been used [2,14,15].

Thus, there is great potential for utilizing AIM-MASH AI Assist in MASH clinical trials. Given the unprecedented nature of AI-based scoring in MASH clinical trials, continued collaboration between PathAI, trial sponsors, and regulatory agencies will be needed to most effectively incorporate AIM-MASH AI Assist into trial protocols. The shift toward an AI-powered scoring approach that keeps pathologists in the loop may indeed bring greater signal and less noise to trial efficacy data, thereby accelerating the development of effective therapies and moving the MASH community closer to the goal of delivering better outcomes for patients.

*AIM-MASH AI Assist is qualified as a tool in the EU and as a DDT in the US for use in MASH clinical trials. AIM-MASH AI Assist is not for use in diagnostic procedures.

References:

1. European Medicines Agency (EMA). Qualification Opinion: Artificial Intelligence-Based Measurement of Non-alcoholic Steatohepatitis Histology in Liver Biopsies to Determine Disease Activity in NASH/MASH Clinical Trials. EMA/CHMP; 2025.
2. FDA Qualifies First AI Drug Development Tool, Will Be Used in 'MASH' Clinical Trials.
3. Davison BA, Harrison SA, Cotter G, et al. Suboptimal reliability of liver biopsy evaluation has implications for randomized clinical trials. J Hepatol. 2020; 73(6):1322-1332.
4. Kleiner DE, Brunt EM, Wilson LA, et al. Association of histologic disease activity with progression of nonalcoholic fatty liver disease. JAMA Netw Open. 2019; 2(10):e1912565.
5. Juluri R, Vuppalanchi R, Olson F, et al. Generalizability of the NASH CRN Histological Scoring System for Nonalcoholic Fatty Liver Disease. J Clin Gastroenterol. 2011; 45(1):55-58.
6. Iyer JS, Juyal D, Le Q, Shanis Z, et al. AI-based automation of enrollment criteria and endpoint assessment in clinical trials in liver diseases. Nature Medicine. 2024; 30:2914-2923.
7. Harrison SA and Dubourg J. Liver biopsy evaluation in MASH drug development: Think thrice, act wise. J Hepatol. 2024; 81(5):886-894.
8. Chalasani NP, Sanyal AJ, Kowdley KV, et al. Pioglitazone versus vitamin E versus placebo for the treatment of non-diabetic patients with non-alcoholic steatohepatitis: PIVENS trial design. Contemp Clin Trials. 2009; 30(1):88-96.
9. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/technical-specifications-submitting-clinical-trial-data-sets-treatment-noncirrhotic-nonalcoholic
10. Sanyal AJ, Loomba R, Anstee QM, et al. Utility of pathologist panels for achieving consensus in NASH histologic scoring in clinical trials: data from a phase 3 study. Hepatol Commun. 2023; 8(1):e0325.
11. Shah A, MacConell L, Liberman A, et al. Challenges in histological endpoints for MASH therapies: an exercise in statistical modelling. Aliment Pharmacol Ther. 2025; 61(9):1489-1499.
12. Pulaski H, Harrison SA, Mehta SS, Sanyal AJ, Vitali MC, Manigat LC, et al. Clinical validation of an AI-based pathology tool for scoring of metabolic dysfunction-associated steatohepatitis. Nature Medicine. 2025; 31:315-322. doi:10.1038/s41591-024-03301-2.
13. Pai R, Jairath V, Hogan M, et al. Reliability of histologic assessment for NAFLD and development of an expanded NAFLD activity score. Hepatology. 2022; 76(4):1150-1163.
14. Alkhouri N, et al. A Randomized, Placebo-Controlled, Phase 2 Study of the Safety and Efficacy of Combination Treatment with Semaglutide, Cilofexor and Firsocostat in Patients With Compensated Cirrhosis Due to Metabolic Dysfunction-Associated Steatohepatitis (WAYFIND). Hepatology. 2025; 82(S1):S131-S133.
15. A New Path Forward: How PathAI's FDA-Qualified AIM-MASH AI Assist Can Redefine MASH Clinical Trials.