The landscape of metabolic dysfunction-associated steatohepatitis (MASH) drug development reached a potential turning point in 2025. That year, PathAI’s AIM-MASH AI Assist* digital pathology tool achieved landmark regulatory recognition, securing both EMA qualification and FDA Drug Development Tool (DDT) qualification [1,2]. These regulatory milestones have brought the MASH field closer to a new era in which the integration of AI-powered scoring tools into clinical trials provides the standardized, reproducible scoring necessary for accurate trial enrollment and endpoint assessment.
To understand the significance of this shift, this blog will review the historical evidence and challenges of manual pathology that necessitated this innovation, as well as the potential advantages provided by AIM-MASH AI Assist.
While liver biopsy assessment remains the gold standard for both trial enrollment and endpoint evaluation, opportunities remain to improve MASH histology scoring in the clinical trial setting.
A notable challenge in the field is scoring variability – inter-reader and intra-reader – even among expert liver pathologists [3-5]. In the clinical trial setting, this variability can have repercussions:
Enrollment: MASH trial enrollment typically relies on a single central reader. Scoring subjectivity and inconsistency can result in the improper inclusion or exclusion of patients from a trial [3]. If patients who do not meet the MASH enrollment criteria are included, the trial population will be diluted, obscuring responder data [3,6]. Furthermore, re-evaluation of baseline biopsies for endpoint analysis frequently reveals that patients did not meet the initial enrollment criteria, leading to their exclusion from primary analysis populations [7].
Treatment effect: Phase 2 MASH trials often use single central readers or dual readers with a third pathologist for adjudication [7,8]. The United States FDA recommends that Phase 3 MASH trials utilize dual readers with a third pathologist for adjudication [9], while some sponsors incorporate consensus panels of three pathologists into their study designs [10]. Despite the movement toward pathologist panels for endpoint evaluation in late-stage trials, scoring variability persists. It has been hypothesized that this variability can artificially lower the observed effect size, making potentially effective drugs appear unsuccessful [6,7,11].
Trial planning and comparison: The lack of scoring reliability complicates investigators’ ability to design late-phase trials in MASH. Specifically, scoring variability makes it harder to use early-phase trial data to appropriately power later-phase studies. Furthermore, this variability can affect response rates in both the placebo and treatment arms of MASH trials, complicating the comparison of effect sizes across therapeutic candidates.
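To make the attenuation argument concrete, here is a minimal Python sketch under a deliberately simple assumption: each biopsy's binary responder call is flipped with the same probability in both arms (nondifferential misclassification). The function names, the 40% and 20% response rates, and the 10% misread rate are all hypothetical illustrations and are not drawn from any trial.

```python
def observed_rate(true_rate: float, misread: float) -> float:
    """Response rate seen when each binary read is flipped with probability `misread`.

    Assumption (not from the source): misreads are symmetric and independent
    of treatment arm.
    """
    # True responders read correctly, plus true non-responders misread as responders.
    return true_rate * (1 - misread) + (1 - true_rate) * misread


def observed_effect(p_treatment: float, p_placebo: float, misread: float) -> float:
    """Observed difference in response rates between the two arms."""
    return observed_rate(p_treatment, misread) - observed_rate(p_placebo, misread)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    p_treatment, p_placebo, misread = 0.40, 0.20, 0.10
    print("true effect:", round(p_treatment - p_placebo, 2))
    print("observed placebo rate:", round(observed_rate(p_placebo, misread), 2))
    print("observed effect:", round(observed_effect(p_treatment, p_placebo, misread), 2))
```

Under this model the observed difference equals the true difference multiplied by (1 - 2 × misread rate), so a 20-point true effect shrinks to 16 points at a 10% misread rate, while the observed placebo rate rises from 20% to 26%. Real scoring variability is unlikely to be this tidy, but the direction of the effect is the same one hypothesized above.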
AIM-MASH is an AI-powered tool that yields predicted MASH CRN grades and stages from input whole slide images of MASH biopsies [6,13]. This tool was developed using an extensive dataset of ~8,700 H&E-stained slides and ~7,600 Masson Trichrome-stained slides from six completed phase 2b and phase 3 MASH clinical trials [6]. Roughly 60 expert MASH pathologists contributed over 100,000 annotations of relevant histologic substances – including tissue features, fibrosis, and regions of artifact – for model training. From these model predictions, AIM-MASH yields overlays highlighting pixel-level predictions of relevant histologic substances on input whole slide images, as well as predicted MASH CRN grades and stages for each cardinal MASH feature (Figure 1).
Given the known issues with manual scoring variability [3-5], we evaluated the repeatability and reproducibility of AIM-MASH (Table 1). Ten sequential deployments of AIM-MASH on a set of whole slide images (WSIs) revealed that the algorithm was 100% repeatable for each of the four cardinal MASH histology features [6]. We also assessed the repeatability of AIM-MASH on WSIs from the same glass slides scanned multiple times on different days, as well as the reproducibility of AIM-MASH on WSIs from the same glass slides scanned at three different laboratories [12]. For each of the four cardinal MASH features, repeatability was at least 92% (range: 92.6-96.3%) and reproducibility was at least 84% (range: 84.7-91.2%). Given that a single site is typically used as the central pathology laboratory (including slide scanning), the repeatability results are especially representative of the potential reliability of AIM-MASH in trial settings. Furthermore, these repeatability and reproducibility values are all much higher than the mean pairwise agreement between pathologists, which varies by feature but falls between 45% and 70%.
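For readers less familiar with agreement statistics, the sketch below shows one common way to compute mean pairwise agreement: the fraction of slides on which two reads give the same score, averaged over every pair of reads. It is an illustration only. The exact metric definitions used in the cited studies are not given here, and the function names and example scores are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_agreement(scores_a: Sequence[int], scores_b: Sequence[int]) -> float:
    """Fraction of slides on which two reads assign the same ordinal score."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("both reads must cover the same, non-empty set of slides")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def mean_pairwise_agreement(reads: Sequence[Sequence[int]]) -> float:
    """Average exact agreement over every pair of reads (readers, scans, or sites)."""
    pairs = list(combinations(reads, 2))
    if not pairs:
        raise ValueError("at least two reads are required")
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores for four slides from three readers.
    reads = [
        [1, 2, 3, 0],
        [1, 2, 2, 0],
        [1, 1, 3, 0],
    ]
    print(round(mean_pairwise_agreement(reads), 3))
```

The same calculation applies whether the "reads" come from different pathologists, from repeat scans of the same slide, or from different scanning sites, which is why it offers a convenient common scale for comparing human and algorithmic consistency.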
The performance of AIM-MASH as a standalone tool was promising: it was non-inferior to manual pathologist scoring for steatosis and fibrosis, and superior to manual pathologist scoring for lobular inflammation and hepatocellular ballooning, the most variable of the CRN score components [5,6,10-13].
In prospective clinical trials, however, the intended use of AIM-MASH is to assist pathologists with scoring for trial enrollment and endpoint assessment. In these settings, AIM-MASH AI Assist can support pathologists in a manner that 1) reduces inter-pathologist variability and 2) anchors pathologists to a consistent, validated definition of the MASH cardinal histologic characteristics, especially for features for which interpretation can be subjective. In other words, a human in the loop is necessary for the AIM-MASH clinical trial scoring workflow, preserving the function of clinicians to override algorithm outputs and assess the slides for any additional safety findings.
The EMA qualification included an AI-assisted workflow for AIM-MASH, shown in Figure 2 [12]. Primary pathologists were tasked with reviewing AIM-MASH-derived scores. If the AI-assisted pathologist agreed with the model score, the score was finalized. Secondary review was necessary if the pathologist and model disagreed by 2+ points, and consensus calls were needed if disagreement persisted after secondary review. Using this workflow, fewer than 2% of all cases require secondary review, and fewer than 1% require consensus calls, preserving the high level of precision demonstrated by the algorithm, which is key for detecting true response signals and for comparisons across trials.
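The routing logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the qualified workflow itself: the names `Step` and `next_step` are hypothetical, and the handling of a 1-point disagreement (finalizing the primary pathologist's score without escalation) is an assumption, since the text above only specifies what happens on agreement and on disagreements of 2+ points.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    FINALIZE = "finalize"
    SECONDARY_REVIEW = "secondary_review"
    CONSENSUS_CALL = "consensus_call"


def next_step(
    model_score: int,
    pathologist_score: int,
    secondary_resolved: Optional[bool] = None,
    threshold: int = 2,
) -> Step:
    """Route one case through the AI-assisted read.

    secondary_resolved is None before any secondary review has happened,
    True if that review settled the disagreement, and False if it persisted.
    """
    gap = abs(model_score - pathologist_score)
    if gap < threshold:
        # Agreement finalizes the score. Assumption: a 1-point gap is also
        # finalized without escalation (not stated in the source).
        return Step.FINALIZE
    if secondary_resolved is None:
        # Disagreement of 2+ points triggers secondary review.
        return Step.SECONDARY_REVIEW
    if secondary_resolved:
        return Step.FINALIZE
    # Disagreement persisted after secondary review.
    return Step.CONSENSUS_CALL


if __name__ == "__main__":
    print(next_step(3, 3))                            # agreement
    print(next_step(3, 1))                            # 2-point disagreement
    print(next_step(3, 1, secondary_resolved=False))  # still unresolved
```

Because escalation is triggered only by larger disagreements, most cases end at the first branch, which is consistent with the small share of cases reported to need secondary review or consensus calls.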
Using this workflow, we evaluated AIM-MASH as an AI assist tool in a large dataset (1,400 biopsies) from three MASH clinical trials, comparing AIM-MASH AI Assist to single unassisted pathologists and a 3-reader panel of unassisted pathologists [12].
The landmark regulatory qualifications achieved by PathAI’s AIM-MASH AI Assist solution [1,2] mark a significant advancement in MASH drug development. By integrating this AI-powered tool into MASH clinical trial workflows, the field may overcome the long-standing challenges of manual scoring variability, a factor that has historically complicated patient enrollment and obscured potential treatment effects.
The evidence in support of AIM-MASH AI Assist demonstrates that this tool is not only repeatable and reproducible, but also performs equivalently to a labor-intensive, multi-reader consensus panel [5,12]. Thus, AIM-MASH AI Assist provides a standardized, efficient, and accurate method for assessing MASH CRN grades and stages. Already, the potential impact of AIM-MASH AI Assist has been seen in the phase 2 WAYFIND trial, which enrolled patients with compensated cirrhosis – a population of patients for which no therapies are currently approved. Retrospective analysis of this trial, which did not meet its primary endpoint by manual scoring, demonstrated that it would have met this primary endpoint if the validated AIM-MASH AI Assist read workflow had been used [2,14,15].
Thus, there is great potential for utilizing AIM-MASH AI Assist in MASH clinical trials. Given the unprecedented nature of AI-based scoring in MASH clinical trials, continued collaboration between PathAI, trial sponsors, and regulatory agencies will be needed to most effectively incorporate AIM-MASH AI Assist into trial protocols. The shift toward an AI-powered scoring approach that keeps pathologists in the loop may indeed bring greater signal and less noise to trial efficacy data, thereby accelerating the development of effective therapies and moving the MASH community closer to the goal of delivering better outcomes for patients.
*AIM-MASH AI Assist is qualified as a tool in the EU and as a DDT in the US for use in MASH clinical trials. AIM-MASH AI Assist is not for use in diagnostic procedures.
References: