Context
Precise determination of biomarker status is necessary for clinical trial enrollment and endpoint analyses, as well as for optimal treatment selection in real-world practice. However, variability may be introduced into this process when clinical specimens are processed by different laboratories and assessed by different pathologists. Machine learning tools have the potential to minimize these inconsistencies, although their use is not yet widespread.
Objective
To assess the applicability of machine learning to quality control of biomarker scoring in oncology, we developed and validated an automated machine learning model to serve as a quality control tool for monitoring the assessment of human epidermal growth factor receptor 2 (HER2).
Design
The model was trained on whole slide images from multiple sources to quantify HER2 expression and to measure immunohistochemistry stain intensity, tumor area, and the presence of artifacts or ductal carcinoma in situ across breast cancer phenotypes. The quality control tool was then deployed on a real-world cohort of HER2-stained breast cancer sample images collected from routine diagnostic practice to evaluate trends in HER2 testing quality indicators within and between pathology laboratories.
Results
Automated image analysis with this algorithm yielded consistent and reliable HER2 scoring. Deployment of the HER2 quality control tool across 3 clinical laboratories revealed interlaboratory variability in HER2 scoring and inconsistencies in data reporting.
Conclusions
These results support the future incorporation of quality control algorithms for real-time monitoring of clinical laboratories contributing to oncology clinical trials, as well as of real-world HER2 immunohistochemistry testing in local clinical laboratories and hospitals.
Analysis of histopathologic specimens is an integral step in the decision-making process in oncology, both in clinical trials and in real-world settings. Pathologists confirm disease diagnosis, assess disease grade and stage, perform biomarker scoring, evaluate surgical resection margins and disease recurrence, and, more recently in the neoadjuvant setting, determine whether a pathologic complete response to treatment has occurred.1
In oncology, advances in precision medicine have led to the development of molecularly targeted therapeutics for specific patient populations, whom pathologists must identify accurately with immunohistochemistry (IHC). Standardization of biomarker assays is therefore necessary to ensure consistency in patient enrollment and endpoint analysis. However, issues related to tissue processing (eg, fixative and fixation time) and intrinsic tumor characteristics (eg, tumor heterogeneity) may impact the quality of test results if not sufficiently accounted for. In addition, large, multicenter clinical trials often require multiple laboratories and participating pathologists, a setup that introduces potential variability at the preanalytical stage (eg, tissue processing and preparation), analytical stage (eg, staining protocol and conditions), and postanalytical stage (eg, pathologist scoring).1 Similar issues arise when quantitative image analysis is used to assess human epidermal growth factor receptor 2 (HER2) status,2,3 leading the College of American Pathologists to recommend that laboratories using this approach (1) ensure the reproducibility of results between batches, between operators, and between regions of interest; and (2) monitor the performance of their image analysis platform.4
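As a minimal sketch of how between-operator reproducibility might be quantified under such a monitoring program (this is illustrative only and not the method described in this study; the scores are hypothetical, and scikit-learn is assumed to be available), a weighted Cohen kappa can be computed over paired reads of the same slides:

```python
# Illustrative sketch: quantify interobserver reproducibility of ordinal
# HER2 IHC scores (0, 1+, 2+, 3+), one quality indicator a monitoring
# program might track. Hypothetical data; in practice, the two readers
# would be pathologists or repeated image-analysis runs on the same slides.
from sklearn.metrics import cohen_kappa_score

# HER2 IHC scores for the same 10 slides, encoded as integers 0-3.
reader_a = [0, 1, 1, 2, 3, 0, 2, 2, 1, 3]
reader_b = [0, 1, 2, 2, 3, 0, 2, 1, 1, 3]

# Quadratic weighting penalizes large disagreements (eg, 0 vs 3+) more
# than adjacent ones (eg, 1+ vs 2+), which suits an ordinal score.
kappa = cohen_kappa_score(reader_a, reader_b, weights="quadratic")
print(f"Weighted Cohen's kappa: {kappa:.2f}")
```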
To address these issues, recent efforts have been made to assess and standardize the quality of sampling processes for pathology quality control (QC), including the SPIRIT-PATH guideline.5,6 For example, proper identification of patients with HER2-expressing cancer, including HER2-low cases (scored 1+ by IHC, or 2+ by IHC without HER2 amplification), is necessary to identify patients who may respond to HER2-targeted therapies, and QC training can significantly improve interobserver reproducibility for HER2 assessment.7 Many examples of the need for QC of biomarker assessment in oncology involve HER2, a well-described prognostic biomarker in many cancer subtypes.8–10 HER2 status determination is routinely conducted on patient samples using IHC to assess eligibility for anti-HER2–targeted therapies. Manual scoring is the current standard for interpreting HER2 IHC status; however, this method is subjective, which can lead to intraobserver and interobserver variability.7,11–14 After the first guideline for HER2 testing in breast cancer was published in 2007, a retesting analysis revealed discrepancies in the degree to which laboratories adhered to the recommendations regarding tissue fixation.15 Despite guideline updates published in 2013,16 2018,17 and 2023,18 standardization of preanalytical variables, including fixation techniques, is still lacking.19
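To make the scoring categories concrete, the sketch below encodes the classification rule summarized above, in which HER2-low is defined as IHC 1+, or IHC 2+ without gene amplification. The function is a hypothetical illustration, not part of the model developed in this study:

```python
from typing import Optional

def her2_category(ihc_score: int, ish_amplified: Optional[bool] = None) -> str:
    """Classify HER2 status from an IHC score (0-3) and, for equivocal
    2+ cases, an in situ hybridization (ISH) amplification result.

    Encodes the categories described in the text: HER2-low is IHC 1+,
    or IHC 2+ without HER2 gene amplification.
    """
    if ihc_score == 3:
        return "HER2-positive"
    if ihc_score == 2:
        if ish_amplified is None:
            raise ValueError("IHC 2+ is equivocal; an ISH result is required")
        return "HER2-positive" if ish_amplified else "HER2-low"
    if ihc_score == 1:
        return "HER2-low"
    if ihc_score == 0:
        return "HER2-negative"
    raise ValueError("IHC score must be 0, 1, 2, or 3")

assert her2_category(1) == "HER2-low"
assert her2_category(2, ish_amplified=False) == "HER2-low"
assert her2_category(2, ish_amplified=True) == "HER2-positive"
```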
Recent reports have further demonstrated both the difficulty and the importance of precise and accurate HER2 status determination. Of those breast cancers classified as HER2-negative, 60% actually express low levels of the protein (“HER2-low” cancers), yet in several studies these patients have not derived clinical benefit from conventional HER2-targeted therapies.20–23 Although concordance among pathologists is good at higher levels of HER2 expression (ie, HER2-positive cases), concordance at lower levels of expression, while acceptable, remains suboptimal.11 These data reveal an opportunity for computer vision approaches to aid in the QC of HER2 scoring.
Machine learning (ML) approaches have shown promise in the automated analysis of IHC images.12,24,25 We hypothesized that ML algorithms can support QC of the pathology workflow. In this study, we describe the development and validation of an automated ML model for HER2 testing QC and monitoring. We further describe how ML models could be adopted by real-world laboratories to reproducibly and rapidly monitor both the quality of HER2 scoring in breast cancer and inconsistencies in pathology scoring practices.
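As one sketch of the kind of interlaboratory monitoring this enables (illustrative only, with hypothetical counts and SciPy assumed; the study's actual QC tool is described in the sections that follow), a simple check compares each laboratory's distribution of reported HER2 score categories and flags significant divergence for review:

```python
# Illustrative sketch of one interlaboratory QC check: compare the
# distribution of HER2 IHC score categories reported by each laboratory
# and flag significant divergence for manual review. Counts are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: laboratories; columns: counts of cases scored 0, 1+, 2+, 3+.
score_counts = np.array([
    [120, 80, 40, 30],  # Lab A
    [115, 85, 45, 25],  # Lab B
    [ 60, 50, 95, 65],  # Lab C: shifted toward higher scores
])

chi2, p_value, dof, _expected = chi2_contingency(score_counts)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")

# In a deployed tool, a small p-value would trigger review of staining
# protocols and scoring practices, not an automatic conclusion about error.
if p_value < 0.01:
    print("Score distributions differ across laboratories; flag for review.")
```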
Authors
- Glass et al.