Dear Sir:
The optimal timing for initiating a direct oral anticoagulant (DOAC) in patients with stroke and atrial fibrillation (AF) remains uncertain, largely due to the varying risk of hemorrhagic transformation. Previous studies suggest that the risk of hemorrhagic transformation is related to stroke severity, as measured by the National Institutes of Health Stroke Scale (NIHSS) score [1,2]. However, the NIHSS score reflects not only the size of the infarct but also its location, potentially leading to discrepancies between the score and the actual infarct volume. Moreover, hemorrhagic transformation after ischemic stroke is closely related to infarct size [3]. Based on these findings, the Early versus Late Initiation of Direct Oral Anticoagulants in Post-ischemic Stroke Patients with Atrial Fibrillation (ELAN) trial stratified stroke size into minor, moderate, and major categories based on imaging, showing that early initiation of DOACs is likely safe and may reduce the risk of recurrent ischemic events [4].
In real-world practice, the lack of certified stroke centers and of consistent access to in-person neurological consultation poses significant challenges [5]. Consequently, many institutions struggle to maintain sufficient expertise to reliably classify stroke size based on imaging criteria. Furthermore, imaging-based risk stratification in large-scale clinical trials is often limited by the requirement for precise stroke-volume assessment and the availability of multiple neurologists. To address these needs, we developed a deep learning algorithm that automatically classifies stroke size according to the ELAN imaging criteria, using diffusion-weighted imaging (DWI) data of stroke patients with AF.
The algorithm was trained on 1,091 DWI scans of ischemic stroke attributable to AF, collected from four hospitals between 2011 and 2021. An external validation dataset comprising 1,265 DWI scans was collected from 11 non-overlapping hospitals between 2017 and 2020 (Supplementary Methods and Supplementary Figure 1). The institutional review boards of all centers approved the study, and written informed consent was obtained.
For the training/internal validation dataset, stroke size classification was determined by an experienced vascular neurologist (WSR) using a standardized visual rating scheme from the ELAN trial (Supplementary Methods) [4]. For the external validation dataset, stroke size classification was independently determined by two vascular neurologists (WSR and HK) using the same criteria. In cases of disagreement, a consensus was reached and used as the ground truth for external validation. Stroke locations in the external validation dataset were classified into supratentorial, infratentorial, and mixed lesions (Supplementary Methods).
Infarct lesions on DWIs were automatically segmented using a validated 3D U-Net algorithm (JLK-DWI; JLK Inc., Seoul, Korea) [6]. For the classification, we employed an EfficientNet3D. The model was modified to accept two-channel inputs (DWI and segmentation mask) and output three classes representing stroke size classification. Additional details for model development are provided in Supplementary Methods and Supplementary Figure 2.
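The two-channel input and three-class output described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-list volume layout, the helper names (`shape`, `stack_channels`, `decode`), and the class order in `CLASSES` are assumptions made for illustration, and the network itself is omitted.

```python
import math

# Assumed class order; the source does not state how output indices map to categories.
CLASSES = ("minor", "moderate", "major")

def shape(volume):
    """Return the dimensions of a regular nested list, e.g. (D, H, W)."""
    dims = []
    while isinstance(volume, list):
        dims.append(len(volume))
        volume = volume[0] if volume else None
    return tuple(dims)

def stack_channels(dwi, mask):
    """Build a channel-first input (2, D, H, W): channel 0 = DWI, channel 1 = lesion mask."""
    if shape(dwi) != shape(mask):
        raise ValueError("DWI volume and segmentation mask must have the same shape")
    return [dwi, mask]

def decode(logits):
    """Turn three raw output scores into (predicted category, softmax probabilities)."""
    if len(logits) != len(CLASSES):
        raise ValueError("expected one score per stroke size category")
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Toy 1 x 2 x 2 volume and a matching binary lesion mask (made-up values).
dwi = [[[0.1, 0.8], [0.3, 0.9]]]
mask = [[[0, 1], [0, 1]]]
x = stack_channels(dwi, mask)

# Made-up output scores standing in for the classifier's three outputs.
label, probs = decode([0.2, 1.5, -0.3])
```

Passing the mask as a second channel gives the classifier the lesion location explicitly alongside the raw signal, which is one plausible reason for the two-channel design.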
The algorithm’s performance was compared with expert consensus using percentage agreement, Cohen’s kappa, and area under the receiver operating characteristic curve (AUC). Inter-rater percentage agreement and Cohen’s kappa were also calculated, comparing classifications made by two vascular neurologists. Details for statistical analysis are provided in Supplementary Methods.
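For readers less familiar with these agreement measures, the following sketch computes percentage agreement and Cohen's kappa for two sets of categorical ratings. It assumes the unweighted form of kappa, which the text does not specify, and the ratings shown are made-up examples rather than study data.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of cases on which two raters assign the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return (p_o - p_e) / (1 - p_e)

# Made-up ratings for six scans.
rater_1 = ["minor", "minor", "moderate", "major", "major", "minor"]
rater_2 = ["minor", "moderate", "moderate", "major", "major", "minor"]

agreement = percent_agreement(rater_1, rater_2)  # 5 of 6 cases match
kappa = cohens_kappa(rater_1, rater_2)           # (5/6 - 1/3) / (1 - 1/3) = 0.75
```

Kappa is lower than raw agreement because it discounts the matches expected by chance given each rater's category frequencies.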
The mean (SD) ages for the training/internal validation and external validation datasets were 73.6 (10.3) years and 75.2 (10.2) years, respectively, with 54.4% and 53.0% of participants being male (Supplementary Tables 1 and 2). In the external validation dataset, the percentage agreement and Cohen’s kappa between the deep learning algorithm and the consensus of two vascular neurologists were 87.4% (95% confidence interval [CI], 85.4-89.2) and 0.81 (95% CI, 0.78-0.84), respectively, with comparable performance in the training/internal validation dataset. The algorithm also showed high accuracy for each stroke size category (Table 1). The AUC values for classifying minor, moderate, and major stroke categories in the external validation dataset were 0.988, 0.955, and 0.988, respectively, with similar performance in the training/internal validation dataset (Figure 1). By comparison, the percentage agreement between the two vascular neurologists was 74.6%, with a Cohen’s kappa of 0.62 (Supplementary Table 3).
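A per-category AUC such as those reported above can be obtained by treating one category as positive and the other two as negative. The sketch below assumes this one-vs-rest approach, which the text does not state explicitly, and uses the rank-based (Mann-Whitney) definition of AUC. The function name and the example values are hypothetical.

```python
def auc_one_vs_rest(labels, scores, positive):
    """AUC for one category against all others.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case, with ties counted as 0.5.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == positive]
    neg = [s for lab, s in zip(labels, scores) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up reference labels and predicted probabilities of the "major" category.
labels = ["major", "minor", "major", "moderate"]
p_major = [0.9, 0.2, 0.4, 0.4]

auc_major = auc_one_vs_rest(labels, p_major, positive="major")  # 3.5 / 4 = 0.875
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; a rank-sum formulation would be preferable for large datasets.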
After stratifying by stroke location, the deep learning algorithm showed high agreement with stroke experts for supratentorial and infratentorial lesions, achieving Cohen’s kappa values of 0.82 (95% CI, 0.79-0.85) and 0.85 (95% CI, 0.76-0.93), respectively (Table 2). For mixed lesions, agreement was lower with a kappa of 0.61 (95% CI, 0.49-0.74). The mean infarct volume also varied among stroke size classifications within each lesion location category (Supplementary Figure 3 and Supplementary Table 4).
In patients who underwent DWI within 24 hours of symptom onset, Cohen’s kappa was 0.81 (95% CI, 0.78-0.88). When the window was extended to 48 hours, the model exhibited similar performance, with Cohen’s kappa of 0.81 (95% CI, 0.78-0.87) (Supplementary Table 5).
Additionally, stroke size predicted by the algorithm was significantly associated with the frequency of symptomatic hemorrhagic transformation (Supplementary Figure 4). The mean processing time from raw image to output in a graphics processing unit (GPU) environment was 5.188 seconds (SD, 0.654) across 100 randomly selected DWI scans.
In this study, we developed and validated a deep learning algorithm to classify stroke size in AF-related stroke using multicenter and multivendor datasets, achieving excellent agreement with stroke experts. To our knowledge, this is the first study to develop a deep learning model that automatically classifies stroke size for severity prediction based on DWI.
Several observational studies have established that the risk of hemorrhagic transformation in AF-related stroke is closely related to infarct size, supporting neuroimaging-based risk stratification to minimize intracranial hemorrhage [7,8]. In addition to the ELAN trial, a recent meta-analysis suggested that early DOAC initiation may reduce recurrent ischemic stroke risk by 36% without increasing intracranial hemorrhage [9]. Our model effectively classified minor, moderate, and major cases separately and showed even higher agreement when categorizing patients into non-major versus major cases. These findings suggest our model could help guide DOAC initiation timing, particularly for physicians with less experience.
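The rise in agreement under the non-major versus major grouping follows from how the categories collapse: disagreements between minor and moderate disappear once both are mapped to non-major. A small sketch with made-up ratings illustrates the effect; the function names are hypothetical.

```python
def dichotomize(labels):
    """Collapse the three stroke size categories into non-major versus major."""
    return ["major" if lab == "major" else "non-major" for lab in labels]

def percent_agreement(a, b):
    """Fraction of cases on which two ratings assign the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Made-up example: the only disagreement is minor versus moderate.
reference = ["minor", "moderate", "major", "minor"]
predicted = ["moderate", "moderate", "major", "minor"]

three_class = percent_agreement(reference, predicted)                         # 0.75
two_class = percent_agreement(dichotomize(reference), dichotomize(predicted)) # 1.0
```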
Furthermore, the algorithm’s mean processing time from raw DWI input to stroke size classification was approximately 5 seconds (Supplementary Discussion). This rapid processing could facilitate large-scale studies, enabling further research on infarct volume, DOAC initiation, and the risk of intracranial hemorrhage.
In conclusion, this algorithm has the potential to assist less experienced physicians in optimizing DOAC initiation timing and supports the use of large neuroimaging datasets in future research.