Deep Learning-Based Automatic Classification of Ischemic Stroke Subtype Using Diffusion-Weighted Images
Abstract
Background and Purpose
Accurate classification of ischemic stroke subtype is important for effective secondary prevention of stroke. We used diffusion-weighted image (DWI) and atrial fibrillation (AF) data to train a deep learning algorithm to classify stroke subtype.
Methods
Model development was performed using data from 2,998 patients with ischemic stroke from three centers, with U-net for infarct segmentation and EfficientNetV2 for subtype classification. Experienced neurologists (n=5) determined subtypes for the external test datasets and established a consensus for the clinical trial dataset. Automatically segmented infarcts were fed into the model (DWI-only algorithm). Subsequently, another model was trained, with AF included as a categorical variable (DWI+AF algorithm). These models were tested: (1) internally against the opinion of the labeling experts, (2) against fresh external DWI data, and (3) against a clinical trial dataset.
Results
In the training-and-validation datasets, the mean (±standard deviation) age was 68.0±12.5 years (61.1% male). In internal testing, compared with the experts, the DWI-only and the DWI+AF algorithms respectively achieved moderate (65.3%) and near-strong (79.1%) agreement. In external testing, both algorithms again showed good agreement (59.3%–60.7% and 73.7%–74.0%, respectively). In the clinical trial dataset, compared with the expert consensus, percentage agreements and Cohen’s kappa were respectively 58.1% and 0.34 for the DWI-only vs. 72.9% and 0.57 for the DWI+AF algorithms. The corresponding values between experts (76.0% and 0.61) were comparable to those of the DWI+AF algorithm.
Conclusion
Our model, trained on a large dataset of DWI (with or without AF information), was able to classify ischemic stroke subtypes with accuracy comparable to that of a consensus of stroke experts.
Introduction
Studies have shown that the volume [1] and pattern [2] of ischemic lesions on diffusion-weighted image (DWI) are associated with stroke subtype and predictive of post-stroke functional outcomes and future cerebrovascular events. Approximately a quarter of patients with ischemic stroke experience recurrence [3,4]. In a previous study of 7,101 patients with acute ischemic stroke, we observed that large artery atherosclerosis (LAA) and cardioembolic strokes were associated with an approximately 5-times higher risk of recurrence at 1 year, compared with small vessel occlusion (SVO) stroke [5]. Knowing the etiology of stroke is therefore critical to the correct implementation of future preventative strategies.
The Trial of Org10172 in Acute Stroke (TOAST) classification has been the most frequently employed method for etiologic stroke subtyping in clinical practice and research [6]. The original TOAST classification required clinical features and data from tests including brain imaging (computed tomography/magnetic resonance imaging [CT/MRI]), cardiac evaluation (electrocardiography [ECG], echocardiography, etc.), duplex imaging of extracranial arteries, arteriography, and laboratory assessments for a pro-thrombotic state [6]. Additional tests, such as Holter monitoring, implantable loop recorder, and high-resolution vessel wall MRI, have enabled more precise stroke subtyping [7]. However, these tests increase the cost and the length of hospital stay. Moreover, many countries lack adequate access to these advanced techniques. A diagnosis support system using initial or simple exams, such as DWI and ECG, to detect acute infarcts and atrial fibrillation (AF) could reduce costs [8,9] and assist clinicians who do not have access to other resources to determine stroke etiology.
To date, a few studies have developed automated systems for classifying stroke subtypes using deep learning algorithms and DWI [10,11]. However, no study has externally validated these algorithms, which is critically important given the low inter-rater reliability in the classification of stroke subtypes [12]. In the present multi-center study, we enrolled about 6,500 patients with acute ischemic stroke. Using 2,489 patients’ DWI data with and without information on the presence of AF, we developed a deep learning algorithm to classify stroke subtypes. We then externally validated the deep learning algorithm on a new set of 3,384 patients, using three temporally and regionally different datasets. In addition, we compared stroke subtype classifications by the deep learning algorithm versus neurovascular experts. Finally, we outlined practical applications of the deep learning-based stroke subtype classification for cardioembolism (CE) risk stratification based solely on initial DWI assessments, for use when AF information is not available or becomes available only after continuous ECG monitoring (for days to years) [13,14].
Methods
Participants
Dataset for training, validation, and internal testing
From May 2011 to March 2014, we consecutively enrolled 4,514 patients with acute ischemic stroke who were admitted to three university hospitals (Dongguk University Hospital, Seoul National University Bundang Hospital, and Dong-A University Hospital). We included a consecutive series of patients who were admitted within 7 days of onset, while excluding patients with the following: (1) unavailable or poor-quality DWI (n=342), (2) other causes of stroke (n=241), and (3) undetermined stroke subtype (n=933) (Supplementary Figure 1). The remaining 2,998 patients’ data were used for training, validation, and internal testing, using random sub-setting in a ratio of 7:2:1. The institutional review board of Dongguk University Hospital (IRB No. 2017-09-017) and each participating center approved the study protocol, and patients or their legal proxies provided written informed consent.
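The random 7:2:1 sub-setting described above can be sketched as follows. This is a minimal illustration only: the function name, the fixed seed, and the rule that the test subset receives the rounding remainder are assumptions, not details reported by the study.

```python
import random

def split_7_2_1(patient_ids, seed=0):
    """Randomly split IDs into training, validation, and internal-test subsets."""
    ids = list(patient_ids)
    rng = random.Random(seed)          # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    n_train = len(ids) * 7 // 10       # 70% (rounded down)
    n_val = len(ids) * 2 // 10         # 20% (rounded down)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # remainder, roughly 10%
    return train, val, test

# Integer IDs stand in for the 2,998 included patients.
train, val, test = split_7_2_1(range(2998))
print(len(train), len(val), len(test))
```

Splitting at the patient level, as here, keeps all images from one patient in a single subset.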
Datasets for external testing
External testing used new stroke imaging data from a total of 3,384 patients, comprising the following three datasets.
External test dataset 1
From May 2011 to March 2014, 2,787 patients with acute ischemic stroke who were admitted within 7 days of symptom onset were consecutively enrolled from Chonnam National University Hospital. After excluding 868 patients, 1,919 were finally included (Supplementary Figure 1).
External test dataset 2
From October 2021 to August 2022, 1,315 patients with acute ischemic stroke who were admitted within 7 days of symptom onset were consecutively enrolled from Chonnam National University Hospital. After excluding 491 patients, 824 were finally included (Supplementary Figure 1).
External test dataset 3
From March 2021 to April 2022, 931 patients with acute ischemic stroke who were admitted within 7 days of symptom onset were consecutively enrolled from Korea University Guro Hospital. After excluding 290 patients, 641 were finally included (Supplementary Figure 1).
Clinical trial dataset
A pivotal clinical trial was conducted to assess the efficacy of deep learning algorithms in comparison to a standard reference established through expert consensus, and to measure the level of agreement between the deep learning algorithm and the consensus as well as among the experts themselves. From March 2016 to May 2017, 1,701 patients who met the following inclusion criteria were enrolled from the two stroke centers (Dongguk University Hospital and Seoul National University Bundang Hospital): (1) age between 20 and 95 years, (2) patients with acute ischemic stroke who visited the hospitals within 7 days after symptom onset, and (3) patients who underwent DWI. According to the pre-planned exclusion criteria, we excluded 612 patients for the following reasons: inadequate or poor-quality DWI (n=148), other causes of stroke (n=114), undetermined causes of stroke (n=315), and CE stroke attributable to causes other than AF (n=35). Thus, data from 900 patients remained available for clinical testing.
Clinical data collection
Using a standardized protocol [15], we prospectively collected demographic data, prior medication history, and the presence of vascular risk factors including hypertension, diabetes mellitus, hyperlipidemia, coronary artery disease, AF, and smoking history.
Imaging acquisition and infarct segmentation
For the training data, brain MRIs were performed on 1.5 tesla (n=2,471) or 3.0 tesla (n=527) MRI systems. The DWI protocol was as follows: b-values of 0 and 1,000 s/mm², echo time 50–99 ms, repetition time 2,400–9,000 ms, voxel size 1×1×3–5 mm³, interslice gap of 0–2 mm, and slice thickness of 3–7 mm. Using a validated 3D U-net algorithm that we recently developed [16,17], we automatically segmented infarct lesions on DWIs.
Ischemic stroke subtype classification
For the datasets for training and validation, internal testing, and external test datasets 1–3, stroke subtypes were determined by experienced vascular neurologists at each hospital, using a validated MRI-based classification system built on the TOAST criteria (details provided in Supplementary Methods and Supplementary Figure 2) [7]. Briefly, the modified TOAST classification is composed of the following five steps: (1) consideration of other determined etiologies of stroke, (2) screening for SVO on DWI, (3) consideration of relevant artery stenosis or occlusion, (4) consideration of recanalization status after recanalization therapy, and (5) consideration of follow-up recanalization status without recanalization therapy. For the clinical trial dataset, stroke subtypes were determined through consensus among three experienced vascular neurologists (J-W Chung, J-S Lim, and D-E Kim).
Development of a deep learning algorithm for ischemic stroke subtype classification
Brain DWIs were preprocessed by (1) skull stripping using Gaussian blur and Otsu’s threshold [18], (2) applying N4 bias field correction using the SimpleITK library, and (3) performing image signal normalization. After the preprocessing, infarct areas on DWI were automatically segmented using the aforementioned validated 3D U-net algorithm (JLK-DWI, JLK Inc., Seoul, Korea) [16,17]. The segmented infarct masks from raw DWIs were stacked and condensed into three 2D X, Y, Z-axis images to ensure consistent data input regardless of the number of slices (Supplementary Figure 3). These condensed 2D X, Y, Z-axis images were resized to 256×256 pixels using bilinear interpolation. Thus, each training sample for the algorithm comprised three 2D images representing X, Y, Z-axis projections of the segmented infarct area and a label (LAA, SVO, or CE).
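The condensing of a 3D infarct mask into three 2D axis images can be illustrated with a small sketch. It assumes that "condensed" means a maximum projection, in which a pixel is set if any voxel along the collapsed axis belongs to the lesion. The exact operation is described only in Supplementary Figure 3, so both the projection rule and the function name `project_mask` are hypothetical.

```python
def project_mask(volume):
    """Collapse a binary 3D mask, indexed volume[z][y][x], into three 2D images.

    A pixel in each projection is 1 if any voxel along the collapsed axis is 1
    (maximum projection; an assumed reading of "condensed").
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Collapse Z: result indexed [y][x]
    along_z = [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
               for y in range(ny)]
    # Collapse Y: result indexed [z][x]
    along_y = [[max(volume[z][y][x] for y in range(ny)) for x in range(nx)]
               for z in range(nz)]
    # Collapse X: result indexed [z][y]
    along_x = [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)]
               for z in range(nz)]
    return along_x, along_y, along_z

# Toy mask: 2 slices of 3x4 voxels with one lesion voxel at z=1, y=2, x=3.
mask = [[[0] * 4 for _ in range(3)] for _ in range(2)]
mask[1][2][3] = 1
px, py, pz = project_mask(mask)
```

The slice count changes only the height of two of the three projections, so resizing every projection to a fixed 256×256 grid (not shown here) yields an input of constant shape.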
In a pilot study, deep learning models using EfficientNetV2 [19] outperformed those using ResNet [20], MobileNetV3 [21], and EfficientNet [22] in stroke subtyping (data not shown). EfficientNetV2 [19] is a family of convolutional networks with faster training speed and better parameter efficiency than earlier models. To this backbone, we added a global average pooling layer (global_average_pooling2d) to minimize overfitting by reducing the total number of parameters. In addition, we incorporated one inner dense layer with dropout layers, using a dropout rate of 30% to avoid overfitting. Finally, one output dense layer contained 3 output units for multi-class (LAA, SVO, and CE) classification; its output was designated as the DWI-only based subtype classification. The details of the layers, their order in the proposed model, and the output shape of each layer are presented in Supplementary Figure 3. The total number of parameters was 52,862,199.
To develop a deep learning algorithm that takes AF into account, we concatenated a binary value (0 vs. 1: the absence vs. presence of AF) to the previous outputs, and then applied a fully connected layer. The output was then designated as the DWI+AF based subtype classification.
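The AF fusion step can be sketched in plain Python as follows. The sketch assumes that the "previous outputs" are the three class outputs of the DWI-only model and that a softmax follows the fully connected layer. The function name `dwi_af_head`, the weights, and the input values are hypothetical and chosen by hand for illustration, whereas a trained model would learn its weights.

```python
import math

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dwi_af_head(dwi_outputs, af_present, weights, biases):
    """Concatenate the AF flag to the DWI-only outputs and apply one dense layer."""
    features = list(dwi_outputs) + [1.0 if af_present else 0.0]   # length 4
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Hypothetical parameters. Rows correspond to LAA, SVO, CE;
# the last column is the weight applied to the AF flag.
W = [[1.0, 0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0, -1.0],
     [0.0, 0.0, 1.0,  2.0]]
b = [0.0, 0.0, 0.0]

dwi_only = [0.5, 0.1, 0.4]            # illustrative DWI-only outputs (LAA, SVO, CE)
p_no_af = dwi_af_head(dwi_only, False, W, b)
p_af = dwi_af_head(dwi_only, True, W, b)
```

With these made-up weights, the same imaging output is assigned to LAA when AF is absent and to CE when AF is present, which is the kind of reclassification the DWI+AF design allows.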
For all the procedures, including preprocessing and model development, we used Python 3.7.9 and 3.8.13, PyTorch 1.12.0, Torchvision 0.13.0, pandas 1.2.4, NumPy 1.19.5/1.22.3, SciPy 1.4.1/1.6.3, scikit-image 0.15.0/0.18.1, SimpleITK 2.1.1, and Pydicom 2.1.2. Each model was trained for approximately 9 hours using a hardware system comprising Intel Xeon Silver 4314 @2.40 GHz, 640 GB RAM, and NVIDIA Quadro RTX A6000 with 48GB GDDR6.
Expert consensus for the classification of stroke subtype in the clinical trial dataset
For the clinical trial dataset, we first assessed the inter-observer agreement of stroke subtype classification between two experts (J-W Chung and J-S Lim, board-certified neurologists with more than 5 years of experience in both stroke practice and research), who independently reviewed the brain MRI and patients’ data. Information provided to the reviewers included age, sex, the presence of AF, DWIs, and magnetic resonance or CT angiography. Based on the aforementioned ischemic stroke subtype-classification system [7], they independently determined etiologies (i.e., LAA, SVO, or CE). In cases of disagreement between the two reviewers, a third reviewer (D-E Kim) served as the tie-breaker. When the final consensus on stroke subtype was undetermined or other determined stroke, the case was excluded from the analysis. The experts’ consensus classifications were compared with the deep learning algorithm’s classifications.
Statistical analysis
The baseline characteristics between datasets were compared using the analysis of variance or Kruskal–Wallis test for continuous variables and chi-square test for categorical variables, as appropriate. To compare the subtype classifications made by experts and those made by deep learning algorithms, we used percentage agreement and Cohen’s kappa. To assess performance metrics of deep learning algorithms, we used the one-vs-rest method [23] and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value for each subtype (LAA, SVO, and CE). In the clinical trial dataset, a paired t-test was used to compare CE probabilities estimated by the DWI-only and DWI+AF algorithms for LAA cases that artificial intelligence (AI) algorithms misclassified as CE. Additionally, the rates of inter-expert disagreements for LAA, SVO, and CE were compared with the rates of disagreements between the expert consensus and the DWI+AF algorithm-based classifications for these three subtypes. Further, disagreements for the following three subtypes of LAA were also assessed (Supplementary Figure 2): LAA–negative (LAA-NG), LAA–branch atheromatous disease (LAA-BR), and LAA–lacune (LAA-LC). To evaluate the performance of deep learning algorithms depending on the onset-to-imaging time (<24 hours vs. 24 hours–7 days), we calculated the percentage agreements of stroke subtyping in the early and late imaging groups using the external test dataset 3 as well as the clinical trial dataset, both of which had information on the time of DWI acquisition. To examine the clinical implications of AI prediction of CE using DWI, participants in each dataset were stratified into ten groups based on the probability of having CE estimated by the deep learning algorithm.
The trend of the observed frequency of CE stroke, as determined by experts, was examined using a Wilcoxon-type test for trend [24]. All the statistical analyses described above were performed using Stata 16.0 (Stata Corp., College Station, TX, USA), and a P-value <0.05 was considered statistically significant.
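For readers unfamiliar with the two agreement measures used above, the following sketch computes percentage agreement and Cohen's kappa from two label sequences. The ten labels are invented for illustration and are not study data; the study itself performed its analyses in Stata.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters assign the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the two raters' marginal label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented labels for ten cases (not study data).
experts   = ["LAA", "LAA", "SVO", "SVO", "CE", "CE", "CE",  "LAA", "SVO", "CE"]
algorithm = ["LAA", "CE",  "SVO", "SVO", "CE", "CE", "LAA", "LAA", "SVO", "CE"]

p_o = percent_agreement(experts, algorithm)
kappa = cohens_kappa(experts, algorithm)
print(round(p_o, 2), round(kappa, 2))
```

Kappa is lower than raw agreement because it discounts the matches that two raters with these label frequencies would produce by chance alone.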
Results
Baseline characteristics
In the training and validation datasets, the mean (standard deviation [SD]) age was 68.0 (12.5) years and 61.1% were men (Table 1). Mean ages were similar in all datasets. Other demographic characteristics, such as sex, admission National Institutes of Health Stroke Scale scores, and risk factors for stroke, varied significantly among the datasets. The distribution of stroke subtypes also differed among the datasets, indicating their heterogeneity.
Deep learning prediction of stroke subtype using DWI data only versus DWI plus AF data
In the internal test dataset (Table 2), the percentage agreement between the DWI-only algorithm and stroke experts was 65.3% (95% confidence interval [CI]: 60.0%–70.6%); the AUC values for LAA, SVO, and CE were 0.75, 0.93, and 0.81, respectively (Supplementary Figure 4). After incorporating the information regarding the presence of AF (DWI+AF algorithm), the percentage agreement increased to 79.1% (95% CI: 74.6%–83.6%), and the AUC values for LAA, SVO, and CE increased to 0.90, 0.93, and 0.95, respectively (Figure 1).
In the external test datasets (Table 2), both algorithms again showed good agreement with the experts. The DWI-only algorithm achieved agreement of 59.3%–60.7% (Table 2 and Supplementary Table 1); the AUC values for LAA, SVO, and CE were 0.69–0.72, 0.83–0.90, and 0.79–0.82, respectively (Supplementary Figure 4). The DWI+AF algorithm again showed higher agreement, ranging from 73.7% to 74.0%, with Cohen’s kappa ranging from 0.57 to 0.59. In addition, the accuracy of stroke subtype classification reached 0.83 (Table 3), and the AUC values for LAA, SVO, and CE increased to 0.84–0.88, 0.85–0.91, and 0.95–0.97, respectively (Figure 1).
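An accuracy of 0.83 alongside roughly 74% agreement may look inconsistent at first. One possible reading, offered here as an assumption rather than as the definition used in Table 3, is that the accuracy is a one-vs-rest accuracy averaged over the three subtypes. In a three-class problem, each misclassified case is wrong in two of the three binary tasks and right in the third, so the mean one-vs-rest accuracy equals 1 − (2/3)(1 − agreement). The confusion matrix below is invented to illustrate this identity.

```python
LABELS = ["LAA", "SVO", "CE"]

def overall_agreement(confusion):
    """Share of cases on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def one_vs_rest_accuracy(confusion, k):
    """Accuracy of the binary task 'class k vs. all other classes'."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k cases predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # other cases predicted as class k
    tn = total - tp - fn - fp
    return (tp + tn) / total

# Invented confusion matrix: rows = expert label, columns = algorithm label.
confusion = [[30, 3, 7],
             [4, 25, 1],
             [9, 2, 19]]

agreement = overall_agreement(confusion)
mean_ovr = sum(one_vs_rest_accuracy(confusion, k) for k in range(len(LABELS))) / len(LABELS)
print(round(agreement, 3), round(mean_ovr, 3))
```

Under this reading, 74% agreement corresponds to a mean one-vs-rest accuracy of about 0.83, so the two figures would describe the same classifications on different scales.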
In the clinical trial dataset (Table 2), the percentage agreements and Cohen’s kappa were respectively 58.1% (95% CI: 54.9%–61.3%) and 0.34 (0.29–0.39) for the DWI-only algorithm, and the values were 72.9% (95% CI: 69.1%–76.7%) and 0.57 (0.51–0.62) for the DWI+AF algorithm, respectively. In addition, the AUC values for LAA, SVO, and CE improved from 0.68 to 0.90, 0.86 to 0.87, and 0.77 to 0.996, respectively (Supplementary Figure 4 and Figure 1).
Alluvial plots for the five datasets (Figure 2) showed that additional information regarding the presence of AF on ECG changed the categorization of stroke subtype by the DWI-only algorithm from CE to LAA more often (22.1%–38.2%) than from LAA to CE (13.7%–16.2%) or from SVO to CE (4.2%–7.2%). There was no reclassification from CE to SVO. In the clinical trial dataset, we found (1) one CE stroke (with a small subcortical infarct and a tiny cortical infarct in the presence of AF; Supplementary Figure 5A) that the DWI+AF algorithm classified as SVO and (2) 65 LAA stroke cases (without AF) that the DWI+AF algorithm classified as CE. For the former CE case, one of the experts misclassified it as undetermined (with two or more causes) prior to the consensus meeting, because he failed to detect the cortical infarct lesion and had to rule out both SVO and CE as possible causes of the single subcortical infarct lesion. Among the 65 LAA stroke cases, 14 cases (21.5%) had been initially classified as undetermined (two or more; AF+relevant artery stenosis) by one of the two experts. Subsequently, in the consensus meeting, the degree of stenosis was determined not to be significant. As shown by the scatter plots in Supplementary Figure 5B, the DWI+AF algorithm yielded significantly lower CE probabilities for the 65 cases than the DWI-only algorithm (mean±SD 0.66±0.10 vs. 0.86±0.06, respectively; P<0.001 by paired t-test). However, adding the AF-absence information did not lower the estimated CE probability enough to reclassify these cases as LAA, probably due in part to the absence of the arterial stenosis information. Reviewing the five cases with the highest CE probabilities (blue dotted square in the left panel of Supplementary Figure 5B) showed that four had large wedge-shaped territorial infarcts (Supplementary Figure 5C-F) while the other one had multiple infarcts in the posterior circulation (Supplementary Figure 5G).
While stroke experts disagreed more on SVO strokes, the experts and the DWI+AF algorithm disagreed more on LAA strokes (Supplementary Table 2). Compared with LAA and LAA-NG strokes, LAA-BR and LAA-LC showed higher disagreements between the experts or between the experts’ consensus and the DWI+AF algorithm-based subtyping (Supplementary Table 3).
In the external test dataset 3 and the clinical trial dataset, where information regarding the time of image acquisition within the 7-day period was available, there was no significant difference in the classification performance of either the DWI-only algorithm or the DWI+AF algorithm (vs. expert consensus) between the early (within 24 hours from last known well) and late (24 hours–7 days) imaging groups (Supplementary Table 4).
DWI-based prediction of CE
When we divided subjects into deciles of the expected CE probability (estimated by the DWI-only algorithm; Supplementary Table 5), the observed frequency of the CE subtype determined by experts increased in a nearly linear fashion (P<0.001; Figure 3), showing good agreement. A similar trend was observed in all external test datasets. In the 8th, 9th, and 10th decile groups, approximately 40%–70% of subjects were shown to have CE strokes. Furthermore, in the clinical trial dataset, there was a strong correlation between the expected probability and observed frequency (P<0.001).
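The decile analysis can be sketched as follows, assuming that the ten groups are equal-sized and formed by ranking cases on the predicted CE probability. The function name `observed_ce_by_decile` and the synthetic data are illustrative only. The test for trend is not reproduced here.

```python
def observed_ce_by_decile(ce_probabilities, expert_is_ce, n_groups=10):
    """Rank cases by predicted CE probability, cut them into equal-sized groups,
    and return the observed fraction of expert-labelled CE in each group."""
    ranked = sorted(zip(ce_probabilities, expert_is_ce), key=lambda pair: pair[0])
    n = len(ranked)
    fractions = []
    for g in range(n_groups):
        start, stop = g * n // n_groups, (g + 1) * n // n_groups
        group = ranked[start:stop]
        fractions.append(sum(label for _, label in group) / len(group))
    return fractions

# Synthetic data (not study data): 100 cases with evenly spaced probabilities.
# Within the g-th group of ten cases, the first g cases are labelled CE.
probs = [(i + 0.5) / 100 for i in range(100)]
labels = [1 if (i % 10) < (i // 10) else 0 for i in range(100)]

fractions = observed_ce_by_decile(probs, labels)
print(fractions)
```

A well-calibrated model would show observed CE fractions that rise steadily from the first to the tenth group, which is the pattern the figure is used to assess.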
Discussion
In the present study, we developed a fully automated deep learning algorithm to classify ischemic stroke subtype using DWI and AF data from 2,998 ischemic stroke patients from three stroke centers. The deep learning algorithm was externally validated with three external datasets. The algorithm demonstrated good agreement with stroke experts, achieving Cohen’s kappa coefficients of 0.57–0.59 for three external datasets, which were lower than the value (0.68) for the internal dataset. Furthermore, the clinical trial also demonstrated that the AI classification of stroke subtypes was comparable to the expert consensus.
To date, few studies have developed deep learning algorithms to classify stroke subtypes. According to a study that exclusively utilized electronic medical records, deep learning algorithms demonstrated moderate agreement (kappa=0.57) when compared with expert decisions [25]. Another study reported that a deep learning algorithm to classify stroke subtypes using DWI showed an average accuracy of 81.9% [26]. However, these investigations did not conduct an external validation. As described, the present study validated our deep learning algorithm in three different external datasets and in a clinical trial involving two hospitals. To our knowledge, this represents the largest dataset and the most extensive external validation currently available in the literature. In all datasets, the deep learning algorithm achieved a similarly high mean accuracy (between 0.82 and 0.83), supporting its robustness. Notably, the level of agreement between the consensus of experts and the deep learning predictions was comparable to that between the experts themselves.
Studies have demonstrated that stroke subtypes are closely associated with the pattern and extent of ischemic lesions [2,27,28]. Cardioembolic strokes were associated with corticosubcortical single lesions, multiple lesions in anterior and posterior circulations, and multiple lesions in multiple cerebral circulations (P=0.008) [2]. LAA stroke lesions were located more frequently in the same vascular territory than CE strokes [23,28]. SVO stroke could be distinguished from other stroke subtypes based on distinctive morphological properties [27]. Thus, our deep learning algorithm trained on extensive DWI data may infer morphological and geometrical patterns associated with stroke etiologies. This could be one of the explanations for why, in the clinical trial dataset, the DWI+AF algorithm classified a CE stroke (with AF) as SVO and 65 LAA strokes (without AF) as CE.
The training dataset and the three external datasets included CE cases with AF or other potential cardiac embolic sources, while the clinical trial dataset did not include CE cases with potential cardiac embolic sources other than AF. Undetermined strokes, such as large infarcts with both relevant large artery stenosis and AF or single small subcortical infarcts with AF, were excluded from all the datasets including the training dataset. Intriguingly, as depicted by the alluvial plot, our AI algorithms trained on this training dataset classified some AF-positive cases as non-CE strokes in internal and external validations. Although further investigation is required, we speculate that the AI algorithms may have indeed identified AF-positive LAA strokes due to a covert source of artery-to-artery embolism such as non-significant (remnant) carotid stenosis or aortic atheroma, with AF acting as a bystander. Further studies are required to investigate how the presence or absence of vessel information, as well as AF information, affects our AI-based stroke subtyping, particularly in distinguishing between artery-to-artery embolism-mediated LAA and AF-mediated CE.
Guidelines for secondary prevention of stroke underscore a tailored therapeutic approach based on stroke subtypes [29,30], recommending strict blood pressure management for SVO strokes [31], intensive antiplatelet and lipid-lowering therapy for LAA strokes [32-35], and anticoagulant therapy for CE strokes [36]. However, a quarter of strokes are classified as embolic stroke with undetermined source (ESUS) [37]. Repeated failures of randomized clinical trials to compare the effectiveness of antiplatelets and direct oral anticoagulants in preventing stroke in patients with ESUS [38-40] have highlighted the need for new biomarkers or tools to identify people at high risk of CE stroke. A few machine learning algorithms using clinical and echocardiography data have demonstrated promising results in identifying individuals with an increased risk of AF within ESUS subjects [37,41]. However, these algorithms relied on extensive data input such as patients’ demographics, vascular risk factors, comorbidities, vital signs, laboratory results, and echocardiographic findings. The comprehensive data requirement poses a challenge in real-world scenarios, where data acquisition varies and resources are often limited. Our deep learning algorithm identified CE strokes based solely on DWI, suggesting its potential clinical utility in predicting an occult cardioembolic source in ESUS without additional clinical and laboratory data.
In the CRYSTAL-AF (Cryptogenic Stroke [CS] and Underlying AF) trial, stroke was classified as cryptogenic when the cause remained uncertain after extensive diagnostic evaluation, including 12-lead ECG, 24 hours or more of ECG monitoring, transesophageal echocardiography, angiographic or ultrasonographic evaluation of intracranial and extracranial vessels, and screening for thrombophilic states (in patients <55 years of age) [14]. In this study, ECG monitoring with an insertable cardiac monitor detected AF in 12.4% of patients by 1 year [14]. We hypothesize that AI algorithms can increase the yield of testing, by helping to select patients who are more likely to test positive for AF during long-term ECG monitoring. To test the hypothesis, further research should investigate prospectively whether an occult cardioembolic source is more often found during post-ESUS or post-CS follow-up in patients with higher CE probabilities predicted by our DWI-only algorithm.
Including AF information changed the DWI-only algorithm-based original categorization of stroke subtype in about 20% of cases, which highlights the importance of detecting AF. In the NAVIGATE ESUS (New Approach Rivaroxaban Inhibition of Factor Xa in a Global Trial Versus ASA to Prevent Embolism in Embolic Stroke of Undetermined Source) trial, rivaroxaban failed to show superiority over aspirin in preventing recurrent ischemic stroke (4.7% per year in both groups) [39]. It was suggested that the eligibility assessment may not have effectively identified strokes due to embolism and that AF was not a major cause of recurrent stroke [39,42]. Indeed, AF was identified in only 3% of the patients at a median follow-up of 5 months, although systematic screening for arrhythmia was not performed during the trial [39]. However, the role of AF in patients with ESUS, whether it is the underlying cause of the index stroke or not, and its effect on stroke recurrence remain unclear [43], requiring further investigations. In the NAVIGATE ESUS trial, about two-thirds of carotid plaques were present in the carotid artery ipsilateral to the index stroke, showing a strong trend of a higher risk of recurrent ischemic stroke [39]. Thus, future ESUS trials for direct oral anticoagulants may have to exclude strokes due to carotid atherosclerosis [44]. Our deep learning algorithms, which effectively classify stroke subtypes using DWIs with or without AF data, could facilitate such research, for example by improving eligibility assessments.
Many disparity studies have shown that primary and comprehensive stroke centers provide different levels of care for treating ischemic strokes. In a recent study involving 750,594 stroke patients from 1,474 stroke centers [45], Chinese investigators found lower levels of care for quality measures such as thrombolysis, rehabilitation access, and medication at discharge, suggesting the need to increase the awareness on guideline-recommended treatments. In addition, a Korean study involving 10,399 patients from 201 healthcare facilities showed that approximately 40% of general hospitals provided relatively low quality stroke care (grade 3–5), while only one third of stroke patients received treatment at grade 1 hospitals [46]. Thus, we believe that our AI algorithms with accuracy comparable to stroke experts (working in grade 1 hospitals) may assist physicians in appropriately triaging stroke patients, particularly in hospitals with limited resources. The algorithms allow more expert guidance to be available to caregivers in resource-poor circumstances and should help to provide more optimal care to stroke patients.
Our study has limitations. First, stroke experts typically determine ischemic stroke etiology by using clinical, angiographic, and laboratory data in a comprehensive manner. The validity of relying solely on DWI and AF information could be questioned. However, an earlier study demonstrated that TOAST diagnoses without DWI matched final diagnoses in 48% of cases, improving to 83% after DWI alone and to 94% after DWI plus magnetic resonance angiography [47], indicating that DWI features have a major impact on classification accuracy. Second, although we validated the algorithm using multiple external datasets of Korean stroke populations, further investigation is required for multi-ethnic populations. Third, due to the rarity of the other determined stroke subtype, deep learning algorithms to identify this subtype are challenging to develop. Thus, further investigation with a larger sample size is required.
Conclusions
In conclusion, our deep learning algorithm trained on a large dataset of DWI and AF information was able to classify ischemic stroke subtypes with a level of accuracy comparable to that of stroke experts. The AI algorithm, which performed well with the minimal data input in three different external test datasets and a multi-center clinical trial dataset, could be useful for stroke management by less experienced physicians or general practitioners.
Supplementary materials
Supplementary materials related to this article can be found online at https://doi.org/10.5853/jos.2024.00535.
Notes
Funding statement
This study was supported by the Multiministry Grant for Medical Device Development (KMDF_PR_20200901_0098), the National Priority Research Center Program Grant (NRF-2021R1A6A1A03038865), and the Basic Science Research Program Grant (NRF-2020R1A2C3008295) of National Research Foundation, funded by the Korean government.
Conflicts of interest
Wi-Sun Ryu, Hoyoun Lee, and Dongmin Kim are employees of JLK Inc., Seoul, Korea. Dong-Eog Kim holds stock in JLK Inc.
Author contribution
Conceptualization: WSR, DEK. Study design: WSR, DK, DEK. Methodology: WSR, HL, DEK. Data collection: WSR, KJL, CKK, BJK, JTK, DHK, JKC, SIS, OYB, HJB, DEK. Investigation: WSR, JWC, JSL, DK, DEK. Statistical analysis: WSR, DEK. Writing—original draft: WSR, DS, DEK. Writing—review & editing: all authors. Funding acquisition: DK, HJB, DEK. Approval of final manuscript: all authors.
Acknowledgements
The authors appreciate the contributions of all members of the Clinical Research Collaboration for Stroke in Korea to this study.