These authors contributed equally to the manuscript as first author.
This study aimed to investigate the applicability of a deep learning (DL) model using diffusion-weighted imaging (DWI) data to predict the severity of aphasia at an early stage in patients with acute stroke.
We retrospectively analyzed consecutive patients with aphasia caused by acute ischemic stroke in the left middle cerebral artery territory who visited Asan Medical Center between 2011 and 2013. To predict the severity of post-stroke aphasia, we designed a deep feed-forward network that used the lesion occupying ratio derived from DWI data and established clinical variables to estimate the aphasia quotient (AQ) score (range, 0 to 100) of the Korean version of the Western Aphasia Battery. To evaluate the performance of the DL model, we calculated Cohen’s weighted kappa with linear weights for the categorized AQ score (0–25, very severe; 26–50, severe; 51–75, moderate; ≥76, mild) and Pearson’s correlation coefficient for the continuous AQ score.
We identified 225 post-stroke aphasia patients, of whom 176 were included and analyzed. For the categorized AQ score, Cohen’s weighted kappa coefficient was 0.59 (95% confidence interval [CI], 0.42 to 0.76;
DL approaches using DWI data may be feasible and useful for estimating the severity of aphasia in the early stage of stroke.
Post-stroke aphasia is a major sequela of stroke and a substantial burden that negatively affects the quality of life of patients [
Deep learning (DL) techniques, a branch of artificial intelligence, have recently emerged and are now actively applied in the medical field, especially in outcome prediction studies using imaging data [
Anonymized data are available on reasonable request from any qualified investigator.
We retrospectively analyzed consecutive acute stroke patients with aphasia who visited the Asan Medical Center (Seoul, South Korea) within 7 days of symptom onset between January 2011 and December 2013. Aphasia was defined as the presence of a score greater than 1 in the best language category of the National Institutes of Health Stroke Scale [
We included patients who had ischemic lesions in the left middle cerebral artery (MCA) territory and who underwent K-WAB within 14 days of symptom onset. Patients with lesions in multiple vascular territories were also included if they had aphasia and stroke lesions in the left MCA territory. Only patients whose first language was Korean were included. We excluded patients who did not undergo DWI or who had old stroke lesions within the left MCA territory on fluid-attenuated inversion recovery or gradient echo MRI sequences [
All patients underwent 1.5-T MRI (Magnetom Avanto, Siemens Healthineers, Erlangen, Germany), and the initial DWI was used for the analysis. A stroke neurologist who was blinded to all clinical information manually segmented the high-signal-intensity area on DWI to delineate the ischemic lesion, and the lesion mask was extracted in native DWI space using FSLView in the FMRIB Software Library (FSL; Oxford Center for Functional MRI of the Brain, Oxford, UK). In patients with lesions in multiple vascular territories, lesions outside the left MCA territory were included in the lesion volume analysis, whereas only lesions in the left hemisphere were used to derive image features. Affine transformation and nonlinear warping coefficients between the native-space DWI with a b-value of zero and the standard Montreal Neurological Institute (MNI) 152 T2 template were estimated using FSL’s linear (FLIRT) and nonlinear (FNIRT) image registration tools, and the estimated parameters were then applied to the delineated lesion masks.
To cover the whole brain, including white matter regions, we used five brain atlas templates (Brodmann’s area [BA], Automated Anatomical Labeling [AAL], Harvard-Oxford [HO], JHU white matter label [WM-Label], and JHU white matter tract [WM-Track]) to analyze the imaging data. To quantify regional damage, the lesion occupying ratio of each left-hemisphere region was calculated in standard MNI space for each atlas (BA, 41 of 82 region labels; AAL, 54 of 116; HO, 55 of 110; WM-Label, 19 of 44; and WM-Track, 9 of 20; 178 regions in total) as follows: (volume of lesion inside the individual region/total volume of the individual region)×100 (%) [
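Lesion volume, one of the six clinical features and reported in cm³ in the baseline table, follows directly from the manually segmented mask: the number of lesion voxels times the volume of one voxel. The sketch below is a plain-Python illustration under stated assumptions, not the authors' pipeline. It assumes the mask has been flattened to a list of 0/1 values and that the voxel spacing is known. In practice the spacing would be read from the image header (for example, nibabel's `header.get_zooms()`), and the spacing used in the example is hypothetical.

```python
from typing import Sequence, Tuple

def lesion_volume_cm3(mask: Sequence[int], voxel_size_mm: Tuple[float, float, float]) -> float:
    """Lesion volume = (number of lesion voxels) x (volume of one voxel).

    `mask` is a flattened binary lesion mask in native DWI space (1 = lesion).
    `voxel_size_mm` is the (x, y, z) voxel spacing in millimetres.
    """
    n_lesion = sum(1 for v in mask if v)        # count voxels labelled as lesion
    vx, vy, vz = voxel_size_mm
    voxel_mm3 = vx * vy * vz                     # volume of one voxel in mm^3
    return n_lesion * voxel_mm3 / 1000.0         # 1 cm^3 = 1000 mm^3

# Toy example: 2,000 lesion voxels at a hypothetical 1 x 1 x 5 mm spacing.
toy_mask = [1] * 2000 + [0] * 8000
print(lesion_volume_cm3(toy_mask, (1.0, 1.0, 5.0)))  # 10.0
```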
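The lesion occupying ratio formula can be sketched as follows. This is an illustration rather than the authors' code. It assumes that the registered lesion mask and an atlas label image share the same MNI grid, are flattened to equal-length lists, and have equal voxel sizes. Under that assumption the volume ratio reduces to a ratio of voxel counts. Label 0 is treated as background, and `regions` stands for the left-hemisphere labels of one atlas.

```python
from collections import Counter
from typing import Dict, Sequence

def lesion_occupying_ratios(lesion: Sequence[int], atlas: Sequence[int],
                            regions: Sequence[int]) -> Dict[int, float]:
    """Percentage of each atlas region covered by the lesion.

    `lesion` is binary (1 = infarct) and `atlas` holds one integer region label
    per voxel; both are flattened volumes on the same MNI grid.
    """
    region_size = Counter(atlas)                                      # voxels per label
    lesion_in_region = Counter(lab for lab, les in zip(atlas, lesion) if les)
    ratios = {}
    for lab in regions:
        total = region_size.get(lab, 0)
        # (lesion volume inside region / total region volume) x 100 (%)
        ratios[lab] = 100.0 * lesion_in_region.get(lab, 0) / total if total else 0.0
    return ratios

# Toy 1-D "brain": region 1 has 4 voxels (2 infarcted), region 2 has 4 (none);
# the infarcted background voxel (label 0) is ignored.
atlas = [1, 1, 1, 1, 2, 2, 2, 2, 0, 0]
lesion = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(lesion_occupying_ratios(lesion, atlas, regions=[1, 2]))  # {1: 50.0, 2: 0.0}
```

Running this once per atlas over its left-hemisphere labels (41 + 54 + 55 + 19 + 9) yields the 178 ratio features.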
To develop the DL model, we obtained 184 features, including 178 lesion occupying ratio features derived from manually segmented infarct lesions on DWI, and six clinical features (age, sex, interval between the K-WAB test and onset, interval between DWI and onset, education, and lesion volume), which are regarded as important prognostic factors for post-stroke aphasia [
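The 184-dimensional input can be assembled as sketched below. The per-atlas counts come from the text. The key names, their order, and the numeric encoding of the clinical features (e.g., sex as 0/1) are hypothetical, and dividing AQ by 100 is an assumed way of producing the 0 to 1 training target described for the model.

```python
from typing import Dict, List

# Left-hemisphere region counts per atlas, as listed in the text (sum = 178).
ATLAS_REGION_COUNTS = {"BA": 41, "AAL": 54, "HO": 55, "WM-Label": 19, "WM-Track": 9}

# Hypothetical keys for the six clinical features; order and encoding are assumptions.
CLINICAL_KEYS = ["age", "sex", "kwab_interval_days", "dwi_interval_hours",
                 "education_years", "lesion_volume_cm3"]

def build_input_vector(ratios: Dict[str, List[float]], clinical: Dict[str, float]) -> List[float]:
    """Concatenate 178 lesion occupying ratios with 6 clinical features (184 total)."""
    x: List[float] = []
    for atlas, n in ATLAS_REGION_COUNTS.items():
        values = ratios[atlas]
        if len(values) != n:
            raise ValueError(f"{atlas}: expected {n} ratios, got {len(values)}")
        x.extend(values)
    x.extend(float(clinical[k]) for k in CLINICAL_KEYS)
    return x

def normalize_aq(aq: float) -> float:
    """Assumed target scaling: AQ (0-100) mapped to the 0-1 output range."""
    return aq / 100.0

demo_ratios = {atlas: [0.0] * n for atlas, n in ATLAS_REGION_COUNTS.items()}
demo_clinical = {k: 1.0 for k in CLINICAL_KEYS}
print(len(build_input_vector(demo_ratios, demo_clinical)))  # 184
```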
We used a deep feed-forward network (DFFN), rather than a convolutional neural network, for the DL model. The input parameters were the 178 lesion occupying ratio features of left-hemisphere regions from the five atlases and the six clinical features. The DFFN was implemented in PyTorch (developed by Facebook’s AI Research lab) and consists of four fully connected layers: an input layer (184 nodes), a 1st hidden layer (90 nodes), a 2nd hidden layer (30 nodes), and an output layer. The hard hyperbolic tangent (Hardtanh) activation function, with minimum and maximum values of −1 and 1, was used for the 1st and 2nd hidden layers. The output layer is fully connected to the concatenated outputs of the 1st and 2nd hidden layers and produces the final score through a hyperbolic tangent activation function with a range of 0 to 1. The model was trained with a mean-squared error loss and the Adam optimizer with a learning rate of 0.001 and default parameters (β=0.9 and 0.999; ε=1e-08; weight decay=0; AMSGrad=false), with a dropout rate of 50% on the 1st and 2nd hidden layers. Training used mini-batches of 50 samples and was stopped when the validation loss failed to improve on its best value for 50 epochs, with a maximum of 500 training epochs.
For feature selection, we used dropout, a regularization method that mitigates redundancy among image features and reduces overfitting [
We developed a conventional machine-learning model using logistic regression with Lasso (least absolute shrinkage and selection operator) regularization [
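The stopping rule can be read as patience-based early stopping: keep the best validation loss seen so far and stop once 50 consecutive epochs fail to beat it, with a hard cap of 500 epochs. The plain-Python sketch below follows that reading, which is an interpretation of the text. `train_one_epoch` and `validation_loss` are hypothetical callbacks. In PyTorch they would wrap one pass over the mini-batches with an optimizer such as `torch.optim.Adam(model.parameters(), lr=0.001)`, whose defaults match the listed parameters, and the validation MSE.

```python
import math
from typing import Callable

def train_with_early_stopping(train_one_epoch: Callable[[], None],
                              validation_loss: Callable[[], float],
                              patience: int = 50,
                              max_epochs: int = 500) -> int:
    """Stop when validation loss has not beaten its best value for `patience` epochs.

    Returns the number of epochs actually run.
    """
    best = math.inf
    epochs_since_best = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        loss = validation_loss()
        if loss < best:
            best, epochs_since_best = loss, 0      # new best: reset the counter
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:      # no improvement for `patience` epochs
                return epoch
    return max_epochs

# Toy run: validation loss improves for 10 epochs, then plateaus.
state = {"epoch": 0}

def fake_train() -> None:
    state["epoch"] += 1

def fake_val() -> float:
    return 1.0 / min(state["epoch"], 10)

print(train_with_early_stopping(fake_train, fake_val))  # 60
```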
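Dropout with a 50% rate, as applied to the two hidden layers, can be illustrated with the inverted-dropout formulation sketched below. PyTorch's `nn.Dropout` also applies the 1/(1 − p) scaling during training. This is a dependency-free sketch of the technique, not the authors' implementation.

```python
import random
from typing import List, Optional

def dropout(activations: List[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on one layer's activations.

    During training each unit is zeroed with probability p and the survivors are
    scaled by 1 / (1 - p), so the expected activation is unchanged; at evaluation
    time the activations pass through untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

hidden = [0.2, -0.7, 1.0, 0.4]
print(dropout(hidden, p=0.5, rng=random.Random(42)))  # each unit is either 0.0 or doubled
print(dropout(hidden, training=False))                 # [0.2, -0.7, 1.0, 0.4]
```

Because a different random subset of hidden units is silenced at every step, the network cannot rely on any single redundant feature, which is the regularizing effect described above.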
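For reference, the textbook Lasso-penalized objective for binary logistic regression is shown below, with outcome $y_i \in \{0, 1\}$, feature vector $\mathbf{x}_i$ with $p$ entries, and penalty weight $\lambda \ge 0$. The text does not state whether a binary, ordinal, or multinomial formulation was used, how the AQ outcome was encoded, or how $\lambda$ was tuned, so this is only the generic form. The $\ell_1$ term drives some coefficients exactly to zero, which gives Lasso its feature-selection property.

$$
\min_{\beta_0,\,\boldsymbol{\beta}}\;
-\frac{1}{n}\sum_{i=1}^{n}\Bigl[y_i\bigl(\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)-\log\bigl(1+e^{\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}}\bigr)\Bigr]
+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
$$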
Baseline characteristics between the training and test sets were compared using the chi-square test or Fisher’s exact test for categorical variables and the t-test or Mann-Whitney U test for continuous variables, as appropriate. To compare the voxel-wise frequency difference of lesions between the training and test groups, the Bernoulli model-based two-sample t-test was performed for each voxel [
The study was performed in accordance with the Good Clinical Practice guidelines and the Declaration of Helsinki, and was approved by the Institutional Review Board of Asan Medical Center (IRB No. 2020-1794). Written informed consent was waived owing to the retrospective nature of the study.
During the study period, a total of 225 acute stroke patients with aphasia visited Asan Medical Center and were screened for the study. The K-WAB test was performed for all patients. Of these, 49 were excluded (no available DWI data [n=16], crossed aphasia with right MCA lesions [n=7], old stroke lesion in the left MCA territory [n=13], left-handedness [n=10], and delayed K-WAB test [>14 days after symptom onset, n=3]) (
For categorized AQ score, the DL model showed an accuracy of 61% in total, with Cohen’s weighted kappa of 0.59 (95% confidence interval [CI], 0.42 to 0.76;
For continuous AQ score, the correlation coefficient between the true AQ score and model-estimated AQ score was 0.72 (95% CI, 0.55 to 0.83;
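The text does not state how the confidence intervals for the correlation coefficients were computed. One standard choice, assumed here, is Fisher's z-transformation with standard error $1/\sqrt{n-3}$. With the test-set size (n=49) it gives an interval that rounds to the reported 0.55 to 0.83. With n=49 for spontaneous speech (r=0.75) and n=36 for the acute-intervention subgroup (r=0.73), it likewise rounds to the reported intervals.

```python
import math
from typing import Tuple

def pearson_ci(r: float, n: int, z_crit: float = 1.959964) -> Tuple[float, float]:
    """Approximate 95% CI for Pearson's r via Fisher's z-transformation."""
    z = math.atanh(r)                  # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)        # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = pearson_ci(0.72, 49)          # test set, n = 49
print(f"{lo:.2f} to {hi:.2f}")         # 0.55 to 0.83
```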
We performed an additional analysis using logistic regression to compare the performance of the DL method with that of a conventional machine-learning approach (
Three patients demonstrated notable discrepancies between the model-estimated and true AQ scores (
We subsequently evaluated Cohen’s weighted kappa and correlation coefficients only in patients who had undergone acute thrombolysis/thrombectomy. Because these patients were few (n=24 in the training group; n=12 in the test group), we combined the two groups for this analysis (n=36). In this subgroup, the weighted kappa (κ=0.60; 95% CI, 0.41 to 0.78;
In this study, we demonstrated that DL techniques using DWI and clinical data could be used to estimate the severity of aphasia in patients with ischemic stroke at an early stage. For AQ scores obtained within 14 days after symptom onset, the model estimates showed moderate agreement with the actual severity categories (weighted κ=0.59) and correlated well with the actual AQ scores (r=0.72).
Aphasia is one of the most devastating cognitive sequelae of stroke, which results in difficulties in activities of daily living in stroke patients [
The major strength of this study is that we developed a DL model that estimates aphasia severity in detail, as a continuous AQ score rather than only as a severity category. Although several models have been used to predict the outcome of aphasia after stroke [
We noticed three patients with notable discrepancies between the model-estimated and true AQ scores (
The DL technique also estimated the AQ score of patients who received acute thrombolysis/thrombectomy in our cohort. This is notable because MRI obtained early enough to guide acute intervention may also be useful for estimating the degree of aphasia at an early stage. However, in the acute intervention setting, the DL model appears likely to be reliable in patients with mild or very severe aphasia but less so in those with intermediate degrees of aphasia. Because intervention procedures may result in early neurological improvement or alteration, model-predicted aphasia outcomes should be interpreted with caution in patients undergoing acute interventions (as in Case 3 in
In the present study, the DL model used a skip connection instead of a plain network: the outputs of the 1st and 2nd hidden layers were concatenated and used as the input to the output layer. This design is related to the skip connections of residual neural networks, which add, rather than concatenate, the outputs of earlier layers. Previous studies have shown that the residual neural network architecture helps avoid the problem of vanishing gradients in multi-layer neural networks in image recognition [
Our study has some limitations. First, it was a retrospective observational study of a single-center cohort and is therefore subject to inherent bias. Furthermore, all patients spoke Korean as their first language, which may limit the generalizability of the results. In addition, we did not examine the relationship between early aphasia severity and long-term aphasia outcomes. Although the severity of early aphasia is an important predictor of chronic aphasia [
Our study suggests that the DL model using DWI data may be feasible and useful in estimating the severity of aphasia in patients with acute stroke at an early stage. These findings warrant further research to evaluate the applicability of the DL model in different study populations.
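The concatenation-based skip connection can be made concrete with the dependency-free forward pass sketched below. It uses random weights, omits dropout (evaluation mode), and does no training. It mirrors the layer sizes in the Methods: 184 → 90 → 30, with the output layer reading the 90 + 30 = 120 concatenated hidden values. How the output was bounded to 0 to 1 is not fully specified, so clipping with a hard tanh to [0, 1] is an assumption; in PyTorch that would correspond to `nn.Hardtanh(min_val=0.0, max_val=1.0)`. `DFFNSketch` and its helpers are hypothetical names.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]

def linear(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer: y[j] = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]

def hardtanh(x: List[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Hard hyperbolic tangent: clip every value to [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]

def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Uniform weights in +/- 1/sqrt(n_in) and zero biases (illustrative only)."""
    bound = 1.0 / n_in ** 0.5
    w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class DFFNSketch:
    """Input 184 -> hidden 90 -> hidden 30; the output layer reads [h1, h2] (120 values)."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.l1 = init_layer(184, 90, rng)
        self.l2 = init_layer(90, 30, rng)
        self.out = init_layer(90 + 30, 1, rng)

    def forward(self, x: List[float]) -> float:
        h1 = hardtanh(linear(x, *self.l1))     # 1st hidden layer, values in [-1, 1]
        h2 = hardtanh(linear(h1, *self.l2))    # 2nd hidden layer, values in [-1, 1]
        skip = h1 + h2                         # list concatenation: the skip connection
        y = linear(skip, *self.out)[0]
        return min(max(y, 0.0), 1.0)           # assumed output bounding to [0, 1]

model = DFFNSketch(seed=0)
rng = random.Random(1)
features = [rng.uniform(0.0, 100.0) for _ in range(184)]
print(f"estimated AQ = {100 * model.forward(features):.1f}")
```

A residual block would instead compute `h + f(h)` element-wise, which requires equal widths. Concatenation lets the 90-unit and 30-unit layers both feed the output directly.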
Supplementary materials related to this article can be found online at
Structure of the deep learning (DL) model. The number of nodes in each layer is shown in the corresponding box. The input layer received 178 lesion occupying ratio features (left-hemisphere regions from the five atlases) and six clinical features (age, sex, days from magnetic resonance imaging [MRI] to the Korean version of the Western Aphasia Battery [K-WAB] evaluation, hours from onset to MRI, years of education, and lesion volume). The model outputs a final score in the range of 0 to 1; for training, the true K-WAB scores were normalized to the same 0 to 1 range.
Correlation analysis of sub-scores of aphasia quotient in the test set. (A) The correlation coefficient of spontaneous speech was 0.75 (95% confidence interval [CI], 0.59 to 0.85;
Correlation analysis between the true aphasia quotient (AQ) score and the model-predicted AQ score in stroke patients who underwent acute intervention. The correlation coefficient was 0.73 (95% confidence interval, 0.53 to 0.85;
Performance of logistic regression and deep learning methods depending on input features in the training set
Performance of logistic regression and deep learning methods depending on the input features in the test set
Contingency table between the true and model-predicted scores in patients who underwent acute intervention
Yong-Hwan Kim was employed by Nunaps Inc.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was supported by grants from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (HI18C2383) and the Ministry of Science and ICT (NRF-2018M3A9E8066249), Republic of Korea.
Flowchart showing patient selection. DWI, diffusion-weighted image; MCA, middle cerebral artery; K-WAB, Korean version of the Western Aphasia Battery.
Lesion pattern heat maps of (A) training and (B) test groups. A heat map was used to visualize the proportion of lesions in each voxel. We compared the lesion proportion in every voxel between the training and test groups using the Bernoulli model-based two-sample t-test and found no significant difference between the training and test groups (
Correlation analysis between the true aphasia quotient (AQ) score and predicted AQ score in the test set. The correlation coefficient was 0.72 (95% confidence interval, 0.55 to 0.83;
Imaging characteristics of cases with notable discrepancies. Panels represent (A-D) Case 1, (E-H) Case 2, and (I-L) Case 3, respectively. (A, B) Small acute lesions in the left internal carotid artery (ICA) border zone (yellow arrowheads) on diffusion-weighted imaging (DWI); (C) increased time-to-peak value in the left middle cerebral artery (MCA) territory; (D) severe stenosis in the left proximal ICA (yellow arrow); (E-H) acute infarction in the left corona radiata and bilateral anterior cerebral artery territory on DWI, especially in the left anterior cingulate cortex (yellow arrow); (I, J) large but subtle DWI high-signal intensity in the left MCA territory (yellow dashed lines); (K) occlusion of the left MCA inferior division (yellow arrow) on conventional angiography; (L) subsequent recanalization after immediate mechanical thrombectomy.
Baseline characteristics of patients in the training and test sets
Characteristic | Training set (n=127) | Test set (n=49) | P-value |
---|---|---|---|
Age (yr) | 65.9±11.8 | 68.0±12.0 | NS |
Female sex | 46 (36.2) | 14 (28.6) | NS |
Hypertension | 60 (47.2) | 31 (63.3) | NS |
Diabetes | 40 (31.5) | 14 (28.6) | NS |
NIHSS on admission | 8.3±5.5 | 8.4±5.5 | NS |
Years of education | 10.1±5.1 | 10.1±4.7 | NS |
Lesion volume (cm³) | 58.4±75.7 | 38.9±49.9 | NS |
MRI from onset (hr) | 56.1±66.7 | 60.4±59.2 | NS |
K-WAB from onset (day) | 3.9±3.0 | 3.4±2.1 | NS |
AQ score | 43.6±30.3 | 37.7±31.1 | NS |
Mild (≥76) | 29 (22.8) | 13 (26.5) | NS |
Moderate (51–75) | 28 (22.0) | 6 (12.2) | NS |
Severe (26–50) | 27 (21.3) | 8 (16.3) | NS |
Very severe (0–25) | 43 (33.9) | 22 (44.9) | NS |
Values are presented as mean±standard deviation or number (%).
NS, non-significant; NIHSS, National Institutes of Health Stroke Scale; MRI, magnetic resonance imaging; K-WAB, Korean version of the Western Aphasia Battery; AQ, aphasia quotient.
Contingency table between the true score and model-estimated score
Model-estimated score (rows) vs. true score (columns) | Very severe | Severe | Moderate | Mild | Total |
---|---|---|---|---|---|
Very severe | 16 | 3 | 2 | 0 | 21 |
Severe | 4 | 4 | 1 | 2 | 11 |
Moderate | 1 | 1 | 3 | 4 | 9 |
Mild | 1 | 0 | 0 | 7 | 8 |
Total | 22 | 8 | 6 | 13 | 49 |
Accuracy | 16/22 (73) | 4/8 (50) | 3/6 (50) | 7/13 (54) | 30/49 (61) |
Values are presented as number (%). Cohen’s weighted kappa, κ=0.59 (95% confidence interval, 0.42 to 0.76;
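The footnote's point estimate can be checked from the table itself. The plain-Python sketch below implements Cohen's kappa with linear disagreement weights |i − j|/(k − 1), with expected counts taken from the row and column totals. It illustrates the statistic and is not the authors' code. With per-patient labels, scikit-learn's `cohen_kappa_score(y1, y2, weights="linear")` computes the same quantity. The confidence interval is not reproduced here because the method used to obtain it is not stated.

```python
from typing import List

def weighted_kappa_linear(table: List[List[int]]) -> float:
    """Cohen's kappa with linear weights for a k x k contingency table.

    kappa_w = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), with disagreement
    weights w_ij = |i - j| / (k - 1) and E_ij = row_i * col_j / N.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * table[i][j]
            expected += w * rows[i] * cols[j] / n
    return 1.0 - observed / expected

# Test-set contingency table (rows: model-estimated, columns: true;
# order: very severe, severe, moderate, mild).
table = [[16, 3, 2, 0],
         [4, 4, 1, 2],
         [1, 1, 3, 4],
         [1, 0, 0, 7]]
kappa = weighted_kappa_linear(table)
accuracy = sum(table[i][i] for i in range(4)) / 49
print(f"kappa_w = {kappa:.2f}, accuracy = {accuracy:.0%}")  # kappa_w = 0.59, accuracy = 61%
```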
Clinical features of cases with notable discrepancies between the model-estimated and true AQ scores
Characteristic | Case 1 | Case 2 | Case 3 |
---|---|---|---|
Age (yr) | 81 | 75 | 54 |
Sex | Male | Male | Female |
Years of education | 9 | 6 | 12 |
Model-estimated AQ score | 90.5 | 55.7 | 1.8 |
True AQ score | 28.6 | 0.5 | 78.1 |
NIHSS on admission | 10 | 9 | 9 |
MRI from onset (hr) | 3.6 | 83 | 0.8 |
K-WAB from onset (day) | 3 | 9 | 5 |
Prime suspect for discrepancy | Low perfusion due to left proximal ICA stenosis | Abulia due to bilateral frontal lesions | Early revascularization |
AQ, aphasia quotient; NIHSS, National Institutes of Health Stroke Scale; MRI, magnetic resonance imaging; K-WAB, Korean version of the Western Aphasia Battery; ICA, internal carotid artery.
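Mapping the three cases' scores onto the severity bands used for the weighted kappa shows why they stand out. Cases 1 and 2 are off by two bands, and Case 3 is off by three. The bands are given for integer AQ values only, so the handling of non-integer scores is an assumption: cut-offs are placed at 25, 50, and 75, so that, for example, 25.5 counts as severe. `aq_category` is a hypothetical helper name.

```python
def aq_category(aq: float) -> str:
    """Map a K-WAB AQ score (0-100) to the severity bands used in the study.

    Assumption: non-integer scores use upper cut-offs at 25, 50 and 75.
    """
    if not 0.0 <= aq <= 100.0:
        raise ValueError("AQ must lie in [0, 100]")
    if aq <= 25:
        return "very severe"
    if aq <= 50:
        return "severe"
    if aq <= 75:
        return "moderate"
    return "mild"

cases = {"Case 1": (90.5, 28.6), "Case 2": (55.7, 0.5), "Case 3": (1.8, 78.1)}
for name, (estimated, true) in cases.items():
    print(f"{name}: estimated {aq_category(estimated)}, true {aq_category(true)}")
```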