Accuracy of Machine Learning Using the Montreal Cognitive Assessment for the Diagnosis of Cognitive Impairment in Parkinson’s Disease
Abstract
Objective
The Montreal Cognitive Assessment (MoCA) is recommended for assessing general cognition in Parkinson’s disease (PD). Several cutoffs of MoCA scores for diagnosing PD with cognitive impairment (PD-CI) have been proposed, with varying sensitivity and specificity. This study investigated the utility of machine learning algorithms using MoCA cognitive domain scores for improving diagnostic performance for PD-CI.
Methods
In total, 2,069 MoCA results were obtained from 397 patients with PD enrolled in the Parkinson’s Progression Markers Initiative database with a diagnosis of cognitive status based on comprehensive neuropsychological assessments. Using the same number of MoCA results randomly sampled from patients with PD with normal cognition or PD-CI, discriminant validity was compared between machine learning (logistic regression, support vector machine, or random forest) with domain scores and a cutoff method.
Results
Based on cognitive status classification using a dataset that permitted sampling of MoCA results from the same individual (n = 221 per group), no difference was observed in accuracy between the cutoff value method (0.74 ± 0.03) and machine learning (0.78 ± 0.03). Using a more stringent dataset that excluded MoCA results (n = 101 per group) from the same patients, the accuracy of the cutoff method (0.66 ± 0.05), but not that of machine learning (0.74 ± 0.07), was significantly reduced. Inclusion of cognitive complaints as an additional variable improved the accuracy of classification using the machine learning method (0.87–0.89).
Conclusion
Machine learning analysis using MoCA domain scores is a valid method for screening cognitive impairment in PD.
Cognitive impairment is common in Parkinson’s disease (PD). Mild cognitive impairment (MCI) affects 20%–50% of patients with PD [1]. More than 25% of cases of newly diagnosed PD with MCI (PD-MCI) progress to PD dementia (PDD) within 3 years [2]. The cumulative prevalence of dementia in PD over an 8- to 12-year follow-up has been reported to be 60%–83% [3-5]. Crucially, the occurrence of dementia in PD has a major impact on functional independence, nursing home placement, mortality, psychiatric morbidities, and caregiver burden. The Movement Disorders Society (MDS) Task Force has proposed diagnostic criteria guidelines for PD-MCI and PDD and recommends cognitive assessments using abbreviated (level I) or comprehensive assessments (level II) comprising neuropsychological tests with at least two tests in each of the five cognitive domains [6]. The Montreal Cognitive Assessment (MoCA) is the most widely recommended level I test. The MoCA is more sensitive than the Mini-Mental State Examination owing to the inclusion of tools for frontal executive function and the absence of a ceiling effect [7,8]. It is recommended for global cognitive assessment but also permits the evaluation of specific cognitive domains [1,9,10]. The MoCA has been employed to differentiate PD with cognitive impairment (PD-CI) from PD with normal cognition (PD-NC), and several different cutoffs of MoCA total scores have been proposed, with varying sensitivity and specificity [7-9,11-13]. Using a cutoff score in the MoCA has the following limitations [14]. First, MoCA scores are influenced not only by age but also by education, occupational experience, and the translated version of the MoCA used; application of a fixed cutoff in a less educated population may therefore be inaccurate. Second, cognitive change in PD is affected by the duration of PD, the presence of depression, and the severity of parkinsonian motor symptoms, none of which are incorporated when cognitive diagnosis is determined using a cutoff score.
In PD-MCI or PDD, executive dysfunction or visuospatial dysfunction is more frequently reported [15-18]. Although the evaluation of individual cognitive domains is possible using the MoCA, no study has used multiple cognitive domains of the MoCA in screening PD-CI.
The growing use of big data has catalyzed the adoption of machine learning and deep learning technologies in clinical studies. Machine learning is a statistical methodology for determining patterns based on a training dataset using a specific algorithm to make learning-based predictions for other datasets [19]. Although the cutoff method based on MoCA total scores has been widely used in clinical domains, machine learning methods based on individual domain scores of the MoCA may exhibit superior performance compared to the cutoff method. However, no study to date has systematically evaluated the comparative effectiveness of machine learning algorithms for the diagnosis of cognitive impairment in PD using the MoCA. In this study, we investigated the extent to which machine learning algorithms based on MoCA cognitive domain scores could improve the accuracy of PD-CI diagnosis relative to a conventional cutoff method, using diagnoses based on comprehensive neuropsychological tests as the reference standard.
MATERIALS & METHODS
Participants
This study used the database from the Parkinson’s Progression Markers Initiative (PPMI) cohort, which is an observational, international study cohort designed to identify clinical, imaging, genetic, and biospecimen markers for PD progression to accelerate disease-modifying therapeutic trials. Clinical and neuroimaging data of patients with PD in the PPMI database were downloaded in April 2021. A flowchart of participant enrollment and performance of the MoCA is presented in Figure 1. A total of 3,307 MoCA results were assessed longitudinally in 450 patients with PD in whom presynaptic dopamine loss was documented by dopamine transporter imaging using 123I-Ioflupane single-photon emission computed tomography. MoCA results from patients with PD with cognitive categorization of normal cognition, cognitive complaints, MCI, or dementia based on comprehensive neuropsychological tests (level II) were included. In total, 947 MoCA results from patients with no cognitive categorization and 274 MoCA results from patients with “indeterminate” cognitive categorization were excluded. We further excluded 17 MoCA results from patients with incomplete data for the MoCA, the short version of the Geriatric Depression Scale (SGDS), or the Unified Parkinson’s Disease Rating Scale by the MDS (MDS-UPDRS). Finally, a total of 2,069 MoCA and SGDS results from 397 patients with PD were included, comprising 221 MoCA results from 101 patients with PD-CI and 1,848 MoCA results from 370 patients with PD-NC. The two patient counts overlap because the cognitive categorization of 74 patients with PD varied from normal to cognitive impairment (either MCI or dementia) during the follow-up. To avoid potential bias due to skewness of sample size or repeated measures, we generated six datasets by random sampling as follows (Figure 1): Datasets I, II, and III were used to compare PD-NC and PD-CI (MCI or dementia). Datasets IV, V, and VI were used to compare PD-NC and PD-MCI.
In datasets I and IV, all MoCA results from cases and controls were included, which has the flaw of skewness (unbalanced data). To overcome the skewness, datasets II and V were generated by randomly sampling the same number of MoCA results from each group. However, these datasets have the flaw that repeated measures from the same subjects are included. Last, to avoid the potential bias of repeated measures, datasets III and VI were generated by randomly sampling MoCA results from different subjects, with no repeated measurements allowed. This study was approved by the Institutional Review Boards of each participating PPMI site. Written informed consent was obtained from all participants. The study is registered at http://www.clinicaltrials.gov (identifier: NCT01141023).
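The two balanced sampling schemes described above can be sketched roughly as follows. This is a minimal illustration rather than the study's code: the function names, the `(patient_id, score)` record layout, and the toy data are all assumptions, and the sketch does not handle patients whose categorization changed between groups during follow-up.

```python
import random
from collections import defaultdict

def balanced_with_repeats(case_results, control_results, rng):
    """Dataset II/V style: keep every case result and draw equally many
    control results, allowing several results from the same patient."""
    return case_results, rng.sample(control_results, len(case_results))

def one_result_per_patient(results, n, rng):
    """Dataset III/VI style: pick n distinct patients, then one result each."""
    by_patient = defaultdict(list)
    for patient_id, score in results:
        by_patient[patient_id].append((patient_id, score))
    chosen = rng.sample(sorted(by_patient), n)
    return [rng.choice(by_patient[p]) for p in chosen]

# Toy data (hypothetical): 3 impaired patients x 2 visits, 10 normal patients x 3 visits.
rng = random.Random(0)
ci = [(f"ci{p}", 20 + v) for p in range(3) for v in range(2)]
nc = [(f"nc{p}", 26 + v) for p in range(10) for v in range(3)]

cases, controls = balanced_with_repeats(ci, nc, rng)
strict_cases = one_result_per_patient(ci, 3, rng)
strict_controls = one_result_per_patient(nc, 3, rng)
```

The stricter scheme necessarily yields fewer results per group, since the group size is capped by the number of distinct patients in the smaller group.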

Figure 1. Flowchart of participants and the enrollment process. PPMI, Parkinson’s Progression Markers Initiative; MoCA, Montreal Cognitive Assessment; PD, Parkinson’s disease; DAT, dopamine transporter; COGCAT_TEXT, cognitive categorization; SGDS, short version of the Geriatric Depression Scale; MDS-UPDRS, Unified Parkinson’s Disease Rating Scale by the Movement Disorders Society; PD-NC, PD with normal cognition; PD-CI, PD with cognitive impairment; PD-MCI, PD with mild cognitive impairment; PDD, PD dementia.
Statistical analysis
The effectiveness of machine learning classification methods was compared using extensive analyses. Three methods were used to build prediction models: support vector machine (SVM), random forest (RF), and logistic regression (LR). SVM is a widely used machine learning method with excellent prediction accuracy [20]. RF is an ensemble model of decision trees that exhibits robustness to noise and irrelevant factors and requires almost no fine-tuning of parameters to produce good predictions [21,22]. LR is a simple model that involves few parameters and is easily interpretable. LR was used as the basis to determine the relative importance of individual screening factors for the identification of cognitive status. Model development was initiated by stepwise LR using the training dataset. Three optimized prediction models (SVM, RF, and LR) based on informative factors were built. Model performance was evaluated and compared with that of a simple classification method using the MoCA total score of 26 as a cutoff.
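As a rough illustration of the cutoff comparator, the sketch below applies a total-score threshold and summarizes the result with the usual diagnostic measures. It assumes that scores below 26 are labeled impaired, a common convention that the text does not state explicitly, and the function names and toy scores are hypothetical.

```python
def classify_by_cutoff(total_score, cutoff=26):
    """Return 1 (impaired) if the MoCA total is below the cutoff, else 0.
    The direction of the comparison is an assumption."""
    return 1 if total_score < cutoff else 0

def diagnostic_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, PPV and NPV from 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Toy example: totals with reference labels (1 = impaired on level II testing).
scores = [22, 27, 25, 29, 24, 26]
labels = [1, 1, 0, 0, 1, 0]
predictions = [classify_by_cutoff(s) for s in scores]
metrics = diagnostic_metrics(labels, predictions)
```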
In the PPMI datasets, there was a discrepancy between the number of MoCA results in the normal cognition and cognitive impairment groups, resulting in an unbalanced dataset. Predictions derived from an unbalanced dataset tend to be inclined to the majority group [23]. We therefore generated two datasets from the raw data of the PPMI datasets, with equal numbers of tests between groups. One dataset included all 221 MoCA results from 101 patients with PD-CI and the same number of results randomly sampled from 370 patients with PD-NC. In the other dataset, 101 MoCA results in each group were independently randomly sampled from 101 patients with PD-CI and 370 patients with PD-NC.
Each of the three original datasets was randomly split into training and testing datasets at an 80:20 ratio, which is a commonly used split ratio for machine learning models. The ratio of MoCA results with PD-CI in the training and testing datasets was kept the same as in the original dataset using stratified sampling [24]. We repeated the random data sampling and train-test splitting to generate 1,000 different training and testing datasets to evaluate the performance of the above classification models. The training dataset was used to train the model, and the testing dataset was used to test its accuracy. The process of model training and testing was performed 1,000 times with randomly formed training and testing datasets, and the model accuracy was averaged across all the trained models. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were used to evaluate the classification models.
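The repeated stratified splitting can be sketched in plain Python as below. This covers only the split-train-test-average loop, not the repeated resampling of the datasets themselves, and it uses a deliberately simple threshold learner as a stand-in for LR, SVM, or RF. All names and the synthetic scores are assumptions for illustration.

```python
import random
from statistics import mean, stdev

def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx) with the same class proportions in both parts."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def fit_threshold(x_train, y_train):
    """Stand-in learner: pick the score threshold with the best training accuracy."""
    def train_hits(c):
        return sum((x < c) == bool(t) for x, t in zip(x_train, y_train))
    best = max(range(0, 32), key=train_hits)
    return lambda x: int(x < best)

def repeated_evaluation(x, y, fit, n_repeats=1000, test_fraction=0.2, seed=0):
    """Average test accuracy over many random stratified train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        train, test = stratified_split(y, test_fraction, rng)
        predict = fit([x[i] for i in train], [y[i] for i in train])
        hits = sum(predict(x[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return mean(accuracies), stdev(accuracies)

# Synthetic totals (hypothetical): 100 impaired and 100 normal results.
data_rng = random.Random(1)
x = [data_rng.randint(18, 27) for _ in range(100)] + [data_rng.randint(23, 30) for _ in range(100)]
y = [1] * 100 + [0] * 100
mean_accuracy, sd_accuracy = repeated_evaluation(x, y, fit_threshold, n_repeats=50)
```

Reporting the mean together with the standard deviation across splits corresponds to the "mean ± SD" form in which the accuracies are presented.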
Cognitive categorization (variable name in PPMI: “COGCAT”), which was determined based on level II neuropsychological tests according to the MDS Task Force recommendations [6], was used as a dependent variable. PD cases with cognitive complaints were categorized as normal cognition, and a new independent variable of cognitive complaints was added. Since the PPMI manual indicates that the diagnosis of MCI and dementia requires cognitive complaints, patients with MCI and dementia were considered to have cognitive complaints. The MoCA total score was used in the cutoff analysis. In the classification analyses using machine learning (LR, RF, and SVM), MoCA scores in executive, visuospatial, memory, language, and attention domains were used as indicated in previous literature [10]. The total number of words in the verbal fluency test (variable name in the PPMI dataset: “MCAVFNUM”) was included. The depression score was used as a variable by including binary results of 15 items in the SGDS. Demographic data, including age, sex, years of education, handedness, and duration of PD diagnosis, were also incorporated as independent variables.
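The labeling rule and the set of predictors described in this paragraph can be made concrete with a small sketch. The category strings and dictionary keys below are hypothetical names chosen for illustration, not PPMI variable names, and the 15 SGDS items are omitted for brevity.

```python
DOMAINS = ("executive", "visuospatial", "memory", "language", "attention")

def encode_cogcat(category):
    """Map a cognitive category to (impaired_label, cognitive_complaint).
    Cognitive complaints alone count as normal cognition; MCI and dementia
    are always treated as having complaints, as described in the text."""
    mapping = {
        "normal": (0, 0),
        "complaint": (0, 1),
        "mci": (1, 1),
        "dementia": (1, 1),
    }
    return mapping[category]

def build_features(record, include_complaint=False):
    """Assemble one predictor row: domain scores, fluency count, demographics."""
    features = [record[d] for d in DOMAINS]
    features.append(record["fluency_words"])
    features += [record["age"], record["sex"], record["education_years"],
                 record["handedness"], record["pd_duration"]]
    if include_complaint:
        features.append(encode_cogcat(record["cogcat"])[1])
    return features

# Hypothetical record for a patient categorized as having cognitive complaints only.
example = {
    "executive": 3, "visuospatial": 4, "memory": 3, "language": 5, "attention": 6,
    "fluency_words": 12, "age": 67, "sex": 1, "education_years": 16,
    "handedness": 0, "pd_duration": 3.5, "cogcat": "complaint",
}
label, complaint = encode_cogcat(example["cogcat"])
row = build_features(example, include_complaint=True)
```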
RESULTS
The demographic and clinical characteristics of the PD groups are summarized in Table 1 and Supplementary Tables 1 and 2 (in the online-only Data Supplement). The PD-CI group comprised a higher proportion of female patients; patients in this group were older and less educated and had longer PD duration and more severe parkinsonian motor symptoms. The depression score was higher and MoCA scores were lower in the PD-CI group than in the PD-NC group. Of patients in the PD-NC group, 20.3% presented with cognitive complaints. Patients with PD-NC with cognitive complaints comprised a higher proportion of female patients, were older, and had longer PD duration, more severe parkinsonian motor symptoms, higher depression scores, and lower MoCA scores. We first compared the performance of different classification methods between the PD-NC and PD-CI groups (Table 2). In the analysis of all MoCA results using a cutoff of 26, the accuracy was 0.78 with a sensitivity of 0.81 and specificity of 0.78 (PPV = 0.30, NPV = 0.97). Analyses with three different machine learning methods using the MoCA domain scores exhibited higher accuracy (0.93) but lower sensitivity (0.39–0.46). The low PPV of the cutoff method and the low sensitivity of the machine learning analysis using all MoCA results reflect bias arising from the number of MoCA results in the PD-NC group being 8.4-fold higher than that in the PD-CI group. To avoid the potential bias caused by unbalanced data in the classification analysis [23], we next performed classification analyses using the same number of MoCA results randomly sampled from the PD-NC and PD-CI groups. There was no difference in accuracy between the cutoff method (0.74 ± 0.03) and machine learning (0.78 ± 0.03) for the classification of cognitive status using a liberal dataset that permitted sampling from the same individual (n = 221 MoCA results in each group).
However, the use of a more stringent dataset that excluded MoCA results from the same patients (n = 101 in each group) resulted in a significant reduction in the accuracy of the cutoff method (0.66 ± 0.05) but not of machine learning (0.74 ± 0.07). When cognitive complaints were included as an additional variable, classification accuracy was improved in both datasets of 221-221 and 101-101 MoCA results (accuracy of 0.89 and 0.87, respectively). The addition of SGDS results as an independent variable did not improve the machine learning classification performance. Furthermore, analysis using binary results of 30 items in the MoCA instead of domain scores did not improve the classification performance (Supplementary Table 3 in the online-only Data Supplement).
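The effect of class imbalance on the PPV reported for the full dataset can be checked with simple arithmetic. The sketch below takes the reported sensitivity (0.81) and specificity (0.78) of the cutoff method and the group sizes (221 PD-CI and 1,848 PD-NC results) and derives the expected PPV, NPV, and accuracy. The function name is hypothetical, and the balanced case is an illustration of what the same sensitivity and specificity would imply, not a reported result.

```python
def expected_metrics(sensitivity, specificity, n_impaired, n_normal):
    """Derive PPV, NPV and accuracy from sensitivity, specificity and group sizes."""
    tp = sensitivity * n_impaired   # impaired results correctly flagged
    fn = n_impaired - tp            # impaired results missed
    tn = specificity * n_normal     # normal results correctly cleared
    fp = n_normal - tn              # normal results wrongly flagged
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (n_impaired + n_normal),
    }

# Full (unbalanced) dataset: 221 PD-CI versus 1,848 PD-NC results.
full = expected_metrics(0.81, 0.78, 221, 1848)

# Same sensitivity and specificity applied to equal group sizes (illustrative only).
balanced = expected_metrics(0.81, 0.78, 221, 221)
```

With the unbalanced group sizes, the derived PPV is roughly 0.31, the NPV roughly 0.97, and the accuracy roughly 0.78, close to the reported values given that the inputs are rounded. False positives from the much larger PD-NC group outnumber true positives, which is why the PPV is low even though the sensitivity is high. With equal group sizes, the same inputs would imply a PPV of about 0.79.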

Table 2. Diagnostic performance of domain scores of the MoCA for distinguishing PD with normal cognition and cognitive impairment (mild cognitive impairment or dementia) according to the classification model
We next compared the classification performance between the PD-NC and PD-MCI groups after excluding 84 MoCA results from 18 patients with a cognitive categorization of dementia (Table 3). The overall classification performance for the PD-MCI and PD-NC groups was similar to that in previous analyses including PDD. Classification using machine learning with MoCA domain scores exhibited a similar performance to that using a cutoff of 26 for the MoCA total score. The addition of cognitive complaints as a variable improved the performance of the machine learning method, but the addition of depression scores did not affect the results.
DISCUSSION
To our knowledge, this study is the first to investigate the performance of machine learning using the MoCA for PD-CI diagnosis. Previous studies investigating the validity of the MoCA for the diagnosis and screening of PD-CI used a cutoff score and reported an accuracy, sensitivity, and specificity of 56%–64%, 83%–90%, and 44%–75%, respectively [7,11,12]. Our study evaluated whether a machine learning analysis using MoCA domain scores with several covariates could distinguish PD-CI from PD-NC more effectively than a cutoff method using the MoCA total score. The machine learning models were trained on MoCA results with cognitive categorization diagnosed using comprehensive neuropsychological assessments (level II criteria). We observed that the accuracy of the machine learning analysis was higher than that of the cutoff method in a stringent dataset. Notably, the inclusion of cognitive complaints, but not depression scores, as a variable improved the classification performance of machine learning.
As the PPMI datasets included longitudinally collected results of cognitive assessments using both level I and II tests, analyses of these data could be biased by multiple longitudinal results from the same patients. An imbalance in the number of MoCA results between the PD-NC and PD-CI groups could introduce another source of bias. Hence, we analyzed several iterations of datasets after sampling MoCA results from the PD-NC and PD-CI groups by applying different dataset stringencies. The accuracy of the cutoff method varied considerably (0.66–0.75) depending on the dataset stringency. In contrast, the accuracy of the machine learning analysis was not influenced by the dataset stringency, with the caveat that training was performed with sufficient data. Exclusion of PDD from the dataset did not influence the classification performance of machine learning. Moreover, machine learning analysis has a further advantage. Cognition and its measurement are strongly influenced by individual patient variables and their interaction with disease-related factors, which include age, educational background, premorbid function, and cognitive reserve [25]. A cutoff method cannot reflect the effects of these complex interactions. Education level in the PPMI dataset was relatively high, with small differences. However, in a population of patients with PD with a low education level, classification using the cutoff method may be misleading, unless appropriate norms are provided according to age and education level [14]. Likewise, in non-English speaking populations, appropriate cutoffs for different versions of the MoCA translated to different languages should be provided. Whether machine learning analysis with the MoCA is applicable in non-English speaking populations should be explored in future studies.
Although depression influences cognitive function, the inclusion of PD cases with depression did not affect the classification performance despite incorporating the depression score as a variable in the machine learning analysis.
The MDS-recommended diagnostic criteria of PD-MCI stipulate a gradual decline in cognition reported by either the patient or informant or observed by the clinician. According to the PPMI operations manual, the diagnosis of PD-MCI requires cognitive complaints by either the patient or informant (spouse, family member, or friend). Our study demonstrated that the inclusion of cognitive complaints as a variable markedly improved the accuracy of the machine learning analysis in all datasets. However, the retrospective nature of the dataset should be noted because all patients with MCI and dementia were considered to have cognitive complaints based on the PPMI operations manual, whereas 20.3% of patients with PD-NC had cognitive complaints. Information regarding cognitive complaints in the PPMI dataset was not collected using a standardized protocol. Indeed, impaired self-awareness of cognitive deficits has been reported in 16% of patients with PD-MCI and 21.8% of patients with de novo PD [26,27]. The proportion of patients classified with PD-MCI increased from 33% to 41% by eliminating the need for cognitive complaints and performing the diagnosis based on neurobehavioral signs and symptoms derived from the patient or informant [12]. Therefore, our machine learning analysis results including cognitive complaints may have overestimated the classification performance. Future studies investigating the actual magnitude of improvement in PD-CI classification using machine learning analysis, including subjective cognitive complaints obtained from the patient or informant and based on standardized methods, are warranted.
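A back-of-the-envelope calculation suggests how much the complaint variable could contribute by construction. The sketch below assumes, purely for illustration, that every PD-CI result carries a complaint (as the coding rule implies) and that the 20.3% complaint rate among patients with PD-NC also applies to the sampled MoCA results; it then computes the accuracy of a hypothetical rule that labels a result as impaired whenever a complaint is present. It does not show that the reported models relied on this variable alone.

```python
def complaint_only_accuracy(complaint_rate_in_normal, share_impaired=0.5):
    """Accuracy of a hypothetical rule: 'impaired if a complaint is recorded'."""
    sensitivity = 1.0                               # all impaired results coded with a complaint
    specificity = 1.0 - complaint_rate_in_normal    # normal results without a complaint
    return share_impaired * sensitivity + (1.0 - share_impaired) * specificity

# Balanced groups, with the reported 20.3% complaint rate in PD-NC (assumed to
# hold at the level of MoCA results, which the text does not state).
accuracy = complaint_only_accuracy(0.203)
```

Under these assumptions the rule alone would reach an accuracy of about 0.90 on a balanced dataset, which is consistent with the concern that including cognitive complaints may overestimate classification performance.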
The MoCA enables the detection of mild cognitive changes in PD as well as the evaluation of specific cognitive domains. Characteristic profiles of dysfunctional cognitive domains in patients with PD-MCI or PDD have been reported [28,29], with executive function being the most frequently involved [15,18]. Visuospatial dysfunction is an early feature of PD-MCI and is severely affected in PDD [16,17]. However, our results did not reveal any differences in the performance of machine learning analyses between the use of domain scores and the individual data of 30 items. This may be due to the limitation of the MoCA in the evaluation of cognitive domains. Another possible cause is the heterogeneity of PD-MCI, whereby the involvement of multiple cognitive domains has been reported in 43%–93% of cases [15,18,30].
This study has a few limitations. Most participants did not have de novo PD. Furthermore, education level was high in the PPMI datasets, which may limit the validity of our findings for populations with low education. The optimal cutoff scores for the MoCA vary by race and ethnicity [31]. Therefore, to establish the generalizability of our finding that machine learning analysis is superior to cutoff methods, future studies in patients of diverse languages, cultures, or education levels are needed. Since the results of machine learning may vary depending on the quality of training datasets, high-quality training datasets that include both level I and level II tests are essential. Types of antiparkinsonian drugs were not included as covariates. Given that the primary purpose of this study was to test the performance of machine learning analysis with the MoCA against cognitive assessment by comprehensive neuropsychological tests, we do not think that the effect of medications on cognition differentially affects level I and level II tests. Finally, the discrimination ability of individual domains was not analyzed, as we consider it beyond the purpose of this study.
In conclusion, our study demonstrates that machine learning analysis using MoCA domain scores is a valid method for screening cognitive impairment in PD. Future studies are warranted to validate the performance of machine learning analysis using the MoCA with the inclusion of a history of cognitive complaints in prospectively enrolled patients with de novo PD with diverse language, culture, or education levels.
Supplementary Materials
The online-only Data Supplement is available with this article at https://doi.org/10.14802/jmd.22012.
Supplementary Table 1.
Demographic data of participants with PD with normal cognition, mild cognitive impairment, and dementia
Supplementary Table 2.
PD-NC with versus without cognitive complaints
Supplementary Table 3.
Diagnostic performance of binary data of 30 items of the MoCA for distinguishing PD-NC and PD-CI according to classification model
Notes
Conflicts of Interest
The authors have no financial conflicts of interest.
Funding Statement
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2020R1F1A1050128).
Author Contributions
Conceptualization: Yun Joong Kim. Data curation: Junbeom Jeon, Kiyong Kim, Kyeongmin Baek, Jeehee Yoon. Formal analysis: all authors. Funding acquisition: Yun Joong Kim. Investigation: Yun Joong Kim, Jeehee Yoon. Methodology: Yun Joong Kim, Kiyong Kim, Jeehee Yoon. Project administration: Yun Joong Kim, Jeehee Yoon. Resources: Yun Joong Kim. Software: Junbeom Jeon, Kiyong Kim, Kyeongmin Baek, Jeehee Yoon. Supervision: Yun Joong Kim, Kiyong Kim, Jeehee Yoon. Validation: Junbeom Jeon, Kiyong Kim, Kyeongmin Baek, Seok Jong Chung, Jeehee Yoon. Visualization: Junbeom Jeon. Writing – original draft: Junbeom Jeon, Yun Joong Kim. Writing – review & editing: Yun Joong Kim, Kiyong Kim, Jeehee Yoon.