INTRODUCTION
Cognitive impairment is one of the most common and significant nonmotor symptoms of Parkinson’s disease (PD). Mild cognitive impairment (MCI) affects 20%–50% of patients with PD, 15%–20% of whom experience it at the time of diagnosis [1,2]. Cross-sectional studies have indicated that the prevalence of dementia in PD patients is approximately 30%, and the cumulative prevalence over an 8- to 12-year follow-up period has been reported as 60%–83% [1-4]. For the diagnosis of MCI or dementia in PD patients, the Movement Disorder Society (MDS) Task Force has proposed the following two sets of tests: a practical set (Level I) for screening for cognitive impairment and a second set (Level II) consisting of comprehensive neuropsychological tests that assess individual cognitive domains [5,6]. The Montreal Cognitive Assessment (MoCA), originally developed as a screening tool for MCI in the general population [7], is recommended as a Level I test due to its reliability and validity [6,8].
The selection of an optimal cutoff for a diagnostic screening test hinges on balancing sensitivity and specificity to suit the requirements of a particular study or clinical context. Nasreddine et al. [7] initially recommended a cutoff score of 26 for identifying MCI; however, a lower cutoff of 23 was proposed following a recent meta-analysis [9]. In the context of memory clinic outpatients, two distinct cutoff points were suggested: a specific cutoff of 24 (92% specificity) and a sensitive cutoff of 26 (91% sensitivity) [10]. Most studies proposing a cutoff have relied on normative data from either a general population cohort or patients with memory impairments [10,11].
Total MoCA scores are independently influenced by age, education level, race, and ethnicity [11-13]. In PD, previous studies in English-speaking populations have reported good efficacy of using a single MoCA cutoff score between 24 and 26 for differentiating MCI and/or dementia from normal cognition [14-18]. However, in many non-English-speaking populations, which exhibit a wide range of educational levels, cutoff scores vary significantly depending on the study, suggesting that a single cutoff may not be universally applicable [19,20]. In a recent study focusing on less educated patients with PD, cutoff scores of 19 or lower were proposed for those with 9 years of education or less [20]. In non-English-speaking populations, cutoffs for cognitive impairment in PD patients have not been systematically validated. These studies are limited by the small number of PD patients, wide ranges of education levels or ages, and use of cohorts that may not be directly applicable to PD [19]. Using MoCA domain scores from patients with PD in the Parkinson’s Progression Markers Initiative (PPMI) data repository, we previously demonstrated that machine learning analysis could be used to classify cognitive impairment in PD patients with a cutoff of 26 [21]. However, to date, a comparison between age- and education-adjusted cutoff scores and machine learning analyses in non-English-speaking populations has not been made.
In this study, we developed age- and education-adjusted cutoffs for patients with PD whose cognitive diagnoses were determined through comprehensive neuropsychological assessment. We conducted a validation study in independent PD cohorts and compared the efficacy of machine learning analysis with age- and education-adjusted cutoffs in identifying cognitive impairment in PD patients.
MATERIALS & METHODS
- Participants
We conducted a retrospective review of medical records of PD patients diagnosed according to the MDS’s clinical diagnostic criteria [22]. These patients had undergone comprehensive neuropsychological tests and the MoCA at two referral hospitals. The exclusion criteria were as follows: 1) structural brain lesions related to cognitive changes, such as multiple lacunes in the basal ganglia, old cerebrovascular lesions, intracranial hemorrhage, or hydrocephalus; 2) severe white matter changes; or 3) use of anticholinergic medication. This study received approval from the Yonsei University Yongin Severance Hospital Institutional Review Board (No. 9-2021-0181). The requirement for informed consent was waived because of the retrospective nature of the study and the use of anonymized clinical data. A total of 1,293 PD patients were enrolled and divided into two data tiers based on the subsets and specific objectives (Figure 1). Tier 1 included 1,202 PD patients (951 patients from Severance Hospital and 251 patients from Yongin Severance Hospital). A subset of Tier 1 patients, consisting of 416 PD patients (168 patients from Severance Hospital and 248 patients from Yongin Severance Hospital; 219 with normal cognition, 122 with MCI, and 75 with dementia) for whom MoCA domain scores and all parameters for machine learning (including category-cued semantic recall scores and multiple-choice cued recall scores) were available, was used to construct the optimal machine learning model. The remaining 786 PD patients, who lacked recognition memory scores (category-cued semantic recall and multiple-choice cued recall), were not included in the machine learning test. Total MoCA scores from all 1,202 PD patients were used to develop age- and education-adjusted cutoff values. There were 662 individuals with normal cognition and 540 with cognitive impairment (326 with MCI and 214 with dementia). Tier 2 comprised 91 consecutive PD patients (40, 39, and 12 with normal cognition, MCI, and dementia, respectively) who underwent comprehensive neuropsychological tests at Yongin Severance Hospital from January 1, 2023 to August 31, 2023. For a direct comparison of performance between machine learning and cutoff scores, the MoCA domain and total scores from these 91 PD patients were used.
- Diagnosis of cognitive status
To evaluate cognitive performance, all patients completed the Korean version (Korean 7.1, K2-Chuncheon) of the Montreal Cognitive Assessment (K-MoCA) for initial cognitive function screening (Level I test) and the Seoul Neuropsychological Screening Battery (SNSB), a comprehensive neuropsychological test suite (Level II test). The SNSB, a standardized neuropsychological test battery extensively used in South Korea [23], is used to evaluate five cognitive domains: attention and working memory, visuospatial function, language, memory, and executive function. To assess activities of daily living (ADL) related to cognitive performance, two instrumental ADL scales were utilized: the Korean Instrumental Activities of Daily Living scale and the Seoul Activities of Daily Living scale. ADL impairment was identified when both ADL scales indicated abnormalities [24]. For diagnosing cognitive status, only the results from the SNSB and instrumental ADL scales were considered. Two neurologists and one neuropsychologist reached a consensus diagnosis of PD dementia, as previously described [25], following the clinical diagnostic criteria proposed by the MDS Task Force [5,26]. When diagnosing MCI, we adhered to the criteria set by the MDS Task Force [6], utilizing the SNSB. For MCI diagnosis, two tests represented each of the five cognitive domains: attention and working memory function were assessed using the digit span task (backward) and the Color-Word Stroop test; language, through the Korean version of the Boston Naming Test and Wechsler Adult Intelligence Scale-IV Similarities; visuospatial function, via the Rey Complex Figure Test copy and clock copying; memory, through the Seoul Verbal Learning Test; and executive function, through the Controlled Oral Word Association Test and 10-point Clock Drawing Test. A score on a cognitive test was considered abnormal if it fell more than 1.5 standard deviations below the age-, sex-, and education-specific norm. PD-MCI was diagnosed when at least two tests across these five cognitive domains were impaired.
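The test-level part of the PD-MCI rule described above can be sketched in a few lines. This is only an illustration: the function names, the dictionary keys, and the z-scores are hypothetical, the comparison is assumed to be strict (a z-score of exactly -1.5 is not counted as abnormal), and the sketch does not cover the ADL assessment or the consensus step used for dementia.

```python
# Threshold from the text: a test is abnormal when the score falls more than
# 1.5 standard deviations below the age-, sex-, and education-specific norm.
ABNORMAL_Z = -1.5


def count_abnormal_tests(z_scores):
    """Count tests whose normative z-score is below -1.5 (strict, by assumption)."""
    return sum(1 for z in z_scores.values() if z < ABNORMAL_Z)


def meets_pd_mci_test_criterion(z_scores, min_abnormal=2):
    """Return True when at least `min_abnormal` tests are abnormal."""
    return count_abnormal_tests(z_scores) >= min_abnormal


# Hypothetical z-scores for one patient (illustrative values only).
example_patient = {
    "digit_span_backward": -1.8,
    "stroop_color_word": -0.4,
    "boston_naming": -0.2,
    "rcft_copy": -1.6,
    "svlt": -1.0,
    "cowat": 0.3,
}

n_abnormal = count_abnormal_tests(example_patient)
is_candidate = meets_pd_mci_test_criterion(example_patient)
```

In this made-up example two tests fall below the threshold, so the patient meets the test-level criterion.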
- Statistical analysis
Using logistic regression, we derived age- and education-adjusted MoCA cutoffs for cognitive impairment in patients with PD [27]. A logistic regression model was constructed to predict cognitive impairment (normal cognition vs. cognitive impairment) with three independent variables: age, total MoCA score, and education level. To prevent overfitting and ensure the model’s generalizability, fivefold cross-validation was employed [28]. The logistic regression model was trained using this method, and the average regression coefficients were calculated. A receiver operating characteristic (ROC) curve, along with the area under the curve, was plotted to assess the model’s ability to differentiate between the two groups [29]. The ROC curves generated for each fold were analyzed, and an optimal average probability threshold was established. The MoCA cutoff value for each age and education level was then calculated using the logistic regression equation, incorporating the estimated optimal regression coefficients and probability threshold. The analysis was divided into different age groups: median ages ranging from 56 to 80 years, each encompassing patients within a 5-year range from the median. The age groups 51–55, 81–85, and 86–95 years were combined into one group as an exception. Additionally, years of education were divided into the following six groups: 0, 0.5–3, 4–6, 7–9, 10–12, and > 12 years. For the performance comparison of the age- and education-adjusted cutoffs, cutoffs developed from Korean patients with vascular cognitive impairment were referenced [30].
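To make the step from regression equation to cutoff concrete, the sketch below inverts a logistic model of the form logit(p) = b0 + b_age·age + b_edu·education + b_moca·MoCA and solves for the MoCA score at which the predicted probability equals the chosen threshold. The coefficients and the threshold here are hypothetical placeholders chosen for illustration, not the values estimated in this study, and how the boundary is rounded to an integer cutoff is left open because the text does not state the convention.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def impairment_probability(age, education_years, moca_total, coef):
    """Predicted probability of cognitive impairment from the logistic model."""
    z = (coef["intercept"] + coef["age"] * age
         + coef["education"] * education_years + coef["moca"] * moca_total)
    return 1.0 / (1.0 + math.exp(-z))


def moca_boundary(age, education_years, coef, threshold):
    """MoCA total at which the predicted probability equals `threshold`."""
    rest = (coef["intercept"] + coef["age"] * age
            + coef["education"] * education_years)
    return (logit(threshold) - rest) / coef["moca"]


# HYPOTHETICAL coefficients and threshold, for illustration only.
HYPOTHETICAL_COEF = {"intercept": 10.6, "age": -0.04,
                     "education": 0.15, "moca": -0.4}
HYPOTHETICAL_THRESHOLD = 0.5

boundary_70y_12edu = moca_boundary(70, 12, HYPOTHETICAL_COEF, HYPOTHETICAL_THRESHOLD)
boundary_70y_6edu = moca_boundary(70, 6, HYPOTHETICAL_COEF, HYPOTHETICAL_THRESHOLD)
boundary_80y_12edu = moca_boundary(80, 12, HYPOTHETICAL_COEF, HYPOTHETICAL_THRESHOLD)
```

With these placeholder signs (a negative MoCA coefficient, a negative age coefficient, and a positive education coefficient), the boundary score falls as education decreases or age increases, which is the qualitative pattern an adjusted cutoff table is meant to capture.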
Machine learning analysis was conducted using previously established methods, employing support vector machine (SVM), random forest (RF), and logistic regression models [21,31,32]. These models were applied to analyze detailed cognitive and biological data associated with PD. The MoCA scores were entered into the models in one of two ways: as domain subtotal scores in the visuospatial/executive, abstraction, memory, orientation, language, and attention domains or as each of the 30 individual item scores. Additionally, the total number of words in the verbal fluency test from the MoCA and the score on the interlocking pentagon copying test from the Mini-Mental State Examination were included. The presence of depression was determined using either the short version of the Geriatric Depression Scale or the Beck Depression Inventory. The demographic data, including age, sex, years of education, handedness, and duration of PD diagnosis, were also included as independent variables. To assess the impact of including the scores of the interlocking pentagon copying test on the results, analyses were repeated with this score added as an additional predictor variable.
In the Tier 1 dataset designated for the machine learning test (n = 416), there was an imbalance between the number of patients with normal cognition and those with cognitive impairment. Predictions from an unbalanced dataset are often skewed toward the majority group [33]. To mitigate this bias, the dataset was balanced by randomly undersampling the majority group before training the machine learning algorithms. Subsequently, the dataset was randomly divided into training and testing subsets at an 80:20 ratio, a commonly used split in machine learning models. Stratified sampling kept the proportion of patients with PD-cognitive impairment (PD-CI) in the training and testing datasets consistent with that in the original dataset [34]. The training and testing of the model were repeated 100 times with randomly generated training and testing datasets, and the average model accuracy was computed across all iterations. The evaluation of the classification models was based on sensitivity, specificity, positive predictive value, and negative predictive value.
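The balancing and splitting steps above can be illustrated with a small standard-library sketch. It is not the authors' code: the function name and seeds are hypothetical, and only the group sizes (219 with normal cognition, 197 with cognitive impairment in the Tier 1 subset) come from the text. The majority class is randomly undersampled to the size of the minority class, and each class is then split 80:20 so that both subsets keep the same class proportions.

```python
import random


def undersample_and_split(labels, test_fraction=0.2, seed=0):
    """Balance classes by random undersampling, then make a stratified split.

    Returns (train_indices, test_indices) into `labels`.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    n_minority = min(len(indices) for indices in by_class.values())
    n_test = round(n_minority * test_fraction)

    train, test = [], []
    for indices in by_class.values():
        # rng.sample draws without replacement and returns a random order,
        # so slicing the result gives a random per-class split.
        kept = rng.sample(indices, n_minority)
        test.extend(kept[:n_test])
        train.extend(kept[n_test:])
    return train, test


# Group sizes from the Tier 1 machine learning subset: 0 = normal, 1 = impaired.
labels = [0] * 219 + [1] * (122 + 75)

# Repeating with different seeds mimics the 100 random train/test draws.
splits = [undersample_and_split(labels, seed=s) for s in range(100)]
train_idx, test_idx = splits[0]
```

Each repetition would then fit a model on `train_idx` and score it on `test_idx`, with accuracy averaged over the repetitions.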
Feature importance scores are utilized to assess the relative significance of each feature when constructing a prediction model. These scores are calculated based on the machine learning algorithm used [35]. In the case of a linear SVM, feature importance is determined by the coefficient statistics correlating each feature with the output (dependent) variable [36]. The feature scores provided by the SVM model were averaged across all trained models to arrange the features in descending order of importance. Negative scores are indicative of features that are crucial for classifying a patient as having normal cognition, while positive scores denote features associated with cognitive impairment.
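A minimal sketch of averaging linear SVM coefficients across repeated fits follows, assuming scikit-learn and NumPy are installed. The data are synthetic, the feature names are hypothetical, and the use of `SVC(kernel="linear")` with standardized inputs is an assumption, since the text does not specify the exact estimator or preprocessing. With labels coded 0 = normal cognition and 1 = cognitive impairment, positive averaged coefficients point toward impairment and negative ones toward normal cognition.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mean_linear_svm_coefficients(X, y, n_repeats=20, test_size=0.2):
    """Average linear SVM coefficients over repeated stratified splits."""
    coefficients = []
    for seed in range(n_repeats):
        X_train, _, y_train, _ = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        # Standardize so coefficient magnitudes are comparable across features.
        scaler = StandardScaler().fit(X_train)
        model = SVC(kernel="linear").fit(scaler.transform(X_train), y_train)
        coefficients.append(model.coef_[0])
    return np.mean(coefficients, axis=0)


# Synthetic data: feature_a raises the chance of class 1, feature_b lowers it,
# and feature_c is pure noise.
rng = np.random.default_rng(0)
n_samples = 400
X = rng.normal(size=(n_samples, 3))
noise = rng.normal(scale=0.5, size=n_samples)
y = (X[:, 0] - 0.5 * X[:, 1] + noise > 0).astype(int)

feature_names = ["feature_a", "feature_b", "feature_c"]
mean_coef = mean_linear_svm_coefficients(X, y)

# Descending order of the signed score, as described in the text.
ranking = sorted(zip(feature_names, mean_coef), key=lambda item: item[1], reverse=True)
```

Sorting by the signed value places impairment-associated features first and normal-cognition-associated features last; sorting by absolute value would instead rank features by overall influence.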
All analyses were conducted using Python version 3.9.13, along with the scikit-learn 1.3.0 package (Python Software Foundation, Beaverton, OR, USA).
- Data availability
To facilitate replication of the procedures and results, qualified investigators may request anonymized data following ethics clearance and approval from the corresponding author.
RESULTS
- Clinical and demographic characteristics
The demographic and clinical characteristics of the PD patients are summarized in Table 1 and Supplementary Table 1 (in the online-only Data Supplement). In Tier 1 (n = 1,202), patients in the PD-CI group were older and were predominantly male. There were no differences in education level. In the Tier 1 subset for machine learning tests (n = 416), PD-CI patients were older, had a longer duration of PD, had more severe parkinsonian motor symptoms, and were more likely to have depression. In Tier 2, which was designed for the validation study (n = 91), the PD-CI patients were older and exhibited more severe parkinsonian motor symptoms. Across both tiers and the Tier 1 subset, total MoCA scores were significantly lower in the PD-CI group. In both the Tier 1 and Tier 2 subsets, cognitive complaints reported by caregivers were more common in the PD-CI subgroup. The frequency of failure on the pentagon copying test increased from PD-MCI to PD dementia.
- Age- and education-adjusted cutoffs of the MoCA
The cutoffs for cognitive impairment across different education levels and ages, determined via logistic regression, are presented in Table 2. For patients with a median age between 53 and 68 years and more than 12 years of education, 25 points was established as the cutoff. In contrast, for patients with a median age between 69 and 83.5 years, the cutoff was set at 24 points. A lower education level generally corresponded to lower cutoff values: 22 points for 10–12 years of education (median age range: 62–83.5 years), 21 points for 7–9 years of education (median age range: 53–69 years), and 20 points for the same educational level but an older age range (70–83.5 years). For those with 4–6 years of education, the cutoff was 19 points (median age range: 53–77 years). The cutoff for a lower education level (0.5–3 years) ranged between 18 and 16 points. For patients with PD with no formal education, the cutoff was set at 16 points (median age range: 60–80 years).
- Comparison of machine learning method performances
A comparison of machine learning analysis methods using MoCA domain scores revealed that the SVM method yielded the highest accuracy (0.7624), while the RF method exhibited the lowest (0.7404). The area under the ROC curve ranged from 0.8193 to 0.8577. However, all accuracy levels fell within one standard deviation, as detailed in Table 3. This ranking of performance across different machine learning methods was consistent when applied to various types of datasets. The inclusion of interlocking pentagon copying test scores with MoCA domain scores led to improved performance, which was still within the standard deviation range. Analyses using binary results of the 30 items in the MoCA, as opposed to domain scores, resulted in the lowest performance, yet it was also within the standard deviation range. Across all outcomes from various machine learning models and datasets, specificity consistently exceeded sensitivity. In the machine learning analysis using SVM, the factors contributing to accuracy were ranked in the following order of importance: years of education, subtotal scores of orientation, memory, visuospatial/executive function, attention, motor Unified Parkinson’s Disease Rating Scale scores, and subtotal language score, as shown in Supplementary Table 2 (in the online-only Data Supplement). Notably, cognitive complaints reported by caregivers contributed to accuracy, whereas those reported by patients did not. The pentagon copying test ranked immediately after the visuospatial/executive function subtotal score in its contribution to performance. The SVM-based machine learning test of the MoCA is available at http://pdmoca.com.
- Head-to-head comparison of diagnostic performance among various cutoff methods and machine learning methods
We subsequently compared the performances of various cutoffs and the machine learning method, specifically the SVM, using a new dataset of 91 PD patients (Table 4). Among the cutoff methods, the highest accuracies were achieved when using the age- and education-adjusted cutoffs, which demonstrated high sensitivity (0.8627) and moderate specificity (0.7250). Applying a uniform cutoff of 25 or 24 for identifying cognitive impairment in PD patients resulted in reduced accuracy (0.7033 or 0.7582), with a significant decrease in specificity (0.4000 or 0.5250). Analysis using cutoffs developed for Korean vascular cognitive impairment indicated lower accuracy (0.7582) with high specificity (0.9250) but low sensitivity (0.6274), implying that these cutoffs were less stringent than the age- and education-adjusted cutoffs for PD. The machine learning analysis using SVM, based on MoCA domain scores, showed comparable accuracies but with higher specificity (0.8500) and slightly lower sensitivity (0.7843) than when the age- and education-adjusted cutoffs were used.
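As a worked check on how the reported figures relate to one another, the sketch below back-calculates confusion-matrix counts from the sensitivity and specificity values quoted above and the Tier 2 group sizes (51 with cognitive impairment, 40 with normal cognition), then derives accuracy, positive predictive value, and negative predictive value. The counts are inferred by rounding and are not listed in the text; the function and dictionary names are hypothetical.

```python
def confusion_from_rates(sensitivity, specificity, n_impaired, n_normal):
    """Infer confusion-matrix counts from reported rates and group sizes."""
    tp = round(sensitivity * n_impaired)
    tn = round(specificity * n_normal)
    return {"tp": tp, "fn": n_impaired - tp, "tn": tn, "fp": n_normal - tn}


def summarize(counts):
    """Accuracy, positive predictive value, and negative predictive value."""
    total = sum(counts.values())
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "ppv": counts["tp"] / (counts["tp"] + counts["fp"]),
        "npv": counts["tn"] / (counts["tn"] + counts["fn"]),
    }


# Tier 2 group sizes from the text: 39 MCI + 12 dementia, 40 normal cognition.
N_IMPAIRED, N_NORMAL = 39 + 12, 40

# (sensitivity, specificity) pairs as reported in the text.
reported = {
    "adjusted_cutoffs": (0.8627, 0.7250),
    "vascular_ci_cutoffs": (0.6274, 0.9250),
    "svm_domain_scores": (0.7843, 0.8500),
}

counts = {name: confusion_from_rates(se, sp, N_IMPAIRED, N_NORMAL)
          for name, (se, sp) in reported.items()}
summary = {name: summarize(c) for name, c in counts.items()}
```

The accuracy recomputed this way for the vascular cognitive impairment cutoffs rounds to the reported 0.7582, which suggests the inferred counts are consistent with the text.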
DISCUSSION
Our findings indicate that both age- and education-adjusted cutoff values, as well as machine learning analysis using MoCA domain scores, effectively distinguish PD-CIs with notable accuracy. The age- and education-adjusted cutoffs demonstrated higher sensitivity, while machine learning analysis, specifically the SVM utilizing MoCA domain scores, showed greater specificity. To our knowledge, this study is the first where age- and education-adjusted MoCA cutoffs were developed from a PD patient cohort. Our results confirm that MoCA scores are sensitive to variations in both age and education levels. Prior research examining the validity of the MoCA for diagnosing PD-MCI or PD dementia in English-speaking populations suggested a universal cutoff score of 25 (sensitivity 0.70, specificity 0.75) or 24 (sensitivity 0.87, specificity 0.75), irrespective of age and education level [15,18]. Our findings indicate that for individuals with education levels above 12 years, the cutoff is 25 or 24 within a median age range of 53 to 83.5 years. With each 3-year decrease in education level, the cutoff score decreased by a median of 2 points (ranging from 1 to 3 points). When our age- and education-adjusted cutoffs were applied to our Tier 2 cohort, the performance of these cutoff values was comparable to that of previous studies. However, the specificity was lower when a single cutoff score was applied to the same cohort. In conclusion, within a population with varying education levels, using a single cutoff score may lead to inaccurate results.
Several studies have reported MoCA cutoffs for screening MCI and/or dementia in non-English-speaking populations. However, systematic validation in PD groups has been limited [19,37]. Age- and education-adjusted cutoffs developed based on these populations vary significantly and are influenced by factors such as participant enrollment methods, underlying disorders related to cognitive impairment, cognitive diagnosis approaches, assessment methods, languages used, and the range of ages and education levels [11,38]. Consequently, directly comparing our cutoff scores or performance with those of these studies is impractical.
Our study on developing cutoff scores differs from previous research in several ways. First, while most studies include participants with normal cognition, MCI, or dementia from various neurocognitive disorder etiologies, our study exclusively involved PD patients with three different cognitive diagnoses. This approach differs from that used in other studies on Korean vascular cognitive impairment patients, whose age- and education-adjusted cutoffs showed reduced diagnostic accuracy when applied to our PD cohort. Therefore, the need for disease-specific cutoff criteria, which may result in better diagnostic performance, warrants further investigation. Second, the cognitive diagnoses in our study were determined using the gold standard method, namely, comprehensive neuropsychological assessments, as recommended by the MDS Task Force [5,6]. This approach likely led to more accurate cognitive diagnoses than those in other studies. Notably, the machine learning test demonstrated greater specificity than the age- and education-adjusted cutoffs. This could be attributed to the machine learning test’s ability to mimic our cognitive diagnoses, which are based on comprehensive assessments of multiple cognitive domains. Lastly, unlike most studies that group age by decade (e.g., 60s or 70s), we used age groups spanning ± 5 years around each median age. This was feasible due to the large number of participants in our study. Given that age is a continuous variable, grouping subjects into broad age categories may not be as accurate. Our results indicated that within the same education level, two or three different cutoff scores emerged, and the ages at which these cutoff scores changed varied across education levels.
The accuracy of machine learning analyses using the database from the PPMI cohort in our previous study was comparable to that in the present study [21]. Although the inclusion of cognitive complaints enhanced the classification performance of machine learning in our previous study, feature importance analyses in the current study revealed that a caregiver’s report of cognitive complaints was not a highly significant factor. The accuracy of the machine learning analysis in this study is akin to that of the PPMI data, even when the variable “cognitive complaint” is excluded. This discrepancy might stem from differences in the data collection methods used, as all patients with MCI and dementia were considered to have cognitive complaints according to the PPMI operations manual.
Visuospatial dysfunction is recognized as an early symptom of PD-MCI and is strongly associated with PD dementia, and failing the pentagon copying test is a known predictor of dementia risk in PD patients [39]. However, our results indicate that including the pentagon copying score in machine learning analysis did not enhance diagnostic performance. This outcome may be attributed to the redundancy of features between the pentagon copying test and subtotal scores of visuospatial/executive function. This inference is supported by the significantly greater frequency of failing pentagon copying test scores in the PD-CI group than in the PD-normal cognition group according to our data and by the fact that both tests are close in terms of ranking in the feature importance analysis.
Our study has several limitations. First, the data were sourced from two hospitals that used identical neuropsychological tests to evaluate cognitive domains. The validation was performed at only one center and involved a relatively small participant group. Ideally, our cutoff values should be validated at other centers using different types of neuropsychological tests. Second, our cutoffs were based on the Korean version of the MoCA, and the impact of linguistic variations may be significant. It would be beneficial to validate whether our cutoff scores are applicable to MoCA tests in different languages among patients with PD. Third, our study merged MCI and dementia categories, although some research has suggested specific cutoffs for each [14-17]. We believe that screening for cognitive impairment by combining MCI and dementia is practical in clinical settings. This is because a cutoff solely for MCI, excluding dementia, often results in a high rate of false positives and low specificity [9,16,17]. Additionally, some patients with PD experience early and rapid cognitive decline, progressing to dementia [40]. We did not investigate separate cutoffs for dementia due to the limited number of dementia patients in our validation cohort. Assessing ADL related to cognitive dysfunction may be more crucial in diagnosing dementia than relying solely on a MoCA cutoff score.
In conclusion, a single MoCA cutoff score is inadequate for screening for cognitive impairment in PD patients across diverse education levels. Both age- and education-adjusted cutoff methods and machine learning, particularly the SVM approach, demonstrated high effectiveness in detecting cognitive impairment in PD patients. This underscores the potential of machine learning to enhance cognitive assessments in PD patients.