
JMD : Journal of Movement Disorders



Original Article
Reliability and Validity of the Embouchure Dystonia Severity Rating Scale
Tobias Mantel1*, André Lee1,2*, Shinichi Furuya2,3,4, Masanori Morise5, Eckart Altenmüller2, Bernhard Haslinger1
Journal of Movement Disorders 2023;16(2):191-195.
Published online: May 24, 2023

1Department of Neurology, Technical University of Munich, Munich, Germany

2Institute for Music Physiology and Musicians’ Medicine, Hannover University of Music, Drama, and Media, Hanover, Germany

3Sony Computer Science Laboratories Inc. (Sony CSL), Tokyo, Japan

4NeuroPiano Institute, Kyoto, Japan

5Meiji University, School of Interdisciplinary Mathematical Sciences, Tokyo, Japan

Corresponding author: Tobias Mantel, MD Department of Neurology, Technical University of Munich, Ismaninger St. 22, Munich 81675, Germany / Tel: +49-89 4140 4630 / Fax: +49-89 4140 4966 / E-mail:
*These authors contributed equally to this work.
• Received: December 13, 2022   • Revised: March 10, 2023   • Accepted: April 5, 2023

Copyright © 2023 The Korean Movement Disorder Society

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Objective
    Embouchure dystonia (ED) is a task-specific movement disorder that leads to loss of fine motor control of the embouchure and tongue muscles in wind musicians. In contrast to musicians’ hand dystonia, no validated severity rating for ED exists, posing a major obstacle for structured assessment in scientific and clinical settings. The aim of this study is to validate an ED severity rating scale (EDSRS) allowing for a standardized estimation of symptom severity in ED.
  • Methods
    The EDSRS was set up as a composite score of six items evaluating audio-visual disease symptoms during the performance of three standardized musical tasks (sustained notes, scales, and fourths) separately for each body side. For validation, 17 musicians with ED underwent standardized audiovisual recordings during performance. Anonymized and randomized recordings were assessed by two experts in ED (raters). Statistical analysis included metrics of consistency, reliability, and construct validity with the fluctuation of the fundamental frequency of the acoustic signal (F0) (extracted in an audio analysis of the sustained notes).
  • Results
    The EDSRS showed high internal consistency (Cronbach’s α = 0.975−0.983, corrected item-total correlations r = 0.90−0.96), interrater reliability (intraclass correlation coefficient [ICC] for agreement/consistency = 0.94/0.96), intrarater reliability over time (ICC per rater = 0.93/0.87) and good precision (standard error of measurement = 2.19/2.65), and correlated significantly with F0 variability (r = 0.55–0.60, p = 0.011–0.023).
  • Conclusion
    The developed EDSRS is a valid and reliable tool for the assessment of ED severity in the hands of trained expert raters. Its easy applicability makes it suitable not only for routine clinical practice but also for scientific studies.
Musician’s dystonia (MD) is a focal, task-specific dystonia. The resulting loss of fine motor control for highly trained movements considerably impairs playing ability, often resulting in termination of professional careers [1]. Two major subentities can be distinguished: 1) MD of the extremities, mostly affecting the fingers playing the instrument (lower extremities are only rarely affected), and 2) embouchure dystonia (ED) affecting the orofacial muscles and the tongue, most frequently in brass players [1,2]. Diagnosis still relies mainly on patient history and clinical examination at the instrument by a movement disorders specialist. In musician’s hand dystonia, the dystonic posture of the affected fingers is usually visible on the instrument during performance. In contrast, in ED, such purely visual assessment is inherently limited since not only perioral but also the jaw or tongue muscles may be affected [2]. Thus, assessment of ED severity relies heavily on the evaluation of sound quality (e.g., onset of a note, ability to sustain a note, etc.). A validated clinical rating scale with sufficient sensitivity to ED-specific features is therefore highly desirable; however, to date, such scales exist only for musician’s hand dystonia (see Peterson et al. [3] for a review). While customized clinical ratings of ED severity have been applied in previous neuroimaging research [4-6], no validated version of such a scoring is available. Therefore, the aim of the present work was to assess the validity of a clinical ED symptom severity scale (EDSRS) derived from those previously used customized scoring approaches, including its relationship with an established objective measure, the fluctuation of the fundamental frequency (F0), which has reproducibly been shown to differ between diseased and nondiseased professional brass musicians [7,8].
Participants and audiovisual data acquisition
Between 2016 and 2017, 17 professional brass players with ED from the Institute for Music Physiology and Musicians’ Medicine in Hanover (an expert center for diagnosis and management of MD) were included in parallel to a neuroimaging study published elsewhere [4]. Data acquisition was approved by the Technical University of Munich ethics committee (5173/11S), and written informed consent according to the Declaration of Helsinki was obtained from the participants. Standardized performances were audio-visually recorded from each patient with a Sony® HDR-CX305 video camera (Sony, Tokyo, Japan) using a stereo microphone with zoom. Each patient played three short musical tasks: 1) ascending and descending scales, 2) ascending and descending fourths, and 3) sustained tones of at least 5 seconds duration (Supplementary Figure 1 in the online-only Data Supplement). All three tasks were performed by each patient in low, medium and high pitch registers typical for the respective instrument (French horn, trombone, or trumpet) to ensure sufficient sensitivity, given that pitch registers are frequently involved to differing degrees in ED. Furthermore, each piece was performed twice to account for potential asymmetry in symptom manifestation: once while being video recorded from a right and once from a left lateral viewing angle of 30°–45° from the midline (Figure 1). This ensured an optimal visualization of the lower face and neck region on each side. For validation purposes, the field of view was confined to this area to ensure anonymization of all participants to the raters.
Scale composition and rating procedure

Scale composition

Based on scorings applied in past neuroimaging studies [4-6], a video-based embouchure dystonia severity rating scale (EDSRS) was set up for evaluation. In brief, the EDSRS is calculated as a composite scale from the audio-visual ratings of six items on a 5-point (0 to 4) Likert scale based on the criteria outlined in Supplementary Figure 2 in the online-only Data Supplement. The six items of the EDSRS correspond to the performance of the three task categories (scales, fourths, and sustained tones) across all registers, separately for each (left and right) body side (i.e., three pieces × two sides). The resulting EDSRS accordingly ranged from 0–24 points, aiming to proportionally describe the severity of impairment due to ED.
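The composition described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (six items, each rated 0 to 4, summed to a 0–24 total); the item keys, the function name `edsrs_total`, and the example ratings are hypothetical and are not taken from the scoring sheet in Supplementary Figure 2.

```python
# Minimal sketch of the EDSRS composite: 3 task categories x 2 body sides,
# each rated on a 0-4 scale, summed to a total of 0-24.
# All names and the example ratings below are illustrative assumptions.
TASKS = ("scales", "fourths", "sustained_tones")
SIDES = ("left", "right")


def edsrs_total(item_ratings):
    """Sum the six item ratings after checking that each lies in 0..4."""
    total = 0
    for task in TASKS:
        for side in SIDES:
            rating = item_ratings[(task, side)]
            if not isinstance(rating, int) or not 0 <= rating <= 4:
                raise ValueError(f"rating for {task}/{side} must be an integer from 0 to 4")
            total += rating
    return total


# Hypothetical ratings for one recording session.
example = {
    ("scales", "left"): 2,
    ("scales", "right"): 1,
    ("fourths", "left"): 3,
    ("fourths", "right"): 2,
    ("sustained_tones", "left"): 1,
    ("sustained_tones", "right"): 0,
}
print(edsrs_total(example))
```

Keeping the six items separate until the final sum also makes it straightforward to report item-level results alongside the total.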

Rating procedure

For EDSRS validation, rating was performed by two experts specialized in MD (A.L., E.A.). Prior to rating, appropriate rater training was ensured using selected similar video recordings from a past neuroimaging study [5]. For evaluation of interrater reliability, anonymized patient videos were provided to each rater in a randomized order. To assess intrarater consistency over time, each rater was again provided with the anonymized and newly randomized patient videos. To best avoid recall effects, this second rating was performed > 30 months after the first rating (A.L./E.A. completion at 35/32 months).
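To illustrate how agreement between two raters (or between two sessions of one rater) can be quantified, the following sketch computes single-measure intraclass correlation coefficients from a two-way ANOVA decomposition. It assumes the common two-way formulations for consistency and absolute agreement; the exact ICC model used in this study may differ, and the example scores are hypothetical.

```python
# Sketch of two-way, single-measure ICCs (consistency and absolute agreement).
# Assumption: the standard two-way ANOVA formulas; this is not necessarily
# the exact ICC model applied in the study.
def two_way_mean_squares(ratings):
    """ratings: one row per subject, one column per rater (or session)."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return n, k, ms_rows, ms_cols, ms_error


def icc_single(ratings):
    """Return (consistency, agreement) single-measure ICCs."""
    n, k, ms_rows, ms_cols, ms_error = two_way_mean_squares(ratings)
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, agreement


# Hypothetical total scores of five patients rated by two raters.
scores = [[4, 6], [10, 11], [15, 18], [20, 21], [23, 24]]
consistency, agreement = icc_single(scores)
print(round(consistency, 3), round(agreement, 3))
```

The consistency form ignores a constant offset between raters, whereas the agreement form penalizes it, which is why the two coefficients are reported separately.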
Audio analysis
F0 analysis was based on a previous approach [7]. In brief, after a signal cleanup that was required due to the acoustic limitations of video camera microphone recording, time-varying information of F0 from the acoustic signal of the sustained tones in the representative middle pitch-register (for the respective instrument) was extracted from both left and right face recording using Harvest [9]. A sustained F0 signal of 1 s without obvious estimation error was extracted, and then the standard deviation (SD) was computed from the F0 signal. This value was defined as a variable representing the fluctuation of the time-varying F0 signal. The average value of the left/right body side recording was defined as a variable for statistical analysis (Table 1).
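The variability measure described above can be sketched as follows. The sketch starts from an already extracted F0 contour (it does not perform the Harvest estimation or the signal cleanup) and assumes a 5 ms frame period, a sample standard deviation, and synthetic contours; all of these, as well as the function names, are illustrative assumptions rather than details from the study.

```python
import math
import statistics

# Sketch: SD of a 1 s F0 segment, then the mean of left- and right-side
# recordings. Frame period, SD type, and the contours are assumptions.


def f0_fluctuation(f0_hz, frame_period_s, start_s, window_s=1.0):
    """Sample SD (Hz) of the F0 contour within a window of window_s seconds."""
    first = round(start_s / frame_period_s)
    count = round(window_s / frame_period_s)
    segment = f0_hz[first:first + count]
    if len(segment) < count:
        raise ValueError("contour too short for the requested window")
    return statistics.stdev(segment)


def patient_sdf0(sd_left, sd_right):
    """Average the SDs from the left- and right-side recordings."""
    return (sd_left + sd_right) / 2


# Synthetic 2 s contours around 440 Hz with a slow sinusoidal wobble.
frame_period = 0.005  # assumed 5 ms frames
frames = 400
left = [440 + 3.0 * math.sin(2 * math.pi * 4 * i * frame_period) for i in range(frames)]
right = [440 + 1.0 * math.sin(2 * math.pi * 4 * i * frame_period) for i in range(frames)]

sd_left = f0_fluctuation(left, frame_period, start_s=0.5)
sd_right = f0_fluctuation(right, frame_period, start_s=0.5)
print(patient_sdf0(sd_left, sd_right))
```

In practice the window start would be chosen by inspecting the contour for a stretch without obvious estimation errors, as described in the text.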
Statistical analysis
The following attributes were evaluated, with benchmarks given in brackets: 1) Acceptability: floor/ceiling effects (< 20%) [10] and data distribution characteristics (skewness; from -1 to 1) [11]; 2) Internal consistency (Cronbach’s α; > 0.70) [12] and corrected item-total correlation (Pearson’s r > 0.40) [10,11]; 3) Interrater reliability expressed by intraclass correlation coefficients (ICC) for the EDSRS (considered satisfactory if > 0.70) [12], and secondarily for the scale’s six items by means of Krippendorff’s α (considered satisfactory if > 0.60) [11,13]; 4) Intrarater reliability expressed by ICC for the EDSRS (considered satisfactory if > 0.70) [12]; 5) Precision as estimated through the standard error of measurement (SEM = SD × √(1 − r); r = reliability coefficient; acceptable value < ½ × SD) [10,14]; and 6) Convergent validity with F0 variability as an objective measure of the construct by Pearson correlation (considered satisfactory if r > 0.50) [11,15].
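Three of the listed metrics are simple enough to sketch directly: Cronbach’s α, the corrected item-total correlation, and the SEM. The code below is a minimal illustration using the textbook formulas and sample variances; the function names and the item scores are hypothetical, and the software actually used for the analysis is not stated in the text.

```python
import math
import statistics

# Sketch of three metrics under textbook definitions (assumed, not confirmed
# as the authors' implementation). Item scores below are hypothetical.


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cronbach_alpha(items):
    """items: one list of scores per item, each covering the same subjects."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))


def corrected_item_total(items, index):
    """Correlate one item with the sum of all remaining items."""
    rest = [sum(scores) - scores[index] for scores in zip(*items)]
    return pearson_r(items[index], rest)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r a reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical ratings: six items (rows) for five patients (columns).
items = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 4, 4],
    [1, 1, 2, 3, 4],
    [0, 2, 2, 3, 4],
    [0, 1, 3, 3, 4],
    [0, 1, 2, 3, 3],
]
print(cronbach_alpha(items))
print([corrected_item_total(items, i) for i in range(len(items))])
print(standard_error_of_measurement(8.15, 0.93))
```

The last call uses an SD of 8.15 and a reliability of 0.93 purely as example inputs; with rounded inputs like these, the result can differ slightly from a value computed on unrounded coefficients.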
Data from medical records characterizing the cohort and rating results are given in Table 1 (single item results are additionally provided in Supplementary Table 1 in the online-only Data Supplement). With regard to EDSRS ratings (2 raters × 2 time points), 1) data distribution characteristics by skewness (-0.31–0.92), as well as floor (across ratings, 8.8%; range 5.9%–11.8%) or ceiling effects (across ratings, 8.8%; range 0%–17.6%), were within benchmarks across EDSRS ratings. 2) Internal consistency was satisfactory across EDSRS ratings (Cronbach’s α = 0.975−0.983). Each of the six items of the EDSRS exceeded the 0.40 threshold value for corrected item-total correlations (r = 0.90−0.96). 3) Interrater reliability for the EDSRS was satisfactory (ICC [3 to 2]: agreement 0.94, 95% confidence interval (CI) [0.64–0.98]; consistency 0.96, 95% CI [0.90–0.99]). For the six items of the scale, reliability was also within benchmark with Krippendorff’s α between 0.64 and 0.90 across ratings, with the highest values for sustained tones (fourths α = 0.64−0.72, scales α = 0.69−0.75, sustained tones α = 0.71−0.90). 4) The intrarater reliability of the EDSRS between the first and second application was also satisfactory (ICC [1 to 1] 0.93/0.87, 95% CI [0.82–0.97]/[0.68–0.95]). 5) The SEM was below one-half of the SD (SD at baseline 8.15/7.31; SEM 2.19/2.65). 6) Convergent (construct) validity against the measure of fundamental frequency (F0) variability was also adequate (r = 0.55–0.60, p = 0.011–0.023; Supplementary Figure 3 in the online-only Data Supplement).
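The floor and ceiling effects can be recomputed from the total scores listed in Table 1. The sketch below does this under two assumptions: that the four rating columns of Table 1 are ordered rater 1 T1, rater 1 T2, rater 2 T1, rater 2 T2, and that floor and ceiling mean the minimum (0) and maximum (24) total score. Variable and function names are illustrative.

```python
# Sketch: share of patients at the scale minimum (floor) and maximum (ceiling)
# per rating, using the EDSRS totals from Table 1 (patients P1 to P17).
# Assumption: column order is R1 T1, R1 T2, R2 T1, R2 T2.
R1_T1 = [12, 2, 18, 4, 16, 23, 14, 6, 0, 20, 21, 22, 16, 18, 16, 12, 17]
R1_T2 = [12, 0, 18, 4, 12, 23, 10, 2, 0, 18, 16, 21, 18, 15, 9, 2, 17]
R2_T1 = [16, 0, 24, 7, 17, 24, 17, 5, 0, 23, 17, 20, 20, 23, 20, 11, 24]
R2_T2 = [16, 4, 24, 8, 14, 24, 17, 4, 1, 23, 16, 20, 20, 20, 17, 0, 24]

SCALE_MIN, SCALE_MAX = 0, 24
BENCHMARK = 0.20  # floor/ceiling effects should stay below 20%


def share_at(scores, value):
    """Fraction of scores equal to the given value."""
    return sum(1 for s in scores if s == value) / len(scores)


ratings = {"R1 T1": R1_T1, "R1 T2": R1_T2, "R2 T1": R2_T1, "R2 T2": R2_T2}
for label, scores in ratings.items():
    floor = share_at(scores, SCALE_MIN)
    ceiling = share_at(scores, SCALE_MAX)
    within = floor < BENCHMARK and ceiling < BENCHMARK
    print(f"{label}: floor {floor:.1%}, ceiling {ceiling:.1%}, within benchmark: {within}")

# Pooled shares across all four ratings.
pooled = [s for scores in ratings.values() for s in scores]
print(f"pooled floor {share_at(pooled, SCALE_MIN):.1%}")
print(f"pooled ceiling {share_at(pooled, SCALE_MAX):.1%}")
```

Because the scale is a bounded sum score, checking both ends in this way shows whether the scale can still discriminate between the mildest and the most severely affected patients.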
In the present study, we validated a composite score (EDSRS) that quantitatively measures ED-related impairment based on an audio-visual rating of patients. Through this score, we aim to overcome the lack of a valid and reliable clinical score for the estimation of symptom severity in ED, which to date is a major obstacle for the structured clinical assessment of this type of MD. By assessing three modes of playing across registers that require different techniques of the embouchure (i.e., sustained notes, scales, and fourths) from both the left and right sides on a five-point Likert scale, the scale considers that playing impairment is not specific to one certain way of playing [2]. The EDSRS showed high internal consistency and interrater and intrarater reproducibility. Together with a low SEM, the EDSRS thus may prove to be a reliable tool for the quantification of ED severity in daily clinical practice. Furthermore, the significant association of the EDSRS with the fluctuation of the fundamental frequency as an objective correlate of ED symptoms [7,16] indicated construct validity. While a purely technical rating of disease severity in ED based on such correlates has also been proposed [8], this is technically challenging in both acquisition and processing and hence to date not applicable in clinical routine; neither an automated application for such sound analysis nor specialized technical equipment for such approaches is broadly available.
For the first time, we present a clinical rating score for ED that fulfills three of four criteria for scores assessing MD proposed by Spector and Brandfonbrener [17]: The EDSRS is 1) reliable and valid, 2) specifically designed for MD since it assesses symptom severity at the instrument with tasks that induce dystonia, and 3) practical in a clinical setting. Indeed, Comella et al. [18] showed that rating scales that are too complex are not considered useful for clinical applications but rather for clinical studies. However, as a limiting aspect, we could not make a statement regarding the fourth proposed criterion, sensitivity to change. One reason is that treatment options for MD are limited and highly individualized. Thus, no standardized intervention exists against which an improvement could be validated. However, future research should aim to address this criterion.
One strength of the EDSRS is that its application takes 5–6 min and that it can be applied during a clinical consultation, which we consider feasible in daily practice. Although not necessary for rating in clinical settings, additional audiovisual recording of the six items of the EDSRS does not require much extra effort and resources, yet allows for the additional assessment of F0 fluctuations and makes the score easily usable for clinical trials. Naturally, setting up the technical equipment for such optional recordings may require some additional time investment beyond solely the EDSRS application time. Furthermore, in the case of the use of recordings for blinded or external ratings (e.g., in clinical trials), scale application on such recordings would have to be done offline after acquisition (similarly as done for this study).
Our aim was to show that the EDSRS is a valid and reliable tool for assessing ED when applied by experts in musicians’ medicine to whom most of the musicians with ED are referred. One limitation is therefore that we cannot make a statement regarding the generalizability of the EDSRS if it is applied by non-experts in musicians’ medicine. Future studies should aim to broaden the applicability of the scale. Another limitation of the study is that not all phenotypes of ED were present. Future prospective studies may therefore aim to 1) assess the sensitivity of the EDSRS to change, 2) apply it to a larger sample of patients for external validation, ideally including all phenotypes, and 3) assess whether a short version can be derived.
A key challenge in ED is that 12 muscles of the embouchure, laryngeal muscles or the tongue may be involved [2,19], which makes the detection of overt abnormal movement patterns more difficult than in MD of the upper extremity. In the latter condition, the affected fingers can usually be determined by carefully observing the abnormal movements [1], and therefore, this has been a key measure in validated scores for MD of the upper limb as well as in other focal dystonias [3,18]. We addressed this challenge by developing an audiovisual rating scale of performance, which is specifically designed for ED and quantitatively assesses impairment of performance. We showed that this scale is valid and reliable, and we consider it to be suitable for application in everyday clinical routines as well as in clinical studies at clinics specialized in musicians’ medicine.
The online-only Data Supplement is available with this article at
Supplementary Table 1.
Scores for the six EDSRS items for each participant and rating session
Supplementary Figure 1.
Musical tasks performed for Embouchure Dystonia Severity Rating Scale (EDSRS) rating. Ascending and descending fourths, sustained tones and ascending and descending scales in low, medium and high pitch-registers typical for the respective instrument—French horn (“Horn”), trombone (“Posaune”), or trumpet (“Trompete”).
Supplementary Figure 2.
Scoring sheet. Rating is performed as outlined in the instructions. Then, the symptoms during performance of the three task categories (scales, fourths, and sustained tones) across all registers are rated separately for each body side. The sum of all item ratings represents the total score. L, left; R, right.
Supplementary Figure 3.
Scatter plots of correlation analyses performed for assessment of convergent validity between average ratings of reviewers 1 and 2 (R1, R2) and the F0 variability (SDF0). Significance level for the Pearson correlation analyses was set at p < 0.025 (0.05/2, Bonferroni-corrected). EDSRS, Embouchure Dystonia Severity Rating Scale.

Conflicts of Interest

The authors have no financial conflicts of interest.

Funding Statement


Author Contributions

Conceptualization: Tobias Mantel, André Lee. Data curation: Tobias Mantel. Formal analysis: Tobias Mantel, André Lee, Shinichi Furuya, Masanori Morise. Investigation: Tobias Mantel, André Lee, Shinichi Furuya, Masanori Morise. Methodology: Tobias Mantel, André Lee, Masanori Morise. Software: Masanori Morise, Tobias Mantel. Supervision: Tobias Mantel, Masanori Morise, Eckart Altenmüller, Bernhard Haslinger. Visualization: Tobias Mantel, André Lee. Writing—original draft: Tobias Mantel, André Lee. Writing—review & editing: all authors.

We thank all musicians for taking part in this study.
Figure 1.
Field of view used during anonymized video recordings, illustrated for a French horn player from the left and right lateral viewing angle, respectively.
Table 1.
Patients’ demographic, clinical characteristics and EDSRS scoring results
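| Subject | Sex | Age (yr) | Main instrument | Profession | Age at start (yr) | Dystonia: age at onset (yr) | Dystonia: duration (yr) | Dystonia: phenotype | EDSRS R1 T1 | EDSRS R1 T2 | EDSRS R2 T1 | EDSRS R2 T2 | SDF0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P1 | M | 39 | Trumpet | Teacher | 8 | 29 | 10 | LP | 12 | 12 | 16 | 16 | 11.17 |
| P2 | M | 36 | Trumpet | Teacher | 8 | 25 | 11 | LP | 2 | 0 | 0 | 4 | 1.60 |
| P3 | F | 20 | Trombone | Student | 14 | 20 | 1 | LP | 18 | 18 | 24 | 24 | 11.02 |
| P4 | M | 50 | Trumpet | Teacher | 10 | 48 | 2 | TS | 4 | 4 | 7 | 8 | 4.07 |
| P5 | M | 61 | Trombone | Orchestra | 13 | 50 | 11 | LP | 16 | 12 | 17 | 14 | 5.61 |
| P6 | M | 37 | Trombone | Orchestra | 15 | 37 | 0 | LP | 23 | 23 | 24 | 24 | 9.99 |
| P7 | M | 58 | French horn | Orchestra | 10 | 57 | 1 | LP | 14 | 10 | 17 | 17 | 3.92 |
| P8 | M | 29 | Trombone | Orchestra | 10 | 25 | 4 | LP | 6 | 2 | 5 | 4 | 1.39 |
| P9 | M | 59 | Trumpet | Orchestra | 12 | 28 | 12 | LP | 0 | 0 | 0 | 1 | 1.95 |
| P10 | M | 62 | French horn | Orchestra | 15 | 59 | 2 | LP | 20 | 18 | 23 | 23 | 7.06 |
| P11 | M | 47 | Trombone | Orchestra | 13 | 46 | 2 | LP | 21 | 16 | 17 | 16 | 2.11 |
| P12 | F | 57 | French horn | Orchestra | 12 | 42 | 10 | LP | 22 | 21 | 20 | 20 | 6.65 |
| P13 | M | 45 | Trombone | Orchestra | 15 | 34 | 7 | LP | 16 | 18 | 20 | 20 | 4.05 |
| P14 | M | 54 | Trombone | Orchestra | 10 | 54 | 12 | LP | 18 | 15 | 23 | 20 | 1.67 |
| P15 | M | 29 | Trombone | Orchestra, teacher | 10 | 26 | 3 | LP | 16 | 9 | 20 | 17 | 5.46 |
| P16 | M | 36 | French horn | Orchestra | 10 | 36 | 0 | LP | 12 | 2 | 11 | 0 | 4.90 |
| P17 | M | 49 | Trombone | Teacher | 7 | 45 | 4 | LP | 17 | 17 | 24 | 24 | 7.15 |
| Mean | NA | 45.2 | NA | NA | 11.3 | 38.8 | 5.4 | NA | 13.9 | 11.6 | 15.8 | 14.8 | 5.28 |
| SD | NA | 12.7 | NA | NA | 2.6 | 12.3 | 4.6 | NA | 7.04 | 7.60 | 8.21 | 8.31 | 3.23 |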

Demographic and clinical characteristics as well as Embouchure Dystonia Severity Rating Scale (EDSRS) values for each patient during the first (T1) and second (T2) rating session are given for each rater (R1, R2). Furthermore, averaged (left- and right-face recordings) variability of the fundamental frequency (SDF0) from the acoustic signal of the sustained tones in the representative middle pitch-register is presented. SD, standard deviation; NA, not applicable; LP, lip-pull; TS, tongue-stop; M, male; F, female.

  • 1. Altenmüller E, Lee A, Jabusch HC. [Musician’s dystonia: phenomenology, causes, differential diagnoses and treatment options]. Musikphysiologie und Musikermedizin 2019;1:13–27. German.
  • 2. Frucht SJ. Embouchure dystonia--Portrait of a task-specific cranial dystonia. Mov Disord 2009;24:1752–1762.
  • 3. Peterson DA, Berque P, Jabusch HC, Altenmüller E, Frucht SJ. Rating scales for musician’s dystonia: the state of the art. Neurology 2013;81:589–598.
  • 4. Mantel T, Altenmüller E, Li Y, Lee A, Meindl T, Jochim A, et al. Structure-function abnormalities in cortical sensory projections in embouchure dystonia. Neuroimage Clin 2020;28:102410.
  • 5. Mantel T, Dresel C, Altenmüller E, Zimmer C, Noe J, Haslinger B. Activity and topographic changes in the somatosensory system in embouchure dystonia. Mov Disord 2016;31:1640–1648.
  • 6. Haslinger B, Noé J, Altenmüller E, Riedl V, Zimmer C, Mantel T, et al. Changes in resting-state connectivity in musicians with embouchure dystonia. Mov Disord 2017;32:450–458.
  • 7. Lee A, Furuya S, Morise M, Iltis P, Altenmüller E. Quantification of instability of tone production in embouchure dystonia. Parkinsonism Relat Disord 2014;20:1161–1164.
  • 8. Morris AE, Norris SA, Perlmutter JS, Mink JW. Quantitative, clinically relevant acoustic measurements of focal embouchure dystonia. Mov Disord 2018;33:449–458.
  • 9. Morise M. Harvest: a high-performance fundamental frequency estimator from speech signals. In: Proc. Interspeech; 2017 August 20-24; Stockholm: ISCA; 2017. p. 2321-2325.
  • 10. Hobart J, Cano S. Improving the evaluation of therapeutic interventions in multiple sclerosis: the role of new psychometric methods. Health Technol Assess 2009;13:iii, ix-x, 1-177.
  • 11. Portney LG, Watkins MP. Foundations of clinical research: applications to practice, 4th ed. Saddle River: Pearson/Prentice Hall; 2020.
  • 12. Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, et al. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002;11:193–205.
  • 13. Wyrwich KW, Bullinger M, Aaronson N, Hays RD, Patrick DL, Symonds T, et al. Estimating clinically significant differences in quality of life outcomes. Qual Life Res 2005;14:285–295.
  • 14. Abma IL, Rovers M, van der Wees PJ. Appraising convergent validity of patient-reported outcome measures in systematic reviews: constructing hypotheses and interpreting outcomes. BMC Res Notes 2016;9:226.
  • 15. Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol 2012;8:23–34.
  • 16. Uehara K, Furuya S, Numazawa H, Kita K, Sakamoto T, Hanakawa T. Distinct roles of brain activity and somatotopic representation in pathophysiology of focal dystonia. Hum Brain Mapp 2019;40:1738–1749.
  • 17. Spector JT, Brandfonbrener AG. Methods of evaluation of musician’s dystonia: critique of measurement tools. Mov Disord 2007;22:309–312.
  • 18. Comella CL, Leurgans S, Wuu J, Stebbins GT, Chmura T; The Dystonia Study Group. Rating scales for dystonia: a multicenter assessment. Mov Disord 2003;18:303–312.
  • 19. Iltis PW, Frahm J, Altenmüller E, Voit D, Joseph A, Kozakowski K. Tongue position variability during sustained notes in healthy vs dystonic horn players using real-time MRI. Med Probl Perform Art 2019;34:33–38.
