Development of a new instrument to assess clinical performance of residents in dermatology-venereology department


Submitted: 20 March 2020
Accepted: 28 July 2020
Published online: 5 January, TAPS 2021, 6(1), 70-82
https://doi.org/10.29060/TAPS.2021-6-1/OA2241

Sandra Widaty1, Hardyanto Soebono2, Sunarto3 & Ova Emilia4

1Department of Dermatology and Venereology, Faculty of Medicine, Universitas Indonesia – Dr. Cipto Mangunkusumo Hospital, Indonesia; 2Department of Dermatology and Venereology, Faculty of Medicine, Universitas Gadjah Mada, Indonesia; 3Pediatric Department, Faculty of Medicine, Universitas Gadjah Mada, Indonesia; 4Medical Education Department, Faculty of Medicine, Universitas Gadjah Mada, Indonesia

Abstract

Introduction: Performance assessment of residents should be carried out with evaluation procedures informed by measurable and current educational standards. The present study aimed to develop, test, and psychometrically evaluate an instrument for assessing the clinical practice performance of Dermatology and Venereology (DV) residents.

Methods: This was a qualitative and quantitative study conducted from 2014 to 2016. A pilot instrument was developed by 10 expert examiners from five universities, who used it to rate four video-recorded clinical performances previously staged as good or poor. The next step was application of the instrument to evaluate residents, which was carried out by DV faculty at two universities.

Results: The instrument comprised 11 components. There was a statistically significant difference (p < 0.001) between good and bad performance. Cronbach’s alpha showed high overall reliability (α = 0.96) and good internal consistency (α = 0.90) for each component. The new instrument correctly evaluated 95.0% of poor performances. The implementation study showed that inter-rater reliability between evaluators ranged from low to high (highest correlation coefficient r = 0.79, p < 0.001).

Conclusion: The instrument is reliable and valid for assessing the clinical practice performance of DV residents. More studies are required to evaluate the instrument in different situations.

Keywords:            Instrument, Clinical Assessment, Performance, Resident, Dermatology-Venereology, Workplace-Based Assessment

Practice Highlights

  • Residents’ performance reflects their professionalism and competencies. Furthermore, clinical care provided in the Dermatology and Venereology field is unique; therefore, a standard instrument is needed to assess their performance.
  • The Dermatology-Venereology Clinical Practice Performance Examination instrument was shown to be reliable and valid in assessing residents’ clinical performance.

I. INTRODUCTION

    Performance assessment in medical clinical practice has been a great concern for medical education programmes worldwide (Holmboe, 2014; Khan & Ramachandran, 2012; Naidoo, Lopes, Patterson, Mead, & MacLeod, 2017). It is an accepted premise that performance may differ according to competency (Cate, 2014; Khan & Ramachandran, 2012). Performance also occurs within a domain; therefore, the assessment of performance should be separated from that of competency. Performance assessment of medical residents should also be informed by existing medical standards and performance criteria (Li, Ding, Zhang, Liu, & Wen, 2017; Naidoo et al., 2017).

    Assessment of residents during their training programme is an important issue in postgraduate medical education, which has declared formative evaluation and constructive feedback as priorities (World Federation for Medical Education, 2015). A hallmark of postgraduate medical specialist training is that it occurs in the workplace; therefore, the most appropriate measurement tools are Workplace-Based Assessments (WPBA). In medical education, these assessments emphasise results and professionalism (Boursicot et al., 2011; Joshi, Singh, & Badyal, 2017).

    In response to a standardisation programme for postgraduate medical specialist training (PMST), the World Federation for Medical Education (WFME) has published guidelines that have been adopted by several countries, including Indonesia (Indonesian College of Dermatology and Venereology, 2008; World Federation for Medical Education, 2015). Clinical care provided in the Dermatology and Venereology (DV) field is unique; a brief examination of the patient is often useful before taking a lengthy history (Garg, Levin, & Bernhard, 2012). Privacy is a top priority, especially for venereology patients, patients with communicable diseases, and those receiving cosmetic dermatology or skin surgery care.

    Until now, no standard instrument has been available for performance assessment in PMST in DV; therefore, a variety of assessments are in use, which may cause discrepancies (Jhorar, Waldman, Bordelon, & Whitaker-Worth, 2017). A valid and reliable method of assessment is required that can be used in various facilities and that addresses proficiency in both content and process (Kurtz, Silverman, Benson, & Drapper, 2003). Therefore, this study focused on the development of a residents’ clinical performance assessment based on established standards and principles, such as the WPBA approach and the WFME standards.

    II. METHODS

    A. Instrument Development

    The instrument was developed and tested using qualitative and quantitative study designs. It started with a solicitation of inputs regarding expected performance from a variety of stakeholders in DV: patients, nurses, laboratory staff, newly graduated DV specialists, DV practitioners, and faculty. A literature review was performed, which included various documents such as the educational programme standards for DV residents, and documentation on available assessment tools (Cate, 2014; Hejri et al., 2017; Norcini, 2010). The instrument was developed according to the current standards (Campbell, Lockyer, Laidlaw, & MacLeod, 2007; McKinley, Fraser, van der Vleuten, & Hastings, 2000).

    The resulting 11-item instrument was subsequently evaluated by faculty groups from various universities in Indonesia, and repeated revisions were carried out. Psychometric data for the instrument were obtained through independent evaluations of performance videos of the residents and through comparison of the results of the new instrument (Dermatology-Venereology Clinical Practice Performance Examination, DVP-Ex) with those of the comparison instrument; the design was thus a validation study. The final step was assessment of residents’ performance during clinical practice using the instrument, to evaluate its reliability and gather feedback. A flowchart of the study process is shown in Appendix A.

    B. Setting

    The present study was conducted at the Department of Dermatology and Venereology, Dr. Cipto Mangunkusumo Hospital, a teaching hospital of the Faculty of Medicine, Universitas Indonesia, from 2014 to 2016. The study was conducted in four steps. When developing the instrument (Step 1), we included faculty members from five medical faculties in Indonesia that have a DV residency programme (Universitas Indonesia, Universitas Sriwijaya (UNSRI), Universitas Padjajaran (UNPAD), Universitas Gadjah Mada, and Universitas Sam Ratulangi) through in-depth interviews and an expert panel. The study received ethical approval from the Research Ethics Committee of the Faculty of Medicine, Universitas Gadjah Mada (Number KE/FK/238/EC).

    The instrument that had been developed was sent to five senior faculty members from three universities (Department of Dermatology and Venereology, Universitas Indonesia, UNPAD and UNSRI) (Step 2). They were asked to give their assessments in order to establish face and content validity. As a test of criterion validity, we recruited 10 faculty members of the Faculty of Medicine, Universitas Indonesia, and randomised them into two groups. Randomisation was performed to prevent bias against the instruments being tested. One group used the DVP-Ex and the other used the current instrument. The single inclusion criterion was more than three years of teaching experience. After receiving some input, final corrections were made and training was provided for the faculty members who would use the instrument.

    C. Performance Video

    To obtain standardised performances, video recordings of the residents’ clinical practice were made. Two residents were recruited voluntarily, and a special team recorded their clinical practice performance using scenarios created by the first author (Campbell et al., 2007; McKinley et al., 2000).

    There were four videos, each of which showed the clinical practice performance of a resident presented with a difficult case (dermatomyositis) or a common case (borderline tuberculoid leprosy). Patients had to sign informed consent before being included in this study. A good standard of performance was demonstrated in the first and fourth video clips and a poor standard in the second and third clips. Activities presented in the scenarios were those associated with patient care (Campbell et al., 2007; Iobst et al., 2010). After the recording session was finished, the patients were managed accordingly and compensated for their participation.

    D. Training on the Performance Instrument

    An hour-long training session was provided for the 10 faculty members (the examiners). The faculty then practised scoring using the recorded video clips. During the training, we received some input and made the necessary corrections to the rubrics. No training was given for the comparison instrument because the entire faculty was already accustomed to it. Step 3 established the validity, reliability and accuracy of the performance instrument through a comparative study between the two assessment instruments, i.e. the performance (DVP-Ex) and control instruments, applied to the video recordings of the residents’ clinical practice performance.

     E. Implementation of Resident Performance Assessment with Performance Instrument

    This step aimed to evaluate the reliability of the instrument and the results of its implementation when used to assess residents (Step 4). The sample included residents of the Postgraduate Medical Specialist Training Programme in Dermatology and Venereology, Faculty of Medicine, Universitas Indonesia and Universitas Gadjah Mada (UGM), at the basic level (semester I in the clinical setting), intern level (semesters II–V) or independent level (semester VI or higher).

    The sample size was three to four residents per level per Faculty of Medicine, giving 20 residents in total. The evaluators were five lecturers per Faculty of Medicine, giving 10 evaluators, and each lecturer evaluated six residents.

    F. Data Collection

    One week after the training, the instrument was evaluated. Faculty members assessed the performance of the residents in the four video recordings at the same time. Three days later, the groups underwent a rotation to reassess the videos with whichever of the two instruments they had not already used. The examiners were asked to provide feedback and information on the ease of completing the instrument and the clarity of its instructions. For the implementation of resident performance assessment with the instrument, each resident was evaluated by three lecturers simultaneously. The lecturers were grouped randomly; therefore, every lecturer evaluated six of the ten residents from each group being assessed.

    G. Data Analysis

    The analyses aimed to evaluate validity, reliability, and precision of the instrument for discriminating the performance of the residents as poor, good, or excellent.

    H. Validity and Reliability

    A reliability test was performed, i.e. internal consistency of responses to the items in each field (Cronbach’s alpha coefficient). Face and content validity were assessed by addressing the relevant performance standards and criteria, and by optimising clarity of instructions, specific criteria, an acceptable format, gradation of responses, and correct and comprehensive answers (covering all assessed variables). The cut-off score of the instrument was determined using receiver operating characteristic (ROC) curve principles, which were then used to evaluate sensitivity, specificity, and positive and negative predictive values. The accuracy of the instrument was determined to evaluate its precision in distinguishing between good and poor performance.
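    As an illustration of these procedures, the sketch below shows how internal consistency (Cronbach’s alpha) and an ROC-based cut-off could be computed. The rating data, variable names, and the use of the Youden index are assumptions for demonstration only; this is not the study dataset or the authors’ actual analysis code.

```python
# Minimal sketch of the reliability and cut-off analyses described above,
# using hypothetical data (not the study dataset).
import numpy as np
from sklearn.metrics import roc_curve


def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = completed assessments, columns = the 11 instrument items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


rng = np.random.default_rng(0)
ratings = rng.integers(0, 5, size=(40, 11))      # hypothetical 0-4 rubric scores
print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")

# ROC-based cut-off: 'reference' is the known good/poor label of each performance,
# 'total' is the instrument score; the threshold maximising Youden's J
# (sensitivity + specificity - 1) is one common way to choose a cut-off.
reference = rng.integers(0, 2, size=40)           # 1 = good, 0 = poor (hypothetical)
total = ratings.sum(axis=1) + 20 * reference      # hypothetical total scores
fpr, tpr, thresholds = roc_curve(reference, total)
best = np.argmax(tpr - fpr)
print(f"Cut-off: {thresholds[best]:.1f}, sensitivity {tpr[best]:.2f}, "
      f"specificity {1 - fpr[best]:.2f}")
```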

    I. Statistical Analysis

    The statistical analysis was performed using SPSS 11.5 software. Total assessment scores of each examiner were analysed using analysis of variance (ANOVA). Internal consistency was determined using Cronbach’s α, and Spearman correlation analysis was performed to obtain p-values for validity. Accuracy was determined by comparing the pass/fail results of the instrument with the video type. To assess the intergroup difference, McNemar’s test and kappa analysis were carried out. Qualitative analysis was also performed, especially to evaluate the feedback, through several analytical steps.
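    For concreteness, the short sketch below shows how two of the tests named above (one-way ANOVA across the four videos and a Spearman correlation between two examiners) could be run in Python; the study itself used SPSS 11.5, and the scores and variable names here are hypothetical.

```python
# Sketch of the ANOVA and Spearman analyses described above, on hypothetical scores.
import numpy as np
from scipy.stats import f_oneway, spearmanr

rng = np.random.default_rng(1)
# Total scores given by 10 examiners to each of the four videos (hypothetical
# means roughly in the ranges reported later in Table 1).
video_scores = [rng.normal(mean, 10, size=10) for mean in (87, 34, 25, 82)]
f_stat, p_value = f_oneway(*video_scores)
print(f"ANOVA across videos: F = {f_stat:.2f}, p = {p_value:.4f}")

# Spearman correlation between two examiners' scores on the 11 components.
examiner_a = rng.integers(0, 5, size=11)
examiner_b = np.clip(examiner_a + rng.integers(-1, 2, size=11), 0, 4)
rho, p = spearmanr(examiner_a, examiner_b)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```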

    III. RESULTS

    A performance instrument was developed with 11 competency components, for which evaluation responses were given in the form of a rubric scale (Appendix B). All 10 faculty members completed an assessment of each of the four videos. Eight examiners had more than three years of teaching experience, and five examiners were DV consultants.

    A. Validity

    Face, content, and construct validity remain solid points of reference for validity evaluation (Colliver, Conlee, & Verhulst, 2012; Johnson & Christensen, 2008). Face and content validity were evaluated by five experts from three universities, and their evaluation was used to improve the instrument. The rubric scale described the capacity of residents to perform activities according to the Standard of Competencies for DV specialists and the performance domains of physicians; on this basis, the experts judged the instrument to have good face, content and construct validity.

    The results of the assessments made on the performance videos with the DVP-Ex showed that the examiners agreed that the performances in the first and fourth videos (the “good” videos) were good (score >60); conversely, the second and third videos (the “bad” ones) were evaluated as poor performance by 10 and 9 out of 10 faculty members, respectively (Table 1).

    Video   Mean (Score)   N    Standard Deviation   Median   Minimum   Maximum   Score >60   Score <60
    1       87.45          10   12.59                89.44    56.00     100.00    90%         10%
    2       33.54          10   15.77                35.92    4.17      51.85     0%          100%
    3       25.31          10   16.84                25.00    3.70      64.00     10%         90%
    4       81.96          10   9.06                 84.25    66.67     96.29     100%        0%

    Note: Chi-square and Kruskal–Wallis tests, p < 0.001

    Table 1. Assessment scores for each of the four videos (n = 10)

    Faculty members also gave feedback suggesting that the instrument would be useful for assessing residents’ performance. They also commented that the instrument was more objective than the one currently in use, that it was challenging in that they had to read the instrument carefully in order to use it properly, and that the response options allowed several aspects of the residents’ performance to be assessed.

    B. Validity and Reliability

    The validity of the instrument was measured using Spearman analyses, which showed significant results for all competency components (p < 0.001). Reliability was measured as the correlation between each item score and the total score on all relevant items (Cohen, Manion, & Morrison, 2008). Our analysis revealed good overall reliability, with Cronbach’s α = 0.96. All components of competency achieved internal reliability scores >0.95 (Table 2). The correlations between the individual competency component scores and the overall score ranged from good to excellent (0.64–0.99).

    No   Competency Component   Corrected Item-Total Correlation   Alpha if Item Deleted (Cronbach’s α = 0.96)
    1    C1                     0.76                               0.96
    2    C2                     0.81                               0.96
    3    C3                     0.79                               0.96
    4    C4                     0.76                               0.96
    5    C5                     0.84                               0.96
    6    C6                     0.82                               0.96
    7    C7                     0.88                               0.96
    8    C8                     0.99                               0.95
    9    C9                     0.64                               0.96
    10   C10                    0.90                               0.95
    11   C11                    0.89                               0.96

    Note: C1 = history-taking, C2 = effective communication, C3 = physical examination, C4 = workup, C5 = diagnosis/differential diagnosis, C6 = DV management, C7 = information and education, C8 = data documentation on medical record, C9 = multidisciplinary consultation, C10 = self-development/transfer of knowledge, C11 = introspective, ethical, and professional attitude

    Table 2. Analysis of internal consistency for each competency component
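    The item-level statistics in Table 2 can be illustrated with the sketch below, which computes, for simulated item scores (not the study data), each component’s corrected item-total correlation and Cronbach’s alpha if the item is deleted; the data-generation step is purely an assumption for demonstration.

```python
# Sketch of corrected item-total correlations and alpha-if-item-deleted,
# computed on simulated 0-4 rubric scores for 11 components.
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))


rng = np.random.default_rng(3)
base = rng.normal(0, 1, size=(40, 1))                       # shared "ability" factor
items = np.clip(np.round(2 + base + rng.normal(0, 0.7, size=(40, 11))), 0, 4)

for i in range(items.shape[1]):
    rest = np.delete(items, i, axis=1)                      # all items except item i
    r_item_total = np.corrcoef(items[:, i], rest.sum(axis=1))[0, 1]
    alpha_deleted = cronbach_alpha(rest)
    print(f"C{i + 1}: corrected item-total r = {r_item_total:.2f}, "
          f"alpha if deleted = {alpha_deleted:.2f}")
```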

    Results from instrument   Video type: Good   Video type: Poor   Total
    Passed                    19                 1                  20
    Failed                    1                  19                 20
    Total                     20                 20                 40

    Note: McNemar’s test: p = 0.50; Kappa analysis: κ = 0.90, p < 0.001; accuracy = 95%

    Table 3. Comparison of the results from the DVP-Ex instrument and video type (n = 40)

    It can be concluded that the DVP-Ex instrument was able to accurately classify the clinical practice performance demonstrated in the videos (Table 3). The control instrument classified performance with 80% accuracy, which still makes it a useful tool for assessment during the clinical years (Table 4). Taken together, these data indicate that the DVP-Ex was better than the control instrument in assessing the videos, with superior accuracy (95% vs 80%) and better agreement with the video type (κ = 0.90 vs 0.60).

    Results from instrument   Video type: Good   Video type: Poor   Total
    Passed                    18                 6                  24
    Failed                    2                  14                 16
    Total                     20                 20                 40

    Note: McNemar’s test: p = 0.289; Kappa analysis: κ = 0.60, p < 0.001; accuracy = 80%

    Table 4. Comparison of the results from the control instrument and video type (n = 40)
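    As a check on the figures above, the sketch below recomputes the agreement statistics directly from the 2×2 counts in Tables 3 and 4. Accuracy and Cohen’s kappa reproduce the reported values; the exact McNemar p-value may differ from the published one depending on the variant of the test applied, so it is shown only for completeness.

```python
# Recompute accuracy, Cohen's kappa and McNemar's test from the 2x2 counts
# (rows: instrument passed/failed; columns: video good/poor).
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar


def agreement_stats(table: np.ndarray) -> None:
    n = table.sum()
    p_o = np.trace(table) / n                                # observed agreement = accuracy
    p_e = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2   # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    mc = mcnemar(table, exact=True)
    print(f"accuracy = {p_o:.0%}, kappa = {kappa:.2f}, McNemar p = {mc.pvalue:.3f}")


agreement_stats(np.array([[19, 1], [1, 19]]))   # Table 3 (DVP-Ex)
agreement_stats(np.array([[18, 6], [2, 14]]))   # Table 4 (control instrument)
```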

    C. Implementation of the Instrument

    Using the cut-off score of 60, reliability was tested pairwise among the evaluators, i.e. between Evaluators I and II (PI-II), Evaluators I and III (PI-III), and Evaluators II and III (PII-III). The results are shown in Table 5.

     

     

                                                Evaluator I   Evaluator II   Evaluator III
    Evaluator I     Correlation coefficient     1.00          0.59**         0.49
                    p value                     .             0.01           0.07
                    N                           20            20             14
    Evaluator II    Correlation coefficient     0.59**        1.00           0.79**
                    p value                     0.006         .              0.001
                    N                           20            20             14
    Evaluator III   Correlation coefficient     0.49          0.79**         1.00
                    p value                     0.07          0.001          .
                    N                           14            14             20

    Note: ** significant correlation

    Table 5. Analysis of reliability of the performance instrument with Spearman’s rho correlation
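    The pairwise inter-evaluator analysis summarised in Table 5 can be sketched as below; the residents’ total scores are simulated (the raw study scores are not reported here), and Spearman’s rho with its p-value is computed for each evaluator pair.

```python
# Sketch of pairwise Spearman inter-evaluator reliability on simulated total scores.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
true_ability = rng.normal(70, 15, size=20)          # 20 residents (hypothetical)
scores = pd.DataFrame({
    f"Evaluator {label}": true_ability + rng.normal(0, 8, size=20)
    for label in ("I", "II", "III")
})

for a, b in [("Evaluator I", "Evaluator II"),
             ("Evaluator I", "Evaluator III"),
             ("Evaluator II", "Evaluator III")]:
    rho, p = spearmanr(scores[a], scores[b])
    print(f"{a} vs {b}: rho = {rho:.2f}, p = {p:.3f}")
```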

    D. Feedback on Assessment with the Performance Instrument

    Most feedback concerned skills and the process of the clinical practice being performed. In contrast to the results of another study suggesting that most feedback addresses communication (Pelgrim, Kramer, Mokkink, & van der Vleuten, 2012), only 5% of examiners’ remarks mentioned a need to improve communication skills. Additionally, 20% of examiner comments mentioned the importance of attitude, especially as a part of effective communication.

    IV. DISCUSSION

    The present study was conducted to develop a WPBA instrument to assess clinical practice performance and to obtain psychometric data on the instrument. The DVP-Ex can easily be used by faculty members, and early psychometric evaluation has demonstrated promising levels of validity and reliability.

    We found that examiners experienced some difficulties in completing the instrument; therefore, repeated training is necessary. Further workup or laboratory examination (C4), multidisciplinary consultation (C9), and knowledge transfer and self-development (C10) were not always scored because they were not observable in every clinical encounter. However, these components (C4, C9, and C10) are important and are not assessed at all by other WPBA instruments (Norcini & Burch, 2007; Norcini, 2010).

    The validity evaluation through face and content validity was performed by the experts, who approved the content and construction of the instrument and its relevance to the competencies and performance of physicians. Moreover, the consistency of the examiners in evaluating the performance videos provides further evidence that the instrument is appropriate for DV residents. Analysis of internal consistency provided ample evidence of the instrument’s reliability. Additionally, the DVP-Ex’s 95% success rate in categorising poor performance as failing offers yet another converging piece of evidence of the instrument’s validity for identifying residents who are struggling.

    In the implementation step, not all inter-evaluator reliability values were good, which might be caused by the evaluators’ unfamiliarity with the performance instrument; therefore, more intensive training on how to use the instrument may improve inter-evaluator reliability. The relationship between evaluators’ familiarity with an instrument and its reliability has been discussed in various studies (Boursicot et al., 2011). A special strategy is required to produce a successful assessment process (Kurtz et al., 2003). Full participation in the assessment process and training, including the provision of feedback, is needed (Norcini & Burch, 2007).

    The promising results for this instrument’s ability to differentiate poor and good performance could be the basis for further studies assessing the formative functions of the instrument through repeated assessment of the same resident by several examiners. In addition, further studies are needed to determine whether this instrument can also be used as a summative tool. Limitations of the study are that some of the experts were from the same university as the residents, which could bias the assessment, and that no training was given on the level of questioning. In addition, substantial training and standardisation of the assessors will be needed if this instrument is to be used in a larger population.

    V. CONCLUSION

    The DVP-Ex is a reliable and valid instrument for assessing DV residents’ clinical performance. With intensive training for the evaluators, this instrument can correctly classify a poor clinical practice performance as a failed performance according to applicable standards. Therefore, it can improve the DV education programme.

    Notes on Contributors

    Sandra Widaty is a dermato-venereologist consultant and a fellow of the Asia Academy of Dermatology and Venereology. She is a faculty member in the Dermatology and Venereology Postgraduate Training and Medical Education Departments of the Faculty of Medicine, Universitas Indonesia. She is the main investigator in this study.

    Hardyanto Soebono is a professor and faculty member in the Dermatology and Venereology and Medical Education Departments of the Faculty of Medicine, Universitas Gadjah Mada. He has published extensively in both fields. He contributed to the conceptual development and data analysis, including approving this final manuscript.

    Sunarto is a faculty member who teaches residents in the Pediatrics Department. He has conducted extensive research and published widely in the field of medical education. He contributed to conceptual development and editing, including approving this final manuscript.

    Ova Emilia received her PhD in Medical Education. She teaches in the doctoral programme in Medical Education. Currently, she is the Dean of the Faculty of Medicine, Universitas Gadjah Mada. She contributed to the conceptual development, data analysis and editing, including approving this final manuscript.

    Ethical Approval

    This study received ethical approval from the Research Ethics Committee of the Faculty of Medicine, Universitas Gadjah Mada (Number KE/FK/238/EC).

    Acknowledgement

    The authors would like to thank Joedo Prihartono for the statistical calculation and analysis. 

    Funding

    No funding was received for this study.

    Declaration of Interest

    All authors declared no conflict of interest. 

    References

    Boursicot, K., Etheridge, L., Setna, Z., Sturrock, A., Ker, J., Smee, S., & Sambandam, E. (2011). Performance in assessment: Consensus statement and recommendations from the Ottawa conference. Medical Teacher, 33(5), 370-383.

    Campbell, C., Lockyer, J., Laidlaw, T., & MacLeod, H. (2007). Assessment of a matched-pair instrument to examine doctor – Patient communication skills in practising doctors. Medical Education, 41(2), 123- 129.

    Cate, O. T. (2014). Competency-based postgraduate medical education: Past, present and future. GMS Journal for Medical Education, 34(5), 1-13.

    Cohen, L., Manion, L., & Morrison, K. (2008). Research Methods in Education (6th ed.). London: Routledge.

    Colliver, J. A., Conlee, M. J., & Verhulst, S. J. (2012). From test validity to construct validity and back? Medical Education, 46(4), 366-371.

    Garg, A., Levin, N. A., & Bernhard, J. D. (2012). Structure of skin lesions and fundamentals of clinical diagnosis. In L. A. Goldsmith, S. I. Katz, B. A. Gilchrest, A. S. Paller, D. J. Leffel & K. Wolff (Eds.), Fitzpatrick’s Dermatology in General Medicine (8th ed.). New York: McGraw-Hill Medical.

    Hejri, S. M., Jalili, M., Shirazi, M., Masoomi, R., Nedjat, S., & Norcini, J. (2017). The utility of mini-clinical evaluation exercise (mini-CEx) in undergraduate and postgraduate medical education: Protocol for a systematic review. Systematic Reviews, 6(1), 146-53.

    Holmboe, E. S. (2014). Work-based assessment and co-production in postgraduate medical training. GMS Journal for Medical Education, 34(5), 1-15.

    Indonesian College of Dermatology and Venereology. (2008). Standard of Competencies for Dermatologists and Venereologists. Jakarta: Indonesian Collegium Dermatology and Venereology.

    Iobst, W. F., Sherbino, J., Cate, O. T., Richardson, D. L., Swing, S. R., Harris, P., … Frank, J. R. (2010). Competency-based medical education in postgraduate medical education. Medical Teacher, 32(8), 651-656.

    Jhorar, P., Waldman, R., Bordelon, J., & Whitaker-Worth, D. (2017). Differences in dermatology training abroad: A comparative analysis of dermatology training in the United States and in India. International Journal of Women’s Dermatology, 3(3), 164-169.

    Johnson, B., & Christensen, L. (2008). Educational Research: Quantitative, Qualitative and Mixed Approaches (3rd ed.). Thousand Oaks, CA: Sage Publications.

    Joshi, M. K., Singh, T., & Badyal, D. K. (2017). Acceptability and feasibility of mini-clinical evaluation exercise as a formative assessment tool for workplace based assessment for surgical postgraduate students. Journal of Postgraduate Medicine, 63(2), 100-105.

    Khan, K., & Ramachandran, S. (2012). Conceptual framework for performance assessment: Competency, competence and performance in the context of assessments in healthcare – Deciphering the terminology. Medical Teacher, 34(11), 920-928.

    Kurtz, S., Silverman, J., Benson, J., & Drapper, J. (2003). Marrying content and process in clinical method teaching: Enhancing the Calgary–Cambridge guide. Academic Medicine, 78(8), 802-809.

    Li, H., Ding, N., Zhang, Y., Liu, Y., & Wen, D. (2017). Assessing medical professionalism: A systematic review of instruments and their measurement properties. PLOS One, 12(5), 1-28.

    McKinley, R. K., Fraser, R. C., van der Vleuten, C. P., & Hastings, A. M. (2000). Formative assessment of the consultation performance of medical students in the setting of general practice using a modified version of the Leicester Assessment Package. Medical Education, 34(7), 573-579.

    Naidoo, S., Lopes, S., Patterson, F., Mead, H. M., & MacLeod, S. (2017). Can colleagues’, patients’ and supervisors’ assessments predict successful completion of postgraduate medical training? Medical Education, 51(4), 423-431.

    Norcini, J., & Burch, V. (2007). Workplace-based assessment as an educational tool: AMEE Guide No. 31. Medical Teacher, 29(9), 855-871.

    Norcini, J. J. (2010). Workplace based assessment. In: T. Swanwick (Ed), Understanding Medical Education: Evidence, Theory and Practice (1st ed., pp. 232-245). London UK: The Association for the Study of Medical Education.

    Pelgrim, E. A., Kramer, A. W., Mokkink, H. G., & van der Vleuten, C. P. (2012). The process of feedback in workplace-based assessment: Organisation, delivery, continuity. Medical Education, 46(6), 604-612.

    World Federation for Medical Education. (2015). Postgraduate medical education WFME global standards for quality improvement. Copenhagen, Denmark: WFME Office. Retrieved July 20, 2018, from http://wfme.org/publications/wfme-global-standards-for-quality-improvement-pgme-2015/

    *Sandra Widaty
    Jl. Diponegoro 71,
    Central Jakarta,
    Jakarta, Indonesia, 10430
    Tel: +622131935383
    Email: sandra.widaty@gmail.com
