Comparison of audio vs. audio + video for the rating of shared decision making in oncology using the observer OPTION5 instrument: an exploratory analysis
BMC Health Services Research volume 18, Article number: 522 (2018)
How non-verbal data may influence observer-administered ratings of shared decision making is unknown. Our objective for this exploratory analysis was to determine the effect of mode of data collection (audio+video vs. audio only) on the scoring of the OPTION5 instrument, an observer rated measure of shared decision making.
We analyzed recordings of 15 encounters between cancer patients and clinicians in which a clinical decision was made. Audio+video or audio only recordings of the encounters were randomly assigned to four trained raters, who reviewed them independently. We compared the adjusted mean scores of audio+video and audio only.
Forty-one unique decisions were identified within the 15 encounters. The mean OPTION5 score for audio+video was 17.5 (95% CI 13.5, 21.6) and for audio only was 21.8 (95% CI 17.2, 26.4) with a mean difference of 4.28 (95% CI = 0.36, 8.21; p = 0.032).
A rigorous and well established measure of shared decision making performs differently when the data source is audio only. Data source may influence rating of observer administered measures of shared decision making. This potential bias needs to be confirmed as video recording to examine communication behaviors becomes more common.
Shared decision making (SDM) is an approach to clinical deliberation in which patients and clinicians think, talk, and feel their way through a troubling and uncertain situation. Together, they settle on a course of action that is sensitive to and respectful of both the research evidence and the patient’s values and context. SDM has been called the “pinnacle of patient centered care” and has begun to be incorporated into clinical practice guidelines [3, 4].
As SDM is incorporated into clinical practice guidelines, the need to understand and measure whether and to what extent it is occurring becomes essential. Several different approaches to measuring SDM have been proposed [5,6,7,8]. These approaches differ based on which viewpoint is being assessed. Some measures, such as CollaboRATE [9, 10], the decisional conflict scale, and the 9-item SDM questionnaire (SDM-Q-9), are self-reported from the patient’s perspective, while others, such as the SDM-Q-Doc, are self-reported from the clinician’s perspective. Another approach is to have a third party not involved in the encounter assess the occurrence of SDM. One such measure is the 5-item Observer OPTION5 instrument [14, 15]. The five items are designed to capture plausible mediating behaviors associated with SDM: drawing attention to the existence of options, supporting the patient through the process of information sharing and deliberation, sharing information, eliciting preferences, and integrating preferences. As it is a third party measure, the data to which this measure is applied (i.e. the clinical encounter) is often captured either through audio or video recording. Visual data show non-verbal behavior, which may affect observer ratings of clinician communication behavior. Existing literature, however, is mixed on the effects of different modes of data collection on the subsequent analysis of data [18,19,20,21,22,23]. Understanding the effect of different modes of data collection on subsequent ratings of SDM is important because it may explain both within and across study differences in ratings of SDM. It may also aid in identifying best methodological practices for researchers.
This exploratory pilot study aimed to determine the effects of data mode, examining audio+video vs. audio only, on OPTION5 scores.
This study was reviewed and approved by the University of Southern California School of Medicine and Mayo Clinic Institutional Review Boards.
All data for this study came from an observational study on cancer communication. In that study, patients were approached just prior to their regularly scheduled oncology encounter at a single tertiary cancer center in the Midwest United States and consented for participation. A study coordinator placed a small digital audio recorder in the exam room for all participants (n = 367), turning it on immediately before the encounter and off immediately after. For the final 40 of these encounters, the study coordinator placed a small video recorder in the room instead of a digital audio recorder. Among these 40 encounters we identified 15 in which both the patient and clinician agreed (through survey rating) that a clinical decision was made. We used recordings from these 15 encounters to conduct this exploratory analysis (Fig. 1).
First, a team of two raters met to review a pilot set of encounters not included in the study sample and to develop criteria for, and reach consensus on, what should be considered a “decision” for the purposes of comparative rating. To be counted as a decision, the patient-clinician discussion had to focus on issues specifically related to the potential medical management of the patient, and two or more alternatives had to be presented or available. A single rater (KEN) then listened to all recorded encounters and flagged all decisions that met these criteria; multiple decisions could be identified in a single encounter.
All decisions identified by this procedure were then coded by four additional trained raters (GS-B, LL, BK, CF) with the OPTION5 instrument. Three raters (GS-B, LL, BK) had training in medicine and one rater had experience in study coordination (CF). Three raters also worked in a SDM research group (GS-B, LL, CF), while the fourth worked in a bioethics research group (BK). The Observer OPTION5, a five-item measure, rates decisions on a scale from 0 to 4, with 0 representing “no effort” and 4 representing an “exemplary effort” to exhibit each of the five mediating behaviors. We summed the score from each item and re-scaled to a normalized range from 0 to 100 (e.g. a score of 2 on each of the 5 items would result in a summed score of 10/20, which would be re-scaled to 50/100).
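The re-scaling described above can be illustrated with a minimal sketch. The function name and the input validation are illustrative assumptions, not part of the OPTION5 manual or the study's analysis code.

```python
def option5_total(item_scores):
    """Sum five OPTION5 item scores (each 0 to 4) and re-scale to 0 to 100.

    Hypothetical helper for illustration only; the name and the checks
    below are assumptions, not taken from the instrument's manual.
    """
    if len(item_scores) != 5:
        raise ValueError("OPTION5 has exactly five items")
    if any(score < 0 or score > 4 for score in item_scores):
        raise ValueError("each item is scored from 0 to 4")
    raw_total = sum(item_scores)   # ranges from 0 to 20
    return 100 * raw_total / 20    # normalized to 0 to 100


# Worked example from the text: a 2 on each item gives 10/20, i.e. 50/100.
example_score = option5_total([2, 2, 2, 2, 2])
print(example_score)
```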
Four raters were randomly assigned to audio only or audio+video conditions for each of the 15 encounters. When raters were assigned the audio only condition, they turned off the video input or otherwise did not view the video input. The statistician, blinded to rater identity, assigned reviewers’ files in a systematic way to ensure that all raters reviewed unique encounters from each mode (audio+video or audio only). For any given encounter recording, a rater reviewed either the audio only or the audio+video recording. Each of the 30 recordings (i.e., 15 audio only and 15 audio+video) was coded independently by two raters.
Prior to scoring any decisions or encounters, raters were trained in the use of OPTION5 using an online training module created by the developers of the instrument and clarified any remaining questions directly with the developers. Raters were also provided the scoring manual created by the developers of OPTION5 as well as an investigator-developed protocol (Additional file 1) for the scoring of decisions using OPTION5. To minimize rater variability, the four raters calibrated their coding prior to this experiment using a set of three encounters (audio+video) not included in the study dataset.
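One way to picture the constraints in this design is the sketch below: for each encounter, the four raters are shuffled and split into two per mode, so every recording is coded by two raters and no rater sees the same encounter in both modes. This is an assumed procedure for illustration only; the rater labels, the seed, and the function name are hypothetical, and the study's statistician may have used a different systematic scheme (for example, one that also balances how often each rater receives each mode, which this sketch does not do).

```python
import random

# Hypothetical rater labels; the study identifies raters only by initials.
RATERS = ["rater_1", "rater_2", "rater_3", "rater_4"]


def assign_modes(n_encounters, raters, seed=0):
    """Split raters at random into two per mode for each encounter.

    Illustrative assumption, not the study's documented procedure.
    """
    rng = random.Random(seed)      # fixed seed so the plan is reproducible
    half = len(raters) // 2
    plan = {}
    for encounter in range(1, n_encounters + 1):
        order = list(raters)
        rng.shuffle(order)
        plan[encounter] = {
            "audio_only": sorted(order[:half]),
            "audio_video": sorted(order[half:]),
        }
    return plan


plan = assign_modes(15, RATERS)
for encounter, modes in plan.items():
    print(encounter, modes)
```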
To assess the impact of recording mode on OPTION5 scoring, we used a generalized hierarchical model in which the fixed effects were the mode (audio+video vs. audio only) and reviewer and the random effects were the encounter and clinician. This approach accounts for an encounter having multiple decisions and clinicians having multiple encounters, and assumes the error about the effect on OPTION5 within an encounter can vary from encounter to encounter and clinician to clinician. To estimate the average OPTION5 score per recording mode, the predictive margins were calculated, providing an adjusted average score with 95% confidence intervals. The predictive margins allowed us to isolate the impact of the mode (audio+video or audio only) by taking the adjusted model with all values and setting mode to either audio+video or audio only. This yielded an adjusted mean score for each mode, so that the difference between modes could be expressed directly in OPTION5 points. The statistical analysis was conducted using Stata version 14.0 (College Station, TX).
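As a reading aid, one plausible way to write the model and the predictive margins is sketched below. The notation is assumed here, not taken from the study, and the Gaussian, identity-link form is an assumption: the text specifies only a generalized hierarchical model with fixed effects for mode and reviewer and random effects for encounter and clinician.

```latex
% Assumed notation: i indexes individual ratings (one decision scored by one rater),
% r(i) the rater, e(i) the encounter, c(i) the clinician.
Y_i = \beta_0 + \beta_1\,\mathrm{audio}_i + \gamma_{r(i)} + u_{c(i)} + v_{e(i)} + \varepsilon_i,
\qquad u_c \sim N(0,\sigma_u^2),\quad v_e \sim N(0,\sigma_v^2),\quad \varepsilon_i \sim N(0,\sigma^2)

% Predictive margin for mode m: average the fitted values over all N ratings
% with mode set to m and the other covariates (here, rater) left as observed.
\hat{\mu}(m) = \frac{1}{N}\sum_{i=1}^{N}\hat{E}\left[\,Y_i \mid \mathrm{mode}=m,\; r(i)\,\right],
\qquad
\hat{\Delta} = \hat{\mu}(\text{audio only}) - \hat{\mu}(\text{audio+video})
```

Here $\mathrm{audio}_i$ is 1 for an audio only rating and 0 for audio+video, so under this linear form $\hat{\Delta}$ coincides with $\hat{\beta}_1$.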
The mean patient age was 64, a majority of patients had at least some college, and 53% of patients were female (Table 1). Of the 8 clinicians who participated in the 15 encounters, the median number of encounters per clinician was 2 with a range of 1 to 3.
Forty-one unique decisions were identified within the 15 encounters. Ten of the encounters included more than one decision, with a maximum of six decisions occurring in two encounters (Additional file 2: Table S1). These ranged from specific decisions such as choosing between radiation or chemotherapy, to less specific discussions such as “what to do next” (Additional file 3: Table S2).
The overall mean OPTION5 score for the audio only recordings was 21.8 (95% CI 17.2, 26.4) out of 100. For the audio+video recordings, the mean score was lower (17.5, 95% CI 13.5, 21.6). Thus, the average adjusted OPTION5 score for audio only recordings was 4.3 points higher than for audio+video (95% CI 0.36, 8.21). This overall difference in mean OPTION5 score was reflected in the mean scores for four of the five constituent items of the OPTION5 instrument (Table 2). For all items except item 2 (i.e. clinician supports the patient through the decision making process), audio only was associated with higher scores to varying degrees (i.e. on average, from 0.04 points higher for item 4, eliciting preferences, to 0.31 points higher for item 3, sharing information). Visual inspection of a plot of total OPTION5 scores for the audio+video mode (y-axis) versus scores for the audio only mode (x-axis) (Fig. 2) suggests greater heterogeneity between audio+video and audio only scores at the higher ranges of OPTION5 scoring than at the lower end. The concordance between reviewers within mode (i.e. audio+video or audio) was approximately 60% (audio: 58.3%, 95% CI 37.7, 78.9; audio+video: 60%, 95% CI 40.4, 79.6).
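As a rough consistency check on the reported interval and p-value, the standard error can be backed out of the 95% CI and converted to a two-sided p-value. The sketch below assumes a normal (Wald) approximation, which may not be exactly what the Stata model used; the function name is hypothetical.

```python
import math


def p_value_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back out the standard error from a 95% CI and compute a two-sided p-value.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail area of the standard normal
    return se, z, p


# Mean difference and 95% CI as reported in the text.
se, z, p = p_value_from_ci(4.28, 0.36, 8.21)
print(f"SE = {se:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Under this approximation the standard error is about 2 points and the p-value is about 0.03, in line with the reported p = 0.032.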
This small methodological experimental sub-study embedded in a larger observational study demonstrates that mode of recorded data, audio+video vs. audio only, influences ratings of a prominent and commonly used third-party measure of SDM, the OPTION5 instrument.
This finding is similar to those reported for other measures of communication in healthcare [18,19,20,21,22]. Riddle et al. found that when clinical encounters were coded using the Moffitt Accrual Analysis System (MAAS) and the Roter Interaction Analysis System (RIAS) there were significant differences (p < 0.028 for MAAS and p < 0.01 for RIAS) between audio and video coding. In contrast to this study, however, other studies have failed to show a difference between audio only and audio+video modes when rating aspects of health communication. Nicolai et al. measured empathic communication during clinical encounters using the Rating Scales for the Assessment of Empathic Communication in Medical Interviews; they found no difference between audio and video formats for rating empathy, but both modes resulted in higher empathy scores than transcripts. Williams et al. examined video vs. audio only clips of nursing care and rated them using the Emotional Tone Rating Scale; they found that the modes were highly correlated and that, after adjustment for multiple comparisons, only one item (patronizing) was significantly different between modes (audio scored higher). Weingarten et al. assessed clinical encounters for patient-centeredness using the Henbest and Stewart scale of patient-centeredness and found no difference (both mean scores 1.94 out of 3) between audio and video. Dent et al. assessed communication in simulated cancer consultations using the Cancode interaction analysis system and found substantial agreement between audio and video as indicated by kappas of 0.72 and 0.77 for the function and content dimensions of Cancode, respectively. The discrepancies in the literature on the effect of mode could reflect the different constructs being measured, with some more affected by the additional information contained in audio+video (e.g. non-verbal behaviors).
In a review of the impact of clinicians’ personality and interpersonal behaviors on the quality of care, Boerebach et al. found mixed results for the impact of non-verbal behaviors on patient reported ratings of quality of care. It is therefore plausible that clinicians’ non-verbal behavior affected third-party ratings and, for example, may explain the non-statistically significantly higher score for item 2. However, the difference we found was driven by items which coded for sharing information and drawing attention to the existence of options, items which arguably are more objective than supporting the patient through the process of information sharing and deliberation. These findings, if confirmed in larger samples, could more definitively establish the relationship between data collection mode and OPTION5 scoring. Ideally, confirmatory studies will be conducted using mixed methods including interviews with raters so that differences could be further explored and explained.
This study has several limitations. Our sample was drawn from cancer consultations with no uniform focus of the discussion. This is in contrast to the typical use of the OPTION5 instrument, which is usually used to score a specific, focused decision (e.g. decisions about medication treatments for diabetes or management strategies for heavy menstrual bleeding). As a result, many discussions with low scores could have created an artificial floor effect, thus limiting our ability to detect a difference. However, we did detect a difference, so the effect of any artificial floor was limited. Our analytic choice to pool all decisions may have differentially affected our mode comparison (i.e. audio+video vs. audio only), thus potentially creating a difference between modes which may instead be due to aspects of the decisions or discussions.
Whether the effects observed in our exploratory study constitute a bias is an important question. Arguably, what matters is how the actors in the encounter experienced it. Third-party measures seek to ascertain second-hand how that process went. They offer an important degree of detached objectivity helpful for documenting verifiable behaviors that may not be easily recalled by participants. Video offers the advantage of non-verbal inputs, yet the meaning of those inputs, largely constituted by gesture, posture, eye contact, and other non-verbal behaviors, is influenced by experience and culture as much as or more than the spoken word. Thus, we cannot say whether these differences are good or bad. However, at a minimum, they clearly have implications for aggregating outcome data across trials where a variety of SDM measures and observer-administered ratings are used. They may also influence effect size estimation for the design of future trials.
While our results merit further investigation on methodologic grounds, investigators studying SDM and patient-clinician communication need to consider the pragmatic implications of choosing audio or audio+video to record encounters. For example, some patients may be more comfortable with audio only recording as it is less intrusive and offers a greater degree of privacy. Yet this choice limits the richness of the data available to the investigator, as audio+video captures non-verbal communication, which may provide important contextual and relational information not available with audio only recordings. Investigators planning to record encounters need to consider these trade-offs and the extent to which they matter for a given set of study objectives.
Shared decision making is an important component of patient centered care [2, 29]. Research, quality assurance, and quality improvement should be supported by accurate and reliable measures. Different ways of measuring SDM are available, including third party measures such as the OPTION5 instrument. Due to patient preference, available resources, or clinician comfort level, discussions may be recorded as video or audio only. This study suggests that, for the OPTION5 instrument, mode of recording affects the rating of SDM in the clinical encounter. Additional research is needed to confirm and further understand the effects of mode of recording on ratings of SDM across a variety of conditions and decisions.
IRB: Institutional review board
MAAS: Moffitt accrual analysis system
RIAS: Roter interaction analysis system
SDM: Shared decision making
Hargraves I, LeBlanc A, Shah ND, Montori VM. Shared decision making: the need for patient-clinician conversation, not just information. Health Aff. 2016;35(4):627–9.
Barry MJ, Edgman-Levitan S. Shared decision making — the pinnacle of patient-centered care. N Engl J Med. 2012;366(9):780–1.
Carter HB, Albertsen PC, Barry MJ, Etzioni R, Freedland SJ, Greene KL, Holmberg L, Kantoff P, Konety BR, Murad MH, et al. Early detection of prostate cancer: AUA guideline. J Urol. 2013;190(2):419–26.
January CT, Wann LS, Alpert JS, Calkins H, Cigarroa JE, Cleveland JC, Conti JB, Ellinor PT, Ezekowitz MD, Field ME, et al. 2014 AHA/ACC/HRS guideline for the Management of Patients with Atrial Fibrillation. J Am Coll Cardiol. 2014;64(21):e1–e76.
Bouniols N, Leclere B, Moret L. Evaluating the quality of shared decision making during the patient-carer encounter: a systematic review of tools. BMC Res Notes. 2016;9:382.
Scholl I, Loon MK-v, Sepucha K, Elwyn G, Légaré F, Härter M, Dirmaier J. Measurement of shared decision making – a review of instruments. Zeitschrift für Evidenz, Fortbildung und Qualität im Gesundheitswesen. 2011;105(4):313–24.
Sepucha KR, Borkhoff CM, Lally J, Levin CA, Matlock DD, Ng CJ, Ropka ME, Stacey D, Joseph-Williams N, Wills CE, et al. Establishing the effectiveness of patient decision aids: key constructs and measurement instruments. BMC Med. Inform. Decis. Mak. 2013;13(2):1–11.
Sepucha KR, Scholl I. Measuring shared decision making: a review of constructs, measures, and opportunities for cardiovascular care. Circulation: Cardiovascular Quality and Outcomes. 2014;7(4):620–6. https://www.ncbi.nlm.nih.gov/pubmed/24867916.
Barr JP, Thompson R, Walsh T, Grande WS, Ozanne ME, Elwyn G. The psychometric properties of CollaboRATE: a fast and frugal patient-reported measure of the shared decision-making process. J Med Internet Res. 2014;16(1):e2.
Elwyn G, Barr PJ, Grande SW, Thompson R, Walsh T, Ozanne EM. Developing CollaboRATE: a fast and frugal patient-reported measure of shared decision making in clinical encounters. Patient Educ Couns. 2013;93(1):102–7.
O'Connor AM. Validation of a decisional conflict scale. Med Decis Mak. 1995;15(1):25–30.
Kriston L, Scholl I, Holzel L, Simon D, Loh A, Harter M. The 9-item shared decision making questionnaire (SDM-Q-9). Development and psychometric properties in a primary care sample. Patient Educ Couns. 2010;80(1):94–9.
Scholl I, Kriston L, Dirmaier J, Buchholz A, Harter M. Development and psychometric properties of the shared decision making questionnaire--physician version (SDM-Q-doc). Patient Educ Couns. 2012;88(2):284–90.
Barr PJ, O'Malley AJ, Tsulukidze M, Gionfriddo MR, Montori V, Elwyn G. The psychometric properties of observer OPTION(5), an observer measure of shared decision making. Patient Educ Couns. 2015;98(8):970–6.
Elwyn G, Tsulukidze M, Edwards A, Legare F, Newcombe R. Using a 'talk' model of shared decision making to propose an observation-based measure: observer OPTION 5 item. Patient Educ Couns. 2013;93(2):265–71.
Couët N, Desroches S, Robitaille H, Vaillancourt H, Leblanc A, Turcotte S, Elwyn G, Légaré F. Assessments of the extent to which health-care providers involve patients in decision making: a systematic review of studies using the OPTION instrument. Health Expect. 2015;18(4):542–61.
Boerebach BC, Scheepers RA, van der Leeuw RM, Heineman MJ, Arah OA, Lombarts KM. The impact of clinicians' personality and their interpersonal behaviors on the quality of patient care: a systematic review. Int. J. Qual. Health Care. 2014;26(4):426–81.
Dent E, Brown R, Dowsett S, Tattersall M, Butow P. The Cancode interaction analysis system in the oncological setting: reliability and validity of video and audio tape coding. Patient Educ Couns. 2005;56(1):35–44.
Nicolai J, Demmel R, Farsch K. Effects of mode of presentation on ratings of empathic communication in medical interviews. Patient Educ Couns. 2010;80(1):76–9.
Weingarten MA, Yaphe J, Blumenthal D, Oren M, Margalit A. A comparison of videotape and audiotape assessment of patient-centredness in family physicians' consultations. Patient Educ Couns. 2001;45(2):107–10.
Williams K, Herman R, Bontempo D. Comparing audio and video data for rating communication. West J Nurs Res. 2013;35(8):1060–73.
Riddle DL, Albrecht TL, Coovert MD, Penner LA, Ruckdeschel JC, Blanchard CG, Quinn G, Urbizu D. Differences in audiotaped versus videotaped physician-patient interactions. J Nonverbal Behav. 2002;26(4):219–39.
Harris FC, Lahey BB. Recording system bias in direct observational methodology. Clin Psychol Rev. 1982;2(4):539–56.
Kimball BC, James KM, Yost KJ, Fernandez CA, Kumbamu A, Leppin AL, Robinson ME, Geller G, Roter DL, Larson SM, et al. Listening in on difficult conversations: an observational, multi-center investigation of real-time conversations in medical oncology. BMC Cancer. 2013;13(1):1–10.
Leppin AL, Humeniuk KM, Fernandez C, Montori VM, Yost K, Kumbamu A, Geller G, Tilburt JC. Was a decision made? An assessment of patient-clinician discordance in medical oncology encounters. Health Expect. 2015;18(6):3374–81.
Elwyn G, Grande SW, Barr JP. Observer OPTION 5 manual. 2015.
Graubard BI, Korn EL. Predictive margins with survey data. Biometrics. 1999;55(2):652–9.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
Institute of Medicine. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: The National Academies Press; 2001.
Part of this work was completed by MRG while he was a graduate student at the Mayo Clinic where he was supported by CTSA Grant Number TL1 TR000137 from the National Center for Advancing Translational Science (NCATS). GSB was supported by CTSA Grant Number TL1TR000137 from the National Center for Advancing Translational Science (NCATS) and grant number 3R01HL131535-01S1 from the National Heart Lung and Blood Institute (NHLBI). This work was also supported by Grant Number R01 AT06515 from the National Center for Complementary and Integrative Health (NCCIH). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
This study was reviewed and approved by the University of Southern California School of Medicine and Mayo Clinic IRBs.
Consent for publication
The authors declare that they have no competing interests.
Preparing to Score OPTION5. This is the investigator developed protocol for scoring OPTION5 (DOCX 14 kb)
Table S1. Number of discussions within encounters. This Table lists and enumerates the number of discussions that occurred within encounters in our dataset. (DOCX 13 kb)
Table S2. Discussion Topics. This Table lists and enumerates the topics of discussions that were present in the encounters within our dataset (DOCX 14 kb)