Assessment of Videos on YouTube™ about Nasopharyngeal Cancer in Terms of Accuracy, Reliability and Understandability

Introduction: In the internet era, information is very easy to access. While this has positive effects for patients who use the internet, it also brings some negative effects. This study examined the quality of YouTube™ videos on nasopharyngeal cancer.
Methods: A search was conducted on YouTube™ using the search term “Nasopharyngeal Cancer”. The ‘Sort by’ search filter was set at ‘relevance’, which is the default for YouTube™ searches. The first 250 results were reviewed and analyzed. After videos were eliminated according to the exclusion criteria, 45 videos were evaluated by two authors. Videos were categorized by “video type” (educational or testimonial) and by “source of content” (medical institution, medical website, or individual user). After the features of all evaluated videos were recorded, the accuracy score, audiovisual score, modified discern score, patient education materials assessment tool for audiovisual materials (PEMAT) score, and usefulness score were determined for each video to evaluate the accuracy, reliability, and understandability of the videos.
Results: The usefulness score, modified discern score, and accuracy score of the educational videos were significantly higher than those of the testimonial videos (p<0.001 for all). Educational videos provided more useful and accurate content than testimonial videos. In addition, the median PEMAT actionability score and audiovisual score of the individual group were statistically significantly lower than those of medical institutions and medical websites (p=0.001 and p<0.001, respectively). The videos provided by medical institutions, including universities, did not have a significant advantage over other groups in terms of accuracy, reliability, and usefulness.
Conclusion: Healthcare videos concerning nasopharyngeal cancer on YouTube™ are heterogeneous and are not peer reviewed. Therefore, medical professionals working on nasopharyngeal cancer need to upload more accurate, reliable and easy to understand videos onto online platforms such as YouTube™.

Introduction
YouTube™ is a video streaming site with over 100 million daily viewers and around 65,000 videos uploaded daily (YouTube Press Statistics, 2017). It is also a great resource for patient information, with a large amount of content, a variety of videos, and practical uses. Whereas some of the videos are uploaded by professional sources such as medical professionals, most are based on personal experiences. This assortment of video content providers and the lack of a peer review process on YouTube™ have resulted in the broadcast of inadequate or misleading health information (Pandey et al., 2010). In addition, internet users who are not medical professionals may not be able to distinguish between genuine patient education videos and those made for commercial purposes. Inappropriate and biased health-related information on the internet can therefore influence patients to make irrational decisions (Balakrishnan et al., 2016). There is thus a need to make sure that online users reach accurate, reliable, and understandable information sources.
To the best of our knowledge, there has been no assessment of the accuracy, reliability, and intelligibility of the information presented in YouTube™ videos on nasopharyngeal cancer (NPC). We aimed to systematically analyze the videos that users are likely to encounter when searching YouTube™ for information about NPC. These videos were evaluated in terms of accuracy, reliability, intelligibility and actionability using the modified discern score, the patient education materials assessment tool for audiovisual materials (PEMAT), an NPC usefulness scoring system, the accuracy score, and the audiovisual score.

Materials and Methods
The Cukurova University, Faculty of Medicine, Non-Invasive Clinical Research Ethics Committee approved the study, and the study was conducted in line with the Declaration of Helsinki.
A search was conducted on YouTube™ (https://www.youtube.com/) on July 19, 2021 using the search term "Nasopharyngeal Cancer". The 'Sort by' search filter was set at 'relevance', which is the default for YouTube™ searches. To decrease bias, all researchers cleared their browser's search history and disabled their location status before searching. The first 250 results were reviewed and analyzed. Ads served by YouTube™ in the search results were not counted. Videos were excluded if they were non-English, not relevant, duplicates, without sound or subtitles, or recordings of meetings and webinars.

Video analysis and data collections
Two independent reviewers (C.E., O.S.) assessed and analyzed all videos separately. First, the identifying features of each video were recorded, including URL; video name; video duration; upload date; and the number of views, likes, dislikes and comments. The view ratio (views/day) and the like ratio ((likes * 100) / (likes + dislikes)) were also calculated. Following a literature search for an index that evaluates both the views and the likes of videos, we chose to evaluate video popularity using the Video Power Index (VPI), which is calculated with the following formula: like ratio * view ratio / 100 (Erdem and Karaca, 2018). Then, each video was evaluated separately by applying all measurement scales. During this evaluation, some videos were rated differently by the individual reviewers. Therefore, inter-rater reliability was assessed for each measurement scale. Three possible answers were available for the "source of content" category. If the reviewers' responses differed, the video was considered to have an "unidentified source". For the accuracy score, audiovisual score, modified discern score, PEMAT score (understandability and actionability) and usefulness score, the median value was used if the reviewers' results differed.
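The popularity metrics above can be illustrated with a minimal Python sketch. It assumes that the like ratio is the percentage of likes among all likes and dislikes, i.e. likes * 100 / (likes + dislikes), and that the view ratio is the number of views divided by the days elapsed between upload and the search date. The function names and the handling of zero counts are illustrative assumptions, not part of the study protocol.

```python
from datetime import date


def view_ratio(views, upload_date, access_date):
    """Views per day since upload (at least one day is assumed to avoid division by zero)."""
    days_online = max((access_date - upload_date).days, 1)
    return views / days_online


def like_ratio(likes, dislikes):
    """Percentage of likes among all reactions; 0.0 when a video has no reactions (assumption)."""
    total = likes + dislikes
    return 100.0 * likes / total if total else 0.0


def video_power_index(views, likes, dislikes, upload_date, access_date):
    """VPI = like ratio * view ratio / 100, as given in the text."""
    return like_ratio(likes, dislikes) * view_ratio(views, upload_date, access_date) / 100.0


if __name__ == "__main__":
    # Hypothetical video, for illustration only.
    vpi = video_power_index(1000, 8, 2, date(2021, 7, 9), date(2021, 7, 19))
    print(f"VPI = {vpi:.1f}")
```

With this definition, a video with many views per day but a low proportion of likes receives a lower VPI than an equally viewed video with mostly positive reactions.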
Video materials were categorized according to "video type" and "source of content". By purpose of production, the videos were divided into two types: 1) educational and 2) testimonial. By source of content, the videos were divided into three categories: 1) medical institution (the official account of a university or hospital, or of an individual who works in a university or hospital), 2) medical website (healthcare-related YouTube™ channels or medical charity foundations) and 3) individual users (a private account with no affiliation to an institution or a university). The features and measurement scale results of the groups within the same category were compared with each other.
Audio-visual quality scores and accuracy scores were determined after the videos were classified. Audio-visual quality was rated on a four-level scale from 0 to 3: 0 = impossible to view; 1 = poor (blurred, out of focus); 2 = moderate (non-professional editing); 3 = excellent quality (clear, professional editing). Accuracy was rated on the same 0 to 3 scale: 0 = misleading and largely false; 1 = poor (easily identifiable misinformation); 2 = moderate (some oversimplification, generally correct information); 3 = excellent (professional level, extremely accurate) (Enver et al., 2020).
Reliability of video information was determined using the modified discern score, as originally defined by Singh (2012). The modified discern score assesses clarity, credibility, bias, reference reinforcement, and areas of uncertainty, specifically for information in YouTube™ videos (Supplement 1). One point is given for each criterion, with five points indicating the highest reliability. The PEMAT is a systematic method developed to select printable and audiovisual patient education materials that are easier to understand and easier to act on. We used the version for audiovisual materials, which consists of thirteen items measuring understandability and four items measuring actionability (Supplement 2). The PEMAT provides two scores for each material, one for understandability and a separate score for actionability. Each item is scored 1 point (Agree) or 0 points (Disagree), or is marked not applicable (N/A) and left out of the calculation. The item scores for understandability and actionability were summed separately. Each total was divided by the total possible score, i.e. the number of items on which the material was rated, excluding items scored as N/A. The result was multiplied by 100 to obtain a percentage (%) for understandability and actionability. There was no set cut-off value for the scores (PEMAT, 2020).
Since there is no current usefulness index for nasopharyngeal cancer, a new scoring system was developed for this study to review the reliability of the videos (Supplement 3). In this scoring system, we listed the main information that a reasonable doctor should give to the patient about NPC. We adapted the scoring systems used in previous studies to NPC in order to evaluate the YouTube™ videos, and we also reviewed the current literature (Ben-Ami et al., 2021; Bossi et al., 2021). The usefulness score consisted of a total of 12 items: Epstein-Barr virus, nutrition and habits (salted fish, smoking, and alcohol), geographical distribution, age, gender and genetics, nasal obstruction, epistaxis, neck mass, additional symptoms (otitis media, sinusitis, diplopia, and facial paresthesia), diagnosis, treatment options, and progress of the disease. Each video was checked item by item, and each item mentioned was worth one point. Scores between 10 and 12, 7 and 9, 4 and 6, and 0 and 3 were categorized as very useful, useful, slightly useful, and poor, respectively. The usefulness scores of the videos were calculated, categorized, and compared. In addition, the videos were divided into three groups according to usefulness category: 1) useful group, 2) slightly useful group, and 3) poor group. The features and measurement scale results of the groups were compared with each other.

Statistical analysis
Categorical variables were expressed as numbers and percentages, whereas continuous variables were summarized as mean and standard deviation and as median and minimum-maximum. The chi-square test was used to compare categorical variables between the groups. The normality of distribution for continuous variables was checked with the Shapiro-Wilk test. For comparison of continuous variables between two groups, the Mann-Whitney U test was used. For non-normally distributed data, the Kruskal-Wallis test was used to compare more than two groups, and the Bonferroni-adjusted Mann-Whitney U test was used for multiple comparisons of groups. To evaluate the correlations between measurements, the Spearman rank correlation coefficient was used. Inter-rater reliability was evaluated with the Fleiss kappa and the Spearman rank correlation coefficient. All analyses were performed using the IBM SPSS Statistics Version 20.0 statistical software package. The statistical level of significance for all tests was considered to be 0.05.

Results
The first 250 results of the search were reviewed and analyzed. Videos that did not meet the criteria (n:178) and videos that could not be categorized due to "unidentified source" (n:27) were excluded from the study. A total of 45 videos were included in the study. After classification by type, the educational and testimonial groups consisted of 22 and 23 videos, respectively. After classification by source of content, the healthcare channel, university and individual groups consisted of 18, 12, and 15 videos, respectively. In addition, the videos were classified according to the NPC usefulness scoring system, and no very useful video was found. Accordingly, the useful, slightly useful, and poor video groups consisted of 8, 11, and 26 videos, respectively. The mean length of the videos was 180 seconds (s) (range: 25 s to 1912 s). The mean number of views for nasopharyngeal cancer-related videos was 730 (range: 2-69935 views); each video was viewed a mean of 0.3 times per day (range: 0-45 views/day).
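The two derived scores described in Materials and Methods, the PEMAT percentage and the usefulness category, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: item ratings are encoded as 1 (Agree), 0 (Disagree) or None (N/A), and the function names are hypothetical rather than taken from the PEMAT documentation or from the study.

```python
def pemat_percent(item_ratings):
    """PEMAT score as a percentage.

    item_ratings: list with 1 (Agree), 0 (Disagree) or None (N/A) per item.
    N/A items are excluded from both the numerator and the denominator.
    """
    applicable = [rating for rating in item_ratings if rating is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100.0 * sum(applicable) / len(applicable)


def usefulness_category(score):
    """Map a 0-12 usefulness score to the category bands given in the text."""
    if not 0 <= score <= 12:
        raise ValueError("usefulness score must be between 0 and 12")
    if score >= 10:
        return "very useful"
    if score >= 7:
        return "useful"
    if score >= 4:
        return "slightly useful"
    return "poor"


if __name__ == "__main__":
    # Hypothetical ratings for the four actionability items, one of them N/A.
    print(pemat_percent([1, 0, None, 1]))
    print(usefulness_category(8))
```

Excluding N/A items from the denominator means that a video is not penalized for items that do not apply to it, which is why the percentage rather than the raw sum is compared across videos.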
The inter-rater reliability of the scores is listed in Table 1. The Fleiss kappa coefficients for the scores ranged from 0.547 to 0.961, and the Spearman correlation coefficients ranged from 0.887 to 0.982. Inter-rater reliability was statistically significant for all scores (p<0.001). This result indicates that the two reviewers largely agreed in their scoring of the YouTube™ videos on NPC.
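For readers unfamiliar with the agreement statistic, the following Python sketch computes Fleiss' kappa from raw ratings using the standard textbook formula. It is an illustration only: the study itself used SPSS, and the function name and input layout (one list of rater labels per video) are assumptions made for this example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings.

    ratings: list of per-subject lists, each holding one label per rater.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    category_totals = Counter()
    observed_sum = 0.0
    for subject in ratings:
        counts = Counter(subject)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        observed_sum += agreeing / (n_raters * (n_raters - 1))

    observed = observed_sum / n_subjects
    total = n_subjects * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical accuracy scores (0-3) given by two reviewers to five videos.
    print(fleiss_kappa([[2, 2], [1, 1], [3, 3], [0, 1], [2, 2]]))
```

A value of 1 corresponds to perfect agreement and a value near 0 to agreement no better than chance, which gives context for the reported range of coefficients.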
Descriptive statistics of the included videos according to video type are given in Table 2. While there was no difference in the median number of dislikes and comments (p=0.586 and p=0.547, respectively), the median number of likes and the median VPI of the testimonial group (7 and 0.6, respectively) were higher than those of the educational group (3.5 and 0.2, respectively). However, this difference was not statistically significant (p=0.501 and p=0.776) (Figure 1a-b). The median usefulness score, modified discern score, and accuracy score of the educational videos were significantly higher than those of the testimonial videos (p<0.001 for all), and their distribution is shown in Figure 1c-e. Although the median audiovisual score, PEMAT understandability score, and PEMAT actionability score of the educational videos were higher than those of the testimonial videos, this difference was not significant (p=0.127, p=0.114, and p=0.071, respectively).
The distributions of the PEMAT actionability score and the audiovisual score differed between the source of content groups (p=0.001 and p<0.001, respectively) (Figure 2a-b). Pairwise comparisons showed that the median PEMAT actionability score and audiovisual score of the individual group were lower than those of the other groups. The usefulness score, accuracy score, and modified discern score were similar among the source of content groups (p=0.647, p=0.424, and p=0.155, respectively) (Table 3) (Figure 2c-e).
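The pairwise comparisons between the three source groups rely on a Bonferroni adjustment. A minimal Python sketch of that adjustment is given below, assuming the common form in which each raw p-value is multiplied by the number of comparisons and capped at 1. The p-values in the example are hypothetical placeholders, not results from this study, and the function name is illustrative.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: p * number of comparisons, capped at 1.0."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


if __name__ == "__main__":
    # Hypothetical raw p-values for the three pairwise group comparisons.
    raw = {
        "medical website vs medical institution": 0.50,
        "medical website vs individual": 0.01,
        "medical institution vs individual": 0.004,
    }
    adjusted = bonferroni_adjust(list(raw.values()))
    for label, p_adj in zip(raw, adjusted):
        print(f"{label}: adjusted p = {p_adj:.3f}")
```

With three groups there are three pairwise comparisons, so a raw p-value must fall below roughly 0.05 / 3 to remain significant at the 0.05 level after adjustment.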
Descriptive statistics of the included videos according to usefulness are given in Supplement 4. The median modified discern score and accuracy score of the poor group were statistically significantly lower than those of the slightly useful and useful groups (both p<0.001). The PEMAT understandability score of the poor group was significantly lower than that of the useful group (p=0.031). Similarly, the PEMAT actionability score of the poor group was significantly lower than that of the slightly useful group (p=0.036).
We also examined the correlations between the suggested usefulness score and other video characteristics (Supplement 5). Correlation analysis demonstrated that usefulness score significantly and positively correlated with accuracy score (r=0.794, p<0.001 and r=0.747, and p<0.001 respectively).
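As an illustration of the correlation measure used here, the following Python sketch computes the Spearman rank correlation coefficient by ranking both variables (ties receive the average rank) and then taking the Pearson correlation of the ranks. It is a generic textbook implementation with hypothetical function names and example data, not the SPSS procedure used in the study, and it does not compute p-values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical usefulness (0-12) and accuracy (0-3) scores for six videos.
    usefulness = [2, 3, 5, 8, 9, 1]
    accuracy = [1, 1, 2, 3, 3, 0]
    print(round(spearman_rho(usefulness, accuracy), 3))
```

Because the coefficient depends only on ranks, it suits ordinal scores such as the 0-3 accuracy scale, where distances between levels are not assumed to be equal.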

Discussion
In today's age of technology, it is very easy to access information, and this ease of access causes specific problems. There is an enormous amount of information pollution on the internet. If users are not experts on the subject they research, they cannot determine whether the information is correct. YouTube™ is one of these sources of information. Similar to other popular social media websites, YouTube™ shares materials for free and allows any registered user to upload health-related, non-peer-reviewed videos, and the information contained in these videos is not checked. This makes YouTube™ vulnerable to the posting of fake and potentially dangerous videos that are not backed by solid scientific evidence. In our study, regardless of the video source, the median accuracy score was 1/3, the mean usefulness score was 3/12, and the mean modified discern score was 1/5. These values were lower still in testimonial videos. Additionally, symptoms and treatment options were not mentioned in most of the videos. Drozd's (2018) review highlighted that most health-related YouTube™ videos present false and unreliable information. Given the results, we consider that YouTube™ is not a reliable and accurate source for NPC.
Table 3 (partial). Audiovisual score a: 2 (0-3); 2 (1-3); 1 (0-2) Ψ, ɸ; <0.001. Usefulness score a: 3.5 (0-9); 3 (0-10); 2 (0-8); 0
a, data were expressed as median (min-max); b, n (%); Ψ p<0.05 compared with healthcare channel/charity group; ɸ p<0.05 compared with university/hospital group; s, second; VPI, Video Power Index; PEMAT, Patient education materials assessment tool for audiovisual materials.
Figure 2. Distributions of PEMAT score (actionability) (a), audiovisual score (b), usefulness score (c), accuracy score (d) and modified discern score (e) according to source of content.
It was found that only 26% of the video sources were medical professionals in official medical institutions. The accuracy and reliability of the information given by the rest is highly questionable. The heterogeneous and uncontrolled misinformation on YouTube™ was previously reviewed by Hassona et al. (2016) in their study on oral cavity cancer. Due to this information pollution, patients can be misled and make wrong decisions even on an issue as serious and sensitive as cancer.
The videos available on YouTube™ vary in quality, accuracy, and content, as these materials are very heterogeneous. Enver et al. (2020) showed that the audiovisual quality score and the accuracy score of the university group were statistically significantly higher than those of the health-care channels, individual users, and TV channel/news groups. In the same study, the authors also indicated that videos uploaded by university-affiliated accounts were more accurate, more trustworthy, and more professionally recorded and edited. In our study, we also observed that the audiovisual quality score of the individual group was lower than that of the university and health-care groups. However, there was no significant difference between the university/medical institution and health-care groups. Additionally, accuracy scores did not differ significantly among the three groups. Therefore, our findings indicate that the university/medical institution group was not more reliable, more accurate, or more professional than the health-care group for NPC.
In the study by Fode et al. (2020) evaluating video content on erectile dysfunction, the median PEMAT understandability score and the median PEMAT actionability score were both equal to 100%. In our study, the PEMAT score differed according to the source from which the videos were uploaded. The PEMAT actionability scores of the videos uploaded by the university/medical institution and healthcare groups were statistically significantly higher than those of the individual group. The groups also differed in PEMAT understandability score; however, these differences were not statistically significant. The fact that most of the videos in the university/medical institution group were patient testimonial videos (66.7%) may explain why these videos were more understandable and actionable. However, their accuracy and reliability values were not better than those of the other groups. We think that these videos do not help patients to make the right choice. Therefore, it is clear that there is a need for official medical institutions to upload more accurate and reliable content on NPC.
The videos were also examined from a different perspective, as educational or testimonial. Although there was no difference between the audiovisual quality scores of the educational and testimonial videos, the accuracy and usefulness scores of the educational group were statistically significantly higher than those of the testimonial group. The modified discern score of the educational group was also significantly higher than that of the testimonial group. Educational videos were thus more reliable, more accurate, and more useful than testimonial videos. Moreover, there was no significant difference between the two groups in terms of VPI score, number of views, and likes. Radonjic et al. (2020) demonstrated that videos with the highest VPI values, a measure of video popularity, had the lowest reliability scores. Several studies have also indicated a paradoxically high popularity among patient testimonial videos, which do not cite evidence-based research for their claims (Fischer et al., 2013; Erdem and Karaca, 2018; Gokcen and Gumussuyu, 2019). Similarly, in our study, even though 48% of the videos were delivered by physicians for educational purposes, the VPI scores of these videos were not significantly higher than those of the testimonial videos.
More useful videos were more accurate, more reliable, more understandable, and more actionable for the patient's treatment decision-making process. There were statistically significant correlations between the usefulness score and the accuracy score, modified discern score, and PEMAT scores (understandability and actionability). This suggests that the usefulness scoring system developed by the researchers can be used in parallel with other scoring systems in the literature. However, there were no statistically significant correlations between the usefulness score and the number of views, viewing rate, and VPI score. Studies on videos about oral cavity cancer and laryngeal cancer have reported similar findings (Hassona et al., 2016; Enver et al., 2020, respectively). Thus, these findings show that useful, accurate, and reliable healthcare-related videos on YouTube™ are not viewed enough. In addition, they mean that videos that provide misleading and incomplete information are watched at least as often as useful educational videos.
The first limitation of the present study is the highly dynamic nature of YouTube™. Videos are continuously uploaded and deleted, and a video watched today may be deleted the day after. Future studies assessing healthcare-related videos on YouTube™ may address this issue by using different methods. Second, we analyzed only English-language videos, while most of the regions where NPC is more common are not English-speaking. Extending the coverage to other languages, especially those of Southeast Asia, may yield more accurate results. Lastly, YouTube™ is not the only online platform that people use to access videos. More comprehensive studies that include other video platforms would provide more reliable results.
In conclusion, YouTube™ is a place where patients can look for information about their diseases and treatment options. Healthcare videos on YouTube™ are heterogeneous and are not peer reviewed. People view these videos without distinguishing between poor and useful content, and they do not know how to make such a distinction. Therefore, misleading videos can lead viewers into wrong decisions. In our age, social media is a source that cannot be neglected, and otorhinolaryngologists need to upload more accurate, reliable, and easy to understand videos onto platforms such as YouTube™.