Interrater Reliability of Various Thyroid Imaging Reporting and Data System (TIRADS) Classifications for Differentiating Benign from Malignant Thyroid Nodules

Document Type: Research Articles

Authors

Department of Radiology, Faculty of Medicine, Khon Kaen University, Thailand.

Abstract

Background: Thyroid ultrasound (US) is used as the first-line diagnostic tool to guide the management of thyroid disease but
is operator dependent. Few reports have evaluated interrater variability in US assessment. Therefore, we
evaluated interrater reliability in US assessment of thyroid nodules and estimated the diagnostic accuracy of various
TIRADS systems. Methods: This retrospective study included 24 malignant nodules and 84 benign nodules from
January 2015 to October 2017. Two blinded observers independently reviewed stored US images using TIRADS. All
analyses followed the guidelines of the ACR-TIRADS, Siriraj-TIRADS and EU-TIRADS systems. Interrater reliability was calculated
using Cohen’s kappa statistic. Diagnostic accuracy was also calculated. Results: Interobserver agreement showed
substantial agreement for composition (K=0.616); echogenicity and echogenic foci showed fair agreement (K=0.327
and 0.288, respectively); margin showed slight agreement (K=0.143). For the final assessment, interrater reliability
showed moderate agreement for the ACR-TIRADS system (K=0.500), fair agreement for the EU-TIRADS system (K=0.209) and
slight agreement for the Siriraj-TIRADS system (K=0.114). Diagnostic performance for the two observers was as follows. For the
ACR-TIRADS system, sensitivities were 75% and 79.2%, specificities were 58.3% and 56%, positive predictive values (PPV)
were 34% and 33.9% and negative predictive values (NPV) were 89.1% and 90.4%. For the Siriraj-TIRADS system,
sensitivities were 41.7% and 25%, specificities were 84.5% and 89.3%, positive predictive values (PPV) were 43.5%
and 40% and negative predictive values (NPV) were 83.5% and 80.6%. For the EU-TIRADS system, sensitivities were
45.8% and 66.7%, specificities were 79.8% and 72.6%, positive predictive values (PPV) were 39.3% and 41% and
negative predictive values (NPV) were 83.8% and 88.4%. Conclusion: The ACR-TIRADS had the highest interobserver
agreement and showed a trend toward the highest sensitivity and negative predictive value for the diagnosis of malignant thyroid nodules.
Siriraj-TIRADS had higher specificity and accuracy, but