An intra- and inter-rater agreement and reliability study
Background: Detailed information about the development of disc morphology over time could provide valuable knowledge about disc health when compared with clinical measures such as pain and activity limitation. However, a review of the available literature did not reveal any detailed and directly applicable description of quantitative methods for measuring lumbar disc herniations and related structures on sagittal MR images. The objectives of this study were: 1) to develop methods for quantitative measurement of intervertebral discs, lumbar disc herniations and the dural sac/spinal canal using MRI, 2) to evaluate the agreement of these methods, and 3) to identify factors in the measurement procedures that may compromise agreement.
Methods: In this intra- and inter-rater agreement study, quantitative lumbar measurements were performed on magnetic resonance images from 32 participants drawn from a study cohort representative of the Danish general population. A new method for quantitative measurement of intervertebral discs and related structures was developed and systematically described. MR images were measured twice by one rater for intra-rater agreement and once by a second rater for inter-rater agreement. Length and volume measurements were conducted, and cross-sectional areas (CSAs) were calculated from the length measurements. Statistical analysis included Bland and Altman's limits of agreement and weighted kappa analysis.
Results: Acceptable to good agreement was found for intra-rater measurements and calculations, with limits of agreement ranging from 1.8% to 33.6% of mean values. Questionable to good agreement was found for inter-rater measurements and calculations, with limits of agreement ranging from 2.3% to 51.9% of mean values. Volume measurements were an exception and showed poor agreement. Two main causes of compromised agreement were identified, both related to uncertainty in point markings during measurements.
Conclusions: Quantitative length measurements and CSA calculations showed good to excellent agreement, and are applicable for further use in a broader context. Quantitative volume measurements showed poor agreement and are not applicable for further use.
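As an illustration of the agreement statistic named in the Methods, the following is a minimal sketch of Bland and Altman's limits of agreement for two sets of paired measurements. It is not the study's analysis code. The function name `limits_of_agreement`, the conventional 1.96 multiplier, and the choice to express the half-width of the limits as a percentage of the grand mean are assumptions made for illustration; the abstract does not state exactly how the percentages were derived. The sample values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Bland-Altman limits of agreement for paired measurements.

    `first` and `second` are equal-length sequences holding two
    measurements of the same structures (for example, two sessions by
    one rater, or one session each by two raters).
    """
    if len(first) != len(second):
        raise ValueError("measurement series must have equal length")

    # Paired differences between the two measurement series.
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)

    # Assumption: 95% limits as bias +/- 1.96 sample standard deviations.
    half_width = 1.96 * stdev(diffs)

    # Assumption: scale by the grand mean of all measurements.
    grand_mean = mean([*first, *second])

    return {
        "bias": bias,
        "lower": bias - half_width,
        "upper": bias + half_width,
        "half_width_pct_of_mean": 100 * half_width / grand_mean,
    }


if __name__ == "__main__":
    # Hypothetical disc height readings in mm, for illustration only.
    session_1 = [10.0, 12.0, 11.0, 13.0]
    session_2 = [10.5, 11.5, 11.0, 12.5]
    print(limits_of_agreement(session_1, session_2))
```

Expressing the limits relative to the mean makes structures of different sizes comparable, which is presumably why the Results report agreement as a percentage of mean values.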
European Spine Journal, 2013, Vol 22, Issue 5, Supplement