To investigate how the choice of metric for estimating pulmonary nodule size affects both nodule characterization and the performance evaluation of computer-aided detection systems, since the latter are always qualified with respect to a given nodule size range.
Materials and Methods
This study used 265 whole-lung CT scans documented by the Lung Image Database Consortium (LIDC) using its protocol for nodule evaluation. Each inspected lesion was reviewed independently by four experienced radiologists, who provided boundary markings for nodules larger than 3 mm. Four size metrics based on the boundary markings were considered: a uni-dimensional measure and two bi-dimensional measures on a single image slice, and a volumetric measure based on all the image slices. The radiologist boundaries were processed; nodules with four markings were analyzed to characterize inter-radiologist variation, while nodules with at least one marking were used to examine the differences between the metrics.
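As an illustration of how size metrics of this kind can be derived from boundary markings, the following is a minimal sketch, not the study's implementation. The abstract does not define the four metrics, so the definitions here are assumptions: the uni-dimensional measure is taken as the largest distance between two boundary points on one slice, one bi-dimensional measure as the diameter of a circle with the same area as the boundary, and the volumetric measure as the diameter of a sphere with the same volume as the stacked slice areas. All function names are hypothetical.

```python
import math
from itertools import combinations


def polygon_area(points):
    """Area (mm^2) enclosed by a closed boundary, via the shoelace formula."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1]
        - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def max_diameter(points):
    """Assumed uni-dimensional metric: largest distance between boundary points."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))


def area_equivalent_diameter(points):
    """Assumed bi-dimensional metric: diameter of a circle of equal area."""
    return 2.0 * math.sqrt(polygon_area(points) / math.pi)


def volume_equivalent_diameter(slices, slice_spacing_mm):
    """Assumed volumetric metric: diameter of a sphere of equal volume.

    Volume is approximated as the sum of slice areas times the slice spacing.
    """
    volume = sum(polygon_area(s) for s in slices) * slice_spacing_mm
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)


# Hypothetical boundary: a 4 mm x 4 mm square outlined on three 1 mm slices.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
uni = max_diameter(square)
bi = area_equivalent_diameter(square)
vol = volume_equivalent_diameter([square, square, square], 1.0)
```

Even for this simple shape the three values differ, which is the kind of metric-dependent discrepancy the study quantifies.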
Processing of the annotations yielded 127 nodules marked by all four radiologists and an extended set of 518 nodules each having at least one observation, with three-dimensional sizes ranging from 2.03 to 29.4 mm (average 7.05 mm, median 5.71 mm). Very high inter-observer variation was observed for all four metrics: 95% of the estimated standard deviations fell in the ranges [0.49, 1.25], [0.67, 2.55], [0.78, 2.11], and [0.96, 2.69] mm for the three-dimensional, the uni-dimensional, and the two bi-dimensional size metrics, respectively. The metrics also differed greatly from one another: the widths of the 0.95 probability-coverage regions for the volumetric size estimate, conditional on a uni-dimensional measurement or on either of the two bi-dimensional measurements of 10 mm, were 7.32, 7.72, and 6.29 mm, respectively.
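The inter-observer variation reported above can be pictured with a minimal sketch. The abstract does not state how the standard deviations were estimated, so this assumes a plain sample standard deviation across the readings for each nodule, followed by the central 95% range of those values. The readings are invented for illustration and the function names are hypothetical.

```python
import statistics


def per_nodule_sd(readings_by_nodule):
    """Sample standard deviation of the size readings (mm) for each nodule."""
    return [statistics.stdev(readings) for readings in readings_by_nodule]


def central_95_range(values):
    """Approximate 2.5th and 97.5th percentiles (needs at least two values)."""
    # n=40 yields 39 cut points spaced 2.5% apart; the first and last
    # bound the central 95% of the values.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical sizes (mm) from four readers for three nodules.
readings = [
    [5.1, 5.6, 4.9, 6.0],
    [10.2, 11.0, 9.5, 12.1],
    [3.2, 3.0, 3.9, 3.5],
]
sds = per_nodule_sd(readings)
low, high = central_95_range(sds)
```

With only a handful of nodules the percentile estimate is crude; it becomes meaningful with a set the size of the 127 four-reader nodules described here.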
The choice of size metric strongly affects the selection of data subsets for performance evaluation. The LIDC plans to include a single size measure for each nodule in its database. This metric is not intended as a gold standard for nodule size; rather, it is intended to facilitate the selection of unique, repeatable, size-limited nodule subsets.
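To make the dependence of subset selection on the metric concrete, here is a small sketch under stated assumptions: the nodule identifiers, sizes, and the `select_subset` helper are all hypothetical and do not come from the LIDC database.

```python
def select_subset(nodules, metric, min_mm, max_mm):
    """Return IDs of nodules whose size under `metric` lies in [min_mm, max_mm)."""
    return sorted(
        nodule_id
        for nodule_id, sizes in nodules.items()
        if min_mm <= sizes[metric] < max_mm
    )


# Hypothetical sizes (mm), invented purely for illustration.
nodules = {
    "n1": {"uni": 10.2, "volumetric": 7.9},
    "n2": {"uni": 8.5, "volumetric": 8.1},
    "n3": {"uni": 12.0, "volumetric": 10.4},
}

by_uni = select_subset(nodules, "uni", 8.0, 12.0)            # ['n1', 'n2']
by_volume = select_subset(nodules, "volumetric", 8.0, 12.0)  # ['n2', 'n3']
```

The same 8 to 12 mm range picks different nodules under each metric, which is why a single agreed size measure per nodule makes subsets repeatable.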