1. Coleman RE (2001) Metastatic bone disease: clinical features, pathophysiology and treatment strategies. Cancer Treat Rev 27:165–176
2. Macedo F, Ladeira K, Pinho F, et al. (2017) Bone metastases: an overview. Oncol Rev 11:321
3. D'Oronzo S, Coleman R, Brown J, Silvestris F (2019) Metastatic bone disease: pathogenesis and therapeutic options: up-date on bone metastasis management. J Bone Oncol 15:100205
4. O'Sullivan GJ, Carty FL, Cronin CG (2015) Imaging of bone metastasis: an update. World J Radiol 7:202–211
5. Heindel W, Gübitz R, Vieth V, Weckesser M, Schober O, Schäfers M (2014) The diagnostic imaging of bone metastases. Dtsch Arztebl Int 111:741–747
6. Kalogeropoulou C, Karachaliou A, Zampakis P (2009) Radiologic evaluation of skeletal metastases: role of plain radiographs and computed tomography. In: Cancer Metastasis – Biology and Treatment, 12:119–136. Springer, Dordrecht
7. Groves AM, Beadsmoore CJ, Cheow HK, et al. (2006) Can 16-detector multislice CT exclude skeletal lesions during tumour staging? Implications for the cancer patient. Eur Radiol 16:1066–1073
8. Chmelik J, Jakubicek R, Walek P, et al. (2018) Deep convolutional neural network-based segmentation and classification of difficult to define metastatic spinal lesions in 3D CT data. Med Image Anal 49:76–88
9. Hammon M, Dankerl P, Tsymbal A, et al. (2013) Automatic detection of lytic and blastic thoracolumbar spine metastases on computed tomography. Eur Radiol 23:1862–1870
10. Vandemark RM, Shpall EJ, Affronti ML (1992) Bone metastases from breast cancer: value of CT bone windows. J Comput Assist Tomogr 16:608–614
11. Pomerantz SM, White CS, Krebs TL, et al. (2000) Liver and bone window settings for soft-copy interpretation of chest and abdominal CT. Am J Roentgenol 174:311–314
12. Burns JE, Yao J, Wiese TS, Muñoz HE, Jones EC, Summers RM (2013) Automated detection of sclerotic metastases in the thoracolumbar spine at CT. Radiology 268:69–78
13. Choy G, Khalilzadeh O, Michalski M, et al. (2018) Current applications and future impact of machine learning in radiology. Radiology 288:318–328
14. Roth HR, Lu L, Liu J, et al. (2016) Improving computer-aided detection using convolutional neural networks and random view aggregation. IEEE Trans Med Imaging 35:1170–1181
15. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: MICCAI 2015. Lecture Notes in Computer Science, 9351:234–241. Springer, Cham
16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778
17. Noguchi S, Nishio M, Yakami M, Nakagomi K, Togashi K (2020) Bone segmentation on whole-body CT using convolutional neural network with novel data augmentation techniques. Comput Biol Med 121:103767
18. Zou KH, Warfield SK, Bharatha A, et al. (2004) Statistical validation of image segmentation quality based on a spatial overlap index. Acad Radiol 11:178–189
19. Chakraborty DP, Zhai X (2016) On the meaning of the weighted alternative free-response operating characteristic figure of merit. Med Phys 43:2548–2557
20. Chakraborty DP (2017) Observer performance methods for diagnostic imaging: foundations, modeling, and applications with R-based examples. CRC Press, Boca Raton
21. Chakraborty DP (2021) The RJafroc Book. Available via https://dpc10ster.github.io/RJafrocBook/. Accessed 24 Dec 2021
22. Sakamoto R, Yakami M, Fujimoto K, et al. (2017) Temporal subtraction of serial CT images with large deformation diffeomorphic metric mapping in the identification of bone metastases. Radiology 285:629–639
23. Nakamoto Y, Osman M, Wahl RL (2003) Prevalence and patterns of bone metastases detected with positron emission tomography using F-18 FDG. Clin Nucl Med 28:302–307
24. Kakhki VRD, Anvari K, Sadeghi R, Mahmoudian AS, Torabian-Kakhki M (2013) Pattern and distribution of bone metastases in common malignant tumors. Nucl Med Rev 16:66–69
25. Kobatake H (2007) Future CAD in multi-dimensional medical images: project on multi-organ, multi-disease CAD system. Comput Med Imaging Graph 31:258–266
26. Liu K, Li Q, Ma J, et al. (2019) Evaluating a fully automated pulmonary nodule detection approach and its impact on radiologist performance. Radiol Artif Intell 1:e180084
27. Xie H, Yang D, Sun N, Chen Z, Zhang Y (2019) Automated pulmonary nodule detection in CT images using deep convolutional neural networks. Pattern Recognit 85:109–119
28. Pehrson LM, Nielsen MB, Lauridsen CA (2019) Automatic pulmonary nodule detection applying deep learning or machine learning algorithms to the LIDC-IDRI database: a systematic review. Diagnostics 9:29
29. Chlebus G, Schenk A, Moltz JH, van Ginneken B, Hahn HK, Meine H (2018) Automatic liver tumor segmentation in CT with fully convolutional neural networks and object-based postprocessing. Sci Rep 8:15497
30. Vorontsov E, Cerny M, Régnier P, et al. (2019) Deep learning for automated segmentation of liver lesions at CT in patients with colorectal cancer liver metastases. Radiol Artif Intell 1:e180014
31. Azer SA (2019) Deep learning with convolutional neural networks for identification of liver masses and hepatocellular carcinoma: a systematic review. World J Gastrointest Oncol 11:1218–1230
32. van Leeuwen KG, Schalekamp S, Rutten MJCM, van Ginneken B, de Rooij M (2021) Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. Eur Radiol 31:3797–3804
33. Çiray I, Åström G, Sundström C, Hagberg H, Ahlström H (1997) Assessment of suspected bone metastases: CT with and without clinical information compared to CT-guided bone biopsy. Acta Radiol 38:890–895
Figures
Figure 1. Flowchart of data collection and division
All scans were collected retrospectively from the clinical databases of a single institution.
Figure 2. Schematic of the proposed algorithm
Schematics are shown as 2D planar images for simplicity. In practice, most processes operate in a 3D volumetric manner; the exception is the 2D U-Net used for bone segmentation. In this case, a candidate region on the left half of the sacrum was included in the final output, and two candidate regions on the left ilium were discarded.
Figure 3. Screenshot of the image viewer for the observer study
From left to right, the three images in each row are the original image, the original image overlaid with the candidate regions output by the proposed algorithm, and a maximum intensity projection of the bone region overlaid with the candidate regions. In this case, two candidate regions, located on a right rib and the lumbar spine, are presented. Patient age, sex, and the number of candidate regions are also displayed. When an observer clicks on a suspicious lesion, a dialog box for rating the likelihood (1–100) of bone metastasis appears.
Figure 4. Representative images of true-positive lesions with various
appearances and locations
From left to right, the three images in each row are the original image, the candidate region output by the DLA (red), and the ground-truth label (blue). The DSC between the candidate region and the ground-truth label, the predicted probability of the candidate region, and the detection rates of radiologists without and with the DLA in the observer study are shown in the table on the right. (a) Sclerotic bone
metastasis on the vertebra. (b) Expansile lytic bone metastasis in the right iliac bone of the
pelvis. (c) Sclerotic bone metastasis in the left femur. (d) Mixed sclerotic and lytic bone
metastasis on the right transverse process of the vertebra. (e) Lytic bone metastasis in the right
scapula. (f) Small sclerotic bone metastasis in the right rib. (g) Small lytic bone metastasis on
the right transverse process of the vertebra. (h) Lytic bone metastasis in the sternum.
Abbreviations: DLA, deep learning-based algorithm; DSC, Dice similarity coefficient.
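The DSC reported in the table follows the standard spatial-overlap definition (reference 18 in the list): DSC = 2|A ∩ B| / (|A| + |B|) for binary masks A and B. A minimal sketch of the computation; the function name and the toy masks are illustrative, not taken from the study:

```python
import numpy as np

def dice_similarity(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return float("nan")  # both masks empty: DSC is undefined
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 2D example: a 6-voxel prediction and a 6-voxel label overlapping in 4 voxels
pred = np.zeros((5, 5), dtype=bool);  pred[1:3, 1:4] = True   # 6 voxels
truth = np.zeros((5, 5), dtype=bool); truth[1:4, 2:4] = True  # 6 voxels
print(round(dice_similarity(pred, truth), 3))  # 2*4 / (6+6) = 0.667
```

The same expression applies unchanged to 3D volumes, since the sums run over all voxels regardless of array rank.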
Figure 5. Representative images of false-negative lesions (a–c) and false-positive regions (d–f).
(a) Small sclerotic bone metastasis on the right scapula. It appears to be too small and faint for
the DLA to detect. (b) Lytic bone metastasis on the right humerus. Note that the red region on
the middle image is a candidate region before thresholding, and DSC (*) was calculated on this
region. With a threshold of 0.6, the region was deleted since its probability was 0.328, which is
<0.6. Therefore, this lesion was counted as false-negative. (c) Mixed sclerotic and lytic bone
metastasis on the right ischial and pubic bones of the pelvis. The lesion was detected by the
DLA but was counted as false-negative since the DSC was <0.3. (d) False-positive region due to
an old rib fracture. (e) False-positive region due to non-specific inhomogeneous density of the
right iliac bone of the pelvis. (f) False-positive region located outside the bone due to post-therapeutic changes of the liver tumor. Such errors occurred occasionally because the DLA
focuses only on local image features and does not take the holistic anatomical information into
account. Abbreviations: DLA, deep learning-based algorithm; DSC, Dice similarity coefficient;
N/A, not applicable.
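The counting rule described in panels (b) and (c) — a candidate region is kept only if its predicted probability reaches the 0.6 threshold, and a kept candidate counts as a detection only if its DSC with the ground-truth label is at least 0.3 — can be sketched as follows (function and variable names are illustrative, not from the study's code):

```python
def classify_candidate(probability: float, dsc: float,
                       prob_threshold: float = 0.6,
                       dsc_threshold: float = 0.3) -> str:
    """Two conditions a lesion's candidate region must satisfy to be a true-positive."""
    if probability < prob_threshold:
        # Candidate deleted before output, so the lesion is missed (as in panel b)
        return "false-negative"
    if dsc < dsc_threshold:
        # Candidate survives thresholding but overlaps the lesion too little (panel c)
        return "false-negative"
    return "true-positive"

print(classify_candidate(0.328, 0.50))  # panel (b): false-negative
print(classify_candidate(0.90, 0.20))   # panel (c): false-negative
```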
Figure 6. The average free-response receiver operating characteristic curves of
the nine radiologists without and with the DLA
The overall performance of radiologists improved significantly with the aid of the DLA.
Abbreviations: DLA, deep learning-based algorithm.
Tables
Table 1. Demographics of the cases in the three datasets

| | Training: Positive | Training: Negative | Validation: Positive | Validation: Negative | Test: Positive | Test: Negative |
|---|---|---|---|---|---|---|
| *Per patient* | | | | | | |
| Age (years) | 66.5 ± 12.8 (28–86) | 63.2 ± 15.9 (1–92) | 66.6 ± 13.0 (34–84) | 68.6 ± 9.7 (43–81) | 67.3 ± 9.8 (45–86) | 67.9 ± 9.1 (46–87) |
| Sex: Male | 100 (59) | 225 (49) | 13 (65) | 12 (60) | 20 (67) | 17 (57) |
| Sex: Female | 69 (41) | 238 (51) | 7 (35) | 8 (40) | 10 (33) | 13 (43) |
| Primary lesion: Lungs | 57 (34) | 0* (0) | 6 (30) | 6 (30) | 8 (27) | 8 (27) |
| Primary lesion: Prostate | 33 (20) | 0* (0) | 4 (20) | 5 (25) | 6 (20) | 6 (20) |
| Primary lesion: Breast | 25 (15) | 0* (0) | 3 (15) | 5 (25) | 6 (20) | 6 (20) |
| Primary lesion: Others | 54 (32) | 0* (0) | 7 (35) | 4 (20) | 10 (33) | 10 (33) |
| Total number of patients | 169** | 463 | 20 | 20 | 30 | 30 |
| *Per scan* | | | | | | |
| Use of contrast media: Plain | 153 (57) | 237 (51) | 8 (40) | 9 (45) | 12 (40) | 16 (53) |
| Use of contrast media: Contrast-enhanced | 116 (43) | 226 (49) | 12 (60) | 11 (55) | 18 (60) | 14 (47) |
| Slice thickness: 1.0 mm | 266 (99) | 417 (90) | 20 (100) | 20 (100) | 30 (100) | 30 (100) |
| Slice thickness: 0.5 mm | 3 (1) | 46 (10) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Scanner model: Aquilion Prime | 124 (46) | 191 (41) | 9 (45) | 7 (35) | 6 (20) | 10 (33) |
| Scanner model: Aquilion One | 117 (43) | 157 (34) | 3 (15) | 6 (30) | 5 (17) | 10 (33) |
| Scanner model: Aquilion | 24 (9) | 105 (23) | 8 (40) | 7 (35) | 19 (63) | 10 (33) |
| Scanner model: Aquilion Precision | 4 (1) | 10 (2) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Scan coverage: Neck to abdomen | 133 (49) | 329 (71) | 12 (60) | 8 (40) | 14 (47) | 13 (43) |
| Scan coverage: Chest to abdomen | 102 (38) | 20 (4) | 4 (20) | 8 (40) | 13 (43) | 9 (30) |
| Scan coverage: Neck to chest | 2 (1) | 1 (0) | 1 (5) | 1 (5) | 0 (0) | 1 (3) |
| Scan coverage: Chest | 27 (10) | 9 (2) | 2 (10) | 3 (15) | 3 (10) | 4 (13) |
| Scan coverage: Abdomen | 4 (1) | 41 (9) | 1 (5) | 0 (0) | 0 (0) | 3 (10) |
| Scan coverage: Brain | 1 (0) | 44 (10) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Scan coverage: Neck | 0 (0) | 19 (4) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Total number of scans | 269** | 463 | 20 | 20 | 30 | 30 |

For patient age, the mean and standard deviation are presented, with the range in parentheses. For other data, the number of patients or scans is presented, with percentages in parentheses.
*Negative scans of the training dataset were acquired from patients without malignancy.
**For positive cases in the training dataset, the total numbers of patients and scans were not equal, because more than one scan was included from one patient if the radiological appearance of bone metastases had changed substantially.
Table 2. Characteristics of lesions in the three datasets

| | Training | Validation | Test |
|---|---|---|---|
| Location: Vertebra | 620 (45) | 22 (45) | 29 (39) |
| Location: Pelvis | 412 (30) | 15 (31) | 14 (19) |
| Location: Rib | 228 (17) | 9 (18) | 18 (24) |
| Location: Scapula | 38 (3) | 1 (2) | 4 (5) |
| Location: Limb | 32 (2) | 0 (0) | 5 (7) |
| Location: Sternum | 30 (2) | 2 (4) | 5 (7) |
| Location: Clavicle | 11 (1) | 0 (0) | 0 (0) |
| Location: Skull | 4 (0) | 0 (0) | 0 (0) |
| Appearance: Sclerotic | 709 (52) | 21 (43) | 31 (41) |
| Appearance: Lytic | 518 (38) | 19 (39) | 25 (33) |
| Appearance: Mixed | 148 (11) | 9 (18) | 19 (25) |
| Diameter: ≥50 mm | 109 (8) | 2 (4) | 4 (5) |
| Diameter: ≥30 mm to <50 mm | 263 (19) | 11 (22) | 13 (17) |
| Diameter: ≥10 mm to <30 mm | 896 (65) | 31 (63) | 49 (65) |
| Diameter: ≥5 mm to <10 mm | 107 (8) | 5 (10) | 9 (12) |
| Total number of lesions | 1375 | 49 | 75 |

Data are the number of lesions for each category, with percentages in parentheses.
Table 3. Performance of the DLA according to the preset threshold

Validation dataset (20 positive cases with 49 lesions and 20 negative cases):

| Threshold | TP (lesion) | FN (lesion) | FP (lesion) | Sensitivity (%) | FP per case | TP (case) | FN (case) | TN (case) | FP (case) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9 | 40 | 9 | 21 | 81.6 | 0.525 | 19 | 1 | 17 | 3 | 95.0 | 85.0 |
| 0.8 | 41 | 8 | 25 | 83.7 | 0.625 | 20 | 0 | 17 | 3 | 100.0 | 85.0 |
| 0.7 | 42 | 7 | 26 | 85.7 | 0.650 | 20 | 0 | 16 | 4 | 100.0 | 80.0 |
| **0.6** | **44** | **5** | **31** | **89.8** | **0.775** | **20** | **0** | **14** | **6** | **100.0** | **70.0** |
| 0.5 | 44 | 5 | 35 | 89.8 | 0.875 | 20 | 0 | 14 | 6 | 100.0 | 70.0 |
| 0.4 | 44 | 5 | 41 | 89.8 | 1.025 | 20 | 0 | 13 | 7 | 100.0 | 65.0 |
| 0.3 | 45 | 4 | 47 | 91.8 | 1.175 | 20 | 0 | 13 | 7 | 100.0 | 65.0 |
| 0.2 | 45 | 4 | 66 | 91.8 | 1.650 | 20 | 0 | 12 | 8 | 100.0 | 60.0 |
| 0.1 | 45 | 4 | 91 | 91.8 | 2.275 | 20 | 0 | 10 | 10 | 100.0 | 50.0 |

Test dataset (30 positive cases with 75 lesions and 30 negative cases):

| Threshold | TP (lesion) | FN (lesion) | FP (lesion) | Sensitivity (%) | FP per case | TP (case) | FN (case) | TN (case) | FP (case) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9 | 55 | 20 | 24 | 73.3 | 0.400 | 30 | 0 | 27 | 3 | 100.0 | 90.0 |
| 0.8 | 58 | 17 | 32 | 77.3 | 0.533 | 30 | 0 | 26 | 4 | 100.0 | 86.7 |
| 0.7 | 60 | 15 | 35 | 80.0 | 0.583 | 30 | 0 | 26 | 4 | 100.0 | 86.7 |
| **0.6** | **62** | **13** | **37** | **82.7** | **0.617** | **30** | **0** | **24** | **6** | **100.0** | **80.0** |
| 0.5 | 62 | 13 | 43 | 82.7 | 0.717 | 30 | 0 | 22 | 8 | 100.0 | 73.3 |
| 0.4 | 62 | 13 | 52 | 82.7 | 0.867 | 30 | 0 | 20 | 10 | 100.0 | 66.7 |
| 0.3 | 65 | 10 | 60 | 86.7 | 1.000 | 30 | 0 | 16 | 14 | 100.0 | 53.3 |
| 0.2 | 66 | 9 | 72 | 88.0 | 1.200 | 30 | 0 | 11 | 19 | 100.0 | 36.7 |
| 0.1 | 67 | 8 | 90 | 89.3 | 1.500 | 30 | 0 | 9 | 21 | 100.0 | 30.0 |

The results for each threshold from 0.1 to 0.9 are presented. Sensitivities are indicated as percentages. FP per case indicates the average number of FPs per case. Based on the results for the validation dataset, a threshold of 0.6 was defined as the standard value for the algorithm (given in bold). Note that TNs are omitted from the lesion-based analysis because a TN lesion is undefinable for data that contain multiple lesions with location information in one scan, unlike a typical diagnostic test with a binary outcome (e.g., presence or absence). Abbreviations: DLA, deep learning-based algorithm; TP, true-positive; FN, false-negative; TN, true-negative; FP, false-positive.
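The derived columns follow directly from the raw counts: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and FP per case averages the FP count over all positive and negative cases. A sketch that reproduces, for example, the test-dataset row at the chosen 0.6 threshold (the helper names are illustrative):

```python
def lesion_metrics(tp: int, fn: int, fp: int, n_cases: int):
    """Lesion-based sensitivity (%) and average false-positives per case."""
    sensitivity = 100.0 * tp / (tp + fn)
    fp_per_case = fp / n_cases
    return round(sensitivity, 1), round(fp_per_case, 3)

def case_metrics(tp: int, fn: int, tn: int, fp: int):
    """Case-based sensitivity (%) and specificity (%)."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return round(sensitivity, 1), round(specificity, 1)

# Test dataset at threshold 0.6: 75 lesions, 60 cases (30 positive + 30 negative)
print(lesion_metrics(tp=62, fn=13, fp=37, n_cases=60))  # (82.7, 0.617)
print(case_metrics(tp=30, fn=0, tn=24, fp=6))           # (100.0, 80.0)
```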
Table 4. Sensitivities of the DLA and nine radiologists without and with the DLA, stratified according to lesion characteristics

| | DLA (validation dataset) | DLA (test dataset) | Radiologists without DLA (test dataset) | Radiologists with DLA (test dataset) |
|---|---|---|---|---|
| Location: Vertebra | 90.9 (20/22) | 89.7 (26/29) | 60.2 (17.4/29) | 78.5 (22.8/29) |
| Location: Pelvis | 93.3 (14/15) | 92.9 (13/14) | 60.3 (8.4/14) | 87.3 (12.2/14) |
| Location: Rib | 77.8 (7/9) | 77.8 (14/18) | 35.8 (6.4/18) | 54.9 (9.9/18) |
| Location: Scapula | 100.0 (1/1) | 50.0 (2/4) | 33.3 (1.3/4) | 47.2 (1.9/4) |
| Location: Limb | N/A (0/0) | 60.0 (3/5) | 57.8 (2.9/5) | 62.2 (3.1/5) |
| Location: Sternum | 100.0 (2/2) | 80.0 (4/5) | 44.4 (2.2/5) | 77.8 (3.9/5) |
| Appearance: Sclerotic | 85.7 (18/21) | 83.9 (26/31) | 45.5 (14.1/31) | 66.7 (20.7/31) |
| Appearance: Lytic | 89.5 (17/19) | 92.0 (23/25) | 64.4 (16.1/25) | 84.0 (21.0/25) |
| Appearance: Mixed | 100.0 (9/9) | 73.7 (14/19) | 45.0 (8.6/19) | 63.7 (12.1/19) |
| Diameter: ≥50 mm | 100.0 (2/2) | 75.0 (3/4) | 86.1 (3.4/4) | 94.4 (3.8/4) |
| Diameter: ≥30 mm to <50 mm | 90.9 (10/11) | 100.0 (13/13) | 70.1 (9.1/13) | 88.0 (11.4/13) |
| Diameter: ≥10 mm to <30 mm | 96.8 (30/31) | 85.7 (42/49) | 50.6 (24.8/49) | 73.0 (35.8/49) |
| Diameter: ≥5 mm to <10 mm | 60.0 (3/5) | 44.4 (4/9) | 16.0 (1.4/9) | 30.9 (2.8/9) |
| Total | 89.8 (44/49) | 82.7 (62/75) | 51.7 (38.8/75) | 71.7 (53.8/75) |

Sensitivities are indicated as percentages, with the actual numbers in parentheses. For the numerators of the radiologists' sensitivities, the averages of the nine radiologists are presented. Abbreviations: DLA, deep learning-based algorithm; N/A, not applicable.
Table 5. Interpretation results of nine radiologists without and with the DLA

| Radiologist | wAFROC-FOM (wo / w) | Lesion-based sensitivity, % (wo / w) | False-positives per case (wo / w) | Case-based sensitivity, % (wo / w) | Case-based specificity, % (wo / w) | Interpretation time per case, s (wo / w) |
|---|---|---|---|---|---|---|
| 1 | 0.828 / 0.899 | 64.0 / 69.3 | 0.333 / 0.083 | 86.7 / 96.7 | 96.7 / 100.0 | 204 / 108 |
| 2 | 0.743 / 0.924 | 45.3 / 72.0 | 0.083 / 0.050 | 70.0 / 86.7 | 100.0 / 100.0 | 119 / 43 |
| 3 | 0.714 / 0.933 | 40.0 / 78.7 | 0.100 / 0.050 | 60.0 / 90.0 | 100.0 / 93.3 | 144 / 80 |
| 4 | 0.802 / 0.901 | 65.3 / 78.7 | 0.833 / 0.183 | 90.0 / 96.7 | 90.0 / 93.3 | 257 / 62 |
| 5 | 0.769 / 0.914 | 54.7 / 73.3 | 0.083 / 0.150 | 76.7 / 93.3 | 93.3 / 100.0 | 148 / 127 |
| 6 | 0.754 / 0.936 | 65.3 / 74.7 | 0.217 / 0.283 | 86.7 / 93.3 | 86.7 / 96.7 | 152 / 82 |
| 7 | 0.743 / 0.904 | 50.7 / 70.7 | 0.383 / 0.267 | 76.7 / 93.3 | 90.0 / 90.0 | 214 / 66 |
| 8 | 0.783 / 0.888 | 52.0 / 76.0 | 0.050 / 0.300 | 70.0 / 93.3 | 100.0 / 93.3 | 196 / 140 |
| 9 | 0.575 / 0.791 | 28.0 / 52.0 | 0.050 / 0.050 | 53.3 / 76.7 | 100.0 / 100.0 | 75 / 54 |
| Average | 0.746 / 0.899* | 51.7 / 71.7* | 0.237 / 0.157 | 74.4 / 91.1* | 95.2 / 96.2 | 168 / 85* |

Sensitivity and specificity are indicated as percentages, and interpretation times are indicated in seconds. Asterisks indicate a significant difference between the two sessions. Abbreviations: DLA, deep learning-based algorithm; wo, without DLA; w, with DLA; wAFROC-FOM, weighted alternative free-response receiver operating characteristic figure of merit.
...