[Achanta et al., 2012] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., and Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282.
[Arbelaez et al., 2010] Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):898–916.
[Ba et al., 2016] Ba, J. L., Kiros, J. R., and Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
[Badrinarayanan et al., 2017] Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495.
[Boykov et al., 2001] Boykov, Y., Veksler, O., and Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239.
[Boykov and Jolly, 2001] Boykov, Y. Y. and Jolly, M.-P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In International Conference on Computer Vision, volume 1, pages 105–112. IEEE.
[Bradski, 2000] Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.
[Bridle et al., 1992] Bridle, J. S., Heading, A. J., and MacKay, D. J. (1992). Unsupervised classifiers, mutual information and 'phantom targets'. In Advances in Neural Information Processing Systems, pages 1096–1101.
[Bruna et al., 2013] Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. Technical report.
[Chen et al., 2014] Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. Technical report.
[Chen et al., 2017a] Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017a). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848.
[Chen et al., 2017b] Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017b). Rethinking atrous convolution for semantic image segmentation. Technical report.
[Chen et al., 2018] Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In European Conference on Computer Vision, pages 801–818.
[Ciresan et al., 2012] Ciresan, D., Giusti, A., Gambardella, L., and Schmidhuber, J. (2012). Deep neural networks segment neuronal membranes in electron microscopy images. Advances in Neural Information Processing Systems, 25:2843–2851.
[Cordts et al., 2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223.
[Defferrard et al., 2016] Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852.
[Dosovitskiy et al., 2015] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning optical flow with convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2758–2766.
[Farabet et al., 2012] Farabet, C., Couprie, C., Najman, L., and LeCun, Y. (2012). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929.
[Felzenszwalb and Huttenlocher, 2004] Felzenszwalb, P. F. and Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181.
[Fu et al., 2018] Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. (2018). Deep ordinal regression network for monocular depth estimation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2002–2011.
[Gadde et al., 2016] Gadde, R., Jampani, V., Kiefel, M., Kappler, D., and Gehler, P. V. (2016). Superpixel convolutional networks using bilateral inceptions. In European Conference on Computer Vision, pages 597–613. Springer.
[Godard et al., 2017] Godard, C., Mac Aodha, O., and Brostow, G. J. (2017). Unsupervised monocular depth estimation with left-right consistency. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 270–279.
[Godard et al., 2019] Godard, C., Mac Aodha, O., Firman, M., and Brostow, G. J. (2019). Digging into self-supervised monocular depth estimation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 3828–3838.
[Goodfellow et al., 2014] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27:2672–2680.
[Gould et al., 2009] Gould, S., Fulton, R., and Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In International Conference on Computer Vision, pages 1–8. IEEE.
[Grady, 2006] Grady, L. (2006). Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11):1768–1783.
[Gupta et al., 2013] Gupta, S., Arbelaez, P., and Malik, J. (2013). Perceptual organization and recognition of indoor scenes from RGB-D images. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 564–571.
[Hammond et al., 2011] Hammond, D. K., Vandergheynst, P., and Gribonval, R. (2011). Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150.
[He et al., 2017] He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In International Conference on Computer Vision, pages 2961–2969.
[He et al., 2012] He, K., Sun, J., and Tang, X. (2012). Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6):1397–1409.
[He et al., 2016a] He, K., Zhang, X., Ren, S., and Sun, J. (2016a). Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
[He et al., 2016b] He, K., Zhang, X., Ren, S., and Sun, J. (2016b). Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630– 645. Springer.
[He et al., 2004] He, X., Zemel, R. S., and Carreira-Perpiñán, M. Á. (2004). Multiscale conditional random fields for image labeling. In The IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages II–II. IEEE.
[He et al., 2006] He, X., Zemel, R. S., and Ray, D. (2006). Learning and incorporating top-down cues in image segmentation. In European Conference on Computer Vision, pages 338–351. Springer.
[Hoogeboom et al., 2019] Hoogeboom, E., Berg, R. v. d., and Welling, M. (2019). Emerging convolutions for generative normalizing flows. arXiv preprint arXiv:1901.11137.
[Iizuka et al., 2017] Iizuka, S., Simo-Serra, E., and Ishikawa, H. (2017). Globally and locally consistent image completion. ACM Transactions on Graphics, 36(4):1–14.
[Ilg et al., 2017] Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2017). FlowNet 2.0: Evolution of optical flow estimation with deep networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2462–2470.
[Ioffe and Szegedy, 2015] Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Technical report.
[Isola et al., 2017] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134.
[Jampani et al., 2018] Jampani, V., Sun, D., Liu, M.-Y., Yang, M.-H., and Kautz, J. (2018). Superpixel sampling networks. In European Conference on Computer Vision, pages 352–368. Springer.
[Johnson et al., 2018] Johnson, J., Gupta, A., and Fei-Fei, L. (2018). Image generation from scene graphs. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 1219–1228.
[Kanezaki, 2018] Kanezaki, A. (2018). Unsupervised image segmentation by backpropagation. In International Conference on Acoustics, Speech and Signal Processing, pages 1543–1547. IEEE.
[Kendall et al., 2015] Kendall, A., Badrinarayanan, V., and Cipolla, R. (2015). Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. International Conference on Learning Representations.
[Kipf and Welling, 2016] Kipf, T. N. and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. Technical report.
[Kirillov et al., 2019a] Kirillov, A., Girshick, R., He, K., and Dollár, P. (2019a). Panoptic feature pyramid networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 6399–6408.
[Kirillov et al., 2019b] Kirillov, A., He, K., Girshick, R., Rother, C., and Dollár, P. (2019b). Panoptic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 9404–9413.
[Knyazev et al., 2019] Knyazev, B., Lin, X., Amer, M. R., and Taylor, G. W. (2019). Image classification with hierarchical multigraph networks. Technical report.
[Kohli et al., 2009] Kohli, P., Torr, P. H., et al. (2009). Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82(3):302–324.
[Krähenbühl and Koltun, 2011] Krähenbühl, P. and Koltun, V. (2011). Efficient inference in fully connected CRFs with Gaussian edge potentials. Advances in Neural Information Processing Systems, 24:109–117.
[Krizhevsky et al., 2012] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
[Krizhevsky et al., 2017] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90.
[Kwak et al., 2017] Kwak, S., Hong, S., and Han, B. (2017). Weakly supervised semantic segmentation using superpixel pooling network. In Thirty-First AAAI Conference on Artificial Intelligence.
[Leung and Malik, 2001] Leung, T. and Malik, J. (2001). Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1):29–44.
[Levin et al., 2007] Levin, A., Lischinski, D., and Weiss, Y. (2007). A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):228–242.
[Li and Yu, 2015] Li, G. and Yu, Y. (2015). Visual saliency based on multiscale deep features. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 5455–5463.
[Li et al., 2020] Li, X., You, A., Zhu, Z., Zhao, H., Yang, M., Yang, K., and Tong, Y. (2020). Semantic flow for fast and accurate scene parsing. Technical report.
[Lin et al., 2017] Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). Feature pyramid networks for object detection. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125.
[Liu et al., 2011] Liu, M.-Y., Tuzel, O., Ramalingam, S., and Chellappa, R. (2011). Entropy rate superpixel segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2097–2104. IEEE.
[Liu et al., 2016] Liu, Y.-J., Yu, C.-C., Yu, M.-J., and He, Y. (2016). Manifold SLIC: A fast method to compute content-sensitive superpixels. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 651–659.
[Long et al., 2015] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.
[Matsuo and Aoki, 2015] Matsuo, K. and Aoki, Y. (2015). Depth image enhancement using local tangent plane approximations. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 3574–3583.
[McCallum and Sutton, 2005] McCallum, A. and Sutton, C. (2005). Piecewise training of undirected models. In Conference on Uncertainty in Artificial Intelligence.
[Mester et al., 2011] Mester, R., Conrad, C., and Guevara, A. (2011). Multichannel segmentation using contour relaxation: fast super-pixels and temporal propagation. In Scandinavian Conference on Image Analysis, pages 250–261. Springer.
[Mnih and Hinton, 2010] Mnih, V. and Hinton, G. E. (2010). Learning to detect roads in high-resolution aerial images. In European Conference on Computer Vision, pages 210–223. Springer.
[Mnih and Hinton, 2012] Mnih, V. and Hinton, G. E. (2012). Learning to label aerial images from noisy data. In The International Conference on Machine Learning, pages 567–574.
[Monti et al., 2017] Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model cnns. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 5115–5124.
[Morgan and Bourlard, 1990] Morgan, N. and Bourlard, H. (1990). Generalization and parameter estimation in feedforward nets: Some experiments. In Advances in Neural Information Processing Systems, pages 630–637.
[Nah et al., 2017] Nah, S., Hyun Kim, T., and Mu Lee, K. (2017). Deep multi-scale convolutional neural network for dynamic scene deblurring. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 3883–3891.
[Noh et al., 2015] Noh, H., Hong, S., and Han, B. (2015). Learning deconvolution network for semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 1520–1528.
[Park et al., 2019] Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. (2019). Semantic image synthesis with spatially-adaptive normalization. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2337–2346.
[Paszke et al., 2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.
[Perazzi et al., 2016] Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., and Sorkine-Hornung, A. (2016). A benchmark dataset and evaluation method- ology for video object segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 724–732.
[Pinheiro and Collobert, 2014] Pinheiro, P. and Collobert, R. (2014). Recurrent convo- lutional neural networks for scene labeling. In International Conference on Machine Learning, pages 82–90. PMLR.
[Porter and Duff, 1984] Porter, T. and Duff, T. (1984). Compositing digital images. In The 11th Annual Conference on Computer Graphics and Interactive Techniques, pages 253–259.
[Ren and Malik, 2003] Ren, X. and Malik, J. (2003). Learning a classification model for segmentation. In International Conference on Computer Vision, volume 1, pages 10–17.
[Ronneberger et al., 2015] Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer.
[Rother et al., 2004] Rother, C., Kolmogorov, V., and Blake, A. (2004). "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3):309–314.
[Saeedan et al., 2018] Saeedan, F., Weber, N., Goesele, M., and Roth, S. (2018). Detail-preserving pooling in deep networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 9108–9116.
[Saito et al., 2016] Saito, S., Yamashita, T., and Aoki, Y. (2016). Multiple object extraction from aerial imagery with convolutional neural networks. Electronic Imaging, 2016(10):1–9.
[Sawicki, 2007] Sawicki, M. (2007). Filming the fantastic: a guide to visual effects cinematography. Taylor & Francis.
[Schonfeld et al., 2020] Schonfeld, E., Schiele, B., and Khoreva, A. (2020). A U-Net based discriminator for generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 8207–8216.
[Shi and Malik, 2000] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.
[Shotton et al., 2006] Shotton, J., Winn, J., Rother, C., and Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In European Conference on Computer Vision, pages 1–15. Springer.
[Shotton et al., 2009a] Shotton, J., Winn, J., Rother, C., and Criminisi, A. (2009a). TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81(1):2–23.
[Shotton et al., 2009b] Shotton, J., Winn, J., Rother, C., and Criminisi, A. (2009b). TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81(1):2–23.
[Shuman et al., 2013] Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. (2013). The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98.
[Silberman et al., 2012] Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision, pages 746–760. Springer.
[Strang, 2006] Strang, G. (2006). Linear algebra and its applications. Thomson, Brooks/Cole, Belmont, CA.
[Stutz et al., 2018] Stutz, D., Hermans, A., and Leibe, B. (2018). Superpixels: An evaluation of the state-of-the-art. Computer Vision and Image Understanding, 166:1–27.
[Sun et al., 2018] Sun, D., Yang, X., Liu, M.-Y., and Kautz, J. (2018). PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 8934–8943.
[Sun et al., 2004] Sun, J., Jia, J., Tang, C.-K., and Shum, H.-Y. (2004). Poisson matting. In ACM SIGGRAPH 2004 Papers, pages 315–321.
[Suzuki, 2020] Suzuki, T. (2020). Superpixel segmentation via convolutional neural networks with regularized information maximization. In International Conference on Acoustics, Speech and Signal Processing, pages 2573–2577. IEEE.
[Suzuki, 2021] Suzuki, T. (2021). Implicit integration of superpixel segmentation into fully convolutional networks. arXiv preprint arXiv:2103.03435.
[Suzuki et al., 2018] Suzuki, T., Akizuki, S., Kato, N., and Aoki, Y. (2018). Superpixel convolution for segmentation. In International Conference on Image Processing, pages 3249–3253. IEEE.
[Suzuki and Aoki, 2018] Suzuki, T. and Aoki, Y. (2018). Graph convolutional neural networks on superpixels for segmentation. IEICE Transactions on Information and Systems (Japanese Edition) D, 101(8):1120–1129.
[Suzuki and Aoki, 2020] Suzuki, T. and Aoki, Y. (2020). Unsupervised superpixel segmentation via convolutional neural network. IEICE Transactions on Information and Systems (Japanese Edition) D, 103(10):702–711.
[Takayama et al., 2016] Takayama, S., Suzuki, T., Aoki, Y., Isobe, S., and Masuda, M. (2016). Tracking people in dense crowds using supervoxels. In International Conference on Signal-Image Technology & Internet-Based Systems, pages 532–537. IEEE.
[Tao et al., 2018] Tao, X., Gao, H., Shen, X., Wang, J., and Jia, J. (2018). Scale-recurrent network for deep image deblurring. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 8174–8182.
[Tasli et al., 2013] Tasli, H. E., Cigla, C., Gevers, T., and Alatan, A. A. (2013). Super pixel extraction via convexity induced boundary adaptation. In IEEE International Conference on Multimedia and Expo, pages 1–6. IEEE.
[Torralba et al., 2004] Torralba, A., Murphy, K. P., and Freeman, W. T. (2004). Sharing features: efficient boosting procedures for multiclass object detection. In The IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages II–II. IEEE.
[Tu et al., 2018] Tu, W.-C., Liu, M.-Y., Jampani, V., Sun, D., Chien, S.-Y., Yang, M.-H., and Kautz, J. (2018). Learning superpixels with segmentation-aware affinity loss. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 568–576.
[Uijlings et al., 2013] Uijlings, J. R., Van De Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171.
[Ulyanov et al., 2016] Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2016). Instance normalization: The missing ingredient for fast stylization. Technical report.
[Ulyanov et al., 2018] Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2018). Deep image prior. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 9446–9454.
[Van den Bergh et al., 2012] Van den Bergh, M., Boix, X., Roig, G., de Capitani, B., and Van Gool, L. (2012). SEEDS: Superpixels extracted via energy-driven sampling. In European Conference on Computer Vision, pages 13–26. Springer.
[van der Walt et al., 2014] van der Walt, S., Schönberger, J. L., Nunez-Iglesias, J., Boulogne, F., Warner, J. D., Yager, N., Gouillart, E., Yu, T., and the scikit-image contributors (2014). scikit-image: image processing in Python. PeerJ, 2:e453.
[Vaswani et al., 2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
[Veksler et al., 2010] Veksler, O., Boykov, Y., and Mehrani, P. (2010). Superpixels and supervoxels in an energy optimization framework. In European Conference on Computer Vision, pages 211–224. Springer.
[Voigtlaender et al., 2019] Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., and Leibe, B. (2019). MOTS: Multi-object tracking and segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 7942–7951.
[Von Luxburg, 2007] Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.
[Weikersdorfer et al., 2012] Weikersdorfer, D., Gossow, D., and Beetz, M. (2012). Depth-adaptive superpixels. In International Conference on Pattern Recognition, pages 2087–2090. IEEE.
[Wu and He, 2018] Wu, Y. and He, K. (2018). Group normalization. In European Conference on Computer Vision, pages 3–19.
[Xu et al., 2018] Xu, N., Yang, L., Fan, Y., Yue, D., Liang, Y., Yang, J., and Huang, T. (2018). YouTube-VOS: A large-scale video object segmentation benchmark. arXiv preprint arXiv:1809.03327.
[Yang et al., 2020] Yang, F., Sun, Q., Jin, H., and Zhou, Z. (2020). Superpixel segmentation with fully convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 13964–13973.
[Yao et al., 2015] Yao, J., Boben, M., Fidler, S., and Urtasun, R. (2015). Real-time coarse-to-fine topologically preserving segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2947–2955.
[Yeh et al., 2017] Yeh, R. A., Chen, C., Yian Lim, T., Schwing, A. G., Hasegawa-Johnson, M., and Do, M. N. (2017). Semantic image inpainting with deep generative models. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 5485–5493.
[Yu and Koltun, 2015] Yu, F. and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. Technical report.
[Zhang et al., 2019] Zhang, L., Li, X., Arnab, A., Yang, K., Tong, Y., and Torr, P. H. (2019). Dual graph convolutional network for semantic segmentation. Technical report.
[Zhao et al., 2017] Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 2881–2890.
[Zheng et al., 2015] Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P. H. (2015). Conditional random fields as recurrent neural networks. In International Conference on Computer Vision, pages 1529–1537.
[Zhi et al., 2019] Zhi, S., Bloesch, M., Leutenegger, S., and Davison, A. J. (2019). SceneCode: Monocular dense semantic reconstruction using learned encoded scene representations. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 11776–11785.
[Zhou et al., 2016] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., and Torralba, A. (2016). Semantic understanding of scenes through the ADE20K dataset. Technical report.
[Zhu et al., 2019] Zhu, X., Hu, H., Lin, S., and Dai, J. (2019). Deformable ConvNets v2: More deformable, better results. In The IEEE Conference on Computer Vision and Pattern Recognition, pages 9308–9316.