
A Research on Enhancing Reconstructed Frames in Video Codecs

PHAM DO Kim Chi, 法政大学. DOI: 10.15002/00025871

2022.12.12

Abstract

To relieve the burden on video storage, streaming, and other video services, researchers in the video coding community have developed a series of video coding standards that compress videos at higher quality and lower bitrates. While traditional coding algorithms have been continuously enhanced, deep learning-based video coding has received significant attention in recent years. This thesis proposes several deep learning methods to improve the performance of existing video codecs. The thesis consists of the following chapters.

Chapter 1 [Introduction] provides a comprehensive review of video coding, followed by an analysis of the drawbacks of state-of-the-art methods.

Chapter 2 [Enhancing reference-frame interpolation for video encoder] presents a deep learning-based fractional interpolation method for generating the half-pixel and quarter-pixel samples of the reference frame. Motion-compensated prediction is one of the essential methods for reducing temporal redundancy in inter coding; its goal is to predict the current frame from a list of reference frames. Recent video coding standards commonly use interpolation filters to obtain the sub-pixel samples of the best-matching block located at a fractional position in the reference frame. However, these fixed filters are not flexible enough to adapt to the variety of natural video content. This chapter describes a novel CNN-based fractional interpolation for motion-compensated prediction that improves coding efficiency. The proposed interpolation filters are designed to replace the hand-crafted interpolation filters of video coding standards, extending their ability to handle diverse video content. Only one model is trained for all fractional positions, giving the flexibility to support other video coding standards with minimal modification. Moreover, two syntax elements indicating the interpolation methods for the luminance and chrominance components are added to the bin string and compressed by the entropy encoder. As a result, the proposal achieves 2.9%, 0.3%, and 0.6% BD-rate reductions for the Y, U, and V components, respectively, under the low-delay P configuration.
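The abstract does not give the network architecture, so the following is only a minimal PyTorch sketch of the idea: a single small CNN takes an integer-pel reference block plus the fractional offset as extra input planes and predicts the sub-pel samples that a fixed interpolation filter would otherwise produce. The layer sizes, the position encoding, and the residual formulation here are assumptions for illustration, not the thesis design.

```python
# Minimal sketch (not the author's implementation): one CNN that, given an
# integer-pel reference block and a fractional offset (dx, dy) in quarter-pel
# units, predicts the corresponding sub-pel samples.
import torch
import torch.nn as nn

class FractionalInterpCNN(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # The fractional position enters as two extra input planes, so one
        # model covers all half- and quarter-pel positions.
        self.body = nn.Sequential(
            nn.Conv2d(1 + 2, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, ref_block, dx, dy):
        # ref_block: (N, 1, H, W) integer-pel luma samples scaled to [0, 1]
        n, _, h, w = ref_block.shape
        pos = torch.stack([
            torch.full((n, h, w), dx / 4.0),
            torch.full((n, h, w), dy / 4.0),
        ], dim=1).to(ref_block.device)
        # Residual learning: predict a correction to the integer-pel block.
        return ref_block + self.body(torch.cat([ref_block, pos], dim=1))

# Usage: predict the horizontal half-pel position (dx=2, dy=0) of a 64x64 block.
block = torch.rand(1, 1, 64, 64)
sub_pel = FractionalInterpCNN()(block, dx=2, dy=0)
print(sub_pel.shape)  # torch.Size([1, 1, 64, 64])
```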

Chapter 3 [Compressive sensing image enhancement at video decoder] describes a deep learning-based compressive sensing (CS) enhancement technique that uses multiple reconstructed images to enhance decoded images. Unlike other image compression standards, CS can yield different reconstructed images by applying different reconstruction algorithms to the same coded data. Exploiting this property, this work is the first to propose a deep learning-based CS image enhancement framework that uses multiple reconstructed signals. First, images are reconstructed by different CS reconstruction algorithms. Second, the reconstructed images are assessed and sorted by a no-reference quality assessment module before being fed to the quality enhancement module in order of quality score. Finally, a multiple-input recurrent dense residual network is designed to exploit and enrich the useful information in the reconstructed images. Experimental results show that the proposal obtains a 1.88–8.07 dB PSNR improvement, whereas state-of-the-art works achieve a 1.69–6.69 dB PSNR improvement, at sampling rates from 0.125 to 0.75.
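The quality assessment module and the multiple-input recurrent dense residual network are not specified in this abstract, so the sketch below only illustrates the pipeline order (multiple reconstructions, then no-reference ranking, then joint enhancement). Every learned component is replaced by a hypothetical stand-in.

```python
# Minimal sketch (placeholder modules, not the thesis networks): decode one CS
# measurement with several reconstruction algorithms, rank the candidates with
# a no-reference quality score, then hand them to an enhancement model in
# order of decreasing quality.
import numpy as np

def reconstruct_all(y, A, algorithms):
    """Run every reconstruction algorithm on the same coded data."""
    return [algo(y, A) for algo in algorithms]

def sort_by_quality(images, quality_score):
    """Order candidate reconstructions by a no-reference quality score."""
    return sorted(images, key=quality_score, reverse=True)

# --- Hypothetical stand-ins for the thesis components ----------------------
def recon_backproj(y, A):          # crude back-projection (stands in for OMP)
    return np.clip(A.T @ y, 0, 1)

def recon_pinv(y, A):              # least-squares stand-in for an l1 solver
    return np.clip(np.linalg.pinv(A) @ y, 0, 1)

def fake_quality_score(img):       # stand-in for the NR quality module
    return -np.var(img)

def enhance(sorted_images):        # stand-in for the multi-input network
    return np.mean(sorted_images, axis=0)

# Usage: 25% sampling rate on a flattened 32x32 patch.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 1024)) / 16.0   # sensing matrix
x = rng.random(1024)                          # original signal
y = A @ x                                     # CS measurements
candidates = reconstruct_all(y, A, [recon_backproj, recon_pinv])
restored = enhance(sort_by_quality(candidates, fake_quality_score))
print(restored.shape)  # (1024,)
```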

Chapter 4 [In-loop filtering image enhancement for video encoder-decoder] presents a deep learning-based in-loop filtering framework for the latest video coding standard, Versatile Video Coding (VVC). Existing deep learning-based VVC in-loop filtering enhancements mainly focus on learning a one-to-one mapping between the reconstructed and original video frames, ignoring resources that are already available at the encoder and decoder. This work proposes deep learning-based Spatial-Temporal In-Loop Filtering (STILF), which takes advantage of coding information to improve VVC in-loop filtering. Three filtering modes are available: the VVC default in-loop filtering, a self-enhancement convolutional neural network guided by the coding unit map, and a reference-based enhancement CNN guided by optical flow. To further improve coding efficiency, this work proposes a reinforcement learning-based autonomous mode selection (AMS) approach, in which an agent is trained to predict the splitting and filtering mode of each coding unit. By predicting the filtering mode and allowing coding units to be split further, STILF-AMS requires zero extra bits while preserving the quality of the reconstructed images. As a result, this work outperforms the latest VVC standard and state-of-the-art deep learning-based in-loop filtering algorithms, achieving up to 18% and an average of 5.9% bitrate savings.
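As a rough illustration of the STILF-AMS control flow only, the sketch below dispatches each block of a reconstructed frame to one of the three filtering modes according to a placeholder policy. The real agent, the CNNs, and the optical-flow alignment from the thesis are not reproduced; every helper here is a hypothetical stand-in.

```python
# Minimal sketch (hypothetical helpers, not the thesis implementation):
# per-block dispatch between the three STILF filtering modes, with the mode
# chosen by a policy ("agent") from coding information instead of being
# signalled in the bitstream.
import numpy as np

MODES = ("vvc_default", "self_cnn", "reference_cnn")

def filter_block(block, mode, cu_map, ref_block):
    """Apply one of the three filtering modes to a reconstructed block."""
    if mode == "vvc_default":
        return block                                  # keep DBF/SAO/ALF output
    if mode == "self_cnn":
        return self_enhance_cnn(block, cu_map)        # CNN + coding-unit map
    return reference_enhance_cnn(block, ref_block)    # CNN + optical-flow ref

# --- Stand-ins for the learned components ----------------------------------
def self_enhance_cnn(block, cu_map):
    return 0.9 * block + 0.1 * cu_map                 # placeholder
def reference_enhance_cnn(block, ref_block):
    return 0.5 * (block + ref_block)                  # placeholder
def agent_select_mode(block, cu_map, ref_block):
    # Placeholder policy: the thesis trains an RL agent to predict the
    # splitting and filtering decision from coding information.
    return MODES[int(np.mean(cu_map) > 0.5)]

# Usage over a frame split into 128x128 blocks.
frame = np.random.rand(256, 256)
cu_map = np.random.rand(256, 256)
ref = np.random.rand(256, 256)
out = np.empty_like(frame)
for y in range(0, 256, 128):
    for x in range(0, 256, 128):
        sl = (slice(y, y + 128), slice(x, x + 128))
        mode = agent_select_mode(frame[sl], cu_map[sl], ref[sl])
        out[sl] = filter_block(frame[sl], mode, cu_map[sl], ref[sl])
print(out.shape)  # (256, 256)
```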

Chapter 5 [Conclusion] summarizes the contributions of this dissertation.

As described above, this dissertation proposes several deep learning-based techniques for enhancing reconstructed frames in video encoders and decoders, greatly improving the compression ratio and video quality.
