A Study on Person Re-identification Using Spatio-temporal Features

廣井優姫 (Y. Hiroi), Waseda University

2021.03.15

Abstract

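1.1 Research Background
In recent years, growing awareness of crime prevention and the spread of surveillance cameras have made it possible to collect enormous amounts of video data, but processing this data by hand is time-consuming and labor-intensive. Machine learning is therefore needed to process it quickly and accurately. In particular, person re-identification (re-ID), which identifies the same person across multiple cameras with non-overlapping fields of view, has a wide range of uses such as crime prevention and marketing, and is a research area attracting considerable attention.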

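Person re-identification techniques fall broadly into two categories. The first is image-based re-ID, which extracts features from image information. Because it relies on image information alone, it struggles to distinguish different people wearing similar clothing and to match the same person seen from different orientations. The second, video-based re-ID, can extract spatio-temporal features from both image and temporal information, so it can identify a person not only by appearance but also by motion such as gait. For this reason, video-based re-ID has been studied more actively in recent years. However, video-based re-ID still has open problems. For example, because it identifies people by their motion, it is easily affected by specific actions other than walking (taking something out of a pocket, making a call on a mobile phone, and so on). In addition, because the input must be a video or a temporally consecutive sequence of images, the number of frames in the input video data varies; that is, the length of time a person appears on camera differs from sequence to sequence.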
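As a minimal illustration of the frame-count problem described above, the following sketch averages per-frame feature vectors into one fixed-length clip-level vector (temporal average pooling). It is not the method proposed in this thesis. The function name `temporal_average_pool` and the toy two-dimensional features are hypothetical, and a real system would obtain the per-frame features from a CNN.

```python
from typing import List, Sequence


def temporal_average_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into a single clip-level vector."""
    if not frame_features:
        raise ValueError("a clip needs at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must share one dimension")
    n = len(frame_features)
    # Mean over the time axis, computed independently for each feature dimension.
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


# Two hypothetical clips with different frame counts (2 frames and 4 frames).
short_clip = [[1.0, 2.0], [3.0, 4.0]]
long_clip = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [2.0, 3.0]]

pooled_short = temporal_average_pool(short_clip)
pooled_long = temporal_average_pool(long_clip)
```

Both clips map to a vector of the same length regardless of how many frames they contain, so they can be compared with an ordinary distance measure. Averaging discards the order of the frames, which is one reason other temporal modeling approaches are also studied.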

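1.2 Research Objective
This study focuses on video-based re-ID. Using spatio-temporal features and the proposed methods, it aims to mitigate two existing problems, namely variation in a person's appearance caused by clothing, carried items, and actions, and variation in the number of frames in the input video data, and thereby to improve accuracy over the conventional method [1].
Among person re-identification methods, this thesis describes person re-identification across multiple cameras with non-overlapping fields of view using convolutional neural networks (hereafter, CNNs).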

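1.3 Structure of This Thesis
This thesis consists of the following six chapters.
Chapter 1 presents the background and objective of this study.
Chapter 2 covers related work: CNNs, which form the basis of this study; the temporal modeling methods Temporal Pooling, 3D CNN, and Convolutional LSTM; and stripe-partitioned features.
Chapter 3 describes the proposed methods: the Two-stream Feature-fusion Architecture, Shifting-subclip, and Hard Positive Mining.
Chapter 4 describes the datasets used in the experiments, the experiments themselves, and the experimental conditions.
Chapter 5 presents the experimental results and discusses what can be inferred from them.
Chapter 6 summarizes the thesis and discusses future work.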
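The abstract does not define Hard Positive Mining, so the following is only a sketch of the general idea under a stated assumption: for an anchor sample, the "hard positive" is taken to be the sample of the same identity that lies farthest away in feature space, as in batch-hard mining for the triplet loss [27]. The thesis's own procedure may differ. The names `euclidean` and `hardest_positive` and the toy data are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_positive(anchor_idx: int,
                     features: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> int:
    """Return the index of the same-identity sample farthest from the anchor."""
    candidates = [i for i, label in enumerate(labels)
                  if label == labels[anchor_idx] and i != anchor_idx]
    if not candidates:
        raise ValueError("anchor has no other sample with the same identity")
    return max(candidates, key=lambda i: euclidean(features[anchor_idx], features[i]))


# Hypothetical mini-batch: three samples of person "A" and one of person "B".
features = [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [5.0, 5.0]]
labels = ["A", "A", "A", "B"]

hard_idx = hardest_positive(0, features, labels)
```

Here sample 2 is selected for anchor 0, because it shares the anchor's identity yet is the most distant of the positives. Training on such pairs pushes the model to pull together views of one person that currently look dissimilar.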

References

[1] J. Gao and R. Nevatia, “Revisiting Temporal Modeling for Video-based Person ReID”, arXiv preprint arXiv:1805.02104, 2018.

[2] D. H. Hubel and T. N. Wiesel, “Receptive Fields and Functional Architecture of Monkey Striate Cortex”, The Journal of Physiology, vol.195, pp.215-243, 1968.

[3] Y. A. LeCun, L. Bottou, G. B. Orr and K.-R. Müller, “Efficient Backprop”, Neural Networks: Tricks of the Trade: Second Edition, pp.9-48, 2012.

[4] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines”, in Proceedings of the International Conference on Machine Learning (ICML), pp.807-814, 2010.

[5] T. Wang, D. J. Wu, A. Coates and A. Y. Ng, “End-to-End Text Recognition with Convolutional Neural Networks”, in Proceedings of the International Conference on Pattern Recognition (ICPR), pp.3304-3308, 2012.

[6] Y. Boureau, J. Ponce and Y. LeCun, “A Theoretical Analysis of Feature Pooling in Visual Recognition”, in Proceedings of the International Conference on Machine Learning (ICML), pp.111-118, 2010.

[7] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei, “Imagenet Large Scale Visual Recognition Challenge”, International Journal of Computer Vision (IJCV), vol.115, pp.211-252, 2015.

[8] K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

[9] Y. Liu, J. Yan and W. Ouyang, “Quality Aware Network for Set to Set Recognition”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4694-4703, 2017.

[10] Z. Zhou, Y. Huang, W. Wang, L. Wang and T. Tan, “See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-based Person Re-identification”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6776-6785, 2017.

[11] N. McLaughlin, J. Martinez del Rincon and P. Miller, “Recurrent Convolutional Network for Video-based Person Re-identification”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1325-1334, 2016.

[12] Y. Yan, B. Ni, Z. Song, C. Ma, Y. Yan and X. Yang, “Person Re-identification via Recurrent Feature Aggregation”, in Proceedings of the European Conference on Computer Vision (ECCV), pp.701-716, 2016.

[13] L. Zheng, Z. Bie, Y. Sun and J. Wang, “MARS: A Video Benchmark for Large-Scale Person Re-Identification”, in Proceedings of the European Conference on Computer Vision (ECCV), pp.868-885, 2016.

[14] K. Hara, H. Kataoka and Y. Satoh, “Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6546-6555, 2018.

[15] K. He, X. Zhang, S. Ren and J. Sun, “Identity Mappings in Deep Residual Networks”, in Proceedings of the European Conference on Computer Vision (ECCV), pp.630-645, 2016.

[16] S. Zagoruyko and N. Komodakis, “Wide Residual Networks”, in Proceedings of the British Machine Vision Conference, pp.87.1-87.12, 2016.

[17] S. Xie, R. Girshick, P. Dollár, Z. Tu and K. He, “Aggregated Residual Transformations for Deep Neural Networks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1492-1500, 2017.

[18] G. Huang, Z. Liu, L. van der Maaten and K. Q. Weinberger, “Densely Connected Convolutional Networks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4700-4708, 2017.

[19] X. Shi, Z. Chen, H. Wang and D. Yeung, “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, in Proceedings of the 28th International Conference on Neural Information Processing Systems, vol.1, pp.802-810, 2015.

[20] A. Graves, “Generating Sequences with Recurrent Neural Networks”, arXiv preprint arXiv:1308.0850, 2013.

[21] 渡邊滉大, 亀山渉 and 澁谷直大, “Person Identification across Cameras with Non-overlapping Fields of View Using CNN” (in Japanese), IEICE General Conference 2018, D-12-69, Mar. 2018.

[22] Y. Hiroi and W. Kameyama, “Applying Hard Positive Mining and its Evaluation for Person Re-identification”, IEICE Communications Express, vol.9, no.12, pp.622-626, 2020.

[23] T. Wang, S. Gong, X. Zhu and S. Wang, “Person Re-identification by Video Ranking”, in Proceedings of the European Conference on Computer Vision (ECCV), pp.688-703, 2014.

[24] M. Hirzer, C. Beleznai, P. M. Roth and H. Bischof, “Person Re-identification by Descriptive and Discriminative Classification”, in Proceedings of the Scandinavian Conference on Image Analysis (SCIA), pp.91-102, 2011.

[25] Z. Cao, T. Simon, S. Wei and Y. Sheikh, “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1302-1310, 2017.

[26] Z. Zhong, L. Zheng, D. Cao and S. Li, “Re-ranking Person Re-identification with k-reciprocal Encoding”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1318-1327, 2017.

[27] A. Hermans, L. Beyer and B. Leibe, “In Defense of the Triplet Loss for Person Re-Identification”, arXiv preprint arXiv:1703.07737, 2017.

[28] Y. Hiroi and W. Kameyama, “Person Re-identification by Two-stream Feature-fusion Architecture Utilizing a Partial Body Image”, in Proceedings of the IEEE 9th Global Conference on Consumer Electronics, pp.531-532, 2020.
