[1] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, “Selective search for object recognition,” International Journal of Computer Vision (IJCV), vol. 104, no. 2, pp. 154–171, 2013.
[2] NTT Communications, “Takumi eye,” press release, July 2017. https://www.ntt.com/content/dam/nttcom/hq/jp/about-us/press-releases/pdf/2017/0712.pdf
[3] P. P. Ray, “Internet of Robotic Things: Concept, Technologies, and Challenges,” IEEE Access, vol. 4, pp. 9489–9500, 2016.
[4] H. Choi and I. V. Bajic, “Deep Feature Compression for Collaborative Object Detection,” IEEE International Conference on Image Processing (ICIP), Oct. 2018.
[5] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[6] I. Khokhlov, E. Davydenko, I. Osokin, I. Ryakin, A. Babaev, V. Litvinenko, and R. Gorbachev, “Tiny-YOLO object detection supplemented with geometrical data,” IEEE 91st Vehicular Technology Conference (VTC2020-Spring), 2020.
[7] S. P. Chinchali, E. Cidon, E. Pergament, T. Chu, and S. Katti, “Neural networks meet physical networks: Distributed inference between edge devices and the cloud,” in Proc. ACM Workshop on Hot Topics in Networks, pp. 50–56, ACM, 2018.
[8] P. Mach and Z. Becvar, “Mobile edge computing: A survey on architecture and computation offloading,” IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1628–1656, 2017.
[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” CVPR, vol. 1, pp. 886–893, 2005.
[10] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[11] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, “Support vector machines,” IEEE Intelligent Systems and Their Applications, vol. 13, no. 4, pp. 18–28, 1998.
[12] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[13] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 32 (9) (2010) 1627–1645.
[14] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes (VOC) Challenge,” IJCV, 2010.
[15] J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: a statistical view of boosting,” Annals of Statistics, 28(2):337–407, 2000.
[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014, pp. 580–587
[17] R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision (ICCV), 2015.
[18] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” NIPS, 2015.
[19] J.R. Uijlings, K.E. van de Sande, T. Gevers, and A. Smeulders, “Selective search for object recognition,” IJCV, vol. 104, no. 2, pp. 154–171, 2013.
[20] C. Zitnick and P. Dollar, “Edge boxes: Locating object proposals from edges,” ECCV, 2014, pp. 391–405.
[21] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “Imagenet classification with deep convolutional neural networks,” NIPS, 2012, pp. 1097–1105.
[22] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 37, no. 9, pp. 1904–1916, 2015.
[23] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” CVPR, 2016.
[24] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed, “SSD: Single shot multibox detector,” CoRR, abs/1512.02325, 2016.
[25] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263–7271, 2017.
[26] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, “Microsoft coco: Common objects in context,” European conference on computer vision, pages 740–755. Springer, 2014.
[27] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[28] A. Bochkovskiy, C. Wang, and H. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.
[29] C. Wang, H. Liao, I. Yeh, Y. Wu, P. Chen, and J. Hsieh, “CSPNet: A new backbone that can enhance learning capability of CNN,” CVPR Workshop, 2020.
[30] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[31] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, “Improving Object Detection with One Line of Code,” arXiv e-prints, Apr. 2017.
[32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
[33] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “BranchyNet: Fast inference via early exiting from deep neural networks,” 23rd International Conference on Pattern Recognition (ICPR), pp. 2464–2469, 2016.
[34] J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and F. Li, “ImageNet large scale visual recognition challenge 2012 (ILSVRC2012),” 2012.
[35] M. Everingham and J. Winn, “The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Development Kit,” Pattern Analysis, Statistical Modelling and Computational Learning, Tech. Rep., 2011.
[36] L. Hu, T. Wang, H. Watanabe, S. Enomoto, X. Shi, A. Sakamoto, and T. Eda, “ECNet: A Fast, Accurate, and Lightweight Edge-Cloud Network System Based on Cascading Structure,” IEEE Global Conference on Consumer Electronics (GCCE), pp. 259–262, Sep. 2020.