Safety-Aware Visual Navigation of Autonomous Mobile Robots via Model-Based Reinforcement Learning (Full Text)

石原 悠 (Keio University)

2022.09.05

Abstract

Toward robots that develop autonomously and take over human labor, this thesis proposes a method by which a robot that executes tasks specified through images learns on its own how to accomplish those tasks from data obtained by acting in its environment. Assuming operation in real environments, safety is defined as not colliding with static obstacles around the robot, and a framework is proposed in which the robot's safety does not depend on learned parameters. The problem is formulated as an optimal control problem with collision avoidance against static obstacles as a constraint; the robot learns from data collected in the environment and generates its actions by solving this problem. Concretely, multiple safety-aware action sequences are sampled by performing path planning on an environment map, and the future state that would be reached by following each sampled sequence is predicted as an image from the current state. The predicted images are then evaluated by how closely they approach the instructed goal image, and the action sequence judged to come closest is executed; repeating this cycle accomplishes the task. To realize the proposed method, the thesis also designs, and proposes learning methods for, the image prediction model and evaluation metrics that are effective for task accomplishment. The proposed method is applied to a mobile-robot navigation task, in which safety considerations are indispensable, and its effectiveness is verified through simulations and experiments on a real robot.
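The action-generation cycle described above amounts to a sampling-based, receding-horizon procedure. The following minimal Python sketch illustrates only that flow; every name in it (sample_safe_action_sequences, predict_future_image, goal_score, num_samples) is a hypothetical placeholder, not the implementation used in the thesis.

def visual_mpc_step(sample_safe_action_sequences, predict_future_image, goal_score,
                    occupancy_map, robot_pose, current_image, goal_image,
                    num_samples=32):
    """Pick one safe action sequence by predicting and scoring future images.

    The three callables are injected placeholders: a map-based planner that
    returns collision-free action sequences, a learned image prediction model,
    and a metric comparing a predicted image with the goal image.
    """
    # 1. Sample collision-free action sequences by path planning on the map,
    #    so safety does not depend on any learned parameters.
    candidates = sample_safe_action_sequences(occupancy_map, robot_pose, num_samples)

    best_actions, best_score = None, float("-inf")
    for actions in candidates:
        # 2. Predict the image the robot would observe after following this sequence.
        predicted_image = predict_future_image(current_image, actions)
        # 3. Score how close the prediction is to the instructed goal image.
        score = goal_score(predicted_image, goal_image)
        if score > best_score:
            best_actions, best_score = actions, score

    # 4. Execute the best sequence, then re-plan from the new observation.
    return best_actions

Because the candidate sequences come from the map-based planner rather than from the learned model, collision avoidance remains independent of the learned parameters, as the abstract states.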

Chapter 1 describes the background and objectives of this thesis, positions the work, and outlines the structure of the thesis.

Chapter 2 formulates the robot's action-decision problem as a constrained optimal control problem in a Markov decision process, and shows that the algorithm proposed in this thesis is a form of model-based reinforcement learning.
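One plausible way to write such a constrained formulation is sketched below; the notation is assumed for illustration and may differ from the thesis. Let $s_t$ and $a_t$ denote the state and action at step $t$, $s_g$ the goal state given as an image, $p$ the state-transition model, and $\mathcal{S}_{\mathrm{col}}$ the set of states in collision with static obstacles:

\[
\max_{a_{0:T-1}} \; \mathbb{E}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t; s_g)\right]
\quad \text{s.t.} \quad
s_{t+1} \sim p(\,\cdot \mid s_t, a_t), \qquad
s_t \notin \mathcal{S}_{\mathrm{col}} \;\; (t = 0, \dots, T).
\]

Here the transition model is learned from data collected in the environment, while, per the abstract, the collision constraint is handled by sampling action sequences from a map-based planner so that safety does not depend on learned parameters.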

Chapter 3 designs a state-transition model over images and experimentally examines how the input image given to the model affects its predictions. It shows that an omnidirectional camera observing the full 360-degree surroundings is more effective for prediction than the front-facing camera that has conventionally been widely used.

Chapter 4 presents the implementation of the action-generation algorithm based on image prediction and action evaluation. Because actions are evaluated by the number of time steps required to reach the goal, the chapter describes the design and training of a model that estimates this quantity, and reports the validation of the proposed method through multiple simulation and real-robot experiments, together with the remaining issues.
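As a rough illustration of the time-step-based action evaluation described here, the sketch below regresses the number of steps between a current image and a goal image; the architecture, input format, and loss are assumptions made for illustration only, not the estimation model designed in the thesis.

import torch
import torch.nn as nn

class StepCountEstimator(nn.Module):
    """Illustrative regressor: predicts how many time steps separate the
    current observation from the goal observation (hypothetical design)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)

    def forward(self, current_image, goal_image):
        # Stack the two RGB images along the channel axis (3 + 3 = 6 channels).
        x = torch.cat([current_image, goal_image], dim=1)
        return self.head(self.encoder(x)).squeeze(-1)

# Training could use image pairs whose separation in collected trajectories is
# known, with a plain regression loss (again, an assumption):
#   loss = nn.functional.mse_loss(model(img_t, img_goal), true_step_count)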

Chapter 5 addresses the problem that task execution fails when the initial state is far from the goal state. Instead of evaluating actions by the number of time steps, it proposes an evaluation metric called task achievability, defined as the probability of reaching the goal state from the current state within a desired number of time steps, together with a model that predicts it, and demonstrates its effectiveness within the action-generation algorithm.
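Read literally, task achievability is a reachability probability. One illustrative way to write it, with notation assumed here rather than taken from the thesis, is

\[
A_H(s_t, s_g) \;=\; \Pr\!\left(\min\{\,k \ge 0 : s_{t+k} = s_g\,\} \le H \;\middle|\; s_t\right),
\]

i.e., the probability that the goal state $s_g$ is reached from the current state $s_t$ within the desired horizon of $H$ time steps; candidate action sequences are then ranked by a learned predictor of this quantity rather than by an estimated step count.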

Chapter 6 presents the conclusions, summarizing the key contributions of the obtained results and the outlook for future work.
