
Analysis of Coordination Structures of Partially Observing Cooperative Agents by Multi-Agent Deep Q-Learning

Smith Ken, Waseda University

2021.03.15

Abstract

We compare the coordination structures of agents using different types of inputs for their deep Q-networks (DQNs) by having agents play a distributed task execution game. The efficiency and performance of many multi-agent systems can be significantly affected by the coordination structures formed by agents. One important factor that may affect these structures is the information provided to an agent's DQN. In this study, we analyze the differences in coordination structures in an environment involving walls that obstruct visibility and movement. Additionally, we introduce a new DQN input, which performs better than past inputs in a dynamic setting. Experimental results show that agents whose DQN inputs include their absolute locations exhibit a fine-grained division of labor in some settings, and that the consistency of agents' starting locations significantly affects their coordination structures and performance.
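The abstract's central comparison is between DQN input variants that do and do not include an agent's absolute location, under partial observability imposed by walls. Since the exact encoding is not given in this summary, the short Python sketch below is only an assumed illustration of how such an input vector might be assembled for a grid-world task execution game; the function name, the 5x5 visible window, and the grid dimensions are hypothetical.

import numpy as np

def build_observation(local_view, agent_pos, grid_size, include_absolute=True):
    # local_view: 2-D array of the cells the agent can currently see
    # (partial observability; walls block sight and movement).
    # agent_pos / grid_size: the agent's (row, col) and the full grid shape.
    # include_absolute: toggles the input variant the abstract compares,
    # i.e. whether the agent's absolute location is part of the DQN input.
    features = [local_view.astype(np.float32).ravel()]
    if include_absolute:
        # Normalize coordinates to [0, 1] so they share the scale of
        # the local-view features.
        features.append(np.array(agent_pos, dtype=np.float32)
                        / np.array(grid_size, dtype=np.float32))
    return np.concatenate(features)

# Example: a 5x5 visible window around an agent at (3, 7) in a 20x20 grid.
obs = build_observation(np.zeros((5, 5)), (3, 7), (20, 20))
print(obs.shape)  # (27,): 25 local cells + 2 normalized coordinates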
