[1] A. Farinelli, L. Iocchi, and D. Nardi. Multirobot systems: a classification focused on coordination. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), Vol. 34, No. 5, pp. 2015–2028, 2004.
[2] Tucker Balch and Lynne E Parker. Robot teams: from diversity to polymorphism. CRC Press, 2002.
[3] Tucker Balch, Gary Boone, Thomas Collins, Harold Forbes, Doug MacKenzie, and Juan Carlos Santamaría. Io, Ganymede, and Callisto: a multiagent robot trash-collecting team. AI Magazine, Vol. 16, No. 2, p. 39, 1995.
[4] Mark d’Inverno, David Kinny, Michael Luck, and Michael Wooldridge. A formal specification of dMARS. In Munindar P. Singh, Anand Rao, and Michael J. Wooldridge, editors, Intelligent Agents IV: Agent Theories, Architectures, and Languages, pp. 155–176, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg.
[5] Barbara Dunin-Kęplicz and Rineke Verbrugge. Teamwork in multi-agent systems: A formal approach, Vol. 21. John Wiley & Sons, 2011.
[6] Wojciech Jamroga and Thomas Ågotnes. Constructive knowledge: what agents can achieve under imperfect information. Journal of Applied Non-Classical Logics, Vol. 17, No. 4, pp. 423–475, 2007.
[7] C. M. Macal and M. J. North. Tutorial on agent-based modeling and simulation. In Proceedings of the Winter Simulation Conference, 2005, 14 pp., 2005.
[8] Katia P. Sycara. Multiagent systems. AI Magazine, Vol. 19, No. 2, p. 79, Jun. 1998.
[9] Edmund H Durfee and Jeffrey S Rosenschein. Distributed problem solving and multi-agent systems: Comparisons and examples. In Proceedings of the Thirteenth International Distributed Artificial Intelligence Workshop, pp. 94–104, 1994.
[10] Jacques Ferber, Olivier Gutknecht, and Fabien Michel. From agents to organizations: An organizational view of multi-agent systems. In International Workshop on Agent-Oriented Software Engineering, pp. 214–230, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
[11] Lin Padgham and Michael Winikoff. Developing intelligent agent systems: A practical guide, Vol. 13. John Wiley & Sons, 2005.
[12] Bryan Horling and Victor Lesser. A survey of multi-agent organizational paradigms. Knowl. Eng. Rev., Vol. 19, No. 4, pp. 281–316, December 2004.
[13] Kumpati S Narendra and Mandayam AL Thathachar. Learning automata: an introduction. Courier Corporation, 2012.
[14] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
[15] Karl Tuyls and Simon Parsons. What evolutionary game theory tells us about multiagent learning. Artificial Intelligence, Vol. 171, No. 7, pp. 406–416, 2007.
[16] Spiros Kapetanakis and Daniel Kudenko. Reinforcement learning of coordination in cooperative multi-agent systems. AAAI/IAAI, Vol. 2002, pp. 326–331, 2002.
[17] Pieter Jan ’t Hoen, Karl Tuyls, Liviu Panait, Sean Luke, and J. A. La Poutré. An overview of cooperative and competitive multiagent learning. In Karl Tuyls, Pieter Jan ’t Hoen, Katja Verbeeck, and Sandip Sen, editors, Learning and Adaption in Multi-Agent Systems, pp. 1–46, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[18] Gerald Tesauro. Practical issues in temporal difference learning. Mach. Learn., Vol. 8, No. 3–4, pp. 257–277, May 1992.
[19] M. J. Wooldridge and N. R. Jennings. Intelligent agents: Theory and practice. The Knowledge Engineering Review, Vol. 10, No. 2, pp. 115–152, 1995.
[20] Hans Weigand and Virginia Dignum. I am autonomous, you are autonomous. In Matthias Nickles, Michael Rovatsos, and Gerhard Weiss, editors, Agents and Computational Autonomy, pp. 227–236, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
[21] Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
[22] Robert Duncan. What is the right organization structure? Decision tree analysis provides the answer. Organizational Dynamics, Vol. 7, No. 3, pp. 59–80, 1979.
[23] Henry Mintzberg. Structure in fives: Designing effective organizations. Prentice-Hall, Inc, 1993.
[24] R. G. Smith and R. Davis. Frameworks for cooperation in distributed problem solving. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 11, No. 1, pp. 61–70, 1981.
[25] Warren B Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, Vol. 703. John Wiley & Sons, 2007.
[26] Thomas G Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, Vol. 13, pp. 227–303, 2000.
[27] Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Comparing evolutionary and temporal difference methods in a reinforcement learning domain. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO ’06, pp. 1321–1328, New York, NY, USA, 2006. Association for Computing Machinery.
[28] J. M. Vidal and E. H. Durfee. The moving target function problem in multi-agent learning. In Proceedings International Conference on Multi Agent Systems (Cat. No. 98EX160), pp. 317–324, 1998.
[29] Katja Verbeeck, Ann Nowé, and Karl Tuyls. Coordinated exploration in multi-agent reinforcement learning: An application to load-balancing. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS ’05, pp. 1105–1106, New York, NY, USA, 2005. Association for Computing Machinery.
[30] Mary McGlohon and Sandip Sen. Learning to cooperate in multi-agent systems by combining q-learning and evolutionary strategy. International Journal on Lateral Computing, Vol. 1, No. 2, pp. 58–64, 2005.
[31] Robert H Crites and Andrew G Barto. Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems, pp. 1017–1023, 1996.
[32] Kagan Tumer and Adrian Agogino. Distributed agent-based air traffic flow management. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS ’07, New York, NY, USA, 2007. Association for Computing Machinery.
[33] A. Agogino and K. Tumer. Efficient evaluation functions for evolving coordination. Evolutionary Computation, Vol. 16, No. 2, pp. 257–288, 2008. PMID: 18554102.
[34] David H. Wolpert and Kagan Tumer. Optimal Payoff Functions for Members of Collectives, pp. 355–369. World Scientific, 2002.
[35] Adrian K Agogino and Kagan Tumer. Analyzing and visualizing multiagent rewards in dynamic and stochastic domains. Autonomous Agents and Multi-Agent Systems, Vol. 17, No. 2, pp. 320–338, 2008.
[36] Yoav Shoham, Rob Powers, and Trond Grenager. Multi-agent reinforcement learning: a critical survey. Technical report, Stanford University, 2003.
[37] Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Systems, Man, and Cybernetics, Part C, Vol. 38, No. 2, pp. 156–172, 2008.
[38] Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. AAAI/IAAI, Vol. 1998, pp. 746–752, 1998.
[39] Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pp. 157–163. Elsevier, 1994.
[40] Junling Hu and Michael P Wellman. Nash Q-learning for general-sum stochastic games. Journal of machine learning research, Vol. 4, No. Nov, pp. 1039–1069, 2003.
[41] Amy Greenwald, Keith Hall, and Roberto Serrano. Correlated Q-learning. In ICML, Vol. 3, pp. 242–249, 2003.
[42] Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pp. 330–337, 1993.
[43] MC Xie and A Tachibana. Cooperative behavior acquisition for multi-agent systems by Q-learning. In Foundations of Computational Intelligence, 2007. FOCI 2007. IEEE Symposium on, pp. 424–428. IEEE, 2007.
[44] Adrian K. Agogino and Kagan Tumer. Unifying temporal and structural credit assignment problems. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS ’04, pp. 980–987, USA, 2004. IEEE Computer Society.
[45] Kagan Tumer and Adrian Agogino. Multiagent learning for black box system reward functions. Advances in Complex Systems, Vol. 12, No. 04n05, pp. 475–492, 2009.
[46] Raphen Becker, Shlomo Zilberstein, Victor Lesser, and Claudia Goldman. Solving transition independent decentralized Markov decision processes. J. Artif. Intell. Res. (JAIR), Vol. 22, pp. 423–455, July 2004.
[47] Yu-Han Chang, Tracey Ho, and Leslie Pack Kaelbling. All learning is local: Multi-agent learning in global reward games. In Proceedings of the 16th International Conference on Neural Information Processing Systems, NIPS’03, pp. 807–814, Cambridge, MA, USA, 2003. MIT Press.
[48] Sam Devlin, Logan Yliniemi, Daniel Kudenko, and Kagan Tumer. Potential-based difference rewards for multiagent reinforcement learning. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, pp. 165–172, 2014.
[49] Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In AAAI, Vol. 2, p. 5, Phoenix, AZ, 2016.
[50] Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In 2017 IEEE international conference on robotics and automation (ICRA), pp. 3389–3396. IEEE, 2017.
[51] Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–8. IEEE, 2018.
[52] Guillaume Lample and Devendra Singh Chaplot. Playing FPS games with deep reinforcement learning. In AAAI, pp. 2140–2146, 2017.
[53] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[54] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
[55] Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani. Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 443–451. International Foundation for Autonomous Agents and Multiagent Systems, 2018.
[56] Kun Shao, Yuanheng Zhu, and Dongbin Zhao. StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Transactions on Emerging Topics in Computational Intelligence, Vol. 3, No. 1, pp. 73–84, 2018.
[57] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6382–6393, 2017.
[58] Guillaume Sartoretti, Yue Wu, William Paivine, TK Satish Kumar, Sven Koenig, and Howie Choset. Distributed reinforcement learning for multi-robot decentralized collective construction. In Distributed Autonomous Robotic Systems, pp. 35–49. DARS, Springer, 2019.
[59] Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In AAAI 2018: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, February 2018.
[60] Duc Thien Nguyen, Akshat Kumar, and Hoong Chuin Lau. Credit assignment for collective multiagent RL with global rewards. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 8113–8124, Red Hook, NY, USA, 2018. Curran Associates Inc.
[61] Jiechuan Jiang and Zongqing Lu. Learning attentional communication for multi-agent cooperation. In Advances in Neural Information Processing Systems, pp. 7254–7264, 2018.
[62] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, Vol. 4, No. 2, pp. 26–31, 2012.
[63] Wei Meng, Zhirong He, Rodney Teo, Rong Su, and Lihua Xie. Integrated multi-agent system framework: decentralised search, tasking and tracking. IET Control Theory & Applications, Vol. 9, No. 3, pp. 493–502, 2014.
[64] Patrick Mannion, Jim Duggan, and Enda Howley. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems, pp. 47–66. Springer, 2016.
[65] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, Vol. 521, No. 7553, pp. 436–444, 2015.
[66] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, Vol. 8, No. 3-4, pp. 279–292, 1992.
[67] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, Vol. 518, No. 7540, pp. 529–533, 2015.
[68] Maximilian Hüttenrauch, Adrian Šošić, and Gerhard Neumann. Guided deep reinforcement learning for swarm systems. arXiv preprint arXiv:1709.06011, 2017.
[69] Ayumi Sugiyama and Toshiharu Sugawara. Improvement of robustness to environmental changes by autonomous divisional cooperation in multi-agent cooperative patrol problem. In Yves Demazeau, Paul Davidsson, Javier Bajo, and Zita Vale, editors, Advances in Practical Applications of Cyber-Physical Multi-Agent Systems: The PAAMS Collection, pp. 259–271, Cham, 2017. Springer International Publishing.
[70] Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Philip Torr, Pushmeet Kohli, Shimon Whiteson, et al. Stabilising experience replay for deep multi-agent reinforcement learning. Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 1146–1155, 2017.
[71] Yuki Miyashita and Toshiharu Sugawara. Coordination in collaborative work by deep reinforcement learning with various state descriptions. In Matteo Baldoni, Mehdi Dastani, Beishui Liao, Yuko Sakurai, and Rym Zalila Wenkstern, editors, PRIMA 2019: Principles and Practice of Multi-Agent Systems, pp. 550–558, Cham, 2019. Springer International Publishing.