Measuring and Influencing Sequential Joint Agent Behaviours

Type of content
Theses / Dissertations
Thesis discipline
Electrical Engineering
Degree name
Doctor of Philosophy
Publisher
University of Canterbury. Electrical and Computer Engineering
Date
2013
Authors
Raffensperger, Peter Abraham
Abstract

Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others.

We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of turn-taking in simulation results, and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our turn-taking metric by reinterpreting previous work on turn-taking in emergent communication and by analysing a recorded human conversation.

Given a metric, we can explore the space of reward functions and identify those that result in global success in groups of learning agents. We describe 'medium access games' as a model for human and machine communication, and we present simulation results for an extensive range of reward functions for pairs of Q-learning agents. We use the Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking.

Having demonstrated the predictive power of Nash equilibria for turn-taking in medium access games, we turn to the synthesis of reward functions for stochastic games that result in arbitrary desirable Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique Nash equilibrium of a stochastic game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.
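To give a concrete feel for the kind of experiment described above, the following is a minimal, illustrative Python sketch: two independent stateless Q-learning agents repeatedly play a toy medium access game, and a crude turn-taking score is computed from the resulting joint action history. The payoff constants, the QLearner class, and the turn_taking_score function are all hypothetical choices made up for this sketch; they are not the thesis's reward functions, learning parameters, or turn-taking metric.

import random

# Toy 'medium access game': each step, both agents choose to transmit (1)
# or wait (0). A transmission succeeds only if the other agent waits.
# These payoff values are illustrative placeholders.
R_SUCCESS, R_IDLE, R_COLLISION = 1.0, 0.0, -1.0

def rewards(a0, a1):
    def one(me, other):
        if me == 1:
            return R_SUCCESS if other == 0 else R_COLLISION
        return R_IDLE
    return one(a0, a1), one(a1, a0)

class QLearner:
    """Minimal stateless Q-learner over actions {0: wait, 1: transmit}."""
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.q = [0.0, 0.0]
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(2)
        return 0 if self.q[0] >= self.q[1] else 1

    def update(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])

def turn_taking_score(history):
    """Crude stand-in for a turn-taking metric: the fraction of steps with
    exactly one transmitter, scaled by how evenly the two agents shared
    those successful steps (1.0 when shared 50/50, 0.0 when one-sided)."""
    solo = [pair for pair in history if sum(pair) == 1]
    if not solo:
        return 0.0
    share0 = sum(a0 for a0, _ in solo) / len(solo)
    fairness = 1.0 - 2.0 * abs(share0 - 0.5)
    return (len(solo) / len(history)) * fairness

random.seed(0)
agents = (QLearner(), QLearner())
history = []
for _ in range(20000):
    a0, a1 = agents[0].act(), agents[1].act()
    r0, r1 = rewards(a0, a1)
    agents[0].update(a0, r0)
    agents[1].update(a1, r1)
    history.append((a0, a1))

print("turn-taking score (last 1000 steps):",
      round(turn_taking_score(history[-1000:]), 3))

A sketch like this only shows the measurement and learning machinery; because these toy agents condition on no state, sustained alternation can arise only by chance. Whether a given reward function actually induces turn-taking is the kind of question the thesis addresses with Nash equilibrium analysis and reward synthesis.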

Keywords
multi-agent systems, multi-agent reinforcement learning, decentralised systems, resource allocation, turn-taking, Nash equilibria, emergent behaviour, reward functions, reward design, Markov decision processes, Markov chains
Rights
Copyright Peter Abraham Raffensperger