Measuring and Influencing Sequential Joint Agent Behaviours

Raffensperger, Peter Abraham

Files

PeterRaffensperger.pdf (2.71 MB)

Raffensperger_Use_of_thesis_form.pdf (132.24 KB)

Type of content

Theses / Dissertations

UC permalink

http://hdl.handle.net/10092/7472
http://dx.doi.org/10.26021/3096

Thesis discipline

Electrical Engineering

Degree name

Doctor of Philosophy

Publisher

University of Canterbury. Electrical and Computer Engineering

Date

2013

Authors

Raffensperger, Peter Abraham

Abstract

Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of turn-taking in simulation results and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our turn-taking metric by reinterpreting previous work on turn-taking in emergent communication and by analysing a recorded human conversation. Given a metric, we can explore the space of reward functions and identify those reward functions that result in global success in groups of learning agents. We describe 'medium access games' as a model for human and machine communication and we present simulation results for an extensive range of reward functions for pairs of Q-learning agents. We use the Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking. Having demonstrated the predictive power of Nash equilibria for turn-taking in medium access games, we focus on synthesis of reward functions for stochastic games that result in arbitrary desirable Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique Nash equilibrium of a stochastic game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.
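As an illustration of the idea that Nash equilibria can indicate which reward functions favour turn-taking, the following is a minimal sketch, not the thesis's method. It assumes a hypothetical one-shot, two-agent medium access game in which each agent either waits or transmits in a single slot. The names `rewards` and `pure_nash_equilibria` and the payoff values (`success`, `collision`, `idle`) are assumptions made for this example; the thesis itself works with stochastic games, whose definitions are not given in this record.

```python
from itertools import product

# Actions available to each agent in one time slot (hypothetical encoding).
WAIT, TRANSMIT = 0, 1


def rewards(a1, a2, success=1.0, collision=-1.0, idle=0.0):
    """Return (r1, r2) for one slot of an assumed two-agent medium access game."""
    if a1 == TRANSMIT and a2 == TRANSMIT:
        # Both agents used the shared medium at once.
        return collision, collision
    r1 = success if a1 == TRANSMIT else idle
    r2 = success if a2 == TRANSMIT else idle
    return r1, r2


def pure_nash_equilibria(reward_fn):
    """List joint actions from which neither agent gains by deviating alone."""
    actions = (WAIT, TRANSMIT)
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        r1, r2 = reward_fn(a1, a2)
        # Weak inequality: ties count, so non-strict equilibria are included.
        best1 = all(reward_fn(d, a2)[0] <= r1 for d in actions)
        best2 = all(reward_fn(a1, d)[1] <= r2 for d in actions)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


if __name__ == "__main__":
    # Penalised collisions: the equilibria are the two "one transmits" outcomes.
    print(pure_nash_equilibria(rewards))
    # Rewarded collisions (an assumed bad design): both agents always transmit.
    print(pure_nash_equilibria(lambda a, b: rewards(a, b, collision=2.0)))
```

Under the assumed payoffs, penalising collisions leaves exactly the two asymmetric outcomes as pure equilibria, which is the kind of structure that alternating (turn-taking) behaviour can be built on, whereas rewarding collisions makes simultaneous transmission the only pure equilibrium.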

Keywords

multi-agent systems, multi-agent reinforcement learning, decentralised systems, resource allocation, turn-taking, Nash equilibria, emergent behaviour, reward functions, reward design, Markov decision processes, Markov chains

Rights

Copyright Peter Abraham Raffensperger

https://canterbury.libguides.com/rights/theses

Collections

Engineering: Theses and Dissertations
