Measuring and Influencing Sequential Joint Agent Behaviours

dc.contributor.author: Raffensperger, Peter Abraham
dc.date.accessioned: 2013-03-07T23:10:52Z
dc.date.available: 2013-03-07T23:10:52Z
dc.date.issued: 2013
dc.description.abstract: Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of turn-taking in simulation results and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our turn-taking metric by reinterpreting previous work on turn-taking in emergent communication and by analysing a recorded human conversation. Given a metric, we can explore the space of reward functions and identify those reward functions that result in global success in groups of learning agents. We describe 'medium access games' as a model for human and machine communication and we present simulation results for an extensive range of reward functions for pairs of Q-learning agents. We use the Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking. Having demonstrated the predictive power of Nash equilibria for turn-taking in medium access games, we focus on synthesis of reward functions for stochastic games that result in arbitrary desirable Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique Nash equilibrium of a stochastic game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.
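The abstract above proposes a real-valued metric for turn-taking between agents. The thesis's actual metric is not reproduced here; as a purely hypothetical illustration of what a real-valued turn-taking measure can look like, one could score two binary activity sequences by the fraction of time steps at which exactly one agent is active (no collisions, no silence). For independent uniform random agents this toy measure has an expected value of 0.5, giving a baseline against which observed turn-taking can be compared, in the spirit of the abstract's comparison with independent random agents:

```python
def turn_taking_score(a, b):
    """Hypothetical turn-taking score (NOT the thesis's metric):
    the fraction of time steps at which exactly one of the two
    agents is active, for binary activity sequences a and b of
    equal length. 1.0 means perfect mutual exclusion; 0.0 means
    every step is either a collision or mutual silence."""
    assert len(a) == len(b)
    if not a:
        return 0.0
    exclusive = sum(1 for x, y in zip(a, b) if x != y)
    return exclusive / len(a)

# Perfect alternation scores 1.0; constant collision scores 0.0.
print(turn_taking_score([1, 0, 1, 0], [0, 1, 0, 1]))  # 1.0
print(turn_taking_score([1, 1, 1, 1], [1, 1, 1, 1]))  # 0.0
```

Note this sketch only rewards mutual exclusion; a fuller measure would also account for how often the active role actually changes hands.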
dc.identifier.uri: http://hdl.handle.net/10092/7472
dc.identifier.uri: http://dx.doi.org/10.26021/3096
dc.language.iso: en
dc.publisher: University of Canterbury. Electrical and Computer Engineering
dc.relation.isreferencedby: NZCU
dc.rights: Copyright Peter Abraham Raffensperger
dc.rights.uri: https://canterbury.libguides.com/rights/theses
dc.subject: multi-agent systems
dc.subject: multi-agent reinforcement learning
dc.subject: decentralised systems
dc.subject: resource allocation
dc.subject: turn-taking
dc.subject: Nash equilibria
dc.subject: emergent behaviour
dc.subject: reward functions
dc.subject: reward design
dc.subject: Markov decision processes
dc.subject: Markov chains
dc.title: Measuring and Influencing Sequential Joint Agent Behaviours
dc.type: Theses / Dissertations
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: University of Canterbury
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
uc.bibnumber: 1876110
uc.college: Faculty of Engineering
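The abstract also describes using the Nash equilibria of 'medium access games' to predict which reward functions produce turn-taking. The payoff values below are invented for exposition and are not taken from the thesis; the sketch brute-forces the pure-strategy Nash equilibria of a 2x2 medium-access-style game whose rewards favour exactly one agent transmitting, and shows that only the two asymmetric (turn-taking-compatible) outcomes are equilibria:

```python
import itertools

# Hypothetical 2x2 medium-access-style rewards (illustrative values only).
# Actions: 0 = wait, 1 = transmit. Exactly one transmitter is rewarded;
# collisions (1, 1) are penalised; mutual silence (0, 0) earns nothing.
R1 = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): -1}  # row player's reward
R2 = {(0, 0): 0, (0, 1): 2, (1, 0): 1, (1, 1): -1}  # column player's reward

def pure_nash(R1, R2, actions=(0, 1)):
    """Brute-force the pure-strategy Nash equilibria of a two-player
    normal form game: a joint action where neither player can gain
    by unilaterally deviating."""
    eqs = []
    for a, b in itertools.product(actions, actions):
        best1 = all(R1[(a, b)] >= R1[(a2, b)] for a2 in actions)
        best2 = all(R2[(a, b)] >= R2[(a, b2)] for b2 in actions)
        if best1 and best2:
            eqs.append((a, b))
    return eqs

print(pure_nash(R1, R2))  # [(0, 1), (1, 0)]: only the asymmetric outcomes
```

With these invented payoffs, the equilibrium set contains exactly the two "one transmits, one waits" outcomes, which is the kind of structural property the abstract describes exploiting as a predictor of turn-taking.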
Files

Original bundle:
- PeterRaffensperger.pdf (2.71 MB, Adobe Portable Document Format)
- Raffensperger_Use_of_thesis_form.pdf (132.24 KB, Adobe Portable Document Format)