A dynamic multiagent system consists of $n$ agents,
where each agent pursues its own objective.
The system starts at time 0, moving discretely forward to time
T, or until the state satisfies some termination condition.
At each time t, agents observe the state, choose their actions and then
receive their rewards.
Such dynamic multiagent systems can be modeled as stochastic games, which combine the game-theoretic framework [18] with the individual agents' Markov decision processes. Stochastic games are defined as non-cooperative games, in which agents pursue their own interests and choose their actions independently. A formal definition of stochastic games is as follows:
Definition 1613
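An $n$-player stochastic game is a tuple $\langle S, A, r, p \rangle$, where
$S = S_1 \times \cdots \times S_n$ is the state space, where $S_i$ is the part of the state relevant to agent $i$.
$A = A_1 \times \cdots \times A_n$ is the joint action space, where $A_i$ is the action space for agent $i$.
$r = (r_1, \ldots, r_n)$ represents the agent reward functions, where $r_i : S \times A \to \mathbb{R}$ is the payoff function for player $i$.
$p : S \times A \to \Delta(S)$ is the transition probability map, where $\Delta(S)$ is the set of probability distributions over state space $S$. The transition map is stationary over time.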
The stochastic game framework provides us with a powerful tool to describe system dynamics and agent interaction.
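As a minimal sketch of how the components of the definition might be represented and simulated, the following Python fragment encodes a small two-agent game with finite state and action sets. The names `StochasticGame` and `simulate`, the toy reward and transition functions, and the uniform policy are all illustrative assumptions rather than part of the formal model, and the termination condition is reduced to a fixed horizon.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
JointAction = Tuple[int, ...]  # one action per agent


@dataclass
class StochasticGame:
    states: List[State]
    actions: List[List[int]]  # actions[i] is agent i's action space
    reward: Callable[[State, JointAction], Tuple[float, ...]]  # one payoff per agent
    transition: Callable[[State, JointAction], Dict[State, float]]  # distribution over next states


def simulate(game, policies, s0, horizon, rng):
    """Run the game from s0 for `horizon` steps and return each agent's total reward."""
    s = s0
    totals = [0.0] * len(game.actions)
    for _ in range(horizon):
        # Each agent observes the state and chooses its action independently.
        a = tuple(policy(s, rng) for policy in policies)
        for i, r in enumerate(game.reward(s, a)):
            totals[i] += r
        # Sample the next state from the stationary transition map.
        dist = game.transition(s, a)
        s = rng.choices(list(dist), weights=list(dist.values()))[0]
    return totals


# Hypothetical toy game: agent 0 is rewarded when actions match, agent 1 when they differ.
def toy_reward(s, a):
    match = 1.0 if a[0] == a[1] else 0.0
    return (match, 1.0 - match)


def toy_transition(s, a):
    target = 1 if a[0] == a[1] else 0
    return {target: 0.9, 1 - target: 0.1}


def uniform_policy(s, rng):
    return rng.choice([0, 1])


toy_game = StochasticGame([0, 1], [[0, 1], [0, 1]], toy_reward, toy_transition)
totals = simulate(toy_game, [uniform_policy, uniform_policy],
                  s0=0, horizon=10, rng=random.Random(0))
```

Passing the reward and transition maps as plain functions keeps the sketch close to the tuple notation; a tabular representation would work equally well for small games.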
A strategy $\pi = (\pi_0, \pi_1, \ldots, \pi_T)$ is defined over the whole
course of the game, where $\pi_t$ is called the decision rule at time $t$. A
decision rule assigns probabilities of taking certain actions to different
states. A decision rule is deterministic if, in each state, it assigns
probability 1 to a single action.
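To make the notion of a decision rule concrete, here is a small Python sketch that stores a rule as a mapping from states to action distributions and checks whether it is deterministic. The dictionary layout, the helper names `is_valid` and `is_deterministic`, and the example states and actions are assumptions made for illustration; the determinism check reads the definition as applying to every state.

```python
from typing import Dict

# A decision rule maps each state to a probability distribution over actions.
DecisionRule = Dict[str, Dict[str, float]]


def is_valid(rule: DecisionRule, tol: float = 1e-9) -> bool:
    """Check that every state's action probabilities are nonnegative and sum to 1."""
    return all(
        all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) <= tol
        for dist in rule.values()
    )


def is_deterministic(rule: DecisionRule, tol: float = 1e-9) -> bool:
    """True if, in every state, some action is taken with probability 1."""
    return all(
        any(abs(p - 1.0) <= tol for p in dist.values())
        for dist in rule.values()
    )


# Hypothetical rules over two states and two actions.
mixed_rule = {"s0": {"left": 0.5, "right": 0.5}, "s1": {"left": 1.0, "right": 0.0}}
pure_rule = {"s0": {"left": 0.0, "right": 1.0}, "s1": {"left": 1.0, "right": 0.0}}
```

Here `mixed_rule` randomizes in state `s0` and is therefore not deterministic, while `pure_rule` commits to one action in each state.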
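A strategy $\pi$ is called a stationary strategy if $\pi_t = \bar{\pi}$
for all $t$, that is, if the decision rule is fixed over time.
If $\pi_t = f(h_t)$, then $\pi$ is called a behavior strategy, where $h_t$
is the history up to time $t$,
$$h_t = (s_0, a_0, s_1, a_1, \ldots, s_{t-1}, a_{t-1}, s_t), \tag{1}$$
with $s_k$ and $a_k$ denoting the state and the joint action at time $k$.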
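The difference between a stationary strategy and a behavior strategy can be sketched in Python as follows. This fragment assumes the history is stored as a flat tuple that alternates states and joint actions and ends with the current state; the function names, the action labels `C` and `D`, and the copy-the-opponent rule are hypothetical examples, not strategies taken from this paper.

```python
from typing import Callable, Dict, Tuple

Dist = Dict[str, float]
# Assumed history layout: (s_0, a_0, s_1, a_1, ..., s_t), where each a_k is a joint action tuple.
History = Tuple


def make_stationary(rule: Dict[str, Dist]) -> Callable[[int, History], Dist]:
    """Build a strategy whose decision rule depends only on the current state."""
    def strategy(t: int, history: History) -> Dist:
        current_state = history[-1]
        return rule[current_state]  # same rule at every time step
    return strategy


def copy_opponent(t: int, history: History) -> Dist:
    """Behavior strategy for agent 0: repeat agent 1's previous action, if any."""
    if len(history) < 3:  # no joint action has been observed yet
        return {"C": 1.0, "D": 0.0}
    previous_joint_action = history[-2]
    opponent_action = previous_joint_action[1]
    return {a: (1.0 if a == opponent_action else 0.0) for a in ("C", "D")}


stationary = make_stationary({"s0": {"C": 0.5, "D": 0.5}})
h_start = ("s0",)
h_later = ("s0", ("C", "D"), "s0")
```

With these two histories, `stationary` returns the same distribution in both cases because the current state is the same, whereas `copy_opponent` switches from `C` to `D` once it has seen the opponent play `D`.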
In this paper, our learning agent assumes that the other agents' strategies are stationary, that is, that their decision rules are fixed over time. A decision rule is also called a policy function, and we use the latter term in the following sections.