Next: Special assumptions Up: Dynamic Multiagent Systems Previous: Dynamic Multiagent Systems

General framework

A dynamic multiagent system consists of $n\geq 2$ agents, where each agent pursues its own objective. The system starts at time 0, moving discretely forward to time T, or until the state satisfies some termination condition. At each time t, agents observe the state, choose their actions and then receive their rewards.
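The timing described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the names `run_episode`, `agents`, `step` and `is_terminal` are hypothetical, agents are modeled as plain functions of the observed state, and decisions are assumed to be taken at times 0 through T-1.

```python
def run_episode(s0, agents, step, T, is_terminal=lambda s: False):
    """Run one episode and return (final state, total rewards, steps taken).

    agents      -- list of functions state -> action (hypothetical interface)
    step        -- function (state, joint_action) -> (next_state, rewards)
    T           -- time horizon; decisions are taken at t = 0, ..., T-1
    is_terminal -- termination condition on the state
    """
    s = s0
    totals = [0.0] * len(agents)
    t = 0
    while t < T and not is_terminal(s):
        # Each agent observes the state and chooses its own action.
        joint_action = tuple(agent(s) for agent in agents)
        # The system moves to the next state and every agent gets a reward.
        s, rewards = step(s, joint_action)
        totals = [total + r for total, r in zip(totals, rewards)]
        t += 1
    return s, totals, t


# Toy usage: the state is a counter, and each agent is paid its own action.
toy_agents = [lambda s: 1, lambda s: 0]
toy_step = lambda s, a: (s + sum(a), [float(x) for x in a])
final_state, total_rewards, steps = run_episode(0, toy_agents, toy_step, T=5)
```

The loop stops at the horizon or as soon as the termination condition holds, whichever comes first.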

Such dynamic multiagent systems can be modeled as stochastic games, which combine the game-theoretic framework [18] with the Markov decision processes of the individual agents. Stochastic games are defined as non-cooperative games, in which agents pursue their own interests and choose their actions independently. A formal definition of stochastic games is as follows:

Definition 1613

An n-player stochastic game is a tuple $\langle S, A, r, p\rangle$, where $S = S^{1}\times\cdots\times S^{n}$ is the state space and $S^i$ is the part of the state relevant to agent i. $A=A^{1}\times\cdots\times A^{n}$ is the joint action space, where $A^i$ is the action space of agent i. $r=(r^{1},\ldots,r^{n})$ collects the agents' reward functions, where $r^i: S^i\times A \rightarrow \Re$ is the payoff function of player i. $p: S\times A \rightarrow \Delta$ is the transition probability map, where $\Delta$ is the set of probability distributions over the state space $S$. The transition map is stationary over time.
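For finite state and action spaces, the tuple in the definition above might be represented as follows. This is a minimal sketch, not part of the original formulation: the class `StochasticGame`, its field names and the toy game `toy_game` are all hypothetical, and the transition map is assumed to return a dictionary from next joint states to probabilities.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[int, ...]        # joint state s = (s^1, ..., s^n)
JointAction = Tuple[int, ...]  # joint action a = (a^1, ..., a^n)


@dataclass
class StochasticGame:
    """Finite n-player stochastic game <S, A, r, p> (illustrative only)."""
    state_spaces: List[Sequence[int]]   # S^1, ..., S^n
    action_spaces: List[Sequence[int]]  # A^1, ..., A^n
    # rewards[i](s_i, a) plays the role of r^i : S^i x A -> R
    rewards: List[Callable[[int, JointAction], float]]
    # transition(s, a) plays the role of p : S x A -> Delta
    transition: Callable[[State, JointAction], Dict[State, float]]

    def joint_states(self) -> List[State]:
        return list(product(*self.state_spaces))

    def joint_actions(self) -> List[JointAction]:
        return list(product(*self.action_spaces))

    def step(self, s: State, a: JointAction, rng: random.Random):
        """Pay each agent r^i(s^i, a), then sample the next state from p(s, a)."""
        payoffs = [r_i(s[i], a) for i, r_i in enumerate(self.rewards)]
        dist = self.transition(s, a)
        next_states = list(dist)
        weights = [dist[x] for x in next_states]
        s_next = rng.choices(next_states, weights=weights)[0]
        return s_next, payoffs


def toy_game() -> StochasticGame:
    """Hypothetical 2-agent game: agents are paid 1 when their actions match,
    and the next joint state simply records the joint action."""
    match = lambda s_i, a: 1.0 if a[0] == a[1] else 0.0
    return StochasticGame(
        state_spaces=[(0, 1), (0, 1)],
        action_spaces=[(0, 1), (0, 1)],
        rewards=[match, match],
        transition=lambda s, a: {a: 1.0},
    )
```

Because the transition function does not take the time index as an argument, stationarity of the transition map holds by construction in this sketch.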

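The stochastic game framework gives us a way to describe both the system dynamics, through the transition map $p$, and the agents' interaction, through the dependence of rewards and transitions on the joint action.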

A strategy $\pi=(\pi_0,\ldots,\pi_t,\ldots)$ is defined over the whole course of the game, and $\pi_t$ is called the decision rule at time t. A decision rule assigns to each state a probability distribution over actions. A decision rule is deterministic if, in every state, it takes a single action with probability 1.
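A decision rule for a single agent with finitely many states and actions can be written as a table of action probabilities. The sketch below is illustrative only: the representation as nested dictionaries and the helper names `is_valid`, `is_deterministic` and `sample_action` are assumptions made for this example.

```python
import random

# A decision rule for one agent: state -> {action: probability}.


def is_valid(rule, tol=1e-9):
    """Check that every state carries a probability distribution over actions."""
    return all(
        all(p >= 0.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in rule.values()
    )


def is_deterministic(rule, tol=1e-9):
    """True if, in every state, some action is taken with probability 1."""
    return all(
        any(abs(p - 1.0) <= tol for p in dist.values())
        for dist in rule.values()
    )


def sample_action(rule, state, rng):
    """Draw an action for the given state according to the decision rule."""
    dist = rule[state]
    actions = list(dist)
    return rng.choices(actions, weights=[dist[a] for a in actions])[0]


# A mixed rule and a deterministic rule over two hypothetical states.
mixed_rule = {"s0": {"left": 0.5, "right": 0.5}, "s1": {"left": 1.0, "right": 0.0}}
pure_rule = {"s0": {"left": 1.0}, "s1": {"right": 1.0}}
```

Here `mixed_rule` randomizes in state `s0` and is therefore not deterministic, while `pure_rule` always returns the same action in each state.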

A strategy $\pi$ is called a stationary strategy if $\pi_t=\bar{\pi}$ for all t, that is, if the decision rule is fixed over time. If $\pi_t=f(h_t)$, $\pi$ is called a behavior strategy, where $h_t$ is the history up to time t, written here for two agents:
\begin{displaymath}
h_t=(s_0, a_0^1, a_0^2, s_1, a_1^1, a_1^2, \ldots, a_{t-1}^1, a_{t-1}^2,s_t). \qquad (1)
\end{displaymath}
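The difference between the two kinds of strategy can be illustrated in a few lines. This is a hedged sketch for the two-agent case: `make_history`, `stationary_strategy` and `copy_opponent` are hypothetical names, and for brevity each strategy returns the action distribution at the current state $s_t$ rather than a full decision rule.

```python
def make_history(states, joint_actions):
    """Flatten states s_0..s_t and joint actions a_0..a_{t-1} into
    h_t = (s_0, a_0^1, a_0^2, s_1, ..., a_{t-1}^1, a_{t-1}^2, s_t)."""
    if len(states) != len(joint_actions) + 1:
        raise ValueError("need exactly one more state than joint actions")
    h = []
    for s, a in zip(states, joint_actions):
        h.append(s)
        h.extend(a)
    h.append(states[-1])
    return tuple(h)


def stationary_strategy(rule):
    """Stationary strategy: the same decision rule is used at every time t,
    so only the current state (the last entry of h_t) matters."""
    def f(h):
        return rule[h[-1]]
    return f


def copy_opponent(h, default="C"):
    """Hypothetical behavior strategy for agent 1: repeat agent 2's previous
    action, which sits just before s_t in the history."""
    if len(h) < 4:  # t = 0: no action has been taken yet
        return {default: 1.0}
    return {h[-2]: 1.0}


h1 = make_history(["x", "y"], [("C", "D")])
```

The stationary strategy ignores everything in `h1` except the current state `"y"`, whereas `copy_opponent` reads agent 2's earlier action from the history.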

In this paper, our learning agent assumes that the other agents' strategies are stationary, that is, that their decision rules are fixed over time. A decision rule is also called a policy function, and we use the latter term in the following sections.


Junling Hu
4/27/1999