Next: Modeling Other Agents Up: Dynamic Multiagent Systems Previous: General framework

Special assumptions

We further assume that all states are observable and that each agent knows its own reward function. The only unknown is \(a^{-i}\), the actions to be taken by the other agents. The state evolves according to \(s_{t+1} = p(s_{t}, a_{t})\), where \(s_{t+1}^{i} = p^{i}(s_{t}^{i}, a_{t})\). That is, agent i's state at t+1 depends on the agent's current state and the current joint action. We assume that the transition function \(p\) is deterministic. We allow both the state and action spaces to be real (i.e., continuous) domains.

The agent's objective is to

\begin{displaymath}
\max \sum_{t=1}^{T} r_{t}^{i} \end{displaymath}

Agent i's reward \(r_{t}^{i}\) is the improvement in utility at time t: \(r_{t}^{i} = U^{i}(s_{t}^{i}) - U^{i}(s_{t-1}^{i})\). Note that the agent's utility is a function of its local state. Since
\begin{displaymath}
\sum_{t=1}^{T} r_{t}^{i} = \sum_{t=1}^{T} \left[ U^{i}(s_{t}^{i}) -
U^{i}(s_{t-1}^{i})\right] = U^{i}(s_{T}^{i}) - U^{i}(s_{0}^{i}), \eqno{(2)}\end{displaymath}
and \(U^{i}(s_{0}^{i})\) is a constant independent of i's actions, we see that maximizing the sum of rewards is equivalent to maximizing final utility.
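
The telescoping identity (2) can be checked numerically. The following is only a minimal sketch: the quadratic utility, the additive transition rule, and the action sequence are hypothetical stand-ins chosen for illustration, not forms fixed by the framework.

```python
import math

def utility(state):
    # Hypothetical utility U^i: peaks when the local state equals 3.0.
    return -(state - 3.0) ** 2

def transition(state, joint_action):
    # Hypothetical deterministic p^i: the local state shifts by the
    # sum of all agents' actions in the current period.
    return state + sum(joint_action)

def reward_sum_and_utility_gain(s0, joint_actions):
    """Return (sum of per-period rewards, U(s_T) - U(s_0))."""
    state = s0
    total = 0.0
    for a_t in joint_actions:
        next_state = transition(state, a_t)
        # Reward is the improvement in utility over one period.
        total += utility(next_state) - utility(state)
        state = next_state
    return total, utility(state) - utility(s0)

# Each tuple is one period's joint action (own action, other agent's action).
joint_actions = [(0.5, -0.2), (1.0, 0.3), (-0.4, 0.9)]
total, gain = reward_sum_and_utility_gain(0.0, joint_actions)

# The two quantities agree up to floating-point rounding.
assert math.isclose(total, gain)
print(total, gain)
```

Whatever actions the other agents take, the intermediate utilities cancel, so only the final state matters for the summed reward.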

Agent i's reward in each period,

\begin{displaymath}
r_{t+1}^{i} = U^{i}(p^{i}(s_{t}^{i}, a_{t})) - U^{i}(s_{t}^{i}),\end{displaymath}

depends on the actions of the other agents, \(a_{t}^{-i}\). The object of the learning problem is to predict these actions, explicitly or implicitly, so that the agent can effectively choose its own.

To simplify the learning problem, we assume that the agent makes its decisions myopically, that is, considering only the current time period. To maximize the current reward at t (2), the agent solves

\begin{displaymath}
\arg\max_{a_{t}^{i}\in\mathcal{A}^{i}}
 U^{i}(p^{i}(s_{t}^{i},a_{t}^{i},a_{t}^{-i})).\end{displaymath}

A generally applicable (albeit not optimal) approach is to form an estimate, \(\hat{a}_{t}^{-i}\), of the other agents' actions, and solve the problem as if the estimate were correct. The entire decision then reduces to the question of how to form estimates.
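
The estimate-then-optimize approach can be sketched as follows. Everything concrete here is an assumption made for illustration: the quadratic utility, the additive transition, the bounded action grid standing in for the continuous action space, and the function names. The sketch simply plugs the estimate into the myopic problem and picks the best grid point.

```python
def utility(state, target=3.0):
    # Hypothetical utility U^i: highest when the local state hits the target.
    return -(state - target) ** 2

def transition(state, own_action, others_action):
    # Hypothetical deterministic p^i with a single other agent.
    return state + own_action + others_action

def myopic_action(state, others_estimate, action_grid):
    """Pick the action maximizing next-period utility, treating the
    estimate of the other agents' action as if it were correct."""
    return max(
        action_grid,
        key=lambda a: utility(transition(state, a, others_estimate)),
    )

# Grid approximation of a continuous action space on [-2, 2].
action_grid = [i / 100.0 for i in range(-200, 201)]

state = 1.0
estimate = 0.5          # estimated action of the other agent
chosen = myopic_action(state, estimate, action_grid)

# Utility if the estimate was right, and if the other agent
# actually played 1.0 instead.
expected = utility(transition(state, chosen, estimate))
realized = utility(transition(state, chosen, 1.0))
print(chosen, expected, realized)
```

The gap between `expected` and `realized` is the cost of a wrong estimate, which is why the quality of the decision rests on how the estimates are formed.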


Junling Hu
4/27/1999