Perhaps the most straightforward way to estimate $a^t_{-i}$ is a time series analysis of the prior observations of $a_{-i}$. Agents taking this approach do not explicitly attempt to reason about what determines $a_{-i}$. We call such agents 0-level agents, meaning that they do not model the underlying behavior of other agents. One example of a 0-level estimate is to take the estimate of $a^t_{-i}$ as a linear function of prior observations,
![]() (3)
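As an illustration of a 0-level estimate, the following minimal sketch fits a one-lag linear predictor to an observed action history and uses it to forecast the next action. It assumes scalar actions, a single lag, and an ordinary least-squares fit; none of these choices comes from the text, and the function names `fit_linear_predictor` and `zero_level_estimate` are hypothetical.

```python
def fit_linear_predictor(history):
    """Fit a_t ~= alpha + beta * a_{t-1} by ordinary least squares.

    Assumption for illustration only: scalar actions and a single lag.
    """
    xs = history[:-1]  # a_{t-1}
    ys = history[1:]   # a_t
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # The lagged actions never varied, so the slope is unidentified;
        # fall back to predicting the mean of the observed targets.
        return mean_y, 0.0
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


def zero_level_estimate(history):
    """Forecast the next action from past actions alone (no model of the agent)."""
    alpha, beta = fit_linear_predictor(history)
    return alpha + beta * history[-1]
```

The estimator uses only the other agents' past actions, which is what makes it 0-level: nothing about their states or decision processes enters the fit.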
A somewhat deeper way to learn $a^t_{-i}$ is to learn, for each agent $j$, a model of the relationship between $j$'s action, $a^t_j$, and its state, $s^t_j$. This amounts to learning a policy function for agent $j$. The policy function can be written as a function of $j$'s state and $j$'s estimate of the other agents' actions,
![]()
Note that our 1-level agent assumes that the policy function of agent $j$, $f_j$, is stationary over time. Although this assumption never holds for agents that are themselves learning, it may be reasonable to adopt when learning about others, as it greatly simplifies the modeling problem.
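To make the idea of learning a stationary policy function concrete, here is a minimal sketch that fits a policy for agent $j$ from observed triples of state, estimate, and action. It assumes, purely for illustration, that the policy is linear in $j$'s scalar state and $j$'s scalar estimate, and that both are available to the learner; the names `fit_policy`, `predict_action`, and `solve_linear_system` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_policy(states, estimates, actions):
    """Least-squares fit of a_j ~= w0 + w1 * s_j + w2 * est_j.

    Assumption for illustration only: a linear, time-invariant policy.
    """
    rows = [[1.0, s, e] for s, e in zip(states, estimates)]
    k = 3
    # Normal equations: (X^T X) w = X^T a
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * a for r, a in zip(rows, actions)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def predict_action(weights, state, estimate):
    """Apply the fitted policy to a new state and estimate."""
    w0, w1, w2 = weights
    return w0 + w1 * state + w2 * estimate
```

Pooling all observations into one fit is exactly where the stationarity assumption enters: samples from different time steps are treated as draws from the same $f_j$.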
A natural extension is to consider agents that model others as 1-level. That is, the agent attempts to learn the policy functions of other agents, taking into account that the agent being modeled is itself modeling the policies of others. Expressed mathematically, the 2-level agent models the policy of agent $j$ as
![]()
One can of course define 3-level, ..., n-level agents by repeatedly deepening this model.
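The repeated deepening can be sketched as a simple recursion. The example below assumes a two-agent setting, takes the policy functions as already known callables, treats states as observable, and uses the last observed action as the 0-level estimate; all of these are simplifying assumptions for illustration, and `estimate_other` is a hypothetical name.

```python
def estimate_other(level, me, other, histories, states, policies):
    """Estimate `other`'s next action from `me`'s point of view.

    level 0: use only `other`'s past actions (here, simply the last one).
    level k: treat `other` as a (k-1)-level agent and apply its policy to
             its state and to its own (k-1)-level estimate of `me`.
    """
    if level == 0:
        return histories[other][-1]
    # What `other` is assumed to predict about `me`, one level shallower.
    others_view_of_me = estimate_other(
        level - 1, other, me, histories, states, policies
    )
    return policies[other](states[other], others_view_of_me)


if __name__ == "__main__":
    # Toy data, invented for this example.
    histories = {"i": [1.0, 2.0], "j": [3.0, 4.0]}
    states = {"i": 10.0, "j": 20.0}
    policies = {
        "i": lambda s, est: s + est,
        "j": lambda s, est: s - est,
    }
    for level in range(3):
        print(level, estimate_other(level, "i", "j", histories, states, policies))
```

Each additional level adds one more nested application of a policy function, so the depth of the recursion corresponds directly to the level of the agent.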