
Three types of learning agents

A 0-level learning agent does not model the underlying policy functions of other agents. Instead, it models their actions directly from the history of those actions and uses a time-series technique to predict the actions in the next time period. For any other agent $j$, the 0-level agent predicts $P_t^j$ by
 \begin{displaymath}
\hat{P}_t^j=P_{t-1}^j+\alpha (P_{t-1}^j-P_{t-2}^j).\end{displaymath} (7)
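
As a minimal sketch of the extrapolation in equation (7), the following Python function predicts the next action from the last two observed ones. The function name, the list-based history, and the value of `alpha` in the usage line are illustrative assumptions; the text does not specify how $\alpha$ is chosen.

```python
def predict_zero_level(history, alpha):
    """Extrapolate agent j's next action from its last two observed actions.

    history: past actions of agent j, oldest first (P_{t-2} and P_{t-1} are
             the last two entries)
    alpha:   trend weight; its value is an assumption, not given in the text
    """
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    last, prev = history[-1], history[-2]
    # Equation (7): P_hat_t = P_{t-1} + alpha * (P_{t-1} - P_{t-2})
    return last + alpha * (last - prev)


# Hypothetical price history: the last move was +2, so half of it is added.
print(predict_zero_level([10.0, 12.0], alpha=0.5))  # 13.0
```

With `alpha = 0` the prediction reduces to repeating the last observed action; with `alpha = 1` the last change is assumed to continue unchanged.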

A 1-level agent tries to estimate fixed policy functions of the other agents. It assumes that the other agents are 0-level competitive agents, who choose actions based on their individual optimization problems. This means that each other agent's actions are a function of its own state only. Assuming the policy functions are linear, the estimate of $P_t^j$ is

\begin{displaymath}
\hat{P}_t^j=\alpha +\beta e_t^j, \end{displaymath}

where $e_t^j$ is a vector of dimension $m$ ($m$ is the number of different goods the agent holds) and $\hat{P}_t^j=(\hat{P}_t^{j,buy}, \hat{P}_t^{j,sell})$ is a two-dimensional vector.
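
The linear policy estimate can be sketched in Python as follows. The text does not say how $\alpha$ and $\beta$ are obtained, so this sketch assumes an ordinary least-squares fit, and for brevity it assumes a single good ($m = 1$), so that the state is a scalar and $\alpha$, $\beta$ are each a (buy, sell) pair. The function names and sample data are hypothetical.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of P = alpha + beta * e for a scalar state e.

    states:  observed states e of agent j (floats; m = 1 is an assumption)
    actions: observed (buy, sell) price pairs, one per state
    Returns (alpha, beta), each a (buy, sell) tuple.
    """
    n = len(states)
    if n < 2 or n != len(actions):
        raise ValueError("need at least two matching state/action samples")
    e_mean = sum(states) / n
    s_ee = sum((e - e_mean) ** 2 for e in states)
    if s_ee == 0.0:
        raise ValueError("states must not all be identical")
    alpha, beta = [], []
    for c in range(2):  # component 0 = buy price, component 1 = sell price
        prices = [a[c] for a in actions]
        p_mean = sum(prices) / n
        s_ep = sum((e - e_mean) * (p - p_mean) for e, p in zip(states, prices))
        slope = s_ep / s_ee
        beta.append(slope)
        alpha.append(p_mean - slope * e_mean)
    return tuple(alpha), tuple(beta)


def predict_one_level(alpha, beta, e):
    """Evaluate P_hat = alpha + beta * e for both price components."""
    return tuple(a + b * e for a, b in zip(alpha, beta))


# Hypothetical history: both prices fall by 1 per unit of the good held.
alpha, beta = fit_linear_policy([1.0, 2.0, 3.0],
                                [(5.0, 7.0), (4.0, 6.0), (3.0, 5.0)])
print(predict_one_level(alpha, beta, 4.0))
```

For $m > 1$ the same idea applies with $\beta$ as a $2 \times m$ matrix and a multivariate least-squares solve in place of the closed-form slope.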

A 2-level agent, like a 1-level agent, models the policy functions of other agents, but it assumes that the others are 1-level learning agents. We adopt a simplified 2-level model: $P_t^j=f(e_t^j, e_t^{-j})$. We found that linear regression does not work well for this model. One reason is the correlation between the independent variables $e_t^j$ and $e_t^{-j}$. In addition, the high dimensionality of the input requires a large amount of data to obtain an unbiased estimate. Since we assume agents keep only a small sample of history data (a fixed window length), we adopt a nonparametric regression method instead; in this paper we use the K-Nearest Neighbor method. For the current joint state $(e_t^1,\ldots,e_t^m)$, take its $K$ nearest neighbors $\{(e^1_{t_{1}},\ldots,e^m_{t_{1}}), \ldots, (e^1_{t_K},\ldots,e^m_{t_K})\}$ as the inputs and the corresponding actions of agent $j$, $\{P^{j}_{t_1}, \ldots, P^{j}_{t_K}\}$, as the outputs. The estimate of $P_t^j$ is:

\begin{displaymath}
\hat{P}_t^j=\sum _{l=1}^K W_lP^{j}_{t_{l}} \end{displaymath}

where $W_l$ is the weight of the $l$-th neighbor $(e^1_{t_l},\ldots,e^m_{t_l})$, defined as $W_l={{1 \over d_l} \over {\sum _{l'=1}^K {1 \over d_{l'}}}}$, and $d_l$ is the Euclidean distance between that data point and the query point $(e_t^1,\ldots,e_t^m)$: $d_l=\sqrt{(e^1_{t_l}-e_t^1)^2+\cdots+(e^m_{t_l}-e_t^m)^2}$.
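
The weighted K-Nearest Neighbor estimate can be sketched in Python as below. This is an illustration under stated assumptions, not the implementation used in the paper: actions are treated as scalars (the same routine could be applied separately to the buy and sell components), the function name and sample data are hypothetical, and the handling of a neighbor at distance zero, where $1/d_l$ is undefined, is a choice made here because the text does not address it.

```python
import math


def knn_predict(query, states, actions, k):
    """Inverse-distance-weighted K-nearest-neighbor estimate of an action.

    query:   current joint state (tuple of floats)
    states:  past joint states in the agent's history window
    actions: agent j's action observed in each past state (scalars here)
    k:       number of neighbors to use
    """
    if len(states) != len(actions) or not 1 <= k <= len(states):
        raise ValueError("need matching samples and 1 <= k <= len(states)")
    # Euclidean distance d_l from each stored joint state to the query point.
    dists = [math.dist(query, s) for s in states]
    nearest = sorted(range(len(states)), key=lambda i: dists[i])[:k]
    # Assumption: if some neighbors coincide with the query, average them,
    # since the weight 1/d_l is undefined at d_l = 0.
    exact = [actions[i] for i in nearest if dists[i] == 0.0]
    if exact:
        return sum(exact) / len(exact)
    inv = [1.0 / dists[i] for i in nearest]
    total = sum(inv)
    # W_l = (1/d_l) / sum over l' of (1/d_l'); the estimate is sum of W_l * P_l.
    return sum((w / total) * actions[i] for w, i in zip(inv, nearest))


# Hypothetical window of three past joint states and agent j's actions.
past_states = [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
past_actions = [10.0, 20.0, 100.0]
print(knn_predict((0.0, 0.0), past_states, past_actions, k=2))
```

In this example the two nearest neighbors lie at distances 1 and 2, so their weights are 2/3 and 1/3, and the distant third point does not influence the estimate.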


Junling Hu
4/27/1999