Optimization Incentives and Coordination Failure in Laboratory Stag Hunt Games

Raymond Battalio, Larry Samuelson, and John Van Huyck
Econometrica, Vol. 69, No. 3 (May, 2001), pp.
1. INTRODUCTION

The specification of the feasible strategies and preferences that define a strategic-form game, together with the assumption that players are substantively rational, provides a powerful framework for analyzing strategic behavior. This framework can in turn be summarized by the game's best-response correspondence. For example, one need only know the best-response correspondence of a strategic-form game to identify its Nash equilibria. The classical approach to games typically either exploits only the information contained in the best-response correspondence, or augments this information with risk-dominance and payoff-dominance considerations in order to choose between strict Nash equilibria.[2] This paper reports an experimental investigation of three stag hunt games.
The three games have identical best-response correspondences and similar payoff magnitudes, but they produce different behavior. Games 2R, R, and 0.6R, shown in Figures 1, 2, and 3, were used in the experiment. In each game, strategy X is a strict best response to any mixture that attaches a probability greater than q* to X, where q* = 0.8, while Y is a strict best response to any mixture attaching a lower probability to X. Each game has two pure-strategy equilibria, of which (X, X) is payoff dominant and (Y, Y) is risk dominant, as well as a mixed equilibrium in which X is played with probability q*.

Our analysis of games 2R, R, and 0.6R is motivated by the observation that the pecuniary incentive to select a best response to an opponent's strategy is twice as large in game 2R as in game R, and six-tenths as large in game 0.6R as in game R. We call this incentive, given by the difference between the payoff of the best response to an opponent's strategy and the payoff of the inferior response, the optimization premium. The optimization premium may be irrelevant to substantively rational agents. We nonetheless expect people to learn to play a best response more readily when the optimization premium is large, and we expect the differing optimization premia of games 2R, R, and 0.6R to induce systematically different play in laboratory experiments.

[1] We thank Menesh Patel, Bill Rankin, and Nick Rupp for research assistance; Simon Anderson, John Kagel, Jack Ochs, Richard McKelvey, and John Nachbar for helpful discussions; Dan Friedman, Robert Forsythe, Paul Straub, Martin Sefton, and their collaborators for making their data available to us; and two referees for helpful comments. Eric Battalio implemented the experimental design on the TAMU economic research laboratory network. The National Science Foundation and the Texas Advanced Research Program provided financial support.
The first draft of this paper was called "Risk Dominance, Payoff Dominance, and Probabilistic Choice Learning." It was drafted while Van Huyck was on faculty development leave at the University of Pittsburgh.

[2] Hillas (1990) introduces a reformulation of Kohlberg and Mertens' (1986) strategic stability that makes the exclusive reliance on the best-response correspondence particularly obvious. Among theories that make an equilibrium selection in the stag hunt game, Carlsson and van Damme (1993) and Harsanyi (1995) choose the risk-dominant equilibrium, while Anderlini (1999) and Harsanyi and Selten (1988) choose the payoff-dominant equilibrium.

To the extent possible, games 2R, R, and 0.6R involve payoffs of similar magnitudes. In particular, the expected payoff from the mixed equilibrium is 36 in all three games. One can think of the optimization premium as describing the steepness, rather than the level, of the payoff function near an equilibrium: a larger optimization premium implies a larger penalty for inferior play.

Our experimental results provide evidence that changing the optimization premium influences behavior. Individual subjects are more sensitive to the history of opponents' play in games with a larger optimization premium. Behavior converges more quickly in game 2R than in game R, and more quickly in game R than in game 0.6R. The payoff-dominant equilibrium is more likely to emerge the smaller is the optimization premium.

2. EXPERIMENTAL DESIGN

The experiment consists of three treatments, each consisting of eight cohorts. Eight subjects participated in each cohort. Each cohort plays one of the three games, 2R, R, or 0.6R, seventy-five times. We used a single-population random matching protocol to pair subjects within a cohort, and the subjects were informed that they were being randomly paired.
The subjects had common and complete information about both their own and everybody else's earnings table. Actions were labeled 1 and 2, and each subject chose one such action in each period. After their choices were made, the subjects were randomly paired with an anonymous opponent to determine an outcome for each pair. Because outcomes were reported privately, subjects could not use common information about the outcomes of previous periods to coordinate on an equilibrium. Cell entries in Figures 1, 2, and 3 denote the number of cents earned by a subject pair for each action combination in each round. Earnings were presented in matrix form, and subjects were instructed on how to derive the other participant's earnings from the earnings table.

No preplay communication was allowed. Messages were sent electronically over a PC network. The subjects were recruited from undergraduate economics classes at Texas A&M University in the Spring of 1996, the Fall of 1997, and the Spring of …. A total of 192 subjects participated in the experiment: eight cohorts of eight subjects in each of three treatments. After reading the instructions, but before the session began, the subjects filled out a questionnaire to determine whether they understood how to read earnings tables.[3] A session lasted about two hours. Repeated play of the payoff-dominant equilibrium for seventy-five periods results in a subject earning $….

3. OPTIMIZATION INCENTIVES

Games 2R, R, and 0.6R differ in the penalty attached to not playing a best response or, more optimistically, in the premium for playing a best response. We refer to this incentive as the optimization premium. Let π_j(X, q) denote the expected payoff to a player in game j who plays X and expects his opponent to play X with probability q. Let π_j(Y, q) be similarly defined for Y. Then the optimization premium for game j is the function r_j(q): [0, 1] → ℝ given by

$$r_j(q) = \pi_j(X, q) - \pi_j(Y, q) = \delta_j \, (q - q^*),$$

where δ_j is the optimization premium parameter.
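As a check on this definition, the sketch below computes π_j(·, q), r_j(q), δ_j, and q* for three 2×2 games. The payoff matrices are hypothetical. They are not taken from Figures 1, 2, and 3 and were chosen only so that each game has q* = 0.8, a mixed-equilibrium payoff of 36, and premium parameters in the ratio 2 : 1 : 0.6 stated in the text. The names `GAMES`, `expected_payoff`, `premium`, `premium_parameter`, and `indifference_point` are illustrative. Exact fractions avoid rounding noise in the comparisons.

```python
from fractions import Fraction as F

# Hypothetical row-player payoffs in cents, keyed by (own action, opponent action).
# These are NOT the matrices of Figures 1-3; they only satisfy the stated constraints.
GAMES = {
    "2R":   {("X", "X"): 45, ("X", "Y"): 0, ("Y", "X"): 35, ("Y", "Y"): 40},
    "R":    {("X", "X"): 45, ("X", "Y"): 0, ("Y", "X"): 40, ("Y", "Y"): 20},
    "0.6R": {("X", "X"): 45, ("X", "Y"): 0, ("Y", "X"): 42, ("Y", "Y"): 12},
}


def expected_payoff(game, action, q):
    """pi_j(action, q): expected payoff when the opponent plays X with probability q."""
    return q * game[(action, "X")] + (1 - q) * game[(action, "Y")]


def premium(game, q):
    """Optimization premium r_j(q) = pi_j(X, q) - pi_j(Y, q)."""
    return expected_payoff(game, "X", q) - expected_payoff(game, "Y", q)


def premium_parameter(game):
    """delta_j: r_j is linear in q, so its slope is r_j(1) - r_j(0)."""
    return premium(game, F(1)) - premium(game, F(0))


def indifference_point(game):
    """q* with r_j(q*) = 0; since r_j(q) = r_j(0) + delta_j * q, q* = -r_j(0) / delta_j."""
    return -premium(game, F(0)) / premium_parameter(game)


for name, game in GAMES.items():
    q_star = indifference_point(game)
    print(name, premium_parameter(game), q_star, expected_payoff(game, "X", q_star))
```

With these made-up matrices the loop prints `2R 50 4/5 36`, `R 25 4/5 36`, and `0.6R 15 4/5 36`. Every game has the same best-response threshold and mixed-equilibrium payoff, while δ_j halves from 2R to R and falls to six-tenths of δ_R in 0.6R. Because (X, X) pays more than (Y, Y) and q* > 1/2, (X, X) is payoff dominant and (Y, Y) is risk dominant, as the text requires.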
Hence, for any opponent's strategy q, the optimization premium is twice as large in game 2R as in game R and six-tenths as large in game 0.6R as in game R; that is, δ_2R = 2δ_R and δ_0.6R = 0.6δ_R. Our intuition is that the process attracting players to choose best responses will be more effective in games in which the optimization premium is larger. To make this precise, consider the following probabilistic choice model, which can be derived axiomatically (see Luce (1959)) or from a random utility framework (see Maddala (1983) and Anderson, de Palma, and Thisse (1992)):

$$P(X \mid q, \lambda, j) = \frac{e^{\lambda \pi_j(X, q)}}{e^{\lambda \pi_j(X, q)} + e^{\lambda \pi_j(Y, q)}},$$

where P(X | q, λ, j) is the probability that X is chosen, given q and λ, in game j, and λ is a precision parameter. Dividing the numerator and denominator by e^{λπ_j(X, q)} and using the definition of r_j(q), we can solve for the logistic-response function

$$P(X \mid q, \lambda, j) = \frac{1}{1 + e^{-\lambda r_j(q)}} = \frac{1}{1 + e^{-\lambda \delta_j (q - q^*)}}.$$

If λ equals 0, players mix equally over all strategies, while a sufficiently large λ gives essentially best-response behavior. Holding λ constant, subjects' behavior will be more responsive to q in game 2R than in game R, and in game R than in game 0.6R. This follows because a larger optimization premium parameter δ_j gives a logistic-response function closer to the best-response function (its slope at q = q* is λδ_j/4):

HYPOTHESIS 1: Subjects' behavior will be more responsive to beliefs the larger is the optimization premium parameter.

[3] The instructions for the experiment are available on the web at erl.tamu.edu or www.ssc.wisc.edu/~larrysam.

Following Fudenberg and Levine (1998), we can use the logistic-response function to define a single-population continuous-time logistic-response dynamic,

$$\dot q = P(X \mid q, \lambda, j) - q,$$

where q is reinterpreted as the frequency of action X in the population. The population is assumed to be large enough that the random individual choices are captured by a deterministic population equation.[5] Figure 4 illustrates this dynamic for the case of λ = 1. For any finite λ > 0, the magnitude of the change in the population state q, and hence the speed of convergence, differs across optimization premia.
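The logistic-response function and the dynamic q̇ = P(X | q, λ, j) − q can be explored numerically. The sketch below works directly with r_j(q) = δ_j(q − q*). The levels δ = 50, 25, and 15 are hypothetical; only their 2 : 1 : 0.6 ratio comes from the text, and the pattern found at λ = 1 depends on those assumed levels. The sketch also locates the rest points of the dynamic, which are the symmetric logit equilibria discussed in the next paragraph. It does so by scanning a grid for sign changes of q̇ and refining each bracket by bisection. All names are illustrative.

```python
import math

Q_STAR = 0.8  # best-response threshold q* from the text
# Hypothetical premium parameters; only the 2 : 1 : 0.6 ratio is from the text.
DELTA = {"2R": 50.0, "R": 25.0, "0.6R": 15.0}


def logit_response(q, lam, delta):
    """P(X | q, lambda, j) = 1 / (1 + exp(-lambda * delta_j * (q - q*)))."""
    return 1.0 / (1.0 + math.exp(-lam * delta * (q - Q_STAR)))


def qdot(q, lam, delta):
    """Single-population logistic-response dynamic: dq/dt = P(X | q, lambda, j) - q."""
    return logit_response(q, lam, delta) - q


def logit_equilibria(lam, delta, grid=10_000, tol=1e-12):
    """Rest points of qdot on [0, 1] (symmetric logit equilibria), in increasing order."""
    roots = []
    points = [i / grid for i in range(grid + 1)]
    for a, b in zip(points, points[1:]):
        fa, fb = qdot(a, lam, delta), qdot(b, lam, delta)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            lo, hi = a, b  # bisection on a bracketing interval
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if qdot(lo, lam, delta) * qdot(mid, lam, delta) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


for name, delta in DELTA.items():
    eqs = logit_equilibria(1.0, delta)
    slope = 1.0 * delta / 4  # slope of P at q = q*, i.e. lambda * delta_j / 4
    print(name, [round(e, 3) for e in eqs], "slope at q*:", slope)
```

Under these assumed levels and λ = 1, the sketch finds three rest points for 2R and R and a single one, near q = 0, for 0.6R. The middle, unstable rest point separates the two basins of attraction. It lies above the best-response threshold q* = 0.8 and is higher for R than for 2R, so the basin of the equilibrium near the risk-dominant one is larger in game R. Inside that basin, for example at q = 0.6, |q̇| is largest for 2R and smallest for 0.6R.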
HYPOTHESIS 2: Behavior will converge to an equilibrium more quickly the larger is the optimization premium.

This result is typical of noisy belief-based models in which players react more vigorously to beliefs when payoff differences are larger. Common models of population behavior based on deterministic or stochastic generalizations of the replicator dynamic similarly assume that rates of adjustment increase with the current difference in payoffs between strategies (for example, Binmore, Gale, and Samuelson (1995), Börgers and Sarin (1997), or Weibull (1995)).

Fixing λ, a logit equilibrium is a fixed point of the two players' logistic-response functions (McKelvey and Palfrey (1995)). The stationary states of the single-population logistic-response dynamic correspond to symmetric logit equilibria. Figure 4 graphs the logistic-response dynamic for the case of λ = 1. For comparison, it also graphs the single-population continuous-time best-response dynamic, which is the same for all three games. Games 2R and R have three logit equilibria that are close to the best-response equilibria. In both games the risk-dominant equilibrium's basin of attraction is larger than under the best-response dynamic, and it is larger in game R than in game 2R.[6] Game 0.6R has a single logit equilibrium (given λ = 1). This equilibrium is close to the risk-dominant equilibrium, and its basin of attraction comprises the entire state space.

[4] A growing literature examines models of behavior in games. Rather than building a complete model of adaptive behavior, our goal is to answer the question "Does the optimization premium matter?", which is most effectively answered within the context of the logit response function.

[5] Crawford (1995) examines an alternative belief-based dynamic.
Börgers and Sarin (1997), Binmore and Samuelson (1997), Binmore, Gale, and Samuelson (1995), Erev and Roth (1998), and Roth and Erev (1995) model current actions as functions of previous experience, with favorable experiences reinforcing the tendency to take an action. In practice, beliefs are typically estimated as a function of previous outcomes, bringing the two types of model closer together (see Hopkins (1999)). More general models include Camerer and Ho's (1999) experience-weighted attraction model and Stahl's (1996, 1999) rule-learning models.

[6] This observation is a consequence of the way the logit equilibrium close to the mixed equilibrium changes as players become imprecise in their responses (Fudenberg and Levine (1998)).

[Figure 4: One-population continuous-time best-response and logistic-response dynamics (λ = 1); vertical axis: q̇.]

For any finite λ > 0, the basin of attraction of the logit equilibrium closest to the risk-dominant equilibrium expands as the optimization premium falls. Once the optimization premium is low enough, only a single logit equilibrium remains, and it is closer to the risk-dominant than to the payoff-dominant equilibrium. If we think of some fixed distribution governing the initial condition of the dynamic, then probabilistic choice makes the payoff-dominant equilibrium less likely than under the best-response dynamic, and less likely the smaller is the optimization premium.

This result is somewhat counterintuitive. Learning is likely to be noisy, and we would expect a smaller optimization premium to increase the likelihood that noisy learning induces the population to enter the basin of attraction of the payoff-dominant equilibrium (X, X).[7] A variety of forces may lie behind this expectation, one of which is captured by the aspiration-and-imitation model of Binmore and Samuelson (1997). In their model, players are more likely to revise their strategies when their payoffs fall below an aspiration level.
Learning is thus noisier when payoffs are smaller, and the population is more likely to stumble away from the neighborhood of an equilibrium if that equilibrium involves relatively low payoffs. Hence, whenever the risk-dominant and payoff-dominant equilibria differ, the learning process is more likely to move the proportion of the population playing strategy X away from the relatively low-payoff risk-dominant equilibrium than away from the payoff-dominant equilibrium. This difference is more pronounced the smaller is the optimization premium.[8] This leads to a prediction that is not made by best-response, logistic-response, or replicator dynamics:

HYPOTHESIS 3: Behavior is more likely to converge to the payoff-dominant equilibrium the smaller is the optimization premium.

[7] When the optimization premium is smaller, we expect considerations other than expected-payoff calculations to become more important in shaping behavior. Analysis is likely to give way to behavioral rules, and payoff consequences are likely to be assessed not by calculation but by experimentation, in the form of simply playing a strategy to see what happens. Learning thus becomes noisier.

4. EXPERIMENTAL RESULTS

4.1. Treatment Behavior

In period 1, 63 percent of the subjects play X, the payoff-dominant action. Risk dominance is thus not a salient deductive selection principle. Even so, not enough subjects focus on payoff dominance to make the payoff-dominant action a best response, since 0.63 is less than q* = 0.8.

CONTINGENCY TABLE I
TREATMENT BY PERIOD 1 SUBJECT CHOICE
(counts, with row proportions in parentheses)

Treatment   X            Y            Total
0.6R        41 (0.64)    23 (0.36)    64 (1.00)
R           45 (0.70)    19 (0.30)    64 (1.00)
2R          34 (0.53)    30 (0.47)    64 (1.00)
Total       120 (0.63)   72 (0.37)    192 (1.00)

CONTINGENCY TABLE II
TREATMENT BY PERIOD 75 SUBJECT CHOICE
(counts, with row proportions in parentheses)

Treatment   X            Y            Total
0.6R        28 (0.44)    36 (0.56)    64 (1.00)
R           16 (0.25)    48 (0.75)    64 (1.00)
2R          3 (0.05)     61 (0.95)    64 (1.00)
Total       47 (0.24)    145 (0.76)   192 (1.00)
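The period-1 counts in Contingency Table I are enough to recompute the test of whether initial behavior varied by treatment. The sketch below is a plain Pearson chi-square test of independence using only the standard library, with the usual expected counts (row total × column total / N). For 2 degrees of freedom the chi-square tail probability has the closed form e^{−x/2}. For other table shapes, a library routine such as `scipy.stats.chi2_contingency` would be the usual choice. `TABLE_I` and `pearson_chi_square` are illustrative names.

```python
import math

# Contingency Table I: period-1 counts (X, Y) by treatment, copied from the table.
TABLE_I = {"0.6R": (41, 23), "R": (45, 19), "2R": (34, 30)}


def pearson_chi_square(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    rows = [list(r) for r in rows]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    n = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_tot) - 1)
    return stat, dof


stat, dof = pearson_chi_square(TABLE_I.values())
# Closed-form upper tail, valid only for 2 degrees of freedom: P(chi2 > x) = exp(-x / 2).
p_value = math.exp(-stat / 2) if dof == 2 else None
print(round(stat, 3), dof, round(p_value, 3))
```

The recomputed statistic is about 4.13 on 2 degrees of freedom, consistent with the 4.1 reported in the text, and the tail probability is roughly 0.13. The same function applies unchanged to the period-75 counts in Contingency Table II.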
Contingency Table I, which crosses treatment with subject choice in period 1, can be used to test the hypothesis that initial behavior did not vary by treatment. The chi-square statistic is 4.1, which, given 2 degrees of freedom, has a p-value of …. Hence, subjects' slight tendency to play the payoff-dominant action more frequently at the start when the optimization premium is smaller is not statistically significant at conventional levels.

The insignificant difference in initial behavior across treatments grows into a large treatment effect by the end of the session. Contingency Table II shows that in period 75, only 5 percent of subjects in treatment 2R play action X, while 44 percent of subjects in treatment 0.6R are still playing action X. The payoff-dominant action is thus more prevalent in games with smaller optimization premia.

[8] Similar considerations appear in the heterogeneous-payoff model of Myatt and Wallace (1997). In contrast, Kandori, Mailath, and Rob (1993) and Young (1993) use evolutionary arguments based on the best-response function to select the risk-dominant equilibrium of a stag hunt game, regardless of the optimization premium, while Robson and Vega-Redondo (1996) use a similar model to select the payoff-dominant equilibrium. Friedman (1996) suggests that a population may be more likely to move away from the risk-dominant equilibrium as a result of subjects' efforts to teach others that the payoff-dominant equilibrium would be better. This intuition contrasts with the theoretical results of Ellison (1997). See also Camerer, Ho, and Chong (2000), and compare Van Huyck, Cook, and Battalio (1997).

To gain some insight into the dynamics behind these outcomes, let the state x denote the number of subjects choosing action X in a cohort in a given period, so that x ranges from 0 to 8. Table III reports the average of the change in x, denoted by Δx, for each state and treatment.

[Table III: The average change in x given x, by treatment]
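The quantities behind Table III and Figure 5 can be computed from each cohort's sequence of states. In the sketch below, a cohort is represented as a list of x values, one per period; this data layout is an assumption. The two short example cohorts are invented for illustration and are not data from the experiment. `average_change` pools all period-to-period changes Δx that start in state x. `stay_share` divides the count of Δx = 0 by the number of transitions out of each state, which is one assumed reading of the normalization described later in the text. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def changes_by_state(cohorts):
    """Map each state x to the list of changes dx = x[t+1] - x[t] that start from x.

    Each cohort is a list of states x_1, ..., x_T, where x_t (0..8) is the number
    of the cohort's eight subjects choosing X in period t.
    """
    changes = defaultdict(list)
    for states in cohorts:
        for x_now, x_next in zip(states, states[1:]):
            changes[x_now].append(x_next - x_now)
    return changes


def average_change(cohorts):
    """Average dx for each observed state x (the quantity Table III reports)."""
    return {x: sum(d) / len(d) for x, d in sorted(changes_by_state(cohorts).items())}


def change_counts(cohorts):
    """Count of each dx value for each state x (the tallies Figure 5 reports)."""
    return {x: Counter(d) for x, d in sorted(changes_by_state(cohorts).items())}


def stay_share(cohorts):
    """Share of transitions out of state x with dx = 0 (assumed normalization)."""
    return {x: c[0] / sum(c.values()) for x, c in change_counts(cohorts).items()}


# Invented example cohorts, NOT experimental data.
example = [[5, 4, 4, 3, 1, 0, 0, 0], [6, 7, 8, 8, 7]]
print(average_change(example))
print(stay_share(example))
```

A state that is never left in the data, such as x = 2 here, simply has no entry. This mirrors the fact that an average change is defined only for states that actually arose.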
For every x in {2, 3, 4, 5}, larger optimization premia are associated with average changes of larger absolute value, though two of the changes appear to go in the wrong direction in the case of state 3. In contrast, for states near the risk-dominant equilibrium, the largest average changes are attached to the 0.6R treatment, which exhibits a strong tendency to move away from the risk-dominant equilibrium. This suggests that something beyond the considerations captured by the logistic choice model is at work when the optimization premium is small, pushing the population toward the payoff-dominant equilibrium. One candidate is an aspiration-based desire to avoid exceptionally low payoffs.

Figure 5 supplements Table III by reporting the count for each value of Δx that goes into the average change in x. The figure is truncated at ±4 because x never changed by more than 3 from one period to the next. Figure 5 shows that no value of x is perfectly absorbing. However, in treatment 0.6R, the state with the largest count for Δx = 0 was x = 8, the payoff-dominant equilibrium, while for the other two games the largest count for Δx = 0 was at x = 0, the risk-dominant equilibrium. This pattern remains if we normalize the counts by dividing through by the number of times each state x arose in a treatment.

4.2. Cohort Behavior

Our analysis of the results by treatment suggests that initial behavior varies little across treatments. Experience, however, teaches subjects to play the risk-dominant action more effectively the larger the optimization premium. In this section, we examine the data by cohort to develop an understanding of how this happens. Table IV reports the initial and terminal outcome by cohort. All 24 cohorts start in the basin of attraction of the risk-dominant equilibrium (Y, Y). Three 0.6R cohorts, four R cohorts, and five 2R cohorts implement an equilibrium in period 75.
This observation is consistent with Hypothesis 2: cohorts with a larger optimization premium were more likely to have converged to an equilibrium by the end of the session.

If we examine states near the best-response separatrix in Figure 5, that is, states x = 6 and x = 7, we do not find that movements toward the payoff-dominant equilibrium (upward) are especially likely when the optimization premium is small (compare the 0.6R and R cases).

Because our games have identical mixed-equilibrium payoffs, differences in the behavior predictions of the aspiration-and-imitation model across optimization premia disappear as the