Many approaches to learning in games fall into one of two broad classes: reinforcement and belief learning models. Reinforcement learning assumes that successful past actions have a higher probability of being played in the future. Belief learning assumes that players have beliefs about which action the opponent(s) will choose, and that players determine their own choice of action by finding the action with the highest payoff given their beliefs about the actions of others. Belief learning and (a specific type of) reinforcement learning are special cases of a hybrid learning model called Experience Weighted Attraction (EWA). Some previous studies explicitly state that it is difficult to determine the underlying process (either reinforcement learning, belief learning, or something else) that generated the data for several games. This leads to the main question of this thesis: can we distinguish between different types of EWA-based learning, with reinforcement and belief learning as special cases, in repeated 2 x 2 games?

In Chapter 2 we derive predictions for behavior in three types of games using the EWA learning model and the concept of stability: there is a large probability that all players will make the same choice in round t + 1 as in round t. From this we conclude that belief and reinforcement learning can be distinguished, even in 2 x 2 games. Maximum differentiation in behavior resulting from either belief or reinforcement learning is obtained in games with pure Nash equilibria with negative payoffs and at least one other strategy combination with only positive payoffs. Our results help researchers to identify games in which belief and reinforcement learning can be discerned easily.

Our theoretical results imply that the learning models can be distinguished after a sufficient number of rounds has been played, but it is not clear how large that number needs to be. It is also not clear how likely it is that stability actually occurs in game play. To that end, in Chapter 3 we also examine the main question by simulating data from learning models. We use the same three types of 2 x 2 games as before and investigate whether we can discern between reinforcement and belief learning in an experimental setup. Our conclusion is that this is also possible, especially in games with positive payoffs and in the repeated Prisoner's Dilemma game, even when the repeated game has a relatively small number of rounds. We also find that other characteristics of the players' behavior, such as the number of times a player changes strategy and the number of strategy combinations the player uses, can help differentiate between the two learning models.

So far, we have only considered "pure" belief and "pure" reinforcement learning, and nothing in between. In Chapter 4, we therefore consider a broader class of learning models and try to find under which conditions we can re-estimate three parameters of the EWA learning model from simulated data, generated for different games and scenarios. The results show low rates of convergence of the estimation algorithm, and even when the algorithm converges, biased estimates of the parameters are obtained most of the time. Hence, we must conclude that re-estimating the exact parameters in a quantitative manner is difficult in most experimental setups. However, qualitatively we can find patterns that point in the direction of either belief or reinforcement learning.

Finally, in the last chapter, we study the effect of a player's social preferences on his own payoff in 2 x 2 games with only a mixed strategy equilibrium, under the assumption that the other player has no social preferences. We model social preferences with the Fehr-Schmidt inequity aversion model, which contains parameters for "envy" and "spite". Eighteen different mixed equilibrium games are identified that can be classified into Regret games, Risk games, and RiskRegret games, with six games in each class.
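To make the nesting of the two learning models inside EWA concrete, here is a minimal sketch of the standard EWA attraction update (Camerer and Ho's formulation) for one player. This is an illustration, not code from the thesis; the function names, the list representation of attractions, and the parameter names `phi`, `delta`, `rho`, and `lam` are our own, following common notation.

```python
import math

def ewa_update(attractions, n_prev, payoffs, chosen, phi, delta, rho):
    """One EWA attraction update for a single player.

    attractions : prior attractions A_j(t-1), one per own strategy
    n_prev      : experience weight N(t-1)
    payoffs     : payoff each own strategy j would have earned against
                  the opponent's realized action this round
    chosen      : index of the strategy actually played
    phi         : decay of old attractions
    delta       : weight on forgone payoffs ("imagination")
    rho         : decay of the experience weight
    """
    n_new = rho * n_prev + 1.0
    new_attractions = []
    for j, a in enumerate(attractions):
        # The chosen strategy is reinforced by its realized payoff;
        # unchosen strategies by delta times their forgone payoff.
        weight = 1.0 if j == chosen else delta
        new_attractions.append((phi * n_prev * a + weight * payoffs[j]) / n_new)
    return new_attractions, n_new

def logit_choice_probs(attractions, lam):
    """Logit response: P_j is proportional to exp(lam * A_j)."""
    exps = [math.exp(lam * a) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]
```

With `delta = 0` and `rho = 0` only realized payoffs reinforce the chosen action, which is cumulative reinforcement learning; with `delta = 1` and `rho = phi`, forgone payoffs count fully and the update reduces to weighted fictitious play, i.e. belief learning.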
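The Fehr-Schmidt inequity aversion model used in the last chapter can be sketched in its textbook two-player form, where `alpha` weights disadvantageous inequality (the "envy" parameter) and `beta` weights advantageous inequality (the "spite" parameter in this summary's terminology). The function name is illustrative, not from the thesis.

```python
def fehr_schmidt_utility(own, other, alpha, beta):
    """Fehr-Schmidt inequity-averse utility for a two-player game.

    own, other : material payoffs of the player and the opponent
    alpha      : weight on being behind the opponent ("envy")
    beta       : weight on being ahead of the opponent ("spite")
    """
    behind = alpha * max(other - own, 0.0)  # disadvantageous inequality
    ahead = beta * max(own - other, 0.0)    # advantageous inequality
    return own - behind - ahead
```

For example, a player with `alpha = 0.5` values a (3, 5) outcome at 3 - 0.5 * 2 = 2, so inequity aversion can shift which mixed strategies are a best response relative to purely selfish play.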