Does Capitalism Need a Government to Be Nice: Robert Axelrod and His Iterated Prisoner’s Dilemma Computer Tournament

In the past few decades, the progressive political ideal that strong central governments or collective morality are necessary to enforce cooperative behavior has become widely accepted. However, many philosophers throughout history, especially those from the classical liberal tradition, have espoused free-market, open-society ideals, arguing not only that governments are the source of much of the immorality and conflict many are afraid of, but also that man left in a free environment can and will cooperate and naturally develop moral systems that allow for economic and societal development. More recently, Robert Axelrod studied this debate using the famous Prisoner's Dilemma (PD) game from traditional game theory and applied computer science to find that cooperation can indeed arise naturally, even between selfish prisoners. This paper develops the study further by using computer simulations, i.e., agent-based modeling, to examine more realistic PD games involving multiple players, and finds that Axelrod's original conclusions hold true.

ego and to do what best helps themselves. Philosophically, those who think like Rand stress an inextricable relationship between selfishness and productivity, which would evolve into a relationship between economic liberty and political freedom, capitalism, and democracy. This is a common line of reasoning among classical liberals and libertarians. They deny the necessity of a central authority to mediate economic activities but support a minimal state whose role is limited to that of a night watchman (Boaz, 2015, p. 26).
If capitalism has a conundrum to which it must answer, it is "Under what conditions will cooperation emerge in a world of egoists without central authority?" (Axelrod, 2009, p. 3). In a predominantly selfish society where all are "free to cooperate or not", the absence of a central authority seems intimidating because everyone works for themselves and thus no one works for the whole. The idea that humans are purely motivated by greed, ego, and profit was initially a sound explanation of how competition and productivity improved under capitalism and of how the invisible hand promotes the best outcome for all. However, it is also an obsession with self-interest that causes capitalism's most pernicious flaws, namely growth in economic inequality and moral hazard (Piketty, 2017, p. 304).
It has long been a concern of philosophers, economists, and politicians to figure out whether humans are born cooperative or altruistic enough that, even without a forceful central authority, capitalism can prosper without individual greed bringing about the ruin of other individuals. Are humans too selfish to persist in society on our own? Right- and left-wing theorists give two different answers. John Locke provided a seminal answer on which conservatives and libertarians tend to base their own. Locke's Two Treatises firmly hold that humans are consistently selfish and evil by nature (Forster & Parker, 2008, p. 169). Based on Locke's writings, many right-wing economists and thinkers today continue to claim that humans are by nature self-centered (Machan, 2006, p. 100). They also go a step further: they argue not only that selfishness is the de facto state of things but also that selfishness itself should be understood as a virtue that leads to rational choices (Ozinga, 1999, p. 92). The idea of unyielding selfishness, and its elevation to a virtue, in turn prompts left-wing, progressive thinkers to claim that a strong, robust regime must be in charge of the economy and politics to prevent men's selfishness from obliterating humanity. However, what many of those thinkers tend to leave out is that Locke himself argued this within the Two Treatises, and a significant number of intellectual heavyweights contend that a stateless society is a chaotic one.
Hobbes, in particular, argues that the state of nature is one of bellum omnium contra omnes (the war of all against all), in which, without a powerful, centralized government, there is neither protection of private property nor individual liberty but a tragedy for all (Karlberg & Buell, 2005, p. 22). Many progressives today also sympathize with Hobbes' viewpoint and support the existence of a strong central government that can elicit cooperation among selfish individuals for the common good of all (Shambra & West, 2007, pp. 1-2).
A capitalistic society has not always been as competitive and prosperous as hard-wired rightists had envisioned it to be. The rise of global wealth disparity and labor exploitation is the tip of the iceberg of a problematic, selfish, and uncooperative society. This is yet another indication that capitalists can no longer sustain their society by repeating an anachronistic and anti-socialist narrative. For now, the threat that capitalism faces is not just political (as in the Cold War) but rather economic: an uncooperative society not only unjustly concentrates wealth in the hands of the top elites but, in the end, debilitates the economy by impoverishing social minorities, encouraging moral hazard, and prompting precipitous resource depletion (Appleby, 2010, pp. 416-417).
www.scholink.org/ojs/index.php/fet Frontiers in Education Technology Vol. 3, No. 2, 2020 Published by SCHOLINK INC.
A solution to growing inequality, climate change, and a remarkable lack of "morality", all rapidly increasing under capitalism, requires an understanding of how cooperation can be elicited in a capitalistic society without central authority. This paper undertakes that job by borrowing the work of Professor Robert Axelrod, who used computer science and game theory to study the conditions under which cooperation arises without a central authority. His work is an analytic demonstration that cooperation need not be a product of morality, compassion, love, or selfless concern for others but can, indeed, be a product of self-interest, selfishness, and egoism: the very features of capitalism that have been vociferously criticized for their supposed "immorality" and "incompatibility" with cooperation. In the end, Robert Axelrod's iterated prisoner's dilemma computer tournament provides a general insight into how cooperative strategies can arise out of selfishness. The present paper aims to extend his work by applying it to the question of how to realize cooperative capitalism.
Furthermore, this paper will identify specific conditions under which participants in greed-driven capitalism can cooperate rather than defect to yield the most productive outcome for themselves and the rest of society.

Game Theory and the Prisoner's Dilemma
Game theory is the study of the strategies that rational decision-makers would use against competing actors. It is widely applied in economics, politics, diplomacy, business, and even biology. Axelrod, through his research on game theory, originally wanted to see what would happen to a decision maker's strategies if their counterparts were to compete and cooperate using multiple strategies. To conduct his experiment, he organized an iterated prisoner's dilemma computer tournament, receiving submissions of strategies from game theorists around the world. His studies of the strategies that produced the best outcomes in the tournament are compiled in his work The Evolution of Cooperation, on which this paper relies heavily in explaining how the prisoner's dilemma works. ends up serving three years alone in prison (vice versa) (Kuhn). Although researchers have applied different variations to PD, both prisoners are assumed to be wholly self-centered and rational, and there is no coercion or external influence pushing the prisoners toward a particular choice.  cooperation. The choice that was considered rational and selfish led to a result far worse than what was possible. The prisoner's dilemma is simply an abstraction of situations in which the choice that is best for each player individually leads, ironically, to a worse outcome than mutual cooperation.

Prisoner's Dilemma
For a game to be a prisoner's dilemma, it must satisfy two conditions. The first condition concerns the order of the payoffs. The best payoff a player can get must be the temptation to defect when the other player cooperates (T). Then follows the reward for mutual cooperation (R), which is followed by the punishment for mutual defection (P) and the sucker's payoff (S), in that order. In short, the payoffs are ranked from best to worst as T > R > P > S. The second condition is that the dilemma cannot be resolved by players simply taking turns exploiting each other. In other words, the outcome of mutual cooperation must be better than having an equal chance of exploiting and being exploited. Numerically, the mean of T and S must be less than R, that is, R > (T + S)/2.
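The two conditions can be checked mechanically. The following is a minimal sketch, assuming the payoff values awarded in Axelrod's tournament as described later in this paper (T = 5, R = 3, P = 1, S = 0); the function name is illustrative and not part of Axelrod's work.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Return True when the payoffs satisfy both conditions of a prisoner's dilemma."""
    # Condition 1: temptation > reward > punishment > sucker's payoff.
    ordering_holds = T > R > P > S
    # Condition 2: mutual cooperation beats taking turns exploiting each other.
    no_gain_from_alternating = R > (T + S) / 2
    return ordering_holds and no_gain_from_alternating


# Tournament payoffs: T=5, R=3, P=1, S=0.
print(is_prisoners_dilemma(5, 3, 1, 0))  # True
# Raising T to 8 keeps the ordering but makes alternation pay: (8 + 0) / 2 = 4 > 3.
print(is_prisoners_dilemma(8, 3, 1, 0))  # False
```

The second call shows why the second condition matters: with a large enough temptation payoff, two players could do better by alternately exploiting each other than by cooperating, and the game would no longer pose the same dilemma.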
By now, one may have noticed the resemblance between a prisoner's dilemma and libertarian capitalism. It lies in the way mutual defection, which initially sounded like the most rational choice for each individual, ends up punishing both players. Similarly, individuals within capitalism who make uncooperative choices to profit themselves can end up losing more than they would have if they had cooperated. A single-round prisoner's dilemma, as explained above, seems to show that cooperation has difficulty emerging even though its outcome, the reward for mutual cooperation, is more profitable for all than the punishment for mutual defection.
However, what if the Prisoner's Dilemma is not played in a single round but over multiple rounds? If players are repeatedly granted a new opportunity to participate in another round of the game, would they still insist on defecting and getting punished, or would they revise their strategy and opt for mutual cooperation? Under such conditions, is defecting still the dominant strategy? The experiments of Axelrod discussed in the following sections are necessary for addressing those questions.

Definite Iterated Prisoner's Dilemma
In the iterated version, the rules of the Prisoner's Dilemma are slightly modified. Here, the players are asked to play the prisoner's dilemma more than once, and they may remember the decisions that their opponent made in the previous rounds of the game. Therefore, both players can change their own decisions accordingly in the succeeding rounds. This version of the game is commonly known as the always to defect. What about the round right before the final one? Again, the players fully understand that their current choice will have no influence on their future payoff, because both players will defect under any circumstances in the final round, so both decide to defect as well. The same goes for the round before that, and so on. In other words, in a definite iterated prisoner's dilemma, the dominant strategy in every round of the game is to defect. Neither of the players will cooperate, and the problems of a single-shot prisoner's dilemma are left unresolved.

Indefinite Iterated Prisoner's Dilemma
In an indefinite IPD, the prisoner's dilemma is repeated indeterminately, meaning no player knows when the game will end or even whether it will end. Because the end of the game in particular is indeterminate, no player can identify the final round. Nevertheless, the players do understand that the game will end someday (it could in principle carry on indefinitely, but that is very unlikely), and the probability that the game is still being played shrinks as more rounds pass. Mathematically, the probability that the prisoner's dilemma is played again after each round is known as the discount factor. Because the future is valued less than the present, payoffs received in the future are not as valuable as payoffs received today. In other words, future payoffs are discounted so that they are worth a fraction of what they would be if they were received today instead. Axelrod defines the discount parameter as "the degree to which the payoff of each move is discounted relative to the previous move" (p. 13). A $100 payoff received in one month would be worth only $80 today for a discount parameter of 80%.
For a series of payoffs received over multiple iterations of a game, the discount parameter determines how much they are worth. Consider a game that is played 5 times with a payoff of 10 received each time and a discount parameter, p, of 90%. The present value of the payoffs for this game is 10 + 10(0.9) + 10(0.9)^2 + 10(0.9)^3 + 10(0.9)^4 = 40.951. If the future were not discounted at all (discount parameter equal to 100%), the present value of the game would be 10 + 10 + 10 + 10 + 10 = 5(10) = 50. When the future is discounted, the present value of the series of payoffs is less than it would have been with no discounting. For the same game played an infinite number of times, the cumulative value of the payoffs is 10 + 10(0.9) + 10(0.9)^2 + 10(0.9)^3 + 10(0.9)^4 + … = 10/(1 - 0.9) = 100. This comes from the fact that the infinite geometric series 1 + p + p^2 + p^3 + … with discount parameter p sums to 1/(1 - p).
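The arithmetic above can be reproduced with a short calculation. This is a minimal sketch; the function names are illustrative, and the infinite case uses the geometric-series closed form, which for a constant payoff of 10 gives 10/(1 - p).

```python
def present_value(payoff, p, rounds):
    """Discounted value of a constant payoff received in each of `rounds` rounds."""
    # Round k (starting at 0) is worth payoff * p**k today.
    return sum(payoff * p**k for k in range(rounds))


def present_value_infinite(payoff, p):
    """Closed form payoff / (1 - p) for infinitely many rounds, valid for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("the infinite sum only converges for 0 <= p < 1")
    return payoff / (1 - p)


print(round(present_value(10, 0.9, 5), 3))        # 40.951
print(round(present_value(10, 1.0, 5), 3))        # 50.0
print(round(present_value_infinite(10, 0.9), 3))  # 100.0
```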
Further, as the discount value p approaches zero, the dominant strategy approaches defection because the game more and more resembles a single-shot prisoner's dilemma. On the other hand, as p approaches one, the game resembles an infinite IPD, and the value of cooperating increases while the value of defecting decreases (a high discount factor means that choosing to defect can lead to retaliation in the subsequent rounds). Although in an indefinite IPD no player knows what the upper bound on the number of rounds will be, after every round there is a probability of 1 - p that the game will stop being played.
To sum up, players in a definite IPD can reason by backward induction from the fact that rational players are bound to defect in the final round of the game. By contrast, it intuitively seems that players cannot reason this way in an indefinite IPD. Nevertheless, even in an indefinite IPD, the players may be aware of the discount value p from the outset and know that the probability p_i of reaching each stage i decreases as i grows. In such circumstances, a player at a particular stage may conclude that the prospect of future rewards and the harm of future retaliation no longer outweigh the benefits of immediate defection. When a player can predict or reasonably expect a stage i at which both players are likely to defect, backward induction is possible in an indefinite IPD as well.
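How large p must be before cooperation pays can be illustrated with a simplified comparison. The sketch below assumes the tournament payoffs (T = 5, R = 3, P = 1, S = 0) and a hypothetical partner who cooperates as long as the player does and defects in every round after being defected against. It compares only two plans, always cooperating and always defecting, so it is an illustration rather than a full stability analysis.

```python
T, R, P, S = 5, 3, 1, 0  # tournament payoffs; S is listed for completeness


def value_always_cooperate(p):
    """R in every round: R + pR + p^2 R + ... = R / (1 - p)."""
    return R / (1 - p)


def value_always_defect(p):
    """T in the first round, then P in every later round once the partner retaliates."""
    return T + p * P / (1 - p)


def cooperation_threshold():
    """Smallest p at which always cooperating is at least as good as always defecting.

    From R / (1 - p) >= T + pP / (1 - p) it follows that p >= (T - R) / (T - P).
    """
    return (T - R) / (T - P)


for p in (0.2, 0.5, 0.9):
    print(p, round(value_always_cooperate(p), 2), round(value_always_defect(p), 2))
print(cooperation_threshold())  # 0.5
```

Under these assumptions defection is worth more at p = 0.2, the two plans tie at p = 0.5, and cooperation is worth more than twice as much at p = 0.9, which matches the qualitative claim that a longer shadow of the future favors cooperation.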

Robert Axelrod and Iterated Prisoner's Dilemma
If an indefinite IPD can yield a dominant strategy other than absolute defection, it is worthwhile to examine strategies that can elicit cooperation among players. Robert Axelrod conducted an IPD computer tournament to examine the effectiveness of different strategies. In the tournament, each participant wrote a computer program that encodes whether to make a cooperative or uncooperative choice in each round. The program has access to the full history of moves so far and can use that history to make its choice in the next round. The tournament was a round-robin: every entry was matched with every other entry, as well as with its own twin and with RANDOM, a strategy that cooperates or defects at random. The tournament awarded three points for mutual cooperation, one point for mutual defection, five points for the temptation to defect, and none for the sucker's payoff.

Tournament and Results
With fourteen entries submitted by experts in economics, mathematics, political science, sociology, and psychology, the most successful strategy was TIT FOR TAT, turned in by Anatol Rapoport, a professor at the University of Toronto (Simpson 16 implying that the strategy was repeatedly successful in eliciting mutual cooperation. However, surprisingly enough, each of the eight top-ranking entries, including TIT FOR TAT, was never the first to defect. The property of never defecting first is known as being "nice". In the tournament, "nice" strategies recorded scores between 472 and 504, whereas the highest score that a not-nice strategy received was only 401. What made Rapoport's TIT FOR TAT the most successful among all the "nice" strategies was its play against not-nice strategies. A majority of "nice" strategies score well when playing against their twin or other "nice" strategies because both are sure to cooperate until the very end of the game.

Replication of Tournament
In order to gain a deeper understanding of the mechanics and dynamics of the tournament, the strategies, and the indefinite IPD, I programmed a few of the major strategies, i.e., TIT FOR TAT, JOSS, and TRANQUILIZER, and ran the tournament as a computer simulation. The results were similar to Axelrod's, confirming the effectiveness of cooperative strategies such as TIT FOR TAT. The pseudocode for the strategies is explained below.
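A replication along these lines might be sketched as follows. This is an illustrative Python sketch, not the code used for the results reported here. It assumes the tournament payoffs (3, 1, 5, 0), games of a fixed 200 rounds, and the usual description of TIT FOR TAT (cooperate first, then copy the opponent's previous move). JOSS follows the description given in this paper, and TRANQUILIZER is omitted because its rules are too involved for a short example. All function names are hypothetical.

```python
import random

# (my move, opponent's move) -> (my points, opponent's points)
PAYOFF = {
    ("C", "C"): (3, 3),  # reward for mutual cooperation
    ("C", "D"): (0, 5),  # sucker's payoff vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # punishment for mutual defection
}


def tit_for_tat(own, other, rng):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not other else other[-1]


def joss(own, other, rng):
    """Like TIT FOR TAT, but defect 10% of the time after the opponent cooperates."""
    if not other:
        return "C"
    if other[-1] == "D":
        return "D"
    return "D" if rng.random() < 0.10 else "C"


def random_strategy(own, other, rng):
    """Cooperate or defect with equal probability."""
    return rng.choice("CD")


def play_match(strat_a, strat_b, rounds, rng):
    """Play one game and return the total score of each side."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b, rng)
        move_b = strat_b(hist_b, hist_a, rng)
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, rounds=200, seed=0):
    """Match every entry with every other entry and with its own twin."""
    rng = random.Random(seed)
    names = list(strategies)
    totals = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i:]:  # names[i:] starts at a itself, i.e. the twin match
            score_a, score_b = play_match(strategies[a], strategies[b], rounds, rng)
            totals[a] += score_a
            if a != b:
                totals[b] += score_b
    return totals


totals = round_robin(
    {"TIT FOR TAT": tit_for_tat, "JOSS": joss, "RANDOM": random_strategy}
)
print(totals)
```

With only three entries the ranking depends heavily on the mix of opponents and on the random seed, so the totals printed here should not be read as a reproduction of Axelrod's scores.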

Analysis of Tournament Results
Further, one of the most important traits of successful strategies was the propensity to forgive, and one of the most damaging flaws of unsuccessful strategies was being unforgiving. Of all the "nice" strategies, the one with the lowest score was FRIEDMAN, which was "nice" because it never defected first but also unforgiving because, once the other player defected even once, it retaliated with permanent defection.
Another example of how a short-term unwillingness to forgive is detrimental to earning points is shown in a duel between TIT FOR TAT and JOSS (Axelrod, Effective Choice in the Prisoner's Dilemma 14). JOSS starts off cooperating and mirrors the choice that its opponent made in the previous round, but instead of always cooperating after the opponent cooperated, it defects ten percent of the time. Thus, JOSS attempts to exploit its opponent in a sly manner. In a game between TIT FOR TAT and JOSS, the outcome could hardly be worse, owing to the short-term unwillingness of both to forgive. When JOSS, with its ten percent probability, decides to defect, TIT FOR TAT immediately retaliates by defecting, and JOSS in turn responds to that defection with another. The result is an endless chain of recriminations in which a single defection echoes back and forth (remember that the second rule of a prisoner's dilemma states that the outcome of mutual cooperation is better than that of alternating between exploiting and being exploited). JOSS shows how important it is that a single defection not spur an endless series of recriminations and counter-recriminations. In other words, the strategy that wins is the one that can forgive defections and cultivate mutual cooperation in the following rounds.
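The echo effect can be traced by hand or with a small deterministic sketch. To keep the trace reproducible, the example below replaces JOSS's random ten percent with a hypothetical "scripted" JOSS that behaves like TIT FOR TAT except for unprovoked defections in two chosen rounds (rounds 3 and 8, counting from 0). The payoffs are the tournament's (3, 1, 5, 0); everything else is an assumption made for illustration.

```python
# (first player's move, second player's move) -> (first's points, second's points)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(other_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not other_history else other_history[-1]


def make_scripted_joss(defect_rounds):
    """TIT FOR TAT plus unprovoked defections in the given 0-based rounds.

    A deterministic stand-in for JOSS's random 10% defections.
    """
    def strategy(other_history):
        round_index = len(other_history)
        if round_index in defect_rounds:
            return "D"
        return tit_for_tat(other_history)
    return strategy


def play(strategy_a, strategy_b, rounds):
    """Return both move sequences and the per-round payoffs."""
    hist_a, hist_b, scores = [], [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        scores.append(PAYOFF[(move_a, move_b)])
    return "".join(hist_a), "".join(hist_b), scores


tft_moves, joss_moves, scores = play(tit_for_tat, make_scripted_joss({3, 8}), 12)
tft_total = sum(a for a, _ in scores)
joss_total = sum(b for _, b in scores)

print(tft_moves)   # CCCCDCDCDDDD
print(joss_moves)  # CCCDCDCDDDDD
print(tft_total, joss_total)  # 23 28
```

After the first unprovoked defection the two players alternate between exploiting and being exploited, averaging 2.5 points per round instead of 3. After the second, both are locked into mutual defection at 1 point per round. Twelve rounds of mutual cooperation would have given each player 36 points.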
Using the examples above, Professor Robert Axelrod analyzes four properties that made TIT FOR TAT the most successful strategy. The first is the strategy's being "nice." TIT FOR TAT never starts off defecting, thus eliciting a reasonable degree of mutual cooperation with different strategies.
The second is a willingness to retaliate when provoked. TIT FOR TAT avenges itself against opponents that have made uncalled-for defections, warning them that defecting will not pay because TIT FOR TAT is not exploitable.
What makes TIT FOR TAT's short-term retaliation effective is the forgiveness that immediately follows. TIT FOR TAT opens itself to resuming cooperation even when the opponent has a record of defection. By forgiving the opponent, TIT FOR TAT helps its opponent understand that mutual cooperation will bring the best outcome for both strategies.
Professor Axelrod tested the robustness of his results by conducting a second IPD computer tournament with a far larger field and a fuller range of strategies. To get rid of small end-game effects, instead of running games of exactly 200 moves, Axelrod set the length of each game probabilistically, fixing the probability that the game ends after any given move at 0.00346 (equivalently, a discount parameter of 0.99654). Participants were all given a detailed analysis of the first tournament's outcomes, and all the entrants understood clearly that TIT FOR TAT had ranked first. Surprisingly enough, despite a larger pool of sixty-two entries from six countries, the strategy that won the tournament was again TIT FOR TAT. Although entrants were informed of TIT FOR TAT's niceness and forgiveness, none was able to come up with a better program than TIT FOR TAT.
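The effect of the probabilistic ending can be checked with a short calculation. The sketch below reads 0.00346 as the chance that a game ends after any given move, so that the chance of continuing, the discount parameter, is 0.99654. The variable names are illustrative. Under that reading the game length follows a geometric distribution.

```python
import math

END_PROBABILITY = 0.00346   # chance that the game ends after any given move
w = 1 - END_PROBABILITY     # discount parameter: chance that another move follows

# Mean of a geometric distribution with success probability END_PROBABILITY.
expected_length = 1 / END_PROBABILITY
# Number of moves n at which the chance of the game still running, w**n, falls to one half.
median_length = math.log(0.5) / math.log(w)

print(round(w, 5))             # 0.99654
print(round(expected_length))  # 289
print(round(median_length))    # 200
```

The median of about 200 moves matches the fixed game length of the first tournament, while no player can tell in advance which move will be the last.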
The second tournament, unlike the first, is difficult to analyze because of its sheer size (there were over a million moves from 63 strategies matched in 3969 different ways), but it once again paid to be nice. Fourteen out of the top fifteen strategies were "nice", and fourteen out of the bottom fifteen strategies were not nice (Axelrod, Effective Choice in the Prisoner's Dilemma 21-24). Then again, among those nice rules, only the ones that retaliated were able to deter opponents from adopting exploitative strategies of repeated defection. A strategy that lacked this retaliatory property did poorly in the tournament. One such strategy, DOWNING, which performed so well against top contenders in the first tournament, did not perform well in the second because it was designed to keep cooperating with any program that had cooperated more than half of the time up to that point.
Another strategy that exploited others' unwillingness to retaliate is TRANQUILIZER. TRANQUILIZER spends the first several rounds cooperating to cultivate an atmosphere of mutual cooperation. Once a stable pattern of mutual cooperation has formed, TRANQUILIZER lulls the opponent into forgiving occasional defections. The strategy never defects twice in succession and never in more than one-quarter of the rounds, thus avoiding arousing its opponents' suspicion while eliciting their forgiveness for sporadic defections. Again, a lack of retaliation after an "uncalled for" defection should not be confused with niceness. A successful strategy is "nice" and forgiving yet also retaliatory. TIT FOR TAT had all of these properties, which made it the champion of both the first and the second tournament. Being provoked by an opponent's defection is also different from being exploitative. TRANQUILIZER did well at exploiting unresponsive programs, but overall it did not do well in the tournament because its tendency to sneakily exploit its opponents ended up being punished, resulting in lower scores and a less rewarding game than mutual cooperation would have provided.
It is nevertheless important to remember that in a game of IPD there is no absolute best rule independent of its opponents. TIT FOR TAT won both tournaments not because it was absolutely the strongest but because it was generally stable and robust against a wide variety of strategies and environments. Other strategies learn that doing well against TIT FOR TAT requires mutual cooperation, and the clarity of TIT FOR TAT's behavior lets them grasp this quickly. A wide range of strategies, including the most exploitative ones like TESTER, understood this and apologized for their defection by cooperating in the following rounds. On the other hand, TIT FOR TAT itself also abandoned the possibility of exploiting others, both to avoid risking retaliation and, more importantly, to stop mutual recriminations from taking hold of the entire game. The combination of being "nice", retaliatory, and forgiving made TIT FOR TAT the most persistent and sturdy rule against different strategies in different environments.

Multi-Agent-Based Model of N-Person Prisoner's Dilemma
A common objection to Axelrod's depiction of cooperation using the Prisoner's Dilemma is that it is unrealistic, as most situations involve more than two agents. That is, we seldom see a one-versus-one dynamic. Rather, more than one agent is commonly involved, and each of them may be employing a different strategy. For example, the current trade war between the US and China, while fought mainly between the two countries, involves numerous other countries. Taiwan has its own conflict with China and is using ties with the US as a protective measure. Vietnam and other Southeast Asian countries are trying to win business from US companies by lobbying for manufacturing operations to be moved to their own countries.
Understanding in more detail how cooperation can emerge in complex, realistic situations is not an easy task.  Agents either cooperate or defect. But each agent can change that strategy depending on what happens within its environment. The environments are modeled below; taken together, they are varied and numerous enough to represent the various strategies agents would use in real life. The key parameter p means that a player will continue with its current strategy with probability p and switch to the other strategy with probability 1 - p.

4) Greedy setup
The Greedy strategy usually produces a stochastic process, but one different from the Pavlovian one.
Initially, everyone gravitates toward defecting, since Defectors had the highest payoff. But if there is a sufficiently large number of Cooperators, the Cooperators will collectively score higher (3) than the Defectors (who score 0 because they defect against each other). Many agents will then start to convert to Cooperators.
The Greedy environment is the most realistic, as herding behavior has been observed in numerous studies in a variety of fields. As the herding instinct drives agents toward certain behaviors, cooperation starts to take effect.

Conclusion
Are humans purely selfish, or can we cooperate when we need to without the hand of a central authority? It is a question that philosophers, political thinkers, and economists have asked for years, and neither side has yet won the battle. When Axelrod asked, "Under what conditions will cooperation emerge in a world of egoists without central authority?", he found an answer through the Prisoner's Dilemma experiment. Although capitalism comes with issues such as moral hazard and wealth inequality, Axelrod's experiments show that there is not necessarily overwhelming evidence that a powerful, centralized government is the answer to the issues capitalism presents.
Axelrod's work provides an analytical account of how cooperation arises without the heavy hand of a central authority. It shows that cooperation does not require an absence of selfishness; mutual cooperation can, in fact, be a product of self-interest. The experiments showed that successful strategies, even amongst prisoners (who are assumed to be inherently selfish), are characterized by cooperative behaviors. Thus, Axelrod shows that even in an anarchistic environment cooperation can happen, and perhaps the classical liberals are right in saying we do not need a central state to police its citizenry. Even if it is not conclusive, at the very least it casts some doubt on the common rationale among government officials that, without a large government, society would descend into chaos.