Evolving Core Warriors Dave Hillis Class Project for Evolutionary Computation, 1998 Introduction ------------ This paper describes the results of a project that applied genetic programming (GP) techniques in an attempt to generate core wars programs (called warriors) that would be competitive with those generated by experienced human programmers. Core wars is an attractive application; the fitness of a program involves complicated interactions with multiple competing agents and there is a public domain simulator and debugger (pmars) and a large number of published hand-written programs to compare against. Although the project did not produce a warrior strong enough to challenge current tournament winners, it did produce warriors whose strength and sophistication surpassed those previously reported in the literature for evolved warriors. Core Wars --------- Core wars is a game in which 2 virus-like computer programs battle each other for control of a virtual core memory. The game first became widely known in the 1980's with the publication of 2 articles by Dewdny[1]. The programs are written in "redcode" which is similar to an assembly language. In a battle, a simulator program loads 2 warriors into random locations in memory. The remainder of the memory contains "DAT" instructions - any process that tries to execute a "DAT" instruction is terminated. The simulator alternates virtual machine cycles between the warriors. Each warrior can read or write into any memory location; allowing it to copy parts of itself and to modify it's own code or its opponent's. A warrior may spawn several independent processes but does not increase its share of the machine cycles by doing so. A warrior loses when all of its processes have terminated. If both warriors survive past a fixed number of cycles (usually 80,000) a tie is declared. Players compete by writing warriors and entering them into tournaments. In a tournament, each warrior battles all the others (usually 200 times each with random starting locations). Each tie earns 1 point, each win earns 3, and each loss earns nothing. The warrior's score is the total over all of its battles. There are a number of king of the hill tournaments that run continuously via email submission. In a king of the hill tournament, 25 or so warriors occupy a virtual hill and other warriors try to displace them. Each warrior on the hill has a hill-score consisting of its average score against all of the warriors (including itself) on the hill. A warrior challenges the hill by playing against all of the warriors on the hill (200 times each) thus obtaining its own hill-score. If the challenging warrior has a higher hill-score than the lowest scored member of the hill, it replaces that warrior on the hill. (The new warrior on the hill need not necessarily have won any of its battles with the displaced warrior: only the average score matters.) Core wars players tend to group strategies into 3 broad categories. Stone strategies involve blindly changing (or "bombing") as much memory as possible, usually by copying DAT commands into different addresses in hopes of disrupting the other warrior's program. Paper strategies try to fill up the memory with independently running, self replicating, copies of themselves. Scissors strategies try to find where the other program is and then destroy it. As the names imply: scissors tend to beat papers which tend to beat stones, which tend to beat scissors. A warrior may use one or a combination of strategies, but a tournament can always be expected to include a mix of strategies. A warrior must do reasonably well against all of these strategies to do well in a tournament. Beginning core wars players commonly test their programs against the Wilkies benchmark: a group of 12 warriors. The benchmark score is the average obtained by competing a warrior 100 times against each of the 12 warriors and dividing by 12. A warrior that won all of its battles would score 300. A warrior that won exactly half of its battles and lost half of its battles would score 150. A warrior that tied in every battle would score 100. A "decent" human written warrior should score well over 100. A full description of core wars is beyond the scope of this paper. The KOTH website [2] contains redcode reference manuals, tutorials, free software, access to on-line tournaments, and many related links. Core Wars and Evolutionary Computation -------------------------------------- Previous Work ------------- Core wars has inspired a number of Alife studies involving evolutionary computation. Some like TIERRA[3] and Venus[4] involve a core wars-like environment in which programs compete in a virtual memory filled with "food" and other resources. Others, which I will call pure core wars studies, use the standard core wars rules: allowing their results to be directly compared to a large body of human generated programs. (Such a comparison is not necessarily fair since these were intended to be Alife experiments.) To date, I have been unable to find a published paper on evolved pure core warriors. The following examples come from unpublished papers posted on the world wide web. Perry[5] describes a study in which he used a GA with mutation and multi-point crossover and a binary representation. He experimented both with an exogenous fitness function (competing the individuals against a suite of test warriors) and an endogenous one (competing the individuals against each other). Although the compute power he had available severely limited the experiments performed, he did a good job of describing the attractiveness of the problem and of the technical hurdles involved. His paper has a link from the KOTH web site and the other pure core wars studies (below) all cite it. Boer[6] describes a study in which he used an evolutionary strategy and an endogenous fitness function to evolve warriors. His program, ga_war.c, uses the redcode source files as the genetic representation, and a mutation operator, but no crossover. The file based approach is slow but allows one to use the publicly available "pmars" simulator[2] for the objective function. He makes the source code for ga_war.c freely available. ga_war.c, forms an initial population by randomly generating one line warriors, testing them against a particular (very weak) warrior and only accepting those that win or tie against it. Once an initial population is formed, 2 warriors are selected at random and conduct some number of battles (nominally 5) against each other. If 1 warrior outscores the other, the looser is replaced by an exact copy of the winner. Otherwise both warriors are replaced by mutated copies of themselves. The program generates new warriors in this way for some fixed number of iterations. Mutation can add lines, delete lines, or modify any fields in any lines. If a mutated warrior cannot win or tie against the weak fixed warrior, the program tries again until an acceptable warrior is generated. A warrior that has been replaced may, depending on how many battles it has won before, be sent to "Valhalla" from whence it may return to the population in some later generation. Coleman[7] uses Boer's program and obtains similar results. Newton[8] uses Boer's program, a program called "Sys 4" written by George Lebl, and a program of his own called "RedMaker. Each of the pure core wars studies described above included some comparison with human derived warriors. None of them produced a warrior that was competitive with good human written ones or reported a Wilkies benchmark score over 85. Technical Challenges -------------------- The most serious challenge is that the quality of human written programs is high. Many people have been competing against each other for years and the resulting programs are highly optimized. Fitness function ---------------- A pure core wars fitness function makes the problem a reinforcement learning one. In the pure core wars studies, the fitness function is based entirely on the number of wins, loses and ties of the individual over some number of battles. There is no placing of "food" in memory or providing additional mating opportunities to programs that split as ways to lead the population into desired behaviors like self-replication and exploration. Placing the warriors at random starting positions for each battle adds a strong stochastic element to the outcome of a battle. This is a problem in that many (very time consuming) battles between 2 individual warriors are needed to measure how well they do against each other. It is also a blessing in that a relatively weak warrior will still win against a stronger opponent in the battles where its starting position is sufficiently favorable. This stochastic element smoothes out the fitness function. It is much more useful to be able to say that warrior A wins 10% of its battles with warrior B than to say only that it is weaker. For a tournament, 100 to 200 battles are fought between individual warriors. For most of the studies reviewed, only 5 or 10 battles were used - relying on the GA's ability to work with noisy evaluation functions. For an endogenous fitness function, the fitness is based on battles against other warriors in the population. But to be competitive against human coded warriors a warrior must be able compete against a range of strategies. Unless the population contains a mix of strategies (scissors, paper, stone), the individuals need not evolve to deal with a mix of strategies. None of the pure core wars studies reviewed reported the generation of paper or scissors strategies. An alternative is to use an exogenous fitness function by training the warriors against a static (unchanging) hill of human coded warriors. In practice, it is difficult to select a hill that is just hard enough and there is always the danger that the population will not evolve generally useful strategies. Evolvability ------------ The search space of solutions is large. For a single line in a warrior, there are 17 possible command types each of which has 7 possible modifiers and 2 fields. Each field contains a number between 0 and 7999 as well as its own addressing mode. There are 8 different addressing modes. There are (17*7*8*8000*8*8000) unique possible lines. A warrior may contain up to 100 lines. In the pure core wars studies, the population always converged toward stone-type solutions that were very short (not counting introns), efficient, and ultimately not very good. Since a core Wars battle is a race to kill before you are killed, any change that slows a warrior down is strongly selected against. Any independent process that a warrior splits off will be parasitic (wasting precious cycles) unless it is immediately useful. Evolution pushes toward warriors with short functioning code lengths both to increase efficiency and in response to disruption from crossover or mutation. The operators themselves also resist the development of complex programs. One way to allow a GP to escape this evolutionary dead end may be to force it to evolve warriors with multiple semi-independent parts. A suitable crossover operator may be able to encourage more complex individuals that may ultimately be of better quality. Description of the Approach --------------------------- In this work, I used ga_war.c [6] as a starting point and experimented with some modifications to try to achieve better performance. Specifically, I replaced the evolution strategy with a steady state GA using tournament selection, I imposed a block based organization on the individuals, added 2 crossover operators, and added an endogenous fitness function based on the king of the hill concept. Figure 1 shows a typical individual. The representation is nearly the same as that of Boer: an individual is represented as an ascii file, complete with header, and in a format that the pmars simulator can work with. One major difference is that each individual is of fixed length (5 blocks of 10 lines each) plus 4 "spl" commands that force it to start by splitting new processes off to 4 locations in the program. These locations correspond to the start points of the second, third, fourth, and fifth blocks in the program. The result is that every warrior will, at least briefly, have 5 independent processes running on different parts of the program. (In practice, this is a mild constraint as some of the processes quickly die.) The rational for using the block based organization was that it could reduce brittleness by forcing some redundancy and encouraging modularity. The block based crossover operator selected a cut point on a randomly chosen block boundary, placed the second "half" of the first parent into the first half of the child, then placed the first half of the second parent into the second half of the child. The child then underwent a mutation operation where every command, modifier, address mode and field had a 2% chance of a random change. The idea was to produce minimal disruption within each block while producing significant changes in relative block location. The program also included an ordinary 1-point crossover operator with a completely random cut point. Other departures from Boer's work were that warriors were not sent to or from Valhalla (reintroduced to the population) and new children created by crossover and mutation were not screened against the weak warrior prior to acceptance into the population. (I had originally intended to use the screening to test the effects of lethality of the crossover operators, but, as the observed lethality rates were low, I did not pursue this issue.) At the end of each generation, the program takes the individual with the highest fitness and calculates its Wilkies benchmark score. This computationally expensive diagnostic gives some measure of the population's strength independent of its fitness metric. Results ------- My original intention was to conduct a quick pilot study to establish a reasonable level of performance for the program and then to conduct a parametric study to evaluate the usefulness of the block based approach described above. Not surprisingly, perhaps, the pilot study expanded to take up the whole time for the project leaving no time for the parametric study. Although I provide quantitative data below, there is not a sufficient statistical basis to establish which techniques were helpful and which were not. It can take a population of 50 individuals close to an hour of CPU time on a SPARC ULTRA-2 to process 1 generation. (The run-time varies: battles that result in a tie require 80,000 cycles in the pmars simulator while battles that don't are shorter.) Static Exogenous Hill --------------------- The warriors with the highest Wilkies benchmark scores evolved in populations evaluated against human coded warriors. The best of these, shown in figure 2, had a Wilkies benchmark score of 94 which is significantly higher than the highest score that I found in the literature for an evolved warrior. (This figure was calculated as the average of 10 benchmark measurements and is fairly precise.) This warrior combined a stone strategy with a strategy of blindly splitting off new processes to different locations in memory. When these new processes landed on another warrior's instructions, the new process would execute its opponent's code, making it harder to kill. The process splitting strategy worked best against papers, which spread their code throughout memory, producing more ties. Dynamic Endogenous Hill ----------------------- The most interesting results came from populations that evolved with fitness functions based on a dynamic endogenous hill. The program started by creating a random population of warriors and a hill with a fixed number (nominally 12) of randomly formed warriors. Each warrior on the hill competed against the whole hill (100 battles each) to receive a hill-score. Each individual in the population competed against all of the warriors on the hill (10 battles each for a quicker but noisy score) and received a fitness value based on its hill-score. At the end of each generation, the best individual in the population challenged the hill (100 battles each for a more accurate measure). If the challenger's hill- score was higher than that of the lowest scored individual on the hill, a copy of the challenger replaced the loser on the hill. Using a dynamic endogenous hill is not qualitatively much different from simply competing each warrior against a random subset of the warriors in the population .(Boer's program uses a random subset of size 1.) It does have the advantages of smoothing out the changes to the fitness landscape (only one warrior on the hill can change per generation), it is biased toward stronger warriors than average, it makes it easier to identify the best and worst individuals in the population, and examining the interactions between warriors on the hill provides insight into the dynamics of the population. Interestingly, the challenging warriors almost always made it onto the hill. In fact, the new warrior on the hill frequently had the highest hill-score. This phenomena did not appear to be the result of noisy evaluations but rather the absence of any Nash equilibria. Turn-over on the hill was also high. When a warrior made it onto the hill, the GP started selecting for the ability to defeat it. Individuals that out matched it were rewarded and soon some of them made it onto the hill. Its hill-score dropped and eventually it was replaced. Figure 3 shows the Wilkies benchmark scores of the best individuals for a test using a population size of 50 a dynamic endogenous hill of size 8 and probabilities of using block based and ordinary crossover of 95% and 5% respectively. The population evolved for several hundred generations, but only the last 500 or so are shown. This experiment ran for many days and was subjected to occasional tinkering such as killing off the hill and introducing individuals that had evolved against different fitness functions. Figure 4 shows the Wilkies benchmark scores of the best individuals for a test using a population size of 50 a dynamic endogenous hill of size 12 and probabilities of using block based and ordinary crossover of 95% and 5% respectively, with no tinkering. The population rose for the first 170 generations then declined to very low strength. Figure 1 shows a warrior that occurred near the peak of this run. It is quite significant in that it has a genuine paper strategy, it makes copies of itself and those copies make copies in turn. It also has a stone function as most human written papers do. Although self-replicating programs have been evolved before [3,4], this may be the first time that it has been done using a pure core wars strategy. For a time, the population held a mixture of paper and stone strategies but then it declined into a group of low scoring, uncharismatic stones. In a related test, the paper warrior from figure 1 was reintroduced into the population and quickly took over the hill and the population. This suggests that the decline could have been averted had the population been protected from forgetting previous good solutions. The Valhalla function that Boer used may be a practical way of achieving this. Figure 5 shows the Wilkies benchmark scores of the best individuals for a test using a population size of 50 a dynamic endogenous hill of size 12 and probabilities of using block based and ordinary crossover of 0% and 100% respectively, with no tinkering. Many more generations and many more runs will be needed to determine whether the block based crossover operator is more useful than ordinary crossover. Future Work ----------- This is one of the most addictive technical problems that I have encountered. The next step is to add a Valhalla function that will periodically reintroduce old hill members back into the population. Conclusion ---------- The ultimate measure of success for evolved core warriors will come when they start to win tournaments against human coded programs. This work fell far short of that goal but did show real progress toward it by evolving warriors that showed more sophisticated behavior and higher benchmark scores than those that I was able to find in the literature. The fact that a paper strategy evolved in less than 200 generations is encouraging evidence that competitive core warriors may indeed evolve using purely endogenous fitness functions. References ---------- 1. Dewdney, A.K. (1984) "Computer Recreations: In the Game called Core War Hostile Programs Engage in a Battle of Bits", Sci. Amer. 250(5), 14-22. 2. http://www.koth.org/ 3. T. Ray, "Tierra.doc. Documentation for the Tierra Simulator V4.0, 9-9-92". anonomous ftp: tierra.slhs.udel.edu 4. "Emergent Computation. Self-Organizing, Collective, and Cooperative Phenomena in Natural and Artificial Computing Networks", [FORREST90]:, Cambridge, MA: MIT Press. (Special issue of Physica D.) 5. John Perry, "Core Wars Genetics: The Evolution of Predation", http://www.KOTH.org/evolving_warriors.html 6. Jason Boer, http://www.avalon.net/~jboer/projects/corewar/corewar.html 7. Ryan Coleman, "Learning By Simulating Evolution Using Corewars", http://www.geocities.com/CapeCanaveral/Launchpad/6898/evolving.html 8. Terry Newton, "Evolved Warriors", http://www.nc5.infi.net/~wtnewton/corewar/evol/index.html ---------------------------- figure 1 ------------------------------------------ >From - Wed Dec 09 22:16:08 1998 ;redcode-94 ;assert CORESIZE == 8000 ;name mypaper.red ;strat 12/09/98. Paper with core clear. Generated using a GA with a 12 ;strat warrior endogenous hill. (Wilkies score of 57). org S; S; spl 14; spl 23; spl 32; spl 41; add.f $ 7034, } -49 mov.i { 6405, } 0 mod.b { 6561, { 1 dat.ab $ 2367, > 31 seq.f } 35, > -18 mov.f < 665, > 2127 dat.b } 7, @ -14 djn.ba } 28, # 5382 djn.i } 2904, { -48 slt.i # 433, > -16 add.b $ 25, } 35 mov.i { 7003, } 40 mod.a # 6561, { 1 dat.ab $ 2367, > 2431 div.f } 35, > -27 mov.f } -42, $ 5432 dat.b @ 3534, } -14 djn.b $ 28, # 5382 djn.i { 6290, { -48 slt.i # 433, < -45 mul.b < 3509, { 59 seq.i * 4122, @ 38 sub.f > 23, } 23 spl.ab * 2509, < 2134 jmn.x * 27, { -36 spl.b # 0, # 24 mov.i } -33, } -8 djn.f > 6006, { 3287 djn.i $ 3528, $ 41 sub.ab $ -21, > -10 mul.a * 25, < -20 mov.ab $ 5, } 47 jmz.ab > 4941, $ 4584 mul.b { 2367, > 5 jmp.i } -11, > -27 jmz.b $ 48, } 7475 dat.b @ 20, > -14 div.i @ 28, # 359 jmn.x { 3553, { 7109 slt.i } 5169, < 49 dat.a # -9, > 4138 sub.ba @ 14, < 5845 sne.a # 7555, $ 24 jmp.a } -11, } -2 seq.ba $ 1110, @ -40 add.x } -17, } 3803 div.ab $ 46, > 48 seq.ab } 5, $ 21 nop.ba < -19, > 2860 mov.i # 511, $ 5405 end ;0 ------------------------------------------ figure 1 ------------------------------------------ ------------------------------------------------- figure 2 ;redcode-b verbose ;assert CORESIZE == 8000 ;name HAL_00.red ;author Dave Hillis ;strat submitted @date@ Genetic Programming ... ... spl 8; spl 12; spl 16; jmn.b >A7 ,A16 djn.ab #A16,A12,{A15 sub.ab *A12,>A16 spl.b *A13,A13,$A4 djn.b