Evolving Core Warriors 
Dave Hillis 
Class Project for Evolutionary Computation, 1998


Introduction
------------
	This paper describes the results of a project that applied genetic programming 
(GP) techniques in an attempt to generate core wars programs (called warriors) that 
would be competitive with those generated by experienced human programmers. Core 
wars is an attractive application; the fitness of a program involves complicated interactions 
with multiple competing agents and there is a public domain simulator and debugger 
(pmars) and a large number of published hand-written programs to compare against.

	Although the project did not produce a warrior strong enough to challenge current 
tournament winners, it did produce warriors whose strength and sophistication surpassed 
those previously reported in the literature for evolved warriors. 


Core Wars
---------
	Core wars is a game in which 2 virus-like computer programs battle each other for 
control of a virtual core memory. The game first became widely known in the 1980's with 
the publication of 2 articles by Dewdny[1]. The programs are written in "redcode" which 
is similar to an assembly language.

	In a battle, a simulator program loads 2 warriors into random locations in memory. 
The remainder of the memory contains "DAT" instructions - any process that tries to 
execute a "DAT" instruction is terminated. The simulator alternates virtual machine cycles 
between the warriors. Each warrior can read or write into any memory location; allowing 
it to copy parts of itself and to modify it's own code or its opponent's. A warrior may 
spawn several independent processes but does not increase its share of the machine cycles 
by doing so. A warrior loses when all of its processes have terminated. If both warriors 
survive past a fixed number of cycles (usually 80,000) a tie is declared. 

	Players compete by writing warriors and entering them into tournaments. In a 
tournament, each warrior battles all the others (usually 200 times each with random 
starting locations). Each tie earns 1 point, each win earns 3, and each loss earns nothing. 
The warrior's score is the total over all of its battles. 

	There are a number of king of the hill tournaments that run continuously via email 
submission. In a king of the hill tournament, 25 or so warriors occupy a virtual hill and 
other warriors try to displace them. Each warrior on the hill has a hill-score consisting of 
its average score against all of the warriors (including itself) on the hill. A warrior 
challenges the hill by playing against all of the warriors on the hill (200 times each) thus 
obtaining its own hill-score. If the challenging warrior has a higher hill-score than the 
lowest scored member of the hill, it replaces that warrior on the hill. (The new warrior on 
the hill need not necessarily have won any of its battles with the displaced warrior: only 
the average score matters.)

	Core wars players tend to group strategies into 3 broad categories. Stone 
strategies involve blindly changing (or "bombing") as much memory as possible, usually by 
copying DAT commands into different addresses in hopes of disrupting the other warrior's 
program. Paper strategies try to fill up the memory with independently running, self 
replicating, copies of themselves. Scissors strategies try to find where the other program is 
and then destroy it. 

	As the names imply: scissors tend to beat papers which tend to beat stones, which 
tend to beat scissors. A warrior may use one or a combination of strategies, but a 
tournament can always be expected to include a mix of strategies. A warrior must do 
reasonably well against all of these strategies to do well in a tournament.

	Beginning core wars players commonly test their programs against the Wilkies 
benchmark: a group of 12 warriors. The benchmark score is the average obtained by 
competing a warrior 100 times against each of the 12 warriors and dividing by 12. A 
warrior that won all of its battles would score 300. A warrior that won exactly half of its 
battles and lost half of its battles would score 150. A warrior that tied in every battle 
would score 100. A "decent" human written warrior should score well over 100.

	A full description of core wars is beyond the scope of this paper. The KOTH 
website [2] contains redcode reference manuals, tutorials, free software, access to on-line 
tournaments, and many related links.


Core Wars and Evolutionary Computation
--------------------------------------

Previous Work
-------------
	Core wars has inspired a number of Alife studies involving evolutionary 
computation. Some like TIERRA[3] and Venus[4]  involve a core wars-like environment 
in which programs compete in a virtual memory filled with "food" and other resources. 
Others, which I will call pure core wars studies, use the standard core wars rules: allowing 
their results to be directly compared to a large body of human generated programs. (Such 
a comparison is not necessarily fair since these were intended to be Alife experiments.) To 
date, I have been unable to find a published paper on evolved pure core warriors. The 
following examples come from unpublished papers posted on the world wide web.

	Perry[5] describes a study in which he used a GA with mutation and multi-point 
crossover and a binary representation. He experimented both with an exogenous fitness 
function (competing the individuals against a suite of test warriors) and an endogenous 
one (competing the individuals against each other). Although the compute power he had 
available severely limited the experiments performed, he did a good job of describing the 
attractiveness of the problem and of the technical hurdles involved. His paper has a link 
from the KOTH web site and the other pure core wars studies (below) all cite it.

	Boer[6] describes a study in which he used an evolutionary strategy and an 
endogenous fitness function to evolve warriors. His program, ga_war.c, uses the redcode 
source files as the genetic representation, and a mutation operator, but no crossover. The 
file based approach is slow but allows one to use the publicly available "pmars" 
simulator[2] for the objective function. He makes the source code for ga_war.c freely 
available.

	ga_war.c, forms an initial population by randomly generating one line warriors,  
testing them  against a particular (very weak) warrior and only accepting those that win or 
tie against it. Once an initial population is formed, 2 warriors are selected at random and 
conduct some number of battles (nominally 5) against each other. If 1 warrior outscores 
the other, the looser is replaced by an exact copy of the winner. Otherwise both warriors 
are replaced by mutated copies of themselves. The program generates new warriors in this 
way for some fixed number of iterations. 

	Mutation can add lines, delete lines, or modify any fields in any lines.  If a mutated 
warrior cannot win or tie against the weak fixed warrior, the program tries again until an 
acceptable warrior is generated. A warrior that has been replaced may, depending on how 
many battles it has won before, be sent to "Valhalla" from whence it may return to the 
population in some later generation.

	Coleman[7] uses Boer's program and obtains similar results. Newton[8] uses 
Boer's program, a program called "Sys 4" written by George Lebl, and a program of his 
own called "RedMaker.

	Each of the pure core wars studies described above included some comparison 
with human derived warriors. None of them produced a warrior that was competitive with 
good human written ones or reported a Wilkies benchmark score over 85.


Technical Challenges
--------------------
	The most serious challenge is that the quality of human written programs is high. 
Many people have been competing against each other for years and the resulting programs 
are highly optimized. 


Fitness function
----------------
	A pure core wars fitness function makes the problem a reinforcement learning one. 
In the pure core wars studies, the fitness function is based entirely on the number of wins, 
loses and ties of the individual over some number of battles. There is no placing  of "food" 
in memory or providing additional mating opportunities to programs that split as ways to 
lead the population into desired behaviors like self-replication and exploration.

	Placing the warriors at random starting positions for each battle adds a strong 
stochastic element to the outcome of a battle. This is a problem in that many (very time 
consuming) battles between 2 individual warriors are needed to measure how well they do 
against each other. It is also a blessing in  that a relatively weak warrior will still win 
against a stronger opponent in the battles where its starting position is sufficiently 
favorable. This stochastic element smoothes out the fitness function. It is much more 
useful to be able to say that warrior A wins 10% of its battles with warrior B than to say 
only that it is weaker.

	For a tournament, 100 to 200 battles are fought between individual warriors. For 
most of the studies reviewed, only 5 or 10 battles were used - relying on the GA's ability 
to work with noisy evaluation functions.

	For an endogenous fitness function, the fitness is based on battles against other 
warriors in the population. But to be competitive against human coded warriors a warrior 
must be able compete against a range of strategies. Unless the population contains a mix 
of strategies (scissors, paper, stone), the individuals need not evolve to deal with a mix of 
strategies. None of the pure core wars studies reviewed reported the generation of paper 
or scissors strategies.

	An alternative is to use an exogenous fitness function by training the warriors 
against a static (unchanging) hill of human coded warriors. In practice, it is difficult to 
select a hill that is just hard enough and there is always the danger that the population will 
not evolve generally useful strategies.


Evolvability
------------
	The search space of solutions is large. For a single line in a warrior, there are 17 
possible command types each of which has 7 possible modifiers and 2 fields. Each field 
contains a number between 0 and 7999 as well as its own addressing mode. There are 8 
different addressing modes. There are (17*7*8*8000*8*8000) unique possible lines. A 
warrior may contain up to 100 lines.	

	In the pure core wars studies, the population always converged toward stone-type 
solutions that were very short (not counting introns), efficient, and ultimately not very 
good. Since a core Wars battle is a race to kill before you are killed, any change that slows 
a warrior down is strongly selected against. Any independent process that a warrior splits 
off will be parasitic (wasting precious cycles) unless it is immediately useful. Evolution 
pushes toward warriors with short functioning code lengths both to increase efficiency and 
in response to disruption from crossover or mutation. The operators themselves also resist 
the development of complex programs.

	One way to allow a GP to escape this evolutionary dead end may be to force it to 
evolve warriors with multiple semi-independent parts. A suitable crossover operator may 
be able to encourage more complex individuals that may ultimately be of better quality.


Description of the Approach
---------------------------
	In this work, I used ga_war.c [6] as a starting point and experimented with some 
modifications to try to achieve better performance. Specifically, I replaced the evolution 
strategy with a steady state GA using tournament selection, I imposed a block based 
organization on the individuals, added 2 crossover operators, and added an endogenous 
fitness function based on the king of the hill concept.
	
	Figure 1 shows a typical individual. The representation is nearly the same as that of 
Boer: an individual is represented as an ascii file, complete with header, and in a format 
that the pmars simulator can work with. One major difference is that each individual is of 
fixed length (5 blocks of  10 lines each) plus 4 "spl" commands that force it to start by 
splitting new processes off to 4 locations in the program. These locations correspond to 
the start points of the second, third, fourth, and fifth blocks in the program. The result is 
that every warrior will, at least briefly, have 5 independent processes running on different 
parts of the program. (In practice, this is a mild constraint as some of the processes 
quickly die.) The rational for using the block based organization was that it could reduce 
brittleness by forcing some redundancy and encouraging modularity.

	The block based crossover operator selected a cut point on a randomly chosen 
block boundary, placed the second "half" of the first parent into the first half of the child, 
then placed the first half of the second parent into the second half of the child. The child 
then underwent a mutation operation where every command, modifier, address mode and 
field had a 2% chance of a random change. The idea was to produce minimal disruption 
within each block while producing significant changes in relative block location. The 
program also included an ordinary 1-point crossover operator with a completely random  
cut point.

	Other departures from Boer's work were that warriors were not sent to or from 
Valhalla (reintroduced to the population) and new children created by crossover and 
mutation were not screened  against the weak warrior prior to acceptance into the 
population. (I had originally intended to use the screening to test the effects of lethality of 
the crossover operators, but, as the observed lethality rates were low, I did not pursue this 
issue.)

	At the end of each generation, the program takes the individual with the highest 
fitness and calculates its Wilkies benchmark score. This computationally expensive 
diagnostic gives some measure of the population's strength independent of its fitness 
metric.


Results
-------
	My original intention was to conduct a quick pilot study to establish a reasonable 
level of performance for the program and then to conduct a parametric study to evaluate 
the usefulness of the block based approach described above. Not surprisingly, perhaps, the 
pilot study expanded to take up the whole time for the project leaving no time for the 
parametric study. Although I provide quantitative data below, there is not a sufficient 
statistical basis to establish which techniques were helpful and which were not.

	It can take a population of 50 individuals close to an hour of CPU time on a 
SPARC ULTRA-2 to process 1 generation. (The run-time varies: battles that result in a tie 
require 80,000 cycles in the pmars simulator while battles that don't are shorter.) 


Static Exogenous Hill
---------------------
	The warriors with the highest Wilkies benchmark scores evolved in populations 
evaluated against human coded warriors. The best of these, shown in figure 2, had a 
Wilkies benchmark score of 94 which is significantly higher than the highest score that I 
found in the literature for an evolved warrior. (This figure was calculated as the average of 
10 benchmark measurements and is fairly precise.)

	This warrior combined a stone strategy with a strategy of blindly splitting off new 
processes to different locations in memory. When these new processes landed on another 
warrior's instructions, the new process would execute its opponent's code, making it 
harder to kill. The process splitting strategy worked best against papers, which spread 
their code throughout memory, producing more ties.


Dynamic Endogenous Hill
-----------------------
	The most interesting results came from populations that evolved with fitness 
functions based on a dynamic endogenous hill. The program started by creating a random 
population of warriors and a hill with a fixed number (nominally 12) of randomly formed 
warriors. Each warrior on the hill competed against the whole hill (100 battles each) to 
receive a hill-score. Each individual in the population competed against all of the warriors 
on the hill (10 battles each for a quicker but noisy score) and received a fitness value based 
on its hill-score. At the end of each generation, the best individual in the population 
challenged the hill (100 battles each for a more accurate measure). If the challenger's hill-
score was higher than that of the lowest scored individual on the hill, a copy of the 
challenger replaced the loser on the hill.

	Using a dynamic endogenous hill is not qualitatively much different from simply 
competing each warrior against a random subset of the warriors in the population .(Boer's 
program uses a random subset of size 1.) It does have the advantages of smoothing out 
the changes to the fitness landscape (only one warrior on the hill can change per 
generation), it is biased toward stronger warriors than average, it makes it easier to 
identify the best and worst individuals in the population, and examining the interactions 
between warriors on the hill provides insight into the dynamics of the population.

	Interestingly, the challenging warriors almost always made it onto the hill. In fact, 
the new warrior on the hill frequently had the highest hill-score. This phenomena did not 
appear to be the result of noisy evaluations  but rather the absence of any Nash equilibria. 
Turn-over on the hill was also high. When a warrior made it onto the hill, the GP started 
selecting for the ability to defeat it. Individuals that out matched it were rewarded and 
soon some of them made it onto the hill. Its hill-score dropped and eventually it was 
replaced.

	Figure 3 shows the Wilkies benchmark scores of the best individuals for a test 
using a population size of 50 a dynamic endogenous hill of size 8 and probabilities of using 
block based and ordinary crossover of 95% and 5% respectively. The population evolved 
for several hundred generations, but only the last 500 or so are shown. This experiment 
ran for many days and was subjected to occasional tinkering such as killing off the hill and 
introducing individuals that had evolved against different fitness functions.	

	Figure 4 shows the Wilkies benchmark scores of the best individuals for a test 
using a population size of 50 a dynamic endogenous hill of size 12 and probabilities of 
using block based and ordinary crossover of 95% and 5% respectively, with no tinkering. 
The population rose for the first 170 generations then declined to very low strength.

	 Figure 1 shows a warrior that occurred near the peak of this run. It is quite 
significant in that it has a genuine paper strategy, it makes copies of itself and those copies 
make copies in turn. It also has a stone function as most human written papers do. 
Although self-replicating programs have been evolved before [3,4], this may be the first 
time that it has been done using a pure core wars strategy.

	For a time, the population held a mixture of paper and stone strategies but then it 
declined into a group of low scoring, uncharismatic stones. In a related test, the paper 
warrior from figure 1 was reintroduced into the population and quickly took over the hill 
and the population. This suggests that the decline could have been averted had the 
population been protected from forgetting previous good solutions. The Valhalla function 
that Boer used may be a practical way of achieving this. 

	Figure 5 shows the Wilkies benchmark scores of the best individuals for a test 
using a population size of 50 a dynamic endogenous hill of size 12 and probabilities of 
using block based and ordinary crossover of 0% and 100% respectively, with no tinkering. 
Many more generations and many more runs will be needed to determine whether the 
block based crossover operator is more useful than ordinary crossover.


Future Work
-----------
	This is one of the most addictive technical problems that I have encountered. The 
next step is to add a Valhalla function that will periodically reintroduce old hill members 
back into the population.


Conclusion
----------
	The ultimate measure of success for evolved core warriors will come when they 
start to win tournaments against human coded programs. This work fell far short of that 
goal but did show real progress toward it by evolving warriors that showed more 
sophisticated behavior and higher benchmark scores than those that I was able to find in 
the literature.

	The fact that a paper strategy evolved in less than 200 generations is encouraging 
evidence that competitive core warriors may indeed evolve using purely endogenous 
fitness functions.


References
----------
1.	 Dewdney, A.K. (1984) "Computer Recreations: In the Game called Core War 
Hostile Programs Engage in a Battle of Bits", Sci. Amer. 250(5), 14-22. 
2.	http://www.koth.org/
3.	T. Ray, "Tierra.doc. Documentation for the Tierra Simulator V4.0, 9-9-92". 
anonomous ftp: tierra.slhs.udel.edu
4.	 "Emergent Computation. Self-Organizing, Collective, and Cooperative 
Phenomena in Natural and Artificial Computing Networks", [FORREST90]:, Cambridge, 
MA: MIT Press. (Special issue of Physica D.)
5.	John  Perry, "Core Wars Genetics: The Evolution of Predation", 
http://www.KOTH.org/evolving_warriors.html
6.	Jason Boer, http://www.avalon.net/~jboer/projects/corewar/corewar.html
7.	Ryan Coleman, "Learning By Simulating Evolution Using Corewars", 
http://www.geocities.com/CapeCanaveral/Launchpad/6898/evolving.html
8.	Terry Newton, "Evolved Warriors", 
http://www.nc5.infi.net/~wtnewton/corewar/evol/index.html




---------------------------- figure 1 ------------------------------------------
>From - Wed Dec 09 22:16:08 1998
;redcode-94
;assert CORESIZE == 8000
;name mypaper.red
;strat 12/09/98. Paper with core clear. Generated using a GA with a 12
;strat		warrior endogenous hill. (Wilkies score of 57).
org	S;
S;
spl 14;
spl 23;
spl 32;
spl 41;
add.f  $  7034, }   -49
mov.i  {  6405, }     0
mod.b  {  6561, {     1
dat.ab $  2367, >    31
seq.f  }    35, >   -18
mov.f  <   665, >  2127
dat.b  }     7, @   -14
djn.ba }    28, #  5382
djn.i  }  2904, {   -48
slt.i  #   433, >   -16
add.b  $    25, }    35
mov.i  {  7003, }    40
mod.a  #  6561, {     1
dat.ab $  2367, >  2431
div.f  }    35, >   -27
mov.f  }   -42, $  5432
dat.b  @  3534, }   -14
djn.b  $    28, #  5382
djn.i  {  6290, {   -48
slt.i  #   433, <   -45
mul.b  <  3509, {    59
seq.i  *  4122, @    38
sub.f  >    23, }    23
spl.ab *  2509, <  2134
jmn.x  *    27, {   -36
spl.b  #     0, #    24
mov.i  }   -33, }    -8
djn.f  >  6006, {  3287
djn.i  $  3528, $    41
sub.ab $   -21, >   -10
mul.a  *    25, <   -20
mov.ab $     5, }    47
jmz.ab >  4941, $  4584
mul.b  {  2367, >     5
jmp.i  }   -11, >   -27
jmz.b  $    48, }  7475
dat.b  @    20, >   -14
div.i  @    28, #   359
jmn.x  {  3553, {  7109
slt.i  }  5169, <    49
dat.a  #    -9, >  4138
sub.ba @    14, <  5845
sne.a  #  7555, $    24
jmp.a  }   -11, }    -2
seq.ba $  1110, @   -40
add.x  }   -17, }  3803
div.ab $    46, >    48
seq.ab }     5, $    21
nop.ba <   -19, >  2860
mov.i  #   511, $  5405
end
;0
------------------------------------------ figure 1 ------------------------------------------







------------------------------------------------- figure 2
;redcode-b verbose
;assert CORESIZE == 8000
;name HAL_00.red
;author Dave Hillis
;strat submitted @date@ Genetic Programming
...
...

spl 8;
spl 12;
spl 16;

jmn.b  >A7 ,<A11
seq.f  *A8 ,#A3 
spl.ba #A6 ,{A8 
djn.x  #A12,{A0 
nop.x  $A12,>A16

djn.ab #A16,<A11
seq.a  *A14,#A2 
spl.ab #A12,$A1 
djn.ba >A12,{A15
sub.ab *A12,>A16

spl.b  *A13,<A6 
djn.a  *A6 ,#A16
sne.ba #A12,$A1 
djn.ab #A7 ,{A11
slt.x  *A13,{A16

jmn.ab *A2 ,@A9 
mod.x  @A4 ,$A17
mov.b  >A13,$A4 
djn.b  <A7 ,#A12
dat.x  <A13,@A10

end
;0