This evolver is designed to
refine a few things...
Able to launch multiple pmars battles for better speed on multi-core
systems
In-memory soup for less disk activity than a warrior-file-based evolver
Better compatibility with Windows and Linux systems (not dos or script)
Supports the "swap lines" mutation for evolving nano without length
change
Optional selective crossover with adjustable granularity and proportion
Soup wrap-around is optional, can be enabled for X and/or Y directions
Automatic benchmarking and saving with optional re-insertion (Valhalla)
RedMixer-like interface for exploring the soup from within the program
Written in native FreeBasic for better code structure
Evolve starts evolution; pressing Esc returns to the interface.
The cursor keys move the cursor to select a warrior.
List lists the warrior code; Run runs the warrior in pmarsv.
The 1 to 9 keys (on the numeric keypad) battle the warrior with a
neighbor.
Test performs a benchmark test on the warrior.
Upon exit it saves all soup warriors and writes a soup.dat file for
restarting.
Here is the MEVO source code, last
modified
12/3/09.
Here is the documentation for MEVO,
last modified 12/4/09.
Here is MEVO compiled for x86 Linux,
code version 12/3/09, last modified 12/4/09.
Here is MEVO compiled for Windows, code
version 12/3/09, last modified 12/4/09.
[these are for the original MEVO, a new version is now available...
see below]
Notes - It often takes millions of battles and several days to
produce competition-strength nano warriors; for larger core sizes don't
expect anything beyond beginner-level strength. The included INI files
have not been fully optimized. MEVO is a processor-intensive program and
requires a computer and disk system that can operate continuously
without overheating or other issues. Don't run MEVO from a thumbdrive
or other limited-write "flash" disk such as those found in some
netbooks. A ram disk is recommended; see the readme file in the package
for instructions.
The Linux-compiled archive includes pmars and pmarsv binaries and
benchmark sets, suitable for running from a Ubuntu live-CD or other x86
Linux systems. The included binaries are the slower but more compatible
"N2" versions compiled using gcc 3.3; somewhat faster "N1" binaries are
in the pmars092N.tar.gz archive. The Windows-compiled archive includes
a version of "N2" pmars compiled using MinGW (gcc 3.2); however, for
size and other reasons the included pmarsv is the old dos 286 version,
so I recommend replacing it with the SDL "pmarsw" version.
The source can be compiled with the FreeBasic compiler; edit it first
to indicate the platform (for Windows set both usewin and useuni to 0).
Compile with the -mt option to select the thread-safe library; to
enable bounds checking for debugging, also include the -exx option (in
addition to -mt).
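For example, if the source file is named mevo.bas (substitute the
actual filename), the compile commands would be something like...

fbc -mt mevo.bas
fbc -mt -exx mevo.bas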
The Linux version must be run from a terminal (xterm, konsole,
gnome-terminal, etc.); it checks to make sure TERM is set to something.
The Windows exe can be run by double-clicking, or for better
performance run it from a batch file that sets comspec to the
binrun.exe utility (included in the package, tested using WinXP). All
settings are controlled by an INI file, which is created if it doesn't
exist. If soup.dat exists it's reloaded to continue a run; don't change
instructions, soup size, pmars coresize, etc. during a run, as the
warrior code is saved in numerical form. To restart a run, remove the
soup.dat file, the soup directory, and if present the save directory
and the mevo.log file.
More software, benchmarks and corewar info can be found at the
corewar.co.uk and www.koth.org web sites.
MEVO2
It's been a while since I played around... but I got the bug and
dragged out the old code to try some new things...
Added an option to evolve the core mutation rates for each
individual warrior
Added an option to periodically increase the mutation rates to simulate
"storms"
Allow using multiple bench sets to try to find warriors that are
generally strong
Added an option to initialize warriors to a predefined instruction
sequence
Added an option to only reinsert warriors that scored over a percentage
of top score
Added an option to adjust local references after insert, delete and
swap operations
...not that these things have really helped so far :-) so leaving
the original MEVO alone.
Other new features include...
Most filenames and directories can be specified in the INI file
Can point tempdir to a ramdisk without having to copy the whole thing
Can now restore directly from the soup directory, the .dat files are
now optional
Option to periodically save the soup to avoid losing a run if something
crashes or the lights go out
Now uses a hash function to detect and avoid saving dups, rather than
relying on identical bench scores (which only worked for nano set to
142 rounds to trigger the -P option anyway) - see the sketch after
this list
Somewhat better parsing when loading soup and save warriors, now
understands pmars "load" format
Can now seed the soup directory with other evolved or human-made
warriors
Reinsert no longer requires the saved warriors to be in sequence (but
they still have to be numbered)
Added some debugging options - one option lists the last
source/destination/cross code for each thread
Modified the plotlog utility so it displays stats and automatically
refreshes the display
Added the (Linux-specific) blassic scripts I use for benchmarking to
the readme file
Added more nano test sets - nanoht2 and nanoht3
Added bash shell scripts for listing the top warrior and last saved
warrior, with automatic refresh
Added a bash script for archiving a run and deleting files to start over
(have to do that a lot, most runs end up being dead ends)
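For the curious, here's roughly what the dup-detecting hash mentioned
above amounts to - a minimal FreeBasic sketch, assuming the warrior is
stored as an array of numeric code values (MEVO2's actual hash
function may differ)...

' order-sensitive hash over a warrior's numeric code, so dups can be
' detected without comparing bench scores (a sketch of the idea only)
Function WarriorHash(code() As Integer, ByVal nlines As Integer) As UInteger
    Dim As UInteger h = 5381                ' djb2-style starting value
    For i As Integer = 0 To nlines - 1
        h = (h * 33) Xor CUInt(code(i))     ' mix in each code value in order
    Next i
    Return h
End Function

Compute the hash before saving and compare it against those of the
already-saved warriors; a match means a dup no matter what the warrior
happens to score.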
Here's MEVO2 for 32-bit/x86 Linux
- code version 3/2/15, last archive update 3/5/15
For 64-bit Linux enable multiarch and install a few i386 libraries.
For Windows, recompile with FreeBasic and use the pmars/pmarsv/binrun
exe's from the original package.
3/15/15 - Updated the mevo2 archive - fixed typos in readme.txt and
mevo_nano.ini. NanoCracker made KOTH on the Koenigstuhl Nano hill -
wasn't expecting that but I'll take it. Possibly it's because it was
selected using my benchallnano script (in the readme of the new
package); this script tests a directory of warriors against several
bench sets and sorts them by average score to find warriors that score
well against all of the bench sets - usually the top.red warrior is not
the overall best, but just happens to score well against one bench set.
BTW the funny case in NanoCracker wasn't just to look funny: dups in
the instruction and modifier lists were spelled differently so I could
see mutations that happened to pick the same instruction.
3/3/15 - MEVO2! Transfer Function got knocked from KOTH.. not a
"this is war" moment (it had a very good run!) but it surprised me how
strong bvowk's new evolved warrior was on the hill - around 160! He's
doing something right. Time to experiment. One of the more interesting
mods I made was adding an array so it could track mutation rates for
each warrior and evolve them along with the warrior, using methods
similar to those used to evolve the warrior data values... gradual
increases/decreases and sometimes picking something else. I noticed
that the warriors prefer lower rates than I usually set the evolver for.
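As a sketch of the idea (the odds and limits below are guesses, not
MEVO2's actual numbers), each warrior carries its own rate values
which get copied along with it and occasionally perturbed...

' evolve a per-warrior mutation rate along with the warrior
' (the chances and clamp limits here are guesses)
Function EvolveRate(ByVal rate As Double) As Double
    Dim As Double r = rate, x = Rnd
    If x < 0.05 Then
        r = Rnd * 0.05               ' sometimes pick something else entirely
    ElseIf x < 0.3 Then
        r = r * (0.9 + Rnd * 0.2)    ' sometimes drift up or down a little
    End If
    If r < 0.0001 Then r = 0.0001    ' keep the rate from collapsing to zero
    If r > 0.2 Then r = 0.2          ' ...or growing without bound
    Return r
End Function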
Here's a new warrior I submitted to Sal Nano (made with this INI file)...
;redcode-nano
;name NanoCracker
;author Terry Newton
;strategy evolved by mevo2 on 2015-02-21
;strategy orig name 30_1/2048.red, gen 3896
;strategy nanoht2:152 nanobm07:154 nanob2:154
;generation 3896
;species 808
;origin 7_21
;assert CORESIZE==80
spL.i #-38,>-9
mOV.i <-1,{-1
MoV.i {-2,<-2
mOv.i >-34,{-3
dJN.I $-2,{-4
end
;soupwins 11
;instrate 0.01045615
;modrate 0.007155502
;moderate 0.005
;datarate 0.0051
;insrate 0.005797357
;delrate 0.01031488
;swaprate 0.0063132
;benchscore 152.87 (nanoht2 142 rounds)
And the bench scores against my new nanoht3 50-warrior test
set...
(142 rounds, score, wins, ties)
02_08.red 148.59 68 7 **************
0535.red 156.33 71 9 ***************
b1f3ad11-a6073e-f711ff69 191.54 88 8 *******************
b69e6908-a5be0bba-ab4445 142.25 66 4 **************
rdrc: Breakaway Carte 142.25 59 25 **************
c82f15b5-85011fd8-5a969d 165.49 78 1 ****************
90ae3a01-58e6f8d2-f52b2a 185.91 84 12 ******************
[RS] Cefaexive Motween 163.38 74 10 ****************
Clear-2 145.77 65 12 **************
Cosmic Uncertainity 162.67 75 6 ****************
Creamed Corn 156.33 73 3 ***************
Crimson Climber 145.77 68 3 **************
[RS] C-Strobia 164.78 75 9 ****************
Dodecadence 166.90 73 18 ****************
eerie glow 135.91 58 19 *************
e6843-5724-xt430-4-nano- 117.60 41 44 ***********
[RS] EvoTrick II 164.08 76 5 ****************
[RS] Existephall Apris 143.66 59 27 **************
Fragor Calx 166.90 76 9 ****************
Frothering Foam 133.09 58 15 *************
Furry Critter 142.95 65 8 **************
girl from the underworld 128.87 60 3 ************
hemlock 174.64 76 20 *****************
Hot Soup 157.04 70 13 ***************
JB268 145.07 67 5 **************
Left Alone 159.15 74 4 ***************
looking glass 166.19 76 8 ****************
Man&Machine 176.05 82 4 *****************
Military Grade Nano 149.29 69 5 **************
a slice of moonbeam pie 169.01 78 6 ****************
Muddy Mouse (RBv1.6r1.1. 154.22 71 6 ***************
Obsidian peasoup 117.60 51 14 ***********
Pacler Deux 135.91 58 19 *************
Plasmodium vivax 184.50 84 10 ******************
Little Red Rat 155.63 72 5 ***************
ripples in space-time 159.85 70 17 ***************
Rocket propelled monkey 154.22 71 6 ***************
rumpelstiltskin 128.87 52 27 ************
Science Abuse 153.52 70 8 ***************
Shutting Down Evolver No 155.63 72 5 ***************
Sleepy Lepus 157.74 70 14 ***************
rdrc: Snapback Sprite 174.64 81 5 *****************
the spiders crept 151.40 63 26 ***************
Steaming Pebbles 146.47 67 7 **************
Stegodon Aurorae 148.59 67 10 **************
Subterranean Homesick Al 150 63 24 **************
Transfer Function 140.14 62 13 **************
war of the snowflakes 131.69 51 34 *************
White Moon 161.97 74 8 ****************
wrath of the machines 187.32 84 14 ******************
Score = 154.35
And the results of a simulated hill using the same warriors...
1 156.91 b69e6908-a5be0bba-ab4445f7.rc by bvowk
2 154.22 NanoCracker by Terry Newton
3 152.62 [RS] C-Strobia by inversed
4 152.25 Transfer Function by Terry Newton
5 151.46 eerie glow by John Metcalf
6 150.86 Hot Soup by Terry Newton
7 148.61 Crimson Climber by Terry Newton
8 148.37 Frothering Foam by Terry Newton
9 147.94 Little Red Rat by Terry Newton
10 147.81 Steaming Pebbles by Terry Newton
11 147.59 02_08.red by Terry Newton
12 147.55 Sleepy Lepus by Terry Newton
13 147.39 Pacler Deux by Roy van Rijn
14 147.00 [RS] Existephall Apris by inversed
15 146.86 ripples in space-time by S.Fernandes
16 146.81 Fragor Calx by Terry Newton
17 146.10 Creamed Corn by Terry Newton
18 145.38 Science Abuse by Roy van Rijn
19 145.09 Shutting Down Evolver Now.. by Roy van Rijn
20 145.04 hemlock by John Metcalf
21 144.86 Obsidian peasoup by Miz
22 144.82 White Moon by Fluffy
23 144.53 Furry Critter by Terry Newton
24 144.53 a slice of moonbeam pie by John Metcalf
25 144.15 the spiders crept by hwm
26 143.78 Cosmic Uncertainity by Terry Newton
27 143.68 0535.red by Terry Newton
28 143.33 [RS] EvoTrick II by inversed
29 143.33 Military Grade Nano by Ken Hubbard
30 143.27 Left Alone by Fluffy
31 142.77 [RS] Cefaexive Motween by inversed
32 142.65 Muddy Mouse (RBv1.6r1.1.231) by The MicroGP Corewars Collective
33 142.58 Man&Machine by Roy van Rijn
34 142.55 wrath of the machines by John Metcalf
35 140.90 Dodecadence by G.Labarga
36 140.04 JB268 by Terry Newton
37 139.94 c82f15b5-85011fd8-5a969d25.rc by bvowk
38 138.63 Rocket propelled monkey II by G.Labarga
39 136.99 rumpelstiltskin by gnik
40 136.35 b1f3ad11-a6073e-f711ff69.rc by bvowk
41 136.31 looking glass by John Metcalf
42 136.20 Stegodon Aurorae by S.Fernandes
43 135.52 e6843-5724-xt430-4-nano-eve78 by bvowk
44 135.30 girl from the underworld by John Metcalf
45 134.75 Clear-2 by Roy van Rijn
46 132.42 rdrc: Snapback Sprite by Dave Hillis
47 131.01 rdrc: Breakaway Carte by Dave Hillis
48 129.17 Plasmodium vivax by Fluffy
49 127.83 war of the snowflakes by John Metcalf
50 126.11 90ae3a01-58e6f8d2-f52b2a83 by caelian
51 125.46 Subterranean Homesick Alien by Roy van Rijn
On the actual Sal Nano hill NanoCracker placed 3rd... this set
doesn't exactly replicate hill conditions.
12/10/11 - if running MEVO under a 64-bit version of GNU/Linux make
sure the libncurses5:i386 package is installed.
12/26/11 - libxpm4:i386 is also required to use the plotlog program.
12/5/09 - a nano MEVO warrior called "Transfer Function" got KOTH at
the SAL Nano Hill. I guess it works :-)
I've been using the nanobm07 and nanob2 benchmarks to pre-test
warriors, but lately they've been a poor predictor of hill performance.
So I put together a custom 30-warrior benchmark (in nanoht1.zip); it's
not perfect but it seemed to hit the hill scores within a few points
(past tense since it's a moving target). I started with a bench set of
all the current hill warriors I could find, then for each warrior, if
the bench score was too high compared to the hill score I added
warrior(s) that it did poorly against, and if too low I added
warrior(s) that it did well against while reducing the scores of other
warriors that were checking too high. Here's the resulting (dressed up)
warrior...
;redcode-nano
;name Transfer Function
;author Terry Newton
;strategy evolved as 20_7 by MEVO, generation 696
;strategy selected using custom benchmark selected
;strategy to mimick hill performance, scored 154
;strategy nanobm07 148, nanob2 153
;generation 696
;species 6523
;origin 68_15
;assert CORESIZE==80
spl.i #23,}2
mov.i >10,}1
mov.i >-34,}-2
mov.i #32,}-2
djn.f $-4,<-36
end
;benchscore 154.06 (nanoht1 142 rounds)
...and here's a link to the mevo.ini file I used to make it. Around
1.7 million iterations in, I set reinsertmode and reinsertinterval
both to 1 and ran for a short time to concentrate the soup; by then
the top score was already around 154, and this bumped it up to 155. I
put the settings back to normal, ran a bit longer, then settled on one
of the new variants for its relatively high nanobm07 score. Here's the
plotlog chart of the run...
12/4/09 - Updated the source, docs and packages for the 12/3/09
code, which adds a couple of subtle options to control selection of
crossover mates - anychance (default 0) controls the chances of
crossing with any warrior rather than just warriors of the same size
and species, and bestchance (default 1) controls the chances of
choosing a warrior with the most wins. Usually crossing dissimilar
warriors results in broken code, but a certain amount of that can
result in overall improvement by providing additional chances for
weaker warriors to win battles and reproduce. Always choosing the
warrior with the most wins might be limiting. Whether or not these
changes improve the scores (I don't know yet), it feels better if
warriors are free to select any warrior as the odds dictate. The Linux
version contains an updated version of the plotlog utility that better
auto-detects parameters from the log file and draws the average score
curve using lines instead of points. The plotlog utility has not been
updated in the Windows version; I really need a different approach
there, as the present code can only run full-screen and to make up for
that dumps to a bitmap file (using somewhat scary code). I'd much
prefer a solution that can display graphics in a window; then if a
copy is needed it can be obtained using normal copy/paste functions.
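In code form the two odds might combine something like this (a guess
at the logic, not the actual MEVO source - the arrays are stand-ins
for the real soup data)...

' sketch of crossover mate selection with the new options -
' candidates() holds the soup positions of same-size/same-species
' neighbors, wins() the win count at every soup position
Function PickMate(candidates() As Integer, wins() As Integer, _
                  ByVal ncand As Integer, ByVal anyneighbor As Integer, _
                  ByVal anychance As Double, ByVal bestchance As Double) As Integer
    If Rnd < anychance Then Return anyneighbor  ' cross with any warrior
    If ncand = 0 Then Return -1                 ' no suitable mate, just mutate
    If Rnd < bestchance Then                    ' usually prefer the most wins
        Dim As Integer best = 0
        For i As Integer = 1 To ncand - 1
            If wins(candidates(i)) > wins(candidates(best)) Then best = i
        Next i
        Return candidates(best)
    End If
    Return candidates(Int(Rnd * ncand))         ' otherwise a random candidate
End Function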
11/29/09 - Updated the packages to the 11/27/09 code, also rewrote a
lot of the docs (was going for less but ended up with more, oh well).
Part of the doc update clarified the disk-write paranoia section to
include a rough formula for calculating the lifetime of a solid-state
disk with MEVO running on it - for a cheap MLC-type disk like in some
netbooks it worked out to about 100 days before failure (under good
conditions). That's why I include a warning (MEVO doesn't produce all
that much disk activity but those things are very fragile). Then I
computed the lifetime of a real server-grade SSD with an endurance of a
million writes per block and 5GB free for leveling and it came out to
900 years. I don't think there's much concern about wearing those
things out.
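To show where the 900-year figure comes from (assuming the lifetime is
roughly the endurance times the space free for leveling divided by the
daily write volume - my back-calculation, the readme has the actual
formula): 1,000,000 writes x 5GB gives 5x10^15 byte-writes to spend,
and 900 years is about 330,000 days, implying a write volume on the
order of 15GB per day.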
11/27/09 - One of the things I like about RedMixer is it can record
a run as a sequence of ANSI soup dumps which can be played back at high
speed, compressing hours and days into seconds. ANSI has drawbacks
though, in raw form it's hard to control the playback speed unless a
custom ANSI viewer program is used, and it's difficult to convert the
output file to a standard animated GIF - the one app that did that
obscure thing no longer works on my system. This time I took a
different approach, rather than modifying the evolver to save snapshots
I wrote a script that uses ImageMagick's import utility to save
periodic snapshots to a sequence of GIF files which can then easily be
combined to produce an animation. Here's the imgdump.sh script I'm
using; once the individual GIFs were obtained (about 2.5 hours of run
time) they were combined into a 2.8 meg animated GIF using
ImageMagick's convert utility (command line: convert -delay 15 -loop 0
dump*.gif mevo800u_1.gif). The main drawback of this approach is that
the evolver window has to be displayed and unobstructed to record (I
used the wmctrl utility to focus it before each dump), which makes it
difficult to use the computer for anything else while recording a run.
I added end number (starting location) mutation to the "test" source
and docs; this is the last major thing RedMixer has that MEVO didn't
have. I don't know yet if it will help or hurt or make no difference,
but that's what research is all about. I'd been putting off this change
due to its perceived complexity and because it required a change in the
soup.dat format, but it ended up being easy and compatible with
previous data files. The hard part is dialing in the settings...
[sample INI's in the docs are guesses]
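The mutation itself is simple - something like this sketch (the rate
name is made up, I don't know what the actual option is called)...

' end number (starting location) mutation - occasionally move the
' start to a random line (sketch; rate name is hypothetical)
Function MutateStart(ByVal start As Integer, ByVal nlines As Integer, _
                     ByVal startrate As Double) As Integer
    If Rnd < startrate Then Return Int(Rnd * nlines)  ' random start line
    Return start                                      ' usually leave it alone
End Function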
11/25/09 - Minor update to the packages - added an altini directory
under the benchtools directory so other core sizes can be benchmarked
without editing the INI files, instead I can run use_tiny, use_standard
etc to copy over an appropriate set of INI files for the core size
being used. The primary mevo.ini for evolving must still be manually
configured by deleting the existing mevo.ini and copying one of the
examples to mevo.ini but that's easy enough, and I'm not satisfied with
the INI's for larger core sizes (so I don't want to give the illusion
that they work:-). MEVO is a code-evolution research project and a big
part of the ongoing research is determining optimal evolving parameters
and also determining what parameters can be omitted. It usually takes
about 2 days to perform a run to "saturation", and each set of
parameters takes at least 3 tries to be statistically significant. I'm
gathering a lot of data but so far most of it only indicates what
doesn't work all that well. Evolving strong nano warriors is fairly
easy but it is quite difficult to produce even half-competitive
warriors for larger core sizes. Of course that's what makes it
interesting to try. For these experiments absolute strength is only an
indicator, not the goal (good thing!). There are other evolvers
(Maezumo, CCAI, Species, etc.) that use more direct methods for
producing stronger warriors; my goal, however, is not to use direct
knowledge of corewar techniques but rather to study the creation of
code where the optimum code sequences are not specified. Here is a weak
but interesting warrior that was produced during one of the tiny runs...
;redcode
;name 22_16
;author mevo
;strategy evolved
;generation 63
;species 141
;origin 40_7
;assert 1
spl.a #17,<-372
spl.b {124,#-285
mov.i >-5,}-1
end
11/21/09 - Updated the packaged Linux and Windows versions to the
11/19/09 code. The new options seem to improve things slightly (at
least for nano; more testing is needed with larger coresizes) and the
changes are fairly minimal, so I'm going for it. If there are any
flub-ups with the new packages, here are the previous Linux and
Windows versions that use the 10/02/09 code.
11/19/09 - I've
noticed that MEVO isn't quite as strong as RedMixer, especially for
larger coresizes. It's very close, but some of the things I omitted to
simplify might have an effect, so I'm experimenting with adding a few
things. The
"test" source and docs now include new INI options xwrap and ywrap to
enable wrap-around, and a flipstart option to select the odds of
starting the crossed warrior with the warrior that just won the battle
as opposed to the mate (previously it was 50/50 but it might work
better if it starts with the winner more often).
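The wrap options amount to modulo arithmetic on the soup coordinates;
a sketch (not the actual code, and the no-wrap edge handling shown is
just one plausible choice)...

' optional wrap-around when locating a neighbor column
' (rows work the same way with ywrap)
Function NeighborX(ByVal x As Integer, ByVal dx As Integer, _
                   ByVal xsize As Integer, ByVal xwrap As Integer) As Integer
    Dim As Integer nx = x + dx
    If xwrap Then
        nx = (nx + xsize) Mod xsize   ' falling off one edge enters the other
    Else
        If nx < 0 Then nx = 0         ' no wrap: clamp to the soup edge
        If nx >= xsize Then nx = xsize - 1
    End If
    Return nx
End Function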
11/1/09 - Mostly fixed the (lack of) speed issue with Windows by
making my own "fast" command interpreter, written in FreeBasic at that.
The 1000 shell test now turns in 18 seconds, compared to 47 seconds
when using the default "cmd" interpreter. The "binrun" program also
works with Maezumo and should be adaptable to almost anything that
shells binaries; just wrap it in a batch file that sets comspec to the
binrun.exe file. At the moment it has only been tested with Windows
XP. The fallback feature assumes that the normal cmd.exe interpreter
is in %windir%\System32; if not, it won't process anything except for
binary shells.
The Windows version of MEVO now comes with binrun, it's still not as
fast as the Linux version when evolving nano warriors but good enough
for now, possibly approaching the limits of the platform. Can be
made a bit faster by setting pmars in the INI to bin/pmars.exe instead
of just pmars so it won't have to search the path for it.
10/31/09 - Updated both the Linux and Windows versions to include
mods to the bench tools - the "findbest" program that cross-references
two benchmark reports now can read settings from a file to permit
scripting without having to type in filenames, now it works by just
double-clicking after running the "bench1" and "bench2" tests.
I'm homing in on the shell speed issue - I found that on my WinXP
system, if I modify the startup batch file to include the line "set
comspec=%windir%\SYSTEM32\COMMAND.COM" along with a "set mode
lines=25" to avoid screen flicker, the shells are over 50% faster...
when applied to the shell test program it turns in about 30 seconds
for 1000 shells, down from 47 seconds. This isn't exactly a desirable
solution, but it is faster despite having to shift to 16 bits then
back to 32 bits with every shell (that seems to be what causes the
flicker and surely is unnecessary overhead). I'm now searching for a
32-bit cmd replacement that can perform faster shells, but so far I'm
not having luck - it seems that the priority with add-on shells is the
GUI, more commands and other interactive features; I see no evidence
that much thought is given to how fast a simple "%comspec% /c program
> file" can execute. What I really need for shell-based evolvers like
MEVO, Maezumo, etc. is a very simple command interpreter that only
knows how to (quickly) run a program and redirect its console output
to a file.
10/27/09 - To try to get to the bottom of the shell slowness problem
when evolving nano warriors under Windows I wrote a little test
program...
100 print "Shelling pmars nano 1000 times..."
110 print TIME$
120 for i = 1 to 1000
130 shell "bin\pmars -bks 80 -p 80 -c 800 -l 5 -d 5 -r 100 top.red > temp.fil"
140 next i
150 print TIME$
160 kill "temp.fil"
170 shell "pause"
180 system
I ran this under Blassic, FreeBasic and QBasic, and under Linux
using Blassic and (patched) FreeBasic on my 1.4GHz AMD system.
Results...
Windows Blassic: 48 seconds or about 21 shells per second
Windows FreeBasic: 48 seconds or about 21 shells per second
Windows QBasic: 23 seconds or about 43 shells per second
Ubuntu Blassic: 7 seconds or about 143 shells per second
Ubuntu patched FreeBasic: 7 seconds or about 143 shells per second
The most significant thing about this test is that the problem is NOT
with FreeBasic - Blassic turned in identical results. QBasic does
better, but not by as much as I thought. I think what's going on is
that shell has to load the command interpreter (cmd in the case of
Win32, command for QBasic), and this appears to dominate the run time.
Linux is based on programs shelling programs; the time it takes to
load an sh shell is insignificant, so it's clearly the OS of choice
for shell-based evolving. Open Pipe in native FreeBasic is probably
slightly faster still, since it doesn't have to write a temp file for
scores.
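For reference, the pipe approach looks something like this in
FreeBasic (an untested sketch using the same pmars command line as the
test program above)...

' read pmars output through a pipe instead of shelling to a temp file
Dim As Integer f = FreeFile
Open Pipe "bin/pmars -bks 80 -p 80 -c 800 -l 5 -d 5 -r 100 top.red" For Input As #f
Dim As String ln
Do Until EOF(f)
    Line Input #f, ln
    ' ...parse the scores from ln here instead of reading temp.fil...
Loop
Close #f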
So... [edited 10/28] the only practical
code-based solution I can think of for faster nano evolving with MEVO
under Windows is to modify pmars or exmars so that it writes
a file, eliminating the need to load a command shell. Should be
possible to execute the binary
directly. Theoretically. Another option used by some evolvers is to
embed the mars simulator in the program itself, totally eliminating an
external binary. FreeBasic permits linking external libraries into the
binary, but such a solution would not work for MEVO - the whole idea
behind MEVO's multi-threaded approach is to tell the OS to run pmars as
a separate task so it can be executed in another core on a multi-core
machine. This is real multi-tasking; the way I understand it, multiple
threads in the same task still only run on one core, at best using
"HyperThreading" to speed up execution.
[but... since cmd.exe seems to be the holdup, a potential
environment-based solution is to replace it with something faster
rather than trying to change the program code]
10/13/09 - I packaged MEVO for Windows. Results are mixed - nano
evolving is about 7 times slower under Windows XP on my
hardware but it's probably OK for casual use and larger coresizes
aren't impacted nearly as much. To keep the size down the included
pmarsv and plotlog binaries are dos-based. The pmarsv binary can be
easily replaced with the much better SDL version, but I had no luck
making a true Windows version of the plotlog utility - could not get
FreeBasic graphics to work (will probably take using OpenGL or other
libraries, not QB methods). The dos version of plotlog (assuming it
runs at all; it works under XP) can only run full-screen, making it
difficult to save the graphics output, so I added a bit of code to
dump the graphics to a BMP file. Well... now I know what the limitations
are. For obscure software like this it doesn't matter that much but
occasionally I
get called on to (quickly) write Windows programs so knowing what works
and what doesn't and how to work around the issues is very useful for
future reference.
10/5/09 - In further tests a tiny run without crossover has reached
a Franz score of 175 by almost 2M iterations (mostly stones), whereas a
run with crossover enabled is only up to a score of 136 by about 1.5M
iterations (mostly papers). This is in contrast with previous results.
Side-by-side nano runs didn't show a whole lot of difference; both
produced warriors scoring around 150 with the run with crossover
enabled doing a bit better against the nanobm07 benchmark and a bit
worse against the nanob2 benchmark. A pair of coresize 8000 runs
produced mostly stones (usually I get more papers), the run without
crossover reached a Wilkies score of 84 and the run with crossover
reached a score of 93. It appears that random forces have more of an
effect on the quality of the evolved code than whether or not crossover
is enabled (at least with the stock settings), confirming previous
observations with other evolvers that this style of redcode crossover
doesn't make a whole lot of difference. It'll take many more runs to
draw statistically valid conclusions but it's fairly clear that it's
not a magic bullet, just another kind of mutation to try that might or
might not help.
10/3/09 - The "test" source and docs have been updated to the
version
with crossover [...]. The crossover scheme is
fairly simple: it picks another surrounding warrior of the same size
and species, and of the available candidates randomly selects one with
the most wins. The warrior that just lost is excluded from selection.
The starting source is random, and it only flips the source on whole
lines, controlled by two rate variables. If no suitable mate is found
it just mutates. Hmm... tiny evolving is much better now; an overnight
run reached a Franz score of over 160, versus about 155 without
crossover after a much longer period of evolving [...] the main tar.gz
package is now updated
with new code, docs and tools. If there are any problems, the previous
9/22/09 compiled package (without crossover and tools but fairly
well-tested) is here.
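In FreeBasic terms the line-flip crossover looks something like this
(a sketch, not the actual code; flipmaterate and flipbackrate are the
setting names floated in the 10/2 entry below)...

' line-level crossover: copy lines from one parent with a chance per
' line of flipping the source to the other parent and flipping back
Sub CrossLines(winner() As String, mate() As String, child() As String, _
               ByVal nlines As Integer, ByVal flipmaterate As Double, _
               ByVal flipbackrate As Double)
    Dim As Integer frommate = Int(Rnd * 2)    ' starting source is random
    For i As Integer = 0 To nlines - 1
        If frommate Then
            child(i) = mate(i)
            If Rnd < flipbackrate Then frommate = 0
        Else
            child(i) = winner(i)
            If Rnd < flipmaterate Then frommate = 1
        End If
    Next i
End Sub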
10/2/09 - One easy way to make a standalone benchmark solution is to
just compile the Blassic code with FreeBasic (the INI-based version
similar to the one in Maezumo; obviously once compiled, the source
can't be edited to specify parameters). This provides the best of both
worlds: standalone operation, or, to save space, delete the binaries
and run the programs interpreted. [...]
I'm not exactly thrilled with tiny warrior evolving performance...
have to try really hard to get even 150 Franz points, only in the 90's
against tinybm04. Not good. With RedMixer I had at least one run that
produced warriors scoring in the 170 Franz range. So the question still
remains - does crossover produce a tangible benefit? Or was it just
luck? I'd say for nano it doesn't matter that much, but there might be
an effect for larger coresizes. I am considering adding crossover to
MEVO.
In RedMixer the only crossover mode that seemed to work well was
crossing with the same species and choosing a surrounding warrior with
the highest accumulated score; otherwise too much broken code was
produced, making it worse than no crossover at all. If crossover is
added to MEVO I want it to be simplified, avoiding options known (now)
not to work well, and also to keep from cluttering the INI too much.
RedMixer only permitted 50/50 (on average) mixing; I'd like to be able
to control the proportions. Keeping track of accumulated score is
probably unnecessary - for selecting the mate it's probably enough to
simply track the number of wins and pick the surrounding mate of the
same species with the most wins; a high accumulated score often just
means the warrior is surrounded by weak code. There's no real need to
save the wins information; when restarting, crossover might not be as
effective for the first few cycles, but I doubt that has a significant
effect, at least not as much of an effect as changing the file
formats, which I'd like to avoid (the INI doesn't count since if the
settings are missing it'd just default to no crossover). RedMixer
could cross both lines and individual items, but the latter is dubious
- numbers usually depend on the instruction using them - so I could go
with lines only. So... it might be possible to implement crossover
using only settings for enablecross (yes/no), [edit] flipmaterate
(chance of changing to mate) and flipbackrate (chance of flipping back).
9/29/09 - Updated plotlog with better error checking and an option
to autodetect the X iteration and Y score settings from the log file
rather than entering numbers (defaults used for the other variables)
[updated again 9/30/09(B) with better default operation and to fix an
issue with FB - doing SCREEN 0 then exiting was leaving a hung task so
took out that part].
It's slightly more refined now but still crude and rude; some inputs
will result in a messy (or no) display. Not a problem - proper inputs
look fine. As far as MEVO itself, I'm fairly happy with its performance
and have no immediate plans for changing anything in the program code
itself (unless I find a bug). Could be a bit stronger (especially for
tiny warriors), but there's a lot of experimenting left to do with the
INI settings to try and improve strength before considering hacking in
more stuff. There will probably be updates to the pre-compiled Linux
archive to add in the log plotter, hopefully better INI files, and some
kind of benchmarking solution. The Blassic-based nanotest package works (for nano) but
I'd prefer a self-contained compiled menu-driven solution. Something
that can automatically run a dual benchmark and report the warriors
that do well against both test sets, and doesn't require another
download to use.
9/28/09 - Here is plotlog.bas for
graphing a MEVO log file. It's written in QBasic syntax but can be
compiled using FreeBasic's -lang qb -e options [for Windows has to be
run in QBasic, full-screen only]. The
Linux-compiled version doesn't require a terminal, on my Ubuntu system
I can double-click the binary and the current directory gets set right,
but this isn't true of all versions of Linux so a script with cd
`dirname $0` etc might be needed.
The following charts show the results from a couple of runs...
The X axis shows millions of iterations, the Y axis shows the benchmark score. The blue line shows the average peak score, the brown line holds the top score, the red dots show the average score (note that it oscillates), the purple dots show individual score results. The first run was sampled every 1000 iterations, with averages computed over 100 samples (100,000 iterations), the second run was sampled every 500 iterations with averages computed over 50 samples (25,000 iterations). The first run used the nanob2 test set (the top-20 Koenigstuhl warriors), the second run used the nanobm07 test set.
The program prompts for the details when starting...
Nothing fancy, just enter for the default or enter something to
change. X iterations and Y score define the scale maximums, averaging
size is the number of data points to integrate for the averages, the
increms determine the chart lines and label points. [updated 9/30]
Here are the top-scoring warriors at the ends of those runs...
;redcode ;redcode
;name 3_20 ;name 45_19
;author mevo ;author mevo
;strategy evolved ;strategy evolved
;generation 516 ;generation 532
;species 4659 ;species 1719
;origin 37_15 ;origin 32_12
;assert 1 ;assert 1
mov.f <-7,$-36 mov >23,$-36
spl.i #-42,>18 spl.ab #-5,<-40
mov.i *-3,{-2 mov.i >23,{-1
mov.i {-3,{-2 mov.i >-33,<-2
djn.i $-2,$-3 djn.i $-2,@-2
end end
;benchscore 156.37 (nanob2 142 rounds) ;benchscore 149.71 (nanobm07 142 rounds)
Both of these warriors show "focus" from re-insertion, scoring in
the low 140's against the other test set. Not so great but it
represents typical results - it usually takes several runs to produce a
warrior able to place near the top of the SAL nano hill. Also, the
top.red warrior is often not the overall best; in these runs, other
warriors in the save directory scoring slightly less against the
evolving test set scored much better against the other test set. The
findtop.bas program in the nanotest package (see the 9/23 entry)
compares scores from two benchmarks to find the "best" warriors. These
look like they would do a little better (modified to add extra info to
the comments)...
;redcode ;redcode
;name 7_3 ;name 57_6
;author mevo ;author mevo
;strategy evolved ;strategy evolved
;generation 456 ;generation 368
;species 2960 ;species 4926
;origin 37_15 ;origin 32_12
;assert 1 ;assert 1
mov.i }-9,$-33 mov <-1,$-33
spl.ba #-42,>18 spl.a #-6,<-40
mov.i @24,{-2 mov.i *19,{-1
mov.i {-3,{-2 mov.i >-28,<-2
djn.i $-2,$-3 djn.f $-2,$-3
end end
;09-23-2009,iter 1147286,save/181.red ;09-16-2009, iter 862022,save/162.red
;benchscore 154.01 (nanob2 142 rounds) ;benchscore 148.32 (nanobm07 142 rounds)
;otherscore 149.36 (nanobm07) ;otherscore 152.78 (nanob2)
The INI files used to make these are similar to the 9/15 nano INI's:
the first run had 3 .i modifiers (instead of a space replacing one of
them) and benchinterval set to 1000. The second run had rounds set to
150, modrate to .04 and swaprate to .01.
9/23/09 - Here's a possible benchmarking/selection solution for nano
warriors - mevo_nanotest.tar.gz
contains adaptations of the warbench
program (requires Blassic) for
testing the soup and save directories against the nanobm07 and nanob2
test sets. Another pair of scripts reads the bench reports and
generates new reports to find warriors that score well on both test
sets. One possible (but simple) way to do it. Although configured for
nano warriors and MEVO's directories, it would be trivial to adapt
these to other coresizes and apps.
9/22/09 - The 9/20/09 code seems pretty stable and no more major
features are planned in the near term, so I archived a pre-compiled
Linux version of MEVO (or at least one way the program can be
packaged). The
archived directory tree contains pmars and pmarsv binaries and
benchmark sets plus a couple of run scripts, one for running in xterm
and another that tries other terminals. MEVO should work (perhaps with
a bit of script editing) on most Linux systems but how easily depends
on the distribution. Ironically the most complex aspect when trying
some live-CD's was merely extracting the tar.gz file and
creating a new tar.gz file from the run directory to copy the evolved
results back to a thumbdrive. Other than archiver/OS oddities MEVO ran
under Fedora, Mandriva and Slax, but did not work right under Puppy due
to problems with color support in the terminal. I recommend Ubuntu
which has a very easy-to-use archiver and doesn't mind files on the
desktop. Recent Ubuntu live CD's don't work on my odd hardware but MEVO
runs fine using an old Ubuntu 5.10 live CD. Not as fast as on my
installed Ubuntu 8.04 OS but still by my measurements evolves nano
warriors almost 3 times faster than under WinXP.
9/20/09 - This update improves random selection when picking battle
pairs: edges are no longer always in the 2nd position, and it works
with very small soups or a single row. For Linux, improved the visual
battle function (keys 1-9); it doesn't prompt to press a key to
continue unless the battle is completed and scores are displayed.
Added a displayopts setting to permit adjusting the minimum text
display lines and the colors for text and the soup border, and to
optionally disable/enable unicode border characters.
I did some Windows testing - confirmed that the nice unicode border
does not work under Windows (set both usewin and useuni to 0 when
compiling for Windows; at least this is the case with XP). Evolving
nano and tiny warriors under Windows is much slower than in Linux -
under Linux I get over 1000 nano battles every 8 seconds, it takes
about a minute to do the same thing under Windows, about 7 times
slower. The difference seems to be a few dozen extra milliseconds when
shelling pmars and opening the results file [or possibly also updating
the display]. When evolving for coresize
8000 where battles might take a quarter of a second or more it doesn't
make much difference, but when [millions of] nano battles are needed
to make a strong warrior then a few
extra milliseconds per shell is a big deal. QBasic is faster in
this
situation (assuming the OS can run a dos app), but requires spaghetti
error
handling and can't access enough memory to use an array soup except for
very small evolvers using a relatively small soup, larger QBasic-based
evolvers have to use individual warrior files [further benchmarking
shows that FreeBasic or Blassic under Ubuntu is still about 3 times
faster than QBasic under WinXP when shelling pmars].
9/19/09 - Had previously added a testdir INI option so a different
bench set can be used for single-warrior testing via the interface, but
forgot to add the code to actually use it. Fixed.
9/18/09 - Wrote up some docs; they're preliminary and might have typos
etc. but explain things in a lot more detail. Possibly too much
detail, but I'm compelled to explain things like why the simple
pair-battle algorithm works reasonably well (and very well for nano)
and why strict evaluation schemes usually don't work so well, in
addition to documenting the program itself. But there are still many
things to learn, new techniques to discover. Or rediscover. I suspect
that there are major improvements to be had not with more code but
just better parameters.
Then there is crossover... although MEVO is mostly an improved
version of RedMixer written to more modern standards, I didn't copy
over the "mix" part as my previous implementation produced only modest
improvement. Perhaps I eventually will (every little bit helps) but I'd
rather see a better implementation of crossover that produces a more
obvious improvement in strength. But before doing that, it probably
would be wise to try to get as much performance as possible from the
simple mutation scheme. Otherwise it is difficult to tell if
improvement from adding crossover is really from crossover, or simply
because it added more mutation.
Instead of worrying about fancy stuff, for now concentrating on
refining the evolving framework itself. Time permitting of course -
slam time approaches once again so I don't want to get too deep into
it. Tonight's update is for a couple of simple nice things... now when
starting and scanning the saved warriors it parses the filenames to
determine the next number to save to (even if some of the warriors are
missing), ignores warriors that aren't numbered (so if I want I can
name and dress warriors in the save directory without interfering),
and saves the file number of the top-scoring warrior so the top score
can be displayed as soon as the evolve process is started.
Here's a fairly strong nano warrior made using the 9/15/09
settings...
;redcode-nano
;name Creamed Corn
;author Terry Newton
;strategy Evolved by MEVO, gen 2706
;strategy NanoBM07=153.9 Nanob2=156
;generation 2706
;species 8922
;origin 58_12
;assert CORESIZE==80
mov >22,$10
spl.a #-6,>-38
mov.i <-1,{-1
mov.i {-2,<-2
djn.x $-2,<-34
end
;benchscore 153.94 (nanobm07 142 rounds)
...took a couple of days but got 3rd place on SAL Nano.
9/16/09 - Added logging - set enablelog to yes to write start and
stop times and all benchmark results to a mevo.log file in the program
directory. Start and stop actions record date, time and iteration
count. Bench actions record the score, top score, generation, date,
time, iteration count and if saved, the filename saved to. This data
can be useful in a couple ways, can be used to more accurately gage
performance and can be used to analyse strength increase over the
course of a run. For performance, my Ubuntu desktop does 1100 nano
battles (1000 iterations plus a 50-warrior bench every 500 iterations)
about every 8 seconds with one thread, my HP 110 Mi does the same
number of nano battles every 9 seconds with 4 threads, 10 to 11 seconds
with a single thread. Pushing it to 8 threads gets it down to about 8
seconds. Hmm... feels faster but I guess not. Still, that's over 100
nano battles per second. I haven't measured it yet, but my little Asus
701SD seems to perform at about 1/2 to 2/3 that speed. Don't expect
anything like this kind of speed with MEVO under Windows (the old
QBasic evolvers can come close, but FreeBasic for Windows has a fairly
slow shell command I've noticed). For larger coresizes the speed of
Linux vs Windows will probably be closer to the same, less limited by
the speed of the shell command.
Alternate nano settings that seem to be doing a bit better on my
Asus 701SD...
rounds: 200
instrate: 0.02
modrate: 0.03
moderate: 0.03
datarate: 0.07
insrate: 0.01
delrate: 0.01
swaprate: 0.01
dupline: 0.3
incdec: 0.3
bignum: 0.5
instructions: mov mov mov mov mov spl spl spl djn djn
instructions: add sne seq slt jmn jmz mod
modifiers: .i .i .i .a .b .f .x .ab.ba
modes: $#@*<>{}
After a couple of days of churning it's up to a nanobm07 score of over
157 and a nanob2 (Koenigstuhl top-20) score of around 150, whereas the
other settings are up to 154 or so. It could be just statistical
differences. I notice some focusing from using reinsertion to guide
the soup; in other evolving experiments, free-running evolution tends
to produce warriors that score about the same on those two benchmarks.
9/15/09 - Here is a mevo.ini file for evolving nano warriors...
;mevo.ini file for nano (9/15/09)
xsize: 77 ;width of soup
ysize: 21 ;height of soup
maxsize: 5 ;max evolved length
;pmars parameters...
coresize: 80 ;size of core array
processes:80 ;max processes
cycles: 800 ;cycles before tie
maxlen: 5 ;maximum warrior length
mindist: 5 ;minimum separation
rounds: 250 ;# of battle rounds
pmarsbin: pmars ;path/name of pmars binary
pmarsvbin: pmarsv ;path/name of pmarsv binary
pmarsvopt: -v 564 ;pmarsv view options etc
;mutation parameters...
instrate: 0.02 ;chance of instruction change
modrate: 0.03 ;chance of modifier change
moderate: 0.03 ;chance of address mode change
datarate: 0.06 ;chance of field value change
insrate: 0.01 ;chance of line insert
delrate: 0.01 ;chance of line delete
swaprate: 0.02 ;chance of line swap
dupline: 0 ;if insert, chance of dup line
incdec: 0.3 ;if data, chance of inc or dec
bignum: 0.5 ;if data, chance of big number
;code generation...
instructions: mov mov mov mov mov spl spl spl djn djn
instructions: add sne seq slt jmn jmz mod
modifiers: .i .i .a .b .f .x .ab.ba
modes: $$#@*<>{}
;bench parameters...
enablebench: yes ;yes to enable auto-bench/re-ins
benchrounds: 142 ;rounds used for benchmarking
savethresh: 95 ;percent of top score to save
reinsertmode: 2 ;0=none 1=top.red 2=from save
benchinterval: 500 ;bench every n iterations (0=none)
reinsertinterval: 5000 ;re-insert every n iterations (0=none)
benchdir: nanobm07 ;directory containing benchmark warriors
savedir: save ;directory to save warriors to
testdir: ;directory for single-test warriors (def.benchdir)
;other parameters...
initmode: 1 ;start warriors 0=first inst, 1=random
spthresh: 5 ;percent change before different color
infoline: evolved ;added to strategy line
threads: 4 ;# of processing threads
threadsleep: 0 ;# ms to sleep between threads
;end of ini file
Here's a warrior made with these settings... (not super-strong but
it's early)
;redcode
;name 11_16
;author mevo
;strategy evolved
;generation 556
;species 9940
;origin 58_12
;assert 1
spl.x #-21,>-31
mov >17,{-1
mov.i >10,<-2
mov <-3,{-4
djn.f $-2,<-16
end
;benchscore 151.97 (nanobm07 142 rounds)
Most of the INI settings should be obvious if you're familiar with
corewar and evolvers. The mutation rate options specify the chances of
changing various code elements and performing code-rearranging
operations. If auto-bench is enabled, it
scans the save directory to determine the best score so far, and saves
warriors within a percentage of the top score specified by savethresh.
The reinsertmode option enables and specifies periodic re-insertion of
either the top-scoring warrior or a randomly-selected saved warrior
back into the soup. The interval settings specify the number of soup
battles (approximately) between benchmarks and re-insertions. The
initmode setting determines how the warriors start, 1 for random or 0
for zeroed arrays which with the above instruction settings starts new
warriors with lines of nop.i $0,$0. Warrior length is variable; the
maximum evolved length is set by maxsize. Warriors always begin
executing at the first line. The spthresh setting determines the
percentage of instructions that can change before a new display color
is picked; threads determines how many copies of pmars it attempts to
run at once; threadsleep determines the delay between each thread
launch and can be used to dial back CPU usage for better cooling or
system response while evolving.
The evolving method used by MEVO is similar to my other evolvers...
initialize
do
randomly pick two nearby warriors
battle them in pmars and collect scores
copy the winner over the loser while making random changes
loop until stopped
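In FreeBasic-ish form, one iteration of that sequence might look like
this (a sketch - the Declare'd routines are hypothetical stand-ins for
the real code)...

Declare Function PickRandomPos() As Integer    ' random soup position
Declare Function PickNeighbor(ByVal p As Integer) As Integer
Declare Function Battle(ByVal a As Integer, ByVal b As Integer) As Integer
Declare Sub CopyWithMutation(ByVal src As Integer, ByVal dst As Integer)

Sub EvolveStep()
    Dim As Integer a = PickRandomPos()
    Dim As Integer b = PickNeighbor(a)   ' "nearby" means adjacent in the grid
    Dim As Integer w = Battle(a, b)      ' shell pmars, return the winner
    Dim As Integer l = IIf(w = a, b, a)  ' on a tie, the first warrior loses
    CopyWithMutation(w, l)               ' copy winner over loser with changes
End Sub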
The multi-threaded version just launches multiple copies of the key
sequence with additional code to make sure they don't pick the same
warriors or otherwise interfere with each other. This is a very simple
way to evolve but it resembles how evolution occurs in nature and in my
experience produces superior results compared to other relatively
simple methods. With a grid-based soup, "nearby" means adjacent. When
battling only the winner matters, in the case of a tie the program
considers the first warrior to be the loser. When making random
changes it's important to not change too much, even allow some perfect
copies, but provide a chance of making many changes at once for more
complex advancements in a single generation. The algorithm doesn't
impose strict evaluation, so weak code has a chance to survive even if
surrounded by stronger warriors. This is important as new forms (like
papers) usually start out weak and usually require a few generations to
be able to defeat simpler but initially stronger warriors (like
stones). But if the form remains weak it will be consumed. Islands of
similar code develop as warriors replicate, where they can work out
strategies while other areas of the soup are working out different
strategies. Eventually the soup will mostly fill with a dominant
strategy but this algorithm improves diversity by providing an
opportunity for different strategies to develop.
9/14/09 - whew! I had to pull the previous packages after
discovering several unnoticed bugs... the settings in the included nano
INI worked but other options including the coresize 800 defaults were
broken. Hopefully it's straightened out now; it seems to be better
than ever with the new interface, but I don't dare package it yet.
Terry Newton (wtn90125@yahoo.com)