APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Goal seeking code

(Traditional format)

Wed Jan 12 00:02:26 2005 Goal seeking code
Posted by Tony Lawrence
Referencing: /Blog/B1226.html

This is an extremely unsophisticated program that demonstrates evolutionary DNA. You start by passing it a string:


./goal.pl Goal
 

The goal, starting from two strings "abcdefg" and "gfedcba", is to end up with your string duplicated four times: "GoalGoalGoalGoal". It does this by having children that are produced by mating with other strings in its memory. The child produced is just position one from string1, position two from string two, etc. So starting with "abcdefg" and "gfedcba", the first "child" would be "afcdebg". However, random mutations are introduced to the children. Each string is then scored, and the highest scores get to reproduce again. This reproduction pattern tends toward similar parents producing similar children - much like real life.

Mutations here are more apt to change the case of a letter than change it outright, but they are aggressive. You can easily change these factors to see how they affect the results. So that the string can grow, letters can be doubled, though they can also be taken away. The strings are held to a maximum size of four times the length of your goal string, and the total number of strings is trimmed to a small number so that this doesn't grind away for hours and hours sucking up cpu and memory..

It also does a little Permian Extinction now and then - which brings some fresh faces into the gene pool. Simply, we just randomly wipe out up to 75% of the strings on each generation, thus giving the new strings more chance to mate.

It doesn't take long for the first patterns to appear. It's interesting that "bad" strings often hold good positions for quite a while, but by a few thousand generations, the top scoring strings have at least one correctly anchored pattern.

It may take quite a while for better patterns to emerge. People who prefer to believe that life came from "intelligent design" sometimes forget how much time the primitive life of this planet had for its patterns to emerge. For hundreds of millions of years, simple living things reproduced over billions and billions of generations, gathering improvements as mutations failed and succeeded or made no difference at all. Bacteria might reproduce every twenty minutes. How many generations is that over a few hundred million years? What improvements can come about? Well, we are the results of just that. Life on this planet had many more generations available than you are likely to have the patience or time for on your computer.

I've recently been re-reading John McPhee's Annals of the Former World. I have to keep a large dictionary beside me while reading this, which amuses my wife greatly as she doesn't often see me nonplussed by mere words. Nevertheless, it's a great read, and while it is starting to get a bit out of date with current knowledge, I still enjoy it. McPhee notes that for most of us, the concept of geographic time is beyond real comprehension - 100 million years or so is just a number, it's nothing that we can really appreciate. Analogies like "if all of the earth's history were the length of your arm, all of human history could be erased by a fingernail file" don't really help, but it is these incredible lengths of time that made complex life possible.

The scoring here rewards simple goals, and gives higher scores for better matches. The scoring is deliberately imperfect - though it is programatically interesting:

sub score {
my $trg=shift;
my $len=length($Goal);
my $lct=lc($trg);
my $s=0;
my $G=$Goal;
my $g=lc($Goal);

# This looks for matches in position.  If the goal is "Abc" and the 
# target is "Xyz-Mno-123-456", this compares "abc" to "xyz", then
# "Abc" to "Xyz, then "ab" to "xy" (and "Ab" to "Xy"), 
# "a" to "x", "bc" to "yz", "b" to "y", and "c" to "z".
#
# It then moves to "abc" compared to "mno", "ab" to "mn", etc.
#
# The longer the comparison, the more score for a match.  "Abc" 
# compared to "abc-xyz-mno-sdf" will match all of the case insensitive cases 
# for "Abc" to "abc"  and will match "bc" to "bc" and "c" to "c" 
# for both insensitive and case sensitive tests.

for ($y=0;$y< length($trg);$y+=$len) {
   for ($z=0;$z<$len;$z++) {
      for ($zz=$len-$z;$zz>0;$zz--) {
          $s+=(10 * $zz) if substr($g,$z,$zz) eq substr($lct,$y+$z,$zz);
          $s+=(20 * $zz) if substr($G,$z,$zz) eq substr($trg,$y+$z,$zz);
          }
   }
 }
if ( $trg eq "$Goal$Goal$Goal$Goal") {
  print "$s $trg - DONE!\n";
  exit 0;
}
return $s;
}
 
Crude as it is, the strings slowly get closer to the desired result. Just leave it alone for a few hours, and you should see the strings building toward the goal. There will be imperfect versions along the way; for example this has "Fog" as its target:
 -----Fog----(Population 14)----
01 of generation 00000550:  FJg-BCg-bbb-bAA 90
02 of generation 00000550:  CAg-ggg-AAb-bgg 90
03 of generation 00000550:  FBg-ggg-bbb-AAA 90
04 of generation 00000550:  FJg-ggg-bbB-AAA 90
05 of generation 00000550:  rOg-ABg-bAb-bBb 90
06 of generation 00000550:  rOg-ABg-bbb-bAb 90
07 of generation 00000550:  Ccg-Bgg-AAb-bgg 90
08 of generation 00000550:  AAA-AAb-bbb-bbB 0
09 of generation 00000550:  ABa-Abb-bbb-bbb 0
10 of generation 00000550:  ABA-abb-bbb-bbb 0

After just 550 generations, this is off to a good start.

Shorter strings of course get good patterns early.

 -----Xy----(Population 24)----
01 of generation 00000625:  XG-XC-AB-XA 90
02 of generation 00000625:  XG-XB-AB-XC 90
03 of generation 00000625:  XD-XB-AB-XA 90
04 of generation 00000625:  XA-XA-BB-XA 90
05 of generation 00000625:  XB-XC-AB-XA 90
06 of generation 00000625:  XG-XA-CB-XB 90
07 of generation 00000625:  XG-XB-CB-XA 90
08 of generation 00000625:  Aa-Xb-AB-XB 60
09 of generation 00000625:  Aa-Xb-AA-XB 60
10 of generation 00000625:  Aa-Xb-BC-XA 60

This has more partial matches, and is well on its way to perfection. As it gets closer, though, changing becomes harder, because most changes won't score well enough to beat these already good scores. The change has to be more beneficial, which is just as it works in the real world: succesful organisms tend not to evolve quickly.

Of course, we get much quicker results for a single letter goal:

 -----M----(Population 22)----
01 of generation 00001025:  M-M-F-C 60
02 of generation 00001025:  B-M-M-C 60
03 of generation 00001025:  M-M-C-D 60
04 of generation 00001025:  M-M-G-C 60
05 of generation 00001025:  O-M-M-B 60
06 of generation 00001025:  M-M-A-E 60
07 of generation 00001025:  M-M-F-A 60
08 of generation 00001025:  D-M-C-A 30
09 of generation 00001025:  C-B-M-A 30
10 of generation 00001025:  A-M-C-A 30
At Generation 1034 with score 120 MMMM - DONE!
 

It is is fun to play with various mutation/extinction schemes just to see what happens, though I don't think we have any scientific validity here. This was fun to write, and fun to play with, but it's not anything like the Avida project.

Perl source for all the other off-balance folks.

#!/usr/bin/perl
# Tony Lawrence http://aplawrence.com
srand;
$Goal=shift @ARGV;
$debug=shift @ARGV;
chomp $Goal;
$Glen=4 * length($Goal);
$generation=0;
$alive{'abcdefg'}="0";
$alive{'gfedcba'}="0";
$last[0]="abcdefg";
$last[1]="gfedcba";
while (1) {
  $population=30;
  foreach ( keys %alive ) {
    $alive{$_}=score($_);
  }
  mate();
  $generation++;
  killem();
  show();
}
sub killem {
 my $killoff=rand($population)+ $population/4;;
 foreach (sort scoresort keys %alive) {
   $killoff--;
   delete $alive{$_} if $killoff <= 0 ;
 }

}
sub show {
 myprint() if (not ($generation % 25));
}

sub scoresort {
  $alive{$b} <=> $alive{$a};
}

sub mate {
  my @babies=();
  my %mated;
  my %copy=%alive;
  my $matecounter=15;
  my $mykey;
  my $mate;
  foreach (sort scoresort keys %alive) {
   $mykey=$_;
   $mate=$_;
   $counter=15;
    foreach (keys %copy)  {
      $mate=$_;
      $newvalue=merge($mykey,$_);
      $newvalue=substr($newvalue,0,$Glen) if (length($newvalue) > $Glen);
      push @babies,$newvalue;
      last if not $counter--;
    }
 last if not $matecounter--;
  }
  foreach (@babies) {
    $key=$babies[$_];
    $tkey=$key;
    for ($babycounter=65;$babycounter <91;$babycounter++) {
      last if not $alive{$tkey};
      $where=rand(length($tkey));
      substr($tkey,$where,1)=chr($babycounter) ;
  }
  $parents{$tkey}="$mykey|$mate";
  $alive{$tkey}=score($tkey);
  #print "\tNew baby $tkey $alive{$tkey}\n";
}
}

sub merge {
 my $a=shift;
 my $b=shift;
 my $alen=length($a);
 my $blen=length($b);
 my $newstring="";
 my $longest=$alen;
 my  $swap=rand($alen);
 my @a=split //,$a;
 my @b=split //,$b;
 $longest=$blen if ($alen > $blen);
 # mutation actually starts up here
 $a[0]=$b[2] if $swap < 2;
 $a[1]=$b[2] if $swap < 2;
 $b[3]=$a[2] if $swap < 2;
 $b[2]=$a[2] if $swap < 2;
 while ($longest > 0) {
   $swap=rand(10)+1;
   if ($swap > 7) {
     $newstring .= $a[$longest];
     $newstring .= $b[$longest];
     $longest--;
     next;
   }
   if ($swap <4 ) {
     $newstring .= $b[$longest];
     $newstring .= $a[$longest];
     $longest--;
     next;
   }
   $a[longest]=uc($a[longest]) if $swap == 5;
   $b[longest]=uc($b[longest]) if $swap == 6;
   $newstring .= mutate($a[$longest]);
   $newstring .= mutate($b[$longest]);
   $longest--;
}
 chomp $newstring;
 $newstring =~ tr/A-Za-z//cd;
 $newstring="$a$b" if not $newstring;
 $newstring =~ tr/A-Za-z//cd;
 return $newstring;
}

sub mutate {
 my $string=shift;
 my $x=rand(1000);
 my $y=rand(25);
 $string=chr($y+65) if not $string;
 my $aval=rand(7) - 3 + ord($string);
 $string=chr($aval) if ($x >600 and $x < 700);
 $string=chr($y+65) if ($x >700 and $x < 800);
 $string.=$string if ($x >500 and $x < 600);
 $string=lc($string) if $x < 500;
 $string=uc($string) if $x < 250;
 $string=" " if ($x < 12);
 return $string;
}


sub score {
my $trg=shift;
my $len=length($Goal);
my $lct=lc($trg);
my $s=0;
my $G=$Goal;
my $g=lc($Goal);
# This looks for matches in position.  If the goal is "Abc" and the 
# target is "Xyz-Mno-123-456", this compares "abc" to "xyz", then
# "Abc" to "Xyz, then "ab" to "xy" (and "Ab" to "Xy"), 
# "a" to "x", "bc" to "yz", "b" to "y", and "c" to "z".
#
# It then moves to "abc" compared to "mno", "ab" to "mn", etc.
#
# The longer the comparison, the more score for a match.  "Abc" 
# compared to "abc-xyz-mno-sdf" will match all of the case insensitive cases
# for "Abc" to "abc"  and will match "bc" to "bc" and "c" to "c" 
# for both insensitive and case sensitive tests.

for ($y=0;$y< length($trg);$y+=$len) {
   for ($z=0;$z<$len;$z++) {
      for ($zz=$len-$z;$zz>0;$zz--) {
          $s+=(10 * $zz) if substr($g,$z,$zz) eq substr($lct,$y+$z,$zz);
          $s+=(20 * $zz) if substr($G,$z,$zz) eq substr($trg,$y+$z,$zz);
          }
   }
 }
if ( $trg eq "$Goal$Goal$Goal$Goal") {
  print "At Generation $generation with score $s $trg - DONE!\n";
  exit 0;
}
return $s;
}
sub myprint {
my $len=length($Goal);
my  $icount=1;
my  $count=9;
my $currentpop=keys(%alive);
print " -----$Goal----(Population $currentpop)----\n";
 foreach (sort scoresort keys %alive) {
    my $key=$_;
    my $string=substr($key,0,$len);
    my $klen=length($key);
    for ($x=$len;$x<$klen;$x+=$len) {
       $string .= "-" . substr($key,$x,$len) ;
   }
   $string .= " (" . $parents{$key}. ")" if $debug;
   printf "%0.2d of generation %0.8d:  $string $alive{$key}\n",$icount,$generation;
   $icount++;
   last if not $count--;

}
}




Got something to add? Send me email.





Increase ad revenue 50-250% with Ezoic


More Articles by

Find me on Google+

© Tony Lawrence



Kerio Samepage


Have you tried Searching this site?

Unix/Linux/Mac OS X support by phone, email or on-site: Support Rates

This is a Unix/Linux resource website. It contains technical articles about Unix, Linux and general computing related subjects, opinion, news, help files, how-to's, tutorials and more.

Contact us





If you ask "Should we be in space?" you ask a nonsense question. We are in space. We will be in space. (Frank Herbert)

Never let a computer know you're in a hurry. (Anonymous)












This post tagged: