This posting and php code sample is about fascinating topic of Genetic Algorithms (GA) which simulate evolution using computer code to help find near-optimal solutions when dealing with problems that involve multiple disparate requirements. I recently began playing with Genetic algorithms (Genetic algorithms GA, for short) which are part of a class of evolutionary algorithms (EA) that make up a larger segment , machine learning software) as a way for me to learn them. I was inspired after reading a few excellent tutorials on machine learning here by Burak Kanber.
GA’s are good at finding near optimal solutions for a variety of combinatorial optimization problems, where you are trying to determine the best resource allocation strategy (money, time, volume , amounts, design, etc) given certain constraints (capacity, size , time, gravity etc.).
To see a neat visual example of this, check out BoxCar2d, they are using the javascript Box2D physics engine and attached a GA to help the computer evolve a design to produce a car/bike that with wheels will traverse a rolling terrain efficiently (making fast forward progress and not flipping over). When it first starts you get some odd design that fail instantly, but over successive generations the machine adapts and produces better designs. The entire simulation runs in your browser. You can see more examples of what people are doing with GA’s and how they are creating visualizations for them.
What fascinates me about GA’s is first ,GA’s give you a glimpse into machine learning, and you can almost see the machine learn in a way that you can understand, , second GA’s have lots of practical applications (see use case section below), each solution is novel and often times unexpected, and lets the machine work out the solution on its own, finally GA’s are actually an algorithm you can more easily get your head around, instead of more complex machine learning algorithms…
Basics of Genetic algorithms
The basic steps of a Genetic algorithm are as follows:
- Data Representation (the genes): Come up with a method to represent the data (data being the individual properties/characteristics that make up an individual element), these individual pieces of the data can be termed genes. Determining how to represent the genes is a big part of preparing your GA. The genes can be a sequence of binary values, string characters, or other array of elements, that point to more complex data structures. To get a practical idea of what this means, see how the folks at boxcar2d represented their genes.
- Individual (aka Chromosome): Individual (often referred to as the chromosomes) is one entity that consists of genes arranged in a particular sequence. It is this arrangement that will be used as the basis for the evolutionary process.
- Initialization – Create an initial population (of chromosomes/individuals). This population is usually randomly generated (randomly generated genes for each individual) and can be any desired size, from only a few individuals to thousands individuals. The population consists of individuals, and these individuals consist of genes.
- Evaluation (the fitness function) – Each member of the population is then evaluated and we calculate a ‘fitness’ (sometimes fitness can be a cost function , such as searching for minimum cost is the “best fitness”) for that individual. The fitness value is calculated by how well it fits with our desired requirements. How you calculate the fitness of an individual is the most important part of the algorithm as it guides the evolutionary process to the best answer.
- Selection – We want to be constantly improving our populations overall fitness. Selection helps us to do this by discarding the bad designs and only keeping the best individuals in the population. There are a few different selection methods but the basic idea is the same, make it more likely that fitter individuals will be selected for our next generation.
-
Crossover (aka reproduction) – During crossover we create new individuals by combining aspects of our selected individuals. We can think of this as mimicking how sex works in nature. The hope is that by combining certain traits from two or more individuals we will create an even ‘fitter’ offspring which will inherit the best traits from each of it’s parents.Actually GA’s are more efficient than real world reproduction as we’re already pre-selecting the top n individuals to mate, versus just having some desired individual mate with some less desirable random one.
-
Mutation – We need to add a little bit randomness into our populations’ genetics, otherwise every combination of solutions we created would be in our initial population, this would create less than optimal solutions and would eventually hit a local maximum situation, where we really have the best solution for a constrained gene pool. Mutation typically works by making very small changes at random to an individual genes. Doing this “expands” the gene pool and may produce potential (novel) individuals with improved fitness, this is much like mutation works in real biology.
- Repeat (the evolution part) – Now we have our next generation we can start again from step four (evaluation) until we reach a termination condition.
Our Simple Genetic algorithm. .. re-create a string phrase
In this example genetic algorithm I will ask the GA to re-generate the character string “A genetic algorithm found me!“. Obviously we know the answer, but the interesting part is watching the machine figure out this solution starting from a random string to the final answer, using the GA approach.
The first step is determining how to represent the data (the genes), this is the basis for the GA, representing the individual via genes which mimic some element of the data we’re trying to work on. In our example we’re going to represent the genes as characters (ASCII characters), each character may come from one of these 78 letters numbers and symbols:
$solution_phrase="A genetic algorithm found!"; public static $characters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-+,. ';
but we could also have represented the genome as 0,1’s or perhaps as a linked list where the index points to a particular method or property . How we represent the genes is directly related to how we’ll evaluate the fitness of the individual later in the fitness function.
Genetic Algorithm PHP classes
I derived this PHP GA from similar examples in other OOP languages, breaking up each piece of the algorithm into classes. You could of course do this as procedural code and it would reduce the code size. But I chose to uses PHP classes it provides a little more flexibility to adapt this code to more realistic GA purpose.
Keep in mind there is a degree of flexibility the algorithm and you should tweak it to better fit your use case. GA’s algorithms will have many different strategies in dealing with mutation rates, crossover points, selection approaches, and evaluation functions, all these vary greatly depending on the solution you are looking for and how you would like the algorithm to perform.
Individual class
In this class we simply choose the data representation (genes) and randomly populate them for each individual (aka chromosome). There’s a few extra setters and getters but they may be superfluous.
class individual {
public static $characters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-+,. ';
public $genes = array(); //defines an empty array of genes arbitrary length
// Cached fitness value for this individual
public $fitness = 0;
// Create a random individual
public function generateIndividual($size) {
//now lets randomly load the genes (array of ascii characters to the size of the array
for ($i=0; $i < $size; $i++ )
$this->genes[$i] = individual::$characters[rand(0, strlen(individual::$characters) - 1)];
}
//.. a few more setters and getters
}
Population Class
Is mostly a convenience class that holds all of our individuals together are a unit. Below is the most important method the constructor that builds the inital population.
/*
* Constructor Create a population
*/
function __construct($populationSize,$initialise=false)
{
if (!isset($populationSize) || $populationSize==0)
die("Must specify a populationsize > 0");
for ($i=0;$i<$populationSize; $i++)
$this->people[$i] = new individual(); //instantiate a new object
// Ininitialize population
if ($initialise)
{
// Loop and create individuals
for ($i = 0; $i < count($this->people); $i++) {
$new_person = new individual();
$new_person->generateIndividual(count(fitnesscalc::$solution) );
$this->saveIndividual($i, $new_person );
}
}
}
Algorithm class
This is the heart of this example, here we do all the steps of the GA for this example. It has mostly static methods that perform all the core GA functions.
For example in the selection class, I elected to use a Roulette Selection ( Select x random genes and pick the best one ) approach, or I could have used Tournament selection (chance of being picked depends on fitness, higher fitness greater chance).
It’s important to not just to choose the top n fittest otherwise you may only be limiting yourself to the local maximum when better solutions exist, remember the algorithm is searching for a near optimal solution, if you reduce the mutation properties and limit its selection pool you are inadvertently limiting its potential optimal solution.
Also note that the mutation rate 0.01 in my example and the cross-over point (uniform rate) also affect the performance of the algorithm.
<?php
/**********************************************************
/ Class geneticAlgorithm : Genetic Algorithms
/
/*****************/
require_once('individual.php'); //supporting class file
require_once('population.php'); //supporting class file
class algorithm {
/* GA parameters */
public static $uniformRate=0.5; /* crosssover determine what where to break gene string */
public static $mutationRate=0.20; /* When choosing which genes to mutate what rate of random values are mutated */
public static $poolSize=10; /* When selecting for crossover how large each pool should be */
public static $max_generation_stagnant=200; /*how many unchanged generations before we end */
public static $elitism=true;
/* Public methods */
// Convenience random function
private static function random() {
return (float)rand()/(float)getrandmax(); /* return number from 0 .. 1 as a decimal */
}
public static function evolvePopulation( $pop) {
$newPopulation = new population($pop->size(), false);
// Keep our best individual
if (algorithm::$elitism) {
$newPopulation->saveIndividual(0, $pop->getFittest());
}
// Crossover population
$elitismOffset=0;
if (algorithm::$elitism) {
$elitismOffset = 1;
} else {
$elitismOffset = 0;
}
// Loop over the population size and create new individuals with
// crossover
for ($i = $elitismOffset; $i < $pop->size(); $i++)
{
$indiv1 = algorithm::poolSelection($pop);
$indiv2 = algorithm::poolSelection($pop);
$newIndiv = algorithm::crossover($indiv1, $indiv2);
$newPopulation->saveIndividual($i, $newIndiv);
}
// Mutate population
for ($i=$elitismOffset; $i < $newPopulation->size(); $i++) {
algorithm::mutate($newPopulation->getIndividual($i));
}
return $newPopulation;
}
// Crossover individuals (aka reproduction)
private static function crossover($indiv1, $indiv2)
{
$newSol = new individual(); //create a offspring
// Loop through genes
for ($i=0; $i < $indiv1->size(); $i++) {
// Crossover at which point 0..1 , .50 50% of time
if ( algorithm::random() <= algorithm::$uniformRate)
{
$newSol->setGene($i, $indiv1->getGene($i) );
} else {
$newSol->setGene($i, $indiv2->getGene($i));
}
}
return $newSol;
}
// Mutate an individual
private static function mutate( $indiv) {
// Loop through genes
for ($i=0; $i < $indiv->size(); $i++) {
if ( algorithm::random() <= algorithm::$mutationRate) {
$gene = individual::$characters[rand(0, strlen(individual::$characters) - 1)]; // Create random gene
$indiv->setGene($i, $gene); //substitute the gene into the individual
}
}
}
// Select a pool of individuals for crossover
private static function poolSelection($pop) {
// Create a pool population
$pool = new population(algorithm::$poolSize, false);
for ($i=0; $i < algorithm::$poolSize; $i++) {
$randomId = rand(0, $pop->size()-1 ); //Get a random individual from anywhere in the population
$pool->saveIndividual($i, $pop->getIndividual( $randomId));
}
// Get the fittest
$fittest = $pool->getFittest();
return $fittest;
}
} //class
?>
FitnessCalc Class (Evaluation function)
This class has mostly static functions and its purpose is to provide the evaluation function. The fitness function (aka evaluation function) is perhaps the most important part of the GA , as it is what guides the GA towards its solution. Any solution that you find via a GA is only as good as its fitness(Evaluation ) function. How we create a fitness function depends a lot on the type of data (genes) that we are dealing with. So the fitness function is very much dependent on the data we are representing. In this example the fitness function is simple. We’ll use a modified cost function (lower cost= better fitness), in other cases you may be looking for higher (maximum values).
In our simple example we choose to create a minimum cost function for the fitness function that is we evaluate each character in the string and compare is ASCII code value, then we calculate the difference from the solution character, and add them up. The strings with the lower values differences (least cost) are better… then this fitness value is assigned to the each individual’s fitness variable.
// Calculate individuals fitness by comparing it to our candidate solution
// low fitness values are better,0=goal fitness is really a cost function in this instance
static function getFitness($individual) {
$fitness = 0;
$sol_count=count(fitnesscalc::$solution); /* get array size */
// Loop through our individuals genes and compare them to our candidates
for ($i=0; $i < $individual->size() && $i < $sol_count; $i++ )
{
$char_diff=0;
//compare genes and note the difference using ASCII value vs. solution Ascii value note the difference
$char_diff=abs( ord($individual->getGene($i)) - ord(fitnesscalc::$solution[$i]) ) ;
$fitness+=$char_diff; // low fitness values are better,
}
//echo "Fitness: $fitness";
return $fitness; //inverse of cost function
}
Finally make use of all these classes here in the GA.PHP file which dos all the essential steps and provides a basic console UI.
// Create an initial population
$time1 = microtime(true);
$myPop = new population($initial_population_size, true);
// Evolve our population until we reach an optimum solution
while ($myPop->getFittest()->getFitness() > fitnesscalc::getMaxFitness())
{
$generationCount++;
$most_fit=$myPop->getFittest()->getFitness();
$myPop = algorithm::evolvePopulation($myPop); //create a new generation
if ($most_fit < $most_fit_last) //display only generations when there has been a change
{
// echo " *** MOST FIT ".$most_fit." Most fit last".$most_fit_last;
echo "\n Generation: " .$generationCount." (Stagnant:".$generation_stagnant.") Fittest: ". $most_fit."/".fitnesscalc::getMaxFitness() ;
echo " Best: ". $myPop->getFittest();
$most_fit_last=$most_fit;
$generation_stagnant=0; //reset stagnant generation counter
}
else
$generation_stagnant++; //no improvement increment may want to end early
if ( $generation_stagnant > algorithm::$max_generation_stagnant)
{
echo "\n-- Ending TOO MANY (".algorithm::$max_generation_stagnant.") stagnant generations unchanged. Ending APPROX solution below \n..)";
break;
}
} //end of while loop
//we're done
$time2 = microtime(true);
echo "\nSolution at generation: ".$generationCount. " time: ".round($time2-$time1,2)."s";
echo "\n---------------------------------------------------------\n";
echo "\nGenes : ".$myPop->getFittest() ;
echo "\nSolution: ".implode("",fitnesscalc::$solution); //convert array to string
echo "\n---------------------------------------------------------\n";
Termination condition: knowing when to finish
Determining when the GA has “Solved” the problem varies with the data and fitness function. In our example we’re looking for a particular string pattern and we know when to end, so we can tell the GA to keep working until it finds it. In many GA’s solutions you don’t know what the optimal solution is, in these cases, you need to settle for a near optimal, and this is where a termination condition comes into play.
Typically your going to terminate a GA when the max-fitness (or min cost in our case) of a particular population hasn’t changed in many generations, that is if the best fitness any one individual hasn’t changed at all, we can say we have found a near-optimal solution. You may also say because of time constraints you want to end after n generations, regardless.
Limitations of GA .a few things to watch out for..
As you will see, changing the number of genes in each individual , or changing the complexity of the fitness function will have a big impact on how fast and how accurately the GA will work. For example in our “Hello Genetic Algorithms!” example above it takes on average about 300-600 generations (~ 30 seconds) to reproduce the string exactly . If the string (genes) were longer or if the fitness function was more complex , it would take much longer or may terminate before the optimal solution is found.
GA’s aren’t applicable to all AI-type problems, they are really best at solving combinatorial optimization problems , when there may not be time to define a mathematically precise solution. However note, GA’s are onyl one of the many tools in the AI /machine learning toolkit. Also keep in mind because GA’s require a relatively long iterative process (the evolution steps relative to other AI algorithms) they may not be applicable to near real-time solutions, when the computation time to solve the problem exceeds the problem characteristics.
For example if your trying to determine optimum fuel-mixture for an engine that takes into account altitude. So say the car is going up or down a hill (altitude changes), the time to perform the GA for Altitude xxx, will be obsolete because the computation time may be longer than the real-time problem description.
Demo
I have created two demo’s both are included in the .zip. The one shown below is web page that calls the php server that runs the GA and the results are periodically streamed to the page, using server side events (javascript: EventSource)
The second demo (ga.php) in your command console. You’ll have to download the zip, extract it and run it from the command line where a php interpreter is available .(where php is available)
$ php ga.php
Download
- GitHub Code URL here: https://github.com/acbrandao/PHP/tree/master/ga
- Download the PHP class files here: ga_string_simple.zip
Applications of Genetic Algorithms – typical use cases
While the example above is a very simple basic one, I think that it should plant a seed in your mind that allows you to see how you can apply this programming technique to a much wider array of problems. GA’s are particularly adept at solving some of the following programming problems.
- The knapsack problem (combinatorial optimization problems) : GA’s are good at finding a near optimal solutions for this variety of combinatorial optimization problems, where you are trying to determine the best resource (money, time, volume , amounts, etc) allocation strategy given certain constraints. While certain types of problems can be solved optimally using techniques like linear programming, not all can and often times adjusting a GA to provide a near-optimal solution is faster and more beneficial than searching for the ideal solution.
- Traveling Salesman problem (and other similar NP-Hard problem): The TSP problem states: Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city . We can use GA to help solve or at least provide a near-optimal solution for certain classes of graph algorithms like the Traveling Salesman problem.
- Engineering and Physics: Here’s a neat example of using GA’s using Lego blocks and physics to generate interesting designs.
- Wide variety of scheduling and timetable problems. The list goes on , here’s at least a nice list of 15 real world uses of genetic algorithm here.




