GATSS Information

Information
This document contains general information about GATSS.

Background
Genetic Algorithms
The Traveling Salesman problem
GATSS Algorithms
System Overview
References

A not complete test analysis of the behavior of GATTS is also available.
If you want to see how things are done in more detail, feel free to take a look on the source files.

Background

GATSS is a Genetic Algorithm based solver of the Traveling Salesman problem written in GNU C++ linked to a user interface in HTML via CGI-script. It does nothing revolutional to the science of GA, it is just a standard algorithm enclosed in a user friendly interface. Maybe it can be used as a tool in teaching the concepts of Genetic Algorithms.

This program and its documentation is the result of an exercise in writing some kind of application related to the science of Artificial Life. The exercise is an examination part of the course "Artificial Intelligence II" given by the Department of Computing Science at Umeå University, Sweden.

The Story
When I first heard about Genetic Algorithms, I was fascinated by the fact that for once, the nature was the role model. That something as formal and abstract as Computer Science could learn from nature.
I searched the web for information on GA:s and found several interesting applications, general GA problem solvers, as well as GA software engineering environments.

One of the applications i found was 'DOUGAL' written by Brett S. Parker in 1993. It is a GA demonstration program written in Pascal for MS-DOS solving the Traveling Salesman problem (hereafter called TSP). I thought that the TSP was a good example in showing how GA works. It was a well known graph-theoretical problem and the results could, obviously, be nice visualised.
GATSS is very similar to DOUGAL. The differences are mainly that GATSS is implemented in C++ and has a HTML user interface.
So, thank you Brett, I hope you don't mind!.

When I had decided what kind of problem to solve (TSP) and in what language (C++), I searched for software engineering tools to speed up development. My first stop was not beautiful. I won't mention the name of the system here but it was ugly. Bad written code and bad written documentation.

Later on I ran into a C++/perl GA software engineering environment called 'GAGS' written by professor J.J. Merelo, GeNeura Team at the Department of Electronics and Computer Technology, University of Granada, Spain. I had difficulties in compiling GAGS for two reasons, one: I got the latest beta-version (with link bugs) at the time, two: The system had never been compiled at the UNIX-platform I used. Mr. Merelo was very helpful, and after some email-correspondance, the system was up and running.

Since I'm no C++-guru, and Merelo really uses the language at it's full extent, together with the fact that I'm new to the field of GA, I had difficulties in understanding the code. I finally decided to create the code I needed from scratch instead. No shadow on the GAGS-system though. It is a well written general C++-library combined with perl scripts, with program examples and a nice user interface. The documentation is at the time a little poor (at least the programmers manual) but Merelo says he's working on it.
Thank you J.J. for the help in compiling GAGS! Even if I couldn't use the system itself, it has inspired my own code writing.

When the actual coding was finished, phase two of this project was to be done: Analysis of the different parameters in the GA, i.e. what effect they have on the evolution process. This is greatly simplified with a good user interface. To execute the original program, the command line would look like this:

> travel 100 10 1.0 0 0.5 1 0 5 0

resulting in

Using seed: 799759313
Data:
[0] average: 66.8872    best: 52.477
[1] average: 58.9228    best: 48.2341
[2] average: 56.191    best: 48.2341
[3] average: 54.6087    best: 48.2341
[4] average: 53.034    best: 48.2341
[5] average: 51.8041    best: 48.2341
[6] average: 50.38    best: 48.2341
[7] average: 49.4576    best: 48.2341
[8] average: 48.5745    best: 48.2341
[9] average: 48.4072    best: 48.2341
End of Data
Best solution:
[0]: [0]: Umeå (0,0)
[1]: [1]: Stockholm (-1,-5)
[2]: [4]: London (-8,-12)
[3]: [5]: Berlin (-2,-11)
[4]: [6]: Prag (-1,-14)
[5]: [3]: Moskva (9,-9)
[6]: [2]: Helsingfors (3,-4)
[7]: [7]: Umeå (0,0)
Distance:    48.2341

The difficulties in keeping track of which of the parameters that was corresponding to which part of the GA made me dizzy. I decided to create a better user interface using HTML.
Here, WWW-guru Daniel Silén, helped me out. He wrote the CGI-script that links the user interface to the program. Thank you Daniel!

Genetic Algorithms

History
Genetic Algorithms were invented to mimic some of the processes observed in natural evolution. Many people, biologists included, are astonished that life at the level of complexity that we observe could have evolved in the relatively short time suggested by the fossil record. The idea with GA is to use this power of evolution to solve optimization problems. The father of the original Genetic Algorithm was John Holland who invented it in the early 1970's.

The Basic Algorithm

Initialize a population of chromosomes.
Evaluate each chromosome in the population.
Create new chromosomes by mating current chromosomes; apply mutation and crossover as the parent chromosomes mate.
Delete members of the population to make room for the new chromosomes.
Evaluate the new chromosomes and insert them into the population.
If time is up, stop and return the best chromosome; if not, go to 3.

If all goes well throughout this process of simulated evolution, an initial population of unexceptional chromosomes will improve as parents are replaced by better and better children. The best individual in the final population produced can be a highly evolved solution to the problem.

The most important parts of a GA is crossover, mutation, population deletion, and chromosome fitness evaluation.

Crossover

Mutation
The purpose of mutation is to reintroduce divergence into a converging population. This is a tool in avoiding local maxima, which is a common problem in stochastic algorithms. The mutation can be thought of as an error in the reproduction process. The thing is that this error is very important since it is the only possible way to create truly new chromosomes (crossover uses old gene material).

Mutation is applied in combination with crossover.

Population Deletion
This GA part defines how new chromosomes will be put into the existing population.
In general this can be done in at least two ways. You can either put the new chromosomes into an empty population and, when finished mating, make the new population the current population, or you can update the current population continuosly during the mating process, allowing new chromosomes act like parents to children in their own generation!
The former deletion technique is called 'Delete All' and the latter 'Steady Delete'. Both these techniques are provided in GATSS.

Fitness Evaluation
The evaluation algorithm used is crucial to how well the GA manages to find a good solution. If, like in Traveling Salesman, the structure of a good solution is well known (the sum of the distance between nodes shold be as small as possible), it is quite easy to construct a suitable algorithm. There are, however problems that are more tricky to give a good fitness evaluation algorithm.

The fitness evaluation of chromosomes is very problem dependent.

Data Structures
The data structures used in GA are usually rough simplifications of their natural prototype. Genes, chromosomes and populations are designed hierarchically somewhat like this:

The genes are usually numerals like integers, floating point numerals or simple bits.

More information about Genetic Algorithms is available at The Genetic Algorithms Archive or in the comp.ai.genetic FAQ's.

The Traveling Salesman problem

The Traveling Salesman problem (TS hereafter) is a classical graph-theoretical problem. It involves a traveling salesman who has to visit each of the cities in a given network before returning to his starting point, at which time his trip is complete. The algorithmic problem asks for the cheapest route, namely for a closed tour that passes through each of the nodes in the graph and whose total cost (i.e. the sum of the distances labeling the edges) is minimal.

Its variants arise in the design of telephone networks and integrating circuits, in planning construction lines, and in the programming of industrial robots, to mention but a few applications. In all of these, the ability to find inexpensive tours of the graphs in question can be quite crucial.

TS is a NPC (Nondeterministic Polynomial-time Complete) problem which in short means that it is unsolvable with a standard linear algorithm if the problems complexity is large enough. TS can be solved with a very fast computer when the number of nodes are up to, say 50. If the graph is larger, a non-linear algorithm is needed. GA is one alternative.

GATSS Algorithms

In this section I will discuss the GATSS specific algorithms. Since GATSS is a true GA, the execution algorithm outlined in the general "Genetic Algorithms" section above applies to the execution cycle of GATSS.
The algorithms that is specific for GATSS are found within the larger steps of the overview algorithm and are implied by the problem at hand, the Traveling Salesman problem.

Crossover
The crossover types used in GATSS are called Fixed Position and Fixed Position Sequential:

Fixed Position
The genes of parent 1 are selected at random and copied into the offspring at exactly the same location as they appear in the parent gene. Thus, both order and position of the genes in parent 1 are preserved.

The genes that aren't selected from parent 1 are copied from parent 2 to fill out the empty spaces in the offspring. The order of the genes in parent 2 is preserved but the exact locations are, for obvious reasons, not.

Fixed Position Sequential
One at random selected part (consisting of consecutive genes) of the parent 1 chromosome is copied into the offspring at exactly the same location. The order and location is preserved.

Just as in 'Fixed Position' described above, the parent 2 chromosome is used to fill out the empty locations in the offspring. Order but not location is preserved.

Mutation
One at random selected part (consisting of consecutive genes) of the chromosome is permuted. In practice, this is done by switching location of the genes within the selected part.

Fitness Evaluation
The fitness is calculated in two steps. First, all chromosomes in the population is evaluated. Then, one of the three fitness algorithms are applied.

Evaluation
The genes in this GA application are nodes in a graph, and a chromosome can be regarded as a walk through this graph. To evaluate how good the solution is, the total length of the walk is calculated by summing the distances between each node (gene).
Fitness
The fitness calculation is based on the evaluation described above. Since the system uses a parent selection technique called 'Roulette Wheel Selection', all fitness calculation techniques starts with assigning the initial fitness value by inverting the evaluation value. By doing this, the chromosome with the shortest solution gets the highest initial fitness value, and a higher probability in becoming parent to new chromosomes.

Population Deletion
In this system, two different kinds are implemented:

Delete All

The "Elitism Value" parameter is used to assure that the best [x] chromosomes will survive the generational replacement.

Steady Delete
In this algorithm, every new offspring is replacing the worst chromosome in the population. Chromosomes that are members of the same generation can be eachother's parents! This is, of course, not biological viable but can have good effect in speeding up the population fitness convergence during evolution.

System Overview

GATSS consists of a C++ program, a HTML user interface and a CGI-script that links the two former parts together. I this section I will discuss the design of the C++ program in short.

The program consists of a main part and three classes: Gene, Chromo and Pop.

Gene
This class defines the 'atomic' parts of the GA.

class Gene
{
   public:
   Gene(char *_name, coord_t _x, coord_t _y,
        unsigned int _index, unsigned int _distVecLength);
   ~Gene();
   
   // Calculates Eucledian distance between this gene and _gene and returns the value.
   distance_t calculateDistanceTo(Gene& _gene);

   // Returns Eucledian distance between this gene and _gene.
   distance_t distanceTo(Gene& _gene);

   // Returns this gene's index in original chromosome pool.
   unsigned int getIndex() {return index;}

   // Returns this gene's name.
   char *getName() {return name;}

   // Prints gene information to stream.
   friend ostream& operator<<(ostream& outStr, Gene& gene);
   
   private:
   distance_t *distance; // distance vector
   unsigned int distVecLength; // length of distance vector
   char *name; // name of gene
   coord_t x,y; // gene's coordinates
   unsigned int index; // gene's index
};

A Gene consists of a name, x- and y-coordinates, a distance vector describing the distances to other genes (only calculated once at every program execution) and a gene index.

Possible operations on a Gene can be seen in the class declaration above.

Chromo
This class defines what a chromosome looks like.

class Chromo
{
public:
   Chromo(unsigned int _chromoLength, Gene *_genePool[]);
   ~Chromo();
   
   // Copy constructor.
   Chromo& Chromo::operator=(Chromo& original);

   // Chromosome initialization.
   void init();

   // Calculates (if necessary) and returns chromosome evaluation.
   eval_t eval();

   // Sets chromosome's modified evaluation (fitness).
   eval_t setModEval(eval_t _modEval) {return (myModEval = _modEval);}

   // Returns chromosome's modified evaluation (fitness).
   eval_t getModEval() {return myModEval;}

   // Returns chromosome's evaluation (fitness).
   eval_t getEval() {return myEval;}

   // Mates this chromosome with partner resulting in two new offsprings, child1 and child2.
   void mate(float _crossover, cross_t crossoverType, float _mutation, Chromo& partner,
             Chromo& child1, Chromo& child2);

   // Returns chromosome length.
   unsigned int getLength() {return chromoLength;}

   // Prints chromosome information to stream.
   friend ostream& operator<<(ostream& outStr, Chromo& chromo);
   
private:
   unsigned int chromoLength; // chromosome length
   eval_t myEval; // chromosome evaluation
   eval_t myModEval; // chromosome modified evaluation (fitness)
   Gene **genePool; // pointer to original gene pool vector
   Gene **myGenePool; // private gene pool vector

   // Keeping track of used genes during crossover.
   int Chromo::isUsed(Gene *gene, Gene *usedGene[], int noOfUsedGenes);

   // Performs crossover using parent1 and parent2 as templates.
   // This chromosome is assigned the result. 
   void crossover(cross_t crossoverType, Chromo& parent1, Chromo& parent2);

   // Performs mutation on this chromosome.
   void mutate();

   // Modified random generator.
   unsigned int myRand();
};

A Chromo consists mainly of a Gene vector and two evaluation values (one original and one modified by one of the fitness algorithms).

Possible operations, including crossover and mutation, can be seen in the class declaration above.

Pop
This class defines the population.

class Pop
{
public:
   Pop(int _popSize, unsigned int _chromoLength, Gene *_genePool[]);
   ~Pop();
   
   // Population initialization.
   void init();

   // Calculates (if necessary) fitness for each chromosome in the
   // population and returns the sum of the fitness values.
   eval_long_t fitness(fitness_t fitnessType, eval_t fitnessParam);

   // Creates a new population using this population as a parent pool
   // and returns a pointer to the new population.
   // If deletionType == deleteAll, the user of this class has to
   // delete this population (the new one is really a _new_ one).
   // If deletionType == steadyDelete, the old population is altered
   // and thus, requires no further action by the user.
   Pop *newPop(fitness_t fitnessType, eval_t fitnessParam,
               float crossover, cross_t crossoverType, float mutation,
               delete_t deletionType, int elitismValue);

   // Returns the fitness of the best cromosome in the population.
   eval_t bestFitness() {return myChromoPool[0]->getEval();}

   // Returns the modified fitness of the best cromosome in the population.
   eval_t bestFitnessMod() {return myChromoPool[0]->getModEval();}

   // Returns the average chromosome fitness of the population.
   eval_t fitnessAverage();

   // Returns the average modified chromosome fitness of the population.
   eval_t fitnessAverageMod() {return (eval / popSize);}

   // Returns the best chromosome in the population.
   Chromo& bestChromo() {return *myChromoPool[0];}
   
   Chromo **myChromoPool; // Chromosome vector
   // (has to be public for some reason I can't remember at the time beeing)

   // Prints population information to stream.
   friend ostream& operator<<(ostream& outStr, Pop& pop);
   
private:
   Gene **genePool; // pointer to original gene pool vector
   unsigned int chromoLength; // length of chromosomes in the population
   eval_long_t eval; // sum of all chromosome evaluation values
   int popSize; // number of chromosomes in the population

   // Returns a suitable parent using Roulette Wheel Selection.
   Chromo& getParent();

   // Sorts chromosome pool in fitness order.
   void quickSort(int start, int finish);
};

A Pop mainly consists of a Chromo vector.

Possible operations can be seen in the class declaration above.

Main
The major object for this part is to assign the GA the user parameters and to execute the GA. The GA execution loop looks like this:

//******************* Main Evolution Loop **********************

  Pop *myPop = new Pop(popSize, chromoLength, genePool);
  Pop *myNewPop;

  int k;

cout << "Data:" << endl;

  for (k = 0; k < generations; k++)
    {
      myPop->fitness(fitnessType, fitnessParam);
        cout << k << " "
          << myPop->fitnessAverage() << " "
            << myPop->bestFitness() << endl;

      myNewPop = myPop->newPop(fitnessType, fitnessParam, crossover,
                               crossoverType, mutation, deletionType,
                               elitismValue);
      if (deletionType == deleteAll)
        {
          delete myPop;
          myPop = myNewPop; 
        }
    }

References:

Parker, B. (1993). DOUGAL. Found somewhere in the web.
Davis, L. (19xx). What is a Genetic Algorithm?
Harel, D. (1987). Algorithmics: the spirit of computing. Addison-Wesley Publishing Company.

1 May 1995, Thomas Pederson.

Chromosome no:	1	2	3	4	5	6	7	8	9	10
Evaluation value:	125	113	092	037	022	018	009	008	005	003
Fitness:	125	113	092	037	022	018	009	008	005	003

Chromosome no:	1	2	3	4	5	6	7	8	9	10
Evaluation value:	125	113	092	037	022	018	009	008	005	003
Fitness (guard = 1):	123	111	090	035	020	016	007	006	003	001
Fitness (guard = 10):	123	111	090	035	020	016	010	010	010	010