Thursday, August 31, 2006

Using Genetic Algorithms for Optimization Problems

One of the most common things we humans face in our daily lives is the problem of optimization. viz what is the optimum solution for a problem using certain mix of resources that we have, and bearing in mind our objective(s) In Life we seldom find that there is only one solution to a problem, but in fact many possible choices for a solution. Here are some examples:

Which is the best route to travel from A to B, after defining what is ‘best’. {is it the shortest route in time, which again is different from the shortest route in distance, or is it the most economical route, the most scenic route etc}.

What is the minimum risk and maximum gain I can get from re-allocating the number of shares in each stock that I hold in a portfolio?

What combination of resources [land, labor, capital] will enable me to complete a certain construction project in a specified period of time.

What is the best way to draw up the School timetable using the variables we can juggle with viz the school hours Mon-Fri, the number of classes, the number of teachers, the number of classrooms.

This is a similar problem for an airline trying to draw up it's schedule, and must find the number of planes it must use, slotted in routes and times such that wasted time is minimized.

In all these examples, there may be several ‘optimum’ and it is a case of trade-offs among various combinations of inputs that generate the optimization paths. What we have is a complex Search space, where getting trapped in local optima is common, and finding a global optima is difficult. In quasi-Newtonian search techniques like gradient-descent, getting stuck in local optima is very common, because of starting with one random initialization vector, which may or may not fall at a suitable location of the optimization landscape. This issue also applies in the case of a stochastic search method like simulated annealing. Simulated Annealing can be said to be the ancestor of Genetic Algorithms, with it’s less constrained methodology, it’s specification of criterion for the progress of the optimization procedure and the application of random shocks. But it is still a parametric method where the estimation of parameters is a heuristic or to put it less kindly, a trial and error.

Genetic Algorithms borrows from Nature to find the optimum path[s]. A random vector population is first sowed into the search space. An ‘individual’ is a chromosome, which is made up of a string of genes. Each individual in this population is assembled from genes using various methods ranging from binary zeros and ones to various forms of alpha-numeric, with the only condition being that they must be able to be manipulated computationally for the purpose of breeding better individuals. Representation technique is the most challenging aspect of having a good GA process. Next are our criteria for “Fitness” so that we can run the GA through generations and only allow the fit individuals to survive. This is again a problem that calls for a lot of clear thinking else it will result in poor final results. It must able to produce a practical scoring and ranking system of a degree of clarity and fineness that will make deciding the selection of individuals qualifying for the next generation subject to ambiguity The advantages of GA’s over other methods are that you don’t have to make any assumptions about the shape of probability distribution of output, do the estimation of parameters, or even the choice of model input variables. Less significant variables will automatically be weeded out.

Once the random population is seeded, running the evolutionary process involves using methods to ensure good breeding:viz selection, mutation, crossovers. Mutation can be set to be at certain random intervals to throw up some pleasant surprises and rejuvenate the gene pool, and crossover is like combination of two splices taken from two individuals as in human marriage. The point for splicing can be specified. 'Election tournaments' can be held among family members, subjecting them to a fitness test, and allowing the fittest couples to proceed to the next step for reproduction. Elitism can be introduced. Evaluate all the members if the new generation and the past generation according to the fitness criteria. If the best member of the older generation is better than the best member of the newer generation, then this member displaces the worst member of the new generation and is thus eligible for selection in the coming generation.

GA’s enable us to do in a few minutes or hours, what Nature does in hundreds of years by allowing us to breed thousands of generations each fitter than the previous for finding an optimal solution. At this point it must be mentioned that just as in Nature the optimum solution can often involve taking long, convoluted , seemingly wasteful, illogical paths. But quite often we find that these paths[solutions] are the most robust. An example from a Complex Adaptive System such as a road or rail network will illustrate this more clearly. When master-planned from the beginning, the road/rail network of a City is usually very clean and logical in layout , like the grid of Avenues and Streets of New York City and Philadelphia, and the Interstate highways. But when the road/rail network is the result of evolution over 200 years, such the old county and trade routes of New Jersey and the subway of NYC, what we get looks like a tangled mess. Yet, these evolved networks can often get you to more places and 'tie' the whole region up better than a master-planned system of road and rail. It is 'robust' in the sense of being able to function even when one or more of the 'nodes' get knocked out.

Evolutionary GA’s mimic the open-ended co-evolutionary process that is Nature, even better, allowing for evolution of the GA parameters themselves such as the population size, probability of crossover, and mutation probability. Even the parameters for the Fitness criteria can evolve. Like the population of prey and predator, lion and wildebeeste, co-evolution takes into account the feedback from any step taken by one party, and changes the situation accordingly.

In the modeling of Financial markets with their mix of chaotic determinism, stochastic processes, elements of autocorrelation, trends, and cycles I think GA is a natural choice for optimization. When framed in a way that is suited to the objective, GA's can be used for classification problems too, leading to categories that are more 'natural' than Euclidean classification boundaries.


No comments:

Post a Comment