Dr. Thomas Weise

Why research in Computational Intelligence should be less inspired.

The inspiration gleaned from observing nature has led to several important advances in the field of optimization. Still, it seems to me that a lot of current work is based mainly on such inspiration alone. This might divert attention away from practical and algorithmic concerns. As a result, there is a growing number of specialized terminologies in the fields of Evolutionary Computation (EC) and Swarm Intelligence (SI), which I consider a problem for clarity in research. With this letter, I would like to formulate my thoughts in the hope of starting a fruitful debate.

1. Origin and Terminology

Before getting to the core, let me first briefly and coarsely summarize the terminology and origin of two exemplary algorithms, one from the area of EC and one from SI.

1.1. Genetic and Evolutionary Algorithms

The field of EC more or less originated with research on Genetic Algorithms (GAs) [1,2]. GAs are usually referred to as metaheuristics that mimic the process of natural evolution in order to solve optimization problems.

In a GA, candidate solutions (“phenotypes”) are usually encoded as bit strings of a fixed length, called genotypes in correspondence to the DNA in nature. If the candidate solutions are something other than bit strings, a genotype-phenotype mapping (GPM) takes place, a mapping from the genotypes to the phenotypes. A tuple of a genotype and its corresponding phenotype is often called an ‘individual’.

A list of individuals (the population) is maintained and refined in an iterative procedure (where each iteration is called a generation). The objective function is usually called the fitness function and is often subject to maximization. In the main loop of the algorithm, those strings that stand for candidate solutions of higher quality (better objective values, i.e., higher fitness) are selected with a higher probability, while those that represent bad solutions are (more likely) discarded.

The selected individuals enter the so-called mating pool and are allowed to produce offspring. These new solutions are derived either by creating modified copies of single strings (mutation) or by combining two strings into a new one (crossover, recombination). Both search operators are usually randomized. This way, the list of solutions is filled again and the next iteration begins. This process is repeated in a loop until, e.g., a time limit is reached or sufficiently good solutions are found.

In this algorithmic paradigm, we can replace the bit-string genotypes with basically any genotype data structure. The generalized algorithm is then called an Evolutionary Algorithm (EA).
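To make this algorithmic skeleton concrete, here is a minimal sketch of such a GA in Python. Everything in it, i.e., the bit-string length, the fitness-proportional parent choice, the operator probabilities, and the OneMax example fitness, is an illustrative assumption, not a prescription from the literature:

```python
import random

def genetic_algorithm(fitness, n_bits=32, pop_size=100, generations=1000):
    # the population: a list of genotypes (bit strings stored as lists of 0/1)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        # fitness-proportional choice of parents into the mating pool
        mating_pool = random.choices(pop, weights=scores, k=pop_size)

        offspring = []
        while len(offspring) < pop_size:
            if random.random() < 0.7:              # crossover/recombination
                a, b = random.sample(mating_pool, 2)
                cut = random.randrange(1, n_bits)
                child = a[:cut] + b[cut:]
            else:                                  # mutation
                child = list(random.choice(mating_pool))
                i = random.randrange(n_bits)
                child[i] ^= 1                      # flip one bit
            offspring.append(child)
        pop = offspring

    return max(pop, key=fitness)

# illustrative usage: maximize the number of 1-bits ("OneMax")
best = genetic_algorithm(fitness=sum)
```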

1.2. Ant Colony Optimization

Ant Colony Optimization (ACO) [3] is one of the most well-known SI techniques. It solves problems that can be mapped to finding paths in graphs with weighted edges by mimicking the path-finding behavior of ants. Examples of such problems are Traveling Salesman Problems (TSPs) and a variety of other Vehicle Routing Problems (VRPs).

The problem is solved by repeatedly constructing paths (candidate solutions): simulated ants walk through the graph. An ant starts its journey at a given node and probabilistically decides which node to visit next. This decision is influenced by two factors: (1) the cost of going to a node, i.e., the weight of the edge from the current location to it, and (2) the simulated pheromone on that edge. The ant iteratively chooses where to go until its path is completed. Several ants do this in each iteration. The simulated pheromones are often maintained as floating point numbers in a matrix. Usually, the ants that produced the best-rated paths update the pheromone matrix by increasing the pheromones along the edges they have travelled. This, in turn, increases the probability that subsequent ants make similar decisions. At the same time, existing pheromones are reduced by evaporation. Then the process is repeated and new ants walk through the graph.
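The following is a minimal sketch of this loop for a TSP-like setting in Python. The parameter values, the simple attractiveness rule (pheromone divided by edge cost), and the best-tour-only deposit are illustrative simplifications of the many ACO variants described in [3]:

```python
import random

def aco(dist, n_ants=20, iterations=500, rho=0.1, q=1.0):
    n = len(dist)                                 # dist: symmetric cost matrix
    tau = [[1.0] * n for _ in range(n)]           # pheromone matrix

    def tour_length(tour):
        return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

    best = None
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):                   # each ant constructs a path
            tour = [random.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                cur = tour[-1]
                choices = list(unvisited)
                # attractiveness: pheromone times inverse edge cost
                weights = [tau[cur][j] / dist[cur][j] for j in choices]
                nxt = random.choices(choices, weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)

        tours.sort(key=tour_length)
        if best is None or tour_length(tours[0]) < tour_length(best):
            best = tours[0]

        # evaporation, then reinforcement along this iteration's best tour
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for i in range(n):
            a, b = tours[0][i], tours[0][(i + 1) % n]
            tau[a][b] += q / tour_length(tours[0])
            tau[b][a] = tau[a][b]
    return best
```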

2. Criticism

Two processes from nature have been successfully emulated as algorithms. These algorithms turned out to be quite useful in solving hard optimization problems. Although this itself is good, I believe that over the years, four problematic issues have arisen from this situation.

2.1. Incompatible Terminology and Redundancy

Each “inspired” optimization method has its own terminology. This makes it more complicated to discuss these methods without learning several different vocabularies. Moreover, different vocabularies often use different words for the same thing, so there are quite a lot of synonyms. A candidate solution in an EA is called a phenotype (or sometimes an individual); in ACO it is called an ant or a path; in Particle Swarm Optimization (PSO) [4] it is called a particle.

It seems that sometimes, the origins of optimization methods are valued over their algorithmic structures. Then, instead of the abstract computer science perspective, it is the inspirational source that dictates the terminology.

This can cause confusion and inconsistencies. In my opinion, for instance, ACO is basically a special Estimation of Distribution Algorithm (EDA) [5]. An EDA is an optimization method that tries to build and update a probabilistic model of what good solutions look like. In each iteration, it samples this model to generate new candidate solutions. The best generated candidate solutions are then used to update the model. A pheromone matrix as used in ACO can be considered such a model. The process of letting the ants find new paths guided by the pheromone matrix (the model) is nothing else than model sampling. Pheromone evaporation and update then are the model update. To the best of my knowledge, this similarity is not widely recognized or discussed – and I believe that this is because of the “language barrier”.
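To make the structural match explicit, here is a minimal sketch of a univariate EDA over bit strings (in the spirit of PBIL-like algorithms; the parameter values are illustrative assumptions). Read `p` as the pheromone matrix, the sampling step as the ants' walks, and the update rule as evaporation plus deposit:

```python
import random

def eda(fitness, n_bits=32, samples=100, keep=10, alpha=0.1, iterations=200):
    p = [0.5] * n_bits                               # the probabilistic model

    for _ in range(iterations):
        # model sampling: generate new candidate solutions
        pop = [[int(random.random() < p[i]) for i in range(n_bits)]
               for _ in range(samples)]
        # model update: shift the model towards the best samples
        elite = sorted(pop, key=fitness, reverse=True)[:keep]
        for i in range(n_bits):
            freq = sum(x[i] for x in elite) / keep
            p[i] = (1 - alpha) * p[i] + alpha * freq  # cf. evaporation + deposit

    return max(pop, key=fitness)
```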

I would also argue, without belittling the achievements of the ACO community, that the language used for EDAs is clearer and more precise than the biological metaphor used in the field of ACO. A language focused on the statistical, mathematical, and theoretical features of an algorithm automatically invites researchers to test more complex stochastic concepts, e.g., more sophisticated model building and sampling approaches, which are entirely unrelated to biological processes and not so easy to reach when starting from a biological idea.

If ACO is indeed a special form of EDA, then having two distinct terminologies for essentially the same thing poses a potential problem for researchers looking for related work. If an improvement were discovered for ACO, would it not be harder for researchers in the field of EDAs to see the relationship to their own work? Some research might even be carried out in duplication.

Now take a look at the Wikipedia page on metaheuristics [6] from February 2013¹. As you can see, more than 50 different algorithms inspired by different natural phenomena are listed there, and this is just a subset of the existing ones. How many different vocabularies does this entail? Who among the readers has not yet reviewed a paper introducing a new nature-inspired algorithm together with a new vocabulary? I fear that it may become quite hard to see whether a new idea is truly new or already exists in an algorithmically identical form under a name that cannot be inferred from the algorithmic properties (such as the Snapping Turtle Herding algorithm²).

2.2. Dissonance between Terminology and Reality

I furthermore perceive a dissonance between the inspiration from which an algorithm was created and its actual (feasible) working version, as well as between the vocabularies of EC and SI and their actual meaning.

Obviously, it is neither the goal of a GA to simulate evolution nor the goal of ACO to simulate ant behavior or stigmergy in any realistic way. Instead, the goal is to solve optimization problems. This leads to the fact that any part of the simulation of the natural process will be abandoned if it is not useful for optimization or simply too complicated to implement.

A historical example of how the aim of modelling nature closely does not help in optimization is the original selection method used in GAs, Fitness Proportional Selection. Here, the expected number of offspring of an individual is proportional to its fitness, which somewhat resembles the ideas of fitness and natural selection in biology. However, it has several drawbacks [7] and thus may not always be a good idea. For example, the algorithm will behave differently if a constant offset is added to the fitness function, although this does not change the nature of the optimization problem. In many cases, Fitness Proportional Selection is unsuitable, and researchers often use a less nature-like selection scheme to get better results.
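The offset sensitivity is easy to verify numerically; the fitness values below are arbitrary examples:

```python
def selection_probabilities(fitnesses):
    # fitness-proportional selection: probability = fitness / total fitness
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

print(selection_probabilities([1.0, 2.0, 3.0]))
# [0.167, 0.333, 0.5] -> the best individual is 3x as likely as the worst

print(selection_probabilities([101.0, 102.0, 103.0]))  # same problem, offset +100
# [0.330, 0.333, 0.337] -> the selection pressure has almost vanished
```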

We can also look at this from a top-down perspective: natural evolution is generally considered to be undirected, whereas GAs have a clear direction (minimization or maximization of the objective function). The original GAs do not simulate concepts such as geography, environment, population or social structure, age, time, or bio-chemical, genetic, and molecular processes, although some of these aspects have, separately, been considered in later research. In ACO, all ants execute identical behavior; all ants are the same (no queens, workers, etc.). Both ACO and GAs proceed in distinct iterations that depend solely on the state at the end of the previous iteration. These deviations from the natural paragon are necessary in order to achieve good results (and to avoid making the algorithms too complex).

If we look closely, we furthermore find that industrial-level and state-of-the-art EC methods not only omit simulating phenomena from nature that are not useful, they even introduce algorithmic concepts that have no relationship to evolution at all. They do so because good results are not enough; excellent results are sought.

For several problems, the best EAs are Memetic Algorithms³ (MAs) [8], which incorporate local search methods. Often, data structures such as permutations, real vectors, or trees are used to represent candidate solutions, which cannot be understood as equivalents of DNA anymore. One of the best EC approaches for numerical problems is the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [9], which uses an approximated covariance matrix to guide its search procedure. Sharing and niching penalize solutions in the population that are too similar to each other by changing their fitness, in order to prevent an EA from converging to locally optimal solutions [10]. EAs for multi-objective optimization use methods such as Pareto ranking or hypervolume metrics, which can hardly be related to evolution [11]. Several Evolution Strategy algorithms encode not only the candidate solutions themselves but also the step widths used to create new solutions from them inside the individual records, i.e., they store a direction for mutation inside the individual [12]. ACO is often modified to have minimum/maximum pheromone limits or “elitist ants” in order to achieve better results [3], and pACO maintains a set of ants to define the pheromone information [13]. None of these things have obvious counterparts in nature.

This raises questions such as: Why should we describe an algorithm that maintains a list of λ permutations and selects the best μ of them as input for unary and binary search operations in order to generate the next λ permutations in any way different from this? If there is a divergence between the original biological meaning of a word and its meaning in our algorithm description, is it good to use the word? May it not rather confuse beginners who are new to the subject and make it harder for them to understand why the algorithm actually works? Maybe it even makes them think that it is good to closely simulate nature to solve an optimization problem?
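Described in exactly this neutral way, the algorithm is just the following loop. This is a sketch only: the operator functions, the 50/50 split between unary and binary operations, and the parameter values are placeholders, not a definitive implementation:

```python
import random

def optimize(objective, nullary, unary, binary, mu=10, lam=50, iterations=1000):
    # lam points in the search space, created by the nullary search operation
    points = [nullary() for _ in range(lam)]

    for _ in range(iterations):
        best = sorted(points, key=objective)[:mu]      # keep the mu best points
        points = [unary(random.choice(best)) if random.random() < 0.5
                  else binary(*random.sample(best, 2))
                  for _ in range(lam)]                 # generate lam new points

    return min(points, key=objective)                  # minimization, by convention
```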

2.3. Inspiration by Nature does not mean Good Optimization!

Inspiration from nature alone does not necessarily equal good optimization. Why do we not encounter “pure” GAs, ACO, or PSO in practical applications or as winners of optimization competitions? Because plain EAs or GAs can solve optimization problems to a certain degree, but in thorough comparisons they often lose even to simple local search methods. Plain EAs are not really good optimization algorithms [14,15]. It seems that in order to utilize their full power, they must be modified, hybridized, and extended with problem-specific knowledge. In other words, the natural inspiration only carries so far in optimization.

It is not the goal of natural evolution to produce perfectly adapted species. Evolution has no goal. Species will survive if they are adapted sufficiently well, not necessarily perfectly well. It is not the goal of the ants to find the perfect shortest path between a food resource and the nest. A sufficiently short path discovered in reasonable time will do. Otherwise, this would mean that biological processes could solve NP-hard problems in sub-exponential time – and although this would be awesome, it seems unlikely.

There is no reason to assume that being inspired by nature is enough to obtain better optimization methods than purely mathematical approaches provide. Thus, teaching the “inspired-by-nature” theme to many PhD students may pull research in the wrong direction. Instead of focusing on the algorithmic and computer science perspective, much research energy may be spent trying to develop algorithms inspired by baboons, squid, and the mating behavior of little crakes². It is probably not a good idea to simulate an arbitrary process in nature just to obtain an optimization method. This may actually hurt the EC and SI research communities in the long term.

2.4. It sounds unprofessional.

And this hurting begins at the point where it makes research sound like quackery. People in industry are probably willing to apply algorithms that seem mathematically sound, rigorously researched, and professionally presented. But even years of thorough research may become useless the moment an interested engineer sees “Squid Mating Algorithm”² on the very first slide of a presentation.

3. What can we do?

3.1. Using Inspiration-Neutral Terminology

If a terminology has become folklore, it is very hard to go against it. When publishing a paper at an EC-related conference, one has to use the corresponding vocabulary to some degree. However, there are subtle changes to the wording that can improve clarity. More general, formal terms can be used to describe our work. The introduction of the (2014 version of the) Wikipedia page on PSO [16], for example, does very well at using an abstract terminology.

Here I attempt to list some inspiration-neutral terms, most of which are actually quite common and frequently used. Maybe these could be used more consciously and more often.

Nature-Inspired Term         | Potential Alternative(s)
-----------------------------|------------------------------------------------------
Genome                       | Search Space
Genotype                     | Point in Search Space
Phenome?                     | Solution Space
Phenotype                    | Candidate Solution, Solution, Point in Solution Space
Selection                    | Selection (term is inspiration-neutral)
Generation                   | Iteration
Creation?                    | Nullary Search Operation
Mutation                     | Unary Search Operation
Crossover, Recombination     | Binary Search Operation
Ant (in ACO)                 | Depending on situation: Point in Search Space or Candidate Solution
Pheromone Matrix             | Model (à la EDA)
Genotype-Phenotype Mapping   | ?
Individual                   | ?
Population                   | ?
Memetic Algorithm            | Hybrid Algorithm

For some terms like genotype-phenotype mapping, population, and individual, good “inspiration-free” replacements still need to be found.

3.2. Clear Teaching

I admit to using the bio-inspired terminology in my own teaching as well; one needs to make the connection to the literature in one way or another. However, GAs are not the first subject in my course. It instead begins by introducing the components of metaheuristic optimization in a way that is formal and as “inspiration-free” as possible. And I use the terms listed above.

3.3. Clear Algorithm Descriptions in Research

While we can change our own wording and notations to make our own work clearer, we may also influence others to do so as well. Two “access points” come to mind: when advising research students (research master students, PhD students, …) and when reviewing papers.

We can make clear that although inspiration may be drawn from whatever source, in the end there must be a formal algorithm, described in a clear and formal way. This entails discouraging the use (and in particular, the invention) of specialized vocabulary. Algorithms should be described concisely and precisely. There should be meaningful pseudo code.

3.4. Proper Benchmarking

Just as important as a clear description of what an algorithm does is a clear assessment of its results. This means:

3.4.1. Benchmark Problems

Benchmarking should be done on well-known instances of well-known benchmark problems. If a new algorithm is developed, results should be reported that are meaningful to a large share of the research community. For example, most researchers either already know what the Traveling Salesman Problem is or can find out quickly. They can google “Traveling Salesman Problem Benchmark” and will immediately find TSPLib [17], a benchmark dataset with known optimal results. If a new algorithm is applied to the TSPLib instances, it is extremely easy to get a quick impression of how good its results are and to find literature about the state of the art or potentially related works. If the new algorithm is instead applied to a problem from, say, laser optics, it will be significantly harder to judge whether it is good or not.

3.4.2. Fair Comparison

Good research requires fair comparisons with the best methods out there. If a new algorithm is compared with a GA, then the GA should use the same search operators and be hybridized with the same local search (if any). It should also be granted the same amount of function evaluations and if the new algorithm is population-based, the GA should have the same population settings. If the setup of the new algorithm has been determined with a smaller preliminary experiment testing several configurations, then a similar experiment should also be carried out for the GA to find its best setup as well. A new algorithm for the TSP representing solutions as permutations should not be compared with a GA using an unsuitable representation (say bit strings) for its solutions (and hence different search operators), because it would be entirely unfair.
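One way to enforce such parity mechanically is to count function evaluations in a shared wrapper, so that every algorithm in the comparison receives exactly the same budget. The sketch below is a generic harness idea, not an established tool; `algorithms` maps names to hypothetical entry points that accept the wrapped objective function:

```python
class BudgetExhausted(Exception):
    """Raised when an algorithm has used up its evaluation budget."""

class CountedObjective:
    """Counts objective calls, tracks best-so-far, stops at the budget."""
    def __init__(self, f, budget):
        self.f, self.budget = f, budget
        self.calls, self.best = 0, float("inf")

    def __call__(self, x):
        if self.calls >= self.budget:
            raise BudgetExhausted()
        self.calls += 1
        y = self.f(x)
        self.best = min(self.best, y)   # best-so-far, assuming minimization
        return y

def compare(algorithms, objective, budget=100_000):
    results = {}
    for name, alg in algorithms.items():
        counted = CountedObjective(objective, budget)
        try:
            alg(counted)                # run until the budget is exhausted
        except BudgetExhausted:
            pass
        results[name] = counted.best    # best result under an equal budget
    return results
```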

3.4.3. Negative Results

Of course, for most such well-known problems like the TSP, one will not be able to beat the state-of-the-art which was developed during years of research. I believe, however, that even negative results are interesting, if they were obtained with thorough experimentation with interesting new concepts. Finding out why an algorithm did not work as well as expected can also be valuable. It may help to develop better methods. If someone comes up with a similar idea, the knowledge that it already has been investigated and that it failed may save time and effort. Accepting negative results at conferences also creates more incentive for thorough experimentation.

References

Footnotes

  1. The page version from June 2013 [18] has been “pruned” by somebody unrelated to the author.
  2. Names are fictional and polemic, similarities with actual names are unintended and random.
  3. The term meme serving as inspiration for MAs might be another example of unnecessary inspiration: the term hybrid would be much clearer.

Literature

  1. John Henry Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Dublin, OH, USA: NetLibrary, Inc., May 1992.
    ISBN: 0585038449 and 9780585038445 / Google Books.
  2. Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz, editors. Handbook of Evolutionary Computation. Computational Intelligence Library. New York, NY, USA: Oxford University Press, Inc., Dirac House, Temple Back, Bristol, UK: Institute of Physics Publishing Ltd. (IOP), and Boca Raton, FL, USA: CRC Press, Inc., January 1, 1997.
    ISBN: 0-7503-0392-1, 0-7503-0895-8, 978-0-7503-0392-7, and 978-0-7503-0895-3 / Google Books.
  3. Marco Dorigo and Thomas Stützle. Ant Colony Optimization. Bradford Books. Cambridge, MA, USA: MIT Press, July 1, 2004.
    ISBN: 0-262-04219-3 and 978-0-262-04219-2 / Google Books.
  4. Russell C. Eberhart and James Kennedy. A New Optimizer Using Particle Swarm Theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science (MHS'95), pages 39–43, Nagoya, Japan, October 4–6, 1995. Piscataway, NJ, USA: IEEE Computer Society.
    doi:10.1109/MHS.1995.494215.
  5. Pedro Larrañaga and José Antonio Lozano, editors. Estimation of Distribution Algorithms – A New Tool for Evolutionary Computation, volume 2 of Genetic and Evolutionary Computation. Boston, MA, USA: Springer US and Norwell, MA, USA: Kluwer Academic Publishers, 2001.
    ISBN: 0-7923-7466-5 and 978-0-7923-7466-4 / Google Books.
  6. Wikipedia page on “Metaheuristic”, February 10, 2013.
    http://en.wikipedia.org/w/index.php?title=Metaheuristic&oldid=537593171.
    This page lists 54 different metaheuristics under “major contributions” to the field of metaheuristics.
  7. Tobias Blickle and Lothar Thiele. A Comparison of Selection Schemes Used in Genetic Algorithms. TIK-Report 11, Zürich, Switzerland: Eidgenössische Technische Hochschule (ETH) Zürich, Department of Electrical Engineering, Computer Engineering and Networks Laboratory (TIK), December 1995.
  8. Pablo Moscato. Memetic Algorithms. In Panos M. Pardalos and Mauricio G.C. Resende, editors, Handbook of Applied Optimization, chapter 3.6.4, pages 157–167. New York, NY, USA: Oxford University Press, Inc., February 22, 2002.
  9. Nikolaus Hansen, Andreas Ostermeier, and Andreas Gawelczyk. On the Adaptation of Arbitrary Normal Mutation Distributions in Evolution Strategies: The Generating Set Adaptation. In Larry J. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms (ICGA'95), pages 57–64, Pittsburgh, PA, USA: University of Pittsburgh, July 15–19, 1995. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
    CiteSeer:10.1.1.29.9321
  10. Paul J. Darwen and Xin Yao. Every Niching Method has its Niche: Fitness Sharing and Implicit Sharing Compared. In Hans-Michael Voigt, Werner Ebeling, Ingo Rechenberg, and Hans-Paul Schwefel, editors, Proceedings of the 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), volume 1141/1996 of Lecture Notes in Computer Science (LNCS), pages 398–407, Berlin, Germany, September 22–24, 1996. Berlin, Germany: Springer-Verlag GmbH.
    doi:10.1007/3-540-61723-X_1004.
  11. Hisao Ishibuchi, Noritaka Tsukamoto, and Yusuke Nojima. Evolutionary Many-Objective Optimization: A Short Review. In Zbigniew Michalewicz and Robert G. Reynolds, editors, Proceedings of the IEEE Congress on Evolutionary Computation (CEC'08), Computational Intelligence: Research Frontiers – IEEE World Congress on Computational Intelligence – Plenary/Invited Lectures (WCCI), volume 5050/2008 of Lecture Notes in Computer Science (LNCS), pages 2424–2431, Hong Kong (Xianggang), China: Hong Kong Convention and Exhibition Centre, June 1–6, 2008. Piscataway, NJ, USA: IEEE Computer Society.
    doi:10.1109/CEC.2008.4631121 / CiteSeer:10.1.1.139.2046.
  12. Hans-Georg Beyer. The Theory of Evolution Strategies. Natural Computing Series. New York, NY, USA: Springer New York, May 27, 2001.
    ISBN: 3-540-67297-4 and 978-3-540-67297-5 / Google Books.
  13. Michael Guntsch. Ant Algorithms in Stochastic and Multi-Criteria Environments. PhD thesis, Karlsruhe, Germany: University of Karlsruhe (Friedriciana), Department of Economics and Business Engineering and Karlsruhe, Germany: University of Karlsruhe (Friedriciana), Institute for Applied Computer Science and Formal Description Methods (AIFB), January 13, 2004.
  14. Nikolaus Hansen, Anne Auger, Raymond Ros, and Dimo Brockhoff. COCO (COmparing Continuous Optimisers), October 17, 2012.
    http://coco.gforge.inria.fr/.
  15. Thomas Weise, Raymond Chiong, Ke Tang, Jörg Lässig, Shigeyoshi Tsutsui, Wenxiang Chen, Zbigniew Michalewicz, and Xin Yao. Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem. IEEE Computational Intelligence Magazine (CIM), 9(3):40–52, August 2014. Featured article and selected paper at the website of the IEEE Computational Intelligence Society (http://cis.ieee.org/).
    doi:10.1109/MCI.2014.2326101.
  16. Wikipedia page on “Particle swarm optimization”, February 2, 2014.
    http://en.wikipedia.org/w/index.php?title=Particle_swarm_optimization&oldid=593577321
  17. Gerhard Reinelt. TSPLIB – A Traveling Salesman Problem Library. ORSA Journal on Computing, 3(4):376–384, Fall 1991.
    doi:10.1287/ijoc.3.4.376 / TSPLib
  18. Wikipedia page on “Metaheuristic”, December 6, 2013.
    http://en.wikipedia.org/w/index.php?title=Metaheuristic&oldid=584835175.
    This page lists 16 different metaheuristics under “major contributions” to the field of metaheuristics.