The travelling salesman problem (TSP) is one of the most intensively studied problems in optimization. Loosely speaking, given a set of cities on a map, the problem is to find a shortest tour that visits each city exactly once and returns to the city it started from. Much research has gone into efficient heuristics that yield provably optimal or close-to-optimal solutions to TSP instances. Recently, a polynomial-time approximation scheme was discovered for the Euclidean TSP. With parallel and cloud computing gaining prominence in the past few years, finding efficient parallelizations of existing algorithms will grow in importance.

Furthermore, given the push towards cloud computing, it will become increasingly necessary to adapt algorithms to existing cluster computing frameworks like MapReduce. There has already been much work on parallelizing TSP heuristics; however, there does not seem to be any literature on taking advantage of existing cluster computing architectures for solving TSP. In this project we explore the problem of parallelizing the Ant Colony Optimization algorithm for the travelling salesman problem on the MapReduce cluster computing framework, implemented over Hadoop.

Generally, a TSP solver either tries to obtain a provably optimal solution, or tries to get a solution as close to the optimum as possible without actually proving how close it is. While the former goal has the advantage of guaranteeing the quality of the solution, it is generally very slow and infeasible for large instances. Thus we opt for the second goal. We use the Ant Colony Optimization algorithm, which simulates the way ants find the shortest route to a food source. Sequential versions of this algorithm already exist; we aim to parallelize it over Hadoop MapReduce and thereby improve its performance.

In computer science and operations research, the ant colony optimization (ACO) algorithm is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. ACO algorithms have been applied to many combinatorial optimization problems, ranging from quadratic assignment to protein folding and vehicle routing, and many derived methods have been adapted to dynamic problems, problems in real variables, stochastic problems, multiple objectives, and parallel implementations. ACO has also been used to produce near-optimal solutions to the travelling salesman problem.

ACO algorithms have an advantage over simulated annealing and genetic algorithm approaches to similar problems when the graph may change dynamically: the ant colony algorithm can be run continuously and adapt to changes in real time. This is of interest in network routing and urban transportation systems.

The first ACO algorithm, called Ant System, was aimed at solving the travelling salesman problem, in which the goal is to find the shortest round trip linking a series of cities. The general algorithm is relatively simple: a set of ants each makes one of the possible round trips through the cities. At each step, an ant chooses which city to move to next according to some rules:

- It must visit each city exactly once;
- A more distant city has less chance of being chosen (its "visibility" is lower);
- The more intense the pheromone trail laid out on an edge between two cities, the greater the probability that that edge will be chosen;
- Having completed its journey, the ant deposits pheromone on all edges it traversed, depositing more the shorter the journey is.

After each iteration, the pheromone trails evaporate.
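The rules above can be sketched as a minimal sequential Ant System in Python. This is an illustrative reconstruction, not the project's actual code, and the parameter values (alpha, beta, rho, Q) are assumed defaults:

```python
import random

# Minimal Ant System sketch. alpha weights pheromone, beta weights
# visibility (1/distance), rho is the evaporation rate, Q scales deposits.
ALPHA, BETA, RHO, Q = 1.0, 2.0, 0.5, 100.0

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def construct_tour(n, dist, tau, rng):
    """One ant builds a round trip, choosing the next city with
    probability proportional to tau^alpha * (1/distance)^beta."""
    tour = [rng.randrange(n)]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        weights = [(j, (tau[i][j] ** ALPHA) * ((1.0 / dist[i][j]) ** BETA))
                   for j in unvisited]
        r = rng.random() * sum(w for _, w in weights)
        for j, w in weights:          # roulette-wheel selection
            r -= w
            if r <= 0:
                break
        tour.append(j)
        unvisited.remove(j)
    return tour

def ant_system(dist, n_ants=10, n_iters=50, seed=0):
    n = len(dist)
    rng = random.Random(seed)
    tau = [[1.0] * n for _ in range(n)]   # initial pheromone on every edge
    best, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = [construct_tour(n, dist, tau, rng) for _ in range(n_ants)]
        # Evaporation, then each ant deposits Q / L on its traversed edges.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - RHO)
        for tour in tours:
            L = tour_length(tour, dist)
            if L < best_len:
                best, best_len = tour, L
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += Q / L
                tau[b][a] += Q / L
    return best, best_len
```

Shorter tours deposit more pheromone per edge (Q / L), so their edges become increasingly likely choices in later iterations, which is exactly the feedback loop described above.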

This project aimed to parallelize a TSP solver and thereby reduce its execution time. From our experiments we observed that as the number of nodes in the cluster increases, the efficiency of the parallelism increases, but we were unable to reduce the execution time to the level we were aiming at from the start. We believe that by conducting further experiments with different parameters, we can arrive at optimum values for each of the parameters used, i.e. the number of mappers per stage, the number of ants per mapper, and the number of stages in the chained job.

From our experience, we have found that Hadoop carries too much overhead and so is somewhat unsuitable for problems of this kind. Possibly, with the importance of solving such problems on parallel systems increasing day by day, Hadoop will be improved in the near future in ways that make it suitable for such purposes.

One of the main obstacles we faced was sharing pheromones between the mappers. Because the mappers cannot share pheromone updates, we duplicate some of that work in the reducer, and since each stage has only a single reducer, we believe this inflates the total execution time.
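The stage structure described above (several mappers running ants against a read-only pheromone copy, a single reducer merging all updates for the next chained stage) can be sketched as pure functions. This is a hypothetical in-process simulation; the function names are illustrative, and each "ant" here is a random permutation standing in for the full pheromone-guided construction:

```python
import random

# Assumed parameter values for the sketch, not the project's actual settings.
RHO, Q = 0.5, 100.0

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def map_ants(tau, dist, n_ants, rng):
    """Mapper: run n_ants against a read-only pheromone copy `tau`,
    accumulate Q/L deposits locally, emit (best_tour, best_len, deltas)."""
    n = len(dist)
    deltas = [[0.0] * n for _ in range(n)]
    best, best_len = None, float("inf")
    for _ in range(n_ants):
        tour = list(range(n))
        rng.shuffle(tour)  # stand-in for the real pheromone-guided construction
        L = tour_length(tour, dist)
        if L < best_len:
            best, best_len = tour, L
        for k in range(n):
            a, b = tour[k], tour[(k + 1) % n]
            deltas[a][b] += Q / L
            deltas[b][a] += Q / L
    return best, best_len, deltas

def reduce_pheromone(tau, mapper_outputs):
    """Single reducer: evaporate the global matrix, merge every mapper's
    deltas, and pick the stage's best tour for the next chained job."""
    n = len(tau)
    new_tau = [[tau[i][j] * (1.0 - RHO) for j in range(n)] for i in range(n)]
    best, best_len = None, float("inf")
    for tour, L, deltas in mapper_outputs:
        if L < best_len:
            best, best_len = tour, L
        for i in range(n):
            for j in range(n):
                new_tau[i][j] += deltas[i][j]
    return new_tau, best, best_len
```

The sketch makes the bottleneck visible: ants within one mapper never see another mapper's deposits until the reducer has merged everything, and the single reducer touches the whole matrix every stage.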

As explained, the results we obtained were not as good as we expected. Hence we plan to conduct more work on this project in the future. Some of the main areas that we would like to improve are:

- Conduct more study to find a better way to implement the pheromone updates. We think one of the reasons the results fell short of expectations is that ants in different mappers cannot see each other's pheromone updates, so improving this is one of our priorities for the future.
- Because we are unable to share pheromone updates between mappers, in the current implementation all the pheromone updates are done by the single reducer in each stage. So, the reducer has too much work to do as the number of ants increases. We hope to improve this behavior by studying the possibility of having multiple reducers for pheromone updates and reducing the duplication of work.
- Another important priority is to study the configuration of Hadoop clusters in more detail and implement a better cluster than the one that we used for this project. This should also help us obtain better results.
- Finally, we aim to study more heuristic algorithms for other NP-complete problems and try to parallelize them as well.