The reason put forward by Holland (and reported in [7]) for convergence in evolutionary algorithms is that they identify good building blocks and eventually combine these into bigger building blocks. Premature convergence occurs when all chromosomes in a population become identical before an optimum has been found. From that point on, every crossover operation yields offspring identical to their parents. If crossover is the only operator used, premature convergence follows very quickly once the best building blocks have been gathered into the same chromosome. Further optimisation then requires new information to be generated. Reaching an optimum from this situation, even with mutation, is a very tedious process, so timely action is needed to ensure that population diversity is maintained.
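The claim that crossover alone cannot escape a converged population can be checked with a minimal sketch. The one-point crossover operator, the 8-bit chromosome, the population size of six and the names `one_point_crossover`, `crossover_generation` and `distinct_chromosomes` are all illustrative assumptions and are not taken from [7].

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def crossover_generation(population, rng):
    """Build the next generation using crossover only (no mutation, no selection)."""
    next_population = []
    while len(next_population) < len(population):
        parent_a, parent_b = rng.sample(population, 2)
        next_population.extend(one_point_crossover(parent_a, parent_b, rng))
    return next_population[:len(population)]


def distinct_chromosomes(population):
    """Count how many different chromosomes the population contains."""
    return len({tuple(chromosome) for chromosome in population})


rng = random.Random(0)

# A fully converged population: six copies of the same (assumed) chromosome.
converged = [[1, 0, 1, 1, 0, 0, 1, 0] for _ in range(6)]

population = converged
for _ in range(20):
    population = crossover_generation(population, rng)

# Still a single distinct chromosome: crossover has produced nothing new.
print(distinct_chromosomes(population))
```

Whatever cut points are drawn, exchanging the tails of two identical parents reproduces the parents, so the count of distinct chromosomes stays at one over all twenty generations.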
The function of mutation is to introduce new material into the population and thereby avoid exactly this premature convergence. Using mutation alone, it is possible to perform a random walk over the whole solution space. Some have even argued that crossover is not strictly necessary in EAs [12], since repeated mutations are sufficient to carry out such a walk.
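As a rough illustration of a mutation-only random walk, the sketch below applies bit-flip mutation repeatedly to a single chromosome and records the points visited. The bit-flip operator, the 16-bit encoding, the per-gene rate of 1/16, the 200 steps and the names `bit_flip_mutation` and `hamming_distance` are assumptions chosen for the example, not details from [12].

```python
import random


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def hamming_distance(a, b):
    """Number of positions at which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(1)

# Start from a single point, as in a fully converged population.
start = [0] * 16
walker = start
visited = {tuple(walker)}

# Repeated mutation alone moves the walker through the solution space.
for _ in range(200):
    walker = bit_flip_mutation(walker, rate=1 / 16, rng=rng)
    visited.add(tuple(walker))

print(len(visited), hamming_distance(start, walker))
```

No selection is applied here, so the walk is undirected: it shows only that mutation can reach new points without crossover, not that it reaches an optimum efficiently.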