There is no simple solution how to achieve the best or at least a good conversion rate. Usually, one just makes an assumption, develops a page, tracks user behavior, analyses the data — and then makes another assumption. This repeats until one can find a satisfying conversion rate, or one runs out of time.

Interestingly enough, there is a quite large amount of mathematical work regarding a similar problem. In the mathematical slang, they call it the global optimization problem.

Mathematically, conversion rate optimization can be described as finding local (or at best, global) maximum of function f(X), where X is a vector of different factors influencing conversion rate. Unlike a typical mathematical optimization problem, the function f is not known beforehand, and both the function f and the factors included in the vector X might change after each optimization step.

There are a lot of metaheuristics formulating different strategies that can be used for the optimization. I’ve just read about a very small subset of them, and I find that the strategies are very interesting. Especially, their names. Who has said mathematicians are not creative?

This is my interpretation of them, as applied to the conversion rate optimization problem.

**Gradient descent**

You make a small change of all factors in X at the same time, to maximize the conversion rate as you see it. Then you measure it. Then you try to understand, what factor change worked positively and what negatively, and revert changes in wrong factors while increasing the change in the right factors. Iterate, until a local maximum is found.

**Hill climbing**

Start with a baseline of factors X. Now change the first factor x1 in X. Measure conversion rate. Revert x1 and change second factor x2 in X. Measure. Repeat testing, until all factors in X are tested. This procedure can also be done parallelly using a multi-variate A/B testing. However the testing is done, at the end of the day we can find out, which winning factor xi has made the largest improvement in conversion rate. Now establish a new baseline X’, which is just like X, but with the improved winning factor xi. Continue iterating from this new point, until a local maximum is found. It is quite probably that in each iteration a different factor will be improved.

These two strategies have the following drawbacks:

- Walking on ridges is slow. Imagine the N-dimensional space of factors X, and the function f as a very sharp ridge leading to a peak. You might be constantly landing on a side of ridge, so that you are forced to move to the peak in zigzag. This takes valuable time.
- Plateaus are traps. Imagine that no change of the current baseline X is measurably better than X. You know you’re on a plateau. Perhaps it is also the highest possible point. Or perhaps, if you make some steps, you’ll find an even higher point. The problem is, in which direction should you go (i.e. which factors do you want to change?)
- Only local maximum might be found. If your starting point is nearer to a local maximum than to a global maximum, you’ll reach the smaller peak and stop, because any small change of the current baseline X will be measurably worse than X.

To fight these problems, a number of strategies has been developed.

In the **shotgun hill climbing**, when you’re stuck, you just restart the search from a random position that is sufficiently different from your latest one.

In the **simulated annealing**, you are allowed to go into directions what actually show worse conversion rates, but for some short time. This can help to jump over abysses in function f, if they are not too wide.

And in **tabu search**, if one is stuck, one explicitly ignores all the positive factor changes and explores all the other factors (that previously didn’t play a big role).