## Models for Optimization under Data Uncertainty

As we saw in a previous post, data uncertainty poses major challenges for optimization. In this article, we will explore the various theoretical answers and models that have been proposed over the years to properly handle that issue. The key idea of all these approaches is to fully integrate uncertainty within the optimization model, to let algorithms handle the risk. Having said that, the question remains: how?

To start with something, let us give a description of a generic, deterministic (that is, without uncertainty), optimization problem: Here, we aim at maximizing a function g of our decision vector x, under the constraint that another function f of x is under 0. That model encompasses most classical mathematical programming paradigms, such as linear programming, or semi-definite programming, without loss of generality. In a deterministic setting, all parameters of function f and g are perfectly know, without ambiguity.

# The first step, knowing probabilities instead of knowing data : Stochastic Optimization

The first approach that was designed to handle uncertainty was stochastic optimization, introduced by George B. Dantzig in 1955 . The idea with stochastic optimization is that when facing uncertainty, it is natural (at least mathematically) to postulate that uncertainty is caused by random variables affecting the data of the problem, and that these random variables follow a probability distribution, assumed to be known. Therefore, with stochastic programming, we replace the absolute knowledge of data by the absolute knowledge of the probability distribution followed by data.

More formally, it implies that our functions f and g may not be depending on the decision vector x alone anymore, but also on some random vector (often represented by the greek letter xi). Therefore, once you input randomness into the description of your problem, it is not deterministic anymore. Indeed, what is the meaning of optimizing a function over which you do not have full control with the decision vector x? In a similar fashion, what does it mean to impose constraints on something that will randomly behave, despite your decision?

Depending on the source of uncertainty in the data, it can give rise to two different problems:

• if uncertainty affects the objective function of the problem, it’s called stochastic optimization and we are trying to optimize a criterion over the probabilistic outcome of function g, which is almost always the expected value: • if uncertainty affects the constraints data of the problem, litterature refers to it as “chance-constraints programming”, where the model aims at satisfaying constraints with a high probability, given by the decider: Of course, the combination of both is possible.

Obviously, solving both problem is very challenging, as it somehow imply the integration of functions g or f over continuous infinites. That’s where the maths comes in, as tractable reformulations of these problems have been proposed. The reader can refer to ,  and  to get a comprehensive review of the field and of these methods.

Without giving too much detail, most methods rely on producing a large amount of samples for the random vector to approximate the true distribution, thus reducing the problem to a deterministic, solvable one (with a few tricks though). Other methods rely on the knowledge of the moments of the distribution (usually the first and second moment), as well as famous distributions such as the Gaussian or Poisson distribution.

However, if we set aside the tractability of solving these problems, the model itself carries some hidden drawbacks that are worth mentioning:

• By assuming we could not know data perfectly, we started assuming that the probability distribution can be known exactly, which is just as wrong (even though it is way better than assuming that data is known)!
• Optimizing an expected value implies that the decision will be optimized on average. For repeated decision making (like portfolio investment), this is highly relevant, but what about decisions where you will decide once and for all (like network design), and that things go wrong?
• Even though describing uncertainty as random variables following known probability distribution seems natural, there may be situations where no one can actually give a description of that distribution!

To face those issues, a parallel stream of research investigated another modeling approach for optimization under uncertainty: robust optimization.

# The second step, knowing almost nothing to be protected against anything: Robust Optimization

In order to tackle the drawbacks of Stochastic Optimization, a novel approach was designed in order to work with much less hypothesis and to be protected against uncertainty in a stronger way. The premisces of the approach were defined in 1973 by Soyster  to be later popularized by the concept of budgeted uncertainty by Bertsimas and Sim in 2004 . In this approach, we keep a random vector affecting data, but instead of having it follow a probability distribution, we will only postulate that it must be contained in a bounded space, called the uncertainty set.

The main idea behind the uncertainty set is to postulate that the random vector can take any possible value within a bounded set, and that we want the solution to be immune to the worst realization of the random vector: This kind of worst-case approach allows to make sure that the gain will never be lower than what was computed, and that the solution will always be feasible. Indeed, this kind of minimax criterion guarantees models the fact that the optimal decision vector x* is defined in order to be the most resilient possible to all possible variations of the random vector.

More complex criterion have been elaborated, such as the max regret, but the general spirit of the method truely is in the definition of the uncertainty set Xi. Many have been explored (see  and  for general view of the available approaches), but if we take the simplest one from Bertsimas and Sim, for each value a, a minimum and a maximum possible value is defined around the nominal value: With â being the nominal value. The main idea behind Bertsimas and Sim’s uncertainty set is that even though the random vector can take any value between -1 and 1, it is highly unlikely that it will simultaneously take all the worst realization at the same time. In other words, it cannot deviate too much from 0. The uncertainty set is thus defined such as to limit the norm of the random vector under a given “budget”: Using an euclidean norm (p=2) would lead to an ellipsoïdal set, while using the 1-norm leads to a polyhedral uncertainty set. Enabling the random vector to take any value within that set and making the solution feasible and optimal for the worst realization of the vector thus provides a very strong guarantee for the solution not to be deceptive afterwards. Many papers provided ways to set the budget value, through theorems based on Bienaymé-Tchebychev inequalities (see  and ).

As these problems cannot be solved directly, most papers provided tractable reformulation based on duality theorems to transform these problems into deterministic ones, with great results on many applications, especially when optimizing a decision for long term demand (power generation in , fiber network design in ). But then again, even though the method seem to solve Stochastic Optimization limitations, it also has its dark side:

• Solutions may not be deceptive afterwards, but they can surely be deceptive before! Indeed, protecting against the worst case often lead to very conservative solutions. In other words, the Price of Robustness is sometimes way too high!
• Even though the very limited number of hypothesis involved is appealing compared to assuming know probability distributions, it may seem too limited as in practice (especially in the era of great data availability we’re living in) we might have more information than this, and it is not exploited at all.

To conclude, before 2015, depending on the kind of problem you were trying to tackle, you had two different models to play with:

• Stochastic Optimization for problems where you can describe probability distributions with high enough precision that imply repeated decision making
• Robust Optimization for problems where you have little information on uncertain data that imply decisions with high impact, where bad surprises should be avoided at all cost

However, most problems do not entirely fall into either of these two categories. That is why, in the last 5 years, a reunificating framework emerged: Distributionally Robust Optimization.

# The reunification step, knowing something, and knowing it’s wrong: Distributionally Robust Optimization

Nowadays, the availability of data makes the very limited hypotheses of Robust Optimization look like we are missing some opportunities to actually use that data. On the other hand, stochastic optimization can make use of data to build its probability distribution, but it lacks the kind of safety provided by Robust Optimization. Distributionally Robust Optimization (DRO) takes the best of these two worlds and merges them into a framework that assumes:

• a nominal probability distribution that we know is probably close to the real, unknown distribution, like Stochastic Optimization
• we want protection against the worst possible distribution “around” the nominal one, like Robust Optimization

It can be modeled like this: Where F is the uncertainty set built to contain the “true” probability distribution P* with a very high probability: where d is a distance measure between two distributions. Many distances can be used, like the Kullback-Leibler divergence (which is not, formally speaking, a distance) or the Wasserstein / Kantorovich distance. Recent theorems from Bolley, Guillin and Villani exhibited interesting properties of the Wasserstein metric (see ), especially in terms of probability guarantees. Later on, Van Parys, Esfahani and Kuhn showed in  that DRO is actually optimal in presence of historical data, in addition to being actually tractable in  and .

The main advantages of DRO as a framework is that depending on the amplitude of the uncertainty set, it can either reduce to a Stochastic Optimization or a Robust Optimization one, so it can be viewed as a generalization of those approaches, enabling the user to actually leverage existing data, without sacrificing robustness.

In conclusion, the last 15 years of research have been producing a lot of theory and models to offer Optimization practionners all the tools to properly handle data uncertainty. The most recent advances are based on Distributionally Robust Optimization, and they are opening the way to finding the right balance between performance and robustness, while providing very convincing support for data-driven approaches.

In our next article on data uncertainty, we will try to explain how these theoretical approaches can be declined in the various contexts of Vehicle Routing Problems.

# References

 George B. Dantzig. Linear Programming under Uncertainty. Management Science Vol. 1 (3-4), 1955.

 Peter Kall, Stein W. Wallace. Stochastic Programming. Journal of the Operational Research Society 46(9), 1994.

 Alexander Shapiro, Darinka Dentcheva, Andrzej Ruszczynski. Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Series on Optimization, 2009.

 John R. Birge, François Louveaux. Introduction to Stochastic Programming. Springer, 2011.

 Allen L. Soyster. Convex programming with set-inclusive constraints and applications to inexact linear programming. Operations Research Vol. 21, pp 1154-1157, 1973.

 Dimitris Bertsimas, Melvyn Sim. The Price of Robustness. Operations Research Vol. 52, No. 1 (2004).

 Aharon Ben-Tal, Laurent El-Ghaoui, Arkadi Nemirovski. Robust Optimization. Princeton Series in Applied Mathematics (2009).

 Bram L. Gorissen, Ihsan Yanikoglu, Dick den Hertog. A practical guide to robust optimization. Omega Vol. 53, pp. 124-137 (2015).

 F. Babonneau, J.P. Vial, R. Apparigliato. Handbook on “Uncertainty and Envrionmental Decision Making”, chapter Robust Optimization for environmental and energy planning, International Series in Operations Research and Management Science, Springler Verlag, 2009.

 Cédric Hervet. Optimization of optical network deployment. Considerations on demand uncertainty. PhD Thesis, 2013.

 François Bolley, Arnaud Guillin, Cédric Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probability Theory and Related Fields, Springer Verlag vol. 137, pp. 541-593, 2007.

 Bart P.G. Van Parys, Peyman M. Esfahani, Daniel Kuhn. From Data to Decisions: Distributionally Robust Optimization is Optimal. Optimization Online (April 2017).

 Peyman M. Esfahani, Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming Vol. 171, Issue 1-2 (2018).

 Zhi Chen, Daniel Kuhn, Wolfram Wiesemann. Data-Driven Chance Constrained Programs over Wasserstein Balls. Optimization Online (June 2018).

 Daniel Z. Long, Melvyn Sim, Minglong Zhou. The Dao of Robustness. Available at SSRN (November 2019). 