## 1 Introduction

Since their introduction bismut1973conjugate; pardoux1990adapted, backward stochastic differential equations have found many applications in areas like stochastic control, theoretical economics, and mathematical finance. They have received considerable attention in the literature and interesting connections to partial differential equations have been obtained (see e.g., cheridito2007second and the references therein). The key feature of backward stochastic differential equations is the random terminal condition that the solution is required to satisfy. These equations are referred to as forward-backward stochastic differential equations, if the randomness in the terminal condition is coming from the state of a forward stochastic differential equation. The solution to a forward-backward stochastic differential equation can be written as a deterministic function of time and the state process. Under suitable regularity assumptions, this function can be shown to be the solution of a parabolic partial differential equation cheridito2007second. A forward-backward stochastic differential equation is called uncoupled if the solution of the backward equation does not enter the dynamics of the forward equation and coupled if it does. The corresponding parabolic partial differential equation is semi-linear in case the forward-backward stochastic differential equation is uncoupled and quasi-linear if it is coupled.

In this work, we approximate the aforementioned deterministic function of time and space by a deep neural network. This choice is inspired by modern techniques for solving forward and inverse problems associated with partial differential equations, where the unknown solution is approximated either by a neural network raissi2017physics_I; raissi2017physics_II; raissi2018deep or a Gaussian process raissi2018numerical; raissi2018hidden; raissi2017inferring; raissi2017machine

. Moreover, putting a prior on the solution is fully justified by the similar approach pursued in the past century by classical methods of solving partial differential equations such as finite elements, finite differences, or spectral methods, where one would expand the unknown solution in terms of an appropriate set of basis functions. However, the classical methods suffer from the curse of dimensionality mainly due to their reliance on spatio-temporal grids. In contrast, modern techniques avoid the tyranny of mesh generation, and consequently the curse of dimensionality, by approximating the unknown solution with a neural network or a Gaussian process. Moreover, unlike the state of the art deep learning based algorithms for solving high-dimensional partial differential equations

beck2017machine; weinan2017deep; han2017overcoming, our algorithm (upon a single round of training) results in a solution function that can be evaluated anywhere in the space-time domain, not just at the initial point.## 2 Problem Setup and Solution methodology

We consider coupled forward-backward stochastic differential equations of the general form

(1) |

where

is a vector-valued Brownian motion. A solution to these equations consists of the stochastic processes

, , and . It is shown in antonelli1993backward and ma1994solving (see also pardoux1999forward; delarue2006forward; cheridito2007second) that coupled forward-backward stochastic differential equations (1) are related to quasi-linear partial differential equations of the form(2) |

with terminal condition , where is the unknown solution and

(3) |

Here, and denote the gradient vector and the Hessian matrix of , respectively. In particular, it follows directly from Ito’s formula (see e.g., cheridito2007second) that solutions of equations (1) and (2) are related according to

(4) |

Inspired by recent developments in *physics-informed deep learning* raissi2017physics_I; raissi2017physics_II and *deep hidden physics models* raissi2018deep, we proceed by approximating the unknown solution by a deep neural network. We obtain the required gradient vector

by applying the chain rule for differentiating compositions of functions using automatic differentiation

baydin2015automatic. It is worth emphasizing that automatic differentiation is different from, and in several respects superior to, numerical or symbolic differentiation; two commonly encountered techniques of computing derivatives. In its most basic description baydin2015automatic, automatic differentiation relies on the fact that all numerical computations are ultimately compositions of a finite set of elementary operations for which derivatives are known. Combining the derivatives of the constituent operations through the chain rule gives the derivative of the overall composition. This allows accurate evaluation of derivatives at machine precision with ideal asymptotic efficiency and only a small constant factor of overhead. In particular, to compute the derivatives involved in equation (4) we rely on Tensorflow

abadi2016tensorflow which is a popular and relatively well documented open source software library for automatic differentiation and deep learning computations.Parameters of the neural network representing

can be learned by minimizing the following loss function given explicitly in equation (

6) obtained from discretizing the forward-backward stochastic differential equation (1) using the standard Euler-Maruyama scheme. To be more specific, let us apply the Euler-Maruyama scheme to the set of equations (1) and obtain(5) |

for , where and

is a random variable with mean

. The loss function is then given by(6) |

which corresponds to different realizations of the underlying Brownian motion. Here, and . The subscript corresponds to the -th realization of the underlying Brownian motion while the superscript corresponds to time . It is worth recalling from equations (4) that and , and consequently the loss (6) is a function of the parameters of the neural network . Furthermore, from equation (5) we have

and for every .

##
3 Related Work^{1}^{1}1This section can be skipped in the first read.

In weinan2017deep; han2017overcoming, the authors consider uncoupled forward-backward stochastic differential equations of the form

(7) |

which are subcases of the coupled equations (1) studied in the current work. The above equations are related to the semilinear and parabolic class of partial differential equations

(8) |

The authors of weinan2017deep; han2017overcoming then devise an algorithm to compute by treating and as parameters in their model. Then, they employ the Euler-Maruyama scheme to discretize equations (7). Their next step is to approximate the functions for at time steps by different neural networks. This enables them to approximate by evaluating the corresponding neural network at time at the spatial point . Moreover, no neural networks are employed in weinan2017deep; han2017overcoming to approximate the functions . In fact is computed by time marching using the Euler-Maruyama scheme used to discretize equations (7). Their loss function is then given by

(9) |

which tries to match the terminal condition. The total set of parameters are consequently given by , , and the parameters of the neural networks used to approximate the gradients. There are a couple of major drawbacks associated with the method advocated in weinan2017deep; han2017overcoming.

The first and the most obvious drawback is that the number of parameters involved in their model grows with the number of points used to discretized time. This is prohibitive specially in cases where one would need to perform long time integration (i.e., when the final time is large) or in cases where it is a requirement to employ smaller time step size in order to increase the accuracy of the Euler-Maruyama scheme. The second major drawback is that the method as outlined in weinan2017deep; han2017overcoming is designed in such a way that it is only capable of approximating . This means that in order to obtain an approximation to at a later time , they will have to retrain their algorithm. The third drawback is that assuming and to act as parameters of the models in addition to approximating the gradients by distinct (not sharing any parameters) neural networks seems a little bit ad-hoc.

In contrast, the method proposed in the current work circumvents all of the drawbacks mentioned above by placing a neural network directly on the object of interest, the unknown solution . This choice is justified by the similar well-established approach taken by the classical methods of solving partial differential equations, such as finite elements, finite differences, or spectral methods, where one would expand the unknown solution in terms of an appropriate set of basis functions. In addition, modern methods for solving forward and inverse problems associated with partial differential equations approximate the unknown solution by either a neural network raissi2017physics_I; raissi2017physics_II; raissi2018deep; raissi2018multistep or a Gaussian process raissi2018numerical; raissi2018hidden; raissi2017inferring; raissi2017machine; raissi2017parametric; perdikaris2017nonlinear; raissi2016deep. The classical methods suffer from the curse of dimensionality mainly due to their reliance on spatio-temporal grids. Here, inspired by the aforementioned modern techniques, we avoid the curse of dimensionality by approximating with a neural network. It should be highlighted that the number of parameters of the neural network we use to approximate is independent of the number of the number of points needed to discretized time (see equation (5)). Moreover, upon a single round of training, the neural network representing can be evaluated anywhere in the space-time domain, not just at the initial point . Furthermore, we compute the required gradients by differentiating the neural network representing using automatic differentiation. Consequently, the networks and share the same set of parameters. This is fully justified by the theoretical connection (see equation (4)) between solutions of forward-backward stochastic differential equations and their associated partial differential equations. A major advantage of the approach pursued in the current work is the reduction in the number of parameters employed by our model, which helps the algorithm generalize better during test time and consequently mitigate the well-known over-fitting problem.

In beck2017machine, a follow-up work on weinan2017deep; han2017overcoming, the authors extend their framework to fully-nonlinear second-order partial differential equations of the general form

(10) |

with terminal condition . Here, let denote a high-dimensional stochastic process satisfying the forward stochastic differential equation

(11) |

where is the drift vector and is the diffusion matrix. It then follows directly from Ito’s formula cheridito2007second that the processes

(12) |

solve the second-order backward stochastic differential equation

(13) |

Similar to their prior works weinan2017deep; han2017overcoming, the authors then devise an algorithm to compute by treating , , , and as parameters of their model. Then, they proceed by discretizing equations (13) by the Euler-Maruyama scheme. Their next step is to approximate the functions and for , corresponding to each time step , by distinct neural networks. This enables them to approximate and by evaluating the corresponding neural networks at . Moreover, no neural networks are employed in beck2017machine to approximate the functions and . In fact and are computed by time marching using the Euler-Maruyama scheme applied to equations (13). Their loss function is then given by (9) which tries to match the terminal condition. The total set of parameters are consequently given by , , , , and the parameters of the neural networks used to approximate the functions and . This framework, being a descendant of weinan2017deep; han2017overcoming, also suffers from the drawbacks listed above. It should be emphasized that, although not pursued here, the framework proposed in the current work can be straightforwardly extended to solve the second-order backward stochastic differential equations (13). The key (see e.g., cheridito2007second) is to leverage the fundamental relationships (12).

## 4 Results

The proposed framework provides a universal treatment of coupled forward-backward stochastic differential equations of fundamentally different nature and their corresponding high-dimensional partial differential equations. This generality will be demonstrated by applying the algorithm to a wide range of canonical problems spanning a number of scientific domains including a 100-dimensional Black-Scholes-Barenblatt equation and a 100-dimensional Hamilton-Jacobi-Bellman equation. These examples are motivated by the pioneering works beck2017machine; weinan2017deep; han2017overcoming. All data and codes used in this manuscript will be publicly available on GitHub at https://github.com/maziarraissi/FBSNNs.

### 4.1 Black-Scholes-Barenblatt Equation in 100D

Let us start with the following forward-backward stochastic differential equations

(14) |

where , , , , and . The above equations are related to the Black-Scholes-Barenblatt equation

(15) |

with terminal condition . This equation admits the explicit solution

(16) |

which can be used to test the accuracy of the proposed algorithm. We approximate the unknown solution

by a 5-layer deep neural network with 256 neurons per hidden layer. Furthermore, we partition the time domain

into equally spaced intervals (see equations (5)). Upon minimizing the loss function (6), using the Adam optimizer kingma2014adam with mini-batches of size (i.e., 100 realizations of the underlying Brownian motion), we obtain the results reported in figure 1. In this figure, we are evaluating the learned solution at representative realizations (not seen during training) of the underlying high-dimensional process . Unlike the state of the art algorithms beck2017machine; weinan2017deep; han2017overcoming, which can only approximate at time and at the initial spatial point , our algorithm is capable of approximating the entire solution function in a single round of training as demonstrated in figure 1. Figures such as this one are absent in beck2017machine; weinan2017deep; han2017overcoming, by design.To further scrutinize the performance of our algorithm, in figure 2 we report the mean and mean plus two standard deviations of the relative errors between model predictions and the exact solution computed based on independent realizations of the underlying Brownian motion. It is worth noting that in figure 1 we were plotting 5 representative examples of the 100 realizations used to generate figure 2. The results reported in figures 1 and 2 are obtained after , , , and consecutive iterations of the Adam optimizer with learning rates of , , , and , respectively. The total number of iterations is therefore given by . Every iterations of the optimizer takes about seconds on a single NVIDIA Titan X GPU card. In each iteration of the Adam optimizer we are using different realizations of the underlying Brownian motion. Consequently, the total number of Brownian motion trajectories observed by the algorithm is given by . It is worth highlighting that the algorithm converges to the exact value in the first few hundred iterations of the Adam optimizer. For instance after only 500 steps of training, the algorithms achieves an accuracy of around in terms of relative error. This is comparable to the results reported in beck2017machine; weinan2017deep; han2017overcoming

, both in terms of accuracy and the speed of the algorithm. However, to obtain more accurate estimates for

at later times we need to train the algorithm using more iterations of the Adam optimizer.### 4.2 Hamilton-Jacobi-Bellman Equation in 100D

Let us now consider the following forward-backward stochastic differential equations

(17) |

where , , , and . The above equations are related to the Hamilton-Jacobi-Bellman equation

(18) |

with terminal condition . This equation admits the explicit solution

(19) |

which can be used to test the accuracy of the proposed algorithm. In fact, due to the presence of the expectation operator in equation (19), we can only approximately compute the exact solution. To be precise, we use Monte-Carlo samples to approximate the exact solution (19) and use the result as ground truth. We represent the unknown solution by a 5-layer deep neural network with 256 neurons per hidden layer. Furthermore, we partition the time domain into equally spaced intervals (see equations (5)). Upon minimizing the loss function (6), using the Adam optimizer kingma2014adam with mini-batches of size (i.e., 100 realizations of the underlying Brownian motion), we obtain the results reported in figure 3. In this figure, we are evaluating the learned solution at a representative realization (not seen during training) of the underlying high-dimensional process . It is worth noting that computing the exact solution (19) to this problem is prohibitively costly due to the need for the aforementioned Monte-Carlo sampling strategy. That is why we are depicting only a single realization of the solution trajectories in figure 3. Unlike the state of the art algorithms beck2017machine; weinan2017deep; han2017overcoming, which can only approximate at time and at the initial spatial point , our algorithm is capable of approximating the entire solution function in a single round of training as demonstrated in figure 3.

To further investigate the performance of our algorithm, in figure 4 we report the relative error between model prediction and the exact solution computed for the same realization of the underlying Brownian motion as the one used in figure 3. The results reported in figures 3 and 4 are obtained after , , , and consecutive iterations of the Adam optimizer with learning rates of , , , and , respectively. The total number of iterations is therefore given by . Every iterations of the optimizer takes about seconds on a single NVIDIA Titan X GPU card. In each iteration of the Adam optimizer we are using different realizations of the underlying Brownian motion. Consequently, the total number of Brownian motion trajectories observed by the algorithm is given by . It is worth highlighting that the algorithm converges to the exact value in the first few hundred iterations of the Adam optimizer. For instance after only 100 steps of training, the algorithms achieves an accuracy of around in terms of relative error. This is comparable to the results reported in beck2017machine; weinan2017deep; han2017overcoming, both in terms of accuracy and the speed of the algorithm. However, to obtain more accurate estimates for at later times we need to train the algorithm using more iterations of the Adam optimizer.

### 4.3 Allen-Cahn Equation in 20D

Let us consider the following forward-backward stochastic differential equations

(20) |

where , , and . The above equations are related to the Allen-Cahn equation

(21) |

with terminal condition . We represent the unknown solution by a 5-layer deep neural network with 256 neurons per hidden layer. Furthermore, we partition the time domain into equally spaced intervals (see equations (5)). Upon minimizing the loss function (6), using the Adam optimizer kingma2014adam with mini-batches of size (i.e., 100 realizations of the underlying Brownian motion), we obtain the results reported in figure 5. In this figure, we are evaluating the learned solution at five representative realizations (not seen during training) of the underlying high-dimensional process . Unlike the state of the art algorithms beck2017machine; weinan2017deep; han2017overcoming, which can only approximate at time and at the initial spatial point , our algorithm is capable of approximating the entire solution function in a single round of training as demonstrated in figure 5.

## 5 Summary and Discussion

In this work, we put forth a deep learning approach for solving coupled forward-backward stochastic differential equations and their corresponding high-dimensional partial differential equations. The resulting methodology showcases a series of promising results for a diverse collection of benchmark problems. As deep learning technology is continuing to grow rapidly both in terms of methodological, algorithmic, and infrastructural developments, we believe that this is a timely contribution that can benefit practitioners across a wide range of scientific domains. Specific applications that can readily enjoy these benefits include, but are not limited to, stochastic control, theoretical economics, and mathematical finance.

In terms of future work, one could straightforwardly extend the proposed framework in the current work to solve second-order backward stochastic differential equations (13). The key (see e.g., cheridito2007second) is to leverage the fundamental relationships (12) between second-order backward stochastic differential equations and fully-nonlinear second-order partial differential equations. Moreover, our method can be used to solve stochastic control problems, where in general, to obtain a candidate for an optimal control, one needs to solve a coupled forward-backward stochastic differential equation (1), where the backward components influence the dynamics of the forward component.

## Acknowledgements

This work received support by the DARPA EQUiPS grant N66001-15-2-4055 and the AFOSR grant FA9550-17-1-0013.

Comments

There are no comments yet.