
Solving nonlinear soliton equations using improved physics-informed neural networks with adaptive mechanisms

  • Yanan Guo 1
  • Xiaoqun Cao 2,3,*
  • Kecheng Peng 2
  • 1Simulation and Training Center, Naval Aviation University, Huludao 125001, China
  • 2College of Computer, National University of Defense Technology, Changsha 410073, China
  • 3College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China

*Author to whom any correspondence should be addressed.

Received date: 2022-11-13

  Revised date: 2023-02-27

  Accepted date: 2023-04-10

  Online published: 2023-09-01

Copyright

© 2023 Institute of Theoretical Physics CAS, Chinese Physical Society and IOP Publishing

Abstract

Partial differential equations (PDEs) are important tools for scientific research and are widely used in various fields. However, it is usually very difficult to obtain accurate analytical solutions of PDEs, and the numerical methods used to solve them are often computationally intensive and time-consuming. In recent years, physics-informed neural networks (PINNs) have been successfully applied to find numerical solutions of PDEs and have shown great potential. Meanwhile, solitary waves have long been of great interest to researchers in the field of nonlinear science. In this paper, we perform numerical simulations of solitary wave solutions of several PDEs using improved PINNs. The improved PINNs not only incorporate constraints from the governing equations to ensure the interpretability of the prediction results, which is important for physical field simulations, but also employ an adaptive activation function: a hyperparameter is introduced into the activation function to change its slope and avoid vanishing gradients, saving computing time and thus speeding up training. In this paper, the mKdV equation, the improved Boussinesq equation, the Caudrey–Dodd–Gibbon–Sawada–Kotera equation and the p-gBKP equation are selected for study, and the errors of the simulation results are analyzed to assess the accuracy of the predicted solitary wave solutions. The experimental results show that the improved PINNs are significantly better than the traditional PINNs, with shorter training time and more accurate predictions. The improved PINNs speed up training by more than 1.5 times compared with the traditional PINNs, while keeping the relative prediction error below the order of 10−2.

Cite this article

Yanan Guo, Xiaoqun Cao, Kecheng Peng. Solving nonlinear soliton equations using improved physics-informed neural networks with adaptive mechanisms[J]. Communications in Theoretical Physics, 2023, 75(9): 095003. DOI: 10.1088/1572-9494/accb8d

1. Introduction

Partial differential equations (PDEs) are important tools for studying the laws of nature [1]. A large number of natural science and engineering problems can be described by differential equations, such as the motion of galaxies in the Universe, weather forecasts of temperature and wind speed, and the interactions between atoms [2, 3]. PDEs have become an important link between mathematical theory and research in physics. There is no doubt that the development of the theory of PDEs has greatly advanced modern science and revolutionized many fields such as aerospace, radio communication, ocean exploration, and numerical weather prediction. However, most PDEs are very difficult to solve, such as Maxwell's equations in electromagnetism and the Navier–Stokes (NS) equations in fluid dynamics; due to their complexity and nonlinearity, analytical solutions are hard to find. Therefore, various numerical methods have been proposed to approximate the solutions of PDEs, such as the finite difference method (FDM), the finite element method (FEM), and the finite volume method (FVM) [4–7]. Although these methods have been widely used, they still suffer from problems such as instability, error accumulation, and local truncation error. Beyond computational accuracy, traditional numerical methods also face long computation times, high computational cost, and the difficulty of mesh generation.
In recent years, deep learning technology has triggered a new wave of artificial intelligence research [8–10]. At present, deep learning has achieved great success in many fields such as image classification, natural language processing, and object detection [11, 12]. Beyond these traditional research fields, the integration of artificial intelligence and the natural sciences has received more and more attention [13–15], and many exciting results have emerged from this cross-disciplinary research. For example, AlphaFold2 addresses the prediction of protein folding and three-dimensional structure [16]. Another active line of work in this field is the application of deep neural networks to solve PDEs [17–19]. The theoretical basis of this research is the universal approximation theorem [20], which states that a multi-layer feedforward network with a sufficient number of hidden-layer neurons can adequately approximate any continuous function. Mathematically, deep learning provides a tool for approximating high-dimensional functions. Traditional numerical methods often suffer from the curse of dimensionality, whereas deep neural networks can cope with this problem [21, 22], because for neural networks the computational cost grows only linearly with the dimension. At the same time, compared with traditional grid-based methods such as the finite difference method and the finite volume method, deep learning is a flexible, mesh-free method that is simpler to implement. In addition, in terms of hardware, deep neural networks can readily exploit the advantages of graphics processing units (GPUs), thereby improving the speed of solving PDEs. Researchers have carried out extensive studies, and the results show that neural networks have unique advantages for solving PDEs [23–26]. Some researchers use neural networks instead of traditional numerical discretization methods to approximate the solutions of PDEs; for example, Lagaris et al used neural networks to solve initial and boundary value problems [27, 28]. Other researchers use neural networks to improve traditional numerical methods for PDEs, thereby improving the computational efficiency of existing approaches [29, 30]. Among these works, one of the most representative achievements is physics-informed neural networks (PINNs) [31–35], which propose a new composite loss function consisting of four parts: observation data constraints, PDE constraints, boundary condition constraints, and initial condition constraints. PINNs encode the governing physical equations into neural networks, significantly improving the performance of neural network models. In terms of implementation, PINNs take full advantage of automatic differentiation frameworks [36]. Currently, automatic differentiation mechanisms are widely used in deep learning frameworks. Unlike symbolic differentiation and numerical differentiation, automatic differentiation relies on the concept of a computational graph: it performs symbolic differentiation within each node of the graph and stores the differential results between nodes as numerical values. Therefore, automatic differentiation is more accurate than numerical differentiation and more efficient than symbolic differentiation.
Since PINNs were proposed, a large number of researchers have studied and improved the original framework, proposing various improved PINNs and applying them in many fields with great promise [37–41]. Chen et al have carried out extensive simulation studies of localized wave solutions of integrable equations using deep learning methods, which can effectively characterize their dynamical behaviors, and have proposed a theoretical architecture and development direction for integrable deep learning, advancing research on nonlinear mathematical physics and integrable systems [42–47]. In recent years, the rapid development of ocean sound field calculation, atmospheric pollution diffusion simulation, seismic wave inversion and prediction, etc [48–50], has placed higher demands on the accuracy and efficiency of numerical methods, making it urgent to develop new methods and tools for solving PDEs. Researchers have used physics-informed neural networks to solve the frequency-domain wave equation and simulate seismic multifrequency wavefields, and have conducted in-depth research on wave propagation and full waveform inversion, achieving a series of results [51–55]. PINNs are expected to provide a new opportunity for solving a large number of scientific and engineering problems and to bring breakthroughs in the relevant research.
Aiming at the problem of solitary wave solutions in PDE research, this study uses improved PINNs to carry out solitary wave simulations. Compared with the original PINNs, the improved PINNs modify the activation function of the neural network: hyperparameters are introduced into the activation function to change its slope and avoid vanishing gradients, thereby saving computing time and speeding up training. The structure of this paper is as follows. Section 2 introduces PINNs with adaptive activation functions. Next, section 3 gives the specific mathematical forms and experimental schemes for the mKdV equation, the improved Boussinesq equation, the Caudrey–Dodd–Gibbon–Sawada–Kotera (CDGSK) equation and the p-gBKP equation, then constructs the corresponding neural networks to carry out numerical experiments and analyzes the experimental results. Finally, section 4 summarizes our work and looks ahead to future research directions.

2. Method

This section focuses on how to improve PINNs with adaptive mechanisms for solving PDEs with solitary wave solutions. The basic idea of PINNs is to introduce the governing equations as regularization terms in the loss function of the neural network, thereby transforming the problem of solving PDEs in physical space into an optimization problem over the neural network parameters; the approximate solution of the PDEs is then obtained by training the network. Next, PINNs and the adaptive mechanisms will be introduced in detail.

2.1. Physics-informed neural networks

From the perspective of mathematical function approximation theory, a neural network can be regarded as a general nonlinear function approximator. On the other hand, solving a PDE amounts to finding a nonlinear function that satisfies given constraints, so there is a natural connection between neural networks and PDEs. Raissi et al [31] constructed residual terms from limited observation data, the governing equations, and the initial and boundary conditions, summed them to form a loss function, and thereby proposed PINNs under physical constraints. A neural network trained in this way can not only fit the observation data, but also automatically satisfy the symmetry, invariance, conservation and other physical properties obeyed by the PDEs. The design idea of PINNs is to combine data-driven learning with physical constraints, which provides a new approach for solving PDEs. Consider a PDE of the following form:
$\begin{eqnarray}\begin{array}{rcl}{u}_{t}+{{ \mathcal N }}_{x}[u] & = & 0,\quad x\in {\rm{\Omega }},t\in [{T}_{0},T]\\ u(x,{T}_{0}) & = & h(x),\quad x\in {\rm{\Omega }}\\ u(x,t) & = & g(x,t),\quad x\in \partial {\rm{\Omega }},t\in [{T}_{0},T].\end{array}\end{eqnarray}$
In the above formula, t and x represent the time and space variables respectively, [T0, T] and Ω are their respective domains, and ∂Ω is the boundary of the spatial domain Ω. ${{ \mathcal N }}_{x}$ is a combination of linear and nonlinear operators, h(x) is the initial condition, and g(x, t) is the boundary condition.
To obtain the approximate solution of the PDE, we first construct a neural network $\widehat{u}(x,t;\theta )$, where θ represents the parameters of the neural network. The network $\widehat{u}(x,t;\theta )$ should meet two requirements: on the one hand, after being trained on the observed data set, it should output accurate function values for the input variables at test time; on the other hand, it should conform to the physical constraints of the PDE. Here, automatic differentiation is used to integrate the differential constraints of the PDE into the neural network, and a residual network is constructed to satisfy the second requirement. Mathematically, the residual network is defined in formula (2).
$\begin{eqnarray}f(x,t;\theta ):= \displaystyle \frac{\partial }{\partial t}\widehat{u}(x,t;\theta )+{{ \mathcal N }}_{x}[\widehat{u}(x,t;\theta )].\end{eqnarray}$
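As a concrete illustration, the residual in formula (2) can be assembled with reverse-mode automatic differentiation. The following PyTorch-style sketch is our own illustration (the paper does not list its code or framework); `model` and the spatial operator `N_x` are placeholder names.

```python
import torch

def pde_residual(model, x, t, N_x):
    """Sketch: evaluate f(x, t; theta) = u_t + N_x[u] at collocation points.

    model : network mapping (x, t) -> u_hat, with x and t of shape (N, 1)
    N_x   : callable building the spatial operator from u and x via autograd
    """
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.cat([x, t], dim=1))
    # u_t by reverse-mode automatic differentiation; create_graph=True keeps
    # the graph so the residual itself can be differentiated during training
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    return u_t + N_x(u, x)
```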
The next task is to construct the loss function of PINNs, which mainly consists of four parts, including observation data constraints, governing equation constraints, initial condition constraints and boundary condition constraints. Mathematically, the four parts are defined as follows.
$\begin{eqnarray}{L}_{{\rm{o}}}\left(\theta ;{N}_{{\rm{o}}}\right)=\displaystyle \frac{1}{2\left|{N}_{{\rm{o}}}\right|}\displaystyle \sum _{j=1}^{{N}_{{\rm{o}}}}{\parallel \hat{u}\left({x}_{{\rm{o}}}^{j},{t}_{{\rm{o}}}^{j};\theta \right)-{u}_{{\rm{o}}}^{j}\parallel }_{2}^{2}\end{eqnarray}$
$\begin{eqnarray}{L}_{\mathrm{PDE}}\left(\theta ;{N}_{f}\right)=\displaystyle \frac{1}{2\left|{N}_{f}\right|}\displaystyle \sum _{j=1}^{{N}_{f}}{\parallel f\left({x}_{f}^{j},{t}_{f}^{j};\theta \right)\parallel }_{2}^{2}\end{eqnarray}$
$\begin{eqnarray}{L}_{\mathrm{IC}}\left(\theta ;{N}_{i}\right)=\frac{1}{2\left|{N}_{i}\right|}\displaystyle \sum _{j=1}^{{N}_{i}}{\parallel \hat{u}\left({x}_{i}^{j},0;\theta \right)-{h}_{i}^{j}\parallel }_{2}^{2}\end{eqnarray}$
$\begin{eqnarray}{L}_{\mathrm{BC}}\left(\theta ;{N}_{b}\right)=\frac{1}{2\left|{N}_{b}\right|}\displaystyle \sum _{j=1}^{{N}_{b}}{\parallel \hat{u}\left({x}_{b}^{j},{t}_{b}^{j};\theta \right)-{g}_{b}^{j}\parallel }_{2}^{2}.\end{eqnarray}$
Equation (3) represents the loss term calculated from the observation data, and the quality of the observation data directly affects the effectiveness of training. Equation (4) represents the PDE residual term calculated at the selected residual points. All residual points are randomly selected in the space-time domain, and Nf is the number of selected residual points. Equation (4) requires the neural network to satisfy the constraints of the PDE and is also known as the physics-informed regularization term. Equation (5) represents the loss term calculated from the initial condition, which requires the neural network to satisfy the initial condition. Equation (6) represents the loss term calculated from the boundary conditions, which requires the neural network to satisfy the boundary conditions. Finally, the mathematical expression of the loss function to be optimized is as follows
$\begin{eqnarray}\begin{array}{rcl}L(\theta ;N) & = & {\lambda }_{o}{L}_{{\rm{o}}}\left(\theta ;{N}_{{\rm{o}}}\right)+{\lambda }_{f}{L}_{\mathrm{PDE}}\left(\theta ;{N}_{f}\right)\\ & & +{\lambda }_{b}{L}_{\mathrm{BC}}\left(\theta ;{N}_{b}\right)+{\lambda }_{i}{L}_{\mathrm{IC}}\left(\theta ;{N}_{i}\right),\end{array}\end{eqnarray}$
where λ = {λo, λf, λb, λi} is a vector of weight coefficients for the loss terms. In this study, the weight coefficient of each loss term is set to 1.0. Next, gradient-based optimization methods, such as Adam, stochastic gradient descent (SGD) or L-BFGS, are used to minimize the loss function. By continuously optimizing the parameters of the neural network, the value of the loss function decreases, so that the output of the neural network approaches the true solution of the PDE. The framework of physics-informed neural networks is shown in figure 1.
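A minimal sketch of how the composite loss (7) and one optimization step might look, reusing `pde_residual` from the sketch above; the dictionary keys and the `lambdas` weights are our own naming conventions, not the authors' code.

```python
def total_loss(model, data, lambdas, N_x):
    """Sketch of the composite PINN loss of equation (7), up to the
    constant factor 1/2 appearing in equations (3)-(6)."""
    mse = torch.nn.MSELoss()
    L_o = mse(model(data['X_obs']), data['u_obs'])          # observation term (3)
    f = pde_residual(model, data['x_f'], data['t_f'], N_x)  # residual term (4)
    L_f = (f ** 2).mean()
    L_ic = mse(model(data['X_ic']), data['u_ic'])           # initial condition (5)
    L_bc = mse(model(data['X_bc']), data['u_bc'])           # boundary condition (6)
    return (lambdas['o'] * L_o + lambdas['f'] * L_f
            + lambdas['b'] * L_bc + lambdas['i'] * L_ic)

# one Adam step; in this study every weight coefficient is set to 1.0
lambdas = {'o': 1.0, 'f': 1.0, 'b': 1.0, 'i': 1.0}
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.zero_grad()
loss = total_loss(model, data, lambdas, N_x)
loss.backward()
optimizer.step()
```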
Figure 1. Schematic of physics-informed neural networks for solving partial differential equation.

2.2. The improved physics-informed neural networks with adaptive activation function

The activation function is an important component of artificial neural networks and governs the behavior of neurons, such as whether and how to process the input data. In a specific implementation, the activation function of a node processes the information received from the previous layer and outputs a value that guides how subsequent nodes respond to a particular input signal. The main purpose of using activation functions in a neural network model is to introduce nonlinearity and strengthen the learning ability of the network, so that it can learn complex physical systems. Without nonlinear activation functions, a neural network with many hidden layers becomes a giant linear regression model, which is useless for learning complex mathematical models from observation data. Studies have shown that the performance of a neural network model can differ greatly depending on the type of activation function used. Therefore, when building a neural network model, selecting an appropriate activation function for the problem at hand is a critical task. Researchers have proposed many kinds of activation functions. For example, the Sigmoid function is a widely used activation function that maps any input value into the range (0, 1). The tanh function is very similar, with an output range of (−1, 1). Unlike the Sigmoid and tanh functions, the ReLU function is piecewise linear: when its input is less than zero the output is zero, and when its input is greater than zero it is the identity y = x. Other widely used activation functions include the leaky ReLU, Maxout, ELU, and parametric ReLU functions.
To improve the original PINNs and achieve better performance in solving PDEs, Jagtap et al modified the activation function in PINNs into an adaptive activation function [56, 57], thereby speeding up training convergence. The specific method is to introduce a hyperparameter a into the activation function and optimize it jointly with the other network parameters. During training, the slope of the activation function changes as a changes. Studies have shown that adaptive activation functions have better learning ability than traditional activation functions and can improve the convergence speed of neural networks in early training. Taking several classical activation functions as examples, their mathematical formulas are modified into the following form
$\begin{eqnarray}\mathrm{Sigmoid}(x)=\frac{1}{1+{{\rm{e}}}^{-{ax}}}\end{eqnarray}$
$\begin{eqnarray}\tanh (x)=\frac{{{\rm{e}}}^{{ax}}-{{\rm{e}}}^{-{ax}}}{{{\rm{e}}}^{{ax}}+{{\rm{e}}}^{-{ax}}}\end{eqnarray}$
$\begin{eqnarray}\mathrm{ReLU}(x)=\max (0,{ax})\end{eqnarray}$
$\begin{eqnarray}\mathrm{Leaky}\ \mathrm{ReLU}(x)=\max (0,{ax})-v\max (0,-{ax}).\end{eqnarray}$
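As an illustration, a globally adaptive tanh activation of the form (9) can be implemented as a module whose slope a is a trainable parameter; this is our own sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class AdaptiveTanh(nn.Module):
    """Adaptive tanh(a*x) of equation (9); the slope a is trained jointly
    with the network weights and biases."""
    def __init__(self, a_init=1.0):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(a_init))  # trainable slope
    def forward(self, x):
        return torch.tanh(self.a * x)
```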
For the solitary wave simulation problems in this study, considering that solitary waves are localized traveling waves, we chose the neuron-wise locally adaptive activation functions (N-LAAF) when constructing the neural networks, defined as follows:
$\begin{eqnarray}\begin{array}{l}\sigma \left({{na}}_{i}^{l}{\left({{ \mathcal L }}_{l}\left({z}^{l-1}\right)\right)}_{i}\right),\quad l=1,2,\ldots ,M-1,\\ \,i=1,2,\ldots ,{N}_{k}\end{array}\end{eqnarray}$
where n ≥ 1 is an inflation factor, M is the number of layers of the neural network, and Nk is the number of neurons in the kth layer. The objects to be optimized now include not only the weights and biases but also the activation slopes. With the introduction of N-LAAF, the solution obtained from the improved PINN takes the form:
$\begin{eqnarray}{u}_{\hat{{\boldsymbol{\Theta }}}}(x)=\left({{ \mathcal L }}_{M}\circ \sigma \circ {{na}}_{i}^{M-1}{\left({{ \mathcal L }}_{M-1}\right)}_{i}\circ \cdots \circ \sigma \circ {{na}}_{i}^{1}{\left({{ \mathcal L }}_{1}\right)}_{i}\right)(x).\end{eqnarray}$
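A sketch of one hidden layer with the neuron-wise adaptive activation of equation (12), using tanh as the base function σ; the class name and the initialization n·a_i = 1 are our assumptions.

```python
class NLAAFLayer(nn.Module):
    """Linear map followed by the neuron-wise locally adaptive activation
    sigma(n * a_i * (L_l(z))_i) of equation (12): one trainable slope per neuron."""
    def __init__(self, in_dim, out_dim, n=10.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.n = n                                              # inflation factor n >= 1
        self.a = nn.Parameter(torch.full((out_dim,), 1.0 / n))  # assumes n*a_i = 1 at start
    def forward(self, z):
        return torch.tanh(self.n * self.a * self.linear(z))
```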
To further avoid vanishing gradients, the adaptive activation approach defines a slope recovery term Laaf(a) that encourages larger activation slopes:
$\begin{eqnarray}{L}_{{aaf}}(a)=\displaystyle \frac{1}{\tfrac{1}{M-1}{\sum }_{k=1}^{M-1}\exp \left({\left(\tfrac{{\sum }_{i=1}^{{N}_{k}}{a}_{i}^{k}}{{N}_{k}}\right)}^{2}\right)}.\end{eqnarray}$
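The slope recovery term of equation (14) can be computed from the per-layer slope vectors; the sketch below assumes the slopes are collected into a list, one tensor per hidden layer.

```python
def slope_recovery(slopes):
    """Sketch of L_aaf(a) in equation (14).

    slopes : list of M-1 tensors, the k-th holding the slopes a_i^k of layer k
    """
    # mean over layers of exp((mean of the layer's slopes)^2), then reciprocal
    per_layer = torch.stack([torch.exp(a.mean() ** 2) for a in slopes])
    return 1.0 / per_layer.mean()
```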
Analyzing equation (14) shows that as the slopes of the activation functions increase, the value of this term decreases. Therefore, this term is added to the loss function and optimized together with the other terms. The new loss function is defined as follows:
$\begin{eqnarray}\begin{array}{l}L(\hat{{\boldsymbol{\Theta }}};N)={\lambda }_{o}{L}_{o}\left(\hat{{\boldsymbol{\Theta }}};{N}_{o}\right)+{\lambda }_{f}{L}_{\mathrm{PDE}}\left(\hat{{\boldsymbol{\Theta }}};{N}_{f}\right)\\ \quad +\,{\lambda }_{b}{L}_{\mathrm{BC}}\left(\hat{{\boldsymbol{\Theta }}};{N}_{b}\right)+{\lambda }_{i}{L}_{{IC}}\left(\hat{{\boldsymbol{\Theta }}};{N}_{i}\right)\\ \quad +\,{\lambda }_{a}{L}_{{aaf}}(a).\end{array}\end{eqnarray}$
Next, similar to the training process of the original PINNs, we use an optimization algorithm to update the parameters of the improved PINNs so that the loss function keeps decreasing. Eventually, the output of the neural network gradually approaches the exact solution of the nonlinear soliton equations.

3. Numerical experiments and results

In this section, we simulate solitary wave solutions of several PDEs using the above method. Solitary waves play an important role in nonlinear science, and their theory is applied in many fields such as fluid mechanics, quantum mechanics, and nonlinear fibre-optic communication. Therefore, the problem of solitary wave simulation has received a lot of attention from researchers. Specifically, we use the improved PINNs to approximate the soliton solutions of the mKdV equation [58], the improved Boussinesq equation [59], the CDGSK equation [60], and the p-gBKP equation. In addition, we analyze the convergence speed and accuracy of the proposed method. The accuracy of the neural network's approximate solution is measured by the relative L2 error
$\begin{eqnarray}{\rm{L}}2\ \mathrm{error}=\displaystyle \frac{\sqrt{{\sum }_{i=1}^{N}{\left|\hat{u}\left({x}_{i},{t}_{i}\right)-u\left({x}_{i},{t}_{i}\right)\right|}^{2}}}{\sqrt{{\sum }_{i=1}^{N}{\left|u\left({x}_{i},{t}_{i}\right)\right|}^{2}}},\end{eqnarray}$
where $u\left({x}_{i},{t}_{i}\right)$ represents the exact solution and $\hat{u}\left({x}_{i},{t}_{i}\right)$ represents the approximate solution.
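For reference, equation (16) amounts to the following one-line computation (a NumPy sketch of ours):

```python
import numpy as np

def relative_l2_error(u_pred, u_exact):
    """Relative L2 error of equation (16) between predicted and exact solutions."""
    return np.linalg.norm(u_pred - u_exact) / np.linalg.norm(u_exact)
```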

3.1. The numerical solution for the modified Korteweg–de Vries (mKdV) equation

The mKdV equation can be used as a model to describe acoustic waves in anharmonic lattices, and can also be used to study solitary waves in nonlinear optics. In this section, the solitary wave solution of the mKdV equation is studied. The mathematical form of the mKdV equation is as follows:
$\begin{eqnarray}{u}_{t}+6{u}^{2}{u}_{x}+{u}_{{xxx}}=0.\end{eqnarray}$
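With the conventions of section 2, the physics-informed residual of the mKdV equation (17) can be sketched as follows; this is our illustration, with nested autograd calls building the third derivative.

```python
def mkdv_residual(model, x, t):
    """Sketch: residual of u_t + 6 u^2 u_x + u_xxx = 0 at collocation points."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, ones, create_graph=True)[0]
    u_xxx = torch.autograd.grad(u_xx, x, ones, create_graph=True)[0]
    return u_t + 6.0 * u ** 2 * u_x + u_xxx
```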

3.1.1. The single soliton solution of mKdV equation

Given the following initial condition:
$\begin{eqnarray}u(x,0)={\rm{sech}} (x-{x}_{0}).\end{eqnarray}$
In this problem, the value of x0 is 0, and the spatial domain is in the range [−8, 8]. In terms of time, T = 5, that is, the simulation duration is 5. The true solution to the equation is as follows:
$\begin{eqnarray}\begin{array}{l}u(x,t)={\rm{sech}} (x-t-{x}_{0}),\quad x\in [-8,8],\\ t\in [0,5].\end{array}\end{eqnarray}$
Next, we construct a neural network with 4 hidden layers of 32 neurons each, using the adaptive activation function. During training, the supervised data are randomly drawn from the true data. In this study, the initial condition is regarded as a special type of Dirichlet boundary condition in the space-time domain. The data for the initial and boundary conditions were obtained by random sampling, with 300 samples. To construct the physical constraints, residual points are randomly selected in the space-time domain; 10 000 are sampled in this experiment. In addition, we use a small number of observations, 200 in this experiment. In the optimization process, to balance convergence speed and global convergence, L-BFGS is used to complete 5000 epoch iterations, and then Adam is used to continue the optimization until convergence. The learning rate is set to 0.001 during training. After training, the trained model is used to approximate the solution of the PDE. Figure 2 shows the true solution of the mKdV equation in this section and the predicted results obtained using the trained model. It can be seen from figure 2 that the two are highly consistent, and the neural network model captures the characteristics of the single soliton. To further analyze the experimental results intuitively, prediction results at different times are compared with the exact solution to test the accuracy of the predictions. Figure 3 shows a comparison of the true and approximate solutions of the PDE at different times, and it can be seen that the predicted values are very close to the true solution. In addition, the error between the neural network's approximate solution and the true solution was measured over many experiments, and the relative L2 norm error between the predicted result and the true solution is about 1.73 × 10−3. Under the same training data and parameter settings, the relative L2 norm error between the prediction results calculated by the original PINNs and the true solution is about 2.78 × 10−3. Figure 4 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed, and it was found that the average time to complete the training task with the improved PINNs was approximately 58.5% of the training time of the original PINNs. Experiments demonstrate that the improved PINNs outperform the original PINNs in both training efficiency and accuracy for the single solitary wave simulation problem of the mKdV equation.
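A sketch of the setup described above: a network with 4 hidden layers of 32 neurons using the N-LAAF layers from section 2.2, trained first with L-BFGS and then with Adam. Layer sizes, the learning rate and the L-BFGS iteration budget follow the text; everything else (class names, the Adam iteration budget) is our assumption.

```python
# 4 hidden layers, 32 neurons each, neuron-wise adaptive activations
model = nn.Sequential(NLAAFLayer(2, 32), NLAAFLayer(32, 32),
                      NLAAFLayer(32, 32), NLAAFLayer(32, 32),
                      nn.Linear(32, 1))

# stage 1: L-BFGS for 5000 iterations (requires a closure)
lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=5000)
def closure():
    lbfgs.zero_grad()
    loss = total_loss(model, data, lambdas, N_x)
    # the improved PINNs also add lambda_a * slope_recovery(...), equation (15)
    loss.backward()
    return loss
lbfgs.step(closure)

# stage 2: Adam with learning rate 0.001 until convergence
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20000):  # illustrative iteration budget
    adam.zero_grad()
    loss = total_loss(model, data, lambdas, N_x)
    loss.backward()
    adam.step()
```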
Figure 2. The exact single soliton solution (top) and learned solution (bottom) of the mKdV equation.
Figure 3. Comparison of the exact single soliton solution of the mKdV equation and the approximate solution of the neural network at different moments.
Figure 4. Error statistics for the improved PINNs and the original PINNs for solving single-soliton solutions of the mKdV equation.

3.1.2. The double soliton solution of mKdV equation

Next, we further study the double-soliton solution of the mKdV equation, given the following initial condition:
$\begin{eqnarray}u(x,0)={\rm{sech}} (x+4)+{\rm{sech}} (x-4).\end{eqnarray}$
In this problem, the spatial domain is in the range [−10, 10]. In terms of time, T = 10, that is, the simulation duration is 10. The true solution of the mKdV equation is as follows:
$\begin{eqnarray}u(x,t)={\rm{sech}} (x-t+4)+{\rm{sech}} (x-t-4).\end{eqnarray}$
Next, similar to the single-soliton simulation problem, we first construct a neural network with 5 hidden layers and 32 neurons in each hidden layer, with the adaptive activation function. During training, the initial condition is again treated as a special type of Dirichlet boundary condition in the space-time domain. The data for the initial and boundary conditions were obtained by random sampling, with 400 samples. To construct the physical constraints, residual points are randomly selected in the space-time domain; 15 000 are sampled in this experiment. Meanwhile, we use a small number of observations, 300 in this experiment. In the optimization process, to balance convergence speed and global convergence, L-BFGS is again used to complete 5000 epoch iterations, and then Adam is used to continue the optimization until convergence. The learning rate is set to 0.001 during training. After training, the trained model is used to approximate the double-soliton solution of the mKdV equation. Figure 5 shows the exact solution and the predicted results using the trained model. It can be seen from the figure that the two are highly consistent, and the neural network model captures the characteristics of the double soliton. Similarly, to further analyze the experimental results intuitively, prediction results at different times are compared with the exact solution to check the accuracy of the predictions. Figure 6 shows the comparison of the true and approximate solutions of the mKdV equation at different times, and it can be seen that the predicted values are very close to the true solutions. In addition, the error between the neural network's approximate solution and the exact solution was measured over many experiments, and the relative L2 norm error between the predicted result and the exact solution is about 1.94 × 10−3. Under the same training data and parameter settings, the relative L2 norm error between the prediction results calculated by the original PINNs and the exact solution is about 3.32 × 10−3. Figure 7 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed, and it was found that the average time to complete the training task with the improved PINNs was approximately 61.3% of the training time of the original PINNs. Experiments show that for the double soliton simulation problem, the improved PINNs are superior to the original PINNs in both training efficiency and accuracy.
Figure 5. The exact two soliton solutions (top) and learned solution(bottom) of the mKdV equation.
Figure 6. Comparison of the exact double soliton solution of the mKdV equation and the approximate solution of the neural network at different moments.
Figure 7. Error statistics for the improved PINNs and the original PINNs for solving two-soliton solutions of the mKdV equation.

3.2. The numerical solution for the improved Boussinesq equation

In this section, the solitary wave solution of the improved Boussinesq equation is considered. The Boussinesq equation is a fairly effective mathematical model that can describe the refraction, diffraction and reflection of regular and irregular waves over complex terrain. With the development of computer technology, the Boussinesq equation has been extended to deep-water short-wave research and has become a powerful tool for simulating wave motion in nearshore areas. The mathematical form of the classical Boussinesq equation is as follows:
$\begin{eqnarray}\begin{array}{l}{u}_{{tt}}={u}_{{xx}}+{{qu}}_{{xxxx}}+r{\left({u}^{2}\right)}_{{xx}},\quad x\in [{L}_{1},{L}_{2}],\\ t\in [{T}_{0},T].\end{array}\end{eqnarray}$
With the continuous development of related research, Bogolubsky proposed an improved Boussinesq (IBq) equation [61]. The mathematical expression of the improved Boussinesq equation is as follows:
$\begin{eqnarray}\begin{array}{rcl}{u}_{{tt}} & = & {u}_{{xx}}+{u}_{{xxtt}}+{\left({u}^{2}\right)}_{{xx}},\\ u(x,0) & = & \phi (x),\\ {u}_{t}(x,0) & = & \psi (x),\\ u(-L,t) & = & u(L,t).\end{array}\end{eqnarray}$
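The residual of the improved Boussinesq equation involves the mixed fourth derivative u_xxtt, which nested autograd calls handle naturally; the sketch below is our illustration in the style of the mKdV residual above.

```python
def ibq_residual(model, x, t):
    """Sketch: residual of u_tt - u_xx - u_xxtt - (u^2)_xx = 0 (equation (23))."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    d = lambda f, v: torch.autograd.grad(f, v, ones, create_graph=True)[0]
    u_tt = d(d(u, t), t)
    u_xx = d(d(u, x), x)
    u_xxtt = d(d(u_xx, t), t)   # mixed derivative by nesting
    w_xx = d(d(u * u, x), x)    # (u^2)_xx
    return u_tt - u_xx - u_xxtt - w_xx
```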
Next, examples with different initial conditions will be selected to analyze the effectiveness of the improved PINNs.

3.2.1. The single soliton solution of the improved Boussinesq equation

First, the initial and boundary conditions are given as follows:
$\begin{eqnarray}u(x,0)=A{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{c}\sqrt{\displaystyle \frac{A}{6}}x\right)\end{eqnarray}$
$\begin{eqnarray}{u}_{t}(x,0)=2A{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{c}\sqrt{\displaystyle \frac{A}{6}}x\right)\tanh \left(\displaystyle \frac{1}{c}\sqrt{\displaystyle \frac{A}{6}}x\right)\end{eqnarray}$
$\begin{eqnarray}u(-L,t)=u(L,t),\end{eqnarray}$
where A = 0.25 and c = $\sqrt{7/6}$. The range of the space-time domain is x ∈ [ −10, 10], t ∈ [0, 10]. The exact solution of the improved Boussinesq equation at this time is as follows:
$\begin{eqnarray}u(x,t)=0.25{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{2\sqrt{7}}\left(x-\sqrt{7/6}t\right)\right).\end{eqnarray}$
Next, similar to solving the mKdV equation, we first construct a neural network with 5 hidden layers of 32 neurons each, built with the adaptive activation function. During training, the initial conditions are again treated as a special type of Dirichlet boundary condition in the space-time domain. The data for the initial and boundary conditions were obtained by random sampling, with 400 samples. Similarly, residual points are randomly selected in the space-time domain to construct the physical constraints; 20 000 are sampled in this experiment. In addition, we use a small number of observations, 300 in this experiment. To improve the convergence speed and achieve global convergence, we first use L-BFGS to complete 5000 epoch iterations, and then use Adam to continue optimizing until convergence. The learning rate of this experiment is set to 0.001. After training, the trained model is used to approximate the single-soliton solution of the improved Boussinesq equation. Figure 8 shows the exact solution and the predictions obtained using the trained model. It can be seen from figure 8 that the two agree very well, and the neural network model captures the characteristics of the single soliton of the improved Boussinesq equation. Similarly, to further analyze the experimental results intuitively, prediction results at different times are compared with the exact solution to check the accuracy of the predictions. Figure 9 shows a comparison of the true and approximate solutions of the improved Boussinesq equation at different times. It can be seen that the predicted values are very close to the exact solution. In addition, the error between the approximate solution and the exact solution was measured over many experiments, and the relative L2 norm error between the predicted result and the exact solution is about 2.13 × 10−3. Under the same training data and parameter settings, the relative L2 norm error between the prediction results calculated by the original PINNs and the true solution is about 2.91 × 10−3. Figure 10 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed, and it was found that the average time to complete the training task with the improved PINNs was approximately 63.3% of the training time of the original PINNs. Experiments show that for the single soliton simulation problem of the improved Boussinesq equation, the improved PINNs are superior to the original PINNs in both training efficiency and accuracy.
Figure 8. The exact single soliton solution (top) and learned solution (bottom) of the improved Boussinesq equation.
Figure 9. Comparison of the exact single soliton solution of the improved Boussinesq equation and the approximate solution of the neural network at different moments.
Figure 10. Error statistics for the improved PINNs and the original PINNs for solving single soliton solutions of the improved Boussinesq equation.

3.2.2. The double soliton solution of the improved Boussinesq equation

Next, we further investigate the double-soliton solution of the improved Boussinesq equation, given the following initial conditions:
$\begin{eqnarray}u(x,0)=\displaystyle \sum _{i=1}^{2}{A}_{i}{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{{c}_{i}}\sqrt{\displaystyle \frac{{A}_{i}}{6}}\left(x-{x}_{0}^{i}\right)\right),\end{eqnarray}$
$\begin{eqnarray}\begin{array}{rcl}{u}_{t}(x,0) & = & 2\displaystyle \sum _{i=1}^{2}{A}_{i}{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{{c}_{i}}\sqrt{\displaystyle \frac{{A}_{i}}{6}}\left(x-{x}_{0}^{i}\right)\right)\\ & & \times \,\tanh \left(\displaystyle \frac{1}{{c}_{i}}\sqrt{\displaystyle \frac{{A}_{i}}{6}}\left(x-{x}_{0}^{i}\right)\right),\end{array}\end{eqnarray}$
$\begin{eqnarray}u(-L,t)=u(L,t).\end{eqnarray}$
In the above equation, A1 = A2 = 1.5, ${x}_{0}^{1}=-4$, ${x}_{0}^{2}=4$, ${c}_{1}=\sqrt{\tfrac{2{A}_{1}}{3}+1}$, ${c}_{2}=-\sqrt{\tfrac{2{A}_{2}}{3}+1}$. In this problem, L = 10, i.e. the spatial domain range is [−10, 10]. In the time domain, T = 6, i.e. the simulation time is 6.
Next, we build a neural network with 5 hidden layers of 32 neurons each, with the activation function still the adaptive activation function. The initial conditions are still treated as a special type of Dirichlet boundary condition in the space-time domain. The data for the initial and boundary conditions were obtained by random sampling, with 500 samples. Residual points are randomly selected in the space-time domain to construct the physical constraints; 20 000 are sampled in this experiment. Meanwhile, we use a small number of observations, 300 in this experiment. During training, we first use L-BFGS to complete 5000 epoch iterations, and then use Adam to continue optimizing until convergence. The learning rate of this experiment is set to 0.001. After training, we use the trained model to approximate the double-soliton solution of the improved Boussinesq equation. Figure 11 shows the exact solution and the predicted results using the trained model. As can be seen from the figure, the neural network successfully captures the characteristics of the double soliton of the improved Boussinesq equation. Similarly, to further analyze the experimental results intuitively, prediction results at different times are compared with the exact solution to check the accuracy of the predictions. Figure 12 shows a comparison of the true and approximate solutions of the improved Boussinesq equation at different times. It can be seen that the predicted values are very close to the true solution. In addition, the error between the neural network's approximate solution and the exact solution was measured over many experiments, and the relative L2 norm error between the predicted result and the exact solution is about 2.28 × 10−3. Under the same training data and parameter settings, the relative L2 norm error between the prediction results calculated by the original PINNs and the exact solution is about 3.19 × 10−3. Figure 13 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed, and it was found that the average time to complete the training task with the improved PINNs was approximately 54.9% of the training time of the original PINNs. Experiments show that for the double-soliton simulation problem of the improved Boussinesq equation, the improved PINNs have clear advantages in both training efficiency and accuracy.
Figure 11. The exact double soliton solution (top) and learned solution (bottom) of the improved Boussinesq equation.
Figure 12. Comparison of the exact double soliton solution of the improved Boussinesq equation and the approximate solution of the neural network at different moments.
Figure 13. Error statistics for the improved PINNs and the original PINNs for solving two soliton solutions of the improved Boussinesq equation.

3.3. The numerical solution for the CDGSK equation

In this section, the solitary wave solutions of the CDGSK equation are studied. The mathematical form of the CDGSK equation is defined as follows:
$\begin{eqnarray}{u}_{t}+{u}_{{xxxxx}}+30{{uu}}_{{xxx}}+30{u}_{x}{u}_{{xx}}+180{u}^{2}{u}_{x}=0.\end{eqnarray}$
Next, the single and double soliton solutions of the CDGSK equation will be studied based on the improved PINNs.

3.3.1. The single soliton solution of CDGSK equation

The initial condition is given as follows:
$\begin{eqnarray}u(x,0)=\displaystyle \frac{{k}^{2}}{4}{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{2}{kx}+C\right).\end{eqnarray}$
In the above formula, the value of k is 0.8, the value of C is 0, the spatial range is [−7, 7], and the time range is [0, 6]. Under the above conditions, the single soliton solution of the CDGSK equation is given as follows:
$\begin{eqnarray}u(x,t)=\displaystyle \frac{{k}^{2}}{4}{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{2}{kx}-\displaystyle \frac{1}{2}{k}^{5}t+C\right).\end{eqnarray}$
Next, we build a neural network with 7 hidden layers of 32 neurons each, with the activation function still the adaptive activation function. The initial condition is again regarded as a special type of Dirichlet boundary condition in the space-time domain. The training data for the initial and boundary conditions were obtained by random sampling with a sample size of 500. Residual points are randomly selected in the space-time domain to construct the physical constraints; 15 000 are sampled in this experiment. In addition, we use a small number of observations, 400 in this experiment. During training, the learning rate is set to 0.001; L-BFGS is used to complete 5000 epoch iterations, and then Adam is used to continue the optimization until convergence. Once training was complete, we employed the trained model to approximate the single-soliton solution of the CDGSK equation. Figure 14 shows the exact solution and the predicted results using the trained model. As can be seen from the figure, the neural network successfully simulates the single-soliton solution of the CDGSK equation. Similarly, to further analyze the experimental results intuitively, prediction results at different times are compared with the exact solution to check the accuracy of the predictions. Figure 15 shows the comparison of the true and approximate single-soliton solutions of the CDGSK equation at different times. It can be seen that the predicted value is very close to the true solution. In addition, the error between the approximate solution and the exact solution was measured over many experiments, and the relative L2 norm error between the predicted result and the exact solution is about 2.09 × 10−3. Under the same training data and parameter settings, the relative L2 norm error between the prediction results calculated by the original PINNs and the exact solution is about 2.84 × 10−3. Figure 16 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed again, and it was found that the average time to complete the training task with the improved PINNs was approximately 61.7% of the training time of the original PINNs. Experiments show that the improved PINNs train more efficiently than the original PINNs for the single-soliton simulation problem of the CDGSK equation.
Figure 14. The exact single soliton solution (top) and learned solution (bottom) of the CDGSK equation.
Figure 15. Comparison of the exact single soliton solution of the CDGSK equation and the approximate solution of the neural network at different times.
Figure 16. Error statistics for the improved PINNs and the original PINNs for solving single soliton solutions of the CDGSK equation.

3.3.2. The two soliton solution of CDGSK equation

First, the initial condition is given as follows:
$\begin{eqnarray}u(x,0)=\displaystyle \frac{-4{dk}\mu \sinh \xi \sinh \eta +2d\left({k}^{2}+{\mu }^{2}\right)\cosh \xi \cosh \eta +{k}^{2}+4{d}^{2}{\mu }^{2}}{{\left(\cosh \xi +2d\cosh \eta \right)}^{2}},\end{eqnarray}$
$\begin{eqnarray}\xi ={kx},\end{eqnarray}$
$\begin{eqnarray}\eta =\mu x.\end{eqnarray}$
In the above formula, $d=\tfrac{k}{2\mu }\sqrt{\tfrac{3{k}^{2}+{\mu }^{2}}{{k}^{2}+3{\mu }^{2}}}$, the value of k is 1.0, the value of μ is 0.01, the spatial domain range is [−10, 10], and the time range is [0, 10]. Under the above conditions, the two soliton solution of the CDGSK equation is given as follows:
$\begin{eqnarray}\begin{array}{rcl}u(x,t) & = & \displaystyle \frac{-4{dk}\mu \sinh \xi \sinh \eta +{k}^{2}+4{d}^{2}{\mu }^{2}}{{\left(\cosh \xi +2d\cosh \eta \right)}^{2}}\\ & & +\displaystyle \frac{2d\left({k}^{2}+{\mu }^{2}\right)\cosh \xi \cosh \eta }{{\left(\cosh \xi +2d\cosh \eta \right)}^{2}},\end{array}\end{eqnarray}$
$\begin{eqnarray}\xi ={kx}-k\left({k}^{4}+10{k}^{2}{\mu }^{2}+5{\mu }^{4}\right)t,\end{eqnarray}$
$\begin{eqnarray}\eta =\mu x-\mu \left(5{k}^{4}+10{k}^{2}{\mu }^{2}+{\mu }^{4}\right)t.\end{eqnarray}$
To simulate the two-soliton solution of the CDGSK equation, we construct a neural network with 7 hidden layers of 32 neurons each, with the adaptive activation function. The initial conditions are treated as a special type of Dirichlet boundary condition in the space-time domain. The training data for the initial and boundary conditions are obtained by random sampling with a sample size of 500. Residual points are randomly selected in the space-time domain to construct the physical constraints; 20 000 are sampled in this experiment. In addition, a small number of observations are used during training, 400 in this experiment. During training, the learning rate is set to 0.001; L-BFGS is used to complete 5000 epoch iterations, and then Adam is used to continue the optimization until convergence. After the training was completed, we applied the trained model to approximate the double-soliton solution of the CDGSK equation. Figure 17 shows the exact solution and the prediction results of the neural network. As can be seen from figure 17, the neural network successfully simulates the double-soliton solution of the CDGSK equation. In addition, figure 18 shows the comparison of the true and approximate solutions of the CDGSK equation at different times, and it can be seen that the predicted solution is very close to the true solution. Further, the error between the approximate solution and the exact solution was measured over many experiments, and the relative L2 norm error between the predicted result and the exact solution is about 2.07 × 10−3. Under the same training data and parameter settings, the relative L2 norm error of the results calculated by the original PINNs is about 2.94 × 10−3. Figure 19 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed again, and it was found that the average time to complete the training task with the improved PINNs was approximately 64.9% of the training time of the original PINNs. Experiments show that for the double solitary wave simulation problem of the CDGSK equation, the error of the improved PINNs is smaller, and at the same time the improved PINNs train more efficiently, with a shorter training time than the original PINNs.
Figure 17. The exact double soliton solution (top) and learned solution (bottom) of the CDGSK equation.
Figure 18. Comparison of the exact double soliton solution of the CDGSK equation and the approximate solution of the neural network at different times.
Figure 19. Error statistics for the improved PINNs and the original PINNs for solving double soliton solutions of the CDGSK equation.

3.4. The numerical solution for fractal soliton waves of the p-gBKP equation

In recent years, fractal solitary waves have received more and more attention, and many scholars have studied their complex propagation phenomena [62–64]. For example, He et al [65] studied the fractal Korteweg–de Vries (KdV) equation, obtained its solitary wave solution, and discovered the solution properties of solitary waves moving along an unsmooth boundary, revealing that the peak of the solitary wave is only weakly influenced by the unsmooth boundary. Wu et al [66] studied the fractal variant of the Boussinesq–Burgers equation and found that the order of the fractal derivative has a significant effect on the propagation process of the solitary wave, while it has less effect on the overall shape of the solitary wave. In this section, the fractal soliton waves of the p-gBKP equation [67] are studied. The mathematical form of the p-gBKP equation is defined as follows:
$\begin{eqnarray}\begin{array}{l}-\displaystyle \frac{9}{8}{u}_{x}{u}^{2}v-\displaystyle \frac{3}{8}{u}^{3}{u}_{y}-\displaystyle \frac{3}{4}{u}_{{xx}}{uv}-\displaystyle \frac{3}{4}{u}_{x}^{2}v-\displaystyle \frac{9}{4}{u}_{x}{{uu}}_{y}\\ -\displaystyle \frac{3}{4}{u}^{2}{u}_{{yx}}-\displaystyle \frac{3}{2}{u}_{{xx}}{u}_{y}-\displaystyle \frac{3}{2}{u}_{x}{u}_{{yx}}+{u}_{{yt}}+3{u}_{{yx}}=0.\end{array}\end{eqnarray}$
In the above equation, $u_y=v_x$. Next, the fractal soliton waves of the p-gBKP equation will be studied based on the improved PINNs. The initial condition is given as follows:
$\begin{eqnarray}\begin{array}{l}u(0,x)\\ =\,\displaystyle \frac{4{A}_{2,3}\left({A}_{1,4}{A}_{x,1}+{A}_{2,4}{A}_{x,2}\cos \left({A}_{x,2}x+{b}_{2}\right)\right)\left({\xi }_{2}{A}_{2,3}-{A}_{2,4}{\xi }_{1}\right)}{\left({A}_{2,3}^{2}{\xi }_{2}{}^{2}-{A}_{2,4}^{2}{\xi }_{1}{}^{2}\right)}\end{array}\end{eqnarray}$
$\begin{eqnarray}\left\{\begin{array}{c}{\xi }_{1}=\displaystyle \frac{{A}_{2,3}{A}_{1,4}\left({{xA}}_{x,1}+{{yA}}_{y,1}+{b}_{1}\right)}{{A}_{2,4}}+{A}_{2,3}\sin \left({A}_{x,2}x+{b}_{2}\right)+{b}_{3}\\ {\xi }_{2}={A}_{1,4}\left({{xA}}_{x,1}+{{yA}}_{y,1}+{b}_{1}\right)+{A}_{2,4}\sin \left({A}_{x,2}x+{b}_{2}\right)+{b}_{4}\end{array}\right..\end{eqnarray}$
In the above equation, the parameters take the following values
$\begin{eqnarray}\begin{array}{l}{A}_{1,4}=2,\quad {A}_{2,3}=2,\quad {A}_{2,4}=3,\quad {A}_{x,1}=23,\\ {A}_{x,2}=2,\quad {A}_{y,1}=2,\quad {b}_{1}=1,\quad {b}_{2}=1,\quad {b}_{3}=1,\quad {b}_{4}=1.\end{array}\end{eqnarray}$
In addition, the spatial range is [−30, 30], and the time range is [0, 5]. Under the above conditions, the fractal soliton wave of the p-gBKP equation is given as follows:
$\begin{eqnarray}u=\displaystyle \frac{4{A}_{\mathrm{2,3}}\left({A}_{\mathrm{1,4}}{A}_{x,1}+{A}_{\mathrm{2,4}}{A}_{x,2}\cos \left(3{{tA}}_{x,2}-{A}_{x,2}x-{b}_{2}\right)\right)\left({\xi }_{2}{A}_{\mathrm{2,3}}-{A}_{\mathrm{2,4}}{\xi }_{1}\right)}{\left({A}_{2,3}^{2}{\xi }_{2}{}^{2}-{A}_{2,4}^{2}{\xi }_{1}{}^{2}\right)},\end{eqnarray}$
$\begin{eqnarray}\left\{\begin{array}{l}{\xi }_{1}=\displaystyle \frac{{A}_{\mathrm{2,3}}{A}_{\mathrm{1,4}}\left(-3{{tA}}_{x,1}+{{xA}}_{x,1}+{{yA}}_{y,1}+{b}_{1}\right)}{{A}_{\mathrm{2,4}}}-{A}_{\mathrm{2,3}}\sin \left(3{{tA}}_{x,2}-{A}_{x,2}x-{b}_{2}\right)+{b}_{3}\\ {\xi }_{2}={A}_{\mathrm{1,4}}\left(-3{{tA}}_{x,1}+{{xA}}_{x,1}+{{yA}}_{y,1}+{b}_{1}\right)-{A}_{\mathrm{2,4}}\sin \left(3{{tA}}_{x,2}-{A}_{x,2}x-{b}_{2}\right)+{b}_{4}\end{array}\right..\end{eqnarray}$
Next, we build a neural network with 9 hidden layers of 32 neurons each, with the activation function still the adaptive activation function. The initial condition is again regarded as a special type of Dirichlet boundary condition in the space-time domain. The training data for the initial and boundary conditions were obtained by random sampling with a sample size of 600. Residual points are randomly selected in the space-time domain to construct the physical constraints; 20 000 are sampled in this experiment. In addition, a small number of observations are used, 600 in this experiment. During training, the learning rate is set to 0.001; L-BFGS is used to complete 5000 epoch iterations, and then Adam is used to continue the optimization until convergence. After training, we use the trained model to approximate the fractal soliton wave of the p-gBKP equation. To further analyze the experimental results intuitively, the prediction results are compared with the exact solution to check their accuracy. Figure 20 shows the exact solution and the predicted results using the trained model at t = 5. As can be seen from the figure, the neural network successfully simulates the fractal soliton wave of the p-gBKP equation, and the prediction is very close to the true solution. In addition, the error between the approximate solution and the exact solution was measured over many experiments, and the relative L2 norm error between the prediction results and the exact solution is about 5.56 × 10−3. Under the same training data and parameter settings, the relative L2 norm error between the prediction results calculated by the original PINNs and the exact solution is about 8.58 × 10−3. Figure 21 shows the error statistics of the improved PINNs and the original PINNs. Finally, the training time was analysed again, and it was found that the average time to complete the training task with the improved PINNs was approximately 62.5% of the training time of the original PINNs. Experiments show that the improved PINNs train more efficiently than the original PINNs for the fractal soliton wave simulation problem of the p-gBKP equation.
Figure 20. The exact solution (left) and learned solution (right) of the p-gBKP equation with t = 5.
Figure 21. Error statistics for the improved PINNs and the original PINNs for solving the fractal soliton waves of the p-gBKP equation.

4. Conclusion

In this paper, we have applied improved PINNs to the numerical simulation of solitary wave solutions of several PDEs. The improved PINNs not only incorporate the constraints of the PDEs to ensure the interpretability of the prediction results, but also introduce an adaptive activation function, in which hyperparameters change the slope of the activation function to avoid vanishing gradients, thus saving computation time and speeding up training. In the experiments, the mKdV equation, the improved Boussinesq equation, the CDGSK equation, and the p-gBKP equation are selected for study, and the errors of the results are analyzed to assess the accuracy of the predicted solitary wave solutions. The experimental results show that the improved PINNs significantly outperform the original PINNs, with shorter training time and more accurate predictions. The modified PINNs speed up training by more than 1.5 times compared with the classical PINNs, while keeping the prediction error below the order of 10−2. Furthermore, with fractal solitary wave solutions and their applications attracting much attention from researchers, an exploratory study of related problems is conducted in this paper, and the results show that the method has good applicability to the simulation of fractal solitary waves. In future work, we will further extend the method and carry out in-depth research on fractal-order PDEs and their solitary wave solutions. As the modified PINNs simulate solitary wave phenomena well, we will also try to extend and apply them to problems such as ocean wave simulation. In addition, many issues in PINNs are worth exploring, such as how to better set the weight coefficients of the different loss terms and how to address possible non-convergence in the optimization of the loss function; these will be explored and analyzed in depth in our subsequent studies.
References

[1] Renardy M and Rogers R C 2006 An Introduction to Partial Differential Equations vol 13 (Springer Science & Business Media)
[2] Müller E H and Scheichl R 2014 Massively parallel solvers for elliptic partial differential equations in numerical weather and climate prediction Q. J. R. Meteorolog. Soc. 140 2608–2624
[3] Helal M 2002 Soliton solution of some nonlinear partial differential equations and its applications in fluid mechanics Chaos Solitons Fractals 13 1917–1929
[4] Lapidus L and Pinder G F 2011 Numerical Solution of Partial Differential Equations in Science and Engineering (Wiley)
[5] Šolín P 2005 Partial Differential Equations and the Finite Element Method (Wiley)
[6] Johnson C 2012 Numerical Solution of Partial Differential Equations by the Finite Element Method (Courier Corporation)
[7] Mazumder S 2015 Numerical Methods for Partial Differential Equations: Finite Difference and Finite Volume Methods (Academic Press)
[8] LeCun Y, Bengio Y and Hinton G 2015 Deep learning Nature 521 436–444
[9] Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes M P, Shyu M-L, Chen S-C and Iyengar S S 2018 A survey on deep learning: algorithms, techniques, and applications ACM Comput. Surv. 51 1–36
[10] Goodfellow I, Bengio Y and Courville A 2016 Deep Learning (MIT Press)
[11] Druzhkov P and Kustikova V 2016 A survey of deep learning methods and software tools for image classification and object detection Pattern Recognit. Image Anal. 26 9–15
[12] Korot E, Guan Z and Ferraz D 2021 Code-free deep learning for multi-modality medical image classification Nat. Mach. Intell. 3 288–298
[13] Saha S, Gan Z and Cheng L 2021 Hierarchical deep learning neural network (HiDeNN): an artificial intelligence (AI) framework for computational science and engineering Comput. Meth. Appl. Mech. Eng. 373 113452
[14] Diaz O, Guidi G, Ivashchenko O, Colgan N and Zanca F 2021 Artificial intelligence in the medical physics community: an international survey Physica Med. 81 141–146
[15] Karpatne A, Ebert-Uphoff I, Ravela S, Babaie H A and Kumar V 2018 Machine learning for the geosciences: challenges and opportunities IEEE Trans. Knowl. Data Eng. 31 1544–1554
[16] Bryant P, Pozzati G and Elofsson A 2022 Improved prediction of protein–protein interactions using AlphaFold2 Nat. Commun. 13 1–11
[17] Beck C, Becker S, Grohs P, Jaafari N and Jentzen A 2021 Solving the Kolmogorov PDE by means of deep learning J. Sci. Comput. 88 1–28
[18] Sirignano J and Spiliopoulos K 2018 DGM: a deep learning algorithm for solving partial differential equations J. Comput. Phys. 375 1339–1364
[19] Lu L, Meng X, Mao Z and Karniadakis G E 2021 DeepXDE: a deep learning library for solving differential equations SIAM Rev. 63 208–228
[20] Lu L, Jin P, Pang G, Zhang Z and Karniadakis G E 2021 Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators Nat. Mach. Intell. 3 218–229
[21] Darbon J, Langlois G P and Meng T 2020 Overcoming the curse of dimensionality for some Hamilton–Jacobi partial differential equations via neural network architectures Res. Math. Sci. 7 1–50
[22] E W, Han J and Jentzen A 2021 Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning Nonlinearity 35 278
[23] Han J, Jentzen A and E W 2018 Solving high-dimensional partial differential equations using deep learning Proc. Natl Acad. Sci. 115 8505–8510
[24] Guo Y, Cao X, Liu B and Gao M 2020 Solving partial differential equations using deep learning and physical constraints Appl. Sci. 10 5917
[25] Li J and Chen Y 2020 Solving second-order nonlinear evolution partial differential equations using deep learning Commun. Theor. Phys. 72 105005
[26] Fang Y, Wu G-Z, Kudryashov N A, Wang Y-Y and Dai C-Q 2022 Data-driven soliton solutions and model parameters of nonlinear wave models via the conservation-law constrained neural network method Chaos Solitons Fractals 158 112118
[27] Lagaris I E, Likas A C and Papageorgiou D G 2000 Neural-network methods for boundary value problems with irregular boundaries IEEE Trans. Neural Networks 11 1041–1049
[28] Lagaris I E, Likas A and Fotiadis D I 1998 Artificial neural networks for solving ordinary and partial differential equations IEEE Trans. Neural Networks 9 987–1000
[29] Mitusch S K, Funke S W and Kuchta M 2021 Hybrid FEM-NN models: combining artificial neural networks with the finite element method J. Comput. Phys. 446 110651
[30] Bar-Sinai Y, Hoyer S, Hickey J and Brenner M P 2019 Learning data-driven discretizations for partial differential equations Proc. Natl Acad. Sci. 116 15344–15349
[31] Raissi M, Perdikaris P and Karniadakis G E 2019 Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations J. Comput. Phys. 378 686–707
[32] Yu J, Lu L, Meng X and Karniadakis G E 2022 Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems Comput. Meth. Appl. Mech. Eng. 393 114823
[33] Karniadakis G E, Kevrekidis I G, Lu L, Perdikaris P, Wang S and Yang L 2021 Physics-informed machine learning Nat. Rev. Phys. 3 422–440
[34] Mishra S and Molinaro R 2022 Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs IMA J. Numer. Anal. 42 981–1022
[35] Shin Y 2020 On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs Commun. Comput. Phys. 28 2042–2074
[36] Yuan L, Ni Y-Q, Deng X-Y and Hao S 2022 A-PINN: auxiliary physics informed neural networks for forward and inverse problems of nonlinear integro-differential equations J. Comput. Phys. 462 111260
[37] Yang L, Meng X and Karniadakis G E 2021 B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data J. Comput. Phys. 425 109913
[38] Meng X, Li Z, Zhang D and Karniadakis G E 2020 PPINN: parareal physics-informed neural network for time-dependent PDEs Comput. Meth. Appl. Mech. Eng. 370 113250
[39] Psaros A F, Kawaguchi K and Karniadakis G E 2022 Meta-learning PINN loss functions J. Comput. Phys. 458 111121
[40] Zobeiry N and Humfeld K D 2021 A physics-informed machine learning approach for solving heat transfer equation in advanced manufacturing and engineering applications Eng. Appl. Artif. Intell. 101 104232
[41] Bai Y, Chaolu T and Bilige S 2022 The application of improved physics-informed neural network (IPINN) method in finance Nonlinear Dyn. 107 3655–3667
[42] Lin S and Chen Y 2022 A two-stage physics-informed neural network method based on conserved quantities and applications in localized wave solutions J. Comput. Phys. 457 111053
[43] Pu J-C and Chen Y 2022 Data-driven vector localized waves and parameters discovery for Manakov system using deep learning approach Chaos Solitons Fractals 160 112182
[44] Pu J, Li J and Chen Y 2021 Solving localized wave solutions of the derivative nonlinear Schrödinger equation using an improved PINN method Nonlinear Dyn. 105 1723–1739
[45] Lin S and Chen Y 2023 Physics-informed neural network methods based on Miura transformations and discovery of new localized wave solutions Physica D 445 133629
[46] Miao Z W and Chen Y 2022 Physics-informed neural networks method in high-dimensional integrable systems Mod. Phys. Lett. B 36 2150531
[47] Peng W-Q and Chen Y 2022 N-double poles solutions for nonlocal Hirota equation with nonzero boundary conditions using Riemann–Hilbert method and PINN algorithm Physica D 435 133274
[48] Jiang L-W, Zou M-S, Liu S-X and Huang H 2020 Calculation method of acoustic radiation for floating bodies in shallow sea considering complex ocean acoustic environments J. Sound Vib. 476 115330
[49] Fu K and Liang D 2019 A mass-conservative temporal second order and spatial fourth order characteristic finite volume method for atmospheric pollution advection diffusion problems SIAM J. Sci. Comput. 41 B1178–B1210
[50] Yin X, Zong Z and Wu G 2014 Seismic wave scattering inversion for fluid factor of heterogeneous media Sci. China Earth Sci. 57 542–549
[51] Rasht-Behesht M, Huber C, Shukla K and Karniadakis G E 2022 Physics-informed neural networks (PINNs) for wave propagation and full waveform inversions J. Geophys. Res. Solid Earth 127 e2021JB023120
[52] Song C, Alkhalifah T and Waheed U B 2022 A versatile framework to solve the Helmholtz equation using physics-informed neural networks Geophys. J. Int. 228 1750–1762
[53] Song C and Wang Y 2023 Simulating seismic multifrequency wavefields with the Fourier feature physics-informed neural network Geophys. J. Int. 232 1503–1514
[54] bin Waheed U, Haghighat E, Alkhalifah T, Song C and Hao Q 2021 PINNeik: eikonal solution using physics-informed neural networks Comput. Geosci. 155 104833
[55] Zhang Y, Zhu X and Gao J 2023 Seismic inversion based on acoustic wave equations using physics-informed neural network IEEE Trans. Geosci. Remote Sens. 61 1–11
[56] Jagtap A D, Kawaguchi K and Karniadakis G E 2020 Adaptive activation functions accelerate convergence in deep and physics-informed neural networks J. Comput. Phys. 404 109136
[57] Jagtap A D, Kawaguchi K and Karniadakis G E 2020 Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks Proc. R. Soc. A 476 20200334
[58] Kangalgil F and Ayaz F 2009 Solitary wave solutions for the KdV and mKdV equations by differential transform method Chaos Solitons Fractals 41 464–472
[59] Borluk H and Muslu G M 2015 A Fourier pseudospectral method for a generalized improved Boussinesq equation Numer. Methods Partial Differ. Equ. 31 995–1008
[60] Salas A H, Gómez C A and Castillo J E 2011 Computing multi-soliton solutions to Caudrey–Dodd–Gibbon equation by Hirota's method Int. J. Phys. Sci. 6 7729–7737
[61] Iskandar L and Jain P C 1980 Numerical solutions of the improved Boussinesq equation Proc. Indian Acad. Sci. (Math. Sci.) 89 171–181
[62] He J-H, Hou W-F, He C-H, Saeed T and Hayat T 2021 Variational approach to fractal solitary waves Fractals 29 2150199
[63] Wang K-L 2022 Exact solitary wave solution for fractal shallow water wave model by He's variational method Mod. Phys. Lett. B 36 2150602
[64] Wang K-J, Li G, Liu J-H and Wang G-D 2022 Solitary waves of the fractal regularized long-wave equation traveling along an unsmooth boundary Fractals 30 2250008
[65] He J-H, Qie N and He C-H 2021 Solitary waves travelling along an unsmooth boundary Results Phys. 24 104104
[66] Wu P-X, Yang Q and He J-H 2022 Solitary waves of the variant Boussinesq–Burgers equation in a fractal-dimensional space Fractals 30 2250056
[67] Zhang R, Bilige S and Chaolu T 2021 Fractal solitons, arbitrary function solutions, exact periodic wave and breathers for a nonlinear partial differential equation by using bilinear neural network method J. Syst. Sci. Complex. 34 122–139