Solving nonlinear Boussinesq equation of second-order time derivatives with physics-informed neural networks
Yi Cheng1, Chao Dong2,3, Shaolong Zheng1, Wei Hu1,∗
1College of Engineering, Lishui University, Lishui 323000, China
2Beijing National Laboratory for Condensed Matter Physics and Laboratory of Soft Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
3University of Chinese Academy of Sciences, Beijing 100049, China
∗Author to whom any correspondence should be addressed.
Deep learning combined with physics information is employed to solve the Boussinesq equation with a second-order time derivative. High prediction accuracy is achieved by adding a new initial loss term to the physics-informed neural networks, together with an adaptive activation function and loss-balanced coefficients. Numerical simulations are carried out with different initial and boundary conditions, in which the relative L2-norm errors are all around 10−4. The prediction accuracy is improved by two orders of magnitude compared to former results in certain simulations. The dynamic behavior of solitons and their interactions are studied in the colliding and chasing processes for the Boussinesq equation. More training time is needed by the Boussinesq equation solver when the width of the two-soliton solutions becomes narrower with the other parameters fixed.
Yi Cheng, Chao Dong, Shaolong Zheng, Wei Hu. Solving nonlinear Boussinesq equation of second-order time derivatives with physics-informed neural networks[J]. Communications in Theoretical Physics, 2025, 77(10): 105001. DOI: 10.1088/1572-9494/adcc8e
1. Introduction
Artificial intelligence based on deep learning has recently driven significant progress in computer vision [1, 2] and especially in large language models [3, 4], aided by the huge increase in GPU computing capability. The physics-informed neural networks (PINNs) model, built on deep neural networks, has been utilized to solve many nonlinear partial differential equations (NPDEs), such as the Burgers, nonlinear Schrödinger, and Navier–Stokes equations with initial and boundary (IB) conditions [5–7]. Moreover, PINNs have been used to solve a complex nonlinear Schrödinger equation, enhancing optical fiber modeling and sensor/communication performance [8]. PINNs with a local time-updating discrete scheme have been proposed for efficient and accurate forward and inverse soil consolidation analysis, demonstrating superiority over continuous-time schemes, especially in high-dimensional problems [9]. In the scientific computing field, extensive studies have been carried out to improve, accelerate, and extend the original PINNs software. Convergence of the numerical simulation is expedited by introducing global and local adaptive activation functions in the neural networks [10–12]. Improved PINNs leverage the conservative properties of the physical model to enhance prediction accuracy [13]. Distributed GPU computing has been exploited to reduce training time in some pioneering research [14–16]. Adaptive sampling methods significantly reduce the convergence time and the required number of collocation points [17, 18]. A gradient-optimized model, obtained by analyzing the distribution of the back-propagated gradients of the loss with respect to the neural network parameters, can balance the different loss terms to enhance training efficiency [19, 20]. Incorporating physical causality into the loss function extends the capability of PINNs software to simulate solutions of dynamical systems exhibiting multi-scale, chaotic, or turbulent behaviors [21, 22]. The Fourier neural operator, which parameterizes the integral kernel directly in Fourier space, is another deep neural network approach and can model turbulent flows with zero-shot super-resolution [23–25]. A deep learning framework that forcibly encodes a given physics structure in a recurrent convolutional neural network can facilitate learning of reaction-diffusion processes with high accuracy, robustness, and generalizability [26, 27]. PINNs with a self-adaptive macro model have been applied to accurately predict mixed traffic flow from sparse data, improving traffic simulation and interpretability [28]. The Deep Ritz method uses deep learning to solve variational problems arising from PDEs, offering adaptability and scalability to high dimensions [29]. Deep neural networks have been utilized to solve PDEs in computational mechanics by leveraging energy-based formulations as natural loss functions [30]. An adaptive deep learning method has been introduced to analyze melting heat transfer in non-Newtonian fluids, efficiently solving boundary-layer problems using physics-informed constraints and transfer learning [31].
Here, we focus on the soliton wave solutions of the fourth-order Boussinesq equation using the PINNs model. Applications of the Boussinesq equations include the description of nonlinear phenomena in ocean engineering [32, 33], ion sound waves in plasmas [34, 35], and solitary waves in shallow water models [36, 37]. Additionally, the Boussinesq equation is explored for its ability to accurately simulate wave propagation, breaking, and interaction with nearshore topography, including dam-break scenarios [38]. Mathematically, the exact soliton wave solutions of the Boussinesq equation have been presented using the Hirota bilinear method, a comprehensive approach for constructing exact solutions of NPDEs [39, 40]. Boussinesq equations take the form ${u}_{tt}+{{ \mathcal N }}_{x}[u]=0$, in which ${{ \mathcal N }}_{x}[u]$ is a nonlinear operator acting on u with respect to x. Former research handling these kinds of NPDEs with the PINNs model does not place specific emphasis on the second-order operator in time [10, 19, 41]. Qian et al suggested that new forms of the PINN training loss function, including the gradient of the equation residual and the gradient of the initial-condition residual, are crucial to the error estimate but are absent from the canonical PINNs formulation of the loss function in recent studies [42]. More attention should therefore be devoted to solving the Boussinesq equation, which involves high-order spatial derivatives and second-order time derivatives, with the PINNs model. PINNs leverage automatic differentiation, which allows for the efficient calculation of gradients with respect to the inputs of the neural network. In contrast, traditional methods require discretizing differential operators, potentially introducing approximation errors, especially for high-order PDEs [6, 43]. Additionally, PINNs eliminate the need for mesh generation, offering significant advantages over traditional mesh-based approaches for problems with complex geometries or moving boundaries [16, 44]. Moreover, PINNs excel in solving inverse problems, enabling the inference of unknown parameters in PDEs from sparse data, thus outperforming traditional optimization-based techniques [6, 16].
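To make the automatic differentiation argument concrete, the following minimal TensorFlow 1.x sketch (matching the software stack described in section 2) shows how nested gradient calls produce the exact fourth-order spatial and second-order temporal derivatives of a network surrogate; the tiny network here is a hypothetical stand-in for illustration, not the architecture used in this paper.

```python
import tensorflow as tf  # TensorFlow 1.x, graph mode

t = tf.placeholder(tf.float32, shape=[None, 1])
x = tf.placeholder(tf.float32, shape=[None, 1])

# Hypothetical surrogate u(t, x); any differentiable network works here.
h = tf.layers.dense(tf.concat([t, x], axis=1), 20, activation=tf.tanh)
u = tf.layers.dense(h, 1)

# Each tf.gradients call differentiates the computational graph once more,
# so high-order derivatives are exact to floating-point precision and no
# finite-difference stencil (or mesh) is ever constructed.
u_x    = tf.gradients(u, x)[0]
u_xx   = tf.gradients(u_x, x)[0]
u_xxx  = tf.gradients(u_xx, x)[0]
u_xxxx = tf.gradients(u_xxx, x)[0]   # fourth order in x
u_t    = tf.gradients(u, t)[0]
u_tt   = tf.gradients(u_t, t)[0]     # second order in t
```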
In this paper, the improved PINNs (IPINNs) model is utilized to solve the Boussinesq equation with a loss function containing a new initial first-order time derivative term, aided by an adaptive activation function and loss-balanced coefficients. The adaptive activation functions in the neural networks are employed to accelerate numerical convergence, as demonstrated in previous studies [10, 12]. In the current simulation, only the global adaptive activation function is used, implemented by adding a new scalable hyperparameter that dynamically alters the topology of the loss function involved in the optimization process [10]. Loss-balanced coefficients are introduced based on former research to enhance the prediction accuracy of PINNs [42, 45, 46]. Raissi previously suggested the potential extension of the original PINNs framework to solve Boussinesq-type equations [5]. Recent studies have provided soliton [41, 47] and rogue [48] wave solutions of the Boussinesq equation with data-driven neural network models. In our current simulations, we revisit soliton wave solvers using the IPINNs model, achieving relative L2-norm errors around 10−4 in all the simulations. This represents a significant improvement of two orders of magnitude in specific cases compared to previous results [41] using conventional PINNs. Parameter scans for the Boussinesq equation solvers are performed to demonstrate the versatility of the IPINNs model, revealing a close correlation between the width of two-soliton solutions and the training time. Notably, the increased computational demand for solving the fourth-order equation is mitigated by automatic differentiation, allowing the model to bypass the Courant–Friedrichs–Lewy (CFL) condition encountered in traditional methods. The diverse initial and boundary (IB) conditions, generated by the solvers with varying parameters, are utilized to validate the model within the same process. In some simulations, comparisons are drawn between the IPINNs model and a variant of the PINNs model that omits the new initial first-order time derivative term.
The rest of the paper is organized as follows. In Section 2, we describe the updated loss function, which incorporates a new loss term and loss-balanced coefficients for the IPINNs model used in the current simulations. In Section 3, the one-soliton and two-soliton wave solvers for the Boussinesq equation are presented, showing good agreement between the exact and learned results. Finally, brief conclusions and discussions are provided in Section 4.
2. Model
To describe the IPINNs model, the Boussinesq equation constrained by the IB conditions is first introduced,

$$u_{tt} + \mathcal{N}_x[u] = 0, \quad x \in [X_1, X_2],\ t \in [T_1, T_2],$$
$$u(x, T_1) = I(x), \qquad u_t(x, T_1) = I_p(x),$$
$$u(X_1, t) = B(X_1, t), \qquad u(X_2, t) = B(X_2, t), \qquad (1)$$

where X1, X2 and T1, T2 are the spatial and temporal domain boundaries, respectively, and I(x), Ip(x), B(X1, t), and B(X2, t) represent the IB conditions. Data-driven solutions of the Boussinesq equation are then simulated using the neural network setup of the IPINNs model. For the current Boussinesq equation, the loss function is defined as follows utilizing the mean squared error method:

$$J(\Theta, a, \lambda) = \lambda_u J_u + J_I + \lambda_B J_B,$$
$$J_u = \frac{1}{N_u} \sum_{i=1}^{N_u} \left| f(x_u^i, t_u^i) \right|^2,$$
$$J_I = \frac{\lambda_I}{N_I} \sum_{i=1}^{N_I} \left| u(x_I^i, T_1) - \hat{u}^i \right|^2 + \frac{\lambda_{I_p}}{N_{I_p}} \sum_{i=1}^{N_{I_p}} \left| u_t(x_I^i, T_1) - \hat{u}_t^i \right|^2,$$
$$J_B = \frac{1}{N_B} \sum_{i=1}^{N_B} \left| u(x_B^i, t_B^i) - \hat{u}^i \right|^2. \qquad (2)$$
Here Θ denotes the weight matrices and bias vectors of the neural network architecture. λu, λI, λIp, and λB are the loss-balanced coefficients, which are learned to balance the interplay among the different loss terms [42, 45, 46]. The new initial loss term is neglected when the hyperparameter λIp = 0. The newly added hyperparameter a adjusts the slope of the activation function, accelerating the training process compared to the fixed default value of 1.0 [10]. Mathematically, the adaptive activation function replaces the variable x with ax; the TANH (hyperbolic tangent) function, for example, changes from the common form $(e^{x}-e^{-x})/(e^{x}+e^{-x})$ to $(e^{ax}-e^{-ax})/(e^{ax}+e^{-ax})$. $(x_I^i, t_I^i)$, $(x_B^i, t_B^i)$, and $(x_u^i, t_u^i)$ represent the initial, boundary, and collocation points for u(x, t) in the spatial and temporal domains (x ∈ [X1, X2], t ∈ [T1, T2]). u and u_t denote the learned values, while $\hat{u}$ and ${\hat{u}}_{t}$ denote the exact values, with $f={u}_{tt}({\rm{\Theta }},a,t,x)+{{ \mathcal N }}_{x}[u({\rm{\Theta }},a,t,x)]$ representing the predicted physical model. Nu, NI (NIp), and NB are the numbers of collocation, initial, and boundary points, respectively. The new generalized loss term JIp (the second term in JI) is added to the initial condition because of the second-order derivative of u(x, t) with respect to t. The data-driven deep neural network is determined by the collocation points in the simulation domain and the IB points. The loss term Ju, minimized at a finite set of collocation points, learns the physical model ${u}_{tt}+{{ \mathcal N }}_{x}[u]=0$, while the other two loss terms JI and JB are trained to satisfy the constraints at the IB points. The automatic differentiation method and the backpropagation algorithm are utilized to train the deep neural network by calculating the derivatives of the loss function with respect to the trainable variables Θ, λu, λI, λIp, λB, and a.
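As a concrete illustration, the sketch below assembles a loss of the form of equation (2) in TensorFlow 1.x. It is a minimal sketch, not the authors' released code: the small one-hidden-layer surrogate and the representative spatial operator N_x are stand-ins for illustration (the paper's concrete equation (3) is not reproduced here), while the trainable slope a and the loss-balanced coefficients follow the description above.

```python
import tensorflow as tf  # TF 1.x, matching the paper's software stack

# Placeholders for the collocation, initial, and boundary batches.
t_f, x_f = tf.placeholder(tf.float32, [None, 1]), tf.placeholder(tf.float32, [None, 1])
t_i, x_i = tf.placeholder(tf.float32, [None, 1]), tf.placeholder(tf.float32, [None, 1])
t_b, x_b = tf.placeholder(tf.float32, [None, 1]), tf.placeholder(tf.float32, [None, 1])
I_data, Ip_data, B_data = (tf.placeholder(tf.float32, [None, 1]) for _ in range(3))

# Trainable scalars: adaptive-activation slope a and loss-balanced weights.
a = tf.Variable(1.0)
lam_u, lam_I, lam_Ip, lam_B = (tf.Variable(1.0) for _ in range(4))

# One shared surrogate network u(t, x), reused at every point set.
u_core = tf.make_template('u_net', lambda X: tf.layers.dense(
    tf.tanh(a * tf.layers.dense(X, 20)), 1))  # adaptive TANH: tanh(a*z)

def u_fn(t, x):
    return u_core(tf.concat([t, x], 1))

def N_x(u, x):
    # A representative fourth-order Boussinesq-type spatial operator
    # (assumption for illustration, not the paper's equation (3)).
    u_x = tf.gradients(u, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    u_xxxx = tf.gradients(tf.gradients(u_xx, x)[0], x)[0]
    u2_xx = tf.gradients(tf.gradients(u * u, x)[0], x)[0]
    return -u_xx - 3.0 * u2_xx - u_xxxx

def f_fn(t, x):
    # Residual f = u_tt + N_x[u], built entirely by automatic differentiation.
    u = u_fn(t, x)
    u_tt = tf.gradients(tf.gradients(u, t)[0], t)[0]
    return u_tt + N_x(u, x)

ut_i = tf.gradients(u_fn(t_i, x_i), t_i)[0]            # u_t at the initial points

J_u = tf.reduce_mean(tf.square(f_fn(t_f, x_f)))        # PDE residual term
J_I = (lam_I * tf.reduce_mean(tf.square(u_fn(t_i, x_i) - I_data)) +
       lam_Ip * tf.reduce_mean(tf.square(ut_i - Ip_data)))  # J_Ip: new u_t term
J_B = tf.reduce_mean(tf.square(u_fn(t_b, x_b) - B_data))    # Dirichlet boundaries
loss = lam_u * J_u + J_I + lam_B * J_B                 # cf. equation (2)
```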
To solve the Boussinesq equation, the IPINNs neural network model is configured with TANH activation functions and initialized with the Xavier method. Dirichlet boundary conditions are applied at the edges of the spatial domain. The Adam optimizer is first utilized to obtain a good set of initial learnable variables [49]. The L-BFGS algorithm [50], which exploits approximate second-order (curvature) information of the loss function, is then used to fine-tune the parameters for higher accuracy. The combined use of the Adam and L-BFGS algorithms significantly accelerates numerical convergence and enhances robustness compared to the exclusive use of a single optimizer. The default maximum numbers of optimizer epochs for the Adam and L-BFGS algorithms are set as 10 000 and 60 000, respectively; in all the current simulations, the L-BFGS algorithm terminates earlier automatically through its convergence mechanism. The initial learning rates for the Adam optimizer and the L-BFGS algorithm are 0.007 and 0.9, respectively. The simulations are conducted with Python 3.7 and TensorFlow_GPU-1.14, utilizing NVIDIA K80 GPU computing platforms on a Linux system.
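Continuing the hedged sketch above, the two-stage optimization could be wired as follows; the epoch count and Adam learning rate follow the text, while the L-BFGS options shown are illustrative (the 0.9 initial learning rate quoted for L-BFGS has no direct counterpart in the SciPy interface used in classic TF 1.x PINN codes). The tensors `loss` and a feed dictionary `feed` of training data are assumed to come from the model construction above.

```python
# Stage 1: Adam warm start; stage 2: L-BFGS fine-tuning via the SciPy bridge
# shipped with TensorFlow 1.x.
adam_step = tf.train.AdamOptimizer(learning_rate=0.007).minimize(loss)
lbfgs = tf.contrib.opt.ScipyOptimizerInterface(
    loss, method='L-BFGS-B',
    options={'maxiter': 60000, 'ftol': 1e-12})  # stops early on convergence

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10000):                  # default Adam epochs
        sess.run(adam_step, feed_dict=feed)
    lbfgs.minimize(sess, feed_dict=feed)        # runs until converged
```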
3. Simulation of Boussinesq equation
The nonlinear (1 + 1)-dimensional fourth-order Boussinesq equation in the current simulation is as follows:
The number of collocation points Nu is set as 10 000 by default, generated by the Latin hypercube sampling method and randomly drawn from the full high-resolution dataset in the IPINNs model. The default numbers of IB training data are NI = NIp = 256 and NB = 100, where the latter applies to both the lower and upper boundary conditions. In some simulations the default parameters are modified, and these changes are stated explicitly. The IB training data and collocation points are loaded into the IPINNs model to train the neural network parameters and predict the solutions of the Boussinesq equation. The analytic one-soliton and two-soliton solutions [39, 40] of the Boussinesq equation (3), against which the learned results are compared, are given by the following two equations, respectively:
In the colliding-soliton simulations, the parameter cc = 0 is used, while for the chasing-soliton cases, cc = 1. The parameter k in the one-soliton case, and k1, k2 in the two-soliton case, determine the different IB conditions. These parameters influence the solitons' velocity, width, and interaction behavior, and thus play a crucial role in shaping the dynamics of the soliton solutions within the simulation framework.
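As a side note on the sampling described above, a minimal sketch of Latin hypercube generation of the collocation points is given below; `pyDOE.lhs` is the routine commonly used in PINN codes (an assumption about this paper's implementation), and the bounds match the simulation domain used in the following subsections.

```python
import numpy as np
from pyDOE import lhs  # Latin hypercube sampling

lb = np.array([-5.0, -20.0])   # lower bounds of (t, x)
ub = np.array([ 5.0,  20.0])   # upper bounds of (t, x)
N_u = 10000                    # default number of collocation points

# lhs(2, N_u) returns N_u stratified samples in the unit square [0, 1]^2;
# rescale them to the (t, x) simulation domain.
X_f = lb + (ub - lb) * lhs(2, samples=N_u)
t_f_data, x_f_data = X_f[:, 0:1], X_f[:, 1:2]
```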
3.1. One-soliton solution
The one-soliton solution of the Boussinesq equation is first selected to provide the IB training data for the IPINNs model. The simulation domain is chosen as t ∈ [−5, 5], x ∈ [−20, 20] for comparison with the former work [41]. With the parameter k = 1, the corresponding IB condition is
where the initial condition on the first-order time derivative is handled by automatic differentiation in the software and is not listed here because of its complicated closed form.
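Although the closed form of the initial derivative data is too long to print, it can be generated mechanically by symbolic differentiation of the analytic solution. The sketch below uses a generic Hirota-type one-soliton ansatz $u = 2(\ln f)_{xx}$ with a representative dispersion relation as a stand-in; both are assumptions for illustration, not the paper's equation (4).

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
k = sp.Integer(1)
w = sp.sqrt(k**2 + k**4)            # representative dispersion relation (assumption)
f = 1 + sp.exp(k * x + w * t)
u = 2 * sp.diff(sp.log(f), x, 2)    # Hirota-type one-soliton ansatz (assumption)

# Exact initial data at T1 = -5: I(x) = u(x, T1) and Ip(x) = u_t(x, T1).
I  = u.subs(t, -5)
Ip = sp.diff(u, t).subs(t, -5)
Ip_func = sp.lambdify(x, Ip, 'numpy')   # numeric generator for the IB training data
```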
In the one-soliton simulation, numerical convergence tests on the neural network parameters and collocation points are performed first to establish robustness, as shown in table 1, in which the number of IB points is fixed at the default values. The abbreviations 'h', 'm', and 's' denote hour, minute, and second throughout this paper. Generally, the number of neural network parameters should be positively correlated with the number of collocation points to avoid underfitting and overfitting in data learning for a specific problem. Although the workable parameter range is quite broad in the current simulations, parameter tuning remains a necessity in data-driven machine learning. More neurons in the neural network lead to longer training time, as do more collocation points in the simulation domain. However, more neurons and collocation points do not guarantee better prediction accuracy, as the relative L2-norm errors in table 1 show. The best relative L2-norm error results from an appropriate combination of neural network size, collocation points, and IB points found through multiple tests. In general, relative L2-norm errors around 10−4 can be achieved over a wide range of parameter settings. The six left panels of figure 1 compare the exact and learned results, where the neural network contains 4 hidden layers with 15 neurons per layer and 2000 collocation points are used. The pointwise difference between the exact and learned results, plotted in the third upper-left panel, gives an error contour with very small values and a relative L2-norm error of about 1.92 × 10−4. The good fit can also be seen in the lower-left three plots at the three time diagnosis points t = −2.98, t = 0.05, and t = 3.08. The best relative L2-norm error in our result is about two orders of magnitude better than the former result of 2.03 × 10−2 [41]. The right plot in figure 1 depicts the evolution of the loss function versus the epoch for the Adam optimizer followed by the L-BFGS algorithm. For the one-soliton solution of the Boussinesq equation, numerical convergence is reached quickly by L-BFGS for the different numbers of Adam iterations tested.
Table 1. The training time and relative L2-norm errors are given by the different combinations of neural network architecture and collocation points for the one-soliton solver of the Boussinesq equation.
Figure 1. The left-upper three figures give the theoretical, simulative, and pointwise error contours for the one-soliton solver of the Boussinesq equation. Comparisons of the theoretical and the predicted results at three different time diagnosis points are shown in the left-lower three plots. The right plot is the loss function versus the epoch.
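The relative L2-norm error quoted in table 1 and throughout the paper is the standard PINN accuracy metric; a one-line implementation over the evaluation grid reads as follows (the array names are illustrative).

```python
import numpy as np

def relative_l2_error(u_pred, u_exact):
    # ||u_pred - u_exact||_2 / ||u_exact||_2 over all (t, x) grid points;
    # for the figure 1 run this evaluates to about 1.92e-4 per the text.
    return np.linalg.norm(u_pred - u_exact) / np.linalg.norm(u_exact)
```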
3.2. Two-soliton solution
The two-soliton solutions of the Boussinesq equation are then numerically simulated by the data-driven IPINNs model with physics information. The colliding-soliton and chasing-soliton states are the two regimes of the two-soliton solutions of the Boussinesq equation; the values of the parameters k1, k2 and the signs of the parameters ω1, ω2 in equation (6) determine which regime occurs. The IB conditions with different parameters are described by the following equation [39, 40], which provides the IB training data to the deep neural network model,
where the initial condition on the first-order time derivative is again not listed, as in the one-soliton simulation above. The simulation domain is also chosen as t ∈ [−5, 5], x ∈ [−20, 20] for comparison with the former work. Parameter scans of k1 and k2 are carried out to verify the versatile application of the current model in finding data-driven solutions of the colliding-soliton and chasing-soliton states, respectively. A wide range of parameter combinations achieves high prediction accuracy, with relative L2-norm errors around 10−4, as in the one-soliton case above. For consistency, the neural network for the two-soliton simulations is set to 5 hidden layers with 30 neurons in each hidden layer.
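For reference, a hedged sketch of the fully connected architecture used in the two-soliton runs (5 hidden layers of 30 neurons, Xavier initialization, adaptive TANH) is shown below; the function structure is an assumption consistent with section 2 rather than the authors' exact code.

```python
import tensorflow as tf

def two_soliton_net(t, x, a, widths=(30, 30, 30, 30, 30)):
    # (t, x) -> u with Xavier-initialized dense layers and the global
    # adaptive TANH activation tanh(a * z) from section 2. Wrap the call
    # in tf.make_template to share weights across point sets.
    H = tf.concat([t, x], axis=1)
    for w in widths:
        H = tf.tanh(a * tf.layers.dense(
            H, w, kernel_initializer=tf.contrib.layers.xavier_initializer()))
    return tf.layers.dense(
        H, 1, kernel_initializer=tf.contrib.layers.xavier_initializer())
```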
3.2.1. Colliding-soliton solution
The colliding-soliton solution for the Boussinesq equation is validated first by numerical simulation with the deep neural network model. Parameter scans from k1 = k2 = 0.7 to k1 = k2 = 1.7 are executed with a step of 0.2, where the parameters ω1, ω2 have opposite signs. For simplicity, k denotes both k1 and k2 in the following, since they take the same value in the colliding-soliton case. Under these conditions, the two solitons have the same magnitude but opposite propagation directions, giving rise to the colliding-soliton phenomena. The different parameter settings in equation (6) determine different IB conditions, leading to distinct loss terms for the IB constraints while the loss term describing the physical model remains the same. Comparisons of the theoretical and simulated results with k = 0.7, k = 1.1, k = 1.3, and k = 1.7 are shown in figures 2–5 in the spatial and temporal domains. The upper three contour plots, labeled (a), (b), and (c) in each case, depict the exact, learned, and pointwise error results, with the third plot illustrating the difference between the exact and learned results. The exact and learned results agree very well, as seen from plots (a) and (b) in the four cases. A quantitative comparison can be obtained from contour plot (c), whose values are very small across the whole domain compared to the exact results in each parameter setup. The relative L2-norm errors derived from the four simulations are 4.83 × 10−4, 5.47 × 10−4, 6.55 × 10−4, and 5.69 × 10−4. Three diagnosis positions in t, marked by the three white lines in the exact (a) and learned (b) contour plots, are chosen for specific comparisons along the x direction. Once again, the near-perfect line overlap in the lower three plots, labeled (d), (e), and (f), reveals the good agreement between exact and learned results in the four parameter settings. These results collectively affirm the high prediction accuracy achieved by the current model under different IB conditions.
Figure 2. The upper three figures give the exact, learned, and pointwise error contours for k = 0.7 in the colliding-soliton simulations. Comparisons of the theoretical and the predicted results at three different time diagnosis points are shown in the lower three plots.
Figure 3. The comparisons of the exact and learned results are depicted similarly to those in the above figure 2, except that k = 1.1 in the colliding-soliton simulations.
Figure 4. The comparisons of the exact and learned results are depicted similarly to those in the above figure 2, except that k = 1.3 in the colliding-soliton simulations.
Figure 5. The comparisons of the exact and learned results are depicted similarly to those in the above figure 2, except that k = 1.7 in the colliding-soliton simulations.
The best relative L2-norm errors and the training times with respect to the k1 and k2 parameters are listed in table 2. Here, J1 represents the IPINNs model and J2 the PINNs model, a convention that also applies to the following description; the relative L2-norm error is denoted as L2 error in the table for simplicity. For k = 1.1 in the J1 simulation, the best relative L2-norm error is more than two orders of magnitude smaller than the result in former research [41]. Furthermore, the parameter range of k1 and k2 is considerably wider, indicating better robustness in data-driven learning. The training time increases as k grows from 0.7 to 1.7. From the theoretical solution of the Boussinesq equation in equation (6), the two colliding solitons become narrower for larger k1 and k2 values. The contour plots of the learned results in figures 2–5 exhibit the same behavior in the simulation domains for the four parameter settings. As seen from the contour plots, the collision areas in the spatial and temporal domains are strongly correlated with the soliton width. From a traditional mesh-based point of view, a narrower soliton width requires higher grid resolution in the spatial and temporal domains, and the deep neural network model follows the same rule in the current colliding-soliton simulations. To evaluate the performance of the deep learning model, the training times in table 2 are analyzed together with the figures. The width of the k = 0.7 soliton in figure 2 is approximately 3.5 times larger than that of the k = 1.7 case in figure 5, while table 2 shows that training takes about 11.0 times longer for k = 1.7 than for k = 0.7. By contrast, the corresponding increase for the k = 1.7 soliton using the finite difference method with traditional mesh grids would exceed two orders of magnitude relative to the k = 0.7 case because of the CFL condition. The use of automatic differentiation reduces the extra cost of the fourth-order derivative in the x direction and the second-order derivative in the t direction. For the J2 simulations, the description is similar, except that the relative L2-norm errors are much larger because the initial first-order time derivative term is omitted from the loss function. Even so, these relative L2-norm errors are still about 3 times smaller than the results in former research [41].
Table 2. The training time and relative L2-norm errors are given by scanning the parameters k1, k2 for the colliding-soliton solver of the Boussinesq equation.
| k1, k2 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 |
| --- | --- | --- | --- | --- | --- | --- |
| L2 error (10−4)-J1 | 4.83 | 4.48 | 5.47 | 6.55 | 6.72 | 5.69 |
| L2 error (10−2)-J2 | 1.47 | 2.87 | 2.60 | 1.95 | 1.60 | 3.90 |
| time-J1 | 6 m 42 s | 13 m 23 s | 15 m 21 s | 23 m 34 s | 42 m 15 s | 1 h 6 m |
| time-J2 | 6 m 17 s | 12 m 42 s | 16 m 57 s | 25 m 16 s | 41 m 52 s | 1 h 9 m |
In figure 6, the evolution of the loss function is shown for the k1 and k2 parameter setups of table 2, using the Adam optimizer followed by the L-BFGS algorithm with the IPINNs model. The total number of training epochs increases from the k = 0.7 setting to the k = 1.7 case in the current simulations. Since the neural network architecture, the numbers of collocation and IB points, and the physics information are the same, the total training time is closely related to the number of optimizer steps. The number of Adam iterations is fixed at 10 000, so the increase in time comes from the L-BFGS iterations, whose learning speed is slower for the narrower colliding solitons. Each L-BFGS epoch also consumes more time than an Adam step, because L-BFGS exploits approximate second-order (curvature) information of the loss function, whereas Adam relies only on first-order derivatives.
Figure 6. The six plots give the evolution of the loss function in the training process versus epoch by scanning parameter from k = 0.7 to k = 1.7 for the colliding-soliton case.
The current simulations employ the TANH activation function in the neural networks. In table 3, we provide comparisons of the TANH, SIN (sinusoidal), GELU (Gaussian error linear unit), SWISH, and MISH activation functions using the k = 1.1 case. The best relative L2-norm errors achieved are 1.03 × 10−2, 6.09 × 10−4, 6.35 × 10−4, and 7.33 × 10−4 for the SIN, GELU, SWISH, and MISH functions, respectively. The GELU, SWISH, and MISH activation functions achieve prediction accuracies comparable to that of the TANH function, whereas the sinusoidal activation function shows significantly poorer prediction precision. Additionally, the ReLU (rectified linear unit) and Leaky ReLU activation functions fail to produce valid results or encounter NaN (not a number) issues in the current simulations. The training times for the SIN, GELU, SWISH, and MISH activation functions are approximately 2.37, 3.16, 1.86, and 1.77 times longer, respectively, than in the TANH case. In figure 7, the evolution of the loss function is shown, with the Adam iterations set to 10 000, consistent with the setup above. The GELU, SWISH, and MISH activation functions reach final loss values similar to that of the TANH case illustrated in the upper-right plot of figure 6. However, despite comparable numbers of training epochs to the TANH case, the SWISH and MISH activation functions exhibit slower per-epoch training, resulting in longer total training durations. These comparisons confirm that the TANH activation function is efficient for the current simulations.
Table 3. The training time and relative L2-norm errors are evaluated for different activation functions in the k = 1.1 case of the colliding-soliton.
Figure 7. The three plots illustrate the evolution of the loss function using the sinusoidal, SWISH, and MISH activation functions for the k = 1.1 colliding-soliton case.
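For completeness, the candidate activation functions of table 3 can be written as TF 1.x ops and swapped into the network sketch above; these are the standard definitions, with the GELU given in its exact erf form.

```python
import tensorflow as tf

def sin_act(z):   # SIN
    return tf.sin(z)

def gelu(z):      # GELU, exact erf form
    return 0.5 * z * (1.0 + tf.erf(z / tf.sqrt(2.0)))

def swish(z):     # SWISH: z * sigmoid(z)
    return z * tf.sigmoid(z)

def mish(z):      # MISH: z * tanh(softplus(z))
    return z * tf.tanh(tf.nn.softplus(z))
```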
3.2.2. Chasing-soliton solution
The chasing-soliton phenomena arise when two solitons move in the same direction but have different magnitudes. Parameter scans from k2 = 0.7 to k2 = 1.7 with a step of 0.2 are simulated at k1 = 1.5, where the parameters ω1 and ω2 have the same sign. The different parameter settings in equation (6) determine different IB conditions while regularizing the same physical model. The total loss, governed by the different IB constraints, is optimized using the Adam optimizer in conjunction with the L-BFGS algorithm in the IPINNs model. Exact and learned results are compared for k2 = 0.9, k2 = 1.1, k2 = 1.5, and k2 = 1.7 at k1 = 1.5 in figures 8–11 in the spatial and temporal domains. The three upper contour plots, labeled (a), (b), and (c), are described in the same way as in the earlier colliding-soliton case. The pointwise error values in plots (c) are very small compared to the exact values. The relative L2-norm errors derived from the four pointwise error distributions (c) are 4.82 × 10−4, 5.22 × 10−4, 2.47 × 10−4, and 5.73 × 10−4, indicating high prediction accuracy under varying IB conditions. In the lower three plots, labeled (d), (e), and (f), of the four parameter settings (with diagnosis times marked by three white lines in the exact and learned contour plots of figures 8–11), the blue solid lines and red dotted lines affirm the good agreement between exact and learned results.
Figure 8. The upper three figures give the exact, learned, and pointwise error contours for k2 = 0.9 in the chasing-soliton simulations. Comparisons of the theoretical and the predicted results at three different time diagnosis points are shown in the lower three plots.
Figure 9. The comparisons of the exact and learned results are depicted similarly to those in the above figure 8, except that k2 = 1.1 in the chasing-soliton simulations.
Figure 10. The comparisons of the exact and learned results are depicted similarly to those in the above figure 8, except that k2 = 1.5 in the chasing-soliton simulations.
Figure 11. The comparisons of the exact and learned results are depicted similarly to those in the above figure 8, except that k2 = 1.7 in the chasing-soliton simulations.
The relative L2-norm errors and training times with respect to the k2 parameter at k1 = 1.5 are shown in table 4. Notably, for the setup of k2 = 0.9 at k1 = 1.5, the best relative L2-norm error surpasses that reported in previous research [41] by more than two orders of magnitude in the J1 simulation. In addition, the comprehensive parameter scans across a wide range of k2 values validate the model's robustness for chasing-soliton simulations. The training time increases slightly from k2 = 0.7 to k2 = 1.3 because the other parameter, k1, is fixed at 1.5 and determines the width of one soliton in the physical process. When the condition k2 = k1 = 1.5 is met, the two solitons merge into one, leading to a simple structure in the spatial and temporal domains and a significant decrease in training time. It should be noted that when k2 ≪ k1, the two solitons would undergo a similar transformation into a single soliton. When k2 is around 0.9 at k1 = 1.5, a notable phase-shift phenomenon occurs, marked by a separation-fusion-separation process; the amplitude of the dominant soliton remains almost unchanged before and after the interaction but decreases during the fusion process. For the simulations with k2 = 1.1, 1.3, and 1.7 at k1 = 1.5, the interactions between the two chasing solitons remain strong, but the fusion phenomena are less prominent. For the J2 simulations, the relative L2-norm errors are significantly larger than in the J1 cases, as in the colliding-soliton simulations above, except for the case k2 = k1 = 1.5. For the k2 = 0.9 case, the only one provided in the previous research [41], the J2 relative L2-norm error is still approximately 50% smaller than the reported result.
Table 4. The training time and relative L2-norm errors are given by scanning the parameters k2 at k1 = 1.5 for the chasing-soliton solver of the Boussinesq equation.
| k2 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 |
| --- | --- | --- | --- | --- | --- | --- |
| L2 error (10−4)-J1 | 4.50 | 4.82 | 5.22 | 4.63 | 2.47 | 5.73 |
| L2 error (10−2)-J2 | 5.90 | 4.64 | 5.80 | 4.14 | 0.054 | 4.51 |
| time-J1 | 15 m 44 s | 17 m 4 s | 23 m 41 s | 23 m 55 s | 8 m 32 s | 38 m 40 s |
| time-J2 | 16 m 23 s | 17 m 32 s | 21 m 52 s | 26 m 23 s | 8 m 15 s | 47 m 52 s |
The evolution of the loss function with respect to the k2 parameter at k1 = 1.5 is depicted in figure 12 for the IPINNs model. The optimization is performed using the Adam optimizer followed by the L-BFGS algorithm. The neural network architecture, the numbers of collocation and IB points, and the physics information are the same as in the colliding-soliton cases above. The number of Adam iterations is set as 10 000, and the observed increase in training time is primarily attributed to the L-BFGS iterations; the total number of L-BFGS training epochs correlates strongly with the training times listed in table 4. The longer training time for the k2 = 1.7 case at k1 = 1.5 can be attributed to the interaction of two solitons with narrower widths, resembling the colliding-soliton scenario discussed earlier, for which the L-BFGS algorithm requires more training epochs. Conversely, the number of epochs decreases considerably for the k2 = 1.5 case at k1 = 1.5, where a single soliton is formed by the fusion of two solitons.
Figure 12. The six plots give the evolution of the loss function in the training process versus epoch by scanning parameter from k2 = 0.7 to k2 = 1.7 for the chasing-soliton case.
4. Conclusions and discussion
The IPINNs model has been employed to study one-soliton and two-soliton wave solutions of the fourth-order Boussinesq equation, providing insights into the dynamic behavior of solitons and their interactions. The loss function, accounting for the Boussinesq equation's second-order time dependence, includes a new initial first-order time derivative term, which is crucial for achieving high prediction accuracy. The adaptive activation function mechanism and loss-balanced coefficients are utilized to accelerate and enhance the PINNs model, as highlighted in previous research [10, 12, 42, 45]. The IPINNs model achieves high prediction accuracy, with relative L2-norm errors around 10−4, representing an improvement in prediction precision of two orders of magnitude over previous results [41]. Comparisons among the TANH, SIN, GELU, SWISH, and MISH activation functions in the neural network are conducted using the colliding-soliton k = 1.1 case, confirming that the TANH activation function is efficient for the current simulations. To validate the versatility and robustness of the IPINNs model, parameter scans are conducted for the two-soliton wave solvers of the Boussinesq equation. The training time is found to be closely related to the width of the two-soliton solutions when using the L-BFGS algorithm in the colliding-soliton simulations. For instance, the width of the soliton with k = 1.7 is approximately 3.5 times smaller than that of the k = 0.7 case, with a corresponding training time that is about 11 times longer. However, the increase in training time is not as pronounced as in traditional mesh-based algorithms, thanks to the use of automatic differentiation for the fourth-order Boussinesq equation. In the chasing-soliton scenario, when the condition k2 = k1 = 1.5 is met, the two solitons merge into one, resulting in a significant reduction in training time. A remaining limitation is that the current model requires additional initial data, namely the first-order time derivative, which can be challenging to obtain.
The widespread applications of the PINNs software in solving NPDEs still require further understanding and optimization, as its prediction abilities are unstable in certain scenarios. To enhance the generalizability of deep neural network models, attention should be directed towards numerous NPDEs in physical fields that have yet to be solved using PINNs software. For instance, the modified Korteweg-de Vries–sine-Gordon equation, which describes few-cycle-pulse optical solitons and features rich structures such as soliton molecules, breather molecules, and breather-soliton molecules, presents a challenging problem for the PINNs model. Laser physics phenomena associated with the Boussinesq model will be studied in the future using the current model to assess its capability [51–53]. The soliton and lump solutions of the (2 + 1)-dimensional (3D) generalized variable-coefficient Hirota-Satsuma-Ito equation are worthy of study with the current model to verify its robustness and efficiency [54]. The computational efficiency of the current model when applied to 3D PDEs is a potential challenge and requires further study. For 3D PDEs with solutions exhibiting locally steep gradients or high-frequency components, adaptive sampling methods [16] and self-adaptive weight algorithms with scalable multi-GPU support [55] emerge as promising candidate approaches. The SAEE-PINN model using sparse data also provides valuable insights for our future research [28]. For PDEs with high-order or irregular domains, PINNs with a mesh-free method offer advantages over traditional methods that rely on mesh grid techniques [16, 56]. Future research will compare the current multilayer perceptron to the Kolmogorov–Arnold representation, which is also receiving attention in PINNs research. Combining domain decomposition techniques [15, 57, 58] with the current IPINNs model offers a promising approach for conducting large-domain simulations with higher prediction accuracy. Further advancements in heterogeneous parallel computing would significantly accelerate training through distributed data-parallel methods, which adjust the mini-batch sizes across workers. Incorporating more accelerated computing platforms in software development will also enhance the versatility and efficiency of software execution. Upgrading from TensorFlow 1 to TensorFlow 2 will be completed in our further work, improving support for distributed computing, automatic mixed precision, and other optimizations.
Declaration of competing interest
The authors declare that they have no conflict of interest.
Ethical standards
This research does not involve human participants and/or animals.
This work was supported by the National Natural Science Foundation of China under Grant No. 12475204.
References

[8] Zhang Y, Lü X 2024 Data-driven solutions and parameter discovery of the extended higher-order nonlinear Schrödinger equation in optical fibers Physica D 468 134284
[9] Guo H, Yin Z-Y 2024 A novel physics-informed deep learning strategy with local time-updating discrete scheme for multi-dimensional forward and inverse consolidation problems Comput. Methods Appl. Mech. Eng. 421 116819
[10] Jagtap A D, Kawaguchi K, Em Karniadakis G 2020 Adaptive activation functions accelerate convergence in deep and physics-informed neural networks J. Comput. Phys. 404 109136
[11] Jagtap A D, Kawaguchi K, Em Karniadakis G 2020 Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks Proc. R. Soc. A 476 20200334
[12] Jagtap A D, Shin Y, Kawaguchi K, Em Karniadakis G 2022 Deep Kronecker neural networks: a general framework for neural networks with adaptive activation functions Neurocomputing 468 165–180
[13] Lin S, Chen Y 2022 A two-stage physics-informed neural network method based on conserved quantities and applications in localized wave solutions J. Comput. Phys. 457 111053
[21] Wang S, Sankaran S, Perdikaris P 2024 Respecting causality is all you need for training physics-informed neural networks Comput. Methods Appl. Mech. Eng. 421 116813
[22] Penwarden M, Jagtap A D, Zhe S, Em Karniadakis G, Kirby R M 2023 A unified scalable framework for causal sweeping strategies for physics-informed neural networks (PINNs) and their temporal decompositions J. Comput. Phys. 493 112464
[30] Samaniego E, Anitescu C, Goswami S, Nguyen-Thanh V M, Guo H, Hamdia K, Zhuang X, Rabczuk T 2020 An energy approach to the solution of partial differential equations in computational mechanics via machine learning: concepts, implementation and applications Comput. Methods Appl. Mech. Eng. 362 112790
[31] Guo H, Zhuang X, Alajlan N, Rabczuk T 2023 Physics-informed deep learning for melting heat transfer analysis with model-based transfer learning Comput. Math. Appl. 143 303–317
[34] Infeld E, Rowlands G 2000 Nonlinear Waves, Solitons and Chaos (Cambridge: Cambridge University Press) p 84
[35] Scott A C 2006 The application of Bäcklund transforms to physical problems Bäcklund Transformations, the Inverse Scattering Method, Solitons, and Their Applications: NSF Research Workshop on Contact Transformations (Berlin: Springer) pp 80–105
[36] El-Sheikh M M A, Seadawy A R, Ahmed H M, Arnous A H, Rabie W B 2020 Dispersive and propagation of shallow water waves as a higher order nonlinear Boussinesq-like dynamical wave equations Physica A 537 122662
[38] Antunes do Carmo J S, Ferreira J A, Pinto L 2019 On the accurate simulation of nearshore and dam break problems involving dispersive breaking waves Wave Motion 85 125–143
[42] Qian Y, Zhang Y, Huang Y, Dong S 2023 Physics-informed neural networks for approximating dynamic (hyperbolic) PDEs of second order in time: error analysis and algorithms J. Comput. Phys. 495 112527
[51] Grohs G 1976 Chemical laser cavity mixing and turbulence 14th Aerosp. Sci. Meeting 56
[52] Ismael H F, Bulut H, Baskonus H M, Gao W 2021 Dynamical behaviors to the coupled Schrödinger–Boussinesq system with the beta derivative AIMS Math. 6 7909–7928
[53] Endtmayer B, Demircan A, Perevoznik D, Morgner U, Beuchler S, Wick T 2023 Adaptive finite element simulations of laser-heated material flow using a Boussinesq model PAMM 23 e202200219