1. Introduction
2. Data and methods
2.1. Data
2.2. Convolutional neural network
Figure 1. Architecture of the CNN model used in this study. Our CNN architecture comprises a convolutional layer with 16 convolution kernels of size 5 × 5. These kernels act as filters, transforming the input image through element-wise multiplications and summations to produce activation feature maps. The feature maps then pass through a 2 × 2 average pooling layer, which reduces their size by averaging values within local neighborhoods. The pooled feature maps are flattened and fully connected to a hidden layer of 100 neurons. Finally, the neurons in the hidden layer are connected to a fully connected output layer with 3 neurons.
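The forward pass described in this caption can be sketched in NumPy as follows. The input-map size (24 × 72), the random weights, and the placement of the ReLU activation after the convolution are all assumptions made for illustration; the trained values and exact activation placement are not given in the caption.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(img, kernel):
    """Valid cross-correlation: element-wise multiply each patch by the
    kernel and sum, producing one value per position."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool2(x):
    """2 x 2 average pooling (odd trailing rows/columns are truncated)."""
    H2, W2 = x.shape[0] // 2, x.shape[1] // 2
    return x[:H2 * 2, :W2 * 2].reshape(H2, 2, W2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
H, W = 24, 72                          # hypothetical input-map size
kernels = rng.normal(size=(16, 5, 5))  # 16 kernels of size 5 x 5
img = rng.normal(size=(H, W))

maps = np.stack([relu(conv2d_valid(img, k)) for k in kernels])
pooled = np.stack([avg_pool2(m) for m in maps])
flat = pooled.reshape(-1)              # flatten the pooled feature maps

W1 = rng.normal(size=(100, flat.size)) # fully connected hidden layer, 100 neurons
b1 = rng.normal(size=100)
hidden = relu(W1 @ flat + b1)
W2 = rng.normal(size=(3, 100))         # fully connected output layer, 3 neurons
b2 = rng.normal(size=3)
logits = W2 @ hidden + b2
```

With a 24 × 72 input, the valid 5 × 5 convolution yields 20 × 68 feature maps, the 2 × 2 pooling halves both dimensions to 10 × 34, and flattening gives a 5440-element vector feeding the hidden layer.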
2.3. Unpacking the black box
3. Results
3.1. Identification of ENSO phases
Figure 2. (a) Identification of ENSO phases and their significance for the validation dataset (2000–2018), with the El Niño, La Niña, and normal phases represented by red, blue, and white backgrounds, respectively. (b) The Oceanic Niño Index (ONI) from 2000 to 2018, where the El Niño, La Niña, and normal phases are displayed with red, blue, and white backgrounds, respectively.
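A minimal sketch of mapping ONI values to the three phases shown in panel (b). The ±0.5 °C cutoffs follow the conventional NOAA definition; the caption does not state the paper's exact criterion, so these thresholds are an assumption.

```python
def enso_phase(oni):
    """Classify an ONI value (degrees C) into an ENSO phase.
    The +/-0.5 thresholds are the conventional NOAA cutoffs (assumed here)."""
    if oni >= 0.5:
        return "El Niño"
    if oni <= -0.5:
        return "La Niña"
    return "Normal"
```

For example, `enso_phase(1.2)` returns `"El Niño"` and `enso_phase(0.1)` returns `"Normal"` under these assumed thresholds.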
3.2. Unpacking the neural network
3.2.1. Unpacking the convolutional layer
Figure 3. Visualization of the convolution kernels with their values shown as 5 × 5 nodes on the surface. Among the 16 convolution kernels, only kernel 5 (blue surface) and kernel 6 (red surface) have nonzero values.
3.2.2. Unpacking the hidden layer
Figure 4. (a)–(b) Visualization of nontrivial parameters for hidden neuron 100. The connection parameters associated with feature maps derived from convolution kernels 6 and 5 are visualized in (a) and (b), respectively.
Figure 5. (a)–(b) Visualization of nontrivial parameters for hidden neuron 50. The connection parameters associated with feature maps derived from convolution kernels 6 and 5 are visualized in (a) and (b), respectively.
3.2.3. Unpacking the output layer
Figure 6. (a)–(c) Visualization of the parameters for output neurons. The connection parameters for the El Niño, La Niña, and normal neurons are visualized in (a), (b), and (c), respectively. The connection parameters are denoted as $w_i^{\text{El Niño}}$, $w_i^{\text{La Niña}}$, and $w_i^{\text{Normal}}$. The identifier $i$ represents each hidden neuron that connects to these output neurons. Parameters with significant contributions are marked by their corresponding identifier $i$.
4. Conclusions
Acknowledgments
Appendix A. Demonstration of the effects of convolution kernels
Figure A1. Illustration of convolution kernels with the input image sampled from 1 October 2015, during the EP El Niño period. |
Figure A2. Illustration of convolution kernels with the input image sampled from 1 November 2009, during the CP El Niño period. |
Figure A3. Illustration of convolution kernels with the input image sampled from 1 February 2018, during the weak La Niña period. |
Figure A4. Illustration of convolution kernels with the input image sampled from 1 January 2008, during the extreme La Niña period. |
Figure A5. Illustration of convolution kernels with the input image sampled from 1 March 2007, during the normal period. |
Appendix B. Reproducibility and the robustness of results
Figure B1. The monthly predictions for the validation dataset (2000–2018). |
Figure B2. The monthly predictions for the validation dataset (2000–2018) in parallel training 1. |
Figure B3. The monthly predictions for the validation dataset (2000–2018) in parallel training 2. |
Figure B4. The nontrivial outputs of the hidden layer for the validation dataset (2000–2018).
Figure B5. The nontrivial outputs of the hidden layer for the validation dataset (2000–2018) in parallel training 1.
Figure B6. The nontrivial outputs of the hidden layer for the validation dataset (2000–2018) in parallel training 2.
Figure B7. The outputs of the output layer for the validation dataset (2000–2018).
Figure B8. The outputs of the output layer for the validation dataset (2000–2018) in parallel training 1.
Figure B9. The outputs of the output layer for the validation dataset (2000–2018) in parallel training 2.
Figure B10. Visualization of the convolution kernels in parallel training 1 with their values shown as 5 × 5 nodes on the surface. |
Figure B11. Visualization of nontrivial parameters in the hidden layer of parallel training 1. |
Figure B12. Visualization of the parameters for output neurons in parallel training 1. |
Figure B13. Visualization of the convolution kernels in parallel training 2 with their values shown as 5 × 5 nodes on the surface. |
Figure B14. Visualization of nontrivial parameters in the hidden layer of parallel training 2. |
Figure B15. Visualization of the parameters for output neurons in parallel training 2. |
Appendix C. Details about the CNN
C.1. Neurons and convolution kernels
Figure C1. The schematic diagram of a neuron. The superscript $m$ identifies the neuron. $w^m$ and $u^m$ are the inner parameters of the neuron, and $f$ is its activation function; in this study, the ReLU function is used. $x^m(t)$ is the input to the neuron at time $t$, with components $x_1^m(t)$, $x_2^m(t)$, $x_3^m(t)$, ..., $x_D^m(t)$. The neuron extracts the intensity of feature $w^m$, compares that intensity with the threshold $u^m$, and determines the output $o^m(t)$ through the activation function $f$.
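The single-neuron computation in this caption can be sketched as follows. Writing the output as ReLU of the feature intensity minus the threshold is an assumption about the sign convention, since the caption does not give the exact formula.

```python
import numpy as np

def relu(x):
    """ReLU activation: max(x, 0)."""
    return np.maximum(x, 0.0)

def neuron(x, w, u):
    """One neuron: compute the intensity of feature w in input x (dot
    product), compare it with threshold u, and pass the difference
    through the ReLU activation. The subtraction convention is assumed."""
    return relu(np.dot(w, x) - u)
```

For instance, an input whose feature intensity exceeds the threshold produces a positive output, while one below the threshold is cut off to zero by the ReLU.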
Figure C2. The ReLU function. $\mathrm{ReLU}(x)=\max (x,0)$. |
Figure C3. The schematic diagram of convolution kernels. This is an example for two convolution kernels of size 3 × 3. These convolution kernels act as filters, transforming the input image by performing element-wise multiplications and summing the results into activation feature maps.
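The two-kernel example in this caption can be sketched as follows. The kernel values and the 5 × 5 input are hypothetical, chosen only to show the element-wise-multiply-and-sum operation; edge-detecting values are a common illustrative choice, not the kernels from the figure.

```python
import numpy as np

def cross_correlate(img, kernel):
    """Slide the kernel over the image; at each position, multiply the
    underlying patch element-wise by the kernel and sum the products."""
    kh, kw = kernel.shape
    H, W = img.shape
    return np.array([[np.sum(img[i:i + kh, j:j + kw] * kernel)
                      for j in range(W - kw + 1)]
                     for i in range(H - kh + 1)])

# Two illustrative 3 x 3 kernels (hypothetical values):
k_vert = np.array([[1.0, 0.0, -1.0]] * 3)  # responds to left-right gradients
k_horz = k_vert.T                          # responds to top-bottom gradients

img = np.arange(25, dtype=float).reshape(5, 5)
maps = [cross_correlate(img, k) for k in (k_vert, k_horz)]
```

Each kernel yields its own 3 × 3 feature map from the 5 × 5 input; on this linear-ramp image, each map is constant, reflecting the image's uniform gradient in each direction.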