Unpacking the black box of deep learning for identifying El Niño-Southern oscillation

Yu Sun; Yusupjan Habibulla; Gaoke Hu; Jun Meng; Zhenghui Lu; Maoxin Liu; Xiaosong Chen

doi:10.1088/1572-9494/ace17d

Communications in Theoretical Physics, 2023, Vol. 75, Issue 9: 095601


Statistical Physics, Soft Matter and Biophysics


Yu Sun¹, Yusupjan Habibulla², Gaoke Hu¹, Jun Meng³, Zhenghui Lu⁴, Maoxin Liu¹, Xiaosong Chen¹


¹School of Systems Science/Institute of Nonequilibrium Systems, Beijing Normal University, Beijing 100875, China
²School of Physics and Technology, Xinjiang University, Wulumuqi 830017, China
³School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
⁴National Institute of Natural Hazards, Ministry of Emergency Management of China, Beijing 100085, China

Received date: 2023-06-21

Revised date: 2023-06-25

Accepted date: 2023-06-26

Online published: 2023-08-10


Abstract

By training a convolutional neural network (CNN) model, we successfully recognize the different phases of the El Niño-Southern Oscillation (ENSO). The model achieves high recognition performance, with accuracies of 89.4% on the training dataset and 86.4% on the validation dataset. Statistical analysis of the weight distribution and activation outputs of the CNN shows that most convolution kernels and hidden-layer neurons remain inactive, while only two convolution kernels and two hidden-layer neurons play active roles. By examining the weights of the connections between the active convolution kernels and the active hidden neurons, we can automatically differentiate various types of El Niño and La Niña and thereby identify the specific function of each part of the CNN. We anticipate that this progress will benefit future studies of climate prediction and contribute to a deeper understanding of artificial neural networks.

Key words: deep learning; El Niño-Southern Oscillation; convolutional neural network; interpretability

Cite this article

Yu Sun, Yusupjan Habibulla, Gaoke Hu, Jun Meng, Zhenghui Lu, Maoxin Liu, Xiaosong Chen. Unpacking the black box of deep learning for identifying El Niño-Southern oscillation[J]. Communications in Theoretical Physics, 2023, 75(9): 095601. DOI: 10.1088/1572-9494/ace17d

1. Introduction

Deep learning [1–6] has emerged as a powerful and adaptive paradigm for handling complexity and learning abstract representations from data. It has led to groundbreaking advances in fields such as Earth system science [7–12], biology [13, 14], finance [15], and transportation [16, 17]. However, the interpretability [8, 18–24] of deep learning models, often described as ‘black boxes’ [25–27], remains a pressing concern. Deriving human-comprehensible insights from these models [28–31] is crucial both for understanding them and for generating domain knowledge. Interpretation techniques such as layerwise relevance propagation [24, 32–34], saliency maps [35–38], and optimal input [39] have been used to unravel the inner mechanisms of deep learning models. These approaches, however, mainly identify the empirical input features that contribute to the model’s output rather than probing the causal clues within the black box itself [8, 40–43].

The El Niño-Southern Oscillation (ENSO) is a well-known mode of interannual climate variability [44, 45]. It is characterized by anomalous warming or cooling in the equatorial central and eastern Pacific, and these anomalies have far-reaching meteorological impacts worldwide [46–48]. Numerous studies have applied a variety of deep learning architectures to analyze and forecast the evolution of ENSO [9, 34, 49–51]. For example, Ham et al. proposed a deep ensemble prediction framework based on CNNs that forecasts El Niño events with higher accuracy than traditional statistical models [9]. Although this study improved ENSO prediction accuracy, it has potential shortcomings. The primary limitation is the limited interpretability of CNNs, which obscures the model’s decision-making process. In addition, the predictions depend on the quality of the training data, which may limit confidence in them.

Deep learning models can be viewed as compositions of constituent units called neurons [1–3]; their representation comprises both the parameters of individual neurons and the collective behavior that emerges from their activation states. Investigating individual neurons and their activation states is therefore fundamental to a comprehensive understanding of the representation. Moreover, the complexity of a deep learning model is intrinsically linked to the complexity of the task it learns, as exemplified by DeepMind’s research [52]. When a model performs a specific task perfectly, this implies that its representation encodes the task’s intrinsic features [53]. Given the measurability and transparency that deep learning models offer, their complexity should not serve as an excuse to avoid studying them; instead, it should motivate further investigation.

From this perspective, we propose a methodology for analyzing the internal representation of deep learning models and use it to elucidate the inner workings of a convolutional neural network (CNN) trained to classify ENSO phases from near-surface air temperature. The well-trained model identifies distinct ENSO phases precisely. Analysis of its internal representation reveals a condensed, simplified parameter structure that clarifies the task-specific function of each component. Remarkably, this parameter structure clearly distinguishes eastern Pacific (EP) from central Pacific (CP) El Niño patterns, and weak from extreme La Niña patterns, shedding light on crucial features of this natural phenomenon.

This study serves as a rudimentary model for probing the black boxes of complex systems, offering preliminary insights into how deep learning can be harnessed to understand phenomena within such systems. Because complexity shares common features across diverse systems, we believe that the perspective and methodology used here can be extended to more intricate models.

2. Data and methods

2.1. Data

The aim of our learning task is to identify the phase of ENSO from images of near-surface air temperature. ENSO is a basin-scale phenomenon involving coupled atmosphere-ocean processes, and it has three phases: El Niño, La Niña, and the normal phase. El Niño and La Niña refer to irregular warming and cooling, respectively, of the equatorial central and eastern Pacific. Following convention, we define the phases with the Oceanic Niño Index (ONI, https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php, accessed on 31 May 2019) [54], the three-month running mean of the sea surface temperature anomaly (SSTA) in the Niño3.4 region (5°N–5°S, 170°W–120°W). According to NOAA’s definition, a full-fledged El Niño (La Niña) event requires the ONI to be at or above +0.5 °C (at or below −0.5 °C) for at least five consecutive months.
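The threshold rule above can be sketched in Python as follows. This is a minimal illustration assuming a monthly series of Niño3.4 SSTA values is already available; the helper names `running_mean_3` and `enso_phases` are hypothetical, the three-month mean is taken as centered, and the text does not say how monthly labels are transferred to the daily samples.

```python
def running_mean_3(values):
    """Centered three-month running mean; the first and last months are undefined (None)."""
    out = [None] * len(values)
    for m in range(1, len(values) - 1):
        out[m] = (values[m - 1] + values[m] + values[m + 1]) / 3.0
    return out


def enso_phases(oni, threshold=0.5, min_run=5):
    """Label each month 'El Nino', 'La Nina', or 'Normal' from a monthly ONI series.

    A month belongs to an event only if it lies inside a run of at least
    `min_run` consecutive months with ONI >= +threshold (El Nino) or
    ONI <= -threshold (La Nina). Months with an undefined ONI stay 'Normal'.
    """
    n = len(oni)
    labels = ["Normal"] * n

    def mark(condition, name):
        start = None
        for m in range(n + 1):  # the extra step flushes a run that reaches the end
            inside = m < n and oni[m] is not None and condition(oni[m])
            if inside and start is None:
                start = m
            elif not inside and start is not None:
                if m - start >= min_run:
                    for k in range(start, m):
                        labels[k] = name
                start = None

    mark(lambda v: v >= threshold, "El Nino")
    mark(lambda v: v <= -threshold, "La Nina")
    return labels
```

On a toy series, a warm spell lasting five months is labeled El Niño, while a four-month cold spell stays normal because it is too short to count as an event.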

We use near-surface (2 m) air temperature as the input and the corresponding ENSO phase as the label. Daily 2 m air temperature data are obtained from the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) Reanalysis [55] (https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html, accessed on 31 May 2019) on a latitude-longitude grid with a spacing of 1.9° × 1.875°. Our region of interest spans 60°S to 60°N and contains N = 64 × 192 grid points. We encode the ENSO phase as a three-dimensional one-hot vector c(t), equal to (1, 0, 0) for El Niño, (0, 1, 0) for La Niña, and (0, 0, 1) for the normal phase.
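A minimal Python sketch of the label encoding and the grid bookkeeping described above. The phase names, `one_hot`, and the constants are illustrative assumptions; the sketch treats the 1.9° latitude spacing as uniform, so `lat_span` is only a nominal check that 64 rows cover roughly 60°S to 60°N.

```python
PHASES = ("El Nino", "La Nina", "Normal")


def one_hot(phase):
    """Return c(t): (1,0,0) for El Nino, (0,1,0) for La Nina, (0,0,1) for the normal phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return tuple(1 if p == phase else 0 for p in PHASES)


# Grid bookkeeping for the 60S-60N band, using the spacings quoted in the text.
LAT_STEP, LON_STEP = 1.9, 1.875
n_lat = 64                      # rows stated in the text
n_lon = round(360 / LON_STEP)   # columns covering the full longitude circle
N = n_lat * n_lon               # length of the input vector x(t)
lat_span = n_lat * LAT_STEP     # nominal latitude coverage in degrees
```

Here `n_lon` evaluates to 192 and `N` to 12,288, matching N = 64 × 192.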

For training and validation, we split the dataset chronologically into two parts. The training dataset covers 1 January 1950 to 31 December 1999 (18,262 days), and the validation dataset covers 1 January 2000 to 31 December 2018 (6,940 days). To filter out short-timescale fluctuations, we apply a 30-day sliding average with a step of 1 day. As a robustness check, appendix B compares these results with a model trained on the daily data without the sliding average.
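The day counts and the sliding average can be checked with a short Python sketch. Both periods are counted inclusively; the helper names are hypothetical, and because the text does not say whether the 30-day window is centered or trailing, `sliding_average` simply returns the mean of every full window, so with a 1-day step the smoothed series is 29 days shorter than the input.

```python
from datetime import date


def n_days(start, end):
    """Number of days from start to end, both inclusive."""
    return (end - start).days + 1


train_days = n_days(date(1950, 1, 1), date(1999, 12, 31))  # 18262
valid_days = n_days(date(2000, 1, 1), date(2018, 12, 31))  # 6940


def sliding_average(series, window=30, step=1):
    """Mean over each run of `window` consecutive days, advancing by `step` days.

    No edge padding is used, so with step == 1 the result has
    len(series) - window + 1 values.
    """
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, step)]
```

Recomputing each window sum keeps the sketch readable; for long gridded records a cumulative-sum formulation would be faster and gives the same values.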

We denote the near-surface air temperature at grid point i and time t by $S_i(t)$. Its average over the training period is

$$\langle S_i \rangle = \frac{1}{T} \sum_{t=t_0}^{t_1} S_i(t), \tag{1}$$

where t₀ is 1 January 1950, t₁ is 31 December 1999, and T represents the total number of days in this period. The temperature fluctuation of grid i at time t is given by

$$\delta S_i(t) = S_i(t) - \langle S_i \rangle. \tag{2}$$

The root mean square deviation for grid i is calculated as

$$\Delta_i = \sqrt{\frac{1}{T} \sum_{t=t_0}^{t_1} \left(\delta S_i(t)\right)^2}. \tag{3}$$

Because the magnitude of temperature fluctuations differs substantially between regions, we standardize the data at each grid point i as

$$x_i(t) = \frac{\delta S_i(t)}{\Delta_i}. \tag{4}$$

The input to the neural network is represented as

$$\boldsymbol{x}(t) = \left(x_1(t), x_2(t), \ldots, x_N(t)\right). \tag{5}$$
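Equations (1)–(5) amount to a per-grid-point z-score followed by flattening. Below is a minimal NumPy sketch, assuming the temperatures are held in an array of shape (time, 64, 192) and that validation data are standardized with the training-period statistics (equation (1) averages over the training period only). The function name `standardize` is hypothetical.

```python
import numpy as np


def standardize(S_train, S_other=None):
    """Per-grid-point standardization following equations (1)-(5).

    S_train: temperatures of shape (T, n_lat, n_lon) for the training period.
    S_other: optional array on the same grid (e.g. validation data); it is
             standardized with the training-period mean and RMS deviation.
    Returns an array of shape (time, N) whose rows are the input vectors x(t).
    """
    mean = S_train.mean(axis=0)                            # <S_i>, eq. (1)
    delta = np.sqrt(((S_train - mean) ** 2).mean(axis=0))  # Delta_i, eqs. (2)-(3)
    target = S_train if S_other is None else S_other
    x = (target - mean) / delta                            # eqs. (2) and (4); assumes Delta_i > 0
    return x.reshape(x.shape[0], -1)                       # eq. (5): N = n_lat * n_lon
```

The 1/T normalization in equation (3) corresponds to the population standard deviation, so each column of the standardized training data has zero mean and unit `np.std`.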

2.2. Convolutional neural network

In this study, we use a CNN, a classical and well-established deep learning architecture, to recognize ENSO phases within the deep learning paradigm, as shown in figure 1. The convolutional layer contains 16 convolution kernels, each of size 5 × 5. Each kernel acts as a filter: it slides over the input image, multiplies its weights element-wise with each local 5 × 5 patch, and sums the products, producing one feature map per kernel. The feature maps then pass through a 2 × 2 average pooling layer, which coarse-grains them by averaging values within 2 × 2 neighborhoods. The pooled feature maps are fully connected to a hidden layer of 100 neurons, which is in turn fully connected to an output layer of only 3 neurons. The output layer makes the decision, providing the output vector o(t) = (o₁(t), o₂(t), o₃(t)), where o₁(t), o₂(t), and o₃(t) are the predicted intensities for the El Niño, La Niña, and normal phases, respectively.
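To make the layer sizes concrete, here is a minimal NumPy sketch of one forward pass through a network with the shape described above. Several details are assumptions not stated in this section: 'same' zero padding (so each feature map stays 64 × 192 and pools to 32 × 96), ReLU activations after the convolutional and hidden layers (appendix C shows the ReLU function, but this section does not say where it is applied), a softmax over the three outputs, and random weights standing in for the trained parameters.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Layer sizes from the text: 64 x 192 input, 16 kernels of 5 x 5, 2 x 2 pooling,
# 100 hidden neurons, 3 outputs.
H, W, K, C, HID, OUT = 64, 192, 5, 16, 100, 3

# Hypothetical random parameters standing in for trained weights.
rng = np.random.default_rng(0)
kernels = rng.normal(scale=0.1, size=(C, K, K))
conv_bias = np.zeros(C)
W_hidden = rng.normal(scale=0.01, size=(HID, C * (H // 2) * (W // 2)))
b_hidden = np.zeros(HID)
W_out = rng.normal(scale=0.1, size=(OUT, HID))
b_out = np.zeros(OUT)


def relu(z):
    return np.maximum(z, 0.0)


def forward(image):
    """Map a standardized (H, W) field to the output vector o(t) of length 3."""
    padded = np.pad(image, K // 2)                 # assumed 'same' zero padding
    patches = sliding_window_view(padded, (K, K))  # (H, W, K, K) local patches
    # Multiply each patch element-wise with each kernel and sum: (C, H, W) feature maps.
    maps = relu(np.einsum("hwij,cij->chw", patches, kernels) + conv_bias[:, None, None])
    # 2 x 2 average pooling: (C, H/2, W/2).
    pooled = maps.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))
    hidden = relu(W_hidden @ pooled.ravel() + b_hidden)  # 100 hidden neurons
    logits = W_out @ hidden + b_out                      # 3 output neurons
    exp = np.exp(logits - logits.max())                  # assumed softmax
    return exp / exp.sum()
```

Under these assumptions the hidden layer receives 16 × 32 × 96 = 49,152 inputs; with 'valid' padding instead, each feature map would shrink to 60 × 188 and pool to 30 × 94.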


2.3. Unpacking the black box

3. Results

3.1. Identification of ENSO phases

3.2. Unpacking the neural network

3.2.1. Unpacking the convolutional layer

Figure 3. Visualization of the convolution kernels with their values shown as 5 × 5 nodes on the surface. Among the 16 convolution kernels, only kernel 5 (blue surface) and kernel 6 (red surface) have nonzero values.
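Figure 3’s observation that only kernels 5 and 6 are nonzero suggests a simple check on trained weights. A minimal NumPy sketch, assuming the kernels are stored as an array of shape (16, 5, 5); the function name `active_kernels` and the tolerance `tol` are hypothetical choices, not values from the paper.

```python
import numpy as np


def active_kernels(kernels, tol=1e-6):
    """Return 1-based indices of kernels whose Frobenius norm exceeds `tol`.

    kernels: array of shape (n_kernels, 5, 5) with the trained kernel weights.
    """
    norms = np.sqrt((kernels ** 2).sum(axis=(1, 2)))
    return [int(i) + 1 for i in np.flatnonzero(norms > tol)]
```

Indices are reported from 1 so that they match the kernel numbering used in the figure captions.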

3.2.2. Unpacking the hidden layer

Figure 4. (a)–(b) Visualization of nontrivial parameters for hidden neuron 100. The connection parameters associated with feature maps derived from convolution kernels 6 and 5 are visualized in (a) and (b), respectively.

Figure 5. (a)–(b) Visualization of nontrivial parameters for hidden neuron 50. The connection parameters associated with feature maps derived from convolution kernels 6 and 5 are visualized in (a) and (b), respectively.
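Figures 4 and 5 single out hidden neurons 100 and 50. One way to flag active hidden neurons from their activation outputs is sketched below in NumPy. It assumes post-ReLU hidden outputs collected over a dataset in an array of shape (n_samples, 100); the firing-fraction criterion and the `min_fraction` threshold are hypothetical, not the paper’s stated statistic.

```python
import numpy as np


def active_neurons(hidden_outputs, min_fraction=0.01):
    """Return 1-based indices of hidden neurons whose output is positive on at
    least `min_fraction` of the samples.

    hidden_outputs: array of shape (n_samples, n_hidden) of post-ReLU activations.
    """
    firing = (hidden_outputs > 0).mean(axis=0)  # fraction of samples each neuron fires on
    return [int(i) + 1 for i in np.flatnonzero(firing >= min_fraction)]
```

A neuron that never fires on any sample contributes nothing downstream, so it can be set aside when tracing which connections drive the output.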

3.2.3. Unpacking the output layer

4. Conclusions

Acknowledgments

Appendix A. Demonstration of the effects of convolution kernels

Figure A1. Illustration of convolution kernels with the input image sampled from 1 October 2015, during the EP El Niño period.

Figure A2. Illustration of convolution kernels with the input image sampled from 1 November 2009, during the CP El Niño period.

Figure A3. Illustration of convolution kernels with the input image sampled from 1 February 2018, during the weak La Niña period.

Figure A4. Illustration of convolution kernels with the input image sampled from 1 January 2008, during the extreme La Niña period.

Figure A5. Illustration of convolution kernels with the input image sampled from 1 March 2007, during the normal period.

Appendix B. Reproducibility and the robustness of results

Figure B1. The monthly predictions for the validation dataset (2000–2018).

Figure B2. The monthly predictions for the validation dataset (2000–2018) in parallel training 1.

Figure B3. The monthly predictions for the validation dataset (2000–2018) in parallel training 2.

Figure B4. The nontrivial outputs of the hidden layer for the validation dataset (2000–2018).

Figure B5. The nontrivial outputs of the hidden layer for the validation dataset (2000–2018) in parallel training 1.

Figure B6. The nontrivial outputs of the hidden layer for the validation dataset (2000–2018) in parallel training 2.

Figure B7. The outputs of the output layer for the validation dataset (2000–2018).

Figure B8. The outputs of the output layer for the validation dataset (2000–2018) in parallel training 1.

Figure B9. The outputs of the output layer for the validation dataset (2000–2018) in parallel training 2.

Figure B10. Visualization of the convolution kernels in parallel training 1 with their values shown as 5 × 5 nodes on the surface.

Figure B11. Visualization of nontrivial parameters in the hidden layer of parallel training 1.

Figure B12. Visualization of the parameters for output neurons in parallel training 1.

Figure B13. Visualization of the convolution kernels in parallel training 2 with their values shown as 5 × 5 nodes on the surface.

Figure B14. Visualization of nontrivial parameters in the hidden layer of parallel training 2.

Figure B15. Visualization of the parameters for output neurons in parallel training 2.

Appendix C. Details about the CNN

C.1. Neurons and convolution kernels

Figure C2. The ReLU function. $\mathrm{ReLU}(x)=\max (x,0)$.

C.2. Loss function and L2 parameter regularization

References
