
Quantifying quantum entanglement via machine learning models

  • Changchun Feng 1, 2
  • Lin Chen 1, 2, 3
  • 1 LMIB (Beihang University), Ministry of Education, Beijing 100191, China
  • 2 School of Mathematical Sciences, Beihang University, Beijing 100191, China
  • 3 International Research Institute for Multidisciplinary Science, Beihang University, Beijing 100191, China

Received date: 2024-01-29
Revised date: 2024-04-19
Accepted date: 2024-04-19
Online published: 2024-06-12

Copyright

© 2024 Institute of Theoretical Physics CAS, Chinese Physical Society and IOP Publishing

Abstract

Quantifying entanglement measures for quantum states with unknown density matrices is a challenging task. Machine learning offers a new perspective to address this problem. By training machine learning models on experimentally measurable data, we can predict the target entanglement measures. In this study, we compare various machine learning models and find that the linear regression and stacking models perform better than the others. We investigate the models' performance on quantum states of different dimensions and find that higher-dimensional quantum states yield better results. Additionally, we investigate which measurable data have better predictive power for the target entanglement measures. Using correlation analysis and principal component analysis, we demonstrate that, among these data features, the quantum moments exhibit the strongest correlation with the coherent information.

Cite this article

Changchun Feng, Lin Chen. Quantifying quantum entanglement via machine learning models[J]. Communications in Theoretical Physics, 2024, 76(7): 075104. DOI: 10.1088/1572-9494/ad4090

1. Introduction

Quantum entanglement is an important physical resource in quantum information processing [1–3]. It is widely used in quantum computing [4], teleportation [5], dense coding [6], cryptography [7] and quantum key distribution [8]. To quantify the amount of entanglement of quantum states, numerous measures have been proposed [9, 10]. The von Neumann entropy of subsystems is used to define the entanglement measure of bipartite pure states. However, for bipartite mixed states, there are still numerous open questions regarding entanglement quantification [11]. In more complex cases, i.e. quantifying multipartite quantum entanglement, many different measures have been proposed, but they are hard to calculate [12, 13].
Quantum engineering has made great progress in recent years, providing new approaches for detecting and quantifying entanglement. Determining whether a quantum state is entangled or not is a non-deterministic polynomial (NP)-hard problem [14]. Quantifying the entanglement of unknown states is even harder than determining the existence of entanglement for states with known density matrices, and in this case it is usually very costly to characterize the target quantum states by quantum measurements.
In recent years, machine learning has been employed to quantitatively estimate quantum entanglement experimentally [15–18]. A range of experimentally accessible data from quantum states is collected, which is then used to train machine learning models. These models can predict the entanglement measures of previously unseen quantum states. Our paper focuses on this method for predicting entanglement measures.
As for entanglement measures, some geometrically motivated entanglement measures have provided us with new insights into quantum entanglement, e.g. entanglement of formation [19], relative entropy of entanglement [20, 21], global robustness [22, 23] and squashed entanglement [24].
In this paper, we review several entanglement measures in section 2, e.g. the relative entropy of entanglement, the coherent information, the moments, the entanglement of formation and the von Neumann entropy. Then we introduce some classical machine learning methods in section 3. Meanwhile, we generate some random quantum states. We choose the coherent information as the data label, i.e. the target of prediction, and treat the remaining quantum measures as data features to train the machine learning models. We compare the predictive power of these machine learning models. Our numerical simulations indicate that the linear regression and stacking models outperform the rest. We explore the models' performance on quantum states of different dimensions and find that higher-dimensional quantum states yield better results. Furthermore, we investigate which quantum measures provide more accurate predictions for the coherent information. It turns out that the quantum moments exhibit a stronger correlation with the coherent information. We then conduct principal component analysis to extract the principal components of these data features. The results confirm our conclusion: among these data features, the quantum moments exhibit the strongest correlation with the coherent information.

2. Quantum measures

Suppose ρ is a bipartite state shared by two separated parties, Alice and Bob. We denote the set of measurement devices of Alice (Bob) as X (Y), and denote the possible measurement outcomes as A (B). Alice and Bob repeat the measurement many times, and we calculate the joint conditional probabilities p(ab∣xy), i.e. the probability of obtaining the outcomes (a, b) ∈ A × B upon selecting the measurement settings (x, y) ∈ X × Y. Suppose $\{{M}_{x}^{a}\}$ is the operator of the quantum measurement performed by Alice's measurement device x ∈ X, where a ∈ A. Similarly, we make the analogous assumption about $\{{N}_{y}^{b}\}$. Then, we have the following observation.
$\begin{eqnarray}p({ab}| {xy})=\mathrm{Tr}[({{M}}_{x}^{a}\otimes {{N}}_{y}^{b})\rho ].\end{eqnarray}$
The correlation p = [p(ab∣xy)] is composed of all the joint conditional probabilities of the form p(ab∣xy).
Then we introduce the moments of quantum states, defined as
$\begin{eqnarray}{\mu }_{m}(\rho )=\mathrm{Tr}({\rho }^{m}).\end{eqnarray}$
Clearly, ${\mu }_{1}(\rho )=\mathrm{Tr}(\rho )=1$. Experimentally, we can measure μm(ρ) by performing joint measurements on m copies of the same state ρ. This operation is very hard under current quantum technologies, especially when m is large.
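For a state with a known density matrix, the moments above can be evaluated numerically. The following minimal NumPy sketch (our own illustration, not taken from the paper) implements the definition directly:

```python
import numpy as np

def moments(rho: np.ndarray, max_order: int = 3) -> list:
    """Return [mu_1, ..., mu_max_order] with mu_m = Tr(rho^m)."""
    # A density matrix is Hermitian, so Tr(rho^m) = sum_i lambda_i^m,
    # where lambda_i are its (real, non-negative) eigenvalues.
    eigvals = np.linalg.eigvalsh(rho)
    return [float(np.sum(eigvals ** m)) for m in range(1, max_order + 1)]

rho = np.eye(2) / 2                 # maximally mixed qubit state
print(moments(rho))                 # [1.0, 0.5, 0.25]
```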
The supervised machine learning system is composed of three ingredients: data features, data labels, and a machine learning model. We need to collect training data with the right labels. Then, we feed them into the machine learning model, such that the model can predict labels of the test data precisely. In our problem, the data features of quantum data are correlation data or moments, data labels of quantum states are the values of entanglement measures, and the machine learning model learns the relationship between the above data features and data labels.
In our study, we choose the coherent information and the relative entropy of entanglement as labels. Suppose ρ is a bipartite quantum state $\rho \in {{ \mathcal H }}_{A}\otimes {{ \mathcal H }}_{B}$. The coherent information of ρ is defined as ${I}_{C}(\rho )=S({\rho }_{A})-S(\rho )$, where S(ρ) is the von Neumann entropy of ρ and ${\rho }_{A}={\mathrm{Tr}}_{B}(\rho )$ is the reduced state on ${{ \mathcal H }}_{A}$.
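As an illustration of this definition, here is a minimal NumPy sketch (our own, assuming the full density matrix of the bipartite state is available numerically) of the coherent information ${I}_{C}(\rho )=S({\rho }_{A})-S(\rho )$:

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log2 rho), evaluated from the eigenvalues."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]              # drop numerical zeros
    return float(-np.sum(eigvals * np.log2(eigvals)))

def partial_trace_B(rho: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """rho_A = Tr_B(rho) for a state on H_A (dim dA) tensor H_B (dim dB)."""
    return rho.reshape(dA, dB, dA, dB).trace(axis1=1, axis2=3)

def coherent_information(rho: np.ndarray, dA: int, dB: int) -> float:
    """I_C(rho) = S(rho_A) - S(rho)."""
    return von_neumann_entropy(partial_trace_B(rho, dA, dB)) - von_neumann_entropy(rho)

# Example: a maximally entangled two-qubit state gives I_C = 1.
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(coherent_information(bell, 2, 2))             # 1.0
```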
The relative entropy of entanglement of a quantum state ρ is defined as
$\begin{eqnarray}\begin{array}{l}{E}_{R}(\rho )=\mathop{\min }\limits_{\sigma \in {\rm{SEP}}}S(\rho | | \sigma )\\ =\mathop{\min }\limits_{\sigma \in {\rm{SEP}}}\mathrm{Tr}(\rho {\mathrm{log}}_{2}\rho -\rho {\mathrm{log}}_{2}\sigma ),\end{array}\end{eqnarray}$
where SEP denotes the set of all separable states. The entanglement of formation is defined as follows. Suppose ρ is a quantum state with pure-state decomposition $\rho ={\sum }_{i}{p}_{i}| {\psi }_{i}\rangle \langle {\psi }_{i}| $. For each pure state ∣ψ⟩, the entanglement E is defined as the entropy of either of the two subsystems A and B:
$\begin{eqnarray}E(\psi )=-\mathrm{Tr}({\rho }_{A}{\mathrm{log}}_{2}{\rho }_{A})=-\mathrm{Tr}({\rho }_{B}{\mathrm{log}}_{2}{\rho }_{B}),\end{eqnarray}$
where ρA is the partial trace of ∣ψ⟩⟨ψ∣ over subsystem B, and ρB is the partial trace of ∣ψ⟩⟨ψ∣ over subsystem A. The entanglement of formation of the mixed state ρ is defined as $E(\rho )=\min {\sum }_{i}{p}_{i}E({\psi }_{i})$, where the minimum is taken over all pure-state decompositions of ρ. One can show that the pure-state entanglement in equation (4) can be written as $E(\psi )={\mathbb{E}}(C(\psi ))$, where the concurrence C is $C(\psi )=| \langle \psi | \widetilde{\psi }\rangle | $ with $| \widetilde{\psi }\rangle =({\sigma }_{y}\otimes {\sigma }_{y})| {\psi }^{* }\rangle $ (the star denotes complex conjugation in the computational basis), and the function ${\mathbb{E}}$ is ${\mathbb{E}}(C)=h\left(\tfrac{1+\sqrt{1-{C}^{2}}}{2}\right)$ with $h(x)=-x{\mathrm{log}}_{2}x-(1-x){\mathrm{log}}_{2}(1-x)$.
Having defined the function ${\mathbb{E}}$, we can now present the entanglement of formation of a two-qubit mixed state ρ: $E(\rho )={\mathbb{E}}(C(\rho ))$, where $C(\rho )=\max \{0,{\lambda }_{1}-{\lambda }_{2}-{\lambda }_{3}-{\lambda }_{4}\}$, the λi are the eigenvalues, in decreasing order λ1 ≥ λ2 ≥ λ3 ≥ λ4, of the Hermitian matrix $R=\sqrt{\sqrt{\rho }\widetilde{\rho }\sqrt{\rho }}$, and $\widetilde{\rho }=({\sigma }_{y}\otimes {\sigma }_{y}){\rho }^{* }({\sigma }_{y}\otimes {\sigma }_{y})$. In other words, the λi are the square roots of the eigenvalues of the non-Hermitian matrix $\rho \widetilde{\rho }$. We note that each λi is a non-negative real number.
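The two-qubit formula above can be implemented directly. The following sketch (ours, using the standard Wootters construction summarized above) computes the concurrence and the entanglement of formation from a numerical two-qubit density matrix:

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
SY_SY = np.kron(SIGMA_Y, SIGMA_Y)

def concurrence(rho: np.ndarray) -> float:
    """Wootters concurrence C(rho) of a two-qubit density matrix."""
    rho_tilde = SY_SY @ rho.conj() @ SY_SY          # spin-flipped state
    # lambda_i: square roots of the eigenvalues of rho * rho_tilde, in decreasing order.
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_tilde))))[::-1]
    return max(0.0, float(lam[0] - lam[1] - lam[2] - lam[3]))

def binary_entropy(x: float) -> float:
    """h(x) = -x log2 x - (1 - x) log2 (1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def entanglement_of_formation(rho: np.ndarray) -> float:
    """E(rho) = h((1 + sqrt(1 - C^2)) / 2) for a two-qubit state."""
    c = concurrence(rho)
    return binary_entropy((1 + np.sqrt(1 - c ** 2)) / 2)

bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(concurrence(bell), entanglement_of_formation(bell))   # 1.0 1.0
```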

3. Predicting entanglement by machine learning models

We introduce how to collect the data features of quantum states. Suppose ρ is an arbitrary bipartite quantum state on ${{ \mathcal H }}^{3}\otimes {{ \mathcal H }}^{3}$. More generally, we generate a random quantum state on ${{ \mathcal H }}^{d}\otimes {{ \mathcal H }}^{d}$ as
$\begin{eqnarray}\rho =\displaystyle \sum _{i\,=\,0}^{k-1}{\lambda }_{i}| {u}_{i}\rangle \langle {u}_{i}| ,\end{eqnarray}$
where k ∈ {1, 2,…, d²}, the positive weights λi are drawn uniformly from the interval [0, 1], and the state is normalized so that Tr ρ = 1. We then sample quantum states whose coherent information lies in the interval $[-1.5,{\mathrm{log}}_{2}3]$, in steps of 0.1.
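One possible way to realize this sampling is sketched below. The paper does not specify how the vectors ∣ui⟩ are drawn, so we assume Haar-random orthonormal vectors and normalize the weights so that Tr ρ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(d: int, k: int) -> np.ndarray:
    """Random rank-k density matrix on H^d (x) H^d (total dimension d^2)."""
    dim = d * d
    # Haar-random orthonormal vectors |u_i> from the QR decomposition of a
    # complex Gaussian matrix (an assumption about the sampling procedure).
    g = rng.normal(size=(dim, k)) + 1j * rng.normal(size=(dim, k))
    u, _ = np.linalg.qr(g)
    # Positive weights drawn uniformly from [0, 1], normalized so that Tr(rho) = 1.
    lam = rng.uniform(0.0, 1.0, size=k)
    lam /= lam.sum()
    return (u * lam) @ u.conj().T

rho = random_state(d=3, k=5)
print(np.trace(rho).real)                           # ~1.0
```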
After sampling the training quantum states, we generate the data features of these quantum states. First, we calculate the correlations of these states as data features. Suppose Alice's measurement, Ak, takes the following form,
$\begin{eqnarray}| r{\rangle }_{{A}_{k}}=\displaystyle \frac{1}{\sqrt{d}}\displaystyle \sum _{q=0}^{d-1}\exp \left(\displaystyle \frac{2\pi {\rm{i}}}{d}q(r-{\alpha }_{k})\right)| q{\rangle }_{A}.\end{eqnarray}$
Bob's measurement, Bl, takes the following form,
$\begin{eqnarray}| r{\rangle }_{{B}_{l}}=\displaystyle \frac{1}{\sqrt{d}}\displaystyle \sum _{q=0}^{d-1}\exp \left(\displaystyle \frac{2\pi {\rm{i}}}{d}q(r-{\beta }_{l})\right)| q{\rangle }_{B},\end{eqnarray}$
where 0 ≤ r ≤ d − 1 and 1 ≤ l, k ≤ N. Meanwhile, to apply the moment method, we generate two different sets of training data, which contain different orders of moments. We then use different machine learning and deep learning methods to predict the coherent information.
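For concreteness, the following sketch computes the correlation features p(ab∣xy) = Tr[(Mxa ⊗ Nyb)ρ] from the Fourier-type bases above. The phase offsets αk and βl used in the paper are not stated, so the values below are placeholders:

```python
import numpy as np

def basis_vector(d: int, r: int, phase: float) -> np.ndarray:
    """|r> in the rotated Fourier basis with offset `phase` (alpha_k or beta_l)."""
    q = np.arange(d)
    return np.exp(2j * np.pi * q * (r - phase) / d) / np.sqrt(d)

def correlations(rho: np.ndarray, d: int, alphas, betas) -> dict:
    """p(a, b | k, l) = Tr[(M_k^a (x) N_l^b) rho] for all settings and outcomes."""
    p = {}
    for k, alpha in enumerate(alphas, start=1):
        for l, beta in enumerate(betas, start=1):
            for a in range(d):
                for b in range(d):
                    v = np.kron(basis_vector(d, a, alpha), basis_vector(d, b, beta))
                    p[(a, b, k, l)] = float(np.real(v.conj() @ rho @ v))
    return p

# Placeholder phase settings (assumed, not taken from the paper).
d = 2
rho = np.eye(d * d) / (d * d)                       # maximally mixed state
p = correlations(rho, d, alphas=[0.0, 0.5], betas=[0.25, -0.25])
print(p[(0, 0, 1, 1)])                              # 0.25 for this state
```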
Linear regression is a basic and commonly used type of predictive analysis. K-nearest neighbors (KNN) is a fundamental and intuitive algorithm in machine learning, widely used for both classification and regression tasks. To improve the performance of our models, we also use ensemble learning methods, which we introduce next.
Bagging, which stands for bootstrap aggregating, is a method used to reduce the variance of a decision tree: each tree is trained on a bootstrap sample of the original data (drawn with replacement), and the predictions are aggregated. Random forest is a supervised learning algorithm that is used for both classification and regression tasks [25]. It builds multiple decision trees and merges them together to obtain a more accurate and stable prediction [26].
Boosting is a method used to reduce the bias of a decision tree by training models iteratively: each subsequent model is built to correct the errors of the previous one. AdaBoost, short for adaptive boosting, is a popular boosting technique that combines multiple weak classifiers to build a strong classifier [27]. Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees [28]. XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. LightGBM is a gradient boosting framework that uses tree-based learning algorithms; it is designed to be distributed and efficient, with faster training speed and higher efficiency, lower memory usage, and better accuracy.
Stacking is a method that combines the predictions of multiple models to create a final prediction [29]: the predictions of each base model are used as input to a final model, which is trained to minimize the error of the ensemble. We briefly compare these machine learning models in table 1.
Table 1. Comparison between the different machine learning models.
Model | Applicable scenarios | Advantages | Disadvantages
Linear regression | There is a linear relationship between features and the target | Simple computation; strong interpretability | Limited ability to handle nonlinear relationships; sensitive to outliers
KNN | The distance metric in the feature space is meaningful | Simple and intuitive; no training process required | High computational cost; sensitive to feature scaling
Bagging | Reducing the variance of the model and improving stability | Reduces overfitting risk and improves prediction accuracy | May not significantly improve prediction accuracy
Boosting | Converting weak learners into strong learners | Improves prediction accuracy; robust to outliers and noisy data | Sensitive to the choice of base learners
Stacking | There is significant diversity and complementarity between base learners | Leverages the complementarity of base learners, improving prediction accuracy | High computational complexity; sensitive to the choice of base learners
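As an illustration of the stacking approach, here is a minimal scikit-learn sketch. The base learners, hyperparameters and data are assumptions, since the paper does not specify them; random placeholders are used only so that the sketch runs end to end:

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X: data features (correlations, moments, entropies), y: coherent information.
X, y = np.random.rand(1000, 9), np.random.rand(1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("knn", KNeighborsRegressor()),
        ("rf", RandomForestRegressor(random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),        # meta-model trained on the base predictions
)
stack.fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, stack.predict(X_te)))
print(f"stacking RMSE: {rmse:.3f}")
```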
Table 2 shows the performance of the different learning methods.
In table 2, we see that the root-mean-square error (RMSE) of predicting the coherent information with the stacking model is 0.019, and that of linear regression is 0.020; these two learning methods perform best on this task. For the input data features, we also calculate their pairwise Pearson correlations.
Table 2. The RMSEs of predicting the coherent information by the different learning methods.
Methods RMSE
Linear regression 0.020
KNN 0.051
Bagging 0.105
Random forest 0.103
AdaBoost 0.134
XGBoost 0.038
Stacking 0.019
NN 0.159
In figure 1, ‘p_11', ‘p_12', ‘p_21' and ‘p_22' denote the correlation features of the quantum states, ‘miu_2' and ‘miu_3' denote the second and third moments, ‘vne_A' and ‘vne_B' denote the von Neumann entropies of the reduced states on subsystems A and B, and ‘f_e' denotes the entanglement of formation (formation entropy) of the quantum states.
Figure 1. The Pearson correlation of input data features.
We find that ‘f_e' has a relatively low correlation with ‘vne_A' and ‘vne_B', and that ‘vne_A' and ‘vne_B' have a relatively low correlation with each other. ‘miu_2' and ‘miu_3' have a strong positive correlation. The von Neumann entropies of the reduced states and the moments of the quantum states have a strong negative correlation.
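The Pearson correlation matrix of figure 1 can be computed directly from a feature table. A minimal pandas sketch (with placeholder data; in practice each row would hold the features of one sampled state) is:

```python
import numpy as np
import pandas as pd

cols = ["p_11", "p_12", "p_21", "p_22", "miu_2", "miu_3", "vne_A", "vne_B", "f_e"]
# Placeholder data standing in for the features of the sampled states.
df = pd.DataFrame(np.random.rand(1000, len(cols)), columns=cols)
corr = df.corr(method="pearson")    # Pearson correlation matrix, as in figure 1
print(corr.round(2))
```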

3.1. Quantum states of different dimensions

In this section, we investigate how the models perform on quantum states of different dimensions. The formation entropy is hard to calculate for quantum states on ${{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3}$. To compare the effects in different dimensions, we therefore choose the same data features as input features in both cases, i.e. ‘p_11', ‘p_12', ‘p_21', ‘p_22', ‘miu_2', ‘miu_3', ‘vne_A' and ‘vne_B'.
From table 3, we see that the RMSEs of linear regression and stacking for $\rho \in { \mathcal B }({{ \mathcal H }}_{2}\otimes {{ \mathcal H }}_{2})$ are both 0.020; these two learning methods perform best on this task. Similarly, for $\rho \in { \mathcal B }({{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3})$ the RMSEs of linear regression and stacking are both 0.015. Thus linear regression and stacking perform best for quantum states of both dimensions. Furthermore, we observe that the RMSEs for $\rho \in { \mathcal B }({{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3})$ are lower than those for $\rho \in { \mathcal B }({{ \mathcal H }}_{2}\otimes {{ \mathcal H }}_{2})$. Higher-dimensional quantum states usually carry more information, so their RMSEs tend to be smaller.
Table 3. The RMSEs of predicting the coherent information by different learning methods and different dimensional quantum states.
Methods RMSE($\rho \in { \mathcal B }({{ \mathcal H }}_{2}\otimes {{ \mathcal H }}_{2})$) RMSE($\rho \in { \mathcal B }({{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3})$)
Linear regression 0.020 0.015
KNN 0.051 0.040
Bagging 0.105 0.074
Random forest 0.108 0.078
AdaBoost 0.122 0.063
XGBoost 0.038 0.030
Stacking 0.020 0.015
NN 0.159 0.076

3.2. Univariate linear regression

We want to know how much each of these features contributes to the model. To this end, we use only one input data feature at a time for linear regression. Table 4 shows the performance of each individual feature.
Table 4. The RMSEs of predicting the coherent information using each individual data feature.
Data features RMSE
p_11 0.12
p_12 0.16
p_21 0.13
p_22 0.16
miu_2 0.07
miu_3 0.09
vne_A 0.15
vne_B 0.13
f_e 0.13
From this table, we see that the RMSE of predicting the coherent information from ‘miu_2' alone is 0.07, the lowest among all features. This indicates that ‘miu_2' has the strongest correlation with the coherent information.
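A minimal sketch of this univariate comparison is given below; the data are placeholders, and in practice each column of X would hold one of the features listed in table 4 and y the coherent information:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

features = ["p_11", "p_12", "p_21", "p_22", "miu_2", "miu_3", "vne_A", "vne_B", "f_e"]
# Placeholder data standing in for the feature matrix and labels.
X, y = np.random.rand(1000, len(features)), np.random.rand(1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for i, name in enumerate(features):
    # Fit a linear regression on a single feature and report its RMSE.
    model = LinearRegression().fit(X_tr[:, [i]], y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te[:, [i]])))
    print(f"{name}: RMSE = {rmse:.2f}")
```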

3.3. Principal component analysis

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors are an uncorrelated orthogonal basis set.
We then use PCA to reduce the dimension of the input data features. The first and second principal components are given by the two columns of the following matrix, with rows ordered as p_11, p_12, p_21, p_22, miu_2, miu_3, vne_A, vne_B, f_e.
$\begin{eqnarray}\left[\begin{array}{cc}0.23 & 0.07\\ 0.03 & 0.69\\ 0.24 & -0.04\\ 0.07 & -0.71\\ 0.49 & 0.02\\ 0.47 & 0.01\\ -0.39 & 0.02\\ -0.40 & -0.05\\ 0.34 & 0\end{array}\right]\end{eqnarray}$
The first component is 0.23 p_11 + 0.03 p_12 + 0.24 p_21 + 0.07 p_22 + 0.49 miu_2 + 0.47 miu_3 − 0.39 vne_A − 0.40 vne_B + 0.34 f_e.
We find that the coefficients of the quantum moments are the largest in magnitude. This means that the quantum moments play an important role in the task of predicting the coherent information.
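A minimal scikit-learn sketch of the PCA step is given below; the feature matrix is a placeholder, and whether the features were standardized before PCA is not stated in the paper, so no scaling is applied here:

```python
import numpy as np
from sklearn.decomposition import PCA

features = ["p_11", "p_12", "p_21", "p_22", "miu_2", "miu_3", "vne_A", "vne_B", "f_e"]
X = np.random.rand(1000, len(features))     # placeholder feature matrix

pca = PCA(n_components=2)
pca.fit(X)
# Each row of components_ holds the weights of one principal component
# over the nine input features (the columns of the matrix above).
for comp in pca.components_:
    print(" + ".join(f"{w:.2f}*{name}" for w, name in zip(comp, features)))
print(pca.explained_variance_ratio_)        # variance captured by each component
```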

4. Conclusions

We have evaluated the predictive power of various machine learning models. Our numerical simulations indicate that the linear regression and stacking models outperform the rest. Furthermore, we examined which quantum measures provide more accurate predictions for the coherent information; it turns out that the quantum moments exhibit a stronger correlation with the coherent information. We also examined the models' performance on quantum states of different dimensions and found that higher-dimensional quantum states yield better results. We then conducted principal component analysis to extract the principal components of these data features. The results confirm our conclusion: among these data features, the quantum moments exhibit the strongest correlation with the coherent information. Additionally, we provided new insights into quantifying the correlation between two entanglement measures. In the future, we will investigate how to select more appropriate models to minimize prediction errors.

The authors were supported by the NNSF of China (Grant No. 11871089).

[1] Schrödinger E 1935 Die gegenwärtige Situation in der Quantenmechanik Naturwissenschaften 23 844–849

[2] Nielsen M A, Chuang I L 2000 Quantum Computation and Quantum Information (Cambridge: Cambridge University Press)

[3] Horodecki R, Horodecki P, Horodecki M, Horodecki K 2009 Quantum entanglement Rev. Mod. Phys. 81 865

[4] Walther P, Resch K J, Rudolph T, Schenck E, Weinfurter H, Vedral V, Aspelmeyer M, Zeilinger A 2005 Experimental one-way quantum computing Nature 434 169–176

[5] Riebe M, Häffner H, Roos C F, Hänsel W, Benhelm J, Lancaster G P T et al 2004 Deterministic quantum teleportation with atoms Nature 429 734–737

[6] Li X, Pan Q, Jing J, Zhang J, Peng K 2002 Quantum dense coding exploiting a bright Einstein–Podolsky–Rosen beam Phys. Rev. Lett. 88 047904

[7] Yin J, Li Y H, Liao S K, Yang M, Pan J W 2020 Entanglement-based secure quantum cryptography over 1,120 kilometres Nature 582 1–5

[8] Xu F, Ma X, Zhang Q, Lo H-K, Pan J-W 2020 Secure quantum key distribution with realistic devices Rev. Mod. Phys. 92 025002

[9] Horodecki R, Horodecki P, Horodecki M, Horodecki K 2009 Quantum entanglement Rev. Mod. Phys. 81 865–942

[10] Amico L, Fazio R, Osterloh A, Vedral V 2008 Entanglement in many-body systems Rev. Mod. Phys. 80 517–576

[11] Horodecki M, Horodecki P, Horodecki R 1998 Mixed-state entanglement and distillation: is there a ‘bound' entanglement in nature? Phys. Rev. Lett. 80 5239–5242

[12] Barnum H, Linden N 2001 Monotones and invariants for multi-particle quantum states J. Phys. A: Math. Gen. 34 6787

[13] Vedral V, Plenio M B 1998 Entanglement measures and purification procedures Phys. Rev. A 57 1619

[14] Gurvits L 2004 Classical complexity and quantum entanglement J. Comput. Syst. Sci. 69 448–484

[15] Lu S, Huang S, Li K, Li J, Chen J, Lu D, Ji Z, Shen Y, Zhou D, Zeng B 2018 Separability-entanglement classifier via machine learning Phys. Rev. A 98 012315

[16] Gray J, Banchi L, Bayat A, Bose S 2018 Machine-learning-assisted many-body entanglement measurement Phys. Rev. Lett. 121 150503

[17] Wei Z, Lin X, Chen Z 2023 Quantifying quantum entanglement via a hybrid quantum-classical machine learning framework Phys. Rev. A 107 062409

[18] Cernoch A, Lemr K, Roik J, Bartkiewicz K 2022 Entanglement quantification from collective measurements processed by machine learning arXiv:2203.01607

[19] Bennett C H, DiVincenzo D P, Smolin J A, Wootters W K 1996 Mixed-state entanglement and quantum error correction Phys. Rev. A 54 3824–3851

[20] Vedral V, Plenio M B, Rippin M A, Knight P L 1997 Quantifying entanglement Phys. Rev. Lett. 78 2275–2279

[21] Vedral V, Plenio M B 1998 Entanglement measures and purification procedures Phys. Rev. A 57 1619–1633

[22] Harrow A W, Nielsen M A 2003 Robustness of quantum gates in the presence of noise Phys. Rev. A 68 012308

[23] Vidal G, Tarrach R 1999 Robustness of entanglement Phys. Rev. A 59 141–155

[24] Christandl M, Winter A 2004 ‘Squashed entanglement': an additive entanglement measure J. Math. Phys. 45 829–840

[25] Breiman L 2001 Random forests Mach. Learn. 45 5–32

[26] Quinlan J R 1986 Induction of decision trees Mach. Learn. 1 81–106

[27] Freund Y, Schapire R E 1995 A decision-theoretic generalization of on-line learning and an application to boosting Comput. Learn. Theor. 904 23–37

[28] Berk R A 2008 Boosting Statistical Learning from a Regression Perspective (Springer) 297–337

[29] Schonlau M 2023 Stacking Applied Statistical Learning (Springer) 323–328
