
Quantifying quantum entanglement via machine learning models

  • Changchun Feng 1, 2
  • Lin Chen 1, 2, 3
  • 1 LMIB (Beihang University), Ministry of Education, Beijing 100191, China
  • 2 School of Mathematical Sciences, Beihang University, Beijing 100191, China
  • 3 International Research Institute for Multidisciplinary Science, Beihang University, Beijing 100191, China

Received date: 2024-01-29
Revised date: 2024-04-19
Accepted date: 2024-04-19
Online published: 2024-06-12

Copyright

© 2024 Institute of Theoretical Physics CAS, Chinese Physical Society and IOP Publishing

Abstract

Quantifying entanglement measures for quantum states with unknown density matrices is a challenging task. Machine learning offers a new perspective to address this problem. By training machine learning models on experimentally measurable data, we can predict the target entanglement measures. In this study, we compare various machine learning models and find that the linear regression and stacking models perform better than the others. We investigate the models' performance on quantum states of different dimensions and find that higher-dimensional quantum states yield better results. Additionally, we investigate which measurable data have better predictive power for the target entanglement measures. Using correlation analysis and principal component analysis, we demonstrate that, among these data features, the quantum moments exhibit the strongest correlation with the coherent information.

Cite this article

Changchun Feng, Lin Chen. Quantifying quantum entanglement via machine learning models[J]. Communications in Theoretical Physics, 2024, 76(7): 075104. DOI: 10.1088/1572-9494/ad4090

1. Introduction

Quantum entanglement is an important physical resource in quantum information processing [1–3]. It is widely used in quantum computing [4], teleportation [5], dense coding [6], cryptography [7] and quantum key distribution [8]. To quantify the amount of entanglement of quantum states, numerous measures have been proposed [9, 10]. The von Neumann entropy of subsystems is used to define the entanglement measure of bipartite pure states. However, for bipartite mixed states, there are still numerous open questions regarding entanglement quantification [11]. In more complex cases, i.e. quantifying multipartite quantum entanglement, many different measures have been proposed, but they are hard to calculate [12, 13].
Quantum engineering has made great progress in recent years, providing new approaches for detecting and quantifying entanglement. Determining whether a quantum state is entangled or not is a non-deterministic polynomial (NP)-hard problem [14]. Quantifying the entanglement of unknown states is even harder than determining the existence of entanglement for states with known density matrices, and in this case it is usually very costly to characterize the target quantum states by quantum measurements.
In recent years, machine learning has been employed to quantitatively estimate quantum entanglement experimentally [15–18]. A range of experimentally accessible data from quantum states is collected, which is then used to train machine learning models. These models can predict the entanglement measures of previously unseen quantum states. Our paper focuses on this method for predicting entanglement measures.
As for entanglement measures, some geometrically motivated entanglement measures have provided us with new insights into quantum entanglement, e.g. entanglement of formation [19], relative entropy of entanglement [20, 21], global robustness [22, 23] and squashed entanglement [24].
In this paper, we review several entanglement measures in section 2, e.g. the relative entropy of entanglement, the coherent information, the moments, the entanglement of formation and the von Neumann entropy. Then we introduce some classical machine learning methods in section 3. Meanwhile, we generate some random quantum states. We choose the coherent information as the data label, i.e. the target of prediction, and treat the remaining quantum measures as data features to train the machine learning models. We compare the predictive power of these machine learning models. Our numerical simulations indicate that the linear regression and stacking models outperform the rest. We explore the models' performance on quantum states of different dimensions and find that higher-dimensional quantum states yield better results. Furthermore, we investigate which quantum measures provide more accurate predictions for the coherent information. It turns out that the quantum moments exhibit a stronger correlation with the coherent information. We then conduct principal component analysis to extract the principal components of these data features. The results confirm our conclusion: among these data features, the quantum moments exhibit the strongest correlation with the coherent information.

2. Quantum measures

Suppose ρ is a bipartite state shared by two separated parties, Alice and Bob. We denote the set of measurement devices of Alice (Bob) as X (Y), and denote the possible measurement outcomes as A (B). Alice and Bob repeat the measurement many times, and we calculate the joint conditional probabilities p(ab∣xy), i.e. the probability of obtaining the outcomes (a, b) ∈ A × B upon selecting the measurement settings (x, y) ∈ X × Y. Suppose $\{{M}_{x}^{a}\}$ is the operator of the quantum measurement performed by Alice's measurement device x ∈ X, where a ∈ A. Similarly, we make the analogous assumption about $\{{N}_{y}^{b}\}$. Then, we have the following observation.
$\begin{eqnarray}p({ab}| {xy})=\mathrm{Tr}[({{M}}_{x}^{a}\otimes {{N}}_{y}^{b})\rho ].\end{eqnarray}$
The correlation p = [p(ab∣xy)] is composed of all the joint conditional probabilities of the form p(ab∣xy).
Then we introduce the moments of quantum states, defined as
$\begin{eqnarray}{\mu }_{m}(\rho )=\mathrm{Tr}({\rho }^{m}).\end{eqnarray}$
Clearly, ${\mu }_{1}(\rho )=\mathrm{Tr}(\rho )=1$. Experimentally, we can measure μm(ρ) by performing joint measurements on m copies of the same state ρ. This operation is very hard under current quantum technologies, especially when m is large.
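For a state with a known density matrix, the moments above can be evaluated numerically. The following minimal NumPy sketch (our own illustration, not taken from the paper) implements the definition directly:

```python
import numpy as np

def moments(rho: np.ndarray, max_order: int = 3) -> list:
    """Return [mu_1, ..., mu_max_order] with mu_m = Tr(rho^m)."""
    # A density matrix is Hermitian, so Tr(rho^m) = sum_i lambda_i^m,
    # where lambda_i are its (real, non-negative) eigenvalues.
    eigvals = np.linalg.eigvalsh(rho)
    return [float(np.sum(eigvals ** m)) for m in range(1, max_order + 1)]

rho = np.eye(2) / 2                 # maximally mixed qubit state
print(moments(rho))                 # [1.0, 0.5, 0.25]
```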
The supervised machine learning system is composed of three ingredients: data features, data labels, and a machine learning model. We need to collect training data with the right labels. Then, we feed them into the machine learning model, such that the model can predict labels of the test data precisely. In our problem, the data features of quantum data are correlation data or moments, data labels of quantum states are the values of entanglement measures, and the machine learning model learns the relationship between the above data features and data labels.
In our study, we choose the coherent information and the relative entropy of entanglement as labels. Suppose ρ is a bipartite quantum state $\rho \in {{ \mathcal H }}_{A}\otimes {{ \mathcal H }}_{B}$. The coherent information of ρ is defined as ${I}_{C}(\rho )=S({\rho }_{A})-S(\rho )$, where S(ρ) is the von Neumann entropy of ρ and ${\rho }_{A}={\mathrm{Tr}}_{B}(\rho )$ is the reduced state on ${{ \mathcal H }}_{A}$.
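As an illustration of this definition, here is a minimal NumPy sketch (our own, assuming the full density matrix of the bipartite state is available numerically) of the coherent information ${I}_{C}(\rho )=S({\rho }_{A})-S(\rho )$:

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log2 rho), evaluated from the eigenvalues."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]              # drop numerical zeros
    return float(-np.sum(eigvals * np.log2(eigvals)))

def partial_trace_B(rho: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """rho_A = Tr_B(rho) for a state on H_A (dim dA) tensor H_B (dim dB)."""
    return rho.reshape(dA, dB, dA, dB).trace(axis1=1, axis2=3)

def coherent_information(rho: np.ndarray, dA: int, dB: int) -> float:
    """I_C(rho) = S(rho_A) - S(rho)."""
    return von_neumann_entropy(partial_trace_B(rho, dA, dB)) - von_neumann_entropy(rho)

# Example: a maximally entangled two-qubit state gives I_C = 1.
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(coherent_information(bell, 2, 2))             # 1.0
```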
The relative entropy of entanglement of a quantum state ρ is defined as
$\begin{eqnarray}\begin{array}{l}{E}_{R}(\rho )=\mathop{\min }\limits_{\sigma \in {\rm{SEP}}}S(\rho | | \sigma )\\ =\mathop{\min }\limits_{\sigma \in {\rm{SEP}}}\mathrm{Tr}(\rho {\mathrm{log}}_{2}\rho -\rho {\mathrm{log}}_{2}\sigma ),\end{array}\end{eqnarray}$
where SEP denotes the set of all separable states. The entanglement of formation is defined as follows. Suppose ρ is a quantum state with pure-state decomposition $\rho ={\sum }_{i}{p}_{i}| {\psi }_{i}\rangle \langle {\psi }_{i}| $. For each pure state ∣ψ⟩, the entanglement E is defined as the entropy of either of the two subsystems A and B:
$\begin{eqnarray}E(\psi )=-\mathrm{Tr}({\rho }_{A}{\mathrm{log}}_{2}{\rho }_{A})=-\mathrm{Tr}({\rho }_{B}{\mathrm{log}}_{2}{\rho }_{B}),\end{eqnarray}$
where ρA is the partial trace of ∣ψ⟩⟨ψ∣ over subsystem B, and ρB is the partial trace of ∣ψ⟩⟨ψ∣ over subsystem A. The entanglement of formation of the mixed state ρ is defined as $E(\rho )=\min {\sum }_{i}{p}_{i}E({\psi }_{i})$, where the minimum is taken over all pure-state decompositions of ρ. One can show that the pure-state entanglement in equation (4) can be written as $E(\psi )={\mathbb{E}}(C(\psi ))$, where the concurrence C is $C(\psi )=| \langle \psi | \widetilde{\psi }\rangle | $ with $| \widetilde{\psi }\rangle =({\sigma }_{y}\otimes {\sigma }_{y})| {\psi }^{* }\rangle $ (the star denotes complex conjugation in the computational basis), and the function ${\mathbb{E}}$ is ${\mathbb{E}}(C)=h\left(\tfrac{1+\sqrt{1-{C}^{2}}}{2}\right)$ with $h(x)=-x{\mathrm{log}}_{2}x-(1-x){\mathrm{log}}_{2}(1-x)$.
Having defined the function ${\mathbb{E}}$, we can now present the entanglement of formation of a two-qubit mixed state ρ: $E(\rho )={\mathbb{E}}(C(\rho ))$, where $C(\rho )=\max \{0,{\lambda }_{1}-{\lambda }_{2}-{\lambda }_{3}-{\lambda }_{4}\}$, the λi are the eigenvalues, in decreasing order λ1 ≥ λ2 ≥ λ3 ≥ λ4, of the Hermitian matrix $R=\sqrt{\sqrt{\rho }\widetilde{\rho }\sqrt{\rho }}$, and $\widetilde{\rho }=({\sigma }_{y}\otimes {\sigma }_{y}){\rho }^{* }({\sigma }_{y}\otimes {\sigma }_{y})$. In other words, the λi are the square roots of the eigenvalues of the non-Hermitian matrix $\rho \widetilde{\rho }$. We note that each λi is a non-negative real number.
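The two-qubit formula above can be implemented directly. The following sketch (ours, using the standard Wootters construction summarized above) computes the concurrence and the entanglement of formation from a numerical two-qubit density matrix:

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
SY_SY = np.kron(SIGMA_Y, SIGMA_Y)

def concurrence(rho: np.ndarray) -> float:
    """Wootters concurrence C(rho) of a two-qubit density matrix."""
    rho_tilde = SY_SY @ rho.conj() @ SY_SY          # spin-flipped state
    # lambda_i: square roots of the eigenvalues of rho * rho_tilde, in decreasing order.
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_tilde))))[::-1]
    return max(0.0, float(lam[0] - lam[1] - lam[2] - lam[3]))

def binary_entropy(x: float) -> float:
    """h(x) = -x log2 x - (1 - x) log2 (1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def entanglement_of_formation(rho: np.ndarray) -> float:
    """E(rho) = h((1 + sqrt(1 - C^2)) / 2) for a two-qubit state."""
    c = concurrence(rho)
    return binary_entropy((1 + np.sqrt(1 - c ** 2)) / 2)

bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(concurrence(bell), entanglement_of_formation(bell))   # 1.0 1.0
```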

3. Predicting entanglement by machine learning models

We introduce how to collect the data features of quantum states. Suppose ρ is an arbitrary bipartite quantum state on ${{ \mathcal H }}^{3}\otimes {{ \mathcal H }}^{3}$. More generally, we generate a random quantum state on ${{ \mathcal H }}^{d}\otimes {{ \mathcal H }}^{d}$ as
$\begin{eqnarray}\rho =\displaystyle \sum _{i\,=\,0}^{k-1}{\lambda }_{i}| {u}_{i}\rangle \langle {u}_{i}| ,\end{eqnarray}$
where k ∈ {1, 2,…, d²}, the positive weights λi are drawn uniformly from the interval [0, 1], and the state is normalized so that Tr ρ = 1. We then sample quantum states whose coherent information lies in the interval $[-1.5,{\mathrm{log}}_{2}3]$, in steps of 0.1.
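One possible way to realize this sampling is sketched below. The paper does not specify how the vectors ∣ui⟩ are drawn, so we assume Haar-random orthonormal vectors and normalize the weights so that Tr ρ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(d: int, k: int) -> np.ndarray:
    """Random rank-k density matrix on H^d (x) H^d (total dimension d^2)."""
    dim = d * d
    # Haar-random orthonormal vectors |u_i> from the QR decomposition of a
    # complex Gaussian matrix (an assumption about the sampling procedure).
    g = rng.normal(size=(dim, k)) + 1j * rng.normal(size=(dim, k))
    u, _ = np.linalg.qr(g)
    # Positive weights drawn uniformly from [0, 1], normalized so that Tr(rho) = 1.
    lam = rng.uniform(0.0, 1.0, size=k)
    lam /= lam.sum()
    return (u * lam) @ u.conj().T

rho = random_state(d=3, k=5)
print(np.trace(rho).real)                           # ~1.0
```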
After sampling the training quantum states, we generate the data features of these quantum states. First, we calculate the correlations of these states as data features. Suppose Alice's measurement, Ak, takes the following form,
$\begin{eqnarray}| r{\rangle }_{{A}_{k}}=\displaystyle \frac{1}{\sqrt{d}}\displaystyle \sum _{q=0}^{d-1}\exp \left(\displaystyle \frac{2\pi {\rm{i}}}{d}q(r-{\alpha }_{k})\right)| q{\rangle }_{A}.\end{eqnarray}$
Bob's measurement, Bl, takes the following form,
$\begin{eqnarray}| r{\rangle }_{{B}_{l}}=\displaystyle \frac{1}{\sqrt{d}}\displaystyle \sum _{q=0}^{d-1}\exp \left(\displaystyle \frac{2\pi {\rm{i}}}{d}q(r-{\beta }_{l})\right)| q{\rangle }_{B},\end{eqnarray}$
where 0 ≤ r ≤ d − 1 and 1 ≤ l, k ≤ N. Meanwhile, to apply the moment method, we generate two different sets of training data, which contain different orders of moments. We then use different machine learning and deep learning methods to predict the coherent information.
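For concreteness, the following sketch computes the correlation features p(ab∣xy) = Tr[(Mxa ⊗ Nyb)ρ] from the Fourier-type bases above. The phase offsets αk and βl used in the paper are not stated, so the values below are placeholders:

```python
import numpy as np

def basis_vector(d: int, r: int, phase: float) -> np.ndarray:
    """|r> in the rotated Fourier basis with offset `phase` (alpha_k or beta_l)."""
    q = np.arange(d)
    return np.exp(2j * np.pi * q * (r - phase) / d) / np.sqrt(d)

def correlations(rho: np.ndarray, d: int, alphas, betas) -> dict:
    """p(a, b | k, l) = Tr[(M_k^a (x) N_l^b) rho] for all settings and outcomes."""
    p = {}
    for k, alpha in enumerate(alphas, start=1):
        for l, beta in enumerate(betas, start=1):
            for a in range(d):
                for b in range(d):
                    v = np.kron(basis_vector(d, a, alpha), basis_vector(d, b, beta))
                    p[(a, b, k, l)] = float(np.real(v.conj() @ rho @ v))
    return p

# Placeholder phase settings (assumed, not taken from the paper).
d = 2
rho = np.eye(d * d) / (d * d)                       # maximally mixed state
p = correlations(rho, d, alphas=[0.0, 0.5], betas=[0.25, -0.25])
print(p[(0, 0, 1, 1)])                              # 0.25 for this state
```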
Linear regression is a basic and commonly used type of predictive analysis. K-nearest neighbors (KNN) is a fundamental and intuitive algorithm in machine learning, widely used for both classification and regression tasks. To improve the performance of our models, we also use ensemble learning methods, which we introduce next.
Bagging, which stands for bootstrap aggregating, is a method used to reduce the variance of a decision tree: each tree is trained on a bootstrap sample of the original data (drawn with replacement), and the predictions are aggregated. Random forest is a supervised learning algorithm that is used for both classification and regression tasks [25]. It builds multiple decision trees and merges them together to obtain a more accurate and stable prediction [26].
Boosting is a method used to reduce the bias of a decision tree by training models iteratively: each subsequent model is built to correct the errors of the previous one. AdaBoost, short for adaptive boosting, is a popular boosting technique that combines multiple weak classifiers to build a strong classifier [27]. Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees [28]. XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. LightGBM is a gradient boosting framework that uses tree-based learning algorithms; it is designed to be distributed and efficient, with faster training speed and higher efficiency, lower memory usage, and better accuracy.
Stacking is a method that combines the predictions of multiple models to create a final prediction [29]: the predictions of each base model are used as input to a final model, which is trained to minimize the error of the ensemble. We briefly compare these machine learning models in table 1.
Table 1. Comparison between the different machine learning models.
Model | Applicable scenarios | Advantages | Disadvantages
Linear regression | There is a linear relationship between features and the target | Simple computation; strong interpretability | Limited ability to handle nonlinear relationships; sensitive to outliers
KNN | The distance metric in the feature space is meaningful | Simple and intuitive; no training process required | High computational cost; sensitive to feature scaling
Bagging | Reducing the variance of the model and improving stability | Reduces overfitting risk and improves prediction accuracy | May not significantly improve prediction accuracy
Boosting | Converting weak learners into strong learners | Improves prediction accuracy; robust to outliers and noisy data | Sensitive to the choice of base learners
Stacking | There is significant diversity and complementarity between base learners | Leverages the complementarity of base learners, improving prediction accuracy | High computational complexity; sensitive to the choice of base learners
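As an illustration of the stacking approach, here is a minimal scikit-learn sketch. The base learners, hyperparameters and data are assumptions, since the paper does not specify them; random placeholders are used only so that the sketch runs end to end:

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X: data features (correlations, moments, entropies), y: coherent information.
X, y = np.random.rand(1000, 9), np.random.rand(1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("knn", KNeighborsRegressor()),
        ("rf", RandomForestRegressor(random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),        # meta-model trained on the base predictions
)
stack.fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, stack.predict(X_te)))
print(f"stacking RMSE: {rmse:.3f}")
```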
Table 2 shows the performance of the different learning methods.
In table 2, we see that the root-mean-square error (RMSE) of predicting the coherent information with the stacking model is 0.019, and that of linear regression is 0.020; these two learning methods perform best on this task. For the input data features, we also calculate their pairwise Pearson correlations.
Table 2. The RMSEs of predicting the coherent information by the different learning methods.
Methods RMSE
Linear regression 0.020
KNN 0.051
Bagging 0.105
Random forest 0.103
AdaBoost 0.134
XGBoost 0.038
Stacking 0.019
NN 0.159
In figure 1, ‘p_11', ‘p_12', ‘p_21' and ‘p_22' denote the correlation features of the quantum states, ‘miu_2' and ‘miu_3' denote the second and third moments, ‘vne_A' and ‘vne_B' denote the von Neumann entropies of the reduced states on subsystems A and B, and ‘f_e' denotes the entanglement of formation (formation entropy) of the quantum states.
Figure 1. The Pearson correlation of input data features.
We find that ‘f_e' has a relatively low correlation with ‘vne_A' and ‘vne_B', and that ‘vne_A' and ‘vne_B' have a relatively low correlation with each other. ‘miu_2' and ‘miu_3' have a strong positive correlation. The von Neumann entropies of the reduced states and the moments of the quantum states have a strong negative correlation.
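The Pearson correlation matrix of figure 1 can be computed directly from a feature table. A minimal pandas sketch (with placeholder data; in practice each row would hold the features of one sampled state) is:

```python
import numpy as np
import pandas as pd

cols = ["p_11", "p_12", "p_21", "p_22", "miu_2", "miu_3", "vne_A", "vne_B", "f_e"]
# Placeholder data standing in for the features of the sampled states.
df = pd.DataFrame(np.random.rand(1000, len(cols)), columns=cols)
corr = df.corr(method="pearson")    # Pearson correlation matrix, as in figure 1
print(corr.round(2))
```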

3.1. Quantum states of different dimensions

In this section, we investigate how the models perform on quantum states of different dimensions. The formation entropy is hard to calculate for quantum states on ${{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3}$. To compare the effects in different dimensions, we therefore choose the same data features as input features in both cases, i.e. ‘p_11', ‘p_12', ‘p_21', ‘p_22', ‘miu_2', ‘miu_3', ‘vne_A' and ‘vne_B'.
From table 3, we see that the RMSEs of linear regression and stacking for $\rho \in { \mathcal B }({{ \mathcal H }}_{2}\otimes {{ \mathcal H }}_{2})$ are both 0.020; these two learning methods perform best on this task. Similarly, for $\rho \in { \mathcal B }({{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3})$ the RMSEs of linear regression and stacking are both 0.015. Thus linear regression and stacking perform best for quantum states of both dimensions. Furthermore, we observe that the RMSEs for $\rho \in { \mathcal B }({{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3})$ are lower than those for $\rho \in { \mathcal B }({{ \mathcal H }}_{2}\otimes {{ \mathcal H }}_{2})$. Higher-dimensional quantum states usually carry more information, so their RMSEs tend to be smaller.
Table 3. The RMSEs of predicting the coherent information by different learning methods and different dimensional quantum states.
Methods RMSE($\rho \in { \mathcal B }({{ \mathcal H }}_{2}\otimes {{ \mathcal H }}_{2})$) RMSE($\rho \in { \mathcal B }({{ \mathcal H }}_{3}\otimes {{ \mathcal H }}_{3})$)
Linear regression 0.020 0.015
KNN 0.051 0.040
Bagging 0.105 0.074
Random forest 0.108 0.078
AdaBoost 0.122 0.063
XGBoost 0.038 0.030
Stacking 0.020 0.015
NN 0.159 0.076

3.2. Univariate linear regression

We want to know how much each of these features contributes to the model. To this end, we use only one input data feature at a time for linear regression. Table 4 shows the performance of each individual feature.
Table 4. The RMSEs of predicting the coherent information using each individual data feature.
Data features RMSE
p_11 0.12
p_12 0.16
p_21 0.13
p_22 0.16
miu_2 0.07
miu_3 0.09
vne_A 0.15
vne_B 0.13
f_e 0.13
From this table, we see that the RMSE of predicting the coherent information from ‘miu_2' alone is 0.07, the lowest among all features. This indicates that ‘miu_2' has the strongest correlation with the coherent information.
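A minimal sketch of this univariate comparison is given below; the data are placeholders, and in practice each column of X would hold one of the features listed in table 4 and y the coherent information:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

features = ["p_11", "p_12", "p_21", "p_22", "miu_2", "miu_3", "vne_A", "vne_B", "f_e"]
# Placeholder data standing in for the feature matrix and labels.
X, y = np.random.rand(1000, len(features)), np.random.rand(1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for i, name in enumerate(features):
    # Fit a linear regression on a single feature and report its RMSE.
    model = LinearRegression().fit(X_tr[:, [i]], y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te[:, [i]])))
    print(f"{name}: RMSE = {rmse:.2f}")
```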

3.3. Principal component analysis

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors are an uncorrelated orthogonal basis set.
We then use PCA to reduce the dimension of the input data features. The first and second principal components are given by the two columns of the following matrix, with rows ordered as p_11, p_12, p_21, p_22, miu_2, miu_3, vne_A, vne_B, f_e.
$\begin{eqnarray}\left[\begin{array}{cc}0.23 & 0.07\\ 0.03 & 0.69\\ 0.24 & -0.04\\ 0.07 & -0.71\\ 0.49 & 0.02\\ 0.47 & 0.01\\ -0.39 & 0.02\\ -0.40 & -0.05\\ 0.34 & 0\end{array}\right]\end{eqnarray}$
The first component is 0.23 p_11 + 0.03 p_12 + 0.24 p_21 + 0.07 p_22 + 0.49 miu_2 + 0.47 miu_3 − 0.39 vne_A − 0.40 vne_B + 0.34 f_e.
We find that the coefficients of the quantum moments are the largest in magnitude. This means that the quantum moments play an important role in the task of predicting the coherent information.
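A minimal scikit-learn sketch of the PCA step is given below; the feature matrix is a placeholder, and whether the features were standardized before PCA is not stated in the paper, so no scaling is applied here:

```python
import numpy as np
from sklearn.decomposition import PCA

features = ["p_11", "p_12", "p_21", "p_22", "miu_2", "miu_3", "vne_A", "vne_B", "f_e"]
X = np.random.rand(1000, len(features))     # placeholder feature matrix

pca = PCA(n_components=2)
pca.fit(X)
# Each row of components_ holds the weights of one principal component
# over the nine input features (the columns of the matrix above).
for comp in pca.components_:
    print(" + ".join(f"{w:.2f}*{name}" for w, name in zip(comp, features)))
print(pca.explained_variance_ratio_)        # variance captured by each component
```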

4. Conclusions

We have evaluated the predictive power of various machine learning models. Our numerical simulations indicate that the linear regression and stacking models outperform the rest. Furthermore, we examined which quantum measures provide more accurate predictions for the coherent information; it turns out that the quantum moments exhibit a stronger correlation with the coherent information. We also examined the models' performance on quantum states of different dimensions and found that higher-dimensional quantum states yield better results. We then conducted principal component analysis to extract the principal components of these data features. The results confirm our conclusion: among these data features, the quantum moments exhibit the strongest correlation with the coherent information. Additionally, we provided new insights into quantifying the correlation between two entanglement measures. In the future, we will investigate how to select more appropriate models to minimize prediction errors.

The authors were supported by the NNSF of China (Grant No. 11871089).

[1] Schrödinger E 1935 Die gegenwärtige Situation in der Quantenmechanik Naturwissenschaften 23 844–849

[2] Nielsen M A, Chuang I L 2000 Quantum Computation and Quantum Information (Cambridge: Cambridge University Press)

[3] Horodecki R, Horodecki P, Horodecki M, Horodecki K 2009 Quantum entanglement Rev. Mod. Phys. 81 865

[4] Walther P, Resch K J, Rudolph T, Schenck E, Weinfurter H, Vedral V, Aspelmeyer M, Zeilinger A 2005 Experimental one-way quantum computing Nature 434 169–176

[5] Riebe M, Häffner H, Roos C F, Hänsel W, Benhelm J, Lancaster G P T et al 2004 Deterministic quantum teleportation with atoms Nature 429 734–737

[6] Li X, Pan Q, Jing J, Zhang J, Peng K 2002 Quantum dense coding exploiting a bright Einstein–Podolsky–Rosen beam Phys. Rev. Lett. 88 047904

[7] Yin J, Li Y H, Liao S K, Yang M, Pan J W 2020 Entanglement-based secure quantum cryptography over 1,120 kilometres Nature 582 1–5

[8] Xu F, Ma X, Zhang Q, Lo H-K, Pan J-W 2020 Secure quantum key distribution with realistic devices Rev. Mod. Phys. 92 025002

[9] Horodecki R, Horodecki P, Horodecki M, Horodecki K 2009 Quantum entanglement Rev. Mod. Phys. 81 865–942

[10] Amico L, Fazio R, Osterloh A, Vedral V 2008 Entanglement in many-body systems Rev. Mod. Phys. 80 517–576

[11] Horodecki M, Horodecki P, Horodecki R 1998 Mixed-state entanglement and distillation: is there a ‘bound' entanglement in nature? Phys. Rev. Lett. 80 5239–5242

[12] Barnum H, Linden N 2001 Monotones and invariants for multi-particle quantum states J. Phys. A: Math. Gen. 34 6787

[13] Vedral V, Plenio M B 1998 Entanglement measures and purification procedures Phys. Rev. A 57 1619

[14] Gurvits L 2004 Classical complexity and quantum entanglement J. Comput. Syst. Sci. 69 448–484

[15] Lu S, Huang S, Li K, Li J, Chen J, Lu D, Ji Z, Shen Y, Zhou D, Zeng B 2018 Separability-entanglement classifier via machine learning Phys. Rev. A 98 012315

[16] Gray J, Banchi L, Bayat A, Bose S 2018 Machine-learning-assisted many-body entanglement measurement Phys. Rev. Lett. 121 150503

[17] Wei Z, Lin X, Chen Z 2023 Quantifying quantum entanglement via a hybrid quantum-classical machine learning framework Phys. Rev. A 107 062409

[18] Cernoch A, Lemr K, Roik J, Bartkiewicz K 2022 Entanglement quantification from collective measurements processed by machine learning arXiv:2203.01607

[19] Bennett C H, DiVincenzo D P, Smolin J A, Wootters W K 1996 Mixed-state entanglement and quantum error correction Phys. Rev. A 54 3824–3851

[20] Vedral V, Plenio M B, Rippin M A, Knight P L 1997 Quantifying entanglement Phys. Rev. Lett. 78 2275–2279

[21] Vedral V, Plenio M B 1998 Entanglement measures and purification procedures Phys. Rev. A 57 1619–1633

[22] Harrow A W, Nielsen M A 2003 Robustness of quantum gates in the presence of noise Phys. Rev. A 68 012308

[23] Vidal G, Tarrach R 1999 Robustness of entanglement Phys. Rev. A 59 141–155

[24] Christandl M, Winter A 2004 ‘Squashed entanglement': an additive entanglement measure J. Math. Phys. 45 829–840

[25] Breiman L 2001 Random forests Mach. Learn. 45 5–32

[26] Quinlan J R 1986 Induction of decision trees Mach. Learn. 1 81–106

[27] Freund Y, Schapire R E 1995 A decision-theoretic generalization of on-line learning and an application to boosting Comput. Learn. Theor. 904 23–37

[28] Berk R A 2008 Boosting Statistical Learning from a Regression Perspective (Springer) 297–337

[29] Schonlau M 2023 Stacking Applied Statistical Learning (Springer) 323–328
