Communications in Theoretical Physics
Statistical Physics, Soft Matter and Biophysics

Theoretical analysis of RNA polymerase fidelity: a steady-state copolymerization approach

  • Wenbo Fu 1 ,
  • Qiushi Li 1 ,
  • Yongshun Song 2 ,
  • Yaogen Shu 3 ,
  • Zhongcan Ouyang 4 ,
  • Ming Li 1
  • 1School of Physical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
  • 2School of Physics, East China University of Science and Technology, Shanghai, 200237, China
  • 3Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
  • 4Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, 100190, China

Received date: 2021-10-23

  Revised date: 2021-11-12

  Accepted date: 2021-11-15

  Online published: 2022-02-09

Copyright

© 2021 Institute of Theoretical Physics CAS, Chinese Physical Society and IOP Publishing

Abstract

The fidelity of DNA transcription catalyzed by RNA polymerase (RNAP) has long been an important issue in biology. Experiments have revealed that RNAP can incorporate matched nucleotides selectively and proofread the incorporated mismatched nucleotides. However, systematic theoretical research on RNAP fidelity is still lacking. In the last decade, several theories of RNA transcription have been proposed, but they only handled highly simplified models, without considering the higher-order neighbor effects and the oligonucleotides cleavage, both of which are critical for the overall fidelity. In this paper, we regard RNA transcription as a binary copolymerization process and calculate the transcription fidelity by the steady-state copolymerization theory that we recently proposed for DNA replication. With this theory, more realistic models considering higher-order neighbor effects, oligonucleotides cleavage, multi-step incorporation and multi-step cleavage can be rigorously handled.

Cite this article

Wenbo Fu , Qiushi Li , Yongshun Song , Yaogen Shu , Zhongcan Ouyang , Ming Li . Theoretical analysis of RNA polymerase fidelity: a steady-state copolymerization approach[J]. Communications in Theoretical Physics, 2022 , 74(1) : 015601 . DOI: 10.1088/1572-9494/ac3993

1. Introduction

Accurate transfer of genetic information is critical for the survival and reproduction of living organisms. For example, the transcription fidelity in bacteria and eukaryotes is about ${10}^{3}$–${10}^{5}$ [1–4]. The kinetic proofreading mechanism, proposed by Hopfield [5] and Ninio [6], correctly pointed out that such high fidelity is determined not thermodynamically by the free energy difference, but kinetically by the difference in incorporation rates between matched and mismatched pairs. However, the original version of the kinetic proofreading mechanism assumed that proofreading occurs before the nucleotide is covalently incorporated into the terminal, which differs from the real mechanism of RNA polymerases (RNAP).
Experiments show that the incorporated mismatched nucleoside triphosphates (NTPs) can still be proofread by RNAP [7–9]. Since the proofreading mechanism is not fully clear, various models have been proposed that differ in detail but share the following features [10–12]. (I) RNAP has two working modes: the incorporation mode and the cleavage mode. In the incorporation mode, RNAP can selectively incorporate the matched NTP. The fidelity contributed by the incorporation mode is defined as the initial discrimination. In the cleavage mode, RNAP can proofread the incorporated mismatched nucleotides, which further enhances the fidelity by a factor of about ${10}^{2}$ [13]. This enhancement is defined as the proofreading efficiency. (II) RNAP cleaves at least two nucleotides, and it is widely believed that the proofreading efficiency is mainly contributed by dinucleotides cleavage [7]. (III) There are neighbor effects, i.e. the terminal, the penultimate, and more deeply buried mismatches may inhibit incorporation and promote proofreading [14]. These neighbor effects could be attributed to the incorporated mismatches weakening the adjacent base pairs [14, 15].
The central issue of all the relevant theoretical studies is to show how transcription fidelity is determined by the kinetic parameters involved. Several theories were proposed for highly simplified transcription models [10–12], either without considering the neighbor effects or making only very rough estimates of the fidelity based on untested assumptions. A systematic and more precise study of more realistic models is still lacking. Recently we treated a quite similar problem for DNA polymerase (DNAP) fidelity, proposing the steady-state copolymerization method, which can handle highly complicated kinetic models of DNA replication [16, 17]. In this paper, we generalize this method to investigate the fidelity of RNA transcription. The paper is organized as follows. We first illustrate the basic theory of the steady-state copolymerization method using the minimal transcription model with first-order neighbor effect and dinucleotides cleavage in section 2.1. More realistic models considering higher-order neighbor effects, multi-step incorporation and oligonucleotides cleavage are also handled in sections 2 and 3. In particular, we analytically derive the mathematical expressions of the transcription fidelity in terms of some key kinetic parameters in sections 2.3 and 3. These expressions provide new and intuitive insights into how the incorporation mode and cleavage mode of RNAP are coordinated to achieve high transcription fidelity.

2. Copolymerization model with dinucleotides cleavage

The interaction between the RNAP and the RNA/DNA hybrid is extremely complex, but the dominating factor for the transcription fidelity is widely believed to be the incorporation-cleavage kinetics occurring at the terminal region of the RNA transcript. In real transcription, the template DNA contains 4 types of deoxynucleoside monophosphate (dNMP): A, T, C, and G, and there are 3 types of mismatches for each type of template dNMP. Here, inspired by the experimental data on DNA replication kinetics [17], we assume that any kinetic rate of a match (or mismatch) is of the same order of magnitude as its counterpart for another match (or mismatch), and that the rates of the matches differ by orders of magnitude from their counterparts for the mismatches. Hence, we can approximately regard RNA transcription as a binary copolymerization process of two monomers, R (the match) and W (the mismatch), without explicit consideration of the sequence of the DNA template.
During the transcription, the percentage of R and W in the transcript changes with time but eventually reaches a constant, i.e. $\mathrm{d}({N}_{R}/{N}_{W})/\mathrm{d}t=0$, where ${N}_{R}$ and ${N}_{W}$ are the total numbers of R and W in the transcript, respectively. This leads to ${N}_{R}/{N}_{W}={\dot{N}}_{R}/{\dot{N}}_{W}$, where ${\dot{N}}_{R}$ and ${\dot{N}}_{W}$ represent the time derivatives of ${N}_{R}$ and ${N}_{W}$, respectively, from which the fidelity can be defined as,
$\begin{eqnarray}\phi \equiv \displaystyle \frac{{N}_{R}}{{N}_{W}}=\displaystyle \frac{{\dot{N}}_{R}}{{\dot{N}}_{W}}.\end{eqnarray}$
Below we will first introduce the minimal model with mono- and dinucleotides cleavage and show how to calculate fidelity considering the first-order neighbor effect (section 2.1). The higher-order effect can also be considered in the minimal model (section 2.2). Then we derive an approximate analytical expression of the fidelity under bio-relevant conditions (section 2.3), and extend this logic to the more realistic multi-step models (section 2.4).

2.1. Minimal model with first-order neighbor effect

Denoting the incorporation rate constants, mono- and dinucleotides cleavage rate constants as ${ \mathcal K },{ \mathcal Z },{ \mathcal Q }$, we have the minimal model as shown in figure 1.
Figure 1. The minimal copolymerization model of RNA transcription with mono- and dinucleotides cleavage.
Here the transcript of length m is denoted as ${D}_{m}$. In principle, the kinetic rate constants can be affected by several nucleoside monophosphates (NMPs) at the transcript terminal, but to illustrate the basic logic we first consider only the first-order neighbor effect. For example, ${{ \mathcal K }}_{m}={{ \mathcal K }}_{{i}_{2}{i}_{1}}$ for any m, in which ${i}_{1}$ represents the incoming NTP and ${i}_{2}$ represents the terminal NMP (${i}_{1},{i}_{2}=R,W$). Similarly, ${{ \mathcal Z }}_{m}={{ \mathcal Z }}_{{i}_{2}{i}_{1}}$, and ${{ \mathcal Q }}_{m}={{ \mathcal Q }}_{{i}_{2}{i}_{1}}$, in which ${i}_{1}$ represents the terminal NMP and ${i}_{2}$ represents the penultimate NMP.
During the transcription, the occurrence probability of the terminal sequence ${i}_{n} \ \cdots \ {i}_{2}{i}_{1}$ is denoted as ${P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}$. At any moment, the total number of occurrences of the sequence ${i}_{n} \ \cdots \ {i}_{2}{i}_{1}$ in the transcript is ${N}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}$, which changes with time as follows,
$\begin{eqnarray}\begin{array}{l}{\dot{N}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}\equiv {J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}},\\ {J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}={{ \mathcal K }}_{{i}_{2}{i}_{1}}{P}_{{i}_{n} \ \cdots \ {i}_{2}}\\ -{{ \mathcal Z }}_{{i}_{2}{i}_{1}}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}-{{ \mathcal Q }}_{{i}_{2}{i}_{1}}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}\\ -{{ \mathcal Q }}_{{i}_{1}R}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}R}-{{ \mathcal Q }}_{{i}_{1}W}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}W}.\end{array}\end{eqnarray}$
In general, we have ${P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}={P}_{{{Ri}}_{n} \ \cdots \ {i}_{2}{i}_{1}}+{P}_{{{Wi}}_{n} \ \cdots \ {i}_{2}{i}_{1}}$, ${J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}={J}_{{{Ri}}_{n} \ \cdots \ {i}_{2}{i}_{1}}+{J}_{{{Wi}}_{n} \ \cdots \ {i}_{2}{i}_{1}}$. The kinetic equations of ${P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}$ can be written as,
$\begin{eqnarray}\begin{array}{l}{\dot{P}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}={J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}-{\tilde{J}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}},\\ {\tilde{J}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}\equiv {J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}R}+{J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}W}.\end{array}\end{eqnarray}$
For example,
$\begin{eqnarray}\begin{array}{l}{\dot{P}}_{R}={J}_{R}-{\tilde{J}}_{R}={J}_{{WR}}-{J}_{{RW}}\\ =({{ \mathcal K }}_{{WR}}{P}_{W}-{{ \mathcal Z }}_{{WR}}{P}_{{WR}}-{{ \mathcal Q }}_{{WR}}{P}_{{WR}}\\ -{{ \mathcal Q }}_{{RR}}{P}_{{WRR}}-{{ \mathcal Q }}_{{RW}}{P}_{{WRW}})\\ -({{ \mathcal K }}_{{RW}}{P}_{R}-{{ \mathcal Z }}_{{RW}}{P}_{{RW}}-{{ \mathcal Q }}_{{RW}}{P}_{{RW}}\\ -{{ \mathcal Q }}_{{WR}}{P}_{{RWR}}-{{ \mathcal Q }}_{{WW}}{P}_{{RWW}}).\end{array}\end{eqnarray}$
In the long time limit, the occurrence probability of any terminal sequence will eventually reach a stationary distribution (${\dot{P}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}=0$). In this steady-state stage,
$\begin{eqnarray}{J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}={\tilde{J}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}.\end{eqnarray}$
Unfortunately, the unclosed equations (3) cannot be solved to get ${P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}$, ${J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}$ or ${N}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}$. Therefore we propose the following factorization conjecture to close the equations,
$\begin{eqnarray}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}=\left(\prod _{m=3}^{n}\displaystyle \frac{{P}_{{i}_{m}{i}_{m-1}}}{{P}_{{i}_{m-1}}}\right)\cdot {P}_{{i}_{2}{i}_{1}},n\geqslant 3.\end{eqnarray}$
For instance, ${P}_{{i}_{3}{i}_{2}{i}_{1}}=({P}_{{i}_{3}{i}_{2}}/{P}_{{i}_{2}})\cdot {P}_{{i}_{2}{i}_{1}}$. This kind of factorization conjecture has been successfully applied to the steady-state copolymerization theory for DNA replication [17]. We also give a numerical verification using the Gillespie algorithm [18] (i.e. kinetic Monte Carlo simulation), as shown in figure 2. We directly simulate the copolymerization of ${10}^{5}$ chains with $\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{20}$ as the initial seed and count the terminal sequences of all chains to get the occurrence probabilities. The time evolution of the occurrence probabilities is plotted in figure 2, with the kinetic parameters given in the caption.
Figure 2. Simulation verification of the factorization conjecture of the minimal model with first-order neighbor effect. Rate constants: ${{ \mathcal K }}_{{RR}}=40$, ${{ \mathcal K }}_{{RW}}=8$, ${{ \mathcal K }}_{{WR}}=8$, ${{ \mathcal K }}_{{WW}}=1$, ${{ \mathcal Z }}_{{RR}}=1$, ${{ \mathcal Z }}_{{RW}}=2$, ${{ \mathcal Z }}_{{WR}}=6$, ${{ \mathcal Z }}_{{WW}}=4$, ${{ \mathcal Q }}_{{RR}}=1$, ${{ \mathcal Q }}_{{RW}}=8$, ${{ \mathcal Q }}_{{WR}}=12$, ${{ \mathcal Q }}_{{WW}}=4$. The number of simulated chains is ${10}^{5}$.
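The simulation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: it follows a single long chain and time-averages the terminal NMP instead of sampling many chains at fixed times, and the function name `simulate`, the event labels and the short-chain boundary rule are assumptions introduced here. The rate constants are those listed in the caption of figure 2.

```python
import random

# Rate constants from the caption of figure 2 (first-order neighbor effect).
# Key "xy": for K, x is the terminal NMP and y the incoming NTP;
# for Z and Q, x is the penultimate NMP and y the terminal NMP.
K = {"RR": 40.0, "RW": 8.0, "WR": 8.0, "WW": 1.0}
Z = {"RR": 1.0, "RW": 2.0, "WR": 6.0, "WW": 4.0}
Q = {"RR": 1.0, "RW": 8.0, "WR": 12.0, "WW": 4.0}


def simulate(n_events=200_000, seed_length=20, rng_seed=0):
    """Gillespie simulation of one chain (hypothetical helper).

    Returns the time-averaged probabilities of the terminal NMP and the
    final chain.
    """
    rng = random.Random(rng_seed)
    chain = ["R"] * seed_length
    dwell = {"R": 0.0, "W": 0.0}      # time spent with each terminal NMP
    names = ["add_R", "add_W", "cut_1", "cut_2"]
    for _ in range(n_events):
        pen, term = chain[-2], chain[-1]
        # Assumption of this sketch: cleavage is switched off on a very
        # short chain so that a penultimate site always exists.
        can_cleave = len(chain) > 3
        rates = [
            K[term + "R"],
            K[term + "W"],
            Z[pen + term] if can_cleave else 0.0,
            Q[pen + term] if can_cleave else 0.0,
        ]
        # Exponential waiting time in the current state.
        dwell[term] += rng.expovariate(sum(rates))
        # Next event chosen with probability proportional to its rate.
        event = rng.choices(names, weights=rates)[0]
        if event == "add_R":
            chain.append("R")
        elif event == "add_W":
            chain.append("W")
        elif event == "cut_1":
            chain.pop()               # mononucleotide cleavage
        else:
            del chain[-2:]            # dinucleotide cleavage
    elapsed = dwell["R"] + dwell["W"]
    return {x: t / elapsed for x, t in dwell.items()}, chain


probs, chain = simulate()
print("terminal probabilities:", probs)
print("N_R / N_W in the final chain:", chain.count("R") / chain.count("W"))
```

The time-averaged terminal probabilities play the role of $P_R$ and $P_W$ in the steady state, and the composition ratio of the final chain is a direct estimate of the fidelity defined in equation (1).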
With this factorization conjecture, the original unclosed equation (3) can be transformed into the following closed equations with four basic variables ${P}_{{i}_{2}{i}_{1}}$,
$\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{{J}_{{RR}}}{{J}_{{WR}}} & = & \displaystyle \frac{{P}_{{RR}}}{{P}_{{WR}}},\displaystyle \frac{{J}_{{RW}}}{{J}_{{WW}}}=\displaystyle \frac{{P}_{{RW}}}{{P}_{{WW}}},\\ {J}_{{RW}} & = & {J}_{{WR}},\qquad \displaystyle \sum _{{i}_{2},{i}_{1}}^{R,W}{P}_{{i}_{2}{i}_{1}}=1.\end{array}\end{eqnarray}$
According to equation (1), the steady-state fidelity can be calculated by,
$\begin{eqnarray}\phi =\displaystyle \frac{{\dot{N}}_{R}}{{\dot{N}}_{W}}=\displaystyle \frac{{J}_{R}}{{J}_{W}},\end{eqnarray}$
where ${J}_{R}={J}_{{RR}}+{J}_{{WR}}$ and ${J}_{W}={J}_{{RW}}+{J}_{{WW}}$, which can be obtained by solving equation (7).
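As an illustration of how equation (7) can be solved in practice, the following Python sketch computes the steady state for the rate constants of figure 2. The solution strategy is an assumption of this sketch, not a method taken from the paper: for a given terminal probability $w=P_{RW}+P_{WW}$, the two ratio conditions of equation (7) each reduce to a quadratic (the look-ahead cleavage terms cancel because they are proportional to $P_{i_2 i_1}$), and the remaining condition $J_{RW}=J_{WR}$ is then solved for $w$ by bisection. All function names are hypothetical.

```python
# Rate constants from the caption of figure 2. Key "xy" follows the
# subscript i2 i1 of the text (i1 is the rightmost index).
K = {"RR": 40.0, "RW": 8.0, "WR": 8.0, "WW": 1.0}
Z = {"RR": 1.0, "RW": 2.0, "WR": 6.0, "WW": 4.0}
Q = {"RR": 1.0, "RW": 8.0, "WR": 12.0, "WW": 4.0}


def bisect(f, lo, hi, iters=200):
    """Bisection for a root of f, assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fluxes(P):
    """J_{i2 i1} from equation (2) with n = 2, closed by conjecture (6)."""
    end = {x: P["R" + x] + P["W" + x] for x in "RW"}   # P_x, x is terminal
    J = {}
    for s in P:
        i2, i1 = s
        # P_{i2 i1 x} = P_{i2 i1} P_{i1 x} / P_{i1} by the factorization.
        ahead = sum(Q[i1 + x] * P[s] * P[i1 + x] / end[i1] for x in "RW")
        J[s] = K[s] * end[i2] - (Z[s] + Q[s]) * P[s] - ahead
    return J


def pair_probabilities(w):
    """P_{i2 i1} satisfying the two ratio conditions of equation (7),
    given the probability w that the terminal NMP is W."""
    r = 1.0 - w
    d_r = (Z["WR"] + Q["WR"]) - (Z["RR"] + Q["RR"])
    d_w = (Z["WW"] + Q["WW"]) - (Z["RW"] + Q["RW"])
    # J_RR P_WR = J_WR P_RR reduces to a quadratic in c = P_WR ...
    c = bisect(lambda c: K["RR"] * r * c + d_r * (r - c) * c
               - K["WR"] * w * (r - c), 0.0, r)
    # ... and J_RW P_WW = J_WW P_RW to a quadratic in d = P_WW.
    d = bisect(lambda d: K["RW"] * r * d + d_w * (w - d) * d
               - K["WW"] * w * (w - d), 0.0, w)
    return {"RR": r - c, "RW": w - d, "WR": c, "WW": d}


def steady_state():
    """Solve equation (7); returns (P, J, fidelity)."""
    def imbalance(w):
        J = fluxes(pair_probabilities(w))
        return J["WR"] - J["RW"]          # vanishes when J_RW = J_WR
    w = bisect(imbalance, 1e-9, 1.0 - 1e-9)
    P = pair_probabilities(w)
    J = fluxes(P)
    phi = (J["RR"] + J["WR"]) / (J["RW"] + J["WW"])   # equation (8)
    return P, J, phi


P, J, phi = steady_state()
print("P =", P)
print("fidelity =", phi)
```

The outer bisection relies on the sign of $J_{WR}-J_{RW}$ changing from negative to positive as $w$ increases, which holds for these rate constants but is not guaranteed for arbitrary ones; a general-purpose nonlinear solver would be the safer choice in that case.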

2.2. Minimal model with higher-order neighbor effect

The same logic as in section 2.1 can be generalized to neighbor effects of any order. For an h-order neighbor effect, the rate constants can be written as ${ \mathcal K }{\left({\rm{or}}\ { \mathcal Z },{ \mathcal Q }\right)}_{{i}_{h+1}{i}_{h} \ \cdots \ {i}_{1}}$. The kinetic equations of ${P}_{{i}_{h+1}{i}_{h} \ \cdots \ {i}_{1}}$ and ${J}_{{i}_{h+1}{i}_{h} \ \cdots \ {i}_{1}}$ are,
$\begin{eqnarray}\begin{array}{l}{\dot{P}}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}}={J}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}}-{\tilde{J}}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}},\\ {\tilde{J}}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}}\equiv {J}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}R}+{J}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}W},\\ {J}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}}={{ \mathcal K }}_{{i}_{h+1} \ {i}_{h} \ \ \cdots \ \ {i}_{1}}{P}_{{i}_{n} \ \ \cdots \ \ {i}_{2}}\\ -{{ \mathcal Z }}_{{i}_{h+1} \ {i}_{h} \ \ \cdots \ \ {i}_{1}}{P}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}}-{{ \mathcal Q }}_{{i}_{h+1} \ {i}_{h} \ \ \cdots \ \ {i}_{1}}{P}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}}\\ -{{ \mathcal Q }}_{{i}_{h} \ \ \cdots \ \ {i}_{1}R}{P}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}R}-{{ \mathcal Q }}_{{i}_{h} \ \ \cdots \ \ {i}_{1}W}{P}_{{i}_{n} \ \ \cdots \ \ {i}_{2}{i}_{1}W}.\end{array}\end{eqnarray}$
To solve these unclosed equations, we use the h-order factorization conjecture as follows,
$\begin{eqnarray}\begin{array}{l}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}=\left(\prod _{m=h+2}^{n}\displaystyle \frac{{P}_{{i}_{m}{i}_{m-1} \ \cdots \ {i}_{m-h}}}{{P}_{{i}_{m-1} \ \cdots \ {i}_{m-h}}}\right)\\ \times {P}_{{i}_{h+1}{i}_{h} \ \cdots \ {i}_{2}{i}_{1}},\quad n\geqslant h+2.\end{array}\end{eqnarray}$
These finally lead to a set of closed equations,
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{J}_{{{Ri}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}{{J}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}=\displaystyle \frac{{P}_{{{Ri}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}{{P}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}},\\ {J}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}}={\tilde{J}}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}},\\ \displaystyle \sum _{{i}_{h+1} \ , \ {i}_{h}, \ \cdots \ ,{i}_{1}}^{R,W}{P}_{{i}_{h+1} \ {i}_{h} \ \cdots \ {i}_{1}}=1.\end{array}\end{eqnarray}$
By solving these equations, one can get J and further calculate the fidelity defined by equation (8).

2.3. The approximate fidelity under bio-relevant conditions

In order to intuitively understand how RNAP fidelity is determined by some key kinetic parameters, one has to solve equations (7), (11) to obtain the analytical expression of the fidelity. However, it is very hard to solve such nonlinear equations, particularly when the number of variables becomes large (i.e. h is large). Below we will try to derive the approximate expression of the fidelity for any h-order model, under some special conditions of the kinetic parameters.
Inspired by the experiments of DNAP fidelity analysis [17], we also assume that the rate constants of RNAP differ by orders of magnitude. For example, the incorporation of the terminal match may be much faster than that of the terminal mismatch. These conditions are reasonable for the real transcription process. Below we list such biologically relevant conditions, using the simplified notation ${{ \mathcal K }}_{0}$ for ${{ \mathcal K }}_{\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{h+1}}$, ${{ \mathcal K }}_{n}$ for ${{ \mathcal K }}_{\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{n-1}\ W\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{h-n+1}}$, and ${{ \mathcal K }}_{n,l}$ for ${{ \mathcal K }}_{\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{n-1}\ W\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{l-n-1}\ W\mathop{\underbrace{{RR} \ \cdots \ R}}\limits_{h-l+1}}$.

(a) $\phi \gg 1$, meaning that the fidelity is always much larger than 1.

(b) ${{ \mathcal K }}_{0}\gg {{ \mathcal Z }}_{0},{{ \mathcal Q }}_{0},{{ \mathcal Z }}_{h+1},{{ \mathcal Q }}_{h+1}$, where ≫ means more than one order of magnitude larger. This condition means that successive additions of R always dominate the transcription process, which guarantees a high velocity.

(c) ${{ \mathcal K }}_{n,h+1}\lt {{ \mathcal Z }}_{n+1}+{{ \mathcal Q }}_{n+1}$, where n = 2, 3, ⋯ , h. This condition means that there can be at most one mismatch in the oligonucleotide ${i}_{h+1} \ \cdots \ {i}_{n} \ \cdots \ {i}_{1}$ of length h + 1, i.e. ${P}_{{i}_{h+1} \ \cdots \ {i}_{n} \ \cdots \ {i}_{1}}\approx 0$ if there are two or more mismatches.

(d) ${{ \mathcal Z }}_{0}\lt {{ \mathcal Z }}_{n}$ and ${{ \mathcal Q }}_{0}\lt {{ \mathcal Q }}_{n}$, $n=1,2, \ \cdots \ ,h$. This condition means that a mismatch (even one at the hth position from the terminal) can enhance the proofreading probability.

Under such bio-relevant conditions, we can solve the equations (7), (11) analytically. For h-order neighbor effect, ${\phi }_{h}={\phi }_{{s}_{h}}{\phi }_{{e}_{h}}$, in which ${\phi }_{{s}_{h}}$ represents the initial discrimination and ${\phi }_{{e}_{h}}$ represents the proofreading efficiency. For the first-order model the approximate fidelity is,
$\begin{eqnarray}\begin{array}{l}{\phi }_{1}={\phi }_{{s}_{1}}{\phi }_{{e}_{1}},\ {\phi }_{{s}_{1}}\approx \displaystyle \frac{{{ \mathcal K }}_{{RR}}}{{{ \mathcal K }}_{{RW}}},\\ {\phi }_{{e}_{1}}\approx 1+\displaystyle \frac{({{ \mathcal Z }}_{{RW}}+{{ \mathcal Q }}_{{RW}})}{{{ \mathcal K }}_{{WR}}}.\end{array}\end{eqnarray}$
Details of the calculation can be found in appendix A. This approximate expression shows good agreement with the numerical results obtained by directly solving equation (7), as shown in appendix A. The initial discrimination can be roughly regarded as the ratio between the incorporation rates of the match and the mismatch. The proofreading efficiency can be roughly regarded as the ratio of the elongation probability of the terminal match (${P}_{{el},R}$) to that of the terminal mismatch (${P}_{{el},W}$). Obviously ${P}_{{el},R}\approx 1$ and ${P}_{{el},W}\approx {{ \mathcal K }}_{{WR}}/({{ \mathcal K }}_{{WR}}+{{ \mathcal Q }}_{{RW}}+{{ \mathcal Z }}_{{RW}})$, which leads to ${\phi }_{{e}_{1}}\approx 1/{P}_{{el},W}$. In fact, many experimentalists have used similar expressions to estimate the fidelity intuitively without justification [10, 13]. Here we have provided a rigorous proof. It should be pointed out that their intuitive analysis is hard to apply to higher-order neighbor effects, while our theory can easily handle such complex cases.
For h-order model the fidelity can be approximately written as,
$\begin{eqnarray}\begin{array}{l}{\phi }_{h}={\phi }_{{s}_{h}}{\phi }_{{e}_{h}},\ {\phi }_{{s}_{h}}\simeq \displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{h+1}},\\ {\phi }_{{e}_{h}}\simeq 1+\displaystyle \frac{{{ \mathcal Q }}_{h}}{{{ \mathcal K }}_{h-1}}{\phi }_{{e}_{h-2}}\\ +\displaystyle \frac{{{ \mathcal Z }}_{h+1}+{{ \mathcal Q }}_{h+1}}{{{ \mathcal K }}_{h}}{\phi }_{{e}_{h-1}},\\ {\phi }_{{e}_{n}}\simeq 1+\displaystyle \frac{{{ \mathcal Q }}_{n}}{{{ \mathcal K }}_{n-1}}{\phi }_{{e}_{n-2}}\\ +\displaystyle \frac{{{ \mathcal Z }}_{n+1}+{{ \mathcal Q }}_{n+1}}{{{ \mathcal K }}_{n}}{\phi }_{{e}_{n-1}}\ (n=h-1,h-2, \ \cdots \ ,3),\\ {\phi }_{{e}_{2}}\simeq 1+\displaystyle \frac{{{ \mathcal Q }}_{2}}{{{ \mathcal K }}_{1}}+\displaystyle \frac{{{ \mathcal Z }}_{3}+{{ \mathcal Q }}_{3}}{{{ \mathcal K }}_{2}}{\phi }_{{e}_{1}},\\ {\phi }_{{e}_{1}}\simeq 1+\displaystyle \frac{{{ \mathcal Z }}_{2}+{{ \mathcal Q }}_{2}}{{{ \mathcal K }}_{1}}.\end{array}\end{eqnarray}$
Details of the calculation can be found in appendix B. To better illustrate the complexity of equation (13), we compare it with the DNA replication fidelity as follows [17],
$\begin{eqnarray}\begin{array}{l}{\phi }^{\mathrm{DNA}}={\phi }_{{s}_{h}}^{\mathrm{DNA}}{\phi }_{{e}_{h}}^{\mathrm{DNA}},\quad {\phi }_{{s}_{h}}^{\mathrm{DNA}}\approx \displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{h+1}},\\ {\phi }_{{e}_{h}}^{\mathrm{DNA}}\approx \left(1+\displaystyle \frac{{{ \mathcal Z }}_{2}}{{{ \mathcal K }}_{1}}\left(1+\displaystyle \frac{{{ \mathcal Z }}_{3}}{{{ \mathcal K }}_{2}}...\left(1+\displaystyle \frac{{{ \mathcal Z }}_{h+1}}{{{ \mathcal K }}_{h}}\right)\right)\right).\end{array}\end{eqnarray}$
The main difference between DNAP and RNAP proofreading mechanism is that DNAP can excise only one terminal dNMP, i.e. the minimal model in figure 1 can describe DNA replication if ${ \mathcal Q }=0$ and equation (13) is thus reduced to equation (14). Equation (14) provides an intuitive way to generalize the analytical expressions of DNAP fidelity from lower-order (e.g. h = 1) to any higher-order neighbor effects, in terms of a set of elongation probabilities defined similarly to the one mentioned above. It seems natural to follow the same logic to generalize equation (12) to higher-order models of RNA transcription by defining elongation probabilities with only the total excision rates ${ \mathcal Q }+{ \mathcal Z }$, i.e. simply replacing ${{ \mathcal Z }}_{n}$ in equation (14) by ${{ \mathcal Q }}_{n}+{{ \mathcal Z }}_{n}$. However, this intuitive logic leads to expressions for RNAP fidelity much different from equation (13).
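The recursion in equation (13) is straightforward to evaluate numerically. The following Python sketch is one possible transcription of it; the function names and the rate values in the usage lines are hypothetical and chosen only for illustration. It uses the convention $\phi_{e_0}=1$, under which the $n=2$ line of equation (13) is the same as the general line.

```python
def proofreading_efficiency(K, Z, Q, h):
    """phi_{e_h} from the recursion in equation (13).

    K, Z, Q map the position index n of the text (K_n, Z_n, Q_n) to a rate.
    """
    phi = {0: 1.0}                               # convention phi_{e_0} = 1
    phi[1] = 1.0 + (Z[2] + Q[2]) / K[1]
    for n in range(2, h + 1):
        phi[n] = (1.0
                  + Q[n] / K[n - 1] * phi[n - 2]
                  + (Z[n + 1] + Q[n + 1]) / K[n] * phi[n - 1])
    return phi[h]


def fidelity(K, Z, Q, h):
    """phi_h = phi_{s_h} * phi_{e_h}, with phi_{s_h} = K_0 / K_{h+1}."""
    return K[0] / K[h + 1] * proofreading_efficiency(K, Z, Q, h)


# Hypothetical rate constants for a second-order (h = 2) model.
K = {0: 100.0, 1: 2.0, 2: 4.0, 3: 0.5}
Z = {2: 1.0, 3: 3.0}
Q = {2: 3.0, 3: 5.0}
print(proofreading_efficiency(K, Z, Q, h=2))
print(fidelity(K, Z, Q, h=2))
```

Because each $\phi_{e_n}$ depends on both $\phi_{e_{n-1}}$ and $\phi_{e_{n-2}}$, the dinucleotides cleavage couples two levels of the recursion, which is what distinguishes equation (13) from a simple nested product.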

2.4. Multi-step transcription model

The real transcription scheme can be more complex than the minimal model, for example, there can be a multi-step incorporation process. One can also apply the steady-state copolymerization analysis to these complex reaction schemes. Here we show a more realistic model proposed by Ehrenberg et al [10] as shown in figure 3.
Figure 3. The transcription model proposed by Ehrenberg et al [10].
This model includes multi-step dinucleotides cleavage and multi-step incorporation. Before NTP incorporation, the RNAP should translocate forward along the template from ${D}_{m}^{\mathrm{PRE}}$ (pre-translocation state) in figure 3 to ${D}_{m}^{\mathrm{POST}}$ (post-translocation state), which empties the active pocket to accept the next incoming NTP. Then one NTP can bind to the RNAP (denoted as association) in figure 3 and eventually be incorporated by an irreversible chemical step (denoted as elongation). Then RNAP will translocate from ${D}_{m+1}^{\mathrm{PRE}}$ to ${D}_{m+1}^{\mathrm{POST}}$ to start the next incorporation. To cleave the terminal NMP, RNAP should first translocate backward from ${D}_{m}^{\mathrm{PRE}}$ to ${D}_{m}^{\mathrm{BACK}}$ and cleave dinucleotides to reach ${D}_{m-2}^{\mathrm{POST}}$.
For brevity, we only consider the first-order neighbor effect here. One can follow the same logic in section 2.1 to write the kinetic equations,
$\begin{eqnarray}\begin{array}{l}{\dot{P}}_{{ij}}^{\mathrm{PRE}}={k}_{c,{ij}}{P}_{{ij}}^{\mathrm{NTP}}-{k}_{1,{ij}}{P}_{{ij}}^{\mathrm{PRE}}\\ +{q}_{1,{ij}}{P}_{{ij}}^{\mathrm{BACK}}-{k}_{2,{ij}}{P}_{{ij}}^{\mathrm{PRE}}+{q}_{2,{ij}}{P}_{{ij}}^{\mathrm{POST}},\\ {\dot{P}}_{{ij}}^{\mathrm{POST}}={k}_{2,{ij}}{P}_{{ij}}^{\mathrm{PRE}}-{q}_{2,{ij}}{P}_{{ij}}^{\mathrm{POST}}\\ -\displaystyle \sum _{k}^{R,W}{k}_{3,{jk}}{P}_{{ij}}^{\mathrm{POST}}+\displaystyle \sum _{k}^{R,W}{q}_{3,{jk}}{P}_{{ijk}}^{\mathrm{NTP}}\\ +\displaystyle \sum _{k,l}^{R,W}{q}_{c,{kl}}{P}_{{ijkl}}^{\mathrm{BACK}},\\ {\dot{P}}_{{ij}}^{\mathrm{NTP}}={k}_{3,{ij}}{P}_{i}^{\mathrm{POST}}-{q}_{3,{ij}}{P}_{{ij}}^{\mathrm{NTP}}-{k}_{c,{ij}}{P}_{{ij}}^{\mathrm{NTP}},\\ {\dot{P}}_{{ij}}^{\mathrm{BACK}}={k}_{1,{ij}}{P}_{{ij}}^{\mathrm{PRE}}-{q}_{1,{ij}}{P}_{{ij}}^{\mathrm{BACK}}-{q}_{c,{ij}}{P}_{{ij}}^{\mathrm{BACK}},\end{array}\end{eqnarray}$
where the superscript X of ${P}^{{\rm{X}}}$ denotes the sub-state and i, j, k, l = R, W. Here ${k}_{3}={k}_{3}^{0}[\mathrm{NTP}]$, where [NTP] represents the NTP concentration. In the steady state, ${\dot{P}}^{{\rm{X}}}=0$, and each ${P}^{{\rm{X}}}$ (X ≠ POST) in equation (15) can be written as a function of ${P}^{\mathrm{POST}}$. After eliminating all sub-states except ${P}^{\mathrm{POST}}$, equation (15) can be reduced to,
$\begin{eqnarray}\begin{array}{rcl}{J}_{{ij}}^{\mathrm{POST}} & = & {\tilde{J}}_{{ij}}^{\mathrm{POST}},\\ {\tilde{J}}_{{ij}}^{\mathrm{POST}} & \equiv & {J}_{{ijR}}^{\mathrm{POST}}+{J}_{{ijW}}^{\mathrm{POST}},\\ {J}_{{ij}}^{\mathrm{POST}} & = & {\bar{{ \mathcal K }}}_{{ij}}{P}_{i}^{\mathrm{POST}}-{\bar{{ \mathcal Z }}}_{{ij}}{P}_{{ij}}^{\mathrm{POST}}-{\bar{{ \mathcal Q }}}_{{ij}}{P}_{{ij}}^{\mathrm{POST}}\\ & & -{\bar{{ \mathcal Q }}}_{{jR}}{P}_{{ijR}}^{\mathrm{POST}}-{\bar{{ \mathcal Q }}}_{{jW}}{P}_{{ijW}}^{\mathrm{POST}}.\end{array}\end{eqnarray}$
The effective kinetic rates $\bar{{ \mathcal K }}$, $\bar{{ \mathcal Z }}$, $\bar{{ \mathcal Q }}$ are determined uniquely,
$\begin{eqnarray}\begin{array}{rcl}{\bar{{ \mathcal K }}}_{{ij}} & = & {\left(\displaystyle \frac{{k}_{3}}{1+\tfrac{{q}_{3}}{{k}_{c}}}\displaystyle \frac{1}{1+\tfrac{{k}_{1}}{{k}_{2}(1+\tfrac{{q}_{1}}{{q}_{c}})}}\right)}_{{ij}},\\ {\bar{{ \mathcal Z }}}_{{ij}} & = & \displaystyle \sum _{k}^{R,W}{\left(\displaystyle \frac{{k}_{3}}{1+\tfrac{{q}_{3}}{{k}_{c}}}\displaystyle \frac{1}{1+\tfrac{{k}_{2}(1+\tfrac{{q}_{1}}{{q}_{c}})}{{k}_{1}}}\right)}_{{jk}},\\ {\bar{{ \mathcal Q }}}_{{ij}} & = & {\left(\displaystyle \frac{{q}_{2}}{1+\tfrac{{k}_{2}(1+\tfrac{{q}_{1}}{{q}_{c}})}{{k}_{1}}}\right)}_{{ij}}.\end{array}\end{eqnarray}$
Here $\bar{{ \mathcal K }}=\bar{{{ \mathcal K }}^{0}}[\mathrm{NTP}]$. It should be noted that $\bar{{ \mathcal Z }}$ are just effective cleavage rates due to the existence of dinucleotides cleavage and they do not correspond to any real mononucleotide cleavage process.
Comparing equations (2) and (16), one can apply the same factorization conjecture in section 2.1 to get,
$\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{{J}_{{RR}}^{\mathrm{POST}}}{{J}_{{WR}}^{\mathrm{POST}}} & = & \displaystyle \frac{{P}_{{RR}}^{\mathrm{POST}}}{{P}_{{WR}}^{\mathrm{POST}}},\ \displaystyle \frac{{J}_{{RW}}^{\mathrm{POST}}}{{J}_{{WW}}^{\mathrm{POST}}}=\displaystyle \frac{{P}_{{RW}}^{\mathrm{POST}}}{{P}_{{WW}}^{\mathrm{POST}}},\\ {J}_{{RW}}^{\mathrm{POST}} & = & {J}_{{WR}}^{\mathrm{POST}},\displaystyle \sum _{{i}_{2},{i}_{1}}^{R,W}{P}_{{i}_{2}{i}_{1}}^{\mathrm{POST}}={P}^{\mathrm{POST}}.\end{array}\end{eqnarray}$
We introduce the normalized ${\bar{P}}_{{i}_{2}{i}_{1}}\equiv {P}_{{i}_{2}{i}_{1}}^{\mathrm{POST}}/{P}^{\mathrm{POST}}$ and ${\bar{J}}_{{i}_{2}{i}_{1}}\equiv {J}_{{i}_{2}{i}_{1}}^{\mathrm{POST}}/{P}^{\mathrm{POST}}$ to rewrite the above equations as,
$\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{{\bar{J}}_{{RR}}}{{\bar{J}}_{{WR}}} & = & \displaystyle \frac{{\bar{P}}_{{RR}}}{{\bar{P}}_{{WR}}},\displaystyle \frac{{\bar{J}}_{{RW}}}{{\bar{J}}_{{WW}}}=\displaystyle \frac{{\bar{P}}_{{RW}}}{{\bar{P}}_{{WW}}},\\ {\bar{J}}_{{RW}} & = & {\bar{J}}_{{WR}},\displaystyle \sum _{{i}_{2},{i}_{1}}^{R,W}{\bar{P}}_{{i}_{2}{i}_{1}}=1.\end{array}\end{eqnarray}$
By solving equation (19) to get ${\bar{J}}_{R}$ and ${\bar{J}}_{W}$, one can calculate the fidelity defined as,
$\begin{eqnarray}\phi =\displaystyle \frac{{J}_{R}^{\mathrm{POST}}}{{J}_{W}^{\mathrm{POST}}}=\displaystyle \frac{{\bar{J}}_{R}}{{\bar{J}}_{W}}.\end{eqnarray}$
With these rigorously determined effective rates, the approximate expression in section 2.3 can be applied. Below we give an example of using the effective rates and equation (12) to calculate the fidelity. Alic et al [13] have measured the apparent incorporation rates of Pol III as ${k}_{{RR}}\simeq 2.2\times {10}^{2}\,{{\rm{s}}}^{-1}$, ${k}_{{RW}}\simeq 3.9\times {10}^{-2}\,{{\rm{s}}}^{-1}$, ${k}_{{WR}}\simeq 9.4\times {10}^{-4}\,{{\rm{s}}}^{-1}$ (note that $k={k}^{0}[\mathrm{NTP}]$, and [NTP] is taken as 600 μM in all the experiments), and the apparent dinucleotides cleavage rate as ${k}_{{\rm{exo}},{RW}}\simeq 8.3\times {10}^{-1}\,{{\rm{s}}}^{-1}$. Since $\bar{{ \mathcal Z }}$ and $\bar{{ \mathcal Q }}$ are both effective cleavage rates due to the existence of dinucleotides cleavage, we assume the apparent dinucleotides cleavage rate ${k}_{{\rm{exo}}}$ can be roughly regarded as $\bar{{ \mathcal Z }}+\bar{{ \mathcal Q }}$, i.e. ${\bar{{ \mathcal Z }}}_{{RW}}+{\bar{{ \mathcal Q }}}_{{RW}}\approx {k}_{{\rm{exo}},{RW}}$. Similarly, the apparent incorporation rates can be roughly regarded as effective incorporation rates, i.e. ${\bar{{ \mathcal K }}}_{{RR}}\approx {k}_{{RR}}$, ${\bar{{ \mathcal K }}}_{{RW}}\approx {k}_{{RW}}$, ${\bar{{ \mathcal K }}}_{{WR}}\approx {k}_{{WR}}$. This leads to ${\phi }_{s}\approx 5.6\times {10}^{3}$, ${\phi }_{e}\approx 0.88\times {10}^{3}$ and $\phi \approx 5.0\times {10}^{6}$.
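The arithmetic behind these estimates can be reproduced with a short Python snippet that evaluates equation (12) with the quoted apparent rates. The variable names are chosen here for illustration only.

```python
# Apparent rates of Pol III quoted in the text from Alic et al [13], in s^-1.
k_RR = 2.2e2        # taken as the effective incorporation rate K_RR
k_RW = 3.9e-2       # taken as the effective incorporation rate K_RW
k_WR = 9.4e-4       # taken as the effective incorporation rate K_WR
k_exo_RW = 8.3e-1   # taken as the effective total cleavage rate Z_RW + Q_RW

phi_s = k_RR / k_RW                 # initial discrimination, equation (12)
phi_e = 1.0 + k_exo_RW / k_WR       # proofreading efficiency, equation (12)
phi = phi_s * phi_e                 # overall fidelity

print(f"phi_s = {phi_s:.2e}, phi_e = {phi_e:.2e}, phi = {phi:.2e}")
```

To two significant figures the results agree with the values quoted above.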

3. Copolymerization model with oligonucleotides cleavage

Experiments have shown that RNAP can cleave more than two nucleotides at once. For example, the bacterial cleavage factor GreA can bind to RNAP and stimulate it to cleave di- and trinucleotides, and GreB is responsible for the cleavage of much longer RNA segments [19]. Although it is still not clear whether the long backtracking contributes to the proofreading efficiency, it deserves a systematic theoretical treatment. Here we propose the oligonucleotides cleavage model and apply the steady-state copolymerization analysis.
Considering the h-order neighbor effect and L-length oligonucleotides cleavage, we give the minimal model as shown in figure 4.
Figure 4. The minimal model of RNA transcription with mono- to Lth-nucleotide cleavage.
Following the same logic in section 2.1, the kinetic equations of P and J are,
$\begin{eqnarray}\begin{array}{l}{\dot{P}}_{{i}_{h+1} \ \ \cdots \ {i}_{2}{i}_{1}}={J}_{{i}_{h+1} \ \ \cdots \ {i}_{2}{i}_{1}}^{L}-{\tilde{{J}^{L}}}_{{i}_{h+1}\ \ \cdots \ {i}_{2}{i}_{1}},\\ {\tilde{{J}^{L}}}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}}\equiv {J}_{{i}_{h+1} \ \ \cdots \ {i}_{2}{i}_{1}R}^{L}+{J}_{{i}_{h+1} \ \ \cdots \ {i}_{2}{i}_{1}W}^{L},\\ {J}_{{i}_{h+1} \ \ \cdots \ {i}_{2}{i}_{1}}^{L}={{ \mathcal K }}_{{i}_{h+1} \ \ \cdots \ {i}_{2}{i}_{1}}{P}_{{i}_{h+1} \ \ \cdots \ {i}_{2}}\\ -\left[\sum_{j=1}^{L} \mathcal{Q}_{j, i_{h+1} \ \ \cdots \ i_{2} i_{1}} P_{i_{h+1} \ \cdots \ i_{2} i_{1}}\right. \\ +\sum_{j=2}^{L} \sum_{I_{1}}^{R, W} \mathcal{Q}_{j, i_{h} \ \cdots \ i_{1} I_{1}} P_{i_{h+1} \ \cdots \ i_{1} I_{1}} \\ + \ \cdots \ +\sum_{j=k}^{L} \sum_{I_{1}, \ \ \cdots \ , I_{k-1}}^{R, W} \\ \times \mathcal{Q}_{j, i_{h-k+1} \ \ \cdots \ i_{1} I_{1} \ \cdots \ I_{k-1}} P_{i_{h+1} \ \ \cdots \ i_{1} I_{1} \ \cdots \ I_{k-1}}+ \ \cdots \ \\ \left.+\sum_{I_{1}, \ \cdots \ , I_{L-1}}^{R, W} \mathcal{Q}_{L, i_{h-L+1} \ \ \cdots \ i_{1} I_{1} \ \cdots \ I_{L-1}} P_{i_{h+1} \ \ \cdots \ i_{1} I_{1} \ \cdots \ I_{L-1}}\right].\end{array}\end{eqnarray}$
It is obvious that $h\geqslant L-1$, since ${ \mathcal Q }$ depends on all the nucleotides on the cleaved fragment. The structure of equation (21) is the same as that of equation (9), although J becomes much more complex. Since the total transcript length $N={N}_{R}+{N}_{W}$ is far greater than L, the steady-state conditions are still satisfied, and the factorization conjecture in equation (10) can be applied to get the following closed equations,
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{J}_{R\ {i}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}^{L}}{{J}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}^{L}}=\displaystyle \frac{{P}_{R\ {i}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}{{P}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ \ {i}_{1}}},\\ {J}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}}^{L}={\tilde{J}}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}}^{L},\\ \sum _{{i}_{h+1} \ , \ {i}_{h}, \ \cdots \ ,{i}_{1}}^{R,W}{P}_{{i}_{h+1} \ {i}_{h} \ \cdots \ {i}_{1}}=1.\end{array}\end{eqnarray}$
One can reasonably assume the bio-relevant conditions are still satisfied and get the approximate fidelity,
$\begin{eqnarray}\begin{array}{l}{\phi }^{L}={\phi }_{{s}_{h}}^{L}{\phi }_{{e}_{h}}^{L},\quad {\phi }_{{s}_{h}}^{L}\approx \displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{h+1}},\quad {\phi }_{{e}_{0}}^{L}\equiv 1,\\ {\phi }_{{e}_{n}}^{L}\approx 1+\sum _{i={i}_{0}}^{n}\displaystyle \frac{{\sum }_{j=n-i+1}^{L}{{ \mathcal Q }}_{j,i+1}}{{{ \mathcal K }}_{i}}{\phi }_{{e}_{i-1}}^{L},\\ {i}_{0}=\max (1,n-L+1),n=1,2, \ \cdots \ ,h.\end{array}\end{eqnarray}$
Details of the calculation can be found in appendix C.
This expression clearly shows the contribution of oligonucleotides cleavage to the overall proofreading efficiency. For example, the contribution of trinucleotides cleavage is ${\phi }_{{e}_{2}}^{L=3}/{\phi }_{{e}_{2}}^{L=2}$, in which ${\phi }_{{e}_{2}}^{L=3}$ and ${\phi }_{{e}_{2}}^{L=2}$ represent the proofreading efficiency with and without trinucleotides cleavage, respectively. Since trinucleotides cleavage corresponds to a second- or higher-order neighbor effect, here we discuss the simplest situation h = 2. This ratio can be written as,
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{\phi }_{{e}_{2}}^{L=3}}{{\phi }_{{e}_{2}}^{L=2}}\approx \left(1+\displaystyle \frac{{{ \mathcal Q }}_{3,{RRW}}}{{{ \mathcal Q }}_{1,{RRW}}+{{ \mathcal Q }}_{2,{RRW}}}\right)\\ \cdot \left(1+\displaystyle \frac{{{ \mathcal Q }}_{3,{RWR}}}{{{ \mathcal Q }}_{1,{RWR}}+{{ \mathcal Q }}_{2,{RWR}}}\right).\end{array}\end{eqnarray}$
If either ${{ \mathcal Q }}_{3,{RRW}}\geqslant {{ \mathcal Q }}_{1,{RRW}}+{{ \mathcal Q }}_{2,{RRW}}$ or ${{ \mathcal Q }}_{3,{RWR}}\geqslant {{ \mathcal Q }}_{1,{RWR}}+{{ \mathcal Q }}_{2,{RWR}}$ holds, the ratio in equation (24) is at least 2, which means that trinucleotides cleavage contributes more to the overall fidelity than dinucleotides cleavage. Equation (24) thus identifies the critical kinetic parameters that experimentalists should focus on: once these rates are measured, the contribution of trinucleotides cleavage to the fidelity can be quantified.
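The comparison between equations (23) and (24) can be illustrated numerically. The following is a minimal sketch, not the authors' code: it writes out equation (23) for h = 2 and evaluates the closed-form ratio of equation (24). The function names and all rate values are hypothetical, and the identification of state index 2 with RWR and index 3 with RRW is our reading of the state labelling in equation (A.12).

```python
def phi_e2(K1, K2, Q_RWR, Q_RRW):
    """Equation (23) written out for h = 2.

    Q_RWR and Q_RRW are lists [Q_1, ..., Q_L] of cleavage rates for the
    terminal states RWR (index 2) and RRW (index 3); L is the list length.
    """
    phi_e1 = 1.0 + sum(Q_RWR) / K1  # n = 1 case of equation (23)
    # n = 2 case: the i = 1 term keeps only j >= 2, the i = 2 term keeps all j.
    return 1.0 + sum(Q_RWR[1:]) / K1 + sum(Q_RRW) / K2 * phi_e1


def ratio_eq24(Q_RRW, Q_RWR):
    """Closed-form ratio of equation (24); each list is [Q_1, Q_2, Q_3]."""
    gain_RRW = 1.0 + Q_RRW[2] / (Q_RRW[0] + Q_RRW[1])
    gain_RWR = 1.0 + Q_RWR[2] / (Q_RWR[0] + Q_RWR[1])
    return gain_RRW * gain_RWR


# Hypothetical rates in arbitrary units, chosen only for illustration.
K1, K2 = 1.0, 1.0
Q_RWR = [1e3, 1e3, 1e3]
Q_RRW = [1e3, 1e3, 1e3]

with_tri = phi_e2(K1, K2, Q_RWR, Q_RRW)              # L = 3
without_tri = phi_e2(K1, K2, Q_RWR[:2], Q_RRW[:2])   # L = 2
print(with_tri / without_tri, ratio_eq24(Q_RRW, Q_RWR))
```

For these illustrative rates, in which cleavage is much faster than incorporation after a mismatch, the ratio obtained from equation (23) agrees closely with the closed form of equation (24).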

4. Summary

In this paper, we develop a systematic method to study DNA transcription fidelity rigorously. First, in section 2.1, we propose and solve the minimal model with first-order neighbor effect to illustrate the basic logic of the steady-state copolymerization analysis. This analysis is then extended to high-order neighbor effects in section 2.2. Section 2.3 gives a high-precision approximate fidelity and shows clearly that the fidelity is determined by a few critical rate constants. The approximate expression also shows how the RNAP incorporation mode and cleavage mode contribute to the overall fidelity. To treat more realistic models, we handle the multi-step model in section 2.4 and obtain the fidelity in terms of uniquely defined effective rates. These sections show that the steady-state copolymerization analysis can be generalized to any realistic model that includes more details, such as the oligonucleotides cleavage model in section 3.
It should be pointed out that the high-order neighbor effect deserves more attention. For the similar process of DNA replication, it has been indicated that second-order proofreading can contribute a factor of ∼10 to the overall fidelity [17], owing to the extra instability of the DNA duplex terminus inside the DNAP when a mismatch occurs at the penultimate site. Since the physical properties of the DNA duplex (in DNA replication) and the RNA-DNA duplex (in DNA transcription) are quite similar, one can expect that similar second-order effects are also present in DNA transcription. Unfortunately, as far as we know, no experiment has yet addressed such higher-order effects. The theory and major conclusions presented in this paper may stimulate future experimental investigations of these issues.
Last, some sub-processes of DNA transcription are not considered in this paper, e.g. RNAP dissociation and rebinding. During transcription, RNAP is tightly bound to the nascent transcript and the template DNA, forming the transcription bubble; the deformation of the transcription bubble is also not considered in our models. All these complex sub-processes can easily be handled by our theory: one can follow the same logic to obtain the effective rates and calculate the fidelity from the approximate expression. However, our theory has a critical limitation: it ignores template sequence effects, which are of greater concern to biochemists. We hope this theory may serve as a starting point for future theoretical studies that incorporate more realistic and more complex factors such as template sequence effects.

Acknowledgments

The authors acknowledge financial support from the National Natural Science Foundation of China (Nos. 11675180, 11774358), the CAS Strategic Priority Research Program (No. XDA17010504), the Key Research Program of Frontier Sciences of CAS (No. Y7Y1472Y61), and the Research Fund of Wenzhou Institute CAS (Nos. WIUCASYJ2020004, WIUCASQD2020009).

Appendix A. The derivation of the approximate fidelity with first-order neighbor effect

Below we give the derivation of the approximate fidelity equation (12). The equations of the minimal model with first-order neighbor effect are
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{J}_{{RR}}}{{J}_{{WR}}}=\displaystyle \frac{{P}_{{RR}}}{{P}_{{WR}}},\ \displaystyle \frac{{J}_{{RW}}}{{J}_{{WW}}}=\displaystyle \frac{{P}_{{RW}}}{{P}_{{WW}}},\\ {J}_{{RW}}={J}_{{WR}},{P}_{{RR}}+{P}_{{RW}}+{P}_{{WR}}+{P}_{{WW}}=1,\end{array}\end{eqnarray}$
where ${J}_{{ij}}={{ \mathcal K }}_{{ij}}{P}_{i}-{{ \mathcal Z }}_{{ij}}{P}_{{ij}}-{{ \mathcal Q }}_{{ij}}{P}_{{ij}}-{{ \mathcal Q }}_{{jR}}{P}_{{ijR}}-{{ \mathcal Q }}_{{jW}}{P}_{{ijW}}$ with $i,j=R,W$. According to the bio-relevant conditions, one can reasonably assume that,
$\begin{eqnarray}{P}_{R}\approx {P}_{{RR}},{P}_{{WW}}\approx 0,{P}_{W}\approx {P}_{{RW}}.\end{eqnarray}$
With equations (A.1) and (A.2), one obtains ${J}_{{RR}}\gg {J}_{{WR}}$ and ${J}_{{RW}}\gg {J}_{{WW}}$, which gives,
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{J}_{R}}{{J}_{W}}=\displaystyle \frac{{J}_{{RR}}+{J}_{{WR}}}{{J}_{{RW}}+{J}_{{WW}}}\\ \approx \displaystyle \frac{{J}_{{RR}}}{{J}_{{RW}}}=\displaystyle \frac{{J}_{{RR}}}{{J}_{{WR}}}=\displaystyle \frac{{P}_{{RR}}}{{P}_{{WR}}}.\end{array}\end{eqnarray}$
To calculate ${P}_{{RR}}/{P}_{{WR}}$ in equation (A.3), we introduce,
$\begin{eqnarray}{a}_{2}\equiv \displaystyle \frac{{P}_{{RR}}}{{P}_{{RW}}},\quad {a}_{1}\equiv \displaystyle \frac{{P}_{{RW}}}{{P}_{{WR}}},\quad \displaystyle \frac{{P}_{{RR}}}{{P}_{{WR}}}={a}_{2}{a}_{1}.\end{eqnarray}$
With equation (A.2), ${J}_{{RR}}/{J}_{{WR}}={P}_{{RR}}/{P}_{{WR}}$ can be rewritten as,
$\begin{eqnarray}{{ \mathcal K }}_{{RR}}-{{ \mathcal Z }}_{{RR}}-{{ \mathcal Q }}_{{RR}}\approx {{ \mathcal K }}_{{WR}}\cdot {a}_{1}-{{ \mathcal Z }}_{{WR}}-{{ \mathcal Q }}_{{WR}}.\end{eqnarray}$
With ${{ \mathcal K }}_{{RR}}\gg {{ \mathcal Z }}_{{WR}}+{{ \mathcal Q }}_{{WR}}$ in condition (b), one can get ${a}_{1}$ as,
$\begin{eqnarray}{a}_{1}\approx \displaystyle \frac{{{ \mathcal K }}_{{RR}}}{{{ \mathcal K }}_{{WR}}}.\end{eqnarray}$
And ${J}_{{RW}}={J}_{{WR}}$ in equation (A.1) leads to,
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{1}{{a}_{2}}\left({{ \mathcal K }}_{{RW}}\cdot {a}_{2}-{{ \mathcal Z }}_{{RW}}-{{ \mathcal Q }}_{{RW}}-{{ \mathcal Q }}_{{WR}}\displaystyle \frac{1}{{a}_{1}}\right)\\ \approx \displaystyle \frac{1}{{a}_{2}{a}_{1}}({{ \mathcal K }}_{{WR}}\cdot {a}_{1}-{{ \mathcal Z }}_{{WR}}-{{ \mathcal Q }}_{{WR}}).\end{array}\end{eqnarray}$
Since ${{ \mathcal Z }}_{{WR}}/{a}_{1}\approx {{ \mathcal K }}_{{WR}}\cdot {{ \mathcal Z }}_{{WR}}/{{ \mathcal K }}_{{RR}}\ll {{ \mathcal K }}_{{WR}}$ in condition (b), one can get ${a}_{2}$ as,
$\begin{eqnarray}\begin{array}{l}{a}_{2}\approx \displaystyle \frac{({{ \mathcal K }}_{{WR}}+{{ \mathcal Z }}_{{RW}}+{{ \mathcal Q }}_{{RW}}-{{ \mathcal Z }}_{{WR}}/{a}_{1})}{{{ \mathcal K }}_{{RW}}}\\ \approx \displaystyle \frac{({{ \mathcal K }}_{{WR}}+{{ \mathcal Z }}_{{RW}}+{{ \mathcal Q }}_{{RW}})}{{{ \mathcal K }}_{{RW}}}.\end{array}\end{eqnarray}$
With ${a}_{1}$ and ${a}_{2}$, one can finally get the approximate fidelity as,
$\begin{eqnarray}\phi \approx {a}_{2}{a}_{1}\approx \displaystyle \frac{{{ \mathcal K }}_{{RR}}}{{{ \mathcal K }}_{{RW}}}\left[1+\displaystyle \frac{({{ \mathcal Z }}_{{RW}}+{{ \mathcal Q }}_{{RW}})}{{{ \mathcal K }}_{{WR}}}\right].\end{eqnarray}$
We introduce ${\rm{\Delta }}\phi \equiv | \phi -{\phi }^{{\rm{app}}}| /{\rm{\min }}[\phi ,{\phi }^{{\rm{app}}}]$ to represent the relative deviation of the approximation from the precise fidelity. Here ${\phi }^{{\rm{app}}}$ represents the approximate fidelity estimated by equation (A.9) and φ represents the precise fidelity calculated by numerically solving equation (7). The numerical examinations in figure A1 show that Δφ is usually only about ${10}^{-2}$ for reasonable rate constants, which means that the approximate fidelity is a good, high-precision estimate.
Figure A1. Numerical examinations of the accuracy of the approximate fidelity. In each subfigure, only one parameter is changed while the others are fixed. Red point: precise fidelity φ. Blue point: relative deviation Δφ. (a) ${{ \mathcal K }}_{{RW}}$: ${10}^{-1}$–$10$, (b) ${{ \mathcal K }}_{{WR}}$: ${10}^{-1}$–$10$, (c) ${{ \mathcal Z }}_{{RW}}$: $1$–${10}^{2}$, (d) ${{ \mathcal Q }}_{{RW}}$: $1$–${10}^{2}$. The fixed parameters are listed in table A1.
Table A1. Parameters used in figure A1.
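| Rate parameter | ${{ \mathcal K }}_{{RR}}$ | ${{ \mathcal K }}_{{RW}}$ | ${{ \mathcal K }}_{{WR}}$ | ${{ \mathcal K }}_{{WW}}$ | ${{ \mathcal Z }}_{{RR}}$ | ${{ \mathcal Z }}_{{RW}}$ | ${{ \mathcal Z }}_{{WR}}$ | ${{ \mathcal Z }}_{{WW}}$ | ${{ \mathcal Q }}_{{RR}}$ | ${{ \mathcal Q }}_{{RW}}$ | ${{ \mathcal Q }}_{{WR}}$ | ${{ \mathcal Q }}_{{WW}}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Figure A1(a) | ${10}^{3}$ | ${10}^{-1}$–$10$ | $1$ | ${10}^{-3}$ | ${10}^{-2}$ | $10$ | $1$ | $10$ | ${10}^{-2}$ | $10$ | $10$ | $20$ |
| Figure A1(b) | ${10}^{3}$ | $1$ | ${10}^{-1}$–$10$ | ${10}^{-3}$ | ${10}^{-2}$ | $10$ | $1$ | $10$ | ${10}^{-2}$ | $10$ | $10$ | $20$ |
| Figure A1(c) | ${10}^{3}$ | $1$ | $1$ | ${10}^{-3}$ | ${10}^{-2}$ | $1$–${10}^{2}$ | $1$ | $10$ | ${10}^{-2}$ | $10$ | $10$ | $20$ |
| Figure A1(d) | ${10}^{3}$ | $1$ | $1$ | ${10}^{-3}$ | ${10}^{-2}$ | $10$ | $1$ | $10$ | ${10}^{-2}$ | $1$–${10}^{2}$ | $10$ | $20$ |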
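As a quick illustration of equation (A.9), the sketch below evaluates the approximate fidelity from the two factors ${a}_{1}$ and ${a}_{2}$ using the fixed parameter values of table A1, and defines the relative deviation Δφ. The function names are hypothetical, and the precise fidelity φ (which requires numerically solving equation (7)) is not computed here.

```python
def approx_fidelity(K_RR, K_RW, K_WR, Z_RW, Q_RW):
    """Approximate fidelity of equation (A.9), built as a2 * a1."""
    a1 = K_RR / K_WR                    # equation (A.6)
    a2 = (K_WR + Z_RW + Q_RW) / K_RW    # last line of equation (A.8)
    return a1 * a2


def relative_deviation(phi, phi_app):
    """Delta phi = |phi - phi_app| / min(phi, phi_app)."""
    return abs(phi - phi_app) / min(phi, phi_app)


# Fixed values from table A1 (the rates that enter equation (A.9)).
fixed = {"K_RR": 1e3, "K_RW": 1.0, "K_WR": 1.0, "Z_RW": 10.0, "Q_RW": 10.0}
phi_app = approx_fidelity(**fixed)
print(phi_app)  # 21000.0
```

Only five of the twelve rate constants appear in the approximate expression, which is why the remaining entries of table A1 are not needed in this sketch.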

Appendix B. The derivation of the approximate fidelity with high-order neighbor effect

Below we give the derivation of the approximate fidelity equation (13). The equations of the minimal model with h-order neighbor effect are
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{J}_{{{Ri}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}{{J}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}=\displaystyle \frac{{P}_{{{Ri}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}{{P}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}},\\ {J}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}}={\tilde{J}}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}},\\ \sum _{{i}_{h+1},{i}_{h}, \ \cdots \ ,{i}_{1}}^{R,W}{P}_{{i}_{h+1}{i}_{h} \ \cdots \ {i}_{1}}=1,\end{array}\end{eqnarray}$
where ${i}_{h},{i}_{h-1},\cdots ,{i}_{1}=R,W$, and,
$\begin{eqnarray}\begin{array}{l}{J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}={{ \mathcal K }}_{{i}_{h+1} \ {i}_{h} \ \cdots \ {i}_{1}}{P}_{{i}_{n} \ \cdots \ {i}_{2}}\\ -{{ \mathcal Z }}_{{i}_{h+1} \ {i}_{h} \ \cdots \ {i}_{1}}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}-{{ \mathcal Q }}_{{i}_{h+1} \ {i}_{h} \ \cdots \ {i}_{1}}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}\\ -{{ \mathcal Q }}_{{i}_{h} \ \cdots \ {i}_{1}R}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}R}-{{ \mathcal Q }}_{{i}_{h} \ \cdots \ {i}_{1}W}{P}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}W,}\\ {\tilde{J}}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}}\equiv {J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}R}+{J}_{{i}_{n} \ \cdots \ {i}_{2}{i}_{1}W}.\end{array}\end{eqnarray}$
According to the bio-relevant conditions, one can reasonably assume that,
$\begin{eqnarray}\begin{array}{l}{P}_{R}\approx {P}_{{RR}}\approx {P}_{R...R}\approx {P}_{0},\\ {P}_{n,l}\approx 0,\quad n,l=1,2,\ldots ,h+1,\\ {P}_{W}\approx {P}_{{RW}}\approx {P}_{R...{RW}}\approx {P}_{h+1},\\ {P}_{{WR}}\approx {P}_{{RWR}}\approx {P}_{R...{RWR}}\approx {P}_{h},\\ ...\end{array}\end{eqnarray}$
With equations (A.10) and (A.12), one obtains ${J}_{R}\approx {J}_{0}$ and ${J}_{W}\approx {J}_{h+1}$, which gives,
$\begin{eqnarray}\phi =\displaystyle \frac{{J}_{R}}{{J}_{W}}\approx \displaystyle \frac{{J}_{0}}{{J}_{h+1}}=\displaystyle \frac{{J}_{0}}{{J}_{1}}=\displaystyle \frac{{P}_{0}}{{P}_{1}}.\end{eqnarray}$
To calculate ${P}_{0}/{P}_{1}$ in equation (A.13), we introduce,
$\begin{eqnarray}\begin{array}{l}{a}_{h+1}\equiv \displaystyle \frac{{P}_{0}}{{P}_{h+1}},\quad {a}_{h}\equiv \displaystyle \frac{{P}_{h+1}}{{P}_{h}},\quad \ \cdots \ ,\quad {a}_{1}\equiv \displaystyle \frac{{P}_{2}}{{P}_{1}},\\ \displaystyle \frac{{P}_{0}}{{P}_{1}}={a}_{h+1}{a}_{h} \ \cdots \ {a}_{1}.\end{array}\end{eqnarray}$
It is easy to obtain the expression for the factor ${a}_{1}$ from the relation ${J}_{h+1}/{J}_{1}={P}_{h+1}/{P}_{1}$ in equation (A.10),
$\begin{eqnarray}{a}_{1}\approx \displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{1}}.\end{eqnarray}$
And equations (A.10) and (A.12) lead to,
$\begin{eqnarray}{J}_{h+1}\approx {J}_{h}\approx \ \cdots \ \approx {J}_{1},\end{eqnarray}$
one can expand equation (A.16) and get,
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{1}{{a}_{h+1}}\left({{ \mathcal K }}_{h+1}\cdot {a}_{h+1}-{{ \mathcal Z }}_{h+1}-{{ \mathcal Q }}_{h+1}-{{ \mathcal Q }}_{h}\displaystyle \frac{1}{{a}_{h}}\right)\\ \approx \ \cdots \ \quad \\ \approx \displaystyle \frac{1}{{a}_{h+1} \ \cdots \ {a}_{2}}\left({{ \mathcal K }}_{2}\cdot {a}_{2}-{{ \mathcal Z }}_{2}-{{ \mathcal Q }}_{2}-{{ \mathcal Q }}_{1}\displaystyle \frac{1}{{a}_{1}}\right)\\ \approx \displaystyle \frac{1}{{a}_{h+1} \ \cdots \ {a}_{2}{a}_{1}}({{ \mathcal K }}_{1}\cdot {a}_{1}-{{ \mathcal Z }}_{1}-{{ \mathcal Q }}_{1}),\end{array}\end{eqnarray}$
further we have,
$\begin{eqnarray}\begin{array}{l}{a}_{i}\approx \displaystyle \frac{1}{{{ \mathcal K }}_{i}}\left({{ \mathcal K }}_{i-1}+{{ \mathcal Z }}_{i}+{{ \mathcal Q }}_{i}-{{ \mathcal Z }}_{i-1}\displaystyle \frac{1}{{a}_{i-1}}\right.\\ \left.-{{ \mathcal Q }}_{i-2}\displaystyle \frac{1}{{a}_{i-1}{a}_{i-2}}\right),(i=h+1,h,\ldots ,3)\\ {a}_{2}\approx \displaystyle \frac{1}{{{ \mathcal K }}_{2}}\left({{ \mathcal K }}_{1}+{{ \mathcal Z }}_{2}+{{ \mathcal Q }}_{2}-{{ \mathcal Z }}_{1}\displaystyle \frac{1}{{a}_{1}}\right),\\ {a}_{1}\approx \displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{1}}.\end{array}\end{eqnarray}$
Since ${a}_{1}$ ≫ 1, one can rewrite equation (A.18) as,
$\begin{eqnarray}\begin{array}{l}{A}_{i}={a}_{i}\cdot \frac{{{ \mathcal K }}_{i}}{{{ \mathcal K }}_{i-1}}\approx 1+\frac{{{ \mathcal Z }}_{i}+{{ \mathcal Q }}_{i}}{{{ \mathcal K }}_{i-1}}-\frac{{{ \mathcal Z }}_{i-1}}{{{ \mathcal K }}_{i-2}{A}_{i-1}}\\ -\frac{{{ \mathcal Q }}_{i-2}}{{{ \mathcal K }}_{i-3}{A}_{i-2}{A}_{i-1}},(i=h+1,h,\ldots ,4)\\ {A}_{3}={a}_{3}\cdot \frac{{{ \mathcal K }}_{3}}{{{ \mathcal K }}_{2}}\approx 1+\frac{{{ \mathcal Z }}_{3}+{{ \mathcal Q }}_{3}}{{{ \mathcal K }}_{2}}-\frac{{{ \mathcal Z }}_{2}}{{{ \mathcal K }}_{1}{A}_{2}},\\ {A}_{2}={a}_{2}\cdot \frac{{{ \mathcal K }}_{2}}{{{ \mathcal K }}_{1}}\approx 1+\frac{{{ \mathcal Z }}_{2}+{{ \mathcal Q }}_{2}}{{{ \mathcal K }}_{1}},\\ {A}_{1}={a}_{1}\cdot \frac{{{ \mathcal K }}_{1}}{{{ \mathcal K }}_{h+1}}\approx \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{h+1}}.\end{array}\end{eqnarray}$
With the product of the above h + 1 factors, one can finally get the approximate fidelity as,
$\begin{eqnarray}\begin{array}{l}{\phi }_{h}={\phi }_{{s}_{h}}{\phi }_{{e}_{h}},\ {\phi }_{{s}_{h}}\simeq \displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{h+1}},\\ {\phi }_{{e}_{h}}\simeq 1+\displaystyle \frac{{{ \mathcal Q }}_{h}}{{{ \mathcal K }}_{h-1}}{\phi }_{{e}_{h-2}}\\ +\displaystyle \frac{{{ \mathcal Z }}_{h+1}+{{ \mathcal Q }}_{h+1}}{{{ \mathcal K }}_{h}}{\phi }_{{e}_{h-1}},\\ {\phi }_{{e}_{n}}\simeq 1+\displaystyle \frac{{{ \mathcal Q }}_{n}}{{{ \mathcal K }}_{n-1}}{\phi }_{{e}_{n-2}}+\displaystyle \frac{{{ \mathcal Z }}_{n+1}+{{ \mathcal Q }}_{n+1}}{{{ \mathcal K }}_{n}}{\phi }_{{e}_{n-1}}\\ (n=h-1,h-2, \ \cdots \ ,3),\\ {\phi }_{{e}_{2}}\simeq 1+\displaystyle \frac{{{ \mathcal Q }}_{2}}{{{ \mathcal K }}_{1}}+\displaystyle \frac{{{ \mathcal Z }}_{3}+{{ \mathcal Q }}_{3}}{{{ \mathcal K }}_{2}}{\phi }_{{e}_{1}},\\ {\phi }_{{e}_{1}}\simeq 1+\displaystyle \frac{{{ \mathcal Z }}_{2}+{{ \mathcal Q }}_{2}}{{{ \mathcal K }}_{1}}.\end{array}\end{eqnarray}$
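The step from the factors ${A}_{i}$ of equation (A.19) to the recursion of equation (A.20) can be checked numerically. The sketch below is a consistency check under assumed inputs, not part of the original derivation: it multiplies ${A}_{2}\cdots {A}_{h+1}$ and compares the product with the recursion, taking ${\phi }_{{e}_{0}}=1$. The function names and the rate values are hypothetical; lists are indexed so that `K[i]`, `Z[i]`, `Q[i]` stand for ${{ \mathcal K }}_{i}$, ${{ \mathcal Z }}_{i}$, ${{ \mathcal Q }}_{i}$.

```python
def phi_e_product(h, K, Z, Q):
    """Product A_2 * ... * A_{h+1} with the factors of equation (A.19)."""
    A = {}
    prod = 1.0
    for i in range(2, h + 2):
        a = 1.0 + (Z[i] + Q[i]) / K[i - 1]
        if i >= 3:  # correction present from A_3 onwards
            a -= Z[i - 1] / (K[i - 2] * A[i - 1])
        if i >= 4:  # second correction present from A_4 onwards
            a -= Q[i - 2] / (K[i - 3] * A[i - 2] * A[i - 1])
        A[i] = a
        prod *= a
    return prod


def phi_e_recursive(h, K, Z, Q):
    """Recursion of equation (A.20), with phi_{e_0} = 1."""
    phi = {0: 1.0}
    for n in range(1, h + 1):
        val = 1.0 + (Z[n + 1] + Q[n + 1]) / K[n] * phi[n - 1]
        if n >= 2:
            val += Q[n] / K[n - 1] * phi[n - 2]
        phi[n] = val
    return phi[h]


# Hypothetical rates in arbitrary units; index 0 of Z and Q is unused.
K = [1e3, 1.0, 2.0, 1.0, 2.0, 1.0]
Z = [0.0, 1.0, 10.0, 5.0, 3.0, 2.0]
Q = [0.0, 10.0, 10.0, 4.0, 6.0, 8.0]
for h in (1, 2, 3, 4):
    print(h, phi_e_product(h, K, Z, Q), phi_e_recursive(h, K, Z, Q))
```

The two columns should agree up to floating-point rounding, since the recursion is an algebraic rearrangement of the product once the approximations in equation (A.19) are accepted.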

Appendix C. The derivation of the approximate fidelity of oligonucleotides cleavage model

Below we give the derivation of the approximate fidelity equation (23). The equations of the minimal model with h-order neighbor effect and L-length oligonucleotides cleavage are
$\begin{eqnarray}\begin{array}{l}\displaystyle \frac{{J}_{R\ {i}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}^{L}}{{J}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}^{L}}=\displaystyle \frac{{P}_{R\ {i}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}}{{P}_{{{Wi}}_{h}{i}_{h-1} \ \cdots \ {i}_{1}}},\\ {J}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}}^{L}={\tilde{J}}_{{i}_{h} \ \cdots \ {i}_{2}{i}_{1}}^{L},\\ \displaystyle \sum _{{i}_{h+1}\ ,\ {i}_{h}, \ \cdots \ ,{i}_{1}}^{R,W}{P}_{{i}_{h+1} \ {i}_{h} \ \cdots \ {i}_{1}}=1,\end{array}\end{eqnarray}$
where ${i}_{h},{i}_{h-1},\cdots ,{i}_{1}=R,W$, and,
$\begin{eqnarray}\begin{array}{l}{J}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}}^{L}={{ \mathcal K }}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}}{P}_{{i}_{h+1} \ \cdots \ {i}_{2}}\\ -\left[\displaystyle \sum _{j=1}^{L}{{ \mathcal Q }}_{j,{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}}{P}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}}\right.\\ +\displaystyle \sum _{j=2}^{L}\displaystyle \sum _{{I}_{1}}^{R,W}{{ \mathcal Q }}_{j,{i}_{h} \ \cdots \ {i}_{1}{I}_{1}}{P}_{{i}_{h+1} \ \cdots \ {i}_{1}{I}_{1}}\\ + \ \cdots \ +\displaystyle \sum _{j=k}^{L}\displaystyle \sum _{{I}_{1}, \ \cdots \ ,{I}_{k-1}}^{R,W}{{ \mathcal Q }}_{j,{i}_{h-k+1} \ \cdots \ {i}_{1}{I}_{1} \ \cdots \ {I}_{k-1}}{P}_{{i}_{h+1} \ \cdots \ {i}_{1}{I}_{1} \ \cdots \ {I}_{k-1}}+ \ \cdots \ \\ \left.+\displaystyle \sum _{{I}_{1}, \ \cdots \ ,{I}_{L-1}}^{R,W}{{ \mathcal Q }}_{L,{i}_{h-L+1} \ \cdots \ {i}_{1}{I}_{1} \ \cdots \ {I}_{L-1}}{P}_{{i}_{h+1} \ \cdots \ {i}_{1}{I}_{1} \ \cdots \ {I}_{L-1}}\right],\\ {\tilde{{J}^{L}}}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}}={J}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}R}^{L}+{J}_{{i}_{h+1} \ \cdots \ {i}_{2}{i}_{1}W}^{L}.\end{array}\end{eqnarray}$
Following the same logic, equation (A.12) still holds here to give,
$\begin{eqnarray}\phi =\displaystyle \frac{{J}_{R}^{L}}{{J}_{W}^{L}}\approx \displaystyle \frac{{J}_{0}^{L}}{{J}_{h+1}^{L}}=\displaystyle \frac{{J}_{0}^{L}}{{J}_{1}^{L}}=\displaystyle \frac{{P}_{0}}{{P}_{1}}.\end{eqnarray}$
Equations (A.12) and (A.21) lead to,
$\begin{eqnarray}{J}_{1}^{L}\approx {J}_{2}^{L}\approx \ \cdots \ \approx {J}_{h+1}^{L}.\end{eqnarray}$
According to condition (c), every P that contains two or more Ws is negligible in equation (A.24); for example, ${\sum }_{j=2}^{L}{\sum }_{{I}_{1}}^{R,W}{{ \mathcal Q }}_{j,{i}_{h} \ \cdots \ {i}_{1}{I}_{1}}{P}_{{i}_{h+1} \ \cdots \ {i}_{1}{I}_{1}}\approx {\sum }_{j=2}^{L}{{ \mathcal Q }}_{j,{i}_{h} \ \cdots \ {i}_{1}R}{P}_{{i}_{h+1} \ \cdots \ {i}_{1}R}$. Equation (A.24) can therefore be rewritten as,
$\begin{eqnarray}\begin{array}{l}{{ \mathcal K }}_{1}{P}_{2}-\left(\displaystyle \sum _{j=1}^{L}{{ \mathcal Q }}_{j,1}{P}_{1}\right)\\ \approx {{ \mathcal K }}_{2}{P}_{3}-\left(\displaystyle \sum _{j=1}^{L}{{ \mathcal Q }}_{j,2}{P}_{2}+\displaystyle \sum _{j=2}^{L}{{ \mathcal Q }}_{j,1}{P}_{1}\right)\\ \approx \ \cdots \ \\ \approx {{ \mathcal K }}_{L}{P}_{L+1}-\left(\displaystyle \sum _{j=1}^{L}{{ \mathcal Q }}_{j,L}{P}_{L}+\displaystyle \sum _{j=2}^{L}{{ \mathcal Q }}_{j,L-1}{P}_{L-1}\right.\\ \left.+ \ \cdots \ +{{ \mathcal Q }}_{L,1}{P}_{1}\right)\\ \approx {{ \mathcal K }}_{L+1}{P}_{L+2}-\left(\displaystyle \sum _{j=1}^{L}{{ \mathcal Q }}_{j,L+1}{P}_{L+1}+\displaystyle \sum _{j=2}^{L}{{ \mathcal Q }}_{j,L}{P}_{L}\right.\\ \left.+ \ \cdots \ +{{ \mathcal Q }}_{L,2}{P}_{2}\right)\\ \approx \ \cdots \ \\ \approx {{ \mathcal K }}_{h+1}{P}_{0}-\left(\displaystyle \sum _{j=1}^{L}{{ \mathcal Q }}_{j,h+1}{P}_{h+1}+\displaystyle \sum _{j=2}^{L}{{ \mathcal Q }}_{j,h}{P}_{h}\right.\\ \left.+ \ \cdots \ +{{ \mathcal Q }}_{L,h-L+1}{P}_{h-L+1}\right).\end{array}\end{eqnarray}$
To calculate ${P}_{0}/{P}_{1}$ in equation (A.23), we introduce,
$\begin{eqnarray}\begin{array}{l}{A}_{1}=\displaystyle \frac{{{ \mathcal K }}_{1}}{{{ \mathcal K }}_{h+1}}\displaystyle \frac{{P}_{2}}{{P}_{1}},\ {A}_{2}=\displaystyle \frac{{{ \mathcal K }}_{2}}{{{ \mathcal K }}_{1}}\displaystyle \frac{{P}_{3}}{{P}_{2}},\ \ \cdots \ ,\\ {A}_{h}=\displaystyle \frac{{{ \mathcal K }}_{h}}{{{ \mathcal K }}_{h-1}}\displaystyle \frac{{P}_{h+1}}{{P}_{h}},\ {A}_{h+1}=\displaystyle \frac{{{ \mathcal K }}_{h+1}}{{{ \mathcal K }}_{h}}\displaystyle \frac{{P}_{0}}{{P}_{h+1}},\\ \displaystyle \frac{{P}_{0}}{{P}_{1}}={A}_{h+1}\cdot {A}_{h} \ \cdots \ {A}_{1}.\end{array}\end{eqnarray}$
One can get ${A}_{1}\approx {{ \mathcal K }}_{0}/{{ \mathcal K }}_{h+1}$ from the relation ${J}_{h+1}^{L}/{J}_{1}^{L}={P}_{h+1}/{P}_{1}$ in equation (A.21). From equations (A.25) and (A.26), we have,
$\begin{eqnarray*}\begin{array}{l}{A}_{2}\approx 1+\displaystyle \frac{{\sum }_{j=1}^{L}{{ \mathcal Q }}_{j,2}}{{{ \mathcal K }}_{1}},\\ {A}_{3}\approx 1+\displaystyle \frac{{\sum }_{j=1}^{L}{{ \mathcal Q }}_{j,3}}{{{ \mathcal K }}_{2}}-\displaystyle \frac{{{ \mathcal Q }}_{\mathrm{1,2}}}{{{ \mathcal K }}_{1}}\displaystyle \frac{1}{{A}_{2}},\\ \ \cdots \ ,\\ {A}_{n}\approx 1+\displaystyle \frac{{\sum }_{j=1}^{L}{{ \mathcal Q }}_{j,n}}{{{ \mathcal K }}_{n-1}}-\left(\displaystyle \frac{{{ \mathcal Q }}_{1,n-1}}{{{ \mathcal K }}_{n-2}}\displaystyle \frac{1}{{A}_{n-1}}\right.\\ +\displaystyle \frac{{{ \mathcal Q }}_{2,n-2}}{{{ \mathcal K }}_{n-3}}\displaystyle \frac{1}{{A}_{n-1}{A}_{n-2}}\\ \left.+ \ \cdots \ +\displaystyle \frac{{{ \mathcal Q }}_{n-\mathrm{2,2}}}{{{ \mathcal K }}_{1}}\displaystyle \frac{1}{{A}_{n-1}{A}_{n-2} \ \cdots \ {A}_{2}}\right),\\ (n\leqslant L+1)\end{array}\end{eqnarray*}$
$\begin{eqnarray}\,\begin{array}{l}{A}_{n}\approx 1+\displaystyle \frac{{\sum }_{j=1}^{L}{{ \mathcal Q }}_{j,n}}{{{ \mathcal K }}_{n-1}}-\left(\displaystyle \frac{{{ \mathcal Q }}_{1,n-1}}{{{ \mathcal K }}_{n-2}}\displaystyle \frac{1}{{A}_{n-1}}\right.\\ +\displaystyle \frac{{{ \mathcal Q }}_{2,n-2}}{{{ \mathcal K }}_{n-3}}\displaystyle \frac{1}{{A}_{n-1}{A}_{n-2}}\\ \left.+ \ \cdots \ +\displaystyle \frac{{{ \mathcal Q }}_{L,n-L}}{{{ \mathcal K }}_{n-(L+1)}}\displaystyle \frac{1}{{A}_{n-1}{A}_{n-2} \ \cdots \ {A}_{n-L}}\right).\\ (n\gt L+1)\\ \end{array}\,\end{eqnarray}$
Letting ${\phi }_{{s}_{h}}={A}_{1}$ and ${\phi }_{{e}_{h}}={A}_{2}{A}_{3} \ \cdots \ {A}_{h+1}$, we finally have,
$\begin{eqnarray}\begin{array}{l}\phi ={\phi }_{{s}_{h}}{\phi }_{{e}_{h}},\quad {\phi }_{{s}_{h}}=\displaystyle \frac{{{ \mathcal K }}_{0}}{{{ \mathcal K }}_{h+1}},\quad {\phi }_{{e}_{0}}\equiv 1,\\ {\phi }_{{e}_{n}}=1+\displaystyle \sum _{i={i}_{0}}^{n}\displaystyle \frac{{\sum }_{j=n-i+1}^{L}{{ \mathcal Q }}_{j,i+1}}{{{ \mathcal K }}_{i}}{\phi }_{{e}_{i-1}},\\ {i}_{0}=\max (1,n-L+1),n=1,2, \ \cdots \ ,h.\end{array}\end{eqnarray}$
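The passage from the factors ${A}_{n}$ of equation (A.27) to the closed recursion of equation (A.28) can also be checked numerically for general L. The following is a minimal sketch under assumed inputs, not the authors' code: function names and rate values are hypothetical, `K[i]` stands for ${{ \mathcal K }}_{i}$ and `Q[j][n]` for ${{ \mathcal Q }}_{j,n}$.

```python
def phi_e_product_L(h, L, K, Q):
    """Product A_2 * ... * A_{h+1} with the factors of equation (A.27)."""
    A = {}
    prod = 1.0
    for n in range(2, h + 2):
        a = 1.0 + sum(Q[j][n] for j in range(1, L + 1)) / K[n - 1]
        denom = 1.0
        # n - 2 corrections when n <= L + 1, otherwise L corrections.
        for m in range(1, min(L, n - 2) + 1):
            denom *= A[n - m]
            a -= Q[m][n - m] / (K[n - m - 1] * denom)
        A[n] = a
        prod *= a
    return prod


def phi_e_recursive_L(h, L, K, Q):
    """Recursion of equation (A.28), with phi_{e_0} = 1."""
    phi = {0: 1.0}
    for n in range(1, h + 1):
        val = 1.0
        for i in range(max(1, n - L + 1), n + 1):
            q = sum(Q[j][i + 1] for j in range(n - i + 1, L + 1))
            val += q / K[i] * phi[i - 1]
        phi[n] = val
    return phi[h]


# Hypothetical rates in arbitrary units; row 0 of Q is unused padding.
h_max, L_max = 4, 3
K = [1e3, 1.0, 2.0, 1.0, 2.0, 1.0]
Q = [[0.0] * (h_max + 2)] + [
    [10.0 / j + n for n in range(h_max + 2)] for j in range(1, L_max + 1)
]
for L in range(1, L_max + 1):
    print(L, phi_e_product_L(h_max, L, K, Q), phi_e_recursive_L(h_max, L, K, Q))
```

Running the loop for several values of L also shows how the proofreading factor changes as longer cleavage fragments are allowed, for this particular (arbitrary) choice of rates.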

Corrections were made to this article on 25 January 2022. The layout of equation 21 was amended.

[1] Blank A, Gallant J A, Burgess R R and Loeb L A 1986 Biochemistry 25 5920
[2] Rosenberger R F and Hilton J 1983 Mol. Gen. Genet. 191 207
[3] Ninio J 1991 Biochimie 73 1517
[4] Roghanian M, Zenkin N and Yuzenkova Y 2015 Nucleic Acids Res. 43 1529
[5] Hopfield J J 1974 Proc. Natl Acad. Sci. USA 71 4135
[6] Ninio J, Bernardi F, Brun G, Assairi L, Lauber M and Chapeville F 1975 FEBS Lett. 57 139
[7] Surratt C K, Milan S C and Chamberlin M J 1991 Proc. Natl Acad. Sci. USA 88 7983
[8] Izban M G and Luse D S 1992 Genes Dev. 6 1342
[9] Lange U and Hausner W 2004 Mol. Microbiol. 52 1133
[10] Mellenius H and Ehrenberg M 2017 Nucleic Acids Res. 45 11582
[11] Voliotis M, Cohen N, Molina-París C and Liverpool T B 2009 Phys. Rev. Lett. 102 258101
[12] Sahoo M and Klumpp S 2013 J. Phys.: Condens. Matter 25 374104
[13] Alic N, Ayoub N, Landrieux E, Favry E, Baudouin-Cornu P, Riva M and Carles C 2007 Proc. Natl Acad. Sci. USA 104 10400
[14] Erie D, Hajiseyedjavadi O, Young M and von Hippel P 1993 Science 262 867
[15] Watkins N E Jr, Kennelly W J, Tsay M J, Tuin A, Swenson L, Lee H-R, Morosyuk S, Hicks D A and SantaLucia J Jr 2011 Nucleic Acids Res. 39 1894
[16] Shu Y G, Song Y S, Ou-Yang Z C and Li M 2015 J. Phys.: Condens. Matter 27 235105
[17] Song Y S, Shu Y G, Zhou X, Ou-Yang Z C and Li M 2016 J. Phys.: Condens. Matter 29 025101
[18] Gillespie D T 1977 J. Phys. Chem. 81 2340
[19] Borukhov S, Sagitov V and Goldfarb A 1993 Cell 72 459
