3.9 Correction for correlations

Suppose that, from an intricate logical analysis, drawing and replacing a red ball increases the probability for a red one at the next draw by some small amount ε > 0, while drawing and replacing a white one decreases the probability for a red one at the next draw by a (possibly equal) small quantity δ > 0; and that the influence of earlier draws than the last one is negligible compared with ε or δ. You may call this effect a small 'propensity' if you like; at least it expresses a physical causation that operates only forward in time. Then, letting C stand for all the above background information, including the statements just made about correlations and the information that we draw n balls, we have



$$p(R_k|R_{k-1}C) = p + \epsilon, \qquad p(R_k|W_{k-1}C) = p - \delta, \tag{3.93}$$

where p ≡ M/N. From this, the probability for drawing r red and (n − r ) white balls in any specified order is easily seen to be


$$p\,(p+\epsilon)^{c}\,(p-\delta)^{c'}\,(1-p+\delta)^{w}\,(1-p-\epsilon)^{w'} \tag{3.94}$$

if the first draw is red; whereas, if the first is white, the first factor in (3.94) should be (1 − p). Here, c is the number of red draws preceded by red ones, c' the number of red preceded by white, w the number of white draws preceded by white, and w' the number of white preceded by red. Evidently,


$$c + c' = \begin{Bmatrix} r-1 \\ r \end{Bmatrix}, \qquad w + w' = \begin{Bmatrix} n-r \\ n-r-1 \end{Bmatrix}, \tag{3.95}$$

the upper and lower cases holding when the first draw is red or white, respectively. When r and (n − r) are small, the presence of ε and δ in (3.94) makes little difference, and the equation reduces for all practical purposes to


$$p^{r}(1-p)^{n-r}, \tag{3.96}$$

as in the binomial distribution (3.92). But, as these numbers increase, we can use relations of the form


$$\left(1 + \frac{\epsilon}{p}\right)^{c} \simeq \exp\left\{\frac{c\epsilon}{p}\right\}, \tag{3.97}$$

and (3.94) goes into


$$p^{r}(1-p)^{n-r}\,\exp\left\{\frac{c\epsilon - c'\delta}{p} + \frac{w\delta - w'\epsilon}{1-p}\right\}. \tag{3.98}$$

The probability for drawing r red and (n − r) white balls now depends on the order in which red and white appear, and, for a given ε, when the numbers c, c', w, w' become sufficiently large, the probability can become arbitrarily large (or small) compared with (3.92).


We see this effect most clearly if we suppose that N = 2M, p = 1/2, in which case we will surely have ε = δ. The exponential factor in (3.98) then reduces to


$$\exp\{2\epsilon(c - c' + w - w')\}. \tag{3.99}$$

This shows that (i) as the number n of draws tends to infinity, the probability for results containing 'long runs' (i.e. long strings of red (or white) balls in succession) becomes arbitrarily large compared with the value given by the 'randomized' approximation; (ii) this effect becomes appreciable when the numbers εc, etc., become of order unity. Thus, if ε = 10⁻², the randomized approximation can be trusted reasonably well as long as n < 100; beyond that, we might delude ourselves by using it. Indeed, it is notorious that in real repetitive experiments where conditions appear to be the same at each trial, such runs – although extremely improbable on the randomized approximation – are nevertheless observed to happen.

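The growth of this order dependence is easy to check numerically. The sketch below is not part of the original text; it assumes the illustrative values p = 1/2 (as for N = 2M) and ε = δ = 0.01, evaluates the exact ordered probability (3.94) for a 'long run' sequence and for a strictly alternating sequence, and compares each with the order-blind value (3.96) and the approximation (3.99).

```python
import math

# Minimal numerical sketch (values assumed for illustration, not from the text):
# p = 1/2 as for N = 2M, and eps = delta = 0.01.
p, eps, delta = 0.5, 0.01, 0.01

def ordered_prob(seq):
    """Exact probability (3.94) of one specific red/white sequence such as 'RRWW'.
    The first factor is p or (1 - p); every later draw is conditioned on the
    previous one through (3.93)."""
    prob = p if seq[0] == 'R' else 1.0 - p
    for prev, cur in zip(seq, seq[1:]):
        p_red = p + eps if prev == 'R' else p - delta      # (3.93)
        prob *= p_red if cur == 'R' else 1.0 - p_red
    return prob

def run_counts(seq):
    """c, c', w, w' of (3.94): red-after-red, red-after-white, white-after-white, white-after-red."""
    pairs = list(zip(seq, seq[1:]))
    return (pairs.count(('R', 'R')), pairs.count(('W', 'R')),
            pairs.count(('W', 'W')), pairs.count(('R', 'W')))

n, r = 200, 100
long_run    = 'R' * r + 'W' * (n - r)     # one long run of each color
alternating = 'RW' * r                    # colors strictly alternating
randomized  = p**r * (1 - p)**(n - r)     # order-blind value (3.96)

for name, seq in [('long run', long_run), ('alternating', alternating)]:
    c, c2, w, w2 = run_counts(seq)
    exact_ratio  = ordered_prob(seq) / randomized
    approx_ratio = math.exp(2 * eps * (c - c2 + w - w2))   # exponential factor (3.99)
    print(f'{name:12s} exact ratio {exact_ratio:.3g}, approximation (3.99) {approx_ratio:.3g}')
```

With these assumed values the long-run ordering comes out roughly fifty times more probable than the order-blind value, and the alternating ordering roughly fifty times less probable, which is the behaviour described in (i) and (ii) above.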

Now let us note how the correlations expressed by (3.93) affect some of our previous calculations. The probabilities for the first draw are of course the same as (3.8); we now use the notation


$$p(R_1|C) = p = \frac{M}{N}, \qquad p(W_1|C) = q = 1 - p. \tag{3.100}$$

But for the second trial we have instead of (3.35)





$$\begin{aligned}
p(R_2|C) &= p(R_2|R_1C)\,p(R_1|C) + p(R_2|W_1C)\,p(W_1|C) \\
&= (p+\epsilon)p + (p-\delta)q \\
&= p + (p\epsilon - q\delta),
\end{aligned} \tag{3.101}$$

and continuing for the third trial




$$\begin{aligned}
p(R_3|C) &= p(R_3|R_2C)\,p(R_2|C) + p(R_3|W_2C)\,p(W_2|C) \\
&= (p+\epsilon)\,[\,p + (p\epsilon - q\delta)\,] + (p-\delta)\,[\,q - (p\epsilon - q\delta)\,] \\
&= p + (1 + \epsilon + \delta)(p\epsilon - q\delta).
\end{aligned} \tag{3.102}$$

We see that p(R_k|C) is no longer independent of k; the correlated probability distribution is no longer exchangeable. But does p(R_k|C) approach some limit as k → ∞?

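Before answering this analytically, a minimal numerical sketch (assuming the illustrative values p = 0.4, ε = 0.03, δ = 0.01, none of which come from the text) can simply iterate the same total-probability step used in (3.101) and (3.102):

```python
# Minimal sketch (assumed values p = 0.4, eps = 0.03, delta = 0.01, chosen only for
# illustration), iterating the same total-probability step as (3.101) and (3.102):
# p(R_k|C) = (p + eps) p(R_{k-1}|C) + (p - delta) p(W_{k-1}|C).
p, eps, delta = 0.4, 0.03, 0.01
pk = p                                     # p(R_1|C) = p, from (3.100)
for k in range(2, 31):
    pk = (p + eps) * pk + (p - delta) * (1.0 - pk)
    if k in (2, 3, 10, 30):
        print(f'p(R_{k}|C) = {pk:.6f}')
# The values drift away from p and appear to settle toward a limit; the closed
# form (3.118) derived below identifies it as (p - delta)/(1 - eps - delta).
```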

It would be almost impossible to guess the general p(R_k|C) by induction, following the method of (3.101) and (3.102) a few steps further. For this calculation we need a more powerful method. If we write the probabilities for the kth trial as a vector,


$$V_k \equiv \begin{pmatrix} p(R_k|C) \\ p(W_k|C) \end{pmatrix}, \tag{3.103}$$

then (3.93) can be expressed in matrix form:


$$V_k = M\,V_{k-1}, \tag{3.104}$$

with

$$M = \begin{pmatrix} p+\epsilon & p-\delta \\ q-\epsilon & q+\delta \end{pmatrix}. \tag{3.105}$$

This defines a Markov chain of probabilities, and M is called the transition matrix. Now the slow induction of (3.101) and (3.102) proceeds instantly to any distance we please:


$$V_k = M^{k-1}\,V_1. \tag{3.106}$$

So, to have the general solution, we need only to find the eigenvectors and eigenvalues of M. The characteristic polynomial is


$$C(\lambda) \equiv \det(M - \lambda I) = \lambda^2 - \lambda(1+\epsilon+\delta) + (\epsilon+\delta), \tag{3.107}$$

so the roots of C(λ) = 0 are the eigenvalues



$$\lambda_1 = 1, \qquad \lambda_2 = \epsilon + \delta. \tag{3.108}$$

Now, for any 2 × 2 matrix


$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{3.109}$$

with an eigenvalue λ, the corresponding (non-normalized) right eigenvector is


$$x = \begin{pmatrix} b \\ \lambda - a \end{pmatrix}, \tag{3.110}$$

for which we have at once Mx = λx. Therefore, our eigenvectors are


$$x_1 = \begin{pmatrix} p-\delta \\ q-\epsilon \end{pmatrix}, \qquad x_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \tag{3.111}$$

These are not orthogonal, since M is not a symmetric matrix. Nevertheless, if we use (3.111) to define the transformation matrix


$$S \equiv \begin{pmatrix} p-\delta & 1 \\ q-\epsilon & -1 \end{pmatrix}, \tag{3.112}$$

we find its inverse to be


$$S^{-1} = \frac{1}{1-\epsilon-\delta}\begin{pmatrix} 1 & 1 \\ q-\epsilon & -(p-\delta) \end{pmatrix}, \tag{3.113}$$

and we can verify by direct matrix multiplication that


$$S^{-1} M S = \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & \epsilon+\delta \end{pmatrix}, \tag{3.114}$$

where Λ is the diagonalized matrix. Then we have, for any r, positive, negative, or even complex:


$$M^r = S\,\Lambda^r S^{-1}, \tag{3.115}$$

or


$$M^r = \frac{1}{1-\epsilon-\delta}\begin{pmatrix} (p-\delta) + (\epsilon+\delta)^r(q-\epsilon) & (p-\delta)\,[1-(\epsilon+\delta)^r] \\ (q-\epsilon)\,[1-(\epsilon+\delta)^r] & (q-\epsilon) + (\epsilon+\delta)^r(p-\delta) \end{pmatrix}, \tag{3.116}$$

and since


$$V_1 = \begin{pmatrix} p \\ q \end{pmatrix}, \tag{3.117}$$

the general solution (3.106) sought is


$$p(R_k|C) = \frac{(p-\delta) - (\epsilon+\delta)^{k-1}(\epsilon p - \delta q)}{1-\epsilon-\delta}. \tag{3.118}$$

We can check that this agrees with (3.100), (3.101) and (3.102). From examining (3.118) it is clear why it would have been almost impossible to guess the general formula by induction. When ε = δ = 0, it reduces to p(R_k|C) = p, supplying the proof promised after Eq. (3.37).


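As a quick numerical cross-check (again with the assumed values p = 0.4, ε = 0.03, δ = 0.01, not taken from the text), the closed form (3.118) can be compared against brute-force matrix powers from (3.106), and the eigenvalues (3.108) verified directly:

```python
import numpy as np

# Numerical cross-check (assumed values p = 0.4, eps = 0.03, delta = 0.01): the closed
# form (3.118) against brute-force matrix powers M^{k-1} V_1 of (3.106), plus the
# eigenvalues (3.108).
p, eps, delta = 0.4, 0.03, 0.01
q = 1.0 - p
M = np.array([[p + eps, p - delta],
              [q - eps, q + delta]])       # transition matrix (3.105)
V1 = np.array([p, q])                      # first-draw probabilities (3.117)

def p_red_closed(k):
    """p(R_k|C) from the closed form (3.118)."""
    return ((p - delta) - (eps + delta)**(k - 1) * (eps * p - delta * q)) / (1.0 - eps - delta)

for k in (1, 2, 3, 10, 50):
    Vk = np.linalg.matrix_power(M, k - 1) @ V1        # (3.106)
    print(k, Vk[0], p_red_closed(k))                  # the two numbers should agree

print(sorted(np.linalg.eigvals(M)))        # should be eps + delta and 1, as in (3.108)
```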

Although we started this discussion by supposing that ε and δ were small and positive, we have not actually used that assumption, and so, whatever their values, the solution (3.118) is exact for the abstract model that we have defined. This enables us to include two interesting extreme cases. Even if not small, ε and δ must at least be bounded, because all quantities in (3.93) must be probabilities (i.e. in [0, 1]). This requires that


$$-p \le \epsilon \le 1-p, \qquad -(1-p) \le \delta \le p, \tag{3.119}$$

or


$$-1 \le \epsilon + \delta \le 1. \tag{3.120}$$

But from (3.119), ε + δ = 1 if and only if ε = 1 − p, δ = p, in which case the transition matrix reduces to the unit matrix


$$M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \tag{3.121}$$

and there are no ‘transitions’. This is a degenerate case in which the positive correlations are so strong that whatever color happens to be drawn on the first trial is certain to be drawn also on all succeeding ones:


$$p(R_k|R_1C) = 1, \qquad \text{all } k. \tag{3.122}$$

Likewise, if ε + δ = −1, then the transition matrix must be


$$M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \tag{3.123}$$

and we have nothing but transitions; i.e. the negative correlations are so strong that the colors are certain to alternate after the first draw:


$$p(R_k|R_1C) = \begin{cases} 1, & k \text{ odd}, \\ 0, & k \text{ even}. \end{cases} \tag{3.124}$$

This case is unrealistic because intuition tells us rather strongly that ε and δ should be positive quantities; surely, whatever the logical analysis used to assign the numerical value of ε, leaving a red ball in the top layer must increase, not decrease, the probability of red on the next draw. But if ε and δ must not be negative, then the lower bound in (3.120) is really zero, which is achieved only when ε = δ = 0. Then M in (3.105) becomes singular, and we revert to the binomial distribution case already discussed.

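A small sketch (with p = 0.4 assumed purely for illustration) makes these boundary cases concrete: ε = 1 − p, δ = p gives the unit matrix (3.121); ε + δ = −1 gives the alternation matrix (3.123); and ε = δ = 0 gives a singular, rank-one M that erases all memory of earlier draws.

```python
import numpy as np

# Boundary cases of the transition matrix (3.105); p = 0.4 is assumed for illustration.
p = 0.4
q = 1.0 - p

def transition(eps, delta):
    return np.array([[p + eps, p - delta],
                     [q - eps, q + delta]])

print(transition(q, p))          # eps = 1 - p, delta = p: unit matrix (3.121), no transitions
print(transition(-p, -q))        # eps + delta = -1: pure alternation matrix (3.123)
M0 = transition(0.0, 0.0)        # eps = delta = 0: M is singular (rank one)
print(np.linalg.matrix_rank(M0), M0 @ np.array([0.9, 0.1]))
# With eps = delta = 0 every probability vector is mapped to (p, q): no memory of
# earlier draws survives, and we are back to the binomial case.
```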

In the intermediate and realistic cases where 0 < ε + δ < 1, the last term of (3.118) attenuates exponentially with k, and in the limit


$$p(R_k|C) \to \frac{p-\delta}{1-\epsilon-\delta}. \tag{3.125}$$

But although these single-trial probabilities settle down to steady values as in an exchangeable distribution, the underlying correlations are still at work and the limiting distribution is not exchangeable. To see this, let us consider the conditional probabilities p(R_k|R_jC). These are found by noting that the Markov chain relation (3.104) holds whatever the vector V_{k−1}; i.e. whether or not it is the vector generated from V_1 as in (3.106). Therefore, if we are given that red occurred on the jth trial, then


$$V_j = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \tag{3.126}$$

and we have from (3.104)


$$V_k = M^{\,k-j}\,V_j, \tag{3.127}$$

from which, using (3.115),


$$p(R_k|R_jC) = \frac{(p-\delta) + (\epsilon+\delta)^{k-j}(q-\epsilon)}{1-\epsilon-\delta}, \qquad j < k, \tag{3.128}$$

which approaches the same limit (3.125). The forward inferences are about what we might expect; the steady value (3.125) plus a term that decays exponentially with distance. But the backward inferences are different; note that the general product rule holds, as always:


$$p(R_k R_j|C) = p(R_k|R_jC)\,p(R_j|C) = p(R_j|R_kC)\,p(R_k|C). \tag{3.129}$$

Therefore, since we have seen that p(R_k|C) ≠ p(R_j|C), it follows that


$$p(R_j|R_kC) \ne p(R_k|R_jC). \tag{3.130}$$

The backward inference is still possible, but it is no longer the same formula as the forward inference, as it would be in an exchangeable sequence.

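The difference is easy to exhibit numerically. The sketch below (assumed values p = 0.4, ε = 0.03, δ = 0.01) evaluates the forward inference (3.128) and obtains the backward inference from the product rule (3.129), i.e. p(R_j|R_kC) = p(R_k|R_jC) p(R_j|C) / p(R_k|C), using (3.118) for the single-trial factors; it is only a numerical illustration, not the explicit formula asked for in Exercise 3.6 below.

```python
# Forward vs. backward inference, numerically (assumed p = 0.4, eps = 0.03, delta = 0.01).
# The forward term is (3.128); the backward term follows from the product rule (3.129):
# p(R_j|R_k C) = p(R_k|R_j C) p(R_j|C) / p(R_k|C), with single-trial terms from (3.118).
p, eps, delta = 0.4, 0.03, 0.01
q = 1.0 - p

def p_red(k):                 # p(R_k|C), closed form (3.118)
    return ((p - delta) - (eps + delta)**(k - 1) * (eps * p - delta * q)) / (1.0 - eps - delta)

def forward(j, k):            # p(R_k|R_j C), from (3.128), j < k
    return ((p - delta) + (eps + delta)**(k - j) * (q - eps)) / (1.0 - eps - delta)

def backward(j, k):           # p(R_j|R_k C), from (3.129)
    return forward(j, k) * p_red(j) / p_red(k)

for j, k in [(1, 2), (1, 5), (3, 4)]:
    print(f'j={j}, k={k}:  forward {forward(j, k):.6f}   backward {backward(j, k):.6f}')
# The two columns differ, as (3.130) asserts; they coincide only in the exchangeable
# case eps = delta = 0.
```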

As we shall see later, this example is the simplest possible ‘baby’ version of a very common and important physical problem: an irreversible process in the ‘Markovian approximation’. Another common technical language would call it an autoregressive model of first order. It can be generalized greatly to the case of matrices of arbitrary dimension and many-step or continuous, rather than single-step, memory influences. But for reasons noted earlier (confusion of inference and causality in the literature of statistical mechanics), the backward inference part of the solution is almost always missed. Some try to do backward inference by extrapolating the forward solution backward in time, with quite bizarre and unphysical results. Therefore the reader is, in effect, conducting new research in doing the following exercise.


Exercise 3.6. Find the explicit formula for the backward inference corresponding to the result (3.128) by using (3.118) and (3.129). (a) Explain the reason for the difference between forward and backward inferences in simple intuitive terms. (b) In what way does the backward inference differ from the forward inference extrapolated backward? Which is more reasonable intuitively? (c) Do backward inferences also decay to steady values? If so, is a property somewhat like exchangeability restored for events sufficiently separated? For example, if we consider only every tenth draw or every hundredth draw, do we approach an exchangeable distribution on this subset?

