4.2 Testing binary hypotheses with binary data

The simplest nontrivial problem of hypothesis testing is the one where we have only two hypotheses to test and only two possible data values. Surprisingly, this turns out to be a realistic and valuable model of many important inference and decision problems. Firstly, let us adapt (4.3) to this binary case. It gives us the probability that H is true, but we could have written it equally well for the probability that H is false:

P(H̄|DX) = P(H̄|X) P(D|H̄X) / P(D|X),    (4.4)

and if we take the ratio of the two equations,

P(H|DX) / P(H̄|DX) = [P(H|X) / P(H̄|X)] · [P(D|HX) / P(D|H̄X)],    (4.5)

the term P(D|X) will drop out. This may not look like any particular advantage, but the quantity that we have here, the ratio of the probability that H is true to the probability that it is false, has a technical name. We call it the ‘odds’ on the proposition H. So if we write the ‘odds on H, given D and X’, as the symbol

O(H|DX) ≡ P(H|DX) / P(H̄|DX),    (4.6)

then we can combine (4.3) and (4.4) into the following form:

O(H|DX) = O(H|X) · P(D|HX) / P(D|H̄X).    (4.7)

The posterior odds on H is (are?) equal to the prior odds multiplied by a dimensionless factor, which is also called a likelihood ratio. The odds are (is?) a strict monotonic function of the probability, so we could equally well calculate this quantity.1

1 Our uncertain phrasing here indicates that ‘odds’ is a grammatically slippery word. We are inclined to agree with purists who say that it is, like ‘mathematics’ and ‘physics’, a singular noun in spite of appearances. Yet the urge to follow the vernacular and treat it as plural is sometimes irresistible, and so we shall be knowingly inconsistent and use it both ways, judging what seems euphonious in each case.

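For readers who like to verify such relations numerically, here is a minimal Python sketch of the odds-form update rule (4.7); the function names and the example numbers are ours, invented purely for illustration.

```python
def odds(p):
    """Convert a probability p into odds O = p / (1 - p)."""
    return p / (1.0 - p)

def posterior_odds(prior_p, lik_H, lik_Hbar):
    """Eq. (4.7): O(H|DX) = O(H|X) * P(D|HX) / P(D|H-bar X)."""
    return odds(prior_p) * (lik_H / lik_Hbar)

# A datum twice as probable under H as under its denial doubles the odds:
O = posterior_odds(0.5, 0.2, 0.1)
print(O, O / (1.0 + O))   # 2.0, and P(H|DX) = 2/3
```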

In many applications it is convenient to take the logarithm of the odds because of the fact that we can then add up terms. Now we could take logarithms to any base we please, and this cost the writer some trouble. Our analytical expressions always look neater in terms of natural (base e) logarithms. But back in the 1940s and 1950s when this theory was first developed, we used base 10 logarithms because they were easier to find numerically; the four-figure tables would fit on a single page. Finding a natural logarithm was a tedious process, requiring leafing through enormous old volumes of tables.

Today, thanks to hand calculators, all such tables are obsolete and anyone can find a ten-digit natural logarithm just as easily as a base 10 logarithm. Therefore, we started happily to rewrite this section in terms of the aesthetically prettier natural logarithms. But the result taught us that there is another, even stronger, reason for using base 10 logarithms. Our minds are thoroughly conditioned to the base 10 number system, and base 10 logarithms have an immediate, clear intuitive meaning to all of us. However, we just don’t know what to make of a conclusion that is stated in terms of natural logarithms, until it is translated back into base 10 terms. Therefore, we re-rewrote this discussion, reluctantly, back into the old, ugly base 10 convention.

We define a new function, which we will call the evidence for H given D and X:

e(H|DX) ≡ 10 log₁₀ O(H|DX).    (4.8)

This is still a monotonic function of the probability. By using the base 10 and putting the factor 10 in front, we are now measuring evidence in decibels (hereafter abbreviated to db). The evidence for H, given D, is equal to the prior evidence plus the number of db provided by working out the log likelihood in the last term below:

e(H|DX) = e(H|X) + 10 log₁₀ [P(D|HX) / P(D|H̄X)].    (4.9)
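
The following Python sketch implements (4.8) and (4.9); again the names are ours, and the numbers are illustrative only.

```python
import math

def evidence(p):
    """Eq. (4.8): evidence in decibels, e = 10 log10 of the odds p/(1-p)."""
    return 10.0 * math.log10(p / (1.0 - p))

def update_evidence(e_prior, lik_H, lik_Hbar):
    """Eq. (4.9): add 10 log10 of the likelihood ratio to the prior evidence."""
    return e_prior + 10.0 * math.log10(lik_H / lik_Hbar)

print(evidence(0.5))                    # 0.0 db: even odds
print(update_evidence(0.0, 0.2, 0.1))   # +3.01 db from a 2:1 likelihood ratio
```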

Now suppose that this new information D actually consisted of several different propositions:

D = D₁ D₂ · · · Dₙ.    (4.10)

Then we could expand the likelihood ratio by successive applications of the product rule:

P(D|HX) / P(D|H̄X) = [P(D₁|HX) / P(D₁|H̄X)] · [P(D₂|D₁HX) / P(D₂|D₁H̄X)] · · ·    (4.11)

But, in many cases, the probability for getting D₂ is not influenced by knowledge of D₁:

P(D₂|D₁HX) = P(D₂|HX)  and  P(D₂|D₁H̄X) = P(D₂|H̄X).    (4.12)

One then says conventionally that D₁ and D₂ are independent. Of course, we should really say that the probabilities which the robot assigns to them are independent. It is a semantic confusion to attribute the property of ‘independence’ to propositions or events; for that implies, in common language, physical causal independence. We are concerned here with the very different quality of logical independence.

To emphasize this, note that neither kind of independence implies the other. Two events may be in fact causally dependent (i.e. one influences the other); but for a scientist who has not yet discovered this, the probabilities representing his state of knowledge – which determine the only inferences he is able to make – might be independent. On the other hand, two events may be causally independent in the sense that neither exerts any causal influence on the other (for example, the apple crop and the peach crop); yet we perceive a logical connection between them, so that new information about one changes our state of knowledge about the other. Then for us their probabilities are not independent.

Quite generally, as the robot’s state of knowledge represented by H and X changes, probabilities conditional on them may change from independent to dependent or vice versa; yet the real properties of the events remain the same. Then one who attributed the property of dependence or independence to the events would be, in effect, claiming for the robot the power of psychokinesis. We must be vigilant against this confusion between reality and a state of knowledge about reality, which we have called the ‘mind projection fallacy’.

The point we are making is not just pedantic nitpicking; we shall see presently (Eq. (4.29)) that it has very real, substantive consequences. In Chapter 3 we have discussed some of the conditions under which these probabilities might be independent, in connection with sampling from a very large known population and sampling with replacement. In the closing Comments section, we noted that whether urn probabilities do or do not factor can depend on whether we do or do not know that the contents of several urns are the same. In our present problem, as in Chapter 3, to interpret causal independence as logical independence, or to interpret logical dependence as causal dependence, has led some to nonsensical conclusions in fields ranging from psychology to quantum theory.

In case these several pieces of data are logically independent given HX and also given H̄X, (4.11) becomes

e(H|DX) = e(H|X) + 10 Σᵢ log₁₀ [P(Dᵢ|HX) / P(Dᵢ|H̄X)],    (4.13)

where the sum is over all the extra pieces of information that we obtain.

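With logical independence, the db contributions simply add, and a sketch of (4.13) is a short loop. The dictionaries below anticipate the sampling probabilities of the quality-control example later in this section; they are stand-ins for whatever P(Dᵢ|HX) and P(Dᵢ|H̄X) the problem supplies.

```python
import math

def accumulate_evidence(e_prior, data, lik_H, lik_Hbar):
    """Eq. (4.13): sum the db contribution of each independent datum."""
    e = e_prior
    for d in data:
        e += 10.0 * math.log10(lik_H[d] / lik_Hbar[d])
    return e

lik_H    = {'bad': 1/3, 'good': 2/3}   # P(datum | H X)
lik_Hbar = {'bad': 1/6, 'good': 5/6}   # P(datum | H-bar X)
print(accumulate_evidence(0.0, ['bad', 'bad', 'good'], lik_H, lik_Hbar))  # ~5.05 db
```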

To get some feeling for numerical values here, let us construct Table 4.1. We have three different scales on which we can measure degrees of plausibility: evidence, odds, or probability; they are all monotonic functions of each other. Zero db of evidence corresponds to odds of 1 or to a probability of 1/2. Now, every physicist or electrical engineer knows that 3 db means a factor of 2 (nearly) and 10 db is a factor of 10 (exactly); and so if we go in steps of 3 db, or 10, we can construct this table very easily.

It is obvious from Table 4.1 why it is very cogent to give evidence in decibels. When probabilities approach one or zero, our intuition doesn’t work very well. Does the difference between the probability of 0.999 and 0.9999 mean a great deal to you? It certainly doesn’t to the writer. But after living with this for only a short while, the difference between evidence of plus 30 db and plus 40 db does have a clear meaning to us. It is now in a scale which our minds comprehend naturally. This is just another example of the Weber–Fechner law; intuitive human sensations tend to be logarithmic functions of the stimulus.

Table 4.1. Evidence, odds, and probability.

  e        O          p
  0        1:1        1/2
  3        2:1        2/3
  6        4:1        4/5
 10        10:1       10/11
 20        100:1      100/101
 30        1000:1     0.999
 40        10⁴:1      0.9999
 −e        1/O        1 − p
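
The table is easy to regenerate; this small sketch (ours, not part of the original text) converts evidence to odds and probability, and shows, for example, that the 3 db rows are really odds of 1.995:1, the ‘factor of 2 (nearly)’.

```python
# Reproduce Table 4.1: evidence e (db) -> odds O -> probability p.
for e in [0, 3, 6, 10, 20, 30, 40]:
    O = 10.0 ** (e / 10.0)
    p = O / (1.0 + O)
    print(f"{e:3d} db    O = {O:10.3f}:1    p = {p:.4f}")
```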

Even the factor of 10 in (4.8) is appropriate. In the original acoustical applications, it was introduced so that a 1 db change in sound intensity would be, psychologically, about the smallest change perceptible to our ears. With a little familiarity and a little introspection, we think that the reader will agree that a 1 db change in evidence is about the smallest increment of plausibility that is perceptible to our intuition. Nobody claims that the Weber–Fechner law is a precise rule for all human sensations, but its general usefulness and appropriateness is clear; almost always it is not the absolute change, but more nearly the relative change, in some stimulus that we perceive. For an interesting account of the life and work of Gustav Theodor Fechner (1801–87), see Stigler (1986c).

Now let us apply (4.13) to a specific calculation, which we shall describe as a problem of industrial quality control (although it could be phrased equally well as a problem of cryptography, chemical analysis, interpretation of a physics experiment, judging two economic theories, etc.). Following the example of Good (1950), we assume numbers which are not very realistic in order to elucidate some points of principle. Let the prior information X consist of the following statements:

X ≡ We have 11 automatic machines turning out widgets, which pour out of the machines into 11 boxes. This example corresponds to a very early stage in the development of widgets, because ten of the machines produce one in six defective. The 11th machine is even worse; it makes one in three defective. The output of each machine has been collected in an unlabeled box and stored in the warehouse.

We choose one of the boxes and test a few of the widgets, classifying them as ‘good’ or ‘bad’. Our job is to decide whether we chose a box from the bad machine or not; that is, whether we are going to accept this batch or reject it.

Let us turn this job over to our robot and see how it performs. Firstly, it must find the prior evidence for the various propositions of interest. Let

A ≡ we chose a bad batch (1/3 defective),
B ≡ we chose a good batch (1/6 defective).

The qualitative part of our prior information X told us that there are only two possibilities; so in the ‘logical environment’ generated by X, these propositions are related by negation: given X, we can say that

Ā = B,  B̄ = A.    (4.14)

The only quantitative prior information is that there are 11 machines and we do not know which one made our batch, so, by the principle of indifference, P(A|X) = 1/11, and

e(A|X) = 10 log₁₀ [(1/11) / (10/11)] = −10 db,    (4.15)

whereupon we have necessarily e(B|X) = +10 db.

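The ±10 db is just the one-line computation below, a sketch for checking the arithmetic of (4.15) and nothing more.

```python
import math

# Eq. (4.15): prior odds on A are (1/11)/(10/11) = 1/10.
e_A = 10.0 * math.log10((1/11) / (10/11))
print(e_A, -e_A)   # -10.0 db for A, hence +10.0 db for B = A-bar
```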

Evidently, in this problem the only properties of X that will be relevant for the calculation are just these numbers, ±10 db. Any other kind of prior information which led to the same numbers would give us just the same mathematical problem from this point on. So, it is not necessary to say that we are talking only about a problem where there are 11 machines, and so on. There might be only one machine, and the prior information consists of our previous experience with it.

Our reason for stating the problem in terms of 11 machines was that we have, thus far, only one principle, indifference, by which we can convert raw information into numerical probability assignments. We interject this remark because of a famous statement by Feller (1950) about a single machine, which we consider in Chapter 17 after accumulating some more evidence pertaining to the issue he raised. To our robot, it makes no difference how many machines there are; the only thing that matters is the prior probability for a bad batch, however this information was arrived at.2

Now, from this box we take out a widget and test it to see whether it is defective. If we pull out a bad one, what will that do to the evidence for a bad batch? That will add to it

e(A|DX) = e(A|X) + 10 log₁₀ [P(bad|AX) / P(bad|BX)],    (4.16)

where P(bad|AX) represents the probability for getting a bad widget, given A, etc.; these are sampling probabilities, and we have already seen how to calculate them. Our procedure is very much ‘like’ drawing from an urn, and, as in Chapter 3, on one draw our datum D now consists only of a binary choice: (good/bad). The sampling distribution P(D|HX) reduces to

P(bad|AX) = 1/3,   P(good|AX) = 2/3,    (4.17)
P(bad|BX) = 1/6,   P(good|BX) = 5/6.    (4.18)

Thus, if we find a bad widget on the first draw, this will increase the evidence for A by

10 log₁₀ [(1/3) / (1/6)] = 10 log₁₀ 2 ≈ 3 db.    (4.19)

What happens now if we draw a second bad one? We are sampling without replacement, so as we noted in (3.11), the factor (1/3) in (4.19) should be updated to

(N/3 − 1) / (N − 1),    (4.20)

where N is the number of widgets in the batch. But, to avoid this complication, we suppose that N is very much larger than any number that we contemplate testing; i.e. we are going to test such a negligible fraction of the batch that the proportion of bad and good ones in it is not changed appreciably by the drawing. Then the limiting form of the hypergeometric distribution (3.22) will apply, namely the binomial distribution (3.86). Thus we shall consider that, given A or B, the probability for drawing a bad widget is the same at every draw regardless of what has been drawn previously; so every bad one we draw will provide +3 db of evidence in favor of hypothesis A.

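A quick numerical check of why the binomial approximation is safe: the exact without-replacement factor (4.20) differs negligibly from 1/3 once N is large.

```python
# Eq. (4.20): the updated probability of a bad widget after one bad draw,
# when sampling without replacement from a batch of N with N/3 bad ones.
for N in [30, 300, 3000, 300000]:
    print(N, (N/3 - 1) / (N - 1))   # tends to 1/3 as N grows
```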

Now suppose we find a good widget. Using (4.14), we get evidence for A of

10 log₁₀ [(2/3) / (5/6)] = 10 log₁₀ (4/5) = −0.97 db,    (4.21)

but let’s call it −1 db. Again, this will hold for any draw, if the number in the batch is sufficiently large. If we have inspected n widgets, of which we found n_b bad ones and n_g good ones, the evidence that we have the bad batch will be

e(A|DX) = −10 + 3 n_b − n_g.    (4.22)

You see how easy this is to do once we have set up the logarithmic machinery. The robot’s mind is ‘driven in one direction or the other’ in a very simple, direct way.

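The whole bookkeeping of (4.22), and the worked example (4.23) just below, fit in a few lines of Python (a sketch; the names are ours, and db_bad and db_good carry the exact increments (4.19) and (4.21) for comparison with the rounded +3/−1 rule).

```python
import math

db_bad  = 10.0 * math.log10((1/3) / (1/6))   # +3.01 db per bad widget, eq. (4.19)
db_good = 10.0 * math.log10((2/3) / (5/6))   # -0.97 db per good widget, eq. (4.21)

def evidence_A(n_bad, n_good, exact=False):
    """Eq. (4.22): e(A|DX) after n_bad bad and n_good good widgets."""
    if exact:
        return -10.0 + n_bad * db_bad + n_good * db_good
    return -10.0 + 3 * n_bad - 1 * n_good

def prob_from_db(e):
    """Convert evidence in db back to a probability."""
    O = 10.0 ** (e / 10.0)
    return O / (1.0 + O)

e = evidence_A(5, 7)          # the example of (4.23): 12 tests, 5 bad
print(e, prob_from_db(e))     # -2 db, i.e. P(A|DX) ~ 0.39
```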

Perhaps this result gives us a deeper insight into why the Weber–Fechner law applies to intuitive plausible inference. Our ‘evidence’ function is related to the data that we have observed in about the most natural way imaginable; a given increment of evidence corresponds to a given increment of data. For example, if the first 12 widgets we test yield five bad ones, then

e(A|DX) = −10 + 3 × 5 − 7 = −2 db, (4.23)

or, the probability for a bad batch is raised by the data from (1/11) = 0.09 to 0.39.

In order to get at least 20 db of evidence for proposition A, how many bad widgets would we have to find in a certain sequence of n tests? This requires

e(A|DX) = −10 + 3 n_b − n_g ≥ 20,  or, since n_g = n − n_b,  n_b ≥ (n + 30)/4;    (4.24)

so, if the fraction of bad ones remains greater than 1/4, we shall accumulate eventually 20 db, or any other positive amount, of evidence for A. It appears that n_b/n = 1/4 is the threshold value at which the test can provide no evidence for either A or B over the other; but note that the +3 and −1 in (4.22) are only approximate. The exact threshold fraction of bad ones is, from (4.19) and (4.21),

f_t = log(5/4) / log(5/2) = 0.2435,    (4.25)

in which the base of the logarithms does not matter. Sampling fractions greater (less) than this give evidence for A over B (B over A); but if the observed fraction is close to the threshold, it will require many tests to accumulate enough evidence.

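The threshold (4.25), and the claim that the expected evidence per draw vanishes there, can be checked directly (a sketch under the same 1/3 vs. 1/6 sampling probabilities):

```python
import math

f_t = math.log(5/4) / math.log(5/2)   # eq. (4.25); any log base works
print(f_t)                            # 0.2435...

# At this fraction of bad widgets, the average db gain per draw is zero:
db_bad  = 10.0 * math.log10(2.0)
db_good = 10.0 * math.log10(4/5)
print(f_t * db_bad + (1.0 - f_t) * db_good)   # ~0.0
```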

Now all we have here is the probability or odds or evidence, whatever you wish to call it, of the proposition that we chose the bad batch. Eventually, we have to make a decision: we’re going to accept it, or we’re going to reject it. How are we going to do that? Well, we might decide beforehand: if the probability of proposition A reaches a certain level, then we’ll decide that A is true. If it gets down to a certain value, then we’ll decide that A is false. There is nothing in probability theory per se which can tell us where to put these critical levels at which we make our decision. This has to be based on value judgments: what are the consequences of making wrong decisions, and what are the costs of making further tests?

This takes us into the realm of decision theory, considered in Chapters 13 and 14. But for now it is clear that making one kind of error (accepting a bad batch) might be more serious than making the other kind of error (rejecting a good batch). That would have an obvious effect on where we place our critical levels.

So we could give the robot some instructions such as ‘If the evidence for A is greater than +0 db, then reject this batch (it is more likely to be bad than good). If it goes as low as −13 db, then accept it (there is at least a 95% probability that it is good). Otherwise, continue testing.’ We start doing the tests, and every time we find a bad widget the evidence for the bad batch goes up 3 db; every time we find a good one, it goes down 1 db. The tests terminate as soon as we enter either the accept or reject region for the first time.

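This instruction to the robot translates into a short loop; a sketch of the sequential test, with the widget sequence invented purely for illustration:

```python
def sequential_test(draws, e_prior=-10.0, reject_above=0.0, accept_below=-13.0):
    """Stop as soon as the evidence for A crosses either decision boundary."""
    e = e_prior
    for i, d in enumerate(draws, start=1):
        e += 3.0 if d == 'bad' else -1.0     # the rounded rule of (4.22)
        if e > reject_above:
            return 'reject batch', e, i      # more likely bad than good
        if e <= accept_below:
            return 'accept batch', e, i      # at least ~95% probability of good
    return 'continue testing', e, len(draws)

print(sequential_test(['good', 'bad', 'good', 'bad', 'bad', 'bad', 'bad']))
# -> ('reject batch', 3.0, 7)
```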

The way described above is how our robot would do it if we told it to reject or accept on the basis that the posterior probability of proposition A reaches a certain level. This very useful and powerful procedure is called ‘sequential inference’ in the statistical literature, the term signifying that the number of tests is not determined in advance, but depends on the sequence of data values that we find; at each step in the sequence we make one of three choices: (a) stop with acceptance; (b) stop with rejection; (c) make another test. The term should not be confused with what has come to be called ‘sequential analysis with nonoptional stopping’, which is a serious misapplication of probability theory; see the discussions of optional stopping in Chapters 6 and 17.

2 Notice that in this observation we have the answer to a point raised in Chapter 1: How does one make the robot ‘cognizant’ of the semantic meanings of the various propositions that it is being called upon to deal with? The answer is that the robot does not need to be ‘cognizant’ of anything. If we give it, in addition to the model and the data, a list of the propositions to be considered, with their prior probabilities, this conveys all the ‘meaning’ needed to define the robot’s mathematical problem for the applications now being considered. Later, we shall wish to design a more sophisticated robot which can also help us to assign prior probabilities by analysis of complicated but incomplete information, by the maximum entropy principle. But, even then, we can always define the robot’s mathematical problem without going into semantics.
