On the Accumulation of Individual Probabilities

by J. W. Deardorff, 2003

A heuristic approach. A simple formula allows one to assign an overall probability, P, that something is true or false -- hoax or no hoax -- when these are the only two possible outcomes, given a collection of mutually exclusive pieces of evidence e(i) that bear upon the problem, each of which by itself invites a judgment pointing toward either hoax or no hoax. We may more correctly think of these pieces of evidence as the judgments they invite. Each such judgment is then assigned a probability p(i), lying between zero and one, that it is correct. If any one piece of evidence is considered so convincing one way or the other that its p value is assigned 0 or 1, then the matter is settled then and there; hence in any practical application neither limiting extreme, 0 or 1, enters into the problem.

In the Matthew-TJ verse comparisons of this website, a p value is expressed at the end of each set of verse comparisons as pHoax. A value close to zero suggests a small probability that the TJ verse/passage in question could have been generated by a hoaxer. A value of 0.5 indicates that one cannot decide which direction the dependency goes for that particular verse comparison. A value greater than 0.5 indicates that, if no other information were available, a literary hoax would seem more likely than that the writer of Matthew made use of the TJ.

I am indebted to Bill McKie, Earth Science Division at NASA Ames Research Center, Moffett Field, CA, for my first exposure to the formula and its derivation, as follows:

    
      M1
 P = --------
      M1 + M2

where M1 = p(1) * p(2) * ... * p(n)

and   M2 = [1-p(1)] * [1-p(2)] * ... * [1-p(n)],

the asterisk (*) denotes multiplication, and there are n values of p involved.
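
For readers who prefer to see the arithmetic spelled out, here is a minimal sketch in Python (my own illustration, not part of the original derivation; the function name accumulate is arbitrary):

    def accumulate(p_values):
        """Combine individual probabilities p(i) into the overall probability P."""
        m1 = 1.0   # running product of the p(i)
        m2 = 1.0   # running product of the [1 - p(i)]
        for p in p_values:
            m1 *= p
            m2 *= 1.0 - p
        return m1 / (m1 + m2)

    print(accumulate([0.1, 0.2, 0.4]))   # 0.0181..., the example worked through below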

To see how this works, consider a collection of three different pieces of evidence for which pHoax(1) = 0.1, pHoax(2) = 0.2 and pHoax(3) = 0.4. We suppose that these three pieces of evidence tell all that is known about the matter. Each piece of evidence is thought of as consisting of ten sub-units of evidence, either favoring or not favoring the hoax hypothesis in proportion to the estimated probability. The three cases are then represented by three strings of possible H and N outcomes, where H stands for hoax and N stands for No-hoax, as follows:

H N N N N N N N N N
H H N N N N N N N N
H H H H N N N N N N

The number of combinations of all H's among the three pieces of evidence is 8 (i.e., 1*2*4). That is, the H in the first string can be combined in two ways with the two H's in the 2nd string, and each of those resultant two ways can be combined in four ways with the H's of the third string.

The number of combinations of all N's is similarly seen to be 9*8*6 = 432. All other possible combinations involve both H and N together, but these are inadmissible in a problem whose solution is a priori known to be either Yes or No -- either a Hoax or No-hoax. They are incompatible, or non-realizable, and must be discarded. Thus the overall probability that these three pieces of evidence supply for the hoax hypothesis is:

P = 8/(8 + 432) = 0.018
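
The counting argument can also be checked by brute force. The following sketch (again my own, with the three strings written out as above) enumerates every way of picking one sub-unit from each string and keeps only the all-H and all-N combinations:

    from itertools import product

    strings = ["H" + "N" * 9,        # pHoax(1) = 0.1
               "H" * 2 + "N" * 8,    # pHoax(2) = 0.2
               "H" * 4 + "N" * 6]    # pHoax(3) = 0.4

    all_h = sum(1 for combo in product(*strings) if set(combo) == {"H"})
    all_n = sum(1 for combo in product(*strings) if set(combo) == {"N"})

    print(all_h, all_n, all_h / (all_h + all_n))   # 8 432 0.01818...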

As a check, the preceding formula gives the same result:

P = 0.1*0.2*0.4 / (0.1*0.2*0.4 + 0.9*0.8*0.6) = 0.018.

It is seen that if the individual probabilities are each below 0.5, their accumulated probability falls below the lowest of them. This agrees with our intuition that the more evidence the better. In particular, if only two p(i) are involved and each is estimated as 0.293, their accumulated probability is just half of that value.
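
As a quick numerical check of that last remark, using the accumulate function sketched earlier:

    p = 0.293
    print(accumulate([p, p]))   # about 0.1466
    print(p / 2)                # 0.1465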

It can also be checked that the individual probabilities may first be accumulated in this manner into a smaller number of subgroups, and the subgroups then accumulated in the same way, to arrive at the same answer.

If one or more values of p = 0.5 are part of the collection of individual probabilities, it can be seen that they do not affect P, as they factor out from both numerator and denominator of the formula. Values of p > 0.5 can be included in the formula right along with values of p < 0.5.
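
Both remarks are easy to verify with the accumulate function from the earlier sketch: accumulating in subgroups gives the same P as accumulating all at once, and appending a value of 0.5 leaves P unchanged. For example (illustrative p values only):

    ps = [0.1, 0.2, 0.4, 0.7]

    direct  = accumulate(ps)
    grouped = accumulate([accumulate(ps[:2]), accumulate(ps[2:])])
    print(direct, grouped)                         # identical, apart from rounding

    print(accumulate(ps), accumulate(ps + [0.5]))  # adding p = 0.5 changes nothing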

It seems evident that for general validity of this approach, all pieces of evidence that bear upon the yes-no problem at hand must be included in the analysis, each with its best, unbiased estimate of probability, p.

A Bayesian Statistics approach. This problem has been worked out by Peter Sturrock in his article "Applied scientific inference" in J. Sci. Exploration 8, No. 4, pp. 491-508 (1994), using Bayesian statistics as follows, except where I have made a few clarifications applicable to the plagiarism problem at hand.

Let H be the hypothesis under consideration, let I be a piece of information that bears on the hypothesis, and let Z be "baseline" information, which may or may not be present or recognized at the time as influencing one's judgments about the hypothesis and information.

With these parameters (but temporarily omitting Z), Bayes' theorem is

P(H | I) = P(I | H)*P(H) / P(I)

where P(H | I) is the probability that the hypothesis H is true given the interpretation of the information I; P(I | H) is the probability that the interpretation of the information is correct upon assuming that H is true; the asterisk again denotes multiplication; P(H) is the prior probability that H is true, i.e., the probability that would be assigned if no explicit information about H were available at all; and P(I) is the probability that would be assigned to the correctness of the interpretation of the information if no assumptions were made about any hypothesis.
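
As a purely numerical illustration (the values below are invented, not taken from the verse comparisons), P(I) can be expanded by the law of total probability, P(I) = P(I | H)*P(H) + P(I | H')*P(H'), and the theorem applied directly:

    p_H          = 0.5   # prior probability that H is true
    p_I_given_H  = 0.2   # probability the interpretation is correct if H is true
    p_I_given_Hp = 0.8   # probability the interpretation is correct if H is false (H')

    p_I = p_I_given_H * p_H + p_I_given_Hp * (1 - p_H)
    print(p_I_given_H * p_H / p_I)   # P(H | I) = 0.2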

However, to denote the role of the baseline information, Z, Sturrock expresses Bayes theorem as:

P(H | IZ) = P(I | HZ)*P(H | Z) / P(I | Z)                                                                (Eqn. 2.4)

The equation numbers here follow those of Sturrock. If there were no hidden assumption or subconscious baseline information (only an open mind), there would only be complete ignorance about the truth of the hypothesis and the correctness of the interpretation of the information, before any of that information is examined. In that case,

P(H | Z) = 0.5 = P(H' | Z)                                                                                                 (2.4a)

where H' stands for the negation of H, i.e., P(H') = 1 - P(H). Thus, if H is the hypothesis that the Gospel of Matthew was based upon the Talmud of Jmmanuel, then H' is the hypothesis that the TJ is a hoax based upon Matthew (assuming no other possibilities exist). We may rewrite (2.4) with that complementary hypothesis in mind:

P(H'| IZ) = P(I | H'Z)*P(H'| Z) / P(I | Z)                                                               (2.4b)

Upon dividing (2.4) by (2.4b), and noting that P(I | Z) cancels out, we have

P(H | IZ) / P(H'| IZ) = [P(I | HZ) / P(I | H'Z)] * [P(H | Z) / P(H'| Z)]                                 (2.5)

The three ratios in (2.5) are called "odds," which is just how odds are defined in terms of probabilities. They can take on values anywhere between near-zero and near-infinity. Thus, the ratio on the left of (2.5) is the odds on H, based upon the information I and Z. It is the post odds, which we wish to determine after all the evidence, not just one piece of it, is taken into account. The last ratio on the right side of (2.5) is the prior odds. Thus, the post odds on H equals the prior odds times the ratio of the likelihoods of correctness of I based upon H and H'.
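
Continuing the invented numbers from the previous sketch, Eqn (2.5) can be exercised directly: the post odds are the likelihood ratio times the prior odds, and the corresponding probability follows from odds/(1 + odds):

    prior_odds       = p_H / (1 - p_H)              # P(H | Z) / P(H'| Z) = 1 for an open mind
    likelihood_ratio = p_I_given_H / p_I_given_Hp   # P(I | HZ) / P(I | H'Z)
    post_odds        = likelihood_ratio * prior_odds

    print(post_odds, post_odds / (1 + post_odds))   # 0.25, and P = 0.2 as before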

Sturrock next defines the base-10 logarithm of the odds on H, using the symbol L, as

L(H | Z) = log10[P(H | Z) / P(H' | Z)]                                                                                    (2.6)

Then (2.5) may be written

L(H | IZ) = L(H | Z) + L(I | HZ)                                                                         (2.7)

The important final step is to realize that the information I might consist of several, or even hundreds, of individual pieces of information, say n of them, each with its own interpretation. These interpretations of the information may be designated I1, I2, ..., In. They add in just as the last term on the right of (2.7) adds to its preceding term:

L(H | I1,I2...InZ) = L(H | Z) + L(I1 | HZ) + L(I2 | HZ) + ... + L(In | HZ)                          (2.9)

This is the same formula as derived heuristically above, except that it is expressed additively in log-odds instead of probabilities and includes the "baseline information" term, also known as the "prior probability." The latter term vanishes, due to (2.4a), if all the information bearing upon H has been taken explicitly into account through the P(Ii | HZ) terms.
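
To make the correspondence concrete, here is a sketch (my own illustration; it takes the hypothesis under test to be the hoax hypothesis, so that each P(Ii | HZ) is the pHoax value of a verse comparison, and it assumes the open-mind prior of 0.5) that accumulates the log-odds of Eqn (2.9) for the three probabilities used in the heuristic example and converts the result back to a probability:

    from math import log10

    def log_odds(p):
        return log10(p / (1 - p))

    prior = 0.5   # open-mind prior, so its log-odds term is zero
    L_total = log_odds(prior) + sum(log_odds(p) for p in [0.1, 0.2, 0.4])

    odds = 10 ** L_total
    print(odds / (1 + odds))   # 0.0181..., the same P as the heuristic formula gave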

The n individual values of I need to be conditionally independent of each other, or mutually exclusive. They of course are not totally independent, since they all bear upon the same hypothesis. (In the accumulation of probabilities from the Matthew-TJ verse comparisons, those verse comparisons that are not mutually exclusive from others already taken into account are not assigned any probability, or are lumped in with one or more adjacent verses for which a single probability for the whole unit is assigned.)

The fact that the log-odds are additive accounts for the finding that the information bearing on H can be taken in whatever order one wishes, and accumulated into sub-groups, if desired, before those subgroups are accumulated to give the final odds. In the problem of the direction of literary dependence addressed in this website, the sum of the probability that a piece of evidence is interpreted as pointing towards TJ genuineness, and the probability that it indicates a TJ hoax based upon Matthew, equals unity: P(Ii | HZ) + P(Ii | H'Z) = 1.

Although this method provides a rational, mathematical procedure for accumulating the individual probabilities or odds, the outcome of the whole analysis will still depend entirely upon what values are assigned to the individual probabilities, P(Ii | HZ), and these in turn will depend largely upon what rules of evidence are utilized. These rules of evidence comprise some or most of the "baseline" information, Z, that the analyst might take for granted.

An independently derived Bayesian formula. A Bayesian derivation of the formula was worked out by Prof. Stephen Rathbun, Dept. of Statistics, Pennsylvania State University, and then clarified in its presentation of prior probabilities by Bill McKie. Click here for the pdf version. Again, it yields the same answer/formula as the heuristic approach above. The prior probability, if one considers it to enter into the analysis, is seen to act exactly like one or more pieces of additional evidence that may have been overlooked at first.