### Introduction

Elsewhere on this blog we have discussed the distribution of values predicted by confidence intervals, referred to more formally as **probability density functions** (pdfs).

When we plot confidence intervals, we determine the nearest point(s) to our observed value where we would expect the true population value to be and still be considered significantly different from it (at a given error level).

But we can also plot the **distribution** of the interval function by varying the error level α from 1 (every difference is significant) to almost zero (asymptotically: nothing is significant). We can then see which expected values are more likely than others given one further piece of information:

- The upper and lower bounds are computed independently, so the plot effectively assumes that there is an equal chance of an observed property *p* being greater than the expected property *P* as vice versa, except at the boundary, where one distribution becomes empty.

We can then assess the overall shape, observing how these values tail off, a process that is much more instructive than arguing about ‘p-values’.

We can also see something else.

Most traditional discussions of confidence intervals assume that intervals are approximately Normal, an assumption Wallis (2021: 297) calls the ‘Normal fallacy’. This conceptual error has dogged discussion of confidence intervals in the statistics literature, and deeply affects how people rationalise about intervals.

In my book, I point out that even the simplest interval about the single proportion *p* cannot be Normal. Instead we discover that it is profoundly shaped by the boundaries of the probabilistic range **P** = [0, 1]. Elsewhere on this blog, I have developed the implications of this argument and plotted more distributions.

I show the shape of the distribution for the Wilson score interval based on *p* (Wilson 1927) and other related distributions. Only when *n* is large and *p* central does this distribution approximate to the Normal.

In this blog post we will refer to the interval *p* ∈ (*w*^{–}, *w*^{+}) in terms of a functional notation. Wallis (2021: 111) proposes two **Wilson functions** with three parameters.

*lower bound* *w*^{–} = WilsonLower(*p*, *n*, α/2) = *p*′ – *e*′,
*upper bound* *w*^{+} = WilsonUpper(*p*, *n*, α/2) = *p*′ + *e*′, (1)

where

*p*′ = (*p* + *z*_{α/2}²/2*n*) / (1 + *z*_{α/2}²/*n*), and

*e*′ = *z*_{α/2} √(*p*(1 – *p*)/*n* + *z*_{α/2}²/4*n*²) / (1 + *z*_{α/2}²/*n*),

where *n* is the sample size, *p* = *f*/*n* is the observed proportion and α/2 is the error level for each tail. Each bound should be treated separately, although they converge at *p* where α = 1.

Wilson’s interval may be corrected for continuity to adjust for the fact that we are using a continuous function to approximate a discrete set of options (*p* ∈ {0, 1/*n*, 2/*n*… 1}). We can compute corrected intervals with

*w*_{cc}^{–} = WilsonLower(max(0, *p* – 1/2*n*), *n*, α/2),
*w*_{cc}^{+} = WilsonUpper(min(1, *p* + 1/2*n*), *n*, α/2). (2)

This simply moves *p* outwards by a continuity correction term *c* = 1/2*n*, and then computes the Wilson interval for that adjusted proportion.
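Equations (1) and (2) can be sketched in code. The following is a minimal illustration in Python (function and variable names are ours, not the spreadsheet's), using only the standard library's Normal quantile function; `alpha` is the two-tailed error level, halved internally to give *z*_{α/2}.

```python
from math import sqrt
from statistics import NormalDist

def wilson_lower(p, n, alpha=0.05):
    """Wilson score lower bound w- (Equation 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)                 # critical value z_{alpha/2}
    p_prime = (p + z * z / (2 * n)) / (1 + z * z / n)       # adjusted centre p'
    e_prime = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return p_prime - e_prime

def wilson_upper(p, n, alpha=0.05):
    """Wilson score upper bound w+ (Equation 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_prime = (p + z * z / (2 * n)) / (1 + z * z / n)
    e_prime = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return p_prime + e_prime

def wilson_cc_lower(p, n, alpha=0.05):
    """Continuity-corrected lower bound (Equation 2): move p outwards by c = 1/2n."""
    return wilson_lower(max(0.0, p - 1 / (2 * n)), n, alpha)

def wilson_cc_upper(p, n, alpha=0.05):
    """Continuity-corrected upper bound (Equation 2)."""
    return wilson_upper(min(1.0, p + 1 / (2 * n)), n, alpha)
```

For *p* = 0.5, *n* = 10 and α = 0.05 this obtains the familiar Wilson interval (0.2366, 0.7634), with the corrected bounds pushed slightly further out.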

Other formulae, such as the ‘exact’ Clopper-Pearson (CP) interval, may be employed instead. When we plot the distribution of continuity-corrected or CP intervals we can see that the ‘Wilson distribution’ is *two* distributions: one for each bound (Wallis 2021: 308-310). This is why it is important to know whether *p* > *P* or vice versa.

**Note:** In what follows, readers may substitute an alternative formula, such as Jeffreys, CP, or bootstrap formulae for the confidence interval for *p*. We use the Wilson because it is superior to most, and its defects may be simply addressed through corrections for continuity, population size or random-text sampling.

In a new paper (Wallis forthcoming), I also show distributions for intervals on selected monotonic functions of *p*. These are the square, reciprocal, logarithm and logit projections on the Wilson interval. Apart from when *p* is 0 or 1, the logit Wilson distribution, exceptionally, is approximately Normal (see also Wallis 2021: 308).

We also consider non-monotonic functions. An example plot for such a function is found in Confidence intervals on goodness of fit ϕ scores.

The method for plotting these distributions can be summarised as follows.

- We take an observed proportion *p*, which optionally we may project with some function fn(*p*) onto a different scale (e.g. reciprocal 1/*p* ∈ [0, ∞]), choose an interval formula (Wilson, continuity-corrected Wilson, Clopper-Pearson, etc.), and compute the position of the bounds for every error level α.
- This results in a **cumulative distribution** of scores, which we convert into a probability distribution by a method termed **delta approximation** (Wallis 2021: 300; see also Plotting the Wilson distribution).
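The steps above can be sketched as follows. This is a minimal Python illustration (names are ours; it is not the spreadsheet implementation), plotting only the lower-bound distribution: we sweep α from 1 towards 0, record the bound positions, and convert the change in cumulative (tail) probability between adjacent positions into a density.

```python
from math import sqrt
from statistics import NormalDist

def wilson_lower(p, n, alpha):
    """Wilson score lower bound w- for two-tailed error level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_prime = (p + z * z / (2 * n)) / (1 + z * z / n)
    e_prime = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return p_prime - e_prime

def lower_bound_pdf(p, n, steps=200):
    """Delta approximation to the pdf of the lower-bound distribution.
    At error level alpha the bound w-(alpha) has tail probability alpha/2,
    so density ~ (change in tail probability) / (change in bound position)."""
    alphas = [1 - i / steps for i in range(steps)]   # alpha = 1, ..., down to 1/steps
    xs = [wilson_lower(p, n, a) for a in alphas]     # bounds move outwards (down) from p
    points = []
    for i in range(1, len(xs)):
        d_prob = (alphas[i - 1] - alphas[i]) / 2     # probability shed by this step
        d_x = xs[i - 1] - xs[i]                      # distance the bound moved
        if d_x > 0:
            points.append(((xs[i - 1] + xs[i]) / 2, d_prob / d_x))
    return points                                    # (position, density) pairs
```

The same sweep, applied to the upper bound, yields the other half of the distribution; a projecting function fn(*p*) would simply be applied to each bound position before the delta step.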

We are now ready to begin considering properties obtained by combining independent proportions.

#### Intervals on algebraic combinations of two or more proportions

The Newcombe-Wilson interval (Newcombe 1998) is a confidence interval on the difference between two observed proportions, *d* = *p*_{2} – *p*_{1}, derived from the Wilson interval. Although it may be cited as an interval about zero (*w*_{d}^{–}, *w*_{d}^{+}), such that *d* is tested against it, its most easily generalised form is simply a range of differences, *d* ∈ (*d*^{–}, *d*^{+}). This formula is well behaved and does not exceed [-1, 1]. I discussed the pdf distribution of this interval in Plotting the Newcombe-Wilson distribution.

In An algebra of intervals and Wallis (forthcoming) we generalise this method using Zou and Donner’s (2008) method to obtain intervals on sums of proportions such as *s* = *p*_{1} + *p*_{2}, ratios *r* = *p*_{1} / *p*_{2} and products *p* = *p*_{1} × *p*_{2}. In this blog post we discuss the distribution of these intervals, allowing us to see how they perform.

Zou and Donner’s method can be summarised by the following general schema. Suppose we have two independent parameters, which we call θˆ_{1} and θˆ_{2}, with confidence intervals (*l*_{1}, *u*_{1}), (*l*_{2}, *u*_{2}) respectively, then the interval on the difference (*L*, *U*) may be computed by

(*L*, *U*) ≡ (θˆ_{1} – θˆ_{2} – √(θˆ_{1} – *l*_{1} )² + (*u*_{2} – θˆ_{2} )², θˆ_{1} – θˆ_{2} + √(*u*_{1} – θˆ_{1} )² + (θˆ_{2} – *l*_{2} )²). (3)

The rationale for this formula is that each of the squared terms, such as (θˆ_{1} – *l*_{1})², is a **scaled variance** for the bound of the single interval, *z*_{α/2}² × *s*²(*l*_{1}). This formula then applies the sum of independent variances rule (also known as the Bienaymé formula) to the two parameters.

Armed with this formula, Zou and Donner argue that θˆ_{1} and θˆ_{2} need not be proportions (as is the case with the Newcombe-Wilson formula). They can be any independent parameters for which high-quality coverage confidence intervals exist. Given that we can simply transform any interval on *p* to an interval on a monotonic function of *p*, this permits us to obtain formulae for sums, ratios and products.
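Equation (3) itself is straightforward to express in code. Here is a minimal sketch (Python, names ours) of the general combiner, taking the two point estimates and their interval bounds; all of the operators below are obtained by substituting transformed estimates and bounds into this one function.

```python
from math import sqrt

def zou_donner_difference(theta1, l1, u1, theta2, l2, u2):
    """Equation (3): interval (L, U) for theta1 - theta2.
    Each squared term, e.g. (theta1 - l1)^2, is the scaled variance of the
    relevant bound; the terms combine by the Bienaymé (sum of independent
    variances) rule."""
    diff = theta1 - theta2
    L = diff - sqrt((theta1 - l1) ** 2 + (u2 - theta2) ** 2)
    U = diff + sqrt((u1 - theta1) ** 2 + (theta2 - l2) ** 2)
    return L, U
```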

### Difference between proportions *d* = *p*_{2} – *p*_{1}

It is conventional to subtract *p*_{1} (the first observed proportion or rate in a pair) from *p*_{2} (the second). So with indexes reversed, we simply substitute the following into Equation (3).

- θˆ_{1} = *p*_{2}, (*l*_{1}, *u*_{1}) = (*w*_{2}^{–}, *w*_{2}^{+}), and
- θˆ_{2} = *p*_{1}, (*l*_{2}, *u*_{2}) = (*w*_{1}^{–}, *w*_{1}^{+}).

This obtains the Newcombe-Wilson interval about *d*, which is based on four variables: proportions *p*_{1} and *p*_{2}, and sample sizes *n*_{1} and *n*_{2}. We can then proceed to calculate the probability density function.
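With those substitutions, the Newcombe-Wilson interval can be sketched as follows (a minimal Python illustration, names ours, with the Wilson helper restated so the fragment is self-contained):

```python
from math import sqrt
from statistics import NormalDist

def wilson(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - spread, centre + spread

def newcombe_wilson(p1, n1, p2, n2, alpha=0.05):
    """Interval (d-, d+) for d = p2 - p1: Equation (3) with
    theta1 = p2, (l1, u1) = Wilson(p2) and theta2 = p1, (l2, u2) = Wilson(p1)."""
    w1_lo, w1_hi = wilson(p1, n1, alpha)
    w2_lo, w2_hi = wilson(p2, n2, alpha)
    d = p2 - p1
    d_lo = d - sqrt((p2 - w2_lo) ** 2 + (w1_hi - p1) ** 2)
    d_hi = d + sqrt((w2_hi - p2) ** 2 + (p1 - w1_lo) ** 2)
    return d_lo, d_hi
```

Because each component interval lies within [0, 1], the combined interval cannot escape [-1, 1].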

**Tip:** Stefan Gries’ collocation measure *ΔP* is simply –*d* = *p*_{1} – *p*_{2}. Armed with this information, computing an interval for *ΔP* is trivial.

In order to see the resulting function’s relationship to the Wilson distributions for *p*_{1} and *p*_{2} on the same scale, we reposition them on the difference scale, i.e. *d*_{1} = *p*_{2} – Wilson(*p*_{1}) and *d*_{2} = Wilson(*p*_{2}) – *p*_{1}. (For simplicity, we can drop sample size and error level parameters.) These curves model a situation where all of the uncertainty occurs on a single term, e.g. for *d*_{1}, we assume that we know the value of *p*_{2} but *p*_{1} is uncertain. See Figure 1.

In Plotting the Newcombe-Wilson distribution we also discuss a number of permutations of these curves, including where *p*_{1} or *p*_{2} = 0 or 1, small sample sizes, and the application of continuity correction. For reasons of space we will not repeat this here, but we will draw out comparisons.

### Sum of proportions *s* = *p*_{1} + *p*_{2}

We can make the following substitutions into Equation (3) to obtain an interval for the sum. Negation is monotonic, but causes us to swap bounds.

- θˆ_{1} = *p*_{1}, (*l*_{1}, *u*_{1}) = (*w*_{1}^{–}, *w*_{1}^{+}), and
- θˆ_{2} = –*p*_{2}, (*l*_{2}, *u*_{2}) = (–*w*_{2}^{+}, –*w*_{2}^{–}).

The interval and distribution in Figure 2 are obtained with observed proportions *p*_{1} = 0.2 and *p*_{2} = 0.4, for the same sample sizes as Figure 1.

The resulting sum interval has a close relationship with the difference interval. Indeed, in this case it is a mirror image. Why? Well, the interval for θˆ_{2} = –*p*_{2} is the mirrored interval for *p*_{2}. Another way of thinking about this is that the ‘shape’ of the θˆ_{2} interval and distribution is identical to that for *q*_{2} = 1 – *p*_{2} = 0.6, which is what we saw in Figure 1. So now the lower bound of the sum interval has the same width as the upper bound of the difference interval, and vice versa.

To position the Wilson intervals on the same scale, we simply substitute the Wilson term for each element (e.g. *s*_{1} = Wilson(*p*_{1}) + *p*_{2}).
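After the substitutions above, the sum interval reduces to a form directly parallel to the difference. A minimal Python sketch (names ours):

```python
from math import sqrt
from statistics import NormalDist

def wilson(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - spread, centre + spread

def sum_interval(p1, n1, p2, n2, alpha=0.05):
    """Interval (s-, s+) for s = p1 + p2: Equation (3) with theta2 = -p2,
    whose interval is the mirrored Wilson interval (-w2+, -w2-).
    The signs cancel, leaving inner widths on the lower bound and
    outer widths on the upper bound."""
    w1_lo, w1_hi = wilson(p1, n1, alpha)
    w2_lo, w2_hi = wilson(p2, n2, alpha)
    s = p1 + p2
    s_lo = s - sqrt((p1 - w1_lo) ** 2 + (p2 - w2_lo) ** 2)
    s_hi = s + sqrt((w1_hi - p1) ** 2 + (w2_hi - p2) ** 2)
    return s_lo, s_hi
```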

We can compare this interval with an equivalent Gaussian ‘Normal approximation’ interval, just as we did with the difference interval. This interval is centred at *s*, but with a confidence interval based on the weighted mean prior probability estimate *p*ˆ (a ‘standard error’ approach). This performs poorly and can overshoot the range. (The least said about this method the better: we have included it in the spreadsheet for anyone wishing to see how poorly it performs!)

#### Correcting for continuity

As we noted, an important consequence of adopting Wilson-based methods is that they can be readily corrected for continuity, small population size and random-text sampling.

Let us illustrate this by applying a continuity correction to Wilson score intervals using Equation (2). This adjusts for the fact that our sample is small, so observed frequencies can only be whole numbers.

Where α = 1, *uncorrected* Wilson bounds meet *p*, creating the impression of a single distribution rather than two distinct distributions, one for the lower and one for the upper bound. But following adjustment, the Wilson score intervals start at a distance *c*_{i} = 1/2*n*_{i} from *p*. See Plotting the Wilson distribution.

For our new summed Wilson distribution, the resulting continuity-corrected sum interval is separated from the observed sum *s* by the square root of the sum of the squares of these terms, √(*c*_{1}² + *c*_{2}²), except where one proportion is at a boundary, where the sum reduces accordingly. The same logic applies here as it does to difference intervals, which we discussed previously.
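For instance, with two samples of *n*_{1} = *n*_{2} = 50, each correction term is *c* = 0.01, so the corrected sum interval starts √(0.01² + 0.01²) ≈ 0.0141 from *s*. A one-line check (Python, names ours):

```python
from math import sqrt

def cc_separation(n1, n2):
    """Separation of the continuity-corrected sum interval from s (away
    from any boundary): sqrt(c1^2 + c2^2), where c_i = 1/2n_i."""
    c1, c2 = 1 / (2 * n1), 1 / (2 * n2)
    return sqrt(c1 ** 2 + c2 ** 2)
```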

We can perform two further steps in our review. We examine what happens to these pdf distribution curves when *p*_{1} or *p*_{2} approaches zero or 1, and the effect of keeping *s* constant while changing proportions. We also consider the effect of small samples.

#### Moving *s* and skewed *p*

For the same reasons that we discussed in the case of the difference interval, it is important to note that the ‘interval for the sum’ is not merely based on the sum *s*. It is the result of four variables: proportions *p*_{1} and *p*_{2}, and corresponding sample sizes *n*_{1} and *n*_{2}. First, let us use an animation to observe what happens if we maintain a constant difference between *p*_{1} and *p*_{2}, say 0.4, and permute *p*_{1} from 0 to 0.6.

We can see that at the extreme proportions, *p*_{1} = 0 and *p*_{2} = 1, the outer Wilson interval disappears. This is by design, as no proportion can possibly exceed the range [0, 1]. The outer interval width must become zero. Where a squared difference term in Equation (3) becomes zero, the resulting interval simply becomes equal to the remaining interval.

#### Small *n*

As you would expect, if we draw data from small samples we increase uncertainty. The effect of reducing one of the samples to, say, *n*_{2} = 2, is essentially the same as we find with the Newcombe-Wilson difference interval distribution. See Figure 5 in Plotting the Newcombe-Wilson distribution, and feel free to experiment with the spreadsheet.

Since we are performing a very similar algebraic operation with our intervals, this should not be surprising.

### Ratio of proportions *r* = *p*_{1} / *p*_{2}

Zou and Donner (2008) illustrate their theorem (Equation (3)) by showing how it can be used to compute an interval for the ratio of two proportions, which they also call the ‘risk ratio’. This formula has other applications: for example, to efficiently obtain an accurate interval for percentage difference, or, by applying functions, to compute the **odds ratio**, **log odds ratio**, etc. So it is important to understand its performance!

Zou and Donner apply Equation (3) to the difference of the logarithm of proportions, ln(*p*_{1}) and ln(*p*_{2}). Since a difference in logs is the log of the ratio, we can substitute as follows:

- θˆ_{1} = ln(*p*_{1}), (*l*_{1}, *u*_{1}) = (ln(*w*_{1}^{–}), ln(*w*_{1}^{+})), and
- θˆ_{2} = ln(*p*_{2}), (*l*_{2}, *u*_{2}) = (ln(*w*_{2}^{–}), ln(*w*_{2}^{+})).

This allows us to compute an interval for θˆ_{1} – θˆ_{2} = ln(*p*_{1}) – ln(*p*_{2}) = ln(*p*_{1}/*p*_{2}) = ln(*r*), in other words, the confidence interval for the ratio expressed on a log scale. The final stage is to convert the same interval to the ratio *r* scale by applying the inverse logarithm (‘exp’) function. See An algebra of intervals.
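This recipe can be sketched as follows (a minimal Python illustration, names ours; it assumes both proportions are strictly between 0 and 1, so that every logarithm is finite):

```python
from math import sqrt, log, exp
from statistics import NormalDist

def wilson(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - spread, centre + spread

def ratio_interval(p1, n1, p2, n2, alpha=0.05):
    """Interval (r-, r+) for r = p1/p2: Equation (3) applied to
    ln(p1) - ln(p2) = ln(r), then mapped back to the ratio scale with exp."""
    w1_lo, w1_hi = wilson(p1, n1, alpha)
    w2_lo, w2_hi = wilson(p2, n2, alpha)
    t1, t2 = log(p1), log(p2)
    ln_r = t1 - t2
    ln_lo = ln_r - sqrt((t1 - log(w1_lo)) ** 2 + (log(w2_hi) - t2) ** 2)
    ln_hi = ln_r + sqrt((log(w1_hi) - t1) ** 2 + (t2 - log(w2_lo)) ** 2)
    return exp(ln_lo), exp(ln_hi)
```

Note how the reciprocal reverses the polarity of the denominator terms: the upper bound *w*_{2}^{+} feeds the lower bound of *r*, and vice versa.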

Let’s see how this works in practice. First, we will plot the distribution curves on the log(ratio) scale. The ratio curve is most similar to the numerator *p*_{1} curve: because the denominator interval (plotted as ln(*r*_{2})) is narrow, subtracting it has a limited impact on the overall variation.

We can compute two additional distributions on the same scale. As before, these distributions assume all of the observed variation is found with a single parameter, and the other is constant. We define a numerator distribution, *r*_{1}, which assumes all of the variation is in the observed proportion *p*_{1} (we might say *n*_{2} → ∞). The denominator distribution, *r*_{2}, does the same for an assumption that *p*_{1} is a constant.

*numerator* *r*_{1} = Wilson(*p*_{1})/*p*_{2},
*denominator* *r*_{2} = *p*_{1}/Wilson(*p*_{2}). (4)

In Figure 4, we plot *r*_{1}, *r*_{2} and *r* on the same (natural logarithm) scale. Note that thanks to the reciprocal function, the polarity of the *r*_{2} interval is reversed, i.e. (*r*_{2}^{–}, *r*_{2}^{+}) = (*p*_{1}/*w*_{2}^{+}, *p*_{1}/*w*_{2}^{–}).

In the spreadsheet, we perform these calculations via subtraction of logs, so that (for example) ln(*r*_{1}) = ln(Wilson(*p*_{1})) – ln(*p*_{2}). Since we must calculate the logarithms of these terms to obtain the interval for *r*, this ‘long way round’ is not as arduous as it might seem!

Next, we can plot these distributions on the ratio scale by delta approximation, taking the exponent of *r*, *r*_{1} and *r*_{2} in Figure 4 for varying α.

**Note.** We don’t merely apply the ‘exp’ transform to the scale, which would make areas under the curve unequal. We calculate interval (*x*) positions on this scale by the ‘exp’ transform, and then recalculate the *y* position at each point so that the area under the curve is preserved.

Alongside the distribution for ratio *r*, we compute the transformed Wilson distributions *r*_{1} and *r*_{2}. This gives us Figure 5.

Most of the distributions obtained by this method have a long upper tail, because a ratio of two proportions *r* ∈ [0, ∞]. We can see that the resulting distribution is well behaved. By substituting *n*_{1} or *n*_{2} = 10,000 (for example) we can confirm that the *r* curve converges on *r*_{1} and *r*_{2} respectively.

#### Skewed *p*_{1} and *p*_{2}

Where *p*_{1} and *p*_{2} are 0 or 1, the situation is more complex with division than the other operators due to the fundamentally unequal nature of ratio terms.

*Numerator:*

- If *p*_{1} = 0, then its Wilson score interval has only an upper bound.

**Note.** We use a tiny delta to make tiny proportions computable: for plotting we find that Excel™ obtains smoother curves with an adjustment of *p*_{1} = δ_{p} = 10^{-6} (this delta is not to be confused with that used for delta approximation). This adjustment is not required if a continuity correction is employed, as *p*_{1} would then equal 1/2*n*_{1}.

However this approximation does not work when both *p*_{1} and *p*_{2} are zero, and the interval is spread over the entire range [0, ∞].

- If *p*_{1} = 1, then its interval has only a lower bound. The upper bound curve *r*^{+} = *r*_{2}^{+}.

We can illustrate this by taking sequential values for *p*_{1} from 0 to 1 and animating the result.

*Denominator:*

- If *p*_{2} = 0, then the ratio *r* → ∞, so *r*^{+} is also infinite. With the substitution of (e.g.) *p*_{2} = δ_{p} or 1/2*n*_{2} we can estimate a lower bound distribution *r*^{–}, but the distribution is stretched, and so is not easily plotted! Nonetheless, for *p*_{1} > 0, *r*^{–} → *r*_{2}^{–} (= *p*_{1}/*w*_{2}^{+}), which can be computed for *w*_{2}^{+} > 0, and is a good substitute. See Figure 6.
- If *p*_{2} = 1, its transformed Wilson interval has no lower bound (the interval for *p*_{2} having no upper bound). See Animation 2 above. This also means that the lower bound of *r*, *r*^{–} = *r*_{1}^{–}.

The final animation in this section explores what happens if the denominator proportion *p*_{2} is varied. Changing the denominator has a dramatic effect on the shape of the distribution, as we might expect. At *p*_{2} ≅ 0, the interval is infinite at the upper bound, and the lower bound distribution is stretched, as in Figure 6. As *p*_{2} increases, the fraction shrinks and the distributions become increasingly bunched around it.

#### Employing a continuity correction

As previously, we apply the continuity corrected formula (Equation (2)) as input to Equation (3), in this case with the log substitutions for the ratio.

Figure 7 plots component intervals for the same observations as Figure 5, with the uncorrected interval for reference. We can see that the distributions for transformed Wilson intervals (*r*_{1}, *r*_{2}) and combined interval (*r*) are more conservative, with bounds pushed further out.

Employing a continuity correction avoids the problem of extreme proportions. Where *p* = 0 or 1, Equation (2) moves the Wilson bound 1/2*n* away from *p*, towards the centre of the distribution.

### Product of proportions *m* = *p*_{1} × *p*_{2}

The method for computing intervals for the product of independent proportions is very similar to that for the ratio. See also the sum formula above. The product *m* ∈ [0, 1]. (We will use *m*, for ‘multiple’, to refer to the product, to avoid confusion with proportions.)

We can substitute as follows into Equation (3).

- θˆ_{1} = ln(*p*_{1}), (*l*_{1}, *u*_{1}) = (ln(*w*_{1}^{–}), ln(*w*_{1}^{+})), and
- θˆ_{2} = –ln(*p*_{2}), (*l*_{2}, *u*_{2}) = (–ln(*w*_{2}^{+}), –ln(*w*_{2}^{–})).

In these graphs we will also plot scaled Wilson intervals *m*_{1} and *m*_{2}.

*m*_{1} = Wilson(*p*_{1}) × *p*_{2},
*m*_{2} = *p*_{1} × Wilson(*p*_{2}). (5)

First let us consider the log product distributions.

As an observation, the product interval tends to be closer to the interval for the smaller of the two proportions since the product of two proportions is smaller than both. In a simple product, *m*_{1} and *m*_{2} are interchangeable (like addition, the product is a *symmetric* relation).

As before, once combined we remove the logarithm with the ‘exp’ function and compute the resulting interval distributions on the product scale. The same kind of relationship with the source Wilson distributions can be seen.
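A parallel sketch for the product (minimal Python illustration, names ours; again assuming both proportions are strictly between 0 and 1):

```python
from math import sqrt, log, exp
from statistics import NormalDist

def wilson(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - spread, centre + spread

def product_interval(p1, n1, p2, n2, alpha=0.05):
    """Interval (m-, m+) for m = p1 * p2: Equation (3) with theta1 = ln(p1)
    and theta2 = -ln(p2), so theta1 - theta2 = ln(p1 * p2); the result is
    mapped back to the product scale with exp."""
    w1_lo, w1_hi = wilson(p1, n1, alpha)
    w2_lo, w2_hi = wilson(p2, n2, alpha)
    ln_m = log(p1) + log(p2)
    ln_lo = ln_m - sqrt(log(p1 / w1_lo) ** 2 + log(p2 / w2_lo) ** 2)
    ln_hi = ln_m + sqrt(log(w1_hi / p1) ** 2 + log(w2_hi / p2) ** 2)
    return exp(ln_lo), exp(ln_hi)
```

Unlike the ratio, negating the denominator's logarithm restores the original polarity, so both lower bounds feed *m*^{–} and both upper bounds feed *m*^{+}, keeping the result within [0, 1].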

#### Skewed *p*_{1} and *p*_{2}

If *p*_{1} = 0, then *m* → 0, the upper bound *m*^{+} → *m*_{1}^{+}. Likewise, if *p*_{2} = 0, *m* → 0 and *m*^{+} → *m*_{2}^{+}. However if both *p*_{1} and *p*_{2} = 0, then this substitution is not possible and we must approximate it with *p*_{1} = *p*_{2} = δ_{p}.

If *p*_{1} = 1, then its interval has only a lower bound, so *m*^{+} = *m*_{2}^{+}. Similarly if *p*_{2} = 1, *m*^{+} = *m*_{1}^{+}.

The animation below illustrates this principle.

#### Employing a continuity correction

We can substitute the continuity-corrected Wilson intervals (Equation (2)), perform log summation by Zou and Donner’s method (Equation (3)) and then project the result. We illustrate this in Figure 10.

As with the ratio interval, performing this adjustment also has the benefit of addressing issues arising in the case of skewed proportions.

### Conclusions

In the past we have computed distributions of intervals for independent proportions robustly using the delta approximation method, and applied this method to a variety of interval functions, notably but not exclusively, the Wilson score interval, with and without continuity correction. In previous posts we plotted the Wilson distribution and the Newcombe-Wilson distribution – the latter being the pdf of the difference interval calculated using Newcombe’s method. In this post we extend the method for the difference to three further algebraic operators – addition, division and multiplication.

We discover that the distribution for the sum of two proportions is as robust as that for the difference in proportions. Difference *d* ∈ [-1, 1] obtains a symmetric distribution centred at 0 if *p*_{1} = *p*_{2}. On the other hand sum *s* ∈ [0, 2] has a condition for symmetry where *p*_{1} + *p*_{2} = 1. In other cases we can obtain ‘mirror distributions’ – distributions for the sum which precisely mirror an equivalent distribution for the difference – because the Wilson interval for an alternate *q* = 1 – *p* is the reflection of that for *p*.

The situation is a little more complex when it comes to ratios and products. It also raises two issues. First, the method requires us to first transform terms to the logarithmic scale. Second, zero proportions present a particular problem unless a continuity correction is applied. Zou and Donner’s (2008) theorem (Equation (3)) relies on independent properties (with independent variances) and intervals with good coverage properties.

Yates’ continuity correction is frequently ignored in more complex statistical methods, but there is rarely a good reason to do so. For example, corrections tend to be left out of meta-tests (Wallis 2021: 238), but as I point out, the source sample has not become less ‘chunky’!

Exploring cases where proportions are 0 or 1 we observe some useful simplifications for ratio and product formulae.

If either product term, or the ratio numerator *p*_{1}, is zero, the result is multiplied by that zero proportion, obtaining a zero result. Importantly however, *the interval* for a zero proportion is not zero-width (certain).

Thus for the product, the upper bound for the multiple *m* = *p*_{1} × *p*_{2} tends to the bound of the term containing the zero proportion’s interval. So if *p*_{1} = 0, *m*^{+} → *m*_{1}^{+}. The same principle applies, for *p*_{1} only, to the ratio *r*. If the denominator *p*_{2} = 0 then the ratio and its upper bound are of course infinite, but the lower bound *r*^{–} tends to *r*_{2}^{–} = *p*_{1}/*w*_{2}^{+}.

The product and ratio methods require that the properties concerned are strictly positive. Indeed the logarithm of zero is undefined (it tends to –∞), which causes us to substitute a small delta term δ_{p} = 10^{-6} in cases of zero proportions. This method allows us to render distributions for uncorrected intervals computable. Zou and Donner (2008: 1697) suggest substituting the continuity correction term, but *only in extreme cases*, which is an inconsistent approach.

Finally, as I have previously commented in discussing these types of plot, very few instances of these distributions look remotely Normal (Gaussian). As sample sizes increase (and extreme proportions are avoided) these plots may tend towards something more bell-curved in shape, but an expectation that such distributions will be ‘approximately Normal’ is liable to be misleading.

### References

Newcombe, R.G. (1998). Interval estimation for the difference between independent proportions: comparison of eleven methods. *Statistics in Medicine*, **17**, 873-890.

Wallis, S.A. (2021). *Statistics in Corpus Linguistics Research.* New York: Routledge. » Announcement

Wallis, S.A. (forthcoming). Accurate confidence intervals on Binomial proportions, functions of proportions and other related scores. » Post

Wilson, E.B. (1927). Probable inference, the law of succession, and statistical inference. *Journal of the American Statistical Association*, **22**, 209-212.

Zou, G.Y. & A. Donner (2008). Construction of confidence limits about effect measures: A general approach. *Statistics in Medicine*, **27**:10, 1693-1702.