This post is an introduction to the bivariate normal distribution.

**Bivariate Normal Distribution**

Consider the following probability density function (pdf):

**(1)**……..$\displaystyle f(x,y)=\frac{1}{2 \pi \ \sigma_X \sigma_Y \sqrt{1-\rho^2}} \ e^{-\frac{q}{2}}$

where

……..$\displaystyle q=\frac{1}{1-\rho^2} \biggl[\biggl(\frac{x-\mu_X}{\sigma_X}\biggr)^2-2 \rho \biggl(\frac{x-\mu_X}{\sigma_X}\biggr) \biggl(\frac{y-\mu_Y}{\sigma_Y}\biggr)+\biggl(\frac{y-\mu_Y}{\sigma_Y}\biggr)^2 \biggr]$

for all $-\infty<x<\infty$ and $-\infty<y<\infty$, with $\sigma_X>0$, $\sigma_Y>0$ and $-1<\rho<1$.

If the joint distribution of the random variables $X$ and $Y$ is described by the probability density function (1), $X$ and $Y$ are said to have the bivariate normal distribution with parameters $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$ and $\rho$.
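As a quick numerical sanity check that (1) is a valid pdf, the density can be coded directly and integrated over a large grid; the total should be close to 1. This is a minimal sketch with an illustrative parameter choice ($\mu_X=\mu_Y=0$, $\sigma_X=\sigma_Y=1$, $\rho=0.5$); the function name `bvn_pdf` is mine, not from the post.

```python
import math

def bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Bivariate normal density as in (1)."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * sd_x * sd_y * math.sqrt(1 - rho**2))

# crude Riemann sum over [-6, 6] x [-6, 6]; the total should be close to 1
step = 0.05
total = sum(bvn_pdf(-6 + i * step, -6 + j * step, 0, 0, 1, 1, 0.5) * step * step
            for i in range(241) for j in range(241))
print(round(total, 3))
```

The grid truncates the tails at 6 standard deviations, so the sum slightly underestimates the exact integral, but the error is negligible at this scale.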

The above definition alone does not provide much insight about the bivariate normal distribution. At this point, it is not clear whether (1) is actually a valid pdf, and the definition says nothing about the roles played by the five parameters. Assuming the joint density function (1), after digging a little deeper, facts about the conditional distribution of $Y$ given $X=x$ emerge and are summarized by the following theorem.

……..

*Theorem 1*

Suppose that $X$ and $Y$ have the bivariate normal distribution as defined by the pdf (1). Then the following properties hold.

- The marginal distribution of the random variable $X$ is a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$.
- The conditional distribution of $Y$ conditioning on $X=x$ is a normal distribution.
- The mean of the conditional distribution of $Y$ conditioning on $X=x$ is ……..$\displaystyle E[Y \lvert X=x]=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X)$
- The variance of the conditional distribution of $Y$ conditioning on $X=x$ is ……..$\displaystyle Var[Y \lvert X=x]=\sigma_Y^2 \ (1-\rho^2)$
- The parameter $\rho$ is the correlation coefficient of $X$ and $Y$.
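The properties in Theorem 1 also suggest a simulation check: draw $X$ from its marginal normal distribution, then draw $Y$ from the conditional normal distribution with the stated mean and variance; the sample moments of $Y$ and the sample correlation should then match $\mu_Y$, $\sigma_Y^2$ and $\rho$. A sketch with parameter values chosen here purely for illustration:

```python
import math
import random

random.seed(7)
mu_x, mu_y, sd_x, sd_y, rho = 60, 70, 10, 5, 0.8
n = 200_000

xs, ys = [], []
for _ in range(n):
    x = random.gauss(mu_x, sd_x)                         # marginal of X
    cond_mean = mu_y + rho * (sd_y / sd_x) * (x - mu_x)  # conditional mean of Y
    cond_sd = sd_y * math.sqrt(1 - rho**2)               # conditional sd of Y
    ys.append(random.gauss(cond_mean, cond_sd))
    xs.append(x)

mean_y = sum(ys) / n
var_y = sum((y - mean_y)**2 for y in ys) / n
mean_x = sum(xs) / n
var_x = sum((x - mean_x)**2 for x in xs) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
corr = cov / math.sqrt(var_x * var_y)
print(round(mean_y, 2), round(var_y, 1), round(corr, 3))  # near 70, 25, 0.8
```

The sample mean of $Y$ lands near $\mu_Y$, the sample variance near $\sigma_Y^2$, and the sample correlation near $\rho$, consistent with the theorem.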

Theorem 1 centers on the conditional distribution of $Y$ given $X=x$. We can also derive properties about the conditional distribution of $X$ given $Y=y$ as summarized in Theorem 2.

……..

*Theorem 2*

Suppose that $X$ and $Y$ have the bivariate normal distribution as defined by the pdf (1). Then the following properties hold.

- The marginal distribution of the random variable $Y$ is a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$.
- The conditional distribution of $X$ conditioning on $Y=y$ is a normal distribution.
- The mean of the conditional distribution of $X$ conditioning on $Y=y$ is ……..$\displaystyle E[X \lvert Y=y]=\mu_X+\rho \ \frac{\sigma_X}{\sigma_Y} \ (y-\mu_Y)$
- The variance of the conditional distribution of $X$ conditioning on $Y=y$ is ……..$\displaystyle Var[X \lvert Y=y]=\sigma_X^2 \ (1-\rho^2)$
- The parameter $\rho$ is the correlation coefficient of $X$ and $Y$.

We only need to prove Theorem 1. Interestingly, the properties in Theorem 1 also imply the joint density function (1) as summarized in the following theorem.

……..

*Theorem 3*

Suppose that the jointly distributed random variables $X$ and $Y$ satisfy the following properties:

- The conditional distribution of $Y$, given $X=x$, is a normal distribution.
- The mean of the conditional distribution of $Y$ given $X=x$, $E[Y \lvert X=x]$, is a linear function of $x$.
- The variance of the conditional distribution of $Y$ given $X=x$, $Var[Y \lvert X=x]$, is a constant, i.e. it is not a function of $x$.
- The marginal distribution of $X$ is a normal distribution.

Then the joint pdf of $X$ and $Y$ is the same as the one in (1), i.e. $X$ and $Y$ have a bivariate normal distribution.

Theorem 1 and Theorem 3 combined show that the definition of the bivariate normal distribution using the pdf (1) is equivalent to the conditions in Theorem 3. Thus we can define the bivariate normal distribution using either the pdf (1) or the conditions in Theorem 3 (or the corresponding conditions with the roles of $X$ and $Y$ switched, as in Theorem 2).

**The Bivariate Normal Density**

In order to prove the theorems, it is helpful to reformulate the bivariate normal pdf in (1). Recall the quantity $q$ in the pdf (1). Before proving Theorem 1 and Theorem 3, it is helpful to simplify $q$, which is the following quantity.

……..$\displaystyle q=\frac{1}{1-\rho^2} \biggl[\biggl(\frac{x-\mu_X}{\sigma_X}\biggr)^2-2 \rho \biggl(\frac{x-\mu_X}{\sigma_X}\biggr) \biggl(\frac{y-\mu_Y}{\sigma_Y}\biggr)+\biggl(\frac{y-\mu_Y}{\sigma_Y}\biggr)^2 \biggr]$

The quantity $q$ is equivalent to the following:

**(2)**……..$\displaystyle q=\biggl(\frac{x-\mu_X}{\sigma_X}\biggr)^2+\frac{1}{1-\rho^2} \biggl(\frac{y-\mu_{Y \lvert x}}{\sigma_Y}\biggr)^2$ where $\displaystyle \mu_{Y \lvert x}=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X)$.

The fact (2) is established by the following.

……..$\displaystyle \begin{aligned}(1-\rho^2) \ q&=\biggl(\frac{x-\mu_X}{\sigma_X}\biggr)^2-2 \rho \biggl(\frac{x-\mu_X}{\sigma_X}\biggr) \biggl(\frac{y-\mu_Y}{\sigma_Y}\biggr)+\biggl(\frac{y-\mu_Y}{\sigma_Y}\biggr)^2 \\&=(1-\rho^2) \biggl(\frac{x-\mu_X}{\sigma_X}\biggr)^2+\biggl[\frac{y-\mu_Y}{\sigma_Y}-\rho \ \frac{x-\mu_X}{\sigma_X}\biggr]^2 \\&=(1-\rho^2) \biggl(\frac{x-\mu_X}{\sigma_X}\biggr)^2+\frac{(y-\mu_{Y \lvert x})^2}{\sigma_Y^2} \end{aligned}$

It is clear that (2) follows from the last step after dividing both sides by $1-\rho^2$. With the help of fact (2), the pdf in (1) can be rewritten as follows:

**(3)**……..$\displaystyle f(x,y)=\frac{1}{2 \pi \ \sigma_X \sigma_Y \sqrt{1-\rho^2}} \ \exp \biggl[-\frac{(x-\mu_X)^2}{2 \sigma_X^2}-\frac{(y-\mu_{Y \lvert x})^2}{2 \sigma_Y^2 (1-\rho^2)} \biggr]$

The pdf in (3) can be rearranged as follows:

**(4)**……..$\displaystyle f(x,y)=\biggl[\frac{1}{\sqrt{2 \pi} \ \sigma_X} \ e^{-\frac{(x-\mu_X)^2}{2 \sigma_X^2}}\biggr] \times \biggl[\frac{1}{\sqrt{2 \pi} \ \sigma_Y \sqrt{1-\rho^2}} \ e^{-\frac{(y-\mu_{Y \lvert x})^2}{2 \sigma_Y^2 (1-\rho^2)}}\biggr]$

For easier reference, the function in the first set of square brackets in (4) is called $g(x)$ and the function in the second set of square brackets in (4) is called $h(y \lvert x)$. Note that $g(x)$ is the density function for the normal distribution with mean $\mu_X$ and variance $\sigma_X^2$. The function $h(y \lvert x)$ is the density function for the normal distribution with mean $\mu_{Y \lvert x}$, as defined in (2), and variance $\sigma_Y^2 \ (1-\rho^2)$. Note that the $x$ in $h(y \lvert x)$ is a fixed number. So $h(y \lvert x)$ can be regarded as a conditional density of $Y$ given $X=x$.
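The factorization in (4) can be verified numerically by evaluating the joint density (1) directly and comparing it against the product of the two bracketed normal densities. A sketch (function names, parameter values and sample points are all my own choices):

```python
import math

def npdf(t, mu, sd):
    """Univariate normal density."""
    return math.exp(-((t - mu)**2) / (2 * sd**2)) / (math.sqrt(2 * math.pi) * sd)

def bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Bivariate normal density as in (1)."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * sd_x * sd_y * math.sqrt(1 - rho**2))

mu_x, mu_y, sd_x, sd_y, rho = 60, 70, 10, 5, 0.8
for x, y in [(50, 65), (60, 70), (75, 80)]:
    joint = bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho)
    g = npdf(x, mu_x, sd_x)                               # first bracket of (4)
    h = npdf(y, mu_y + rho * (sd_y / sd_x) * (x - mu_x),  # second bracket of (4)
             sd_y * math.sqrt(1 - rho**2))
    assert abs(joint - g * h) < 1e-12
print("factorization (4) confirmed at sample points")
```

The agreement is exact up to floating-point rounding, as the algebra behind (2) through (4) predicts.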

**Proof of Theorem 1**

We now use the pdf in (4) to prove Theorem 1. Immediately, we observe that $f(x,y)$ is a valid pdf.

……..$\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) \ dy \ dx=\int_{-\infty}^{\infty} g(x) \biggl[\int_{-\infty}^{\infty} h(y \lvert x) \ dy \biggr] \ dx=1$

The above double integral is 1 because each of $g(x)$ and $h(y \lvert x)$ is a normal pdf. The next step is to show that the marginal distribution of $X$ is a normal distribution. The marginal density function is the integral $f_X(x)=\int_{-\infty}^{\infty} f(x,y) \ dy$. In this integral, the function $h(y \lvert x)$ disappears since the integral of $h(y \lvert x)$ with respect to $y$ is 1. Thus $f_X(x)=g(x)$, which, as mentioned above, is a normal density function with mean $\mu_X$ and variance $\sigma_X^2$. Thus the marginal distribution of $X$ is a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$.

As a result of the preceding observations, (4) can be restated as follows:

**(5)**……..$\displaystyle f(x,y)=f_X(x) \times \biggl[\frac{1}{\sqrt{2 \pi} \ \sigma_Y \sqrt{1-\rho^2}} \ e^{-\frac{(y-\mu_{Y \lvert x})^2}{2 \sigma_Y^2 (1-\rho^2)}}\biggr]$

Consequently, the normal density function in the square brackets of (5) must be $f_{Y \lvert X}(y \lvert x)$, the density function for the conditional distribution of $Y$ given $X=x$. This means that the conditional distribution of $Y$ given $X=x$ is a normal distribution with the following mean and variance.

**(6)**……..$\displaystyle E[Y \lvert X=x]=\mu_{Y \lvert x}=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X)$

**(7)**……..$\displaystyle Var[Y \lvert X=x]=\sigma_Y^2 \ (1-\rho^2)$

According to Theorem 2 in this previous post, whenever the conditional mean $E[Y \lvert X=x]$ is a linear function of $x$, it must be of the form exactly as described in (6). Furthermore, the quantity $\rho$ in (6) must be the correlation coefficient of $X$ and $Y$. This concludes the proof of Theorem 1.

The proof of Theorem 2 would be similar (just switching the roles of $X$ and $Y$) and is not given here.

**Proof of Theorem 3**

We now assume the four conditions in Theorem 3 and derive the joint pdf as described in (1). As mentioned above, according to Theorem 2 in this previous post, since the conditional mean $E[Y \lvert X=x]$ is a linear function of $x$, it is of the same form as in (6). Using (6), we evaluate the variance of the conditional distribution $Var[Y \lvert X=x]$.

……..$\displaystyle Var[Y \lvert X=x]=E[Y^2 \lvert X=x]-\bigl(E[Y \lvert X=x]\bigr)^2$

Then multiply both sides by the marginal pdf $f_X(x)$.

……..$\displaystyle Var[Y \lvert X=x] \ f_X(x)=E[Y^2 \lvert X=x] \ f_X(x)-\bigl(E[Y \lvert X=x]\bigr)^2 \ f_X(x)$

Integrate both sides of the last expression with respect to $x$. Since $Var[Y \lvert X=x]$ is assumed to be a constant, integrating a constant times a pdf gives that constant. Thus the left-hand side remains $Var[Y \lvert X=x]$.

……..$\displaystyle Var[Y \lvert X=x]=\int_{-\infty}^{\infty} E[Y^2 \lvert X=x] \ f_X(x) \ dx-\int_{-\infty}^{\infty} \bigl(E[Y \lvert X=x]\bigr)^2 \ f_X(x) \ dx$

The right-hand side of the above is the following expectation.

……..$\displaystyle Var[Y \lvert X=x]=E[Y^2]-E\bigl[\bigl(E[Y \lvert X]\bigr)^2\bigr]$

Further developing the right-hand side (the cross term vanishes since $E[X-\mu_X]=0$), we have the following derivation.

……..$\displaystyle \begin{aligned} Var[Y \lvert X=x]&=E[Y^2]-E\biggl[\biggl(\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (X-\mu_X)\biggr)^2\biggr] \\&=E[Y^2]-\mu_Y^2-\rho^2 \ \frac{\sigma_Y^2}{\sigma_X^2} \ E\bigl[(X-\mu_X)^2\bigr] \\&=\sigma_Y^2-\rho^2 \sigma_Y^2 \\&=\sigma_Y^2 \ (1-\rho^2) \end{aligned}$

Thus the variance of the conditional distribution $Y \lvert X=x$ is the constant $\sigma_Y^2 \ (1-\rho^2)$. This means that $Y \lvert X=x$ has a normal distribution with the following mean and variance. Note that $Y \lvert X=x$ is assumed to be normal.

**(8)**……..$\displaystyle E[Y \lvert X=x]=\mu_{Y \lvert x}=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X)$

**(9)**……..$\displaystyle Var[Y \lvert X=x]=\sigma_Y^2 \ (1-\rho^2)$

The following shows the conditional pdf of $Y \lvert X=x$ and the marginal pdf of $X$. Note that the marginal distribution of $X$ is also assumed to be normal.

**(10)**……..$\displaystyle f_{Y \lvert X}(y \lvert x)=\frac{1}{\sqrt{2 \pi} \ \sigma_Y \sqrt{1-\rho^2}} \ e^{-\frac{(y-\mu_{Y \lvert x})^2}{2 \sigma_Y^2 (1-\rho^2)}}$

**(11)**……..$\displaystyle f_X(x)=\frac{1}{\sqrt{2 \pi} \ \sigma_X} \ e^{-\frac{(x-\mu_X)^2}{2 \sigma_X^2}}$

Note that the $\mu_{Y \lvert x}$ in (10) is the mean of $Y \lvert X=x$, which is the expression in (8). The joint pdf of $X$ and $Y$ is obtained by multiplying (10) and (11), i.e. $f(x,y)=f_X(x) \ f_{Y \lvert X}(y \lvert x)$. The result is identical to the expression in (4) above, which is equivalent to the joint pdf in (1). Thus assuming the four conditions in Theorem 3 implies that the joint pdf is the bivariate normal pdf as described in (1). This completes the proof of Theorem 3.

**One More Theorem**

The above discussion shows that there are two ways to define the bivariate normal distribution. One is to define it using the joint pdf (1). The pdf is hard to work with (e.g. it is hard to evaluate probabilities directly from it). Theorem 1 shows that the bivariate normal distribution satisfies the properties concerning the conditional distributions of $Y$ given $X=x$. The other way is to define the bivariate normal distribution using these properties of the conditional distributions (as stated in Theorem 3). We can do so because these properties lead to the same pdf in (1).

Whenever the random variables $X$ and $Y$ are independent, the covariance $Cov(X,Y)$ is zero and hence the correlation coefficient $\rho$ is zero. The converse is not true in general. Examples of dependent $X$ and $Y$ with zero covariance are given here. However, when $X$ and $Y$ are bivariate normal, zero covariance (equivalently, zero correlation) does imply independence.

*Theorem 4*

Suppose that $X$ and $Y$ have a bivariate normal distribution. Then $X$ and $Y$ are independent random variables if and only if the correlation coefficient $\rho$ is zero.

One direction does not require bivariate normality. As mentioned, if $X$ and $Y$ are independent, then $\rho=0$. For the other direction, suppose that $X$ and $Y$ are bivariate normal and that $\rho=0$. Then the pdf in (1) becomes the following.

……..$\displaystyle f(x,y)=\biggl[\frac{1}{\sqrt{2 \pi} \ \sigma_X} \ e^{-\frac{(x-\mu_X)^2}{2 \sigma_X^2}}\biggr] \times \biggl[\frac{1}{\sqrt{2 \pi} \ \sigma_Y} \ e^{-\frac{(y-\mu_Y)^2}{2 \sigma_Y^2}}\biggr]$

The above is a product of the marginal pdf of $X$ and the marginal pdf of $Y$, so $X$ and $Y$ are independent. Thus the conditional pdf $f_{Y \lvert X}(y \lvert x)$ is simply the unconditional pdf $f_Y(y)$. Likewise, the conditional pdf $f_{X \lvert Y}(x \lvert y)$ is simply the unconditional pdf $f_X(x)$. The knowledge of $X$ (or of $Y$) is simply extraneous information.
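This product form at $\rho=0$ can also be confirmed numerically; a sketch with parameter values and sample points of my own choosing:

```python
import math

def npdf(t, mu, sd):
    """Univariate normal density."""
    return math.exp(-((t - mu)**2) / (2 * sd**2)) / (math.sqrt(2 * math.pi) * sd)

def bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Bivariate normal density as in (1)."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * sd_x * sd_y * math.sqrt(1 - rho**2))

# with rho = 0, the joint pdf equals the product of the marginal pdfs
for x, y in [(55, 72), (60, 70), (80, 61)]:
    joint = bvn_pdf(x, y, 60, 70, 10, 5, 0.0)
    product = npdf(x, 60, 10) * npdf(y, 70, 5)
    assert abs(joint - product) < 1e-15
print("product of marginals confirmed at sample points")
```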


**Examples**

We now examine some examples.

*Example 1*

Consider the bivariate normal distribution with parameters $\mu_X=60$, $\mu_Y=70$, $\sigma_X=10$, $\sigma_Y=5$ and $\rho=0.8$. The following is the least squares regression line of $Y$ on $X$.

……..$\displaystyle E[Y \lvert X=x]=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X)=70+0.4 \ (x-60)=46+0.4 \ x$

This line gives the conditional mean $E[Y \lvert X=x]$ for each $x$. Because $X$ and $Y$ are positively correlated, the least squares line is increasing – the larger the $x$, the larger the mean of $Y$ given $X=x$. The following diagram is the graph of this least squares line.

**Figure 1**

The solid green line in Figure 1 is the least squares regression line $y=46+0.4 \ x$. The vertical dotted line is the unconditional mean of $X$ and the horizontal dotted line is the unconditional mean of $Y$. Note that the least squares line always passes through the point $(\mu_X,\mu_Y)=(60,70)$.

The variance of the conditional distribution $Y \lvert X=x$ is constant regardless of $x$. It is $\sigma_Y^2 \ (1-\rho^2)=25 \ (1-0.8^2)=9$. Then the standard deviation of $Y \lvert X=x$ is 3.

Consider $x=30$. The conditional distribution of $Y \lvert X=30$ is a normal distribution with mean $46+0.4 \ (30)=58$ and standard deviation 3. About 99.7% of the probability in a normal distribution is within 3 standard deviations of the mean. Then about 99.7% of the observations for $Y$ given $X=30$ are expected to be within the interval $(49, 67)$. When sampling from this normal distribution, it is rare to observe data outside of this range.

Another example. Consider $x=90$. The conditional distribution of $Y \lvert X=90$ is a normal distribution with mean $46+0.4 \ (90)=82$ and standard deviation 3. Then about 99.7% of the observations for $Y$ given $X=90$ are expected to be within the interval $(73, 91)$. When sampling from this normal distribution, it is rare to observe data outside of this range.

As $x$ increases, the mean of the conditional normal distribution increases (along the green least squares line), and the 99.7% range of the normal distribution moves up with it. This is illustrated in the following graph.

**Figure 2**

The two red lines in Figure 2 have the same slope as the least squares line $y=46+0.4 \ x$, but one is 9 units above and the other is 9 units below (in terms of vertical distances). Of course, 9 is 3 times the standard deviation of the conditional distribution for $Y$ given $X=x$. As equations, the two red lines are $y=55+0.4 \ x$ and $y=37+0.4 \ x$.

For each $x$, observations of the distribution for $Y$ given $X=x$ fall on the vertical line that goes through the point $(x, 46+0.4 \ x)$. About 99.7% of these observations lie in the line segment within the two red lines. Thus the strip formed by the two red lines contains essentially all of the observations of the conditional distributions of $Y$. Consequently, the bivariate normal density is concentrated in this strip around the least squares regression line. How narrow the strip is depends on the size of the constant variance of $Y \lvert X=x$.

We next calculate probabilities suggested by Figure 3 below.

**Figure 3**

The two blue horizontal lines, $y=65$ and $y=75$, are one standard deviation from 70, the mean of $Y$. The area of $f(x,y)$ in this horizontal strip is the probability $P(65<Y<75)$. Since the strip contains the probability within one standard deviation of the mean, $P(65<Y<75)$ would be around 0.68. Using a TI84+ calculator, this probability is $P(65<Y<75)=0.6827$.
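The same number can be reproduced in Python using the error function; `norm_cdf` below is a hand-rolled helper (not a TI84 or standard-library function).

```python
import math

def norm_cdf(x, mu, sd):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

# P(65 < Y < 75) for Y normal with mean 70 and standard deviation 5
p = norm_cdf(75, 70, 5) - norm_cdf(65, 70, 5)
print(round(p, 4))  # 0.6827
```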

Let’s calculate $P(65<Y<75 \lvert X=x)$ for several $x$ values (30, 40, 50, 60, 70, 80 and 90). This is further illustrated in the following figure.

**Figure 4**

Figure 4 looks very busy, but it is Figure 3 with short vertical bars added for several $x$ values (30, 40, 50, 60, 70, 80 and 90). Based on the discussion above, the short vertical bars would be the range where 99.7% of the normal distribution occurs. For any vertical red bar that has a small (or even negligible) intersection with the horizontal strip formed by the two blue lines $y=65$ and $y=75$, the probability $P(65<Y<75 \lvert X=x)$ is small. Based on Figure 4, $P(65<Y<75 \lvert X=60)$ should be large whereas $P(65<Y<75 \lvert X=30)$ and $P(65<Y<75 \lvert X=90)$ should be small. With this in mind, the following table shows the probabilities at the indicated $x$ values, all calculated using a TI84+ calculator.

**Table 1**

| $x$ | mean $46+0.4 \ x$ | st dev | $P(65<Y<75 \lvert X=x)$ |
|---|---|---|---|
| 30 | 58 | 3 | 0.009815 |
| 40 | 62 | 3 | 0.15865 |
| 50 | 66 | 3 | 0.62921 |
| 60 | 70 | 3 | 0.90442 |
| 70 | 74 | 3 | 0.62921 |
| 80 | 78 | 3 | 0.15865 |
| 90 | 82 | 3 | 0.009815 |
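The table above can be reproduced in a few lines; the conditional means follow the pattern in the table (mean $46+0.4 \ x$, standard deviation 3), and the helper `norm_cdf` is a hand-rolled function, not a library call.

```python
import math

def norm_cdf(x, mu, sd):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

# P(65 < Y < 75 | X = x) where Y | X = x has mean 46 + 0.4x and sd 3
for x in (30, 40, 50, 60, 70, 80, 90):
    m = 46 + 0.4 * x
    p = norm_cdf(75, m, 3) - norm_cdf(65, m, 3)
    print(x, m, round(p, 6))
```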

*Example 2*

Consider the same bivariate normal distribution discussed in Example 1. Suppose that for selected values of $x$, we sample the normal distribution $Y \lvert X=x$ four times. Compute the probability $P(65<\bar{Y}<75 \lvert X=x)$ for the $x$ values 30, 40, 50, 60, 70, 80 and 90, where $\bar{Y}$ is the mean of the 4 sample items.

For each $x$, the mean of $\bar{Y}$ given $X=x$ is the same as $E[Y \lvert X=x]=46+0.4 \ x$. However the standard deviation is smaller. It is $3/\sqrt{4}=1.5$. The following table shows the normal probabilities $P(65<\bar{Y}<75 \lvert X=x)$, calculated using a TI84+ calculator. The probabilities $P(65<Y<75 \lvert X=x)$ are shown in the last column for comparison.

**Table 2**

| $x$ | mean | st dev of $\bar{Y}$ | $P(65<\bar{Y}<75 \lvert X=x)$ | $P(65<Y<75 \lvert X=x)$ |
|---|---|---|---|---|
| 30 | 58 | 1.5 | 0.0000015323 | 0.009815 |
| 40 | 62 | 1.5 | 0.02275 | 0.15865 |
| 50 | 66 | 1.5 | 0.74751 | 0.62921 |
| 60 | 70 | 1.5 | 0.99914 | 0.90442 |
| 70 | 74 | 1.5 | 0.74751 | 0.62921 |
| 80 | 78 | 1.5 | 0.02275 | 0.15865 |
| 90 | 82 | 1.5 | 0.0000015323 | 0.009815 |
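The same computation with the smaller standard deviation $3/\sqrt{4}=1.5$ reproduces the sample-mean column of the table above (same hand-rolled `norm_cdf` helper as before):

```python
import math

def norm_cdf(x, mu, sd):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

# P(65 < Ybar < 75 | X = x): same mean 46 + 0.4x, but sd 3 / sqrt(4) = 1.5
sd_bar = 3 / math.sqrt(4)
for x in (30, 40, 50, 60, 70, 80, 90):
    m = 46 + 0.4 * x
    p = norm_cdf(75, m, sd_bar) - norm_cdf(65, m, sd_bar)
    print(x, m, round(p, 7))
```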

To gain better insight, Figure 5 below shows the narrower strip around the least squares line (Figure 4 is repeated for comparison).

**Figure 4 (repeated for comparison)**

**Figure 5**

The two yellow lines in Figure 5 are 3 standard deviations from the least squares line. This time the standard deviation is 1.5, which is the standard deviation of $\bar{Y}$. The strip formed by the yellow lines is narrower than the strip in Figure 4. The vertical yellow bars indicate the 99.7% range for the selected normal distributions. As before, the two blue horizontal lines are one standard deviation away from the mean of $Y$ (at 65 and 75). The size of the probability $P(65<\bar{Y}<75 \lvert X=x)$ depends on the intersection of the vertical yellow bar and the horizontal strip formed by the two blue lines. At $x=60$, the vertical yellow bar is entirely inside the horizontal strip, leading to a probability of 0.99914. At $x=30$ and at $x=90$, the vertical yellow bar is entirely away from the horizontal strip. Thus both $P(65<\bar{Y}<75 \lvert X=30)$ and $P(65<\bar{Y}<75 \lvert X=90)$ are negligible.

At $x=40$ and at $x=80$, the vertical yellow bar intersects the horizontal strip in only a small segment. Thus the probability $P(65<\bar{Y}<75 \lvert X=x)$ is small for both of these two $x$ values. On the other hand, the vertical red bar in Figure 4 intersects the horizontal strip in a larger segment, leading to a larger $P(65<Y<75 \lvert X=x)$.

**The Next Post**

The next post is a further discussion of the bivariate normal distribution. Practice problems on the bivariate normal distribution are available here.



2018 – Dan Ma
