The preceding post is an introduction to order statistics. This post gives examples demonstrating the calculations discussed in that post.
We first work a basic example with a small sample size that walks through all of those calculations. More examples follow the basic example.
Example 1 – A Basic Example
Suppose that the population distribution is the uniform distribution on the interval . The CDF and the density function of this population are:
-
……
……
, respectively. Draw a random sample from this uniform distribution. The resulting order statistics are , with being the sample minimum, being the sample median and being the sample maximum. The joint density function of these 3 order statistics is
-
……
The support of this joint density is the 3-dimensional region .
The goal here is to derive the density functions of these order statistics and then calculate basic distributional quantities of these order statistics. This is a long example that is presented in five parts – Example 1a through Example 1e.
Example 1a
This example focuses on the minimum statistic.
- Compute the mean and variance of the first order statistic .
- Compute the conditional probability .
The key step is to derive the density function and the CDF for . Then the mean and variance are obtained by evaluating the appropriate integrals. The conditional probability is obtained by evaluating the CDF. The following gives the density function.
-
……
The following integrals give the mean and variance of .
……
……
……
The following gives the CDF of .
-
……
The following gives the conditional probability.
……
Example 1b
This example focuses on the sample median.
- Compute the mean and variance of the second order statistic .
- Compute the conditional probability .
As in Example 1a, the key step is to find the density function and the CDF.
-
……
……
As in Example 1a, evaluating the appropriate integrals and evaluating the CDF appropriately give the desired answers.
-
……
……
……
……
Example 1c
This example focuses on the sample maximum.
- Compute the mean and variance of the third order statistic .
- Compute the conditional probability .
The following gives the density function and the CDF of the sample maximum.
-
……
……
The following gives the desired results after evaluating the integrals and the CDF.
-
……
……
……
……
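The means and variances in Examples 1a through 1c can be cross-checked with a short script. This is only a sketch under stated assumptions, not the integration carried out above. It assumes the population is uniform on (0, 4) and the sample size is 3, and it relies on the standard fact that the k-th order statistic of n draws from uniform(0, theta), divided by theta, follows a Beta(k, n - k + 1) distribution. The function name and its parameters are illustrative.

```python
from fractions import Fraction


def uniform_order_stat_moments(k, n, theta):
    """Mean and variance of the k-th order statistic of n draws from uniform(0, theta).

    Uses the fact that (k-th order statistic) / theta ~ Beta(k, n - k + 1).
    """
    a, b = k, n - k + 1
    mean = Fraction(a, a + b) * theta
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1)) * theta ** 2
    return mean, var


# Assumed setup: uniform on (0, 4), sample size 3.
for k in (1, 2, 3):
    mean, var = uniform_order_stat_moments(k, n=3, theta=4)
    print(f"order statistic {k}: mean = {mean}, variance = {var}")
```

Under these assumptions the minimum and the maximum have the same variance, which reflects the symmetry of the uniform distribution about the midpoint of the interval.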
Example 1d
This example focuses on the joint behavior between the sample minimum and the sample maximum.
- Evaluate the covariance of and .
- Evaluate the correlation coefficient of and .
- Compute the conditional probability .
- Evaluate the conditional mean .
All the calculations in this example are based on the joint density function of the sample minimum and sample maximum.
……
The covariance is defined by . The following shows the necessary calculations.
-
……
……
Using the mean and variances from Example 1a and Example 1c, the following shows the calculation for the correlation coefficient.
-
……
There is a positive correlation between the sample minimum and the sample maximum. This confirms the natural intuition about the two statistics. For example, when the sample minimum is large, the sample maximum must be large as well. Likewise, when the sample maximum is small, the sample minimum is also small. However, the correlation is only moderate.
To evaluate the conditional probability , integrate the joint density over the appropriate region. The following shows how.
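A simulation gives a rough check on the covariance and the correlation coefficient. The sketch below assumes the population is uniform on (0, 4) with sample size 3. The helper names, the seed and the number of simulated samples are arbitrary choices made for illustration. Under these assumptions the exact values work out to 1/5 for the covariance and 1/3 for the correlation, so the simulated numbers should land close to those.

```python
import math
import random


def simulate_min_max(num_samples, n=3, low=0.0, high=4.0, seed=12345):
    """Draw num_samples samples of size n from uniform(low, high).

    Returns two lists: the sample minima and the sample maxima.
    """
    rng = random.Random(seed)
    mins, maxs = [], []
    for _ in range(num_samples):
        sample = [rng.uniform(low, high) for _ in range(n)]
        mins.append(min(sample))
        maxs.append(max(sample))
    return mins, maxs


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


mins, maxs = simulate_min_max(200_000)
cov = sample_cov(mins, maxs)
# sample_cov(xs, xs) is the sample variance of xs.
corr = cov / math.sqrt(sample_cov(mins, mins) * sample_cov(maxs, maxs))
print(f"covariance  ~ {cov:.3f}")
print(f"correlation ~ {corr:.3f}")
```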
-
……
……
……
Based on the CDF in Example 1a, . The following is the conditional density .
……
The conditional mean is evaluated by using the conditional density.
-
……
From Example 1c, we see that the unconditional mean of is . With the additional information that , the expected value of is now higher at 3.5.
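The conditional mean can also be reached by a shortcut that differs from the conditional density used above. If the sample minimum exceeds a value a, then all three observations exceed a, so they behave like a random sample from the uniform distribution on (a, 4). The sketch below assumes the population is uniform on (0, 4), the sample size is 3, and the conditioning threshold is 2. The threshold is an assumption, chosen because it reproduces the conditional mean of 3.5. The function name is illustrative.

```python
from fractions import Fraction


def mean_max_given_min_above(a, n=3, theta=4):
    """E[maximum | minimum > a] for n draws from uniform(0, theta), 0 <= a < theta.

    Given minimum > a, the n draws are i.i.d. uniform on (a, theta), and the
    maximum of n such draws has mean a + (theta - a) * n / (n + 1).
    """
    a = Fraction(a)
    return a + (theta - a) * Fraction(n, n + 1)


print(mean_max_given_min_above(0))  # no information about the minimum
print(mean_max_given_min_above(2))  # assumed threshold of 2
```

With a = 0 the function returns the unconditional mean of the maximum, so the two printed values can be compared directly.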
Example 1e
This example focuses on the sample range .
- Evaluate the CDF and the density function of the sample range .
- Determine the mean and variance of the sample range .
- Evaluate the conditional mean .
The CDF of the sample range can be derived by evaluating the following integral (see Fact 5 in the preceding post).
-
……
Because the support of the uniform distribution is , the CDF is 1 when or . Thus the integral should be split into 2 integrals. The limits of the first integral are from 0 to . The second integral goes from to 4.
-
……
The density function is obtained by differentiating the CDF. The following shows the remaining calculations.
-
……
……
……
……
……
Observe that the unconditional mean of the sample range is 2 while the mean is 2.75 given that the sample range is larger than 2.
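The mean of 2 and the conditional mean of 2.75 can be reproduced with exact arithmetic. The sketch below assumes the population is uniform on (0, 4) with sample size 3. In that case the density of the sample range r is (3/32) r (4 - r) on 0 < r < 4, a special case of the general formula n(n-1) r^(n-2) (theta - r) / theta^n for uniform(0, theta). The helper names are illustrative.

```python
from fractions import Fraction


def integrate_poly(coeffs, lower, upper):
    """Exactly integrate sum(coeffs[i] * r**i) over [lower, upper]."""
    lower, upper = Fraction(lower), Fraction(upper)
    total = Fraction(0)
    for power, c in enumerate(coeffs):
        total += Fraction(c) * (upper ** (power + 1) - lower ** (power + 1)) / (power + 1)
    return total


def times_r(coeffs, k=1):
    """Coefficients of r**k times the given polynomial."""
    return [Fraction(0)] * k + list(coeffs)


# Assumed density of the range: (3/32) r (4 - r) = (3/8) r - (3/32) r^2.
density = [Fraction(0), Fraction(3, 8), Fraction(-3, 32)]

total_prob = integrate_poly(density, 0, 4)
mean = integrate_poly(times_r(density), 0, 4)
variance = integrate_poly(times_r(density, 2), 0, 4) - mean ** 2
prob_above_2 = integrate_poly(density, 2, 4)
cond_mean = integrate_poly(times_r(density), 2, 4) / prob_above_2

print("total probability:", total_prob)
print("mean:", mean, " variance:", variance)
print("P(range > 2):", prob_above_2, " E[range | range > 2]:", cond_mean)
```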
More Examples
Example 1 is a long example demonstrating all the basic calculations with sampling from a uniform distribution. The next several examples involve sampling from an exponential distribution.
Example 2
Let be a random sample drawn from a population that has an exponential distribution with mean where . Consider the order statistics . Determine the mean and variance for where .
The density function and the CDF of the exponential population are:
-
……
……
The following gives the density functions of the three order statistics.
-
……
……
……
Note that the density is that of the exponential distribution with mean . Since the variance of an exponential distribution is the square of its mean, we have the following information.
-
……
……
Note that both and consist of sums of exponential densities times constants. For example, is three times an exponential density minus two times another exponential density. Thus its mean and second moment can be obtained by multiplying the same constants with exponential mean and second moment.
-
……
……
……
Using the same thought process, the following gives the mean and variance of .
-
……
……
……
We make additional comments about these order statistics in the section below called “A Comparison of Estimators”.
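The idea of treating each density as a sum of constants times exponential densities can be sketched in code. The decompositions below are assumptions obtained by expanding the order statistic densities for a sample of size 3. All means are in units of the population mean, which the code calls theta (a name chosen here for illustration). The second entry matches the description above: three times one exponential density minus two times another.

```python
from fractions import Fraction


def mixture_moments(terms):
    """Mean and variance of a density of the form sum of weight * Exp(mean m).

    terms is a list of (weight, m) pairs, with m in units of theta.
    An exponential with mean m has first moment m and second moment 2 * m**2.
    """
    mean = sum(Fraction(w) * Fraction(m) for w, m in terms)
    second = sum(Fraction(w) * 2 * Fraction(m) ** 2 for w, m in terms)
    return mean, second - mean ** 2


# Assumed decompositions for sample size 3 (weights, exponential means).
order_stats = {
    "minimum": [(1, Fraction(1, 3))],
    "median": [(3, Fraction(1, 2)), (-2, Fraction(1, 3))],
    "maximum": [(3, 1), (-3, Fraction(1, 2)), (1, Fraction(1, 3))],
}

results = {name: mixture_moments(terms) for name, terms in order_stats.items()}
for name, (mean, var) in results.items():
    print(f"{name}: mean = {mean} * theta, variance = {var} * theta^2")
```

The weights in each decomposition sum to 1, as they must for the result to be a density.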
Example 3
This is a continuation of Example 2. Consider the sample range . Evaluate the mean and variance of the sample range.
The CDF of the sample range can be derived by evaluating the following integral (see Fact 5 in the preceding post).
-
……
Because the support of the exponential distribution is unbounded above, there is no need to split the integral.
-
……
Differentiating the CDF produces the density function. The following also shows the mean and variance of the sample range.
……
……
……
……
Example 4
This is a continuation of Example 2 and Example 3. Use the sample range to evaluate the covariance and the correlation coefficient of and .
The covariance of and is defined by
-
……
Computing requires the joint distribution of and . Because , the following gives an alternative way.
-
……
All the variance terms are easily accessible (and have been calculated). Thus we can solve for the covariance term.
-
……
The following gives the correlation coefficient.
-
……
The correlation between the sample minimum and sample maximum is quite moderate.
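The step of solving for the covariance from the three variances can be sketched as follows. The variances are assumptions: they are the values one obtains for an exponential sample of size 3, stated in units of the squared population mean. The variable names are illustrative.

```python
import math
from fractions import Fraction

# Assumed variances for sample size 3, in units of (population mean)^2.
var_min = Fraction(1, 9)     # sample minimum
var_max = Fraction(49, 36)   # sample maximum
var_range = Fraction(5, 4)   # sample range = maximum - minimum

# Var(range) = Var(max) + Var(min) - 2 * Cov(min, max); solve for the covariance.
cov = (var_max + var_min - var_range) / 2
corr = float(cov) / math.sqrt(var_min * var_max)

print("covariance:", cov)
print(f"correlation: {corr:.4f}")
```

Under these assumptions the covariance equals the variance of the sample minimum, and the correlation is 2/7, or about 0.29.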
Example 5
Examples 2, 3 and 4 are based on a small sample drawn from the exponential population. We now look at sampling from an exponential distribution in general, i.e. the sample size is arbitrary. Let be a sample drawn from an exponential population with mean . Let’s examine the sample range .
As shown in Example 3, the CDF of the sample range is obtained by evaluating the following integral.
-
……
This integral is quite similar to the one in Example 3 and is evaluated in the same manner.
-
……
Thus the CDF of the sample range when sampling from an exponential population is simply the CDF of the exponential population raised to the sample size less one. Interestingly, the CDF of the sample maximum when sampling from an exponential distribution looks similar, except that the power is the sample size .
-
……
Of course, although the distributional forms look alike, the sample range and the sample maximum have different distributions.
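The claim that the CDF of the sample range is the exponential CDF raised to the sample size less one can be checked by simulation. The sketch below is a rough check only. The sample size 5, the mean 2, the seed and the evaluation points are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math
import random


def empirical_range_cdf(r, n, theta, num_samples=50_000, seed=2024):
    """Estimate P(range <= r) for n draws from an exponential with mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # random.expovariate takes the rate, which is 1 / mean.
        sample = [rng.expovariate(1.0 / theta) for _ in range(n)]
        if max(sample) - min(sample) <= r:
            hits += 1
    return hits / num_samples


def claimed_range_cdf(r, n, theta):
    """Exponential CDF at r raised to the power n - 1."""
    return (1.0 - math.exp(-r / theta)) ** (n - 1)


for r in (1.0, 3.0, 6.0):
    simulated = empirical_range_cdf(r, n=5, theta=2.0)
    claimed = claimed_range_cdf(r, n=5, theta=2.0)
    print(f"r = {r}: simulated {simulated:.3f}, formula {claimed:.3f}")
```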
A Comparison of Estimators
We close out the post by commenting on Example 2. Recall that Example 2 focuses on the means and variances of the three order statistics when sampling from an exponential distribution with the sample size being 3. We now compare these three order statistics with the sample mean, which in this case is the sum of the three sample items divided by 3.
-
……
Of course, the sample mean is a natural candidate for an estimator of the exponential population mean . The three order statistics , and can also be used as estimators of the parameter . For example, a case can be made that we use the sample median as an estimator of the population mean. How good are the order statistics in comparison with the sample mean? Note that the mean of is , the parameter that the estimator is trying to estimate. Here’s the mean and variance of .
-
……
……
The sample mean as an estimator of has a very desirable property, i.e. on average the estimator is correct. There is no systematic bias. The sample mean does not systematically overestimate or underestimate . The estimator is said to be an unbiased estimator of the parameter . How about the order statistics? Are these three order statistics unbiased estimators of ?
The following table lists out the means and variances of , , and .
Statistic | Mean | Variance |
---|---|---|
…… | …… | …… |
…… | …… | …… |
…… | …… | …… |
…… | …… | …… |
The sample minimum systematically underestimates the population mean. On average it is one third of the target. The sample median also underestimates the population mean, though by less. On average it is 5/6 of the target. The sample maximum overestimates the population mean. On average it is 11/6 ≈ 1.83 times the target, so it overestimates by about 83%.
Thus the three order statistics are not so desirable as estimators of the population mean of an exponential population, since they are biased estimators of the population mean . However, the bias can be removed by multiplying each statistic by an appropriate constant so that the expected value is .
Statistic | Mean | Variance |
---|---|---|
…… | …… | …… |
…… | …… | …… |
…… | …… | …… |
…… | …… | …… |
The above table shows that all 4 statistics (or estimators) have the same expected value. On average they are correct. Now that we have four unbiased estimators of the same parameter , how do we choose the best one?
When the competing estimators are all unbiased estimators of the same target parameter, what other property can distinguish among them? When the estimators are all unbiased, we prefer the one with the smallest variance. An estimator with a smaller variance is more tightly concentrated around its mean, so in repeated sampling a higher proportion of its values tend to fall close to the target parameter. Thus the estimator with the smallest variance is the one more likely to produce good estimates. In short, in addition to unbiasedness, we want the variance of the estimator to be as small as possible.
Let’s look at the variance of these four estimators. The last table shows that the sample mean is the one with the smallest variance (among these four). The one with the largest variance is . So among these 4 estimators of the exponential population mean , the sample mean is the one that is likely to be closer to the true mean even though they are all unbiased.
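The comparison of the four unbiased estimators can be sketched with exact fractions. This uses a different route from the integrals in Example 2: the standard representation of exponential order statistics as sums of independent scaled exponential spacings, which gives the mean of the k-th order statistic as the sum of 1/(n - i + 1) for i from 1 to k, and the variance as the sum of the squares of the same terms. Everything is in units of the population mean, called theta here for illustration. The function name is hypothetical.

```python
from fractions import Fraction


def exp_order_stat_moments(k, n):
    """Mean and variance of the k-th order statistic of n exponential draws.

    Returned in units of theta (mean) and theta^2 (variance), using the
    spacings representation of exponential order statistics.
    """
    mean = sum(Fraction(1, n - i + 1) for i in range(1, k + 1))
    var = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, k + 1))
    return mean, var


n = 3
rows = {}
for k in range(1, n + 1):
    mean, var = exp_order_stat_moments(k, n)
    scale = 1 / mean  # constant that makes the estimator unbiased
    rows[f"order statistic {k}"] = (scale, scale * mean, scale ** 2 * var)
# The sample mean of n draws is already unbiased, with variance theta^2 / n.
rows["sample mean"] = (Fraction(1), Fraction(1), Fraction(1, n))

for name, (scale, mean, var) in rows.items():
    print(f"{name}: multiply by {scale}, mean = {mean} * theta, "
          f"variance = {var} * theta^2 (about {float(var):.3f})")
```

Under these assumptions the rescaled minimum has the largest variance and the sample mean has the smallest, in line with the discussion above.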
Even though the order statistics are not good candidates for estimating the exponential population mean, there are situations where order statistics are good candidates for statistical estimation. The examples in this post are also convenient for demonstrating the notion of an unbiased estimator and the notion that smaller variance is better when comparing two estimators that are otherwise similar.
Practice problems on order statistics are found in this companion blog.
2018 – Dan Ma