## An Introduction to Order Statistics

25 01 2018

This is an introduction to order statistics, focusing on basic notions and calculations.

Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a continuous distribution with cumulative distribution function $F(x)$ and density function $f(x)$. We can rank the sample items from smallest to the largest. For convenience, the smallest sample item is denoted by $X_{(1)}$, the second smallest sample item is denoted by $X_{(2)}$ and so on. The largest sample item is then $X_{(n)}$. These ordered values are called the order statistics corresponding to the sample $X_1,X_2,\cdots,X_n$. Because the items in the sample are random, the order statistics are random variables too. The goal of this post is to discuss the probability distributions of the order statistics both individually and jointly.

We only focus on samples drawn from a continuous distribution in order to avoid the situation where two sample items are equal (i.e. a tie). Thus we assume that the sample items are all distinct and that the order statistics are strictly increasing, i.e. $X_{(1)}<X_{(2)}<\cdots<X_{(n)}$.

As mentioned, $X_{(1)}$ is the minimum order statistic and $X_{(n)}$ is the maximum order statistic. In general, $X_{(j)}$ is called the $j$th order statistic where $j=1,2,\cdots,n$.

The Joint Density Function of Order Statistics

Given the population density $f(x)$, we can derive the joint density function of the order statistics $X_{(1)},X_{(2)},\cdots,X_{(n)}$.

Fact 1
Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a distribution with density function $f(x)$. Then the following is the joint density function of the order statistics $X_{(1)},X_{(2)},\cdots,X_{(n)}$.

$\displaystyle f_{X_{(1)},X_{(2)},\cdots,X_{(n)}}(x_1,x_2,\cdots,x_n)= n! \ f(x_1) \ f(x_2) \cdots f(x_n)$

The support of the joint density is the $n$-dimensional region $x_1<x_2<\cdots<x_n$.

For any point $(x_1,x_2,\cdots,x_n)$ in the support, any permutation of the numbers $x_1,x_2,\cdots,x_n$ would lead to the same ordered values. There are $n!$ such permutations. Furthermore, the density at any one such permutation is $f(x_1) \ f(x_2) \cdots f(x_n)$. Thus Fact 1 follows.
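The role of the $n!$ factor can be illustrated numerically. The following Python sketch (not from the original post) takes three Uniform(0,1) draws, so the joint density of the order statistics is $3!=6$ on its support; the probability that the ordered values land in three disjoint intervals of length $0.1$ each should be about $6 \times (0.1)^3 = 0.006$.

```python
import random

# Illustrate the n! in Fact 1: for three Uniform(0,1) draws, the joint
# density of the order statistics is 3! = 6 on the region x1 < x2 < x3,
# so the probability of the ordered values landing in the disjoint
# intervals (0.1, 0.2), (0.4, 0.5), (0.7, 0.8) is 6 * (0.1)^3 = 0.006.
random.seed(4)
trials = 1_000_000
hits = 0
for _ in range(trials):
    a, b, c = sorted(random.random() for _ in range(3))
    if 0.1 < a < 0.2 and 0.4 < b < 0.5 and 0.7 < c < 0.8:
        hits += 1
print(round(hits / trials, 4))  # close to 0.006
```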

With the joint density established, a great deal of information about order statistics can be derived from it. For example, we can integrate $f_{X_{(1)},X_{(2)},\cdots,X_{(n)}}(x_1,x_2,\cdots,x_n)$ to sum out all the variables except one, thus producing the marginal density function of an order statistic $X_{(j)}$. We can sum out all the variables except two, thus producing the joint density of two order statistics $X_{(i)}$ and $X_{(j)}$. We can also determine the probability distribution of the range of the sample $R=X_{(n)}-X_{(1)}$. The rest of the post presents these and other basic calculations.

Order statistics can of course be viewed as a topic in probability. Order statistics are also important in statistics since they can be applied in statistical inference. For example, they can be used to determine simple statistics such as the sample median (and other sample percentiles) and the sample range. Order statistics are often employed in non-parametric inference procedures.

The Distribution of an Order Statistic

We now discuss the distribution of a single order statistic. As mentioned, the density for the $j$th order statistic $X_{(j)}$ can be obtained by integrating $f_{X_{(1)},X_{(2)},\cdots,X_{(n)}}(x_1,x_2,\cdots,x_n)$ to sum out $x_k$ for all $k \ne j$. However, there is a more direct and natural way of deriving the CDF and the density function of $X_{(j)}$.

Fact 2
Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a distribution with CDF $F(x)$ and density function $f(x)$. Then the following is the cumulative distribution function (CDF) of the order statistic $X_{(j)}$ where $j=1,2,\cdots,n$.

$\displaystyle F_{X_{(j)}}(x)=P(X_{(j)} \le x)=\sum \limits_{k=j}^n \frac{n!}{k! (n-k)!} [F(x)]^k \ [1-F(x)]^{n-k}$

The support of the CDF is identical to the support of the population CDF $F(x)$.

Fact 2 is based on a binomial argument. The event $X_{(j)} \le x$ occurs when at least $j$ of the sample items are $\le x$. When observing each sample item $X_i$, focus on two distinct outcomes: $X_i \le x$ or $X_i>x$. Consider the former a success; the probability of a success is $F(x)$. Thus observing the random sample $X_1,X_2,\cdots,X_n$ is like performing a series of $n$ independent Bernoulli trials. Then $P(X_{(j)} \le x)$ is the probability of having $j$ or more successes in this binomial experiment.
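The binomial argument can be checked by simulation. The following Python sketch (not from the original post) uses a Uniform(0,1) sample, where $F(x)=x$, and compares a simulated value of $P(X_{(j)} \le x)$ with the binomial tail sum in Fact 2.

```python
import random
from math import comb

# Simulation check of Fact 2: for a Uniform(0,1) sample, F(x) = x, so
# P(X_(j) <= x) should match sum_{k=j}^{n} C(n,k) x^k (1-x)^(n-k).
random.seed(0)
n, j, x = 5, 3, 0.4
trials = 200_000

hits = 0
for _ in range(trials):
    sample = sorted(random.random() for _ in range(n))
    if sample[j - 1] <= x:  # sample[j-1] is the j-th order statistic
        hits += 1

simulated = hits / trials
formula = sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(j, n + 1))
print(round(simulated, 3), round(formula, 3))  # both near 0.317
```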

The density function of $X_{(j)}$ can be derived by taking the derivative of the CDF.

Fact 3
Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a distribution with CDF $F(x)$ and density function $f(x)$. Then the following is the density function of the order statistic $X_{(j)}$ where $j=1,2,\cdots,n$.

$\displaystyle f_{X_{(j)}}(x)=\frac{n!}{(j-1)! \ 1! \ (n-j)!} \ [F(x)]^{j-1} \ f(x) \ [1-F(x)]^{n-j}$

The support of the density function is identical to the support of the population density $f(x)$.

Mathematically, the density function $f_{X_{(j)}}(x)$ can be derived from the CDF in Fact 2. However, there is a clear and natural way to view the density function in Fact 3. It can be viewed as a multinomial probability. Here’s the thought process for this idea. Think of the density function $f_{X_{(j)}}(x)$ as the probability that the $j$th order statistic $X_{(j)}$ is right around $x$. So there must be $j-1$ sample items less than $x$, exactly one sample item at $x$ and $n-j$ sample items above $x$. One way this can happen is:

$[F(x)]^{j-1} \ f(x) \ [1-F(x)]^{n-j}$

The first term in the above expression is the probability that $j-1$ sample items are less than $x$. The second term is the probability that one sample item is right around $x$. The third term is the probability that $n-j$ sample items are above $x$. But this is only one way. To capture all possibilities, we multiply it by the multinomial coefficient. The result is the density function indicated in Fact 3.
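Fact 3 can also be sanity-checked numerically. In the Uniform(0,1) case, $F(x)=x$ and $f(x)=1$, so the density in Fact 3 reduces to a Beta density, and a Riemann sum over $(0,1)$ should be very close to 1. The following Python sketch (not from the original post) performs this check.

```python
from math import factorial

# Sanity check on Fact 3: for Uniform(0,1), F(x) = x and f(x) = 1, so the
# density of the j-th order statistic is
#   n!/((j-1)! (n-j)!) * x^(j-1) * (1-x)^(n-j),
# a Beta(j, n-j+1) density.  A midpoint Riemann sum should integrate to ~1.
n, j = 7, 4
coeff = factorial(n) / (factorial(j - 1) * factorial(n - j))

def density(x):
    return coeff * x**(j - 1) * (1 - x)**(n - j)

steps = 10_000
total = sum(density((i + 0.5) / steps) for i in range(steps)) / steps
print(round(total, 6))  # close to 1
```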

When the order statistic $X_{(j)}$ is used as an estimator for a parameter $\theta$ of the population distribution, the CDF in Fact 2 and the density function in Fact 3 give us information on the sampling distribution of the estimator, potentially helping us determine the goodness of the estimator.

The Joint Distribution of Two Order Statistics

When we are only interested in the joint behavior of two order statistics, we can derive the joint density $f_{X_{(i)},X_{(j)}}(x,y)$. Mathematically, the joint density can be derived by integrating the joint density in Fact 1 to sum out all variables except for $x_i$ and $x_j$. However, the joint density function can be derived (and remembered) using a heuristic argument similar to the one in the preceding section for $f_{X_{(j)}}(x)$.

Fact 4
Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a distribution with CDF $F(x)$ and density function $f(x)$. Then the following is the joint density function of the order statistics $X_{(i)}$ and $X_{(j)}$ where $i<j$ and $i,j=1,2,\cdots,n$.

$\displaystyle \begin{aligned} f_{X_{(i)},X_{(j)}}(x,y)&=C \times [F(x)]^{i-1} \times f(x) \times [F(y)-F(x)]^{j-i-1} \\&\times f(y) \times [1-F(y)]^{n-j} \end{aligned}$

where $C$ is the multinomial coefficient determined by

$\displaystyle C=\frac{n!}{(i-1)! \ 1! \ (j-i-1)! \ 1! \ (n-j)!}$

The support of the density function is the region $x<y$ in the two-dimensional $xy$-plane.

As mentioned, the joint density function can be derived using a heuristic argument, or memorization scheme, similar to the one in the preceding section. In this scheme, the joint density can be viewed as a multinomial probability with 5 different categories – $X<x$, $X \approx x$, $x<X<y$, $X \approx y$ and $X>y$. When the $n$ sample items are observed, we are interested in the scenarios in which the $i$th ordered item is in the category $X \approx x$ and the $j$th ordered item is in the category $X \approx y$. Count the number of items that fall into each category and multiply the respective probabilities. Of course, do not forget to multiply by the multinomial coefficient.
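Fact 4 can be checked numerically as well. The Python sketch below (not from the original post) works in the Uniform(0,1) case, where $F(t)=t$ and $f(t)=1$: it compares a simulated value of $P(X_{(i)} \le a, \ X_{(j)} \le b)$ with a double Riemann sum of the Fact 4 density over the corresponding region.

```python
import random
from math import factorial

# Numerical check on Fact 4 for a Uniform(0,1) sample: compare a simulated
# probability P(X_(i) <= a, X_(j) <= b) with a double Riemann sum of the
# joint density over the region 0 < x < y, x <= a, y <= b.
random.seed(1)
n, i, j = 5, 2, 4
a, b = 0.3, 0.7
C = factorial(n) / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))

def joint_density(x, y):  # valid on 0 < x < y < 1, with F(t) = t, f(t) = 1
    return C * x**(i - 1) * (y - x)**(j - i - 1) * (1 - y)**(n - j)

steps = 1000
h = 1.0 / steps
integral = 0.0
for p in range(steps):
    x = (p + 0.5) * h
    if x > a:
        break
    for q in range(steps):
        y = (q + 0.5) * h
        if y <= x or y > b:
            continue
        integral += joint_density(x, y) * h * h

trials = 100_000
hits = 0
for _ in range(trials):
    s = sorted(random.random() for _ in range(n))
    if s[i - 1] <= a and s[j - 1] <= b:
        hits += 1
print(round(integral, 3), round(hits / trials, 3))  # the two should agree
```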

The Range of a Sample

Another distribution that can be derived from order statistics is that of the range, which is defined to be $R=X_{(n)}-X_{(1)}$, i.e. the maximum order statistic minus the minimum order statistic. Mathematically, the CDF $F_R(r)=P(R \le r)$ can be derived by integrating the joint density $f_{X_{(1)},X_{(n)}}(x,y)$ over the region $y-x \le r$. From this idea, we can derive a useful form of $F_R(r)=P(R \le r)$. First, the following is the joint density function $f_{X_{(1)},X_{(n)}}(x,y)$.

$\displaystyle f_{X_{(1)},X_{(n)}}(x,y)=\frac{n!}{(n-2)!} \ f(x) \ [F(y)-F(x)]^{n-2} \ f(y) \ \ \ \ \ x<y$

Consider the following derivation.

$\displaystyle \begin{aligned} P(R \le r)&=\iint_{y-x \le r} f_{X_{(1)},X_{(n)}}(x,y) \ dy \ dx\\&=\int_{-\infty}^\infty \int_{x}^{x+r} \frac{n!}{(n-2)!} \ [F(y)-F(x)]^{n-2} \ f(x) \ f(y) \ dy \ dx \end{aligned}$

The inner integral can be evaluated by a change of variable with $u=F(y)-F(x)$ and $du=f(y) \ dy$.

$\displaystyle \begin{aligned} \int_{x}^{x+r} [F(y)-F(x)]^{n-2} \ f(y) \ dy&=\int_0^{F(x+r)-F(x)} u^{n-2} \ du \\&=\frac{1}{n-1} [F(x+r)-F(x)]^{n-1} \end{aligned}$

With the above integral, we have the following fact.

Fact 5
Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a distribution with CDF $F(x)$ and density function $f(x)$. Then the following integral gives the CDF of the range $R=X_{(n)}-X_{(1)}$.

$\displaystyle F_R(r)=P(R \le r)=\int_{-\infty}^\infty n \ [F(x+r)-F(x)]^{n-1} \ f(x) \ dx$

where $r>0$ ranges over the possible values of the range $R$ (for example, $0<r<b-a$ when the support of $X$ is a bounded interval $(a,b)$).

After $F_R(r)$ is evaluated, the density function $f_R(r)$ can be obtained by taking the derivative of $F_R(r)$.

One comment about the integral in Fact 5. If the support of the distribution of $X$ is an interval of finite length, the integral may have to be split into two integrals, because the CDF $F(x+r)$ becomes 1 at some $x$ values. If that is the case, one integral covers the $x$ values where $F(x+r)<1$ and a second integral has 1 in place of $F(x+r)$. See the example below. If the support of $X$ is unbounded, e.g. that of the exponential distribution, the integral does not have to be split up.
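The unbounded case can be checked numerically. The Python sketch below (not from the original post) uses an Exponential(1) sample, where $F(x)=1-e^{-x}$ for $x>0$; here $F(x+r)-F(x)=e^{-x}(1-e^{-r})$, and the Fact 5 integral evaluates in closed form to $(1-e^{-r})^{n-1}$, which the code compares against both a numerical integration and a simulation.

```python
import random
from math import exp

# Check Fact 5 for an Exponential(1) sample (unbounded support, so no
# splitting needed): F(x) = 1 - e^(-x) and f(x) = e^(-x) for x > 0.
# Here F(x+r) - F(x) = e^(-x) (1 - e^(-r)), and the integral in Fact 5
# works out to (1 - e^(-r))^(n-1) in closed form.
random.seed(2)
n, r = 4, 1.0

def F(x):
    return 1 - exp(-x) if x > 0 else 0.0

# Midpoint-rule integration of n [F(x+r) - F(x)]^(n-1) f(x) over (0, inf),
# truncating the negligible tail at x = 40.
steps, upper = 100_000, 40.0
h = upper / steps
cdf_value = 0.0
for k in range(steps):
    x = (k + 0.5) * h
    cdf_value += n * (F(x + r) - F(x))**(n - 1) * exp(-x) * h

trials = 100_000
hits = 0
for _ in range(trials):
    s = [random.expovariate(1.0) for _ in range(n)]
    if max(s) - min(s) <= r:
        hits += 1
print(round(cdf_value, 3), round(hits / trials, 3))  # both near (1-e^-1)^3
```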

Examples

It is helpful to go through examples demonstrating the calculations discussed here. More examples are shown in the next post. In the remainder of the post, we demonstrate how to set up the density functions.

Example 1
Suppose that the sample $X_1,X_2,X_3,X_4,X_5,X_6,X_7$ is drawn from a uniform distribution on the interval $(0,2)$. Write the density functions for the following distributions.

• The joint distribution of $X_{(1)},\cdots,X_{(7)}$.
• The median $X_{(4)}$.
• The joint distribution of $X_{(1)}$ and $X_{(7)}$.
• The joint distribution of $X_{(3)}$ and $X_{(5)}$.
• The range $R=X_{(7)}-X_{(1)}$.

All the density functions are derived from $F(x)=\frac{x}{2}$ and $f(x)=\frac{1}{2}$, $0<x<2$, the CDF and density function of the uniform distribution, respectively. The following gives the first two density functions.

$\displaystyle f_{X_{(1)},\cdots,X_{(7)}}(x_1,\cdots,x_7)=7! \ \biggl(\frac{1}{2} \biggr)^7=\frac{315}{8} \ \ \ \ \ \ 0<x_1<x_2<\cdots<x_7<2$

$\displaystyle \begin{aligned} f_{X_{(4)}}(x)&=\frac{7!}{3! \ 1! \ 3!} \biggl[\frac{x}{2} \biggr]^3 \ \frac{1}{2} \ \biggl[1-\frac{x}{2} \biggr]^3 \\&=\frac{140}{2^7} \ x^3 \ (2-x)^3 \\&=\frac{140}{128} \ (8x^3-12x^4+6x^5-x^6) \ \ \ \ \ \ 0<x<2 \end{aligned}$

The first one, the joint density of the 7 order statistics, is obtained based on Fact 1. The second one, the density function of the 4th order statistic, which is also the sample median, is obtained based on Fact 3.

The following gives the next two density functions.

$\displaystyle \begin{aligned} f_{X_{(1)},X_{(7)}}(x,y)&=\frac{7!}{5!} \ \frac{1}{2} \ \biggl[ \frac{y}{2}-\frac{x}{2} \biggr]^5 \ \frac{1}{2} \\&=\frac{42}{2^7} \ (y-x)^5 \ \ \ \ \ \ 0<x<y<2 \end{aligned}$

$\displaystyle \begin{aligned} f_{X_{(3)},X_{(5)}}(x,y)&=\frac{7!}{2! \ 1! \ 1! \ 2!} \ \biggl[\frac{x}{2} \biggr]^2 \ \frac{1}{2} \ \biggl[ \frac{y}{2}-\frac{x}{2} \biggr] \ \frac{1}{2} \ \biggl[1-\frac{y}{2} \biggr]^2 \\&=\frac{1260}{2^7} \ x^2 \ (y-x) \ (2-y)^2 \ \ \ \ \ \ 0<x<y<2 \end{aligned}$

The function $f_{X_{(1)},X_{(7)}}(x,y)$ is the joint density function of the minimum statistic and the maximum statistic and is obtained based on Fact 4. The function $f_{X_{(3)},X_{(5)}}(x,y)$ is the joint density function of the 3rd order statistic and the 5th order statistic.

For the sample range $R=X_{(7)}-X_{(1)}$, we first determine its CDF by evaluating the integral indicated in Fact 5.

$\displaystyle F_R(r)=\int_0^2 7 \ [ F(x+r)-F(x) ]^6 \ f(x) \ dx$

Because the CDF $F(x+r)$ reaches 1 beyond some point, we need to split this integral into two. The cutoff point is $x=2-r$.

$\displaystyle \begin{aligned} F_R(r)&=\int_0^{2-r} 7 \ \biggl[\frac{x+r}{2}-\frac{x}{2} \biggr]^6 \ \frac{1}{2} \ dx+\int_{2-r}^2 7 \ \biggl[1-\frac{x}{2} \biggr]^6 \ \frac{1}{2} \ dx \\&=\frac{7}{2^7} \ r^6 \ (2-r)+\frac{1}{2^7} \ r^7 \\&=\frac{1}{2^7} \ (14 r^6 - 6 r^7) \ \ \ \ \ \ \ 0<r<2 \end{aligned}$

$\displaystyle f_R(r)=\frac{1}{2^7} \ (84 r^5-42 r^6) \ \ \ \ \ \ \ \ 0<r<2$

The density function of the sample range is the derivative of its CDF. With the CDF and density function known, other distributional quantities can be derived.
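The closed forms in Example 1 can be verified by simulation. The Python sketch below (not from the original post) draws samples of size 7 from Uniform(0,2) and checks the range CDF $F_R(r)=(14r^6-6r^7)/2^7$; it also checks that $P(X_{(4)} \le 1)=\frac{1}{2}$, which follows from Fact 2 since $F(1)=\frac{1}{2}$ and the binomial tail sum is $\frac{1}{2}$ by symmetry.

```python
import random

# Verify Example 1 by simulation: samples of size 7 from Uniform(0,2).
# The derived range CDF is F_R(r) = (14 r^6 - 6 r^7)/2^7, and since
# F(1) = 1/2, P(X_(4) <= 1) = P(Bin(7, 1/2) >= 4) = 1/2 by symmetry.
random.seed(3)
trials = 200_000
r = 1.5

range_hits = 0
median_hits = 0
for _ in range(trials):
    s = sorted(random.uniform(0, 2) for _ in range(7))
    if s[6] - s[0] <= r:
        range_hits += 1
    if s[3] <= 1:
        median_hits += 1

formula = (14 * r**6 - 6 * r**7) / 2**7
print(round(range_hits / trials, 4), round(formula, 4))  # both near 0.445
print(round(median_hits / trials, 4))                    # near 0.5
```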

Additional examples are shown in the next post.

Practice problems on order statistics are found in this companion blog.


$\copyright$ 2018 – Dan Ma

### 4 responses

28 01 2018

[…] The preceding post is an introduction to order statistics. This post gives examples demonstrating the calculation discussed in the preceding post. […]


28 01 2018

[…] The first blog post from the companion blog is an introduction to order statistics. That post presents the probability distributions of the order statistics, both individually and jointly. The second post presents basic examples illustrating how to calculate the order statistics. […]


30 01 2018

[…] two posts preceding this one focus on the topics of order statistics. One is an introduction. The other post gives examples demonstrating how to perform the calculation. A post in a companion […]
