[1] introduced a class of multivariate mixtures of Erlang distributions, or multivariate Erlang mixtures, and demonstrated its usefulness in insurance applications. A multivariate Erlang mixture is defined as a random vector $ \textbf{X}=(X_1,X_2,\cdots,X_n) $ with probability density function (pdf)
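Following the parameterization in [1], this density takes the form

$$ f(\textbf{x})=\sum_{m_1=1}^{\infty}\cdots\sum_{m_n=1}^{\infty}\alpha_{\textbf{m}}\prod_{i=1}^{n}\frac{\beta^{m_i}x_i^{m_i-1}e^{-\beta x_i}}{(m_i-1)!},\qquad x_i>0,\ i=1,\cdots,n, $$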
where $ \textbf{x}=(x_1,\cdots,x_n) $, $ \textbf{m}=(m_1,\cdots,m_n) $, the $ \alpha_{\textbf{m}} $ are the mixing weights with each $ \alpha_{\textbf{m}}\geq 0 $ and $ \sum\limits_{m_1=1}^\infty\cdots\sum\limits_{m_n=1}^\infty \alpha_{\textbf{m}}=1 $, and $ \beta $ is the common rate parameter ($ \theta=\frac{1}{\beta} $ is called the scale parameter). The mixing weights $ \alpha_{\textbf{m}} $ can be viewed as the joint probability function of a multivariate counting random vector $ \textbf{N}=(N_1,N_2,\cdots,N_n) $, that is,
The expectation-maximization (EM) algorithm is commonly used to estimate the parameters of a mixture model. A standard EM algorithm and several modified versions for parameter estimation of Erlang mixtures can be found in [1-3]. The class of Erlang mixtures is widely used in insurance, reliability theory and many other fields; see [4-8] and the references therein.
Let $ X_{1:n}\leq X_{2:n}\leq\cdots\leq X_{n:n} $ be the order statistics. Order statistics play an extremely important role in a wide range of statistical applications. Most classical results on order statistics are derived under the assumption that the random variables are identically distributed or independent; see [9-11] and the references therein. Order statistics of Erlang mixtures have also been studied in recent years. For example, [12] showed that the order statistics of a set of independent mixed Erlang random variables belong to the same class of Erlang mixtures. More studies can be found in [13-15].
In this paper, we consider a set of dependent and non-identically distributed random variables whose joint distribution is a multivariate Erlang mixture. [15] derived the distributions of the minimum $ X_{1:n} $ and the maximum $ X_{n:n} $ and showed that both belong to the class of univariate Erlang mixtures. The purpose of this paper is to generalize the results in [15] and derive the distributions of all order statistics. Furthermore, we show that the distribution of any $ r $th ($ r=1,2,\cdots,n $) order statistic has the form of a univariate Erlang mixture.
We apply the class of multivariate Erlang mixtures to the multiple life setting. Traditional actuarial theory of multiple life insurance often assumes independence among the future lifetimes; see, for example, [16]. However, extensive research over the past years suggests otherwise; see [17, 18] and the references therein. The class of multivariate Erlang mixtures has been shown to flexibly capture the dependence among the variables, making it a reasonable choice. Another common tool for describing dependence in a multivariate context is the copula method, as in [19]. Compared with the copula method, a multivariate Erlang mixture is easier to handle for high-dimensional data. The results in this paper show that explicit expressions can be obtained for some important quantities, which improves accuracy.
This paper is organized as follows. In Section 2, we derive the density functions of the order statistics of a set of variables from a multivariate Erlang mixture and show that the order statistics still have the form of Erlang mixtures. In Section 3, we apply multivariate Erlang mixtures to multiple life theory, and explicit results are given for some common quantities. Section 4 concludes and discusses some details of the proposed method.
For notational simplicity, we denote an Erlang density with shape parameter $ m $ and rate parameter $ \beta $ as
An Erlang distribution is in fact a gamma distribution with a positive integer shape parameter. The distribution function (df) is given by
and the survival function $ \overline{F}(x|m,\beta)=1-F(x|m,\beta) $.
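Explicitly, for shape parameter $ m $ and rate $ \beta $, these are the standard gamma expressions with integer shape:

$$ f(x|m,\beta)=\frac{\beta^{m}x^{m-1}e^{-\beta x}}{(m-1)!},\qquad F(x|m,\beta)=1-\sum_{j=0}^{m-1}e^{-\beta x}\frac{(\beta x)^{j}}{j!},\qquad x>0. $$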
Let $ f_{r:n}(x) $ and $ F_{r:n}(x) $ be the density function and the distribution function of the $ r $th order statistic in a sample of size $ n $. It is clear that for $ r=1,\cdots,n-1, $ we have
where $ \sum_{S_r} $ denotes the sum over permutations $ \{s_1, \cdots, s_n\} $ of $ \{1,\cdots, n\} $ with $ s_1 < \cdots < s_r $ and $ s_{r+1}<\cdots<s_n $. For notational convenience, let
and
Then we can express the density of the $ r $th order statistic as
In this section, we prove that any $ r $th ($ 1\leq r\leq n $) order statistic has the form of a univariate Erlang mixture. The derivation is somewhat involved; we present the main results in this section, and the proofs are given in the appendix.
Lemma 2.1 Suppose an $ n $-variate random vector $ \textbf{X}=(X_1,X_2,\cdots,X_n) $ has a joint probability density function of form (1.1). Then the density function $ f_{[r]:n}(x) $, $ r=1,\cdots,n $, can be expressed as
where
and $ \sum_{S_r} $ denotes the sum over permutations $ \{s_1, \cdots, s_n\} $ of $ \{1,\cdots, n\} $ with $ s_1 < \cdots < s_r $ and $ s_{r+1}<\cdots<s_n $; the notation $ I_{\{j\leq r\}} $ denotes the indicator function, equal to 1 when $ j\leq r $ and 0 otherwise.
Remark The density function in Lemma 2.1 also has the form of a univariate Erlang mixture. However, it is a combination of Erlang distributions rather than a mixture of Erlang distributions, because the coefficients $ \widetilde{\alpha}_\textbf{m}([r],n), \textbf{m}=(m_1,m_2,\cdots,m_n),r=1,\cdots,n $ are not necessarily all positive. The density function can be rewritten as
Theorem 2.2 Suppose an $ n $-variate random vector $ \textbf{X}=(X_1,X_2,\cdots,X_n) $ has a joint probability density function of form (1.1). Then the density function of the $ r $th ($ r=1,\cdots,n $) order statistic is given by
Theorem 2.3 Suppose an $ n $-variate random vector $ \textbf{X}=(X_1,X_2,\cdots,X_n) $ has a joint probability density function of form (1.1). Then the distribution of the $ r $th ($ r=1,\cdots,n $) order statistic is a univariate Erlang mixture and the density function can be rewritten as
Now assume that the marginal random variables $ X_1 ,\cdots ,X_n $ are mutually independent. According to Corollary 2.3 in [1], the counting random variables $ N_1 ,\cdots ,N_n $ are then also mutually independent. Hence, using the relationship between the mixing weights of Erlang mixtures and the corresponding counting random variables, the coefficients in Theorem 2.2 in the independent case can be written as
[12] studied the order statistics of independent Erlang mixtures and our result is consistent with their result.
Example 1 Consider a trivariate Erlang mixture with joint density function given by
In this example, the positive mixing weights are $ \alpha_{(2,5,10)}=0.2, \alpha_{(4,8,2)}=0.3, \alpha_{(1,3,5)}=0.5 $. The coefficients $ \widetilde{\alpha}_\textbf{m}(r,n) $ in (2.7) have a much simpler form than the mixing weights $ \alpha_m(r,n) $ in (2.8); hence we first obtain the density functions of form (2.7) and then transform them into the form of univariate Erlang mixtures. Taking the first order statistic as an example, the parameters are given in Table 1.
From the results in Table 1, we obtain the density function of the first order statistic in the form of an Erlang mixture according to Theorem 2.3. Similarly, the parameters of the second and third order statistics are given in Table 2. The survival curves for the order statistics are shown in Figure 1.
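For readers who wish to reproduce curves like those in Figure 1, the survival function of a univariate Erlang mixture can be evaluated directly from the closed-form Erlang survival function. The sketch below is generic; the weights shown are illustrative placeholders, not the values in Table 2.

```python
import math

def erlang_sf(x, m, beta):
    """Survival function of Erlang(m, beta):
    sum_{j=0}^{m-1} e^{-beta*x} (beta*x)^j / j!."""
    return sum(math.exp(-beta * x) * (beta * x) ** j / math.factorial(j)
               for j in range(m))

def mixture_sf(x, weights, beta):
    """Survival function of a univariate Erlang mixture.
    weights: dict mapping shape m -> mixing weight alpha_m."""
    return sum(a * erlang_sf(x, m, beta) for m, a in weights.items())

# Illustrative weights only (placeholders, not Table 2 values).
weights = {1: 0.5, 3: 0.3, 5: 0.2}
beta = 1.0
curve = [mixture_sf(t, weights, beta) for t in (0.0, 1.0, 2.0, 5.0)]
```

Evaluating `mixture_sf` on a grid of $ t $ values produces a survival curve for each order statistic once its weights are supplied.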
In this section, we consider insurance benefits payable at the moment of death. The theory for the analysis of financial benefits based on the death of a single life is well developed, and it can be extended to the case involving several lives; see [20], [21] and the references therein. Order statistics are particularly relevant in various contexts of life contingencies involving multiple lives.
We consider an insurance contract consisting of $ n $ dependent and non-identically distributed lives $ (X_1,X_2,\cdots,X_n) $. Let $ (x_1),(x_2),\cdots,(x_n) $ denote the ages of the members at the start of the contract and let $ T(x_i)=(X_i-x_i|X_i>x_i), i=1,2,\cdots,n $ be the future lifetime of $ (x_i) $.
First, we recall some important definitions in multiple life theory. The $ r $-survivor status of $ n $ lives $ (x_1),(x_2),\cdots,(x_n) $ exists while at least $ r $ of the $ n $ lives survive, and is denoted by $ ( \begin{array}{c} r:n\\\hline \textbf{x} \end{array}). $ In other words, the $ r $-survivor status of $ n $ lives fails upon the $ (n-r+1) $th death among the $ n $ lives. When $ r=n $, the $ n $-survivor status is also called the joint-life status, and when $ r=1 $, the 1-survivor status is called the last-survivor status. Note that the future lifetime of the $ r $-survivor status is exactly the $ (n-r+1) $th order statistic of the $ n $ future lifetimes $ T(x_1),T(x_2),\cdots,T(x_n) $. For purposes of analysis, we define another status, called the $ [r] $-deferred survivor status, which exists while exactly $ r $ of the $ n $ lives survive. We denote this status by $ ( \begin{array}{c} [r]:n\\\hline \textbf{x} \end{array}). $
It should be noted that if the $ n $ lives have a joint pdf of form (1.1), then the joint distribution of the future lifetimes $ \{T(x_1),T(x_2),\cdots,T(x_n)\} $ is still a multivariate Erlang mixture with a density of form (1.1); see [1].
The notation $ {_t}p_{\frac{r:n}{\textbf{x}}} $ represents the survival function of the future lifetime of the $ r $-survivor status. Then, according to Theorem 2.3, we have
Consider an insurance policy which pays a unit on the $ (n-r+1) $th death among the $ n $ lives, the actuarial present value denoted by $ \overline{A}_{\frac{r:n}{\textbf{x}}} $ will be
where $ \mu_{\frac{r:n}{\textbf{x}}}=(-\frac{d}{dt}{_t}p_{\frac{r:n}{\textbf{x}}})/{_t}p_{\frac{r:n}{\textbf{x}}} $, $ \delta $ is a constant force of interest, and $ v^t=e^{-\delta t} $. The symbols $ \delta $ and $ v $ retain these meanings throughout the rest of this section.
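Since the future lifetime of the $ r $-survivor status is the $ (n-r+1) $th order statistic, which by Theorem 2.3 has a univariate Erlang mixture form, $ \overline{A}_{\frac{r:n}{\textbf{x}}}=E[v^T] $ evaluates in closed form via the Erlang Laplace transform $ E[e^{-\delta T}]=(\beta/(\beta+\delta))^m $ for $ T\sim\mathrm{Erlang}(m,\beta) $. Writing the order-statistic weights of (2.8) as $ \alpha_m(n-r+1,n) $ (notation assumed here), this gives

$$ \overline{A}_{\frac{r:n}{\textbf{x}}}=\sum_{m=1}^{\infty}\alpha_m(n-r+1,n)\left(\frac{\beta}{\beta+\delta}\right)^{m}. $$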
For a continuous annuity payable at a rate of 1 per year as long as at least $ r $ of the $ n $ lives survive, the actuarial present value, denoted by $ \overline{a}_{\frac{r:n}{\textbf{x}}} $, is
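By the identity $ \overline{A}_{\frac{r:n}{\textbf{x}}}+\delta\,\overline{a}_{\frac{r:n}{\textbf{x}}}=1 $ (the continuous analogue of the relation used for the deferred status below), the annuity value follows directly from the insurance value:

$$ \overline{a}_{\frac{r:n}{\textbf{x}}}=\frac{1-\overline{A}_{\frac{r:n}{\textbf{x}}}}{\delta}. $$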
Similarly, the notation $ {_t}p_{\frac{[r]:n}{\textbf{x}}} $ represents the survival function of the future lifetime of the $ [r] $-deferred survivor status. According to the definition of the $ [r] $-deferred survivor status, the density of this status is given in Lemma 2.1. Using the form of (2.6), we have
Consider a continuous annuity payable as long as any of $ (x_1),(x_2),\cdots, $ $ (x_n) $ are alive, with payment rate $ c_r $ while exactly $ r $ of the $ n $ lives survive. Then the actuarial present value is $ \sum\limits_{r=1}^n c_r \cdot \overline{a}_{\frac{[r]:n}{\textbf{x}}} $, where the term $ \overline{a}_{\frac{[r]:n}{\textbf{x}}} $ can be calculated as
The actuarial present value can also be calculated by the formula $ \overline{A}_{\frac{[r]:n}{\textbf{x}}}+\delta \overline{a}_{\frac{[r]:n}{\textbf{x}}}=1 $, where
Example 2 (Example 1 continued): Suppose the future lifetimes of 3 lives $ (x_1),(x_2),(x_3) $ in an insurance policy have the joint density function (2.9) shown in Example 1. We summarise some important quantities mentioned in this section in Table 3. The values in the last three columns are obtained by setting $ t=5,\delta=5\% $ and the mixing weights can be seen in Table 2.
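Quantities of the kind reported in Table 3 can be computed in a few lines once the order-statistic weights are known, using the Laplace transform $ E[e^{-\delta T}]=(\beta/(\beta+\delta))^m $ of an Erlang$ (m,\beta) $ lifetime. The sketch below is illustrative; the weights are placeholders, not the values of Table 2.

```python
def insurance_apv(weights, beta, delta):
    """Actuarial present value E[v^T] = sum_m alpha_m (beta/(beta+delta))^m
    for a lifetime T with univariate Erlang mixture weights alpha_m."""
    return sum(a * (beta / (beta + delta)) ** m for m, a in weights.items())

def annuity_apv(weights, beta, delta):
    """Continuous annuity value via the identity A + delta * a = 1."""
    return (1.0 - insurance_apv(weights, beta, delta)) / delta

# Illustrative weights for one order statistic (placeholders).
weights = {2: 0.4, 5: 0.6}
A = insurance_apv(weights, beta=1.0, delta=0.05)
a = annuity_apv(weights, beta=1.0, delta=0.05)
```

The same functions apply to the deferred-status quantities, provided the (possibly negative) combination coefficients of Lemma 2.1 are passed in place of proper mixture weights.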
In this paper, we have studied the order statistics of the class of multivariate Erlang mixtures. We derived the distribution of any order statistic of dependent variables coming from a multivariate Erlang mixture, without the assumption of independence. Furthermore, we showed that the order statistics still have the form of univariate Erlang mixtures. This desirable property enables us to deal with multivariate problems more efficiently. To illustrate this, we applied multivariate Erlang mixtures to multiple life theory. One advantage is that explicit expressions can be obtained for some important quantities, whereas numerical methods may be needed to compute them for a general distribution.
Proof Without loss of generality, we set $ S_r=\{1,2,\cdots,r,r+1,\cdots,n\} $, then
the last equation holds due to the fact $ P(N_i=0)=0 $ and
Taking the derivative with respect to $ x $, we have
Obviously, we have,
where the notation $ e_j $ represents an $ n $-length vector with the $ j $th entry equals 1 and others 0.
We repeat the procedure for all permutations $ S_r=\{s_1,s_2,\cdots,s_n\} $ of $ \{1,2,\cdots,n\} $ with $ s_1<\cdots<s_r $ and $ s_{r+1}<\cdots<s_n $, and the result follows.
Proof We proceed by (backward) induction on $ r $.
(1) Let $ r=n $; according to the results in [15], the claim holds.
(2) Assume that the result holds for the $ (r+1) $th order statistic, namely, we have
(3) To prove that the result also holds for the $ r $th order statistic, from (2.4) it suffices to prove
Comparing the left-hand side with the second term on the right-hand side, we further reduce the problem to proving
According to the definition of the notation $ H_r(\textbf{m}) $, we have
This shows that the conclusion also holds for the $ r $th order statistic, which completes the proof.