The geometric distribution is an important probability distribution that has been widely used in practice, and it admits many important generalizations motivated by practical situations. For example, Muwafi (1980) studied the geometric distribution of order $ k $ in connection with the Fibonacci sequence, and Philippou, et al (1980) further derived the properties and characteristic function of this distribution. Miller (2008) introduced the properties and applications of the geometric distribution.
In the past decade, the geometric distribution has been used extensively. Jayakumar (2018) and Ahmed, et al (2014) introduced the Weibull geometric distribution, and the beta geometric distribution was studied by Kemp and Adrienne (2001); further applications can be found in Porwal (2018) and Pedro, et al (2014), among others.
In many cases, we need to consider an important kind of problem, which we call the "collection problem". The problem is described as follows: in each trial of a sequence of independent, identical experiments, exactly one of $ m $ different results $ A_1 , A_2 , \cdots , A_m $ is observed, with probabilities $ p_i=P(A_i), i= 1, 2, \cdots, m $, respectively. There are two questions:
● How many trials are needed until all the $ m $ results appear?
● What distribution does the total number of trials $ Y $ follow?
In the above problem, the experiment continues until all of $ A_1, A_2, \cdots, A_m $, or a prescribed $ r (r < m) $ of the results, have appeared, and we are concerned with the probability distribution of $ Y $, the total number of trials. Xiao, et al (2015) studied several special cases of this problem. Based on these results, we further study the characteristics and properties of this kind of distribution. On the one hand, the distribution of $ Y $ is constructed using the theory of mixture lattice point sets; on the other hand, the mixexp package in R is used to carry out the complicated calculations in high-dimensional cases. The probability distribution, expectation and variance of $ Y $ are calculated and rigorously proved.
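To make the collection problem concrete, the following R sketch simulates the experiment for an arbitrary illustrative probability vector (the function name collect_once and the chosen probabilities are ours, not part of the formal development) and records the number of trials $ Y $ needed until all $ m $ results have appeared.

```r
# Simulate the collection problem: draw results A_1,...,A_m with probabilities p
# until every result has appeared at least once; return the number of trials Y.
collect_once <- function(p) {
  m <- length(p)
  seen <- rep(FALSE, m)
  n <- 0
  while (!all(seen)) {
    n <- n + 1
    seen[sample.int(m, 1, prob = p)] <- TRUE
  }
  n
}

set.seed(1)
p <- c(0.1, 0.2, 0.3, 0.4)         # an arbitrary illustrative probability vector
Y <- replicate(10000, collect_once(p))
mean(Y); var(Y)                    # Monte Carlo estimates of E(Y) and Var(Y)
```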
In this paper, we first give the basic notation and its properties. In Section 3, we derive the probability distribution and the numerical characteristics of a random variable $ Y $ that follows the Multinomial Geometric distribution (MGe). In Section 4, we introduce an important special case, the Uniform Multinomial Geometric distribution (UMGe), and discuss its properties and asymptotic distribution. In Section 5, the asymptotic properties and estimation methods of the MGe and UMGe distributions are examined through simulation. Finally, we put forward some problems for further study.
We use the notation defined in Li and Zhang (2017). For a positive integer $ n \in \mathbb{Z} ^ + $, denote
as the $ m $-component, order-$ n $ lattice point set on a simplex. It can be decomposed into the union of an interior point set and a boundary point set, $ {\cal L}\left\{ {m, n} \right\} = {\cal N}_m (n) \cup {\cal N}_m^0 (n), $ where
is the interior point set of $ {\cal L}\left\{ {m, n} \right\} $, and $ {\cal N}_m^0 (n) = {\cal L}\{ m, n\} \backslash {\cal N}_m (n) $ is its boundary point set.
As (2.1) shows, $ {\cal N}_m (n) \ne \emptyset $ if and only if $ n \ge m $; moreover, $ {\cal L}\left\{ {m, n} \right\} $ has $ \binom{{n + m - 1}}{{m - 1}} $ elements and $ {\cal N}_m (n) $ has $ \binom{{n - 1}}{{m - 1}} $ elements.
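For instance, when $ m = 3 $ and $ n = 4 $, $ {\cal L}\{3, 4\} $ contains $ \binom{6}{2} = 15 $ lattice points, of which $ {\cal N}_3 (4) $ contains $ \binom{3}{2} = 3 $ interior points, namely $ (1,1,2) $, $ (1,2,1) $ and $ (2,1,1) $.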
We define the index set as
If an index $ i \in \left\{ {1, 2, \cdots , m} \right\} $ is excluded from $ {\cal I}_m (j) $, we denote the resulting set by
Throughout this paper, we use the notation $ {\bf{i}}_j = (i_1 , i_2 , \cdots , i_j ) $ for a $ j $-dimensional index vector, and $ {\bf{n}}_{m\backslash i} = (n_1 , \cdots , n_{i - 1} , n_{i + 1}, \cdots , n_m ) $ for the $ (m-1) $-dimensional vector obtained by removing the $ i $-th element from $ {\bf{n}}_m $.
Let $ {\bf{p}}_m = (p_1 , p_2 , \cdots , p_m ) $ be a nonnegative vector and denote
For example, $ S^n ({\bf{p}}_m , 0) = \big( {\sum\limits_{i = 1}^m {p_i } } \big)^n , S^n ({\bf{p}}_m , 1) = \sum\limits_{k = 1}^m {\big( {\sum\limits_{i = 1}^m {p_i } - p_k } \big)^n } , \cdots , S^n ({\bf{p}}_m , m - 1) = \sum\limits_{k = 1}^m {p_k^n } $. The general expansion of $ S^n ({\bf{p}}_m , j) $ is as follows:
When $ 1 \le n \le m - 1 $, we have
When $ n \ge m $, we have
Property 2.1 For any nonnegative vector $ {\bf{p}}_m = (p_1 , p_2 , \cdots , p_m ) $ and integer $ n \ge 1 $,
where $ I( \cdot ) $ is the indicator function.
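As a computational aid, and only as an illustrative sketch (the function name S_nj is our own, not taken from the paper's R code), $ S^n ({\bf{p}}_m , j) $ can be evaluated directly from the pattern above by enumerating all $ j $-element index subsets:

```r
# S^n(p, j): sum over all j-element index subsets K of (sum(p) - sum(p[K]))^n,
# following the pattern of the special cases j = 0, 1, ..., m-1 given in the text.
S_nj <- function(p, n, j) {
  if (j == 0) return(sum(p)^n)
  subsets <- combn(length(p), j)                    # all j-element index subsets
  sum(apply(subsets, 2, function(K) (sum(p) - sum(p[K]))^n))
}

p <- c(0.1, 0.2, 0.3, 0.4)                          # illustrative nonnegative vector
S_nj(p, n = 5, j = 0)                               # equals (sum(p))^5
S_nj(p, n = 5, j = length(p) - 1)                   # equals sum(p^5)
```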
To prove our conclusions, we propose the following three properties.
Property 2.2 Let $ i_1 , i_2 , \cdots , i_{j + 1} $ be an arbitrary permutation of $ 1, 2, \cdots , j + 1 $. If the function $ g(\cdot) $ satisfies $ g(p_{i_1 } , p_{i_2 } , \cdots , p_{i_{j + 1} } ) = g(p_1 , p_2 , \cdots , p_{j + 1} ), $ then $ g(p_1 , p_2 , \cdots , p_j ) $ is called symmetric, and we have
Property 2.3 When $ n \ge m $, $ \left( {p_1 + p_2 + \cdots + p_m } \right)^n = b^n ({\bf{p}}_m ) + n!\sum\limits_{n_m \in {\cal N}_m^0 (n)} {\prod\limits_{k = 1}^m {\frac{{p_k^{n_k } }}{{n_k !}}} } > 0, $ so we have
Remark In the above discussion, $ {\bf{p}}_m $ was only required to be a nonnegative vector. In what follows, $ {\bf{p}}_m $ represents a probability vector, that is, $ \sum\limits_{i = 1}^m {p_i } = 1, p_i > 0, i = 1, 2, \cdots , m $.
Now we discuss the case in which $ {\bf{p}}_m $ is the parameter of a multinomial distribution.
Definition 2.1 A random variable $ X $ follows the geometric distribution, denoted $ X \sim Ge(p), 0 < p < 1 $, if its probability function is $ P(X = n) = (1 - p)^{n - 1} p, n = 1, 2, \cdots . $ A random vector $ (X_1 , X_2 , \cdots , X_m ) $ follows the multinomial distribution, denoted $ (X_1 , X_2 , \cdots , X_m ) \sim M(n, {\bf{p}}_m ) $, if its joint probability function is
where $ \sum\limits_{i = 1}^m {p_i } = 1, \sum\limits_{k = 1}^m {n_k } = n $.
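For readers reproducing the calculations in R, note that the built-in parameterizations differ slightly from Definition 2.1: rgeom and dgeom count failures before the first success, so the geometric variable of Definition 2.1 is obtained by adding 1, while dmultinom and rmultinom match the multinomial form directly. A brief illustration:

```r
# Geometric distribution as in Definition 2.1 (support 1, 2, ...):
# R's rgeom counts failures before the first success, so add 1.
p <- 0.3
x <- rgeom(5, prob = p) + 1        # draws from Ge(p) with P(X = n) = (1-p)^(n-1) * p
dgeom(4 - 1, prob = p)             # P(X = 4) under Definition 2.1

# Multinomial distribution M(n, p_m):
pm <- c(0.2, 0.3, 0.5)
dmultinom(c(1, 2, 3), size = 6, prob = pm)   # joint probability of (X1,X2,X3) = (1,2,3)
rmultinom(1, size = 6, prob = pm)            # one draw of (X1, X2, X3)
```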
Under the notation and assumptions of Definition 2.1 and Property 2.1, for $ 1 \le n \le m - 1 $ the following equation holds:
and $ S^n ({\bf{p}}_m , 0) = \left( {\sum\limits_{i = 1}^m {p_i } } \right)^n = 1 $, then we get the following property.
Property 2.4 When $ 1 \le n \le m - 1 $, we have
Based on Definition 2.1 and the notation given in Section 2, we now discuss the probability distribution function of the Multinomial Geometric (MGe) distribution and its properties.
Theorem 3.1 Suppose that $ A_1 , A_2 , \cdots , A_m $ are the $ m $ possible results of each trial, with $ p_i = P(A_i ) > 0, i = 1, 2, \cdots , m $, and that the experiment stops as soon as all $ m $ results have appeared. Then the total number of trials $ Y $ follows the $ {\rm MGe} $ distribution, denoted $ Y \sim MGe({\bf{p}}_m ) $, with probability distribution function
Proof Let $ X_1 , X_2 , \cdots , X_m $ be the numbers of occurrences of $ A_1 , A_2 , \cdots , A_m $ in the first $ n-1 $ trials, so that $ \left( {X_1 , X_2 , \cdots , X_m } \right) \sim M(n - 1, {\bf{p}}_m ) $. By the law of total probability,
Next, we verify that the probabilities sum to one.
This completes the proof.
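As a numerical illustration (a sketch, not the paper's mixexp-based implementation), the distribution of $ Y $ can also be evaluated through the inclusion-exclusion identity $ P(Y \le n) = \sum_{j = 0}^{m - 1} ( - 1)^j S^n ({\bf{p}}_m , j) $, which is equivalent to the construction in the proof; the pmf then follows by differencing. The code below reuses S_nj() and collect_once() from the earlier sketches.

```r
# pmf of Y ~ MGe(p) via inclusion-exclusion on the events "result i has not appeared":
# P(Y <= n) = sum_{j=0}^{m-1} (-1)^j * S^n(p, j),  P(Y = n) = P(Y <= n) - P(Y <= n-1).
mge_cdf <- function(n, p) {
  m <- length(p)
  sum(sapply(0:(m - 1), function(j) (-1)^j * S_nj(p, n, j)))
}
mge_pmf <- function(n, p) mge_cdf(n, p) - mge_cdf(n - 1, p)

p <- c(0.1, 0.2, 0.3, 0.4)
sum(sapply(4:200, mge_pmf, p = p))             # close to 1 (regularity check)
mge_pmf(8, p)                                  # exact P(Y = 8)
mean(replicate(20000, collect_once(p)) == 8)   # Monte Carlo check of the same probability
```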
If the stopping condition of the experiment is changed, we give the following two generalizations.
Corollary 3.1 If the experiment stops as soon as the $ r $ specified results $ A_1 , A_2 , \cdots , A_r $ have all appeared, the total number of trials $ V_r $ follows the distribution denoted $ V_r \sim MGe(p_1 , p_2 , \cdots , p_r ) $. The probability distribution function of $ V_r $ is
Corollary 3.2 If the experiment stops as soon as any $ r $ of the results have appeared, the total number of trials $ W_r $ follows the distribution denoted $ W_r \sim MGe({\bf{p}}_m , r) $. The probability distribution function of $ W_r $ is
where $ p_{0k_i } = \sum\limits_{l = 1}^r {p_{k_{i_l } } } $.
According to Property 2.2, the probability distribution function of $ Y \sim MGe({\bf{p}}_m ) $ has an upper bound. Let
then $ P\left( {Y = n} \right) < u_n $ holds, and $ P\left( {Y = n} \right) = u_n + o(p^{ - n} ), 0 < p < 1 $.
Theorem 3.2 Suppose that $ Y \sim MGe({\bf{p}}_m ) $. The expectation and second-order moment of $ Y $ are as follows:
Proof Let us first calculate the expectation
$ \begin{array}{l}
E(Y) = \sum\limits_{n = m}^\infty n P(Y = n) = \sum\limits_{n = m}^\infty \sum\limits_{i = 1}^m n p_i b^{n - 1} ({\bf{p}}_{m\backslash i}) \\
= \sum\limits_{i = 1}^m p_i \sum\limits_{n = m}^\infty \Big\{ n(1 - p_i)^{n - 1} - \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} n(1 - p_i - p_{i_1})^{n - 1} + \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2\backslash i)} n(1 - p_i - p_{i_1} - p_{i_2})^{n - 1} + \cdots + ( - 1)^{m - 2} \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} n p_{i_1}^{n - 1} \Big\} \\
= \sum\limits_{i = 1}^m p_i \sum\limits_{j = 0}^{m - 2} ( - 1)^j \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j\backslash i)} \Big( p_i + \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 2} \Big( 1 - p_i - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} \Big[ (m - 1)\Big( p_i + \sum\limits_{k = 1}^j p_{i_k} \Big) + 1 \Big] \\
= \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} \Big[ (m - 1) + \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} \Big] \\
= (m - 1)\sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} + \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} \\
= (m - 1) + \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} + \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \sum\limits_{l = 1}^{m - 1} ( - 1)^l \binom{m - 1}{l} \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{l - 1}.
\end{array} $
Let the third term in the above formula be
We can get the conclusion (3.3) by substituting $ Q $ into
Next, we calculate the second moment. Let
From the calculation of the expectation above, we obtain the following conclusion:
The second moment can be obtained by the following calculation.
$ \begin{array}{l}
E(Y^2 ) = \sum\limits_{n = m}^\infty n(n + 1)P(Y = n) - E(Y) \\
= \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big\{ m(m + 1)\Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} + 2\Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^m \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 2} \Big( 1 + m\sum\limits_{k = 1}^j p_{i_k} \Big) \Big\} \\
\quad - \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} \Big[ (m - 1) + \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} \Big] \\
= \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} \Big[ (m - 1)^2 + (2m - 3)\Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} + 2\Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 2} \Big] \\
= (m - 1)^2 \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} S^{m - 1} ({\bf{p}}_m , j) + (2m - 3)\sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big( 1 - \sum\limits_{k = 1}^j p_{i_k} \Big)^{m - 1} \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} \\
\quad + 2\sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \sum\limits_{l = 0}^{m - 1} ( - 1)^l \binom{m - 1}{l} \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{l - 2} \\
= (m - 1)^2 + (2m - 3)\Big[ c^{ - 1} ({\bf{p}}_m ) - (m - 1) + ( - 1)^{m - 1} \Big] + 2\sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \Big[ \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 2} - \binom{m - 1}{1}\Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{ - 1} + \binom{m - 1}{2} \Big] \\
\quad + 2\sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} \sum\limits_{l = 3}^{m - 1} ( - 1)^l \binom{m - 1}{l} \Big( \sum\limits_{k = 1}^j p_{i_k} \Big)^{l - 2} \\
= - (m - 1)(m - 2) + (2m - 3)c^{ - 1} ({\bf{p}}_m ) + (2m - 3)( - 1)^{m - 1} + 2c^{ - 2} ({\bf{p}}_m ) - 2(m - 1)c^{ - 1} ({\bf{p}}_m ) + 2\binom{m - 1}{2}\sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} \binom{m}{j} \\
\quad + 2\sum\limits_{l = 3}^{m - 1} ( - 1)^l \binom{m - 1}{l} \sum\limits_{j = 1}^{m - 1} ( - 1)^{j - 1} S^{l - 2} ({\bf{p}}_m , m - j) \\
= 2c^{ - 2} ({\bf{p}}_m ) - c^{ - 1} ({\bf{p}}_m ) + 2( - 1)^m \sum\limits_{l = 2}^{m - 1} ( - 1)^l \binom{m - 1}{l} - (2m - 3)( - 1)^m \\
= 2c^{ - 2} ({\bf{p}}_m ) - c^{ - 1} ({\bf{p}}_m ) + ( - 1)^{m - 1}.
\end{array} $
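As a numerical sanity check of Theorem 3.2 (again an illustrative sketch reusing mge_pmf() from above, with a probability vector of our own choosing), the moments can be computed directly from the pmf, and the mean compared with the equivalent inclusion-exclusion form $ E(Y) = \sum_{\emptyset \ne K \subseteq \{1, \cdots, m\}} ( - 1)^{|K| - 1} \big( \sum_{i \in K} p_i \big)^{ - 1} $, which agrees with the derivation above.

```r
p <- c(0.1, 0.2, 0.3, 0.4)
n_max <- 400                                    # truncation point for the infinite sums
ns    <- length(p):n_max
probs <- sapply(ns, mge_pmf, p = p)
EY  <- sum(ns * probs)                          # E(Y) from the pmf
EY2 <- sum(ns^2 * probs)                        # E(Y^2) from the pmf

# inclusion-exclusion form of the mean, for comparison
EY_ie <- sum(sapply(1:length(p), function(j) {
  (-1)^(j - 1) * sum(apply(combn(length(p), j), 2, function(K) 1 / sum(p[K])))
}))
c(EY, EY_ie, EY2 - EY^2)                        # mean (two ways) and variance
```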
For example, suppose $ Y^{(m)} \sim MGe({\bf{p}}_m ), m=2, 3, 4 $; then we have the following results:
In this section, we discuss an important special case of the MGe distribution. When the parameter vector $ {\bf{p}}_m = \left( {p_1 , p_2 , \cdots , p_m } \right) $ satisfies $ p_i = \dfrac{1}{m}, i = 1, 2, \cdots , m $, we denote $ MGe\left( {{\bf{p}}_m } \right) $ by $ MGe\left( m \right) $ and call it the Uniform Multinomial Geometric distribution (UMGe).
Theorem 4.1 Suppose $ Y $ follows the $ {\rm UMGe} $ distribution, denoted $ Y \sim MGe\left( m \right) $. Then the probability distribution function of $ Y $ is
Proof From the condition $ p_i = \dfrac{1}{m}, i = 1, 2, \cdots , m $, we can get
According to equation (3.1), we have
To obtain the expectation and variance of the UMGe distribution, we first give the following three combinatorial identities.
Property 4.1 The following three combinatorial identities hold for every positive integer $ m \in \mathbb{Z}^ + $.
Proof For the formula (4.2), because $ r_1 (m - 1) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{1}{j}\binom{{m - 1}}{j}} $, where
So we have $ r_1 (m) = r_1 (1) + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m} = \sum\limits_{j = 1}^m {\frac{1}{j}} $.
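A quick numerical check of (4.2) in R (illustrative only):

```r
# check the identity sum_{j=1}^m (-1)^(j-1) * choose(m, j) / j  =  sum_{j=1}^m 1/j
m <- 12
j <- 1:m
c(sum((-1)^(j - 1) * choose(m, j) / j), sum(1 / j))   # the two sides agree
```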
For formula (4.3), let $ R_2 (x) = \sum\limits_{j = 0}^m {( - 1)^j \frac{1}{{\left( {j + 1} \right)^2 }}\binom{m}{j}} e^{(j + 1)x}; $ then the second derivative of $ R_2 (x) $ is
where $ C_1 , C_2 $ are undetermined constants. Then, from $ R_2 (0) = r_2 (m), r_2 (1) = 1 $ and equation (4.3), we have
We prove equation (4.4) by mathematical induction. Let
When $ m=1 $, $ f_1 (1) = f_2 (1) = 1 $, so equation (4.4) holds.
Assume that $ f_1 (m) = f_2 (m) $ holds for a given $ m \in \mathbb{Z}^ + $; we now show that the conclusion also holds for $ m + 1 $.
Hence the equation holds for every $ m \in \mathbb{Z}^ + $.
Theorem 4.2 Suppose $ Y \sim MGe\left( m \right) $. The expectation and variance of $ Y $ are as follows:
Proof According to Theorem 3.2, equation (4.2) and the condition $ {\bf{p}}_m = \left( {p_1 , p_2 , \cdots , p_m } \right), p_i = \dfrac{1}{m}, i = 1, 2, \cdots , m $, we have
The variance of $ Y $ is
There is another interpretation of the expectation and variance of the UMGe distribution: let the random variable $ X_i , i = 1, 2, \cdots , m $, be the number of trials carried out between the appearance of the $ (i-1) $-th and the $ i $-th new result, so that $ X_i \sim Ge\left( {1 - \frac{{i - 1}}{m}} \right), i = 1, 2, \cdots , m $. The total number of trials is $ Y = \sum\limits_{i = 1}^m {X_i } $, and it can then be calculated that
This is consistent with the formulas we have derived. Following this idea, we can generate random numbers from the UMGe distribution as a sum of random numbers drawn from different geometric distributions, as sketched below.
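A minimal R sketch of this generator (the function name rumge is ours), using $ Y = \sum_{i = 1}^m X_i $ with $ X_i \sim Ge\big( 1 - \frac{i - 1}{m} \big) $; recall that R's rgeom counts failures, so 1 is added to each draw. The empirical mean and variance are compared with the values $ m\sum_{k = 1}^m \frac{1}{k} $ and $ m^2\sum_{k = 1}^m \frac{1}{k^2} - m\sum_{k = 1}^m \frac{1}{k} $ implied by this representation.

```r
# Generate UMGe(m) random numbers as a sum of independent geometric variables.
rumge <- function(nsim, m) {
  probs <- 1 - (0:(m - 1)) / m                      # success probabilities 1, (m-1)/m, ..., 1/m
  replicate(nsim, sum(rgeom(m, prob = probs) + 1))  # each X_i has support 1, 2, ...
}

set.seed(2)
m <- 10
y <- rumge(20000, m)
c(mean(y), m * sum(1 / (1:m)))                          # empirical vs. theoretical mean
c(var(y), m^2 * sum(1 / (1:m)^2) - m * sum(1 / (1:m)))  # empirical vs. theoretical variance
```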
In this section, we examine the probability distribution function (pdf) and cumulative distribution function (cdf) of the MGe distribution. Throughout, $ \Phi (y) $ denotes the cdf of the standard normal distribution.
Example 1 Suppose $ Y^{(m)} \sim MGe({\bf{p}}_m ) $, where $ {\bf{p}}_m = \left( {p_1 , p_2 , \cdots , p_m } \right) $. The parameters are generated as follows
Let $ \mu _m = E\left( {Y^{(m)} } \right) $ and $ \sigma _m^2 = {\rm{Var}}\left( {Y^{(m)} } \right) $. For the discretization of the normal distribution, let $ \varphi (y) = \Phi \left[ {(y - \mu _m )/\sigma _m } \right] - \Phi \left[ {(y - \mu _m - 1)/\sigma _m } \right] $; further, let $ p(y) = P(Y^{(m)} = y) $, and let $ F(y) = P(Y^{(m)} \le y) $ be the $ \rm cdf $ of $ Y^{(m)} $.
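As an illustration of this comparison (a sketch with an arbitrary probability vector of our own choosing, not the parameters used in Figure 1, and reusing mge_pmf() from Section 3):

```r
p  <- c(0.05, 0.1, 0.15, 0.3, 0.4)                # an arbitrary illustrative p for m = 5
y  <- 5:80
pm <- sapply(y, mge_pmf, p = p)                   # exact pmf p(y)
mu  <- sum(y * pm)                                # (truncated) mean and standard deviation
sig <- sqrt(sum(y^2 * pm) - mu^2)
phi <- pnorm((y - mu) / sig) - pnorm((y - mu - 1) / sig)   # discretized normal probabilities
plot(y, pm, type = "h", ylab = "probability")     # exact pmf
points(y, phi, pch = 19, col = "red")             # normal discretization
```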
We consider four situations, $ m = 5, 10, 15, 20 $; the expectation and variance of the random variable in each case are calculated as
We plot the pdf of $ Y^{(m)} $, the upper bound $ u_n = u_m (n) $ of the pdf given by formula (3.2), and the scatter plot of the discretized normal distribution $ N(\mu_m, \sigma_m^2) $. The plots for the four situations are shown in Figure 1.
From the plots, the normal distribution is not a good approximation of the MGe distribution; however, the normal density curve and the upper bound $ u_n $ of the MGe pdf intersect at two points.
The first intersection is near the expectation, so one can compute the probability $ p(y) = P(Y^{(m)} = y) $ exactly when $ y < \mu _m $ and approximate it by the upper bound when $ y > \mu _m $, that is, take $ p(y) \approx u_m (y) $.
In addition, the MGe distribution is always right-skewed and therefore cannot be fitted well by a normal distribution. Because of the complexity of the MGe pdf, how to obtain a good approximate distribution is an important problem for further study.
Example 2 Suppose $ Y_m \sim MGe(m) $ and keep the notation of Example 1. For $ m = 10, 30, 50, 70 $, the expectation and variance are
respectively.
We plot the pdf of $ Y_m $, the upper bound of the pdf, and the scatter plot of the discretized normal distribution $ N(\mu_m, \sigma_m^2) $. The plots for the four situations are shown in Figure 2.
It can be seen that the normal approximation works better for the UMGe distribution than for the MGe distribution. We define the quantile $ q_p (m) $ of the UMGe distribution as the value satisfying
For $ m = 3, 4, \cdots , 40 $, we calculate the expectation of each $ Y_m $ (columns 7 and 15 of Table 1), the variance of each $ Y_m $ (columns 8 and 16 of Table 1), and the quantiles at $ p = 0.25, 0.5, 0.75, 0.9, 0.95 $ (the corresponding columns of Table 1).
For these quantiles of $ Y_m $, the scatter plot together with the fitted curves of the quadratic regression model, with $ m $ on the horizontal axis and $ q_p (m) $ on the vertical axis, is shown in Figure 3(a).
From the data in Table 1 and the trend of the quantile $ q_p (m) $ in Figure 3(a), we postulate the following quadratic regression model relating $ q_p (m) $ to $ m $:
The regression models fitted according to (5.1) are highly significant for $ p = 0.25, 0.5, 0.75, 0.9, 0.95 $; the estimated regression coefficients are shown in Table 2.
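A sketch of how such a fit can be carried out in R (the helper names umge_cdf and umge_quantile are ours; the lower quantile $ q_p (m) = \min\{ n : P(Y_m \le n) \ge p \} $ is assumed, and the coefficients reported in Table 2 are not reproduced here):

```r
# cdf of UMGe(m) by inclusion-exclusion:
# F(n) = sum_{j=0}^{m-1} (-1)^j * choose(m, j) * (1 - j/m)^n
umge_cdf <- function(n, m) {
  j <- 0:(m - 1)
  sum((-1)^j * choose(m, j) * (1 - j / m)^n)
}

# lower quantile q_p(m): smallest n with F(n) >= p
umge_quantile <- function(p, m) {
  n <- m
  while (umge_cdf(n, m) < p) n <- n + 1
  n
}

# quadratic regression of q_p(m) on m, in the spirit of model (5.1), here for p = 0.95
ms  <- 3:40
q95 <- sapply(ms, umge_quantile, p = 0.95)
fit <- lm(q95 ~ ms + I(ms^2))
summary(fit)$coefficients
```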
Furthermore, we calculate all quantile values for $ m=3, 4, \cdots , 100 $ and $ p=0.05, 0.06, \cdots, 0.97 $; after regression analysis, the empirical regression equation is
The surface of (5.2) and the scatter plot of $ \left( {m, p, q(p, m)} \right) $ are shown in Figure 3(b).
For example, the 0.95 quantile of $ Y_{10} $ can be obtained from
Using (5.2), the estimate is
The quantile calculated by (5.1) is more accurate than that from (5.2); however, (5.2) is often more convenient to use. When $ m $ is large, the estimated value is often smaller than the true value, so a nonparametric regression model could be built from the computed data to obtain more accurate results.
In this paper, the multinomial geometric distribution is discussed under sampling with replacement, which means that the probability of occurrence of the event $ A_i $ is the same in every trial. If the sampling scheme is changed so that items are drawn one by one from a finite population without replacement, the distribution of the total number of trials required for several specified results to occur will be different. In many cases, the upper-bound approximation and the normal approximation can be considered.