数学杂志 (Journal of Mathematics)  2022, Vol. 42 Issue (2): 129-145
THE MULTINOMIAL GEOMETRIC DISTRIBUTION
LI Guang-hui1, LI Jun-peng2, ZHANG Chong-qi2    
1. School of Science, Kaili University, Kaili 556011, China;
2. School of Economics and Statistics, Guangzhou University, Guangzhou 510006, China
Abstract: In this paper, we studied the collection problem under Bernoulli trials and derived the probability function of the multinomial geometric distribution by using the theory of mixture lattice point sets. Further, we proposed the uniform multinomial geometric distribution for the case where all trial outcomes are assumed to occur with equal probability. Moreover, we obtained the probability functions, the expectations, and the variances of the two types of distributions, and examined the differences between these two types of distributions and the normal distribution by means of simulations. Finally, we developed a polynomial regression model on the probability and the number of trials from the simulation results, which can be used to simplify the calculation effectively.
Keywords: multinomial distribution; geometric distribution; probability function
1 Introduction

The geometric distribution is an important kind of probability distribution and has been widely used in practice. Depending on the application, the geometric distribution has many important generalizations. For example, Muwafi (1980) studied the geometric distribution of order $ k $ via the Fibonacci sequence, Philippou et al. (1983) further deduced the properties and characteristic functions of this distribution, and Miller et al. (2008) introduced the properties and applications of the geometric distribution.

In the past decade, the geometric distribution has been widely applied. Jayakumar and Babu (2018) and Ahmed et al. (2014) introduced Weibull geometric distributions. The beta geometric distribution was studied by Kemp (2001), and some other applications can be found in Porwal (2018) and Ramos et al. (2014), among others.

In many cases, we need to consider an important kind of problem, which we call the "collection problem". The problem is described as follows: in a Bernoulli experiment, exactly one of $ m $ different results $ A_1 , A_2 , \cdots , A_m $ may be observed in each trial. The probabilities of these $ m $ results are $ p_i=P(A_i), i= 1, 2, \cdots, m $, respectively. There are two questions:

● How many trials are needed until all the $ m $ results appear?

● What distribution does the total number of trials $ Y $ follow?

In the above problems, the experiment continues until all of $ A_1, A_2, \cdots, A_m $ (or $ r $ of them, $ r < m $) have appeared, and we are concerned with the probability distribution of $ Y $ (the total number of trials). Xiao et al. (2015) studied several special cases of this problem. Based on these results, we further study the characteristics and properties of this kind of distribution. On the one hand, the distribution of $ Y $ is constructed by using the theory of mixture lattice point sets; on the other hand, the mixexp package in R is used to carry out the complicated calculations in the high-dimensional case. The probability distribution, expectation and variance of $ Y $ are calculated and rigorously proved.

In Section 2, we first give the basic notations and their properties. In Section 3, we give the probability distribution and the numerical characteristics of a random variable $ Y $ that follows the multinomial geometric distribution (MGe). In Section 4, we introduce a special case, the uniform multinomial geometric distribution (UMGe), and discuss its properties and asymptotic distribution. In Section 5, the asymptotic properties and estimation methods of the MGe and UMGe distributions are examined by simulation. Finally, we put forward some problems for further study.

2 Notations and Preliminaries

We use the notation defined in Li and Zhang (2017). For a positive integer $ n \in \mathbb{Z} ^ + $, denote

$ {\cal L}\left\{ {m, n} \right\} = \left\{ {{\bf{n}}_m = (n_1 , n_2 , \cdots , n_m ):\sum\limits_{i = 1}^m {n_i } = n, n_i \ge 0, n_i \in \mathbb{Z}^+, i = 1, 2, \cdots , m} \right\} $

as the lattice point set of $ m $ components and order $ n $ on a simplex. It can be decomposed into the union of an interior point set and a boundary point set, $ {\cal L}\left\{ {m, n} \right\} = {\cal N}_m (n) \cup {\cal N}_m^0 (n), $ where

$ \begin{eqnarray} {\cal N}_m (n) = \left\{ {{\bf{n}}_m = (n_1 , n_2 , \cdots , n_m ):{\bf{n}}_m \in {\cal L}\{ m, n\} , n_i > 0, i = 1, 2, \cdots , m} \right\} \end{eqnarray} $ (2.1)

is the interior point set of $ {\cal L}\left\{ {m, n} \right\} $, $ {\cal N}_m^0 (n) = {\cal L}\{ m, n\} \backslash {\cal N}_m (n) $ is the boundary point set of $ {\cal L}\left\{ {m, n} \right\} $.

As (2.1) shows, $ {\cal N}_m (n) \ne \emptyset $ if and only if $ n \ge m $; moreover, $ {\cal L}\left\{ {m, n} \right\} $ has $ \binom{{n + m - 1}}{{m - 1}} $ elements and $ {\cal N}_m (n) $ has $ \binom{{n - 1}}{{m - 1}} $ elements.
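The following is a small numerical illustration (a sketch in R, not part of the original derivation) of these counting formulas, assuming $ m = 3 $ components and order $ n = 5 $; the brute-force enumeration only confirms the two binomial coefficients.

    # Counting formulas for L{m, n} and its interior set N_m(n), m = 3, n = 5
    m <- 3; n <- 5
    choose(n + m - 1, m - 1)   # number of elements of L{m, n}
    choose(n - 1, m - 1)       # number of elements of N_m(n)

    # Brute-force check: enumerate all nonnegative solutions of n1 + n2 + n3 = n
    grid <- expand.grid(n1 = 0:n, n2 = 0:n, n3 = 0:n)
    grid <- grid[rowSums(grid) == n, ]
    nrow(grid)                                    # should equal choose(n + m - 1, m - 1)
    sum(apply(grid, 1, function(x) all(x > 0)))   # should equal choose(n - 1, m - 1)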

We define the index set as

$ \begin{eqnarray} {\cal I}_m (j) = \left\{ {{\bf{i}}_j = (i_1 , i_2 , \cdots , i_j ):1 \le i_1 < i_2 < \cdots < i_j \le m, i_k \in \mathbb{Z}^ + , k = 1, 2, \cdots , j} \right\}. \end{eqnarray} $ (2.2)

If $ {\cal I}_m (j) $ does not include an index $ i \in \left\{ {1, 2, \cdots , m} \right\} $, we denote it by

$ {\cal I}_m (j\backslash i) = \left\{ {{\bf{i}}_j = (i_1 , i_2 , \cdots , i_j ):{\bf{i}}_j \in {\cal I}_m (j), i_k \ne i, k = 1, 2, \cdots , j} \right\}. $

In this paper, we always use the boldface notation $ {\bf{i}}_j = (i_1 , i_2 , \cdots , i_j ) $ to denote a $ j $-dimensional vector, and $ {\bf{n}}_{m\backslash i} = (n_1 , \cdots , n_{i - 1} , n_{i + 1} , \cdots , n_m ) $ to denote the $ (m-1) $-dimensional vector obtained by removing the $ i $-th element from $ {\bf{n}}_m $.

Let $ {\bf{p}}_m = (p_1 , p_2 , \cdots , p_m ) $ be a nonnegative vector, denote

$ \begin{eqnarray} S^n ({\bf{p}}_m , j) = \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\big( {\sum\limits_{i = 1}^m {p_i } - \sum\limits_{k = 1}^j {p_{i_k } } } \big)^n } , j = 0, 1, \cdots , m - 1. \end{eqnarray} $ (2.3)

For example, $ S^n ({\bf{p}}_m , 0) = \big( {\sum\limits_{i = 1}^m {p_i } } \big)^n , S^n ({\bf{p}}_m , 1) = \sum\limits_{k = 1}^m {\big( {\sum\limits_{i = 1}^m {p_i } - p_k } \big)^n } , \cdots , S^n ({\bf{p}}_m , m - 1) = \sum\limits_{k = 1}^m {p_k^n } $. The expansion of $ S^n ({\bf{p}}_m , j) $ is as follows:

$ S^n ({\bf{p}}_m , j) = n!\sum\limits_{k = 1}^{m - j} {\binom{{m - k}}{j}\big\{ {\sum\limits_{{\bf{i}}_k \in {\cal I}_m (k)} {\sum\limits_{{\bf{n}}_k \in {\cal N}_k (n)} {\prod\limits_{l = 1}^k {\frac{{p_{i_l }^{n_l } }}{{n_l !}}} } } } \big\}}. $

When $ 1 \le n \le m - 1 $, we have

$ \sum\limits_{j = 0}^{m - 1} {( - 1)^j S^n ({\bf{p}}_m , j)} = n!\sum\limits_{j = 1}^{n} {\big\{ {\sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\sum\limits_{{\bf{n}}_j \in {\cal N}_j (n)} {\prod\limits_{k = 1}^j {\frac{{p_{i_k }^{n_k } }}{{n_k !}}} } } \big[ {\sum\limits_{l = 0}^{m - j} {\binom{{m - j}}{l}( - 1)^l } } \big]} \big\}} = 0. $

When $ n \ge m $, we have

$ \begin{eqnarray*} \sum\limits_{j = 0}^{m - 1} {( - 1)^j S^n ({\bf{p}}_m , j)} = n!\sum\limits_{n_m \in {\cal N}_m (n)} {\prod\limits_{k = 1}^m {\frac{{p_k^{n_k } }}{{n_k !}}} }. \end{eqnarray*} $

Property 2.1 For any nonnegative vector $ {\bf{p}}_m = (p_1 , p_2 , \cdots , p_m ) $ and integer $ n \ge 1 $,

$ b^n ({\bf{p}}_m ) = \sum\limits_{j = 0}^{m - 1} {( - 1)^j S^n ({\bf{p}}_m , j)} = I(n \ge m)\big( {n!\sum\limits_{n_m \in {\cal N}_m (n)} {\prod\limits_{i = 1}^m {\frac{{p_i^{n_i } }}{{n_i !}}} } } \big), $

where $ I( \cdot ) $ is the indicator function.
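As a hedged numerical check (in R, for a small illustrative case; not the authors' code), Property 2.1 can be verified by comparing the inclusion-exclusion form $ \sum_{j=0}^{m-1}(-1)^j S^n({\bf p}_m, j) $ with the direct sum of multinomial terms over the interior point set:

    # Numerical check of Property 2.1 for m = 3, n = 6 (assumed illustrative values)
    S <- function(p, j, n) {
      # S^n(p_m, j) of (2.3): sum over all index subsets of size j
      if (j == 0) return(sum(p)^n)
      sum(combn(length(p), j, function(idx) (sum(p) - sum(p[idx]))^n))
    }
    p <- c(0.2, 0.3, 0.5); m <- length(p); n <- 6

    # Inclusion-exclusion form b^n(p_m)
    b_ie <- sum(sapply(0:(m - 1), function(j) (-1)^j * S(p, j, n)))

    # Direct form: n! * sum over interior points of prod p_i^{n_i} / n_i!
    grid <- expand.grid(rep(list(1:n), m))
    grid <- grid[rowSums(grid) == n, ]
    b_direct <- sum(apply(grid, 1, function(nm) factorial(n) * prod(p^nm / factorial(nm))))

    c(b_ie, b_direct)   # the two values should agree (and both are 0 when n < m)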

To prove our conclusions, the following three properties are proposed.

Property 2.2   Let $ i_1 , i_2 , \cdots , i_{j + 1} $ be an arbitrary permutation of $ 1, 2, \cdots , j + 1 $. If the function $ g(\cdot) $ satisfies $ g(p_{i_1 } , p_{i_2 } , \cdots , p_{i_{j + 1} } ) = g(p_1 , p_2 , \cdots , p_{j + 1} ), $ then we call $ g(p_1 , p_2 , \cdots , p_{j + 1} ) $ symmetric, and we have

$ \sum\limits_{i = 1}^m {p_i } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j\backslash i)} {\frac{{g(p_i , p_{i_1 } , \cdots , p_{i_j } )}}{{\left( {p_i + p_{i_1 } + \cdots + p_{i_j } } \right)^{r + 1} }}} = \sum\limits_{{\bf{i}}_{j + 1} \in {\cal I}_m (j + 1)} {\frac{{g(p_{i_1 } , \cdots , p_{i_j } , p_{i_{j + 1} } )}}{{\left( {p_{i_1 } + \cdots + p_{i_j } + p_{i_{j + 1} } } \right)^r }}} . $

Property 2.3   When $ n \ge m $, $ \left( {p_1 + p_2 + \cdots + p_m } \right)^n = b^n ({\bf{p}}_m ) + n!\sum\limits_{n_m \in {\cal N}_m^0 (n)} {\prod\limits_{k = 1}^m {\frac{{p_k^{n_k } }}{{n_k !}}} } > 0, $ so we have

$ \begin{eqnarray} b^n ({\bf{p}}_m ) < \left( {p_1 + p_2 + \cdots + p_m } \right)^n. \end{eqnarray} $ (2.4)

Remark   In the above discussion, $ {\bf{p}}_m $ was only required to be a nonnegative vector. In the following, we use $ {\bf{p}}_m $ to represent a probability vector, that is, one satisfying $ \sum\limits_{i = 1}^m {p_i } = 1, p_i > 0, i = 1, 2, \cdots , m $.

Now, we discuss the case that $ {\bf{p}}_m $ is a parameter in a multinomial distribution.

Definition 2.1   A random variable $ X $ follows the geometric distribution, denoted $ X \sim Ge(p), 0 < p < 1 $, if the probability function of $ X $ is $ P(X = n) = (1 - p)^{n - 1} p, n = 1, 2, \cdots . $ A random vector $ (X_1 , X_2 , \cdots , X_m ) $ follows the multinomial distribution, denoted $ (X_1 , X_2 , \cdots , X_m ) \sim M(n, {\bf{p}}_m ) $, if the joint probability function is

$ P(X_1 = n_1 , X_2 = n_2 , \cdots , X_m = n_m ) = n!\prod\limits_{k = 1}^m {\frac{{p_k^{n_k } }}{{n_k !}}} , $

where $ \sum\limits_{i = 1}^m {p_i } = 1, \sum\limits_{k = 1}^m {n_k } = n $.

Following the notation and assumptions of Definition 2.1 and Property 2.1, for $ 1 \le n \le m - 1 $ the following equation holds:

$ S^n ({\bf{p}}_m , 0) - \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} S^n ({\bf{p}}_m , j)} = 0, $

and $ S^n ({\bf{p}}_m , 0) = \left( {\sum\limits_{i = 1}^m {p_i } } \right)^n = 1 $; thus we obtain the following property.

Property 2.4   When $ 1 \le n \le m - 1 $, we have

$ \begin{eqnarray*} && \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} S^n ({\bf{p}}_m , j)} = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^n } } = 1, \\ && \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} S^n ({\bf{p}}_m , m - j)} = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^n } } = ( - 1)^m. \end{eqnarray*} $
3 Multinomial Geometric Distribution and Its Properties

Based on Definition 2.1 and the notations given in Section 2, we discuss the probability function of the multinomial geometric distribution (MGe) and its properties.

Theorem 3.1   Suppose that $ A_1 , A_2 , \cdots , A_m $ are the $ m $ different results of each Bernoulli trial, with $ p_i = P(A_i ) > 0, i = 1, 2, \cdots , m $. Let $ Y $ be the total number of trials until all $ m $ results have appeared; then $ Y $ follows the $ {\rm MGe} $ distribution, denoted $ Y \sim MGe({\bf{p}}_m ) $, with probability function

$ \begin{eqnarray} P\left( {Y = n} \right) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{n - 1} } } , n = m, m + 1, \cdots. \end{eqnarray} $ (3.1)

Proof   Let $ X_1 , X_2 , \cdots , X_m $ be the numbers of occurrences of $ A_1 , A_2 , \cdots , A_m $ in the first $ n-1 $ trials, so that $ \left( {X_1 , X_2 , \cdots , X_m } \right) \sim M(n - 1, {\bf{p}}_m ) $. According to the law of total probability, we have

$ \begin{eqnarray*} P\left( {Y = n} \right) &=& \sum\limits_{i = 1}^m {P(A_i )P(X_1 \ge 1 , \cdots, X_{i - 1} \ge 1, X_i = 0, X_{i + 1} \ge 1, \cdots , X_m \ge 1)} \\ &=& \sum\limits_{i = 1}^m {p_i \left\{ {(n - 1)!\sum\limits_{{\bf{n}}_{m - 1} \in {\cal N}_{m - 1} (n - 1)} {\prod\limits_{k \ne i}^m {\left( {\frac{{p_k^{n_k } }}{{n_k !}}} \right)} } } \right\}} \\ &=& \sum\limits_{i = 1}^m {p_i b^{n - 1} ({\bf{p}}_{m\backslash i} )} \\ &=& \sum\limits_{i = 1}^m {p_i (1 - p_i )^{n - 1} } - \sum\limits_{i = 1}^m {p_i } \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {(1 - p_i - p_{i_1 } )^{n - 1} } \\ &&+ \sum\limits_{i = 1}^m {p_i \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2\backslash i)} {(1 - p_i - p_{i_1 } - p_{i_2 } )^{n - 1} } } + \cdots + ( - 1)^{m - 2} \sum\limits_{i = 1}^m {p_i \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {p_{i_1 }^{n - 1} } } \\ &=& \sum\limits_{i = 1}^m {p_{i } (1 - p_{i } )^{n - 1} } - \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2)} {\left( {p_{i_1 } + p_{i_2 } } \right)(1 - p_{i_1 } - p_{i_2 } )^{n - 1} } \\ &&+ \sum\limits_{{\bf{i}}_3 \in {\cal I}_m (3)} {(p_{i_1 } + p_{i_2 } + p_{i_3 } )(1 - p_{i_1 } - p_{i_2 } - p_{i_3 } )^{n - 1} } \\ &&+ \cdots + ( - 1)^{m - 2} \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1)} {(1 - p_{i_1 } )p_{i_1 }^{n - 1} } \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; {({\rm Property\; \; 2.2})} \\ &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{n - 1} } }. \end{eqnarray*} $

Next, we verify its probability regularity.

$ \begin{eqnarray*} &&\sum\limits_{n = m}^\infty {P\left( {Y = n} \right)} = \sum\limits_{n = m}^\infty {\sum\limits_{i = 1}^m {p_i b^{n - 1} ({\bf{p}}_{m\backslash i} )} } \\ &=& \sum\limits_{i = 1}^m {p_i \sum\limits_{n = m}^\infty {\left\{ {(1 - p_i )^{n - 1} - \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {(1 - p_i - p_{i_1 } )^{n - 1} } + \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2\backslash i)} {(1 - p_i - p_{i_1 } - p_{i_2 } )^{n - 1} } } \right.} } \\ &&\quad\left. { + \cdots + ( - 1)^{m - 2} \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {p_{i_1 }^{n - 1} } } \right\} \\ &=& \sum\limits_{i = 1}^m {p_i \left\{ {\frac{{(1 - p_i )^{m - 1} }}{{p_i }} - \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {\frac{{(1 - p_i - p_{i_1 } )^{m - 1} }}{{(p_i + p_{i_1 } )}}} + \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2\backslash i)} {\frac{{(1 - p_i - p_{i_1 } - p_{i_2 } )^{m - 1} }}{{(p_i + p_{i_1 } + p_{i_2 } )}}} } \right.} \\ &&\left. { + \cdots + ( - 1)^{m - 2} \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {\frac{{p_{i_1 }^{m - 1} }}{{(1 - p_{i_1 } )}}} } \right\} \\ \end{eqnarray*} $
$ \begin{eqnarray*} &=& \sum\limits_{i = 1}^m {(1 - p_i )^{m - 1} } - \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2)} {(1 - p_{i_1 } - p_{i_2 } )^{m - 1} } + \sum\limits_{{\bf{i}}_3 \in {\cal I}_m (3)} {(1 - p_{i_1 } - p_{i_2 } - p_{i_3 } )^{m - 1} } \\ &&+ \cdots + ( - 1)^{m - 2} \sum\limits_{i = 1}^m {p_i ^{m - 1} } \\ &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} S^{m - 1} ({\bf{p}}_m , j)} \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; {({\rm Property\; \; 2.4})}\\ &=& 1. \end{eqnarray*} $

This completes the proof.
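The probability function (3.1) is easy to evaluate numerically when $ m $ is small. The following R sketch (using an illustrative parameter vector, not taken from the paper) computes (3.1) by enumerating the index subsets and checks the regularity shown above by summing over a long range of $ n $:

    # MGe probability function (3.1), feasible for small m
    dmge <- function(n, p) {
      m <- length(p)
      if (n < m) return(0)
      sum(sapply(1:(m - 1), function(j) {
        (-1)^(j - 1) * sum(combn(m, j, function(idx) {
          s <- sum(p[idx]); s * (1 - s)^(n - 1)
        }))
      }))
    }

    p <- c(0.1, 0.2, 0.3, 0.4)
    probs <- sapply(4:500, dmge, p = p)
    sum(probs)            # should be close to 1 (regularity)
    sum((4:500) * probs)  # numerical value of E(Y), cf. formula (3.3) below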

If the stopping condition of the experiment is changed, we give the following two generalizations.

Corollary 3.1   Suppose the experiment is stopped when the $ r $ specified results $ A_1 , A_2 , \cdots , A_r $ have all appeared, and let $ V_r $ be the total number of trials, denoted $ V_r \sim MGe(p_1 , p_2 , \cdots , p_r ) $. The probability function of $ V_r $ is

$ P(V_r = n) = \sum\limits_{j = 1}^r {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_r (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)} } \left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{n - 1} , n = r, r + 1, \cdots . $

Corollary 3.2   Suppose the experiment is stopped when any $ r $ of the results have appeared, and let $ W_r $ be the total number of trials, denoted $ W_r \sim MGe({\bf{p}}_m , r) $. The probability function of $ W_r $ is

$ P(W_r = n) = \sum\limits_{{\bf{k}}_r \in {\cal I}_m (r)} {\sum\limits_{j = 1}^{r - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_r (j)} {\left( {\sum\limits_{l = 1}^j {p_{k_{i_l } } } } \right)} \left( {p_{0k_i } - \sum\limits_{l = 1}^j {p_{k_{i_l } } } } \right)^{n - 1} }, n = r, r + 1, \cdots , $

where $ p_{0k_i } = \sum\limits_{l = 1}^r {p_{k_{i_l } } } $.

According to Property 2.2, the probability function of $ Y \sim MGe({\bf{p}}_m ) $ has an upper bound. Let

$ \begin{eqnarray} u_m (n) = \left\{ \begin{array}{l} \sum\limits_{i = 1}^m {p_i (1 - p_i )^{n - 1} \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; , m \; \; {\rm is\; \; even} }, \\ \sum\limits_{i = 1}^m {\left( {p_i (1 - p_i )^{n - 1} - (1 - p_i )p_i^{n - 1} } \right), m \; \; {\rm is\; \; odd}}, \\ \end{array} \right. \end{eqnarray} $ (3.2)

then we have $ P\left( {Y = n} \right) < u_m (n) $, and $ P\left( {Y = n} \right) = u_m (n) + o(p^{n} ) $ for some $ 0 < p < 1 $.

Theorem 3.2   Suppose that $ Y \sim MGe({\bf{p}}_m ) $; the expectation and second order moment of $ Y $ are as follows:

$ \begin{eqnarray} E(Y) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } + ( - 1)^{m - 1} }, \end{eqnarray} $ (3.3)
$ \begin{eqnarray} E(Y^2 ) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 2} \left( {2 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)} + ( - 1)^{m - 1} }. \end{eqnarray} $ (3.4)

Proof   Let us first calculate the expectation

$ \begin{eqnarray*} E(Y) &=& \sum\limits_{n = m}^\infty {nP(Y = n)} = \sum\limits_{n = m}^\infty {\sum\limits_{i = 1}^m {np_i b^{n - 1} ({\bf{p}}_{m\backslash i} )} } \\ &=& \sum\limits_{i = 1}^m {p_i \sum\limits_{n = m}^\infty {\left\{ {n(1 - p_i )^{n - 1} - \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {n(1 - p_i - p_{i_1 } )^{n - 1} } + \sum\limits_{{\bf{i}}_2 \in {\cal I}_m (2\backslash i)} {n(1 - p_i - p_{i_1 } - p_{i_2 } )^{n - 1} } } \right.} } \\ && \left. { + \cdots + ( - 1)^{m - 2} \sum\limits_{{\bf{i}}_1 \in {\cal I}_m (1\backslash i)} {np_{i_1 }^{n - 1} } } \right\} \\ &=& \sum\limits_{i = 1}^m {p_i \left\{ {\sum\limits_{j = 0}^{m - 2} {( - 1)^j } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j\backslash i)} {\left( {p_i + \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 2} \left( {1 - p_i - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} \left[ {(m - 1)\left( {p_i + \sum\limits_{k = 1}^j {p_{i_k } } } \right) + 1} \right]} } \right\}} \\ &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} \left[ {(m - 1) + \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } \right]} \; \; \; \; \; \; ({\rm Property\; \; 2.2}) \\ &=& (m - 1)\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} } + \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } \\ &=& (m - 1) + \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } + \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\sum\limits_{l = 1}^{m - 1} {( - 1)^l } \binom{{m - 1}}{l}\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{l - 1} }. \end{eqnarray*} $

Let the third term in the above formula be

$ \begin{eqnarray*} Q &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\sum\limits_{l = 1}^{m - 1} {( - 1)^l \binom{{m - 1}}{l}\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{l - 1} } } \\ &=& \binom{{m - 1}}{1}\sum\limits_{j = 1}^{m - 1} {( - 1)^j \binom{m}{j} + \sum\limits_{l = 2}^{m - 1} {( - 1)^l \binom{{m - 1}}{l}\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} S^{l - 1} ({\bf{p}}_m , m - j)} } } \\ &=& \left( {m - 1} \right)\left( { - 1 - ( - 1)^m } \right) + ( - 1)^m \sum\limits_{l = 2}^{m - 1} {( - 1)^l \binom{{m - 1}}{l}}\\ &=& - \left( {m - 1} \right) + ( - 1)^{m - 1}. \end{eqnarray*} $

We can get the conclusion (3.3) by substituting $ Q $ into

$ E(Y) = (m - 1) + \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } + Q. $

Next, we calculate the second moment. Let

$ c^{ - h} \left( {{\bf{p}}_m } \right) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - h} } , h = 1, 2. $

From the above calculation of the expectation, we have the following conclusion:

$ \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } = c^{ - 1} \left( {{\bf{p}}_m } \right) - (m - 1) + ( - 1)^{m - 1}. $

The second moment can be obtained by the following calculation.

$ E(Y^2 ) = \sum\limits_{n = m}^\infty {n(n + 1)P(Y = n)} - E(Y) \\ = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left\{ {m(m + 1)\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} } \right.} \\ {\; \; \; }\;+ \left. {{\rm 2}\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^m \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 2} \left( {1 + m\sum\limits_{k = 1}^j {p_{i_k } } } \right)} \right\} \\ {\rm \; \; \; }\;- \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} \left[ {(m - 1) + \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } \right]} \\ =\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left\{ {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} (m - 1)^2 } \right.} + \left. {(2m - 3)\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} + 2\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 2} } \right\} \\ = (m - 1)^2 \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } S^{m - 1} ({\bf{p}}_m , j) \\ {\rm \; \; \; }\;+ (2m - 3)\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{m - 1} \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } \\ {\rm \; \; \; }\;+ 2\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\sum\limits_{l = 0}^{m - 1} {( - 1)^l \binom{{m - 1}}{l}\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{l - 2} } } \\ = (m - 1)^2 + (2m - 3)\left[ {c^{ - 1} ({\bf{p}}_m ) - (m - 1) + ( - 1)^m } \right] \\ {\rm \; \; \; }\;+ 2\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 2} - \binom{{m - 1}}{1}\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} + \binom{{m - 1}}{2}} \right)} \\ {\rm \; \; \; }\;+ 2\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\sum\limits_{l = 3}^{m - 1} {( - 1)^l \binom{{m - 1}}{l}\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{l - 2} } } \\ = - (m - 1)(m - 2) + (2m - 3)c^{ - 1} ({\bf{p}}_m ) + (2m - 3)( - 1)^m \\ {\rm \; \; \; }\;+ 2c^{ - 2} ({\bf{p}}_m ) - 2(m - 1)c^{ - 1} ({\bf{p}}_m ) + \binom{{m - 1}}{2}\sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} } \binom{m}{j} \\ {\rm \; \; \; }\;+ 2\sum\limits_{l = 3}^{m - 1} {( - 1)^l \binom{{m - 1}}{l}} \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} S^{l - 2} ({\bf{p}}_m , m - j)} \\ = 2c^{ - 2} ({\bf{p}}_m ) - c^{ - 1} ({\bf{p}}_m ) + 2( - 1)^m \left\{ {\sum\limits_{l = 2}^{m - 1} {( - 1)^l } \binom{{m - 1}}{l}} \right\} - (2m - 3)( - 1)^m \\ = 2c^{ - 2} ({\bf{p}}_m ) - c^{ - 1} ({\bf{p}}_m ) + ( - 1)^{m - 1}. $

For example, suppose $ Y^{(m)} \sim MGe({\bf{p}}_m ), m=2, 3, 4 $; then we have the following results:

$ \begin{align*} P( {Y^{(2)} = n} ) &= p^{n - 1} (1 - p) + (1 - p)^{n - 1} p, n = 2, 3, \cdots ;\\ P( {Y^{(3)} = n} ) &= \sum\limits_{i = 1}^3 {( {p_i (1 - p_i )^{n - 1} - (1 - p_i )p_i^{n - 1} } )} , n = 3, 4, \cdots ; \\ P( {Y^{(4)} = n} ) &= \sum\limits_{i = 1}^4 {\left[ {p_i (1 - p_i )^{n - 1} + p_i^{n - 1} (1 - p_i )} \right]} - \sum\limits_{{\bf{i}}_2 \in {\cal I}_4 (2)} {( {p_{i_1 } + p_{i_2 } } )( {1 - p_{i_1 } - p_{i_2 } } )^{n - 1} } , n = 4, 5, \cdots ;\\ E( Y^{(2)} ) &= \frac{1}{{p(1 - p)}} - 1;\\ E( Y^{(3)} ) &= 1 + \sum\limits_{i = 1}^3 {( {\frac{1}{{p_i }} - \frac{1}{{1 - p_i }}} )} ; \\ E( Y^{(4)} ) &= \sum\limits_{i = 1}^4 {( {\frac{1}{{p_i }} + \frac{1}{{1 - p_i }}} )} - \sum\limits_{{\bf{i}}_2 \in {\cal I}_4 (2)} {\frac{1}{{p_{i_1 } + p_{i_2 } }}} - 1 . \end{align*} $
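These closed forms can be checked by simulation. The following R sketch (with illustrative probabilities, not from the paper) simulates the $ m = 3 $ collection process and compares the empirical mean of $ Y^{(3)} $ with the closed form given above:

    # Monte Carlo check of E(Y^(3)) for an illustrative p
    set.seed(1)
    p <- c(0.2, 0.3, 0.5)
    collect_once <- function(p) {
      seen <- rep(FALSE, length(p)); n <- 0
      while (!all(seen)) {             # keep sampling until all results appear
        n <- n + 1
        seen[sample(length(p), 1, prob = p)] <- TRUE
      }
      n
    }
    mean(replicate(1e4, collect_once(p)))   # empirical mean of Y^(3)
    1 + sum(1 / p - 1 / (1 - p))            # closed form E(Y^(3))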
4 Uniform Multinomial Geometric Distribution

In this section, we mainly discuss a special case of the MGe distribution. When the parameter vector $ {\bf{p}}_m = \left( {p_1 , p_2 , \cdots , p_m } \right) $ satisfies $ p_i = \dfrac{1}{m}, i = 1, 2, \cdots , m $, we denote $ MGe\left( {{\bf{p}}_m } \right) $ by $ MGe\left( m \right) $ and call it the uniform multinomial geometric distribution (UMGe).

Theorem 4.1   Suppose $ Y $ follows the $ {\rm UMGe} $ distribution, denoted $ Y \sim MGe\left( m \right) $; then the probability function of $ Y $ is

$ \begin{eqnarray} P\left( {Y = n} \right) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \binom{{m - 1}}{{j - 1}}\left( {1 - \frac{j}{m}} \right)^{n - 1} } , n = m, m + 1, \cdots. \end{eqnarray} $ (4.1)

Proof   From the condition $ p_i = \dfrac{1}{m}, i = 1, 2, \cdots , m $, we can get

$ \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {( - 1)^{j - 1} \left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)\left( {1 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)^{n - 1} } = ( - 1)^{j - 1} \binom{m}{j}\left( {\frac{j}{m}} \right)\left( {1 - \frac{j}{m}} \right)^{n - 1}. $

According to equation (3.1), we have

$ P\left( {Y = n} \right) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \binom{{m - 1}}{{j - 1}}\left( {1 - \frac{j}{m}} \right)^{n - 1} } , n = m, m + 1, \cdots. $
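The alternating sum (4.1) involves only $ m - 1 $ terms, so it is cheap to evaluate. A short R sketch (with an illustrative truncation of the support, not part of the proof) is:

    # UMGe probability function (4.1) and a numerical regularity check
    dumge <- function(n, m) {
      if (n < m) return(0)
      j <- 1:(m - 1)
      sum((-1)^(j - 1) * choose(m - 1, j - 1) * (1 - j / m)^(n - 1))
    }
    m <- 10
    probs <- sapply(m:2000, dumge, m = m)   # truncate far into the tail
    sum(probs)                              # should be close to 1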

To obtain the expectation and variance of the UMGe distribution, we first give the following three combinatorial formulas.

Property 4.1   The following three combinatorial formulas hold for any positive integer $ m \in \mathbb{Z}^ + $.

$ \begin{eqnarray} r_1 (m) = \sum\limits_{j = 1}^m {( - 1)^{j - 1} \frac{1}{j}} \binom{m}{j} = \sum\limits_{j = 1}^m {\frac{1}{j}}. \end{eqnarray} $ (4.2)
$ \begin{eqnarray} r_2 (m) = \sum\limits_{j = 0}^m {( - 1)^j \frac{1}{{\left( {j + 1} \right)^2 }}\binom{m}{j}} = \frac{1}{{m + 1}}\sum\limits_{j = 1}^{m + 1} {\frac{1}{j}} . \end{eqnarray} $ (4.3)
$ \begin{eqnarray} \sum\limits_{j = 1}^m {( - 1)^{j - 1} \binom{m}{j}\left( {\frac{m}{j}} \right)^2 } = \frac{{m^2 }}{2}\left\{ {\left( {\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 + \sum\limits_{j = 1}^m {\frac{1}{{j^2 }}} } \right\}. \end{eqnarray} $ (4.4)

Proof   For formula (4.2), note that $ r_1 (m - 1) = \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{1}{j}\binom{{m - 1}}{j}} $ and

$ \begin{eqnarray*} r_1 (m) &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{1}{j}} \binom{m}{j} + ( - 1)^{m - 1} \frac{1}{m} \\ &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{1}{j}\binom{{m - 1}}{j}} + \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{1}{j}\binom{{m - 1}}{{j - 1}}} + ( - 1)^{m - 1} \frac{1}{m} \\ &=& r_1 (m - 1) + \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{1}{j}\frac{j}{m}} \binom{m}{j} + ( - 1)^{m - 1} \frac{1}{m} \\ &=& r_1 (m - 1) + \frac{1}{m}\left\{ {\left[ { - 1 + \sum\limits_{j = 1}^m {( - 1)^{j - 1} \binom{m}{j}} } \right] + 1} \right\} \\ &=& r_1 (m - 1) + \frac{1}{m}. \end{eqnarray*} $

So we have $ r_1 (m) = r_1 (1) + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m} = \sum\limits_{j = 1}^m {\frac{1}{j}} $.

For the formula (4.3), let $ R_2 (x) = \sum\limits_{j = 0}^m {( - 1)^j \frac{1}{{\left( {j + 1} \right)^2 }}\binom{m}{j}} e^{(j + 1)x}, $ so the second derivative of $ R_2 (x) $ is

$ R''_2 (x) = \sum\limits_{j = 0}^m {( - 1)^j \binom{m}{j}} e^{(j + 1)x} = e^x (1 - e^x )^m . $
$ \begin{eqnarray*} R_2 (x) = \int {\left( {\int {e^x (1 - e^x )^m dx} } \right)} dx = - \frac{1}{{m + 1}}\left\{ {x + \sum\limits_{j = 1}^{m + 1} {( - 1)^j \frac{1}{j}\binom{{m + 1}}{j}e^{jx} } } \right\} + C_1 x + C_2 , \end{eqnarray*} $

where $ C_1 , C_2 $ are undetermined constants. Then, from $ R_2 (0) = r_2 (m) $, $ r_2 (1) = 1 $, and equation (4.2), we have

$ r_2 (m) = \frac{1}{{m + 1}}\sum\limits_{j = 1}^{m + 1} {\frac{1}{j}} . $

Let us prove the equation (4.4) by mathematical induction. Let

$ f_1 (m) = \sum\limits_{j = 1}^m {( - 1)^{j - 1} \binom{m}{j}\left( {\frac{m}{j}} \right)^2 } , \; \; f_2 (m) = \frac{{m^2 }}{2}\left\{ {\left( {\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 + \sum\limits_{j = 1}^m {\frac{1}{{j^2 }}} } \right\}. $

When $ m=1 $, $ f_1 (1) = f_2 (1) = 1 $, so equation (4.4) holds.

If for a given $ m \in \mathbb{Z}^ + $, $ f_1 (m) = f_2 (m) $ holds, then we prove that the conclusion is also true for $ m + 1 $.

$ \begin{eqnarray*} f_1 (m + 1) &=& \sum\limits_{j = 1}^{m + 1} {( - 1)^{j - 1} \binom{{m + 1}}{j}\left( {\frac{{m + 1}}{j}} \right)^2 } \\ &=& \sum\limits_{j = 1}^m {( - 1)^{j - 1} \binom{m}{j}\left( {\frac{{m + 1}}{j}} \right)^2 } + \sum\limits_{j = 1}^m {( - 1)^{j - 1} \binom{m}{{j - 1}}\left( {\frac{{m + 1}}{j}} \right)^2 } + ( - 1)^m \\ &=& \frac{{\left( {m + 1} \right)^2 }}{{m^2 }}f_1 (m) + \left( {m + 1} \right)^2 \left\{ {\frac{1}{{m + 1}}\sum\limits_{j = 1}^{m + 1} {\frac{1}{j}} - \frac{{\left( { - 1} \right)^m }}{{(m + 1)^2 }}} \right\} + ( - 1)^m \\ &=& \frac{{\left( {m + 1} \right)^2 }}{2}\left\{ {\left( {\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 + \sum\limits_{j = 1}^m {\frac{1}{{j^2 }}} } \right\} + \left( {m + 1} \right)^2 \left\{ {\frac{1}{{m + 1}}\sum\limits_{j = 1}^m {\frac{1}{j}} + \frac{1}{{(m + 1)^2 }}} \right\} \\ &=& \frac{{\left( {m + 1} \right)^2 }}{2}\left\{ {\left( {\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 + \frac{2}{{m + 1}}\sum\limits_{j = 1}^m {\frac{1}{j}} + \frac{1}{{(m + 1)^2 }} + \sum\limits_{j = 1}^m {\frac{1}{{j^2 }} + \frac{1}{{(m + 1)^2 }}} } \right\} \\ &=& \frac{{\left( {m + 1} \right)^2 }}{2}\left\{ {\left( {\sum\limits_{j = 1}^{m + 1} {\frac{1}{j}} } \right)^2 + \sum\limits_{j = 1}^{m + 1} {\frac{1}{{j^2 }}} } \right\} \\ &=& f_2 (m + 1) . \end{eqnarray*} $

So the equation holds for any $ m \in \mathbb{Z}^ + $.

Theorem 4.2   Suppose $ Y \sim MGe\left( m \right) $; the expectation and variance of $ Y $ are as follows:

$ \begin{eqnarray} E\left( Y \right) = \sum\limits_{j = 1}^m {\frac{m}{j}}, \end{eqnarray} $ (4.5)
$ \begin{eqnarray} {\rm{Var(}}Y{\rm{)}} = \sum\limits_{j = 1}^m {\left( {\frac{m}{j}} \right)^2 \left( {1 - \frac{j}{m}} \right)}. \end{eqnarray} $ (4.6)

Proof   According to Theorem 3.2, equation (4.2), and the condition $ {\bf{p}}_m = \left( {p_1 , p_2 , \cdots , p_m } \right), p_i = \dfrac{1}{m}, i = 1, 2, \cdots , m $, we have

$ \begin{eqnarray*} E(Y) &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 1} } + ( - 1)^{m - 1} } \\ &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \frac{m}{j}\binom{m}{j}} + ( - 1)^{m - 1} \\ &=& m\sum\limits_{j = 1}^m {( - 1)^{j - 1} \frac{1}{j}\binom{m}{j}} \\ &=& m\sum\limits_{j = 1}^m {\frac{1}{j}}.\; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; ({\rm Property \; \; 4.1 \; \; (4.2)}) \end{eqnarray*} $

The variance of $ Y $,

$ \begin{eqnarray*} {\rm{Var(}}Y{\rm{)}} &=& E(Y^2 ) - \left[ {E(Y)} \right]^2 \\ &=& \sum\limits_{j = 1}^{m - 1} {( - 1)^{j - 1} \sum\limits_{{\bf{i}}_j \in {\cal I}_m (j)} {\left( {\sum\limits_{k = 1}^j {p_{i_k } } } \right)^{ - 2} \left( {2 - \sum\limits_{k = 1}^j {p_{i_k } } } \right)} + ( - 1)^{m - 1} } - \left( {m\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 \\ &=& 2\sum\limits_{j = 1}^m {( - 1)^{j - 1} \binom{m}{j}\left( {\frac{m}{j}} \right)^2 } - m\sum\limits_{j = 1}^m {\frac{1}{j}} - \left( {m\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 \; \; \; \; \; \; \; \; \; \; ({\rm Property \; \; 4.1 \; \; (4.2)}) \\ &=& m^2 \left\{ {\left( {\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 + \sum\limits_{j = 1}^m {\frac{1}{{j^2 }}} } \right\} - m\sum\limits_{j = 1}^m {\frac{1}{j}} - \left( {m\sum\limits_{j = 1}^m {\frac{1}{j}} } \right)^2 \; \; \; \; \; \; \; \; \; \; ({\rm Property \; \; 4.1 \; \; (4.4)}) \\ &=& \sum\limits_{j = 1}^m {\left( {\frac{m}{j}} \right)^2 \left( {1 - \frac{j}{m}} \right)} . \end{eqnarray*} $

There is another explanation for the expectation and variance of the UMGe distribution: let the random variable $ X_i , i = 1, 2, \cdots , m $ denote the number of trials carried out from the appearance of the $ (i-1) $-th new result until the "$ i $-th new result" appears. Then $ X_i \sim Ge\left( {1 - \frac{{i - 1}}{m}} \right), i = 1, 2, \cdots , m $. The total number of trials is $ Y = \sum\limits_{i = 1}^m {X_i } $, and it can be calculated that

$ \begin{eqnarray*} E\left( Y \right) &=& \sum\limits_{i = 1}^m {E\left( {X_i } \right)} = 1 + \frac{m}{{m - 1}} + \frac{m}{{m - 2}} + \cdots + m , \\ {\rm{Var}}\left( Y \right) &=& \sum\limits_{i = 1}^m {{\rm{Var}}\left( {X_i } \right)} = 0 + \frac{1}{m}\left( {\frac{m}{{m - 1}}} \right)^2 + \frac{2}{m}\left( {\frac{m}{{m - 2}}} \right)^2 + \cdots + \frac{{m - 1}}{m}\left( {\frac{m}{1}} \right)^2. \end{eqnarray*} $

This is consistent with the formulas we have deduced. According to this idea, we can generate random numbers that follow the UMGe distribution by summing random numbers drawn from the corresponding geometric distributions, as sketched below.
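A minimal R sketch of this generator follows; note that rgeom() in R counts the failures before the first success, so 1 is added at each of the $ m $ stages (this adjustment is our implementation assumption, not stated in the paper):

    # Generate UMGe(m) variates as a sum of stage-wise geometric waiting times
    rumge <- function(nsim, m) {
      probs <- 1 - (0:(m - 1)) / m          # success probabilities 1, (m-1)/m, ..., 1/m
      replicate(nsim, sum(rgeom(m, prob = probs) + 1))
    }
    set.seed(1)
    y <- rumge(1e4, m = 10)
    c(mean(y), sum(10 / (1:10)))                          # compare with E(Y) in (4.5)
    c(var(y),  sum((10 / (1:10))^2 * (1 - (1:10) / 10)))  # compare with Var(Y) in (4.6)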

5 Simulation Example

In this section, we consider the probability distribution function (pdf) and cumulative distribution function (cdf) of the MGe distribution. Let $ \Phi (y) $ denote the cumulative distribution function of the standard normal distribution.

Example 1   Suppose $ Y^{(m)} \sim MGe({\bf{p}}_m ) $, where $ {\bf{p}}_m = \left( {p_1 , p_2 , \cdots , p_m } \right) $. The parameters are generated as follows

$ p_i = \frac{{2i}}{{m(m + 1)}}, i = 1, 2, \cdots , m. $

Let $ \mu _m = E\left( {Y^{(m)} } \right) $ and $ \sigma _m^2 = {\rm{Var}}\left( {Y^{(m)} } \right) $. For the discretization of the normal distribution, let $ \varphi (y) = \Phi \left[ {(y - \mu _m )/\sigma _m } \right] - \Phi \left[ {(y - \mu _m - 1)/\sigma _m } \right] $ and $ p(y) = P(Y^{(m)} = y) $, and let $ F(y) = P(Y^{(m)} \le y) $ be the $ \rm cdf $ of $ Y^{(m)} $.

Consider four situations with $ m = 5, 10, 15, 20 $. The expectations and variances of the random variable are calculated as

$ \begin{array}{l} \mu _5 = 18.67, \; \; \; \mu _{10} = 68.98, \; \; \; \mu _{15} = 150.61, \; \; \; \; \mu _{20} = 263.58; \\ \sigma _5^2 = 169.57, \; \sigma _{10}^2 = 2420.43, \; \sigma _{15}^2 = 11680.48, \; \sigma _{20}^2 = 35959.53. \\ \end{array} $
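These moments can be reproduced from (3.3) and (3.4) by enumerating the index subsets; the R sketch below does this for $ m = 5 $ (the same enumeration works for the other cases, at a cost of roughly $ 2^m $ subset sums):

    # Moments of Y^(m) ~ MGe(p_m) from (3.3) and (3.4), with p_i = 2i / (m(m+1))
    mge_moments <- function(p) {
      m <- length(p)
      c1 <- c2 <- 0
      for (j in 1:(m - 1)) {
        s <- combn(m, j, function(idx) sum(p[idx]))   # subset sums of size j
        c1 <- c1 + (-1)^(j - 1) * sum(1 / s)
        c2 <- c2 + (-1)^(j - 1) * sum((2 - s) / s^2)
      }
      EY  <- c1 + (-1)^(m - 1)
      EY2 <- c2 + (-1)^(m - 1)
      c(mean = EY, var = EY2 - EY^2)
    }
    m <- 5
    mge_moments(2 * (1:m) / (m * (m + 1)))   # should reproduce mu_5 and sigma_5^2 above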

We plot the pdf of $ Y^{(m)} $, the upper bound $ u_n = u_m (n) $ of the pdf of $ Y^{(m)} $ given by formula (3.2), and the scatter plot of the discretized normal distribution $ N(\mu_m, \sigma_m^2) $. The plots for the four situations are shown in Figure 1.

Figure 1 The pdf, cdf, upper bound and normal approximation of MGe distribution in $m = 5, 10, 15, 20$ situations

From the plots, the normal distribution is not a good approximation of the MGe distribution; however, there are two intersections between the density curve of the normal distribution and the upper bound $ u_n $ of the MGe pdf.

The first intersection is near the expectation, so we can calculate the probability exactly, $ p(y) = P(Y^{(m)} = y) $, when $ y < \mu _m $, and use the upper bound in place of the probability when $ y > \mu _m $, that is, $ p(y) \approx u_m (y) $.

In addition, the pdf of the MGe distribution is always right-skewed and cannot be fitted well by a normal distribution. Because of the complexity of the MGe pdf, how to obtain a good approximate distribution is an important problem for further study.

Example 2   Suppose $ Y_m \sim MGe(m) $, and we still use the notation of Example 1. For $ m = 10, 30, 50, 70 $, the expectations and variances are, respectively,

$ \begin{array}{l} \mu _{10} = 29.29, \; \; \; \mu _{30} = 119.85, \; \; \; \; \mu _{50} = 224.96, \; \; \; \mu _{70} = 338.29; \\ \sigma _{10}^2 = 125.69, \; \; \sigma _{30}^2 = 1331.09, \; \; \sigma _{50}^2 = 3837.87, \; \; \sigma _{70}^2 = 7652.38. \\ \end{array} $

We plot the pdf of $ Y_m $, the upper bound of its pdf, and the scatter plot of the discretized normal distribution $ N(\mu_m, \sigma_m^2) $. The plots for the four situations are shown in Figure 2.

Figure 2 The pdf, cdf, upper bound and normal approximation of UMGe distribution in $m = 10, 30, 50, 70 $ situations

It can be seen that the normal approximation to the UMGe distribution is better than that to the MGe distribution. We define the quantile of the UMGe distribution as $ q_p (m) $, which satisfies

$ q_p (m)=\inf \{y: P\left( {Y_m \le y} \right) \ge p \}. $
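The quantiles in Table 1 can be computed by accumulating the UMGe probabilities (4.1) until the cdf first reaches $ p $. A sketch of this calculation in R (not the authors' code) is:

    # Quantile q_p(m) of UMGe(m) from the probability function (4.1)
    qumge <- function(p, m) {
      pmf <- function(n) {
        j <- 1:(m - 1)
        sum((-1)^(j - 1) * choose(m - 1, j - 1) * (1 - j / m)^(n - 1))
      }
      n <- m; cdf <- pmf(n)
      while (cdf < p) {        # accumulate the pmf until the cdf reaches p
        n <- n + 1
        cdf <- cdf + pmf(n)
      }
      n
    }
    qumge(0.95, 10)   # the 0.95 quantile of Y_10, cf. Table 1 and Section 5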

We calculate the expectation of each $ Y_m $ for $ m = 3, 4, \cdots , 40 $ (columns 7 and 15 of Table 1) and the variance of each $ Y_m $ (columns 8 and 16 of Table 1); the quantiles for $ p = 0.25, 0.5, 0.75, 0.9, 0.95 $ are also calculated and listed in the corresponding columns of Table 1.

Table 1
The quantile, expectation and variance of UMGe

We calculate the quantiles of $ Y_m $ at $ p = 0.25, 0.5, 0.75, 0.9, 0.95 $ for $ m = 3, 4, \cdots, 40 $. The scatter plot of the quantiles and the fitted curves of the quadratic regression model, with $ m $ on the horizontal axis and $ q_p (m) $ on the vertical axis, are shown in Figure 3(a).

Figure 3 (a) Quantile $ q_p(m) $ for $ p $ = 0.25, 0.5, 0.75, 0.9, 0.95; (b) Empirical regression surface of $ \hat q(p, m) $ and scatter of $ (m, p, q(p, m)) $.

From the data in Table 1 and the trend of the quantile $ q_p (m) $ in Figure 3(a), we can posit the quadratic regression model of the quantile $ q_p (m) $ on $ m $ as

$ \begin{eqnarray} \hat q_p (m) = \hat \beta _p^{(0)} + \hat \beta _p^{(1)} m + \hat \beta _p^{(2)} m^2 \end{eqnarray} $ (5.1)

The regression models established according to (5.1) are highly significant for $ p = 0.25, 0.5, 0.75, 0.9, 0.95 $, and the regression coefficients are shown in Table 2.

Table 2
The quantile fitting curve coefficient of UMGe

Furthermore, we calculate all quantile values for $ m=3, 4, \cdots , 100 $ and $ p=0.05, 0.06, \cdots, 0.97 $; after regression analysis, the empirical regression equation is

$ \begin{eqnarray} \hat q(p, m) = - 13.42552 + 2.83025m + 0.01111m^2 + 0.86067mp + 2.82063mp^2. \end{eqnarray} $ (5.2)

The surface of (5.2) and the scatter plot of $ \left( {m, p, q(p, m)} \right) $ are shown in Figure 3(b). A sketch of how such models can be fitted is given below.
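The following R sketch shows how models of the forms (5.1) and (5.2) can be fitted with lm(); the data frame qtab is a hypothetical container for the quantiles computed as above, with columns m, p and q (its name and layout are our assumptions, not from the paper):

    # Fitting the polynomial regression models (5.1) and (5.2)
    # qtab: hypothetical data frame with columns m, p, q of computed quantiles
    fit_51 <- lm(q ~ m + I(m^2), data = subset(qtab, p == 0.95))       # model (5.1), p = 0.95
    fit_52 <- lm(q ~ m + I(m^2) + I(m * p) + I(m * p^2), data = qtab)  # model (5.2)
    coef(fit_52)                                             # compare with (5.2)
    predict(fit_52, newdata = data.frame(m = 10, p = 0.95))  # cf. the estimate 49.62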

For example, the 0.95 quantile of $ Y_{10} $ can be obtained from (5.1) as

$ \hat q_{0.95} (10) = \hat \beta _{0.95}^{(0)} + \hat \beta _{0.95}^{(1)} \times 10 + \hat \beta _{0.95}^{(2)} \times 10^2 \approx 49.15. $

Using (5.2), the estimate is

$ \hat q(10, 0.95) \approx 49.62044. $

The quantile calculated by (5.1) is more accurate than that by (5.2); however, in many cases (5.2) is more convenient to use. When $ m $ is large, the estimated value is often smaller than the true value, so a nonparametric regression model could be built from the calculated data to make the calculation more accurate.

6 Discussion

In this paper, the multinomial geometric distribution is discussed under sampling with replacement, which means that the probability of occurrence of event $ A_i $ is constant in each trial. If the sampling conditions are changed so that samples are drawn one by one from a finite population without replacement, the distribution of the total number of trials until several specified results occur will be different. In many cases, the upper bound approximation and the normal approximation can be considered.

References
[1] Ahmed Z Afify, Zohdy M Nofal, Nadeem Shafique Butt. Transmuted complementary Weibull geometric distribution[J]. Pakistan Journal of Statistics & Operation Research, 2014, 10(4): 435-454.
[2] Jayakumar K, Babu M G. Discrete Weibull geometric distribution and its properties[J]. Communications in Statistics-Theory and Methods, 2018, 47(7): 1767-1783. DOI:10.1080/03610926.2017.1327074
[3] Kemp, Adrienne. The q-beta-geometric distribution as a model for fecundability[J]. Communications in Statistics: Theory and Methods, 2001, 30(11): 2373-2384. DOI:10.1081/STA-100107692
[4] Li G H, Zhang C Q. The pseudo component transformation design for experiment with mixture[J]. Statistics & Probability Letters, 2017, 131: 19-24.
[5] Miller F P, Vandome A F, Mcbrewster J. Geometric distribution[M]. New York: Springer, 2008.
[6] Muwafi A N P A A. Waiting for the kth consecutive success and the Fibonacci sequence of order k[J]. Fibonacci Quarterly, 1980, 20(1): 28-32.
[7] Pedro L Ramos, Fernando A Moala, Jorge A Achcar. Objective priors for estimation of extended exponential geometric distribution[J]. Journal of Modern Applied Statistical Methods, 2014, 13(2): 226-243. DOI:10.22237/jmasm/1414815060
[8] Philippou A N, Georghiou C, Philippou G N. A generalized geometric distribution and some of its properties[J]. Statistics & Probability Letters, 1983, 1(4): 171-175.
[9] Porwal S. Generalized distribution and its geometric properties associated with univalent functions[J]. Journal of Complex Analysis, 2018, 2018: 1-5.
[10] Xiao S J, Li G H, Yang W Z. A kind of generalized geometric distribution (in Chinese)[J]. Jilin Normal University Journal, 2015, 36(2): 43-50.