Let $\mathscr{P}=\{1, \cdots, N\}$ denote a finite population of $N$ identifiable units, where $N$ is known. Associated with the $i$th unit of $\mathscr{P}$, there are $p+1$ quantities: $y_{i}, x_{i1}, \cdots, x_{ip}$, where all but $y_{i}$ are known, $i=1, \cdots , N$. Let $y=(y_{1}, \cdots, y_{N})'$ and $X=(X_{1}, \cdots, X_{N})'$, where $X_{i}=(x_{i1}, \cdots, x_{ip})'$, $i=1, \cdots , N$. Relating the two sets of variables, we consider the linear model
where $\beta$ is a $p\times 1$ unknown parameter vector, $V$ is a known symmetric positive definite matrix, and the parameter $\sigma^{2}>0$ is unknown.
For the superpopulation model (1.1), it is of interest to study the optimal prediction of population quantities $\theta(y)$ such as the population total $T=\sum\limits_{i=1}^{N}y_{i}$, the population variance $S_{y}^{2}=\sum\limits_{i=1}^{N}(y_{i}-\bar{y}_{N})^{2}/N$, where $\bar{y}_{N}=T/N$ is the population mean, and the finite population regression coefficient $\beta_{N}=(X'V^{-1}X)^{-1}X'V^{-1}y$. Many predictors of such population quantities have been proposed in the literature. For example, Bolfarine and Rodrigues [1] gave the simple projection predictor and obtained necessary and sufficient conditions for it to be optimal. Bolfarine et al. [2] studied the best unbiased prediction of the finite population regression coefficient under the generalized prediction mean squared error in several kinds of models. Xu et al. [3] obtained a class of optimal predictors of linear predictable functions and derived necessary and sufficient conditions for a linear predictor to be optimal under matrix loss. Xu and Yu [4] further gave admissible predictors in superpopulation models with random regression coefficients under matrix loss. Hu and Peng [5] obtained conditions for a linear predictor to be admissible in superpopulation models with and without the assumption that the underlying distribution is normal. Furthermore, Hu et al. [6-7] discussed linear minimax prediction in multivariate normal populations and Gauss-Markov populations, respectively; their results showed that the linear minimax predictor of the finite population regression coefficient is admissible under certain conditions. Bolfarine and Zacks [8] studied Bayes and minimax prediction under squared error loss in a finite population with a single-parameter prior. Meanwhile, Bansal and Aggarwal [9-11] considered Bayes prediction of the finite population regression coefficient using a balanced loss function under the same prior information. Two features of these studies stand out.
On the one hand, the optimal, admissible linear and minimax predictors above are derived from statistical decision theory, which uses only the sample information and the loss function and ignores prior information, even though such prior information is usually available in practice.
On the other hand, the Bayes predictions were obtained using prior information on a single parameter; the multi-parameter case was not considered. In other words, only the prior information on the regression coefficient was used, not the prior information on the error variance in model (1.1). In practice, however, multi-parameter situations are frequently encountered. Therefore, in this paper we study Bayes prediction of linear and quadratic quantities in a finite population in which the regression coefficient and the error variance have a normal inverse-Gamma prior.
Assume that the prior distribution of $(\beta, \sigma^{2})$ is the normal inverse-Gamma distribution, that is,
where $\mu$ is a known $p\times 1$ vector, $\alpha$ and $\lambda$ are known constants, and $k^{-1}$ is the ratio of the prior variance of $\beta$ to the sampling variance in model (1.1); we assume that $k^{-1}$ is known from experience or expert knowledge. Therefore, the joint prior density of $(\beta, \sigma^{2})$ is
where $M_{1}=(\frac{k}{2\pi})^{\frac{p}{2}}(\frac{\lambda}{2})^{\frac{\alpha}{2}}[\Gamma(\frac{\alpha}{2})]^{-1}$. The Bayes model defined by (1.1) and (1.2) is denoted by (1.4). In order to obtain Bayes predictions in the Bayes model (1.4), a sample $\mathscr{S}$ of size $n$ is selected from $\mathscr{P}$ according to some specified sampling plan. Let $\mathscr{R}=\mathscr{P}-\mathscr{S}$ be the unobserved part of $\mathscr{P}$, of size $N-n$. After the sample $\mathscr{S}$ has been selected, we may reorder the elements of $y$ so that $y$, $X$ and $V$ have the corresponding partitions
where $X$ and $X_{s}$ are known matrices of full column rank.
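For later computations, the partition above can be mirrored directly in code. The following Python sketch is only an illustration (the variable names and the index-based interface are ours): it reorders a population so that the $n$ sampled units come first and extracts the blocks $y_{s}$, $X_{s}$, $X_{r}$, $V_{s}$, $V_{r}$ and $V_{rs}$ used throughout the paper.
\begin{verbatim}
import numpy as np

def partition_population(y, X, V, sample_idx):
    """Return the blocks y_s, y_r, X_s, X_r, V_s, V_r, V_rs of a finite
    population, with the n sampled units listed first.

    y : (N,) response vector (y_r is unobserved in practice; it is carried
        here only to display the block structure)
    X : (N, p) known design matrix of full column rank
    V : (N, N) known symmetric positive definite matrix
    sample_idx : indices of the n sampled units
    """
    N = len(y)
    s = np.asarray(sample_idx)
    r = np.setdiff1d(np.arange(N), s)     # non-sampled units
    y_s, y_r = y[s], y[r]
    X_s, X_r = X[s, :], X[r, :]
    V_s = V[np.ix_(s, s)]                 # n x n block
    V_r = V[np.ix_(r, r)]                 # (N-n) x (N-n) block
    V_rs = V[np.ix_(r, s)]                # covariance block between y_r and y_s
    return y_s, y_r, X_s, X_r, V_s, V_r, V_rs
\end{verbatim}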
The rest of this paper is organized as follows: in Section 2, we give the Bayes predictor of population quantities in the Bayes model (1.4). Section 3 is devoted to Bayes prediction of linear quantities. In Section 4, we obtain Bayes predictions of quadratic quantities. Some examples are given in Section 5. Concluding remarks are placed in Section 6.
In this section, we discuss the Bayes prediction of population quantities. Let $L(\hat{\theta}(y_{s}), \theta(y))$ be a loss function for predicting $\theta(y)$ by $\hat{\theta}(y_{s})$. The corresponding Bayes prediction risk of $\hat{\theta}(y_{s})$ in model (1.4) is defined as $\rho(\hat{\theta}(y_{s}), \theta(y))=E_{y}[L(\hat{\theta}(y_{s}), \theta(y))]$, where the expectation $E_{y}$ is taken with respect to the joint distribution of $y$ and $(\beta, \sigma^{2})$. The Bayes predictor is the one minimizing the Bayes prediction risk $\rho(\hat{\theta}(y_{s}), \theta(y))$. In particular, under squared error loss the Bayes prediction of $\theta(y)$ is
and the Bayes prediction risk is
where the expectation $E_{y_{s}}$ is taken with respect to the joint distribution of $y_{s}$ and $(\beta, \sigma^{2})$. It is noted that $y_{s}|\beta, \sigma^{2}\sim N_{n}(X_{s}\beta, \sigma^{2}V_{s})$ and
This, together with eq. (1.3), yields the following results.
Theorem 2.1 Under the Bayes model (1.4), the following results hold.
(ⅰ) The joint posterior probability density of $(\beta, \sigma^{2})$ is
(ⅱ) The marginal posterior distribution of $\beta$ is $p$-dimensional $t$ distribution $MT_{p}(\tilde{\beta}_{s}, $ $ \frac{c_{0}\Sigma}{n+\alpha}, n+\alpha)$ with probability density
(ⅲ) The marginal posterior distribution of $\sigma^{2}$ is $\Gamma^{-1}(\frac{n+\alpha}{2}, \frac{c_{0}}{2})$ with probability density
(ⅳ) The Bayes predictive distribution of $y_{r}$ given $y_{s}$ is the $(N-n)$-dimensional $t$ distribution $MT_{N-n}(\tilde{y}_{r}, \frac{c_{0}U}{n+\alpha}, n+\alpha)$ with probability density
where
Proof The proof of (ⅰ): since
and $y_{s}|\beta, \sigma^{2}\sim N_{n}(X_{s}\beta, \sigma^{2}V_{s})$, the conditional probability density of $y_{s}$ given $(\beta, \sigma^{2})$ is
This together with eq. (1.3) will yield that the joint posterior probability density of $(\beta, \sigma^{2})$ is
where $m(y_{s})$ is the marginal probability density of $y_{s}$ and the symbol $\propto$ denotes proportionality. Multiplying eq. (2.3) by the normalizing constant $M_{2}|\Sigma|^{-\frac{1}{2}}$, we obtain result (ⅰ).
The proof of (ⅱ): integrating eq. (2.2) with respect to $\sigma^{2}$, we have
which implies that the marginal posterior distribution of $\beta$ is the $p$-dimensional $t$ distribution with mean vector $\tilde{\beta}_{s}$, scale matrix $\frac{c_{0}\Sigma}{n+\alpha}$ and $n+\alpha$ degrees of freedom.
The proof of (ⅲ): integrating eq. (2.2) with respect to $\beta$ yields the result; the details are omitted.
The proof of (ⅳ): by $y_{s}|\beta, \sigma^{2}\sim N_{n}(X_{s}\beta, \sigma^{2}V_{s})$, $y_{r}|\beta, \sigma^{2}, y_{s}\sim N_{N-n}(X_{r}\beta+V_{rs}V_{s}^{-1}(y_{s}-X_{s}\beta), \sigma^{2}(V_{r}-V_{rs}V_{s}^{-1}V_{sr}))$, and eq. (2.2), we know that
Multiplying eq. (2.3) by the normalizing constant and integrating with respect to $\beta$ and $\sigma^{2}$, respectively, we obtain the result.
In order to obtain Bayes prediction of $\theta(y)$, we consider the squared error loss
then the Bayes prediction of $\theta(y)$ is
and Bayes prediction risk is
where the expectation $E_{y_{s}}$ is taken with respect to the joint distribution of $y_{s}$ and $(\beta, \sigma^{2})$. By result (ⅳ) of Theorem 2.1, we know
and
Now, let $\theta(y)=Qy$ be any linear quantity, where $Q=(Q_{s}', Q_{r}')$ is a known $1\times N$ vector. According to Theorem 2.1, eqs. (3.4) and (3.5), we have the following conclusions.
Theorem 3.1 Under model (1.4) and the squared error loss function, the Bayes predictor of the linear quantity $Qy$ is $\tilde{\theta}(y_{s})=Q_{s}'y_{s}+Q_{r}'\tilde{y}_{r}$, and its Bayes prediction risk is $\frac{E_{y_{s}}(c_{0})}{n+\alpha-2}Q_{r}'UQ_{r}$.
As is well known, the best linear unbiased predictor of $Qy$ under squared error loss is $\hat{\theta}(y_{s})=Q_{s}'y_{s}+Q_{r}'\hat{y}_{r}$, where $\hat{y}_{r}=X_{r}\hat{\beta}_{s}+V_{rs}V_{s}^{-1}(y_{s}-X_{s}\hat{\beta}_{s})$. In the following, we compare the Bayes predictor with the best linear unbiased predictor under the predictive mean squared error (PMSE), which is defined by ${\rm PMSE}(d(y_{s}), Qy)=E[(d(y_{s})-Qy)^{2}].$
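Theorem 3.1 and the best linear unbiased predictor above amount to a few lines of linear algebra. The following sketch is illustrative only: it assumes the standard conjugate forms $\Sigma=(X_{s}'V_{s}^{-1}X_{s}+kI_{p})^{-1}$ and $\tilde{\beta}_{s}=\Sigma(k\mu+X_{s}'V_{s}^{-1}y_{s})$, which are consistent with the relation $\tilde{\beta}_{s}=\hat{\beta}_{s}-k\Sigma(\hat{\beta}_{s}-\mu)$ used in the proof of Theorem 3.2 and with Example 3.1 below, but the displayed equations themselves are not reproduced here; the function and variable names are ours.
\begin{verbatim}
import numpy as np

def bayes_and_blup_of_Qy(y_s, X_s, X_r, V_s, V_rs, Q_s, Q_r, mu, k):
    """Bayes predictor (Theorem 3.1) and BLUP of the linear quantity Qy.

    Assumes Sigma = (X_s' V_s^{-1} X_s + k I_p)^{-1} and
    beta_tilde = Sigma (k mu + X_s' V_s^{-1} y_s), the usual conjugate forms.
    """
    p = X_s.shape[1]
    Vs_inv = np.linalg.inv(V_s)
    XtVi = X_s.T @ Vs_inv
    beta_hat = np.linalg.solve(XtVi @ X_s, XtVi @ y_s)   # GLS estimator
    Sigma = np.linalg.inv(XtVi @ X_s + k * np.eye(p))
    beta_tilde = Sigma @ (k * mu + XtVi @ y_s)           # posterior mean of beta

    # predicted non-sampled part under each estimator of beta
    y_r_tilde = X_r @ beta_tilde + V_rs @ Vs_inv @ (y_s - X_s @ beta_tilde)
    y_r_hat   = X_r @ beta_hat   + V_rs @ Vs_inv @ (y_s - X_s @ beta_hat)

    bayes = Q_s @ y_s + Q_r @ y_r_tilde   # Bayes predictor of Qy
    blup  = Q_s @ y_s + Q_r @ y_r_hat     # best linear unbiased predictor
    return bayes, blup
\end{verbatim}
With $Q=1_{N}$ (so that $Q_{s}=1_{n}$ and $Q_{r}=1_{N-n}$), the same routine returns the predictors $\tilde{T}(y_{s})$ and $\hat{T}(y_{s})$ of the population total considered in Corollary 3.1 below.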
Theorem 3.2 Under model (1.4), the Bayes predictor $\tilde{\theta}(y_{s})$ of $Qy$ is better than the best linear unbiased predictor $\hat{\theta}(y_{s})$ under the predictive mean squared error.
Proof By the definition of PMSE and $\tilde{\beta}_{s}=\hat{\beta}_{s}-k\Sigma(\hat{\beta}_{s}-\mu)$, we have
That is, ${\rm PMSE}(\hat{\theta}(y_{s}), Qy)-{\rm PMSE}(\tilde{\theta}(y_{s}), Qy)> 0$. Therefore, $\tilde{\theta}(y_{s})$ is better than $\hat{\theta}(y_{s})$ under the predictive mean squared error.
Corollary 3.1 The Bayes predictor of the population total $T$ under model (1.4) and the loss function (3.1) is $\tilde{T}(y_{s})=1_{n}'y_{s}+1_{N-n}'[X_{r}\tilde{\beta}_{s}+V_{rs}V_{s}^{-1}(y_{s}-X_{s}\tilde{\beta}_{s})]$, and the Bayes risk of this predictor is $\frac{E_{y_{s}}(c_{0})}{n+\alpha-2}1_{N-n}'U1_{N-n}$. Moreover, $\hat{T}(y_{s})=1_{n}'y_{s}+1_{N-n}'\hat{y}_{r}$ is dominated by $\tilde{T}(y_{s})$ under the predictive mean squared error.
For the finite population regression coefficient $\beta_{N}=(X'V^{-1}X)^{-1}X'V^{-1}y$, following Bolfarine et al. [2], we can write it as
Then by Theorem 3.1, we have the following corollary.
Corollary 3.2 The Bayes predictor of the finite population regression coefficient $\beta_{N}$ under model (1.4) and the loss function (3.1) is $\tilde{\beta}_{N}(y_{s})=K_{s}y_{s}+K_{r}E(y_{r}|y_{s})$, and the Bayes risk of this predictor is $\frac{E_{y_{s}}(c_{0})}{n+\alpha-2}K_{r}UK_{r}'$. Moreover, it is better than $\hat{\beta}_{N}(y_{s})=K_{s}y_{s}+K_{r}\hat{y}_{r}$ under the predictive mean squared error.
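Corollary 3.2 can be computed in the same way. The sketch below is again only an illustration: it assumes that, after reordering the units so that the sample comes first, $K=(X'V^{-1}X)^{-1}X'V^{-1}$ is partitioned columnwise into $K_{s}$ ($p\times n$) and $K_{r}$ ($p\times(N-n)$), so that $\beta_{N}=K_{s}y_{s}+K_{r}y_{r}$; the displayed decomposition following Bolfarine et al. [2] is not reproduced here, so this partition is our assumption, and $E(y_{r}|y_{s})=\tilde{y}_{r}$ is taken as an input computed as in the previous sketch.
\begin{verbatim}
import numpy as np

def bayes_predictor_of_beta_N(y_s, y_r_tilde, X, V, sample_idx):
    """Bayes predictor of beta_N = (X'V^{-1}X)^{-1} X'V^{-1} y (Corollary 3.2),
    i.e. K_s y_s + K_r E(y_r | y_s).

    y_r_tilde is E(y_r | y_s), computed as in the previous sketch.  The
    columnwise partition K = [K_s, K_r] (sampled units first) is an assumption.
    """
    N = X.shape[0]
    s = np.asarray(sample_idx)
    r = np.setdiff1d(np.arange(N), s)
    V_inv = np.linalg.inv(V)
    K = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv)   # p x N matrix
    K_s, K_r = K[:, s], K[:, r]
    return K_s @ y_s + K_r @ y_r_tilde
\end{verbatim}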
In order to illustrate our results, we give the following example.
Example 3.1 Let $X=(x_{1}, x_{2}, \cdots, x_{N})'$ and $V={\rm diag}(x_{1}, x_{2}, \cdots, x_{N})$ in the Bayes model (1.4), where $x_{i}\neq 0$, $i=1, 2, \cdots, N$. If $X_{s}=(x_{1}, x_{2}, \cdots, x_{n})'$ and $y_{s}=(y_{1}, y_{2}, \cdots, y_{n})'$, we have $\tilde{\beta}_{s}=\frac{1}{\sum\limits_{i=1}^{n}x_{i}+k}(k\mu+1_{n}'y_{s})$ and $\hat{\beta}_{s}=\frac{1}{\sum\limits_{i=1}^{n}x_{i}}1_{n}'y_{s}$. According to Theorem 3.1, we have the following conclusions (a small numerical check is sketched after (ⅱ)).
(ⅰ) $\tilde{T}(y_{s})=1_{n}'y_{s}+\frac{\sum\limits_{i=n+1}^{N}x_{i}}{\sum\limits_{i=1}^{n}x_{i}+k}(k\mu+1_{n}'y_{s}).$ Its Bayes prediction risk is $\frac{\lambda(\sum\limits_{i=1}^{N}x_{i}+k)\sum\limits_{i=n+1}^{N}x_{i}}{(\alpha-2)(\sum\limits_{i=1}^{n}x_{i}+k)}$. Moreover, $\tilde{T}(y_{s})$ is better than $\hat{T}(y_{s})$.
(ⅱ) $\tilde{\beta}_{N}(y_{s})=\frac{1}{\sum\limits_{i=1}^{N}x_{i}}\tilde{T}(y_{s})$, and its Bayes prediction risk is $\frac{\lambda(\sum\limits_{i=1}^{N}x_{i}+k)\sum\limits_{i=n+1}^{N}x_{i}}{(\alpha-2)(\sum\limits_{i=1}^{n}x_{i}+k)(\sum\limits_{i=1}^{N}x_{i})^{2}}$. Moreover, $\tilde{\beta}_{N}(y_{s})$ is better than $\hat{\beta}_{N}(y_{s})$.
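The closed forms in (ⅰ) and (ⅱ) can be checked numerically against the matrix expressions used earlier. The sketch below uses hypothetical values of $x_{i}$, $y_{s}$, $k$ and $\mu$ (here $p=1$, so $\mu$ is a scalar) and verifies that the closed-form $\tilde{\beta}_{s}$ and $\tilde{T}(y_{s})$ of this example agree with the assumed matrix form $\tilde{\beta}_{s}=(X_{s}'V_{s}^{-1}X_{s}+k)^{-1}(k\mu+X_{s}'V_{s}^{-1}y_{s})$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratio-model setup of Example 3.1: x_i > 0, V = diag(x_1,...,x_N).
N, n, k, mu = 10, 6, 10.0, 2.0
x = rng.uniform(1.0, 5.0, size=N)
y_s = rng.normal(mu * x[:n], np.sqrt(x[:n]))   # any sample values will do
x_s, x_r = x[:n], x[n:]

# closed forms from Example 3.1
beta_tilde_closed = (k * mu + y_s.sum()) / (x_s.sum() + k)
T_tilde_closed = y_s.sum() + x_r.sum() * beta_tilde_closed

# matrix forms (V diagonal, so V_rs = 0 and y_r_tilde = X_r * beta_tilde)
X_s = x_s.reshape(-1, 1)
Vs_inv = np.diag(1.0 / x_s)
Sigma = np.linalg.inv(X_s.T @ Vs_inv @ X_s + k * np.eye(1))
beta_tilde = (Sigma @ (k * mu + X_s.T @ Vs_inv @ y_s)).item()
T_tilde = y_s.sum() + x_r.sum() * beta_tilde

print(np.isclose(beta_tilde, beta_tilde_closed),
      np.isclose(T_tilde, T_tilde_closed))     # True True
\end{verbatim}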
We now present a simulation study to illustrate our results. The computations were carried out on a personal computer with Matlab 7.9 (R2009b), following the steps below; an illustrative Python sketch of the same steps is given after the list.
(ⅰ) Randomly generate an $N\times p$ matrix $X$ of full column rank and a $p$-dimensional vector $\mu$;
(ⅱ) Generate $\sigma^{2}$ from the distribution $\Gamma^{-1}(\frac{\alpha}{2}, \frac{\lambda}{2})$ and the random error $\varepsilon$ from $N(0, \sigma^{2}V)$;
(ⅲ) Generate a $p$-dimensional vector $\beta$ from the distribution $N(\mu, \frac{\sigma^{2}}{k}I_{p})$;
(ⅳ) Obtain the response vector $y$ from the model $y=X\beta+\varepsilon$;
(ⅴ) Randomly generate an $N$-dimensional vector $Q$, and compute the Bayes prediction and the best linear unbiased prediction of $Qy$ by Theorem 3.1;
(ⅵ) Finally, compare the PMSE of the Bayes prediction with that of the best linear unbiased prediction.
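The following Python sketch mirrors steps (ⅰ)-(ⅵ); it is only an illustration (the paper's computations were done in Matlab), and the choice $V=I_{N}$, the distributions used for the random draws of $X$, $\mu$ and $Q$, and the number of replications are our own assumptions.
\begin{verbatim}
import numpy as np

# Illustrative re-implementation of steps (i)-(vi).  V = I_N, the ranges of the
# random draws and the number of replications are assumptions, not taken from
# the paper (whose computations were done in Matlab).
rng = np.random.default_rng(1)
N, n, p, alpha, lam, k = 10, 6, 3, 8.0, 12.0, 10.0
V = np.eye(N)                                   # assumed covariance structure
X = rng.normal(size=(N, p))                     # (i) full column rank a.s.
mu = rng.normal(size=p)
Q = rng.normal(size=N)                          # (v) linear quantity Qy
s, r = np.arange(n), np.arange(n, N)

def predictors(y_s):
    """Bayes predictor (Theorem 3.1) and BLUP of Qy based on y_s."""
    X_s, X_r = X[s], X[r]
    V_s, V_rs = V[np.ix_(s, s)], V[np.ix_(r, s)]
    Vs_inv = np.linalg.inv(V_s)
    XtVi = X_s.T @ Vs_inv
    beta_hat = np.linalg.solve(XtVi @ X_s, XtVi @ y_s)
    Sigma = np.linalg.inv(XtVi @ X_s + k * np.eye(p))
    beta_tilde = Sigma @ (k * mu + XtVi @ y_s)
    y_r_tilde = X_r @ beta_tilde + V_rs @ Vs_inv @ (y_s - X_s @ beta_tilde)
    y_r_hat = X_r @ beta_hat + V_rs @ Vs_inv @ (y_s - X_s @ beta_hat)
    return Q[s] @ y_s + Q[r] @ y_r_tilde, Q[s] @ y_s + Q[r] @ y_r_hat

se_bayes, se_blup = [], []
for _ in range(20000):
    sigma2 = 1.0 / rng.gamma(alpha / 2.0, 2.0 / lam)              # (ii)
    eps = rng.multivariate_normal(np.zeros(N), sigma2 * V)        # (ii)
    beta = rng.multivariate_normal(mu, (sigma2 / k) * np.eye(p))  # (iii)
    y = X @ beta + eps                                            # (iv)
    bayes, blup = predictors(y[s])                                # (v)
    se_bayes.append((bayes - Q @ y) ** 2)
    se_blup.append((blup - Q @ y) ** 2)

print("PMSE(BLUP) - PMSE(Bayes) =", np.mean(se_blup) - np.mean(se_bayes))  # (vi)
\end{verbatim}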
Now we take $N=10, n=6, p=3, \alpha=8, \lambda=12, k=10$ and generate the data as above. The simulation study shows that the Bayes prediction is better than the best linear unbiased prediction, which is consistent with our theoretical conclusions. The data from one experiment are given below.
In this experiment, the randomly generated data are
By direct computation, we have $Qy=-4.3971$. By Theorem 3.1, we obtain $\tilde{\theta}(y_{s})=-4.8497$, $\hat{\theta}(y_{s})=-5.7928$, and ${\rm PMSE}(\hat{\theta}(y_{s}))-{\rm PMSE}(\tilde{\theta}(y_{s}))= 0.0844>0$. Therefore, the Bayes prediction of $Qy$ is better than the best linear unbiased prediction.
In this section, we discuss Bayes prediction of quadratic quantities $f(H)=y'Hy$, where $H$ is a known symmetric matrix. Assume that $H=\left(\begin{array}{cc}H_{11}&H_{12}\\ H_{21}&H_{22}\end{array}\right)$ is partitioned conformably with $(y_{s}', y_{r}')'$, where $H_{12}=H_{21}'$; then
By Theorem 2.1 and eq. (3.2), we have the following results.
Theorem 4.1 Under model (1.4) and the loss function (3.1), the Bayes prediction of $f(H)$ is
For the population variance $S_{y}^{2}$, we know that
where $1_{n}$ denotes the $n$-dimensional vector of ones. Then, by Theorem 4.1, we obtain the following corollary.
Corollary 4.1 The Bayes prediction of the population variance $S_{y}^{2}$ under model (1.4) and the loss function (3.1) is
It is noted that $S_{y}^{2}=\frac{n}{N}S_{y_{s}}^{2}+(1-\frac{n}{N})[S_{y_{r}}^{2}+\frac{n}{N}(\bar{y}_{s}-\bar{y}_{r})^{2}]$, where $\bar{y}_{s}$ and $S_{y_{s}}^{2}$ are the mean and variance of $y_{s}$, $\bar{y}_{r}$ and $S_{y_{r}}^{2}$ are the mean and variance of $y_{r}$. Therefore, the Bayes prediction of the population variance can also be expressed as follows.
Remark 4.1 The Bayes prediction of the population variance $S_{y}^{2}$ under model (1.4) and the loss function (3.1) is
Proof Since $S_{y}^{2}=\frac{n}{N}S_{y_{s}}^{2}+(1-\frac{n}{N})[S_{y_{r}}^{2}+\frac{n}{N}(\bar{y}_{s}-\bar{y}_{r})^{2}]$, it suffices to derive the Bayes prediction of $S_{y_{r}}^{2}+\frac{n}{N}(\bar{y}_{s}-\bar{y}_{r})^{2}$. Moreover, we know that $S_{y_{r}}^{2}=\frac{1}{N-n}y_{r}'(I_{N-n}-\frac{1}{N-n}1_{N-n}1_{N-n}')y_{r}$ and $\bar{y}_{r}=\frac{1}{N-n}1_{N-n}'y_{r}$. Therefore, the Bayes prediction of $S_{y_{r}}^{2}$ is
Moreover, the Bayes prediction of $(\bar{y}_{s}-\bar{y}_{r})^{2}$ is
According to eqs. (4.1)-(4.2) and the expression of $S_{y}^{2}$, we can derive the result of this remark. It is easy to verify that this result is consistent with Corollary 4.1.
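More generally, under squared error loss the Bayes prediction of $f(H)=y'Hy$ is the posterior mean $E(y'Hy|y_{s})$ given by eq. (3.2), and by Theorem 2.1 (ⅳ) the predictive distribution of $y_{r}$ given $y_{s}$ has mean $\tilde{y}_{r}$ and covariance matrix $\frac{c_{0}U}{n+\alpha-2}$ (a multivariate $t$ distribution with scale matrix $\frac{c_{0}U}{n+\alpha}$ and $n+\alpha$ degrees of freedom), so the prediction only requires these two posterior moments. The sketch below is illustrative; the function and argument names are ours, and the explicit expressions for $c_{0}$ and $U$, which are not reproduced here, enter only through the supplied covariance matrix.
\begin{verbatim}
import numpy as np

def bayes_predict_quadratic(y_s, y_r_tilde, C_r, H):
    """Posterior mean of f(H) = y'Hy given y_s, i.e. the Bayes prediction
    under squared error loss (eq. (3.2)).

    y_r_tilde : E(y_r | y_s), the predictive mean from Theorem 2.1 (iv)
    C_r       : Cov(y_r | y_s) = c_0 U / (n + alpha - 2)
    H         : symmetric N x N matrix, partitioned conformably with (y_s, y_r)
                as [[H11, H12], [H21, H22]]
    """
    n = len(y_s)
    H11, H12, H22 = H[:n, :n], H[:n, n:], H[n:, n:]
    # E(y'Hy | y_s) = y_s'H11 y_s + 2 y_s'H12 E(y_r|y_s)
    #                + E(y_r|y_s)'H22 E(y_r|y_s) + tr(H22 Cov(y_r|y_s))
    return (y_s @ H11 @ y_s
            + 2.0 * y_s @ H12 @ y_r_tilde
            + y_r_tilde @ H22 @ y_r_tilde
            + np.trace(H22 @ C_r))
\end{verbatim}
For instance, taking $H=\frac{1}{N}(I_{N}-\frac{1}{N}1_{N}1_{N}')$ gives the Bayes prediction of the population variance $S_{y}^{2}$ in Corollary 4.1.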
Example 4.1 Let $X=1_{N}$ and $V=(1-\rho)I_{N}+\rho 1_{N}1_{N}'$ in the Bayes model (1.4), where $\rho\in (0, 1)$ is known. It can be checked that $X_{s}'V_{s}^{-1}X_{s}=\frac{n}{1+(n-1)\rho}$, and
Then,
where $a=\frac{1-\rho}{1+(n-1)\rho}\tilde{\beta}_{s}+\frac{n\rho}{1+(n-1)\rho}\bar{y}_{s}$, $b=\frac{1-\rho}{1+(n-1)\rho}[\rho+\frac{1-\rho}{n+k+(n-1)k\rho}]$. According to Remark 4.1, we know that
In this paper, we have obtained Bayes predictions of linear and quadratic quantities in a finite population with normal inverse-Gamma prior information. Two issues remain. On the one hand, our results require the distribution of the superpopulation model to be normal; in many situations, however, only the mean vector and covariance matrix of the model are known and the full distribution is not. How should Bayes prediction be handled in that case? On the other hand, if the prior distribution is hierarchical and improper, how can the generalized Bayes prediction be obtained, and what are its optimality properties? These problems deserve further study.