Consider the nonlinear semiparametric model
$$Y=g(X, \beta)+m(T)+e, \qquad (1)$$
where $Y$ is the scalar response variable, $X$ is a $p$-dimensional covariate, $T$ is a univariate random variable, $g(x, \beta)$ is a pre-specified function in which $\beta$ is an unknown parameter vector in $R^d$, and $m(.)$ is an unknown smooth function. The model errors $e$ are independent and identically distributed with zero mean. Obviously, model (1) reduces to a partially linear model if $g(X, \beta)=X^T \beta$.
Model (1) is a very general semiparametric model which has been widely studied in many fields, such as econometrics, biology, and environmental science. Li and Nie [1] proposed an estimation procedure for the parameter $\beta$ through a nonlinear mixed-effects approach. Furthermore, Li and Nie [2] analyzed a real data set in ecology with this model and proposed two estimation procedures based on profile nonlinear least squares and a linear approximation approach. Huang and Chen [3] obtained the spline profile least squares estimator of the parameter $\beta$ by approximating the baseline function $m(.)$ with graduating functions. Later, Song et al. [4] provided a sieve least squares method for the case where the nonlinear function $g(., .)$ has a special form. Recently, Xiao et al. [5] applied the empirical likelihood approach to this model and compared it with the normal approximation method in terms of confidence regions for the parameter $\beta$.
In practice, some variables of interest are difficult or expensive to measure exactly and are therefore usually replaced by surrogate observations. The semiparametric errors-in-variables (EV) model has been applied in many fields and has received much attention in the literature. The classical assumption is that the measurement error is additive. [6]-[9] applied the empirical likelihood method to partially linear models and varying-coefficient partially linear models under the additive error assumption. However, the additive error assumption is often inappropriate in real situations. Realistically, the relationship between the surrogate variables and the true variables can be rather complicated, and no error model structure may be assumed. In this case, one solution is to employ validation data to capture the underlying relationship between the true variables and the surrogate variables.
When the covariates are measured with error, various statistical inference procedures based on validation data have been developed. Wang [10] used this method for the partially linear errors-in-variables model. Wang and Rao [11] and Stute et al. [12] developed empirical likelihood approaches to linear models and nonlinear models with errors in covariates, respectively. Wang and Zhang [13] and Du et al. [14] developed statistical inference for varying coefficient models and the nonparametric regression function with validation sampling, respectively. Later, Fang and Hu [15] considered the nonlinear model with the help of validation data when the error is in the response. For nonlinear semiparametric models, Xue [16] constructed an empirical log-likelihood ratio statistic for the unknown parameter with the help of validation data. Furthermore, Liu [17] considered nonlinear semiparametric models with missing response variables and errors in covariates.
In this paper, we consider model (1) with the explanatory variable $X$ measured with error and both $Y$ and $T$ measured exactly. Instead of the true variable $X$, the surrogate variable $\tilde{X}$ is observed. The relationship between $X$ and $\tilde{X}$ is not assumed to be additive; instead, it is characterized by the regression of $X$ on $\tilde{X}$. This assumption has been used in other statistical models, such as linear models [11] and varying coefficient models [13]. We define two estimators for the parameter in the nonlinear function by considering the two cases where the response variable $Y$ is or is not available in the validation sample. Asymptotic results for the two estimators are derived, showing that both proposed estimators are asymptotically normal.
The rest of this paper is organized as follows. We describe the estimation procedures based on the least squares method and the kernel method in Section 2. In Section 3, the asymptotic normality of the proposed estimators is proved. Simulation studies are conducted in Section 4 to evaluate the finite sample properties of the proposed estimators. Finally, Section 5 concludes the paper.
Suppose that $\tilde{X}$ is a $p$-dimensional surrogate variable for $X$. Assume that we have a primary data set containing $N$ independent and identically distributed observations $\{({Y}_j, \tilde{X}_j, T_j)\}_{j=n+1}^{n+N}$ and a validation data set containing $n$ independent and identically distributed observations $\{(X_i, \tilde{X}_i, T_i)\}_{i=1}^{n}$ or $\{(Y_i, X_i, \tilde{X}_i, T_i)\}_{i=1}^{n}$. It is also assumed that the two data sets are independent.
Denote $Z=(\tilde{X}, T)$ and $G(z, \beta)=E[g(X, \beta)|Z=z]$. Then, model (1) can be rewritten as
$$Y=G(Z, \beta)+m(T)+\varepsilon, \qquad (2)$$
where $\varepsilon=e+g(X, \beta)-G(Z, \beta)$.
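Indeed, the error $\varepsilon$ in (2) is a legitimate model error: provided $E[e\,|\,Z]=0$ (which holds, for instance, when $e$ is independent of $(X, \tilde{X}, T)$), we have
$$E[\varepsilon\,|\,Z]=E[e\,|\,Z]+E[g(X, \beta)\,|\,Z]-G(Z, \beta)=0,$$
so (2) is again a partially nonlinear regression of $Y$ on $(Z, T)$.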
Clearly, model (2) is a standard partially nonlinear model if $G(., .)$ is a known function. Unfortunately, $G(., .)$ is usually unknown in practice. To overcome this difficulty, we estimate $G(., .)$ consistently by the kernel method with the validation data, as follows.
Let
$$\hat R_n(z, \beta)=\frac{1}{n h_{1, n}^{p+1}}\sum\limits_{i=1}^{n} g(X_i, \beta)K_1\Big(\frac{z-Z_i}{h_{1, n}}\Big), \qquad \hat f_n(z)=\frac{1}{n h_{1, n}^{p+1}}\sum\limits_{i=1}^{n} K_1\Big(\frac{z-Z_i}{h_{1, n}}\Big), \qquad (3)$$
where $K_1(.)$ is a kernel function and $h_{1, n}$ is a bandwidth.
Then, $G(z, \beta)$ can be estimated by $\frac{\hat R_n (z, \beta)}{\hat{f}_{n}(z)}$. Since $\hat{f}_{n}(z)$ may take small values in the denominator of this estimator, we modify it in practice to avoid technical difficulties. Let $\hat{f}_{nb}(z)=\mathrm{max}(\hat{f}_{n}(z), b_n)$, where $b_n$ is a sequence of positive constants decreasing to zero as $n$ increases to infinity. Then, the truncated estimator of ${G}(z, \beta)$, say $\hat{G}(z, \beta)$, is given by
$$\hat G(z, \beta)=\frac{\hat R_n(z, \beta)}{\hat f_{nb}(z)}. \qquad (4)$$
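For concreteness, the following is a minimal Python sketch of the truncated estimator in (3)-(4). The product quartic kernel matches the one used in Section 4; the function g is assumed to be supplied by the user and vectorized over the validation sample, and all helper names are illustrative rather than part of any established package.

```python
import numpy as np

def quartic(u):
    # Quartic kernel K0(u) = (15/16)(1 - u^2)^2 for |u| <= 1, 0 otherwise.
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def G_hat(z, beta, X_val, Z_val, g, h1, bn):
    # Truncated kernel estimator of G(z, beta) = E[g(X, beta) | Z = z],
    # built from the validation sample {(X_i, Z_i)}, i = 1, ..., n.
    n, dim_z = Z_val.shape
    u = (z - Z_val) / h1                      # scaled differences, shape (n, dim_z)
    w = np.prod(quartic(u), axis=1)           # product kernel K1((z - Z_i)/h1)
    f_n = w.sum() / (n * h1**dim_z)           # density estimate f_hat_n(z)
    R_n = (w * g(X_val, beta)).sum() / (n * h1**dim_z)  # numerator R_hat_n(z, beta)
    return R_n / max(f_n, bn)                 # truncate the denominator at b_n
```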
Define $g^{{(1)}}(X, \beta)=\frac{\partial}{\partial \beta} g(X, \beta)=\big(\frac{\partial}{\partial \beta_1}g(X, \beta), \cdots, \frac{\partial}{\partial \beta_d}g(X, \beta)\big)^T$ and $G^{(1)}(z, \beta)=\frac{\partial}{\partial \beta} G(z, \beta)=E[g^{(1)}(X, \beta)|Z=z]$. Then, the estimator of $G^{(1)}(z, \beta)$, denoted by $\hat G^{(1)}(z, \beta)$, can also be obtained by the kernel method.
Let $\hat R_n^{(1)}(z, \beta)=\frac{1}{n h_{1, n}^{p+1}}\sum\limits_{i=1}^{n} g^{(1)}(X_i, \beta)K_1\big(\frac{z-Z_i}{h_{1, n}}\big)$; then, we have
$$\hat G^{(1)}(z, \beta)=\frac{\hat R_n^{(1)}(z, \beta)}{\hat f_{nb}(z)}.$$
Using $\hat G(z, \beta)$ in place of $G(z, \beta)$ in model (2) and regarding $\beta$ as known, $m(t)$ is estimated by
$$\hat m(t, \beta)=\sum\limits_{j=n+1}^{n+N} {W}_{Nj}(t)\big(Y_j-\hat G(Z_j, \beta)\big), \qquad (5)$$
where ${W}_{Nj}(t)=\frac{ K_2(\frac{T_j-t}{h_{2, N}})}{\sum\limits_{i=n+1}^{n+N} K_2(\frac{T_i-t}{h_{2, N}})}$, in which $K_2(.)$ is a kernel function and $h_{2, N}$ is a bandwidth.
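Continuing the sketch above, the weights $W_{Nj}(t)$ and the estimator in (5) can be written as follows. Here G_hat_fn is assumed to be the truncated estimator above with the validation data and tuning constants already bound in (e.g. via functools.partial), and the primary-data arrays Y_prim, Z_prim, T_prim are hypothetical inputs.

```python
def nw_weights(t, T_sample, h):
    # Nadaraya-Watson weights built from the kernel K0 and bandwidth h.
    k = quartic((T_sample - t) / h)
    return k / k.sum()

def m_hat(t, beta, Y_prim, Z_prim, T_prim, G_hat_fn, h2):
    # Kernel estimate of m(t) for fixed beta, as in (5): a weighted average
    # of the primary-data residuals Y_j - G_hat(Z_j, beta).
    resid = Y_prim - np.array([G_hat_fn(z, beta) for z in Z_prim])
    return np.dot(nw_weights(t, T_prim, h2), resid)
```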
Similar to $\hat m(t, \beta)$ defined in (5), the estimator of $E[{G}^{(1)}(Z, \beta)|T=t]$, denoted by $\hat h(t, \beta)$, is obtained by the kernel method as
$$\hat h(t, \beta)=\sum\limits_{j=n+1}^{n+N} {W}_{Nj}(t)\hat G^{(1)}(Z_j, \beta). \qquad (6)$$
Then, the estimator of $\beta$ is defined to be the minimizer of $\hat S_N(\beta)$ given by
$$\hat S_N(\beta)=\sum\limits_{j=n+1}^{n+N}\big[Y_j-\hat G(Z_j, \beta)-\hat m(T_j, \beta)\big]^2. \qquad (7)$$
Thus, the estimator of $\beta$, say $\hat \beta_{N}$, solves the equation
$$\sum\limits_{j=n+1}^{n+N}\big[Y_j-\hat G(Z_j, \beta)-\hat m(T_j, \beta)\big]\big[\hat G^{(1)}(Z_j, \beta)-\hat h(T_j, \beta)\big]=0. \qquad (8)$$
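Numerically, $\hat \beta_{N}$ can also be obtained by minimizing $\hat S_N(\beta)$ directly rather than solving (8). The sketch below uses scipy.optimize.minimize with a derivative-free method, reuses the helpers above, and assumes the arrays Y_prim, Z_prim, T_prim, X_val, Z_val, the function g, and the constants h1, h2, bn are already available.

```python
from functools import partial
from scipy.optimize import minimize

def S_N(beta, Y_prim, Z_prim, T_prim, G_hat_fn, h2):
    # Profile least squares criterion (7).
    resid = Y_prim - np.array([G_hat_fn(z, beta) for z in Z_prim])
    m_vals = np.array([np.dot(nw_weights(t, T_prim, h2), resid) for t in T_prim])
    return np.sum((resid - m_vals) ** 2)

# Bind the validation data into the kernel estimator of G, then minimize.
G_fn = partial(G_hat, X_val=X_val, Z_val=Z_val, g=g, h1=h1, bn=bn)
res = minimize(S_N, x0=np.array([0.5]),
               args=(Y_prim, Z_prim, T_prim, G_fn, h2), method="Nelder-Mead")
beta_hat_N = res.x
```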
Notice that, if the missing-response mechanism in Liu [17] is ignored, the estimator $\hat \beta$ there reduces to the $\hat \beta_N$ in this paper. In practice, the response variable $Y$ may be fully observed; that is, $Y$ may also be measured in the validation data set. In this case, using the validation data $\{(Y_i, X_i, \tilde{X}_i, T_i)\}_{i=1}^{n}$, an alternative estimator of $\beta$, say $\hat \beta_{n, N}$, can be obtained by the following procedure.
First, based on the validation data, $m(t)$ can be estimated for given $\beta$ by
$$\tilde m(t, \beta)=\sum\limits_{i=1}^{n} \tilde{W}_{ni}(t)\big(Y_i-g(X_i, \beta)\big), \qquad (9)$$
where $\tilde{W}_{ni}(t)=\frac{ K_3(\frac{T_i-t}{h_{3, n}})}{\sum\limits_{j=1}^{n} K_3(\frac{T_j-t}{h_{3, n}})}$, in which $K_3(.)$ is a kernel function and $h_{3, n}$ is a bandwidth.
Similar to (9), the estimator of $E[g^{(1)}(X, \beta)|T=t]$, denoted by $\tilde{h}(t, \beta)$, is defined as
$$\tilde h(t, \beta)=\sum\limits_{i=1}^{n} \tilde{W}_{ni}(t)g^{(1)}(X_i, \beta). \qquad (10)$$
Then, $\hat \beta_{n, N}$ can be obtained by minimizing the least squares criterion
$$\hat S_{n, N}(\beta)=\sum\limits_{j=n+1}^{n+N}\big[Y_j-\hat G(Z_j, \beta)-\hat m(T_j, \beta)\big]^2+\sum\limits_{i=1}^{n}\big[Y_i-g(X_i, \beta)-\tilde m(T_i, \beta)\big]^2. \qquad (11)$$
Thus, $\hat \beta_{n, N}$ solves the equation
$$\sum\limits_{j=n+1}^{n+N}\big[Y_j-\hat G(Z_j, \beta)-\hat m(T_j, \beta)\big]\big[\hat G^{(1)}(Z_j, \beta)-\hat h(T_j, \beta)\big]+\sum\limits_{i=1}^{n}\big[Y_i-g(X_i, \beta)-\tilde m(T_i, \beta)\big]\big[g^{(1)}(X_i, \beta)-\tilde h(T_i, \beta)\big]=0. \qquad (12)$$
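The corresponding objective (11) simply adds a validation-data term to S_N above. A sketch, under the same assumptions as the previous snippets and with Y_val, X_val, T_val, h3 as hypothetical inputs:

```python
def S_nN(beta, Y_prim, Z_prim, T_prim, Y_val, X_val, T_val, G_hat_fn, g, h2, h3):
    # Combined least squares criterion (11): the primary-data term plus the
    # analogous validation-data term built from the true covariates X_i.
    total = S_N(beta, Y_prim, Z_prim, T_prim, G_hat_fn, h2)
    resid_v = Y_val - g(X_val, beta)
    m_tilde = np.array([np.dot(nw_weights(t, T_val, h3), resid_v) for t in T_val])
    return total + np.sum((resid_v - m_tilde) ** 2)
```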
Finally, using the estimator $\hat \beta_{N}$ or $\hat \beta_{n, N}$, denoted generically by $\hat\beta$, we can define the estimator of $m(.)$ as
$$\hat m(t)=\sum\limits_{j=n+1}^{n+N} {W}_{Nj}(t)\big(Y_j-\hat G(Z_j, \hat\beta)\big). \qquad (13)$$
To state our results, we introduce the following assumptions:
(A1) $m(t)$ has two bounded and continuous derivatives on (0, 1).
(A2) $T$ has density function $r(t)$ on $[0, 1]$, and $0<\underset{{0\leq t\leq 1}}{\mathrm{inf}}{r(t)}\leq\underset{{0\leq t\leq 1}}{\mathrm{sup}}{r(t)}<\infty$.
(A3) $\mathrm{sup}_zE[e^2|Z=z]<\infty$, $\mathrm{sup}_zE[g^2(X, \beta)|Z=z]<\infty$, $\mathrm{sup}_z E[g_s^{(1)}(X, \beta)^2| Z=z]<\infty$, $s=1, 2, \cdots, d$.
(A4) For some $k>p$, $G(z, \beta)\in \Re^k$, and $G^{(1)}_s(z, \beta)\in \Re^k$.
(A5) The density of $Z$, say $f_Z(z)$, has bounded first-order partial derivatives and satisfies $N P(f_Z(Z)<\eta_N)\rightarrow 0$ for some positive constant sequence $\eta_N$ tending to zero.
(A6) The kernel function $K_1(.)$ is a $(p+1)$-dimensional, continuous and symmetric probability density function with bounded support. Both $K_2(.)$ and $K_3(.)$ are symmetric and bounded probability density functions with bounded support.
(A7) $nh_{1, n}^{2p}b_n^4\rightarrow\infty$, $nh_{1, n}^{2k}b_n^{-2}\rightarrow 0~ (k>p)$, $Nh_{2, N}\rightarrow\infty$ and $Nh_{2, N}^4\rightarrow 0$, $nh_{3, n}\rightarrow\infty$ and $nh_{3, n}^4\rightarrow 0$.
(A8) Both $\Sigma_1(\beta)$ and $\Sigma_3(\beta)$, defined in Theorem 1 and Theorem 2 respectively, are positive definite matrices.
(A9) $\frac{N}{n}\rightarrow \lambda$, where $\lambda$ is a nonnegative constant.
Remark 1 Assumptions (A1), (A2), (A3) and (A8) are standard in partially nonlinear regression models. (A4) and (A5) are common in measurement error problems with validation data. Assumptions (A6), (A7) and (A9) are the usual conditions on the kernel functions and bandwidths.
For the estimator $\hat \beta_{N}$, asymptotic normality is given by the following theorem.
Theorem 1 Under assumptions (A1)-(A9), we have
$$\sqrt{N}\,(\hat \beta_{N}-\beta)\stackrel{d}{\longrightarrow} N\big(0, \Sigma_1^{-1}(\beta)[V_0(\beta)+\lambda V_1(\beta)]\Sigma_1^{-1}(\beta)\big),$$
where $\stackrel{d}{\longrightarrow}$ denotes convergence in distribution and $\Sigma_1(\beta)=E[U(Z, \beta)U^T(Z, \beta)]$ with
$$U(Z, \beta)=G^{(1)}(Z, \beta)-E[G^{(1)}(Z, \beta)\,|\,T].$$
Proof The proof of Theorem 1 is similar to that of Theorem 2.3 in Xue [16] and is therefore omitted.
Remark 2 The first term in the asymptotic covariance is the contribution of the primary data through model (2), the partially nonlinear regression relationship between $Y$ and $(Z, T)$. The second term represents the extra cost due to estimating the unknown conditional mean of $g(X, \beta)$ given $Z$ using the validation data. If $\lambda=0$, the second term in the asymptotic covariance disappears, and the asymptotic covariance is the same as that in Li and Nie [2].
For the estimator $\hat \beta_{n, N}$, we give the following theorem.
Theorem 2 Under assumptions (A1)-(A9), we have
$$\sqrt{n+N}\,(\hat \beta_{n, N}-\beta)\stackrel{d}{\longrightarrow} N\big(0, \Sigma_3^{-1}(\beta)V(\beta)\Sigma_3^{-1}(\beta)\big),$$
where $\Sigma_3(\beta)=\frac{\lambda}{1+\lambda}\Sigma_1(\beta)+\frac{1}{1+\lambda}\Sigma_2(\beta)$ with $\Sigma_2(\beta)=E[W(X, \beta)W^T(X, \beta)]$ and $W(X, \beta)=g^{(1)}(X, \beta)-E[g^{(1)}(X, \beta)\,|\,T]$.
Proof To facilitate the presentation, we introduce the notation ${A}^{\otimes 2}={A}{A}^T$ for a vector or matrix ${A}$. Define $K(\beta)$ to be the left-hand side of (12) normalized by $(n+N)^{-1}$, that is,
$$K(\beta)=\frac{1}{n+N}\Big\{\sum\limits_{j=n+1}^{n+N}\big[Y_j-\hat G(Z_j, \beta)-\hat m(T_j, \beta)\big]\big[\hat G^{(1)}(Z_j, \beta)-\hat h(T_j, \beta)\big]+\sum\limits_{i=1}^{n}\big[Y_i-g(X_i, \beta)-\tilde m(T_i, \beta)\big]\big[g^{(1)}(X_i, \beta)-\tilde h(T_i, \beta)\big]\Big\}.$$
Since $\hat{\beta}_{n, N}$ solves (12), we have $K(\hat{\beta}_{n, N})=0$. Applying a Taylor expansion to $K({\beta})$ at $\hat{\beta}_{n, N}$, we get
$$K(\beta)=C_{n, N}(\beta^*)\big(\hat{\beta}_{n, N}-\beta\big), \qquad (16)$$
where $C_{n, N}(\beta)=\frac{1}{n+N}\big[\sum\limits_{j=n+1}^{n+N}(\hat G^{(1)}(Z_j, \beta)-\hat{h}(T_j, \beta))^{\otimes 2}+ \sum\limits_{i=1}^{n}(g^{(1)}(X_i, \beta)-\tilde{h}(T_i, \beta))^{\otimes 2}\big]$ and $\beta^*$ satisfies $||\beta^*-\beta||\leq ||\hat \beta_{n, N}-\beta||$. It is easy to show that $C_{n, N}(\beta^*)\stackrel{p}{\longrightarrow}\frac{\lambda}{1+\lambda}\Sigma_1(\beta)+\frac{1}{1+\lambda}\Sigma_2(\beta)=\Sigma_3(\beta).$
Decompose $\sqrt{n+N}\,K(\beta)=A(\beta)+B(\beta)$, where $A(\beta)$ collects the normalized primary-data sum in $K(\beta)$ and $B(\beta)$ the validation-data sum. For $A(\beta)$, we have
By the same argument as in Liu [17], we can prove that
Using the kernel estimation method and a Taylor expansion, we have
This, together with (17) and (18), yields
For $B(\beta)$, by simple calculation, it holds that
Then, we have
This, together with (16), (20) and (21), completes the proof.
Remark 3 Obviously, compared with $\hat \beta_{N}$, $\hat \beta_{n, N}$ makes full use of the available information, including the response variable $Y$ in the validation data, and therefore gives a more accurate estimator than $\hat \beta_{N}$. This conclusion is confirmed by the simulation studies in the next section. However, in most applications the primary data set is much larger than the validation data set; in such cases there is little extra information in the validation data, and the difference between $\hat \beta_{N}$ and $\hat \beta_{n, N}$ is negligible. On the other hand, $\hat \beta_{N}$ is simpler to compute. Hence, we recommend $\hat \beta_{N}$ when $\lambda$ is large.
Clearly, the asymptotic covariances of $\hat \beta_{N}$ and $\hat \beta_{n, N}$ can be estimated by combining the sample moment method and the "plug-in" method. We introduce the following notation:
Then, the asymptotic covariances of $\hat \beta_{N}$ and $\hat \beta_{n, N}$ can be consistently estimated by $\hat \Sigma_1^{-1}(\hat{\beta}_N)[\hat V_0(\hat{\beta}_N)+\lambda \hat V_1(\hat{\beta}_N)]\hat \Sigma_1^{-1}(\hat{\beta}_N)$ and $\hat \Sigma_3^{-1}(\hat{\beta}_{n, N})\hat V(\hat \beta_{n, N})\hat \Sigma_3^{-1}(\hat{\beta}_{n, N})$ with
respectively.
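To illustrate the plug-in step, a generic sandwich computation is sketched below. Here U_hat is assumed to stack the estimated score vectors (for example $\hat G^{(1)}(Z_j, \hat\beta_N)-\hat h(T_j, \hat\beta_N)$ row by row) and V_hat to be the estimated middle matrix ($\hat V_0+\lambda \hat V_1$ or $\hat V$), built according to the notation above; both names are illustrative.

```python
def sandwich_cov(U_hat, V_hat):
    # Generic sandwich form Sigma^{-1} V Sigma^{-1}, with Sigma estimated by
    # the sample second moment of the score vectors U_hat (shape (N, d)).
    Sigma = U_hat.T @ U_hat / U_hat.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ V_hat @ Sigma_inv
```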
In this section, we conduct some simulation studies to examine the finite sample performance of the proposed approaches.
To show the performance of the proposed estimators $\hat \beta_N$ and $\hat \beta_{n, N}$ in Section 2, we compared them with two other estimators: the naive estimator and the gold standard estimator. The naive estimator was obtained by ignoring the measurement error and applying the standard approach under model (1). The gold standard estimator assumes that all the true variables can be observed, although it cannot be obtained in practice.
The data are generated from the partially nonlinear model
$$Y=g(X, \beta)+m(T)+e,$$
where $g(X, \beta)=2\textrm{exp}( -\beta X)$ with $\beta=1$ and $m(T)=\textrm{sin}(2\pi T)$. The variable $T$ is simulated from the uniform distribution on $[0, 1]$, $X$ is measured with error, and the surrogate variable $\tilde{X}$ is generated as $\tilde{X}=1.25X+0.2u$, where $X$, $e$ and $u$ each follow a standard normal distribution truncated at 3. The simulations are run with validation and primary data sizes $(n, N)$. The kernel function is $K_1(x_1, x_2)=K_0(x_1)K_0(x_2)$ with $K_0(x)=(15/16)(1-x^2)^2$ if $|x|\leq 1$ and $0$ otherwise. Let $K_2(x)=K_3(x)=K_0(x)$. Take the bandwidths $h_{1, n}=0.2\,n^{-1/5}$, $h_{2, N}=0.2\,N^{-1/5}$, $h_{3, n}=0.2\,n^{-1/5}$, and the truncation constant $b_n=0.1\,n^{-1/42}$. To show the effect of the ratio of the primary data size to the validation data size, six cases are studied: $(n, N)=(60, 150), (120, 300), (30, 150), (60, 300), (30, 300), (60, 600)$. For each case, we replicate the simulation 1000 times. Table 1 presents the performance of the four estimators of $\beta$. 'Mean' stands for the average of the 1000 estimates, and 'SD' is the standard deviation of the 1000 estimates.
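The data-generating step of this design can be sketched as follows; drawing the truncated normals by rejection is one natural reading of the truncation at 3, not necessarily the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def trunc_normal(size, c=3.0):
    # Standard normal draws truncated at +/- c, via rejection sampling.
    out = rng.standard_normal(size)
    while True:
        bad = np.abs(out) > c
        if not bad.any():
            return out
        out[bad] = rng.standard_normal(bad.sum())

def generate(n, N, beta=1.0):
    # One replicate: validation data (Y, X, X_tilde, T) of size n and
    # primary data (Y, X_tilde, T) of size N, following the design above.
    size = n + N
    X, e, u = trunc_normal(size), trunc_normal(size), trunc_normal(size)
    T = rng.uniform(0.0, 1.0, size)
    X_tilde = 1.25 * X + 0.2 * u                 # surrogate for X
    Y = 2.0 * np.exp(-beta * X) + np.sin(2.0 * np.pi * T) + e
    val = dict(Y=Y[:n], X=X[:n], X_tilde=X_tilde[:n], T=T[:n])
    prim = dict(Y=Y[n:], X_tilde=X_tilde[n:], T=T[n:])
    return val, prim
```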
It follows from Table 1 that the naive estimator has much larger bias than the gold standard estimator and the proposed estimators in all cases. The proposed estimators have slightly larger bias and SD than the gold standard estimator, which implies that $\hat \beta_N$ and $\hat \beta_{n, N}$ work well. Comparing $\hat \beta_N$ with $\hat \beta_{n, N}$, the latter performs better in that its mean is closer to the true value and its SD is smaller. This is because $\hat \beta_{n, N}$ uses more information in the estimating equation. Nevertheless, when the validation sample is small we suggest using $\hat \beta_N$, because it is simpler. The proposed estimation methods perform well across the different sample sizes $(n, N)$.
The nonlinear semiparametric model is a very useful semiparametric model which has been studied extensively in the literature. In this paper, we considered the situation where the covariate is measured with error and no specific structural assumption is made on the relationship between the surrogate variable and the true variable. With the help of validation data, we obtained two estimators of the unknown parameter in the nonlinear function and proved their asymptotic normality. The first estimator is based on the primary data and the least squares criterion in (7), while the second estimator uses the response variable $Y$ in the validation data as additional information in (11). The second estimator gives more accurate estimation at the cost of complexity. However, when the validation sample is small and the primary data set is large, there is little difference between the two estimators. In most cases, we recommend the first estimator because it is simple. Simulation studies show that the proposed estimation methods are valid.