Current status data, or case Ⅰ interval-censored failure time data, occur frequently in survival analysis when the exact event time of interest is not available and one only knows whether or not the event has occurred by a certain random monitoring time, that is, only the event's "current status" is known. Such data are often encountered in epidemiological studies, carcinogenicity experiments, econometrics and reliability studies, among others. Regression analysis of failure time data is one of the main objectives in survival analysis, and an important and challenging task in it is to identify the risk factors and their risk contributions. Often, not all of the collected covariates contribute to the prediction of outcomes, and we need to remove the unimportant ones.
There are many variable selection techniques for linear regression models, and some of them have been extended to survival analysis; for example, Bayesian variable selection methods for censored survival data were proposed by Faraggi and Simon [2]. However, the sampling properties of these selection methods are largely unknown (see Fan and Li [3]). The least absolute shrinkage and selection operator (Lasso), proposed by Tibshirani [4], is a member of the family of variable selection methods based on a penalized likelihood approach with the $ L_1 $-penalty. It can delete insignificant variables by estimating their coefficients as 0. Tibshirani [5] proposed using the Lasso for estimation and variable selection under the Cox model. However, the Lasso estimator does not possess the oracle properties (see [3]). Many other variable selection methods have been developed following Tibshirani [4], for example, the smoothly clipped absolute deviation (SCAD) by Fan and Li [6] and the adaptive Lasso (aLasso) by Zou [7], both of which have nice properties.
Many variable selection methods have been developed for right-censored data (see, for example, [3], [5], [8]). In particular, some penalized methods have been established under the Cox proportional hazards model. For example, Tibshirani [5] proposed using the Lasso for variable selection under the Cox model with right-censored data, Fan and Li [3] generalized the SCAD to the Cox model with right-censored data, and the aLasso method has been extended to the proportional hazards model with right-censored data by Zhang and Lu [1]. Huang et al. [9] studied the Lasso estimator in the sparse, high-dimensional Cox model. Zhao et al. [10] studied simultaneous estimation and variable selection for interval-censored data under the Cox model.
The additive hazards model, which describes a different aspect of the association between the failure time and covariates than the proportional hazards model does, is another commonly used regression model in survival analysis, and the theoretical properties of the estimated regression parameters under it have been well established (see, for example, [11-13]). While many efforts have been devoted to variable selection for the Cox model with right-censored data, as mentioned by Zhao et al. [10], there exists little literature on variable selection for interval-censored data, and even fewer studies for the additive hazards model with interval-censored data. This paper considers the variable selection problem for case Ⅰ interval-censored data under the additive hazards model.
The remainder of the paper is organized as follows. In Section 2, we introduce the notation and assumptions used throughout the paper. In Section 3, we develop an adaptive Lasso method and establish its statistical properties. Section 4 gives some details about the ADMM algorithm that is applied to solve the adaptive Lasso problem. Section 5 provides numerical results from an extensive simulation study assessing the performance of the proposed method, and Section 6 applies the proposed method to a real data set from a tumorigenicity study.
Consider a random sample of $ n $ independent subjects. For $ i = 1, \ldots, n, $ let $ T_i $ and $ C_i $ denote the failure time of interest and the censoring time of the $ i $-th subject, and let $ Z_i(t) = (Z_{i1}(t), \dots, Z_{ip}(t))' $ be the vector of possibly time-dependent covariates. Furthermore, since only current status data are available for the failure times $ T_i $, the observed data are given by $ \left\{C_i, \delta_i = I(T_i\geq C_i), Z_i(t), i = 1, \ldots, n\right\}. $ In the following subsections, we present methods for the cases in which the monitoring time $ C $ is independent of, or dependent on, $ T $ and $ Z. $
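To make the data structure concrete, the following Python sketch simulates current status observations; the generating distributions and rates are hypothetical, chosen purely for illustration and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# Hypothetical generating distributions, for illustration only.
T = rng.exponential(scale=2.0, size=n)   # latent failure times, never observed directly
C = rng.exponential(scale=1.5, size=n)   # random monitoring times
delta = (T >= C).astype(int)             # current status indicator: 1 if event-free at C
# The observed data are (C_i, delta_i, Z_i); T_i itself is never recorded.
```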
In this subsection, we suppose that $ C $ is independent of $ T $ and $ Z $. To model the covariate effect, we assume that the hazard function of $ T $ at time $ t $, given the history of a $ p $-dimensional covariate process $ Z(\cdot) $ up to $ t, $ has the additive form

$$ \lambda(t \mid Z(s), s\leq t) = \lambda_0(t)+\beta_0'Z(t), \qquad (2.1) $$

where $ \lambda_0(t) $ is an unspecified baseline hazard function, and $ \beta_0 $ is a $ p $-vector of unknown regression parameters.
For $ i = 1, \dots, n $, define $ N_i(t) = \delta_iI(C_i \leq t) $ and $ Y_i(t) = I(C_i\geq t). $ It can be shown that the counting process $ N_i(t) $ has a Cox-type intensity process of the form

$$ Y_i(t)e^{-\beta_0'Z_i^*(t)}\,dH_0(t), \qquad (2.2) $$

where $ dH_0(t) = e^{-\Lambda_0(t)}d\Lambda_c(t), \ \Lambda_0(t) = \int_0^t \lambda_0(s)ds, \ \Lambda_c(t) = \int_0^t \lambda_c(s)ds, \ Z_i^*(t) = \int_0^t Z_i(s)ds, $ and $ \lambda_c(t) $ is the hazard function of $ C. $ Therefore,

$$ M_i(t) = N_i(t)-\int_0^t Y_i(s)e^{-\beta_0'Z_i^*(s)}\,dH_0(s), \quad i = 1, \dots, n, $$

are martingales with respect to the $ \sigma $-filtration $ \mathcal{F}_t = \sigma\lbrace N_i(s), Y_i(s), Z_i(s):s\leq t, i = 1, \dots, n\rbrace. $ Thus, we can make inferences about $ \beta_0 $ by applying the partial likelihood principle to model (2.2). To this end, we first define the partial likelihood function

$$ L_1(\beta) = \prod_{i = 1}^n\left[\frac{e^{-\beta'Z_i^*(C_i)}}{\sum_{j = 1}^nY_j(C_i)e^{-\beta'Z_j^*(C_i)}}\right]^{\delta_i}. $$
Taking the logarithm of $ L_1(\beta) $ yields

$$ l_1(\beta) = -\sum_{i = 1}^n\int_0^\tau \beta'Z_i^*(t)\,dN_i(t)-\int_0^\tau \log S^{(0)}(t, \beta)\,d\bar N(t), $$

where $ \bar N(t) = \sum_{i = 1}^nN_i(t), $ and $ \tau $ is the longest follow-up time. For $ k = 0, 1, 2, $ we also define $ S^{(k)}(t, \beta) = \sum_{j = 1}^n(Z_j^*(t))^{\otimes k}Y_j(t)e^{-\beta'Z_j^*(t)}, $ where $ Z^{\otimes 0} = 1, Z^{\otimes 1} = Z, Z^{\otimes 2} = ZZ'. $ By differentiation and rearrangement of terms, the gradient of $ l_1(\beta) $ is

$$ U_1(\beta) = \sum_{i = 1}^n\int_0^\tau \left\{\frac{S^{(1)}(t, \beta)}{S^{(0)}(t, \beta)}-Z_i^*(t)\right\}dN_i(t), $$

and the Hessian matrix is

$$ \mathcal{H}_1(\beta) = -\int_0^\tau \left\{\frac{S^{(2)}(t, \beta)}{S^{(0)}(t, \beta)}-\left(\frac{S^{(1)}(t, \beta)}{S^{(0)}(t, \beta)}\right)^{\otimes 2}\right\}d\bar N(t). $$
It can be seen that the Hessian matrix of $ l_1(\beta) $ is negative definite, so $ l_1(\beta) $ is strictly concave in $ \beta $ and hence has a unique maximizer $ \tilde\beta $. The estimate $ \tilde\beta $ of $ \beta_0 $ can be obtained by maximizing the function $ l_1(\beta) $ or by solving the equation $ U_1(\beta) = 0. $
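Either route leads to a Newton-Raphson iteration. The generic sketch below, with the callables `U` and `H` standing in for the score and Hessian (which are not implemented here), illustrates the scheme on a toy concave quadratic objective; it is an assumption-laden illustration, not the paper's estimator.

```python
import numpy as np

def newton_solve(U, H, beta0, tol=1e-10, max_iter=50):
    """Generic Newton-Raphson for a score equation U(beta) = 0,
    where H(beta) is the Hessian of the (concave) objective."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(H(beta), U(beta))
        beta = beta - step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Toy check on a concave quadratic l(b) = -0.5 b'Ab + c'b,
# so U(b) = c - Ab and H(b) = -A (hypothetical numbers).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, -1.0])
beta_hat = newton_solve(lambda b: c - A @ b, lambda b: -A, np.zeros(2))
```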
When the censoring time $ C $ is not independent of the covariate vector $ Z, $ we describe the relationship between $ C $ and $ Z $ by the following hazards model,
where $ \Lambda_{c0}(t) $ is an unknown cumulative baseline hazard function, and $ \gamma_0 $ is a $ p $-vector of unknown regression parameters. We assume that $ C $ is conditionally independent of $ T $ given the covariate vector $ Z. $
By the arguments leading to (2.2), it can be shown that, under model (2.1) and (2.3), the compensated counting processes
are martingales with respect to the $ \sigma $-filtration $ \mathcal{F}_t $. The notations $ N_i(t) $ and $ H_0(t) $ are the same as those defined in subsection 2.1. We can also apply the partial likelihood principle to model (2.4) to make inferences for the unknown parameters $ \beta_0 $ and $ \gamma_0. $ That is, we can consider the following partial likelihood function
However, since the function $ L_2(\beta, \gamma) $ above utilizes only the information from the $ C_i $'s with nonzero $ \delta_i $'s, and our main interest is in $ \beta $, it is more efficient to estimate $ \gamma_0 $ by applying the partial likelihood theory directly to model (2.3). Hence, for the estimation of $ \gamma_0, $ we first consider the following partial likelihood function
The maximum partial likelihood estimator $ \hat{\gamma} $ of $ \gamma_0 $ can be obtained by maximizing the function $ L_3(\gamma). $ Of course, $ \hat\gamma $ can also be obtained by solving the score equation $ U_\gamma(\gamma) = 0, $ where
Given $ \hat{\gamma} $, we estimate $ \beta_0 $ by the following function
The estimate $ \hat\beta $ of $ \beta_0 $ can be obtained by maximizing the function $ L_2(\beta, \hat\gamma) $ or $ l_2(\beta), $ where $ l_2(\beta) $ is defined as
For $ k = 0, 1, 2, $ define $ \tilde S^{(k)}(t, \beta, \gamma) = \sum_{j = 1}^n(Z_j^*(t))^{\otimes k}Y_j(t)e^{-\beta'Z_j^*(t)+\gamma'Z_j(t)}. $ Similar to the process above, we can get the following score function
The estimate $ \hat\beta $ can also be obtained by solving the equation $ U_2(\beta) = 0. $
In the following, we will discuss the development of a penalized or regularized procedure for covariate selection based on the functions $ l_1(\beta) $ and $ l_2(\beta). $
In the setting of right-censored data, to select and estimate important variables under the proportional hazards model, Zhang and Lu [1] proposed minimizing the penalized log partial likelihood function,
where $ l_n^*(\beta) $ denotes the log partial likelihood based on the right-censored data and the proportional hazards model, the positive weights are $ 1/|\check{\beta}_j| $ with $ \check{\beta} = (\check{\beta}_1, \dots, \check{\beta}_p)' $ being the maximizer of the log partial likelihood, and $ \lambda $ is a nonnegative penalization tuning parameter.
Consider the current status data under model (2.1) and note that the intensity process of the counting process $ N_i(t) $ is also of the Cox type. This suggests that we can select variables by employing a method similar to that of Zhang and Lu [1]. We propose the adaptive Lasso estimator $ \hat{\beta}_n $ as follows,
or
The values of the $ \omega_j $'s can be chosen in different ways. In this paper, we specify $ \omega_j = 1/|\tilde{\beta}_j| $, where $ \tilde{\beta} = (\tilde{\beta}_1, \dots, \tilde{\beta}_p)' $ is the maximizer of the log partial likelihood $ l_i(\beta) $, $ i = 1, 2 $.
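Computing the adaptive weights is immediate once the unpenalized maximizer is available. The sketch below uses a hypothetical fitted vector; the small guard `eps` against division by zero is an implementation detail of ours, not discussed in the text.

```python
import numpy as np

# beta_tilde would be the unpenalized maximizer of l_1 or l_2;
# the values below are hypothetical, purely for illustration.
beta_tilde = np.array([1.8, -0.02, 0.9, 0.05])
eps = 1e-8                                   # guard against division by zero
omega = 1.0 / np.maximum(np.abs(beta_tilde), eps)
# Small |beta_tilde_j| => large omega_j => heavy penalty on that coefficient,
# which is what lets the adaptive Lasso shrink noise coefficients to exactly 0.
```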
To study the oracle properties of the estimators, we first consider the penalized log partial likelihood function
Let $ \beta_0 = (\beta_{10}', \beta_{20}')' $ denote the true parameter vector, where $ \beta_{10} $ consists of all $ q $ nonzero components and $ \beta_{20} $ consists of the remaining zero components. Similarly, we use $ \hat{\beta}_n = (\hat{\beta}_{1n}', \hat{\beta}_{2n}')' $ to denote the maximizer of (3.1) or (3.2). In the case of independent censoring, we can get the Fisher information matrix $ \Omega(\beta_0), $ which is the limit in probability of $ n^{-1}(-\mathcal{H}_1(\beta_0)) $. As usual, we assume that $ \Omega(\beta_0) $ is nonsingular. In the case of dependent censoring, let
and let $ \Omega_{\beta} $, $ \Omega_{\beta\gamma} $ and $ D_{\gamma} $ denote their limits at $ \beta = \beta_0 $ and $ \gamma = \gamma_0 $.
Using some similar arguments as those of Lin et al [11], we can prove that the random vectors $ n^{-\frac{1}{2}}U_2(\beta_0;\hat{\gamma}) $ and $ n^{\frac{1}{2}}(\tilde{\beta}-\beta_0) $ converge in distribution to zero-mean normal random vectors with covariance matrices $ M(\beta_0) = \Omega_{\beta}-\Omega_{\beta\gamma}D^{-1}_{\gamma}\Omega'_{\beta\gamma} $ and $ V(\beta_0) = \Omega_{\beta}^{-1}-\Omega_{\beta}^{-1}\Omega_{\beta\gamma}D^{-1}_{\gamma}\Omega'_{\beta\gamma}\Omega_{\beta}^{-1} $, respectively.
Let $ \Omega_1(\beta_{10}) = \Omega_{11}(\beta_{10}, 0) $, where $ \Omega_{11}(\beta_{10}, 0) $ is the leading $ q \times q $ submatrix of $ \Omega(\beta_0) $ with $ \beta_{20} = 0 $ and $ V_1(\beta_{10}) = V_{11}(\beta_{10}, 0) $, where $ V_{11}(\beta_{10}, 0) $ is the leading $ q \times q $ submatrix of $ V(\beta_0) $ with $ \beta_{20} = 0 $. The following theorem shows that $ \hat{\beta}_n $ is root-$ n $ consistent if $ \lambda_n \to 0 $ at an appropriate rate.
Theorem 3.1 Assume that $ (Z_1, T_1, C_1), \dots, (Z_n, T_n, C_n) $ are independently and identically distributed, and that $ C_i $ is independent of $ T_i $ or conditionally independent of $ T_i $ given $ Z_i $. If $ \sqrt{n}\lambda_n = O_p(1) $, then the adaptive Lasso estimator satisfies $ ||\hat{\beta}_n-\beta_0|| = O_p(n^{-1/2}) $.
Proof As mentioned earlier, in the case of independent censoring, the log partial likelihood is
By Theorem 4.1 and Lemma 3.1 of Andersen and Gill [14], it follows that for each $ \beta $ in a neighbourhood of $ \beta_0 $,
It is sufficient to show that for any given $ \varepsilon>0 $, there exists a large constant $ K $ such that
where $ u = (u_1, \dots, u_p)' $. This implies that, with probability at least $ 1-\varepsilon $, there exists a local maximizer in the ball $ B_n(K) = \{ \beta_0+n^{-1/2}u, ||u||\leq K \} $, $ K>0 $. Hence, there exists a local maximizer $ \hat{\beta} $ such that $ ||\hat{\beta}-\beta_0|| = O_p(n^{-1/2}) $.
In the case of independent censoring, because $ U_1(\beta_0)/\sqrt{n} \to N\{0, \Omega(\beta_0)\} $ in distribution and $ -\mathcal{H}_1(\beta_0)/n \to \Omega(\beta_0) $ in probability, we can get $ U_1(\beta_0)/\sqrt{n} = O_p(1) $ and $ -\mathcal{H}_1(\beta_0)/n = \Omega(\beta_0)+o_p(1) $. For any $ \beta \in \partial B_n(K) $, where $ \partial B_n(K) $ denotes $ B_n(K) $'s boundary, by the second-order Taylor expansion of the log partial likelihood, we have
Then we have
In the case of dependent censoring, we can write
Since the maximum partial likelihood estimator $ \tilde{\beta} $ satisfies $ ||\tilde{\beta}-\beta_0|| = O_p(n^{-1/2}) $, by the Taylor expansion, we have, for $ 1\leq j \leq q $,
In addition, since $ \sqrt{n}\lambda_n = O_p(1) $, we have
Therefore, in (3.6) or (3.7), if we choose a sufficiently large $ K $, the first term is of the order $ K^2n^{-1} $, while the second and third terms are of the order $ Kn^{-1} $ and are thus dominated by the first term. Therefore (3.5) holds, which completes the proof.
If $ \lambda_n $ is chosen properly, the adaptive Lasso estimator has the oracle property. These properties are shown next.
Theorem 3.2 Assume that $ \sqrt{n}\lambda_n \to \lambda_0 $ and $ n\lambda_n \to \infty $. Then, under the conditions of Theorem 3.1, with probability tending to 1, the root-$ n $ consistent adaptive Lasso estimator $ \hat{\beta}_n $ must satisfy the following conditions:
(1) (Sparsity) $ \hat{\beta}_{2n} = 0 $;
(2) (Asymptotic normality) $ \sqrt{n}(\hat{\beta}_{1n}-\beta_{10}) $ converges in distribution to the normal distribution of $ N( 0, \Omega_1^{-1}(\beta_{10})) $ for the independent censoring case, or $ N( 0, V_1(\beta_{10})) $ for the dependent censoring case.
Proof (1) Here we show that $ \hat{\beta}_{2n} = 0 $. It is sufficient to show that, for any sequence $ \beta_1 $ satisfying $ ||\beta_1-\beta_{10}|| = O_p(n^{-1/2}) $ and for any constant $ K $,
We will show that, with probability tending to 1, for any $ \beta_1 $ satisfying $ ||\beta_1-\beta_{10}|| = O_p(n^{-1/2}) $, $ \partial Q_i(\beta)/\partial \beta_j $ and $ \beta_j $ have different signs for $ \beta_j \in (-Kn^{-1/2}, Kn^{-1/2}) $ with $ j = q+1, \ldots, p. $ For each $ \beta $ in a neighbourhood of $ \beta_0 $, by Taylor expansion,
where $ f_1(\beta) = -\frac{1}{2}(\beta-\beta_0)'(\Omega(\beta_0)+o(1) )(\beta-\beta_0) $ or $ f_2(\beta) = -\frac{1}{2}(\beta-\beta_0)'(\Omega_{\beta}+o(1))(\beta-\beta_0). $ For $ j = q+1, \dots, p, $ we have
Note that $ n^{1/2}(\tilde{\beta}_{j}-0) = O_p(1) $, so that we have
Since $ n\lambda_n \to \infty $, the sign of $ \frac{\partial Q_i(\beta)}{\partial \beta_j} $ in (3.8) is completely determined by the sign of $ \beta_j $ when $ n $ is large, and they always have different signs.
(2) We need to show the asymptotic normality of $ \hat{\beta}_{1n}. $ From the proof of Theorem 3.1, it is easy to show that there exists a root-$ n $ consistent maximizer $ \hat{\beta}_{1n} $ of $ Q_i(\beta_1, 0) $, i.e.
In the case of independent censoring, let $ U_{11}(\beta) $ be the first $ q $ elements of $ U_1(\beta) $ and let $ \hat{I}_{11}(\beta) $ be the first $ q \times q $ submatrix of $ -\mathcal{H}_1(\beta) $. Then
where $ \beta^* $ is between $ \hat{\beta}_n $ and $ \beta_0 $. The last equation is implied by $ \operatorname{sign}(\hat{\beta}_{jn}) = \operatorname{sign}(\beta_{j0}) $ when $ n $ is large. Using Theorem 3.2 of Andersen and Gill [14], we can prove that $ U_{11}(\beta_0)/\sqrt{n} \to N\{0, \Omega_{1}(\beta_{10}) \} $ in distribution and $ \hat{I}_{11}(\beta^*)/n \to \Omega_1(\beta_{10}) $ in probability as $ n \to \infty $. Furthermore, if $ n \to \infty $ and $ \sqrt{n}\lambda_n \to \lambda_0, $ a nonnegative constant, we have
with $ b_1 = \left(\frac{\operatorname{sign}(\beta_{10})}{|\beta_{10}|}, \ldots, \frac{\operatorname{sign}(\beta_{q0})}{|\beta_{q0}|}\right)' $, since $ \tilde{\beta}_{j} \to \beta_{j0}\neq 0 $ for $ 1\leq j \leq q $. Then by Slutsky's Theorem, $ \sqrt{n}(\hat{\beta}_{1n}-\beta_{10}) \to N\left(-\lambda_0\Omega^{-1}_1(\beta_{10})b_1, \Omega^{-1}_1(\beta_{10}) \right) $ in distribution as $ n \to \infty $. In particular, if $ n \to \infty $ and $ \sqrt{n} \lambda_n \to 0 $, we have
where $ \overset{d}\longrightarrow $ means converging in distribution.
In the case of dependent censoring, let $ U_{21}(\beta;\gamma) $ be the first $ q $ elements of $ U_2(\beta;\gamma) $ and let $ \hat{I}_{11}(\beta;\gamma) $ be the first $ q \times q $ submatrix of $ \hat{\Omega}_{\beta}(\beta;\gamma) $. Then
where $ \beta^* $ is between $ \hat{\beta}_n $ and $ \beta_0 $. The last equation is implied by $ \operatorname{sign}(\hat{\beta}_{jn}) = \operatorname{sign}(\beta_{j0}) $ when $ n $ is large. Let $ M_1(\beta_{10}) = M_{11}(\beta_{10}, 0) $, where $ M_{11}(\beta_{10}, 0) $ is the leading $ q \times q $ submatrix of $ M(\beta_0) $ with $ \beta_{20} = 0 $, and let $ \Omega_{\beta1}(\beta_{10}) = \Omega_{\beta_{11}}(\beta_{10}, 0) $, where $ \Omega_{\beta_{11}}(\beta_{10}, 0) $ is the leading $ q \times q $ submatrix of $ \Omega_{\beta} $ with $ \beta_{20} = 0 $. Note that $ U_{21}(\beta_0;\hat{\gamma})/\sqrt{n} \to N(0, M_1(\beta_{10}) ) $ in distribution and $ \hat{I}_{11}(\beta^*)/n \to \Omega_{\beta1}(\beta_{10}) $ in probability as $ n \to \infty $. Furthermore, if $ \sqrt{n}\lambda_n \to \lambda_0, $ a nonnegative constant, we have
with $ b_1 = \left(\frac{\operatorname{sign}(\beta_{10})}{|\beta_{10}|}, \ldots, \frac{\operatorname{sign}(\beta_{q0})}{|\beta_{q0}|}\right)' $, since $ \tilde{\beta}_{j} \to \beta_{j0}\neq 0 $ for $ 1\leq j \leq q $. Then by Slutsky's Theorem, $ \sqrt{n}(\hat{\beta}_{1n}-\beta_{10}) \to N\left(-\lambda_0\Omega^{-1}_{\beta1}(\beta_{10})b_1, V_1(\beta_{10}) \right) $ in distribution as $ n \to \infty $. In particular, if $ \sqrt{ n} \lambda_n \to 0 $, we have
in distribution as $ n \to \infty. $
Remark It is worth noting that, as $ n $ goes to infinity, the adaptive Lasso can perform as well as if the correct submodel were known. Since the proofs only require the root-$ n $ consistency of $ \tilde{\beta} $, any root-$ n $ consistent estimator of $ \beta_0 $ can be used to construct the adaptive weights $ \omega_j $ without changing the asymptotic properties.
The optimization problem (3.1) or (3.2) is strictly convex and can therefore be solved by many convex optimization algorithms. Here we present an algorithm based on the Alternating Direction Method of Multipliers (ADMM) [15]. The ADMM algorithm solves problems of the form
with variables $ x \in R^n $ and $ z \in R^m $, where $ A \in R^{p \times n} $, $ B \in R^{p \times m} $, and $ c \in R^p $. The augmented Lagrangian is
ADMM consists of the iterations
with $ \rho>0. $
In ADMM form, the problem (3.1) or (3.2) can be written as
where $ f(\beta) $ is equal to $ -l_1(\beta)/n $ or $ -l_2(\beta)/n $, and $ g(z) = \lambda\sum_{j = 1}^p|z_j|\omega_j $. The updates performed by the algorithm during each iteration are
where $ S $ is the soft thresholding operator satisfying
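A minimal implementation of the elementwise soft thresholding operator $ S_\kappa(a) = \operatorname{sign}(a)\max(|a|-\kappa, 0) $:

```python
import numpy as np

def soft_threshold(a, kappa):
    """Elementwise soft thresholding: S_kappa(a) = sign(a) * max(|a| - kappa, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)
```

Values whose magnitude does not exceed the threshold are mapped to exactly zero, which is what produces sparse estimates in the $ z $-update.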
The $ \beta $-update can be done by solving the equation $ -\frac{U_i(\beta)}{n}+\rho u^{k}+\rho(\beta-z^{k}) = 0 $, $ i = 1, 2 $, for which many standard methods, such as the Newton-Raphson method, are available.
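To illustrate the whole iteration, the sketch below replaces $ -l_i(\beta)/n $ with a least-squares loss $ f(\beta) = \|X\beta-y\|^2/(2n) $, a hedged stand-in chosen because the $ \beta $-update then has a closed form; in the paper's setting this step would instead be a Newton-Raphson solve of the equation above. All data are synthetic and purely illustrative.

```python
import numpy as np

def soft(a, k):
    # Elementwise soft thresholding.
    return np.sign(a) * np.maximum(np.abs(a) - k, 0.0)

def adaptive_lasso_admm(X, y, lam, omega, rho=1.0, n_iter=200):
    """ADMM for min_beta f(beta) + lam * sum_j omega_j |z_j| subject to beta = z,
    with f(beta) = ||X beta - y||^2 / (2n) as a stand-in for -l_i(beta)/n."""
    n, p = X.shape
    beta = np.zeros(p); z = np.zeros(p); u = np.zeros(p)
    Q = X.T @ X / n + rho * np.eye(p)    # fixed coefficient matrix of the beta-update
    q0 = X.T @ y / n
    for _ in range(n_iter):
        beta = np.linalg.solve(Q, q0 + rho * (z - u))   # beta-update (closed form here)
        z = soft(beta + u, lam * omega / rho)           # z-update: soft thresholding
        u = u + beta - z                                # scaled dual update
    return z                                            # z carries the exact zeros

# Illustrative data (not from the paper): sparse truth, adaptive weights from OLS.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, 0.0, 0.0, 1.0, 0.0])
y = X @ beta_true + 0.3 * rng.standard_normal(n)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat = adaptive_lasso_admm(X, y, lam=0.1, omega=1.0 / np.abs(ols))
```

Note that the exact zeros appear in the $ z $ iterate, which is why it is returned as the estimate.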
This algorithm drives the coefficients that should be estimated as zero to very small values (exactly zero in the $ z $-update), and it converges quickly based on our empirical experience.
In this section, we examine the performance of the adaptive Lasso method under the additive hazards model; for comparison, the Lasso, the smoothly clipped absolute deviation (SCAD) and the maximum partial likelihood estimator (MPLE) are also considered. For given $ p, $ the covariates $ Z $ are assumed to follow the multivariate normal distribution with mean zero, variance one, and the correlation between $ Z_j $ and $ Z_k $ being $ \rho^{|j-k|} $ with $ \rho = 0.5, j, k = 1, \dots, p $. We set $ \beta_{0j} = 1 $ for the first and last two components of the covariates and $ \beta_{0j} = 0 $ for the other components. The results given below are based on sample size $ n = 300 $ and 500 replications.
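The covariate design described above can be generated as follows; this is a sketch of the stated simulation setup, with the function name and seed chosen here for illustration.

```python
import numpy as np

def ar1_covariates(n, p, rho=0.5, seed=0):
    """Mean-zero normal covariates with corr(Z_j, Z_k) = rho^{|j-k|}."""
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1)-type correlation matrix
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(p), Sigma, size=n), Sigma

Z, Sigma = ar1_covariates(n=300, p=10)
```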
To measure prediction accuracy, we define the mean weighted squared error (MWSE) to be $ (\hat{\beta}-\beta_0)'E(ZZ')(\hat{\beta}-\beta_0) $. Besides the MWSE, we also use the averaged number of nonzero estimates of the parameters whose true values are not zero (TP), $ TP = \sum_{i = 1}^{p}I(\beta_{0i} \neq 0)I(\hat{\beta}_i \neq 0), $ and the averaged number of nonzero estimates of the parameters whose true values are zero (FP), $ FP = \sum_{i = 1}^{p}I(\beta_{0i} = 0)I(\hat{\beta}_i \neq 0). $ It is easy to see that TP and FP provide estimates of the true and false positive probabilities, respectively. For the selection of the tuning parameter in the proposed method, we use the Bayesian information criterion $ \text{BIC}(\lambda) = -2l_i(\hat{\beta})+q_n \times \log(n), $ for $ i = 1\ \text{or}\ 2, $ with $ q_n $ denoting the number of nonzero $ \beta $ estimates. One then chooses the value of $ \lambda $ that minimizes $ \text{BIC}(\lambda) $.
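The criteria above are straightforward to compute for each replication; a small sketch with illustrative vectors follows (the helper names and the log-likelihood value are ours, not from the paper).

```python
import numpy as np

def bic(loglik, beta_hat, n):
    """BIC(lambda) = -2 l_i(beta_hat) + q_n * log(n), q_n = # nonzero estimates."""
    q_n = int(np.sum(beta_hat != 0))
    return -2.0 * loglik + q_n * np.log(n)

def tp_fp(beta_hat, beta_true):
    """True/false positive counts for a single replication."""
    tp = int(np.sum((beta_true != 0) & (beta_hat != 0)))
    fp = int(np.sum((beta_true == 0) & (beta_hat != 0)))
    return tp, fp

# Illustrative vectors only.
beta_true = np.array([1.0, 1.0, 0.0, 0.0, 1.0])
beta_hat = np.array([0.9, 0.0, 0.2, 0.0, 1.1])
```

Averaging the per-replication TP and FP counts over the 500 replications yields the quantities reported in the tables.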
Table 1 displays the results on covariate selection with $ p = 10 $ or $ 20 $ in the case of independent censoring. In this case, the failure times $ T_i $ were generated from model (2.1) with $ \lambda_0 = 0.5 $ or $ 1 $, and the observation times $ C_i $ were generated from the uniform distribution over $ (0, 3.5) $ or the exponential distribution with parameter $ \lambda = 0.5 $ or $ 0.7 $. One can see from Table 1 that the aLasso approach gave the smallest FP among the methods compared, which means that the aLasso chose unimportant variables much less often than the other methods, while keeping a fairly high TP and a low MWSE. The SCAD method gave the largest TP in most cases among the methods considered here.
Table 2 displays the results on covariate selection with $ p = 10 $ or $ 20 $ in the case of dependent censoring. In this case, we consider different combinations of $ \lambda_0, \lambda_c $ and $ \gamma_0. $ Here, we set all components of $ \gamma_0 $ to be equal; for example, $ \gamma_0 = 0.1 $ in Table 2 means $ \gamma_0' = (0.1, 0.1, \dots, 0.1) $ in model (2.3). Keeping $ \gamma_0 $ unchanged, we list four combinations of $ \lambda_0 $ and $ \lambda_c $ in each part, corresponding to $ \lambda_0 = 0.5 $ or $ 1 $ and $ \lambda_c = 0.5 $ or $ 0.7 $. As in the case of independent censoring, the aLasso approach gave the smallest FP in all dependent cases.
Also, it can be seen from Tables 1–2 that, as the number of covariates increases, the aLasso tends to give the smallest MWSE and largest TP among the methods considered. Overall, the adaptive Lasso performs well in terms of both variable selection and prediction accuracy.
In this section, we apply the proposed selection procedure to a set of data on mice hepatocellular adenoma. This data set arises from a 2-year tumorigenicity study conducted by the National Toxicology Program, in which groups of mice were exposed to chloroprene at different concentrations by inhalation. Each mouse was examined for various tumors only once, at its death. Some mice died naturally during the study, and those that survived to the end of the study were sacrificed for examination. At each examination time, tumors were recorded if they had developed, but the exact tumor onset times were unknown; therefore, only current status data are available.
Here we consider the liver tumor data. The covariates on which information was collected include the initial weight of the mouse, the body weight change, the weight change rate, the gender of the mouse, and the dose. For the analysis below, we focus on the 200 mice that belong to either the control group or the PPM80 group.
To apply the aLasso regression procedure, let IW denote the initial weight of the mouse, BWC the body weight change, and BWCR the weight change rate. We define Gender = 1 if the mouse was male and 0 otherwise, and PPM80 = 0 if the mouse was in the control group and 1 otherwise. For the analysis, we standardized the three continuous covariates IW, BWC and BWCR. The analysis results given by the aLasso procedure are presented in Table 3. As in the simulation study and for comparison, we also include the results obtained by applying the other penalized procedures discussed here. The aLasso, Lasso and SCAD all suggest that the gender and the initial weight of the mouse had no significant influence on the existence of hepatocellular adenoma.
This paper has discussed the variable selection problem for the additive hazards model based on current status data. To select important variables, a penalized log partial likelihood method was developed and its oracle properties were established. The simulation results suggest that the proposed method performs well in dropping the unimportant variables while retaining the important ones.
As mentioned above, the proposed method can be seen as a generalization of the method given in Zhang and Lu [1] for the case in which the model is the proportional hazards model and the data are right-censored. It could therefore be generalized in several directions. For one, although the preceding sections assume that $ C $ is either independent of $ T $ and $ Z $ or conditionally independent of $ T $ given $ Z $, it would be natural to extend the proposed method to other dependence structures for the censoring time $ C $ or to other types of data.
The second direction is that the weights $ \omega_j $ can be constructed from other estimators: since the proofs only require the root-$ n $ consistency of $ \tilde{\beta} $, any root-$ n $ consistent estimator of $ \beta_0 $ can be used.