We consider a kernel-based online quantile regression problem in which the sampling process is unbounded. Let $X$ be an input space, $Y=R$ an output space, and denote $Z=X\times Y$. The learning algorithm is based on a sample $\mathbf{z}=\{(x_i,y_i)\}_{i=1}^T \in Z^T$, where $T$ is the sample size and the pairs are drawn independently from a Borel measure $\rho$ on $Z$. Much of the previous work concerns learning schemes built on the least square loss $\phi_{l}(u)=u^2$, $u \in R$. The corresponding target function is the regression function $f_{\rho}:X\rightarrow Y$ defined by
where $\rho_{x}(\cdot)$ is the conditional distribution of $\rho$ at each $x\in X.$ However, algorithms based on the least square loss lose robustness when the noise distribution has heavy tails or abnormal variance. In 1964, Huber proposed the least modulus method in place of the least square method, which weakens the condition on the noise. To promote sparsity, Vapnik [5] combined the least modulus method with a threshold and introduced the $\varepsilon$-insensitive loss $\phi^{\varepsilon}(u)=(|u|-\varepsilon)_{+}$, $u\in R.$ With $\varepsilon=0$ it becomes a special case of the pinball loss $\phi_\tau:R\rightarrow R_{+}$,
The approximation target is then the quantile regression function $f_{\rho,\tau}$, whose value $f_{\rho,\tau}(x)$ is a $\tau$-quantile of the conditional distribution $\rho_{x}$ at each $x\in X.$ Here a $\tau$-quantile of $\rho_{x}$ is a value $W\in R$ satisfying
Steinwart and Christmann [1] carried out an error analysis for the pinball loss under certain noise conditions. Furthermore, Xiang et al. [2] investigated the learning ability of the $\varepsilon$-insensitive pinball loss
in regularization schemes for sparsity and robustness.
In this paper, we combine the pinball loss with an online algorithm in a reproducing kernel Hilbert space ${\mathcal H}_K.$ Let $K:X\times X\rightarrow R$ be a Mercer kernel, that is, a continuous, symmetric and positive semi-definite function; ${\mathcal H}_K$ is the completion of the linear span of the function set $\{K_{x}=K(x,\cdot):x\in X\}$ with respect to the inner product $\langle\cdot,\cdot\rangle_{K}$ satisfying $\langle K_{x},K_{y}\rangle_{K}=K(x,y).$ Here we consider the varying pinball loss $\phi^{(t)}(u)=\phi_{\tau_{t}}(u)$, where the quantile parameter $\tau_{t}$ changes with the learning time $t$ and converges to $\tau$ as $t$ goes to infinity. One aim of this paper is to examine the role of $\tau$ in the following algorithm.
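Before stating the algorithm, a minimal Python sketch may help fix ideas. It encodes one common convention for $\phi_\tau$ and its left derivative, applied to the residual $u=f(x)-y$; the particular form below is an illustrative assumption only, the paper's own display of $\phi_\tau$ being the authoritative one.

```python
# Minimal illustrative sketch (not the paper's exact display): one common
# convention for the pinball loss, applied to the residual u = f(x) - y so
# that the population minimizer is the tau-quantile of rho_x.
def pinball_loss(u, tau):
    """phi_tau(u) = (1 - tau) * u for u > 0 and -tau * u for u <= 0 (always >= 0)."""
    return (1.0 - tau) * u if u > 0 else -tau * u


def pinball_left_derivative(u, tau):
    """Left (one-sided) derivative of phi_tau: (1 - tau) for u > 0, -tau for u <= 0."""
    return (1.0 - tau) if u > 0 else -tau
```

In particular, under this convention $|(\phi_\tau)'_-(u)|\leq\max\{\tau,1-\tau\}\leq 1$ for $\tau\in(0,1)$, which is the type of bound on $\|(\phi^{(t)})'_-\|_{\infty}$ used in the proof of Lemma 2 below.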
Definition 1 The online quantile regression algorithm is defined by $f_1=0$ and
where $\lambda_t>0$ is a regularization parameter, $\eta_t>0$ is a step size and $(\phi^{(t)})_-'$ is the left (one-sided) derivative of $\phi^{(t)}$. From the formula of $\phi^{(t)}$, we see that the learning sequence $f_{t}$ with the varying quantile can be expressed as
In the error analysis, the parameters $\lambda_{t},\eta_{t}$ are tuned to accelerate the learning rate. The second contribution of our paper is to remove the boundedness assumption on the output value $y$. Following the framework of [3], we instead exploit the following moment hypothesis.
Moment Hypothesis There exist constants $M\geq 1$ and $C>0$ such that
Clearly this hypothesis generalizes the boundedness assumption. The main purpose of this paper is to establish a bound on $\|f_{T+1}-f_{\rho,\tau}\|$ when the learning time $T$ is large enough.
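For readers who prefer pseudocode, the following Python sketch implements one pass of the scheme in Definition 1, keeping $f_t$ as a kernel expansion on the observed inputs and reusing the illustrative `pinball_left_derivative` above; the kernel and the schedules $\eta_t,\lambda_t,\tau_t$ passed in are placeholders, not the choices analyzed in this paper.

```python
import numpy as np


def gaussian_kernel(x, xp, sigma=1.0):
    # A concrete Mercer kernel for illustration; any continuous, symmetric,
    # positive semi-definite K on X x X can be substituted.
    return np.exp(-((x - xp) ** 2) / (2.0 * sigma ** 2))


def online_quantile_regression(samples, tau_schedule, eta_schedule, lam_schedule,
                               kernel=gaussian_kernel):
    """One pass of (1.1): f_{t+1} = f_t - eta_t [ phi'_-(f_t(x_t) - y_t) K_{x_t} + lam_t f_t ].

    f_t is stored via the expansion f_t(x) = sum_i coefs[i] * kernel(points[i], x), with f_1 = 0.
    """
    points, coefs = [], []
    for t, (x_t, y_t) in enumerate(samples, start=1):
        tau_t, eta_t, lam_t = tau_schedule(t), eta_schedule(t), lam_schedule(t)
        f_t_x = sum(c * kernel(p, x_t) for p, c in zip(points, coefs))
        grad = pinball_left_derivative(f_t_x - y_t, tau_t)
        coefs = [(1.0 - eta_t * lam_t) * c for c in coefs]  # the -eta_t * lam_t * f_t part
        points.append(x_t)
        coefs.append(-eta_t * grad)                         # the -eta_t * phi'_- * K_{x_t} part
    return points, coefs
```

This also makes visible the expansion mentioned after Definition 1: $f_{t+1}$ is a linear combination of $K_{x_1},\dots,K_{x_t}$ whose coefficients are determined by the left derivatives of the varying loss.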
For the quantile regression problem, the learning performance of the sequence $\{f_t\}$ defined in (1.1) can be measured by the excess generalization error $\varepsilon(f_t)-\varepsilon(f_{\rho,\tau})$. Here the generalization error $\varepsilon(f)$ of a function $f: X\rightarrow Y$ with respect to the pinball loss $\phi_{\tau}$ is defined by
In the following, we write the generalization error $\varepsilon^{(t)}(f)$ associated with the varying quantile $\tau_t$ as
The relation between the excess generalization error $\varepsilon(f)-\varepsilon(f_{\rho,\tau})$ and the error $\|f-f_{\rho,\tau}\|_{L_{\rho_X}^{r}}$ is described by the following comparison theorem, given in [1].
Definition 2 Let $0<\varphi \leq \infty$ and $\xi> 1$. Denote $r=\varphi \xi /(\varphi+1)>0$. We say that $\rho$ has a $\tau$-quantile of $\varphi$-average type $\xi$ if there exist two positive functions $\omega_{\tau}$ and $b_{\tau}$ on $X$ such that $\{b_{\tau}\omega_{\tau}^{\xi-1}\}^{-1}\in L_{\rho_{X}}^\varphi$ and for any $x\in X$ and $\omega \in (0,\omega_{\tau}(x)]$, there hold
and
In our analysis we shall make use of the following comparison theorem.
Lemma 1 Let $0<\varphi \leq \infty$ and $\xi> 1$. Denote $r=\varphi \xi /(\varphi+1)>0$. If the measure $\rho$ has a $\tau$-quantile of $\varphi$-average type $\xi$, then for any measurable function $f$ on $X$, we have
For the error analysis, we need the approximation error $D(\lambda)$ associated with $(\rho,K,\tau)$, defined by
A minimizer $f_{\lambda}$ of (2.2) is called the regularization function. Throughout the paper, we assume that $D(\lambda)$ satisfies the bound
with some constant $D_{0}>0$ and power index $0<\gamma\leq 1$. Now we can present our main theorem.
Theorem 1 Assume that the measure $\rho$ has a $\tau$-quantile of $\varphi$-average type $\xi$ and that the approximation error bound (2.3) holds. Take the parameters $\eta_t,\lambda_t$ of the form
with
Then we get
where $\Lambda$ is given by
We are in a position to present the key analysis of our study. We shall consider the sample error $\|f_{T+1}-f_{\lambda_T}\|_{K}$.
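Before the lemma, it may help to record the standard splitting that motivates this choice (a schematic only, assuming the 1-Lipschitz continuity of $\phi_\tau$, the bound $\|f\|_\infty\le\kappa\|f\|_K$ with $\kappa=\sup_{x\in X}\sqrt{K(x,x)}$, and the definition of $D(\lambda)$ in (2.2)):
$$\varepsilon(f_{T+1})-\varepsilon(f_{\rho,\tau})\le\big[\varepsilon(f_{T+1})-\varepsilon(f_{\lambda_T})\big]+\big[\varepsilon(f_{\lambda_T})-\varepsilon(f_{\rho,\tau})\big]\le \kappa\,\|f_{T+1}-f_{\lambda_T}\|_K+D(\lambda_T).$$
Hence the sample error below, combined with the bound (2.3) on $D(\lambda_T)$ and the comparison theorem (2.1), yields the rate for $\|f_{T+1}-f_{\rho,\tau}\|_{L^{r}_{\rho_X}}$ stated in Theorem 1.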
Lemma 2 Define the sequence $\{f_t\}$ by (1.1). Let the parameters $\eta_t,\lambda_t$ take the same form as in Theorem 1 and assume that the bound (2.3) holds. Then
where $\theta^*=\min\{\alpha-\beta,2+\beta \gamma-3\beta-2\alpha\}$ and $C^{'}$ is a constant independent of $T$.
Proof By induction, we may assume that $\|f_{t}\|_{K}\leq \frac{\kappa}{\lambda_{t}}$ and $0<\tau_{t}<1$ for all $t\in N$, where $\kappa=\sup_{x\in X}\sqrt{K(x,x)}$. From the definition of $f_{t}$, we see that
Denote $B_t=(\phi^{(t)})_-'(f_t(x_t)-y_t)K_{x_t}+\lambda_t f_t.$ Then $\|B_t\|_K\leq 2\kappa$, since $\|(\phi^{(t)})_-'\|_{\infty}\leq 1$, $\|K_{x_t}\|_K=\sqrt{K(x_t,x_t)}\leq \kappa$, and $\lambda_t\|f_t\|_K\leq \kappa$ by the induction hypothesis.
From the definition (1.1), we see by taking inner products that
For the second term, the reproducing property and the convexity of the loss function $\phi^{(t)}$ tell us that
Thus, taking expectation with respect to $z_t$, we find
Therefore, combining this with the bound for $B_t$ and (3.2), we have
We decompose $\|f_{t}-f_{\lambda_t}\|_K$ into $\|f_{t}-f_{\lambda_{t-1}}\|_K$ and the drift error $d_t=\|f_{\lambda_{t-1}}-f_{\lambda_t}\|_K.$ On one hand,
by the lemma in [4]. Note that $\|f_{\lambda}\|_K\leq \sqrt{\frac{D(\lambda)}{\lambda}}$ for each $\lambda>0.$ Then $d_t\leq d_0 t^{-(1-\frac{\beta}{2}+\frac{\beta\gamma}{2})}$, where $d_0=\beta 2^{\beta+1}\sqrt{2D_0\lambda_1^\gamma/\lambda_1}$. Applying the elementary inequality $2ab \leq A a^2 b^{q} + b^{2-q}/A$ with $0<q<2$ and $A>0$ (which follows from writing $2ab=2(ab^{q/2})(b^{1-q/2})$ and using $2st\leq As^2+t^2/A$) to the case $a=\|f_{t}-f_{\lambda_{t-1}}\|_K$, $b=d_t$, we obtain
By the above estimates, we get
Taking the constant $A=\frac{\eta_1\lambda_1}{2d_1^{q}}$ and $q=\frac{\alpha+\beta}{1-\frac{\beta}{2}+\frac{\beta\gamma}{2}}$, the restriction on the power indices $\alpha,\beta$ in Theorem 1 implies that
where $\theta=\min\{2+\beta \gamma-2\beta-\alpha,2\alpha\}$ and $C= d_0^{2-q}/A + d_0^{2} +4\kappa^2\eta_1^2>0$.
Applying relation (3.4) iteratively for $t=1, \cdots, T,$ we have that
Applying the following elementary inequality [4] with $0<a_1<1$, $c, a_2>0$ and $t\in \mathbb{N},$
to the case of $a_1=\alpha+\beta <1$, $a_2=\theta$ and $c=\frac{\eta_1\lambda_1}{2}$, we get the desired result (3.1).
Proof of Theorem 1 By the decomposition in [6],
Since $\phi$ is uniformly continuous, we have
The above bound, together with the comparison theorem (2.1) and Lemma 2, gives our conclusion (2.5).
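Finally, as an informal numerical check (not part of the analysis), one can run the illustrative sketches above on a synthetic stream; all concrete values, the kernel bandwidth, and the polynomial form $\eta_t=\eta_1 t^{-\alpha}$, $\lambda_t=\lambda_1 t^{-\beta}$ below are assumptions chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, T = 0.7, 1000

# Synthetic stream: y = sin(2*pi*x) + uniform noise on [-0.3, 0.3], whose
# conditional tau-quantile is sin(2*pi*x) + (-0.3 + 0.6 * tau).
xs = rng.uniform(0.0, 1.0, size=T)
ys = np.sin(2 * np.pi * xs) + rng.uniform(-0.3, 0.3, size=T)

# Illustrative polynomial schedules; the exponents must satisfy the
# restrictions of Theorem 1, and the values below are placeholders.
alpha, beta = 0.6, 0.2
eta = lambda t: 0.5 * t ** (-alpha)
lam = lambda t: 0.1 * t ** (-beta)
tau_t = lambda t: tau + (0.5 - tau) / t  # varying quantile with tau_t -> tau

points, coefs = online_quantile_regression(
    list(zip(xs, ys)), tau_t, eta, lam,
    kernel=lambda p, x: gaussian_kernel(p, x, sigma=0.1),
)

# Rough comparison of f_{T+1} with the true conditional tau-quantile on a grid.
grid = np.linspace(0.0, 1.0, 100)
f_hat = np.array([sum(c * gaussian_kernel(p, x, sigma=0.1)
                      for p, c in zip(points, coefs)) for x in grid])
f_true = np.sin(2 * np.pi * grid) + (-0.3 + 0.6 * tau)
print("mean absolute deviation on the grid:", float(np.mean(np.abs(f_hat - f_true))))
```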