Claims experience is an important factor in the development of an insurance company. Insurers commonly fit claims data with the exponential, lognormal or Pareto distribution in order to control risk. In claim modelling, the Pareto distribution is a useful guide for setting premiums and managing risk, and it is well suited to fitting large claims. Historical insurance claims data often show strong positive skewness, and their distribution has a fat tail. Because the Pareto distribution is heavy-tailed, it is popular among scholars for modelling such data.
The Pareto positive stable (PPS) distribution was first proposed by Sarabia and Prieto in 2009 [1]. They explained why the Pareto positive stable distribution is suitable for modelling insurance losses: it is easy to fit and has a simple quantile expression, which makes Monte Carlo simulation simple, and an analytical expression is available for the value at risk. Ortobelli et al. [2] proposed some stable Paretian models for optimal portfolio selection and for quantifying the risk of a given portfolio. Guillen et al. [3] used the Pareto positive stable distribution to model insurance data and studied the fit.
This paper first introduces the Pareto positive stable distribution, its probability density function and its quantile function. We then give the moment, regression and maximum likelihood estimators of its parameters. Next, for randomly generated data, we use maximum likelihood to estimate the parameters of the Pareto positive stable distribution, the normal distribution and the Pareto distribution, and we compare the resulting fits. By the Akaike information criterion (AIC) [4], the Pareto positive stable distribution fits the insurance claims data better, and it is therefore better suited to analysing such data.
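The cumulative distribution function ($cdf$) of the Pareto positive stable distribution is given by
$$F(x)=1-\exp\left\{-\lambda\left[\log\left(\frac{x}{\sigma}\right)\right]^{\nu}\right\},\quad x\geq\sigma, \qquad (2.1)$$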
where $\lambda, \sigma, \nu>0$. Note that $\lambda, \nu>0$ are shape parameters and $\sigma$ is a scale parameter.
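Differentiating the cumulative distribution function of the Pareto positive stable distribution, we obtain its probability density function ($pdf$):
$$f(x)=\frac{\lambda\nu}{x}\left[\log\left(\frac{x}{\sigma}\right)\right]^{\nu-1}\exp\left\{-\lambda\left[\log\left(\frac{x}{\sigma}\right)\right]^{\nu}\right\},\quad x>\sigma. \qquad (2.2)$$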
The Pareto positive stable distribution has a two-fold origin. One is the classical Pareto distribution [5-6], whose cumulative distribution function ($cdf$) is
$$F(x)=1-\left(\frac{x}{\sigma}\right)^{-\alpha},\quad x\geq\sigma,$$
where $\alpha>0$ is a shape parameter and $\sigma$ is a scale parameter, which represents the smallest value in the sample. Let $\alpha=\lambda\nu$; then we can obtain the cumulative distribution function of the PPS distribution. The other is a simple transformation of the classical Weibull distribution [7-8]. Let $Z$ be a classical Weibull random variable with cumulative distribution function ($cdf$)
$$F_{Z}(z)=1-\exp(-z^{\nu}),\quad z>0,$$
where $\nu>0$. Then the new random variable
$$X=\sigma\exp\left(\lambda^{-1/\nu}Z\right)$$
follows the Pareto positive stable distribution, denoted by $X\thicksim PPS(\lambda, \sigma, \nu)$, where $\sigma, \lambda>0$.
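As a minimal sketch of the Weibull construction, the following Python code draws PPS variates by transforming Weibull variates and compares the empirical distribution with the cdf. It assumes the cdf $F(x)=1-\exp\{-\lambda[\log(x/\sigma)]^{\nu}\}$ for $x\geq\sigma$ and the transformation $X=\sigma\exp(\lambda^{-1/\nu}Z)$. The function names, the seed and the parameter values are illustrative choices of ours and are not taken from the data of this paper.

```python
import math
import random


def pps_cdf(x, lam, sigma, nu):
    """Assumed PPS cdf: 1 - exp(-lam * log(x/sigma)**nu) for x > sigma, else 0."""
    if x <= sigma:
        return 0.0
    return 1.0 - math.exp(-lam * math.log(x / sigma) ** nu)


def pps_pdf(x, lam, sigma, nu):
    """Derivative of pps_cdf with respect to x (zero at or below sigma)."""
    if x <= sigma:
        return 0.0
    z = math.log(x / sigma)
    return lam * nu * z ** (nu - 1.0) * math.exp(-lam * z ** nu) / x


def pps_sample_via_weibull(n, lam, sigma, nu, rng):
    """Draw n PPS variates as sigma * exp(lam**(-1/nu) * Z), Z ~ Weibull(shape nu)."""
    scale = lam ** (-1.0 / nu)
    # rng.weibullvariate(1.0, nu) has cdf 1 - exp(-z**nu): unit scale, shape nu.
    return [sigma * math.exp(scale * rng.weibullvariate(1.0, nu)) for _ in range(n)]


# Illustrative (hypothetical) parameter values.
rng = random.Random(12345)
sample = pps_sample_via_weibull(20000, lam=2.0, sigma=1.0, nu=1.5, rng=rng)
x0 = 2.0
empirical = sum(1 for x in sample if x <= x0) / len(sample)
print(empirical, pps_cdf(x0, 2.0, 1.0, 1.5))
```

The two printed numbers should be close if the transformation and the assumed cdf agree.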
Figures 1 to 4 show the probability density function of the $PPS$ distribution for different parameter values.
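The quantile function of the Pareto positive stable distribution can be easily obtained. Let $p=F(x)$; then
$$p=1-\exp\left\{-\lambda\left[\log\left(\frac{x}{\sigma}\right)\right]^{\nu}\right\},$$
where $x\geq\sigma$, and then
$$\log\left(\frac{x}{\sigma}\right)=\left[-\frac{1}{\lambda}\log(1-p)\right]^{1/\nu},$$
so we obtain
$$x_{p}=F^{-1}(p)=\sigma\exp\left\{\left[-\frac{1}{\lambda}\log(1-p)\right]^{1/\nu}\right\},\quad 0<p<1.$$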
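The inversion of the cdf can be sketched in Python as follows. The code assumes the quantile expression $x_{p}=\sigma\exp\{[-\log(1-p)/\lambda]^{1/\nu}\}$, obtained by solving $p=F(x)$ for $x$, and uses it for inverse-transform sampling. The function names and parameter values are hypothetical and serve only as an illustration.

```python
import math
import random


def pps_quantile(p, lam, sigma, nu):
    """Assumed PPS quantile: sigma * exp((-log(1 - p) / lam)**(1/nu)), 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return sigma * math.exp((-math.log1p(-p) / lam) ** (1.0 / nu))


def pps_cdf(x, lam, sigma, nu):
    """Assumed PPS cdf, used here only to check the inversion."""
    if x <= sigma:
        return 0.0
    return 1.0 - math.exp(-lam * math.log(x / sigma) ** nu)


def pps_sample_by_inversion(n, lam, sigma, nu, rng):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    return [pps_quantile(rng.random(), lam, sigma, nu) for _ in range(n)]


# With lam = log 2 the median satisfies log(x/sigma) = 1, so it equals sigma * e.
median = pps_quantile(0.5, math.log(2.0), 1.0, 2.0)
print(median)

draws = pps_sample_by_inversion(5, 2.0, 1.0, 1.5, random.Random(7))
print(draws)
```

Because the quantile function is available in closed form, a quantile at a high level such as $p=0.99$ can be read off directly with `pps_quantile(0.99, lam, sigma, nu)`.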
Let $x_{1}, x_{2}, \cdots, x_{n}$ be a sample of size $n$ drawn from a Pareto positive stable distribution. We assume that the parameter $\sigma$ is the smallest sample value. We then introduce three estimation methods for the Pareto positive stable distribution: moment estimation, regression estimation and maximum likelihood estimation. We define the random variable $Z=\log(X/\sigma)$, whose observed values are $z_{i}=\log(x_{i}/\sigma), i=1, 2, \cdots, n$.
The $r$-th raw moment of the random variable $Z$ is
$$E(Z^{r})=\lambda^{-r/\nu}\,\Gamma\left(1+\frac{r}{\nu}\right),$$
where $\Gamma(x)=\displaystyle\int_{0}^{\infty}e^{-t}t^{x-1}dt$, $\Gamma(1+\frac{r}{\nu})=\displaystyle\int_{0}^{\infty}e^{-t}t^{\frac{r}{\nu}}dt$.
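Note that $E(Z)=\lambda^{-1/\nu}\Gamma(1+\frac{1}{\nu})$, $E(Z^{2})=\lambda^{-2/\nu}\Gamma(1+\frac{2}{\nu})$, and
$$Var(Z)=\lambda^{-2/\nu}\left[\Gamma\left(1+\frac{2}{\nu}\right)-\Gamma^{2}\left(1+\frac{1}{\nu}\right)\right],$$
thus, equating the population moments to the sample moments,
$$\frac{s_{z}^{2}}{\bar{z}^{2}}=\frac{\Gamma(1+\frac{2}{\nu})}{\Gamma^{2}(1+\frac{1}{\nu})}-1, \qquad (3.1)$$
where $\bar{z}=\frac{1}{n}\sum\limits_{i=1}^{n}z_{i}$ and $s_{z}^{2}=\frac{1}{n}\sum\limits_{i=1}^{n}(z_{i}-\bar{z})^{2}$ are the sample mean and sample variance of the random variable $Z$ respectively. We solve formula (3.1) for the estimator $\widehat{\nu}$ of $\nu$. Since $E(Z)=\lambda^{-1/\nu}\Gamma(1+\frac{1}{\nu})$, equating $E(Z)$ to $\bar{z}$ gives the following estimator of $\lambda$:
$$\widehat{\lambda}=\left[\frac{\Gamma(1+\frac{1}{\widehat{\nu}})}{\bar{z}}\right]^{\widehat{\nu}}.$$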
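A minimal Python sketch of the moment method is given below. It assumes that $\nu$ is obtained from the ratio $E(Z^{2})/[E(Z)]^{2}=\Gamma(1+\frac{2}{\nu})/\Gamma^{2}(1+\frac{1}{\nu})$, which does not depend on $\lambda$, and that $\lambda$ then follows from $E(Z)=\bar{z}$. The bisection solver, the bracket $[0.05, 100]$ for $\nu$ and the simulated data are our own illustrative choices.

```python
import math
import random


def log_moment_ratio(nu):
    """log of E(Z^2) / E(Z)^2 = Gamma(1 + 2/nu) / Gamma(1 + 1/nu)^2 (lambda cancels)."""
    return math.lgamma(1.0 + 2.0 / nu) - 2.0 * math.lgamma(1.0 + 1.0 / nu)


def solve_nu(ratio, lo=0.05, hi=100.0, tol=1e-10):
    """Bisection for nu such that Gamma(1+2/nu)/Gamma(1+1/nu)^2 equals ratio."""
    target = math.log(ratio)
    if not log_moment_ratio(hi) < target < log_moment_ratio(lo):
        raise ValueError("moment ratio outside the bracketed range of nu")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The ratio decreases as nu grows, so a value above target means nu is too small.
        if log_moment_ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pps_moment_estimates(z):
    """Moment estimates (lam, nu) from the values z_i = log(x_i / sigma)."""
    n = len(z)
    z_bar = sum(z) / n
    s2 = sum((zi - z_bar) ** 2 for zi in z) / n
    nu_hat = solve_nu(1.0 + s2 / z_bar ** 2)
    lam_hat = (math.gamma(1.0 + 1.0 / nu_hat) / z_bar) ** nu_hat
    return lam_hat, nu_hat


# Simulated z values with hypothetical true parameters lam = 2, nu = 1.5.
rng = random.Random(2024)
true_lam, true_nu = 2.0, 1.5
z = [true_lam ** (-1.0 / true_nu) * rng.weibullvariate(1.0, true_nu) for _ in range(20000)]
lam_hat, nu_hat = pps_moment_estimates(z)
print(lam_hat, nu_hat)
```

Working with `math.lgamma` rather than `math.gamma` inside the solver avoids overflow when $\nu$ is small.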
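From expression (2.1), taking logarithms twice of $1-F(x)$, we get
$$\log[-\log(1-F(x))]=\log\lambda+\nu\log\left[\log\left(\frac{x}{\sigma}\right)\right]. \qquad (3.3)$$
If $\sigma$ is known, this is a linear relation in $\log[\log(x/\sigma)]$. Let $a=\log\lambda$, $b=\nu$, $X_{i}=\log[\log(x_{i}/\sigma)]$ and $y_{i}=\log[-\log(1-F_{n}(x_{i}))]$; then the residual sum of squares ($RSS$) is
$$RSS=\sum\limits_{i=1}^{n}(y_{i}-a-bX_{i})^{2}.$$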
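Setting the partial derivatives of $RSS$ with respect to $a$ and $b$ equal to zero, we get
$$\left\{\begin{array}{l} na+b\sum\limits_{i=1}^{n}X_{i}=\sum\limits_{i=1}^{n}y_{i},\\ a\sum\limits_{i=1}^{n}X_{i}+b\sum\limits_{i=1}^{n}X_{i}^{2}=\sum\limits_{i=1}^{n}X_{i}y_{i}.\end{array}\right.$$
Because the $X_{i}$ are not all equal, the coefficient determinant
$$\left|\begin{array}{cc} n & \sum\limits_{i=1}^{n}X_{i}\\ \sum\limits_{i=1}^{n}X_{i} & \sum\limits_{i=1}^{n}X_{i}^{2}\end{array}\right|=n\sum\limits_{i=1}^{n}X_{i}^{2}-\left(\sum\limits_{i=1}^{n}X_{i}\right)^{2}=n\sum\limits_{i=1}^{n}(X_{i}-\overline{X})^{2}\neq 0,$$
hence the equations have a unique solution. The estimators of $b$ and $a$ are
$$\widehat{b}=\frac{\sum\limits_{i=1}^{n}(X_{i}-\overline{X})(y_{i}-\overline{y})}{\sum\limits_{i=1}^{n}(X_{i}-\overline{X})^{2}},\quad \widehat{a}=\overline{y}-\widehat{b}\,\overline{X},$$
where $\overline{X}=\frac{1}{n}\sum\limits_{i=1}^{n}X_{i}$, $\overline{y}=\frac{1}{n}\sum\limits_{i=1}^{n}y_{i}$, $X_{i}=\log z_{i}$, then
$$\widehat{\nu}=\widehat{b}$$
and
$$\widehat{\lambda}=\exp(\widehat{a}).$$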
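The regression method can be sketched in Python as an ordinary least-squares fit of $y_{i}=\log[-\log(1-F_{n}(x_{(i)}))]$ on $X_{i}=\log[\log(x_{(i)}/\sigma)]$, with $F_{n}(x_{(i)})=\frac{i}{n+1}$, slope $\widehat{\nu}$ and intercept $\log\widehat{\lambda}$. The sketch assumes that $\sigma$ is known and skips observations with $x_{i}\leq\sigma$, for which $X_{i}$ is undefined; both the function name and this handling are our assumptions. The check uses exact PPS quantiles, for which the relation is exactly linear.

```python
import math


def pps_regression_estimates(x, sigma):
    """Least-squares estimates (lam, nu) from the double-log relation, sigma known."""
    xs = sorted(x)
    n = len(xs)
    pts = []
    for i, xi in enumerate(xs, start=1):
        if xi <= sigma:
            continue  # log(log(x/sigma)) is undefined at x = sigma
        big_x = math.log(math.log(xi / sigma))
        y = math.log(-math.log(1.0 - i / (n + 1.0)))
        pts.append((big_x, y))
    m = len(pts)
    x_bar = sum(p[0] for p in pts) / m
    y_bar = sum(p[1] for p in pts) / m
    sxx = sum((p[0] - x_bar) ** 2 for p in pts)
    sxy = sum((p[0] - x_bar) * (p[1] - y_bar) for p in pts)
    b = sxy / sxx          # slope, estimate of nu
    a = y_bar - b * x_bar  # intercept, estimate of log(lam)
    return math.exp(a), b


# Exact PPS quantiles at p_i = i / (n + 1), with hypothetical parameters.
lam, sigma, nu, n = 2.0, 1.0, 1.5, 200
x = [sigma * math.exp((-math.log(1.0 - i / (n + 1.0)) / lam) ** (1.0 / nu))
     for i in range(1, n + 1)]
lam_hat, nu_hat = pps_regression_estimates(x, sigma)
print(lam_hat, nu_hat)
```

On real data the points scatter around the line, so the recovered values are estimates rather than exact parameters.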
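From expression (2.2) of the probability density function ($pdf$) of the $PPS$ distribution, the log-likelihood function of the $PPS$ distribution is given by
$$\log\ell(\lambda,\nu)=n\log\lambda+n\log\nu+(\nu-1)\sum\limits_{i=1}^{n}\log z_{i}-\lambda\sum\limits_{i=1}^{n}z_{i}^{\nu}-\sum\limits_{i=1}^{n}\log x_{i},$$
where $z_{i}=\log(x_{i}/\sigma)$. Taking partial derivatives with respect to $\lambda$ and $\nu$, we obtain the equations
$$\left\{\begin{array}{l}\dfrac{n}{\lambda}-\sum\limits_{i=1}^{n}z_{i}^{\nu}=0,\\ \dfrac{n}{\nu}+\sum\limits_{i=1}^{n}\log z_{i}-\lambda\sum\limits_{i=1}^{n}z_{i}^{\nu}\log z_{i}=0.\end{array}\right.$$
We solve the first equation for $\lambda$ and substitute the result into the second equation. We obtain the equation in $\nu$,
$$\frac{1}{\nu}+\frac{1}{n}\sum\limits_{i=1}^{n}\log z_{i}-\frac{\sum\limits_{i=1}^{n}z_{i}^{\nu}\log z_{i}}{\sum\limits_{i=1}^{n}z_{i}^{\nu}}=0,$$
and solve it for the estimator $\widehat{\nu}$. Substituting $\widehat{\nu}$ into the first equation, we obtain the estimator of $\lambda$:
$$\widehat{\lambda}=\frac{n}{\sum\limits_{i=1}^{n}z_{i}^{\widehat{\nu}}}.$$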
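A minimal Python sketch of the maximum likelihood method follows. It assumes the log-likelihood $n\log\lambda+n\log\nu+(\nu-1)\sum\log z_{i}-\lambda\sum z_{i}^{\nu}-\sum\log x_{i}$ with $z_{i}=\log(x_{i}/\sigma)$, eliminates $\lambda=n/\sum z_{i}^{\nu}$ and solves the remaining equation in $\nu$ by bisection. The solver, the function names and the simulated data are our own illustrative choices. Observations with $x_{i}\leq\sigma$ are dropped because $\log z_{i}$ is undefined for them, which is an assumption of this sketch.

```python
import math
import random


def pps_mle(x, sigma):
    """ML estimates (lam, nu) of the PPS distribution for known sigma."""
    logs = [math.log(math.log(xi / sigma)) for xi in x if xi > sigma]
    n = len(logs)
    mean_log = sum(logs) / n
    top = max(logs)

    def profile_score(nu):
        # 1/nu + mean(log z) - sum(z^nu log z) / sum(z^nu); weights are
        # rescaled by the largest z^nu so that exp() cannot overflow.
        w = [math.exp(nu * (l - top)) for l in logs]
        return 1.0 / nu + mean_log - sum(wi * l for wi, l in zip(w, logs)) / sum(w)

    lo, hi = 1e-6, 1.0
    while profile_score(hi) > 0.0:  # expand until the root is bracketed
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("could not bracket nu (are all observations equal?)")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if profile_score(mid) > 0.0:  # the score decreases in nu
            lo = mid
        else:
            hi = mid
    nu_hat = 0.5 * (lo + hi)
    lam_hat = n / sum(math.exp(nu_hat * l) for l in logs)
    return lam_hat, nu_hat


def pps_loglik(x, lam, sigma, nu):
    """Log-likelihood of (lam, nu) over the observations with x > sigma."""
    total = 0.0
    for xi in x:
        if xi > sigma:
            z = math.log(xi / sigma)
            total += (math.log(lam) + math.log(nu) + (nu - 1.0) * math.log(z)
                      - lam * z ** nu - math.log(xi))
    return total


# Simulated PPS data with hypothetical true parameters lam = 2, sigma = 1, nu = 1.5.
rng = random.Random(99)
x = [math.exp(2.0 ** (-1.0 / 1.5) * rng.weibullvariate(1.0, 1.5)) for _ in range(5000)]
lam_hat, nu_hat = pps_mle(x, 1.0)
print(lam_hat, nu_hat, pps_loglik(x, lam_hat, 1.0, nu_hat))
```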
We consider data on motor insurance claims of a major insurance company. A sample of 518 claims is randomly generated with MATLAB between the minimum and maximum claims. For each claim $i$, we observe $X_{1}$ (cost of property damage) and $X_{2}$ (cost of medical expenses). The data are expressed in thousands of yuan. The basic numerical characteristics of $X_{1}$ and $X_{2}$ are given in Table 1.
To test the adequacy of the Pareto positive stable distribution for the data, we estimate the parameters of the $PPS$ distribution from the data $X_{1}$ and $X_{2}$ by the maximum likelihood method. The parameter estimates are given in Table 2.
Let $X_{(1)}, X_{(2)}, \cdots, X_{(n)}$ and $Z_{(1)}, Z_{(2)}, \cdots, Z_{(n)}$ be the order statistics of $X_{1}$ and $X_{2}$ respectively. $F_{n}(x_{(i)})=\frac{i}{n+1}$ is the empirical cumulative distribution function of the sample, so $1-F_{n}(x_{(i)})$ corresponds to the rank of the $i$th observation, counted from the largest, divided by $n+1$. For the two sets of data, we take logarithms. The horizontal axis represents the natural logarithm of the size of the sample observation and the vertical axis represents the logarithm of its rank; that is, the abscissa is $\log(x)$ and the ordinate is $\log[(n+1)(1-F_{n}(x))]$, as shown in Figure 5 and Figure 6.
By equation (3.3), $\log[-\log(1-F(x))]$ is a linear function of $\log[\log(x/\sigma)]$ with slope $\nu>0$.
Hence, if $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$ follow the Pareto positive stable distribution, the double log-log scatter plot should be linear with a positive slope. For the two sets of data, we take logarithms twice. The horizontal axis represents $\log[\log(\frac{size}{\sigma})]$ and the vertical axis represents $\log[-\log(\frac{rank}{n+1})]$; that is, the abscissa is $\log[\log(\frac{x}{\sigma})]$ and the ordinate is $\log[-\log(1-F(x))]$, as shown in Figure 7 and Figure 8.
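The coordinates of the two diagnostic plots can be computed as in the following Python sketch. It uses $F_{n}(x_{(i)})=\frac{i}{n+1}$ and takes the Pearson correlation of the plotted points as a rough measure of linearity; this measure, the function names and the test data (exact PPS quantiles with hypothetical parameters) are our own illustrative choices and are not the procedure used for the figures.

```python
import math


def plot_coordinates(x, sigma):
    """Return (log-log points, double-log points) for a sample x."""
    xs = sorted(x)
    n = len(xs)
    loglog, doublelog = [], []
    for i, xi in enumerate(xs, start=1):
        surv = 1.0 - i / (n + 1.0)  # 1 - F_n(x_(i)) = rank / (n + 1)
        loglog.append((math.log(xi), math.log((n + 1.0) * surv)))
        if xi > sigma:  # the double log needs log(x/sigma) > 0
            doublelog.append((math.log(math.log(xi / sigma)),
                              math.log(-math.log(surv))))
    return loglog, doublelog


def correlation(points):
    """Pearson correlation of a list of (u, v) pairs."""
    m = len(points)
    u_bar = sum(u for u, _ in points) / m
    v_bar = sum(v for _, v in points) / m
    suv = sum((u - u_bar) * (v - v_bar) for u, v in points)
    suu = sum((u - u_bar) ** 2 for u, _ in points)
    svv = sum((v - v_bar) ** 2 for _, v in points)
    return suv / math.sqrt(suu * svv)


# Exact PPS quantiles with hypothetical parameters lam = 2, sigma = 1, nu = 1.5.
lam, sigma, nu, n = 2.0, 1.0, 1.5, 200
x = [sigma * math.exp((-math.log(1.0 - i / (n + 1.0)) / lam) ** (1.0 / nu))
     for i in range(1, n + 1)]
loglog, doublelog = plot_coordinates(x, sigma)
r_loglog = correlation(loglog)
r_double = correlation(doublelog)
print(r_loglog, r_double)
```

For PPS data with $\nu\neq 1$, the double log-log points lie on a straight line, whereas the ordinary log-log points are curved.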
From Figure 5 and Figure 6, we find that a deviation appears in the fit for the largest observations, which correspond to huge claims. From Figure 7 and Figure 8, we find that both plots are clearly linear, which supports the assumption of the Pareto positive stable distribution for both sets of data. Therefore, these two sets of data are fitted well.
To illustrate the advantages of the Pareto positive stable distribution for fitting insurance claims data, we compare the Pareto positive stable distribution with the normal distribution and the Pareto distribution. Table 3 gives the probability density functions and the cumulative distribution functions of the three distributions.
All the parameters of the three distributions are obtained by maximum likelihood estimation. The estimates are shown in Table 4.
We select the preferred model by using the Akaike information criterion ($AIC$), which is defined as
$$AIC=-2\log\ell+2s,$$
where $s$ is the number of parameters and $\log\ell$ is the log-likelihood function. The preferred model is the one with the lowest $AIC$ value. If the $AIC$ value of the $PPS$ distribution is smaller than the $AIC$ values of the normal and Pareto distributions, the $PPS$ distribution fits the data better than the normal distribution and the Pareto distribution. Importing $X_{1}$ and $X_{2}$ into MATLAB, we plot $AIC$(normal)-$AIC$($PPS$) and $AIC$(Pareto)-$AIC$($PPS$) as functions of the sample sequence number $N$. A positive difference means that the $AIC$ of the $PPS$ distribution is smaller, that is, the $PPS$ distribution fits better.
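The comparison can be sketched in Python as follows, as an illustration only; the computations in this paper were carried out in MATLAB. The sketch assumes $AIC=-2\log\ell+2s$, treats $\sigma$ as known so that it is not counted in $s$ (giving $s=2$ for the $PPS$ and normal distributions and $s=1$ for the Pareto distribution), and uses simulated $PPS$ data with hypothetical parameters rather than the claims data. All function names are ours.

```python
import math
import random


def aic(loglik, s):
    """Akaike information criterion for log-likelihood loglik and s parameters."""
    return -2.0 * loglik + 2.0 * s


def normal_max_loglik(x):
    """Maximised normal log-likelihood (mean and variance fitted by ML)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def pareto_max_loglik(x, sigma):
    """Maximised Pareto log-likelihood, pdf alpha * sigma^alpha / x^(alpha + 1)."""
    z = [math.log(xi / sigma) for xi in x]
    n = len(z)
    alpha = n / sum(z)  # ML estimate of the shape for known sigma
    return n * math.log(alpha) - alpha * sum(z) - sum(math.log(xi) for xi in x)


def pps_max_loglik(x, sigma):
    """Maximised PPS log-likelihood for known sigma; requires every x > sigma."""
    logs = [math.log(math.log(xi / sigma)) for xi in x]
    n, top = len(logs), max(logs)
    mean_log = sum(logs) / n

    def score(nu):  # profile likelihood equation in nu, decreasing in nu
        w = [math.exp(nu * (l - top)) for l in logs]
        return 1.0 / nu + mean_log - sum(wi * l for wi, l in zip(w, logs)) / sum(w)

    lo, hi = 1e-6, 1.0
    while score(hi) > 0.0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0.0 else (lo, mid)
    nu = 0.5 * (lo + hi)
    lam = n / sum(math.exp(nu * l) for l in logs)
    # At the ML estimate, lam * sum(z^nu) equals n.
    return (n * math.log(lam * nu) + (nu - 1.0) * sum(logs) - n
            - sum(math.log(xi) for xi in x))


# Simulated PPS data: hypothetical lam = 1, sigma = 1, nu = 0.7.
rng = random.Random(518)
x = [math.exp(rng.weibullvariate(1.0, 0.7)) for _ in range(2000)]
x = [xi for xi in x if xi > 1.0]  # guard against an exact draw of sigma

aic_pps = aic(pps_max_loglik(x, 1.0), 2)
aic_pareto = aic(pareto_max_loglik(x, 1.0), 1)
aic_normal = aic(normal_max_loglik(x), 2)
print(aic_normal - aic_pps, aic_pareto - aic_pps)
```

Both printed differences are positive when the $PPS$ distribution has the lowest $AIC$, which is the sign convention used in the comparison above.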
In Figures 9 to 12, for both $X_{1}$ and $X_{2}$, the vast majority of the $AIC$ differences are positive, which shows that the $PPS$ distribution fits the insurance claims data better than the normal distribution and the Pareto distribution.
Insurance claims include many small claims and some large ones. Some insurance claims data, for example motor vehicle insurance data, have a relatively thick tail. Fitting such data with the Pareto positive stable distribution gives a better fit. The Pareto positive stable distribution has simple expressions for its probability density function and quantile function, and its parameters can be estimated by the moment, regression and maximum likelihood methods. Based on the parameter estimates and a comparison with other distributions, the Pareto positive stable distribution fits the insurance claims data better. Therefore, the Pareto positive stable distribution is well suited to the analysis of insurance claims data in the insurance industry.