An Empirical Analysis on the Volatility of Return of CSI 300 Index

To better observe the trend of the stock market, this paper selects the daily closing prices of the CSI 300 index from April 12, 2016 to September 30, 2021 and carries out an empirical analysis of the index's logarithmic return. The findings are: (1) The return series of the CSI 300 index shows the statistical characteristics of a sharp peak, fat tails, skewness, asymmetry and persistence. An ARMA(2,3) model fits the return series effectively and can predict its future trend to a certain extent. (2) The residuals of the ARMA model show an obvious cluster effect (volatility clustering) and ARCH effect (conditional heteroscedasticity). A GARCH(1,1) model fits the conditional heteroscedasticity well and eliminates the ARCH effect. (3) In the fitted GARCH(1,1) model, the sum of the ARCH coefficient and the GARCH coefficient is very close to 1, indicating that the GARCH process is wide-sense stationary, that shocks to the conditional variance are persistent, and that market risk is large; that is, a shock plays an important role in all future forecasts. The results show that the factors affecting the strong variation of the CSI 300 index return are the lags of orders 4 and 15. There is strong conditional heteroscedasticity in the CSI 300 index return, and a TARCH model can better eliminate it. Compared with good news, bad news has a greater impact on the index.

If φ_0 = 0, the model is called a centered ARMA(p, q) model. Unless stated otherwise, ARMA(p, q) below refers to the centered model, which can be written compactly by introducing the delay operator.
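For reference, the centered ARMA(p, q) model and its delay-operator form are usually written as below. This is the standard textbook form rather than the equations lost from the source; the minus signs on the moving-average coefficients and the symbols B, Φ and Θ are assumptions, and some software uses plus signs instead.

```latex
% Centered ARMA(p, q) model (phi_0 = 0); epsilon_t is zero-mean white noise
x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}
      + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}

% Delay operator B, defined by B x_t = x_{t-1}
\Phi(B)\, x_t = \Theta(B)\, \varepsilon_t, \qquad
\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad
\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q
```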

Stationarity Test
(1) Before a time series is modeled, its stationarity should be tested. Common methods are the time series diagram, the autocorrelation diagram, the ADF test, the PP test and the KPSS test. The first two judge stationarity by visual inspection, while the ADF, PP and KPSS tests give a more rigorous judgment based on statistical theory. The ADF and PP tests are unit root tests whose null hypothesis is that the sequence has a unit root (is non-stationary): when the test statistic is smaller than the critical value at the given significance level, that is, when the p value is less than 0.05, the null hypothesis is rejected and the sequence is considered stationary; otherwise it is considered non-stationary. The KPSS test works the other way round: its null hypothesis is that the sequence is stationary, so when the test statistic is smaller than the critical value at the given significance level, that is, when the p value is greater than 0.05, the null hypothesis cannot be rejected and the sequence is considered stationary.
(2) When the sequence is non-stationary, that is, it has a trend or cycle, it should be differenced to remove the non-stationarity. The first-order difference is $\nabla x_t = x_t - x_{t-1}$, and the second-order difference is $\nabla^2 x_t = \nabla x_t - \nabla x_{t-1} = x_t - 2x_{t-1} + x_{t-2}$.
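The differencing step can be sketched in a few lines of Python. This is only an illustration: the function name `difference` and the toy quadratic series are assumptions, not part of the paper's analysis.

```python
def difference(series, order=1):
    """Apply the difference operator `order` times (each pass maps x_t to x_t - x_{t-1})."""
    result = list(series)
    for _ in range(order):
        # Each pass shortens the series by one observation.
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# Toy series with a quadratic trend: 0, 1, 4, 9, 16, 25
quadratic = [t * t for t in range(6)]
first = difference(quadratic, order=1)   # a linear trend remains
second = difference(quadratic, order=2)  # constant, so the trend is removed
print(first, second)
```

On this toy series the first difference still trends upward, while the second difference is constant, which is why a second-order difference is the usual remedy for a quadratic trend.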

Pure Randomness Test
Pure randomness test is also called white noise test. Its purpose is to test whether the sequence is a pure random sequence. When a sequence is white noise, there is in theory no correlation between its values. In practice, because the sample is finite, the sample autocorrelation coefficients are not exactly 0, but they should not differ significantly from 0. Statistically, the hypotheses are $H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0$ for all $m \ge 1$, against $H_1$: at least one $\rho_k \ne 0$ with $k \le m$. The test statistics for pure randomness include the Q statistic and the LB statistic. In practical application, the Q statistic works well in large samples (where n is large) but is not accurate in small samples. The LB statistic is a modification of the Q statistic. The two are often collectively referred to as Q statistics and written as Q_BP (the Q statistic of Box and Pierce) and Q_LB (the Q statistic of Ljung and Box). When the Q statistic is greater than the 1-α quantile of the chi-square distribution with m degrees of freedom, or equivalently when its p value is less than α, the null hypothesis is rejected at confidence level 1-α and the sequence is considered non-white noise. Otherwise, the null hypothesis cannot be rejected and the sequence is considered a pure random sequence.
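As a minimal sketch of how the two statistics differ, the Python below computes Q_BP = n Σ ρ̂_k² and Q_LB = n(n+2) Σ ρ̂_k²/(n−k) from the sample autocorrelations. The function names and the toy series are assumptions for illustration; obtaining a p value would additionally need the chi-square distribution with m degrees of freedom, which is left out here.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def q_statistics(x, max_lag):
    """Return (Q_BP, Q_LB): the Box-Pierce and Ljung-Box statistics."""
    n = len(x)
    rho = sample_acf(x, max_lag)
    q_bp = n * sum(r * r for r in rho)
    q_lb = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return q_bp, q_lb


# Hypothetical, strongly autocorrelated toy series
toy = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]
q_bp, q_lb = q_statistics(toy, max_lag=3)
print(q_bp, q_lb)
```

Because each term of Q_LB carries the weight (n+2)/(n−k) > 1, Q_LB is always at least as large as Q_BP, and the gap matters most when n is small.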

Model Identification
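After calculating the sample autocorrelation and partial autocorrelation coefficients, an appropriate ARMA model must be selected to fit the observed sequence according to their properties. In effect, this step estimates the autoregressive order p and the moving average order q from the properties of the sample autocorrelation coefficient $\hat{\rho}_k$ and the sample partial autocorrelation coefficient $\hat{\phi}_{kk}$. Therefore, the process of model identification is also called the model order determination process.
The basic principles of ARMA model order determination are as follows: if the autocorrelation coefficients tail off and the partial autocorrelation coefficients cut off after lag p, an AR(p) model is chosen; if the autocorrelation coefficients cut off after lag q and the partial autocorrelation coefficients tail off, an MA(q) model is chosen; if both tail off, an ARMA(p, q) model is chosen.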
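To make the cut-off behavior concrete, the sketch below derives partial autocorrelations from autocorrelations with the Durbin-Levinson recursion. The function name `pacf_from_acf` and the example values are assumptions; the input is the theoretical autocorrelation of an AR(1) process with coefficient 0.5, not data from the paper.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    `rho` holds autocorrelations for lags 0..m (rho[0] must be 1.0).
    Returns the partial autocorrelations phi_kk for k = 1..m.
    """
    pacf = []
    prev = []  # prev[j] stores phi_{k-1, j+1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5 ** k
ar1_acf = [0.5 ** k for k in range(4)]
ar1_pacf = pacf_from_acf(ar1_acf)
print(ar1_pacf)
```

The autocorrelations decay geometrically (tail off) while the partial autocorrelations vanish after lag 1 (cut off), which is the AR(1) signature. With sample estimates the higher-lag values are only approximately zero.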

Parameter Estimation
After selecting the fitting model, the next step is to determine the concrete form of the model from the observations of the sequence, that is, to estimate the values of the unknown parameters in the model. A non-centered ARMA(p, q) model has p+q+2 parameters to be estimated: $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \mu, \sigma_\varepsilon^2$. The parameter μ is the sequence mean. It is generally estimated by the method of moments, using the sample mean as the estimate of the population mean: $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Centering the original sequence gives $y_t = x_t - \bar{x}$, so the original p+q+2 parameters to be estimated are reduced to p+q+1: $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma_\varepsilon^2$. There are three estimation methods for these p+q+1 unknown parameters: moment estimation, maximum likelihood estimation and least squares estimation.
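As a minimal illustration of moment estimation, the sketch below centers a series with its sample mean and then estimates an AR(1) model from the first two sample moments. The AR(1) case is chosen only because its moment equations have a closed form; it is an assumption for illustration and not the ARMA(2,3) model fitted later, and the function name and data are hypothetical.

```python
def moment_estimates_ar1(x):
    """Method-of-moments estimates (mu, phi_1, sigma^2) for an AR(1) model."""
    n = len(x)
    mu_hat = sum(x) / n                       # sample mean estimates mu
    y = [v - mu_hat for v in x]               # centered series y_t = x_t - mean
    gamma0 = sum(v * v for v in y) / n        # lag-0 sample autocovariance
    gamma1 = sum(y[t] * y[t - 1] for t in range(1, n)) / n
    phi_hat = gamma1 / gamma0                 # for AR(1), phi_1 equals rho_1
    sigma2_hat = gamma0 * (1.0 - phi_hat ** 2)  # innovation variance
    return mu_hat, phi_hat, sigma2_hat


mu_hat, phi_hat, sigma2_hat = moment_estimates_ar1([1.0, -1.0, 1.0, -1.0])
print(mu_hat, phi_hat, sigma2_hat)
```

Moment estimates like these are cheap and are often used as starting values for maximum likelihood or least squares estimation.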

Model Significance Test
After the concrete form of the fitting model is determined, the fitted model must be tested.
Model testing is mainly divided into the model significance test and the parameter significance test.
The significance test of the model mainly tests the effectiveness of the model. Whether a model is significantly effective depends on whether it extracts enough information. A good fitting model should extract almost all the correlation information in the observed sequence; in other words, the fitting residuals should no longer contain any correlation information, so the residual sequence should be a white noise sequence. Such a model is called a significantly effective model.
On the contrary, if the residual sequence is non-white noise, correlation information remains in the residuals that has not been extracted. This indicates that the fitting model is not effective enough, and it is usually necessary to select another model and fit again.
Therefore, the significance test of the model is the white noise test of the residual sequence. The null and alternative hypotheses are $H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0$ for all $m \ge 1$, and $H_1$: at least one $\rho_k \ne 0$ with $k \le m$, where $\rho_k$ is the lag-k autocorrelation coefficient of the residual sequence. If the null hypothesis is rejected, correlation information remains in the residual sequence and the fitting model is not significant. If the null hypothesis cannot be rejected, the fitting model is considered significantly effective. The significance test of parameters tests whether each unknown parameter is significantly non-zero.
The purpose of this test is to simplify the model.
If a parameter is not significant, the influence of the corresponding independent variable on the dependent variable is not obvious, and that variable can be removed from the fitting model. The final model is then expressed only in terms of independent variables whose parameters are significantly non-zero.

Cluster Effect
In the macroeconomic and financial fields, we often see time series with the following characteristic: after the influence of deterministic non-stationary factors is removed, the fluctuation of the residual series is stable in most periods, but it stays large in some periods and small in others, showing a cluster effect, also known as volatility clustering (Wu & Liu, 2014).
Variance is usually used to describe the fluctuation of a sequence. The cluster effect means that the variance of the sequence is basically homogeneous over the whole observation period, but in one or several periods the variance differs significantly from the expected variance.
In this case, a conditional heteroscedasticity model needs to be introduced.

ARCH Test
Before an ARCH model is fitted, an ARCH test is needed. The ARCH test is a special heteroscedasticity test. It requires not only that the sequence be heteroscedastic, but also that the heteroscedasticity be caused by some autocorrelation that can be fitted by an autoregressive model of the squared residual sequence.
The two commonly used statistical methods for the ARCH test are the portmanteau Q test and the LM test.

1). Portmanteau Q Test
In 1983, McLeod and Li proposed the portmanteau Q statistic to test the autocorrelation of the squared residual sequence. It is now a standard method for the ARCH test. The idea behind this test is that if the variance of the residual sequence is non-homogeneous and has a cluster effect, the squared residual sequence usually has autocorrelation. Therefore, the test for non-homogeneous variance can be transformed into an autocorrelation test of the squared residual sequence.
The hypotheses of the portmanteau Q test are $H_0: \rho_1 = \rho_2 = \cdots = \rho_q = 0$ against $H_1$: at least one of $\rho_1, \ldots, \rho_q$ is non-zero, where $\rho_k$ is the lag-k autocorrelation coefficient of the squared residual sequence. When the p value of the Q(q) test statistic is less than the significance level α, the null hypothesis is rejected and the variance of the sequence is considered non-homogeneous and autocorrelated.

2). LM Test
When the p value of the LM(q) test statistic is less than the significance level α, the null hypothesis is rejected, the variance of the sequence is considered non-homogeneous, and the autocorrelation in the squared residual sequence can be fitted by a q-order autoregressive model.
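The idea of the LM test can be sketched for the simplest case q = 1. The squared residuals are regressed on their own first lag, and the statistic is the number of usable observations times the R² of that regression, which is compared with a chi-square distribution with 1 degree of freedom (5% critical value about 3.841). Restricting to q = 1, the function name and the toy residuals are assumptions made to keep the example short; a general q needs a multiple regression.

```python
def arch_lm_stat_lag1(residuals):
    """LM statistic for ARCH(1): (n - 1) * R^2 from regressing e_t^2 on e_{t-1}^2."""
    sq = [e * e for e in residuals]
    y, x = sq[1:], sq[:-1]          # current and lagged squared residuals
    m = len(y)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # With one regressor plus a constant, R^2 is the squared correlation.
    r_squared = sxy * sxy / (sxx * syy)
    return m * r_squared


# Hypothetical residuals: a calm stretch, a turbulent stretch, then calm again
clustered = [0.1, -0.1] * 20 + [2.0, -2.0] * 20 + [0.1, -0.1] * 20
lm = arch_lm_stat_lag1(clustered)
print(lm, lm > 3.841)
```

The toy residuals have no autocorrelation in sign but a very strong one in magnitude, so the statistic is far above the critical value and the null hypothesis of no ARCH effect would be rejected. The function would divide by zero if all squared residuals were equal, a case it does not guard against.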

GARCH Model
The essence of the ARCH model is to use a q-order moving average of the squared residual sequence to fit the current value of the heteroscedasticity function. Because the autocorrelation coefficients of a moving average model cut off after lag q, the ARCH model is in fact only suitable when the heteroscedasticity function has short-term autocorrelation (Wang, 2015).
However, in practice, the heteroscedasticity function of some residual series has long-term autocorrelation. If the ARCH model is used to fit the heteroscedasticity function in this case, it requires a high moving average order, which makes parameter estimation more difficult and ultimately reduces the fitting accuracy of the ARCH model. To correct this problem, Bollerslev proposed the generalized autoregressive conditional heteroskedasticity (GARCH) model in 1986, which adds lagged values of the conditional variance itself to the ARCH specification.
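For reference, a GARCH(p, q) model is commonly written as below. This is the standard textbook form, not the equation lost from the source; the symbols ω, h_t and e_t are assumptions, while α (ARCH coefficients) and β (GARCH coefficients) follow the notation used later for the fitted GARCH(1,1) model.

```latex
% Mean equation: f(.) is the fitted mean model (here an ARMA model)
x_t = f(t, x_{t-1}, x_{t-2}, \ldots) + \varepsilon_t
% Residual with conditional variance h_t; e_t is i.i.d. with mean 0 and variance 1
\varepsilon_t = \sqrt{h_t}\, e_t
% Conditional variance: q ARCH terms and p GARCH terms
h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j},
\qquad \omega > 0,\ \alpha_i \ge 0,\ \beta_j \ge 0
% Wide-sense stationarity requires
\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1
```

Setting p = 0 recovers the ARCH(q) model, and p = q = 1 gives the GARCH(1,1) model $h_t = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}$.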

Data Selection and Source
The empirical analysis uses the daily closing prices of the CSI 300 index, and the sample runs from April 12, 2016 to September 30, 2021. After excluding the influence of asynchronous trading, holidays and other factors, data for 1335 trading days are obtained. The data are from the official website of NetEase Finance. The analysis in this paper is carried out in RStudio.
Because this paper mainly studies the return volatility of the CSI 300 index, the data are transformed into a logarithmic return series before the analysis. The transformation formula is $r_t = \ln P_t - \ln P_{t-1}$, where $P_t$ is the closing price on trading day t.
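The log-return transformation can be sketched as follows. The prices are made-up values for illustration and the function name `log_returns` is an assumption; the paper itself performs this step in RStudio.

```python
import math


def log_returns(prices):
    """Logarithmic returns ln(P_t) - ln(P_{t-1}) for consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


closes = [100.0, 102.0, 101.0, 105.0]  # hypothetical closing prices
returns = log_returns(closes)
print(returns)
```

A convenient property of log returns is that they add up over time: the sum of the daily values equals the log return over the whole period, ln(105/100) in this toy example.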

Descriptive Analysis
To understand the fluctuation characteristics of the return series of the CSI 300 index, descriptive statistics are computed and the distribution of the series is plotted. Table 2 shows the descriptive statistics of the daily logarithmic return of the CSI 300 index. The mean of the data is very small, indicating that the average return of the CSI 300 index is close to 0. Skewness=-1.0680810<0 and kurtosis=6.454693>3, indicating that the return of the CSI 300 index is not normally distributed and that the data are left-skewed with a sharp peak. This feature can also be seen in Figure 1, which shows a sharp peak and fat tails and suggests that the CSI 300 index is prone to large fluctuations.
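The two shape statistics can be computed from central moments as in the sketch below. It uses the plain moment definitions (skewness m3/m2^1.5 and kurtosis m4/m2², so a normal distribution has kurtosis 3); this is an assumption, since statistical packages differ in bias corrections and in whether they report excess kurtosis. The sample values are made up.

```python
def skewness_kurtosis(x):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Symmetric toy sample: skewness is zero
sym_skew, sym_kurt = skewness_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])

# Toy sample with one large loss: skewness is negative (left-skewed)
left_skew, left_kurt = skewness_kurtosis([-10.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0])
print(sym_skew, sym_kurt, left_skew, left_kurt)
```

A single large negative observation is enough to make the skewness negative, which matches the usual reading of a left-skewed return series: large losses are more extreme than large gains.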

Stationarity Test
The time series diagram of the return of the CSI 300 index is as follows:
www.scholink.org/ojs/index.php/ijafs International Journal of Accounting and Finance Studies Vol. 4, No. 2, 2021
It can be seen from Figure 2 that the sequence fluctuates around a constant level without an obvious trend or cycle, so it can basically be regarded as a stationary sequence. To verify the stationarity of the sequence further, the ADF test, PP test and KPSS test are also carried out. The results are as follows:

Pure Randomness Test
To determine whether the data still contain extractable information, a pure randomness test is carried out, and the results are shown in the table below. It can be seen from Table 4 that, at the 0.05 significance level, the null hypothesis is rejected: the return series of the CSI 300 index still contains correlation information that can be extracted and cannot be regarded as a white noise series.

Model Identification
After a stationary non white noise sequence is obtained, we start to model the sequence. Next, select the appropriate model by observing the autocorrelation diagram and partial autocorrelation diagram of the sequence.

Figure 3. Autocorrelation Diagram (Left) and Partial Autocorrelation Diagram (Right)
As can be seen from Figure 3, neither the autocorrelation nor the partial autocorrelation shows an obvious cut-off, so we try to fit the sequence with an ARMA(p, q) model. Because the pattern is not clear-cut, auto.arima() is used to identify the model order automatically, and it returns ARIMA(2,0,3), i.e. an ARMA(2,3) model. To make sure that the selected model is optimal, several models are compared below.

Model Significance Test
It can be seen from Table 5 that the parameters of the model have passed the t-test, so the parameters of the model are significant. The white noise test (pure randomness test) of residual sequence is carried out below, and the test results are as follows:

Cluster Effect Test
In order to determine whether there is cluster effect in the model residuals, the time series diagram and distribution diagram of the model residuals in equation (13) are given below.

Figure 4. Residual Sequence Diagram (Left) and Distribution Diagram (Right)
It can be seen from the residual sequence diagram in Figure 4 that the variance of the sequence is basically homogeneous over the whole observation period, but there are some periods with large fluctuations, showing a cluster effect. The distribution diagram shows that the residual sequence has a sharp peak and fat tails, which, like the cluster effect, indicates that a GARCH model is appropriate.

ARCH Effect Test
To fit ARCH or GARCH model, ARCH effect test must be carried out first. The LM Test (Lagrange multiplier test) is used to verify whether there is ARCH effect in the residual sequence. The results are as follows:

Fitting GARCH Model
The previous section verified that there is an ARCH effect in the residual sequence of the model. Because the heteroscedasticity function of the residual sequence of a return series often has long-term autocorrelation, fitting it with an ARCH model would require a high moving average order, which makes parameter estimation more difficult and ultimately reduces the fitting accuracy. We therefore choose to fit a GARCH model here.
The order of a GARCH model generally does not need to be high. The GARCH(1,1) model is widely regarded as a "standard" model, so it is also selected here to fit the residual sequence. The fitting parameter results are as follows:
It can be seen from Table 9 that, at the significance level of 0.01, the portmanteau Q test and the LM test show that there is no ARCH effect in this sequence.
In conclusion, the fitting effect of the GARCH(1,1) model for the residual sequence can be considered relatively good. The expression of the GARCH(1,1) part of the model is as follows: The 95% confidence interval of the fluctuation is given below, as shown in the figure: From the GARCH part of the model in equation (15), α_1 + β_1 = 0.9979 < 1, which is very close to 1. This indicates that the GARCH process is wide-sense stationary, that shocks to the conditional variance are long-lasting, and that market risk is large; that is, a shock plays an important role in all future forecasts.
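One way to put a number on this persistence is the half-life of a variance shock. In a GARCH(1,1) model, the expected deviation of the conditional variance from its long-run level shrinks by the factor α_1 + β_1 each period, so the half-life is ln(0.5) / ln(α_1 + β_1). The sketch below applies this to the reported sum 0.9979; the function name is an assumption, and the half-life is an interpretation added here, not a quantity reported in the paper.

```python
import math


def shock_half_life(persistence):
    """Periods until a variance shock decays to half its size in a GARCH(1,1) model.

    `persistence` is alpha_1 + beta_1 and must lie strictly between 0 and 1.
    """
    if not 0.0 < persistence < 1.0:
        raise ValueError("persistence must be in (0, 1) for a stationary GARCH(1,1)")
    return math.log(0.5) / math.log(persistence)


half_life = shock_half_life(0.9979)  # sum of coefficients reported in the text
print(round(half_life))
```

With a persistence of 0.9979, a shock takes roughly 330 trading days, well over a year, to lose half of its effect on the conditional variance. For comparison, a persistence of 0.9 would give a half-life of under 7 days.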

Conclusion
Based on the daily closing prices of the CSI 300 index, this paper constructs the logarithmic return series and carries out a modeling analysis. The conclusions are as follows:
(1) The return series of the CSI 300 index shows the statistical characteristics of a sharp peak, fat tails and skewness, as well as asymmetry and persistence. An ARMA(2,3) model fits the return series effectively and accurately and can predict its future trend to a certain extent.
(2) The residuals of the ARMA model show an obvious cluster effect and ARCH effect (i.e. conditional heteroscedasticity). A GARCH(1,1) model fits the conditional heteroscedasticity well and eliminates the ARCH effect.
(3) In the fitted GARCH(1,1) model, the sum of the ARCH coefficient and the GARCH coefficient is less than 1 but very close to 1, indicating that the GARCH process is wide-sense stationary, that shocks to the conditional variance are persistent, and that market risk is large; that is, a shock plays an important role in all future forecasts.