Forecasting Confidence Intervals: Sensitivity Respecting Panel-Data Point-Value Replacement Protocols

In the practice of Time Series [TS] forecasting there are often situations where it is prudent to modify certain "outlier" values in the TS-Panel. A simple modification protocol is to replace selected TS-points by the Average of their adjacent Near-Neighbor points [ANN]. A research question, not previously addressed and thus of interest, is: Are ANN-TS modifications balanced, 50% of the time provoking OLS-forecasting variation, thus reducing the predictive acuity of the 95% Confidence Intervals, and, by symmetry, 50% of the time smoothing, resulting in more predictive resolution? Research Plan: To address this question, we (i) collected accounting information to be forecasted from firms on the Bloomberg™ terminals for Income Statement and Balance Sheet sensitive variables, (ii) formed three ANN-modifications, and (iii) computed 95% Confidence Intervals using the firm-Panel and the three modified Panels. Results: Regarding the research question, surprisingly, the ANN-replacement protocols were not balanced. In fact, about two-thirds of the time the ANN was smoothing in nature, and thus about one-third of the time the ANN-protocol provoked OLS-variation. We discuss the important implications of this result for forecasting in the economic context.


Introduction
The quality and utility of a forecast depend on the nature of the time series that is used to make the desired projections. This is hardly surprising and is discussed in detail by Hanke and Wichern (2003), which we used in our forecasting courses. However, this core-concept has promoted, and to some extent is used as, the logical and rational justification for the following pre-analysis forecasting data-"organization" protocol: If there are data-points in the TS-panel that do not seem as if they would contribute projections that would inform the decision-making process, then the analyst is justified in replacing them with more appropriate panel realizations.
The usual reasons necessitating such a protocol are: (i) not infrequently there are download/importing errors in accruing data through e-links between software, such as Excel™, and the Internet as filtered through ever-changing Browser-links, (ii) sometimes outliers in the TS are identified by outlier-screens, such as the Tukey (1977) Box-Whiskers-Plot and others presented in SAS (2014), and, by protocol, they are replaced, or (iii) the analyst decides that certain values in the TS are not likely to be representative and so may bias forecasting projections, and they are judgmentally replaced.
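As an aside for readers who wish to experiment, a Tukey (1977) Box-Whisker screen of the kind referenced in (ii) can be sketched in a few lines. This is a minimal illustration, not the screen used in the study; the 1.5 multiplier is Tukey's conventional "inner fence".

```python
# Minimal sketch of a Tukey Box-Whisker outlier screen: flag points outside
# [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Illustrative only; not the study's screen.
import statistics

def tukey_outliers(values, k=1.5):
    """Return the indices of points outside Tukey's inner fences."""
    q = statistics.quantiles(values, n=4, method="inclusive")
    q1, q3 = q[0], q[2]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

flagged = tukey_outliers([10, 11, 12, 11, 50, 12, 13, 11])  # flags the 50
```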
Interestingly, this TS-modification gestalt seems to be intrinsic to the practice of forecasting; we have seen it in operation over the years, practiced by academic and senior forecasters alike. While one may justifiably quibble with "Judgmental or Expert Opinion" adjustment of TS-points, it is not infrequently the case, in the e-download world, that datasets are corrupted and there are Missing data-points [MO] or Additive-Errors [AO] where, in the process of creating the value of a particular data-point, a value of non-trivial magnitude was inadvertently "added" to the value reported in the download. These are sometimes called Level Shifts [LS]. This is a slight misnomer, as most LS are temporal spikes or mini-plateaus the effects of which dissipate over time as, for example, a "V-Bound". A true LS is a Step-Level change that establishes a new baseline for the time series. As a point of information, distinguishing between a temporal and a permanent LS is fraught with difficulty. In most of the literature, MO, AO, & LS are subsumed under the rubric of Outliers.
As there seems to be: (i) a linguistic-behavioral "imperative" for analysts to correct for outliers, (ii) a plethora of outlier screening platforms, the use of which is synonymous with best practices in creating meaningful forecasts and/or data-analytic recommendations, and (iii) analytic-platforms that have output conditioning functionalities to adjust for outliers, we should expect that a study that addresses the impact of the replacement of outliers would be of practical value. This is the departure point of our research report, the focus of which is to: 1) review the Literature on Outliers to give focus to our study, 2) present the TS-Forecasting Model that we will use to form the forecasts and the related 95% Confidence Intervals, with Illustrations that can be used to examine aspects of the modifications that would inform decision needs, and, finally, offer a Summary and an Extension of this study.

Literature Review
The groundbreaking work of G.E.P. Box and G. Jenkins circa 1976 was the watershed event for the proliferation of research on outliers in the projective-modeling context. One of the research reports at the inception of this study domain was Hillmer (1984). Hillmer examined AO in a simple autoregressive context, ARIMA(1,0,0), and also for the Seasonal [B=12] ARIMA-model. His results focused on model "in-adequacy" issues that could be detected by monitoring the one-period-ahead forecasts. In this context, Hillmer used an example presented by Box and Jenkins to illustrate his model-correction protocol for AOs. Hillmer's study suggests that AOs affect the confidence intervals and the forecasts. He observes that the dominant effect is for AOs that occur early in the TS rather than later. Since Hillmer's publication, there have been a number of derivative studies addressing generalized outliers in an ARIMA-context where identification and correction protocols have been researched; usually, these protocols are demonstrated and evaluated using simulations. As these results are along the same lines as reported by Hillmer, we only note the key references. The following are two examples of studies separated by about ten years. The first is Chen and Liu (1993). They consider four types of outliers/errors in the ARIMA-context: AO, Innovation Outliers (IO), Temporary Change (TC), and Level Shift (LS). For their detection/correction protocol, they offer that judgmental methods aided by statistical methods may be needed to identify which of these four cases is most likely to best characterize the issue under examination. About ten years later, Hotta, Pereira-Valls and Ota (2004), using simulations, addressed ARIMA-error detection and the resulting effect in a design blocked on Disaggregated and the related Aggregated TS-data. They report various effects between the Disaggregated- and the related Aggregated-arms.
Interestingly, research reports dealing with ARIMA-models and related Panels used in the detection and correction of outliers have waned in the last few years, as our literature search indicated. As a point of notation for what follows: the first point of a TS-Panel is indexed as 1 and the points follow as integers to the last point, all of which are used in the OLS-fitting.

OLSR Inference from the Excel Parameter Range Model
The Excel Regression functionality forms a "wide-covering" confidence interval. These 95%CIs are effectively extreme-case CI-scenarios, as they are produced jointly from the two crisp end-point parameters of the 95%CIs for the intercept and the slope. See Gaber and Lusk (2017). For this reason, we are NOT interested in the capture-rate of these 95%CIs; almost certainly, these wide 95%CIs will capture most of the one-period-ahead holdbacks. We are creating these confidence intervals so as to measure the effect of the TS-data-point replacements on the Forecast and the related Precision. Following, we offer the standard notation: the Student-t value is T.INV.2T(5%, (N−2)); h is the forecasting horizon, here h = 1; N is the last time index in the data-stream and also the number of TS-points used to fit the OLSR.
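The "wide-covering" construction just described can be sketched as follows. This is a hedged reconstruction under our reading of the text, not the authors' workbook: the forecast interval end-points are taken as the extreme corners produced by combining the crisp end-points of the two parameter 95%CIs. The function name and example data are illustrative.

```python
# Sketch of an "extreme-case" 95% CI for a one-step-ahead OLS trend forecast,
# built from the end-points of the 95% CIs of the intercept and slope, as the
# Excel Regression output is described in the text. Illustrative, not the
# authors' exact computation.
import numpy as np
from scipy import stats

def extreme_case_ci(y, h=1, alpha=0.05):
    """Wide CI for the forecast at time N+h from an OLS fit on t = 1..N."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t_idx = np.arange(1, n + 1)                      # time index starts at 1
    slope, intercept, r, p, se_slope = stats.linregress(t_idx, y)
    resid = y - (intercept + slope * t_idx)
    mse = resid @ resid / (n - 2)                    # residual mean square
    sxx = ((t_idx - t_idx.mean()) ** 2).sum()
    se_int = np.sqrt(mse * (1.0 / n + t_idx.mean() ** 2 / sxx))
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)       # Excel: T.INV.2T(5%, N-2)
    b0_lo, b0_hi = intercept - t_crit * se_int, intercept + t_crit * se_int
    b1_lo, b1_hi = slope - t_crit * se_slope, slope + t_crit * se_slope
    x_new = n + h
    corners = [b0 + b1 * x_new for b0 in (b0_lo, b0_hi) for b1 in (b1_lo, b1_hi)]
    return min(corners), max(corners)

lo, hi = extreme_case_ci([10, 12, 11, 13, 14, 15, 14, 16], h=1)
```

As the text notes, an interval formed this way is wider than the standard prediction interval, which is why its capture-rate is not of interest here.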

Instructive Illustration
It is usual to script an illustration to clarify these computations. This will be done using the data in Table 1. For a one-period-ahead forecast, h = 1, we produced the following information: (3.a) the TS-version of the 95%CIs for the HSY-dataset, and (4.a) the boundary for the first projection horizon [h = 1] under the replacement protocol. This is interesting as, in our experience, the ANN-protocol is: (i) intuitive in an autocorrelation context, in particular for ARIMA(1,d,0) or the ARIMA(0,2,2)/Holt processes, given a Fixed Effects generating process as is the case for most traded organizations (see Lusk and Halperin (2016)); (ii) simple to program using non-VBA Excel functionalities; (iii) free of "seasonal" blending issues, as seasonality is effectively hidden or smoothed by a yearly-index; (iv) a way to avoid the temptation to make "judgmental" replacements, which may offer dysfunctional data-creation or engineering possibilities; and (v) seemingly non-differentiable, of the same ilk, from many of the other "simple" replacement protocols such as Median or full-Panel averages. Following, the various datasets and modifications are detailed.

Accounting Variable Set for Forecasting
First, we did not modify the downloaded dataset of the BP. For the OLSR-fit, we use all of the Panel data. The BP is the benchmark for the ANN-modifications. For each firm & account dataset, we made the following three ANN replacements.

Both:
Here we made two ANN replacements: The second Panel point and the next to last Panel point.

Late:
Here we made one ANN replacement: The next to last Panel point.

Early:
Here we made one ANN replacement: The second Panel point.
For example, for a BP with n = 12: the Both-modification replaces the second and the eleventh points, the Late-modification replaces the eleventh point, and the Early-modification replaces the second point.
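The three replacement protocols can be sketched directly from their definitions. The replaced point is the average of its two adjacent near-neighbors; function names below are illustrative, not the authors'.

```python
# Minimal sketch of the Early, Late, and Both ANN replacement protocols:
# a selected point is replaced by the average of its two adjacent neighbors.
def ann_replace(panel, idx):
    """Return a copy of panel with point idx (0-based) replaced by the
    average of its two adjacent near-neighbors."""
    out = list(panel)
    out[idx] = (panel[idx - 1] + panel[idx + 1]) / 2.0
    return out

def ann_modifications(panel):
    """Form the three modified Panels from the Base Panel (BP)."""
    n = len(panel)
    early = ann_replace(panel, 1)              # second Panel point
    late = ann_replace(panel, n - 2)           # next-to-last Panel point
    both = ann_replace(ann_replace(panel, 1), n - 2)
    return {"Early": early, "Late": late, "Both": both}

# Illustrative 12-point Panel (hypothetical values, n = 12)
mods = ann_modifications([4, 10, 6, 7, 9, 8, 11, 10, 13, 12, 20, 14])
```

For the hypothetical Panel above, "Early" replaces the 10 with (4 + 6)/2 = 5 and "Late" replaces the 20 with (12 + 14)/2 = 13.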

Nature of the Replacement Protocols: Smoothing or Provoking Variation
The ANN will, of course, have an effect on the nature of the TS, depending on the relationship between the TS-value to be replaced and its ANN replacement value. The question is: What is that effect likely to be?
The ANN-effect is usually measured by the change in the Standard Error [SE = √MSE], where the MSE is the Mean Squared Error of the residuals of the OLSR-fit. We note this ratio change as ∆SE: the SE of the ANN-modified series as the numerator in ratio to the SE of the original TS as the denominator. When ∆SE > 1.0, the ANN-modification created more scaled OLSR-variation than was the case for the original TS; in this case, the ANN-modification is labeled Provoking. When ∆SE < 1.0, the ANN-modification created less scaled variation than was the case for the original TS; in this case, the ANN-modification is labeled Smoothing. The last case is ∆SE = 1.0, where the ANN-modification is labeled Neutral. We are interested in ∆SE as this ratio is also the relative ratio of the precision of the TS to any of the ANN-modifications. Thus, Provoking modifications will create a wider Confidence Interval, and Smoothing will create a narrower Confidence Interval. An illustration will be most helpful at this point.
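The ratio just defined can be computed directly. The sketch below fits an OLS trend on the integer time index, takes SE = √MSE of the residuals, and forms the modified-over-original ratio; the example series is hypothetical.

```python
# Sketch of the Delta-SE ratio: SE(ANN-modified series) / SE(original TS),
# where SE = sqrt(MSE) of the residuals of an OLS fit on t = 1..n.
import numpy as np

def ols_se(y):
    """Standard error sqrt(MSE) of residuals from an OLS fit on t = 1..n."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    return float(np.sqrt(resid @ resid / (n - 2)))

def delta_se(original, modified):
    """< 1.0 labels the modification Smoothing; > 1.0 labels it Provoking."""
    return ols_se(modified) / ols_se(original)

# Hypothetical example: replacing the spike 10 with (3 + 5)/2 = 4 smooths
d = delta_se([1, 2, 3, 10, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8])  # d < 1
```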
Assume that we have the SONY series; Table 3 presents all the relative measures for the TS and the three ANN-modifications. In the SONY case, as a graphical elucidation, all four instances presented in Table 3 are also profiled in Figure 1. As shown in Table 3, the OLSR √MSE for the ANN-Early-TS will be higher, given that the ANN-Early point modification moves that point further away from the SONY:TS OLSR-line. Thus the ∆SE will be: 1.035 [0.096 / 0.093]. As ∆SE is > 1.0, the precision of the 95%CI of the ANN-Early will be lower than that of the SONY:TS. This then means that the 95%CI of the ANN-Early will be wider than that of the SONY:TS. This is what we see in Table 3; also, this is consistent with Hillmer's research report.

Hypothesis and Results
The ANN-impact dataset is created by trimming the Neutral instances, where ∆SE = 1.0; the remaining instances, where ∆SE ≠ 1.0, are counted as either Provoking or Smoothing. This follows, in form, the protocol used by Makridakis, Hibon, Lusk and Belhadjali (1987).
To form the usual inferential information, we will proffer an a priori context for addressing the testing of these three research questions.

Testing Context
Mathematically, the ANN-modifications are neutral point blends, i.e., nearest-neighbor or adjacent-point averages. Usually, a Panel of firm accounting data traded on exchanges is characterized by: (i) a well-defined exogenous random-stochastic variation, usually termed white-noise, and (ii) an endogenous value-generating process that is usually an ARIMA process no more complicated than an ARIMA(1,0,0) or an ARIMA(0,2,2); simply: autocorrelation with a relatively small random perturbation. The results profile strongly indicates that, overall, the ANN-protocol applied over the TS resulted in a major smoothing.
Pedagogic Note
It is instructive for a class presentation to observe that almost all of the ANN-modifications move the sinusoidal TS-Panel points towards the basic OLSR-line. Thus, this simple sinusoid-mimicking graphic suggests that there is likely a tendency for the ANN-modifications to be more often Smoothing in nature than Provoking. This can thus be formed into an operational hypothesis expectation so as to give an a priori context to the three questions above.
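The pedagogic point can be checked with a small simulation. This is our own hedged sketch, not part of the study: for a trend-plus-sinusoid series with modest noise, the ANN-Early replacement should yield ∆SE < 1.0 (Smoothing) in well over half of the trials, since the neighbor-average pulls the point toward the OLSR-line. All parameters below are illustrative.

```python
# Simulation sketch: for sinusoid-mimicking series, count how often the
# ANN-Early replacement is Smoothing (Delta-SE < 1). Parameters illustrative.
import math
import random
import numpy as np

def ols_se(y):
    """sqrt(MSE) of residuals from an OLS trend fit on t = 1..n."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    return math.sqrt(resid @ resid / (n - 2))

random.seed(7)                                  # reproducible illustration
trials, smoothing, n = 200, 0, 12
for _ in range(trials):
    # trend + sinusoid + small white-noise perturbation
    ts = [0.5 * ti + math.sin(ti) + random.gauss(0, 0.3) for ti in range(1, n + 1)]
    early = list(ts)
    early[1] = (ts[0] + ts[2]) / 2.0            # ANN-Early replacement
    if ols_se(early) / ols_se(ts) < 1.0:
        smoothing += 1
frac = smoothing / trials                       # share of Smoothing outcomes
```

Under these illustrative settings, `frac` comes out well above one-half, consistent with the Smoothing tendency argued above.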

Inferential and Exploratory Results
To provide a complete rendering of the testing profiles, we have created Table 4. With this results profile, we will offer parametric and, in the service of robustness, non-parametric alternatives to inferentially probe this dataset. Given the nature of this research, as expressed in the HC, the testing will be a blend of a priori and exploratory testing. To reinforce this result: the z-value, using 50% in the standard error computation with no continuity correction, is 6.3; this produces a directional P-value of < 0.0001. Both of these results indicate that rejection of the Null is the obvious and so logical inferential choice in favor of the likelihood that the ANN-modifications are biased/pre-disposed to Smoothing for Accounting Firm data from traded organizations. The inference, using the counts, is the correct inferential context. However, as additional confirmatory or vetting information, following are the Means & Medians of the ∆SEs and the related exploratory tests; specifically, we will be using the cells in Table 4 the numbers of which are in Italics. For the Provoking tests the ∆SEs are presented in Table 6; for the Smoothing tests, in Table 7. The simplest test is to benchmark one Impact-set by the other. The ideal test, in this case, is to create inferential information using the Chi2 analysis. As in the above analysis, this will create an overall inferential measure, and if the overall Chi2-inferential indication suggests rejecting the Null of no relative proportional differences, we will explore this fact. The Provoking & Smoothing Chi2 Table is the sub-matrix that has the counts Bolded in Table 4. It may be that instabilities are in the offing; thus, controlling for Panel length may provide interesting test results.
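The directional z-test described above can be sketched as follows. The counts are illustrative stand-ins, not the paper's tallies; the test uses the Null proportion 50% in the standard-error computation and no continuity correction, as stated in the text.

```python
# Sketch of the one-sample directional z-test for a proportion, with the
# standard error computed under the Null p0 = 50% and no continuity
# correction. Counts below are hypothetical, not the study's data.
import math

def z_prop(successes, n, p0=0.5):
    """z-statistic for observed proportion vs p0; SE uses p0, not p-hat."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def directional_p(z):
    """Upper-tail P-value for a directional (one-sided) test."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = z_prop(200, 300)          # e.g., 2/3 Smoothing out of 300 modifications
p = directional_p(z)          # very small for z near 6
```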
Finally, we have selected the ANN-modification as it is the most basic and, in our experience, the most often used when outliers are afoot. At the other end of the spectrum are the Missing Data Analyses detailed by Enders (2010). These are relevant if the rigorous assumptions as to the causal generating process(es) are likely to hold. In any case, it would be productive to research the effect of these "Enders"-reconstitution protocols on the CIs and the related smoothing and provoking profiles.