Binomial Count Information: How Do the Usual Approximations Fare?

Focus: Decision-making is often aided by examining False Positive Error-risk profiles [FPEs]. This research report addresses the decision-making jeopardy that one invites by eschewing the Exact factorial-binomial Probability-values used to form the FPEs in favor of: (i) Normal Approximations [NA], or (ii) Continuity-Corrected Normal Approximations [CCNA]. Results: Referencing an audit context where testing sample sizes for Re-Performance & Re-Calculation protocols are, by economic necessity, in the range of 20 to 100 account items, there are indications that audit decisions would benefit from using the Exact Probability-values. Specifically, using a jeopardy-screen of ±2.5% created by benchmarking the NA & the CCNA against the Exact FPEs, it is observed that: (i) for sample sizes of 100 there is little difference between the Exact and the CCNA FPEs, (ii) almost uniformly for both sample extremes of 20 and 100, the FPEs created using the NA are lower and outside the jeopardy-screen, and (iii) for the CCNA-arm for sample sizes of n = 20, only sometimes are the CCNA FPEs interior to the jeopardy-screen. These results call into question forgoing the Exact Factorial-Binomial results. Finally, an illustrative example is offered of an A priori FPE-risk Decision-Grid that can be parametrized and used in a decision-making context.

The slope or derivative is: [f(x + ∆x) − f(x)]/∆x = [(x + ∆x)² − x²]/∆x = 2x + ∆x.
As ∆x becomes very small, 2x + ∆x converges to 2x as a point-limit. This is sometimes referred to as the Point-Process or Instantaneous Slope. This is not an exact slope function, as the concept of a point is an intellectual creation and has no measurable value. For example, assume that we want to compute the slope in the right-hand-side range: [12 to 12.00001] for f(x) = x². The Point-Process approximation of the slope is: 2 × 12 = 24; the actual slope is: [(12.00001)² − 12²]/0.00001 = 24.00001. The actual slope is sometimes referred to as the Precision Adjusted Slope.
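The two slope computations above can be sketched in a few lines of Python (the function names are ours, chosen to mirror the text's terminology):

```python
# Point-Process (limit) slope vs. the Precision Adjusted Slope for f(x) = x^2,
# evaluated over the right-hand-side range [12, 12.00001] used in the text.
def point_process_slope(x):
    # The limiting derivative of x^2: 2x (an approximation, since dx can never be 0).
    return 2 * x

def precision_adjusted_slope(x, dx):
    # The actual, measurable slope over the range [x, x + dx].
    return ((x + dx) ** 2 - x ** 2) / dx

print(point_process_slope(12.0))             # 24.0
print(precision_adjusted_slope(12.0, 1e-5))  # 24.00001, to within floating-point error
```

The gap between the two outputs is exactly the ∆x "precision border" discussed below.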
Thus, one says that the slope of f(x) = x² IS: 2x. Yes, as a limit; BUT as ∆x can NEVER = 0, the correct statement is that the slope of f(x) = x² is approximated by 2x. However, does this limiting concept or approximation work in practice? The answer is: "It depends". As in the flower garden example, if, for the task at hand, one does not need to be concerned with the precision border effectively created by ∆x, then the slope of 2x will be an approximation, but one useful for the task at hand. For example, look at any Micro- or Macro-economics textbook. They correctly use the Point-Process slope as it provides conceptual guidance in that illustrative domain, and the ∆x-refinement would only add confusion to the overall idea of the "instructive exercise".
However, if the analyst had a linear function: α + βx (not an OLS-regression estimate, but an actual a priori justified function), the slope function would be: [(α + β(x + ∆x)) − (α + βx)]/∆x = β. In this case, there is no ∆x term; for any value of the Abscissa indicator in the x-range, the rate of change of α + βx is exactly equal to β. In this case, there is no approximation: β is the exact slope function and so can be used for the task at hand.

Summary
The point of the Area and Slope discussions was to note that sometimes the analytic context is characterized by the need for exact rather than approximate information. With this as the operational mantra, consider the statistical decision-making that is very often found in the audit context, where the information collected is the Number of Events. In this audit-sampling frame, the very reasonable audit-protocol requirements are that: (i) the audit InCharge [IC] decides to randomly select n Accounts from a defined collection or population of N Account-Events, (ii) it is possible to randomly select the same Account multiple times and so include it multiple times in the sample of n Accounts (this is usually called sampling with replacement and is necessary to have valid population estimates), (iii) there is a protocol for accurately binary-coding the Account so selected: {Yes = 1 or Not Yes = 0}, and (iv) the IC, using experiential judgement, specifies the percentage of Accounts scored as Yes that are Expected to be found in the population of Accounts under audit examination.
This type of protocol is called a Bernoulli-Selection or -Scoring Protocol; however, the Probability-value context that will be used is not formed from a general binary Bernoulli Probability Density function (this will be addressed subsequently). For notation, the probability density function is scripted as: B pdf [n, %], where the probability of observing k Yes-Events is: P(k) = [n!/(k! × (n − k)!)] × %^k × (1 − %)^(n−k), k = 0, 1, …, n. Where: n is the number of sampled events from the Account under audit, the Account has N elements (as such, this is the population from which a random sample of size n is taken), and % is the a priori expectation of the percentage of targets or successes in the population of N individual accounts.
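The factorial-binomial form above is a one-line computation with Python's standard library; the function name `b_pdf` below is ours, chosen to echo the report's B pdf [n, %] notation:

```python
from math import comb

def b_pdf(n, pct, k):
    """Exact factorial-binomial probability of observing k 'Yes' Events
    in n sampled Events when the a priori expectation is pct (a proportion)."""
    return comb(n, k) * pct ** k * (1 - pct) ** (n - k)

# Illustration with the B pdf [20, 30%] used later in the report:
p5 = b_pdf(20, 0.30, 5)
print(round(p5, 4))  # 0.1789
```

Because `math.comb` works with exact integers, no factorial overflow or rounding shortcut is needed for the sample sizes (20 to 100) considered here.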
In this context, if the gambling-odds are low, this indicates that it is UN-likely that the a priori expectation is TRUE given the large difference between the a priori expectation and the ACTUAL results. In this case, it is better to opt for rejecting the likelihood that the a priori expectation is the TRUE state of nature in favor of the belief that it is NOT likely to be the TRUE state of nature.

Research Plan
This is the point of departure of this research report. The plan of the report is as follows:

The Audit Context
In the audit world, the IC has a vast number of client accounts from which to select in the execution of the audit. The end-game of the audit is to script two opinions; the operative inferential question for each is whether the a priori expectation of the IC is not likely to be the case given the collected audit evidence. One of the standard inferential tests in the panoply of the IC is to examine the frequency or the number of binary-Bernoulli occurrences in the Account under audit and to base the EPI-decision on the related inference. The inference engine of choice in the typical case is the FPE-risk.

The FPE-risk of the Binomial: The Exact Case
The most effective way to introduce this inferential FPE-risk testing case is by way of an example. This was an actual audit context, except the size of the sample has been reduced for exposition. The IC is in the COSO: ICoFR interim-phase of the audit and has selected Accounts Payable for testing. The issue under audit is how many accounts have taken qualified time-related discounts and so reduced the amount that was needed to close/satisfy their payable obligations. If too few discounts are taken, this could raise ICoFR concerns as to adequate managerial oversight in controlling the resources of the firm and so may require the IC to consider an EPI; also, if too many discounts are taken, this may strain the Cash Management possibilities needed to navigate the economic context and also may require an EPI.
The IC expects that there will be a balance between too few and too many APs settled in the audit year.
Specifically, the IC downloads the Accounts Payable payment satisfaction protocol of the client [AP-P].
After reading the AP-P and allowing for the usual unavoidable and justifiable reasons for not taking the qualified discount in the COVID-19 era, the IC decides that 70% of the time the discount on any AP will be taken; equivalently, 30% of the AP-contracts are expected not to be paid in time to take advantage of the qualified discount.

5) Summarize the inferential test information to be included in the current audit working papers, and so to later appear in the permanent audit file.

Partial Introductory Illustration: Clarification of the Computational Forms
In what follows, the details of the computations are presented. Subsequently, the full testing using the full B pdf [20, 30%] EPI-Decision Grid will be detailed.
In this case, the B pdf [20, 30%] is presented in Figure 1. The computational basis of this information will aid in understanding the nature of the technical aspects of using the exact Binomial in a decision-making context. After these details are presented and discussed, a more instructive operational context will be possible.

Probability-Value of Event (5)

Historically, Normal Approximations of Bernoulli & related Binomial processes were en vogue and so offered as "ball-park estimates" that would be relevant in most of the conceivable practical application areas (Note 1). The lack of interest in the "Error" in using the NA vis-à-vis the Exact Binomial is that a Continuity Correction [CC] is usually offered as "a correction". This is a misnomer, as the CC does NOT bring the NA to the Exact value of the B pdf [n, %]; it is close but not exact. Consider now the approximations to the Exact Binomial.

Approximations
As noted above, there are approximations to the B pdf [n, %] that broach the issue in focus for this research report. Specifically, there are two in common use: the Normal Approximation [NA] and the Continuity-Corrected Normal Approximation [CCNA].

The Normal Approximation [NA]
For the NA, one computes or estimates the Mean (µ) and Standard Deviation (σ) and uses these parameters to animate the N(µ, σ) probability density function that can then be used to create the FPE-risk profiles. For the B pdf [n, %], these parameters are: µ = n × % and σ = √(n × % × (1 − %)).
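A minimal sketch of the NA, assuming the LHS FPE of interest is P(X ≤ k); the Normal CDF is built from the standard-library error function, and the function names are ours:

```python
from math import sqrt, erf

def normal_cdf(x, mu, sigma):
    # Cumulative N(mu, sigma) probability via the error function.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def na_fpe(n, pct, k):
    """Normal Approximation to the LHS FPE P(X <= k) for B pdf [n, %]."""
    mu = n * pct
    sigma = sqrt(n * pct * (1 - pct))
    return normal_cdf(k, mu, sigma)

# B pdf [20, 30%]: mu = 6, sigma = sqrt(4.2); NA of P(X <= 5):
print(round(na_fpe(20, 0.30, 5), 4))  # about 0.313; the Exact value is 0.4164
```

Note that the NA value falls well below the Exact value, in line with the report's finding that NA FPEs tend to be lower and outside the jeopardy-screen.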

"Correction" of the NA
Additionally, it is possible to use a correction to the Normal Approximation. Effectively, the Normal Curve fitted to the binomial probability-blocks "shaves" off a portion of the exact probability; the CC addresses this by extending the evaluation point by one-half of an Event, e.g., the NA of P(X ≤ k) is taken at k + 0.5. To develop preliminary indication information as to the error-risk of using the two usual approximations, a dataset was collected. This is actually not just an academic investigation, as most of the statistical software packages that are in use do not (i) offer the continuity correction "option" or (ii) suggest computational alerts that there are possible misspecifications in failing to use exact information. See Table 2.
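The three FPE variants benchmarked in this report can be put side by side as follows (a sketch with our own function names; `cc=0.5` turns the NA into the CCNA):

```python
from math import sqrt, erf, comb

def phi(z):
    # Standard Normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

def lhs_fpe_exact(n, pct, k):
    # Exact factorial-binomial FPE: P(X <= k).
    return sum(comb(n, j) * pct ** j * (1 - pct) ** (n - j) for j in range(k + 1))

def lhs_fpe_na(n, pct, k, cc=0.0):
    # Normal Approximation; cc=0.5 gives the Continuity-Corrected NA [CCNA].
    mu, sigma = n * pct, sqrt(n * pct * (1 - pct))
    return phi((k + cc - mu) / sigma)

n, pct, k = 20, 0.30, 5
print(round(lhs_fpe_exact(n, pct, k), 4))       # 0.4164 (Exact)
print(round(lhs_fpe_na(n, pct, k), 4))          # NA: lower than the Exact value
print(round(lhs_fpe_na(n, pct, k, cc=0.5), 4))  # CCNA: closer, but still not exact
```

For this n = 20 case, the ordering NA < CCNA < Exact illustrates both why the CC is offered as "a correction" and why it remains a misnomer for "exact".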

Discussion
The codex for this computation in Excel is: =(T.DIST.2T(ABS(0),10000))/2 = 50.0%. With these detailed computations as context, the perusal of the FPE-information in Table 2 indicates, inter alia, that the FPE-risk profiles for the sample size of n = 20 follow the FPE-profile for the sample size of n = 100.

Context and Rationale
The analysis of the FPE-risk profiles discussed above is interesting, to be sure. However, only two (2) border cases were displayed in Table 2. In a statistical analysis addressing inferential relevance, it is necessary to enrich the evaluation context; thus, the evaluation set of FPEs will be enlarged beyond these border cases. As a further illustrative elaboration, consider Table 3, where the five index points are presented for B pdf [20, 30%]. Also see Table 1.

Discussion
The first column contains the selected P-values that will be used to form the dataset for the statistical analysis.
To be clear regarding the Index information, Max−2 means that once the maximum P-value is located, the first Event-Point selected is two Events in the LHS-direction from that Max-Point, and so on for the next four points. This was achieved by finding the Event with the largest Exact Binomial value; this Event will vary given the parameters of B pdf [n, %], and will always be at or near n × % (the mode of the distribution). After the maximum value is found, the first two (2) interior values are passed over and the next five (5) are selected for the dataset. Usually this results in the highest P-value being in the range of 25%, with progressively lower values as one moves to the last index point. This dynamic will form a reasonable comparison set of points that are in the usual P-value test frontier. The summary inferential indications are best discussed as the following four profiles.
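The index-point rule can be sketched as follows, assuming (our reading of the text) that "Max−2" through "Max−6" denote the five Events two through six steps in the LHS-direction from the modal Event, each scored by its cumulative LHS P-value:

```python
from math import comb

def b_pdf(n, pct, k):
    # Exact factorial-binomial probability of Event k.
    return comb(n, k) * pct ** k * (1 - pct) ** (n - k)

def index_points(n, pct):
    # Locate the Event with the largest Exact Binomial value (the mode) ...
    mode = max(range(n + 1), key=lambda k: b_pdf(n, pct, k))
    # ... then take the five Events from Max-2 down to Max-6 in the
    # LHS-direction, each scored by its cumulative (LHS) P-value.
    # (Assumes mode >= 6, as holds for the B pdf [20, 30%] case.)
    return {k: sum(b_pdf(n, pct, j) for j in range(k + 1))
            for k in range(mode - 2, mode - 7, -1)}

for k, p in index_points(20, 0.30).items():
    print(k, round(p, 4))
```

For B pdf [20, 30%] the mode is Event 6, so the index points are Events 4 through 0, with the highest cumulative P-value at about 24%, consistent with the "range of 25%" noted above.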

Analytic Context
In the analysis of the full dataset, so as to not overweight the inference for each of the sample sizes, only the data in the LHS-ranges was used. This produced, for each of the two sample-size arms, 75 FPE-values.

Summary & Extension: A Complete Illustration
In this section, the details of the use of the Exact Factorial Binomial in a typical audit context are developed and summarized. In this case, Table 5 is an expanded, complete and generalizable version of the A priori FPE-risk Decision-Grid using, for purposes of illustration, the B pdf [20, 30%] that was introduced in Table 1. Given the impact of the COVID-19 pandemic, the IC expects that 30% of the AP-contracts will not be paid on time so as to take advantage of the qualified discount. This is the a priori probability expectation that is the basis for the FPE-risk used to make the decision IF: The nature of the execution of the Client's AP-protocol is "in sync" with the 30%-expectation of the IC given the actual evidence. In this regard, before the Staffer's report as to how many of the 20 APs were actually not paid in a timely manner so as to take the discount is given to the IC, the IC parametrizes the A priori FPE-risk Decision-Grid as follows.
1) Left Hand Side [LHS] Context: The FPE-values for Events (0) through (3) aggregate to 10.7%. This means: If the population expectation of the IC that 30% of the APs are expected not to be paid were to be TRUE, then the chance of observing in a random sample of 20 APs that three (3) or fewer were not paid would be only 10.7%.
This is a relatively low chance or risk that the a priori expectation of 30% is, in fact, likely to be the case; thus, the IC would be justified in rejecting that the expectation of 30% is likely to be the case. Simply, as the FPE-risk or -chance of being correct re: the a priori expectation is only at most 10.7%, this would justify the IC deciding to not accept such low odds of being correct and so rationalize the rejection that 30% is likely to be the case. This rationalizes the "Too Many Paid" label, thus suggesting the likelihood that the IC will launch an EPI.

2) Right Hand Side[RHS] Context:
If the number of APs not paid were to be more than eight (8), the IC apparently felt that too few were paid. In this case, referencing the A priori FPE-risk Decision-Grid for B pdf [20, 30%], the FPE-chance is at most around 11%. This likelihood of being correct in believing that the a priori belief of 30% is TRUE is sufficiently low, and so would likely suggest that there is little support for the a priori expectation. This rationalizes the label "Too Few Paid". Finally, this rejection of the Null-belief in the RHS-direction also may call into question the adequacy of the ICoFR and require an EPI.
3) Interior Range: If the number of APs not paid is in the interval: {4, {5, 6, 7}, 8}, this is often called the "Goldilocks Zone": not too many & not too few, just the right number according to the expectation of the IC. In the Goldilocks-Range, as the minimal FPE-risk or -chance is about 22% for Events (4 or 8), the label affixed is "Ok" over this set of events. Rationale: These two lower limits are likely to be suggestive that the a priori expectation may not be TRUE; however, 22% is not likely to call for the rejection of the likelihood that 30% could be TRUE. Further, any of the FPE-risk values for the Events in the interval {5, 6, 7} would be strong evidence that rejecting the a priori belief of 30% would not be consistent with the evidence. Simply, as the FPE-risk or -chance of being correct re: the a priori expectation is, in the worst case, around 40%, this would likely justify the IC deciding to not reject that 30% is likely to be the case.
The A priori FPE-risk Decision-Grid for B pdf [n, %] is a simple, exact and intuitive decision-making tool.
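The three-zone grid logic described above can be sketched as follows; the function and label names are ours, the cut-points (3 and 8) and parameters (B pdf [20, 30%]) are those of the illustration:

```python
from math import comb

def b_pdf(n, pct, k):
    # Exact factorial-binomial probability of Event k.
    return comb(n, k) * pct ** k * (1 - pct) ** (n - k)

def decision_grid(n, pct, lhs_cut, rhs_cut):
    """A priori FPE-risk Decision-Grid sketch: label each possible Event count
    using the LHS/RHS cut-points chosen by the IC (3 and 8 in the text)."""
    grid = {}
    for k in range(n + 1):
        lhs = sum(b_pdf(n, pct, j) for j in range(k + 1))     # P(X <= k)
        rhs = sum(b_pdf(n, pct, j) for j in range(k, n + 1))  # P(X >= k)
        if k <= lhs_cut:
            grid[k] = ("Too Many Paid", lhs)
        elif k > rhs_cut:
            grid[k] = ("Too Few Paid", rhs)
        else:
            grid[k] = ("Ok", min(lhs, rhs))
    return grid

grid = decision_grid(20, 0.30, lhs_cut=3, rhs_cut=8)
print(grid[3])  # LHS FPE aggregates to about 10.7%
print(grid[9])  # RHS FPE is at most around 11%
```

Because each zone is built from exact tail sums rather than a symmetric ± interval, the grid naturally accommodates the asymmetrical screening discussed next.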
Additionally, this model allows the DM to form an "asymmetrical" screening grid. It is the case that there is a prevailing intuition that confidence intervals inherently are, and need to be, symmetric around some expectation mid-point. The reason for this erroneous but longstanding impression is that such intervals are usually scripted from the symmetric Normal distribution; the Exact Binomial grid of Table 5 carries no such symmetry requirement. This is consistent with the best-practices execution of the audit.

Outlook
Given the benefits of using an A priori FPE-risk Decision-Grid for creating audit evidence, it would be productive to program a Decision Support System [DSS] to calculate the FPE-risk profiles so as to facilitate the execution of the audit. In this case, the audit would, in the testing domain, be Effective, as it used Exact decision-making information, and, using the DSS, would be Efficient. It is the audit hallmark of "Best Practices" to have conducted an Effective and Efficient audit. Finally, as an extension, it would be an excellent inferential enhancement to benchmark the A priori FPE-risk Decision-Grid with a False Negative Error-risk context.