Difference between statistical sampling and non-statistical sampling

Non-statistical sampling (also called non-probability sampling) sounds to me like an oxymoron, much like "unbiased opinion" or "approximate solution". Nevertheless, it is a recognised research method, which evolved roughly between the 1620s and 1880s as a predecessor to more reliable, statistically supported sampling (Bethlehem 2009). Non-statistical sampling is still used today, for example in applied social research and in internal auditing. Its use in internal auditing and control assurance is the focus of this article.

Since the 17th century, the goal of sampling has been to produce reliable estimates of a population's size or characteristics. Today we know that reliable population estimates with a quantifiable margin of error can only be made from samples where: (1) the probability of selecting each unit is known and greater than zero; (2) the selection is random; and (3) statistical theory is used to quantify the population estimate and its accuracy (ABS). This type of sampling is referred to as statistical sampling (also probability sampling). If any of these criteria are not met, the sampling becomes non-statistical. The critical drawback of non-statistical sampling is the inability to make inferences about the population from the sample observations.

In research, statistical sampling is the norm; however, applied social researchers sometimes opt for non-statistical sampling because a statistical approach may not be feasible or practical (Trochim 2020). Probably the most justified use of non-statistical sampling is when the population is hard to reach or locate, e.g. there is no list with contact details of homeless people, or of customers who buy almond milk. Social or market researchers therefore need to go to the streets or shopping malls and try to "catch" their sample. They might also not need to project their sample observations onto the entire population.

Internal auditors and risk professionals provide reasonable rather than absolute assurance for various reasons, mainly cost-effectiveness. Sampling is therefore often used instead of examining the entire population, with varying degrees of disclosure about the risk of relying on samples. I will set aside for another article Computer Assisted Auditing Techniques (CAATs), also called Data Analytics, which aim for cost-efficient testing of the entire population through the application of various data sciences, including machine learning and artificial intelligence.

The Institute of Internal Auditors (IIA) gave internal auditors the option to use either statistical or non-statistical sampling in the 2013 version of the IIA Standards, in one of the Practice Advisories (IIA PA 2320-3). In the current 2017 version of the IIA Standards, PA 2320-3 or similar guidance is absent. Nevertheless, in a 2017 IIA Australia White Paper on internal audit sampling, the author explains the appropriate application of non-statistical sampling: "To establish the existence of errors requires that only one error is found. If the auditor has … knowledge of where the error is likely to be, then there is no need to undertake formal statistical sampling." (Parkinson 2017)

In internal auditing, control assurance, compliance testing or financial services quality control (e.g. over complaints handling, claims handling, or sales practices), the result of a test is usually binary: a "pass" or a "fail". An instance of a control was either executed as designed or not, a claim was handled correctly or not, a matter subject to assurance either does or does not comply with the applicable policy, regulation or norm. If one failure is enough to conclude a control is ineffective, and the auditor suspects where to find that error and indeed finds it in a small sample, then non-statistical sampling is a very efficient and justified method: the auditor concludes the control as a whole does not operate effectively. But if no failure is identified in the sample, or if the auditor (applying the concept of significance) or management (in their risk appetite statement) have some tolerance for failure, reaching a conclusion is tricky. Making an inference from a sample to the entire population is not possible without statistics.

Imagine I have a bag of jelly beans with 12 beans left in it. Three of them are red and nine are green. I take out three without looking, and all three are green. Would I conclude from this observation that all beans in the bag are green? There is almost a 40% chance that all three beans drawn at random will be green, which is also the risk that I will miss the three red beans in the bag. Now replace the bag of jelly beans with the annual population of a monthly control. With a sample of three months, there is almost a 40% risk that I will make a wrong conclusion about the operating effectiveness of the control if it was not performed correctly three times during the year, i.e. a 25% true failure rate. Table 1 below shows the probabilities of missing the true failure rate in a sample of n daily control instances selected randomly from one year of operation, given that all sampled instances were correctly executed. Note this assumes 250 business days in a year. The probabilities would be similar, although higher, for larger populations (251 to 10,000 items). Refer to the Appendix for the definition of the model.

TABLE 1
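The probabilities behind Table 1 can be sketched in a few lines of code. The snippet below is a minimal illustration in Python using scipy.stats.hypergeom (my choice of tooling here; the article's own tables were produced in MS Excel), and the failure rates and sample sizes shown are illustrative assumptions rather than the exact columns of Table 1.

```python
# Risk of missing the true failure rate: the probability of observing zero
# "fail" items in a random sample drawn without replacement from a population
# of control instances containing round(rate * population) true failures.
from scipy.stats import hypergeom

POPULATION = 250  # business days in a year, i.e. one year of a daily control

def miss_probability(sample_size, failure_rate, population=POPULATION):
    failures = round(failure_rate * population)
    # scipy's hypergeom.pmf(k, M, n, N): M = population size, n = number of
    # failing items in the population, N = sample size, k = failures in sample
    return hypergeom.pmf(0, population, failures, sample_size)

for rate in (0.01, 0.05, 0.10, 0.25):       # assumed true failure rates
    for size in (3, 10, 25, 40):            # assumed sample sizes
        print(f"true failure rate {rate:.0%}, sample of {size:2d}: "
              f"risk of missing it = {miss_probability(size, rate):.1%}")
```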

From Table 1 we can also see that, to be 99% sure (a 1% risk of a wrong conclusion) that the true failure rate is less than 10%, we need a random sample of 40 items, all of which are assessed as "pass". We can also assess the assurance obtained from the so-called Non-Statistical Sampling Tables (NSST) that I came across in the past.

TABLE 2

Table 2 above shows the level of assurance an auditor can give from the sample sizes quoted by the NSST. For example, if she tested five instances of a weekly control with a "pass" result, then she can conclude with 28% and 42% confidence that the true error rate is no greater than 5% (3 weeks) and 10% (5 weeks) respectively. This, to me, does not sound very assuring. If we defined "reasonable assurance" as, for example, 90% confidence of an error rate no greater than 10% ("90%/10%"), then the sample would have to increase to 18 instances of the weekly control. The key consideration is the definition of "reasonable assurance". The sample sizes (especially at the upper boundary of the interval) given by the NSST for daily and multiple-times-per-day ("> Daily") control frequencies do provide a fairly high level of assurance.
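As an aside, the sample size needed for a given definition of "reasonable assurance" can be found by a simple search over the same hypergeometric model. The sketch below is my own illustration in Python; the helper name is mine, and the exact result is sensitive to how e × N is rounded to a whole number of failing items.

```python
# Smallest random sample size such that, if every sampled item "passes",
# the risk of missing a true failure rate e in a population of N control
# instances is at most (1 - confidence).
from scipy.stats import hypergeom

def required_sample_size(N, e, confidence):
    failures = round(e * N)          # failing instances in the population
    for n in range(1, N + 1):
        if hypergeom.pmf(0, N, failures, n) <= 1 - confidence:
            return n
    return N                         # only full inspection achieves the target

# Daily control (250 instances), 99% confidence of a failure rate below 10%
print(required_sample_size(N=250, e=0.10, confidence=0.99))  # -> 40
# Weekly control (52 instances), "reasonable assurance" defined as 90%/10%
print(required_sample_size(N=52, e=0.10, confidence=0.90))   # -> 19 with this
# rounding of e*N; close to the 18 quoted above (the exact figure depends on
# how the number of failing items is rounded)
```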

I have seen a rationale used to justify the somewhat lower sample sizes given by the NSST: that the sample items should be chosen judgmentally by targeting the riskiest items in the population. The assumption is that if the high-risk items "pass" the test, then the rest of the population will likely "pass" too. However, this may not always hold, because the riskiest items may not be easy to identify. Even if the high-risk items are identifiable, the sample will be biased towards this specific sub-population, which may exhibit very different characteristics from the rest of the population.

I believe the NSST originated a long time ago from external auditing standards or their interpretation by the public accounting firms. The difference in the use of these tables in external auditing, compared to internal auditing, is that external auditors provide assurance over financial statements and top up their testing of controls with substantive testing of accounting balances. They also work with a quantitative definition of financial statement materiality rather than the usually qualitative concept of significance used in internal auditing. For these reasons, and the statistical argument presented above, the use of the NSST in internal auditing and control assurance should be approached with caution.

An alternative method of determining non-statistical sample sizes that I came across in practice is sampling a percentage of the population, e.g. 10%. This method does not improve the results of non-statistical sampling, and there is no justification for such an approach. Statistically, the sample size is mainly a function of accuracy, not of population size. For example, a 10% sample from 250 items will give you significantly less assurance than a 10% sample from 2,500 items. It is also very impractical for capacity planning. Let's assume closed insurance claims fluctuate monthly between 500 and 2,000. If monthly quality assurance over claims handling is performed on a 10% sample of closed claims, then the sample will range between 50 and 200 claims. It is better to target, say, 100 claims each month and keep enough quality assurance staff to handle that specific volume. A statistical sample of 100 gives very good insight into a population of 500 as well as 2,000; it is consistent and comparable over time, and the cumulative results can even improve the accuracy of conclusions.
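To put rough numbers on this, here is a minimal sketch under an assumed 10% true failure rate; the populations and sample sizes mirror the claims-handling example above and are illustrative only.

```python
# Confidence of detecting at least one failure when 10% of the population
# fails, under a "10% of population" rule versus a fixed sample of 100.
from scipy.stats import hypergeom

def confidence(population, sample_size, failure_rate=0.10):
    failures = round(failure_rate * population)
    return 1 - hypergeom.pmf(0, population, failures, sample_size)

print(f"10% sample of 250:   {confidence(250, 25):.1%}")     # noticeably lower
print(f"10% sample of 2,500: {confidence(2500, 250):.1%}")   # near certainty
print(f"100 out of 500:      {confidence(500, 100):.1%}")    # near certainty
print(f"100 out of 2,000:    {confidence(2000, 100):.1%}")   # near certainty
```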

So far I have focused on the case of no errors in the sample. Another case is when errors are found in the sample but some errors are tolerated. Let's consider testing 40 instances of a daily control and finding two errors in the sample (a 5% sample error rate). If a 5% error rate is tolerated, can I conclude the control operates effectively? How confident am I? Answering these questions is mathematically more challenging and beyond the scope of this article. However, a similar trade-off between sample size and accuracy will hold: larger samples produce higher accuracy (confidence), and increasing the original sample size can provide the desired level of assurance.
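One way to get a feel for the problem (my own framing, not a method prescribed by any standard) is to ask how plausible the observed result would be if the true error rate were actually at or above the tolerance. The hypergeometric model from the Appendix extends naturally to this question:

```python
# Two errors found in a random sample of 40 daily control instances (out of
# 250): how likely is that observation if the true error rate were at, or
# above, the 5% tolerance?
from scipy.stats import hypergeom

POPULATION, SAMPLE, OBSERVED_ERRORS = 250, 40, 2

for true_rate in (0.05, 0.10, 0.15, 0.20):
    failures = round(true_rate * POPULATION)
    # Probability of finding OBSERVED_ERRORS or fewer errors in the sample
    # if `failures` items in the population were actually erroneous.
    p = hypergeom.cdf(OBSERVED_ERRORS, POPULATION, failures, SAMPLE)
    print(f"true error rate {true_rate:.0%}: "
          f"P(<= {OBSERVED_ERRORS} errors in a sample of {SAMPLE}) = {p:.1%}")
```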

In conclusion, non-statistical sampling should be used only to prove that a control is not effective, by finding one or more errors while having tolerance for none. In all other cases, only statistical sampling leads to unbiased and quantifiable assurance about the effectiveness of controls or the correctness of auditable matters. The Non-Statistical Sampling Tables may provide a false sense of security, because conclusions derived from the sample sizes they prescribe may not be strong enough to meet the expectations of reasonable assurance. The standard setters, particularly the IIA, should provide more guidance, tools and practical examples on the use of statistical sampling and the significant limitations of non-statistical sampling, and should mandate standardised disclosures about the risk of using sampling.

***

Disclaimer

Views and opinions expressed in this article are solely my own and do not represent those of people, institutions, or organisations that I may or may not be associated with, in a professional or personal capacity.

Appendix: The Urn Model (Hypergeometric Distribution)

This model computes the probability Pn of drawing only green beans in n consecutive draws from a bag of N beans, of which eN are red. Beans are not replaced, i.e. every bean taken out of the bag is eaten. This simulates a test of controls on a randomly selected sample. Every instance of the control in the sample can be evaluated by the tester as either "executed as designed" ("pass" or "green") or "not executed as designed" ("fail" or "red"). The model answers the question: what is the risk (likelihood) of missing the true failure rate e in a population of N control instances, given that all sampled instances of the control were a "pass"?
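In symbols, and consistent with the definitions listed below, the model can be written as:

$$p_1 = \frac{N - eN}{N}, \qquad p_n = \frac{N - eN - (n-1)}{N - (n-1)}, \qquad P_n = \prod_{k=1}^{n} \frac{N - eN - (k-1)}{N - (k-1)} = \frac{\binom{N - eN}{n}}{\binom{N}{n}}$$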

Where:

  • N is the total number of beans in the bag;
  • e is the fraction of red beans (the number of red beans divided by N);
  • p1 is the probability of drawing a green bean on the first draw;
  • pn is the probability of drawing a green bean on draw number n given n-1 green beans were drawn in the previous n-1 draws; and
  • Pn is the probability of drawing only green beans in n consecutive draws.

For a twelve-bean bag with three red beans, and drawing three green beans in three random draws (a test of a monthly control operating over a period of 12 months, using a random sample of three months, with a true failure rate of 25%), the model yields:
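$$P_3 = \frac{9}{12} \cdot \frac{8}{11} \cdot \frac{7}{10} = \frac{504}{1320} \approx 0.382$$

That is, roughly a 38% risk of missing the 25% true failure rate, consistent with the "almost 40%" figure quoted in the body of the article.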

Note: the model was developed by the author based on various YouTube tutorials on the Hypergeometric Distribution, Conditional Probability and the Urn Problem. MS Excel was used for the computation of the tables and figures presented in this article.

References

  1. Bethlehem, Jelke (2009), The Rise of Survey Sampling, Statistics Netherlands
  2. Australian Bureau of Statistics (ABS), www.abs.gov.au
  3. Trochim, William M.K. (2020), Research Methods Knowledge Base, https://conjointly.com/kb/
  4. Institute of Internal Auditors, Practice Advisory 2320-3: Audit Sampling (withdrawn in 2017)
  5. Parkinson, Michael (2017), White Paper – Internal Audit Sampling, IIA Australia
