Study finds 94% of business spreadsheets have critical errors

meep · August 14, 2024, 1:28pm

This is the study:

Spreadsheet quality assurance: a literature review

Abstract

Spreadsheets are very common for information processing to support decision making by both professional developers and non-technical end users. Moreover, business intelligence and artificial intelligence are increasingly popular in the industry nowadays, where spreadsheets have been used as, or integrated into, intelligent or expert systems in various application domains. However, it has been repeatedly reported that faults often exist in operational spreadsheets, which could severely compromise the quality of conclusions and decisions based on the spreadsheets. With a view to systematically examining this problem via survey of existing work, we have conducted a comprehensive literature review on the quality issues and related techniques of spreadsheets over a 35.5-year period (from January 1987 to June 2022) for target journals and a 10.5-year period (from January 2012 to June 2022) for target conferences. Among other findings, two major ones are: (a) Spreadsheet quality is best addressed throughout the whole spreadsheet life cycle, rather than just focusing on a few specific stages of the life cycle. (b) Relatively more studies focus on spreadsheet testing and debugging (related to fault detection and removal) when compared with spreadsheet specification, modeling, and design (related to development). As prevention is better than cure, more research should be performed on the early stages of the spreadsheet life cycle. Enlightened by our comprehensive review, we have identified the major research gaps as well as highlighted key research directions for future work in the area.

They reference 192 papers, and I looked over the list, finding papers I would have expected from the European Spreadsheet Risks Interest Group members’ research. So there’s that. It looks pretty solid in how they did their review in finding pertinent papers.

Pulling out the bit on types of errors:

6.4.6 Spreadsheet error classifications and taxonomies

different types of taxonomies

Many studies have developed classifications or taxonomies of
spreadsheet errors. The study in [44] is one of these pioneer
works. In [44], although the classification scheme includes
eight error types, it is not fine-grained enough because:
(a) some of these error types are mistakes (e.g., incorrectly
copied formulae) and some of them are faults (e.g., incorrect
ranges in formulae and incorrect cell references), and (b) one
“error” type called confused range names is in fact undesirable
practice potentially contributing to confusion rather than faults
that directly cause failures. Another work [152] distinguished
between two classes of errors: domain error (e.g., a “mistake”
in logic due to misunderstanding of the depreciation concept
in accounting) and device error (e.g., a “fault” involving a
wrong reference in the depreciation function). Similar to the
work in [44], the error classification in [152] does not
differentiate between mistakes and faults.
With respect to spreadsheet faults, the work in [153]
distinguished between location fault and formula fault.
Location fault refers to a fault in formulae that are
conceptually correct but one or more of their cell references
are wrong, while formula fault refers to a misuse of operators
or wrong number of operators. On the other hand, two studies
[42,137] developed a taxonomy specifically focused on
spreadsheet mistakes (Fig. 1). They defined quantitative
mistakes as those which cause “immediate” incorrect output
from spreadsheets, whereas qualitative mistakes are those
which do not result in “immediate” wrong output but, rather,
they often represent poor design and coding practices (e.g.,
putting a constant instead of a cell reference into a formula).
Such a definition of qualitative “mistakes” is debatable,
because they do not necessarily produce failures. Instead, they
are poor spreadsheet modeling practices and, hence, are
strictly speaking not mistakes. Note that, in [154], latent
errors are defined as those that “do not directly cause the error
and occur upstream of the event”. Thus, latent errors defined
in [154] are obviously of the same in nature as that of
qualitative “mistakes” defined in [42,137].
Along with this taxonomy, the study in [155] included two
more “mistakes”: jamming (values of more than one variable
are placed in a single cell) and duplication (information of a
variable is duplicated in the spreadsheet, possibly resulting in
data inconsistency). Here, similar to qualitative “mistakes”
defined in [42,137], classifying jamming and duplication
(which are unrecommended practices) as mistakes is
controversial. Similarly, the study in [156] developed a
taxonomy to classify the qualitative “mistakes” in Fig. 1 into
four subtypes: formula integrity, semantics, extendibility, and
logic transparency (see Fig. 2). With respect to the qualitative
“mistakes” and quantitative mistakes in Fig. 1, an experiment
involving desk checking was performed [157]. This
experiment found that: (a) quantitative mistakes were more
easily detected than qualitative “mistakes”, and (b) the
detection rate depended on the type and prominence of
mistakes (e.g., whether the mistakes were conspicuous) as
well as prior incremental practice with spreadsheet error
detection [157].

Rastiln · August 14, 2024, 1:42pm

Sure, but did they use an Excel database to compile this information?

Ted_Hoffman · August 14, 2024, 3:12pm

“Can I really trust the information in this Excel file?”
“They’ve done studies, you know. 6% of the time, it works every time.”

meep · August 14, 2024, 4:48pm

They were pretty thorough in their search:

We started our literature search for journal papers from
January 1987. Choosing this start date is based on the
development history of spreadsheets — it was reported that
Microsoft launched the Windows operating system in 1987
and Excel was one of the first application products bundled
with it [29]. Our search period was from January 1987 to June
2022 (35.5 years).

The_Polymath · August 14, 2024, 5:06pm

This is why my company has spent the last few years fanatically developing EUCs for all major spreadsheets
in use.

Doctor_Who · August 14, 2024, 6:03pm

I’m sure he would have said spreadsheets instead of statistics had he lived long enough.

Mathman · August 14, 2024, 7:09pm

Plot twist: the authors of this study did their work in Excel and their file contains errors.