Intelligence Via Compression of Information

Dr. Gerry Wolff
Published 02/16/2023
Although compression of information is normally associated with rather boring tasks like making a file smaller, the idea that information compression (IC) might be important in the workings of our brains has been the subject of research since the 1950s.

As early as 1969, neuroscientist Horace Barlow wrote that the operations involved in the compression of information:

“… have a rather fascinating similarity to the task of answering an intelligence test, finding an appropriate scientific concept, or other exercises in the use of inductive reasoning. Thus, [compression of information] may lead one towards understanding something about the organization of memory and intelligence, as well as pattern recognition and discrimination.”

Here are two examples, from the paper [1], to illustrate how IC is deeply embedded in Human Learning, Perception, and Cognition (HLPC):

  • If we are writing something like “Treaty on the Functioning of the European Union” and we want to refer to it several times, it seems natural and obvious to use an abbreviation like ‘TFEU’. A little reflection shows that this technique for achieving IC, which is known as ‘chunking-with-codes’, is used everywhere in natural languages. For example, a noun like ‘justice’ is a short way of referring to the relatively complex thing which is justice; a verb like ‘fly’ is a short way of referring to the relatively complex processes of flying by birds and by planes; and so on. Almost certainly, similar principles apply to those parts of our thinking that do not involve spoken or written words.
  • If we are looking at a scene, and then close our eyes for a moment and open them again, we normally see ‘the same’ scene. In other words, the ‘before’ and ‘after’ views have been merged to make one, and that means compression of information. Much the same can be said about the way we merge the slightly different views from our left and right eyes. In that second case, it has been shown that there really is a merging of the views from the left and right eyes and not simply the discarding of one view and the retention of the other.
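The ‘chunking-with-codes’ idea in the first example can be sketched in a few lines of Python. This is only an illustration, not code from the SP Computer Model: the function names `chunk_with_code` and `expand`, and the sample sentences, are assumptions made up for the sketch. It replaces each occurrence of a long chunk with a short code, keeps a one-entry dictionary so the text can be restored, and counts the dictionary as part of the compressed size.

```python
def chunk_with_code(text: str, chunk: str, code: str) -> tuple[str, dict[str, str]]:
    """Replace every occurrence of `chunk` with the shorter `code`.

    Returns the compressed text plus the dictionary needed to restore it.
    """
    if code in text:
        # If the code already appears in the text, decoding would be ambiguous.
        raise ValueError("code already occurs in the text")
    return text.replace(chunk, code), {code: chunk}


def expand(text: str, dictionary: dict[str, str]) -> str:
    """Restore the original text by replacing each code with its chunk."""
    for code, chunk in dictionary.items():
        text = text.replace(code, chunk)
    return text


# Hypothetical sample text that mentions the long name three times.
chunk = "Treaty on the Functioning of the European Union"
original = (
    f"The {chunk} has a long name. "
    f"We want to cite the {chunk} often. "
    f"So the {chunk} gets a short code."
)

compressed, dictionary = chunk_with_code(original, chunk, "TFEU")

# The dictionary must be stored too, so it counts towards the compressed size.
dictionary_size = sum(len(code) + len(chunk) for code, chunk in dictionary.items())
compressed_size = len(compressed) + dictionary_size

print(len(original), compressed_size)
```

The saving grows with the number of repetitions: a chunk that is used only once gains nothing from a code, because the dictionary entry costs more than it saves.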



Since there is now a wealth of evidence for the importance of IC in HLPC, one would expect that idea to be the bedrock in all theories of artificial intelligence (AI). But at present most research in AI works with ‘deep neural networks’ (DNNs) with only minor roles for IC, or none.

Apart from some AI-related research within the framework of Algorithmic Information Theory, the only theory of AI which puts IC at center stage is the SP Theory of Intelligence (SPTI) and its realization in the SP Computer Model.

The SPTI is not solely a theory of intelligence. As described later, it has potential as a theory of computation, and it suggests new foundations for mathematics which themselves suggest potentially useful new developments.

A key idea in the SPTI is the concept of SP-multiple-alignment (SPMA), developed from the concept of ‘multiple sequence alignment’ in biochemistry. The SPMA concept is a sophisticated application of the idea of Information Compression via the Matching and Unification of Patterns (ICMUP): IC may be achieved by searching for patterns that match each other and then merging, or ‘unifying’, patterns that are the same.

The importance of the SPMA concept is two-fold: it provides a powerful means of compressing varied kinds of information, and it is the key to the versatility of the SPTI in diverse aspects of intelligence. The concept of SP-multiple-alignment has the potential to be as significant for an understanding of intelligence as the concept of DNA is for an understanding of biology. It may prove to be the double helix of intelligence!

Two examples of SPMAs are shown in Figure 1. These two examples represent alternative parsings of the ambiguous sentence “Fruit flies like a banana”. Each one has been created by a process that searches for a means of compressing the sentence in terms of patterns like those shown in rows 1 to 8 in a repository of many such patterns. The details of how an SPMA is created and how the IC is measured are described in Chapter 3 in the book [3].

Figure 1. Two examples of SPMAs showing two alternative parsings of the ambiguous sentence “Fruit flies like a banana.”
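The ambiguity itself can be shown with a much simpler tool than the SP Computer Model. The Python sketch below is not an implementation of SP-multiple-alignment and does not measure IC; it swaps in a conventional context-free grammar with a small recursive parser, purely to show that the sentence has two distinct parsings. The grammar rules, the lexicon, and the function name `parses` are all assumptions chosen for this one sentence.

```python
from functools import lru_cache

# Hypothetical toy grammar and lexicon, chosen only for this sentence.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("N", "N"), ("D", "N")],
    "VP": [("V", "NP"), ("V", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "fruit": {"N"},
    "flies": {"N", "V"},   # noun ("fruit flies") or verb ("fruit flies ...")
    "like": {"V", "P"},    # verb ("flies like") or preposition ("like a banana")
    "a": {"D"},
    "banana": {"N"},
}
WORDS = ("fruit", "flies", "like", "a", "banana")


@lru_cache(maxsize=None)
def parses(symbol: str, start: int, end: int) -> tuple[str, ...]:
    """Return every bracketed parse of WORDS[start:end] as `symbol`."""
    results = []
    # A word category matches a single word listed under it in the lexicon.
    if end - start == 1 and symbol in LEXICON[WORDS[start]]:
        results.append(f"({symbol} {WORDS[start]})")
    for rhs in RULES.get(symbol, []):
        if len(rhs) == 1:
            # Unary rule: the child covers the same span as the parent.
            for sub_tree in parses(rhs[0], start, end):
                results.append(f"({symbol} {sub_tree})")
        else:
            # Binary rule: try every split point inside the span.
            left, right = rhs
            for mid in range(start + 1, end):
                for left_tree in parses(left, start, mid):
                    for right_tree in parses(right, mid, end):
                        results.append(f"({symbol} {left_tree} {right_tree})")
    return tuple(results)


all_parses = parses("S", 0, len(WORDS))
for tree in all_parses:
    print(tree)
```

With this toy grammar, one parse treats ‘fruit flies’ as a noun phrase and ‘like’ as the verb, and the other treats ‘fruit’ as the noun phrase, ‘flies’ as the verb, and ‘like a banana’ as a prepositional phrase.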

There is much more to the SPMA concept than parsing sentences. In brief, its intelligence-related strengths include: the modeling of several kinds of intelligent behavior, including several kinds of probabilistic reasoning; the representation and processing of several kinds of intelligence-related knowledge; and the seamless integration of diverse aspects of intelligence, and diverse kinds of knowledge, in any combination.

There are many examples of the variety of things that can be done with SPMAs in the book Unifying Computing and Cognition [3], and in a shortened version of the book [2].

These and other publications about the research, including many peer-reviewed publications, are described online with download links. In particular, two recent papers, outlined in the next few paragraphs, provide evidence in support of the SPTI.

As its title suggests, the paper [4] describes twenty significant problems in AI research, with potential solutions via the SPTI. Here are a few examples:

  • DNNs are designed to learn via many repetitions of information. By contrast, the SPTI, like people, can learn from a single exposure or experience. And the SPTI, unlike DNNs, can use new information immediately, much as people do.
  • If a DNN learns one thing and is then required to learn something else, the new learning wipes out the old learning, although there are somewhat clumsy workarounds for this problem. The SPTI, like people, does not suffer from this kind of ‘catastrophic forgetting.’
  • By contrast with DNNs, a major strength of the SPTI is ‘transfer learning.’ This means that with the SPTI, as with people, anything that has already been learned is available at any time to be incorporated with new learning in new structures.
  • The ability of the SPTI to learn from a single exposure or experience, and its strengths in transfer learning, mean that it is very unlikely that, at any stage in its development, the SPTI will require the enormous quantities of data and processing power that are normally needed by DNNs.
  • Unlike DNNs, the SPTI provides complete transparency in its knowledge and it provides an audit trail for all its processing.

Another paper [5] argues that, since the achievement of human-level AI (AGI) is likely to take a long time, with many uncertainties along the way, we should be evaluating AI systems as potential Foundations for the Development of AGI (FDAGIs).

The paper [5] evaluates the SPTI as a potential FDAGI alongside six other systems, including ‘Gato’ from DeepMind and ‘DALL·E 2’ from OpenAI.

Notwithstanding the impressive achievements of Gato and DALL·E 2, the paper concludes that the SPTI, viewed as an FDAGI, has advantages compared with those two systems and the other four:

“The main reason for the relative strength of the SPTI [as an FDAGI] is the concept of SP-multiple-alignment, which is largely responsible for the versatility of the [SP computer model], both within AI and beyond, and for its small size.”

Other potential benefits and applications of the SPTI include: the management of big data; helping to understand commonsense reasoning and commonsense knowledge; medical diagnosis; natural language processing; and several more.

Since the SPTI is, in effect, a theory of computation (see Chapter 4 in [3]), it suggests the possibility of framing all kinds of computation as IC. That in turn suggests the potentially useful possibility of eliminating the many different programming languages. And it suggests the potentially useful possibility of reducing all the many formats for knowledge to a single coding scheme for all kinds of knowledge.

There has been an unexpected spinoff from the development of the SPTI. The logic behind the new development is this: since IC is known to be of central importance in HLPC, and since mathematics is the product of human minds and is designed to aid human thinking, it should not be surprising to find that IC is of central importance in mathematics.

In effect, this line of thinking suggests that there may be new foundations for mathematics that are different from any of the several ‘isms’ in the philosophy of mathematics. But if we are going to develop new foundations for mathematics, the new foundations cannot themselves contain mathematics. So if IC is to provide those new foundations, we cannot use any of the several mathematical treatments of IC.

Fortunately, the approach to IC that is central in the SPMA and the SPTI is founded on the simple ICMUP idea that IC may be achieved by merging two or more copies of things that match each other. Here are some examples:

  • An addition like ‘5 + 7’ may be seen as an example of the ‘run-length coding’ technique for IC, which means that, if anything is repeated two or more times, the several instances may be merged into one and thus compressed. Here, 5 + 7 is a compressed version of the procedure “start with 5 and then add 1 seven times”: in 5 + 7, the seven applications of ‘add 1’ have been reduced to one.
  • In a similar way, a multiplication like 3 x 8 may be seen as a compressed version of “start with 0 and then add 3 eight times.”
  • If a body of mathematics is repeated in two or more parts of something larger, then it is natural to declare it once as a named ‘function’, where the body of the function may be seen as a relatively large ‘chunk’ of information, and the name of the function is its relatively short ‘code’ or identifier. This avoids the need to repeat the body of the function in two or more places. Examples in mathematics of this kind of ‘chunking-with-codes’ technique include: ‘sqrt()’, where ‘sqrt’ (or the ‘√’ symbol) is the code, while the chunk, which is hidden from view, is all the relatively large mathematics needed to calculate a square root; in a similar way with the function ‘log()’, the word ‘log’ is the code, while the chunk, which is hidden from view, is all the mathematics needed to calculate logarithms; and so on.
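The arithmetic examples above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than anything taken from [6]: the function names `add_by_repetition`, `multiply_by_repetition`, and `run_length_encode` are hypothetical, and steps are represented as plain strings.

```python
import math


def add_by_repetition(start: int, times: int) -> int:
    """Uncompressed form of `start + times`: apply 'add 1' `times` times."""
    total = start
    for _ in range(times):
        total += 1
    return total


def multiply_by_repetition(value: int, times: int) -> int:
    """Uncompressed form of `value * times`: start with 0, add `value` `times` times."""
    total = 0
    for _ in range(times):
        total += value
    return total


def run_length_encode(steps: list[str]) -> list[tuple[str, int]]:
    """Merge each run of identical steps into one (step, count) pair."""
    encoded: list[tuple[str, int]] = []
    for step in steps:
        if encoded and encoded[-1][0] == step:
            # Same step as the previous one: extend the current run.
            encoded[-1] = (step, encoded[-1][1] + 1)
        else:
            encoded.append((step, 1))
    return encoded


# "Start with 5 and then add 1 seven times": seven identical steps become one pair.
encoded_addition = run_length_encode(["add 1"] * 7)

# "Start with 0 and then add 3 eight times" compresses in the same way.
encoded_multiplication = run_length_encode(["add 3"] * 8)

# Chunking-with-codes: the short name `math.sqrt` stands in for the hidden
# body of mathematics that actually computes the square root.
root = math.sqrt(16.0)
```

In each case the compact notation (‘5 + 7’, ‘3 x 8’, ‘sqrt’) gives the same result as the longer procedure it stands for, which is what makes it a lossless compression of that procedure.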

The whole subject is described much more fully in [6].

In view of this evidence that the SPTI and mathematics are both founded on IC, an obvious possibility is an amalgamation of the two. This would give the SPTI the benefit of thousands of years of thinking about mathematics, and at the same time, it would give mathematics an AI dimension. The amalgamation could be a powerful new means of expressing ideas in science and in many other areas.

I’ll be happy to try to answer any questions you may have.

Gerry Wolff
+44 (0) 7746 290775


About the author

Dr. Gerry Wolff is the Director of […]. He has held academic posts at the University of Wales, Bangor, the University of Dundee, and the University Hospital of Wales, Cardiff, as well as a one-year Research Fellowship with IBM in Winchester, UK. He has also worked as a Software Engineer with Praxis Systems plc in Bath, UK.

His first degree at Cambridge University was in Natural Sciences and his PhD at the University of Wales, Cardiff, was in the area of Cognitive Science. He is a Chartered Engineer, a Life Member of IEEE, and a Member of the British Computer Society.

He has worked on the development of computer models of language learning and, more recently, has concentrated on the development of the SP Theory of Intelligence. Between early 2006 and late 2012 he was engaged in environmental campaigning (climate change).

Dr Wolff has numerous publications in a wide range of journals, collected papers, and conference proceedings.



AGI – Artificial General Intelligence (meaning artificial intelligence at human level or greater).

AI – Artificial Intelligence.

DNN – Deep Neural Network.

FDAGI – Foundation for the Development of AGI.

HLPC – Human Learning, Perception, and Cognition.

IC – Information Compression.

ICMUP – Information Compression via the Matching and Unification of Patterns.

SPMA – SP-multiple-alignment.

SPTI – SP Theory of Intelligence.



[1] “Information compression as a unifying principle in human learning, perception, and cognition,” J G Wolff, Complexity, vol. 2019, Article ID 1879746, 38 pages, 2019.

[2] “The SP Theory of Intelligence: an overview,” J G Wolff, Information, 4 (3), 283-341, 2013.

[3] “Unifying Computing and Cognition: the SP Theory and Its Applications,” J G Wolff, ISBNs: 0-9550726-0-3 (ebook edition), 0-9550726-1-1 (print edition, produced by print-on-demand).

[4] “Twenty significant problems in AI research, with potential solutions via the SP Theory of Intelligence and its realisation in the SP Computer Model,” J G Wolff, Foundations, 2022, 2(4), 1045-1079.

[5] “The SP Theory of Intelligence, and its realisation in the SP Computer Model, as a foundation for the development of artificial general intelligence,” J G Wolff, forthcoming in the Analytics journal.

[6] “Mathematics as information compression via the matching and unification of patterns,” J G Wolff, Complexity, vol. 2019, Article ID 6427493, 25 pages, 2019.


Disclaimer: The author is completely responsible for the content of this article. The opinions expressed are their own and do not represent IEEE’s position nor that of the Computer Society nor its Leadership.