Part 1: Background
Why do the simplest explanations often turn out to be the most powerful?
In medical diagnosis there’s a saying coined by Dr. Theodore Woodward in the late 1940s:
“When you hear hoofbeats behind you, don’t expect to see a zebra.”
In other words, if a patient comes into the emergency room with crushing chest pain and shortness of breath, it’s far more likely they’re having a heart attack than suffering from a rare tropical infection.
Chest pain is the most common symptom of heart attacks, and swift treatment hinges on recognising that simple, high-probability cause first.
Occam’s Razor
Occam’s razor is an insight from the 14th-century friar William of Ockham: if you have two competing ideas or theories that explain the same phenomenon, you should prefer the simpler one. Today, AI researchers have given it a new twist: instead of counting words in a hypothesis, they measure how well a model compresses its data. If a neural network can squash its training examples down to a few elegant rules, it’s found the simple pattern that’s most likely to generalise to new situations.
In this article, we’ll show how a formal notion called Kolmogorov complexity captures why compression equals understanding in AI.
Kolmogorov complexity: a measure of the computational resources needed to specify the object (piece of text, image, binary string), also known as algorithmic complexity.
Then we’ll turn to your own brain — where the hippocampus acts like a biological compressor, distilling raw experiences into compact “schemas.” Finally, we’ll introduce a six-step, brain-wave–based template for far-transfer in learning: a practical recipe for turning compression into real-world intelligence.
Compression → Learning & Generalisation in AI
Modern AI models like ChatGPT, with its billions of parameters, or vision nets that recognise cats in photos they’ve never seen, can apply learned patterns to brand-new examples. This ability, called generalisation, is like a student who, after practicing dozens of math problems, can solve an unseen puzzle without memorising every past answer. From writing essays to diagnosing X-rays, these networks show that scale alone doesn’t have to mean rote memorisation—it can unlock pattern compression.
Kolmogorov Complexity Primer
Imagine you want the simplest recipe that recreates some data. Kolmogorov complexity asks: what is the shortest possible recipe — or program — that produces that data? In stats and machine learning we regularly balance model ‘size’ against how well it fits the data, and we want to pick the simplest model that still captures the data.
Kolmogorov complexity captures the idea that finding simplicity in description is the same as finding the essence of the data. It literally measures the length of the shortest ‘program’ that reproduces your data, and compressing down to that minimal code really is synonymous with understanding, as I’ll explain below.
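Kolmogorov complexity itself can’t be computed exactly, but an ordinary compressor gives a rough, hands-on stand-in. Here is a minimal Python sketch (the example strings are invented for illustration) showing that patterned data shrinks to a far shorter ‘recipe’ than noise does:

```python
import random
import zlib

def compressed_size(text: str) -> int:
    """Bytes after zlib compression: a crude, computable stand-in
    for the length of the shortest recipe that recreates the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

structured = "the cat sat on the mat. " * 50        # highly patterned text
random.seed(0)
noise = "".join(random.choice("abcdefghij ") for _ in range(len(structured)))

print(len(structured), compressed_size(structured))  # long text, tiny recipe
print(len(noise), compressed_size(noise))            # same length, far less compressible
```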
Implications for Deep Learning in AI
At its core, intelligence is about finding the shortest path from inputs to meaningful conclusions. Here we’re talking about minimising both:
Description Length (how much you need to store or transmit your model of the world)
Computational Effort (how many steps it takes to go from observations to a decision)
In Kolmogorov terms, a shorter program is, by definition, a better compression of the data — and in practice a leaner model also tends to reach decisions in fewer steps, because there’s less machinery to execute.
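One hedged way to see that trade-off in code is a toy two-part score in the spirit of minimum description length: bits to describe the model plus bits to describe what it fails to explain. Everything below (the synthetic data, the 32-bits-per-parameter assumption, the Gaussian code length for residuals) is an illustrative simplification, not a formal result:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=x.size)   # the truth is a simple line plus noise

def two_part_cost(degree: int, bits_per_param: float = 32.0) -> float:
    """Model bits (one float per polynomial coefficient) plus approximate
    bits to encode the residuals the model leaves unexplained."""
    coeffs = np.polyfit(x, y, degree)
    residual_var = max(np.var(y - np.polyval(coeffs, x)), 1e-12)
    data_bits = 0.5 * x.size * np.log2(2 * np.pi * np.e * residual_var)
    model_bits = bits_per_param * (degree + 1)
    return model_bits + data_bits

for degree in (1, 3, 9):
    print(degree, round(two_part_cost(degree), 1))
# The simplest model that still fits (degree 1) wins on total description length.
```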
This is exactly what modern neural nets in AI do at massive scale. Though they contain billions of knobs to turn, these models aren’t memorising every training example; they’re hunting for the leanest, most powerful rules hidden in the data.
From Data to Essence: When a vision model sees a cat, it doesn’t replay stored photos. Instead, it has compressed the essence of feline features — whiskers, ear shapes, movement patterns — into a compact internal code. That compressed code is what lets it recognise a kitten in a cartoon or a cat lounging in a box it’s never encountered before.
Simplicity as Strength: A model whose decision boundary can be described in just a few lines of code is more robust. It resists noise, ignores irrelevant quirks in the training set, and adapts gracefully to new scenarios. In other words, shorter internal descriptions not only fit past data but also unlock future insights — they generalise.
Intelligence Through Compression: By reducing complexity — removing redundancy and focusing on core patterns — massively parameterised networks trained on ‘big data’, such as ChatGPT, mirror a hallmark of human thought: we compress our experiences into concepts and then apply those concepts flexibly. This link between compression and generalisation isn’t an academic footnote; it’s the very engine of modern AI intelligence.
Compression = Generalisation = Intelligence
The more you can compress your knowledge — and use a streamlined reasoning process — the faster and more capable an intelligent system becomes.
✅ Compression = understanding
✅ Generalisation = discovering simplicity
✅ Intelligence = understanding & discovering simplicity
When you strip away everything redundant and boil your model of the world down to its essentials, two things happen:
You capture the true structure
A heavily compressed representation only keeps what is genuinely predictive. All the quirks and noise in your training set get tossed out. That means when you meet a new example, you’re matching against core patterns — so you’re far more likely to hit on the right rule than if you had memorised every detail.
Inference becomes a quick lookup
With a concise internal ‘recipe,’ you don’t have to wade through layers of special-case logic or noise-filled memories. You apply the same tight rule set — often in one or two steps — and get to your answer immediately.
Together, these create the magic of intelligent generalisation:
Compression ensures you’ve internalised only the essence of past data — so your model isn’t fragile to shifts or oddballs.
Efficiency means you can use that essence instantly in brand-new situations.
Think of a language learner: rather than retaining every sentence they have ever heard, they abstract a handful of universal rules (“noun-verb agreement,” “plural by adding –s”) — then can parse or generate sentences they have never seen before. That’s generalisation born from efficient compression.
In AI terms, when a neural net collapses millions of cat images into a small set of feature detectors (“pointy ears,” “whisker texture,” “arched back”), it’s compressing. When it then classifies a brand new hand-drawn cartoon cat without difficulty, that’s efficient inference driving powerful generalisation — and that combo is at the core of what we call intelligence.
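To make the language-learner example concrete, here is a tiny Python sketch: a few compressed rules, plus a small exception table, handle nouns that were never ‘memorised’. The rule set and test words are invented for illustration and clearly don’t cover all of English:

```python
def pluralise(noun: str) -> str:
    """A handful of compact rules instead of a memorised word list."""
    irregular = {"child": "children", "mouse": "mice"}   # tiny exception table
    if noun in irregular:
        return irregular[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

# Nouns never seen before still come out right: generalisation from compact rules.
print([pluralise(w) for w in ["fox", "berry", "laptop", "brush", "mouse"]])
# -> ['foxes', 'berries', 'laptops', 'brushes', 'mice']
```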
Brain as a Compressor: The Hippocampus at Work
Our hippocampus — which is targeted in IQ Mindware training programs — isn’t just a memory bank; it behaves like a biological compressor, boiling down complex experiences into bite-sized, generalisable codes.
Relational Binding & Schema Extraction
Whenever you meet new people at a party, your hippocampus spots common links — who talked about work, who shared a joke — and compresses those interactions into a simple “friend-of-friend” map. This relational schema lets you later infer that if Annalisa knows Demelza and Demelza knows Evaline, Annalisa might also hit it off with Evaline — without recalling every single chat word for word.
Cognitive Mapping Beyond Space
Originally famous for helping rats navigate mazes, the hippocampus actually builds “maps” of abstract concepts too. Modern models show it weaving together routes in physical space and routes through ideas — so whether you’re finding your way home or drawing analogies between recipes and training-course templates, the same compression engine is at work.
Generative Replay & Bits-Back Compression
During sleep, the brain replays only the most informative fragments — like a highlights reel — factoring out predictable bits (“I always wake at sunrise”) and reinforcing surprising or useful elements (“that shortcut through the park”). This selective replay acts like a “bits-back” compressor from computer science, tightening your internal model for tomorrow’s challenges.
Cross-Modal Chunking
Ever notice how you grasp spoken sentences as neat phrases rather than endless streams of sound? Neural circuits quietly chunk speech into words and ideas, just as they segment a melody into motifs. By folding continuous input into meaningful units, your brain compresses raw data into ready-to-use building blocks for thought.
Compression and Self-Efficacy: A Surprising Connection
It turns out that squeezing your knowledge down to its purest form doesn’t just power smarter decisions — it also fuels confidence, or self-efficacy. Here’s how:
Compression Builds Mastery
When you compress a skill — say, mental arithmetic using one of my strategy exercises — into a handful of clear rules, you stop fumbling through dozens of special-case calculations. You have distilled the domain into a neat ‘recipe,’ and every time you apply it successfully, you gain confidence that your mental shortcut really works.
Meta-Control Monitors Success
In Trident G, your brain keeps an eye on two key signals:
η (eta): How much ‘bit-saving’ (compression) your schemas achieve — the more you reuse compact representations, the higher η rises.
χ (chi): A built-in uncertainty gauge, based on how in-sync your brain’s inference loops are in a complex problem. Low χ means your predictions feel solid; high χ flags that you’re less sure.
When η is high and χ is low, you get a confidence boost — your brain flips into ‘exploit’ D1 dopamine mode, trusting its streamlined rules and reaching for tougher problems. If χ creeps up (signalling rising uncertainty) and η dips (meaning your schemas are no longer saving as many bits and are too complex), you automatically shift into ‘explore’ D2 dopamine mode, hunting for new patterns or tweaking your mental code until compression efficiency is restored.
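As a rough illustration of that gating logic (not IQ Mindware’s actual implementation), here is a toy Python sketch; the threshold values and function name are assumptions made up for this example:

```python
def choose_mode(eta: float, chi: float,
                eta_floor: float = 0.6, chi_ceiling: float = 0.3) -> str:
    """Toy exploit/explore gate. eta = compression efficiency (0-1),
    chi = uncertainty (0-1). Thresholds are illustrative only."""
    if eta >= eta_floor and chi <= chi_ceiling:
        return "exploit"   # trust the compact schema and reach for harder problems
    return "explore"       # schemas are bloating or predictions feel shaky: hunt for new patterns

print(choose_mode(eta=0.8, chi=0.1))  # -> exploit
print(choose_mode(eta=0.4, chi=0.5))  # -> explore
```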
A Positive Feedback Loop
Success reinforces compression: every time your streamlined model predicts correctly, you log another “win,” strengthening both your internal code and your belief in it. Conversely, when something doesn’t fit, you are nudged into exploration — updating or expanding your compression until you master the new pattern.
By linking compression with self-efficacy, Trident G shows how mastering simplicity isn’t just about smarter thinking — it’s about believing you can think smarter, which itself improves IQ.
Part 2: Training
Below are six hands-on exercises to help you compress your thinking, build deep generalisation, and dial up your self-efficacy and confidence.
1. Overlap Detection: Interleaved Concept Drills
What to do:
Pick two related skill domains (e.g. arithmetic and algebra, or synonyms and antonyms).
Create a mixed set of short problems (e.g. 5 arithmetic computations, 5 algebraic equations) and work through them in random order.
After each pair, pause and write down the one principle or pattern that linked today’s two problems — for instance, “both needed me to factor out a common term.”
Why it works: By forcing your hippocampus to hunt for shared structure on every trial, you compress a broad set of examples into a single, reusable insight.
2. Compression Practice: Representational Restructuring
What to do:
Take a complex concept you’re learning (say, a historical timeline or biology cycle).
Sketch it in as few symbols or diagrams as possible — no full sentences allowed!
Iterate until you’ve trimmed redundant arrows, labels, or steps without losing the core causal chain.
Why it works: Turning a verbose story into a minimalist map trains you to spot the essence of any domain and raise your η (compression efficiency) score.
3. Gating Challenge: Uncertainty Sampling Game
What to do:
While solving a logic puzzle or chess tactic, periodically rate your confidence (high / medium / low).
If confidence is high, push yourself to solve the next problem in record time (exploit mode).
If confidence dips, force yourself to explore a novel strategy — read a hint, change your angle, or try a different representation (explore mode).
Why it works: Practicing on-the-spot switches between ‘exploit’ and ‘explore’ mirrors D1/D2 dopamine gating and keeps your χ (uncertainty) signal finely tuned.
4. Generative Replay: Highlight Journaling
What to do:
At the end of each study session, write a one-paragraph ‘gist’ or ‘highlight’ summary: list only the two or three things that surprised or challenged you most.
The next day, try to recreate that summary without reading your notes. Then compare and refine.
Why it works: Selectively replaying only the attention-catching bits — rather than every detail — acts like a bits-back compressor, solidifying compact, high-value mental models.
5. Map + Inference
What to do:
Pick two work artifacts you are familiar with. For example:
A rough wireframe vs. a finished website
An informal Slack message vs. a formal client proposal
Identify the key shifts between them along 2–3 dimensions. For the wireframe → website example, you might pick:
Detail level: low → high
Audience: internal team → end users
Approval stage: exploratory → production-ready
Compute the “difference” in your head—ask yourself, “What exactly changes on each of those axes when we go from A to B?”
For each axis, turn those descriptions into an instruction:
“Increase visual fidelity: replace grey boxes with full mockups”
“Switch perspective: write for non-technical users instead of developers”
“Elevate formality: convert rough notes into edited, approved prose”
Apply that same transformation to a brand-new pair. For instance, take an informal Slack summary of yesterday’s meeting and ask, “How would I level up those same three dimensions to turn it into a formal client newsletter?” Reason like this:
Visual fidelity → “format the Slack list into a branded newsletter template with headings, images, and callouts.”
Perspective → “swap internal shorthand for customer-friendly language, explaining acronyms and context.”
Formality → “polish tone: correct grammar, remove emojis, add an executive summary.”
Why it works:
By isolating the core shifts, you practice pulling out the ‘recipe’ that moves you from one format to another (see the toy sketch below). This hones your ability to spot relational patterns in your daily work and to project them onto new, unfamiliar tasks — strengthening both your compression and inference muscles in one go.
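For readers who like to see the move spelled out, here is a toy Python formalisation of ‘extract the difference, then reapply it’; the axes and values are simply the examples from this exercise, not a prescribed schema:

```python
# Step 1: describe the known before/after pair along a few shared axes.
wireframe = {"detail level": "low",  "audience": "internal team", "formality": "exploratory"}
website   = {"detail level": "high", "audience": "end users",     "formality": "production-ready"}

# Step 2: the "difference" is the per-axis shift from A to B.
transform = {axis: (wireframe[axis], website[axis]) for axis in wireframe}

def as_instructions(shifts: dict) -> list:
    """Turn each axis shift into a reusable instruction for a new pair."""
    return [f"shift {axis}: {before} -> {after}"
            for axis, (before, after) in shifts.items()]

# Step 3: reapply the same recipe to a new starting artifact,
# e.g. an informal Slack summary headed for a client newsletter.
for step in as_instructions(transform):
    print(step)
```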
6. Far-Transfer Test: Novel-Domain Application
What to do:
Take a schema you’ve mastered (e.g. the rule for adding fractions).
Invent a brand-new domain — say, mixing paint colors in proportions — and apply your fraction rule to solve a problem there.
Reflect: which steps transferred cleanly, and where did you need to tweak your recipe?
Why it works: Forcing a leap into an unfamiliar area tests whether your compressed schema truly captures the underlying pattern — and boosts confidence when it succeeds.
Integrating into your DNB training
After one DNB session with the non-categorical n-back, rotate through two or three of these drills. Note when your compression feels smooth (high η) and when uncertainty nudges you to explore (high χ). Over weeks, you’ll not only sharpen your mental compression engine but also build a robust sense of self-efficacy: you’ll know you can think smarter, where it counts.
Ready to go deeper?
If you found this free tutorial useful, you can achieve more with full access to our Pro toolkit — complete with AI-powered prompts and multi-hypothesis templates designed to supercharge your information compression skills.
👉 Upgrade to a paid subscription and get instant access to the AI-powered “Compression Catalyst” Substack deep-dive into this skill-set, and all future premium content.