From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
Type: academic-paper
Author: Marc Finzi, Shikai Qiu, Yiding Jiang, Pavel Izmailov, J. Zico Kolter, Andrew Gordon Wilson Source: https://arxiv.org/html/2601.03220v1 Date: 2026-01
Institutions: Carnegie Mellon University, New York University
Overview
This paper introduces epiplexity (epistemic complexity), a new information measure designed to quantify structural information extractable by computationally constrained observers. The work addresses fundamental gaps between classical information theory and modern machine learning practice.
Core Problem
Existing frameworks (Shannon entropy, Kolmogorov complexity) fail to explain: - How synthetic data improves model capabilities despite deterministic generation - Why data ordering affects learning outcomes - How models learn more structure than present in data-generating processes - How AlphaZero extracts strategy from simple game rules alone
Key Contributions
1. Three Apparent Paradoxes
The authors identify tensions between theory and practice:
- Paradox 1: Deterministic transformations cannot increase information (yet AlphaZero, synthetic data, and emergent phenomena contradict this)
- Paradox 2: Information is factorization-independent (yet left-to-right text ordering outperforms reverse)
- Paradox 3: Likelihood modeling merely matches distributions (yet observers extract structures beyond generating processes)
2. Epiplexity Definition
Epiplexity separates random from structural information by finding the program minimizing time-bounded MDL (Minimum Description Length):
"Epiplexity captures the structural information present to a computationally bounded observer"
Key distinctions: - Time-bounded entropy (H_T): Unpredictable random content (like CSPRNG outputs) - Epiplexity (S_T): Learnable structural patterns visible within computational limits
3. Measurement Approaches
Prequential Coding: Estimates model description length as area under loss curves above final loss — simple but heuristic.
Requential Coding: Uses cumulative KL divergence between teacher and student models — rigorous but computationally expensive (2-10x slower).
Both approaches rank datasets consistently, though absolute values differ.
Theoretical Results
Theorem 9: CSPRNGs exhibit maximum time-bounded entropy but negligible epiplexity — matching intuition that pseudorandom sequences are structurally empty despite appearing random.
Theorem 10: Under one-way function assumptions, high-epiplexity random variables exist (scaling logarithmically with dimension).
Applications
The framework explains: - Why pre-training on text transfers better than images - How data ordering and interventions affect generalization - Why certain datasets support broader out-of-distribution performance - How emergent phenomena arise in cellular automata and dynamical systems
Key Insights
- Observer-dependent randomness: The same object may appear structured or random depending on computational budget
- Information creation through computation: Deterministic processes can extract exploitable structure when observers have bounded resources
- Separation of concerns: Epiplexity provides a task-independent measure of learnable content, distinct from downstream performance guarantees
Practical Implications
The work provides a theoretical foundation for data selection (complementing traditional model selection), suggesting principles for: - Choosing or generating training data - Curriculum design for language models - Synthetic data creation strategies - Dataset interventions improving generalization
Mathematical Framework
The formalization rests on: - Prefix-free Turing machines for unambiguous program encoding - Time-constructible bounds on computation - Two-part MDL codes balancing model complexity against prediction accuracy - Cryptographic primitives (one-way functions, CSPRNGs) motivating resource-bounded distinctions
Significance
This work bridges algorithmic information theory with practical machine learning, offering novel perspectives on why modern systems succeed where classical theory predicts failure. It establishes computational boundedness as fundamental to understanding information in realistic learning scenarios.