How Much Do LLMs Really Memorize? A Technical Breakdown

Chapters
0:00 Introduction to Language Model Memorization
0:54 Experimental Details: Varying Training Dataset Size
4:46 Differentiating Memorization and Generalization with Synthetic Sequences
14:40 Using Arithmetic Coding to Approximate Data Complexity
15:20 Using Larger Reference Models for Complexity Estimation
26:10 Experimental Details: Model Architectures and Parameters
33:10 Results: Performance vs. Training Set Size
42:12 Kolmogorov Complexity and Shannon Entropy
44:59 Quantifying Unintended Memorization
Transcript