Chuck Darwin<p>“We’ve achieved peak data and there’ll be no more.”</p><p>OpenAI’s cofounder and former chief scientist, <br /><a href="https://c.im/tags/Ilya" class="mention hashtag" rel="tag">#<span>Ilya</span></a> <a href="https://c.im/tags/Sutskever" class="mention hashtag" rel="tag">#<span>Sutskever</span></a>, made headlines earlier this year after he left to start his own AI lab called <br />Safe Superintelligence Inc. </p><p>He has avoided the limelight since his departure but made a rare public appearance in Vancouver on Friday at the <br />Conference on Neural Information Processing Systems (NeurIPS).</p><p>“Pre-training as we know it will unquestionably end,” Sutskever said onstage. </p><p>This refers to the first phase of AI model development, <br />when a large language model learns patterns from vast amounts of unlabeled data <br />— typically text from the internet, books, and other sources. </p><p>During his NeurIPS talk, Sutskever said that, <br />while he believes existing data can still take AI development farther, <br />the industry is tapping out on new data to train on. </p><p>This dynamic will, he said, eventually force a shift away from the way models are trained today. </p><p>He compared the situation to fossil fuels: <br />just as oil is a finite resource, <br />the internet contains a finite amount of human-generated content.</p><p>“We’ve achieved peak data and there’ll be no more,” according to Sutskever. </p><p>“We have to deal with the data that we have. There’s only one internet.”</p><p>Next-generation models, he predicted, are going to “be agentic in real ways.” </p><p>Agents have become a real buzzword in the AI field. </p><p>While Sutskever didn’t define them during his talk, they are commonly understood to be autonomous AI systems that perform tasks, make decisions, <br />and interact with software on their own.</p><p>Along with being “agentic,” he said future systems will also be able to reason. 
</p><p>Unlike today’s AI, which mostly pattern-matches based on what a model has seen before, <br />future AI systems will be able to work things out step-by-step in a way that is more comparable to thinking.</p><p>The more a system reasons, “the more unpredictable it becomes,” according to Sutskever. </p><p>He compared the unpredictability of “truly reasoning systems” to how advanced AIs that play chess “are unpredictable to the best human chess players.”</p><p>“They will understand things from limited data,” he said. </p><p>“They will not get confused.”</p><p>On stage, he drew a comparison between the scaling of AI systems and evolutionary biology, <br />citing research that shows the relationship between brain and body mass across species. </p><p>He noted that while most mammals follow one scaling pattern, hominids (human ancestors) show a distinctly different slope in their brain-to-body mass ratio on logarithmic scales.</p><p>He suggested that, just as evolution found a new scaling pattern for hominid brains, <br />AI might similarly discover new approaches to scaling beyond how pre-training works today.<br /><a href="https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training" target="_blank" rel="nofollow noopener noreferrer" translate="no"><span class="invisible">https://www.</span><span class="ellipsis">theverge.com/2024/12/13/243208</span><span class="invisible">11/what-ilya-sutskever-sees-openai-model-data-training</span></a></p>