Scaling Test Time Compute: How o3-Style Reasoning Works (+ Open Source Implementation)

243 likes, 5,673 views

Is scaling test time compute the path to AGI?

Resources:
HF Blog - huggingface.co/spaces/HuggingFaceH4/blogpost-scali…
Search & Learn - github.com/huggingface/search-and-learn/tree/main/…
SCoRe - arxiv.org/pdf/2409.12917
Scaling Laws - arxiv.org/pdf/2001.08361
Scaling Test Time Compute Optimally - arxiv.org/pdf/2408.03314
o1 Blog - openai.com/index/learning-to-reason-with-llms/
o3 Blog - arcprize.org/blog/oai-o3-pub-breakthrough

A great resource that compiles many o1 papers, documents, videos, and more: github.com/hijkzzz/Awesome-LLM-Strawberry

Chapters:
00:00 - Introduction
01:17 - Scaling Pre Training Background
06:05 - The Idea Behind Scaling Test Time Compute
08:26 - Training Reasoning Models
13:23 - Open Source: Search & Verification Background
16:30 - Open Source: Verification Reward Models
18:39 - Open Source: Best-of-N
20:25 - Open Source: Beam Search
22:48 - Open Source: Diverse Verifier Tree Search
23:50 - Optimally Scaling Test Time Compute
25:42 - Running Test Time Compute Experiments
26:47 - Results: Llama 3.2 1B Instruct
28:35 - Results: Llama 3.2 1B ORPO 40k
31:54 - Discussion

#ai #machinelearning #programming