Open innovations, domain mastery, intellectual commons, etc.
Industrial Frontier
Academic Vanguard
- Physics of Language Models, 2025
- LLM Reasoning: Key Ideas and Limitations, 2024
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback, paper, 202309v2
- Scaling laws for single-agent reinforcement learning, paper, 202302v2
- Scaling Laws for Reward Model Overoptimization, paper, blog, 202210v1
- Embedded Agency, 2018
- Scalable agent alignment via reward modeling: a research direction, paper, 201811v1
Research Experiences
- John Schulman: An Opinionated Guide to ML Research, 2020
- Principles of Effective Research by Michael Nielsen, 2004
- How to do Research At the MIT AI Lab by David Chapman, pdf, ycom, quora, 1988
- You and Your Research by Richard Hamming, 1986
System Experiences
- Advanced Tricks for Training Large Language Models with Proximal Policy Optimization, blog, 202406
- The N Implementation Details of RLHF with PPO, paper, hug-blog/icml-blog, 202403v1
- What’s Broken with RL Research and a Potential Fix, 2023
- The 37 Implementation Details of Proximal Policy Optimization, blog, 202203
- Reinforcement Learning Tips and Tricks, 2021
- Debugging RL, Without the Agonizing Pain, 2021
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO, paper, 202005v1
- A perspective on off-policy evaluation in reinforcement learning, 2019
- The Nuts and Bolts of Deep RL Research, December, slides-201612, youtube-201708
- Making Contextual Decisions with Low Technical Debt, 2016