A practical guide to RL environments, using Pokemon Red as a case study. Covers the agent-environment loop, observation and action spaces, reward design, trajectories, graders, and the credit assignment problem, with real code examples and lessons from training an LLM-based gameplay agent.
A comprehensive guide to post-training techniques for large language models, covering supervised fine-tuning, RLHF, reward models, and practical implementation details. It walks through the path from a pre-trained base model to an instruction-tuned model, with hands-on examples and best practices.