CS336: Language Modeling from Scratch

511 points · 49 comments on HN

Stanford's CS336 course teaches language model development end-to-end, from tokenization through alignment, with five major assignments and public lecture videos.

CS336 guides students through building a language model from scratch across five assignments. Assignment 1 covers implementing a tokenizer, the Transformer architecture, and an optimizer. Assignment 2 addresses systems optimization, including FlashAttention2 in Triton and distributed training. Assignment 3 explores scaling laws. Assignment 4 processes Common Crawl data with filtering and deduplication. Assignment 5 applies supervised finetuning and reinforcement learning for reasoning and safety alignment.
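To give a flavor of the filtering and deduplication work in Assignment 4, here is a minimal sketch of exact-duplicate removal combined with a length filter. It is not the course's pipeline: the function name `filter_and_dedup`, the `min_words` threshold, and the normalization step are all assumptions chosen for illustration.

```python
import hashlib


def filter_and_dedup(docs, min_words=5):
    """Drop very short documents and exact duplicates (after normalization).

    Hypothetical helper for illustration; not taken from the course materials.
    """
    seen = set()
    kept = []
    for doc in docs:
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(doc.lower().split())
        if len(normalized.split()) < min_words:
            continue  # too short to be useful training text
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a document already kept
        seen.add(digest)
        kept.append(doc)
    return kept
```

Storing hashes rather than full text keeps memory bounded per document, which matters at crawl scale. Near-duplicate detection would need a different technique and is outside this sketch.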

The course requires proficiency in Python, PyTorch, linear algebra, and probability, along with prior ML/deep learning experience. It is a 5-unit, implementation-heavy course. All lectures and assignments are publicly available. For self-study, B200 GPU access costs between $4.99 and $7.49 per hour across multiple providers; Modal sponsors compute for enrolled students. The course staff detect honor code violations through code audits, automated testing, and monitoring of code deltas for suspicious patterns.
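For budgeting self-study, the quoted hourly range converts into a total cost by simple multiplication. The sketch below assumes a hypothetical budget of 20 GPU-hours; that figure is an illustration only, since the summary does not say how many hours the assignments need.

```python
# Hourly B200 rates quoted in the summary (USD per GPU-hour).
LOW_RATE = 4.99
HIGH_RATE = 7.49


def cost_range(gpu_hours, low=LOW_RATE, high=HIGH_RATE):
    """Return (min_cost, max_cost) in USD for a given number of GPU-hours."""
    return round(gpu_hours * low, 2), round(gpu_hours * high, 2)


# Hypothetical budget: 20 GPU-hours; the course does not state this figure.
low_cost, high_cost = cost_range(20)
print(f"20 GPU-hours: ${low_cost:.2f} to ${high_cost:.2f}")
```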

What the HN community is saying

The thread confirms video lectures are available on YouTube and discusses the course's practical value. Several commenters reported successfully completing assignments independently over months of work, describing the experience as challenging but rewarding, with significant time investment required even for experienced practitioners.

Regarding GPU costs, commenters note that early assignments do not require expensive hardware; students can debug on a CPU or on cheaper hardware such as Apple M-series chips or an RTX 4060 Ti before scaling up. The course staff clarified that compute is provided for enrolled Stanford students and that assignments were designed to run on smaller hardware. One limitation noted: the harness assumes Linux with an NVIDIA GPU, and commenters felt that Windows WSL2 setups and environment configuration could be better documented. Several commenters recommended CS224N as adequate ML preparation and linked follow-up courses in systems and diffusion models.
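The "debug locally, then scale up" advice usually comes down to selecting a device at runtime. The following is a minimal sketch, assuming PyTorch is the framework in use; `pick_device` is a hypothetical helper, not part of the course harness, and it falls back to "cpu" if PyTorch is not installed so that it stays runnable anywhere.

```python
def pick_device():
    """Return the best available device name: "cuda", "mps", or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; nothing faster to select
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU, e.g. a consumer card or a rented one
    # The MPS backend (Apple silicon) only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Running on: {device}")
```

Code written against a device string like this can be exercised on a laptop with small inputs and then rerun unchanged on a rented GPU.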