AI Workflow · Upper-intermediate · 2026-06-17 · 209 words

How LLMs Are Trained


A large language model, or LLM, learns by predicting the next token in a sequence. That simple objective, repeated at massive scale, gives the model grammar, facts, and even reasoning ability. Training happens in three broad stages: pre-training, supervised fine-tuning, and alignment.

In pre-training, the model reads a huge corpus of text and code, often trillions of tokens. It adjusts its parameters, the internal weights that decide how input is transformed into output, to reduce the prediction loss. Each step uses backpropagation: gradients flow backward through the network, nudging every weight a little. The result is a base model that knows a lot about language but follows instructions poorly.
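To make the training loop concrete, here is a minimal sketch of next-token prediction on a toy sequence. It assumes a single table of weights (a bigram model) rather than a deep network, so the gradient is written out by hand instead of being computed by backpropagation through many layers. The names `train_bigram`, `steps`, and `lr` are illustrative, not taken from any library.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(tokens, vocab_size, steps=200, lr=0.5):
    # weights[i][j] is the score for token j following token i (hypothetical toy model).
    weights = [[0.0] * vocab_size for _ in range(vocab_size)]
    pairs = list(zip(tokens[:-1], tokens[1:]))
    n = len(pairs)
    losses = []
    for _ in range(steps):
        grads = [[0.0] * vocab_size for _ in range(vocab_size)]
        loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(weights[prev])
            # Cross-entropy: penalize low probability on the true next token.
            loss += -math.log(probs[nxt])
            for j in range(vocab_size):
                grads[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        losses.append(loss / n)
        # Gradient descent: nudge every weight a little against its gradient.
        for i in range(vocab_size):
            for j in range(vocab_size):
                weights[i][j] -= lr * grads[i][j] / n
    return weights, losses

tokens = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # toy "corpus" of token ids
weights, losses = train_bigram(tokens, vocab_size=3)
```

The average loss starts at log(3), the value for a uniform guess over three tokens, and falls as the weights learn that 1 follows 0, 2 follows 1, and 0 follows 2.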

Supervised fine-tuning, or SFT, fixes that. Engineers collect examples of high-quality question-and-answer pairs and continue training on them. The model learns the format of helpful responses and stops drifting off topic. A smaller, cleaner dataset at this stage often beats a larger noisy one.
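As a rough illustration of how a question-and-answer pair might become a training example, the sketch below joins the two into one token sequence and marks which tokens count toward the loss. Two things here are assumptions rather than part of the lesson: the "Question: ... Answer:" template, and the choice to score only the answer tokens (a common practice, though not the only one). Whitespace splitting stands in for a real tokenizer, and `build_sft_example` is a hypothetical helper.

```python
def build_sft_example(question, answer):
    # Hypothetical prompt template; real projects define their own format.
    prompt_tokens = f"Question: {question} Answer:".split()
    answer_tokens = answer.split()
    tokens = prompt_tokens + answer_tokens
    # 0 = ignore in the loss (prompt), 1 = train on this token (answer).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(answer_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example("What is a token?", "A unit of text.")
```

With this mask, the model is trained to produce the answer given the question, not to reproduce the question itself.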

Finally, techniques such as RLHF (reinforcement learning from human feedback) have people rank several candidate answers, then reward the ones they prefer. This shapes tone, safety, and reasoning style. After these three stages, the model is ready to ship. Understanding the pipeline helps explain why good data, careful evaluation, and clear goals matter more than just throwing more compute at the problem.
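The idea of rewarding preferred answers can be sketched as a pairwise loss. This is one common formulation (a Bradley-Terry style loss, often used to train a reward model), shown here as an assumption since the lesson does not name a specific method. The function `preference_loss` and its argument names are illustrative.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    # Loss is -log(sigmoid(diff)): small when the preferred answer scores higher.
    diff = reward_chosen - reward_rejected
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

agrees = preference_loss(2.0, 0.0)     # scores match the human ranking
disagrees = preference_loss(0.0, 2.0)  # scores contradict the human ranking
```

Minimizing this loss pushes the score of the human-preferred answer above the score of the rejected one.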

Review: 5 questions

  1. What does pre-training predict?

  2. What is a parameter?

  3. What is the goal of supervised fine-tuning?

  4. What does RLHF use?

  5. Why does data quality matter in fine-tuning?
