Engineering for LLMs

Basic Knowledge of Accelerators

Parallelism and Scheduling

Data Parallelism

Model Parallelism

Pipeline Parallelism

Sequence Parallelism

Mixture of Experts

Saving GPU Memory and Speeding Up Training

Mixed Precision Training

Activation Checkpointing

ZeRO (Zero Redundancy Optimizer)

Optimizer States

Gradients

Model Weights

Collective Communications

All-reduce

All-gather
