Engineering for LLMs
Basic Knowledge of Accelerators
Parallelism and Scheduling
Data Parallelism
Model Parallelism
Pipeline Parallelism
Sequence Parallelism
Mixture of Experts
Saving GPU Memory and Speeding Up Training
Mixed Precision Training
Activation Checkpointing
ZeRO (Zero Redundancy Optimizer)
Optimizer States
Gradients
Model Weights
Collective Communications
All-reduce
All-gather