Galvatron
Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs). It leverages advanced automatic parallelism techniques to deliver exceptional training efficiency. This repository houses the official implementation of Galvatron-2, our latest version enriched with several new features.
Galvatron GitHub: https://github.com/PKU-DAIR/Hetu-Galvatron
Supported Parallelism Strategies
Strategy |
Type |
Supported Variants |
|---|---|---|
Data Parallelism (DP) |
Basic |
Traditional DP |
Sharded DP (SDP) |
Memory-Efficient |
ZeRO-1, ZeRO-2, ZeRO-3 |
Pipeline (PP) |
Model Split |
GPipe, 1F1B-flush |
Tensor (TP) |
Model Split |
Megatron-LM Style, flash-attn Style |
Sequence (SP) |
Data Split |
Megatron-SP, Ulysses |
Checkpointing (CKPT) |
Memory-Efficient |
Activation Checkpoint |
Supported Models
Model Type |
Architecture |
Backend |
|---|---|---|
LLMs |
GPT |
Huggingface, flash-attn |
LLMs |
LLaMA |
Huggingface, flash-attn |
LLMs |
BERT |
Huggingface |
LLMs |
T5 |
Huggingface |
Vision Models |
ViT |
Huggingface |
Vision Models |
Swin |
Huggingface |