Galvatron

GitHub License GitHub Release PyPI - Version Read the Docs Downloads visitors

Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs). It leverages advanced automatic parallelism techniques to deliver exceptional training efficiency. This repository houses the official implementation of Galvatron-2, our latest version enriched with several new features.

Galvatron GitHub: https://github.com/PKU-DAIR/Hetu-Galvatron

Supported Parallelism Strategies

Strategy

Type

Supported Variants

Data Parallelism (DP)

Basic

Traditional DP

Sharded DP (SDP)

Memory-Efficient

ZeRO-1, ZeRO-2, ZeRO-3

Pipeline (PP)

Model Split

GPipe, 1F1B-flush

Tensor (TP)

Model Split

Megatron-LM Style, flash-attn Style

Sequence (SP)

Data Split

Megatron-SP, Ulysses

Checkpointing (CKPT)

Memory-Efficient

Activation Checkpoint

Supported Models

Model Type

Architecture

Backend

LLMs

GPT

Huggingface, flash-attn

LLMs

LLaMA

Huggingface, flash-attn

LLMs

BERT

Huggingface

LLMs

T5

Huggingface

Vision Models

ViT

Huggingface

Vision Models

Swin

Huggingface