Abstract: The massive parameter scale of sparsely activated Mixture-of-Experts (MoE) models necessitates distributed training with hybrid parallelism. Placing such training tasks, i.e., mapping the ...