Training multimodal large language models (MLLMs) presents unique challenges because both the models' operators and the underlying data-center hardware are heterogeneous. This paper proposes an adaptive resource scheduling framework that dynamically maps operators to pipeline stages based on real-time profiling of compute intensity, memory pressure, and communication patterns. The framework includes a workload-balancing algorithm that continuously monitors stage execution times and redistributes operators to minimize pipeline bubbles, yielding substantial throughput gains.
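The balancing loop described above (measure per-stage times, then migrate operators off the bottleneck stage) might be sketched as follows. This is an illustrative greedy variant, not the paper's actual algorithm; `Stage` and `rebalance` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    # operator name -> measured execution time (ms), from profiling
    ops: dict[str, float] = field(default_factory=dict)

    def time(self) -> float:
        return sum(self.ops.values())

def rebalance(stages: list[Stage]) -> None:
    """Greedily migrate operators from the slowest stage to the fastest
    while doing so strictly shrinks the pipeline bottleneck (max stage time)."""
    while True:
        src = max(stages, key=Stage.time)
        dst = min(stages, key=Stage.time)
        if not src.ops:
            return
        # Move the cheapest operator on the slow stage: the smallest
        # risk of overshooting the balance point.
        name, cost = min(src.ops.items(), key=lambda kv: kv[1])
        # Stop once no migration reduces the bottleneck.
        if dst.time() + cost >= src.time():
            return
        del src.ops[name]
        dst.ops[name] = cost

stages = [Stage({"a": 10.0, "b": 2.0, "c": 2.0}), Stage({"d": 4.0})]
rebalance(stages)  # bottleneck drops from 14 ms to 10 ms
```

Each migration strictly lowers the bottleneck stage's time, so the loop terminates; in practice a real scheduler would also weigh migration cost and re-profile after each move.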
Key findings
Adaptive resource scheduling framework proposed for efficient MLLM training in heterogeneous data centers.
Dynamic Operator Mapping algorithm assigns operators to pipeline stages based on real-time profiling.
Workload Balancing mechanism detects load imbalance and migrates operators to minimize pipeline bubbles.
Heterogeneity Awareness integrates GPU capability profiles into scheduling decisions.
Extensive experiments show up to 149.6% throughput improvement over Megatron-LM baselines.
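To illustrate the heterogeneity-awareness idea, a minimal sketch of capability-weighted stage assignment is shown below: each operator's measured cost is weighted by the hosting GPU's relative capability, so faster GPUs absorb proportionally more work. The function name and the proportional-split heuristic are assumptions for illustration, not the paper's scheduler.

```python
def heterogeneous_split(op_costs: list[float], gpu_speeds: list[float]) -> list[list[int]]:
    """Assign operators (kept in pipeline order) to GPUs so that each
    GPU's *normalized* load (cost / speed) is roughly equal.
    op_costs:   per-operator measured costs, in pipeline order.
    gpu_speeds: relative capability of each GPU (e.g., A100=1.0, V100=0.5).
    """
    total = sum(op_costs)
    speed_sum = sum(gpu_speeds)
    assignment: list[list[int]] = [[] for _ in gpu_speeds]
    gpu, load = 0, 0.0
    for i, cost in enumerate(op_costs):
        # Target share of total cost for this GPU, proportional to its speed.
        target = total * gpu_speeds[gpu] / speed_sum
        # Advance to the next GPU once this one's budget would be exceeded.
        if load + cost > target and load > 0 and gpu < len(gpu_speeds) - 1:
            gpu, load = gpu + 1, 0.0
        assignment[gpu].append(i)
        load += cost

    return assignment

# A GPU twice as fast receives twice the raw work (8 vs. 4 cost units),
# giving equal normalized loads of 4 on each device.
plan = heterogeneous_split([4.0, 4.0, 2.0, 2.0], [2.0, 1.0])  # -> [[0, 1], [2, 3]]
```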
Limitations & open questions
Further research is needed on the scheduler's scalability and on further optimization for larger clusters.