This research proposal introduces AMCTS-OL, a novel framework that integrates online Bayesian learning into Monte Carlo Tree Search (MCTS) to address dynamic workload scheduling in non-stationary environments. The method employs Thompson Sampling-based adaptive Upper Confidence Bounds (UCB) and a workload shift detection module to enable rapid adaptation to changing conditions. We provide theoretical regret bounds under bounded workload variation and propose experimental validation on dynamic job shop scheduling and cloud resource allocation benchmarks, targeting 2-3x faster convergence than existing baselines (LiZero, PA-MCTS, and standard UCT).
Key findings
AMCTS-OL integrates online Bayesian learning directly into MCTS selection and expansion phases for continuous adaptation to changing workload characteristics.
Thompson Sampling-based adaptive UCB dynamically balances exploration and exploitation according to the uncertainty in both action values and environment parameters.
A lightweight shift detection mechanism monitors search statistics to identify non-stationarity without requiring external change-point detection or task boundary information.
Theoretical analysis establishes regret bounds for the algorithm under bounded variation in workload distributions.
The experimental protocol targets 2-3x faster convergence than LiZero, PA-MCTS, and standard UCT while keeping computational cost low enough for real-time decisions.
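To make the selection and shift-detection ideas above concrete, here is a minimal sketch at the level of a single tree node. It is not the proposal's implementation: the summary does not specify the posterior model or the detector, so this assumes a Gaussian posterior with known unit noise variance, uses plain Thompson Sampling in place of the adaptive UCB rule, and flags a shift by comparing a recent reward window with the long-run mean. All names (`ArmPosterior`, `ShiftDetector`, `select_child`, `simulate`) and all constants are hypothetical.

```python
import math
import random
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class ArmPosterior:
    """Gaussian posterior over one child's mean reward (unit noise variance assumed)."""
    mean: float = 0.0
    precision: float = 1e-3  # weak prior; an illustrative choice

    def update(self, reward: float) -> None:
        # Conjugate Gaussian update with unit observation precision.
        new_precision = self.precision + 1.0
        self.mean = (self.precision * self.mean + reward) / new_precision
        self.precision = new_precision

    def sample(self, rng: random.Random) -> float:
        return rng.gauss(self.mean, 1.0 / math.sqrt(self.precision))

    def inflate(self, factor: float) -> None:
        # Widen the posterior after a suspected shift (factor in (0, 1]).
        self.precision = max(self.precision * factor, 1e-3)


class ShiftDetector:
    """Flags a shift when the recent mean reward departs from the long-run mean."""

    def __init__(self, window: int = 20, threshold: float = 1.0) -> None:
        self.recent = deque(maxlen=window)
        self.total = 0.0
        self.count = 0
        self.threshold = threshold

    def observe(self, reward: float) -> bool:
        self.recent.append(reward)
        self.total += reward
        self.count += 1
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data in the window yet
        recent_mean = sum(self.recent) / len(self.recent)
        long_run_mean = self.total / self.count
        return abs(recent_mean - long_run_mean) > self.threshold

    def reset(self) -> None:
        self.recent.clear()
        self.total = 0.0
        self.count = 0


def select_child(posteriors: List[ArmPosterior], rng: random.Random) -> int:
    # Thompson Sampling: draw one value per child, pick the largest draw.
    samples = [p.sample(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(steps: int = 400, shift_at: int = 200, seed: int = 0) -> List[int]:
    """Two-action toy workload whose best action changes at `shift_at`."""
    rng = random.Random(seed)
    posteriors = [ArmPosterior(), ArmPosterior()]
    detector = ShiftDetector()
    choices = []
    for t in range(steps):
        means = (1.0, 0.0) if t < shift_at else (0.0, 3.0)
        arm = select_child(posteriors, rng)
        reward = rng.gauss(means[arm], 1.0)
        posteriors[arm].update(reward)
        if detector.observe(reward):
            for p in posteriors:
                p.inflate(0.1)  # forget part of the stale evidence
            detector.reset()
        choices.append(arm)
    return choices
```

Calling `simulate()` returns the sequence of chosen actions. The design choice illustrated here is that a detected shift does not discard the posteriors but only widens them, so exploration resumes without restarting from the prior.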
Limitations & open questions
Theoretical guarantees assume bounded variation in workload distributions, which may not hold in scenarios with abrupt, unbounded environmental changes.
Method relies on access to historical task distributions for knowledge transfer, which may be limited in cold-start or rapidly evolving deployment scenarios.
Experimental validation covers only job shop scheduling and cloud resource allocation, which limits how far the results can be generalized to other scheduling contexts.
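The bounded-variation assumption noted above can be written down in a standard way. The summary does not give the proposal's exact definitions, so the following is only one common formalization from the non-stationary bandit setting, with the symbols $\mu_t(a)$ (expected reward of action $a$ at step $t$), $a_t$ (action chosen at step $t$), $V_T$ (variation budget), and $R(T)$ (dynamic regret) introduced here as assumptions:

```latex
% Variation budget: total drift of the expected rewards over the horizon T
\sum_{t=1}^{T-1} \sup_{a} \bigl| \mu_{t+1}(a) - \mu_{t}(a) \bigr| \;\le\; V_T

% Dynamic regret: loss against the best action at every step
R(T) \;=\; \sum_{t=1}^{T} \Bigl( \max_{a} \mu_{t}(a) \;-\; \mathbb{E}\bigl[ \mu_{t}(a_t) \bigr] \Bigr)
```

Under this reading, a single abrupt change of size $\Delta$ adds $\Delta$ to the left-hand side of the first inequality. A guarantee stated in terms of $V_T$ therefore weakens as changes become larger or more frequent, which is the limitation described above.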