⚡ Core Resource Scheduler & Optimizer

OOPPG Scheduler

Adaptive computational resource optimization and multi-agent task scheduling powered by reinforcement learning (LinUCB / PPO).

⚙️ Core Scheduling Mechanisms

📊

Contextual Bandits (LinUCB)

Contextual multi-armed bandits learn from real-time device state and network latency features, balancing exploration against exploitation to pick the best compute device for each task.
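As an illustration of the idea, here is a minimal, dependency-free sketch of disjoint LinUCB (one linear model per device). The class name `SketchLinUCB` and its internals are assumptions made for this example and are not taken from the OOPPG code base; only the `select_action` / `update` method names mirror the API example on this page.

```python
import math


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class SketchLinUCB:
    """Hypothetical disjoint LinUCB: one ridge-regression model per device."""

    def __init__(self, dimension, num_actions, alpha=0.25):
        self.alpha = alpha
        # Per device: inverse design matrix A^-1 (starts as identity)
        # and b = sum of reward * context over past pulls.
        self.A_inv = [[[float(i == j) for j in range(dimension)]
                       for i in range(dimension)]
                      for _ in range(num_actions)]
        self.b = [[0.0] * dimension for _ in range(num_actions)]

    def scores(self, x):
        out = []
        for A_inv, b in zip(self.A_inv, self.b):
            A_inv_x = _matvec(A_inv, x)
            mean = _dot(b, A_inv_x)  # theta^T x, with theta = A^-1 b
            bonus = self.alpha * math.sqrt(max(_dot(x, A_inv_x), 0.0))
            out.append(mean + bonus)  # upper confidence bound
        return out

    def select_action(self, x):
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)  # first index on ties

    def update(self, action, x, reward):
        A_inv = self.A_inv[action]
        u = _matvec(A_inv, x)
        denom = 1.0 + _dot(x, u)
        # Sherman-Morrison rank-one update of (A + x x^T)^-1.
        for i in range(len(u)):
            for j in range(len(u)):
                A_inv[i][j] -= u[i] * u[j] / denom
        b = self.b[action]
        for i, xi in enumerate(x):
            b[i] += reward * xi
```

The exploration factor `alpha` scales the uncertainty bonus: larger values make the scheduler try under-sampled devices more often, while smaller values favor devices that already look good.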

⏱️

Dynamic Cost Function

A multi-dimensional reward function combines compute latency, energy consumption, queue backlog, and safety boundaries, so that scheduling degrades gracefully when resources are scarce.
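One way such a reward could be composed is sketched below. The weights, budgets, and the linear safety ramp are all hypothetical values picked for illustration; the actual OOPPG cost function is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    # Hypothetical weights; they sum to 1 so the base reward stays in [0, 1].
    latency: float = 0.4
    energy: float = 0.3
    backlog: float = 0.3
    safety_penalty: float = 1.0  # maximum penalty at or past the boundary
    soft_zone: float = 0.2       # margin below which the penalty ramps up


def _fraction(value, budget):
    """Scale value by its budget and clip the result to [0, 1]."""
    return min(max(value / budget, 0.0), 1.0)


def scheduling_reward(latency_ms, energy_j, queue_len, safety_margin,
                      weights=CostWeights(), latency_budget_ms=100.0,
                      energy_budget_j=5.0, queue_capacity=64.0):
    """Return 1 minus the weighted cost, minus a soft safety penalty."""
    cost = (weights.latency * _fraction(latency_ms, latency_budget_ms)
            + weights.energy * _fraction(energy_j, energy_budget_j)
            + weights.backlog * _fraction(queue_len, queue_capacity))
    # Linear ramp: 0 when margin >= soft_zone, full penalty when margin <= 0.
    ramp = _fraction(weights.soft_zone - safety_margin, weights.soft_zone)
    return 1.0 - cost - weights.safety_penalty * ramp
```

Ramping the safety penalty in before the boundary is reached, rather than applying it as a hard cutoff, is one simple way to get the gradual degradation described above.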

🔗

UDS Unified Broker

Asynchronous, non-blocking communication over Unix Domain Sockets with the SONUV gateway and the AIRBQ transaction manager, targeting sub-second response latency.
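The general pattern can be sketched with Python's standard `asyncio` Unix-socket streams. The newline-delimited JSON protocol, the message fields, and the socket file name below are assumptions made for this example; the wire formats used by SONUV and AIRBQ are not documented here.

```python
import asyncio
import json
import os
import tempfile


async def handle_request(reader, writer):
    """Answer each newline-delimited JSON request (hypothetical protocol)."""
    while line := await reader.readline():
        request = json.loads(line)
        reply = {"task_id": request["task_id"], "status": "accepted"}
        writer.write(json.dumps(reply).encode() + b"\n")
        await writer.drain()
    writer.close()  # peer closed its side; release the connection


async def send_request(socket_path, payload):
    """Open a connection, send one request, and return the decoded reply."""
    reader, writer = await asyncio.open_unix_connection(socket_path)
    writer.write(json.dumps(payload).encode() + b"\n")
    await writer.drain()
    reply = json.loads(await reader.readline())
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    # Unix-only: binds a socket file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        socket_path = os.path.join(tmp, "broker.sock")
        server = await asyncio.start_unix_server(handle_request, path=socket_path)
        async with server:
            return await send_request(socket_path, {"task_id": 7, "device": 3})
```

Calling `asyncio.run(demo())` performs one round trip and returns the broker's reply. Because every read and write is awaited, a single event loop can serve many such connections without blocking.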

RL Convergence & Optimization Dashboard

Adjust the control parameters and start the simulation to watch the reinforcement learning policy converge within dozens of iterations.

Simulation Parameters

Task Load Density: 50 tasks/s
Exploration Factor (α): 0.25
Device Count (K): 8 devices
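A simulation loop driven by these three parameters might look like the sketch below. The queueing environment, the reward formula, the run length, and the `RandomScheduler` baseline are all hypothetical. The loop works with any object that exposes `select_action` and `update` as in the Scheduler API Example, so a learning scheduler could replace the baseline.

```python
import random
from dataclasses import dataclass


@dataclass
class SimulationParams:
    task_rate: int = 50    # Task Load Density, tasks/s
    alpha: float = 0.25    # Exploration Factor, forwarded to the scheduler
    num_devices: int = 8   # Device Count (K)
    seconds: int = 10      # hypothetical run length
    seed: int = 0


class RandomScheduler:
    """Baseline with the select_action/update interface; it learns nothing."""

    def __init__(self, num_actions, alpha, rng):
        self.num_actions, self.rng = num_actions, rng

    def select_action(self, context):
        return self.rng.randrange(self.num_actions)

    def update(self, action, context, reward):
        pass


def run_simulation(params, make_scheduler=RandomScheduler):
    rng = random.Random(params.seed)
    # Hypothetical environment: each device drains its queue at a hidden rate.
    speeds = [rng.uniform(5.0, 15.0) for _ in range(params.num_devices)]
    queues = [0.0] * params.num_devices
    scheduler = make_scheduler(params.num_devices, params.alpha, rng)
    dt = 1.0 / params.task_rate  # time between task arrivals
    rewards = []
    for _ in range(params.task_rate * params.seconds):
        queues = [max(q - s * dt, 0.0) for q, s in zip(queues, speeds)]
        context = [q / params.task_rate for q in queues]  # normalized backlog
        action = scheduler.select_action(context)
        queues[action] += 1.0
        wait = queues[action] / speeds[action]  # seconds until the task is done
        reward = 1.0 / (1.0 + wait)             # in (0, 1), higher is better
        scheduler.update(action, context, reward)
        rewards.append(reward)
    return rewards


def windowed_means(rewards, window):
    """Mean reward per consecutive window, e.g. for plotting a reward curve."""
    chunks = [rewards[i:i + window] for i in range(0, len(rewards), window)]
    return [sum(c) / len(c) for c in chunks]
```

Plotting `windowed_means(run_simulation(SimulationParams()), 50)` gives a flat reference curve for the random baseline. How quickly a learning scheduler rises above it depends on the exploration factor and on how the reward is scaled.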

Training Reward Convergence

The simulation charts training reward convergence and reports three metrics: Current Status, Optimization Gain, and Best Action Reward.

💻 Scheduler API Example

# OOPPG scheduler core call example:
from ooppg.optimizer import LinUCBScheduler

# Initialize bandit scheduler (context dimension=12, devices=8)
scheduler = LinUCBScheduler(dimension=12, num_actions=8, alpha=0.25)

# Placeholder context (assumed: a length-12 numeric feature vector built
# from device state and network latency)
context_vector = [0.0] * 12

# Pass context features, get the selected allocation action:
action = scheduler.select_action(context_vector)

# After execution, feed back the reward signal for online learning:
scheduler.update(action, context_vector, reward=0.95)