PRIME-RL is a framework for large-scale asynchronous reinforcement learning. It is designed to be easy to use and hackable, yet capable of scaling to 1000+ GPUs. Beyond that, here is why we think you ...
Policy (Consumer): Replicas of training instances

Rollout (Producer): Replicas of generation engines

Low-precision training (FP8) and rollout (FP8 & FP4) support

This project will download and install ...
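The producer/consumer split between rollout and policy replicas can be illustrated with a small, self-contained sketch. This is not PRIME-RL's API: `rollout_worker`, `policy_worker`, and `run` are hypothetical names, threads stand in for replicas, and a placeholder reward stands in for real generation and training. It only shows the shape of the pattern, in which producers push samples into a shared queue without waiting for the consumer to finish a step.

```python
import queue
import threading


def rollout_worker(worker_id, num_samples, out_queue):
    """Producer: stands in for one generation engine replica (hypothetical)."""
    for i in range(num_samples):
        # A real rollout would sample from the current policy;
        # here we emit a placeholder record with a dummy reward.
        out_queue.put({"worker": worker_id, "sample": i, "reward": float(i % 2)})


def policy_worker(in_queue, batch_size, num_batches):
    """Consumer: stands in for one training replica (hypothetical).

    Pulls fixed-size batches from the queue and returns the mean reward
    of each batch in place of an actual optimizer step.
    """
    batch_means = []
    for _ in range(num_batches):
        batch = [in_queue.get() for _ in range(batch_size)]
        batch_means.append(sum(r["reward"] for r in batch) / batch_size)
    return batch_means


def run(num_producers=2, samples_per_producer=8, batch_size=4):
    """Start the producers, consume their samples in batches, return batch means."""
    q = queue.Queue(maxsize=16)  # bounded queue applies back-pressure to producers
    producers = [
        threading.Thread(target=rollout_worker, args=(w, samples_per_producer, q))
        for w in range(num_producers)
    ]
    for t in producers:
        t.start()
    num_batches = (num_producers * samples_per_producer) // batch_size
    means = policy_worker(q, batch_size, num_batches)
    for t in producers:
        t.join()
    return means


if __name__ == "__main__":
    print(run())
```

The bounded queue is a design choice in this sketch: it keeps producers from running arbitrarily far ahead of the consumer, which is one simple way to limit how stale the samples in a batch can become.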