Trainer Interface
Last updated: 06/08/2025 (API docstrings are auto-generated).
Trainers drive the training loop. Introducing new trainer classes for new training paradigms is encouraged.
Core APIs
- verl.trainer.ppo.reward.compute_reward(data: DataProto, reward_fn: AbstractRewardManager) → tuple[Tensor, dict[str, Any]]
Compute reward for a batch of data.
- Parameters:
data – DataProto object containing the input data.
reward_fn – Reward function to compute the reward.
- Returns:
Tuple of reward tensor and extra info dictionary.
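The documented contract is that compute_reward hands a batch to a reward manager and returns a pair: a reward tensor and a dictionary of extra information. The sketch below illustrates that contract without depending on verl. Everything in it is hypothetical: ToyBatch stands in for DataProto, nested Python lists stand in for a Tensor, and ExactMatchRewardManager is a toy manager that places each sample's score on its last valid response token. It also assumes the manager accepts a return_dict flag and falls back to a plain call when it does not. Treat it as an illustration of the return shape, not as verl's implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyBatch:
    """Hypothetical stand-in for DataProto: one entry per sample."""
    responses: list[str]
    ground_truths: list[str]
    response_lengths: list[int]  # valid response tokens per sample (assumed >= 1)


class ExactMatchRewardManager:
    """Hypothetical reward manager: score 1.0 for an exact match, else 0.0."""

    def __call__(self, data: ToyBatch, return_dict: bool = False) -> Any:
        max_len = max(data.response_lengths)
        # Token-level rewards as nested lists, zero everywhere by default.
        reward_tensor = [[0.0] * max_len for _ in data.responses]
        scores = []
        for i, (resp, gt, n) in enumerate(
            zip(data.responses, data.ground_truths, data.response_lengths)
        ):
            score = 1.0 if resp.strip() == gt.strip() else 0.0
            # The sequence-level score is placed on the last valid response token.
            reward_tensor[i][n - 1] = score
            scores.append(score)
        if return_dict:
            return {"reward_tensor": reward_tensor, "reward_extra_info": {"acc": scores}}
        return reward_tensor


def compute_reward_sketch(data: Any, reward_fn: Any) -> tuple[Any, dict[str, Any]]:
    """Return (reward tensor, extra-info dict), following the documented contract."""
    try:
        result = reward_fn(data, return_dict=True)
        return result["reward_tensor"], result.get("reward_extra_info", {})
    except TypeError:
        # Assumed fallback for managers that do not support return_dict.
        return reward_fn(data), {}


batch = ToyBatch(responses=["4", "5"], ground_truths=["4", "4"], response_lengths=[2, 3])
rewards, extra = compute_reward_sketch(batch, ExactMatchRewardManager())
# rewards == [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]], extra == {"acc": [1.0, 0.0]}
```

Catching TypeError keeps the sketch short, but it can also hide genuine errors raised inside a manager; a more careful version would inspect the manager's signature instead.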
- verl.trainer.ppo.reward.load_reward_manager(config: DictConfig, tokenizer: Any, num_examine: int, **reward_kwargs: Any) → AbstractRewardManager
Load and initialize a reward manager based on the configuration.
- Parameters:
config – PPO trainer configuration object containing reward_model fields.
tokenizer – Tokenizer object used for processing text.
num_examine – Number of samples to examine.
**reward_kwargs – Additional keyword arguments for the reward manager.
- Returns:
An instance of the specified reward manager class.
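load_reward_manager selects a reward manager class from the configuration and instantiates it with the tokenizer, num_examine, and any extra keyword arguments. A minimal registry-based sketch of that pattern follows. It assumes the manager name is stored under reward_model.reward_manager with "naive" as the default, and it uses a plain dict in place of DictConfig. REWARD_MANAGER_REGISTRY, register, ToyNaiveRewardManager, load_reward_manager_sketch, and the max_score option are hypothetical names, not verl APIs.

```python
from typing import Any, Callable

# Hypothetical registry mapping config names to reward-manager classes.
REWARD_MANAGER_REGISTRY: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a reward manager under `name`."""
    def decorator(cls: type) -> type:
        REWARD_MANAGER_REGISTRY[name] = cls
        return cls
    return decorator


@register("naive")
class ToyNaiveRewardManager:
    """Hypothetical manager that just stores its construction arguments."""

    def __init__(self, tokenizer: Any, num_examine: int, **kwargs: Any) -> None:
        self.tokenizer = tokenizer
        self.num_examine = num_examine  # number of samples to examine
        self.kwargs = kwargs  # extra manager-specific options


def load_reward_manager_sketch(
    config: dict, tokenizer: Any, num_examine: int, **reward_kwargs: Any
) -> Any:
    # Assumed config layout: config["reward_model"]["reward_manager"], default "naive".
    name = config.get("reward_model", {}).get("reward_manager", "naive")
    if name not in REWARD_MANAGER_REGISTRY:
        raise ValueError(f"Unknown reward manager: {name!r}")
    cls = REWARD_MANAGER_REGISTRY[name]
    return cls(tokenizer=tokenizer, num_examine=num_examine, **reward_kwargs)


# Usage: `max_score` is a hypothetical extra option forwarded via **reward_kwargs.
manager = load_reward_manager_sketch(
    {"reward_model": {"reward_manager": "naive"}}, tokenizer=None, num_examine=1, max_score=1.0
)
print(type(manager).__name__, manager.num_examine, manager.kwargs)
```

A registry keeps the loader closed to modification: supporting a new manager only requires decorating its class, and an unknown name fails early with a clear error.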