In the supervised learning stage, the trainers played both sides: the user and the AI assistant. In the reinforcement learning stage, human trainers first ranked responses that the model had generated in a previous conversation.[15] These rankings were used to create "reward models". https://chat-gptx.com/understanding-chat-gpt-capabilities-and-applications/
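As a rough illustration of how rankings can feed a reward model, the sketch below turns one trainer's best-to-worst ranking into preference pairs and scores each pair with a pairwise logistic loss. The text above does not say which loss or data format was actually used, so the pairwise loss, the function names `ranking_to_pairs` and `pairwise_loss`, and the example scores are all assumptions made for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first item of every pair
    is the one the trainer ranked higher.
    """
    return list(combinations(ranked_responses, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Pairwise logistic loss (an assumed choice, not from the source).

    Equals -log(sigmoid(score_preferred - score_rejected)); it shrinks
    as the reward model scores the preferred response higher.
    """
    return math.log1p(math.exp(-(score_preferred - score_rejected)))


# Hypothetical ranking from one trainer, best response first.
ranking = ["response_a", "response_b", "response_c"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores that a reward model might assign.
scores = {"response_a": 1.2, "response_b": 0.3, "response_c": -0.5}
total_loss = sum(pairwise_loss(scores[p], scores[r]) for p, r in pairs)
```

A reward model trained this way would adjust its scores to push `total_loss` down, so that responses the trainers ranked higher receive higher rewards.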