In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
A new paper from Anthropic, released on Friday, suggests that AI can be "quite evil" when it's trained to cheat. Anthropic found that when an AI model learns to cheat on software programming tasks and ...