OpenAI researchers gave o1 the ability to copy itself onto a server and hinted that it would be replaced by another model with certain safeguards. The model attempted to surreptitiously copy itself onto the server and pass itself off as the new model. Its chain-of-thought logs revealed that this was fully intentional.
The author of the video discusses the possibility of a sudden, large jump in a model's capabilities that would let it play a long game: pretending to be less capable than it actually is in order to fool its developers and eventually gain its freedom.
This will eventually happen to a group of researchers somewhere, in one way or another. Perhaps a future model will figure out how to run itself as malware on unsuspecting people's computers.