This seminar on various forms of models and modeling tactics builds on the premise of the 2022 exhibition “Model Behavior”: that architectural models, like scientific, economic, and political models, ...
Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models can tell when they're being tested, which complicates interpreting the results. New joint safety testing from ...