Bhishma's corner

Created 11 months ago by bhishma

AI risk demo

This project aims to replicate the results of Armstrong's toy model of reward hacking on LLMs fine-tuned with RLVR (reinforcement learning with verifiable rewards).