Deep ML Safety Research W41

Theory of Law, OOD and too few researchers W41
✉️ Read the ML Safety Newsletter:
📈 Help Center for AI Safety come up with new benchmarks:
💡 Help Redwood Research find interesting heuristics:
👩‍🔬 Join the hackathon:
🙋‍♀️ Register as a local organizer:
🤓 Read the Alignment 201 curriculum:

[Corrections] "Marcus" in the episode should be "Marius". AGI Safety Fundamentals is not currently taking applicants, but you can read the Alignment 201 curriculum here:

00:00 Intro
00:19 Law defines alignment principles
00:40 Out-of-distribution alignment
01:27 Reward hacking defined
02:38 Inductive biases in learning algorithms
03:34 Warning shots are not enough
04:04 State of AI safety
04:45 Announcements

Legal informatics for AI safety, robust specification
Out-of-distribution GAN examples
Formal definition of ‘reward hacking’
DeepMind: Why correct goals are not enough
QAPR 4, inductive biases of learning processes
QAPR 3: Training NNs from interpretability priors
Neural Tangent Kernel distillation
Original paper:
Gaussian processes
Soares’ critique of warning shots
~300 people in AI safety (AIS)
Statistics of machine learning
Chatting AI safety with 100+ researchers
Smaller news
Safety benchmarks prize
Finding heuristics of GPT-2 small
Alignment 201 curriculum: