Deep ML Safety Research W41

Theory of Law, OOD and too few researchers W41
✉️ Read the ML Safety Newsletter: https://ais.pub/mlsn6
📈 Help Center for AI Safety come up with new benchmarks: https://ais.pub/benchmarks
💡 Help Redwood Research find interesting heuristics: https://ais.pub/gptheuristics
👩‍🔬 Join the hackathon: https://ais.pub/interpretability
🙋‍♀️ Register as a local organizer: https://ais.pub/localorganizer
🤓 Read the Alignment 201 curriculum: https://ais.pub/alignment201

[Corrections] "Marcus" should be "Marius". AGI Safety Fundamentals is not currently taking applications, but you can read the Alignment 201 curriculum here:
https://ais.pub/alignment201

00:00 Intro
00:19 Law defines alignment principles
00:40 Out-of-distribution alignment
01:27 Reward hacking defined
02:38 Inductive biases in learning algorithms
03:34 Warning shots are not enough
04:04 State of AI safety
04:45 Announcements

Sources:
Legal informatics for AI safety, robust specification https://arxiv.org/abs/2209.13020
Out-of-distribution GAN examples https://arxiv.org/abs/2209.11960
Formal definition of ‘reward hacking’ https://arxiv.org/abs/2209.13085
DeepMind: Why correct specifications are not enough for correct goals https://arxiv.org/abs/2210.01790
QAPR 4: Inductive biases of learning processes https://www.alignmentforum.org/posts/...
QAPR 3: Training NNs from interpretability priors https://www.alignmentforum.org/s/5omS...
Neural Tangent Kernel distillation https://www.alignmentforum.org/posts/...
Original NTK paper: https://arxiv.org/pdf/1806.07572.pdf
Gaussian processes https://distill.pub/2019/visual-explo...
Soares’ critique of warning shots https://www.alignmentforum.org/posts/...
~300 people in AIS https://forum.effectivealtruism.org/p...
Statistics of machine learning https://financesonline.com/machine-le...
Chatting AI safety with 100+ researchers https://www.alignmentforum.org/posts/...
Smaller news:
MLSN https://www.alignmentforum.org/posts/...
Safety benchmarks prize https://benchmarking.mlsafety.org/
Finding heuristics of GPT-2 small https://www.lesswrong.com/posts/LkBmA...
Alignment 201 curriculum: https://www.agisafetyfundamentals.com...