OpenAI, Shard Theory, and Left Turns W36

Welcome to this Safe AI Progress Report! It is the first in a regular series of videos summarizing what has happened in AI safety since the previous report. The videos accompany a separate progress report that focuses on measurable metrics for the field of AI safety.

📃 Script
00:00 - Intro
00:13 - OpenAI alignment strategy
00:53 - Strategy explanation
01:50 - John Wentworth's criticism
02:37 - Sam Bowman's NLP survey
03:17 - New perspectives | Simulators
03:43 - New perspectives | Shard theory
04:10 - The Small Side
04:36 - Outro | Learn more

šŸ› Chronological references:
- OpenAI dangerous: https://techcrunch.com/2019/02/17/ope...
- OpenAI = dangerous: https://www.lesswrong.com/posts/Nqn2t...
- OpenAI alignment: https://openai.com/blog/our-approach-...
- Jacob Hilton: https://www.lesswrong.com/posts/3S4ny...
- RLHF: https://arxiv.org/abs/2009.01325
- IDA: https://forum.effectivealtruism.org/p...
- Alignment research tools: https://www.lesswrong.com/posts/ebYio...
- Elicit: https://elicit.org/search?q=AI+safety...
- Eleuther network: https://arxiv.org/pdf/2206.02841.pdf
- Not everyone happy: https://www.lesswrong.com/posts/3S4ny...
- Grokking: https://arxiv.org/pdf/2201.02177.pdf
- Deception: https://www.lesswrong.com/posts/zthDP...
- RLHF is terrible: https://www.lesswrong.com/posts/xFotX...
- Survey of NLP researchers: https://nlpsurvey.net/nlp-metasurvey-...
- Copilot good: https://github.blog/2022-09-07-resear...
- Programs for itself: https://arxiv.org/abs/2207.14502
- Simulators: https://www.lesswrong.com/posts/vJFdj...
- Shard theory: https://www.lesswrong.com/posts/iCfdc...
- Richard's list: https://www.lesswrong.com/posts/27AWR...
- Thomas & Eli's list: https://www.lesswrong.com/posts/QBAjn...
- Philosophy fellowship: https://philosophy.safe.ai/
- ML Safety course material: https://course.mlsafety.org/
- ML safety competitions: https://safe.ai/competitions
- Apart Research: https://apartresearch.com
- AI safety ideas: https://aisafetyideas.com/