Will machines ever rule the world? W50
Hopes and fears of the current AI safety paradigm, GPU performance predictions, and popular literature on why machines will never rule the world. Welcome to the ML & AI Safety Update!
Opportunities
Join the AGI Safety Fundamentals course, starting next year! https://ais.pub/agisf2
Astera, an EA-adjacent organization, is looking for members: https://ais.pub/astera
Show your interest in joining the Machine Learning for Alignment Bootcamp (MLAB): https://ais.pub/mlab
[28 Dec] Join workshops to learn about alignment as a career: https://ais.pub/aisworkshops
[Today!] Global Challenges Project workshops in Oxford: https://ais.pub/gcp
[Today!] AI Testing Hackathon, see our livestream tonight: https://ais.pub/jamlive
0:00 Intro
0:12 Hopes and fears of current AI safety
2:22 GPU forecasting and AGI?
3:45 Why Machines Will Never Rule the World
4:58 Other news
6:02 🎄 Opportunities
Sources:
Karnofsky on hopes for AI safety with current methods: 1) digital neuroscience, 2) limited AI, and 3) AI checks and balances: https://www.alignmentforum.org/posts/7BWmLhFtqzqEPs8d5/high-level-hopes-for-ai-alignment
Analogies referenced: Lance Armstrong, King Lear, lab mice, first contact
Christiano’s reminder that AI alignment is distinct from its near-term applications: https://www.alignmentforum.org/posts/Hw26MrLuhGWH7kBLm/ai-alignment-is-distinct-from-its-near-term-applications
Shlegeris’ RLHF critique: https://www.alignmentforum.org/posts/NG6FrXgmqPd5Wn3mh/trying-to-disambiguate-different-questions-about-whether
Steiner, RLHF / IDA / Debate does not solve outer alignment, showcasing the left turn view: https://www.alignmentforum.org/posts/6YNZt5xbBT5dJXknC/take-9-no-rlhf-ida-debate-doesn-t-solve-outer-alignment
Epoch AI’s prediction of GPU performance, with GPU progress potentially stopping between 2027 and 2035, based on cores and transistors: https://epochai.org/blog/predicting-gpu-performance
Saba’s review of Keith’s “Why Machines Will Never Rule the World”: https://www.youtube.com/watch?v=IMnWAuoucjo
Steve Byrnes’ research update: https://www.alignmentforum.org/posts/qusBXzCpxijTudvBB/my-agi-safety-research-2022-review-23-plans
Discovering latent knowledge in language models: https://arxiv.org/pdf/2212.03827.pdf
Eliciting latent knowledge problem: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit
Finite factored sets as a replacement for causal graphs: https://www.alignmentforum.org/posts/PfcQguFpT8CDHcozj/finite-factored-sets-in-pictures-6
Two correlated binary variables (Pearl example): https://www.alignmentforum.org/posts/N5Jm6Nj4HkNKySA5Z/finite-factored-sets#2e__Two_Binary_Variables__Pearl_
PIBBSS fellowship updates: https://www.alignmentforum.org/posts/gbeyjALdjdoCGayc6/reflections-on-the-pibbss-fellowship-2022#Overview_of_main_updates
Model editing using task vector arithmetic: https://arxiv.org/pdf/2212.04089.pdf
