Are funding options for AI Safety threatened? W45
The crypto giant FTX crashes, introducing massive uncertainty into the funding space for AI safety; humans cooperate better with lying AI; and interpretability is promising, but also not.
Content
00:25 Uncertainty for AI Safety Funding
01:20 Human-AI cooperation
02:30 Interpretability in the wild
Opportunities
- CHAI is offering an AI Research Internship under one of their mentors, https://ais.pub/d0de20
- Today is the day the interpretability hackathon starts, open to all, https://ais.pub/alignmentjam
- AI Impacts is looking for a Senior Research Analyst, https://ais.pub/aiimpactresearcher
Sources:
FTX fallout continues to roil markets, https://www.youtube.com/watch?v=IgpbdnXOpEk
FTX has probably collapsed, https://forum.effectivealtruism.org/posts/tdLRvYHpfYjimwhyL/ftx-com-has-probably-collapsed
Human-AI cooperation is better when the AI's confidence is uncalibrated, https://arxiv.org/pdf/2202.05983.pdf
Google shows inverse scaling can become U-shaped, https://www.alignmentforum.org/posts/LvKmjKMvozpdmiQhP/inverse-scaling-can-become-u-shaped
Ethan Perez calls them out on their methods, https://twitter.com/EthanJPerez/status/1588352204540235776
Interpretability in the wild, https://arxiv.org/pdf/2211.00593.pdf
Eric Drexler discusses superintelligences with Eliezer (in the comments), https://www.alignmentforum.org/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion
Janus (an alias for several people) shows where GPT-3 text-davinci-002 collapses onto very specific outputs - a favorite number, refusals to answer, etc., https://www.alignmentforum.org/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse-due-to-rlhf
Mesa-optimizer implemented, https://www.alignmentforum.org/posts/b44zed5fBWyyQwBHL/trying-to-make-a-treacherous-mesa-optimizer
David Krueger argues against mechanistic interpretability, https://www.alignmentforum.org/posts/kjRGMdRxXb9c5bWq5/mechanistic-interpretability-as-reverse-engineering-follow
Nate Soares overviews strategies for “knowing AGI is safe”, https://www.alignmentforum.org/posts/iDFTmb8HSGtL4zTvf/how-could-we-know-that-an-agi-system-will-have-good
Interpretability starter resources, https://ais.pub/alignmentjam
