Will Humans Be Taken Over by 3-Dimensional Chess-Playing AIs? W47
5 years ago, Google's AlphaGo beat the reigning world number 1 in Go, Ke Jie. If you think board-game-playing AIs have stopped evolving since, think again! Today we look into the deceptive abilities of Meta AI's new language model, Cicero, and consider what board-game-playing AIs teach us about AI development.
Table of contents:
- Language model plays Diplomacy better than humans
- 3-dimensional chess-playing AIs might not be that dangerous
- Presuming independence to formalise interpretability work
- Engineering monosemanticity in toy models
- Minor news
Sources:
- Meta AI announces Cicero, which plays the board game Diplomacy better than humans: https://www.science.org/doi/10.1126/science.ade9097
- Clarifications on the term "wireheading": https://www.alignmentforum.org/posts/REesy8nqvknFFKywm/clarifying-wireheading-terminology
- The “loss of control” scenario rests on a few key assumptions that are not justified by our current understanding of artificial intelligence research: https://windowsontheory.org/2022/11/22/ai-will-change-the-world-but-wont-take-it-over-by-playing-3-dimensional-chess/
- Alignment Research Center: when deduction suddenly becomes deceptive. Formalizing presumptions of independence: https://arxiv.org/abs/2211.06738
- Monosemantic neurons, which each respond to a single feature, are great for interpretability: https://arxiv.org/abs/2211.09169 (https://www.alignmentforum.org/posts/LvznjZuygoeoTpSE6/engineering-monosemanticity-in-toy-models)
- In case we need some more thoughts on EA's relation to funders: https://www.lesswrong.com/posts/p4XpZWcQksSiCPG72/sadly-ftx#The_Future_of_Effective_Altruist_Ethics and https://forum.effectivealtruism.org/posts/NeK9XYY2mDsH5bJdD/our-recommendations-for-giving-in-2022
- Comparing AI alignment research to orthodox and reform religions: https://www.lesswrong.com/posts/XKraEJrQRfzbCtzKN/distillation-of-how-likely-is-deceptive-alignment
- Conjecture report: https://www.lesswrong.com/posts/bXTNKjsD4y3fabhwR/conjecture-a-retrospective-after-8-months-of-work-1
- AlphaGo beating Ke Jie in Go (5 years ago): https://www.bbc.com/news/technology-40042581
