Interpretability on Go and Language Models

This week, we look at interpretability applied to a Go-playing neural network, glitchy tokens, and the opinions and actions of top AI labs and entrepreneurs.


  • Stanford's AI100 prize invites essays on how AI will affect our lives, work, and society at large. Applications close at the end of this month: 
  • You can apply for a paid three-month fellowship with AI Safety Info to write answers and summaries for alignment questions and topics: 
  • The Future of Life Institute has rolling applications open for remote full-time positions and internships: 
  • Similarly, the Epoch team has an expression-of-interest form for joining their talented research team: 
  • You can apply for a postdoc / research scientist position in language model alignment at New York University with Sam Bowman and his team.
  • Of course, you can join our AI governance hackathon at
