All Episodes

19 episodes in total

Was ChatGPT a good idea? W04

In this week’s ML & AI Safety Update, we hear Paul Christiano’s take on one of OpenAI’s main alignment strategies, dive into the second round winners of the inverse sc...

Compiling code to neural networks? W03

Welcome to this week’s ML & AI Safety Report where we look at overfitting and a compiler for Transformer architectures! This week is a bit short because the mechanisti...

Robustness & Evolution W02

Welcome to this week’s ML Safety Report where we talk about robustness in machine learning and the human-AI dichotomy. Stay until the end to check out several amazing ...

AI improving itself! W01

Over 200 research ideas for mechanistic interpretability, ML improving ML and the dangers of aligned artificial intelligence. Welcome to 2023 and a happy New Year from...

Will machines ever rule the world? W50

Hopes and fears of the current AI safety paradigm, GPU performance predictions and popular literature on why machines will never rule the world. Welcome to the ML & AI...

ML Safety at NeurIPS & Paradigmatic AI Safety

This week, we see how to break ChatGPT, how to integrate diverse opinions into an AI, and look at a bunch of the most interesting papers from the ML safety workshop happening ...

Are language models now conscious? W48

This week, we’re looking at the wild abilities of ChatGPT, exciting articles coming out of the NeurIPS conference and AGI regulation at the EU level.

Will Humans Be Taken Over by 3-Dimensional Chess Playing AIs? W47

5 years ago, Google's AlphaGo beat the reigning world number 1 in Go, Ke Jie, but if you think board-game-playing AIs have stopped evolving since, think again! To...

How Should AIS Relate to Its Funders? W46

This week's ML & AI Safety Update is out! Considerations on the funding situation for AI Safety, exciting projects from Apart's interpretability hackathon, Meta AI-mat...

Are funding options for AI Safety threatened? W45

The crypto giant FTX crashes, introducing massive uncertainty in the funding space for AI safety, humans cooperate better with lying AI, and interpretability is promis...

Can we predict the abilities of future AI? W44

This week, we look at broken scaling laws, surgical fine-tuning, interpretability in the wild, and threat models of AI.

Defending against artificial intelligence W43

This week, we look at how we can safeguard against AGI, examine new research on Goodhart’s law, see an open-source dataset with 60,000 emotional videos, and share new ...

Why AI might not be an existential risk to humanity W42

This week, we’re looking at counterarguments to the basic case for why AI is an existential risk to humanity, examining how strong AI might come very soon, and sharin...

Deep ML Safety Research W41

Theory of Law, OOD and too few researchers W41

AGI Progress & Theoretical Views W40

Welcome to this week's Safe AI Progress Report where Thomas describes the scary developments in AI, a theoretical dispute, and risks from power-seeking AI.

$1.5 million to change someone's mind W39

👩‍🔬 This week's Safe AI Progress Report puts the focus on the Future Fund's new prize, Conjecture's new research, speed priors, and much more. Come along!

Violent language models & neural hacking W38

This Safe AI Progress Report describes the past week's developments in ML and AI safety. Follow along to get regular updates for the scientific field to stay safe from...

Interpretability state-of-the-art W37

🎉 Interpretability research is doing great and 📈 AI is still getting better

OpenAI, Shard Theory, and Left Turns W36

🥳 Welcome to this Safe AI Progress Report! It is the first of a regular series of videos that summarize what has happened in AI safety since the last report. They acco...
