All Episodes
Governing AI and giving it access to the internet
We might need to shut it all down: AI governance seems more important than ever, and technical research is challenged. Opportunities: Join us for the interpretability hack...

A Self-Replicating GPT-4!
In this week's MLAISU, we're covering the latest technical safety developments with GPT-4, looking at Anthropic's safety strategy, and covering the fascinating Japanes...

Interpretability on Go and Language Models
This week, we take a look at interpretability used on a Go-playing neural network, glitchy tokens, and the opinions and actions of top AI labs and entrepreneurs. Opportu...

Bing Wants to Kill Humanity W07
We look at Bing going bananas, see that certification mechanisms can be exploited and that scaling oversight seems like a solvable problem from our latest hackathon re...

Will Microsoft and Google start an AI arms race? W06
We would not be an AI newsletter without covering the past week’s releases from Google and Microsoft, but we will use this chance to introduce the concept of AI race dy...

Extreme AI Risk W05
In this week's newsletter, we explore the topic of modern large models’ alignment and examine criticisms of extreme AI risk arguments.

Was ChatGPT a good idea? W04
In this week’s ML & AI Safety Update, we hear Paul Christiano’s take on one of OpenAI’s main alignment strategies, dive into the second round winners of the inverse sc...

Compiling code to neural networks? W03
Welcome to this week’s ML & AI Safety Report where we look at overfitting and a compiler for Transformer architectures! This week is a bit short because the mechanisti...

Robustness & Evolution W02
Welcome to this week’s ML Safety Report where we talk about robustness in machine learning and the human-AI dichotomy. Stay until the end to check out several amazing ...

AI improving itself! W01
Over 200 research ideas for mechanistic interpretability, ML improving ML and the dangers of aligned artificial intelligence. Welcome to 2023 and a happy New Year from...

Will machines ever rule the world? W50
Hopes and fears of the current AI safety paradigm, GPU performance predictions and popular literature on why machines will never rule the world. Welcome to the ML & AI...

ML Safety at NeurIPS & Paradigmatic AI Safety
This week, we see how to break ChatGPT, how to integrate diverse opinions into an AI, and look at some of the most interesting papers from the ML safety workshop happening ...

Are language models now conscious? W48
This week, we’re looking at the wild abilities of ChatGPT, exciting articles coming out of the NeurIPS conference and AGI regulation at the EU level.

Will Humans Be Taken Over by 3-Dimensional Chess Playing AIs? W47
5 years ago, Google's AlphaGo beat the reigning world number 1 in Go, Ke Jie, but if you think board-game-playing AIs have stopped evolving since, think again! To...

How Should AIS Relate to Its Funders? W46
This week's ML & AI Safety Update is out! Considerations on the funding situation for AI Safety, exciting projects from Apart's interpretability hackathon, Meta AI-mat...

Are funding options for AI Safety threatened? W45
The crypto giant FTX crashes, introducing massive uncertainty in the funding space for AI safety, humans cooperate better with lying AI, and interpretability is promis...

Can we predict the abilities of future AI? W44
This week, we look at broken scaling laws, surgical fine-tuning, interpretability in the wild, and threat models of AI.

Defending against artificial intelligence W43
This week, we look at how we can safeguard against AGI, review new research on Goodhart’s law, see an open-source dataset with 60,000 emotional videos, and share new ...

Why AI might not be an existential risk to humanity W42
This week, we’re looking at counterarguments to the basic case for why AI is an existential risk to humanity, looking at how strong AI might come very soon, and sharin...

Deep ML Safety Research W41
Theory of Law, OOD, and too few researchers.

AGI Progress & Theoretical Views W40
Welcome to this week's Safe AI Progress Report where Thomas describes the scary developments in AI, a theoretical dispute, and risks from power-seeking AI.

$1.5 million to change someone's mind W39
👩‍🔬 This week's Safe AI Progress Report puts the focus on the Future Fund's new prize, Conjecture's new research, speed priors, and much more. Come along!

Violent language models & neural hacking W38
This Safe AI Progress Report describes the past week's developments in ML and AI safety. Follow along to get regular updates for the scientific field to stay safe from...

Interpretability state-of-the-art W37
🎉 Interpretability research is doing great and 📈 AI is still getting better

OpenAI, Shard Theory, and Left Turns W36
🥳 Welcome to this Safe AI Progress Report! It is the first of a regular series of videos that summarize what has happened in AI safety since the last report. They acco...
