How Should AIS Relate to Its Funders, W46
This week's ML & AI Safety Update is out! We cover the funding situation for AI safety, exciting projects from Apart's interpretability hackathon, interpretability of Meta AI's math transformer, and what to spend time on in AI safety.
00:25 Thoughts on the FTX Crash and the Future of EA
02:20 Alignment Hackathon Interpretability Research
04:20 Mathematical Transformer Interpretability
Sources:
Evan Hubinger on why fraud must never happen in Effective Altruism: https://www.lesswrong.com/posts/8wYH4WggxFqT9yhzJ/we-must-be-very-clear-fraud-in-the-service-of-effective
Strawberry Calm on what we can consider in EA: https://forum.effectivealtruism.org/posts/yMKmCbmL8ekDcJhQd/ (in the comments)
Emergency funding: https://www.lesswrong.com/posts/SkoxdYCAozBPfcJrP/announcing-nonlinear-emergency-funding
Apart's interpretability hackathon, https://itch.io/jam/interpretability
1st prize: Investigating Neuron Behaviour via Dataset Example Pruning and Local Search, https://alexfoote.itch.io/investigating-neuron-behaviour-via-dataset-example-pruning-and-local-search
2nd prize: Backup Head Behaviour is Robust to the Distribution Used to Perform the Ablation, https://satojk.itch.io/backup-transformer-heads-are-robust
3rd prize: Model editing hazards at the example of ROME, https://jas-ho.itch.io/model-editing-hazards-at-the-example-of-rome
Thoughts on buying time for AI Safety
https://www.lesswrong.com/posts/bkpZHXMJx3dG5waA7/ways-to-buy-time
Martin Soto criticizes Vanessa Kosoy's PreDCA alignment protocol https://www.lesswrong.com/posts/FhKkFcojhKZt7nHzG/a-short-critique-of-vanessa-kosoy-s-predca-1
Will we run out of ML data? https://www.lesswrong.com/posts/Couhhp4pPHbbhJ2Mg/will-we-run-out-of-ml-data-evidence-from-projecting-dataset
Is model-agnostic (black-box) interpretability still worth considering? https://www.lesswrong.com/posts/uXGLciramzNfb8Hvz/why-i-m-working-on-model-agnostic-interpretability
Instrumental convergence explains why general intelligence is possible https://www.lesswrong.com/posts/GZgLa5Xc4HjwketWe/instrumental-convergence-is-what-makes-general-intelligence
Opportunities:
- AI Impacts is still looking for a senior research analyst
- Anthropic is still looking for a senior software engineer
- The Center for AI Safety is looking for a chief of staff
- David Krueger’s lab is looking for collaborators
