Robustness & Evolution W02
Welcome to this week’s ML Safety Report, where we talk about robustness in machine learning and the human-AI dichotomy. Stay until the end to check out several amazing competitions you can participate in today.
Opportunities
- Benchmarking competition ideas: https://ais.pub/safebenchideas
- Goal misgeneralization prize: https://ais.pub/gmp
- Shutdown prize: https://ais.pub/sdp
- SERI conference staff: https://ais.pub/sericonf
- Century fellowship: https://ais.pub/century
- Mechanistic interpretability hackathon in 10 locations: https://ais.pub/mechint
- AI trends hackathon ideas: https://aisafetyideas.com/list/ai-trends
- AI trends hackathon resources: https://alignmentjam.com/ai-trends
Sources
- David Krueger interview: https://youtu.be/bDMqo7BpNbk
- How would the viewer feel? (12/12, Mazeika, Tang, et al.): https://arxiv.org/pdf/2210.10039.pdf
- Mitigating lies in vision-language models: https://openreview.net/pdf?id=mAiTuIeWbxD
- Benchmarking generalized out-of-distribution detection: https://openreview.net/pdf?id=gT6j4_tskUt
- Certified adversarial robustness using unmodified pretrained models: https://arxiv.org/pdf/2206.10550.pdf
- Natural selection favors AIs over humans (12/12, Hendrycks): https://drive.google.com/file/d/1p4ZAuEYHL_21tqstJOGsMiG4xaRBtVcj/view
- Impossibility theorems for feature attribution, criticizing SHAP and more: https://arxiv.org/pdf/2212.11870.pdf
- TextGrad adversarial attacks for text: https://arxiv.org/pdf/2212.09254.pdf
- The AI trends hackathon: https://itch.io/jam/latam-ais
- Epoch AI: https://epochai.org/
- Intro talk: https://www.youtube.com/embed/_huhnRjphTw
- ML Safety newsletter: https://newsletter.mlsafety.org/p/ml-safety-newsletter-7
