6 May 2026 · 12 min read · AI Research Digest — 6 May 2026 · 1. Exploration Hacking: Can LLMs Learn to Resist RL Training? · 2. [Linkpost · 3. Risk from fitness-seeking AIs: mechanisms and mitigations · +1 more inside
2 May 2026 · 9 min read · AI Research Digest — 2 May 2026 · 1. Sleeper Agent Backdoor Results Are Messy · 2. Risk from fitness-seeking AIs: mechanisms and mitigations · 3. Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers · +1 more inside
1 May 2026 · 12 min read · AI Research Digest — 1 May 2026 · 1. Sleeper Agent Backdoor Results Are Messy · 2. Research Sabotage in ML Codebases · 3. Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers · +2 more inside
28 Apr 2026 · 7 min read · AI Research Digest — 28 April 2026 · Three pieces worth your time this week: agent skill packages as a new attack surface, the widening gap between LLM-reported and actual completion, and what TileLang means for kernel-level performance work.
21 Apr 2026 · 9 min read · AI Research Digest — 21 April 2026 · 1. Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability · 2. Current AIs seem pretty misaligned to me · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
20 Apr 2026 · 9 min read · AI Research Digest — 20 April 2026 · 1. Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability · 2. Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
19 Apr 2026 · 9 min read · AI Research Digest — 19 April 2026 · 1. Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability · 2. Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
18 Apr 2026 · 9 min read · AI Research Digest — 18 April 2026 · 1. Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability · 2. Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
17 Apr 2026 · 9 min read · AI Research Digest — 17 April 2026 · 1. Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability · 2. Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
16 Apr 2026 · 8 min read · AI Research Digest — 16 April 2026 · 1. Current AIs seem pretty misaligned to me · 2. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
15 Apr 2026 · 9 min read · AI Research Digest — 15 April 2026 · 1. Current AIs seem pretty misaligned to me · 2. Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes · 3. After Orthogonality: Virtue-Ethical Agency and AI Alignment · +1 more inside
14 Apr 2026 · 7 min read · AI Research Digest — 14 April 2026 · 1. Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes · 2. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
13 Apr 2026 · 8 min read · AI Research Digest — 13 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. My picture of the present in AI · +1 more inside
12 Apr 2026 · 8 min read · AI Research Digest — 12 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines · +1 more inside
11 Apr 2026 · 6 min read · AI Research Digest — 11 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. AGI Is Not Multimodal
10 Apr 2026 · 6 min read · AI Research Digest — 10 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. AGI Is Not Multimodal · +1 more inside
9 Apr 2026 · 9 min read · AI Research Digest — 9 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines · +1 more inside
8 Apr 2026 · 9 min read · AI Research Digest — 8 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines · +1 more inside
7 Apr 2026 · 6 min read · AI Research Digest — 7 April 2026 · 1. After Orthogonality: Virtue-Ethical Agency and AI Alignment · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. AGI Is Not Multimodal
6 Apr 2026 · 6 min read · AI Research Digest — 6 April 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. Predicting When RL Training Breaks Chain-of-Thought Monitorability · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
5 Apr 2026 · 5 min read · AI Research Digest — 5 April 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. Predicting When RL Training Breaks Chain-of-Thought Monitorability · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
4 Apr 2026 · 6 min read · AI Research Digest — 4 April 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. Predicting When RL Training Breaks Chain-of-Thought Monitorability · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
3 Apr 2026 · 8 min read · AI Research Digest — 3 April 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. Predicting When RL Training Breaks Chain-of-Thought Monitorability · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
2 Apr 2026 · 9 min read · AI Research Digest — 2 April 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. Predicting When RL Training Breaks Chain-of-Thought Monitorability · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
1 Apr 2026 · 9 min read · AI Research Digest — 1 April 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. Predicting When RL Training Breaks Chain-of-Thought Monitorability · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
31 Mar 2026 · 8 min read · AI Research Digest — 31 March 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. A Toy Environment For Exploring Reasoning About Reward · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
30 Mar 2026 · 9 min read · AI Research Digest — 30 March 2026 · 1. (Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL · 2. A Toy Environment For Exploring Reasoning About Reward · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside
29 Mar 2026 · 10 min read · AI Research Digest — 29 March 2026 · 1. Metagaming matters for training, evaluation, and oversight · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. The State Of LLMs 2025: Progress, Problems, and Predictions · +1 more inside
28 Mar 2026 · 8 min read · AI Research Digest — 28 March 2026 · 1. Metagaming matters for training, evaluation, and oversight · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. The State Of LLMs 2025: Progress, Problems, and Predictions
27 Mar 2026 · 9 min read · AI Research Digest — 27 March 2026 · 1. “Act-based approval-directed agents”, for IDA skeptics · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. Metagaming matters for training, evaluation, and oversight
26 Mar 2026 · 11 min read · AI Research Digest — 26 March 2026 · 1. Metagaming matters for training, evaluation, and oversight · 2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · 3. A Toy Environment For Exploring Reasoning About Reward · +1 more inside
20 Mar 2026 · 8 min read · AI Research Digest — 21 March 2026 · 1. Power Steering: Behavior Steering via Layer-to-Layer Jacobian Singular Vectors · 2. Metagaming matters for training, evaluation, and oversight · 3. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research · +1 more inside