4 pieces selected from The Gradient, AI Alignment Forum — only the ones worth your time.
1. After Orthogonality: Virtue-Ethical Agency and AI Alignment
The Gradient
This essay challenges the foundational assumption of mainstream AI alignment research — the orthogonality thesis — which holds that an agent can have any combination of intelligence and goals. The author argues that rational human agency is not goal-directed in the classical sense. Instead, humans act rationally by aligning behavior to 'practices': structured networks of actions, dispositions, and evaluation criteria inherited from social and cultural contexts. Goals, in this view, are derivative outputs of practices, not the inputs that drive behavior.
Why it matters
Most current AI safety and alignment work — including RLHF, Constitutional AI, and reward modeling — implicitly assumes a goal-directed architecture where you specify objectives and the system optimizes toward them. If the essay's critique holds, this framing is fundamentally broken: you cannot align an AI by specifying goals alone, because well-aligned human behavior is grounded in internalized practices and virtues, not terminal objectives. Developers building AI agents, autonomous systems, or any product where the AI must generalize to novel situations should take seriously the possibility that reward/goal specification is insufficient and that practice-based or virtue-ethical constraints need to be part of the design.
What you can build with this
Build a small AI agent evaluation harness that tests whether a GPT-4-class model behaves consistently across novel edge cases not covered by its explicit instructions — specifically probing whether the model exhibits 'virtue-like' generalization (e.g., honesty, care, appropriate restraint) versus pure instruction-following. Use this to empirically distinguish goal-following behavior from practice-consistent behavior, and document where the gap is largest.
Key takeaways
- The orthogonality thesis — that any goal can be paired with any level of intelligence — is contested: the essay argues that high rationality actually constrains what kinds of motivational structures are coherent, making pure goal-directedness a poor model for aligned AI.
- Virtue ethics offers a concrete alternative alignment target: rather than specifying terminal goals, you encode stable dispositions (honesty, prudence, care) that generalize across contexts the same way human character traits do.
- Current alignment methods like RLHF optimize for approval signals tied to specific outcomes, not for the underlying practices that make behavior robustly good — this structural gap may explain why aligned models still fail unpredictably on out-of-distribution inputs.
2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
The Gradient
This essay from The Gradient examines the shifting role of mathematics in modern machine learning research. Over the past decade, the field has moved away from carefully crafted, mathematically principled architectures toward compute-intensive, engineering-first approaches that scale with data and parameters. The argument is that empirical scaling laws and brute-force optimization have repeatedly outperformed theoretically elegant designs, raising questions about whether deep mathematical reasoning still drives meaningful progress in ML.
Why it matters
For developers building AI products today, this tension has direct practical implications: investing heavily in architecturally novel, mathematically motivated components rarely beats fine-tuning a larger pre-trained model with more data. Understanding that the field's progress is currently driven more by scaling and engineering discipline than by theoretical breakthroughs helps you prioritize where to spend effort — data quality, compute budget, and systems engineering often yield faster returns than custom architectural innovations.
What you can build with this
Run a controlled benchmark this week comparing a mathematically motivated custom architecture (e.g., a graph neural network or equivariant model designed around your data's known symmetries) against a vanilla transformer fine-tuned on the same dataset. Measure both final performance and engineering hours invested. This directly tests the essay's central claim on your specific problem domain and gives you concrete data to inform future architecture decisions.
Key takeaways
- Empirical scaling laws have repeatedly shown that adding compute and data outperforms switching to more mathematically principled architectures, making engineering and data pipelines higher-leverage than novel math for most production ML problems.
- Geometric and symmetry-aware methods (e.g., equivariant networks) retain genuine value in constrained-data domains like molecular biology or physics simulation, where the mathematical structure of the problem is well-defined and data is expensive — but not in general-purpose NLP or vision tasks.
- The declining marginal returns of mathematical rigor in ML research do not mean math is irrelevant — understanding loss landscapes, generalization bounds, and optimization theory still informs debugging and architectural choices, but it no longer predicts which models will win at scale.
3. AGI Is Not Multimodal
The Gradient
This essay from The Gradient argues that current multimodal large language models — systems that process text, images, audio, and video — are fundamentally insufficient as a path to AGI, despite widespread belief that adding more modalities brings us closer to human-level intelligence. The core argument, grounded in Terry Winograd's critique and embodied cognition research, is that human intelligence is not primarily linguistic or perceptual in the representational sense these models use, but is instead rooted in tacit, embodied understanding — the kind of knowledge that comes from physically inhabiting and acting in a world.
Why it matters
Developers building AI products often inherit the implicit assumption that scaling multimodal models will eventually produce systems that 'truly understand' tasks, leading to over-reliance on model outputs in domains requiring causal, physical, or procedural reasoning. This essay is a useful corrective: it helps developers set realistic expectations about where current models will fail reliably — especially in robotics, real-world planning, physical skill instruction, and any task where tacit knowledge matters — and should inform how you design fallbacks, human-in-the-loop checkpoints, and product scope boundaries.
What you can build with this
Build a structured failure-case logger for a multimodal model (e.g., GPT-4o or Gemini) in a domain requiring tacit or physical knowledge — such as cooking instructions, furniture assembly guidance, or sports coaching cues. Feed the model ambiguous real-world scenarios that require embodied common sense, log where it confidently fails, and publish a categorized dataset of failure modes. This both validates the essay's thesis empirically and produces a useful benchmark artifact for the community.
Key takeaways
- Adding modalities (vision, audio) to LLMs does not address the absence of embodied, action-grounded understanding — multimodality is a perceptual extension, not a cognitive one.
- Tacit knowledge — the kind humans acquire through physical interaction with the world — is not representable in training data distributions drawn from human-generated media, making it structurally out of reach for current architectures.
- AGI framing around multimodal models risks misdirecting product and research investment toward scaling existing architectures rather than exploring fundamentally different paradigms like sensorimotor learning or world models grounded in physical simulation.
4. AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines
AI Alignment Forum
I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30% [2] while I was previously expecting around 15%; my guesses are pretty reflectively unstable) and (2) I expect much stronger short-term
Key takeaways