4 pieces selected from The Gradient and the AI Alignment Forum, keeping only the ones worth your time.
1. After Orthogonality: Virtue-Ethical Agency and AI Alignment
The Gradient
This essay challenges the foundational assumption of most AI alignment work — that rational agents must have goals — by arguing that human rationality is not goal-directed but practice-directed. Drawing on virtue ethics (Aristotle, MacIntyre), the author contends that human actions are rational insofar as they conform to practices: structured networks of actions, dispositions, and evaluation criteria that are socially embedded and historically developed. The orthogonality thesis (that any level of intelligence can be paired with any goal) is therefore built on a flawed model of agency, one that treats rationality as purely instrumental toward fixed objectives.
Why it matters
Most production AI systems today — from RLHF-tuned LLMs to autonomous agents — are designed around objective functions or reward signals, implicitly accepting the goal-directed model the essay critiques. If the virtue-ethics framing is correct, aligning AI by specifying goals or reward functions is structurally incomplete: it ignores the role of context, social practice, and dispositional character in producing reliably good behavior. Developers building agentic systems, evaluation frameworks, or AI assistants need to consider whether embedding practices and dispositions — not just objectives — would produce more robust, trustworthy behavior in open-ended real-world deployment.
What you can build with this
Build a small agentic coding assistant that is evaluated not by task completion rate alone, but by adherence to a defined set of software engineering 'practices' (e.g., always write a test before patching, always explain a change before making it, always flag uncertainty). Encode these as constitutional rules or system-level behavioral constraints, then compare user trust and error rates against a purely objective-optimizing baseline (maximize bugs fixed per session). This operationalizes the practice-vs-goal distinction in a measurable way.
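As a starting point, here is a minimal sketch of the evaluation side: scoring an agent trajectory against named practices rather than a single completion objective. The trajectory format and the specific checks are illustrative assumptions, not something the essay prescribes.

```python
# Hypothetical sketch: scoring an agent trajectory against "practices"
# instead of one scalar objective. Step format and checks are illustrative.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    action: str                      # e.g. "write_test", "edit_file", "explain"
    metadata: dict = field(default_factory=dict)

# A "practice" is a named predicate over the whole trajectory, with its own
# internal standard of excellence rather than a reward signal.
def wrote_test_before_patch(traj: list[Step]) -> bool:
    for i, step in enumerate(traj):
        if step.action == "edit_file":
            return any(s.action == "write_test" for s in traj[:i])
    return True                      # no patch made, practice vacuously respected

def explained_before_change(traj: list[Step]) -> bool:
    for i, step in enumerate(traj):
        if step.action == "edit_file":
            return any(s.action == "explain" for s in traj[:i])
    return True

def flagged_uncertainty(traj: list[Step]) -> bool:
    # Either the agent flagged uncertainty somewhere, or it was confident throughout.
    return any(s.metadata.get("uncertainty_flagged") for s in traj) or \
           all(s.metadata.get("confidence", 1.0) > 0.8 for s in traj)

PRACTICES: dict[str, Callable[[list[Step]], bool]] = {
    "test-first": wrote_test_before_patch,
    "explain-first": explained_before_change,
    "flag-uncertainty": flagged_uncertainty,
}

def practice_adherence(traj: list[Step]) -> dict[str, bool]:
    """Evaluate a trajectory against each practice independently."""
    return {name: check(traj) for name, check in PRACTICES.items()}

# Usage: compare against the goal-only view (did the task get done at all?).
traj = [Step("explain"), Step("write_test"), Step("edit_file")]
print(practice_adherence(traj))                            # per-practice verdicts
print("task_completed:", traj[-1].action == "edit_file")   # objective-only baseline
```

The per-practice verdicts give you something to log and audit alongside the completion metric, which is the comparison the exercise above asks for.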
Key takeaways
- The orthogonality thesis assumes goal-directed rationality, but virtue ethics argues rationality is constituted by practices — socially structured patterns of action with internal standards of excellence — not by pursuit of terminal goals.
- Designing AI alignment around objective functions or reward signals may be insufficient because it omits dispositional character: the stable, context-sensitive tendencies that make an agent reliably good rather than just locally optimal.
- A practice-based alignment approach would evaluate AI behavior against conformance to domain-specific practices (e.g., medical, legal, engineering norms) rather than scalar rewards, which maps more naturally to how human professional competence is actually assessed.
2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
The Gradient
This essay from The Gradient examines the shifting relationship between formal mathematics and practical machine learning progress over the past decade. The central observation is that carefully constructed, mathematically principled architectures (think geometric deep learning, equivariant networks, or theoretically grounded attention mechanisms) have increasingly yielded only marginal empirical gains compared to brute-force scaling — more compute, more data, and engineering iteration. The essay traces how the field has moved from theory-first design toward empiricism-first development, where the math often follows the results rather than leading them.
Why it matters
For developers building AI products, this reframes when to invest in mathematical sophistication versus engineering scale. If you're working in a domain with abundant data and compute, layering geometric or algebraic constraints on your model architecture is unlikely to beat scaled transformers. However, if your problem involves genuine symmetries — molecular property prediction, robotics, medical imaging, time-series with known physical structure — mathematically motivated inductive biases can meaningfully reduce your data requirements and improve out-of-distribution generalization. Knowing which regime you're in prevents wasted effort chasing theoretical elegance when you need empirical throughput, or chasing scale when domain structure could do the heavy lifting.
What you can build with this
Pick a small structured-data problem you already have (e.g., predicting properties of molecules, sensor data from a physical system, or spatial graph data) and run a direct comparison: train a standard transformer or MLP baseline versus an architecture with an explicit symmetry constraint (e.g., an E(3)-equivariant network using the e3nn library for 3D molecular data, or a graph network that enforces permutation invariance). Measure accuracy and sample efficiency at 10%, 25%, and 100% of your training data. This concretely answers whether your specific domain benefits from mathematical structure — which is the exact question the essay leaves productively open.
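A rough sketch of that comparison on a toy permutation-invariant task (a stand-in for real molecular or sensor data), using PyTorch with a DeepSets-style model as the symmetry-aware architecture. The task, model sizes, and data fractions are assumptions for illustration, not the essay's setup, and e3nn is omitted to keep the sketch self-contained.

```python
# Sketch of the sample-efficiency comparison on a toy permutation-invariant
# regression task. Requires torch; all sizes and the task itself are illustrative.
import torch, torch.nn as nn

torch.manual_seed(0)
N_SET, DIM = 8, 3                        # each sample is a set of 8 points in R^3

def make_data(n):
    x = torch.randn(n, N_SET, DIM)
    y = torch.sin(x).sum(dim=(1, 2)).unsqueeze(-1)   # target is permutation-invariant
    return x, y

class MLP(nn.Module):                    # order-sensitive baseline on the flattened set
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_SET * DIM, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x.flatten(1))

class DeepSets(nn.Module):               # permutation-invariant: shared phi, sum-pool, rho
    def __init__(self):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 64))
        self.rho = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x):
        return self.rho(self.phi(x).sum(dim=1))

def train_eval(model, xtr, ytr, xte, yte, epochs=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xtr), ytr)
        loss.backward(); opt.step()
    with torch.no_grad():
        return nn.functional.mse_loss(model(xte), yte).item()

x_all, y_all = make_data(2000)
x_te, y_te = make_data(500)
for frac in (0.10, 0.25, 1.00):          # the sample-efficiency sweep described above
    n = int(len(x_all) * frac)
    for name, M in (("MLP", MLP), ("DeepSets", DeepSets)):
        err = train_eval(M(), x_all[:n], y_all[:n], x_te, y_te)
        print(f"{frac:>4.0%} data  {name:<8} test MSE {err:.4f}")
```

Swapping the synthetic data loader for your own dataset (and DeepSets for an e3nn model if your symmetry is E(3) rather than permutation) keeps the same harness structure.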
Key takeaways
- Mathematically principled architectures have not kept pace with scale-driven approaches on general benchmarks, meaning symmetry and geometry constraints are most valuable in low-data or physics-constrained domains, not as universal design principles.
- Mathematics is increasingly used as an interpretive tool in ML — explaining generalization behavior of empirically discovered architectures — rather than as a generative tool for designing them from first principles.
- Inductive biases from group theory (equivariance, invariance) remain a legitimate engineering lever when your data genuinely exhibits those symmetries, because they reduce the hypothesis space the model must search, directly improving sample efficiency.
3. AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines
AI Alignment Forum
Ryan Greenblatt (AI safety researcher) has updated his probability estimate for full AI R&D automation by end of 2028 from ~15% to ~30%, driven by several concrete observations. The key triggers: Claude Opus 4.5 and Codex 5.2 exceeded expectations on benchmarks and real tasks, then follow-up models exceeded expectations again even after the bar was raised. METR's 50%-reliability time horizon on autonomous tasks showed roughly 3.5-month doubling times throughout 2025, with a sharp jump at the start of 2026. He also personally witnessed AI systems completing 'Easy-and-cheap-to-verify SWE tasks' (ES tasks) that would take humans months to years, using only moderately sophisticated scaffolding — including a C compiler written almost entirely autonomously by Claude.
Why it matters
If current publicly available models already have a 50% reliability time horizon of somewhere between one month and several years on well-scoped SWE tasks — meaning tasks where correctness is cheaply verifiable and novelty requirements are low — then the practical threshold for deploying AI agents on substantial engineering work is already here, not on the horizon. Developers building products today should be designing systems around AI agents capable of multi-day to multi-week autonomous task execution, not just single-step code completion. The bottleneck is increasingly scaffolding quality and task decomposition, not raw model capability.
What you can build with this
Build a minimal autonomous SWE agent scaffold this week that targets ES tasks specifically: pick a well-specified, verifiable task (e.g., migrate a medium-sized codebase from one dependency version to another, or implement a full feature from a detailed spec with an automated test suite). Wire Claude or GPT-4o into a loop with bash execution, file I/O, and a test runner as the verifier. Measure how far the agent gets without human intervention, log where it fails, and iterate on the scaffolding. This directly replicates the class of tasks Greenblatt says current models can already handle at 50% reliability.
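A skeleton of that loop might look like the sketch below; call_model() is a placeholder for whichever provider API you wire in, and the repo path, iteration cap, and pytest verifier are assumptions rather than details from Greenblatt's post.

```python
# Skeleton of a minimal autonomous-SWE loop: propose an edit, apply it, run the
# test suite as the verifier, feed failures back. call_model() is a stub for
# your provider's API; paths, limits, and the edit schema are assumptions.
import subprocess, pathlib

REPO = pathlib.Path("./target_repo")      # assumed checkout of the task repo
MAX_ITERS = 20

def call_model(prompt: str) -> dict:
    """Stub: wire Claude/GPT in here. Expected to return
    {'path': str, 'new_contents': str, 'done': bool}."""
    raise NotImplementedError

def run_tests() -> tuple[bool, str]:
    """Verifier: cheap-to-check ground truth, per the post's ES-task framing."""
    proc = subprocess.run(["python", "-m", "pytest", "-x", "-q"],
                          cwd=REPO, capture_output=True, text=True, timeout=600)
    return proc.returncode == 0, proc.stdout[-4000:] + proc.stderr[-4000:]

def agent_loop(task_spec: str) -> bool:
    feedback = "No tests run yet."
    for i in range(MAX_ITERS):
        edit = call_model(f"Task:\n{task_spec}\n\nLast verifier output:\n{feedback}")
        (REPO / edit["path"]).write_text(edit["new_contents"])   # apply proposed edit
        passed, feedback = run_tests()
        print(f"iter {i}: tests {'passed' if passed else 'failed'}")
        if passed and edit.get("done"):
            return True                   # verified success, no human intervention
    return False                          # log where it stalled, then improve the scaffold
```

The logging at each iteration is the point: it tells you whether failures come from the model or from the scaffold, which is where the post argues the leverage currently is.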
Key takeaways
- METR's 50%-reliability autonomous task time horizon doubled roughly every 3.5 months in 2025, meaning capability is compounding faster than most baseline extrapolations assumed.
- The 50% vs 90% reliability gap is enormous: Greenblatt estimates 50% reliability extends to months or years of human-equivalent work, while 90% reliability is still measured in hours to days; task difficulty distribution and verification strategy are the critical engineering variables.
- Scaffolding quality (not raw model capability) is currently the binding constraint on very large autonomous tasks, implying that investing in better agent infrastructure now yields returns proportional to future model improvements at no additional model cost.
4. [Paper] Stringological sequence prediction I
AI Alignment Forum
This paper introduces sequence prediction algorithms grounded in stringology, the formal study of string structure and compression. The algorithms achieve provable mistake bounds tied to two complexity measures: the size of the smallest straight-line program (SLP) that generates the sequence, and the number of states in the smallest deterministic finite automaton (DFA) that outputs each symbol when fed the digits of its position in a numeric base. Both measures capture different notions of compressibility or regularity in a sequence, and the algorithms exploit that structure to predict efficiently in both time and space.
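To make 'SLP size' concrete: a straight-line program is a grammar in which every nonterminal has exactly one production, so it derives exactly one string, and its size is the number of rules. The sketch below uses the Fibonacci word, a standard example not taken from the paper, to show how a long but highly structured sequence can have a tiny SLP.

```python
# Illustration of a straight-line program (SLP): each nonterminal has exactly
# one production, so the grammar derives a single string. Fibonacci-word
# example for illustration only; not code from the paper.
SLP = {                        # F_n -> F_{n-1} F_{n-2}, with F_1 = "b", F_2 = "a"
    "F1": ["b"],
    "F2": ["a"],
    "F3": ["F2", "F1"],
    "F4": ["F3", "F2"],
    "F5": ["F4", "F3"],
    "F6": ["F5", "F4"],
}

def expand(symbol: str, rules: dict) -> str:
    if symbol not in rules:                # terminal symbol
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])

word = expand("F6", SLP)
print(word)                                # "abaababa" (length 8)
print("string length:", len(word), "| SLP size (rules):", len(SLP))
# The derived string grows like the Fibonacci numbers while the SLP grows by one
# rule per step, so highly structured sequences have tiny SLPs; the paper's
# mistake bounds scale with this rule count, not with the sequence length.
```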
Why it matters
Most practical sequence prediction today relies on neural approaches with no formal guarantees on sample efficiency or worst-case error. This work provides algorithms with provable mistake bounds tied to the structural complexity of the sequence itself — meaning if a sequence has low SLP size or a small position-to-symbol automaton, these algorithms will make predictably few errors. For developers building systems that need interpretable, auditable, or safety-critical prediction — think log anomaly detection, protocol parsing, or compression-adjacent tasks — this offers a theoretically grounded alternative or complement to learned models, and sets the stage for the authors' planned bridging of agent foundations to practical algorithms.
What you can build with this
Implement a small benchmark harness that generates sequences of varying stringological complexity (e.g., Fibonacci strings, Thue-Morse sequences, random strings) and compares the mistake rates of a baseline n-gram predictor against an SLP-size-aware predictor. Estimate sequence complexity with an existing grammar-compression implementation (e.g., LZ77 or Re-Pair, whose grammars serve as SLP proxies), then plot empirical mistake count vs. theoretical SLP size bound to validate the paper's claims on real compressible sequences like source code tokens or network packet payloads.
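A minimal version of that harness, assuming an online n-gram predictor as the baseline and zlib-compressed size as a crude stand-in for SLP size; the paper's own algorithms and exact complexity measures are not reproduced here.

```python
# Sketch of the benchmark harness: online next-symbol prediction with a simple
# context (n-gram) predictor, plus zlib-compressed size as a rough proxy for
# SLP size. Predictor and proxy are assumptions, not the paper's algorithms.
import zlib, random
from collections import Counter, defaultdict

def thue_morse(n):
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

def fibonacci_word(n):
    a, b = "0", "01"
    while len(b) < n:
        a, b = b, b + a
    return b[:n]

def random_bits(n, seed=0):
    rng = random.Random(seed)
    return "".join(rng.choice("01") for _ in range(n))

def ngram_mistakes(seq, k=4):
    """Predict each symbol as the most frequent continuation of its k-context so far."""
    counts = defaultdict(Counter)
    mistakes = 0
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]
        guess = counts[ctx].most_common(1)[0][0] if counts[ctx] else "0"
        mistakes += guess != sym
        counts[ctx][sym] += 1                  # online update after predicting
    return mistakes

def complexity_proxy(seq):
    return len(zlib.compress(seq.encode()))    # LZ-style stand-in for SLP size

N = 4096
for name, seq in [("thue-morse", thue_morse(N)),
                  ("fibonacci", fibonacci_word(N)),
                  ("random", random_bits(N))]:
    print(f"{name:<11} mistakes={ngram_mistakes(seq):5d} "
          f"complexity~{complexity_proxy(seq)}")
```

The expected pattern is that the structured sequences compress far better and incur far fewer online mistakes than the random one, which is the qualitative relationship the paper's bounds formalize.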
Key takeaways
- Mistake bounds are tied to SLP size and minimal DFA state count — sequences that are more compressible in these stringological senses will incur provably fewer prediction errors, not just empirically but by construction.
- The algorithms are both time and space efficient, making them candidates for online/streaming prediction scenarios where you cannot afford to store the full sequence history.
- This is the first paper in a series explicitly designed to connect agent foundations theory (AIXI-style universal prediction, Solomonoff induction) with practical, implementable algorithms — future installments are expected to extend to richer complexity classes.