4 pieces selected from The Gradient and the AI Alignment Forum — only the ones worth your time.
1. After Orthogonality: Virtue-Ethical Agency and AI Alignment
The Gradient
This essay challenges the foundational assumption behind most AI alignment research: that rational agents must have goals. Drawing on virtue ethics and philosophy of action, the author argues that human rationality is not goal-directed but practice-directed — we act rationally by conforming to 'practices,' which are networks of actions, dispositions, and evaluation criteria embedded in social and institutional contexts. On this view, the orthogonality thesis (the idea that any level of intelligence can be combined with any goal) fails because it misunderstands the nature of rational agency itself.
Why it matters
Most AI systems — including modern LLMs and RL-trained agents — are built around objective functions or reward signals, implicitly endorsing the goal-directed model the essay critiques. If the essay's argument holds, then aligning AI by specifying better goals or constraints may be structurally misguided. Developers building autonomous agents, copilots, or decision-making systems should take seriously whether embedding 'practices' and contextual evaluation criteria (rather than terminal objectives) could produce more robust and safer behavior, especially in edge cases where goal-directed systems notoriously fail.
What you can build with this
Build a small autonomous agent (e.g., a coding assistant or task planner) where behavior is governed not by a single objective function but by a set of explicit, inspectable 'practice rules' — domain-specific behavioral norms, evaluation heuristics, and role constraints stored as structured prompts or a rule engine. Compare its failure modes on adversarial inputs against a standard goal-directed agent with the same capability level, and document where practice-based constraints prevent misaligned behavior that reward maximization would not.
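Below is a minimal sketch of what inspectable 'practice rules' could look like in code, assuming a practice can be expressed as an applicability check plus a conformance check that gates each model-proposed action. The PracticeRule structure, the example rules, and the action format are illustrative assumptions, not constructs from the essay.

```python
# A minimal sketch of inspectable "practice rules" gating an agent's actions.
# PracticeRule, the example rules, and the action dict format are illustrative
# assumptions, not constructs from the essay.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PracticeRule:
    name: str                            # human-readable norm
    applies_to: Callable[[dict], bool]   # does this rule govern the proposed action?
    permits: Callable[[dict], bool]      # does the action conform to the practice?

coding_assistant_practices = [
    PracticeRule(
        name="preserve existing tests",
        applies_to=lambda a: a["kind"] == "edit",
        permits=lambda a: not any(p.startswith("tests/") for p in a.get("deleted_paths", [])),
    ),
    PracticeRule(
        name="stay within the assigned module",
        applies_to=lambda a: a["kind"] == "edit",
        permits=lambda a: all(p.startswith(a["assigned_module"]) for p in a["touched_paths"]),
    ),
]

def review_action(action: dict, practices: list[PracticeRule]) -> list[str]:
    """Return the names of practices the proposed action would violate."""
    return [r.name for r in practices if r.applies_to(action) and not r.permits(action)]

# Usage: gate every model-proposed action before executing it.
proposed = {"kind": "edit", "assigned_module": "src/api/",
            "touched_paths": ["src/api/client.py"], "deleted_paths": []}
violations = review_action(proposed, coding_assistant_practices)
print(violations or "action conforms to all practices")
```

The value of this structure for the comparison above is that every rejection is traceable to a named norm, so failure modes can be audited per practice rather than attributed to a single misspecified objective.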
Key takeaways
- The orthogonality thesis assumes intelligence and goals are separable, but the essay argues rational agency is constituted by practices — social and institutional action norms — not by optimization toward terminal goals.
- Virtue ethics offers a concrete alternative alignment framework: instead of specifying what an AI should maximize, specify the role-based dispositions and contextual evaluation standards it should embody.
- Goal-directed architectures may be fundamentally misaligned with how human rationality works, suggesting that agent design based on practices and norms could be more robust than reward shaping or constraint injection.
2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
The Gradient
This essay from The Gradient examines the shifting relationship between formal mathematics and empirical machine learning research over the past decade. The central observation is that mathematically principled architecture design — grounded in symmetry, geometry, and structure (e.g., equivariant networks, geometric deep learning) — has increasingly been outpaced by brute-force scaling: larger datasets, more compute, and engineering-driven iteration. The author traces how fields like group theory, differential geometry, and topology were once seen as the path to sample-efficient, generalizable models, but transformer-scale training has repeatedly beaten hand-crafted inductive biases on benchmark after benchmark.
Why it matters
For developers building AI products today, this tension has a direct practical implication: investing in mathematically structured architectures (e.g., graph neural networks, equivariant models) may yield gains in low-data, high-symmetry domains like molecular modeling or robotics, but for most product use cases — language, vision, multimodal — throwing more data and compute at a general architecture will outperform clever mathematical design. Understanding where the math still buys you something (physical simulation, scientific ML, small-data regimes) versus where it doesn't helps you make better architectural and resource allocation decisions rather than chasing theoretical elegance that won't ship.
What you can build with this
Build a small comparative benchmark: take a domain with known structure (e.g., 2D molecular property prediction or a graph-based scheduling problem), implement both a standard transformer/MLP baseline and a structure-aware model (e.g., a message-passing GNN with symmetry constraints using PyTorch Geometric), then systematically vary training set size from 100 to 100k samples and plot accuracy vs. data volume. This will give you a concrete, reusable chart showing exactly where structural inductive biases stop paying off — a practical decision tool for your own projects.
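As a runnable stand-in for that benchmark, the sketch below compares a plain MLP on raw coordinates against the same MLP fed rotation-invariant features, on a synthetic task (predicting the mean pairwise distance within a 2D point cloud) across training-set sizes. The task, features, and sizes are illustrative assumptions; for the full experiment, swap in a PyTorch Geometric GNN and a real molecular or graph dataset.

```python
# A self-contained sketch of the data-scaling comparison using only numpy and
# scikit-learn. The target (mean pairwise distance) is rotation-invariant, so the
# "structured" model gets invariant features while the baseline sees raw coordinates.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_dataset(n, n_points=8):
    clouds = rng.normal(size=(n, n_points, 2))            # random 2D point clouds
    diffs = clouds[:, :, None, :] - clouds[:, None, :, :]  # pairwise differences
    y = np.linalg.norm(diffs, axis=-1).mean(axis=(1, 2))   # rotation-invariant target
    return clouds, y

def invariant_features(clouds):
    # structured input: sorted pairwise distances (invariant to rotation/reflection)
    diffs = clouds[:, :, None, :] - clouds[:, None, :, :]
    d = np.linalg.norm(diffs, axis=-1).reshape(len(clouds), -1)
    return np.sort(d, axis=1)

X_test, y_test = make_dataset(2000)

for n_train in [100, 1_000, 10_000]:
    X_raw, y = make_dataset(n_train)
    baseline = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(
        X_raw.reshape(n_train, -1), y)
    structured = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(
        invariant_features(X_raw), y)
    print(n_train,
          round(baseline.score(X_test.reshape(len(X_test), -1), y_test), 3),
          round(structured.score(invariant_features(X_test), y_test), 3))
```

If the essay's picture holds, the invariant-feature model should win clearly at small n and the gap should narrow as n grows — which is exactly the boundary you want the chart to show.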
Key takeaways
- Mathematically structured architectures (equivariant, geometric) retain a real advantage in low-data and high-symmetry domains, but this advantage collapses as training data scales into the millions.
- The dominant driver of ML progress since ~2017 has been compute and data scale, not architectural elegance — meaning mathematical priors are now a niche tool rather than the mainstream research path.
- Knowing the boundary conditions of when structure matters (small datasets, known physical symmetries, out-of-distribution generalization requirements) is the key engineering judgment call when choosing an architecture for a new product.
3. My picture of the present in AI
AI Alignment Forum
This post is a candid, first-person snapshot of the author's beliefs about the state of AI as of April 2026, written as a description of the present rather than a prediction of the future. The author covers AI R&D acceleration at frontier labs, noting that serial research engineering speed-ups at OpenAI and Anthropic have reached roughly 1.6x — meaning AI tooling effectively lets engineers operate as if they were working 60% faster in aggregate, up from ~1.4x at the start of 2026. The gain comes from better models, improved tooling, and human adaptation, including workflow changes and shifts in task selection.
Why it matters
If you're building AI-assisted developer tooling, productivity platforms, or internal engineering infrastructure, the 1.6x aggregate speed-up figure is a concrete benchmark from a credible insider perspective. More importantly, the post flags a measurement bias that matters for product design: engineers aren't just doing the same work faster — they're shifting toward tasks where AI helps most and tackling work they couldn't previously do. This means measuring productivity gains by asking 'how much faster am I at my old tasks?' systematically underestimates true uplift, which has direct implications for how you instrument, evaluate, and market AI dev tools.
What you can build with this
Build a lightweight workflow audit tool that asks engineers to categorize each task they complete in a day along two axes: (1) AI assistance ratio (how much of the work was AI vs. human) and (2) whether they would have attempted this task without AI available. After two weeks of data collection, compute a corrected productivity multiplier that accounts for task-selection bias — distinguishing between acceleration on existing work vs. net-new work enabled by AI. This gives teams an honest baseline for measuring true AI uplift, which is more defensible than naive time-comparison metrics.
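A minimal sketch of the corrected calculation, assuming each logged task records actual hours, a self-estimate of unaided hours, and whether the task would have been attempted at all without AI. The field names and the split into 'acceleration on existing work' versus 'net-new work share' are illustrative choices for expressing the correction, not formulas from the post.

```python
# A minimal sketch of a corrected productivity multiplier; field names and the
# correction itself are illustrative assumptions, not definitions from the post.
from dataclasses import dataclass

@dataclass
class TaskLog:
    actual_hours: float             # time actually spent, with AI assistance
    est_hours_without_ai: float     # self-estimate for doing the same task unaided
    would_attempt_without_ai: bool  # would this task exist in a no-AI world?

def productivity_report(tasks: list[TaskLog]) -> dict:
    existing = [t for t in tasks if t.would_attempt_without_ai]
    net_new = [t for t in tasks if not t.would_attempt_without_ai]
    total_actual = sum(t.actual_hours for t in tasks)

    # Naive multiplier: counts every task, including work only attempted because AI exists.
    naive = sum(t.est_hours_without_ai for t in tasks) / total_actual
    # Acceleration measured only on work that would have happened anyway.
    accel_existing = (sum(t.est_hours_without_ai for t in existing) /
                      sum(t.actual_hours for t in existing)) if existing else float("nan")
    # Share of real time going to net-new work enabled by AI.
    net_new_share = sum(t.actual_hours for t in net_new) / total_actual

    return {"naive_multiplier": round(naive, 2),
            "acceleration_on_existing_work": round(accel_existing, 2),
            "net_new_work_share": round(net_new_share, 2)}

# Usage with a few made-up entries from a two-week log:
log = [TaskLog(2.0, 5.0, True), TaskLog(1.0, 1.2, True), TaskLog(3.0, 20.0, False)]
print(productivity_report(log))
```

Reporting the two corrected numbers side by side makes it explicit how much of the headline multiplier comes from faster existing work versus work that simply would not have been done before.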
Key takeaways
- Frontier AI labs (OpenAI, Anthropic) are seeing ~1.6x serial engineering speed-up as of April 2026, up from ~1.4x at start of year — gains driven by better models, tooling, and human adaptation.
- Individual task speed-ups can be 3–10x for well-suited tasks, but aggregate productivity gains are far lower because many tasks see minimal AI benefit, pulling the overall multiplier down (see the arithmetic sketch after this list).
- The standard productivity measurement ('how long would this take without AI?') is systematically biased upward because engineers have already shifted their task mix toward AI-friendly work — a correction that any honest AI productivity measurement framework must account for.
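A back-of-the-envelope sketch of why large per-task speed-ups compress into a modest aggregate multiplier: aggregate serial speed-up is the time-weighted harmonic mean of per-task speed-ups (Amdahl's-law-style), so time spent on unaccelerated work dominates. The task mix below is invented for illustration; only the arithmetic is the point.

```python
# Made-up task mix illustrating the aggregation; not numbers from the post.
task_mix = [
    # (fraction of pre-AI working time, speed-up on that kind of work)
    (0.25, 5.0),   # boilerplate, mechanical edits, well-specified changes: big wins
    (0.35, 2.0),   # ordinary feature work: modest wins
    (0.40, 1.0),   # design, review, debugging unfamiliar systems: little benefit
]
aggregate = 1.0 / sum(fraction / speedup for fraction, speedup in task_mix)
print(f"aggregate speed-up: {aggregate:.2f}x")   # 1.60x despite a 5x pocket
```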
4. AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines
AI Alignment Forum
A researcher on the AI Alignment Forum has updated their probability estimate for full AI R&D automation by end of 2028 from ~15% to ~30%, driven by concrete observations: multiple frontier models (referred to as Opus 4.5, Codex 5.2, and subsequent versions) exceeded expectations on benchmarks and real tasks, and METR data showed roughly 3.5-month doubling times on 50%-reliability task horizons throughout 2025. The author also cites direct demonstrations of AI systems completing software engineering tasks that would take humans months to years, using only moderately sophisticated scaffolding, including a C compiler written by Claude almost fully autonomously.
Why it matters
The 50%-reliability time horizon for 'easy-to-verify, low-ideation' SWE tasks is estimated to already be somewhere between one month and several years of human-equivalent work as of early 2026, using publicly available models. This means developers can likely delegate large, well-specified, mechanically complex engineering tasks to AI agents today — not in some future state — provided they invest in proper scaffolding and verification infrastructure. The gap between 50% and 90% reliability is large, so the operational challenge is building systems that catch failures, not assuming success.
What you can build with this
Build a scaffolded agent pipeline this week that takes a large, well-specified refactoring or migration task in your codebase (e.g., upgrading an API client library across hundreds of files, or converting a test suite to a new framework) and runs it with a recent Claude- or GPT-class frontier model in a loop with automated test execution as the verifier. Instrument it to measure at what task size or complexity the 50% success threshold breaks down — you'll directly calibrate your own sense of where current capability boundaries sit.
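A minimal sketch of that propose-verify-retry loop, using the test suite as the verifier. `propose_patch` is a placeholder for whichever model API or agent framework you use; the rest is standard library.

```python
# A minimal sketch of the loop described above: the agent proposes edits, the test
# suite verifies them, and failures are fed back for another round. `propose_patch`
# is a placeholder (an assumption), not a real library call.
import subprocess
import time

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def propose_patch(task: str, feedback: str) -> None:
    """Placeholder: call your model or agent here to edit the working tree.

    On retries, `feedback` carries the tail of the failing test output so the
    model can correct its previous attempt.
    """
    raise NotImplementedError

def attempt_task(task: str, max_rounds: int = 5) -> dict:
    """Run up to max_rounds of propose-then-verify; return the last round's record."""
    record = {"round": 0, "passed": False, "seconds": 0.0}
    feedback = ""
    for round_idx in range(1, max_rounds + 1):
        start = time.time()
        propose_patch(task, feedback)
        passed, output = run_tests()
        record = {"round": round_idx, "passed": passed, "seconds": time.time() - start}
        if passed:
            break
        feedback = output[-4000:]   # keep only the tail of the failure log
    return record
```

Logging round counts and wall-clock time per attempt is what lets you plot success rate against task size and locate your own 50% threshold.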
Key takeaways
- METR data showed ~3.5-month doubling times on AI 50%-reliability task horizons throughout 2025, meaning capability growth on long-horizon tasks is compounding rapidly, not plateauing.
- The author estimates current well-elicited AI can handle easy-to-verify, low-ideation SWE tasks at 50% reliability for work equivalent to months-to-years of human effort, but at 90% reliability the horizon drops to hours-to-days — verification and retry infrastructure matters more than raw capability.
- Scaffolding overhang is identified as a significant factor: existing public models are likely underperforming their actual capability ceiling because most deployments use naive scaffolding, meaning better orchestration alone — without waiting for new models — can unlock substantially larger task completion.