3 pieces selected from The Gradient — only the ones worth your time.
1. After Orthogonality: Virtue-Ethical Agency and AI Alignment
The Gradient
This essay from The Gradient challenges a foundational assumption in AI alignment: that intelligent agents should be goal-directed. The author argues that human rationality doesn't operate through the pursuit of terminal goals but through alignment to 'practices' — structured networks of actions, dispositions, and evaluation criteria drawn from virtue ethics (Aristotle, MacIntyre). The claim is that the standard 'orthogonality thesis' — that any level of intelligence can be combined with any goal — is not just dangerous but philosophically wrong about how rational agency actually works.
Why it matters
Most production AI systems today are designed around objective functions, reward signals, or explicit goal specifications — the very architecture this essay challenges. If the argument holds, developers building agentic AI systems (AutoGPT-style loops, multi-step planners, autonomous coding agents) may be encoding a fundamental misunderstanding of rationality into their systems. Instead of asking 'what is the agent's goal,' a virtue-ethical framing asks 'what practices should govern the agent's behavior' — a shift that has immediate implications for how you design guardrails, evaluation criteria, and agent constitutions in systems like those built on LangChain, CrewAI, or custom RLHF pipelines.
What you can build with this
Build a small agentic coding assistant that uses a 'practice-based constitution' instead of a goal specification: define a set of named practices (e.g., 'prefer reversible actions,' 'verify before executing,' 'ask when ambiguous') and implement a lightweight pre-action evaluation layer that scores proposed actions against each practice before execution. Compare task completion quality and failure modes against a baseline agent given only a terminal goal ('complete the task'). Document which failure modes each architecture produces.
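To make the 'pre-action evaluation layer' concrete, here is a minimal Python sketch. The three practice names come from the list above; everything else (the ProposedAction fields, the rule-based scorers, the 0.5 threshold) is an illustrative assumption, and in a real agent each score would more likely come from an LLM judge prompted with the practice's description.

```python
# Minimal sketch of a practice-based pre-action evaluation layer.
# The heuristic scorers below are placeholders; swap in an LLM judge
# or learned classifier per practice for a real agent.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedAction:
    description: str          # e.g. "delete build artifacts in ./dist"
    reversible: bool = True   # can the effect be undone?
    verified: bool = False    # did the agent check preconditions first?
    ambiguity: float = 0.0    # 0 = fully specified, 1 = highly ambiguous

# Each practice maps a proposed action to a score in [0, 1].
PRACTICES: dict[str, Callable[[ProposedAction], float]] = {
    "prefer_reversible_actions": lambda a: 1.0 if a.reversible else 0.2,
    "verify_before_executing":   lambda a: 1.0 if a.verified else 0.3,
    "ask_when_ambiguous":        lambda a: 1.0 - a.ambiguity,
}

def evaluate(action: ProposedAction, threshold: float = 0.5) -> tuple[bool, dict[str, float]]:
    """Score an action against every practice; block it if any score falls below threshold."""
    scores = {name: fn(action) for name, fn in PRACTICES.items()}
    return all(s >= threshold for s in scores.values()), scores

if __name__ == "__main__":
    action = ProposedAction("rm -rf ./dist", reversible=False, verified=False, ambiguity=0.1)
    allowed, scores = evaluate(action)
    print(scores)
    print("execute" if allowed else "escalate to human / re-plan")
```

The baseline agent for comparison is the same loop with the `evaluate` call removed, so any difference in failure modes is attributable to the practice layer rather than to prompt or tooling changes.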
Key takeaways
- The orthogonality thesis assumes goal-directed agency is the natural form of rationality, but virtue ethics argues rationality is constituted by practices — stable, socially-embedded patterns of action-evaluation — not by goal optimization.
- Replacing terminal goals with practice-alignment in AI agent design would mean evaluation criteria are baked into the agent's decision process at every step, not just at outcome measurement, which structurally reduces Goodhart's Law-style failures.
- This framing has a concrete implementation analogue: OpenAI's 'model spec' and Anthropic's 'constitutional AI' approaches are closer to practice-based design than goal-based design, suggesting the field is already drifting toward virtue-ethical architecture without fully theorizing it.
2. Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
The Gradient
This essay from The Gradient examines the shifting role of mathematics in machine learning research over the past decade. The central observation is that carefully designed, mathematically principled architectures (think geometric deep learning, equivariant networks, or explicitly symmetry-aware models) have largely been outpaced by brute-force scaling — bigger models, more data, more compute — often built on relatively simple architectural primitives like the transformer. The empirical-first, engineering-heavy paradigm has repeatedly beaten theory-first approaches on benchmark after benchmark.
Why it matters
For developers building AI products today, this tension has direct practical consequences: investing engineering effort in highly specialized, mathematically elegant architectures (e.g., graph neural networks with hand-crafted symmetry constraints for your specific domain) may yield diminishing returns compared to fine-tuning a large pretrained model. However, the essay implies this isn't a permanent defeat for principled math — in data-scarce domains like drug discovery, materials science, or robotics, symmetry-aware architectures still carry a meaningful edge. Knowing when to lean on scale versus when to lean on structure is a critical product decision.
What you can build with this
Pick a small, domain-specific prediction task where you have limited labeled data (e.g., predicting molecular properties, 3D object classification, or time-series with known periodic structure). Implement two competing baselines this week: one using a pretrained general-purpose model fine-tuned on your data, and one using a structurally-informed model (e.g., an E(3)-equivariant network via the e3nn library for molecular data, or a graph neural network with explicit symmetry). Measure accuracy, sample efficiency, and training time — the results will tell you concretely which regime your problem lives in.
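The practical question reduces to a sample-efficiency curve: at what amount of labeled data does the structured model stop winning? Below is a minimal harness for that comparison. The synthetic dataset and the two sklearn classifiers are stand-ins for your real task, your fine-tuned pretrained model, and your equivariant model; the loop over training-set fractions is the part worth keeping.

```python
# Sample-efficiency harness, assuming each candidate model is wrapped in a
# fit/predict interface. The sklearn models below are stand-ins: replace them
# with (a) your fine-tuned pretrained model and (b) the structure-aware model
# (e.g. an e3nn network), keeping the measurement loop unchanged.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=32, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "pretrained_finetuned_standin": LogisticRegression(max_iter=1000),
    "structure_aware_standin": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Train each candidate on growing fractions of the labeled data and record
# held-out accuracy: this curve shows which regime your problem lives in.
for frac in (0.05, 0.1, 0.25, 0.5, 1.0):
    n = max(int(frac * len(X_train)), 10)
    for name, model in candidates.items():
        model.fit(X_train[:n], y_train[:n])
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{name:32s} n_train={n:5d} accuracy={acc:.3f}")
```

If the structure-aware candidate only pulls ahead at the smallest fractions, that is the low-data regime the essay says still rewards principled architecture; if the pretrained model wins everywhere, scale is the right bet for your task.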
Key takeaways
- Scaling compute and data has empirically outperformed mathematically principled architecture design on most standard benchmarks, making engineering-first approaches the dominant paradigm in production AI.
- Symmetry-aware and geometry-informed architectures (equivariant networks, group-theoretic models) retain a practical advantage in low-data, high-structure domains like molecular biology and physics simulation — where inductive biases reduce sample complexity measurably.
- The split between 'math-first' and 'scale-first' ML is not merely academic: it should directly inform your architecture selection strategy based on data availability, domain structure, and the cost of labeled examples in your specific application.
3. AGI Is Not Multimodal
The Gradient
This essay from The Gradient argues that current multimodal generative AI systems — those that process text, images, audio, and video — are not on the path to AGI, despite widespread claims to the contrary. The central thesis, drawing on Terry Winograd's critique of language as a model for thought, is that multimodality is a red herring: adding more sensory modalities to a fundamentally language-centric architecture doesn't close the gap between pattern recognition and genuine intelligence. The essay contends that what's missing is tacit, embodied understanding — the kind of grounded, physical, contextual knowledge humans develop through being in the world, not through processing representations of it.
Why it matters
Developers building AI products today are often tempted to treat multimodal capability as a proxy for robustness or generality, choosing models or architectures based on how many modalities they support. This essay is a direct warning against that assumption: a model that can process images and audio still fails at tasks requiring genuine physical or causal understanding, and shipping products that depend on such understanding will produce brittle, unpredictable user experiences. Understanding where current architectures fundamentally break down helps developers scope their systems more honestly, avoid overclaiming in product design, and invest in the right mitigation strategies like retrieval, grounding, and human oversight.
What you can build with this
Build a benchmark harness that tests a multimodal model (e.g., GPT-4o or Gemini 1.5) on tasks requiring physical commonsense or causal reasoning — such as predicting what happens when objects interact, interpreting ambiguous spatial relationships in photos, or answering 'what would you feel if you touched this?' from an image. Log where the model confidently fails versus hedges correctly, and use this dataset to create a model confidence calibration layer for your own product, flagging queries that fall into known failure modes before they reach users.
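A minimal sketch of that failure-logging harness follows, assuming you supply `query_model(prompt, image_path)` as a thin wrapper around whichever multimodal API you use. The probe schema, the keyword-based hedge detector, and the JSONL log path are illustrative assumptions, not details from the essay.

```python
# Harness for logging confident failures vs. correct hedges on
# physical-commonsense probes. `query_model` is user-supplied; everything
# else here is a placeholder you would adapt to your own product.
import json
from dataclasses import dataclass, asdict

@dataclass
class Probe:
    prompt: str
    image_path: str
    expected: str          # short gold answer written by a human reviewer
    category: str          # e.g. "object_interaction", "spatial", "tactile"

HEDGE_MARKERS = ("not sure", "cannot tell", "can't tell", "it depends", "unclear")

def is_hedged(answer: str) -> bool:
    return any(marker in answer.lower() for marker in HEDGE_MARKERS)

def run(probes: list[Probe], query_model, log_path: str = "physical_commonsense_log.jsonl"):
    with open(log_path, "w") as log:
        for p in probes:
            answer = query_model(p.prompt, p.image_path)
            record = {
                **asdict(p),
                "answer": answer,
                "hedged": is_hedged(answer),
                # Naive substring match; replace with a human label or an
                # LLM grader for real evaluation.
                "correct": p.expected.lower() in answer.lower(),
            }
            log.write(json.dumps(record) + "\n")
```

Categories with a high rate of records that are both not correct and not hedged are your confident-failure modes; those category labels become the routing rules your calibration layer checks before a query reaches users.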
Key takeaways
- Multimodality (adding image, audio, video inputs) does not equate to embodied understanding — models still operate on statistical representations, not grounded physical experience.
- The architectural assumption that language is the right substrate for general intelligence is the core limitation being critiqued; scaling more modalities on top of that substrate doesn't resolve it.
- Product reliability depends on developers explicitly identifying and bounding the tacit-knowledge failure modes of their chosen models, rather than assuming multimodal capability implies generality.