What it does
/model switches the active Claude model for your session. Claude Code ships with three models: Opus 4.8 (most capable, slowest), Sonnet 4.6 (balanced speed/quality), and Haiku 4.5 (fast, cost-efficient). When you call /model, you pick which one handles your next query and all subsequent messages until you switch again.
/model opus # Switch to Opus 4.8
/model sonnet # Switch to Sonnet 4.6 (default)
/model haiku # Switch to Haiku 4.5
The active model is shown in your status line. Each model has different latency, reasoning depth, and token cost — so switching mid-session is how you tune the tool to the task at hand.
When to use it
Haiku is your default for most coding work. It's 2–3× faster than Sonnet, handles straightforward tasks (grep searches, small edits, code review at low effort), and costs a fraction as much. Use it for quick lookups, simple refactors, and exploratory queries.
Sonnet is the baseline for anything complex: multi-file changes, architecture decisions, security review, or when Haiku's response feels shallow. It's fast enough for interactive use and has the reasoning depth to catch subtle bugs.
Opus is for the hard problems — intricate refactors, novel designs, adversarial code review, or when you're stuck and need a fresh angle. It's 5–10× slower than Haiku, so don't reach for it reflexively; use it when you've hit a wall or the cost of a wrong answer is high.
Caching interaction: Switching models within the same conversation does NOT reset your token cache. Your prior messages stay in the prompt — the new model just reads them again with its own parameters. This means you can scaffold a conversation with Haiku, spot a gap, jump to Sonnet or Opus for that one hard query, and drop back to Haiku. Your context is preserved, and the cache amortizes the cost of the heavier model over the full session. (If you start a new conversation with a different model, the cache is empty for that conversation.)
Try it yourself
Start a conversation with Haiku and ask it to plan a complex refactor. If the plan feels oversimplified, call /model sonnet and ask it to critique or deepen the plan. Notice how Sonnet's response references the prior context without you repeating anything. When you're done, drop back to Haiku for the mechanical edits. This is the efficiency pattern: let Haiku run the session, call in Sonnet for the high-variance moments.
Gotchas
Model switching persists for the session. If you call /model opus, every following query uses Opus until you switch again. It's easy to forget you're on Opus and burn through your quota. Make it a habit to drop back to Haiku or Sonnet when the hard work is done.
Token cache doesn't clear on model switch, but does on new conversation. Starting a fresh conversation with Opus means a cold cache and slower first response. Reuse sessions when possible.
Sonnet is the real workhorse. The temptation is to flip between Opus and Haiku, but Sonnet is usually the right call — it's 80% of Opus's reasoning at 40% of the latency. Save Opus for genuinely stuck moments.
Fast mode doesn't change the model. /fast makes Opus output (not thinking) stream faster. It doesn't downgrade to Haiku; if you want speed, call /model haiku explicitly.
Try it yourself
Type the command in the fake terminal. Nothing leaves your browser.